Monday 31 August 2020

Execute pipelines from another Data Factory (Async)

Case
Can I use the Execute Pipeline Activity to execute pipelines from another Data Factory, or do we need a different activity for this?
Can you use the Execute Pipeline Activity for this?
























Solution
First of all, why would you want multiple Data Factories other than for a DTAP street (Development, Testing, Acceptance and Production)? There are several reasons, for example:
  • Different departments/divisions each having their own Data Factory
    For example, to prevent users of a different department from changing your pipelines, or to make it easier to split the Azure consumption between two divisions.
  • Different regions for international companies
    Either for legal reasons, where for example the data may not leave the European Union,
    or to avoid paying unnecessary outbound data costs when your data is spread over different regions.
  • Security reasons
    To prevent others from using the access provided via the Managed Identity of ADF. If you give your ADF access via MSI to an Azure Key Vault or an Azure Storage Account, then everybody using that ADF can access that service via ADF.
If you have any other good reasons to use multiple Data Factories, please let us know in the comments below.

Even when you have multiple Data Factories, you can still use one Data Factory to execute pipelines in a different Data Factory. However, you cannot use the Execute Pipeline Activity, because it can only execute pipelines within the same Data Factory.

You can use either the Web Activity or the Webhook Activity for this. The Web Activity always executes the pipeline asynchronously, which means it does not wait for the result. The Webhook Activity executes the child pipeline synchronously, which means it waits until the child pipeline is finished, and you can also retrieve the execution result via a callback. In this blog post we will show the asynchronous Web Activity, and in another blog post we will show the synchronous Webhook Activity.


1) Give parent access to child via MSI
We will not use a user account to execute the pipeline in the child(/worker) Data Factory. Instead, we will give the managed identity (MSI) of the parent(/master) Data Factory access to the child(/worker) Data Factory. The minimum role needed is Data Factory Contributor. You could also use a regular Contributor or Owner role, but less is more.
  • Go to the child(/worker) Data Factory (DivisionX in this example)
  • In the left menu click on Access control (IAM)
  • Click on the +Add button and choose Add role assignment
  • Select Data Factory Contributor as Role
  • Use Data Factory as Assign access to
  • Optionally change the subscription
  • Optionally enter a (partial) name of your parent ADF (if you have a lot of data factories)
  • Select your parent ADF and click on the Save button
Give one ADF access to another ADF















2) Determine URL
This solution calls the Create Run REST API of ADF to execute the pipeline. For this you need to replace the relevant parts of the URL below with the Subscription ID, Resource Group name, Data Factory name and Pipeline name of the child(/worker) pipeline. We will use this URL in the next step.

Example URL:
https://management.azure.com/subscriptions/aaaaaa-bbbb-1234-cccc-123456789/resourceGroups/DivisionX/providers/Microsoft.DataFactory/factories/DevisionX-ADF/pipelines/MyChildPipeline/createRun?api-version=2018-06-01
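
If you need this URL for several child pipelines, it can help to assemble it from its four parts. The snippet below is only a minimal sketch in Python: the function name build_create_run_url is made up for this illustration, and the values passed in are the placeholder values from the example URL above.

```python
from urllib.parse import quote

# API version taken from the example URL in this post.
API_VERSION = "2018-06-01"


def build_create_run_url(subscription_id, resource_group, factory_name, pipeline_name):
    """Return the Create Run URL for a pipeline in the child(/worker) Data Factory.

    Hypothetical helper for illustration only; it just fills in the URL pattern.
    """
    # URL-encode each part so unexpected characters cannot break the path.
    sub, rg, adf, pipeline = (
        quote(part, safe="")
        for part in (subscription_id, resource_group, factory_name, pipeline_name)
    )
    return (
        "https://management.azure.com"
        f"/subscriptions/{sub}"
        f"/resourceGroups/{rg}"
        "/providers/Microsoft.DataFactory"
        f"/factories/{adf}"
        f"/pipelines/{pipeline}"
        f"/createRun?api-version={API_VERSION}"
    )


# Placeholder values copied from the example URL above.
url = build_create_run_url(
    "aaaaaa-bbbb-1234-cccc-123456789", "DivisionX", "DevisionX-ADF", "MyChildPipeline"
)
print(url)
```

With the placeholder values this reproduces the example URL shown above.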


3) Web Activity
For this first example we will use the Web Activity. This will execute the pipeline in the child(/worker) Data Factory, but you will not see the result in the master ADF. However, you can see the executions in the monitor of the child(/worker) ADF.
  • Go to your master ADF and click on Author & Monitor
  • Create a new pipeline and add a Web Activity to the canvas of the new pipeline
  • Give it a suitable descriptive name on the General Tab
  • Go to the Settings tab and enter the URL of the previous step
  • Choose POST as Method
  • Add a new header called Content-Type with the value application/json
  • As Body enter a JSON message. This can be either a dummy message or a message that supplies parameters. In this example the parameter of the child pipeline is called myParam1: {"myParam1":"bla bla"}
  • Use MSI as Authentication method
  • Enter this URL as Resource: https://management.core.windows.net/
Web Activity calling a child pipeline in another ADF
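
The settings above can also be written down as an activity definition. The Python sketch below builds a dictionary that roughly mirrors what the code view of the pipeline could look like for this Web Activity. Treat it as an assumption-laden illustration: the helper name build_web_activity and the activity name are invented here, and the JSON that the ADF designer actually generates may contain additional properties, so compare it with the code view of your own pipeline.

```python
import json


def build_web_activity(name, create_run_url, parameters):
    """Return a dict describing a Web Activity that starts a child pipeline.

    Hypothetical helper: it only mirrors the settings entered in the steps above.
    """
    return {
        "name": name,
        "type": "WebActivity",
        "typeProperties": {
            "url": create_run_url,          # URL determined in step 2
            "method": "POST",
            "headers": {"Content-Type": "application/json"},
            "body": parameters,             # dummy message or child pipeline parameters
            "authentication": {
                "type": "MSI",
                "resource": "https://management.core.windows.net/",
            },
        },
    }


activity = build_web_activity(
    "Execute child pipeline",  # assumed activity name
    "https://management.azure.com/subscriptions/aaaaaa-bbbb-1234-cccc-123456789"
    "/resourceGroups/DivisionX/providers/Microsoft.DataFactory/factories/DevisionX-ADF"
    "/pipelines/MyChildPipeline/createRun?api-version=2018-06-01",
    {"myParam1": "bla bla"},
)
print(json.dumps(activity, indent=2))
```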
























Our dummy child pipeline in a different ADF only contains a Wait activity that waits 30 seconds. To force a child pipeline to fail we used a Stored Procedure activity that executes a RAISERROR statement.
Notice the parameter called myParam1















4) Testing
Now trigger the new master pipeline and check the monitor of the master ADF. For this example we executed one child pipeline that fails and one that succeeds, but the master monitor will show that both were successful. Also notice the execution duration of both the master and the worker pipelines.
ADF Monitor of both Master and worker
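
The master shows success because the Web Activity only confirms that the run was started. As far as we know, the Create Run call answers with a small JSON message containing a runId, and the real status can be looked up later via the Get operation for pipeline runs in the same REST API. The Python sketch below only builds the URL for such a status lookup. The function name build_run_status_url and the all-zero run id are made up for this illustration, and the response shape is an assumption you should check against the REST API documentation.

```python
import json


def build_run_status_url(create_run_url, create_run_response_body):
    """Return the URL for looking up the status of a started pipeline run.

    Hypothetical helper. Assumes the Create Run response is JSON with a
    "runId" field and that the run can be read under .../pipelineruns/{runId}.
    """
    run_id = json.loads(create_run_response_body)["runId"]
    # Keep everything up to the factory name and drop "/pipelines/<name>/createRun...".
    factory_url = create_run_url.rsplit("/pipelines/", 1)[0]
    return f"{factory_url}/pipelineruns/{run_id}?api-version=2018-06-01"


# Made-up response body with a dummy run id, for illustration only.
sample_response = '{"runId": "00000000-0000-0000-0000-000000000000"}'
status_url = build_run_status_url(
    "https://management.azure.com/subscriptions/aaaaaa-bbbb-1234-cccc-123456789"
    "/resourceGroups/DivisionX/providers/Microsoft.DataFactory/factories/DevisionX-ADF"
    "/pipelines/MyChildPipeline/createRun?api-version=2018-06-01",
    sample_response,
)
print(status_url)
```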












Summary
In this blog post you learned how to give one ADF access to another ADF and how to execute pipelines in another ADF. The Web Activity solution is very basic and will not show you the result, but it does allow you to pass values through via parameters.

In a next post we will show you the Webhook Activity solution, which does allow you to call back the master pipeline and show the execution result.

6 comments:

  1. good document, if I just want to run the pipeline in the SAME adf with web activity, what kind of security I need setup? I keep getting token error.

    1. Not sure why you want to use the Web Activity when you have the Execute Pipeline Activity. Nevertheless, if you want to try it, you just need to do step 1 for its own ADF (give the ADF access to its own resources).

  2. Hello, you can do this to synapse pipeline from data factory?

    1. No problem. Just use the Rest API of Synapse: https://docs.microsoft.com/en-us/rest/api/synapse/data-plane/pipeline/create-pipeline-run

  3. hello, is there a way to find out the identity of the parent pipeline in the child without passing the info as a parameter ? thanks

