
Wednesday 1 December 2021

ADF: Looping through pipelines and executing them

Case
I want to loop through my ADF pipelines and then execute them in a foreach loop, but the pipeline property of the Execute Pipeline activity doesn't support dynamic content. Is this possible?
Execute Pipeline

Solution
The standard Execute Pipeline activity is pretty much useless for this specific case, but with another activity it is possible. However, don't use this workaround to execute a whole bunch of similar pipelines like we used to do in the SSIS era. In that case it is better to invest your time in creating a more flexible/configurable pipeline that can handle multiple tables or files.

Now the workaround:
Looping and executing pipelines

The workaround uses a Web activity to get all pipelines via a Rest API. A Filter activity then narrows that list down to the pipelines you actually want to run. After that, a Foreach loop with another Web activity inside executes each pipeline via a Rest API call.

1) Access control (IAM)
This solution uses Rest APIs from Azure Data Factory. This means we need to give this ADF access to its own resources so that it can call those Rest APIs.
  • Go to your ADF in the Azure Portal
  • Click on the ellipsis button (three dots) to copy the ADF name
  • Click on Access control (IAM) in the left menu
  • Click on +Add and choose Add role assignment
  • Select the role with just enough access (less is more). Data Factory Contributor is perfect for this example
  • Then select members. In the search window you can paste your ADF name. Click on your ADF and then push the Select button
  • Now review and assign the role to your ADF
Give your ADF access to its own resources

2) Web activity - Get all pipelines
We need a collection of pipelines for the Foreach loop. The Rest API list-by-factory retrieves all pipelines from a single Data Factory. You need to prepare the Rest API URL by replacing all parts between the curly braces with the details of your own ADF (also remove the curly braces themselves):

https://management.azure.com/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.DataFactory/factories/{factoryName}/pipelines?api-version=2018-06-01

Tip: If you copy the URL of the ADF overview page in the Azure portal then you have all the information you need.
  • Add the Web activity to your pipeline and give it a suitable name (you need it in the next activity)
  • On the Settings tab enter the adjusted URL from above in the URL field
  • Select GET as method
  • Set Authentication to Managed Identity
  • Use https://management.azure.com/ in the Resource field
Web activity - Get all pipelines

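Before adding the filter it helps to know what this Web activity actually returns. If you debug the pipeline and check its output, you will see a JSON message with a value array containing one item per pipeline. The sketch below shows the rough shape (property names based on the list-by-factory documentation, values shortened); the folder property and the id property are the two parts we will use in the next steps:
{
    "value": [
        {
            "id": "/subscriptions/7q618f21-ad62-4e1d-9019-4a23beda7777/resourceGroups/RG_ADF_DEV/providers/Microsoft.DataFactory/factories/DataFactory2-DEV/pipelines/Sub_pipeline1",
            "name": "Sub_pipeline1",
            "type": "Microsoft.DataFactory/factories/pipelines",
            "properties": {
                "activities": [],
                "folder": {
                    "name": "demo"
                }
            }
        }
    ]
}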
3) Filter pipelines
Now we have all pipelines available in our collection, but we need to add a filter to only get the required pipelines. The easiest way to do this is by putting them all in a specific folder or giving them all the same prefix. If you're using a folder then you can use the first expression. It first checks whether the pipeline contains a property named 'folder' because pipelines in the root will not have this property. Then it checks whether that property is filled with the value 'demo' (the name of our folder).
@and(
   contains(item().properties, 'folder'),
   equals(item().properties.folder.name,'demo')
)
If you want to use the prefix then the expression is less complex with only a startswith expression: @startswith(item().name, 'Sub_')

  • Add the Filter activity to the pipeline and give it a suitable name. We need it in the Foreach. Connect it to the Web activity.
  • For Items add the following expression @activity('Get All Pipelines').output.value (enter your own activity name)
  • For Condition add one of the above expressions
Filter activity

4) Foreach loop
The Foreach loop is very straightforward. Use the output of the Filter activity in the Items field with an expression like @activity('Filter on folder demo').output.value (replace the activity name with your own).
Foreach activity

5) Web activity - Execute pipeline
Within the Foreach loop we need to add another Web activity. This one will call the create-run Rest API, which will execute an ADF pipeline. Just like in step 2 we need to adjust the example URL from the documentation:

https://management.azure.com/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.DataFactory/factories/{factoryName}/pipelines/{pipelineName}/createRun?api-version=2018-06-01

However, the URL needs to change every iteration of the Foreach because the pipeline name is part of it. You could create an expression where you only change the pipeline name and leave the rest hardcoded, but besides the pipeline name the collection of pipelines also contains an id property that already contains 90% of the URL we need. If you debug the pipeline, you can see this in the output of the first Web activity or the Filter activity. The id property is filled with something like:

/subscriptions/7q618f21-ad62-4e1d-9019-4a23beda7777/resourceGroups/RG_ADF_DEV/providers/Microsoft.DataFactory/factories/DataFactory2-DEV/pipelines/Sub_pipeline1

We only need to add something in front of it and behind it to get the correct URL:
@concat('https://management.azure.com',
replace(item().id, ' ', '%20'),
'/createRun?api-version=2018-06-01'
)
The replace function substitutes spaces, which are not allowed in a URL, with %20. So if a pipeline name contains a space it will still work.
  • Add the Web activity to the Foreach and give it a suitable name 
  • On the Settings tab enter the expression above as URL by first clicking on the 'Add dynamic content' link below the field
  • Select POST as method
  • Now we don't need a Body for this Rest API, but the Web activity requires one when the method is POST. Enter a dummy JSON message like: {"dummy":"dummy"}
  • Set Authentication to Managed Identity
  • Use https://management.azure.com/ in the Resource field (all settings together are sketched as JSON below this list)
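With all settings combined, the activity's JSON ends up roughly like the sketch below. Treat it as an indication rather than the exact schema, and the activity name is just an example:
{
    "name": "Execute Pipeline via Rest API",
    "type": "WebActivity",
    "typeProperties": {
        "url": {
            "value": "@concat('https://management.azure.com', replace(item().id, ' ', '%20'), '/createRun?api-version=2018-06-01')",
            "type": "Expression"
        },
        "method": "POST",
        "body": "{\"dummy\":\"dummy\"}",
        "authentication": {
            "type": "MSI",
            "resource": "https://management.azure.com/"
        }
    }
}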
The Web activity calls the pipelines asynchronously (execute and don't wait for an answer). If you want a synchronous call, and perhaps some feedback when the pipeline fails, then you need to replace the Web activity with a Webhook activity.

6) The result
Now run the pipeline and check the result in the monitor. There you will see one big disadvantage: each execution becomes a separate run with its own Run ID. This means it takes a little more work to connect these runs in your logging, but it is possible because the output of the Web activity returns the Run ID.
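For reference, the create-run Rest API answers with a small JSON message containing that Run ID, something like the sketch below (the value is of course a random GUID). You can pick it up in a next activity with an expression like @activity('Execute Pipeline via Rest API').output.runId (replace the activity name with your own):
{
    "runId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
}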
ADF Monitor

Conclusion
In this post you learned how to execute pipelines in a loop via the Web activity and Rest APIs. Because each pipeline gets its own Run ID the logging needs some extra attention. You can also use the same Web activity construction to execute pipelines from another Data Factory.

Unfortunately you cannot use this same trick for Data Flows, because at the moment there is no Rest API to execute a Data Flow on its own (only in debug mode).




Monday 1 March 2021

Pausing/resuming Dedicated SQL Pools from Synapse

Case
In a previous post we described how we can pause and resume our Azure Dedicated SQL pools via Azure Data Factory. The question we are going to solve now: "How can we pause and resume from within an Azure Synapse workspace"?


Pause / Resume SQL dedicated Pools

Solution
Pausing and resuming from within an Azure Synapse workspace is very similar, but there are some differences because there are two types of Azure Dedicated SQL Pools. When you create it from within the Azure Synapse Workspace the provider will be 'Microsoft.Synapse' (hosted in Synapse), but when you create it outside the Azure Synapse Workspace the provider will be 'Microsoft.Sql' (hosted in SQL Server). Both have different Rest APIs and different security roles. In this example we will demo the Synapse-hosted version, but we will also mention the differences. For the SQL version you can also take a look at our previous post, which handles it from within Data Factory.


1) Give Azure Synapse Access to SQL Dedicated Pools
To call the Rest API we need to grant our Azure Synapse workspace access to the Dedicated SQL Pool, or more specifically to the Synapse workspace hosting it. Perhaps a bit odd, but we need to grant Synapse access to its own resources (for the SQL version you need to grant access to the SQL Server hosting the SQL Pool).

For the security role you should avoid 'Owner' and use 'Contributor'. This is because the Owner role can also change security-related items (for the SQL version you can use 'SQL Server Contributor').
  • In the Azure portal go to the Azure Synapse Workspace hosting the SQL Pool that you want to pause or resume
  • In the left menu click on Access control (IAM)
  • Click on Add, Add role assignment
  • In the 'Role' drop down select 'Contributor'
  • In the 'Assign access to' drop down select user, group or service principal.
  • Search for your Synapse Studio name (in our example 'gansdorp'), select it and click on Save

Grant Synapse Contributor role to SQL Server

If you forget this step then you will receive an authorization error while executing your Synapse pipeline.
2108 Authorization Failed

{"error":
{"code":"AuthorizationFailed"
,"message":"The client 'xxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx' with object id 'xxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx' does not have authorization to perform action 'Microsoft.Synapse/workspace/resume/action' over scope '/subscriptions/xxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourceGroups/RG_bitools/providers/Microsoft.Synapse/workspace/gansdorp/dedsqlpools' or the scope is invalid. If access was recently granted, please refresh your credentials."}
}

2) Determine URL
Now it is almost time to edit your Synapse pipeline. The first step will be adding a Web activity to call the Rest API (for the SQL version you must use this Rest API). Find the operation you want to perform and use its example URL to construct your own URL.

Pause compute
https://management.azure.com/subscriptions/{subscription-id}/resourceGroups/{resource-group-name}/providers/Microsoft.Synapse/workspaces/{workspacename}/sqlPools/{DedicatedSQLPoolName}/pause?api-version=2019-06-01-preview

Resume compute
https://management.azure.com/subscriptions/{subscription-id}/resourceGroups/{resource-group-name}/providers/Microsoft.Synapse/workspaces/{workspacename}/sqlPools/{DedicatedSQLPoolName}/resume?api-version=2019-06-01-preview

Within these URLs you need to replace all parts that start and end with a curly bracket: {subscription-id}, {resource-group-name}, {workspacename} and {DedicatedSQLPoolName} (including the brackets themselves).

Example URL
https://management.azure.com/subscriptions/aaaa-bbbb-1234-cccc/resourceGroups/RG-DEV/providers/Microsoft.Synapse/workspaces/gansdorp/sqlPools/dedsqlpool/pause?api-version=2019-06-01-preview
Example Rest API URL

3) Add Web Activity
To call the Rest API we will use the Web Activity in the Synapse pipeline. The actual Rest API call is synchronous, which means it waits for the pause or resume action to finish and then returns a message. This also means that you don't have to build any checks to make sure the pool is already online.
  • Add a Web activity to your pipeline and give it a suitable name
  • Go to the Settings tab and use the URL from the previous step in the URL property
  • Choose POST as method
  • Fill in {} as body (we don't need it, but it is required)
  • Choose MSI as authentication method
  • As the last step enter this URL https://management.azure.com/ as Resource

web activity

To first check the current state of the SQL Pool you can use the Get Rest API and then use an If Condition to check its output with an expression like @Activity('Get Current State').Output.Properties.State
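A possible sketch of that check, assuming the Get Rest API uses the same URL as in step 2 but without the /pause or /resume part (verify the exact property names and values in the debug output of your own pipeline):

https://management.azure.com/subscriptions/{subscription-id}/resourceGroups/{resource-group-name}/providers/Microsoft.Synapse/workspaces/{workspacename}/sqlPools/{DedicatedSQLPoolName}?api-version=2019-06-01-preview

Call this URL with a Web activity using GET as method, connect it to an If Condition with an expression like @equals(activity('Get Current State').output.properties.status, 'Paused') and put the Resume Web activity in the True branch. That way you only resume a SQL Pool that is actually paused.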

Summary
In this post you learned how to pause and resume your Dedicated SQL Pool from within Azure Synapse Studio. The method is very similar to the ADF version. There are some differences, but those are mostly caused by the service that is hosting the SQL Pool: either a SQL Server or your own Synapse workspace. It is expected that one of the two will probably disappear, because it isn't likely that Microsoft will keep maintaining two different services, each with their own set of Rest APIs.


Saturday 13 February 2021

Scaling Azure Analysis Services with ADF only

Case
I want to upscale and downscale my Azure Analysis Services (AAS) from within Azure Data Factory, but I don't want to write any code or use other Azure services like Azure Automation or Azure Logic Apps to do this. Is there an Azure Data Factory-only solution where we only use the standard pipeline activities from ADF?
Save some money on your Azure Bill by pausing AAS

Solution
Yes you can use the Web Activity to call the Rest API of Azure Analysis Services (AAS), but that requires you to give ADF permissions in AAS via its Managed Service Identity (MSI). If you already used our Process Model example, then this is slightly different (and easier).


1) Add ADF as contributor to AAS
Unlike for processing one of the AAS models, we don't need SSMS to add ADF as a Server Administrator. Instead we will use Access control (IAM) in the Azure portal to make our ADF a contributor for the AAS that we want to scale up or down.
  • Go to your AAS in the Azure portal
  • In the left menu click on Access control (IAM)
  • Click on + Add and choose Add role assignment
  • In the new Add role assignment pane select Contributor as Role
  • In the Assign access to dropdown select Data Factory
  • Select the right Subscription
  • Now Select your Data Factory and click on the Save button
Add ADF as Contributor to AAS

2) Add Web Activity
In your ADF pipeline you need to add a Web Activity to call the Rest API of Analysis Services. The first step is to determine the Rest API URL. In the string below, replace the <xxx> values with the subscription id, resource group and server name of your Analysis Services. The Rest API method we will be using is 'update':
https://management.azure.com/subscriptions/<xxx>/resourceGroups/<xxx>/providers/Microsoft.AnalysisServices/servers/<xxx>/?api-version=2017-08-01

Example:
https://management.azure.com/subscriptions/a74a173e-4d8a-48d9-9ab7-a0b85abb98fb/resourceGroups/bitools/providers/Microsoft.AnalysisServices/servers/bitools2/?api-version=2017-08-01

The second step is to create a JSON message for the Rest API. The tier in this message is either Developer, Basic or Standard. Within those tiers you have the (instance) names like B1 and B2, or S1 up to S9. Note that you can upscale from Developer to Basic to Standard, but you cannot downscale from Standard to Basic to Developer. Since Developer has only one instance you will probably never use that tier within this JSON message.
{
   "sku":{
      "capacity":1,
      "name":"S1",
      "tier":"Standard"
   }
}
or
{
   "sku":{
      "capacity":1,
      "name":"B2",
      "tier":"Basic"
   }
}
  • Add the Web activity to your pipeline
  • Give it a descriptive name like Upscale AAS (or Downscale AAS)
  • Go to the Settings tab
  • Use the Rest API URL from above in the URL property
  • Choose PATCH as Method
  • Add the JSON message from above in the Body property
  • Under advanced choose MSI as Authentication method
  • Add https://management.azure.com/ in the Resource property (different from the process example)
Web Activity calling the AAS Rest API

Then Debug the Pipeline to check the scaling action

3) Retrieve info
By changing the method type from PATCH to GET (the body property will disappear), you can retrieve information about the AAS, like the pricing tier or status. You could for example use that to first check the current pricing tier before changing it.
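The GET call returns, among other things, the sku section that we changed above. A sketch of the relevant part of the response (based on the example server from step 2; verify the exact property names in the debug output). You could read the current tier in a next activity with an expression like @activity('Get AAS Info').output.sku.name, where 'Get AAS Info' is just the example name of the GET Web activity:
{
    "name": "bitools2",
    "sku": {
        "name": "S1",
        "tier": "Standard",
        "capacity": 1
    },
    "properties": {
        "state": "Succeeded"
    }
}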
Retrieve service info via GET


Summary
In this post you learned how to change the pricing tier of your Analysis Services to save some money on your Azure bill. The big advantage of this method is that you don't need other Azure services, which makes maintenance a little easier. In a previous post we already showed you how to pause or resume your AAS with the Rest API.

Monday 11 January 2021

Scaling Azure Dedicated SQL Pools from ADF

Case
Is there a solution to upscale and downscale my Azure Dedicated SQL Pool from the Azure Data Factory pipeline without scripting? I know there are PowerShell solutions, but I'd rather use a no-code solution. What are my options?
Scaling Azure Dedicated SQL Pools

Solution
Fortunately you can now use the Rest APIs of Azure Dedicated SQL Pools (formerly known as Azure SQL Data Warehouse and for a short period as Azure Synapse Analytics) to scale the compute up or down. So no coding required.

1) Give ADF Access to SQL Pool
To call the Rest API we need to give ADF access to the SQL Pool, or more specifically to the SQL Server hosting that SQL Pool. We need a role that can only change the database settings, but nothing security-related: Contributor, SQL DB Contributor or SQL Server Contributor.
  • Go to the Azure SQL Server of the SQL Pool that you want to scale up or down with ADF
  • In the left menu click on Access control (IAM)
  • Click on Add, Add role assignment
  • In the 'Role' drop down select 'SQL DB Contributor'
  • In the 'Assign access to' drop down select Data Factory
  • Search for your Data Factory, select it and click on Save
Grant data factory SQL DB Contributor role to SQL Server

If you forget this step then you will receive an authorization error while executing your ADF pipeline.
2108 Authorization Failed

{"error":
{"code":"AuthorizationFailed"
,"message":"The client 'xxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx' with object id 'xxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx' does not have authorization to perform action 'Microsoft.Sql/servers/databases/resume/action' over scope '/subscriptions/xxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourceGroups/RG_bitools/providers/Microsoft.Sql/servers/SQL_bitools/databases/bitools' or the scope is invalid. If access was recently granted, please refresh your credentials."}
}

2) Determine URL
Now it is almost time to edit your ADF pipeline. The first step will be adding a Web activity to call the Rest API, but before we can do that we need to determine the URL of this API which you can find here.

Scaling
https://management.azure.com/subscriptions/{subscription-id}/resourceGroups/{resource-group-name}/providers/Microsoft.Sql/servers/{server-name}/databases/{database-name}?api-version=2014-04-01-preview

Within this URL you need to replace all parts that start and end with a curly bracket: {subscription-id}, {resource-group-name}, {server-name} and {database-name} (including the brackets themselves). Don't use a URL (bitools.database.windows.net) for the database server name, but use only the name: bitools.

Example URL
https://management.azure.com/subscriptions/aaaa-bbbb-1234-cccc/resourceGroups/RG_Bitools/providers/Microsoft.Sql/servers/bitools2/databases/bitools?api-version=2014-04-01-preview
Example URL

3) JSON message for Rest API
The Rest API above expects a JSON message with the pricing tier. A list of all pricing tiers can be found here in the column 'Data warehouse units'. Here are two examples which you need to adjust for your requirements:
{
    "properties": {
        "requestedServiceObjectiveName": "DW200c"
    }
}
or
{
    "properties": {
        "requestedServiceObjectiveName": "DW1000c"
    }
}

Note: just in case you get one of the errors below: the JSON example in the documentation is incorrect at the moment of writing:
  • Quotation marks around the data warehouse units are missing, which returns the following error:{"error":{"code":"InvalidRequestContent","message":"The request content was invalid and could not be deserialized: 'Unexpected character encountered while parsing value: D. Path 'properties.requestedServiceObjectiveName', line 3, position 41.'."}}
  • The c is missing after the data warehouse units (gen1 vs gen2) which returns the following error:
    {"code":"45122","message":"\u0027Azure SQL Data Warehouse Gen1 has been deprecated in this region. Please use SQL Analytics in Azure Synapse.\u0027","target":null,"details":[{"code":"45122","message":"\u0027Azure SQL Data Warehouse Gen1 has been deprecated in this region. Please use SQL Analytics in Azure Synapse.\u0027","target":null,"severity":"16"}],"innererror":[]}



4) Add Web Activity
To call the Rest API we will use the Web Activity in the ADF pipeline. The actual Rest API call is synchronous, which means it waits for the scaling action to finish and then returns a message. This also means that you don't have to build any checks to make sure it is already online.
  • Add a Web activity to your pipeline and give it a suitable name
  • Go to the Settings tab and use the URL from step 2 in the URL property
  • Choose PATCH as method
  • Add a new header with the name 'Content-Type' and the value 'application/json'
  • Fill in the JSON message from step 3 as body
  • Choose MSI as authentication method
  • As the last step enter this URL https://management.azure.com/ as Resource
Use a Web activity to call the Rest API

If you want to scale up before your ETL/ELT process and scale down afterwards then you need two separate Web activities or one clever child pipeline with parameters that you execute from your main pipeline.
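A sketch of that last option: give the child pipeline a string parameter (here called PerformanceLevel, a name we made up) and build the Body of the Web activity with an expression, so the same pipeline can both upscale and downscale depending on the value you pass in:

@concat('{"properties":{"requestedServiceObjectiveName":"', pipeline().parameters.PerformanceLevel, '"}}')

Execute this child pipeline from your main pipeline with an Execute Pipeline activity and pass for example DW1000c before your ETL/ELT process and DW200c afterwards.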


Summary
In this post you learned how to upscale and downscale your Dedicated SQL Pool to save some money on your Azure bill without writing any code. Note that at the moment of writing live scaling is not yet available and that you will lose the connection to your Dedicated SQL Pool for a couple of minutes.

Also note that ADF pipelines slightly differ from Azure Synapse Analytics pipelines. So if you consider switching to Synapse workspaces, because you apparently already use Dedicated SQL Pools, then you have to make some small adjustments to this specific task, which will be described in a next post. In another post we already showed how to pause and resume your Azure SQL Pools from within ADF.



Wednesday 6 January 2021

Pausing and resuming Dedicated SQL Pools from ADF

Case
Is there a solution to pause and resume an Azure Dedicated SQL Pool from the Azure Data Factory pipeline without scripting? I know there are PowerShell solutions, but I'd rather use a no-code solution. What are my options?
Pause and resume Azure Dedicated SQL Pools

Solution
Luckily you can now use the Rest APIs of Azure Dedicated SQL Pools (formerly known as Azure SQL Data Warehouse and for a short period as Azure Synapse Analytics) to pause or resume the compute.

1) Give ADF Access to SQL Pool
To call the Rest API we need to give ADF access to the SQL Pool, or more specifically to the SQL Server hosting that SQL Pool. We need a role that can only change the database settings, but nothing security-related: Contributor, SQL Server Contributor or SQL DB Contributor. Choose the role with just enough permissions to perform the task and avoid the Owner role.
  • Go to the Azure SQL Server of the SQL Pool that you want to pause or resume with ADF
  • In the left menu click on Access control (IAM)
  • Click on Add, Add role assignment
  • In the 'Role' drop down select 'SQL DB Contributor'
  • In the 'Assign access to' drop down select Data Factory
  • Search for your Data Factory, select it and click on Save
Grant data factory SQL DB Contributor role to SQL Server

If you forget this step then you will receive an authorization error while executing your ADF pipeline.
2108 Authorization Failed

{"error":
{"code":"AuthorizationFailed"
,"message":"The client 'xxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx' with object id 'xxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx' does not have authorization to perform action 'Microsoft.Sql/servers/databases/resume/action' over scope '/subscriptions/xxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourceGroups/RG_bitools/providers/Microsoft.Sql/servers/SQL_bitools/databases/bitools' or the scope is invalid. If access was recently granted, please refresh your credentials."}
}

2) Determine URL
Now it is almost time to edit your ADF pipeline. The first step will be adding a Web activity to call the Rest API, but before we can do that we need to determine the URL of this API which you can find here.

Pause compute
https://management.azure.com/subscriptions/{subscription-id}/resourceGroups/{resource-group-name}/providers/Microsoft.Sql/servers/{server-name}/databases/{database-name}/pause?api-version=2014-04-01-preview

Resume compute
https://management.azure.com/subscriptions/{subscription-id}/resourceGroups/{resource-group-name}/providers/Microsoft.Sql/servers/{server-name}/databases/{database-name}/resume?api-version=2014-04-01-preview

Within these URL's you need to replace all parts that start and end with a curly bracket: {subscription-id}, {resource-group-name}, {server-name} and {database-name} (including the brackets themselves). Don't use a URL (bitools.database.windows.net) for the database server name, but use only the name: bitools.

Example URL
https://management.azure.com/subscriptions/aaaa-bbbb-1234-cccc/resourceGroups/RG_Bitools/providers/Microsoft.Sql/servers/bitools2/databases/bitools/resume?api-version=2014-04-01-preview
Example URL

3) Add Web Activity
To call the Rest API we will use the Web Activity in the ADF pipeline. The actual Rest API call is synchronous, which means it waits for the pause or resume action to finish and then returns a message. This also means that you don't have to build any checks to make sure it is already online.
  • Add a Web activity to your pipeline and give it a suitable name
  • Go to the Settings tab and use the URL from the previous step in the URL property
  • Choose POST as method
  • Fill in {} as body (we don't need it, but it is required)
  • Choose MSI as authentication method
  • As the last step enter this URL https://management.azure.com/ as Resource
Use a Web activity to call the Rest API

You could also first check the current status via the Check database state Rest API and then use an expression like @activity('Get Status').output.properties.status to retrieve the current state.
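A sketch of such a check: add a Web activity (named 'Get Status' in the expression above) that calls the same URL as in step 2 without the /pause or /resume part and with GET as method, and connect it to an If Condition with an expression like the one below. Put the Resume Web activity in the True branch so you only resume a SQL Pool that is actually paused. The 'Paused' value is an assumption, so verify the possible status values in the debug output first:

@equals(activity('Get Status').output.properties.status, 'Paused')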

Summary
In this post you learned how to pause and resume your Dedicated SQL Pool to save some money on your Azure bill without writing any code. Note that you only pause the compute and that you still have to pay for the storage of the SQL Pool.

Also note that ADF pipelines slightly differ from Azure Synapse Analytics pipelines. So if you consider switching to Synapse workspaces, because you apparently already use Dedicated SQL Pools, then you have to make some small adjustments to this specific task, which will be described in a next post. In another post we will also show how to scale your Azure SQL Pools from within ADF.


Tuesday 13 October 2020

Execute pipelines from another Data Factory (Sync)

Case
Can I use the Execute Pipeline Activity to execute pipelines from another Data Factory? Or do we need another activity for this?
Can you use the Execute Pipeline Activity for this?

Solution
First of all, why would you want to have multiple Data Factories other than for a DTAP-street (Development, Testing, Acceptance and Production)? There are several reasons, for example:
  • Different departments/divisions each having their own Data Factory
    For example to prevent changes to your pipelines by users of a different department or to make it easier to split the Azure consumption between two divisions.
  • Different regions for international companies
    Either due to legal reasons where for example the data may not leave the European Union
    Or to prevent paying unnecessary outbound data costs when your data is spread over different regions
  • Security reasons
    To prevent others from using the access provided via the Managed Identity of ADF. If you give your ADF access via MSI to an Azure Key Vault or an Azure Storage Account, then everybody using that ADF can access that service via ADF.
If you have any other good reasons to use multiple Data Factories please let us know in the comments below.

And although you may have multiple Data Factories you could still use one Data Factory to execute pipelines from a different Data Factory. However you cannot use the Execute Pipeline Activity because it can only execute pipelines within the same Data Factory. 

You can either use the Web Activity or the Webhook Activity for this. The Web Activity always executes the pipeline asynchronously, which means it does not wait for the result. The Webhook Activity executes the child pipeline synchronously, which means it waits until the child pipeline is ready, and you can also retrieve the execution result via a callback. In this blog post we will show the synchronous Webhook Activity; in a previous blog post we already showed the asynchronous Web Activity.

1) Give parent access to child via MSI
We will not use a user to execute the pipeline in the child(/worker) Data Factory, but instead we will give the managed identity (MSI) of the parent(/master) Data Factory access to the child(/worker) Data Factory. The minimum role needed is Data Factory Contributor, but you could also use a regular Contributor or Owner (but less is more).
  • Go to the child(/worker) Data Factory (DivisionX in this example)
  • In the left menu click on Access control (IAM)
  • Click on the +Add button and choose Add role assignment
  • Select Data Factory Contributor as Role
  • Use Data Factory as Assign access to
  • Optionally change the subscription
  • Optionally enter a (partial) name of your parent ADF (if you have a lot of data factories)
  • Select your parent ADF and click on the Save button
Give one ADF access to other ADF

2) Add Call Back to Child Pipeline
The parent(/master) pipeline will call the child(/worker) and wait until it receives a callback or until the timeout is exceeded. Therefore we need to add a callback activity (in the form of a Web/Webhook activity) to the child pipeline. This activity should be the last activity in your pipeline. There can be more (for example one for success and one for failure), but only the first callback will be handled by the parent.
  • Go to the child(/worker) Data Factory (DivisionX in this example)
  • Open ADF via the Author & Monitor link
  • Open the pipeline that you want to execute from the parent(/master) pipeline
  • Add a String parameter called callBackUri. The value will be automatically provided by the parent
  • Add a String parameter called myInputParam1. The value will be manually provided by the parent
  • Add a Web Activity with the following settings
    • URL: @pipeline().parameters.callBackUri   (to retrieve the pipeline parameter)
    • Method: POST
    • Body: {"Output":{"myOutputParam1":"failed"},"StatusCode":"401"}
      Every status code above 399 will tell the parent that the child failed. This can be a random number, but you could also use the official list to give it a little more meaning. Another option is to use the Output tag, which can contain one or more 'output' parameters that can be read by the parent pipeline. You could for example pass through the error message with the Add dynamic content option (a success variant is shown below this list).

      If you also want to pass an error description to the parent then you must extend the JSON message with an error tag: {"Output":{"myOutputParam1":"failed"},"StatusCode":"401","error":{"ErrorCode":"ParameterError","Message":"Required parameters were not provided"}}
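For completeness, a success variant of that callback body could look like the sketch below. Based on the note above, any StatusCode below 400 will report success to the parent; myOutputParam1 is just the made-up example parameter from the failure message:

{"Output":{"myOutputParam1":"succeeded"},"StatusCode":"200"}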
Add two pipeline parameters

Web Activity for callback 

3) Determine URL
This solution will call the Create Run Rest API of ADF to execute the pipeline. For this you need to replace the marked parts of the URL below with the Subscription ID, Resource Group name, Data Factory name and Pipeline name of the child(/worker) pipeline. We will use this URL in the next step.

Example URL:
https://management.azure.com/subscriptions/aaaaaa-bbbb-1234-cccc-123456789/resourceGroups/DivisionX/providers/Microsoft.DataFactory/factories/DevisionX-ADF/pipelines/MyChildPipeline/createRun?api-version=2018-06-01


4) Webhook Activity
So for this second example we will use the Webhook Activity. This will execute the child(/worker) pipeline, but now it will wait until it receives a callback or until the timeout is exceeded. Besides a status it can also retrieve messages from the child(/worker) ADF. See the JSON message in step 2.
  • Go to your master ADF and click on Author & Monitor
  • Create a new pipeline and add a Webhook Activity to the canvas of the new pipeline
  • Give it a suitable descriptive name on the General Tab
  • Go to the Settings tab and enter the URL of the previous step
  • Choose POST as Method
  • Add a new header called Content-Type and with value application/json
  • As Body enter a JSON message. This could either be a dummy message or you could supply parameters in this message. The child parameter is called myInputParam1: {"myInputParam1":"bla bla"}
  • Use MSI as Authentication method
  • Enter this URL as Resource: https://management.core.windows.net/
Webhook Activity calling a child pipeline in an other ADF

5) Testing
Now trigger the new parent pipeline and check the monitor of the child ADF. You will see it receives two parameters: one provided in the JSON message in the Body property and one provided by the Webhook activity itself.
Child pipeline - two parameters

Note that the child pipeline did not fail, because we handled the error. However, the parent pipeline did fail because we sent a status code higher than 399. Next check the monitor of the parent ADF and look at the output of the Webhook activity. It received a value from the child pipeline that could be used in a next activity.
Parent pipeline - the result of the callback

Summary
In this blog post you learned how to give one ADF access to another ADF and how to execute pipelines in that other ADF. The Webhook activity solution gives you a little more control compared to the Web activity. And with the correct JSON messages you can pass messages and parameters from the parent to the child and back to the parent.


Monday 31 August 2020

Execute pipelines from another Data Factory (Async)

Case
Can I use the Execute Pipeline Activity to execute pipelines from another Data Factory? Or do we need another activity for this?
Can you use the Execute Pipeline Activity for this?

Solution
First of all, why would you want to have multiple Data Factories other than for a DTAP-street (Development, Testing, Acceptance and Production)? There are several reasons, for example:
  • Different departments/divisions each having their own Data Factory
    For example to prevent changes to your pipelines by users of a different department or to make it easier to split the Azure consumption between two divisions.
  • Different regions for international companies
    Either due to legal reasons where for example the data may not leave the European Union
    Or to prevent paying unnecessary outbound data costs when your data is spread over different regions
  • Security reasons
    To prevent others from using the access provided via the Managed Identity of ADF. If you give your ADF access via MSI to an Azure Key Vault or an Azure Storage Account, then everybody using that ADF can access that service via ADF.
If you have any other good reasons to use multiple Data Factories please let us know in the comments below.

And although you may have multiple Data Factories you could still use one Data Factory to execute pipelines from a different Data Factory. However you cannot use the Execute Pipeline Activity because it can only execute pipelines within the same Data Factory. 

You can either use the Web Activity or the Webhook Activity for this. The Web Activity always executes the pipeline asynchronously, which means it does not wait for the result. The Webhook Activity executes the child pipeline synchronously, which means it waits until the child pipeline is ready, and you can also retrieve the execution result via a callback. In this blog post we will show the asynchronous Web Activity and in another blog post we will show the synchronous Webhook Activity.


1) Give parent access to child via MSI
We will not use a user to execute the pipeline in the child(/worker) Data Factory, but instead we will give the managed identity (MSI) of the parent(/master) Data Factory access to the child(/worker) Data Factory. The minimum role needed is Data Factory Contributor, but you could also use a regular Contributor or Owner (but less is more).
  • Go to the child(/worker) Data Factory (DivisionX in this example)
  • In the left menu click on Access control (IAM)
  • Click on the +Add button and choose Add role assignment
  • Select Data Factory Contributor as Role
  • Use Data Factory as Assign access to
  • Optionally change the subscription
  • Optionally enter a (partial) name of your parent ADF (if you have a lot of data factories)
  • Select your parent ADF and click on the Save button
Give one ADF access to other ADF

2) Determine URL
This solution will call the Create Run Rest API of ADF to execute the pipeline. For this you need to replace the marked parts of the URL below with the Subscription ID, Resource Group name, Data Factory name and Pipeline name of the child(/worker) pipeline. We will use this URL in the next step.

Example URL:
https://management.azure.com/subscriptions/aaaaaa-bbbb-1234-cccc-123456789/resourceGroups/DivisionX/providers/Microsoft.DataFactory/factories/DevisionX-ADF/pipelines/MyChildPipeline/createRun?api-version=2018-06-01


3) Web Activity
So for this first example we will use the Web Activity. This will execute the child(/worker) pipeline, but you will not see the result in the master ADF. However, you can see the executions in the monitor of the child(/worker) ADF.
  • Go to your master ADF and click on Author & Monitor
  • Create a new pipeline and add a Web Activity to the canvas of the new pipeline
  • Give it a suitable descriptive name on the General Tab
  • Go to the Settings tab and enter the URL of the previous step
  • Choose POST as Method
  • Add a new header called Content-Type and with value application/json
  • As Body enter a JSON message. This could either be a dummy message or you could supply parameters in this message. The child parameter is called myParam1: {"myParam1":"bla bla"} (see the expression note after this list)
  • Use MSI as Authentication method
  • Enter this URL as Resource: https://management.core.windows.net/
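Inside the child(/worker) pipeline, myParam1 is just a regular pipeline parameter. You can use its value in any activity of that pipeline with an expression like the one below (a minimal sketch; the parameter name must of course match the one you defined):

@pipeline().parameters.myParam1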
Web Activity calling a child pipeline in an other ADF

Our dummy child pipeline in a different ADF only contains a Wait activity that waits 30 seconds. To force a child pipeline to fail we used a Stored Procedure activity that executes a RAISERROR statement.
Notice the parameter called myParam1

4) Testing
Now trigger the new master pipeline and check the monitor of the master ADF. For this example we executed one child pipeline that fails and one that succeeds, but it will show that both were successful. Also notice the execution duration of both the master and the worker pipelines.
ADF Monitor of both Master and worker

Summary
In this blog post you learned how to give one ADF access to another ADF and how to execute pipelines in that other ADF. The Web activity solution is very basic and will not show you the result, but it does allow you to pass values via parameters.

In a next post we will show you the Webhook activity solution, which does allow you to call back the master pipeline and show the execution result.

Sunday 14 June 2020

Pause and Resume Analysis Services with ADF only

Case
I want to start and stop my Azure Analysis Services from within Azure Data Factory, but I don't want to write code or use other Azure services like Azure Automation or Azure Logic Apps to do this. Is there an Azure Data Factory-only solution where we only use the standard pipeline activities from ADF?
Save some money on your Azure Bill by pausing AAS

Solution
Yes you can use the Web Activity to call the Rest API of Azure Analysis Services (AAS), but that requires you to give ADF permissions in AAS via its Managed Service Identity (MSI). If you already used our Process Model example, then this is slightly different (and easier).


1) Add ADF as contributor to AAS
Unlike for processing one of the AAS models, we don't need SSMS to add ADF as a Server Administrator. Instead we will use Access control (IAM) in the Azure portal to make our ADF a contributor for the AAS that we want to pause or resume.
  • Go to your AAS in the Azure portal
  • In the left menu click on Access control (IAM)
  • Click on + Add and choose Add role assignment
  • In the new Add role assignment pane select Contributor as Role
  • In the Assign access to dropdown select Data Factory
  • Select the right Subscription
  • Now Select your Data Factory and click on the Save button
Add ADF as Contributor to AAS

2) Add Web Activity
In your ADF pipeline you need to add a Web Activity to call the Rest API of Analysis Services. The first step is to determine the Rest API URL. In the string below, replace the <xxx> values with the subscription id, resource group and server name of your Analysis Services. The Rest API method we will be using is 'Suspend', but you can replace that word with 'Resume' to start up the AAS:
https://management.azure.com/subscriptions/<xxx>/resourceGroups/<xxx>/providers/Microsoft.AnalysisServices/servers/<xxx>/Suspend?api-version=2017-08-01

Example:
https://management.azure.com/subscriptions/a74a173e-4d8a-48d9-9ab7-a0b85abb98fb/resourceGroups/bitools/providers/Microsoft.AnalysisServices/servers/bitools2/Suspend?api-version=2017-08-01

The second step is to create a JSON message for the Rest API. The Rest API doesn't actually use it, but it is required in the Web activity when you use POST as method. So you just need to create a dummy JSON message:
{
    "Dummy": "Dummy"
}
  • Add the Web activity to your pipeline
  • Give it a descriptive name like Pause AAS (or Resume AAS)
  • Go to the Settings tab
  • Use the Rest API URL from above in the URL property
  • Choose POST as Method
  • Add the dummy JSON message from above in the Body property
  • Under advanced choose MSI as Authentication method
  • Add https://management.azure.com/ in the Resource property (different from the process example)
Web Activity calling the AAS Rest API

Then Debug the Pipeline to check the suspend/resume action

3) Retrieve info
By changing the method type from POST to GET (the body property will disappear) and removing the method (Suspend or Resume) from the URL, you can retrieve information about the AAS, like the status and pricing tier. You could for example use that to first check the status before changing it.
Retrieve service info via GET


Summary
In this post you learned how to pause and resume your Analysis Services to save some money on your Azure bill. The big advantage of this method is that you don't need other Azure services, which makes maintenance a little easier. In a next post we will also show you how to change the pricing tier via the Rest API.