Wednesday, 8 June 2022

Break or stop ForEach loop in ADF and Synapse

Case
I have a pipeline with a parallel Foreach loop to execute for example multiple Stored Procedures, but if one of those fails I want to stop the Foreach executing more Stored Procedures. Can I break the Foreach loop in case of an error without turning it from parallel into sequential? 
Can you stop a ForEach Loop? Nope!















Solution
You can't stop the foreach loop itself on error, but if an iteration fails then you can cancel the entire pipeline that is running the foreach loop. When the pipeline is running it gets a RUN ID and you can use that to call a Rest API to cancel the current pipeline run if one of the Stored Procedures inside the ForEach fails.
Cancel the entire pipeline














1) Give ADF/Synapse access to its own Rest APIs
To use the Rest APIs of ADF or Synapse within a pipeline you first need to give ADF or Synapse access to its own Rest APIs. This can be done in the Azure Portal on the overview page of the service.
  • Open the Azure Portal in your browser and go to the overview page of your Data Factory or Synapse Workspace.
  • In the left menu click on Access control (IAM)
  • Click on +Add and then choose Add role assignment
  • Select the role Contributor or for ADF Data Factory Contributor and click on Next
  • Under Assign access to select Manged identity
  • Under Members click on +Select members
  • Now you need to select Synapse workspace or Data Factory (V2) and search for your ADF or Synapse
  • Click on your Synapse or ADF and click on the Select button
  • Optionally enter a description: "Give ADF/Synapse access to its own Rest APIs"
  • Click on the Review + assign button (twice) 
The animated GIF is from a Synapse workspace, but it is identical to Data Factory. Just select Data Factory as Managed identity type.
Giving Synapse access to its own Rest APIs
















2) Add Web Activity in Foreach on fail
In the ForEach loop construction we need to add a Web Activity to call the Rest API that cancels the Pipeline run. Data Factory and Synapse have a different URL to cancel the pipeline. Within the example URL you need to replace the red parts with the curly brackets by either a hard code value, a parameter or a System Variable. For Data Factory and Synapse you can use the same System Variabes to retrieve the factoryName/workspaceName and the runId: pipeline().DataFactory / pipeline().RunId

Azure Data Factory:
https://management.azure.com/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.DataFactory/factories/{factoryName}/pipelineruns/{runId}/cancel?api-version=2018-06-01

@concat('https://management.azure.com/subscriptions/',
pipeline().parameters.subscriptionId
'/resourceGroups/',
pipeline().parameters.resourceGroup,
'/providers/Microsoft.DataFactory/factories/',
pipeline().DataFactory,
'/pipelineruns/',
pipeline().RunId,
'/cancel?api-version=2018-06-01')

Azure Synapse Workspace
{endpoint}/pipelineruns/{runId}/cancel?api-version=2020-12-01

or a little easier:
https://{workspaceName}.dev.azuresynapse.net/pipelineruns/{runId}/cancel?api-version=2020-12-01

@concat('https://',
pipeline().DataFactory,
'.dev.azuresynapse.net/pipelineruns/',
pipeline().RunId,
'/cancel?api-version=2020-12-01')

  • In the ForEach add a Web Activity and connect it to your existing activity
  • Make sure to change the type of the line from Success to Failure
  • Give the Web Activty a useful name and go to the Settings tab
  • For URL use and expression like above (don't forget to add parameters if you use them)
  • Set the Method to POST
  • Enter a fake/dummy JSON message in the body. We don't need it, but it is required
  • Set the Authentication to System Assigned Managed Identity
  • And last set the Resource to https://management.azure.com/
Web Activity to Cancel the pipeline run





















3) Testing
For this example we used 10 Stored Procedures of which one of them fails on purpose. First make sure to publish the pipeline and then trigger the pipeline (don't use Debug). Then check the monitor to see the result.
One Stored Procedure Failed, rest is cancelled




















Conclusion
In this post you learned how to stop a running pipeline when one of the activities within a foreach loop fails so that is doesn't unnecessary executes new activities. One of the downsides is that in the logs the status of your pipeline will be 'Cancelled' instead of 'Failed', but it saves you running (and paying for) unnecessary activities.

Another downside is that it does not work in debug mode because then you don't get a RUN ID and there is no pipeline run to cancel. The workaround for that is to put the Web Activity that cancels your pipeline in an IF construction that only cancels if the RUN ID is not 0.

The usefulness of this construction get better if you have a lot of iterations and hopefully it's not the last iteration that failed. Let us know in the comments what you think of a construction like this and if you have an alternative solutions.