Thursday, 30 April 2020

Azure Data Factory - Parameters event based triggers

My files arrive at various moments during the day and they need to be processed immediately on arrival in the blob storage container. At the moment each file has its own pipeline with its own event based trigger. Is there a more sustainable way where I don't have to create a new pipeline for each new file?
Event based triggers can pass through parameters

Yes, if your source files have names that can easily be matched to table names then it is very easy and you only need a couple of parameters and a single Copy Data activity. If you cannot match a source file by its name then you might want to add a Lookup activity that can find the match in a configuration table. Or if unexpected files can also arrive in that same blob storage container then you also might want to add the Lookup activity to check whether the file is expected.

The solution uses parameters that can be filled by the event based trigger. With a simple expression you can pass on the filename and folderpath to the pipeline.

1) Pipeline parameters
Start with a simple pipeline that only contains a single Copy Data activity that copies a specific file to a specific SQL Server table. Then add two parameters to this pipeline. One for the filename and one for the folderpath: SourceFileName and SourceFolderPath. The folder path parameter is optional for this basic example and will contain the containername and the foldername combined: "container/folder".
Add 2 pipeline parameters

2) Dataset sourcefile
Go to the Dataset of your source file from the blob storage container. In the Parameters tab add a new string parameters called FileName. As default value you can use the name of the file. Optionally you can add a second string parameter for the folderpath.
Add parameter FileName to dataset

Next step is to use this newly created dataset parameter in the connection to the file. Go to the Connection tab of your dataset and replace the filename with some dynamic content: @dataset().FileName
Add dynamic content for filename

If you want to use the folderpath parameter as well then you can add it to the container textbox and leave the Directory textbox empty (the containername is mandatory, but besides the container name you can also add the folderpath).

3) Dataset stage table
Now repeat the same actions for the dataset that points to your stage table. First add a parameter called TableName in the parameter tab.
Add parameter TableName to dataset

And then replace the Table property with some dynamic content: @dataset().TableName (check the Edit checkbox below the table before adding the expression).
Use dataset parameter to configure the tablename

4) Copy Data activity - Source
Now go back to your pipeline and edit the Source of your Copy Data activity. Notice that you now see a new Dataset property called FileName that points to the parameter of the dataset. This is where you fill the dataset parameter with the value of the pipeline paramter. The value should be set with some dynamic content: @pipeline().parameters.SourceFileName. This is the pipeline parameter created in step 1.
Copy Data activity - Dynamic content in FileName

5) Copy Data activity - Sink
Now repeat the same for the Sink where you now see a new Dataset property called TableName. Here is where you need to match the filename to the table name. In our example the tablename is equal to the filename without the extension. Therefor we replaced the extension with the replace expression: @replace(pipeline().parameters.SourceFileName,'.csv', '')
Copy Data activity - Dynamic content in TableName

6) Edit Trigger
The last step of the solution is passing through the filename and folderpath from the trigger to the pipeline parameters with an triggerbody expression that you also see in Azure Logic App and Power Automate:
Returns the filename of the file that caused the trigger:
Returns the folderpath of the file that caused the trigger:

When you edit or create a new trigger in the pipeline then the last step is filling in the variables. By using the above expressions they automatically fill them with information of the file that caused the trigger.
triggerBody() expressions

When two or more file are uploaded then each of them will trigger the same pipeline separately, but all with different values in the pipeline parameters.

7) Testing
After publishing all changes we can upload several files, one at a time or multiple at once, to test the new trigger pipeline. The last two rows indicate that the files where uploaded at the same time, but both triggered the pipeline separately. You can check the parameters by clicking on [@].
Monitoring the pipeline runs

In this post you learned how to use the triggerBody() expression from the event based triggers. You can pass on its value to the Pipeline parameters and then pass on those pipeline parameters to the dataset parameters.

With this very basic construction you can handle as many files as you like. If you do not need to process them as once then we suggest to use the foreach loop construction instead.

No comments:

Post a comment

Related Posts Plugin for WordPress, Blogger...