Sunday, 1 November 2020

ADF Snack: Adding metadata columns

Case
How do I add (and populate) metadata columns to my stage tables from within Azure Data Factory without executing a Stored Procedure with an update query afterwards?
Adding metadata columns in ADF












Solution
The Copy Data activity has on the Source pane a section called Additional columns that allows you to create extra columns and populate them with values from Data Factory like the run id, run date, pipeline name and file path. It also allows you to copy columns (f.e. to populate two destination columns with the same source value), but It does not allow you to add fancy stuff like a hash of all columns. For the fancy ETL stuff you should take a look at Mapping Data Flows or solve it afterwards with that Stored Procedure activity and some TSQL code.

1) Copy data acticity
For this example we will use a CSV file as source and add a couple of new metadata columns that are not available in the CSV file itself. We will also copy one of the existing columns to a new column. This is very useful if you want to populate two destinations columns with the value from one source column.
  1. Go to your existing Copy Data Activity and then to the Source pane.
  2. At the bottom you will find the Additional columns section. Click on the + New icon to add new columns.
  3. The first column is using an out of the box value by selecting $$FILEPATH in the value column. It retrieves the filename of the source file. Note that this is of course only available when your source is a file.
  4. The second column is using the column copy option. Select $$COLUMN in the value column and then in the third column the column name from the file that you want to copy. If no existing columns are available then hit the Refresh icon in the Additional columns section (or check the dataset).
  5. The third column is using a hardcoded text describing the source application where the file came from.
  6. The next three columns are filled via the dynamic content option to add values from Data Factory. In this example @pipeline().RunId, @pipeline().TriggerTime and @pipeline().Pipeline. You could also add columns with values from parameters or variables.
Adding six additional (metadata) column





















2) The result
To see the result of adding the additional columns click on the Preview data icon. Next you can optionally change the values of the expressions. To see the final preview click on the OK button.
Preview of the Additional columns
















Summary
In this post you learned how to add extra columns with metadata information without using UPDATE queries in an Stored Procedure Activity. The options are a bit basic, but with for example the Get Metadata activity you could retrieve additional information from the file like the last modified date or file size and add that as an Additional column as well.
Adding additional info from Get Metadata activity




1 comment:

  1. What if the source file is in .xlsx format or binary format files, copy data does not have the "additional column" option in that case

    ReplyDelete