Sunday, 16 February 2020

Use Azure Key Vault for Azure Data Factory

Case
I need to use some passwords and keys in my Azure Data Factory (ADF) pipelines, but for security reasons I don't want to store them in ADF. How do I prevent storing sensitive data in Azure Data Factory?
Using Azure Key Vault for Azure Data Factory













Solution
Let's say you want to connect to a database in ADF with SQL Authentication. Instead of saving the password in a Linked Services connection of ADF, you can store that password in an Azure Key Vault as a secret. Then you give ADF read rights (GET Secret) to that Key Vault to retrieve the password stored in the secret. In the Linked Services connection you configure which Key Vault to use and the name of your secret.

The example below will show all individual steps in detail including creating an Azure Key Vault, but assumes you already have an Azure Data Factory with pipelines.

1) Create Key Vault
First step is creating a key vault. If you already have one then you can skip this step.
  • Go to the Azure portal and create a new resource
  • Search for key vault
  • Select Key Vault and click on Create
  • Select your Subscription and Resource Group 
  • Choose a useful name for the Key Vault
  • Select your Region (the same as your other resources)
  • And choose the Pricing tier. We will use Standard for this demo
Creating a new Key Vault
















2) Add Secret
Now that we have a Key Vault we can add the password from the SQL Server user. The Key Vault stores three types of items: Secrets, Keys and Certificates. For passwords, account keys or connectionstrings you need the Secret.
  • Go to the newly created Azure Key Vault
  • Go to Secrets in the left menu
  • Click on the Generate/Import button to create a new secret
  • Choose Manual in the upload options
  • Enter a recognizable and descriptive name. You will later on use this name in ADF
  • Next step is to add the secret value which we will retrieve in ADF
  • Keep Content type Empty and don't use the activation or expiration date for this example
  • Make sure the secret is enabled and then click on the Create button
Adding a new secret to Azure Key Vault
















3) Access policies
Now we have to give ADF access to the Key Vault to read its content. You can now find ADF by its name so you don't have to search for its managed identity guid, but using that guid is also still possible.
  1. Go to Access policies in the left menu
  2. Click on the blue + Add Access Policy link
  3. Leave Configure from template empty
  4. Leave Key permissions unselected (we will only use a Secret for this example)
  5. Select Get for Secret permissions
  6. Leave Certificate permissions unselected (we will only use a Secret for this example)
  7. Click on the field of Select principal to find the name of your Azure Data Factory
  8. Leave Authorized application unchanged
  9. Click on Add and a new Application will appear in the list of Current Access Policies
Add Access policy
















If you want to use the Managed Identity GUID then first go to ADF and click on Properties in the left menu. There you will find the Managed Identity Object ID that you can use to search for principals under the access policies.
The Managed Identity of ADF


















4) Create Linked Service connection to Azure Key Vault
Now we need to let ADF know about your new Azure Key Vault by adding an extra Linked Service connection to your Key Vault.
  1. Go to ADF and open the Author & Monitor editor
  2. Within the new tab go to the Author section (Pencil icon) and click on connections to see all Linked Services
  3. Add a new Linked Service by clicking the + New button
  4. Search for (Azure) Key Vault and click on continue to enter all connection details
  5. First enter a descriptive name and optionally a description
  6. Select your subscription
  7. Select your newly created Key Vault
  8. Test the connection and if successful click on Create
Add Key Vault Linked Service


















5) Create Linked Service connection
Last step is creating the SQL Server Linked Service connection in ADF. In this example we used a SQL Server connection, but the same principal works for all connections in ADF. Note: that you can also retrieve the entire connection string from the Key Vault, but for this example we only retrieve the password secret.

Before you start make sure the Linked Service connection to the Key Vault has been  published. Otherwise you get an error message when hitting the create button
  1. Add a new Linked Service by clicking the + New button
  2. Search for (Azure) SQL Database and click on continue to enter all connection details
  3. First enter a descriptive name and a description
  4. Then use Connection string (not Azure Key Vault) to select your server and database
  5. After that use Azure Key Vault (not Password) to retrieve the password
  6. Select the Key Vault from the previous step
  7. Enter the name of the secret (that you created in step 2)
  8. Test the connection and if successful hit the create button
  9. Now you can use this Linked Services connection in your pipelines
Create connection and retrieve password via Key Vault



















Conclusion
In this post you learned how to store sensitive data for your ETL the right way by creating a Key Vault and use it in Azure Data Factory. An important step is to give ADF rights to retrieve secrets from the Key Vault. Either via its name of the Managed Identity.

In a previous post we also showed you how to use the same Key Vault in an Azure Automation Runbook to avoid passwords and keys in the PowerShell code. In a future post we will show you how to use it in other tools like Azure Databricks or Azure Functions.

It is also possible to use Key Vault secrets in the pipelines instead of in the Linked Services, but this is explained in a separate post.