I don't want to use the Access Keys to access my Data Lake with Azure Data Factory. Is there a better alternative?
Don't use the Access Keys
Solution
There are various options to authorize access to your Storage Account, but using Managed Service Identity (MSI) is probably the easiest and safest. This is because you don't need any secrets, like passwords or keys, that could end up in the wrong hands. Instead, you give a specific ADF access to your Storage Account via its Managed Service Identity. Each deployed ADF can be found in Azure Active Directory (now Microsoft Entra ID), and you can assign it a role such as Storage Blob Data Contributor or Storage Blob Data Reader, scoped to, for example, an entire Storage Account or a single container.
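To make "scope" a bit more concrete: a role assignment is always attached to an Azure resource ID. The minimal Python sketch below only builds those IDs, for a whole Storage Account or for one container, and lists the built-in role definition GUIDs of the two roles mentioned above. It does not call Azure. The function name storage_scope and the sample subscription, resource group, account and container names are hypothetical.

```python
from typing import Optional

# Built-in Azure RBAC role definition IDs; these GUIDs are the same in every tenant.
BUILT_IN_ROLES = {
    "Storage Blob Data Contributor": "ba92f5b4-2d11-453d-a403-56ea7ae7b0e8",
    "Storage Blob Data Reader": "2a2b9908-6ea1-4ae2-8e65-a410df84e7d1",
}


def storage_scope(subscription_id: str, resource_group: str, account: str,
                  container: Optional[str] = None) -> str:
    """Return the resource ID of a Storage Account, or of one container in it."""
    scope = (f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
             f"/providers/Microsoft.Storage/storageAccounts/{account}")
    if container is not None:
        # Containers live under the account's single "default" blob service.
        scope += f"/blobServices/default/containers/{container}"
    return scope


if __name__ == "__main__":
    sub = "00000000-0000-0000-0000-000000000000"  # hypothetical subscription ID
    print(storage_scope(sub, "rg-demo", "stdemo"))                    # whole account
    print(storage_scope(sub, "rg-demo", "stdemo", container="raw"))   # one container
```

A role assigned at the account scope also applies to every container below it, which is why this example can grant access to the entire account with a single assignment.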
For this example we have an existing Azure Storage Account (general purpose v2) and an existing Azure Data Factory. We will give ADF write access to the entire Storage Account. Networking (NSGs, VNets, subnets, etc.) is out of scope for this example.
1) Access Control (IAM)
The first step is to authorize your Data Factory within the Storage Account. This is where we give ADF the Storage Blob Data Contributor role, which allows it to read, write and delete Azure Storage containers and blobs. There is also an optional Conditions step where you can add additional rules.
- Go to the Azure Portal and then to the Storage Account that ADF needs access to
- In the left menu click on Access Control (IAM)
- Click on +Add and choose Add role assignment
- Select the required role (Storage Blob Data Contributor for this example) and click Next
- First select the Managed identity radio button and then click on +Select members
- In the Managed Identity field on the right side select Data Factory and then search for your ADF by name. One or more factories will appear in a list.
- Click on your ADF and then on the Select button.
- A description is optional, but it could be handy later on.
- Now click on Next and optionally add one or more conditions. In this example we won't add any conditions.
- Next click on Review + assign to finish
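The portal steps above create a single role assignment. As a rough sketch of what that assignment contains, the Python below builds the URL path and JSON body you could send with a PUT request to the Azure Resource Manager role assignments REST API (assuming api-version 2022-04-01). The principal ID is the object ID of the factory's system-assigned managed identity, which (assumption) you can copy from the factory's Properties page. The helper name, sample IDs and description are hypothetical, and authentication plus the actual HTTP call are left out.

```python
import json
import uuid
from typing import Dict, Tuple

# GUID of the built-in "Storage Blob Data Contributor" role.
STORAGE_BLOB_DATA_CONTRIBUTOR = "ba92f5b4-2d11-453d-a403-56ea7ae7b0e8"
API_VERSION = "2022-04-01"  # assumed Microsoft.Authorization API version


def role_assignment_request(subscription_id: str, scope: str, principal_id: str,
                            description: str = "") -> Tuple[str, Dict]:
    """Build the path and body of a role assignment PUT request (no HTTP call)."""
    # Each role assignment is a resource whose name is a new GUID.
    assignment_name = str(uuid.uuid4())
    path = (f"{scope}/providers/Microsoft.Authorization/roleAssignments/"
            f"{assignment_name}?api-version={API_VERSION}")
    body = {
        "properties": {
            "roleDefinitionId": (f"/subscriptions/{subscription_id}/providers/"
                                 f"Microsoft.Authorization/roleDefinitions/"
                                 f"{STORAGE_BLOB_DATA_CONTRIBUTOR}"),
            # Object ID of the factory's managed identity.
            "principalId": principal_id,
            # Managed identities are represented as service principals.
            "principalType": "ServicePrincipal",
            # Same optional description field as in the portal.
            "description": description,
        }
    }
    return path, body


if __name__ == "__main__":
    sub = "00000000-0000-0000-0000-000000000000"  # hypothetical subscription ID
    scope = (f"/subscriptions/{sub}/resourceGroups/rg-demo/providers/"
             "Microsoft.Storage/storageAccounts/stdemo")
    path, body = role_assignment_request(
        sub, scope,
        principal_id="11111111-1111-1111-1111-111111111111",  # hypothetical object ID
        description="ADF managed identity: write access to stdemo")
    print("PUT https://management.azure.com" + path)
    print(json.dumps(body, indent=2))
```

No condition is included, matching the portal walkthrough where the Conditions step is skipped.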
2) Test ADF Linked Service
Now go to Data Factory and create a new Linked Service to your Data Lake. Make sure the Authentication Method is set to Managed Identity. Then select your Data Lake and click the Test Connection button.
Create Linked Service to test connection
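For reference, the linked service created in the UI is stored as JSON. A minimal sketch, assuming the Data Lake is an Azure Data Lake Storage Gen2 account (ADF type AzureBlobFS): with the system-assigned managed identity, only the dfs endpoint URL is specified, without any key, SAS token or service principal properties. The Python helper and the names LS_DataLake and stdatalake are hypothetical; compare the output with the JSON code view of your own linked service in ADF.

```python
import json


def adls_gen2_linked_service(name: str, account: str) -> dict:
    """Sketch of an ADF linked service for ADLS Gen2 that authenticates
    with the factory's system-assigned managed identity."""
    return {
        "name": name,
        "properties": {
            # "AzureBlobFS" is the ADF type name for Azure Data Lake Storage Gen2.
            "type": "AzureBlobFS",
            "typeProperties": {
                # Only the dfs endpoint: no account key, SAS token or service
                # principal, so ADF falls back to its managed identity.
                "url": f"https://{account}.dfs.core.windows.net",
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(adls_gen2_linked_service("LS_DataLake", "stdatalake"), indent=2))
```

Because the definition holds no secret at all, it can be stored in source control as-is, which is the practical benefit over the Access Key.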
If the test fails with a 'forbidden' error, then:
- Check whether you selected the right ADF under step 1 and the correct Storage Account under step 2.
- Make sure you selected Managed identity under step 1 (and not 'User, group or service principal').
- Rule out network issues, for example by temporarily creating a linked service that uses the Account Key. If that connection succeeds, the network is fine and the problem lies in the role assignment.
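The first two checks boil down to one question: does the factory's managed identity hold the role on the Storage Account, or on a scope above it? A minimal Python sketch of that check, assuming you already have the role assignments as a list of dicts in the shape returned by the Azure RBAC Role Assignments List REST API (properties.scope, properties.principalId, properties.roleDefinitionId). Assignments on a parent scope, such as the resource group or subscription, are inherited, so those count too. The helper name and sample data are hypothetical, and the check ignores conditions, deny assignments and network rules.

```python
from typing import Iterable, Mapping

# GUID of the built-in "Storage Blob Data Contributor" role.
STORAGE_BLOB_DATA_CONTRIBUTOR = "ba92f5b4-2d11-453d-a403-56ea7ae7b0e8"


def has_role_at_scope(assignments: Iterable[Mapping], principal_id: str,
                      target_scope: str,
                      role_guid: str = STORAGE_BLOB_DATA_CONTRIBUTOR) -> bool:
    """Return True if principal_id holds role_guid on target_scope, directly or inherited."""
    target = target_scope.lower().rstrip("/")
    for assignment in assignments:
        props = assignment["properties"]
        scope = props["scope"].lower().rstrip("/")
        # An assignment applies to its own scope and to everything below it.
        covers_target = target == scope or target.startswith(scope + "/")
        same_principal = props["principalId"].lower() == principal_id.lower()
        # roleDefinitionId is a full resource ID that ends in the role GUID.
        same_role = props["roleDefinitionId"].lower().endswith("/" + role_guid.lower())
        if covers_target and same_principal and same_role:
            return True
    return False


if __name__ == "__main__":
    # Hypothetical sample data, shaped like the items of the API's "value" array.
    account = ("/subscriptions/sub-1/resourceGroups/rg-demo/providers/"
               "Microsoft.Storage/storageAccounts/stdemo")
    sample = [{
        "properties": {
            "scope": account,
            "principalId": "adf-principal-id",
            "roleDefinitionId": ("/subscriptions/sub-1/providers/Microsoft.Authorization/"
                                 "roleDefinitions/" + STORAGE_BLOB_DATA_CONTRIBUTOR),
        }
    }]
    print(has_role_at_scope(sample, "adf-principal-id", account))    # True
    print(has_role_at_scope(sample, "another-factory-id", account))  # False
```

A False for the factory you expected usually means the wrong ADF or the wrong Storage Account was picked in step 1.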
Conclusion
In this blog post you learned how to give ADF access to your Storage Account via its Managed Service Identity. This is probably the easiest and safest way to authorize ADF. You could use the same approach to, for example, give ADF access to your Azure SQL Database.