Sunday, 20 February 2022

ADF Snack - Give ADF access to SQL DB via MSI

Case
I'm using Azure Data Factory to populate tables in an Azure SQL Database. I want to avoid using SQL Server authentication/Basic authentication. What is the best alternative?
Using the Managed Service Identie of ADF











Solution
Recently we showed how to give ADF access to a Data Lake via its Managed Service Identity. You can also use that to give ADF access within your database. This will save you the hassle of using accounts and passwords within ADF (or Azure Key Vault).

 
1) Enable Azure Active Directory authentication
First make sure you are using Azure AD authentication for your Azure SQL Server by adding an Azure Active Directory admin within your SQL Server. Without an AAD admin you cannot add AAD users (or Data Factories) to your database.

Go to your Azure SQL Server (not the database) and under Azure Active Directory in the left menu you can check and change this option. To try it out you just could add your own AAD account as AAD admin.
Enable Azure Active Directory authentication












2) Create user in SQL DB
Now use your favourite query tool and login to your database with an AAD account that has the db_owner role. Execute the following create-user-command on your database. The user is the name of your Azure Data Factory. 
-- Add your Azure Data Factory as external to SQL DB
CREATE USER [YourDataFactoryName] FROM EXTERNAL PROVIDER;

-- Check the permissions of your Azure Data Factory
SELECT	DISTINCT pri.principal_id
,		pri.name
,		pri.type_desc
,		pri.authentication_type_desc
,		per.state_desc
,		per.permission_name
FROM	sys.database_principals AS pri
JOIN	sys.database_permissions AS per
		ON per.grantee_principal_id = pri.principal_id
WHERE	pri.name = 'YourDataFactoryName'

-- Remove all permission of your Azure Data Factory
DROP USER [YourDataFactoryName]
This command gives your ADF connect permissions on your database. This is enough to test the connection of a Linked Service in ADF, but is not enough to actually do something with the data.
Add ADF as extermal user













3) Give data permissions
Next step is to give your ADF a database role so that it is able to read or write data depending on your needs. This can be done with the stored procedure sp_addrolemember or via the ALTER ROLE command
-- Two options to add a DB role to your ADF

-- Add db_datareader role to your ADF
EXEC sp_addrolemember N'db_datareader', N'YourDataFactoryName'

-- Add db_datawriter role to your ADF
ALTER ROLE [db_datawriter] ADD MEMBER [YourDataFactoryName]

-- Check roles of your ADF
SELECT				DP1.name AS DatabaseRoleName
,					isnull (DP2.name, 'No members') AS DatabaseUserName   
FROM				sys.database_role_members AS DRM  
RIGHT OUTER JOIN	sys.database_principals AS DP1  
					ON DRM.role_principal_id = DP1.principal_id  
LEFT OUTER JOIN		sys.database_principals AS DP2  
					ON DRM.member_principal_id = DP2.principal_id  
WHERE				DP1.type = 'R'
AND					DP2.name = 'YourDataFactoryName'

-- Two options to remove the role from your ADF
-- EXEC sp_droprolemember N'db_datareader', N'YourDataFactoryName'
-- ALTER ROLE [db_datawriter] DROP MEMBER [YourDataFactoryName]
Give ADF permission to read and/or write data
















4) Create Linked Service via MSI
Now create a new Linked Service in ADF and use Managed Identity as the Authenication type. When you do this you don't have to fill in a password. Hit the Test connection button to see if it works.
Linked Service to your database via MSI















If it's failing check the firewall/VNet/Subnet settings or check whether you executed the above queries on the correct database.
22300 - Cannot connect to SQL Database:
'bitools2.database.windows.net', Database:
'bitools2', User: ''. Check the linked
service configuration is correct, and
make sure the SQL Database firewall
allows the integration runtime to
access. A severe error occurred on the
current command. The results, if any,
should be discarded., SqlErrorNumber=0,
Class=11,State=0, Activity ID: abcdfe.



























Conslusion
In this blogpost you learned how easy it is to give ADF persmissions within your database without using accounts and passwords. Note that those permissions are totally different than when you want to scale your database. The same method with the Managed Service Identity can be used to give ADF access to your Azure Storage Account or Azure Key Vault.



Sunday, 13 February 2022

ADF Snack - Empty file when creating new folder

Case
I'm adding files to a Data Lake container via ADF, but when creating a new folder I'm also getting an empty file with the same name as the folder. How do I avoid the creation of these unwanted empty files?
Unwantend empty files with same name as folders










Solution
You are probably using an Azure Storage Account V2 (general purpose v2) without the hierarchical namespace setting enabled. You can check that on the Overview page of the Storage Account.
Hierarchical namespace disabled
















To solve this you need to enable that setting in your storage account. You can do that bij clicking on 'Disabled' behind Hierarchical namespace on the Overview page of the Storage Account. However there are some options that don't work with the hierarchical namespace like 'point-in-time restore for containers', 'versioning for blobs', 'blob change feed' and 'version-level immutability support''. When you enable the hierarchical namespace setting you will get a warning about this so that you can disable those settings. After Step 2 Validate account you can donwload a json file all the stuff you need to fix first.
Upgrading your storage account














Make sure to consult other users / Data Contributors of that Data Lake before changing those settings or at a last resort you could just create a new separate Storage Account if you are not sure about the settings. An other option could be to create a Python or PowerShell script to clean up the empty files.

Conclusion
In this post your learned how to fix an annoying needless empty file 'bug' for each directory. This is especially handy when you have those flexible pipelines that store your files in neat folder structures per year/month/day. It is just one setting, but it could have consequences for other settings. Thanks to colleague Derek Groen for finding the right setting.




Saturday, 12 February 2022

ADF Snack - Give ADF access to Data Lake via MSI

Case
I don't want to use the Access Keys to access my Data Lake with Azure Data Factory. Is there a better alternative?

Don't use the Access Keys














Solution
There are various options to authorize access to your Storage Account, but using Managed Service Identity (MSI) is probably the easiest and safest. This is because you don't need to use any secrets like passwords or keys that could end up in the wrong hands. This means you give a specific ADF access to your Storage Account via its Managed Service Identity (MSI). Each deployed ADF can be found in the Azure Actice Directory and you can assign a role like Storage Blob Data Contributor or Storage Blob Data Reader to that ADF and give for example access to an entire Storage Account or a specific container.

For this example we have an existing Azure Storage Account (general purpose v2) and an existing Azure Data Factory. We will give ADF write access to the entire storage account. Networking (nsg, VNETs, subnets, etc.) is out of scope for this example.

1) Access Control (IAM)
First step is to configure authorize your Data Factory within the stored account. This is where we will give ADF the role Storage Blob Data Contributor. This will allow ADF to read, write and delete Azure Storage containers and blobs. There is also an optional conditions step where you can add additional rules. 
  • Go to the Azure Portal and then to the Storage Account where ADF needs access to
  • In the left menu click on Access Control (IAM)
  • Click on +Add and choose Add role assignment
  • Select the required role (Storage Blob Data Contributor for this example) and click Next
  • Now first check the Managed identity radio button and then click on +Select members 
  • In the Managed Identity field on the right side select Data Factory and then search for your ADF by name. One or more factories will appear in a list.
  • Click on your ADF and then on the Select button.
  • A description is optional, but it could be handy later on.
  • Now click on Next and optionally add one or more conditions. In the example we wont be adding conditions.
  • Next click on Review + assign to finish
Give ADF access to Data Lake















2) Test ADF Linked Service
Now go to Data Factory and create a new Linked Service to your Data Lake. Make sure the Authenication Method is Managed Idenity. After that you need to select your Data Lake and hit the Test Connection button.
Create Linked Service to test connection














If you get a 'forbidden' error like below then:
  • Check whether you selected the right ADF under step 1 and the correct Storage Account under step 2.
  • Make sure you checked Managed Identity under step 1 (and not 'User, group or service principal')
  • Test the network settings for example by creating a linked service with the Account Key.
24200 - ADLS Gen2 operation failed for: Storage operation
'' on container 'testconnection' get failed with
'Operation returned an invalid status code 'Forbidden''.
Possible root causes: (1). It's possible because the
service principal or managed identity don't have enough
permission to access the data. (2). It's possible because
some IP address ranges of Azure Data Factory are not
allowed by your Azure Storage firewall settings. Azure
Data Factory IP ranges please refer https://docs.micro..
Account: 'abcdefg'. ErrorCode:
'AuthorizationPermissionMismatch'. Message: 'This request
is not authorized to perform this operation using this
permission.'. RequestId: '19e44b-601f-0029-4e30-000000'.
TimeStamp: 'Sat, 12 Feb 2022 16:52:22 GMT'.. Operation
returned an invalid status code 'Forbidden' Activity
ID: 94880d51-g593-418a-8q16-e7dab5c013f3.


































Conclusion
In this blog post you learned how to give ADF access to your Storage Account via its Managed Service Identity. This is probably the easiest and safest way to authorize ADF. You could use the same trick to for example give ADF access to your Azure SQL Database.