Showing posts with label ADF. Show all posts
Showing posts with label ADF. Show all posts

Sunday 8 May 2022

Refresh Power BI Datasets via its name instead of Id

Case
In our last post we refreshed a Power BI dataset from within an ADF or Synapse pipeline. For this we used the IDs of the PBI Workspace and Dataset, but it would be much user friendly to use the name instead of the ID.
Translate Name to Id






Solution
This is a follow up post. Make sure you first create a Pipeline that either uses a Service Principal (SP) or the Managed Identity (MSI) to authorize against Power BI. This example adds three aditional activities before the Web Activity called Call dataset refresh. If you used the Service Principal it is between Get AAD Token and Call dataset refresh


1) Change Parameters
In the previous post we used the parameters named WorkspaceId and DatasetId. Let's change those to WorkspaceName and DatasetName. The type is still a string for both parameters.
Change parameters from Id to Name








2) Get WorkspaceId
To translate the Workspace Name to a Workspace Id we will call a Rest API with a Web Activity. Make sure to use the same authorization as you used in the Call dataset refresh.
  • Add Web Activity to your existing pipeline
  • Give it the name Get WorkspaceId. It will be used later on.
  • Set the URL to the following expression
    @concat(
    'https://api.powerbi.com/v1.0/myorg/groups?$filter=name%20eq%20''',
    pipeline().parameters.WorkspaceName,'''')
    It uses the parameter from step 1 in the filter option from the Rest API. Note that the space is translated to %20
  • Set Method to GET
  • Now either add a authorization header for SP or set the Authenication for the MSI
This will return 1 workspace (if it exists).
Get Workspace Id via its Name













3) Get All Datasets
Next step is to translate the Dataset Name to and ID. This is also done via a Rest API, but unfortunately it doesn't support a filter. So we will get ALL datasets from the Workspace and then filter it afterwards. Again make sure to use the same authorization as you used in the Call dataset refresh.
  • Add Web Activity to your existing pipeline and connect it to Get WorkspaceId
  • Give it the name Get All Datasets. It will be used later on.
  • Set the URL to the following expression
    @concat('https://api.powerbi.com/v1.0/myorg/groups/',
    activity('Get WorkspaceId').output.value[0].id,
    '/datasets')
    It will use the parameter from step to get all datasets from a specific workspace.
  • Set Method to GET
  • Now either add a authorization header for SP or set the Authenication for the MSI
This will return all datasets from the workspace from step 2 (if you have access).
Get all datasets from a workspace













4) Filter DatasetName
The last new activity is a Filter activity which we will use to filter all the datasets from step 3 to only the one we need.
  • Add Filter Activity to your existing pipeline and connect it to Get All Datasets
  • Give it the name Filter DatasetName. It will be used later on.
  • Set Items expression to
    @activity('Get All Datasets').output.value
    This is the output from step 3
  • Set Condition expression to
    @equals(item().name,pipeline().parameters.DatasetName)
    This is the second parameter from step 1
Filter all datasets with a dataset name to get only 1








5) Change expressions
Now go to all the successive (Web) activities and change the expressions where you used a parameter. Now you need the get the WorkspaceId with:
activity('Get WorkspaceId').output.value[0].id
and the DatasetId with:
activity('Filter DatasetName').output.value[0].id


Conclusion
In this follow up post you learned how to translate the Workspace Name and Dataset Name to an ID which is required for the Rest API calls. With the three extra activities you just made your pipeline a little more friendlier to use, but also a little more monkey proof. Because if you delete a dataset and publish it again, it gets a different ID and then you also need to change your pipeline parameter. In a next follow up post we will refresh only certain tables within a Power BI dataset.

thx to colleague Hans Timmerman for helping.

Saturday 30 April 2022

Refresh Power BI Datasets with ADF or Synapse (MSI)

Case
I want to refresh my Power BI Dataset as soon as my ETL process has finished. I don't want to schedule my dataset refresh x minutes/hours after my ADF(/Synapse) pipeline schedule because then I'm either refreshing too soon or waiting unnecessarily long to start the refresh. How can I minimize the time between the end of my ETL proces and the start of my dataset refresh?
Refresh Power BI Dataset












Solution
In the previous blog post we showed you how to use the Rest APIs from Power BI to refresh a dataset from within your ADF(/Synapse) pipeline. For that example we used an Azure Active Directory Service Principal (SP), but there is also an option to use the Managed (Service) Identity (MSI) from ADF(/Synapse). 

Using the MSI not only saves you creating a SP (and storing/recycling its secret), but the pipeline also uses less activities which makes it more readable. However you probably still need some help from a Domain Administrator and/or an Office 365 Administrator, because:
  • You need to create an Azure Active Directory Group and add the MSI of your ADF or Synapse to it.
  •  After that you need to use that AAD Group within the Tenant Settings of Power BI.

The basic version of the pipeline looks like this. The more complex version could for example retrieve the required workspace id and dataset id by its name via additional Rest API calls to Power BI. More on that in the conclusions.
Basic pipeline to refresh a PBI dataset












Now below all the steps to create this pipeline that can refresh your Power BI datasets. There are a lot of steps, but don't let that scare you. It seems way more difficult then it really is.

1) Finding the MSI of your ADF/Synapse workspace
For this example we will give the Managed (Service) Identity of your ADF or Synapse workspace access to Power BI, but how can you find this MSI? They can be found in the Azure Active Directory (AAD) as an EnterPrise Application. This means you can authorize it for services like Power BI just like a regular user from the AAD. The EnterPrise Application has a name (equals to the name of your ADF or Synapse workspace), an object id (a guid) and an application id (also a guid).
Find your ADF (or Synapse workspace) in the AAD



















The Name is the one you need and the Application ID is the one you can use to double check that you have the correct item from the AAD. ADF also shows the Application ID on its Properties page as Managed Identity Application ID. However Synapse only shows the Managed Identity object ID which is useless in this case.

2) Allow MSI to use Power BI APIs
For this first real step you need access to the Power BI admin portal to give the MSI of ADF(/Synapse) access to the Rest APIs from Power BI. 
  • Log in to Power BI online and click on the gear icon in the upper right corner
  • Click on Admin portal in the menu to open the portal
  • Then click on Tenant settings (if you don't see this, continue after the next image)
  • Scroll down to the Developer settings and locate 'Allow service principal to use Power BI APIs'
Either allow the entire organization to use the Rest APIs (not recommended) or specify an Azure Active Directory Group and make sure your MSI is a member of that group (this is where you probably need some help from the Domain Administrator. 
PBI Admin portal - Tenant Settings















If you couldn't find the tenant settings in the Admin Portal from Power BI then you need an Power BI administrator role. Your Office 365 administrator can either give you the role or do the steps above for you. If you are able get this role then you should probably consider using Azure AD Privileged Identity Management (PIM) to first get approval from a colleage to temporarily get this role and also use the PBI audit log to record all logins to the admin portal and record all changes in the tenant settings. 
Not a PBI admin












3) Give permissions in Power BI Workspace
Next step is to give your MSI contributor rights in the PBI Workspace that contains the dataset that you want to refresh.
  • Go to your Power BI workspace and click on Access in the upper right corner
  • Search for your Service Principal by its name (ours is called PowerBI)
  • Select the right permission (contributor) and click on the Add button
  • Verify the changed list of permissions and then click on the Close button
Give MSI permissions in your workspace
















4) Refresh Dataset
Now go to your pipeline in ADF (or Synapse). For refreshing a dataset we need both the workspace and the dataset ID from Power BI. We will store these IDs in two Pipeline parameter so that you can use the pipeline to refresh different datasets. In a later post we will show how to use the names instead of the IDs.
Pipeline parameters for Workspace ID and Dataset ID















The refresh command will be given via a Web activity that executes the Refresh Dataset In Group Rest API from Power BI. The refresh call is asynchronous, so it does not wait for the refresh to finish.
  • Add a Web Activity and give it useful name.
  • The URL uses the two IDs stored in the two pipeline parameters above. Use an expression like this to build the URL (check the parameters names):
    @concat(
        'https://api.powerbi.com/v1.0/myorg/groups/'
        , pipeline().parameters.WorkspaceGuid
        , '/datasets/'
        , pipeline().parameters.DatasetGuid
        , '/refreshes'
        )
  • Set Method to Post
  • The Body can be used to set certain options like no notifications. See the documentation for more options. Enter the following JSON in the body:
    {"notifyOption":"NoNotification"}
  • Set Authentication to System Assigned Managed Identity
  • Set Resource to https://analysis.windows.net/powerbi/api
Refresh a Power BI dataset via Rest API
























5) Check refresh status
The last steps are optionally, but if you want to be sure your dataset has been successfully refreshed we need to check the dataset refresh status in a loop. For this task the UNTIL loop is the best option. Within the loop we have a Web Activity to check the status and an Wait activity (within an If condition) to wait a couple of minutes/seconds because you don't want to check the status each second.
The check status construction











A slightly simpler option would be to just put a Wait Activity before the Web Activity instead of the If construction after the Web Activity.

Now all the individual steps and their screentshot:
  • Add an Until and give it a suitable name
  • The Expression continues the loop until the status is not equals 'Unknown' :
    @not(
        equals(
            first(
                json(
                    string(
                        activity('Get dataset refresh status').output
                    )
                ).value
            ).status,'Unknown'
        )
    )
    Note the name of the activity that will be inside the Until loop (if it is not yet there, it will give an error)
  • Consider changing the default 7 day timeout to the expected max refresh duration with an extra couple of minutes to be sure
Until loop















  • Add a Web Activity within the Until loop and name it 'Get dataset refresh status'. If you change that name, then you have to change it in all the expressions from the next steps.
  • The URL is simular to the Refresh dataset, but it uses a different action 'refreshes' and a top 1 to only get the latest status:
    @concat(
        'https://api.powerbi.com/v1.0/myorg/groups/'
        ,pipeline().parameters.WorkspaceGuid
        ,'/datasets/',pipeline().parameters.DatasetGuid
        ,'/refreshes?$top=1'
        )
  • Set Method to Get
  • Add a header with the name 'Content-Type' and the value 'application/json'
  • Set Authentication to System Assigned Managed Identity
  • Set Resource to https://analysis.windows.net/powerbi/api
Web Activity to get the refresh status
























  • Add an If Condition after the Web activity
  • The expression checks whether the Web Activity still returns 'Unknown' as status (same expression as the Until loop):
    @equals(
        first(
            json(
                string(
                    activity('Get dataset refresh status').output
                )
            ).value
        ).status,'Unknown'
    )
Check status value with IF


















  • Last step is to add a Wait Activity in the True condition.
  • Choose a smart number of seconds. Too low and you will get a lot of unnecessary checks. Too high and you will get an unnecessary long turnaround time.
Wait X seconds














6) Fail pipeline if refresh failed
If the dataset refresh in Power BI fails then the Web Activity will show you the status, but it won't fail the pipeline. The Until loop ends when the status changes. So by adding an additional If Condition that checks whether the refresh status is failed, allows you to use the Fail activity to fail the pipeline.
  • Add a If Condition after the Until loop
  • The expression is simulair to the If Condition within the loop. Just a different status:
    @equals(
        first(
            json(
                string(
                    activity('Get dataset refresh status').output
                )
            ).value
        ).status,'Failed'
    )

  • Within the True condition add the Fail activity
  • You can either enter a default fail message or try to get the error from the Web Activity with an expression like this:
    @string(
        json(
            first(
                json(
                    string(
                        activity('Get dataset refresh status').output
                    )
                ).value
            ).serviceExceptionJson
        )
    )
  • Last task is to pick an error code. This will show up in the logs.
Fail pipeline if refresh fails





















Conclusion
In this post you learned how to refresh a Power BI dataset from within your ETL pipelines. Hopefully you weren't scared of due all those steps. In a next blog post we will show you how to use the workspace name and data set name instead of their IDs because that is much more user friendly and even a little safer because when you delete a dataset and publish it again it will get a different ID. This will need three additional tasks for the translation part.
Translate names to ID's






Another extension that we will show in yet an other blog post is how to refresh only certain tables within that dataset. This will cause some changes to the Body of the refresh call, but also an additional Tenant setting change and... Power BI Premium.

thx to Simon Zeinstra for suggesting the simpler version en helping out with the resource url

Monday 21 March 2022

Refresh Power BI Datasets with ADF or Synapse (SP)

Case
I want to refresh my Power BI Dataset as soon as my ETL process has finished. I don't want to schedule my dataset refresh x minutes/hours after my ADF(/Synapse) pipeline schedule because then I'm either refreshing too soon or waiting unnecessarily long to start the refresh. How can I minimize the time between the end of my ETL proces and the start of my dataset refresh?
Refresh Power BI Dataset












Solution
You can use the Rest APIs from Power BI to refresh a dataset from within your ADF(/Synapse) pipeline. However for this solution you (the DWH / BI developer) will probably need some help from a Domain Administrator and/or an Office 365 Administrator, because:
  • You need an Azure Active Directory Service Principal (SP) and know its name, id and secret
  • This SP needs to be in an Azure Active Directory Group that will used within the Tenant Settings of Power BI.

For this solution we will store the secret (and optionally the id) of the SP in an Azure Key Vault. Then retrieve it within the ADF(/Synapse) pipeline and use it to retrieve a token from the Active Direcory. The last step is to use that token to login to Power BI and refreh the dataset.
Use secret to create token and login to Power BI


















The basic version of the pipeline looks like this. The more complex version could for example retrieve the required workspace id and dataset id by its name via additional Rest API calls to Power BI. More on that in the conclusions.
Basic version of pipeline








Now below all the steps to create this pipeline that can refresh your Power BI datasets. There are a lot of steps, but don't let that scare you. It seems way more difficult then it really is.

1) Allow service principal to use Power BI APIs
For this first step you need access to the Power BI admin portal to give the Service Principal access to the Rest APIs from Power BI. 
  • Log in to Power BI online and click on the gear icon in the upper right corner
  • Click on Admin portal in the menu to open the portal
  • Then click on Tenant settings (if you don't see this, continue after the next image)
  • Scroll down to the Developer settings and locate 'Allow service principal to use Power BI APIs'
Either allow the entire organization to use the Rest APIs (not recommended) or specify an Azure Active Directory Group and make sure your Service Principal is a member of that group (this is where you probably need some help from the Domain Administrator. 
PBI Admin portal - Tenant Settings















If you couldn't find the tenant settings in the Admin Portal from Power BI then you need an Power BI administrator role. Your Office 365 administrator can either give you the role or do the steps above for you. If you are able get this role then you should probably use Azure AD Privileged Identity Management (PIM) to first get approval from a colleage to temporarily get this role and also use the PBI audit log to record all logins to the admin portal and record all changes in the tenant settings. 
Not a PBI admin













2) Give permissions in Power BI Workspace
Next step is to give your Service Principal contributor rights in the PBI Workspace that contains the dataset that you want to refresh.
  • Go to your Power BI workspace and click on Access in the upper right corner
  • Search for your Service Principal by its name (ours is called PowerBI)
  • Select the right permission (contributor) and click on the Add button
  • Verify the changed list of permissions and then click on the Close button
Give SP permissions in your workspace

















3) Add secret to Azure Key Vault
To create the AAD Token in the next step we need:
  • the ID (a guid) of the SP
  • the secret of the SP
  • the AAD tenant ID
You could store them in parameters or hardcode them in your pipeline, but we will store them as secrets in an Azure Key Vault. If you already used an Azure Key Vault in ADF(/Synapse) before then you know you have to give the Managed Service Identity of your ADF(/Synapse) 'Get' and 'List' access to read the secrets (see step 3 in this blog post).

The ID and secret of the SP are provided by the person who created the SP in the Azure Active Directory. The tenant ID can also be found in AAD, but you can also retrieve it from the about box in Power BI online. Store these three items as a secret in the Key Vault.
The guid in the Tenant URL is the Tenant ID













To retrieve the AAD token in the next step we will use a Web Activity. Unfortunately the Web activity doesn't work with the Key Vault Linked Service. Therefor we need to use Web Activities to retrieve all secrets from the Key Vault. In a previous blog post we showed you how to do this in detail. Below a summary of all those steps (which you must be repeated for all three secrets).
  • Add a Web Activity and give it useful name, because it will be used in the next activity.
  • Use the Secure output checkbox to prevent secrets showing up in the log
  • Set the URL to https://{Name Keyvault}.vault.azure.net/secrets/{Name Secret}?api-version=7.0 and replace the values between {} with the real value. Tip: create a Global Parameter for the Key Vault name and use that to build the URL with an expression like this:
    @concat(
        'https://'
        , pipeline().parameters.KeyVaultName
        , '.vault.azure.net/secrets/sp_secret/?api-version=7.1'
        )
  • Set Method to Get
  • Under Advanced select Managed Identity as Authentication
  • And set Resource to https://vault.azure.net
Web Activity to retrieve secrets from the Key Vault



















4) Retrieve AAD token
Next step is to use yet another Web Activity to retrieve an AAD token by providing the output from the previous three Web Activities.
  • Add a Web Activity and give it useful name, because its name will be used in the next activity.
  • Use the Secure input and output checkbox to prevent secrets or tokens showing up in the log
  • The URL uses the Tenant ID from the previous step. Use an expression like this to build the URL (check the activity name):
    @concat(
        'https://login.microsoftonline.com/'
        , activity('Get TenantId from AKV').output.value
        , '/oauth2/v2.0/token'
        )
  • Set Method to Post
  • Add a header with the name 'Content-Type' and the value 'application/x-www-form-urlencoded'
  • The body is also an expression that uses the secrets of the previous step (check the names):
    @concat(
        'client_id='
        , activity('Get ClientId from AKV').output.value
        , '&scope=https://analysis.windows.net/powerbi/api/.default'
        , '&client_secret='
        , activity('Get Secret from AKV').output.value
        , &grant_type=client_credentials'
        )
Get AAD Token

















5) Refresh Dataset
For refreshing a dataset we need the Token from the previous step, but also the ID of both the workspace and the dataset. We will store these IDs in two Pipeline parameter so that you can use the pipeline to refresh different datasets. In a later post we will show how to use the names instead of the IDs.
Pipeline parameters for Workspace ID and Dataset ID
















The refresh command is again an other Web activity that executes the Refresh Dataset In Group Rest API from Power BI. The refresh call is asynchronous, so it does not wait for the refresh to finish.
  • Add a Web Activity and give it useful name.
  • Use the Secure input checkbox to prevent tokens showing up in the log
  • The URL uses the two IDs stored in the two pipeline parameters above. Use an expression like this to build the URL (check the parameters names):
    @concat(
        'https://api.powerbi.com/v1.0/myorg/groups/'
        , pipeline().parameters.WorkspaceGuid
        , '/datasets/'
        , pipeline().parameters.DatasetGuid
        , '/refreshes'
        )
  • Set Method to Post
  • Add a header with the name 'Authorization' and the value is an expression with the Token of the previous step:
    @concat(
        string(activity('Get AAD Token').output.token_type)
        , ' '
        , string(activity('Get AAD Token').output.access_token)
        )
  • The Body can be used to set certain options like no notifications. See the documentation for more options. Enter the following JSON in the body:
    {"notifyOption":"NoNotification"}
Refresh a Power BI dataset via Rest API


















6) Check refresh status
The last steps are optionally, but if you want to be sure your dataset has been successfully refreshed we need to check the dataset refresh status in a loop. For this task the UNTIL loop is the best option. Within the loop we have a Web Activity to check the status and an Wait activity (within an If condition) to wait a couple of minutes/seconds because you don't want to check the status each second.
The check status construction














A slightly simpler option would be to just put a Wait Activity before the Web Activity instead of the If construction after the Web Activity.

Note: If you expect that the refresh takes more than an hour then you have some additional work to do, because the lifetime of the token is 60 to 90 minutes. In that case you need to copy the Get AAD Token to the loop and perhaps add an extra If Condition to only get a new token if it is about to expire.

Now all the individual steps and their screentshot:
  • Add an Until and give it a suitable name
  • The Expression continues the loop until the status is not equals 'Unknown' :
    @not(
        equals(
            first(
                json(
                    string(
                        activity('Get dataset refresh status').output
                    )
                ).value
            ).status,'Unknown'
        )
    )
    Note the name of the activity that will be inside the Until loop (if it is not yet there, it will give an error)
  • Consider changing the default 7 day timeout to the expected max refresh duration with an extra couple of minutes to be sure
Until loop











  • Add a Web Activity within the Until loop and name it 'Get dataset refresh status'. If you change that name, then you have to change it in all the expressions from steps 6 en 7.
  • Use the Secure input checkbox to prevent tokens showing up in the log
  • The URL is simular to the Refresh dataset, but it uses a different action 'refreshes' and a top 1 to only get the latest status:
    @concat(
        'https://api.powerbi.com/v1.0/myorg/groups/'
        ,pipeline().parameters.WorkspaceGuid
        ,'/datasets/',pipeline().parameters.DatasetGuid
        ,'/refreshes?$top=1'
        )
  • Set Method to Get
  • Add a first header with the name 'Authorization' and the value is an expression with the Token of the previous step:
    @concat(
        string(activity('Get AAD Token').output.token_type)
        , ' '
        , string(activity('Get AAD Token').output.access_token)
        )
  • Add a second header with the name 'Content-Type' and the value 'application/json'
Web Activity to get refresh status























  • Add an If Condition after the Web activity
  • The expression checks whether the Web Activity still returns 'Unknown' as status (same expression as the Until loop):
    @equals(
        first(
            json(
                string(
                    activity('Get dataset refresh status').output
                )
            ).value
        ).status,'Unknown'
    )


















  • Last step is to add a Wait Activity in the True condition.
  • Choose a smart number of seconds. Too low and you will get a lot of unnecessary checks. Too high and you will get an unnecessary long turnaround time.
Wait X seconds











7) Fail pipeline if refresh failed
If the dataset refresh in Power BI fails then the Web Activity will show you the status, but it won't fail the pipeline. The Until loop ends when the status changes. So by adding an additional If Condition that checks whether the refresh status is failed, allows you to use the Fail activity to fail the pipeline.
  • Add a If Condition after the Until loop
  • The expression is simulair to the If Condition within the loop. Just a different status:
    @equals(
        first(
            json(
                string(
                    activity('Get dataset refresh status').output
                )
            ).value
        ).status,'Failed'
    )

  • Within the True condition add the Fail activity
  • You can either enter a default fail message or try to get the error from the Web Activity with an expression like this:
    @string(
        json(
            first(
                json(
                    string(
                        activity('Get dataset refresh status').output
                    )
                ).value
            ).serviceExceptionJson
        )
    )
  • Last task is to pick an error code. This will show up in the logs.
Fail pipeline if refresh fails


















Conclusion
In this long post you learned how to refresh a Power BI dataset from within your ETL pipelines. Hopefully you weren't scared of due the number of steps. In a next blog post we will show you how to use the workspace name and data set name instead of their IDs because that is much more user friendly and even a little safer because when you delete a dataset and publish it again it will get a different ID. This will need three additional tasks for the translation part.
Translate names to ID's






Another extension that we will show in yet an other blog post is how to refresh only certain tables within that dataset. This will cause some changes to the Body of the refresh call, but also an additional Tenant setting change and... Power BI Premium.

thx to colleague Sam Hughes for helping. 

Update: we have also posted a version without using a Service Principal. Instead it uses the MSI of ADF or Synapse.





Monday 14 March 2022

ADF Snack - Script Activity

Case
I need to execute a couple of queries in my pipeline, but I don't want to create a Stored Procedure for each query and then use the Stored Procedure Activity in my pipeline. What is the best solution to execute a single query in my Pipeline?
Script Activity for SQL queries













Solution
Untill recently your only option was to use the Stored Procedure activity to execute sp_executesql with your query in the parameter "stmt" of this system stored procedure. Not that bad for one simple query, but annoying for more complex queries or a whole bunch of queries. Especially when you want dynamic queries with values from your pipeline.
Stored Procedure Activity to execute single query












Luckily Microsoft just introduced the Script Activity for both ADF and Synapse pipelines. This allows you to execute SQL scripts for SQL Server, Oracle and Snowflake databases.
Allowed Linked Services for Script Activity






















1) Script type
After selecting the Linked Service you need to specify the Script type: Query or NonQuery. The first option is a SELECT query that returns one or more records. The second option is for your other query types like INSERT, UPDATE or DELETE. 
Script Activity to execute a single query

















This means you could also use this new Script Activity with a SELECT query to replace the Lookup Activity if your lookup source is a database. The Items option in the ForEach Activity is a slightly different: @activity('MyTables').output.resultSets[0].Rows
output.resultSets[0].Rows instead of output.Value





















2) Query
For the actual query you can enter a query in the textbox, but you could also use the familiar Dynamic Content option. However the Script Activity has an other option called parameters. For SQL Server you need to use @myParameterName in your query. For Oracle and Snowflake you need to use ? instead.
Using parameters in your query
















You can even add multiple queries separated by ; in that textbox, but you could also add multiple query textboxes which each their own parameters. You can add additional textboxes by pressing the + button above the first textbox. This creates something similar as a notebook with multiple scripts.
Multiple queries
























3) Logging
Last option is to add extra logging. Either to 'External storage' (Azure Storage Account) or to the 'Activity output'. To log additional information you can use the PRINT statement in your queries.
Add PRINT to your queries























When using "Activity output" you can check the output to see the result of your PRINT statements.
Check the property outputLogs













Conclusion
In this post we explained how you could use the new 'Script' activity to execute queries in your pipeline. This is a great alternative for executing queries via the Stored Procedure Activity or even the Lookup Activity. 

The screenshots in the documentation also show an extra feature (which is not yet available) where you can execute queries that are stored in an Azure Storage Account file. So expect more features for this new Script Activity.


Sunday 20 February 2022

ADF Snack - Give ADF access to SQL DB via MSI

Case
I'm using Azure Data Factory to populate tables in an Azure SQL Database. I want to avoid using SQL Server authentication/Basic authentication. What is the best alternative?
Using the Managed Service Identie of ADF











Solution
Recently we showed how to give ADF access to a Data Lake via its Managed Service Identity. You can also use that to give ADF access within your database. This will save you the hassle of using accounts and passwords within ADF (or Azure Key Vault).

 
1) Enable Azure Active Directory authentication
First make sure you are using Azure AD authentication for your Azure SQL Server by adding an Azure Active Directory admin within your SQL Server. Without an AAD admin you cannot add AAD users (or Data Factories) to your database.

Go to your Azure SQL Server (not the database) and under Azure Active Directory in the left menu you can check and change this option. To try it out you just could add your own AAD account as AAD admin.
Enable Azure Active Directory authentication












2) Create user in SQL DB
Now use your favourite query tool and login to your database with an AAD account that has the db_owner role. Execute the following create-user-command on your database. The user is the name of your Azure Data Factory. 
-- Add your Azure Data Factory as external to SQL DB
CREATE USER [YourDataFactoryName] FROM EXTERNAL PROVIDER;

-- Check the permissions of your Azure Data Factory
SELECT	DISTINCT pri.principal_id
,		pri.name
,		pri.type_desc
,		pri.authentication_type_desc
,		per.state_desc
,		per.permission_name
FROM	sys.database_principals AS pri
JOIN	sys.database_permissions AS per
		ON per.grantee_principal_id = pri.principal_id
WHERE	pri.name = 'YourDataFactoryName'

-- Remove all permission of your Azure Data Factory
DROP USER [YourDataFactoryName]
This command gives your ADF connect permissions on your database. This is enough to test the connection of a Linked Service in ADF, but is not enough to actually do something with the data.
Add ADF as extermal user













3) Give data permissions
Next step is to give your ADF a database role so that it is able to read or write data depending on your needs. This can be done with the stored procedure sp_addrolemember or via the ALTER ROLE command
-- Two options to add a DB role to your ADF

-- Add db_datareader role to your ADF
EXEC sp_addrolemember N'db_datareader', N'YourDataFactoryName'

-- Add db_datawriter role to your ADF
ALTER ROLE [db_datawriter] ADD MEMBER [YourDataFactoryName]

-- Check roles of your ADF
SELECT				DP1.name AS DatabaseRoleName
,					isnull (DP2.name, 'No members') AS DatabaseUserName   
FROM				sys.database_role_members AS DRM  
RIGHT OUTER JOIN	sys.database_principals AS DP1  
					ON DRM.role_principal_id = DP1.principal_id  
LEFT OUTER JOIN		sys.database_principals AS DP2  
					ON DRM.member_principal_id = DP2.principal_id  
WHERE				DP1.type = 'R'
AND					DP2.name = 'YourDataFactoryName'

-- Two options to remove the role from your ADF
-- EXEC sp_droprolemember N'db_datareader', N'YourDataFactoryName'
-- ALTER ROLE [db_datawriter] DROP MEMBER [YourDataFactoryName]
Give ADF permission to read and/or write data
















4) Create Linked Service via MSI
Now create a new Linked Service in ADF and use Managed Identity as the Authenication type. When you do this you don't have to fill in a password. Hit the Test connection button to see if it works.
Linked Service to your database via MSI















If it's failing check the firewall/VNet/Subnet settings or check whether you executed the above queries on the correct database.
22300 - Cannot connect to SQL Database:
'bitools2.database.windows.net', Database:
'bitools2', User: ''. Check the linked
service configuration is correct, and
make sure the SQL Database firewall
allows the integration runtime to
access. A severe error occurred on the
current command. The results, if any,
should be discarded., SqlErrorNumber=0,
Class=11,State=0, Activity ID: abcdfe.



























Conslusion
In this blogpost you learned how easy it is to give ADF persmissions within your database without using accounts and passwords. Note that those permissions are totally different than when you want to scale your database. The same method with the Managed Service Identity can be used to give ADF access to your Azure Storage Account or Azure Key Vault.