Monday 21 March 2022

Refresh Power BI Datasets with ADF or Synapse (SP)

Case
I want to refresh my Power BI Dataset as soon as my ETL process has finished. I don't want to schedule my dataset refresh x minutes/hours after my ADF(/Synapse) pipeline schedule because then I'm either refreshing too soon or waiting unnecessarily long to start the refresh. How can I minimize the time between the end of my ETL proces and the start of my dataset refresh?
Refresh Power BI Dataset












Solution
You can use the Rest APIs from Power BI to refresh a dataset from within your ADF(/Synapse) pipeline. However for this solution you (the DWH / BI developer) will probably need some help from a Domain Administrator and/or an Office 365 Administrator, because:
  • You need an Azure Active Directory Service Principal (SP) and know its name, id and secret
  • This SP needs to be in an Azure Active Directory Group that will used within the Tenant Settings of Power BI.

For this solution we will store the secret (and optionally the id) of the SP in an Azure Key Vault. Then retrieve it within the ADF(/Synapse) pipeline and use it to retrieve a token from the Active Direcory. The last step is to use that token to login to Power BI and refreh the dataset.
Use secret to create token and login to Power BI


















The basic version of the pipeline looks like this. The more complex version could for example retrieve the required workspace id and dataset id by its name via additional Rest API calls to Power BI. More on that in the conclusions.
Basic version of pipeline








Now below all the steps to create this pipeline that can refresh your Power BI datasets. There are a lot of steps, but don't let that scare you. It seems way more difficult then it really is.

1) Allow service principal to use Power BI APIs
For this first step you need access to the Power BI admin portal to give the Service Principal access to the Rest APIs from Power BI. 
  • Log in to Power BI online and click on the gear icon in the upper right corner
  • Click on Admin portal in the menu to open the portal
  • Then click on Tenant settings (if you don't see this, continue after the next image)
  • Scroll down to the Developer settings and locate 'Allow service principal to use Power BI APIs'
Either allow the entire organization to use the Rest APIs (not recommended) or specify an Azure Active Directory Group and make sure your Service Principal is a member of that group (this is where you probably need some help from the Domain Administrator. 
PBI Admin portal - Tenant Settings















If you couldn't find the tenant settings in the Admin Portal from Power BI then you need an Power BI administrator role. Your Office 365 administrator can either give you the role or do the steps above for you. If you are able get this role then you should probably use Azure AD Privileged Identity Management (PIM) to first get approval from a colleage to temporarily get this role and also use the PBI audit log to record all logins to the admin portal and record all changes in the tenant settings. 
Not a PBI admin













2) Give permissions in Power BI Workspace
Next step is to give your Service Principal contributor rights in the PBI Workspace that contains the dataset that you want to refresh.
  • Go to your Power BI workspace and click on Access in the upper right corner
  • Search for your Service Principal by its name (ours is called PowerBI)
  • Select the right permission (contributor) and click on the Add button
  • Verify the changed list of permissions and then click on the Close button
Give SP permissions in your workspace

















3) Add secret to Azure Key Vault
To create the AAD Token in the next step we need:
  • the ID (a guid) of the SP
  • the secret of the SP
  • the AAD tenant ID
You could store them in parameters or hardcode them in your pipeline, but we will store them as secrets in an Azure Key Vault. If you already used an Azure Key Vault in ADF(/Synapse) before then you know you have to give the Managed Service Identity of your ADF(/Synapse) 'Get' and 'List' access to read the secrets (see step 3 in this blog post).

The ID and secret of the SP are provided by the person who created the SP in the Azure Active Directory. The tenant ID can also be found in AAD, but you can also retrieve it from the about box in Power BI online. Store these three items as a secret in the Key Vault.
The guid in the Tenant URL is the Tenant ID













To retrieve the AAD token in the next step we will use a Web Activity. Unfortunately the Web activity doesn't work with the Key Vault Linked Service. Therefor we need to use Web Activities to retrieve all secrets from the Key Vault. In a previous blog post we showed you how to do this in detail. Below a summary of all those steps (which you must be repeated for all three secrets).
  • Add a Web Activity and give it useful name, because it will be used in the next activity.
  • Use the Secure output checkbox to prevent secrets showing up in the log
  • Set the URL to https://{Name Keyvault}.vault.azure.net/secrets/{Name Secret}?api-version=7.0 and replace the values between {} with the real value. Tip: create a Global Parameter for the Key Vault name and use that to build the URL with an expression like this:
    @concat(
        'https://'
        , pipeline().parameters.KeyVaultName
        , '.vault.azure.net/secrets/sp_secret/?api-version=7.1'
        )
  • Set Method to Get
  • Under Advanced select Managed Identity as Authentication
  • And set Resource to https://vault.azure.net
Web Activity to retrieve secrets from the Key Vault



















4) Retrieve AAD token
Next step is to use yet another Web Activity to retrieve an AAD token by providing the output from the previous three Web Activities.
  • Add a Web Activity and give it useful name, because its name will be used in the next activity.
  • Use the Secure input and output checkbox to prevent secrets or tokens showing up in the log
  • The URL uses the Tenant ID from the previous step. Use an expression like this to build the URL (check the activity name):
    @concat(
        'https://login.microsoftonline.com/'
        , activity('Get TenantId from AKV').output.value
        , '/oauth2/v2.0/token'
        )
  • Set Method to Post
  • Add a header with the name 'Content-Type' and the value 'application/x-www-form-urlencoded'
  • The body is also an expression that uses the secrets of the previous step (check the names):
    @concat(
        'client_id='
        , activity('Get ClientId from AKV').output.value
        , '&scope=https://analysis.windows.net/powerbi/api/.default'
        , '&client_secret='
        , activity('Get Secret from AKV').output.value
        , &grant_type=client_credentials'
        )
Get AAD Token

















5) Refresh Dataset
For refreshing a dataset we need the Token from the previous step, but also the ID of both the workspace and the dataset. We will store these IDs in two Pipeline parameter so that you can use the pipeline to refresh different datasets. In a later post we will show how to use the names instead of the IDs.
Pipeline parameters for Workspace ID and Dataset ID
















The refresh command is again an other Web activity that executes the Refresh Dataset In Group Rest API from Power BI. The refresh call is asynchronous, so it does not wait for the refresh to finish.
  • Add a Web Activity and give it useful name.
  • Use the Secure input checkbox to prevent tokens showing up in the log
  • The URL uses the two IDs stored in the two pipeline parameters above. Use an expression like this to build the URL (check the parameters names):
    @concat(
        'https://api.powerbi.com/v1.0/myorg/groups/'
        , pipeline().parameters.WorkspaceGuid
        , '/datasets/'
        , pipeline().parameters.DatasetGuid
        , '/refreshes'
        )
  • Set Method to Post
  • Add a header with the name 'Authorization' and the value is an expression with the Token of the previous step:
    @concat(
        string(activity('Get AAD Token').output.token_type)
        , ' '
        , string(activity('Get AAD Token').output.access_token)
        )
  • The Body can be used to set certain options like no notifications. See the documentation for more options. Enter the following JSON in the body:
    {"notifyOption":"NoNotification"}
Refresh a Power BI dataset via Rest API


















6) Check refresh status
The last steps are optionally, but if you want to be sure your dataset has been successfully refreshed we need to check the dataset refresh status in a loop. For this task the UNTIL loop is the best option. Within the loop we have a Web Activity to check the status and an Wait activity (within an If condition) to wait a couple of minutes/seconds because you don't want to check the status each second.
The check status construction














A slightly simpler option would be to just put a Wait Activity before the Web Activity instead of the If construction after the Web Activity.

Note: If you expect that the refresh takes more than an hour then you have some additional work to do, because the lifetime of the token is 60 to 90 minutes. In that case you need to copy the Get AAD Token to the loop and perhaps add an extra If Condition to only get a new token if it is about to expire.

Now all the individual steps and their screentshot:
  • Add an Until and give it a suitable name
  • The Expression continues the loop until the status is not equals 'Unknown' :
    @not(
        equals(
            first(
                json(
                    string(
                        activity('Get dataset refresh status').output
                    )
                ).value
            ).status,'Unknown'
        )
    )
    Note the name of the activity that will be inside the Until loop (if it is not yet there, it will give an error)
  • Consider changing the default 7 day timeout to the expected max refresh duration with an extra couple of minutes to be sure
Until loop











  • Add a Web Activity within the Until loop and name it 'Get dataset refresh status'. If you change that name, then you have to change it in all the expressions from steps 6 en 7.
  • Use the Secure input checkbox to prevent tokens showing up in the log
  • The URL is simular to the Refresh dataset, but it uses a different action 'refreshes' and a top 1 to only get the latest status:
    @concat(
        'https://api.powerbi.com/v1.0/myorg/groups/'
        ,pipeline().parameters.WorkspaceGuid
        ,'/datasets/',pipeline().parameters.DatasetGuid
        ,'/refreshes?$top=1'
        )
  • Set Method to Get
  • Add a first header with the name 'Authorization' and the value is an expression with the Token of the previous step:
    @concat(
        string(activity('Get AAD Token').output.token_type)
        , ' '
        , string(activity('Get AAD Token').output.access_token)
        )
  • Add a second header with the name 'Content-Type' and the value 'application/json'
Web Activity to get refresh status























  • Add an If Condition after the Web activity
  • The expression checks whether the Web Activity still returns 'Unknown' as status (same expression as the Until loop):
    @equals(
        first(
            json(
                string(
                    activity('Get dataset refresh status').output
                )
            ).value
        ).status,'Unknown'
    )


















  • Last step is to add a Wait Activity in the True condition.
  • Choose a smart number of seconds. Too low and you will get a lot of unnecessary checks. Too high and you will get an unnecessary long turnaround time.
Wait X seconds











7) Fail pipeline if refresh failed
If the dataset refresh in Power BI fails then the Web Activity will show you the status, but it won't fail the pipeline. The Until loop ends when the status changes. So by adding an additional If Condition that checks whether the refresh status is failed, allows you to use the Fail activity to fail the pipeline.
  • Add a If Condition after the Until loop
  • The expression is simulair to the If Condition within the loop. Just a different status:
    @equals(
        first(
            json(
                string(
                    activity('Get dataset refresh status').output
                )
            ).value
        ).status,'Failed'
    )

  • Within the True condition add the Fail activity
  • You can either enter a default fail message or try to get the error from the Web Activity with an expression like this:
    @string(
        json(
            first(
                json(
                    string(
                        activity('Get dataset refresh status').output
                    )
                ).value
            ).serviceExceptionJson
        )
    )
  • Last task is to pick an error code. This will show up in the logs.
Fail pipeline if refresh fails


















Conclusion
In this long post you learned how to refresh a Power BI dataset from within your ETL pipelines. Hopefully you weren't scared of due the number of steps. In a next blog post we will show you how to use the workspace name and data set name instead of their IDs because that is much more user friendly and even a little safer because when you delete a dataset and publish it again it will get a different ID. This will need three additional tasks for the translation part.
Translate names to ID's






Another extension that we will show in yet an other blog post is how to refresh only certain tables within that dataset. This will cause some changes to the Body of the refresh call, but also an additional Tenant setting change and... Power BI Premium.

thx to colleague Sam Hughes for helping. 

Update: we have also posted a version without using a Service Principal. Instead it uses the MSI of ADF or Synapse.





Monday 14 March 2022

ADF Snack - Script Activity

Case
I need to execute a couple of queries in my pipeline, but I don't want to create a Stored Procedure for each query and then use the Stored Procedure Activity in my pipeline. What is the best solution to execute a single query in my Pipeline?
Script Activity for SQL queries













Solution
Untill recently your only option was to use the Stored Procedure activity to execute sp_executesql with your query in the parameter "stmt" of this system stored procedure. Not that bad for one simple query, but annoying for more complex queries or a whole bunch of queries. Especially when you want dynamic queries with values from your pipeline.
Stored Procedure Activity to execute single query












Luckily Microsoft just introduced the Script Activity for both ADF and Synapse pipelines. This allows you to execute SQL scripts for SQL Server, Oracle and Snowflake databases.
Allowed Linked Services for Script Activity






















1) Script type
After selecting the Linked Service you need to specify the Script type: Query or NonQuery. The first option is a SELECT query that returns one or more records. The second option is for your other query types like INSERT, UPDATE or DELETE. 
Script Activity to execute a single query

















This means you could also use this new Script Activity with a SELECT query to replace the Lookup Activity if your lookup source is a database. The Items option in the ForEach Activity is a slightly different: @activity('MyTables').output.resultSets[0].Rows
output.resultSets[0].Rows instead of output.Value





















2) Query
For the actual query you can enter a query in the textbox, but you could also use the familiar Dynamic Content option. However the Script Activity has an other option called parameters. For SQL Server you need to use @myParameterName in your query. For Oracle and Snowflake you need to use ? instead.
Using parameters in your query
















You can even add multiple queries separated by ; in that textbox, but you could also add multiple query textboxes which each their own parameters. You can add additional textboxes by pressing the + button above the first textbox. This creates something similar as a notebook with multiple scripts.
Multiple queries
























3) Logging
Last option is to add extra logging. Either to 'External storage' (Azure Storage Account) or to the 'Activity output'. To log additional information you can use the PRINT statement in your queries.
Add PRINT to your queries























When using "Activity output" you can check the output to see the result of your PRINT statements.
Check the property outputLogs













Conclusion
In this post we explained how you could use the new 'Script' activity to execute queries in your pipeline. This is a great alternative for executing queries via the Stored Procedure Activity or even the Lookup Activity. 

The screenshots in the documentation also show an extra feature (which is not yet available) where you can execute queries that are stored in an Azure Storage Account file. So expect more features for this new Script Activity.