Showing posts with label RESTAPI. Show all posts
Showing posts with label RESTAPI. Show all posts

Tuesday, 13 September 2022

Sending test messages to Azure Event Hubs

Case
I want to send dummy messages to Azure Event Hubs to test the streaming process before the actual service starts sending real messages.  Is there an easy way to send a bulk load of test messages to Azure Event Hubs?
Sending messages to Azure Event Hubs











Solution
Yes we could use for example a little bit of PowerShell code to send dummy messages to Azure Event Hubs via a Rest API. To do that we first need to collect some names and a key from Azure Event Hubs.

1)  Namespace and Event Hub name
Go to your Event Hubs Namespace in the Azure Portal and click on Event Hubs on the left side. In the top left corner you will find the Event Hubs Namespace (1) and in the list in the center of the page you will find the name of your Event Hub (2). Copy these names to your Powershell editor.
Namespace and Event Hub name


















2) Shared access policies Name and Key
Now click on your Event Hub in the list above and then click on Shared access policies in the left menu. If there is no policy then create one (send is enough for testing). Then click on the policy to reveal the keys. Copy the Name (3) and one of the keys (4) to your Powershell editor.
Shared access policies name and key








3) The script
Now you have all the things you need from your event hub. Time to do some PowerShell coding. The top part of the script is to create a Shared Access Signature token (SAS token). This token is needed for authorization in the Rest API. In this part of the script you will also need to specify the names and key from the previous two steps under EventHubs Parameters.

The second part is sending a messsage via a Rest API to your event hub. To make it a little more usefull there is a loop to send multiple messages with a pause between each message. You must adjust the dummy message to your own needs by changing the column names and values. You can also specify the number of messages and the pause between each message.
####################################################################
# Create SAS TOKEN FOR AZURE EVENT HUBS
####################################################################
# EventHubs Parameters
$EventHubsNamespace = "bitools"
$EventHubsName = "myeventhub"
$SharedAccessPolicyName = "SendOnly"
$SharedAccessPolicyPrimaryKey = "1fhvzfOkVs+MxsZ/fakeZwrHTImD3YCCN7CGqYCAFN8kU="

# Create SAS Token
[Reflection.Assembly]::LoadWithPartialName("System.Web")| out-null
$URI = "$($EventHubsNamespace).servicebus.windows.net/$($EventHubsName)"
$Expires = ([DateTimeOffset]::Now.ToUnixTimeSeconds())+3600
$SignatureString = [System.Web.HttpUtility]::UrlEncode($URI)+ "`n" + [string]$Expires
$HMACSHA256 = New-Object System.Security.Cryptography.HMACSHA256
$HMACSHA256.key = [Text.Encoding]::ASCII.GetBytes($SharedAccessPolicyPrimaryKey)
$SignatureBytes = $HMACSHA256.ComputeHash([Text.Encoding]::ASCII.GetBytes($SignatureString))
$SignatureBase64 = [Convert]::ToBase64String($SignatureBytes)
$SASToken = "SharedAccessSignature sr=" + [System.Web.HttpUtility]::UrlEncode($URI) + "&sig=" + [System.Web.HttpUtility]::UrlEncode($SignatureBase64) + "&se=" + $Expires + "&skn=" + $SharedAccessPolicyName


####################################################################
# SEND DUMMY MESSAGES
####################################################################
# Message Parameters
$StartNumber = 1
$NumberOfMessages = 10
$MillisecondsToWait = 1000

# Determine URL and header
$RestAPI = "https://$($EventHubsNamespace).servicebus.windows.net/$($EventHubsName)/messages"

# API headers
$Headers = @{
            "Authorization"=$SASToken;
            "Content-Type"="application/atom+xml;type=entry;charset=utf-8";
            }

# Screenfeedback
Write-Host "Sending $($NumberOfMessages) messages to event hub [$($EventHubsName)] within [$($EventHubsNamespace)]"

# Loop to create X number of dummy messages
for($i = $StartNumber; $i -lt $NumberOfMessages+$StartNumber; $i++)
{
    # Create dummy message to sent to Azure Event Hubs
    $Body = "{'CallId':$($i), 'DurationInSeconds':$(Get-Random -Maximum 1000)}"

    # Screenfeedback
    Write-Host "Sending message nr $($i) and then waiting $($MillisecondsToWait) milliseconds"

    # execute the Azure REST API
    Invoke-RestMethod -Uri $RestAPI -Method "POST" -Headers $Headers -Body $Body

    # Wait a couple of milliseconds before sending next dummy message
    Start-Sleep -Milliseconds $MillisecondsToWait
}
When you run the PowerShell script with these parameters then 10 messages will be sent to your Event Hub.

Executing the Powershell script

















4) Check messages in the Event Hub
Now we can check the messages in your event hub. Go to your eventhub (myeventhub in our case) and click on Process data. Then find the Stream Analytics Query editor and execute the query.
Stream Analytics Query editor























Here can see the contents of your 10 dummy messages with a very basic query.
The result












Conclusion
In this post you learned how to test the setup of your Event Hub and if you for example also connect to Azure Stream Analytics and a Power BI streaming dataset with a report and dashboard then you can also see the messages arriving live in your Power BI Dashboard. This will be shown in a separate post.

Note that the script is sending the messages one by one with an even period between each message. You could for example also make the pause period random or even execute the script in multiple PowerShell ISE editors at once to simulate a more random load of arriving messages.

Do you have an easier way to send test messages or a more sophisticated script then please share your knowledge in the comments below.

Wednesday, 8 June 2022

Break or stop ForEach loop in ADF and Synapse

Case
I have a pipeline with a parallel Foreach loop to execute for example multiple Stored Procedures, but if one of those fails I want to stop the Foreach executing more Stored Procedures. Can I break the Foreach loop in case of an error without turning it from parallel into sequential? 
Can you stop a ForEach Loop? Nope!















Solution
You can't stop the foreach loop itself on error, but if an iteration fails then you can cancel the entire pipeline that is running the foreach loop. When the pipeline is running it gets a RUN ID and you can use that to call a Rest API to cancel the current pipeline run if one of the Stored Procedures inside the ForEach fails.
Cancel the entire pipeline














1) Give ADF/Synapse access to its own Rest APIs
To use the Rest APIs of ADF or Synapse within a pipeline you first need to give ADF or Synapse access to its own Rest APIs. This can be done in the Azure Portal on the overview page of the service.
  • Open the Azure Portal in your browser and go to the overview page of your Data Factory or Synapse Workspace.
  • In the left menu click on Access control (IAM)
  • Click on +Add and then choose Add role assignment
  • Select the role Contributor or for ADF Data Factory Contributor and click on Next
  • Under Assign access to select Manged identity
  • Under Members click on +Select members
  • Now you need to select Synapse workspace or Data Factory (V2) and search for your ADF or Synapse
  • Click on your Synapse or ADF and click on the Select button
  • Optionally enter a description: "Give ADF/Synapse access to its own Rest APIs"
  • Click on the Review + assign button (twice) 
The animated GIF is from a Synapse workspace, but it is identical to Data Factory. Just select Data Factory as Managed identity type.
Giving Synapse access to its own Rest APIs
















2) Add Web Activity in Foreach on fail
In the ForEach loop construction we need to add a Web Activity to call the Rest API that cancels the Pipeline run. Data Factory and Synapse have a different URL to cancel the pipeline. Within the example URL you need to replace the red parts with the curly brackets by either a hard code value, a parameter or a System Variable. For Data Factory and Synapse you can use the same System Variabes to retrieve the factoryName/workspaceName and the runId: pipeline().DataFactory / pipeline().RunId

Azure Data Factory:
https://management.azure.com/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.DataFactory/factories/{factoryName}/pipelineruns/{runId}/cancel?api-version=2018-06-01

@concat('https://management.azure.com/subscriptions/',
pipeline().parameters.subscriptionId
'/resourceGroups/',
pipeline().parameters.resourceGroup,
'/providers/Microsoft.DataFactory/factories/',
pipeline().DataFactory,
'/pipelineruns/',
pipeline().RunId,
'/cancel?api-version=2018-06-01')

Azure Synapse Workspace
{endpoint}/pipelineruns/{runId}/cancel?api-version=2020-12-01

or a little easier:
https://{workspaceName}.dev.azuresynapse.net/pipelineruns/{runId}/cancel?api-version=2020-12-01

@concat('https://',
pipeline().DataFactory,
'.dev.azuresynapse.net/pipelineruns/',
pipeline().RunId,
'/cancel?api-version=2020-12-01')

  • In the ForEach add a Web Activity and connect it to your existing activity
  • Make sure to change the type of the line from Success to Failure
  • Give the Web Activty a useful name and go to the Settings tab
  • For URL use and expression like above (don't forget to add parameters if you use them)
  • Set the Method to POST
  • Enter a fake/dummy JSON message in the body. We don't need it, but it is required
  • Set the Authentication to System Assigned Managed Identity
  • And last set the Resource to https://management.azure.com/
Web Activity to Cancel the pipeline run





















3) Testing
For this example we used 10 Stored Procedures of which one of them fails on purpose. First make sure to publish the pipeline and then trigger the pipeline (don't use Debug). Then check the monitor to see the result.
One Stored Procedure Failed, rest is cancelled




















Conclusion
In this post you learned how to stop a running pipeline when one of the activities within a foreach loop fails so that is doesn't unnecessary executes new activities. One of the downsides is that in the logs the status of your pipeline will be 'Cancelled' instead of 'Failed', but it saves you running (and paying for) unnecessary activities.

Another downside is that it does not work in debug mode because then you don't get a RUN ID and there is no pipeline run to cancel. The workaround for that is to put the Web Activity that cancels your pipeline in an IF construction that only cancels if the RUN ID is not 0.

The usefulness of this construction get better if you have a lot of iterations and hopefully it's not the last iteration that failed. Let us know in the comments what you think of a construction like this and if you have an alternative solutions.

Sunday, 8 May 2022

Refresh Power BI Datasets via its name instead of Id

Case
In our last post we refreshed a Power BI dataset from within an ADF or Synapse pipeline. For this we used the IDs of the PBI Workspace and Dataset, but it would be much user friendly to use the name instead of the ID.
Translate Name to Id






Solution
This is a follow up post. Make sure you first create a Pipeline that either uses a Service Principal (SP) or the Managed Identity (MSI) to authorize against Power BI. This example adds three aditional activities before the Web Activity called Call dataset refresh. If you used the Service Principal it is between Get AAD Token and Call dataset refresh


1) Change Parameters
In the previous post we used the parameters named WorkspaceId and DatasetId. Let's change those to WorkspaceName and DatasetName. The type is still a string for both parameters.
Change parameters from Id to Name








2) Get WorkspaceId
To translate the Workspace Name to a Workspace Id we will call a Rest API with a Web Activity. Make sure to use the same authorization as you used in the Call dataset refresh.
  • Add Web Activity to your existing pipeline
  • Give it the name Get WorkspaceId. It will be used later on.
  • Set the URL to the following expression
    @concat(
    'https://api.powerbi.com/v1.0/myorg/groups?$filter=name%20eq%20''',
    pipeline().parameters.WorkspaceName,'''')
    It uses the parameter from step 1 in the filter option from the Rest API. Note that the space is translated to %20
  • Set Method to GET
  • Now either add a authorization header for SP or set the Authenication for the MSI
This will return 1 workspace (if it exists).
Get Workspace Id via its Name













3) Get All Datasets
Next step is to translate the Dataset Name to and ID. This is also done via a Rest API, but unfortunately it doesn't support a filter. So we will get ALL datasets from the Workspace and then filter it afterwards. Again make sure to use the same authorization as you used in the Call dataset refresh.
  • Add Web Activity to your existing pipeline and connect it to Get WorkspaceId
  • Give it the name Get All Datasets. It will be used later on.
  • Set the URL to the following expression
    @concat('https://api.powerbi.com/v1.0/myorg/groups/',
    activity('Get WorkspaceId').output.value[0].id,
    '/datasets')
    It will use the parameter from step to get all datasets from a specific workspace.
  • Set Method to GET
  • Now either add a authorization header for SP or set the Authenication for the MSI
This will return all datasets from the workspace from step 2 (if you have access).
Get all datasets from a workspace













4) Filter DatasetName
The last new activity is a Filter activity which we will use to filter all the datasets from step 3 to only the one we need.
  • Add Filter Activity to your existing pipeline and connect it to Get All Datasets
  • Give it the name Filter DatasetName. It will be used later on.
  • Set Items expression to
    @activity('Get All Datasets').output.value
    This is the output from step 3
  • Set Condition expression to
    @equals(item().name,pipeline().parameters.DatasetName)
    This is the second parameter from step 1
Filter all datasets with a dataset name to get only 1








5) Change expressions
Now go to all the successive (Web) activities and change the expressions where you used a parameter. Now you need the get the WorkspaceId with:
activity('Get WorkspaceId').output.value[0].id
and the DatasetId with:
activity('Filter DatasetName').output.value[0].id


Conclusion
In this follow up post you learned how to translate the Workspace Name and Dataset Name to an ID which is required for the Rest API calls. With the three extra activities you just made your pipeline a little more friendlier to use, but also a little more monkey proof. Because if you delete a dataset and publish it again, it gets a different ID and then you also need to change your pipeline parameter. In a next follow up post we will refresh only certain tables within a Power BI dataset.

thx to colleague Hans Timmerman for helping.

Saturday, 30 April 2022

Refresh Power BI Datasets with ADF or Synapse (MSI)

Case
I want to refresh my Power BI Dataset as soon as my ETL process has finished. I don't want to schedule my dataset refresh x minutes/hours after my ADF(/Synapse) pipeline schedule because then I'm either refreshing too soon or waiting unnecessarily long to start the refresh. How can I minimize the time between the end of my ETL proces and the start of my dataset refresh?
Refresh Power BI Dataset












Solution
In the previous blog post we showed you how to use the Rest APIs from Power BI to refresh a dataset from within your ADF(/Synapse) pipeline. For that example we used an Azure Active Directory Service Principal (SP), but there is also an option to use the Managed (Service) Identity (MSI) from ADF(/Synapse). 

Using the MSI not only saves you creating a SP (and storing/recycling its secret), but the pipeline also uses less activities which makes it more readable. However you probably still need some help from a Domain Administrator and/or an Office 365 Administrator, because:
  • You need to create an Azure Active Directory Group and add the MSI of your ADF or Synapse to it.
  •  After that you need to use that AAD Group within the Tenant Settings of Power BI.

The basic version of the pipeline looks like this. The more complex version could for example retrieve the required workspace id and dataset id by its name via additional Rest API calls to Power BI. More on that in the conclusions.
Basic pipeline to refresh a PBI dataset












Now below all the steps to create this pipeline that can refresh your Power BI datasets. There are a lot of steps, but don't let that scare you. It seems way more difficult then it really is.

1) Finding the MSI of your ADF/Synapse workspace
For this example we will give the Managed (Service) Identity of your ADF or Synapse workspace access to Power BI, but how can you find this MSI? They can be found in the Azure Active Directory (AAD) as an EnterPrise Application. This means you can authorize it for services like Power BI just like a regular user from the AAD. The EnterPrise Application has a name (equals to the name of your ADF or Synapse workspace), an object id (a guid) and an application id (also a guid).
Find your ADF (or Synapse workspace) in the AAD



















The Name is the one you need and the Application ID is the one you can use to double check that you have the correct item from the AAD. ADF also shows the Application ID on its Properties page as Managed Identity Application ID. However Synapse only shows the Managed Identity object ID which is useless in this case.

2) Allow MSI to use Power BI APIs
For this first real step you need access to the Power BI admin portal to give the MSI of ADF(/Synapse) access to the Rest APIs from Power BI. 
  • Log in to Power BI online and click on the gear icon in the upper right corner
  • Click on Admin portal in the menu to open the portal
  • Then click on Tenant settings (if you don't see this, continue after the next image)
  • Scroll down to the Developer settings and locate 'Allow service principal to use Power BI APIs'
Either allow the entire organization to use the Rest APIs (not recommended) or specify an Azure Active Directory Group and make sure your MSI is a member of that group (this is where you probably need some help from the Domain Administrator. 
PBI Admin portal - Tenant Settings















If you couldn't find the tenant settings in the Admin Portal from Power BI then you need an Power BI administrator role. Your Office 365 administrator can either give you the role or do the steps above for you. If you are able get this role then you should probably consider using Azure AD Privileged Identity Management (PIM) to first get approval from a colleage to temporarily get this role and also use the PBI audit log to record all logins to the admin portal and record all changes in the tenant settings. 
Not a PBI admin












3) Give permissions in Power BI Workspace
Next step is to give your MSI contributor rights in the PBI Workspace that contains the dataset that you want to refresh.
  • Go to your Power BI workspace and click on Access in the upper right corner
  • Search for your Service Principal by its name (ours is called PowerBI)
  • Select the right permission (contributor) and click on the Add button
  • Verify the changed list of permissions and then click on the Close button
Give MSI permissions in your workspace
















4) Refresh Dataset
Now go to your pipeline in ADF (or Synapse). For refreshing a dataset we need both the workspace and the dataset ID from Power BI. We will store these IDs in two Pipeline parameter so that you can use the pipeline to refresh different datasets. In a later post we will show how to use the names instead of the IDs.
Pipeline parameters for Workspace ID and Dataset ID















The refresh command will be given via a Web activity that executes the Refresh Dataset In Group Rest API from Power BI. The refresh call is asynchronous, so it does not wait for the refresh to finish.
  • Add a Web Activity and give it useful name.
  • The URL uses the two IDs stored in the two pipeline parameters above. Use an expression like this to build the URL (check the parameters names):
    @concat(
        'https://api.powerbi.com/v1.0/myorg/groups/'
        , pipeline().parameters.WorkspaceGuid
        , '/datasets/'
        , pipeline().parameters.DatasetGuid
        , '/refreshes'
        )
  • Set Method to Post
  • The Body can be used to set certain options like no notifications. See the documentation for more options. Enter the following JSON in the body:
    {"notifyOption":"NoNotification"}
  • Set Authentication to System Assigned Managed Identity
  • Set Resource to https://analysis.windows.net/powerbi/api
Refresh a Power BI dataset via Rest API
























5) Check refresh status
The last steps are optionally, but if you want to be sure your dataset has been successfully refreshed we need to check the dataset refresh status in a loop. For this task the UNTIL loop is the best option. Within the loop we have a Web Activity to check the status and an Wait activity (within an If condition) to wait a couple of minutes/seconds because you don't want to check the status each second.
The check status construction











A slightly simpler option would be to just put a Wait Activity before the Web Activity instead of the If construction after the Web Activity.

Now all the individual steps and their screentshot:
  • Add an Until and give it a suitable name
  • The Expression continues the loop until the status is not equals 'Unknown' :
    @not(
        equals(
            first(
                json(
                    string(
                        activity('Get dataset refresh status').output
                    )
                ).value
            ).status,'Unknown'
        )
    )
    Note the name of the activity that will be inside the Until loop (if it is not yet there, it will give an error)
  • Consider changing the default 7 day timeout to the expected max refresh duration with an extra couple of minutes to be sure
Until loop















  • Add a Web Activity within the Until loop and name it 'Get dataset refresh status'. If you change that name, then you have to change it in all the expressions from the next steps.
  • The URL is simular to the Refresh dataset, but it uses a different action 'refreshes' and a top 1 to only get the latest status:
    @concat(
        'https://api.powerbi.com/v1.0/myorg/groups/'
        ,pipeline().parameters.WorkspaceGuid
        ,'/datasets/',pipeline().parameters.DatasetGuid
        ,'/refreshes?$top=1'
        )
  • Set Method to Get
  • Add a header with the name 'Content-Type' and the value 'application/json'
  • Set Authentication to System Assigned Managed Identity
  • Set Resource to https://analysis.windows.net/powerbi/api
Web Activity to get the refresh status
























  • Add an If Condition after the Web activity
  • The expression checks whether the Web Activity still returns 'Unknown' as status (same expression as the Until loop):
    @equals(
        first(
            json(
                string(
                    activity('Get dataset refresh status').output
                )
            ).value
        ).status,'Unknown'
    )
Check status value with IF


















  • Last step is to add a Wait Activity in the True condition.
  • Choose a smart number of seconds. Too low and you will get a lot of unnecessary checks. Too high and you will get an unnecessary long turnaround time.
Wait X seconds














6) Fail pipeline if refresh failed
If the dataset refresh in Power BI fails then the Web Activity will show you the status, but it won't fail the pipeline. The Until loop ends when the status changes. So by adding an additional If Condition that checks whether the refresh status is failed, allows you to use the Fail activity to fail the pipeline.
  • Add a If Condition after the Until loop
  • The expression is simulair to the If Condition within the loop. Just a different status:
    @equals(
        first(
            json(
                string(
                    activity('Get dataset refresh status').output
                )
            ).value
        ).status,'Failed'
    )

  • Within the True condition add the Fail activity
  • You can either enter a default fail message or try to get the error from the Web Activity with an expression like this:
    @string(
        json(
            first(
                json(
                    string(
                        activity('Get dataset refresh status').output
                    )
                ).value
            ).serviceExceptionJson
        )
    )
  • Last task is to pick an error code. This will show up in the logs.
Fail pipeline if refresh fails





















Conclusion
In this post you learned how to refresh a Power BI dataset from within your ETL pipelines. Hopefully you weren't scared of due all those steps. In a next blog post we will show you how to use the workspace name and data set name instead of their IDs because that is much more user friendly and even a little safer because when you delete a dataset and publish it again it will get a different ID. This will need three additional tasks for the translation part.
Translate names to ID's






Another extension that we will show in yet an other blog post is how to refresh only certain tables within that dataset. This will cause some changes to the Body of the refresh call, but also an additional Tenant setting change and... Power BI Premium.

thx to Simon Zeinstra for suggesting the simpler version en helping out with the resource url

Monday, 21 March 2022

Refresh Power BI Datasets with ADF or Synapse (SP)

Case
I want to refresh my Power BI Dataset as soon as my ETL process has finished. I don't want to schedule my dataset refresh x minutes/hours after my ADF(/Synapse) pipeline schedule because then I'm either refreshing too soon or waiting unnecessarily long to start the refresh. How can I minimize the time between the end of my ETL proces and the start of my dataset refresh?
Refresh Power BI Dataset












Solution
You can use the Rest APIs from Power BI to refresh a dataset from within your ADF(/Synapse) pipeline. However for this solution you (the DWH / BI developer) will probably need some help from a Domain Administrator and/or an Office 365 Administrator, because:
  • You need an Azure Active Directory Service Principal (SP) and know its name, id and secret
  • This SP needs to be in an Azure Active Directory Group that will used within the Tenant Settings of Power BI.

For this solution we will store the secret (and optionally the id) of the SP in an Azure Key Vault. Then retrieve it within the ADF(/Synapse) pipeline and use it to retrieve a token from the Active Direcory. The last step is to use that token to login to Power BI and refreh the dataset.
Use secret to create token and login to Power BI


















The basic version of the pipeline looks like this. The more complex version could for example retrieve the required workspace id and dataset id by its name via additional Rest API calls to Power BI. More on that in the conclusions.
Basic version of pipeline








Now below all the steps to create this pipeline that can refresh your Power BI datasets. There are a lot of steps, but don't let that scare you. It seems way more difficult then it really is.

1) Allow service principal to use Power BI APIs
For this first step you need access to the Power BI admin portal to give the Service Principal access to the Rest APIs from Power BI. 
  • Log in to Power BI online and click on the gear icon in the upper right corner
  • Click on Admin portal in the menu to open the portal
  • Then click on Tenant settings (if you don't see this, continue after the next image)
  • Scroll down to the Developer settings and locate 'Allow service principal to use Power BI APIs'
Either allow the entire organization to use the Rest APIs (not recommended) or specify an Azure Active Directory Group and make sure your Service Principal is a member of that group (this is where you probably need some help from the Domain Administrator. 
PBI Admin portal - Tenant Settings















If you couldn't find the tenant settings in the Admin Portal from Power BI then you need an Power BI administrator role. Your Office 365 administrator can either give you the role or do the steps above for you. If you are able get this role then you should probably use Azure AD Privileged Identity Management (PIM) to first get approval from a colleage to temporarily get this role and also use the PBI audit log to record all logins to the admin portal and record all changes in the tenant settings. 
Not a PBI admin













2) Give permissions in Power BI Workspace
Next step is to give your Service Principal contributor rights in the PBI Workspace that contains the dataset that you want to refresh.
  • Go to your Power BI workspace and click on Access in the upper right corner
  • Search for your Service Principal by its name (ours is called PowerBI)
  • Select the right permission (contributor) and click on the Add button
  • Verify the changed list of permissions and then click on the Close button
Give SP permissions in your workspace

















3) Add secret to Azure Key Vault
To create the AAD Token in the next step we need:
  • the ID (a guid) of the SP
  • the secret of the SP
  • the AAD tenant ID
You could store them in parameters or hardcode them in your pipeline, but we will store them as secrets in an Azure Key Vault. If you already used an Azure Key Vault in ADF(/Synapse) before then you know you have to give the Managed Service Identity of your ADF(/Synapse) 'Get' and 'List' access to read the secrets (see step 3 in this blog post).

The ID and secret of the SP are provided by the person who created the SP in the Azure Active Directory. The tenant ID can also be found in AAD, but you can also retrieve it from the about box in Power BI online. Store these three items as a secret in the Key Vault.
The guid in the Tenant URL is the Tenant ID













To retrieve the AAD token in the next step we will use a Web Activity. Unfortunately the Web activity doesn't work with the Key Vault Linked Service. Therefor we need to use Web Activities to retrieve all secrets from the Key Vault. In a previous blog post we showed you how to do this in detail. Below a summary of all those steps (which you must be repeated for all three secrets).
  • Add a Web Activity and give it useful name, because it will be used in the next activity.
  • Use the Secure output checkbox to prevent secrets showing up in the log
  • Set the URL to https://{Name Keyvault}.vault.azure.net/secrets/{Name Secret}?api-version=7.0 and replace the values between {} with the real value. Tip: create a Global Parameter for the Key Vault name and use that to build the URL with an expression like this:
    @concat(
        'https://'
        , pipeline().parameters.KeyVaultName
        , '.vault.azure.net/secrets/sp_secret/?api-version=7.1'
        )
  • Set Method to Get
  • Under Advanced select Managed Identity as Authentication
  • And set Resource to https://vault.azure.net
Web Activity to retrieve secrets from the Key Vault



















4) Retrieve AAD token
Next step is to use yet another Web Activity to retrieve an AAD token by providing the output from the previous three Web Activities.
  • Add a Web Activity and give it useful name, because its name will be used in the next activity.
  • Use the Secure input and output checkbox to prevent secrets or tokens showing up in the log
  • The URL uses the Tenant ID from the previous step. Use an expression like this to build the URL (check the activity name):
    @concat(
        'https://login.microsoftonline.com/'
        , activity('Get TenantId from AKV').output.value
        , '/oauth2/v2.0/token'
        )
  • Set Method to Post
  • Add a header with the name 'Content-Type' and the value 'application/x-www-form-urlencoded'
  • The body is also an expression that uses the secrets of the previous step (check the names):
    @concat(
        'client_id='
        , activity('Get ClientId from AKV').output.value
        , '&scope=https://analysis.windows.net/powerbi/api/.default'
        , '&client_secret='
        , activity('Get Secret from AKV').output.value
        , &grant_type=client_credentials'
        )
Get AAD Token

















5) Refresh Dataset
For refreshing a dataset we need the Token from the previous step, but also the ID of both the workspace and the dataset. We will store these IDs in two Pipeline parameter so that you can use the pipeline to refresh different datasets. In a later post we will show how to use the names instead of the IDs.
Pipeline parameters for Workspace ID and Dataset ID
















The refresh command is again an other Web activity that executes the Refresh Dataset In Group Rest API from Power BI. The refresh call is asynchronous, so it does not wait for the refresh to finish.
  • Add a Web Activity and give it useful name.
  • Use the Secure input checkbox to prevent tokens showing up in the log
  • The URL uses the two IDs stored in the two pipeline parameters above. Use an expression like this to build the URL (check the parameters names):
    @concat(
        'https://api.powerbi.com/v1.0/myorg/groups/'
        , pipeline().parameters.WorkspaceGuid
        , '/datasets/'
        , pipeline().parameters.DatasetGuid
        , '/refreshes'
        )
  • Set Method to Post
  • Add a header with the name 'Authorization' and the value is an expression with the Token of the previous step:
    @concat(
        string(activity('Get AAD Token').output.token_type)
        , ' '
        , string(activity('Get AAD Token').output.access_token)
        )
  • The Body can be used to set certain options like no notifications. See the documentation for more options. Enter the following JSON in the body:
    {"notifyOption":"NoNotification"}
Refresh a Power BI dataset via Rest API


















6) Check refresh status
The last steps are optionally, but if you want to be sure your dataset has been successfully refreshed we need to check the dataset refresh status in a loop. For this task the UNTIL loop is the best option. Within the loop we have a Web Activity to check the status and an Wait activity (within an If condition) to wait a couple of minutes/seconds because you don't want to check the status each second.
The check status construction














A slightly simpler option would be to just put a Wait Activity before the Web Activity instead of the If construction after the Web Activity.

Note: If you expect that the refresh takes more than an hour then you have some additional work to do, because the lifetime of the token is 60 to 90 minutes. In that case you need to copy the Get AAD Token to the loop and perhaps add an extra If Condition to only get a new token if it is about to expire.

Now all the individual steps and their screentshot:
  • Add an Until and give it a suitable name
  • The Expression continues the loop until the status is not equals 'Unknown' :
    @not(
        equals(
            first(
                json(
                    string(
                        activity('Get dataset refresh status').output
                    )
                ).value
            ).status,'Unknown'
        )
    )
    Note the name of the activity that will be inside the Until loop (if it is not yet there, it will give an error)
  • Consider changing the default 7 day timeout to the expected max refresh duration with an extra couple of minutes to be sure
Until loop











  • Add a Web Activity within the Until loop and name it 'Get dataset refresh status'. If you change that name, then you have to change it in all the expressions from steps 6 en 7.
  • Use the Secure input checkbox to prevent tokens showing up in the log
  • The URL is simular to the Refresh dataset, but it uses a different action 'refreshes' and a top 1 to only get the latest status:
    @concat(
        'https://api.powerbi.com/v1.0/myorg/groups/'
        ,pipeline().parameters.WorkspaceGuid
        ,'/datasets/',pipeline().parameters.DatasetGuid
        ,'/refreshes?$top=1'
        )
  • Set Method to Get
  • Add a first header with the name 'Authorization' and the value is an expression with the Token of the previous step:
    @concat(
        string(activity('Get AAD Token').output.token_type)
        , ' '
        , string(activity('Get AAD Token').output.access_token)
        )
  • Add a second header with the name 'Content-Type' and the value 'application/json'
Web Activity to get refresh status























  • Add an If Condition after the Web activity
  • The expression checks whether the Web Activity still returns 'Unknown' as status (same expression as the Until loop):
    @equals(
        first(
            json(
                string(
                    activity('Get dataset refresh status').output
                )
            ).value
        ).status,'Unknown'
    )


















  • Last step is to add a Wait Activity in the True condition.
  • Choose a smart number of seconds. Too low and you will get a lot of unnecessary checks. Too high and you will get an unnecessary long turnaround time.
Wait X seconds











7) Fail pipeline if refresh failed
If the dataset refresh in Power BI fails then the Web Activity will show you the status, but it won't fail the pipeline. The Until loop ends when the status changes. So by adding an additional If Condition that checks whether the refresh status is failed, allows you to use the Fail activity to fail the pipeline.
  • Add a If Condition after the Until loop
  • The expression is simulair to the If Condition within the loop. Just a different status:
    @equals(
        first(
            json(
                string(
                    activity('Get dataset refresh status').output
                )
            ).value
        ).status,'Failed'
    )

  • Within the True condition add the Fail activity
  • You can either enter a default fail message or try to get the error from the Web Activity with an expression like this:
    @string(
        json(
            first(
                json(
                    string(
                        activity('Get dataset refresh status').output
                    )
                ).value
            ).serviceExceptionJson
        )
    )
  • Last task is to pick an error code. This will show up in the logs.
Fail pipeline if refresh fails


















Conclusion
In this long post you learned how to refresh a Power BI dataset from within your ETL pipelines. Hopefully you weren't scared of due the number of steps. In a next blog post we will show you how to use the workspace name and data set name instead of their IDs because that is much more user friendly and even a little safer because when you delete a dataset and publish it again it will get a different ID. This will need three additional tasks for the translation part.
Translate names to ID's






Another extension that we will show in yet an other blog post is how to refresh only certain tables within that dataset. This will cause some changes to the Body of the refresh call, but also an additional Tenant setting change and... Power BI Premium.

thx to colleague Sam Hughes for helping. 

Update: we have also posted a version without using a Service Principal. Instead it uses the MSI of ADF or Synapse.