
Tuesday, 26 December 2023

Give Power BI access to Synapse Serverless SQL Pool

Case
I have a Service Account for my data source in the Power BI Gateway, but how do I give this account access to the (External) Tables located in the Synapse Serverless SQL Pool database?
[Image: Give Power BI Access to Delta Tables]

Solution
If you create a Data Lakehouse with Synapse, then at the moment you still need to expose the datamart as a set of external tables in the Synapse Serverless SQL Pool to make it easily accessible for Power BI. Giving access to it is very similar to the solution we showed to give read-only access to an Azure SQL Database when you have a Data Warehouse Architecture. For a Data Lakehouse Architecture we use External Tables that point to a Delta Table instead of regular tables. This requires an extra step with a Database Scoped Credential.

The starting position is that you have a database in the Synapse Serverless SQL Pool with some external tables pointing to your Delta Tables within your 'golden' container. Below is the basic T-SQL code for setting up those External Tables.
  1. The first step is to create a Master Key. We need this for the second step. You can make it more secure by adding a strong password for encryption.
  2. The second step is to create a Database Scoped Credential with the Managed Service Identity of the underlying Synapse Workspace. This allows us to give the underlying Synapse Workspace access to the 'golden' container instead of giving all the separate users access. There are alternatives, so study the options to see which solution fits the security policy of your organization.
  3. The third step is creating an External Data Source that points to your 'golden' container where the Delta Tables are stored. Notice that it uses the credential from the previous step. The location is the URI pointing to a container in a Gen2 Storage Account. ABFSS is short for Azure Blob File System Secure. The format of the URI is: abfss://[containername]@[storageaccountname].dfs.core.windows.net/
  4. The fourth step is creating an External File Format where you can specify options for your source type (CSV/Parquet/JSON/Delta/etc). In this example the format for the Delta Table is very basic.
  5. The fifth and last step is creating an External Table that points to the Delta Table in your data lake. This means the data stays in the data lake and only the table structure is stored in the database. Notice the reference to your data source and file format from the previous steps. The location in the table is a folder path that starts in the root of your container and points to the Delta Table folder.
-- 1. Create Master Key for using Managed Service Identity access
CREATE MASTER KEY;

-- 2. Create Credential to use Managed Service Identity of Synapse
CREATE DATABASE SCOPED CREDENTIAL [SynapseIdentity]
WITH
    IDENTITY = 'Managed Service Identity'
;

-- 3. Create a Data source pointing to the container of your Delta tables
CREATE EXTERNAL DATA SOURCE [DeltaLocation]
    WITH (
    LOCATION = N'abfss://gold@mydevstorage.dfs.core.windows.net/',
    CREDENTIAL = [SynapseIdentity]
    );

-- 4. Create a File format for Delta tables
CREATE EXTERNAL FILE FORMAT [DeltaFormat]
    WITH (
    FORMAT_TYPE = DELTA,
    DATA_COMPRESSION = N'org.apache.hadoop.io.compress.SnappyCodec'
    );

-- 5. Create an external table
CREATE EXTERNAL TABLE [dbo].[dim_abcdef] (
    [integerfield] INT NULL,
    [stringfield] NVARCHAR (400) NULL,
    [datefield] DATETIME2 (7) NULL
)
    WITH (
    LOCATION = N'/delta/dim_abcdef',
    DATA_SOURCE = [DeltaLocation],
    FILE_FORMAT = [DeltaFormat]
    );

When you have created the whole set of External Tables for your datamart, you can add a user to the Serverless SQL Pool database that Power BI can use for the Data Source in the Gateway.
  1. First you need to create a user. For this example we used a user from Microsoft Entra ID (Azure Active Directory). That is why you see FROM EXTERNAL PROVIDER in the code below. Our Synapse Workspace was created with the option 'Microsoft Entra authentication only', therefore SQL database users or Shared Access Signatures are not allowed.
  2. The second step is to give this newly added user some access. In this simplified example we give the new user db_datareader access to the entire database. If you have a larger or more complex database then you can make this much more sophisticated, for example by giving reader access to only a specific schema or table (see the sketch after the code below).
  3. The last step is giving the user permission to use the Database Scoped Credential that is used by the External Data Source behind the External Tables. This is done with GRANT and the REFERENCES permission. Without this step you will not be able to query the External Tables.
-- 6. Create the Service Account user on the datamart database
CREATE USER [sa-pbi-workspace-d@xyz.com] FROM EXTERNAL PROVIDER;

-- 7. Give the Service Account read access to the tables
ALTER ROLE [db_datareader] ADD MEMBER [sa-pbi-workspace-d@xyz.com];

-- 8. Give the Service Account REFERENCES permission on the Synapse credential
GRANT REFERENCES ON DATABASE SCOPED CREDENTIAL::SynapseIdentity TO [sa-pbi-workspace-d@xyz.com]
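
As mentioned in step 2, you can scope the read access more narrowly than db_datareader. A minimal sketch, assuming you only want to expose the tables in the dbo schema from the example above:

-- Hypothetical alternative for step 7: grant read access to a single schema
GRANT SELECT ON SCHEMA::[dbo] TO [sa-pbi-workspace-d@xyz.com];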
Conclusion
In this post we showed you how to give a user, created for Power BI, read-only access to the tables in your Azure Synapse Serverless SQL Pool database. It is very similar to giving access to a regular database. Now you can use this user for your Power BI Gateway data source.

In the future we can hopefully use the Power BI Direct Lake mode on regular Azure Storage Accounts or write with Synapse to the Fabric OneLake.

Thursday, 23 February 2023

Streaming data Azure & Power BI - Streaming dataset

Case
I want to send streaming data to Power BI for real-time reporting purposes. This post is part of a series on Streaming Data in Azure and Power BI. It focuses on showing the live data in Power BI.
[Image: Power BI Streaming dataset]

Solution
In the previous Stream Analytics post we pushed the data from the Event Hub to a new Streaming dataset in Power BI and in this post we will create a Power BI report and dashboard to see the streaming data actually moving on our screen.






Posts in this series:

1) Check dataset
First make sure your Stream Analytics job is running and that data is sent to the Event Hub. Only then will the new Power BI dataset be created. Note that the icon for a streaming dataset differs from a normal dataset. You can edit the dataset to see which columns are available, but don't change the columns or settings because Stream Analytics is managing it.
[Image: New Streaming dataset in Power BI]

2) Create report
Now create a new Power BI report on the newly created dataset, publish it and then view it in Power BI online.
[Image: Creating and publishing our beautiful report]

[Image: The workspace with dataset and report]

[Image: The live report]

Now, while viewing the report, make sure data is still streaming into the Power BI dataset. You will probably notice that nothing is changing or moving in your report. Once you hit the refresh visual button you will see the new data, but continuously hitting the refresh button is not a workable option.

3) Pin report visual to Dashboard
Hover your mouse over your report visual and look for the little push pin button. Click on it to create a new dashboard. Repeat this for all the visuals you want to see in live mode.
[Image: Pin Report Visual to Dashboard]

[Image: The workspace with the new dashboard]

Now open the newly created dashboard and watch the data streaming live into your visuals. You can adjust the size of the visuals to see more details. We left the bottom right empty to show a PowerShell ISE window pushing data to the Event Hub. Even with the lowest and slowest settings (basic Event Hub, 1 streaming unit in Stream Analytics and the max of 1 message a second) it takes just a few seconds for the data to appear in Power BI.
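
For reference, a minimal sketch of such a PowerShell push, assuming an Event Hub shared access policy with Send rights (the namespace, hub, policy name and key are placeholders; the CallId/DurationInSeconds columns match the test messages used in this series):

# Hypothetical sketch: push one JSON test message to the Event Hub REST endpoint
Add-Type -AssemblyName System.Web

$Namespace = "mybitools"        # Event Hub namespace (placeholder)
$EventHub  = "ehbitools"        # Event Hub name (placeholder)
$KeyName   = "SendPolicy"       # Shared access policy with Send rights (placeholder)
$Key       = "<primary-key>"    # Primary key of that policy (placeholder)
$Uri       = "https://$Namespace.servicebus.windows.net/$EventHub/messages"

# Build a SAS token that is valid for one hour
$Expiry       = [DateTimeOffset]::UtcNow.ToUnixTimeSeconds() + 3600
$StringToSign = [System.Web.HttpUtility]::UrlEncode($Uri) + "`n" + $Expiry
$Hmac         = New-Object System.Security.Cryptography.HMACSHA256
$Hmac.Key     = [Text.Encoding]::UTF8.GetBytes($Key)
$Signature    = [Convert]::ToBase64String($Hmac.ComputeHash([Text.Encoding]::UTF8.GetBytes($StringToSign)))
$SasToken     = "SharedAccessSignature sr=" + [System.Web.HttpUtility]::UrlEncode($Uri) + "&sig=" + [System.Web.HttpUtility]::UrlEncode($Signature) + "&se=" + $Expiry + "&skn=" + $KeyName

# Send one test message with a random call duration
$Body = @{ CallId = [guid]::NewGuid().Guid; DurationInSeconds = Get-Random -Minimum 10 -Maximum 600 } | ConvertTo-Json
Invoke-RestMethod -Uri $Uri -Method Post -Body $Body -Headers @{ Authorization = $SasToken } -ContentType "application/atom+xml;type=entry;charset=utf-8"
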
[Image: Streaming live data into Power BI]

Conclusion
In this post we showed you the end of the hot path by creating a very basic report and pinning its visuals to a dashboard to see the data streaming live into your dashboard. You can enrich the dashboard with visuals from your cold path reports, for example to compare the live data to daily averages. The storage part of the cold path will be explained in a next post of this series.

Tip: during testing you can empty the streaming dataset by temporarily switching off Historic data analysis in the edit screen of the dataset. You won't see new data until you switch it on again.


Sunday, 19 February 2023

Streaming data Azure & Power BI - Stream Analytics

Case
I want to send streaming data to Power BI for real-time reporting purposes. This post is part of a series on Streaming Data in Azure and Power BI and focuses on getting the streaming data from the Event Hub and sending it to a Power BI streaming dataset with Azure Stream Analytics.
[Image: Azure Stream Analytics]

Solution
In the previous post we sent messages to the Event Hub and in this post we will read those messages and send them to Power BI. For this we need a service that can handle streaming data. Azure Stream Analytics is the query tool for streaming data in Azure. It uses the regular SQL language (so it is easy to learn). However, it has some extras like the windowing option for the GROUP BY, because you cannot aggregate the entire stream (it never ends), but you can aggregate within a certain time window.

Posts in this series:

First, let's explain the three most important items of a Stream Analytics job.

Streaming inputs and reference inputs
There are three types of input sources in the query for streaming data: Azure Event Hubs, Azure IoT Hub and Azure Blob/Data Lake Storage. But there is also a reference input where you can use an Azure SQL Database table or a blob storage file to enrich or translate your streaming data. Streaming messages are often kept small, by using IDs or codes instead of long strings, to speed up everything. With the reference data you can translate them into something readable for reporting purposes.
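
A minimal sketch of such a lookup in the Stream Analytics query language (the stream, reference input and column names are hypothetical):

-- Translate a StatusId from the stream into a readable name via a reference input
SELECT      s.CallId
,           s.DurationInSeconds
,           r.StatusName
INTO        [OutputStream]
FROM        [InputStream] s
JOIN        [ReferenceStatus] r
ON          s.StatusId = r.StatusId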

Streaming Outputs
There are several streaming outputs like the Storage Accounts, a SQL database or even a Cosmos DB, but the one we are using for this example is the Power BI output. The Power BI output has a limitation that it can be called roughly once every second and the messages can't be larger than 15 KB. This means that you will need a windowing function to aggregate the data and make sure you don't flood Power BI.

Query
The query language in Stream Analytics is a subset of T-SQL, so it is very similar and easy to learn if you already know how to query a Microsoft SQL Server database. The most common query pattern is a SELECT INTO with a GROUP BY.
SELECT      SomeColumn, Count(*) as Count
INTO        OutputStream
FROM        InputStream TIMESTAMP BY CreatedAt
GROUP BY    SomeColumn, TumblingWindow(second, 30)

Let's create a new Azure Stream Analytics Job and configure a query to push data to Power BI.


1) Create Stream Analytics Job
Creating the new SA Job is very straightforward. For this example the region and the number of streaming units are the important parts.
  • Go to the Azure portal and create a new Stream Analytics job
  • Select the correct subscription and resource group
  • Come up with a good descriptive name for your job. Keep in mind that there is only one query window, but you can run multiple queries within that.
  • Choose the correct region. The same region as your event hub and your Power BI tenant is the best for performance.
  • Hosting environment is Cloud (unless you have an on-premises Edge environment).
  • Last thing is the number of Streaming units. This is where you start paying. The default is 3 SUs, but for testing purposes or small jobs 1 SU is more than enough.
  • Under storage you can set up a secure storage account. For this example you can leave that empty.
  • Optionally add some Tags and then review and create the new SA job.
[Image: Create Streaming Analytics Job]

2) Create Input stream
Now that we have a SA job we first need to create an input to connect to our Event Hub.
  • Go to your SA job in the Azure portal and click on Inputs in the left menu under Job topology
  • In the upper left corner click on + Add stream input and choose Event Hub
  • A new pane appears on the right side. First enter a descriptive name for your Event Hub.
  • Now select the correct Subscription and Event Hub Namespace.
  • Select the existing Event hub name that we created in the previous blog post.
  • Select the existing Event Hub consumer group which we left default in the previous blog post.
  • For the Authentication mode the easiest way is to select Create system assigned managed identity. This means this specific SA job will get access to the selected Event Hub.
  • Partition key is for optimizing performance if your input is indeed partitioned. You can leave it empty for this example.
  • We used a JSON structure for our test messages. Therefore select JSON as Event serialization format.
  • Select UTF-8 as encoding (the only option at this moment).
  • Leave the Event compression type set to None for this example.

[Image: Create new Input]

After saving you will see a couple of notifications to create and test your new input. This takes about a minute to complete.
[Image: Notifications for new Input]

3) Create Output stream
After the input it's now time to add the output to Power BI. For this you need a Power BI workspace where you have admin rights. Once the job is running for the first time and new events are streaming in, Stream Analytics will create a Streaming Dataset in your workspace.
  • Go to your SA job in the Azure portal and click on Outputs in the left menu under Job topology
  • In the upper left corner click on + Add and choose Power BI
  • A new pane appears on the right side. First enter a descriptive name for your Power BI output.
  • Next select the Power BI workspace where your streaming dataset will appear.
  • For Authentication mode choose Managed Identity-System assigned. This specific job will then be added as a Contributor.
  • Finally, enter a Dataset name and a Table name. Note that you can only have one table in a streaming dataset. So don't create a second output to the same Power BI streaming dataset with a different table (we tried).
[Image: Create new Output]

After saving you will see a couple of notifications to create and test your new output. This takes about a minute to complete.
[Image: Notifications for new Output]

In Power BI you will see your new Stream Analytics job as a Contributor in your workspace. However, the streaming dataset will only appear once the job is running and pushing data.
[Image: Stream Analytics Job as Contributor]

4) Create query
With the new Input and Output we will now create a very basic query to push the test data from Azure Event Hub via Azure Stream Analytics to Power BI.
  • Go to your SA job in the Azure portal and click on Query in the left menu under Job topology
  • A basic but working query will already be created for you if you have an input and output.
  • Once opened, wait a few moments for Azure Event Hub to refresh. You will see a turning circle icon behind your input. If there is data in your event hub then it will appear after a few moments.
[Image: Default query and waiting for input data]

[Image: Default query with data from input]

  • Now you can adjust your query by only selecting the columns you need in Power BI (less is more). The windowing function is not required if the number of messages doesn't exceed the limits of Power BI.
  • After editing the query hit the Test query button and check the Test results.
[Image: Test result of basic query without time window]

[Image: Test result of basic query with TumblingWindow]

  • Once you are satisfied with the query result hit the Save Query button above the editor
[Image: Save query once you're ready]

The query for testing (without windowing).
SELECT
    CallId
,   DurationInSeconds
,   EventEnqueuedUtcTime as CallTimeStampStr
INTO
    [pbbitools]
FROM
    [ehbitools]
If you send more requests to Power BI than it can handle, then Stream Analytics will try batching multiple messages into one request. For small messages that occasionally exceed the max number of messages this could be a 'workaround'. However, batching multiple messages could also cause you to exceed another limit: the max message size. A better solution is to use a windowing function to slow down the stream.
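
For example, a windowed variant of the test query above could look like this (the 5-second window size is just an illustration):

SELECT
    COUNT(*)               AS NumberOfCalls
,   AVG(DurationInSeconds) AS AvgDurationInSeconds
,   System.Timestamp()     AS WindowEndUtc
INTO
    [pbbitools]
FROM
    [ehbitools]
GROUP BY
    TumblingWindow(second, 5)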

5) Start SA job
Now go back to the overview page of your Stream Analytics job and hit the Start button to start your SA job. The first time you can choose between Now or Custom as a start time to receive new messages. The second time you can also choose When last stopped. For this example we chose Now. It will take a few moments to change the status from Created to Starting to Running.
[Image: SA Job is running]

Once it is started AND new data is sent to the Event Hub, the streaming dataset will appear in your Power BI workspace. Notice that the red icon is different compared to a regular dataset.
[Image: A new dataset appeared for the streaming data]

Conclusion
In this post you learned how to create a (very basic) Stream Analytics job. In a follow-up post we will explain the Window Functions in more detail. For now you can read our old post about Window Functions, but compared to 6.5 years ago we now have 2 new windowing functions. However, you will probably end up using the old TumblingWindow. The next post in this series will show the live data in Power BI with automatically changing visuals when new data is collected.




Sunday, 5 February 2023

Streaming data Azure & Power BI - Introduction

Case
I want to send streaming data to Power BI for reporting purposes. What should I take into account when choosing the right architecture?
[Image: Streaming Data to Power BI]

Solution
If you, for example, have a helpdesk for your customers where they can call or chat for support, then you probably also want some real-time reports to see the current state of the calls and chats. Most regular Data Warehouses are only refreshed once a night and then it's already too late to react to incidents.

For real-time reports in Power BI you have two main options. The first option is to send the events directly to a Power BI Streaming dataset (Push or Streaming) and then build a report and pin report visuals to a dashboard. This is an appropriate solution for a lot of real-time reports, but there are some limitations. For example there is a maximum number of events per second. Once you exceed that limit you start losing data. Probably just when you need accurate reports the most: when it is very busy in your helpdesk. Another limitation for streaming datasets in Power BI is the history: it keeps only one hour of data.

The second option is to push the data into Azure Event Hubs and then use Azure Stream Analytics to push it to Power BI. This solves the max number of events per second, because Stream Analytics can aggregate or filter the data before sending it to Power BI, and Stream Analytics can also send it to, for example, a data lake to solve your history problem.
[Image: Streaming Data to Power BI]

In this streaming data series we will explain this second option, focusing on the hot path and the capture in the data lake, which is part of the cold path. Just like for a 'regular' data warehouse architecture there are a lot of different solutions, but this one is probably the most common and simple solution that will fit the majority of cases. One particular new streaming data feature in Azure that is worth mentioning is writing to a Delta Lake table. At the moment of writing this is still in public preview and only available in a limited number of Azure regions, but it will fit the Lake House architecture very well.

Posts in this series:






Friday, 27 January 2023

Deploy a Power BI dataset via a DevOps - Publish

Case
I want to develop my Power BI dataset separately from my Power BI reports, so we started using Tabular Editor. This also allows us to work on the model with multiple people at the same time, because every object is a separate JSON file. But how can we automatically deploy these files using Azure DevOps?
[Image: Deploy your Power BI datasets with DevOps]

Solution
In this series of posts we will deploy the Tabular Editor project as a Power BI dataset via DevOps. We will deploy this to the development workspace. For the deployment from Dev to Test and Test to Prod we will use Power BI Deployment pipelines.


In this third post we will create the YAML files that will do the actual publishing of the datasets. The first file uses Tabular Editor to publish to the Development environment and the second file calls the Power BI deployment pipeline to publish to test/acceptance and to production. Both YAML files are stored in the repository under CICD\YAML\*****.yml.

[Image: Repository folder structure]

1) DeployPBIModelDev.yml
The first YAML file is DeployPBIModelDev.yml. It takes the deployable *.bim files (the datasets) from the build artifact and publishes them to the Development workspace with Tabular Editor.

First we are adding the parameters to this YAML file. The values will be passed through by the YAML file of the previous post.
parameters:
  - name: env
    displayName: Environment
    type: string
    values:
    - DEV
  - name: SpApplicationId
    displayName: Service Principal Application Id
    type: string
  - name: SpClientSecret
    displayName: Service Principal Client Secret
    type: string
  - name: SpTenantId
    displayName: Service Principal Tenant Id
    type: string
  - name: DownloadLinkTabularEditor
    displayName: Download Link Tabular Editor
    type: string

This part connects to an environment to which you can add approval steps in DevOps under Pipelines - Environments.
jobs:
  - deployment: deploymentjob${{ parameters.env }}
    displayName: Deployment Job ${{ parameters.env }} 
    environment: Deploy to ${{ parameters.env }}
    strategy:
      runOnce:
        deploy:
          steps:
The first step is a checkout to get the files from the repository to the agent. For this simple example it is not required, but later on you perhaps want to execute some PowerShell scripts that are stored in the repository, for example to connect the dataset to a gateway.
          ###################################
          # 1 Check out repository to agent
          ###################################
          - checkout: self
            displayName: '1 Retrieve Repository'
            clean: true 
In the second step we will download and unzip Tabular Editor. If you have a self-hosted agent instead of a Microsoft-hosted agent, you could also install this software on the VM manually. However, this step allows you to easily upgrade to a newer version of the editor by just changing the URL in the variable group.
          ###################################
          # 2 Download Tabular Editor
          ###################################
          - powershell: |
              # Download URL for Tabular Editor portable:
              
              # Create Download folder
              $DownloadFolder = join-path (get-location) "TabularEditor"
              new-item -type directory -path $DownloadFolder -Force
              Write-Host "Create download location: $($DownloadFolder)" 

              # Download destination (root of PowerShell script execution path):
              $DownloadFile = join-path $DownloadFolder "TabularEditor.zip"
              Write-Host "Save download as: $($DownloadFile)"
              
              # Download from GitHub:
              Write-Host "Downloading Tabular Editor from: $(DownloadLinkTabularEditor)" 
              Invoke-WebRequest -Uri $(DownloadLinkTabularEditor) -OutFile $DownloadFile
              
              # Unzip Tabular Editor portable, and then delete the zip file:
              Write-host "Extracting downloaded zip file"
              Expand-Archive -Path $DownloadFile -DestinationPath $DownloadFolder
              Write-host "Clean up download file"
              Remove-Item $DownloadFile
            displayName: '2 Download Tabular Editor'
The third step is an optional debug step showing you the artifact and repository files on your agent, as well as the exact location of the Tabular Editor executable which we need in the next step. Once everything is running fine you can remove or comment out this step.
          ###################################
          # 3 Show treeview of agent
          ###################################
          - powershell: |
              tree "$(Pipeline.Workspace)" /F
            displayName: '3 Treeview of Pipeline.Workspace'
The fourth step is the only mandatory step. It executes Tabular Editor for each project on the agent. Check out all the parameters after TabularEditor.exe, where you will see the service principal and a lot of parameters of the tool itself.
          ###################################
          # 4 Publish data models
          ###################################
          - powershell: |
              # Get Artifact folder with model files
              $ModelsFolder = Join-Path -Path $(Pipeline.Workspace) -ChildPath "\PBIModelsArtifact\"

              # Loop through local artifact folder to get all datamodels
              $AllDatamodels = Get-ChildItem -Path $ModelsFolder -Recurse -Filter *.bim
              Write-Host "$($AllDatamodels.Count) model$(If ($AllDatamodels.Count -ne 1) {"s"}) found in $($ModelsFolder)"
              Write-Host "================================"
              foreach ($DataModel in $AllDatamodels)
              {
                $Path = "$($DataModel.Fullname)"
                Write-Host "Processing model $($Path)"

                # Extract datasetname and workspacename from folder path of model
                $DatasetName = $Path | Split-Path -Parent | Split-Path -Leaf
                $WorkspaceName = $Path | Split-Path -Parent | Split-Path -Parent | Split-Path -Leaf
                # Concat workspace name with the suffix used for the deployment workspace
                $WorkspaceName = "$($WorkspaceName)$(PowerBiWorkspaceSuffix)"
                  
                Write-Host "Publishing model [$($DatasetName)] in [$($WorkspaceName)]"
                # Publish model.bim to Power BI Workspace
                # Deleted -P argument at the end of the command line, because we don't want to overwrite the table partitions in the model
                $(Build.SourcesDirectory)\TabularEditor\TabularEditor.exe "$($DataModel.Fullname)" -D "Provider=MSOLAP;Data Source=powerbi://api.powerbi.com/v1.0/myorg/$($WorkspaceName);User ID=app:${{ parameters.SpApplicationId }}@${{ parameters.SpTenantId }};Password=${{ parameters.SpClientSecret }};" "$($DatasetName)" -O -C -R -M -E -V
                Wait-Process -Name "TabularEditor"

                Write-Host "================================"
              }
            displayName: '4 Deploy data models'

This is all you need to deploy your model to a Power BI Workspace. You could reuse this for other environments like test/acceptance and production as well. However we will be using the Power BI Deployment Pipelines for the next environments. This also allows you to change parameters and connections without writing code.

2) DeployPBIModelAccPrd.yml
By default the Power BI deployment pipeline will deploy everything from the source workspace to the destination workspace. We only want to deploy datasets and leave all the other stuff for other pipelines. To be even more specific: we only want datasets from the repository and leave any manually deployed datasets alone. In the GUI you can make that selection manually. In the YAML code you can provide a semicolon-separated string with the names of the datasets you want to promote. For this we will create a variable, fill it with some PowerShell and then provide it as a parameter for the deployment.

First we are adding the parameters to this YAML file. The values will be passed through by the YAML file of the previous post. Note that the parameters differ from the YAML above. This is mainly because we can use a DevOps Service Connection instead of a Service Principal.
parameters:
  - name: env
    displayName: Environment
    type: string
    values:
    - ACC
    - PRD
  - name: DeployStagePowerBI
    displayName: DeployStagePowerBI
    type: string
    values:
    - Test
    - Production
  - name: ServiceConnection
    displayName: Service Connection
    type: string
  - name: ReposWorkspaceFolder
    displayName: ReposWorkspaceFolder
    type: string
  - name: DeploymentPipelinePBI
    displayName: Deployment Pipeline PBI
    type: string
This part connects to an environment to which you can add approval steps in DevOps under Pipelines - Environments. Note that we created an empty string variable named Datamodels. We will use this for the list of datasets that we want to promote.
jobs:
  - deployment: deploymentjob${{ parameters.env }}
    displayName: Deployment Job ${{ parameters.env }} 
    environment: Deploy to ${{ parameters.env }}
    variables:
    - name: Datamodels
      value: ""

    strategy:
      runOnce:
        deploy:
          steps:
The first step is a checkout to get the files from the repository to the agent. For this simple example it is not required, but it could be handy for a more complex pipeline where you want to use PowerShell scripts from your repository.
          ###################################
          # 1 Check out repository to agent
          ###################################
          - checkout: self
            displayName: '1 Retrieve Repository'
            clean: true 
The second step is also optional, but shows the artifact (and repository) files on your agent. We need this in the next step. You can remove or comment out the code once everything works.
          ###################################
          # 2 Show environment and treeview
          ###################################
          - powershell: |
              Write-Output "Deploying PBI models in the ${{ parameters.env }} environment"
              tree "$(Pipeline.Workspace)" /F
            displayName: '2 Show environment and treeview Pipeline_Workspace'
In this step we will loop through the artifact folder to get a list of all datasets. The code will first create an array of subfolders (each subfolder has the name of a dataset). Then the array will be used to create a semicolon-separated string of dataset names. The last Write-Host line looks a bit strange, but it will populate the YAML variable that we created above.
          ###################################
          # 3 Fill variable with list of models
          ###################################
          - powershell: |
              # Get Artifact folder with model files
              $WorkspacesFolder = Join-Path -Path $(Pipeline.Workspace) -ChildPath "\PBIModelsArtifact\Models\${{ parameters.ReposWorkspaceFolder }}"

              # Loop through local artifact folder to get all datamodels
              [array]$ModelsArray = Get-ChildItem -Path $WorkspacesFolder | ? {$_.PSIsContainer} | Select-Object Name
              Write-Host "$($ModelsArray.Count) workspace$(If ($ModelsArray.Count -ne 1) {"s"}) found in $($WorkspacesFolder)"

              [string]$ModelsList = $null
              $ModelsList = $ModelsArray.name -join ";"
              Write-Host "Value $($ModelsList)"
              Write-Host "##vso[task.setvariable variable=Datamodels;]$ModelsList"
            displayName: '3 Fill variable with list of Workspaces'
For debug purposes we show the variable with the semicolon-separated string of dataset names. You can remove this once everything works, or leave it and use it as log information for your pipeline.
          ###################################
          # 4 Show environment and treeview
          ###################################
          - powershell: |
              Write-Output "Variable value of Datamodels $(Datamodels)"
            displayName: '4 Show variable value'
The last step is calling the deployment task from the installed DevOps add-on for Power BI. We use Selective for the deployType, which allows us to provide a list of datasets. You can use a similar construction for the reports, which we will show in a later blog post.
          ###################################
          # 5 Deploy models
          ###################################
          - task: ms-pbi-api.pbi-automation-tools.Pipelines_Deploy.DeploymentPipelines-Deploy@1
            displayName: '5 Deploy models to ${{ parameters.DeployStagePowerBI }} stage'
            inputs:
                pbiConnection: PowerBI
                pipeline: ${{ parameters.DeploymentPipelinePBI }}
                stageOrder: ${{ parameters.DeployStagePowerBI }}
                deployType: Selective
                datasets: $(Datamodels)
                createNewWS: false

Running it once for dev and once for test(/acc) will look like this. Note that some of the stages are skipped. This is due to the branch strategy and the conditions we used between the stages.
[Image: The result of 2 runs]

In the runs overview it will look like this.
[Image: the runs overview]

Conclusion
In this post we showed two different ways to deploy your Power BI datasets. The first, with Tabular Editor, can be used without a Power BI Premium license. A disadvantage is that it is a lot of work to change parameters and connections.

The second method with the Power BI Deployment pipelines requires Power BI Premium, but those pipelines allow you to change parameters and connections without code. A big disadvantage is that your datasets already need to be present in the development workspace of Power BI. This can be done with a manual deployment or by combining both methods to get the best of both worlds and versioning of your work in a repository.



Thursday, 26 January 2023

Deploy a Power BI dataset via a DevOps - Build

Case
I want to develop my Power BI dataset separately from my Power BI reports, so we started using Tabular Editor. This also allows us to work on the model with multiple people at the same time, because every object is a separate JSON file. But how can we automatically deploy these files using Azure DevOps?
[Image: Deploy your Power BI datasets with DevOps]

Solution
In this series of posts we will deploy the Tabular Editor project as a Power BI dataset via DevOps. We will deploy this to the development workspace. For the deployment from Dev to Test and Test to Prod we will use Power BI Deployment pipelines.


In this second post we will create the YAML file that builds the Tabular Editor project that is stored in the repository. The filename we used is BuildModel.yml and it is stored in the repository under CICD\YAML\BuildModel.yml. There are two additional YAML files in that same folder; those will be explained in the next post, but they are already mentioned in one of the tasks below.

[Image: Repository folder structure]


1) BuildModel.yml
The first YAML file is BuildModel.yml, which builds the JSON files of the Tabular Editor project into a deployable *.bim file (the dataset) and then calls the deployment YAML files to do the actual deployment.

First we are adding the general variable group (mentioned in the previous post) to the YAML file. This allows us to use those variables in the YAML and PowerShell code later on.
###################################
# General Variables
###################################
variables:
  - group: PBIModelParamsGen


Secondly we need to determine when to trigger this pipeline. This part is of course heavily dependent on your branch strategy. In this example it triggers on changes in multiple branches, but only for models in a certain Power BI workspace folder. Note that the path is hardcoded and that it can't use variables from your variable groups.
###################################
# When to create a pipeline run
###################################
trigger:
  branches:
    include:
    - Development
    - Acceptance
    - main
  paths:
    include:
    - Models/myPowerBIWorkspace/*


The next step is describing the first stage of the pipeline, where we build the model and get it ready for publishing. The important property is pool, where you either configure a self-hosted agent (via name) or a Microsoft-hosted agent (via vmImage). If you are using a self-hosted agent then the clean: all setting helps you to clean up your agent before you start.
stages:
###################################
# Create Artifact of datamodels
###################################
- stage: CreateModelArtifact
  displayName: Create PBI Model Artifact

  jobs:
  - job: CreateModel
    displayName: 'Create Model'
    workspace:
      clean: all
    pool:
      name: mySelfHostedAgent
      # vmImage: 'windows-latest' #'ubuntu-latest'
    steps:


The first real task is retrieving all the files from the repository. After this step your datasets/models will be available on the agent itself. This is necessary for the build step.
    ###################################
    # 1 Retrieve Repository
    ###################################
    - checkout: self
      displayName: '1 Retrieve Repository'
      clean: true


The next task is downloading and unzipping the tool Tabular Editor. If you have a self-hosted agent then you could also choose to download/install this on your Virtual Machine. However, installing it via YAML makes it very easy to use a newer version just by changing the 'DownloadLinkTabularEditor' variable in the general variable group (see previous post). That variable should be filled with the download URL of the portable/zip file: https://github.com/TabularEditor/TabularEditor/releases/
    ###################################
    # 2 Download Tabular Editor
    ###################################
    - powershell: |
        # Download URL for Tabular Editor portable:
        
        # Create Download folder
        $DownloadFolder = join-path (get-location) "TabularEditor"
        new-item -type directory -path $DownloadFolder -Force
        Write-Host "Create download location: $($DownloadFolder)" 

        # Download destination (root of PowerShell script execution path):
        $DownloadFile = join-path $DownloadFolder "TabularEditor.zip"
        Write-Host "Save download as: $($DownloadFile)"
        
        # Download from GitHub:
        Write-Host "Downloading Tabular Editor from: $(DownloadLinkTabularEditor)" 
        Invoke-WebRequest -Uri $(DownloadLinkTabularEditor) -OutFile $DownloadFile
        
        # Unzip Tabular Editor portable, and then delete the zip file:
        Write-host "Extracting downloaded zip file"
        Expand-Archive -Path $DownloadFile -DestinationPath $DownloadFolder
        Write-host "Clean up download file"
        Remove-Item $DownloadFile
      displayName: '2 Download Tabular Editor'


To check whether the checkout step and the download/unzip step were successful, we have an optional step that shows a treeview of the agent. Once everything is working you can remove or comment out this step.
    ###################################
    # 3 Show treeview of agent
    ###################################
    - powershell: |
        tree "$(Pipeline.Workspace)" /F
      displayName: '3 Treeview of Pipeline.Workspace'


Now that we have successfully copied the models to the agent and downloaded Tabular Editor, we can build the JSON files into a BIM file which we can later on release as a dataset to a Power BI workspace.

This PowerShell step loops through all database.json files (the main/project file) that are on the agent. It retrieves the subfolder/parent folder name so that it can be used as a dataset name and then uses TabularEditor.exe to build the project. All BIM files are stored in the artifact staging folder.

    ###################################
    # 4 Build data models
    ###################################
    - powershell: |
        # Get repos folder with model files
        $ModelsFolder = Join-Path -Path $(Build.SourcesDirectory) -ChildPath "\Models\$($ReposWorkspaceFolder)\"
      
        # Loop through local repos folder to get all datamodels
        $AllDatamodels = Get-ChildItem -Path $ModelsFolder -Recurse -Filter database.json
        Write-Host "$($AllDatamodels.Count) model$(If ($AllDatamodels.Count -ne 1) {"s"}) found in $($ModelsFolder)`n"
        foreach ($DataModel in $AllDatamodels)
        {
          $Path = "$($DataModel.Fullname)"
          Write-Host "Building model $($Path)"

          # extract datasetname and workspacename from folder path of model
          $DatasetName = $Path | Split-Path -Parent | Split-Path -Leaf
          $WorkspaceName = $Path | Split-Path -Parent | Split-Path -Parent | Split-Path -Leaf

          # Create target path for staging folder
          $TargetFolder = Join-Path -Path $(Build.ArtifactStagingDirectory) -ChildPath "\Models\"
          $TargetFolder = Join-Path -Path $TargetFolder -ChildPath $WorkspaceName
          $TargetFolder = Join-Path -Path $TargetFolder -ChildPath $DatasetName

          Write-Host "Saving build model in $($TargetFolder)"
          # Build model into bim file by executing tabular editor
          $(Build.SourcesDirectory)\TabularEditor\TabularEditor.exe "$($DataModel.Fullname)" -B "$($TargetFolder)\Model.bim"
          Wait-Process -Name "TabularEditor"
        }
      displayName: '4 Build data models'


This is again the same treeview step, now to check the result of the build. In this case it shows all files and subfolders of the artifact staging folder. You can remove or comment out this step once everything works.
    ###################################
    # 5 Show treeview of agent
    ###################################
    - powershell: |
        tree "$(Build.ArtifactStagingDirectory)" /F
      displayName: '5 Treeview of Build.ArtifactStagingDirectory'


The last step of the build stage is creating an artifact of all (BIM) files in the artifact staging folder. This artifact will automatically be downloaded to the agent in the next stage.
    ###################################
    # 6 Publish artifact
    ###################################
    - task: PublishPipelineArtifact@1
      displayName: '6 Publish PBI models as artifact'
      inputs:
        targetPath: $(Build.ArtifactStagingDirectory)
        artifact: 'PBIModelsArtifact'
        publishLocation: 'pipeline'


Now it's time to do the actual deployment. We have 3 extra stages (Deploy to Dev, Acc and Prd), but we used a condition on them. If a change happened in the Acceptance branch then we will only execute the Deploy to Acc stage and skip the rest of the stages. In the pipeline runs overview it will look like this:

[Image: Stages of pipeline runs]


In this example the deployment method for Development is a little PowerShell script, and for Acceptance and Production we will use the Power BI Deployment Pipeline. Therefore those stages have different parameters. The details of those YAML files will be explained in the next blog post.
###################################
# Deploy Dev environment
###################################
- stage: DeployDev
  displayName: Deploy DEV
  variables:
    - group: PBIModelParamsDev
  pool:
    name: mySelfHostedAgent
    #vmImage: 'windows-latest'
  condition: eq(variables['Build.SourceBranchName'], 'Development')
  jobs:
    - template: DeployPBIModelDev.yml
      parameters:
        env: DEV
        SpApplicationId: $(SpApplicationId)
        SpClientSecret: $(SpClientSecret)
        SpTenantId: $(SpTenantId)
        DownloadLinkTabularEditor: $(DownloadLinkTabularEditor)
        
###################################
# Deploy Acc environment
###################################
- stage: DeployAcc
  displayName: Deploy ACC
  variables:
    - group: PBIModelParamsAcc
  pool:
    name: mySelfHostedAgent
    #vmImage: 'windows-latest'
  condition: eq(variables['Build.SourceBranchName'], 'Acceptance')
  jobs:
    - template: DeployPBIModelAccPrd.yml
      parameters:
        env: ACC
        DeployStagePowerBI: Test # PBI doesn't use Acceptance
        ServiceConnection: 'PowerBI'
        ReposWorkspaceFolder: $(ReposWorkspaceFolder)
        DeploymentPipelinePBI: $(DeploymentPipelinePBI)


###################################
# Deploy Prd environment
###################################
- stage: DeployPrd
  displayName: Deploy PRD
  variables:
    - group: PBIModelParamsPrd
  pool:
    name: mySelfHostedAgent
    #vmImage: 'windows-latest'
  condition: eq(variables['Build.SourceBranchName'], 'main')
  dependsOn: CreateModelArtifact
  jobs:
    - template: DeployPBIModelAccPrd.yml
      parameters:
        env: PRD
        DeployStagePowerBI: Production
        ServiceConnection: 'PowerBI'
        ReposWorkspaceFolder: $(ReposWorkspaceFolder)
        DeploymentPipelinePBI: $(DeploymentPipelinePBI)
Conclusion
In this post we showed how to build a Tabular Editor project with JSON files into one BIM file which we can deploy to the Development workspace in Power BI. You could do the same for the other environments, but in this example we will use the Power BI Deployment pipeline for Test and Production, which you will see in the next post.

Deploy a Power BI dataset via a DevOps - Setup

Case
I want to develop my Power BI dataset separately from my Power BI reports, so we started using Tabular Editor. This also allows us to work on the model with multiple people at the same time, because every object is a separate JSON file. But how can we automatically deploy these files using Azure DevOps?
[Image: Deploy your Power BI datasets with DevOps]

Solution
In this series of posts we will deploy the Tabular Editor project as a Power BI dataset via DevOps. We will deploy this to the development workspace. For the deployment from Dev to Test and Test to Prod we will use Power BI Deployment pipelines.


In this first post we are setting up the project in Azure DevOps as a preparation for publishing Power BI datasets. We will show you:
  • the folder structure of the repository
  • which add-on is handy in combination with Power BI Premium
  • how to authorize the Service Principal and create a Service Connection
  • which variables we use in the variable groups


1) Repository
You need to store the JSON files from your Tabular Editor project in a DevOps (or GitHub) repository. For this example we created the following folder structure, where we separated the models from the reports. Within the models folder we have one or more subfolders for each PBI workspace we want to target and then another subfolder level for each dataset/project.
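
A sketch of that layout (the workspace and dataset names are just examples; the YAML file names match the ones used later in this series):

CICD
└── YAML
    ├── BuildModel.yml
    ├── DeployPBIModelDev.yml
    └── DeployPBIModelAccPrd.yml
Models
└── Cloud Team                  (one subfolder per PBI workspace)
    └── MyDataset               (one subfolder per dataset/project)
        └── database.json       (plus a JSON file per model object)
Reports
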
[Image: Repository folder structure]


2) Power BI automation tools add-on
We will use Tabular Editor to deploy from the repository to the Dev workspace in Power BI, but for this example we will use Power BI Deployment pipelines to deploy from Dev to Tst(/Acc) and then to Prd. This is optional, because you need Power BI Premium for it, but it saves you a lot of time writing and maintaining code.

To call the Power BI Deployment pipelines from within Azure DevOps we need a (free) add-on from Microsoft called Power BI automation tools. Requesting to add the extension is just a few clicks, but for approving the installation of the extension you need enough rights at organization level in DevOps.
[Image: Power BI automation tools add-on for DevOps]


3) Service Principal
To do the release of the dataset to the Power BI workspace we need an Azure Active Directory Service Principal (SP).
  1. This SP needs to be in an Azure Active Directory Group that will be used within the Tenant Settings of Power BI to allow the usage of the Power BI REST APIs.
  2. This SP needs to get Contributor access in the workspace to do the deployment of the dataset.
  3. (optional) This SP needs to get access to a Power BI Deployment pipeline (= Power BI Premium).
3a) Allow service principal to use Power BI APIs
For this first step you need access to the Power BI admin portal to give the Service Principal access to the REST APIs of Power BI.
  • Log in to Power BI online and click on the gear icon in the upper right corner
  • Click on Admin portal in the menu to open the portal
  • Then click on Tenant settings (if you don't see this, continue after the next image)
  • Scroll down to the Developer settings and locate 'Allow service principal to use Power BI APIs'
Either allow the entire organization to use the REST APIs (not recommended) or specify an Azure Active Directory Group and make sure your Service Principal is a member of that group.
[Image: PBI Admin portal - Tenant Settings]

If you couldn't find the tenant settings in the Admin Portal of Power BI, then you need a Power BI administrator role. Your Office 365 administrator can either give you the role or do the steps above for you. If you are able to get this role, then you should probably use Azure AD Privileged Identity Management (PIM) to first get approval from a colleague to temporarily get this role, and also use the PBI audit log to record all logins to the admin portal and all changes in the tenant settings.
[Image: Not a PBI admin]


3b) Give permissions in Power BI Workspace
The next step is to give your Service Principal contributor rights in the PBI Workspaces where you want to deploy the datasets.
  • Go to your Power BI workspace and click on Access in the upper right corner
  • Search for your Service Principal by its name (ours is called PowerBI)
  • Select the right permission (contributor) and click on the Add button
  • Verify the changed list of permissions and then click on the Close button
[Image: Give SP permissions in your workspace]


3c) Give permissions to Deployment pipeline
The Power BI Deployment Pipelines are a premium feature to release Power BI components from a Development workspace to a Test workspace and then from Test to Production. These pipelines will save you writing (and maintaining) a lot of PowerShell code for Azure DevOps pipelines, but they also have 2 major disadvantages: they don't use a repository to store versions of your work and they can't deploy to the Development workspace. This is where the combination of DevOps and Power BI can help.

For this you need to create a Power BI Deployment Pipeline, connect it to your PBI workspaces (dev, tst & prd) and at last give the Service Principal from above access to your newly created Deployment pipeline. At the moment the only available role is Admin.
[Image: Give SP access to PBI deployment pipeline]


4) Service Connection Power BI
Create an Azure DevOps Service Connection. The type should be a Power BI Service Connection which can only be added once the add-on of step 2 is installed. Use the Service Principal of step 3 for this.
  • Go to the Project Settings of your DevOps project (bottom left)
  • Find Service connections under Pipelines
  • Click New service connection (upper right)
  • Choose Power BI and click next
  • Fill in the details of your Service Principal of step 3
[Image: Power BI Service Connection (via add-on)]


5) Library - Variable groups
Before we start writing the YAML (and some PowerShell) code, we need to create some variable groups under Pipelines - Library. We have a general variable group with a couple of general variables that don't vary per environment, like the download URL for Tabular Editor or the SP properties. We also have three variable groups (one for each environment) that contain variables that do have different values per environment.
[Image: Create Variable groups (1 general and 3 for environments)]

[Image: General Variable Group]

General Variable Group
  • DeploymentPipelinePBI
    The name of your Power BI Deployment Pipeline for publishing from Dev to Tst and Tst to Prd
  • DownloadLinkTabularEditor
    The URL of the zipped portable version of Tabular Editor. See all versions.
  • ServiceConnection
    The name of the Azure DevOps Service Connection created in step 4
  • SpApplicationId
    The application id of the SP from step 3
  • SpClientSecret
    The client secret of the SP from step 3 (change type to secret)
  • SpTenantId
    The tenant id of your Azure Active Directory. You can also find this in Power BI online in the ?-menu under About Power BI; in the popup you see the Tenant URL with the ID at the end
  • ReposWorkspaceFolder
    This contains the name of the workspace folder in the repository ("Cloud Team" in the repository picture) and it should also be the (start of the) workspace name in Power BI. We will add something behind it to distinguish dev from tst and prd. So in the repository picture you see "Cloud Team" and for Development the actual workspace name will be "Cloud Team - dev" (see next variable group)
[Image: Development Variable Group]

Development Variable Group
  • PowerBiWorkspaceSuffix
    The repository contains the name of the workspace, but for this example we added a suffix after it to distinguish dev from tst and prd
If you don't want to use the Power BI Deployment Pipelines, then just clone this variable group for test/acceptance and production. If you do want to use the Power BI Deployment Pipelines, then you don't need variable groups for test/acceptance and production.

Conclusion
In this first post we did all the preparation work for DevOps and for the Service Principal. In the next blog post we will build the project and make it available as an artifact in DevOps to release it to the Development workspace in Power BI. In the third post we will show two different ways of deployment.