Showing posts with label POWERSHELL. Show all posts

Thursday 26 January 2023

Deploy a Power BI dataset via DevOps - Build

Case
I want to develop my Power BI datasets separately from my Power BI reports, so we started using Tabular Editor. This also allows multiple people to work on the model at the same time because every object is stored as a separate JSON file. But how can we automatically deploy these files using Azure DevOps?
Deploy your Power BI datasets with DevOps

Solution
In this series of posts we will deploy the Tabular Editor project as a Power BI dataset via DevOps. We will deploy it to the development workspace. For the deployment from Dev to Test and from Test to Prod we will use Power BI Deployment pipelines.


In this second post we will create the YAML file that builds the Tabular Editor project stored in the repository. The file we used is called BuildModel.yml and it is stored in the repository under CICD\YAML\BuildModel.yml. There are two additional YAML files in that same folder; those will be explained in the next post, but they are already referenced in some of the stages below.
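
The repository layout used in this example looks roughly like this (the dataset name MyDataset is a hypothetical placeholder; the two deployment YAML files are covered in the next post):
CICD
└───YAML
        BuildModel.yml
        DeployPBIModelDev.yml
        DeployPBIModelAccPrd.yml
Models
└───myPowerBIWorkspace
    └───MyDataset
            database.json
            (one JSON file per table, role, etc.)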

Repository folder structure

1) BuildModel.yml
The first YAML file is BuildModel.yml, which builds the JSON files of the Tabular Editor project into a deployable *.bim file (the dataset) and then calls the deployment YAML files to do the actual deployment.

First we add the general variable group (mentioned in the previous post) to the YAML file. This allows us to use those variables in the YAML and PowerShell code later on.
###################################
# General Variables
###################################
variables:
  - group: PBIModelParamsGen


Secondly we need to determine when to trigger this pipeline. This part of course depends heavily on your branch strategy. In this example it triggers on changes in multiple branches, but only for models in a certain Power BI workspace folder. Note that the path is hardcoded and can't use variables from your variable groups.
###################################
# When to create a pipeline run
###################################
trigger:
  branches:
    include:
    - Development
    - Acceptance
    - main
  paths:
    include:
    - Models/myPowerBIWorkspace/*


The next step is describing the first stage of the pipeline, where we build the model and get it ready for publishing. The important property is pool, where you either configure a self-hosted agent (via name) or a Microsoft-hosted agent (via vmImage). If you are using a self-hosted agent then the clean: all setting helps you to clean up your agent before you start.
stages:
###################################
# Create Artifact of datamodels
###################################
- stage: CreateModelArtifact
  displayName: Create PBI Model Artifact

  jobs:
  - job: CreateModel
    displayName: 'Create Model'
    workspace:
      clean: all
    pool:
      name: mySelfHostedAgent
      # vmImage: 'windows-latest' #'ubuntu-latest'
    steps:


The first real task is retrieving all the files from the repository. After this step your datasets/models will be available on the agent itself. This is necessary for the build step.
    ###################################
    # 1 Retrieve Repository
    ###################################
    - checkout: self
      displayName: '1 Retrieve Repository'
      clean: true


The next task is downloading and unzipping the Tabular Editor tool. If you have a self-hosted agent then you could also choose to download/install it on your virtual machine. However, installing it via YAML makes it very easy to switch to a newer version just by changing the 'DownloadLinkTabularEditor' variable in the general variable group (see previous post). That variable should be filled with the download URL of the portable/zip file: https://github.com/TabularEditor/TabularEditor/releases/
    ###################################
    # 2 Download Tabular Editor
    ###################################
    - powershell: |
        # Download URL for Tabular Editor portable:
        
        # Create Download folder
        $DownloadFolder = join-path (get-location) "TabularEditor"
        new-item -type directory -path $DownloadFolder -Force
        Write-Host "Create download location: $($DownloadFolder)" 

        # Download destination (root of PowerShell script execution path):
        $DownloadFile = join-path $DownloadFolder "TabularEditor.zip"
        Write-Host "Save download as: $($DownloadFile)"
        
        # Download from GitHub:
        Write-Host "Downloading Tabular Editor from: $(DownloadLinkTabularEditor)" 
        Invoke-WebRequest -Uri $(DownloadLinkTabularEditor) -OutFile $DownloadFile
        
        # Unzip Tabular Editor portable, and then delete the zip file:
        Write-host "Extracting downloaded zip file"
        Expand-Archive -Path $DownloadFile -DestinationPath $DownloadFolder
        Write-host "Clean up download file"
        Remove-Item $DownloadFile
      displayName: '2 Download Tabular Editor'


To check whether the checkout step and the download/unzip step were successful we have an optional step that shows a treeview of the agent. Once everything is working you can remove or comment out this step.
    ###################################
    # 3 Show treeview of agent
    ###################################
    - powershell: |
        tree "$(Pipeline.Workspace)" /F
      displayName: '3 Treeview of Pipeline.Workspace'


Now that we have successfully copied the models to the agent and downloaded Tabular Editor, we can build the JSON files into a BIM file which we can later release as a dataset to a Power BI workspace.

This PowerShell step loops through all database.json files (the main/project file) that are on the agent. It retrieves the subfolder/parent folder name so that it can be used as the dataset name and then uses TabularEditor.exe to build the project. All BIM files are stored in the artifact staging folder.

    ###################################
    # 4 Build data models
    ###################################
    - powershell: |
        # Get repos folder with model files
        $ModelsFolder = Join-Path -Path $(Build.SourcesDirectory) -ChildPath "\Models\$($ReposWorkspaceFolder)\"
      
        # Loop through local repos folder to get all datamodels
        $AllDatamodels = Get-ChildItem -Path $ModelsFolder -Recurse -Filter database.json
        Write-Host "$($AllDatamodels.Count) model$(If ($AllDatamodels.Count -ne 1) {"s"}) found in $($ModelsFolder)`n"
        foreach ($DataModel in $AllDatamodels)
        {
          $Path = "$($DataModel.Fullname)"
          Write-Host "Building model $($Path)"

          # extract datasetname and workspacename from folder path of model
          $DatasetName = $Path | Split-Path -Parent | Split-Path -Leaf
          $WorkspaceName = $Path | Split-Path -Parent | Split-Path -Parent | Split-Path -Leaf

          # Create target path for staging folder
          $TargetFolder = Join-Path -Path $(Build.ArtifactStagingDirectory) -ChildPath "\Models\"
          $TargetFolder = Join-Path -Path $TargetFolder -ChildPath $WorkspaceName
          $TargetFolder = Join-Path -Path $TargetFolder -ChildPath $DatasetName

          Write-Host "Saving build model in $($TargetFolder)"
          # Build model into bim file by executing tabular editor
          $(Build.SourcesDirectory)\TabularEditor\TabularEditor.exe "$($DataModel.Fullname)" -B "$($TargetFolder)\Model.bim"
          Wait-Process -Name "TabularEditor"
        }
      displayName: '4 Build data models'


This is again the same treeview step, now used to check the result of the build. In this case it shows all files and subfolders of the artifact staging folder. You can remove or comment out this step once everything works.
    ###################################
    # 5 Show treeview of agent
    ###################################
    - powershell: |
        tree "$(Build.ArtifactStagingDirectory)" /F
      displayName: '5 Treeview of Build.ArtifactStagingDirectory'


The last step of the build stage is creating an artifact of all (BIM) files in the artifact staging folder. This artifact will automatically be downloaded to the agent in the next stage.
    ###################################
    # 6 Publish artifact
    ###################################
    - task: PublishPipelineArtifact@1
      displayName: '6 Publish PBI models as artifact'
      inputs:
        targetPath: $(Build.ArtifactStagingDirectory)
        artifact: 'PBIModelsArtifact'
        publishLocation: 'pipeline'


Now it's time to do the actual deployment. We have three extra stages (Deploy to Dev, Acc and Prd), but we added a condition to each of them. If a change happened in the Acceptance branch then only the Deploy ACC stage is executed and the other deployment stages are skipped. In the pipeline runs overview it will look like this:

Stages of pipeline runs

In this example the deployment method for Development is a little PowerShell script and for Acceptance and Production we will use the Power BI Deployment Pipeline. Therefore those stages have different parameters. The details of those YAML files will be explained in the next blog post.
###################################
# Deploy Dev environment
###################################
- stage: DeployDev
  displayName: Deploy DEV
  variables:
    - group: PBIModelParamsDev
  pool:
    name: mySelfHostedAgent
    #vmImage: 'windows-latest'
  condition: eq(variables['Build.SourceBranchName'], 'Development')
  jobs:
    - template: DeployPBIModelDev.yml
      parameters:
        env: DEV
        SpApplicationId: $(SpApplicationId)
        SpClientSecret: $(SpClientSecret)
        SpTenantId: $(SpTenantId)
        DownloadLinkTabularEditor: $(DownloadLinkTabularEditor)
        
###################################
# Deploy Acc environment
###################################
- stage: DeployAcc
  displayName: Deploy ACC
  variables:
    - group: PBIModelParamsAcc
  pool:
    name: mySelfHostedAgent
    #vmImage: 'windows-latest'
  condition: eq(variables['Build.SourceBranchName'], 'Acceptance')
  jobs:
    - template: DeployPBIModelAccPrd.yml
      parameters:
        env: ACC
        DeployStagePowerBI: Test # PBI doesn't use Acceptance
        ServiceConnection: 'PowerBI'
        ReposWorkspaceFolder: $(ReposWorkspaceFolder)
        DeploymentPipelinePBI: $(DeploymentPipelinePBI)


###################################
# Deploy Prd environment
###################################
- stage: DeployPrd
  displayName: Deploy PRD
  variables:
    - group: PBIModelParamsPrd
  pool:
    name: mySelfHostedAgent
    #vmImage: 'windows-latest'
  condition: eq(variables['Build.SourceBranchName'], 'main')
  dependsOn: CreateModelArtifact
  jobs:
    - template: DeployPBIModelAccPrd.yml
      parameters:
        env: PRD
        DeployStagePowerBI: Production
        ServiceConnection: 'PowerBI'
        ReposWorkspaceFolder: $(ReposWorkspaceFolder)
        DeploymentPipelinePBI: $(DeploymentPipelinePBI)
Conclusion
In this post we showed how to build a Tabular Editor project with JSON files into one BIM file, which we can deploy to the Development workspace in Power BI. You could do the same for the other environments, but in this example we will use the Power BI Deployment pipeline for Test and Production, which you will see in the next post.

Saturday 31 December 2022

Cleanup Synapse before deployment

Case
We are using the Synapse workspace deployment add-on for deploying Synapse to the next environment, but when I remove a pipeline/dataset/linked service/trigger in development it doesn't get deleted in Test, Acceptance or Production. There is a pre- and post-deployment script for Data Factory by Microsoft, but where is the Synapse version of it?
Clean Synapse Workspace before deployment

Update: use DeleteArtifactsNotInTemplate: true in the deployment task to avoid the PowerShell script below (a sketch of that option is included near the end of this post).

Solution
The deployment add-on just deploys a new version over the existing version. Therefore deleted parts will remain in your workspace. For most parts this is just ugly and annoying, but if obsolete triggers are still executing pipelines it could screw up your ETL process.

Below you will find a first version of a PowerShell script that first removes all pipelines, datasets, linked services and triggers before deploying a new version of your Synapse workspace from the repository.

Note: pipelines that are called by other pipelines can't be deleted, so you first need to delete the parent pipeline before you can delete the child pipeline. The script skips those child pipelines and continues with the rest. After this first delete iteration a lot of parent pipelines won't exist anymore, which allows you to remove the child pipelines in a second iteration. This is done in a loop and that loop stops after 100 iterations, so don't create a monstrous tree of pipelines calling each other (and especially don't create loops of pipelines calling each other). The same trick is used for linked services: if you, for example, have a Key Vault linked service that is used in another linked service, then you first need to delete that second linked service before you can delete the Key Vault linked service.

param (

   [Parameter (Mandatory = $true, HelpMessage = 'Synapse name')]
   [ValidateNotNullOrEmpty()]
   [string] $WorkspaceName,
   
   [Parameter (Mandatory = $true, HelpMessage = 'Resourcegroup name')]
   [ValidateNotNullOrEmpty()]
   [string] $ResourceGroupName 
)

[string] $WorkspaceDefaultSqlServer = "$($WorkspaceName)-WorkspaceDefaultSqlServer"
[string] $WorkspaceDefaultSqlStorage = "$($WorkspaceName)-WorkspaceDefaultStorage"


#######################################################
# 1) Checking for resource locks and removing them
#######################################################
Write-Output "========================================"
Write-Output "1) Getting resource locks"
# Getting all locks on the Azure Synapse Workspace
$lock = Get-AzResourceLock -ResourceGroupName $ResourceGroupName -ResourceName $WorkspaceName -ResourceType "Microsoft.Synapse/workspaces"

# Looping through all locks to remove them one by one
Write-Output "========================================"
Write-Output "Remove resource locks"
if($null -ne $lock)
{
    $lock | ForEach-Object -process {
        Write-Output "Removing Lock Id: $($_.LockId)"
        # Remove lock
        Remove-AzResourceLock -LockId $_.LockId -Force
    }
}


#######################################################
# 2) Stopping and removing all triggers
#######################################################
Write-Output "========================================"
Write-Output "2) Remove triggers"
# Getting all triggers from Synapse
$triggers = Get-AzSynapseTrigger -WorkspaceName $WorkspaceName
Write-Output "Found $($triggers.Count) triggers"

# Stopping all triggers before deleting them
$triggers | ForEach-Object -process { 
    Write-Output "Stopping trigger $($_.name)"
    try {
        # Trying to stop each trigger
        Stop-AzSynapseTrigger -WorkspaceName $WorkspaceName -Name $($_.name) -ErrorAction Stop
    }
    catch {
        if ($_.Exception.Message -eq "{}") {
            Write-Output "Trigger stopped"
           # $_.Exception
        }
        else {
            Write-Output "Throw"
            Throw $_
        }
    }
    # Remove trigger
    Remove-AzSynapseTrigger -Name $_.name -WorkspaceName $WorkspaceName -Force
}


#######################################################
# 3) Removing all pipelines
#######################################################
Write-Output "========================================" 
Write-Output "3) Remove pipelines"
# Getting all pipelines from Synapse
$pipelines = Get-AzSynapsePipeline -WorkspaceName $WorkspaceName | Sort-Object -Property id
Write-Output "Found $($pipelines.Count) pipelines"

# Trying to delete all pipelines. If a pipeline is still referenced
# by an other pipeline it will continue to remove other pipelines 
# before trying to remove it again... max 100 times. So don't create
# chains of pipelines that are too long
[int] $depthCount = 0
while ($pipelines.Count -gt 0 -and $depthCount -lt 100)
{
    Write-Output "$($pipelines.Count) pipelines left"
    $pipelines | ForEach-Object -process { 
        Write-Output "Trying to delete pipeline $($_.name)"
        Remove-AzSynapsePipeline -Name $_.name -WorkspaceName $WorkspaceName -Force -ErrorAction SilentlyContinue
    }
    Start-Sleep 2 
    $depthCount += 1
    $pipelines = Get-AzSynapsePipeline -WorkspaceName $WorkspaceName
}
Write-Output "Depthcount: $depthCount"
if ($depthCount -eq 100)
{
    throw "Depthcount is too high!"
}


#######################################################
# 4) Removing all notebooks
#######################################################
Write-Output "========================================"
Write-Output "4) Remove notebooks"
# Getting all notebooks from Synapse
$notebooks = Get-AzSynapseNotebook -WorkspaceName $WorkspaceName
Write-Output "Found $($notebooks.Count) notebooks"

# Loop through all notebooks to delete them
$notebooks | ForEach-Object -process {
    Write-Output "Deleting notebooks $($_.Name)"
    Remove-AzSynapseNotebook -Name $($_.Name) -WorkspaceName $WorkspaceName -Force
}


#######################################################
# 5) Removing all SQL scripts
#######################################################
Write-Output "========================================"
Write-Output "5) Remove SQL scripts"
# Getting all scripts from Synapse
$sqlscripts = Get-AzSynapseSqlScript -WorkspaceName $WorkspaceName
Write-Output "Found $($sqlscripts.count) SQL-scripts"

# Loop through all SQL scripts to delete them
$sqlscripts | ForEach-Object -Process {
    Write-Output "Deleting SQL-script $($_.Name)"
    Remove-AzSynapseSqlScript -Name $($_.Name) -WorkspaceName $WorkspaceName -Force
}


#######################################################
# 6) Removing all datasets
#######################################################
Write-Output "========================================"
Write-Output "6) Remove datasets"
# Getting all datasets from Synapse
$datasets = Get-AzSynapseDataset -WorkspaceName $WorkspaceName
Write-Output "Found $($datasets.Count) datasets"

# Loop through all datasets to delete them
$datasets | ForEach-Object -process { 
    Write-Output "Deleting dataset $($_.name)"
    Remove-AzSynapseDataset -Name $_.name -WorkspaceName $WorkspaceName -Force
}


#######################################################
# 7) Removing all linked services
#######################################################
Write-Output "========================================"
Write-Output "7) Collecting Linked services"
# Getting all linked services from Synapse, except the two default ones
$lservices = Get-AzSynapseLinkedService -WorkspaceName $WorkspaceName | Where-Object {($_.Name -ne $WorkspaceDefaultSqlServer -and  $_.Name -ne $WorkspaceDefaultSqlStorage) } 
Write-Output "Found $($lservices.Count) linked services"

# Trying to delete all linked services. If a linked service is still
# referenced by an other linked service it will continue to remove 
# other linked services before trying to remove it again... 
# max 100 times. Example: KeyVault linked services
$depthCount = 0
while ($lservices.Count -gt 0 -and $depthCount -lt 100)
{
    Write-Output "$($lservices.Count) linked services left"
    $lservices | ForEach-Object -process { 
        Write-Output "Trying to delete linked service $($_.name)"
        Remove-AzSynapseLinkedService -Name $_.name -WorkspaceName $WorkspaceName -Force -ErrorAction Continue
    }

    Start-Sleep 2 
    $depthCount += 1
    $lservices = Get-AzSynapseLinkedService -WorkspaceName $WorkspaceName | Where-Object {($_.Name -ne $WorkspaceDefaultSqlServer -and  $_.Name -ne $WorkspaceDefaultSqlStorage) }
}
Write-Output "Depthcount: $depthCount"
if ($depthCount -eq 100)
{
    throw "Depthcount is too high!"
}
Write-Output "========================================"
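
For a quick test outside DevOps you could also run the script from a local PowerShell session; the workspace and resource group names below are hypothetical.
# Requires the Az.Synapse and Az.Resources modules; assumes the script above is saved as ClearSynapse.ps1
Connect-AzAccount
.\ClearSynapse.ps1 -WorkspaceName "mysynapse-dev" -ResourceGroupName "rg-synapse-dev"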


You need to store this PowerShell file in the repository as ClearSynapse.ps1. In this case we created a CICD folder in the root of the repository and within that folder a PowerShell subfolder for all our PowerShell files. Then you can call this script in your YAML pipeline just before the deployment part. Make sure your service connection (Service Principal) has enough rights within the workspace. For the first post we used the Synapse Artifact Publisher role to minimize access, but for running this script your Service Principal needs more: Synapse Administrator.
            ###################################
            # 4 Cleanup Synapse
            ###################################
            - task: AzurePowerShell@5
              displayName: '4 Cleanup Synapse'
              inputs:
                azureSubscription: ${{ parameters.ServiceConnection }}
                scriptType: filePath
                scriptPath: $(Pipeline.Workspace)\s\CICD\Powershell\ClearSynapse.ps1
                scriptArguments:
                  -WorkspaceName ${{ parameters.TargetWorkspaceName }} `
                  -ResourceGroupName ${{ parameters.ResourceGroupName }}
                azurePowerShellVersion: latestVersion
                pwsh: true
                
            ###################################
            # 5 Validate and Deploy Synapse
            ###################################
            - task: Synapse workspace deployment@2
              displayName: '5 Validate and deploy Synapse'
              inputs:
                operation: validateDeploy
                ArtifactsFolder: '$(Pipeline.Workspace)/SynapseArtifact'
                azureSubscription: ${{ parameters.ServiceConnection }}
                ResourceGroupName: ${{ parameters.ResourceGroupName }}
                TargetWorkspaceName: ${{ parameters.TargetWorkspaceName }}
                OverrideArmParameters: '
                  -LS_AKV_MyKeyVault_properties_typeProperties_baseUrl                  https://${{ parameters.KeyVaultName }}.vault.azure.net/
                  '
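
As mentioned in the update at the top of this post, newer versions of the Synapse workspace deployment task can remove obsolete artifacts for you, which makes the PowerShell cleanup of step 4 optional. A minimal sketch of step 5 with that option, assuming the task version installed in your organization supports the DeleteArtifactsNotInTemplate input:
            ###################################
            # Alternative: let the task clean up
            ###################################
            - task: Synapse workspace deployment@2
              displayName: '5 Validate and deploy Synapse (incl. cleanup)'
              inputs:
                operation: validateDeploy
                ArtifactsFolder: '$(Pipeline.Workspace)/SynapseArtifact'
                azureSubscription: ${{ parameters.ServiceConnection }}
                ResourceGroupName: ${{ parameters.ResourceGroupName }}
                TargetWorkspaceName: ${{ parameters.TargetWorkspaceName }}
                DeleteArtifactsNotInTemplate: true
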
Conclusion
In this post you learned how to clean up Synapse with a little PowerShell script. The script works fine, but it is a little rough because it just deletes all basic parts of your workspace (pipelines, datasets, linked services and triggers). A next/nicer version would only delete the items that are in the Synapse workspace but not in the repository (after the deployment).

Deleting stuff before deploying new stuff also makes it almost mandatory to use at least three environments, because when your deployment fails you are left with an almost empty Synapse workspace. So an extra environment between development and production will prevent most deployment screw-ups.

Thank you Walter ter Maten for improving the script with the delete iterations.

Monday 7 November 2022

Create Data Lake containers and folders via DevOps

Case
I need a process to create Azure Data Lake containers throughout the DTAP environments of my Azure Data Platform. Doing this manually is not an option because we want to minimize owner and contributor access to the Data Lake of acceptance and production, but Synapse and Data Factory don't have a standard activity to create ADL containers. How can we automatically create Azure Data Lake containers (and folders)?
Storage Account (datalake) containers

Solution
One option is to use a PowerShell script that is executed by the Custom activity in combination with an Azure Batch service, or an Azure Automation runbook with the same PowerShell script that is executed by a Web(hook) activity.

However, since you probably don't need to create new containers during every (ADF/Synapse) pipeline run, we suggest doing this via an Azure DevOps pipeline as part of your CICD process with the same PowerShell script. You could either create a separate CICD pipeline for it or integrate it in your Synapse or ADF pipeline.

The example below creates containers and optionally also folders and subfolders within these containers. Note that Synapse and Data Factory will also create missing folders themselves with, for example, the Copy Data activity.

1) Repos folder structure
For this example we use a CICD folder in the repos with subfolders for PowerShell, YAML and Json.
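
A simplified sketch of that layout (the pipeline YAML file name is a hypothetical placeholder; the other file names are used later in this post):
CICD
├───Json
│       config_storage_account.json
├───PowerShell
│       SetStorageAccounts.ps1
└───YAML
        CreateContainers.yml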
Repos folder structure

2) JSON config
Because we don't want to hardcode the containers and folders, we use a JSON file as input for the PowerShell script. This JSON file is stored within the Json folder of the DevOps repository. We use the same JSON file for every environment, but you can of course create a separate file for each environment if you, for example, need different containers on production. Our file is called config_storage_account.json.

The folder array in this example is optional; when left empty no folders will be created. You can create subfolders within folders by separating them with a forward slash.
{
  "containers": {
    "dataplatform": ["folder1", "folder2/subfolder1", "folder2/subfolder2"],
    "SourceX": ["Actual", "History"],
    "SourceY": ["Actual", "History"],
    "SourceZ": []
  }
}
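
Since a typo in this file only surfaces during a pipeline run, you could validate the JSON locally first; ConvertFrom-Json throws an error on invalid JSON (the path below is an assumption based on the folder structure above).
# Quick local syntax check of the config file
Get-Content -Raw -Path ".\CICD\Json\config_storage_account.json" | ConvertFrom-Json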

3) PowerShell code
The PowerShell script called SetStorageAccounts.ps1 is stored in the PowerShell folder and contains three parameters:
  • ResourceGroupName - The name of the resource group where the storage account is located.
  • StorageAccountName - The name of the storage account
  • JsonLocation - The location of the json config file in the repos (see previous step)
It checks the existence of both the config file and the storage account. Then it loops through the containers from the config and, within that container loop, it loops through the folders of that specific container. For container names and folder paths it applies some small corrections for commonly made mistakes.

Note that the script will not delete containers and folders (or set authorizations on them). This is of course possible, but make sure to test it very thoroughly; even with testing, a human error in the config file is easily made and could cause a lot of data loss!
# This PowerShell will create the containers provided in the JSON file
# It does not delete or update containers and folders or set authorizations
param (
    [Parameter (Mandatory = $true, HelpMessage = 'Resource group name of the storage account.')]
    [ValidateNotNullOrEmpty()]
    [string] $ResourceGroupName,

    [Parameter (Mandatory = $true, HelpMessage = 'Storage account name.')]
    [ValidateNotNullOrEmpty()]
    [string] $StorageAccountName,

    [Parameter (Mandatory = $true, HelpMessage = 'Location of config_storage_account.json on agent.')]
    [ValidateNotNullOrEmpty()]
    [string] $JsonLocation
 )

# Combine path and file name for JSON file. The file name is hardcoded and the
# same for each environment. Create an extra parameters for the filename if
# you need different files/configurations per environment.
$path = Join-Path -Path $JsonLocation -ChildPath "config_storage_account.json"
Write-output "Extracting containernames from $($path)"


# Check existance of file path on the agent
if (Test-Path $path -PathType leaf) {
    
    # Get all container objects from JSON file
    $Config = Get-Content -Raw -Path $path | ConvertFrom-Json

    # Create containers array for looping
    $Config | ForEach-Object { 
        $Containers = $($_.containers) | Get-Member -MemberType NoteProperty | Select-Object -ExpandProperty Name
    }
    
    # Check Storage Account existance and get the context of it
    $StorageCheck = Get-AzStorageAccount -ResourceGroupName $ResourceGroupName -Name $StorageAccountName -ErrorAction SilentlyContinue        
    $context = $StorageCheck.Context

    # If Storage Account found
    if ($StorageCheck) {
        # Storage Account found
        Write-output "Storage account $($StorageAccountName) found"

        # Loop through container array and create containers if they don't exist
        foreach ($container in $containers) {
            # First a little cleanup of the container

            # 1) Change to lowercase
            $container = $container.ToLower()

            # 2) Trim accidental spaces
            $container = $container.Trim()


            # Check if container already exists
            Write-output "Checking existence of container $($container)"
            $ContainerCheck = Get-AzStorageContainer -Context $context -Name $container -ErrorAction SilentlyContinue 

            # If container exists
            if ($ContainerCheck) {
                Write-Output "Container $($container) already exists"
            }
            else {
                Write-Output "Creating container $($container)"
                New-AzStorageContainer -Name $container -Context $context | Out-Null
            }

            # Get container folders from JSON
            Write-Output "Retrieving folders from config"
            $folders = $Config.containers.$container
            Write-Output "Found $($folders.Count) folders in config for container $($container)"

            # Loop through container folders
            foreach ($folder in $folders) {
                # First a little cleanup of the folders

                # 1) Replace backslashes by a forward slash
                $path = $folder.Replace("\","/")

                # 2) Remove unwanted spaces
                $path = $path.Trim()
                $path = $path.Replace("/ ","/")
                $path = $path.Replace(" /","/")

                # 3) Check if path ends with a forward slash
                if (!$path.EndsWith("/")) {
                    $path = $path + "/"
                }
                    
                # Check if folder path exists
                $FolderCheck = Get-AzDataLakeGen2Item -FileSystem $container -Context $context -Path $path  -ErrorAction SilentlyContinue
                if ($FolderCheck) {
                    Write-Output "Path $($folder) exists in container $($container)"
                } else {
                    New-AzDataLakeGen2Item -Context $context -FileSystem $container -Path $path -Directory | Out-Null
                    Write-Output "Path $($folder) created in container $($container)"
                }
            }

        }
    } else {
        # Provided storage account not correct
        Write-Output "Storageaccount: $($StorageAccountName) not available, containers not setup."
    }
} else {
    # Path to JSON file incorrect
    Write-output "File $($path) not found, containers not setup."
}
4) YAML file
If you integrate this in your existing Data Factory or Synapse YAML pipeline then you only need to add one PowerShell step. Make sure you have a checkout step to copy the config and PowerShell file from the repository to the agent. You may also want to add a (temporary) treeview step to check the paths on your agent. This makes it easier to configure paths within your YAML code.
parameters:
  - name: SerCon
    displayName: Service Connection
    type: string
  - name: Env
    displayName: Environment
    type: string
    values: 
    - DEV
    - ACC
    - PRD
  - name: ResourceGroupName
    displayName: Resource Group Name
    type: string
  - name: StorageAccountName
    displayName: Storage Account Name
    type: string

jobs:
    - deployment: deploymentjob${{ parameters.Env }}
      displayName: Deployment Job ${{ parameters.Env }} 
      environment: Deploy to ${{ parameters.Env }}

      strategy:
        runOnce:
          deploy:
            steps:
            ###################################
            # 1 Check out repository to agent
            ###################################
            - checkout: self
              displayName: '1 Retrieve Repository'
              clean: true 

            ###################################
            # 2 Show environment and treeview
            ###################################
            - powershell: |
                Write-Output "Deploying Synapse in the ${{ parameters.Env }} environment"
                tree "$(Pipeline.Workspace)" /F
              displayName: '2 Show environment and treeview Pipeline_Workspace'

            ###################################
            # 3 Create containers in datalake
            ###################################
            - task: AzurePowerShell@5
              displayName: '3 Create data lake containers'
              inputs:
                azureSubscription: ${{ parameters.SerCon }}
                scriptType: filePath
                scriptPath: $(Pipeline.Workspace)\s\CICD\PowerShell\SetStorageAccounts.ps1
                scriptArguments:
                  -ResourceGroupName ${{ parameters.ResourceGroupName }} `
                  -StorageAccountName ${{ parameters.StorageAccountName }} `
                  -JsonLocation $(Pipeline.Workspace)\s\CICD\Json\
                azurePowerShellVersion: latestVersion
                pwsh: true

5) The result
Now it's time to run the YAML pipeline and check the Storage Account to see whether the containers and folders are created.
DevOps logs of creating containers and folders

Created data lake folders in container

Conclusion
In this post you learned how to create containers and folders in the Storage Account / Data Lake via a little PowerShell script and a DevOps pipeline, but you can also reuse this PowerShell script for the alternative solutions mentioned above.

Again a note about also deleting containers and folders: make sure to double-check the code, but also the procedures, to avoid human errors that could potentially lose a lot of data. You might want to set up soft deletes in your storage account to have a fallback scenario for screw-ups.

Tuesday 13 September 2022

Sending test messages to Azure Event Hubs

Case
I want to send dummy messages to Azure Event Hubs to test the streaming process before the actual service starts sending real messages.  Is there an easy way to send a bulk load of test messages to Azure Event Hubs?
Sending messages to Azure Event Hubs

Solution
Yes, we could for example use a little bit of PowerShell code to send dummy messages to Azure Event Hubs via a REST API. To do that we first need to collect some names and a key from Azure Event Hubs.

1)  Namespace and Event Hub name
Go to your Event Hubs Namespace in the Azure Portal and click on Event Hubs on the left side. In the top left corner you will find the Event Hubs Namespace (1) and in the list in the center of the page you will find the name of your Event Hub (2). Copy these names to your PowerShell editor.
Namespace and Event Hub name

2) Shared access policies Name and Key
Now click on your Event Hub in the list above and then click on Shared access policies in the left menu. If there is no policy then create one (Send is enough for testing). Then click on the policy to reveal the keys. Copy the Name (3) and one of the keys (4) to your PowerShell editor.
Shared access policies name and key

3) The script
Now you have all the things you need from your event hub. Time to do some PowerShell coding. The top part of the script creates a Shared Access Signature token (SAS token). This token is needed for authorization in the REST API. In this part of the script you will also need to specify the names and key from the previous two steps under EventHubs Parameters.

The second part sends a message via the REST API to your event hub. To make it a little more useful there is a loop that sends multiple messages with a pause between each message. Adjust the dummy message to your own needs by changing the column names and values. You can also specify the number of messages and the pause between each message.
####################################################################
# Create SAS TOKEN FOR AZURE EVENT HUBS
####################################################################
# EventHubs Parameters
$EventHubsNamespace = "bitools"
$EventHubsName = "myeventhub"
$SharedAccessPolicyName = "SendOnly"
$SharedAccessPolicyPrimaryKey = "1fhvzfOkVs+MxsZ/fakeZwrHTImD3YCCN7CGqYCAFN8kU="

# Create SAS Token
[Reflection.Assembly]::LoadWithPartialName("System.Web")| out-null
$URI = "$($EventHubsNamespace).servicebus.windows.net/$($EventHubsName)"
$Expires = ([DateTimeOffset]::Now.ToUnixTimeSeconds())+3600
$SignatureString = [System.Web.HttpUtility]::UrlEncode($URI)+ "`n" + [string]$Expires
$HMACSHA256 = New-Object System.Security.Cryptography.HMACSHA256
$HMACSHA256.key = [Text.Encoding]::ASCII.GetBytes($SharedAccessPolicyPrimaryKey)
$SignatureBytes = $HMACSHA256.ComputeHash([Text.Encoding]::ASCII.GetBytes($SignatureString))
$SignatureBase64 = [Convert]::ToBase64String($SignatureBytes)
$SASToken = "SharedAccessSignature sr=" + [System.Web.HttpUtility]::UrlEncode($URI) + "&sig=" + [System.Web.HttpUtility]::UrlEncode($SignatureBase64) + "&se=" + $Expires + "&skn=" + $SharedAccessPolicyName


####################################################################
# SEND DUMMY MESSAGES
####################################################################
# Message Parameters
$StartNumber = 1
$NumberOfMessages = 10
$MillisecondsToWait = 1000

# Determine URL and header
$RestAPI = "https://$($EventHubsNamespace).servicebus.windows.net/$($EventHubsName)/messages"

# API headers
$Headers = @{
            "Authorization"=$SASToken;
            "Content-Type"="application/atom+xml;type=entry;charset=utf-8";
            }

# Screenfeedback
Write-Host "Sending $($NumberOfMessages) messages to event hub [$($EventHubsName)] within [$($EventHubsNamespace)]"

# Loop to create X number of dummy messages
for($i = $StartNumber; $i -lt $NumberOfMessages+$StartNumber; $i++)
{
    # Create dummy message to sent to Azure Event Hubs
    $Body = "{'CallId':$($i), 'DurationInSeconds':$(Get-Random -Maximum 1000)}"

    # Screenfeedback
    Write-Host "Sending message nr $($i) and then waiting $($MillisecondsToWait) milliseconds"

    # execute the Azure REST API
    Invoke-RestMethod -Uri $RestAPI -Method "POST" -Headers $Headers -Body $Body

    # Wait a couple of milliseconds before sending next dummy message
    Start-Sleep -Milliseconds $MillisecondsToWait
}
When you run the PowerShell script with these parameters then 10 messages will be sent to your Event Hub.

Executing the PowerShell script

4) Check messages in the Event Hub
Now we can check the messages in your event hub. Go to your event hub (myeventhub in our case) and click on Process data. Then find the Stream Analytics Query editor and execute the query.
Stream Analytics Query editor

Here you can see the contents of your 10 dummy messages with a very basic query.
The result

Conclusion
In this post you learned how to test the setup of your Event Hub. If you for example also connect Azure Stream Analytics and a Power BI streaming dataset with a report and dashboard, then you can see the messages arriving live in your Power BI dashboard. This will be shown in a separate post.

Note that the script sends the messages one by one with a fixed pause between each message. You could for example also make the pause random, or even execute the script in multiple PowerShell ISE editors at once to simulate a more random load of arriving messages.

Do you have an easier way to send test messages or a more sophisticated script? Then please share your knowledge in the comments below.

Thursday 18 November 2021

ADF Release - Use script to enable certain Triggers

Case
During deployment of Azure Data Factory (ADF) via Azure DevOps pipelines I want to make sure that a certain trigger is only executed on Production and not on the lower environments like acceptance or test. How can we accomplish that without any manual operations? 
ADF Trigger

Solution
This is possible with an extra PowerShell step. The standard deployment stage consists of three steps:
  • a pre-deployment script that stops all triggers.
  • the actual deployment
  • a post-deployment script that starts all triggers and cleans up old parts.
You could adjust the standard pre- and post-deployment PowerShell script from Microsoft, or create an additional PowerShell script if you don't want to mess around with the standard script.

1) PowerShell
Below you will find that additional script. Feel free to merge it with the standard script. The PowerShell file should be stored in the repository in the \CICD\PowerShell folder (see the setup post).
PowerShell file for setting trigger status

The new PowerShell script has five parameters which will be provided by the YAML pipeline (or release pipeline):
  1. DataFactoryName
    [string] Name of your Data Factory
  2. DataFactoryResourceGroup
    [string] Name of the Resource Group holding your ADF
  3. DataFactorySubscriptionId
    [string] Guid of the Azure Subscription hosting your ADF
  4. DisableAllTriggers
    [boolean] True or false indicating whether all triggers should be disabled (except triggers mentioned in next parameter)
  5. EnabledTriggers
    [string] Comma-separated list with trigger names that should be enabled: "trigger1,trigger2"
The script consists of three parts. The first part checks all parameters; if one of them is incorrect then the script fails and stops. The second part is the optional disabling of all triggers (except the ones that we need enabled), and the last part checks the list of triggers that should be enabled. If they are still disabled they will be enabled.
param
(
    [parameter(Mandatory = $true)] [String] $DataFactoryName,
    [parameter(Mandatory = $true)] [String] $DataFactoryResourceGroup,
    [parameter(Mandatory = $true)] [String] $DataFactorySubscriptionId,
    [parameter(Mandatory = $false)] [Bool] $DisableAllTriggers = $true,
    [parameter(Mandatory = $true)] [String] $EnabledTriggers # comma separated list
)



##############################################
# Check provided information
##############################################
$ErrorActionPreference = "Stop"

# Set the provided subscription as active (fails if it doesn't exist)
Write-Host "Checking existence of Subscription Id [$($DataFactorySubscriptionId)]."
$Subscription = Get-AzSubscription -SubscriptionId $DataFactorySubscriptionId `
                                   -WarningAction Ignore
Write-Host "- Subscription [$($Subscription.Name)] found."
Set-AzContext -Subscription $DataFactorySubscriptionId `
              -WarningAction Ignore > $null
Write-Host "- Subscription [$($Subscription.Name)] is active."


# Checking whether resource group exists (fails with non existing)
Write-Host "Checking existence of Resource Group [$($DataFactoryResourceGroup)]."
Get-AzResourceGroup -Name $DataFactoryResourceGroup > $null
Write-Host "- Resource Group [$($DataFactoryResourceGroup)] found."


# Checking whether provided data factory exists (fails with non existing)
Write-Host "Checking existence of Data Factory [$($DataFactoryName)]."
Get-AzDataFactoryV2 -ResourceGroupName $DataFactoryResourceGroup `
                    -Name $DataFactoryName > $null
Write-Host "- Data Factory [$($DataFactoryName)] found."


# Checking provided triggernames, first split into array
$EnabledTriggersArray = $EnabledTriggers.Split(",")
Write-Host "Checking existence of ($($EnabledTriggersArray.Count)) provided trigger names."


# Loop through all provided triggernames
foreach ($EnabledTrigger in $EnabledTriggersArray)
{ 
    # Get Trigger by name
    $CheckTrigger = Get-AzDataFactoryV2Trigger -ResourceGroupName $DataFactoryResourceGroup `
                                               -DataFactoryName $DataFactoryName `
                                               -Name $EnabledTrigger `
                                               -ErrorAction Ignore # To be able to provide more detailed error

    # Check if trigger was found
    if (!$CheckTrigger)
    {
        throw "Trigger $($EnabledTrigger) not found in data factory $($DataFactoryName) within resource group $($DataFactoryResourceGroup)"
    }
}
Write-Host "- All ($($EnabledTriggersArray.Count)) provided trigger names found in data factory $($DataFactoryName) within resource group $($DataFactoryResourceGroup)"



##############################################
# Disable triggers
##############################################
# Check if all trigger should be disabled
if ($DisableAllTriggers)
{
    # Get all enabled triggers and stop them (unless they should be enabled)
    Write-Host "Getting all enabled triggers that should be disabled."
    $CurrentTriggers = Get-AzDataFactoryV2Trigger -ResourceGroupName $DataFactoryResourceGroup `
                                                   -DataFactoryName $DataFactoryName `
                       | Where-Object {$_.RuntimeState -ne 'Stopped'} `
                       | Where-Object {$EnabledTriggersArray.Contains($_.Name) -eq $false}

    # Loop through all found triggers
    Write-Host "- Number of triggers to disable: $($CurrentTriggers.Count)."
    foreach ($CurrentTrigger in $CurrentTriggers)
    {
        # Stop trigger
        Write-Host "- Stopping trigger [$($CurrentTrigger.Name)]."
        Stop-AzDataFactoryV2Trigger -ResourceGroupName $DataFactoryResourceGroup -DataFactoryName $DataFactoryName -Name $CurrentTrigger.Name -Force > $null
    }
}



##############################################
# Enable triggers
##############################################
# Loop through provided triggernames and enable them
Write-Host "Enable all ($($EnabledTriggersArray.Count)) provided triggers."
foreach ($EnabledTrigger in $EnabledTriggersArray)
{                   
    # Get trigger details
    $CheckTrigger = Get-AzDataFactoryV2Trigger -ResourceGroupName $DataFactoryResourceGroup `
                                               -DataFactoryName $DataFactoryName `
                                               -Name $EnabledTrigger

    # Check status of trigger
    if ($CheckTrigger.RuntimeState -ne "Started")
    {
        Write-Host "- Trigger [$($EnabledTrigger)] starting"
        Start-AzDataFactoryV2Trigger -ResourceGroupName $DataFactoryResourceGroup `
                                     -DataFactoryName $DataFactoryName `
                                     -Name $EnabledTrigger `
                                     -Force > $null
    }
    else
    {
        Write-Host "- Trigger [$($EnabledTrigger)] already started"
    }
}
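
For a quick local test you could call the script like this; note the comma-separated format of the EnabledTriggers parameter (all names below are hypothetical).
# Requires the Az.DataFactory module and an active Connect-AzAccount session
.\SetTriggers.ps1 -DataFactoryName "MyDataFactory" `
                  -DataFactoryResourceGroup "MyResourceGroup" `
                  -DataFactorySubscriptionId "00000000-0000-0000-0000-000000000000" `
                  -DisableAllTriggers $true `
                  -EnabledTriggers "prd_daily_4am,prd_daily_1pm"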

2) YAML Pipeline
You can now extend the existing YAML pipeline with an extra step. Make sure that all parameters for this script are available as variables in the variable group (under Pipelines, Library) and make sure to pass them to the second YAML pipeline as parameters. If you followed the previous blogs then you only need to add EnabledTriggers as a variable and as a YAML parameter.
          ###################################
          # Enable certain triggers and disable rest
          ###################################
          - task: AzurePowerShell@5
            displayName: '6 Enable certain triggers and disable rest'
            inputs:
              azureSubscription: 'sc_adf-devopssp'
              pwsh: true
              azurePowerShellVersion: LatestVersion
              scriptType: filePath
              scriptPath: '$(Pipeline.Workspace)\s\CICD\powershell\SetTriggers.ps1'
              scriptArguments: > # Use this to avoid newline characters in multiline string
                -DataFactoryName $(DataFactoryName)
                -DataFactoryResourceGroup $(DataFactoryResourceGroupName)
                -DataFactorySubscriptionId $(DataFactorySubscriptionId)
                -DisableAllTriggers $true
                -EnabledTriggers $(EnabledTriggers) # format: "prd_daily_4am,prd_daily_1pm"
The result of running the pipeline

Conclusion

In this post you learned how to enable only certain triggers for a specific environment. This makes it easy to already create a trigger in development for the production environment. The downside (for some) is of course that you get an extra piece of code to maintain. In a next post we will show that you can also accomplish this without writing code, via the ARM template. However, the trigger property runtimeState cannot be set via the ARM template, so a workaround is necessary for the no-code variant.


Monday 1 November 2021

ADF Release - Create YAML CICD Pipeline - part 2

Case
How do you deploy Azure Data Factory via a YAML pipeline instead of the Release pipeline?
Release ADF pipelines with YAML pipelines

Solution
In a previous post we used a YAML pipeline to create an ARM template for ADF. That ARM template is now available as an artifact and ready for deployment. That previous post ended with calling the release part of the pipeline, which is in a separate YAML file. This makes it easier to call that same YAML file for test, acceptance and production. Below is the last part of the main pipeline:
###################################
# Deploy Test environment
###################################
- stage: DeployTest
  displayName: Deploy Test
  variables:
  - group: ParamsTst
  pool:
    vmImage: 'windows-latest'
  condition: Succeeded()
  jobs:
    - template: deployADF.yml
      parameters:
        env: tst
        DataFactoryName: $(DataFactoryName)
        DataFactoryResourceGroupName: $(DataFactoryResourceGroupName)
        DataFactorySubscriptionId: $(DataFactorySubscriptionId)
        
###################################
# Deploy Acceptance environment
###################################
- stage: DeployAcceptance
  displayName: Deploy Acceptance
  variables:
  - group: ParamsAcc
  pool:
    vmImage: 'windows-latest'
  condition: Succeeded()
  jobs:
    - template: deployADF.yml
      parameters:
        env: acc
        DataFactoryName: $(DataFactoryName)
        DataFactoryResourceGroupName: $(DataFactoryResourceGroupName)
        DataFactorySubscriptionId: $(DataFactorySubscriptionId)
        
###################################
# Deploy Production environment
###################################
- stage: DeployProduction
  displayName: Deploy Production
  variables:
  - group: ParamsPrd
  pool:
    vmImage: 'windows-latest'
  condition: Succeeded()
  jobs:
    - template: deployADF.yml
      parameters:
        env: prd
        DataFactoryName: $(DataFactoryName)
        DataFactoryResourceGroupName: $(DataFactoryResourceGroupName)
        DataFactorySubscriptionId: $(DataFactorySubscriptionId)

In this blog we will create the deployADF.yml file mentioned in the YAML code above, but first we need to give the Service Principal (SP), used by the Azure DevOps service connection, access to the target Data Factories, otherwise it can't release the ARM template.


1) Access control (IAM)
In this first step we will give the SP access to the target ADF. You have to repeat that for all target factories, but first you have to decide whether you want to give the Service Principal access to specific services (like ADF) or the resource group or even the entire subscription. 

In most cases you want to limit the access to the bare minimum to avoid misuse. Since there is no ADF deployment task we are using the more general AzureResourceManagerTemplateDeployment task. One downside of this task is that you need to give permissions on at least the resource group. Access to ADF only is not enough and will give you an error: Failed to check the resource group status. Error: {"statusCode":403}.

The next thing to keep in mind is the role to assign. You want to avoid the Owner role to limit the risk of misuse. In this case we need the Contributor role. You can assign it via the Azure portal as described below, or script it with the PowerShell sketch after these steps.
  • In the Azure portal go to the resource group where your ADF is located 
  • Click on Access control (IAM) in the left menu
  • Click on +Add and then on Add role Assignment
  • Search for the appropriate Azure role (this screen recently changed, but you can also still use the classic experience via the link). Click on Contributor and press Next.
  • Click on +Select members and search for your SP, click on the account and then press Select
  • Optionally add a description and press Next and then Review + assign
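
If you prefer to script this role assignment instead of clicking through the portal, below is a minimal Az PowerShell sketch; the application id and resource group name are hypothetical placeholders.
# Assign the Contributor role on the resource group to the Service Principal of the service connection
# (requires the Az.Resources module and enough rights to create role assignments)
New-AzRoleAssignment -ApplicationId "00000000-0000-0000-0000-000000000000" `
                     -RoleDefinitionName "Contributor" `
                     -ResourceGroupName "rg-datafactory-tst"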
Contributor role in Resource Group for SP

2) Add additional YAML file
The next step is to add the second YAML file, which does the deployment of ADF, to the repository. Use the same repository folder as the existing YAML file (the CICD\YAML folder). Splitting up the deployment allows you to reuse the deployment code for test, acceptance and production. The downside is that you have to edit it in the repository instead of under Pipelines (but you could also use the YAML extension for Visual Studio Code).

The second YAML file consists of four parts, plus an optional treeview task to check where all your files are located on the agent:
  A. Parameters and environment
  B. Treeview (optional)
  C. Stop triggers
  D. Deploy ADF
  E. Cleanup and start triggers


A. Parameters and environment
This YAML file starts with parameters that will be filled by the main YAML file. As an alternative you could just use the variable group added in the main pipeline because those variables are also available in sub pipelines. There are four string parameters of which only Env has a list of expected/allowed values:
  • env (name of the environment: tst, acc or prd)
  • DataFactoryName
  • DataFactoryResourceGroupName
  • DataFactorySubscriptionId
parameters:
  - name: env
    displayName: Environment
    type: string
    values:
    - dev
    - tst
    - acc
    - prd
  - name: DataFactoryName
    displayName: Data Factory Name
    type: string
  - name: DataFactoryResourceGroupName
    displayName: Data Factory Resource Group Name
    type: string
  - name: DataFactorySubscriptionId
    displayName: Data Factory Subscription Id
    type: string

We also give the job a name and we create an environment. A list of environments can be found under Pipelines - Environments. This is also the place where you can add Approvals and Checks, which are not available in the YAML language. The checkout step is optional, but very handy when you for example have some custom PowerShell scripts in the repository that you want to execute before, during or after deployment.

jobs:
  - deployment: deploymentjob${{ parameters.env }}
    displayName: Deployment Job ${{ parameters.env }} 
    environment: deploy ${{ parameters.env }}
    strategy:
      runOnce:
        deploy:
          steps:
          - checkout: self
            displayName: '1 Retrieve Repository'
            clean: true 

B. Treeview
This treeview task is just for debugging. It for example allows you to see where the artifact is located on your agent. This makes it much easier to configure the following steps. When you have finished the pipeline then just delete this task from the steps or comment it out until you need it again.
          ###################################
          # Show environment and treeview
          ###################################
          - powershell: |
              Write-Output "This is the ${{ parameters.env }} environment"
              tree "$(Pipeline.Workspace)" /F
            displayName: '2 Show environment and treeview Pipeline_Workspace'

C. Stop triggers
Before we can deploy the created ARM template to our ADF we need to make sure nothing is running. Microsoft created a PowerShell script for this, which is included in the generated ARM template folder. It can stop all triggers before the deployment (and, with different parameters, start them again afterwards).
PrePostDeploymentScript.ps1

          ###################################
          # Stop triggers
          ###################################
          - task: AzurePowerShell@5
            displayName: '3 Stop triggers'
            inputs:
              azureSubscription: 'sc_adf-devopssp'
              pwsh: true
              azurePowerShellVersion: LatestVersion
              scriptType: filePath
              scriptPath: '$(Pipeline.Workspace)\ArmTemplatesArtifact\PrePostDeploymentScript.ps1'
              scriptArguments: > # Use this to avoid newline characters in multiline string
                -armTemplate '$(Pipeline.Workspace)\ArmTemplatesArtifact\ARMTemplateForFactory.json'
                -ResourceGroupName $(DataFactoryResourceGroupName)
                -DataFactoryName $(DataFactoryName)
                -predeployment $true
                -deleteDeployment $false
If you're running your new pipeline and you're getting an error stating that it cannot find your resource group, while you're absolutely sure that it exists and that the SP has access to it, then please check this blog post. It shows you how to create a slightly adjusted copy of the script (with an extra parameter) that is stored in the CICD\PowerShell folder.
          ###################################
          # Stop triggers
          ###################################
          - task: AzurePowerShell@5
            displayName: '3 Stop triggers'
            inputs:
              azureSubscription: 'sc_adf-devopssp'
              pwsh: true
              azurePowerShellVersion: LatestVersion
              scriptType: filePath
              scriptPath: '$(Pipeline.Workspace)\s\CICD\powershell\PrePostDeploymentADF.ps1'
              scriptArguments: > # Use this to avoid newline characters in multiline string
                -armTemplate '$(Pipeline.Workspace)\ArmTemplatesArtifact\ARMTemplateForFactory.json'
                -ResourceGroupName $(DataFactoryResourceGroupName)
                -DataFactoryName $(DataFactoryName)
                -predeployment $true
                -deleteDeployment $false
                -Subscription $(DataFactorySubscriptionId)

Stopping deployed triggers

D. Deploy ADF
Now it's finally time for the actual deployment of ADF. As mentioned above, we are using the AzureResourceManagerTemplateDeployment@3 task. Check the documentation for a description of all parameters, but one deserves special attention: Deployment Mode. It is very important to keep this on Incremental! Complete mode deletes everything in your resource group that is not mentioned in the ARM template, and since our template only contains ADF you would end up with a nearly empty resource group with only ADF in it. This is a very common mistake, so now you are warned.
          ###################################
          # Deploy ADF Artifact
          ###################################
          - task: AzureResourceManagerTemplateDeployment@3
            displayName: '4 Deploy ADF Artifact'
            inputs:
              deploymentScope: 'Resource Group'
              azureResourceManagerConnection: 'sc_adf-devopssp'
              subscriptionId: $(DataFactorySubscriptionId)
              action: 'Create Or Update Resource Group'
              resourceGroupName: $(DataFactoryResourceGroupName)
              location: 'West Europe'
              templateLocation: 'Linked artifact'
              csmFile: '$(Pipeline.Workspace)/ArmTemplatesArtifact/ARMTemplateForFactory.json'
              csmParametersFile: '$(Pipeline.Workspace)/ArmTemplatesArtifact/ARMTemplateParametersForFactory.json'
              overrideParameters: '-factoryName $(DataFactoryName)'
              deploymentMode: 'Incremental'

            env: 
                SYSTEM_ACCESSTOKEN: $(System.AccessToken)
Deployment of ADF

E. Cleanup and start triggers
Because we did an incremental deployment, all items that were deleted in the repository are still present in ADF; only new and updated items have changed. So we have to compare ADF with the template and delete all items that are not in the template. Luckily Microsoft already created a script for this: the same script as the stop-triggers step, just with different parameters. It also enables the triggers again.
          ###################################
          # Start triggers and cleanup
          ###################################
          - task: AzurePowerShell@5
            displayName: '5 Start triggers and cleanup'
            inputs:
              azureSubscription: 'sc_adf-devopssp'
              pwsh: true
              azurePowerShellVersion: LatestVersion
              scriptType: filePath
              scriptPath: '$(Pipeline.Workspace)\ArmTemplatesArtifact\PrePostDeploymentScript.ps1'
              scriptArguments: > # Use this to avoid newline characters in multiline string
                -armTemplate $(Pipeline.Workspace)/ArmTemplatesArtifact/ARMTemplateForFactory.json
                -ResourceGroupName $(DataFactoryResourceGroupName)
                -DataFactoryName $(DataFactoryName)
                -predeployment $false
                -deleteDeployment $true
Note that the same issue with not finding your Resource Group will occur here if it occurred when stopping the triggers. The solution is also the same: the slightly adjusted script and the extra Subscription parameter, as sketched below.
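
For completeness, a sketch of what that alternative task could look like for this step, assuming the same adjusted script and service connection as in the stop-triggers step, but now with the post-deployment parameter values:
          ###################################
          # Start triggers and cleanup
          ###################################
          - task: AzurePowerShell@5
            displayName: '5 Start triggers and cleanup'
            inputs:
              azureSubscription: 'sc_adf-devopssp'
              pwsh: true
              azurePowerShellVersion: LatestVersion
              scriptType: filePath
              scriptPath: '$(Pipeline.Workspace)\s\CICD\powershell\PrePostDeploymentADF.ps1'
              scriptArguments: > # Use this to avoid newline characters in multiline string
                -armTemplate '$(Pipeline.Workspace)\ArmTemplatesArtifact\ARMTemplateForFactory.json'
                -ResourceGroupName $(DataFactoryResourceGroupName)
                -DataFactoryName $(DataFactoryName)
                -predeployment $false
                -deleteDeployment $true
                -Subscription $(DataFactorySubscriptionId)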
Start triggers and cleanup

3) The result
Now it's time to run the pipeline from start to end by making changes to the Development Data Factory. And in no time all factories are updated.
The Result

Conclusion
In this blog post you learned how to use YAML to do the (build and) deployment of ADF in a pipeline. The takeaway is to use the incremental deployment mode and never set it to complete, to avoid those shocked looks when viewing your suddenly empty resource group.
In a future post we will show how to overwrite global parameters and change Linked Services during deployment, and how to enable or disable certain ADF triggers depending on the environment. This allows you to have different active triggers in Development, Test, Acceptance and Production without setting them manually after each deployment.

Now all YAML parts together:


parameters:
  - name: env
    displayName: Environment
    type: string
    values:
    - dev
    - tst
    - acc
    - prd
  - name: DataFactoryName
    displayName: Data Factory Name
    type: string
  - name: DataFactoryResourceGroupName
    displayName: Data Factory Resource Group Name
    type: string
  - name: DataFactorySubscriptionId
    displayName: Data Factory Subscription Id
    type: string

jobs:
  - deployment: deploymentjob${{ parameters.env }}
    displayName: Deployment Job ${{ parameters.env }} 
    environment: deploy ${{ parameters.env }}
    strategy:
      runOnce:
        deploy:
          steps:
          - checkout: self
            displayName: '1 Retrieve Repository'
            clean: true 

          ###################################
          # Show environment and treeview
          ###################################
          - powershell: |
              Write-Output "This is the ${{ parameters.env }} environment"
              tree "$(Pipeline.Workspace)" /F
            displayName: '2 Show environment and treeview Pipeline_Workspace'
            
          ###################################
          # Stop triggers
          ###################################
          - task: AzurePowerShell@5
            displayName: '3 Stop triggers'
            inputs:
              azureSubscription: 'sc_adf-devopssp'
              pwsh: true
              azurePowerShellVersion: LatestVersion
              scriptType: filePath
              scriptPath: '$(Pipeline.Workspace)\ArmTemplatesArtifact\PrePostDeploymentScript.ps1'
              scriptArguments: > # Use this to avoid newline characters in multiline string
                -armTemplate '$(Pipeline.Workspace)\ArmTemplatesArtifact\ARMTemplateForFactory.json'
                -ResourceGroupName $(DataFactoryResourceGroupName)
                -DataFactoryName $(DataFactoryName)
                -predeployment $true
                -deleteDeployment $false
                
          ###################################
          # Deploy ADF Artifact
          ###################################
          - task: AzureResourceManagerTemplateDeployment@3
            displayName: '4 Deploy ADF Artifact'
            inputs:
              deploymentScope: 'Resource Group'
              azureResourceManagerConnection: 'sc_adf-devopssp'
              subscriptionId: $(DataFactorySubscriptionId)
              action: 'Create Or Update Resource Group'
              resourceGroupName: $(DataFactoryResourceGroupName)
              location: 'West Europe'
              templateLocation: 'Linked artifact'
              csmFile: '$(Pipeline.Workspace)/ArmTemplatesArtifact/ARMTemplateForFactory.json'
              csmParametersFile: '$(Pipeline.Workspace)/ArmTemplatesArtifact/ARMTemplateParametersForFactory.json'
              overrideParameters: '-factoryName $(DataFactoryName)'
              deploymentMode: 'Incremental'

            env: 
                SYSTEM_ACCESSTOKEN: $(System.AccessToken)
                
          ###################################
          # Start triggers and cleanup
          ###################################
          - task: AzurePowerShell@5
            displayName: '5 Start triggers and cleanup'
            inputs:
              azureSubscription: 'sc_adf-devopssp'
              pwsh: true
              azurePowerShellVersion: LatestVersion
              scriptType: filePath
              scriptPath: '$(Pipeline.Workspace)\ArmTemplatesArtifact\PrePostDeploymentScript.ps1'
              scriptArguments: > # Use this to avoid newline characters in multiline string
                -armTemplate $(Pipeline.Workspace)/ArmTemplatesArtifact/ARMTemplateForFactory.json
                -ResourceGroupName $(DataFactoryResourceGroupName)
                -DataFactoryName $(DataFactoryName)
                -predeployment $false
                -deleteDeployment $true