
Saturday 12 November 2022

Deploy Azure SQL Databases via a DevOps pipeline

Case
A while ago we did a post about deploying an Azure SQL Database via Azure DevOps using the deployment pipelines. Since that post we have mostly used YAML pipelines, so in this post an update on how to accomplish the same with a YAML pipeline.
Deploy Azure SQL DB with DevOps











Solution
The starting point of this post is that you have a Visual Studio database project connected to the DevOps repository.

1) Variable Groups
First we start with some variable groups to parameterize the YAML code. For this example we have one general variable group with variables like the (Visual Studio) project name, the database name and the service connection (if each environment uses the same service connection). Then there are also variable groups per environment (Development, Test, Acceptance, Production) with variables like the SQL Server name and the service connection (if each environment uses a different service connection).
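In the pipeline (see step 4 below) these groups are referenced by name, after which their variables are available via $(VariableName). A short sketch with the group names used in this post:

# at pipeline level
variables:
  - group: SQLParamsGen    # general variables: SqlProjectName, SqlDatabaseName, ServiceConnection

# at stage level (one group per environment)
variables:
  - group: SQLParamsAcc    # environment-specific variables: SqlServerName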
Pipelines > Library > Variable Groups













Whether you need a variable group for DEV depends on how you do your SQL development. Some teams do the development in the central development database and use the compare option in Visual Studio to update the SQL project. In that case you don't want to deploy the project to the DEV database.
Example of variables in variable groups























2) Environments
The second preparation step is the creation of environments. This allows you to add approvals and checks for specific environments. You probably want an approval before you update your production environment. You can reuse these environments for other deployments like ADF, Synapse or Power BI.

Pipelines > Environments















Same note about the DEV environment as in the previous step. If you don't deploy to DEV then you don't need the 'Deploy to DEV' environment.

3) Repos folder structure
Our folder structure in the repos is as follows (if you use a different structure you need to adjust the YAML to it). We have a SQL folder with a subfolder for our Visual Studio database project. The name of the project/folder is stored in the variable group (see step 1). We also have a CICD folder in the root where we store everything that is deployment related. For SQL deployment we only have a subfolder called YAML for the pipelines, but for other deployments we also have a subfolder for PowerShell scripts.
Folder structure of repos
















In the YAML folder we have BuildSQL.yml, which is the main pipeline file that you will use to add a new pipeline. It calls DeploySQL.yml once for each environment.
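For reference, a rough sketch of this structure (the project folder name is whatever is stored in the SqlProjectName variable):

CICD
└── YAML
    ├── BuildSQL.yml
    └── DeploySQL.yml
SQL
└── <SqlProjectName>
    ├── <SqlProjectName>.sqlproj
    └── <SqlProjectName>.publish.xml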

4) Build pipeline
Let's explain the YAML of the build pipeline. It starts by adding the general variable group to the pipeline so that you are able to use the variables from step 1.

Next is the trigger that determines when the pipeline will be started. In this example it listens to changes in multiple branches, but only when those changes happen in the SQL folder. This part will differ per branch strategy.

Then we have the stages. The first stage is the build stage where the artifact is created and published for later use. In the job part we have two noteworthy items: the workspace clean option will first clean your agent before starting, which is handy if you have a self-hosted agent instead of a Microsoft-hosted agent. The agent pool that will be used is the second important part: name is for self-hosted agents and vmImage is for Microsoft-hosted agents.

Then the steps within the first stage.
  1. Checkout will retrieve the repository content to your agent. We need the Visual Studio project on the agent to build it.
  2. The treeview step is an optional debug step to show you where the files are on your agent after the checkout. It makes it easier to specify the paths in the next steps. You can remove it or comment it out when you are ready with your pipeline.
  3. The MSBuild step will build your SQL project and create a dacpac file which you need for deployment. This is equivalent to building your project within Visual Studio. If you have a self-hosted agent you will need to install either Visual Studio or MSBuild on your VM.
  4. The Copy step copies the generated dacpac file to a staging folder, but also copies the publish file with the settings for publishing your project.
  5. The last step is to publish the copied files as an artifact.
The last stages of this YAML file are copies of each other that are slightly changed for the environment-specific settings. You can add or remove stages if you have more or fewer environments.

In these stages you first see that the environment-specific variable group is added to the pipeline and which agent will be used for the deployment. If your Azure SQL server only uses private endpoints then a self-hosted agent is required, because the Microsoft-hosted agents won't be able to connect to your server.

The condition part in this example is optional and depends on the branch strategy. Remove that line if you don't use multiple branches for deployment. The last part of these stages is calling the second YAML file and passing values from the specific variable group as parameters to that second YAML file.

###################################
# General Variables
###################################
variables:
  - group: SQLParamsGen


###################################
# When to create a pipeline run
###################################
trigger:
  branches:
    include:
    - Development
    - Acceptance
    - main
  paths:
    include:
    - SQL/*

stages:
###################################
# Create DacPac Artifact
###################################
- stage: CreateSQLArtifact
  displayName: Create SQL Artifact (dacpac)

  jobs:
  - job: Job
    displayName: 'Build DacPac'
    workspace:
      clean: all
    pool:
      name: DataPlatform
      # vmImage: 'windows-latest' 
    steps:

    ###################################
    # 1 Retrieve Repository
    ###################################
    - checkout: self
      displayName: '1 Retrieve Repository'
      clean: true

    ###################################
    # 2 Show treeview of agent
    ###################################
    - powershell: |
        Write-Output "Dit is de tree omgeving"
        tree "$(Pipeline.Workspace)" /F
      displayName: '3 Treeview Workspace and ArmTemplateOutput content '

    ###################################
    # 3 Build Visual Studio project
    ###################################
    - task: MSBuild@1
      displayName: '3. Creating Artifact'
      inputs:
        solution: '$(System.DefaultWorkingDirectory)/SQL/$(SqlProjectName)/$(SqlProjectName).sqlproj'
        msbuildArchitecture: x86
        msbuildVersion: latest

    ###################################
    # 4 Stage dacpac and publish
    ###################################
    - task: CopyFiles@2
      displayName: '4. Copy Artifact'
      inputs:
        contents: |
          **\*.dacpac
          **\*.publish.xml
        TargetFolder: '$(build.artifactstagingdirectory)'

    ###################################
    # 5 Publish dacpac and xml artifact
    ###################################
    - task: PublishPipelineArtifact@1
      displayName: '5. Publish Artifact'
      inputs:
        targetPath: '$(Build.ArtifactStagingDirectory)'
        artifact: 'SQL_Dacpac'
        publishLocation: 'pipeline'



###################################
# Deploy Dev environment
###################################
- stage: DeployDev
  displayName: Deploy DEV
  variables:
    - group: SQLParamsDev
  pool:
    name: DataPlatform
    # vmImage: 'windows-latest'
  condition: eq(variables['Build.SourceBranchName'], 'Development')
  jobs:
    - template: DeploySQL.yml
      parameters:
        env: DEV
        ServiceConnection: $(ServiceConnection)
        SqlServerName: $(SqlServerName)
        SqlDatabaseName: $(SqlDatabaseName)
        SqlProjectName: $(SqlProjectName)



###################################
# Deploy Acc environment
###################################
- stage: DeployAcc
  displayName: Deploy ACC
  variables:
    - group: SQLParamsAcc
  pool:
    name: DataPlatform
    # vmImage: 'windows-latest'
  condition: eq(variables['Build.SourceBranchName'], 'Acceptance')
  jobs:
    - template: DeploySQL.yml
      parameters:
        env: ACC
        ServiceConnection: $(ServiceConnection)
        SqlServerName: $(SqlServerName)
        SqlDatabaseName: $(SqlDatabaseName)
        SqlProjectName: $(SqlProjectName)



###################################
# Deploy Prd environment
###################################
- stage: DeployPrd
  displayName: Deploy PRD
  variables:
    - group: SQLParamsPrd
  pool:
    name: DataPlatform
    # vmImage: 'windows-latest'
  condition: eq(variables['Build.SourceBranchName'], 'Main')
  jobs:
    - template: DeploySQL.yml
      parameters:
        env: PRD
        ServiceConnection: $(ServiceConnection)
        SqlServerName: $(SqlServerName)
        SqlDatabaseName: $(SqlDatabaseName)
        SqlProjectName: $(SqlProjectName)
5) Deploy pipeline
The second YAML file does the deployment of the dacpac to your Azure SQL Database. It starts with the YAML parameters. You can specify allowed values or default values for your parameters.

The job part specifies which environment you want to use (see step 2) and then the steps of the deployment:
  1. The checkout retrieves the files from the repository to the agent. For this example it is not necessary, but if you perform some extra steps like executing a PowerShell script then you need this to get the PowerShell file to your agent.
  2. The treeview step is an optional debug step to show you where the artifact files are on your agent. You can remove it or comment it out when you don't need it anymore.
  3. The SqlAzureDacpacDeployment@1 task is the actual deployment step for the dacpac and it also adds the publish profile file. If you don't use a publish profile then remove that line. Note that if you are using a self-hosted agent then you need to install SqlPackage.exe on that VM for this step.
###################################
# Parameters
###################################
parameters:
  - name: env
    displayName: Environment
    type: string
    values: 
    - DEV
    - ACC
    - PRD
  - name: ServiceConnection
    displayName: Service Connection
    type: string
  - name: SqlServerName
    displayName: Sql Server Name
    type: string
  - name: SqlDatabaseName
    displayName: Sql Database Name
    type: string
  - name: SqlProjectName
    displayName: Sql Project Name
    type: string


jobs:
  - deployment: deploymentjob${{ parameters.env }}
    displayName: Deployment Job ${{ parameters.env }} 
    environment: Deploy to ${{ parameters.env }}
    strategy:
      runOnce:
        deploy:
          steps:
          ###################################
          # 1 Check out repository to agent
          ###################################
          - checkout: self
            displayName: '1 Retrieve Repository'
            clean: true 
 
          ###################################
          # 2 Show environment and treeview
          ###################################
          - powershell: |
              Write-Output "Deploying ${{ parameters.SqlProjectName }} to DB ${{ parameters.SqlDatabaseName }} and server ${{ parameters.SqlServerName }} in the ${{ parameters.env }} environment"
              tree "$(Pipeline.Workspace)" /F
            displayName: '2 Show environment and treeview Pipeline_Workspace'

          ###################################
          # 3 Deploy DacPac
          ###################################             
          - task: SqlAzureDacpacDeployment@1
            displayName: '3 Deploy DacPac' 
            inputs:
              azureSubscription: '${{ parameters.ServiceConnection }}'
              AuthenticationType: 'servicePrincipal'
              ServerName: '${{ parameters.SqlServerName }}.database.windows.net'
              DatabaseName: '${{ parameters.SqlDatabaseName }}' 
              deployType: 'DacpacTask'
              DeploymentAction: 'Publish'
              DacpacFile: '$(Pipeline.Workspace)/SQL_Dacpac/SQL/${{ parameters.SqlProjectName }}/bin/debug/${{ parameters.SqlProjectName }}.dacpac'
              PublishProfile: '$(Pipeline.Workspace)/SQL_Dacpac/SQL/${{ parameters.SqlProjectName }}/${{ parameters.SqlProjectName }}.publish.xml'
              IpDetectionMethod: 'AutoDetect'


6) Azure DevOps Service Connection
Make sure you have a DevOps Service Connection (type Azure Resource Manager) and also make sure that your Azure SQL Server uses Azure Active Directory authentication (see your server in the Azure Portal and then go to Azure Active Directory in the left menu to check it).
Azure DevOps Project Service Connection
























The Service Principal behind this Service Connection should be either a member of the Azure Active Directory group that is specified under Azure Active Directory admin on that page or...
Azure Active Directory authentication with AAD group











... or you can add the Service Principal as a user with a T-SQL command
-- Add your Azure AD Service Principal as external to SQL DB
CREATE USER [YourServicePrincipal] FROM EXTERNAL PROVIDER;

-- Add db_owner role to your Service Principal
ALTER ROLE [db_owner] ADD MEMBER [YourServicePrincipal]
If you don't have a Service Principal then it is also possible to use a SQL database user. Please see the documentation of the Azure SQL Database deployment v1 task for that.
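As an illustration, a sketch of step 3 with SQL authentication instead of a Service Principal could look roughly like below. The SqlUsername and SqlPassword variables are assumptions (store them as secrets in your variable group or Key Vault) and you should verify the exact input names in the task documentation before using it.

          - task: SqlAzureDacpacDeployment@1
            displayName: '3 Deploy DacPac with SQL authentication'
            inputs:
              azureSubscription: '${{ parameters.ServiceConnection }}'
              AuthenticationType: 'server'   # SQL database user instead of a Service Principal
              ServerName: '${{ parameters.SqlServerName }}.database.windows.net'
              DatabaseName: '${{ parameters.SqlDatabaseName }}'
              SqlUsername: '$(SqlUsername)'  # assumed secret variable
              SqlPassword: '$(SqlPassword)'  # assumed secret variable
              deployType: 'DacpacTask'
              DeploymentAction: 'Publish'
              DacpacFile: '$(Pipeline.Workspace)/SQL_Dacpac/SQL/${{ parameters.SqlProjectName }}/bin/debug/${{ parameters.SqlProjectName }}.dacpac'
              IpDetectionMethod: 'AutoDetect'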

7) Testing
Now go to Pipelines and add a new YAML pipeline based on an existing YAML file in your repository and select the BuildSQL.yml file. Then either run it manually or change something in your database project in the repository to trigger the pipeline.
Run your new SQL Deployment pipeline










Conclusion
In this post you learned how to deploy your Azure SQL Databases with a YAML pipeline in Azure DevOps. Note that if you don't have public endpoints on your Azure SQL Server, you will need a self-hosted agent in your Azure subscription (a Virtual Machine with the DevOps software) because the Microsoft-hosted agents use public endpoints. This will be described in a next post.

Update: if you get a 'Get-SpnAccessToken is obsolete' warning during deployment, then check this solution.

Monday 3 October 2022

Deploy Synapse workspaces via DevOps - Pipeline

Case
I want to deploy my development Synapse workspace to the next environment (test, acceptance or production). What is the easiest way to automate this process via DevOps? And is it possible to ignore the publish button, just like in Data Factory?
Release Synapse Workspace via DevOps











Solution
With the new (updated) Synapse add-on for DevOps it is much easier to release Synapse than it was to release Data Factory. And if you use the validateDeploy operation (instead of deploy) then you don't need the workspace_publish branch. It can read directly from the collaboration branch, so you don't have to use the publish button to initiate the CICD process.

This solution consists of two separate main posts and a couple of side posts.
Additional posts

5) Preparation
Make sure to do the preparations described in the previous post. Add two empty files to the CICD\YAML folder (or your own folder setup):
  • Synapse.yml
  • DeploySynapse.yml
Add two YAML files











6) The YAML pipeline
In this example we will create an artifact first and then deploy that artifact to the test/acceptance/production environment of Synapse, but depending on the branch strategy you could skip that step and directly publish from a branch in the repository. The artifact creation and the artifact deployment are separated into two different YAML files.

Synapse.yml

First step is the trigger: when should the pipeline start? In this case when the branch name is equal to 'main' and the change happens in the 'Synapse' folder. So changes in the 'CICD' folder will not trigger the pipeline.
###################################
# When to create a pipeline run
###################################
trigger:
  branches:
    include:
    - main
  paths:
    include:
    - Synapse/*
The second step is creating the first stage. It will become the first blue, green or red circle in the pipeline run overview. It also cleans the workspace, which is handy for self-hosted agents, and it mentions the agent pool. In this case a Microsoft-hosted agent.
stages:
###################################
# Create Artifact of Synapse files
###################################
- stage: CreateSynapseArtifact
  displayName: Create Synapse Artifact

  jobs:
  - job: CreateArtifact
    displayName: 'Create Artifact'
    workspace:
      clean: all
    pool:
      vmImage: 'windows-latest' #'ubuntu-latest'
    steps:
The third block (step 1) retrieves the content of the repository to the agent. This allows us to create an artifact of the Synapse files that are stored in the repository.
    ###################################
    # 1 Retrieve Repository
    ###################################
    - checkout: self
      displayName: '1 Retrieve Repository'
      clean: true
The fourth block (step 2) is optional. It just shows a treeview of the agent which is very handy for debugging your YAML pipelines to make sure you are mentioning the right folder or file in any of the tasks. This is explained in detail in a previous post.
    ###################################
    # 2 Show treeview of agent
    ###################################
    - powershell: |
        tree "$(Pipeline.Workspace)" /F
      displayName: '2 Treeview of Pipeline.Workspace'
The fifth block (step 3) is the copying of all Synapse files to an artifact staging folder. Optionally you could skip this part and publish directly from the Synapse folder.
    ###################################
    # 3 Stage artifact
    ###################################
    - task: CopyFiles@2
      displayName: '3. Copy Artifact'
      inputs:
        contents: |
          **\*.*
        SourceFolder: 'Synapse'
        TargetFolder: '$(build.artifactstagingdirectory)'
The sixth block (step 4) is publishing all the files located in the artifact staging folder.
    ###################################
    # 4 Publish artifact
    ###################################
    - task: PublishPipelineArtifact@1
      displayName: '4 Publish template as artifact'
      inputs:
        targetPath: $(Build.ArtifactStagingDirectory)
        artifact: 'SynapseArtifact'
        publishLocation: 'pipeline'
The last block in this YAML file is calling the next YAML file with parameters, so that you can reuse this step for all environments (Test/Acceptance/Production). The values are now hardcoded in this example, but you should of course try to use a variable group from the Pipeline Library (see the sketch after the code below). This makes it much easier to change those parameter values.

This example passes 4 parameters. The first is just to show the environment when writing values to the screen (debugging). The ServiceConnection is the name of the ARM Service Connection that you created in the preparation post. The last two point to the correct environment of Synapse.
###################################
# Deploy Acceptance environment
###################################
- stage: DeployAcc
  displayName: Deploy ACC
#   variables:
#     - group: SynapseParametersAcc
  pool:
    vmImage: 'windows-latest'
  jobs:
    - template: DeploySynapse.yml
      parameters:
        Environment: ACC
        ServiceConnection: SynapseServiceConnection
        ResourceGroupName: rg_dwh_acc
        TargetWorkspaceName: dwhacc
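As a sketch of that variable group variant (reusing the commented-out group name from above; the variable names inside the group are assumptions), the ACC stage could then look like this:

###################################
# Deploy Acceptance environment
###################################
- stage: DeployAcc
  displayName: Deploy ACC
  variables:
    - group: SynapseParametersAcc   # holds ServiceConnection, ResourceGroupName and TargetWorkspaceName
  pool:
    vmImage: 'windows-latest'
  jobs:
    - template: DeploySynapse.yml
      parameters:
        Environment: ACC
        ServiceConnection: $(ServiceConnection)
        ResourceGroupName: $(ResourceGroupName)
        TargetWorkspaceName: $(TargetWorkspaceName)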


DeploySynapse.yml

The first code block contains the parameters: in this example the 4 string parameters whose values are provided by the first YAML file.
###################################
# Parameters
###################################
parameters:
  - name: Environment
    displayName: Environment
    type: string
    values:
    - TST
    - ACC
    - PRD
  - name: ServiceConnection
    displayName: Service Connection
    type: string
  - name: ResourceGroupName
    displayName: Resource Group Name
    type: string
  - name: TargetWorkspaceName
    displayName: Target Workspace Name
    type: string

The second block consists of some starter code, but the environment part is important if you want to add rules like approvals. So make sure to create an environment 'Deploy Synapse to ACC' or choose your own name.
###################################
# Start
###################################
jobs:
    - deployment: deploymentjob${{ parameters.Environment }}
      displayName: Deployment Job ${{ parameters.Environment }} 
      environment: Deploy Synapse to ${{ parameters.Environment }}

      strategy:
        runOnce:
          deploy:
            steps:

The third block (step 1) is about getting the data from the repository. If you use the artifact then you could skip this code, because the job will automatically start with a download artifact step. If you want to publish directly from the collaboration branch, or you need to execute for example some extra PowerShell scripts from the repository as well, then you need this step.

If you want to use the publish branch then you will find some example code for that as well. This allows you to keep the YAML files in the collaboration branch instead of the publish branch. You will need to change the operation in the last step to 'deploy' and change some of its properties (not described in this post).
            ###################################
            # 1 Check out repository to agent
            ###################################
            # - checkout: 'git://YourProjectName/YourReposName@workspace_publish'
            #   path: workspace_publish
            - checkout: self
              displayName: '1 Retrieve Repository'
              clean: true 
The fourth block (step 2) is again the optional treeview to check the path of folders and files on your agent. Very handy, but once your code works fine you can comment out this part.
            ###################################
            # 2 Show environment and treeview
            ###################################
            - powershell: |
                Write-Output "Deploying Synapse in the ${{ parameters.Environment }} environment"
                tree "$(Pipeline.Workspace)" /F
              displayName: '2 Show environment and treeview Pipeline_Workspace'
The fifth and last block (step 3) is the actual deployment of Synapse. The DeleteArtifactsNotInTemplate option removes pipelines, datasets, linked services, etc. from your test/acceptance/production environment when you have removed them from the development environment. This is also the place where you can replace parameters and linked services, which will be explained in a separate post.
            ###################################
            # 3 validateDeploy
            ###################################
            - task: Synapse workspace deployment@2
              displayName: '3 Deploy Synapse Workspace'
              inputs:
                operation: validateDeploy
                ArtifactsFolder: '$(Pipeline.Workspace)/SynapseArtifact'
                azureSubscription: ${{ parameters.ServiceConnection }} 
                ResourceGroupName: ${{ parameters.ResourceGroupName }} 
                TargetWorkspaceName: ${{ parameters.TargetWorkspaceName }} 
                DeleteArtifactsNotInTemplate: true
                # OverrideArmParameters: '
                # -workspaceName $(syn_wrk_name)
                # -ls_akv_mykeyvault_properties_typeProperties_baseUrl $(syn_mykeyvault)
                # '

Note 1: If you get an error Stderr: error: missing required argument 'factoryId', then please check this post

Note 2: If you get an error: Stderr: 'node' is not recognized as an internal or external command, operable program or batch file, then please check this post

7) The result
Now create a pipeline from an existing YAML file in your repository and make sure to run the pipeline (manually or triggered) to see the result.
Successfully deployed Synapse



















Conclusion
In this second post we described all the steps of the YAML pipeline and successfully executed the pipeline. In a follow-up post we will explain more details about overriding parameters during the deployment. Also see Microsoft's own documentation for CICD for Synapse, but at the moment of writing it is not yet up to date with the info of task version 2.

To see the available operations and related properties of this task you can also use the 'Show assistant' option in the YAML editor in Azure DevOps. Another option is to use the Release Pipeline editor and then hit the View YAML button.
Gui of the task via Show Assistant























View YAML of Release pipeline task









Sunday 2 October 2022

Synapse - error: missing required argument 'factoryId'

Case
I want to deploy a Synapse workspace via DevOps and the Synapse workspace deployment add-on, but it is giving me an error: Stderr: error: missing required argument 'factoryId'. How do I solve this error?

error: missing required argument 'factoryId'
















2022-10-02T19:05:21.6763177Z ##[section]Starting: Synapseworkspacedeployment
2022-10-02T19:05:21.6900329Z ==============================================================================
2022-10-02T19:05:21.6900630Z Task         : Synapse workspace deployment
2022-10-02T19:05:21.6900882Z Description  : Deployment task for synapse workspace v2
2022-10-02T19:05:21.6901097Z Version      : 2.3.0
2022-10-02T19:05:21.6901303Z Author       : Microsoft Corporation
2022-10-02T19:05:21.6901526Z Help         : Displays the name of your extension v2
2022-10-02T19:05:21.6901791Z ==============================================================================
2022-10-02T19:05:22.5141212Z Bundle source :  https://web.azuresynapse.net/assets/cmd-api/main.js
2022-10-02T19:05:22.5165738Z Downloading asset file
2022-10-02T19:05:23.5975682Z Asset file downloaded at :  D:\a\1\s\downloads\main.js
2022-10-02T19:05:23.5986866Z Starting export operation
2022-10-02T19:05:23.5989932Z Executing shell command
2022-10-02T19:05:23.5991887Z Command :  node D:\a\1\s\downloads\main.js export "D:\a\1\SynapseArtifact\" dwhtst ExportedArtifacts
2022-10-02T19:05:25.3052935Z Stderr:  error: missing required argument 'factoryId'
2022-10-02T19:05:25.3054315Z 
2022-10-02T19:05:25.3225669Z Shell execution failed.
2022-10-02T19:05:25.3227048Z An error occurred during execution: Shell execution failed.
2022-10-02T19:05:25.3262506Z ##[error]Encountered with exception:Shell execution failed.
2022-10-02T19:05:25.3355687Z ##[section]Finishing: Synapseworkspacedeployment
Solution
This error points to a mistake in the ArtifactsFolder property of the Synapse workspace deployment@2 task. If you don't use the correct folder, or even just end the path with a forward slash (!), you will get the not very descriptive error: Stderr: error: missing required argument 'factoryId'. If you get this error then make sure to add the treeview step to your pipeline to double check whether the folder is correct. It should point to the folder with the publish_config.json file in it.
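To illustrate, the difference can be as small as a trailing slash (folder name taken from the artifact example in the previous post):

# fails with 'missing required argument factoryId' (note the trailing slash)
ArtifactsFolder: '$(Pipeline.Workspace)/SynapseArtifact/'

# works
ArtifactsFolder: '$(Pipeline.Workspace)/SynapseArtifact'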
            ###################################
            # Show treeview of agent
            ###################################
            - powershell: |
                Write-Output "Folder and file treeview of Pipeline_Workspace folder:"
                tree "$(Pipeline.Workspace)" /F
              displayName: 'Show treeview of Pipeline_Workspace folder'


            ###################################
            # validateDeploy
            ###################################
            - task: Synapse workspace deployment@2
              inputs:
                operation: validateDeploy
                ArtifactsFolder: '$(Pipeline.Workspace)/SynapseArtifact'
                azureSubscription: DevOps
                ResourceGroupName: rg_dwhacc
                TargetWorkspaceName: dwhacc
                DeleteArtifactsNotInTemplate: true

Conclusion
Double check the artifact folder and don't add a forward slash at the end of it. The forward slash bug(?) occurred in version 2.3.0 (9/2/2022).

Friday 30 September 2022

Synapse - 'node' is not recognized as a command

Case
I want to deploy a Synapse workspace via DevOps and the Synapse workspace deployment add-on, but it is giving me an error: Stderr: 'node' is not recognized as an internal or external command, operable program or batch file. How do I solve this error?

Node not known in DevOps


















Solution
Just like with the deployment of Data Factory, this add-on also uses Node.js to do the actual deployment. If you are using a self-hosted agent then you need to install Node.js on your DevOps agent (VM) or use the Node.js Tool Installer task before the Synapse workspace deployment task.

NodeTool@0
















###################################
# 2 Installs Node.js on agent
###################################
- task: NodeTool@0
  displayName: '3 Install Node.js'
  inputs:
    versionSpec: '16.x'
    checkLatest: true  
Conclusion
In this short post you learned how to overcome the unrecognized node command in your DevOps deployment pipeline. A simple manual installation of Node.js or an automated installation via your CICD pipeline will do the trick.

Also see our posts of setting up Synapse and DevOps and creating the YAML pipeline.

Wednesday 31 August 2022

DevOps snack: If condition in YAML code

Case
Can you have an IF condition in your YAML code to make it more flexible? I want to do for example something different for my Data Factory deployment in Development, Test and Production.
IF conditions in YAML?
















Solution
Yes, the expression language also supports conditional insertion (IF, ELSEIF, ELSE) which you can use to make your YAML code more flexible. However, use it in moderation because it also makes the code a bit less readable. The expressions won't get any color formatting in the browser.

If you for example want to adjust a parameter value depending on which branch is triggering the pipeline, then without the IF you would start with something like this:
jobs:
  - template: DeployADF.yml
    parameters:
      environment: tst
Between ${{ and }}: you can add an if construction. In this example it checks whether the name of the branch (that is triggering the pipeline) is equal to 'Development', 'Test' or 'Main'. Note the extra indentation on the line below the IF.
jobs:
  - template: DeployADF.yml
    parameters:
      ${{ if eq(variables['Build.SourceBranchName'], 'Development') }}:
        environment: dev
      ${{ if eq(variables['Build.SourceBranchName'], 'Test') }}:
        environment: tst
      ${{ if eq(variables['Build.SourceBranchName'], 'Main') }}:
        environment: prd

When the line below the IF would normally start with a minus sign, then the IF should also start with a minus sign. For example with the variable groups. Without the IF you would start like below:
variables:
- group: ADFParamsTst
If you want to make the group name depend on the branch name triggering the pipeline, then you can add the IF between ${{ and }}:, but the line should then start with a minus and also the line below the IF should start with a minus. Also note the extra indentation.
variables:
- ${{ if eq(variables['Build.SourceBranchName'], 'Development') }}:
  - group: ADFParamsDev
- ${{ if eq(variables['Build.SourceBranchName'], 'Test') }}:
  - group: ADFParamsTst
- ${{ if eq(variables['Build.SourceBranchName'], 'Main') }}:
  - group: ADFParamsPrd

Conclusion
In this blog you learned how to add an IF statement to your YAML code. Don't forget the extra indentation on the line below the IF and add the extra minus symbol before the IF when the line after the IF requires one. And don't make it too complex for your colleagues.

Thanks Collin Mezach for helping out.

Sunday 21 November 2021

ADF Release - Use parameters to enable Triggers

Case
During deployment of Azure Data Factory (ADF) via Azure DevOps pipelines I want to make sure that a certain trigger is only executed on Production and not on the lower environments. How can we do this without writing code (low-code)?

ADF Trigger



















Solution
This is possible by changing the ARM template parameter definition, which in turn will turn certain properties into overridable parameters during deployment. However, the triggers are not included by default in the parameter file. There is also a limitation that you cannot override every property, for example runtimeState to activate and deactivate the trigger. The workaround for this is to use the endTime property.

More information about which properties are parameterized can be found here.

1) Understand parameters in ADF
Before we start overriding properties in the ARM template, it is good to understand the parameters in general. As you know, when you start building your ADF, one of the first things you do is creating a Linked Service. By default, ADF knows that for example a connection string or a Key Vault reference in a Linked Service should be parameterized, because the database server or the URL will be different per environment in a DTAP. The result is always two ARM template files: the content itself (ARMTemplateForFactory.json) and the parameters that can be overwritten (ARMTemplateParametersForFactory.json). Another file holds the definition of the parameters (arm-template-parameters-definition.json).

When you start developing in a new ADF, the ARM template parameters file (the result) only contains the ADF name that can be overwritten. When you have created a Linked Service, for example for Azure Blob Storage, the file should look something like below.
{
    "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentParameters.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "factoryName": {
            "value": "bitools-d-adf-dwh"
        },
        "LS_ABLB_bitools_connectionString": {
            "value": ""
        }
    }
}
You can check this via "Manage - ARM template - Export ARM template".

ADF Portal - Check the parameters

















2) Check Trigger code
Now back to our trigger. Based on the documentation, we know which properties we can parameterize for a trigger. Let's have a look at the code of the trigger itself.
  • In the ADF portal go to Manage (toolbox icon in left menu) and then to Triggers
  • Find your trigger and hover your mouse on it and click on the code icon {}
ADF portal - Check the code of your trigger

See below the JSON code of the trigger. You can override everything that is related to typeProperties. Unfortunately the runtimeState property is not one of them.
{
    "name": "Trigger_Master",
    "properties": {
        "description": "Test",
        "annotations": [],
        "runtimeState": "Started",
        "pipelines": [
            {
                "pipelineReference": {
                    "referenceName": "PL_Master",
                    "type": "PipelineReference"
                }
            }
        ],
        "type": "ScheduleTrigger",
        "typeProperties": {
            "recurrence": {
                "frequency": "Day",
                "interval": 1,
                "startTime": "2021-01-01T00:00:00Z",
                "endTime": "2021-01-02T00:00:00Z",
                "timeZone": "UTC",
                "schedule": {
                    "minutes": [
                        10
                    ],
                    "hours": [
                        0
                    ]
                }
            }
        }
    }
}
Now that we have identified which properties can be parameterized, we need to know which property we want to override for our use case. As you know, we need to make sure the trigger is not executed on every environment. One way to do this is to set the end date (and time) of a trigger. This property is called endTime. For example: a trigger with an end date of "01/02/2021 12:00 AM" will not be executed because this is in the past. When the end date is "12/31/9999 12:00 AM", the trigger will be executed because it is in the future.

Go to your trigger and set an end date and time in the future, for example 12/31/9999 12:00 AM.

ADF portal - Specify end date for trigger


































3) ARM template
Next step is to override the endTime property in the ARM template parameter definition. Unlike integration runtime or linked services properties, we need to add this property first. 
  • In the ADF portal go to Manage (same as step 2) and then to ARM template
  • Click on Edit parameter configuration
  • Search for "Microsoft.DataFactory/factories/triggers" and add the endTime property (that is part of recurrence) within typeProperties, set the value to "=:-endTime" and click the OK button. See below how your JSON should look like for the trigger part.
    "Microsoft.DataFactory/factories/triggers": {
        "properties": {
            "pipelines": [
                {
                    "parameters": {
                        "*": "="
                    }
                },
                "pipelineReference.referenceName"
            ],
            "pipeline": {
                "parameters": {
                    "*": "="
                }
            },
            "typeProperties": {
                "scope": "=",
                "recurrence": {
                    "endTime": "=:-endTime"
                }
            }
        }
    },
Now check the ARM template parameters via "Manage - ARM template - Export ARM template" (see step 1) and the result should look like this.
{
    "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentParameters.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "factoryName": {
            "value": "bitools-d-adf-dwh"
        },
        "LS_ABLB_bitools_connectionString": {
            "value": ""
        },
        "Trigger_Master_endTime": {
            "value": "9999-12-31T00:00:00Z"
        }
    }
}
In this case, we have set the default by using "=" in front of the value. Adding a minus - in front of parameter name (endTime) will remove "_properties_typeProperties" from the parameter name. More information here.

Note:
The global parameters are also not included by default. Click here to see how to include them in the ARM template parameters file as well.

4) Adjust release pipeline
If you are using YAML to publish the changes then the only thing you have to change is the overrideParameters property by adding the new parameter Trigger_Master_endTime with either a variable or a hardcoded value. The > behind the property helps you to break the string over multiple lines and keeps the YAML code more readable.
          ###################################
          # Deploy ADF Artifact
          ###################################
          - task: AzureResourceManagerTemplateDeployment@3
            displayName: '4 Deploy ADF Artifact'
            inputs:
              deploymentScope: 'Resource Group'
              azureResourceManagerConnection: 'sc_mcacc-adf-devopssp'
              subscriptionId: $(DataFactorySubscriptionId)
              action: 'Create Or Update Resource Group'
              resourceGroupName: $(DataFactoryResourceGroupName)
              location: 'West Europe'
              templateLocation: 'Linked artifact'
              csmFile: '$(Pipeline.Workspace)/ArmTemplatesArtifact/ARMTemplateForFactory.json'
              csmParametersFile: '$(Pipeline.Workspace)/ArmTemplatesArtifact/ARMTemplateParametersForFactory.json'
              overrideParameters: > 
                -factoryName $(DataFactoryName)
                -LS_ABLB_bitools_connectionString $(AzureBlobConnectionString)
                -Trigger_Master_endTime $(AzureDataFactoryTriggerEndTimeActive)
              deploymentMode: 'Incremental'

            env: 
                SYSTEM_ACCESSTOKEN: $(System.AccessToken)
And if you're using the Release pipelines with the ARM template deployment task then you can just go to the Override template parameters property, click on the edit button and replace the value with a new value or a variable from a variable group.
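The value in that text box is roughly the same string as the overrideParameters value in the YAML above, for example:

-factoryName $(DataFactoryName) -LS_ABLB_bitools_connectionString $(AzureBlobConnectionString) -Trigger_Master_endTime $(AzureDataFactoryTriggerEndTimeActive)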

ARM template deployment - Override template parameters


















Conclusion
In this post you learned how to add and override properties of a trigger during deployment via Azure DevOps. This allows you to activate or deactivate a trigger for a specific environment during deployment using ARM templates, without writing any code.

In a previous post we showed you how to accomplish this for a Linked Service in combination with Azure Key Vault.

Thursday 18 November 2021

ADF Release - Use script to enable certain Triggers

Case
During deployment of Azure Data Factory (ADF) via Azure DevOps pipelines I want to make sure that a certain trigger is only executed on Production and not on the lower environments like acceptance or test. How can we accomplish that without any manual operations? 
ADF Trigger
























Solution
This is possible with an extra PowerShell step. The standard deployment stage consists of three steps:
  • a pre-deployment script that stops all triggers.
  • the actual deployment
  • a post-deployment script that starts all triggers and cleans up old parts.
You could adjust the standard pre- and post deployment PowerShell script from Microsoft or create an additional PowerShell script if you don't want to mess around with the standard script from Microsoft. 

1) PowerShell
Below is that additional script. Feel free to merge it with the standard script. The PowerShell file should be stored in the repository in the \CICD\PowerShell folder (see the setup post).
PowerShell file for setting trigger status
























The new PowerShell script has five parameters which will be provided by the YAML pipeline (or release pipeline):
  1. DataFactoryName
    [string] Name of your Data Factory
  2. DataFactoryResourceGroup
    [string] Name of the Resource Group holding your ADF
  3. DataFactorySubscriptionId
    [string] Guid of the Azure Subscription hosting your ADF
  4. DisableAllTriggers
    [boolean] True or false indicating whether all triggers should be disabled (except triggers mentioned in next parameter)
  5. EnabledTriggers
    [string] Comma separated list with triggernames that should be enabled: "trigger1,trigger2"
The script consists of three parts. The first part checks all parameters. If one of them is incorrect then the script fails and stops. The second part is the optional disabling of all triggers (except the ones that we need enabled) and the last part of the script checks the list of triggers that should be enabled. If they are still disabled they will be enabled.
param
(
    [parameter(Mandatory = $true)] [String] $DataFactoryName,
    [parameter(Mandatory = $true)] [String] $DataFactoryResourceGroup,
    [parameter(Mandatory = $true)] [String] $DataFactorySubscriptionId,
    [parameter(Mandatory = $false)] [Bool] $DisableAllTriggers = $true,
    [parameter(Mandatory = $true)] [String] $EnabledTriggers # comma separated list
)



##############################################
# Check provided information
##############################################
$ErrorActionPreference = "Stop"

# Setting one subscription as active (fails with a non-existing subscription)
Write-Host "Checking existence of Subscription Id [$($DataFactorySubscriptionId)]."
$Subscription = Get-AzSubscription -SubscriptionId $DataFactorySubscriptionId `
                                   -WarningAction Ignore
Write-Host "- Subscription [$($Subscription.Name)] found."
Set-AzContext -Subscription $DataFactorySubscriptionId `
              -WarningAction Ignore > $null
Write-Host "- Subscription [$($Subscription.Name)] is active."


# Checking whether resource group exists (fails with non existing)
Write-Host "Checking existance Resource Group [$($DataFactoryResourceGroup)]."
Get-AzResourceGroup -Name $DataFactoryResourceGroup > $null
Write-Host "- Resource Group [$($DataFactoryResourceGroup)] found."


# Checking whether provided data factory exists (fails with non existing)
Write-Host "Checking existance Data Factory [$($DataFactoryName)]."
Get-AzDataFactoryV2 -ResourceGroupName $DataFactoryResourceGroup `
                    -Name $DataFactoryName > $null
Write-Host "- Data Factory [$($DataFactoryName)] found."


# Checking provided triggernames, first split into array
$EnabledTriggersArray = $EnabledTriggers.Split(",")
Write-Host "Checking existance of ($($EnabledTriggersArray.Count)) provided triggernames."


# Loop through all provided triggernames
foreach ($EnabledTrigger in $EnabledTriggersArray)
{ 
    # Get Trigger by name
    $CheckTrigger = Get-AzDataFactoryV2Trigger -ResourceGroupName $DataFactoryResourceGroup `
                                               -DataFactoryName $DataFactoryName `
                                               -Name $EnabledTrigger `
                                               -ErrorAction Ignore # To be able to provide more detailed error

    # Check if trigger was found
    if (!$CheckTrigger)
    {
        throw "Trigger $($EnabledTrigger) not found in data dactory $($DataFactoryName) within resource group $($DataFactoryResourceGroup)"
    }
}
Write-Host "- All ($($EnabledTriggersArray.Count)) provided triggernames found in data dactory $($DataFactoryName) within resource group $($DataFactoryResourceGroup)"



##############################################
# Disable triggers
##############################################
# Check if all trigger should be disabled
if ($DisableAllTriggers)
{
    # Get all enabled triggers and stop them (unless they should be enabled)
    Write-Host "Getting all enabled triggers that should be disabled."
    $CurrentTriggers = Get-AzDataFactoryV2Trigger -ResourceGroupName $DataFactoryResourceGroup `
                                                   -DataFactoryName $DataFactoryName `
                       | Where-Object {$_.RuntimeState -ne 'Stopped'} `
                       | Where-Object {$EnabledTriggersArray.Contains($_.Name) -eq $false}

    # Loop through all found triggers
    Write-Host "- Number of triggers to disable: $($CurrentTriggers.Count)."
    foreach ($CurrentTrigger in $CurrentTriggers)
    {
        # Stop trigger
        Write-Host "- Stopping trigger [$($CurrentTrigger.Name)]."
        Stop-AzDataFactoryV2Trigger -ResourceGroupName $DataFactoryResourceGroup -DataFactoryName $DataFactoryName -Name $CurrentTrigger.Name -Force > $null
    }
}



##############################################
# Enable triggers
##############################################
# Loop through provided triggernames and enable them
Write-Host "Enable all ($($EnabledTriggersArray.Count)) provided triggers."
foreach ($EnabledTrigger in $EnabledTriggersArray)
{                   
    # Get trigger details
    $CheckTrigger = Get-AzDataFactoryV2Trigger -ResourceGroupName $DataFactoryResourceGroup `
                                               -DataFactoryName $DataFactoryName `
                                               -Name $EnabledTrigger

    # Check status of trigger
    if ($CheckTrigger.RuntimeState -ne "Started")
    {
        Write-Host "- Trigger [$($EnabledTrigger)] starting"
        Start-AzDataFactoryV2Trigger -ResourceGroupName $DataFactoryResourceGroup `
                                     -DataFactoryName $DataFactoryName `
                                     -Name $EnabledTrigger `
                                     -Force > $null
    }
    else
    {
        Write-Host "- Trigger [$($EnabledTrigger)] already started"
    }
}

2) YAML Pipeline
You can now extend the existing YAML pipeline with an extra step. Make sure that all parameters for this script are available as variables in the variable group (under Pipelines, Library) and make sure to pass them to the second YAML pipeline as parameters. If you followed the previous blogs then you only need to add EnabledTriggers as a variable and as a YAML parameter (see the sketch after the task below).
          ###################################
          # Enable certain triggers and disable rest
          ###################################
          - task: AzurePowerShell@5
            displayName: '6 Enable certain triggers and disable rest'
            inputs:
              azureSubscription: 'sc_adf-devopssp'
              pwsh: true
              azurePowerShellVersion: LatestVersion
              scriptType: filePath
              scriptPath: '$(Pipeline.Workspace)\s\CICD\powershell\SetTriggers.ps1'
              scriptArguments: > # Use this to avoid newline characters in multiline string
                -DataFactoryName $(DataFactoryName)
                -DataFactoryResourceGroup $(DataFactoryResourceGroupName)
                -DataFactorySubscriptionId $(DataFactorySubscriptionId)
                -DisableAllTriggers $true
                -EnabledTriggers $(EnabledTriggers) # format: "prd_daily_4am,prd_daily_1pm"
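The matching change in the YAML files is small; a sketch (based on the parameter style used in the earlier posts of this series):

# First YAML file: pass the value from the variable group on to the second YAML file
- template: DeployADF.yml
  parameters:
    EnabledTriggers: $(EnabledTriggers)

# Second YAML file: declare the matching parameter
parameters:
  - name: EnabledTriggers
    displayName: Enabled Triggers
    type: string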
The result of running the pipeline










Conclusion

In this post you learned how to enable only certain triggers for a specific environment. This makes it easy to create a trigger in development that is only meant for the production environment. The downside (for some) is of course that you get an extra piece of code to maintain. In a next post we will show that you can also accomplish this without writing code via the ARM template. However, the trigger property runtimeState cannot be set via the ARM template, so a workaround is necessary for the no-code variant.


Sunday 14 November 2021

ADF Release - Update Linked Service while deploying

Case
I'm deploying Azure Data Factory via DevOps pipelines through my DTAP environment. During the deployment I want to change the URL of the Linked Service from Azure Key Vault to point it to the Key Vault of that specific environment. How do I change that Linked Service in DevOps?
ADF Linked Service






















Solution
This is possible by changing the ARM template parameter definition, which in turn will turn certain properties into overridable parameters during deployment. There is one downside: you cannot create a parameter for one specific Linked Service, because it will work for all Linked Services with that same property. However, you can narrow it down to one particular type of Linked Service (all Key Vaults in this example).

1) Check Linked Service
For this example we will override the URL of the Azure Key Vault Linked Service, but first we need to find the actual property name that we want to override.
  • In ADF Studio go to Manage (toolbox icon in left menu) and then to Linked Services.
  • Now find your Key Vault Linked Service, hover your mouse over it and click on the code icon {}.
  • Now check which property identifies a specific Key Vault. It should be within the typeProperties tag. In this case the baseUrl property contains a URL that points to one specific Key Vault.
baseURL property











Also notice the type property, which will be used further on: AzureKeyVault.
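As a sketch (the Linked Service name and URL are just examples), the code of such a Key Vault Linked Service looks roughly like this:

{
    "name": "ls_kv_bitools",
    "properties": {
        "annotations": [],
        "type": "AzureKeyVault",
        "typeProperties": {
            "baseUrl": "https://bitools-dev.vault.azure.net/"
        }
    }
}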


2) ARM template
Next step is to make this property overridable in the ARM template parameter definition (arm-template-parameters-definition.json). First the easiest way:
  • Under the same Manage menu item as step 1 go to ARM template
  • Click on Edit parameter configuration
  • Now find the baseUrl property within the Linked Services tag and change its value from "=" to "-" and then click on the OK button
Making baseUrl overridable
























This will create a new parameter with the name: [LinkedServiceName]_properties_typeProperties_baseUrl

As mentioned before, this will now work for all Linked Services that have a (filled) property called baseUrl. A bit nicer is to instead create a Key Vault specific parameter by adding a piece of JSON code below the general tag with the *. The name 'AzureKeyVault' in the code below can be found in the code of step 1.
 
        "AzureKeyVault": {
            "properties": {
                "typeProperties": {
                    "baseUrl": "-"
                }
            }
        },
























This will result in the same (long) parameter name, but now only for Linked Services pointing to Azure Key Vault. We can shorten that very long parameter name by adding -:-BaseUrl, where the colon : is the separator for the next part of the property: the name of the parameter. Adding a minus - in front of that name will remove _properties_typeProperties from the parameter name and shorten it to:
[LinkedServiceName]_baseUrl
        "AzureKeyVault": {
            "properties": {
                "typeProperties": {
                    "baseUrl": "-:-BaseUrl"
                }
            }
        }
This is much nicer. More info about this can be found in the documentation.

3) Adjust release pipeline
If you are using YAML to publish the changes then the only thing you have to change is the overrideParameters property by adding the new parameter ls_kv_bitools_baseUrl with either a variable or a hardcoded value. The > behind the property helps you to break the string over multiple lines and keeps the YAML code more readable.
          ###################################
          # Deploy ADF Artifact
          ###################################
          - task: AzureResourceManagerTemplateDeployment@3
            displayName: '4 Deploy ADF Artifact'
            inputs:
              deploymentScope: 'Resource Group'
              azureResourceManagerConnection: 'sc_mcacc-adf-devopssp'
              subscriptionId: $(DataFactorySubscriptionId)
              action: 'Create Or Update Resource Group'
              resourceGroupName: $(DataFactoryResourceGroupName)
              location: 'West Europe'
              templateLocation: 'Linked artifact'
              csmFile: '$(Pipeline.Workspace)/ArmTemplatesArtifact/ARMTemplateForFactory.json'
              csmParametersFile: '$(Pipeline.Workspace)/ArmTemplatesArtifact/ARMTemplateParametersForFactory.json'
              overrideParameters: > 
                -factoryName $(DataFactoryName) 
                -ls_kv_bitools_baseUrl "https://bitools-prd.vault.azure.net/"
              deploymentMode: 'Incremental'

            env: 
                SYSTEM_ACCESSTOKEN: $(System.AccessToken)
If you are not sure which parameter names you can use in the YAML, then you can look up the name by exporting the ARM template under ARM template. Then check arm_template.json or arm_template_parameters.json.
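For example, with the definition from step 2 the exported parameters file might then contain an entry roughly like this (name and URL based on the example above):

"ls_kv_bitools_baseUrl": {
    "value": "https://bitools-dev.vault.azure.net/"
}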
export template to find parametername













And if you're using the Release pipelines with the ARM template deployment task then you can just go to the Override template parameters property, click on the edit button and replace the value with a new value or a variable from a variable group.
ARM template deployment - Override template parameters

















Conclusion
In this post you learned how to override properties of a Linked Service during deployment via Azure DevOps. This allows you to point your Linked Service to the correct service for that specific environment. The most likely candidate for this is probably the Linked Service pointing to Azure Key Vault where you store all other connection details.

In a previous post we also showed you how to change Global Parameters during deployment and in a next post we will show you how to change triggers during deployment because you probably don't want to use the same triggers on development, test, acceptance and production.