Monday, 1 November 2021

ADF Release - Create YAML CICD Pipeline - part 2

Case
How do you deploy Azure Data Factory via a YAML pipeline instead of the Release pipeline?
Release ADF pipelines with YAML pipelines

Solution
In a previous post we used a YAML pipeline to create an ARM template for ADF. That ARM template is now available as an artifact and ready for deployment. That previous post ended with calling the release part of the pipeline, which lives in a separate YAML file. This makes it easier to call that same YAML file for test, acceptance and production. Below is the last part of the main pipeline:
###################################
# Deploy Test environment
###################################
- stage: DeployTest
  displayName: Deploy Test
  variables:
  - group: ParamsTst
  pool:
    vmImage: 'windows-latest'
  condition: Succeeded()
  jobs:
    - template: deployADF.yml
      parameters:
        env: tst
        DataFactoryName: $(DataFactoryName)
        DataFactoryResourceGroupName: $(DataFactoryResourceGroupName)
        DataFactorySubscriptionId: $(DataFactorySubscriptionId)
        
###################################
# Deploy Acceptance environment
###################################
- stage: DeployAcceptance
  displayName: Deploy Acceptance
  variables:
  - group: ParamsAcc
  pool:
    vmImage: 'windows-latest'
  condition: Succeeded()
  jobs:
    - template: deployADF.yml
      parameters:
        env: acc
        DataFactoryName: $(DataFactoryName)
        DataFactoryResourceGroupName: $(DataFactoryResourceGroupName)
        DataFactorySubscriptionId: $(DataFactorySubscriptionId)
        
###################################
# Deploy Production environment
###################################
- stage: DeployProduction
  displayName: Deploy Production
  variables:
  - group: ParamsPrd
  pool:
    vmImage: 'windows-latest'
  condition: Succeeded()
  jobs:
    - template: deployADF.yml
      parameters:
        env: prd
        DataFactoryName: $(DataFactoryName)
        DataFactoryResourceGroupName: $(DataFactoryResourceGroupName)
        DataFactorySubscriptionId: $(DataFactorySubscriptionId)

In this blog we will create the deployADF.yml file mentioned in the YAML code above, but first we need to give the Service Principal (SP) used by the Azure DevOps service connection access to the target Data Factories, otherwise it can't release the ARM template.


1) Access control (IAM)
In this first step we will give the SP access to the target ADF. You have to repeat that for all target factories, but first you have to decide whether you want to give the Service Principal access to a specific service (like ADF), to the resource group or even to the entire subscription.

In most cases you want to limit the access to the bare minimum to avoid misuse. Since there is no dedicated ADF deployment task we are using the more general AzureResourceManagerTemplateDeployment task. One downside of this task is that you need to grant permissions on at least the resource group. Access to ADF alone is not enough and will give you an error: Failed to check the resource group status. Error: {"statusCode":403}.

The next thing to keep in mind is the role to assign. You want to avoid the Owner role to limit the possibilities for misuse; in this case the Contributor role is sufficient. You can assign it via the Azure portal (steps below) or script it, as shown in the sketch after these steps.
  • In the Azure portal go to the resource group where your ADF is located 
  • Click on Access control (IAM) in the left menu
  • Click on +Add and then on Add role assignment
  • Search for the appropriate Azure role (this screen recently changed, but you can still use the classic experience via the link). Click on Contributor and press Next.
  • Click on +Select members and search for your SP, click on the account and then press Select
  • Optionally add a description and press Next and then Review + assign
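
If you prefer to script this role assignment instead of clicking through the portal, a minimal PowerShell sketch could look like the one below. It assumes the Az PowerShell module is installed, you are signed in with an account that is allowed to assign roles, and it uses a placeholder Application (client) Id and resource group name.

# Minimal sketch, assuming the Az PowerShell module is installed and you are signed in
# with an account that may assign roles. The GUID and resource group name are placeholders.
Connect-AzAccount

New-AzRoleAssignment `
  -ApplicationId "00000000-0000-0000-0000-000000000000" `
  -RoleDefinitionName "Contributor" `
  -ResourceGroupName "rg-datafactory-tst"

Repeat this for every target resource group (test, acceptance and production).
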
Contributor role in Resource Group for SP

2) Add additional YAML file
The next step is to add the second YAML file, which does the actual deployment of ADF, to the repository. Use the same repository folder as the existing YAML file (the CICD\YAML folder). Splitting up the deployment allows you to reuse the deployment code for test, acceptance and production. The downside is that you have to edit it in the repository instead of under Pipelines (but you could also use the YAML extension for Visual Studio Code).

The second YAML file consists of four parts plus an optional treeview task to check where all your files are located on the agent:
  A. Parameters and environment
  B. Treeview
  C. Stop triggers
  D. Deploy ADF
  E. Cleanup and start triggers


A. Parameters and environment
This YAML file starts with parameters that will be filled by the main YAML file. As an alternative you could just use the variable group added in the main pipeline, because those variables are also available in the called YAML template. There are four string parameters, of which only env has a list of expected/allowed values:
  • env (name of the environment: tst, acc or prd)
  • DataFactoryName
  • DataFactoryResourceGroupName
  • DataFactorySubscriptionId
parameters:
  - name: env
    displayName: Environment
    type: string
    values:
    - dev
    - tst
    - acc
    - prd
  - name: DataFactoryName
    displayName: Data Factory Name
    type: string
  - name: DataFactoryResourceGroupName
    displayName: Data Factory Resource Group Name
    type: string
  - name: DataFactorySubscriptionId
    displayName: Data Factory Subscription Id
    type: string

We also give the job a name and create an environment. A list of environments can be found under Pipelines - Environments. This is also the place where you can add Approvals and Checks, which are not available in the YAML language. The checkout is optional, but very handy when you, for example, have some custom PowerShell scripts in the repository that you want to execute before, during or after deployment.

jobs:
  - deployment: deploymentjob${{ parameters.env }}
    displayName: Deployment Job ${{ parameters.env }} 
    environment: deploy ${{ parameters.env }}
    strategy:
      runOnce:
        deploy:
          steps:
          - checkout: self
            displayName: '1 Retrieve Repository'
            clean: true 

B. Treeview
This treeview task is just for debugging. It allows you, for example, to see where the artifact is located on your agent, which makes it much easier to configure the following steps. When you have finished the pipeline, just delete this task from the steps or comment it out until you need it again.
          ###################################
          # Show environment and treeview
          ###################################
          - powershell: |
              Write-Output "This is the ${{ parameters.env }} environment"
              tree "$(Pipeline.Workspace)" /F
            displayName: '2 Show environment and treeview Pipeline_Workspace'

C. Stop triggers
Before we can deploy the created ARM template to our ADF we need to make sure nothing is running. Microsoft created a PowerShell script for this, which is included in the generated ARM template folder. It can stop all triggers before the deployment (and start them again afterwards).
PrePostDeploymentScript.ps1

          ###################################
          # Stop triggers
          ###################################
          - task: AzurePowerShell@5
            displayName: '3 Stop triggers'
            inputs:
              azureSubscription: 'sc_adf-devopssp'
              pwsh: true
              azurePowerShellVersion: LatestVersion
              scriptType: filePath
              scriptPath: '$(Pipeline.Workspace)\ArmTemplatesArtifact\PrePostDeploymentScript.ps1'
              scriptArguments: > # Use this to avoid newline characters in multiline string
                -armTemplate '$(Pipeline.Workspace)\ArmTemplatesArtifact\ARMTemplateForFactory.json'
                -ResourceGroupName $(DataFactoryResourceGroupName)
                -DataFactoryName $(DataFactoryName)
                -predeployment $true
                -deleteDeployment $false
If you're running your new pipeline and you're getting an error stating that it cannot find your resource group, while you're absolutely sure that it exists and that the SP has access to it, then please check this blog post. It shows you how to create a slightly adjusted copy of the script (with an extra subscription parameter) that is stored in the CICD\PowerShell folder.
          ###################################
          # Stop triggers
          ###################################
          - task: AzurePowerShell@5
            displayName: '3 Stop triggers'
            inputs:
              azureSubscription: 'sc_adf-devopssp'
              pwsh: true
              azurePowerShellVersion: LatestVersion
              scriptType: filePath
              scriptPath: '$(Pipeline.Workspace)\s\CICD\powershell\PrePostDeploymentADF.ps1'
              scriptArguments: > # Use this to avoid newline characters in multiline string
                -armTemplate '$(Pipeline.Workspace)\ArmTemplatesArtifact\ARMTemplateForFactory.json'
                -ResourceGroupName $(DataFactoryResourceGroupName)
                -DataFactoryName $(DataFactoryName)
                -predeployment $true
                -deleteDeployment $false
                -Subscription $(DataFactorySubscriptionId)

Stopping deployed triggers

D. Deploy ADF
Now it's finally time for the actual deployment of ADF. As mentioned above we are using the AzureResourceManagerTemplateDeployment@3 task. Check the documentation for a description of all parameters. We will mention one parameter explicitly: Deployment Mode. It is very important to keep this on Incremental! The Complete mode will delete everything in your resource group that is not mentioned in the ARM template. Since our template only contains ADF, you would end up with a nearly empty resource group with only ADF in it. This is a very common mistake, so consider yourself warned.
          ###################################
          # Deploy ADF Artifact
          ###################################
          - task: AzureResourceManagerTemplateDeployment@3
            displayName: '4 Deploy ADF Artifact'
            inputs:
              deploymentScope: 'Resource Group'
              azureResourceManagerConnection: 'sc_adf-devopssp'
              subscriptionId: $(DataFactorySubscriptionId)
              action: 'Create Or Update Resource Group'
              resourceGroupName: $(DataFactoryResourceGroupName)
              location: 'West Europe'
              templateLocation: 'Linked artifact'
              csmFile: '$(Pipeline.Workspace)/ArmTemplatesArtifact/ARMTemplateForFactory.json'
              csmParametersFile: '$(Pipeline.Workspace)/ArmTemplatesArtifact/ARMTemplateParametersForFactory.json'
              overrideParameters: '-factoryName $(DataFactoryName)'
              deploymentMode: 'Incremental'

            env: 
                SYSTEM_ACCESSTOKEN: $(System.AccessToken)
Deployment of ADF

E. Cleanup and start triggers
Because we did an incremental deployment, items that were deleted in Development are still present in the target ADF; only new and updated items have been changed. So we have to compare ADF with the ARM template and delete all items that are not in the template. Luckily Microsoft already created a script for this: the same script used to stop the triggers, just with different parameter values. It also starts the triggers again.
          ###################################
          # Start triggers and cleanup
          ###################################
          - task: AzurePowerShell@5
            displayName: '5 Start triggers and cleanup'
            inputs:
              azureSubscription: 'sc_adf-devopssp'
              pwsh: true
              azurePowerShellVersion: LatestVersion
              scriptType: filePath
              scriptPath: '$(Pipeline.Workspace)\ArmTemplatesArtifact\PrePostDeploymentScript.ps1'
              scriptArguments: > # Use this to avoid newline characters in multiline string
                -armTemplate $(Pipeline.Workspace)/ArmTemplatesArtifact/ARMTemplateForFactory.json
                -ResourceGroupName $(DataFactoryResourceGroupName)
                -DataFactoryName $(DataFactoryName)
                -predeployment $false
                -deleteDeployment $true
Note that the same issue with not finding your resource group will occur here if it occurred when stopping the triggers. The solution is also the same (the adjusted script with the extra subscription parameter).
Start triggers and cleanup

3) The result
Now it's time to run the pipeline from start to end by making changes to the Development Data Factory. And in no time all factories are updated.
The Result

Conclusion
In this blog post you learned how to use YAML to do the (build and) deployment of ADF in a pipeline. The key takeaway is to use the Incremental deployment mode and never set it to Complete, to avoid those shocked looks when viewing your suddenly empty resource group.
In a next post we will show how to override global parameters and change Linked Services during deployment, and how to enable or disable certain ADF triggers depending on the environment. This allows you to have different active triggers in Development, Test, Acceptance and Production without setting them manually after deployment.

Now all YAML parts together:


parameters:
  - name: env
    displayName: Environment
    type: string
    values:
    - dev
    - tst
    - acc
    - prd
  - name: DataFactoryName
    displayName: Data Factory Name
    type: string
  - name: DataFactoryResourceGroupName
    displayName: Data Factory Resource Group Name
    type: string
  - name: DataFactorySubscriptionId
    displayName: Data Factory Subscription Id
    type: string

jobs:
  - deployment: deploymentjob${{ parameters.env }}
    displayName: Deployment Job ${{ parameters.env }} 
    environment: deploy ${{ parameters.env }}
    strategy:
      runOnce:
        deploy:
          steps:
          - checkout: self
            displayName: '1 Retrieve Repository'
            clean: true 

          ###################################
          # Show environment and treeview
          ###################################
          - powershell: |
              Write-Output "This is the ${{ parameters.env }} environment"
              tree "$(Pipeline.Workspace)" /F
            displayName: '2 Show environment and treeview Pipeline_Workspace'
            
          ###################################
          # Stop triggers
          ###################################
          - task: AzurePowerShell@5
            displayName: '3 Stop triggers'
            inputs:
              azureSubscription: 'sc_adf-devopssp'
              pwsh: true
              azurePowerShellVersion: LatestVersion
              scriptType: filePath
              scriptPath: '$(Pipeline.Workspace)\ArmTemplatesArtifact\PrePostDeploymentScript.ps1'
              scriptArguments: > # Use this to avoid newline characters in multiline string
                -armTemplate '$(Pipeline.Workspace)\ArmTemplatesArtifact\ARMTemplateForFactory.json'
                -ResourceGroupName $(DataFactoryResourceGroupName)
                -DataFactoryName $(DataFactoryName)
                -predeployment $true
                -deleteDeployment $false
                
          ###################################
          # Deploy ADF Artifact
          ###################################
          - task: AzureResourceManagerTemplateDeployment@3
            displayName: '4 Deploy ADF Artifact'
            inputs:
              deploymentScope: 'Resource Group'
              azureResourceManagerConnection: 'sc_adf-devopssp'
              subscriptionId: $(DataFactorySubscriptionId)
              action: 'Create Or Update Resource Group'
              resourceGroupName: $(DataFactoryResourceGroupName)
              location: 'West Europe'
              templateLocation: 'Linked artifact'
              csmFile: '$(Pipeline.Workspace)/ArmTemplatesArtifact/ARMTemplateForFactory.json'
              csmParametersFile: '$(Pipeline.Workspace)/ArmTemplatesArtifact/ARMTemplateParametersForFactory.json'
              overrideParameters: '-factoryName $(DataFactoryName)'
              deploymentMode: 'Incremental'

            env: 
                SYSTEM_ACCESSTOKEN: $(System.AccessToken)
                
          ###################################
          # Start triggers and cleanup
          ###################################
          - task: AzurePowerShell@5
            displayName: '5 Start triggers and cleanup'
            inputs:
              azureSubscription: 'sc_adf-devopssp'
              pwsh: true
              azurePowerShellVersion: LatestVersion
              scriptType: filePath
              scriptPath: '$(Pipeline.Workspace)\ArmTemplatesArtifact\PrePostDeploymentScript.ps1'
              scriptArguments: > # Use this to avoid newline characters in multiline string
                -armTemplate $(Pipeline.Workspace)/ArmTemplatesArtifact/ARMTemplateForFactory.json
                -ResourceGroupName $(DataFactoryResourceGroupName)
                -DataFactoryName $(DataFactoryName)
                -predeployment $false
                -deleteDeployment $true

Sunday, 31 October 2021

ADF Build - Create YAML CICD Pipeline - part 1

Case
When deploying Azure Data Factory (ADF) via Azure DevOps we often forget to hit the publish button in the Data Factory UI and then the changes won't be deployed. Is there an easier solution?
No Publish required anymore!

Solution
This year Microsoft released the ARM template export option that doesn't require manually hitting the publish button in the Data Factory UI. This makes the Continuous Integration experience much better. In this blog post we will create a complete YAML pipeline to build and deploy ADF to your DTAP environments.

Prerequisites
  • A Service Principal in the Azure Active Directory
  • At least two Azure Data Factories (Dev and Prd)
  • An Azure DevOps project
  • Basic or Visual Studio Enterprise/Pro license
    (Stakeholder is not enough, but the first 5 users are free)
Note: At least three ADF environments are recommended. Otherwise you will always do the first deployment to production without testing the deployment itself.

1) Connect ADF Dev to a repository
First we need to connect the Development Data Factory to a Git repository. The other ADFs (test, acceptance and production) will not be connected to the Git repository. You can do this when creating your ADF or skip that step and change the settings afterwards.
  • Go to your ADF and open ADF studio. On the left side click on Manage (toolbox icon)
  • Under Source Control click on Git configuration and then on the Configure button
  • Select Azure DevOps Git as Repository type
  • Select your Azure Active Directory and then click on the Continue button
  • Now select your Azure DevOps organization name, the Project name and the Repository name. Note: you can create a new/separate repository for your Azure Data Factory within your Azure DevOps project settings.
  • Next step is to select the Collaboration Branch. In most cases this is the Main branch, but some organizations use different names (and different branch strategies).
  • Leave the Publish branch on adf_publish (we will not use it, no more publish buttons)
  • The Root folder is also something to consider changing. Since we don't want everything of ADF in the root of the repository we will use /ADF/ as root folder. Now all pipelines, datasets, dataflows, etc. will be stored in a subfolder called ADF.
  • Uncheck the import radio button since this is a new configuration/start situation and then hit the Apply button
  • Go to the Repository in your Azure DevOps project and check the changes. You should see some new folders and files.
Configure GIT in ADF

ADF in the repository

2) ARM parameter configuration in ADF
The next step is not required, but skipping it will give you an annoying error (or warning) during the build of your project in DevOps stating that it cannot find the file arm-template-parameters-definition.json and will use a default instead. See this blog post for more details.
  • In the same menu as before, click on ARM template under Source Control
  • Then click on Edit parameter configuration to open the editor
  • Now you will see the filename mentioned above and the content of the JSON file.
  • For now we will use the default content and click on the OK button. In a later post we will show you what you can accomplish by changing this file.
  • Go to the Repository in your Azure DevOps project and check the changes. You should now see the new JSON file.
arm-template-parameters-definition.json

3) Create repository folder CICD
The repository now contains a folder called ADF containing all components of ADF. The next step is to create another folder in the root called CICD (continuous integration / continuous deployment). This folder will be used to store files that are used during the deployment of ADF but are not part of ADF itself, like the YAML and PowerShell files. You can of course change that name, but keep in mind that changing it affects a lot of the following steps.

Also note that you cannot create empty folders in a Git repository and that you always have to create a file when creating a folder. For this example call the file readme.md and use it to add documentation about the ADF CICD process (or to add a link to this page).

Within the CICD folder create a YAML and a PowerShell folder. These will be filled later on in the process. For now add another readme file in those folders, which you can delete after you have added other files later on (see the sketch below for a scripted version of these steps).
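
If you have the repository cloned locally, a minimal PowerShell sketch of creating this folder structure could look like this (the readme texts and commit message are just examples):

# Run from the root of a local clone of the repository.
# Git does not track empty folders, so every folder gets a placeholder readme file.
New-Item -ItemType Directory -Path .\CICD\YAML, .\CICD\PowerShell -Force | Out-Null
Set-Content -Path .\CICD\readme.md -Value "Documentation about the ADF CICD process"
Set-Content -Path .\CICD\YAML\readme.md -Value "Placeholder, delete after adding the YAML files"
Set-Content -Path .\CICD\PowerShell\readme.md -Value "Placeholder, delete after adding the scripts"
git add CICD
git commit -m "Add CICD folder structure"
git push
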
Required folder structure in devops

4) Add package.json for Npm
The next step is to add a file to the repository that contains information about an ADF npm package for Node.js, which we will use later on. Within the CICD folder create a subfolder called packages and call the new file package.json.

Copy and paste the following JSON message to the new file and commit the changes to the repository.
{
    "scripts":{
        "build":"node node_modules/@microsoft/azure-data-factory-utilities/lib/index"
    },
    "dependencies":{
        "@microsoft/azure-data-factory-utilities":"^0.1.5"
    }
} 
package.json

5) Add publish_config.json
The next step is (again) not required, but skipping it will give you another annoying error (or warning) during the build of your project in DevOps stating that it cannot find the file publish_config.json. See this post for more details. Summary:
  • Add a file called publish_config.json to the repository in the root of the ADF folder
  • The content of the file should be:
    {"publishBranch": "factory/adf_publish"}
Add the publishing branch in the following format

6) Add Service Connection
Now it's time to create a service connection in Azure DevOps with the Service Principal (SP) mentioned in the prerequisites. You will need read access to the AAD to see this SP. This SP account will be used to deploy ADF. Make sure you have all the details, like the Service Principal Id, Service Principal Key and Tenant Id.
  • Within your DevOps project go to the Project Settings (bottom-left)
  • Go to Service connections under Pipelines
  • Click on New service connection, Choose Azure Resource Manager as type and click next
  • Choose Service principal (manual) 
  • Now fill in the Service principal details and enter a name (and description) for the service connection. Make sure to Verify the SP details and click save.
Add Service Connection with SP

7) Add Variable Groups
To supply all the parameters (variables) for the YAML script we need one general Variable Group (called ParamsGen) and one for each ADF environment (called ParamsDev, ParamsTst, ParamsAcc and ParamsPrd).
  • In your Azure DevOps project go to Library in the Pipelines section
  • Click on + Variable Group to create a new Variable Group called ParamsGen
  • Add the following Variables:
    • PackageLocation (path in Repos of package location: /CICD/packages/)
    • ArmTemplateFolder (subfolder for the generated ARM template: ArmTemplateOutput)
  • Click on + Variable Group to create a new Variable Group called ParamsDev
  • Add the following Variables:
    • DataFactoryName (name of your Dev ADF connected to the Repos)
    • DataFactoryResourceGroupName (resource group of your Dev ADF)
    • DataFactorySubscriptionId (Azure subscription of your Dev ADF)
  • Repeat the previous step for all the environments where you want to deploy ADF (Tst/Acc/Prd). If you prefer to script this, see the sketch below these steps.
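
As an alternative to the Library UI you could script the variable groups with the Azure DevOps CLI. A minimal sketch, assuming the azure-devops extension of the Azure CLI is installed and you are logged in; the organization, project and variable values are placeholders:

# Placeholders: replace the organization, project and variable values with your own.
az devops configure --defaults organization=https://dev.azure.com/myorg project=MyProject
az pipelines variable-group create --name ParamsGen --variables PackageLocation=/CICD/packages/ ArmTemplateFolder=ArmTemplateOutput
az pipelines variable-group create --name ParamsDev --variables DataFactoryName=MyDevADF DataFactoryResourceGroupName=MyDevResourceGroup DataFactorySubscriptionId=00000000-0000-0000-0000-000000000000
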
Variable Groups

8) Add YAML Build pipeline
Now it's time to create an actual pipeline. Either create a new pipeline under the Pipelines section in DevOps, enter all the required YAML code and save it in the CICD\YAML folder of the repository, or first create a YAML file in the CICD\YAML repository folder and then create a new pipeline based on that existing file from the repository.

The first part of the pipeline, where we generate the ARM template that we will later deploy to the other Data Factories, consists of three parts:
  1. Variables
  2. Trigger
  3. Stages
A. Variables
The YAML file starts with including the variable groups ParamsGen and ParamsDev. The other groups will be included in the deployment part.
###################################
# General Variables
###################################
variables:
  - group: ParamsGen
  - group: ParamsDev

B. Trigger
Also at the start of the YAML file is the trigger. The trigger configuration determines when the pipeline will run. In this case it will run when something changes in the main branch, but it will ignore changes in the CICD folder. It's also possible to include the ADF folder instead; then it will only be triggered by changes in that folder (see the sketch after the code below).
###################################
# When to create a pipeline run
###################################
trigger:
  branches:
    include: # Collaboration branch
    - main 
  paths:
    exclude:
    - CICD/*
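
A sketch of that alternative trigger, which only fires on changes inside the ADF folder (same main collaboration branch assumed):
###################################
# Alternative: only run on ADF changes
###################################
trigger:
  branches:
    include: # Collaboration branch
    - main
  paths:
    include:
    - ADF/*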

C. Stages
The last part of the start is configuring the first stage and its job. Besides naming those parts you also see the workspace clean all setting, which will empty/clean the workspace folder. This is especially useful when you are hosting your own agent. The pool determines which agent will run the YAML pipeline. In this example it will use a Microsoft-hosted Windows-based agent or (see the comment) an Ubuntu-based agent.
stages:
###################################
# Create Artifact of ADF files
###################################
- stage: CreateADFArtifact
  displayName: Create ADF Artifact
  jobs:
  - job: CreateArtifactJob
    workspace:
      clean: all
    pool:
      vmImage: 'windows-latest' #'ubuntu-latest'
    steps:

And now it's time for the first Stage called Create ADF Artifact, which consists of 6 steps and an optional debug step:
  1. Retrieve files from repository
  2. Install Node.js on agent
  3. Install npm package on agent
  4. Validate ADF
  5. Generate ARM template
  6. Publish ARM template as artifact
  7. Debugging, show treeview or file content

I: Retrieve Repository
The first step is to retrieve all files from the repository to the agent that is running the pipeline. This is done via the checkout step, which has a couple of options like checkout type and clean.
    ###################################
    # 1 Retrieve Repository
    ###################################
    - checkout: self
      displayName: '1 Retrieve Repository'
      clean: true

II: Installs Node.js on agent
This example requires Node.js, so the next step is to install it on our agent via the NodeTool@0 task. Update: make sure to use a more recent version; you can now use 14.x or even 16.x (see the sketch below the code).
 
    ###################################
    # 2 Installs Node.js on agent
    ###################################
    - task: NodeTool@0
      displayName: '2 Install Node.js'
      inputs:
        versionSpec: '10.x'
        checkLatest: true    
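
Per the update note above, the same task pinned to a newer Node.js version would look like the sketch below (16.x is just an example of a newer LTS version):
    ###################################
    # 2 Installs Node.js on agent (newer version)
    ###################################
    - task: NodeTool@0
      displayName: '2 Install Node.js'
      inputs:
        versionSpec: '16.x'
        checkLatest: true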

III: Installs npm package for node.js on agent
The Npm@1 task with the command 'install' will install the ADF package mentioned in step 4. The working directory is a concatenation of the predefined variable Build.Repository.LocalPath and the user variable PackageLocation from step 7, for example: D:\a\1\s + /CICD/packages/
    ###################################
    # 3 Install npm package of ADF
    ###################################
    - task: Npm@1
      displayName: '3 Install npm package'
      inputs:
        command: 'install'
        workingDir: '$(Build.Repository.LocalPath)$(PackageLocation)'  # Working folder that contains package.json
        verbose: true
IV: Validate ADF
This step is the equivalent of the Validate all button in the GUI. The Npm@1 task with the custom command 'validate' will validate the content of the ADF folder against the Data Factory in Development. The resource URL for that ADF is concatenated from the variable values in the development variable group, which is included together with the general group at the start of the YAML file. You could skip this step because the next step (export) seems to validate the files before it exports the ARM template. However, it only takes a few seconds, so it doesn't really hurt us.
    ###################################
    # 4 Validate ADF in repository
    ###################################
    - task: Npm@1
      displayName: '4 Validate ADF'
      inputs:
        command: 'custom'
        workingDir: '$(Build.Repository.LocalPath)$(PackageLocation)' # Working folder that contains package.json
        customCommand: 'run build validate $(Build.Repository.LocalPath)/ADF /subscriptions/$(DataFactorySubscriptionId)/resourceGroups/$(DataFactoryResourceGroupName)/providers/Microsoft.DataFactory/factories/$(DataFactoryName)'

V: Generate ARM template
This step is the equivalent of the Publish button in the GUI and exports an ARM template for ADF. It is nearly the same as the previous step: validate has been replaced by export, and an export location has been added at the end.
    ###################################
    # 5 Generate ARM template from repos
    ###################################
    - task: Npm@1
      displayName: '5 Generate ARM template'
      inputs:
        command: 'custom'
        workingDir: '$(Build.Repository.LocalPath)$(PackageLocation)' # Working folder that contains package.json
        customCommand: 'run build export $(Build.Repository.LocalPath)/ADF /subscriptions/$(DataFactorySubscriptionId)/resourceGroups/$(DataFactoryResourceGroupName)/providers/Microsoft.DataFactory/factories/$(DataFactoryName) "$(ArmTemplateFolder)"'

VI: Publish ARM template as artifact
This step publishes the exported ARM template as a pipeline artifact. This ensures the ARM template can be used in a later stage.
    ###################################
    # 6 Publish ARM template as artifact
    ###################################
    - task: PublishPipelineArtifact@1
      displayName: '6 Publish ARM template as artifact'
      inputs:
        targetPath: '$(Build.Repository.LocalPath)$(PackageLocation)$(ArmTemplateFolder)' # The arm template export folder
        artifact: 'ArmTemplatesArtifact'
        publishLocation: 'pipeline'
VII: Debugging, show treeview or file content
A simple PowerShell step will show you a treeview of files and folders. This extra step is extremely useful because it shows you the current state of the agent. You could also add this step right after the Checkout step to see what the result is of that step. It really helps you to understand what happens after each step.
    ###################################
    # 7 Show treeview of agent
    ###################################
    - powershell: |
        tree "$(Pipeline.Workspace)" /F
        Write-host "--------------------ARMTemplateForFactory--------------------"
        Get-Content -Path $(Build.Repository.LocalPath)$(PackageLocation)$(ArmTemplateFolder)/ARMTemplateForFactory.json
        Write-host "-------------------------------------------------------------"
      displayName: '7 Treeview Workspace and ArmTemplateOutput content '

The result of running the build part of the pipeline

9) Add release part to YAML pipeline
In the next blog post we will create a separate YAML file which will do the actual deployment, including some pre- and post-deployment steps. That YAML file needs to know to which Data Factory we need to deploy the ARM template. If you include the specific variable group just before calling the second YAML file, then that file can use those variables, but you can also provide those values via parameters to the second YAML file. Providing the values via parameters is probably a bit cleaner, but also a bit more work, especially when there are a lot of parameters.
###################################
# Deploy Test environment
###################################
- stage: DeployTest
  displayName: Deploy Test
  variables:
  - group: ParamsTst
  pool:
    vmImage: 'windows-latest'
  condition: Succeeded()
  jobs:
    - template: deployADF.yml
      parameters:
        env: tst
        DataFactoryName: $(DataFactoryName)
        DataFactoryResourceGroupName: $(DataFactoryResourceGroupName)
        DataFactorySubscriptionId: $(DataFactorySubscriptionId)
        
###################################
# Deploy Acceptance environment
###################################
- stage: DeployAcceptance
  displayName: Deploy Acceptance
  variables:
  - group: ParamsAcc
  pool:
    vmImage: 'windows-latest'
  condition: Succeeded()
  jobs:
    - template: deployADF.yml
      parameters:
        env: acc
        DataFactoryName: $(DataFactoryName)
        DataFactoryResourceGroupName: $(DataFactoryResourceGroupName)
        DataFactorySubscriptionId: $(DataFactorySubscriptionId)
        
###################################
# Deploy Production environment
###################################
- stage: DeployProduction
  displayName: Deploy Production
  variables:
  - group: ParamsPrd
  pool:
    vmImage: 'windows-latest'
  condition: Succeeded()
  jobs:
    - template: deployADF.yml
      parameters:
        env: prd
        DataFactoryName: $(DataFactoryName)
        DataFactoryResourceGroupName: $(DataFactoryResourceGroupName)
        DataFactorySubscriptionId: $(DataFactorySubscriptionId)
Conclusion
In this post you learned how to avoid the need for the publish button in Azure Data Factory. This will give you a much better CICD experience. Unfortunately this is not yet available for Synapse. In the next post we will give the Service Principal access to the target Data Factories, stop the triggers, deploy the ARM template, do a little cleanup of old pipelines/datasets and then turn certain triggers on or off for that environment. Thx to colleague Roelof Jonkers for helping.

Now all YAML parts together:
###################################
# General Variables
###################################
variables:
  - group: ParamsGen
  - group: ParamsDev

###################################
# When to create a pipeline run
###################################
trigger:
  branches:
    include: # Collaboration branch
    - main 
  paths:
    exclude:
    - CICD/*

stages:
###################################
# Create Artifact of ADF files
###################################
- stage: CreateADFArtifact
  displayName: Create ADF Artifact
  jobs:
  - job: CreateArtifactJob
    workspace:
      clean: all
    pool:
      vmImage: 'windows-latest' #'ubuntu-latest'
    steps:
    - checkout: self
      displayName: '1 Retrieve Repository'
      clean: true

    ###################################
    # Installs Node.js on agent
    ###################################
    - task: NodeTool@0
      displayName: '2 Install Node.js'
      inputs:
        versionSpec: '10.x'
        checkLatest: true    

    ###################################
    # Install npm package of ADF
    ###################################
    - task: Npm@1
      displayName: '3 Install npm package'
      inputs:
        command: 'install'
        workingDir: '$(Build.Repository.LocalPath)$(PackageLocation)'  # Working folder that contains package.json
        verbose: true

    ###################################
    # Validate ADF in repository
    ###################################
    - task: Npm@1
      displayName: '4 Validate ADF'
      inputs:
        command: 'custom'
        workingDir: '$(Build.Repository.LocalPath)$(PackageLocation)' # Working folder that contains package.json
        customCommand: 'run build validate $(Build.Repository.LocalPath)/ADF /subscriptions/$(DataFactorySubscriptionId)/resourceGroups/$(DataFactoryResourceGroupName)/providers/Microsoft.DataFactory/factories/$(DataFactoryName)'

    ###################################
    # Generate ARM template from repos
    ###################################
    - task: Npm@1
      displayName: '5 Generate ARM template'
      inputs:
        command: 'custom'
        workingDir: '$(Build.Repository.LocalPath)$(PackageLocation)' # Working folder that contains package.json
        customCommand: 'run build export $(Build.Repository.LocalPath)/ADF /subscriptions/$(DataFactorySubscriptionId)/resourceGroups/$(DataFactoryResourceGroupName)/providers/Microsoft.DataFactory/factories/$(DataFactoryName) "$(ArmTemplateFolder)"'

    ###################################
    # Publish ARM template as artifact
    ###################################
    - task: PublishPipelineArtifact@1
      displayName: '6 Publish ARM template as artifact'
      inputs:
        targetPath: '$(Build.Repository.LocalPath)$(PackageLocation)$(ArmTemplateFolder)' # The arm template export folder
        artifact: 'ArmTemplatesArtifact'
        publishLocation: 'pipeline'

    ###################################
    # Show treeview of agent
    ###################################
    - powershell: |
        tree "$(Pipeline.Workspace)" /F
        Write-host "--------------------ARMTemplateForFactory--------------------"
        Get-Content -Path $(Build.Repository.LocalPath)$(PackageLocation)$(ArmTemplateFolder)/ARMTemplateForFactory.json
        Write-host "-------------------------------------------------------------"
      displayName: '7 Treeview Workspace and ArmTemplateOutput content '

###################################
# Deploy Test environment
###################################
- stage: DeployTest
  displayName: Deploy Test
  variables:
  - group: ParamsTst
  pool:
    vmImage: 'windows-latest'
  condition: Succeeded()
  jobs:
    - template: deployADF.yml
      parameters:
        env: tst
        DataFactoryName: $(DataFactoryName)
        DataFactoryResourceGroupName: $(DataFactoryResourceGroupName)
        DataFactorySubscriptionId: $(DataFactorySubscriptionId)
        
###################################
# Deploy Acceptance environment
###################################
- stage: DeployAcceptance
  displayName: Deploy Acceptance
  variables:
  - group: ParamsAcc
  pool:
    vmImage: 'windows-latest'
  condition: Succeeded()
  jobs:
    - template: deployADF.yml
      parameters:
        env: acc
        DataFactoryName: $(DataFactoryName)
        DataFactoryResourceGroupName: $(DataFactoryResourceGroupName)
        DataFactorySubscriptionId: $(DataFactorySubscriptionId)
        
###################################
# Deploy Production environment
###################################
- stage: DeployProduction
  displayName: Deploy Production
  variables:
  - group: ParamsPrd
  pool:
    vmImage: 'windows-latest'
  condition: Succeeded()
  jobs:
    - template: deployADF.yml
      parameters:
        env: prd
        DataFactoryName: $(DataFactoryName)
        DataFactoryResourceGroupName: $(DataFactoryResourceGroupName)
        DataFactorySubscriptionId: $(DataFactorySubscriptionId)

Friday, 15 October 2021

ADF Snack - Fail pipeline on purpose

Case
I want to fail my Azure Data Factory pipeline if a certain condition occurs, for example when a lookup doesn't return any records for my ForEach loop. Which setting or activity can I use for that?
Fail Activity

Solution
In the past we often used a Stored Procedure activity with a RAISERROR query in it to throw a custom error message. However, a while ago Microsoft released the Fail activity to accomplish this in a better way (at the moment of writing still in public preview).

For this example we have a Lookup activity and a ForEach activity that loops through the records from the lookup. If the Lookup activity doesn't return any records then the ForEach activity just won't loop. Now add an If Condition activity between those two and use an expression to test the record count: @equals(activity('myLookup').output.count, 0)
Add If Condition

Now go to the True part of the If Condition and add the Fail activity. Add a custom error message with a made-up error code (see the JSON sketch below).
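
In the JSON that ADF stores in the repository, such an If Condition with a nested Fail activity roughly looks like the sketch below; the activity names, message and error code are made-up examples:
{
    "name": "If no records",
    "type": "IfCondition",
    "dependsOn": [ { "activity": "myLookup", "dependencyConditions": [ "Succeeded" ] } ],
    "typeProperties": {
        "expression": {
            "value": "@equals(activity('myLookup').output.count, 0)",
            "type": "Expression"
        },
        "ifTrueActivities": [
            {
                "name": "Fail when lookup is empty",
                "type": "Fail",
                "typeProperties": {
                    "message": "The lookup returned no records for the ForEach loop.",
                    "errorCode": "500"
                }
            }
        ]
    }
}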
Fail Activity

Now make sure the query in the lookup doesn't return any records for this test and then trigger the pipeline.
Oops the pipeline fails (on purpose)

Conclusion
In this blog post you learned how to fail your pipelines on purpose with the new Fail Activity. You could for example also use it after an error handler step (which will handle the error and therefore not fail your pipeline) to force fail the pipeline.
Force Fail after error handler

Monday, 20 September 2021

DevOps Snack: Get Treeview of files on agent

Case
I want to see which files and folders are available in my workspace on a DevOps agent. This would make it much easier to determine paths to for example other YAML files or PowerShell scripts. Is there an option to browse files on the agent?
Treeview after a checkout of the repository

Solution
Unless you are using your own Virtual Machine as a private agent (instead of a Microsoft-hosted agent), where you can log in to the actual VM, the answer is no. However, with a single line of PowerShell it is very easy to get an overview. Don't worry, it's just copy and paste!

The trick is to add a PowerShell task to your YAML with an inline script that executes the tree command. The first parameter of the tree command is the folder or drive; this is where the predefined DevOps variables come in handy. For this example we will use Pipeline.Workspace to see its content. The /F parameter will show all files in each directory.
###################################
# Show treeview of agent
###################################
- powershell: |
    tree "$(Pipeline.Workspace)" /F
  displayName: '2 Treeview of Pipeline.Workspace'

On a Windows agent it looks a bit crappy, but on an Ubuntu agent it looks much better (see the first screenshot above).
Treeview of Pipeline.Workspace on Windows agent

A useful place for this tree command is, for example, right after a checkout of the repository. Then you know where your other YAML file is located, so you can call it in a next task.
YAML pipeline

Or use it after the build of an artifact to see the result. And now that you know where the files are located, you could even show the content of a (text) file with an additional line of code.
###################################
# Show treeview of agent
###################################
- powershell: |
    tree "$(Pipeline.Workspace)" /F
    Write-host "--------------------ARMTemplateForFactory--------------------"
    Get-Content -Path $(Pipeline.Workspace)/s/CICD/packages/ArmTemplateOutput/ARMTemplateForFactory.json
    Write-host "-------------------------------------------------------------"
  displayName: '7 Treeview of Pipeline.Workspace and ArmTemplateOutput content '
Note: for large files it is wise to limit the number of lines to read with an additional parameter for Get-Content: -TotalCount 25 (see the sketch below).
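
A sketch of the same debug step with the output limited to the first 25 lines (same path assumption as in the snippet above):
###################################
# Show treeview of agent
###################################
- powershell: |
    tree "$(Pipeline.Workspace)" /F
    Write-host "--------------------ARMTemplateForFactory--------------------"
    Get-Content -Path $(Pipeline.Workspace)/s/CICD/packages/ArmTemplateOutput/ARMTemplateForFactory.json -TotalCount 25
    Write-host "-------------------------------------------------------------"
  displayName: '7 Treeview of Pipeline.Workspace and ArmTemplateOutput content (first 25 lines)'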

Conclusion
In this post you learned how a little PowerShell can help you debug your YAML pipelines. Don't forget to comment out the extra code when you go to production with your pipeline. Please share your debug tips in the comments below.

Thx to colleague Walter ter Maten for helping!

Sunday, 19 September 2021

ADF Build - missing publish_config.json

Case
I'm using the new and improved ARM export via npm to generate an ARM template for my Data Factory so I can deploy it to the next environment, but the Validate step and the Validate and Generate ARM template step both throw an error saying that the publish_config.json file can't be found. This file isn't mentioned in the steps from the documentation. How do I add this file and what content should be in it?
Unable to read file: publish_config.json

ERROR === LocalFileClientService: Unable to read file: D:\a\1\publish_config.json, error: {"stack":"Error: ENOENT: no such file or directory, open 'D:\\a\\1\\publish_config.json'","message":"ENOENT: no such file or directory, open 'D:\\a\\1\\publish_config.json'","errno":-4058,"code":"ENOENT","syscall":"open","path":"D:\\a\\1\\publish_config.json"}
ERROR === PublishConfigService: _getLatestPublishConfig - Unable to process publish config file, error: {"stack":"Error: ENOENT: no such file or directory, open 'D:\\a\\1\\publish_config.json'","message":"ENOENT: no such file or directory, open 'D:\\a\\1\\publish_config.json'","errno":-4058,"code":"ENOENT","syscall":"open","path":"D:\\a\\1\\publish_config.json"}
Solution
While it indeed looks like a real error, it doesn't stop the DevOps pipeline. The old method of publishing ADF changes to another ADF did create this file automatically in the adf_publish branch when you hit the Publish button in the Data Factory GUI. So it probably isn't used anymore, but we still want to get rid of annoying errors!

You can solve this by manually adding the missing file:

1) Add new file to repos
To solve it go to the Azure DevOps repository and locate the folder where ADF stores the pipeline, dataset and factory files (in subfolders). Click on the +New button and create a file called publish_config.json.
Add new file to repository (in root of ADF)

2) Add JSON content
The content of the new file should be the name of the publishing branch you used when you configured Git for ADF, in the following JSON format:
{"publishBranch": "factory/adf_publish"}

Add the publishing branch in the following format

3) The result
Now the new file is available for the Npm task in the pipeline. Run that DevOps pipeline again and you will notice that the error message won't appear in the logs.
publish_config.json is now available for the pipeline

Conclusion
In this post you learned how to avoid the error message about the missing publish_config.json file. It is not very satisfying that it is still unknown why this file was missing and whether it is still used by the process. Please add a comment below if you find more details.

In a next post we will describe the entire Data Factory ARM deployment where you don't need to hit that annoying Publish button within the Data Factory GUI. Everything (CI and CD) will be a YAML pipeline.

Thx to colleague Roelof Jonkers for helping.