
Monday 7 November 2022

Create Data Lake containers and folders via DevOps

Case
I need a process to create Azure Data Lake containers throughout the DTAP environments of my Azure Data Platform. Doing this manually is not an option, because we want to minimize owner and contributor access to the Data Lake of acceptance and production, and Synapse and Data Factory don't have a standard activity to create ADL containers. How do we automatically create Azure Data Lake containers (and folders)?
Storage Account (datalake) containers

Solution
One option is a PowerShell script executed by the Custom activity in combination with an Azure Batch service, or an Azure Automation runbook with the same PowerShell script executed by a Web(hook) activity.

However, since you probably don't need to create new containers during every (ADF/Synapse) pipeline run, we suggest doing this via an Azure DevOps pipeline as part of your CICD process, using the same PowerShell script. You could either create a separate CICD pipeline for it or integrate it in your Synapse or ADF CICD pipeline.

The example below creates containers and optionally also folders and subfolders within these containers. Note that Synapse and Data Factory will also create folders themselves with, for example, the Copy Data activity.

1) Repos folder structure
For this example we use a CICD folder in the repository with subfolders for PowerShell, YAML and Json.
Repos folder structure
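A minimal sketch of that layout (the YAML file name is an assumption, the other file names come from the steps below):

CICD
├── Json
│   └── config_storage_account.json
├── PowerShell
│   └── SetStorageAccounts.ps1
└── YAML
    └── deploy_datalake.yml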

2) JSON config
Because we don't want to hardcode the containers and folders, we use a JSON file as input for the PowerShell script. This JSON file is stored within the Json folder of the DevOps repository. We use the same JSON file for all environments, but you can of course create a separate file per environment if you need, for example, different containers on production. Our file is called config_storage_account.json.

The folder array in this example is optional; when left empty, no folders will be created. You can create subfolders within folders by separating them with a forward slash.
{
"containers":   {
                "dataplatform":["folder1","folder2/subfolder1","folder2/subfolder2"]
                , "SourceX":["Actual","History"]
                , "SourceY":["Actual","History"]
                , "SourceZ":[]
                }
}
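To quickly validate the config file locally before committing it, you can parse it the same way the script does. A minimal sketch, assuming you run it from the root of the repository:

# Parse the config and list each container with its folders
$config = Get-Content -Raw -Path ".\CICD\Json\config_storage_account.json" | ConvertFrom-Json
foreach ($name in $config.containers.PSObject.Properties.Name) {
    Write-Output "$($name): $($config.containers.$name -join ', ')"
}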

3) PowerShell code
The PowerShell script called SetStorageAccounts.ps1 is stored in the PowerShell folder and contains three parameters:
  • ResourceGroupName - The name of the resource group where the storage account is located.
  • StorageAccountName - The name of the storage account.
  • JsonLocation - The location of the JSON config file in the repository (see previous step).
It checks the existence of both the config file and the storage account. It then loops through the containers from the config and, within that container loop, through the folders of that specific container. For container names and folder paths it applies some small corrections for commonly made mistakes.

Note that the script will not delete containers and folders (or set authorizations on them). That is of course possible to add, but make sure to test it very thoroughly; even with testing, a human error in the config file is easily made and could cause a lot of data loss!
# This PowerShell will create the containers provided in the JSON file
# It does not delete or update containers and folders or set authorizations
param (
    [Parameter (Mandatory = $true, HelpMessage = 'Resource group name of the storage account.')]
    [ValidateNotNullOrEmpty()]
    [string] $ResourceGroupName,

    [Parameter (Mandatory = $true, HelpMessage = 'Storage account name.')]
    [ValidateNotNullOrEmpty()]
    [string] $StorageAccountName,

    [Parameter (Mandatory = $true, HelpMessage = 'Location of config_storage_account.json on agent.')]
    [ValidateNotNullOrEmpty()]
    [string] $JsonLocation
 )

# Combine path and file name for JSON file. The file name is hardcoded and the
# same for each environment. Create an extra parameters for the filename if
# you need different files/configurations per environment.
$path = Join-Path -Path $JsonLocation -ChildPath "config_storage_account.json"
Write-output "Extracting containernames from $($path)"


# Check existence of the file path on the agent
if (Test-Path $path -PathType leaf) {
    
    # Get all container objects from JSON file
    $Config = Get-Content -Raw -Path $path | ConvertFrom-Json

    # Create containers array for looping
    $Config | ForEach-Object { 
        $Containers = $($_.containers) | Get-Member -MemberType NoteProperty | Select-Object -ExpandProperty Name
    }
    
    # Check Storage Account existence and get its context
    $StorageCheck = Get-AzStorageAccount -ResourceGroupName $ResourceGroupName -Name $StorageAccountName -ErrorAction SilentlyContinue        
    $context = $StorageCheck.Context

    # If Storage Account found
    if ($StorageCheck) {
        # Storage Account found
        Write-output "Storage account $($StorageAccountName) found"

        # Loop through container array and create containers if they don't exist
        foreach ($container in $containers) {
            # First a little cleanup of the container

            # 1) Change to lowercase
            $container = $container.ToLower()

            # 2) Trim accidental spaces
            $container = $container.Trim()


            # Check if container already exists
            Write-output "Checking existence of container $($container)"
            $ContainerCheck = Get-AzStorageContainer -Context $context -Name $container -ErrorAction SilentlyContinue 

            # If container exists
            if ($ContainerCheck) {
                Write-Output "Container $($container) already exists"
            }
            else {
                Write-Output "Creating container $($container)"
                New-AzStorageContainer -Name $container -Context $context | Out-Null
            }

            # Get container folders from JSON
            Write-Output "Retrieving folders from config"
            $folders = $Config.containers.$container
            Write-Output "Found $($folders.Count) folders in config for container $($container)"

            # Loop through container folders
            foreach ($folder in $folders) {
                # First a little cleanup of the folders

                # 1) Replace backslashes by a forward slash
                $path = $folder.Replace("\","/")

                # 2) Remove unwanted spaces
                $path = $path.Trim()
                $path = $path.Replace("/ ","/")
                $path = $path.Replace(" /","/")

                # 3) Check if path ends with a forward slash
                if (!$path.EndsWith("/")) {
                    $path = $path + "/"
                }
                    
                # Check if folder path exists
                $FolderCheck = Get-AzDataLakeGen2Item -FileSystem $container -Context $context -Path $path  -ErrorAction SilentlyContinue
                if ($FolderCheck) {
                    Write-Output "Path $($folder) exists in container $($container)"
                } else {
                    New-AzDataLakeGen2Item -Context $context -FileSystem $container -Path $path -Directory | Out-Null
                    Write-Output "Path $($folder) created in container $($container)"
                }
            }

        }
    } else {
        # Provided storage account not correct
        Write-Output "Storage account $($StorageAccountName) not available, containers not set up."
    }
} else {
    # Path to JSON file incorrect
    Write-output "File $($path) not found, containers not setup."
}
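For a quick local test outside DevOps, the script can be called like this (the resource group, storage account and path below are just placeholder values; it requires the Az.Storage module and an authenticated session via Connect-AzAccount):

.\CICD\PowerShell\SetStorageAccounts.ps1 -ResourceGroupName "rg-dataplatform-dev" `
                                         -StorageAccountName "stdatalakedev" `
                                         -JsonLocation ".\CICD\Json\"
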
4) YAML file
If you integrate this in your existing Data Factory or Synapse YAML pipeline, then you only need to add one PowerShell step. Make sure you have a checkout step to copy the config and PowerShell files from the repository to the agent. You may also want to add a (temporary) treeview step to check the paths on your agent; this makes it easier to configure paths within your YAML code.
parameters:
  - name: SerCon
    displayName: Service Connection
    type: string
  - name: Env
    displayName: Environment
    type: string
    values: 
    - DEV
    - ACC
    - PRD
  - name: ResourceGroupName
    displayName: Resource Group Name
    type: string
  - name: StorageAccountName
    displayName: Storage Account Name
    type: string

jobs:
    - deployment: deploymentjob${{ parameters.Env }}
      displayName: Deployment Job ${{ parameters.Env }} 
      environment: Deploy to ${{ parameters.Env }}

      strategy:
        runOnce:
          deploy:
            steps:
            ###################################
            # 1 Check out repository to agent
            ###################################
            - checkout: self
              displayName: '1 Retrieve Repository'
              clean: true 

            ###################################
            # 2 Show environment and treeview
            ###################################
            - powershell: |
                Write-Output "Deploying Synapse in the ${{ parameters.Env }} environment"
                tree "$(Pipeline.Workspace)" /F
              displayName: '2 Show environment and treeview Pipeline_Workspace'

            ###################################
            # 3 Create containers in datalake
            ###################################
            - task: AzurePowerShell@5
              displayName: '3 Create data lake containers'
              inputs:
                azureSubscription: ${{ parameters.SerCon }}
                scriptType: filePath
                scriptPath: $(Pipeline.Workspace)\s\CICD\PowerShell\SetStorageAccounts.ps1
                scriptArguments:
                  -ResourceGroupName ${{ parameters.ResourceGroupName }} `
                  -StorageAccountName ${{ parameters.StorageAccountName }} `
                  -JsonLocation $(Pipeline.Workspace)\s\CICD\Json\
                azurePowerShellVersion: latestVersion
                pwsh: true

5) The result
Now it's time to run the YAML pipeline and check the Storage Account to see whether the containers and folders are created.
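Besides the DevOps logs and the Azure portal, a small PowerShell check like the sketch below can confirm the result (the resource group, storage account and container names are placeholders):

# List all containers and the folders within one of them
$context = (Get-AzStorageAccount -ResourceGroupName "rg-dataplatform-dev" -Name "stdatalakedev").Context
Get-AzStorageContainer -Context $context | Select-Object Name
Get-AzDataLakeGen2ChildItem -Context $context -FileSystem "dataplatform" -Recurse | Select-Object Path
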
DevOps logs of creating containers and folders

Created data lake folders in container

Conclusion
In this post you learned how to create containers and folders in the Storage Account / Data Lake via a little PowerShell script and a DevOps pipeline, but you can also reuse this PowerShell script for the alternative solutions mentioned above.

Again a note about also deleting containers and folders: make sure to double check the code, but also the procedures, to avoid human errors and potentially losing a lot of data. You might want to set up soft deletes in your storage account to have a fallback scenario for screw-ups.

Sunday 21 November 2021

ADF Release - Use parameters to enable Triggers

Case
During deployment of Azure Data Factory (ADF) via Azure DevOps pipelines I want to make sure that a certain trigger is only executed on Production and not on the lower environments. How can we do this without writing code (low-code)?

ADF Trigger

Solution
This is possible by changing the ARM template parameter definition, which in turn will turn certain properties into overridable parameters during deployment. However, triggers are not included by default in the parameter file. There is also a limitation that you cannot override every property; for example runtimeState, which activates and deactivates the trigger. The workaround for this is to use the endTime property.

More information about which properties are parameterized can be found here.

1) Understand parameters in ADF
Before we start overriding properties in the ARM template, it is good to understand the parameters in general. As you know, when you start building your ADF, one of the first things you do is create a Linked Service. By default, ADF knows that, for example, a connection string or a Key Vault reference in a Linked Service should be parameterized, because the database server or the URL will be different per environment in a DTAP. The result is always two ARM template files: the content itself (ARMTemplateForFactory.json) and the parameters that can be overwritten (ARMTemplateParametersForFactory.json). Another file holds the definition of the parameters (arm-template-parameters-definition.json).

When you start developing in a new ADF, the ARM template parameters file (result) only contains the ADF name that can be overwritten. When you have created a Linked Service, for example Azure Blob Storage, the file should look something like below. 
{
    "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentParameters.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "factoryName": {
            "value": "bitools-d-adf-dwh"
        },
        "LS_ABLB_bitools_connectionString": {
            "value": ""
        }
    }
}
You can check this via "Manage - ARM template - Export ARM template".

ADF Portal - Check the parameters

2) Check Trigger code
Now back to our trigger. Based on the documentation, we know which properties we can parameterize for a trigger. Let's have a look at the code of the trigger itself.
  • In the ADF portal go to Manage (toolbox icon in left menu) and then to Triggers
  • Find your trigger, hover your mouse over it and click on the code icon {}
ADF portal - Check the code of your trigger

Below is the JSON code of the trigger. You can override everything within typeProperties. Unfortunately the runtimeState property is not one of them.
{
    "name": "Trigger_Master",
    "properties": {
        "description": "Test",
        "annotations": [],
        "runtimeState": "Started",
        "pipelines": [
            {
                "pipelineReference": {
                    "referenceName": "PL_Master",
                    "type": "PipelineReference"
                }
            }
        ],
        "type": "ScheduleTrigger",
        "typeProperties": {
            "recurrence": {
                "frequency": "Day",
                "interval": 1,
                "startTime": "2021-01-01T00:00:00Z",
                "endTime": "2021-01-02T00:00:00Z",
                "timeZone": "UTC",
                "schedule": {
                    "minutes": [
                        10
                    ],
                    "hours": [
                        0
                    ]
                }
            }
        }
    }
}
Now that we have identified which properties can be parameterized, we need to know which property we want to override for our use case. As you know, we need to make sure the trigger is not executed in every environment. One way to do this is to set the end date (and time) of a trigger; this property is called endTime. For example: a trigger with an end date of "01/02/2021 12:00 AM" will not be executed, because this date is in the past. When the end date is "12/31/9999 12:00 AM", the trigger will be executed, because it is in the future.

Go to your trigger and set an end date and time in the future, for example 12/31/9999 12:00 AM.

ADF portal - Specify end date for trigger

3) ARM template
Next step is to override the endTime property in the ARM template parameter definition. Unlike integration runtime or linked services properties, we need to add this property first. 
  • In the ADF portal go to Manage (same as step 2) and then to ARM template
  • Click on Edit parameter configuration
  • Search for "Microsoft.DataFactory/factories/triggers", add the endTime property (part of recurrence) within typeProperties, set the value to "=:-endTime" and click the OK button. See below how your JSON should look for the trigger part.
    "Microsoft.DataFactory/factories/triggers": {
        "properties": {
            "pipelines": [
                {
                    "parameters": {
                        "*": "="
                    }
                },
                "pipelineReference.referenceName"
            ],
            "pipeline": {
                "parameters": {
                    "*": "="
                }
            },
            "typeProperties": {
                "scope": "=",
                "recurrence": {
                    "endTime": "=:-endTime"
                }
            }
        }
    },
Now check the ARM template parameters via "Manage - ARM template - Export ARM template" (see step 1) and the result should look like this.
{
    "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentParameters.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "factoryName": {
            "value": "bitools-d-adf-dwh"
        },
        "LS_ABLB_bitools_connectionString": {
            "value": ""
        },
        "Trigger_Master_endTime": {
            "value": "9999-12-31T00:00:00Z"
        }
    }
}
In this case, we keep the current value as the default by using "=" in front of the value. Adding a minus - in front of the parameter name (endTime) removes "_properties_typeProperties" from the generated parameter name. More information here.

Note:
The global parameters are also not included by default. Click here to learn how to include them in the ARM template parameters file as well.

4) Adjust release pipeline
If you are using YAML to publish the changes, then the only thing you have to change is the overrideParameters property: add the new parameter Trigger_Master_endTime with either a variable or a hardcoded value. The > behind the property lets you break the string over multiple lines and keeps the YAML code more readable.
          ###################################
          # Deploy ADF Artifact
          ###################################
          - task: AzureResourceManagerTemplateDeployment@3
            displayName: '4 Deploy ADF Artifact'
            inputs:
              deploymentScope: 'Resource Group'
              azureResourceManagerConnection: 'sc_mcacc-adf-devopssp'
              subscriptionId: $(DataFactorySubscriptionId)
              action: 'Create Or Update Resource Group'
              resourceGroupName: $(DataFactoryResourceGroupName)
              location: 'West Europe'
              templateLocation: 'Linked artifact'
              csmFile: '$(Pipeline.Workspace)/ArmTemplatesArtifact/ARMTemplateForFactory.json'
              csmParametersFile: '$(Pipeline.Workspace)/ArmTemplatesArtifact/ARMTemplateParametersForFactory.json'
              overrideParameters: > 
                -factoryName $(DataFactoryName)
                -LS_ABLB_bitools_connectionString $(AzureBlobConnectionString)
                -Trigger_Master_endTime $(AzureDataFactoryTriggerEndTimeActive)
              deploymentMode: 'Incremental'

            env: 
                SYSTEM_ACCESSTOKEN: $(System.AccessToken)
And if you're using the Release pipelines with the ARM template deployment task then you can just go to the Override template parameters property, click on the edit button and replace the value with a new value or a variable from a variable group.

ARM template deployment - Override template parameters

Conclusion
In this post you learned how to add and override properties of a trigger during deployment via Azure DevOps. This allows you to activate or deactivate a trigger for a specific environment during deployment using ARM templates, without writing any code.

In a previous post we showed you how to accomplish this for a Linked Service in combination with Azure Key Vault.

Thursday 18 November 2021

ADF Release - Use script to enable certain Triggers

Case
During deployment of Azure Data Factory (ADF) via Azure DevOps pipelines I want to make sure that a certain trigger is only executed on Production and not on the lower environments like acceptance or test. How can we accomplish that without any manual operations? 
ADF Trigger

Solution
This is possible with an extra PowerShell step. The standard deployment stage consists of three steps:
  • a pre-deployment script that stops all triggers.
  • the actual deployment
  • a post-deployment script that starts all triggers and cleans up old parts.
You could adjust the standard pre- and post-deployment PowerShell script from Microsoft, or create an additional PowerShell script if you don't want to mess around with Microsoft's standard script.

1) PowerShell
Below is that additional script. Feel free to merge it with the standard script. The PowerShell file should be stored in the repository in the \CICD\PowerShell folder (see setup post).
PowerShell file for setting trigger status

The new PowerShell script has five parameters which will be provided by the YAML pipeline (or release pipeline):
  1. DataFactoryName
    [string] Name of your Data Factory
  2. DataFactoryResourceGroup
    [string] Name of the Resource Group holding your ADF
  3. DataFactorySubscriptionId
    [string] Guid of the Azure Subscription hosting your ADF
  4. DisableAllTriggers
    [boolean] True or false indicating whether all triggers should be disabled (except triggers mentioned in next parameter)
  5. EnabledTriggers
    [string] Comma-separated list with trigger names that should be enabled: "trigger1,trigger2"
The script consists of three parts. The first part checks all parameters; if one of them is incorrect, the script fails and stops. The second part is the optional disabling of all triggers (except the ones that we need enabled), and the last part checks the list of triggers that should be enabled and starts any that are still disabled.
param
(
    [parameter(Mandatory = $true)] [String] $DataFactoryName,
    [parameter(Mandatory = $true)] [String] $DataFactoryResourceGroup,
    [parameter(Mandatory = $true)] [String] $DataFactorySubscriptionId,
    [parameter(Mandatory = $false)] [Bool] $DisableAllTriggers = $true,
    [parameter(Mandatory = $true)] [String] $EnabledTriggers # comma separated list
)



##############################################
# Check provided information
##############################################
$ErrorActionPreference = "Stop"

# Set the provided subscription as active (fails for a non-existing subscription)
Write-Host "Checking existence of Subscription Id [$($DataFactorySubscriptionId)]."
$Subscription = Get-AzSubscription -SubscriptionId $DataFactorySubscriptionId `
                                   -WarningAction Ignore
Write-Host "- Subscription [$($Subscription.Name)] found."
Set-AzContext -Subscription $DataFactorySubscriptionId `
              -WarningAction Ignore > $null
Write-Host "- Subscription [$($Subscription.Name)] is active."


# Checking whether resource group exists (fails with non existing)
Write-Host "Checking existance Resource Group [$($DataFactoryResourceGroup)]."
Get-AzResourceGroup -Name $DataFactoryResourceGroup > $null
Write-Host "- Resource Group [$($DataFactoryResourceGroup)] found."


# Checking whether provided data factory exists (fails with non existing)
Write-Host "Checking existance Data Factory [$($DataFactoryName)]."
Get-AzDataFactoryV2 -ResourceGroupName $DataFactoryResourceGroup `
                    -Name $DataFactoryName > $null
Write-Host "- Data Factory [$($DataFactoryName)] found."


# Checking provided trigger names, first split into array
$EnabledTriggersArray = $EnabledTriggers.Split(",")
Write-Host "Checking existence of ($($EnabledTriggersArray.Count)) provided trigger names."


# Loop through all provided triggernames
foreach ($EnabledTrigger in $EnabledTriggersArray)
{ 
    # Get Trigger by name
    $CheckTrigger = Get-AzDataFactoryV2Trigger -ResourceGroupName $DataFactoryResourceGroup `
                                               -DataFactoryName $DataFactoryName `
                                               -Name $EnabledTrigger `
                                               -ErrorAction Ignore # To be able to provide more detailed error

    # Check if trigger was found
    if (!$CheckTrigger)
    {
        throw "Trigger $($EnabledTrigger) not found in data dactory $($DataFactoryName) within resource group $($DataFactoryResourceGroup)"
    }
}
Write-Host "- All ($($EnabledTriggersArray.Count)) provided triggernames found in data dactory $($DataFactoryName) within resource group $($DataFactoryResourceGroup)"



##############################################
# Disable triggers
##############################################
# Check if all triggers should be disabled
if ($DisableAllTriggers)
{
    # Get all enabled triggers and stop them (unless they should be enabled)
    Write-Host "Getting all enabled triggers that should be disabled."
    $CurrentTriggers = Get-AzDataFactoryV2Trigger -ResourceGroupName $DataFactoryResourceGroup `
                                                   -DataFactoryName $DataFactoryName `
                       | Where-Object {$_.RuntimeState -ne 'Stopped'} `
                       | Where-Object {$EnabledTriggersArray.Contains($_.Name) -eq $false}

    # Loop through all found triggers
    Write-Host "- Number of triggers to disable: $($CurrentTriggers.Count)."
    foreach ($CurrentTrigger in $CurrentTriggers)
    {
        # Stop trigger
        Write-Host "- Stopping trigger [$($CurrentTrigger.Name)]."
        Stop-AzDataFactoryV2Trigger -ResourceGroupName $DataFactoryResourceGroup -DataFactoryName $DataFactoryName -Name $CurrentTrigger.Name -Force > $null
    }
}



##############################################
# Enable triggers
##############################################
# Loop through provided triggernames and enable them
Write-Host "Enable all ($($EnabledTriggersArray.Count)) provided triggers."
foreach ($EnabledTrigger in $EnabledTriggersArray)
{                   
    # Get trigger details
    $CheckTrigger = Get-AzDataFactoryV2Trigger -ResourceGroupName $DataFactoryResourceGroup `
                                               -DataFactoryName $DataFactoryName `
                                               -Name $EnabledTrigger

    # Check status of trigger
    if ($CheckTrigger.RuntimeState -ne "Started")
    {
        Write-Host "- Trigger [$($EnabledTrigger)] starting"
        Start-AzDataFactoryV2Trigger -ResourceGroupName $DataFactoryResourceGroup `
                                     -DataFactoryName $DataFactoryName `
                                     -Name $EnabledTrigger `
                                     -Force > $null
    }
    else
    {
        Write-Host "- Trigger [$($EnabledTrigger)] already started"
    }
}
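For a quick local test outside the pipeline, the script could be invoked like this (all values below are placeholders; it requires the Az.DataFactory module and an authenticated session via Connect-AzAccount):

.\CICD\PowerShell\SetTriggers.ps1 -DataFactoryName "bitools-p-adf" `
                                  -DataFactoryResourceGroup "rg-bitools-p" `
                                  -DataFactorySubscriptionId "00000000-0000-0000-0000-000000000000" `
                                  -DisableAllTriggers $true `
                                  -EnabledTriggers "prd_daily_4am,prd_daily_1pm"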

2) YAML Pipeline
You can now extend the existing YAML pipeline with an extra step. Make sure that all parameters for this script are available as variables in the variable group (under Pipelines, Library) and make sure to pass them to the second YAML pipeline as parameters. If you followed the previous blogs, then you only need to add EnabledTriggers as a variable and as a YAML parameter.
          ###################################
          # Enable certain triggers and disable rest
          ###################################
          - task: AzurePowerShell@5
            displayName: '6 Enable certain triggers and disable rest'
            inputs:
              azureSubscription: 'sc_adf-devopssp'
              pwsh: true
              azurePowerShellVersion: LatestVersion
              scriptType: filePath
              scriptPath: '$(Pipeline.Workspace)\s\CICD\powershell\SetTriggers.ps1'
              scriptArguments: > # Use this to avoid newline characters in multiline string
                -DataFactoryName $(DataFactoryName)
                -DataFactoryResourceGroup $(DataFactoryResourceGroupName)
                -DataFactorySubscriptionId $(DataFactorySubscriptionId)
                -DisableAllTriggers $true
                -EnabledTriggers $(EnabledTriggers) # format: "prd_daily_4am,prd_daily_1pm"
The result of running the pipeline
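To double-check afterwards which triggers ended up started or stopped, a one-liner like this helps (resource group and factory name are placeholders):

Get-AzDataFactoryV2Trigger -ResourceGroupName "rg-bitools-p" -DataFactoryName "bitools-p-adf" |
    Select-Object Name, RuntimeState
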

Conclusion

In this post you learned how to enable only certain triggers for a specific environment. This makes it easy to already create a trigger in development that is only meant for the production environment. The downside (for some) is of course that you get an extra piece of code to maintain. In a next post we will show that you can also accomplish this without writing code via the ARM template. However, the trigger property runtimeState cannot be set via the ARM template, so a workaround is necessary for the no-code variant.


Sunday 14 November 2021

ADF Release - Update Linked Service while deploying

Case
I'm deploying Azure Data Factory via DevOps pipelines through my DTAP environment. During the deployment I want to change the URL of the Linked Service from Azure Key Vault to point it to the Key Vault of that specific environment. How do I change that Linked Service in DevOps?
ADF Linked Service

Solution
This is possible by changing the ARM template parameter definition, which in turn will turn certain properties into overridable parameters during deployment. There is one downside: you cannot create a parameter for one specific Linked Service, because it will work for all Linked Services with that same property. However, you can narrow it down to one particular type of Linked Service (all Key Vaults in this example).

1) Check Linked Service
For this example we will override the URL of the Azure Key Vault Linked Service, but first we need to find the actual property name that we want to override.
  • In ADF Studio go to Manage (toolbox icon in left menu) and then to Linked Services.
  • Now find your Key Vault Linked Service, hover your mouse over it and click on the code icon {}.
  • Now check which property identifies a specific Key Vault. It should be within the typeProperties tag. In this case the baseUrl property contains a URL that points to one specific Key Vault.
baseURL property

Also notice the type property, which will be used further on: AzureKeyVault.


2) ARM template
The next step is to make this property overridable in the ARM template parameter definition (arm-template-parameters-definition.json). First the easiest way:
  • Under the same Manage menu item as step 1 go to ARM template
  • Click on Edit parameter configuration
  • Now find the baseUrl property within the Linked Services tag and change its value from "=" to "-" and then click on the OK button
Making baseUrl overridable

This will create a new parameter with the name: [LinkedServiceName]_properties_typeProperties_baseUrl

As mentioned before, this will now work for all Linked Services that have a (filled) property called baseUrl. A bit nicer is to instead create a Key Vault specific parameter by adding a piece of JSON code below the general tag with the *. The name 'AzureKeyVault' in the code below can be found in the code of step 1.
 
        "AzureKeyVault": {
            "properties": {
                "typeProperties": {
                    "baseUrl": "-"
                }
            }
        },

This will result in the same (long) parameter name, but now only for Linked Services pointing to Azure Key Vault. We can shorten that very long parameter name by adding -:-BaseUrl, where the colon : is the separator for the next part: the name of the parameter. Adding a minus - in front of that name will remove _properties_typeProperties from the parameter name and shorten it to:
[LinkedServiceName]_baseUrl 
        "AzureKeyVault": {
            "properties": {
                "typeProperties": {
                    "baseUrl": "-:-BaseUrl"
                }
            }
        }
This is much nicer. More info about this can be found in the documentation.

3) Adjust release pipeline
If you are using YAML to publish the changes, then the only thing you have to change is the overrideParameters property: add the new parameter ls_kv_bitools_baseUrl with either a variable or a hardcoded value. The > behind the property lets you break the string over multiple lines and keeps the YAML code more readable.
          ###################################
          # Deploy ADF Artifact
          ###################################
          - task: AzureResourceManagerTemplateDeployment@3
            displayName: '4 Deploy ADF Artifact'
            inputs:
              deploymentScope: 'Resource Group'
              azureResourceManagerConnection: 'sc_mcacc-adf-devopssp'
              subscriptionId: $(DataFactorySubscriptionId)
              action: 'Create Or Update Resource Group'
              resourceGroupName: $(DataFactoryResourceGroupName)
              location: 'West Europe'
              templateLocation: 'Linked artifact'
              csmFile: '$(Pipeline.Workspace)/ArmTemplatesArtifact/ARMTemplateForFactory.json'
              csmParametersFile: '$(Pipeline.Workspace)/ArmTemplatesArtifact/ARMTemplateParametersForFactory.json'
              overrideParameters: > 
                -factoryName $(DataFactoryName) 
                -ls_kv_bitools_baseUrl "https://bitools-prd.vault.azure.net/"
              deploymentMode: 'Incremental'

            env: 
                SYSTEM_ACCESSTOKEN: $(System.AccessToken)
If you are not sure which parameter names you can use in the YAML, then you can look them up by exporting the ARM template (under ARM template) and checking arm_template.json or arm_template_parameters.json.
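A small sketch that lists all parameter names from the exported file, assuming you extracted the export to an arm_template folder:

# List all parameter names in the exported ARM template parameters file
$export = Get-Content -Raw -Path ".\arm_template\arm_template_parameters.json" | ConvertFrom-Json
$export.parameters.PSObject.Properties.Name
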
Export template to find the parameter name

And if you're using the Release pipelines with the ARM template deployment task then you can just go to the Override template parameters property, click on the edit button and replace the value with a new value or a variable from a variable group.
ARM template deployment - Override template parameters

Conclusion
In this post you learned how to override properties of a Linked Service during deployment via Azure DevOps. This allows you to point your Linked Service to the correct service for that specific environment. The most likely candidate for this is probably the Linked Service pointing to Azure Key Vault where you store all other connection details.

In a previous post we also showed you how to change Global Parameters during deployment, and in a next post we will show you how to change triggers during deployment, because you probably don't want to use the same triggers on development, test, acceptance and production.



Thursday 4 November 2021

ADF Release - Set global params during deployment

Case
I'm using global parameters in my Azure Data Factory project and I want to change their values during deployment through my DTAP environments. How do I do that?
ADF Global Parameters

Solution
In this post we are only focusing on the deployment part of the DevOps YAML pipeline. If you do not have a pipeline yet then please read the complete story of configuring the Development ADF and releasing it to other Factories first.

1) Include in ARM template
First make sure that in the Global Parameters pane the checkbox "Include in ARM template" is checked. This will add the parameters to the ARM template, which makes them available as parameters during deployment.
Check Include in ARM template

2) Getting name of parameter
The global parameter will get a slightly different name in the ARM template. For example: myGlobParam becomes dataFactory_properties_globalParameters_myGlobParam_value. You can check that name by exporting the ARM template (under ARM template) and then checking arm_template.json or arm_template_parameters.json.
Find the parameter name

Another option is to hit the publish button and then check the (otherwise unused) adf_publish branch in the repository.
Find the parameter name

3) Adjust release pipeline
If you are using YAML to publish the changes, then the only thing you have to change is the overrideParameters property: add the new parameter dataFactory_properties_globalParameters_myGlobParam_value with either a variable or a hardcoded value. The > behind the property lets you break the string over multiple lines and keeps the YAML code more readable.
          ###################################
          # Deploy ADF Artifact
          ###################################
          - task: AzureResourceManagerTemplateDeployment@3
            displayName: '4 Deploy ADF Artifact'
            inputs:
              deploymentScope: 'Resource Group'
              azureResourceManagerConnection: 'sc_mcacc-adf-devopssp'
              subscriptionId: $(DataFactorySubscriptionId)
              action: 'Create Or Update Resource Group'
              resourceGroupName: $(DataFactoryResourceGroupName)
              location: 'West Europe'
              templateLocation: 'Linked artifact'
              csmFile: '$(Pipeline.Workspace)/ArmTemplatesArtifact/ARMTemplateForFactory.json'
              csmParametersFile: '$(Pipeline.Workspace)/ArmTemplatesArtifact/ARMTemplateParametersForFactory.json'
              overrideParameters: > 
                -factoryName $(DataFactoryName) 
                -dataFactory_properties_globalParameters_myGlobParam_value "Test123"
              deploymentMode: 'Incremental'

            env: 
                SYSTEM_ACCESSTOKEN: $(System.AccessToken)
After deployment you can see the new parameter value

And if you're using the Release pipelines with the ARM template deployment task then you can just go to the Override template parameters property, click on the edit button and replace the value with a new value or a variable from a variable group.
ARM template deployment - Override template parameters

Conclusion
In this post you learned how to use Global Parameters from ADF as ARM template parameters for Azure Data Factory. This allows you to use different settings in the various factories in your DTAP environments. In a next post we will show you how to override properties of, for example, a Linked Service or a Trigger that should get a different value in acceptance or production.


Monday 1 November 2021

ADF Release - Create YAML CICD Pipeline - part 2

Case
How do you deploy Azure Data Factory via a YAML pipeline instead of the Release pipeline?
Release ADF pipelines with YAML pipelines

Solution
In a previous post we used a YAML pipeline to create an ARM template for ADF. That ARM template is now available as an artifact and ready for deployment. That previous post ended with calling the release part of the pipeline, which is in a separate YAML file. This makes it easier to call that same YAML file for test, acceptance and production. Below is the last part of the main pipeline:
###################################
# Deploy Test environment
###################################
- stage: DeployTest
  displayName: Deploy Test
  variables:
  - group: ParamsTst
  pool:
    vmImage: 'windows-latest'
  condition: Succeeded()
  jobs:
    - template: deployADF.yml
      parameters:
        env: tst
        DataFactoryName: $(DataFactoryName)
        DataFactoryResourceGroupName: $(DataFactoryResourceGroupName)
        DataFactorySubscriptionId: $(DataFactorySubscriptionId)
        
###################################
# Deploy Acceptance environment
###################################
- stage: DeployAcceptance
  displayName: Deploy Acceptance
  variables:
  - group: ParamsAcc
  pool:
    vmImage: 'windows-latest'
  condition: Succeeded()
  jobs:
    - template: deployADF.yml
      parameters:
        env: acc
        DataFactoryName: $(DataFactoryName)
        DataFactoryResourceGroupName: $(DataFactoryResourceGroupName)
        DataFactorySubscriptionId: $(DataFactorySubscriptionId)
        
###################################
# Deploy Production environment
###################################
- stage: DeployProduction
  displayName: Deploy Production
  variables:
  - group: ParamsPrd
  pool:
    vmImage: 'windows-latest'
  condition: Succeeded()
  jobs:
    - template: deployADF.yml
      parameters:
        env: prd
        DataFactoryName: $(DataFactoryName)
        DataFactoryResourceGroupName: $(DataFactoryResourceGroupName)
        DataFactorySubscriptionId: $(DataFactorySubscriptionId)

In this blog we will create the deployADF.yml file mentioned in the YAML code above, but first we need to give the Service Principal (SP) used by the Azure DevOps service connection access to the target Data Factories; otherwise it can't release the ARM template.


1) Access control (IAM)
In this first step we will give the SP access to the target ADF. You have to repeat that for all target factories, but first you have to decide whether you want to give the Service Principal access to specific services (like ADF) or the resource group or even the entire subscription. 

In most cases you want to limit the access to the bare minimum to avoid misuse. Since there is no ADF deployment task we are using the more general AzureResourceManagerTemplateDeployment task. One downside of this task is that you need to give permissions on at least the resource group. Access to ADF only is not enough and will give you an error: Failed to check the resource group status. Error: {"statusCode":403}.

The next thing to keep in mind is the role to assign. You want to avoid Owner to limit the risk of misuse; in this case we need the Contributor role. The steps below assign it via the Azure portal (a scripted alternative is sketched after the list).
  • In the Azure portal go to the resource group where your ADF is located 
  • Click on Access control (IAM) in the left menu
  • Click on +Add and then on Add role assignment
  • Search for the appropriate Azure role (this screen recently changed, but you can also still use the classic experience via the link). Click on Contributor and press Next.
  • Click on +Select members and search for your SP, click on the account and then press Select
  • Optionally add a description and press Next and then Review + assign
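If you prefer to script the role assignment instead of clicking through the portal, a minimal sketch (the application id of the SP and the resource group name are placeholders):

# Assign the Contributor role to the Service Principal on the resource group
New-AzRoleAssignment -ApplicationId "00000000-0000-0000-0000-000000000000" `
                     -RoleDefinitionName "Contributor" `
                     -ResourceGroupName "rg-bitools-p"
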
Contributor role in Resource Group for SP

2) Add additional YAML file
The next step is to add the second YAML file, the one that does the actual deployment of ADF, to the repository. Use the same repository folder as the existing YAML file (the CICD\YAML folder). Splitting up the deployment allows you to reuse the deployment code for test, acceptance and production. The downside is that you have to edit it in the repository instead of under Pipelines (but you could also use the YAML extension for Visual Studio Code).

The second YAML file consists of the following parts, including an optional treeview task to check where all your files are located on the agent:
  A. Parameters and environment
  B. Treeview
  C. Stop triggers
  D. Deploy ADF
  E. Cleanup and start triggers


A. Parameters and environment
This YAML file starts with parameters that will be filled by the main YAML file. As an alternative you could just use the variable group added in the main pipeline, because those variables are also available in sub pipelines. There are four string parameters, of which only env has a list of expected/allowed values:
  • env (name of the environment: tst, acc or prd)
  • DataFactoryName
  • DataFactoryResourceGroupName
  • DataFactorySubscriptionId
parameters:
  - name: env
    displayName: Environment
    type: string
    values:
    - dev
    - tst
    - acc
    - prd
  - name: DataFactoryName
    displayName: Data Factory Name
    type: string
  - name: DataFactoryResourceGroupName
    displayName: Data Factory Resource Group Name
    type: string
  - name: DataFactorySubscriptionId
    displayName: Data Factory Subscription Id
    type: string

We also give the job a name and create an environment. A list of environments can be found under Pipelines - Environments. This is also the place where you can add Approvals and Checks, which are not available in the YAML language. The checkout is optional, but very handy when, for example, you have some custom PowerShell scripts in the repository that you want to execute before, during or after deployment.

jobs:
  - deployment: deploymentjob${{ parameters.env }}
    displayName: Deployment Job ${{ parameters.env }} 
    environment: deploy ${{ parameters.env }}
    strategy:
      runOnce:
        deploy:
          steps:
          - checkout: self
            displayName: '1 Retrieve Repository'
            clean: true 

B. Treeview
This treeview task is just for debugging. It allows you, for example, to see where the artifact is located on your agent, which makes it much easier to configure the following steps. When you have finished the pipeline, just delete this task from the steps or comment it out until you need it again.
          ###################################
          # Show environment and treeview
          ###################################
          - powershell: |
              Write-Output "This is the ${{ parameters.env }} environment"
              tree "$(Pipeline.Workspace)" /F
            displayName: '2 Show environment and treeview Pipeline_Workspace'

C. Stop triggers
Before we can deploy the created ARM template to our ADF we need to make sure nothing is running. Microsoft created a PowerShell script for this, which is included in the generated ARM template folder. It can stop all triggers.
PrePostDeploymentScript.ps1

          ###################################
          # Stop triggers
          ###################################
          - task: AzurePowerShell@5
            displayName: '3 Stop triggers'
            inputs:
              azureSubscription: 'sc_adf-devopssp'
              pwsh: true
              azurePowerShellVersion: LatestVersion
              scriptType: filePath
              scriptPath: '$(Pipeline.Workspace)\ArmTemplatesArtifact\PrePostDeploymentScript.ps1'
              scriptArguments: > # Use this to avoid newline characters in multiline string
                -armTemplate '$(Pipeline.Workspace)\ArmTemplatesArtifact\ARMTemplateForFactory.json'
                -ResourceGroupName $(DataFactoryResourceGroupName)
                -DataFactoryName $(DataFactoryName)
                -predeployment $true
                -deleteDeployment $false
If you run your new pipeline and get an error stating that it cannot find your resource group, while you're absolutely sure that it exists and that the SP has access to it, then please check this blog post. It shows you how to create a slightly adjusted copy of the script (with an extra subscription parameter) that is stored in the CICD\PowerShell folder.
          ###################################
          # Stop triggers
          ###################################
          - task: AzurePowerShell@5
            displayName: '3 Stop triggers'
            inputs:
              azureSubscription: 'sc_adf-devopssp'
              pwsh: true
              azurePowerShellVersion: LatestVersion
              scriptType: filePath
              scriptPath: '$(Pipeline.Workspace)\s\CICD\powershell\PrePostDeploymentADF.ps1'
              scriptArguments: > # Use this to avoid newline characters in multiline string
                -armTemplate '$(Pipeline.Workspace)\ArmTemplatesArtifact\ARMTemplateForFactory.json'
                -ResourceGroupName $(DataFactoryResourceGroupName)
                -DataFactoryName $(DataFactoryName)
                -predeployment $true
                -deleteDeployment $false
                -Subscription $(DataFactorySubscriptionId)

Stopping deployed triggers

D. Deploy ADF
Now it's finally time for the actual deployment of ADF. As mentioned above we are using the AzureResourceManagerTemplateDeploymentV3 task. Check the documentation for a description of all parameters. We will mention one parameter: Deployment Mode. It is very important to keep this on INCREMENTAL! The complete mode will delete everything in your resource group that is not mentioned in the ARM template. Since our template only contains ADF you will end up with a nearly empty resource group with only ADF in it. This is a very common mistake. So now you are warned.
          ###################################
          # Deploy ADF Artifact
          ###################################
          - task: AzureResourceManagerTemplateDeployment@3
            displayName: '4 Deploy ADF Artifact'
            inputs:
              deploymentScope: 'Resource Group'
              azureResourceManagerConnection: 'sc_adf-devopssp'
              subscriptionId: $(DataFactorySubscriptionId)
              action: 'Create Or Update Resource Group'
              resourceGroupName: $(DataFactoryResourceGroupName)
              location: 'West Europe'
              templateLocation: 'Linked artifact'
              csmFile: '$(Pipeline.Workspace)/ArmTemplatesArtifact/ARMTemplateForFactory.json'
              csmParametersFile: '$(Pipeline.Workspace)/ArmTemplatesArtifact/ARMTemplateParametersForFactory.json'
              overrideParameters: '-factoryName $(DataFactoryName)'
              deploymentMode: 'Incremental'

            env: 
                SYSTEM_ACCESSTOKEN: $(System.AccessToken)
Deployment of ADF

E. Cleanup and start triggers
Because we did an incremental deployment, all deleted items are still in ADF; only new and updated items have changed. So we have to compare ADF with the template and delete all items that are not in the template. Luckily Microsoft already created a script for this: the same script as the stop-trigger script, just with different parameters. It also starts the triggers again.
          ###################################
          # Start triggers and cleanup
          ###################################
          - task: AzurePowerShell@5
            displayName: '5 Start triggers and cleanup'
            inputs:
              azureSubscription: 'sc_adf-devopssp'
              pwsh: true
              azurePowerShellVersion: LatestVersion
              scriptType: filePath
              scriptPath: '$(Pipeline.Workspace)\ArmTemplatesArtifact\PrePostDeploymentScript.ps1'
              scriptArguments: > # Use this to avoid newline characters in multiline string
                -armTemplate $(Pipeline.Workspace)/ArmTemplatesArtifact/ARMTemplateForFactory.json
                -ResourceGroupName $(DataFactoryResourceGroupName)
                -DataFactoryName $(DataFactoryName)
                -predeployment $false
                -deleteDeployment $true
Note that the same issue with not finding your Resource Group will occur here if it occurred when stopping the triggers. The solution is the same (the adjusted script with the extra subscription parameter).
Start triggers and cleanup

3) The result
Now it's time to run the pipeline from start to end by making changes to the Development Data Factory. And in no time all factories are updated.
The Result

Conclusion
In this blog post you learned how to use YAML to do the (build and) deployment of ADF in a pipeline. The takeaway is to use the incremental deployment mode and not the complete mode, to avoid those shocked looks when viewing your empty resource group.
In a next post we will show how to overwrite global parameters and change Linked Services during deployment and show you how to enable or disable certain ADF triggers depending on the environment. This allows you to have different active triggers in Development, Test, Acceptance and Production without setting them manually after deployment.

Now all YAML parts together:


parameters:
  - name: env
    displayName: Environment
    type: string
    values:
    - dev
    - tst
    - acc
    - prd
  - name: DataFactoryName
    displayName: Data Factory Name
    type: string
  - name: DataFactoryResourceGroupName
    displayName: Data Factory Resource Group Name
    type: string
  - name: DataFactorySubscriptionId
    displayName: Data Factory Subscription Id
    type: string

jobs:
  - deployment: deploymentjob${{ parameters.env }}
    displayName: Deployment Job ${{ parameters.env }} 
    environment: deploy ${{ parameters.env }}
    strategy:
      runOnce:
        deploy:
          steps:
          - checkout: self
            displayName: '1 Retrieve Repository'
            clean: true 

          ###################################
          # Show environment and treeview
          ###################################
          - powershell: |
              Write-Output "This is the ${{ parameters.env }} environment"
              tree "$(Pipeline.Workspace)" /F
            displayName: '2 Show environment and treeview Pipeline_Workspace'
            
          ###################################
          # Stop triggers
          ###################################
          - task: AzurePowerShell@5
            displayName: '3 Stop triggers'
            inputs:
              azureSubscription: 'sc_adf-devopssp'
              pwsh: true
              azurePowerShellVersion: LatestVersion
              scriptType: filePath
              scriptPath: '$(Pipeline.Workspace)\ArmTemplatesArtifact\PrePostDeploymentScript.ps1'
              scriptArguments: > # Use this to avoid newline characters in multiline string
                -armTemplate '$(Pipeline.Workspace)\ArmTemplatesArtifact\ARMTemplateForFactory.json'
                -ResourceGroupName $(DataFactoryResourceGroupName)
                -DataFactoryName $(DataFactoryName)
                -predeployment $true
                -deleteDeployment $false
                
          ###################################
          # Deploy ADF Artifact
          ###################################
          - task: AzureResourceManagerTemplateDeployment@3
            displayName: '4 Deploy ADF Artifact'
            inputs:
              deploymentScope: 'Resource Group'
              azureResourceManagerConnection: 'sc_adf-devopssp'
              subscriptionId: $(DataFactorySubscriptionId)
              action: 'Create Or Update Resource Group'
              resourceGroupName: $(DataFactoryResourceGroupName)
              location: 'West Europe'
              templateLocation: 'Linked artifact'
              csmFile: '$(Pipeline.Workspace)/ArmTemplatesArtifact/ARMTemplateForFactory.json'
              csmParametersFile: '$(Pipeline.Workspace)/ArmTemplatesArtifact/ARMTemplateParametersForFactory.json'
              overrideParameters: '-factoryName $(DataFactoryName)'
              deploymentMode: 'Incremental'

            env: 
                SYSTEM_ACCESSTOKEN: $(System.AccessToken)
                
          ###################################
          # Start triggers and cleanup
          ###################################
          - task: AzurePowerShell@5
            displayName: '5 Start triggers and cleanup'
            inputs:
              azureSubscription: 'sc_adf-devopssp'
              pwsh: true
              azurePowerShellVersion: LatestVersion
              scriptType: filePath
              scriptPath: '$(Pipeline.Workspace)\ArmTemplatesArtifact\PrePostDeploymentScript.ps1'
              scriptArguments: > # Use this to avoid newline characters in multiline string
                -armTemplate $(Pipeline.Workspace)/ArmTemplatesArtifact/ARMTemplateForFactory.json
                -ResourceGroupName $(DataFactoryResourceGroupName)
                -DataFactoryName $(DataFactoryName)
                -predeployment $false
                -deleteDeployment $true