
Saturday 31 December 2022

Cleanup Synapse before deployment

Case
We are using the Synapse workspace deployment add-on for deploying Synapse to the next environment, but when we remove a pipeline/dataset/linked service/trigger in Development it doesn't get deleted in Test, Acceptance or Production. There is a pre- and post-deployment script for Data Factory by Microsoft, but where is the Synapse version of it?
Clean Synapse Workspace before deployment

Update: use DeleteArtifactsNotInTemplate: true in the deployment task to avoid the PowerShell script below.

Solution
The deployment add-on just deploys a new version over an existing version, so deleted parts will remain in your workspace. For most parts this is just ugly and annoying, but if obsolete triggers are still executing pipelines it could screw up your ETL process.

Below you will find a first version of a PowerShell script that removes all pipelines, notebooks, SQL scripts, datasets, linked services and triggers before deploying a new version of your Synapse workspace from the repository.

Note: pipelines that are called by other pipelines can't be deleted, so you first need to delete the parent pipeline before you can delete the child pipeline. The script skips those child pipelines and continues with the rest. After this first delete iteration a lot of parent pipelines won't exist any more, which allows the child pipelines to be removed in a second iteration. This is done in a loop that stops after 100 iterations, so don't create a monstrous tree of pipelines calling each other (and especially don't create loops of pipelines calling each other). The same trick is used for Linked Services: if you have for example a Key Vault Linked Service that is used in another Linked Service, then you first need to delete that second Linked Service before you can delete the Key Vault Linked Service.

param (

   [Parameter (Mandatory = $true, HelpMessage = 'Synapse name')]
   [ValidateNotNullOrEmpty()]
   [string] $WorkspaceName,
   
   [Parameter (Mandatory = $true, HelpMessage = 'Resourcegroup name')]
   [ValidateNotNullOrEmpty()]
   [string] $ResourceGroupName 
)

[string] $WorkspaceDefaultSqlServer = "$($WorkspaceName)-WorkspaceDefaultSqlServer"
[string] $WorkspaceDefaultSqlStorage = "$($WorkspaceName)-WorkspaceDefaultStorage"


#######################################################
# 1) Checking for resource locks and removing them
#######################################################
Write-Output "========================================"
Write-Output "1) Getting resource locks"
# Getting all locks on the Azure Synapse Workspace
$lock = Get-AzResourceLock -ResourceGroupName $ResourceGroupName -ResourceName $WorkspaceName -ResourceType "Microsoft.Synapse/workspaces"

# Looping through all locks to remove them one by one
Write-Output "========================================"
Write-Output "Remove resource locks"
if($null -ne $lock)
{
    $lock | ForEach-Object -process {
        Write-Output "Removing Lock Id: $($lock.LockId)"
        # Remove lock
        Remove-AzResourceLock -LockId $_.LockId -Force
    }
}


#######################################################
# 2) Stopping and removing all triggers
#######################################################
Write-Output "========================================"
Write-Output "2) Remove triggers"
# Getting all triggers from Synapse
$triggers = Get-AzSynapseTrigger -WorkspaceName $WorkspaceName
Write-Output "Found $($triggers.Count) triggers"

# Stopping all triggers before deleting them
$triggers | ForEach-Object -process { 
    Write-Output "Stopping trigger $($_.name)"
    try {
        # Trying to stop each trigger
        Stop-AzSynapseTrigger -WorkspaceName $WorkspaceName -Name $($_.name) -ErrorAction Stop
    }
    catch {
        if ($_.Exception.Message -eq "{}") {
            Write-Output "Trigger stopped"
           # $_.Exception
        }
        else {
            Write-Output "Throw"
            Throw $_
        }
    }
    # Remove trigger
    Remove-AzSynapseTrigger -Name $_.name -WorkspaceName $WorkspaceName -Force
}


#######################################################
# 3) Removing all pipelines
#######################################################
Write-Output "========================================" 
Write-Output "3) Remove pipelines"
# Getting all pipelines from Synapse
$pipelines = Get-AzSynapsePipeline -WorkspaceName $WorkspaceName | Sort-Object -Property id
Write-Output "Found $($pipelines.Count) pipelines"

# Trying to delete all pipelines. If a pipeline is still referenced
# by another pipeline it will continue to remove other pipelines
# before trying to remove it again... max 100 times. So don't create
# chains of pipelines that are too long
[int] $depthCount = 0
while ($pipelines.Count -gt 0 -and $depthCount -lt 100)
{
    Write-Output "$($pipelines.Count) pipelines left"
    $pipelines | ForEach-Object -process { 
        Write-Output "Trying to delete pipeline $($_.name)"
        Remove-AzSynapsePipeline -Name $_.name -WorkspaceName $WorkspaceName -Force -ErrorAction SilentlyContinue
    }
    Start-Sleep 2 
    $depthCount += 1
    $pipelines = Get-AzSynapsePipeline -WorkspaceName $WorkspaceName
}
Write-Output "Depthcount: $depthCount"
if ($depthCount -eq 100)
{
    throw "Depthcount is to high!"
}


#######################################################
# 4) Removing all notebooks
#######################################################
Write-Output "========================================"
Write-Output "4) Remove notebooks"
# Getting all notebooks from Synapse
$notebooks = Get-AzSynapseNotebook -WorkspaceName $WorkspaceName
Write-Output "Found $($notebooks.Count) notebooks"

# Loop through all notebooks to delete them
$notebooks | ForEach-Object -process {
    Write-Output "Deleting notebooks $($_.Name)"
    Remove-AzSynapseNotebook -Name $($_.Name) -WorkspaceName $WorkspaceName -Force
}


#######################################################
# 5) Removing all SQL scripts
#######################################################
Write-Output "========================================"
Write-Output "5) Remove SQL scripts"
# Getting all scripts from Synapse
$sqlscripts = Get-AzSynapseSqlScript -WorkspaceName $WorkspaceName
Write-Output "Found $($sqlscripts.count) SQL-scripts"

# Loop through all SQL scripts to delete them
$sqlscripts | ForEach-Object -Process {
    Write-Output "Deleting SQL-script $($_.Name)"
    Remove-AzSynapseSqlScript -Name $($_.Name) -WorkspaceName $WorkspaceName -Force
}


#######################################################
# 6) Removing all datasets
#######################################################
Write-Output "========================================"
Write-Output "6) Remove datasets"
# Getting all datasets from Synapse
$datasets = Get-AzSynapseDataset -WorkspaceName $WorkspaceName
Write-Output "Found $($datasets.Count) datasets"

# Loop through all datasets to delete them
$datasets | ForEach-Object -process { 
    Write-Output "Deleting dataset $($_.name)"
    Remove-AzSynapseDataset -Name $_.name -WorkspaceName $WorkspaceName -Force
}


#######################################################
# 7) Removing all linked services
#######################################################
Write-Output "========================================"
Write-Output "7) Collecting Linked services"
# Getting all linked services from Synapse, except the two default ones
$lservices = Get-AzSynapseLinkedService -WorkspaceName $WorkspaceName | Where-Object {($_.Name -ne $WorkspaceDefaultSqlServer -and  $_.Name -ne $WorkspaceDefaultSqlStorage) } 
Write-Output "Found $($lservices.Count) linked services"

# Trying to delete all linked services. If a linked service is still
# referenced by another linked service it will continue to remove
# other linked services before trying to remove it again... 
# max 100 times. Example: KeyVault linked services
$depthCount = 0
while ($lservices.Count -gt 0 -and $depthCount -lt 100)
{
    Write-Output "$($lservices.Count) linked services left"
    $lservices | ForEach-Object -process { 
        Write-Output "Trying to delete linked service $($_.name)"
        Remove-AzSynapseLinkedService -Name $_.name -WorkspaceName $WorkspaceName -Force -ErrorAction Continue
    }

    Start-Sleep 2 
    $depthCount += 1
    $lservices = Get-AzSynapseLinkedService -WorkspaceName $WorkspaceName | Where-Object {($_.Name -ne $WorkspaceDefaultSqlServer -and  $_.Name -ne $WorkspaceDefaultSqlStorage) }
}
Write-Output "Depthcount: $depthCount"
if ($depthCount -eq 100)
{
    throw "Depthcount is to high!"
}
Write-Output "========================================"


You need to store this PowerShell file in the repository as ClearSynapse.ps1. In this case we created a CICD folder in the root of the repository and within that folder a Powershell subfolder for all our PowerShell files. Then you can call this script in your YAML pipeline just before the deployment part. Make sure your service connection (Service Principal) has enough rights within the workspace. In a previous post we used the Synapse Artifact Publisher role to minimize access, but for running this script your Service Principal needs more: Synapse Administrator.
            ###################################
            # 4 Cleanup Synapse
            ###################################
            - task: AzurePowerShell@5
              displayName: '4 Cleanup Synapse'
              inputs:
                azureSubscription: ${{ parameters.ServiceConnection }}
                scriptType: filePath
                scriptPath: $(Pipeline.Workspace)\s\CICD\Powershell\ClearSynapse.ps1
                scriptArguments:
                  -WorkspaceName ${{ parameters.TargetWorkspaceName }} `
                  -ResourceGroupName ${{ parameters.ResourceGroupName }}
                azurePowerShellVersion: latestVersion
                pwsh: true
                
            ###################################
            # 5 Validate and Deploy Synapse
            ###################################
            - task: Synapse workspace deployment@2
              displayName: '5 Validate and deploy Synapse'
              inputs:
                operation: validateDeploy
                ArtifactsFolder: '$(Pipeline.Workspace)/SynapseArtifact'
                azureSubscription: ${{ parameters.ServiceConnection }}
                ResourceGroupName: ${{ parameters.ResourceGroupName }}
                TargetWorkspaceName: ${{ parameters.TargetWorkspaceName }}
                OverrideArmParameters: '
                  -LS_AKV_MyKeyVault_properties_typeProperties_baseUrl                  https://${{ parameters.KeyVaultName }}.vault.azure.net/
                  '
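If your Service Principal doesn't have the Synapse Administrator role yet, you can grant it once with the Az.Synapse module. A hedged sketch (the workspace name and object id are placeholders; run it as someone who is already Synapse Administrator on the workspace):

# Hypothetical values: the target workspace and the object id of the Service Principal
New-AzSynapseRoleAssignment -WorkspaceName "mysynapse-tst" `
                            -RoleDefinitionName "Synapse Administrator" `
                            -ObjectId "00000000-0000-0000-0000-000000000000"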
Conclusion
In this post you learned how to clean up Synapse with a little PowerShell script. The script works perfectly, but it is a little rough by just deleting all basic parts of your workspace (pipelines, notebooks, SQL scripts, datasets, linked services and triggers). A next, nicer version would only delete everything that is in the Synapse Workspace but not in the repository (after the deployment).

Deleting stuff before deploying new stuff also makes it almost mandatory to use at least three environments, because when your deployment fails you are left with an almost empty Synapse workspace. So an extra environment between development and production will prevent most deployment screw-ups.

Thank you Walter ter Maten for improving the script with the delete iterations.

Monday 7 November 2022

Create Data Lake containers and folders via DevOps

Case
I need a process to create Azure Data Lake containers throughout the DTAP environments of my Azure Data Platform. Doing this manually is not an option, because we want to minimize owner and contributor access to the Data Lake of acceptance and production, but Synapse and Data Factory don't have a standard activity to create ADL containers. How do we automatically create Azure Data Lake containers (and folders)?
Storage Account (datalake) containers

Solution
An option is to use a PowerShell script that is executed by the Custom activity in combination with an Azure Batch service, or an Azure Automation runbook with the same PowerShell script that is executed by a Web(hook) activity.

However, since you probably don't need to create new containers during every (ADF/Synapse) pipeline run, we suggest doing this via an Azure DevOps pipeline as part of your CICD process with the same PowerShell script. You could either create a separate CICD pipeline for it or integrate it in your Synapse or ADF pipeline.

The example below creates containers and optionally also folders and subfolders within those containers. Note that Synapse and Data Factory will also create folders themselves when needed, for example with the Copy Data activity.

1) Repos folder structure
For this example we use a CICD folder in the repository with subfolders for PowerShell, YAML and Json.
Repos folder structure

2) JSON config
Because we don't want to hardcode the containers and folders, we use a JSON file as input for the PowerShell script. This JSON file is stored within the Json folder of the DevOps repository. We use the same JSON file for all environments, but you can of course create a separate file for each environment if you for example need different containers on production. Our file is called config_storage_account.json.

The folder array in this example is optional; when it is left empty no folders will be created. You can create subfolders within folders by separating them with a forward slash.
{
    "containers": {
        "dataplatform": ["folder1", "folder2/subfolder1", "folder2/subfolder2"],
        "SourceX": ["Actual", "History"],
        "SourceY": ["Actual", "History"],
        "SourceZ": []
    }
}
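If you want to check the config file locally before committing it, a hedged PowerShell one-off that parses the file and lists the container names (the path is an example):

# Hypothetical local path to the config file in your cloned repository
$config = Get-Content -Raw -Path ".\CICD\Json\config_storage_account.json" | ConvertFrom-Json
$config.containers.PSObject.Properties.Name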

3) PowerShell code
The PowerShell script called SetStorageAccounts.ps1 is stored in the PowerShell folder and contains three parameters:
  • ResourceGroupName - The name of the resource group where the storage account is located.
  • StorageAccountName - The name of the storage account.
  • JsonLocation - The location of the JSON config file in the repository (see previous step).
It checks the existence of both the config file and the storage account. Then it first loops through the containers from the config and, within that container loop, it loops through the folders of that specific container. For container names and folder paths it does some small corrections for commonly made mistakes.

Note that the script will not delete containers and folders (or set authorizations on them). This is of course possible, but make sure to test it very thoroughly; even with testing, a human error in the config file is easy to make and could cause a lot of data loss!
# This PowerShell will create the containers provided in the JSON file
# It does not delete or update containers and folders or set authorizations
param (
    [Parameter (Mandatory = $true, HelpMessage = 'Resource group name of the storage account.')]
    [ValidateNotNullOrEmpty()]
    [string] $ResourceGroupName,

    [Parameter (Mandatory = $true, HelpMessage = 'Storage account name.')]
    [ValidateNotNullOrEmpty()]
    [string] $StorageAccountName,

    [Parameter (Mandatory = $true, HelpMessage = 'Location of config_storage_account.json on agent.')]
    [ValidateNotNullOrEmpty()]
    [string] $JsonLocation
 )

# Combine path and file name for JSON file. The file name is hardcoded and the
# same for each environment. Create an extra parameter for the filename if
# you need different files/configurations per environment.
$path = Join-Path -Path $JsonLocation -ChildPath "config_storage_account.json"
Write-output "Extracting containernames from $($path)"


# Check existance of file path on the agent
if (Test-Path $path -PathType leaf) {
    
    # Get all container objects from JSON file
    $Config = Get-Content -Raw -Path $path | ConvertFrom-Json

    # Create containers array for looping
    $Config | ForEach-Object { 
        $Containers = $($_.containers) | Get-Member -MemberType NoteProperty | Select-Object -ExpandProperty Name
    }
    
    # Check Storage Account existance and get the context of it
    $StorageCheck = Get-AzStorageAccount -ResourceGroupName $ResourceGroupName -Name $StorageAccountName -ErrorAction SilentlyContinue        
    $context = $StorageCheck.Context

    # If Storage Account found
    if ($StorageCheck) {
        # Storage Account found
        Write-output "Storage account $($StorageAccountName) found"

        # Loop through container array and create containers if they don't exist
        foreach ($container in $containers) {
            # First a little cleanup of the container

            # 1) Change to lowercase
            $container = $container.ToLower()

            # 2) Trim accidental spaces
            $container = $container.Trim()


            # Check if container already exists
            Write-output "Checking existence of container $($container)"
            $ContainerCheck = Get-AzStorageContainer -Context $context -Name $container -ErrorAction SilentlyContinue 

            # If container exists
            if ($ContainerCheck) {
                Write-Output "Container $($container) already exists"
            }
            else {
                Write-Output "Creating container $($container)"
                New-AzStorageContainer -Name $container -Context $context | Out-Null
            }

            # Get container folders from JSON
            Write-Output "Retrieving folders from config"
            $folders = $Config.containers.$container
            Write-Output "Found $($folders.Count) folders in config for container $($container)"

            # Loop through container folders
            foreach ($folder in $folders) {
                # First a little cleanup of the folders

                # 1) Replace backslashes by a forward slash
                $path = $folder.Replace("\","/")

                # 2) Remove unwanted spaces
                $path = $path.Trim()
                $path = $path.Replace("/ ","/")
                $path = $path.Replace(" /","/")

                # 3) Check if path ends with a forward slash
                if (!$path.EndsWith("/")) {
                    $path = $path + "/"
                }
                    
                # Check if folder path exists
                $FolderCheck = Get-AzDataLakeGen2Item -FileSystem $container -Context $context -Path $path  -ErrorAction SilentlyContinue
                if ($FolderCheck) {
                    Write-Output "Path $($folder) exists in container $($container)"
                } else {
                    New-AzDataLakeGen2Item -Context $context -FileSystem $container -Path $path -Directory | Out-Null
                    Write-Output "Path $($folder) created in container $($container)"
                }
            }

        }
    } else {
        # Provided storage account not correct
        Write-Output "Storage account $($StorageAccountName) not available, containers not set up."
    }
} else {
    # Path to JSON file incorrect
    Write-output "File $($path) not found, containers not setup."
}
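Before adding it to the pipeline you can run the script from a local PowerShell session against a development storage account. A hedged example (the values are placeholders; log in with Connect-AzAccount first):

# Hypothetical example values
.\CICD\PowerShell\SetStorageAccounts.ps1 -ResourceGroupName "RG-DEV" `
                                         -StorageAccountName "mydatalakedev" `
                                         -JsonLocation ".\CICD\Json\"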
4) YAML file
If you integrate this in your existing Data Factory or Synapse YAML pipeline, then you only need to add one PowerShell step. Make sure you have a checkout step to copy the config and PowerShell file from the repository to the agent. You may also want to add a (temporary) treeview step to check the paths on your agent; this makes it easier to configure the paths within your YAML code.
parameters:
  - name: SerCon
    displayName: Service Connection
    type: string
  - name: Env
    displayName: Environment
    type: string
    values: 
    - DEV
    - ACC
    - PRD
  - name: ResourceGroupName
    displayName: Resource Group Name
    type: string
  - name: StorageAccountName
    displayName: Storage Account Name
    type: string

jobs:
    - deployment: deploymentjob${{ parameters.Env }}
      displayName: Deployment Job ${{ parameters.Env }} 
      environment: Deploy to ${{ parameters.Env }}

      strategy:
        runOnce:
          deploy:
            steps:
            ###################################
            # 1 Check out repository to agent
            ###################################
            - checkout: self
              displayName: '1 Retrieve Repository'
              clean: true 

            ###################################
            # 2 Show environment and treeview
            ###################################
            - powershell: |
                Write-Output "Deploying Synapse in the ${{ parameters.Env }} environment"
                tree "$(Pipeline.Workspace)" /F
              displayName: '2 Show environment and treeview Pipeline_Workspace'

            ###################################
            # 3 Create containers in datalake
            ###################################
            - task: AzurePowerShell@5
              displayName: '3 Create data lake containers'
              inputs:
                azureSubscription: ${{ parameters.SerCon }}
                scriptType: filePath
                scriptPath: $(Pipeline.Workspace)\s\CICD\PowerShell\SetStorageAccounts.ps1
                scriptArguments:
                  -ResourceGroupName ${{ parameters.ResourceGroupName }} `
                  -StorageAccountName ${{ parameters.StorageAccountName }} `
                  -JsonLocation $(Pipeline.Workspace)\s\CICD\Json\
                azurePowerShellVersion: latestVersion
                pwsh: true

5) The result
Now it's time to run the YAML pipeline and check the Storage Account to see whether the containers and folders were created.
DevOps logs of creating containers and folders

Created data lake folders in container

Conclusion
In this post you learned how to create containers and folders in the Storage Account / Data Lake via a little PowerShell script and a DevOps pipeline, but you can also reuse this PowerShell script for the mentioned alternative solutions.

Again a note about also deleting containers and folders: make sure to double check the code, but also the procedures, to avoid human errors and potentially losing a lot of data. You might want to set up soft deletes in your storage account to have a fallback scenario for screw-ups.

Sunday 31 July 2022

Skipped dependency between pipeline activities

Case
What is the Skipped dependency between ADF or Synapse pipeline activities useful for?
Skipped dependency

Solution
The Skipped dependency will execute the next activity if the previous activity was not executed, for example because the activity before that previous activity failed.

You could for example use this to create a general event handler for the entire pipeline by connecting the end of each 'line' to a dummy activity and then to an event handler or notification activity. The dummy activity is required because it will be skipped if anything fails. The (1 sec) Wait activity is probably the easiest to use.
If something fails the Web activity will execute

If any of the activities above fails, then the 'Dummy' Wait activity will not be executed and as a result the Web activity with the Skipped dependency will be executed. If all activities are successful, then the 'Dummy' Wait activity will be executed, but the Web activity won't.

Conclusion
In this post you learned one of the usages of the Skipped dependency, but please post your usage of the Skipped dependency in the comments below.

Note that it will wait for all 'lines' to be finished before the Dummy activity is skipped. So if Stored Procedure 1 fails, then the Dummy activity still needs to wait for Stored Procedures 2, 3 and 4 to be ready (failed or succeeded). This is because the dependencies in ADF are a logical AND (in SSIS you could also change it to a logical OR).


Wednesday 8 June 2022

Break or stop ForEach loop in ADF and Synapse

Case
I have a pipeline with a parallel Foreach loop to execute for example multiple Stored Procedures, but if one of those fails I want to stop the Foreach executing more Stored Procedures. Can I break the Foreach loop in case of an error without turning it from parallel into sequential? 
Can you stop a ForEach Loop? Nope!

Solution
You can't stop the ForEach loop itself on error, but if an iteration fails you can cancel the entire pipeline that is running the ForEach loop. While running, the pipeline gets a Run ID and you can use that to call a Rest API that cancels the current pipeline run if one of the Stored Procedures inside the ForEach fails.
Cancel the entire pipeline

1) Give ADF/Synapse access to its own Rest APIs
To use the Rest APIs of ADF or Synapse within a pipeline you first need to give ADF or Synapse access to its own Rest APIs. This can be done in the Azure Portal on the overview page of the service.
  • Open the Azure Portal in your browser and go to the overview page of your Data Factory or Synapse Workspace.
  • In the left menu click on Access control (IAM)
  • Click on +Add and then choose Add role assignment
  • Select the role Contributor or for ADF Data Factory Contributor and click on Next
  • Under Assign access to select Managed identity
  • Under Members click on +Select members
  • Now you need to select Synapse workspace or Data Factory (V2) and search for your ADF or Synapse
  • Click on your Synapse or ADF and click on the Select button
  • Optionally enter a description: "Give ADF/Synapse access to its own Rest APIs"
  • Click on the Review + assign button (twice) 
The animated GIF is from a Synapse workspace, but it is identical to Data Factory. Just select Data Factory as Managed identity type.
Giving Synapse access to its own Rest APIs

2) Add Web Activity in Foreach on fail
In the ForEach loop construction we need to add a Web Activity to call the Rest API that cancels the pipeline run. Data Factory and Synapse have a different URL to cancel the pipeline. Within the example URLs you need to replace the parts between curly brackets by either a hardcoded value, a parameter or a System Variable. For Data Factory and Synapse you can use the same System Variables to retrieve the factoryName/workspaceName and the runId: pipeline().DataFactory / pipeline().RunId

Azure Data Factory:
https://management.azure.com/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.DataFactory/factories/{factoryName}/pipelineruns/{runId}/cancel?api-version=2018-06-01

@concat('https://management.azure.com/subscriptions/',
pipeline().parameters.subscriptionId,
'/resourceGroups/',
pipeline().parameters.resourceGroup,
'/providers/Microsoft.DataFactory/factories/',
pipeline().DataFactory,
'/pipelineruns/',
pipeline().RunId,
'/cancel?api-version=2018-06-01')

Azure Synapse Workspace:
{endpoint}/pipelineruns/{runId}/cancel?api-version=2020-12-01

or a little easier:
https://{workspaceName}.dev.azuresynapse.net/pipelineruns/{runId}/cancel?api-version=2020-12-01

@concat('https://',
pipeline().DataFactory,
'.dev.azuresynapse.net/pipelineruns/',
pipeline().RunId,
'/cancel?api-version=2020-12-01')
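If you want to try the cancel call outside a pipeline first, you can fire the same Rest API request with Az PowerShell. A hedged sketch for the Data Factory variant (all values are placeholders and the run id must belong to a pipeline run that is still in progress):

# Hypothetical values; log in with Connect-AzAccount first
$subscriptionId = "00000000-0000-0000-0000-000000000000"
$resourceGroup  = "RG-DEV"
$factoryName    = "MyDataFactory"
$runId          = "11111111-1111-1111-1111-111111111111"

Invoke-AzRestMethod -Method POST -Path ("/subscriptions/$subscriptionId/resourceGroups/$resourceGroup" +
    "/providers/Microsoft.DataFactory/factories/$factoryName/pipelineruns/$runId/cancel?api-version=2018-06-01")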

  • In the ForEach add a Web Activity and connect it to your existing activity
  • Make sure to change the type of the line from Success to Failure
  • Give the Web Activity a useful name and go to the Settings tab
  • For the URL use an expression like the ones above (don't forget to add parameters if you use them)
  • Set the Method to POST
  • Enter a fake/dummy JSON message in the body. We don't need it, but it is required
  • Set the Authentication to System Assigned Managed Identity
  • And last set the Resource to https://management.azure.com/
Web Activity to Cancel the pipeline run

3) Testing
For this example we used 10 Stored Procedures, one of which fails on purpose. First make sure to publish the pipeline and then trigger the pipeline (don't use Debug). Then check the monitor to see the result.
One Stored Procedure Failed, rest is cancelled

Conclusion
In this post you learned how to stop a running pipeline when one of the activities within a ForEach loop fails, so that it doesn't unnecessarily execute new activities. One of the downsides is that in the logs the status of your pipeline will be 'Cancelled' instead of 'Failed', but it saves you running (and paying for) unnecessary activities.

Another downside is that it does not work in debug mode, because then you don't get a Run ID and there is no pipeline run to cancel. The workaround for that is to put the Web Activity that cancels your pipeline in an If Condition that only cancels if the Run ID is not 0.

The usefulness of this construction gets better if you have a lot of iterations and hopefully it's not the last iteration that failed. Let us know in the comments what you think of a construction like this and whether you have alternative solutions.

Thursday 4 November 2021

ADF Release - Set global params during deployment

Case
I'm using global parameters in my Azure Data Factory project and I want to change their values during deployment through my DTAP environments. How do I do that?
ADF Global Parameters

Solution
In this post we are only focusing on the deployment part of the DevOps YAML pipeline. If you do not have a pipeline yet then please read the complete story of configuring the Development ADF and releasing it to other Factories first.

1) Include in ARM template
First make sure that in the Global Parameters pane the checkbox "Include in ARM template" is checked. This will add the parameters to the ARM template, which makes them available as parameters during deployment.
Check Include in ARM template

2) Getting name of parameter
The global parameter will get a slightly different name in the ARM template. For example: myGlobParam becomes dataFactory_properties_globalParameters_myGlobParam_value. You can check that name by exporting the ARM template (via the ARM template option) and then looking in arm_template.json or arm_template_parameters.json.
Find the parameter name

Another option is to hit the publish button and then check the (unused) adf_publish branch in the repository.
Find the parameter name

3) Adjust release pipeline
If you are using YAML to publish the changes, then the only thing you have to change is the overrideParameters property: add the new parameter dataFactory_properties_globalParameters_myGlobParam_value with either a variable or a hardcoded value. The > behind the property helps you to break the string over multiple lines and keep the YAML code readable.
          ###################################
          # Deploy ADF Artifact
          ###################################
          - task: AzureResourceManagerTemplateDeployment@3
            displayName: '4 Deploy ADF Artifact'
            inputs:
              deploymentScope: 'Resource Group'
              azureResourceManagerConnection: 'sc_mcacc-adf-devopssp'
              subscriptionId: $(DataFactorySubscriptionId)
              action: 'Create Or Update Resource Group'
              resourceGroupName: $(DataFactoryResourceGroupName)
              location: 'West Europe'
              templateLocation: 'Linked artifact'
              csmFile: '$(Pipeline.Workspace)/ArmTemplatesArtifact/ARMTemplateForFactory.json'
              csmParametersFile: '$(Pipeline.Workspace)/ArmTemplatesArtifact/ARMTemplateParametersForFactory.json'
              overrideParameters: > 
                -factoryName $(DataFactoryName) 
                -dataFactory_properties_globalParameters_myGlobParam_value "Test123"
              deploymentMode: 'Incremental'

            env: 
                SYSTEM_ACCESSTOKEN: $(System.AccessToken)
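Besides checking the portal, you can verify the deployed value from PowerShell as well. A hedged sketch that reads the factory's global parameters via the ARM Rest API (all values are placeholders):

# Hypothetical placeholder values
$subscriptionId = "00000000-0000-0000-0000-000000000000"
$resourceGroup  = "RG-DEV"
$factoryName    = "MyDataFactory"

$response = Invoke-AzRestMethod -Method GET -Path ("/subscriptions/$subscriptionId/resourceGroups/$resourceGroup" +
    "/providers/Microsoft.DataFactory/factories/$factoryName" + "?api-version=2018-06-01")
($response.Content | ConvertFrom-Json).properties.globalParameters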
After deployment you can see the new parameter value

And if you're using the Release pipelines with the ARM template deployment task then you can just go to the Override template parameters property, click on the edit button and replace the value with a new value or a variable from a variable group.
ARM template deployment - Override template parameters

Conclusion
In this post you learned how to use Global Parameters from ADF as ARM template parameters for Azure Data Factory. This allows you to use different settings in the various factories in your DTAP environments. In a next post we will show you how to override properties of for example a Linked Service or a Trigger that should get a different value in acceptance or production.


Monday 1 March 2021

Pausing/resuming Dedicated SQL Pools from Synapse

Case
In a previous post we described how we can pause and resume our Azure Dedicated SQL pools via Azure Data Factory. The question we are going to solve now: "How can we pause and resume from within an Azure Synapse workspace"?


Pause / Resume SQL dedicated Pools

Solution
Pausing and resuming from within an Azure Synapse workspace is very similar, but there are some differences because there are two types of Azure Dedicated SQL Pools. When you create one from within the Azure Synapse Workspace the provider will be 'Microsoft.Synapse' (hosted in Synapse), but when you create one outside the Azure Synapse Workspace the provider will be 'Microsoft.Sql' (hosted in SQL Server). Both have different Rest APIs and different security roles. In this example we will demo the Synapse-hosted version, but we will also mention the differences. For the SQL version you could also take a look at our previous post, which handles the SQL version from within Data Factory.


1) Give Azure Synapse Access to SQL Dedicated Pools
To call the Rest API we need to grant our Azure Synapse workspace access to the Dedicated SQL Pool, or more specifically to the Synapse workspace hosting it. Perhaps a bit odd, but we need to grant Synapse access to its own resources (for the SQL version you need to grant access to the SQL Server hosting the SQL Pool).

For the security role you always need to avoid 'Owner' and use 'Contributor', because the Owner role can also change security-related items (for the SQL version you can use 'SQL Server Contributor'). You can assign the role via the portal as described below, or script it (see the sketch after these steps).
  • In the Azure portal go to the Azure Synapse Workspace hosting the SQL Pool that you want to pause or resume
  • In the left menu click on Access control (IAM)
  • Click on Add, Add role assignment
  • In the 'Role' drop down select 'Contributor'
  • In the 'Assign access to' drop down select user, group or service principal.
  • Search for your Synapse workspace name (in our example 'gansdorp'), select it and click on Save
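A hedged scripted alternative for this role assignment (the names are examples from this post and the identity property names may differ slightly between Az.Synapse versions):

# Grant the workspace's managed identity the Contributor role on its own workspace (hypothetical names)
$workspace = Get-AzSynapseWorkspace -ResourceGroupName "RG_bitools" -Name "gansdorp"
New-AzRoleAssignment -ObjectId $workspace.Identity.PrincipalId `
                     -RoleDefinitionName "Contributor" `
                     -Scope $workspace.Id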

Grant Synapse Contributor role to SQL Server

If you forget this step then you will receive an authorization error while executing your Synapse pipeline.
2108 Authorization Failed

{"error":
{"code":"AuthorizationFailed"
,"message":"The client 'xxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx' with object id 'xxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx' does not have authorization to perform action 'Microsoft.Synapse/workspace/resume/action' over scope '/subscriptions/xxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourceGroups/RG_bitools/providers/Microsoft.Synapse/workspace/gansdorp/dedsqlpools' or the scope is invalid. If access was recently granted, please refresh your credentials."}
}

2) Determine URL
Now it is almost time to edit your Synapse pipeline. The first step will be adding a Web activity to call the Rest API (for the SQL version you must use this Rest API). Find the operation you want to perform and then find the example URL to construct the new URL.

Pause compute
https://management.azure.com/subscriptions/{subscription-id}/resourceGroups/{resource-group-name}/providers/Microsoft.Synapse/workspaces/{workspacename}/sqlPools/{DedicatedSQLPoolName}/pause?api-version=2019-06-01-preview

Resume compute
https://management.azure.com/subscriptions/{subscription-id}/resourceGroups/{resource-group-name}/providers/Microsoft.Synapse/workspaces/{workspacename}/sqlPools/{DedicatedSQLPoolName}/resume?api-version=2019-06-01-preview

Within these URLs you need to replace all parts that start and end with a curly bracket: {subscription-id}, {resource-group-name}, {workspacename} and {DedicatedSQLPoolName} (including the brackets themselves).

Example URL
https://management.azure.com/subscriptions/aaaa-bbbb-1234-cccc/resourceGroups/RG-DEV/providers/Microsoft.Synapse/workspaces/gansdorp/sqlPools/dedsqlpool/pause?api-version=2019-06-01-preview
Example Rest API URL

3) Add Web Activity
To call the Rest API we will use the Web Activity in the Synapse pipeline. The actual Rest API call is synchronous, which means it waits for the pause or resume action to finish and then returns a message. This also means that you don't have to build any checks to make sure it is already online.
  • Add a Web activity to your pipeline and give it a suitable name
  • Go to the Settings tab and use the URL from the previous step in the URL property
  • Choose POST as method
  • Fill in {} as body (we don't need it, but it is required)
  • Choose MSI as authentication method
  • As the last step enter this URL https://management.azure.com/ as Resource

Web activity

To first check the current state of the SQL Pool you can use the Get Rest API and then use an If Condition to check its output with an expression like @Activity('Get Current State').Output.Properties.State
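Outside the pipeline you can check and change the state with the Az.Synapse module as well. A hedged sketch with the example names used above:

# Hypothetical names; log in with Connect-AzAccount first
$pool = Get-AzSynapseSqlPool -WorkspaceName "gansdorp" -Name "dedsqlpools"
Write-Output "Current state: $($pool.Status)"

if ($pool.Status -eq "Paused") {
    Resume-AzSynapseSqlPool -WorkspaceName "gansdorp" -Name "dedsqlpools"
}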

Summary
In this post you learned how to pause and resume your Dedicated SQL Pool from within Azure Synapse Studio. The method is very similar to the ADF version; there are some differences, but those are mostly caused by the service that is hosting the SQL Pool, which can be either a SQL Server or your own Synapse Workspace. It is expected that one of the two will probably disappear, because it isn't likely that Microsoft will keep maintaining two different services with each their own set of Rest APIs.


Monday 11 January 2021

Scaling Azure Dedicated SQL Pools from ADF

Case
Is there a solution to upscale and downscale my Azure Dedicated SQL Pool from the Azure Data Factory pipeline without scripting? I know there are PowerShell solutions, but I'd rather use a no-code solution. What are my options?
Scaling Azure Dedicated SQL Pools

Solution
Fortunately you can now use the Rest APIs of Azure Dedicated SQL Pools (formerly known as Azure SQL Data Warehouse and for a short period as Azure Synapse Analytics) to down- or upscale the compute, so no coding is required.

1) Give ADF Access to SQL Pool
To call the Rest API we need to give ADF access to the SQL Pool, or more specifically to the SQL Server hosting that SQL Pool. We need a role that can only change the database settings, but nothing security-related: Contributor, SQL DB Contributor or SQL Server Contributor. You can assign the role in the portal as described below, or script it (see the sketch after these steps).
  • Go to the Azure SQL Server of the SQL Pool that you want to scale up or down with ADF
  • In the left menu click on Access control (IAM)
  • Click on Add, Add role assignment
  • In the 'Role' drop down select 'SQL DB Contributor'
  • In the 'Assign access to' drop down select Data Factory
  • Search for your Data Factory, select it and click on Save
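A hedged scripted alternative for this role assignment (the resource group, server and factory names are placeholders):

# Grant the Data Factory managed identity the SQL DB Contributor role on the SQL Server (hypothetical names)
$adf       = Get-AzDataFactoryV2 -ResourceGroupName "RG_Bitools" -Name "MyDataFactory"
$sqlServer = Get-AzSqlServer -ResourceGroupName "RG_Bitools" -ServerName "bitools2"
New-AzRoleAssignment -ObjectId $adf.Identity.PrincipalId `
                     -RoleDefinitionName "SQL DB Contributor" `
                     -Scope $sqlServer.ResourceId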
Grant data factory SQL DB Contributor role to SQL Server

If you forget this step then you will receive an authorization error while executing your ADF pipeline.
2108 Authorization Failed

{"error":
{"code":"AuthorizationFailed"
,"message":"The client 'xxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx' with object id 'xxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx' does not have authorization to perform action 'Microsoft.Sql/servers/databases/resume/action' over scope '/subscriptions/xxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourceGroups/RG_bitools/providers/Microsoft.Sql/servers/SQL_bitools/databases/bitools' or the scope is invalid. If access was recently granted, please refresh your credentials."}
}

2) Determine URL
Now it is almost time to edit your ADF pipeline. The first step will be adding a Web activity to call the Rest API, but before we can do that we need to determine the URL of this API which you can find here.

Scaling
https://management.azure.com/subscriptions/{subscription-id}/resourceGroups/{resource-group-name}/providers/Microsoft.Sql/servers/{server-name}/databases/{database-name}?api-version=2014-04-01-preview

Within this URL you need to replace all parts that start and end with a curly bracket: {subscription-id}, {resource-group-name}, {server-name} and {database-name} (including the brackets themselves). Don't use a URL (bitools.database.windows.net) for the database server name, but use only the name: bitools.

Example URL
https://management.azure.com/subscriptions/aaaa-bbbb-1234-cccc/resourceGroups/RG_Bitools/providers/Microsoft.Sql/servers/bitools2/databases/bitools?api-version=2014-04-01-preview
Example URL

3) JSON message for Rest API
The Rest API above expects a JSON message with the pricing tier. A list of all pricing tiers can be found here in the column 'Data warehouse units'. Here are two examples which you need to adjust to your requirements:
{
    "properties": {
        "requestedServiceObjectiveName": "DW200c"
    }
}
or
{
    "properties": {
        "requestedServiceObjectiveName": "DW1000c"
    }
}

Note: just in case you get one of the errors below, the JSON example in the documentation is incorrect at the moment of writing:
  • Quotation marks around the data warehouse units are missing, which returns the following error:{"error":{"code":"InvalidRequestContent","message":"The request content was invalid and could not be deserialized: 'Unexpected character encountered while parsing value: D. Path 'properties.requestedServiceObjectiveName', line 3, position 41.'."}}
  • The c is missing after the data warehouse units (gen1 vs gen2) which returns the following error:
    {"code":"45122","message":"\u0027Azure SQL Data Warehouse Gen1 has been deprecated in this region. Please use SQL Analytics in Azure Synapse.\u0027","target":null,"details":[{"code":"45122","message":"\u0027Azure SQL Data Warehouse Gen1 has been deprecated in this region. Please use SQL Analytics in Azure Synapse.\u0027","target":null,"severity":"16"}],"innererror":[]}



4) Add Web Activity
To call the Rest API we will use the Web Activity in the ADF pipeline. The actual Rest API call is synchronous, which means it waits for the scale action to finish and then returns a message. This also means that you don't have to build any checks to make sure it is already online.
  • Add a Web activity to your pipeline and give it a suitable name
  • Go to the Settings tab and use the URL from step 2 in the URL property
  • Choose PATCH as method
  • Add a new header with the name 'Content-Type' and the value 'application/json'
  • Fill in the JSON message from step 3 as body
  • Choose MSI as authentication method
  • As the last step enter this URL https://management.azure.com/ as Resource
Use a Web activity to call the Rest API

If you want to scale up before your ETL/ELT process and scale down afterwards then you need two separate Web activities or one clever child pipeline with parameters that you execute from your main pipeline.
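If you do want a scripted alternative after all, for example for ad-hoc scaling outside ADF, the same operation can be done with the Az.Sql module. A hedged sketch with the example names from the URL above:

# Hypothetical names; log in with Connect-AzAccount first
Set-AzSqlDatabase -ResourceGroupName "RG_Bitools" `
                  -ServerName "bitools2" `
                  -DatabaseName "bitools" `
                  -RequestedServiceObjectiveName "DW200c"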


Summary
In this post you learned how to upscale and downscale your Dedicated SQL Pool to save some money on your Azure bill without writing any code. Note that at the moment of writing live scaling is not yet available and that you will lose the connection to your Dedicated SQL Pool for a couple of minutes.

Also note that ADF pipelines slightly differ from Azure Synapse Analytics pipelines. So if you consider switching to Synapse workspaces, because you apparently already use Dedicated SQL Pools, then you have to make some small adjustments to this specific task, which will be described in a next post. In another post we already showed how to pause and resume your Azure SQL Pools from within ADF.



Wednesday 6 January 2021

Pausing and resuming Dedicated SQL Pools from ADF

Case
Is there a solution to pause and resume an Azure Dedicated SQL Pool from the Azure Data Factory pipeline without scripting? I know there are PowerShell solutions, but I'd rather use a no-code solution. What are my options?
Pause and resume Azure Dedicated SQL Pools

Solution
Luckily you can now use the Rest APIs of Azure Dedicated SQL Pools (formerly known as Azure SQL Data Warehouse and for a short period as Azure Synapse Analytics) to pause or resume the compute.

1) Give ADF Access to SQL Pool
To call the Rest API we need to give ADF access to the SQL Pool, or more specifically to the SQL Server hosting that SQL Pool. We need a role that can only change the database settings, but nothing security-related: Contributor, SQL Server Contributor or SQL DB Contributor. Choose the role with just enough permissions to perform the task and avoid the Owner role.
  • Go to the Azure SQL Server of the SQL Pool that you want to pause or resume with ADF
  • In the left menu click on Access control (IAM)
  • Click on Add, Add role assignment
  • In the 'Role' drop down select 'SQL DB Contributor'
  • In the 'Assign access to' drop down select Data Factory
  • Search for your Data Factory, select it and click on Save
Grant data factory SQL DB Contributor role to SQL Server

If you forget this step then you will receive an authorization error while executing your ADF pipeline.
2108 Authorization Failed

{"error":
{"code":"AuthorizationFailed"
,"message":"The client 'xxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx' with object id 'xxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx' does not have authorization to perform action 'Microsoft.Sql/servers/databases/resume/action' over scope '/subscriptions/xxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourceGroups/RG_bitools/providers/Microsoft.Sql/servers/SQL_bitools/databases/bitools' or the scope is invalid. If access was recently granted, please refresh your credentials."}
}

2) Determine URL
Now it is almost time to edit your ADF pipeline. The first step will be adding a Web activity to call the Rest API, but before we can do that we need to determine the URL of this API which you can find here.

Pause compute
https://management.azure.com/subscriptions/{subscription-id}/resourceGroups/{resource-group-name}/providers/Microsoft.Sql/servers/{server-name}/databases/{database-name}/pause?api-version=2014-04-01-preview

Resume compute
https://management.azure.com/subscriptions/{subscription-id}/resourceGroups/{resource-group-name}/providers/Microsoft.Sql/servers/{server-name}/databases/{database-name}/resume?api-version=2014-04-01-preview

Within these URLs you need to replace all parts that start and end with a curly bracket: {subscription-id}, {resource-group-name}, {server-name} and {database-name} (including the brackets themselves). Don't use a URL (bitools.database.windows.net) for the database server name, but use only the name: bitools.

Example URL
https://management.azure.com/subscriptions/aaaa-bbbb-1234-cccc/resourceGroups/RG_Bitools/providers/Microsoft.Sql/servers/bitools2/databases/bitools/resume?api-version=2014-04-01-preview
Example URL

3) Add Web Activity
To call the Rest API we will use the Web Activity in the ADF pipeline. The actual Rest API call is synchronous, which means it waits for the pause or resume action to finish and then returns a message. This also means that you don't have to build any checks to make sure it is already online.
  • Add a Web activity to your pipeline and give it a suitable name
  • Go to the Settings tab and use the URL from the previous step in the URL property
  • Choose POST as method
  • Fill in {} as body (we don't need it, but it is required)
  • Choose MSI as authentication method
  • As the last step enter this URL https://management.azure.com/ as Resource
Use a Web activity to call the Rest API

You could also first check the current status via the Check database state Rest API and then use an expression like @activity('Get Status').output.properties.status to retrieve the current state.
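Outside ADF you can do the same check and pause/resume with the Az.Sql module. A hedged sketch with the example names from the URL above:

# Hypothetical names; log in with Connect-AzAccount first
$db = Get-AzSqlDatabase -ResourceGroupName "RG_Bitools" -ServerName "bitools2" -DatabaseName "bitools"
Write-Output "Current status: $($db.Status)"

if ($db.Status -eq "Paused") {
    Resume-AzSqlDatabase -ResourceGroupName "RG_Bitools" -ServerName "bitools2" -DatabaseName "bitools"
}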

Summary
In this post you learned how to pause and resume your Dedicated SQL Pool to save some money on your Azure bill without writing any code. Note that you only pause the compute and that you still have to pay for the storage of the SQL Pool.

Also note that ADF pipelines slightly differ from Azure Synapse Analytics pipelines. So if you consider switching to Synapse workspaces, because you apparently already use Dedicated SQL Pools, then you have to make some small adjustments to this specific task, which will be described in a next post. In another post we will also show how to scale your Azure SQL Pools from within ADF.


Thursday 24 September 2020

Using global parameters in Azure Data Factory

Case
In a previous blog post we described how to use the Azure Key Vault to parameterize some properties in Azure Data Factory (ADF) pipelines. Although that worked technically very well, it's not always preferable to store non-sensitive data in the Key Vault.

With the arrival of global parameters in Data Factory it's now possible to parameterize a property in ADF pipelines without the use of the Key Vault. You can use such a parameter in all the pipelines in the ADF environment. In this blog post we will explain how to create and use those global parameters.
Data Factory Global parameters

Solution
We are going to create a global parameter to upscale an Azure SQL database with a Web activity via its Rest API. For more in-depth information about up- and downscaling read the blog post: up/down scale Azure DB

1) Creating a global parameter
The following steps describe how to create a global parameter:
  • Go to ‘manage’ in the left menu (Suitcase with wrench icon)
  • Then click on ‘Global parameters’
  • Click on ‘New’ 
  • Now you can fill in the name, type and the value.
    Please note that you cannot use '-' in the parameter name, but you can use '_'. Otherwise you will receive an error like: {"code":"BadRequest","message":"ErrorCode=InvalidTemplate, ErrorMessage=The expression 'pipeline().globalParameters.myparam-dbtest-url' is not valid: the string character 'd' at position '35' is not expected}
  • Click on Save
Global parameters

In this example we used a global parameter to store the Rest API URL of an Azure SQL Database. This URL will be different for each environment of the DTAP street.

Adding Global parameter

2) Using a global parameter
The next step is to use the global parameter in a pipeline. In this example we used the Web activity and the Global parameter will make the URL property dynamic.
  • Go to the pipeline
  • Go to the Web activity
  • Go to the settings tab
  • Click on the URL textbox and then on ‘Add Dynamic content’
  • On the right side scroll down until you see 'Global parameters'. Here you will find all your global parameters; in our example you will see the database URL parameter.
  • Click on the name, and that's it

Using global parameter

Conclusion
In this blog post we showed you how to create and use a global parameter. This parameter can only be used in pipelines; it's not possible to use it in e.g. Linked Services or Triggers. In an upcoming post we will explain how to use these global parameters when deploying ADF to another environment with Azure DevOps.