Saturday 31 December 2022

Cleanup Synapse before deployment

Case
We are using the Synapse workspace deployment add-on for deploying Synapse to the next environment, but when I remove a pipeline, dataset, linked service or trigger in Development, it doesn't get deleted in Test, Acceptance or Production. Microsoft provides a pre- and post-deployment script for Data Factory, but where is the Synapse version of it?
Clean Synapse Workspace before deployment

Update: newer versions of the deployment task support DeleteArtifactsNotInTemplate: true, which avoids the PowerShell workaround below.
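As a sketch of that update, the option is a task input on the deployment task used later in this post (verify the exact property name against the add-on's documentation for your installed version):

```yaml
- task: Synapse workspace deployment@2
  displayName: 'Validate and deploy Synapse'
  inputs:
    operation: validateDeploy
    ArtifactsFolder: '$(Pipeline.Workspace)/SynapseArtifact'
    azureSubscription: ${{ parameters.ServiceConnection }}
    ResourceGroupName: ${{ parameters.ResourceGroupName }}
    TargetWorkspaceName: ${{ parameters.TargetWorkspaceName }}
    # removes workspace artifacts that are not present in the deployed template
    DeleteArtifactsNotInTemplate: true
```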

Solution
The deployment add-on just deploys a new version over the existing version, so deleted parts remain in your workspace. For most parts this is just ugly and annoying, but if obsolete triggers are still executing pipelines it could screw up your ETL process. 

Below you will find a first version of a PowerShell script that removes all pipelines, datasets, linked services and triggers before deploying a new version of your Synapse workspace from the repository.

Note: pipelines that are called by other pipelines can't be deleted, so you first need to delete the parent pipeline before you can delete the child pipeline. The script skips those child pipelines and continues with the rest. After this first delete iteration a lot of parent pipelines won't exist any more, which allows you to remove the child pipelines in a second iteration. This is done in a loop that stops after 100 iterations, so don't create a monstrous tree of pipelines calling each other (and especially don't create loops of pipelines calling each other). The same trick is used for linked services: if, for example, a Key Vault linked service is used by another linked service, then you first need to delete that second linked service before you can delete the Key Vault linked service.
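The loop logic can be modeled abstractly: each sweep deletes every artifact that is no longer referenced by a remaining artifact, so a dependency chain of depth N empties in N sweeps, while a cycle never empties and trips the iteration cap. A minimal Python model of that idea (hypothetical names, not the actual Az cmdlets):

```python
def delete_all(artifacts, refs, max_iterations=100):
    """Repeatedly sweep over the remaining artifacts, deleting each one
    that is no longer referenced by any other remaining artifact.

    artifacts: names of all artifacts (e.g. pipelines or linked services)
    refs: dict mapping an artifact name to the set of names it references
    Returns the number of sweeps needed; raises when the cap is reached.
    """
    remaining = set(artifacts)
    iterations = 0
    while remaining and iterations < max_iterations:
        for name in sorted(remaining):  # snapshot; deletions happen mid-sweep
            # deletion only succeeds while no other remaining artifact uses it
            in_use = any(name in refs.get(parent, set())
                         for parent in remaining if parent != name)
            if not in_use:
                remaining.discard(name)  # the Remove-AzSynapse* call would go here
        iterations += 1
    if remaining:
        raise RuntimeError("Depth count too high: circular references?")
    return iterations
```

A parent calling a child empties in two sweeps; two pipelines calling each other never empty, which is exactly why the script caps the loop at 100 iterations and throws afterwards.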

param (

   [Parameter (Mandatory = $true, HelpMessage = 'Synapse name')]
   [ValidateNotNullOrEmpty()]
   [string] $WorkspaceName,
   
   [Parameter (Mandatory = $true, HelpMessage = 'Resourcegroup name')]
   [ValidateNotNullOrEmpty()]
   [string] $ResourceGroupName 
)

[string] $WorkspaceDefaultSqlServer = "$($WorkspaceName)-WorkspaceDefaultSqlServer"
[string] $WorkspaceDefaultSqlStorage = "$($WorkspaceName)-WorkspaceDefaultStorage"


#######################################################
# 1) Checking for resource locks and removing them
#######################################################
Write-Output "========================================"
Write-Output "1) Getting resource locks"
# Getting all locks on the Azure Synapse Workspace
$lock = Get-AzResourceLock -ResourceGroupName $ResourceGroupName -ResourceName $WorkspaceName -ResourceType "Microsoft.Synapse/workspaces"

# Looping through all locks to remove them one by one
Write-Output "========================================"
Write-Output "Remove resource locks"
if($null -ne $lock)
{
    $lock | ForEach-Object -Process {
        Write-Output "Removing Lock Id: $($_.LockId)"
        # Remove lock
        Remove-AzResourceLock -LockId $_.LockId -Force
    }
}


#######################################################
# 2) Stopping and removing all triggers
#######################################################
Write-Output "========================================"
Write-Output "2) Remove triggers"
# Getting all triggers from Synapse
$triggers = Get-AzSynapseTrigger -WorkspaceName $WorkspaceName
Write-Output "Found $($triggers.Count) triggers"

# Stopping all triggers before deleting them
$triggers | ForEach-Object -process { 
    Write-Output "Stopping trigger $($_.name)"
    try {
        # Trying to stop each trigger
        Stop-AzSynapseTrigger -WorkspaceName $WorkspaceName -Name $($_.name) -ErrorAction Stop
    }
    catch {
        # A trigger that is already stopped throws an error with an empty
        # "{}" message; treat that as a successful stop
        if ($_.Exception.Message -eq "{}") {
            Write-Output "Trigger stopped"
        }
        else {
            Write-Output "Throw"
            Throw $_
        }
    }
    # Remove trigger
    Remove-AzSynapseTrigger -Name $_.name -WorkspaceName $WorkspaceName -Force
}


#######################################################
# 3) Removing all pipelines
#######################################################
Write-Output "========================================" 
Write-Output "3) Remove pipelines"
# Getting all pipelines from Synapse
$pipelines = Get-AzSynapsePipeline -WorkspaceName $WorkspaceName | Sort-Object -Property id
Write-Output "Found $($pipelines.Count) pipelines"

# Trying to delete all pipelines. If a pipeline is still referenced
# by another pipeline, the script continues removing other pipelines
# before trying to remove it again... max 100 times. So don't create
# chains of pipelines that are too long
[int] $depthCount = 0
while ($pipelines.Count -gt 0 -and $depthCount -lt 100)
{
    Write-Output "$($pipelines.Count) pipelines left"
    $pipelines | ForEach-Object -process { 
        Write-Output "Trying to delete pipeline $($_.name)"
        Remove-AzSynapsePipeline -Name $_.name -WorkspaceName $WorkspaceName -Force -ErrorAction SilentlyContinue
    }
    Start-Sleep 2 
    $depthCount += 1
    $pipelines = Get-AzSynapsePipeline -WorkspaceName $WorkspaceName
}
Write-Output "Depthcount: $depthCount"
if ($depthCount -eq 100)
{
    throw "Depth count is too high!"
}


#######################################################
# 4) Removing all notebooks
#######################################################
Write-Output "========================================"
Write-Output "4) Remove notebooks"
# Getting all notebooks from Synapse
$notebooks = Get-AzSynapseNotebook -WorkspaceName $WorkspaceName
Write-Output "Found $($notebooks.Count) notebooks"

# Loop through all notebooks to delete them
$notebooks | ForEach-Object -process {
    Write-Output "Deleting notebooks $($_.Name)"
    Remove-AzSynapseNotebook -Name $($_.Name) -WorkspaceName $WorkspaceName -Force
}


#######################################################
# 5) Removing all SQL scripts
#######################################################
Write-Output "========================================"
Write-Output "5) Remove SQL scripts"
# Getting all scripts from Synapse
$sqlscripts = Get-AzSynapseSqlScript -WorkspaceName $WorkspaceName
Write-Output "Found $($sqlscripts.count) SQL-scripts"

# Loop through all SQL scripts to delete them
$sqlscripts | ForEach-Object -Process {
    Write-Output "Deleting SQL-script $($_.Name)"
    Remove-AzSynapseSqlScript -Name $($_.Name) -WorkspaceName $WorkspaceName -Force
}


#######################################################
# 6) Removing all datasets
#######################################################
Write-Output "========================================"
Write-Output "6) Remove datasets"
# Getting all datasets from Synapse
$datasets = Get-AzSynapseDataset -WorkspaceName $WorkspaceName
Write-Output "Found $($datasets.Count) datasets"

# Loop through all datasets to delete them
$datasets | ForEach-Object -process { 
    Write-Output "Deleting dataset $($_.name)"
    Remove-AzSynapseDataset -Name $_.name -WorkspaceName $WorkspaceName -Force
}


#######################################################
# 7) Removing all linked services
#######################################################
Write-Output "========================================"
Write-Output "7) Collecting Linked services"
# Getting all linked services from Synapse, except the two default ones
$lservices = Get-AzSynapseLinkedService -WorkspaceName $WorkspaceName | Where-Object {($_.Name -ne $WorkspaceDefaultSqlServer -and  $_.Name -ne $WorkspaceDefaultSqlStorage) } 
Write-Output "Found $($lservices.Count) linked services"

# Trying to delete all linked services. If a linked service is still
# referenced by another linked service, the script continues removing
# other linked services before trying to remove it again...
# max 100 times. Example: Key Vault linked services
$depthCount = 0
while ($lservices.Count -gt 0 -and $depthCount -lt 100)
{
    Write-Output "$($lservices.Count) linked services left"
    $lservices | ForEach-Object -process { 
        Write-Output "Trying to delete linked service $($_.name)"
        Remove-AzSynapseLinkedService -Name $_.name -WorkspaceName $WorkspaceName -Force -ErrorAction Continue
    }

    Start-Sleep 2 
    $depthCount += 1
    $lservices = Get-AzSynapseLinkedService -WorkspaceName $WorkspaceName | Where-Object {($_.Name -ne $WorkspaceDefaultSqlServer -and  $_.Name -ne $WorkspaceDefaultSqlStorage) }
}
Write-Output "Depthcount: $depthCount"
if ($depthCount -eq 100)
{
    throw "Depth count is too high!"
}
Write-Output "========================================"


You need to store this PowerShell file in the repository as ClearSynapse.ps1. In this case we created a CICD folder in the root of the repository and, within that folder, a PowerShell subfolder for all our PowerShell files. You can then call this script in your YAML pipeline just before the deployment part. Make sure your service connection (Service Principal) has enough rights within the workspace. In the first post we used the Synapse Artifact Publisher role to minimize access; for running this script your Service Principal needs more: Synapse Administrator.

            ###################################
            # 4 Cleanup Synapse
            ###################################
            - task: AzurePowerShell@5
              displayName: '4 Cleanup Synapse'
              inputs:
                azureSubscription: ${{ parameters.ServiceConnection }}
                scriptType: filePath
                scriptPath: $(Pipeline.Workspace)\s\CICD\Powershell\ClearSynapse.ps1
                scriptArguments:
                  -WorkspaceName ${{ parameters.TargetWorkspaceName }} `
                  -ResourceGroupName ${{ parameters.ResourceGroupName }}
                azurePowerShellVersion: latestVersion
                pwsh: true
                
            ###################################
            # 5 Validate and Deploy Synapse
            ###################################
            - task: Synapse workspace deployment@2
              displayName: '5 Validate and deploy Synapse'
              inputs:
                operation: validateDeploy
                ArtifactsFolder: '$(Pipeline.Workspace)/SynapseArtifact'
                azureSubscription: ${{ parameters.ServiceConnection }}
                ResourceGroupName: ${{ parameters.ResourceGroupName }}
                TargetWorkspaceName: ${{ parameters.TargetWorkspaceName }}
                OverrideArmParameters: '
                  -LS_AKV_MyKeyVault_properties_typeProperties_baseUrl                  https://${{ parameters.KeyVaultName }}.vault.azure.net/
                  '
Conclusion
In this post you learned how to clean up Synapse with a little PowerShell script. The script works fine, but it is a little rough by just deleting all basic parts of your workspace (pipelines, datasets, linked services and triggers). A next, nicer version would only delete everything that is in the Synapse workspace but isn't in the repository (after the deployment).

Deleting everything before deploying the new version also makes it almost mandatory to use at least three environments, because when your deployment fails you are left with an almost empty Synapse workspace. An extra environment between development and production will prevent most deployment screw-ups.

Thank you Walter ter Maten for improving the script with the delete iterations.

Sunday 18 December 2022

ADF Snack - Global parameters deployment resets Networking settings

Case
After deployment of Azure Data Factory (ADF) via Azure DevOps pipelines, I noticed that the network access setting had been changed from "Private endpoint" to "Public endpoint". How can we prevent this overwrite during deployment?

Azure portal - Networking settings of ADF

Solution
For this we need to use the new mechanism to include global parameters in your ARM template, instead of the old mechanism via the GUI ('Manage' - 'Author' - 'Global Parameters').

ADF portal - Old mechanism to include global parameters

An alternative solution that is part of the old mechanism, this PowerShell script, has already been updated and now includes the following code that is necessary to fix this:
Set-AzDataFactoryV2 -InputObject $dataFactory -Force -PublicNetworkAccess $dataFactory.PublicNetworkAccess
1) Include global parameters in ARM template
The first step is to include the global parameters in a different place than before. Go to "Manage", then "Source control" and click on "ARM template". Check the box to include the parameters.


ADF portal - New mechanism to include global parameters

2) Edit parameter configuration
Next we need to add some additional code to the ARM template parameter definition file, because the location of the global parameters has changed.
"Microsoft.DataFactory/factories/globalparameters": {
     "properties": {
             "*": { 
                 "value": "=" 
             }
     }
},
The file will now look something like this. You can remove the old global parameter configuration "Microsoft.DataFactory/factories" if you want, but for now I keep it to show the difference.

ADF portal - Result of edit parameter configuration

Click here for more information and the current default parameterization template.

3) Change release pipeline
If you are using YAML to publish the changes, the only thing you have to edit is the overrideParameters property of the AzureResourceManagerTemplateDeployment@3 task.

Search for the following part of your global parameters code.
dataFactory_properties_globalParameters_
Replace this with the following code.
default_properties_
This part of your code will now look something like this.
overrideParameters: > 
  -default_properties_GP_AzureSynapseServerName_value $(AzureSqlDedicatedPoolServerName) 
  -default_properties_GP_AzureSynapseDatabaseName_value $(AzureSqlDedicatedPoolName) 
  -default_properties_GP_AzureResourceGroupName_value $(AzureResourceGroupName) 
  -default_properties_GP_AzureSubscriptionId_value $(AzureSubscriptionId) 
  -default_properties_GP_AzureKeyvaultSecretsURL_value $(AzureKeyVaultUrl) 

Conclusion
In this post you learned how to use the new mechanism to include global parameters in your ARM template for Azure Data Factory. Besides resolving the issue where the networking settings were changed after a deployment, it is also a more future-proof solution. For now Microsoft continues to support the old mechanism, but the question (as always) is for how long.