Saturday, 31 December 2022

Cleanup Synapse before deployment

Case
We are using the Synapse workspace deployment add on for deploying Synapse to the next environment, but when I remove a pipeline/dataset/linked service/trigger in development it doesn't get deleted in Test, Acceptance or Production. There is a pre- and post-deployment script for Data Factory by Microsoft, but where is the Synapse version of it?
Release Synapse Workspace via DevOps











Solution
The deployment add on just deploys a new version over an existing version. Therefore deleted parts will remain in your workspace. For most parts this is ugly and annoying, but if obsoleet triggers are still executing pipelines it could screwup your ETL proces. 

Below you will find a first version of a Powershell script that first removes all pipelines, datasets, linked services and triggers before deploying a new version of your Synapse workspace from the repository.

Note: Pipelines that are called by other pipelines can't be deleted. So you first need to delete the parent pipeline before you can delete the child pipeline. The scripts skips those child pipelines and continous with the rest. After this first delete iteration a lot of parent pipelines wont exist any more and allow you to remove the child pipelines in a second iteration. This is done in a loop and that loops stops after a100 iterations. So don't create a monstrous tree of pipelines calling each other (and especially don't create loops of pipelines calling each other). The same trick is used for Linked Services. If you have for example a Key Vault Linked Service that is used in an other Linked Service then you first need to delete that second Linked Service before you can delete the Key Vault Linked Service.

param (

   [Parameter (Mandatory = $true, HelpMessage = 'Synapse name')]
   [ValidateNotNullOrEmpty()]
   [string] $WorkspaceName,
   
   [Parameter (Mandatory = $true, HelpMessage = 'Resourcegroup name')]
   [ValidateNotNullOrEmpty()]
   [string] $ResourceGroupName 
)

[string] $WorkspaceDefaultSqlServer = "$($WorkspaceName)-WorkspaceDefaultSqlServer"
[string] $WorkspaceDefaultSqlStorage = "$($WorkspaceName)-WorkspaceDefaultStorage"


#######################################################
# 1) Checking for resource locks and removing them
#######################################################
Write-Output "========================================"
Write-Output "1) Getting resource locks"
# Getting all locks on the Azure Synapse Workspace
$lock = Get-AzResourceLock -ResourceGroupName $ResourceGroupName -ResourceName $WorkspaceName -ResourceType "Microsoft.Synapse/workspaces"

# Looping through all locks to remove them one by one
Write-Output "========================================"
Write-Output "Remove resource locks"
if($null -ne $lock)
{
    $lock | ForEach-Object -process {
        Write-Output "Removing Lock Id: $($lock.LockId)"
        # Remove lock
        Remove-AzResourceLock -LockId $_.LockId -Force
    }
}


#######################################################
# 2) Stopping and removing all triggers
#######################################################
Write-Output "========================================"
Write-Output "2) Remove triggers"
# Getting all triggers from Synapse
$triggers = Get-AzSynapseTrigger -WorkspaceName $WorkspaceName
Write-Output "Found $($triggers.Count) triggers"

# Stopping all triggers before deleting them
$triggers | ForEach-Object -process { 
    Write-Output "Stopping trigger $($_.name)"
    try {
        # Trying to stop each trigger
        Stop-AzSynapseTrigger -WorkspaceName $WorkspaceName -Name $($_.name) -ErrorAction Stop
    }
    catch {
        if ($_.Exception.Message -eq "{}") {
            Write-Output "Trigger stopped"
           # $_.Exception
        }
        else {
            Write-Output "Throw"
            Throw $_
        }
    }
    # Remove trigger
    Remove-AzSynapseTrigger -Name $_.name -WorkspaceName $WorkspaceName -Force
}


#######################################################
# 3) Removing all pipelines
#######################################################
Write-Output "========================================" 
Write-Output "3) Remove pipelines"
# Getting all pipelines from Synapse
$pipelines = Get-AzSynapsePipeline -WorkspaceName $WorkspaceName | Sort-Object -Property id
Write-Output "Found $($pipelines.Count) pipelines"

# Trying to delete all pipelines. If a pipeline is still referenced
# by an other pipeline it will continue to remove other pipelines 
# before trying to remove it again... max 100 times. So don't create
# chains of pipelines that are too long
[int] $depthCount = 0
while ($pipelines.Count -gt 0 -and $depthCount -lt 100)
{
    Write-Output "$($pipelines.Count) pipelines left"
    $pipelines | ForEach-Object -process { 
        Write-Output "Trying to delete pipeline $($_.name)"
        Remove-AzSynapsePipeline -Name $_.name -WorkspaceName $WorkspaceName -Force -ErrorAction SilentlyContinue
    }
    Start-Sleep 2 
    $depthCount += 1
    $pipelines = Get-AzSynapsePipeline -WorkspaceName $WorkspaceName
}
Write-Output "Depthcount: $depthCount"
if ($depthCount -eq 100)
{
    throw "Depthcount is to high!"
}


#######################################################
# 4) Removing all notebooks
#######################################################
Write-Output "========================================"
Write-Output "4) Remove notebooks"
# Getting all notebooks from Synapse
$notebooks = Get-AzSynapseNotebook -WorkspaceName $WorkspaceName
Write-Output "Found $($notebooks.Count) notebooks"

# Loop through all notebooks to delete them
$notebooks | ForEach-Object -process {
    Write-Output "Deleting notebooks $($_.Name)"
    Remove-AzSynapseNotebook -Name $($_.Name) -WorkspaceName $WorkspaceName -Force
}


#######################################################
# 5) Removing all SQL scripts
#######################################################
Write-Output "========================================"
Write-Output "5) Remove SQL scripts"
# Getting all scripts from Synapse
$sqlscripts = Get-AzSynapseSqlScript -WorkspaceName $WorkspaceName
Write-Output "Found $($sqlscripts.count) SQL-scripts"

# Loop through all SQL scripts to delete them
$sqlscripts | ForEach-Object -Process {
    Write-Output "Deleting SQL-script $($_.Name)"
    Remove-AzSynapseSqlScript -Name $($_.Name) -WorkspaceName $WorkspaceName -Force
}


#######################################################
# 6) Removing all datasets
#######################################################
Write-Output "========================================"
Write-Output "6) Remove datasets"
# Getting all datasets from Synapse
$datasets = Get-AzSynapseDataset -WorkspaceName $WorkspaceName
Write-Output "Found $($datasets.Count) datasets"

# Loop through all datasets to delete them
$datasets | ForEach-Object -process { 
    Write-Output "Deleting dataset $($_.name)"
    Remove-AzSynapseDataset -Name $_.name -WorkspaceName $WorkspaceName -Force
}


#######################################################
# 7) Removing all linked services
#######################################################
Write-Output "========================================"
Write-Output "7) Collecting Linked services"
# Getting all linked services from Synapse, except the two default ones
$lservices = Get-AzSynapseLinkedService -WorkspaceName $WorkspaceName | Where-Object {($_.Name -ne $WorkspaceDefaultSqlServer -and  $_.Name -ne $WorkspaceDefaultSqlStorage) } 
Write-Output "Found $($lservices.Count) linked services"

# Trying to delete all linked services. If a linked service is still
# referenced by an other linked service it will continue to remove 
# other linked services before trying to remove it again... 
# max 100 times. Example: KeyVault linked services
$depthCount = 0
while ($lservices.Count -gt 0 -and $depthCount -lt 100)
{
    Write-Output "$($lservices.Count) linked services left"
    $lservices | ForEach-Object -process { 
        Write-Output "Trying to delete linked service $($_.name)"
        Remove-AzSynapseLinkedService -Name $_.name -WorkspaceName $WorkspaceName -Force -ErrorAction Continue
    }

    Start-Sleep 2 
    $depthCount += 1
    $lservices = Get-AzSynapseLinkedService -WorkspaceName $WorkspaceName | Where-Object {($_.Name -ne $WorkspaceDefaultSqlServer -and  $_.Name -ne $WorkspaceDefaultSqlStorage) }
}
Write-Output "Depthcount: $depthCount"
if ($depthCount -eq 100)
{
    throw "Depthcount is to high!"
}
Write-Output "========================================"


You need to store this Powershell file in the repository as ClearSynapse.ps1. In this case we created a CICD folder in the root of the repository and within that folder we created a Powershell subfolder for all our PowerShell files. Then you can call this script in your YAML pipeline just before you do the deployment part. Make sure your service connection (Service Principal) has enough rights within the workspace. For the first post we used the Synapse Artifact Publisher role to minimize access. For running this script your Service Principal needs more: Synapse Administrator.
           ###################################
            # 4 Cleanup Synapse
            ###################################
            - task: AzurePowerShell@5
              displayName: '4 Cleanup Synapse'
              inputs:
                azureSubscription: ${{ parameters.ServiceConnection }}
                scriptType: filePath
                scriptPath: $(Pipeline.Workspace)\s\CICD\Powershell\ClearSynapse.ps1
                scriptArguments:
                  -WorkspaceName ${{ parameters.TargetWorkspaceName }} `
                  -ResourceGroupName ${{ parameters.ResourceGroupName }} `
                azurePowerShellVersion: latestVersion
                pwsh: true
                
           ###################################
            # 5 Validate and Deploy Synapse
            ###################################
            - task: Synapse workspace deployment@2
              displayName: '5 Validate and deploy Synapse'
              inputs:
                operation: validateDeploy
                ArtifactsFolder: '$(Pipeline.Workspace)/SynapseArtifact'
                azureSubscription: ${{ parameters.ServiceConnection }}
                ResourceGroupName: ${{ parameters.ResourceGroupName }}
                TargetWorkspaceName: ${{ parameters.TargetWorkspaceName }}
                OverrideArmParameters: '
                  -LS_AKV_MyKeyVault_properties_typeProperties_baseUrl                  https://${{ parameters.KeyVaultName }}.vault.azure.net/
                  '
Conclusion
In this post you learned how to cleanup Synapse with a little PowerShell script. This scripts works perfectly, but is a litte rough by just deleting all basic parts of your workspace (pipelines, datasets, linked services and triggers). A next / nicer version it will just delete everything that is in the Synapse Workspace but isn't in the repositorty (after the deployment).

Deleting stuff before deploying new stuff also makes is almost mandatory to use at least 3 environments because when your deployment fails you are left with an almost empty Synapse workspace. So an extra enviroment between development and production will prevent most deployment screwups.

Thank you Walter ter Maten for improving the script with the delete iterations.

1 comment:

  1. Note there is a bug when using private endpoints in Synapse. Get-AzSynapsePipeline : The requested operation requires an element of type 'String', but the target element has type 'Object'.

    ReplyDelete

All comments will be verified first to avoid URL spammers