We are using the Synapse workspace deployment add on for deploying Synapse to the next environment, but when I remove a pipeline/dataset/linked service/trigger in development it doesn't get deleted in Test, Acceptance or Production. There is a pre- and post-deployment script for Data Factory by Microsoft, but where is the Synapse version of it?
Clean Synapse Workspace before deployment |
Update: use DeleteArtifactsNotInTemplate: true in deployment task to avoid powershell
Solution
The deployment add on just deploys a new version over an existing version. Therefore deleted parts will remain in your workspace. For most parts this is ugly and annoying, but if obsoleet triggers are still executing pipelines it could screwup your ETL proces.
Conclusion
In this post you learned how to cleanup Synapse with a little PowerShell script. This scripts works perfectly, but is a litte rough by just deleting all basic parts of your workspace (pipelines, datasets, linked services and triggers). A next / nicer version it will just delete everything that is in the Synapse Workspace but isn't in the repositorty (after the deployment).
The deployment add on just deploys a new version over an existing version. Therefore deleted parts will remain in your workspace. For most parts this is ugly and annoying, but if obsoleet triggers are still executing pipelines it could screwup your ETL proces.
Below you will find a first version of a Powershell script that first removes all pipelines, datasets, linked services and triggers before deploying a new version of your Synapse workspace from the repository.
Note: Pipelines that are called by other pipelines can't be deleted. So you first need to delete the parent pipeline before you can delete the child pipeline. The scripts skips those child pipelines and continous with the rest. After this first delete iteration a lot of parent pipelines wont exist any more and allow you to remove the child pipelines in a second iteration. This is done in a loop and that loops stops after a100 iterations. So don't create a monstrous tree of pipelines calling each other (and especially don't create loops of pipelines calling each other). The same trick is used for Linked Services. If you have for example a Key Vault Linked Service that is used in an other Linked Service then you first need to delete that second Linked Service before you can delete the Key Vault Linked Service.
You need to store this Powershell file in the repository as ClearSynapse.ps1. In this case we created a CICD folder in the root of the repository and within that folder we created a Powershell subfolder for all our PowerShell files. Then you can call this script in your YAML pipeline just before you do the deployment part. Make sure your service connection (Service Principal) has enough rights within the workspace. For the first post we used the Synapse Artifact Publisher role to minimize access. For running this script your Service Principal needs more: Synapse Administrator.
Note: Pipelines that are called by other pipelines can't be deleted. So you first need to delete the parent pipeline before you can delete the child pipeline. The scripts skips those child pipelines and continous with the rest. After this first delete iteration a lot of parent pipelines wont exist any more and allow you to remove the child pipelines in a second iteration. This is done in a loop and that loops stops after a100 iterations. So don't create a monstrous tree of pipelines calling each other (and especially don't create loops of pipelines calling each other). The same trick is used for Linked Services. If you have for example a Key Vault Linked Service that is used in an other Linked Service then you first need to delete that second Linked Service before you can delete the Key Vault Linked Service.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 | param ( [Parameter (Mandatory = $true , HelpMessage = 'Synapse name' )] [ValidateNotNullOrEmpty()] [string] $WorkspaceName , [Parameter (Mandatory = $true , HelpMessage = 'Resourcegroup name' )] [ValidateNotNullOrEmpty()] [string] $ResourceGroupName ) [string] $WorkspaceDefaultSqlServer = "$($WorkspaceName)-WorkspaceDefaultSqlServer" [string] $WorkspaceDefaultSqlStorage = "$($WorkspaceName)-WorkspaceDefaultStorage" ####################################################### # 1) Checking for resource locks and removing them ####################################################### Write -Output "========================================" Write -Output "1) Getting resource locks" # Getting all locks on the Azure Synapse Workspace $lock = Get -AzResourceLock -ResourceGroupName $ResourceGroupName -ResourceName $WorkspaceName -ResourceType "Microsoft.Synapse/workspaces" # Looping through all locks to remove them one by one Write -Output "========================================" Write -Output "Remove resource locks" if( $null -ne $lock ) { $lock | ForEach-Object -process { Write -Output "Removing Lock Id: $($lock.LockId)" # Remove lock Remove -AzResourceLock -LockId $_.LockId -Force } } ####################################################### # 2) Stopping and removing all triggers ####################################################### Write -Output "========================================" Write -Output "2) Remove triggers" # Getting all triggers from Synapse $triggers = Get -AzSynapseTrigger -WorkspaceName $WorkspaceName Write -Output "Found $($triggers.Count) triggers" # Stopping all triggers before deleting them $triggers | ForEach-Object -process { Write -Output "Stopping trigger $($_.name)" try { # Trying to stop each trigger Stop -AzSynapseTrigger -WorkspaceName $WorkspaceName -Name $($_.name) -ErrorAction Stop } catch { if ($_.Exception.Message -eq "{}" ) { Write -Output "Trigger stopped" # $_.Exception } else { Write -Output "Throw" Throw $_ } } # Remove trigger Remove -AzSynapseTrigger -Name $_.name -WorkspaceName $WorkspaceName -Force } ####################################################### # 3) Removing all pipelines ####################################################### Write -Output "========================================" Write -Output "3) Remove pipelines" # Getting all pipelines from Synapse $pipelines = Get -AzSynapsePipeline -WorkspaceName $WorkspaceName | Sort-Object -Property id Write -Output "Found $($pipelines.Count) pipelines" # Trying to delete all pipelines. If a pipeline is still referenced # by an other pipeline it will continue to remove other pipelines # before trying to remove it again... max 100 times. So don't create # chains of pipelines that are too long [int] $depthCount = 0 while ( $pipelines .Count -gt 0 -and $depthCount -lt 100) { Write -Output "$($pipelines.Count) pipelines left" $pipelines | ForEach-Object -process { Write -Output "Trying to delete pipeline $($_.name)" Remove -AzSynapsePipeline -Name $_.name -WorkspaceName $WorkspaceName -Force -ErrorAction SilentlyContinue } Start-Sleep 2 $depthCount += 1 $pipelines = Get -AzSynapsePipeline -WorkspaceName $WorkspaceName } Write -Output "Depthcount: $depthCount" if ( $depthCount -eq 100) { throw "Depthcount is to high!" } ####################################################### # 4) Removing all notebooks ####################################################### Write -Output "========================================" Write -Output "4) Remove notebooks" # Getting all notebooks from Synapse $notebooks = Get -AzSynapseNotebook -WorkspaceName $WorkspaceName Write -Output "Found $($notebooks.Count) notebooks" # Loop through all notebooks to delete them $notebooks | ForEach-Object -process { Write -Output "Deleting notebooks $($_.Name)" Remove -AzSynapseNotebook -Name $($_.Name) -WorkspaceName $WorkspaceName -Force } ####################################################### # 5) Removing all SQL scripts ####################################################### Write -Output "========================================" Write -Output "5) Remove SQL scripts" # Getting all scripts from Synapse $sqlscripts = Get -AzSynapseSqlScript -WorkspaceName $WorkspaceName Write -Output "Found $($sqlscripts.count) SQL-scripts" # Loop through all SQL scripts to delete them $sqlscripts | ForEach-Object -Process { Write -Output "Deleting SQL-script $($_.Name)" Remove -AzSynapseSqlScript -Name $($_.Name) -WorkspaceName $WorkspaceName -Force } ####################################################### # 6) Removing all datasets ####################################################### Write -Output "========================================" Write -Output "6) Remove datasets" # Getting all datasets from Synapse $datasets = Get -AzSynapseDataset -WorkspaceName $WorkspaceName Write -Output "Found $($datasets.Count) datasets" # Loop through all datasets to delete them $datasets | ForEach-Object -process { Write -Output "Deleting dataset $($_.name)" Remove -AzSynapseDataset -Name $_.name -WorkspaceName $WorkspaceName -Force } ####################################################### # 7) Removing all linked services ####################################################### Write -Output "========================================" Write -Output "7) Collecting Linked services" # Getting all linked services from Synapse, except the two default ones $lservices = Get -AzSynapseLinkedService -WorkspaceName $WorkspaceName | Where-Object {($_.Name -ne $WorkspaceDefaultSqlServer -and $_.Name -ne $WorkspaceDefaultSqlStorage ) } Write -Output "Found $($lservices.Count) linked services" # Trying to delete all linked services. If a linked service is still # referenced by an other linked service it will continue to remove # other linked services before trying to remove it again... # max 100 times. Example: KeyVault linked services $depthCount = 0 while ( $lservices .Count -gt 0 -and $depthCount -lt 100) { Write -Output "$($lservices.Count) linked services left" $lservices | ForEach-Object -process { Write -Output "Trying to delete linked service $($_.name)" Remove -AzSynapseLinkedService -Name $_.name -WorkspaceName $WorkspaceName -Force -ErrorAction Continue } Start-Sleep 2 $depthCount += 1 $lservices = Get -AzSynapseLinkedService -WorkspaceName $WorkspaceName | Where-Object {($_.Name -ne $WorkspaceDefaultSqlServer -and $_.Name -ne $WorkspaceDefaultSqlStorage ) } } Write -Output "Depthcount: $depthCount" if ( $depthCount -eq 100) { throw "Depthcount is to high!" } Write -Output "========================================" |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 | ################################### # 4 Cleanup Synapse ################################### - task: AzurePowerShell@5 displayName: '4 Cleanup Synapse' inputs: azureSubscription: ${{ parameters.ServiceConnection }} scriptType: filePath scriptPath: $(Pipeline.Workspace)\s\CICD\Powershell\ClearSynapse.ps1 scriptArguments: -WorkspaceName ${{ parameters.TargetWorkspaceName }} ` -ResourceGroupName ${{ parameters.ResourceGroupName }} ` azurePowerShellVersion: latestVersion pwsh: true ################################### # 5 Validate and Deploy Synapse ################################### - task: Synapse workspace deployment@2 displayName: '5 Validate and deploy Synapse' inputs: operation: validateDeploy ArtifactsFolder: '$(Pipeline.Workspace)/SynapseArtifact' azureSubscription: ${{ parameters.ServiceConnection }} ResourceGroupName: ${{ parameters.ResourceGroupName }} TargetWorkspaceName: ${{ parameters.TargetWorkspaceName }} OverrideArmParameters: ' -LS_AKV_MyKeyVault_properties_typeProperties_baseUrl https://${{ parameters.KeyVaultName }}.vault.azure.net/ ' |
In this post you learned how to cleanup Synapse with a little PowerShell script. This scripts works perfectly, but is a litte rough by just deleting all basic parts of your workspace (pipelines, datasets, linked services and triggers). A next / nicer version it will just delete everything that is in the Synapse Workspace but isn't in the repositorty (after the deployment).
Deleting stuff before deploying new stuff also makes is almost mandatory to use at least 3 environments because when your deployment fails you are left with an almost empty Synapse workspace. So an extra enviroment between development and production will prevent most deployment screwups.
Thank you Walter ter Maten for improving the script with the delete iterations.