Microsoft BI Tools: POWERSHELL

Showing posts with label POWERSHELL. Show all posts

Tuesday, 29 October 2024

Synapse - Execute pipeline after deployment

Case
I want to execute a Synapse pipeline right after the deployment via Azure DevOps. How do you accomplish that?

Execute pipeline after deployment

Solution
We can use a PowerShell task and then use that to execute a Create Pipeline Run Rest API. This will be an asynchronous execution (shoot and forget). If you also want to know the result of the pipeline then you have to create an additional WHILE or UNTIL loop and use the Get Pipeline Run Rest API. Note that you will then keep your DevOps agent busy while waiting.

1) Create Synapse pipeline
First create a pipeline in Synapse and name it for example pl_Master_Deployment. In this pipeline you can add all the activities for tasks that you want to do right after the deployment. For testing purposes you can add a simple wait activity.

2) Add PowerShell script
To execute the Synapse pipeline we will use a PowerShell script and a Rest API call to Pipeline - Create Pipeline Run. This Rest API will only start the pipeline but it doesn't wait for it to finish. So it's an asynchronously call. The PowerShell script is stored in the repository as a PowerShell file called SynapseExecutePipeline.ps1 and we will call it via the YAML file using an AzurePowerShell@5 task.

The script has only two parameters. First the Synapse WorkspaceName so you can use it to execute piplines in all environments of your DTAP. The second is the name of the pipeline.

If your pipeline name contains a space or an other 'forbidden' character then the EscapeDataString method will make sure it fits in a URL.

<#
    .SYNOPSIS
    This script executes a Synapse pipeline during the CICD process asynchronously. This means it doesn't wait for it to succeed (or fail).
    .PARAMETER WorkspaceName
    Name of the Synapse Workspace where you want to execute a pipeline.
    .PARAMETER PipelineName
    The name of the synapse pipeline that you want to execute.
    .EXAMPLE
    SynapseExecutePipeline.ps1 -SynapseExecutePipeline MySynapse `
                               -PipelineName MyPipeline
#>
Param(
   [Parameter(Mandatory=$true,
   HelpMessage="Name of your Synapse workspace.")]
   [ValidateNotNullOrEmpty()]
   [Alias("SynapseWorkspaceName")]
   [string]
   $WorkspaceName,

   [Parameter(Mandatory=$true,
   HelpMessage="Name of the pipeline that you want to execute")]
   [ValidateNotNullOrEmpty()]
   [Alias("SynapsePipelineName")]
   [string]
   $PipelineName
)

# Make sure that characters like spaces are encode for the URL
$EncodePipelineName = [System.Uri]::EscapeDataString("$($PipelineName)")

# Determine the URI of the REST API
$RestApiURL = "https://$($WorkspaceName).dev.azuresynapse.net/pipelines/$($EncodePipelineName)/createRun?api-version=2020-12-01"

Write-Host "Executing $($PipelineName) asynchronously in the Synapse workspace $($WorkspaceName)."
Write-Host "Rest API uri $($RestApiURL)"

# Create a dummy json because POST needs one. However not needed at the moment because there are no parameters
# Adjust script en json if you need paramters: https://learn.microsoft.com/en-us/rest/api/synapse/data-plane/pipeline/create-pipeline-run?view=rest-synapse-data-plane-2020-12-01&tabs=HTTP
$payload = @{dummy="dummy"} | ConvertTo-Json -Depth 3
$Result = Invoke-AzRestMethod -Method POST -Uri $RestApiURL -Payload $payload
if ($Result.StatusCode -lt 300)
{
   Write-Host "Asynchronous call of $($PipelineName) is done"
}
else
{
   Write-Error "##vso[task.logissue type=error]Something went very wrong. Pipeline not found or no access?"
   exit 1
}

3) Add YAML task

In your deployment YAML we need to add an additional task. Put it somewhere after the Synapse workspace deployment@2 task or after the toggle-triggers-dev@2 task that will enable your triggers.

For the Azure Subscription parameter you can reuse the same Azure DevOps Service Connection as in your Synapse deployment task. If your Service Principal only got the role Synapse Artifact Publisher, then you need to give it an addtional role that also allows you to execute pipelines. For example Synapse Credential User or Synapse Administrator.

The WorkspaceName parameter can also reuse the value of the Synapse deployment task, but the PipelineName is hardcoded in this example. However you could ofcourse create a variable and/or parameter for this to make it easier to adjust.

            ###################################
            # Validate and Deploy Synapse
            ###################################
            - task: Synapse workspace deployment@2
              displayName: 'Validate and Deploy Synapse'
              inputs:
                operation: validateDeploy
                ArtifactsFolder: '$(Pipeline.Workspace)/SynapseArtifact'
                azureSubscription: ${{ parameters.ServiceConnection }}
                ResourceGroupName: ${{ parameters.Synapse_ResourceGroupName }}
                TargetWorkspaceName: ${{ parameters.Synapse_WorkspaceName }}
                DeleteArtifactsNotInTemplate: true
                DeployManagedPrivateEndpoints: true
                OverrideArmParameters: '
				  -LS_AKV_Secrets_properties_typeProperties_baseUrl				https://${{ parameters.KeyVault_Name }}.vault.azure.net/
				  -LS_ADLS_Datalake_properties_typeProperties_url				https://${{ parameters.Datalake_Name }}.dfs.core.windows.net/
				  -LS_ASQL_Metadata_connectionString							${{ parameters.Metadata_Connectionstring }}
                  '

###################################
# Execute Synapse pipeline
###################################
- task: AzurePowerShell@5
  displayName:  'Execute Synapse pipeline'
  inputs:
	azureSubscription: ${{ parameters.ServiceConnection }}
	scriptType: filePath
	scriptPath: $(Pipeline.Workspace)\s\CICD\PowerShell\SynapseExecutePipeline.ps1
	scriptArguments:
	  -WorkspaceName ${{ parameters.Synapse_WorkspaceName }} `
	  -PipelineName "pl_Master_Deployment"
	azurePowerShellVersion: latestVersion
	pwsh: true

Conclusion

In this short example we showed how to execute a Synapse pipeline right after deployment. You can use it for example to do some cleanup or setup in your datalake or lake database. Or use it to run your ETL for testing purposes.

And instead of running the pipeline immediately you could also create a (onetime) trigger and schedule it for the next evening which could be useful if you have multiple deployments during the day to the same environment.

Tuesday, 1 October 2024

Synapse - Invalid notebook reference

Case
During the deployment of Synapse I'm getting an invalid referene error for my notebooks. They are referencing Spark pools that only exists in the Development workspace.

The document creation or update failed because of invalid reference 'SparkPoolJoost' 
An error occurred during execution: Error: Failed to fetch the deployment status 
{"code":"400","message":"CreateOrUpdateNotebook failed: [statusCode from ADF:BadRequest, 
ErrorMessage: {\"code\":\"BadRequest\",\"message\":\"The document creation or update failed because of invalid reference 'bitools'.\",
\"target\":\"/subscriptions/aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee/resourceGroups/rg-bitools/providers/Microsoft.Synapse/workspaces/bitools-tst/notebooks/myNotebook\",
\"details\":null,\"error\":null}, workspace: yp-tst, notebook: myNotebook, ArtifactId: a4581d64-96d3-4041-9ac4-ccc0d7235cc4]"}

Solution
This deployment error often happens when you don't have the same set of Spark pools in each environment of your DTAP. If one of your notebooks is still referencing a Spark Pool that doesn't exists in the target workspace then the deploymenty proces will throw an invalid reference error.

We often have one general Spark Pool doing all the work and that is available in each Synapse workspace, but also have a whole bunch of Spark Pools in development letting multiple developers do some work without getting in each other's way. Nothing is more enoying then waiting for a colleague to release the spark nodes.

This deployment error can easily be solved by changing the attached Spark Pool in the notebook before you start deploying. This is ofcourse taking a lot of extra time (and patience) each time you forget to select the right Spark Pool that exists in all workspaces.

Attached Spark Pool

You can make it a little less enoying by adding an override to the YAML task Synapse workspace deployment@2 for the notebook property bigDataPool referenceName. You can even do this manually for a couple of notebooks, but the more notebooks you have the more enoying it gets.

###################################
# Validate and Deploy Synapse
###################################
- task: Synapse workspace deployment@2
  displayName: 'Validate and Deploy Synapse'
  inputs:
	operation: validateDeploy
	ArtifactsFolder: '$(Pipeline.Workspace)/SynapseArtifact'
	azureSubscription: ${{ parameters.ServiceConnection }}
	ResourceGroupName: ${{ parameters.Synapse_ResourceGroupName }}
	TargetWorkspaceName: ${{ parameters.Synapse_WorkspaceName }}
	DeleteArtifactsNotInTemplate: true
	DeployManagedPrivateEndpoints: true
	OverrideArmParameters: '
	  -LS_AKV_Secrets_properties_typeProperties_baseUrl				https://${{ parameters.KeyVault_Name }}.vault.azure.net/
	  -LS_ADLS_Datalake_properties_typeProperties_url				https://${{ parameters.Datalake_Name }}.dfs.core.windows.net/
	  -LS_ASQL_Metadata_connectionString							${{ parameters.Metadata_Connectionstring }}
	  -NB_myFirstNotebook_properties_bigDataPool_referenceName		${{ parameters.Synapse_SparkpoolName }}
	  -NB_mySecondNotebook_properties_bigDataPool_referenceName		${{ parameters.Synapse_SparkpoolName }}
	  -NB_myThirdNotebook_properties_bigDataPool_referenceName		${{ parameters.Synapse_SparkpoolName }}
	'

You can make it even easier by generating an override for each notebook in your Synapse workspace. For this we need these three steps:

1) Create variable
First create a YAML variable in your pipeline. For this example we used the name OverrideParams. The value is just an empty string.

jobs:
    - deployment: DeploymentJob${{ parameters.Env }}
      displayName: Deployment Job ${{ parameters.Env }} 
      environment: Deploy-to-${{ parameters.Env }}
      variables: 
      - name: OverrideParams
        value: ""

      strategy:
        runOnce:
          deploy:
            steps:

2) Add PowerShell task
Next is adding a PowerShell task that will be filling the above variable with an override list. The PowerShell loops through all your notebooks in the artifact. It creates one override for each notebook and stores it in a PowerShell string variable. The last step is to use the PowerShell variable to fill the YAML variable.

            ##################################
            # Edit Notebook sparkpool reference
            ##################################
            - powershell: |
                # Determine notebook subfolder in synapse artifact
                $Path = Join-Path -Path "$(Pipeline.Workspace)" -ChildPath "SynapseArtifact\notebook\"

                # Get all notebook files
                $notebooks = Get-ChildItem -Path $Path

				# Create string variable for all overrides
                [string]$overridelist = ""

                # Loop through notebook files
                foreach ($notebook in $notebooks)
                {
				  # Generate an override for each notebook, make sure to end
				  # with a space to separate each override. Don't use a line
				  # feed or carriage return, because the value should be 1 line
                  $overridelist += "-$($notebook.Basename)_properties_bigDataPool_referenceName   mysparkpool "
                }
				# Show list for debug purposes
                Write-Host "overridelist:`r`n$($overridelist)"
				# Fill the YAML variable value with the value of the PowerShell variable
                Write-Host "##vso[task.setvariable variable=OverrideParams;]$overridelist"
              displayName: 'Edit Notebook sparkpool reference'

Note if you are using the template files then you have to create an alternative loop

3) Use YAML variable in Override
Last step is to add the YAML variable in the OverrideArmParameters part. Now run your deployment and see your invalid reference errors disappear!

            ###################################
            # Validate and Deploy Synapse
            ###################################
            - task: Synapse workspace deployment@2
              displayName: 'Validate and Deploy Synapse'
              inputs:
                operation: validateDeploy
                ArtifactsFolder: '$(Pipeline.Workspace)/SynapseArtifact'
                azureSubscription: ${{ parameters.ServiceConnection }}
                ResourceGroupName: ${{ parameters.Synapse_ResourceGroupName }}
                TargetWorkspaceName: ${{ parameters.Synapse_WorkspaceName }}
                DeleteArtifactsNotInTemplate: true
                DeployManagedPrivateEndpoints: true
                OverrideArmParameters: '
				  -LS_AKV_Secrets_properties_typeProperties_baseUrl				https://${{ parameters.KeyVault_Name }}.vault.azure.net/
				  -LS_ADLS_Datalake_properties_typeProperties_url				https://${{ parameters.Datalake_Name }}.dfs.core.windows.net/
				  -LS_ASQL_Metadata_connectionString							${{ parameters.Metadata_Connectionstring }}
                  $(OverrideParams)
                  '

Conclusion
This solution uses a little, fairly simple PowerShell script so solve all your invalid reference errors during deployment. It doesn't change the Spark Pool in the Notebook Activity, but just the default Spark Pool in the notebook itself. So you can still have multiple Spark Pools for various jobs if you set it in the Notebook Activity.

As mentioned before, if you use the template files TemplateForWorkspace.json and TemplateParametersForWorkspace.json for the deployment then you have to retrieve the JSON objects for the notebooks in those files and create a similar loop as in step 2.

Monday, 1 January 2024

Toggle Azure Synapse Triggers during deployment

Case
To turn of the Synapse triggers before deployment and turning them back of after deployment, we used a PowerShell script for ADF that we rewritten for Synapse. However now there is a new activity available that avoids the use of scripting, but it is a bit hidden if you mainly use YAML for pipelines.

Azure Synapse Toggle Triggers Dev (Preview)

Solution
If you go the release pipelines (not YAML, but the visual variant) and under add task search for Synapse then you will find the Synapse workspace deployment activity, but also the Azure Synapse Toggle Triggers Dev. It's already out there for a couple months but still preview (probably to busy with Fabric at the moment). For these tasks you need to add the Synapse addon to Azure DevOps, but if you are already using the deployment task then you already have it. The postfix of the taskname is Dev, so it is to be expected that they will remove it once it is General Available.

Synapse Tasks

Since we have a preference for YAML we need to know what the task name is and which properties are available. There is no YAML documentation available, but it you fill in the form in the release pipeline then you can view the YAML code. And by clicking on the circled i icon you will get some more info about the properties.

View YAML code

First the YAML code for toggling off all triggers which you need to do before deploying a new version of Synapse. With and without parameters.

steps:
- task: AzureSynapseWorkspace.synapsecicd-deploy.toggle-trigger.toggle-triggers-dev@2
  displayName: 'Toggle Azure Synapse Triggers'
  inputs:
    azureSubscription: SP-synw-d
    ResourceGroupName: 'dwh-synw-d-we-001'
    WorkspaceName: 'dwh-synw-d-we-001'
    ToggleOn: false
    Triggers: '*'
    
    
- task: AzureSynapseWorkspace.synapsecicd-deploy.toggle-trigger.toggle-triggers-dev@2
  displayName: 'Toggle Azure Synapse Triggers'
  inputs:
    azureSubscription: ${{ parameters.SerCon }}
    ResourceGroupName: ${{ parameters.Synapse_ResourceGroupName }}
    WorkspaceName: ${{ parameters.Synapse_WorkspaceName }}
    ToggleOn: false
    Triggers: '*'

Toggle all trigger OFF

Then the same code but now for enabling certain triggers. Again with and without parameters.

steps:
- task: toggle-triggers-dev@2
  displayName: 'Toggle Azure Synapse Triggers'
  inputs:
    azureSubscription: SP-synw-d
    ResourceGroupName: 'dwh-synw-d-we-001'
    WorkspaceName: 'dwh-synw-d-we-001'
    ToggleOn: true
    Triggers: 'TR_DAILY_RUN_DEV,TR_DAILY_DOWNSCALE_GEN'


- task: toggle-triggers-dev@2
  displayName: 'Toggle Azure Synapse Triggers'
  inputs:
    azureSubscription: ${{ parameters.SerCon }}
    ResourceGroupName: ${{ parameters.Synapse_ResourceGroupName }}
    WorkspaceName: ${{ parameters.Synapse_WorkspaceName }}
    ToggleOn: true
    Triggers: '${{ parameters.Synapse_EnableTriggers }}'

Toggle specific triggers ON

You can remove the long prefix in the task name and just keep toggle-triggers-dev@2. Property ToggleOn set to 'false' means stop the triggers and 'true' means start the mentioned triggers. The triggers property shoud contain a '*' to stop/start everything, but you probably only want to enable certain triggers on each environment. In that case you can use 'trigger1,trigger2,trigger3' to do that, but without spaces around each comma. With extra spaces you will get an error that the trigger is not found. In the error message you can see the extra space in front of the trigger name.

##[error]Refer to above logs for more details:
The Trigger TR_DAILY_DOWNSCALE_GEN was not found

An other issue is that the Triggers property must be filled. If you have one environment where you don't want to start any of the triggers then you will get an error saying that the property is required. I would rather see a warning instead or an other user friendly solution that works with some of the YAML shortcomings.

##[error]Unhandled: Input required: Triggers

You can solve this by adding a YAML condition to the task where you check whether the list of triggers is empty. If it is, the task will be skipped. Unfortunately you cannot use a parameter in a condition. The workaround is to read directly from a variable group:

- task: toggle-triggers-dev@2
  displayName: 'Toggle Azure Synapse Triggers'
  condition: ne(replace(variables.Synapse_EnableTriggers, ' ', ''), '')
  inputs:
    azureSubscription: ${{ parameters.SerCon }}
    ResourceGroupName: ${{ parameters.Synapse_ResourceGroupName }}
    WorkspaceName: ${{ parameters.Synapse_WorkspaceName }}
    ToggleOn: true
    Triggers: '${{ parameters.Synapse_EnableTriggers }}'

Alternatives

If you don't like that workaround there are two alternatives until Microsoft fixes the issue. You could use the OverrideParameters option to override the endTime property of a trigger during the Synapse deployment as showed here for Data Factory or use a PowerShell activity to enable the triggers just like for ADF. For this PowerShell option you need to create a PowerShell file in the repository under \CICD\PowerShell with the name SetTriggers.ps1 and the following code (Synapse version):

param
(
    [parameter(Mandatory = $true)] [String] $WorkspaceName,
    [parameter(Mandatory = $true)] [String] $ResourceGroupName,
    [parameter(Mandatory = $true)] [string] $EnableTriggers,
    [parameter(Mandatory = $false)] [Bool] $DisableAllTriggers = $true
)

Write-Host "Checking existance Resource Group [$($ResourceGroupName)]..."
Get-AzResourceGroup -Name $ResourceGroupName > $null
Write-Host "- Resource Group [$($ResourceGroupName)] found."

Write-Host "Checking existance Synapse Workspace [$($WorkspaceName)]..."
Get-AzSynapseWorkspace -ResourceGroupName $ResourceGroupName `
              -Name $WorkspaceName > $null
Write-Host "- Synapse Workspace [$($WorkspaceName)] found."

#Getting triggers
Write-Host "Looking for triggers..."
$Triggers = Get-AzSynapseTrigger -WorkspaceName $WorkspaceName
Write-Host "Found [$($Triggers.Count)] trigger(s)"

# Checking provided triggernames, first split into array
$EnabledTriggersArray = $EnableTriggers.Split(",")
Write-Host "Checking existance of ($($EnabledTriggersArray.Count)) provided triggernames."
# Loop through all provided triggernames
foreach ($EnabledTrigger in $EnabledTriggersArray)
{ 
    # Get Trigger by name
    $CheckTrigger = Get-AzSynapseTrigger    -WorkspaceName $WorkspaceName `
                                            -Name $EnabledTrigger `
                                            -ErrorAction Ignore # To be able to provide more detailed error

    # Check if trigger was found
    if (!$CheckTrigger)
    {
        throw "Trigger $($EnabledTrigger) not found in Synapse Workspace $($WorkspaceName) within resource group $($ResourceGroupName)"
    }
}
Write-Host "- All ($($EnabledTriggersArray.Count)) provided triggernames found."

##############################################
# Disable triggers
##############################################
# Check if all trigger should be disabled
if ($DisableAllTriggers)
{
   # Get all enabled triggers and stop them (unless they should be enabled)
   Write-Host "Getting all enabled triggers that should be disabled."
   $CurrentTriggers = Get-AzSynapseTrigger    -WorkspaceName $WorkspaceName `
                      | Where-Object {$_.RuntimeState -ne 'Stopped'} `
                      | Where-Object {$EnabledTriggersArray.Contains($_.Name) -eq $false}

   # Loop through all found triggers
   Write-Host "- Number of triggers to disable: $($CurrentTriggers.Count)."
   foreach ($CurrentTrigger in $CurrentTriggers)
   {
        # Stop trigger
        Write-Host "- Stopping trigger [$($CurrentTrigger.Name)]."
        try {
            Stop-AzSynapseTrigger -WorkspaceName $WorkspaceName -Name $CurrentTrigger.Name > $null
        } catch {
            Write-Host "error code 1, but disabling trigger that already is disabled"
        }
   }
}

##############################################
# Enable triggers
##############################################
# Loop through provided triggernames and enable them
Write-Host "Enable all ($($EnabledTriggersArray.Count)) provided triggers."

foreach ($EnabledTrigger in $EnabledTriggersArray)
{                   
    # Get trigger details
    $CheckTrigger = Get-AzSynapseTrigger     -WorkspaceName $WorkspaceName `
                                               -Name $EnabledTrigger

    # Check status of trigger
    if ($CheckTrigger.RuntimeState -ne "Started")
    {
        try {
            Write-Host "- Trigger [$($EnabledTrigger)] starting. This will take only a few seconds..."
            Start-AzSynapseTrigger      -WorkspaceName $WorkspaceName `
                                        -Name $EnabledTrigger
        } catch {
            Throw "Error enabling trigger '$EnabledTrigger': $Error[0].Message"
            exit 1
        }
    }
    else
    {
        Write-Host "- Trigger [$($EnabledTrigger)] already started"
    }
}

Then the YAML Code for after the deployment step:

- task: AzurePowerShell@5
  displayName: 'Enabling triggers per environment'
  inputs:
    azureSubscription: ${{ parameters.SerCon }}
    scriptType: filePath
    scriptPath: $(Pipeline.Workspace)\s\CICD\PowerShell\Set_Triggers.ps1
    scriptArguments:
      -WorkspaceName ${{ parameters.Synapse_WorkspaceName }} `
      -ResourceGroupName ${{ parameters.Synapse_ResourceGroupName }} `
      -EnableTriggers "${{ parameters.Synapse_EnableTriggers }}"
    azurePowerShellVersion: latestVersion
    pwsh: true

Conclusion

Microsoft is introducing a very promising new DevOps task for Synapse that will make you CICD task much easier. Hopefully it will be G.A. soon and the small bugs will be solved. In the mean while you could use one of the alternatives if you don't want to use a preview version of the task.

An other solution for Microsoft could be to just add these options in the current deployment task since you always need to stop all triggers first and than start some of the triggers once the new version has been deployed. Is there a good reason to have these as two separate tasks in DevOps? Let it know in the comments below.

ADDITION
The new task toggle-triggers-dev@2 does not yet work with the new type of Azure Service Connection with Workload Identity federation. This will result in the following error: {"code":"U002","message":"The service connection authScheme WorkloadIdentityFederation is not supported by this task"}.

The service connection authScheme
WorkloadIdentityFederation is not supported by this task

Friday, 5 May 2023

Synapse - Automatically check naming conventions

Case
We use naming conventions in Synapse, but sometimes it's just a lot of work to check if everything is correct. Is there a way to automatically check those naming conventions?

Automatically check Naming Conventions

Solution

You can use a PowerShell script to loop through the JSON files from Synapse that are stored in the repository. Prefixes for Linked Services, Datasets, Pipelines, Notebooks, Dataflows and scripts are easy to check since that is just a case of checking the filename of the JSON file. If you also want to check activities or use the Linked Service or Dataset type to check the naming conventions then you also need to check the contents of those JSON files.

There are a lot of different naming conventions, but as long as you are committed to use one, it will make your Synapse workspace more readable and your logs easier to understand. In one glance you will immediately see which part of Synapse is causing the error. Above all it looks more professional and it show you took the extra effort to make it better.

However, having a lot of different naming conventions also makes it is hard to create one script to rule them all. We created a script to check the prefixes of all differents parts of Synapse and you can configure them in a JSON file. You can use this script to either run it once a while to occasionally check your workspace or run it as a Validation step for a pull request in Azure DevOps. Then it acts like a gatekeep that doesn't allow bad named items. You can make it a required step then you first have to solve the issues or make it an optional check and then you will get the result, but you can choose to ignore it. For new projects you should make it required and for big existing projects you should probably first make it optional for a while and then change it to required.

The PowerShell script, the YAML file and the JSON config example are stored in a public GitHub site. This allows us to easily improve the code for you and keep this blog post up-to-date. It also allows you to help out by doing suggestions or even to write some better code.

1) Folder structure repository

Just like the Synapse Deployment scripts we store these validation files in the Synapse respository. We have CICD and a SYN folder in the root. SYN contains the JSON files from the Synapse workspace. The CICD folder had three sub folders: JSON (for the config), PowerShell (for the actual script) and YAML (for the pipeline that is required for validation).

Download the files from the Github Repository and store these in your own Repository according the structure described above. If you use a different structure you have to change the paths in the YAML file.

Repository structure

2) Create pipeline

Now create a new pipeline with the existing YAML file called NamingValidation4Synapse.yml. Make sure the paths in the YAML are following the folder structure from step 1 or change it to your own structure.

If you are not sure about the folders then there is a treeview step in the YAML that will show you the structure of the agent. Just continue with the next steps and after the first run check the result of the treeview step and change the paths in the YAML and run again. You can remove or comment-out the treeview step when everything works.

This example uses the Azure DevOps respository with the following steps:

Go to pipelines and create a new pipeline
Select the Azure Repos Git
Select the Synapse repository
Choose Existing Azure Pipelines YAML file
Choose the right branch and select the YAML file under path
Save it and rename it because the default name is equals to the repos name

Create new pipeline with existing YAML file

3) Branch validation

Now that we have the new YAML pipeline, we can use it as a Build Validation in the branch policies. The example shows how to add them in an Azure DevOps repository.

In DevOps go to Repos in the left menu.
Then click branches to get all branches.
Now hover you mouse above the first branch and click on the 3 vertical dots.
Click Branch policies
Click on the + button in the Build Validation section.
Select the new pipeline created in step 2 (optionally change the Display name)
Choose the Policy requirement (Required or Optional)
Click on the Save button

Repeat these steps for all branches where you need the extra check. Don't add them on feature branches because it will also prevent you doing manual changes in these branches.

All branches

Required or Optional

4) Testing

Now its time to perform a pull request. You will see that the validation will first be queued. So this extra validation will take a little extra time, especially when you have a busy agent. However you can just continue working and wait for the approval or even press Set auto-complete to automatically complete the Pull Request when all approvals and validations are validated. However don't auto-complete if you chose to make it an optional validation because then it will cancel it as soon as all other validations are ready.

As soon as the validation is ready it will show you the first couple if errors.

Required Naming Convention Validation failed

You can click on those couple of errors to see the total number of errors and the first ten errors.

Get error count and first 10 errors

And you can click on one of those errors to see the entire output including all errors and all correct items.

And at the bottom you will find a summary with the total number of errors and percentages per Synapse part.

Summary example

Or download your Synapse JSON files and run the PowerShell script locally with for example Visual Studio Code.

Running Naming Validations in VCode

Conclusion

In this post you learned how you could automate your naming conventions check. This helps/forces your team to consistantly use the naming conventions without you being some kind of nitpicking police officer. Everybody can just blame DevOps/themselves.

You can combine this validation with for example the branch validation to prevent accidentily choosing the wrong branch in a Pull Request.

Please submit your suggestions under Issues (and then bug report of feature request) in the GitHub site or drop it in the comments below.

Friday, 28 April 2023

DevOps - Build Validation to check branches

Case
We sometimes mess up when creating pull requests in Azure DevOps by accidentally selecting the wrong source or target branch. These mistakes caused unwantend situations where stuff got promoted to the wrong environment to early. Resolving those mistakes often take a lot of time. Is there a way to prevent these easily made mistakes?

Build Validation for Pull Requests

Solution
The first option is to setup a four-eyes principle where someone else has to approve your work including choosing the right branches for your pull request. This can be done by setting the branch policies. Note that you need to do this for all your branches (excluding feature/personal branches).

Setting minimum number of reviewers for changes

Onother option that you could use is build validations where you compare the source branch with the target branch in a script. Even better is to combine these two options, but if you (sometimes) work alone then this could be an alternative.

Note that this solution only works if you have at lease two branches and that you are also using feature branches.

1) Create extra YAML file

Add an extra YAML file called ValidatePullRequest.yml in your your current YAML folder.

Pipeline for validating pull request

The code of the new file can be found below, but you have to change two things in this code. The first thing is the names from you branches in order (on line 14). If you only have a main and a development branch then it will be [String[]]$branches = "main", "development". The second item is the name/wildcard of hotfix/bugfix branches that will need to be ignored by this check (on line 17). So if your bugfix branches always contain the word hotfix then it will become [String]$fixBranch = "hotfix". The script uses a like to compare the name.

trigger: none
steps:
- task: PowerShell@2
  displayName: 'Validate branches in Pull Request'
  inputs:
    targetType: 'inline'
    script: | 
        #######################################################################
        # PARAMETERS FOR SCRIPT
        #######################################################################
        # All branches in order. You can only do a pull request one up or down.
        # Feature and / or personal branches can only be pulled to or from the
        # latest branch in the row.
        [String[]]$branches = "main", "acceptance", "test", "development", "sprint"

        # Bugfix or hotfix branches can be pulled to and from all other branches
        [String]$fixBranch = "bugfix"


        #######################################################################
        # DO NOT CHANGE CODE BELOW
        #######################################################################
        $SourceBranchName = "$(System.PullRequest.SourceBranch)".toLower().Replace("refs/heads/", "") # sourceBranchName does not exist
        $TargetBranchName = "$(System.PullRequest.targetBranchName)"

        function getBranchNumber
        {
            <#
                .SYNOPSIS
                Get the order number of the branch by looping through all branches and checking then branchname
                .PARAMETER BranchName
                Name of the branch you want to check
                .EXAMPLE
                getBranchNumber -BranchName "myBranch"
            #>
            param (
                [string]$BranchName
            )
            # Loop through branches array to find a specific branchname
            for ($i = 0; $i -lt $branches.count; $i++)
            {
                # Find specific branchname
                if ($branches[$i] -eq $BranchName)
                {
                    # Return branch order number
                    # (one-based instead if zero-based)
                    return $i + 1
                }
            }
            # Unknown branch = feature branch
            return $branches.count + 1
        }


        # Retrieve branch order
        $SourceBranchId = getBranchNumber($SourceBranchName)
        $TargetBranchId = getBranchNumber($TargetBranchName)

        # Show extra information to check the outcome of the check below
        Write-Host "All branches in order: [$($branches -join "] <-> [")] <-> [feature branches]."
        Write-Host "Checking pull request from $($SourceBranchName) [$($SourceBranchId)] to $($TargetBranchName) [$($TargetBranchId)]."

        if ($SourceBranchName -like "*$($fixBranch)*")
        {
            # Pull request for bugbix branches are unrestricted
            Write-Host "Pull request for Bugfix or hotfix branches are unrestricted."
            exit 0
        }
        elseif ([math]::abs($SourceBranchId-$TargetBranchId) -le 1)
        {
            # Not skipping branches or going from feature branch to feature branch 
            Write-Host "Pull request is valid."
            exit 0
        }
        else
        {
            # Invallid pull request that skips one or more branches
            Write-Host "##vso[task.logissue type=error]Pull request is invalid. Skipping branches is not allowed."
            exit 1
        }

In this GitHub repository you will find the latest version of the code

Note 1: The PowerShell script within the YAML either returns 0 (success) of 1 (failure) and that the Write-Host contains a little ##vso block. This allows you to write errors that will show up in the logging of DevOps.

Note 2: The script is fairly flexible in the number of branches that you want use, but it does require a standard order of branches. All unknown branches are considered being feature branches. If you have a naming conventions for feature branches then you could refine the IF statement to also validate those naming conventions to annoy/educate your co-workers even more.

2) Create new pipeline

Now create a new pipeline based on the newly created YAML file from step 1.

Go to Pipelines in Azure DevOps.
Click on the New Pipeline button to create the new pipeline.
Choose the repos type (Azure DevOps Git in our example)
Select the right repositiory if you have mulitple repos
Select Existing Azure Pipelines YAML file.
Select the branch and then the new YAML file under Path.
Save it
Optionally rename the Pipeline name to give it a more readable name (often the repository name will be used as a default name)

3) Add Build Validation to branch

Now that we have the new YAML pipeline, we can use it as a Build Validation. The example shows how to add them in an Azure DevOps repository.

In DevOps go to Repos in the left menu.
Then click branches to get all branches.
Now hover you mouse above the first branch and click on the 3 vertical dots.
Click Branch policies
Click on the + button in the Build Validation section.
Select the new pipeline created in step 2 (optionally change the Display name)
Click on the Save button

Repeat these steps for all branches where you need the extra check. Don't add them on feature branches because it will also prevent you doing manual changes in these branches.

Adding the Build Validation as a Branch Policy

4) Testing

Now its time to perform a 'valid' and an 'invalid' pull request. You will see that the validation will first be queued. So this extra validation will take a little extra time, especially when you have a busy agent. However you can just continue working and wait for the approval or even press Set auto-complete to automatically complete the Pull Request when all approvals and validations are validated.

If the validation fails you will see the error message and then you can click on it to also find the two regular Write-Host lines (60 and 61) with the list of branches and the branch names with there branch order number. This should help you to figure out what mistake your made.

Invalid pull request

For the 'legal' pull request you will first see the validation being queued and then you will see the green circle with the check mark. You can click on it to see why this was a valid Pull request.

Valid pull request

Conclusions

In this post you learned how to create Build Validations for your Pull Requests in Azure DevOps. In this case a simple script to compare the Source and Target branch from a Pull Request. In a second example, coming online soon, we will validate the naming conventions within a Synapse Workspace (or ADF) during a Pull Request. This will prevent your colleagues (you are of course always following the rules) from not following the naming conventions because the Pull Request will automatically fail.

Sneak preview

Special thanks to colleague Joan Zandijk for creating the initial version of the YAML and Powershell script for the branch check.

Saturday, 1 April 2023

Synapse - Cleanup workspace after deployment

Case
A while ago we created a script to clean up the Synapse Workspace before deploying new pipelines, datasets, etc. to Synapse. This helps you to remove old parts like for example triggers because the Synapse deployment is incremental and does not do any deletes. That script works fine, however for workspaces with a lot of items it could take up to 30 minutes to clean up everything. In that post we already mentioned some future improvements...

Cleanup Synapse Workspace after deployment

Update: use DeleteArtifactsNotInTemplate: true in deployment task to avoid powershell

Solution
This version of the cleanup script compares the items in the Synapse Workspace against the Artifact that you just deployed. Everything that is in the workspace but not in the artifact will now be deleted (afterwards). This will significantly shorten the cleanup period during deployment.

You need to add this YAML step to your Synapse Deployment pipeline (preferably after the Deployment step) and change the path of the PowerShell script to your setup. The first parameter is the Synapse Workspace Name of the environment you are deploying to. The second one is the corresponding Resource Group Name. The last one is the location of the artifact. In this case the Pipeline.Workspace variable + "s\Synapse" (see screenshot below of the treeview on the agent).

###################################
# 5 Cleanup Synapse
###################################
- task: AzurePowerShell@5
  displayName: '5 Cleanup Synapse'
  inputs:
	azureSubscription: ${{ parameters.ServiceConnnection }}
	scriptType: filePath
	scriptPath: $(Pipeline.Workspace)\s\CICD\Powershell\ClearSynapse.ps1
	scriptArguments:
	  -WorkspaceName ${{ parameters.TargetWorkspaceName }} `
	  -ResourceGroupName ${{ parameters.ResourceGroupName }} `
	  -ArtifactDirectory $(Pipeline.Workspace)\s\Synapse
	azurePowerShellVersion: latestVersion
	pwsh: true

Showing the path of the Artifact root folder

Save the PowerShell Script below in your repository and change the path in the above YAML script. We like to have a CICD folder in the root to store everything deployment related.

Repos folder structure

param (
   [Parameter (Mandatory = $true, HelpMessage = 'Synapse name')]
   [ValidateNotNullOrEmpty()]
   [string] $WorkspaceName,
   
   [Parameter (Mandatory = $true, HelpMessage = 'Resourcegroup name')]
   [ValidateNotNullOrEmpty()]
   [string] $ResourceGroupName,
   
   [Parameter (Mandatory = $true, HelpMessage = 'Artifact Directory')]
   [ValidateNotNullOrEmpty()]
   [string] $ArtifactDirectory
)

# Two default Linked Services we cannot remove
[string] $WorkspaceDefaultSqlServer = "$($WorkspaceName)-WorkspaceDefaultSqlServer"
[string] $WorkspaceDefaultSqlStorage = "$($WorkspaceName)-WorkspaceDefaultStorage"

# A little dummy protection to check whether you have configured the right folder
# If these folders are not available there is probably something wrong and we
# don't want to delete everything in the workspace.
if (!(Test-Path -Path (Join-Path $ArtifactDirectory "integrationRuntime")) -AND
    !(Test-Path -Path (Join-Path $ArtifactDirectory "linkedService")) -AND 
    !(Test-Path -Path (Join-Path $ArtifactDirectory "pipeline")))
    {
        Write-Output "Artifact folder $($ArtifactDirectory) "
        throw "Dummy protection - Probably not the right folder that stores your artifact"
    }


#######################################################
# 1) Checking for resource locks and removing them
#######################################################
Write-Output "==========================================="
Write-Output "1) Removing resource locks"
Write-Output "==========================================="

# Getting all locks on the Azure Synapse Workspace
$lock = Get-AzResourceLock -ResourceGroupName $ResourceGroupName -ResourceName $WorkspaceName -ResourceType "Microsoft.Synapse/workspaces"
Write-Output "Found $($lock.Count) locks"

# Check if the collection of Azure resource locks is not emtpy
if($null -ne $lock)
{
    # Looping through all resource locks to remove them one by one
    $lock | ForEach-Object -process {
        # Remove lock
        Write-Output "Removing Lock Id: $($lock.LockId)"
        Remove-AzResourceLock -LockId $_.LockId -Force
    }
}
Write-Output "Step 'Removing resource locks' completed`r`n"



#######################################################
# 2) Stop and remove Triggers not in Artifact
#######################################################
Write-Output "==========================================="
Write-Output "2) Stop and remove Triggers not in Artifact"
Write-Output "==========================================="

# Check if the artifact contains any triggers
if (Test-Path -Path (Join-Path $ArtifactDirectory "trigger"))
{
    # Getting all Triggers from Artifact
    $ArtifactTriggers = Get-ChildItem -Path (Join-Path $ArtifactDirectory "trigger") -Filter "*.json" | Select-Object -ExpandProperty BaseName

    # Getting all Triggers from Synapse
    $SynapseTriggers = Get-AzSynapseTrigger -WorkspaceName $WorkspaceName | Select-Object -ExpandProperty Name

    # Getting Triggers from Synapse that are not in the Artifact
    $Triggers = Compare-Object -ReferenceObject $ArtifactTriggers -DifferenceObject $SynapseTriggers | Select-Object -ExpandProperty InputObject
} else {
    # Fill collection with all existing triggers to remove them
    # because artifact doesn't contain any triggers anymore
    Write-Output "Path not found in Artifact, removing all existing Triggers in Synapse Workspace"
    $Triggers = Get-AzSynapseTrigger -WorkspaceName $WorkspaceName | Select-Object -ExpandProperty Name
}
Write-Output "Found $($Triggers.Count) Triggers that are not in the Artifact"

# Stopping all Triggers before deleting them
$Triggers | ForEach-Object -process { 
    Write-Output "Stopping Trigger $($_)"
    try {
        # Trying to stop each Trigger
        Stop-AzSynapseTrigger -WorkspaceName $WorkspaceName -Name $($_) -ErrorAction Stop
    }
    catch {
        if ($_.Exception.Message -eq "{}")
        {
            # Ignore failures for triggers that are already stopped
            Write-Output "Trigger stopped"
        }
        else {
            # Unexpected error
            Write-Output "Something went wrong while stopping trigger!"
            Throw $_
        }
    }
    # Remove trigger
    Write-Output "Removing Trigger $($_)"
    Remove-AzSynapseTrigger -Name $_ -WorkspaceName $WorkspaceName -Force
}
Write-Output "Step 'Stop and remove Triggers not in Artifact' completed`r`n"



#######################################################
# 3) Remove Pipelines not in Artifact
#######################################################
Write-Output "==========================================="
Write-Output "3) Remove Pipelines not in Artifact"
Write-Output "==========================================="

# Check if artifact contains any pipelines
if (Test-Path -Path (Join-Path $ArtifactDirectory "pipeline"))
{
    # Getting all Pipelines from Artifact
    $ArtifactPipelines = Get-ChildItem -Path (Join-Path $ArtifactDirectory "pipeline") -Filter "*.json" | Select-Object -ExpandProperty BaseName

    # Getting all Pipelines from Synapse
    $SynapsePipelines = Get-AzSynapsePipeline -WorkspaceName $WorkspaceName | Select-Object -ExpandProperty Name

    # Getting Pipelines from Synapse that are not in the Artifact
    $Pipelines = Compare-Object -ReferenceObject $ArtifactPipelines -DifferenceObject $SynapsePipelines | Select-Object -ExpandProperty InputObject
} else {
    # Fill collection with all existing pipelines to remove them 
    # because artifact doesn't contain any pipelines anymore
    Write-Output "Path not found in Artifact, removing all existing Pipelines in Synapse Workspace"
    $Pipelines = Get-AzSynapsePipeline -WorkspaceName $WorkspaceName | Select-Object -ExpandProperty Name
}
Write-Output "Found $($Pipelines.Count) Synapse Pipelines that are not in the Artifact"

# Trying to delete all pipelines. If a pipeline is still referenced
# by an other pipeline it will continue to remove other pipelines 
# before trying to remove it again... max 100 times. So don't create
# chains of pipelines that are too long
[int] $depthCount = 0
while ($Pipelines.Count -gt 0 -and $depthCount -lt 100)
{
    # Loop through collection of pipelines and try to remove them
    $Pipelines | ForEach-Object -process { 
        Write-Output "Trying to delete pipeline $($_)"
        Remove-AzSynapsePipeline -Name $_ -WorkspaceName $WorkspaceName -Force -ErrorAction SilentlyContinue
    }

    # Wait 2 seconds before retry and raise retry counter
    Start-Sleep -Seconds 2
    $depthCount += 1

    # Check if artifact contains any pipelines
    if (Test-Path -Path (Join-Path $ArtifactDirectory "pipeline"))
    {   
        # Getting all Pipelines from Synapse
        $SynapsePipelines = Get-AzSynapsePipeline -WorkspaceName $WorkspaceName | Select-Object -ExpandProperty Name
        
        # Getting Pipelines from Synapse that are not in the Artifact
        $Pipelines = Compare-Object -ReferenceObject $ArtifactPipelines -DifferenceObject $SynapsePipelines | Select-Object -ExpandProperty InputObject
    }  else {
        # Fill collection with all existing pipelines to remove them 
        # because artifact doesn't contain any pipelines anymore
        $Pipelines = Get-AzSynapsePipeline -WorkspaceName $WorkspaceName | Select-Object -ExpandProperty Name
    }
    
    # Check if there are any pipelines left for a retry
    if ($Pipelines.count -gt 0)
    {
        Write-Output "Still found $($Pipelines.Count) Synapse Pipelines that are not in the Artifact. Starting next iteration."
    } else {
        Write-Output "Deletion of Pipelines not in Artifact completed"
    }
}
# Error when you have circulair pipeline links or just way to many levels
if ($depthCount -eq 100)
{
    throw "Too many levels of child pipelines or circulair relations!"
}
Write-Output "Step 'Remove Pipelines not in Artifact' completed`r`n"



#######################################################
# 4) Remove Notebooks not in Artifact
#######################################################
Write-Output "==========================================="
Write-Output "4) Remove Notebooks not in Artifact"
Write-Output "==========================================="

# Check if artifact contains any pipelines
if (Test-Path -Path (Join-Path $ArtifactDirectory "notebook"))
{
    # Getting all Notebooks from Artifact
    $ArtifactNotebooks = Get-ChildItem -Path (Join-Path $ArtifactDirectory "notebook") -Filter "*.json" | Select-Object -ExpandProperty BaseName

    # Getting all Notebooks from Synapse
    $SynapseNotebooks = Get-AzSynapseNotebook -WorkspaceName $WorkspaceName | Select-Object -ExpandProperty Name

    # Getting Notebooks from Synapse that are not in the Artifact
    $Notebooks = Compare-Object -ReferenceObject $ArtifactNotebooks -DifferenceObject $SynapseNotebooks | Select-Object -ExpandProperty InputObject
} else {
    # Fill collection with all existing notebooks to remove them 
    # because artifact doesn't contain any notebooks anymore
    Write-Output "Path not found in Artifact, removing all existing Notebooks in Synapse Workspace"
    $Notebooks = Get-AzSynapseNotebook -WorkspaceName $WorkspaceName | Select-Object -ExpandProperty Name
}    
Write-Output "Found $($Notebooks.Count) Synapse Notebooks that are not in the Artifact"

# Loop through collection of Notebooks to delete them
$Notebooks | ForEach-Object -process {
    Write-Output "Deleting Notebook $($_)"
    Remove-AzSynapseNotebook -Name $($_) -WorkspaceName $WorkspaceName -Force
}
Write-Output "Step 'Remove Notebooks not in Artifact' completed`r`n"



#######################################################
# 5) Remove SQL-Scripts not in Artifact
#######################################################
Write-Output "==========================================="
Write-Output "5) Remove SQL-Scripts not in Artifact"
Write-Output "==========================================="

# Check if artifact contains any SQL Scripts
if (Test-Path -Path (Join-Path $ArtifactDirectory "sqlscript"))
{
    # Getting all SQL-scripts from Artifact
    $ArtifactSQLScripts = Get-ChildItem -Path (Join-Path $ArtifactDirectory "sqlscript") -Filter "*.json" | Select-Object -ExpandProperty BaseName

    # Getting all SQL-scripts from Synapse
    $SynapseSQLScripts = Get-AzSynapseSqlScript -WorkspaceName $WorkspaceName | Select-Object -ExpandProperty Name

    # Getting SQL-scripts from Synapse that are not in the Artifact
    $SQLScripts = Compare-Object -ReferenceObject $ArtifactSQLScripts -DifferenceObject $SynapseSQLScripts | Select-Object -ExpandProperty InputObject
} else {
    # Fill collection with all existing SQL Scripts to remove them 
    # because artifact doesn't contain any SQL Scripts anymore
    Write-Output "Path not found in Artifact, removing all existing SQL-Scripts in Synapse Workspace"
    $SQLScripts = Get-AzSynapseSqlScript -WorkspaceName $WorkspaceName | Select-Object -ExpandProperty Name
} 
Write-Output "Found $($SQLScripts.Count) Synapse SQL-Scripts that are not in the Artifact"

# Loop through collection of SQL scripts to delete them
$SQLScripts | ForEach-Object -Process {
    Write-Output "Deleting SQL-script $($_)"
    Remove-AzSynapseSqlScript -Name $($_) -WorkspaceName $WorkspaceName -Force
}
Write-Output "Step 'Remove SQL-Scripts not in Artifact' completed`r`n"



#######################################################
# 6) Remove Datasets not in Artifact
#######################################################
Write-Output "==========================================="
Write-Output "6) Remove Datasets not in Artifact"
Write-Output "==========================================="

# Check if artifact contains any datasets
if (Test-Path -Path (Join-Path $ArtifactDirectory "dataset"))
{
    # Getting all Datasets from Artifact
    $ArtifactDatasets = Get-ChildItem -Path (Join-Path $ArtifactDirectory "dataset") -Filter "*.json" | Select-Object -ExpandProperty BaseName

    # Getting all Datasets from Synapse
    $SynapseDatasets = Get-AzSynapseDataset -WorkspaceName $WorkspaceName | Select-Object -ExpandProperty Name

    # Getting Datasets from Synapse that are not in the Artifact
    $Datasets = Compare-Object -ReferenceObject $ArtifactDatasets -DifferenceObject $SynapseDatasets | Select-Object -ExpandProperty InputObject
} else {
    # Fill collection with all existing Datasets to remove them 
    # because artifact doesn't contain any Datasets anymore
    Write-Output "Path not found in Artifact, removing all existing Datasets in Synapse Workspace"
    $Datasets = Get-AzSynapseDataset -WorkspaceName $WorkspaceName | Select-Object -ExpandProperty Name
}
Write-Output "Found $($Datasets.Count) Synapse Datasets that are not in the Artifact"

# Loop through collection of Datasets to delete them
$Datasets | ForEach-Object -process { 
    Write-Output "Deleting Dataset $($_)"
    Remove-AzSynapseDataset -Name $_ -WorkspaceName $WorkspaceName -Force
}
Write-Output "Step 'Remove Datasets not in Artifact' completed`r`n"



#######################################################
# 7) Remove Linked Services not in Artifact
#######################################################
Write-Output "==========================================="
Write-Output "7) Remove Linked Services not in Artifact"
Write-Output "==========================================="

# Check if artifact contains any Linked Services
if (Test-Path -Path (Join-Path $ArtifactDirectory "linkedService"))
{
    # Getting all Linked Services from Artifact
    $ArtifactLinkedServices = Get-ChildItem -Path (Join-Path $ArtifactDirectory "linkedService") -Filter "*.json" | Where-Object {($_ -NotLike "*WorkspaceDefaultSqlServer.json" -and  $_ -NotLike "*WorkspaceDefaultStorage.json") } | Select-Object -ExpandProperty BaseName
    
    # Getting all Linked Services from Synapse
    $SynapseLinkedServices = Get-AzSynapseLinkedService -WorkspaceName $WorkspaceName | Where-Object {($_.Name -ne $WorkspaceDefaultSqlServer -and  $_.Name -ne $WorkspaceDefaultSqlStorage) } | Select-Object -ExpandProperty Name

    # Getting Linked Services from Synapse that are not in the Artifact
    $LinkedServices = Compare-Object -ReferenceObject $ArtifactLinkedServices -DifferenceObject $SynapseLinkedServices | Select-Object -ExpandProperty InputObject
} else {
    # Fill collection with all existing Linked Services to remove them 
    # because artifact doesn't contain any Linked Services anymore
    Write-Output "Path not found in Artifact, removing all existing Linked Services in Synapse Workspace"
    $LinkedServices = Get-AzSynapseLinkedService -WorkspaceName $WorkspaceName | Where-Object {($_.Name -ne $WorkspaceDefaultSqlServer -and  $_.Name -ne $WorkspaceDefaultSqlStorage) } | Select-Object -ExpandProperty Name
}
Write-Output "Found $($LinkedServices.Count) Synapse Linked Services that are not in the Artifact"

# Trying to delete all linked services. If a linked service is still
# referenced by an other linked service it will continue to remove 
# other linked services before trying to remove it again... 
# max 100 times. Example: KeyVault linked services
$depthCount = 0
while ($LinkedServices.Count -gt 0 -and $depthCount -lt 100)
{
     # Loop through collection of Linked Services and try to remove them
    $LinkedServices | ForEach-Object -process { 
        Write-Output "Trying to delete Linked Service $($_)"
        Remove-AzSynapseLinkedService -Name $_ -WorkspaceName $WorkspaceName -Force -ErrorAction Continue
    }

    # Wait 2 seconds before retry and raise retry counter
    Start-Sleep 2 
    $depthCount += 1

    # Check if artifact contains any Linked Services
    if (Test-Path -Path (Join-Path $ArtifactDirectory "linkedService"))
    {
        # Getting all Linked Services from Synapse
        $SynapseLinkedServices = Get-AzSynapseLinkedService -WorkspaceName $WorkspaceName | Where-Object {($_.Name -ne $WorkspaceDefaultSqlServer -and  $_.Name -ne $WorkspaceDefaultSqlStorage) } | Select-Object -ExpandProperty Name

        # Getting Linked Services from Synapse that are not in the Artifact
        $LinkedServices = Compare-Object -ReferenceObject $ArtifactLinkedServices -DifferenceObject $SynapseLinkedServices | Select-Object -ExpandProperty InputObject
    } else {
        # Fill collection with all existing Linked Services to remove them 
        # because artifact doesn't contain any Linked Services anymore
        $LinkedServices = Get-AzSynapseLinkedService -WorkspaceName $WorkspaceName | Where-Object {($_.Name -ne $WorkspaceDefaultSqlServer -and  $_.Name -ne $WorkspaceDefaultSqlStorage) } | Select-Object -ExpandProperty Name
    }
    
    # Check if there are any Linked Services left for a retry
    if ($LinkedServices.count -gt 0)
    {
        Write-Output "Still found $($LinkedServices.Count) Synapse Linked Services that are not in the Artifact. Starting next iteration."
    } else {
        Write-Output "Deletion of Linked Services not in Artifact completed"
    }
}

# Error when you have circulair Linked Services links or just way to many levels
if ($depthCount -eq 100)
{
    throw "Too many levels of references to other Linked Services!"
}
Write-Output "Step 'Remove Linked Services not in Artifact' completed"
Write-Output "==========================================="

There is a check at the start that will check whether it can find some expected sub folders in the supplied artifact path. Otherwise it will asume that you don't have any datasets, pipelines, etc. and cleanup your entire Synapse Workspace.

If nothing has to be cleaned/removed the script is ready within seconds.

No cleanup necessary

If items do have to be removed it is still ready in a few minutes instead of half an hour. Notice the iterations for cleaning up the pipelines. It will try to delete a pipeline, but if it is still used by an other pipeline then it will continue with the next pipeline and try again afterwards. The same construction is used for Linked Services where for example a Key Vault Linked Service can still be used by an other Linked Service.

Some cleanup was necessary

Conclusions

In this post you learned how to clean your Synapse Workspace much more efficiently by comparing the Workspace and the Artifact. This way you don't have to delete each item in your workspace, but only the ones you deleted during development.

Note that not all parts of Synapse are available in this clean up script. For example KQL scripts and Dataflows are still missing. They will be added later on. If you want to add those yourself make sure they are on the right place within the script Dataflows should probably added before the pipelines and the KQL scripts in front of or after the SQL Scripts. Feel free to let us know if you have any improvements for this script that you would like the share with the community.

Special thanks to colleague Joan Zandijk for helping out.