Saturday 12 November 2022

Deploy Azure SQL Databases via a DevOps pipeline

Case
A while ago we did a post about deploying an Azure SQL Database via Azure DevOps using the deployment pipelines. Since that post we have mostly used YAML pipelines, so in this post an update on how to accomplish the same with a YAML pipeline.
Deploy Azure SQL DB with DevOps

Solution
The starting point of this post is that you have a Visual Studio database project connected to the DevOps repository.

1) Variable Groups
First we start with some variable groups to parameterize the YAML code. For this example we have one general variable group with variables like the (Visual Studio) project name, database name and service connection (if each environment uses the same service connection). Then there are also variable groups per environment (Development, Test, Acceptance, Production) with variables like the SQL Server name and service connection (if each environment uses a different service connection).
Pipelines > Library > Variable Groups

Whether you need a variable group for DEV depends on how you do your SQL development. Some teams do the development in the central development database and use the compare option in Visual Studio to update the SQL project. In that case you don't want to deploy the project to the DEV database.
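To give an idea, below is a minimal sketch of what these variable groups could contain, using the group and variable names from the YAML in step 4 (the exact split is an assumption; adjust it to your own setup):

###################################
# Assumed content of variable groups
###################################
# SQLParamsGen (general)    : SqlProjectName, SqlDatabaseName, ServiceConnection
# SQLParamsDev / Acc / Prd  : SqlServerName (and ServiceConnection if it differs per environment)
variables:
  - group: SQLParamsGen   # added once at pipeline level
  - group: SQLParamsDev   # added at stage level, one group per environment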
Example of variables in variable groups

2) Environments
The second preparation step is the creation of environments. This allows you to add approvals and checks for specific environments; you probably want an approval step before you update your production environment. You can reuse these environments for other deployments like ADF, Synapse or Power BI.
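In the YAML these environments are referenced by the deployment jobs, so the approvals and checks you configure here are enforced automatically. A minimal sketch (the full deployment job is shown in step 5):

jobs:
  - deployment: deploymentjobPRD
    displayName: Deployment Job PRD
    environment: Deploy to PRD   # approvals and checks on this environment apply to this job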

Pipelines > Environments

The same note about the DEV environment applies as in the previous step: if you don't deploy to DEV then you don't need the 'Deploy to DEV' environment.

3) Repos folder structure
Our folder structure in the repos is as follows (if you use a different structure you need to adjust the YAML to it). We have a SQL folder with a subfolder for our Visual Studio database project. The name of the project/folder is stored in the variable group (see step 1). We also have a CICD folder in the root where we store everything that is deployment related. For SQL deployment we only have a subfolder called YAML for the pipelines, but for other deployments we also have a subfolder for PowerShell scripts.
Folder structure of repos

In the YAML folder we have a BuildSQL.yml and this is the main pipeline file that you will be using to add a new pipeline. It calls DeploySQL.yml once for each environment.
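A rough sketch of the assumed layout, where YourSqlProject is a placeholder for your own database project name:

CICD
    YAML
        BuildSQL.yml
        DeploySQL.yml
SQL
    YourSqlProject
        YourSqlProject.sqlproj
        YourSqlProject.publish.xml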

4) Build pipeline
Let's explain the YAML of the build pipeline. It starts by adding the general variable group to the pipeline so that you are able to use those variables from step 1.

Next is the trigger that determines when the pipeline will be started. In this example it listens to changes in multiple branches, but only when those changes happen in the SQL folder. This part will differ per branch strategy.

Then we have the stages. The first stage is the build stage where the artifact is created and published for later use. In the job part we have two noteworthy items. The first is the workspace clean option, which cleans your agent before starting; this is handy if you have a self-hosted agent instead of a Microsoft-hosted agent. The second is the agent pool that will be used: name is for self-hosted agents and vmImage is for Microsoft-hosted agents.

Then the steps within the first stage:
  1. Checkout will retrieve the repository content to your agent. We need the Visual Studio project on the agent to build it.
  2. The treeview step is an optional debug step to show you where the files are on your agent after the checkout. It makes it easier to specify the paths in the next steps. You can remove it or comment it out when you are done with your pipeline.
  3. The MSBuild step will build your SQL project and create a dacpac file which you need for the deployment. This is equal to building your project within Visual Studio. If you have a self-hosted agent you will need to install either Visual Studio or MSBuild on your VM.
  4. The Copy step copies the generated dacpac file to a staging folder, and also copies the publish profile file with the settings for publishing your project.
  5. The last step publishes the copied files as an artifact.
The last stages of this YAML file are copies of each other, slightly changed for the environment-specific settings. You can add or remove stages if you have more or fewer environments.

In these stages you first see that the environment-specific variable group is added to the pipeline and which agent will be used for the deployment. If your Azure SQL server only uses private endpoints then a self-hosted agent is required, because the Microsoft-hosted agents won't be able to connect to your server.

The condition part in this example is optional and depends on the branch strategy; remove that line if you don't use multiple branches for deployment. The last part of these stages calls the second YAML file, passing values from the specific variable group as parameters to it.

###################################
# General Variables
###################################
variables:
  - group: SQLParamsGen


###################################
# When to create a pipeline run
###################################
trigger:
  branches:
    include:
    - Development
    - Acceptance
    - main
  paths:
    include:
    - SQL/*

stages:
###################################
# Create DacPac Artifact
###################################
- stage: CreateSQLArtifact
  displayName: Create SQL Artifact (dacpac)

  jobs:
  - job: Job
    displayName: 'Build DacPac'
    workspace:
      clean: all
    pool:
      name: DataPlatform
      # vmImage: 'windows-latest' 
    steps:

    ###################################
    # 1 Retrieve Repository
    ###################################
    - checkout: self
      displayName: '1 Retrieve Repository'
      clean: true

    ###################################
    # 2 Show treeview of agent
    ###################################
    - powershell: |
        Write-Output "Dit is de tree omgeving"
        tree "$(Pipeline.Workspace)" /F
      displayName: '3 Treeview Workspace and ArmTemplateOutput content '

    ###################################
    # 3 Build Visual Studio project
    ###################################
    - task: MSBuild@1
      displayName: '3. Creating Artifact'
      inputs:
        solution: '$(System.DefaultWorkingDirectory)/SQL/$(SqlProjectName)/$(SqlProjectName).sqlproj'
        msbuildArchitecture: x86
        msbuildVersion: latest

    ###################################
    # 4 Stage dacpac and publish
    ###################################
    - task: CopyFiles@2
      displayName: '4. Copy Artifact'
      inputs:
        contents: |
          **\*.dacpac
          **\*.publish.xml
        TargetFolder: '$(build.artifactstagingdirectory)'

    ###################################
    # 5 Publish dacpac and xml artifact
    ###################################
    - task: PublishPipelineArtifact@1
      displayName: '5. Publish Artifact'
      inputs:
        targetPath: '$(Build.ArtifactStagingDirectory)'
        artifact: 'SQL_Dacpac'
        publishLocation: 'pipeline'



###################################
# Deploy Dev environment
###################################
- stage: DeployDev
  displayName: Deploy DEV
  variables:
    - group: SQLParamsDev
  pool:
    name: DataPlatform
    # vmImage: 'windows-latest'
  condition: eq(variables['Build.SourceBranchName'], 'Development')
  jobs:
    - template: DeploySQL.yml
      parameters:
        env: DEV
        ServiceConnection: $(ServiceConnection)
        SqlServerName: $(SqlServerName)
        SqlDatabaseName: $(SqlDatabaseName)
        SqlProjectName: $(SqlProjectName)



###################################
# Deploy Acc environment
###################################
- stage: DeployAcc
  displayName: Deploy ACC
  variables:
    - group: SQLParamsAcc
  pool:
    name: DataPlatform
    # vmImage: 'windows-latest'
  condition: eq(variables['Build.SourceBranchName'], 'Acceptance')
  jobs:
    - template: DeploySQL.yml
      parameters:
        env: ACC
        ServiceConnection: $(ServiceConnection)
        SqlServerName: $(SqlServerName)
        SqlDatabaseName: $(SqlDatabaseName)
        SqlProjectName: $(SqlProjectName)



###################################
# Deploy Prd environment
###################################
- stage: DeployPrd
  displayName: Deploy PRD
  variables:
    - group: SQLParamsPrd
  pool:
    name: DataPlatform
    # vmImage: 'windows-latest'
  condition: eq(variables['Build.SourceBranchName'], 'main')
  jobs:
    - template: DeploySQL.yml
      parameters:
        env: PRD
        ServiceConnection: $(ServiceConnection)
        SqlServerName: $(SqlServerName)
        SqlDatabaseName: $(SqlDatabaseName)
        SqlProjectName: $(SqlProjectName)

5) Deploy pipeline
The second YAML file does the deployment of the dacpac to your Azure SQL Database. It starts with the YAML parameters. You can specify allowed values or default values for your parameters.
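For example, a parameter with both allowed values and a default value could look like this (a small sketch; the template below only uses allowed values):

parameters:
  - name: env
    displayName: Environment
    type: string
    default: DEV
    values:
    - DEV
    - ACC
    - PRD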

The job part specifies which environment you want to use (see step 2) and then the steps of the deployment:
  1. The checkout retrieves the files from the repository to the agent. For this example it is not strictly necessary, but if you perform some extra steps, like executing a PowerShell script, then you need it to get the PowerShell file to your agent.
  2. The treeview step is an optional debug step to show you where the artifact files are on your agent. You can remove it or comment it out when you don't need it anymore.
  3. The SqlAzureDacpacDeployment@1 is the actual deployment step for the dacpac and it also adds the publish profile file. If you don't use a publish profile then remove that line. Note that if you are using a self-hosted agent then you need to install SqlPackage.exe on that VM for this step.
###################################
# Parameters
###################################
parameters:
  - name: env
    displayName: Environment
    type: string
    values: 
    - DEV
    - ACC
    - PRD
  - name: ServiceConnection
    displayName: Service Connection
    type: string
  - name: SqlServerName
    displayName: Sql Server Name
    type: string
  - name: SqlDatabaseName
    displayName: Sql Database Name
    type: string
  - name: SqlProjectName
    displayName: Sql Project Name
    type: string


jobs:
  - deployment: deploymentjob${{ parameters.env }}
    displayName: Deployment Job ${{ parameters.env }} 
    environment: Deploy to ${{ parameters.env }}
    strategy:
      runOnce:
        deploy:
          steps:
          ###################################
          # 1 Check out repository to agent
          ###################################
          - checkout: self
            displayName: '1 Retrieve Repository'
            clean: true 
 
          ###################################
          # 2 Show environment and treeview
          ###################################
          - powershell: |
              Write-Output "Deploying ${{ parameters.SqlProjectName }} to DB ${{ parameters.SqlDatabaseName }} and server ${{ parameters.SqlServerName }} in the ${{ parameters.env }} environment"
              tree "$(Pipeline.Workspace)" /F
            displayName: '2 Show environment and treeview Pipeline_Workspace'

          ###################################
          # 3 Deploy DacPac
          ###################################             
          - task: SqlAzureDacpacDeployment@1
            displayName: '3 Deploy DacPac' 
            inputs:
              azureSubscription: '${{ parameters.ServiceConnection }}'
              AuthenticationType: 'servicePrincipal'
              ServerName: '${{ parameters.SqlServerName }}.database.windows.net'
              DatabaseName: '${{ parameters.SqlDatabaseName }}' 
              deployType: 'DacpacTask'
              DeploymentAction: 'Publish'
              DacpacFile: '$(Pipeline.Workspace)/SQL_Dacpac/SQL/${{ parameters.SqlProjectName }}/bin/debug/${{ parameters.SqlProjectName }}.dacpac'
              PublishProfile: '$(Pipeline.Workspace)/SQL_Dacpac/SQL/${{ parameters.SqlProjectName }}/${{ parameters.SqlProjectName }}.publish.xml'
              IpDetectionMethod: 'AutoDetect'


6) Azure DevOps Service Connection
Make sure you have a DevOps Service Connection (type Azure Resource Manager) and also make sure that your Azure SQL Server uses Azure Active Directory authentication (see your server in the Azure Portal and then go to Azure Active Directory in the left menu to check it).
Azure DevOps Project Service Connection

The Service Principal behind this Service Connection should be either a member of the Azure Active Directory group that is specified under Azure Active Directory admin on that page or...
Azure Active Directory authentication with AAD group

... or you can add the Service Principal as a user with a T-SQL command:
-- Add your Azure AD Service Principal as external to SQL DB
CREATE USER [YourServicePrincipal] FROM EXTERNAL PROVIDER;

-- Add db_owner role to your Service Principal
ALTER ROLE [db_owner] ADD MEMBER [YourServicePrincipal]
If you don't have a Service Principal then it is also possible to use a SQL database user. Please see the documentation of the Azure SQL Database deployment v1 task for that.
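As an indication, the deployment step from above could then look roughly like this with SQL authentication; double-check the exact input names in that documentation and store SqlUsername and SqlPassword as secret variables (both names are assumptions here):

          - task: SqlAzureDacpacDeployment@1
            displayName: '3 Deploy DacPac with SQL authentication'
            inputs:
              azureSubscription: '${{ parameters.ServiceConnection }}'
              AuthenticationType: 'server'
              ServerName: '${{ parameters.SqlServerName }}.database.windows.net'
              DatabaseName: '${{ parameters.SqlDatabaseName }}'
              SqlUsername: '$(SqlUsername)'   # assumed secret variable
              SqlPassword: '$(SqlPassword)'   # assumed secret variable
              deployType: 'DacpacTask'
              DeploymentAction: 'Publish'
              DacpacFile: '$(Pipeline.Workspace)/SQL_Dacpac/SQL/${{ parameters.SqlProjectName }}/bin/debug/${{ parameters.SqlProjectName }}.dacpac'
              IpDetectionMethod: 'AutoDetect'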

7) Testing
Now go to Pipelines and add a new YAML pipeline based on an existing YAML file in your repository, selecting the BuildSQL.yml file. Then either run it manually or change something in your database project in the repository to trigger the pipeline.
Run your new SQL Deployment pipeline

Conclusion
In this post you learned how to deploy your Azure SQL Databases with a YAML pipeline in Azure DevOps. Note that if you don't have public endpoints on your Azure SQL Server, you will need a self-hosted agent in your Azure subscription (a Virtual Machine with the DevOps agent software), because the Microsoft-hosted agents use public endpoints. This will be described in a next post.

Update: if you get a 'Get-SpnAccessToken is obsolete' warning during deployment, then check this solution.

Monday 7 November 2022

Create Data Lake containers and folders via DevOps

Case
I need a process to create Azure Data Lake containers throughout the DTAP environments of my Azure Data Platform. Doing this manually is not an option, because we want to minimize owner and contributor access to the Data Lake of acceptance and production, but Synapse and Data Factory don't have a standard activity to create ADL containers. How do we automatically create Azure Data Lake containers (and folders)?
Storage Account (datalake) containers

Solution
One option is to use a PowerShell script that is executed by the Custom activity in combination with an Azure Batch service, or an Azure Automation runbook with the same PowerShell script that is executed by a Web(hook) activity.

However, since you probably don't need to create new containers during every (ADF/Synapse) pipeline run, we suggest doing this via an Azure DevOps pipeline as part of your CICD process with the same PowerShell script. You could either create a separate CICD pipeline for it or integrate it in your Synapse or ADF pipeline.

The example below creates containers and optionally also folders and subfolders within these containers. Note that Synapse and Data Factory will also create folders themselves with, for example, the Copy Data activity.

1) Repos folder structure
For this example we use a CICD folder in the repos with subfolders for PowerShell, YAML and Json.
Repos folder structure

2) JSON config
Because we don't want to hardcode the containers and folders, we use a JSON file as input for the PowerShell script. This JSON file is stored within the JSON folder of the DevOps repository. We use the same JSON file for all environments, but you can of course create a separate file for each environment if you, for example, need different containers on production. Our file is called config_storage_account.json.

The folder array in this example is optional; when left empty, no folders will be created. You can create subfolders within folders by separating them with a forward slash.
{
    "containers": {
        "dataplatform": ["folder1", "folder2/subfolder1", "folder2/subfolder2"],
        "SourceX": ["Actual", "History"],
        "SourceY": ["Actual", "History"],
        "SourceZ": []
    }
}

3) PowerShell code
The PowerShell script called SetStorageAccounts.ps1 is stored in the PowerShell folder and contains three parameters:
  • ResourceGroupName - The name of the resource group where the storage account is located.
  • StorageAccountName - The name of the storage account.
  • JsonLocation - The location of the JSON config file in the repository (see previous step).
It checks the existence of both the config file and the storage account. Then it loops through the containers from the config and, within that container loop, through the folders of that specific container. For container names and folder paths it does some small corrections for commonly made mistakes.

Note that the script will not delete containers and folders (or set authorizations on them). This is of course possible, but make sure to test it very thoroughly; even with testing, a human error in the config file is easy to make and could cause a lot of data loss!
# This PowerShell script will create the containers provided in the JSON file
# It does not delete or update containers and folders or set authorizations
param (
    [Parameter (Mandatory = $true, HelpMessage = 'Resource group name of the storage account.')]
    [ValidateNotNullOrEmpty()]
    [string] $ResourceGroupName,

    [Parameter (Mandatory = $true, HelpMessage = 'Storage account name.')]
    [ValidateNotNullOrEmpty()]
    [string] $StorageAccountName,

    [Parameter (Mandatory = $true, HelpMessage = 'Location of config_storage_account.json on agent.')]
    [ValidateNotNullOrEmpty()]
    [string] $JsonLocation
 )

# Combine path and file name for JSON file. The file name is hardcoded and the
# same for each environment. Create an extra parameter for the filename if
# you need different files/configurations per environment.
$path = Join-Path -Path $JsonLocation -ChildPath "config_storage_account.json"
Write-output "Extracting containernames from $($path)"


# Check existence of the file path on the agent
if (Test-Path $path -PathType leaf) {
    
    # Get all container objects from JSON file
    $Config = Get-Content -Raw -Path $path | ConvertFrom-Json

    # Create containers array for looping
    $Config | ForEach-Object { 
        $Containers = $($_.containers) | Get-Member -MemberType NoteProperty | Select-Object -ExpandProperty Name
    }
    
    # Check Storage Account existence and get its context
    $StorageCheck = Get-AzStorageAccount -ResourceGroupName $ResourceGroupName -Name $StorageAccountName -ErrorAction SilentlyContinue        
    $context = $StorageCheck.Context

    # If Storage Account found
    if ($StorageCheck) {
        # Storage Account found
        Write-output "Storage account $($StorageAccountName) found"

        # Loop through container array and create containers if they don't exist
        foreach ($container in $containers) {
            # First a little cleanup of the container

            # 1) Change to lowercase
            $container = $container.ToLower()

            # 2) Trim accidental spaces
            $container = $container.Trim()


            # Check if container already exists
            Write-output "Checking existence of container $($container)"
            $ContainerCheck = Get-AzStorageContainer -Context $context -Name $container -ErrorAction SilentlyContinue 

            # If container exists
            if ($ContainerCheck) {
                Write-Output "Container $($container) already exists"
            }
            else {
                Write-Output "Creating container $($container)"
                New-AzStorageContainer -Name $container -Context $context | Out-Null
            }

            # Get container folders from JSON
            Write-Output "Retrieving folders from config"
            $folders = $Config.containers.$container
            Write-Output "Found $($folders.Count) folders in config for container $($container)"

            # Loop through container folders
            foreach ($folder in $folders) {
                # First a little cleanup of the folders

                # 1) Replace backslashes by a forward slash
                $path = $folder.Replace("\","/")

                # 2) Remove unwanted spaces
                $path = $path.Trim()
                $path = $path.Replace("/ ","/")
                $path = $path.Replace(" /","/")

                # 3) Check if path ends with a forward slash
                if (!$path.EndsWith("/")) {
                    $path = $path + "/"
                }
                    
                # Check if folder path exists
                $FolderCheck = Get-AzDataLakeGen2Item -FileSystem $container -Context $context -Path $path  -ErrorAction SilentlyContinue
                if ($FolderCheck) {
                    Write-Output "Path $($folder) exists in container $($container)"
                } else {
                    New-AzDataLakeGen2Item -Context $context -FileSystem $container -Path $path -Directory | Out-Null
                    Write-Output "Path $($folder) created in container $($container)"
                }
            }

        }
    } else {
        # Provided storage account not correct
        Write-Output "Storage account $($StorageAccountName) not available, containers not set up."
    }
} else {
    # Path to JSON file incorrect
    Write-output "File $($path) not found, containers not setup."
}
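To test the script outside DevOps you can also run it locally, for example like below (assuming the Az PowerShell modules are installed and you are logged in with Connect-AzAccount; the resource group, storage account and path are just examples):

.\SetStorageAccounts.ps1 -ResourceGroupName "myresourcegroup" `
                         -StorageAccountName "mydatalakestorage" `
                         -JsonLocation "C:\Repos\CICD\Json\"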

4) YAML file
If you integrate this in your existing Data Factory or Synapse YAML pipeline, then you only need to add one PowerShell step. Make sure you have a checkout step to copy the config and PowerShell files from the repository to the agent. You may also want to add a (temporary) treeview step to check the paths on your agent; this makes it easier to configure the paths within your YAML code.
parameters:
  - name: SerCon
    displayName: Service Connection
    type: string
  - name: Env
    displayName: Environment
    type: string
    values: 
    - DEV
    - ACC
    - PRD
  - name: ResourceGroupName
    displayName: Resource Group Name
    type: string
  - name: StorageAccountName
    displayName: Storage Account Name
    type: string

jobs:
    - deployment: deploymentjob${{ parameters.Env }}
      displayName: Deployment Job ${{ parameters.Env }} 
      environment: Deploy to ${{ parameters.Env }}

      strategy:
        runOnce:
          deploy:
            steps:
            ###################################
            # 1 Check out repository to agent
            ###################################
            - checkout: self
              displayName: '1 Retrieve Repository'
              clean: true 

            ###################################
            # 2 Show environment and treeview
            ###################################
            - powershell: |
                Write-Output "Deploying Synapse in the ${{ parameters.Env }} environment"
                tree "$(Pipeline.Workspace)" /F
              displayName: '2 Show environment and treeview Pipeline_Workspace'

            ###################################
            # 3 Create containers in datalake
            ###################################
            - task: AzurePowerShell@5
              displayName: '3 Create data lake containers'
              inputs:
                azureSubscription: ${{ parameters.SerCon }}
                scriptType: filePath
                scriptPath: $(Pipeline.Workspace)\s\CICD\PowerShell\SetStorageAccounts.ps1
                scriptArguments:
                  -ResourceGroupName ${{ parameters.ResourceGroupName }} `
                  -StorageAccountName ${{ parameters.StorageAccountName }} `
                  -JsonLocation $(Pipeline.Workspace)\s\CICD\Json\
                azurePowerShellVersion: latestVersion
                pwsh: true
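
If you prefer a separate stage in your pipeline instead of integrating the step, calling the template above could look roughly like this (the stage name, template file name and variable group are assumptions for this sketch):

- stage: CreateContainersDEV
  displayName: Create containers DEV
  variables:
    - group: ParamsDev                 # assumed variable group that holds the values below
  jobs:
    - template: CreateContainers.yml   # assumed file name of the YAML above
      parameters:
        SerCon: $(ServiceConnection)
        Env: DEV
        ResourceGroupName: $(ResourceGroupName)
        StorageAccountName: $(StorageAccountName)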

5) The result
Now it's time to run the YAML pipeline and check the Storage Account to see whether the containers and folders have been created.
DevOps logs of creating containers and folders

Created data lake folders in container

Conclusion
In this post you learned how to create containers and folders in the Storage Account / Data Lake via a little PowerShell script and a DevOps pipeline, but you can also reuse this PowerShell script for the alternative solutions mentioned above.

Again the note about also deleting containers and folders: make sure to double-check the code, but also the procedures, to avoid human errors and potentially losing a lot of data. You might want to set up soft deletes in your storage account to have a fallback scenario for screw-ups.