Thursday 15 October 2020

Synapse snack: No new Azure SQL Database allowed

Case
I'm creating a new Synapse Analytics Workspace, but the deployment fails with the following error:
Location '' is not accepting creation of new
Windows Azure SQL Database servers at this time.
(Code: SqlServerRegionDoesNotAllowProvisioning)

Solution
This doesn't mean that all Azure SQL Servers in your region are sold out and that you have to wait until Microsoft buys more servers. It means that the SQL resource provider (Microsoft.Sql) is not registered on your Azure subscription.

This will probably not happen very often because people who want to use Synapse have almost certainly used an Azure SQL Server before, but on a new subscription it is not registered by default.

When creating a new Synapse Workspace it does ask to first register the Microsoft.Synapse resource provider, but it doesn't mention that it also requires Microsoft.Sql.
The Synapse resource provider needs to be registered to this subscription

Register Resource Providers
The solution is very simple, but it does require owner permissions on the Azure Subscription.
  • In the Azure Portal go to Subscriptions (search for Subscriptions if it's not on the dashboard or in the menu)
  • Click on your Subscription if you have multiple subscriptions to go to the overview page of your subscription
  • In the left menu (at the bottom) click on Resource Providers
  • Search for SQL
  • Select Microsoft.Sql and click on the Register button
Register a Resource Provider
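If you prefer scripting over clicking, the registration can also be done with Azure PowerShell. A minimal sketch, assuming you are logged in with an account that has owner permissions on the subscription:
# PowerShell code
# Register the Microsoft.Sql resource provider on the current subscription
Register-AzResourceProvider -ProviderNamespace "Microsoft.Sql"

# Check the registration state (it should change from Registering to Registered)
Get-AzResourceProvider -ProviderNamespace "Microsoft.Sql" |
    Select-Object ProviderNamespace, RegistrationState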

Registering will take a couple of minutes and after that you can create a Synapse Workspace without the above error. One last observation: it doesn't create any (visible) Azure SQL Servers in any of the resource groups.
Deployment of Synapse Workspace succeeded


Tuesday 13 October 2020

Execute pipelines from another Data Factory (Sync)

Case
Can I use the Execute Pipeline Activity to execute pipelines from another Data Factory? Or do we need another activity for this?
Can you use the Execute Pipeline Activity for this?

Solution
First of all, why would you want multiple Data Factories other than for a DTAP street (Development, Testing, Acceptance and Production)? There are several reasons, for example:
  • Different departments/divisions each having their own Data Factory
    For example to prevent changes to your pipelines by users of a different department or to make it easier to split the Azure consumption between two divisions.
  • Different regions for international companies
    Either due to legal reasons, where for example the data may not leave the European Union,
    or to prevent paying unnecessary outbound data costs when your data is spread over different regions
  • Security reasons
    To prevent others from using the access provided via the Managed Identity of ADF. If you give your ADF access via MSI to an Azure Key Vault or an Azure Storage Account, then everybody using that ADF can access that service via ADF.
If you have any other good reasons to use multiple Data Factories please let us know in the comments below.

And although you may have multiple Data Factories, you can still use one Data Factory to execute pipelines from a different Data Factory. However, you cannot use the Execute Pipeline Activity, because it can only execute pipelines within the same Data Factory.

You can either use the Web Activity or the Webhook Activity for this. The Web Activity always executes the pipeline asynchronously, which means it does not wait for the result. The Webhook Activity executes the child pipeline synchronously, which means it waits until the child pipeline is finished, and you can also retrieve the execution result via a callback. In this blog post we will show the synchronous Webhook Activity; in a previous blog post we already showed the asynchronous Web Activity.

1) Give parent access to child via MSI
We will not use a user to execute the pipeline in the child(/worker) Data Factory, but instead we will give the managed identity (MSI) of the parent(/master) Data Factory access to the child(/worker) Data Factory. The minimum role needed is Data Factory Contributor, but you could also use a regular Contributor or Owner (but less is more).
  • Go to the child(/worker) Data Factory (DivisionX in this example)
  • In the left menu click on Access control (IAM)
  • Click on the +Add button and choose Add role assignment
  • Select Data Factory Contributor as Role
  • Use Data Factory as Assign access to
  • Optionally change the subscription
  • Optionally enter a (partial) name of your parent ADF (if you have a lot of data factories)
  • Select your parent ADF and click on the Save button
Give one ADF access to other ADF
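The same role assignment can also be scripted with Azure PowerShell. A minimal sketch, where the resource group and factory names are examples and the Az.DataFactory and Az.Resources modules are assumed:
# PowerShell code
# Look up the managed identity (MSI) of the parent ADF and the resource id of the child ADF
$parentAdf = Get-AzDataFactoryV2 -ResourceGroupName "MasterRG" -Name "Master-ADF"
$childAdf  = Get-AzDataFactoryV2 -ResourceGroupName "DivisionX" -Name "DivisionX-ADF"

# Give the parent ADF the Data Factory Contributor role on the child ADF
New-AzRoleAssignment -ObjectId $parentAdf.Identity.PrincipalId `
                     -RoleDefinitionName "Data Factory Contributor" `
                     -Scope $childAdf.DataFactoryId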

2) Add Call Back to Child Pipeline
The parent(/master) pipeline will call the child(/worker) and wait until it receives a callback or until the timeout expires. Therefore we need to add a callback activity (in the form of a Web/Webhook Activity) to the child pipeline. This activity should be the last activity in your pipeline. There can be more (for example one for success and one for failure), but only the first callback will be handled by the parent.
  • Go to the child(/worker) Data Factory (DivisionX in this example)
  • Open ADF via the Author & Monitor link
  • Open the pipeline that you want to execute from the parent(/master) pipeline
  • Add a String parameter called callBackUri. The value will be automatically provided by the parent
  • Add a String parameter called myInputParam1. The value will be manually provided by the parent
  • Add a Web Activity with the following settings
    • URL: @pipeline().parameters.callBackUri   (to retrieve the pipeline parameter)
    • Method: POST
    • Body: {"Output":{"myOutputParam1":"failed"},"StatusCode":"401"}
      Every status code above 399 will tell the parent that the child failed. This can be a random number, but you could also use the official list of HTTP status codes to give it a little more meaning. Another option is the Output tag, which can contain one or more 'output' parameters that can be read by the parent pipeline. You could for example pass through the error message with the Add dynamic content option.

      If you also want to pass an error description to the parent then you must extend the JSON message with an error tag: {"Output":{"myOutputParam1":"failed"},"StatusCode":"401","error":{"ErrorCode":"ParameterError","Message":"Required parameters were not provided"}}
Add two pipeline parameters

Web Activity for callback 

3) Determine URL
This solution calls the Create Run REST API of ADF to execute the pipeline. For this you need to replace the marked parts of the URL below with the Subscription ID, Resource Group name, Data Factory name and Pipeline name of the child(/worker) pipeline. We will use this URL in the next step.

Example URL:
https://management.azure.com/subscriptions/aaaaaa-bbbb-1234-cccc-123456789/resourceGroups/DivisionX/providers/Microsoft.DataFactory/factories/DivisionX-ADF/pipelines/MyChildPipeline/createRun?api-version=2018-06-01
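A small PowerShell sketch that composes this URL from its parts (all values below are examples) can help avoid typos:
# PowerShell code
# Compose the Create Run URL for the child pipeline (example values)
$subscriptionId = "aaaaaa-bbbb-1234-cccc-123456789"
$resourceGroup  = "DivisionX"
$dataFactory    = "DivisionX-ADF"
$pipelineName   = "MyChildPipeline"

$url = "https://management.azure.com/subscriptions/$subscriptionId" +
       "/resourceGroups/$resourceGroup/providers/Microsoft.DataFactory" +
       "/factories/$dataFactory/pipelines/$pipelineName/createRun?api-version=2018-06-01"
$url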


4) Webhook Activity
For this second example we will use the Webhook Activity. It will execute the pipeline of the child(/worker) Data Factory, but now it will wait until it receives a callback or until the timeout expires. Besides a status it can also retrieve messages from the child(/worker) ADF. See the JSON message in step two.
  • Go to your master ADF and click on Author & Monitor
  • Create a new pipeline and add a Webhook Activity to the canvas of the new pipeline
  • Give it a suitable descriptive name on the General Tab
  • Go to the Settings tab and enter the URL of the previous step
  • Choose POST as Method
  • Add a new header called Content-Type and with value application/json
  • As Body enter a JSON message. This could either be a dummy message or you could supply parameters in this message. The child parameter is called myInputParam1: {"myInputParam1":"bla bla"}
  • Use MSI as Authentication method
  • Enter this URL as Resource: https://management.core.windows.net/
Webhook Activity calling a child pipeline in an other ADF

5) Testing
Now trigger the new parent pipeline and check the monitor of the child ADF. You will see it receives two parameters: one provided in the JSON message in the Body property and one provided by the Webhook Activity itself.
Child pipeline - two parameters

Note that the child pipeline did not fail, because we handled the error. However, the parent pipeline did fail because we sent a status code higher than 399. Next check the monitor of the parent ADF and see the output parameters of the Webhook Activity. It received a value from the child pipeline that could be used in a next activity.
Parent pipeline - the result of the callback

Summary
In this blog post you learned how to give one ADF access to another ADF and how to execute pipelines in that other ADF. The Webhook Activity solution gives you a little more control compared to the Web Activity. And with the correct JSON messages you can pass through messages and parameters from the parent to the child and back to the parent.


Saturday 10 October 2020

Snack: Create Azure Automation Runbook

Case
How do you write and execute PowerShell code in Azure?

Azure Automation PowerShell Runbook

Solution
There are several options to execute PowerShell code in Azure, for example in an Azure Function or in an Azure DevOps pipeline. This post shows you how to write and execute PowerShell code in an Azure Automation Runbook. Other blog posts with PowerShell solutions for DWH projects will link to this blog post (to avoid repeating ourselves). You can find them here.

Note: the screenshots and animated gifs could be slightly outdated on the next layout change of the Azure portal, but we will try to occasionally update those.

1) Create Azure Automation Account
To create a PowerShell runbook we first need to create an Automation Account. If you already have one with the Run As account enabled then you can skip this step. The Run As account allows you to log in and easily interact with other Azure services.
  • Go to the Azure portal and create a new resource
  • Search for automation
  • Select Automation Account
  • Choose a useful name for the Automation Account (it will probably host multiple runbooks)
  • Select your Subscription, Resource Group and the Region
  • Most examples will use the Azure Run As account. So make sure to enable it and then click on the Create button.
Create Azure Automation Account
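If you prefer scripting, the account itself can also be created with Azure PowerShell. A minimal sketch (resource group, name and location are examples); note that this does not enable the Run As account, which you still have to do in the portal or with a separate script:
# PowerShell code
# Create a new Automation Account (names and location are examples)
New-AzAutomationAccount -ResourceGroupName "bitools" `
                        -Name "bitools-automation" `
                        -Location "West Europe"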

2) Add Modules
Before you start writing code you often first need to add some PowerShell modules to your Azure Automation Account. For this example we will add a PowerShell module called Az.Sql, which is for example used to up- and downscale Azure SQL Databases. Note that modules often depend on other modules; this one depends on Az.Accounts. Make sure to add those modules first; you will be notified if another module is required. Try to avoid the outdated AzureRM modules and use the Az modules instead (you cannot mix them).

If you forget this step you will get error messages while running your code that state that some of your commands are not recognized:
Get-AzSqlDatabase : The term 'Get-AzSqlDatabase' is not recognized as the name of a cmdlet, function, script 
file, or operable program. Check the spelling of the name, or if a path was included, verify that the path is correct 
and try again.
  • Go to the newly created Azure Automation Account
  • Go to Modules in the left menu
  • Click on the Browse Gallery button
  • Search for Az.Accounts
  • Click on Az.Accounts in the result and import this module
  • Also search for Az.Sql (but wait until Az.Accounts is actually imported)
  • Click on Az.Sql in the result and import this module
Adding a new module to your Automation Account
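Adding modules can also be scripted with New-AzAutomationModule. A sketch, assuming the PowerShell Gallery package URL format below; import Az.Accounts first and wait until it shows up as Available before importing Az.Sql:
# PowerShell code
# Import Az.Accounts from the PowerShell Gallery into the Automation Account
New-AzAutomationModule -ResourceGroupName "bitools" `
                       -AutomationAccountName "bitools-automation" `
                       -Name "Az.Accounts" `
                       -ContentLinkUri "https://www.powershellgallery.com/api/v2/package/Az.Accounts"

# Once Az.Accounts is Available, import Az.Sql the same way
New-AzAutomationModule -ResourceGroupName "bitools" `
                       -AutomationAccountName "bitools-automation" `
                       -Name "Az.Sql" `
                       -ContentLinkUri "https://www.powershellgallery.com/api/v2/package/Az.Sql"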

Note: if you are using an existing Automation Account then you probably already added Az.Accounts, but you might need to update it to a newer version.

2b) Update Modules
Sometimes you need to update one or more modules within your Automation Account. Not sure why, but unfortunately they removed this option from the Azure Automation GUI! More information here. The new way is to import PowerShell code into a new runbook and then run that runbook to update all your modules (or manually delete all the modules one by one and then add them again).
  • Go to GitHub and click on the little triangle on the Code button to download the zip file
  • Unzip the downloaded file
  • Go to Runbooks and click on Import a runbook
  • Import the local file Update-AutomationAzureModulesForAccount.ps1
  • Give the runbook a description (Name and Runbook type are automatically populated)
  • Edit the runbook and open the Test pane
  • Enter values for the parameters; besides ResourceGroupName and AutomationAccountName, fill in AzureModuleClass with Az (the default value is AzureRm)
  • Hit the Start button and check the output
Update all modules

3) Create Runbook
Now we are ready to create a PowerShell runbook in the newly created Azure Automation Account and start writing some PowerShell code.
  • Go back to the overview page of your newly created Azure Automation Account
  • Click on Runbooks in the left menu
  • Click on the + Create a runbook button to create a new Runbook
  • Enter a descriptive name for the Runbook like UpDownScaleDb
  • Select PowerShell as Runbook type
  • Add a short description to explain the purpose of your code and click on the Create button
Create PowerShell Runbook
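The same runbook can also be created with PowerShell (a sketch, reusing the example names from above):
# PowerShell code
# Create an empty PowerShell runbook in the Automation Account
New-AzAutomationRunbook -ResourceGroupName "bitools" `
                        -AutomationAccountName "bitools-automation" `
                        -Name "UpDownScaleDb" `
                        -Type PowerShell `
                        -Description "Up- and downscale an Azure SQL Database"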

4) Edit Runbook code
Next edit the new Runbook if it wasn't already opened by the previous step and start writing (or pasting) code in the editor. It usually consists of three parts: parameters, login and the actual code. Make sure not to store secrets in your code, but use Azure Key Vault instead. Want to send a notification from within your runbook? Then use SendGrid.
# PowerShell example code for testing
Param
(
    # Get your name
    [Parameter(Mandatory=$true,Position=1)]
    [ValidateLength(1,50)]
    [string]$Name
)

Write-Output "Hello $($Name)"

5) Testing
It is often easier to write and test the main code first in PowerShell ISE or Visual Studio Code on your local machine, which is much faster and easier to debug. But to test the complete code, including parameters and the login, you can test your code within the runbook editor. Note that it can take over a minute before the code executes, since it will first enter an execution queue.
  • Click on the Test pane button above your script.
  • Then optionally fill in the parameter values on the left side
  • And then hit the Start button to execute the script.
Testing your runbook

6) Scheduling Runbook
If you want to schedule the execution of your runbook in Azure Automation you first need to publish it via the Runbook editor. After it has been published you can add a schedule to this runbook.
  • Edit the script in the runbook editor
  • Click on publish (the editor will close and you will be redirected to the overview page)
  • In the overview page click on Link to schedule
  • In the Schedule menu you can select an existing schedule or create a new one
  • In the Parameter menu you can provide the value for the parameters
Add schedule to runbook
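Scheduling can also be scripted. A sketch that creates a daily schedule and links it to the runbook, including a value for the runbook parameter (schedule name and start time are examples):
# PowerShell code
# Create a daily schedule starting tomorrow at 07:00
New-AzAutomationSchedule -ResourceGroupName "bitools" `
                         -AutomationAccountName "bitools-automation" `
                         -Name "DailyAt7" `
                         -StartTime (Get-Date).Date.AddDays(1).AddHours(7) `
                         -DayInterval 1

# Link the schedule to the runbook and provide the parameter values
Register-AzAutomationScheduledRunbook -ResourceGroupName "bitools" `
                                      -AutomationAccountName "bitools-automation" `
                                      -RunbookName "UpDownScaleDb" `
                                      -ScheduleName "DailyAt7" `
                                      -Parameters @{ Name = "World" }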

7) Add Webhook
If you don't want to schedule your runbook, but call it from another service like Azure Data Factory, you have to create a webhook for ADF.

Make sure to choose a correct expiry date. By default it is only valid for one year. Also make sure to copy the URL, because you can only see and copy it once. You need this URL in ADF for the Web(hook) activity. Don't share the URL, because it is a URL and password in one.
Adding a Webhook to a Runbook
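Creating the webhook can also be scripted. The URL is only returned once, so capture it right away (webhook name and expiry time are examples):
# PowerShell code
# Create a webhook on the runbook; store the URI immediately because you cannot retrieve it later
$webhook = New-AzAutomationWebhook -ResourceGroupName "bitools" `
                                   -AutomationAccountName "bitools-automation" `
                                   -RunbookName "UpDownScaleDb" `
                                   -Name "AdfWebhook" `
                                   -IsEnabled $true `
                                   -ExpiryTime (Get-Date).AddYears(1) `
                                   -Force

# Copy this URL to the Web(hook) activity in ADF and treat it as a secret
$webhook.WebhookURI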

Summary
In this post we gave a general explanation on how to create and use an Azure Automation PowerShell Runbook. The real solutions are available here and from now on posts about Azure Automation will link to this post for the general steps.






Friday 2 October 2020

Pause all/an Azure Synapse SQL Pool(s) (Az)

Case
I want to schedule a pause of my Azure Synapse SQL Pools to save some money on my Azure bills. Back in 2017, when Synapse was still called SQL Data Warehouse, we wrote a post with the AzureRM modules, which are now outdated. How does it work with Az?
Pause and resume Synapse SQL Pools

Solution
For this example we have two scripts. The first one pauses all Azure Synapse SQL Pools within an Azure subscription. This is especially handy for Development, Test or Acceptance environments where you only turn on the Synapse SQL Pools when you need to develop or test something. When you forget to pause them afterwards, this script will pause them all at a scheduled moment. The second script pauses (or resumes) one specific Synapse SQL Pool. That is probably more suitable for a production environment where you don't want to ruthlessly pause all Synapse SQL Pools.

Note: All basics to create your first Azure Automation Runbook can be found here. Combine that with the code below.

1) Modules
If you want to use this script in an Azure Automation Runbook you first need to add the module Az.Sql, which in turn requires the module Az.Accounts. If you already added AzureRM modules to your Azure Automation account then that is no problem, as long as you don't mix them within one runbook. On your own PC you cannot have both installed: you first need to remove all AzureRM modules before you can add the Az modules.
  • Go to your Azure Automation account in the Azure Portal
  • Click on Modules in the left menu
  • Click on Browse Gallery (not on Add a module)
  • Search for Az.Sql and click on it
  • Next click on Import and then on the Ok button
Note 1: If you haven't installed Az.Accounts yet, it will ask you to do that first.
Note 2: It takes a few minutes to import a module.
Adding a module

2) Login
If you want to run the script in PowerShell ISE then you first need to login and select your subscription if you have multiple subscriptions.
# PowerShell code
# Login to Azure (browser popup will appear)
Connect-AzAccount -Confirm

# Optional: select your subscription
Set-AzContext -SubscriptionName "mysubscription"
Login to Azure with PowerShell ISE

In your Runbook you need to add the following code instead, which uses your Run As account to log in. Please read this blog post for all details.
# PowerShell code
########################################################
# Log in to Azure with AZ (standard code)
########################################################
Write-Verbose -Message 'Connecting to Azure'

# Name of the Azure Run As connection
$ConnectionName = 'AzureRunAsConnection'
try
{
    # Get the connection properties
    $ServicePrincipalConnection = Get-AutomationConnection -Name $ConnectionName        
 
    'Log in to Azure...'
    $null = Connect-AzAccount `
        -ServicePrincipal `
        -TenantId $ServicePrincipalConnection.TenantId `
        -ApplicationId $ServicePrincipalConnection.ApplicationId `
        -CertificateThumbprint $ServicePrincipalConnection.CertificateThumbprint 
}
catch 
{
    if (!$ServicePrincipalConnection)
    {
        # You forgot to turn on 'Create Azure Run As account' 
        $ErrorMessage = "Connection $ConnectionName not found."
        throw $ErrorMessage
    }
    else
    {
        # Something else went wrong
        Write-Error -Message $_.Exception.Message
        throw $_.Exception
    }
}
########################################################

3) The pause all script
Below the above script we need to add the following script, which first gets all Azure SQL Servers and then checks whether they also host a Synapse SQL Pool. When one is found it checks the status. If it is still Online (active), it will pause it. This will take several minutes. Afterwards it rechecks the status and shows how long the pause took. Feel free to add more checks and let us know in the comments what you added.
# PowerShell code
# Get all SQL Servers to check whether they host a Synapse SQL Pool
$allSqlServers = Get-AzSqlServer

# Loop through all SQL Servers
foreach ($sqlServer in $allSqlServers)
{
    # Log which SQL Servers are checked and in which resource group
    Write-Output "Checking SQL Server [$($sqlServer.ServerName)] in Resource Group [$($sqlServer.ResourceGroupName)] for Synapse SQL Pools"
    
    # Get all databases from a SQL Server, but filter on Edition = "DataWarehouse"
    $allSynapseSqlPools = Get-AzSqlDatabase -ResourceGroupName $sqlServer.ResourceGroupName `
                                           -ServerName $sqlServer.ServerName `
                                           | Where-Object {$_.Edition -eq "DataWarehouse"}
    # Loop through each found Synapse SQL Pool
    foreach ($synapseSqlPool in $allSynapseSqlPools)
    {
        # Show status of found Synapse SQL Pool
        # Available statuses: Online Paused Pausing Resuming
        Write-Output "Synapse SQL Pool [$($synapseSqlPool.DatabaseName)] found with status [$($synapseSqlPool.Status)]"
        
        # If status is online then pause Synapse SQL Pool
        if ($synapseSqlPool.Status -eq "Online")
        {
            # Pause Synapse SQL Pool
            $startTimePause = Get-Date
            Write-Output "Pausing Synapse SQL Pool [$($synapseSqlPool.DatabaseName)]"
            $resultsynapseSqlPool = $synapseSqlPool | Suspend-AzSqlDatabase

            # Show that the Synapse SQL Pool has been paused and how long it took
            $endTimePause = Get-Date
            $durationPause = New-TimeSpan -Start $startTimePause -End $endTimePause
            $synapseSqlPool = Get-AzSqlDatabase -ResourceGroupName $sqlServer.ResourceGroupName `
                                                -ServerName $sqlServer.ServerName `
                                                -DatabaseName $synapseSqlPool.DatabaseName
            Write-Output "Synapse SQL Pool [$($synapseSqlPool.DatabaseName)] paused in $($durationPause.Hours) hours, $($durationPause.Minutes) minutes and $($durationPause.Seconds) seconds. Current status [$($synapseSqlPool.Status)]"
        }
    }
}
In PowerShell ISE the result will look like this. To see the result of your runbook you need to check the jobs and then the output.
Running the script in PowerShell ISE

4) The pause one script - parameters
To select only one Synapse SQL Pool we need to provide three parameters: the Resource Group name, the SQL Server name and the name of the SQL Pool. For this we add parameter code at the beginning of the complete script (above the login). You can add parameter validations to make it more monkey proof.
# PowerShell code
<#
    .SYNOPSIS
        Pause an Azure Synapse SQL Pool
    .DESCRIPTION
        By providing the following parameters you can pause one
        specific Azure Synapse SQL Pool. It will only pause when
        the status is 'online'

    .PARAMETER resourceGroupName
        This is the Resource group where Azure Synapse Analytics
        SQL Pool is located

    .PARAMETER sqlServerName
        This is the name of the Azure SQL Server hosting the Azure
        Synapse Analytics SQL Pool
    
    .PARAMETER SynapseSqlPoolName
        This is the name of the Azure Synapse Analytics SQL Pool

#>
Param(
    # This is the Resource group where Azure Synapse Analytics SQL Pool is located   
    [Parameter(Mandatory=$True)]  
    [String] $resourceGroupName
    ,
    # This is the name of the Azure SQL Server hosting the Azure Synapse Analytics SQL Pool
    [Parameter(Mandatory=$True)]  
    [String] $sqlServerName
    ,
    # This is the name of the Azure Synapse Analytics SQL Pool
    [Parameter(Mandatory=$True)]  
    [String] $SynapseSqlPoolName
) 

5) The pause one script
Now the actual script to pause only one SQL Pool. This replaces the script of step 3. If you also want a resume script, you just have to replace Suspend-AzSqlDatabase with Resume-AzSqlDatabase and of course change some of the texts and the if statement (Online => Paused). You could also merge the pause and resume script into one script by adding an extra parameter to indicate what you want to do (a sketch follows below the script).
# PowerShell code
# Get one specific Synapse SQL Pool
$synapseSqlPool = Get-AzSqlDatabase -ResourceGroupName $resourceGroupName `
                                    -ServerName $sqlServerName `
                                    -DatabaseName $SynapseSqlPoolName `
                                    | Where-Object {$_.Edition -eq "DataWarehouse"}

# Check if the Synapse SQL Pool can be found with the provided parameters
if ($synapseSqlPool)
{
    # Show status of found Synapse SQL Pool
    # Available statuses: Online Paused Pausing Resuming
    Write-Output "Synapse SQL Pool [$($synapseSqlPool.DatabaseName)] found with status [$($synapseSqlPool.Status)]"

    # If status is online then pause Synapse SQL Pool
    if ($synapseSqlPool.Status -eq "Online")
    {
        # Pause Synapse SQL Pool
        $startTimePause = Get-Date
        Write-Output "Pausing Synapse SQL Pool [$($synapseSqlPool.DatabaseName)]"
        $resultsynapseSqlPool = $synapseSqlPool | Suspend-AzSqlDatabase

        # Show that the Synapse SQL Pool has been paused and how long it took
        $endTimePause = Get-Date
        $durationPause = New-TimeSpan -Start $startTimePause -End $endTimePause
        $synapseSqlPool = Get-AzSqlDatabase -ResourceGroupName $resourceGroupName `
                                            -ServerName $sqlServerName `
                                            -DatabaseName $SynapseSqlPoolName
        Write-Output "Synapse SQL Pool [$($synapseSqlPool.DatabaseName)] paused in $($durationPause.Hours) hours, $($durationPause.Minutes) minutes and  $($durationPause.Seconds) seconds. Current status [$($synapseSqlPool.Status)]"
    }
}
else
{
    Throw "Synapse SQL Pool [$($SynapseSqlPoolName)] not found. Check parameter values."
}
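As a sketch of that last suggestion: add an extra parameter (here called Action, a name chosen for this example) and branch on it. Only the relevant fragment is shown; the parameters, login and lookup code stay the same.
# PowerShell code
# Extra parameter to choose between pausing and resuming (add it to the Param block):
#     [Parameter(Mandatory=$True)]
#     [ValidateSet("Pause","Resume")]
#     [String] $Action

if ($Action -eq "Pause" -and $synapseSqlPool.Status -eq "Online")
{
    Write-Output "Pausing Synapse SQL Pool [$($synapseSqlPool.DatabaseName)]"
    $synapseSqlPool | Suspend-AzSqlDatabase
}
elseif ($Action -eq "Resume" -and $synapseSqlPool.Status -eq "Paused")
{
    Write-Output "Resuming Synapse SQL Pool [$($synapseSqlPool.DatabaseName)]"
    $synapseSqlPool | Resume-AzSqlDatabase
}
else
{
    Write-Output "Nothing to do: current status is [$($synapseSqlPool.Status)]"
}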

Running the script in PowerShell ISE

Summary
In this post you saw how to pause one or more Synapse SQL Pools to save some money on your Azure bill. Note that the storage costs will continue when you pause Synapse. The next step is to either schedule it within your Azure Automation account or to add a webhook and execute it from another service like Azure Data Factory.

Besides scripting you can also use the REST API of Synapse, which is particularly easy if you want to pause or resume from within an ADF pipeline, but that will be explored in another blog post.