Friday, 2 April 2021

ADF Snack - hide activity input and output from log

Case
I'm retrieving sensitive secrets from Azure Key Vault in an Azure Data Factory pipeline and using those values to call another service. Both the input and the output containing the secret value show up in the Data Factory log. Is there a way to prevent logging sensitive inputs or outputs in ADF?
Secret unintentionally showed in log

Solution
The solution is very simple. Almost all activities have a Secure input and a Secure output option on the General tab. When you check the Secure input box, none of the input will show up in the logs, and when you check the Secure output box, none of the output will be logged. This is very useful for keeping passwords and other secrets used in your pipeline out of the logs.
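Behind the scenes these two checkboxes map to the secureInput and secureOutput properties in the activity's policy in the pipeline JSON. A minimal sketch of a Web activity retrieving a Key Vault secret with both options enabled (the activity name and secret URL are illustrative placeholders):

```json
{
    "name": "Get Secret from Key Vault",
    "type": "WebActivity",
    "policy": {
        "secureInput": true,
        "secureOutput": true
    },
    "typeProperties": {
        "url": "https://{your-vault}.vault.azure.net/secrets/{secret-name}?api-version=7.0",
        "method": "GET",
        "authentication": {
            "type": "MSI",
            "resource": "https://vault.azure.net"
        }
    }
}
```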

Now we run the same pipeline again with the Secure output option turned on, and the secret is no longer visible in the log.
No more output

And you can of course still use the output in a subsequent activity with an expression. Just be sure to hide the input of that activity as well. Note that some activities, like Set variable, don't have the options to hide input and output.
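Assuming the secret was retrieved by a Web activity named 'Get Secret' calling the Key Vault secrets REST API (which returns the secret in a value property), the hidden output can still be referenced in a later activity with an expression like:

```
@activity('Get Secret').output.value
```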
You can still use hidden output

Summary
In this post you learned how to hide sensitive input and output from activities so that they don't show up in the Azure Data Factory logging. The Key Vault example used in this post is explained here.



Saturday, 6 March 2021

dataMinds Saturday - REST APIs in Data Factory

On Saturday March 13th I will present a session about using REST APIs in Azure Data Factory instead of scripting. PowerShell has been our friend in Azure for a long time, but it's not everybody's cup of tea. Most BI developers, and probably also a lot of DWH developers, would rather do anything else before they start writing PowerShell scripts.

Luckily Microsoft brought REST APIs to almost every service in Azure. So instead of writing, and maintaining, PowerShell scripts that are easily over 100 lines of code, you now have one URL to accomplish the same task. In this session I will show you how to handle those REST APIs in Azure Data Factory pipelines (they work in Synapse Studio pipelines as well). Join me via https://datamindssaturday.be/

PowerPoint will be available after the presentation

Demo material

    Monday, 1 March 2021

    Pausing/resuming Dedicated SQL Pools from Synapse

    Case
    In a previous post we described how to pause and resume Azure Dedicated SQL Pools via Azure Data Factory. The question we are going to solve now is: "How can we pause and resume from within an Azure Synapse workspace?"


    Pause / Resume SQL dedicated Pools

    Solution
    Pausing and resuming from within an Azure Synapse workspace is very similar, but there are some differences because there are two types of Azure Dedicated SQL Pools. When you create one from within the Azure Synapse workspace, the provider will be 'Microsoft.Synapse' (hosted in Synapse), but when you create one outside the Azure Synapse workspace, the provider will be 'Microsoft.Sql' (hosted in SQL Server). Both have different REST APIs and different security roles. In this example we will demo the Synapse-hosted version, but we will also mention the differences. For the SQL version you can also take a look at our previous post, which handles the SQL version from within Data Factory.


    1) Give Azure Synapse Access to SQL Dedicated Pools
    To call the REST API we need to grant our Azure Synapse workspace access to the Dedicated SQL Pool, or more specifically to the Synapse workspace hosting it. Perhaps a bit odd, but we need to grant Synapse access to its own resources (for the SQL version you need to grant access to the SQL Server hosting the SQL Pool).

    For the security role you should always avoid 'Owner' and use 'Contributor', because the Owner role can also change security-related items (for the SQL version you can use 'SQL Server Contributor').
    • In the Azure portal go to the Azure Synapse Workspace hosting the SQL Pool that you want to pause or resume
    • In the left menu click on Access control (IAM)
    • Click on Add, Add role assignment
    • In the 'Role' drop down select 'Contributor'
    • In the 'Assign access to' drop down select user, group or service principal.
    • Search for your Synapse workspace name (in our example 'gansdorp'), select it and click on Save

    Grant Synapse Contributor role to SQL Server

    If you forget this step then you will receive an authorization error while executing your Synapse pipeline.
    2108 Authorization Failed

    {"error":
    {"code":"AuthorizationFailed"
    ,"message":"The client 'xxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx' with object id 'xxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx' does not have authorization to perform action 'Microsoft.Synapse/workspace/resume/action' over scope '/subscriptions/xxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourceGroups/RG_bitools/providers/Microsoft.Synapse/workspace/gansdorp/dedsqlpools' or the scope is invalid. If access was recently granted, please refresh your credentials."}
    }
    

    2) Determine URL
    Now it is almost time to edit your Synapse pipeline. The first step will be adding a Web activity to call the REST API (for the SQL version you must use this REST API). Find the operation you want to perform and then use the example URL to construct the new URL.

    Pause compute
    https://management.azure.com/subscriptions/{subscription-id}/resourceGroups/{resource-group-name}/providers/Microsoft.Synapse/workspaces/{workspacename}/sqlPools/{DedicatedSQLPoolName}/pause?api-version=2019-06-01-preview

    Resume compute
    https://management.azure.com/subscriptions/{subscription-id}/resourceGroups/{resource-group-name}/providers/Microsoft.Synapse/workspaces/{workspacename}/sqlPools/{DedicatedSQLPoolName}/resume?api-version=2019-06-01-preview

    Within these URLs you need to replace all parts that start and end with a curly bracket: {subscription-id}, {resource-group-name}, {workspacename} and {DedicatedSQLPoolName} (including the brackets themselves).

    Example URL
    https://management.azure.com/subscriptions/aaaa-bbbb-1234-cccc/resourceGroups/RG-DEV/providers/Microsoft.Synapse/workspaces/gansdorp/sqlPools/dedsqlpool/pause?api-version=2019-06-01-preview
    Example Rest API URL
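The URL construction can also be sketched in a few lines of code. The helper below is illustrative; the parameter values are the placeholders from the example above:

```python
# Sketch: build the pause/resume URL for a Synapse-hosted Dedicated SQL Pool.
# All parameter values used below are illustrative placeholders.

def sql_pool_url(subscription_id: str, resource_group: str,
                 workspace: str, sql_pool: str, action: str) -> str:
    """Construct the management REST API URL for 'pause' or 'resume'."""
    if action not in ("pause", "resume"):
        raise ValueError("action must be 'pause' or 'resume'")
    return (
        "https://management.azure.com/subscriptions/{sub}"
        "/resourceGroups/{rg}/providers/Microsoft.Synapse"
        "/workspaces/{ws}/sqlPools/{pool}/{action}"
        "?api-version=2019-06-01-preview"
    ).format(sub=subscription_id, rg=resource_group,
             ws=workspace, pool=sql_pool, action=action)

print(sql_pool_url("aaaa-bbbb-1234-cccc", "RG-DEV", "gansdorp", "dedsqlpool", "pause"))
```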

    3) Add Web Activity
    To call the REST API we will use the Web activity in the Synapse pipeline. The REST API call is synchronous, which means it waits for the pause or resume action to finish and then returns a message. This also means you don't have to build any checks to make sure the pool is already online.
    • Add a Web activity to your pipeline and give it a suitable name
    • Go to the Settings tab and use the URL from the previous step in the URL property
    • Choose POST as method
    • Fill in {} as body (we don't need it, but it is required)
    • Choose MSI as authentication method
    • As the last step enter this URL https://management.azure.com/ as Resource
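In the pipeline JSON definition, the steps above would look roughly like the sketch below (the activity name is illustrative, and the URL is the placeholder template from the previous step):

```json
{
    "name": "Pause SQL Pool",
    "type": "WebActivity",
    "typeProperties": {
        "url": "https://management.azure.com/subscriptions/{subscription-id}/resourceGroups/{resource-group-name}/providers/Microsoft.Synapse/workspaces/{workspacename}/sqlPools/{DedicatedSQLPoolName}/pause?api-version=2019-06-01-preview",
        "method": "POST",
        "body": {},
        "authentication": {
            "type": "MSI",
            "resource": "https://management.azure.com/"
        }
    }
}
```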

    web activity

    To first check the current state of the SQL Pool you can use the Get REST API and then use an If Condition to check its output with an expression like @activity('Get Current State').output.properties.status
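For example, assuming a preceding Web activity named 'Get Current State' that calls the Get REST API and returns the pool state in properties.status, the If Condition's expression could be:

```
@equals(activity('Get Current State').output.properties.status, 'Paused')
```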

    Summary
    In this post you learned how to pause and resume your Dedicated SQL Pool from within Azure Synapse Studio; the method is very similar to the ADF version. There are some differences, but those are mostly caused by the service hosting the SQL Pool, which can be either a SQL Server or your own Synapse workspace. One of the two will probably disappear, because it isn't likely that Microsoft will keep maintaining two different services, each with its own set of REST APIs.


    Saturday, 27 February 2021

    DevOps Snack: Change PowerShell version in YAML

    Case
    I'm executing a PowerShell script via my DevOps YAML pipeline. How do I change the PowerShell version from 5 to 7 in the PowerShell@2 task?
    YAML PowerShell Task

    Solution
    In your YAML code add the argument pwsh: true to the inputs (the default value is false). Now the script will be executed by pwsh.exe (PowerShell Core) instead of powershell.exe (Windows PowerShell 5). Changing the version only works on Windows agents, because only those can run both PowerShell 5 and PowerShell Core.
    steps:
    - task: PowerShell@2
      displayName: RunWithPs5
      inputs:
        pwsh: false
        filePath: '$(Pipeline.Workspace)\s\PowerShell\GetVersion.ps1'
    
    - task: PowerShell@2
      displayName:  RunWithPs7
      inputs:
        pwsh: true
        filePath: '$(Pipeline.Workspace)\s\PowerShell\GetVersion.ps1'
    
    # Show PowerShell version in PowerShell script GetVersion.ps1
    Write-Host "PowerShell Version"
    $PSVersionTable.PSVersion 
    
    The Result
    Now execute the pipeline and compare both PowerShell tasks. You see two signs that the PowerShell version has changed.
    YAML PowerShell 5 and 7

    Conclusion
    An easy option to change the PowerShell executable from version 5 to Core. This is very handy if you need certain cmdlets that are only available in PowerShell Core, like Test-Json.