Saturday, 23 May 2020

Create and deploy a Python Azure Function

Case
I want to create an Azure Function with Python code. How do I create and deploy one in Azure?
Write Python in Visual Studio Code to create an Azure Function











Solution
In this blogpost we will create and deploy a very simple 'hello world' Azure Function with an HTTP trigger which you can extend to your own needs. After that we can use the Azure Data Factory pipeline with an Azure Function activity to execute it.

As an alternative you could also create an Azure Function with a Blob Storage trigger that executes when a new file arrives, but we rather want to use that same trigger type to start an Azure Data Factory pipeline that then starts this Function followed by other pipeline activities. This way we have one place that does the triggering/orchestration.

1) Create new Azure Function project
Please first follow the steps of our previous post on how to prepare Visual Studio Code for creating Azure Functions with Python. After that open Visual Studio code and perform the steps below to create your first hello world Function.
  • In Visual Studio code click on the Azure icon in the left menu.
  • In the newly opened pane click on the folder with the lightning icon on it to create a new project. (An additional function can later-on be added to the project with the Lightning-plus icon)
  • Select the folder of the new project (or use the Browse... option)
  • Next select Python as coding language
  • Select the Python interpreter. In our example it is py 3.8.3 (if it's not in the list you need to browse to python.exe on your device)
  • Select HTTP trigger as the template for this example
  • Enter the Function name. This is the name of the function within the project (that can contain multiple functions)
  • For this test example use anonymous as Authorization level
  • The project has been created, but continue with the next step below with some additional actions before you can start coding
Create new Azure Function project















2) Select interpreter and install Linter pylint
To finish the creation of the project we need to select the location of the interpreter and install a linter.
  • Click on the popup in the bottom left corner to select the interpreter. 
  • On the top in the middle you can now select the interpreter. Select the one in the .venv folder which is a subfolder of your project.
  • Next step is to install Linter pylint by clicking on the install button on the new popup in the bottom left corner.
  • Now wait a few seconds for the installation to finish
Select interpreter and install Linter pylint















3) Code in __init__.py
The file '__init__.py' contains your Python code. Below you see the standard / generated code with some extra comment lines for if you are new to Python. For this example we do not extend the code.
# Import module for logging purposes
import logging

# Import module for Azure Functions and give it an alias
import azure.functions as func

# Main function and entry point of this Azure Function
def main(req: func.HttpRequest) -> func.HttpResponse:
    # Log information
    logging.info('Python HTTP trigger function processed a request.')

    # Retrieve parameter 'name' from querystring
    name = req.params.get('name')
    # If not found try to retrieve it from the request body
    if not name:
        try:
            # Check if there is a request body
            req_body = req.get_json()
        except ValueError:
            # On failure do nothing
            pass
        else:
            # On success try to retrieve name from request body
            name = req_body.get('name')

    # If a name was found then response with 'Hello [name]'
    if name:
        return func.HttpResponse(f"Hello {name}!")
    else:
        # If a name was not found response with an error message
        return func.HttpResponse(
             "Please pass a name on the query string or in the request body",
             status_code=400
        )

4) Debug locally
Now we are going to test the Azure Function locally on our Windows device. There are multiple ways to start debugging. Pressing F5 is probably the easiest. See animated gif for more options.
  • In the Run menu on the top of the screen you will find the Start Debugging option. 
  • The terminal pane on the bottom will show a lot of details. Wait a few seconds for it to finish and click (while also pressing CTRL) on the green URL.
  • A new browser window will open and it shows the error output that it cannot find the name parameter.
  • In the browser add a querystring after the URL: ?name=Joost (or your own name of course). Now it will respond with a greeting
  • Close the browser and then hit the disconnect icon on top to stop debugging
Debugging your function locally














5) Create Azure Function in Azure Portal
Before you can deploy your newly created function you first need to create an Azure Function in the Azure portal.

  • Go to the Azure Portal and click on Create a resource
  • Search for function and select Function App
  • Click on the Create button
  • On the Basics tab you find the most important settings
  • Select your Subscription and Resource Group
  • Enter an unique Function App name
  • Select Python as Runtime stack
  • Select the Python Version (3.8 in our example)
  • Select the Region (probably the same as your Resource Group)
  • Optionally go to the Hosting tab for extra settings
  • Choose a new or existing Storage account
  • Change the Plan type (default: Serverless)
  • Optionally go to the Monitoring tab for extra settings
  • Disable or enable Application insights and change its name
  • Click the Review + create button
  • Review the settings and click on the Create button

Create new Azure Function (app) on Azure portal


















Note 1: you cannot create an Azure Function with a linux worker (python) if there is already a Windows worker (C#) in that same resource group and with the same region. You will then get an error: LinuxWorkersNotAllowedInResourceGroup - Linux workers are not available in resource group bitools. Use this link to learn more https://go.microsoft.com/fwlink/?linkid=831180. Also see the Azure Function documentation. Summary: don't mix C# and Python functions within the same resource group.

Note 2: you could also perform these steps within Visual Studio Code during deployment.

6) Deploy Azure Function to Azure Portal
Now that we have an (empty) Azure Functions app in the Azure portal we can deploy our newly created Azure Function to this resource.
  • In Visual Studio code click on the Azure icon in the left menu.
  • In the newly opened pane click on the blue arrow (deploy) icon
  • In the drop down select your Azure Functions App from the previous step
Deploy Azure Functions from Visual Studio Code














7) Testing in portal
Now that we have deployed our project to Azure Functions we can test it in the Azure Portal. For this example we will use the post method.
  • Go to the Azure Portal and then open you Azure Functions App
  • In the left menu click on Functions
  • In the list of functions click on your function (only one in this example)
  • In the left menu click on Code + Test
  • Click on the test button (top center)
  • Change the HTTP method to post
  • Select one of the keys
  • Enter a JSON message in the body: {"name":"Joost"} (name=case-sensitive)
  • Click on the Run button and see the result
Testing in the Azure Portal

















7) Executing from Azure Data Factory
Now if you want to execute this new Azure Function in Azure Data Factory with the Azure Function Activity you can follow the steps in this previous post. However without code changes it will return an error stating that the response is invalid: 3603 - Response Content is not a valid JObject
3603 - Response Content is not a valid JObject
















At the moment it is returning a so called JArray, but it is expecting a JObject (J = JSON). Any other return types than JObject will throw the error above. To overcome this we need two code changes. First we need to import the JSON module by adding: import json at the top of the code with the other imports. Then we need to adjust the return of the Hello world message. See changes after lines 4 and 30.
# Import module for logging purposes
import logging

# import json module to return a json message
import json

# Import module for Azure Functions and give it an alias
import azure.functions as func

# Main function and entry point of this Azure Function
def main(req: func.HttpRequest) -> func.HttpResponse:
    # Log information
    logging.info('Python HTTP trigger function processed a request.')

    # Retrieve parameter 'name' from querystring
    name = req.params.get('name')
    # If not found try to retrieve it from the request body
    if not name:
        try:
            # Check if there is a request body
            req_body = req.get_json()
        except ValueError:
            # On failure do nothing
            pass
        else:
            # On success try to retrieve name from request body
            name = req_body.get('name')

    # If a name was found then response with 'Hello [name]'
    if name:
        #return func.HttpResponse(f"Hello {name}!")
        message = {'message': f"Hello {name}!"}
        return json.dumps(message)
    else:
        # If a name was not found response with an error message
        return func.HttpResponse(
             "Please pass a name on the query string or in the request body",
             status_code=400
        )

Below a couple of screenshots on how to configure and test this in Azure Data Factory. You might want to store the Function key in Azure Key Vault to avoid keys in your ETL/ELT code.
Set up the Azure Function Activity in Azure Data Factory

















After configuring the Azure Function activity you can hit the debug button and see the result. This output could then be used as input for successive pipeline activities
Successfully executing and getting the response


















Conclusion
First a big thank you to colleague Jasper Diefenbach for helping me out with the Python stuff. In this blog post you learned how to create, test and deploy your first (very basic) Azure Function App with Python code. Then we also showed you how to execute this from Azure Data Factory. In a couple of follow up posts we will show you how to build some useful functions for DWH projects and show you some technical stuff like adding Azure Key Vault to the game. Also check out the C# version of this blogpost.


Thursday, 21 May 2020

Setup Visual Studio code for Azure Functions

Case
I want to create Azure Functions on my Windows device, but which tools and extensions do I need to install?
Write Python or C# in Visual Studio code to create Azure Functions









Solution
In this blogpost we will show you which tools you need to install to create an Azure Function with either .NET or with Python code. The screenshots are of the current versions at the time of writing, but you might just want to take the latest stable version when downloading. In the upcoming Azure Functions posts we will create some basic Hello World functions to show the basics of creating and deploying your first Function. After that we will show some more useful functions for the Data Warehousing developers. For example to convert Excel or XML files to an easier readable format for Azure Data Factory or Synapse Polybase: CSV.

1) Download and install Visual Studio Code
For this blog post we will be using Visual Studio Code instead of the regular Visual Studio. Where this 'regular' Visual Studio is a so called Integrated Development Environment (IDE), the newer Visual Studio Code is more a lightweight source code editor. Ideal for some coding with PowerShell, C# or Python. Use the link below to download Visual Studio Code and then install it.
https://code.visualstudio.com/download
Installing Visual Studio Code




















2) Install extensions for Python
If you want to use Python for your Azure Functions you need to install Python for Windows and the Python extension for Visual Studio code. First download and install Python for windows 64bit. The default is a 32 bit version, but you can also find the 64 bit version slightly down the page (search for Windows x86-64 executable installer).
https://www.python.org/downloads/windows/
Install Python for Windows 64bit
















Then install the Python extension for Visual Studio code. When clicking on the install button on the website it will ask to open it with Visual Studio Code. Within Visual Studio Code you have to click on the install button again. After installation it will ask to point to the previously installed Python interpreter.
https://marketplace.visualstudio.com/items?itemName=ms-python.python
Install Python extension for Visual Studio code


















Note: You can also use the Extensions icon in the left menu of Visual Studio code to search for this specific extension.

There is one last installation required: Linter Pylint, but Visual Studio Code will ask for it when creating your first Azure Function with Python code: Linter pylint is not installed.
Install Linter pylint for Visual Studio Code







Installing Linter pylint within Visual Studio Code














3) Install extentions for C#
If you want to create Azure Functions with C# then you first need to install .Net Core SDK. Make sure to install the version to Build apps (Run Apps is not sufficient). The minimum version is .NET Framework 4.7.2 or .NET Core 2.2, but try the most recent version depending on your needs and the Runtime version of Azure Functions. If you forget this then you will recieve an error while trying to create a C# Function.
Receive an error when have not installed .NET Core SDK











Furthermore you should install the C# extension from Microsoft. This extension is not mandatory, but will be recommended when creating your first C# Azure Function.
Install C# extension for Visual Studio code














Note: You can also use the Extensions icon in the left menu of Visual Studio code to search for this specific extension.

4) Install extensions for Azure Functions
Next extension is Azure Functions for Visual Studio code. When clicking on the install button on the website it will ask to open it in Visual Studio Code. Within Visual Studio Code you have to click on the install button again.
https://marketplace.visualstudio.com/items?itemName=ms-azuretools.vscode-azurefunctions
Azure Functions for Visual Studio Code














Note: You can also use the Extensions icon in the left menu of Visual Studio code to search for this specific extension.

5) Install Azure Functions Core Tools
To make you able to debug the Azure Function code locally we need Azure Functions Core Tools, but to install that we first need to install Node Package Manager (NPM) which is included in nodejs (more detailed info here).
https://nodejs.org/en/download/
Install NodeJS with NPM




















Last step of this installation is to open a Command Promt (or PowerShell promt) in Administrator mode to install Azure Functions Core Tools. With the command npm -v you can check your npm version. Now use the following command for the installation (more detailed info here):
1
npm i -g azure-functions-core-tools@3 --unsafe-perm true
Install Azure Functions Core Tools via command prompt















Conclusion
In this introduction post you read which tools and extensions to install to create Azure Functions. Quite a lot installations, but manageable when following the steps above. We focused on the most popular languages (in the DWH scene) C# and Python, but there are way more languages to choose from like Java(script) or PowerShell. Each with its own extensions.

As mentioned before the next post about Azure Functions will be about deploying your first simple function with Python or C#. After that we will focus on the more functional Azure Functions solutions, but with a focus on the DWH scene. Also bringing Azure Key Vault to the game is a must for Azure Functions.

Sunday, 3 May 2020

Use Managed Service Identity for Synapse PolyBase

Case
I a previous PolyBase example, to read data from the storage account, we stored the secret in Synapse. Is it possible to use Managed Identity instead of storing secrets in Synapse? However I can not find the Managed Identity of  my SQL Server.
Azure Synapse Analytics with PolyBase reading Azure Storage Account














Solution
You can use a Managed Identity, but there are two requirements. First this only works with 'StorageV2 (general purpose v2)'. 'BlobStorage' or 'Storage (general purpose v1)' will not work! Secondly you need to register your SQL Server that hosts Synapse in your Active Directory. This will allow you to select your SQL Server within the Access control (IAM).

1) Create Storage Account
Create an Azure Storage Account and make sure the type is StorageV2 (general purpose v2). The storage account of this example is called 'bitoolsstorage' and it has a container called 'mycontainer'. You can choose your own names, but these names will be used in the example code.
Bijschrift toevoegen















2) Create Synapse
Create a Synapse Data Warehouse including a SQL Server to host it. Our SQL Server is called 'bitoolssynapseserver' and our Synapse SQL Pool (data warehouse) is called 'synapsedwh'. Again choose your own names and change those in the example code below.
Synapse SQL Pool (data warehouse)












3) Register SQL Server in AD
Next step is to register the SQL Server that hosts your Synapse DWH in the Active Directory. This will allow you to find your SQL Server in the next step as a Managed Identity. At the moment of writing this needs to be done via PowerShell and cannot be done via the portal.

We will be using Cloud Shell (PowerShell in the portal), but you can also use PowerShell (ISE) on your Windows device but then you have to execute two extra commands (login and select subscription).

  • Click on the Cloud Shell icon in the upper right corner (next to the searchbox). This will start PowerShell in the portal. If this is the first time using it you first need to connect it to an Azure Storage Account.
  • Then execute the Set-AzSqlServer command. The first parameter is the resource group where SQL Server is located. The second parameter is the name of SQL Server (without .database.windows.net) and the last parameter will assign the Managed Identity.
1
2
# PowerShell
Set-AzSqlServer -ResourceGroupName "Joost_van_Rossum" -ServerName "bitoolssynapseserver" -AssignIdentity
Register SQL Server as Managed Identity
















If you are using PowerShell on your Windows device instead of Cloud Shell then use this code
1
2
3
4
5
6
7
8
9
# PowerShell
# Login to Azure (popup will appear)
Connect-AzAccount
 
# Select your subscription
Select-AzSubscription -SubscriptionId "2c67b23a-4ba2-4273-bc82-274a743b43af"
 
# Assign Managed Identity
Set-AzSqlServer -ResourceGroupName "Joost_van_Rossum" -ServerName "bitoolssynapseserver" -AssignIdentity

4) Storage Blob Data Contributor
Now it's time to give your SQL Server access to the Azure Storage Account. The role we need for this according the documentation is 'Storage Blob Data Contributor', but I also tested it with 'Storage Blob Data Reader' and that works fine as well (since we are only reading data). Note: You need to be owner of the resource (group) to delegate access to others.
  • Go to your Storage Account from step 1
  • Click on Access control (IAM) in the left menu
  • Click on the + Add icon and choose Add role assignment
  • In the Role drop down select 'Storage Blob Data Contributor'
  • Leave the Assign access to drop down unchanged
  • In the Select box start typing the name of your SQL Server
  • Select your SQL Server and click on the Save button
Deligate Access to Managed Identity of SQL Server
















5) Master Key
We are finished in the Azure portal and now its time to start with the actual PolyBase code. Start SQL Server Managed Studio (SSMS), but make sure your Synapse is not paused.

First step is to create a master key to encrypt any secrets, but only if you do not already have one (although we will not use any secrets). You can check that in the table sys.symmetric_keys. If a row exists where the symmetric_key_id column is 101 (or the name column is '##MS_DatabaseMasterKey##') then you already have a master key. Otherwise we need to create one. For Synapse a masterkey password is optional. For this example we will not use the password.
1
2
3
4
5
6
7
8
9
10
--Master key
IF NOT EXISTS (SELECT * FROM sys.symmetric_keys WHERE symmetric_key_id = 101)
BEGIN
    PRINT 'Creating Master Key'
    CREATE MASTER KEY;
END
ELSE
BEGIN
    PRINT 'Master Key already exists'
END


6) Credentials
Next step is to create a credential which will be used to access the Storage Account. For a Managed Identity you don't use secrets:
1
2
3
4
5
--Credential
CREATE DATABASE SCOPED CREDENTIAL bitools_msi
WITH
    IDENTITY = 'Managed Service Identity'
;

Tip:
Give the credential a descriptive name so that you know where it is used for. You can find all credentials in the table sys.database_credentials:
1
2
--Find all credential
SELECT * FROM sys.database_credentials


7) External data source
With the credential from the previous step we will create an External data source that points to the Storage Account and container where your file is located. Execute the code below where:
  • TYPE = HADOOP (because PolyBase uses the Hadoop APIs to access the container)
  • LOCATION = the connection string to the container in your Storage Account starting with abfss.
  • CREDENTIAL = the name of the credentials created in the previous step.
1
2
3
4
5
6
7
--Create External Data Source
CREATE EXTERNAL DATA SOURCE bitoolsstorage_abfss
WITH (
    TYPE = HADOOP,
    CREDENTIAL = bitools_msi
);

Tip:
Give the external source a descriptive name so that you know where it is used for. You can find all external data sources in the table sys.external_data_sources:
1
2
--Find all external sources
SELECT * FROM sys.external_data_sources

Notice that the filename or subfolder is not mentioned in the External Data Source. This is done in the External Table. This allows you to use multiple files from the same folder as External Tables.


8) External File format
Now we need to describe the format used in the source file. In our case we have a comma delimited file. You can also use this file format to supply the date format, compression type or encoding.
1
2
3
4
5
6
--Create External Data Source
CREATE EXTERNAL FILE FORMAT TextFile
WITH (
    FORMAT_TYPE = DelimitedText,
    FORMAT_OPTIONS (FIELD_TERMINATOR = ',')
);

Tip:
Give the format a descriptive name so that you know where it is used for. You can find all external file formats in the table sys.external_file_formats:
1
2
--Find all external file formats
SELECT * FROM sys.external_file_formats

9) External Table
The last step before we can start quering, is creating the external table. In this create table script you need to specify all columns, datatypes and the filename that you want to read. The filename starts with a forward slash. You also need the datasource from step 7 and the file format from step 8.
1
2
3
4
5
6
7
8
9
10
11
12
--Create External table
CREATE EXTERNAL TABLE dbo.sensordata (
    [Date] nvarchar(50) NOT NULL,
    [temp] INT NOT NULL,
    [hmdt] INT NOT NULL,
    [location] nvarchar(50) NOT NULL
)
WITH (
    LOCATION='/bitools_sample_data_AveragePerDayPerBuilding.csv',
    DATA_SOURCE=bitoolsstorage_abfss, -- from step 7
    FILE_FORMAT=TextFile              -- from step 8
);
Note:
PolyBase does not like columnname headers. It will handle it like a regular data row and throw an error when the datatype doesn't match. There is a little workaround for this with REJECT_TYPE and REJECT_VALUE. However this only works when the datatype of the header is different than the datatypes of the actual rows. Otherwise you have to filter the header row in a subsequent step.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
--Create External table with header
CREATE EXTERNAL TABLE dbo.sensordata2 (
    [Date] DateTime2(7) NOT NULL,
    [temp] INT NOT NULL,
    [hmdt] INT NOT NULL,
    [location] nvarchar(50) NOT NULL
)
WITH (
    LOCATION='/bitools_sample_data_AveragePerDayPerBuilding.csv',
    DATA_SOURCE=bitoolsstorage_abfss,
    FILE_FORMAT=TextFile,
    REJECT_TYPE = VALUE, -- Reject rows with wrong datatypes
    REJECT_VALUE = 1     -- Allow 1 failure (the header)
);
You can find all external tables in the table sys.external_tables.
1
2
--Find all external tables
SELECT * FROM sys.external_tables
However you can also find the External Table (/the External Data Source/the External File Format) in the Object Explorer of SSMS.
SSMS Object Explorer

























10) Query external table
Now you can query the external table like any other regular table. However the table is read-only so you can not delete, update or insert records. If you update the source file then the data in this external table also changes instantly because the file is used to get the data.
1
2
3
--Testing
SELECT count(*) FROM dbo.sensordata;
SELECT * FROM dbo.sensordata;
Quering an external table

























Conclusion
In this post you learned how to give the Managed Identity of SQL Server access to your Storage Account. This saves you some maintenance for the secrets. And you learned how to use PolyBase to read files from that Storage Account using the Managed Identity.




Friday, 1 May 2020

Azure Data Factory - Use Key Vault Secret in pipeline

Case
I want to use secrets from Azure Key Vault in my Azure Data Factory (ADF) pipeline, but only certain properties of the Linked Services can be filled by secrets of a Key Vault. Is it possible to retrieve Key Vault secrets in the ADF pipeline itself?

Using Azure Key Vault for Azure Data Factory













Solution
Yes the easiest way is to use a web activity with a RestAPI call to retrieve the secret from Key Vault. The documentation is a little limited and only shows how to retrieve a specific version of the secret via the secret identifier (a guid). This is not a workable solution for two reasons:
  1. The guid changes each time you change the secret. In probably 99% of the cases you just want to get the latest of version of the secret. This means you need to change that guid in ADF as well when you change a secret.
  2. The guid differs on each environment of your key vault (dev, test, acceptance and production). This makes it hard to use this solution in a multi ADF environment.
For this example we will misuse Key Vault a little bit as a configuration table and retrieve the RestAPI url of a database from the Key Vault. The example assumes you already have a Key Vault filled with secrets. If you don't have that then executes the first two steps of this post.

1) Access policies
First step is to give ADF access to the Key Vault to read its content. You can now find ADF by its name so you don't have to search for its managed identity guid, but using that guid is also still possible.
  • Go to Access policies in the left menu of your Key Vault
  • Click on the blue + Add Access Policy link
  • Leave Configure from template empty
  • Leave Key permissions unselected (we will only use a Secret for this example)
  • Select Get for Secret permissions
  • Leave Certificate permissions unselected (we will only use a Secret for this example)
  • Click on the field of Select principal to find the name of your Azure Data Factory
  • Leave Authorized application unchanged
  • Click on Add and a new Application will appear in the list of Current Access Policies
Add Access policy
















Note: for this specific example we do not need to create a Key Vault Linked Service in ADF.

2) Determine URL of secret
To retrieve the secrets we need the RestAPI URL of that secret. This URL is constructed as
https://{Name Keyvault}.vault.azure.net/secrets/{SecretName}?api-version=7.0

{Name Keyvault} : is the name of the keyvault you are using
{SecretName} : is the secretName

In this example the secretName is "SQLServerURL" and the URL should be looking like this https://{Name Keyvault}.vault.azure.net/secrets/SQLServerURL?api-version=7
Get the SecretName from Key Vault












3) Web activity
Next we have to add the activity ‘web’ into the ADF pipeline. Use the following settings in the settings tab.
  • Set URL to https://{Name Keyvault}.vault.azure.net/secrets/SQLServerURL?api-version=7.0
  • Set Method to Get
  • Under Advanced select MSI
  • And set the resource to https://vault.azure.net

Configuring the Web activity to retrieve the secret


















4) Retrieve value
Now we want to use the secret from the Key Vault in a successive activity, in this case another web activity to upscale a database. In the URL property of this activity we now use the output value from the previous webactivity.
1
@activity('GetURL').output.value
Retrieve output value via expression















5) The result
To check the result of the changes we need to execute the pipeline.
Execute the pipeline to check the result

















Note: if you are using this to retrieve real secrets like passwords and you don't want them to show up in the logging of Azure Data Factory then check the Secure output property on the general tab of your activity.
Don't show secret in logging




















Conclusion
In this blogpost your learned how to retrieve Key Vault secrets in ADF. The trick is to retrieve them by there name instead of by there version Guid. This will always give you the latest version and allows you to use this construction in multiple environments.

Update: ADF now supports Global parameters to store parameters that can be used by all pipelines