Monday, 23 October 2017

Use PolyBase to read Data Lake in Azure SQL DW

Case
I have a file in an Azure Data Lake Store (ADLS) folder which I want to use in my Azure SQL Data Warehouse. In a previous blog post we used PolyBase to get data from an Azure Blob Storage container via its access keys. How can I use PolyBase to get the data from ADLS and push the content of that file to Azure SQL DW?
Azure SQL Data Warehouse - PolyBase on ADLS

Solution
In the previous blog post we showed how to read a file from an Azure Blob Storage container via its access keys using PolyBase. However, ADLS does not work with those keys; it uses Azure Active Directory to provide access. To get authorization via Azure Active Directory we need to register a 'Web app / API' application in Azure Active Directory that handles the authorization for us. That sounds very difficult and the documentation on MSDN is not very helpful, but in the end it was quite easy.

a) Starting point
The starting point of this blog post is a file in an ADLS folder called 'mySubFolder'. The file was created in a previous blog post about U-SQL, which can quickly process large numbers of data files in Azure Data Lake Analytics (ADLA). The name of our ADLS is 'bitools'.
Starting point: CSV file in ADLS
The content of the CSV file

a1) App registrations
Go to the Azure portal and search for Azure Active Directory in the search box located in the header. This will bring you to the Azure Active Directory of your subscription. Then click on App registrations in the menu. It will show a list of all existing registrations. The next step is to click on New application registration to create a new registration for our data lake.
New application registration

a2) New application registration
Enter a new descriptive name like 'Data Lake bitools' so you will know what it is used for. Choose 'Web app / API' as Application type and then you need to enter a URL. Since we are not using the Sign-on URL property (we use the Azure sign-on), you can just enter any URL like 'https://microsoft-bitools.blogspot.com'. When complete, click on the Create button.
New application registration

a3) Edit application registration - Application ID
Now search for your newly created Application registration to get its Application ID. Copy that to a notepad (we need it later on). You can also edit additional properties, like giving it a custom logo to make it more recognizable if you have an extensive list of app registrations.
Copy Application ID

a4) Edit application registration - Keys
Continue editing and click on Keys in the menu to create a new access key. Give it a suitable name and expiration period. After clicking Save, make sure to copy the generated key to the same notepad as before, since you can only retrieve it once! If you lose it you have to delete and recreate the key.
Create new key

a5) Active Directory ID
Now go back to your Azure Active Directory to copy the Directory ID. You can find it when you click on Properties in the menu. Copy this ID to the same notepad, which should now contain three values (ApplicationID, generated key and DirectoryID).
Get Directory ID

a6) Setting access root folder
Go to your ADLS and click on Data Explorer. You are now in the root of your ADLS. Click on Access and then on Add to assign new permissions. Search for your Registered Application called 'bitools'. Then select it and click on the Select button. In the root folder we only need Execute permissions. 'Add to' should stay on 'This folder' and 'Add as' should stay on 'An access permission entry'. Click on the Ok button to confirm.
Setting permissions on root folder

If you forget to give Execute permissions to the root folder you will get an error when adding an external table later on:
EXTERNAL TABLE access failed due to internal error: 'Java exception raised on call to HdfsBridge_IsDirExist. Java exception message:
GETFILESTATUS failed with error 0x83090aa2 (Forbidden. ACL verification failed. Either the resource does not exist or the user is not authorized to perform the requested operation.).
[dbd91a77-1b0a-4f11-9710-d7c1b6b05268][2017-10-21T12:26:47.7748687-07:00]: Error [GETFILESTATUS failed with error 0x83090aa2 (Forbidden. ACL verification failed. Either the resource does not exist or the user is not authorized to perform the requested operation.).
[dbd91a77-1b0a-4f11-9710-d7c1b6b05268][2017-10-21T12:26:47.7748687-07:00]] occurred while accessing external file.'


a7) Setting access sub folder
Now we have to repeat this for our subfolder called 'mySubFolder'. Click on the folder and you should see the source file. Click on Access and then on Add to assign new permissions. Search for your Registered Application called 'bitools'. Then select it and click on the Select button. In this subfolder we need Read and Execute permissions. 'Add to' should be changed to 'This folder and all children' and 'Add as' should stay on 'An access permission entry'. Click on the Ok button to confirm.
Setting permissions on sub folder

An alternative could be to give the bitools app Read and Execute rights on the root, including all children. That saves you a step, but is less secure if you use your Data Lake for multiple purposes.


b) SSMS
Now it's time to start with the actual PolyBase code, but before we start, make sure your Azure SQL Data Warehouse is started and use SQL Server Management Studio (SSMS) to connect to your Data Warehouse. Notice that the icon of a SQL DW is different from that of a SQL DB.
Icon SQL DW vs SQL DB

b1) Master key
In the next step we will use a credential that points to the registered application. To encrypt that credential, we first need to create a master key in our Azure SQL Data Warehouse, but only if we do not already have one. You can check that in the table sys.symmetric_keys: if a row exists where the symmetric_key_id column is 101 (or the name column is '##MS_DatabaseMasterKey##'), then you already have a master key. Otherwise we need to create one. For Azure SQL Data Warehouse a master key password is optional; for this example we will not use one.
--Master key
IF NOT EXISTS (SELECT * FROM sys.symmetric_keys WHERE symmetric_key_id = 101)
BEGIN
    PRINT 'Creating Master Key'
    CREATE MASTER KEY;
END
ELSE
BEGIN
    PRINT 'Master Key already exists'
END 

b2) Credentials
Next step is to create a credential which will be used to access the subfolder in ADLS. For this you need the IDs and the key from the notepad. The IDENTITY has the following format:
[ApplicationID]@https://login.windows.net/[DirectoryID]/oauth2/token
Replace ApplicationID (including the square brackets) with the ID from step a3 and DirectoryID (including the square brackets) with the ID from step a5. The SECRET should be filled with the key from step a4. After setting the correct IDs and key, execute the following code:
--Credential
CREATE DATABASE SCOPED CREDENTIAL bitools_user
WITH
    IDENTITY = 'aaf0ab52-560e-40b1-b4df-caac1f0e5376@https://login.windows.net/3xxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxb/oauth2/token',
    SECRET = '6LUnE4shZ4p1jUhj7/fkLH03yfbSxi2WRWre9c0yVTs=';

Tip:
Give the credential a descriptive name so that you know what it is used for. You can find all credentials in the table sys.database_credentials:
--Find all credentials
SELECT * FROM sys.database_credentials

b3) External data source
With the credential from the previous step we will create an External Data Source that points to the ADLS folder where your file is located. Execute the code below, where:
  • TYPE = HADOOP (because PolyBase uses the Hadoop APIs to access the container)
  • LOCATION = the connection string to ADLS (replace 'bitools' with the name of your own ADLS).
  • CREDENTIAL = the name of the credential created in the previous step.
--Create External Data Source
CREATE EXTERNAL DATA SOURCE AzureDataLakeStore
WITH (
TYPE = HADOOP,
LOCATION = 'adl://bitools.azuredatalakestore.net',
CREDENTIAL = bitools_user
);

Tip:
Give the external data source a descriptive name so that you know what it is used for. You can find all external data sources in the table sys.external_data_sources:
--Find all external sources
SELECT * FROM sys.external_data_sources

Notice that the filename or subfolder is not mentioned in the External Data Source. This is done in the External Table. This allows you to use multiple files from the same folder as External Tables.


b4) External File format
Now we need to describe the format of the source file. In our case we have a comma-delimited file. You can also use this file format to supply the date format, compression type or encoding.
--Create External File Format
CREATE EXTERNAL FILE FORMAT TextFile
WITH (
    FORMAT_TYPE = DelimitedText,
    FORMAT_OPTIONS (FIELD_TERMINATOR = ',')
);
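
As a minimal sketch of those extra options (the values below, such as the date format and the Gzip compression codec, are illustrative and not taken from the sample file above), a richer delimited file format could look like this:
--Create External File Format with more options (illustrative)
CREATE EXTERNAL FILE FORMAT TextFileExtended
WITH (
    FORMAT_TYPE = DelimitedText,
    FORMAT_OPTIONS (
        FIELD_TERMINATOR = ',',
        STRING_DELIMITER = '"',
        DATE_FORMAT = 'yyyy-MM-dd',
        USE_TYPE_DEFAULT = TRUE
    ),
    DATA_COMPRESSION = 'org.apache.hadoop.io.compress.GzipCodec'
);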

Tip:
Give the format a descriptive name so that you know what it is used for. You can find all external file formats in the table sys.external_file_formats:
--Find all external file formats
SELECT * FROM sys.external_file_formats

b5) External Table
The last step before we can start querying is creating the external table. In this create table script you need to specify all columns, data types and the filename that you want to read. The filename starts with a forward slash. You also need the data source from step b3 and the file format from step b4.
--Create External table
CREATE EXTERNAL TABLE dbo.sensordata (
    [Date] DateTime2(7) NOT NULL,
    [temp] INT NOT NULL,
    [hmdt] INT NOT NULL,
    [location] nvarchar(50) NOT NULL
)
WITH (
    LOCATION='/mySubFolder/bitools_sample_data_AveragePerDayPerBuilding.csv',
    DATA_SOURCE=AzureDataLakeStore, -- from step b3
    FILE_FORMAT=TextFile            -- from step b4
);
Note:
PolyBase does not like column name headers. It will handle the header like a regular data row and throw an error when the data type doesn't match. There is a little workaround for this with REJECT_TYPE and REJECT_VALUE, but this only works when the data type of the header is different from the data types of the actual rows. Otherwise you have to filter out the header row in a subsequent step (see the sketch after the code below).
--Create External table with header
CREATE EXTERNAL TABLE dbo.sensordata5 (
    [Date] DateTime2(7) NOT NULL,
    [temp] INT NOT NULL,
    [hmdt] INT NOT NULL,
    [location] nvarchar(50) NOT NULL
)
WITH (
    LOCATION='/mySubFolder/bitools_sample_data_AveragePerDayPerBuilding.csv',
    DATA_SOURCE=AzureDataLakeStore,
    FILE_FORMAT=TextFile,
    REJECT_TYPE = VALUE, -- Reject rows with wrong datatypes
    REJECT_VALUE = 1     -- Allow 1 failure (the header)
);
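
A minimal sketch of that alternative (filtering out the header row in a subsequent step) is to read all columns as text first and convert them afterwards. The table name dbo.sensordata_raw is just an illustrative example, and it assumes the header of the first column literally reads 'Date':
--Create External table with all columns as text (illustrative alternative)
CREATE EXTERNAL TABLE dbo.sensordata_raw (
    [Date] nvarchar(50) NOT NULL,
    [temp] nvarchar(50) NOT NULL,
    [hmdt] nvarchar(50) NOT NULL,
    [location] nvarchar(50) NOT NULL
)
WITH (
    LOCATION='/mySubFolder/bitools_sample_data_AveragePerDayPerBuilding.csv',
    DATA_SOURCE=AzureDataLakeStore,
    FILE_FORMAT=TextFile
);

--Filter out the header row and convert the data types in a subsequent step
SELECT  CONVERT(DateTime2(7), [Date]) AS [Date]
,       CONVERT(INT, [temp])          AS [temp]
,       CONVERT(INT, [hmdt])          AS [hmdt]
,       [location]
FROM    dbo.sensordata_raw
WHERE   [Date] <> 'Date'; -- assumption: the header value of this column is 'Date'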
You can find all external tables in the table sys.external_tables.
--Find all external tables
SELECT * FROM sys.external_tables
You can also find the External Table (as well as the External Data Source and the External File Format) in the Object Explorer of SSMS.
SSMS Object Explorer

b6) Query external table
Now you can query the external table like any other regular table. However, the table is read-only, so you cannot delete, update or insert records. If you update the source file, the data in this external table changes instantly, because the file is read to get the data.
--Testing
SELECT count(*) FROM dbo.sensordata;
SELECT * FROM dbo.sensordata;
Querying an external table

b7) What is next?
Most likely you will be using a CTAS query (Create Table As Select) to copy and transform the data to another table, since this is the fastest/preferred way in SQL DW. In a subsequent post we will explain more about CTAS, but here is what a CTAS query looks like.
--CTAS
CREATE TABLE [dbo].[Buildings]
WITH
(
    DISTRIBUTION = ROUND_ROBIN
,   CLUSTERED COLUMNSTORE INDEX
)
AS
SELECT  [location]
,       [date]
,       [temp]
,       [hmdt]
FROM    [dbo].[sensordata]
;

In some cases you could also use a SELECT INTO query as an alternative to CTAS.
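
A minimal sketch of such a SELECT INTO query (the target table [dbo].[BuildingsCopy] is just an illustrative name; check first whether SELECT INTO is supported in your environment):
--SELECT INTO as an alternative for CTAS
SELECT  [location]
,       [date]
,       [temp]
,       [hmdt]
INTO    [dbo].[BuildingsCopy]
FROM    [dbo].[sensordata]
;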

Summary
In this post you saw how easy it is to read a file from the Azure Data Lake Store and use it as a table in Azure SQL Data Warehouse, although it did require some extra steps compared to PolyBase on Azure Blob Storage. Jhon Masschelein (B|L|T) has a very helpful post about this matter.

In another post we will explain the basic usage of the CTAS query, which is the preferred way to handle large sets of data in Azure SQL DW and in its on-premises precursor APS (a.k.a. PDW).

Monday, 16 October 2017

Power BI - Use Bookmarks for Chart information

Case
Do you also regularly get the question from customers which data a chart shows and what the definitions of this data are? I do. The October update of Power BI Desktop includes Bookmarks (preview). How can you use bookmarks in your dashboard to answer those questions in a user-friendly way?

Bookmarks in Power BI 

Solution
Bookmarks enable you to save interesting states of your report. A state can be saved inside your report as a bookmark. These bookmarks can be helpful to tell a story to a customer (like a PowerPoint presentation), to navigate through the report or to store important analyses. You can also use a state (bookmark) to show two different visuals for the same data, for example a table and a column chart. Using a button (self-made images), you can switch between the visuals. In this post we will show you how to combine a chart and related information using bookmarks.

Before we get started, it is important to know that Power BI Desktop is now available in the Windows Store. If you install this via the Windows Store, your Power BI Desktop will always be up to date and you no longer have to manually download and install a new version every month. A big advantage.

Windows Store - Power BI Desktop

1) Create the Report (images)
First, make sure you have turned on the preview feature 'Bookmarks'. Go to Options and settings and click on Options. On the left (under Global) you see Preview features; click on it. Select the bookmark feature and save your settings. You may have to restart Power BI Desktop.

Power BI Desktop - Turn on preview features

We use the same report from an earlier post. In this report, we have four visuals and three of them are charts. We want to show some information about the chart 'Total Sales and Profit per Month' in the upper left corner.

First, we insert a self-made black question mark image in the upper right corner of the chart. This will be the default look of the report (and will be the first bookmark later on). Now we want to show the information of the chart. Therefore we insert another question mark image on the same spot, a gray one. The black question mark is no longer visible. Lastly, we insert a text box on the same spot as the chart, so the chart is no longer visible. This text box contains information about the report and you can also add a link for additional information. The report now shows us information about the chart (screenshot 2).

Power BI Desktop - Add all the visuals to the report

2) Create the Bookmarks
We want to create interaction between those two states, because now only the last visuals (with information about the chart) are visible in the report. For this we need the bookmarks.

The first bookmark is the state of the original report, with the chart. Note that this state has a black question mark. For this bookmark, we have to hide the other visuals. In this case, the gray question mark image and the text box with information. Go to the 'View' menu and select the 'Selection Pane'. On the right side the 'Selection' pane will appear. Deselect the gray question mark image and text box in 'Selection'. Now we see the first original state again. We want to save this state by adding a bookmark. Go to 'View' and select the 'Bookmarks Pane'. On the right the 'Bookmarks' pane appears. Click on 'Add' and give the bookmark a suitable name.

The second bookmark must contain the information about the chart. Open the 'Selection Pane' again and make the gray question mark image and the text box visible again. Now hide the chart 'Total Sales and Profit per Month' and the black question mark image. Save this as a new bookmark.

Finally, when we click on the black question mark, we must be redirected to the information state, and back to the default state when we click on the gray question mark. Click on the first bookmark called 'Home (overview)' and select the black question mark image. You will see a new option called 'Link'. Turn this on, choose 'Bookmark' as type and select the bookmark 'Chart 1 information (selected)'. Now select the second bookmark and follow the same steps, but this time you select the gray question mark and link this image to the first bookmark 'Home (overview)'.

Power BI Desktop - Create the bookmarks

NOTE:
First, you have to create all the visuals in your report and then you can create bookmarks in combination with the selection pane to show or hide those visuals for a specific bookmark. Otherwise, when you first create a bookmark of the original state and then add new visuals for the second bookmark, the first bookmark will also inherit those new visuals.

3) Result
We will now look at the result in the Power BI service. Open the published report and click on the black question mark. The information about the chart is now displayed instead of the chart itself. A new click on the gray question mark brings you back to the original report with the chart. It works!

Power BI Service - Result

Bookmarks View
There is a new feature in the 'View' menu, the 'Bookmarks pane'. It is below the already existing 'Selection pane'. Turn this feature on and the 'Bookmarks' pane appears on the right. Click on 'View' in 'Bookmarks' and you will see a bookmark bar below. In this bar you can switch between the bookmarks. Enter full-screen mode and it looks like a PowerPoint presentation with different slides. Cool!

Power BI Service - Bookmarks View

Power BI Desktop
You can also open the bookmarks view in Power BI Desktop to make sure it works before publishing the report. Open the 'Bookmarks pane' as before and click on the same 'View' button. It works the same as in the Power BI service.

You can download the Power BI report here.

Conclusion
In this post you saw how to customize your dashboard using bookmarks. This is one of the many ways to use them, besides presentations or storing important analyses. With this, you do not have to refer your users to another location for associated documentation. Also, something I see in practice as well: a separate information page (a sort of landing page) is no longer necessary.

Since this is a preview, there will be more features in the next versions. This post describes the updates and improvements related to the bookmarking feature, released in December 2017.

Wednesday, 11 October 2017

Schedule Process Azure Analysis Services database

Case
I have my tabular database hosted in Azure Analysis Services, but how do I process it without on-premises tools or services?
Process Azure Analysis Services

Solution
One solution could be using some PowerShell code in Azure Automation Runbooks. With only a few lines of code you can process your database.

1) Automation Account
First we need an Azure Automation Account to run the Runbook with our PowerShell code. If you don't have one or want to create a new one, search for Automation under Monitoring + Management, give it a suitable name, and then select your subscription, resource group and location. For this example I will choose West Europe since I'm from the Netherlands. Also make sure the Create Azure Run As account option is set to Yes (we need it for step 3).
Azure Automation Account

2) Credentials
Next step is to create Credentials to run this Runbook with. This works very similar to the Credentials in SQL Server Management Studio. Go to the Azure Automation Account and click on Credentials in the menu. Then click on Add New Credentials. You could just use your own Azure credentials, but the best option is to use a service account with a non-expiring password. Otherwise you need to change this regularly. Make sure this account has the appropriate rights to process the cube.
Create new credentials

3) Connections
This step is for your information only, to help you understand the code. Under Connections you will find a default connection named 'AzureRunAsConnection' that contains information about the Azure environment, like the tenant id and the subscription id. To prevent hardcoded values we will retrieve these fields in the PowerShell code.
Azure Connections

4) Variables
Another option to prevent hardcoded values in your PowerShell code is to use Variables. We will use this option to provide the Analysis Server Name and the Database Name to specify which database you want to process. Go to Variables and add a new string variable AnalysisServerName and add the name of the server that starts with asazure://westeurope.asazure.windows.net as value. Then repeat this with a string variable called DatabaseName for the database name of your tabular model. You can find the values on the Azure Analysis Services Overview page.
Add variables

5) Modules
The Azure Analysis Services process methods (cmdlets) are in a separate PowerShell module called "SqlServer", which is not included by default. If you do not add this module you will get errors telling you that the method is not recognized. Note that this is a different module than the one used for pausing/resuming and upscaling/downscaling AAS.

Go to the Modules page and check whether you see the SqlServer module in the list. If not, use the 'Browse gallery' button to add it. Adding a module could take a few moments.
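
To quickly verify that the module (and with it the Invoke-ProcessASDatabase cmdlet) is available, a minimal check like the sketch below could help, for example in the Test Pane of a runbook; the output messages are just illustrative:
# PowerShell sketch: check whether the SqlServer module is available in this Automation account
if (Get-Module -ListAvailable -Name "SqlServer")
{
    Write-Output "SqlServer module is available"
}
else
{
    Write-Output "SqlServer module is missing, add it via Modules > Browse gallery"
}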
Add modules

6) Runbooks
Now it is finally time to add a new Azure Runbook for the PowerShell code. Click on Runbooks and then add a new runbook (there are also five example runbooks, of which AzureAutomationTutorialScript could be useful as an example). Give your new Runbook a suitable name and choose PowerShell as type.
Add Azure Runbook

7) Edit Script
After clicking Create in the previous step, the editor will open. When editing an existing Runbook you need to click on the Edit button to edit the code. You can copy and paste the code below into your editor. Study the green comments to understand the code. Also make sure to compare the variable names in the code to the ones created in step 4 and change them if necessary.
Edit the PowerShell code

# PowerShell code 
# Connect to a connection to get TenantId and SubscriptionId
$Connection = Get-AutomationConnection -Name "AzureRunAsConnection"
$TenantId = $Connection.TenantId
$SubscriptionId = $Connection.SubscriptionId
  
# Get the service principal credentials connected to the automation account. 
$null = $SPCredential = Get-AutomationPSCredential -Name "SSISJoost"
 
# Login to Azure ($null is to prevent output, since Out-Null doesn't work in Azure)
Write-Output "Login to Azure using automation account 'SSISJoost'."
$null = Login-AzureRmAccount -TenantId $TenantId -SubscriptionId $SubscriptionId -Credential $SPCredential
  
# Select the correct subscription
Write-Output "Selecting subscription '$($SubscriptionId)'."
$null = Select-AzureRmSubscription -SubscriptionID $SubscriptionId
  
# Get variable values
$DatabaseName = Get-AutomationVariable -Name 'DatabaseName'
$AnalysisServerName = Get-AutomationVariable -Name 'AnalysisServerName'

# Show info before processing (for testing/logging purpose only)
Write-Output "Processing $($DatabaseName) on $($AnalysisServerName)"

#Process database
$null = Invoke-ProcessASDatabase -databasename $DatabaseName -server $AnalysisServerName -RefreshType "Full" -Credential $SPCredential  

# Show done when finished (for testing/logging purpose only)
Write-Output "Done"

Note 1: This is a very basic script. No error handling has been added. Check the AzureAutomationTutorialScript for an example. Fine-tune it for your own needs.
Note 2: Because Azure Automation doesn't support Out-Null, I used another trick with $null =. The Write-Outputs are for testing purposes only; nobody sees them when the runbook is scheduled.


8) Testing
You can use the Test Pane menu option in the editor to test your PowerShell scripts. When clicking on Run it will first Queue the script before Starting it. Running takes a couple of minutes.
Testing the script in the Test Pane

After that, use SSMS to log in to your Azure Analysis Services and check the properties of your database. The Last Data Refresh should be very recent.
Login with SSMS to check the Last Data Refresh property

9) Publish
When your script is ready, it is time to publish it. Above the editor click on the Publish button. Confirm overriding any previously published versions.
Publish the Runbook

10) Schedule
And now that we have a working and published Azure Runbook, we need to schedule it. Click on Schedule to create a new schedule for your runbook. For the process cube script I created a schedule that runs every working day at 9:00 PM (21:00) to process the database. Afterwards, check the properties in SSMS to verify that the scheduled script works. It takes a few minutes to run, so don't worry too soon.
Add schedule

Summary
In this post you saw how you can process your Azure Analysis Services database with only a few lines of easy code. The module you need is not included by default, and it is a different module than the one used in the previous AAS PowerShell scripts on this blog.

You could borrow a few lines of code from the pause / resume script to check whether the server is online before processing it.
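
A minimal sketch of such a check, meant to be pasted into the runbook above just before the Invoke-ProcessASDatabase call (assumptions: the AzureRM.AnalysisServices module is added to the Automation account, the resource group name 'bitools' is illustrative, and the server name is derived from the AnalysisServerName variable of the script above):
# PowerShell sketch: resume the server if it is paused, before processing
$ResourceGroupName = "bitools"                    # illustrative value, use your own resource group
$ServerName = $AnalysisServerName.Split("/")[-1]  # strip the asazure://<region>.asazure.windows.net/ prefix

# Get the current state of the server (for example 'Paused' or 'Succeeded')
$AsServer = Get-AzureRmAnalysisServicesServer -ResourceGroupName $ResourceGroupName -Name $ServerName

if ($AsServer.State -eq "Paused")
{
    Write-Output "Server is paused, resuming it before processing"
    $null = Resume-AzureRmAnalysisServicesServer -ResourceGroupName $ResourceGroupName -Name $ServerName
}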