Monday, 16 October 2017

Power BI - Use Bookmarks for Chart information

Do customers regularly ask you which data a chart shows and how that data is defined? They ask me. The October update of Power BI Desktop includes Bookmarks (preview). How can you use bookmarks in your dashboard to answer those questions in a user-friendly way?

Bookmarks in Power BI 

Bookmarks enable you to save interesting states of your report. A state is saved inside the report as a bookmark. Bookmarks help you tell a story to a customer (like a PowerPoint presentation), navigate through the report, or store important analyses. You can also use bookmarks to offer two different visuals for the same data, for example a table and a column chart, and switch between them with a button (a self-made image). In this post we will show you how to combine a chart and its related information using bookmarks.

Before we get started, it is important to know that Power BI Desktop is now available in the Windows Store. If you install this via the Windows Store, your Power BI Desktop will always be up to date and you no longer have to manually download and install a new version every month. A big advantage.

Windows Store - Power BI Desktop

1) Create the Report (images)
First, make sure you have turned on the preview feature 'Bookmarks' in the menu. Go to Options and settings and click on Options. On the left (Global) you see Preview features, click on this. Select the bookmark feature and save your settings. You may have to restart Power BI Desktop.

Power BI Desktop - Turn on preview features

We use the same report from an earlier post. In this report, we have four visuals and three of them are charts. We want to show some information about the chart 'Total Sales and Profit per Month' in the upper left corner.

First, we insert a self-made black question mark image in the upper right corner of the chart. This will be the default look of the report (and will become the first bookmark later on). Next, we want to show the information about the chart, so we insert a second question mark image on the same spot, a gray one. The black question mark is no longer visible. Last, we insert a text box on the same spot as the chart, so the chart is no longer visible either. This text box contains information about the report and you can also add a link for additional information. The report now shows us information about the chart (screenshot 2).

Power BI Desktop - Add all the visuals to the report

2) Create the Bookmarks
We want to create interaction between those two states, because now only the last visuals (with information about the chart) are visible in the report. For this we need the bookmarks.

The first bookmark is the state of the original report, with the chart. Note that this state has a black question mark. For this bookmark, we have to hide the other visuals. In this case, the gray question mark image and the text box with information. Go to the 'View' menu and select the 'Selection Pane'. On the right side the 'Selection' pane will appear. Deselect the gray question mark image and text box in 'Selection'. Now we see the first original state again. We want to save this state by adding a bookmark. Go to 'View' and select the 'Bookmarks Pane'. On the right the 'Bookmarks' pane appears. Click on 'Add' and give the bookmark a suitable name.

The second bookmark must contain the information about the chart. Open the 'Selection Pane' again and make the gray question mark image and the text box visible. Now hide the chart 'Total Sales and Profit per Month' and the black question mark image. Save this as a new bookmark.

Finally, we must make sure that a click on the black question mark takes us to the information state, and that a click on the gray question mark takes us back to the default state. Click on the first bookmark called 'Home (overview)' and select the black question mark image. You will see a new option called 'Link'. Turn this on, choose 'Bookmark' as type and select the bookmark 'Chart 1 information (selected)'. Now select the second bookmark and follow the same steps, but this time select the gray question mark and link this image to the first bookmark 'Home (overview)'.

Power BI Desktop - Create the bookmarks

First, you have to create all the visuals in your report and then you can create bookmarks in combination with the selection pane to show or hide those visuals for a specific bookmark. Otherwise, when you first create a bookmark of the original state and then add new visuals for the second bookmark, the first bookmark will also inherit those new visuals.
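
One way to picture why the order matters is a toy model in which a bookmark only remembers which visuals were hidden at the moment it was created. The sketch below is an illustration of the behaviour described above, not of how Power BI stores bookmarks internally; the `Report` class and its methods are made up for this example.

```python
# Toy model (an assumption, not Power BI internals): a bookmark remembers
# only the visuals that were hidden at the moment it was created.
from dataclasses import dataclass, field


@dataclass
class Report:
    visuals: set = field(default_factory=set)
    bookmarks: dict = field(default_factory=dict)

    def add_visual(self, name):
        self.visuals.add(name)

    def add_bookmark(self, name, hidden):
        # Store a copy of the hidden visuals for this state.
        self.bookmarks[name] = set(hidden)

    def visible_in(self, bookmark):
        # Any visual the bookmark does not know about stays visible.
        return self.visuals - self.bookmarks[bookmark]


# Wrong order: create the bookmark first, add the extra visuals afterwards.
wrong = Report()
wrong.add_visual("chart")
wrong.add_visual("black question mark")
wrong.add_bookmark("Home (overview)", hidden=[])
wrong.add_visual("gray question mark")
wrong.add_visual("text box")

# Right order: create all visuals first, then hide some per bookmark.
right = Report()
for visual in ["chart", "black question mark", "gray question mark", "text box"]:
    right.add_visual(visual)
right.add_bookmark("Home (overview)", hidden=["gray question mark", "text box"])

print(sorted(wrong.visible_in("Home (overview)")))
print(sorted(right.visible_in("Home (overview)")))
```

In the first report the text box and gray question mark leak into the 'Home (overview)' state, while in the second report only the chart and the black question mark remain visible.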

3) Result
We will now view the result in the Power BI service. Open the published report and click on the black question mark. The information about the chart is now displayed instead of the chart itself. A click on the gray question mark will bring you back to the original report with the chart. It works!

Power BI Service - Result

Bookmarks View
There is a new feature in the 'View' menu, the 'Bookmarks pane'. It is below the already existing 'Selection pane'. Turn this feature on and the 'Bookmarks' pane appears on the right. Click on 'View' in 'Bookmarks' and you will see a Bookmark bar below. In this bar you can switch between the bookmarks. Enter the full screen mode and it looks like a PowerPoint presentation with different slides. Cool!

Power BI Service - Bookmarks View

Power BI Desktop
You can also open the bookmarks view in Power BI Desktop to make sure it works before publishing the report. Open the 'Bookmarks pane' as before and click on the same 'View' button. It works the same as in the Power BI service.

You can download the Power BI report here.

In this post you saw how to customize your dashboard using bookmarks. This is one of the many ways to use them, besides presentations or storing important analyses. With this, you do not have to refer your users to another location for associated documentation. Also, as I see in practice, a separate information page (a sort of landing page) is no longer necessary.

Since this is a preview, there probably will be more features in the next versions. Go with it and share your experiences with Microsoft (or us). Feedback is of course appreciated.

Wednesday, 11 October 2017

Schedule Process Azure Analysis Services database

I have my tabular database hosted in Azure Analysis Services, but how do I process it without on-premises tools or services?
Process Azure Analysis Services

One solution could be using some PowerShell code in Azure Automation Runbooks. With only a few lines of code you can process your database.

1) Automation Account
First we need an Azure Automation Account to run the Runbook with our PowerShell code. If you don't have one or want to create a new one, then search for Automation under Monitoring + Management and give it a suitable name, then select your subscription, resource group and location. For this example I will choose West Europe since I'm from the Netherlands. Also make sure the Create Azure Run as account option is on Yes (we need it for step 3).
Azure Automation Account

2) Credentials
The next step is to create Credentials to run this Runbook with. This works very similarly to the Credentials in SQL Server Management Studio. Go to the Azure Automation Account and click on Credentials in the menu. Then click on Add New Credentials. You could just use your own Azure credentials, but the best option is to use a service account with a non-expiring password. Otherwise you need to change the password regularly. Make sure this account has the appropriate rights to process the cube.
Create new credentials

3) Connections
This step is for your information only and to understand the code. Under Connections you will find a default connection named 'AzureRunAsConnection' that contains information about the Azure environment, like the tenant id and the subscription id. To prevent hardcoded values we will retrieve these fields in the PowerShell code.
Azure Connections

4) Variables
Another option to prevent hardcoded values in your PowerShell code is to use Variables. We will use this option to provide the Analysis Server Name and the Database Name to specify which database you want to process. Go to Variables and add a new string variable AnalysisServerName and add the name of the server that starts with asazure:// as value. Then repeat this with a string variable called DatabaseName for the database name of your tabular model. You can find the values on the Azure Analysis Services Overview page.
Add variables

5) Modules
The Azure Analysis Services process methods (cmdlets) are in a separate PowerShell module called "SqlServer" which is not included by default. If you do not add this module you will get errors telling you that the method is not recognized. Note that this is a different module than the one used for pausing/resuming and upscaling/downscaling AAS.

Go to the Modules page and check whether you see the SqlServer module in the list. If not then use the 'Browse gallery' button to add it. Adding a module could take a few moments.
Add modules

6) Runbooks
Now it is finally time to add a new Azure Runbook for the PowerShell code. Click on Runbooks and then add a new runbook (There are also five example runbooks of which AzureAutomationTutorialScript could be useful as an example). Give your new Runbook a suitable name and choose PowerShell as type.
Add Azure Runbook

7) Edit Script
After clicking Create in the previous step the editor will open. When editing an existing Runbook you need to click on the Edit button to edit the code. You can copy and paste the code below to your editor. Study the green comments to understand the code. Also make sure to compare the variable names in the code to the ones created in step 4 and change them if necessary.
Edit the PowerShell code

# PowerShell code 
# Connect to a connection to get TenantId and SubscriptionId
$Connection = Get-AutomationConnection -Name "AzureRunAsConnection"
$TenantId = $Connection.TenantId
$SubscriptionId = $Connection.SubscriptionId
# Get the credential created in step 2 (stored in the automation account).
$SPCredential = Get-AutomationPSCredential -Name "SSISJoost"
# Login to Azure ($null is to prevent output, since Out-Null doesn't work in Azure)
Write-Output "Login to Azure using automation account 'SSISJoost'."
$null = Login-AzureRmAccount -TenantId $TenantId -SubscriptionId $SubscriptionId -Credential $SPCredential
# Select the correct subscription
Write-Output "Selecting subscription '$($SubscriptionId)'."
$null = Select-AzureRmSubscription -SubscriptionID $SubscriptionId
# Get variable values
$DatabaseName = Get-AutomationVariable -Name 'DatabaseName'
$AnalysisServerName = Get-AutomationVariable -Name 'AnalysisServerName'

# Show info before processing (for testing/logging purpose only)
Write-Output "Processing $($DatabaseName) on $($AnalysisServerName)"

#Process database
$null = Invoke-ProcessASDatabase -databasename $DatabaseName -server $AnalysisServerName -RefreshType "Full" -Credential $SPCredential  

# Show done when finished (for testing/logging purpose only)
Write-Output "Done"

Note 1: This is a very basic script. No error handling has been added. Check the AzureAutomationTutorialScript for an example. Fine-tune it for your own needs.
Note 2: Because Azure Automation doesn't support Out-Null I used another trick with the $null =. The Write-Outputs are for testing purposes only. Nobody sees them when the runbook is scheduled.

8) Testing
You can use the Test Pane menu option in the editor to test your PowerShell scripts. When you click on Run, the script is first queued and then started. Running takes a couple of minutes.
Testing the script in the Test Pane

After that, use SSMS to log in to your Azure Analysis Services server and check the properties of your database. The Last Data Refresh should be very recent.
Login with SSMS to check the Last Data Refresh property

9) Publish
When your script is ready, it is time to publish it. Click on the Publish button above the editor and confirm overwriting any previously published version.
Publish the Runbook

10) Schedule
Now that we have a working and published Azure Runbook, we need to schedule it. Click on Schedule to create a new schedule for your runbook. For the process cube script I created a schedule that runs every working day at 9:00 PM (21:00) to process the database. Afterwards, check the properties in SSMS to verify that the scheduled script works. It takes a few minutes to run, so don't worry too soon.
Add schedule

In this post you saw how you can process your Azure Analysis Services database with only a few lines of easy code. The module you need is not included by default and it is a different module than the previous AAS PowerShell scripts from this blog.

You could borrow a few lines of code from the pause / resume script to check whether the server is online before processing it.

Monday, 25 September 2017

Power BI Snack : Drillthrough in Power BI

In SQL Server Reporting Services (SSRS) we are used to using subreports and drillthrough, but in Power BI this was not possible until the September release. How does it work?

Drillthrough in Power BI

As said, we use this functionality a lot in SSRS. For example, when you are dealing with multiple audiences who use the report. Managing board or senior management are interested in the total sales per year/month. An operational manager is probably interested in the same numbers, but then by week/day, per location and more details about the products sold.

To build a drillthrough report, we created a Power BI report based on Microsoft's sample database. You can download the database WideWorldImporters and more here. Our first report is the main report, where we show the total sales and profit per month and per state. You can also filter this data per year.

1) Configure the drillthrough filter
After creating the main report, we want to show more details about the sales and profit per state. For this we have built another report (subreport) with a number of graphs and one table for the details. This table shows the sales and profit per city. Next, we add the Drillthrough filter in the subreport. Go to Filters and now you also see Drillthrough filters as possibility. Add the appropriate column here, in this case 'State Province'.

Power BI - Drillthrough filter

Once you added the filter, an arrow (icon) appears automatically in the upper left corner of the "subreport". By clicking this arrow, you return to the main report. You can customize this arrow to your desired layout.

Power BI - Back to main report navigator (default and customized)

2) Result
Now we can use the drillthrough functionality in the main report. Go to the table and right click on a state, for example 'Alabama'. At the bottom, select Drillthrough and the subreport Sales per State. The subreport will now be opened with all sales data for the state Alabama. When you want to return to the main report, click the blue arrow in the upper left corner.

Unfortunately you can only filter the subreport with the drillthrough filter. So when you have multiple report filters in your report, for example 'Calendar Year', the subreport will not filter on year. As soon as this becomes possible, the other subreport called 'Sales per State (future)' is a better solution.
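
The limitation can be illustrated with a small sketch that mimics the behaviour described above: only the drillthrough field reaches the subreport, and any other report filter is dropped. The rows, the `drillthrough` function and its parameters are made up for illustration and are not part of Power BI.

```python
# Hypothetical sales rows; the numbers are invented for this example.
sales = [
    {"state": "Alabama", "year": 2015, "sales": 100},
    {"state": "Alabama", "year": 2016, "sales": 150},
    {"state": "Texas", "year": 2016, "sales": 200},
]


def drillthrough(rows, drill_field, drill_value, report_filters=None):
    """Return the rows the subreport would show.

    report_filters is accepted but deliberately ignored, to mirror the
    limitation that only the drillthrough filter is passed on.
    """
    return [row for row in rows if row[drill_field] == drill_value]


# The main report is filtered on 2016, yet the subreport shows both years.
subreport_rows = drillthrough(sales, "state", "Alabama", report_filters={"year": 2016})
print(subreport_rows)
```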

You can download the Power BI report here.

Power BI - Drillthrough to your subreport

Power BI has taken a good step towards more interaction between different reports; hopefully this is just the beginning. There are still some improvements to be made, for example the possibility to pass all selected report filters from the main report to the subreport. For now, only the selected drillthrough filter is passed through.

You can vote on this idea here. Of course, we already voted.

Tuesday, 29 August 2017

Azure - Continue with Azure Data Lake for Big Data

In an earlier post we showed you how to transform sensor data using Azure Data Lake. Many companies are gathering (or already have) a lot of Big Data in many different files. How can we use Azure Data Lake Analytics (ADLA) to handle these files?

Big Data and U-SQL

Just like in the previous post, the sensor data is already stored in an Azure Data Lake Store (ADLS). Next, we build and configure a U-SQL job. U-SQL is Microsoft's new Big Data query language that you can use in ADLA. Last time we developed in the Azure Portal, but there are other options. Last month, Microsoft released a Visual Studio plug-in for Azure Data Lake and Stream Analytics. This allows you, while writing U-SQL queries, to use other benefits of Visual Studio like Team Foundation Server (TFS), debugging and adding C# code for custom inputs and outputs.

In this case we have sensor data from one year. The data is stored in several files: one file per day. We want to create a U-SQL job that aggregates the data per day and then stores all the data. For now we focus on the query itself. See here how to create an ADLA service/account and to create a new U-SQL Job.

1) Install plug-in for Visual Studio
First we have to download and install the plug-in Microsoft Azure Data Lake and Stream Analytics Tools for Visual Studio. You can download the plug-in here. Besides the creating and debugging of U-SQL scripts, you can also build queries of Azure Stream Analytics jobs using this plug-in.

2) Write the Query
Open Visual Studio and create a new U-SQL project. Our U-SQL script is called 'multipleFiles'. The starting point is the query we made in an earlier post extracting one single sensor file.

Because we have multiple files, we create a dynamic FROM clause using variables, in this case for the folder path in ADLS. We use the following syntax for this: "bitools_sample_data_{*}.csv". This is a wildcard and will get you every file of the year (see the comment in the query below for the structure of the input files). We also skip the first row, which contains the headers.

// File naming convention: bitools_sample_data_01-01-2016.csv, bitools_sample_data_01-02-2016.csv etc.
// Create variable for input files
DECLARE @folderInput string = "/SensorData/Input/";
DECLARE @inputString string = @folderInput + "bitools_sample_data_{*}.csv";
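
To get a feel for which files such a wildcard picks up, the sketch below approximates the pattern with Python's `fnmatch`, where `*` stands in for U-SQL's `{*}`. This is only an approximation of the matching rule, not U-SQL itself, and the file names other than the two from the naming convention above are hypothetical.

```python
from fnmatch import fnmatchcase

FOLDER_INPUT = "/SensorData/Input/"
# '*' plays the role of the {*} wildcard in the U-SQL file set.
PATTERN = FOLDER_INPUT + "bitools_sample_data_*.csv"


def matching_files(paths, pattern=PATTERN):
    """Return the paths that match the file set pattern (case-sensitive)."""
    return [path for path in paths if fnmatchcase(path, pattern)]


candidates = [
    "/SensorData/Input/bitools_sample_data_01-01-2016.csv",
    "/SensorData/Input/bitools_sample_data_01-02-2016.csv",
    "/SensorData/Input/other_sensor_01-01-2016.csv",  # hypothetical, other prefix
    "/SensorData/Output/bitools_sample_data_AveragePerDayPerLocation.csv",
]

for path in matching_files(candidates):
    print(path)
```

Only the two daily input files match; the file with a different prefix and the file in the Output folder are skipped.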

To retrieve the data from the files, we use an EXTRACT statement. In an earlier post, we extracted the data as a string. Now we extract the 'time' column as date time format (just like the source file), using the variable in the FROM clause we created earlier.

// Extract the sensor data from CSV file (skip the header)
@sensorData = 
    EXTRACT
        [time]                    DateTime
    ,   [dsplid]                  string
    ,   [dspl]                    string
    ,   [temp]                    string
    ,   [hmdt]                    string
    ,   [status]                  string
    ,   [location]                string
    ,   [EventProcessedUtcTime]   string
    ,   [PartitionId]             string
    ,   [EventEnqueuedUtcTime]    string
    FROM @inputString
    USING Extractors.Csv(skipFirstNRows:1);

Next we aggregate the data into averages based on the 'time' and 'location' column, using a SELECT statement. We convert the 'time' column to a date format, because we want to aggregate per day. We give the column names a suitable name. You may have noticed that we do not select all the columns, because we do not need all columns from the source file.

// Aggregate the sensor data (average per location) and data type conversions
@result =
    SELECT
        [time].ToString("yyyy-MM-dd") AS Date
    ,   AVG(Convert.ToInt32([temp])) AS Temperature
    ,   AVG(Convert.ToInt32([hmdt])) AS Humidity
    ,   [location] AS Location
    FROM @sensorData
    GROUP BY
        [time].ToString("yyyy-MM-dd")
    ,   [location];
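
For readers who want to check the logic on a few rows first, the same aggregation (average temperature and humidity per day and per location) can be sketched in plain Python. The sample rows and the timestamp format are assumptions made up for this example; they are not taken from the real sensor files.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical sample rows: (time, temp, hmdt, location), all read as text.
rows = [
    ("2016-01-01 08:00:00", "20", "40", "Room A"),
    ("2016-01-01 16:00:00", "22", "44", "Room A"),
    ("2016-01-01 09:00:00", "18", "50", "Room B"),
    ("2016-01-02 08:00:00", "21", "42", "Room A"),
]


def average_per_day_per_location(rows):
    """Group by (day, location) and average temp and hmdt."""
    groups = defaultdict(list)
    for time, temp, hmdt, location in rows:
        # Truncate the timestamp to a date, like ToString("yyyy-MM-dd").
        day = datetime.strptime(time, "%Y-%m-%d %H:%M:%S").strftime("%Y-%m-%d")
        groups[(day, location)].append((int(temp), int(hmdt)))

    result = []
    for (day, location), values in sorted(groups.items()):
        temps = [t for t, _ in values]
        hmdts = [h for _, h in values]
        result.append((day, sum(temps) / len(temps), sum(hmdts) / len(hmdts), location))
    return result


for line in average_per_day_per_location(rows):
    print(line)
```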

Finally, we save the data in a new CSV file. In the OUTPUT statement, you can also add an ORDER BY clause. We want the header back in our output data and therefore we use 'outputHeader'.

// Save the sensor data to a new CSV file
OUTPUT @result
TO "/SensorData/Output/bitools_sample_data_AveragePerDayPerLocation.csv"
ORDER BY
    [Location] ASC
USING Outputters.Csv(outputHeader : true, quoting:false);

See below a screenshot of the full query in Visual Studio.

Visual Studio - U-SQL script

3) Run the Job
When you have built the query, click 'Submit' and the Job View screen appears automatically. This is similar to Job Details in the Azure Portal that we used earlier. But when you look closely, you see that Visual Studio offers more information than the portal, for example more details in the 'Job Summary' and more error details.

Visual Studio - Run U-SQL script

Error details
When you have an error in the U-SQL query, you can often see the details of this error directly in the 'Job View' screen. In case of a Vertex user code error, you do not immediately see the error details on this screen. If you want to see details of this error, scroll down in the 'Job Summary' and click on 'Resources'. Then choose 'Profile' and search for the keyword 'jobError'. This row contains the details of the error.

Visual Studio - U-SQL Query error details

4) Result
Now go to the Azure portal and to your Azure Data Lake Store. Open the new file in 'Data Explorer'. Our output file is located in the folder 'SensorData' and then 'Output'. The result should look like this:

Azure Portal - View result in Data Lake Store

In this post we went deeper into building a U-SQL script using Visual Studio. In our opinion, you should develop as much as possible in Visual Studio, because of benefits such as TFS integration and debugging.

Sunday, 20 August 2017

Use PolyBase to read Blob Storage in Azure SQL DW

I have a file in an Azure Blob Storage container which I want to use in my Azure SQL Data Warehouse. How can I push the content of that file to Azure SQL DW?

You could of course use an ETL product or Azure Data Factory, but you can also use PolyBase technology in Azure SQL DW and use that file as an external table. The data stays in the Azure Blob Storage file, but you can query the data like a regular table.

Before we start, make sure your Azure SQL Data Warehouse is started and use SQL Server Management Studio (SSMS) to connect to your Data Warehouse. Notice that the icon of a SQL DW is different than SQL DB.

1) Master key
In the next step we will use a credential that points to the Azure Blob Storage. To encrypt that credential, we first need to create a master key in our Azure SQL Data Warehouse, but only if you do not already have one. You can check that in the table sys.symmetric_keys. If a row exists where the symmetric_key_id column is 101 (or the name column is '##MS_DatabaseMasterKey##'), then you already have a master key. Otherwise we need to create one. For Azure SQL Data Warehouse a password for that master key is optional. For this example we will not use a password.
--Master key
IF NOT EXISTS (SELECT * FROM sys.symmetric_keys WHERE symmetric_key_id = 101)
BEGIN
    PRINT 'Creating Master Key'
    CREATE MASTER KEY;
END
ELSE
    PRINT 'Master Key already exists'

2) Credentials
Next step is to create a credential which will be used to access the Azure Blob Storage. Go to the Azure portal and find the Storage Account that contains your blob file. Then go to the Access keys page and copy the key1 (or key2).
Access keys

Then execute the following code where IDENTITY contains a random string and SECRET contains the copied key from your Azure Storage account.
CREATE DATABASE SCOPED CREDENTIAL AzureStorageCredential
WITH
    IDENTITY = 'user',
    SECRET = 'JGadV/tAt1npuNwkiH9HnI/wosi8YS********==';

Tip: give the credential a descriptive name so that you know what it is used for. You can find all credentials in the table sys.database_credentials:
--Find all credential
SELECT * FROM sys.database_credentials

3) External data source
With the credential from the previous step we will create an External data source that points to the Azure Blob Storage container where your file is located. Execute the code below where:
  • TYPE = HADOOP (because PolyBase uses the Hadoop APIs to access the container)
  • LOCATION = the connection string to the container (replace [ContainerName] with the name of the container and [StorageAccountName] with the name of your storage account).
  • CREDENTIAL = the name of the credentials created in the previous step.
--Create External Data Source
CREATE EXTERNAL DATA SOURCE AzureStorage WITH (
    TYPE = HADOOP,
    LOCATION = 'wasbs://[ContainerName]@[StorageAccountName].blob.core.windows.net',
    CREDENTIAL = AzureStorageCredential
);
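
If you generate this statement from a deployment script, the LOCATION value can be assembled from the two names. The sketch below assumes the usual form of a wasbs:// address, which ends in .blob.core.windows.net; the function name and the example container and account names are hypothetical.

```python
def polybase_location(container_name, storage_account_name):
    """Build the LOCATION value for a Blob Storage external data source."""
    if not container_name or not storage_account_name:
        raise ValueError("container and storage account names are required")
    # Assumed format: wasbs://<container>@<account>.blob.core.windows.net
    return f"wasbs://{container_name}@{storage_account_name}.blob.core.windows.net"


# Hypothetical names, for illustration only.
print(polybase_location("sensordata", "mystorageaccount"))
```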

Tip: give the external source a descriptive name so that you know what it is used for. You can find all external data sources in the table sys.external_data_sources:
--Find all external sources
SELECT * FROM sys.external_data_sources

Notice that the filename is not mentioned in the External Data Source. This is done in the External Table. This allows you to use multiple files from the same container as External Tables.
Filename not in External Data Source

4) External File format
Now we need to describe the format used in the source file. In our case we have a comma delimited file. You can also use this file format to supply the date format, compression type or encoding.
--Create External File Format
CREATE EXTERNAL FILE FORMAT TextFile WITH (
    FORMAT_TYPE = DelimitedText,
    FORMAT_OPTIONS (FIELD_TERMINATOR = ','));

Tip: give the format a descriptive name so that you know what it is used for. You can find all external file formats in the table sys.external_file_formats:
--Find all external file formats
SELECT * FROM sys.external_file_formats

5) External Table
The last step before we can start querying is creating the external table. In this create table script you need to specify all columns, datatypes and the filename that you want to read. The filename starts with a forward slash. You also need the data source from step 3 and the file format from step 4.
--Create External table
CREATE EXTERNAL TABLE dbo.sensordata (
    [Date] DateTime2(7) NOT NULL,
    [temp] INT NOT NULL,
    [hmdt] INT NOT NULL,
    [location] nvarchar(50) NOT NULL
)
WITH (
    LOCATION='/[FileName]',   -- replace [FileName] with the name of your file
    DATA_SOURCE=AzureStorage, -- from step 3
    FILE_FORMAT=TextFile      -- from step 4
);
Note: PolyBase does not like column name headers. It will handle the header like a regular data row and throw an error when the datatype doesn't match. There is a little workaround for this with REJECT_TYPE and REJECT_VALUE. However, this only works when the datatype of the header is different from the datatypes of the actual rows. Otherwise you have to filter out the header row in a subsequent step.
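--Create External table with header
CREATE EXTERNAL TABLE dbo.sensordata5 (
    [Date] DateTime2(7) NOT NULL,
    [temp] INT NOT NULL,
    [hmdt] INT NOT NULL,
    [location] nvarchar(50) NOT NULL
)
WITH (
    LOCATION='/[FileName]',   -- replace [FileName] with the name of your file
    DATA_SOURCE=AzureStorage, -- from step 3
    FILE_FORMAT=TextFile,     -- from step 4
    REJECT_TYPE = VALUE, -- Reject rows with wrong datatypes
    REJECT_VALUE = 1     -- Allow 1 failure (the header)
);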
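
The idea behind this workaround can be sketched in a few lines of Python: a row whose values cannot be converted to the column datatypes is rejected, and the load only fails once more rows are rejected than REJECT_VALUE allows. This is a simplified model based on the comments in the script above, not a description of PolyBase internals; the `load_rows` function and the sample lines are made up.

```python
def load_rows(lines, reject_value=0):
    """Parse 'date,temp,hmdt,location' lines; temp and hmdt must be integers.

    Rows that fail the conversion are rejected. The load fails as soon as
    the number of rejected rows exceeds reject_value.
    """
    accepted = []
    rejected = 0
    for line in lines:
        date, temp, hmdt, location = line.split(",")
        try:
            accepted.append((date, int(temp), int(hmdt), location))
        except ValueError:
            rejected += 1
            if rejected > reject_value:
                raise ValueError(
                    f"{rejected} rejected rows exceed REJECT_VALUE={reject_value}"
                )
    return accepted


# Hypothetical file content: one header row followed by two data rows.
lines = [
    "Date,temp,hmdt,location",
    "2016-01-01,21,42,Room A",
    "2016-01-01,18,50,Room B",
]

print(load_rows(lines, reject_value=1))
```

With `reject_value=1` the header is silently dropped, while the default of 0 makes the same file fail. If every column were text, the header would convert without error and end up in the data, which is the case where you need to filter it out afterwards.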
You can find all external tables in the table sys.external_tables.
--Find all external tables
SELECT * FROM sys.external_tables
However you can also find the External Table (/the External Data Source/the External File Format) in the Object Explorer of SSMS.
SSMS Object Explorer

6) Query external table
Now you can query the external table like any other regular table. However, the table is read-only, so you cannot delete, update or insert records. If you update the source file, the data in this external table also changes instantly, because the file itself is used to get the data.
SELECT count(*) FROM dbo.sensordata;
SELECT * FROM dbo.sensordata;
Querying an external table

7) What is next?
Most likely you will be using a CTAS query (Create Table As Select) to copy and transform the data to another table, since this is the fastest/preferred way in SQL DW. In a subsequent post we will explain more about CTAS, but here is what a CTAS query looks like.
CREATE TABLE [dbo].[Buildings]
WITH (
    DISTRIBUTION = ROUND_ROBIN -- example distribution; choose what fits your data
)
AS
SELECT  [location]
,       [date]
,       [temp]
,       [hmdt]
FROM    [dbo].[sensordata];

In some cases you could also use a SELECT INTO query as an alternative to CTAS.

In this post you saw how easy it is to read a file from the Azure Blob Storage and use it as a table in Azure SQL Data Warehouse. The big advantage of PolyBase is that you only have one copy of the data, because the data stays in the file. In a next post we will see how to read the same file from the Azure Data Lake Store, which does not use the Access keys.
In another post we will explain the basic usage of the CTAS query, which is the preferred way to handle large sets of data in Azure SQL DW and in its on-premises precursor APS (a.k.a. PDW).

Sunday, 30 July 2017

Azure SQL Database vs Azure SQL Data Warehouse

As we slowly move from on-premises data warehouses with Microsoft SQL Server to cloud data warehouses in Microsoft Azure, we need to know more about the various options in Azure. You probably already used an Azure SQL Database, but Microsoft also introduced Azure SQL Data Warehouse. What are the differences between these two databases?
Azure SQL DB vs Azure SQL DW

Back in 2013, Microsoft introduced Azure SQL Database, which has its origin in the on-premises Microsoft SQL Server. In 2015 (although public availability only followed in July 2016) Microsoft added SQL Data Warehouse to the Azure cloud portfolio, which has its origin in the on-premises Microsoft Analytics Platform System (APS). This was a Parallel Data Warehouse (PDW) combined with Massively Parallel Processing (MPP) technology and included standard hardware. It is the 'big brother' of SQL Server, but with a slightly different purpose.

In this post we will briefly describe the differences between these two Microsoft Azure Services, but first Microsoft's own definitions:
  • Azure SQL Database is a relational database-as-a-service using the Microsoft SQL Server Engine (more);
  • Azure SQL Data Warehouse is a massively parallel processing (MPP) cloud-based, scale-out, relational database capable of processing massive volumes of data (more).

1) Purpose: OLAP vs OLTP
Although both Azure SQL DB and Azure SQL DW are cloud-based systems for hosting data, their purpose is different. The biggest difference is that SQL DB is specifically for Online Transaction Processing (OLTP). This means operational data with a lot of short transactions like INSERT, UPDATE and DELETE by multiple people and/or processes. The data is most often highly normalized and stored in many tables.

On the other hand, SQL DW is specifically for Online Analytical Processing (OLAP) for data warehouses. This means consolidated data with a lower volume of transactions, but more complex queries. The data is most often stored de-normalized in fewer tables, using a star or snowflake schema.

2) Architecture
To make the differences clearer, here is a quick preview of the architecture of Azure SQL Data Warehouse, where you see a whole collection of Azure SQL Databases and separated storage. The maximum number of compute nodes at the moment is 60.
    Architecture SQL DW: Decouples (Blob) storage from compute (SQL DB)

    3) Storage size
    The current size limit of an Azure SQL Database is 4TB, but it has been getting bigger over the past few years and will probably end up around 10TB in the near future. On the other hand we have the Azure SQL Data Warehouse which has no storage limit at all (only the limit of your wallet), because the storage is separated from the compute.

    3) Pricing
    The pricing is also quite different. Where Azure SQL DB starts with €4,20 a month, Azure SQL DW starts around €900,- a month excluding the cost of storage which is included in SQL DB. The storage costs for Azure SQL DW are around €125,- per TB per month. And the maximum costs of a single SQL DB is around €13500,- where SQL DW ends around a massive €57000,- (excl. storage). But when you take a look at the architecture above, it should be no surprise that SQL DW is more expensive than SQL DB, because it consists of multiple SQL DBs.

    However, SQL DW has one big trick up its sleeve that SQL DB hasn't: you can pause it completely and then you only pay for storage. If you start your SQL DW with your ETL job and pause it right after you processed your Azure Analysis Services then you only need it a small percentage of the month.

    Note: prices are from July 2017
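The effect of pausing can be estimated with some simple arithmetic. The sketch below is only an illustration: it uses the July 2017 figures quoted above, and it assumes that compute is billed in proportion to the hours the SQL DW is online and that a month has 730 hours. The function name and parameters are made up for this example.

```python
def monthly_sqldw_cost(compute_per_month, storage_per_tb_month, storage_tb,
                       hours_online, hours_in_month=730):
    """Estimate the monthly SQL DW bill when compute is paused part of the time.

    Assumption: compute cost scales linearly with the hours online.
    """
    if not 0 <= hours_online <= hours_in_month:
        raise ValueError("hours_online must be between 0 and hours_in_month")
    compute = compute_per_month * hours_online / hours_in_month
    storage = storage_per_tb_month * storage_tb  # storage is billed even while paused
    return round(compute + storage, 2)


# Lowest tier (about 900 euro a month), 1 TB of storage,
# online 2 hours per day during a 30-day month (60 hours)
print(monthly_sqldw_cost(900, 125, 1, hours_online=60))
```

With these assumptions, a SQL DW that is only online during the nightly ETL window costs a fraction of the full monthly price, while the storage part of the bill stays the same.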

    4) DTU vs DWU
    SQL DB has 15 different pricing tiers which specify the number of Database Transaction Units  (DTU) and the storage size/type:
    - Basic
    - Standard (S0, S1, S2, S3)
    - Premium (P1, P2, P3, P4, P6, P11, P15)
    - Premium RS (PRS1, PRS2, PRS4, PRS6)
    Basic has only 5 DTUs and the highest number of DTUs is, at the time of writing, 4000.
    The term DTU is a bit vague. It is a mysterious combination of RAM, CPU and read-write rates, but basically, if you want to double the performance of your current database, you just need to double the number of DTUs for your database.

    SQL DW has 12 different pricing tiers and uses Data Warehouse Units (DWU) to specify the performance level.
    - DWU100, 200, 300, 400, 500, 600, 1000, 1200, 1500, 2000, 3000, 6000
    The term DWU is a little less vague, because if you divide that number by 100 you have the number of compute nodes available for that pricing tier. On the other hand, the exact combination of CPU, memory and IOPS per compute node is unknown.

    Because both services have a different purpose it is a bit strange to compare the hardware, but according to this MSDN blog post 1 DWU is approximately 7,5 DTU.
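As a small illustration of the two rules of thumb above (DWU divided by 100 gives the number of compute nodes, and 1 DWU is roughly 7.5 DTU), here is a minimal Python sketch. The function name is made up for this example, and the DTU figure is only the approximation from the blog post, not an official conversion.

```python
DTU_PER_DWU = 7.5  # approximation quoted in the text, not an official figure


def describe_dwu(dwu):
    """Return (compute_nodes, approximate_dtu) for a DWU level such as 400."""
    if dwu <= 0 or dwu % 100 != 0:
        raise ValueError("DWU levels are multiples of 100")
    return dwu // 100, dwu * DTU_PER_DWU


for level in (100, 400, 6000):
    nodes, dtu = describe_dwu(level)
    print(f"DWU{level}: {nodes} compute node(s), roughly {dtu:.0f} DTU")
```

For DWU6000 this gives 60 compute nodes, which matches the maximum mentioned in the architecture section.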

    But there is also some similarity: for both services you can use the same script to change the pricing tier on the fly, either to give the performance a real boost when needed or to save money in the quiet hours.

    5) Concurrent Connection
    Although SQL DW is a collection of SQL Databases the maximum number of concurrent connections is much lower than with SQL DB. SQL DW has a maximum of 1024 active connections where SQL DB can handle 6400 concurrent logins and 30000 concurrent sessions. This means that in the exceptional case where you have over a thousand active users for your dashboard you probably should consider SQL DB to host the data instead of SQL DW.
    For more details see the SQL DB Resource Limitations and SQL DW Resource Limitations.

    6) Concurrent Queries
    Besides the maximum connections, the number of concurrent queries is also much lower. SQL DW can execute a maximum of 32 queries at one moment where SQL DB can have 6400 concurrent workers (requests). This is where you see the differences between OLTP and OLAP.
    For more details see the SQL DB Resource Limitations and SQL DW Resource Limitations.

    7) PolyBase
    Azure SQL Data Warehouse supports PolyBase. This technology allows you to access data outside the database with regular Transact-SQL. It can, for example, use a file in an Azure Blob Storage container as an (external) table. Other options are importing and exporting data from Hadoop or Azure Data Lake Store. Although SQL Server 2016 also supports PolyBase, Azure SQL Database does not (yet?) support it.

    8) Query language differences
    Although SQL DW uses SQL DB in the background, there are a few minor differences when querying or creating tables:
    - SQL DW cannot use cross-database queries. So all your data should be in the same database.
    - SQL DW can use IDENTITY, but only for INT or BIGINT. Moreover the IDENTITY column cannot be used as part of the distribution key.
    - Also see this SQL DW list of unsupported table features.

    9) Replication
    SQL DB supports active geo-replication. This enables you to configure up to four readable secondary databases in the same or different location. SQL DW does not support active geo-replication, only Azure Storage replication. However this is not a live, readable, synchronized copy of your database! It's more like a backup.

    10) In Memory OLTP tables
    SQL DB supports in-memory OLTP. SQL DW is OLAP and does not support it.

    11) Always encrypted
    SQL DB supports Always Encrypted to protect sensitive data. SQL DW does not support it.

    Although Azure SQL DB looks much cheaper on a monthly basis, this doesn't mean you should always choose SQL DB by default. One big advantage of SQL DW is that you can pause it. For example, a stage database is only used during the ETL process. Why should you always have this database up and running? Or why should your datamart stay online 24*7 if your end users only use Analysis Services to browse the data?

    On the other hand the original purpose of both services is different (OLTP vs OLAP), but this doesn't mean you should always use Azure SQL DW for your data warehouses. Depending on several factors like data size, complexity, required up-time and budget, Azure SQL DB could also host your data warehouse. You could even mix them in your project. For example Stage and Historical/Persistent stage in Azure SQL DW and your Datamart in Azure SQL DB.

    Please leave a comment if you know more significant differences that are worth mentioning.

    Sunday, 23 July 2017

    Azure - Use Azure Data Lake for Big Data

    We have collected sensor data and we want to use this in a Data Warehouse (DWH). Because we do not want to store raw data in our database, we need to resolve this first. How can we accomplish this with Azure Data Lake?
    Over Azure Data Lake

    We use Azure Data Lake Store (ADLS) to store the sensor data. As we know from an earlier post, ADLS is extremely suitable for storing unstructured data and we showed an example of how you can store this sensor data. In the first example the data, one file of each day, is already stored in ADLS. You can download the file here. Next we are going to aggregate this data per day and create a file that is ready to load.

    To accomplish this we will use another feature of Azure Data Lake, called Azure Data Lake Analytics (ADLA). With this and Data Lake Store, Microsoft offers new features similar to Apache Hadoop to deal with petabytes of Big Data. The advantage of Data Lake Analytics is that it supports Hadoop, but also introduces a language similar to T-SQL, called U-SQL. This is Microsoft's new Big Data query language. It is T-SQL with a little touch of C# to add even more features to the language. Click here for more details about this.

    If you don't have sensor data ready, you can download a sensor generator for free. This 'SensorEventGenerator' can be found here. In addition, you have to create an Event Hub / IoT Hub to send the sensor data to Azure. Click here for more information. In our case we have generated our own data with a similar program and uploaded it to ADLS.

    1) Create Data Lake Analytics
    First we have to create a new Data Lake Service with a Data Lake Analytics account. We give the account a suitable name and we choose the same resource group as the ADLS uses. As a last step we need to choose the Data Lake Store, in our case 'bitoolsadls' where the sensor files are stored.

    Azure Portal - Create Data Lake Analytics

    The pricing is important to know. Creating the service is free, and it stays free while no jobs are running. You start paying when you run U-SQL jobs, and you pay for computing power (measured in Analytics Units, AUs). More information about pricing here. For example, if we want to run a job for 24 hours with 1 AU (assuming that is sufficient), this results in the following costs:

    Data Lake Analytics Cost Indication
    Depending on the amount of data, the number of Analytics Units (AUs) must be increased.

    1) Create new Job
    Now that everything is set, we must create a new U-SQL job. Go to the Data Lake Analytics account you just created and click on 'New Job'. Give it a suitable name. Notice that you have two other options to change: Priority and AUs. With 'Priority' you can determine the importance of the job. For example, if the job has Priority 1, this job will always start first. The other one is the number of Analytics Units (as we explained earlier). When you increase the 'AUs', you get a cost indication. For now we leave both on default, because it is our only job.

    2) Write a Query
    Let's get started with the query. As I mentioned before, it is similar to T-SQL, because you can also write SELECT statements with the familiar FROM and WHERE clauses and transfer the data to a new location. In addition, you must use C# for data type conversions and, for example, to determine today's date. In this case we retrieve the raw data, aggregate this data and store it in a new CSV file in a new ADLS folder.

    To retrieve the data from the file, use EXTRACT and FROM. You can also use U-SQL variables in the FROM clause to avoid hard-coded paths, but for now we hard-code our file path. We do not need to fill in the Data Lake name itself, because the default is our ADLS account. We also have to use the USING clause to specify the extractor format, in this case a CSV file. More details about this statement here. Important to know is that we specify that the first row must be skipped, because that's the header. The result of this query will be stored in the variable '@sensorData'.

    Next we want to aggregate our data, because the granularity of the raw data is per second. We take the averages of temperature (temp) and humidity (hmdt) per day and per location. We need a SELECT statement on '@sensorData' variable for this. Besides the aggregations, we are doing some datatype conversions, because extracting the data as string is currently the preferred way. We also use GROUP BY for dividing the result into groups (per day and per location). You may have noticed that we do not select all the columns, because we do not need all columns from the source file.

    Once we have retrieved and transformed the data, we want to save it in a new file. Therefore, we use the OUTPUT statement. Define the folder path. The folder will be created automatically together with the new file. Because we are also saving the new file in CSV format, we use the CSV outputter in the USING clause (the counterpart of the extractor), this time without the 'skipFirstNRows' parameter. Every time you start the job, it will overwrite the destination file.

    The query:

    // Extract the sensor data from CSV file (skip the header)
    @sensorData =
        EXTRACT
            [time]                    string
        ,   [dsplid]                  string
        ,   [dspl]                    string
        ,   [temp]                    string
        ,   [hmdt]                    string
        ,   [status]                  string
        ,   [location]                string
        ,   [EventProcessedUtcTime]   string
        ,   [PartitionId]             string
        ,   [EventEnqueuedUtcTime]    string
        FROM "/SensorData/Input/bitools_sample_data_01-01-2016.csv"
        USING Extractors.Csv(skipFirstNRows:1);
    // Aggregate the sensor data (average per day and per location) and data type conversions
    @result =
        SELECT
            DateTime.ParseExact([time], "yyyy-MM-dd HH:mm:ss", null).Date AS Date
        ,   AVG(Convert.ToInt32([temp])) AS Temperature
        ,   AVG(Convert.ToInt32([hmdt])) AS Humidity
        ,   [location] AS Location
        FROM @sensorData
        GROUP BY
            DateTime.ParseExact([time], "yyyy-MM-dd HH:mm:ss", null).Date
        ,   [location];
    // Save the sensor data to a new CSV file
    OUTPUT @result
    TO "/SensorData/Output/bitools_sample_data_01-01-2016_AveragePerDayPerLocation.csv"
    USING Outputters.Csv(outputHeader : true, quoting:false);

    Note 1:
    The statements EXTRACT and OUTPUT use absolute or relative file paths. That's why we cannot use SELECT for retrieving the data. We also use the C# syntax (//) to add comments. Click here and here for more information about the U-SQL language.

    Note 2:
    PolyBase does not handle column headers that well. If you want to read this output file with PolyBase, you could consider removing "outputHeader : true" from the OUTPUT part of the query.
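If you want to sanity-check the output of the job on a small sample, the same aggregation can be sketched locally in Python. This is only an illustration and not part of the U-SQL job: the sample rows below are made up, only the columns used by the aggregation are included, and the exact numeric types and rounding of the U-SQL output may differ.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime


def average_per_day_per_location(csv_text):
    """Average temp and hmdt per (date, location), like the U-SQL query."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Same format as the "yyyy-MM-dd HH:mm:ss" pattern in the U-SQL query
        day = datetime.strptime(row["time"], "%Y-%m-%d %H:%M:%S").date()
        groups[(day, row["location"])].append((int(row["temp"]), int(row["hmdt"])))
    result = []
    for (day, location), values in sorted(groups.items()):
        temps, hmdts = zip(*values)
        result.append((day.isoformat(),
                       sum(temps) / len(temps),
                       sum(hmdts) / len(hmdts),
                       location))
    return result


# Made-up sample rows, not taken from the real sensor file
sample = """time,temp,hmdt,location
2016-01-01 00:00:01,20,40,Office
2016-01-01 00:00:02,22,44,Office
2016-01-01 00:00:01,10,80,Warehouse
"""
for line in average_per_day_per_location(sample):
    print(line)
```

Each tuple holds the date, the average temperature, the average humidity and the location, in the same column order as the U-SQL result.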

    Azure Portal - Create the U-SQL Job

    3) Run the Job
    When the query is ready, click on 'Submit Job'. A new screen will appear where we can monitor the running job. When the job has successfully completed the 'Finalizing' step, we can preview the output file. Notice that the new file has 10 rows: one row per location with the average temperature and humidity.

    Azure Portal - Run the U-SQL Job

    Finally, let's see if the file is stored in the Azure Data Lake Store. Go to your ADLS and click on 'Data Explorer'. Find your output folder and there it is!

    Azure Portal - See result in Data Lake Store

    We showed you how Azure Data Lake is suitable for storing and transforming Big Data, in this case sensor data. Of course, there are more ways to get this done using the Cortana Intelligence Suite, for example Stream Analytics.

    In this post we used one single file, but companies often have hundreds or thousands of files. In a next post we show you how to handle multiple input files in Azure Data Lake Analytics.

    Saturday, 1 July 2017

    SSMS Snack: Start SSMS as different user

    My company uses different Windows users for the various DWH environments (Development, Test, Acceptance & Production). How can I connect to a SQL Server instance in SQL Server Management Studio (SSMS) with a different user so that I can still use Windows Authentication?
    Windows Authentication

    The quick solution is to hold the Shift-key while right clicking the SSMS shortcut in the start menu. Then the 'Run as different user' option appears, which allows you to enter different credentials. After that the user name field for the Windows Authentication changes.
    Right click the shortcut and choose Run as different user

    Now you can run SSMS with a different account

    Now the User name changes

    Runas shortcut
    A more permanent solution is to create a new shortcut with a runas command in it. Instead of the standard SSMS command (see Target):
    "C:\Program Files (x86)\Microsoft SQL Server\110\Tools\Binn\ManagementStudio\Ssms.exe" 

    you use:
    RUNAS /user:myDomain\myUserName /savecred "C:\Program Files (x86)\Microsoft SQL Server\110\Tools\Binn\ManagementStudio\Ssms.exe"
    (110 = SQL 2012 / 120 = SQL 2014 / 130 = SQL 2016 / 140 = SQL 2017)

    • The /user: option allows you to use a different user.
    • The /savecred option will save the password after the first time (not in Windows 7 Home)
    • For more options check out this site or execute "runas /?" in the command prompt to show all options. Some forums/blogs recommend the /netonly option to only use the provided user for remote access, but that often doesn't work causing SSMS not to start.
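If you need such a shortcut for several users or SSMS versions, the Target text can be generated instead of typed. The Python sketch below is only a convenience illustration: the function name and dictionary are made up for this example, and it simply assembles the same RUNAS line shown above from the folder numbers listed there.

```python
from pathlib import PureWindowsPath

# Folder number per SQL Server version, as listed above
SSMS_FOLDER = {"2012": "110", "2014": "120", "2016": "130", "2017": "140"}


def runas_target(domain, user, sql_version, save_credentials=True):
    """Build the Target text for a 'run SSMS as a different user' shortcut."""
    exe = (PureWindowsPath(r"C:\Program Files (x86)\Microsoft SQL Server")
           / SSMS_FOLDER[sql_version]
           / "Tools" / "Binn" / "ManagementStudio" / "Ssms.exe")
    parts = ["RUNAS", f"/user:{domain}\\{user}"]
    if save_credentials:
        parts.append("/savecred")  # stores the password after the first run
    parts.append(f'"{exe}"')
    return " ".join(parts)


print(runas_target("myDomain", "myUserName", "2012"))
```

The printed line can be pasted into the Target field of the shortcut. The sketch assumes SSMS is installed in its default location.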

    Change Target field and optionally the Comment

    To finish this off: click on the Change Icon button and browse to SSMS.exe to select the familiar icon.
    Change icon to finish off the shortcut

    The first time you will see a command prompt where you have to enter your password. If you added the /savecred option then the second time you will only see a short 'flash' of the command prompt and then SSMS will start. You could get rid of it by changing the Run property to minimized (after the first execution).
    Enter password

    SSMS commandline options
    You can even extend this solution with some extra options for SSMS itself. Like providing the instance and database name.
    SSMS command line options

    In this post you saw how you can start up SSMS with a different domain user so you can still use Windows Authentication. This not only works for SSMS, but for other programs like Visual Studio as well:
    RUNAS /user:myDomain\myUserName /savecred "C:\Program Files (x86)\Microsoft Visual Studio 14.0\Common7\IDE\devenv.exe"