Monday 17 April 2017

Schedule Pause/Resume of Azure Data Warehouse

Case
I only want to use my Azure Data Warehouse for a couple of hours per day. I want to automatically pause and resume my Azure DWH to minimize the Azure costs. How do I do that in Azure?
Pause and Resume Azure Data Warehouse

Solution
You could solve this with a scripting language like PowerShell and run that PowerShell script each morning and evening with SQL Server Agent or Windows Scheduler, but for this solution I will use an Azure Automation Runbook with its own scheduler, so that the entire solution runs in Azure.

* Update Oct 2 2020: scripts with AZ modules *
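For reference, the AzureRm cmdlets used below have since been superseded by the Az module. As a rough, untested sketch (assuming the Az.Accounts and Az.Sql modules are available in the Automation account and you use the same credential and variable assets described in this post), the pause part could look like this:

# PowerShell sketch with Az modules (assumption: Az.Accounts and Az.Sql are available)
$ErrorActionPreference = "Stop"

# Login with the stored credential asset (same asset name as used below)
$SPCredential = Get-AutomationPSCredential -Name "SSISJoost"
$null = Connect-AzAccount -Credential $SPCredential

# Get the variable assets
$ResourceGroupName = Get-AutomationVariable -Name 'ResourceGroupName'
$ServerName        = Get-AutomationVariable -Name 'ServerName'
$DatabaseName      = Get-AutomationVariable -Name 'DatabaseName'

# Pause the Data Warehouse (use Resume-AzSqlDatabase to resume)
$null = Suspend-AzSqlDatabase -ResourceGroupName $ResourceGroupName -ServerName $ServerName -DatabaseName $DatabaseName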


1) Automation Account
First we need an Azure Automation Account to run the Runbook. If you don't have one or want to create a new one, then search for Automation under Monitoring + Management and give it a suitable name, then select your subscription, resource group and location. For this example I will choose West Europe since I'm from the Netherlands.
Azure Automation Account

2) Credentials
Next step is to create Credentials to run this runbook with. This works very similarly to Credentials in SQL Server Management Studio. Go to the Azure Automation Account and click on Credentials in the menu. Then click on Add New Credentials. You could just use your own Azure credentials, but the best option is to use a service account with a non-expiring password. Otherwise you will need to update the stored credentials regularly.
Create new credentials
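If you prefer scripting over clicking, the credential asset can also be created from your own machine with the AzureRM Automation cmdlets. A hedged sketch (the resource group, account and user names are just examples):

# PowerShell sketch: create the credential asset (names and password are examples)
$User = "serviceaccount@yourtenant.onmicrosoft.com"
$Password = ConvertTo-SecureString "YourStrongPassword!" -AsPlainText -Force
$Credential = New-Object System.Management.Automation.PSCredential ($User, $Password)

New-AzureRmAutomationCredential -ResourceGroupName "YourResourceGroup" `
                                -AutomationAccountName "YourAutomationAccount" `
                                -Name "SSISJoost" `
                                -Value $Credential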

3) Connections
This step is for your information only, to understand the code. Under Connections you will find a default connection named 'AzureRunAsConnection' that contains information about the Azure environment, like the tenant id and the subscription id. To prevent hardcoded connection details we will retrieve these fields in the PowerShell code.
Azure Connections

4) Variables
Another option to prevent hardcoded values in your PowerShell code is to use Variables. We will use this option to provide the Resource Group, SQL Server Name and Database Name. Go to Variables and add a new variable named ResourceGroupName and add the name of the Resource Group that is used by your Azure DWH. Then repeat this for ServerName (name only, without .database.windows.net) and DatabaseName (the name of your Azure DWH database).
Add variables
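The three variable assets can also be added with PowerShell instead of the portal; a hedged sketch with example values:

# PowerShell sketch: create the three variable assets (all values are examples)
New-AzureRmAutomationVariable -ResourceGroupName "YourResourceGroup" -AutomationAccountName "YourAutomationAccount" `
                              -Name "ResourceGroupName" -Value "YourResourceGroup" -Encrypted $false
New-AzureRmAutomationVariable -ResourceGroupName "YourResourceGroup" -AutomationAccountName "YourAutomationAccount" `
                              -Name "ServerName" -Value "yourserver" -Encrypted $false
New-AzureRmAutomationVariable -ResourceGroupName "YourResourceGroup" -AutomationAccountName "YourAutomationAccount" `
                              -Name "DatabaseName" -Value "YourDWH" -Encrypted $false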

5) Runbooks
Now it is time to add a new Azure Runbook for the PowerShell code. Click on Runbooks and then add a new runbook (There are also four example runbooks of which AzureAutomationTutorialScript could be useful as an example). Give your new Runbook a suitable name and choose PowerShell as type.
Add Azure Runbook

6) Edit Script
After clicking Create in the previous step the editor will open. When editing an existing Runbook you need to click on Edit to edit the code. You can copy and paste the code below into your editor. Study the green comments to understand the code.
Edit the PowerShell code

# PowerShell code
# Don't continue in case of an error
$ErrorActionPreference = "Stop"

# Connect to a connection to get TenantId and SubscriptionId
$Connection = Get-AutomationConnection -Name "AzureRunAsConnection"
$TenantId = $Connection.TenantId
$SubscriptionId = $Connection.SubscriptionId

# Get the service principal credentials connected to the automation account. 
$SPCredential = Get-AutomationPSCredential -Name "SSISJoost"

# Login to Azure ($null is to prevent output, since Out-Null doesn't work in Azure)
Write-Output "Login to Azure using automation account 'SSISJoost'."
$null = Login-AzureRmAccount -TenantId $TenantId -SubscriptionId $SubscriptionId -Credential $SPCredential

# Select the correct subscription
Write-Output "Selecting subscription '$($SubscriptionId)'."
$null = Select-AzureRmSubscription -SubscriptionID $SubscriptionId

# Get variable values
$ResourceGroupName = Get-AutomationVariable -Name 'ResourceGroupName'
$ServerName = Get-AutomationVariable -Name 'ServerName'
$DatabaseName = Get-AutomationVariable -Name 'DatabaseName'

# Get old status (for testing/logging purpose only)
$OldDbSetting = Get-AzureRmSqlDatabase -DatabaseName $DatabaseName -ServerName $ServerName -ResourceGroupName $ResourceGroupName
$OldStatus = $OldDbSetting.Status

# Resume the Data Warehouse (uncomment these two rows and comment out the pause rows below)
#Write-Output "Resuming $($DatabaseName). Current status: $($OldStatus)" 
#$null = Resume-AzureRmSqlDatabase -DatabaseName $DatabaseName -ServerName $ServerName -ResourceGroupName $ResourceGroupName

# Pause the Data Warehouse
Write-Output "Pausing $($DatabaseName). Current status: $($OldStatus)" 
$null = Suspend-AzureRmSqlDatabase -DatabaseName $DatabaseName -ServerName $ServerName -ResourceGroupName $ResourceGroupName

# Show done when finished (for testing/logging purpose only)
Write-Output "Done"

Note 1: This is a very basic script. No error handling has been added. Check the AzureAutomationTutorialScript for an example. Fine-tune it for your own needs.
Note 2: There are often two versions of a method, like Get-AzureRmSqlDatabase and Get-AzureSqlDatabase. Always use the one with "Rm" in it (Resource Manager), because that one is for the new Azure portal. The version without Rm is for the old/classic Azure portal.
Note 3: Because Azure Automation doesn't support Out-Null, I used another trick: assigning the output to $null. The Write-Output statements are for testing purposes only; nobody sees them when the runbook runs on its schedule.
Note 4: The code for the resume is commented out with hashtags. Either use the two resume rows or the two pause rows, not both.
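A small optional improvement (a sketch of my own, not part of the original script): check the old status first, so the runbook does not fail when the Data Warehouse is already paused.

# PowerShell sketch: only pause when the Data Warehouse is not already paused
if ($OldStatus -eq "Paused")
{
    Write-Output "$($DatabaseName) is already paused, nothing to do."
}
else
{
    Write-Output "Pausing $($DatabaseName). Current status: $($OldStatus)"
    $null = Suspend-AzureRmSqlDatabase -DatabaseName $DatabaseName -ServerName $ServerName -ResourceGroupName $ResourceGroupName
}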

7) Testing
You can use the Test Pane menu option in the editor to test your PowerShell scripts. When clicking on Run it will first Queue the script before Starting it. Running takes several minutes (a bit longer than changing the tier).
Testing the script in the Test Pane

8) Publish
When your script is ready, it is time to publish it. Above the editor click on the Publish button. Confirm overriding any previously published versions.
Publish the Runbook

9) Schedule
And now that we have a working and published Azure Runbook, we need to schedule it. Click on Schedule to create a new schedule for your runbook. For this pause script I created a schedule that runs every working day at 9:00 PM (21:00). The resume script could for example be scheduled on working days at 7:00 AM. Now you need to hit the refresh button in the SQL Databases overview in Azure to see if it really works. It takes a few minutes to run, so don't worry too soon.
Add Schedule
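The schedule can also be created and linked with the AzureRM Automation cmdlets; a hedged sketch (account, runbook and schedule names are examples, and the start time must lie in the future):

# PowerShell sketch: weekly schedule on working days at 21:00, linked to the pause runbook
$Schedule = New-AzureRmAutomationSchedule -ResourceGroupName "YourResourceGroup" -AutomationAccountName "YourAutomationAccount" `
                                          -Name "PauseDWH-Workdays-2100" `
                                          -StartTime (Get-Date "21:00").AddDays(1) `
                                          -WeekInterval 1 `
                                          -DaysOfWeek Monday,Tuesday,Wednesday,Thursday,Friday

Register-AzureRmAutomationScheduledRunbook -ResourceGroupName "YourResourceGroup" -AutomationAccountName "YourAutomationAccount" `
                                           -RunbookName "PauseDWH" -ScheduleName $Schedule.Name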

Summary
In this post you saw how you can pause and resume your Azure Data Warehouse to save some money in Azure during the quiet hours. The screenshots only show the pause script; the resume rows are commented out in the code. For pause and resume you have a couple of options. You could create one script with a parameter that indicates a pause or resume. Another option is to use a fancy if-construction that uses the current status and/or time to decide whether you need to pause or to resume. Or just create two separate scripts for pause and resume.
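As an illustration of that if-construction, below is a hedged sketch that could replace the pause rows in the runbook: it resumes when the Data Warehouse is paused and pauses it otherwise.

# PowerShell sketch: pause or resume depending on the current status
if ($OldStatus -eq "Paused")
{
    Write-Output "Resuming $($DatabaseName). Current status: $($OldStatus)"
    $null = Resume-AzureRmSqlDatabase -DatabaseName $DatabaseName -ServerName $ServerName -ResourceGroupName $ResourceGroupName
}
else
{
    Write-Output "Pausing $($DatabaseName). Current status: $($OldStatus)"
    $null = Suspend-AzureRmSqlDatabase -DatabaseName $DatabaseName -ServerName $ServerName -ResourceGroupName $ResourceGroupName
}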


Tuesday 11 April 2017

Schedule Upscale/downscale Azure SQL DB(/DWH)

Case
I want to downgrade the Tier of my Azure SQL DB automatically after working hours and upgrade it in the morning right before working hours to minimize the Azure costs. How do I do that in Azure?
Change the Tier of your SQL Azure DB

Solution
You could solve this with a scripting language like PowerShell and run that PowerShell script each morning and evening with SQL Server Agent or Windows Scheduler, but for this solution I will use an Azure Automation Runbook with its own scheduler, so that the entire solution runs in Azure.


1) Automation Account
First we need an Azure Automation Account to run the Runbook. If you don't have one or want to create a new one, then search for Automation under Monitoring + Management and give it a suitable name, then select your subscription, resource group and location. For this example I will choose West Europe since I'm from the Netherlands.
Azure Automation Account

2) Credentials
Next step is to create Credentials to run this runbook with. This works very similarly to Credentials in SQL Server Management Studio. Go to the Azure Automation Account and click on Credentials in the menu. Then click on Add New Credentials. You could just use your own Azure credentials, but the best option is to use a service account with a non-expiring password. Otherwise you will need to update the stored credentials regularly.
Create new credentials

3) Connections
This step is for your information only, to understand the code. Under Connections you will find a default connection named 'AzureRunAsConnection' that contains information about the Azure environment, like the tenant id and the subscription id. To prevent hardcoded connection details we will retrieve these two fields in the PowerShell code.
Azure Connections

4) Variables
Another option to prevent hardcoded values in your PowerShell code is to use Variables. We will use this option to provide the Resource Group, SQL Server Name and Database Name. Go to Variables and add a new variable named ResourceGroupName and add the name of the Resource Group that is used by your Azure DB. Then repeat this for ServerName (name only, without .database.windows.net) and DatabaseName (the name of your database).
Add variables

5) Runbooks
Now it is time to add a new Azure Runbook for the PowerShell code. Click on Runbooks and then add a new runbook (there are also four example runbooks, of which AzureAutomationTutorialScript could be useful as an example). Give your new Runbook a suitable name and choose PowerShell as type.
Add Azure Runbook

6) Edit Script
After clicking Create in the previous step the editor will open. When editing an existing Runbook you need to click on Edit to edit the code. You can copy and paste the code below into your editor.
Edit the PowerShell code

# PowerShell code
# Connect to a connection to get TenantId and SubscriptionId
$Connection = Get-AutomationConnection -Name "AzureRunAsConnection"
$TenantId = $Connection.TenantId
$SubscriptionId = $Connection.SubscriptionId

# Get the service principal credentials connected to the automation account. 
$SPCredential = Get-AutomationPSCredential -Name "SSISJoost"

# Login to Azure ($null is to prevent output, since Out-Null doesn't work in Azure)
Write-Output "Login to Azure using automation account 'SSISJoost'." 
$null = Login-AzureRmAccount -TenantId $TenantId -SubscriptionId $SubscriptionId -Credential $SPCredential

# Select the correct subscription
Write-Output "Selecting subscription '$($SubscriptionId)'."
$null = Select-AzureRmSubscription -SubscriptionID $SubscriptionId

# Get variable values
$ResourceGroupName = Get-AutomationVariable -Name 'ResourceGroupName' 
$ServerName = Get-AutomationVariable -Name 'ServerName' 
$DatabaseName = Get-AutomationVariable -Name 'DatabaseName'

# Get old tier settings (for testing/logging purpose only)
$OldDbSetting = Get-AzureRmSqlDatabase -DatabaseName $DatabaseName -ServerName $ServerName -ResourceGroupName $ResourceGroupName
$OldEdition = $OldDbSetting.Edition
$OldPricingTier = $OldDbSetting.CurrentServiceObjectiveName

# Set new edition https://docs.microsoft.com/en-us/azure/sql-database/sql-database-service-tiers
$NewEdition = "Basic"   # Basic, Standard, Premium, Premium RS
$NewPricingTier = ""    # Basic, S0, S1, S2, S3, P1, P2, P3, P4, P6, P11, P13, PRS1, PRS2, PRS4, PRS6

# Set NewPricingTier to Basic for Basic edition
if (($NewPricingTier -eq "") -and ($NewEdition -eq "Basic"))
{
    $NewPricingTier = "Basic"  
}

# Write old and new edition to screen (for testing/logging purpose only)
Write-Output "Changing Database Tier from $($OldEdition) ($($OldPricingTier)) to $($NewEdition) ($($NewPricingTier)) "

# Set new tier
$null = Set-AzureRmSqlDatabase -DatabaseName $DatabaseName -ServerName $ServerName -ResourceGroupName $ResourceGroupName -Edition $NewEdition -RequestedServiceObjectiveName $NewPricingTier

# Show done when finished (for testing/logging purpose only)
Write-Output "Done"

Note 1: This is a very basic script. No error handling has been added. Check the AzureAutomationTutorialScript for an example. Fine-tune it for your own needs.
Note 2: There are often two versions of a method, like Set-AzureRmSqlDatabase and Set-AzureSqlDatabase. Always use the one with "Rm" in it (Resource Manager), because that one is for the new Azure portal. The version without Rm is for the old/classic Azure portal.
Note 3: Because Azure Automation doesn't support Out-Null, I used another trick: assigning the output to $null. The Write-Output statements are for testing purposes only; nobody sees them when the runbook runs on its schedule.
Note 4: This also works for Azure Data Warehouses. Just use the Data Warehouse tiers (measured in Data Warehouse Units) like DW400 as pricing tier.
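For example, for a Data Warehouse the two 'new edition' rows of the script could look like this (a sketch; check which service objectives are available in your region):

# PowerShell sketch: scale an Azure SQL Data Warehouse instead of a regular database
$NewEdition = "DataWarehouse"
$NewPricingTier = "DW400"    # e.g. DW100, DW200, DW400, DW1000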

7) Testing
You can use the Test Pane menu option in the editor to test your PowerShell scripts. When clicking on Run it will first Queue the script before Starting it. Running takes about two minutes.
Testing the script in the Test Pane

8) Publish
When your script is ready, it is time to publish it. Above the editor click on the Publish button. Confirm overriding any previously published versions.
Publish the Runbook

9) Schedule
And now that we have a working and published Azure Runbook, we need to schedule it. Click on Schedule to create a new schedule for your runbook. For this downgrade script I created a schedule that runs every working day at 7:00 PM (19:00). The upgrade script could for example be scheduled on working days at 7:00 AM. Now you need to hit the refresh button in the SQL Databases overview in Azure to see if it really works. It takes a few minutes to run, so don't worry too soon.
Add Schedule

Summary
In this post you saw how you can upgrade and downgrade your Azure SQL Databases to save some money in Azure during the quiet hours. The screenshots only show the downgrade script. For the upgrade version you have a couple of options. You could create one script with a parameter that indicates an upgrade or downgrade. Another option is to use a fancy if-construction that uses the current tier and/or time to decide whether you need an upgrade or downgrade. Or just create two separate scripts for upgrade and downgrade.
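A hedged sketch of that parameter option: give the runbook two input parameters and let each schedule supply its own values (the parameter names are made up for this example):

# PowerShell sketch: one runbook for both upscale and downscale via parameters
Param
(
    [Parameter(Mandatory=$true)] [string] $NewEdition,     # e.g. "Standard" or "Basic"
    [Parameter(Mandatory=$true)] [string] $NewPricingTier  # e.g. "S1" or "Basic"
)

# ... same login, subscription and variable code as in the script above ...

$null = Set-AzureRmSqlDatabase -DatabaseName $DatabaseName -ServerName $ServerName -ResourceGroupName $ResourceGroupName `
                               -Edition $NewEdition -RequestedServiceObjectiveName $NewPricingTier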


Tuesday 28 March 2017

SSRS Snack: Excel Rendering Extension Error

Case
I am trying to export my SSRS report to Excel, but I'm getting an error. How can I solve it?
Excel Rendering Extension: Unknown image format image/x-png

Solution
You are using one or more PNG pictures in your reports and SSRS doesn't know how to render these because they have an unknown MIME type: image/x-png.

1) Solution Explorer
Go to your SSRS project in Visual Studio and search for PNG images in your Solution Explorer.
Solution Explorer
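If the project contains many reports, a quick way to find all offending images is to search the report and project files for the wrong MIME type; a hedged PowerShell sketch (run it from the project folder; depending on how the image was added, the MIME type may live in the .rdl or in the .rptproj):

# PowerShell sketch: list files that still contain the image/x-png MIME type
Get-ChildItem -Path . -Recurse -Include *.rdl,*.rptproj |
    Select-String -Pattern "image/x-png" |
    Select-Object Path, LineNumber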

2) Properties
Now go to the properties of this image file (F4) and locate the property MIME Type.
image/x-png

3) Change MIME Type
Change the MIME Type from image/x-png to image/png and repeat this for all PNG images.
image/png

4) Deploy and test
Now deploy your changed project and reopen the report to test the Excel export.
Export to Excel

Thursday 23 March 2017

Loading tables by using BIML and meta architecture

Case

How can we simplify the process of loading database tables and reduce the time needed to create SSIS packages?

Solution

There are several steps that need to be taken prior to creating such a solution.
These steps are:
  1. Create the databases 'Repository' and 'Staging' and the required schemas
  2. Create a meta table called 'Layer' and a table called 'TableList' in the Repository database
  3. Create the Sales tables in the Staging database
  4. Fill the Meta tables with the required meta data
  5. Create a BIML script that will create the Extract and Load SSIS package
  6. Generate the package using BIML Express in Visual Studio 2015 to create the SSIS package
For this solution the following prerequisites need to be met:
  • SQL Server will be used as source and destination platform
  • The Adventure Works 2014 database will be used as source
  • The selected tables from the Adventure Works database are present in the destination database, including all required schemas and specific datatypes, if applicable (we will be using a few tables from the Sales schema that do not use custom datatypes)
  • Visual Studio 2015 is installed (Community/Professional/Enterprise)
  • BIML Express is installed

1) - Create the databases and schemas

In this step the databases 'Repository' and 'Staging' are created, together with the required schemas.
--Create the databases Repository and Staging and required schemas
CREATE DATABASE [Repository] CONTAINMENT = NONE ON  PRIMARY 
( NAME = N'Repository', FILENAME = N'D:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\DATA\Repository.mdf' , SIZE = 7168KB , MAXSIZE = UNLIMITED, FILEGROWTH = 1024KB )
 LOG ON 
( NAME = N'Repository_log', FILENAME = N'D:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\DATA\Repository_log.ldf' , SIZE = 5184KB , MAXSIZE = 2048GB , FILEGROWTH = 10%)
GO
ALTER DATABASE [Repository] SET RECOVERY SIMPLE;
GO
USE Repository
go
CREATE SCHEMA rep
GO
CREATE DATABASE [Staging] CONTAINMENT = NONE ON  PRIMARY 
( NAME = N'Staging', FILENAME = N'D:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\DATA\Staging.mdf' , SIZE = 7168KB , MAXSIZE = UNLIMITED, FILEGROWTH = 1024KB )
 LOG ON 
( NAME = N'Staging_log', FILENAME = N'D:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\DATA\Staging_log.ldf' , SIZE = 5184KB , MAXSIZE = 2048GB , FILEGROWTH = 10%)
GO
ALTER DATABASE [Staging] SET RECOVERY SIMPLE;
GO
USE Staging
go
CREATE SCHEMA Sales
GO

2) - Create the Meta tables

During this step we will be creating the meta tables 'Layer' and 'TableList'. The first table will contain the id, name and prefix of the layers present in the Datawarehouse environment. In this blog the only entry present is the Staging area id and name. Normally this table also holds the name of for example the Datawarehouse and Datamart layer or any other layers present in a Business Intelligence environment.
The column 'LayerId' is used in the other table named 'TableList' and is used to make the distinction between the table names present in the respective layer. In this blog there will only be Staging Area tables described.
The table 'TableList' contains the following columns;
  • LayerId - The id of the layer the table belongs to
  • TableName - The name of the table
  • SchemaName - The name of the schema of the table
  • TableType - The type of the table (e.g. user table)
  • LoadOrderNr - The order in which the tables are loaded (or created by other processes)
  • WhereClause - Any optional where clause that is used during the load process (by default this column must be set to '1 = 1')
  • ActiveInd - Indicates if the table is active or inactive during the BIML creation process
  • InsertDate - The date when the table entry was inserted in the 'TableList' table
Note: Some of the above columns are not (or less) applicable to the BIML script in this post, but they are used in other generic scripts used to create and load data warehouse packages (more about this in future posts)
--Create the meta tables
USE [Repository]
GO
IF OBJECT_ID('[rep].[Layer]', 'U') IS NOT NULL
BEGIN
 DROP TABLE [rep].[Layer]
END
GO
CREATE TABLE [rep].[Layer](
 [LayerId] [smallint] NOT NULL,
 [LayerName] [nvarchar](50) NOT NULL,
 [LayerPrefix] [nvarchar](10) NOT NULL,
) ON [PRIMARY]
GO
IF OBJECT_ID('[rep].[TableList]', 'U') IS NOT NULL
BEGIN
 DROP TABLE [rep].[TableList]
END
GO
CREATE TABLE [rep].[TableList](
 [LayerId] [smallint] NULL,
 [TableName] [nvarchar](100) NULL,
 [SchemaName] [nvarchar](100) NULL,
 [ServerNr] [smallint] NULL,
 [TableType] [nvarchar](100) NULL,
 [LoadOrderNr] [int] NULL,
 [WhereClause] [nvarchar](250) NULL,
 [PrimaryKey] [nvarchar](250) NULL,
 [ActiveInd] [nchar](1) NULL,
 [InsertDate] [datetime] NULL
) ON [PRIMARY]
GO

3) - Create the Sales tables

During this step we will be creating the Sales tables in the target database Staging.
--Create the Sales tables
USE Staging
GO
CREATE TABLE [Sales].[ShoppingCartItem](
 [ShoppingCartItemID] [int] IDENTITY(1,1) NOT NULL,
 [ShoppingCartID] [nvarchar](50) NOT NULL,
 [Quantity] [int] NOT NULL,
 [ProductID] [int] NOT NULL,
 [DateCreated] [datetime] NOT NULL,
 [ModifiedDate] [datetime] NOT NULL,
) ON [PRIMARY]
GO
CREATE TABLE [Sales].[SpecialOffer](
 [SpecialOfferID] [int] IDENTITY(1,1) NOT NULL,
 [Description] [nvarchar](255) NOT NULL,
 [DiscountPct] [smallmoney] NOT NULL,
 [Type] [nvarchar](50) NOT NULL,
 [Category] [nvarchar](50) NOT NULL,
 [StartDate] [datetime] NOT NULL,
 [EndDate] [datetime] NOT NULL,
 [MinQty] [int] NOT NULL,
 [MaxQty] [int] NULL,
 [rowguid] [uniqueidentifier] ROWGUIDCOL  NOT NULL,
 [ModifiedDate] [datetime] NOT NULL,
) ON [PRIMARY]
GO
CREATE TABLE [Sales].[SpecialOfferProduct](
 [SpecialOfferID] [int] NOT NULL,
 [ProductID] [int] NOT NULL,
 [rowguid] [uniqueidentifier] ROWGUIDCOL  NOT NULL,
 [ModifiedDate] [datetime] NOT NULL,
) ON [PRIMARY]
GO

4) - Fill the meta tables with the required meta data

After creating the database and metadata tables, they need to be filled with the meta data that will be used by the BIML script in the next step ('BIML Load_STG_Tables_From_Microsoft.biml').
The script provided below inserts the layer information used in this blog and the table meta information of those tables for which the SSIS load process will be created.

If you want to test the Where Clause functionality you can replace the value '1 = 1' with '1 = 1 AND ShoppingCartItemID = 2' in the column 'WhereClause' in the table 'TableList' for the table name 'Sales.ShoppingCartItem'. This will place a filter on the table.
The BIML script will use the meta table information to create one SSIS package with the name 'SSIS STG Load STG Tables SQL.dtsx'.

--Insert the meta information in the meta tables
USE [Repository]
GO
TRUNCATE TABLE [rep].[Layer];
TRUNCATE TABLE [rep].[TableList];
INSERT [rep].[Layer] ([LayerId], [LayerName], [LayerPrefix]) VALUES (1, N'staging', N'stg');
INSERT [rep].[TableList] ([LayerId], [TableName], [SchemaName], [TableType], [LoadOrderNr], [WhereClause], [ActiveInd], [InsertDate]) VALUES (1, N'SpecialOffer', N'Sales', N'user_table', 1, N'1 = 1', N'1', CAST(GETDATE() AS DateTime));
INSERT [rep].[TableList] ([LayerId], [TableName], [SchemaName], [TableType], [LoadOrderNr], [WhereClause], [ActiveInd], [InsertDate]) VALUES (1, N'SpecialOfferProduct', N'Sales', N'user_table', 1, N'1 = 1', N'1', CAST(GETDATE() AS DateTime));
INSERT [rep].[TableList] ([LayerId], [TableName], [SchemaName], [TableType], [LoadOrderNr], [WhereClause], [ActiveInd], [InsertDate]) VALUES (1, N'ShoppingCartItem', N'Sales', N'user_table', 1, N'1 = 1', N'1', CAST(GETDATE() AS DateTime));
GO

5) - Create the BIML script

Once the previous steps have been executed it is time to create the BIML script. The BIML script starts with declaring the information needed to create the connection strings to the different databases and the server(s) they reside on. For this example all the databases are SQL Server 2016 databases. It would also be possible to store that information in a meta table, but for this post the information is placed inside the BIML script. The BIML script will create one package with the name 'SSIS STG Load STG Tables SQL' and for each table in the 'TableList' table a sequence container will be created with two SSIS components. The first component is an Execute SQL Task that will use a T-SQL command to truncate the target table. The second component is a Data Flow Task containing a source and a destination component, which will load the data from the source to the target table. All the sequence containers are executed in parallel.

--The BIML code that can be placed inside a BIML file.

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <# 
        string pRepServerName    = "localhost"; 
        string pRepDatabaseName  = "Repository";
        string pRepProvider      = "SQLNCLI11.1;Integrated Security=SSPI";
        string pRepSchema        = "rep";

        string pSourceServerName = "localhost";
        string pSourceDBName     = "AdventureWorks2014";
        string pSourceProvider   = "SQLNCLI11.1;Integrated Security=SSPI";
        string pSourceSchema     = "Sales";

        string pTargetServerName = "localhost";
        string pTargetDBName     = "Staging";
        string pTargetProvider   = "SQLNCLI11.1;Integrated Security=SSPI";
        string pTargetSchema     = "Sales";
    #>

    <#
        string pLayer            = "Staging";
    #>

    <#
    string csRepository = String.Format("Data Source={0};Initial Catalog={1};Provider={2};Auto Translate=False;"
          ,pRepServerName, pRepDatabaseName, pRepProvider);

    string csSource = String.Format("Data Source={0};Initial Catalog={1};Provider={2};Auto Translate=False;"
          ,pSourceServerName, pSourceDBName, pSourceProvider);

    string csTarget = String.Format("Data Source={0};Initial Catalog={1};Provider={2};Auto Translate=False;"
          ,pTargetServerName, pTargetDBName, pTargetProvider);
    #>

    <Connections>
        <OleDbConnection Name="OLEDB Repository" ConnectionString="<#=csRepository#>" />
        <OleDbConnection Name="OLEDB Source" ConnectionString="<#=csSource#>" />
        <OleDbConnection Name="OLEDB Target" ConnectionString="<#=csTarget#>" />
    </Connections>

    <Packages>
        <Package Name="SSIS STG Load STG Tables SQL" ConstraintMode="Parallel">
            <Tasks>
                <#
                  StringBuilder sMETAGetTableName = new System.Text.StringBuilder();

                  sMETAGetTableName.Append("SELECT ");
                  sMETAGetTableName.Append("    TableName ");
                  sMETAGetTableName.Append("  , SchemaName ");
                  sMETAGetTableName.Append("  , WhereClause ");
                  sMETAGetTableName.Append("FROM ");
                  sMETAGetTableName.Append(pRepSchema);
                  sMETAGetTableName.Append(".TableList AS TAB ");
                  sMETAGetTableName.Append("INNER JOIN ");
                  sMETAGetTableName.Append(pRepSchema);
                  sMETAGetTableName.Append(".Layer AS LYR ");
                  sMETAGetTableName.Append("  ON ( TAB.LayerId = LYR.LayerId) ");
                  sMETAGetTableName.Append("WHERE 1 = 1 ");
                  sMETAGetTableName.Append(" AND TAB.ActiveInd = 1 ");
                  sMETAGetTableName.Append(" AND LYR.LayerName = '");
                  sMETAGetTableName.Append(pLayer);
                  sMETAGetTableName.Append("' ");
                  sMETAGetTableName.Append("ORDER BY ");
                  sMETAGetTableName.Append("   TAB.LoadOrderNr");

                  DataTable tblMETATableNames = ExternalDataAccess.GetDataTable(csRepository, sMETAGetTableName.ToString());
                  foreach (DataRow METATableNameRow in tblMETATableNames.Rows) {
                #>
                <Container Name="SEQC Load <#=METATableNameRow["TableName"] #>" ConstraintMode="Linear">
                    <Tasks>
                        <ExecuteSQL Name="SQL - Truncate <#=METATableNameRow["TableName"] #>"
                                    ResultSet="None"
                                    ConnectionName="OLEDB Target">
                            <DirectInput>
                            TRUNCATE TABLE <#=pTargetSchema #>.<#=METATableNameRow["TableName"] #>
                            </DirectInput>
                        </ExecuteSQL>
                        <Dataflow Name="DFT - Load <#=METATableNameRow["TableName"] #>">
                            <Transformations>
                                <OleDbSource Name="OLEDB_SRC <#=METATableNameRow["TableName"] #>"
                                             ConnectionName="OLEDB Source"
                                             ValidateExternalMetadata="false">
                                    <DirectInput>
                                    SELECT
        CAST(1 AS INTEGER) AS DUMMY_COLUMN
<#                                          StringBuilder sGETSelectColumn = new System.Text.StringBuilder();
                                            sGETSelectColumn.Append("SELECT " );
                                            sGETSelectColumn.Append("     col.name AS column_name " );
                                            sGETSelectColumn.Append("FROM sys.columns AS col " );
                                            sGETSelectColumn.Append("INNER JOIN sys.objects AS obj " );
                                            sGETSelectColumn.Append("    ON(col.object_id = obj.object_id) " );
                                            sGETSelectColumn.Append("INNER JOIN sys.types AS typ " );
                                            sGETSelectColumn.Append("    ON(col.user_type_id = typ.user_type_id)" );
                                            sGETSelectColumn.Append("WHERE   1 = 1 " );
                                            sGETSelectColumn.Append("   AND obj.name = '"+ METATableNameRow[0].ToString() +"'");
                                            DataTable tblSelectColumn = ExternalDataAccess.GetDataTable(csSource, sGETSelectColumn.ToString());
                                            foreach (DataRow SelectColumn in tblSelectColumn.Rows) {
#>
        , [<#=SelectColumn["COLUMN_NAME"] #>]
<# } #>FROM <#=METATableNameRow["SchemaName"] #>.<#=METATableNameRow["TableName"] #>
WHERE <#=METATableNameRow["WhereClause"] #>
                                    </DirectInput>
                                </OleDbSource>
                                <OleDbDestination Name="OLEDB_DST <#=METATableNameRow["TableName"] #>"
                                                  ConnectionName="OLEDB Target">
                                    <ExternalTableOutput Table="<#=pTargetSchema #>.<#=METATableNameRow["TableName"] #>" />
                                </OleDbDestination>
                            </Transformations>
                        </Dataflow>
                    </Tasks>
                </Container>
                <# } #>
            </Tasks>
        </Package>
    </Packages>
</Biml>


<#@ template language="C#" hostspecific="true"#>
<#@ import namespace="System.Data"#>
<#@ import namespace="System.Data.SqlClient"#>
<#@ import namespace="System.Text"#>

6) - Generate the package using BIML Express

Once the BIML file has been created it is time to generate the SSIS package by using BIML Express in Visual Studio 2015.
The package can simply be generated by right-clicking the BIML file and selecting 'Generate SSIS Packages'.
Using BIML Express to generate the package

The package has been generated by BIML


Summary

In this post we created the following components:
  • Repository database and one schema
  • Staging database and one schema
  • Two meta tables to be used by the BIML script from this post (and possible future posts)
  • Three Staging tables
  • A BIML file called 'BIML Load_STG_Tables_From_Microsoft.biml'
  • A generated SSIS Package named 'SSIS STG Load STG Tables SQL.dtsx'

Saturday 25 February 2017

Azure - Understanding Stream Analytics Data Lake Store Destination

Case
How do you use Azure Data Lake Store to store Stream Analytics data and why would you do that?

Solution
In an earlier post we have set up a Stream Analytics Job where sensor data is sent to a Blob file. In this case we create a new Stream Analytics Job with a Data Lake Store as output, where we use the sensor data as input. Despite the fact that the Input and Output are different, the query stays the same as in other Stream Analytics Jobs. You can find the configuration of the Stream Analytics Job query here.

A Data Lake Store is a scalable repository optimized for storing IoT data, log files and other large datasets for Big Data scenarios. It contains folders and these folders contain the files (data). As said before, you can also store (sensor) data using Blob storage. These two storage options are similar, but there are some important differences, which we will explain later in this post.

Overview Data Lake Store with different types of data

1) Create a Data Lake Store
First we have to create the Data Lake Store. Click on New (+ icon) in your portal and under the Storage category you will find the Data Lake Store. Give it a suitable name. Next we choose the Resource Group which we created earlier by setting up the IoT Hub. The benefit of choosing the same group is that, among other things, the rights are the same. At this time there are not many options available for both Location and Pricing, so we keep the defaults.

Azure Portal - Create Data Lake Store
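For reference, the Data Lake Store can also be created with PowerShell instead of the portal; a hedged sketch with example names (the available locations were limited at the time of writing):

# PowerShell sketch: create a Data Lake Store account (names and location are examples)
New-AzureRmDataLakeStoreAccount -ResourceGroupName "YourResourceGroup" `
                                -Name "yourdatalakestore" `
                                -Location "North Europe"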

2) Create the Stream Analytics Job
Before creating this new job, we already added a new consumer group to our IoT Hub, called 'DataLake'. Using multiple consumer groups makes it possible for several consumer applications to read data from this IoT Hub independently. Click here to see where you can add/manage consumer group(s).

Note:
The Azure Portal is still in development, so adding a new consumer group is now at a different place than in an earlier post. See screenshot below.

Azure Portal - Adding Consumer Groups

Now we can create the job. We choose the same Resource Group as the IoT Hub and Storage account, just like the creation of the Data Lake. Our Location is the Netherlands, so we choose West-Europe.

Azure Portal - Create Stream Analytics Job

3) Configure the Stream Analytics Job
We start with defining the Input. We choose 'Data stream' as Source Type because the sensor data is an ongoing stream and is derived from the IoT Hub. Under Source we choose 'IoT hub'. In our case we have one IoT Hub, but when you have multiple IoT Hubs you can choose one from the list. Now our IoT Hub appears automatically. Next we choose 'datalake' as Consumer group. This is the one we have created earlier. Finally we choose 'JSON' as Event serialization format.

Now we can specify the Output. Give it a suitable name and choose 'Data Lake Store' as Sink. The corresponding Account Name (in this case the one we have created earlier) will be selected automatically. Next we enter a file path to store the files in our Data Lake Store account. Optionally, you can specify this path with date and time. The data will then be stored in multiple instances per day and per hour, which keeps the storage better organized. Finally, choose 'JSON' as the format.

As mentioned earlier, in this post we are only configuring the Input and Output of the job. Now we can run the job with a valid query.

Azure Portal - Configure Stream Analytics Job

Result
After running the Stream Analytics Job, the data is now stored in the Data Lake Store. You can find this in the Azure portal. Open the Data Lake Store and go to Data Explorer. Now you see one folder named 'sensordata'. This folder contains multiple subfolders: year, month, day and hour. Now we have only one file with data in the subfolder 'hour', but each next hour there will be a new file (as long as your Stream Analytics Job is running). This is exactly what we have configured earlier. In each file, the data is stored per 10 seconds. It works!

Azure Portal - Result Data Lake Store

Differences between Blob Storage and Data Lake Store
The first big difference is the size limits. There are no limitations on account/file size or number of files in a Data Lake Store, while Blob storage has such restrictions. In addition, Data Lake Store has built-in Hadoop integration. Therefore (along with the unlimited storage) this makes it suitable for storing Big Data and then analyzing this data. Another difference is the authentication: Blob storage works with generated storage access keys, while a Data Lake Store uses Azure Active Directory for this.

Overall a Data Lake Store has more possibilities than Blob storage and is optimized for Big Data purposes. The general purpose of Blob storage is storing data in different scenarios like backups or media files (for streaming). The starting prices are lower for Blob storage, but you have different storage prices for your Blob storage account. This means you can make it as expensive as you want, and in some cases the monthly charges per GB will be higher than for a Data Lake Store. Click here for more details about the differences and prices between a Data Lake Store and Blob storage.

Conclusion
Azure Data Lake Store is very useful in Big Data scenarios because it can combine storage (to more than 1 petabyte) with the ability to analyze this data. This can be done with Hadoop (built-in integration) or Azure Data Lake Analytics, which is specially optimized to work with Azure Data Lake Store. From this perspective it offers more than, for example, Blob storage.

Tuesday 21 February 2017

Use BIML and csv files to create tables

Case
The case is about importing flat files (CSVs) without the necessity of metadata. Because BIML always checks if the tables are accessible before creating the packages, the first step is to create the tables with BIML and the second step is to create the SSIS packages for transporting the data.

  1. Creating tables in the database 
  2. Packages to fill this database

Because of the size of the solution I've created two separate solutions: one for creating the tables and a second for creating the SSIS packages. You can click on the link to go to the other solution (which I will publish later this month).


Solution - Creating tables with BIML
In this solution we create a BIML script that is going to create the tables; all the columns are defined as strings.
We have to create two BIML scripts: the first script for defining the table definitions and the second for creating the actual package.

1) CSV files
For this example we are using two CSV files (age and sickleave) which are comma-separated and have column names on the first row. These column names will be used in the CREATE TABLE statement.

The drop folder

content csv file

2) Tabledefinitions
The first BIML script is called "1_TableDefinitions.biml".
In this BIML script we define the path where the CSV files are located, an array with the names of the CSV files and also some strings which we are going to use further on in the code.
We use two "foreach loops": the first one loops through the array with files and the second one loops through the actual file (to extract the column names).

Normally (without the loop) the code would look like this:

<Tables>
 <Table Name="Man_Age" SchemaName="dummy_STG.dbo">
  <Columns>
    <Column Name="AgeID" DataType="Int32" IdentityIncrement="1" IdentitySeed="1" />
    <Column Name="AgeFrom" DataType="Int32" />
    <Column Name="AgeTo" />
    <Column Name="AgeCategoryEmployee" DataType="String" Length="255" />
  </Columns>
 </Table>
</Tables>

By default BIML uses Int32 as the datatype; in this case we use a string.
Now we add the loops and the complete code looks like this:
<Biml xmlns="http://schemas.varigence.com/biml.xsd">

<#  
    string Prefix="Man";
    
    // the location of the csv files
    string path = @"D:\Drop\Man";
    // Put all the filenames with the extension csv in a string array
    string[] myFiles = Directory.GetFiles(path, "*.csv");
    // string that will be filled with the filename
    string fileName;
    // string array for the column names extracted from the CSV
    string[] myColumns;
#>
        <Connections>
            <OleDbConnection 
            Name="OLEDB_STG_<#=Prefix#>" 
            ConnectionString="Data Source=APPL43;Initial Catalog=dummy_STG;Provider=SQLNCLI11.1;Integrated Security=SSPI;Auto Translate=False;">
            </OleDbConnection>
        </Connections>
        <Databases>
             <Database ConnectionName="OLEDB_STG_<#=Prefix#>" Name="dummy_STG"/>
        </Databases>
       <Schemas>
              <Schema Name="dbo" DatabaseName="dummy_STG" Owner="dbo"/>
       </Schemas>
       <Tables>
            <!-- loop through the array of files-->
            <#  foreach(string filePath in myFiles) 
                {
               // extract the filename from the path to use as tablename     
               fileName = Path.GetFileNameWithoutExtension(filePath);
            
            #>
            <Table Name="<#=Prefix#>_<#=fileName#>" SchemaName="dummy_STG.dbo">
                <Columns>
                   <!-- loop through the file looking for the column names-->
                    <#
                     // read the first row of the csv to extract the column names
                    // and split on comma to create an array
                    StreamReader myFile = new StreamReader(filePath);

                    myColumns = myFile.ReadLine().Split(',');
                    // close file after reading first line
                    myFile.Close();

                    // Loop through column array
                    foreach(string myColumn in myColumns) 
                    {  
                    #>
                      <Column Name="<#=myColumn#>" DataType="String" Length="255"></Column>
                 <#  }   #>
                </Columns>    
            </Table>
            <# }#>
       </Tables>
</Biml>
 
<#@ template language="C#" hostspecific="true"#>
<#@ import namespace="System.IO"#>


3) CreateTable
Secondly we are going to create the BIML script, called 2_CreateTables.biml, which creates the actual package that contains the create statements to generate the tables.
BIML has a method to create SQL tables: "RootNode.Tables.First().GetTableSql();"
We use this method to create the SQL create statement of each table.

The code looks like this

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Packages>
        <Package Name="CreateTables" AutoCreateConfigurationsType="None" ConstraintMode="Linear">
            <Tasks>
                <# 
                // Loop through the table definitions of the first biml
                foreach(var table in RootNode.Tables) {#>
                <ExecuteSQL Name="SQL - Drop_Create <#=table.Name#>" ConnectionName="<#=table.Connection.Name#>">
                    <DirectInput>
                        <#=table.GetTableSql()#>    
                    </DirectInput>
                </ExecuteSQL>
                <# } #>
            </Tasks>
        </Package>
    </Packages>
</Biml>

<!--Includes/Imports for C#-->
<#@ template language="C#" hostspecific="true"#>
<#@ import namespace="System.Data"#>
<#@ import namespace="System.Data.SqlClient"#>


We've created two BIML scripts, 1_TableDefinitions.biml and 2_CreateTables.biml. Now comes the important part for generating the package (I'm using BIML Express). First we select 1_TableDefinitions.biml and then also 2_CreateTables.biml. With both BIML scripts selected, right-click on 1_TableDefinitions.biml and choose 'Generate SSIS Packages'. If you do this differently, you will get an empty SSIS package.

Generate SSIS package

Below you can see the result of your BIML scripts: a package with an execute SQL Task for each table you need to create.
Visual studio

The actual create statement looks like this

SET ANSI_NULLS ON
SET QUOTED_IDENTIFIER ON
GO
-------------------------------------------------------------------
IF EXISTS (SELECT * from sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[Man_Age]') AND type IN (N'U'))
DROP TABLE [dbo].[Man_Age]
GO

CREATE TABLE [dbo].[Man_Age]
(
-- Columns Definition
 [AgeID] nvarchar(255) NOT NULL 
, [AgeFrom] nvarchar(255) NOT NULL 
, [AgeTo] nvarchar(255) NOT NULL 
, [AgeCategoryEmployee] nvarchar(255) NOT NULL 
, [AgeCategoryClient] nvarchar(255) NOT NULL 

-- Constraints

)
ON "default"
WITH (DATA_COMPRESSION = NONE)
GO
-------------------------------------------------------------------


Summary
We created two BIML scripts: one for creating the table definitions and one which creates the actual package. The result in Management Studio looks like this.

Management studio result

In the next blog I'm going to explain how to create the SSIS packages that transport the data from the CSV files.

Sunday 29 January 2017

Create and populate Time Dimension

Case
We have sensors in our building that output data every minute which we want to use in a dimensional model. Therefore we need a Time Dimension to aggregate data on parts of the day. How do you create and populate a Time Dimension?
International Time Dimension

Solution
Since creating and populating a Time Dimension is a one-time job, T-SQL is probably the best way to accomplish this. For this example we need a Time Dimension at minute level and therefore we have created a loop that executes an insert query 1440 times (24 hours * 60 minutes).

Besides the hours and minutes we also want to add the name of the day part in two languages and a couple of time strings in both 12 and 24 hour notation, to make this dimension usable for various countries.


-- Drop the table if it already exists
IF OBJECT_ID('dbo.DimTime', 'U') IS NOT NULL 
BEGIN
    DROP TABLE dbo.DimTime;
END

-- Then create a new table
CREATE TABLE [dbo].[DimTime](
    [TimeId] [int] IDENTITY(1,1) NOT NULL,
    [Time] [time](0) NULL,
    [Hour] [int] NULL,
    [Minute] [int] NULL,
    [MilitaryHour] int NOT null,
    [MilitaryMinute] int NOT null,
    [AMPM] [varchar](2) NOT NULL,
    [DayPartEN] [varchar](10) NULL,
    [DayPartNL] [varchar](10) NULL,
    [HourFromTo12] [varchar](17) NULL,
    [HourFromTo24] [varchar](13) NULL,
    [Notation12] [varchar](10) NULL,
    [Notation24] [varchar](10) NULL
);

-- Needed if the dimension already existed
-- with other column, otherwise the validation
-- of the insert could fail.
GO

-- Create a time and a counter variable for the loop
DECLARE @Time as time;
SET @Time = '0:00';

DECLARE @counter as int;
SET @counter = 0;


-- Two variables to store the day part for two languages
DECLARE @daypartEN as varchar(20);
set @daypartEN = '';

DECLARE @daypartNL as varchar(20);
SET @daypartNL = '';


-- Loop 1440 times (24hours * 60minutes)
WHILE @counter < 1440
BEGIN

    -- Determine daypart
    SELECT  @daypartEN = CASE
                         WHEN (@Time >= '0:00' and @Time < '6:00') THEN 'Night'
                         WHEN (@Time >= '6:00' and @Time < '12:00') THEN 'Morning'
                         WHEN (@Time >= '12:00' and @Time < '18:00') THEN 'Afternoon'
                         ELSE 'Evening'
                         END
    ,       @daypartNL = CASE
                         WHEN (@Time >= '0:00' and @Time < '6:00') THEN 'Nacht'
                         WHEN (@Time >= '6:00' and @Time < '12:00') THEN 'Ochtend'
                         WHEN (@Time >= '12:00' and @Time < '18:00') THEN 'Middag'
                         ELSE 'Avond'
                         END;

    INSERT INTO DimTime ([Time]
                       , [Hour]
                       , [Minute]
                       , [MilitaryHour]
                       , [MilitaryMinute]
                       , [AMPM]
                       , [DayPartEN]
                       , [DayPartNL]
                       , [HourFromTo12]
                       , [HourFromTo24]
                       , [Notation12]
                       , [Notation24])
                VALUES (@Time
                       , DATEPART(Hour, @Time) + 1
                       , DATEPART(Minute, @Time) + 1
                       , DATEPART(Hour, @Time)
                       , DATEPART(Minute, @Time)
                       , CASE WHEN (DATEPART(Hour, @Time) < 12) THEN 'AM' ELSE 'PM' END
                       , @daypartEN
                       , @daypartNL
                       , CONVERT(varchar(10), DATEADD(Minute, -DATEPART(Minute,@Time), @Time),100)  + ' - ' + CONVERT(varchar(10), DATEADD(Hour, 1, DATEADD(Minute, -DATEPART(Minute,@Time), @Time)),100)
                       , CAST(DATEADD(Minute, -DATEPART(Minute,@Time), @Time) as varchar(5)) + ' - ' + CAST(DATEADD(Hour, 1, DATEADD(Minute, -DATEPART(Minute,@Time), @Time)) as varchar(5))
                       , CONVERT(varchar(10), @Time,100)
                       , CAST(@Time as varchar(5))
                       );

    -- Raise time with one minute
    SET @Time = DATEADD(minute, 1, @Time);

    -- Raise counter by one
    set @counter = @counter + 1;
END

Below is a very basic example of the usage of this Time Dimension. The LightsAttrium is 0 or 1, indicating whether the lights in the atrium are on or off. By using the average of that during a daypart (night, morning, etc.) it shows the percentage of the time the lights are on. At night it's almost always on and during the afternoon it's off most of the time.
Basic example of usage.

Conclusion
Above is a very complete, yet still basic, example of a Time Dimension. You can add or remove columns for your own specific purposes (let us know your additions in the comments below). Extending this to, for example, seconds is very easy: just change the 1440 in the WHILE to 86400 (24*60*60) and change the DATEADD at the bottom to "second" instead of "minute".



Thursday 5 January 2017

Azure Snack - Grant Access to your Azure SQL DB

Case
You want to give a user permission to connect to your database in Azure through SQL Server Management Studio. How do you do this?

Solution
In SQL Server Management Studio (SSMS) with an on-premises database you can do a lot with a GUI, like creating a user and giving this user the right permissions. In this case we use SQL Server Authentication. With a database in Azure you have to write Transact-SQL code for this, instead of using a GUI. Of course you can also use T-SQL in combination with an on-premises database, but for convenience I use the GUI there. Unfortunately, this GUI is not supported for an Azure database, just like IntelliSense, for example.

When you connect to your database server and you want to create a new login, you will see two different windows: a GUI for an on-premises database and a new query window for an Azure database. We will continue with the queries, which need to be run in a particular order.

SSMS - SQL Server on-premise versus Azure SQL Server

1) Create new login
First we need to create a new login to grant access to the database server. Therefore you have to be administrator on the database server. Click on the Security folder and then on Logins to create a 'New Login...'.

Then we edit the generated query to this:

CREATE LOGIN [RobertSmith] 
WITH PASSWORD = 'Abcdefg123!' 
GO

Note:
The password must include letters, numbers, symbols and have a minimum length.

2) Create user 
When we have created the login, we can link this login to a specific database (user). Click on the Security folder in the specific database and then on Users to create a 'New User...'. The engine also generates a query at this point.

We change the generated query to:

USE [sensory]
GO

CREATE USER [RobertSmith]
FOR LOGIN [RobertSmith]
WITH DEFAULT_SCHEMA = [dbo]
GO

3) Permissions
Once the user has been created, you can further specify the authorization, for example read access to one or more schemas. This can be done with the following query:

USE [sensory]
GO

GRANT select ON SCHEMA :: [analyse] TO [RobertSmith] 
GO

Now the user can log in with SQL Authentication using the created credentials. It is important to select the right database, otherwise SSMS will automatically connect to the master database. In this case we gave permissions to a specific database only, so that attempt will fail.

SSMS - Select the database before connecting to server

Conclusion
This isn't very difficult to understand for anybody with some experience in SSMS, but it is not a daily action and in that scenario there is a chance that you do not have the queries ready. Anyway, for now you have to do it with T-SQL in Azure, but perhaps in the future also with a GUI.