Tuesday, 28 March 2017

SSRS Snack: Excel Rendering Extension Error

Case
I am trying to export my SSRS report to Excel, but I'm getting an error. How can I solve it?
Excel Rendering Extension: Unknown image format image/x-png

Solution
You are using one or more PNG images in your reports and the Excel renderer doesn't know how to handle them because they have an unknown MIME type: image/x-png.
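For reference, an embedded image in the report's RDL stores its MIME type in an element that looks roughly like the snippet below (the image name is just an example); this is the value that the steps below will change.

<EmbeddedImage Name="MyLogo">
  <MIMEType>image/x-png</MIMEType>
  <ImageData>iVBORw0KGgoAAAANSUhEUgAA...</ImageData>
</EmbeddedImage>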

1) Solution Explorer
Go to your SSRS project in Visual Studio and search for PNG images in your Solution Explorer.
Solution Explorer

2) Properties
Now go to the properties of this image file (F4) and locate the property MIME Type.
image/x-png

3) Change MIME Type
Change the MIME Type from image/x-png to image/png and repeat this for all PNG images.
image/png

4) Deploy and test
Now deploy the changed project and reopen the report to test the Excel export.
Export to Excel

Thursday, 23 March 2017

Loading tables by using BIML and meta architecture

Case

How can we simplify the process of loading database tables and reduce the time needed to create SSIS packages?

Solution

There are several steps that need to be taken prior to creating such a solution.
These steps are:
  1. Create the databases 'Repository' and 'Staging' and the required schemas
  2. Create the meta tables 'Layer' and 'TableList' in the Repository database
  3. Create the Sales tables in the Staging database
  4. Fill the meta tables with the required metadata
  5. Create a BIML script that will create the Extract and Load SSIS package
  6. Generate the SSIS package using BIML Express in Visual Studio 2015
For this solution the following prerequisites need to be met:
  • SQL Server will be used as source and destination platform
  • The Adventure Works 2014 database will be used as source
  • The selected tables from the Adventure Works database are present in the destination database, including all required schemas and specific data types, if applicable (we will be using a few tables from the Sales schema that do not use custom data types)
  • Visual Studio 2015 is installed (Community/Professional/Enterprise)
  • BIML Express is installed

1) - Create the databases and schemas

In this step the databases 'Repository' and 'Staging' are created, along with the required schemas.
--Create the databases Repository and Staging and the required schemas
CREATE DATABASE [Repository] CONTAINMENT = NONE ON  PRIMARY 
( NAME = N'Repository', FILENAME = N'D:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\DATA\Repository.mdf' , SIZE = 7168KB , MAXSIZE = UNLIMITED, FILEGROWTH = 1024KB )
 LOG ON 
( NAME = N'Repository_log', FILENAME = N'D:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\DATA\Repository_log.ldf' , SIZE = 5184KB , MAXSIZE = 2048GB , FILEGROWTH = 10%)
GO
ALTER DATABASE [Repository] SET RECOVERY SIMPLE;
GO
USE Repository
go
CREATE SCHEMA rep
GO
CREATE DATABASE [Staging] CONTAINMENT = NONE ON  PRIMARY 
( NAME = N'Staging', FILENAME = N'D:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\DATA\Staging.mdf' , SIZE = 7168KB , MAXSIZE = UNLIMITED, FILEGROWTH = 1024KB )
 LOG ON 
( NAME = N'Staging_log', FILENAME = N'D:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\DATA\Staging_log.ldf' , SIZE = 5184KB , MAXSIZE = 2048GB , FILEGROWTH = 10%)
GO
ALTER DATABASE [Staging] SET RECOVERY SIMPLE;
GO
USE Staging
go
CREATE SCHEMA Sales
GO

2) - Create the Meta tables

During this step we will be creating the meta tables 'Layer' and 'TableList'. The first table contains the id, name and prefix of the layers present in the data warehouse environment. In this blog the only entry present is the Staging area id and name. Normally this table also holds the names of, for example, the data warehouse and datamart layers or any other layers present in a Business Intelligence environment.
The column 'LayerId' returns in the other table, named 'TableList', and is used to distinguish between the tables of the respective layers. In this blog only Staging Area tables are described.
The table 'TableList' contains the following columns:
  • LayerId - The id of the layer the table belongs to
  • TableName - The name of the table
  • SchemaName - The name of the schema of the table
  • TableType - The type of the table (e.g. user table)
  • LoadOrderNr - The order in which the tables are loaded (or created by other processes)
  • WhereClause - Any optional where clause that is used during the load process (by default this column should be set to '1 = 1')
  • ActiveInd - Indicates if the table is active or inactive during the BIML creation process
  • InsertDate - The date when the table entry was inserted in the 'TableList' table
Note: Some of the above columns are not used by the BIML script in this post, but they are used in other generic scripts that create and load data warehouse packages (more about this in future posts).
--Create the meta tables
USE [Repository]
GO
IF OBJECT_ID('[rep].[Layer]', 'U') IS NOT NULL
BEGIN
 DROP TABLE [rep].[Layer]
END
GO
CREATE TABLE [rep].[Layer](
 [LayerId] [smallint] NOT NULL,
 [LayerName] [nvarchar](50) NOT NULL,
 [LayerPrefix] [nvarchar](10) NOT NULL,
) ON [PRIMARY]
GO
IF OBJECT_ID('[rep].[TableList]', 'U') IS NOT NULL
BEGIN
 DROP TABLE [rep].[TableList]
END
GO
CREATE TABLE [rep].[TableList](
 [LayerId] [smallint] NULL,
 [TableName] [nvarchar](100) NULL,
 [SchemaName] [nvarchar](100) NULL,
 [ServerNr] [smallint] NULL,
 [TableType] [nvarchar](100) NULL,
 [LoadOrderNr] [int] NULL,
 [WhereClause] [nvarchar](250) NULL,
 [PrimaryKey] [nvarchar](250) NULL,
 [ActiveInd] [nchar](1) NULL,
 [InsertDate] [datetime] NULL
) ON [PRIMARY]
GO

3) - Create the Sales tables

During this step we will be creating the Sales tables in the target database Staging.
--Create the Sales tables
USE Staging
GO
CREATE TABLE [Sales].[ShoppingCartItem](
 [ShoppingCartItemID] [int] IDENTITY(1,1) NOT NULL,
 [ShoppingCartID] [nvarchar](50) NOT NULL,
 [Quantity] [int] NOT NULL,
 [ProductID] [int] NOT NULL,
 [DateCreated] [datetime] NOT NULL,
 [ModifiedDate] [datetime] NOT NULL,
) ON [PRIMARY]
GO
CREATE TABLE [Sales].[SpecialOffer](
 [SpecialOfferID] [int] IDENTITY(1,1) NOT NULL,
 [Description] [nvarchar](255) NOT NULL,
 [DiscountPct] [smallmoney] NOT NULL,
 [Type] [nvarchar](50) NOT NULL,
 [Category] [nvarchar](50) NOT NULL,
 [StartDate] [datetime] NOT NULL,
 [EndDate] [datetime] NOT NULL,
 [MinQty] [int] NOT NULL,
 [MaxQty] [int] NULL,
 [rowguid] [uniqueidentifier] ROWGUIDCOL  NOT NULL,
 [ModifiedDate] [datetime] NOT NULL,
) ON [PRIMARY]
GO
CREATE TABLE [Sales].[SpecialOfferProduct](
 [SpecialOfferID] [int] NOT NULL,
 [ProductID] [int] NOT NULL,
 [rowguid] [uniqueidentifier] ROWGUIDCOL  NOT NULL,
 [ModifiedDate] [datetime] NOT NULL,
) ON [PRIMARY]
GO

4) - Fill the meta tables with the required meta data

After creating the databases and meta tables, they need to be filled with the metadata that will be used by the BIML script in the next step ('BIML Load_STG_Tables_From_Microsoft.biml').
The script provided below inserts the layer information used in this blog and the table metadata of those tables for which the SSIS load process will be created.

If you want to test the WhereClause functionality, you can replace the value '1 = 1' with '1 = 1 AND ShoppingCartItemID = 2' in the column 'WhereClause' of the table 'TableList' for the table 'Sales.ShoppingCartItem'. This will place a filter on the extraction of that table (a sketch of such an update is shown after the insert script below).
The BIML script will use the meta table information to create one SSIS package with the name 'SSIS STG Load STG Tables SQL.dtsx'.

--Insert the meta information in the meta tables
USE [Repository]
GO
TRUNCATE TABLE [rep].[Layer];
TRUNCATE TABLE [rep].[TableList];
INSERT [rep].[Layer] ([LayerId], [LayerName], [LayerPrefix]) VALUES (1, N'staging', N'stg');
INSERT [rep].[TableList] ([LayerId], [TableName], [SchemaName], [TableType], [LoadOrderNr], [WhereClause], [ActiveInd], [InsertDate]) VALUES (1, N'SpecialOffer', N'Sales', N'user_table', 1, N'1 = 1', N'1', CAST(GETDATE() AS DateTime));
INSERT [rep].[TableList] ([LayerId], [TableName], [SchemaName], [TableType], [LoadOrderNr], [WhereClause], [ActiveInd], [InsertDate]) VALUES (1, N'SpecialOfferProduct', N'Sales', N'user_table', 1, N'1 = 1', N'1', CAST(GETDATE() AS DateTime));
INSERT [rep].[TableList] ([LayerId], [TableName], [SchemaName], [TableType], [LoadOrderNr], [WhereClause], [ActiveInd], [InsertDate]) VALUES (1, N'ShoppingCartItem', N'Sales', N'user_table', 1, N'1 = 1', N'1', CAST(GETDATE() AS DateTime));
GO
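As a sketch, the update for the WhereClause test described above could look like this (it filters the extraction of the ShoppingCartItem table to a single row):

--Optional: add a filter for the table Sales.ShoppingCartItem
UPDATE [rep].[TableList]
SET    [WhereClause] = '1 = 1 AND ShoppingCartItemID = 2'
WHERE  [SchemaName] = 'Sales'
  AND  [TableName]  = 'ShoppingCartItem';
GO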

5) - Create the BIML script

Once the previous steps have been executed, it is time to create the BIML script. The BIML script starts with declaring the information needed to build the connection strings to the different databases and the server(s) they reside on. For this example all databases are SQL Server 2016 databases. It would also be possible to store this information in a meta table, but for this post it is placed inside the BIML script. The BIML script creates one package with the name 'SSIS STG Load STG Tables SQL' and for each table in the 'TableList' table a sequence container is created with two components. The first component is an Execute SQL Task that uses a T-SQL command to truncate the target table. The second component is a Data Flow Task containing a source and a destination component, which loads the data from the source to the target table. All sequence containers are executed in parallel.

--The BIML code that can be placed inside a BIML file.

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <# 
        string pRepServerName    = "localhost"; 
        string pRepDatabaseName  = "Repository";
        string pRepProvider      = "SQLNCLI11.1;Integrated Security=SSPI";
        string pRepSchema        = "rep";
      
        string pSourceServerName = "localhost";
        string pSourceDBName     = "AdventureWorks2014";
        string pSourceProvider   = "SQLNCLI11.1;Integrated Security=SSPI";
        string pSourceSchema     = "Sales";
        
        string pTargetServerName = "localhost";
        string pTargetDBName     = "Staging";
        string pTargetProvider   = "SQLNCLI11.1;Integrated Security=SSPI";
        string pTargetSchema     = "Sales";
    #>
  
    
    <#
        string pLayer            = "Staging";
     #>
    
    <#
    string csRepository = String.Format("Data Source={0};Initial Catalog={1};Provider={2};Auto Translate=False;"
          ,pRepServerName, pRepDatabaseName, pRepProvider);
   
    string csSource = String.Format("Data Source={0};Initial Catalog={1};Provider={2};Auto Translate=False;"
          ,pSourceServerName, pSourceDBName, pSourceProvider);
    
    string csTarget = String.Format("Data Source={0};Initial Catalog={1};Provider={2};Auto Translate=False;"
          ,pTargetServerName, pTargetDBName, pTargetProvider);      
    #>
    <Connections>
        <OleDbConnection Name="OLEDB Source" ConnectionString="<#=csSource#>" />
        <OleDbConnection Name="OLEDB Target" ConnectionString="<#=csTarget#>" />
    </Connections>

    <Packages>
        <Package Name="SSIS STG Load STG Tables SQL" ConstraintMode="Parallel">
            <Tasks>
                <#
                  StringBuilder sMETAGetTableName = new System.Text.StringBuilder();
                  
                  sMETAGetTableName.Append("SELECT ");
                  sMETAGetTableName.Append("    TableName ");
                  sMETAGetTableName.Append("  , SchemaName ");
                  sMETAGetTableName.Append("  , WhereClause ");
                  sMETAGetTableName.Append("FROM ");
                  sMETAGetTableName.Append(pRepSchema);
                  sMETAGetTableName.Append(".TableList AS TAB ");
                  sMETAGetTableName.Append("INNER JOIN ");
                  sMETAGetTableName.Append(pRepSchema);
                  sMETAGetTableName.Append(".Layer AS LYR ");
                  sMETAGetTableName.Append("  ON ( TAB.LayerId = LYR.LayerId) ");
                  sMETAGetTableName.Append("WHERE 1 = 1 ");
                  sMETAGetTableName.Append(" AND TAB.ActiveInd = 1 ");
                  sMETAGetTableName.Append(" AND LYR.LayerName = '");
                  sMETAGetTableName.Append(pLayer);
                  sMETAGetTableName.Append("' ");
                  sMETAGetTableName.Append("ORDER BY ");
                  sMETAGetTableName.Append("   TAB.LoadOrderNr");
                  
                  DataTable tblMETATableNames = ExternalDataAccess.GetDataTable(csRepository, sMETAGetTableName.ToString());
                  foreach (DataRow METATableNameRow in tblMETATableNames.Rows) {
                #>
                <Container Name="SEQC Load <#=METATableNameRow["TableName"] #>" ConstraintMode="Linear">
                    <Tasks>
                        <ExecuteSQL Name="SQL - Truncate <#=METATableNameRow["TableName"] #>"
                                    ResultSet="None"
                                    ConnectionName="OLEDB Target">
                            <DirectInput>
                            TRUNCATE TABLE <#=pTargetSchema #>.<#=METATableNameRow["TableName"] #>
                            </DirectInput>
                        </ExecuteSQL>
                        <Dataflow Name="DFT - Load <#=METATableNameRow["TableName"] #>">
                            <Transformations>
                                <OleDbSource Name="OLEDB - Source <#=METATableNameRow["TableName"] #>"
                                             ConnectionName="OLEDB Source"
                                             ValidateExternalMetadata="false">
                                    <DirectInput>
                                    SELECT
        CAST(1 AS INTEGER) AS DUMMY_COLUMN
<#                                          StringBuilder sGETSelectColumn = new System.Text.StringBuilder();
                                            sGETSelectColumn.Append("SELECT " );
                                            sGETSelectColumn.Append("     col.name AS column_name " );
                                            sGETSelectColumn.Append("FROM sys.columns AS col " );
                                            sGETSelectColumn.Append("INNER JOIN sys.objects AS obj " );
                                            sGETSelectColumn.Append("    ON(col.object_id = obj.object_id) " );
                                            sGETSelectColumn.Append("INNER JOIN sys.types AS typ " );
                                            sGETSelectColumn.Append("    ON(col.user_type_id = typ.user_type_id)" );
                                            sGETSelectColumn.Append("WHERE   1 = 1 " );
                                            sGETSelectColumn.Append("   AND obj.name = '"+ METATableNameRow[0].ToString() +"'");
                                            DataTable tblSelectColumn = ExternalDataAccess.GetDataTable(csSource, sGETSelectColumn.ToString());
                                            foreach (DataRow SelectColumn in tblSelectColumn.Rows) {
#>
        , [<#=SelectColumn["COLUMN_NAME"] #>]
<# } #>FROM <#=METATableNameRow["SchemaName"] #>.<#=METATableNameRow["TableName"] #>
WHERE <#=METATableNameRow["WhereClause"] #>
                                    </DirectInput>
                                </OleDbSource>
                                <OleDbDestination Name="OLEDB - Target <#=METATableNameRow["TableName"] #>"
                                                  ConnectionName="OLEDB Target">
                                    <ExternalTableOutput Table="<#=pTargetSchema #>.<#=METATableNameRow["TableName"] #>" />
                                </OleDbDestination>
                            </Transformations>
                        </Dataflow>
                    </Tasks>
                </Container>
                <# } #>
            </Tasks>
        </Package>
    </Packages>
</Biml>
<#@ template language="C#" hostspecific="true"#>
<#@ import namespace="System.Data"#>
<#@ import namespace="System.Data.SqlClient"#>
<#@ import namespace="System.Text"#>

6) - Generate the package using BIML Express

Once the BIML file has been created it is time to generate the SSIS package by using BIML Express in Visual Studio 2015.
The package can simply be generated by right-clicking the BIML file and selecting 'Generate SSIS Packages'.
Using BIML Express to generate the package

The package has been generated by BIML

Using BIML Express to generate the package

Summary

In this post we created the following components:
  • Repository database and one schema
  • Staging database and one schema
  • Two meta tables to be used by the BIML script from this post (and possible future posts)
  • Three Staging tables
  • A BIML file called 'BIML Load_STG_Tables_From_Microsoft.biml'
  • A generated SSIS Package named 'SSIS STG Load STG Tables SQL.dtsx'

Saturday, 25 February 2017

Azure - Understanding Stream Analytics Data Lake Store Destination

Case
How do you use Azure Data Lake Store to store Stream Analytics data and why would you do that?

Solution
In an earlier post we have set up a Stream Analytics Job where sensor data is sent to a Blob file. In this case we create a new Stream Analytics Job with Data Lake Store as output, where we use the sensor data as input. Despite the fact that the Input and Output are different, the query stays the same as in other Stream Analytics Jobs. You can find the configuration of the Stream Analytics Job query here.
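As a minimal sketch, such a pass-through query could look like the one below (the input and output aliases are examples; they must match the names of your own Input and Output).

--Send all incoming sensor messages from the IoT Hub input to the Data Lake Store output
SELECT
    *
INTO
    [datalakestore-output]
FROM
    [iothub-input]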

A Data Lake Store is a scalable repository optimized for storing IoT data, log files and other large datasets for Big Data scenarios. It contains folders and these folders contain the files (data). As said before, you can also store (sensor) data using Blob storage. These two storage options are similar, but there are some important differences, which we will explain later in this post.

Overview Data Lake Store with different types of data

1) Create a Data Lake Store
First we have to create the Data Lake Store. Click on New (+ icon) in your portal and under the Storage category you will find the Data Lake Store. Give it a suitable name. Next we choose the Resource Group which we created earlier when setting up the IoT Hub. The benefit of choosing the same group is that, among other things, the rights are the same. At this time there are not many options available for both Location and Pricing, so we keep the defaults.

Azure Portal - Create Data Lake Store

2) Create the Stream Analytics Job
Before creating this new job, we already added a new consumer group to our IoT Hub, called 'DataLake'. Using multiple consumer groups makes it possible for several consumer applications to read data from this IoT Hub independently. Click here to see where you can add/manage consumer group(s).

Note:
The Azure Portal is still in development, so adding a new consumer group is now at a different place than in an earlier post. See screenshot below.

Azure Portal - Adding Consumer Groups

Now we can create the job. We choose the same Resource Group as the IoT Hub and Storage account, just like we did when creating the Data Lake. Our location is the Netherlands, so we choose West Europe.

Azure Portal - Create Stream Analytics Job

3) Configure the Stream Analytics Job
We start with defining the Input. We choose 'Data stream' as Source Type because the sensor data is an ongoing stream and is derived from the IoT Hub. Under Source we choose 'IoT hub'. In our case we have one IoT Hub, but when you have multiple IoT Hubs you can choose one from the list. Now our IoT Hub appears automatically. Next we choose 'datalake' as Consumer group. This is the one we have created earlier. Finally we choose 'JSON' as Event serialization format.

Now we can specify the Output. Give it a suitable name and choose 'Data Lake Store' as Sink. The corresponding Account Name (in this case the one we created earlier) will be selected automatically. Next we enter a file path to store the files in our Data Lake Store account. Optionally, you can specify this path with date and time tokens (see the example below). The data will then be stored in separate folders per day and per hour, which keeps the storage organized. Finally, choose the 'JSON' format.
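A possible path prefix pattern, matching the folder structure shown in the result below, could look like this (with the Date format set to YYYY/MM/DD and the Time format to HH):

sensordata/{date}/{time}

This results in folders like sensordata/2017/02/25/14, one per hour.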

In this post we only configure the Input and Output of the job, as mentioned earlier. Now we can run the job with a valid query.

Azure Portal - Configure Stream Analytics Job

Result
After running the Stream Analytics Job, the data is now stored in the Data Lake Store. You can find this in the Azure portal. Open the Data Lake Store and go to Data Explorer. Now you see one folder named 'sensordata'. This folder contains multiple subfolders: year, month, day and hour. Now we have only one file with data in the subfolder 'hour', but each next hour there will be a new file (as long as your Stream Analytics Job is running). This is exactly what we have configured earlier. In each file, the data is stored per 10 seconds. It works!

Azure Portal - Result Data Lake Store

Differences between Blob Storage and Data Lake Store
The first big difference is the size limits: there are no limitations on account/file size or number of files in a Data Lake Store, while Blob storage has such restrictions. In addition, Data Lake Store has built-in Hadoop integration. Therefore (along with the unlimited storage) this makes it suitable for storing Big Data and then analyzing this data. Another difference is the authentication: Blob storage works with generated storage access keys, while a Data Lake Store uses Azure Active Directory for this.

Overall a Data Lake Store has more possibilities than Blob storage and is optimized for Big Data purposes. The general purpose of Blob Storage is storing data in different scenarios like backups or media files (for streaming). The starting prices are lower for Blob storage, but there are different storage prices for your Blob Storage account. This means you can make it as expensive as you want, and in some cases the monthly charges per GB will be higher than for a Data Lake Store. Click here for more details about the differences and prices between a Data Lake Store and Blob storage.

Conclusion
Azure Data Lake Store is very useful for Big Data scenarios because it can combine storage (up to more than 1 petabyte) with the ability to analyze this data. This can be done with Hadoop (built-in integration) or Azure Data Lake Analytics, which is specially optimized to work with Azure Data Lake Store. From this perspective it offers more than, for example, Blob storage.

Tuesday, 21 February 2017

Use BIML and csv files to create tables

Case
The case is about importing flat files (CSVs) without the necessity of metadata. Because BIML always checks whether the tables are accessible before creating the packages, the first step is to create the tables with BIML and the second step is to create the SSIS packages for transporting the data.

  1. Creating tables in the database 
  2. Packages to fill this database

Because of the size of the solution I've created two separate solutions: one for creating the tables and a second for creating the SSIS packages. You can click on the link to go to the other solution (which I will publish later this month).


Solution - Creating tables with BIML
In this solution, we create a BIML script that is going to create the tables; all the columns are defined as strings.
We have to create two BIML scripts: the first script defines the table definitions and the second creates the actual package.

1) CSV files
For this example we are using two CSV files (age and sickleave) which are comma separated and have column names on the first row. These column names will be used in the create table statement (see the example content below).

the drop map

content csv file
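As an illustration, the first lines of age.csv could look like this (the values are made up; the column names match the table that is generated later in this post):

AgeID,AgeFrom,AgeTo,AgeCategoryEmployee,AgeCategoryClient
1,0,17,Youth,Youth
2,18,34,Young adult,Young adult
3,35,54,Adult,Adult
4,55,99,Senior,Senior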

2) Table definitions
The first BIML script is called "1_TableDefinitions.biml".
In this script we define the path where the CSV files are located, an array with the names of the csv files and also some strings which we are going to use further in the code.
We use two "foreach loops": the first one loops through the array with files and the second one loops through the actual file (to extract the column names).

Normally (without the loop) the code should look like this:

<Tables>
  <Table Name="Man_Age" SchemaName="dummy_STG.dbo">
    <Columns>
      <Column Name="AgeID" DataType="Int32" IdentityIncrement="1" IdentitySeed="1" />
      <Column Name="AgeFrom" DataType="Int32" />
      <Column Name="AgeTo" />
      <Column Name="AgeCategoryEmployee" DataType="String" Length="255" />
    </Columns>
  </Table>
</Tables>

By default BIML uses Int32 as the data type; in this case we use strings.
Now we add the loops and the complete code looks like this:
<Biml xmlns="http://schemas.varigence.com/biml.xsd">

<#  
    string Prefix="Man";
    
    // the location of the csv files
    string path = @"D:\Drop\Man";
    // put all the filenames with the extension csv in a string array
    string[] myFiles = Directory.GetFiles(path, "*.csv");
    // string that will be filled with the filename
    string fileName;
    // string array for column names extracted from CSV
    string[] myColumns;
#>
        <Connections>
            <OleDbConnection 
            Name="OLEDB_STG_<#=Prefix#>" 
            ConnectionString="Data Source=APPL43;Initial Catalog=dummy_STG;Provider=SQLNCLI11.1;Integrated Security=SSPI;Auto Translate=False;">
            </OleDbConnection>
        </Connections>
        <Databases>
             <Database ConnectionName="OLEDB_STG_<#=Prefix#>" Name="dummy_STG"/>
        </Databases>
       <Schemas>
              <Schema Name="dbo" DatabaseName="dummy_STG" Owner="dbo"/>
       </Schemas>
       <Tables>
            <!-- loop through the array of files -->
            <#  foreach(string filePath in myFiles) 
                {
               // extract the filename from the path to use as table name     
               fileName = Path.GetFileNameWithoutExtension(filePath);
            
            #>
            <Table Name="<#=Prefix#>_<#=fileName#>" SchemaName="dummy_STG.dbo">
                <Columns>
                   <!-- loop through the file looking for the column names -->
                    <#
                     // read first row of csv to extract the column names
                    // and split on comma to create an array
                    StreamReader myFile = new StreamReader(filePath);

                    myColumns = myFile.ReadLine().Split(',');
                    // close file after reading first line
                    myFile.Close();

                    // Loop through column array
                    foreach(string myColumn in myColumns) 
                    {  
                    #>
                      <Column Name="<#=myColumn#>" DataType="String" Length="255"></Column>
                 <#  }   #>
                </Columns>    
            </Table>
            <# }#>
       </Tables>
</Biml>
 
<#@ template language="C#" hostspecific="true"#>
<#@ import namespace="System.IO"#>


3) CreateTable
Secondly we are going to create the BIML script, called 2_CreateTables.biml, which creates the actual package that contains the create statements to generate the tables.
BIML has a method to create SQL tables: "RootNode.Tables.First().GetTableSql();".
We use this method to create the 'SQL create statement' of each table.

The code looks like this

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Packages>
        <Package Name="CreateTables" AutoCreateConfigurationsType="None" ConstraintMode="Linear">
            <Tasks>
                <# 
                // Loop through the table definitions of the first BIML script
                foreach(var table in RootNode.Tables) {#>
                <ExecuteSQL Name="SQL - Drop_Create <#=table.Name#>" ConnectionName="<#=table.Connection.Name#>">
                    <DirectInput>
                        <#=table.GetTableSql()#>    
                    </DirectInput>
                </ExecuteSQL>
                <# } #>
            </Tasks>
        </Package>
    </Packages>
</Biml>

<!--Includes/Imports for C#-->
<#@ template language="C#" hostspecific="true"#>
<#@ import namespace="System.Data"#>
<#@ import namespace="System.Data.SqlClient"#>


We've created two BIML scripts, 1_TableDefinitions.biml and 2_CreateTables.biml. Now comes the important part (I'm using BIML Express) for generating the package. First we click on 1_TableDefinitions.biml and secondly on 2_CreateTables.biml; with both BIML scripts selected, you right-click 1_TableDefinitions.biml and choose 'Generate SSIS Packages'. If you do this otherwise, you will get an empty SSIS package.

Generate SSIS package

Below you can see the result of your BIML scripts: a package with an Execute SQL Task for each table you need to create.
Visual studio

The actual create statement looks like this

SET ANSI_NULLS ON
SET QUOTED_IDENTIFIER ON
GO
-------------------------------------------------------------------
IF EXISTS (SELECT * from sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[Man_Age]') AND type IN (N'U'))
DROP TABLE [dbo].[Man_Age]
GO

CREATE TABLE [dbo].[Man_Age]
(
-- Columns Definition
 [AgeID] nvarchar(255) NOT NULL 
, [AgeFrom] nvarchar(255) NOT NULL 
, [AgeTo] nvarchar(255) NOT NULL 
, [AgeCategoryEmployee] nvarchar(255) NOT NULL 
, [AgeCategoryClient] nvarchar(255) NOT NULL 

-- Constraints

)
ON "default"
WITH (DATA_COMPRESSION = NONE)
GO
-------------------------------------------------------------------


Summary
We created two BIML scripts: one for creating the table definitions and one which creates the actual package. The result in Management Studio looks like this.

Management studio result

In the next blog I'm going to explain how to create the SSIS packages that transport the data from the CSV files.

Sunday, 29 January 2017

Create and populate Time Dimension

Case
We have sensors in our building that output data every minute which we want to use in a dimensional model. Therefore we need a Time Dimension to aggregate data on parts of the day. How do you create and populate a Time Dimension?
International Time Dimension

Solution
Since creating and populating a Time Dimension is a one-time job, T-SQL is probably the best way to accomplish this. For this example we need a Time Dimension at minute level and therefore we have created a loop that executes an insert query 1440 times (24 hours * 60 minutes).

Besides the hours and minutes we also want to add the name of the day part in two languages and a couple of time strings in both 12 and 24 hour notation, to make this dimension usable for various countries.


-- Drop the table if it already exists
IF OBJECT_ID('dbo.DimTime', 'U') IS NOT NULL 
BEGIN
    DROP TABLE dbo.DimTime;
END

-- Then create a new table
CREATE TABLE [dbo].[DimTime](
    [TimeId] [int] IDENTITY(1,1) NOT NULL,
    [Time] [time](0) NULL,
    [Hour] [int] NULL,
    [Minute] [int] NULL,
    [MilitaryHour] int NOT null,
    [MilitaryMinute] int NOT null,
    [AMPM] [varchar](2) NOT NULL,
    [DayPartEN] [varchar](10) NULL,
    [DayPartNL] [varchar](10) NULL,
    [HourFromTo12] [varchar](17) NULL,
    [HourFromTo24] [varchar](13) NULL,
    [Notation12] [varchar](10) NULL,
    [Notation24] [varchar](10) NULL
);

-- Needed if the dimension already existed
-- with other column, otherwise the validation
-- of the insert could fail.
GO

-- Create a time and a counter variable for the loop
DECLARE @Time as time;
SET @Time = '0:00';

DECLARE @counter as int;
SET @counter = 0;


-- Two variables to store the day part for two languages
DECLARE @daypartEN as varchar(20);
set @daypartEN = '';

DECLARE @daypartNL as varchar(20);
SET @daypartNL = '';


-- Loop 1440 times (24hours * 60minutes)
WHILE @counter < 1440
BEGIN

    -- Determine datepart
    SELECT  @daypartEN = CASE
                         WHEN (@Time >= '0:00' and @Time < '6:00') THEN 'Night'
                         WHEN (@Time >= '6:00' and @Time < '12:00') THEN 'Morning'
                         WHEN (@Time >= '12:00' and @Time < '18:00') THEN 'Afternoon'
                         ELSE 'Evening'
                         END
    ,       @daypartNL = CASE
                         WHEN (@Time >= '0:00' and @Time < '6:00') THEN 'Nacht'
                         WHEN (@Time >= '6:00' and @Time < '12:00') THEN 'Ochtend'
                         WHEN (@Time >= '12:00' and @Time < '18:00') THEN 'Middag'
                         ELSE 'Avond'
                         END;

    INSERT INTO DimTime ([Time]
                       , [Hour]
                       , [Minute]
                       , [MilitaryHour]
                       , [MilitaryMinute]
                       , [AMPM]
                       , [DayPartEN]
                       , [DayPartNL]
                       , [HourFromTo12]
                       , [HourFromTo24]
                       , [Notation12]
                       , [Notation24])
                VALUES (@Time
                       , DATEPART(Hour, @Time) + 1
                       , DATEPART(Minute, @Time) + 1
                       , DATEPART(Hour, @Time)
                       , DATEPART(Minute, @Time)
                       , CASE WHEN (DATEPART(Hour, @Time) < 12) THEN 'AM' ELSE 'PM' END
                       , @daypartEN
                       , @daypartNL
                       , CONVERT(varchar(10), DATEADD(Minute, -DATEPART(Minute,@Time), @Time),100)  + ' - ' + CONVERT(varchar(10), DATEADD(Hour, 1, DATEADD(Minute, -DATEPART(Minute,@Time), @Time)),100)
                       , CAST(DATEADD(Minute, -DATEPART(Minute,@Time), @Time) as varchar(5)) + ' - ' + CAST(DATEADD(Hour, 1, DATEADD(Minute, -DATEPART(Minute,@Time), @Time)) as varchar(5))
                       , CONVERT(varchar(10), @Time,100)
                       , CAST(@Time as varchar(5))
                       );

    -- Raise time with one minute
    SET @Time = DATEADD(minute, 1, @Time);

    -- Raise counter by one
    set @counter = @counter + 1;
END

Below is a very basic example of the usage of this Time Dimension. The LightsAttrium is 0 or 1, indicating whether the lights in the atrium are on or off. By taking the average of that during a day part (night, morning, etc.) it shows the percentage of the lights being on; a sketch of such a query is shown below. At night it's almost always on and during the afternoon it's off most of the time.
Basic example of usage.
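A sketch of such a query, assuming a fact table called FactSensor with a TimeId reference and the LightsAttrium flag (0 = off, 1 = on):

-- Percentage of time the atrium lights are on, per day part
SELECT      t.[DayPartEN]
,           AVG(CAST(f.[LightsAttrium] AS decimal(5,2))) * 100 AS PercentageLightsOn
FROM        dbo.FactSensor AS f
INNER JOIN  dbo.DimTime AS t ON f.TimeId = t.TimeId
GROUP BY    t.[DayPartEN];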

Conclusion
Above is a fairly complete, but still basic, example of a Time Dimension. You can add or remove columns for your own specific purposes (let us know your additions in the comments below). Extending this to, for example, seconds is very easy: just change the 1440 in the WHILE to 86400 (24*60*60) and change the DATEADD at the bottom to "second" instead of "minute".

Thursday, 5 January 2017

Azure Snack - Grant Access to your Azure SQL DB

Case
You want to give a user permission to connect to your database in Azure through SQL Server Management Studio. How do you do this?

Solution
In SQL Server Management Studio (SSMS) with an on-premise database you can do a lot with the GUI, like creating a user and giving this user the right permissions. In this case we use SQL Server Authentication. With a database in Azure you have to write Transact-SQL for this, instead of using the GUI. Of course you can also use T-SQL in combination with an on-premise database, but for convenience I use the GUI there. Unfortunately, this GUI is not supported for an Azure database, just like 'IntelliSense' for example.

When you connect to your database server and you want to create a new login, you will see two different windows: a GUI for an on-premise database and a new query window for an Azure database. We will continue with queries, which need to be run in a particular order.

SSMS - SQL Server on-premise versus Azure SQL Server

1) Create new login
First we need to create a new login to grant access to the database server. Therefore you have to be administrator on the database server. Click on the Security folder and then on Logins to create a 'New Login...'.

Then we edit the generated query to this:

CREATE LOGIN [RobertSmith] 
WITH PASSWORD = 'Abcdefg123!' 
GO

Note:
The password must include letters, numbers, symbols and have a minimum length.

2) Create user 
When we have created the login, we can link this login to a specific database (user). Click on the Security folder in the specific database and then on Users to create a 'New User...'. The engine also generates code at this point.

We change the generated query to:

USE [sensory]
GO

CREATE USER [RobertSmith]
FOR LOGIN [RobertSmith]
WITH DEFAULT_SCHEMA = [dbo]
GO

3) Permissions
Once the user has been created, you can further specify the authorization, for example read access to one or more schemas. This can be done with the following query:

USE [sensory]
GO

GRANT select ON SCHEMA :: [analyse] TO [RobertSmith] 
GO
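To check the result, you could query the catalog views in the user database; the sketch below lists the permissions of the new user (class 3 means the permission is granted on a schema):

--Check the granted permissions of the user
SELECT  pr.[name]           AS UserName
,       pe.permission_name  AS Permission
,       pe.state_desc       AS PermissionState
,       s.[name]            AS SchemaName
FROM    sys.database_permissions AS pe
INNER JOIN sys.database_principals AS pr ON pe.grantee_principal_id = pr.principal_id
LEFT JOIN  sys.schemas AS s ON pe.class = 3 AND pe.major_id = s.schema_id
WHERE   pr.[name] = 'RobertSmith';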

Now the user can log in with SQL Server Authentication using the created credentials. It is important to select the right database, otherwise SSMS will automatically connect to the master database. In this case we only gave permissions to a specific database, so that attempt will fail.

SSMS - Select the database before connecting to server

Conclusion
This isn't very difficult to understand for anybody with some experience in SSMS, but it is not a daily action and in that scenario there is a chance that you do not have the queries ready. Anyway, for now you have to do it with T-SQL in Azure, but perhaps in the future also with a GUI.

Thursday, 15 December 2016

Azure Event Hub vs IoT Hub

Case
During our journey we noticed that within our team there is some confusion about the differences between an Event hub and an IoT hub. After some research we found out that there are a lot of similarities, but also differences. In this blog I will explain the concept of an Event/IoT hub and give a best practice for when to use an Event hub and when to use an IoT hub.
The goal of this article is to give a general picture of the Event hub and the IoT hub. Please follow the links for more in-depth information.

Solution
Before we can find out what the differences and similarities are, the first question is: "what is an Event hub, and how do we use it?"

1) Event hub
An Event hub is a gateway to the Azure cloud. Its main purpose is to collect the incoming data and pass it to the Azure cloud, as seen in figure 1. An Event hub processes the incoming data, but only on a basic level: it doesn't have advanced sequencing or delivery guarantees. Event hubs are a high scale messaging service with low latency and high reliability. In our case we use an Event hub to collect the data from the Raspberry Pi, but it can also be used in other cases, like collecting data from console games or other telemetry.
Figure1: Event Hub

Protocol
The connected devices/entities are called Event publishers.
Connecting an Event publisher to the Azure Event hub is easy, because it supports the HTTP/AMQP protocols. The most used protocol is AMQP. See here for more information about this subject.

Partition
The Event hub uses partitions. Partitions are ordered sequences that hold the events in the Event hub. This sequence is based on the 'first in, first out' principle. The number of partitions that can be used at the same time is between 2 and 32. Please note that you have to set the number of partitions when you create the Event hub; strangely, it cannot be changed afterwards. Mostly the number of partitions is based on the number of readers you are going to use (meaning the use of partitions further in the process). The default number of partitions is 4.

In short, an Event hub is a high scale, one-way telemetry service, using the HTTP/AMQP protocols, and it is generally available worldwide.

For more information on how to develop with the Event hub, see the programming guide.
Setting up an Event hub will follow in another blog (coming soon).

2) IoT Hub
But with the growth of IoT came additional needs: control, device authentication and authorization, protocol translation, etc.
Since an Event hub is a one-way point of entry, it is limited regarding the additional needs mentioned before.
Figure2: IoT hub

And this is where the IoT Hub kicks in. The IoT hub can do the same things as an Event hub, but it's capable of much more. Most importantly, it can handle bi-directional traffic, meaning that an IoT hub is capable of sending data back to the connected devices.
This makes it possible to command and control the devices: for example, you can send a disconnect event to the device, or a threshold event so that when a machine reaches a certain temperature you can shut it down.
Devices can be registered, so you can identify devices and check whether they are allowed to connect. It's possible to connect more than 10 million devices (where the Event hub can handle up to 1 million devices), and it is also easy to import device identities in bulk (convenient when you use 10 million devices).
The IoT hub can also handle device error reporting; for example, you can check the failed connection attempts per device. This can result in disconnecting/disabling the device/sensor in the IoT hub (so the sensor isn't allowed to connect to the hub anymore).
It also supports AMQP over WebSockets and the MQTT protocol, whereby for the latter no protocol gateway is needed (when using the Azure IoT SDKs).

For more in-depth information about the IoT hub, please see also the reference architecture 
For setting up the IoT hub see our earlier blogspot: Setting up IoT hub

Summary
The IoT Hub can do the same as an Event hub, but much more, mostly because of the bi-directional communication possibility; ergo, an IoT hub is an 'Event Hub plus'.


So why not always use the IoT hub instead of the Event hub? Well, one thing we didn't mention yet is the pricing. With all the extra capabilities of the IoT hub, the pricing is also a lot higher, sometimes up to 40 times higher. So for simple events, like reading data from a weather station or counting how many times a door is opened, an IoT hub is not necessary.