
Sunday, 19 February 2023

Streaming data Azure & Power BI - Stream Analytics

Case
I want to send streaming data to Power BI for real-time reporting purposes. This post is part of a series on Streaming Data in Azure and Power BI and focuses on reading the streaming data from the Event Hub and sending it to a Power BI streaming dataset with Azure Stream Analytics.
Azure Stream Analytics

Solution
In the previous post we sent messages to the Event Hub and in this post we will read those messages and send them to Power BI. For this we need a service that can handle streaming data. Azure Stream Analytics is the query tool for streaming data in Azure. It uses the regular SQL language (so easy to learn). However, it has some extras, like the windowing option for the GROUP BY, because you cannot aggregate the entire stream (it never ends), but you can aggregate within a certain time window.

Posts in this series:

First, let's explain the three most important parts of a Stream Analytics job.

Streaming inputs and reference inputs
There are three types of input sources for streaming data: Azure Event Hubs, Azure IoT Hub and Azure Blob Storage / Data Lake Storage. But there is also a reference input, where you can use an Azure SQL Database table or a blob storage file to enrich or translate your streaming data. Streaming messages are often kept small to speed things up, by using IDs or codes instead of long strings. With the reference data you can translate them into something readable for reporting purposes.
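To give an idea, a lookup with reference data could look something like the sketch below. The input, output and column names ([ehcalls], [pbicalls], [refStatusCodes], StatusCode, StatusDescription) are made up for illustration and are not part of this series.
-- Translate a status code from the stream into a readable description from a reference input
SELECT      s.CallId
,           s.StatusCode
,           r.StatusDescription
INTO        [pbicalls]
FROM        [ehcalls] s
JOIN        [refStatusCodes] r
            ON s.StatusCode = r.StatusCode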

Streaming Outputs
There are several streaming outputs, like Storage Accounts, a SQL database or even a Cosmos DB, but the one we are using for this example is the Power BI output. The Power BI output has a limitation that it can be called roughly once every second and the messages can't be larger than 15KB. This means that you will need a windowing function to aggregate the data to make sure you don't flood Power BI.

Query
The query language in Stream Analytics is a subset of T-SQL, so very similar and easy to learn if you already know how to query a Microsoft SQL server database. The most common query pattern is a SELECT INTO with a GROUP BY.
-- Count the events per SomeColumn in a 30 second tumbling window
SELECT      SomeColumn, Count(*) as Count
INTO        OutputStream
FROM        InputStream TIMESTAMP BY CreatedAt
GROUP BY    SomeColumn, TumblingWindow(second, 30)

Let's create a new Azure Stream Analytics Job and configure a query to push data to Power BI.


1) Create Stream Analytics Job
Creating a new SA Job is very straightforward. For this example the region and the number of streaming units are the important parts.
  • Go to the Azure portal and create a new Stream Analytics job
  • Select the correct subscription and resource group
  • Come up with a good descriptive name for your job. Keep in mind that there is only one query window, but you can run multiple queries within that.
  • Choose the correct region. The same region as your event hub and your Power BI tenant is the best for performance.
  • Hosting environment is Cloud (unless you have an on-premises Edge environment).
  • Last thing is the number of Streaming units. This is where you start paying. The default is 3 SUs, but for testing purposes or small jobs 1 SU is more than enough.
  • Under storage you can set up a secure storage account. For this example you can leave that empty.
  • Optionally add some Tags and then review and create the new SA Job.
Create Stream Analytics Job

2) Create Input stream
Now that we have an SA job we first need to create an input to connect to our Event Hub.
  • Go to your SA job in the Azure portal and click on Inputs in the left menu under Job topology
  • In the upper left corner click on + Add stream input and choose Event Hub
  • A new pane appears on the right side. First enter a descriptive name for your Event Hub input.
  • Now select the correct Subscription and Event Hub Namespace.
  • Select the existing Event hub name that we created in the previous blog post.
  • Select the existing Event Hub consumer group, which we left at the default in the previous blog post.
  • For the Authentication mode the easiest way is to select Create system assigned managed identity. This means this specific SA job will get access to the selected Event Hub.
  • Partition key is for optimizing performance if your input is indeed partitioned. You can leave it empty for this example.
  • We used a JSON structure for our test messages. Therefore select JSON as Event serialization format.
  • Select UTF-8 as encoding (the only option at this moment)
  • Leave the Event compression type set to None for this example

Create new Input

After saving you will see a couple of notifications to create and test your new input. This takes about a minute to complete.
Notifications for new Input

3) Create Output stream
After the input it's now time to add the output to Power BI. For this you need a Power BI workspace where you have admin rights. Once the job is running for the first time and new events are streaming in, Stream Analytics will create a streaming dataset in your workspace.
  • Go to your SA job in the Azure portal and click on Outputs in the left menu under Job topology
  • In the upper left corner click on + Add and choose Power BI
  • A new pane appears on the right side. First enter a descriptive name for your Power BI output
  • Next select the Power BI workspace where your streaming dataset will appear
  • For Authentication mode choose Managed Identity-System assigned. This specific job will then be added as a Contributor
  • Finally enter a Dataset name and a Table name. Note that you can only have one table in a streaming dataset. So don't create a second output to the same Power BI streaming dataset with a different table (we tried).
Create new Output

After saving you will see a couple of notifications to create and test your new output. This takes about a minute to complete.
Notifications for new Output

In Power BI you will see your new Stream Analytics job as a Contributor in your workspace. However, the streaming dataset will only appear once the job is running and pushing data.
Stream Analytics Job as Contributor

4) Create query
With the new Input and Output we will now create a very basic query to push the test data from Azure Event Hub via Azure Stream Analytics to Power BI.
  • Go to your SA job in the Azure portal and click on Query in the left menu under Job topology
  • A basic but working query will already be created for you if you have an input and output
  • Once opened, wait a few moments for the Azure Event Hub input to refresh. You will see a spinning circle icon behind your input. If there is data in your event hub it will appear after a few moments
Default query and waiting for input data

Default query with data from input

  • Now you can adjust your query by only selecting the columns you need in Power BI (less is more). The windowing function is not required if the number of messages doesn't exceed the limits of Power BI
  • After editing the query hit the Test query button and check the Test results.
Test result of basic query without time window

Test result of basic query with TumblingWindow

  • Once you are satisfied with the query result hit the Save Query button above the editor
Save query once you're ready

The query for testing (without windowing).
SELECT
    CallId
,   DurationInSeconds
,   EventEnqueuedUtcTime as CallTimeStampStr
INTO
    [pbbitools]
FROM
    [ehbitools]
If you send more requests to Power BI than it can handle, Stream Analytics will try batching multiple messages into one request. For small messages that occasionally exceed the maximum number of requests this could be a 'workaround'. However, batching multiple messages could also cause you to exceed another limit: the maximum message size. A better solution is to use a windowing function to slow down the stream, for example like the sketch below.
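To give an idea, the same query with a 10-second TumblingWindow could look something like the sketch below. The aggregations (a call count and a total duration per window) are just an assumption of what the report might need.
-- Aggregate the stream so that Power BI receives at most one row per 10 seconds
SELECT
    COUNT(*) as NumberOfCalls
,   SUM(DurationInSeconds) as TotalDurationInSeconds
,   System.Timestamp() as WindowEndUtc
INTO
    [pbbitools]
FROM
    [ehbitools]
GROUP BY
    TumblingWindow(second, 10)
This way each window produces at most one row, no matter how many calls arrive within those 10 seconds.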

5) Start SA job
Now go back to the overview page of your Stream Analytics job and hit the Start button to start your SA job. The first time you can choose between Now or Custom as a start time to receive new messages. The second time you can also choose When last stopped. For this example we choose Now. It will take a few moments to change the status from Created to Starting to Running.
SA Job is running

Once it is started AND new data is sent to the Event Hub, the streaming dataset will appear in your Power BI workspace. Notice that the red icon is different compared to a regular dataset.
A new dataset appeared for the streaming data

Conclusion
In this post you learned how to create a (very basic) Stream Analytics job. In a follow-up post we will explain the windowing functions in more detail. For now you can read our old post about windowing functions, but compared to 6.5 years ago there are now two new windowing functions. However, you will probably still end up using the old TumblingWindow. The next post in this series will show the live data in Power BI, with visuals that change automatically when new data comes in.




Sunday, 5 February 2023

Streaming data Azure & Power BI - Introduction

Case
I want to send streaming data to Power BI for reporting purposes. What should I take into account when choosing the right architecture?
Streaming Data to Power BI

Solution
If you, for example, have a helpdesk for your customers where they can call or chat for support, then you probably also want some real-time reports to see the current state of the calls and chats. Most regular data warehouses are only refreshed once a night, and by then it's already too late to react to incidents.

For real-time reports in Power BI you have two main options. The first option is to send the events directly to a Power BI streaming dataset (Push or Streaming) and then build a report and pin report visuals to a dashboard. This is an appropriate solution for a lot of real-time reports, but there are some limitations. For example, there is a maximum number of events per second. Once you exceed that limit you start losing data, probably just when you need accurate reports the most: when it is very busy in your helpdesk. Another limitation of streaming datasets in Power BI is the history: they keep only one hour of data.

The second option is to push the data into Azure Event Hubs and then use Azure Stream Analytics to push it to Power BI. This solves the maximum number of events per second, because Stream Analytics can aggregate or filter the data before sending it to Power BI. Stream Analytics can also send the data to, for example, a data lake to solve your history problem.
Streaming Data to Power BI

In this streaming data series we will explain this second option, focusing on the hot path and on the capture in the data lake, which is part of the cold path (see the sketch below). Just like for a 'regular' data warehouse architecture there are a lot of different solutions, but this one is probably the most common and simple solution that will fit the majority of cases. One particular new streaming data feature in Azure that is worth mentioning is writing to a Delta Lake table. At the moment of writing this is still in public preview and only available in a limited number of Azure regions, but it will fit the Lake House architecture very well.
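As a rough sketch of that idea (with made-up input, output and column names like [EventHubInput], [PowerBIOutput], [DataLakeOutput] and Team), a single Stream Analytics job could serve both paths with two queries:
-- Hot path: aggregate the events before pushing them to Power BI
SELECT      Team
,           COUNT(*) as NumberOfCalls
INTO        [PowerBIOutput]
FROM        [EventHubInput]
GROUP BY    Team, TumblingWindow(second, 10)

-- Cold path: capture every raw event in the data lake for history
SELECT      *
INTO        [DataLakeOutput]
FROM        [EventHubInput]
The hot path keeps the number of requests to Power BI under control, while the cold path keeps the full history for the 'regular' data warehouse or Lake House.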

Posts in this series:






Saturday, 25 February 2017

Azure - Understanding Stream Analytics Data Lake Store Destination

Case
How do you use Azure Data Lake Store to store Stream Analytics data and why would you do that?

Solution
In an earlier post we set up a Stream Analytics Job where sensor data is sent to a Blob file. In this case we create a new Stream Analytics Job with Data Lake Store as output, where we use the sensor data as input. Despite the fact that the Input and Output are different, the query stays the same as in other Stream Analytics Jobs. You can find the configuration of the Stream Analytics Job query here.

A Data Lake Store is a scalable repository optimized for storing IoT data, log files and other large datasets for Big Data scenarios. It contains folders and these folders contain the files (data). As said before, you can also store (sensor) data using Blob storage. These two storage options are similar, but there are some important differences, which we will explain later in this post.

Overview Data Lake Store with different types of data

1) Create a Data Lake Store
First we have to create the Data Lake Store. Click on New (+ icon) in your portal and under the Storage category you will find the Data Lake Store. Give it a suitable name. Next we choose the Resource Group which we created earlier when setting up the IoT Hub. The benefit of choosing the same group is that, among other things, the rights are the same. At this time there are not many options available for Location and Pricing, so we keep the defaults.

Azure Portal - Create Data Lake Store

2) Create the Stream Analytics Job
Before creating this new job, we already added a new consumer group to our IoT Hub, called 'DataLake'. Using multiple consumer groups makes it possible for several consumer applications to read data from this IoT Hub independently. Click here to see where you can add/manage consumer group(s).

Note:
The Azure Portal is still in development, so adding a new consumer group is now at a different place than in an earlier post. See screenshot below.

Azure Portal - Adding Consumer Groups

Now we can create the job. We choose the same Resource Group as the IoT Hub and Storage account, just like the creation of the Data Lake. Our Location is the Netherlands, so we choose West-Europe.

Azure Portal - Create Stream Analytics Job

3) Configure the Stream Analytics Job
We start with defining the Input. We choose 'Data stream' as Source Type because the sensor data is an ongoing stream and is derived from the IoT Hub. Under Source we choose 'IoT hub'. In our case we have one IoT Hub, but when you have multiple IoT Hubs you can choose one from the list. Now our IoT Hub appears automatically. Next we choose 'datalake' as Consumer group. This is the one we have created earlier. Finally we choose 'JSON' as Event serialization format.

Now we can specify the Output. Give it a suitable name and choose 'Data Lake Store' as Sink. The corresponding Account Name (in this case the one we created earlier) will be selected automatically. Next we enter a file path to store the files in our Data Lake Store account. Optionally, you can include the date and time in this path. The data will then be stored in separate instances per day and per hour, which keeps the storage organized. At last you choose the 'JSON' format.

In this post we are only configuring the Input and Output of the job, as mentioned earlier. After that we can run the job with a valid query.

Azure Portal - Configure Stream Analytics Job

Result
After running the Stream Analytics Job, the data is now stored in the Data Lake Store. You can find this in the Azure portal. Open the Data Lake Store and go to Data Explorer. Now you see one folder named 'sensordata'. This folder contains multiple subfolders: year, month, day and hour. Now we have only one file with data in the subfolder 'hour', but each next hour there will be a new file (as long as your Stream Analytics Job is running). This is exactly what we have configured earlier. In each file, the data is stored per 10 seconds. It works!

Azure Portal - Result Data Lake Store

Differences between Blob Storage and Data Lake Store
The first big difference is the size limits. There are no limitations on account/file size or number of files for a Data Lake Store, while Blob storage has such restrictions. In addition, Data Lake Store has built-in Hadoop integration. This (along with the unlimited storage) makes it suitable for storing Big Data and then analyzing that data. Another difference is the authentication: Blob storage works with generated storage access keys, while a Data Lake Store uses Azure Active Directory for this.

Overall a Data Lake Store has more possibilities than Blob storage and is optimized for Big Data purposes. The general purpose of Blob Storage is storing data in different scenarios like backups or media files (for streaming). The starting prices are lower for Blob storage, but you have different storage prices for your Blob Storage account. This means you can make it as expensive as you want, and in some cases the monthly charges per GB will be higher than for a Data Lake Store. Click here for more details about the differences and prices between a Data Lake Store and Blob storage.

Conclusion
Azure Data Lake Store is very useful for Big Data scenarios because it combines storage (up to more than 1 petabyte) with the ability to analyze this data. This can be done with Hadoop (built-in integration) or Azure Data Lake Analytics, which is specially optimized to work with Azure Data Lake Store. From this perspective it offers more than, for example, Blob storage.

Wednesday, 23 November 2016

Azure - Understanding Stream Analytics Blob Destination

Case
How do you use Azure Blob Storage to store stream analytics data and why would you do that?

Solution
In earlier posts we have set up an IoT environment with an IoT Hub and a couple of Stream Analytics Jobs where sensor data is sent to different destinations: Power BI for a real-time dashboard and Azure SQL Database to store the data. In this case we create a new Stream Analytics Job with Blob Storage as output and we use the sensor data as input. The Input and Output are different, but the query is the same as in other Stream Analytics Jobs. You can find the configuration of the Stream Analytics Job query here.

A reason to use Blob Storage is the diversity of text or binary files you can store in the cloud. The unstructured data can include documents, social data (photos, videos, music and blogs), Big Data (logs, IoT and large datasets) or images and text for web applications. Click here for more pricing details about Blob Storage. You can also store (sensor) data in an Azure Data Lake. Click here for more information about this, along with the major differences between Blob Storage and Data Lake.

For the use of Azure Blob storage you need a Storage account. Then you can add containers to this account, which include the Blob files. In the case of an image, you also find a metadata file. In our case the Blob files are JSON files and contain sensor data.

Overview Azure Blob Storage with images

1) Create a Storage account
To make use of Blob Storage you have to create a Storage account first. After we give it a suitable name, we choose Blob storage and 'RA-GRS' as Replication. This is the default and it contains the most options. Click here for more information about this. Next we choose 'Hot' as Access tier, because we want to access the sensor data frequently.


Azure Portal - Create a Storage account

Note:
You can also create a Storage account when setting up the Output of the Stream Analytics Job, but you have fewer options there, so it is not recommended.

2) Create the Stream Analytics Job
Before creating the new job, we had already added a new consumer group to our IoT Hub. We called it 'blob'. Using multiple consumer groups makes it possible for several consumer applications to read data from this IoT Hub independently. Click here to see where you can add/manage consumer group(s).

Now we can create the job. We choose the same Resource group as the IoT Hub and Storage account, because these are in the same life cycle. Our Location is the Netherlands, so we choose West-Europe.

Azure Portal - Create Stream Analytics Job

3) Configure the Stream Analytics Job
First we must add a new Input to the job. The default Source Type is 'Data stream'. We choose this because the sensor data is an ongoing stream and is derived from the IoT Hub. The Source is 'IoT hub' and then the IoT Hub that you have created automatically appears. If you have more than one IoT Hub, you can choose one from the drop-down list. After this you must choose the right Consumer group. This is the new group (blob) we have created earlier. Finally you choose 'JSON' as Event serialization format.

Next we add a new Output. First choose 'Blob storage' as Sink and 'Use blob storage from current subscription' as Subscription, because you have configured the storage account earlier. Otherwise you can edit the storage account settings here by choosing 'Provide blob storage settings manually'. Then you create a new container. Optionally, you can define one or more instances (subfolders) within the container. With this option you can change the date and time format so you can have multiple instances with different date and/or time folders. This makes it clearer (just like your own local File Explorer) and you have the choice to select data from a specific day/time. We made an instance called 'sensor'. At last you choose the 'JSON' format.

As we said, we only discuss the configuration of the Input and Output of the job in this post. After configuring the job we will run it with a valid query.

Azure Portal - Configure the Stream Analytics Job for Blob

Result
Now that the data is stored, we want to see what's in our Blob. Therefore you have to go to your Azure Storage account in the portal. Every object, in our case a Blob, that you store in Azure Storage has a unique URL address. For the Blob service with the storage account name (bitoolsblobstorage) you have created and the container name (sensordata) with the instance (sensor), the URL/endpoint is:
http://bitoolsblobstorage.blob.core.windows.net/sensordata/sensor
More information about the Azure Storage endpoints here.

In the portal go to the Storage account you have made earlier. Click on the container URL and then you see the instance (subfolder). Now you can drill down further on the specific month, day and hour of the incoming sensor data.

Azure Portal - Your Blob file

Conclusion
It looks a lot like the other Stream Analytics posts, but in this case the data is stored in a Blob file. From this point you have several options to do something with this data. First you can do nothing of course, and in that case you use Blob purely for storage (backup). You will find the other options in the Cortana Intelligence Suite. For example, process the data with Azure Data Factory (this can also be done with traditional SSIS, which is of course not a part of the CIS), analyze the data with Machine Learning or visualize the data in Power BI.










Tuesday, 4 October 2016

Azure - Understanding Stream Analytics Windowing Functions

Case
I want to aggregate data in a stream. How does that work in Stream Analytics?

Example data: count people in front of booth

Solution
Because it's a stream, you aggregate data in a certain time window instead of over the whole dataset. The Stream Analytics Query Language is very similar to T-SQL and it has some extensions like the windowing functions to aggregate data in time windows. At the moment there are three windowing extensions (Hopping, Sliding and Tumbling), but it is not inconceivable that more windowing functions will be added in the near future.

General rules:
  • The length of each window is fixed
  • Windowing only works in combination with the GROUP BY clause
  • The time units can be day, hour, minute, second, millisecond or microsecond, but the maximum size of the window in all cases is 7 days.


1) Tumbling Window
The Tumbling Window is the easiest to explain. It aggregates data within a X second/minute/etc. time window and does that every X seconds/minutes/etc.

For example: Tell me the average number of visitors per booth over the last 10 seconds every 10 seconds:
SELECT    Booth, avg(HeadCount) as AvgHeadCount
FROM      HeadCountStream TIMESTAMP BY MeasurementTime
GROUP BY  Booth, TumblingWindow(second, 10)

Tumbling Window

2) Hopping Window
The Hopping Window is very similar to the Tumbling Window, but here the windows have an overlap. It aggregates data within an X second/minute/etc. time window and does that every Y seconds/minutes/etc.
For example: Tell me the average number of visitors per booth over the last 10 seconds every 5 seconds:
SELECT    Booth, avg(HeadCount) as AvgHeadCount
FROM      HeadCountStream TIMESTAMP BY MeasurementTime
GROUP BY  Booth, HoppingWindow(second, 10, 5)

Hopping Window

3) Sliding Window
The Sliding Window is the most difficult to explain. It aggregates the values in the time window every time a new event/measurement occurs or an existing event/measurement falls out of the time window.
So when using the Sliding Window you are interested in aggregating values whenever an event occurs. This is in contrast to the Hopping and Tumbling windows, which have a fixed interval.

For example: Tell me the average number of visitors per booth in the last 10 seconds:
SELECT    Booth, avg(HeadCount) as AvgHeadCount
FROM      HeadCountStream TIMESTAMP BY MeasurementTime
GROUP BY  Booth, SlidingWindow(second, 10)

Sliding Window

  • The first aggregation occurs when the first measurement value (1) is streamed.
  • The second aggregation occurs when a new measurement value (3) is streamed.
  • The third, fourth, fifth, etc. is equal to the second aggregation because each time a new measurement value (1) is streamed.
  • The last aggregation occurs when no more new measurement values are streamed and the second last measurement value (2) falls out of the time window.

Sliding Window

Compared to the Hopping Window (with the same data) you only get an extra result row at the start (1) and one at the end (1) because in this example the events happen in a fixed interval. It gets more interesting when the events are coming in more randomly like tweets about a certain subject.

Timestamp by
You probably noticed the TIMESTAMP BY measurementTime clause after the FROM. This tag lets you set the exact timestamp that an event occurred, rather than the arrival time in the IoT Hub. This timestamp is used by the windowing functions.

Testing query with Windowing functions
In the old portal you can test the query and study the result (at the moment of writing, testing is not yet supported in the new portal). For this you need a JSON file with some messages in it. These messages should look identical to the messages you send via the IoT Hub. For this example I created a text file with the following text in it:
{"headCount":1,"measurementTime":"2016-09-17T18:25:43.511Z","sensorName":"A"}
{"headCount":3,"measurementTime":"2016-09-17T18:25:47.511Z","sensorName":"A"}
{"headCount":2,"measurementTime":"2016-09-17T18:25:53.511Z","sensorName":"A"}
{"headCount":3,"measurementTime":"2016-09-17T18:25:57.511Z","sensorName":"A"}
{"headCount":4,"measurementTime":"2016-09-17T18:26:03.511Z","sensorName":"A"}
{"headCount":2,"measurementTime":"2016-09-17T18:26:07.511Z","sensorName":"A"}
{"headCount":2,"measurementTime":"2016-09-17T18:26:13.511Z","sensorName":"A"}
{"headCount":1,"measurementTime":"2016-09-17T18:26:17.511Z","sensorName":"A"}

When you hit the test button in the query editor you need to upload a json file for testing. Then it will use that data to test your Stream Analytics query and show the result. In the pictures below you will see the test data in the upper right corner and the query result at the bottom. The red numbers show how the average was calculated.
Sliding Window

When you leave out one measurement, two rows will change in the result.
Sliding Window, leaving out 1 measurement

When you leave out two successive measurements, two rows will change in the result and one row will disappear.
Sliding Window, leaving out 2 successive measurements

Conclusion
The Tumbling and Hopping windows are easy to understand. The Sliding window is a little harder to understand, but writing the query is very easy. Using the windowing functions is something you would probably want to do in the hot path to stream to Power BI. This way you don't get too much data in the stream.

Thursday, 29 September 2016

IoT Adventure: 5b - Stream Analytics for Azure SQL Database

Case
Your sensors are connected to an IoT Hub and are generating data. In our previous post we sent the real-time data to a Power BI dashboard. What are the other possibilities in Azure with this data?

Solution
In our previous post we distinguished two streams for our data: the Cold path and the Hot path. In this case we store the data in an Azure SQL Database, which is a form of the Cold path. See here for the full list of Azure SQL Databases (sizes and prices) you can choose from. The reason to store the data may be, for example, to analyze the data or to prepare a dataset as input for your Machine Learning experiment/model. Just like for the Hot path, we are setting up a (separate) Stream Analytics Job for this. You can have multiple Outputs in one Job, for example real-time Power BI and SQL Database, but when you want to edit the query for the data going to the database, you must stop the Job and then your real-time data is not sent either. Before we create the job, we first set up the Azure SQL Server and after that the Azure SQL Database. The reason for this is that we want to select the database in the Output of the Stream Analytics Job and therefore we need to create it first (along with the server).

Cold path with Stream Analytics

1) Create the Azure SQL Server
Go to the Azure portal and click on 'More services' at the bottom of the menu (left-hand side of the screen) and search for SQL server. When you have opened it, click on 'Add' to create a new SQL Server. Perhaps you noticed that where our previous post said 'Browse', it now says 'More services'. The portal is still in development, so there are regular updates.

Azure Portal - Create SQL Server

Now you can fill in your Server name (the name cannot contain spaces). Next we create a SQL Server login and this is our Server admin. You can also use Azure Active Directory (user or group) for this. Click here for more information. The Subscription is filled automatically. After this we choose a Resource group. For convenience we choose the Resource group we created earlier when setting up the IoT Hub, but you can also create a general Resource group so the server can be used for purposes other than IoT. Otherwise the server has the same lifecycle, permissions and policies as the IoT Resource group. Our Location is the Netherlands, so we choose West-Europe.

Azure Portal - Create SQL Server (continuation)

Tip:
When the deployment of the server has succeeded, the server should appear in the list of SQL Servers. If not, click the 'Refresh' button at the top of the SQL Servers list.

2) Create the Azure SQL Database
Next we create the database. In your Azure portal click on 'More services' and search for SQL Database. When you have opened it, click on 'Add' to create a new database.

Azure Portal - Create SQL database

Choose a name for your database. The Subscription is filled automatically and next we choose the same Resource group as earlier when setting up the SQL Server. Select 'Blank database' (new database) and choose the SQL Server that you created earlier. If you don't choose a server, Azure automatically creates one. That is the reason why we set up the server first: maybe you want to create one server (with a general name) and attach multiple databases to it. Otherwise you have a separate server for every database. That can also be a conscious choice of course, but that is not what we want in this case. At last we choose the 'Basic' database, but here you can choose the size that fits your needs. The Collation is left at the default.

Azure Portal - Create SQL database (continuation)

Tip:
If you decide to add a new database in Management Studio (once you have connected), be aware of the fact that Azure by default creates the S3 (Standard) version of a database. Therefore, you should always create a new database in the Azure portal, so that you can choose the right size and price.

3) Connect to SQL Server and create a table
Once the database is created, you can connect to the SQL Server in Management Studio. In this case our Server name is 'bitools.database.windows.net,1433'. As you can see the name includes the default port of 1433 (this is the only port on which the service is available). Next you choose 'SQL Server Authentication' and fill in the login and password that you created earlier when setting up the SQL Server. The first time you must sign in with your Azure account. This is also the case when you have not connected to the server for a while. At last you must add your client IP to get access to the server. Your IP is then added to the firewall.

SQL Server Management Studio - Connect to Azure

Before we create the Stream Analytics Job, we must do one last thing and that is create a table where we can store the sensor data. I have created the following table in the database:

CREATE TABLE [dbo].[sensorData](
 [SensorName] [nvarchar](max) NULL,
 [MeasurementCount] [bigint] NULL,
 [MeasurementTime] [datetime] NULL,
 [Temperature] [float] NULL,
 [Humidity] [float] NULL,
 [Pressure] [float] NULL,
 [Altitude] [float] NULL,
 [Decibel] [float] NULL,
 [DoorOpen] [bigint] NULL,
 [Motion] [bigint] NULL,
 [Vibration] [bigint] NULL,
 [Illumination] [float] NULL
)

As you can see I chose float as the data type, because the standard numeric data type in Stream Analytics is float. This means that input data types such as decimal and numeric are converted to floats.

Tip:
You can manage the firewall in the Azure portal. Go in the portal to your server, click on it and under Settings you will find 'Firewall'. Here you can add client IPs to allow connections to the server. Note that this can only be done by a user who has the role of 'Owner'. In this case I created the server, so I'm automatically the owner of the server.


4) Create the Stream Analytics Job
In your Azure portal click on 'New'. Type in 'Stream Analytics Job' and click on it. Next you click on the result, in this case only Stream Analytics Job. After that, you can click on the 'Create' button.

Azure Portal - Create Stream Analytics Job (extensive)

Tip:
In our previous post we created a new Stream Analytics Job in a faster, different way. This way is more extensive and gives some general information about, in this case, a Stream Analytics Job. So if you want more information about a feature in Azure before you create it, this is a useful way.

Now you can fill in your Job name (the name cannot contain spaces) and the Subscription is filled automatically. Next choose a Resource group. These groups were made when setting up the IoT Hub. Click here for more information and how to create one. When you have created this, it appears in the list of 'use existing' and you can choose it. Our Location is the Netherlands, so we choose West-Europe. At last you can pin your Job right away to your dashboard, with the checkbox at the bottom. You may have noticed in the first screenshot that I have already pinned the previous Stream Analytics Job to my dashboard.

Azure Portal - Create Stream Analytics Job (continuation)

5) Define the Input
Once the Job is created, you must add a new Input. Because I have pinned the Job (screenshot 6 of 'Create the Stream Analytics Job'), you can select it from the dashboard. The default Source Type is 'Data stream', because the sensor data is an ongoing stream and is derived from the IoT Hub. Optionally, you can add 'Reference data' as type. This data is like static metadata next to your sensor data; it gives your sensor data more meaning. Here you can find more information about this kind of data. The Source is 'IoT hub' and then the IoT Hub that you have created automatically appears. If you have more than one IoT Hub, you can choose one from the drop-down list. The next thing is to choose the right Consumer group. These groups were made when setting up the IoT Hub. Click here for more information and how to create one. In this case we want to store the sensor data in an Azure database, so you choose 'azuredb'. Finally you choose 'JSON' as Event serialization format. Click here for more information and how you create such a JSON message.
  1. Click on the job
  2. Click on 'Inputs'
  3. Click on 'Add'
  4. Fill in a name, select 'Data stream' as Source Type  and select your IoT Hub as Source
  5. Select 'azuredb' as Consumer group and choose 'JSON' as Event serialization format
Azure Portal - Define Input

6) Define the Output
After the Input, you create the Output. When you have given the Output a suitable name, you choose 'SQL database' as Sink. Then select the Database that you have created, in our case 'IoT_Sensor'. After that the Server name, which you have created too, is filled in automatically. Next you must connect to the database with the SQL Server login you made earlier. Now you can choose a table, in our case 'sensorData'.
  1. Click on the job and click on 'Outputs'
  2. Click on 'Add'
  3. Fill in a name and select 'SQL database' as Sink
  4. Select your Azure Database that you have created
  5. Log in with the SQL login Username and Password
  6. Choose the created Table
Azure Portal - Define Output

7) Define the Query
Now that the Output is defined, you can build up the query. Compared to our previous post you can see that some updates have already been made. For example, on the left you see your Input and Output. The query always needs an Input and an Output, which is why we created the Output first. It is good to know that the language is SQL, but there are certain differences with a normal SQL query. In addition, the standard data types are floats.
Besides a FROM clause, there is an INTO clause. For the FROM you use your defined Input and for the INTO you use the Output. Additionally, there are various windowing functions available. These will be discussed in another blog post. You will find more details about the Stream Analytics Query Language here. For now we use a simple query without those functions.

The query:

SELECT   CAST(sensorName as nvarchar(max)) as SensorName
,        CAST(1 as bigint) as MeasurementCount
,        CAST(measurementTime as datetime) as MeasurementTime
,        Temperature
,        Humidity
,        Pressure
,        Altitude
,        Decibel
,        CAST(doorOpen as bigint) as DoorOpen
,        CAST(motion as bigint) as Motion
,        CAST(vibration as bigint) as Vibration
,        Illumination
INTO   [saj-bitools-DB-Output] 
FROM   [saj-bitools-DB-Input]

Unfortunately, the testing of the query is not supported in the new Azure Portal. They are working on it.

Azure Portal - Define Query

8) Start the Job
Finally you must start the Stream Analytics Job. You can choose between ad-hoc (Now) or a scheduled day and time (Custom).

Azure Portal - Start the Stream Analytics Job

Result
We have started the Stream Analytics Job and now we want to see the result. Go back to SQL Server Management Studio and connect to the Azure Database. When you look at the table, you should see the results. In our case we have sent 100 messages to our database. The messages are sent every 10 seconds.

SQL Server Management Studio - Table results

Conclusion
The steps are logical, but the sequence of the execution is very important. Also be careful about which database you buy, because there are big differences between the prices.