Wednesday, 3 August 2016

IoT Adventure: 5a - Stream Analytics Job for Power BI

Your sensors are connected with an IoT Hub and is generating data. What are the possibilities with this data and more important, how does this works in Azure?

In Azure, we can use Stream Analytics to do this. We distinguish two streams for our data:
  • Cold path
  • Hot path
Possibilities with Stream Analytics

There is also the possibility for Machine Learning. You can use this for both Cold path and Hot path. We are skipping this for now. This will be discussed in another blog.

Cold path
The Cold path means that the data will be stored for further processing, before presenting it to the end users. Examples of a Cold path are loading the data into a Azure Data Lake or an Azure SQL Database. For the latter, you have to create first the Azure SQL Database. You also have to organize the authentication, make some custom tables where the data can be stored in and finally make the Stream Analytics Job. Setting up all this will be explained in another blog. 

Hot path
The other path is the Hot path and this means that the data is real-time and will be displayed into Power BI. First you have to create a Stream Analytics Job and then do the configuration.

1) Create the Stream Analytics Job
You can make the Job in both old portal and the new portal. We prefer to make as much as possible in the new portal, because this will be the standard in the future.

Go to 'Browse' on the bottom of the menu (left-hand side of the screen) and search for Stream Analytics jobs. When you have opened it, click on 'Add' to create a new Stream Analytics Job. 

Azure Portal - Create Stream Analytics Job

As you can see, on the left in the Azure menu I have some standard resources. When you click on a blank star (favorite) button, it will append on the left in your menu (see screenshot 2). You can also select 'All recourses' in the menu. This contains a list of all features and here you can add any feature, also a Stream Analytics Job.

Now you can fill in your Job name (the name cannot contain spaces) and the Subscription is filled automatically. Next choose a Resource group. These groups are made by setting up the IoT Hub. Click here for more information and how you create it. When you have created this, it appears in the list of 'use existing' and you can choose this one. Our Location is the Netherlands, so we choose West-Europe. At last you can pin your Job right away to your dashboard, with the checkbox at the bottom. You may have noticed at the first screenshot that I have already pin some some Jobs to my dashboard

Azure Portal - Create Stream Analytics Job (continuation)

Once the Job is created, it will appears in the list of Stream Analytics Jobs with the status 'Created'.

2) Define the Input
Once the Job is created, you must add a new Input. The default Source Type is 'Data stream', because  the sensor data is an ongoing stream and is derived from the IoT Hub. Optionally, you can add 'Reference data' as type. This data is like static metadata next to your sensor data, it gives your sensor data more meaning. Here you can find more information about this kind of data. The Source is 'IoT hub' and then the IoT Hub that you have created automatically appears. If you have more then one IoT Hub, you can choose one from the drop-down list. The next thing is to choose the right Consumer group. These groups are made by setting up the IoT Hub. Click here for more information and how you create it. In our case we want to present the sensor data in a Power BI dashboard, so you choose 'powerbi'. Finally you choose 'JSON' as Event serialization format. Click here for more information and how you create such a JSON message.
  1. Click on the job
  2. Click on 'Inputs'
  3. Click on 'Add'
  4. Fill in a name, select 'Data stream' as Source Type  and select your IoT Hub as Source
  5. Select 'powerbi' as Consumer group and choose 'JSON' as Event serialization format
Azure Portal - Define Input

3) Define the Output
After the Input, you create the Output. When you have given the Output a suitable name, you choose 'Power BI' in Sink. Then you should authorize with your Power BI account. In most cases, this will be the same as your Azure account. Please note that this only can be done with an organization account. When you are successfully logged in, you must choose a Group Workspace. The default is 'My Workspace'. The downside of 'My Workspace' is that when you want to share your Power BI dashboard with other persons and such persons enable to edit this dashboard, this doesn't work in a local workspace. You must choose a workspace where several people are part of, for example a SharePoint group. Everybody in this group can see and edit Power BI dashboard(s) in this workspace. In our example is this the 'IoT' group.
  1. Click on the job and click on 'Outputs'
  2. Click on 'Add'
  3. Fill in a name and select 'Power BI' as Sink
  4. Authorize with your Azure account
  5. Select your Group Workspace
  6. Fill in a Dataset name and Table name
Azure Portal - Define Output

4) Define the Query
Now the Output is defined, you can build up the query. Because the query needs always an Input and an Output, we created the Output first. It is good to know that the language is SQL, but there are certain differences with a normal SQL query. In addition, the standard data types are floats. 
Besides a FROM clause, there is an INTO clause. For the FROM you will use your defined Input and the Output is used for the INTO. Additionally, there are various new windowing functions available. We use one of them. You will find more details about the Stream Analytics Query Language here. For now we use the Tumbling Window function and the Timestamp function. 

Tumbling Window function
With the Tumbling Window function you can define your own time intervals, that will not overlap each other. Our data is send per second to our IoT Hub, but here we make an interval of one minute. Based on that we use several aggregate functions, that will calculate the data for one minute. See this post for more details.

Timestamp By function
The other function is the Timestamp. A query in Stream Analytics contains nearly always a datetime field, because the data is send at a specific time. This field is passed to 'TIMESTAMP BY' function. This ensures that the data is coming in at the time when it is created, instead of the time when the data has been sent. Perhaps the transfer of the data is delayed and causes the data in a different order by entry. 

The query:

SELECT   sensorName
,        Max(measurementTime) as measurementTime
,        CAST(1 as bigint) as measurementCount
,        Avg(decibel) as decibel
,        Sum(doorOpen) as doorOpen
,        Avg(humidity) as humidity
,        Avg(illumination) as illumination
,        Sum(motion) as motion
,        Avg(pressure) as pressure
,        Avg(temperature) as temperature
,        Sum(vibration) as vibration
INTO   [saj-bitools-Output] 
FROM   [saj-bitools-Input] TIMESTAMP by measurementTime 
GROUP BY  sensorName
,         TumblingWindow(minute, 1)

Unfortunately, the testing of the query is not supported in the new Azure Portal. They are working on it.

Azure Portal - Define Query

5) Start the Job
At last you must start the Stream Analytics Job. You can choose between ad-hoc (now) or a scheduled day and time (custom).

Azure Portal - Start the Stream Analytics Job

Power BI
We have started the Stream Analytics Job and now we want to present the data in Power BI. Log in here with your Power BI account. Select on the top left your Group Workspace. This is the same group that you have chosen earlier when defining the Output. When you have selected the right workspace, the dataset (defined in the Output) appears automatically. Select the dataset en now you can build a report. In 10-15 minutes we have created a simple report. It contains the amount of input events, averages of the measurements and a timeline with these measurements.

When you click the refresh button, the data will be refreshed. Next you can pin the live page to a dashboard. If the pin doesn't work the first time by adding at to a new dashboard, just pin the report again to the existing dashboard (you created previously). Now you have your first Power BI dashboard with live sensor data.
  1. Click on 'Power BI'
  2. Click on the menu
  3. Select your Group Workspace
  4. The dataset appears automatically
  5. Click on the dataset
  6. Make the report and click on 'Pin Live Page'
  7. Pin the report to a new dashboard en give it a suitable name
  8. Select the dashboard and you will see your report
Create the Power BI dashboard

Power BI example

In another blog we will show you how to send your sensor data to an Azure SQL Database. Because you have these different options for your output, we choose to create a separate job for our livestream sensor data to Power BI. You can create multiple Inputs, Query's and Outputs in one job. The disadvantage is that you want to edit a single query (for example the livestream query), you must stop the entire job. This means that other Outputs (like Azure database) are stopped too and do not receive any data anymore through this job. 

A lot of steps to do before you can make a Power BI dashboard. If you can build once, you can pretty quickly put together other dashboards.

No comments:

Post a comment

Related Posts Plugin for WordPress, Blogger...