Sunday, 19 February 2023

Streaming data Azure & Power BI - Stream Analytics

Case
I want to send streaming data to Power BI for realtime reporting purposes. This post is part of a series on Streaming Data in Azure and Power BI and focusses on getting the streaming data from the Event Hub and sending it to a Power BI streaming dataset with Azure Stream Analylics.
Azure Stream Analytics
















Solution
In the previous post we sent messages to the Event Hub and in this post we will read those messages and send them to Power BI. For this we need a service that can handle streaming data. Azure Stream Analytics is the query tool for streaming data in Azure. It uses the regular SQL language (so easy to learn). However it has some extras like the windowing option for the GROUP BY because you can not aggregate the entire stream (it never ends), but you can aggregate within a certain time window.

Posts in this series:

First explain the three most important items of the Stream Analytics Jobs.

Streaming inputs and reference inputs
There are three types of input sources in the query for streaming data: Azure Event Hubs, Azure IoT Hub and Azure Data Lake. But there is also a reference input where your can use an Azure SQL Database table or a blob storage file to enrich or translate your streaming data. Streaming messages are often kept small to speed up everything by using IDs or Codes instead of long strings. With the reference data you can transform them for reporting purposes.

Streaming Outputs
There are several streaming outputs like the Storage Accounts, a SQL database or even a Cosmos DB, but the one we are using for this example is the Power BI output. The Power BI output has a limitation that it can be called roughly once every second and the messages can't be larger then 15KB. This means that you will need a windowing function to aggreate the data to make sure you don't overflood Power BI.

Query
The query language in Stream Analytics is a subset of T-SQL, so very similar and easy to learn if you already know how to query a Microsoft SQL server database. The most common query pattern is a SELECT INTO with a GROUP BY.
SELECT      SomeColumn, Count(*) as Count
INTO        OutputStream
FROM        InputStream TIMESTAMP BY CreatedAt
GROUP BY    SomeColumn, TumblingWindow(second, 30)

Let's create a new Azure Stream Analytics Job and configure a query to push data to Power BI.


1) Create Stream Analytics Job
Creating the new SA Job is very straightforward. For this example the region and the number of streaming units are the important parts.
  • Go to the Azure portal an create a new Stream Analytics job
  • Select the correct subscription and resource group
  • Come up with a good descriptive name for your job. Keep in mind that there is only one query window but you can run multiple queries within that.
  • Choose the correct region. The same region as your even hub and your Power BI tenant is the best for performance.
  • Hosting environment is Cloud (unless you have an on-premises Edge environment.
  • Last thing is the number of Streaming units. This is where you start paying. The default is 3 SUs, but for testing purposes or small jobs 1 SU is more then enough.
  • Under storage you can setup a secure storage account. For this example you can leave that empty.
  • Optionally add some Tags and then review and create the ne SA Job.
Create Streaming Analytics Job




































2) Create Input stream
Now that we have a SA job we first need to create an input to connect to our Event Hub.
  • Go to your SA job in the Azure portal and click on Inputs in the left menu under Job topology
  • In the upper left corner click on + Add stream input and choose Event Hub
  • A new pane appears on the right site. First enter a descriptive name for your Event Hub.
  • Now select the correct Subscription and Event Hub Namespace.
  • Select the existing Event hub name that we created in the previous blog post.
  • Select the existing Event Hub consumer group which we left default in the previous blog post
  • For the Authenication mode the easiest way is to select Create system assigned managed identity. This means this specific SA job will get access to the selected Event Hub.
  • Partition key is for optimizing performance if your input in indeed partitionized. You can leave it empty for this example
  • We used a JSON structure for our test messages. Therefore select JSON as Event serialization format.
  • Select UTF-8 as encoding (the only option at this moment)
  • Leave the Event compression type to None for this example

Create new Input



















After saving you will see a couple of notifications to create and test your new input. This takes about a minutes to complete.
Notifications for new Input












3) Create Output stream
After the input it's now time for adding the output to Power BI. For this you need a Power BI workspace where you have admin rights. Once the job is running for the first time and new events are streaming then Stream Analytics will create a Streaming Dataset in your workspace.
  • Go to your SA job in the Azure portal and click on Outputs in the left menu under Job topology
  • In the upper left corner click on + Add and choose Power BI
  • A new pane appears on the right site. First enter a descriptive name for your Power BI output
  • Next select the Power BI workspace where your streaming dataset will appear
  • For Authenication mode choose Managed Identity-System assigned. This specific job will then be added as a Contributor
  • At last enter a Dataset name and a Table name. Note that you can only have one table in a streaming dataset. So don't create a second output to the same Power BI streaming dataset with a different table (we tried).
Create new Output



















After saving you will see a couple of notifications to create and test your new output. This takes about a minute to complete.
Notifictions for new Output

`








In Power BI your will see your new Stream Analytics job as a Contributor in your workspace. However the streaming dataset will only appear once the job is running and pushing data.
Stream Analytics Job as Contributor












4) Create query
With the new Input and Output we will now create a very basic query to push the test data from Azure Event Hub via Azure Stream Analytics to Power BI.
  • Go to your SA job in the Azure portal and click on Query in the left menu under Job topology
  • A basic but working query will alread be created for you if you have an input and output
  • Once opened wait a few moments for Azure Event Hub to refresh. You will see a turning circle icon behind your input. If there is data in your event hub then it will appear after a few moments
Default query and waiting for input data









Default query with data from input

















  • Now you can adjust your query by only selecting the columns you need in Power BI (less is more). The windowing function is not required if the number of messages don't exceed the limits of Power BI
  • After editing the query hit the Test query button and check the Test results.
Test result of basic query without time window
















Test result of basic query with TumblingWindow












  • Once you are satisfied with the query result hit the Save Query button above the editor
Save query once you're ready











The query for testing (without windowing).
SELECT
    CallId
,   DurationInSeconds
,   EventEnqueuedUtcTime as CallTimeStampStr
INTO
    [pbbitools]
FROM
    [ehbitools]
If you send more request to Power BI than it can handle, then Stream Analytics will try batching multiple messages into one request. For small messages that occasionally exceed the max number of messages this could be a 'workaround'. However batching multiple messages could also cause to exceed an other limit: the max message size. A better solution is to use a windowing function to slow down the stream.

5) Start SA job
Now go back to the overview page of your Stream Analytics job and hit the Start button to start your SA job. The first time you can choose between Now or Custom as a start time to recieve new messages. The second time you can also choose for When last stopped. For this example we choose Now. It will take a view moments to change the status from Created to Starting to Running. 
SA Job is running
















Once it is started AND new data is send to the Event Hub then the streaming dataset will appear in your Power BI workspace. Notice that the red icon is different compared to a regular dataset.
A new dataset appeared for the streaming data









Conclusion
In this post you learned how to create a (very basic) Stream Analytics job. In a follow up post we will explain the Window Functions in more details. For now you can read our old post about Window Functions, but compared to 6,5 years ago we now have 2 new Windowing functions. However you probably end up using the old TumblingWindow. The next post in this series will be showing the live data in Power BI with automatically changing visuals when new data is collected.