Thursday 23 February 2023

Streaming data Azure & Power BI - Streaming dataset

Case
I want to send streaming data to Power BI for realtime reporting purposes. This post is part of a series on Streaming Data in Azure and Power BI. It focusses on showing the live data in Power BI.
Power BI Streaming dataset























Solution
In the previous Stream Analytics post we pushed the data from the Event Hub to a new Streaming dataset in Power BI and in this post we will create a Power BI report and dashboard to see the streaming data actually moving on our screen.






Posts in this series:

1) Check dataset
First make sure your Stream Analytics job is running and that data is send to the Event Hub. Only then a new Power BI Dataset will be created. Note that the icon for a streaming Dataset differs from a normal dataset. You can edit the dataset to see which columns are available, but don't change the columns or settings because Stream Analytics is managing it.
New Streaming dataset in Power BI














2) Create report
Now create a new Power BI report on the newly created dataset, publish it and  then view it in Power BI online.
Creating and publish our beautifull report














The workspace with dataset and report











The live report













Now while viewing the report make sure data is still streaming into the Power BI dataset. You will probably notice that nothing is changing or moving in your report. Once you hit the refresh visual button you will see the new data. However continuously hitting the refresh button is not likely an option for you.

3) Pin report visual to Dashboard
Hover your mouse over your report visual and look for the little push pin button. Click on it to create a new dashboard. Repeat this for all the visuals you want to see in live mode.
Pin Report Visual to Dashboard













The workspace with the new dashboard










Now open the newly created dashboard and watch the data streaming live into your visuals. You can adjust the size of the visuals to see more details. We left the bottom right empty to show a PowerShell ISE window pushing data to the Event Hub and with the lowest and slowest settings (basic Event Hub, 1 streaming unit in Stream Analytics and the max of 1 message a second) it just takes a view seconds for the data to appear in Power BI.
Streaming live data into Power BI













Conclusions
In this post we showed you the end of the hot path by creating a very basic report and pinning its visuals to a dashboard to see the data streaming live into your dashboard. You can enrich the dashboard with visuals from your cold path reports to also compare the live data to for example daily everages. The storage part of the cold path will be explained in a next post from this series.

Tip: during testing you can empty the streaming dataset by temporary switching off Historic data analysis in the edit screem of the dataset. You wont see new data until you switch it on again.


Sunday 19 February 2023

Streaming data Azure & Power BI - Stream Analytics

Case
I want to send streaming data to Power BI for realtime reporting purposes. This post is part of a series on Streaming Data in Azure and Power BI and focusses on getting the streaming data from the Event Hub and sending it to a Power BI streaming dataset with Azure Stream Analylics.
Azure Stream Analytics
















Solution
In the previous post we sent messages to the Event Hub and in this post we will read those messages and send them to Power BI. For this we need a service that can handle streaming data. Azure Stream Analytics is the query tool for streaming data in Azure. It uses the regular SQL language (so easy to learn). However it has some extras like the windowing option for the GROUP BY because you can not aggregate the entire stream (it never ends), but you can aggregate within a certain time window.

Posts in this series:

First explain the three most important items of the Stream Analytics Jobs.

Streaming inputs and reference inputs
There are three types of input sources in the query for streaming data: Azure Event Hubs, Azure IoT Hub and Azure Data Lake. But there is also a reference input where your can use an Azure SQL Database table or a blob storage file to enrich or translate your streaming data. Streaming messages are often kept small to speed up everything by using IDs or Codes instead of long strings. With the reference data you can transform them for reporting purposes.

Streaming Outputs
There are several streaming outputs like the Storage Accounts, a SQL database or even a Cosmos DB, but the one we are using for this example is the Power BI output. The Power BI output has a limitation that it can be called roughly once every second and the messages can't be larger then 15KB. This means that you will need a windowing function to aggreate the data to make sure you don't overflood Power BI.

Query
The query language in Stream Analytics is a subset of T-SQL, so very similar and easy to learn if you already know how to query a Microsoft SQL server database. The most common query pattern is a SELECT INTO with a GROUP BY.
SELECT      SomeColumn, Count(*) as Count
INTO        OutputStream
FROM        InputStream TIMESTAMP BY CreatedAt
GROUP BY    SomeColumn, TumblingWindow(second, 30)

Let's create a new Azure Stream Analytics Job and configure a query to push data to Power BI.


1) Create Stream Analytics Job
Creating the new SA Job is very straightforward. For this example the region and the number of streaming units are the important parts.
  • Go to the Azure portal an create a new Stream Analytics job
  • Select the correct subscription and resource group
  • Come up with a good descriptive name for your job. Keep in mind that there is only one query window but you can run multiple queries within that.
  • Choose the correct region. The same region as your even hub and your Power BI tenant is the best for performance.
  • Hosting environment is Cloud (unless you have an on-premises Edge environment.
  • Last thing is the number of Streaming units. This is where you start paying. The default is 3 SUs, but for testing purposes or small jobs 1 SU is more then enough.
  • Under storage you can setup a secure storage account. For this example you can leave that empty.
  • Optionally add some Tags and then review and create the ne SA Job.
Create Streaming Analytics Job




































2) Create Input stream
Now that we have a SA job we first need to create an input to connect to our Event Hub.
  • Go to your SA job in the Azure portal and click on Inputs in the left menu under Job topology
  • In the upper left corner click on + Add stream input and choose Event Hub
  • A new pane appears on the right site. First enter a descriptive name for your Event Hub.
  • Now select the correct Subscription and Event Hub Namespace.
  • Select the existing Event hub name that we created in the previous blog post.
  • Select the existing Event Hub consumer group which we left default in the previous blog post
  • For the Authenication mode the easiest way is to select Create system assigned managed identity. This means this specific SA job will get access to the selected Event Hub.
  • Partition key is for optimizing performance if your input in indeed partitionized. You can leave it empty for this example
  • We used a JSON structure for our test messages. Therefore select JSON as Event serialization format.
  • Select UTF-8 as encoding (the only option at this moment)
  • Leave the Event compression type to None for this example

Create new Input



















After saving you will see a couple of notifications to create and test your new input. This takes about a minutes to complete.
Notifications for new Input












3) Create Output stream
After the input it's now time for adding the output to Power BI. For this you need a Power BI workspace where you have admin rights. Once the job is running for the first time and new events are streaming then Stream Analytics will create a Streaming Dataset in your workspace.
  • Go to your SA job in the Azure portal and click on Outputs in the left menu under Job topology
  • In the upper left corner click on + Add and choose Power BI
  • A new pane appears on the right site. First enter a descriptive name for your Power BI output
  • Next select the Power BI workspace where your streaming dataset will appear
  • For Authenication mode choose Managed Identity-System assigned. This specific job will then be added as a Contributor
  • At last enter a Dataset name and a Table name. Note that you can only have one table in a streaming dataset. So don't create a second output to the same Power BI streaming dataset with a different table (we tried).
Create new Output



















After saving you will see a couple of notifications to create and test your new output. This takes about a minute to complete.
Notifictions for new Output

`








In Power BI your will see your new Stream Analytics job as a Contributor in your workspace. However the streaming dataset will only appear once the job is running and pushing data.
Stream Analytics Job as Contributor












4) Create query
With the new Input and Output we will now create a very basic query to push the test data from Azure Event Hub via Azure Stream Analytics to Power BI.
  • Go to your SA job in the Azure portal and click on Query in the left menu under Job topology
  • A basic but working query will alread be created for you if you have an input and output
  • Once opened wait a few moments for Azure Event Hub to refresh. You will see a turning circle icon behind your input. If there is data in your event hub then it will appear after a few moments
Default query and waiting for input data









Default query with data from input

















  • Now you can adjust your query by only selecting the columns you need in Power BI (less is more). The windowing function is not required if the number of messages don't exceed the limits of Power BI
  • After editing the query hit the Test query button and check the Test results.
Test result of basic query without time window
















Test result of basic query with TumblingWindow












  • Once you are satisfied with the query result hit the Save Query button above the editor
Save query once you're ready











The query for testing (without windowing).
SELECT
    CallId
,   DurationInSeconds
,   EventEnqueuedUtcTime as CallTimeStampStr
INTO
    [pbbitools]
FROM
    [ehbitools]
If you send more request to Power BI than it can handle, then Stream Analytics will try batching multiple messages into one request. For small messages that occasionally exceed the max number of messages this could be a 'workaround'. However batching multiple messages could also cause to exceed an other limit: the max message size. A better solution is to use a windowing function to slow down the stream.

5) Start SA job
Now go back to the overview page of your Stream Analytics job and hit the Start button to start your SA job. The first time you can choose between Now or Custom as a start time to recieve new messages. The second time you can also choose for When last stopped. For this example we choose Now. It will take a view moments to change the status from Created to Starting to Running. 
SA Job is running
















Once it is started AND new data is send to the Event Hub then the streaming dataset will appear in your Power BI workspace. Notice that the red icon is different compared to a regular dataset.
A new dataset appeared for the streaming data









Conclusion
In this post you learned how to create a (very basic) Stream Analytics job. In a follow up post we will explain the Window Functions in more details. For now you can read our old post about Window Functions, but compared to 6,5 years ago we now have 2 new Windowing functions. However you probably end up using the old TumblingWindow. The next post in this series will be showing the live data in Power BI with automatically changing visuals when new data is collected.




Monday 13 February 2023

Show dataset parameters in Azure Synapse Dataflow

Case
I have a parameterized dataset. The parameter is showing in the pipeline's Copy Data Activity, but it's not showing in the Synappse (or ADF) dataflow under source or sink. When going the the Data preview it shows me an error message: No value provided for Parameter 'MyFileName'.
No value provided for Parameter 'abc'










Solution
You added a parameter to your dataset so you can for example use it in a foreach construction in your pipeline.
Dataset with parameter














In your pipeline's Copy Data Activity you can see the dataset parameter as soon as you select your dataset with the parameter.
Parameter showing in Copy Data Activity














However when you select that same dataset as a source (or sink) you won't see the parameter appearing in the editor. On this screen there is no way to provide the parameter and when you debug the source you will get an error stating that you need to provide a value for that parameter: No value provided for Parameter 'MyFileName'.
Dataset selected, but no parameter




















The first option is to provide a default value for your dataset parameter in the dataset itself. Now you won't get that error. This is perhaps in some cases suitable, but for most cases it isn't.
Default value for parameter in dataset













Now go to your pipeline and add a Dataflow Activity for your Dataflow. You will see the Parameter appear in the settings tab (not in the Parameters tab which is for Dataflow parameters only). This is handy for when you when want to debug the Pipeline and Dataflow at the same time, but not for when you just want to debug your Dataflow to see the data preview.

Parameter is back again

















Now go to your Dataflow and click on Debug Settings. Whitin the Debug Settings go to the Parameters tab. Find your source under Dataset parameters and provide a value for debugging.
Provide Value for parameter in Dataflow Debug Settings








Go to your source (or sink) in the Dataflow and then go to Data preview to see the actual data.
There is data in our preview














Conclusion
In this post you learned how to debug your Dataflow when using a parameterized Dataset. Compared to other parts of Synapse (or ADF) it would probably make more sense to first create a Dataflow Parameter (with a default value for debugging). Then show the Dataset Parameter it in the Source (or Sink) settings page where you override it with the Dataflow Parameter.

Monday 6 February 2023

Streaming data Azure & Power BI - Event Hubs

Case
I want to send streaming data to Power BI for reporting purposes. This post is the first part of a series on Streaming Data in Azure and Power BI. Here we will focus on the 'mailbox' service called Azure Event Hubs.
Azure Event Hubs













Solution
Azure Event Hubs is like a giant mailbox that can easily recieve loads of messages. One throughput unit can handle 1000 messages a second. However it can only recieve messages and not send messages it self. Other services like Azure Functions or Azure Stream Analytics can read the messages from this giant mailbox to process them. When the retention period expires the messages will automatically be deleted. 

Note: If want to recieve messages from IoT devices instead of events/messages from applications and services then you should have a look at Azure IoT hub. A similar mailbox service but specialized in IoT messages.

Posts in this series:

1) Create an Azure Event Hubs Namespace
Lets create an Azure Event Hubs. Note that when you create an Azure Event Hubs via the Azure portal it will only create an Event Hubs Namespace. In the next step we need to add a Event Hub to that namespace.

  • Besides the subscription and the resource group you first need to come up with a name for your namespace. This name is part of the URL and therefor only letters, numbers and hyphens (-) are allowed.
  • Choose your Location wisely: pick a region close to rest of your platform (Power BI, the services/application sending messages, data lake, data warehouse or lake house). This helps to keep down the latency and costs for your real time platform.
  • The Pricing tier depends on your needs. For testing purposes Basic is more than sufficient, unless you want to test Capture for which you need at least Standard.
    For production purposes you also need to take a look at the number of Consumer groups, the retention period and the number of events you need to handle.
  • With the Throughput units you can scale up the capacity. One unit can recieve 1MB per second or a 1000 events per second. For this example we set it to 1.
  • For testing purposes you can keep the rest of the settings to de default value.
Create Azure Event Hubs Namespace


















New/empty Azure Event Hubs Namespace


















2) Add Event Hub to namespace
Now we have an empty Event Hubs Namespace. We need to add an actual event hub to recieve messages. In the left menu go to Event Hubs located under Entities. Then click in the + Event Hub button.
+ Event Hub

















Now think of a name for your Event Hub and set the retention time. In this case we set it to the max for Basic which is 24 hours. This means after 24 hours you will loose the messages. So make sure to process the messages within a day. If you didn't use the cheapest pricing tier then you also get an option to Capture the messages. This will be explained in a separate post, but it is basically a Stream Analytics Job saving all your messages in a data lake.
Adding an Event Hub to the Namespace

















3) Add Shared access policies
To send messages to the Event Hub we need a shared access signature policy (SAS). Now go to the newly created Event Hub (callcenter in this example) and then to Shared access policies in the left menu under Setttings.
Shared Access Policies




















Click on the +Add button to create a new one. In this case we want the callcenter application/service only to send messages. Therefore we also called it sendonly and checked only the Send checkbox.
Sendonly SAS policy







In the script we use the send test messages to the Event Hub we need the Primary or Secondary key of the newly created SAS Policy. You can click on it to retrieve them.
Retrieve primary key











4) Testing
To test the new Event Hub you need to start sending messages. The easiest way to test this is to write a little script in Python or PowerShell. We explained that in a separate blog post. When succesful you can see them arriving in your Event Hub.
Sending test messages












Conclusions
In this post we showed how easy it is to create an Azure Event Hub for a little test setup. The hardest part is to pick the right pricing tier. In most cases the Basic or Standard is probably enough. Within those pricing tiers you can scale up by adding extra Throughput units.

In the next post we will process the messages within the Event Hub via Azure Stream Analytics and sending them to Power BI.

Sunday 5 February 2023

Streaming data Azure & Power BI - Introduction

Case
I want to send streaming data to Power BI for reporting purposes. What should I take into account when choosing the right architecture?
Streaming Data to Power BI













Solution
If you for example have a helpdesk for your customers where they can call or chat for support then you probably also want some real time reports to see the current state of the calls and chats. Most regular Data Warehouses are often only refreshed once a night and then it's already too late to react to incidents.

For real time reports in Power BI have two main options. The first option is to send the events directly to a Power BI Streaming dataset (Push or Streaming) and then build a report and pin reports visuals to a dashboard. This is an appropriate solution for a lot of real time reports, but there are some limitations. For example there is a maximum number of events per second. Once you exceed that limit you start loosing data. Propably just when you need accurate reports the most: when it is very busy in your helpdesk. An other limitation for streaming datasets in Power BI is the history. It keeps only one hour of data.

The second option is to push the data into Azure Event Hubs and then use Azure Stream Analytics to push it to Power BI. This solves the max number of events per second because Stream Analytics can aggregate or filter the data before sending it to Power BI and Stream Analytics can also send it to for example a data lake to solve your history problem.
Streaming Data to Power BI












In this streaming data series we will explain this second option focussing on the hot path and the capture in the data lake which is part of the cold path. Just like for a 'regular' data warehouse architecture there are a lot of different solutions, but this one is probably the most common and simple solution that will fit the majority of cases. One particular new streaming data feature in Azure that is worth mentioning, is writing to a Delta Lake table. At the moment of writing this is still in public preview and only available in a limited number of Azure regions, but this will fit the Lake House architecture very well.

Posts in this series: