Showing posts with label EVENTHUB. Show all posts

Sunday 19 February 2023

Streaming data Azure & Power BI - Stream Analytics

Case
I want to send streaming data to Power BI for real-time reporting purposes. This post is part of a series on Streaming Data in Azure and Power BI and focuses on getting the streaming data from the Event Hub and sending it to a Power BI streaming dataset with Azure Stream Analytics.
[Figure: Azure Stream Analytics]

Solution
In the previous post we sent messages to the Event Hub, and in this post we will read those messages and send them to Power BI. For this we need a service that can handle streaming data. Azure Stream Analytics is the query tool for streaming data in Azure. It uses a SQL-like query language, so it is easy to learn. However, it has some extras, like the windowing option for the GROUP BY: you cannot aggregate the entire stream (it never ends), but you can aggregate within a certain time window.

Posts in this series:

First, let's explain the three most important parts of a Stream Analytics job.

Streaming inputs and reference inputs
There are three types of input sources in the query for streaming data: Azure Event Hubs, Azure IoT Hub and Azure Data Lake. But there is also a reference input where you can use an Azure SQL Database table or a blob storage file to enrich or translate your streaming data. Streaming messages are often kept small to speed everything up, by using IDs or codes instead of long strings. With the reference data you can translate those back into readable values for reporting purposes.
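
To illustrate the idea of reference data, here is a minimal Python sketch (not Stream Analytics itself) that translates a short code in a streaming message into a readable label. The DepartmentId column, the label values and the enrich function are hypothetical and only serve as an illustration.

```python
# Hypothetical reference data: maps short codes to readable labels.
REFERENCE = {1: "Sales", 2: "Support", 3: "Billing"}


def enrich(event, reference, key="DepartmentId", target="DepartmentName"):
    """Return a copy of the event with the code translated via the reference data."""
    enriched = dict(event)
    # Unknown codes get a fallback label instead of raising an error.
    enriched[target] = reference.get(event.get(key), "Unknown")
    return enriched


# A small streaming message that only carries IDs and numbers.
event = {"CallId": 42, "DepartmentId": 2, "DurationInSeconds": 310}
print(enrich(event, REFERENCE))
```

In a real job this lookup would be a JOIN between the streaming input and the reference input in the query.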

Streaming Outputs
There are several streaming outputs, like Storage Accounts, a SQL database or even Cosmos DB, but the one we are using for this example is the Power BI output. The Power BI output has the limitation that it can be called roughly once every second and that messages can't be larger than 15 KB. This means that you will need a windowing function to aggregate the data to make sure you don't flood Power BI.
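
As a rough way to get a feeling for the size limit, the following Python sketch checks whether a set of rows, serialized as JSON, stays within 15 KB. It assumes that 15 KB means 15 * 1024 bytes and that the rows are sent as a JSON array; the fits_power_bi function and the sample rows are hypothetical.

```python
import json

# Assumption: the 15 KB limit from the text, interpreted as 15 * 1024 bytes.
MAX_MESSAGE_BYTES = 15 * 1024


def fits_power_bi(rows):
    """Return True if the JSON-serialized rows stay within the size limit."""
    payload = json.dumps(rows).encode("utf-8")
    return len(payload) <= MAX_MESSAGE_BYTES


# Ten rows are tiny, a thousand rows in one message are far too large.
small = [{"CallId": i, "DurationInSeconds": 100} for i in range(10)]
large = [{"CallId": i, "DurationInSeconds": 100} for i in range(1000)]
print(fits_power_bi(small), fits_power_bi(large))
```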

Query
The query language in Stream Analytics is a subset of T-SQL, so it is very similar and easy to learn if you already know how to query a Microsoft SQL Server database. The most common query pattern is a SELECT INTO with a GROUP BY.
SELECT      SomeColumn, Count(*) as Count
INTO        OutputStream
FROM        InputStream TIMESTAMP BY CreatedAt
GROUP BY    SomeColumn, TumblingWindow(second, 30)
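
To make the windowing idea more concrete, here is a minimal Python sketch (not Stream Analytics itself) that counts events per key in fixed, non-overlapping 30-second windows. Timestamps are plain seconds, the sample events are made up, and the handling of events that fall exactly on a window boundary is simplified compared to the real service.

```python
from collections import Counter


def tumbling_window_counts(events, window_seconds=30):
    """Count events per (window start, key) in fixed, non-overlapping windows."""
    counts = Counter()
    for timestamp, key in events:
        # Every event belongs to exactly one window.
        window_start = timestamp - (timestamp % window_seconds)
        counts[(window_start, key)] += 1
    return dict(counts)


# Hypothetical events: (seconds since start, value of SomeColumn)
events = [(5, "A"), (12, "B"), (29, "A"), (31, "A"), (45, "B"), (59, "B")]
for (window_start, key), count in sorted(tumbling_window_counts(events).items()):
    print(window_start, key, count)
```

Six incoming events are reduced to four output rows here, which is the effect you want when the output can only handle a limited number of calls.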

Let's create a new Azure Stream Analytics Job and configure a query to push data to Power BI.


1) Create Stream Analytics Job
Creating the new SA Job is very straightforward. For this example the region and the number of streaming units are the important parts.
  • Go to the Azure portal and create a new Stream Analytics job
  • Select the correct subscription and resource group
  • Come up with a good descriptive name for your job. Keep in mind that there is only one query window, but you can run multiple queries within it.
  • Choose the correct region. The same region as your event hub and your Power BI tenant is best for performance.
  • Hosting environment is Cloud (unless you have an on-premises Edge environment).
  • The last thing is the number of Streaming units. This is where you start paying. The default is 3 SUs, but for testing purposes or small jobs 1 SU is more than enough.
  • Under storage you can set up a secure storage account. For this example you can leave that empty.
  • Optionally add some Tags and then review and create the new SA job.
[Figure: Create Stream Analytics Job]

2) Create Input stream
Now that we have an SA job we first need to create an input to connect to our Event Hub.
  • Go to your SA job in the Azure portal and click on Inputs in the left menu under Job topology
  • In the upper left corner click on + Add stream input and choose Event Hub
  • A new pane appears on the right side. First enter a descriptive name for your Event Hub input.
  • Now select the correct Subscription and Event Hub Namespace.
  • Select the existing Event hub name that we created in the previous blog post.
  • Select the existing Event Hub consumer group, which we left at its default in the previous blog post
  • For the Authentication mode the easiest way is to select Create system assigned managed identity. This gives this specific SA job access to the selected Event Hub.
  • Partition key is for optimizing performance if your input is indeed partitioned. You can leave it empty for this example
  • We used a JSON structure for our test messages. Therefore select JSON as Event serialization format.
  • Select UTF-8 as encoding (the only option at this moment)
  • Leave the Event compression type set to None for this example

[Figure: Create new Input]

After saving you will see a couple of notifications to create and test your new input. This takes about a minute to complete.
[Figure: Notifications for new Input]

3) Create Output stream
After the input, it is time to add the output to Power BI. For this you need a Power BI workspace where you have admin rights. Once the job is running for the first time and new events are streaming in, Stream Analytics will create a streaming dataset in your workspace.
  • Go to your SA job in the Azure portal and click on Outputs in the left menu under Job topology
  • In the upper left corner click on + Add and choose Power BI
  • A new pane appears on the right side. First enter a descriptive name for your Power BI output
  • Next select the Power BI workspace where your streaming dataset will appear
  • For Authentication mode choose Managed Identity-System assigned. This specific job will then be added as a Contributor
  • Finally enter a Dataset name and a Table name. Note that you can only have one table in a streaming dataset, so don't create a second output to the same Power BI streaming dataset with a different table (we tried).
[Figure: Create new Output]

After saving you will see a couple of notifications to create and test your new output. This takes about a minute to complete.
[Figure: Notifications for new Output]

In Power BI you will see your new Stream Analytics job as a Contributor in your workspace. However, the streaming dataset will only appear once the job is running and pushing data.
[Figure: Stream Analytics Job as Contributor]

4) Create query
With the new Input and Output we will now create a very basic query to push the test data from Azure Event Hub via Azure Stream Analytics to Power BI.
  • Go to your SA job in the Azure portal and click on Query in the left menu under Job topology
  • A basic but working query will already be created for you if you have an input and output
  • Once opened, wait a few moments for Azure Event Hub to refresh. You will see a spinning circle icon behind your input. If there is data in your event hub, it will appear after a few moments
[Figure: Default query and waiting for input data]
[Figure: Default query with data from input]

  • Now you can adjust your query by only selecting the columns you need in Power BI (less is more). The windowing function is not required if the number of messages doesn't exceed the limits of Power BI
  • After editing the query hit the Test query button and check the Test results.
[Figure: Test result of basic query without time window]
[Figure: Test result of basic query with TumblingWindow]

  • Once you are satisfied with the query result hit the Save Query button above the editor
[Figure: Save query once you're ready]

The query for testing (without windowing).
SELECT
    CallId
,   DurationInSeconds
,   EventEnqueuedUtcTime as CallTimeStampStr
INTO
    [pbbitools]
FROM
    [ehbitools]
If you send more requests to Power BI than it can handle, Stream Analytics will try batching multiple messages into one request. For small messages that occasionally exceed the maximum number of messages this could be a 'workaround'. However, batching multiple messages could also cause you to exceed another limit: the maximum message size. A better solution is to use a windowing function to slow down the stream.

5) Start SA job
Now go back to the overview page of your Stream Analytics job and hit the Start button to start your SA job. The first time you can choose between Now or Custom as the start time for receiving new messages. After that you can also choose When last stopped. For this example we choose Now. It will take a few moments for the status to change from Created to Starting to Running.
[Figure: SA Job is running]

Once it is started AND new data is sent to the Event Hub, the streaming dataset will appear in your Power BI workspace. Notice that the red icon is different compared to a regular dataset.
[Figure: A new dataset appeared for the streaming data]

Conclusion
In this post you learned how to create a (very basic) Stream Analytics job. In a follow-up post we will explain the window functions in more detail. For now you can read our old post about window functions, but compared to 6.5 years ago we now have two new windowing functions. However, you will probably end up using the good old TumblingWindow. The next post in this series will show the live data in Power BI, with visuals that change automatically when new data is collected.




Monday 6 February 2023

Streaming data Azure & Power BI - Event Hubs

Case
I want to send streaming data to Power BI for reporting purposes. This post is the first part of a series on Streaming Data in Azure and Power BI. Here we will focus on the 'mailbox' service called Azure Event Hubs.
[Figure: Azure Event Hubs]

Solution
Azure Event Hubs is like a giant mailbox that can easily receive loads of messages. One throughput unit can handle 1000 messages a second. However, it can only receive messages and cannot send messages itself. Other services like Azure Functions or Azure Stream Analytics can read the messages from this giant mailbox to process them. When the retention period expires, the messages will automatically be deleted.

Note: If you want to receive messages from IoT devices instead of events/messages from applications and services, then you should have a look at Azure IoT Hub. It is a similar mailbox service, but specialized in IoT messages.

Posts in this series:

1) Create an Azure Event Hubs Namespace
Let's create an Azure Event Hubs resource. Note that when you create Azure Event Hubs via the Azure portal, it will only create an Event Hubs Namespace. In the next step we need to add an Event Hub to that namespace.

  • Besides the subscription and the resource group you first need to come up with a name for your namespace. This name is part of the URL and therefore only letters, numbers and hyphens (-) are allowed.
  • Choose your Location wisely: pick a region close to the rest of your platform (Power BI, the services/applications sending messages, data lake, data warehouse or lake house). This helps to keep down the latency and costs of your real-time platform.
  • The Pricing tier depends on your needs. For testing purposes Basic is more than sufficient, unless you want to test Capture, for which you need at least Standard.
    For production purposes you also need to take a look at the number of Consumer groups, the retention period and the number of events you need to handle.
  • With the Throughput units you can scale up the capacity. One unit can receive 1 MB per second or 1000 events per second. For this example we set it to 1.
  • For testing purposes you can keep the rest of the settings at their default values.
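
The throughput unit arithmetic above can be sketched in Python as follows. This is only a rough estimate based on the two numbers mentioned in the list (1 MB or 1000 events per second per unit); it assumes 1 MB equals 1024 * 1024 bytes, and the function name and example loads are hypothetical.

```python
import math


def required_throughput_units(events_per_second, avg_event_bytes):
    """Estimate the throughput units needed to receive a given load.

    Uses the figures from the text: one unit can receive 1 MB per second
    or 1000 events per second, whichever limit is reached first.
    """
    by_count = events_per_second / 1000
    by_size = events_per_second * avg_event_bytes / (1024 * 1024)
    # You always need at least one unit, and units are whole numbers.
    return max(1, math.ceil(max(by_count, by_size)))


# Many small events: the event count is the limiting factor.
print(required_throughput_units(2500, 200))
# Fewer but larger events: the data volume is the limiting factor.
print(required_throughput_units(500, 4096))
```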
[Figure: Create Azure Event Hubs Namespace]
[Figure: New/empty Azure Event Hubs Namespace]

2) Add Event Hub to namespace
Now we have an empty Event Hubs Namespace. We need to add an actual event hub to receive messages. In the left menu go to Event Hubs, located under Entities. Then click on the + Event Hub button.
[Figure: + Event Hub]

Now think of a name for your Event Hub and set the retention time. In this case we set it to the maximum for Basic, which is 24 hours. This means that after 24 hours you will lose the messages, so make sure to process them within a day. If you didn't choose the cheapest pricing tier, you also get the option to Capture the messages. This will be explained in a separate post, but it is basically a Stream Analytics job saving all your messages in a data lake.
[Figure: Adding an Event Hub to the Namespace]

3) Add Shared access policies
To send messages to the Event Hub we need a shared access policy (SAS policy). Go to the newly created Event Hub (callcenter in this example) and then to Shared access policies in the left menu under Settings.
[Figure: Shared Access Policies]

Click on the +Add button to create a new one. In this case we want the callcenter application/service only to send messages. Therefore we also called it sendonly and checked only the Send checkbox.
[Figure: Sendonly SAS policy]

In the script we use to send test messages to the Event Hub, we need the Primary or Secondary key of the newly created SAS policy. You can click on the policy to retrieve them.
[Figure: Retrieve primary key]

4) Testing
To test the new Event Hub you need to start sending messages. The easiest way to do this is to write a little script in Python or PowerShell. We explained that in a separate blog post. When successful, you can see the messages arriving in your Event Hub.
[Figure: Sending test messages]

Conclusions
In this post we showed how easy it is to create an Azure Event Hub for a little test setup. The hardest part is to pick the right pricing tier. In most cases the Basic or Standard is probably enough. Within those pricing tiers you can scale up by adding extra Throughput units.

In the next post we will process the messages within the Event Hub via Azure Stream Analytics and send them to Power BI.

Sunday 5 February 2023

Streaming data Azure & Power BI - Introduction

Case
I want to send streaming data to Power BI for reporting purposes. What should I take into account when choosing the right architecture?
[Figure: Streaming Data to Power BI]

Solution
If you for example have a helpdesk for your customers where they can call or chat for support then you probably also want some real time reports to see the current state of the calls and chats. Most regular Data Warehouses are often only refreshed once a night and then it's already too late to react to incidents.

For real-time reports in Power BI you have two main options. The first option is to send the events directly to a Power BI streaming dataset (Push or Streaming) and then build a report and pin report visuals to a dashboard. This is an appropriate solution for a lot of real-time reports, but there are some limitations. For example, there is a maximum number of events per second. Once you exceed that limit you start losing data, probably just when you need accurate reports the most: when it is very busy at your helpdesk. Another limitation of streaming datasets in Power BI is the history: it keeps only one hour of data.

The second option is to push the data into Azure Event Hubs and then use Azure Stream Analytics to push it to Power BI. This solves the max number of events per second because Stream Analytics can aggregate or filter the data before sending it to Power BI and Stream Analytics can also send it to for example a data lake to solve your history problem.
[Figure: Streaming Data to Power BI]

In this streaming data series we will explain this second option focussing on the hot path and the capture in the data lake which is part of the cold path. Just like for a 'regular' data warehouse architecture there are a lot of different solutions, but this one is probably the most common and simple solution that will fit the majority of cases. One particular new streaming data feature in Azure that is worth mentioning, is writing to a Delta Lake table. At the moment of writing this is still in public preview and only available in a limited number of Azure regions, but this will fit the Lake House architecture very well.

Posts in this series:






Tuesday 13 September 2022

Sending test messages to Azure Event Hubs

Case
I want to send dummy messages to Azure Event Hubs to test the streaming process before the actual service starts sending real messages.  Is there an easy way to send a bulk load of test messages to Azure Event Hubs?
[Figure: Sending messages to Azure Event Hubs]

Solution
Yes, we can for example use a little bit of PowerShell code to send dummy messages to Azure Event Hubs via the REST API. To do that we first need to collect some names and a key from Azure Event Hubs.

1)  Namespace and Event Hub name
Go to your Event Hubs Namespace in the Azure Portal and click on Event Hubs on the left side. In the top left corner you will find the Event Hubs Namespace (1) and in the list in the center of the page you will find the name of your Event Hub (2). Copy these names to your PowerShell editor.
[Figure: Namespace and Event Hub name]

2) Shared access policies Name and Key
Now click on your Event Hub in the list above and then click on Shared access policies in the left menu. If there is no policy yet, create one (Send is enough for testing). Then click on the policy to reveal the keys. Copy the Name (3) and one of the keys (4) to your PowerShell editor.
[Figure: Shared access policies name and key]

3) The script
Now you have all the things you need from your event hub. Time to do some PowerShell coding. The top part of the script creates a Shared Access Signature token (SAS token). This token is needed for authorization in the REST API. In this part of the script you also need to specify the names and the key from the previous two steps under EventHubs Parameters.

The second part sends a message via the REST API to your event hub. To make it a little more useful there is a loop to send multiple messages with a pause between each message. You must adjust the dummy message to your own needs by changing the column names and values. You can also specify the number of messages and the pause between each message.
####################################################################
# Create SAS TOKEN FOR AZURE EVENT HUBS
####################################################################
# EventHubs Parameters
$EventHubsNamespace = "bitools"
$EventHubsName = "myeventhub"
$SharedAccessPolicyName = "SendOnly"
$SharedAccessPolicyPrimaryKey = "1fhvzfOkVs+MxsZ/fakeZwrHTImD3YCCN7CGqYCAFN8kU="

# Create SAS Token
# Load System.Web for HttpUtility (Add-Type replaces the deprecated LoadWithPartialName)
Add-Type -AssemblyName System.Web
$URI = "$($EventHubsNamespace).servicebus.windows.net/$($EventHubsName)"
$Expires = ([DateTimeOffset]::Now.ToUnixTimeSeconds())+3600
$SignatureString = [System.Web.HttpUtility]::UrlEncode($URI)+ "`n" + [string]$Expires
$HMACSHA256 = New-Object System.Security.Cryptography.HMACSHA256
$HMACSHA256.key = [Text.Encoding]::ASCII.GetBytes($SharedAccessPolicyPrimaryKey)
$SignatureBytes = $HMACSHA256.ComputeHash([Text.Encoding]::ASCII.GetBytes($SignatureString))
$SignatureBase64 = [Convert]::ToBase64String($SignatureBytes)
$SASToken = "SharedAccessSignature sr=" + [System.Web.HttpUtility]::UrlEncode($URI) + "&sig=" + [System.Web.HttpUtility]::UrlEncode($SignatureBase64) + "&se=" + $Expires + "&skn=" + $SharedAccessPolicyName


####################################################################
# SEND DUMMY MESSAGES
####################################################################
# Message Parameters
$StartNumber = 1
$NumberOfMessages = 10
$MillisecondsToWait = 1000

# Determine URL and header
$RestAPI = "https://$($EventHubsNamespace).servicebus.windows.net/$($EventHubsName)/messages"

# API headers
$Headers = @{
            "Authorization"=$SASToken;
            "Content-Type"="application/atom+xml;type=entry;charset=utf-8";
            }

# Screenfeedback
Write-Host "Sending $($NumberOfMessages) messages to event hub [$($EventHubsName)] within [$($EventHubsNamespace)]"

# Loop to create X number of dummy messages
for($i = $StartNumber; $i -lt $NumberOfMessages+$StartNumber; $i++)
{
    # Create dummy message to send to Azure Event Hubs (ConvertTo-Json produces valid JSON with double quotes)
    $Body = [ordered]@{ CallId = $i; DurationInSeconds = (Get-Random -Maximum 1000) } | ConvertTo-Json -Compress

    # Screenfeedback
    Write-Host "Sending message nr $($i) and then waiting $($MillisecondsToWait) milliseconds"

    # execute the Azure REST API
    Invoke-RestMethod -Uri $RestAPI -Method "POST" -Headers $Headers -Body $Body

    # Wait a couple of milliseconds before sending next dummy message
    Start-Sleep -Milliseconds $MillisecondsToWait
}
When you run the PowerShell script with these parameters then 10 messages will be sent to your Event Hub.
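
For readers who prefer Python, the token part of the script could look roughly like the sketch below. It mirrors the steps of the PowerShell code above (URL-encode the URI, sign it together with the expiry time using HMAC-SHA256, then Base64- and URL-encode the signature) and only uses the standard library. The function name create_sas_token, its parameters and the key value are hypothetical, and the sketch only builds the token; it does not send anything.

```python
import base64
import hashlib
import hmac
import time
import urllib.parse


def create_sas_token(namespace, event_hub, policy_name, policy_key,
                     ttl_seconds=3600, now=None):
    """Build a SAS token following the same steps as the PowerShell script."""
    # URL-encode the resource URI (same URI format as in the PowerShell script).
    uri = urllib.parse.quote_plus(f"{namespace}.servicebus.windows.net/{event_hub}")
    # Expiry as Unix time; 'now' can be passed in to make the result repeatable.
    expires = int((time.time() if now is None else now) + ttl_seconds)
    # Sign "<encoded uri>\n<expiry>" with the policy key using HMAC-SHA256.
    string_to_sign = f"{uri}\n{expires}".encode("utf-8")
    digest = hmac.new(policy_key.encode("utf-8"), string_to_sign, hashlib.sha256).digest()
    signature = urllib.parse.quote_plus(base64.b64encode(digest).decode("ascii"))
    return f"SharedAccessSignature sr={uri}&sig={signature}&se={expires}&skn={policy_name}"


# Placeholder values: replace them with your own names and key.
token = create_sas_token("bitools", "myeventhub", "SendOnly", "not-a-real-key")
print(token)
```

The resulting string goes into the Authorization header of the REST call, just like $SASToken in the PowerShell version.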

[Figure: Executing the PowerShell script]

4) Check messages in the Event Hub
Now we can check the messages in your event hub. Go to your event hub (myeventhub in our case) and click on Process data. Then find the Stream Analytics Query editor and execute the query.
[Figure: Stream Analytics Query editor]

Here you can see the contents of your 10 dummy messages with a very basic query.
[Figure: The result]

Conclusion
In this post you learned how to test the setup of your Event Hub. If you also connect Azure Stream Analytics and a Power BI streaming dataset with a report and dashboard, you can see the messages arriving live in your Power BI dashboard. This will be shown in a separate post.

Note that the script is sending the messages one by one with an even period between each message. You could for example also make the pause period random or even execute the script in multiple PowerShell ISE editors at once to simulate a more random load of arriving messages.
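
As an illustration of the random pause idea, here is a small Python sketch that generates pause lengths that vary around an average. It uses exponentially distributed pauses, which is one common way to model randomly arriving messages; that choice, the function name random_pauses and its parameters are assumptions for this example and not part of the script above.

```python
import random


def random_pauses(count, mean_milliseconds=1000, seed=None):
    """Return `count` pause lengths in milliseconds that vary around the mean."""
    rng = random.Random(seed)
    # expovariate takes the rate (1 / mean), so the average pause is mean_milliseconds.
    return [rng.expovariate(1 / mean_milliseconds) for _ in range(count)]


# Ten pauses with an average of one second; a seed makes the run repeatable.
pauses = random_pauses(10, mean_milliseconds=1000, seed=42)
print([round(p) for p in pauses])
```

Each value can then be used as the waiting time before sending the next dummy message.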

Do you have an easier way to send test messages or a more sophisticated script then please share your knowledge in the comments below.