Wednesday 23 November 2016

Azure - Understanding Stream Analytics Blob Destination

Case
How do you use Azure Blob Storage to store stream analytics data and why would you do that?

Solution
In earlier posts we have set up an IoT environment with an IoT Hub and a couple of Stream Analytics Jobs where sensor data is sent to different destinations: Power BI for a real-time dashboard and Azure SQL Database to store the data. In this case we create a new Stream Analytics Job with Blob Storage as output and we use the sensor data as input. The Input and Output are different, but the query is the same as other Stream Analytics Jobs. Configuring the query of the Stream Analytics Job can you find here.

Reasons to use Blob Storage is the diversity of storing in the cloud of text or binary files. The unstructured data can include documents, social data (photos, videos, music and blogs), Big Data (logs, IoT and large datasets) or images and text for web applications. Click here for more pricing details about Blob Storage. You can also store (sensor) data in an Azure Data Lake. Click here for more information about this, along with the major differences between Blob Storage and Data Lake.

For the use of Azure Blob storage you need a Storage account. Then you can add containers to this account, which include the Blob files. In the case of an image, you also find a metadata file. In our case the Blob files are JSON files and contain sensor data.

Overview Azure Blob Storage with images











1) Create a Storage account
To make use of Blob Storage you have to create a Storage account first. After we give it a suitable name, we choose Blob storage and 'RA-GRS' as Replication. This is the default and it contains the most options. Click here for more information about this. Next we choose 'Hot' by Access tier, because we want access the sensor data frequently. 


Azure Portal - Create a Storage account















Note:
You can also create a Storage account when setting up the Output of the Stream Analytics Job, but you have less options so it is not recommended. 

2) Create the Stream Analytics Job
Before creating the new job, we had already add a new consumer group to our IoT Hub. We called it 'blob'. Using multiple consumer groups makes it possible for several consumer applications to read data from this IoT Hub independently. Click here to see where you can add/manage consumer group(s). 

Now we can create the job. We choose the same Resource group as the IoT Hub and Storage account, because these are in the same life cycle. Our Location is the Netherlands, so we choose West-Europe.

Azure Portal - Create Stream Analytics Job














3) Configure the Stream Analytics Job
First we must add a new Input to the job. The default Source Type is 'Data stream'. We choose this because  the sensor data is an ongoing stream and is derived from the IoT Hub. The Source is 'IoT hub' and then the IoT Hub that you have created automatically appears. If you have more then one IoT Hub, you can choose one from the drop-down list. After this you must choose the right Consumer group. This is the new group (blob) we have created earlier. Finally you choose 'JSON' as Event serialization format

Next we add a new Output. First choose 'Blob storage' in Sink and 'Use blob storage from current subscription' as Subscription, because you have configured the storage account earlier. Otherwise you can edit the storage account settings here by choosing 'Provide blob storage settings manually'. Then you create a new container. Optionally, you can define one or more instances (subfolders) within the container. With this option you can change the date and time format so you can have multiple instances including different dates and/or time folders. This makes it more clear (just like your own local File Explorer) and you have the choice to select data from a specific day/time. We made a instance called 'sensor'. At last you will choose the 'JSON' format. 

As we said, we discuss only the configuration of the Input and Output of the job in this post. After configured the job we will run the job with a valid query. 

Azure Portal - Configure the Stream Analytics Job for Blob














Result
Now the data is stored, we want to see what's in our Blob. Therefore you have to go to your Azure Storage account in the portal. Every object, in our case a Blob, that you store in Azure Storage has a unique URL address. For the Blob service with the storage account name (bitoolsblobstorage) you have created and the container name (sensordata) with the instance (sensor) the URL/endpoint is: 
http://bitoolsblobstorage.blob.core.windows.net/sensordata/sensor
More information about the Azure Storage endpoints here.

In the portal go to the Storage account you have made earlier. Click on the container URL and then you see the instance (subfolder). Now you can drill down further on the specific month, day and hour of the incoming sensor data.

Azure Portal - Your Blob file














Conclusion
It looks a lot like the other Stream Analytics posts, but in this case the data is stored in a Blob file. From this point you have several options to do something with this data. First you can do nothing off course and in that case you use Blob purely for storage (backup). You will find the other options in the Cortana Intelligence Suite. For example process the data with Azure Data Factory (this can also be done with traditional SSIS, which is off course not a part of the CIS), analyse the data with Machine Learning or visualize the data in Power BI. 










Thursday 17 November 2016

Extracting tweets with Azure App Service - Logic App

Case
I want to extract tweets from twitter to my data warehouse without writing code. Is that possible?
Extract #SSIS Tweets













Solution
There are of course several options with traditional ETL tools like SSIS with custom .NET code, a custom twitter tasks or even by calling the Twitter API within R. But Microsoft also recently introduced Microsoft Flow and Microsoft Azure Logic Apps which can connect to various apps and services (like Facebook, Google Drive, GitHub, Twitter, SharePoint, etc.) to collect data, synchronize files or get notifications of certain events, without writing any code! The basics of Flow are free and Logic App is the professional version in Azure with of course some more options.

Solutions:
A) Microsoft Flow
B) Microsoft Azure Logic Apps

1) Microsoft Azure Logic Apps

First go to portal.azure.com, login and create a new Logic App (located under Enterprise Integration). The name of our Logic App is called SSISTweets (you have to choose a different name) and we use West-Europe as the location of the Resource Group since we life in the Netherlands. When you click on the create button the Logic App will be created and you can start editing. You can choose between various default templates or you can create a Blank LogicApp.
Creating a new Logic App

















If you have also tried Microsoft Flow, you will probably notice that there are less templates to choose from. We will choose the Blank LogicApp for this example.

2) Twitter source
In the blank LogicApp you first need to search for a trigger. In this case we want to search for tweets on twitter. The trigger is 'When a new tweet is posted'.
Blank LogicApp searching for Twitter trigger


















When you select the Twitter trigger you need to setup the Twitter connection and the search text. For this example I will login with my Twitter account and search for tweets with #SSIS. You can also change the frequency and interval.
Connecting to Twitter and setting up search


















3) Filtering annoying job tweets
Of course we want to filter those annoying job tweets. Therefore we need to add a condition. In the condition pane you can select fields from the Twitter 'source' to filter on. In this example we are filtering tweets that contain the word 'job' (probably not the perfect filter).
Add 'job' filter


















4) Add SQL Destination
Now we need to store the tweets in a (Azure) SQL Server database. For this example we created a simple table with all nvarchar(255) fields:
CREATE TABLE [dbo].[Tweets](
    [Name] [nvarchar](255) NULL,
    [TweetText] [nvarchar](255) NULL,
    [TweetTime] [nvarchar](255) NULL
)

Add a new step and search for SQL Server - Insert row. Then enter the credentials from your database, select the table (must be an existing table) and map the Twitter fields to the appropriate table columns.
SQL Destination - Insert row


















5) Save LogicApp
After the last step has been added click on Save to save your work and then on Done. Now your newly created flow is ready to run.
Saving the LogicApp

















Now click on the run button and watch the table!
Run LogicApp


















The result
















Conclusion
Microsoft Logic App is very easy to use and you don't need programming skills. There are dozens of apps and services to connect to. If you only need one Logic App and a 15 minute refresh is enough then you could also consider using Microsoft Flow which is free of charge.

Extracting tweets with Microsoft Flow

Case
I want to extract tweets from twitter to my data warehouse without writing code. Is that possible?
Extract #SSIS Tweets













Solution
There are of course several options with traditional ETL tools like SSIS with custom .NET code, a custom twitter tasks or even by calling the Twitter API within R. But Microsoft also recently introduced Microsoft Flow and Microsoft Azure Logic Apps which can connect to various apps and services (like Facebook, Google Drive, GitHub, Twitter, SharePoint, etc.) to collect data, synchronize files or get notifications of certain events, without writing any code! The basics of Flow are free and Logic App is the professional version in Azure with of course some more options.


Solutions:
A) Microsoft Flow
B) Microsoft Azure Logic Apps


1) Microsoft Flow
First go to flow.microsoft.com and login with your Microsoft Account. You can now search for existing templates or create a flow from blank. We will start with a blank flow for this example.
Searching for Twitter templates or use the blank flow


















2) Twitter source
In the blank flow you first need to search for a trigger. In this case we want to search for tweets on twitter. The trigger is 'When a new tweet is posted'.
Blank flow searching for Twitter trigger





















When you select the Twitter trigger you need to setup the Twitter connection and the search text. For this example I will login with my Twitter account and search for tweets with #SSIS.
Connecting to Twitter and setting up search






















3) Filtering annoying job tweets
Of course we want to filter those annoying job tweets. Therefore we need to add a condition. In the condition pane you can select fields from the Twitter 'source' to filter on. In this example we are filtering tweets that contain the word 'job' (probably not the perfect filter).
Add ' job' filter


















4) Add SQL Destination
Now we need to store the tweets in a (Azure) SQL Server database. For this example we created a simple table with all nvarchar(255) fields:
CREATE TABLE [dbo].[Tweets](
    [Name] [nvarchar](255) NULL,
    [TweetText] [nvarchar](255) NULL,
    [TweetTime] [nvarchar](255) NULL
)

Add a new step and search for SQL Server - Insert row. Then enter the credentials from your database, select the table (must be an existing table) and map the Twitter fields to the appropriate table columns.
SQL Destination - Insert row


















5) Create flow
After the last step has been added click on Create flow and then on Done. Now your newly created flow is ready to run. Watch the table!
Create flow


















The result
















Conclusion
Microsoft Flow is very easy to use and you don't need programming skills. There are dozens of apps and services to connect to.
But to keep it for free you can only check once each 15 minutes and only do 750 runs a month. That should be enough for at least on flow, but if that's not enough you can either switch to a premium account which gives you more runs, more checks and even more services to connect to. Or you could switch to professional Microsoft Azure Logic Apps.