Microsoft BI Tools: KQL


Tuesday, 31 August 2021

Analyze Azure Data Factory logs - part 3: Power BI

Case
In a previous blog post we showed how to query the logs of Azure Data Factory within Azure Log Analytics, but how can you show that same data in a Power BI report?

Using KQL as source for Power BI

Solution

1) Export KQL query as M query

The previous step of writing that query is the most important step. Make sure you have a query, or better a couple of queries, that represent a dimensional model instead of just a simple table. This will make creating Power BI reports so much easier and faster.

Go to the query editor of the Log Analytics workspace.
Then open an existing query (via Query explorer) or write a new one.
Once the query runs successfully, hit the Export option and choose "Export to Power BI (M query)".
This will download a text file with an M query in it. Analyze the generated M query and optionally adjust it to your own needs.

Export to Power BI (M query)

Note that the file shows your KQL query as a parameter of a REST API call, and that besides the M query it also contains some instructions for Power BI.
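To make the dimensional-model advice from this step a bit more concrete, here is a minimal sketch of one dimension query and one fact query on ADFPipelineRun. The columns RunId, PipelineName, Status, Start and End are the ones used in the part 2 examples; the DurationInSeconds column and the filter on finished runs are our own assumptions, so adjust them to your own model.

```kql
// Dimension: one row per pipeline
ADFPipelineRun
| distinct PipelineName

// Fact: one row per finished pipeline run (assumption: only Succeeded and Failed runs are of interest)
ADFPipelineRun
| where Status in ("Succeeded", "Failed")
| extend DurationInSeconds = datetime_diff("second", End, Start)
| project RunId, PipelineName, Status, Start, End, DurationInSeconds
```

Each query is run and exported separately, so you end up with one M query per table.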

2) Blank Query Power BI

Now the last step is fairly easy, especially when your queries are already shaped as a dimensional model. If not, you will have to do some extra steps.

Open Power BI desktop and choose Get Data
In the Other section at the bottom you will see the Blank Query option
Select it and click Connect. A new Power Query Editor window will open
Now hit the Advanced Editor option in the Home ribbon.
A third Advanced Editor window will open. This is where you copy and paste the content of the downloaded file of step 1 and click on Done.
Give the query a name and click on Close & Apply in the Home ribbon.

Repeat this step for all queries that you created for your report and then you can start creating that report! Well, you might need to add some relationships between the tables for the best result.

Add the M query to Power BI

Conclusion

In this post we showed you how to use a KQL query in Power BI. Make sure most of the easy transformations are done within KQL. This saves you a lot of time in Power BI. And as mentioned before: shape your queries as fact and dimension tables. This makes it so much easier to build the report.

Monday, 5 July 2021

Analyze Azure Data Factory logs - part 2: queries

Case
In a previous blog post about analyzing ADF logs in Azure Log Analytics we mentioned Kusto Query Language (KQL) as the language used for the standard dashboards, but also for the export to Power BI. How does KQL work and how does it compare to T-SQL?

T-SQL vs KQL

Solution
In this episode of the series of ADF - Azure Log Analytics we will show you the very basics of KQL and try to compare it to T-SQL for all us DWH and BI developers. After you understand the basics you can for example edit the Microsoft ADF dashboard (template) in Log Analytics to see the more advanced queries that Microsoft used.

Before you start reading, make sure you have installed Azure Data Factory Analytics as explained in the first blog post. Without that you cannot query the ADF logs. For these query examples we are using the following three ADF log tables: ADFActivityRun, ADFPipelineRun, ADFTriggerRun. Note that the T-SQL queries cannot actually be executed; they are only used to explain how the KQL queries work.

1) Go to the KQL query editor

To start writing your first KQL query we need to go to the editor in Log Analytics.

Go to your Log Analytics Workspace via the Azure portal.
Click on Logs in the left menu.
Close the query 'welcome window'.

Query editor

On the left side of the query editor you see the available tables which you can query. On the bottom right you see the queries that you have executed before. Above the Query history you see the actual query editor. After you have written your first query you can hit the save button to store it. With the Query explorer you can find all your saved queries.

2) Select

Selecting all columns from a table in KQL is done by only mentioning the table name. The query result will show most columns, but not all. Which columns are shown in the result varies per table.

// Get all Pipeline runs
ADFPipelineRun 

// TSQL Equivalent
SELECT * FROM ADFPipelineRun

However you can use the Columns button in the result to (un)select certain (un)wanted columns, but note that this is for the current execution only!

Select 'all' columns in KQL

If you want specific columns in a specific order then you need to use the | project option. Start the next line in the query editor with a pipe (|) and the word project, followed by a comma-separated column list:

// Get specific Pipeline runs
ADFPipelineRun 
| project PipelineName, Category, Status, Start, End

// TSQL Equivalent
SELECT PipelineName, Category, Status, Start, End FROM ADFPipelineRun

SELECT specific columns in KQL
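As a small extension of the project example, the sketch below renames a column inside project and limits the result with take. The alias Pipeline is just an illustrative name.

```kql
// Rename a column while projecting and return only 10 rows
ADFPipelineRun
| project Pipeline = PipelineName, Status, Start, End
| take 10
```

Note that take returns an arbitrary set of rows, comparable to a T-SQL TOP 10 without an ORDER BY.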

3) Where

The WHERE clause in KQL is done by using the | where option, quite similar to T-SQL. Note that the == comparison is case-sensitive.

// WHERE case-sensitive
ADFPipelineRun 
| where PipelineName == "000_Master"

// TSQL Equivalent
SELECT * FROM ADFPipelineRun WHERE PipelineName = '000_Master'

To make the comparison case-insensitive you can replace the second = (equals sign) with a ~ (tilde).

// WHERE case-insensitive
ADFPipelineRun 
| where PipelineName =~ "000_MASTER"

A LIKE in the where clause can be done with contains. There is no need to add wildcards via %. Furthermore, there are a lot more useful string operators available in the documentation.

// WHERE like
ADFPipelineRun 
| where PipelineName contains "Master"

// TSQL Equivalent
SELECT * FROM ADFPipelineRun WHERE PipelineName like '%Master%'

WHERE in KQL
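Conditions can also be combined. The following sketch filters on time, status and the start of the pipeline name. The 24-hour window and the "000" prefix are only example values; TimeGenerated is the standard timestamp column of Log Analytics tables.

```kql
// Failed runs of the last 24 hours for pipelines whose name starts with "000"
ADFPipelineRun
| where TimeGenerated > ago(1d)
| where Status == "Failed" and PipelineName startswith "000"
```

You can write multiple where lines below each other or combine the conditions with and / or on a single line.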

4) Order by

The ORDER BY clause in KQL is done by using the | order by option again quite similar to T-SQL. You can use the same asc and desc option as in T-SQL.

// ORDER BY
ADFPipelineRun 
| order by Start desc 

// TSQL Equivalent
SELECT * FROM ADFPipelineRun ORDER BY Start desc

KQL seems to be a bit more flexible with the placement of the order by in the query. You can also first order and then filter, but let's try to keep the same order as T-SQL. Another neat addition to order by is the use of 'nulls first' or 'nulls last' to handle null values.

// ORDER BY
ADFPipelineRun 
| order by Start desc nulls last

ORDER BY in KQL
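If you only need the first rows of a sorted result, comparable to a T-SQL TOP with an ORDER BY, the top operator combines both. A minimal sketch, where the number 10 is just an example:

```kql
// The 10 most recently started pipeline runs
ADFPipelineRun
| top 10 by Start desc
```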

5) Calculated Columns
To create calculations or just simple aliases we need to use the | extend option. In this first example we are using the case() function to add a status order based on the Status column. If you are using project then don't forget to add this new column to the project line.

// Create extra column
ADFPipelineRun 
| extend StatusOrder = case (
    Status == "Queued", 1,
    Status == "InProgress", 2,
    Status == "Failed", 3,
    Status == "Succeeded", 4,
    -1)
| project PipelineName, Category, Status, StatusOrder, Start, End

// TSQL Equivalent (you can also use an alias with AS behind the END instead of in front of the CASE)
SELECT PipelineName
,      Category
,      Status
,      StatusOrder =  CASE Status  
         WHEN 'Queued' THEN 1
         WHEN 'InProgress' THEN 2
         WHEN 'Failed' THEN 3  
         WHEN 'Succeeded' THEN 4
         ELSE -1
       END
,      Start
,      End
FROM   ADFPipelineRun

Create new column with CASE statement

You can also use this to create an alias (= copy) of a column. If you want to add multiple columns then separate them with a comma within a single extend.

// Multiple extends and adding 'alias'
ADFPipelineRun 
| extend Count = 1, Region = Location
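Two other calculations that are often handy are a duration and a simple flag. The sketch below subtracts two datetime columns, which results in a timespan, and uses iff() as a short if-then-else. The column names Duration and IsFailed are our own.

```kql
// Duration of each run and a flag for failed runs
ADFPipelineRun
| extend Duration = End - Start
| extend IsFailed = iff(Status == "Failed", 1, 0)
| project PipelineName, Status, Start, End, Duration, IsFailed
```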

6) Group by

The GROUP BY clause in KQL is done by using the | summarize ... by option, also quite similar to T-SQL, but the aggregation and the grouping columns are written on a single line. The summarize operator offers a lot of aggregation functions.

// GROUP BY
ADFActivityRun 
| summarize sum(ActivityIterationCount) by PipelineName

// TSQL Equivalent
SELECT PipelineName, sum(ActivityIterationCount)
FROM   ADFActivityRun 
GROUP BY PipelineName

And if you don't want a boring table then you can always add the | render option to, for example, add a beautiful pie chart.

// GROUP BY
ADFActivityRun 
| summarize sum(ActivityIterationCount) by PipelineName
| render piechart

Rendering visuals with KQL
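You can also calculate several aggregates at once and group by more than one column. The sketch below counts the runs and the failed runs per pipeline per day. The names Runs, FailedRuns and Day are our own, and the filter on finished runs is an assumption to avoid counting the intermediate status records of a run.

```kql
// Number of runs and failed runs per pipeline per day
ADFPipelineRun
| where Status in ("Succeeded", "Failed")
| summarize Runs = count(), FailedRuns = countif(Status == "Failed") by PipelineName, Day = bin(TimeGenerated, 1d)
| order by Day desc, PipelineName asc
```

The bin() function rounds the timestamp down to whole days, comparable to casting a datetime to a date in T-SQL before grouping.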

7) Expand JSON fields
You will notice that there are a few JSON columns in the ADF tables. In T-SQL you can use OPENJSON to extract data from those JSON columns, but KQL has an easier solution.

JSON columns

It uses the extend and mv-expand option. It's a 'three-stage rocket', but very easy for simple JSON structures. The best feature is that it doesn't throw errors when the JSON of a particular record doesn't contain an expected JSON field. It just shows a null value.

// Extract JSON data
ADFPipelineRun
| extend    PredecessorsObject = parse_json(Predecessors)
| mv-expand PredecessorsObject 
| extend    InvokedByType = PredecessorsObject.InvokedByType
,           CallingPipelineRunId = PredecessorsObject.PipelineRunId
,           CallingPipelineName = PredecessorsObject.PipelineName
| project  RunId, CorrelationId, PipelineName, Status, Start, InvokedByType, CallingPipelineRunId,CallingPipelineName
| order by Start

The first extend row in the query parses the JSON column to a JSON object with the parse_json(json) function (or its alias todynamic).
The row below that with the mv-expand operator expands the array in the JSON object into a separate row for each element.
The next row with another extend allows you to create new columns based on the properties of the expanded JSON object.

Extracting JSON message in KQL
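If you want to try this pattern without real log data, you can use a small inline table. The sketch below uses datatable with one made-up row that contains an array of two objects; the run id and the pipeline names are purely hypothetical test values.

```kql
// Hypothetical test data: one run with two predecessors in a JSON array
datatable(RunId: string, Predecessors: string)
[
    "run-1", '[{"PipelineName":"000_Master"},{"PipelineName":"010_Staging"}]'
]
| extend    PredecessorsObject = parse_json(Predecessors)
| mv-expand PredecessorsObject
| extend    CallingPipelineName = tostring(PredecessorsObject.PipelineName)
| project   RunId, CallingPipelineName
```

Because the array has two elements, mv-expand turns the single input row into two rows. The tostring() around the JSON property converts the dynamic value to a regular string column, which is easier to filter and group on.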

Conclusion

In this post you, as an experienced T-SQL writer, learned the very basics of the Kusto Query Language (KQL) to extract information from the Azure Data Factory log data. It will probably take a little while to get used to the format of KQL queries, but after that you will notice that it has some very useful options to extract information from the log data.

In a follow up post we will use this knowledge to write a couple of KQL queries and export them to Power BI to create a report on your ADF log data. Of course making a dimensional model with those queries makes it much easier and faster to create some fancy reports.

Sunday, 6 June 2021

Analyze Azure Data Factory logs - part 1: setup

Case
Azure Data Factory has a complete monitor that logs all details, but apart from a simple filter it has no customization options. And we don't want to add old-fashioned custom logging with stored procedures to each pipeline. Is there a better alternative?

Azure Data Factory & Azure Log Analytics

Solution
You can use Azure Log Analytics Workspaces to store and analyze the logs of ADF. Within the Azure Marketplace you can find some standard dashboards to monitor and manage your Data Factory, but you can also write custom queries with Kusto Query Language (KQL). And even better: use those custom queries in a Power BI report to create a (semi) live dashboard on your ADF logs.

In the first part of this series of blog posts we will focus on setting up the Log Analytics Workspace and adding the standard dashboards from the Azure Marketplace. Prerequisites: an Azure Data Factory with a couple of pipelines that can run to generate some log data.

1) Create Log Analytics Workspace

For this example we will use a free workspace where the retention is limited to 7 days and the daily data ingestion to a maximum of 500 MB. Perfect for a first introduction to Log Analytics.

Go to the Azure Portal and create a new Log Analytics workspace with a descriptive name and in the same region as your Data Factory.
On the Pricing tier tab choose Free and then hit the Review + Create button.

Create Log Analytics Workspace for ADF

2) Add Azure Data Factory Analytics from Azure Marketplace

Next step is to add Data Factory Analytics from the Azure Marketplace. Note that at the time of writing it is still in preview.

Go to the URL and click on the Get It Now button.
Select your Log Analytics Workspace and click on Create

Add ADF Analytics from Azure Marketplace

3) Connect ADF to Log Analytics Workspace

Now we need to tell your Data Factory to send its logs to the new Log Analytics Workspace.

Go to the ADF Overview page in the Azure Portal and click on Diagnostic settings in the left menu
Click on + Add diagnostic settings
Give the diagnostic setting a name because you could create multiple ones: ADFDiagnostics
Now first choose what you want to log. The first three are for ADF Activities, Pipelines and Triggers. The second group of items that start with Sandbox are the debug runs and the third group of items are for SSIS. For this example we will only select the first three.
Secondly choose which metrics you want to log: AllMetrics
And thirdly choose the destination: Send to Log Analytics Workspace
Now select your workspace and click on Save

Connect ADF to the Analytics Workspace

4) See standard dashboards
Now let's see that standard ADF dashboard in Azure Log Analytics, but first make sure you run a couple of ADF pipelines so that there is actually something to see on the dashboard, because old runs won't be available. (Note: it could take a couple of minutes before you see the first data appear in Log Analytics.)

In Log Analytics the dashboard is called a workbook. You can either go directly to all the workbooks and then search for "AzureDataFactoryAnalytics", or first go to Solutions, then to the ADF solution and after that to the workbooks. The second way is a few more clicks, but the list of workbooks will be smaller.
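If you are not sure whether the first log data has already arrived, a quick check in the query editor (Logs in the left menu of the workspace) can help. The sketch below assumes that the logs end up in the three ADF tables ADFActivityRun, ADFPipelineRun and ADFTriggerRun; the column names SourceTable, RecordCount and LastRecord are our own.

```kql
// Number of records and most recent record per ADF log table
union withsource = SourceTable ADFActivityRun, ADFPipelineRun, ADFTriggerRun
| summarize RecordCount = count(), LastRecord = max(TimeGenerated) by SourceTable
```

A table that has not received any data yet will simply not show up in the result, and a table that does not exist yet may cause an error on its name.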

Go to Azure Log Analytics and click on Workbooks in the left menu
In the list of public templates scroll down and click on AzureDataFactoryAnalytics
When opened scroll down to see all visuals.
After the first look you could hit the edit button and start customizing.

AzureDataFactoryAnalytics

ADF dashboard in Log Analytics

Conclusion

In this post you learned how to set up Log Analytics for ADF. Now you have that default workbook (template) which you can adjust to your own needs, but you can also see which KQL queries Microsoft wrote to create those visuals. In the next blog post we will explain the basics of KQL and show you how you can use those queries in Power BI to create an ADF dashboard. This could save you from building custom logging mechanisms because most information is already available in the ADF logs.