Monday, 31 December 2018

Power BI - Introducing dataflows

Case
Power BI recently introduced dataflows. What is it and who should use it?
Power BI dataflows












Solution
According to Microsoft this new addition is for self-service ETL by business annalists and BI professionals. A minor addition from our side: especially business annalists that are skilled with tools like Excel and BI professionals with already a focus on Power BI will probably be very exiting using this new addition. An other group that could benefit of this new feature is the technical application administrator that has a great amount of knowledge of the data model of their application. With dataflows the could do some data preparation for the business annalists

First of all you must have at least a Power BI Pro license and this new preview feature is a Power BI Service-only feature that is not (yet?) available in Power BI Desktop. However you can use the result of a Data Flow which we will show you in the last step.

1) Power BI Service
To create a dataflow, sign in to Power BI Service and go to one of your workspaces, but not My Workspace. Or create a new workspace. Here you will find a new tab called "Dataflows (preview)" and in the Create-menu there is a dataflow option. Click on it to create a new dataflow.

Power BI dataflows - Create new dataflow
















2) Add new entities
In this step we will add a source for our dataflow. You can map this data to one of the standard Common Data Model (CDM) entities, but you can also create 'custom' entities that are not mapped. The CDM is a standard model for example for contacts or accounts to which you can map your sources like CRM or SalesForce. It should make it easier for development, but also analytics. In a subsequent post we will explain the CDM.

For this example we will not map to CDM, but create a 'custom' entity. We will use a text file that contains sensor data. We use Azure Blob Storage to store this data, but of course you can also use an internal file share. However then you first need to install the On-premises data gateway.


As told, we choose "Azure Blobs" as data source and next fill in the connection settings. If you have ever used this storage account before then it will remember the account key. Choose your Blob folder which contains the data and now you can transform your data similar to the Power Query Editor in Power BI Desktop.

Power BI dataflows - Create new Entity
















3) Edit Query
Just like Power BI Desktop you can do data preparation inside an "Edit Query" mode. This Power Query version does not have the full functionality compared to Power BI Desktop, but the expectation is that it will be extended in the upcoming releases. Some limitations of the current Query Editor in dataflows are: you cannot change the datatypes of the columns or use the 'Group By' function.

Power BI dataflows - Power Query Editor















There are also differences between a Pro license and Premium. In case you are doing some basic transformations like combining two queries (using Merge or Append queries) it will cause the following warning when using a Pro license:
This dataflow contains computed entities, which require Premium to refresh. To enable refresh, upgrade this workspace to Premium capacity.

Power BI dataflows - Pro license vs Premium







More information about the differences between a Pro license and Premium here.

4) Save and use dataflow 
After finishing the data preparation, you must save the new dataflow. Give it a suitable name and after saving, you will be asked to refresh the dataflow or schedule it later. You can set up a "Scheduled refresh" the same way as a dataset.

Power BI dataflow - Save (and schedule refresh)














Open Power BI Desktop, select "Get Data" and now you can use a Dataflow as a source for your report.

Power BI Desktop - Use a dataflow as source












Conclusion
In this post we created our first dataflow, a new (preview) feature of Power BI. Despite we are using the first version which is still lacking of some basic features, it already looks very promising. We expect/hope that the limited Power Query possibilities will soon be aligned with those in Power BI Desktop making this a very powerful tool.

The main benefit of this new addition, is that you don’t have to setup and host a separate ETL tool with possibly complex code. Now everything is integrated in one platform. The disadvantages are, besides the limited Power Query options, the lack of versioning and release management.

Will it replace enterprise ETL with tools like SSIS, Azure Data Factory and Azure Databricks? Probably not in the near future. For now it is still self-service ETL which you could use as a first step to enterprise ETL. However, Microsoft will likely extend this tool in the coming years. Just like they did with Power BI itself. Back in 2014 most people didn’t see that as a serious alternative for reporting.

An alternative route map could be integrating Power BI dataflows within your existing BI platform. You can publish (and refresh) the result of your dataflows to Azure Data Lake and then pick up the data, besides Power BI, with tools as Azure Databricks and Azure Data Factory. We will explain this in a subsequent post next year.