Wednesday, 31 October 2018

Power BI Snack: Introducing Data Profiling

Case
When data is not extracted from a cleaned up or validated environment, such as a Data Warehouse or Data Mart, the first step is to analyse and possible clean up the data. How can Power BI help in this?

Data Profiling in Power Query Editor








Solution
The release of October 2018 Desktop update introduced a new (preview) feature that can profiling the data in the Power Query Editor. With this, you can do some quick analysis over your data and you can easily identify errors and empty values in a column.

For this blog post we use sample data from WideWorldImporters. We have saved the data in a CSV file, because it is more common to do this kind of analyses when you load raw data from Text or CSV files into your Power BI Model instead of using validated data from a star schema in a Data Mart.

Apply data profiling in Power BI
Because it is a preview feature, we have to turn this feature on. Go to "File - Options and settings - Options - Preview features" and select Enable column profiling.

Power BI Desktop - Enable Preview feature




















Open the Power Query Editor and load your data. In our case it is a CSV file that contains raw data about sales orders. Go to the "View" tab and select Column quality. With this option you can see whether your data contains errors or empty values, also known as "null" values.

Power BI Desktop - Column quality in Power Query Editor
















Besides this, you can also view the total of unique and distinct values. Go to the "View" tab and select Column distribution. Unique values indicates how many values occur only once and distinct values means how many different values there are in the specific column. Now the Query Editor will also recommend to do a quick fix if you want, for example remove the duplicate values in a column.

Power BI Desktop - Column distribution in Power Query Editor

















Note:
Be aware of the fact that column profiling is only based on top 1000 rows (preview results of the query).

Conclusion
In this post you saw a new feature called "Data Profiling". It is a good start and useful to use this for doing some quick analyses over your data. Note that this feature is in preview.

The program team of Power BI has announced to add more capabilities for data profiling, so we are very curious about what this will bring even more in the future.