Thursday, 30 April 2020

Azure Data Factory - Parameters event based triggers

Case
My files arrive at various moments during the day and they need to be processed immediately on arrival in the blob storage container. At the moment each file has its own pipeline with its own event based trigger. Is there a more sustainable way where I don't have to create a new pipeline for each new file?
Event based triggers can pass through parameters

Solution
Yes, if your source files have names that can easily be matched to table names then it is very easy and you only need a couple of parameters and a single Copy Data activity. If you cannot match a source file by its name then you might want to add a Lookup activity that can find the match in a configuration table. Or if unexpected files can also arrive in that same blob storage container then you also might want to add the Lookup activity to check whether the file is expected.
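
A minimal sketch of such a configuration table and the matching Lookup query; the table and column names are just an example, and the pipeline parameter is injected with ADF's string interpolation (so the query must be entered as dynamic content):

CREATE TABLE [dbo].[FileTableMapping]
(
    [FileNamePattern] NVARCHAR(255) NOT NULL -- e.g. 'Customer%.csv'
,   [TargetTableName] NVARCHAR(255) NOT NULL -- e.g. 'Customer'
);

-- Query for the Lookup activity
SELECT  [TargetTableName]
FROM    [dbo].[FileTableMapping]
WHERE   '@{pipeline().parameters.SourceFileName}' LIKE [FileNamePattern];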

The solution uses parameters that can be filled by the event based trigger. With a simple expression you can pass on the filename and folderpath to the pipeline.

1) Pipeline parameters
Start with a simple pipeline that only contains a single Copy Data activity which copies a specific file to a specific SQL Server table. Then add two parameters to this pipeline: one for the filename and one for the folder path, named SourceFileName and SourceFolderPath. The folder path parameter is optional for this basic example and will contain the container name and the folder name combined: "container/folder".
Add 2 pipeline parameters

2) Dataset sourcefile
Go to the Dataset of your source file from the blob storage container. In the Parameters tab add a new string parameter called FileName. As default value you can use the name of the file. Optionally you can add a second string parameter for the folder path.
Add parameter FileName to dataset

The next step is to use this newly created dataset parameter in the connection to the file. Go to the Connection tab of your dataset and replace the filename with some dynamic content: @dataset().FileName
Add dynamic content for filename

If you want to use the folder path parameter as well, then you can add it to the Container textbox and leave the Directory textbox empty (the container name is mandatory, but besides the container name you can also add the folder path).

3) Dataset stage table
Now repeat the same actions for the dataset that points to your stage table. First add a parameter called TableName in the Parameters tab.
Add parameter TableName to dataset

And then replace the Table property with some dynamic content: @dataset().TableName (check the Edit checkbox below the table before adding the expression).
Use dataset parameter to configure the tablename

4) Copy Data activity - Source
Now go back to your pipeline and edit the Source of your Copy Data activity. Notice that you now see a new Dataset property called FileName that points to the parameter of the dataset. This is where you fill the dataset parameter with the value of the pipeline parameter. The value should be set with some dynamic content: @pipeline().parameters.SourceFileName. This is the pipeline parameter created in step 1.
Copy Data activity - Dynamic content in FileName

5) Copy Data activity - Sink
Now repeat the same for the Sink, where you now see a new Dataset property called TableName. Here is where you need to match the filename to the table name. In our example the table name is equal to the filename without the extension. Therefore we remove the extension with the replace expression: @replace(pipeline().parameters.SourceFileName,'.csv', '')
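
If the extension is not always '.csv', a slightly more generic sketch uses the standard substring and lastIndexOf functions to strip everything from the last dot onwards:

@substring(pipeline().parameters.SourceFileName, 0, lastIndexOf(pipeline().parameters.SourceFileName, '.'))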
Copy Data activity - Dynamic content in TableName

6) Edit Trigger
The last step of the solution is passing through the filename and folder path from the trigger to the pipeline parameters with a triggerBody() expression that you may also know from Azure Logic Apps and Power Automate:
Returns the filename of the file that caused the trigger:
@triggerBody().fileName
Returns the folderpath of the file that caused the trigger:
@triggerBody().folderPath

When you edit or create a new trigger on the pipeline, the last step is filling in the trigger run parameters. By using the above expressions, they are automatically filled with information about the file that caused the trigger.
triggerBody() expressions

When two or more files are uploaded, each of them will trigger the same pipeline separately, but all with different values in the pipeline parameters.

7) Testing
After publishing all changes we can upload several files, one at a time or multiple at once, to test the new trigger pipeline. The last two rows indicate that the files were uploaded at the same time, but both triggered the pipeline separately. You can check the parameters by clicking on [@].
Monitoring the pipeline runs

Conclusion
In this post you learned how to use the triggerBody() expression from the event based triggers. You can pass on its value to the Pipeline parameters and then pass on those pipeline parameters to the dataset parameters.

With this very basic construction you can handle as many files as you like. If you do not need to process them immediately on arrival, then we suggest using the ForEach loop construction instead.

Friday, 24 April 2020

Power Apps Snack: Don't repeat yourself

Case
I have some pieces of code in my Power App that I use for several buttons, but I don't want to create multiple copies of that code. Is there a way to create a method or function with custom code that I can call from various buttons?
Power Apps and custom functions

Solution
No, Power Apps does not support methods or functions like real programming languages do. However, you can use the Select function. This function allows you to execute code from other objects on your screen, but only objects that have an OnSelect event.

Example 1: basics
For this example add two buttons to the screen: Button1 and Button2. In the OnSelect of Button1 add a simple Notify expression:
Notify(
    "Hello",
    NotificationType.Success,
    1000
)
Notify in OnSelect of Button1

In the OnSelect of Button2 add the Select expression:
Select(Button1)
Select in OnSelect of Button2

The Result
Now hit the play button to see the preview of your app. Then click both buttons and see the Notify showing for both of them. You can even make Button1 invisible if you don't want to execute that code by itself, but only via other buttons.
Testing simple Select solution

Example 2: parameters
By adding variables to the mix you can even pass some 'parameters' to your new 'function'. Now change the OnSelect code of Button2 to:
UpdateContext({myParam: "Joost"});
Select(Button1)

And then add a third (and optionally a fourth) button with the following OnSelect code:
UpdateContext({myParam: "Mark"});
Select(Button1)

Then change the OnSelect code of Button1 to:
Notify(
    "Hello " & myParam,
    NotificationType.Success,
    1000
)
And make Button1 invisible by setting the Visible property to false.
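
To make this read more like a real function call, you could also give the hidden button a descriptive name (btnNotifyHello below is just a suggestion) and call it the same way:

// OnSelect of any calling button
UpdateContext({myParam: "Joost"});
Select(btnNotifyHello)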

The Result
Now hit the play button again to see the preview of your changes. Click both visible buttons and see the Notify that now shows a different name for both buttons.
Testing Select with 'parameter'

Conclusion
In this post you learned how NOT to repeat yourself by using the Select function in Power Apps. Perhaps not the same features as in JavaScript, C# or any other language, but very useful to keep your code cleaner. By hiding the buttons and giving them a descriptive, function-like name they can act like real programming functions.

Another great trick is to use the With function, which is explained in another post.


Monday, 20 April 2020

Databases in DevOps - Publishing profile

Case
I can’t release my database project due to the error “data loss could occur”:
error code

Updating database (Failed)
*** Could not deploy package.
Warning SQL72015: The column [dbo].[Table1].[ColumnToRemove] is being dropped, data loss could occur. 

This happens because my source table changed or removed a column that is still present in the target table.


Solution
When deploying a database project, you can add a publish profile file to the deployment. In this file (an XML file) you can set or adjust certain deployment settings.
First we set up Visual Studio and secondly we adjust the build and release pipelines in DevOps.

1) Visual Studio
Here we are going to create the publish file
  • Right click on the database project and click “Publish”
  • A new window pops up; go to “Advanced”
  • Uncheck the 'Block incremental deployment if data loss might occur’ checkbox
Publish settings - general

  • Go to the second tab (Drop), check 'Drop objects in target but not in source' and click “OK”
Publish settings - drop

  • Then select “Create Profile” and a new ‘***.publish.xml’ file will be added to the solution (a sketch of its content is shown below)

Add publish file

  • The next step is to add the publish file to source control (“Add Ignored File to Source Control”)
Add to source control

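The generated profile is a plain XML file. A minimal sketch of what it could roughly look like after the steps above (the database name and connection string are placeholders, not part of this example):

<?xml version="1.0" encoding="utf-8"?>
<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
  <PropertyGroup>
    <TargetDatabaseName>MyDatabase</TargetDatabaseName>
    <TargetConnectionString>Data Source=myserver.database.windows.net;Persist Security Info=False</TargetConnectionString>
    <BlockOnPossibleDataLoss>False</BlockOnPossibleDataLoss>
    <DropObjectsNotInSource>True</DropObjectsNotInSource>
  </PropertyGroup>
</Project>
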
2) DevOps 
First, we edit the build pipeline
  • Go to Azure DevOps -> Pipelines -> Pipelines
  • Then edit your pipeline (please see the blog of Joost for creating a pipeline)
  • Go to the “Copy files” task and add “**\*.publish.xml” to the ‘Contents’ field (see the YAML sketch below the screenshot)
Build pipeline

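If you use a YAML pipeline instead of the classic editor, the equivalent Copy files step would look roughly like this (the patterns and target folder are assumptions and depend on your own project structure):

- task: CopyFiles@2
  displayName: 'Copy dacpac and publish profile to artifact staging'
  inputs:
    contents: |
      **\*.dacpac
      **\*.publish.xml
    targetFolder: '$(Build.ArtifactStagingDirectory)'
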
The last step is to add the XML file to the release pipeline
  • Go to Pipelines -> Releases
  • Then edit your release pipeline and add the publish file to the deployment task
Release pipeline

Summary
In this post you learned how to add a publish profile file. In this file you can change the publish settings. In this example we wanted to allow a deployment where data loss could occur, because a column was removed from a table.
It can also be used to exclude the deployment of security-related objects to our database, like:
  • ExcludeUsers
  • ExcludeLogins
  • ExcludeDatabaseRoles

Sunday, 22 March 2020

Build dynamic pipelines in Azure Data Factory

Case
I want to create multiple Azure Data Factory (ADF) pipelines that are using the same source and sink location, but with different files and tables. How can we do this in ADF when taking extensibility and maintainability of your pipelines into account?

Solution
Let's say we have a scenario where all the source files are stored in our Azure Storage, for example Blob Storage. We want to load these files into a traditional Data Warehouse (DWH) using an Azure SQL Database that contains a separate schema for the Staging layer. Earlier we showed you how you can do this using a Foreach Loop in one single pipeline. For this scenario we will not use that approach, because:
  • Not all Staging tables are loaded every day at the same time.
    • Sometimes only 3 of the 5 files are loaded, because not every file is delivered for a certain day and this can differ per day.
    • The load can be spread over several moments during the day.
  • We want to be more flexible per Staging load.
    • In most cases you want the same Staging load for every table, but sometimes the transformations are different or you need an extra table to prepare the dataset.
  • We use technical columns in the Staging tables.
    • In most cases you want to store some metadata, for example "inserted date" and "inserted by". When using a Copy data activity, you have to configure the mapping section when the source and sink fields are not equal.
One of the solutions is building dynamic pipelines. This will be a combination of parameters, variables and naming conventions. The result will be a dynamic pipeline that we can clone to create multiple pipelines using the same source and sink dataset. In this blog post we will focus on the Staging layer. Of course you can implement this for every layer of your DWH.

Note:
When working in a team, it is important to have a consistent way of development and have a proper naming convention in place. This contributes to the extensibility and maintainability of your application. Click here for an example of naming convention in ADF.

Pipeline

For this solution, we will create a pipeline that contains the following activities:
  • Copy Data activity (load data from source to sink)
  • Stored Procedure activity (update technical columns)
The source will be a CSV file and is stored in a Blob container. The data is based on the customer sales table from WideWorldImporters-Standard. The file will be delivered daily.

  • Storage account: bitoolssa
  • Blob container: sourcefiles
  • Blob folder: WWI
  • File name*: SalesCustomers20200322.csv
* this is the blog post date, but it should be the date of today

Note:
Blob storage containers only have virtual folders, which means that the folder name is stored in the filename. Microsoft Azure Storage Explorer will show it as if they are real folders.

1) Linked Service

Open the ADF portal, go to Connections - Linked services and click on New. Select Azure Blob Storage and give it a suitable (generic) name. Make a connection with your storage account. Create another Linked service for Azure SQL Database, because that will be our destination (sink). Click here to see how to make a connection using Azure Key Vault.
ADF portal - Create Linked Service

2) Dataset (source)

Click New dataset and select Azure Blob Storage. The format is DelimitedText. Give it a suitable (generic) name and select the Linked Service for Blob Storage that we created earlier. 

Once you click 'OK', it will open the dataset automatically. Go to Parameters and add the following:
  • Container -> type "String"
  • Directory -> type "String"
  • File -> type "String"
  • ColumnDelimiter -> type "String"
Go to Connection and use the applicable parameters to fill in the File path. You can apply dynamic content for each setting; see the sketch below. For now, we also added a parameter for "Column delimiter". Finally, we check First row as header.
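
As a sketch, the dynamic content per setting simply refers to the dataset parameters created above:

File path (container): @dataset().Container
File path (directory): @dataset().Directory
File path (file):      @dataset().File
Column delimiter:      @dataset().ColumnDelimiter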

Click Publish all to save your work. For now we did not configure a Git repository, but of course we recommend that.
ADF portal - Create Source Dataset

Note:
Use Preview data to verify that the source input is as expected by filling in the correct parameter values.

3) Dataset (sink)

Create another dataset for the sink. Choose Azure SQL Database and give it a suitable (generic) name. Select the Linked Service for Azure SQL Database that we created earlier and click 'OK'.

Add the following Parameters:
  • SchemaName (String)
  • TableName (string)
Go to Connection and click Edit. Fill in both parameters using dynamic content. 
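
Again as a sketch, the dynamic content for the table settings refers to the two dataset parameters:

Schema: @dataset().SchemaName
Table:  @dataset().TableName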
ADF portal - Create Sink Dataset

4) Pipeline - Copy data

Create a new pipeline and include the schema and table name in the name, like "PL_Load_Stg_Customer". Go to Variables and add the following including values:
  • Container -> type "String" and value "sourcefiles"
  • Directory -> type "String" and value "WWI"
  • File -> type "String" and value "SalesCustomers"
  • ColumnDelimiter -> type "String" and value ","
Add the Copy data activity, go to Source and use dynamic content to assign the variables to the dataset parameters (see the sketch below). For the file, we also use the expression language to build the correct file name dynamically: "@{variables('File')}@{formatDateTime(utcnow(), 'yyyyMMdd')}.csv"
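
A sketch of the complete Source mapping; the first three settings simply pass the pipeline variables through:

Container:       @variables('Container')
Directory:       @variables('Directory')
File:            @{variables('File')}@{formatDateTime(utcnow(), 'yyyyMMdd')}.csv
ColumnDelimiter: @variables('ColumnDelimiter')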

Go to Sink and fill in the schema and table name. We use the split function to retrieve them from the pipeline name (see the sketch below). We use a Pre-copy script to truncate the table before loading.
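
A sketch of the sink settings, assuming the pipeline name always follows the pattern "PL_Load_<Schema>_<Table>" (like "PL_Load_Stg_Customer"):

SchemaName:      @split(pipeline().Pipeline, '_')[2]
TableName:       @split(pipeline().Pipeline, '_')[3]
Pre-copy script: TRUNCATE TABLE [@{split(pipeline().Pipeline, '_')[2]}].[@{split(pipeline().Pipeline, '_')[3]}]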

At last, go to Mapping and click on Import schemas. This will automatically map the columns with the same names (source and sink). We will remove the columns that do not exist in the source, in this case our technical columns "InsertedDate" and "InsertedBy". We will fill those columns in the next activity.

ADF portal - Create Copy data activity

5) Pipeline - Stored Procedure

Add the Stored Procedure activity and give it a suitable name. In SQL Account, select the Linked service for Azure SQL Database that we created earlier. We created a Stored Procedure that contains dynamic SQL to fill the columns "InsertedDate" and "InsertedBy" for every Staging table. See the code below.

/*
==========================================================================================================================
Stored Procedure [dbo].[uspPostLoadStaging]
==========================================================================================================================
Description:

This query will load the following columns:
- InsertedDate
- InsertedBy

In ADF, without using Data Flows (Mapping), you can combine a Copy data activity with a Stored Procedure in order to
fill those (technical) columns during execution of the pipeline. In SSIS this was done by the Derived Column task. 

==========================================================================================================================
Parameter     Parameter description
--------------------------------------------------------------------------------------------------------------------------
@InsertedDate    Date when the data was inserted into the Staging table.
@InsertedBy     Name of a service / account that has inserted the data into the Staging table.
==========================================================================================================================

==========================================================================================================================
Change history

Date  Who      Remark
--------------------------------------------------------------------------------------------------------------------------
2020-03-22 Ricardo Schuurman  Initial creation.
==========================================================================================================================
*/

CREATE PROCEDURE [dbo].[uspPostLoadStaging]
 @SchemaName   AS NVARCHAR(255)
, @TableName   AS NVARCHAR(255)
, @InsertedDate  AS DATETIME
, @InsertedBy   AS NVARCHAR(255)

AS
BEGIN

 SET NOCOUNT ON;
 BEGIN TRY

  DECLARE
   @QueryStep1    AS NVARCHAR(MAX)
  , @QueryStep2    AS NVARCHAR(MAX)

  -- ERROR variables
  , @errorMsg    AS NVARCHAR(MAX) = ''
  , @errorLine    AS NVARCHAR(MAX) = ''

  /* Example values of SP parameters
  , @SchemaName   AS NVARCHAR(255)
  , @TableName   AS NVARCHAR(255)
  , @InsertedDate  AS DATETIME
  , @InsertedBy   AS NVARCHAR(255)

  SET @SchemaName   = N'Stg'
  SET @TableName   = N'Customer'
  SET @InsertedDate  = GETDATE()
  SET @InsertedBy   = N'Test'

  */

  /* ================================================================================================================
   Step 1: Extract schema and table name (based on the ADF pipeline naming convention)
   ================================================================================================================ */

  -- Add LIKE condition for schema
  SET @SchemaName = '%' + @SchemaName + '%'

  SELECT 
   @QueryStep1 = '[' + [TABLE_SCHEMA] + '].[' + [TABLE_NAME] + ']'
  FROM [INFORMATION_SCHEMA].[TABLES]
  WHERE TABLE_TYPE = 'BASE TABLE' 
  AND TABLE_SCHEMA LIKE @SchemaName
  AND TABLE_NAME = @TableName

  /* ================================================================================================================
   Step 2: Execute Update statement using the fixed (technical) columns
   ================================================================================================================ */

  SET @QueryStep2 = N'
   UPDATE ' + @QueryStep1 + '
   SET 
    [InsertedDate]  = ''' + CONVERT(NVARCHAR(30),@InsertedDate, 120) + '''
   , [InsertedBy]  = ''' + @InsertedBy + ''';
  '

  EXEC sp_executesql
   @stmt = @QueryStep2;

 END TRY

  -------------------- Error Handling  --------------------

  BEGIN CATCH 

    SELECT 
     @errorMsg = ERROR_MESSAGE()
    , @errorLine = ERROR_LINE()

    SET @errorMsg = 'Error occurred with the following message: ' + @errorMsg + ' Error line number: ' + @errorLine;

    THROW 50001, @errorMsg, 1;

  END CATCH

END


Create the SP in the database, go to Stored Procedure and select the SP. Click Import parameter and fill in the parameters. We use the System variables 'Pipeline trigger time' and 'Pipeline Name' for "InsertedDate" and "InsertedBy". Reuse the values of "SchemaName" and "TableName" from the sink of the Copy data activity.
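
A sketch of the four parameter values; the first two reuse the split expressions from the sink, the last two use ADF system variables:

SchemaName:   @split(pipeline().Pipeline, '_')[2]
TableName:    @split(pipeline().Pipeline, '_')[3]
InsertedDate: @pipeline().TriggerTime
InsertedBy:   @pipeline().Pipeline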
ADF portal - Create Stored Procedure activity

Note:
Instead of using a Stored Procedure with an update query, you can also use the metadata columns of ADF itself. More information here.

Result
Execute the pipeline in ADF (by clicking Debug) and check the result in SQL Server Management Studio. It works!

Note:
In case of a "SqlFailedToConnect" error, make sure the firewall of the Azure SQL Database allows access from the Integration Runtime of ADF. Go to the Azure portal or use sp_set_firewall_rule on the master database to create a firewall rule.
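
A minimal sketch of such a server-level rule; it uses the special 0.0.0.0 range that allows all Azure services, so narrow this down where possible:

-- Run this in the master database of the logical SQL server
EXECUTE sp_set_firewall_rule
    @name = N'AllowAllWindowsAzureIps'
,   @start_ip_address = '0.0.0.0'
,   @end_ip_address = '0.0.0.0';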


Summary
In this blog post you saw how you can create a dynamic pipeline using parameters and variables in combination with a proper naming convention in ADF, for example the names of your pipelines. This way of working can contribute to a consistent solution and code. In case of a new Staging load table, clone the existing pipeline and use the schema and table name in the pipeline name. You only have to change the mapping in the copy data activity.

To make your pipelines even more dynamic, you can also use metadata that is stored in a database instead of ADF itself and retrieve it using the Stored Procedure activity. 

Saturday, 21 March 2020

Embed a Power BI tile in Power Apps with filters

Case
I want to include a Power BI tile in my Power App. Is it possible to manipulate the Power BI tile with data from the Power App?
Power BI tile in Power Apps with filter

Solution
Power Apps has its own charts. So why would you want to use a Power BI tile in your Power App in the first place? The charts from Power Apps are pretty fast, but since Power Apps is not a reporting tool the functionality is quite basic. The two major disadvantages are:

  • The charts don't have options in the GUI for sorting, filtering or grouping. So either your data source is already prepared, or you have to do that with functions in Power Apps (Filter, GroupBy, SortByColumns, etc.). Not ideal, especially not for large datasets.
  • The charts are very basic (Column, Line or Pie chart). You could combine a column and line chart by creating two separate charts and then lay the line chart over the column chart. Again workable, but not ideal.


Let's see how you can add a Power BI tile to your Power App and what you have to do to make it more dynamic with filters.

1) Pin visual to dashboard
Power Apps can only add Power BI visuals that are pinned to a dashboard. Go to your Power BI report on PowerBI.com and click on the little pin icon above the Visual that you need in Power Apps. After that specify to which existing or new Power BI Dashboard you want to publish it.
Pin your visual to a dashboard

2) Add Visual to Power Apps
In the Insert menu of Power Apps you will find the Charts submenu, which contains the option to add a Power BI tile. Adding it is very straightforward: just select the right Workspace, Dashboard and then the Tile.
Adding a Power BI tile to Power Apps

3) Adding dynamic filters
If you go to the properties of the new Power BI tile in Power Apps you will find the TileUri property. This is the URL of your Power BI tile which you can extend with filters. The standard URL looks something like:
"https://app.powerbi.com/embed?dashboardId=11111111-aaaa-1234-aaaa-111111111111&tileId=11111111-aaaa-1234-aaaa-111111111111&groupId=11111111-aaaa-1234-aaaa-111111111111&config=abc123abc123abc123abc123abc123abc123abc123abc123abc123abc123abc123abc123abc123abc123abc1230%3d"
The DATA TileUri

You can simply add a filter to this URL by adding an extra parameter to the querystring. Let's say we have a field called Department in the table FactInternetSales and we want to filter on the value 'Amsterdam'.
The field on which we want to filter

Before the last double quote of the TileUri you must add &filter=[tablename]/[columnname] eq '[value]'. The result looks like:

"https://app.powerbi.com/embed?dashboardId=11111111-aaaa-1234-aaaa-111111111111&tileId=11111111-aaaa-1234-aaaa-111111111111&groupId=11111111-aaaa-1234-aaaa-111111111111&config=abc123abc123abc123abc123abc123abc123abc123abc123abc123abc123abc123abc123abc123abc123abc1230%3d&filter=FactInternetSales/Department eq 'Amsterdam'"

Note that filter is in lowercase, otherwise it won't work. For more OData-like filter examples see the documentation of Power BI filters. A great tip is to first test the query string filter in Power BI itself and then copy and paste it to Power Apps. After that you can replace the hard-coded value 'Amsterdam' with a value from, for example, a drop down.
Adding a Drop Down to use it in the filter

With the drop down the URL looks like:

"https://app.powerbi.com/embed?dashboardId=11111111-aaaa-1234-aaaa-111111111111&tileId=11111111-aaaa-1234-aaaa-111111111111&groupId=11111111-aaaa-1234-aaaa-111111111111&config=abc123abc123abc123abc123abc123abc123abc123abc123abc123abc123abc123abc123abc123abc123abc1230%3d&filter=FactInternetSales/Department eq '" & ddDepartment.Selected.Department & "'"
If your table name or field name contains spaces, dashes or non-ANSI characters, you need to replace those with special characters in your query string, as shown below.
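
For example (a hypothetical table "Internet Sales" with a field "Sales Region"; Power BI expects the Unicode escape code of each special character, so a space becomes _x0020_):

&filter=Internet_x0020_Sales/Sales_x0020_Region eq 'Amsterdam'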

4) The result
Now you can hit the play button and test the dynamic filter. It works very well, but it isn't super fast. Therefore you shouldn't have more than three Power BI tiles loaded at the same time. With the property LoadPowerBIContent you can control which tile is loading.

When clicking on the Power BI tile in your Power App the user will be redirected to Power BI online to do some more in-depth analysis. If you don't want that you need to set the PowerBIInteractions property to false (default is true).
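
A minimal sketch of using that property: if the tile sits on a screen called scrDashboard (a hypothetical name), you could make it load only while that screen is active by setting LoadPowerBIContent to:

App.ActiveScreen = scrDashboard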
Testing the dynamic filter

Conclusion
In this post you learned how to use Power BI tiles, that are published to a dashboard, in Power Apps. Very straightforward to use and they have way more options than the built-in charts. One last point of attention: if you use these Power BI tiles, make sure that your Power Apps users have a Power BI licence and are authorized to see the tiles used.