Monday, 24 April 2017

Use BIML to create SSIS packages from csv files

Case
The Case is about importing flat files (CSV’s) without the necessity of metadata.  Because BIML always check’s if the tables are accessible before creating the packages. The first step is to create the tables with BIML and the Second step is creating the SSIS packages to transport the data.

1. Creating tables in the database
2. Packages to fill this database

Because of the size of the solution I’ve created two solution, one for creating the tables and secondly creating the SSSI packages. you can click on the link to go to the other solution (later this month I’will deploy to second script).
In this solution, we create a BIML that is going to create the tables and all the columns are defined as strings

Solution - Creating SSIS packages
we have created the tables in the database, now we can start creating the SSIS package that extract the data from the CSV and transport them into the database.
Our SSIS packages contain a table truncation and a dataflow task with a simple transport from source to destination. We also create a masterpackage which execute the SSIS packages.
The CSV file we are going to use for this solution looks like this

CSV file









1) Determine the colums
Normally (if we look to the example above) we can use this code to determine the columns:

</Columns>
                <Column Name="AgeId" Delimiter=","></Column>
                <Column Name="AgeFrom" Delimiter=","></Column>
                <Column Name="AgeTo" Delimiter=","></Column>
                <Column Name="AgeCategoryEmployee" Delimiter=","></Column>
                <Column Name="AgeCategoryClient" Delimiter="CRLF"></Column>
 <Colums>
 
Note that the last colums uses the CRLF delimiter (CR = Carriage Return and LF = Line Feed)
Now we want BIML to do this for us, so we add a loop that loops through the first row of the textfile  and when the loop is at the end of the row, it uses the CRLF delimiter
The code should look like this:
<Columns>
            <# { # >
                
                StreamReader myFile = new StreamReader(filePath);
                myColumns = myFile.ReadLine().Split(',');
                 myFile.Close();
                
                // to determine the colum delimeter 
                int columnCount = 0;
                string columnDelimiter = ",";

                foreach (string myColumn in myColumns) 
                {
                    columnCount++;
                    if (columnCount == myColumns.Length)
                    {
                        columnDelimiter = "CRLF";
                    }
                    {
                        columnDelimiter = ",";
                    }
                #>
                <Column Name="&lt#=myColumn#>" Delimiter="&lt#=columnDelimiter#>">&lt/Column>
                 } #>
</Columns>


2) Creating the ssis package
The SSIS package we are going to create looks like this:
SISS package









The complete BIML code for creating this packages is:

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <#
    string Applicatie = "Manual";
    string Prefix = "Man";
    
    string fileName;
    string path = @"D:\Drop\Man";
    string[] myFiles = Directory.GetFiles(path, "*.csv");
    
    string[] myColumns;
    #>
    <FileFormats>
    <#
    foreach (string filePath in myFiles)
    {

    #>
    <FlatFileFormat Name="FlatFileFormat<#=Path.GetFileNameWithoutExtension(filePath)#>" RowDelimiter="CRLF" ColumnNamesInFirstDataRow="true" IsUnicode="false">
        <Columns>
            <# 
                
                StreamReader myFile = new StreamReader(filePath);
                myColumns = myFile.ReadLine().Split(',');
                 myFile.Close();
                
                // to determine the column delimeter 
                int columnCount = 0;
                string columnDelimiter = ",";

                 foreach (string myColumn in myColumns) 
                {
                    columnCount++;
                    if (columnCount == myColumns.Length)
                    {
                        columnDelimiter = "CRLF";
                    }
                    {
                        columnDelimiter = ",";
                    }
                #>
                <Column Name="<#=myColumn#>" Delimiter="<#=columnDelimiter#>"></Column>
                <# } #>
            </Columns>
        </FlatFileFormat>
            <#}#>
    </FileFormats>
    <Connections>
            <#
            foreach (string filePath in myFiles)
            {
            
            #>
            <FlatFileConnection Name="FF_CSV - <#=Path.GetFileNameWithoutExtension(filePath)#>" FilePath="<#=filePath#>" FileFormat="FlatFileFormat<#=Path.GetFileNameWithoutExtension(filePath)#>">
                
            </FlatFileConnection>
            <# } #>
            <OleDbConnection 
                Name="STG_<#=Applicatie#>" 
                ConnectionString="Data Source=APPL43;Initial Catalog=dummy_STG;Provider=SQLNCLI11.1;Integrated Security=SSPI;Auto Translate=False;">
            </OleDbConnection>
    </Connections> 
        <Packages>

    <#       // Loop trough the files
            int TableCount = 0;
            foreach (string filePath in myFiles)
            { 
                TableCount++;
                fileName = Path.GetFileNameWithoutExtension(filePath);
                #>
                <Package Name="stg_<#=Prefix#>_<#=TableCount.ToString()#>_<#=fileName#>" ConstraintMode="Linear" AutoCreateConfigurationsType="None" ProtectionLevel="DontSaveSensitive">
                    <Variables>
                        <Variable Name="CountStage" DataType="Int32" Namespace="User">0</Variable>
                    </Variables>               
                    <Tasks>
                        <ExecuteSQL ConnectionName="STG_<#=Applicatie#>" Name="SQL - Truncate <#=fileName#>">
                            <DirectInput>TRUNCATE TABLE dbo.<#=Prefix#>_<#=fileName#></DirectInput>
                        </ExecuteSQL>
                        
                        <Dataflow Name="DFT - Transport CSV_<#=fileName#>">
                            <Transformations>
                                <FlatFileSource Name="SRC_FF - <#=fileName#> " ConnectionName="FF_CSV - <#=Path.GetFileNameWithoutExtension(filePath)#>">
                                </FlatFileSource>
                                
                                <OleDbDestination ConnectionName="STG_<#=Applicatie#>" Name="OLE_DST - <#=fileName#>" >
                                <ExternalTableOutput Table="dbo.<#=Prefix#>_<#=fileName#>"/>
                                </OleDbDestination>
                            </Transformations>
                        </Dataflow>
                    </Tasks>
                </Package>
    <#    }    #>

                <!-- Create Master Package -->
                <Package Name="stg_<#=Prefix#>_0_Master" ConstraintMode="Parallel" AutoCreateConfigurationsType="None" ProtectionLevel="DontSaveSensitive">
                    <Tasks>
                    <#  int TableCount2 = 0;
                        foreach (string filePath in myFiles)
                        { 
                                TableCount2++;
                                fileName = Path.GetFileNameWithoutExtension(filePath); #>
                
                            <ExecutePackage Name="stg_<#=Prefix#>_<#=TableCount2.ToString()#>_<#=fileName#>">
                                <ExternalProjectPackage  Package="stg_<#=Prefix#>_<#=TableCount2.ToString()#>_<#=fileName#>.dtsx" />
                            </ExecutePackage>
                        <#
                        }
                        #>    
                    </Tasks>
                </Package>

    </Packages>
</Biml>

<!--Includes/Imports for C#-->
<#@ template language="C#" hostspecific="true"#>
<#@ import namespace="System.Data"#>
<#@ import namespace="System.IO"#>


We call this “3_Generate_Man_SSIS.bml”
If we run this BIML script this will be the end result

End result










Summary
In the first blogspot, we explained how to create the tables, we need these tables which transport the actual data.
It’s important to first create the tables in the database and then create the SSIS packages.
Of course, this is a simple example for filling the staging area, feel free to add more component, like a row count transformation etc.

This whole solution is built with BIML Express.





No comments:

Post a Comment