Tuesday, 26 November 2019

Archiving within a Blob Storage container

Case
New files are delivered to my Blob Storage container each day. After processing them I want to archive them into folders with a timestamp in the name, and delete them after x days. Which Azure resource can I use for this task?
Container with archive folders
Solution
One of the services you could use is an Azure Automation runbook with some PowerShell code. Note that Blob Storage containers only have virtual folders, which means the folder name is stored in the file name. Microsoft Azure Storage Explorer will show them as if they were real folders.
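
To see that flat namespace for yourself, you can list the blob names with a couple of lines of PowerShell from your own machine (with the Az.Storage module installed). This is just a minimal sketch; the account name, key and container name are placeholders for your own values.
# PowerShell code snippet
# List all blob names in the container (placeholder account, key and container)
$Context = New-AzStorageContext -StorageAccountName "bitools" -StorageAccountKey "<your key>"
Get-AzStorageBlob -Container "sensordata" -Context $Context | Select-Object Name
# A blob in a 'folder' is just a blob with a slash in its name:
#   file.csv
#   20191123-171423559/file.csv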

1) Create Automation Account
First we need to create an Azure Automation account to host the runbook with PowerShell code.
  • Go to the Azure portal and create a new resource
  • Search for automation
  • Select Automation Account
  • Choose a useful name for the Automation Account
  • Select your Subscription, Resource Group and the Region
  • Decide whether you need an Azure Run As account and click on the Create button.
    Note: We don't need it for this specific Runbook, because we will use the access key to access the Blob Storage Container. If you intend to add more runbooks to manage other Azure Resources you should probably enable it.
Create Azure Automation Account
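
If you prefer scripting the setup, the Automation Account can also be created from your own machine with the Az.Automation module. This is only a sketch; the resource group, account name and region are example values, not the ones from this post.
# PowerShell code snippet
# Create the Automation Account with PowerShell (example names and region)
Connect-AzAccount
New-AzAutomationAccount -ResourceGroupName "bitools-rg" `
                        -Name "bitools-automation" `
                        -Location "West Europe"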
2) Add Module Az.Storage
Before we start writing code we need to add a PowerShell module called Az.Storage. This module contains the storage cmdlets we need in our code.

If you forget this step you will get error messages while running your code stating that some of your commands are not recognized:
New-AzStorageContext : The term 'New-AzStorageContext' is not recognized as the name of a cmdlet, function, script 
file, or operable program. Check the spelling of the name, or if a path was included, verify that the path is correct 
and try again.
  • Go to the newly created Azure Automation Account
  • Go to Modules in the left menu
  • Click on the Browse Gallery button
  • Search for Az.Storage
  • Click on Az.Storage in the result and import this module
Adding a new module to your Automation Account
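
Importing the module can also be scripted with the Az.Automation cmdlet New-AzAutomationModule, pointing at the package on the PowerShell Gallery. A sketch with the same example names as above; keep in mind that Az.Storage depends on Az.Accounts, so that module must be available first.
# PowerShell code snippet
# Import Az.Accounts first, because Az.Storage depends on it
New-AzAutomationModule -ResourceGroupName "bitools-rg" `
                       -AutomationAccountName "bitools-automation" `
                       -Name "Az.Accounts" `
                       -ContentLinkUri "https://www.powershellgallery.com/api/v2/package/Az.Accounts"
# Wait until Az.Accounts shows up as Available, then import Az.Storage
New-AzAutomationModule -ResourceGroupName "bitools-rg" `
                       -AutomationAccountName "bitools-automation" `
                       -Name "Az.Storage" `
                       -ContentLinkUri "https://www.powershellgallery.com/api/v2/package/Az.Storage"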
3) Create Runbook
Now we need to add a PowerShell Runbook to this new Automation Account.
  • Go to the overview page of your newly created Azure Automation Account
  • Click on Runbooks in the left menu
  • Click on the + Create a runbook button to create a new Runbook
  • Enter a descriptive name for the Runbook
  • Select PowerShell as Runbook type
  • Optionally add a description and click on the Create button
Create Runbook
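
The runbook itself can likewise be created (and, once it contains code, published) with PowerShell. Another hedged sketch; the runbook name is made up for this post.
# PowerShell code snippet
# Create an empty PowerShell runbook (example names)
New-AzAutomationRunbook -ResourceGroupName "bitools-rg" `
                        -AutomationAccountName "bitools-automation" `
                        -Name "ArchiveBlobContainer" `
                        -Type PowerShell `
                        -Description "Archive and clean up blobs in a container"
# After adding the code in step 4 the runbook must be published before you can schedule it
Publish-AzAutomationRunbook -ResourceGroupName "bitools-rg" `
                            -AutomationAccountName "bitools-automation" `
                            -Name "ArchiveBlobContainer"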
4) Edit Runbook code
Now first look up the Storage Account name and key (1 or 2) of the storage account that you want to clean up with this archive script. In the first example we will hardcode these two parameters in the PowerShell code itself. The second example will show you an alternative.
Lookup Account Name and Key
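
You can also retrieve the keys with PowerShell instead of the portal. A small sketch, assuming the Az.Storage module and the example resource group name from above:
# PowerShell code snippet
# Look up both access keys of the storage account (example resource group)
$Keys = Get-AzStorageAccountKey -ResourceGroupName "bitools-rg" -Name "bitools"
$Keys[0].Value   # key1
$Keys[1].Value   # key2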
Next open the new Runbook if it wasn't already opened by the previous step. Copy the code below and paste it into the editor. Then study the code and its comments to understand it and to make sure we don't steal your data. If you never want to delete files, just remove the CLEANUP part at the bottom of the script.

Since the storage account container doesn't have real (sub)folders, we will use virtual folders, which means the folder name is actually stored in the file name. Therefore we cannot move files to a subfolder, but we can rename them instead. Unfortunately renaming a blob is not a standard method in the Azure Storage module, so we will use a custom function from Martin Brandl that makes a copy with the new name and then deletes the original.

# PowerShell code

##########################################################################
############################### PARAMETERS ###############################
##########################################################################
$StorageAccountName = "bitools"
$StorageAccountKey = "XTZSqCcF7q43SwX3FAKEgG5ezCC3l5jor5gUajoTWnnk4qFAKEk52UuR3lYqw1eaFAKEbQe3M4CpSbGDVnSCKg=="
$ContainerName = "sensordata"
$DaysHistory = 31


##########################################################################
########################## Rename-AzStorageBlob ##########################
##########################################################################
# Custom function because Rename method does not exist in Storage modules
# Function copied from about-azure.com and adjusted from AzureRM to Az
# https://about-azure.com/2018/02/13/rename-azure-storage-blob-using-powershell/

function Rename-AzStorageBlob
{
    [CmdletBinding()]
    Param
    (
        [Parameter(Mandatory=$true, ValueFromPipeline=$true, Position=0)]
        [Object]$Blob,
        [Parameter(Mandatory=$true, Position=1)]
        [string]$NewName
    )


  Process {
    $blobCopyAction = Start-AzStorageBlobCopy `
        -ICloudBlob $Blob.ICloudBlob `
        -DestBlob $NewName `
        -Context $Blob.Context `
        -DestContainer $Blob.ICloudBlob.Container.Name
 
    $status = $blobCopyAction | Get-AzStorageBlobCopyState
 
    while ($status.Status -ne 'Success')
    {
        $status = $blobCopyAction | Get-AzStorageBlobCopyState
        Start-Sleep -Milliseconds 50
    }
 
    $Blob | Remove-AzStorageBlob -Force
  }
}


##########################################################################
############################### ARCHIVING ################################
##########################################################################
Write-Output "Start archiving $($ContainerName) in $($StorageAccountName)"

# Get 'context' of the source container
$StorageContext = New-AzStorageContext -StorageAccountName $StorageAccountName -StorageAccountKey $StorageAccountKey

# Determine prefix of datetime
$Prefix = (get-date).ToString("yyyyMMdd-HHmmssfff")

# Loop through all blobs that are not in a virtual folder
$blobs = Get-AzStorageBlob -Container $ContainerName -Context $StorageContext | where {$_.Name -notlike "*/*" }
foreach ($blob in $blobs)
{
    $NewName = $Prefix + "/" + $blob.name
    Write-Output "Archive $($blob.name) to $($NewName)"

    # Rename blob: file.csv => 20191126-171423559/file.csv
    Get-AzStorageBlob -Container $ContainerName -Context $StorageContext -Blob $blob.name | Rename-AzStorageBlob -NewName $NewName
}


##########################################################################
################################ CLEANUP #################################
##########################################################################
Write-Output "Start cleanup $($ContainerName) in $($StorageAccountName)"

# Determine date in history based on given days
$HistoryLimit = (get-date).AddDays(-$DaysHistory).ToString("yyyyMMdd-HHmmssfff")

# Get all blobs where the names are more than 19 characters long
# and from that set get the files whose first 19 characters match
# the pattern 99999999-999999999/ (example: 20191123-171423559/filename.csv)
$blobs = Get-AzStorageBlob -Container $ContainerName -Context $StorageContext | where {$_.Name.Length -gt 19} | where {($_.Name).Substring(0,19) -match "^\d{8}[-]\d{9}/$"}

# Loop through the set of blobs that match our requirements
# and delete the ones older than the history limit
foreach ($blob in $blobs)
{
    # Compare the full 18 character timestamp prefix with the history limit
    if ( ($blob.name).Substring(0,18) -lt $HistoryLimit)
    {
        Write-Output "Remove old blob $($blob.name)"
        Remove-AzStorageBlob -Context $StorageContext -Blob $blob.name -Container $ContainerName
    }
}
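
If you want to convince yourself that the archive prefix and the cleanup filter line up, a quick local sanity check helps. This snippet is not part of the runbook; it only demonstrates that the 18-character prefix plus the slash is exactly the 19 characters the cleanup filter inspects.
# PowerShell code snippet
# Sanity check: prefix (18 chars) + "/" = the 19 chars tested by the cleanup filter
$Prefix = (Get-Date).ToString("yyyyMMdd-HHmmssfff")   # e.g. 20191126-101534123
$Name   = "$Prefix/file.csv"
$Name.Substring(0,19) -match "^\d{8}[-]\d{9}/$"       # returns True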

It is not very flexible or secure to store your Storage Account name and access key in the code itself. So in this second example we will replace them with parameters that can be passed from Azure Data Factory. You can find more details in this blog post. Only the parameter part is different; the rest of the code is unchanged.
# PowerShell code

##########################################################################
############################### PARAMETERS ###############################
##########################################################################
Param
(
    # WebhookData contains the JSON message passed by the caller
    [Parameter(Mandatory=$False,Position=1)]
    [object] $WebhookData
)

# Get all parameters from the body (passed from a Data Factory Web Activity)
$Parameters = (ConvertFrom-Json -InputObject $WebhookData.RequestBody)

# Get the individual parameters from the parameter set
$StorageAccountName = $Parameters.StorageAccountName
$StorageAccountKey = $Parameters.StorageAccountKey
$ContainerName = $Parameters.ContainerName
$DaysHistory = $Parameters.DaysHistory
The parameters that will be provided in Azure Data Factory (ADF) via a JSON message will look like this:
{
    "StorageAccountName":"bitools",
    "StorageAccountKey":"XTZSqCcF7q43SwX3FAKEgG5ezCC3l5jor5gUajoTWnnk4qFAKEk52UuR3lYqw1eaFAKEbQe3M4CpSbGDVnSCKg==",
    "ContainerName":"sensordata",
    "DaysHistory":"31"
}
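
For a quick test outside ADF you can post that same message to the runbook's webhook with Invoke-RestMethod. The URL below is a placeholder; Azure only shows the real webhook URL once, when you create the webhook (see step 5).
# PowerShell code snippet
# Post the JSON message to the runbook webhook (placeholder URL and key)
$Uri  = "https://<your-region>.azure-automation.net/webhooks?token=<token>"
$Body = '{"StorageAccountName":"bitools","StorageAccountKey":"<your key>","ContainerName":"sensordata","DaysHistory":"31"}'
Invoke-RestMethod -Method Post -Uri $Uri -Body $Body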

Another alternative, for when you don't want to use ADF, is to store the key in Azure Key Vault and retrieve it in your runbook with some PowerShell code. The code is very simple, but see this post for more details.
# PowerShell code snippet
# Retrieve value from Key Vault
$StorageAccountKey = (Get-AzKeyVaultSecret -VaultName "MyKeyVault" -Name "StorageAccountKey").SecretValueText
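
Note that the runbook must first sign in to Azure before it can read from the Key Vault; this is where the Run As account from step 1 comes in after all. A sketch of the usual login pattern, assuming the Az.Accounts and Az.KeyVault modules are imported into the Automation Account and "MyKeyVault" is an example vault name:
# PowerShell code snippet
# Sign in with the Run As account before reading the secret
$Conn = Get-AutomationConnection -Name "AzureRunAsConnection"
Connect-AzAccount -ServicePrincipal `
                  -Tenant $Conn.TenantId `
                  -ApplicationId $Conn.ApplicationId `
                  -CertificateThumbprint $Conn.CertificateThumbprint
$StorageAccountKey = (Get-AzKeyVaultSecret -VaultName "MyKeyVault" -Name "StorageAccountKey").SecretValueText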

5) Testing
Testing the functionality of your code is easiest while you still have the hardcoded parameters. Then you can just use the Test pane in the Runbook editor, as in the animation below. If you want to test it with the parameters from ADF, you first need to create a webhook and then create and run an ADF pipeline to test your code.
Testing the code
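
If you need that webhook for the ADF test, it can also be created with PowerShell. Note that the webhook URI is only returned once, at creation time, so store it somewhere safe. A sketch with the example names used earlier in this post:
# PowerShell code snippet
# Create a webhook for the runbook (example names); the URI is only shown once
$Webhook = New-AzAutomationWebhook -ResourceGroupName "bitools-rg" `
                                   -AutomationAccountName "bitools-automation" `
                                   -RunbookName "ArchiveBlobContainer" `
                                   -Name "ArchiveBlobContainerWebhook" `
                                   -IsEnabled $True `
                                   -ExpiryTime (Get-Date).AddYears(1) `
                                   -Force
$Webhook.WebhookURI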
Summary
In this post we explained how to archive files in a Storage Account container using PowerShell. We also showed how to pass parameters like the account key from ADF instead of hardcoding them. However, we also don't want to store secrets in ADF, so in a next post we will show you how to store the Storage Account key in an Azure Key Vault and how to read that secret within ADF to use it as a parameter for this Runbook.
Passing secret parameters using Key Vault, Data Factory and a Runbook