I’m hoping the title of this post is fairly self-explanatory. You’re here because, like me, you found that the MSDN language reference page for creating U-SQL assemblies states that it’s possible to store the DLLs in Azure Blob Storage, but it doesn’t actually tell you how. Well, please continue my friends and I’ll show you.
The offending article: https://msdn.microsoft.com/en-us/library/azure/mt763293.aspx
The offending text snippet:
“Specifies the assembly DLL either in form of a binary literal or as a string literal or static string expression/string variable. The binary literal represents the actual .NET assembly DLL, while the string values represent a URI or file path to a .NET assembly DLL file in either an accessible Azure Data Lake Storage or Windows Azure Blob Storage. If the provided source is a valid .NET assembly, the assembly will be copied and registered, otherwise an error is raised.”
Before going any further, this post isn’t a dig at the usual lack of Microsoft documentation. Mainly because when I posted this problem as a question on Stack Overflow the missing information was provided from the horse’s mouth, Mr Michael Rys (@MikeDoesBigData). Therefore, all is forgiven and I’m more than happy to write this post on Microsoft’s behalf and for my fellow developers. #SQLFamily
Thanks again Mike. Moving on…
For the rest of this post I’m assuming that within your Azure subscription you already have the following services deployed and running:
- Azure Data Lake Analytics (ADLa)
- Azure Data Lake Store (ADLs)
- Azure Storage, with a suitable blob container.
I’m also assuming you are comfortable referencing assemblies in your U-SQL scripts, and so far have done so either by inlining the compiled assembly in the U-SQL file or by storing the DLL in ADLs with a simple file path reference from the ADLs root directory.
The most important thing you’ll need to do to get this working, as Mike mentions in the SO answer, is to allow your ADLa service to access the blob storage account. This only requires a few clicks in the Azure portal.
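For comparison, the ADLs file path version looks something like the sketch below. The /Assemblies/ folder is only an assumed location for illustration, so swap in wherever your DLL actually lives.

```usql
// Assumed folder: /Assemblies/ under the ADLs root (hypothetical path).
CREATE ASSEMBLY IF NOT EXISTS [PurpleFrog.Pauls.DataLakeHelperFunctions]
FROM "/Assemblies/PurpleFrog.Pauls.DataLakeHelperFunctions.dll";
```

The only part that changes when moving to blob storage is the string after FROM.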
From the ADLa blade choose Data Sources and click Add Data Source.
Populate the drop down menus on the resulting blade with your preferred choices and click Add. You should then have the storage account listed as an ADLa data source.
Note: The Azure Storage account doesn’t need to be in the same data centre as the ADLa service, unlike ADLa and ADLs, which do.
Next the U-SQL.
To reference a DLL in the blob storage account container we need to create the assembly using the wasb URL. Like this:
Complete CREATE ASSEMBLY syntax.
CREATE ASSEMBLY IF NOT EXISTS [YourSchema].[PurpleFrog.Pauls.DataLakeHelperFunctions]
FROM "wasb://AllSupportingFiles@MiscBlobsAccount.blob.core.windows.net/PurpleFrog.Pauls.DataLakeHelperFunctions.dll";
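Once the assembly is registered, using it in a script works the same way as for an assembly created from ADLs. Here is a minimal sketch, with a couple of loud assumptions: the TextHelpers class and its Tidy method are made-up names standing in for whatever your DLL actually exposes, and the output path is hypothetical too.

```usql
// Reference the assembly that was registered from blob storage.
REFERENCE ASSEMBLY [YourSchema].[PurpleFrog.Pauls.DataLakeHelperFunctions];

// A small inline rowset so the script needs no input files.
@input =
    SELECT *
    FROM (VALUES
            (1, "  first value "),
            (2, "second value")
         ) AS T(Id, RawText);

// TextHelpers.Tidy is a hypothetical static method; replace it with a
// real namespace, class and method from your own assembly.
@output =
    SELECT Id,
           PurpleFrog.Pauls.DataLakeHelperFunctions.TextHelpers.Tidy(RawText) AS CleanText
    FROM @input;

// Hypothetical output location in ADLs.
OUTPUT @output
TO "/output/cleaned.csv"
USING Outputters.Csv();
```

The wasb URL only appears in the CREATE ASSEMBLY statement. The DLL is copied when it is registered, as the MSDN text quoted earlier says, so the script itself just needs the REFERENCE ASSEMBLY line.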
Why Do This
Hopefully pre-empting some comments on this post: given that we can inline the assembly or store it in ADLs, why would you want to put the DLLs in a separate storage account?
Well, this is really just for operational convenience. In a recent project I was working on we had created a lot of custom code. Not just for Azure Data Lake, but also Azure Data Factory. We therefore used a blob storage account as a support bucket for all compiled code and parent object files. This gave us a centralised place to deploy to regardless of what service was consuming the libraries. Again, just for convenience. All DLLs in one place for all services.
I hope you found this post helpful.
Many thanks for reading.