So friends, some more lessons learnt when developing with U-SQL and Azure Data Lake. I’ll try and keep this short.
Problem
You have a U-SQL stored procedure written and working fine within your Azure Data Lake Analytics service. But we need to add some more business logic or something requiring a little C# magic. This is the main thing I love about U-SQL, having that C# code behind file where I can extend my normal SQL behaviour. So, being a happy little developer you write your class and method to support the U-SQL above and you recreate your stored procedure. Great!
Next, you try to run that stored procedure…
[ExampleDatabase].[dbo].[SimpleProc]();
But are hit with an error, similar to this:
E_CSC_USER_INVALIDCSHARP: C# error CS0103: The name ‘SomeNameSpaceForCodeBehind’ does not exist in the current context.
Why?
Submitting U-SQL queries containing C# code behind methods works fine normally. But once you wrap it up as a stored procedure within the ADL analytics database the complied C# is lost. Almost as if the U-SQL file/procedure no longer has its lovely code behind file at all!
Just to be explicit with the issue. Here is an example stored procedure that I’ve modified from the Visual Studio U-SQL Sample Application project. Note my GetHelloWord method that I’ve added just for demonstration purposes.
DROP PROCEDURE IF EXISTS [dbo].[SimpleProc];
CREATE PROCEDURE [dbo].[SimpleProc]()
AS
BEGIN
@searchlog =
EXTRACT UserId int,
Start DateTime,
Region string,
Query string,
Duration int?,
Urls string,
ClickedUrls string
FROM "/Samples/Data/SearchLog.tsv"
USING Extractors.Tsv();
@WithCodeBehind =
SELECT
*,
SomeNameSpaceForCodeBehind.MyCodeBehind.GetHelloWorld() AS SomeText
FROM @searchlog;
OUTPUT @WithCodeBehind
TO "/output/SearchLogResult1.csv"
USING Outputters.Csv();
END;
This U-SQL file then has the following C#, with my totally original naming conventions. No trolls please, this is not the point of this post 🙂
namespace SomeNameSpaceForCodeBehind
{
public class MyCodeBehind
{
static public string GetHelloWorld()
{
string text = "HelloWorld";
return text;
}
}
}
So, this is what doesn’t work. Problem hopefully clearly defined.
Solution
To work around this problem, instead of using a C# code behind file for the procedure we need to move the class into its own assembly. This requires a little more effort and plumbing, but does solve this problem. Plus, this approach is probably more familiar to people that have ever worked with CLR functions in SQL Sever that they want to use within a stored procedure.
This is what we need to do.
- Add a C# class library to your Visual Studio solution and move the U-SQL code behind into a library name space.
- Build the library and use the DLL to create an assembly within the ADL analytics database. The DLL can live in your ADL store root, in line it or create it from Azure Blob Store. I have another post on that here if your interested.
CREATE ASSEMBLY IF NOT EXISTS [HelloWorld] FROM "assembly/ClassLibrary1.dll";
- Finally, modify your stored procedure to use the assembly instead of the code behind name space. The new stored procedure should look like this.
DROP PROCEDURE IF EXISTS [dbo].[SimpleProc];
CREATE PROCEDURE [dbo].[SimpleProc]()
AS
BEGIN
//Complied library:
REFERENCE ASSEMBLY [HelloWorld];
@searchlog =
EXTRACT UserId int,
Start DateTime,
Region string,
Query string,
Duration int?,
Urls string,
ClickedUrls string
FROM "/Samples/Data/SearchLog.tsv"
USING Extractors.Tsv();
@WithCodeBehind =
SELECT *,
//Changed to use assembly:
HelloWorld.ClassLibrary1.GetHelloWorld() AS SomeText
FROM @searchlog;
OUTPUT @WithCodeBehind
TO "/output/SearchLogResult1.csv"
USING Outputters.Csv();
END;
This new procedure executes without error and gets around the problem above.
I hope this helps and allows you to convert those complex U-SQL scripts to procedures, while retaining any valuable code behind functionality.
Many thanks for reading
What if I just want to call a function in my assembly to get say a file input path or output path (string) ?
Hi Graham, great question. For that you probably want to consider some higher level management task that first resolves the path of the file first. Then either dynamically generates the USQL to include the path. Or maybe have a U-SQL procedure with a parameter where you pass in the path. Either way it would require a 2 step process. Getting the U-SQL code behind to do this first would be a chicken/egg situation. Unless you consider writing yourself a custom extractor. Mmmm. Thinking about it more. I wonder if U-SQL would allow the setting of a local variable as the file path from the result of a code behind method. I think you’ve given me an idea for my next blog post 🙂 Cheers Paul