Passing Parameters to U-SQL from Azure Data Factory

Let’s try and keep this post short and sweet. Diving right in, imagine a scenario where we have an Azure Data Factory (ADF) pipeline that includes activities to perform U-SQL jobs in Azure Data Lake (ADL) Analytics. We want to control the U-SQL by passing the ADF time slice value to the script, hopefully a fairly common use case. This isn’t yet very intuitive when constructing the ADF JSON activity, so I hope this post will save you some debugging time.

For my example I’ve created a stored procedure in my ADL Analytics database that accepts a parameter @TimeSliceStart as a string value in the format yyyyMMdd.

[ExampleDatabase].[dbo].[usp_DoSomeStuff](@TimeSliceStart);
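
The body of usp_DoSomeStuff isn’t really important here, but for context a rough sketch might look something like the following. Only the procedure name and its single string parameter come from this post; the source file, columns and output path are placeholders.

USE DATABASE ExampleDatabase;

DROP PROCEDURE IF EXISTS dbo.usp_DoSomeStuff;
CREATE PROCEDURE dbo.usp_DoSomeStuff(@TimeSliceStart string)
AS
BEGIN
    // Hypothetical source data.
    @events =
        EXTRACT EventDate string,
                Value decimal
        FROM "/raw/events.csv"
        USING Extractors.Csv(skipFirstNRows: 1);

    // Use the parameter passed in from ADF to restrict the query to one time slice.
    @slice =
        SELECT *
        FROM @events
        WHERE EventDate == @TimeSliceStart;

    OUTPUT @slice
    TO "/curated/events.csv"
    USING Outputters.Csv();
END;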

This doesn’t have to be a stored procedure. ADF is also happy if you give it U-SQL files or even just inline the entire script. Regardless, the ADF parameter handling is the same.

In my ADF JSON activity I then have the following:

{
    "name": "DataLakeJob1",
    "properties": {
        "description": "Run USQL with timeslice param",
        "activities": [
            {
                "type": "DataLakeAnalyticsU-SQL",
                "typeProperties": {
                    "script": "[ExampleDatabase].[dbo].[usp_DoSomeStuff](@TimeSliceStart);",
                    "degreeOfParallelism": 5,
                    "priority": 1,
                    "parameters": {
                        "TimeSliceStart": "$$Text.Format('{0:yyyyMMdd}', Time.AddMinutes(SliceStart, 0))"
                    }
                }
                // etc ...
            }
        ]
        // etc ...
    }
}

Notice that within the activity’s typeProperties we have a parameters attribute that can include, as children, the actual variables we want to pass to the U-SQL script.

Here are the important things to understand about this ADF parameters attribute.

  • The name of the parameter must match the name of the variable expected by U-SQL exactly.
  • As you would expect, the data types of the expected variable and the JSON parameter must match.
  • It is perfectly acceptable to have multiple parameters in the ADF JSON, and they can be written in any order.

So how does this work?…

What ADF does when calling ADL is take the parameters listed in the JSON and write out a set of U-SQL ‘DECLARE @Variable’ lines. These are then added to the top of the actual U-SQL script before it is handed to ADL as a job to run. You can see this if you go into the ADL Analytics blades in the Azure portal, select the job created by the ADF activity and choose Duplicate Script. This reveals the actual U-SQL used in the job.
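
Concretely, the duplicated script for the activity above ends up looking something like this (the actual date value will of course depend on the slice being processed):

// Injected by ADF from the "parameters" attribute
DECLARE @TimeSliceStart string = "20170213";

// The original script from the activity's "script" property
[ExampleDatabase].[dbo].[usp_DoSomeStuff](@TimeSliceStart);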

Here’s the proof.

[Screenshots: the job created by ADF in the ADL Analytics blade (auto-named ADF-…) and the duplicated job script, showing the injected DECLARE @TimeSliceStart line above the stored procedure call.]

Just knowing what ADF does when converting the JSON parameters to U-SQL declared variables is the main takeaway here.

That’s it! I promised short and sweet.

Many thanks for reading.


29 Responses to Passing Parameters to U-SQL from Azure Data Factory

  • One remark:

    In order to use the parameters during development of the script, you can declare them as EXTERNAL. For your script you could add a line with
    DECLARE EXTERNAL @TimeSliceStart string = "20161101";

    Then during development you can test this dataset and the use of the parameter in the script. When Data Factory runs, it will add the plain DECLARE @TimeSliceStart string line, which overrides the DECLARE EXTERNAL.

    And one question:
    You say that the data types of the parameters should match. But have you found a way to pass parameters as integers? I tried this but got a Data Factory error saying the value of the parameter had to be a string. I solved it by passing a string and parsing it to another variable that is an integer in U-SQL, as sketched below.
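
    As a rough sketch of both points (the integer parameter name here is just a placeholder):

    // Development default; at runtime the plain DECLARE that ADF prepends
    // for the same variable takes precedence over DECLARE EXTERNAL.
    DECLARE EXTERNAL @TimeSliceStart string = "20161101";

    // Integer workaround: ADF passes the value as a string,
    // which is then parsed to an int inside the script.
    DECLARE EXTERNAL @BatchSizeText string = "100";
    DECLARE @BatchSize int = int.Parse(@BatchSizeText);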

  • Thanks for this - I was ripping my hair out digging through the ADLA .NET SDK trying to figure out how Data Factory did it. Running into “To be added.” in the documentation constantly doesn’t help.

  • hi Paul,
    Can the parameter value be substituted from within a config? I tried and it doesn’t substitute.
    I use .properties.activities[0].typeProperties.parameters.paramName.

  • Thanks for your nice blog. I have tried to write a U-SQL script which includes referenced assemblies and C# code-behind, and then execute it from ADF as you have described, but I keep getting the following error:
    E_CSC_USER_DDLENTITYDOESNOTEXIST
    Message
    Procedure ‘XXXXX.dbo.scriptxx’ does not exist.
    Where should the U-SQL script and other files be located? Or do you have any other idea?
    Thanks
    Jakob

    • Hi Jakob, thanks for your comment. Yes, when calling U-SQL scripts from ADF that require C# methods you’ll need to build them separately and add the DLLs as referenced assemblies. This is because ADF can only pass the U-SQL to ADL and not its compiled code-behind file. I think this is a problem with the service, but it can be worked around. I’ll probably write a separate blog post about this, given you’ve mentioned it. Thanks again.

      • Hi Paul,
        Thanks for your fast feedback. I did think the issue was with the code-behind, but still no success. Have you done anything special when uploading the initial job usp_DoSomeStuff? I did all my upload and development using VS2015.
        Thanks
        Jakob

        • Hi Jakob, nothing special. But if C# is required I compile it in Visual Studio as a class library in a separate project, but in the same solution as my data factory and data lake projects. Then I upload the DLL into blob storage and create the assembly in Data Lake Analytics. Finally the procedure just needs a reference to the assembly in order to access the class methods, as sketched below.
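
          As a rough sketch of those registration steps (the assembly name, storage account and path are just placeholders):

          // One-off registration, run after uploading the class library DLL
          // to storage the ADLA account can read.
          USE DATABASE ExampleDatabase;

          DROP ASSEMBLY IF EXISTS [MyHelpers];
          CREATE ASSEMBLY [MyHelpers]
          FROM "wasb://assemblies@mystorageaccount.blob.core.windows.net/MyHelpers.dll";

          // Then, in any script or procedure that needs the C# methods:
          REFERENCE ASSEMBLY [MyHelpers];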

          • Hi Paul,
            I got it. All I needed was to create my script as a procedure, using CREATE PROCEDURE. Now it works.
            Thanks again

            Jakob

  • Can’t pass a character as a parameter, as it gives the error “All strings must be enclosed in double quotes (“)”

    • Hi Kityy, thanks for your comment. Yes, you will need to be careful with the JSON injection into the U-SQL. I’d suggest checking the ADL job to see what ADF has passed for execution.

  • Hi Paul,

    we need to constantly perform calculations based on the input data we are receiving (refresh intervals of ~5-10 minutes). The jobs themselves aren’t extremely complicated (just a couple of aggregates and calculations of timespans) but they should run pretty frequently as they are used for monitoring industrial production.

    Do you think Data Factory along with U-SQL-Jobs is the right tool for the job?
    Analytics-wise U-SQL offers all we need but I’m a bit concerned about the scheduling part.

    Data Factory has a limitation on the minimum frequency of 15 minutes to run jobs (according to https://docs.microsoft.com/en-us/azure/azure-subscription-service-limits#data-factory-limits).

    The only other scheduling option I’m aware of is Azure Scheduler along with custom code to execute the jobs.

    Do you have any suggestions for us?

    Thank you very much!

    • Hi Markus, thanks for your comments. You are correct about the current frequency limit in Azure Data Factory. However, if you require something more frequent then it’s certainly possible to have multiple activities within a single pipeline in ADF, calling the same U-SQL with time offsets to achieve the required throughput. I’d be happy to discuss and advise on the ADF design. Feel free to email sales@purplefrogsystems.com. Many thanks

  • How can I avoid already-processed files in U-SQL?

    • Hi Pawan, thanks for the question. I would suggest you pass the ADF time slice to your U-SQL script. Then include the value as part of your query WHERE clause. Check out this post as a starting point: https://www.purplefrogsystems.com/paul/2017/02/passing-parameters-to-u-sql-from-azure-data-factory/

      • Hello Paul,

        Can you please advise how we can move the file after processing it through U-SQL. I am thinking of calling C# code in U-SQL after all processing to move the file to a processed folder.

        Please advise

        • Hi Gaurav
          Thanks for your comments. Depending on your requirements I’d suggest using Azure Data Factory.
          Cheers
          Paul

          • Thanks for your response.

            Does it mean: use ADF, which calls the U-SQL script? After processing, how can I move the file?

            Please suggest.

          • You need two ADF activities. The first to execute the U-SQL. Then the second to perform the data movement.
            Thanks

  • Hi Paul,

    I am also having the same sort of concern. I am using ADF to call U-SQL script for processing files. After processing, how can I move the file from ADLS to BLOB STORAGE depending on some custom logic?

    Thanks,
    Suchismita

    • Hi Suchismita
      Thanks for your comment. As mentioned, I suggest using Azure Data Factory. This can handle the execution of the U-SQL and then with a downstream activity the movement of the dataset to other storage services.
      Kind regards
      Paul

      • If I have two activities, one to execute the U-SQL script and a second activity to move the same file after processing.

        It could be possible that the second activity moves the file before the first activity has processed the same file using U-SQL.

        We are using V1 ADF; how can we ensure that the first activity with U-SQL completes first and only then the second activity moves the file?

        Thanks,
        Gaurav sharma

  • Hi Paul,

    It is very useful for me. I have one question: when a U-SQL activity is executed from ADF, in ADLA it creates a job with a default name (e.g. ADF-9huu06-abcd) as shown above. I would like to know if we can specify a custom name, so that it would be easier to identify the job. Hope my question was clear. Many thanks.

    • Hi Prabhakar, thanks for the comment. Yes, that makes sense. As I understand it, Microsoft are already aware of this request and it is in the backlog of changes that can be voted for via the feedback website. Cheers

  • Hello, thanks for this good article!

    Do you know if it’s possible to do the same with the .NET SDK?

    Thanks.

