
ADF Dataflow CTE workaround

At the time of writing, it is not possible to write a query using a CTE in the source of a dataflow. However, there are a few options to deal with this limitation:

  • re-write the query using subqueries instead of CTEs
  • use a stored procedure that contains the query and reference the stored proc in the source of the dataflow
  • write the query as a view and reference the view in the source of the dataflow (this is my preferred method and the one I will demo here)

I will use the following query purely as an example to demo this:
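A minimal stand-in for that query, using purely illustrative table and column names, might be:

WITH cte AS
(
    SELECT CustomerID, COUNT(*) AS OrderCount
    FROM dbo.SalesOrders
    GROUP BY CustomerID
)
SELECT CustomerID, OrderCount
FROM cte
WHERE OrderCount > 1;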

This query produces the following output:

If I write this query directly inside the source of the data flow as shown below, when trying to import the schema or preview the data I get the error message “Incorrect syntax near the keyword ‘WITH’”.

However, if I create a view for this query and reference the view in the dataflow instead, this works and I can preview the data:
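For the illustrative query above, the view simply wraps the CTE (the view name here is equally illustrative):

CREATE VIEW dbo.vw_CustomerOrderCounts
AS
WITH cte AS
(
    SELECT CustomerID, COUNT(*) AS OrderCount
    FROM dbo.SalesOrders
    GROUP BY CustomerID
)
SELECT CustomerID, OrderCount
FROM cte
WHERE OrderCount > 1;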

Note, in the source you can also write a query referencing the view (an example is shown below); this is useful if you require additional logic on top of the view.
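For example, something along these lines, using the illustrative view from above:

SELECT CustomerID, OrderCount
FROM dbo.vw_CustomerOrderCounts
WHERE OrderCount > 5;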

As mentioned earlier, you can also write the query in a stored procedure and reference this in a similar way to the above. You should now be able to use this simple method to use CTEs in the source for a dataflow.

How to parameterise the Execute Pipeline activity in Azure Synapse Analytics.

Unfortunately, as of April 2022 there is no option to parameterise or add dynamic content to an “Execute Pipeline” activity to invoke a pipeline run.

However, with the use of a Microsoft API there is a method we can use to overcome this.

In this blog post we’ll use the “Pipeline – Create Pipeline Run” API documented below:

https://docs.microsoft.com/en-us/rest/api/synapse/data-plane/pipeline/create-pipeline-run

Using this API will allow us to dynamically pass in a pipeline name and trigger it. It will also allow us to dynamically pass parameters to the called pipelines.

For this example, we’ll create 3 simple pipelines:

  1. “Wait 5 Seconds”
  2. “Wait 10 Seconds”
  3. “Wait 15 Seconds”

Plus a “Control Pipeline” that we’ll get onto later.

Each of the “Wait” pipelines contains a single “Wait” activity and a parameter called “WaitDuration” with a “Type” of “Int”. The idea is that we will dynamically pass in a wait duration from an array.

The “wait time in seconds” option in the settings tab of the “Wait” activity is then parameterised to refer to the parameter as shown below.
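Assuming the standard way of referencing a pipeline parameter, that dynamic content is simply:

@pipeline().parameters.WaitDuration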

Next, we’ll build the “Control Pipeline” based on a “ForEach” activity which will loop around the array shown below. This array references the Pipeline Name that we want to dynamically trigger plus the Wait Duration we want it to wait for.
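Based on the PipelineName and WaitDuration properties referenced later in the Web activity, the array looks something like this:

[
    {"PipelineName": "Wait 5 Seconds", "WaitDuration": 5},
    {"PipelineName": "Wait 10 Seconds", "WaitDuration": 10},
    {"PipelineName": "Wait 15 Seconds", "WaitDuration": 15}
]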

To use this array we’ll create a “ParameterArray” parameter with “Type” equal to “Array” in the “Control Pipeline”. We’ll set the default value equal to the array from above.

Next, within the settings tab of the “ForEach” activity we have the option of ticking the sequential option and listing the items we want to loop over.

Here we’ve got the “Sequential” option ticked. This just means that it will loop through each item in our array, starting with the very first item. For the “Items” option we’ll add the dynamic content which refers to our “ParameterArray” parameter.
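That dynamic content is just a reference to the pipeline parameter:

@pipeline().parameters.ParameterArray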

Next, within the “ForEach” activity we’ll create a “Web” activity with the following options.

URL:

https://{Your Synapse Workspace Name}.dev.azuresynapse.net/pipelines/@{item().PipelineName}/createRun?api-version=2020-12-01

For example:

https://mytestsynapseworkspace.dev.azuresynapse.net/pipelines/@{item().PipelineName}/createRun?api-version=2020-12-01

*Notice here that I’ve added @{item().PipelineName} into the string which is our pipeline name from our “ParameterArray” parameter. This is what will make our Web API POST request dynamic.

Method:

POST

Headers:

Name : Content-Type

Value : application/json

Body:

{"WaitDuration":@{item().WaitDuration}}

*Also notice here that I’ve added @{item().WaitDuration} into the string, which is our wait duration from our “ParameterArray” parameter for each item. This allows us to feed each of the pipelines with its wait duration.

Integration runtime:

AutoResolveIntegrationRuntime

Authentication:

System Assigned Managed Identity

Resource:

https://dev.azuresynapse.net/

Once we’ve entered all of these details we should have a “Web” activity which looks similar to the image below:

Now let’s trigger our “Control Pipeline”:

As we can see, each of our 3 pipelines has been triggered by just triggering the “Control Pipeline”. We can also see that the respective parameters have been passed in successfully to each of the triggered pipelines.

Wait 5 Seconds:

Wait 10 Seconds:

Wait 15 Seconds:

Note, if you’re using Azure DevOps Git CI/CD integration you’ll need to make sure that the pipelines you want to execute already exist in live mode for the API request to recognise them. If not, you’ll get an error suggesting that the pipeline doesn’t exist.

Also note that there may be some Azure permissions that you’ll need to configure with your Azure Administrator in the Azure portal to allow this to work.

In summary, this method is a trigger-and-forget approach. In this blog there is no logic in the “ForEach” activity to wait for each pipeline to succeed, fail or time out. However, with the use of the “Pipeline Runs – Get” API and an “Until” activity you can easily make this work.
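For reference, the “Pipeline Runs – Get” API follows a similar URL pattern, taking the runId returned by the createRun call, for example:

https://{Your Synapse Workspace Name}.dev.azuresynapse.net/pipelineruns/{runId}?api-version=2020-12-01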

Execute SQL Task in ADF

If you’re new to ADF or Synapse pipelines and are looking for a way to execute SQL commands in a pipeline, unfortunately there is no equivalent to the Execute SQL Task in SSIS. However, I have found some alternatives which work well for me.

For something as simple as a SELECT statement where you want to return results to be used later, you can use a Lookup activity:
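For example, a query along these lines (the table name here is illustrative), returning a DateKey for later activities to reference:

SELECT MAX(DateKey) AS DateKey
FROM dbo.DimDate;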

You can then reference the output of this activity in other activities using dynamic content such as @activity('LookUp').output.firstRow.DateKey.

Although you can also use the Lookup activity for DML commands, it is not designed for this and so I’ve found another way to do this. First, create the following stored procedure in the database:
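A minimal version of such a stored procedure (the name dbo.ExecuteSQL is illustrative) would be:

CREATE PROCEDURE dbo.ExecuteSQL
    @sql NVARCHAR(MAX)
AS
BEGIN
    -- Execute whatever SQL command is passed in
    EXEC sp_executesql @sql;
END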

This stored procedure will execute the SQL command passed in via the @sql parameter. Now you can use the Stored procedure activity to point to this newly created stored procedure and pass in a SQL command as shown here:
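For example, the @sql parameter value could be a simple command such as (the table name is illustrative):

TRUNCATE TABLE dbo.MyTable;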

This activity will now execute the SQL command stated in the parameter value. Using this approach helps keep things consistent across your pipelines for SQL commands as they all use the Stored procedure activity, and the only difference is the SQL command used in the parameters section.

You can also add dynamic content to dynamically build up the SQL command that is passed into the stored procedure; this allows you to use the same configuration for multiple Stored procedure activities but with different outcomes. For example, if you had pipeline variables, the above TRUNCATE TABLE command could be replaced with @concat('TRUNCATE TABLE ',variables('SchemaName'),'.',variables('TableName')) and this would truncate a different table per pipeline based on the variable values.

Note: there is a limit on the number of characters the parameter value can take, so if you have a large SQL command it is best to create a stored procedure specifically for that command and to reference that stored procedure in the activity.

Pretty simple in the end, I hope this helps!

ADF Switch Activity – A neat solution for nested IFs.

Whilst carrying out some work for a client using Azure Data Factory I was presented with the challenge of triggering different activities depending on the result of a stored procedure.

In fact the challenge posed was to…

  • Execute ‘Copy A’ activity if the result of a stored procedure returned (A),
  • Execute ‘Copy B’ activity if the result of a stored procedure returned (B),
  • Execute ‘Copy C’ activity if the result of a stored procedure returned (C)
  • Finally execute an error stored procedure if the result of the stored procedure returned (D).

Prior to the ‘Switch’ activity I could have achieved this using 4 ‘IF’ activities connected to a lookup activity, as shown in the snip below using my ‘Wait’ example pipeline.

However a neater solution is to use the ‘Switch’ activity to do this work instead. I’ll now jump straight into a worked example to show you how I achieved this.

I created a demo.RandomNumberGenerator stored procedure in our Purple Frog Demo database which declares a variable and sets it equal to a random number between 1 and 4. The stored procedure then uses this variable in a case statement to return a string. So if the @randomnumber variable returns 1 the case statement returns ‘Wait 1 Minute’.
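A sketch of that stored procedure, reconstructed from the description above, looks like this:

CREATE PROCEDURE demo.RandomNumberGenerator
AS
BEGIN
    -- Random number between 1 and 4
    DECLARE @randomnumber INT = FLOOR(RAND() * 4) + 1;

    SELECT CASE @randomnumber
               WHEN 1 THEN 'Wait 1 Minute'
               WHEN 2 THEN 'Wait 2 Minutes'
               WHEN 3 THEN 'Wait 3 Minutes'
               WHEN 4 THEN 'Wait 4 Minutes'
           END AS ADFAction;
END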

I then used a lookup activity in ADF to return the result of this stored procedure which I then used to ‘feed’ my ‘Switch’ activity.

When I preview this lookup activity it just returns the result of the stored procedure. Run time and time again, it returns a random ‘ADFAction’ (as named in my case statement) generated by the rand() SQL function. So in the scenario above it returned an action of ‘Wait 1 Minute’. If I were to preview the stored procedure again it might return ‘Wait 4 Minutes’ instead.

I can then connect my switch activity to this lookup activity using the ‘success’ green output connector.

I now need to add some dynamic content to refer to the output of my lookup activity. I don’t need to create dynamic nested IF content to say if my stored procedure equals ‘Wait 1 Minute’ then do this, if my stored procedure equals ‘Wait 2 Minutes’ then do this… all I need to do is refer to the output of the connected lookup activity ‘Random Number Lookup Activity’.

So in my example this is simply @activity('Random Number Lookup Activity').output.firstrow.adfaction. ADF will then match the output of the stored procedure to the case statements I provide it. The default activity is just a set of activities that are executed when the expression evaluation isn’t satisfied, i.e. when the output of my stored procedure doesn’t match any of the case statements I provide it.

In the snip below I have provided 4 case statements which match all of the possible outputs from my stored procedure. Essentially there isn’t any need for a default activity as my stored procedure will always return 1 of 4 results, but it’s useful to know for future reference.

Within each case statement there is an activity which I would like ADF to perform if it matches the result of my stored procedure. In my example these are just ‘Wait’ activities. So if the stored procedure returns ‘Wait 1 Minute’ I’ve put a ‘Wait’ activity within the case statement to wait 1 minute (or 60 seconds).

When I first ran this pipeline in debug mode I could see that the output of the stored procedure was ‘Wait 2 Minutes’. The switch activity then matched this to the case statements I provided and performed the activity within the ‘Wait 2 Minutes’ case statement and triggered my ‘Wait 2 Minutes’ activity.

When I debugged another run my stored procedure returned ‘Wait 4 Minutes’. The switch activity then executed my ‘Wait 4 Minutes’ activity. Perfect!

So it’s pretty much as simple as that!

I hope this helps all you ADF’ers out there!

ADF: Where have my connections gone?

If you’ve been doing any development work in ADF this week you might have noticed that “Connections” has moved. But where has it gone?

The old location

When you click onto “Connections” now you’ll receive the following message:

Management Hub?

Clicking this button takes you to the new Management Hub, the new fourth icon which goes alongside the existing three.

New Manage (Management Hub)

Within this area you now have access to your Linked Services, Integration Runtimes and Triggers. These all have the same options as before, they’ve just moved!

Moved options and New options

You do now have access to more detailed Git configuration options and also the ability to change the Parameterization templates for your ARM templates. This is only accessible when you have a Git repository set up and the Parameter file within it.

You can currently still access Triggers from the main Author window, but I would recommend getting used to finding them in the Management Hub to keep everything together.

Release details of the Management Hub from Microsoft can be found here: https://docs.microsoft.com/en-us/azure/data-factory/author-management-hub

Azure Data Factory (ADF) Parent Activities don’t trigger ON ERROR OUTPUT when Child Activities Error… What happens?

So that title is a big question that a client of ours came across, and it required some additional investigation… What does happen, and how do we get around it?

In the below example we’ll use a simple IF Activity as our Parent Activity. To explain briefly how it functions, you provide it with a query that evaluates to either True or False. This then passes down to a set of Activities If True and a different set of Activities If False. We then have three Outputs: On Success, On Failure and On Completion. This looks like:

(Image 1) Pipeline with If Activity and three outputs
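For reference, the evaluation expression is just dynamic content that must return a boolean, for example something like this (the activity and column names are illustrative):

@equals(activity('Lookup1').output.firstRow.RecordCount, 0)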

The IF Activity therefore has logically four different things it does given the query criteria:

  1. If the query evaluates correctly, then:
    • If True, Pass to True Activities
    • If False, Pass to False Activities
  2. Error Activity if the query doesn’t evaluate to True or False.

Following that, we have the below options:

  • When 1. If True Activities complete with Success, Trigger On Success Output and On Completion Output
  • When 1. If False Activities complete with Success, Trigger On Success Output and On Completion Output
  • When 2. completes with Error, Trigger On Error Output and On Completion Output

The issue occurs with the below options:
What happens when the If True Activities complete with Error?
What happens when the If False Activities complete with Error?

Logically you would expect that to cause the IF Activity to error as well. Right? WRONG!

Although it shows as if it has an error in the pipeline view:

(Image 2) If Condition Error, without passing down On Success or On Failure?

The details show “Activity failed because an inner activity failed.”:

(Image 3) If Condition Error Message

This means that the IF Activity itself has still started down its success pathway and is now just showing the error from further down. This causes the IF Activity to complete with neither Success nor Error! (You can see this in Image 2 above). You can also see that the On Completion Output has triggered successfully.

This means that the Error Output of the If Activity can only be used to handle Errors on the Evaluation of the If Query itself. (When it can’t get to true or false, or can’t execute for another reason).

If you want to monitor the steps inside the IF Activity, you’ll need to put those output pathways inside the True and/or False activities themselves and handle this at a deeper level.

I have launched a new Twitter account http://www.twitter.com/PurpleFrogReiss so please follow me there for updates on when I post new blog posts!
