As with most things we develop, it's very normal to have multiple environments for the same solution: dev, test, prod, etc. Azure Data Factory is no exception to this. However, where it does differ slightly is the way it handles publishing to different environments from the Visual Studio tools provided. In this post we’ll explore exactly how to create Azure Data Factory (ADF) configuration files to support such deployments to different Azure services/directories.
For all the examples in this post I’ll be working with Visual Studio 2015 and the ADF extension available from the marketplace or via the link below.
Before we move on, let’s take a moment to note that Azure Data Factory configuration files are purely a Visual Studio feature. At publish time Visual Studio simply takes the config file content and replaces the actual JSON attribute values before deploying to Azure. To be explicit:
- An ADF JSON file with attribute values missing (because they come from config files) cannot be deployed using the Azure portal ‘Author and Deploy’ blade. It will simply fail validation because of the missing content.
- An ADF JSON config file cannot be deployed using the Azure portal ‘Author and Deploy’ blade either. It is simply not understood by the browser-based tool as a valid ADF schema.
Just as an aside, code comments in ADF JSON files are also purely a Visual Studio feature. You can comment your JSON in the usual way in Visual Studio, and at publish time it will strip the comments out for you. Any comments left in code that you copy and paste into the Azure portal will come back as syntax errors! I have already given feedback to the Microsoft product team that code comments in the portal blades would be really handy. But I digress.
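For example, a commented linked service snippet along these lines (the name and values below are made up purely for illustration) is fine in Visual Studio, but paste it into the portal and the // comments will be rejected:

{
    "name": "MyAzureStorageLinkedService", // illustrative name only
    "properties": {
        "type": "AzureStorage",
        "typeProperties": {
            // connection string left blank here, supplied by a config file at publish time
            "connectionString": ""
        }
    }
}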
Apologies in advance if I switch between the words publish and deploy too much; I mean the same thing. I prefer deploy, but in a Visual Studio ADF solution it’s called publish in the menu.
Creating an ADF Configuration File
First, let’s use the Visual Studio tools to create a common set of configuration files. In a new ADF project you have the familiar tree including Linked Services, Pipelines, etc. Now right click on the project and choose Add > New Item. In the dialogue presented, choose Config and add a Configuration File to the project with a suitable name.
I went to town and did a set of three 🙂
Each time you add a config file to your ADF project, or any component for that matter, you’ll notice that Visual Studio tries to help you out by giving you a JSON template or starter for what you might want. This is good, but in the case of ADF config files it isn’t that intuitive. Hence this blog post. Let’s move on.
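For reference, the starter content for a new config file is roughly an empty object with just a $schema attribute, something like the below (the exact schema URL may differ depending on your version of the tools). Everything we add goes in alongside that $schema attribute.

{
    "$schema": "http://datafactories.schema.management.azure.com/vsschemas/V1/Microsoft.DataFactory.Config.json"
}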
Populating the Configuration File
Before we do anything, let me attempt to put into words what we need to do here. Every JSON attribute has a reference path of varying depth to get to its value. When we recreate a value in our config file we need to recreate this reference path exactly, starting from the top level of the component name. In the config file this goes in as a parent (at the same level as the schema) followed by square brackets [ ] which then contain the rest of the content we want to replace. Next, within the square brackets of the component, we need pairs of attributes (name and value). These represent the references to the actual component structure. In the ‘name’ value we start with a $. which represents the root of the component file. Then we build up the tree reference with a dot for each new level. Lastly, the value is as it says: just the value to be used instead of whatever may be written in the actual component file.
Make sense? Referencing JSON with JSON? I said it wasn’t intuitive. Let’s move on and see it.
Let’s populate our configuration files with something useful. Which values you might want to switch between environments of course greatly depends on what your data factory is doing, but let’s start with a few common attributes. For this example let’s alter a pipeline’s schedule start, end and paused values. I always publish to dev as paused to give me more control over running the pipeline.
At the bottom of our pipeline component file I’ve done the following.
        //etc...
        //activities block
        ],
        "start": "1900-01-01", /* placeholder value, overridden by config */
        "end": "1900-01-01", /* placeholder value, overridden by config */
        "isPaused": /* value supplied by config */,
        "pipelineMode": "Scheduled"
    }
}
… which means in my config file I need to create the equivalent set of attribute references and values. Note the dollar for the root, then one level down into the properties namespace, then another dot before the attribute.
{
    "ExactNameOfYourPipeline": [ // <<< Component name. Exactly!
        {
            "name": "$.properties.isPaused",
            "value": true
        },
        {
            "name": "$.properties.start",
            "value": "2016-08-01"
        },
        {
            "name": "$.properties.end",
            "value": "2017-06-01"
        }
    ]
}
A great thing about this approach with the ADF tools in Visual Studio is that any attribute value can be overridden with something from a config file. It's really flexible, and each component can be added in the same way regardless of type (a linked service example follows the list below). There are however some quirks/features to be aware of, as below.
- All parent and child name referencing within the config file must match its partner in the actual component JSON file exactly.
- All referencing is case sensitive, but Visual Studio won't validate this for you in IntelliSense or when building the project.
- In the actual component file some attribute values can be left blank because they come from config. Others cannot, and will result in the ADF project failing to build.
- For any config referencing that fails, you'll only figure it out when you publish and check the Azure portal to see that the JSON file in place still has its original content. Fun.
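To illustrate the point about any attribute of any component, here’s a sketch of a config entry that swaps a linked service connection string per environment. The linked service name and account details below are made up for the example; yours would need to match your own component file exactly:

{
    "MyAzureStorageLinkedService": [
        {
            "name": "$.properties.typeProperties.connectionString",
            "value": "DefaultEndpointsProtocol=https;AccountName=mydevstorageaccount;AccountKey=<dev account key>"
        }
    ]
}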
Right then. Hope that's all clear as mud 🙂
Publishing using Different Configurations
Publishing is basically the easy bit. It involves a wizard, so I don't need to say much here.
Right click on the project in Visual Studio and choose Publish. In the publish items panel of the wizard simply select the config file you want to use for the deployment.
I hope this post is helpful and saves you some time when developing with ADF.
Many thanks for reading.
It appears that only pipelines (and linked services I think) accept parameters. Datasets do not. Is that correct? In my case, I’m wanting to change dataset availability via config parameters.
When I create my config file it says, “”. Table is the name it uses to reference datasets in VS, so I believe that can be updated with the proper reference as well.
One note: to configure an activity, you can reference it by index, e.g.:
{
    "name": "$.properties.activities[0].typeProperties.storedProcedureParameters.whatever",
    "value": "value1"
}
Also, I’m able to configure datasets without any problems, not sure what problem you had @Danny B
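For what it’s worth, an availability override for a dataset would look something along these lines (the dataset name is made up, and the path assumes the standard availability block under the dataset’s properties):

{
    "ExactNameOfYourDataset": [
        {
            "name": "$.properties.availability.frequency",
            "value": "Day"
        },
        {
            "name": "$.properties.availability.interval",
            "value": 1
        }
    ]
}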
Can I include the app.config of a class library project as a configurable file while deploying a custom activity to ADF?