
What’s New in Azure Data Factory Version 2 (ADFv2)

I’m sure for most cloud data wranglers the release of Azure Data Factory Version 2 has been long overdue. Well good news friends. It’s here! So, what new features does the service now offer for handling our Azure data solutions?… In short, loads!

In this post, I’ll try and give you an overview of what’s new and what to expect from ADFv2. However, I’m sure more questions than answers will be raised here. As developers we must ask why and how when presented with anything. But let’s start somewhere.

Note: the order of the sub headings below was intentional.

Before diving into the new and shiny I think we need to deal with a couple of concepts to understand why ADFv2 is a completely new service and not just an extension of what version 1 offered.

Let’s compare Azure Data Factory Version 1 and Version 2 at a high level.

  • ADFv1 – is a service designed for the batch data processing of time series data.
  • ADFv2 – is a very general-purpose hybrid data integration service with very flexible execution patterns.

This makes ADFv2 a very different animal and something that can now handle scale out control flow and data flow patterns for all our ETL needs. Microsoft seems to have got the message here, following lots of feedback from the community, that this is the framework we want for developing our data flows. Plus, it's how we've been working for a long time with the very mature SQL Server Integration Services (SSIS).

Concepts:

Integration Runtime (IR)

Everything done in Azure Data Factory v2 will use the Integration Runtime engine. The IR is the core service component for ADFv2. It is to the ADFv2 JSON framework of instructions what the Common Language Runtime (CLR) is to the .Net framework.

Currently the IR can be virtualised to live in Azure, or it can be used on premises as a local emulator/endpoint. To give each of these instances their proper JSON label, the IR can be 'SelfHosted' or 'Managed'. To try and put that into context, consider the ADFv1 Data Management Gateway as a self-hosted IR endpoint (for now). This distinction between self-hosted and managed IRs will also be reflected in the data movement costs on your subscription bill, but let's not get distracted with pricing yet.

The new IR is designed to perform three operations:

  1. Move data.
  2. Execute ADF activities.
  3. Execute SSIS packages.

Of course, points 1 and 2 here aren't really anything new as we could already do this in ADFv1, but point 3 is what should spark the excitement. It is this ability to transform our data within Azure that we've badly needed.

With the IR in ADFv2 this means we can now lift and shift our existing on premises SSIS packages into the cloud or start with a blank canvas and create cloud based scale out control flow and data flow pipelines, facilitated by the new capabilities in ADFv2.

Without crossing any lines, the IR will become the way you start using SSIS in Azure, regardless of whether you decide to wrap it in ADFv2 or not.

Branching

This next concept won't be new, I assume, for anyone that's used SSIS. But it's great to learn that we now have it available in the ADFv2 control flow (at an activity level).

Post execution our downstream activities can now be dependent on four possible outcomes as standard.

  • On success
  • On failure
  • On completion
  • On skip

Also, custom 'if' conditions will be available for branching based on expressions (more on expressions later).
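
To give you a feel for how this might hang together, here's a rough sketch of an 'if' condition activity. Treat it as illustrative only; the parameter name and the empty child activity lists are my own placeholders rather than anything lifted from the service documentation.

{
    "name": "CheckEnvironment",
    "type": "IfCondition",
    "typeProperties": {
        "expression": {
            "type": "Expression",
            "value": "@equals(pipeline().parameters.Environment, 'Dev')"   //placeholder parameter
        },
        "ifTrueActivities": [],    //activities to run when the expression is true
        "ifFalseActivities": []    //activities to run when it is false
    }
}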


That’s the high-level concepts dealt with. Now, for ease of reading let’s break the new features down into two main sections. The service level changes and then the additions to our toolkit of ADF activities.

Service Features:

Web Based Developer UI

This won’t be available for use until later in the year but having a web based development tool to build our ADF pipelines is very exciting!… No more hand crafting the JSON. I’ll leave this point just with a sneaky picture. I’m sure this explains more than I can in words.

It will include an interface to GitHub for source control and the ability to execute the activities directly in the development environment.

For field mappings between source and destination the new UI will also support a drag and drop panel, like SSIS.

Better quality screenshots to follow as soon as it's available.

Expressions & Parameters

Like most other Microsoft data tools, expressions give us that valuable bit of inline extensibility to achieve things more dynamically when developing. Within our ADFv2 JSON we can now influence the values of our attributes in a similar way using a rich new set of custom inner syntax, secondary to the ADF JSON. To support the expressions factory-wide, parameters will become first class citizens in the service.

As a basic example, before we might do something like this:

"name": "value"

Now we can have an expression and return the value from elsewhere, maybe using a parameter like this:

"name": "@parameters('StartingDatasetName')"

The @ symbol is important here, marking the start of the inline expression. The expression syntax is rich and offers a host of inline functions to call and manipulate our service (a rough example follows the list below). These include:

  • String functions – concat, substring, replace, indexof etc.
  • Collection functions – length, union, first, last etc.
  • Logic functions – equals, less than, greater than, and, or, not etc.
  • Conversion functions – coalesce, xpath, array, int, string, json etc.
  • Math functions – add, sub, div, mod, min, max etc.
  • Date functions – utcnow, addminutes, addhours, format etc.
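
As a rough illustration (and nothing more), a couple of these functions strung together inside dataset attributes might look something like the snippet below. The attribute and parameter names are placeholders of my own, and I'm assuming the date formatting function surfaces as 'formatDateTime'.

{
    "folderPath": "@concat('landing/', pipeline().parameters.SourceSystem)",   //placeholder parameter
    "fileName": "@concat(formatDateTime(utcnow(), 'yyyyMMdd'), '.csv')"        //date stamped file name
}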

System Variables

As a good follow on from the new expressions/parameters available we now also have a handful of system variables to support our JSON. These are scoped at two levels with ADFv2.

  1. Pipeline scoped.
  2. Trigger scoped (more on triggers later).

The system variables extend the parameter syntax allowing us to return values like the data factory name, the pipeline name and a specific run ID. Variables can be called in the following way using the new @ symbol prefix to reference the dynamic content:

"attribute": "@pipeline().RunId"

Inline Pipelines

For me this is a deployment convenience thing. Until now our linked services, datasets and pipelines have been separate JSON files within our Visual Studio solution. Now an inline pipeline can house all its required parts within its own properties. Personally, I like having a single reusable linked service for various datasets in one place that only needs updating with new credentials once. Why would you duplicate these settings as part of several pipelines? Maybe if you want some complex expressions to influence your data handling and you are limited by the scope of a system variable, an inline pipeline may then be required.

Anyway, this is what the JSON looks like:

{
    "name": "SomePipeline",
    "properties": {
        "activities": [],         //before
        "linkedServices": [],     //now available
        "datasets": [],           //now available
        "parameters": []          //now available
    }
}

Beware: if you use the ADF copy wizard via the Azure portal, an inline pipeline is what you'll now get back.

Activity Retry & Pipeline Concurrency

In ADFv2 our activities will be categorised as control and non-control types. This is mainly to support the use of our new activities like 'ForEach' (more on the activity itself later). A 'ForEach' activity sits within the category of a control type, meaning it will not have retry, long retry or concurrency options available within its JSON policy block. I think it's logical that something like a sequential loop can't run concurrently, so just be aware that such JSON attributes will now be validated depending on the category of the activity.

Our familiar and existing activities like ‘Copy’, ‘Hive’ and ‘U-SQL’ will therefore be categorised as non-control types with policy attributes remaining the same.
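
For reference, a policy block on a non-control activity might look something like this. I've used the familiar version 1 style attribute names, so treat the exact names as assumptions until the v2 documentation firms up.

{
    "name": "CopySomething",
    "type": "Copy",                  //non-control type, so a policy block is valid
    "policy": {
        "concurrency": 1,            //not available on control types such as ForEach
        "retry": 3,
        "longRetry": 0,
        "timeout": "01:00:00"
    }
}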

Event Triggers

Like our close friend Azure Logic Apps, ADFv2 can perform actions based on triggered events. So far, the only working example of this requires an Azure Blob Storage account that will output a file arrival event. It will be great to replace, with this event based approach, those time series polling activities that needed to keep retrying until the file appeared.
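
The trigger JSON for this hasn't been pinned down publicly yet, but going by the schedule trigger structure I'd expect something along these lines. The type name, the blob path attribute and the event name are guesses on my part.

{
    "name": "FileArrivalTrigger",
    "properties": {
        "type": "BlobEventsTrigger",                       //assumed type name
        "typeProperties": {
            "blobPathBeginsWith": "/landing/blobs/",       //assumed attribute
            "events": [ "Microsoft.Storage.BlobCreated" ]
        },
        "pipelines": [
            {
                "pipelineReference": {
                    "type": "PipelineReference",
                    "referenceName": "SomePipeline"
                }
            }
        ]
    }
}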

Scheduled Triggers

You guessed it. We can now finally schedule our ADF executions using a defined recurring pattern (with enough JSON). This schedule will sit above our pipelines as a separate component within ADFv2.

  • A trigger will be able to start multiple pipelines.
  • A pipeline can be started by multiple scheduled triggers.

Let’s look at some JSON to help with the understanding.

{
  "properties": {
    "type": "ScheduleTrigger",
    "typeProperties": {
      "recurrence": {
        "frequency": "<Minute, Hour, Day, Week or Year>",
        "interval": 1,                    // optional, how often to fire (defaults to 1)
        "startTime": "<datetime>",
        "endTime": "<datetime>",
        "timeZone": "UTC",
        "schedule": {                     // optional (advanced scheduling specifics)
          "hours": [ ],                   // 0-23
          "minutes": [ ],                 // 0-59
          "weekDays": [ ],                // Monday-Sunday
          "monthDays": [ ],               // 1-31
          "monthlyOccurrences": [
            {
              "day": "<Monday-Sunday>",
              "occurrence": 1             // 1-5
            }
          ]
        }
      },
      "pipelines": [                      // pipelines to start
        {
          "pipelineReference": {
            "type": "PipelineReference",
            "referenceName": "<pipeline name>"
          },
          "parameters": {
            "<parameter name>": {
              "type": "Expression",
              "value": "<expression value>"
            },
            "<parameter name>": "<static value>"
          }
        }
      ]
    }
  }
}

Tumbling Window Triggers

For me, ADFv1 time slices simply have a new name. A tumbling window is a time slice in ADFv2. Enough said on that I think.
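
That said, for completeness, here's roughly what I expect the trigger JSON to look like, with the window start and end exposed as trigger scoped values that can be passed to the pipeline. The attribute names are my best guess at this stage.

{
    "name": "DailyTumblingWindow",
    "properties": {
        "type": "TumblingWindowTrigger",
        "typeProperties": {
            "frequency": "Hour",
            "interval": 24,                          //a 24 hour window, the old daily time slice
            "startTime": "2017-10-01T00:00:00Z",
            "maxConcurrency": 10
        },
        "pipeline": {
            "pipelineReference": {
                "type": "PipelineReference",
                "referenceName": "SomePipeline"
            },
            "parameters": {
                "WindowStart": "@trigger().outputs.windowStartTime",   //assumed trigger scoped values
                "WindowEnd": "@trigger().outputs.windowEndTime"
            }
        }
    }
}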

Depends On

We know that ADF is a dependency driven tool in terms of datasets. But now activities are also dependency driven, with the execution of one providing the necessary information for the execution of the second. The introduction of a new 'DependsOn' attribute/clause can be used within an activity to drive this behaviour.

The ‘DependsOn’ clause will also provide the branching behaviour mentioned above. Quick example:

"dependsOn": [ { "dependencyConditions": [ "Succeeded" ], "activity": "UpstreamActivity" } ]

More to come with this explanation later when we talk about the new ‘LookUp’ activity.

Azure Monitor & OMS Integration

Diagnostic logs for various other Azure services have been available for a while in Azure Monitor and OMS. Now, with a little bit of setup, ADFv2 will be able to output much richer logs with various metrics available across our data factory services. These metrics will include:

  • Successful pipeline runs.
  • Failed pipeline runs.
  • Successful activity runs.
  • Failed activity runs.
  • Successful trigger runs.
  • Failed trigger runs.

This will be a great improvement on the current PowerShell or .Net work required with version 1 just to monitor issues at a high level.
If you want to know more about Azure Monitor go here: https://docs.microsoft.com/en-us/azure/monitoring-and-diagnostics/monitoring-overview-azure-monitor

PowerShell

It’s worth being aware that to support ADFv2 there will be a new set of PowerShell cmdlets available within the Azure module. Basically, all named the same as the cmdlets used for version 1 of the service, but now including ‘V2’ somewhere in the cmdlet name and accepting parameters specific to the new features.

Let’s start with the obvious one:

New-AzureRmDataFactoryV2 `
	-ResourceGroupName "ADFv2" `
	-Name "PaulsFunFactoryV2" `
	-Location "NorthEurope"

Or, a splatting friendly version for the PowerShell geeks 🙂

$parameters = @{
    Name = "PaulsFunFactoryV2"
    Location = "NorthEurope"
    ResourceGroupName = "ADFv2"
}
New-AzureRmDataFactoryV2  @parameters

Pricing

This isn’t a new feature as such, but probably worth mentioning that with all the new components and functionality in ADFv2 there is a new pricing model that you’ll need to do battle with. More details here: https://azure.microsoft.com/en-gb/pricing/details/data-factory/v2

Note: the new pricing tables for SSIS as a service with variations on CPU, RAM and Storage!


Activities:

Lookup

This is not an SSIS data transformation lookup! For ADFv2 we can look up a list of datasets to be used in another downstream activity, like a Copy. I mentioned earlier that we now have a 'DependsOn' clause in our JSON; lookup is a good example of why we might use it.

Scenario: we have a pipeline containing two activities. The first looks up some list of datasets (maybe some tables in a SQL DB). The second performs the data movement using the results of the lookup so it knows what to copy. This is very much a dataset level handling operation and not a row level data join. I think a picture is required:

Here’s a JSON snippet, which will probably be a familiar structure for those of you that have ever created an ARM Template.

{
"name": "SomePipeline",
"properties": {
    "activities": [
        {
            "name": "LookupActivity", //First
            "type": "Lookup"
        },
        {
            "name": "CopyActivity", //Second
            "type": "Copy",              
            "dependsOn": [  //Dependancy
                {
                    "activity": "LookupActivity"
                }
            ],
            "inputs": [],  //From Lookup
            "outputs": []
        }
    ]        
}}

Currently the following sources can be used as lookups, all of which need to return a JSON dataset.

  • Azure Storage (Blob and Table)
  • On Premises Files
  • Azure SQL DB

HTTP

With the HTTP activity, we can call out to any web service directly from our pipelines. The call itself is a little more involved than a typical web hook and requires an XML job request to be created within a workspace. Like other activities, ADF doesn't handle the work itself. It passes off the instructions to some other service. In this case it uses the Azure Queue Service. The queue service is the compute for this activity that handles the request and HTTP response; if successful, this gets thrown back up to ADF.

There’s something about needing XML inside JSON for this activity that just seems perverse. So much so that I’m not going to give you a code snippet 🙂

Web (REST)

Our new web activity type is simply a REST API caller, which I assume doesn't require much more explanation. In ADFv1 if we wanted to make a REST call a custom activity was required and we needed C# for the interface interaction. Now we can do it directly from the JSON with child attributes to cover all the usual suspects for REST APIs (a rough sketch follows the list below):

  • URL
  • Method (GET, POST, PUT)
  • Headers
  • Body
  • Authentication
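
As a rough sketch only (the endpoint, headers and credentials below are obviously placeholders of mine), a web activity might be shaped something like this:

{
    "name": "CallSomeApi",
    "type": "WebActivity",
    "typeProperties": {
        "url": "https://example.com/api/refresh",         //placeholder endpoint
        "method": "POST",
        "headers": { "Content-Type": "application/json" },
        "body": { "factory": "@pipeline().DataFactory" },
        "authentication": {                                //basic auth shown purely for illustration
            "type": "Basic",
            "username": "someuser",
            "password": "********"
        }
    }
}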

ForEach

The ForEach activity is probably self-explanatory for anyone with an ounce of programming experience. ADFv2 brings some enhancements to this. You can use a ForEach activity to simply iterate over a collection of defined items one at a time as you would expect. This is done by setting the IsSequential attribute of the activity to True. But you also have the ability to perform the activity in parallel, speeding up the processing time and using the scaling power of Azure.

For example: if you had a 'ForEach' activity iterating over a 'Copy' operation with 10 different items, and the attribute "isSequential" set to false, all copies will execute at once. ForEach then offers a new maximum of 20 concurrent iterations, compared to a single non-control activity with its concurrency supporting only a maximum of 10.

To try and clarify, the ForEach activity accepts items and is developed as a recursive thing. But on execution you can choose to process them sequentially or in parallel (up to a maximum of 20). Maybe a picture will help:

Going even deeper, the ‘ForEach’ activity is not confined to only processing a single activity, it can also iterate over a collection of other activities, meaning we can nest activities in a workflow where ‘ForEach’ is the parent/master activity. The items clause for the looping still needs to be provided as a JSON array, maybe by an expression and parameter within your pipeline. But those items can reference another inner block of activities.
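
To put some JSON around that, here's a rough sketch of a parallel 'ForEach' wrapping a copy. The parameter name is a placeholder and the batch count attribute is my assumption for the concurrency cap.

{
    "name": "CopyEachTable",
    "type": "ForEach",
    "typeProperties": {
        "isSequential": false,                             //run iterations in parallel
        "batchCount": 20,                                  //assumed cap on concurrent iterations
        "items": {
            "value": "@pipeline().parameters.TableList",   //a JSON array supplied as a pipeline parameter
            "type": "Expression"
        },
        "activities": [
            {
                "name": "CopyActivity",
                "type": "Copy"
                //source, sink and datasets omitted for brevity
            }
        ]
    }
}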

There will definitely be a follow up blog post on this one with some more detail and a better explanation, come back soon 🙂

Meta Data

Let's start by defining what metadata is within the context of ADFv2. Metadata includes the structure, size and last modified date information about a dataset. A metadata activity will take a dataset as an input and output various information about what it finds. This output could then be used as a point of validation for some downstream operation. Or, for some dynamic data transformation task that needs to be told what dataset structure to expect.

The input JSON for this dataset type needs to know the basic file format and location. Then the structure will be worked out based on what it finds.

{
  "name": "MyDataset",
  "properties": {
    "type": "AzureBlob",
    "linkedService": {
      "referenceName": "StorageLinkedService",
      "type": "LinkedServiceReference"
    },
    "typeProperties": {
      "folderPath": "container/folder",
      "fileName": "file.json",
      "format": {
        "type": "JsonFormat",
        "nestedSeparator": ","
      }
    }
  }
}

Currently, only datasets within Azure blob storage are supported.
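
For context, the activity that consumes a dataset like the one above might look roughly like this. The field names in 'fieldList' are assumptions on my part.

{
    "name": "GetFileMetadata",
    "type": "GetMetadata",
    "typeProperties": {
        "dataset": {
            "referenceName": "MyDataset",                        //the blob dataset defined above
            "type": "DatasetReference"
        },
        "fieldList": [ "structure", "size", "lastModified" ]     //assumed field names
    }
}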

I'm hoping you are beginning to see how branching, depends on conditions, expressions and parameters are bringing you new options when working with ADFv2, where one new feature uses another.


The next couple as you’ll know aren’t new activities, but do have some new options available when creating them.

Custom

Previously in our .Net custom activity code we could only pass static extended properties from the ADF JSON down to the C# class. Now we have a new ‘referenceObjects’ attribute that can be used to access information about linked services and datasets. Example JSON snippet below for an ADFv2 custom activity:

{
  "name": "SomePipeline",
  "properties": {
    "activities": [
      {
        "type": "DotNetActivity",
        "linkedServiceName": {
          "referenceName": "AzureBatchLinkedService",
          "type": "LinkedServiceReference"
        },
        "referenceObjects": { //new bits
          "linkedServices": [],
          "datasets": []
        },
        "extendedProperties": {}
      }
    ]
  }
}

This completes the configuration data for our C# methods giving us access to things like the connection credentials used in our linked services. Within the IDotNetActivity class we need the following methods to get these values.

static void Main(string[] args)
{
    //Note: the generic type arguments below are my best guess at what the blog
    //formatting stripped out; adjust to match the SDK model types in your solution.
    CustomActivity customActivity =
        SafeJsonConvert.DeserializeObject<CustomActivity>(File.ReadAllText("activity.json"),
        DeserializationSettings);
    List<LinkedService> linkedServices =
        SafeJsonConvert.DeserializeObject<List<LinkedService>>(File.ReadAllText("linkedServices.json"),
        DeserializationSettings);
    List<Dataset> datasets =
        SafeJsonConvert.DeserializeObject<List<Dataset>>(File.ReadAllText("datasets.json"),
        DeserializationSettings);
}
 
static JsonSerializerSettings DeserializationSettings
{
    get
    {
        var DeserializationSettings = new JsonSerializerSettings
        {
            DateFormatHandling = Newtonsoft.Json.DateFormatHandling.IsoDateFormat,
            DateTimeZoneHandling = Newtonsoft.Json.DateTimeZoneHandling.Utc,
            NullValueHandling = Newtonsoft.Json.NullValueHandling.Ignore,
            ReferenceLoopHandling = Newtonsoft.Json.ReferenceLoopHandling.Serialize
        };
        DeserializationSettings.Converters.Add(new PolymorphicDeserializeJsonConverter<Activity>("type"));
        DeserializationSettings.Converters.Add(new PolymorphicDeserializeJsonConverter<LinkedService>("type"));
        DeserializationSettings.Converters.Add(new PolymorphicDeserializeJsonConverter<Dataset>("type"));
        DeserializationSettings.Converters.Add(new TransformationJsonConverter());
 
        return DeserializationSettings;
    }
}

Copy

This can be a short one as we know what copy does. The activity now supports the following new data sources and destinations:

  • Dynamics CRM
  • Dynamics 365
  • Salesforce (with Azure Key Vault credentials)

Also, as standard, 'copy' will be able to return the number of rows processed as a parameter. This could then be used with a branching 'if' condition when the number of expected rows isn't available, for example.
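
Purely as a sketch of how that might hang together (the output property name 'rowsCopied' is my assumption), an 'if' condition downstream of the copy could look something like this:

{
    "name": "CheckRowsCopied",
    "type": "IfCondition",
    "dependsOn": [
        { "activity": "CopyActivity", "dependencyConditions": [ "Succeeded" ] }
    ],
    "typeProperties": {
        "expression": {
            "type": "Expression",
            "value": "@greater(activity('CopyActivity').output.rowsCopied, 0)"   //assumed output property
        },
        "ifTrueActivities": [],
        "ifFalseActivities": []
    }
}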


Hopefully that's everything and you're now fully up to date with ADFv2 and all the new and exciting things it has to offer. Stay tuned for more in depth posts soon.

For more information check out the Microsoft documentation on ADF here: https://docs.microsoft.com/en-gb/azure/data-factory/introduction

Many thanks for reading.

 

Special thanks to Rob Sewell for reviewing and contributing towards the post.


Chaining Azure Data Factory Activities and Datasets

As I work with Azure Data Factory (ADF) and help others in the community more and more I encounter some confusion that seems to exist surrounding how to construct a complete dependency driven ADF solution. One that chains multiple executions and handles all of your requirements. In this post I hope to address some of that confusion and will allude to some emerging best practices for Azure Data Factory usage.

First a few simple questions:

  • Why is there confusion? In my opinion this is because the ADF copy wizard available via the Azure portal doesn’t help you architect a complete solution. It can be handy to reverse certain things, but really the wizard tells you nothing about the choices you make and what the JSON behind it is doing. Like most wizards, it just leads to bad practices!
  • Do I need several data factory services for different business functions? No, you don't have to. Pipelines within a single data factory service can be disconnected for different processes and often having all your linked services in one place is easier to manage. Plus, a single factory offers reusability and means a single set of source code, etc.
  • Do I need one pipeline per activity? No, you can house many activities in a single pipeline. Pipelines are just logic containers to assist you when managing data orchestration tasks. If you want an SSIS comparison, think of them as sequence containers. In a factory I may group all my on premises gateway uploads into a single pipeline. This means I can pause that stream of uploads on demand. Maybe when the gateway keys need to be refreshed, etc.
  • Is the whole data factory a pipeline? Yes, in concept. But for technical terminology a pipeline is a specific ADF component. The marketing people do love to confuse us!
  • Can an activity support multiple inputs and multiple outputs? Generally yes. But there are exceptions depending on the activity type. U-SQL calls to Azure Data Lake can have multiples of both. ADF doesn’t care as long as you know what the called service is doing. On the other hand a copy activity needs to be one to one (so Microsoft can charge more for data movements).
  • Does an activity have to have an input dataset? No. For example, you can create a custom activity that executes your code for a defined time slice without an input dataset, just the output.

Datasets

Moving on, let's go a little deeper and think about a scenario that I use in my community talks. We have an on premises CSV file. We want to upload it, clean it and aggregate the output. For each stage of this process we need to define a dataset for Azure Data Factory to use.

To be clear, a dataset in this context is not the actual data. It is just a set of JSON instructions that defines where and how our data is stored. For example, its file path, its extension, its structure, its relationship to the executing time slice.

Let's define each of the datasets we need in ADF to complete the above scenario for just 1 file:

  1. The on premises version of the file. Linked to information about the data management gateway to be used, with local credentials and file server/path where it can be accessed.
  2. A raw Azure version of the file. Linked to information about the data lake storage folder to be used for landing the uploaded file.
  3. A clean version of the file. Linked to information about the output directory of the cleaning process.
  4. The aggregated output file. Linked to information about the output directory of the query being used to do the aggregation.

All of the linked information to these datasets should come from your ADF linked services.

So, we have 1 file to process, but in ADF we now need 4 datasets defined for each stage of the data flow. These datasets don’t need to be complex, something as simple as the following bit of JSON will do.

{
  "name": "LkpsCurrencyDataLakeOut",
  "properties": {
    "type": "AzureDataLakeStore",
    "linkedServiceName": "DataLakeStore",
    "structure": [ ],
    "typeProperties": {
      "folderPath": "Out",
      "fileName": "Dim.Currency.csv"
    },
    "availability": {
      "frequency": "Day",
      "interval": 1
    }
  }
}

Activities

Next, our activities. Now the datasets are defined above we need ADF to invoke the services that are going to do the work for each stage. As follows:

  • Copy – uploads the file from local storage to Data Lake storage. Input dataset: 1. Output dataset: 2.
  • DotNetActivity – performs transformation/cleaning on the raw source file. Input dataset: 2. Output dataset: 3.
  • DataLakeAnalyticsU-SQL – aggregates the datasets to produce a reporting output. Input dataset: 3. Output dataset: 4.

From the above list we can clearly see the output dataset of the first activity becomes the input of the second. The output dataset of the second activity becomes the input of the third. Apologies if this seems obvious, but I have known it to confuse people.
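
To make the chaining concrete, here's a rough ADFv1 style snippet for the middle activity above. The dataset names are illustrative, with the output name matching the example dataset shown earlier.

{
    "name": "TransformPipeline",
    "properties": {
        "activities": [
            {
                "name": "CleanRawFile",
                "type": "DotNetActivity",
                "inputs":  [ { "name": "RawCurrencyDataLakeIn" } ],     //dataset 2, the output of the upload copy
                "outputs": [ { "name": "LkpsCurrencyDataLakeOut" } ]    //dataset 3, which becomes the input of the U-SQL step
            }
        ]
    }
}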

Pipelines

For our ADF pipeline(s) we can now make some decisions about how we want to manage the data flow.

  1. Add all the activities to a single pipeline meaning we can stop/start everything for this 1 dataset end to end.
  2. Add each activity to a different pipeline dependent on its type. This is my starting preference.
  3. Have the on premises upload in one pipeline and everything else in a second pipeline.
  4. Maybe separate your pipelines and data flows depending on the type of data. Eg. Fact/dimension. Finance and HR.

The point here, is that it doesn’t matter to ADF, it’s just down to how you want to control it. When I created the pipelines for my talk demo I went with option 2. Meaning I get the following pretty diagram, arranged to fit the width of my blog 🙂

Here we can clearly see at the top level each dataset flowing into a pipeline and its child activity. If I'd constructed this using option 1 above I would simply see the first dataset and the fourth with 1 pipeline box. I could then drill into the pipeline to see the chained activities within. I repeat, this doesn't matter to ADF.

I hope you found the above useful and a good starting point for constructing your ADF data flows.

Best Practices

As our understanding of Azure Data Factory matures I’m sure some of the following points will need to be re-written, but for now I’m happy to go first and start laying the ground work of what I consider to be best for ADF usage. Comments very welcome.

  1. Resist using the wizard, please.
  2. Keep everything within a single ADF service if you can. Meaning linked services can be reused.
  3. Disconnect your on premises uploads using a single pipeline. For ease of management.
  4. Group your activities into natural pipeline containers for the operation type or data category.
  5. Layout your ADF diagram carefully. Left to right. It makes understanding it much easier for others.
  6. Use Visual Studio configuration files to deploy ADF projects between Dev/Test/Live. Ease of source control and development.
  7. Monitor activity concurrency and time outs carefully. ADF will kill called service executions if breached.
  8. Be mindful of activity cost and group inputs/outputs for data compute where possible.
  9. Use time slices to control your data volumes. Eg. Pass the time slice as a parameter to the called compute service.

What next? Well, I’m currently working on this beast…

  • 127x datasets.
  • 71x activities.
  • 9x pipelines.

… and I’ve got about another third left to build!

Many thanks for reading.


Using Azure Data Factory Configuration Files

Like most things developed, it's very normal to have multiple environments for the same solution; dev, test, prod etc. Azure Data Factory is no exception to this. However, where it does differ slightly is the way it handles publishing to different environments from the Visual Studio tools provided. In this post we'll explore exactly how to create Azure Data Factory (ADF) configuration files to support such deployments to different Azure services/directories.

For all the examples in this post I’ll be working with Visual Studio 2015 and the ADF extension available from the market place or via the below link.

https://marketplace.visualstudio.com/items?itemName=AzureDataFactory.MicrosoftAzureDataFactoryToolsforVisualStudio2015

Before we move on, let's take a moment to say that Azure Data Factory configuration files are purely a Visual Studio feature. At publish time Visual Studio simply takes the config file content and replaces the actual JSON attribute values before deploying in Azure. That said, to be explicit:

  • An ADF JSON file with attribute values missing (because they come from config files) cannot be deployed using the Azure portal ‘Author and Deploy’ blade. This will just fail validation as missing content.
  • An ADF JSON config file cannot be deployed using the Azure portal ‘Author and Deploy’ blade. It is simply not understood by the browser based tool as a valid ADF schema.

Just as an aside: code comments in ADF JSON files are also purely a Visual Studio feature. You can only comment your JSON in the usual way in Visual Studio, which will strip these out for you at publish time. Any comments left in code that you copy and paste into the Azure portal will return as syntax errors! I have already given feedback to the Microsoft product team that code comments in the portal blades would be really handy. But I digress.

Apologies in advance if I switch between the word publish and deploy too much. I mean the same thing. I prefer deploy, but in a Visual Studio ADF solution it's called publish in the menu.

Creating an ADF Configuration File

First let's use the Visual Studio tools to create a common set of configuration files. In a new ADF project you have the familiar tree including Linked Services, Pipelines etc. Now right click on the project and choose Add > New Item. In the dialogue presented choose Config and add a Configuration File to the project, with a suitable name.

I went to town and did a set of three 🙂

Each time you add a config file to your ADF project, or any component for that matter, you'll be aware that Visual Studio tries to help you out by giving you a JSON template or starter for what you might want. This is good, but in the case of ADF config files isn't that intuitive. Hence this blog post. Let's move on.

Populating the Configuration File

Before we do anything let me attempt to put into words what we need to do here. Every JSON attribute has a reference of varying levels to get to its value. When we recreate a value in our config file we need to recreate this reference path exactly, from the top level of the component name. In the config file this goes as a parent (at the same level as schema) followed by square brackets [ ] which then contain the rest of the content we want to replace. Next, within the square brackets of the component, we need pairs of attributes (name and value). These represent the references to the actual component structure. In the 'name' value we start with a '$.' which represents the root of the component file. Then we build up the tree reference with a dot for each new level. Lastly, the value is as it says: just the value to be used instead of whatever may be written in the actual component file.

Make sense? Referencing JSON with JSON? I said it wasn't intuitive. Let's move on and see it.

Let's populate our configuration files with something useful. This of course greatly depends on what your data factory is doing as to what values you might want to switch between environments, but let's start with a few common attributes. For this example let's alter a pipeline's schedule start, end and paused values. I always publish to dev as paused to give me more control over running the pipeline.

At the bottom of our pipeline component file I’ve done the following.

    //etc...
	//activities block
	],
    "start": "1900-01-01", /*<get from config file>*/
    "end": "1900-01-01", /*<get from config file>*/
    "isPaused": /*<get from config file>*/,
    "pipelineMode": "Scheduled"
  }
}

… Which means in my config file I need to create the equivalent set of attribute references and values. Note the dollar for the root, then one level down into the properties namespace. Then another dot before the attribute.

{
  "ExactNameOfYourPipeline": [ // <<< Component name. Exactly!
    {
      "name": "$.properties.isPaused",
      "value": true
    },
    {
      "name": "$.properties.start",
      "value": "2016-08-01"
    },
    {
      "name": "$.properties.end",
      "value": "2017-06-01"
    }
  ]
}

A great thing about this approach with ADF tools in Visual Studio is that any attribute value can be overridden with something from a config file. It’s really flexible and each component can be added in the same way regardless of type. There are however some quirks/features to be aware of, as below.

  • All parent and child name referencing within the config file must match its partner in the actual component JSON file exactly.
  • All referencing is case sensitive. But Visual Studio won’t validate this for you in intellisense or when building the project.
  • In the actual component file some attribute values can be left blank as they come from config. Others cannot and will result in the ADF project failing to build.
  • For any config referencing that fails, you'll only figure this out when you publish and check the Azure portal to see that the JSON file in place has its original content. Fun.

Right then. Hope that’s all clear as mud 🙂

Publishing using Different Configurations

Publishing is basically the easy bit. Involving a wizard so I don’t need to say much here.

Right click on the project in Visual Studio and choose Publish. In the publish items panel of the wizard simply select the config file you want to use for the deployment.

I hope this post is helpful and saves you some time when developing with ADF.

Many thanks for reading.


Creating Azure Data Factory Custom Activities

When creating an Azure Data Factory (ADF) solution you'll quickly find that currently its connectors are pretty limited to just other Azure services and the T within ETL (Extract, Transform, Load) is completely missing altogether. In these situations where other functionality is required we need to rely on the extensibility of Custom Activities. A Custom Activity allows the use of .Net programming within your ADF pipeline. However, getting such an activity set up can be tricky and requires a fair bit of messing about. In this post I hope to get you started with all the basic plumbing needed to use the ADF Custom Activity component.

Visual Studio

Firstly, we need to get the Azure Data Factory tools for Visual Studio, available via the below link. This makes the process of developing custom activities and ADF pipelines a little bit easier compared to doing all the development work in the Azure portal. But be warned, because this stuff is still fairly new there are some pain points/quirks to overcome which I'll point out.

https://visualstudiogallery.msdn.microsoft.com/371a4cf9-0093-40fa-b7dd-be3c74f49005

Once you have this extension available in Visual Studio create yourself a new solution with two projects: a Data Factory project and a C# Class Library. You can of course use VB if you prefer.

Azure Services

Next, like the Visual Studio section above this is really a set of prerequisites for making the ADF custom activity work. Assuming you already have an ADF service running in your Azure subscription you’ll also need:

  • Azure Batch Service (ABS) – this acts as the compute for your C# called by the ADF custom activity. The ABS is a strange service, which you'll find when you spin one up. Under the hood it's basically a virtual machine requiring CPU, RAM and an operating system, which you have to choose when deploying it (Windows or Linux available). But none of the graphical interface is available to use in a typical way; there's no RDP access to the Windows server below. Instead you give the service a compute Pool, where you need to assign CPU cores. The pool in turn has Tasks created in it by the calling services. Sadly, because ADF is just for orchestration, we need this virtual machine style glue and compute layer to handle our compiled C#.
  • Azure Storage Account (ASC) – this is required to house your compiled C# in its binary .DLL form. As you'll see further down, this actually gets zipped up as well with all its supporting packages. It would be nice if the ABS allowed access to the OS storage for this, but no such luck I'm afraid.

At this point, if you're doing this for the first time you'll probably be thinking the same as me… Why on earth do I need all this extra engineering? What are these additional services going to cost? And, why can I not simply inline my C# in the ADF JSON pipeline and get it to handle the execution?

Well, I have voiced these very questions to the Microsoft Azure Research team and the Microsoft Tiger Team. The only rational answer is to keep ADF as a dumb orchestrator that simply runs other services. Which would be fine if it didn't need this extensibility to do such simple things. This then leads into the argument about ADF being designed for data transformation. Should it just be for E and L, not T?

Let’s bottle up these frustrations for another day before this blog post turns into a rant!

C# Class Library

Moving on. Now for those of you that have ever read my posts before you’ll know that I don’t claim to be a C# expert. Well today is no exception! Expect fluffy descriptions in the next bit 🙂

First, in your class project let's add the NuGet packages and references you'll need for the library project to work with ADF. Using the Package Manager Console (Visual Studio > Tools > NuGet Package Manager > Package Manager Console) run the following installation lines to add all your required references.

Install-Package Microsoft.Azure.Management.DataFactories
Install-Package Azure.Storage

Next the fun bit. Whatever class name you decide to use, it will need to inherit from IDotNetActivity, which is the interface used at runtime by ADF. Then within your new class you need to create a method called Execute that returns an IDictionary. It is this method that will be run by the ABS when called from ADF.

Within the Execute method, extended properties and details about the datasets and services on each side of the custom activity pipeline can be accessed. Here is the minimum of what you'll need to connect the dots between ADF and your C#.

using System;
using System.Collections.Generic;
using System.Linq;
 
using Microsoft.Azure;
using Microsoft.Azure.Management.DataFactories.Models;
using Microsoft.Azure.Management.DataFactories.Runtime;
 
namespace ClassLibrary1
{
    public class Class1 : IDotNetActivity
    {
        public IDictionary<string, string> Execute(
                IEnumerable<LinkedService> linkedServices,
                IEnumerable<Dataset> datasets,
                Activity activity,
                IActivityLogger logger)
        {
            logger.Write("Start");
 
            //Get extended properties
            DotNetActivity dotNetActivityPipeline = (DotNetActivity)activity.TypeProperties;
 
            string sliceStartString = dotNetActivityPipeline.ExtendedProperties["SliceStart"];
 
            //Get linked service details
            Dataset inputDataset = datasets.Single(dataset => dataset.Name == activity.Inputs.Single().Name);
            Dataset outputDataset = datasets.Single(dataset => dataset.Name == activity.Outputs.Single().Name);
 
            /*
                DO STUFF
            */
 
            logger.Write("End");
 
            return new Dictionary<string, string>();
        }
    }
}

How you use the declared datasets will greatly depend on the linked services you have in and out of the pipeline. You’ll notice that I’ve also called the IActivityLogger using the write method to make user log entries. I’ll show you where this gets written to later from the Azure portal.

I appreciate that the above code block isn't actually doing anything and that it's probably just raised another load of questions. Patience, more blog posts are coming! Depending on what other Azure services you want your C# class to use, next we'll have to think about registering it as an Azure app so the compiled program can authenticate against other components. Sorry, but that's for another time.

The last and most important thing to do here is add a reference to the C# class library in your ADF project. This is critical for a smooth deployment of the solution and compiled C#.

Data Factory

Within your new or existing ADF project you’ll need to add a couple of things, specifically for the custom activity. I’m going to assume you have some datasets/data tables defined for the pipeline input and output.

Linked services first, corresponding to the above and what you should now have deployed in the Azure portal;

  • Azure Batch Linked Service – I would like to say that when presented with the JSON template for the ABS, filling in the gaps is pretty intuitive for even the most non-technical peeps amongst us. However, the names and descriptions are wrong within the typeProperties component! Here's my version below with the corrections and elaborations on the standard Visual Studio template. Please extend your sympathies for the pain it took me to figure out where the values don't match the attribute tags!
{
  "$schema": "http://datafactories.schema.management.azure.com/schemas/2015-09-01/
              Microsoft.DataFactory.LinkedService.json",
    "name": "AzureBatchLinkedService1",
    "properties": {
        "type": "AzureBatch",
      "typeProperties": {
        "accountName": "<Azure Batch account name>",
        //Fine - get it from the portal, under service properties.
 
        "accessKey": "<Azure Batch account key>",
        //Fine -  get it from the portal, under service properties.
 
        "poolName": "<Azure Batch pool name>",
        //WRONG - this actually needs to be the pool ID
        //that you defined when you deployed the service.
        //Using the Pool Name will error during deployment.
 
        "batchUri": "<Azure Batch uri>",
        //PARTLY WRONG - this does need to be the full URI that you
        //get from the portal. You need to exclude the batch
        //account name. So just something like https://northeurope.batch.azure.com
        //depending on your region.
        //With the full URI you'll get a message that the service can't be found!
 
        "linkedServiceName": "<Specify associated storage linked service reference here>"
        //Fine - as defined in your Data Factory. Not the storage
        //account name from the portal.
      }
    }
}
  • Azure Storage Linked Service – the JSON template here is ok to trust. It only requires the connection string for your blob store which can be retrieved from the Azure Portal and inserted in full. Nice simple authentication.

Once we have the linked services in place let's add the pipeline. It's worth noting that by pipeline I mean the ADF component that houses our activities. A pipeline is not the entire ADF end to end solution in this context. Many people incorrectly use it as a broad term for all ADF things.

  • Dot Net Activity – here we need to give ADF all the bits it needs to go away and execute our C#. Which is again defined in the typeProperties. Below is a JSON snippet of just the typeProperties block that I’ve commented on to go into more detail about each attribute.
"typeProperties": {
  "assemblyName": "<Name of the output DLL to be used by the activity. e.g: MyDotNetActivity.dll>",
  //Once your C# class library has been built the DLL name will come from the name of the
  //project in Visual Studio by default. You can also change this in the project properties
  //if you wish.
 
  "entryPoint": "<Namespace and name of the class that implements the IDotNetActivity interface e.g: MyDotNetActivityNS.MyDotNetActivity>",
  //This needs to include the namespace as well as the class. Which is what the default is
  //alluding to where the dot separation is used. Typically your namespace will be inherited
  //from the project default. You might override this to be the CS filename though, so be careful.
 
  "packageLinkedService": "<Name of the linked service that refers to the blob that contains the zip file>",
  //Just to be clear: your storage account linked service name.
 
  "packageFile": "<Location and name of the zip file that was uploaded to the Azure blob storage e.g: customactivitycontainer/MyDotNetActivity.zip>"
  //Here's the ZIP file. If you haven't already you'll need to create a container in your
  //storage account under blobs. Reference that here. The ZIP file name will be the same
  //as the DLL file name. Don't worry about where the ZIP file gets created just yet.
}

By now you should have a solution that looks something like the solution explorer panel on the right. In mine I've kept all the default naming conventions for ease of understanding.

Deployment Time

If you have all the glue in place you can now right click on your ADF project and select Publish. This launches a wizard which takes you through the deployment process. Again I’ve made an assumption here that you are logged into Visual Studio with the correct credentials for your Azure subscription. The wizard will guide you through where the ADF project is going to be deployed, it will also validate the JSON content before sending it up and it will also detect if files in the target ADF service can be deleted.

With the reference in place to the C# class library the deployment wizard will detect the project dependency and zip up the compiled DLLs from your bin folder and upload them into the blob storage linked service referenced in the activity pipeline.

Sadly there is no local testing available for this lot and we just have to develop by trial/deploy/run and error.


Runtime

To help with debugging from the portal, if you go to the ADF Monitor & Manage area you should have your pipeline displayed. Clicking on the custom activity block will reveal the log files in the right hand panel. The first is the default system stack trace and the other is anything written out by the C# logger.Write call(s). These will become your new best friend when trying to figure out what isn't working.

Of course you don't need to perform a full publish of the ADF project every time if you're only developing the C# code. Simply build the solution and upload a new ZIP file to your blob storage account using something like Microsoft Azure Storage Explorer. Then rerun the time slice for the output dataset.
If nothing appears to be happening you may also want to check on your ABS to ensure tasks are being created from ADF. If you haven’t assigned the compute pool any CPU cores it will just sit there and your ADF pipeline activity will time out with no errors and no clues as to what might have gone wrong. Trust me, I’ve been there too.
I hope this post was helpful and gave you a steer as to the requirements for extending your existing ADF solutions with .Net activity.

Many thanks for reading.

Windows IoT UWP Tasks Tickers and Threading

Upon entering the IoT revolution a few things immediately became apparent;

  • We now had the ability to collect and handle more sensor data than we’d ever before possibly conceived.
  • These sensors and their data are going to change the very way we live… Not convinced by this one? Talk to anybody that has a Fitbit attached to their wrist about how many steps they do each day!
  • More relevant to this article. The procedural tools we know and love like T-SQL and PowerShell are no longer going to be enough to deliver these new world real-time data requirements.

Moving on from the philosophy, let's focus on my third point and the acceptance that we need to enter the realm of object orientated programming (OOP), specifically, in this article, C# .Net. Now I will state from the outset that I still consider myself to be a C# novice, but through IoT development the learning continues thick and fast. Although if you ask me directly to explain polymorphism I'll still be running for Google 🙂

For procedural people already working with Microsoft products this learning and development can be fairly nice and sometimes even natural. However, I expect that for people already involved in hard-core OOP software development and not very procedural, this might seem a little backwards or just very obvious. Just a speculation at this point. At the moment I'm the former and if you're reading this I hope you are too.

So why do we need OOP  for our data? What’s your point Paul?

Well being a Microsoft aligned person more and more I find myself working on Windows 10 IoT Core with C# based on the Universal Windows Platform (UWP) framework to develop apps and drive my sensors collecting that very precious data. For those of you that haven’t encountered the UWP concept yet I recommend visiting these Microsoft pages: https://msdn.microsoft.com/en-gb/windows/uwp/get-started/whats-a-uwp

Assuming you are familiar with a little UWP dev, let's continue and dive straight into the first problem you'll encounter, or may have already encountered.

Threading

In reverse order to my article title I know, but threading is basically the issue that we first need to work around when developing an IoT application. The UWP framework is great and very flexible; however, it only offers a cut down version of the full fat .Net library (at present). Or, to be more accurate, when working with a UWP solution the number of SDKs available in your references will be very limited compared to what you might normally see.


This limit includes the well known System.Threading namespace and classes, like the following example from MSDN.

using System;
using System.Threading;
 
public class MonitorSample
{
   public static void Main(String[] args)
   {
      int result = 0;
      Cell cell = new Cell( );
 
      CellProd prod = new CellProd(cell, 20);
      CellCons cons = new CellCons(cell, 20);
 
      Thread producer = new Thread(new ThreadStart(prod.ThreadRun));
      Thread consumer = new Thread(new ThreadStart(cons.ThreadRun));
 
         producer.Start( );
         consumer.Start( );
 
         producer.Join( );
         consumer.Join( );  
 
      Environment.ExitCode = result;
   }
}

Threading is simply not available on the Universal Windows Platform.

Tasks

Enter our new friends async and await tasks or asynchronous programming.

using System;
using System.Threading.Tasks;

Now I’m not even going to try and give you a lesson on C# as I’d probably just embarrass myself, so instead I will again direct your attention to following MSDN pages:

https://msdn.microsoft.com/en-us/library/mt674882.aspx

However, what I will do is try to give you some context for using this new non-blocking "threading" concept within your UWP IoT application. The example I like to call on is very simple. You have an IoT sensor device that needs to do two things:

  1. Send JSON messages containing your data to an Azure IoT Event Hub (Device to Cloud)
  2. Receive messages containing device management instructions (Cloud to Device)

These two fundamental bits of functionality have to happen asynchronously. We can’t be waiting around to send messages because we are working on what has just been received. To handle this we need something like the following example at the core of our UWP app.

namespace IoTSensor
{
    public sealed partial class MainPage : Page
    {
        private MainViewModel doTo;
        public MainPage()
        {
            this.InitializeComponent();
            doTo = this.DataContext as MainViewModel;
 
            Loaded += async (sender, args) =>
            {
                await doTo.SendDeviceToCloudMessagesAsync();
                await doTo.ReceiveCloudToDeviceMessageAsync();
            };
 
        }
    }
}

Now both send and receive can occur without any blocking behaviour.

Tickers

Lastly, let's think about tickers created using something like DispatcherTimer(). The good old fashioned clock cycle if you prefer.

We might need a ticker to cycle/iterate over a block of code that is doing something with our IoT sensors. For example, if you wanted to collect a temperature reading every 10 seconds, using an async task with a ticker would be the way to achieve that:

namespace IoTSensor
{
    using System;
    using System.Threading.Tasks;
    using Windows.UI.Xaml;
    using Windows.UI.Xaml.Controls;
    using GHIElectronics.UWP.Shields;
 
    public sealed partial class MainPage : Page
    {
        FEZHAT hat; //THANKS https://www.ghielectronics.com/
        DispatcherTimer sensorCollection;
 
        public MainPage()
        {
            this.InitializeComponent();
        }
 
        private async Task SetupDevice()
        {
            this.hat = await FEZHAT.CreateAsync();
 
            this.sensorCollection = new DispatcherTimer();
            this.sensorCollection.Interval = TimeSpan.FromSeconds(10);
            this.sensorCollection.Tick += this.sensorCollection_Tick;
            this.sensorCollection.Start();
        }
 
        private void sensorCollection_Tick(object sender, object e)
        {
            //Get values and send to cloud etc...
        }
 
        private async void Page_Loaded(object sender, RoutedEventArgs e)
        {
            await SetupDevice();
        }
    }
}

I do hope this high level article has been of some use. I will attempt to follow up with a more deep dive look at the above once I've slept on the concepts and forced myself to leave the beloved SQL behind for another couple of weeks while we voyage ever further into the Internet of Things!

Many thanks for reading


Azure Virtual Machine CPU Cores Quota

Did you know that by default your Azure subscription has a limit on the number of CPU cores you can consume for virtual machines? Until recently I didn't. Maybe like you I'd only ever created one or two small virtual machines (VMs) within my MSDN subscription for testing and always ended up deleting them before coming close to the limit.

Per Azure subscription the default quota for virtual machine CPU cores is 20.

To clarify, because the relationship between virtual and physical CPU cores can get a little confusing: this initial limit is directly related to what you see in the Azure portal when sizing your VM during creation.

For example, using the range of basic virtual machines to the right you would hit your default limit with:

  • 10x A2 Basics
  • 5x A3 Basics
  • 2.5x A4 Basics (if it were possible to have 0.5 of a VM)

Fortunately this initial quota is only a soft limit and easily lifted for your production Azure subscriptions.

Before we look at how to solve this problem it's worth learning how to recognise when you've hit the VM CPU core limit, as it's not always obvious.

Limit Symptoms Lesson 1

When creating a new VM you might find that some of the larger size options are greyed out without any obvious reason why. This will be because consumed cores + new machine size cores will be greater than your current subscription limit. Therefore 'Not Available'. Example below.

The other important quirk here is that the size blade will only dynamically alter its colourations and availability state if the CPU cores are currently being used in VMs that are running. If you have already reached your limit, but the VMs are stopped and appearing as de-allocated resources, you will still be able to proceed and deploy another VM exceeding your limit. This becomes clearer in lesson 2 below where the failure occurs.

Limit Symptoms Lesson 2

If you are unlucky enough to have exceeded your limit with de-allocated resources and you were able to get past the VM size selection without issue, your new VM will now be deploying on your Azure dashboard (if pinned). All good right?… Wrong! You'll then hit this deployment failure alert from the lovely notifications bell. Example on the right.

Note; I took this screen shot from a different Azure subscription where I’d already increased my quota to 40x cores.

This could occur in the following scenario.

  • You have 2x A4 Basic virtual machines already created. 1x is running and 1x is stopped.
  • The stopped VM meant you were able to proceed in creating a third A4 Basic VM.
  • During deployment the Azure portal has now done its sums of all resources covering both stopped and running VM’s.
  • 8x cores on running VM + 8x cores on stopped VM + 8x cores on newly deployed VM. Total 24.
  • This has exceeded your limit by 4x cores.
  • Deployment failed.

In short; this is the difference in behaviour between validation of your VM at creation time vs validation of your VM at deployment time. AKA a feature!

Limit Symptoms Lesson 3

If deploying VMs using a JSON template you will probably be abstracted away from the size and cores consumed by the new VM, because in the default template this is just hardcoded into the relevant JSON attribute.

Upon clicking ‘Create’ on the portal blade you will be presented with an error similar to the example on the right. This is of course a little more obvious compared to lesson 1 and more helpful than lesson 2 in the sense that the deployment hasn’t been started yet. But still this doesn’t really give you much in terms of a solution, unless you are already aware of the default quota.

Apparently my storage account name wasn't correct when I took this screen shot either. Maybe another blog post is required here covering where the Azure portal is case sensitive and where it isn't! Moving on.

 

 The Solution

As promised, the solution to all the frustration you’ve encountered above.

To increase the number of CPU cores available to your Azure subscription you will need to raise a support ticket with the Azure help desk… Don't sigh!… I assure you this is not as troublesome as you might think.

Within the Help and Support section of your Azure portal there are a series of predefined menus to do exactly this. Help and Support will be on your dashboard by default, or available via the far left hand root menu.

Within the Help and Support blade click to add a New Support Request. Then follow the prompts selecting the Issue Type as ‘Quota’, your affected Subscription and the Quota Type as ‘Cores per Subscription’.


Once submitted, a friendly Microsoft human will review and approve the request to make sure it's reasonable…. Requesting 1000 extra CPU cores might get rejected! For me, requesting an additional 30 cores took only hours to get approved and made available. Not the 2 – 4 business days the expectation managing auto reply would have you believe.

Of course I can't promise this will always happen as quickly, so my advice would be: know your current limits and allow for time to change them if you need to scale your production systems.

I hope this post saved you the time I lost when creating 17x VMs for a community training session.

Many thanks for reading.


Paul’s Frog Blog

Paul is a Microsoft Data Platform MVP with 10+ years' experience working with the complete on premises SQL Server stack in a variety of roles and industries. Now, as the Business Intelligence Consultant at Purple Frog Systems, he has turned his keyboard to big data solutions in the Microsoft cloud, specialising in Azure Data Lake Analytics, Azure Data Factory, Azure Stream Analytics, Event Hubs and IoT. Paul is also a STEM Ambassador for the networking education in schools' programme, PASS chapter leader for the Microsoft Data Platform Group – Birmingham, a SQL Bits, SQL Relay and SQL Saturday speaker and helper, and currently the Stack Overflow top user for Azure Data Factory, as well as a very active member of the technical community.
Thanks for visiting.
@mrpaulandrew