Azure Data Lake Authentication from Azure Data Factory

To set the scene for the title of this blog post, let’s first think about other services within Azure. You’ll probably already know that most deployed services require authentication via some form of connection string and generated key. These keys can be granted various levels of access and recycled as required, for example for an IoT Event Hub as seen below (my favourite service to play with).

[Screenshot: access levels, keys and connection strings]

Then we have other services like SQLDB that require user credentials to authenticate, as we would expect from the on-premises version of the product. Finally, we have a few other Azure services that handle authentication in a very different way altogether, requiring user credentials initially and then giving us session and token keys to be used by callers. These session and token keys are a lot more fragile than connection strings and can expire or become invalid if the calling service gets rebuilt or redeployed.

In this blog post I’ll explore and demonstrate specifically how we handle session and token based authentication for Azure Data Lake (ADL), firstly when calling it as a Linked Service from Azure Data Factory (ADF), then secondly within ADF custom activities. The latter of these two ADF based operations is a little more difficult because the .Net code created and compiled is unfortunately treated as a distant relative of ADF, requiring its own authentication to ADL storage as an Azure Application. To clarify further, a Custom Activity in ADF does not inherit its authorising credentials from the parent Linked Service; it is responsible for its own session/token. Why? Because, as you may know from reading my previous blog post, Custom Activities are executed by an Azure Batch Service once compiled, meaning the compute for the .Net code is very much detached from ADF.

At this point I should mention that this applies to Data Lake Analytics and Data Lake Storage. Both require the same approach to authentication.

Data Lake as a Service Within Data Factory

The easy one first: adding an Azure Data Lake service to your Data Factory pipeline. From the Azure portal, within the ADF Author and Deploy blade, you simply add a new Data Lake Linked Service, which returns a JSON template for the operation in the right-hand panel. We are then kindly provided with an Authorize button (spelt wrong) at the top of the code block.

Clicking this will pop up the standard Microsoft login screen requesting work or personal user details etc. Upon successful authentication your Azure subscription will be inspected. If more than one applicable service exists, you’ll of course need to select which one you require authorisation for. Once done, you’ll return to the JSON template, now with a completed Authorization value and SessionId.

Just for information, and to give you some idea of how this type of authorisation differs from other Azure services: when I performed this task to create the screenshots for this post, the resulting Authorization URL was 1219 characters long and the returned SessionId was 1100! That’s about half a page of a standard Word document each. By comparison an IoT Hub key is only 44 characters. Furthermore, the two values are customised to the service that requested them and can only be used within the context where they were created.
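For orientation, the authorised Data Lake Store template has roughly the shape sketched below. This is an illustration rather than a copy of my own template: the property names are how I understand the ADF v1 Data Lake Store linked service to be defined, and every value in angle brackets is a placeholder, so check it against the JSON the portal actually generates for you.

```json
{
    "name": "AzureDataLakeStoreLinkedService",
    "properties": {
        "type": "AzureDataLakeStore",
        "typeProperties": {
            "dataLakeStoreUri": "https://<accountname>.azuredatalakestore.net/webhdfs/v1",
            "sessionId": "<session ID returned after clicking Authorize>",
            "authorization": "<authorization URL returned after clicking Authorize>",
            "subscriptionId": "<subscription ID of the Data Lake Store>",
            "resourceGroupName": "<resource group of the Data Lake Store>"
        }
    }
}
```

The sessionId and authorization values are the two long strings described above. They are the only parts of the template that need refreshing when the authorisation lapses.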

For completeness, because we can also now develop ADF pipelines from Visual Studio it’s worth knowing that a similar operation is now available as part of the Data Factory extension. In Visual Studio within your ADF project on the Linked Service branch you are able to Right Click > Add > New Item and choose Data Lake Store or Analytics. You’ll then be taken through a wizard (similar in look to that of the ADF deployment wizard) which requests user details, the ADF context and returns the same JSON template with populated authorising values.

[Screenshot: adding an ADL Linked Service in Visual Studio]

A couple of follow up tips and lessons learnt here:

  • If you tell Visual Studio to reverse engineer your ADF pipeline from a currently deployed Azure factory where an existing ADL token and session ID are available, these will not be brought into Visual Studio and you’ll need to authorise the service again.
  • If you copy an ADL JSON template from the Azure portal ‘Author and Deploy’ area, Visual Studio will not pop up the wizard to authorise the service and you’ll need to do it again.
  • If you delete the ADL Linked Service within the portal ‘Author and Deploy’ area, the same Linked Service tokens in Visual Studio will become invalid and you’ll need to authorise the service again.
  • If you sneeze too loudly while Visual Studio is open you’ll need to authorise the service again.

Do you see what I meant earlier when I said the authorisation method is fragile? Very sophisticated, but fragile when chopping and changing things during development.

What you may find yourself doing fairly frequently is:

  1. Deploying an ADF project from Visual Studio.
  2. The deployment wizard failing telling you the ADL tokens have expired or are no longer authorised.
  3. Adding a new Linked Service to the project just to get the user authentication wizard popup.
  4. Then copying the new token and session values into the existing ADL Linked Service JSON file.
  5. Then excluding the new services you created just to re-authorise from the Visual Studio project.

Fun! Moving on.

Update: you can use an Azure AD service principal to authenticate both Azure Data Lake Store and Azure Data Lake Analytics services from ADF. Details are included in this post: https://docs.microsoft.com/en-gb/azure/data-factory/v1/data-factory-azure-datalake-connector#azure-data-lake-store-linked-service-properties
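As a rough sketch of what the service principal version looks like for Data Lake Store, the session and authorization values are swapped for the application’s ID, key and tenant. The servicePrincipalId, servicePrincipalKey and tenant property names are as I understand them from the documentation linked above, and all values in angle brackets are placeholders, so treat this as an assumption to verify rather than a tested template.

```json
{
    "name": "AzureDataLakeStoreLinkedService",
    "properties": {
        "type": "AzureDataLakeStore",
        "typeProperties": {
            "dataLakeStoreUri": "https://<accountname>.azuredatalakestore.net/webhdfs/v1",
            "servicePrincipalId": "<application ID of the service principal>",
            "servicePrincipalKey": "<application key>",
            "tenant": "<tenant domain name or tenant ID>",
            "subscriptionId": "<subscription ID of the Data Lake Store>",
            "resourceGroupName": "<resource group of the Data Lake Store>"
        }
    }
}
```

Because nothing here depends on an interactive login, there is no session to expire, which avoids the re-authorising dance described in the steps above.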

Data Factory Custom Activity Call Data Lake

Next the slightly more difficult way to authenticate against ADL, using an ADF .Net Custom Activity. As mentioned previously the .Net code once sent to Azure as a DLL is treated as a third party application requiring its own credentials.

The easiest way I’ve found to get this working is to first use PowerShell to register the application in Azure. Using the correct cmdlets, this returns an application GUID and password which, when combined, give the .Net code its credentials. Here’s the PowerShell you’ll need below. Be sure to run it with elevated permissions locally.

# Sign in to Azure.
Add-AzureRmAccount
 
# Set these variables
$appName = "SomeNameThatYouWillRecogniseInThePortal"
$uri = "AValidURIAlthoughNotApplicableForThis"
$secret = "SomePasswordForTheApplication"
 
# Create an AAD app
$azureAdApplication = New-AzureRmADApplication `
    -DisplayName $appName `
    -HomePage $Uri `
    -IdentifierUris $Uri `
    -Password $secret
 
# Create a Service Principal for the app
$svcprincipal = New-AzureRmADServicePrincipal -ApplicationId $azureAdApplication.ApplicationId
 
# To avoid a PrincipalNotFound error, I pause here for 15 seconds.
Start-Sleep -s 15
 
# If you still get a PrincipalNotFound error, then rerun the following until successful. 
$roleassignment = New-AzureRmRoleAssignment `
    -RoleDefinitionName Contributor `
    -ServicePrincipalName $azureAdApplication.ApplicationId.Guid
 
# The stuff you want:
 
Write-Output "Copy these values into the C# sample app"
 
Write-Output "_subscriptionId:" (Get-AzureRmContext).Subscription.SubscriptionId
Write-Output "_tenantId:" (Get-AzureRmContext).Tenant.TenantId
Write-Output "_applicationId:" $azureAdApplication.ApplicationId.Guid
Write-Output "_applicationSecret:" $secret
Write-Output "_environmentName:" (Get-AzureRmContext).Environment.Name

My recommendation here is to take the returned values and store them in something like the Class Library settings, available from the Visual Studio project properties. Don’t store them as constants at the top of your Class, as it’s highly likely you’ll need them multiple times.

Next, what to do with the application GUID etc. Well, your Custom Activity C# will need something like the following. Apologies for dumping massive code blocks into this post, but you will need all of this in your Class if you want to use details from your ADF service and work with ADL files.

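//Namespaces assumed from the ADF v1 and Data Lake Store SDK packages; adjust to your package versions
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using Microsoft.Azure.Management.DataFactories.Models;
using Microsoft.Azure.Management.DataFactories.Runtime;
using Microsoft.Azure.Management.DataLake.Store;
using Microsoft.IdentityModel.Clients.ActiveDirectory;
using Microsoft.Rest;
using Microsoft.Rest.Azure.Authentication;
 
class SomeCustomActivity : IDotNetActivity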
{
	//Get credentials for app
	string domainName = Settings.Default.AzureDomainName;
	string appId = Settings.Default.ExcelExtractorAppId; //From PowerShell <<<<<
	string appPass = Settings.Default.ExceExtractorAppPass; //From PowerShell <<<<<
	string appName = Settings.Default.ExceExtractorAppName; //From PowerShell <<<<<
 
	private static DataLakeStoreFileSystemManagementClient adlsFileSystemClient;
	//and or:
	private static DataLakeStoreAccountManagementClient adlsAccountManagerClient;
 
	public IDictionary<string, string> Execute(
		IEnumerable<LinkedService> linkedServices,
		IEnumerable<Dataset> datasets,
		Activity activity,
		IActivityLogger logger)
	{
		//Get linked service details from Data Factory
		Dataset inputDataset = datasets.Single(dataset =>
			dataset.Name == activity.Inputs.Single().Name);
 
		AzureDataLakeStoreLinkedService inputLinkedService;
 
		inputLinkedService = linkedServices.First(
			linkedService =>
			linkedService.Name ==
			inputDataset.Properties.LinkedServiceName).Properties.TypeProperties
			as AzureDataLakeStoreLinkedService;
 
		//Get account name for data lake and create credentials for app
		var creds = AuthenticateAzure(domainName, appId, appPass);
		string accountName = inputLinkedService.AccountName;
 
		//Authorise new instance of Data Lake Store
		adlsFileSystemClient = new DataLakeStoreFileSystemManagementClient(creds);
 
		/*
			DO STUFF...
 
			using (Stream input = adlsFileSystemClient.FileSystem.Open
				(accountName, completeInputPath)
				)
		*/	
 
 
		return new Dictionary<string, string>();
	}
 
 
	private static ServiceClientCredentials AuthenticateAzure
		(string domainName, string clientID, string clientSecret)
	{
		SynchronizationContext.SetSynchronizationContext(new SynchronizationContext());
 
		var clientCredential = new ClientCredential(clientID, clientSecret);
		return ApplicationTokenProvider.LoginSilentAsync(domainName, clientCredential).Result;
	}
}

Finally, before you execute anything, be sure to grant the Azure app permissions to the respective Data Lake service. In the case of the Data Lake Store, you can use the Data Explorer blades in the portal to assign folder permissions.

[Screenshot: granting folder permissions in the Data Lake Store Data Explorer]

I really hope this post has saved you some time in figuring out how to authorise Data Lake services from Data Factory, especially when developing beyond what the ADF Copy Wizard gives you.

Many thanks for reading.


17 Responses to Azure Data Lake Authentication from Azure Data Factory

  • You got a typo “to reserve engineer” 🙂

  • hi Paul,
    Can ADL and ADF authorize using servicePrincipalId/servicePrincipalKey?

    • Hi Alexander, thanks for your comment. Yes, it can. But there are a few issues with using a service principal and certain ADF activities. For example it still does work for custom activities, hence not updating this post yet. Check Stack Overflow for more details of the same. Thanks

  • hello Paul,

    I am getting exception message when i try to add AzureDatelakeAnalyticLinkedService using Service Authentication in VS

    {
    “name”: “AzureDataLakeAnalyticsLinkedService”,
    “properties”: {
    “type”: “AzureDataLakeAnalytics”,
    “typeProperties”: {
    “accountName”: “adftestaccount”,
    “dataLakeAnalyticsUri”: “azuredatalakeanalytics.net”,
    “servicePrincipalId”: “”,
    “servicePrincipalKey”: “”,
    “tenant”: “”,
    “subscriptionId”: “”,
    “resourceGroupName”: “”
    }
    }
    }

    Can you please advise if there is any step which i am missing

  • Is Service Principal Authentication available for ADLA linked service?

    • Yes it sure is, I personally forced the support ticket through for both data lake services covering all ADF activities because originally the scope was limited. Thanks

      • Please advise how can we do this…. where i can find the AzureDataLakeAnalyticsLinkedService template to use Service principal authentication. I am not able to find it through VS and if i make changes in the template like adding “servicePrincipalId”: “”,
        “servicePrincipalKey”: “”,

        Its throwing me error message. it seemsADLA does not support Service principal authentication

        • Hi Gaurav, use a data factory project in Visual Studio to get a JSON template or via the Azure portal Author and Deploy section of the service. If you are still having issues please email sales@purplefrogsystems.com and we can arrange a Purple Frog consultant to support your project. Thanks Paul

  • In Visual studio there is not such A zureDataLakeAnalyticsLinkedService template to use Service principal authentication.There is only AzureDataLakeAnalyticsLinkedService template to support User credential authentication. if i change it to use Service principal authentication its not supporting. I have tried the same in Visual studio and Azure Portal both.

    • Hi Gaurav, I’ve tried to email you, however the Gmail address provided is invalid. What is your email address please? Thank Paul

      • Hi Paul, I’m facing same issue as what Gaurav has faced.
        Like when I modify the ADLA Linked service template (in Azure portal) to use the Service principal authentication as per, https://docs.microsoft.com/en-us/azure/data-factory/v1/data-factory-usql-activity#azure-data-lake-analytics-linked-service , its not working. The linked service is still forcing to have User credential authorization parameters. Were you able to resolve it? if yes, can you help me here. My email id is: srinivas.tippani@gmail.com.
        Thanking in anticipation.

        Srinivas Tippani

    • Hi Gaurav, I’m facing exactly same issue. Like when I modify the ADLA Linked service template to use the Service principal authentication as per, https://docs.microsoft.com/en-us/azure/data-factory/v1/data-factory-usql-activity#azure-data-lake-analytics-linked-service , its not working. The linked service is still forcing to have User credential authorization parameters. Were you able to resolve it? if yes, can you help me here. Thanking in anticipation.

      Srinivas Tippani

  • Sorry..

    My email id is er.gaurav30sharma@gmail.com

    Regards,
    Raurav sharma

  • Please suggest any book or method for azure data lake store and analytic to extract data from XML file with complex nesting into tabular form ready to store in SQL database

    • Hi Aasish, thanks for the comment. The problem with books on topics like Azure services is by the time the book gets published the content will probably be out of date! I suggest learning from more frequently updated online resources. Cheers Paul


Paul’s Frog Blog

Paul is a Microsoft Data Platform MVP with 10+ years’ experience working with the complete on-premises SQL Server stack in a variety of roles and industries. Now, as the Business Intelligence Consultant at Purple Frog Systems, he has turned his keyboard to big data solutions in the Microsoft cloud, specialising in Azure Data Lake Analytics, Azure Data Factory, Azure Stream Analytics, Event Hubs and IoT. Paul is also a STEM Ambassador for the networking education in schools’ programme, PASS chapter leader for the Microsoft Data Platform Group – Birmingham, and a SQL Bits, SQL Relay and SQL Saturday speaker and helper. He is currently the Stack Overflow top user for Azure Data Factory, as well as a very active member of the technical community.
Thanks for visiting.
@mrpaulandrew