Did you know it’s now possible to RDP to your Azure Batch Service compute nodes?
I’ve used the Batch Service to handle the compute for my Azure Data Factory custom activities for a while now. Until now I’ve basically been doing that blindly, because the code execution and logging are surfaced through ADF, with no visibility of the underlying pool of VMs doing the work. Well, no more is this the case!
In the Azure portal go to your Batch Service > Pools > Select Pool > Nodes > Select Node > Connect.
The Connect button then gives you the option to add a new user, before showing you the node’s external IP address and offering an RDP file to download.
Once you’ve connected you’ll find a virtual machine, but with a few slight differences.
- The OS is on the D drive, rather than C.
- The amount of storage probably won’t match what you requested when you created the VM compute pool (that’s for another post).
- The VM has a bunch of special environment variables that you’ll want to use for any jobs being run. More info on these here: https://docs.microsoft.com/en-us/azure/batch/batch-compute-node-environment-variables
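To make the environment variables a little more concrete, here is a minimal Python sketch of how a task could read a few of them. The variable names come from the Batch documentation linked above; the helper name `read_batch_environment` and the choice of which variables to read are just assumptions for illustration. As far as I know these variables are set for processes that Batch launches, so a plain RDP session may well show them as not set.

```python
import os

# A handful of the variables described in the Batch documentation.
# Which ones you need depends on your job; this selection is illustrative.
BATCH_VARS = (
    "AZ_BATCH_POOL_ID",
    "AZ_BATCH_NODE_ID",
    "AZ_BATCH_JOB_ID",
    "AZ_BATCH_TASK_ID",
    "AZ_BATCH_TASK_WORKING_DIR",
)


def read_batch_environment(environ=None):
    """Return a dict of Batch variables; missing ones map to None.

    `environ` defaults to os.environ and can be swapped for a plain
    dict when trying this out away from a Batch node.
    """
    env = os.environ if environ is None else environ
    return {name: env.get(name) for name in BATCH_VARS}


if __name__ == "__main__":
    for name, value in read_batch_environment().items():
        shown = value if value is not None else "<not set>"
        print(f"{name} = {shown}")
```

Passing a dictionary in place of the real environment makes it easy to try the helper on your own machine before running it on a node.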
The directory on the VM used for any ADF custom activities will be something like the following path:
C:\user\tasks\workitems\adf-{guid}\job-0000000001\{guid}-{activityname}-\wd\
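If you want to locate those working directories once you are on the node, something like the following Python sketch could help. It assumes the layout matches the example path above (the post only says the path is "something like" this), and the function name `find_adf_working_dirs` and the default root are hypothetical choices for illustration.

```python
from pathlib import Path


def find_adf_working_dirs(tasks_root):
    """List 'wd' folders that match the ADF custom activity layout.

    Assumed layout, based on the example path in this post:
    <tasks_root>/workitems/adf-*/job-*/<task folder>/wd
    """
    root = Path(tasks_root)
    pattern = "workitems/adf-*/job-*/*/wd"
    # Keep directories only, sorted so the output is stable.
    return sorted(p for p in root.glob(pattern) if p.is_dir())


if __name__ == "__main__":
    # Root taken from the example path; adjust it if your node differs.
    for working_dir in find_adf_working_dirs(r"C:\user\tasks"):
        print(working_dir)
```

The glob pattern only picks up work items whose names start with `adf-`, so folders belonging to other Batch jobs on the same node are left out.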
I hope this is helpful as you go beyond the basics of Creating Azure Data Factory Custom Activities.
Many thanks