PowerShell

Recursive U-SQL With PowerShell (U-SQL Looping)

In its natural form U-SQL does not support recursive operations, and for good reason. This is a big data, scale-out, declarative language where the inclusion of procedural, iterative code would be very unnatural. That said, if you must pervert things, PowerShell can assist with the looping and, dare I say, opens up the possibility of dynamic U-SQL.

A couple of caveats…

  • From the outset, I accept this abstraction with PowerShell to achieve some iterative process in U-SQL is a bit of a hack and very inefficient, certainly in the below example.
  • The scenario here is not perfect and created using a simple situation for the purposes of explanation only. I know the same data results could be achieved just by extending the aggregate grouping!

Hopefully that sets the scene. As I’m writing, I’m wondering if this blog post will be hated by the purists out there. Or loved by the abstraction hackers. I think I’m both 🙂

Scenario

As with most of my U-SQL blog posts I’m going to start with the Visual Studio project available as part of the data lake tools extension called ‘U-SQL Sample Application’. This gives us all the basic start up code and sample data to get going.

Input: within the solution (Samples > Data > Ambulance Data) we have some CSV files for vehicles. These are separated into 16 source datasets covering 4 different vehicle IDs across 4 days.

Output: let’s say we have a requirement to find out the average speed of each vehicle per day. Easy enough with a U-SQL wildcard on the extractor. But we also want to output a single file for each day of data. Not so easy, unless we write one query for each day of data. Fine with samples only covering 4 days, not so fine with 2 years of records split by vehicle.

Scenario set, let’s look at how we might do this.

The U-SQL

To produce the required daily outputs I’m going to use a U-SQL query in isolation to return a distinct list of dates across the 16 input datasets, plus a parameterised U-SQL stored procedure to do the aggregation and output a single day of data.

First getting the dates. The below simply returns a text file containing a distinct list of all the dates in our source data.

DECLARE @InputPath string = "/Samples/Data/AmbulanceData/{filename}";

// Read every vehicle file; {filename} is a virtual column matching any file name.
@DATA =
    EXTRACT 
        [vehicle_id] int,
        [entry_id] long,
        [event_date] DateTime,
        [latitude] float,
        [longitude] float,
        [speed] int,
        [direction] string,
        [trip_id] int?,
        [filename] string
    FROM 
        @InputPath
    USING 
        Extractors.Csv();

// Distinct list of dates, formatted to match the stored procedure parameter.
@DateList =
    SELECT DISTINCT 
        [event_date].ToString("yyyyMMdd") AS EventDateList
    FROM 
        @DATA;

OUTPUT @DateList
TO "/output/AmbulanceDataDateList.txt"
USING Outputters.Csv(quoting : false, outputHeader : false);

Next, the stored procedure below. This uses the same input files, but does the required aggregation and outputs a single daily file matching the date parameter passed.

CREATE PROCEDURE IF NOT EXISTS [dbo].[usp_OutputDailyAvgSpeed]
    (
    @OutputDate string
    )
AS
BEGIN

    //DECLARE @OutputDate string = "20140914"; //FOR dev
    DECLARE @InputPath string = "/Samples/Data/AmbulanceData/{filename}";
    DECLARE @OutputPath string = "/output/DailyRecords/VehicleAvgSpeed_" + @OutputDate + ".csv";

    @DATA =
        EXTRACT 
            [vehicle_id] int,
            [entry_id] long,
            [event_date] DateTime,
            [latitude] float,
            [longitude] float,
            [speed] int,
            [direction] string,
            [trip_id] int?,
            [filename] string
        FROM 
            @InputPath
        USING 
            Extractors.Csv();

    // Average speed per vehicle for the single day passed in.
    @VAvgSpeed =
        SELECT DISTINCT 
            [vehicle_id],
            AVG([speed]) AS AverageSpeed
        FROM 
            @DATA
        WHERE
            [event_date].ToString("yyyyMMdd") == @OutputDate
        GROUP BY
            [vehicle_id];

    OUTPUT @VAvgSpeed
    TO @OutputPath
    USING Outputters.Csv(quoting : true, outputHeader : true);

END;

At this point, we could just execute the stored procedure with each required date, manually crafted from the text file. Like this:

[dbo].[usp_OutputDailyAvgSpeed]("20140914");
[dbo].[usp_OutputDailyAvgSpeed]("20140915");
[dbo].[usp_OutputDailyAvgSpeed]("20140916");
[dbo].[usp_OutputDailyAvgSpeed]("20140917");

Fine, for small amounts of data, but we can do better for larger datasets.

Enter PowerShell and some looping.

The PowerShell

As with all things Microsoft, PowerShell is our friend, and the supporting cmdlets for the Azure Data Lake services are no exception. I recommend these links if you haven’t yet written some PowerShell to control ADL Analytics jobs or upload files to ADL Storage.

Moving on. How can PowerShell help us script our data output requirements? Well, here’s the answer. In my PowerShell script below I’ve done the following.

  1. Authenticate against my Azure subscription (optionally create yourself a PSCredential to do this).
  2. Submit the first U-SQL query as a file to return the distinct list of dates.
  3. Wait for the ADL Analytics job to complete.
  4. Download the output text file from ADL storage.
  5. Read the contents of the text file.
  6. Iterate over each date listed in the text file.
  7. Submit a U-SQL job that calls the stored procedure for each date in the list.

#Params...
$WhereAmI = $MyInvocation.MyCommand.Path.Replace($MyInvocation.MyCommand.Name,"")

$DLAnalyticsName = "myfirstdatalakeanalysis" 
$DLAnalyticsDoP = 10
$DLStoreName = "myfirstdatalakestore01"

#Create Azure Connection
Login-AzureRmAccount | Out-Null

$USQLFile = $WhereAmI + "RecursiveOutputPrep.usql"
$PrepOutput = $WhereAmI + "AmbulanceDataDateList.txt"

#Submit job
$job = Submit-AzureRmDataLakeAnalyticsJob `
    -Name "GetDateList" `
    -AccountName $DLAnalyticsName `
    -ScriptPath $USQLFile `
    -DegreeOfParallelism $DLAnalyticsDoP

Write-Host "Submitted USQL prep job."

#Wait for job to complete
Wait-AdlJob -Account $DLAnalyticsName -JobId $job.JobId | Out-Null

Write-Host "Downloading USQL output file."

#Download date list (path matches the OUTPUT statement in the prep U-SQL)
Export-AzureRmDataLakeStoreItem `
    -AccountName $DLStoreName `
    -Path "/output/AmbulanceDataDateList.txt" `
    -Destination $PrepOutput | Out-Null

Write-Host "Downloaded USQL output file."

#Read dates, skipping any blank lines
$Dates = Get-Content $PrepOutput | Where-Object { $_.Trim() -ne "" }

Write-Host "Read date list."

#Loop over dates with proc call for each
ForEach ($Date in $Dates)
    {
    $USQLProcCall = '[dbo].[usp_OutputDailyAvgSpeed]("' + $Date + '");'
    $JobName = 'Output daily avg dataset for ' + $Date

    Write-Host $USQLProcCall

    $job = Submit-AzureRmDataLakeAnalyticsJob `
        -Name $JobName `
        -AccountName $DLAnalyticsName `
        -Script $USQLProcCall `
        -DegreeOfParallelism $DLAnalyticsDoP

    Write-Host "Job submitted for " $Date
    }

Write-Host "Script complete. USQL jobs running."

At this point I think it’s worth reminding you of my caveats above 🙂

I would like to point out the flexibility in the PowerShell cmdlet Submit-AzureRmDataLakeAnalyticsJob. It allows us to pass a U-SQL file with the -ScriptPath switch (step 2), or build up a U-SQL string dynamically within the PowerShell script and pass that as the execution code with the -Script switch (step 7). Very handy.
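As a rough sketch of that dynamic U-SQL idea, the per-date procedure calls could also be concatenated into one script string and submitted as a single job rather than one job per date. The function name New-UsqlProcCallScript is hypothetical and not part of any Azure module, and I haven’t compared the cost or run time of one combined job against many small ones. It assumes the dates arrive in yyyyMMdd form and quietly skips anything that doesn’t match.

```powershell
# Hypothetical helper: turns a list of yyyyMMdd strings into one U-SQL script
# containing a stored procedure call per date.
function New-UsqlProcCallScript {
    param(
        [string[]] $Dates,
        [string] $ProcName = '[dbo].[usp_OutputDailyAvgSpeed]'
    )

    $calls = foreach ($d in $Dates) {
        $clean = "$d".Trim()
        # Ignore blank lines or anything that is not exactly 8 digits.
        if ($clean -match '^\d{8}$') {
            $ProcName + '("' + $clean + '");'
        }
    }

    # One procedure call per line.
    return (@($calls) -join "`n")
}

$Script = New-UsqlProcCallScript -Dates @('20140914', '', '20140915')
Write-Host $Script
```

The resulting string could then be passed to Submit-AzureRmDataLakeAnalyticsJob using -Script, in the same way as the single calls in the loop above.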

If all goes well you should have jobs being prepared and shortly after running to produce the daily output files.

I used 10 AUs for my jobs because I wanted to burn up some old Azure credits, but you can change this in the PowerShell variable $DLAnalyticsDoP.

Conclusion

It’s possible to achieve looping behaviour with U-SQL when we want to produce multiple output files, but only when we abstract the iterative behaviour away to our friend PowerShell.

Comments welcome on this approach.

Many thanks for reading.

 

P.S. To make life a little easier, I’ve stuck all of the above code and sample data into a GitHub repository to save you copying and pasting things from the code windows above.

https://github.com/mrpaulandrew/RecursiveU-SQLWithPowerShell

 

 


Changing the Start Up App on Windows 10 IoT Core

Changing the start up application on your IoT device running Windows IoT Core seems to be a common requirement once the device is out in the field, so I hope this post exploring a few different ways of doing this will be of value to people.

To clarify, the behaviour we will achieve below is this: from device power on, our Windows 10 IoT Core operating system will boot up, but instead of starting the default application, which is originally named ‘IoTCoreDefaultApp’, we will load our own custom created UWP app executable file. I will not talk about deploying your UWP app to the device; the assumption here is that this has already been done and a compiled .EXE file is living on the device’s local storage.

Application & Package Name

Before diving into any changes it’s worth taking a moment to figure out what our UWP application is called. This might seem obvious, but depending on how you make this change (via the web portal or PowerShell) it’s referred to by different names. Both are configured in Visual Studio under the Package.AppxManifest and do not have to be the same. See below with screen shots from a new vanilla UWP project.

  • Firstly, the application name, which is the friendly display name given to the solution on creation and the EXE file presented in the settings web portal.

UWPAppName

 

  • Secondly, the package name, which is the not so friendly value (defaulted to a GUID) used as a prefix for the package family name. This is what PowerShell uses and it needs to be exact.

UWPPackName

Assuming you’ve navigated that minefield and you know what your UWP app is called depending on the context where it’s used, let’s move on.

Headed vs Headless

Sorry, a tiny bit more detail to appreciate before we actually do this. Start up apps here can take two different forms: headed or headless. The difference between them is best thought of as running in the foreground or running in the background, with the foreground apps being visible via the device’s display output.

  • Headed = foreground. Visible via display.
  • Headless = background. Not visible, like a Windows service. Also you have no head 🙂

Choose as required and be mindful that you can only have one of each running.

Web Portal

OK, the easy way to make this change and make your app start up post OS boot: navigate to your device’s IP address and authenticate against the web portal. Here in the Apps section you’ll find anything currently deployed to your Windows IoT Core system. Under the Startup field in the Apps table, simply click the Set as Default App link for the application you now want to start after the operating system has booted up.

UWPPortalStartupAPP

At the point of making this selection the table content will refresh to reflect the change and the application will also start running.

You can now restart your device and your app will load without any further intervention.

PowerShell

Given the ease of the above, why would you want to use PowerShell, Paul? Well, if you’ve got to do this en masse for 30 devices you’ll want to script things. Or, as I found out when Windows 10 IoT Core was still in preview for the Raspberry Pi 3, the App panel in the web portal above had a bug and just didn’t display anything!

Moving on, create yourself a remote management session to your IoT device. If you’re unfamiliar with doing this the IoT Dashboard can help. Check out my previous blog post Controlling Your Windows 10 IoT Core Device for guidance.

Within the remote management session you’ll need the command IoTStartUp. When using a new command for the first time I always like to add the help switch /? to check out what it can do.

Next work through the following commands in your PowerShell session.

## 1.
## Review the list of apps currently available on your device.
IoTStartUp List
## If your expected app isn't there then it hasn't been deployed to the device.
 
## 2.
## Copy the package family name from the list or from Visual Studio if you still have it open.
 
## 3.
## Set the app as headed using the following command.
IoTStartUp Add Headed YOURAPPFAMILYNAME
 
## 4.
## To confirm which apps are now set for startup use:
IoTStartUp Startup

Here’s the same thing in a PowerShell window just to be clear.

UWPPowerShellStartupApp

Again to prove this you can now restart the device. From PowerShell use shutdown -r.
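For the scripting-across-many-devices case mentioned earlier, the same IoTStartUp command could be pushed out in a loop. Treat this as a sketch only: the IP addresses and package family name are placeholders, it assumes every device is already in your TrustedHosts list and shares the same administrator credential, and I haven’t run it against 30 devices. The $RunRemotely switch is my own addition so the script does nothing until you opt in.

```powershell
# Placeholder values - replace with your own device addresses and package family name.
$DeviceIPs = @('192.168.1.2', '192.168.1.3')
$PackageFamilyName = 'YOURAPPFAMILYNAME'

# Script block run on each device; it calls the same IoTStartUp command as above.
$SetStartupApp = {
    param($FamilyName)
    IoTStartUp Add Headed $FamilyName
}

# Safety switch: leave as $false to just list what would be done.
$RunRemotely = $false

if ($RunRemotely) { $Cred = Get-Credential }

foreach ($IP in $DeviceIPs) {
    if ($RunRemotely) {
        Invoke-Command -ComputerName $IP -Credential $Cred `
            -ScriptBlock $SetStartupApp -ArgumentList $PackageFamilyName
    }
    else {
        Write-Host "Would set $PackageFamilyName as the headed startup app on $IP"
    }
}
```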

Restoring the Start Up Default

To change this back to the default application either repeat the above PowerShell steps for the package family name IoTCoreDefaultApp_1w720vyc4ccym!App or in the web portal set the app name IoTCoreDefaultApp as the default.

Many thanks for reading


Controlling Your Windows 10 IoT Core Device

If, like me, you have several Raspberry Pis doing different things and running different operating systems, controlling them can sometimes be a bit of a challenge, especially with a wealth of remote control protocols and command line tools to choose from. In this post I’m specifically exploring the different ways to control your Raspberry Pi running Windows 10 IoT Core.

The thing that surprised me about Microsoft’s “core” flavour of operating systems is just how much they appear on the network like a normal workstation. This might sound silly and very obvious, but for the longest time when installing servers in Hyper-V I would shy away from the minimalist core install because I very wrongly assumed this meant you only got a black CMD prompt and nothing more. With Windows 10 IoT Core I can assure you that this is certainly not the case. Apart from the lack of a pretty tiled Start Menu, the operating system (OS) is feature rich and very easy to work with.

Let’s look at what we can use to control our Pi and Core OS.

The Old Fashioned Physical Method

PiShot1

Just for completeness it’s probably best to start with the traditional approach of using physical things like an HDMI cable plugged into the back of your monitor plus a keyboard and mouse connected via USB ports. This is obviously very basic and probably not why you’ve been drawn to this blog post. However, even via this method you do get a graphical interface on screen with the ability to alter things like the WiFi settings and actually shut down the device correctly without pulling the plug.

Windows Core Browser Console

Next, and by far my favourite feature in Windows 10 IoT Core, is the browser based console that can be connected to using the IP address of the device followed by port 8080. E.g. http://192.168.1.2:8080.

If you navigate to the equivalent address of the device on your network you will be prompted for a set of admin credentials and then taken to a lovely bunch of pages which include a wealth of configuration options. When developing this satisfies almost all of my immediate needs for controlling the device and starting up deployed applications.


IoTSetup


The browser console even includes a Task Manager style page of running processes with some pretty real time performance graphs for CPU, RAM, I/O and Network usage. See slide 5 of 16 in the GIF above.

Let’s move on before I turn this blog post into a page of screen shots!

Windows IoT Remote Client

IoTClient

Next we have the IoT Remote Client app, which as I type is officially still in preview. However, this is another really helpful way to control your device. The desktop application, available from the Microsoft Store, is basically the IoT Core operating system RDP client, so no need for VNC server licencing between Raspbian and Windows to worry about.

IoTRemoteClient

Store link: https://www.microsoft.com/en-gb/store/apps/windows-iot-remote-client/9nblggh5mnxz

To get this running, first go to the Remote page in the browser based console I mentioned above and tick the box to Enable Windows IoT Remote Server (AKA Terminal Services). Next install the app and start it up. If all is well it will detect the IoT device on your network and allow you to connect, or you can just enter the IP address in the box provided. Post authentication you’ll then have a view of your IoT device matching exactly what the HDMI cable can offer, plus the ability to interact with the device with a keyboard and mouse through the remote session. When starting apps that include something graphical it’s really useful to see the thing. Another use case might be when performing cloud to device messaging, where having the received content on screen is nice to see.

This method of remote control is where I actually pulled the first screen shot above from, rather than taking a picture of the monitor displaying the HDMI output.

IoT Dashboard

Another really useful bit of software in the IoT device toolbox is the IoT Dashboard.

Download link: http://go.microsoft.com/fwlink/?LinkID=708576

This walks you through setting up a new device SD card, plus gives you a My Devices window for easily launching the next two features detailed below.
IoTDashboard

PowerShell

What can’t you do with PowerShell in the Microsoft world?

Before the IoT Dashboard, connecting to the IoT Core device required starting the PowerShell remoting service (WinRM), then adding the device’s address to the trusted hosts item, etc. Like this:

$MyDeviceIP = "192.168.1.2" #Example
 
Net start WinRM
 
Set-Item WSMan:\localhost\Client\TrustedHosts -Value $MyDeviceIP
 
Enter-PSSession -ComputerName $MyDeviceIP -Credential "localhost\Administrator"

Now, if using the IoT Dashboard above, from My Devices you have the option just to right click on the auto detected device and select Connect using PowerShell. This handles all the above prerequisites for connecting and only prompts for credentials. Lovely!

PStoPi

Note: both PowerShell methods of connecting to the device will require elevated permissions.
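One thing to be aware of with the manual commands: Set-Item replaces whatever is already in TrustedHosts, which matters if you have more than one device. A minimal sketch of appending instead, using the WSMan provider’s -Concatenate switch. The IP address is just an example of a second device, and the session still needs to be elevated with WinRM running.

```powershell
$MyDeviceIP = "192.168.1.3" #Example second device

#Append to the existing TrustedHosts list instead of overwriting it
Set-Item WSMan:\localhost\Client\TrustedHosts -Value $MyDeviceIP -Concatenate -Force

#Check the resulting list
Get-Item WSMan:\localhost\Client\TrustedHosts
```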

Admin File Share

Lastly, and again really for completeness, we have the traditional UNC path. From the IoT Dashboard this is made easy in My Devices, this time by selecting Open Network Share. This brings up a file explorer window to the C$ admin share for browsing the file system on the SD card.

PiShot5

For anybody that is used to seeing the contents of the C drive on a Windows machine this folder structure will look very familiar. Program Files etc.

Other

Sadly despite a few attempts things like Remote Registry access, remote connection to Services using your local snap-in console and Computer Management are not currently supported. The Service to enable Remote Registry simply isn’t there, along with other remote management services on Windows 10 IoT Core. If this changes I’ll be sure to update this post.

Many thanks for reading.


Using Hyper-V and PowerShell to Create the Perfect Developer Workstation

So the second challenge I faced after unboxing and plugging in my new workstation at Purple Frog Systems was software. Having customers using all versions of SQL Server from 2005 onwards I didn’t want to tie myself to a particular version locally and I also didn’t want the hassle of running lots of different SQL Server services all on the same host operating system. Especially if I wanted to use Windows 10 as my host, which as we know would have compatibility issues with SQL Server 2005 and our old friend BIDS.

WindowsFeaturesHyperV

Enter Microsoft Hyper-V

In Windows 8 onwards Hyper-V is available out of the box as a feature which can simply be switched on from Control Panel > Programs and Features > Turn Windows Features On or Off. Apologies in advance if the home editions don’t support this.

Note: be sure to select the Hyper-V Module for Windows PowerShell.

With my Windows 10 host now running Hyper-V management services, I set about creating a bunch of virtual machines to serve up every version of SQL Server required.

Whenever starting the Hyper-V manager always do so as an administrator. The user interface throws some odd errors without the elevated permissions which could lead you down a rabbit hole of Google searches. Trust me!

Creating a Virtual Switch

For those of you new to Hyper-V, creating a virtual switch for your guest virtual machines is one of the most important things to get right and sort out first. It’s very simple to do, but without it your network admins and DHCP police might start questioning what on earth your physical network connection is doing. Again, trust me on this!

HyperVCreateSwitch

From your Hyper-V Manager toolbar go to Action > Virtual Switch Manager. In the right hand panel of the dialogue window that appears choose the switch type of External and click Create Virtual Switch. Give the switch a name and select the External Network that the new switch will basically impersonate. This will be a short list if you only have one physical network connection.

With an external virtual switch in place any guest machine set up to use it as a network resource will appear on your network in the same way your physical host does, with its own MAC address, IP address, hostname etc. Also be mindful that Hyper-V will take over how your network connections appear in your host operating system and a CMD ipconfig /all will start to look a little more complex, but this is nothing to worry about and perfectly normal.

Creating the Guest Virtual Machines

To offer some repeatability when creating my guest virtual machines I broke out the PowerShell, hence including the module at install time. With a few parameterised cmdlets I was able to setup several empty virtual machines ready for my guest operating systems. Example below.

#Create hyperv machine
New-VM `
    -Name $VMName `
    -Path $VMLocation `
    -MemoryStartupBytes $RAM `
    -NewVHDPath $VHDOSFile `
    -NewVHDSizeBytes $VHDSizes `
    -SwitchName $NetworkSwitch `
    -Generation $Generation

#Set Number of CPUs
Set-VMProcessor `
    -VMName $VMName `
    -Count $CPUs

#Create data disk
New-VHD `
    -Path $VHDDataFile `
    -Dynamic `
    -SizeBytes $VHDSizes

Download the full script I used here.
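The snippet above relies on variables that are set earlier in the full script. To give a feel for what they hold, here is a sketch with the same variable names. Every value is an illustrative placeholder (VM name, paths, switch name, CPU count and generation are all assumptions), with only the 2048 MB startup RAM and 100 GB disk size taken from figures mentioned elsewhere in this post.

```powershell
# Illustrative placeholder values only - adjust names, paths and sizes to suit.
$VMName        = "PF-SQL2012"
$VMLocation    = "D:\HyperV"
$RAM           = 2048MB
$VHDOSFile     = "$VMLocation\$VMName\$VMName-OS.vhdx"
$VHDDataFile   = "E:\HyperV\$VMName-Data.vhdx"
$VHDSizes      = 100GB
$NetworkSwitch = "External Switch"
$Generation    = 1
$CPUs          = 2

Write-Host "OS disk: $VHDOSFile ($($VHDSizes / 1GB) GB)"
```

PowerShell’s MB and GB suffixes expand to byte counts, which is what the -MemoryStartupBytes, -NewVHDSizeBytes and -SizeBytes parameters expect.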

Note: I’m not a PowerShell expert, just a big fan of its uses.

You’ll see in the full script that I give my virtual machines two hard drive files each on different controllers (IDE and SCSI), which are in turn created on different host hard drives, mainly because my host has three physical disks. My original intention was to have all operating systems on mechanical disks and all data on solid state disks. So for guest SQL Server installs the virtual machine data disk would be used to house databases attached to the instance. In practice this didn’t really pay off, but then this is just another advantage of using Hyper-V: the underlying resources can be moved around as required.

If you have time I would recommend having your ISOs on a separate physical disk to the guest hard drive files. This greatly speeds up installation. You could even run the guest OS hard drive files on your solid state drive just for installation, then move them to mechanical disks afterwards.

The Guests

Assuming you now have a bunch of virtual machine shells in place I’d recommend the following operating systems vs versions of SQL Server. Mainly to avoid any compatibility issues.

  • SQL Server 2005 – Windows Server 2003
  • SQL Server 2008 – Windows Server 2008
  • SQL Server 2012 – Windows Server 2012
  • SQL Server 2014 – Windows Server 2012
  • SQL Server 2016 – Windows Server 2016 (assuming the OS is released in time)

I’m not going to talk about installing Windows and SQL Server on each guest. I’m assuming you’re familiar with that Next, Next, Next process (if you want the defaults!). However, I would say that for SQL Server 2012 onwards I did this twice for Analysis Services to give me both Tabular and Multidimensional services in my guest virtual machine.

The Goal

If you’ve worked through this post and set up your host in the same way I have, you’ll now be able to enjoy the advantages and flexibility of running all versions of SQL Server at the same time, with all SQL services in guest virtual machines on your developer workstation.

HyperVVMList

My virtual machines are also setup with dynamic RAM:

  • Initial value: 2048 MB
  • Min value: 512 MB
  • Max value: 8192 MB

The Memory Weight slider is great if you want to keep all your guest virtual machines running at the same time like I do. If doing some development work on SQL Server 2005 I simply increase that guest’s RAM priority, which dynamically adjusts the less used virtual machines so the host doesn’t get overwhelmed. Plus you’ll only be developing on one version of SQL Server at once, right?!

HyperVRAMWeight
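The same dynamic RAM settings could also be applied from PowerShell with Set-VMMemory rather than through the Hyper-V Manager dialogue. A minimal sketch: the VM name is hypothetical and the priority of 80 is an arbitrary example of a raised memory weight. As far as I’m aware the guest needs to be switched off before dynamic memory can be toggled, so treat this as something to run before first boot.

```powershell
# Hypothetical VM name - replace with one of your own guests.
$VMName = "PF-SQL2005"

# Values taken from the dynamic RAM settings listed above.
$MemorySettings = @{
    VMName               = $VMName
    DynamicMemoryEnabled = $true
    StartupBytes         = 2048MB
    MinimumBytes         = 512MB
    MaximumBytes         = 8192MB
    Priority             = 80   # Memory weight, 0 to 100; 80 is an arbitrary example
}

# Only attempt the change where the Hyper-V module is available.
if (Get-Command Set-VMMemory -ErrorAction SilentlyContinue) {
    Set-VMMemory @MemorySettings
}
else {
    Write-Host "Hyper-V PowerShell module not found, settings not applied."
}
```

Holding the values in a hashtable and splatting them keeps the call readable and makes it easy to reuse the same settings across several guests.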

The last thing you’ll probably want to do is install some SQL Server client tools on your Windows 10 host. I went for SQL Server 2012, plus Visual Studio 2012. Then SQL Server 2014, plus Visual Studio 2015. Installed in that order.

Final Words

Please be aware that the above was only possible with a Microsoft MSDN subscription which provided licencing for the guest operating systems and older versions of SQL Server.

During this setup I also had the advantage of being a domain admin, which made it easy to create things like service accounts for all my virtual machines, add my virtual machines to the domain and access SQL Server services from my host using AD authentication. If you’re not a DA, SQL authentication and local workgroups are OK, but have their limits for SSAS.

RDCMAddMachine

You may want to try using Remote Desktop Connection Manager to access your guest operating systems. In the latest version a VM ID can be included, giving console style access without needing to connect to each guest from the Hyper-V manager.

VM IDs can be found with the following bit of PowerShell.

Get-VM -Name "PF*" | Select-Object Name, Id | Sort-Object Name

Very last thing, I mentioned my host machine had three physical hard drives, the third of which is a huge 4 TB block. To keep my guest operating systems fairly customer independent you’ll have seen I only gave the standard virtual hard drives in PowerShell 100GB of space each. What I then do is give a guest an additional, much larger virtual disk which resides on the host’s big data volume. Work is done here. Then post project completion this customer specific virtual data disk can just be dropped/archived/moved away and the guest machine is ready for the next work item. It’s of course a management overhead, but helps keep things clean.

Many thanks for reading.


Paul’s Frog Blog

Paul is a Microsoft Data Platform MVP with 10+ years’ experience working with the complete on premises SQL Server stack in a variety of roles and industries. Now, as the Business Intelligence Consultant at Purple Frog Systems, he has turned his keyboard to big data solutions in the Microsoft cloud, specialising in Azure Data Lake Analytics, Azure Data Factory, Azure Stream Analytics, Event Hubs and IoT. Paul is also a STEM Ambassador for the networking education in schools’ programme, PASS chapter leader for the Microsoft Data Platform Group – Birmingham, and a SQL Bits, SQL Relay and SQL Saturday speaker and helper. He is currently the Stack Overflow top user for Azure Data Factory, as well as a very active member of the technical community.
Thanks for visiting.
@mrpaulandrew