Creating a U-SQL Date Dimension & Numbers Table in Azure Data Lake

Now we all know what a date dimension is and there are plenty of really great examples out there for creating them in various languages. Well, here’s my U-SQL version creating the output from scratch using a numbers table. Remember that U-SQL needs to be handled slightly differently because we don’t have any iterative functionality available. Plus its ability to massively parallelise jobs means we can’t write something that relies on procedural code.

This is version 1…

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
//Enter start and end dates for the date dimension:
DECLARE @StartDate DateTime = new DateTime(2017,1,1);
DECLARE @EndDate DateTime = new DateTime(2018,12,31);
 
//Create numbers table
@Numbers0 = SELECT * FROM (VALUES(0)) AS Row (Number);
@Numbers1 = SELECT [Number] FROM @Numbers0 
    UNION ALL SELECT [Number]+1 AS Number FROM @Numbers0;
@Numbers2 = SELECT [Number] FROM @Numbers1 
    UNION ALL SELECT [Number]+2 AS Number FROM @Numbers1;
@Numbers4 = SELECT [Number] FROM @Numbers2 
    UNION ALL SELECT [Number]+4 AS Number FROM @Numbers2;
@Numbers8 = SELECT [Number] FROM @Numbers4 
    UNION ALL SELECT [Number]+8 AS Number FROM @Numbers4;
@Numbers16 = SELECT [Number] FROM @Numbers8 
    UNION ALL SELECT [Number]+16 AS Number FROM @Numbers8;
@Numbers32 = SELECT [Number] FROM @Numbers16 
    UNION ALL SELECT [Number]+32 AS Number FROM @Numbers16;
@Numbers64 = SELECT [Number] FROM @Numbers32 
    UNION ALL SELECT [Number]+64 AS Number FROM @Numbers32;
//Double it again if you want it bigger...
 
//Create date dimension
@DateDimension = 
SELECT 
    int.Parse([Date].ToString("yyyyMMdd")) AS DateKey,
    [Date],
    [Date].ToString("dd/MM/yyyy") AS DateString,
    [Date].Day AS Day,
    [Date].Month AS Month,
    Math.Floor(((decimal)[Date].Month + 2) / 3) AS Quater,
    [Date].Year AS Year,
    Convert.ToInt32([Date].DayOfWeek) + 1 AS DayOfWeekNo,
    [Date].ToString("dddd") AS DayName,
    [Date].ToString("MMMM") AS MonthName,
    [Date].Month >=4 ? [Date].ToString("yyyy")+"/"+([Date].AddYears(+1)).ToString("yy") 
        : ([Date].Year - 1).ToString() + "/" + [Date].ToString("yy") AS FinancialYear,
    DateTimeFormatInfo.CurrentInfo.Calendar.GetWeekOfYear(
        [Date], CalendarWeekRule.FirstDay, System.DayOfWeek.Sunday) AS WeekNoOfYear
FROM
    (
    SELECT 
        @StartDate.AddDays(Convert.ToDouble([RowNumber]) -1) AS Date
    FROM 
        (
        SELECT
            ROW_NUMBER() OVER (ORDER BY n1.[Number]) AS RowNumber
        FROM 
            @Numbers64 AS n1
            CROSS JOIN @Numbers64 AS n2 //make it big!
        ) AS x
    ) AS y
WHERE
    [Date] <= @EndDate; //cheat to cut off results
 
 
//Output files
OUTPUT @DateDimension
TO "/Stuff/DateDimension.csv"
ORDER BY [Date] ASC
USING Outputters.Csv(quoting : true, outputHeader : true);
 
//Get a numbers table as a bonus :-)
OUTPUT @Numbers64
TO "/Stuff/Numbers.csv"
ORDER BY [Number] ASC
USING Outputters.Csv(quoting : true, outputHeader : true);

In version 2 I may use a replicated string that EXPLODE’s from an array rather than using a numbers table. But that’s for another time. I included the numbers table as an output in this one as a little bonus 🙂

Hope this helps you out while swimming in the Azure Data Lake.

Many thanks for reading.


Leave a Reply

Your email address will not be published. Required fields are marked *

HTML tags are not allowed.

1,037 Spambots Blocked by Simple Comments

Paul’s Frog Blog

Paul is a Microsoft Data Platform MVP with 10+ years’ experience working with the complete on premises SQL Server stack in a variety of roles and industries. Now as the Business Intelligence Consultant at Purple Frog Systems has turned his keyboard to big data solutions in the Microsoft cloud. Specialising in Azure Data Lake Analytics, Azure Data Factory, Azure Stream Analytics, Event Hubs and IoT. Paul is also a STEM Ambassador for the networking education in schools’ programme, PASS chapter leader for the Microsoft Data Platform Group – Birmingham, SQL Bits, SQL Relay, SQL Saturday speaker and helper. Currently the Stack Overflow top user for Azure Data Factory. As well as very active member of the technical community.
Thanks for visiting.
@mrpaulandrew