Creating a U-SQL Date Dimension & Numbers Table in Azure Data Lake

Now we all know what a date dimension is and there are plenty of really great examples out there for creating them in various languages. Well, here’s my U-SQL version creating the output from scratch using a numbers table. Remember that U-SQL needs to be handled slightly differently because we don’t have any iterative functionality available. Plus its ability to massively parallelise jobs means we can’t write something that relies on procedural code.

This is version 1…

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
//Enter start and end dates for the date dimension:
DECLARE @StartDate DateTime = new DateTime(2017,1,1);
DECLARE @EndDate DateTime = new DateTime(2018,12,31);
 
//Create numbers table
@Numbers0 = SELECT * FROM (VALUES(0)) AS Row (Number);
@Numbers1 = SELECT [Number] FROM @Numbers0 
    UNION ALL SELECT [Number]+1 AS Number FROM @Numbers0;
@Numbers2 = SELECT [Number] FROM @Numbers1 
    UNION ALL SELECT [Number]+2 AS Number FROM @Numbers1;
@Numbers4 = SELECT [Number] FROM @Numbers2 
    UNION ALL SELECT [Number]+4 AS Number FROM @Numbers2;
@Numbers8 = SELECT [Number] FROM @Numbers4 
    UNION ALL SELECT [Number]+8 AS Number FROM @Numbers4;
@Numbers16 = SELECT [Number] FROM @Numbers8 
    UNION ALL SELECT [Number]+16 AS Number FROM @Numbers8;
@Numbers32 = SELECT [Number] FROM @Numbers16 
    UNION ALL SELECT [Number]+32 AS Number FROM @Numbers16;
@Numbers64 = SELECT [Number] FROM @Numbers32 
    UNION ALL SELECT [Number]+64 AS Number FROM @Numbers32;
//Double it again if you want it bigger...
 
//Create date dimension
@DateDimension = 
SELECT 
    int.Parse([Date].ToString("yyyyMMdd")) AS DateKey,
    [Date],
    [Date].ToString("dd/MM/yyyy") AS DateString,
    [Date].Day AS Day,
    [Date].Month AS Month,
    Math.Floor(((decimal)[Date].Month + 2) / 3) AS Quater,
    [Date].Year AS Year,
    Convert.ToInt32([Date].DayOfWeek) + 1 AS DayOfWeekNo,
    [Date].ToString("dddd") AS DayName,
    [Date].ToString("MMMM") AS MonthName,
    [Date].Month >=4 ? [Date].ToString("yyyy")+"/"+([Date].AddYears(+1)).ToString("yy") 
        : ([Date].Year - 1).ToString() + "/" + [Date].ToString("yy") AS FinancialYear,
    DateTimeFormatInfo.CurrentInfo.Calendar.GetWeekOfYear(
        [Date], CalendarWeekRule.FirstDay, System.DayOfWeek.Sunday) AS WeekNoOfYear
FROM
    (
    SELECT 
        @StartDate.AddDays(Convert.ToDouble([RowNumber]) -1) AS Date
    FROM 
        (
        SELECT
            ROW_NUMBER() OVER (ORDER BY n1.[Number]) AS RowNumber
        FROM 
            @Numbers64 AS n1
            CROSS JOIN @Numbers64 AS n2 //make it big!
        ) AS x
    ) AS y
WHERE
    [Date] <= @EndDate; //cheat to cut off results
 
 
//Output files
OUTPUT @DateDimension
TO "/Stuff/DateDimension.csv"
ORDER BY [Date] ASC
USING Outputters.Csv(quoting : true, outputHeader : true);
 
//Get a numbers table as a bonus :-)
OUTPUT @Numbers64
TO "/Stuff/Numbers.csv"
ORDER BY [Number] ASC
USING Outputters.Csv(quoting : true, outputHeader : true);

In version 2 I may use a replicated string that EXPLODE’s from an array rather than using a numbers table. But that’s for another time. I included the numbers table as an output in this one as a little bonus 🙂

Hope this helps you out while swimming in the Azure Data Lake.

Many thanks for reading.


Paul’s Frog Blog

This blog is now closed to new comments or posts.

Please see new posts here