By Nick Edwards
In this blog post we’ll take a quick look at creating a self-generating calendar table using DAX.
Dates are important if we want to perform time intelligence reporting on our data i.e. yearly sales, monthly sales, weekly sales, year to date sales or previous year sales.
We’ll be using the calendar function to create our date table, but there are other methods to do this such as CALENDARAUTO or GENERATESERIES.
Here is the syntax we’ll be using to generate our date table.
The calendar function returns a single column called “Date” which generates a continuous series of dates from the specified start date to the specified end date. So if we specified the start date to be the 01/01/2020 and the end date to be 31/12/2020 the function would generate 366 rows of distinct date data.
We can then use the add columns function to expand our calendar table further to add specific columns we wish to slice our data by i.e. year, month, quarter, week number, day…
Here is the syntax we will be using to expand our calender table.
Let’s use an example to further explore how these functions work in practice using some sample adventure works sales data. Here we use the “Get Data” icon to directly query our adventure works database and bring over a sample of sales data, named “AW_Sales”.
Now, to create our calendar table we need to click “New Table” in the modeling tab and enter the following function.
Note: I like to use variables in my date table just to keep the DAX looking clean and fuss free, but this isn’t necessary.
For the start date parameter we have used the FIRSTDATE() and for the end Date parameter we have used the LASTDATE() function. This is so we can extract the first and last “OrderDate” from our AW_Sales table. We could have also used the MIN() and MAX() function to deliver the same results using the newly created “Date” column. As we can see this has generated a sequential date list from the 01/01/2012 to the 31/12/2013 with 731 distinct rows of data.
Now we want to expand our calendar table using the newly created “Date” column with new columns which can slice and dice our data. To do this I use the ADDCOLUMNS function. Here I have added Year, Quarter, Year Month, Month Number, Month Name, Day Of Year, Day Of Month, Day Of Week and Day Name as columns.
Once we’ve done this, we will mark our newly created table as a date table to allow Power BI to recognize date hierarchies and time intelligence functions.
We can now view our newly generated calender table in the data view. Instantly when any new data enters our model from “AW_Sales”, the calendar table will expand accordingly due to the last date function used above.
Now it’s just a case of creating a one to many relationship between our new calendar table and our “AW_Sales” table. We’ll create a one to many relationship between “Calendar[Date]” and “AW_Sales[OrderDate]” as shown below.
Congratulations we have now created a fully fledged calendar table that can slice and dice our “AW_Sales” tale by any of the columns we have created in our calendar table, as shown in the example below.
For those that haven’t yet heard of DAX, it’s an expression language developed by Microsoft to perform calculations against PowerPivot. Stepping back one step further, PowerPivot is essentially a local Analysis Services cube that runs within Excel 2010.
I’ve heard plenty of comments from various sources about how DAX is the [multi-dimensional] query language of the future and how it’s going to kill off MDX. Ok…. well no, it’s not ok.
For starters they both exist for two different purposes. DAX is not a query language but an expression language. You can use MDX to query information from a cube and generate a pivot, you can’t with DAX as it is not a query language.
The best way to think of DAX is as an extension to Excel formulas, you can use it to perform a calculation against an existing set of cells, in this case, a PowerPivot dataset.
It is also similar to an MDX calculated member, and in fact supports a number of MDX functions (TotalYTD, ParallelPeriod etc.).
If your data is in a database: You would use SQL to query data from a database and import the results into Excel. You would then use Excel expressions/calculations to enhance the data.
If your data is in a cube: You would use an Excel pivot (or MDX query) to query the data and import the results into Excel. You then have to use a third party tool such as OLAP PivotTable Extensions to add expressions/calculations to enhance the data.
If your data is in PowerPivot: You would use PowerPivot to query the data and import the results into Excel. You would then use DAX to add calculations to enhance the data.
DAX is a fantastic expression tool, and one that provides significant power to PowerPivot, but no, it won’t replace MDX. My hope is that Microsoft will provide DAX capability for MDX queries as well, and not restrict it to PowerPivot queries. As I’ve shown in my previous blog post it’s a great expression language that would provide significant benefit to cube users.
In my previous post I explained how to create a calculated MDX member that projects full year data (sales etc.) based on existing year to date data.
In this post I’ll be doing exactly the same but in DAX, the new expression language used to enhance PowerPivot data. As it’s the same desired outcome, I’m not going to repeat the background, you’ll have to look at my previous post for that.
The expressions below assume that you have a single table ‘Sales’ with a [Date] column and an [Internet Sales Amount] column.
Step 1 – What are our total sales so far this year?
We use the TOTALYTD function to work this out for us
=TOTALYTD(SUM(Sales[Internet Sales Amount]) ,Sales[Date], all(Sales))
The first parameter is the expression we want to calculate, i.e. the sum of internet sales.
The second parameter specifies the date we’re calculating up to.
The third parameter is one that catches a lot of people out. We have to tell DAX the context of the date. As it stands the expression can only see the data in the selected row, by specifing all(Sales) in the filter we expand the expression to be able to look at all of the data.
Step 2 – How far are we though the year?
It’s here where DAX really shows an improvement in the functions available over and above what’s available in MDX. There’s a YEARFRAC function which calculates how far we are through the year.
=YEARFRAC( CONCATENATE("01/01/" ,year(Sales[Date])) ,Sales[Date]))
The first parameter is the start date, i.e. the 1st January. We have to build this using the year of the selected row to ensure we get the right year.
The second parameter is the date of the record we’re looking at, Sales[Date].
Step 3 – Project the value to the end of the year
We combine the two values by simply dividing the YTD figure by how far we are through the year
=TOTALYTD(SUM(Sales[Internet Sales Amount]) ,Sales[Date], all(Sales)) / YEARFRAC( CONCATENATE("01/01/" , year(Sales[Date])) ,Sales[Date]))
Thanks to Sandeep Nallabelli in the comments below or pointing out a bug in this, YearFrac should actually be calculated from 31st Dec instead of 1st Jan. Updated calculation should be:
=TOTALYTD(SUM(Sales[Internet Sales Amount]) ,Sales[Date], all(Sales)) / YEARFRAC( CONCATENATE("31/12/" , year(Sales[Date])-1) ,Sales[Date]))
Also, it is worth adding the YearFrac calculation as a calculated column in your date table so that you can use it in measures.
This chart shows how the full year run rate is adjusted throughout the year as the cumulative sales to date grows. At the start of the year it’s quite volatile, but from February it settles down with an acurate projection.
And it really is as easy as that.
Purple Frog spent a very interesting day at Microsoft last week, at one of their many events promoting the launch of SQL Server 2008 R2. Rafal Lukewiecki presented an entertaining (as always!) and informative series of talks covering the release, focusing on the enhanced Business Intelligence tools available.
The primary changes to note are
- Power Pivot – An in memory, client side add-in to Excel, that allows users to create virtual cubes on their desktop and analyse over 100m records of data virtually instantly
- DAX – A new expression language, designed for non-technical (probably more realistically, semi-technical) users to extend pivot tables and power pivot tables without having to learn MDX
- Report Components – In a report consisting of a couple of tables, a chart and a few gauges (gauges, sparklines & maps are all new features of SSRS), you can save each element as a component and re-use it in different reports. This should result in much less duplication of work.
- Report Builder 3 – A thin-client tool allowing end users to create Reporting Services reports. This is a big enhancement over its predecessor s it is finally fully compatible with reports created in the Business Intelligence Development Studio (BIDS), including report components.
- Master Data Services – A centralised tool and database intended to provide governance of your organisation’s master data (centralised list of products, fiscal calendar, regions etc.).
The enhancements to Reporting Services (SSRS) are very welcome, and should be of huge benefit to anyone either currently using SSRS or considering using it. I firmly believe that there are no comparable web based reporting engines that even come close for SME organisations when looking at the whole picture including cost of implementation, ease of use, flexibility and capability.
Master Data Services as a concept has been around for a long time, but there has never been a tool available to organisations to effectively implement it. This is Microsoft’s first proper stab at delivering a workable solution, and although I’m a big fan of the concept, and have no doubt of its benefit to a SME, I’m yet to be convinced that the tool is ready for a large scale corporate environment. Time will tell how scalable and manageable the system is, and credit has to go to Microsoft for starting the ball rolling.
The most impressive addition is without a doubt PowerPivot. In a nutshell, it’s a user defined OLAP cube wrapped up within Excel 2010, running entirely in memory on a user’s workstation. If you’ve not yet played with it or seen a demo, I’ll try and elaborate for you… Think about loading Excel with 1 million rows, and then imagine sorting and filtering a number of those columns [cue going out to lunch whilst waiting for Excel to catch up]. With PowerPivot, you can sort and filter over 100 million rows of data almost in an instant – it’s very impressive indeed!
That’s the snazzy demo bit, but to reduce it to a glorified spreadsheet is very harsh indeed. It allows a user to import multiple data sources and combine them together into a single dimensional data model, PowerPivot will create your own personal cube, without you having to build a warehouse, without knowing anything about MDX, dimension hierarchies, attribute relationships, granularity etc. etc.
Microsoft’s vision and reason for creating this tool is self-service BI, allowing users to create their own cubes, data analysis environments and reporting systems. And this is where I start to have a problem…
I can’t remember the last time I designed a data warehouse, where I did not find significant data quality problems, conflicting data, missing data, duplicated data etc.. I also find it hard to think of a situation where an end user (even a power user) is sufficiently clued up about the intricacies of a source OLTP database to be able to extract the right data and know what to do with it. Or if they are, a dozen other people in different departments have a different idea about how things work, resulting in many different versions of the truth.
I’m therefore (for now!) sticking with the opinion that it is still absolutely vital for an organisation to provide a clean, consistent, dimensionally modelled data warehouse as the basis for their BI/MI infrastructure. Tools like PowerPivot then sit very nicely on top to provide an incredibly powerful and beneficial user experience, but to try and use the emergence of self-service BI tools to usher in a new ‘non-data warehouse’ era is a very dangerous route which I hope people will avoid.
In summary – this release brings with it a fantastic host of new tools, but with great power comes great responsibility…