

SSAS Tabular String Imported as Integer

Let me start by saying that I think the SSAS Tabular model is great. But… there are a number of problems that Microsoft still need to iron out.

Not least of which is being able to import data properly directly from CSV/Text files. Yes you can import directly from csv, but you are given absolutely no control over the process, and this can lead to some serious problems.

One of these issues is the rather odd automated data type selection that is used by the Tabular import process. A column consisting of a combination of alpha and numerical text is more often than not treated as an integer, with all text information stripped out. And the Tabular model designer provides absolutely no way of changing this behaviour.

For example, take the following csv file, containing just three columns: ID, ItemCode and ItemValue.

[Screenshot: ChangeTabularDataType01 – the sample csv file]
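The data looks something like this (I’ve made up the values here to illustrate; the important bit is the non-numerical ItemCode in the 5th row):

  ID,ItemCode,ItemValue
  1,1001,25.50
  2,1002,13.20
  3,1003,9.99
  4,1004,42.00
  5,AB12X,7.35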

The second column should clearly be interpreted as text, as the 5th row contains a non-numerical value. However when this is imported into the Tabular model, it treats the column as an integer. As you can see from the screenshot below, because the 5th row doesn’t contain a valid integer, the value is just ignored.

[Screenshot: ChangeTabularDataType02 – the imported table, with the text value stripped out]

One would expect that we could simply go into the table in the designer, and update the Data Type property for the column. No. This just takes the numerical value and formats it as text. But any non-numerical values are still stripped out. The problem is that when the Tabular model reloads the file, it detects that the data type is an integer, and there’s nothing we can do to override it.

The only way of getting around this is by wrapping the strings in quotes within our csv.

[Screenshot: ChangeTabularDataType03 – the csv file with strings wrapped in quotes]
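Using the same made-up data as before, the quoted version of the file looks like this:

  ID,ItemCode,ItemValue
  1,"1001",25.50
  2,"1002",13.20
  3,"1003",9.99
  4,"1004",42.00
  5,"AB12X",7.35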

This is irritating, but we really don’t have a choice; just remember to be explicit in any csv definition that is to be imported directly into a Tabular model.

However, what if we find this out too late? If we’ve already built the model, added all of our DAX calculations in, set up the relationships etc., how do we change the format of an already created column?

Firstly we have to get the csv updated to wrap every string column in quotes.

If we try to just reimport this, we’ll get an error: “Unable to convert a value to the data type requested for table ‘xxx’ column ‘xxx’. The current operation was cancelled because another operation in the transaction failed.”

[Screenshot: ChangeTabularDataType04 – the processing error]

So first you have to change the column data type for the table in the designer. Click on the column, then in the column properties, change Data Type to Text.

[Screenshot: ChangeTabularDataType05 – changing the column Data Type to Text]

Once this is done, you can reprocess the table and import the actual text.

This is all well and good, and works most of the time. However, I recently encountered a Tabular model which had this problem, and the above process wouldn’t work. So the only solution I found was to go rooting around in the Tabular model’s xml source code, and force it to change. Hacking it manually worked a treat, so I thought I’d share the process here. Just be careful – always keep a backup of your files before you change anything!

To do this, open up your .bim file in a suitable text editor. I highly recommend Notepad++, as it works great for XML.

We need to change three things:

1) Change the data type for the table column key & name, within the dimensions

<Alter><ObjectDefinition><Database><Dimensions><Dimension><Attributes><Attribute><KeyColumns><KeyColumn>
change <DataType>BigInt</DataType> to <DataType>WChar</DataType>     (Note this may be Int or BigInt)
change <DataSize>-1</DataSize> to <DataSize>32768</DataSize>

Then do the same for the <NameColumn>

[Screenshot: ChangeTabularDataType06 – the KeyColumn definition in the .bim file]
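As a rough sketch (I’ve omitted the surrounding elements and attributes, which will vary from model to model), each KeyColumn and NameColumn ends up looking something like this:

  <KeyColumn>
      ...
      <DataType>WChar</DataType>      <!-- was BigInt (or Int) -->
      <DataSize>32768</DataSize>      <!-- was -1 -->
      ...
  </KeyColumn>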

2) Change the data type for the table column key & name in the corresponding cube

<Alter><ObjectDefinition><Database><Cubes><Cube><MeasureGroups><MeasureGroup><Dimensions><Dimension><Attributes><Attribute><KeyColumns><KeyColumn>
change <DataType>BigInt</DataType> to <DataType>WChar</DataType>     (Note this may be Int or BigInt)
change <DataSize>-1</DataSize> to <DataSize>32768</DataSize>

Then do the same for the <NameColumn>

3) Change the definition of the csv datasource

<Alter><ObjectDefinition><Database><DataSourceViews><DataSourceView><Schema><xs:schema><xs:element><xs:complexType><xs:choice><xs:element><xs:complexType><xs:sequence><xs:element>
delete type="xs:int" from the element, and replace it with a simpleType and restriction defining the string:

  <xs:simpleType><xs:restriction base="xs:string"><xs:maxLength value="32768" /></xs:restriction></xs:simpleType>

[Screenshot: ChangeTabularDataType07 – the updated DataSourceView schema]
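Putting that together, the column’s element in the DataSourceView schema changes from something like the first form below to the second. Note that the column name ItemCode and the minOccurs attribute are just illustrative; yours will differ:

  <!-- before -->
  <xs:element name="ItemCode" type="xs:int" minOccurs="0" />

  <!-- after -->
  <xs:element name="ItemCode" minOccurs="0">
      <xs:simpleType>
          <xs:restriction base="xs:string">
              <xs:maxLength value="32768" />
          </xs:restriction>
      </xs:simpleType>
  </xs:element>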


Then save the .bim file, reload your Tabular model, and reprocess the table. Problem solved.

Frog-Blog-Out


PowerPivot Vs QlikView 101 – SQLBits Video

The video of my PowerPivot and QlikView talk at SQLBits 9, held in Liverpool in September 2011, is now available. You can download or watch the video here.

In this 1 hour session I create interactive dashboards from scratch in both PowerPivot and QlikView, showing how to set up the data model, overcome common pitfalls and build the dashboards. I create almost identical dashboards in both systems and highlight the pros and cons of each system.


You can find the scripts and code to go with this video in this blog post.

SQLBits 10 is coming!!

Today the SQLBits organisers have announced that SQLBits 10 will be held at the Novotel London West in London, from 29th to 31st March 2012. It’s going to be even bigger and better, so keep an eye on the website and get your tickets early.

SQLBits 9 Session – PowerPivot and QlikView 101

Thank you to the SQLBits committee, speakers, helpers and attendees, for making SQLBits 9 one of the best yet. What a great example SQLBits is of the power of the SQL Server community coming together to help each other, and enjoy a few beers in the process!

I was delighted to have my “PowerPivot & QlikView 101” session chosen by the committee, which I presented on Saturday afternoon. Thank you to all those who attended, great to see such a good number there. If you haven’t yet submitted your feedback for my session, and the event as a whole, please do take the time to do it. It helps everyone improve the quality of the event for next time.

For those who attended my session, here are the slides, with a summary of the results and code etc.

Slides: Download the PowerPivot & QlikView 101 slide deck here

QlikView Script: Download the QlikView Script here

PowerPivot Workbook: Download the PowerPivot workbook here

Session Video: The session recording isn’t yet available, I’ll post a link here when it’s ready.

QlikView Expressions

Sales Amount Visual Cue for the green shading
   Upper: (SUM(TOTAL SalesAmount) / COUNT(TOTAL SalesTerritoryRegion))*1.2
   Lower: (SUM(TOTAL SalesAmount) / COUNT(TOTAL SalesTerritoryRegion))*0.8

Sales Amount blue bar gauge maximum
   max(ALL Aggr(SUM(SalesAmount),OrderMonth))

Sales Amount Year to Date
   rangesum(above(sum(SalesAmount),0,12))

Sales Amount Previous Year
   SUM({<OrderYear={'$(=Max(OrderYear)-1)'}>} SalesAmount)

PowerPivot DAX Expressions

Sales Amount Year to Date
   =TOTALYTD(SUM(FactInternetSales[SalesAmount]), DimOrderDate[FullDateAlternateKey], ALL(DimOrderDate))

Sales Amount Previous Year
   =CALCULATE(SUM(FactInternetSales[SalesAmount]), SAMEPERIODLASTYEAR(DimOrderDate[FullDateAlternateKey]), ALL(DimOrderDate))


PowerPivot Results

[Screenshot: PowerPivot dashboard]


QlikView Results

[Screenshot: QlikView dashboard]

If you have any questions about any of this, please get in touch.

Frog-Blog-Out

The Business Intelligence Semantic Model (BISM)

I’m a happy chap. Why? Because I read a blog post yesterday by T.K. Anand (SSAS Principal Group Program Manager) about the vision and roadmap of Analysis Services.

There were slightly concerning questions last November (following the PASS conference) surrounding the future of Analysis Services, or more specifically the UDM, the dimensional model that we all know and love in SSAS 2005 & 2008. The arrival of PowerPivot into the Microsoft BI arsenal has without a doubt moved the goalposts and added significant power, flexibility and usability to the BI stack. My concern, along with others (most notably Chris Webb, who sparked somewhat of a stampede on the issue), was for the future of the UDM and the multitude of existing dimensional systems out in the field. Is the dimensional approach being phased out? Will it be supported in future editions? Will it be enhanced? Will the future BISM support the complexity and power we currently have with the UDM?

There’s no doubt that the overall approach to business intelligence is evolving. And this isn’t just in the cube space; it obviously has a direct effect on all other aspects of the BI strategy: the data warehouse, reporting layer, ETL etc.

From a BI consultant’s point of view, I don’t want to be recommending tools to my clients which have a restricted life span and don’t provide them with a future-proof upgrade path.

From a technology perspective, I’m a hardened supporter of the dimensional model. I recently designed a complex cube system for a banking client which had over 150 dimensions and facts, with thousands of lines of MDX to create a very sophisticated calculation framework for their liquidity modelling and loan profiling. I wouldn’t dream of doing that in a tabular approach like PowerPivot (in its current form).

From a personal point of view, where do I focus my attention in terms of training, research, blogging, user groups, conference sessions etc.?

I should point out that I’m very excited by, and fully committed to, the tabular/PowerPivot route (along with VertiPaq, Crescent, DAX, etc.) for systems that it is suited to. In fact I’m using it right now to prototype a global BI solution for a very large client. There are however some solutions that do not fit well with the tabular approach and are best suited to a dimensional approach. I’m in favour of a hybrid framework which allows the right tool to be used for the right system. And it looks like that’s what we’re going to get.

The guys at Microsoft have now evolved and clarified the roadmap, and have confirmed that the BISM (business intelligence semantic model, i.e. The core of Analysis Services in SQL Server Denali) will contain two parallel approaches that can both be used for whichever situations they are best suited to. More importantly, they are both here to stay, will both be developed further, and there will be a cross-availability of functionality and tools between them.

Multi Dimensional Model

Essentially the same as the existing UDM, the multi-dimensional data model will support MDX and ROLAP/MOLAP data access. Existing OLAP cubes in SQL 2008 will easily upgrade to this.

Tabular Model

Think of this as hosted PowerPivot. A tabular approach with a column based data store, DAX as the expression language and either VertiPaq or DirectQuery for data access.

The two will co-exist side by side within a singular BISM, albeit initially with a degree of separation. In the upcoming CTP2 release (July 2011?) there will not be any cross-availability of functionality, i.e. VertiPaq, Crescent and DAX will not be available to the dimensional model. However TK makes it clear that this is a short term restriction in the CTP, and that Microsoft are committed to getting this cross availability in place in the finished product.

If you’re involved in BI in any way, I really do encourage you to go and read TK’s post in detail. The Business Intelligence world is changing. I now have total confidence that it’s for the better.

Frog-blog-out

Will DAX replace MDX?

For those that haven’t yet heard of DAX, it’s an expression language developed by Microsoft to perform calculations against PowerPivot. Stepping back one step further, PowerPivot is essentially a local Analysis Services cube that runs within Excel 2010.

I’ve heard plenty of comments from various sources about how DAX is the [multi-dimensional] query language of the future and how it’s going to kill off MDX. Ok… well no, it’s not ok.

For starters, they exist for two different purposes. DAX is not a query language but an expression language. You can use MDX to query information from a cube and generate a pivot; you can’t with DAX, as it is not a query language.
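To illustrate, a simple MDX query like this one (using names from the Adventure Works sample cube) returns a result set in its own right; there’s no DAX equivalent:

  SELECT
      [Measures].[Internet Sales Amount] ON COLUMNS,
      [Date].[Calendar Year].Members ON ROWS
  FROM [Adventure Works]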

The best way to think of DAX is as an extension to Excel formulas, you can use it to perform a calculation against an existing set of cells, in this case, a PowerPivot dataset.

It is also similar to an MDX calculated member, and in fact supports a number of MDX functions (TotalYTD, ParallelPeriod etc.).
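To show the overlap, here’s a year-to-date calculation sketched in both (the names are illustrative, Adventure Works style, not from a real model):

  // MDX calculated member in a cube
  Aggregate(
      PeriodsToDate([Date].[Calendar].[Calendar Year],
          [Date].[Calendar].CurrentMember),
      [Measures].[Internet Sales Amount])

  // Equivalent DAX measure in PowerPivot
  =TOTALYTD(SUM(Sales[Internet Sales Amount]), Sales[Date])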

If your data is in a database: You would use SQL to query data from a database and import the results into Excel. You would then use Excel expressions/calculations to enhance the data.

If your data is in a cube: You would use an Excel pivot (or MDX query) to query the data and import the results into Excel. You then have to use a third party tool such as OLAP PivotTable Extensions to add expressions/calculations to enhance the data.

If your data is in PowerPivot: You would use PowerPivot to query the data and import the results into Excel. You would then use DAX to add calculations to enhance the data.

DAX is a fantastic expression tool, and one that provides significant power to PowerPivot, but no, it won’t replace MDX. My hope is that Microsoft will provide DAX capability for MDX queries as well, and not restrict it to PowerPivot queries. As I’ve shown in my previous blog post it’s a great expression language that would provide significant benefit to cube users.

Calculating Run Rate in DAX for PowerPivot

In my previous post I explained how to create a calculated MDX member that projects full year data (sales etc.) based on existing year to date data.

In this post I’ll be doing exactly the same but in DAX, the new expression language used to enhance PowerPivot data. As it’s the same desired outcome, I’m not going to repeat the background, you’ll have to look at my previous post for that.

The expressions below assume that you have a single table ‘Sales’ with a [Date] column and an [Internet Sales Amount] column.

Step 1 – What are our total sales so far this year?

We use the TOTALYTD function to work this out for us


  =TOTALYTD(SUM(Sales[Internet Sales Amount])
     ,Sales[Date], all(Sales))

The first parameter is the expression we want to calculate, i.e. the sum of internet sales.
The second parameter specifies the date we’re calculating up to.
The third parameter is one that catches a lot of people out. We have to tell DAX the context of the date. As it stands the expression can only see the data in the selected row; by specifying all(Sales) in the filter we expand the expression to be able to look at all of the data.

Step 2 – How far are we through the year?

It’s here where DAX really shows an improvement in the functions available over and above what’s available in MDX. There’s a YEARFRAC function which calculates how far we are through the year.


  =YEARFRAC(
     CONCATENATE("01/01/"
         ,year(Sales[Date]))
     ,Sales[Date])

The first parameter is the start date, i.e. the 1st January. We have to build this using the year of the selected row to ensure we get the right year.
The second parameter is the date of the record we’re looking at, Sales[Date].

Step 3 – Project the value to the end of the year

We combine the two values by simply dividing the YTD figure by how far we are through the year


  =TOTALYTD(SUM(Sales[Internet Sales Amount])
     ,Sales[Date], all(Sales))
  / YEARFRAC(
      CONCATENATE("01/01/"
          , year(Sales[Date]))
     ,Sales[Date])

CORRECTION
Thanks to Sandeep Nallabelli in the comments below for pointing out a bug in this: YearFrac should actually be calculated from 31st December of the previous year instead of 1st January, otherwise the fraction is zero on the 1st January itself (a divide by zero) and the elapsed days are under-counted by one all year. The updated calculation should be:


  =TOTALYTD(SUM(Sales[Internet Sales Amount])
     ,Sales[Date], all(Sales))
  / YEARFRAC(
      CONCATENATE("31/12/"
          , year(Sales[Date])-1)
      ,Sales[Date])

Also, it is worth adding the YearFrac calculation as a calculated column in your date table so that you can use it in measures.
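As a sketch, assuming a separate date table called DimDate with a [FullDate] column (hypothetical names; adjust to your own model), the calculated column would be:

  =YEARFRAC(
      CONCATENATE("31/12/"
          , YEAR(DimDate[FullDate]) - 1)
      ,DimDate[FullDate])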


This chart shows how the full year run rate is adjusted throughout the year as the cumulative sales to date grows. At the start of the year it’s quite volatile, but from February it settles down with an accurate projection.

And it really is as easy as that.

Frog-Blog Out

The Frog Blog

I'm Alex Whittles.

I specialise in designing and implementing SQL Server business intelligence solutions, and this is my blog! Just a collection of thoughts, techniques and ramblings on SQL Server, Cubes, Data Warehouses, MDX, DAX and whatever else comes to mind.
