One of the most useful aspects of a Business Intelligence system is the ability to add calculations to create new measures. This centralises the logic of the calculation into a single place, ensuring consistency and standardisation across the user base.

By way of example, a simple calculation for profit (Income – Expenditure) wouldn’t be provided by the source database and historically would be implemented in each and every report. In a data warehouse and/or cube we can create the calculation in a single place for everyone to use.

This post highlights some of methods of doing this, each with their respective pros and cons.

Calculated Members in SSAS Cube


SSAS provides a ‘Calculations’ tab in the cube designer which allows you to create new measures using MDX. You can use any combination of existing measures and dimension attributes, along with the plethora of MDX functions available to create highly complex calculations.
Pros:

  • Very complex calculations can be created using all available MDX functions
  • No changes are required to the structure of the data warehouse
  • Changes to the calculation will apply to every record, historic and new
  • The results are not stored in the warehouse or cube, so no extra space is required
  • New calculations can be added without having to deploy or reprocess the cube
  • Calculations can be scoped to any level of aggregation and granularity. Different calculations can even be used for different scopes
  • Calculations can easily combine measures from different measure groups

Cons:

  • The calculation will not make use of SSAS cube aggregations, reducing performance
  • SSAS drill through actions will not work
  • The calculation results are not available in the data warehouse, only the cube

SQL Calculations in the Data Source View


There’s a layer in-between the data warehouse and the cube called the data source view (DSV). This presents the relevant tables in the warehouse to the cube, and can be used to enhance the underlying data with calculations. This can either be the dsv layer within the cube project, or I prefer to create SQL Server views to encapsulate the logic.
Pros:

  • No changes are required to the table structure of the data warehouse
  • Calculations use SQL not MDX, reducing the complexity
  • Changes to the calculation will apply to every record, historic and new
  • The calculation will make full use of SSAS cube aggregations
  • SSAS drill through actions will work
  • The results are not stored in the warehouse, so the size of the database does not increase

Cons:

  • The cube must be redeployed and reprocessed before the new measure is available
  • The results of the calculation must be valid at the granularity of the fact table
  • The calculation results are not available in the data warehouse, only the cube

Calculate in the ETL process


Whilst bringing in data from the source data systems, it sometimes makes sense to perform calculations on the data at that point, and store the results in the warehouse.
Pros:

  • The results of the calculation will be available when querying the warehouse as well as the cube
  • In the ETL pipeline you can import other data sources (using lookups etc.) to utilise other data in the calculation
  • If the calculation uses time based data, or data valid at a specific time (i.e. share price) then by performing the calculation in the ETL, the correct time based data is used, without having to store the full history of the underlying source data
  • The calculation will make full use of SSAS cube aggregations
  • SSAS drill through actions will work

Cons:

  • You have to be able to alter the structure of the data warehouse, which isn’t always an option.
  • The results are stored in the warehouse, increasing the size of the database
  • The results of the calculation must be valid at the granularity of the fact table
  • If the calculation logic changes, all existing records must be updated

In Conclusion

If the calculation is valid for a single record, and it would be of benefit to have access to the results in the warehouse, then perform the calculation in the ETL pipeline and store teh results in the warehouse.

If the calculation is valid for a single record, and it would not be of benefit to have the results in the warehouse, then calculate it in the data source view.

If the calculation is too complex for SQL, requiring MDX functions, then create an MDX calculated measure in the cube.

Tags: , , , ,