0845 643 64 63

Data warehousing

DatabaseWhat is a Data Warehouse?

A data warehouse is a central repository of information. It's a single database from which you can control and enhance the information that your organisation uses. You can either use it as a direct source for your reporting, or use it as a launching platform for any number of more advanced tools such as Data Mining or OLAP Cubes.


In its simplest form, a data warehouse (DW) can just be an overnight data extract from a single system (sales order processing, Excel, accounts etc.) that allows you to run more efficient queries as well as storing a history for a system that doesn't provide historical information. In this form it is technically an Operational Data Store (ODS) but for simplicity we'll treat it as a DW.


In it's most complex form it can contain data extracted from any number of systems, all using different technologies and standards, different methods of extraction, and dirty, incorrect and incompatible data. The DW then remodels the data, stores it in a format that's optimised for reporting, and then presents this in a single consolidated form for the whole organisation to use.



How Can Purple Frog Help?

Purple Frog are internationally recognised experts in data warehousing and data anlytics using the Microsoft SQL Server data platform, which leads the Gartner Magic Quadrants in its field.

We can help you in a number of ways:

  • Designing and building a complete solution for you
  • Guide and mentor your staff as they build an in-house solution, providing architecture, design and code input where required
  • Maintain and enhance your existing solutions to improve performance, functionality and reliability
  • Provide bespoke services specific to your business needs

Historical Data

A common problem with a number of IT systems, is that they are designed to run a business day to day. They don't provide enough of an audit trail or historical information. This means that you can't see what has changed over a set time frame, you can't assess key metric information such as performance now compared with last week or last year. You can't even correct mistakes such as incorrectly modified data. Data Warehouse Snapshot

A DW will take a daily (or hourly, weekly or even near real time) snapshot of your data and incorporate it into its dataset, allowing you to report on your changes and historical data as easily as you can view the current information.



It is likely that a number of your systems are department specific, or have been set up and used in isolation. This usually means that information from your systems use different units, different reference codes, different calendars, different descriptions and even different languages and currencies.

You can define how the data warehouse should present this information, using a consistent view of your organisation. This means that for the first time you can pull up reports instantly that compare data from all of your departments, locations and systems, all in a single place.


Data Cleansing

Data Cleansing

In any environment where data is being entered, either manually or via automated data capture, it is almost guaranteed that some invalid or contradictory data will be introduced into the system. This could be manual mistakes, data corruptions or 3rd party system errors.

As the data warehouse is being loaded, the ETL (Extract, Transform and Load) process can filter out these errors, and either correct them or save them in a reject area for manual review. You can then be confident that all the information being reported on is correct and consistent.



As the demand for information grows, the load on your systems grows with it. If a report takes 5 minutes to run, that is putting 5 minutes of heavy load on your source systems, compromising any other users that are working at the time.

As the data warehouse takes over job of information supplier, it can have a huge performance effect on your existing systems, purely by reducing their workload. You can then use the flexability of SQL Server to add clustering, multi processor support, mirroring and failover support to increase the performance, scalability and reliability further and further.