Fabric is the newest data analytics platform from Microsoft. It brings together the entire analytics process onto the Power BI service, meaning that you no longer need to be moving your data around from different services. The introduction of OneLake means that all data that you import whether from Cloud servers, On-Premises servers or more is available in one central location.

With Fabric, you need to choose how you will store the data that you have imported so that you can process it. Data Warehouses and Data Lakehouses are the two main ways that this can be done on Fabric – but what is the difference?

Person choosing between Fabric Warehouse and Lakehouse

What is a Data Warehouse?

A Data Warehouse is a structured data store, where a defined schema describes the layout of data in each table (most commonly a Star Schema made up of fact and dimension tables). T-SQL is used to read and write data to your Fabric Warehouse, allowing for DDL and DML transactions. As Fabric is still in development, some commands are not supported – for a full list of these commands, check T-SQL surface area – Microsoft Fabric | Microsoft Learn!

Data can be ingested into your Warehouse using pipelines, dataflows and SQL. When transforming data in the warehouse, you are able to query across multiple tables in the warehouse, and across other warehouses and lakehouses in your OneLake. Data from a warehouse can then be used directly in Power BI using the dataset that is automatically created from the warehouse, or data can be read using Spark notebooks for analysis in PySpark, Scala, Spark SQL or R.

What is a Data Lakehouse?

A data lakehouse is a data platform combining features of the data warehouse and data lake. It is able to support data of any structure, meaning that data can be in a tabular or file format, or both! In Fabric, the lakehouse has an additional “SQL endpoint” created which is a read-only T-SQL interface for the lakehouse. The lakehouse stores data in either a table or files folder, where tables are of Delta format only.

Data is ingested into the lakehouse using pipelines, dataflows, Spark and shortcuts. When transforming data, you can produce Spark notebooks against any of your lakehouses, increasing the repeatability of transformations. Data in the lakehouse can then be used directly in Power BI using the dataset that is automatically created, or data can be read using Spark notebooks for analysis in PySpark, Scala, Spark SQL or R.

How do I decide?

When trying to decide which to use, there are a few key things to consider:

  1. The experience and knowledge of the developer
    • If they are more familiar with SQL then using a warehouse will be easier
    • If the developer is more familiar with Spark or PySpark, then using a lakehouse will be easier
  2. The types of data being stored
      • If your data is completely made up of tables, then a warehouse is a great option
      • If you have files such as images in addition to tabular data, the lakehouse is suited to you

 

The table below provides a comparison of the key features in a warehouse and lakehouse to help make a decision on which to use:

      Data warehouse Lakehouse
    Data volume Unlimited Unlimited
    Type of data Structured Unstructured, semi-structured, structured
    Primary developer persona Data warehouse developer, SQL engineer Data engineer, data scientist
    Primary developer skill set SQL Spark(Scala, PySpark, Spark SQL, R)
    Data organized by Databases, schemas, and tables Folders and files, databases, and tables
    Read operations Spark, T-SQL Spark, T-SQL
    Write operations T-SQL Spark (Scala, PySpark, Spark SQL, R)
    Multi-table transactions Yes No
    Primary development interface SQL scripts Spark notebooks, Spark job definitions
    Security Object level (table, view, function, stored procedure, etc.), column level, row level, DDL/DML Row level, table level (when using T-SQL), none for Spark
    Access data via shortcuts Yes (indirectly through the lakehouse) Yes
    Can be a source for shortcuts Yes (tables) Yes (files and tables)
    Query across items Yes, query across lakehouse and warehouse tables Yes, query across lakehouse and warehouse tables; query across lakehouses (including shortcuts using Spark)

     

    Let us know which one you are using and why you chose that!

    Tags: , , ,