At today’s Build conference, Microsoft announced Fabric. What is this?
In simple terms, think of taking Synapse Analytics, Data Warehousing, Data Lakes, Data Factory, Spark Notebooks and Machine Learning, and bringing them all together into Power BI.
This is underpinned by Microsoft OneLake, a high-performance, scalable data lake storage layer supporting all of the above. OneLake is, as the name implies, one data lake that can be used across your organisation, containing all of your staging data, data lakehouses, data warehouses, and ML output.
Did I just say a data warehouse is storing data in a data lake, within Power BI?! Yes! Fabric Data Warehouse is a fully ACID-compliant SQL Server engine, just with its data stored in OneLake, and managed through the Power BI portal.
How does this get the performance? I’m glad you asked! OneLake natively stores data as Delta Parquet files, a highly compressed column-store format used by Databricks, Delta Lake, and others. It’s a proven, scalable, and powerful storage format, and an open standard.
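Because Delta is an open format, any Delta-aware engine can read and write the same tables. Here’s a minimal PySpark sketch of that idea, assuming an environment with Delta Lake support (Fabric notebooks have it built in); the paths and names are hypothetical illustrations, not a documented Fabric API.

```python
# Minimal sketch: OneLake tables are open Delta/Parquet files, so any
# Delta-aware engine can read and write them. Paths and names here are
# hypothetical.
from pyspark.sql import SparkSession

# In a Fabric notebook a Spark session is already provided; this line just
# makes the sketch self-contained elsewhere too.
spark = SparkSession.builder.appName("onelake-delta-demo").getOrCreate()

# Write a small DataFrame as a Delta table (compressed columnar Parquet
# data files, plus a transaction log for ACID guarantees)
df = spark.createDataFrame(
    [(1, "widget", 9.99), (2, "gadget", 24.50)],
    ["product_id", "name", "price"])
df.write.format("delta").mode("overwrite").save("/lakehouse/default/Tables/products")

# Any other Delta-aware engine (Databricks, a Fabric Warehouse, Power BI
# Direct Lake) can now read the very same files
products = spark.read.format("delta").load("/lakehouse/default/Tables/products")
products.show()
```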
Data Engineering in Power BI? So this is for small projects then?
Yes, and no! This architecture is designed to scale to large, multi-terabyte data solutions. We’re still exploring the extent of its capability, and will provide updates on our testing as we get them.
Where is all this developed and stored?
Within your Power BI workspace. The workspace will now contain your notebooks, data warehouses, lakehouses, Data Factory pipelines, and ML workloads, in addition to your Dataflows, Datasets, Reports and Apps.
The Power BI portal now has ‘views’, accessed through the icon on the bottom left, which filter the content and functionality to the desired workload: Power BI, Data Engineering, Data Science, etc. This helps filter out the clutter.
It is also all secured, so you can decide who has access to the different components.
Do I need Power BI Premium?
Yes, existing ‘Premium’ capacity is changing to ‘Fabric’ capacity, with P1 being equivalent to F64.
This means there will be a much lower entry level to Fabric, with an F1 costing only a couple of hundred USD per month, enabling all of this amazing new functionality! Note that you would still need an F64 (P1) to avoid needing Power BI user licenses. Below this, you would need an F capacity to provide the Fabric functionality, plus Power BI Pro licenses to develop or consume Power BI reports.
What about Direct Lake? What is this?
To get a real-time Power BI dataset we currently use DirectQuery mode. Power BI has to convert DAX into SQL, and then fire off the SQL query against the underlying data source. This can be very slow, and costly in compute on the source system.
To make reports fast, we have to import data into the dataset.
- Power BI stores its data internally in Vertipaq, which is a highly compressed column store format.
- OneLake stores its data internally in Delta, which is a highly compressed column store format.
If Vertipaq and Delta are essentially the same, why can’t Power BI run DAX natively against Delta?
It now can! This connectivity mode is called Direct Lake. Think of it as Power BI treating OneLake as its internal Vertipaq storage layer.
This means that you can create a data warehouse or lakehouse in Fabric, and Power BI will be able to report on it in real time, with no import or refresh processing, and with high performance!
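Direct Lake itself is configured on the Power BI dataset rather than in code, but the upstream side is just an ordinary Delta write. A sketch of what that lakehouse side might look like, assuming the notebook’s built-in Spark session and hypothetical table and column names:

```python
# Sketch: the lakehouse side of a Direct Lake setup (hypothetical names).
# Once this Delta table lands in OneLake, a Direct Lake dataset built on the
# lakehouse can query it immediately -- no import refresh, no DAX-to-SQL step.
from pyspark.sql import functions as F

sales = spark.read.format("delta").load("/lakehouse/default/Tables/raw_sales")

# Aggregate raw order rows into a daily summary
daily_sales = (sales
    .groupBy(F.to_date("order_ts").alias("order_date"))
    .agg(F.sum("amount").alias("total_amount"),
         F.countDistinct("order_id").alias("orders")))

# saveAsTable registers the Delta table in the lakehouse, making it visible
# to the SQL endpoint and to Power BI
daily_sales.write.format("delta").mode("overwrite").saveAsTable("daily_sales")
```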
What’s Data Activator?
This has not been released yet, but will come soon. It allows you to create KPIs on top of Power BI, including triggers based on thresholds, so you can define data-driven alerts and actions. For example, define a trigger for when a KPI exceeds a maximum acceptable value, which then emails the relevant team and triggers a Power Automate flow that adds a task into a workflow.
Still early days yet, but looks interesting.
Should I choose a Warehouse or Lakehouse?
A big question is whether a data warehouse or a data lakehouse will be the best option for each project.
Fabric natively supports both, even at the same time, allowing you to create any number of either in the same workspace, with cross-querying between them and consistent access and management across both.
In simple terms, the best option should be determined by your preferred choice of ETL tool. If you prefer Spark notebooks, go for a Lakehouse. If you prefer SQL and stored procs, go for a Warehouse.
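To make the contrast concrete, here’s a sketch of the same transformation in each style, with hypothetical table names: the Lakehouse route as a PySpark notebook cell (assuming the notebook’s built-in Spark session), and the Warehouse equivalent shown as T-SQL in the comments.

```python
# Lakehouse route: Spark DataFrame transformations in a notebook
# (hypothetical table and column names).
from pyspark.sql import functions as F

orders = spark.read.table("bronze_orders")

cleaned = (orders
    .filter(F.col("status") == "complete")
    .withColumn("net_amount", F.col("amount") - F.col("discount")))

cleaned.write.format("delta").mode("overwrite").saveAsTable("silver_orders")

# Warehouse route: the same logic would live in a T-SQL stored procedure, e.g.
#   CREATE PROCEDURE load_silver_orders AS
#   INSERT INTO silver_orders
#   SELECT *, amount - discount AS net_amount
#   FROM bronze_orders WHERE status = 'complete';
# Pick whichever style your team maintains most comfortably.
```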
We also have Datamart, which is unchanged from its previous form for now, but will soon be migrated from Azure SQL DB storage over to OneLake storage. It will become a simplified warehouse with guardrails, for power users or departmental projects.
And then the Mounted DB to round things off. This takes an existing external data warehouse or data lake (Synapse, Snowflake, SQL Server, etc.) and sets up a real-time clone (think CDC) in OneLake. You keep the original where it is, but get a read-only clone in Fabric OneLake to use for reporting and analytics, so you get to play with all the toys! Not yet available, coming soon to a Fabric near you.
The following table should help you understand the differences and choices.
| Fabric Entity | Warehouse | Datamart | Lakehouse | Mounted DB |
| --- | --- | --- | --- | --- |
| What is it | Fully ACID-compliant data warehouse, offering comparable functionality to SQL Server or Synapse Dedicated SQL Pool | Simplified warehouse, with guardrails and protection in place | Unstructured and semi-structured file store, which can be presented as tables to look like a warehouse | Real-time synchronized clone of an external database (Synapse Gen2, Snowflake, SQL Server, etc.) |
| Availability | Public Preview | Public Preview | Public Preview | TBC |
| Purpose | Primary data warehouse for enterprise applications, for organizations who prefer SQL-based tools and stored-proc-based transformations | Smaller data warehouse projects, including departmental self-service datamarts. Datamarts will be migrated to OneLake storage in due course | Primary data warehouse for enterprise applications, for organizations who prefer Spark-based tools and transformations. Also staging and archive storage for Warehouse projects | Use the reporting capability of Fabric, including queries spanning multiple Fabric entities |
| Write | SQL, Dataflows, Pipelines | SQL, Dataflows, Pipelines | Spark, Dataflows, Pipelines | Only the internal Fabric change data capture engine |
| Read | SQL, Spark, Direct Lake | SQL, Spark, Direct Lake | SQL, Spark, Direct Lake | SQL, Spark, Direct Lake |
| SQL Functionality | Tables, Views, Stored Procs, Functions (read/write) | Tables, Views, Stored Procs, Functions (read/write) | Tables, Views, Stored Procs, Functions (read only) | Tables, Views, Stored Procs, Functions (read only) |
| Storage Layer | Delta format in OneLake | Delta format in OneLake | Delta format in OneLake | Delta format in OneLake |

In all cases, OneLake is a distributed array of ADLS Gen2 managed storage accounts, providing a single logical storage layer for all data.
What about AI and Machine Learning?
All data in OneLake can be queried via the SQL endpoint or from integrated Spark notebooks. For ML workloads, Spark is the obvious choice: you can create notebooks directly within the Power BI workspace in a browser, and access all of the warehouses and lakehouses in that workspace, or indeed other data sources via shortcuts (pointers to external data).
We have the full power of Python, Scala, and SQL (and indeed Copilot!) within the notebooks to implement any machine learning that we want.
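As a flavour of what that looks like, here’s a minimal Spark MLlib sketch, assuming the notebook’s built-in Spark session; the table and column names are hypothetical.

```python
# Minimal sketch: training a model in a notebook directly against lakehouse
# data, with no data movement out of OneLake. All names are hypothetical.
from pyspark.ml import Pipeline
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.feature import VectorAssembler

# Read a Delta table registered in the workspace lakehouse
df = spark.read.table("customer_features")

# Assemble numeric feature columns into a single vector column
assembler = VectorAssembler(
    inputCols=["tenure_months", "monthly_spend", "support_tickets"],
    outputCol="features")

lr = LogisticRegression(featuresCol="features", labelCol="churned")

model = Pipeline(stages=[assembler, lr]).fit(df)

# Score and write predictions back to OneLake for Power BI to report on
model.transform(df).select("customer_id", "prediction") \
    .write.format("delta").mode("overwrite").saveAsTable("churn_predictions")
```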
What’s this Copilot all about?
Not really a Fabric thing, but a very impressive AI assistant that is being rolled out to every corner of the Microsoft world, including the Office tools, Power BI and Fabric.
Think of our good old friend Clippy, but this time it actually works! It’s built on OpenAI’s GPT models, and is there to help us write code faster and get more done in less time. It looks seriously impressive, but we need to get hands-on with it in production to really see how far this goes.
What now?
Well this is all now in public preview, so get playing with it!
We don’t yet know when it will become generally available, but if you’re planning a new data engineering project, data warehouse, lakehouse or ML project, you may want to give Fabric a go.
If you want to discuss the capabilities in any more detail, please let us know. We’d be happy to talk you through it.
Check out the Microsoft page about this, at https://aka.ms/BuildWithAnalytics