That Blue Cloud

Building A Lakehouse: Implementing Medallion Architecture In Fabric

Let's use Medallion Architecture in Microsoft Fabric to build a Lakehouse with Pipelines and Dataflows. We'll also discuss the responsibilities and structure of the Bronze, Silver and Gold layers in OneLake.
Fabric brings you many different technologies and concepts to mix and match so that you can find the most optimised solution for your data requirements. One of those concepts is the Lakehouse, and we'll be looking into how we can build a proper one in Fabric using the Medallion Architecture.

As we've covered "What is Lakehouse?" in a previous article and the Medallion Architecture in the "Designing Fabric Workspaces" post, I won't go into too much detail here. But to recap, Medallion Architecture is built on the premise of splitting your data into multiple layers, each with its own responsibilities. If you'd like to read further, see Databricks' Medallion Architecture article.

This article demonstrates best practice for implementing Medallion Architecture in Fabric. We'll walk through the steps, but we won't build the actual pipelines here. Instead, I'll show you how everything connects to the Lakehouse in four steps:

  • Step 1: Designing the Lake
  • Step 2: Establishing the Tables
  • Step 3: Building the Pipelines
  • Step 4: Putting it all together

This article covers the overnight data-pulling scenario with Pipelines and Dataflows. Future articles will cover Streaming datasets and connecting to other Azure data resources.

Step 1: Designing The Lake

Before going into what we're going to use to process the data, let's define our zones/layers in our OneLake:

  • Landing: A layer for incoming data to arrive, ready to be picked up by our Lakehouse ingestion process. Data is kept in the original file format, with a folder structure reflecting arrival metadata. The data is kept here temporarily and deleted after the Lakehouse ingests it.
  • Bronze/Raw: A layer where incoming data is kept and archived for later access. The data is kept as it arrived; the only changes are that it is stored in Delta format and organised in a hierarchy that makes it easy to find (most commonly by date of arrival and data type).
  • Silver/Trusted: Raw data is translated into a more standardised format. You can split a single raw file into multiple files/tables to create a normalised relationship, or you can combine several raw files into a single table.
  • Gold/Curated: For business-level aggregations and analytics.
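
To make the hierarchy idea more concrete, here is a minimal Python sketch of how paths for these layers could be built. The folder names (landing, bronze, silver, gold), the <layer>/<data type>/<yyyy>/<mm>/<dd> layout and the helper functions are my assumptions for illustration only; they are not a Fabric API, and your own lake may be organised differently. The sketch only builds path strings and does not touch any storage.

```python
from datetime import date
from pathlib import PurePosixPath

# Hypothetical folder names: the layers are named above, but their paths are an assumption.
LAYERS = ("landing", "bronze", "silver", "gold")


def layer_path(layer: str, data_type: str, arrival: date) -> PurePosixPath:
    """Build <layer>/<data_type>/<yyyy>/<mm>/<dd> for a given arrival date."""
    if layer not in LAYERS:
        raise ValueError(f"unknown layer: {layer!r}")
    return PurePosixPath(
        layer, data_type, f"{arrival:%Y}", f"{arrival:%m}", f"{arrival:%d}"
    )


def plan_ingestion(file_name: str, data_type: str, arrival: date) -> dict:
    """Describe the Landing -> Bronze move for one file, without doing any I/O."""
    return {
        # The original file, still in its source format, waiting in Landing.
        "read_from": str(layer_path("landing", data_type, arrival) / file_name),
        # The Bronze folder where the same data would be stored in Delta format.
        "write_to": str(layer_path("bronze", data_type, arrival)),
        # Landing is temporary: the file is deleted once ingestion succeeds.
        "delete_source_after_ingest": True,
    }


if __name__ == "__main__":
    # Example: a hypothetical "sales" file that arrived on 5 January 2024.
    print(plan_ingestion("orders.csv", "sales", date(2024, 1, 5)))
```

Keeping the path logic in one small function means every layer follows the same date-of-arrival and data-type convention, which is what makes the archived data easy to find later.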

Harun Legoz

I’m a cloud solutions architect with a coffee obsession. I have been building apps and data platforms for over 18 years, and I also blog about Azure & Microsoft Fabric. Feel free to say hi on Twitter/X!
