As the amount of data produced increases and the technologies required to process it grow, organisations are looking to advanced data architectures to meet new needs.
In this context, the Medallion architecture has emerged: an approach that fits well with the data lakehouse model and aims to promote data quality.
The amount of data continues to grow every year. According to the latest statistics from Forbes (2023), experts anticipate that the total volume of data worldwide will increase from 64.2 to 181 zettabytes in five years (2020-2025).
The exponential increase in the amount of data generated is putting the focus on disciplines such as data governance and data quality. The more data we have, the more complicated it becomes to manage and exploit.
At the same time, turning data into business insights depends less on the quantity of data than on its quality. In a context of information overload, it is understandable that data quality policies become more relevant.
Companies are trying to solve this puzzle with flexible data architectures that allow them to adopt new technologies and approaches to data management as needs arise, which is essential to keep up with a changing environment.
Flexibility also makes it possible to adapt more quickly to market transformations and new customer demands.
When evaluating and optimizing data integration at a corporate level, focusing solely on technology is not enough.
Beyond the platform itself, true effectiveness lies in applying best practices in data management and processing.
In line with this, a new approach has recently become popular: the Medallion architecture. It fits in with flexible data architectures and also helps to safeguard the quality of the data processed.
Before going on to explain what a Medallion data architecture is and how it works, it is important to introduce other concepts: data lakehouse and data mesh.
Data mesh is a flexible data architecture: an approach that decentralises data management instead of concentrating it in a single team.
The main premise of the data mesh approach is to treat data as products, assigning responsibilities to specific teams for particular data domains. This decentralises ownership and ensures that teams have a better understanding of the data they produce.
Data is delivered through data products and managed through centralised platforms.
This approach promotes collaboration, data quality and ease of access in complex business environments.
A data lakehouse is a data architecture that combines the flexibility of a data lake (for storing raw data, whether structured, semi-structured or unstructured) with the analytical capabilities of a data warehouse (for structured analytics).
It enables a variety of data to be stored, processed and analysed in one place, facilitating advanced analytics and providing valuable insights for organisations, all with robust security and governance measures.
In short, it is the combination of a data lake and a data warehouse.
In the world of data management, the Medallion architecture, also known as multi-hop architecture, is an approach to data model design that encourages the logical organisation of data within a data lakehouse.
The Medallion architecture structures data in multiple tiers (the bronze, silver and gold layers), improving data quality at each step as the data moves through the transformation process, from raw data to valuable business insights.
This approach was proposed by Databricks, an authority in the field of data management, which advocates Data as a Product (DaaP) and multi-layered approaches to build a single source of truth in an organisation.
The Medallion architecture protects data integrity by passing data through several stages of validation and transformation, with the aim of preserving atomicity, consistency, isolation and durability (the ACID properties) along the way.
Once the data has passed through these validations and transformations, it is stored in an optimal layout for effective analysis, ready to be used for strategic decision making.
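As a rough illustration of what atomicity means for a validated write, the sketch below uses Python's built-in sqlite3 module as a stand-in for a lakehouse table. The table name, the constraint and the `write_batch` helper are assumptions made for this example and are not part of any Medallion specification; in a real lakehouse, the table format would normally provide these guarantees.

```python
import sqlite3

# In-memory database standing in for a silver-layer table (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE silver_orders ("
    "order_id INTEGER PRIMARY KEY, "
    "amount REAL NOT NULL CHECK (amount >= 0))"  # validation rule enforced on write
)


def write_batch(conn, rows):
    """Write every row in the batch or none of them (hypothetical helper)."""
    try:
        # The connection context manager commits on success
        # and rolls back if an exception is raised.
        with conn:
            conn.executemany("INSERT INTO silver_orders VALUES (?, ?)", rows)
        return True
    except sqlite3.IntegrityError:
        return False


accepted = write_batch(conn, [(1, 10.0), (2, 5.5)])
rejected = write_batch(conn, [(3, 7.0), (4, -1.0)])  # second row breaks the CHECK rule

row_count = conn.execute("SELECT COUNT(*) FROM silver_orders").fetchone()[0]
print(accepted, rejected, row_count)
```

Because the second batch contains one invalid row, the whole batch is rolled back, so the valid row with `order_id` 3 is not left behind in a half-written state.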
As explained above, the most distinctive feature of the Medallion architecture is that it structures the data in layers: the bronze layer, the silver layer and the gold layer.
In short, in a Medallion architecture, the quality and structure of the data improve as it passes through each layer. The bronze layer contains raw data, the silver layer contains cleansed and enriched data, and the gold layer contains data that is aggregated and ready to be analysed and integrated into business applications.
This modular architecture facilitates large-scale data management and allows for agile adaptation to changing needs.
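The three layers can be sketched in a few lines of plain Python. This is a minimal illustration that assumes a made-up list of order records; the field names and the `to_silver` and `to_gold` functions are hypothetical, and a real lakehouse would use tables and a processing engine, not in-memory lists.

```python
from collections import defaultdict
from datetime import date

# Bronze: raw records kept exactly as ingested (hypothetical sample data).
bronze = [
    {"order_id": "1", "customer": " alice ", "amount": "20.50", "day": "2024-01-05"},
    {"order_id": "2", "customer": "BOB", "amount": "n/a", "day": "2024-01-05"},
    {"order_id": "3", "customer": "Alice", "amount": "4.50", "day": "2024-01-06"},
    {"order_id": "3", "customer": "Alice", "amount": "4.50", "day": "2024-01-06"},
]


def to_silver(rows):
    """Cleanse: cast types, normalise names, drop invalid and duplicate rows."""
    seen_ids = set()
    silver = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # unparseable amount: excluded from silver
        if row["order_id"] in seen_ids:
            continue  # duplicate order: keep the first occurrence only
        seen_ids.add(row["order_id"])
        silver.append({
            "order_id": int(row["order_id"]),
            "customer": row["customer"].strip().title(),
            "amount": amount,
            "day": date.fromisoformat(row["day"]),
        })
    return silver


def to_gold(rows):
    """Aggregate: total spend per customer, ready for reporting."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["customer"]] += row["amount"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
print(len(bronze), len(silver), gold)
```

The bronze list is never modified, so the cleansing and aggregation rules can be changed later and replayed from the raw data.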
In the context of a Medallion architecture with a data lakehouse approach, it is common to use the ELT methodology instead of ETL.
This involves performing minimal transformations and applying data cleansing rules during the loading of data into the silver layer, prioritising speed and agility in the ingestion and delivery of data into the data lake.
Complex transformations and specific business rules are applied once the data moves from the silver layer to the gold layer.
This allows for greater flexibility to tailor the data to the specific needs of each project and business, making it easier to implement complex business rules and transformations later in the process.
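The ELT ordering described here (load first, transform afterwards inside the target store) can be sketched with Python's built-in sqlite3 module. The table names, the sample rows and the "key region" business rule are all assumptions made for this example; sqlite3 simply stands in for the lakehouse engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Extract + Load: raw rows land in bronze untouched, every column as text.
conn.execute("CREATE TABLE bronze_sales (region TEXT, amount TEXT)")
raw_rows = [("north", "100"), ("North ", "50"), ("south", "30"), ("south", "bad")]
conn.executemany("INSERT INTO bronze_sales VALUES (?, ?)", raw_rows)

# Transform (light): cleansing only, on the way into silver.
conn.execute("""
    CREATE TABLE silver_sales AS
    SELECT lower(trim(region)) AS region,
           CAST(amount AS REAL) AS amount
    FROM bronze_sales
    WHERE amount <> ''
      AND amount NOT GLOB '*[^0-9.]*'  -- keep values made of digits and dots only
""")

# Transform (business rules): aggregation and a hypothetical tiering rule in gold.
conn.execute("""
    CREATE TABLE gold_sales_by_region AS
    SELECT region,
           SUM(amount) AS total,
           CASE WHEN SUM(amount) >= 100 THEN 'key' ELSE 'standard' END AS tier
    FROM silver_sales
    GROUP BY region
""")

gold_rows = conn.execute(
    "SELECT region, total, tier FROM gold_sales_by_region ORDER BY region"
).fetchall()
bronze_count = conn.execute("SELECT COUNT(*) FROM bronze_sales").fetchone()[0]
print(gold_rows, bronze_count)
```

Since the tiering rule lives only in the last step, a different project could build its own gold table from the same silver data without reloading anything.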
In conclusion, the Medallion architecture presents itself as an innovative solution to meet the needs of organisations in handling large volumes of data.
By combining the benefits of the data lakehouse approach with the multi-tier structure of bronze, silver and gold, it promotes data quality and facilitates its transformation into valuable business insights.
This architecture enables flexible data management, adapting to changing market demands and providing a single source of truth in an organisation.