Demystifying Metadata Management: Lakehouse, Data fabric, Data Mesh

Ashish Mishra
5 min readNov 6, 2023

--

Photo by Ali Kazal on Unsplash

This part of the series contains advanced components of metadata management. Each component plays a very crucial role in designing the metadata management architecture.

If you are visiting this series first time then the recommendation is to complete Demystifying Metadata Management part -1 of the series to have basic knowledge of the metadata components.

In this part, we will discuss the following topics:

  • Lake House
  • Data Fabric
  • Data Warehouse vs Data lake vs Data Fabric
  • Data Virtualization
  • Data Mesh
  • Data Mesh vs Data Mart vs Data Fabric
  • Data Context

Lake House:

Don’t worry if you have not heard this word before. It’s the combination of Data Warehouse and Data Lake. Businesses leverage cost-effective storage of data lake analytics-optimized data warehouse. The Organizations will eventually ETL certain portions to the data lake and subsequently some portion of data to the warehouse to make it a two-tier architecture.

Data Lakehouse is a step towards making it a single-tier architecture, that can benefit:

  • Businesses don’t have to worry about fragile engineering ETLs to transfer data among multiple data stores.
  • Reduced Data Redundancy by enabling a single repository for all data.
  • Reduced cost by streamlining ETL and single-tier architecture.
  • Increased agility and up-to-date data enable quick analytics and reports on near real-time data.
  • Able to combine transaction, ACID, and structured data features of warehouse and quick and easy data access into a single solution.

Data Lake could enable quick access to data, but data warehouse provides lower latency and better performance of SQL queries.

Lake House Solutions are:

Image Courtesy: Databricks

Good read on Data Lakehouse: Blog by Splunk

Data Fabric:

Data fabric is a collection of practices and tools(Software or Hardware) used to implement data connectivity across the entire company estate — on-premises systems, all cloud data storage services, hybrid storage systems, and even edge devices. Think of a data fabric solution as the nervous system in our bodies. Its purpose is to distribute various kinds of information to the organs (business systems), requiring it for decision-making.

Data fabric can also be referred to as an architecture that facilitates end-to-end data pipelines, systems, and databases.

Historically, an enterprise may have had different ERP, CRM, supply chain, or customer data platforms, which persist data in different and separate environments. There is a high chance that these systems will have a potential overlap of data. Decision-making requires data across multiple places to understand the in-depth behavior of customer's lifecycle and future predictions and thus there is a requirement to create a system that could have access to the data from across data stores. Data fabric helps to build such a system.

Thus, Data Fabric allows more holistic and data-centric decision-making by applying sophisticated automation and tooling. There is no single tool or platform to fully set up data fabric architecture. A mix of solution, management, and monitoring will be required to create a Data Fabric Solution.

To establish a Data Fabric, there is no set of rules or practices. The sophistication of Data Fabric Architecture is based upon the use case and the organization’s needs. Data Fabric implementation could be costly if not procured with precaution and need.

Image Courtesy: MonteCarloData

Data Virtualization must not be confused with Data Fabric. Data Virtualization is one of the technologies to enable Data Fabric. We will discuss more about Data Virtualization in the later part of this blog.

Data fabric is a complementary “coating” you can use to connect disparate data lake solutions with data warehouse tools to resolve interoperability issues, improve data distribution speed, and achieve better data standardization.

Data fabric solutions:

  • SAP Data Intelligence
  • IBM Data Fabric
  • Google Cloud Dataplex

Good Read on Data Fabric: A blog by Infopulse

Data Warehouse vs Data Lake vs Data Fabric:

Data Virtualization:

Rather than moving data physically using any ETL process. Data Virtualization is a process or architecture to create a metadata layer above physical data to create a virtual data layer. This allows users to leverage the source data in real time.

Data virtualization can also be extended to add an extra security layer to enable row-level or column-level security.

Image Courtesy: tibco.com

Good Read on Data Virtualization: A blog by tibco

Data Mesh:

Data Mesh” - The term coined by Zhamak Dehghani in her 2019 blog post is a “socio-technical paradigm”. Data Mesh is a process or shift in organizational culture to replace centrally managed data lakes or warehouses with independent lakes or warehouses managed by business or domain owners. Each domain owner is responsible for their ETL to get data to their domain-based warehouse. Thus, in Data Mesh, Ownership, responsibility, and accountability of management for ETL, monitoring, and management of centralized warehouse/data lake gets distributed among multiple small and less complex domain-based warehouses.

Data Marts must not be confused with Data Mesh. Data Marts are dependent upon the central warehouse to provide data. On the contrary, Data Mesh architecture is to replace central warehouse.

Read more about the difference between Data Mesh and Data Marts here.

Good Read on Data Mesh: A blog by upsolver

Data Fabric and Data Mesh:

A data fabric and a data mesh both provide an architecture to access data across multiple technologies and platforms, but a data fabric is technology-centric, while a data mesh focuses on organizational change. as per Serra in a June blog post:

Data mesh is more about people and process than architecture, while a data fabric is an architectural approach that tackles the complexity of data and metadata in a smart way that works well together.

Image Courtesy: martinfowler.com

Good read on Data Mesh vs Data Fabric: A blog by analyticsindiamag

[Writer’s Corner]

While writing Demystifying Metadata Management series, I had an opportunity to talk to few brightest minds in the industry and most of the companies are struggling in state of Data Mesh. Data fabric is as tough as much as it is beautiful and most of the companies struglling tobe onee.

--

--

Ashish Mishra
Ashish Mishra

Written by Ashish Mishra

Over 10 years of experience in the data platform field.I love helping organizations make data-driven decisions and enjoy sharing my knowledge through blogging.

No responses yet