
What is a data lakehouse?


Overview

A data lakehouse enables an enterprise to effectively manage growing data volumes, boost data security, lower data storage costs, and tap into GenAI and business intelligence. Discover how data lakehouses work, key benefits of adopting a data lakehouse architecture, and how you can have access to real-time analytics and machine learning wherever data is stored—in a data lakehouse, data warehouse, or data lake.


What is a data lakehouse?

A data lakehouse is a data management platform that brings together aspects of a data warehouse and a data lake with added performance, security, and flexibility benefits. A data lakehouse is essentially a high-performing data warehouse, able to support all types of data (structured, unstructured, and semi-structured) with built-in data processing tools. The result is a single, powerful data management foundation that powers data processing for AI and advanced analytics.

Innovations in data lakehouse architecture have driven adoption, also spurred by a need to more efficiently manage growing volumes of diverse data, bridge the gap between a data lake and a data warehouse, and deliver trusted AI and business intelligence.


How are data lakes, data lakehouses, and data warehouses different?

While data lakehouses, data lakes, and data warehouses are all data repositories, each has distinct differences and relevant use cases. Let’s compare the three data approaches.

A data warehouse provides a way to centralize the storage of structured data, and is able to consolidate data from multiple sources into a single location. As a result, data warehouses break down information silos, giving business users fast data access and the ability to query data to generate reports and insights. Data warehouses support data mining, data analytics, and business intelligence use cases, allowing organizations to understand business performance, uncover trends, and make more informed business decisions.

However, data warehouses aren’t without challenges, with complex extract, transform, and load (ETL) processes increasing management requirements and driving up costs. In addition, off-cloud data warehouses may struggle to scale to support enterprise data growth and new use cases, further impacting total cost of ownership (TCO).

A data lake stores large volumes of structured and unstructured data, and is able to easily scale to support growing volumes. The ability to support various types of diverse data and formats makes data lakes applicable for big-data use cases—such as machine learning and data science—and provides a more cost-effective option compared to a data warehouse.

The complexity and size of data lakes demand careful management; without it, a data lake can degrade into an unwieldy "data swamp" of hard-to-find, low-quality data. Data lakes also typically require data scientists or data engineers to utilize the data effectively.

Historically, data warehouses and data lakes were deployed as individual, siloed architectures, which required data to be shared across two systems. A data lakehouse can be used in tandem with a data lake and a data warehouse, providing a flexible and low-cost storage option for all types of data and formats and eliminating the need for multiple copies of data across different systems.

With support for atomicity, consistency, isolation, and durability (ACID) transactions, users can run SQL queries against both structured and unstructured data, supporting high-performance AI and analytics across a variety of use cases. As a result, organizations can boost analytics power to enable more intelligent operations: applying insight to personalize customer experiences, improving decision-making, speeding up product development, optimizing workflows, and ultimately accelerating revenue growth.
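The transactional guarantee behind this can be illustrated with a small, self-contained sketch. The snippet below uses Python's built-in sqlite3 module purely as a stand-in for a lakehouse table format's transaction layer (it is not a lakehouse engine); the table and values are hypothetical:

```python
import sqlite3

# In-memory database standing in for a transactional table store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")

# Atomicity: every statement in a transaction commits, or none do.
try:
    with conn:  # opens a transaction; commits on success, rolls back on error
        conn.execute("INSERT INTO events (payload) VALUES ('order_created')")
        raise RuntimeError("simulated failure mid-transaction")
except RuntimeError:
    pass

# The failed insert was rolled back, so the table is still empty.
count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(count)  # 0

# A successful transaction is durable and visible to subsequent queries.
with conn:
    conn.execute("INSERT INTO events (payload) VALUES ('order_created')")
print(conn.execute("SELECT COUNT(*) FROM events").fetchone()[0])  # 1
```

Open table formats used by lakehouses apply the same principle at much larger scale, so concurrent readers and writers never see a half-applied change.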


Why are organizations moving to a data lakehouse architecture?

The limitations of traditional data architectures—such as high costs and limited scalability—are driving organizations to embrace data lakehouses. A 2025 survey found that 87% of over 200 IT leaders plan to repatriate workloads within two years.

There are several factors contributing to the desire to move to a more modern data architecture approach, including:

  • Increasing volumes of unstructured data: Organizations need a more efficient way to store, manage, and utilize emails, social media posts, product images, videos, call center transcripts, and chat messages.
  • Laser-focus on customer service: Advanced analytics and machine learning within a data lakehouse architecture can help identify customer behavior patterns, gain insight from service interactions, and create more targeted, data-driven experiences.
  • Cost savings: Leveraging a data lakehouse can reduce storage and processing costs, as well as improve data management across diverse workloads.
  • Embracing a hybrid data strategy: A data lakehouse architecture gives organizations the flexibility to leverage both cloud and off-cloud data storage based on desired deployment, security, and compliance requirements.

How does a data lakehouse work?

A data lakehouse typically consists of five layers:

  • Ingestion layer
  • Storage layer
  • Metadata layer
  • API layer
  • Consumption layer

Let’s explore the role of each:

The ingestion layer, the first layer, gathers data from various sources, such as transactional databases, NoSQL databases, and APIs. From there, the data is transformed into an accessible format for the data lakehouse to store and analyze.
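As a minimal sketch of what this transformation step does, the plain-Python snippet below (all field names are hypothetical) pulls records from two differently shaped sources and maps them onto one shared schema:

```python
import json

# Hypothetical raw records from two sources with different shapes.
db_row = {"customer_id": 42, "total": 19.99}     # transactional database row
api_event = '{"user": 42, "amount": "19.99"}'    # JSON payload from an API

def normalize(record: dict, id_field: str, amount_field: str) -> dict:
    """Map a source-specific record onto a shared target schema."""
    return {
        "customer_id": int(record[id_field]),
        "amount": float(record[amount_field]),
    }

unified = [
    normalize(db_row, "customer_id", "total"),
    normalize(json.loads(api_event), "user", "amount"),
]
print(unified)  # two records, now in one consistent shape
```

Real ingestion pipelines do the same kind of mapping at scale, often continuously for streaming sources.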

The storage layer serves as a unified, scalable repository for all data types—structured, semi-structured, and unstructured. Built on open formats, it ensures ACID transactions, schema evolution, and time travel. By decoupling storage from compute, it enables high-performance analytics, cost-efficient data retention, and seamless support for both batch and streaming workloads.
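Time travel and schema evolution are capabilities of open table formats such as Delta Lake and Apache Iceberg; the toy class below only illustrates the snapshot idea behind time travel in plain Python, not a real implementation:

```python
import copy

class VersionedTable:
    """Toy table that keeps an immutable snapshot per commit ("time travel")."""

    def __init__(self):
        self._snapshots = [[]]  # version 0 is the empty table

    def commit(self, new_rows):
        # Each commit produces a new snapshot; older versions stay readable.
        latest = copy.deepcopy(self._snapshots[-1])
        latest.extend(new_rows)
        self._snapshots.append(latest)
        return len(self._snapshots) - 1  # new version number

    def read(self, version=None):
        # Reading an older snapshot by version number is the essence
        # of time travel.
        return self._snapshots[-1 if version is None else version]

table = VersionedTable()
v1 = table.commit([{"id": 1}])
v2 = table.commit([{"id": 2}])
print(len(table.read()))    # 2 rows at the latest version
print(len(table.read(v1)))  # 1 row as of version 1
```

Production table formats record snapshots as metadata over immutable files rather than copying data, which is what keeps old versions cheap to retain.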

The third layer is the metadata layer, which classifies the metadata associated with the data that has been ingested and stored.
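Conceptually, the metadata layer acts as a catalog that query engines consult to find and interpret data. The sketch below is an illustrative toy catalog in plain Python; the dataset names, schemas, and storage paths are all hypothetical:

```python
# Minimal, illustrative catalog: maps each dataset to descriptive metadata.
catalog = {}

def register(name, schema, fmt, location):
    """Record what a dataset contains, how it is stored, and where."""
    catalog[name] = {"schema": schema, "format": fmt, "location": location}

register("orders", {"customer_id": "int", "amount": "float"},
         "parquet", "s3://lake/orders/")        # hypothetical path
register("call_transcripts", {"text": "string"},
         "json", "s3://lake/transcripts/")      # hypothetical path

# A query engine would consult the catalog before reading any files.
print(catalog["orders"]["format"])   # parquet
print(sorted(catalog))               # ['call_transcripts', 'orders']
```

Real metadata layers (for example, a Hive-compatible metastore or an Iceberg catalog) serve the same role, plus statistics and partition information that enable query planning.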

The fourth layer uses APIs to conduct more advanced analytics, enabling analytics tools and third-party applications to query the data within the data lakehouse architecture. This layer supports real-time data processing, allowing teams to tap into real-time analytics even as data is updated and refreshed.

The consumption layer allows applications and tools to access all metadata and data stored in the lakehouse. This provides desired data access to business users, allowing individuals to perform analytics tasks such as dashboard creation, data visualization, SQL queries, and machine learning tasks.
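A typical consumption-layer request is an aggregate query feeding a dashboard. The sketch below again uses Python's built-in sqlite3 module as a stand-in for the lakehouse's SQL engine, with made-up sample data:

```python
import sqlite3

# Toy sales table standing in for a lakehouse dataset.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("east", 100.0), ("east", 50.0), ("west", 75.0)])

# The kind of aggregate a dashboard or BI tool would issue.
rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # [('east', 150.0), ('west', 75.0)]
```

In a lakehouse, the same SQL runs through the API layer against data in open formats, so BI tools, notebooks, and ML pipelines all read one copy of the data.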


What are the business advantages of a data lakehouse architecture?

Data lakehouses yield many benefits to organizations and users, such as improved data management, cost savings, and enhanced AI and machine learning functions from the same source. Here are some of the primary advantages a data lakehouse can deliver:

  • A single source of truth: Unify data management and integrate data from multiple sources and across formats for data consistency.
  • Desired scalability: With separate storage and compute resources, a diverse set of workloads can be supported and scaled.
  • New opportunities for GenAI: The capabilities and structure of a data lakehouse allow organizations to leverage their data for GenAI applications, powering content creation, insights, and fast, personalized responses.
  • Analytics performance: Improve data query performance to increase speed and accuracy of results.
  • Trusted data governance: Robust data governance framework and controls to enforce data quality and security.
  • Deployment flexibility: Optimize cost and performance with options for off-cloud, hybrid, and multi-cloud deployments.

How can OpenText help you take advantage of data lakehouse benefits?

With real-time analytics and built-in machine learning, OpenText allows organizations to seamlessly analyze data within a data lakehouse—optimizing resource use and reducing total cost of ownership.

OpenText helps enterprises take full advantage of a modern data lakehouse architecture—anchored by OpenText™ Analytics Database (Vertica) for high-performance, scalable analytics across both data warehouses and data lakes.

OpenText’s unified engine supports high-performance SQL, advanced analytics, and open data formats, giving you the speed of a warehouse with the scale and openness of a lake. Whether on-premises, in the cloud, or in hybrid environments, OpenText empowers organizations to unify their data landscape and run analytics wherever the data lives—without compromise.

To extend these capabilities, OpenText’s composable Analytics and AI platform enables organizations to extract deeper insights, govern data more effectively, and deliver value across the enterprise.

Our composable platform can also help enhance insight using knowledge discovery, which brings advanced AI and machine learning to process and analyze unstructured data such as documents, emails, video, and audio—critical content types that traditional lakehouses often overlook. OpenText™ Intelligent Classification enriches this further with natural language processing, uncovering sentiment, topics, and key entities from massive volumes of text. OpenText™ Intelligence empowers business users with interactive dashboards and self-service analytics to accelerate decision-making.

Beyond analytics, OpenText addresses critical enterprise needs around data trust, governance, and security. OpenText™ Data Discovery automatically scans, classifies, and maps data across silos—giving organizations visibility into sensitive and regulated information, and reducing risk before data even enters the analytics environment. Our data privacy and protection capability adds enterprise-grade, data-centric security through format-preserving encryption, tokenization, and policy-based privacy controls—ensuring your data remains protected throughout its lifecycle.

Together, these capabilities turn OpenText’s data lakehouse offering into a holistic, enterprise-ready ecosystem—built for speed, intelligence, security, and trust.

Explore how a fast, scalable analytics platform can support your business and analyze data wherever it is stored.

Learn more about OpenText’s data lakehouse and analytics

Start your free trial of OpenText Analytics Database

