
What is a data lakehouse?


Overview

A data lakehouse enables enterprises to effectively manage growing data volumes, boost data security, lower data storage costs, and tap into GenAI and business intelligence. Discover how data lakehouses work, the key benefits of adopting a data lakehouse architecture, and how you can access real-time analytics and machine learning wherever data is stored: in a data lakehouse, data warehouse, or data lake.

The cloud repatriation shift: What the data tells us

Discover why 200+ IT leaders are rethinking cloud-only strategies for their data lakehouses—opting for on-prem, private cloud, or hybrid deployments to reclaim performance.


Data lakehouse

What is a data lakehouse?

A data lakehouse is a data management platform that brings together aspects of a data warehouse and data lake with added performance, security, and flexibility benefits. A data lakehouse is essentially a high-performing data warehouse that supports all types of data (structured, unstructured, and semi-structured) and includes built-in data processing tools. The result is a single, powerful data management foundation that powers data processing for AI and advanced analytics.

Adoption has been driven by innovations in data lakehouse architecture and by the need to manage growing volumes of diverse data more efficiently, bridge the gap between a data lake and a data warehouse, and deliver trusted AI and business intelligence.


How are data lakes, data lakehouses, and data warehouses different?

While a data lakehouse, data lake, and data warehouse are all data repositories, each has distinct differences and relevant use cases. Let’s compare the three data approaches.

A data warehouse provides a way to centralize the storage of structured data, able to consolidate data from multiple sources into a single location. As a result, data warehouses break down information silos, giving business users fast data access and the ability to query data to generate reports and insights. Data warehouses support data mining, data analytics, and business intelligence use cases, allowing organizations to understand business performance, uncover trends, and make more informed business decisions.

However, data warehouses aren’t without challenges: complex ETL (extract, transform, and load) processes increase management requirements and drive up costs. In addition, off-cloud data warehouses may struggle to scale to support enterprise data growth and new use cases, further increasing total cost of ownership (TCO).

A data lake stores large volumes of structured and unstructured data and scales easily to support growing volumes. The ability to support diverse data types and formats makes data lakes well suited to big data use cases, such as machine learning and data science, and provides a more cost-effective option compared to a data warehouse.

But the complexity and size of data lakes call for careful management to keep data from becoming unwieldy, and data lakes typically require data scientists or data engineers to use the data effectively.

Historically, data warehouses and data lakes were deployed as individual, siloed architectures, which required data to be shared across two systems. A data lakehouse can be used in tandem with a data lake and data warehouse, providing a flexible and low-cost storage option for all types of data and formats and eliminating the need for multiple copies of data across different systems.

With support for ACID transactions, users can run queries through SQL commands for structured and unstructured data, using high-performance AI and analytics for a variety of use cases. As a result, organizations can boost analytics power to enable more intelligent operations, applying insight to personalize customer experiences, improve decision-making, speed up product development, optimize workflows, and accelerate revenue growth.
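To make the ACID point concrete, here is a minimal sketch of atomicity, the "A" in ACID: a batch load either lands completely or not at all. It uses Python's built-in sqlite3 module purely as a stand-in for a lakehouse SQL engine. The `orders` table and the `load_batch` helper are hypothetical names chosen for illustration and are not part of any OpenText product.

```python
import sqlite3

# In-memory SQLite database, used only as a stand-in for a lakehouse SQL engine.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL NOT NULL)")
conn.execute("INSERT INTO orders (id, amount) VALUES (1, 25.0)")
conn.commit()


def load_batch(connection, rows):
    """Insert all rows in one transaction; undo every row if any insert fails."""
    try:
        # The connection context manager commits on success
        # and rolls back if an exception is raised inside the block.
        with connection:
            connection.executemany(
                "INSERT INTO orders (id, amount) VALUES (?, ?)", rows
            )
        return True
    except sqlite3.IntegrityError:
        return False


# The second row reuses id 1, so the whole batch is rejected,
# including the valid first row (atomicity).
ok = load_batch(conn, [(2, 40.0), (1, 99.0)])
row_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

After the failed batch, the table still holds only the original row, so readers never see a half-loaded state. A production lakehouse would typically provide the same guarantee through its own transaction mechanism rather than SQLite.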


Why are organizations moving to a data lakehouse architecture?

The limitations of traditional data architectures, such as high costs and limited scalability, are driving organizations to embrace data lakehouses. A recent survey found that 87% of over 200 IT leaders plan to repatriate workloads within two years.

There are several factors contributing to the desire to move to a more modern data architecture approach, including:

  • Increasing volumes of unstructured data: Organizations need a more efficient way to store, manage, and utilize emails, social media posts, product images, videos, call center transcripts, chat messages, etc.
  • Laser-focus on customer service: Advanced analytics and machine learning within a data lakehouse architecture can help identify customer behavior patterns, gain insight from service interactions, and create more targeted, data-driven experiences.
  • Cost savings: Leveraging a data lakehouse can reduce storage and processing costs, as well as improve data management across diverse workloads.
  • Embracing a hybrid data strategy: A data lakehouse architecture gives organizations the flexibility to leverage both cloud and off-cloud data storage based on desired deployment, security, and compliance requirements.

How does a data lakehouse work?

A data lakehouse typically consists of five layers:

  • Ingestion layer
  • Storage layer
  • Metadata layer
  • API layer
  • Consumption layer

Let’s explore the role of each:

The ingestion layer, the first layer, gathers data from various sources, such as transactional databases, NoSQL databases, and APIs. From there, it transforms the data into an accessible format for the data lakehouse to store and analyze.
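The ingestion step can be sketched as a set of small adapters that each read one source and emit records in a shared shape. This is an illustrative sketch only: the sample payloads, the field names (`order_id`, `amount`, `source`), and the functions `from_csv`, `from_json`, and `ingest` are all assumptions made for the example.

```python
import csv
import io
import json

# Hypothetical sample payloads standing in for real sources.
CSV_EXPORT = "order_id,amount\n1,25.0\n2,40.5\n"    # e.g. a database export
JSON_EVENTS = '[{"orderId": 3, "total": "12.25"}]'  # e.g. an API response


def from_csv(text):
    """Yield normalized records from a CSV export."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {
            "order_id": int(row["order_id"]),
            "amount": float(row["amount"]),
            "source": "csv",
        }


def from_json(text):
    """Yield normalized records from a JSON payload with different field names."""
    for item in json.loads(text):
        yield {
            "order_id": int(item["orderId"]),
            "amount": float(item["total"]),
            "source": "api",
        }


def ingest():
    """Combine every source into one list of records with a common schema."""
    return list(from_csv(CSV_EXPORT)) + list(from_json(JSON_EVENTS))


records = ingest()
```

Each adapter hides the quirks of its source (column names, string-encoded numbers), so everything downstream sees one consistent record format.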

The storage layer is where all of the data (unstructured, structured, and semi-structured) is ingested into the lakehouse and stored. The data is stored in open file formats for optimized analytics performance.

The third layer is the metadata layer, which classifies the metadata associated with the data that has been ingested and stored.
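One way to picture the metadata layer is as a catalog that records, for each stored dataset, where it lives, what format it uses, what its columns look like, and how it is classified. The sketch below is a simplified illustration. The `DatasetEntry` and `Catalog` classes, the paths, and the tags are hypothetical and do not describe any specific product's catalog.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetEntry:
    """Metadata describing one dataset stored in the lakehouse."""
    name: str
    path: str
    file_format: str
    schema: dict                      # column name -> type name
    tags: set = field(default_factory=set)


class Catalog:
    """Hypothetical in-memory metadata catalog."""

    def __init__(self):
        self._entries = {}

    def register(self, name, path, file_format, sample_row, tags=()):
        # Infer a simple schema from one sample record.
        schema = {col: type(val).__name__ for col, val in sample_row.items()}
        entry = DatasetEntry(name, path, file_format, schema, set(tags))
        self._entries[name] = entry
        return entry

    def get(self, name):
        return self._entries[name]

    def find_by_tag(self, tag):
        """Return the names of all datasets classified with the given tag."""
        return sorted(e.name for e in self._entries.values() if tag in e.tags)


catalog = Catalog()
catalog.register("orders", "lake/orders/", "parquet",
                 {"order_id": 1, "amount": 25.0}, tags={"sales"})
catalog.register("tickets", "lake/tickets/", "json",
                 {"ticket_id": 7, "body": "late delivery"}, tags={"support", "pii"})
```

With entries like these, a governance tool could ask `catalog.find_by_tag("pii")` to locate sensitive datasets without scanning the data itself.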

The fourth layer, the API layer, uses APIs to support more advanced analytics, enabling analytics tools and third-party applications to query the data within the data lakehouse architecture. This layer supports real-time data processing, allowing teams to tap into real-time analytics even as data is updated and refreshed.

The consumption layer allows applications and tools to access all metadata and data stored in the lakehouse. This provides desired data access to business users, allowing individuals to perform analytics tasks such as dashboard creation, data visualization, SQL queries, and machine learning tasks.
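The relationship between the API layer and the consumption layer can be sketched as a thin query function that tools call, plus a consumer that turns the result into a dashboard-style summary. This is a rough illustration under stated assumptions: sqlite3 stands in for the lakehouse engine, and `build_demo_store`, `query`, and `revenue_by_region` are hypothetical names invented for the example.

```python
import sqlite3


def build_demo_store():
    """Create a tiny in-memory table standing in for lakehouse data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("east", 25.0), ("west", 40.0), ("east", 15.0)],
    )
    conn.commit()
    return conn


def query(conn, sql, params=()):
    """Hypothetical API-layer entry point: run a SELECT and return rows as dicts."""
    if not sql.lstrip().lower().startswith("select"):
        raise ValueError("only SELECT statements are allowed through this API")
    cursor = conn.execute(sql, params)
    columns = [d[0] for d in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


def revenue_by_region(conn):
    """Consumption-layer example: the data behind a simple dashboard tile."""
    return query(
        conn,
        "SELECT region, SUM(amount) AS revenue FROM orders "
        "GROUP BY region ORDER BY region",
    )


store = build_demo_store()
summary = revenue_by_region(store)
```

The consumer never touches storage directly. It only goes through `query`, which is where access rules such as the read-only check shown here could be enforced.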


What are the business advantages of a data lakehouse architecture?

Data lakehouses yield many benefits to organizations and users, such as improved data management, cost savings, and enhanced AI and machine learning from the same source. Here are some of the primary advantages a data lakehouse can deliver:

  • A single source of truth: Unify data management and integrate data from multiple sources and across formats for data consistency.
  • Desired scalability: With separate storage and compute resources, a diverse set of workloads can be supported and scaled.
  • New opportunities for GenAI: The capabilities and structure of a data lakehouse allow organizations to use their data resources in GenAI applications for content creation, insights, and prompt, personalized responses.
  • Analytics performance: Improve data query performance to increase speed and accuracy of results.
  • Trusted data governance: Robust data governance framework and controls to enforce data quality and security.
  • Deployment flexibility: Optimize cost and performance with options for off-cloud, hybrid, and multi-cloud deployments.

How can OpenText help you take advantage of data lakehouse benefits?

With real-time analytics and built-in machine learning, OpenText allows organizations to seamlessly analyze data within a data lakehouse—optimizing resource use and reducing total cost of ownership.

OpenText helps enterprises take full advantage of a modern data lakehouse architecture—anchored by OpenText™ Analytics Database (Vertica) for high-performance, scalable analytics across both data warehouses and data lakes.

OpenText’s unified engine supports high-performance SQL, advanced analytics, and open data formats, giving you the speed of a warehouse with the scale and openness of a lake. Whether on-premises, in the cloud, or in hybrid environments, OpenText empowers organizations to unify their data landscape and run analytics wherever the data lives—without compromise.

To extend these capabilities, OpenText’s composable Analytics and AI platform enables organizations to extract deeper insights, govern data more effectively, and deliver value across the enterprise.

To enhance insight, Knowledge Discovery brings advanced AI and machine learning to process and analyze unstructured data such as documents, emails, video, and audio—critical content types that traditional lakehouses often overlook. OpenText™ Intelligent Classification enriches this further with natural language processing, uncovering sentiment, topics, and key entities from massive volumes of text. OpenText™ Intelligence empowers business users with interactive dashboards and self-service analytics to accelerate decision-making.

Beyond analytics, OpenText addresses critical enterprise needs around data trust, governance, and security. OpenText™ Data Discovery automatically scans, classifies, and maps data across silos—giving organizations visibility into sensitive and regulated information, and reducing risk before data even enters the analytics environment. Data Privacy and Protection adds enterprise-grade, data-centric security through format-preserving encryption, tokenization, and policy-based privacy controls—ensuring your data remains protected throughout its lifecycle.

Together, these capabilities turn OpenText’s data lakehouse offering into a holistic, enterprise-ready ecosystem—built for speed, intelligence, security, and trust.

Explore how a fast, scalable analytics platform can support your business and analyze data wherever it is stored.

Learn more about OpenText’s data lakehouse and analytics
