
MaxCompute: Data Lakehouse 2.0

Last Updated: Nov 08, 2025

MaxCompute offers the Data Lakehouse 2.0 solution. With this solution, you create management objects that define the metadata and data access methods for external data sources, and use an external schema mapping mechanism to directly access all tables within a database or schema of an external data source. By combining the flexibility and rich multi-engine ecosystem of a data lake with the enterprise-grade capabilities of a data warehouse, the solution breaks down data silos between data lakes and data warehouses and helps you build an integrated data management platform that unifies the two. This feature is in public preview.

Concepts

  • Data warehouse vs. data lake

    | Category | Capability |
    | --- | --- |
    | Data warehouse | Emphasizes the management of and constraints on structured and semi-structured data that enters the warehouse. It relies on strong management capabilities to achieve better computing performance and more standardized management. |
    | Data lake | Emphasizes open data storage and common data formats, and supports multiple engines for on-demand data production or consumption. To preserve flexibility, it provides only weak management capabilities. It is compatible with unstructured data and supports schema-on-read, a more flexible way to manage data. |

  • MaxCompute data warehouse

    MaxCompute is a cloud-native data warehouse built on a serverless architecture. You can perform the following operations:

    • Model your data warehouse using MaxCompute.

    • Use extract, transform, and load (ETL) tools to load and store data in model tables that have defined structures.

    • Process massive amounts of data in the data warehouse using a standard SQL engine and analyze the data using the Hologres online analytical processing (OLAP) engine.
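    As a concrete illustration of the load-and-analyze flow above, the following MaxCompute SQL sketch creates a partitioned model table and loads one day of data into it. All table, column, and partition names (`dwd_orders`, `ods_orders`, `ds`) are placeholder assumptions, not names from this document:

    ```sql
    -- Define a model table with a fixed structure in the warehouse
    -- (names and columns are illustrative placeholders).
    CREATE TABLE IF NOT EXISTS dwd_orders (
        order_id BIGINT,
        amount   DECIMAL(18, 2)
    )
    PARTITIONED BY (ds STRING);

    -- Load one day of data from a staging table into the model table,
    -- a typical ETL step when data enters the warehouse.
    INSERT OVERWRITE TABLE dwd_orders PARTITION (ds = '20250101')
    SELECT  order_id,
            amount
    FROM    ods_orders
    WHERE   ds = '20250101';
    ```

    Keeping model tables partitioned in this way is what enables the standardized management and computing performance that the warehouse side of the comparison above emphasizes.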

  • MaxCompute scenarios for data lakes and data federation

    In this scenario, MaxCompute reads upstream data from the data lake, is compatible with various mainstream open-source data formats, performs computations within its engine, and continuously produces data for downstream workflows.

    As a secure, high-performance, and cost-effective data warehouse that aggregates high-value data, MaxCompute also fetches metadata and data from the data lake. This enables in-engine computation on external data and federated computation with internal warehouse data to extract value. This process helps converge data into the strictly managed data warehouse.

    In addition to data lakes, MaxCompute, as a data warehouse, also needs to retrieve metadata and data from other external data sources, such as Hadoop and Hologres, for federated computation with its internal data.

  • MaxCompute Data Lakehouse 2.0

    MaxCompute Data Lakehouse 2.0 is based on the MaxCompute compute engine. It supports access to Alibaba Cloud metadata or storage services over the cloud product interconnect network. It also supports access to external data sources in a Virtual Private Cloud (VPC) through a leased line. This solution lets you create management objects that define the metadata and data access methods for external data sources. It also uses an external schema to map to a database or schema of an external data source. This enables direct access to all tables within that database or schema.

    (Figure: MaxCompute Data Lakehouse 2.0 architecture.)

    • Network connectivity

      For more information, see the description of Networklink in Access VPCs (Leased Line Direct Connection). MaxCompute can access data sources in a VPC network, such as EMR and RDS instances (coming soon), through a network connection. DLF, OSS, and Hologres are on the cloud product interconnect network. MaxCompute can directly access data in these services without setting up a Networklink object.

    • Foreign server

      A foreign server contains information about metadata and data access. It also includes identity authentication information, location information, and connection protocol descriptions for accessing the data source. A foreign server is a tenant-level management object defined by a tenant administrator.

      When the project-level tenant resource access control feature is enabled, the tenant administrator attaches the foreign server to the project that requires it. The project administrator then uses a policy to grant users within the project permission to use the foreign server.
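      The grant described above is expressed as a project policy. The following JSON is only a hedged sketch of the general MaxCompute policy shape; the principal, action, and resource strings for foreign servers are assumptions and must be taken from the MaxCompute permission documentation, not from here:

      ```json
      {
          "Version": "1",
          "Statement": [{
              "Effect": "Allow",
              "Principal": ["user_placeholder@example.com"],
              "Action": ["odps:Usage"],
              "Resource": ["acs:odps:*:foreignservers/my_server"]
          }]
      }
      ```

      The intent of such a policy is that only explicitly authorized users in the project can reference the attached foreign server when creating or querying external schemas.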

    • External schema

      An external schema is a special type of schema in a MaxCompute data warehouse project. As shown in the figure, it can map to a database or schema of a data source. This allows direct access to the tables and data within that database or schema. Tables mapped to a remote database through an external schema are called federated foreign tables.

      Federated foreign tables do not store metadata in MaxCompute. Instead, MaxCompute fetches the metadata in real time from the metadata service defined in the foreign server object. When you run a query, you do not need to create a foreign table in the data warehouse using a Data Definition Language (DDL) statement. You can operate on the table directly by using the project name and external schema name as the namespace and referencing the original table name from the data source. If the table schema or data in the data source changes, the federated foreign table immediately reflects the latest state of the source table.

      The data source level to which an external schema maps depends on the table hierarchy in the data source and on the level defined by the foreign server. The level of the foreign server is in turn determined by the data source access permissions of the authenticated identity.
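      Under these definitions, working with a federated foreign table can be sketched as follows. The object names (`my_server`, `my_ext_schema`, `source_db`, `my_project`, `src_table`) are placeholders, and the exact DDL should be confirmed against the MaxCompute SQL reference:

      ```sql
      -- Map an external schema to one database of the data source
      -- that the foreign server points to (identifiers are placeholders).
      CREATE EXTERNAL SCHEMA IF NOT EXISTS my_ext_schema
      WITH my_server
      ON 'source_db';

      -- Query a source table directly as a federated foreign table.
      -- No CREATE EXTERNAL TABLE DDL is needed: the project name and
      -- external schema name together form the namespace.
      SELECT * FROM my_project.my_ext_schema.src_table LIMIT 10;
      ```

      Because the metadata is fetched at query time, dropping or altering `src_table` in the source system is reflected the next time the query runs, with no refresh step in MaxCompute.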

    • External project

      In Data Lakehouse Solution 1.0, an external project uses a two-layer model. Similar to an external schema, it maps to a database or schema of a data source, and it requires a data warehouse project to read and compute external data. However, an external project is a top-level object: mapping one external project per source database or schema can create an excessive number of external projects, and those projects cannot be shared with data warehouse projects that use the three-layer model. MaxCompute is gradually phasing out external projects from Data Lakehouse Solution 1.0, and existing users can migrate to external schemas.

      In Data Lakehouse 2.0, external schemas provide all the features of external projects from Data Lakehouse Solution 1.0. In addition, a Data Lakehouse 2.0 external project maps directly to a Catalog or Database of a three-layer-model data source. This lets you view the databases under a DLF Catalog or the schemas under a Hologres Database, and then access the data source tables as federated foreign tables.

    | Data source type | Foreign server level | External schema mapping level | Data Lakehouse 2.0 external project mapping level | Data Lakehouse Solution 1.0 external project (being deprecated) mapping level | Authentication method |
    | --- | --- | --- | --- | --- | --- |
    | DLF + OSS | Region-level DLF and OSS services | DLF Catalog.Database | DLF Catalog | DLF Catalog.Database | RAM role |
    | Hive + HDFS | EMR instance | Hive database | Not supported | Hive database | No authentication |
    | Hologres | Database of a Hologres instance | Schema | Database | Not supported | RAM role |

    Note

    Different data sources support various types of authentication. MaxCompute will gradually support more authentication methods in future releases, such as using the current user's identity to access Hologres or using Kerberos authentication to access Hive.