
MaxCompute: Build and manage a data lakehouse (based on DLF and OSS)

Last Updated: Dec 06, 2025

This topic describes how to build and manage a data lakehouse using MaxCompute, Data Lake Formation (DLF), and Object Storage Service (OSS). A data lakehouse integrates a data warehouse and a data lake to provide flexible and efficient data processing.

Usage notes

  • The data lakehouse feature is available only in the China (Hangzhou), China (Shanghai), China (Beijing), China (Zhangjiakou), China (Shenzhen), China (Hong Kong), Singapore, and Germany (Frankfurt) regions.

  • MaxCompute, OSS, and DLF must be deployed in the same region.

Procedure

  1. Activate the MaxCompute, DLF, and OSS services

  2. Grant permissions for MaxCompute access

    When you build a data lakehouse with MaxCompute, DLF, and OSS, the account used for the MaxCompute project cannot access DLF or OSS without authorization. You must grant the required permissions. Two authorization methods are available:

    • One-click authorization: Use this method if the same account is used to create the MaxCompute project and to deploy DLF and OSS. You can click Authorize DLF and OSS to grant permissions with one click.

    • Custom authorization: Use this method whether the same account or different accounts are used to create the MaxCompute project and to deploy DLF and OSS. For more information, see Custom authorization. A sketch of the policy documents that this method involves follows this list.
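
    Custom authorization attaches two policy documents to a RAM role. The following Python sketch only assembles example documents so you can see their shape; it is not an official template. The trust for odps.aliyuncs.com follows the usual pattern for services that assume RAM roles, the bucket name is a placeholder, and the exact DLF action names are an assumption to verify against the DLF documentation.

    ```python
    import json

    # Trust policy: allows the MaxCompute (ODPS) service to assume the role.
    trust_policy = {
        "Version": "1",
        "Statement": [{
            "Effect": "Allow",
            "Action": "sts:AssumeRole",
            "Principal": {"Service": ["odps.aliyuncs.com"]},
        }],
    }

    # Permission policy: read access to the OSS data bucket and DLF metadata.
    # "my-lakehouse-bucket" is a placeholder; "dlf:*" is deliberately broad
    # and should be narrowed to the actions your workload needs.
    permission_policy = {
        "Version": "1",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["oss:GetObject", "oss:ListObjects"],
                "Resource": [
                    "acs:oss:*:*:my-lakehouse-bucket",
                    "acs:oss:*:*:my-lakehouse-bucket/*",
                ],
            },
            {"Effect": "Allow", "Action": ["dlf:*"], "Resource": ["*"]},
        ],
    }

    print(json.dumps(trust_policy, indent=2))
    print(json.dumps(permission_policy, indent=2))
    ```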

  3. Build a data lakehouse in DataWorks

    1. Log on to the DataWorks console and select a region in the upper-left corner.

      For more information about the supported regions, see Usage notes.

    2. In the left navigation pane, choose Other Items > Lake and Warehouse Integration (Data Lakehouse).

    3. On the Lake and Warehouse Integration (Data Lakehouse) page, click Start.

    4. On the Create Data Warehouse page, follow the on-screen instructions.

      The following tables describe the parameters.

      • Create Data Warehouse:

        • External Project Name: The custom name of the external project. The name must follow these conventions (a validation sketch follows this table):

          • The name must start with a letter and can contain only letters, digits, and underscores (_).

          • The name can be up to 128 characters in length.

          For more information about the basic concepts of external projects, see Project concepts.

        • MaxCompute Project: Select the MaxCompute project to associate with the external project.
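
      The naming rules above translate directly into a regular expression. The following sketch is a minimal client-side check derived from those rules; the function name is ours, not part of any SDK.

      ```python
      import re

      # Starts with a letter; only letters, digits, and underscores;
      # at most 128 characters in total.
      NAME_PATTERN = re.compile(r"^[A-Za-z][A-Za-z0-9_]{0,127}$")

      def is_valid_external_project_name(name: str) -> bool:
          return bool(NAME_PATTERN.match(name))

      assert is_valid_external_project_name("ext_dlf_project")
      assert not is_valid_external_project_name("1_starts_with_digit")
      assert not is_valid_external_project_name("has-hyphen")
      ```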

      • Create Data Lake Connection:

        • Heterogeneous Data Platform Type: The type of platform that you connect to. Valid values:

          • Alibaba Cloud E-MapReduce/Hadoop Cluster: Use MaxCompute and Hadoop to build a data lakehouse.

          • Alibaba Cloud DLF + OSS: Use MaxCompute, DLF, and OSS to build a data lakehouse.

          For this topic, select Alibaba Cloud DLF + OSS.

        • External Project Description: Optional. The description of the external project.

        • Region Where DLF Is Activated: The region where the DLF service is activated. Select a region as needed. Valid values:

          • China (Hangzhou): cn-hangzhou

          • China (Shanghai): cn-shanghai

          • China (Beijing): cn-beijing

          • China (Shenzhen): cn-shenzhen

          • China (Zhangjiakou): cn-zhangjiakou

          • China (Hong Kong): cn-hongkong

          • Singapore: ap-southeast-1

          • Germany (Frankfurt): eu-central-1

        • DLF Endpoint: The internal endpoint of the DLF service. Select the endpoint that matches your region (a configuration sketch follows this table). Valid values:

          • China (Hangzhou): dlf-share.cn-hangzhou.aliyuncs.com

          • China (Shanghai): dlf-share.cn-shanghai.aliyuncs.com

          • China (Beijing): dlf-share.cn-beijing.aliyuncs.com

          • China (Zhangjiakou): dlf-share.cn-zhangjiakou.aliyuncs.com

          • China (Shenzhen): dlf-share.cn-shenzhen.aliyuncs.com

          • China (Hong Kong): dlf-share.cn-hongkong.aliyuncs.com

          • Singapore: dlf-share.ap-southeast-1.aliyuncs.com

          • Germany (Frankfurt): dlf-share.eu-central-1.aliyuncs.com

        • DLF Database Name: The name of the destination DLF database to which you want to connect. Currently, only databases in the default DLF Catalog are supported. To obtain the database name:

          1. Log on to the Data Lake Formation (DLF) console and select a region in the upper-left corner.

          2. In the navigation pane on the left, choose Metadata > Metadata.

          3. On the Metadata page, click the Table tab and obtain the DLF database name.

        • DLF RoleARN: Optional. The Alibaba Cloud Resource Name (ARN) of the RAM role. This parameter is required if you use the custom authorization method. To obtain the ARN:

          1. Log on to the Resource Access Management (RAM) console.

          2. In the navigation pane on the left, choose Identities > Roles.

          3. On the Roles page, click the name of the target role to open its details page.

          4. In the Basic Information section, find the ARN.
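
      The region, endpoint, and RoleARN values above fit naturally into a small configuration map. This sketch only restates the endpoints listed in the table and the standard acs:ram::<account-id>:role/<role-name> ARN format; the account ID and role name are placeholders.

      ```python
      # DLF internal endpoints from the table above, keyed by region ID.
      DLF_ENDPOINTS = {
          "cn-hangzhou": "dlf-share.cn-hangzhou.aliyuncs.com",
          "cn-shanghai": "dlf-share.cn-shanghai.aliyuncs.com",
          "cn-beijing": "dlf-share.cn-beijing.aliyuncs.com",
          "cn-zhangjiakou": "dlf-share.cn-zhangjiakou.aliyuncs.com",
          "cn-shenzhen": "dlf-share.cn-shenzhen.aliyuncs.com",
          "cn-hongkong": "dlf-share.cn-hongkong.aliyuncs.com",
          "ap-southeast-1": "dlf-share.ap-southeast-1.aliyuncs.com",
          "eu-central-1": "dlf-share.eu-central-1.aliyuncs.com",
      }

      def role_arn(account_id: str, role_name: str) -> str:
          # RAM role ARNs follow the acs:ram::<account-id>:role/<role-name> format.
          return f"acs:ram::{account_id}:role/{role_name}"

      print(DLF_ENDPOINTS["cn-hangzhou"])
      print(role_arn("123456789012", "dlf-access-role"))  # placeholder values
      ```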

  4. Manage the data lakehouse in DataWorks

    1. Log on to the DataWorks console and select a region in the upper-left corner.

    2. In the left navigation pane, choose Other Items > Lake and Warehouse Integration (Data Lakehouse).

    3. On the Lake and Warehouse Integration (Data Lakehouse) page, you can perform the following operations:

      1. To get started, find the target external project and click Use Data Lakehouse in the Actions column. (A query sketch follows this list.)

      2. To update the external project information, click Project configuration in the Actions column of the target external project and edit the settings in the Project configuration dialog box.

      3. In the dialog box, you can change the database name of the external data source that is mapped to the MaxCompute external project and reselect the external data source. You cannot modify an existing external data source itself; to delete one, go to its own page.

      4. To delete an external project, click Delete in its Actions column. The external project is logically deleted and enters a silent state, and it is permanently deleted after 15 days. You cannot create an external project with the same name during this period.
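
      After you click Use Data Lakehouse, tables in the external project can be queried from the attached MaxCompute project with the project_name.table_name syntax. The following PyODPS sketch is a minimal example, assuming an external project named ext_dlf_project that contains a table named sales; all credentials, names, and the endpoint are placeholders.

      ```python
      from odps import ODPS  # pip install pyodps

      # Connect to the internal MaxCompute project that the external
      # project is attached to. Replace all placeholder values.
      o = ODPS(
          "<access_key_id>",
          "<access_key_secret>",
          project="my_maxcompute_project",
          endpoint="http://service.cn-hangzhou.maxcompute.aliyun.com/api",
      )

      # Reference the external project's table with project.table syntax.
      sql = "SELECT * FROM ext_dlf_project.sales LIMIT 10;"
      with o.execute_sql(sql).open_reader() as reader:
          for record in reader:
              print(record)
      ```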

    4. View the metadata of a data lakehouse external project

      1. Log on to the DataWorks console and select a region in the upper-left corner.

      2. In the left navigation pane, click Workspace.

      3. On the Workspaces page, find the target workspace and in the Actions column, choose Shortcuts > Data Map.

        Select the workspace that is attached to the external project.

      4. On the Data Map page, enter the table name in the search box, or click the search icon in the left navigation pane, to find tables in the external project. The results appear on the Directory List tab on the right.

      The metadata shown in Data Map is updated on the next day (T+1). If you modify the table schema at the mapping source, such as in Hive, the changes are synchronized to DataWorks Data Map on the next day. The metadata in the MaxCompute engine is updated in real time, as the sketch below shows.
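
      Because the MaxCompute engine's metadata is real-time while Data Map lags by a day, you can read a table's current schema directly from MaxCompute when you need it immediately. A minimal PyODPS sketch, with all names and credentials as placeholders:

      ```python
      from odps import ODPS  # pip install pyodps

      o = ODPS(
          "<access_key_id>",
          "<access_key_secret>",
          project="my_maxcompute_project",
          endpoint="http://service.cn-hangzhou.maxcompute.aliyun.com/api",
      )

      # Fetch the table object from the external project; the schema printed
      # here reflects source-side changes immediately, unlike Data Map (T+1).
      table = o.get_table("sales", project="ext_dlf_project")
      print(table.schema)
      ```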

References

For more information about data lakehouse solutions that use DLF and RDS, or Flink and OSS, to support the Delta Lake or Hudi storage mechanisms, see Using DLF and RDS or Flink and OSS to support Delta Lake or Hudi storage mechanisms.