
MaxCompute:Build and manage a data lakehouse (based on DLF and OSS)

Last Updated: Mar 26, 2026

This topic walks you through setting up and managing a data lakehouse that uses MaxCompute, Data Lake Formation (DLF), and Object Storage Service (OSS), with DataWorks as the management interface. The data lakehouse architecture lets you run analytical workloads directly on OSS-based data lake storage while using MaxCompute for structured query processing, combining the flexibility of a data lake with the governance of a data warehouse.

Prerequisites

Before you begin, make sure you have:

  • An Alibaba Cloud account with permissions to activate DLF and OSS

  • A MaxCompute project, or access to create one. See Create a MaxCompute project

  • MaxCompute, OSS, and DLF all deployed in the same region

Limitations

The data lakehouse feature is available in the following regions only:

  • China (Hangzhou)

  • China (Shanghai)

  • China (Beijing)

  • China (Zhangjiakou)

  • China (Shenzhen)

  • China (Hong Kong)

  • Singapore

  • Germany (Frankfurt)

Step 1: Activate services

  1. On the DLF activation page, activate the DLF service.

  2. Activate the OSS service.

Step 2: Grant permissions for MaxCompute access

By default, the MaxCompute project account cannot access DLF or OSS. Grant the required permissions using one of the following methods:

Method | When to use
One-click authorization | The MaxCompute project and the DLF and OSS deployments were created with the same account.
Custom authorization | The MaxCompute project and the DLF and OSS deployments were created with the same account or with different accounts.
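For custom authorization, the RAM role you grant to MaxCompute needs a trust policy that lets the MaxCompute service assume it. The following is a minimal sketch, not the authoritative policy: the service principal name (`odps.aliyuncs.com`) is an assumption that should be verified in the RAM console, and one-click authorization generates the real policy for you.

```python
import json

# Hypothetical sketch of a RAM trust policy that allows the MaxCompute
# service to assume the authorization role. The service principal name
# below is an assumption; verify it against the RAM console before use.
def build_trust_policy(service: str = "odps.aliyuncs.com") -> dict:
    return {
        "Version": "1",
        "Statement": [
            {
                "Action": "sts:AssumeRole",
                "Effect": "Allow",
                "Principal": {"Service": [service]},
            }
        ],
    }

# Render the policy as JSON, ready to paste into the role's trust policy.
print(json.dumps(build_trust_policy(), indent=2))
```

The resource-level permissions on DLF and OSS are attached separately as permission policies on the same role; only the trust relationship is sketched here.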

Step 3: Build a data lakehouse in DataWorks

  1. Log on to the DataWorks console and select a region in the upper-left corner.

    For supported regions, see Limitations.

  2. In the left navigation pane, choose Other Items > Lake and Warehouse Integration (Data Lakehouse).

  3. On the Lake and Warehouse Integration (Data Lakehouse) page, click Start.

  4. On the Create Data Warehouse page, configure the parameters described in the following tables. A table of DLF internal endpoints by region follows the parameter descriptions.

    Create Data Warehouse

    Parameter | Description | Required
    External Project Name | A custom name for the external project. Must start with a letter and can contain only letters, underscores (_), and digits. Maximum 128 characters. For more information, see Project concepts. | Yes
    MaxCompute Project | The MaxCompute project to associate. If the target project is not in the list, see Attach the target project in the DataWorks console. | Yes

    Create Data Lake Connection

    Parameter | Description | Required
    Heterogeneous Data Platform Type | Select Alibaba Cloud DLF + OSS to build a data lakehouse using MaxCompute, DLF, and OSS. Select Alibaba Cloud E-MapReduce/Hadoop Cluster to use MaxCompute and Hadoop instead. | Yes
    External Project Description | A description of the external project. | No
    Region where DLF is activated | The region where DLF is deployed. Valid values: cn-hangzhou, cn-shanghai, cn-beijing, cn-zhangjiakou, cn-shenzhen, cn-hongkong, ap-southeast-1, eu-central-1. | Yes
    DLF Endpoint | The internal endpoint of DLF. Select the endpoint that matches your region from the endpoint table below. | Yes
    DLF Database Name | The DLF database to connect to. Only databases in the default DLF Catalog are supported. To find the database name: log on to the DLF console, select your region, and go to Metadata > Metadata > Table tab. | Yes
    DLF RoleARN | The Alibaba Cloud Resource Name (ARN) of the RAM role. To find the ARN: log on to the RAM console, go to Identities > Roles, click the target role name, and find the ARN in the Basic Information section. | Only for custom authorization
    Region | DLF endpoint
    China (Hangzhou) | dlf-share.cn-hangzhou.aliyuncs.com
    China (Shanghai) | dlf-share.cn-shanghai.aliyuncs.com
    China (Beijing) | dlf-share.cn-beijing.aliyuncs.com
    China (Zhangjiakou) | dlf-share.cn-zhangjiakou.aliyuncs.com
    China (Shenzhen) | dlf-share.cn-shenzhen.aliyuncs.com
    China (Hong Kong) | dlf-share.cn-hongkong.aliyuncs.com
    Singapore | dlf-share.ap-southeast-1.aliyuncs.com
    Germany (Frankfurt) | dlf-share.eu-central-1.aliyuncs.com
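The region-to-endpoint mapping above is mechanical, so scripts that provision lakehouse connections can encode it as a lookup. A small sketch (the region IDs match the Region where DLF is activated parameter; the helper function name is illustrative):

```python
# Region ID -> internal DLF endpoint, taken from the table above.
DLF_ENDPOINTS = {
    "cn-hangzhou": "dlf-share.cn-hangzhou.aliyuncs.com",
    "cn-shanghai": "dlf-share.cn-shanghai.aliyuncs.com",
    "cn-beijing": "dlf-share.cn-beijing.aliyuncs.com",
    "cn-zhangjiakou": "dlf-share.cn-zhangjiakou.aliyuncs.com",
    "cn-shenzhen": "dlf-share.cn-shenzhen.aliyuncs.com",
    "cn-hongkong": "dlf-share.cn-hongkong.aliyuncs.com",
    "ap-southeast-1": "dlf-share.ap-southeast-1.aliyuncs.com",
    "eu-central-1": "dlf-share.eu-central-1.aliyuncs.com",
}

def dlf_endpoint(region_id: str) -> str:
    """Return the internal DLF endpoint for a supported region.

    Raises ValueError for regions where the data lakehouse
    feature is not available (see Limitations).
    """
    try:
        return DLF_ENDPOINTS[region_id]
    except KeyError:
        raise ValueError(
            f"The data lakehouse feature is not available in {region_id!r}"
        ) from None

print(dlf_endpoint("cn-hangzhou"))  # dlf-share.cn-hangzhou.aliyuncs.com
```

Raising on unknown regions keeps misconfigured region IDs from silently producing a connection attempt against a non-existent endpoint.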

Step 4: Manage the data lakehouse in DataWorks

  1. Log on to the DataWorks console and select a region in the upper-left corner.

  2. In the left navigation pane, choose Other Items > Lake and Warehouse Integration (Data Lakehouse).

  3. On the Lake and Warehouse Integration (Data Lakehouse) page, use the Actions column to manage your external projects:

    • Use the lakehouse: Click Use Data Lakehouse to get started with the external project.

    • Update project configuration: Click Project configuration to update the external project information. You can change the database name of the external data source that is mapped to the MaxCompute external project, and you can reselect the external data source. Existing external data sources cannot be updated in place; to remove one, go to its own page.

    • Delete the project: Click Delete to delete the external project.

    Warning

    Deleting an external project places it in a silent state rather than removing it immediately. The project is permanently deleted after 15 days. During this period, you cannot create another external project with the same name.
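Once an external project is in use, its lake tables are queried from MaxCompute by prefixing the table name with the external project name. A minimal sketch: the project and table names below are placeholders, and the PyODPS usage in the comment assumes you have a MaxCompute endpoint and credentials configured.

```python
# Build a MaxCompute SQL statement against a table in the external
# project. Tables in the DLF/OSS data lake are addressed as
# <external_project>.<table>. All names here are placeholders.
def external_table_query(ext_project: str, table: str, limit: int = 10) -> str:
    return f"SELECT * FROM {ext_project}.{table} LIMIT {limit};"

sql = external_table_query("ext_dlf_lakehouse", "sales_records")
print(sql)

# With PyODPS (pip install pyodps), the statement could be submitted
# like this; credentials and endpoint are placeholders:
#
# from odps import ODPS
# o = ODPS("<access_id>", "<access_key>", "<maxcompute_project>",
#          endpoint="<maxcompute_endpoint>")
# with o.execute_sql(sql).open_reader() as reader:
#     for record in reader:
#         print(record)
```

Queries run in the associated MaxCompute project; the external project only provides the mapping to the DLF database, so it is the MaxCompute project that is billed for the query.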

View the metadata of a data lakehouse external project

  1. Log on to the DataWorks console and select a region in the upper-left corner.

  2. In the left navigation pane, click Workspace.

  3. On the Workspaces page, find the workspace that is attached to the external project. In the Actions column, choose Shortcuts > Data Map.

  4. On the Data Map page, search for a table name using the search box, or click the directory icon in the left navigation pane to browse the Directory List tab on the right.

What's next

To build a data lakehouse that uses DLF with RDS, or Flink with OSS, to support Delta Lake or Hudi storage, see Using DLF and RDS or Flink and OSS to support Delta Lake or Hudi storage mechanisms.