Build a data lakehouse solution

MaxCompute provides a data lakehouse solution that enables you to build a data management platform combining the flexibility of data lakes with the enterprise-class deployment of data warehouses. Use it to run SQL analytics on open-format data stored in Object Storage Service (OSS) or Hadoop clusters.

The data lakehouse solution is in public preview.

Capabilities

The MaxCompute data lakehouse solution lets you:

Query data where it lives: run SQL directly on data in OSS without moving it into MaxCompute
Use open formats: natively process Delta Lake and Apache Hudi tables and Avro, CSV, JSON, Parquet, and ORC files
Unify metadata management: map external data lake schemas into MaxCompute external projects for consistent governance
Leverage existing infrastructure: connect to Hadoop clusters in data centers, on cloud virtual machines (VMs), or in E-MapReduce (EMR) without migrating data

Build methods

MaxCompute acts as the data warehouse in the lakehouse solution. Choose a build method based on where your data and metadata are stored.

Build a lakehouse with MaxCompute, DLF, and OSS

When to use: Your data is in OSS and you want a fully managed, cloud-native metadata layer.

All data lake schemas are stored in Data Lake Formation (DLF). MaxCompute uses the metadata management capability of DLF to directly query semi-structured data in OSS in the Delta Lake, Apache Hudi, Avro, CSV, JSON, Parquet, and ORC formats.

For setup instructions, see Build a data lakehouse by using MaxCompute, DLF, and OSS.

Build a lakehouse with MaxCompute and Hadoop

When to use: You have an existing Hadoop cluster — on-premises, on cloud VMs, or in Alibaba Cloud E-MapReduce (EMR) — and want to query its data from MaxCompute without migration.

Connect MaxCompute to the virtual private cloud (VPC) where the Hadoop cluster runs. MaxCompute then accesses Hive metastores directly and maps the metadata to MaxCompute external projects.

For setup instructions, see Build a data lakehouse by using MaxCompute and Hadoop.
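The rule for choosing a build method can be summarized as a small decision function. This is an illustrative Python sketch only: `BuildMethod`, `choose_build_method`, and the string labels `"oss"`, `"hadoop"`, `"dlf"`, and `"hive_metastore"` are hypothetical names, not part of any MaxCompute API.

```python
from enum import Enum


class BuildMethod(Enum):
    """The two build methods described in this article (hypothetical enum)."""
    DLF_AND_OSS = "MaxCompute + DLF + OSS"
    HADOOP = "MaxCompute + Hadoop"


def choose_build_method(data_location: str, metadata_store: str) -> BuildMethod:
    """Pick a build method from where the data and its metadata are stored.

    Hypothetical labels: data_location is "oss" or "hadoop";
    metadata_store is "dlf" or "hive_metastore".
    """
    if data_location == "oss" and metadata_store == "dlf":
        # Schemas live in DLF; MaxCompute queries the data in OSS directly.
        return BuildMethod.DLF_AND_OSS
    if data_location == "hadoop" and metadata_store == "hive_metastore":
        # Existing cluster (on-premises, cloud VMs, or EMR): MaxCompute connects
        # to the cluster's VPC and maps Hive metastore metadata to an external project.
        return BuildMethod.HADOOP
    raise ValueError(
        f"no build method in this article covers data in {data_location!r} "
        f"with metadata in {metadata_store!r}"
    )


# Usage
print(choose_build_method("oss", "dlf").value)                # MaxCompute + DLF + OSS
print(choose_build_method("hadoop", "hive_metastore").value)  # MaxCompute + Hadoop
```

Combinations outside the two scenarios in this article raise an error instead of guessing, so that an unsupported setup is flagged rather than silently mapped to one of the methods.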

Limitations

Supported regions: China (Hangzhou), China (Shanghai), China (Beijing), China (Zhangjiakou), China (Shenzhen), China (Hong Kong), Singapore, and Germany (Frankfurt).
Co-location requirement: MaxCompute must be deployed in the same region as DLF and OSS.
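The two limitations can be checked up front with a small preflight function. This Python sketch is illustrative only: `check_lakehouse_regions` and `SUPPORTED_REGIONS` are hypothetical names, the region names are copied from the list above rather than read from any API, and that list may change while the solution is in public preview.

```python
# Region names as listed in this article (hypothetical constant, not an API value).
SUPPORTED_REGIONS = frozenset({
    "China (Hangzhou)", "China (Shanghai)", "China (Beijing)",
    "China (Zhangjiakou)", "China (Shenzhen)", "China (Hong Kong)",
    "Singapore", "Germany (Frankfurt)",
})


def check_lakehouse_regions(maxcompute_region: str, dlf_region: str,
                            oss_region: str) -> list[str]:
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    # Limitation 1: MaxCompute must be in a supported region.
    if maxcompute_region not in SUPPORTED_REGIONS:
        problems.append(f"{maxcompute_region} is not a supported region")
    # Limitation 2: DLF and OSS must be in the same region as MaxCompute.
    for service, region in (("DLF", dlf_region), ("OSS", oss_region)):
        if region != maxcompute_region:
            problems.append(
                f"{service} is in {region}, but MaxCompute is in {maxcompute_region}"
            )
    return problems


# Usage
print(check_lakehouse_regions("Singapore", "Singapore", "Singapore"))  # []
print(check_lakehouse_regions("China (Hangzhou)", "China (Shanghai)", "China (Hangzhou)"))
```

Returning all problems at once, rather than stopping at the first one, lets you fix every region mismatch in a single pass.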

Next steps

Grant other users the permissions on an external project: the owner of an external project is the account that created it, so you must grant access to other users separately.
Use SQL statements to manage an external project: after setup, manage external projects by running SQL statements.