
MaxCompute: Overview of the MaxCompute data lakehouse

Last Updated: Mar 26, 2026

MaxCompute provides a data lakehouse solution that lets you build a data management platform combining the flexibility of a data lake with the enterprise-grade management of a data warehouse. Use it to run SQL analytics on open-format data stored in Object Storage Service (OSS) or in Hadoop clusters.

The data lakehouse solution is in public preview.

Capabilities

The MaxCompute data lakehouse solution lets you:

  • Query data where it lives — run SQL directly on data in OSS without moving it into MaxCompute

  • Use open formats — natively process Delta Lake, Apache Hudi, Avro, CSV, JSON, Parquet, and ORC files

  • Unify metadata management — map external data lake schemas into MaxCompute external projects for consistent governance

  • Leverage existing infrastructure — connect to Hadoop clusters in data centers, cloud virtual machines (VMs), or E-MapReduce (EMR) without migrating data

Build methods

MaxCompute acts as the data warehouse in the lakehouse solution. Choose a build method based on where your data and metadata are stored.

Build a lakehouse with MaxCompute, DLF, and OSS

When to use: Your data is in OSS and you want a fully managed, cloud-native metadata layer.

All data lake schemas are stored in Data Lake Formation (DLF). MaxCompute uses DLF's metadata management capability to query semi-structured data in OSS directly — supporting the Delta Lake, Apache Hudi, Avro, CSV, JSON, Parquet, and ORC formats.

For setup instructions, see Build a data lakehouse by using MaxCompute, DLF, and OSS.
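Once an external project is mapped to a DLF catalog, tables in OSS can be queried with standard MaxCompute SQL, without copying data into MaxCompute. A minimal sketch — the project name `lakehouse_ext`, table `orders`, and partition column `ds` are hypothetical placeholders, not names from this document:

```sql
-- Assumes a hypothetical external project `lakehouse_ext` mapped to a
-- DLF catalog, with a Parquet-backed table `orders` stored in OSS.
SELECT   order_id,
         SUM(amount) AS total_amount
FROM     lakehouse_ext.orders
WHERE    ds = '20260101'   -- partition filter prunes the OSS files scanned
GROUP BY order_id;
```

The query executes in MaxCompute while the data stays in OSS; DLF supplies the schema and partition metadata.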

Build a lakehouse with MaxCompute and Hadoop

When to use: You have an existing Hadoop cluster — on-premises, on cloud VMs, or in Alibaba Cloud E-MapReduce (EMR) — and want to query its data from MaxCompute without migration.

Connect MaxCompute to the virtual private cloud (VPC) where the Hadoop cluster runs. MaxCompute then accesses Hive metastores directly and maps the metadata to MaxCompute external projects.

For setup instructions, see Build a data lakehouse by using MaxCompute and Hadoop.
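After the external project is bound to the Hadoop cluster's Hive metastore, Hive tables can be addressed with the same `project.table` syntax as native MaxCompute tables. A hedged sketch — the project name `hive_ext`, table `web_logs`, and partition column `dt` are hypothetical:

```sql
-- Assumes a hypothetical external project `hive_ext` mapped to a Hive
-- metastore database; the table data stays in the Hadoop cluster and is
-- read over the VPC connection.
SELECT   status,
         COUNT(*) AS hits
FROM     hive_ext.web_logs
WHERE    dt = '2026-03-01'
GROUP BY status;
```

Because the metadata is mapped rather than migrated, schema changes made in the Hive metastore are picked up by the external project without reloading data.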

Limitations

  • Supported regions: China (Hangzhou), China (Shanghai), China (Beijing), China (Zhangjiakou), China (Shenzhen), China (Hong Kong), Singapore, and Germany (Frankfurt).

  • Co-location requirement: MaxCompute must be deployed in the same region as DLF and OSS.

Next steps