MaxCompute: Lakehouse of MaxCompute

Last Updated: Feb 02, 2024

MaxCompute provides a data lakehouse solution that enables you to build a data management platform that combines data lakes and data warehouses. The solution integrates the flexibility and broad ecosystem compatibility of data lakes with the enterprise-grade capabilities of data warehouses. This topic describes how to use MaxCompute together with a heterogeneous data platform to build a data lakehouse. The data lakehouse solution is in public preview.

Build a data lakehouse solution

You can build a data lakehouse solution by connecting MaxCompute to a data lake. In this solution, MaxCompute serves as the data warehouse. You can build the solution by using one of the following methods:

  • Build a data lakehouse by using MaxCompute, DLF, and OSS: If you use this method, the metadata (schemas) of the data lake is stored in Data Lake Formation (DLF). MaxCompute uses the metadata management capability of DLF to efficiently process semi-structured data in Object Storage Service (OSS), including data in the Delta Lake, Apache Hudi, Avro, CSV, JSON, Parquet, and ORC formats. For a sample query against such a setup, see the sketch after this list.

  • Build a data lakehouse by using MaxCompute and Hadoop: You can use a Hadoop cluster that is deployed in a data center, on virtual machines (VMs) in the cloud, or in Alibaba Cloud E-MapReduce (EMR). If MaxCompute is connected to the virtual private cloud (VPC) in which the Hadoop cluster is deployed, MaxCompute can directly access the Hive metastore and map its metadata to an external project in MaxCompute.
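With either method, the mapped data lake is exposed in MaxCompute as an external project, and you query its tables by qualifying the table name with the external project name. The following is a minimal sketch, not output of the solution itself: the project name my_mc_project, the external project name ext_dlf_demo, and the table sale_detail are all hypothetical names used for illustration.

    -- Queries are run from a regular MaxCompute project, not from the
    -- external project itself. my_mc_project is a placeholder name.
    use my_mc_project;

    -- Read an OSS-backed (or HDFS-backed) table in place through the
    -- external project. ext_dlf_demo and sale_detail are hypothetical.
    SELECT shop_name, SUM(total_price) AS sales
    FROM ext_dlf_demo.sale_detail
    WHERE sale_date = '2024-02-01'
    GROUP BY shop_name;

Because the table data stays in OSS or HDFS, nothing is copied into MaxCompute unless you explicitly materialize it, for example with an INSERT INTO ... SELECT statement that writes the results into an internal MaxCompute table.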

Limits

  • The lakehouse solution is supported in the following regions: China (Hangzhou), China (Shanghai), China (Beijing), China (Zhangjiakou), China (Shenzhen), China (Hong Kong), Singapore, and Germany (Frankfurt).

  • If you use DLF and OSS, MaxCompute must be deployed in the same region as DLF and OSS.
