MaxCompute provides an enterprise-level open lakehouse architecture. This architecture offers unified metadata management, open storage, diverse computing solutions, robust security, and cost-effectiveness.
Target customers
Customers whose data warehouses primarily contain structured data but who also require the openness of a data lake to support cross-team and multi-engine access.
Customers with high security and compliance requirements who need enterprise-level features such as row-level and column-level permissions, data masking, disaster recovery, and backup.
Features
Unified metadata
MaxCompute provides a unified data catalog and data governance solution for data objects, including tables, views, snapshots, and models. This enables fine-grained access control and auditing. Security policies are defined once and applied globally.
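The "define once, apply globally" idea can be illustrated with a toy sketch in Python: a column-level policy recorded once in a catalog is enforced for every reader that goes through it. All names here (`policies`, `read_column`) are illustrative, not MaxCompute APIs.

```python
# Toy sketch: a security policy defined once in a unified catalog is
# enforced for every engine that reads through the catalog.
# These names are illustrative, not MaxCompute APIs.
policies = {"users.email": "mask"}  # column-level masking policy, defined once

def read_column(table, column, values):
    """Any engine reading through the catalog gets the same enforcement."""
    if policies.get(f"{table}.{column}") == "mask":
        return ["***" for _ in values]
    return list(values)

print(read_column("users", "email", ["a@example.com"]))  # ['***']
print(read_column("users", "name", ["Alice"]))           # ['Alice']
```

Because enforcement lives in the catalog rather than in each engine, adding a new compute engine does not require re-declaring the policy.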
Connections let you manage access credentials for Alibaba Cloud services. MaxCompute automatically discovers structured and unstructured data files in external data lakes, such as Object Storage Service (OSS), and registers them as foreign tables. Accessing data through these tables simplifies the data analytics workflow, and MaxCompute caches statistics for lake tables to improve query performance.
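The discovery-and-registration flow can be sketched as a toy catalog: files found under a lake path are grouped and registered as foreign-table entries that queries can then reference by name. `LakeCatalog` and its methods are hypothetical names for illustration, not MaxCompute APIs.

```python
# Toy sketch of automatic lake-file discovery and foreign-table registration.
# LakeCatalog and register_foreign_tables are illustrative, not MaxCompute APIs.

class LakeCatalog:
    def __init__(self):
        self.tables = {}  # table name -> metadata entry

    def register_foreign_tables(self, files):
        """Group discovered lake files by their top-level directory and
        register each directory as one foreign table."""
        for path in files:
            # e.g. "oss://bucket/sales/part-0.parquet" -> table "sales"
            name = path.split("://", 1)[1].split("/", 2)[1]
            entry = self.tables.setdefault(name, {"files": [], "format": "parquet"})
            entry["files"].append(path)

    def describe(self, name):
        return self.tables[name]

catalog = LakeCatalog()
catalog.register_foreign_tables([
    "oss://demo-bucket/sales/part-0.parquet",
    "oss://demo-bucket/sales/part-1.parquet",
    "oss://demo-bucket/users/part-0.parquet",
])
print(sorted(catalog.tables))                    # ['sales', 'users']
print(len(catalog.describe("sales")["files"]))   # 2
```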
Open storage
Open storage allows a single copy of data to be used by multiple compute engines. The Storage API makes table data managed by MaxCompute available to third-party compute engines.
You can integrate MaxCompute with computing ecosystems such as Spark, Flink, Flink CDC, StarRocks, DBT, Presto, Trino, PAI, and PyTorch.
The Storage API provides an efficient, low-latency, and secure method for reading data. Data is transferred in Arrow format. The API supports performance optimizations such as predicate pushdown, partition pruning, and column pruning. It also supports security features such as row-level and column-level permissions and data masking. This approach balances cross-team collaboration efficiency with compliance.
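The effect of predicate pushdown and column pruning can be illustrated with a small columnar sketch in plain Python. The Storage API itself transfers Arrow record batches; the `scan` function below is an illustrative stand-in, not the real API.

```python
# Columnar table as a dict of column name -> list of values,
# mimicking the shape of an Arrow record batch.
table = {
    "id":     [1, 2, 3, 4],
    "region": ["eu", "us", "eu", "apac"],
    "amount": [10.0, 20.0, 30.0, 40.0],
}

def scan(table, columns, predicate):
    """Illustrative reader: apply the predicate at the storage layer and
    materialize only the requested columns, so untouched data is never sent."""
    n = len(next(iter(table.values())))
    keep = [i for i in range(n)
            if predicate({c: table[c][i] for c in table})]
    return {c: [table[c][i] for i in keep] for c in columns}

# Only "amount" is transferred, and only rows matching the predicate.
result = scan(table, columns=["amount"],
              predicate=lambda row: row["region"] == "eu")
print(result)  # {'amount': [10.0, 30.0]}
```

Pushing the filter and projection down to the reader is what lets a third-party engine pull far less data than a full table copy would require.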
Open computing
MaxCompute's in-house SQL and MaxFrame engines provide a unified computing experience across diverse data sources in the lakehouse ecosystem. They enable transparent access to various external storage systems through foreign tables and external projects. You can flexibly combine data from internal and external tables in extract, transform, and load (ETL), data analytics, and machine learning tasks.
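As a minimal illustration of combining internal and external tables in one ETL step, the sketch below joins rows from an "internal" table with rows from a "foreign" table registered from lake files. In MaxCompute this would typically be a single SQL query; every table and column name here is hypothetical.

```python
# Hypothetical data: an internal table and a foreign table backed by lake
# files. In practice both would be addressed together in one SQL query.
internal_orders = [
    {"order_id": 1, "user_id": 10, "amount": 25.0},
    {"order_id": 2, "user_id": 11, "amount": 40.0},
]
foreign_users = [  # rows exposed through a registered foreign table
    {"user_id": 10, "segment": "retail"},
    {"user_id": 11, "segment": "enterprise"},
]

def enrich(orders, users):
    """ETL step: hash-join orders with user segments from the lake."""
    segment_by_id = {u["user_id"]: u["segment"] for u in users}
    return [dict(o, segment=segment_by_id[o["user_id"]]) for o in orders]

rows = enrich(internal_orders, foreign_users)
print(rows[0]["segment"])  # retail
```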
