What is MaxCompute?

MaxCompute is an enterprise-grade Software as a Service (SaaS) cloud data warehouse built for data analytics. With its serverless architecture, it delivers a fast, fully managed online data warehouse service that removes the scalability and elasticity constraints of traditional data platforms. This minimizes your operational overhead and lets you analyze and process massive datasets economically and efficiently.

As data collection methods evolve and industry data accumulates, data volumes have grown to terabyte (TB), petabyte (PB), and even exabyte (EB) scales, reaching levels that traditional software cannot handle. MaxCompute provides both offline and real-time data ingestion, and supports large-scale data processing and query acceleration. It offers versatile data warehouse solutions and analytical modeling services for a wide range of computing scenarios. With comprehensive data import solutions and a variety of classic distributed computing models, you can easily analyze big data without the complexity of managing and maintaining distributed systems.

MaxCompute is designed for storage and compute needs ranging from 100 GB to the exabyte (EB) level and has been battle-tested at scale within Alibaba Group. It is ideal for use cases such as data warehousing and BI analytics for large internet companies, website log analysis, e-commerce transaction analysis, and analyzing user behavior and interests.

MaxCompute is deeply integrated with the following Alibaba Cloud products:

DataWorks
An end-to-end platform for data synchronization, workflow design, data development, management, and operations.
Platform for AI (PAI)
A machine learning platform with algorithm components for training models on MaxCompute data.
Hologres
A real-time data warehouse that accelerates queries on MaxCompute data through external tables, and also supports interactive analysis on data exported from MaxCompute.
Quick BI
A business intelligence tool for creating reports and visually analyzing MaxCompute data.

Core features

Feature	Description
Fully managed Serverless online service	An out-of-the-box online service accessed through APIs. Provides a large-scale pre-provisioned resource cluster that you can use on demand with a pay-as-you-go billing method. Requires no platform maintenance, minimizing your operational workload.
Elasticity and Scalability	Storage and compute scale independently, allowing enterprises to connect and analyze all their data assets on a single platform, eliminating data silos. Supports dynamic resource allocation based on business peaks and troughs.
Unified and rich computing and storage capabilities	Supports multiple computing models and a rich set of user-defined functions (UDFs). Uses columnar storage, which typically achieves a 5x compression ratio and significantly reduces storage costs.
Data modeling, development, and governance capabilities	You can centralize, integrate, process, and govern all your data with DataWorks, a one-stop Data Development and Data Governance platform. DataWorks supports MaxCompute project management and web-based query editing.
Integrated AI capabilities	Seamlessly integrates with the Platform for AI (PAI) to provide powerful machine learning processing capabilities. Lets you run intelligent analysis using the familiar Spark-ML. Supports third-party Python machine learning libraries.
Deep integration with the Spark engine	Provides a built-in Apache Spark engine with complete Spark functionality. Deeply integrates with MaxCompute's computing resources, data, and permission system.
Lakehouse	Integrates access and analysis for data in a data lake (such as OSS or Hadoop Distributed File System (HDFS)). You can analyze data in the lake by mapping it with an External Table or accessing it directly with Spark. Enables joint analysis of data across a data lake and a data warehouse within a unified data warehouse service and user interface. For more information, see Lakehouse of MaxCompute.
Unified offline and real-time processing	Deeply integrates with Hologres, a real-time data warehouse. It supports querying associated external tables and direct reads from the storage layer, achieving over 5 times higher query efficiency than other external table types. Hologres provides query acceleration for MaxCompute, delivering a 10x or greater performance boost without moving data. Hologres supports batch import of MaxCompute Metadata, eliminating the need to create external tables manually.
Support for stream writing and near real-time analytics	Supports real-time writing of streaming data for analysis within the data warehouse. Deeply integrates with major cloud streaming services, making it easy to ingest streaming data from various sources. Supports high-performance, second-level elastic concurrent queries for near real-time analytics scenarios.
Continuous SaaS-based data protection in the cloud	Provides over 20 security features that meet Level 3 standards for classified information security protection. These features cover infrastructure, data centers, networks, power supply, platform security, permission management, and privacy protection, combining the security capabilities of both open-source big data and managed databases.
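
As a rough illustration of the 5x columnar compression figure quoted in the table above, the following Python sketch estimates the stored size of a dataset. The function name and the 100 TB input are hypothetical, and the ratio is only the typical value stated in this document; actual compression depends on the data.

```python
TB = 1024 ** 4  # bytes in one tebibyte, used here as the "TB" unit


def estimate_stored_bytes(raw_bytes: float, compression_ratio: float = 5.0) -> float:
    """Estimate on-disk size after columnar compression.

    compression_ratio defaults to 5.0, the typical figure quoted in this
    document (an assumption, not a guarantee for any particular dataset).
    """
    if raw_bytes < 0:
        raise ValueError("raw_bytes must be non-negative")
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    return raw_bytes / compression_ratio


# Hypothetical example: 100 TB of raw data at the typical 5x ratio.
raw = 100 * TB
stored = estimate_stored_bytes(raw)
print(f"{stored / TB:.1f} TB stored")  # prints "20.0 TB stored"
```

Because storage and compute scale independently, an estimate like this applies only to the storage side of the bill.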

Product architecture

The following figure shows the MaxCompute architecture.

The core modules are described below.

Module	Description
Storage engine	MaxCompute provides the MaxCompute Storage Engine (internal storage) to store MaxCompute tables and resources. You can also directly read data stored in other products like OSS, Tablestore, and RDS using external tables. The MaxCompute Storage Engine primarily uses Columnar Storage, which typically achieves a 5x compression ratio.
Compute engine	MaxCompute provides the MaxCompute SQL engine and the CUPID computing platform. MaxCompute SQL engine: directly runs MaxCompute SQL tasks. For command syntax, function requirements, and development examples for MaxCompute SQL tasks, see Overview of MaxCompute SQL. CUPID computing platform: runs tasks from third-party engines such as Spark and Mars. For development requirements and examples for multiple engines, see PyODPS.
Cloud service layer	MaxCompute allows you to create different task queues and configure unique resources and priorities for each, enabling fine-grained control over task execution. To enhance overall system efficiency, its powerful scheduling system manages and optimizes the allocation and use of computing resources. To ensure data security and privacy, MaxCompute also provides multi-layered data protection, including project-level isolation, access control, and data encryption.
Unified metadata and security systems	MaxCompute's offline, tenant-level metadata is provided through Information Schema. You can also use Information Schema to query historical usage data logs, enabling you to analyze metrics like resource consumption, run duration, and data processed. This helps you optimize jobs or plan resource capacity. MaxCompute also offers a comprehensive security management system with features like access control, data encryption, and dynamic data masking to ensure data security. For more security-related information, see Security features.
User interfaces and openness	MaxCompute provides the following user interfaces: Tunnel: a data transmission service for MaxCompute, available on shared and dedicated clusters. API and SDK: RESTful API; SDK for Java and PyODPS; Java Database Connectivity (JDBC) driver. Connectors: connectors for third-party products, including Flink, Spark, and Kafka. For details, see Use Flink to write data to a Delta table and Import Kafka data to MaxCompute in offline or real-time mode.
Data ecosystem support	MaxCompute is deeply integrated with Alibaba Cloud DataWorks to provide one-stop data development, analytics, and governance. It also supports various other data development and analysis scenarios: data lake, data integration, data governance, data development by using a third-party engine, and visualized data analytics.
TopConsole (MaxCompute console)	Provides basic configuration and management capabilities, including MaxCompute project management, quota management, and tenant management. It also offers fundamental O&M features like job O&M and resource observation, as well as enhanced features like materialized views and cost analytics and optimization. For more information, see Resource management and use.
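
The SQL engine, Information Schema, and PyODPS rows in the table above can be combined into a small job-history query. The sketch below is a hedged example, not a reference implementation. The helper names are hypothetical, and the information_schema.tasks_history table and its columns (task_name, task_type, cost_cpu, input_bytes, ds) are assumptions to check against the Information Schema documentation for your project. The only PyODPS calls it relies on are execute_sql and open_reader.

```python
from typing import Any, List


def build_usage_query(ds: str, limit: int = 10) -> str:
    """Build a SQL statement listing the most CPU-expensive jobs for one day.

    ds is a yyyymmdd partition value. The table and column names are
    assumptions (see the lead-in), not confirmed by this document.
    """
    if not (len(ds) == 8 and ds.isdigit()):
        raise ValueError("ds must be an 8-digit yyyymmdd string")
    if limit <= 0:
        raise ValueError("limit must be positive")
    return (
        "SELECT task_name, task_type, cost_cpu, input_bytes "
        "FROM information_schema.tasks_history "
        f"WHERE ds = '{ds}' "
        "ORDER BY cost_cpu DESC "
        f"LIMIT {limit};"
    )


def run_query(client: Any, sql: str) -> List[Any]:
    """Run SQL through a PyODPS-style client and collect all records.

    client is expected to behave like odps.ODPS: execute_sql() blocks until
    the job finishes and returns an instance whose open_reader() yields rows.
    """
    instance = client.execute_sql(sql)
    with instance.open_reader() as reader:
        return list(reader)


# Build the statement only; running it needs a real client and credentials.
print(build_usage_query("20240101", limit=5))
```

With PyODPS installed, a client would typically be created with ODPS(access_id, secret_access_key, project, endpoint=endpoint) from the odps package and passed to run_query. The credentials, project, and endpoint are placeholders you supply.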

Product advantages

MaxCompute offers the following key advantages:

Easy to use
- High-performance storage and compute optimized for data warehousing.
- Comes pre-integrated with services and supports standard SQL, which keeps development simple.
- Built-in management and security capabilities.
- Fully managed with a pay-as-you-go model; you incur no compute costs when not in use.
Elasticity that matches business growth
With decoupled storage and compute, resources scale independently and dynamically. This on-demand elasticity meets sudden business growth without upfront capacity planning.
Support for various analytics scenarios
Supports an open data ecosystem, providing a unified platform for data warehousing, BI, near-real-time analytics, data lake analysis, and machine learning.
Open platform
- Provides open APIs and a rich ecosystem, offering flexibility for data and application migration and custom development.
- Flexibly combines with open-source and commercial products like Airflow and Tableau to build a wide range of data applications.

Contact us

If you have any questions or suggestions while using MaxCompute, please fill out the DingTalk group application form to join our DingTalk group for feedback.