What is MaxCompute?

MaxCompute is an enterprise-level cloud data warehouse delivered in the software as a service (SaaS) model and designed for data analysis scenarios. It provides a fast, fully managed online data warehousing service on a serverless architecture. MaxCompute removes the resource extensibility and elasticity constraints of traditional data platforms, minimizes operations and maintenance (O&M) costs, and lets you efficiently process and analyze large amounts of data at low cost.

As data collection techniques continue to diversify, enterprises in various industries accumulate terabytes, petabytes, or even exabytes of data. This growth outpaces the processing capacity of traditional software. MaxCompute provides offline and real-time data access, supports large-scale data computing and query acceleration, and provides data warehousing solutions and analysis and modeling services for various computing scenarios. MaxCompute also provides comprehensive data import solutions and several typical distributed computing models, so you can complete big data analytics without expertise in distributed computing or system maintenance.

MaxCompute is suitable for scenarios in which more than 100 GB of data needs to be stored or computed. MaxCompute can process up to exabytes of data and is widely used in Alibaba Group. MaxCompute is suitable for various big data processing scenarios, such as data warehousing and business intelligence (BI) analysis for large Internet enterprises, website log analysis, e-commerce transaction analysis, and exploration of user characteristics and interests.

MaxCompute is deeply integrated with the following Alibaba Cloud services:

DataWorks
DataWorks provides various features, such as end-to-end data synchronization, workflow design, data development, data management, and O&M for MaxCompute.
Platform for AI (PAI)
The algorithm components of PAI can be used to train models based on data in MaxCompute.
Hologres
You can use external tables in Hologres to accelerate queries on MaxCompute data. You can also export data to Hologres for interactive analytics.
Quick BI
Quick BI allows you to create reports for data in MaxCompute and analyze the data in a visualized manner.

Core features

Feature	Description
Fully managed online data warehousing service in a serverless architecture	Supports access over an API. Works out of the box as an online service. Provides a large pool of cluster resources that you can purchase on demand by using the pay-as-you-go billing method. Requires no O&M on your side, which minimizes O&M costs.
High elasticity and extensibility	Scales storage and computing capabilities independently. MaxCompute allows enterprises to analyze all data assets on the same platform, which eliminates data silos. Allocates resources in real time based on the peaks and valleys of your business.
Centralized, rich computing and storage capabilities	Supports multiple computing models and various user-defined functions (UDFs). Supports column compression, which reduces the data size to 20% of the original size in most cases. This way, storage costs are significantly reduced.
Data modeling, development, and governance capabilities	Implements global data aggregation, integration, processing, and governance based on the end-to-end data development and governance platform DataWorks. DataWorks can be used to manage MaxCompute projects and edit web-side query code.
Integrated AI capabilities	Seamlessly integrates with Platform for AI to provide powerful machine learning capabilities. Allows you to use Spark ML for BI analysis. Allows you to use third-party Python libraries for machine learning.
Deep integration with a Spark engine	Provides a built-in Apache Spark engine, which supports all Spark features. Deeply integrates the computing resources, data, and permission systems of MaxCompute into the Spark engine.
Lakehouse	Integrates with data lakes such as Object Storage Service (OSS) and Hadoop Distributed File System (HDFS). MaxCompute allows you to analyze data in data lakes by using external tables. You can also use Spark to directly access data lakes and analyze data in the data lakes. Supports association analysis between a data lake and a data warehouse based on a set of data warehousing services and user interfaces. For more information, see Lakehouse of MaxCompute.
Integration of offline and real-time data processing	MaxCompute is deeply integrated with Hologres to support data queries by using external tables in Hologres and direct data reading at the storage layer. Queries that use external tables in Hologres are more than five times as efficient as queries that use external tables of other types. Hologres supports query acceleration for MaxCompute without the need for data migration, which increases query efficiency by more than ten times. Hologres allows you to batch import MaxCompute metadata, so you do not need to manually create external tables.
Streaming writing and near real-time analytics	Allows you to write streaming data in real time and analyze the data in a data warehouse. Deeply integrates with major streaming services in the cloud to read streaming data from various sources. Supports elastic, parallel queries that complete within seconds to meet the requirements for near real-time analysis.
Continuous SaaS-based data protection in the cloud	Provides enterprises with more than 20 security features at three levels, covering infrastructure, data center, network, power supply, and platform security, as well as user permission management and privacy protection. MaxCompute also provides the same security capabilities as open source big data services and managed databases.
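
The column compression figure in the table above (data reduced to about 20% of its original size in most cases) can be turned into a rough storage estimate. The following Python sketch is illustrative only: the function name and the default ratio of 0.2 are assumptions made for this example, and the actual ratio depends on your data.

```python
def estimate_stored_size_gb(raw_size_gb: float, compression_ratio: float = 0.2) -> float:
    """Estimate the stored size of a dataset after column compression.

    compression_ratio is stored size divided by raw size. The default of 0.2
    mirrors the "20% of the original size in most cases" figure and is an
    assumption, not a guarantee.
    """
    if raw_size_gb < 0:
        raise ValueError("raw_size_gb must be non-negative")
    if not 0 < compression_ratio <= 1:
        raise ValueError("compression_ratio must be in the range (0, 1]")
    return raw_size_gb * compression_ratio


# Hypothetical input: 500 GB of raw data at the typical ratio.
estimated_gb = estimate_stored_size_gb(500)
print(f"Estimated stored size: {estimated_gb:.1f} GB")
```

Passing a different compression_ratio lets you model data that compresses better or worse than the typical case.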

Service architecture

The following figure shows the architecture of MaxCompute.

The following table describes the core modules.

Module	Description
Storage engine	MaxCompute provides the MaxCompute storage engine (internal storage) to store MaxCompute tables and resources. You can also use external tables to read data stored in services such as OSS, Tablestore, and ApsaraDB RDS. The MaxCompute storage engine mainly uses column-oriented storage. In most cases, the size of compressed data is one fifth the size of the original data.
Compute engine	MaxCompute provides the MaxCompute SQL engine and the Cupid computing platform. MaxCompute SQL engine: allows you to directly run MaxCompute SQL tasks. For more information about the syntax requirements and development examples of MaxCompute SQL statements and functions, see Overview of MaxCompute SQL. Cupid computing platform: allows you to run the tasks of third-party engines, such as Spark tasks and Mars tasks. For more information about the development requirements and examples of multi-engine development, see Overview.
Cloud service layer	MaxCompute allows you to create different task queues and configure different resources and priorities for each queue. This helps you manage tasks in a fine-grained manner. MaxCompute also provides a powerful scheduling system to manage and optimize the allocation and use of computing resources. This helps improve the overall efficiency of the system. MaxCompute provides multi-layer protection for data security, including project isolation, permission management, and data encryption to ensure data security and privacy.
Unified metadata and security systems	MaxCompute provides the Information Schema service for you to use offline tenant-level metadata. Information Schema also allows you to query data based on the historical logs of MaxCompute. This way, you can analyze the running information about jobs, such as resource consumption, running duration, and amount of processed data, to optimize jobs or plan resource capacity. MaxCompute also provides comprehensive security management systems, such as access control, data encryption, and dynamic data masking systems, to ensure data security. For more information about security, see Security features.
User interfaces and openness	MaxCompute provides the following user interfaces: Tunnel, the data transmission service of MaxCompute, which provides shared clusters and exclusive clusters; APIs and SDKs, including the RESTful API, SDK for Java, and PyODPS; the Java Database Connectivity (JDBC) driver; and Connector, a connector that is encapsulated for a third-party service, such as Flink, Spark, or Kafka. For more information, see Use Flink (streaming data transfer in the new version) and Import Kafka data to MaxCompute in offline or real-time mode.
Data ecosystem	MaxCompute is deeply integrated with DataWorks. You can use DataWorks to implement end-to-end data development, analytics, and governance. DataWorks supports the following data development and analytics scenarios: data lake, data integration, data governance, data development by using a third-party engine, and visualized data analytics.
TopConsole (MaxCompute console)	The MaxCompute console provides basic configuration management capabilities, such as MaxCompute project management, quota management, and tenant management. The MaxCompute console also provides basic O&M capabilities, such as job O&M and resource observation, and enhanced O&M capabilities, such as materialized views and cost analytics and optimization. For more information, see Resource and job management.

Benefits

MaxCompute has the following benefits:

Ease of use
- Helps you build a data warehouse that delivers high-performance storage and computing.
- Pre-integrates multiple services, which simplifies standard SQL development.
- Provides comprehensive management and security capabilities.
- Is O&M-free and supports the pay-as-you-go billing method. Computing fees are generated only for the resources that you use.
High scalability to meet business requirements
Supports separate extension of storage and computing capabilities. The dynamic scaling feature frees you from planning capacity in advance and can meet the storage and computing requirements of rapid business growth.
Various analysis scenarios
Uses an open, unified platform to meet business requirements in various scenarios, such as data warehousing, BI, near real-time analysis, data lake analysis, and machine learning.
Open platform
- Supports open interfaces and data ecosystems, which ensures flexible data migration, application migration, and custom software development.
- Supports flexible combination with commercial or open source services, such as Airflow and Tableau, to build various data applications.

Contact us

If you have questions or suggestions about MaxCompute, you can fill in the DingTalk group application form to join the DingTalk group for feedback.