
MaxCompute: What is MaxCompute?

Last Updated: Jun 28, 2023

MaxCompute is an enterprise-level cloud data warehouse that is delivered in the software as a service (SaaS) model and designed for data analysis scenarios. It provides a fast, fully managed online data warehousing service in a serverless architecture. MaxCompute removes the resource scalability and elasticity constraints of traditional data platforms, minimizes operations and maintenance (O&M) costs, and allows you to process and analyze large amounts of data efficiently and at low cost.

As data collection techniques continue to diversify, enterprises in various industries accumulate terabytes, petabytes, or even exabytes of data. This data is growing faster than traditional software can process it. MaxCompute provides offline and streaming data access, supports large-scale data computing and query acceleration, and provides data warehousing solutions and analysis and modeling services for a variety of computing scenarios. MaxCompute also provides comprehensive data import solutions and a range of typical distributed computing models, so you can perform big data analytics without needing to understand distributed computing or how to maintain it.

MaxCompute is suitable for scenarios in which more than 100 GB of data needs to be stored or computed. MaxCompute can process up to exabytes of data and is widely used in Alibaba Group. MaxCompute is suitable for various big data processing scenarios, such as data warehousing and business intelligence (BI) analysis for large Internet enterprises, website log analysis, e-commerce transaction analysis, and exploration of user characteristics and interests.

MaxCompute is deeply integrated with the following Alibaba Cloud services:

  • DataWorks

    DataWorks provides various features, such as end-to-end data synchronization, workflow design, data development, data management, and O&M for MaxCompute.

  • Machine Learning Platform for AI (PAI)

    The algorithm components of PAI can be used to train models based on data in MaxCompute.

  • Hologres

    You can use external tables in Hologres to accelerate queries on MaxCompute data. You can also export data to Hologres for interactive analytics.

  • Quick BI

    Quick BI allows you to create reports for data in MaxCompute and analyze the data in a visualized manner.

Core features

The following list gives each feature, followed by its description.

Fully managed online data warehousing service in a serverless architecture

  • Supports access over APIs. The online service works out of the box.

  • Provides a large number of cluster resources. You can purchase resources on demand by using the pay-as-you-go billing method.

  • Is O&M-free. The O&M cost is minimized.

High elasticity and extensibility

  • Scales storage and computing capabilities independently of each other. MaxCompute allows enterprises to analyze all data assets on the same platform, which eliminates data silos.

  • Allocates resources based on the peaks and valleys of your business in real time.

Centralized, rich computing and storage capabilities

  • Supports multiple computing models and various user-defined functions (UDFs).

  • Supports column compression, which reduces the data size to 20% of the original size in most cases. This way, storage costs are significantly reduced.

Deep integration with DataWorks

Integrates with DataWorks, an end-to-end data development and data governance platform. DataWorks enables global data aggregation, fusion processing, and data governance. You can use DataWorks to manage MaxCompute projects and to edit query code in a web-based interface.

Integrated AI capabilities

  • Seamlessly integrates with PAI, which provides powerful machine learning capabilities.

  • Allows you to use Spark ML for BI analysis.

  • Allows you to use third-party Python libraries for machine learning.

Deep integration with a Spark engine

  • Provides a built-in Apache Spark engine, which supports all Spark features.

  • Deeply integrates the computing resources, data, and permission systems of MaxCompute into the Spark engine.

Lakehouse

  • Integrates with data lakes such as Object Storage Service (OSS) and Hadoop Distributed File System (HDFS). MaxCompute allows you to analyze data in data lakes by using external tables. You can also use Spark to directly access data lakes and analyze data in the data lakes.

  • Supports association analysis between a data lake and a data warehouse based on a set of data warehousing services and user interfaces.

For more information, see Lakehouse of MaxCompute.

Integration of offline and real-time data processing

  • MaxCompute is deeply integrated with Hologres. You can query MaxCompute data by using external tables in Hologres, and Hologres can read the data directly at the storage layer. Queries that use external tables in Hologres are more than five times as efficient as queries that use other types of external tables.

  • Hologres supports query acceleration for MaxCompute without the need for data migration. Query efficiency improves by more than ten times.

  • Hologres allows you to batch import MaxCompute metadata. You do not need to manually create external tables.

Streaming data collection and near real-time analysis

  • Allows you to write streaming data in real time and analyze the data in a data warehouse.

  • Deeply integrates with major streaming services in the cloud to read streaming data from various sources.

  • Supports elastic, parallel queries that return results within seconds to meet the requirements for near real-time analysis.

Continuous SaaS-based data protection in the cloud

Provides enterprises with more than 20 security features at three levels. These features cover infrastructure, data center, network, power supply, and platform security, as well as user permission management and privacy protection. MaxCompute also provides the same security capabilities as open source big data services and managed databases.

Service architecture

The following figure shows the architecture of MaxCompute.

[Figure: MaxCompute product architecture]

The following list gives each module of the architecture, followed by its description.

Storage

  • MaxCompute tables: A table is the data storage unit of MaxCompute. Tables are the input and output objects of all types of jobs in MaxCompute.

  • Compression strategy: MaxCompute supports column compression, which reduces the data size to 20% of the original size in most cases.

  • AliORC: The data storage format of MaxCompute is upgraded to AliORC for higher storage performance.
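The compression figure above lends itself to a quick capacity estimate. The following is a minimal sketch, assuming the typical ratio of 20% quoted in this topic. The function name and the sample sizes are hypothetical, and actual compression depends on your data.

```python
# Rough storage estimate based on the "about 20% of the original size" figure
# quoted in this topic. The ratio is a typical value, not a guarantee.

TYPICAL_COMPRESSION_RATIO = 0.20  # compressed size / original size (assumed typical case)


def estimate_stored_gb(raw_gb, ratio=TYPICAL_COMPRESSION_RATIO):
    """Return the estimated stored size in GB for raw_gb of uncompressed data."""
    if raw_gb < 0:
        raise ValueError("raw_gb must be non-negative")
    if not 0 < ratio <= 1:
        raise ValueError("ratio must be in the range (0, 1]")
    return raw_gb * ratio


if __name__ == "__main__":
    # Sample raw data sizes in GB (illustrative values only).
    for raw in (100, 1_000, 50_000):
        print(f"{raw:>6} GB raw -> about {estimate_stored_gb(raw):,.0f} GB stored")
```

Passing a different `ratio` lets you model data that compresses better or worse than the typical case.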

Compute engine

MaxCompute supports various compute engines. MaxCompute runs Spark jobs on the Cupid platform developed by Alibaba Cloud. The Cupid platform is fully compatible with the computing framework that is supported by open source YARN.

Data tunnels for computing models

MaxCompute supports various data tunnels, which can meet your requirements in different scenarios:

  • SQL: MaxCompute supports SQL queries. You can use MaxCompute in much the same way as you use traditional database software. However, MaxCompute is far more powerful than traditional database software and is capable of processing up to exabytes of data.

    Note
    • MaxCompute SQL does not support transactions or indexes.

    • The SQL syntax of MaxCompute is different from the SQL syntax of Oracle or MySQL. You cannot seamlessly migrate SQL statements from other databases to MaxCompute.

    • You can use MaxCompute to compute more than 100 GB of data. MaxCompute SQL can return query results in minutes or seconds, but not in milliseconds.

    • MaxCompute SQL is easy to use. To use MaxCompute SQL, you do not need to understand complex distributed computing concepts. If you have experience in database operations, you can familiarize yourself with MaxCompute SQL within a short period of time.

  • External Table: You can use external tables to process data that is stored outside MaxCompute. You can execute a simple DDL statement to create an external table in MaxCompute. This external table is associated with an external data source.

  • Java UDFs: If the built-in functions of MaxCompute cannot meet your computing requirements, you can use Java to build UDFs.

  • Python UDFs: If the built-in functions of MaxCompute cannot meet your computing requirements, you can use Python to build UDFs.

  • MapReduce: MaxCompute provides a Java MapReduce programming model, which can simplify the development process and improve development efficiency.

  • Hologres: Hologres seamlessly integrates with MaxCompute at the underlying layer. This allows you to use standard PostgreSQL statements to query and analyze large amounts of data in MaxCompute without the need to migrate data. This way, the amount of time that is required to obtain query results is reduced.

  • PAI: a machine learning algorithm platform based on MaxCompute. PAI provides an end-to-end machine learning platform for data processing, model training, service deployment, and prediction without the need for data migration.

  • PyODPS: MaxCompute SDK for Python. It provides easy-to-use Python programming interfaces.

  • Graph: an iterative graph computing and processing framework.

  • Tunnel: a service that supports highly concurrent data uploads and downloads.

  • Mars: a tensor-based unified distributed computing framework. Mars allows you to use parallel and distributed computing technologies to accelerate data processing for Python data science stacks.

  • SQLML: SQLML depends on MaxCompute and PAI. You can develop MaxCompute SQLML jobs on a client, train machine learning models on MaxCompute data by using PAI, and then use the models to make predictions. You can use the prediction results to guide your business planning.

  • Flink: Flink provides real-time data processing capabilities for MaxCompute.

  • Spark on MaxCompute: a computing service that is provided by MaxCompute and compatible with open source Spark. This service provides a Spark computing framework based on unified computing resource and dataset permission systems. The service allows you to use your preferred development method to submit and run Spark jobs. Spark on MaxCompute can fulfill diverse data processing and analysis requirements.
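To make the Python UDF item in the preceding list more concrete, the following is a minimal sketch of what such a UDF can look like. It assumes the `annotate` decorator from `odps.udf` and the `evaluate` method convention that MaxCompute Python UDFs use. The class name `SafeAdd` is hypothetical, and the fallback `annotate` is only a local stand-in so that the class can be exercised outside MaxCompute. Uploading the code as a resource and registering the function are not shown.

```python
# Sketch of a MaxCompute Python UDF. On MaxCompute, `annotate` comes from
# odps.udf. The fallback below is a local stand-in (an assumption for testing).
try:
    from odps.udf import annotate
except ImportError:
    def annotate(signature):
        """Local stand-in: ignores the signature and returns the class unchanged."""
        def decorator(cls):
            return cls
        return decorator


@annotate("bigint,bigint->bigint")
class SafeAdd(object):
    """Hypothetical UDF: adds two BIGINT values and returns NULL if either is NULL."""

    def evaluate(self, a, b):
        # SQL NULL values arrive as None in Python.
        if a is None or b is None:
            return None
        return a + b


if __name__ == "__main__":
    udf = SafeAdd()
    print(udf.evaluate(2, 3))
    print(udf.evaluate(None, 3))
```

The signature string declares two BIGINT inputs and one BIGINT output. Keeping the logic in a plain `evaluate` method makes it easy to test locally before you register the function.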

User interfaces

MaxCompute provides the following user interfaces:

Unified metadata and security systems

The Information Schema service of MaxCompute provides information such as project metadata and historical data. You can analyze job metrics such as the resource usage, job execution duration, and size of processed data to optimize jobs or plan resource capacity.
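As an illustration of this kind of job analysis, the following sketch aggregates a few job records by owner. The records, field names, and values are hypothetical and do not reflect the actual Information Schema layout. In practice, you would export the rows from your own project.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical job records. Field names and values are illustrative only
# and are not the actual Information Schema columns.
JOBS = [
    {"owner": "etl", "duration_s": 120, "input_gb": 40.0, "cpu_core_s": 900},
    {"owner": "etl", "duration_s": 300, "input_gb": 95.0, "cpu_core_s": 2400},
    {"owner": "bi", "duration_s": 15, "input_gb": 2.5, "cpu_core_s": 60},
    {"owner": "bi", "duration_s": 45, "input_gb": 8.0, "cpu_core_s": 150},
]


def summarize_by_owner(jobs):
    """Group jobs by owner and compute simple capacity-planning metrics."""
    grouped = defaultdict(list)
    for job in jobs:
        grouped[job["owner"]].append(job)

    summary = {}
    for owner, rows in grouped.items():
        summary[owner] = {
            "jobs": len(rows),
            "avg_duration_s": mean(r["duration_s"] for r in rows),
            "total_input_gb": sum(r["input_gb"] for r in rows),
            "total_cpu_core_s": sum(r["cpu_core_s"] for r in rows),
        }
    return summary


def slowest_jobs(jobs, top_n=1):
    """Return the top_n jobs with the longest execution duration."""
    return sorted(jobs, key=lambda j: j["duration_s"], reverse=True)[:top_n]


if __name__ == "__main__":
    for owner, stats in summarize_by_owner(JOBS).items():
        print(owner, stats)
    print("Slowest:", slowest_jobs(JOBS))
```

A per-owner summary like this can show which workloads dominate resource usage, and the slowest jobs are natural first candidates for optimization.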

MaxCompute also provides comprehensive security management systems, such as access control, data encryption, and dynamic data masking systems, to ensure data security. For more information about security, see Security features.

Benefits

MaxCompute has the following benefits:

  • Ease of use

    • Helps you build a data warehouse that delivers high-performance storage and computing.

    • Pre-integrates multiple services, which simplifies standard SQL development.

    • Provides comprehensive management and security capabilities.

    • Is O&M-free and supports the pay-as-you-go billing method. You are charged only for the resources that you use.

  • High scalability to meet business requirements

    Scales storage and computing capabilities independently of each other. Dynamic scaling frees you from planning capacity in advance and keeps pace with the storage and computing requirements of rapid business growth.

  • Various analysis scenarios

    Uses an open, unified platform to meet business requirements in various scenarios, such as data warehousing, BI, near real-time analysis, data lake analysis, and machine learning.

  • Open platform

    • Supports open interfaces and data ecosystems, which ensures flexible data migration, application migration, and custom software development.

    • Supports flexible combination with commercial or open source services, such as Airflow and Tableau, to build various data applications.

Contact us

If you have questions or suggestions about MaxCompute, you can fill in the DingTalk group application form to join the DingTalk group for feedback.