What is MaxCompute

Last Updated: May 07, 2018

MaxCompute is a big data processing platform that stores and batch-processes massive volumes of structured data, providing effective data warehousing solutions and big data modeling. It supports a variety of classic distributed computing models that enable you to solve large-scale computing problems while reducing business costs and maintaining data security.

MaxCompute seamlessly integrates with DataWorks, which provides one-stop data synchronization, task development, data workflow development, data operation and maintenance, and data management for MaxCompute. For more information, see DataWorks.

Benefits of MaxCompute

Large-scale computing and storage

MaxCompute is suitable for storing and computing on large volumes of data (up to the PB level).

Multiple computation models

MaxCompute supports data processing based on SQL, MapReduce, Graph, MPI-based iterative algorithms, and other programming models.

Strong data security

MaxCompute supports all offline business analysis of Alibaba Group and provides robust multi-layer sandbox protection and monitoring.

Low-cost

MaxCompute can help reduce procurement costs by 20% to 30% compared with on-premises private cloud models.

Functions

MaxCompute Tunnel

  • Batch channels for large volumes of historical data

    Tunnel provides high-concurrency data upload and download services. You can use Tunnel to import TB-level or PB-level data from various heterogeneous data sources into MaxCompute, or to export data from MaxCompute. As the unified channel for MaxCompute data transmission, Tunnel provides stable, high-throughput services. It also offers RESTful APIs and a Java SDK to facilitate programming.

  • Real-time and incremental data channels

    For real-time data upload scenarios, MaxCompute provides the DataHub service, which offers low latency and is easy to use. It is especially suitable for incremental data import. DataHub also supports a variety of data transmission plug-ins, such as Logstash, Flume, Fluentd, and Sqoop.
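
The source does not describe how Tunnel achieves high concurrency. One common pattern for a channel of this kind is to split the data into blocks, let several workers write blocks in parallel, and make the data visible only after all blocks are committed. The sketch below illustrates that pattern in memory, under that assumption. UploadSession, write_block, commit, and upload are hypothetical names chosen for illustration; they are not the Tunnel RESTful API or the Java SDK.

```python
# Illustrative only: an in-memory stand-in for a block-based upload session.
# UploadSession and its methods are hypothetical and are NOT the Tunnel SDK.
import threading
from concurrent.futures import ThreadPoolExecutor


class UploadSession:
    """Collects blocks written by concurrent workers, then commits them."""

    def __init__(self):
        self._blocks = {}
        self._lock = threading.Lock()
        self.committed = None

    def write_block(self, block_id, records):
        # Each worker writes its own block; the lock only guards the shared dict.
        with self._lock:
            self._blocks[block_id] = list(records)
        return block_id

    def commit(self, block_ids):
        # Refuse to commit unless every expected block has arrived.
        missing = [b for b in block_ids if b not in self._blocks]
        if missing:
            raise ValueError(f"missing blocks: {missing}")
        # Reassemble the records in block order.
        self.committed = [r for b in sorted(block_ids) for r in self._blocks[b]]
        return len(self.committed)


def split_into_blocks(records, block_size):
    """Cut the record list into consecutive chunks of at most block_size."""
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]


def upload(records, block_size=3, workers=4):
    """Write all blocks in parallel, then commit them in a single step."""
    session = UploadSession()
    blocks = split_into_blocks(records, block_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(session.write_block, block_id, block)
            for block_id, block in enumerate(blocks)
        ]
        block_ids = [f.result() for f in futures]
    session.commit(block_ids)
    return session


if __name__ == "__main__":
    rows = [("user%d" % i, i) for i in range(10)]
    done = upload(rows)
    print(len(done.committed), "records committed")
```

Because blocks are identified by ID and reassembled at commit time, the order in which workers finish does not affect the final result, and a failed block can be rewritten without repeating the whole upload.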

Computing and analysis tasks

MaxCompute provides multiple computing models.

  • SQL: In MaxCompute, data is stored in tables, and MaxCompute exposes an SQL query interface to users. You can work with MaxCompute much as you would with traditional database software, but with the ability to process PB-level data.

    Notes:

    • MaxCompute SQL does not support transactions, indexes, or UPDATE/DELETE operations.

    • MaxCompute SQL syntax differs from that of Oracle and MySQL, so you cannot seamlessly migrate SQL statements from other databases to MaxCompute. For more information, see SQL syntax.

    • After you submit MaxCompute jobs, they may be queued and scheduled for execution. MaxCompute SQL can complete queries within seconds to milliseconds.

  • UDF: User-defined function. MaxCompute provides numerous built-in functions to meet your computing needs and also lets you create custom functions.

  • MapReduce: MaxCompute provides a MapReduce programming model with a Java programming interface. It simplifies the development process; however, you should have a basic understanding of distributed computing and relevant programming experience before using MapReduce.

  • Graph: Graph in MaxCompute is a processing framework designed for iterative graph computing. Graph jobs use graphs to build models. A graph is composed of vertices and edges, both of which contain values. After iteratively editing and evolving the graph, you obtain the final result. Typical applications include PageRank, the single-source shortest path (SSSP) algorithm, and the K-Means algorithm.
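
To make the idea of iterative graph computing concrete, the following is a minimal, library-free sketch of PageRank, one of the typical applications listed above. Each vertex holds a value, passes a share of it along its outgoing edges in every round, and is updated from what it receives. The function name pagerank, the damping factor of 0.85, and the fixed iteration count are assumptions made for illustration; this is not the MaxCompute Graph API.

```python
# Minimal sketch of iterative, vertex-centric graph computing (PageRank).
# This is NOT the MaxCompute Graph API; all names here are illustrative.

def pagerank(edges, damping=0.85, iterations=20):
    """Run a fixed number of PageRank iterations over a directed graph.

    edges maps each vertex to the list of vertices it links to.
    """
    # Collect every vertex, including ones that only appear as link targets.
    vertices = set(edges)
    for targets in edges.values():
        vertices.update(targets)
    n = len(vertices)

    # Every vertex holds a value; start from a uniform distribution.
    rank = {v: 1.0 / n for v in vertices}

    for _ in range(iterations):
        incoming = {v: 0.0 for v in vertices}
        dangling = 0.0
        for v in vertices:
            targets = edges.get(v, [])
            if targets:
                # Send an equal share of this vertex's value along each edge.
                share = rank[v] / len(targets)
                for t in targets:
                    incoming[t] += share
            else:
                # A vertex with no outgoing edges spreads its value evenly.
                dangling += rank[v]
        # Update every vertex value from the messages it received.
        rank = {
            v: (1 - damping) / n + damping * (incoming[v] + dangling / n)
            for v in vertices
        }
    return rank


if __name__ == "__main__":
    graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
    for vertex, score in sorted(pagerank(graph).items()):
        print(vertex, round(score, 4))
```

In this small example, vertex c receives links from both a and b, so it ends up with a higher score than b. The scores always sum to 1, which is a convenient sanity check when experimenting with other graphs.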

SDK

The MaxCompute SDK is a convenient toolkit provided for developers. For more information, see MaxCompute SDK.

Security

MaxCompute provides powerful security services that fully protect user data. For more information about each security function, see the MaxCompute Security Manual.
