Summary

Last Updated: Aug 07, 2017

MaxCompute is a big data processing platform developed independently by Alibaba. It is mainly used for batch storage and processing of structured data, and it provides massive-scale data warehouse solutions and big data modeling services.

As data collection methods diversify, industries accumulate more and more data. Data volumes have grown to a massive scale (terabytes, even petabytes) that traditional software can no longer handle. Because a single server has limited processing capacity, data analysts working with massive data usually adopt a distributed computing model. However, distributed computing places higher demands on data analysts and is difficult to maintain: analysts not only need to understand the business requirements, but also need to be familiar with the underlying computing model.

The purpose of MaxCompute is to give users a convenient way to analyze and process big data, without having to concern themselves with the details of distributed computing.

MaxCompute Ecosystem and Functional Components

MaxCompute provides Tunnel for data upload and download, and SQL and MapReduce for computation and analysis. It also provides a complete security solution.

MaxCompute Components

  • MaxCompute Tunnel: provides high-concurrency data upload and download services. Users can use the Tunnel service to upload data to MaxCompute or download data from it. MaxCompute Tunnel provides only a Java programming interface.

  • Computing and Analysis:

    • MaxCompute SQL: In MaxCompute, data is stored in tables, and MaxCompute exposes a SQL query interface to users. You can operate MaxCompute much like traditional database software while still processing data at the TB or PB level. Note that MaxCompute SQL does not support transactions, indexes, or Update/Delete operations. MaxCompute SQL syntax also differs from that of Oracle and MySQL, so SQL statements written for other databases cannot be migrated to MaxCompute seamlessly. In addition, MaxCompute SQL can complete a query in minutes or even seconds, but it cannot return results in milliseconds. The advantage of MaxCompute SQL is its low learning cost: users do not need to understand distributed computing concepts, and anyone familiar with database operations can pick it up easily.

    • MapReduce: MapReduce is a distributed data processing model first proposed by Google. It has drawn a lot of attention and has been applied to all kinds of business scenarios. This document gives a brief introduction to the MapReduce model to help users quickly become familiar with it. Users of MaxCompute MapReduce need a basic understanding of distributed computing concepts and corresponding programming experience. MaxCompute MapReduce provides a Java programming interface.

    • Graph: the Graph function provided by MaxCompute is an iterative graph computing and processing framework. A graph computing job is modeled as a graph composed of vertices (Vertex) and edges (Edge); both vertices and edges carry weights (Value). The graph is edited and evolved through iteration until the final result is obtained. Typical applications include PageRank, SSSP, and K-Means.

  • SDK: a toolkit provided for developers. For details, refer to MaxCompute SDK.

  • Security: MaxCompute provides powerful security services to protect users’ data. For a full description of each function module, refer to the MaxCompute Security Manual.
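
The MapReduce model mentioned in the component list above can be illustrated with a small word count. This is a minimal single-process sketch in plain Python, not the MaxCompute MapReduce API (which, per this document, is a Java interface). The function names map_phase, shuffle, reduce_phase, and run_job are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def map_phase(record):
    """Map step: emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, as a framework would do between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values collected for one key."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce in sequence (hypothetical local driver)."""
    pairs = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

Calling run_job(["big data", "big compute"]) counts "big" twice and the other words once. In a real distributed job the map and reduce steps would run on many machines in parallel; here they run in one process only to show the data flow.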

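
The iterative style of graph computing described for the Graph component can be sketched with PageRank, one of the typical applications listed above. The following is a simplified, assumption-laden illustration in plain Python, not the MaxCompute Graph API: edges are unweighted, the damping factor of 0.85 and the iteration count are conventional defaults chosen here, and the function name pagerank is hypothetical.

```python
def pagerank(edges, damping=0.85, iterations=20):
    """Iteratively update vertex values; edges maps a vertex to its out-links."""
    vertices = set(edges) | {t for targets in edges.values() for t in targets}
    n = len(vertices)
    # Every vertex starts with the same value.
    rank = {v: 1.0 / n for v in vertices}
    for _ in range(iterations):
        incoming = {v: 0.0 for v in vertices}
        dangling = 0.0
        for v in vertices:
            targets = edges.get(v, [])
            if targets:
                # Each vertex sends an equal share of its value along its edges.
                share = rank[v] / len(targets)
                for t in targets:
                    incoming[t] += share
            else:
                # Vertices without out-links spread their value over all vertices.
                dangling += rank[v]
        # Each vertex recomputes its value from what it received this round.
        rank = {
            v: (1 - damping) / n + damping * (incoming[v] + dangling / n)
            for v in vertices
        }
    return rank
```

For example, pagerank({"a": ["c"], "b": ["c"], "c": ["a"]}) gives vertex "c" a higher value than vertex "b", because "c" receives links from two vertices and "b" from none. Each pass of the loop corresponds to one iteration in which every vertex updates its value from its neighbors.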