MaxCompute is a big data processing platform that stores and processes massive amounts of structured data in batches, providing data warehousing solutions and big data modeling services. MaxCompute supports a variety of classic distributed computing models, enabling you to solve massive data computation problems while reducing costs and maintaining data security.
MaxCompute seamlessly integrates with DataWorks, which provides one-stop data synchronization, task development, data workflow development, data operation and maintenance, and data management for MaxCompute. For more information, see DataWorks.
MaxCompute is mainly used to store and compute batches of structured data. It provides a wide range of data warehousing solutions as well as big data analysis and modeling services. As data collection techniques become increasingly diverse and comprehensive, industries are amassing ever larger volumes of data. The scale of data has grown to massive levels (hundreds of GB, TB, or even PB) that traditional software systems cannot handle.
MaxCompute is widely used within Alibaba Group for scenarios such as data warehousing and BI analysis for large Internet enterprises, website log analysis, transaction analysis for e-commerce sites, and user profiling and interest mining.
MaxCompute learning path
You can quickly learn about MaxCompute's concepts, basic operations, and advanced operations through the MaxCompute learning path.
- Large-scale computing and storage
MaxCompute is suitable for storage and computing requirements from 100 GB up to the EB level.
- Multiple computational models
MaxCompute supports data processing based on multiple programming models, including SQL, MapReduce, Graph, and MPI iterative algorithms.
- Strong data security
MaxCompute has reliably supported all of Alibaba's offline analytics workloads for more than seven years, and provides multi-layer sandbox protection and monitoring.
- Lower costs
MaxCompute can reduce procurement costs by 20% to 30% compared with building an on-premises private cloud.
- Data tunnel
Batch data channel for massive historical data
Tunnel provides highly concurrent data upload and download services that support importing and exporting terabytes or even petabytes of data daily, which is particularly useful for the batch import of full or historical data. Tunnel provides a Java programming interface, and the MaxCompute client tool includes corresponding commands for exchanging data between local files and the service.
Real-time, incremental data channels
For real-time data upload scenarios, MaxCompute provides the DataHub service, which offers low latency and convenient usage and is especially suitable for importing incremental data. DataHub also supports a variety of data transmission plug-ins, such as Logstash, Flume, Fluentd, and Sqoop. It can also deliver logs from Log Service to MaxCompute, after which you can use DataWorks for log analysis and mining.
- Computing and analysis tasks
MaxCompute provides multiple computing models.
- SQL: In MaxCompute, data is stored in tables. MaxCompute exposes an SQL query interface, so you can operate it much as you would a traditional database, but with the ability to process PB-level data.
- MaxCompute SQL does not support transactions, indexes, or UPDATE/DELETE operations.
- MaxCompute SQL syntax differs from that of Oracle and MySQL; notably, you cannot seamlessly migrate SQL statements from other databases to MaxCompute.
- In terms of latency, MaxCompute SQL completes queries at the second to minute level; it cannot return results in milliseconds.
- The advantage of MaxCompute SQL is its low learning cost: you do not need to understand complex distributed computing concepts. If you have experience with databases, you can become familiar with MaxCompute SQL quickly.
- UDF: user-defined functions. MaxCompute provides numerous built-in functions to meet your computing needs, and also supports creating custom functions when the built-in ones are not enough.
- MapReduce: MaxCompute MapReduce is the Java MapReduce programming model provided by MaxCompute. It simplifies the development process and improves efficiency. Using MaxCompute MapReduce requires a basic understanding of distributed computing concepts and corresponding programming experience. MaxCompute MapReduce provides a Java programming interface.
- Graph: Graph in MaxCompute is a processing framework designed for iterative graph computing. Graph jobs use graphs to build models; a graph is composed of vertices and edges, both of which carry values. The graph is edited and evolved through iterations to produce the final result. Typical applications include PageRank, the single-source shortest path (SSSP) algorithm, and the K-means clustering algorithm.
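As a rough illustration of the iterative vertex-and-edge model that Graph follows, here is a minimal PageRank sketch in plain Python. This is a conceptual simulation only, not MaxCompute Graph's actual API (which is Java-based); the toy graph, damping factor, and iteration count are illustrative assumptions:

```python
# Conceptual PageRank iteration on a toy directed graph.
# Each iteration propagates rank values from vertices along their out-edges,
# mirroring the "iterate over vertices and edges until done" model.
def pagerank(edges, num_iters=20, damping=0.85):
    vertices = {v for edge in edges for v in edge}
    out_degree = {v: 0 for v in vertices}
    for src, _ in edges:
        out_degree[src] += 1
    # Start with a uniform rank distribution.
    rank = {v: 1.0 / len(vertices) for v in vertices}
    for _ in range(num_iters):
        contrib = {v: 0.0 for v in vertices}
        for src, dst in edges:
            # Each vertex splits its rank evenly across its out-edges.
            contrib[dst] += rank[src] / out_degree[src]
        rank = {v: (1 - damping) / len(vertices) + damping * contrib[v]
                for v in vertices}
    return rank

# Toy graph: a -> b, a -> c, b -> c; vertex c receives the most links.
edges = [("a", "b"), ("a", "c"), ("b", "c")]
rank = pagerank(edges)
print(rank)
```

After the iterations converge, the most-linked-to vertex (`c` in this toy graph) ends up with the highest rank, which is the intuition behind PageRank-style graph jobs.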
- SDK: A convenient toolkit provided for developers. For more information, see MaxCompute SDK.
- Security: MaxCompute offers powerful security services to protect your data. For more information, see the Security Guide.
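To make the MapReduce model described above concrete, the classic word-count job can be sketched in plain Python. This is a conceptual simulation under illustrative assumptions; MaxCompute MapReduce's real interface is the Java API mentioned above, and none of the names here come from that API:

```python
# Conceptual sketch of the map -> shuffle -> reduce flow using plain Python.
from collections import defaultdict

def map_phase(records):
    """Map: emit a (word, 1) pair for every word in every input record."""
    for record in records:
        for word in record.split():
            yield word, 1

def shuffle(pairs):
    """Shuffle: group intermediate values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: aggregate the grouped values for each key."""
    return {key: sum(values) for key, values in groups.items()}

records = ["hello world", "hello maxcompute"]
counts = reduce_phase(shuffle(map_phase(records)))
print(counts)  # {'hello': 2, 'world': 1, 'maxcompute': 1}
```

In a real MaxCompute MapReduce job the map and reduce steps run distributed across many workers and the shuffle is handled by the framework; only the per-record and per-key logic is written by you.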
What to do next
Now that you have learned about MaxCompute's advantages, features, and related concepts, you can continue to the next tutorial, which covers MaxCompute billing. For more information, see Product Pricing.