MaxCompute (previously known as ODPS) is a general-purpose, fully managed, multi-tenant data processing platform for large-scale data warehousing. MaxCompute supports a variety of data import solutions and distributed computing models, enabling users to query massive datasets efficiently, reduce production costs, and keep data secure.
Large-scale Computing and Storage
Data processing and storage for datasets of 100 GB and more
Multiple Computation Models
Supports SQL, MapReduce, Graph, and MPI iterative algorithms
Data Security
Multilayer sandboxing and monitoring, built on the same technology that supports secure offline analysis within Alibaba Group
Low Cost
Reduces procurement costs by 20-30% compared with a self-built private cloud
MaxCompute is a Big Data processing platform developed in-house by Alibaba Cloud. It is designed for batch storage and processing of structured data, providing large-scale data warehouse solutions and Big Data modelling.
Batch and Historical Data Tunnel
Tunnel is MaxCompute's data transmission service. It scales horizontally, supports importing and exporting data at TB/PB scale, and can be used to upload large volumes of historical data in batches.
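One common way a batch upload service scales horizontally is to split an upload into independent blocks that can be sent in parallel. The plain-Java sketch below illustrates that general pattern only; it does not use the Tunnel SDK, whose own session and block APIs differ. `uploadBlock` and `uploadInBlocks` are hypothetical names, and the "remote table" is an in-memory map.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class BatchUploadSketch {
    // Hypothetical stand-in for sending one block; a real client would write to the network.
    static int uploadBlock(Map<Integer, List<String>> sink, int blockId, List<String> rows) {
        sink.put(blockId, new ArrayList<>(rows));
        return rows.size();
    }

    // Splits rows into fixed-size blocks, uploads them in parallel, and returns the row count uploaded.
    static long uploadInBlocks(Map<Integer, List<String>> sink, List<String> rows, int blockSize, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> pending = new ArrayList<>();
            for (int start = 0, id = 0; start < rows.size(); start += blockSize, id++) {
                final int blockId = id;
                final List<String> block = rows.subList(start, Math.min(start + blockSize, rows.size()));
                pending.add(pool.submit(() -> uploadBlock(sink, blockId, block)));
            }
            long total = 0;
            for (Future<Integer> f : pending) {
                total += f.get(); // waits for the block and rethrows its failure, if any
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("upload interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("block upload failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < 25; i++) rows.add("row-" + i);
        Map<Integer, List<String>> sink = new ConcurrentHashMap<>();
        long uploaded = uploadInBlocks(sink, rows, 10, 3);
        System.out.println(uploaded + " rows in " + sink.size() + " blocks"); // 25 rows in 3 blocks
    }
}
```

Because the blocks are independent, more workers can be added without coordinating beyond the final tally, and a failed block can in principle be retried on its own.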
Real-time, Incremental Data Tunnel
DataHub, a second service designed for real-time data streams, is particularly suited to importing incremental data with low latency and is easy to use.
2D Table Data Storage
Data is stored in tables that reside on the underlying file system. Data is compressed, which significantly reduces storage costs for users.
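The text does not say which compression codec MaxCompute uses. As a rough illustration of why compression cuts storage cost for table data, the sketch below compresses a hypothetical column of repetitive region names with the JDK's DEFLATE implementation (`java.util.zip.Deflater`) and compares sizes. Real ratios depend on the data and on the codec actually used.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

class CompressionSketch {
    // Returns the size in bytes of `input` after DEFLATE compression.
    static int compressedSize(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        try {
            deflater.setInput(input);
            deflater.finish();
            byte[] buffer = new byte[4096];
            int total = 0;
            while (!deflater.finished()) {
                total += deflater.deflate(buffer); // bytes produced in this round
            }
            return total;
        } finally {
            deflater.end(); // release native resources
        }
    }

    // Hypothetical "region" column: a few distinct values repeated across many rows.
    static byte[] sampleColumn(int rows) {
        String[] regions = {"cn-hangzhou", "cn-shanghai", "cn-beijing"};
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < rows; i++) {
            sb.append(regions[i % regions.length]).append('\n');
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] raw = sampleColumn(100_000);
        int packed = compressedSize(raw);
        System.out.printf("raw=%d bytes, compressed=%d bytes, ratio=%.1fx%n",
                raw.length, packed, (double) raw.length / packed);
    }
}
```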
Computing - SQL
MaxCompute SQL supports standard SQL syntax and runs on an efficient computing framework that executes more efficiently than the common MapReduce model. Transactions, indexes, UPDATE/DELETE, and some other operations are not supported.
Computing - MapReduce
MaxCompute provides a Java MapReduce programming model that simplifies software development. An extended MapReduce model, called MR2, allows a single map function to be followed by multiple reduce functions.
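To make the map-followed-by-several-reduces idea concrete, here is a plain-Java, in-memory simulation (not the MaxCompute MapReduce API) of one map stage feeding two chained reduce stages. It finds the best-selling city per region: reduce 1 totals sales per city, and reduce 2 picks the largest city total in each region. With a single reduce per job, the second step would typically need a separate job. The `Sale` type, data, and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

class Mr2Sketch {
    // Hypothetical input row: one sale of `amount` in `city`, which belongs to `region`.
    static final class Sale {
        final String region;
        final String city;
        final long amount;
        Sale(String region, String city, long amount) {
            this.region = region;
            this.city = city;
            this.amount = amount;
        }
    }

    // Shuffle: group (key, value) pairs by key; TreeMap keeps keys sorted.
    static <K extends Comparable<K>, V> SortedMap<K, List<V>> shuffle(List<Map.Entry<K, V>> pairs) {
        SortedMap<K, List<V>> groups = new TreeMap<>();
        for (Map.Entry<K, V> p : pairs) {
            groups.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return groups;
    }

    // Map -> Reduce -> Reduce: returns the best-selling city in each region.
    static Map<String, String> topCityPerRegion(List<Sale> sales) {
        // Map: emit ("region/city", amount).
        List<Map.Entry<String, Long>> mapped = new ArrayList<>();
        for (Sale s : sales) {
            mapped.add(Map.entry(s.region + "/" + s.city, s.amount));
        }

        // Reduce 1: total per city, re-keyed by region for the next reduce.
        List<Map.Entry<String, Map.Entry<String, Long>>> cityTotals = new ArrayList<>();
        for (Map.Entry<String, List<Long>> g : shuffle(mapped).entrySet()) {
            long total = 0;
            for (long v : g.getValue()) total += v;
            String[] parts = g.getKey().split("/", 2);
            cityTotals.add(Map.entry(parts[0], Map.entry(parts[1], total)));
        }

        // Reduce 2: pick the city with the largest total in each region.
        Map<String, String> result = new TreeMap<>();
        for (Map.Entry<String, List<Map.Entry<String, Long>>> g : shuffle(cityTotals).entrySet()) {
            Map.Entry<String, Long> best = null;
            for (Map.Entry<String, Long> c : g.getValue()) {
                if (best == null || c.getValue() > best.getValue()) best = c;
            }
            result.put(g.getKey(), best.getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        List<Sale> sales = List.of(
                new Sale("east", "hangzhou", 10), new Sale("east", "hangzhou", 5),
                new Sale("east", "shanghai", 12), new Sale("north", "beijing", 3),
                new Sale("north", "tianjin", 4));
        // shanghai has the single largest sale in "east", but hangzhou has the larger total.
        System.out.println(topCityPerRegion(sales)); // {east=hangzhou, north=tianjin}
    }
}
```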
Computing - Graph
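Running complex iterative computations with MapReduce is time-consuming, because each iteration typically runs as a separate MapReduce job that writes its intermediate results out and reads them back in. Graph mode is better suited to such tasks.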
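To show what a vertex-centric iterative computation looks like, the plain-Java sketch below (not the MaxCompute Graph API) finds connected components by label propagation. In each superstep every vertex sends its current label to its neighbors, and each vertex keeps the smallest label it receives; the loop ends when a superstep changes nothing. Vertex state stays in memory between supersteps instead of being written out per iteration. The graph and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class GraphSketch {
    // Builds an undirected adjacency list; every vertex that appears in an edge becomes a key.
    static Map<Integer, List<Integer>> undirected(int[][] edges) {
        Map<Integer, List<Integer>> adjacency = new TreeMap<>();
        for (int[] e : edges) {
            adjacency.computeIfAbsent(e[0], k -> new ArrayList<>()).add(e[1]);
            adjacency.computeIfAbsent(e[1], k -> new ArrayList<>()).add(e[0]);
        }
        return adjacency;
    }

    // Connected components by label propagation, run as synchronous supersteps.
    static Map<Integer, Integer> connectedComponents(Map<Integer, List<Integer>> adjacency) {
        Map<Integer, Integer> label = new HashMap<>();
        for (Integer v : adjacency.keySet()) label.put(v, v);

        boolean changed = true;
        while (changed) {
            changed = false;
            // Messages are computed from the previous superstep's labels.
            Map<Integer, Integer> next = new HashMap<>(label);
            for (Map.Entry<Integer, List<Integer>> e : adjacency.entrySet()) {
                int message = label.get(e.getKey());
                for (Integer neighbor : e.getValue()) {
                    if (message < next.get(neighbor)) {
                        next.put(neighbor, message);
                        changed = true;
                    }
                }
            }
            label = next;
        }
        return label;
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> graph = undirected(new int[][] {{1, 2}, {2, 3}, {4, 5}});
        System.out.println(new TreeMap<>(connectedComponents(graph))); // {1=1, 2=1, 3=1, 4=4, 5=4}
    }
}
```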
MaxCompute is a multi-tenant computing platform that isolates each user's data. It also provides an authorization mechanism that lets users share their data with others.
Built on an abstract task-processing framework, MaxCompute supports a variety of computing tasks through a unified programming interface and GUI, and all tasks share common security control, storage, data management, and resource scheduling. Supported computing models include data upload/download through Tunnel, SQL, MapReduce, machine learning algorithms, graph computing, and stream computing, among others.
The typical MaxCompute application scenarios below illustrate recommended programming practices for Big Data warehousing.
Business Intelligence Analysis
MaxCompute can be combined with ECS, AnalyticDB/RDS, and BI reporting tools to meet users' BI analysis needs. End users access an app or website running on Alibaba Cloud ECS servers, and the resulting access logs are uploaded to DataHub by Fluentd, a data import tool deployed on ECS. DataHub then synchronizes the log data with MaxCompute's offline data in real time. Application developers can submit SQL analysis scripts to the MaxCompute service through the SDK or the client tool.
Personalized Ad Recommendations
Beyond BI analysis, MaxCompute can perform more complex analysis tasks such as machine learning and data mining, which help users deliver personalized recommendations and support other advertising scenarios. For example, users can apply the machine learning products and recommendation engines provided by Alibaba Cloud Shujia, and use Rule Engine for targeted marketing.
ETL Development (Extract, Transform, and Load)
After raw data is imported into MaxCompute, developers can extract and transform it and load it into the target destination. Development can be done with the MaxCompute command line tool, or with DataIDE, a graphical user interface provided by Alibaba Cloud Shujia that also supports Operations & Maintenance.
MaxCompute can be accessed through the Console. To install and configure MaxCompute from the Console, see the Document Center.
Users other than the project owner must be added to the relevant MaxCompute project and granted the corresponding privileges before they can operate MaxCompute. To add or remove users, see the Document Center.
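The membership-plus-privilege rule can be modelled in a few lines. This is a hypothetical, in-memory sketch for illustration only: the class, the privilege strings such as "SELECT", and the assumption that the owner holds every privilege are not MaxCompute's actual authorization API or semantics.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class ProjectAclSketch {
    private final String owner;
    private final Set<String> members = new HashSet<>();
    private final Map<String, Set<String>> grants = new HashMap<>(); // "user:table" -> privileges

    ProjectAclSketch(String owner) {
        this.owner = owner;
    }

    void addUser(String user) {
        members.add(user);
    }

    // Removing a user also drops every privilege granted to them.
    void removeUser(String user) {
        members.remove(user);
        grants.keySet().removeIf(k -> k.startsWith(user + ":"));
    }

    // Privileges can only be granted to users who are already project members.
    void grant(String user, String table, String privilege) {
        if (!members.contains(user)) throw new IllegalStateException(user + " is not a project member");
        grants.computeIfAbsent(user + ":" + table, k -> new HashSet<>()).add(privilege);
    }

    // Assumed rule: the owner may do anything; anyone else must be a member AND hold the privilege.
    boolean canOperate(String user, String table, String privilege) {
        if (user.equals(owner)) return true;
        return members.contains(user)
                && grants.getOrDefault(user + ":" + table, Set.of()).contains(privilege);
    }

    public static void main(String[] args) {
        ProjectAclSketch project = new ProjectAclSketch("owner");
        System.out.println(project.canOperate("alice", "sales", "SELECT")); // false: not a member
        project.addUser("alice");
        System.out.println(project.canOperate("alice", "sales", "SELECT")); // false: no privilege yet
        project.grant("alice", "sales", "SELECT");
        System.out.println(project.canOperate("alice", "sales", "SELECT")); // true
    }
}
```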
Once a user has been added to a project and granted privileges, they can operate MaxCompute. Because MaxCompute operates on tables (for both input and output), users must create tables and partitions before processing data. To create, describe, or drop tables, see the Document Center.
For MaxCompute SQL syntax, see the Document Center.
Instructions for running the example program 'MapReduce WordCount' are available in the Document Center.
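For orientation before running the real example, the plain-Java sketch below simulates the WordCount data flow in memory: each input split is mapped to word counts and pre-aggregated locally (the job a combiner does), and a reduce step sums the partial counts per word. It does not use the MaxCompute MapReduce API; the splits and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

class WordCountSketch {
    // Map + local combine for one input split: emit (word, 1) for every word and
    // pre-sum the counts locally, as a combiner would, so fewer pairs are shuffled.
    static Map<String, Long> mapAndCombine(List<String> split) {
        Map<String, Long> local = new TreeMap<>();
        for (String line : split) {
            for (String word : line.toLowerCase(Locale.ROOT).split("\\W+")) {
                if (!word.isEmpty()) local.merge(word, 1L, Long::sum);
            }
        }
        return local;
    }

    // Shuffle + reduce: group the partial counts by word and sum them.
    static Map<String, Long> reduce(List<Map<String, Long>> partials) {
        Map<String, Long> totals = new TreeMap<>();
        for (Map<String, Long> partial : partials) {
            partial.forEach((word, count) -> totals.merge(word, count, Long::sum));
        }
        return totals;
    }

    static Map<String, Long> wordCount(List<List<String>> splits) {
        List<Map<String, Long>> partials = new ArrayList<>();
        for (List<String> split : splits) partials.add(mapAndCombine(split));
        return reduce(partials);
    }

    public static void main(String[] args) {
        List<List<String>> splits = List.of(
                List.of("hello maxcompute", "hello world"),
                List.of("Hello again"));
        System.out.println(wordCount(splits)); // {again=1, hello=3, maxcompute=1, world=1}
    }
}
```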
Command Line Tool
The client command line tool is built on the Java SDK and provides convenient access to MaxCompute.
Data Import Tool
Eclipse Plugin
To simplify development with the MapReduce and UDF Java SDKs, MaxCompute offers an Eclipse plugin. The plugin can simulate MapReduce and UDF execution, giving users a way to debug locally, and it provides simple template generation.
Java SDK
Maven users can search the Maven repository for "odps-sdk" to find the available versions of the Java SDK and their documentation.
How to install and configure?
How is MaxCompute billed?
MaxCompute bills by project. Charges are based on three components: storage usage, computing resources, and data downloads. For more information on product charges, see the Document Center.
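The three billing components add up per project. The sketch below shows that arithmetic with hypothetical unit prices and usage units; actual MaxCompute prices, units, and tiering are not given here and must be taken from the Document Center. `BigDecimal` is used so that monetary amounts are not distorted by floating-point rounding.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

class BillingSketch {
    // Hypothetical unit prices, for illustration only.
    static final BigDecimal STORAGE_PER_GB = new BigDecimal("0.02");
    static final BigDecimal COMPUTE_PER_UNIT = new BigDecimal("0.30");
    static final BigDecimal DOWNLOAD_PER_GB = new BigDecimal("0.10");

    // Project bill = storage + computing + download, each priced independently.
    static BigDecimal projectBill(BigDecimal storageGb, BigDecimal computeUnits, BigDecimal downloadGb) {
        BigDecimal storage = storageGb.multiply(STORAGE_PER_GB);
        BigDecimal compute = computeUnits.multiply(COMPUTE_PER_UNIT);
        BigDecimal download = downloadGb.multiply(DOWNLOAD_PER_GB);
        return storage.add(compute).add(download).setScale(2, RoundingMode.HALF_UP);
    }

    public static void main(String[] args) {
        BigDecimal bill = projectBill(new BigDecimal("500"), new BigDecimal("40"), new BigDecimal("25"));
        System.out.println(bill); // 24.50
    }
}
```

With the placeholder prices, 500 GB of storage, 40 compute units, and 25 GB of downloads come to 10.00 + 12.00 + 2.50 = 24.50.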
How to use Java UDFs?
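MaxCompute supports three kinds of user-defined functions: UDFs (scalar functions), UDAFs (aggregate functions), and UDTFs (table-valued functions). See the UDF Guide for details.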
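To make the three function shapes concrete, the plain-Java sketch below shows one example of each: a scalar function (one row in, one value out), an aggregate built from partial buffers (iterate per partition, merge, terminate), and a table-valued function (one row in, several rows out). It does not use the MaxCompute UDF classes; the iterate/merge/terminate naming follows the pattern common to Hive-style UDAFs and is used here only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class UdfShapesSketch {
    // UDF (scalar): one input row -> one output value.
    static String toUpper(String s) {
        return s == null ? null : s.toUpperCase(Locale.ROOT);
    }

    // UDAF (aggregate): many rows -> one value, computed via partial buffers.
    static final class AvgBuffer {
        long count;
        double sum;
    }

    // Adds one row to a partition's partial buffer.
    static void iterate(AvgBuffer buf, double value) {
        buf.count++;
        buf.sum += value;
    }

    // Combines another partition's partial buffer into this one.
    static void merge(AvgBuffer into, AvgBuffer other) {
        into.count += other.count;
        into.sum += other.sum;
    }

    // Produces the final average, or null when no rows were seen.
    static Double terminate(AvgBuffer buf) {
        return buf.count == 0 ? null : buf.sum / buf.count;
    }

    // UDTF (table-valued): one input row -> zero or more output rows.
    static List<String> explode(String csv) {
        List<String> rows = new ArrayList<>();
        if (csv == null) return rows;
        for (String part : csv.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) rows.add(trimmed);
        }
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(toUpper("odps")); // ODPS

        AvgBuffer a = new AvgBuffer(), b = new AvgBuffer();
        iterate(a, 1.0); iterate(a, 2.0); // partition 1
        iterate(b, 6.0);                  // partition 2
        merge(a, b);
        System.out.println(terminate(a)); // 3.0

        System.out.println(explode("a, b,,c")); // [a, b, c]
    }
}
```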
How to create a data sync job?
Data synchronization jobs currently support the following data source types: MaxCompute, RDS (MySQL, SQL Server, PostgreSQL), Oracle, FTP, ADS, OSS, OCS, and DRDS. To create a data sync job, see the Document Center.
How to use MapReduce?
MaxCompute provides three MapReduce programming interfaces: MaxCompute MapReduce, extended MapReduce (MR2), and Hadoop MapReduce. For further details, see the Document Center.