Find the documentation that matches your role and goals.
MaxCompute beginners
Start with these topics in order to build a working foundation.
| Step | Topic | What you'll do |
|---|---|---|
| 1 | Product Introduction | Learn what MaxCompute does, its key features, core concepts, and limits. |
| 2 | Preparations | Create an account, set up your environment, create a table, and import data. |
| 3 | Getting Started | Run your first SQL jobs and export results. |
| 4 | Common SQL statements | Learn the commands you'll use most often. |
| 5 | Tools | Get familiar with the MaxCompute client and MaxCompute Studio. |
| 6 | Endpoints | Understand network connection modes, region-specific endpoints, and connectivity considerations when using MaxCompute with Elastic Compute Service (ECS), Tablestore, and Object Storage Service (OSS). |
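As a first taste of steps 3 and 4, a minimal MaxCompute SQL session might look like the following sketch. The table and column names are hypothetical examples, not from this guide:

```sql
-- Create a simple table (names here are illustrative).
CREATE TABLE IF NOT EXISTS sale_detail (
  shop_name   STRING,
  total_price DOUBLE
);

-- Insert a few rows.
INSERT INTO sale_detail VALUES ('s1', 100.5), ('s2', 200.0);

-- Run a first query and inspect the result.
SELECT shop_name, total_price FROM sale_detail;
```

You can run statements like these in the MaxCompute client or MaxCompute Studio once your environment is set up.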
Data analysts
Read the SQL topics to query and analyze large datasets in MaxCompute.
| Feature | What you can do |
|---|---|
| DDL statements | Manage tables, partitions, columns, lifecycles, and views. |
| DML statements | Insert data into or update data in tables or partitions. |
| DQL statements | Run SELECT queries, subqueries, and other query operations. |
| SQL enhancement operations | Import and export data, clone table data, and run other extended SQL operations. |
| Built-in functions | Process data with mathematical, window, date, aggregate, and string functions. |
| UDF | Write user-defined functions (UDFs) to handle custom computing requirements. |
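The SQL categories above can be sketched in a few statements. This is a minimal, illustrative example (table, column, and partition names are assumptions), not a complete reference:

```sql
-- DDL: a partitioned table with a 30-day lifecycle (hypothetical names).
CREATE TABLE IF NOT EXISTS orders (
  order_id BIGINT,
  amount   DOUBLE
)
PARTITIONED BY (ds STRING)
LIFECYCLE 30;

-- DML: write query results into one partition.
INSERT OVERWRITE TABLE orders PARTITION (ds = '20240101')
SELECT order_id, amount FROM staging_orders;

-- DQL with a built-in aggregate function.
SELECT ds, SUM(amount) AS daily_total
FROM orders
GROUP BY ds;
```

See the DDL, DML, DQL, and built-in function topics for the full syntax of each statement.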
Users with development experience
If you understand distributed architecture and need capabilities beyond SQL, explore these advanced modules.
| Module | What you can do |
|---|---|
| MapReduce | Write MapReduce programs in Java to process MaxCompute data, using the Java API that MaxCompute provides for its MapReduce programming model. |
| Graph | Run iterative graph computing on graphs made up of vertices and edges, both of which contain values. MaxCompute Graph iteratively edits and evolves graphs to obtain analysis results. |
| Tunnel | Upload or download large volumes of data in bulk. |
| SDK for Java | Build and integrate MaxCompute workflows using Java. |
| SDK for Python | Build and integrate MaxCompute workflows using Python. |
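As a sketch of the Tunnel row above, the MaxCompute client exposes `tunnel` commands for bulk transfer. The file, project, and table names below are hypothetical:

```sql
-- In the MaxCompute client, upload a local file into a table:
tunnel upload data.csv my_project.my_table;

-- Download a table to a local file:
tunnel download my_project.my_table data_backup.csv;
```

For programmatic transfers, the SDKs expose the same Tunnel capabilities; see the Tunnel and SDK topics for details.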
Project owners or administrators
A project is the basic organizational unit in MaxCompute, similar to a database or schema in a traditional database system. Projects isolate users and control access across tables, resources, functions, and instances. A user can have permissions on multiple projects.
Project management
Before creating a project, plan your resource budget. MaxCompute charges for three resource types:
- Storage: Billed at pay-as-you-go rates with tiered pricing. Costs vary as stored data changes. For details, see Storage pricing (pay-as-you-go).
- Computing: Available under both pay-as-you-go and subscription billing. Start with pay-as-you-go to understand your usage, then decide whether to switch to subscription. For details, see Computing pricing.
- Data downloads over the Internet: Billed at pay-as-you-go rates. For details, see Download pricing (pay-as-you-go).
For more information about billing, see Storage fees (pay-as-you-go), Computing fees, and Download fees (pay-as-you-go).
After planning your budget, complete these setup tasks:
| Task | Details |
|---|---|
| Create an account and activate the service | Before you create a MaxCompute project, create an Alibaba Cloud account and activate MaxCompute. Bills are issued to the Alibaba Cloud account. Choose the pay-as-you-go or subscription billing method based on your budget. |
| Create a project | Create and configure your first MaxCompute project. |
| Manage project members | Assign members based on responsibilities and security requirements. If you use MaxCompute in the DataWorks console, understand the permission relationships between the two products. |
| Manage RAM users | Add Resource Access Management (RAM) users to your project. In the DataWorks console, only RAM users under your Alibaba Cloud account can be added as members. Manage RAM users in the RAM console. |
| Manage scheduling resources | Configure DataWorks scheduling resources to execute and distribute tasks. Default scheduling resources use the public resource pool of DataWorks; if parallelism is high and resources are insufficient, nodes wait until resources are allocated before running their tasks. For high-parallelism needs, set up custom scheduling resources using an ECS instance as the scheduling server. |
| Configure your project | Only the project owner can configure project-level settings, such as enabling full table scan or switching to the MaxCompute V2.0 data type edition. |
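The project-level settings mentioned in the last task are changed with `setproject` in the MaxCompute client, and only the project owner can run them. A brief sketch, with flag values chosen for illustration:

```sql
-- Allow full table scans on partitioned tables in this project.
setproject odps.sql.allow.fullscan=true;

-- Switch the project to the MaxCompute V2.0 data type edition.
setproject odps.sql.type.system.odps2=true;

-- Run setproject with no arguments to view current settings.
setproject;
```

Changing these flags affects every job in the project, so review the project configuration topic before applying them.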
Scheduling resources on DataWorks execute or distribute tasks from the scheduling system. DataWorks provides two modes for scheduling resources:
- Default scheduling resources: These resources belong to the public resource pool of DataWorks. When a high number of DataWorks nodes run concurrently, scheduling resources can become limited, which causes nodes to enter a waiting state. A node begins to execute its task only after it acquires a resource.
- Custom scheduling resources: These are scheduling servers created from ECS instances that you purchase and configure. An Alibaba Cloud account can create custom scheduling resources, which can consist of one or more physical servers or ECS instances. These resources mainly run data synchronization or other tasks.
Do not allow multiple project members to share one RAM user. When a member leaves or changes roles, delete their RAM user promptly. If the RAM user was added through the DataWorks console, remove them from DataWorks first, then delete the RAM user in the RAM console.
Cost management
Costs depend on your billing method and actual usage, so monitor them throughout your project lifecycle.
- Product billing and pricing: For more information, see Billable items and billing methods.
- Pricing overview: See Overview.
- Changing your billing method: See Switch billing methods.