This topic highlights the key points that beginners, data analysts, developers, and project owners or administrators must be aware of before using MaxCompute.

For beginners

If you are a beginner, we recommend that you start from the following topics:

  • MaxCompute Summary: Introduces MaxCompute and its main function modules. By reading this topic, you can gain a general understanding of MaxCompute.
  • (Optional) Use an ad-hoc query to run SQL statements: Provides a step-by-step guide that covers how to apply for an account, install the client, create a table, authorize a user, import and export data, run SQL tasks, run UDFs, and run MapReduce programs.
  • Terms and List of common commands: Describe key terms and frequently used commands of MaxCompute, so that you can become more familiar with how to operate it.
  • Tools: Before you analyze data, you may need to learn how to download, configure, and use the frequently used tools.

    Client: You can use this command-line tool to operate MaxCompute.

  • Endpoints and Data Centers: Describes the regions in which MaxCompute is available and how to connect to it, and answers common questions about network connectivity and data download charges when MaxCompute exchanges data with other Alibaba Cloud products, such as ECS, Table Store, and OSS.
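
The client mentioned above reads its connection information from a configuration file (conf/odps_config.ini in the client installation directory). A minimal sketch with placeholder values, assuming the public service endpoint, might look like this:

```ini
# Placeholder values -- replace with your own project and credentials.
project_name=my_project
access_id=<your-accesskey-id>
access_key=<your-accesskey-secret>
end_point=https://service.odps.aliyun.com/api
```

After the file is configured, you can start the client and run commands against the specified project.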

After you are familiar with the preceding modules, we recommend that you study the other modules further.

For data analysts

If you are a data analyst, we recommend that you read the following topic:

  • MaxCompute SQL: Queries and analyzes massive volumes of data stored in MaxCompute. It supports the following operations:
    • Use the DDL statements CREATE, DROP, and ALTER to manage tables and partitions.
    • Use a SELECT statement to query records in a table, and use a WHERE clause to filter the records that meet specified conditions.
    • Join two tables by using an equijoin operation.
    • Aggregate columns by using a GROUP BY clause.
    • Insert result records into another table by using the INSERT OVERWRITE or INSERT INTO syntax.
    • Use built-in functions and user-defined functions (UDFs) to perform a variety of computations.
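
The operations listed above can be sketched in a few MaxCompute SQL statements; the table and column names (sale_detail, sale_summary, shop_info) are hypothetical:

```sql
-- Create a partitioned table (hypothetical names).
CREATE TABLE IF NOT EXISTS sale_detail (
    shop_name    STRING,
    total_amount BIGINT
) PARTITIONED BY (sale_date STRING);

-- Query records and filter them with a WHERE clause.
SELECT shop_name, total_amount
FROM sale_detail
WHERE sale_date = '2024-01-01';

-- Join two tables with an equijoin, aggregate with GROUP BY,
-- and write the result into another table.
INSERT OVERWRITE TABLE sale_summary
SELECT s.shop_name, SUM(s.total_amount)
FROM sale_detail s
JOIN shop_info i ON s.shop_name = i.shop_name
GROUP BY s.shop_name;
```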

For developers

If you have some development experience, understand the concepts of distributed computing, and know that some data analysis tasks cannot be accomplished with SQL alone, we recommend that you learn more about the following advanced function modules of MaxCompute:

  • MapReduce: Explains the MapReduce programming interface. You can use the Java API provided by MapReduce to write MapReduce programs that process data in MaxCompute.
  • Graph: Provides a framework for iterative graph computing. This function uses graphs to build models. A graph is composed of vertices and edges, both of which carry values. The framework iteratively edits and evolves the graph and outputs a final result.
  • Tunnel: Allows you to use the Tunnel service to upload batches of offline data to MaxCompute, or download batches of offline data from MaxCompute.
  • SDK:
    • Java SDK: Provides developers with Java interfaces.
    • Python SDK: Provides developers with Python interfaces.
    Note MapReduce and Graph are still in open beta. If you want to use these features, you can submit an application through the job system. Specify the name of your project in the application, and we will process it within seven working days.
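
The MapReduce model described above can be illustrated with a minimal, framework-free Python word count. This is a conceptual sketch of the map, shuffle, and reduce phases only, not the MaxCompute Java API:

```python
from collections import defaultdict

def map_phase(records):
    """Map: emit a (word, 1) pair for every word in every input record."""
    for record in records:
        for word in record.split():
            yield word, 1

def shuffle(pairs):
    """Shuffle: group the intermediate values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

records = ["hello odps", "hello maxcompute"]
result = reduce_phase(shuffle(map_phase(records)))
# result == {"hello": 2, "odps": 1, "maxcompute": 1}
```

In MaxCompute MapReduce, the same three phases appear as the Mapper and Reducer classes of the Java API, with the framework performing the shuffle between them.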

For project owners or administrators

  • Project management

    A project is the smallest organizational unit in MaxCompute. Projects are similar to databases or schemas in traditional database systems and are used to isolate users and manage their access permissions. A user can be granted permissions on multiple projects, which allows the same user to access objects such as tables, resources, functions, and instances across projects. All operations in MaxCompute are performed on objects within projects.

    • Prepare for creating a project.
      • Estimate the resources you need.
        MaxCompute charges fees for the following three types of resources:
        1. Storage resources: charged by using the Pay-As-You-Go billing method. The prices are divided into multiple tiers, so you can estimate your fees based on your data volume. However, data is not stored in MaxCompute all at once within a single day; it may be written to or read from MaxCompute at any time. Therefore, the estimated resources and fees may differ from your actual bill.
        2. Computation resources: charged by using the Subscription or Pay-As-You-Go billing method. Computation resources are used for SQL, MapReduce, Spark, and Lightning tasks. Estimating the fees for computation resources is difficult at the very beginning. Therefore, we recommend that you start from the Pay-As-You-Go billing method and then decide whether to switch to the Subscription billing method after a period of testing.
        3. Internet download traffic: charged by using the Pay-As-You-Go billing method. You are billed only when you consume traffic for downloading resources through the Internet.

        For more information about metering and pricing, see Storage pricing (Pay-As-You-Go), Computing pricing, and Download pricing (Pay-As-You-Go).
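
As a sketch of how a tiered Pay-As-You-Go storage estimate works, the helper below charges each slice of data at its tier's rate. The tier boundaries and prices are placeholders for illustration, not MaxCompute's published rates:

```python
# Hypothetical storage tiers: (upper bound in GB, price per GB per day).
# These numbers are placeholders, NOT MaxCompute's actual rates.
TIERS = [(100, 0.0028), (1024, 0.0014), (float("inf"), 0.0007)]

def daily_storage_fee(gb):
    """Estimate a daily storage fee by pricing each tier's slice separately."""
    fee, lower = 0.0, 0.0
    for upper, price in TIERS:
        if gb <= lower:
            break
        slice_gb = min(gb, upper) - lower  # portion of data in this tier
        fee += slice_gb * price
        lower = upper
    return fee

fee = daily_storage_fee(500)
# 100 GB at the first rate plus 400 GB at the second rate.
```

Because data volume fluctuates during the day, an estimate like this gives only a rough upper bound on the actual bill.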

      • Register with Alibaba Cloud and activate the MaxCompute services.

        Before you create a project, you must register with Alibaba Cloud to obtain an Alibaba Cloud account. Then, determine whether you want to use the Subscription or Pay-As-You-Go billing method based on your resource estimation, and use your Alibaba Cloud account to activate the MaxCompute services. MaxCompute will deduct fees from your Alibaba Cloud account.

    • Create a project.

      For information about how to create a project, see Create a project.

    • Manage the members in a project.

      You need to assign roles and permissions to the project members. If you use MaxCompute through DataWorks, you also need to consider the mapping between MaxCompute permissions and DataWorks permissions.

    • Manage RAM users.

      MaxCompute projects support two types of accounts: Alibaba Cloud accounts and RAM user accounts. You can add any RAM user under your Alibaba Cloud account to a MaxCompute project, but MaxCompute does not consider how the permissions of the RAM user are defined when it verifies the RAM user. For more information, see Prepare a RAM user.

      When you operate MaxCompute through DataWorks, you can only use your Alibaba Cloud account to create RAM users under your Alibaba Cloud account, add the RAM users as members to a DataWorks workspace, and manage the RAM users as needed.

      • Each project member must have a unique RAM user account.
      • Once a project member has left the company or has been transferred to another job position, you must delete the RAM user account of the project member immediately.
        Note If the RAM user is a project member in DataWorks, you must delete the project member from DataWorks and then delete the RAM user from the RAM user management system.
    • Manage scheduling resources.
      • Scheduling resources

        The scheduling resources in DataWorks are categorized as default scheduling resources and custom scheduling resources. They are used to distribute or run tasks.

        1. Default scheduling resources are public resources in DataWorks. When a large number of DataWorks nodes are running concurrently, the DataWorks nodes that cannot occupy scheduling resources enter the waiting state. These DataWorks nodes start to distribute tasks immediately after they occupy scheduling resources.
        2. Custom scheduling resources are used to distribute or run data synchronization and other tasks. You can use your Alibaba Cloud account to configure a physical machine or an ECS instance as a scheduling server that distributes tasks. With custom scheduling resources, tasks can still be distributed and run properly even when the default scheduling resources are exhausted. To create custom scheduling resources in a new custom resource group, you need to open a ticket; to create them in an existing custom resource group, you do not.
    • Set a project.

      If you are the project owner, you need to configure the project settings, such as whether to allow full table scans and whether to enable MaxCompute 2.0 by default. For more information, see Project operations.

  • Security management

    You need to manage users, roles, and permissions. MaxCompute and DataWorks each have their own permission model. When you operate MaxCompute through DataWorks, you must understand the mapping between MaxCompute permissions and DataWorks permissions so that you can manage permissions based on your service requirements. Specifically, you can grant permissions to users, share resources among projects, enable data protection, and set policies for projects. For more information, see Security model.
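
As an illustration of the MaxCompute permission model, the statements below add a user to a project and grant read privileges on a table through a role. The account and object names are hypothetical:

```sql
-- Hypothetical account and object names.
ADD USER ALIYUN$alice@example.com;

-- Create a role, grant it read privileges on a table,
-- and then assign the role to the user.
CREATE ROLE data_reader;
GRANT Describe, Select ON TABLE sale_detail TO ROLE data_reader;
GRANT data_reader TO ALIYUN$alice@example.com;
```

Granting privileges through roles rather than to individual users keeps permissions easier to audit and revoke as project membership changes.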

  • Cost management

    You need to manage your costs based on the metering and pricing of MaxCompute.