This topic describes the terms and concepts used in MaxCompute to help you understand the service before you use it.
An AccessKey pair is a credential for accessing Alibaba Cloud APIs. It consists of an AccessKey ID and an AccessKey secret. After you create an Alibaba Cloud account on the International site (alibabacloud.com), an AccessKey pair is generated on the AccessKey Management page. AccessKey pairs are used to identify users and to verify the signatures of requests that access MaxCompute or other Alibaba Cloud services, or that connect to third-party tools. Keep your AccessKey secret confidential to prevent credential leaks. If the AccessKey secret is leaked, disable or update the AccessKey pair immediately.
Authorization allows a project administrator or project owner to grant permissions on MaxCompute objects to other users. After authorization, these users can perform specific operations on MaxCompute objects. For example, these users can read, write, and view objects, such as tables, tasks, and resources. For more information about authorization, see Permission overview.
The MaxCompute client runs on Windows or Linux. It allows you to run commands to perform operations such as project management, DDL, and DML operations. For more information about how to use the MaxCompute client, see MaxCompute client.
- data type
The types of data in the columns of a MaxCompute table. For more information about MaxCompute data type editions and the data types supported by each edition, see Data type editions.
Data definition language (DDL) operations define objects, for example, by creating a table or a view. For more information about DDL syntax, see DDL statements.
Data manipulation language (DML) operations change data and include INSERT, UPDATE, and DELETE operations. For more information about DML syntax, see DML statements.
MaxCompute provides built-in functions and supports user-defined functions (UDFs). For more information about functions, see Function.
Instances are used to run jobs. For more information, see Task instance.
- Job Scheduler
Job Scheduler is a module in the kernel of the Apsara distributed operating system. Job Scheduler is used to manage resources and schedule jobs. Job Scheduler also provides a basic programming framework for application development. Job Scheduler serves as the underlying task scheduling module of MaxCompute.
MapReduce is a programming model for processing large datasets in parallel. You can use the Java API provided by MapReduce to write MapReduce programs and process MaxCompute data. MapReduce divides data processing into two methods, Map and Reduce. The Map method maps the data, and the Reduce method combines the data.
Before the Map operation is performed, the input data must be sliced into data blocks of equal size. Each data block is the input to a single Map worker node, so multiple Map worker nodes can work at the same time. Each Map worker node processes its input data block and sends the intermediate result to a Reduce worker node. The Reduce worker node then combines the outputs of multiple Map worker nodes to obtain the final result. For more information, see MapReduce.
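The flow described above can be sketched in plain Python as a word count. This is only an illustration of the Map and Reduce steps, not the MaxCompute MapReduce Java API. The function names (`split_into_blocks`, `map_block`, `reduce_pairs`, `word_count`) and the block size are hypothetical, and the Map workers run sequentially here instead of in parallel.

```python
from collections import defaultdict


def split_into_blocks(records, block_size):
    """Slice the input into blocks of equal size (the last block may be shorter)."""
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]


def map_block(block):
    """Map step: emit a (word, 1) pair for every word in one data block."""
    pairs = []
    for line in block:
        for word in line.split():
            pairs.append((word, 1))
    return pairs


def reduce_pairs(all_pairs):
    """Reduce step: combine the intermediate pairs into one count per word."""
    totals = defaultdict(int)
    for word, count in all_pairs:
        totals[word] += count
    return dict(totals)


def word_count(records, block_size=2):
    blocks = split_into_blocks(records, block_size)
    # Each block would go to its own Map worker; a loop stands in for that here.
    intermediate = [map_block(block) for block in blocks]
    merged = [pair for pairs in intermediate for pair in pairs]
    return reduce_pairs(merged)


if __name__ == "__main__":
    print(word_count(["a b a", "b c", "a"]))
```

Because each block is mapped independently, the Map step can be distributed without changing the Reduce step.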
ODPS is the original name of MaxCompute.
A partition is a division of a table based on the partition key, which consists of one or more partition key columns. Partitions divide the data stored in a table. If a table is not partitioned, its data is stored in the directory that stores the table. If a table is partitioned, each partition corresponds to a subdirectory of that directory, and the data is stored in the separate subdirectories. For more information about partitions, see Partition.
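As a rough illustration of how a partition can correspond to a subdirectory, the following Python sketch builds a path from a table directory and a list of partition key values. The `key=value` naming, the `/` separator, and the function name `partition_path` are assumptions made for this example. They are not a description of how MaxCompute actually lays out storage.

```python
def partition_path(table_dir, partition_spec):
    """Return the directory for a partition.

    partition_spec is an ordered list of (partition_key_column, value) pairs.
    An empty list means the table is not partitioned, so the data lives
    directly in the table directory.
    """
    parts = [f"{key}={value}" for key, value in partition_spec]
    return "/".join([table_dir] + parts)


if __name__ == "__main__":
    # Hypothetical table "sales" partitioned by date and region.
    print(partition_path("sales", [("ds", "20240101"), ("region", "cn")]))
    print(partition_path("sales", []))
```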
A project is a basic organizational unit of MaxCompute. Similar to a database or schema in a traditional database system, a project is used to isolate users and control access requests. For more information about projects, see Project.
A quota is a computing resource pool of MaxCompute. Quotas provide the computing resources that are required to run jobs. For more information about quotas, see Quota.
A role is a concept in the MaxCompute security feature. A role can be considered a set of users who have the same permissions. One user can assume multiple roles, and multiple users can assume the same role. After you grant permissions to a role, all users who are assigned the role have those permissions. For more information about how to manage roles, see Role planning and management.
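The many-to-many relationship between users and roles can be sketched in Python as follows. This is a conceptual model only, not MaxCompute syntax or an SDK call. The user names, role names, and permission strings are hypothetical.

```python
def effective_permissions(user, user_roles, role_permissions):
    """Return the union of the permissions of every role assigned to the user."""
    permissions = set()
    for role in user_roles.get(user, ()):
        permissions |= role_permissions.get(role, set())
    return permissions


if __name__ == "__main__":
    # Hypothetical roles and the permissions granted to each of them.
    role_permissions = {
        "analyst": {"table:read"},
        "loader": {"table:read", "table:write"},
    }
    # One user can hold several roles, and several users can share a role.
    user_roles = {
        "alice": ["analyst", "loader"],
        "bob": ["analyst"],
    }
    print(sorted(effective_permissions("alice", user_roles, role_permissions)))
    print(sorted(effective_permissions("bob", user_roles, role_permissions)))
```

In this model, granting a new permission to a role changes the result for every user who holds that role, which is why permissions are usually managed on roles and not on individual users.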
A resource is a concept that is specific to MaxCompute. Resources are required to implement UDFs and MapReduce operations in MaxCompute. For more information about resources, see Resource.
A software development kit (SDK) is a collection of development tools that software engineers use to build application software for specific software packages, software instances, software frameworks, hardware platforms, operating systems, or document packages. MaxCompute provides SDKs for Java and Python.
A sandbox is an isolated environment that restricts program actions based on security policies. As a security mechanism, a sandbox runs Java code in a separate environment and prevents malicious code from accessing local system resources, which protects the local system from damage. MaxCompute MapReduce programs and UDFs that run in a distributed environment are restricted by the Java sandbox.
The MaxCompute multi-tenant data security system provides features such as user authentication, user and permission management, resource sharing across projects, and project data protection. For more information about the security management operations of MaxCompute, see Permission overview.
In MaxCompute, tables are used to store data. For more information about tables, see Table.
Tunnel is a data channel in MaxCompute. Tunnel provides highly concurrent offline data uploads and downloads. You can use MaxCompute Tunnel to upload data in batches to MaxCompute or download data in batches to your on-premises machine. For more information about related commands, see Tunnel commands or MaxCompute Tunnel SDK.
In a broad sense, UDFs include user-defined scalar functions, user-defined aggregate functions (UDAFs), and user-defined table-valued functions (UDTFs). MaxCompute allows you to develop UDFs in Java or Python. For more information, see MaxCompute UDF.
In a narrow sense, UDFs refer only to user-defined scalar functions. The input and output data of a UDF have a one-to-one mapping relationship: the UDF returns one value for each row of data that it reads.
The input and output data of a UDAF have a many-to-one mapping relationship. Multiple input records are aggregated to generate one output value. UDAFs can be used with the GROUP BY clause of SQL statements. For more information about UDAFs, see UDAF.
A UDTF is a user-defined table-valued function. Only UDTFs can return multiple fields. For more information about UDTFs, see UDTF.
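The difference between the three kinds of functions is easiest to see in their input and output shapes. The following plain Python functions mimic those shapes only. They do not use the MaxCompute UDF API, and the names `to_upper`, `total`, and `split_csv` are hypothetical.

```python
def to_upper(value):
    """Scalar shape (UDF): one input value produces one output value."""
    return value.upper()


def total(values):
    """Aggregate shape (UDAF): many input values produce one output value."""
    return sum(values)


def split_csv(value):
    """Table-valued shape (UDTF): one input value produces rows with several fields."""
    for position, item in enumerate(value.split(",")):
        yield (position, item)


if __name__ == "__main__":
    print([to_upper(v) for v in ["a", "b"]])
    print(total([1, 2, 3]))
    print(list(split_csv("x,y")))
```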
A user is a concept in the MaxCompute security feature. You can access MaxCompute as an Alibaba Cloud account, as a RAM user, or as a user who is assigned a RAM role. All users except the project owner must be added to a MaxCompute project and granted the required permissions before they can manage data, jobs, resources, and functions in MaxCompute. For more information about how to manage users, see User planning and management.
A view is a virtual table that is created based on one or more existing tables. Its schema and content are derived from these tables. You can use views to retain query results without creating additional tables. For more information about views, see View-related operations.