This topic introduces the basic concepts and terms used in MaxCompute.
- AccessKey
An AccessKey pair, AK for short, is the credential used to access Alibaba Cloud APIs. It consists of an AccessKey ID and an AccessKey secret. After you register an Alibaba Cloud account on the Alibaba Cloud official website, an AccessKey pair is generated on the Security Management page. The AccessKey pair identifies the user, and the signature it produces is verified on each request to MaxCompute or other Alibaba Cloud products. The AccessKey secret must be kept confidential.
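The signing idea can be sketched as follows. This is a minimal illustration of keyed request signing in general, not the actual MaxCompute signing scheme: the HMAC-SHA1 algorithm, the string-to-sign layout, and the names `sign` and `verify` are all assumptions made for the example.

```python
import base64
import hashlib
import hmac


def sign(access_key_secret, string_to_sign):
    """Compute a signature over the request text using the secret as the key."""
    digest = hmac.new(
        access_key_secret.encode("utf-8"),
        string_to_sign.encode("utf-8"),
        hashlib.sha1,  # assumed algorithm, for illustration only
    ).digest()
    return base64.b64encode(digest).decode("ascii")


def verify(access_key_secret, string_to_sign, signature):
    """Recompute the signature on the server side and compare in constant time."""
    return hmac.compare_digest(sign(access_key_secret, string_to_sign), signature)


# Hypothetical values: the secret never travels with the request, only the
# AccessKey ID and the signature do.
secret = "example-secret"
request_text = "GET\n/projects/example_project/tables"
signature = sign(secret, request_text)
```

Because only the holder of the AccessKey secret can produce a matching signature, a request that was altered in transit fails verification.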
- authorization
The process by which a project administrator or project owner grants you permissions to read, write, or view objects (such as tables, tasks, and resources) in MaxCompute. For more information about authorization operations, see Manage users.
- console
The MaxCompute console is a client tool that runs on the Windows or Linux operating system. It allows you to submit commands for operations such as project management, DDL operations, and DML operations. For more information about tool installation and common parameters, see Client.
- data type
The types of data in the columns of a MaxCompute table. For more information about the data types supported by MaxCompute, see Data types.
- DDL
Data Definition Language. It is used to perform operations such as creating tables or views. For more information about the MaxCompute DDL syntax, see DDL syntax.
- DML
Data Manipulation Language. It is used to perform operations such as INSERT. For more information about the MaxCompute DML syntax, see INSERT operations.
- Job Scheduler
A module used for resource management and task scheduling in the kernel of the Apsara distributed operating system. It also provides a basic programming framework for application development. MaxCompute uses the scheduling module of Job Scheduler for its underlying task scheduling.
- instance
A job that is actually running, similar to the concept of a job in Hadoop. For more information, see Instance.
- MapReduce
A programming model for processing data. MapReduce is used for parallel operations on large data sets. You can use the Java API provided by MapReduce to write MapReduce programs and process MaxCompute data. The model divides data processing into two phases: Map (mapping) and Reduce (reduction).
The input data must be sharded before the Map operation. Sharding splits the input data into data blocks of the same size. Each data block is the input for a single Map worker, so that multiple Map workers can run in parallel. Each Map worker reads its data, performs calculations, and generates intermediate data. Reduce workers then aggregate the intermediate data to obtain the final result. For more information, see MapReduce.
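The flow described above (shard, map, aggregate) can be sketched in plain Python. This is a single-process illustration of the model only, not the MaxCompute MapReduce Java API. The word-count task and the names `shard`, `map_worker`, `reduce_worker`, and `word_count` are assumptions chosen for the example.

```python
from collections import defaultdict
from itertools import islice


def shard(records, shard_size):
    """Split the input into consecutive blocks of at most shard_size records."""
    iterator = iter(records)
    while True:
        block = list(islice(iterator, shard_size))
        if not block:
            return
        yield block


def map_worker(block):
    """Map step: emit an intermediate (word, 1) pair for every word in a block."""
    return [(word, 1) for line in block for word in line.split()]


def reduce_worker(key, values):
    """Reduce step: aggregate all intermediate values that share one key."""
    return key, sum(values)


def word_count(lines, shard_size=2):
    # Group intermediate pairs by key so each reduce call sees one key's values.
    grouped = defaultdict(list)
    for block in shard(lines, shard_size):
        for key, value in map_worker(block):
            grouped[key].append(value)
    return dict(reduce_worker(key, values) for key, values in grouped.items())


counts = word_count(["a b a", "b c", "a"])
```

In a real cluster each block would go to a separate Map worker and the grouping step would move data between machines. Here everything runs sequentially to keep the data flow visible.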
- ODPS
The original name of MaxCompute.
- partition
A division of the data stored in a table into independent parts based on partition fields (a single field or a combination of multiple fields). If a table is not partitioned, data is stored directly in the directory of the table. If a table is partitioned, each partition corresponds to a directory under the table, so data is stored in separate directories. For more information about partitions, see Partition.
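The mapping from partitions to directories can be sketched as follows. The `field=value` path layout, the function name `partition_path`, and the table and field names are hypothetical. They illustrate the idea and do not describe the actual storage layout of MaxCompute.

```python
def partition_path(table, partition_spec):
    """Return a directory-like path for one partition of a table.

    partition_spec maps partition field names to values, in the order in
    which the fields were declared. An empty spec means the table is not
    partitioned, so the data sits directly under the table directory.
    """
    parts = [f"{field}={value}" for field, value in partition_spec.items()]
    return "/".join([table, *parts])


# Hypothetical table partitioned by two fields.
partitioned = partition_path("sale_detail", {"sale_date": "201312", "region": "hangzhou"})
unpartitioned = partition_path("sale_detail", {})
```

A query that filters on the partition fields only needs to read the matching directories, which is the main reason to partition a large table.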
- project
The basic organizational unit of MaxCompute. Similar to a database or schema in a traditional database system, a project is used to isolate multiple users and control access requests. For more information, see Project.
- role
A concept in MaxCompute security functions. A role can be considered a set of users with the same permissions. One user can have multiple roles, and multiple users can belong to the same role. After you grant permissions to a role, all users assigned the role receive those permissions. For more information about role management, see Manage roles.
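The relationship between users, roles, and permissions can be sketched with a small in-memory model. The `Project` class and its methods are hypothetical and only illustrate the many-to-many relationship. They are not MaxCompute commands or SDK calls.

```python
class Project:
    """Toy model: permissions attach to roles, and users hold roles."""

    def __init__(self):
        self.role_permissions = {}  # role -> set of (action, object) pairs
        self.user_roles = {}        # user -> set of roles

    def grant_to_role(self, role, action, obj):
        self.role_permissions.setdefault(role, set()).add((action, obj))

    def assign_role(self, user, role):
        self.user_roles.setdefault(user, set()).add(role)

    def is_allowed(self, user, action, obj):
        # A user is allowed if any of the user's roles carries the permission.
        return any(
            (action, obj) in self.role_permissions.get(role, set())
            for role in self.user_roles.get(user, set())
        )


project = Project()
project.grant_to_role("analyst", "select", "sale_detail")
project.assign_role("alice", "analyst")
project.assign_role("bob", "analyst")
```

Granting one permission to the `analyst` role is enough for both users to receive it, which is what makes roles easier to maintain than per-user grants.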
- resource
A concept specific to MaxCompute. Resources are required to implement user-defined functions (UDFs) and MapReduce operations in MaxCompute. For more information, see Resource.
- SDK
Software Development Kit. It is a collection of development tools that software engineers use to build application software for specific software packages, software instances, software frameworks, hardware platforms, operating systems, or document packages. MaxCompute currently provides a Java SDK and a Python SDK.
- security
The MaxCompute multi-tenant data security system mainly covers user authentication, user and authorization management for projects, resource sharing across projects, and project data protection. For more information about MaxCompute security operations, see Security Guide.
- sandbox
An isolated environment that restricts program actions based on security policies. A sandbox isolates Java code execution in a separate environment and restricts malicious code from accessing local system resources, which prevents damage to the local system. MaxCompute MapReduce and UDF programs that run in a distributed environment are restricted by the Java sandbox.
- table
A data storage unit in MaxCompute. For more information, see Table.
- Tunnel
A data channel in MaxCompute that provides highly concurrent offline data upload and download services. You can use Tunnel to upload data in batches to MaxCompute or download data in batches to your local device. For more information about related commands, see Tunnel commands or Tunnel overview.
- UDF
In a broad sense, user-defined functions (UDFs) include all three kinds of user-defined function: the user-defined scalar function, the User Defined Aggregation Function (UDAF), and the User Defined Table Valued Function (UDTF). For more information about the Java programming APIs that MaxCompute provides for UDFs, see Overview.
In a narrow sense, UDFs refer only to user-defined scalar functions. The relationship between input and output is one-to-one: a UDF returns one value each time it reads one row of data.
- UDAF
User Defined Aggregation Function. The relationship between input and output is many-to-one: multiple input records are aggregated into one output value. A UDAF can be used with the GROUP BY clause of SQL. For more information, see UDAF.
- UDTF
User Defined Table Valued Function. It is used in scenarios where each function invocation returns multiple rows of data. It is the only kind of user-defined function that can return multiple fields. For more information, see UDTF.
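The three input-to-output shapes (scalar UDF, UDAF, and UDTF) can be contrasted with plain Python functions. This is a conceptual sketch only and does not use the MaxCompute UDF programming APIs. The function names and sample rows are hypothetical.

```python
def to_upper(value):
    """Scalar UDF shape: one input row produces exactly one output value."""
    return value.upper()


def total(values):
    """UDAF shape: many input records are aggregated into one output value."""
    return sum(values)


def explode(csv_value):
    """UDTF shape: one invocation yields several rows, each with several fields."""
    for item in csv_value.split(","):
        yield (item, len(item))


# Scalar: applied row by row.
upper_names = [to_upper(name) for name in ["a", "b"]]

# Aggregate: applied once per group, as with GROUP BY.
rows = [("a", 1), ("b", 2), ("a", 3)]
groups = {}
for key, amount in rows:
    groups.setdefault(key, []).append(amount)
totals = {key: total(amounts) for key, amounts in groups.items()}

# Table-valued: one input value expands into multiple two-field rows.
exploded = list(explode("x,yy"))
```

The difference lies in the cardinality of the result: the scalar function preserves the row count, the aggregate collapses each group to one row, and the table-valued function can increase the row count.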