Understand the key terms and fundamental concepts - MaxCompute

A

AccessKey
An AccessKey pair is a credential for accessing Alibaba Cloud APIs. Each pair consists of an AccessKey ID and an AccessKey secret. After you create an Alibaba Cloud account on alibabacloud.com, you can generate an AccessKey pair on the AccessKey Management page. MaxCompute and other Alibaba Cloud services use AccessKey pairs to identify callers and verify request signatures, and third-party tools that connect to MaxCompute also rely on them for authentication. Keep your AccessKey secret confidential. If it is exposed, disable or rotate it immediately to prevent unauthorized access.
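The sketch below is illustrative only. It shows two practices behind this definition. First, it loads the AccessKey pair from environment variables so the secret never appears in source code. Second, it shows how a shared secret lets a server verify a request signature. The variable names ALIBABA_CLOUD_ACCESS_KEY_ID and ALIBABA_CLOUD_ACCESS_KEY_SECRET follow a common Alibaba Cloud convention, but treat them as an assumption. The HMAC-SHA1 signing shown here is a generic example and is NOT MaxCompute's exact string-to-sign or signature format.

```python
import base64
import hashlib
import hmac
import os


def load_access_key(env=None):
    """Read the AccessKey pair from environment variables instead of source code.

    The variable names are an assumption based on a common Alibaba Cloud convention.
    """
    env = os.environ if env is None else env
    key_id = env.get("ALIBABA_CLOUD_ACCESS_KEY_ID")
    key_secret = env.get("ALIBABA_CLOUD_ACCESS_KEY_SECRET")
    if not key_id or not key_secret:
        raise RuntimeError("AccessKey pair is not configured")
    return key_id, key_secret


def sign(key_secret, string_to_sign):
    """Generic HMAC-SHA1 signature; NOT MaxCompute's actual signing scheme."""
    digest = hmac.new(key_secret.encode("utf-8"),
                      string_to_sign.encode("utf-8"),
                      hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


def verify(key_secret, string_to_sign, signature):
    """Server side: recompute the signature and compare it in constant time."""
    return hmac.compare_digest(sign(key_secret, string_to_sign), signature)


if __name__ == "__main__":
    demo_env = {"ALIBABA_CLOUD_ACCESS_KEY_ID": "demo-id",
                "ALIBABA_CLOUD_ACCESS_KEY_SECRET": "demo-secret"}
    key_id, secret = load_access_key(demo_env)
    sig = sign(secret, "GET\n/projects/demo")
    print(verify(secret, "GET\n/projects/demo", sig))          # True
    print(verify("wrong-secret", "GET\n/projects/demo", sig))  # False
```

Only the caller and the service know the secret, so a valid signature identifies the caller without sending the secret over the network.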
Authorization
Authorization lets a project administrator or project owner grant permissions on MaxCompute objects — such as tables, tasks, and resources — to other users. After authorization, those users can perform the permitted operations (for example, read, write, or view) on the specified objects. For more information, see User planning and management.

C

Console
The MaxCompute console, also called the MaxCompute client (odpscmd), is a command-line tool that runs on Windows and Linux. It lets you run commands to manage projects and perform data definition language (DDL) and data manipulation language (DML) operations directly against MaxCompute. For more information, see Connect using the local client (odpscmd).

D

Data Type
Data types define the kind of data that each column in a MaxCompute table can hold. MaxCompute supports multiple data type editions, each covering a different set of types. For the full list of editions and supported types, see Data type version guide.
DDL
Data definition language (DDL) statements define and modify the structure of MaxCompute objects. Typical DDL operations include creating, altering, and dropping tables, views, and partitions. For DDL syntax, see DDL statements.
DML
Data manipulation language (DML) statements insert, update, and delete the data stored in MaxCompute tables. Common DML operations are INSERT, UPDATE, and DELETE. In MaxCompute, UPDATE and DELETE are supported only on transactional tables. For DML syntax, see DML operations.

F

Function
MaxCompute provides two categories of functions: built-in functions, which are ready to use without any setup, and user-defined functions (UDFs), which you write to handle logic that built-in functions do not cover. For an overview of both categories, see Functions.
Job Scheduler
Job Scheduler is a module in the kernel of the Apsara distributed operating system. It manages resources, schedules jobs, and provides a foundational programming framework for application development. Job Scheduler serves as the underlying task scheduling module of MaxCompute.

I

Instance
An instance is a single run of a job in MaxCompute, conceptually equivalent to a job in Hadoop. For more information, see Task instance.

M

MapReduce
MapReduce is a programming model for processing large datasets in parallel across a distributed cluster. The model breaks computation into two phases:
- Map: transforms input records independently. Before this phase begins, MaxCompute slices the input data into equal-sized blocks, and each block is assigned to one Map worker node so that multiple workers can process data simultaneously.
- Reduce: receives the intermediate key-value pairs from all Map workers, grouped by key, and aggregates each group into a final result.
You can write custom MapReduce programs using the Java API provided by MaxCompute's MapReduce framework. For more information, see MapReduce.
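The two phases can be traced with the classic word-count example. The sketch below is a local, single-process simulation of the programming model. It does not use MaxCompute's Java MapReduce API, and all function names and the block_size parameter are hypothetical. Blocks are equal-sized except possibly the last one.

```python
from collections import defaultdict
from itertools import chain


def split_into_blocks(records, block_size):
    """Slice the input into equal-sized blocks (the last block may be shorter)."""
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]


def map_block(block):
    """Map phase: emit a (word, 1) pair for every word in every record of one block."""
    return [(word, 1) for line in block for word in line.split()]


def shuffle(mapped_outputs):
    """Group the intermediate pairs from all Map workers by key."""
    groups = defaultdict(list)
    for key, value in chain.from_iterable(mapped_outputs):
        groups[key].append(value)
    return groups


def reduce_group(key, values):
    """Reduce phase: aggregate all values for one key into a final result."""
    return key, sum(values)


def word_count(records, block_size=2):
    blocks = split_into_blocks(records, block_size)
    # In a real cluster, each block would be processed by its own Map worker.
    mapped = [map_block(block) for block in blocks]
    groups = shuffle(mapped)
    return dict(reduce_group(key, values) for key, values in groups.items())


if __name__ == "__main__":
    lines = ["a b a", "b c", "a c c", "d"]
    print(word_count(lines))  # {'a': 3, 'b': 2, 'c': 3, 'd': 1}
```

Each map_block call depends only on its own block. That independence is what allows MaxCompute to run Map workers in parallel.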

N

Network connection
Before you can use external tables, UDFs, or the lakehouse solution, you must establish network connectivity between MaxCompute and the target external services. MaxCompute can connect to services running inside a virtual private cloud (VPC) — such as HBase, RDS, or Hadoop — as well as services accessible over the Internet. For setup instructions, see Network setup process.

P

Partition
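A partition is a subset of a table's data that is stored in its own subdirectory and identified by the values of one or more partition key columns. When a query filters on partition key columns, MaxCompute can read only the matching partitions instead of scanning the whole table.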
- Unpartitioned table: all data is stored in a single directory for the table.
- Partitioned table: each distinct combination of partition key values maps to its own subdirectory. Data for that partition is stored exclusively in that subdirectory.
For more information, see Partition.
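The sketch below models how rows are routed to partitions and how a filter on partition keys limits what is read. It is an in-memory illustration under stated assumptions. The "key=value" directory naming is a hypothetical, Hive-style convention chosen for readability, and none of the functions are MaxCompute APIs.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, object]


def partition_path(table: str, keys: Sequence[str], values: Tuple) -> str:
    # Hypothetical "key=value" directory naming, for illustration only.
    return "/".join([table] + [f"{k}={v}" for k, v in zip(keys, values)])


def write_partitioned(keys: Sequence[str], rows: List[Row]) -> Dict[Tuple, List[Row]]:
    """Route each row to the partition identified by its partition key values."""
    partitions: Dict[Tuple, List[Row]] = defaultdict(list)
    for row in rows:
        values = tuple(row[k] for k in keys)
        # In this sketch, key values are kept only in the partition identifier.
        partitions[values].append({c: v for c, v in row.items() if c not in keys})
    return dict(partitions)


def read_partitions(partitions, keys, **filters) -> List[Row]:
    """Partition pruning: read only partitions whose key values match every filter."""
    result: List[Row] = []
    for values, data in partitions.items():
        spec = dict(zip(keys, values))
        if all(spec.get(k) == v for k, v in filters.items()):
            result.extend(data)
    return result


if __name__ == "__main__":
    keys = ["ds", "region"]
    rows = [
        {"id": 1, "ds": "20240101", "region": "hz"},
        {"id": 2, "ds": "20240101", "region": "sh"},
        {"id": 3, "ds": "20240102", "region": "hz"},
    ]
    parts = write_partitioned(keys, rows)
    for values in parts:
        print(partition_path("sales", keys, values))
    print(read_partitions(parts, keys, ds="20240101"))  # [{'id': 1}, {'id': 2}]
```

The filtered read in this example never touches the ds=20240102 partition. That is the benefit of filtering on partition key columns.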
Project
A project is the fundamental organizational unit of MaxCompute, similar to a database or schema in a traditional relational database system. Projects isolate users from one another and control access requests. For more information, see Project.

Q

Quota
A quota is a compute resource pool in MaxCompute. Quotas provide the computing resources that are required for running jobs. For more information, see Quota.

R

Resource
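A resource is a file or table that you upload to a MaxCompute project so that UDFs and MapReduce jobs can use it at run time. For example, before you can register a Java UDF, you must upload the JAR package that contains it as a resource. Resource types include File, Table, JAR, and Archive. For more information about resources, see Resource.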
Role
A role is a named collection of permissions in the MaxCompute security model. Instead of granting permissions to each user individually, you grant them to a role and then assign users to that role. Every user assigned to the role receives all of its permissions. One user can hold multiple roles, and the same role can be assigned to many users. For more information, see Role planning.
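The permission check implied by this model can be sketched in a few lines. The class below is a hypothetical in-memory illustration, not a MaxCompute API. The action name "Select" and the "table:sales" object label are example strings only. In the usage example, users are assigned to the role before the grant, which shows that a permission granted to a role reaches all of the role's existing members.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

Permission = Tuple[str, str]  # (action, object), e.g. ("Select", "table:sales")


@dataclass
class RoleModel:
    """Hypothetical in-memory model of role-based grants; not a MaxCompute API."""

    role_permissions: Dict[str, Set[Permission]] = field(default_factory=dict)
    user_roles: Dict[str, Set[str]] = field(default_factory=dict)

    def grant_to_role(self, role: str, action: str, obj: str) -> None:
        self.role_permissions.setdefault(role, set()).add((action, obj))

    def assign_role(self, user: str, role: str) -> None:
        self.user_roles.setdefault(user, set()).add(role)

    def is_allowed(self, user: str, action: str, obj: str) -> bool:
        # A user holds a permission if any role assigned to the user holds it.
        return any(
            (action, obj) in self.role_permissions.get(role, set())
            for role in self.user_roles.get(user, set())
        )


if __name__ == "__main__":
    model = RoleModel()
    model.assign_role("alice", "analyst")
    model.assign_role("bob", "analyst")
    model.grant_to_role("analyst", "Select", "table:sales")
    print(model.is_allowed("alice", "Select", "table:sales"))  # True
    print(model.is_allowed("bob", "Select", "table:sales"))    # True
    print(model.is_allowed("carol", "Select", "table:sales"))  # False
```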

S

Sandbox
A sandbox is an isolated execution environment that restricts what a program can do based on a defined security policy. MaxCompute runs MapReduce programs and UDFs in a distributed environment, so all such code executes inside a Java sandbox. The sandbox prevents malicious or buggy code from accessing local system resources or causing damage to the underlying infrastructure. For more information, see Java sandbox.
SDK
A Software Development Kit (SDK) is a set of libraries and tools that let software engineers build applications against a specific platform or service. MaxCompute provides SDKs for two languages:
- Java SDK
- Python SDK
Security
MaxCompute includes a multi-tenant data security system that covers user authentication, user and permission management, cross-project resource sharing, and project data protection. For an overview of the security model, see Permission overview.

T

Table
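A table is the basic unit of data storage in MaxCompute. Each table has a schema of typed columns and can optionally be partitioned. MaxCompute supports both internal tables, whose data is stored in MaxCompute, and external tables, whose data resides in an external storage service such as OSS. For more information about tables, see Table.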
Tunnel
MaxCompute Tunnel is a high-throughput data channel for moving bulk data into and out of MaxCompute. It supports highly concurrent offline uploads and downloads, so you can upload data to MaxCompute in batches or download it in batches to your on-premises machine. For usage details, see Tunnel commands or the Batch data tunnel SDK.
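The sketch below illustrates the general pattern behind a batched, concurrent upload. Records are split into numbered blocks, the blocks are uploaded in parallel, and the data becomes visible only after a final commit. FakeUploadSession, upload_block, and commit are hypothetical stand-ins that write to memory. The real Tunnel SDK has its own session and commit APIs, which are described in the Batch data tunnel SDK reference.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, Iterator, List, Tuple


def batch_records(records: Iterable, batch_size: int) -> Iterator[Tuple[int, List]]:
    """Group records into numbered batches (hypothetical block IDs starting at 0)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    batch: List = []
    block_id = 0
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield block_id, batch
            block_id += 1
            batch = []
    if batch:
        yield block_id, batch


class FakeUploadSession:
    """In-memory stand-in for an upload session (hypothetical, for illustration)."""

    def __init__(self):
        self._blocks = {}
        self._lock = threading.Lock()
        self.committed = None

    def upload_block(self, block_id, records):
        with self._lock:
            self._blocks[block_id] = list(records)

    def commit(self, block_ids):
        missing = [b for b in block_ids if b not in self._blocks]
        if missing:
            raise RuntimeError(f"blocks not uploaded: {missing}")
        # Assemble the committed data in block-ID order.
        self.committed = [r for b in sorted(block_ids) for r in self._blocks[b]]
        return len(self.committed)


def parallel_upload(session, records, batch_size, workers=4):
    batches = list(batch_records(records, batch_size))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(session.upload_block, bid, batch) for bid, batch in batches]
        for future in futures:
            future.result()  # re-raise any upload error before committing
    return session.commit([bid for bid, _ in batches])


if __name__ == "__main__":
    session = FakeUploadSession()
    print(parallel_upload(session, range(10), batch_size=3))  # 10
```

With a single commit at the end, a failed block upload raises an error before commit is called, so partially uploaded data is never committed.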

U

UDF
User-defined functions (UDFs) extend MaxCompute's built-in SQL functions with custom logic that you write in Java or Python. MaxCompute recognizes three types of UDFs, distinguished by their input-to-output mapping:
- UDF (scalar function): one-to-one mapping — one output value is returned for each input row.
- UDAF (user-defined aggregate function): many-to-one mapping — multiple input rows are aggregated into a single output value. UDAFs are used with the GROUP BY clause in SQL.
- UDTF (user-defined table-valued function): one-to-many mapping — a single call can return multiple rows or fields.
For an overview of UDF development, see MaxCompute UDF.
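The three input-to-output shapes can be compared side by side using plain Python functions. The function names below are hypothetical. They only illustrate the cardinality of each UDF type and do not use the MaxCompute UDF API, which wraps such logic in its own base classes.

```python
from typing import Iterable, Iterator, Optional, Tuple


def to_upper(value: Optional[str]) -> Optional[str]:
    """Scalar UDF shape: one input row -> one output value."""
    return None if value is None else value.upper()


def max_length(values: Iterable[Optional[str]]) -> int:
    """UDAF shape: many input rows -> one output value."""
    return max((len(v) for v in values if v is not None), default=0)


def split_tags(value: Optional[str], sep: str = ",") -> Iterator[Tuple[str]]:
    """UDTF shape: one input row -> zero or more output rows."""
    if value is None:
        return
    for part in value.split(sep):
        if part:
            yield (part,)


if __name__ == "__main__":
    column = ["a,b", "ccc", None]
    print([to_upper(v) for v in column])                   # ['A,B', 'CCC', None]
    print(max_length(column))                              # 3
    print([row for v in column for row in split_tags(v)])  # [('a',), ('b',), ('ccc',)]
```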
UDAF
A user-defined aggregate function (UDAF) aggregates multiple input records into a single output value, giving it a many-to-one input-to-output relationship. UDAFs are used with the GROUP BY clause of SQL statements and are part of the broader UDF family in MaxCompute. For more information, see UDAF.
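A distributed aggregate usually works through a buffer. Each worker folds its rows into a partial buffer, the partial buffers for each group are merged, and the merged buffer is turned into one output value. The plain-Python sketch below computes the equivalent of AVG per GROUP BY key in this way. The method names new_buffer, iterate, merge, and terminate are modeled on the Python UDAF interface in MaxCompute as we understand it. Check the UDAF reference for the exact contract. This class does not subclass odps.udf.BaseUDAF, so it can run outside MaxCompute. group_by_avg and the workers parameter are hypothetical.

```python
from collections import defaultdict


class AverageUDAF:
    """Plain-Python sketch of a buffer-based average aggregate."""

    def new_buffer(self):
        return [0.0, 0]  # running sum, row count

    def iterate(self, buffer, number):
        if number is not None:  # ignore NULL inputs
            buffer[0] += number
            buffer[1] += 1

    def merge(self, buffer, pbuffer):
        buffer[0] += pbuffer[0]
        buffer[1] += pbuffer[1]

    def terminate(self, buffer):
        return None if buffer[1] == 0 else buffer[0] / buffer[1]


def group_by_avg(rows, workers=2):
    """Simulate SELECT key, avg(val) ... GROUP BY key with partial aggregation."""
    udaf = AverageUDAF()
    # Each simulated worker keeps one partial buffer per group key.
    partials = [defaultdict(udaf.new_buffer) for _ in range(workers)]
    for i, (key, value) in enumerate(rows):
        udaf.iterate(partials[i % workers][key], value)
    # Merge the partial buffers per group, then emit one value per group.
    final = defaultdict(udaf.new_buffer)
    for partial in partials:
        for key, buf in partial.items():
            udaf.merge(final[key], buf)
    return {key: udaf.terminate(buf) for key, buf in final.items()}


if __name__ == "__main__":
    rows = [("a", 1), ("b", 10), ("a", 3), ("b", None), ("a", 5)]
    print(group_by_avg(rows))  # {'a': 3.0, 'b': 10.0}
```

The buffer stores a sum and a count instead of a running average because two partial averages cannot be merged correctly without knowing their counts.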
UDTF
A user-defined table-valued function (UDTF) is the only UDF type in MaxCompute that can return multiple rows or fields from a single function call. UDTFs are suited for scenarios where one input record should produce a variable-length set of output records. For more information, see UDTF.
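The following sketch shows a UDTF that turns one "k1:v1;k2:v2" string into a variable number of two-field rows. The process and forward method names are modeled on Python UDTFs in MaxCompute. This class does not subclass odps.udf.BaseUDTF, so forward simply collects rows in a list. The class name and input format are hypothetical.

```python
class ParseKeyValues:
    """Plain-Python sketch of a UDTF that emits one (key, value) row per pair."""

    def __init__(self):
        self.output = []

    def forward(self, *fields):
        # Each call emits one output row; a row can have several fields.
        self.output.append(fields)

    def process(self, text):
        if text is None:
            return  # a NULL input produces no rows
        for pair in text.split(";"):
            if ":" in pair:
                key, value = pair.split(":", 1)
                self.forward(key, value)


if __name__ == "__main__":
    udtf = ParseKeyValues()
    for row in ["a:1;b:2", None, "c:3"]:
        udtf.process(row)
    print(udtf.output)  # [('a', '1'), ('b', '2'), ('c', '3')]
```

In this example, the three input rows produce two, zero, and one output rows, respectively. This variable-length output is what distinguishes a UDTF from a scalar UDF.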
User
A user is an identity that can be granted access to a MaxCompute project. MaxCompute supports three identity types: an Alibaba Cloud account, a RAM user, and a user assigned a RAM role. The project owner has full access by default. All other users must be explicitly added to a project and granted the appropriate permissions before they can manage data, jobs, resources, or functions within it. For more information, see User planning and management.

V

View
A view is a virtual table whose schema and content are derived from a query over one or more base tables. A view stores the query definition rather than a copy of the data, so you can reuse a query's results without creating and maintaining additional tables. For more information, see View operations.
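In MaxCompute, you define a view with a CREATE VIEW ... AS SELECT statement. The hypothetical in-memory catalog below illustrates the key property: the view stores a query and runs it against the current base tables each time it is read, so rows inserted after the view is created still appear in it. Catalog and its methods are illustrative and are not a MaxCompute API.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


class Catalog:
    """Hypothetical in-memory catalog: a view stores a query, not a copy of the data."""

    def __init__(self) -> None:
        self.tables: Dict[str, List[Row]] = {}
        self.views: Dict[str, Callable[["Catalog"], List[Row]]] = {}

    def create_table(self, name: str, rows: List[Row]) -> None:
        self.tables[name] = list(rows)

    def insert(self, name: str, row: Row) -> None:
        self.tables[name].append(row)

    def create_view(self, name: str, query: Callable[["Catalog"], List[Row]]) -> None:
        self.views[name] = query

    def read(self, name: str) -> List[Row]:
        if name in self.tables:
            return list(self.tables[name])
        if name in self.views:
            # Re-run the stored query against the current base tables on every read.
            return self.views[name](self)
        raise KeyError(name)


if __name__ == "__main__":
    cat = Catalog()
    cat.create_table("orders", [{"id": 1, "amount": 50}, {"id": 2, "amount": 150}])
    cat.create_view("big_orders",
                    lambda c: [r for r in c.read("orders") if r["amount"] > 100])
    print(cat.read("big_orders"))  # [{'id': 2, 'amount': 150}]
    cat.insert("orders", {"id": 3, "amount": 300})
    print(cat.read("big_orders"))  # [{'id': 2, 'amount': 150}, {'id': 3, 'amount': 300}]
```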