All Products
Search
Document Center

MaxCompute:COLLECT_SET

Last Updated:Mar 26, 2026

COLLECT_SET is a MaxCompute V2.0 aggregate function that collects column values into an array of distinct elements, automatically removing duplicates.

Syntax

array collect_set(<colname>)

Parameters

ParameterRequiredDescription
colnameYesThe name of the column to be aggregated.

Return value

Returns an ARRAY of distinct values from the specified column. NULL values are excluded from the result.

Examples

The following examples use the emp table. To set up the table, run:

CREATE TABLE IF NOT EXISTS emp (
  empno   BIGINT,
  ename   STRING,
  job     STRING,
  mgr     BIGINT,
  hiredate DATETIME,
  sal     BIGINT,
  comm    BIGINT,
  deptno  BIGINT
);

tunnel upload emp.txt emp; -- Replace emp.txt with the actual path to your data file.

The emp.txt file contains:

7369,SMITH,CLERK,7902,1980-12-17 00:00:00,800,,20
7499,ALLEN,SALESMAN,7698,1981-02-20 00:00:00,1600,300,30
7521,WARD,SALESMAN,7698,1981-02-22 00:00:00,1250,500,30
7566,JONES,MANAGER,7839,1981-04-02 00:00:00,2975,,20
7654,MARTIN,SALESMAN,7698,1981-09-28 00:00:00,1250,1400,30
7698,BLAKE,MANAGER,7839,1981-05-01 00:00:00,2850,,30
7782,CLARK,MANAGER,7839,1981-06-09 00:00:00,2450,,10
7788,SCOTT,ANALYST,7566,1987-04-19 00:00:00,3000,,20
7839,KING,PRESIDENT,,1981-11-17 00:00:00,5000,,10
7844,TURNER,SALESMAN,7698,1981-09-08 00:00:00,1500,0,30
7876,ADAMS,CLERK,7788,1987-05-23 00:00:00,1100,,20
7900,JAMES,CLERK,7698,1981-12-03 00:00:00,950,,30
7902,FORD,ANALYST,7566,1981-12-03 00:00:00,3000,,20
7934,MILLER,CLERK,7782,1982-01-23 00:00:00,1300,,10
7948,JACCKA,CLERK,7782,1981-04-12 00:00:00,5000,,10
7956,WELAN,CLERK,7649,1982-07-20 00:00:00,2450,,10
7956,TEBAGE,CLERK,7748,1982-12-30 00:00:00,1300,,10

Example 1: Collect distinct salaries across all employees

SELECT collect_set(sal) FROM emp;

Result:

+------+
| _c0  |
+------+
| [800, 950, 1100, 1250, 1300, 1500, 1600, 2450, 2850, 2975, 3000, 5000] |
+------+

Example 2: Collect distinct salaries by department using GROUP BY

SELECT deptno, collect_set(sal) FROM emp GROUP BY deptno;

Result:

+------------+------+
| deptno     | _c1  |
+------------+------+
| 10         | [1300, 2450, 5000]           |
| 20         | [800, 1100, 2975, 3000]      |
| 30         | [950, 1250, 1500, 1600, 2850] |
+------------+------+

Usage notes

MaxCompute V2.0 data type edition

If your query involves new data types (TINYINT, SMALLINT, INT, FLOAT, VARCHAR, TIMESTAMP, or BINARY), enable the MaxCompute V2.0 data type edition.

Session level — add the following before your SQL statement and submit both together:

set odps.sql.type.system.odps2=true;

Project level — run the following command as the project owner. The change takes effect within 10 to 15 minutes.

setproject odps.sql.type.system.odps2=true;

See Project operations for details on setproject. For precautions about enabling the V2.0 data type edition at the project level, see Data type editions.

Memory overflow with multiple aggregate functions

When an SQL statement contains multiple aggregate functions and project computing resources are insufficient, memory overflow may occur. Optimize the SQL statement or purchase additional computing resources as needed.

Related functions

COLLECT_SET is an aggregate function. For other aggregate functions, see Aggregate functions.