collect_list is a MaxCompute V2.0 aggregate function that aggregates the values in a column specified by colname into an array.
Usage notes
collect_list is a MaxCompute V2.0 extension function. If your query uses new data types (TINYINT, SMALLINT, INT, FLOAT, VARCHAR, TIMESTAMP, BINARY), enable the MaxCompute V2.0 data type edition first.
-
Session level: Add the following statement before your SQL statement and submit both together:
set odps.sql.type.system.odps2=true; -
Project level: The project owner can enable the MaxCompute V2.0 data type edition for the entire project. The change takes effect after 10 to 15 minutes:
setproject odps.sql.type.system.odps2=true;For more information, see Project operations and Data type editions.
If your SQL statement uses multiple aggregate functions and project resources are insufficient, memory overflow may occur. Optimize the SQL statement or purchase additional computing resources.
Syntax
array collect_list(<colname>)
Parameters
| Parameter | Required | Description |
|---|---|---|
colname |
Yes | The column to aggregate. Accepts any data type. |
Return value
Returns a value of the ARRAY type. If a value of the column specified by colname is null, the row that contains this value is not used for calculation.
Sample data
The examples in this topic use a table named emp. Run the following statements to create and populate the table:
create table if not exists emp
(empno bigint,
ename string,
job string,
mgr bigint,
hiredate datetime,
sal bigint,
comm bigint,
deptno bigint);
tunnel upload emp.txt emp; -- Replace emp.txt with the actual path to your data file.
The emp.txt file contains the following data:
7369,SMITH,CLERK,7902,1980-12-17 00:00:00,800,,20
7499,ALLEN,SALESMAN,7698,1981-02-20 00:00:00,1600,300,30
7521,WARD,SALESMAN,7698,1981-02-22 00:00:00,1250,500,30
7566,JONES,MANAGER,7839,1981-04-02 00:00:00,2975,,20
7654,MARTIN,SALESMAN,7698,1981-09-28 00:00:00,1250,1400,30
7698,BLAKE,MANAGER,7839,1981-05-01 00:00:00,2850,,30
7782,CLARK,MANAGER,7839,1981-06-09 00:00:00,2450,,10
7788,SCOTT,ANALYST,7566,1987-04-19 00:00:00,3000,,20
7839,KING,PRESIDENT,,1981-11-17 00:00:00,5000,,10
7844,TURNER,SALESMAN,7698,1981-09-08 00:00:00,1500,0,30
7876,ADAMS,CLERK,7788,1987-05-23 00:00:00,1100,,20
7900,JAMES,CLERK,7698,1981-12-03 00:00:00,950,,30
7902,FORD,ANALYST,7566,1981-12-03 00:00:00,3000,,20
7934,MILLER,CLERK,7782,1982-01-23 00:00:00,1300,,10
7948,JACCKA,CLERK,7782,1981-04-12 00:00:00,5000,,10
7956,WELAN,CLERK,7649,1982-07-20 00:00:00,2450,,10
7956,TEBAGE,CLERK,7748,1982-12-30 00:00:00,1300,,10
Examples
Example 1: Collect all salary values into an array.
select collect_list(sal) from emp;
Result:
+------+
| _c0 |
+------+
| [800,1600,1250,2975,1250,2850,2450,3000,5000,1500,1100,950,3000,1300,5000,2450,1300] |
+------+
Example 2: Group by department and collect salary values for each group.
select deptno, collect_list(sal) from emp group by deptno;
Result:
+------------+------+
| deptno | _c1 |
+------------+------+
| 10 | [2450,5000,1300,5000,2450,1300] |
| 20 | [800,2975,3000,1100,3000] |
| 30 | [1600,1250,1250,2850,1500,950] |
+------------+------+
Example 3: Group by department and collect unique salary values for each group.
select deptno, collect_list(distinct sal) from emp group by deptno;
Result:
+------------+------+
| deptno | _c1 |
+------------+------+
| 10 | [1300,2450,5000] |
| 20 | [800,1100,2975,3000] |
| 30 | [950,1250,1500,1600,2850] |
+------------+------+
Example 4: Concatenate all salary values into a comma-separated string.
The following two queries produce the same result. Use WM_CONCAT for better performance:
-- Use collect_list with array_join
select array_join(collect_list(sal), ',') from emp;
-- Use WM_CONCAT (better performance)
select wm_concat(',', sal) from emp;
Result:
+-----+
| _c0 |
+-----+
| 800,1600,1250,2975,1250,2850,2450,3000,5000,1500,1100,950,3000,1300,5000,2450,1300 |
+-----+
Example 5: Retrieve an arbitrary value from a column.
The following two queries produce the same result. Use ANY_VALUE for better performance:
-- Use collect_list with index access
select collect_list(sal)[0] from emp;
-- Use ANY_VALUE (better performance)
select any_value(sal) from emp;
Result:
+------------+
| _c0 |
+------------+
| 800 |
+------------+
Example 6: Sort and concatenate employee names alphabetically.
The following two queries produce the same result. Use WM_CONCAT with WITHIN GROUP for better performance:
-- Use collect_list with sort_array
select concat_ws(',', sort_array(collect_list(ename))) from emp;
-- Use WM_CONCAT with WITHIN GROUP (better performance)
select wm_concat(',', ename) WITHIN group (order by ename) from emp;
Result:
+-----+
| _c0 |
+-----+
| ADAMS,ALLEN,BLAKE,CLARK,FORD,JACCKA,JAMES,JONES,KING,MARTIN,MILLER,SCOTT,SMITH,TEBAGE,TURNER,WARD,WELAN |
+-----+
Performance considerations
collect_list aggregates values into an ARRAY, which adds overhead when you need to further process those values. For common operations, use the alternatives listed below instead:
| Use case | collect_list | Alternative |
|---|---|---|
| Concatenate values into a string | array_join(collect_list(col), ',') |
WM_CONCAT(',', col) |
| Retrieve a single arbitrary value | collect_list(col)[0] |
ANY_VALUE(col) |
| Sort and concatenate values | concat_ws(',', sort_array(collect_list(col))) |
WM_CONCAT(',', col) WITHIN GROUP (ORDER BY col) |
Related functions
collect_list is an aggregate function. For related functions, see Aggregate functions.