Aggregate functions group multiple input records to form a single output record. You can use an aggregate function with a GROUP BY
clause of MaxCompute SQL. This topic describes the syntax, parameters, and examples of aggregate functions that are supported by MaxCompute SQL. It guides you through data development by using aggregate functions.
The following table describes the aggregate functions that are supported by MaxCompute SQL.
Function | Description |
---|---|
AVG | Returns the average value of a column. |
COUNT | Returns the number of records that match the specified criteria. |
COUNT_IF | Returns the number of records whose expr value is True. |
MAX | Returns the maximum value of a column. |
MIN | Returns the minimum value of a column. |
MEDIAN | Returns the median value of a column. |
STDDEV | Returns the population standard deviation of all the input values. |
STDDEV_SAMP | Returns the sample standard deviation of all the input values. |
SUM | Returns the sum of a column. |
WM_CONCAT | Concatenates strings with a specified delimiter. |
ANY_VALUE | Returns a random value from a specified column. |
APPROX_DISTINCT | Returns an approximate number of distinct input values in a specified column. |
ARG_MAX | Returns the column value of the row that corresponds to the maximum value of a specified column. |
ARG_MIN | Returns the column value of the row that corresponds to the minimum value of a specified column. |
MAX_BY | Returns the column value of the row that corresponds to the maximum value of a specified column. |
MIN_BY | Returns the column value of the row that corresponds to the minimum value of a specified column. |
COLLECT_LIST | Aggregates the specified columns into an array. |
COLLECT_SET | Aggregates distinct values from a specified column into an array. |
NUMERIC_HISTOGRAM | Returns an approximate histogram based on a specified column. |
PERCENTILE_APPROX | Returns approximate percentiles. This function applies to scenarios in which a large amount of data is calculated. |
BITWISE_OR_AGG | Aggregates input values based on the bitwise OR operation. |
BITWISE_AND_AGG | Aggregates input values based on the bitwise AND operation. |
MAP_AGG | Returns a map that is created by using a and b. a is the key in the map. b is the value of the key in the map. |
MULTIMAP_AGG | Returns a map that is created by using a and b. a is the key in the map. b is used to create an array, which is used as the value of the key in the map. |
MAP_UNION | Returns a new map that is the union of all input maps. |
MAP_UNION_SUM | Returns a new map that is the union of all input maps. The output map sums the values of the matching keys in all input maps. |
HISTOGRAM | Returns a map that contains the number of times each input value appears. |
Usage notes
MaxCompute V2.0 provides extension functions. If the functions that you use involve new data types, you must run one of the following SET commands to enable the MaxCompute V2.0 data type edition. The new data types include TINYINT, SMALLINT, INT, FLOAT, VARCHAR, TIMESTAMP, and BINARY.- Session level: To use the MaxCompute V2.0 data type edition, you must add
set odps.sql.type.system.odps2=true;
before the SQL statement that you want to execute, and commit and execute them together. - Project level: The project owner can enable the MaxCompute V2.0 data type edition for the project based on the project requirements. The configuration takes effect after 10 to 15 minutes. Run the following command:
For more information aboutsetproject odps.sql.type.system.odps2=true;
setproject
, see Project operations. For more information about the precautions that you must take when you enable the MaxCompute V2.0 data type edition at the project level, see Data type editions. - A Worker can contain a maximum of 2 million elements.
If you use an SQL statement that includes multiple aggregate functions and the project resources are insufficient, memory overflow may occur. We recommend that you optimize the SQL statement or purchase computing resources based on your business requirements.
Syntax
Syntax of an aggregate function:
<aggregate_name>(<expression>[,...]) [within group (order by <col1>[,<col2>…])] [filter (where <where_condition>)]
<aggregate_name>(<expression>[,...])
: a built-in aggregate function or a user-defined aggregate function (UDAF). The format of an aggregate function is based on its syntax.within group (order by <col1>[,<col2>…])
: If the syntax of an aggregate function includes this expression, the system automatically sorts the input data of<col1>[,<col2>…]
in ascending order. To sort the input data in descending order, use the expressionwithin group (order by <col1>[,<col2>…] [desc])
.Before you use this expression, take note of the following limits:- You can use this expression only for WM_CONCAT, COLLECT_LIST, COLLECT_SET, and UDAFs.
- If multiple aggregate functions of a SELECT statement include the expression
within group (order by <col1>[,<col2>…])
,order by <col1>[,<col2>…]
must be the same for these functions. - If the parameters of an aggregate function include the DISTINCT keyword, columns with distinct values must be specified in the expression
order by <col1>[,<col2>…]
.
Examples:-- Example 1: Sort the input data in ascending order and return the output data. select x, wm_concat(',', y) within group (order by y) from values('k', 1),('k', 3),('k', 2) as t(x, y) group by x; -- The following result is returned: +------------+------------+ | x | _c1 | +------------+------------+ | k | 1,2,3 | +------------+------------+ -- Example 2: Sort the input data in descending order and return the output data. select x, wm_concat(',', y) within group (order by y desc) from values('k', 1),('k', 3),('k', 2) as t(x, y) group by x; -- The following result is returned: +------------+------------+ | x | _c1 | +------------+------------+ | k | 3,2,1 | +------------+------------+
[filter (where <where_condition>)]
: If an aggregate function includes this expression, the aggregate function processes only the data that meets the condition specified by<where_condition>
. For more information about<where_condition>
, see WHERE clause (where_condition).Before you use this expression, take note of the following limits:Examples:-- Example 1: Filter and aggregate data. select sum(x), sum(x) filter (where y > 1), sum(x) filter (where y > 2) from values(null, 1),(1, 2),(2, 3),(3, null) as t(x, y); -- The following result is returned: +------------+------------+------------+ | _c0 | _c1 | _c2 | +------------+------------+------------+ | 6 | 3 | 2 | +------------+------------+------------+ -- Example 2: Use multiple aggregate functions to filter and aggregate data. select count_if(x > 2), sum(x) filter (where y > 1), sum(x) filter (where y > 2) from values(null, 1),(1, 2),(2, 3),(3, null) as t(x, y); -- The following result is returned: +------------+------------+------------+ | _c0 | _c1 | _c2 | +------------+------------+------------+ | 1 | 3 | 2 | +------------+------------+------------+
Sample data
create table if not exists emp
(empno bigint,
ename string,
job string,
mgr bigint,
hiredate datetime,
sal bigint,
comm bigint,
deptno bigint);
tunnel upload emp.txt emp;
7369,SMITH,CLERK,7902,1980-12-17 00:00:00,800,,20
7499,ALLEN,SALESMAN,7698,1981-02-20 00:00:00,1600,300,30
7521,WARD,SALESMAN,7698,1981-02-22 00:00:00,1250,500,30
7566,JONES,MANAGER,7839,1981-04-02 00:00:00,2975,,20
7654,MARTIN,SALESMAN,7698,1981-09-28 00:00:00,1250,1400,30
7698,BLAKE,MANAGER,7839,1981-05-01 00:00:00,2850,,30
7782,CLARK,MANAGER,7839,1981-06-09 00:00:00,2450,,10
7788,SCOTT,ANALYST,7566,1987-04-19 00:00:00,3000,,20
7839,KING,PRESIDENT,,1981-11-17 00:00:00,5000,,10
7844,TURNER,SALESMAN,7698,1981-09-08 00:00:00,1500,0,30
7876,ADAMS,CLERK,7788,1987-05-23 00:00:00,1100,,20
7900,JAMES,CLERK,7698,1981-12-03 00:00:00,950,,30
7902,FORD,ANALYST,7566,1981-12-03 00:00:00,3000,,20
7934,MILLER,CLERK,7782,1982-01-23 00:00:00,1300,,10
7948,JACCKA,CLERK,7782,1981-04-12 00:00:00,5000,,10
7956,WELAN,CLERK,7649,1982-07-20 00:00:00,2450,,10
7956,TEBAGE,CLERK,7748,1982-12-30 00:00:00,1300,,10
Filter expressions
- Limits
- Only built-in aggregate functions of MaxCompute support filter expressions. UDAFs do not support filter expressions.
COUNT(*)
cannot be used with filter expressions. Use the COUNT_IF function instead if you want to specify a filter expression.
- Syntax
<aggregate_name>(<expression>[,...]) [filter (where <where_condition>)]
- Description
All aggregate functions support filter expressions. If you specify a filter condition, only the row data that meets the filter condition can be passed to the related aggregate function for data processing.
- Parameters
- aggregate_name: required. The name of the aggregate function. Select an aggregate function that is described in this topic based on your business requirements.
- expression: required. The parameters of the aggregate function that you select. Specify this parameter based on the description of the aggregate function that you select.
- where_condition: optional. The filter condition. For more information about where_condition, see WHERE clause (where_condition).
- Return value
For more information, see the description of the return value for each aggregate function.
- Example
The following result is returned:select sum(sal) filter (where deptno=10), sum(sal) filter (where deptno=20), sum(sal) filter (where deptno=30) from emp;
+------------+------------+------------+ | _c0 | _c1 | _c2 | +------------+------------+------------+ | 17500 | 10875 | 9400 | +------------+------------+------------+
AVG
- Syntax
DECIMAL|DOUBLE avg(<colname>)
- Description
Returns the average value of a column.
- Parameters
colname: required. Column values support all data types and can be converted into the DOUBLE type before calculation.
- Return value
If the value of colname is null, the row that contains this value is not used for calculation. The following table describes the mappings between data types of input data and return values.
Input type Return value type TINYINT DOUBLE SMALLINT DOUBLE INT DOUBLE BIGINT DOUBLE FLOAT DOUBLE DOUBLE DOUBLE DECIMAL DECIMAL - Examples
- Example 1: Calculate the average salary (sal) values of all employees. Sample statement:
The following result is returned:select avg(sal) from emp;
+------------+ | _c0 | +------------+ | 2222.0588235294117 | +------------+
- Example 2: Use this function with
GROUP BY
to group all employees by department (deptno) and calculate the average salary (sal) values of employees in each department. Sample statement:
The following result is returned:select deptno, avg(sal) from emp group by deptno;
+------------+------------+ | deptno | _c1 | +------------+------------+ | 10 | 2916.6666666666665 | | 20 | 2175.0 | | 30 | 1566.6666666666667 | +------------+------------+
- Example 1: Calculate the average salary (sal) values of all employees. Sample statement:
COUNT
- Syntax
bigint count([distinct|all] <colname>)
- Description
Returns the number of records that match the specified criteria.
- Parameters
- distinctall: optional. This parameter specifies whether to remove duplicates during the counting. The default value is all, which indicates that all records are counted. If this parameter is set to distinct, only records with distinct values are counted.
- colname: required. The name of a column, which can be of any data type. The value of colname can be an asterisk (
*
).count(*)
indicates that the number of all rows is returned.
- Return value
A value of the BIGINT type is returned. If the value of colname is null, the row that contains this value is not used for calculation.
- Examples
- Example 1: Calculate the total number of employees in all departments. Sample statement:
The following result is returned:select count(*) from emp;
+------------+ | _c0 | +------------+ | 17 | +------------+
- Example 2: Use this function with
GROUP BY
to group all employees by department (deptno) and calculate the number of employees in each department. Sample statement:
The following result is returned:select deptno, count(*) from emp group by deptno;
+------------+------------+ | deptno | _c1 | +------------+------------+ | 10 | 6 | | 20 | 5 | | 30 | 6 | +------------+------------+
- Example 3: Remove duplicates when you calculate the number of departments. Sample statement:
The following result is returned:select count(distinct deptno) from emp;
+------------+ | _c0 | +------------+ | 3 | +------------+
- Example 1: Calculate the total number of employees in all departments. Sample statement:
COUNT_IF
- Syntax
bigint count_if(boolean <expr>)
- Description
Returns the number of records whose expr value is True.
- Parameters
expr: required. A BOOLEAN expression.
- Return value
A value of the BIGINT type is returned. If the value of the expr parameter is False or the value of a specified column in expr is null, the row that contains this value is not used for calculation.
- Example
The following result is returned:select count_if(sal > 1000), count_if(sal <=1000) from emp;
+------------+------------+ | _c0 | _c1 | +------------+------------+ | 15 | 2 | +------------+------------+
MAX
- Syntax
max(<colname>)
- Description
Returns the maximum value of a column.
- Parameters
colname: required. The name of a column, which can be of any data type other than BOOLEAN.
- Return value
The type of the return value is the same as the type of the colname parameter. The return value varies based on the following rules:
- If the value of colname is null, the row that contains this value is not used for calculation.
- If the value of colname is of the BOOLEAN type, the value is not used for calculation.
- Examples
- Example 1: Calculate the highest salary (sal) of all employees. Sample statement:
The following result is returned:select max(sal) from emp;
+------------+ | _c0 | +------------+ | 5000 | +------------+
- Example 2: Use this function with
GROUP BY
to group all employees by department (deptno) and calculate the highest salary of employees in each department. Sample statement:
The following result is returned:select deptno, max(sal) from emp group by deptno;
+------------+------------+ | deptno | _c1 | +------------+------------+ | 10 | 5000 | | 20 | 3000 | | 30 | 2850 | +------------+------------+
- Example 1: Calculate the highest salary (sal) of all employees. Sample statement:
MIN
- Syntax
min(<colname>)
- Description
Returns the minimum value of a column.
- Parameters
colname: required. The name of a column, which can be of any data type other than BOOLEAN.
- Return value
The type of the return value is the same as the type of the colname parameter. The return value varies based on the following rules:
- If the value of colname is null, the row that contains this value is not used for calculation.
- If the value of colname is of the BOOLEAN type, the value is not used for calculation.
- Examples
- Example 1: Calculate the lowest salary (sal) of all employees. Sample statement:
The following result is returned:select min(sal) from emp;
+------------+ | _c0 | +------------+ | 800 | +------------+
- Example 2: Use this function with
GROUP BY
to group all employees by department (deptno) and calculate the lowest salary of employees in each department. Sample statement:
The following result is returned:select deptno, min(sal) from emp group by deptno;
+------------+------------+ | deptno | _c1 | +------------+------------+ | 10 | 1300 | | 20 | 800 | | 30 | 950 | +------------+------------+
- Example 1: Calculate the lowest salary (sal) of all employees. Sample statement:
MEDIAN
- Syntax
double median(double <colname>) decimal median(decimal <colname>)
- Description
Returns the median value of a column.
- Parameters
colname: required. The name of a column, which can be of the DOUBLE or DECIMAL type. If the input value is of the STRING or BIGINT type, the value is implicitly converted into the DOUBLE type before calculation.
- Return value
If the value of colname is null, the row that contains this value is not used for calculation. The following table describes the mappings between data types of input data and return values.
Input type Return value type TINYINT DOUBLE SMALLINT DOUBLE INT DOUBLE BIGINT DOUBLE FLOAT DOUBLE DOUBLE DOUBLE DECIMAL DECIMAL - Examples
- Example 1: Calculate the median salary (sal) values of all employees. Sample statement:
The following result is returned:select median(sal) from emp;
+------------+ | _c0 | +------------+ | 1600.0 | +------------+
- Example 2: Use this function with
GROUP BY
to group all employees by department (deptno) and calculate the median salary values of employees in each department. Sample statement:
The following result is returned:select deptno, median(sal) from emp group by deptno;
+------------+------------+ | deptno | _c1 | +------------+------------+ | 10 | 2450.0 | | 20 | 2975.0 | | 30 | 1375.0 | +------------+------------+
- Example 1: Calculate the median salary (sal) values of all employees. Sample statement:
STDDEV
- Syntax
double stddev(double <colname>) decimal stddev(decimal <colname>)
- Description
Returns the population standard deviation of all the input values.
- Parameters
colname: required. The name of a column, which is of the DOUBLE or DECIMAL type. If the input value is of the STRING or BIGINT type, it is implicitly converted into the DOUBLE type before calculation.
- Return value
If the value of colname is null, the row that contains this value is not used for calculation. The following table describes the mappings between data types of input data and return values.
Input type Return value type TINYINT DOUBLE SMALLINT DOUBLE INT DOUBLE BIGINT DOUBLE FLOAT DOUBLE DOUBLE DOUBLE DECIMAL DECIMAL - Examples
- Example 1: Calculate the population standard deviation of salary (sal) values of all employees. Sample statement:
The following result is returned:select stddev(sal) from emp;
+------------+ | _c0 | +------------+ | 1262.7549932628976 | +------------+
- Example 2: Use this function with
GROUP BY
to group all employees by department (deptno) and calculate the population standard deviation of salary values of employees in each department. Sample statement:
The following result is returned:select deptno, stddev(sal) from emp group by deptno;
+------------+------------+ | deptno | _c1 | +------------+------------+ | 10 | 1546.1421524412158 | | 20 | 1004.7387720198718 | | 30 | 610.1001739241043 | +------------+------------+
- Example 1: Calculate the population standard deviation of salary (sal) values of all employees. Sample statement:
STDDEV_SAMP
- Syntax
double stddev_samp(double <colname>) decimal stddev_samp(decimal <colname>)
- Description
Returns the sample standard deviation of all the input values.
- Parameters
colname: required. The name of a column, which can be of the DOUBLE or DECIMAL type. If the input value is of the STRING or BIGINT type, the value is implicitly converted into the DOUBLE type before calculation.
- Return value
If the value of colname is null, the row that contains this value is not used for calculation. The following table describes the mappings between data types of input data and return values.
Input type Return value type TINYINT DOUBLE SMALLINT DOUBLE INT DOUBLE BIGINT DOUBLE FLOAT DOUBLE DOUBLE DOUBLE DECIMAL DECIMAL - Examples
- Example 1: Calculate the sample standard deviation of salary (sal) values of all employees. Sample statement:
The following result is returned:select stddev_samp(sal) from emp;
+------------+ | _c0 | +------------+ | 1301.6180541247609 | +------------+
- Example 2: Use this function with
GROUP BY
to group all employees by department (deptno) and calculate the sample standard deviation of salary values of employees in each department. Sample statement:
The following result is returned:select deptno, stddev_samp(sal) from emp group by deptno;
+------------+------------+ | deptno | _c1 | +------------+------------+ | 10 | 1693.7138680032901 | | 20 | 1123.3320969330487 | | 30 | 668.3312551921141 | +------------+------------+
- Example 1: Calculate the sample standard deviation of salary (sal) values of all employees. Sample statement:
SUM
- Syntax
DECIMAL|DOUBLE|BIGINT sum(<colname>)
- Description
Returns the sum of a column.
- Parameters
colname: required. Column values support all data types and can be converted into the DOUBLE type before calculation. The name of a column, which is of the DOUBLE, DECIMAL, or BIGINT type. If the input value is of the STRING type, the value is implicitly converted into the DOUBLE type before calculation.
- Return value
If the value of colname is null, the row that contains this value is not used for calculation. The following table describes the mappings between data types of input data and return values.
Input type Return value type TINYINT BIGINT SMALLINT BIGINT INT BIGINT BIGINT BIGINT FLOAT DOUBLE DOUBLE DOUBLE DECIMAL DECIMAL - Examples
- Example 1: Calculate the sum of salary (sal) values of all employees. Sample statement:
The following result is returned:select sum(sal) from emp;
+------------+ | _c0 | +------------+ | 37775 | +------------+
- Example 2: Use this function with
GROUP BY
to group all employees by department (deptno) and calculate the sum of salary values of employees in each department. Sample statement:
The following result is returned:select deptno, sum(sal) from emp group by deptno;
+------------+------------+ | deptno | _c1 | +------------+------------+ | 10 | 17500 | | 20 | 10875 | | 30 | 9400 | +------------+------------+
- Example 1: Calculate the sum of salary (sal) values of all employees. Sample statement:
WM_CONCAT
- Syntax
string wm_concat(string <separator>, string <colname>)
- Description
Concatenates values in colname with a delimiter that is specified by separator.
- Parameters
- separator: required. The delimiter, which is a constant of the STRING type.
- colname: required. The name of a column, which is of the STRING type. If the input value is of the BIGINT, DOUBLE, or DATETIME type, the value is implicitly converted into the STRING type before calculation.
- Return value (The
GROUP BY
clause is specified, and the ORDER BY clause is not specified.)A value of the STRING type is returned. The return value varies based on the following rules:- If the value of separator is not a constant of the STRING type, an error is returned.
- If the value of colname is not of the STRING, BIGINT, DOUBLE, or DATETIME type, an error is returned.
- If the value of colname is null, the row that contains this value is not used for calculation.
Note Iftable_name
in theselect wm_concat(',', name) from table_name;
statement is an empty set, null is returned. - Examples
- Example 1: Concatenate the names (ename) of all employees. Sample statement:
The following result is returned:select wm_concat(',', ename) from emp;
+------------+ | _c0 | +------------+ | SMITH,ALLEN,WARD,JONES,MARTIN,BLAKE,CLARK,SCOTT,KING,TURNER,ADAMS,JAMES,FORD,MILLER,JACCKA,WELAN,TEBAGE | +------------+
- Example 2: Use this function with
GROUP BY
to group all employees by department (deptno) and concatenate the names (ename) of employees in each department. Sample statement:
The following result is returned:select deptno, wm_concat(',', ename) from emp group by deptno order by deptno;
+------------+------------+ | deptno | _c1 | +------------+------------+ | 10 | CLARK,KING,MILLER,JACCKA,WELAN,TEBAGE | | 20 | SMITH,JONES,SCOTT,ADAMS,FORD | | 30 | ALLEN,WARD,MARTIN,BLAKE,TURNER,JAMES | +------------+------------+
- Example 3: Use this function with
GROUP BY
to group all employees by department (deptno) and concatenate the salary values of the employees in each department after duplicates are removed. Sample statement:
The following result is returned:select deptno, wm_concat(distinct ',', sal) from emp group by deptno order by deptno;
+------------+------------+ | deptno | _c1 | +------------+------------+ | 10 | 1300,2450,5000 | | 20 | 1100,2975,3000,800 | | 30 | 1250,1500,1600,2850,950 | +------------+------------+
- Example 4: Use this function with
GROUP BY
andORDER BY
to group all employees by department (deptno), concatenate the salary values (sal) of all employees in each department, and sort the salary values (sal) in a specified order. Sample statement:
The following result is returned:select deptno, wm_concat(',', sal) within group(order by sal) from emp group by deptno order by deptno;
+------------+------------+ | deptno | _c1 | +------------+------------+ | 10 | 1300,1300,2450,2450,5000,5000 | | 20 | 800,1100,2975,3000,3000 | | 30 | 950,1250,1250,1500,1600,2850 | +------------+------------+
- Example 1: Concatenate the names (ename) of all employees. Sample statement:
ANY_VALUE
- Syntax
any_value(<colname>)
- Description
Returns a random value from a specified column. This function is an extension function of MaxCompute V2.0.
- Parameters
colname: required. The name of a column, which can be of any data type.
- Return value
The data type of the return value is the same as the data type of the colname parameter. If the value of colname is null, the row that contains this value is not used for calculation.
- Examples
- Example 1: Select one of the employees. Sample statement:
The following result is returned:select any_value(ename) from emp;
+------------+ | _c0 | +------------+ | SMITH | +------------+
- Example 2: Use this function with
GROUP BY
to group all employees by department (deptno) and select one employee from each group. Sample statement:
The following result is returned:select deptno, any_value(ename) from emp group by deptno;
+------------+------------+ | deptno | _c1 | +------------+------------+ | 10 | CLARK | | 20 | SMITH | | 30 | ALLEN | +------------+------------+
- Example 1: Select one of the employees. Sample statement:
ARG_MAX
- Syntax
arg_max(<valueToMaximize>, <valueToReturn>)
- Description
Finds the row in which the maximum value of valueToMaximize is included and returns the value of valueToReturn in the row. This function is an extension function of MaxCompute V2.0.
- Parameters
- valueToMaximize: required. Data of any data type.
- valueToReturn: required. Data of any data type.
- Return value
The data type of the return value is the same as the data type of the valueToReturn parameter. If multiple rows have the largest value of valueToMaximize, the value of valueToReturn in one of the rows is randomly returned. If the value of valueToMaximize is null, the row that contains this value is not used for calculation.
- Examples
- Example 1: Return the name of the employee with the highest salary. Sample statement:
The following result is returned:select arg_max(sal, ename) from emp;
+------------+ | _c0 | +------------+ | KING | +------------+
- Example 2: Use this function with
GROUP BY
to group all employees based on the deptno column and return the name of the employee with the highest salary in each group. Sample statement:
The following result is returned:select deptno, arg_max(sal, ename) from emp group by deptno;
+------------+------------+ | deptno | _c1 | +------------+------------+ | 10 | KING | | 20 | SCOTT | | 30 | BLAKE | +------------+------------+
- Example 1: Return the name of the employee with the highest salary. Sample statement:
ARG_MIN
- Syntax
arg_min(<valueToMinimize>, <valueToReturn>)
- Description
Finds the row in which the minimum value of valueToMinimize is included and returns the value of valueToReturn in the row. This function is an extension function of MaxCompute V2.0.
- Parameters
- valueToMinimize: required. Data of any data type.
- valueToReturn: required. Data of any data type.
- Return value
The data type of the return value is the same as the data type of the valueToReturn parameter. If multiple rows have the smallest value of valueToMinimize, the value of valueToReturn in one of the rows is randomly returned. If the value of valueToMinimize is null, the row that contains this value is not used for calculation.
- Examples
- Example 1: Return the name of the employee with the lowest salary. Sample statement:
The following result is returned:select arg_min(sal, ename) from emp;
+------------+ | _c0 | +------------+ | SMITH | +------------+
- Example 2: Use this function with
GROUP BY
to group all employees based on the deptno column and return the name of the employee with the lowest salary in each group. Sample statement:
The following result is returned:select deptno, arg_min(sal, ename) from emp group by deptno;
+------------+------------+ | deptno | _c1 | +------------+------------+ | 10 | MILLER | | 20 | SMITH | | 30 | JAMES | +------------+------------+
- Example 1: Return the name of the employee with the lowest salary. Sample statement:
MAX_BY
- Syntax
max_by(<valueToReturn>,<valueToMaximize>)
- Description
Finds the row in which the maximum value of valueToMaximize is included and returns the value of valueToReturn in the row. This function is an extension function of MaxCompute V2.0.
- Parameters
- valueToMaximize: required. Data of any data type.
- valueToReturn: required. Data of any data type.
- Return value
The data type of the return value is the same as the data type of the valueToReturn parameter. If multiple rows have the largest value of valueToMaximize, the value of valueToReturn in one of the rows is randomly returned. If the value of valueToMaximize is null, the row that contains this value is not used for calculation.
- Examples
- Example 1: Return the name of the employee with the highest salary. Sample statement:
The following result is returned:select max_by(ename,sal) from emp;
+------------+ | _c0 | +------------+ | KING | +------------+
- Example 2: Use this function with
GROUP BY
to group all employees based on the deptno column and return the name of the employee with the highest salary in each group. Sample statement:
The following result is returned:select deptno, max_by(ename,sal) from emp group by deptno;
+------------+------------+ | deptno | _c1 | +------------+------------+ | 10 | KING | | 20 | SCOTT | | 30 | BLAKE | +------------+------------+
- Example 1: Return the name of the employee with the highest salary. Sample statement:
MIN_BY
- Syntax
min_by(<valueToReturn>,<valueToMinimize>)
- Description
Finds the row in which the minimum value of valueToMinimize is included and returns the value of valueToReturn in the row. This function is an extension function of MaxCompute V2.0.
- Parameters
- valueToMinimize: required. Data of any data type.
- valueToReturn: required. Data of any data type.
- Return value
The data type of the return value is the same as the data type of the valueToReturn parameter. If multiple rows have the smallest value of valueToMinimize, the value of valueToReturn in one of the rows is randomly returned. If the value of valueToMinimize is null, the row that contains this value is not used for calculation.
- Examples
- Example 1: Return the name of the employee with the lowest salary. Sample statement:
The following result is returned:select min_by(ename,sal) from emp;
+------------+ | _c0 | +------------+ | SMITH | +------------+
- Example 2: Use this function with
GROUP BY
to group all employees based on the deptno column and return the name of the employee with the lowest salary in each group. Sample statement:
The following result is returned:select deptno, min_by(ename,sal) from emp group by deptno;
+------------+------------+ | deptno | _c1 | +------------+------------+ | 10 | MILLER | | 20 | SMITH | | 30 | JAMES | +------------+------------+
- Example 1: Return the name of the employee with the lowest salary. Sample statement:
APPROX_DISTINCT
- Syntax
approx_distinct(<colname>)
- Description
Returns the approximate number of distinct input values in a specified column. This function is an extension function of MaxCompute V2.0.
- Parameters
colname: required. The name of the column from which duplicates need to be removed.
- Return value
A value of the BIGINT type is returned. This function produces a standard error of 5%. If the value of colname is null, the row that contains this value is not used for calculation.
- Examples
- Example 1: Calculate the approximate number of distinct values in the sal column. Sample statement:
The following result is returned:select approx_distinct(sal) from emp;
+-------------------+ | numdistinctvalues | +-------------------+ | 12 | +-------------------+
- Example 2: Use this function with
GROUP BY
to group all employees based on the deptno column and calculate the approximate number of distinct values in the sal column. Sample statement:
The following result is returned:select deptno, approx_distinct(sal) from emp group by deptno;
+------------+-------------------+ | deptno | numdistinctvalues | +------------+-------------------+ | 10 | 3 | | 20 | 4 | | 30 | 5 | +------------+-------------------+
- Example 1: Calculate the approximate number of distinct values in the sal column. Sample statement:
COLLECT_LIST
- Syntax
array collect_list(<colname>)
- Description
Aggregates the values in a column specified by colname into an array. This function is an extension function of MaxCompute V2.0.
- Parameters
colname: required. The name of a column, which can be of any data type.
- Return value
A value of the ARRAY type is returned. If the value of colname is null, the row that contains this value is not used for calculation.
- Examples
- Example 1: Aggregate the values in the sal column of all employees into an array. Sample statement:
The following result is returned:select collect_list(sal) from emp;
+------------+ | _c0 | +------------+ | [800,1600,1250,2975,1250,2850,2450,3000,5000,1500,1100,950,3000,1300,5000,2450,1300] | +------------+
- Example 2: Use this function with
GROUP BY
to group all employees based on the deptno column and aggregate the values in the sal column of the employees in each group. Sample statement:
The following result is returned:select deptno, collect_list(sal) from emp group by deptno;
+------------+------------+ | deptno | _c1 | +------------+------------+ | 10 | [2450,5000,1300,5000,2450,1300] | | 20 | [800,2975,3000,1100,3000] | | 30 | [1600,1250,1250,2850,1500,950] | +------------+------------+
- Example 3: Use this function with
GROUP BY
to group all employees based on the deptno column and aggregate the values in the sal column of the employees in each group after duplicates are removed. Sample statement:
The following result is returned:select deptno, collect_list(distinct sal) from emp group by deptno;
+------------+------------+ | deptno | _c1 | +------------+------------+ | 10 | [1300,2450,5000] | | 20 | [800,1100,2975,3000] | | 30 | [950,1250,1500,1600,2850] | +------------+------------+
- Example 1: Aggregate the values in the sal column of all employees into an array. Sample statement:
COLLECT_SET
- Syntax
array collect_set(<colname>)
- Description
Aggregates the values in a column specified by colname into an array with only distinct values. This function is an extension function of MaxCompute V2.0.
- Parameters
colname: required. The name of a column, which can be of any data type.
- Return value
A value of the ARRAY type is returned. If the value of colname is null, the row that contains this value is not used for calculation.
- Examples
- Example 1: Aggregate the values in the sal column of all employees into an array with only distinct values. Sample statement:
The following result is returned:select collect_set(sal) from emp;
+------------+ | _c0 | +------------+ | [800,950,1100,1250,1300,1500,1600,2450,2850,2975,3000,5000] | +------------+
- Example 2: Use this function with
GROUP BY
to group all employees based on the deptno column and aggregate the values in the sal column of the employees in each group into an array with only distinct values. Sample statement:
The following result is returned:select deptno, collect_set(sal) from emp group by deptno;
+------------+------------+ | deptno | _c1 | +------------+------------+ | 10 | [1300,2450,5000] | | 20 | [800,1100,2975,3000] | | 30 | [950,1250,1500,1600,2850] | +------------+------------+
- Example 1: Aggregate the values in the sal column of all employees into an array with only distinct values. Sample statement:
NUMERIC_HISTOGRAM
- Syntax
map<double key, double value> numeric_histogram(bigint <buckets>, double <colname> [, double <weight>])
- Description
Returns the approximate histogram of a specified column. This function is an extension function of MaxCompute V2.0.
- Parameters
- buckets: required. A value of the BIGINT type. This parameter specifies the maximum number of buckets in the column whose approximate histogram is returned.
- colname: required. A value of the DOUBLE type. This parameter specifies the columns whose approximate histograms need to be calculated.
- weight: optional. The weight value of the DOUBLE type. This parameter specifies the weight of a row of data.
- Return value
A value of the
map<double key, double value>
type is returned. In the return value, key indicates the X-axis of the approximate histogram, and value indicates the approximate height of the Y-axis of the approximate histogram. The return value varies based on the following rules:- If the value of buckets is null, null is returned.
- If the value of colname is null, the row that contains this value is not used for calculation.
- Examples
- Return an approximate histogram of the sal column. Sample statement:
The following result is returned:select numeric_histogram(5, sal) from emp;
+------------+ | _c0 | +------------+ | {"1328.5714285714287":7.0,"2450.0":2.0,"5000.0":2.0,"875.0":2.0,"2956.25":4.0} | +------------+
- Return the approximate histogram of the sal column. The
deptno
in each row of data indicates the department weight. Sample statement:
The following result is returned:select numeric_histogram(5, sal, deptno) from emp;
+------------+ | _c0 | +------------+ | {"2944.4444444444443":90.0,"2450.0":20.0,"5000.0":20.0,"890.0":50.0,"1350.0":160.0} | +------------+
- Return an approximate histogram of the sal column. Sample statement:
PERCENTILE_APPROX
- Syntax
double percentile_approx (double <colname>[, double <weight>], <p> [, <B>])) -- Return multiple approximate percentiles as an array. array<double> percentile_approx (double <colname> [, double <weight>], array(<p1> [, <p2>...]) [, <B>])
- Description
Returns approximate percentiles. This function applies to scenarios in which a large amount of data is calculated. This function sorts the values in a specified column in ascending order and returns the approximate percentile of the column at the pth percentage. The value of
percentile_approx
starts from 1. For example, if a column contains values of 100, 200, and 300, the sequence numbers of these values are 1, 2, and 3. If you calculate the 0.6th percentile of the column, thePERCENTILE_APPROX
function returns 180, which is calculated by using the following formula:100 + (200 - 100) × 0.8
. This value corresponds to 1.8, which is calculated by using the formula 3 × 0.6 and is between sequence numbers 1 and 2. This function is an extension function of MaxCompute V2.0.Note Differences betweenPERCENTILE_APPROX
andPERCENTILE
:PERCENTILE_APPROX
is used to calculate the approximate percentile, andPERCENTILE
is used to calculate the exact percentile. If the amount of data is large,PERCENTILE
may fail to run due to memory limits, butPERCENTILE_APPROX
does not have this issue.- The implementation of
PERCENTILE_APPROX
is consistent with that ofPERCENTILE_APPROX
in Hive, but the calculation algorithm of PERCENTILE_APPROX is different from that ofPERCENTILE
. If the amount of data is small, the execution result of PERCENTILE_APPROX is different from that ofPERCENTILE
.
- Parameters
- colname: required. The name of a column, which is of the DOUBLE type.
- weight: optional. The weight value of the DOUBLE type. This parameter specifies the weight of a row of data.
- p: required. The approximate percentile. Valid values:
[0.0, 1.0]
. - B: the accuracy of the return value. A higher accuracy indicates a more accurate value. If you do not specify this parameter, 10000 is used by default.
- Return value
A value of the DOUBLE or ARRAY type is returned. The return value varies based on the following rules:
- If the value of colname is null, the row that contains this value is not used for calculation.
- If the value of p or B is null, an error is returned.
- Examples
- Example 1: Calculate the 0.3rd percentile in the sal column. Sample statement:
The following result is returned:select percentile_approx(sal, 0.3) from emp;
+------------+ | _c0 | +------------+ | 1252.5 | +------------+
- Example 2: Use this function with
GROUP BY
to group all employees based on the deptno column and calculate the 0.3rd percentile of employees in each group in the sal column. Sample statement:
The following result is returned:select deptno, percentile_approx(sal, 0.3) from emp group by deptno;
+------------+------------+ | deptno | _c1 | +------------+------------+ | 10 | 1300.0 | | 20 | 950.0 | | 30 | 1070.0 | +------------+------------+
- Example 3: Use this function with
GROUP BY
to group all employees based on the deptno column and calculate the 0.3rd, 0.5th, and 0.8th percentiles of employees in each group in the sal column. Sample statement:
The following result is returned:set odps.sql.type.system.odps2=true; select deptno, percentile_approx(sal, array(0.3, 0.5, 0.8), 1000) from emp group by deptno;
+------------+------------+ | deptno | _c1 | +------------+------------+ | 10 | [1300.0,1875.0,3470.000000000001] | | 20 | [950.0,2037.5,2987.5] | | 30 | [1070.0,1250.0,1580.0] | +------------+------------+
- Example 4 (example with weight specified): Use this function with
GROUP BY
to group all employees based on the deptno column and calculate the 0.3rd, 0.5th, and 0.8th percentile of employees in each group in the sal column. Values in thecnt
column of theemp
table indicate the number of employees whose salaries are in the specified percentile. Sample statement:
The following result is returned:select deptno, percentile_approx(sal, deptno, array(0.3, 0.5, 0.8), 1000) from emp group by deptno;
+------------+------------+ | deptno | _c1 | +------------+------------+ | 10 | [1300.0,1875.0,3470.0] | | 20 | [950.0,2037.5,2987.5] | | 30 | [1070.0,1250.0,1580.0] | +------------+------------+
- Example 1: Calculate the 0.3rd percentile in the sal column. Sample statement:
BITWISE_OR_AGG
- Syntax
bigint bitwise_or_agg(bigint value)
- Description
Aggregates input values based on the bitwise OR operation.
- Parameters
value: required. A value of the BIGINT type. The null value is not used for calculation.
- Return value
A value of the BIGINT type is returned.
- Examples
The following result is returned:select id, bitwise_or_agg(v) from values (1L, 2L), (1L, 1L), (2L, null), (1L, null) t(id, v) group by id;
+------------+------------+ | id | _c1 | +------------+------------+ | 1 | 3 | | 2 | NULL | +------------+------------+
BITWISE_AND_AGG
- Syntax
bigint bitwise_and_agg(bigint value)
- Description
Aggregates input values based on the bitwise AND operation.
- Parameters
value: required. A value of the BIGINT type. The null value is not used for calculation.
- Return value
A value of the BIGINT type is returned.
- Examples
The following result is returned:select id, bitwise_and_agg(v) from values (1L, 2L), (1L, 1L), (2L, null), (1L, null) t(id, v) group by id;
+------------+------------+ | id | _c1 | +------------+------------+ | 1 | 0 | | 2 | NULL | +------------+------------+
MAP_AGG
- Syntax
map<K, V> map_agg(K a, V b);
- Description
Returns a map that is created by using a and b. a is the key in the map. b is the value of the key in the map. If the key in the map is null, the key is ignored. If the key field has duplicate values, one of the values is randomly retained.
- Parameters
- a: an input field that is used as the key in the map.
- b: an input field that is used as the value in the map.
- Return value
A new map is returned.
- Examples
The following result is returned:select map_agg(a, b) from values (1L, 'apple'), (2L, 'hi'), (null, 'good'), (1L, 'pie') t(a, b);
+------------------------+ | _c0 | +------------------------+ | {"2":"hi","1":"apple"} | +------------------------+
MULTIMAP_AGG
- Syntax
map<K, array<V>> multimap_agg(K a, V b);
- Description
Returns a map that is created by using a and b. a is the key in the map. b is used to create an array, which is used as the value of the key in the map. If the key in the map is null, the key is ignored.
- Parameters
- a: an input field that is used as the key in the map.
- b: an input field that is used as the value in the map. Fields that correspond to the same key are placed in the same array and are used as values in the map.
- Return value
A new map is returned.
- Examples
The following result is returned:select multimap_agg(a, b) from values (1L, 'apple'), (2L, 'hi'), (null, 'good'), (1L, 'pie') t(a, b);
+----------------------------------+ | _c0 | +----------------------------------+ | {"2":["hi"],"1":["apple","pie"]} | +----------------------------------+
MAP_UNION
- Syntax
map<K, V> map_union(map<K, V> input);
- Description
Returns a new map that is the union of all input maps. If a key exists in multiple input maps, one of the values that correspond to the key is randomly retained.
- Parameters
input: the input maps.
- Return value
A new map is returned.
- Examples
The following result is returned:select map_union(a) from values (map(1L, 'hi', 2L, 'apple', 3L, 'pie')), (map(1L, 'good', 4L, 'this')), (null) t(a);
+-----------------------------------------------+ | _c0 | +-----------------------------------------------+ | {"4":"this","1":"good","2":"apple","3":"pie"} | +-----------------------------------------------+
MAP_UNION_SUM
- Syntax
map<K, V> map_union_sum(map<K, V> input);
- Description
Returns a new map that is the union of all input maps. The output map sums the values of the matching keys in all input maps. If the value that corresponds to a key is null, the value is converted into 0.Note The values in input maps must be of the BIGINT, INT, SMALLINT, TINYINT, FLOAT, DOUBLE, or DECIMAL data type.
- Parameters
input: the input maps.
- Return value
A new map is returned.Note The values in the new map are of the BIGINT, DOUBLE, or DECIMAL type.
- Examples
The following result is returned:select map_union_sum(a) from values (map('hi', 2L, 'apple', 3L, 'pie', 1L)), (map('apple', null, 'hi', 4L)), (null) t(a);
+----------------------------+ | _c0 | +----------------------------+ | {"apple":3,"hi":6,"pie":1} | +----------------------------+
HISTOGRAM
- Syntax
map<K, bigint> histogram(K input);
- Description
Returns a map that contains the number of times each input value appears. The keys in the map are the input values. Each value in the map is the number of times an input value appears. The null value is ignored.
- Parameters
input: input values, which are used as the keys in the map.
- Return value
A map that contains the number of times each input value appears is returned.
- Examples
The following result is returned:select histogram(a) from values ('hi'), (null), ('apple'), ('pie'), ('apple') t(a);
+----------------------------+ | _c0 | +----------------------------+ | {"pie":1,"hi":1,"apple":2} | +----------------------------+