All Products
Search
Document Center

MaxCompute:Aggregate functions

Last Updated:Feb 05, 2024

Aggregate functions group multiple input records to form a single output record. You can use an aggregate function with a GROUP BY clause of MaxCompute SQL. This topic describes the syntax, parameters, and examples of aggregate functions that are supported by MaxCompute SQL. It guides you through data development by using aggregate functions.

The following table describes the aggregate functions that are supported by MaxCompute SQL.

Function

Description

ANY_VALUE

Returns a random value from a specific column.

APPROX_DISTINCT

Returns an approximate number of distinct input values in a specified column.

ARG_MAX

Returns the column value of the row that corresponds to the maximum value of a specified column.

ARG_MIN

Returns the column value of a row that corresponds to the minimum value of a specific column.

AVG

Returns the average value of a column.

BITWISE_AND_AGG

Aggregates input values based on the bitwise AND operation.

BITWISE_OR_AGG

Aggregates input values based on the bitwise OR operation.

COLLECT_LIST

Aggregates the specified columns into an array.

COLLECT_SET

Aggregates distinct values from a specified column into an array.

COUNT

Returns the number of records that match a specified criteria.

COUNT_IF

Returns the number of records whose expr value is True.

COVAR_POP

Calculates the population covariance of two specified numeric columns.

COVAR_SAMP

Calculates the sample covariance of two specified numeric columns.

HISTOGRAM

Returns a map that contains the number of times each input value appears.

MAP_AGG

Returns a map that is created by using a and b. a is the key in the map. b is the value of the key in the map.

MAP_UNION

Returns a new map that is the union of all input maps.

MAP_UNION_SUM

Returns a new map that is the union of all input maps. The output map sums the values of the matching keys in all input maps.

MAX

Returns the maximum value of a column.

MAX_BY

Returns the column value of the row that corresponds to the maximum value of a specified column.

MEDIAN

Returns the median value of a column.

MIN

Returns the minimum value of a column.

MIN_BY

Returns the column value of a row that corresponds to the minimum value of a specific column.

MULTIMAP_AGG

Returns a map that is created by using a and b. a is the key in the map. b is used to create an array, which is used as the value of the key in the map.

NUMERIC_HISTOGRAM

Returns an approximate histogram based on a specified column.

PERCENTILE

Calculates an exact percentile. This function is suitable for scenarios where a small amount of data is calculated.

PERCENTILE_APPROX

Returns approximate percentiles. This function applies to scenarios in which a large amount of data is calculated.

STDDEV

Returns the population standard deviation of all input values.

STDDEV_SAMP

Returns the sample standard deviation of all the input values.

SUM

Returns the sum of a column.

VAR_SAMP

Calculates the sample variance of a specified numeric column.

VARIANCE/VAR_POP

Calculates the variance of a specified numeric column.

WM_CONCAT

Concatenates strings with a specified delimiter.

Precautions

MaxCompute V2.0 provides additional functions. If the functions that you use involve new data types that are supported in the MaxCompute V2.0 data type edition, you must execute the SET statement to enable the MaxCompute V2.0 data type edition. The new data types include TINYINT, SMALLINT, INT, FLOAT, VARCHAR, TIMESTAMP, and BINARY.

  • Session level: To use the MaxCompute V2.0 data type edition, you must add set odps.sql.type.system.odps2=true; before the SQL statement that you want to execute, and commit and execute them together.

  • Project level: The project owner can run the following command to enable the MaxCompute V2.0 data type edition for the project based on the project requirements. The configuration takes effect after 10 to 15 minutes. Sample statement:

    setproject odps.sql.type.system.odps2=true;

    For more information about setproject, see Project operations. For more information about the precautions that you must take when you enable the MaxCompute V2.0 data type edition at the project level, see Data type editions.

  • A worker can contain a maximum of 2 million elements.

If you use an SQL statement that includes multiple aggregate functions and the project resources are insufficient, memory overflow may occur. We recommend that you optimize the SQL statement or purchase computing resources based on your business requirements.

Syntax

Syntax of an aggregate function:

<aggregate_name>(<expression>[,..]) [within group (order by <col1>[,<col2>…])] [filter (where <where_condition>)]
  • <aggregate_name>(<expression>[,..]): a built-in aggregate function or a user-defined aggregate function (UDAF). The format of an aggregate function is based on its syntax.

  • within group (order by <col1>[,<col2>…]): If the syntax of an aggregate function includes this expression, the system automatically sorts the input data of <col1>[,<col2>…] in ascending order. To sort the input data in descending order, use the expression within group (order by <col1>[,<col2>…] [desc]).

    Before you use this expression, take note of the following limits:

    • You can use this expression only for WM_CONCAT, COLLECT_LIST, COLLECT_SET, and UDAFs.

    • If multiple aggregate functions of a SELECT statement include the expression within group (order by <col1>[,<col2>…]), order by <col1>[,<col2>…] must be the same for these functions.

    • If the parameters of an aggregate function include the DISTINCT keyword, columns with distinct values must be specified in the expression order by <col1>[,<col2>…]. Columns that are specified in the ORDER BY clause are a subset of columns with distinct values. The data type of the fields in the expression <col1>[,<col2>…] must be the same as the data type of the input parameter of the aggregate function.

      Note

      Only one input parameter can be specified in an aggregate function that uses the expression within group (order by <col1>[,<col2>…]). Therefore, if the input parameter in the aggregate function includes the DISTINCT keyword, only one column can be specified in the ORDER BY clause. The data type of the column must be the same as the data type of the input parameter of the aggregate function.

      For example, if the input parameter of the WM_CONCAT function is of the STRING data type, the data type of the field specified in the ORDER BY clause must also be STRING. For more information, see Example 4. For more information about how to create the emp table, see Sample data.

    Examples:

    -- Example 1: Sort the input data in ascending order and return the output data. 
    select 
      x,
      wm_concat(',', y) within group (order by y)
    from values('k', 1),('k', 3),('k', 2) as t(x, y)
    group by x;
    -- The following result is returned: 
    +------------+------------+
    | x          | _c1        |
    +------------+------------+
    | k          | 1,2,3      |
    +------------+------------+
    
    -- Example 2: Sort the input data in descending order and return the output data. 
    select 
      x,
      wm_concat(',', y) within group (order by y desc)
    from values('k', 1),('k', 3),('k', 2) as t(x, y)
    group by x;
    -- The following result is returned: 
    +------------+------------+
    | x          | _c1        |
    +------------+------------+
    | k          | 3,2,1      |
    +------------+------------+
    
    
    -- Example 3
    select id,
    wm_concat(distinct ',', name) within group (order by name desc)
    from values('k', '1'),('k', '3'),('k', '2') as t(id, name)
    group by id;
    
    -- The following result is returned: 
    +------------+------------+
    | id         | _c1        |
    +------------+------------+
    | k          | 3,2,1      |
    +------------+------------+
    
    
    -- Example 4
    -- The parameters of an aggregate function include the DISTINCT keyword. In this case, if the input parameter sal of the WM_CONCAT function is of the BIGINT data type, the data type of the input parameter is implicitly converted into a value of the STRING data type. 
    -- To ensure data type consistency, you must use the CAST function to convert the data type of the input parameter sal into a value of the STRING data type. Otherwise, an error is reported. 
    select deptno,
    wm_concat(distinct ',', sal) 
    within group (order by cast(sal as STRING ) desc) 
    from emp group by deptno order by deptno;
    
    -- The following result is returned: 
    +------------+------------+
    | deptno     | _c1        |
    +------------+------------+
    | 10         | 5000,2450,1300 |
    | 20         | 800,3000,2975,1100 |
    | 30         | 950,2850,1600,1500,1250 |
    +------------+------------+
  • [filter (where <where_condition>)]: If an aggregate function includes this expression, the aggregate function processes only the data that meets the condition specified by <where_condition>. For more information about <where_condition>, see WHERE clause (where_condition).

    Before you use this expression, take note of the following limits:

    • Only built-in aggregate functions support this expression. UDAFs do not support this expression.

    • count(*) supports the expression [filter (where <where_condition>)].

    • The COUNT_IF function does not support the expression [filter (where <where_condition>)].

    Examples:

    -- Example 1: Filter and aggregate data. 
    select
      sum(x),
      sum(x) filter (where y > 1),
      sum(x) filter (where y > 2)
      from values(null, 1),(1, 2),(2, 3),(3, null) as t(x, y);
    -- The following result is returned: 
    +------------+------------+------------+
    | _c0        | _c1        | _c2        |
    +------------+------------+------------+
    | 6          | 3          | 2          |
    +------------+------------+------------+
    
    -- Example 2: Use multiple aggregate functions to filter and aggregate data. 
    select
      count_if(x > 2),
      sum(x) filter (where y > 1),
      sum(x) filter (where y > 2)
      from values(null, 1),(1, 2),(2, 3),(3, null) as t(x, y);
    -- The following result is returned: 
    +------------+------------+------------+
    | _c0        | _c1        | _c2        |
    +------------+------------+------------+
    | 1          | 3          | 2          |
    +------------+------------+------------+

Sample data

This section provides sample source data and examples for you to understand how to use the functions. Create a table named emp and insert the sample data into the table. Sample commands:

create table if not exists emp
   (empno bigint,
    ename string,
    job string,
    mgr bigint,
    hiredate datetime,
    sal bigint,
    comm bigint,
    deptno bigint);
tunnel upload emp.txt emp;

The emp.txt file contains the following sample data:

7369,SMITH,CLERK,7902,1980-12-17 00:00:00,800,,20
7499,ALLEN,SALESMAN,7698,1981-02-20 00:00:00,1600,300,30
7521,WARD,SALESMAN,7698,1981-02-22 00:00:00,1250,500,30
7566,JONES,MANAGER,7839,1981-04-02 00:00:00,2975,,20
7654,MARTIN,SALESMAN,7698,1981-09-28 00:00:00,1250,1400,30
7698,BLAKE,MANAGER,7839,1981-05-01 00:00:00,2850,,30
7782,CLARK,MANAGER,7839,1981-06-09 00:00:00,2450,,10
7788,SCOTT,ANALYST,7566,1987-04-19 00:00:00,3000,,20
7839,KING,PRESIDENT,,1981-11-17 00:00:00,5000,,10
7844,TURNER,SALESMAN,7698,1981-09-08 00:00:00,1500,0,30
7876,ADAMS,CLERK,7788,1987-05-23 00:00:00,1100,,20
7900,JAMES,CLERK,7698,1981-12-03 00:00:00,950,,30
7902,FORD,ANALYST,7566,1981-12-03 00:00:00,3000,,20
7934,MILLER,CLERK,7782,1982-01-23 00:00:00,1300,,10
7948,JACCKA,CLERK,7782,1981-04-12 00:00:00,5000,,10
7956,WELAN,CLERK,7649,1982-07-20 00:00:00,2450,,10
7956,TEBAGE,CLERK,7748,1982-12-30 00:00:00,1300,,10

Filter expressions

  • Limits

    • Only built-in aggregate functions of MaxCompute support filter expressions. UDAFs do not support filter expressions.

    • COUNT(*) cannot be used with filter expressions. Use the COUNT_IF function with filter expressions.

  • Syntax

    <aggregate_name>(<expression>[,...]) [filter (where <where_condition>)]
  • Description

    All aggregate functions support filter expressions. If you specify a filter condition, only the row data that meets the filter condition can be passed to the related aggregate function for data processing.

  • Parameters

    • aggregate_name: required. The name of the aggregate function. Select an aggregate function that is described in this topic based on your business requirements.

    • expression: required. The parameters of the aggregate function that you select. Specify this parameter based on the description of the aggregate function that you select.

    • where_condition: optional. The filter condition. For more information about where_condition, see WHERE clause (where_condition).

  • Return value

    For more information, see the description of the return value for each aggregate function.

  • Sample statement:

    select sum(sal) filter (where deptno=10), sum(sal) filter (where deptno=20), sum(sal) filter (where deptno=30) from emp;

    The following result is returned:

    +------------+------------+------------+
    | _c0        | _c1        | _c2        |
    +------------+------------+------------+
    | 17500      | 10875      | 9400       |
    +------------+------------+------------+

ANY_VALUE

  • Syntax

    any_value(<colname>)
  • Description

    Returns a random value from a specific column. This function is an additional function of MaxCompute V2.0.

  • Parameters

    colname: required. The name of a column, which can be of any data type.

  • Return value

    The data type of the return value is the same as the data type of the colname parameter. If the value of the colname parameter is null, the row that contains this value is not used for calculation.

  • Examples

    • Example 1: Select one of the employees. Sample statement:

      select any_value(ename) from emp;

      The following result is returned:

      +------------+
      | _c0        |
      +------------+
      | SMITH      |
      +------------+
    • Example 2: Use this function with GROUP BY to group all employees based on the deptno column and select one employee from each group. Sample statement:

      select deptno, any_value(ename) from emp group by deptno;

      The following result is returned:

      +------------+------------+
      | deptno     | _c1        |
      +------------+------------+
      | 10         | CLARK      |
      | 20         | SMITH      |
      | 30         | ALLEN      |
      +------------+------------+

APPROX_DISTINCT

  • Syntax

    approx_distinct(<colname>)
  • Description

    Returns the approximate number of distinct input values in a specified column. This function is an additional function of MaxCompute V2.0.

  • Parameters

    colname: required. The name of the column from which duplicates need to be removed.

  • Return value

    A value of the BIGINT type is returned. This function produces a standard error of 5%. If a value of the column that is specified by the colname parameter is null, the row that contains this value is not used for calculation.

  • Examples

    • Example 1: Calculate an approximate number of distinct values in the sal column. Sample statement:

      select approx_distinct(sal) from emp;

      The following result is returned:

      +-------------------+
      | numdistinctvalues |
      +-------------------+
      | 12                |
      +-------------------+
    • Example 2: Use this function with GROUP BY to group all employees by department (deptno) and calculate the approximate number of distinct values in the sal column. Sample statement:

      select deptno, approx_distinct(sal) from emp group by deptno;

      The following result is returned:

      +------------+-------------------+
      | deptno     | numdistinctvalues |
      +------------+-------------------+
      | 10         | 3                 |
      | 20         | 4                 |
      | 30         | 5                 |
      +------------+-------------------+

ARG_MAX

  • Syntax

    arg_max(<valueToMaximize>, <valueToReturn>)
  • Description

    Finds the row in which the value of valueToMaximize is included and returns the value of valueToReturn in the row. This function is an additional function of MaxCompute V2.0.

  • Parameters

    • valueToMaximize: required. The name of a column, which can be of any data type.

    • valueToReturn: required. The name of a column, which can be of any data type.

  • Return value

    The data type of the return value is the same as the data type of the valueToReturn parameter. If multiple rows contain the largest value of valueToMaximize, the value of valueToReturn in one of the rows is randomly returned. If the value of valueToMaximize is null, the row that contains this value is not used for calculation.

  • Examples

    • Example 1: Return the name of the employee who has the highest salary. Sample statement:

      select arg_max(sal, ename) from emp;

      The following result is returned:

      +------------+
      | _c0        |
      +------------+
      | KING       |
      +------------+
    • Example 2: Use this function with GROUP BY to group all employees by department (deptno) and return the name of the employee with the highest salary in each group. Sample statement:

      select deptno, arg_max(sal, ename) from emp group by deptno;

      The following result is returned:

      +------------+------------+
      | deptno     | _c1        |
      +------------+------------+
      | 10         | KING       |
      | 20         | SCOTT      |
      | 30         | BLAKE      |
      +------------+------------+

ARG_MIN

  • Syntax

    arg_min(<valueToMinimize>, <valueToReturn>)
  • Description

    Finds the row in which the minimum value of valueToMinimize is included and returns the value of valueToReturn in the row. This function is an additional function of MaxCompute V2.0.

  • Parameters

    • valueToMinimize: required. A value of any data type.

    • valueToReturn: required. A value of any data type.

  • Return value

    The data type of the return value is the same as the data type of the valueToReturn parameter. If multiple rows contain the smallest value of valueToMinimize, the value of valueToReturn in one of the rows is randomly returned. If the value of valueToMinimize is null, the row that contains this value is not used for calculation.

  • Examples

    • Example 1: Return the name of the employee who has the lowest salary. Sample statement:

      select arg_min(sal, ename) from emp;

      The following result is returned:

      +------------+
      | _c0        |
      +------------+
      | SMITH      |
      +------------+
    • Example 2: Use this function with GROUP BY to group all employees by department (deptno) and return the name of the employee with the lowest salary in each group. Sample statement:

      select deptno, arg_min(sal, ename) from emp group by deptno;

      The following result is returned:

      +------------+------------+
      | deptno     | _c1        |
      +------------+------------+
      | 10         | MILLER     |
      | 20         | SMITH      |
      | 30         | JAMES      |
      +------------+------------+

AVG

  • Syntax

    DECIMAL|DOUBLE avg(<colname>)
  • Description

    Returns the average value of a column.

  • Parameters

    colname: required. Column values support all data types and can be converted into the DOUBLE type before calculation.

  • Return value

    If the value of colname is null, the row that contains this value is not used for calculation. The following table describes the mappings between data types of input data and return values.

    Input type

    Return value type

    TINYINT

    DOUBLE

    SMALLINT

    DOUBLE

    INT

    DOUBLE

    BIGINT

    DOUBLE

    FLOAT

    DOUBLE

    DOUBLE

    DOUBLE

    DECIMAL

    DECIMAL

  • Examples

    • Example 1: Calculate the average salary (sal) values of all employees. Sample statement:

      select avg(sal) from emp;

      The following result is returned:

      +------------+
      | _c0        |
      +------------+
      | 2222.0588235294117 |
      +------------+
    • Example 2: Use this function with GROUP BY to group all employees by department (deptno) and calculate the average salary (sal) values of employees in each department. Sample statement:

      select deptno, avg(sal) from emp group by deptno;

      The following result is returned:

      +------------+------------+
      | deptno     | _c1        |
      +------------+------------+
      | 10         | 2916.6666666666665 |
      | 20         | 2175.0     |
      | 30         | 1566.6666666666667 |
      +------------+------------+

BITWISE_AND_AGG

  • Syntax

    bigint bitwise_and_agg(bigint value)
  • Description

    Aggregates input values based on the bitwise AND operation.

  • Parameters

    value: required. A value of the BIGINT type. The null value is not used for calculation.

  • Return value

    A value of the BIGINT type is returned.

  • Examples

    select id, bitwise_and_agg(v) from
        values (1L, 2L), (1L, 1L), (2L, null), (1L, null) t(id, v) group by id;

    The following result is returned:

    +------------+------------+
    | id         | _c1        |
    +------------+------------+
    | 1          | 0          |
    | 2          | NULL       |
    +------------+------------+

BITWISE_OR_AGG

  • Syntax

    bigint bitwise_or_agg(bigint value)
  • Description

    Aggregates input values based on the bitwise OR operation.

  • Parameters

    value: required. A value of the BIGINT type. The null value is not used for calculation.

  • Return value

    A value of the BIGINT type is returned.

  • Examples

    select id, bitwise_or_agg(v) from
        values (1L, 2L), (1L, 1L), (2L, null), (1L, null) t(id, v) group by id;

    The following result is returned:

    +------------+------------+
    | id         | _c1        |
    +------------+------------+
    | 1          | 3          |
    | 2          | NULL       |
    +------------+------------+

COLLECT_LIST

  • Syntax

    array collect_list(<colname>)
  • Description

    Aggregates the values that are specified by colname into an array. This function is an additional function of MaxCompute V2.0.

  • Parameters

    colname: required. The name of a column, which can be of any data type.

  • Return value

    A value of the ARRAY type is returned. If a value of the column specified by colname is null, the row that contains this value is not used for calculation.

  • Examples

    • Example 1: Aggregate the salary (sal) values of all employees into an array. Sample statement:

      select collect_list(sal) from emp;

      The following result is returned:

      +------------+
      | _c0        |
      +------------+
      | [800,1600,1250,2975,1250,2850,2450,3000,5000,1500,1100,950,3000,1300,5000,2450,1300] |
      +------------+
    • Example 2: Use this function with GROUP BY to group all employees based on the deptno column and aggregate the values in the sal column of each group into an array. Sample statement:

      select deptno, collect_list(sal) from emp group by deptno;

      The following result is returned:

      +------------+------------+
      | deptno     | _c1        |
      +------------+------------+
      | 10         | [2450,5000,1300,5000,2450,1300] |
      | 20         | [800,2975,3000,1100,3000] |
      | 30         | [1600,1250,1250,2850,1500,950] |
      +------------+------------+
    • Example 3: Use this function with GROUP BY to group all employees based on the deptno column and aggregate the values in the sal column of each group after duplicates are removed. Sample statement:

      select deptno, collect_list(distinct sal) from emp group by deptno;

      The following result is returned:

      +------------+------------+
      | deptno     | _c1        |
      +------------+------------+
      | 10         | [1300,2450,5000] |
      | 20         | [800,1100,2975,3000] |
      | 30         | [950,1250,1500,1600,2850] |
      +------------+------------+

COLLECT_SET

  • Syntax

    array collect_set(<colname>)
  • Description

    Aggregates the values that are specified by colname into an array with only distinct values. This function is an additional function of MaxCompute V2.0.

  • Parameters

    colname: required. The name of a column, which can be of any data type.

  • Return value

    A value of the ARRAY type is returned. If a value of the column specified by colname is null, the row that contains this value is not used for calculation.

  • Examples

    • Example 1: Aggregate the salary (sal) values of all employees into an array with only distinct values. Sample statement:

      select collect_set(sal) from emp;

      The following result is returned:

      +------------+
      | _c0        |
      +------------+
      | [800,950,1100,1250,1300,1500,1600,2450,2850,2975,3000,5000] |
      +------------+
    • Example 2: Use this function with GROUP BY to group all employees based on the deptno column and aggregate the values in the sal column of the employees in each group into an array with only distinct values. Sample statement:

      select deptno, collect_set(sal) from emp group by deptno;

      The following result is returned:

      +------------+------------+
      | deptno     | _c1        |
      +------------+------------+
      | 10         | [1300,2450,5000] |
      | 20         | [800,1100,2975,3000] |
      | 30         | [950,1250,1500,1600,2850] |
      +------------+------------+

COUNT

  • Syntax

    bigint count([distinct|all] <colname>)
  • Description

    Returns the number of records that match a specified criteria.

  • Parameters

    • distinct|all: optional. This parameter specifies whether to remove duplicates during the counting. The default value is all, which indicates that all records are counted. If this parameter is set to distinct, only records with distinct values are counted.

    • colname: required. The name of a column, which can be of any data type. The value of colname can be an asterisk (*). count(*) indicates that the number of all rows is returned.

  • Return value

    A value of the BIGINT type is returned. If a value of the column specified by colname is null, the row that contains this value is not used for calculation.

  • Examples

    • Example 1: Calculate the total number of employees in all departments. Sample statement:

      select count(*) from emp;

      The following result is returned:

      +------------+
      | _c0        |
      +------------+
      | 17         |
      +------------+
    • Example 2: Use this function with GROUP BY to group all employees by department (deptno) and calculate the number of employees in each department. Sample statement:

      select deptno, count(*) from emp group by deptno;

      The following result is returned:

      +------------+------------+
      | deptno     | _c1        |
      +------------+------------+
      | 10         | 6          |
      | 20         | 5          |
      | 30         | 6          |
      +------------+------------+
    • Example 3: Remove duplicates when you calculate the number of departments. Sample statement:

      select count(distinct deptno) from emp;

      The following result is returned:

      +------------+
      | _c0        |
      +------------+
      | 3          |
      +------------+

COUNT_IF

  • Syntax

    bigint count_if(boolean <expr>)
  • Description

    Returns the number of records whose expr value is True.

  • Parameters

    expr: required. A BOOLEAN expression.

  • Return value

    A value of the BIGINT type is returned. If the value of the expr parameter is False or the value of a specific column in expr is null, the row that contains this value is not used for calculation.

  • Examples

    select count_if(sal > 1000), count_if(sal <=1000) from emp;

    The following result is returned:

    +------------+------------+
    | _c0        | _c1        |
    +------------+------------+
    | 15         | 2          |
    +------------+------------+

COVAR_POP

  • Syntax

    double covar_pop(<colname1>, <colname2>)
  • Description

    Calculates the population covariance of two specified numeric columns. This function is an additional function of MaxCompute V2.0.

  • Parameters

    colname1 and colname2: required. Columns of the numeric data type. If the specified column is not a numeric column, a null value is returned.

  • Examples

    Execute the following statement to append data to the emp table:

    -- sal_new is the new salary column. 
    alter table emp add columns (sal_new bigint);
    insert overwrite table emp select empno, ename, job, mgr, hiredate, sal, comm, deptno, sal+1000 from emp;
    • Example 1: Calculate the population covariance of the sal and sal_new columns. Sample statement:

      select covar_pop(sal, sal_new) from emp;

      The following result is returned:

      +------------+
      | _c0        |
      +------------+
      | 1594550.1730103805 |
      +------------+
    • Example 2: Use this function with GROUP BY to group all employees by department (deptno) and calculate the population covariance of the sal and sal_new columns of employees in the same group. Sample statement:

      select deptno, covar_pop(sal, sal_new) from emp group by deptno;

      The following result is returned:

      +------------+------------+
      | deptno     | _c1        |
      +------------+------------+
      | 10         | 2390555.5555555555 |
      | 20         | 1009500.0  |
      | 30         | 372222.2222222222 |
      +------------+------------+

COVAR_SAMP

  • Syntax

    double covar_samp(<colname1>, <colname2>)
  • Description

    Calculates the sample covariance of two specified numeric columns. This function is an additional function of MaxCompute V2.0.

  • Parameters

    colname1 and colname2: required. Columns of the numeric data type. If the specified column is not a numeric column, a null value is returned.

  • Examples

    Execute the following statement to append data to the emp table:

    -- sal_new is the new salary column. 
    alter table emp add columns (sal_new bigint);
    insert overwrite table emp select empno, ename, job, mgr, hiredate, sal, comm, deptno, sal+1000 from emp;
    • Example 1: Calculate the sample covariance of the sal and sal_new columns. Sample statement:

      select covar_samp(sal, sal_new) from emp;

      The following result is returned:

      +------------+
      | _c0        |
      +------------+
      | 1694209.5588235292 |
      +------------+
    • Example 2: Use this function with GROUP BY to group all employees by department (deptno) and calculate the sample covariance of the sal and sal_new columns of employees in the same group. Sample statement:

      select deptno, covar_samp(sal, sal_new) from emp group by deptno;

      The following result is returned:

      +------------+------------+
      | deptno     | _c1        |
      +------------+------------+
      | 10         | 2868666.6666666665 |
      | 20         | 1261875.0  |
      | 30         | 446666.6666666666 |
      +------------+------------+

HISTOGRAM

  • Syntax

    map<K, bigint> histogram(K input);
  • Description

    Returns a map that contains the number of times each input value appears. The keys in the map are the input values. Each value in the map is the number of times an input value appears. The null value is ignored.

  • Parameters

    input: input values, which are used as the keys in the map.

  • Return value

    A map that contains the number of times each input value appears is returned.

  • Examples

    select histogram(a) from values
        ('hi'), (null), ('apple'), ('pie'), ('apple') t(a);

    The following result is returned:

    +----------------------------+
    | _c0                        |
    +----------------------------+
    | {"pie":1,"hi":1,"apple":2} |
    +----------------------------+

MAP_AGG

  • Syntax

    map<K, V> map_agg(K a, V b);
  • Description

    Returns a map that is created by using a and b. a is the key in the map. b is the value of the key in the map. If the key in the map is null, the key is ignored. If the key field has duplicate values, one of the values is randomly retained.

  • Parameters

    • a: an input field that is used as the key in the map.

    • b: an input field that is used as the value in the map.

  • Return value

    A new map is returned.

  • Examples

    select map_agg(a, b) from
            values (1L, 'apple'), (2L, 'hi'), (null, 'good'), (1L, 'pie') t(a, b);

    The following result is returned:

    +------------------------+
    | _c0                    |
    +------------------------+
    | {"2":"hi","1":"apple"} |
    +------------------------+

MAP_UNION

  • Syntax

    map<K, V> map_union(map<K, V> input);
  • Description

    Returns a new map that is the union of all input maps. If a key exists in multiple input maps, one of the values that correspond to the key is randomly retained.

  • Parameters

    input: the input maps.

  • Return value

    A new map is returned.

  • Examples

    select map_union(a) from values
        (map(1L, 'hi', 2L, 'apple', 3L, 'pie')), (map(1L, 'good', 4L, 'this')), (null) t(a);

    The following result is returned:

    +-----------------------------------------------+
    | _c0                                           |
    +-----------------------------------------------+
    | {"4":"this","1":"good","2":"apple","3":"pie"} |
    +-----------------------------------------------+

MAP_UNION_SUM

  • Syntax

    map<K, V> map_union_sum(map<K, V> input);
  • Description

    Returns a new map that is the union of all input maps. The output map sums the values of the matching keys in all input maps. If the value that corresponds to a key is NULL, the value is converted into 0.

    Note

    The values in input maps must be of the BIGINT, INT, SMALLINT, TINYINT, FLOAT, DOUBLE, or DECIMAL data type.

  • Parameters

    input: the input maps.

  • Return value

    A new map is returned.

    Note

    The values in the new map are of the BIGINT, DOUBLE, or DECIMAL type.

  • Examples

    select map_union_sum(a) from values
        (map('hi', 2L, 'apple', 3L, 'pie', 1L)), (map('apple', null, 'hi', 4L)), (null) t(a);

    The following result is returned:

    +----------------------------+
    | _c0                        |
    +----------------------------+
    | {"apple":3,"hi":6,"pie":1} |
    +----------------------------+

MAX

  • Syntax

    max(<colname>)
  • Description

    Returns the maximum value of a column.

  • Parameters

    colname: required. The name of a column, which can be of any data type other than BOOLEAN.

  • Return value

    The type of the return value is the same as the type of the colname parameter. The return value varies based on the following rules:

    • If the value of colname is null, the row that contains this value is not used for calculation.

    • If the value of colname is of the BOOLEAN type, the value is not used for calculation.

  • Examples

    • Example 1: Calculate the highest salary (sal) of all employees. Sample statement:

      select max(sal) from emp;

      The following result is returned:

      +------------+
      | _c0        |
      +------------+
      | 5000       |
      +------------+
    • Example 2: Use this function with GROUP BY to group all employees by department (deptno) and calculate the highest salary of employees in each department. Sample statement:

      select deptno, max(sal) from emp group by deptno;

      The following result is returned:

      +------------+------------+
      | deptno     | _c1        |
      +------------+------------+
      | 10         | 5000       |
      | 20         | 3000       |
      | 30         | 2850       |
      +------------+------------+

MAX_BY

  • Syntax

    max_by(<valueToReturn>,<valueToMaximize>)
  • Description

    Note

    The MAX_BY function provides the same feature as the ARG_MAX function. The difference lies in the parameter order. The MAX_BY function is introduced in MaxCompute to maintain compatibility with the open source syntax.

    Finds the row in which the value of valueToMaximize is included and returns the value of valueToReturn in the row. This function is an additional function of MaxCompute V2.0.

  • Parameters

    • valueToMaximize: required. A value of any data type.

    • valueToReturn: required. A value of any data type.

  • Return value

    The data type of the return value is the same as the data type of the valueToReturn parameter. If multiple rows have the largest value of valueToMaximize, the value of valueToReturn in one of the rows is randomly returned. If the value of valueToMaximize is null, the row that contains this value is not used for calculation.

  • Examples

    • Example 1: Return the name of the employee who has the highest salary. Sample statement:

      select max_by(ename,sal) from emp;

      The following result is returned:

      +------------+
      | _c0        |
      +------------+
      | KING       |
      +------------+
    • Example 2: Use this function with GROUP BY to group all employees based on the deptno column and return the name of the employee with the highest salary in each group. Sample statement:

      select deptno, max_by(ename,sal) from emp group by deptno;

      The following result is returned:

      +------------+------------+
      | deptno     | _c1        |
      +------------+------------+
      | 10         | KING       |
      | 20         | SCOTT      |
      | 30         | BLAKE      |
      +------------+------------+

MEDIAN

  • Syntax

    double median(double <colname>)
    decimal median(decimal <colname>)
  • Description

    Returns the median value of a column.

  • Parameters

    colname: required. The name of a column, which can be of the DOUBLE or DECIMAL type. If the input value is of the STRING or BIGINT type, it is implicitly converted into the DOUBLE type before calculation.

  • Return value

    If the value of colname is null, the row that contains this value is not used for calculation. The following table describes the mappings between data types of input data and return values.

    Input type

    Return value type

    TINYINT

    DOUBLE

    SMALLINT

    DOUBLE

    INT

    DOUBLE

    BIGINT

    DOUBLE

    FLOAT

    DOUBLE

    DOUBLE

    DOUBLE

    DECIMAL

    DECIMAL

  • Examples

    • Example 1: Calculate the median salary (sal) values of all employees. Sample statement:

      select median(sal) from emp;

      The following result is returned:

      +------------+
      | _c0        |
      +------------+
      | 1600.0     |
      +------------+
    • Example 2: Use this function with GROUP BY to group all employees by department (deptno) and calculate the median salary values of employees in each department. Sample statement:

      select deptno, median(sal) from emp group by deptno;

      The following result is returned:

      +------------+------------+
      | deptno     | _c1        |
      +------------+------------+
      | 10         | 2450.0     |
      | 20         | 2975.0     |
      | 30         | 1375.0     |
      +------------+------------+

MIN

  • Syntax

    min(<colname>)
  • Description

    Returns the minimum value of a column.

  • Parameters

    colname: required. The name of a column, which can be of any data type other than BOOLEAN.

  • Return value

    The type of the return value is the same as the type of the colname parameter. The return value varies based on the following rules:

    • If the value of colname is null, the row that contains this value is not used for calculation.

    • If the value of colname is of the BOOLEAN type, the value is not used for calculation.

  • Examples

    • Example 1: Calculate the lowest salary (sal) of all employees. Sample statement:

      select min(sal) from emp;

      The following result is returned:

      +------------+
      | _c0        |
      +------------+
      | 800        |
      +------------+
    • Example 2: Use this function with GROUP BY to group all employees by department (deptno) and calculate the lowest salary of employees in each department. Sample statement:

      select deptno, min(sal) from emp group by deptno;

      The following result is returned:

      +------------+------------+
      | deptno     | _c1        |
      +------------+------------+
      | 10         | 1300       |
      | 20         | 800        |
      | 30         | 950        |
      +------------+------------+

MIN_BY

  • Syntax

    min_by(<valueToReturn>,<valueToMinimize>)
  • Description

    Note

    The MIN_BY function provides the same feature as the ARG_MIN function. However, the functions differ in the parameter order. The MIN_BY function is introduced in MaxCompute to maintain compatibility with the open source syntax.

    Finds the row in which the minimum value of valueToMinimize is included and returns the value of valueToReturn in the row. This function is an additional function of MaxCompute V2.0.

  • Parameters

    • valueToMinimize: required. A value of any data type.

    • valueToReturn: required. A value of any data type.

  • Return value

    The data type of the return value is the same as the data type of the valueToReturn parameter. If multiple rows contain the smallest value of valueToMinimize, the value of valueToReturn in one of the rows is randomly returned. If the value of valueToMinimize is null, the row that contains this value is not used for calculation.

  • Examples

    • Example 1: Return the name of the employee who has the lowest salary. Sample statement:

       select min_by(ename,sal) from emp;

      The following result is returned:

      +------------+
      | _c0        |
      +------------+
      | SMITH      |
      +------------+
    • Example 2: Use this function with GROUP BY to group all employees based on the deptno column and return the name of the employee with the lowest salary in each group. Sample statement:

      select deptno, min_by(ename,sal) from emp group by deptno;

      The following result is returned:

      +------------+------------+
      | deptno     | _c1        |
      +------------+------------+
      | 10         | MILLER     |
      | 20         | SMITH      |
      | 30         | JAMES      |
      +------------+------------+

MULTIMAP_AGG

  • Syntax

    map<K, array<V>> multimap_agg(K a, V b);
  • Description

    Returns a map that is created by using a and b. a is the key in the map. b is used to create an array, which is used as the value of the key in the map. If the key in the map is null, the key is ignored.

  • Parameters

    • a: an input field that is used as the key in the map.

    • b: an input field that is used as the value in the map. Fields that correspond to the same key are placed in the same array and are used as values in the map.

  • Return value

    A new map is returned.

  • Examples

    select multimap_agg(a, b) from
            values (1L, 'apple'), (2L, 'hi'), (null, 'good'), (1L, 'pie') t(a, b);

    The following result is returned:

    +----------------------------------+
    | _c0                              |
    +----------------------------------+
    | {"2":["hi"],"1":["apple","pie"]} |
    +----------------------------------+

NUMERIC_HISTOGRAM

  • Syntax

    map<double key, double value> numeric_histogram(bigint <buckets>,
                                                    double <colname>
                                                    [, double <weight>])
                        
  • Description

    Returns an approximate histogram based on a specified column. This function is an additional function of MaxCompute V2.0.

  • Parameters

    • buckets: required. A value of the BIGINT type. This parameter specifies the maximum number of buckets in the column whose approximate histogram is returned.

    • colname: required. A value of the DOUBLE type. This parameter specifies the columns whose approximate histograms need to be calculated.

    • weight: optional. The weight value of data in each row. The value is of the DOUBLE type.

  • Return value

    A value of the map<double key, double value> type is returned. In the return value, key indicates the X-axis of the approximate histogram, and value indicates the approximate height of the Y-axis of the approximate histogram. The return value varies based on the following rules:

    • If the value of buckets is null, null is returned.

    • If the value of colname is null, the row that contains this value is not used for calculation.

  • Examples

    • Return an approximate histogram of the sal column. Sample statement:

      select numeric_histogram(5, sal) from emp;

      The following result is returned:

      +------------+
      | _c0        |
      +------------+
      | {"1328.5714285714287":7.0,"2450.0":2.0,"5000.0":2.0,"875.0":2.0,"2956.25":4.0} |
      +------------+
    • Return the approximate histogram of the sal column. The deptno in each row of data indicates the department weight. Sample statement:

      select numeric_histogram(5, sal, deptno) from emp;

      The following result is returned:

      +------------+
      | _c0        |
      +------------+
      | {"2944.4444444444443":90.0,"2450.0":20.0,"5000.0":20.0,"890.0":50.0,"1350.0":160.0} |
      +------------+

PERCENTILE

  • Syntax

    double percentile(bigint <colname>, <p>)
    -- Return multiple exact percentiles as an array. 
    array percentile(bigint <colname>, array(<p1> [, <p2>...]))
  • Description

    Calculates an exact percentile. This function is suitable for scenarios where a small amount of data is calculated. This function sorts data in the specified column in ascending order and then returns the p × 100th percentile of the column. p must be a value from 0 to 1. The values in the column specified for percentile are numbered from 0. For example, the values in a column are 100, 200, and 300 and the values are numbered 0, 1, and 2. If you use this function to calculate the 30th percentile of the column, the value of percentile is 0.6, which is calculated by using the following formula: 2 x 0.3 = 0.6. The value is between the sequence numbers 0 and 1. The result is calculated by using the following formula: 100 + (200 - 100) × 0.6 = 160. This function is an additional function of MaxCompute V2.0.

  • Parameters

    • colname: required. A column of the BIGINT type.

    • p: required. The percentile. Valid values: [0.0,1.0].

  • Return value

    A value of the DOUBLE or ARRAY type is returned.

  • Examples

    • Example 1: Calculate the 30th percentile in the sal column. Sample statement:

      select percentile(sal, 0.3) from emp;

      The following result is returned:

      +------------+
      | _c0        |
      +------------+
      | 1290.0     |
      +------------+
    • Example 2: Use this function with GROUP BY to group all employees based on the deptno column and calculate the 30th percentile of employees in each group in the sal column. Sample statement:

      select deptno, percentile(sal, 0.3) from emp group by deptno;

      The following result is returned:

      +------------+------------+
      | deptno     | _c1        |
      +------------+------------+
      | 10         | 1875.0     |
      | 20         | 1475.0     |
      | 30         | 1250.0     |
      +------------+------------+
    • Example 3: Use this function with GROUP BY to group all employees based on the deptno column and calculate the 30th, 50th, and 80th percentiles of employees in each group in the sal column. Sample statement:

      set odps.sql.type.system.odps2=true;
      select deptno, percentile(sal, array(0.3, 0.5, 0.8)) from emp group by deptno;

      The following result is returned:

      +------------+------------+
      | deptno     | _c1        |
      +------------+------------+
      | 10         | [1875.0,2450.0,5000.0] |
      | 20         | [1475.0,2975.0,3000.0] |
      | 30         | [1250.0,1375.0,1600.0] |
      +------------+------------+

PERCENTILE_APPROX

  • Syntax

    double percentile_approx (double <colname>[, double <weight>], <p> [, <B>]))
    -- Returns multiple approximate percentiles as an array. 
    array<double> percentile_approx (double <colname>
                                     [, double <weight>],
                                     array(<p1> [, <p2>...])
                                     [, <B>])
  • Description

    This function is an additional function of MaxCompute V2.0. The values in the column specified for percentile_approx are numbered from 1. For example, the p × 100th percentile of a column that contains n rows of data is calculated. The PERCENTILE_APPROX function first sorts the values in the column in ascending order and then calculates the required percentile. In this section, to help you better understand the calculation logic, arr is used to represent the array of the sorted values in the column, and res is used to represent the return value of the function. The calculation formula of res varies based on the index that is calculated by using the following formula: index = n × p.

    • If the index ≤ 1 condition is met, res is calculated based on the following formula: res = arr[0].

    • If the index >= n - 1 condition is met, res is calculated based on the following formula: res = arr[n-1].

    • If the 1 < index < n - 1 condition is met, diff is first calculated based on the following formula: diff = index + 0.5 - ceil(index).

      If the abs(diff) < 0.5 condition is met, res is calculated based on the following formula: res = arr[ceil(index) - 1].

      If the abs(diff) = 0.5 condition is met, res is calculated based on the following formula: res = arr[index - 1] + (arr[index] - arr[index - 1]) × 0.5.

      The value of abs(diff) cannot be greater than 0.5.

    For example, the col column contains values of 100, 200, 300, and 400, and the sequence numbers of these values are 1, 2, 3, and 4. The 25th, 50th, and 75th percentiles of this column are calculated based on the following formulas:

    • percentile_approx(col, 0.25) = 100 (index = 1)

    • percentile_approx(col, 0.5) = 200 + (300 - 200) × 0.5 = 250 (index = 2)

    • percentile_approx(col, 0.75) = 400 (index = 3)

    Note

    Differences between PERCENTILE_APPROX and PERCENTILE:

    • PERCENTILE_APPROX is used to calculate an approximate percentile, and PERCENTILE is used to calculate an exact percentile. If the amount of data is large, PERCENTILE may fail to run due to memory limits, but PERCENTILE_APPROX does not have this issue.

    • The implementation of PERCENTILE_APPROX is consistent with that of PERCENTILE_APPROX in Hive, but the calculation algorithm of PERCENTILE_APPROX is different from that of percentile. If the amount of data is small, the execution result of PERCENTILE_APPROX is different from that of PERCENTILE.

  • Parameters

    • colname: required. The name of a column, which can be of the DOUBLE type.

    • weight: optional. The weight value of data in each row. The value is of the DOUBLE type.

    • p: required. The approximate percentile. Valid values: [0.0,1.0].

    • B: the accuracy of the return value. A higher accuracy indicates a more accurate value. If you do not specify this parameter, 10000 is used.

  • Return value

    A value of the DOUBLE or ARRAY type is returned. The return value varies based on the following rules:

    • If the value of colname is null, the row that contains this value is not used for calculation.

    • If the value of p or B is null, an error is returned.

  • Examples

    • Example 1: Calculate the 30th percentile in the sal column. Sample statement:

      select percentile_approx(sal, 0.3) from emp;

      The following result is returned:

      +------------+
      | _c0        |
      +------------+
      | 1252.5     |
      +------------+
    • Example 2: Use this function with GROUP BY to group all employees based on the deptno column and calculate the 30th percentile of employees in each group in the sal column. Sample statement:

      select deptno, percentile_approx(sal, 0.3) from emp group by deptno;

      The following result is returned:

      +------------+------------+
      | deptno     | _c1        |
      +------------+------------+
      | 10         | 1300.0     |
      | 20         | 950.0      |
      | 30         | 1070.0     |
      +------------+------------+
    • Example 3: Use this function with GROUP BY to group all employees based on the deptno column and calculate the 30th, 50th, and 80th percentiles of employees in each group in the sal column. Sample statement:

      set odps.sql.type.system.odps2=true;
      select deptno, percentile_approx(sal, array(0.3, 0.5, 0.8), 1000) from emp group by deptno;

      The following result is returned:

      +------------+------------+
      | deptno     | _c1        |
      +------------+------------+
      | 10         | [1300.0,1875.0,3470.000000000001] |
      | 20         | [950.0,2037.5,2987.5] |
      | 30         | [1070.0,1250.0,1580.0] |
      +------------+------------+
    • Example 4 (example with weight specified): Use this function with GROUP BY to group all employees based on the deptno column and calculate the 30th, 50th, and 80th percentile of employees in each group in the sal column. Values in the cnt column of the emp table indicate the number of employees whose salaries are in the specified percentile. Sample statement:

      select deptno, percentile_approx(sal, deptno, array(0.3, 0.5, 0.8), 1000)
        from emp group by deptno;

      The following result is returned:

      +------------+------------+
      | deptno     | _c1        |
      +------------+------------+
      | 10         | [1300.0,1875.0,3470.0] |
      | 20         | [950.0,2037.5,2987.5] |
      | 30         | [1070.0,1250.0,1580.0] |
      +------------+------------+

STDDEV

  • Syntax

    double stddev(double <colname>)
    decimal stddev(decimal <colname>)
  • Description

    Returns the population standard deviation of all input values.

  • Parameters

    colname: required. The name of a column, which can be of the DOUBLE or DECIMAL type. If the input value is of the STRING or BIGINT type, it is implicitly converted into the DOUBLE type before calculation.

  • Return value

    If the value of colname is null, the row that contains this value is not used for calculation. The following table describes the mappings between data types of input data and return values.

    Input type

    Return value type

    TINYINT

    DOUBLE

    SMALLINT

    DOUBLE

    INT

    DOUBLE

    BIGINT

    DOUBLE

    FLOAT

    DOUBLE

    DOUBLE

    DOUBLE

    DECIMAL

    DECIMAL

  • Examples

    • Example 1: Calculate the population standard deviation of salary (sal) values of all employees. Sample statement:

      select stddev(sal) from emp;

      The following result is returned:

      +------------+
      | _c0        |
      +------------+
      | 1262.7549932628976 |
      +------------+
    • Example 2: Use this function with GROUP BY to group all employees by department (deptno) and calculate the population standard deviation of salary values (sal) of employees in each department. Sample statement:

      select deptno, stddev(sal) from emp group by deptno;

      The following result is returned:

      +------------+------------+
      | deptno     | _c1        |
      +------------+------------+
      | 10         | 1546.1421524412158 |
      | 20         | 1004.7387720198718 |
      | 30         | 610.1001739241043 |
      +------------+------------+

STDDEV_SAMP

  • Syntax

    double stddev_samp(double <colname>)
    decimal stddev_samp(decimal <colname>)
  • Description

    Returns the sample standard deviation of all the input values.

  • Parameters

    colname: required. The name of a column, which can be of the DOUBLE or DECIMAL type. If the input value is of the STRING or BIGINT type, it is implicitly converted into the DOUBLE type before calculation.

  • Return value

    If the value of colname is null, the row that contains this value is not used for calculation. The following table describes the mappings between data types of input data and return values.

    Input type

    Return value type

    TINYINT

    DOUBLE

    SMALLINT

    DOUBLE

    INT

    DOUBLE

    BIGINT

    DOUBLE

    FLOAT

    DOUBLE

    DOUBLE

    DOUBLE

    DECIMAL

    DECIMAL

  • Examples

    • Example 1: Calculate the sample standard deviation of salary (sal) values of all employees. Sample statement:

      select stddev_samp(sal) from emp;

      The following result is returned:

      +------------+
      | _c0        |
      +------------+
      | 1301.6180541247609 |
      +------------+
    • Example 2: Use this function with GROUP BY to group all employees by department (deptno) and calculate the sample standard deviation of salary values of employees in each department. Sample statement:

      select deptno, stddev_samp(sal) from emp group by deptno;

      The following result is returned:

      +------------+------------+
      | deptno     | _c1        |
      +------------+------------+
      | 10         | 1693.7138680032901 |
      | 20         | 1123.3320969330487 |
      | 30         | 668.3312551921141 |
      +------------+------------+

SUM

  • Syntax

    DECIMAL|DOUBLE|BIGINT sum(<colname>)
  • Description

    Returns the sum of a column.

  • Parameters

    colname: required. Column values support all data types and can be converted into the DOUBLE type before calculation. The name of a column, which can be of the DOUBLE, DECIMAL, or BIGINT type. If the input value is of the STRING type, it is implicitly converted into the DOUBLE type before calculation.

  • Return value

    If the value of colname is null, the row that contains this value is not used for calculation. The following table describes the mappings between data types of input data and return values.

    Input type

    Return value type

    TINYINT

    BIGINT

    SMALLINT

    BIGINT

    INT

    BIGINT

    BIGINT

    BIGINT

    FLOAT

    DOUBLE

    DOUBLE

    DOUBLE

    DECIMAL

    DECIMAL

  • Examples

    • Example 1: Calculate the sum of salary (sal) values of all employees. Sample statement:

      select sum(sal) from emp;

      The following result is returned:

      +------------+
      | _c0        |
      +------------+
      | 37775      |
      +------------+
    • Example 2: Use this function with GROUP BY to group all employees by department (deptno) and calculate the sum of salary values of employees in each department. Sample statement:

      select deptno, sum(sal) from emp group by deptno;

      The following result is returned:

      +------------+------------+
      | deptno     | _c1        |
      +------------+------------+
      | 10         | 17500      |
      | 20         | 10875      |
      | 30         | 9400       |
      +------------+------------+

VAR_SAMP

  • Syntax

    double var_samp(<colname>)
  • Description

    Calculates the sample variance of a specified numeric column. This function is an additional function of MaxCompute V2.0.

  • Parameters

    colname: required. A column of the numeric data type. If the specified column is not a numeric column, a null value is returned.

  • Return value

    A value of the DOUBLE type is returned.

  • Examples

    • Example 1: Calculate the sample variance of the salary values (sal) of all employees. Sample statement:

      select var_samp(sal) from emp;

      The following result is returned:

      +------------+
      | _c0        |
      +------------+
      | 1694209.5588235292 |
      +------------+
    • Example 2: Use this function with GROUP BY to group all employees by department (deptno) and calculate the sample variance of the salary values (sal) of the employees in the same group. Sample statement:

      select deptno, var_samp(sal) from emp group by deptno;

      The following result is returned:

      +------------+------------+
      | deptno     | _c1        |
      +------------+------------+
      | 10         | 2868666.666666667 |
      | 20         | 1261875.0  |
      | 30         | 446666.6666666667 |
      +------------+------------+

VARIANCE/VAR_POP

  • Syntax

    double variance(<colname>)
    double var_pop(<colname>)
  • Description

    Calculates the variance of a specified numeric column.

  • Parameters

    colname: required. A column of the numeric data type. If the specified column is not a numeric column, a null value is returned. This function is an additional function of MaxCompute V2.0.

  • Return value

    A value of the DOUBLE type is returned.

  • Examples

    • Example 1: Calculate the variance of the salary values (sal) of all employees. Sample statement:

      select variance(sal) from emp;
      -- The preceding statement is equivalent to the following statement: 
      select var_pop(sal) from emp;

      The following result is returned:

      +------------+
      | _c0        |
      +------------+
      | 1594550.1730103805 |
      +------------+
    • Example 2: Use this function with GROUP BY to group all employees by department (deptno) and calculate the variance of salary values (sal) of employees in the same group. Sample statement:

      select deptno, variance(sal) from emp group by deptno;
      -- The preceding statement is equivalent to the following statement: 
      select deptno, var_pop(sal) from emp group by deptno;

      The following result is returned:

      +------------+------------+
      | deptno     | _c1        |
      +------------+------------+
      | 10         | 2390555.5555555555 |
      | 20         | 1009500.0  |
      | 30         | 372222.22222222225 |
      +------------+------------+

WM_CONCAT

  • Syntax

    string wm_concat(string <separator>, string <colname>)
  • Description

    Concatenates values in colname by using a delimiter that is specified by separator.

  • Parameters

    • separator: required. The delimiter, which is a constant of the STRING type.

    • colname: required. A value of the STRING type. If the input value is of the BIGINT, DOUBLE, or DATETIME type, the value is implicitly converted into the STRING type before calculation.

  • Return value (The GROUP BY clause is specified, and the ORDER BY clause is not specified.)

    A value of the STRING type is returned. The return value varies based on the following rules:

    • If the value of separator is not a constant of the STRING type, an error is returned.

    • If the value of colname is not of the STRING, BIGINT, DOUBLE, or DATETIME type, an error is returned.

    • If the value of colname is null, the row that contains this value is not used for calculation.

    Note

    If the value of table_name in the select wm_concat(',', name) from table_name; statement is an empty set, null is returned.

  • Examples

    • Example 1: Concatenate the names (ename) of all employees. Sample statement:

      select wm_concat(',', ename) from emp;

      The following result is returned:

      +------------+
      | _c0        |
      +------------+
      | SMITH,ALLEN,WARD,JONES,MARTIN,BLAKE,CLARK,SCOTT,KING,TURNER,ADAMS,JAMES,FORD,MILLER,JACCKA,WELAN,TEBAGE |
      +------------+
    • Example 2: Use this function with GROUP BY to group all employees by department (deptno) and concatenate the names (ename) of employees in each department. Sample statement:

      select deptno, wm_concat(',', ename) from emp group by deptno order by deptno;

      The following result is returned:

      +------------+------------+
      | deptno     | _c1        |
      +------------+------------+
      | 10         | CLARK,KING,MILLER,JACCKA,WELAN,TEBAGE |
      | 20         | SMITH,JONES,SCOTT,ADAMS,FORD |
      | 30         | ALLEN,WARD,MARTIN,BLAKE,TURNER,JAMES |
      +------------+------------+
    • Example 3: Use this function with GROUP BY to group all employees by department (deptno) and concatenate the salary values of the employees in each department after duplicates are removed. Sample statement:

      select deptno, wm_concat(distinct ',', sal) from emp group by deptno order by deptno;

      The following result is returned:

      +------------+------------+
      | deptno     | _c1        |
      +------------+------------+
      | 10         | 1300,2450,5000 |
      | 20         | 1100,2975,3000,800 |
      | 30         | 1250,1500,1600,2850,950 |
      +------------+------------+
    • Example 4: Use this function with GROUP BY and ORDER BY to group all employees by department (deptno), concatenate the salary values (sal) of all employees in each department, and sort the salary values (sal) in a specified order. Sample statement:

      select deptno, wm_concat(',',sal) within group(order by sal) from emp group by deptno order by deptno;

      The following result is returned:

      +------------+------------+
      |deptno|_c1|
      +------------+------------+
      |10|1300,1300,2450,2450,5000,5000|
      |20|800,1100,2975,3000,3000|
      |30|950,1250,1250,1500,1600,2850|
      +------------+------------+