Aggregate functions group multiple input records together to form a single output record. The input and output records have a many-to-one relationship. Aggregate functions can be used with a `GROUP BY`

clause of MaxCompute SQL. This topic describes the syntax, parameters, and examples of aggregate functions supported by MaxCompute SQL. It guides you through data development by using aggregate functions.

The following table describes the aggregate functions supported by MaxCompute SQL.

Function | Description |
---|---|

AVG | Calculates the average value of a column. |

COUNT | Calculates the number of records that match the specified criteria. |

COUNT_IF | Calculates the number of records whose specified expression is True. |

MAX | Calculates the maximum value of a column. |

MIN | Calculates the minimum value of a column. |

MEDIAN | Calculates the median value of a column. |

STDDEV | Returns the population standard deviation of all input values. |

STDDEV_SAMP | Returns the sample standard deviation of all input values. |

SUM | Calculates the sum of a column. |

WM_CONCAT | Concatenates strings with a specified delimiter. |

ANY_VALUE | Returns a non-deterministic value from a specified column. |

APPROX_DISTINCT | Returns the approximate number of distinct input values. |

ARG_MAX | Returns the column value of the row that corresponds to the maximum value of a specified column. |

ARG_MIN | Returns the column value of the row that corresponds to the minimum value of a specified column. |

COLLECT_LIST | Aggregates the specified columns into an array. |

COLLECT_SET | Aggregates the specified columns into an array with only distinct values. |

NUMERIC_HISTOGRAM | Returns the approximate histogram of a specified column. |

PERCENTILE_APPROX | Calculates an approximate percentile. This function is suitable for scenarios where large amounts of data are calculated. |

## Sample data

```
create table if not exists emp
(empno bigint,
ename string,
job string,
mgr bigint,
hiredate datetime,
sal bigint,
comm bigint,
deptno bigint);
tunnel upload emp.txt emp;
```

```
7369,SMITH,CLERK,7902,1980-12-17 00:00:00,800,,20
7499,ALLEN,SALESMAN,7698,1981-02-20 00:00:00,1600,300,30
7521,WARD,SALESMAN,7698,1981-02-22 00:00:00,1250,500,30
7566,JONES,MANAGER,7839,1981-04-02 00:00:00,2975,,20
7654,MARTIN,SALESMAN,7698,1981-09-28 00:00:00,1250,1400,30
7698,BLAKE,MANAGER,7839,1981-05-01 00:00:00,2850,,30
7782,CLARK,MANAGER,7839,1981-06-09 00:00:00,2450,,10
7788,SCOTT,ANALYST,7566,1987-04-19 00:00:00,3000,,20
7839,KING,PRESIDENT,,1981-11-17 00:00:00,5000,,10
7844,TURNER,SALESMAN,7698,1981-09-08 00:00:00,1500,0,30
7876,ADAMS,CLERK,7788,1987-05-23 00:00:00,1100,,20
7900,JAMES,CLERK,7698,1981-12-03 00:00:00,950,,30
7902,FORD,ANALYST,7566,1981-12-03 00:00:00,3000,,20
7934,MILLER,CLERK,7782,1982-01-23 00:00:00,1300,,10
7948,JACCKA,CLERK,7782,1981-04-12 00:00:00,5000,,10
7956,WELAN,CLERK,7649,1982-07-20 00:00:00,2450,,10
7956,TEBAGE,CLERK,7748,1982-12-30 00:00:00,1300,,10
```

## Filter expressions

- Limits
- Only built-in aggregate functions of MaxCompute support filter expressions. User-defined aggregate functions (UDAFs) do not support filter expressions.
`COUNT(*)`

cannot be used with filter expressions. Use the COUNT_IF function instead.

- Syntax
`<aggregate_name>(<expression>[,...]) [filter (where <where_condition>)]`

- Description
All aggregate functions support filter expressions. If a filter condition is specified, only the row data that meets the filter condition is passed to the related aggregate function for data processing.

- Parameters
- aggregate_name: required. The name of the aggregate function. Select the aggregate function described in this topic based on your requirements.
- expression: required. The parameters of the aggregate function. Specify this parameter based on the description of each aggregate function.
- where_condition: optional. The filter condition. For more information about where_condition, see where_condition.

- Return value
See the description for the return value of each aggregate function.

- Examples

The following result is returned:`select sum(sal) filter (where deptno=10), sum(sal) filter (where deptno=20), sum(sal) filter (where deptno=30) from emp;`

`+------------+------------+------------+ | _c0 | _c1 | _c2 | +------------+------------+------------+ | 17500 | 10875 | 9400 | +------------+------------+------------+`

## AVG

- Syntax
`double avg(double <colname>) decimal avg(decimal <colname>)`

- Description
Calculates the average value of a column.

- Parameters
colname: required. The name of a column, which must be of the DOUBLE or DECIMAL type. If the input value is of the STRING or BIGINT type, it is implicitly converted into the DOUBLE type before calculation. If the input value is of another data type, an error is returned. If the input value is NULL, the row that contains this value is not included in the calculation. The values of the BOOLEAN type are not included in the calculation.

- Return value
If the input value is of the DECIMAL type, a value of the DECIMAL type is returned. If the input value is of another valid data type, a value of the DOUBLE type is returned.

- Examples
- Example 1: Calculate the average salary (sal) of all employees. Sample statement:

The following result is returned:`select avg(sal) from emp;`

`+------------+ | _c0 | +------------+ | 2222.0588235294117 | +------------+`

- Example 2: Use this function with
`GROUP BY`

to group all employees by department (deptno) and calculate the average salary (sal) of employees in each department. Sample statement:

The following result is returned:`select deptno, avg(sal) from emp group by deptno;`

`+------------+------------+ | deptno | _c1 | +------------+------------+ | 10 | 2916.6666666666665 | | 20 | 2175.0 | | 30 | 1566.6666666666667 | +------------+------------+`

- Example 1: Calculate the average salary (sal) of all employees. Sample statement:

## COUNT

- Syntax
`bigint count([distinct|all] <value>)`

- Description
Calculates the number of records that match the specified criteria.

- Parameters
- distinct|all: optional. This parameter specifies whether to remove duplicates during the counting. The default value is all, which indicates that all records are counted. If this parameter is set to distinct, only records with distinct values are counted.
- colname: required. The name of a column, which can be of any data type. If the input value is NULL, the row that contains this value is not included in the calculation. The value of colname can be an asterisk
`*`

.`count(*)`

indicates that the number of all rows is returned.

- Return value
A value of the BIGINT type is returned.

- Examples
- Example 1: Calculate the total number of employees in all departments. Sample statement:

The following result is returned:`select count(*) from emp;`

`+------------+ | _c0 | +------------+ | 17 | +------------+`

- Example 2: Use this function with
`GROUP BY`

to group all employees by department (deptno) and calculate the number of employees in each department. Sample statement:

The following result is returned:`select deptno, count(*) from emp group by deptno;`

`+------------+------------+ | deptno | _c1 | +------------+------------+ | 10 | 6 | | 20 | 5 | | 30 | 6 | +------------+------------+`

- Example 3: Remove duplicates when you calculate the number of departments. Sample statement:

The following result is returned:`select count(distinct deptno) from emp;`

`+------------+ | _c0 | +------------+ | 3 | +------------+`

- Example 1: Calculate the total number of employees in all departments. Sample statement:

## COUNT_IF

- Syntax
`bigint count_if(boolean <expr>)`

- Description
Calculates the number of records whose expr value is True.

- Parameters
expr: required. A BOOLEAN expression. If the value of the expr parameter is False or the value of a specified column in expr is NULL, the row that contains this value is not included in the calculation.

- Return value
A value of the BIGINT type is returned.

- Examples

The following result is returned:`select count_if(sal > 1000), count_if(sal <=1000) from emp;`

`+------------+------------+ | _c0 | _c1 | +------------+------------+ | 15 | 2 | +------------+------------+`

## MAX

- Syntax
`max(<colname>)`

- Description
Calculates the maximum value of a column.

- Parameters
colname: required. The name of a column, which can be of any data type. If the input value is NULL, the row that contains this value is not included in the calculation. The values of the BOOLEAN type are not included in the calculation.

- Return value
The type of the return value is the same as the type of the colname parameter.

- Examples
- Example 1: Calculate the highest salary (sal) of all employees. Sample statement:

The following result is returned:`select max(sal) from emp;`

`+------------+ | _c0 | +------------+ | 5000 | +------------+`

- Example 2: Use this function with
`GROUP BY`

to group all employees by department (deptno) and calculate the highest salary of employees in each department. Sample statement:

The following result is returned:`select deptno, max(sal) from emp group by deptno;`

`+------------+------------+ | deptno | _c1 | +------------+------------+ | 10 | 5000 | | 20 | 3000 | | 30 | 2850 | +------------+------------+`

- Example 1: Calculate the highest salary (sal) of all employees. Sample statement:

## MIN

- Syntax
`min(<colname>)`

- Description
Calculates the minimum value of a column.

- Parameters
colname: required. The name of a column, which can be of any data type. If the input value is NULL, the row that contains this value is not included in the calculation. The values of the BOOLEAN type are not included in the calculation.

- Return value
The type of the return value is the same as the type of the colname parameter.

- Examples
- Example 1: Calculate the lowest salary (sal) of all employees. Sample statement:

The following result is returned:`select min(sal) from emp;`

`+------------+ | _c0 | +------------+ | 800 | +------------+`

- Example 2: Use this function with
`GROUP BY`

to group all employees by department (deptno) and calculate the lowest salary of employees in each department. Sample statement:

The following result is returned:`select deptno, min(sal) from emp group by deptno;`

`+------------+------------+ | deptno | _c1 | +------------+------------+ | 10 | 1300 | | 20 | 800 | | 30 | 950 | +------------+------------+`

- Example 1: Calculate the lowest salary (sal) of all employees. Sample statement:

## MEDIAN

- Syntax
`double median(double <colname>) decimal median(decimal <colname>)`

- Description
Calculates the median value of a column.

- Parameters
colname: required. The name of a column, which must be of the DOUBLE or DECIMAL type. If the input value is of the STRING or BIGINT type, it is implicitly converted into the DOUBLE type before calculation. If the input value is NULL, the row that contains this value is not included in the calculation. If the input value is of another data type, an error is returned.

- Return value
A value of the DOUBLE or DECIMAL type is returned.

- Examples
- Example 1: Calculate the median value of salaries (sal) of all employees. Sample statement:

The following result is returned:`select median(sal) from emp;`

`+------------+ | _c0 | +------------+ | 1600.0 | +------------+`

- Example 2: Use this function with
`GROUP BY`

to group all employees by department (deptno) and calculate the median value of salaries of employees in each department. Sample statement:

The following result is returned:`select deptno, median(sal) from emp group by deptno;`

`+------------+------------+ | deptno | _c1 | +------------+------------+ | 10 | 2450.0 | | 20 | 2975.0 | | 30 | 1375.0 | +------------+------------+`

- Example 1: Calculate the median value of salaries (sal) of all employees. Sample statement:

## STDDEV

- Syntax
`double stddev(double <colname>) decimal stddev(decimal <colname>)`

- Description
Returns the population standard deviation of all input values.

- Parameters
colname: required. A value of the DOUBLE or DECIMAL type. If the input value is of the STRING or BIGINT type, it is implicitly converted into the DOUBLE type before calculation. If the input value is NULL, the row that contains this value is not included in the calculation. If the input value is of another data type, an error is returned.

- Return value
A value of the DOUBLE or DECIMAL type is returned.

- Examples
- Example 1: Calculate the population standard deviation of salaries (sal) of all employees. Sample statement:

The following result is returned:`select stddev(sal) from emp;`

`+------------+ | _c0 | +------------+ | 1262.7549932628976 | +------------+`

- Example 2: Use this function with
`GROUP BY`

to group all employees by department (deptno) and calculate the population standard deviation of salaries of employees in each department. Sample statement:

The following result is returned:`select deptno, stddev(sal) from emp group by deptno;`

`+------------+------------+ | deptno | _c1 | +------------+------------+ | 10 | 1546.1421524412158 | | 20 | 1004.7387720198718 | | 30 | 610.1001739241043 | +------------+------------+`

- Example 1: Calculate the population standard deviation of salaries (sal) of all employees. Sample statement:

## STDDEV_SAMP

- Syntax
`double stddev_samp(double <colname>) decimal stddev_samp(decimal <colname>)`

- Description
Returns the sample standard deviation of all input values.

- Parameters
colname: required. The name of a column, which must be of the DOUBLE or DECIMAL type. If the input value is of the STRING or BIGINT type, it is implicitly converted into the DOUBLE type before calculation. If the input value is NULL, the row that contains this value is not included in the calculation. If the input value is of another data type, an error is returned.

- Return value
A value of the DOUBLE or DECIMAL type is returned.

- Examples
- Example 1: Calculate the sample standard deviation of salaries (sal) of all employees. Sample statement:

The following result is returned:`select stddev_samp(sal) from emp;`

`+------------+ | _c0 | +------------+ | 1301.6180541247609 | +------------+`

- Example 2: Use this function with
`GROUP BY`

to group all employees by department (deptno) and calculate the sample standard deviation of salaries of employees in each department. Sample statement:

The following result is returned:`select deptno, stddev_samp(sal) from emp group by deptno;`

`+------------+------------+ | deptno | _c1 | +------------+------------+ | 10 | 1693.7138680032901 | | 20 | 1123.3320969330487 | | 30 | 668.3312551921141 | +------------+------------+`

- Example 1: Calculate the sample standard deviation of salaries (sal) of all employees. Sample statement:

## SUM

- Syntax
`sum(<colname>)`

- Description
Calculates the sum of a column.

- Parameters
colname: required. A value of the DOUBLE, DECIMAL, or BIGINT type, which is the name of a column. If the input value is of the STRING type, it is implicitly converted into the DOUBLE type before calculation. If the input value is NULL, the row that contains this value is not included in the calculation. The values of the BOOLEAN type are not included in the calculation.

- Return value
If the input parameter is of the BIGINT type, a value of the BIGINT type is returned. If the input parameter is of the DOUBLE or STRING type, a value of the DOUBLE type is returned.

- Examples
- Example 1: Calculate the sum of salaries (sal) of all employees. Sample statement:

The following result is returned:`select sum(sal) from emp;`

`+------------+ | _c0 | +------------+ | 37775 | +------------+`

- Example 2: Use this function with
`GROUP BY`

to group all employees by department (deptno) and calculate the sum of salaries (sal) of employees in each department. Sample statement:

The following result is returned:`select deptno, sum(sal) from emp group by deptno;`

`+------------+------------+ | deptno | _c1 | +------------+------------+ | 10 | 17500 | | 20 | 10875 | | 30 | 9400 | +------------+------------+`

- Example 1: Calculate the sum of salaries (sal) of all employees. Sample statement:

## WM_CONCAT

- Syntax
`string wm_concat(string <separator>, string <colname>)`

- Description
Concatenates values in colname with a delimiter specified by separator.

- Parameters
- separator: required. The delimiter, which is a constant of the STRING type If the input value is of another data type or is not a constant, an error is returned.
- colname: required. A value of the STRING type. If the input value is of the BIGINT, DOUBLE, or DATETIME type, it is implicitly converted into the STRING type before calculation. If the input value is of another data type, an error is returned.

- Return value
A value of the STRING type is returned.
**Note**If`table_name`

in the`select wm_concat(',', name) from table_name;`

statement is an empty set, NULL is returned. - Examples
- Example 1: Concatenate the names (ename) of all employees. Sample statement:

The following result is returned:`select wm_concat(',', ename) from emp;`

`+------------+ | _c0 | +------------+ | SMITH,ALLEN,WARD,JONES,MARTIN,BLAKE,CLARK,SCOTT,KING,TURNER,ADAMS,JAMES,FORD,MILLER,JACCKA,WELAN,TEBAGE | +------------+`

- Example 2: Use this function with
`GROUP BY`

to group all employees by department (deptno) and concatenate the names (ename) of employees in each department. Sample statement:

The following result is returned:`select deptno, wm_concat(',', ename) from emp group by deptno order by deptno;`

`+------------+------------+ | deptno | _c1 | +------------+------------+ | 10 | CLARK,KING,MILLER,JACCKA,WELAN,TEBAGE | | 20 | SMITH,JONES,SCOTT,ADAMS,FORD | | 30 | ALLEN,WARD,MARTIN,BLAKE,TURNER,JAMES | +------------+------------+`

- Example 3: Use this function with
`GROUP BY`

to group all employees by department (deptno) and concatenate the salary values of the employees in each department after duplicates are removed. Sample statement:

The following result is returned:`select deptno, wm_concat(distinct ',', sal) from emp group by deptno order by deptno;`

`+------------+------------+ | deptno | _c1 | +------------+------------+ | 10 | 1300,2450,5000 | | 20 | 1100,2975,3000,800 | | 30 | 1250,1500,1600,2850,950 | +------------+------------+`

- Example 1: Concatenate the names (ename) of all employees. Sample statement:

## Additional functions of MaxCompute V2.0

- Session level: To use a new data type, you must insert
`set odps.sql.type.system.odps2=true;`

before the SQL statement, and commit and execute them together. - Project level: The project owner can configure new data types for the project as required. It takes 10 to 15 minutes for the settings to take effect. Run the following command:

For more information about`setproject odps.sql.type.system.odps2=true;`

`setproject`

, see Project operations. For the precautions you must take when you enable data types at the project level, see Data types.

## ANY_VALUE

- Syntax
`any_value(<colname>)`

- Description
Returns a non-deterministic value from a specified column. This function is an additional function of MaxCompute V2.0.

- Parameters
colname: required. The name of a column, which can be of any data type. If the input value is NULL, the row that contains this value is not included in the calculation.

- Return value
The type of the return value is the same as the type of the colname parameter.

- Examples
- Example 1: Select one of the employees. Sample statement:

The following result is returned:`select any_value(ename) from emp;`

`+------------+ | _c0 | +------------+ | SMITH | +------------+`

- Example 2: Use this function with
`GROUP BY`

to group all employees by department (deptno) and select one employee from each group. Sample statement:

The following result is returned:`select deptno, any_value(ename) from emp group by deptno;`

`+------------+------------+ | deptno | _c1 | +------------+------------+ | 10 | CLARK | | 20 | SMITH | | 30 | ALLEN | +------------+------------+`

- Example 1: Select one of the employees. Sample statement:

## ARG_MAX

- Syntax
`arg_max(<valueToMaximize>, <valueToReturn>)`

- Description
Finds the row where the maximum value of valueToMaximize resides and returns the value of valueToReturn in the row. This function is an additional function of MaxCompute V2.0.

- Parameters
- valueToMaximize: required. A value of any data type.
- valueToReturn: required. A value of any data type.

- Return value
The type of the return value is the same as the type of valueToReturn.

- Examples
- Example 1: Return the name of the employee with the highest salary. Sample statement:

The following result is returned:`select arg_max(sal, ename) from emp;`

`+------------+ | _c0 | +------------+ | KING | +------------+`

- Example 2: Use this function with
`GROUP BY`

to group all employees by department (deptno) and return the name of the employee with the highest salary in each department. Sample statement:

The following result is returned:`select deptno, arg_max(sal, ename) from emp group by deptno;`

`+------------+------------+ | deptno | _c1 | +------------+------------+ | 10 | KING | | 20 | SCOTT | | 30 | BLAKE | +------------+------------+`

- Example 1: Return the name of the employee with the highest salary. Sample statement:

## ARG_MIN

- Syntax
`arg_min(<valueToMinimize>, <valueToReturn>)`

- Description
Finds the row where the minimum value of valueToMinimize resides and returns the value of valueToReturn in the row. This function is an additional function of MaxCompute V2.0.

- Parameters
- valueToMinimize: required. A value of any data type.
- valueToReturn: required. A value of any data type.

- Return value
The type of the return value is the same as the type of valueToReturn.

- Examples
- Example 1: Return the name of the employee with the lowest salary. Sample statement:

The following result is returned:`select arg_min(sal, ename) from emp;`

`+------------+ | _c0 | +------------+ | SMITH | +------------+`

- Example 2: Use this function with
`GROUP BY`

to group all employees by department (deptno) and return the name of the employee with the lowest salary in each department. Sample statement:

The following result is returned:`select deptno, arg_min(sal, ename) from emp group by deptno;`

`+------------+------------+ | deptno | _c1 | +------------+------------+ | 10 | MILLER | | 20 | SMITH | | 30 | JAMES | +------------+------------+`

- Example 1: Return the name of the employee with the lowest salary. Sample statement:

## APPROX_DISTINCT

- Syntax
`approx_distinct(<colname>)`

- Description
Returns the approximate number of distinct input values in a specified column. This function is an additional function of MaxCompute V2.0.

- Parameters
colname: required. The column from which duplicates need to be removed.

- Return value
A value of the BIGINT type is returned. If all input values are empty, 0 is returned. This function produces a standard error of 5%.

- Examples
- Example 1: Calculate the approximate number of distinct values in the sal column. Sample statement:

The following result is returned:`select approx_distinct(sal) from emp;`

`+-------------------+ | numdistinctvalues | +-------------------+ | 12 | +-------------------+`

- Example 2: Use this function with
`GROUP BY`

to group all employees by department (deptno) and calculate the approximate number of distinct values in the sal column. Sample statement:

The following result is returned:`select deptno, approx_distinct(sal) from emp group by deptno;`

`+------------+-------------------+ | deptno | numdistinctvalues | +------------+-------------------+ | 10 | 3 | | 20 | 4 | | 30 | 5 | +------------+-------------------+`

- Example 1: Calculate the approximate number of distinct values in the sal column. Sample statement:

## COLLECT_LIST

- Syntax
`array collect_list(<colname>)`

- Description
Aggregates the values specified by colname into an array. This function is an additional function of MaxCompute V2.0.

- Parameters
colname: required. The name of a column, which can be of any data type.

- Return value
A value of the ARRAY type is returned.

- Examples
- Example 1: Aggregate the salaries (sal) of all employees into an array. Sample statement:

The following result is returned:`select collect_list(sal) from emp;`

`+------------+ | _c0 | +------------+ | [800,1600,1250,2975,1250,2850,2450,3000,5000,1500,1100,950,3000,1300,5000,2450,1300] | +------------+`

- Example 2: Use this function with
`GROUP BY`

to group all employees by department (deptno) and aggregate the salaries (sal) of the employees in each department. Sample statement:

The following result is returned:`select deptno, collect_list(sal) from emp group by deptno;`

`+------------+------------+ | deptno | _c1 | +------------+------------+ | 10 | [2450,5000,1300,5000,2450,1300] | | 20 | [800,2975,3000,1100,3000] | | 30 | [1600,1250,1250,2850,1500,950] | +------------+------------+`

- Example 3: Use this function with
`GROUP BY`

to group all employees by department (deptno) and aggregate the salaries (sal) of the employees in each department after duplicates are removed. Sample statement:

The following result is returned:`select deptno, collect_list(distinct sal) from emp group by deptno;`

`+------------+------------+ | deptno | _c1 | +------------+------------+ | 10 | [1300,2450,5000] | | 20 | [800,1100,2975,3000] | | 30 | [950,1250,1500,1600,2850] | +------------+------------+`

- Example 1: Aggregate the salaries (sal) of all employees into an array. Sample statement:

## COLLECT_SET

- Syntax
`array collect_set(<colname>)`

- Description
Aggregates the values specified by colname into an array with only distinct values. This function is an additional function of MaxCompute V2.0.

- Parameters
colname: required. The name of a column, which can be of any data type.

- Return value
A value of the ARRAY type is returned.

- Examples
- Example 1: Aggregate the salaries (sal) of all employees into an array with only distinct values. Sample statement:

The following result is returned:`select collect_set(sal) from emp;`

`+------------+ | _c0 | +------------+ | [800,950,1100,1250,1300,1500,1600,2450,2850,2975,3000,5000] | +------------+`

- Example 2: Use this function with
`GROUP BY`

to group all employees by department (deptno) and aggregate the salaries (sal) of the employees in each department into an array with only distinct values. Sample statement:

The following result is returned:`select deptno, collect_set(sal) from emp group by deptno;`

`+------------+------------+ | deptno | _c1 | +------------+------------+ | 10 | [1300,2450,5000] | | 20 | [800,1100,2975,3000] | | 30 | [950,1250,1500,1600,2850] | +------------+------------+`

- Example 1: Aggregate the salaries (sal) of all employees into an array with only distinct values. Sample statement:

## NUMERIC_HISTOGRAM

- Syntax
`map<double key, double value> numeric_histogram(bigint <buckets>, double <colname>)`

- Description
Calculates the approximate histogram of a specified column. This function is an additional function of MaxCompute V2.0.

- Parameters
- buckets: required. A value of the BIGINT type. This parameter indicates the maximum number of buckets in the column whose approximate histogram is returned.
- colname: required. A value of the DOUBLE type. This parameter indicates the columns whose approximate histograms need to be calculated.

- Return value
A value of the

`map<double key, double value>`

type is returned. In the return value, key indicates the x-axis of the approximate histogram, and value indicates the approximate height of the y-axis of the approximate histogram. - Examples
Calculate the approximate histogram of the sal column. Sample statement:

The following result is returned:`select numeric_histogram(5, sal) from emp;`

`+------------+ | _c0 | +------------+ | {"1328.5714285714287":7.0,"2450.0":2.0,"5000.0":2.0,"875.0":2.0,"2956.25":4.0} | +------------+`

## PERCENTILE_APPROX

- Syntax
`double percentile_approx (double <colname>, <p> [, <B>])) -- Returns multiple approximate percentiles as an array. array<double> percentile_approx (double <colname>, array(<p1> [, <p2>...]) [, <B>])`

- Description
Calculates approximate percentiles. This function applies to the scenarios where large amounts of data are calculated. This function sorts the values in a specified column in descending order and then returns the approximate percentile of the column at the p

^{th}percentage. The value of`percentile_approx`

starts from 1. For example, if a column contains values of 100, 200, and 300, the sequence numbers of these values are 1, 2, and 3. If you calculate the 0.6^{th}percentile of the column, the`PERCENTILE_APPROX`

function returns 180, which is calculated by using the following formula:`100 + (200 - 100) × 0.8`

. This value corresponds to 1.8, which is calculated by using the formula 3 × 0.6 and is between sequence numbers 1 and 2. This function is an additional function of MaxCompute V2.0.**Note**Differences between`PERCENTILE_APPROX`

and`PERCENTILE`

:`PERCENTILE_APPROX`

is used to calculate the approximate percentile, and`PERCENTILE`

is used to calculate the exact percentile. If the amount of data is large,`PERCENTILE`

may fail to run due to memory limits, but`PERCENTILE_APPROX`

does not have this issue.- The implementation of
`PERCENTILE_APPROX`

is consistent with that of`PERCENTILE_APPROX`

in Hive, but the calculation algorithm is different from`PERCENTILE`

. If the amount of data is small, the execution result is different from that of`PERCENTILE`

.

- Parameters
- colname: required. A value of the DOUBLE type, which indicates the name of a column.
- p: required. The approximate percentile. Valid values:
`[0.0, 1.0]`

. - B: the accuracy of the return value. A higher accuracy indicates a more accurate value. If you do not specify this parameter, 10000 is used. If the value of colname is less than the value of B, the exact percentile is returned.

- Return value
A value of the DOUBLE or ARRAY type is returned.

- Examples
- Example1: Calculate the 0.3
^{rd}percentile in the sal column. Sample statement:

The following result is returned:`select percentile_approx(sal, 0.3) from emp;`

`+------------+ | _c0 | +------------+ | 1252.5 | +------------+`

- Example 2: Use this function with
`GROUP BY`

to group all employees by department (deptno) and calculate the 0.3^{rd}percentile of employees in each department in the sal column. Sample statement:

The following result is returned:`select deptno, percentile_approx(sal, 0.3) from emp group by deptno;`

`+------------+------------+ | deptno | _c1 | +------------+------------+ | 10 | 1300.0 | | 20 | 950.0 | | 30 | 1070.0 | +------------+------------+`

- Example 3: Use this function with
`GROUP BY`

to group all employees by department (deptno) and calculate the 0.3^{rd}, 0.5^{th}, and 0.8^{th}percentiles of employees in each department in the sal column. Sample statement:

The following result is returned:`select deptno, percentile_approx(sal, array(0.3, 0.5, 0.8), 1000) from emp group by deptno;`

`+------------+------------+ | deptno | _c1 | +------------+------------+ | 10 | [1300.0,1875.0,3470.000000000001] | | 20 | [950.0,2037.5,2987.5] | | 30 | [1070.0,1250.0,1580.0] | +------------+------------+`

- Example1: Calculate the 0.3