You can use window functions to flexibly analyze and process column data in a window in MaxCompute SQL. This topic describes the syntax, parameters, and examples of window functions supported by MaxCompute SQL. It guides you through data development by using window functions.

The following table lists window functions supported by MaxCompute SQL.

Function | Description |
---|---|

COUNT | Counts the rows. |

AVG | Calculates the average value of a column. |

MAX | Returns the maximum value of a column. |

MIN | Returns the minimum value of a column. |

MEDIAN | Calculates the median value of a column. |

STDDEV | Calculates the population standard deviation of a column. |

STDDEV_SAMP | Calculates the sample standard deviation of a column. |

SUM | Calculates the sum of a column. |

DENSE_RANK | Calculates the ranking of a row in an ordered group of rows. |

RANK | Calculates the ranking of a value in a set of values. |

LAG | Returns the values for a row at a given offset preceding the current row. |

LEAD | Returns the values for a row at a given offset following the current row. |

PERCENT_RANK | Calculates the relative percent ranking of a row in a group of data. |

ROW_NUMBER | Calculates the ordinal number of a row. |

CLUSTER_SAMPLE | Performs cluster sampling. |

CUME_DIST | Calculates the cumulative distribution. |

## Limits

The following limits apply to window functions:

- Window functions can be included only in
`SELECT`

statements. - A window function cannot contain nested window functions or aggregate functions.
- Window functions cannot be used with aggregate functions of the same level.
- A maximum of five window functions can be used in a MaxCompute SQL statement.
- A window can contain a maximum of 100 million rows of data. We recommend that you use a maximum of 5 million rows in a window. If the number of rows exceeds 5 million, an error is returned.

## Syntax

```
window_func() over (partition by <col1>[,<col2>…]
[order by <col1>[asc|desc][, <col2>[asc|desc]…]] <windowing_clause>)
```

- partition by <col1>[,<col2>…]: required. This parameter specifies the columns that define the window.
Rows with the same values in partition key columns are considered in the same window.

**Note**`partition by UID`

indicates that records with the same UID are considered in the same window. Therefore, the records with the same UID can contain a maximum of 100 million rows of data. We recommend that you use a maximum of 5 million rows in a window. - order by <col1>[asc|desc][, <col2>[asc|desc]…]: optional. This parameter specifies the rule used to sort data in a window.
**Note**If some values in`ORDER BY`

are the same, the sorting result may not be accurate. To reduce randomness, keep each value in`ORDER BY`

unique. - windowing_clause: optional. You can use the following
`ROWS`

functions in windowing_clause to specify the windowing method:`rows between x preceding|following and y preceding|following`

: indicates a window range from xth row preceding or following the current row to the yth row preceding or following the current row.`rows x preceding|following`

: indicates a window range from the xth row preceding or following the current row to the current row.

**Note**- x and y must be integer constants that are greater than or equal to 0. Their values range from 0 to 10000. The value 0 indicates the current row. Before you use
`ROWS`

to specify the windowing method, you must specify`ORDER BY`

. - Only the following window functions can use
`ROWS`

to specify the windowing method: AVG, COUNT, MAX, MIN, STDDEV, and SUM.

## Sample data

```
create table if not exists emp
(empno bigint,
ename string,
job string,
mgr bigint,
hiredate datetime,
sal bigint,
comm bigint,
deptno bigint);
tunnel upload emp.txt emp;
```

```
7369,SMITH,CLERK,7902,1980-12-17 00:00:00,800,,20
7499,ALLEN,SALESMAN,7698,1981-02-20 00:00:00,1600,300,30
7521,WARD,SALESMAN,7698,1981-02-22 00:00:00,1250,500,30
7566,JONES,MANAGER,7839,1981-04-02 00:00:00,2975,,20
7654,MARTIN,SALESMAN,7698,1981-09-28 00:00:00,1250,1400,30
7698,BLAKE,MANAGER,7839,1981-05-01 00:00:00,2850,,30
7782,CLARK,MANAGER,7839,1981-06-09 00:00:00,2450,,10
7788,SCOTT,ANALYST,7566,1987-04-19 00:00:00,3000,,20
7839,KING,PRESIDENT,,1981-11-17 00:00:00,5000,,10
7844,TURNER,SALESMAN,7698,1981-09-08 00:00:00,1500,0,30
7876,ADAMS,CLERK,7788,1987-05-23 00:00:00,1100,,20
7900,JAMES,CLERK,7698,1981-12-03 00:00:00,950,,30
7902,FORD,ANALYST,7566,1981-12-03 00:00:00,3000,,20
7934,MILLER,CLERK,7782,1982-01-23 00:00:00,1300,,10
7948,JACCKA,CLERK,7782,1981-04-12 00:00:00,5000,,10
7956,WELAN,CLERK,7649,1982-07-20 00:00:00,2450,,10
7956,TEBAGE,CLERK,7748,1982-12-30 00:00:00,1300,,10
```

## COUNT

- Syntax
`bigint count([distinct] <expr>) over (partition by <col 1>[, <col 2>…] [order by col1[asc|desc][, col2[asc|desc]…]] [<windowing_clause>])`

- Description
This function counts the rows.

- Parameters
- expr: required. This parameter specifies the column whose values you want to count. The column can be of any data type. If the value for a row is NULL, this row is not used for the calculation. If the distinct keyword is specified, only distinct values are counted.
- partition by <col1>[, <col2>…]: the columns that define the window. You must specify at least one column.
- order by col1[asc|desc][, col2[asc|desc]…]: optional. If ORDER BY is not specified, the number of rows for expr in the current window is returned. If ORDER BY is specified, the returned results are sorted in the specified order and the cumulative count from the starting row to the current row in the current window is returned.
**Note**- If the distinct keyword is specified, ORDER BY cannot be used.
- If duplicate values are specified for ORDER BY, the processing method varies based on the compatibility between MaxCompute and Hive. For more information, see the examples in this section.

- Return value
A value of the BIGINT type is returned.

- Examples
- Example 1: Use the sal column to define the window. ORDER BY is not specified. This function returns the cumulative count from the starting row to the last row in the current window. The current window indicates the rows that have the same sal value. Sample statement:

The following result is returned:`select sal, count(sal) over (partition by sal) as count from emp;`

`+------------+------------+ | sal | count | +------------+------------+ | 800 | 1 | | 950 | 1 | | 1100 | 1 | | 1250 | 2 | -- This row is the starting row of this window. The value of COUNT for the first row is the cumulative count for the second row because the two rows have the same sal value. | 1250 | 2 | -- The return value is the cumulative count from the first row to the second row in the current window. | 1300 | 2 | | 1300 | 2 | | 1500 | 1 | | 1600 | 1 | | 2450 | 2 | | 2450 | 2 | | 2850 | 1 | | 2975 | 1 | | 3000 | 2 | | 3000 | 2 | | 5000 | 2 | | 5000 | 2 | +------------+------------+`

- Example 2: In non-Hive-compatible mode, use the sal column to define the window. ORDER BY is specified. This function returns the cumulative count from the starting row to the current row in the current window. The current window indicates the rows that have the same sal value. Sample statement:

The following result is returned:`-- Disable the Hive-compatible mode. set odps.sql.hive.compatible=false; -- Execute the following statement: select sal, count(sal) over (partition by sal order by sal) as count from emp;`

`+------------+------------+ | sal | count | +------------+------------+ | 800 | 1 | | 950 | 1 | | 1100 | 1 | | 1250 | 1 | -- This row is the starting row of this window. The cumulative count for the first row is 1. | 1250 | 2 | -- The cumulative count for the second row is 2. | 1300 | 1 | | 1300 | 2 | | 1500 | 1 | | 1600 | 1 | | 2450 | 1 | | 2450 | 2 | | 2850 | 1 | | 2975 | 1 | | 3000 | 1 | | 3000 | 2 | | 5000 | 1 | | 5000 | 2 | +------------+------------+`

- Example 3: In Hive-compatible mode, use the sal column to define the window. ORDER BY is specified. This function returns the cumulative count from the starting row to the last row in the current window. The current window indicates the rows that have the same sal value. Sample statement:

The following result is returned:`-- Enable the Hive-compatible mode. set odps.sql.hive.compatible=true; -- Execute the following statement: select sal, count(sal) over (partition by sal order by sal) as count from emp;`

`+------------+------------+ | sal | count | +------------+------------+ | 800 | 1 | | 950 | 1 | | 1100 | 1 | | 1250 | 2 | -- This row is the starting row of this window. The value of COUNT for the first row is the cumulative count for the second row because the two rows have the same sal value. | 1250 | 2 | -- The return value is the cumulative count from the first row to the second row in the current window. | 1300 | 2 | | 1300 | 2 | | 1500 | 1 | | 1600 | 1 | | 2450 | 2 | | 2450 | 2 | | 2850 | 1 | | 2975 | 1 | | 3000 | 2 | | 3000 | 2 | | 5000 | 2 | | 5000 | 2 | +------------+------------+`

- Example 1: Use the sal column to define the window. ORDER BY is not specified. This function returns the cumulative count from the starting row to the last row in the current window. The current window indicates the rows that have the same sal value. Sample statement:

## AVG

- Syntax
`avg([distinct] <expr>) over (partition by <col1>[, <col2>…] [order by <col1>[asc|desc][, <col2>[asc|desc]…]] [<windowing_clause>])`

- Description
This function calculates the average value of a column.

- Parameters
- expr: required. This parameter specifies the column for which you want to calculate the average value. The column is of the DOUBLE or DECIMAL type. If the distinct keyword is specified, the average value of distinct values is calculated.
- If the input value is of the STRING or BIGINT type, it is implicitly converted into a value of the DOUBLE type before the calculation. If it is of another data type, an error is returned.
- If the value for a row is NULL, this row is not used for the calculation.

- partition by <col1>[, <col2>…]: the columns that define the window. You must specify at least one column.
- order by col1[asc|desc][, col2[asc|desc]…]: optional. If ORDER BY is not specified, the average value of expr in the current window is returned. If ORDER BY is specified, the returned results are sorted in the specified order and the cumulative average value of the values from the starting row to the current row in the current window is returned.
**Note**- If the distinct keyword is specified, ORDER BY cannot be used.
- If duplicate values are specified for ORDER BY, the processing method varies based on the compatibility between MaxCompute and Hive. For more information, see the examples in this section.

- expr: required. This parameter specifies the column for which you want to calculate the average value. The column is of the DOUBLE or DECIMAL type. If the distinct keyword is specified, the average value of distinct values is calculated.
- Return value
A value of the DOUBLE type is returned.

- Examples
- Example 1: Use the deptno column to define the window and calculate the average value of the sal column. ORDER BY is not specified. This function returns the cumulative average value of the values from the starting row to the last row in the current window. The current window indicates the rows that have the same deptno value. Sample statement:

The following result is returned:`select deptno, sal, avg(sal) over (partition by deptno) from emp;`

`+------------+------------+------------+ | deptno | sal | _c2 | +------------+------------+------------+ | 10 | 1300 | 2916.6666666666665 | -- This row is the starting row of this window. The return value is the cumulative average value of the values from the first row to the sixth row. | 10 | 2450 | 2916.6666666666665 | -- The return value is the cumulative average value of the values from the first row to the sixth row. | 10 | 5000 | 2916.6666666666665 | -- The return value is the cumulative average value of the values from the first row to the sixth row. | 10 | 1300 | 2916.6666666666665 | | 10 | 5000 | 2916.6666666666665 | | 10 | 2450 | 2916.6666666666665 | | 20 | 3000 | 2175.0 | | 20 | 3000 | 2175.0 | | 20 | 800 | 2175.0 | | 20 | 1100 | 2175.0 | | 20 | 2975 | 2175.0 | | 30 | 1500 | 1566.6666666666667 | | 30 | 950 | 1566.6666666666667 | | 30 | 1600 | 1566.6666666666667 | | 30 | 1250 | 1566.6666666666667 | | 30 | 1250 | 1566.6666666666667 | | 30 | 2850 | 1566.6666666666667 | +------------+------------+------------+`

- Example 2: In non-Hive-compatible mode, use the deptno column to define the window and calculate the average value of the sal column. ORDER BY is specified. This function returns the cumulative average value of the values from the starting row to the current row in the current window. The current window indicates the rows that have the same deptno value. Sample statement:

The following result is returned:`-- Disable the Hive-compatible mode. set odps.sql.hive.compatible=false; -- Execute the following statement: select deptno, sal, avg(sal) over (partition by deptno order by sal) from emp;`

`+------------+------------+------------+ | deptno | sal | _c2 | +------------+------------+------------+ | 10 | 1300 | 1300.0 | -- This row is the starting row of this window. | 10 | 1300 | 1300.0 | -- The return value is the cumulative average value of the values in the first and second rows. | 10 | 2450 | 1683.3333333333333 | -- The return value is the cumulative average value of the values from the first row to the third row. | 10 | 2450 | 1875.0 | -- The return value is the cumulative average value of the values from the first row to the fourth row. | 10 | 5000 | 2500.0 | -- The return value is the cumulative average value of the values from the first row to the fifth row. | 10 | 5000 | 2916.6666666666665 | -- The return value is the cumulative average value of the values from the first row to the sixth row. | 20 | 800 | 800.0 | | 20 | 1100 | 950.0 | | 20 | 2975 | 1625.0 | | 20 | 3000 | 1968.75 | | 20 | 3000 | 2175.0 | | 30 | 950 | 950.0 | | 30 | 1250 | 1100.0 | | 30 | 1250 | 1150.0 | | 30 | 1500 | 1237.5 | | 30 | 1600 | 1310.0 | | 30 | 2850 | 1566.6666666666667 | +------------+------------+------------+`

- Example 3: In Hive-compatible mode, use the deptno column to define the window and calculate the average value of the sal column. ORDER BY is specified. This function returns the cumulative average value of the values from the starting row to the row that has the same sal value as the current row in the current window. The average values for the rows the have the same sal value are the same. The current window indicates the rows that have the same deptno value. Sample statement:

The following result is returned:`-- Enable the Hive-compatible mode. set odps.sql.hive.compatible=true; -- Execute the following statement: select deptno, sal, avg(sal) over (partition by deptno order by sal) from emp;`

`+------------+------------+------------+ | deptno | sal | _c2 | +------------+------------+------------+ | 10 | 1300 | 1300.0 | -- This row is the starting row of this window. The average value for the first row is the cumulative average value of the values in the first and second rows because the two rows have the same sal value. | 10 | 1300 | 1300.0 | -- The return value is the cumulative average value of the values in the first and second rows. | 10 | 2450 | 1875.0 | -- The average value for the third row is the cumulative average value of the values from the first row to the fourth row because the third and fourth rows have the same sal value. | 10 | 2450 | 1875.0 | -- The return value is the cumulative average value of the values from the first row to the fourth row. | 10 | 5000 | 2916.6666666666665 | | 10 | 5000 | 2916.6666666666665 | | 20 | 800 | 800.0 | | 20 | 1100 | 950.0 | | 20 | 2975 | 1625.0 | | 20 | 3000 | 2175.0 | | 20 | 3000 | 2175.0 | | 30 | 950 | 950.0 | | 30 | 1250 | 1150.0 | | 30 | 1250 | 1150.0 | | 30 | 1500 | 1237.5 | | 30 | 1600 | 1310.0 | | 30 | 2850 | 1566.6666666666667 | +------------+------------+------------+`

- Example 1: Use the deptno column to define the window and calculate the average value of the sal column. ORDER BY is not specified. This function returns the cumulative average value of the values from the starting row to the last row in the current window. The current window indicates the rows that have the same deptno value. Sample statement:

## MAX

- Syntax
`max([distinct] <expr>) over(partition by <col1>[, <col2>…] [order by <col1>[asc|desc][, <col2>[asc|desc]…]] [<windowing_clause>])`

- Description
This function returns the maximum value of a column.

- Parameters
- expr: required. This parameter specifies the column whose maximum value you want to obtain. The column can be of any data type other than BOOLEAN. If the value for a row is NULL, this row is not used for the calculation. If the distinct keyword is specified, the maximum value among distinct values is returned. The result is not affected regardless of whether the keyword is specified.
- partition by <col1>[, <col2>…]: the columns that define the window. You must specify at least one column.
- order by col1[asc|desc][, col2[asc|desc]…]: optional. If ORDER BY is not specified, the maximum value of expr in the current window is returned. If ORDER BY is specified, the returned results are sorted in the specified order and the maximum value among the values from the starting row to the current row in the current window is returned.

**Note**If the distinct keyword is specified, ORDER BY cannot be used. - Return value
A value of the same type as expr is returned.

- Examples
- Example 1: Use the deptno column to define the window and obtain the maximum value of the sal column. ORDER BY is not specified. This function returns the maximum value of the current window. The current window indicates the rows that have the same deptno value. Execute the following statement:

The following result is returned:`select deptno, sal, max(sal) over (partition by deptno) from emp;`

`+------------+------------+------------+ | deptno | sal | _c2 | +------------+------------+------------+ | 10 | 1300 | 5000 | -- This row is the starting row of this window. The return value is the maximum value among the values from the first row to the sixth row. | 10 | 2450 | 5000 | -- The return value is the maximum value among the values from the first row to the sixth row. | 10 | 5000 | 5000 | -- The return value is the maximum value among the values from the first row to the sixth row. | 10 | 1300 | 5000 | | 10 | 5000 | 5000 | | 10 | 2450 | 5000 | | 20 | 3000 | 3000 | | 20 | 3000 | 3000 | | 20 | 800 | 3000 | | 20 | 1100 | 3000 | | 20 | 2975 | 3000 | | 30 | 1500 | 2850 | | 30 | 950 | 2850 | | 30 | 1600 | 2850 | | 30 | 1250 | 2850 | | 30 | 1250 | 2850 | | 30 | 2850 | 2850 | +------------+------------+------------+`

- Example 2: Use the deptno column to define the window and obtain the maximum value of the sal column. ORDER BY is specified. This function returns the maximum value among the values from the starting row to the current row in the current window. The current window indicates the rows that have the same deptno value. Execute the following statement:

The following result is returned:`select deptno, sal, max(sal) over (partition by deptno order by sal) from emp;`

`+------------+------------+------------+ | deptno | sal | _c2 | +------------+------------+------------+ | 10 | 1300 | 1300 | -- This row is the starting row of this window. | 10 | 1300 | 1300 | -- The return value is the maximum value among the values in the first and second rows. | 10 | 2450 | 2450 | -- The return value is the maximum value among the values from the first row to the third row. | 10 | 2450 | 2450 | -- The return value is the maximum value among the values from the first row to the fourth row. | 10 | 5000 | 5000 | | 10 | 5000 | 5000 | | 20 | 800 | 800 | | 20 | 1100 | 1100 | | 20 | 2975 | 2975 | | 20 | 3000 | 3000 | | 20 | 3000 | 3000 | | 30 | 950 | 950 | | 30 | 1250 | 1250 | | 30 | 1250 | 1250 | | 30 | 1500 | 1500 | | 30 | 1600 | 1600 | | 30 | 2850 | 2850 | +------------+------------+------------+`

- Example 1: Use the deptno column to define the window and obtain the maximum value of the sal column. ORDER BY is not specified. This function returns the maximum value of the current window. The current window indicates the rows that have the same deptno value. Execute the following statement:

## MIN

- Syntax
`min([distinct] <expr>) over(partition by <col1>[, <col2>…] [order by <col1>[asc|desc][, <col2>[asc|desc]…]] [<windowing_clause>])`

- Description
This function returns the minimum value of a column.

- Parameters
- expr: required. This parameter specifies the column whose minimum value you want to obtain. The column can be of any data type other than BOOLEAN. If the value for a row is NULL, this row is not used for the calculation. If the distinct keyword is specified, the minimum value among distinct values is returned. The result is not affected regardless of whether the keyword is specified.
- order by col1[asc|desc][, col2[asc|desc]…]: optional. If ORDER BY is not specified, the minimum value of expr in the current window is returned. If ORDER BY is specified, the returned results are sorted in the specified order and the minimum value among the values from the starting row to the current row in the current window is returned.
**Note**If the distinct keyword is specified, ORDER BY cannot be used.

- Return value
A value of the same data type as expr is returned.

- Examples
- Example 1: Use the deptno column to define the window and obtain the minimum value of the sal column. ORDER BY is not specified. This function returns the minimum value of the current window. The current window indicates the rows that have the same deptno value. Execute the following statement:

The following result is returned:`select deptno, sal, min(sal) over (partition by deptno) from emp;`

`+------------+------------+------------+ | deptno | sal | _c2 | +------------+------------+------------+ | 10 | 1300 | 1300 | -- This row is the starting row of this window. The return value is the minimum value among the values from the first row to the sixth row. | 10 | 2450 | 1300 | -- The return value is the minimum value among the values from the first row to the sixth row. | 10 | 5000 | 1300 | -- The return value is the minimum value among the values from the first row to the sixth row. | 10 | 1300 | 1300 | | 10 | 5000 | 1300 | | 10 | 2450 | 1300 | | 20 | 3000 | 800 | | 20 | 3000 | 800 | | 20 | 800 | 800 | | 20 | 1100 | 800 | | 20 | 2975 | 800 | | 30 | 1500 | 950 | | 30 | 950 | 950 | | 30 | 1600 | 950 | | 30 | 1250 | 950 | | 30 | 1250 | 950 | | 30 | 2850 | 950 | +------------+------------+------------+`

- Example 2: Use the deptno column to define the window and obtain the minimum value of the sal column. ORDER BY is specified. This function returns the minimum value among the values from the starting row to the current row in the current window. The current window indicates the rows that have the same deptno value. Execute the following statement:

The following result is returned:`select deptno, sal, min(sal) over (partition by deptno order by sal) from emp;`

`+------------+------------+------------+ | deptno | sal | _c2 | +------------+------------+------------+ | 10 | 1300 | 1300 | -- This row is the starting row of this window. | 10 | 1300 | 1300 | -- The return value is the minimum value among the values in the first and second rows. | 10 | 2450 | 1300 | -- The return value is the minimum value among the values from the first row to the third row. | 10 | 2450 | 1300 | | 10 | 5000 | 1300 | | 10 | 5000 | 1300 | | 20 | 800 | 800 | | 20 | 1100 | 800 | | 20 | 2975 | 800 | | 20 | 3000 | 800 | | 20 | 3000 | 800 | | 30 | 950 | 950 | | 30 | 1250 | 950 | | 30 | 1250 | 950 | | 30 | 1500 | 950 | | 30 | 1600 | 950 | | 30 | 2850 | 950 | +------------+------------+------------+`

- Example 1: Use the deptno column to define the window and obtain the minimum value of the sal column. ORDER BY is not specified. This function returns the minimum value of the current window. The current window indicates the rows that have the same deptno value. Execute the following statement:

## MEDIAN

- Syntax
`double median(double <expr>) over (partition by <col1>[, <col2>…]) decimal median(decimal <expr>) over (partition by <col1>[, <col2>…)`

- Description
This function calculates the median value of a column.

- Parameters
- expr: required. This parameter specifies the column or numerals whose median value you want to calculate. The value is of the DOUBLE or DECIMAL type. You can enter 1 to 255 numerals.
- If the input value is of the DOUBLE type, it is automatically converted into an array of the DOUBLE type for the calculation.
- If the input value is of the STRING or BIGINT type, it is implicitly converted into a value of the DOUBLE type before the calculation. If it is of another data type, an error is returned.
- If the input value is NULL, NULL is returned.

- expr: required. This parameter specifies the column or numerals whose median value you want to calculate. The value is of the DOUBLE or DECIMAL type. You can enter 1 to 255 numerals.
- Return value
A value of the DOUBLE or DECIMAL type is returned.

- Examples
Use the deptno column to define the window and calculate the median value of the sal column. This function returns the median value of the current window. The current window indicates the rows that have the same deptno value. Execute the following statement:

The following result is returned:`select deptno, sal, median(sal) over (partition by deptno) from emp;`

`+------------+------------+------------+ | deptno | sal | _c2 | +------------+------------+------------+ | 10 | 1300 | 2450.0 | -- This row is the starting row of this window. The return value is the median value of the values from the first row to the sixth row. | 10 | 2450 | 2450.0 | | 10 | 5000 | 2450.0 | | 10 | 1300 | 2450.0 | | 10 | 5000 | 2450.0 | | 10 | 2450 | 2450.0 | | 20 | 3000 | 2975.0 | | 20 | 3000 | 2975.0 | | 20 | 800 | 2975.0 | | 20 | 1100 | 2975.0 | | 20 | 2975 | 2975.0 | | 30 | 1500 | 1375.0 | | 30 | 950 | 1375.0 | | 30 | 1600 | 1375.0 | | 30 | 1250 | 1375.0 | | 30 | 1250 | 1375.0 | | 30 | 2850 | 1375.0 | +------------+------------+------------+`

## STDDEV

- Syntax
`double stddev|stddev_pop([distinct] <expr>) over (partition by <col1>[, <col2>…] [order by <col1>[asc|desc][, <col2>[asc|desc]…]] [<windowing_clause>]) decimal stddev|stddev_pop([distinct] <expr>) over (partition by <col1>[, <col2>…] [order by <col1>[asc|desc][, <col2>[asc|desc]…]] [<windowing_clause>])`

- Description
This function calculates the population standard deviation of a column.

- Parameters
- expr: required. This parameter specifies the column whose population standard deviation you want to calculate. The column is of the DOUBLE or DECIMAL type.
- If the input value is of the STRING or BIGINT type, it is implicitly converted into a value of the DOUBLE type before the calculation. If it is of another data type, an error is returned.
- If the value for a row is NULL, this row is not used for the calculation.
- If the distinct keyword is specified, the population standard deviation of distinct values is calculated.

- order by col1[asc|desc][, col2[asc|desc]…]: optional. If ORDER BY is not specified, the population standard deviation of expr in the current window is returned. If ORDER BY is specified, the returned results are sorted in the specified order and the cumulative population standard deviation of the values from the starting row to the current row in the current window is returned.
**Note**- If the distinct keyword is specified, ORDER BY cannot be used.
- If duplicate values are specified for ORDER BY, the processing method varies based on the compatibility between MaxCompute and Hive. For more information, see the examples in this section.

- expr: required. This parameter specifies the column whose population standard deviation you want to calculate. The column is of the DOUBLE or DECIMAL type.
- Return value
A value of the same data type as expr is returned.

- Examples
- Example 1: Use the deptno column to define the window and calculate the population standard deviation of the sal column. ORDER BY is not specified. This function returns the cumulative population standard deviation of the current window. The current window indicates the rows that have the same deptno value. Sample statement:

The following result is returned:`select deptno, sal, stddev(sal) over (partition by deptno) from emp;`

`+------------+------------+------------+ | deptno | sal | _c2 | +------------+------------+------------+ | 10 | 1300 | 1546.1421524412158 | -- This row is the starting row of this window. The return value is the cumulative population standard deviation of the values from the first row to the sixth row. | 10 | 2450 | 1546.1421524412158 | -- The return value is the cumulative population standard deviation of the values from the first row to the sixth row. | 10 | 5000 | 1546.1421524412158 | | 10 | 1300 | 1546.1421524412158 | | 10 | 5000 | 1546.1421524412158 | | 10 | 2450 | 1546.1421524412158 | | 20 | 3000 | 1004.7387720198718 | | 20 | 3000 | 1004.7387720198718 | | 20 | 800 | 1004.7387720198718 | | 20 | 1100 | 1004.7387720198718 | | 20 | 2975 | 1004.7387720198718 | | 30 | 1500 | 610.1001739241042 | | 30 | 950 | 610.1001739241042 | | 30 | 1600 | 610.1001739241042 | | 30 | 1250 | 610.1001739241042 | | 30 | 1250 | 610.1001739241042 | | 30 | 2850 | 610.1001739241042 | +------------+------------+------------+`

- Example 2: In non-Hive-compatible mode, use the deptno column to define the window and calculate the population standard deviation of the sal column. ORDER BY is specified. This function returns the cumulative population standard deviation of the values from the starting row to the current row in the current window. The current window indicates the rows that have the same deptno value. Sample statement:

The following result is returned:`-- Disable the Hive-compatible mode. set odps.sql.hive.compatible=false; -- Execute the following statement: select deptno, sal, stddev(sal) over (partition by deptno order by sal) from emp;`

`+------------+------------+------------+ | deptno | sal | _c2 | +------------+------------+------------+ | 10 | 1300 | 0.0 | -- This row is the starting row of this window. | 10 | 1300 | 0.0 | -- The return value is the cumulative population standard deviation of the values in the first and second rows. | 10 | 2450 | 542.1151989096865 | -- The return value is the cumulative population standard deviation of the values from the first row to the third row. | 10 | 2450 | 575.0 | -- The return value is the cumulative population standard deviation of the values from the first row to the fourth row. | 10 | 5000 | 1351.6656391282572 | | 10 | 5000 | 1546.1421524412158 | | 20 | 800 | 0.0 | | 20 | 1100 | 150.0 | | 20 | 2975 | 962.4188277460079 | | 20 | 3000 | 1024.2947268730811 | | 20 | 3000 | 1004.7387720198718 | | 30 | 950 | 0.0 | | 30 | 1250 | 150.0 | | 30 | 1250 | 141.4213562373095 | | 30 | 1500 | 194.8557158514987 | | 30 | 1600 | 226.71568097509268 | | 30 | 2850 | 610.1001739241042 | +------------+------------+------------+`

- Example 3: In Hive-compatible mode, use the deptno column to define the window and calculate the population standard deviation of the sal column. ORDER BY is specified. This function returns the cumulative population standard deviation of the values from the starting row to the row that has the same sal value as the current row in the current window. The population standard deviations for the rows the have the same sal value are the same. The current window indicates the rows that have the same deptno value. Sample statement:

The following result is returned:`-- Enable the Hive-compatible mode. set odps.sql.hive.compatible=true; -- Execute the following statement: select deptno, sal, stddev(sal) over (partition by deptno order by sal) from emp;`

`+------------+------------+------------+ | deptno | sal | _c2 | +------------+------------+------------+ | 10 | 1300 | 0.0 | -- This row is the starting row of this window. The population standard deviation for the first row is the cumulative population standard deviation of the values in the first and second rows because the two rows have the same sal value. | 10 | 1300 | 0.0 | -- The return value is the cumulative population standard deviation of the values in the first and second rows. | 10 | 2450 | 575.0 | -- The population standard deviation for the third row is the cumulative population standard deviation of the values from the first row to the fourth row because the third and fourth rows have the same sal value. | 10 | 2450 | 575.0 | -- The return value is the cumulative population standard deviation of the values from the first row to the fourth row. | 10 | 5000 | 1546.1421524412158 | | 10 | 5000 | 1546.1421524412158 | | 20 | 800 | 0.0 | | 20 | 1100 | 150.0 | | 20 | 2975 | 962.4188277460079 | | 20 | 3000 | 1004.7387720198718 | | 20 | 3000 | 1004.7387720198718 | | 30 | 950 | 0.0 | | 30 | 1250 | 141.4213562373095 | | 30 | 1250 | 141.4213562373095 | | 30 | 1500 | 194.8557158514987 | | 30 | 1600 | 226.71568097509268 | | 30 | 2850 | 610.1001739241042 | +------------+------------+------------+`

- Example 1: Use the deptno column to define the window and calculate the population standard deviation of the sal column. ORDER BY is not specified. This function returns the cumulative population standard deviation of the current window. The current window indicates the rows that have the same deptno value. Sample statement:

## STDDEV_SAMP

- Syntax
`double stddev_samp([distinct] <expr>) over(partition by <col1>[, <col2>…] [order by <col1>[asc|desc][, <col2>[asc|desc]…]] [<windowing_clause>]) decimal stddev_samp([distinct] <expr>) over((partition by [col1,col2…] [order by <col1>[asc|desc][, <col2>[asc|desc]…]] [<windowing_clause>])`

- Description
This function calculates the sample standard deviation of a column.

- Parameters
- expr: required. This parameter specifies the column whose sample standard deviation you want to calculate. The column is of the DOUBLE or DECIMAL type.
- If the value for a row is NULL, this row is not used for the calculation.
- If the distinct keyword is specified, the sample standard deviation of distinct values is calculated.

- order by col1[asc|desc][, col2[asc|desc]…]: optional. If ORDER BY is not specified, the sample standard deviation of expr in the current window is returned. If ORDER BY is specified, the returned results are sorted in the specified order and the sample standard deviation of the values from the starting row to the current row in the current window is returned.
**Note**If the distinct keyword is specified, ORDER BY cannot be used.

- expr: required. This parameter specifies the column whose sample standard deviation you want to calculate. The column is of the DOUBLE or DECIMAL type.
- Return value
A value of the same data type as expr is returned.

- Examples
- Example 1: Use the deptno column to define the window and calculate the sample standard deviation of the sal column. ORDER BY is not specified. This function returns the cumulative sample standard deviation of the current window. The current window indicates the rows that have the same deptno value. Sample statement:

The following result is returned:`select deptno, sal, stddev_samp(sal) over (partition by deptno) from emp;`

`+------------+------------+------------+ | deptno | sal | _c2 | +------------+------------+------------+ | 10 | 1300 | 1693.7138680032904 | -- This row is the starting row of this window. The return value is the cumulative sample standard deviation of the values from the first row to the sixth row. | 10 | 2450 | 1693.7138680032904 | -- The return value is the cumulative sample standard deviation of the values from the first row to the sixth row. | 10 | 5000 | 1693.7138680032904 | -- The return value is the cumulative sample standard deviation of the values from the first row to the sixth row. | 10 | 1300 | 1693.7138680032904 | | 10 | 5000 | 1693.7138680032904 | | 10 | 2450 | 1693.7138680032904 | | 20 | 3000 | 1123.3320969330487 | | 20 | 3000 | 1123.3320969330487 | | 20 | 800 | 1123.3320969330487 | | 20 | 1100 | 1123.3320969330487 | | 20 | 2975 | 1123.3320969330487 | | 30 | 1500 | 668.331255192114 | | 30 | 950 | 668.331255192114 | | 30 | 1600 | 668.331255192114 | | 30 | 1250 | 668.331255192114 | | 30 | 1250 | 668.331255192114 | | 30 | 2850 | 668.331255192114 | +------------+------------+------------+`

- Example 2: Use the deptno column to define the window and calculate the sample standard deviation of the sal column. ORDER BY is specified. This function returns the cumulative sample standard deviation of the values from the starting row to the current row in the current window. The current window indicates the rows that have the same deptno value. Sample statement:

The following result is returned:`select deptno, sal, stddev_samp(sal) over (partition by deptno order by sal) from emp;`

`+------------+------------+------------+ | deptno | sal | _c2 | +------------+------------+------------+ | 10 | 1300 | 0.0 | -- This row is the starting row of this window. | 10 | 1300 | 0.0 | -- The return value is the cumulative sample standard deviation of the values in the first and second rows. | 10 | 2450 | 663.9528095680697 | -- The return value is the cumulative sample standard deviation of the values from the first row to the third row. | 10 | 2450 | 663.9528095680696 | | 10 | 5000 | 1511.2081259707413 | | 10 | 5000 | 1693.7138680032904 | | 20 | 800 | 0.0 | | 20 | 1100 | 212.13203435596427 | | 20 | 2975 | 1178.7175234126282 | | 20 | 3000 | 1182.7536725793752 | | 20 | 3000 | 1123.3320969330487 | | 30 | 950 | 0.0 | | 30 | 1250 | 212.13203435596427 | | 30 | 1250 | 173.20508075688772 | | 30 | 1500 | 225.0 | | 30 | 1600 | 253.4758371127315 | | 30 | 2850 | 668.331255192114 | +------------+------------+------------+`

- Example 1: Use the deptno column to define the window and calculate the sample standard deviation of the sal column. ORDER BY is not specified. This function returns the cumulative sample standard deviation of the current window. The current window indicates the rows that have the same deptno value. Sample statement:

## SUM

- Syntax
`sum([distinct] <expr>) over (partition by <col1>[, <col2>…] [order by <col1>[asc|desc][, <col2>[asc|desc]…]] [<windowing_clause>])`

- Description
This function calculates the sum of a column.

- Parameters
- expr: required. This parameter specifies the column whose sum you want to calculate. The column is of the DOUBLE, DECIMAL, or BIGINT type.
- If an input value is of the STRING type, it is implicitly converted into a value of the DOUBLE type before the calculation. If it is of another data type, an error is returned.
- If the value for a row is NULL, this row is not used for the calculation.
- If the distinct keyword is specified, the sum of distinct values is calculated.

- order by col1[asc|desc][, col2[asc|desc]…]: optional. If ORDER BY is not specified, the sum of expr in the current window is returned. If ORDER BY is specified, the returned results are sorted in the specified order and the cumulative sum of the values from the starting row to the current row in the current window is returned.
**Note**- If the distinct keyword is specified, ORDER BY cannot be used.

- expr: required. This parameter specifies the column whose sum you want to calculate. The column is of the DOUBLE, DECIMAL, or BIGINT type.
- Return value
- If the input values are of the BIGINT type, a value of the BIGINT type is returned.
- If the input values are of the DECIMAL type, a value of the DECIMAL type is returned.
- If the input values are of the DOUBLE or STRING type, a value of the DOUBLE type is returned.

- Examples
- Example 1: Use the deptno column to define the window and calculate the sum of the sal column. ORDER BY is not specified. This function returns the cumulative sum of the current window. The current window indicates the rows that have the same deptno value. Sample statement:

The following result is returned:`select deptno, sal, sum(sal) over (partition by deptno) from emp;`

`+------------+------------+------------+ | deptno | sal | _c2 | +------------+------------+------------+ | 10 | 1300 | 17500 | -- This row is the starting row of this window. The return value is the cumulative sum of the values from the first row to the sixth row. | 10 | 2450 | 17500 | -- The return value is the cumulative sum of the values from the first row to the sixth row. | 10 | 5000 | 17500 | -- The return value is the cumulative sum of the values from the first row to the sixth row. | 10 | 1300 | 17500 | | 10 | 5000 | 17500 | | 10 | 2450 | 17500 | | 20 | 3000 | 10875 | | 20 | 3000 | 10875 | | 20 | 800 | 10875 | | 20 | 1100 | 10875 | | 20 | 2975 | 10875 | | 30 | 1500 | 9400 | | 30 | 950 | 9400 | | 30 | 1600 | 9400 | | 30 | 1250 | 9400 | | 30 | 1250 | 9400 | | 30 | 2850 | 9400 | +------------+------------+------------+`

- Example 2: In non-Hive-compatible mode, use the deptno column to define the window and calculate the sum of the sal column. ORDER BY is specified. This function returns the cumulative sum of the values from the starting row to the current row in the current window. The current window indicates the rows that have the same deptno value. Sample statement:

The following result is returned:`-- Disable the Hive-compatible mode. set odps.sql.hive.compatible=false; -- Execute the following statement: select deptno, sal, sum(sal) over (partition by deptno order by sal) from emp;`

`+------------+------------+------------+ | deptno | sal | _c2 | +------------+------------+------------+ | 10 | 1300 | 1300 | -- This row is the starting row of this window. | 10 | 1300 | 2600 | -- The return value is the cumulative sum of the values in the first and second rows. | 10 | 2450 | 5050 | -- The return value is the cumulative sum of the values from the first row to the third row. | 10 | 2450 | 7500 | | 10 | 5000 | 12500 | | 10 | 5000 | 17500 | | 20 | 800 | 800 | | 20 | 1100 | 1900 | | 20 | 2975 | 4875 | | 20 | 3000 | 7875 | | 20 | 3000 | 10875 | | 30 | 950 | 950 | | 30 | 1250 | 2200 | | 30 | 1250 | 3450 | | 30 | 1500 | 4950 | | 30 | 1600 | 6550 | | 30 | 2850 | 9400 | +------------+------------+------------+`

- Example 3: In Hive-compatible mode, use the deptno column to define the window and calculate the sum of the sal column. ORDER BY is specified. This function returns the cumulative sum of the values from the starting row to the row that has the same sal value as the current row in the current window. The sum values for the rows the have the same sal value are the same. The current window indicates the rows that have the same deptno value. Sample statement:

The following result is returned:`-- Enable the Hive-compatible mode. set odps.sql.hive.compatible=true; -- Execute the following statement: select deptno, sal, sum(sal) over (partition by deptno order by sal) from emp;`

`+------------+------------+------------+ | deptno | sal | _c2 | +------------+------------+------------+ | 10 | 1300 | 2600 | -- This row is the starting row of this window. The sum for the first row is the cumulative sum of the values in the first and second rows because the two rows have the same sal value. | 10 | 1300 | 2600 | -- The return value is the cumulative sum of the values in the first and second rows. | 10 | 2450 | 7500 | -- The sum for the third row is the cumulative sum of the values from the first row to the fourth row because the third and fourth rows have the same sal value. | 10 | 2450 | 7500 | -- The return value is the cumulative sum of the values from the first row to the fourth row. | 10 | 5000 | 17500 | | 10 | 5000 | 17500 | | 20 | 800 | 800 | | 20 | 1100 | 1900 | | 20 | 2975 | 4875 | | 20 | 3000 | 10875 | | 20 | 3000 | 10875 | | 30 | 950 | 950 | | 30 | 1250 | 3450 | | 30 | 1250 | 3450 | | 30 | 1500 | 4950 | | 30 | 1600 | 6550 | | 30 | 2850 | 9400 | +------------+------------+------------+`

- Example 1: Use the deptno column to define the window and calculate the sum of the sal column. ORDER BY is not specified. This function returns the cumulative sum of the current window. The current window indicates the rows that have the same deptno value. Sample statement:

## DENSE_RANK

- Syntax
`bigint dense_rank() over (partition by <col1>[, <col2>…] order by <col1>[asc|desc][, <col2>[asc|desc]…])`

- Description
This function calculates the ranking of a row in an ordered group of rows. The rankings for the data in rows of the same group are the same and continuous. For example, third place comes after two second places.

- Parameters
- order by col1[asc|desc][, col2[asc|desc]…]: required. This parameter specifies the column based on which data is ranked.

- Return value
A value of the BIGINT type is returned.

- Examples
Group all employees based on the deptno column. In each group, sort the employees in descending order based on the sal value to obtain the ordinal numbers of the employees in their own groups. Sample statement:

The following result is returned:`select deptno, ename, sal, dense_rank() over (partition by deptno order by sal desc) as nums from emp;`

`+------------+------------+------------+------------+ | deptno | ename | sal | nums | +------------+------------+------------+------------+ | 10 | JACCKA | 5000 | 1 | | 10 | KING | 5000 | 1 | | 10 | CLARK | 2450 | 2 | | 10 | WELAN | 2450 | 2 | | 10 | TEBAGE | 1300 | 3 | | 10 | MILLER | 1300 | 3 | | 20 | SCOTT | 3000 | 1 | | 20 | FORD | 3000 | 1 | | 20 | JONES | 2975 | 2 | | 20 | ADAMS | 1100 | 3 | | 20 | SMITH | 800 | 4 | | 30 | BLAKE | 2850 | 1 | | 30 | ALLEN | 1600 | 2 | | 30 | TURNER | 1500 | 3 | | 30 | MARTIN | 1250 | 4 | | 30 | WARD | 1250 | 4 | | 30 | JAMES | 950 | 5 | +------------+------------+------------+------------+`

## RANK

- Syntax
`bigint rank() over (partition by <col1>[, <col2>…] order by <col1>[asc|desc][, <col2>[asc|desc]…])`

- Description
This function calculates the ranking of a value in a set of values. The rankings for the data in rows of the same group are the same but not continuous. For example, fourth place comes after two second places.

- Parameters
- order by col1[asc|desc][, col2[asc|desc]…]: required. This parameter specifies the column based on which data is ranked.

- Return value
A value of the BIGINT type is returned.

- Examples
Group all employees based on the deptno column. In each group, sort the employees in descending order based on the sal value to obtain the ordinal numbers of the employees in their own groups. Sample statement:

The following result is returned:`select deptno, ename, sal, rank() over (partition by deptno order by sal desc) as nums from emp;`

`+------------+------------+------------+------------+ | deptno | ename | sal | nums | +------------+------------+------------+------------+ | 10 | JACCKA | 5000 | 1 | | 10 | KING | 5000 | 1 | | 10 | CLARK | 2450 | 3 | | 10 | WELAN | 2450 | 3 | | 10 | TEBAGE | 1300 | 5 | | 10 | MILLER | 1300 | 5 | | 20 | SCOTT | 3000 | 1 | | 20 | FORD | 3000 | 1 | | 20 | JONES | 2975 | 3 | | 20 | ADAMS | 1100 | 4 | | 20 | SMITH | 800 | 5 | | 30 | BLAKE | 2850 | 1 | | 30 | ALLEN | 1600 | 2 | | 30 | TURNER | 1500 | 3 | | 30 | MARTIN | 1250 | 4 | | 30 | WARD | 1250 | 4 | | 30 | JAMES | 950 | 6 | +------------+------------+------------+------------+`

## LAG

- Syntax
`lag(<expr>,bigint <offset>, <default>) over(partition by <col1>[, <col2>…] order by <col1>[asc|desc][, <col2>[asc|desc]…])`

- Description
This function returns the values for a row at a given offset preceding the current row. If the current row number is

`m`

, the value of the row with the number of`m-offset`

is retrieved. - Parameters
- expr: required. This parameter specifies the column whose value you want to calculate based on an offset.
- offset: required. The value is a constant of the BIGINT type and must be greater than 0. If the input value is of the STRING or DOUBLE type, it is implicitly converted into a value of the BIGINT type before the calculation.
- default: the default value that is used if offset is out of the valid values. It is a constant and its default value is NULL.
- order by col1[asc|desc][, col2[asc|desc]…]: required. This parameter specifies the column based on which the returned results are sorted.

- Return value
A value of the same data type as expr is returned.

- Examples
Group all employees based on the deptno column and calculate the value of sal for each employee at a given offset. Sample statement:

The following result is returned:`select deptno, ename, sal, lag(sal, 1) over (partition by deptno order by sal) as sal_new from emp;`

`+------------+------------+------------+------------+ | deptno | ename | sal | sal_new | +------------+------------+------------+------------+ | 10 | TEBAGE | 1300 | NULL | | 10 | MILLER | 1300 | 1300 | | 10 | CLARK | 2450 | 1300 | | 10 | WELAN | 2450 | 2450 | | 10 | KING | 5000 | 2450 | | 10 | JACCKA | 5000 | 5000 | | 20 | SMITH | 800 | NULL | | 20 | ADAMS | 1100 | 800 | | 20 | JONES | 2975 | 1100 | | 20 | SCOTT | 3000 | 2975 | | 20 | FORD | 3000 | 3000 | | 30 | JAMES | 950 | NULL | | 30 | MARTIN | 1250 | 950 | | 30 | WARD | 1250 | 1250 | | 30 | TURNER | 1500 | 1250 | | 30 | ALLEN | 1600 | 1500 | | 30 | BLAKE | 2850 | 1600 | +------------+------------+------------+------------+`

## LEAD

- Syntax
`lead(<expr>, bigint <offset>, <default>) over(partition by <col1>[, <col2>…] [order by <col1>[asc|desc][, <col2>[asc|desc]…]])`

- Description
This function returns the values for a row at a given offset following the current row. If the current row number is

`m`

, the value of the row with the number of`m+offset`

is retrieved. - Parameters
- expr: required. This parameter specifies the column whose value you want to calculate based on an offset.
- offset: required. The value is a constant of the BIGINT type and must be greater than 0. If the input value is of the STRING or DOUBLE type, it is implicitly converted into a value of the BIGINT type before the calculation.
- default: the default value that is used if offset is out of the valid values. It is a constant and its default value is NULL.
- order by col1[asc|desc][, col2[asc|desc]…]: required. This parameter specifies the column based on which the returned results are sorted.

- Return value
A value of the same data type as expr is returned.

- Examples
Group all employees based on the deptno column and calculate the value of sal for each employee at a given offset. Sample statement:

The following result is returned:`select deptno, ename, sal, lead(sal, 1) over (partition by deptno order by sal) as sal_new from emp;`

`+------------+------------+------------+------------+ | deptno | ename | sal | sal_new | +------------+------------+------------+------------+ | 10 | TEBAGE | 1300 | 1300 | | 10 | MILLER | 1300 | 2450 | | 10 | CLARK | 2450 | 2450 | | 10 | WELAN | 2450 | 5000 | | 10 | KING | 5000 | 5000 | | 10 | JACCKA | 5000 | NULL | | 20 | SMITH | 800 | 1100 | | 20 | ADAMS | 1100 | 2975 | | 20 | JONES | 2975 | 3000 | | 20 | SCOTT | 3000 | 3000 | | 20 | FORD | 3000 | NULL | | 30 | JAMES | 950 | 1250 | | 30 | MARTIN | 1250 | 1250 | | 30 | WARD | 1250 | 1500 | | 30 | TURNER | 1500 | 1600 | | 30 | ALLEN | 1600 | 2850 | | 30 | BLAKE | 2850 | NULL | +------------+------------+------------+------------+`

## PERCENT_RANK

- Syntax
`percent_rank() over(partition by <col1>[, <col2>…] order by <col1>[asc|desc][, <col2>[asc|desc]…])`

- Description
This function calculates the relative percent ranking of a row in a group of data.

- Parameters
- order by col1[asc|desc][, col2[asc|desc]…]: required. This parameter specifies the column based on which the returned results are sorted.

- Return value
A value of the DOUBLE type is returned. The value ranges from 0 to 1. The relative percent ranking is calculated by using the following formula:

`(Rank - 1)/(Number of rows - 1)`

- Examples
Calculate the relative percent rankings of employees by salary in a group. Sample statement:

The following result is returned:`select deptno, ename, sal, percent_rank(sal) over (partition by deptno order by sal desc) as sal_new from emp;`

`+------------+------------+------------+------------+ | deptno | ename | sal | sal_new | +------------+------------+------------+------------+ | 10 | JACCKA | 5000 | 0.0 | | 10 | KING | 5000 | 0.0 | | 10 | CLARK | 2450 | 0.4 | | 10 | WELAN | 2450 | 0.4 | | 10 | TEBAGE | 1300 | 0.8 | | 10 | MILLER | 1300 | 0.8 | | 20 | SCOTT | 3000 | 0.0 | | 20 | FORD | 3000 | 0.0 | | 20 | JONES | 2975 | 0.5 | | 20 | ADAMS | 1100 | 0.75 | | 20 | SMITH | 800 | 1.0 | | 30 | BLAKE | 2850 | 0.0 | | 30 | ALLEN | 1600 | 0.2 | | 30 | TURNER | 1500 | 0.4 | | 30 | MARTIN | 1250 | 0.6 | | 30 | WARD | 1250 | 0.6 | | 30 | JAMES | 950 | 1.0 | +------------+------------+------------+------------+`

## ROW_NUMBER

- Syntax
`row_number() over(partition by <col1>[, <col2>…] order by <col1>[asc|desc][, <col2>[asc|desc]…])`

- Description
This function calculates the ordinal number of a row. Row numbers start from 1.

- Parameters
- Return value
A value of the BIGINT type is returned.

- Examples
Group all employees based on the deptno column. In each group, sort the employees in descending order based on the sal value to obtain the ordinal numbers of the employees in their own groups. Sample statement:

The following result is returned:`select deptno, ename, sal, row_number() over (partition by deptno order by sal desc) as nums from emp;`

`+------------+------------+------------+------------+ | deptno | ename | sal | nums | +------------+------------+------------+------------+ | 10 | JACCKA | 5000 | 1 | | 10 | KING | 5000 | 2 | | 10 | CLARK | 2450 | 3 | | 10 | WELAN | 2450 | 4 | | 10 | TEBAGE | 1300 | 5 | | 10 | MILLER | 1300 | 6 | | 20 | SCOTT | 3000 | 1 | | 20 | FORD | 3000 | 2 | | 20 | JONES | 2975 | 3 | | 20 | ADAMS | 1100 | 4 | | 20 | SMITH | 800 | 5 | | 30 | BLAKE | 2850 | 1 | | 30 | ALLEN | 1600 | 2 | | 30 | TURNER | 1500 | 3 | | 30 | MARTIN | 1250 | 4 | | 30 | WARD | 1250 | 5 | | 30 | JAMES | 950 | 6 | +------------+------------+------------+------------+`

## CLUSTER_SAMPLE

- Syntax
`boolean cluster_sample(bigint <x>[, bigint <y>]) over (partition by <col1>[, <col2>...])`

- Description
This function performs cluster sampling on data in a window.

- Parameters
- x: required. The value is a constant of the BIGINT type and must be
`greater than or equal to 1`

. If y is specified, x indicates that a window is divided into x portions. Otherwise, x indicates that the records of x rows in a window are extracted. In this case, True is returned for the x rows. If x is set to NULL, NULL is returned. - y: optional. The value is a constant of the BIGINT type and must meet the requirements of
`x ≥ y ≥ 1`

. y indicates that y records of the x portions in a window are extracted. In this case, True is returned for the y records. If y is set to NULL, NULL is returned.

- x: required. The value is a constant of the BIGINT type and must be
- Return value
A value of the BOOLEAN type is returned.

- Examples
If you want to extract a sample of about 20% of the values in each group, execute the following statement:

The following result is returned:`select deptno, sal from ( select deptno, sal, cluster_sample(5, 1) over (partition by deptno) as flag from emp ) sub where flag = true;`

`+------------+------------+ | deptno | sal | +------------+------------+ | 10 | 1300 | | 20 | 3000 | | 30 | 950 | +------------+------------+`

## CUME_DIST

- Syntax
`cume_dist() over(partition by <col1>[, <col2>…] order by <col1> [asc|desc][, <col2>[asc|desc]…]])`

- Description
This function calculates the cumulative distribution. The cumulative distribution is the ratio of rows whose values are greater than or equal to the current value to all rows in a group.

- Parameters
- Return value
The ratio of rows whose values are greater than or equal to the current value to all rows in a group is returned.

- Examples
Group all employees based on the deptno column and calculate the cumulative distribution of employees in each group by salary. Sample statement:

The following result is returned:`select deptno, ename, sal, concat(round(cume_dist() over (partition by deptno order by sal desc)*100,2),'%') as cume_dist from emp;`

`+------------+------------+------------+------------+ | deptno | ename | sal | cume_dist | +------------+------------+------------+------------+ | 10 | JACCKA | 5000 | 33.33% | | 10 | KING | 5000 | 33.33% | | 10 | CLARK | 2450 | 66.67% | | 10 | WELAN | 2450 | 66.67% | | 10 | TEBAGE | 1300 | 100.0% | | 10 | MILLER | 1300 | 100.0% | | 20 | SCOTT | 3000 | 40.0% | | 20 | FORD | 3000 | 40.0% | | 20 | JONES | 2975 | 60.0% | | 20 | ADAMS | 1100 | 80.0% | | 20 | SMITH | 800 | 100.0% | | 30 | BLAKE | 2850 | 16.67% | | 30 | ALLEN | 1600 | 33.33% | | 30 | TURNER | 1500 | 50.0% | | 30 | MARTIN | 1250 | 83.33% | | 30 | WARD | 1250 | 83.33% | | 30 | JAMES | 950 | 100.0% | +------------+------------+------------+------------+`