You can use window functions to flexibly analyze and process column data in a window in MaxCompute SQL. This topic describes the syntax, parameters, and examples of window functions supported by MaxCompute SQL. It guides you through data development by using window functions.

The following table lists window functions supported by MaxCompute SQL.

Function Description
COUNT Counts the rows.
AVG Calculates the average value of a column.
MAX Returns the maximum value of a column.
MIN Returns the minimum value of a column.
MEDIAN Calculates the median value of a column.
STDDEV Calculates the population standard deviation of a column.
STDDEV_SAMP Calculates the sample standard deviation of a column.
SUM Calculates the sum of a column.
DENSE_RANK Calculates the ranking of a row in an ordered group of rows.
RANK Calculates the ranking of a value in a set of values.
LAG Returns the values for a row at a given offset preceding the current row.
LEAD Returns the values for a row at a given offset following the current row.
PERCENT_RANK Calculates the relative percent ranking of a row in a group of data.
ROW_NUMBER Calculates the ordinal number of a row.
CLUSTER_SAMPLE Performs cluster sampling.
CUME_DIST Calculates the cumulative distribution.

Limits

The following limits apply to window functions:

  • Window functions can be included only in SELECT statements.
  • A window function cannot contain nested window functions or aggregate functions.
  • Window functions cannot be used with aggregate functions of the same level.
  • A maximum of five window functions can be used in a MaxCompute SQL statement.
  • A window can contain a maximum of 100 million rows of data. We recommend that you use a maximum of 5 million rows in a window. If the number of rows exceeds 5 million, an error is returned.

Syntax

Syntax of a window function:
window_func() over (partition by <col1>[,<col2>…]
[order by <col1>[asc|desc][, <col2>[asc|desc]…]] <windowing_clause>)
  • partition by <col1>[,<col2>…]: required. This parameter specifies the columns that define the window.

    Rows with the same values in partition key columns are considered in the same window.

    Note partition by UID indicates that records with the same UID are considered in the same window. Therefore, the records with the same UID can contain a maximum of 100 million rows of data. We recommend that you use a maximum of 5 million rows in a window.
  • order by <col1>[asc|desc][, <col2>[asc|desc]…]: optional. This parameter specifies the rule used to sort data in a window.
    Note If some values in ORDER BY are the same, the sorting result may not be accurate. To reduce randomness, keep each value in ORDER BY unique.
  • windowing_clause: optional. You can use the following ROWS functions in windowing_clause to specify the windowing method:
    • rows between x preceding|following and y preceding|following: indicates a window range from xth row preceding or following the current row to the yth row preceding or following the current row.
    • rows x preceding|following: indicates a window range from the xth row preceding or following the current row to the current row.
    Note
    • x and y must be integer constants that are greater than or equal to 0. Their values range from 0 to 10000. The value 0 indicates the current row. Before you use ROWS to specify the windowing method, you must specify ORDER BY.
    • Only the following window functions can use ROWS to specify the windowing method: AVG, COUNT, MAX, MIN, STDDEV, and SUM.

Sample data

This section provides sample source data for you to understand how to use the functions. Create a table named emp and insert the sample data into it. Example:
create table if not exists emp
   (empno bigint,
    ename string,
    job string,
    mgr bigint,
    hiredate datetime,
    sal bigint,
    comm bigint,
    deptno bigint);
tunnel upload emp.txt emp;
The emp.txt file contains the following sample data:
7369,SMITH,CLERK,7902,1980-12-17 00:00:00,800,,20
7499,ALLEN,SALESMAN,7698,1981-02-20 00:00:00,1600,300,30
7521,WARD,SALESMAN,7698,1981-02-22 00:00:00,1250,500,30
7566,JONES,MANAGER,7839,1981-04-02 00:00:00,2975,,20
7654,MARTIN,SALESMAN,7698,1981-09-28 00:00:00,1250,1400,30
7698,BLAKE,MANAGER,7839,1981-05-01 00:00:00,2850,,30
7782,CLARK,MANAGER,7839,1981-06-09 00:00:00,2450,,10
7788,SCOTT,ANALYST,7566,1987-04-19 00:00:00,3000,,20
7839,KING,PRESIDENT,,1981-11-17 00:00:00,5000,,10
7844,TURNER,SALESMAN,7698,1981-09-08 00:00:00,1500,0,30
7876,ADAMS,CLERK,7788,1987-05-23 00:00:00,1100,,20
7900,JAMES,CLERK,7698,1981-12-03 00:00:00,950,,30
7902,FORD,ANALYST,7566,1981-12-03 00:00:00,3000,,20
7934,MILLER,CLERK,7782,1982-01-23 00:00:00,1300,,10
7948,JACCKA,CLERK,7782,1981-04-12 00:00:00,5000,,10
7956,WELAN,CLERK,7649,1982-07-20 00:00:00,2450,,10
7956,TEBAGE,CLERK,7748,1982-12-30 00:00:00,1300,,10

COUNT

  • Syntax
    bigint count([distinct] <expr>) over (partition by <col 1>[, <col 2>…]
    [order by col1[asc|desc][, col2[asc|desc]…]] [<windowing_clause>])
  • Description

    This function counts the rows.

  • Parameters
    • expr: required. This parameter specifies the column whose values you want to count. The column can be of any data type. If the value for a row is NULL, this row is not used for the calculation. If the distinct keyword is specified, only distinct values are counted.
    • partition by <col1>[, <col2>…]: the columns that define the window. You must specify at least one column.
    • order by col1[asc|desc][, col2[asc|desc]…]: optional. If ORDER BY is not specified, the number of rows for expr in the current window is returned. If ORDER BY is specified, the returned results are sorted in the specified order and the cumulative count from the starting row to the current row in the current window is returned.
      Note
      • If the distinct keyword is specified, ORDER BY cannot be used.
      • If duplicate values are specified for ORDER BY, the processing method varies based on the compatibility between MaxCompute and Hive. For more information, see the examples in this section.
  • Return value

    A value of the BIGINT type is returned.

  • Examples
    • Example 1: Use the sal column to define the window. ORDER BY is not specified. This function returns the cumulative count from the starting row to the last row in the current window. The current window indicates the rows that have the same sal value. Sample statement:
      select sal, count(sal) over (partition by sal) as count from emp;  
      The following result is returned:
      +------------+------------+
      | sal        | count      |
      +------------+------------+
      | 800        | 1          |
      | 950        | 1          |
      | 1100       | 1          |
      | 1250       | 2          |   -- This row is the starting row of this window. The value of COUNT for the first row is the cumulative count for the second row because the two rows have the same sal value.
      | 1250       | 2          |   -- The return value is the cumulative count from the first row to the second row in the current window.
      | 1300       | 2          |
      | 1300       | 2          |
      | 1500       | 1          |
      | 1600       | 1          |
      | 2450       | 2          |
      | 2450       | 2          |
      | 2850       | 1          |
      | 2975       | 1          |
      | 3000       | 2          |
      | 3000       | 2          |
      | 5000       | 2          |
      | 5000       | 2          |
      +------------+------------+
    • Example 2: In non-Hive-compatible mode, use the sal column to define the window. ORDER BY is specified. This function returns the cumulative count from the starting row to the current row in the current window. The current window indicates the rows that have the same sal value. Sample statement:
      -- Disable the Hive-compatible mode.
      set odps.sql.hive.compatible=false;
      -- Execute the following statement:
      select sal, count(sal) over (partition by sal order by sal) as count from emp;  
      The following result is returned:
      +------------+------------+
      | sal        | count      |
      +------------+------------+
      | 800        | 1          |
      | 950        | 1          |
      | 1100       | 1          |
      | 1250       | 1          |   -- This row is the starting row of this window. The cumulative count for the first row is 1.
      | 1250       | 2          |   -- The cumulative count for the second row is 2.
      | 1300       | 1          |
      | 1300       | 2          |
      | 1500       | 1          |
      | 1600       | 1          |
      | 2450       | 1          |
      | 2450       | 2          |
      | 2850       | 1          |
      | 2975       | 1          |
      | 3000       | 1          |
      | 3000       | 2          |
      | 5000       | 1          |
      | 5000       | 2          |
      +------------+------------+
    • Example 3: In Hive-compatible mode, use the sal column to define the window. ORDER BY is specified. This function returns the cumulative count from the starting row to the last row in the current window. The current window indicates the rows that have the same sal value. Sample statement:
      -- Enable the Hive-compatible mode.
      set odps.sql.hive.compatible=true;
      -- Execute the following statement:
      select sal, count(sal) over (partition by sal order by sal) as count from emp; 
      The following result is returned:
      +------------+------------+
      | sal        | count      |
      +------------+------------+
      | 800        | 1          |
      | 950        | 1          |
      | 1100       | 1          |
      | 1250       | 2          |   -- This row is the starting row of this window. The value of COUNT for the first row is the cumulative count for the second row because the two rows have the same sal value.
      | 1250       | 2          |   -- The return value is the cumulative count from the first row to the second row in the current window.
      | 1300       | 2          |
      | 1300       | 2          |
      | 1500       | 1          |
      | 1600       | 1          |
      | 2450       | 2          |
      | 2450       | 2          |
      | 2850       | 1          |
      | 2975       | 1          |
      | 3000       | 2          |
      | 3000       | 2          |
      | 5000       | 2          |
      | 5000       | 2          |
      +------------+------------+

AVG

  • Syntax
    avg([distinct] <expr>) over (partition by <col1>[, <col2>…]
    [order by <col1>[asc|desc][, <col2>[asc|desc]…]] [<windowing_clause>])
  • Description

    This function calculates the average value of a column.

  • Parameters
    • expr: required. This parameter specifies the column for which you want to calculate the average value. The column is of the DOUBLE or DECIMAL type. If the distinct keyword is specified, the average value of distinct values is calculated.
      • If the input value is of the STRING or BIGINT type, it is implicitly converted into a value of the DOUBLE type before the calculation. If it is of another data type, an error is returned.
      • If the value for a row is NULL, this row is not used for the calculation.
    • partition by <col1>[, <col2>…]: the columns that define the window. You must specify at least one column.
    • order by col1[asc|desc][, col2[asc|desc]…]: optional. If ORDER BY is not specified, the average value of expr in the current window is returned. If ORDER BY is specified, the returned results are sorted in the specified order and the cumulative average value of the values from the starting row to the current row in the current window is returned.
      Note
      • If the distinct keyword is specified, ORDER BY cannot be used.
      • If duplicate values are specified for ORDER BY, the processing method varies based on the compatibility between MaxCompute and Hive. For more information, see the examples in this section.
  • Return value

    A value of the DOUBLE type is returned.

  • Examples
    • Example 1: Use the deptno column to define the window and calculate the average value of the sal column. ORDER BY is not specified. This function returns the cumulative average value of the values from the starting row to the last row in the current window. The current window indicates the rows that have the same deptno value. Sample statement:
      select deptno, sal, avg(sal) over (partition by deptno) from emp;
      The following result is returned:
      +------------+------------+------------+
      | deptno     | sal        | _c2        |
      +------------+------------+------------+
      | 10         | 1300       | 2916.6666666666665 |   -- This row is the starting row of this window. The return value is the cumulative average value of the values from the first row to the sixth row.
      | 10         | 2450       | 2916.6666666666665 |   -- The return value is the cumulative average value of the values from the first row to the sixth row.
      | 10         | 5000       | 2916.6666666666665 |   -- The return value is the cumulative average value of the values from the first row to the sixth row.
      | 10         | 1300       | 2916.6666666666665 |
      | 10         | 5000       | 2916.6666666666665 |
      | 10         | 2450       | 2916.6666666666665 |
      | 20         | 3000       | 2175.0     |
      | 20         | 3000       | 2175.0     |
      | 20         | 800        | 2175.0     |
      | 20         | 1100       | 2175.0     |
      | 20         | 2975       | 2175.0     |
      | 30         | 1500       | 1566.6666666666667 |
      | 30         | 950        | 1566.6666666666667 |
      | 30         | 1600       | 1566.6666666666667 |
      | 30         | 1250       | 1566.6666666666667 |
      | 30         | 1250       | 1566.6666666666667 |
      | 30         | 2850       | 1566.6666666666667 |
      +------------+------------+------------+
    • Example 2: In non-Hive-compatible mode, use the deptno column to define the window and calculate the average value of the sal column. ORDER BY is specified. This function returns the cumulative average value of the values from the starting row to the current row in the current window. The current window indicates the rows that have the same deptno value. Sample statement:
      -- Disable the Hive-compatible mode.
      set odps.sql.hive.compatible=false;
      -- Execute the following statement:
      select deptno, sal, avg(sal) over (partition by deptno order by sal) from emp;
      The following result is returned:
      +------------+------------+------------+
      | deptno     | sal        | _c2        |
      +------------+------------+------------+
      | 10         | 1300       | 1300.0     |           -- This row is the starting row of this window.
      | 10         | 1300       | 1300.0     |           -- The return value is the cumulative average value of the values in the first and second rows.
      | 10         | 2450       | 1683.3333333333333 |   -- The return value is the cumulative average value of the values from the first row to the third row.
      | 10         | 2450       | 1875.0     |           -- The return value is the cumulative average value of the values from the first row to the fourth row.
      | 10         | 5000       | 2500.0     |           -- The return value is the cumulative average value of the values from the first row to the fifth row.
      | 10         | 5000       | 2916.6666666666665 |   -- The return value is the cumulative average value of the values from the first row to the sixth row.
      | 20         | 800        | 800.0      |
      | 20         | 1100       | 950.0      |
      | 20         | 2975       | 1625.0     |
      | 20         | 3000       | 1968.75    |
      | 20         | 3000       | 2175.0     |
      | 30         | 950        | 950.0      |
      | 30         | 1250       | 1100.0     |
      | 30         | 1250       | 1150.0     |
      | 30         | 1500       | 1237.5     |
      | 30         | 1600       | 1310.0     |
      | 30         | 2850       | 1566.6666666666667 |
      +------------+------------+------------+
    • Example 3: In Hive-compatible mode, use the deptno column to define the window and calculate the average value of the sal column. ORDER BY is specified. This function returns the cumulative average value of the values from the starting row to the row that has the same sal value as the current row in the current window. The average values for the rows the have the same sal value are the same. The current window indicates the rows that have the same deptno value. Sample statement:
      -- Enable the Hive-compatible mode.
      set odps.sql.hive.compatible=true;
      -- Execute the following statement:
      select deptno, sal, avg(sal) over (partition by deptno order by sal) from emp;
      The following result is returned:
      +------------+------------+------------+
      | deptno     | sal        | _c2        |
      +------------+------------+------------+
      | 10         | 1300       | 1300.0     |          -- This row is the starting row of this window. The average value for the first row is the cumulative average value of the values in the first and second rows because the two rows have the same sal value.
      | 10         | 1300       | 1300.0     |          -- The return value is the cumulative average value of the values in the first and second rows.
      | 10         | 2450       | 1875.0     |          -- The average value for the third row is the cumulative average value of the values from the first row to the fourth row because the third and fourth rows have the same sal value.
      | 10         | 2450       | 1875.0     |          -- The return value is the cumulative average value of the values from the first row to the fourth row.
      | 10         | 5000       | 2916.6666666666665 |
      | 10         | 5000       | 2916.6666666666665 |
      | 20         | 800        | 800.0      |
      | 20         | 1100       | 950.0      |
      | 20         | 2975       | 1625.0     |
      | 20         | 3000       | 2175.0     |
      | 20         | 3000       | 2175.0     |
      | 30         | 950        | 950.0      |
      | 30         | 1250       | 1150.0     |
      | 30         | 1250       | 1150.0     |
      | 30         | 1500       | 1237.5     |
      | 30         | 1600       | 1310.0     |
      | 30         | 2850       | 1566.6666666666667 |
      +------------+------------+------------+

MAX

  • Syntax
    max([distinct] <expr>) over(partition by <col1>[, <col2>…]
    [order by <col1>[asc|desc][, <col2>[asc|desc]…]] [<windowing_clause>])
  • Description

    This function returns the maximum value of a column.

  • Parameters
    • expr: required. This parameter specifies the column whose maximum value you want to obtain. The column can be of any data type other than BOOLEAN. If the value for a row is NULL, this row is not used for the calculation. If the distinct keyword is specified, the maximum value among distinct values is returned. The result is not affected regardless of whether the keyword is specified.
    • partition by <col1>[, <col2>…]: the columns that define the window. You must specify at least one column.
    • order by col1[asc|desc][, col2[asc|desc]…]: optional. If ORDER BY is not specified, the maximum value of expr in the current window is returned. If ORDER BY is specified, the returned results are sorted in the specified order and the maximum value among the values from the starting row to the current row in the current window is returned.
    Note If the distinct keyword is specified, ORDER BY cannot be used.
  • Return value

    A value of the same type as expr is returned.

  • Examples
    • Example 1: Use the deptno column to define the window and obtain the maximum value of the sal column. ORDER BY is not specified. This function returns the maximum value of the current window. The current window indicates the rows that have the same deptno value. Execute the following statement:
      select deptno, sal, max(sal) over (partition by deptno) from emp;
      The following result is returned:
      +------------+------------+------------+
      | deptno     | sal        | _c2        |
      +------------+------------+------------+
      | 10         | 1300       | 5000       |   -- This row is the starting row of this window. The return value is the maximum value among the values from the first row to the sixth row.
      | 10         | 2450       | 5000       |   -- The return value is the maximum value among the values from the first row to the sixth row.
      | 10         | 5000       | 5000       |   -- The return value is the maximum value among the values from the first row to the sixth row.
      | 10         | 1300       | 5000       |
      | 10         | 5000       | 5000       |
      | 10         | 2450       | 5000       |
      | 20         | 3000       | 3000       |
      | 20         | 3000       | 3000       |
      | 20         | 800        | 3000       |
      | 20         | 1100       | 3000       |
      | 20         | 2975       | 3000       |
      | 30         | 1500       | 2850       |
      | 30         | 950        | 2850       |
      | 30         | 1600       | 2850       |
      | 30         | 1250       | 2850       |
      | 30         | 1250       | 2850       |
      | 30         | 2850       | 2850       |
      +------------+------------+------------+
    • Example 2: Use the deptno column to define the window and obtain the maximum value of the sal column. ORDER BY is specified. This function returns the maximum value among the values from the starting row to the current row in the current window. The current window indicates the rows that have the same deptno value. Execute the following statement:
      select deptno, sal, max(sal) over (partition by deptno order by sal) from emp;
      The following result is returned:
      +------------+------------+------------+
      | deptno     | sal        | _c2        |
      +------------+------------+------------+
      | 10         | 1300       | 1300       |   -- This row is the starting row of this window.
      | 10         | 1300       | 1300       |   -- The return value is the maximum value among the values in the first and second rows.
      | 10         | 2450       | 2450       |   -- The return value is the maximum value among the values from the first row to the third row.
      | 10         | 2450       | 2450       |   -- The return value is the maximum value among the values from the first row to the fourth row.
      | 10         | 5000       | 5000       |
      | 10         | 5000       | 5000       |
      | 20         | 800        | 800        |
      | 20         | 1100       | 1100       |
      | 20         | 2975       | 2975       |
      | 20         | 3000       | 3000       |
      | 20         | 3000       | 3000       |
      | 30         | 950        | 950        |
      | 30         | 1250       | 1250       |
      | 30         | 1250       | 1250       |
      | 30         | 1500       | 1500       |
      | 30         | 1600       | 1600       |
      | 30         | 2850       | 2850       |
      +------------+------------+------------+

MIN

  • Syntax
    min([distinct] <expr>) over(partition by <col1>[, <col2>…]
    [order by <col1>[asc|desc][, <col2>[asc|desc]…]] [<windowing_clause>])
  • Description

    This function returns the minimum value of a column.

  • Parameters
    • expr: required. This parameter specifies the column whose minimum value you want to obtain. The column can be of any data type other than BOOLEAN. If the value for a row is NULL, this row is not used for the calculation. If the distinct keyword is specified, the minimum value among distinct values is returned. The result is not affected regardless of whether the keyword is specified.
    • partition by <col1>[, <col2>…]: the columns that define the window. You must specify at least one column.
    • order by col1[asc|desc][, col2[asc|desc]…]: optional. If ORDER BY is not specified, the minimum value of expr in the current window is returned. If ORDER BY is specified, the returned results are sorted in the specified order and the minimum value among the values from the starting row to the current row in the current window is returned.
      Note If the distinct keyword is specified, ORDER BY cannot be used.
  • Return value

    A value of the same data type as expr is returned.

  • Examples
    • Example 1: Use the deptno column to define the window and obtain the minimum value of the sal column. ORDER BY is not specified. This function returns the minimum value of the current window. The current window indicates the rows that have the same deptno value. Execute the following statement:
      select deptno, sal, min(sal) over (partition by deptno) from emp;
      The following result is returned:
      +------------+------------+------------+
      | deptno     | sal        | _c2        |
      +------------+------------+------------+
      | 10         | 1300       | 1300       |   -- This row is the starting row of this window. The return value is the minimum value among the values from the first row to the sixth row.
      | 10         | 2450       | 1300       |   -- The return value is the minimum value among the values from the first row to the sixth row.
      | 10         | 5000       | 1300       |   -- The return value is the minimum value among the values from the first row to the sixth row.
      | 10         | 1300       | 1300       |
      | 10         | 5000       | 1300       |
      | 10         | 2450       | 1300       |
      | 20         | 3000       | 800        |
      | 20         | 3000       | 800        |
      | 20         | 800        | 800        |
      | 20         | 1100       | 800        |
      | 20         | 2975       | 800        |
      | 30         | 1500       | 950        |
      | 30         | 950        | 950        |
      | 30         | 1600       | 950        |
      | 30         | 1250       | 950        |
      | 30         | 1250       | 950        |
      | 30         | 2850       | 950        |
      +------------+------------+------------+
    • Example 2: Use the deptno column to define the window and obtain the minimum value of the sal column. ORDER BY is specified. This function returns the minimum value among the values from the starting row to the current row in the current window. The current window indicates the rows that have the same deptno value. Execute the following statement:
      select deptno, sal, min(sal) over (partition by deptno order by sal) from emp;
      The following result is returned:
      +------------+------------+------------+
      | deptno     | sal        | _c2        |
      +------------+------------+------------+
      | 10         | 1300       | 1300       |   -- This row is the starting row of this window.
      | 10         | 1300       | 1300       |   -- The return value is the minimum value among the values in the first and second rows.
      | 10         | 2450       | 1300       |   -- The return value is the minimum value among the values from the first row to the third row.
      | 10         | 2450       | 1300       |
      | 10         | 5000       | 1300       |
      | 10         | 5000       | 1300       |
      | 20         | 800        | 800        |
      | 20         | 1100       | 800        |
      | 20         | 2975       | 800        |
      | 20         | 3000       | 800        |
      | 20         | 3000       | 800        |
      | 30         | 950        | 950        |
      | 30         | 1250       | 950        |
      | 30         | 1250       | 950        |
      | 30         | 1500       | 950        |
      | 30         | 1600       | 950        |
      | 30         | 2850       | 950        |
      +------------+------------+------------+

MEDIAN

  • Syntax
    double median(double <expr>) over (partition by <col1>[, <col2>…])
    decimal median(decimal <expr>) over (partition by <col1>[, <col2>…)
  • Description

    This function calculates the median value of a column.

  • Parameters
    • expr: required. This parameter specifies the column or numerals whose median value you want to calculate. The value is of the DOUBLE or DECIMAL type. You can enter 1 to 255 numerals.
      • If the input value is of the DOUBLE type, it is automatically converted into an array of the DOUBLE type for the calculation.
      • If the input value is of the STRING or BIGINT type, it is implicitly converted into a value of the DOUBLE type before the calculation. If it is of another data type, an error is returned.
      • If the input value is NULL, NULL is returned.
    • partition by <col1>[, <col2>…]: the columns that define the window. You must specify at least one column.
  • Return value

    A value of the DOUBLE or DECIMAL type is returned.

  • Examples
    Use the deptno column to define the window and calculate the median value of the sal column. This function returns the median value of the current window. The current window indicates the rows that have the same deptno value. Execute the following statement:
    select deptno, sal, median(sal) over (partition by deptno) from emp;
    The following result is returned:
    +------------+------------+------------+
    | deptno     | sal        | _c2        |
    +------------+------------+------------+
    | 10         | 1300       | 2450.0     |   -- This row is the starting row of this window. The return value is the median value of the values from the first row to the sixth row.
    | 10         | 2450       | 2450.0     |
    | 10         | 5000       | 2450.0     |
    | 10         | 1300       | 2450.0     |
    | 10         | 5000       | 2450.0     |
    | 10         | 2450       | 2450.0     |
    | 20         | 3000       | 2975.0     |
    | 20         | 3000       | 2975.0     |
    | 20         | 800        | 2975.0     |
    | 20         | 1100       | 2975.0     |
    | 20         | 2975       | 2975.0     |
    | 30         | 1500       | 1375.0     |
    | 30         | 950        | 1375.0     |
    | 30         | 1600       | 1375.0     |
    | 30         | 1250       | 1375.0     |
    | 30         | 1250       | 1375.0     |
    | 30         | 2850       | 1375.0     |
    +------------+------------+------------+

STDDEV

  • Syntax
    double stddev|stddev_pop([distinct] <expr>) over (partition by <col1>[, <col2>…]
    [order by <col1>[asc|desc][, <col2>[asc|desc]…]] [<windowing_clause>])
    decimal stddev|stddev_pop([distinct] <expr>) over (partition by <col1>[, <col2>…] 
    [order by <col1>[asc|desc][, <col2>[asc|desc]…]] [<windowing_clause>])
  • Description

    This function calculates the population standard deviation of a column.

  • Parameters
    • expr: required. This parameter specifies the column whose population standard deviation you want to calculate. The column is of the DOUBLE or DECIMAL type.
      • If the input value is of the STRING or BIGINT type, it is implicitly converted into a value of the DOUBLE type before the calculation. If it is of another data type, an error is returned.
      • If the value for a row is NULL, this row is not used for the calculation.
      • If the distinct keyword is specified, the population standard deviation of distinct values is calculated.
    • partition by <col1>[, <col2>…]: the columns that define the window. You must specify at least one column.
    • order by col1[asc|desc][, col2[asc|desc]…]: optional. If ORDER BY is not specified, the population standard deviation of expr in the current window is returned. If ORDER BY is specified, the returned results are sorted in the specified order and the cumulative population standard deviation of the values from the starting row to the current row in the current window is returned.
      Note
      • If the distinct keyword is specified, ORDER BY cannot be used.
      • If duplicate values are specified for ORDER BY, the processing method varies based on the compatibility between MaxCompute and Hive. For more information, see the examples in this section.
  • Return value

    A value of the same data type as expr is returned.

  • Examples
    • Example 1: Use the deptno column to define the window and calculate the population standard deviation of the sal column. ORDER BY is not specified. This function returns the cumulative population standard deviation of the current window. The current window indicates the rows that have the same deptno value. Sample statement:
      select deptno, sal, stddev(sal) over (partition by deptno) from emp;
      The following result is returned:
      +------------+------------+------------+
      | deptno     | sal        | _c2        |
      +------------+------------+------------+
      | 10         | 1300       | 1546.1421524412158 |   -- This row is the starting row of this window. The return value is the cumulative population standard deviation of the values from the first row to the sixth row.
      | 10         | 2450       | 1546.1421524412158 |   -- The return value is the cumulative population standard deviation of the values from the first row to the sixth row.
      | 10         | 5000       | 1546.1421524412158 |
      | 10         | 1300       | 1546.1421524412158 |
      | 10         | 5000       | 1546.1421524412158 |
      | 10         | 2450       | 1546.1421524412158 |
      | 20         | 3000       | 1004.7387720198718 |
      | 20         | 3000       | 1004.7387720198718 |
      | 20         | 800        | 1004.7387720198718 |
      | 20         | 1100       | 1004.7387720198718 |
      | 20         | 2975       | 1004.7387720198718 |
      | 30         | 1500       | 610.1001739241042 |
      | 30         | 950        | 610.1001739241042 |
      | 30         | 1600       | 610.1001739241042 |
      | 30         | 1250       | 610.1001739241042 |
      | 30         | 1250       | 610.1001739241042 |
      | 30         | 2850       | 610.1001739241042 |
      +------------+------------+------------+
    • Example 2: In non-Hive-compatible mode, use the deptno column to define the window and calculate the population standard deviation of the sal column. ORDER BY is specified. This function returns the cumulative population standard deviation of the values from the starting row to the current row in the current window. The current window indicates the rows that have the same deptno value. Sample statement:
      -- Disable the Hive-compatible mode.
      set odps.sql.hive.compatible=false;
      -- Execute the following statement:
      select deptno, sal, stddev(sal) over (partition by deptno order by sal) from emp;
      The following result is returned:
      +------------+------------+------------+
      | deptno     | sal        | _c2        |
      +------------+------------+------------+
      | 10         | 1300       | 0.0        |           -- This row is the starting row of this window.
      | 10         | 1300       | 0.0        |           -- The return value is the cumulative population standard deviation of the values in the first and second rows.
      | 10         | 2450       | 542.1151989096865 |    -- The return value is the cumulative population standard deviation of the values from the first row to the third row.
      | 10         | 2450       | 575.0      |           -- The return value is the cumulative population standard deviation of the values from the first row to the fourth row.
      | 10         | 5000       | 1351.6656391282572 |
      | 10         | 5000       | 1546.1421524412158 |
      | 20         | 800        | 0.0        |
      | 20         | 1100       | 150.0      |
      | 20         | 2975       | 962.4188277460079 |
      | 20         | 3000       | 1024.2947268730811 |
      | 20         | 3000       | 1004.7387720198718 |
      | 30         | 950        | 0.0        |
      | 30         | 1250       | 150.0      |
      | 30         | 1250       | 141.4213562373095 |
      | 30         | 1500       | 194.8557158514987 |
      | 30         | 1600       | 226.71568097509268 |
      | 30         | 2850       | 610.1001739241042 |
      +------------+------------+------------+
    • Example 3: In Hive-compatible mode, use the deptno column to define the window and calculate the population standard deviation of the sal column. ORDER BY is specified. This function returns the cumulative population standard deviation of the values from the starting row to the row that has the same sal value as the current row in the current window. The population standard deviations for the rows the have the same sal value are the same. The current window indicates the rows that have the same deptno value. Sample statement:
      -- Enable the Hive-compatible mode.
      set odps.sql.hive.compatible=true;
      -- Execute the following statement:
      select deptno, sal, stddev(sal) over (partition by deptno order by sal) from emp;
      The following result is returned:
      +------------+------------+------------+
      | deptno     | sal        | _c2        |
      +------------+------------+------------+
      | 10         | 1300       | 0.0        |           -- This row is the starting row of this window. The population standard deviation for the first row is the cumulative population standard deviation of the values in the first and second rows because the two rows have the same sal value.
      | 10         | 1300       | 0.0        |           -- The return value is the cumulative population standard deviation of the values in the first and second rows.
      | 10         | 2450       | 575.0      |           -- The population standard deviation for the third row is the cumulative population standard deviation of the values from the first row to the fourth row because the third and fourth rows have the same sal value.
      | 10         | 2450       | 575.0      |           -- The return value is the cumulative population standard deviation of the values from the first row to the fourth row.
      | 10         | 5000       | 1546.1421524412158 |
      | 10         | 5000       | 1546.1421524412158 |
      | 20         | 800        | 0.0        |
      | 20         | 1100       | 150.0      |
      | 20         | 2975       | 962.4188277460079 |
      | 20         | 3000       | 1004.7387720198718 |
      | 20         | 3000       | 1004.7387720198718 |
      | 30         | 950        | 0.0        |
      | 30         | 1250       | 141.4213562373095 |
      | 30         | 1250       | 141.4213562373095 |
      | 30         | 1500       | 194.8557158514987 |
      | 30         | 1600       | 226.71568097509268 |
      | 30         | 2850       | 610.1001739241042 |
      +------------+------------+------------+

STDDEV_SAMP

  • Syntax
    double stddev_samp([distinct] <expr>) over(partition by <col1>[, <col2>…]
    [order by <col1>[asc|desc][, <col2>[asc|desc]…]] [<windowing_clause>])
    decimal stddev_samp([distinct] <expr>) over((partition by [col1,col2…] 
    [order by <col1>[asc|desc][, <col2>[asc|desc]…]] [<windowing_clause>])
  • Description

    This function calculates the sample standard deviation of a column.

  • Parameters
    • expr: required. This parameter specifies the column whose sample standard deviation you want to calculate. The column is of the DOUBLE or DECIMAL type.
      • If the input value is of the STRING or BIGINT type, it is implicitly converted into a value of the DOUBLE type before the calculation. If it is of another data type, an error is returned.
      • If the value for a row is NULL, this row is not used for the calculation.
      • If the distinct keyword is specified, the sample standard deviation of distinct values is calculated.
    • partition by <col1>[, <col2>…]: the columns that define the window. You must specify at least one column.
    • order by col1[asc|desc][, col2[asc|desc]…]: optional. If ORDER BY is not specified, the sample standard deviation of expr in the current window is returned. If ORDER BY is specified, the returned results are sorted in the specified order and the sample standard deviation of the values from the starting row to the current row in the current window is returned.
      Note If the distinct keyword is specified, ORDER BY cannot be used.
  • Return value

    A value of the same data type as expr is returned.

  • Examples
    • Example 1: Use the deptno column to define the window and calculate the sample standard deviation of the sal column. ORDER BY is not specified. This function returns the cumulative sample standard deviation of the current window. The current window indicates the rows that have the same deptno value. Sample statement:
      select deptno, sal, stddev_samp(sal) over (partition by deptno) from emp;
      The following result is returned:
      +------------+------------+------------+
      | deptno     | sal        | _c2        |
      +------------+------------+------------+
      | 10         | 1300       | 1693.7138680032904 |   -- This row is the starting row of this window. The return value is the cumulative sample standard deviation of the values from the first row to the sixth row.
      | 10         | 2450       | 1693.7138680032904 |   -- The return value is the cumulative sample standard deviation of the values from the first row to the sixth row.
      | 10         | 5000       | 1693.7138680032904 |   -- The return value is the cumulative sample standard deviation of the values from the first row to the sixth row.
      | 10         | 1300       | 1693.7138680032904 |     
      | 10         | 5000       | 1693.7138680032904 |
      | 10         | 2450       | 1693.7138680032904 |
      | 20         | 3000       | 1123.3320969330487 |
      | 20         | 3000       | 1123.3320969330487 |
      | 20         | 800        | 1123.3320969330487 |
      | 20         | 1100       | 1123.3320969330487 |
      | 20         | 2975       | 1123.3320969330487 |
      | 30         | 1500       | 668.331255192114 |
      | 30         | 950        | 668.331255192114 |
      | 30         | 1600       | 668.331255192114 |
      | 30         | 1250       | 668.331255192114 |
      | 30         | 1250       | 668.331255192114 |
      | 30         | 2850       | 668.331255192114 |
      +------------+------------+------------+
    • Example 2: Use the deptno column to define the window and calculate the sample standard deviation of the sal column. ORDER BY is specified. This function returns the cumulative sample standard deviation of the values from the starting row to the current row in the current window. The current window indicates the rows that have the same deptno value. Sample statement:
      select deptno, sal, stddev_samp(sal) over (partition by deptno order by sal) from emp;
      The following result is returned:
      +------------+------------+------------+
      | deptno     | sal        | _c2        |
      +------------+------------+------------+
      | 10         | 1300       | 0.0        |          -- This row is the starting row of this window.
      | 10         | 1300       | 0.0        |          -- The return value is the cumulative sample standard deviation of the values in the first and second rows.
      | 10         | 2450       | 663.9528095680697 |   -- The return value is the cumulative sample standard deviation of the values from the first row to the third row.
      | 10         | 2450       | 663.9528095680696 |
      | 10         | 5000       | 1511.2081259707413 |
      | 10         | 5000       | 1693.7138680032904 |
      | 20         | 800        | 0.0        |
      | 20         | 1100       | 212.13203435596427 |
      | 20         | 2975       | 1178.7175234126282 |
      | 20         | 3000       | 1182.7536725793752 |
      | 20         | 3000       | 1123.3320969330487 |
      | 30         | 950        | 0.0        |
      | 30         | 1250       | 212.13203435596427 |
      | 30         | 1250       | 173.20508075688772 |
      | 30         | 1500       | 225.0      |
      | 30         | 1600       | 253.4758371127315 |
      | 30         | 2850       | 668.331255192114 |
      +------------+------------+------------+

SUM

  • Syntax
    sum([distinct] <expr>) over (partition by <col1>[, <col2>…]
    [order by <col1>[asc|desc][, <col2>[asc|desc]…]] [<windowing_clause>])
  • Description

    This function calculates the sum of a column.

  • Parameters
    • expr: required. This parameter specifies the column whose sum you want to calculate. The column is of the DOUBLE, DECIMAL, or BIGINT type.
      • If an input value is of the STRING type, it is implicitly converted into a value of the DOUBLE type before the calculation. If it is of another data type, an error is returned.
      • If the value for a row is NULL, this row is not used for the calculation.
      • If the distinct keyword is specified, the sum of distinct values is calculated.
    • partition by <col1>[, <col2>…]: the columns that define the window. You must specify at least one column.
    • order by col1[asc|desc][, col2[asc|desc]…]: optional. If ORDER BY is not specified, the sum of expr in the current window is returned. If ORDER BY is specified, the returned results are sorted in the specified order and the cumulative sum of the values from the starting row to the current row in the current window is returned.
      Note
      • If the distinct keyword is specified, ORDER BY cannot be used.
      • If duplicate values are specified for ORDER BY, the processing method varies based on the compatibility between MaxCompute and Hive. For more information, see the examples in this section.
  • Return value
    • If the input values are of the BIGINT type, a value of the BIGINT type is returned.
    • If the input values are of the DECIMAL type, a value of the DECIMAL type is returned.
    • If the input values are of the DOUBLE or STRING type, a value of the DOUBLE type is returned.
  • Examples
    • Example 1: Use the deptno column to define the window and calculate the sum of the sal column. ORDER BY is not specified. This function returns the cumulative sum of the current window. The current window indicates the rows that have the same deptno value. Sample statement:
      select deptno, sal, sum(sal) over (partition by deptno) from emp;
      The following result is returned:
      +------------+------------+------------+
      | deptno     | sal        | _c2        |
      +------------+------------+------------+
      | 10         | 1300       | 17500      |   -- This row is the starting row of this window. The return value is the cumulative sum of the values from the first row to the sixth row.
      | 10         | 2450       | 17500      |   -- The return value is the cumulative sum of the values from the first row to the sixth row.
      | 10         | 5000       | 17500      |   -- The return value is the cumulative sum of the values from the first row to the sixth row.
      | 10         | 1300       | 17500      |
      | 10         | 5000       | 17500      |
      | 10         | 2450       | 17500      |
      | 20         | 3000       | 10875      |
      | 20         | 3000       | 10875      |
      | 20         | 800        | 10875      |
      | 20         | 1100       | 10875      |
      | 20         | 2975       | 10875      |
      | 30         | 1500       | 9400       |
      | 30         | 950        | 9400       |
      | 30         | 1600       | 9400       |
      | 30         | 1250       | 9400       |
      | 30         | 1250       | 9400       |
      | 30         | 2850       | 9400       |
      +------------+------------+------------+
    • Example 2: In non-Hive-compatible mode, use the deptno column to define the window and calculate the sum of the sal column. ORDER BY is specified. This function returns the cumulative sum of the values from the starting row to the current row in the current window. The current window indicates the rows that have the same deptno value. Sample statement:
      -- Disable the Hive-compatible mode.
      set odps.sql.hive.compatible=false;
      -- Execute the following statement:
      select deptno, sal, sum(sal) over (partition by deptno order by sal) from emp;
      The following result is returned:
      +------------+------------+------------+
      | deptno     | sal        | _c2        |
      +------------+------------+------------+
      | 10         | 1300       | 1300       |   -- This row is the starting row of this window.
      | 10         | 1300       | 2600       |   -- The return value is the cumulative sum of the values in the first and second rows.
      | 10         | 2450       | 5050       |   -- The return value is the cumulative sum of the values from the first row to the third row.
      | 10         | 2450       | 7500       |
      | 10         | 5000       | 12500      |
      | 10         | 5000       | 17500      |
      | 20         | 800        | 800        |
      | 20         | 1100       | 1900       |
      | 20         | 2975       | 4875       |
      | 20         | 3000       | 7875       |
      | 20         | 3000       | 10875      |
      | 30         | 950        | 950        |
      | 30         | 1250       | 2200       |
      | 30         | 1250       | 3450       |
      | 30         | 1500       | 4950       |
      | 30         | 1600       | 6550       |
      | 30         | 2850       | 9400       |
      +------------+------------+------------+
    • Example 3: In Hive-compatible mode, use the deptno column to define the window and calculate the sum of the sal column. ORDER BY is specified. This function returns the cumulative sum of the values from the starting row to the row that has the same sal value as the current row in the current window. The sum values for the rows the have the same sal value are the same. The current window indicates the rows that have the same deptno value. Sample statement:
      -- Enable the Hive-compatible mode.
      set odps.sql.hive.compatible=true;
      -- Execute the following statement:
      select deptno, sal, sum(sal) over (partition by deptno order by sal) from emp;
      The following result is returned:
      +------------+------------+------------+
      | deptno     | sal        | _c2        |
      +------------+------------+------------+
      | 10         | 1300       | 2600       |   -- This row is the starting row of this window. The sum for the first row is the cumulative sum of the values in the first and second rows because the two rows have the same sal value.
      | 10         | 1300       | 2600       |   -- The return value is the cumulative sum of the values in the first and second rows.
      | 10         | 2450       | 7500       |   -- The sum for the third row is the cumulative sum of the values from the first row to the fourth row because the third and fourth rows have the same sal value.
      | 10         | 2450       | 7500       |   -- The return value is the cumulative sum of the values from the first row to the fourth row.
      | 10         | 5000       | 17500      |
      | 10         | 5000       | 17500      |
      | 20         | 800        | 800        |
      | 20         | 1100       | 1900       |
      | 20         | 2975       | 4875       |
      | 20         | 3000       | 10875      |
      | 20         | 3000       | 10875      |
      | 30         | 950        | 950        |
      | 30         | 1250       | 3450       |
      | 30         | 1250       | 3450       |
      | 30         | 1500       | 4950       |
      | 30         | 1600       | 6550       |
      | 30         | 2850       | 9400       |
      +------------+------------+------------+

DENSE_RANK

  • Syntax
    bigint dense_rank() over (partition by <col1>[, <col2>…]
    order by <col1>[asc|desc][, <col2>[asc|desc]…])
  • Description

    This function calculates the ranking of a row in an ordered group of rows. The rankings for the data in rows of the same group are the same and continuous. For example, third place comes after two second places.

  • Parameters
    • partition by <col1>[, <col2>…]: the columns that define the window. You must specify at least one column.
    • order by col1[asc|desc][, col2[asc|desc]…]: required. This parameter specifies the column based on which data is ranked.
  • Return value

    A value of the BIGINT type is returned.

  • Examples
    Group all employees based on the deptno column. In each group, sort the employees in descending order based on the sal value to obtain the ordinal numbers of the employees in their own groups. Sample statement:
    select deptno, ename, sal, dense_rank() over (partition by deptno order by sal desc) as nums from emp;
    The following result is returned:
    +------------+------------+------------+------------+
    | deptno     | ename      | sal        | nums       |
    +------------+------------+------------+------------+
    | 10         | JACCKA     | 5000       | 1          |
    | 10         | KING       | 5000       | 1          |
    | 10         | CLARK      | 2450       | 2          |
    | 10         | WELAN      | 2450       | 2          |
    | 10         | TEBAGE     | 1300       | 3          |
    | 10         | MILLER     | 1300       | 3          |
    | 20         | SCOTT      | 3000       | 1          |
    | 20         | FORD       | 3000       | 1          |
    | 20         | JONES      | 2975       | 2          |
    | 20         | ADAMS      | 1100       | 3          |
    | 20         | SMITH      | 800        | 4          |
    | 30         | BLAKE      | 2850       | 1          |
    | 30         | ALLEN      | 1600       | 2          |
    | 30         | TURNER     | 1500       | 3          |
    | 30         | MARTIN     | 1250       | 4          |
    | 30         | WARD       | 1250       | 4          |
    | 30         | JAMES      | 950        | 5          |
    +------------+------------+------------+------------+

RANK

  • Syntax
    bigint rank() over (partition by <col1>[, <col2>…]
    order by <col1>[asc|desc][, <col2>[asc|desc]…])
  • Description

    This function calculates the ranking of a value in a set of values. The rankings for the data in rows of the same group are the same but not continuous. For example, fourth place comes after two second places.

  • Parameters
    • partition by <col1>[, <col2>…]: the columns that define the window. You must specify at least one column.
    • order by col1[asc|desc][, col2[asc|desc]…]: required. This parameter specifies the column based on which data is ranked.
  • Return value

    A value of the BIGINT type is returned.

  • Examples
    Group all employees based on the deptno column. In each group, sort the employees in descending order based on the sal value to obtain the ordinal numbers of the employees in their own groups. Sample statement:
    select deptno, ename, sal, rank() over (partition by deptno order by sal desc) as nums from emp;
    The following result is returned:
    +------------+------------+------------+------------+
    | deptno     | ename      | sal        | nums       |
    +------------+------------+------------+------------+
    | 10         | JACCKA     | 5000       | 1          |
    | 10         | KING       | 5000       | 1          |
    | 10         | CLARK      | 2450       | 3          |
    | 10         | WELAN      | 2450       | 3          |
    | 10         | TEBAGE     | 1300       | 5          |
    | 10         | MILLER     | 1300       | 5          |
    | 20         | SCOTT      | 3000       | 1          |
    | 20         | FORD       | 3000       | 1          |
    | 20         | JONES      | 2975       | 3          |
    | 20         | ADAMS      | 1100       | 4          |
    | 20         | SMITH      | 800        | 5          |
    | 30         | BLAKE      | 2850       | 1          |
    | 30         | ALLEN      | 1600       | 2          |
    | 30         | TURNER     | 1500       | 3          |
    | 30         | MARTIN     | 1250       | 4          |
    | 30         | WARD       | 1250       | 4          |
    | 30         | JAMES      | 950        | 6          |
    +------------+------------+------------+------------+

LAG

  • Syntax
    lag(<expr>,bigint <offset>, <default>) over(partition by <col1>[, <col2>…]
    order by <col1>[asc|desc][, <col2>[asc|desc]…])
  • Description

    This function returns the values for a row at a given offset preceding the current row. If the current row number is m, the value of the row with the number of m-offset is retrieved.

  • Parameters
    • expr: required. This parameter specifies the column whose value you want to calculate based on an offset.
    • offset: required. The value is a constant of the BIGINT type and must be greater than 0. If the input value is of the STRING or DOUBLE type, it is implicitly converted into a value of the BIGINT type before the calculation.
    • default: the default value that is used if offset is out of the valid values. It is a constant and its default value is NULL.
    • partition by <col1>[, <col2>…]: the columns that define the window. You must specify at least one column.
    • order by col1[asc|desc][, col2[asc|desc]…]: required. This parameter specifies the column based on which the returned results are sorted.
  • Return value

    A value of the same data type as expr is returned.

  • Examples

    Group all employees based on the deptno column and calculate the value of sal for each employee at a given offset. Sample statement:

    select deptno, ename, sal, lag(sal, 1) over (partition by deptno order by sal) as sal_new from emp;
    The following result is returned:
    +------------+------------+------------+------------+
    | deptno     | ename      | sal        | sal_new    |
    +------------+------------+------------+------------+
    | 10         | TEBAGE     | 1300       | NULL       |
    | 10         | MILLER     | 1300       | 1300       |
    | 10         | CLARK      | 2450       | 1300       |
    | 10         | WELAN      | 2450       | 2450       |
    | 10         | KING       | 5000       | 2450       |
    | 10         | JACCKA     | 5000       | 5000       |
    | 20         | SMITH      | 800        | NULL       |
    | 20         | ADAMS      | 1100       | 800        |
    | 20         | JONES      | 2975       | 1100       |
    | 20         | SCOTT      | 3000       | 2975       |
    | 20         | FORD       | 3000       | 3000       |
    | 30         | JAMES      | 950        | NULL       |
    | 30         | MARTIN     | 1250       | 950        |
    | 30         | WARD       | 1250       | 1250       |
    | 30         | TURNER     | 1500       | 1250       |
    | 30         | ALLEN      | 1600       | 1500       |
    | 30         | BLAKE      | 2850       | 1600       |
    +------------+------------+------------+------------+

LEAD

  • Syntax
    lead(<expr>, bigint <offset>, <default>) over(partition by <col1>[, <col2>…]
    [order by <col1>[asc|desc][, <col2>[asc|desc]…]])
  • Description

    This function returns the values for a row at a given offset following the current row. If the current row number is m, the value of the row with the number of m+offset is retrieved.

  • Parameters
    • expr: required. This parameter specifies the column whose value you want to calculate based on an offset.
    • offset: required. The value is a constant of the BIGINT type and must be greater than 0. If the input value is of the STRING or DOUBLE type, it is implicitly converted into a value of the BIGINT type before the calculation.
    • default: the default value that is used if offset is out of the valid values. It is a constant and its default value is NULL.
    • partition by <col1>[, <col2>…]: the columns that define the window. You must specify at least one column.
    • order by col1[asc|desc][, col2[asc|desc]…]: required. This parameter specifies the column based on which the returned results are sorted.
  • Return value

    A value of the same data type as expr is returned.

  • Examples

    Group all employees based on the deptno column and calculate the value of sal for each employee at a given offset. Sample statement:

    select deptno, ename, sal, lead(sal, 1) over (partition by deptno order by sal) as sal_new from emp;
    The following result is returned:
    +------------+------------+------------+------------+
    | deptno     | ename      | sal        | sal_new    |
    +------------+------------+------------+------------+
    | 10         | TEBAGE     | 1300       | 1300       |
    | 10         | MILLER     | 1300       | 2450       |
    | 10         | CLARK      | 2450       | 2450       |
    | 10         | WELAN      | 2450       | 5000       |
    | 10         | KING       | 5000       | 5000       |
    | 10         | JACCKA     | 5000       | NULL       |
    | 20         | SMITH      | 800        | 1100       |
    | 20         | ADAMS      | 1100       | 2975       |
    | 20         | JONES      | 2975       | 3000       |
    | 20         | SCOTT      | 3000       | 3000       |
    | 20         | FORD       | 3000       | NULL       |
    | 30         | JAMES      | 950        | 1250       |
    | 30         | MARTIN     | 1250       | 1250       |
    | 30         | WARD       | 1250       | 1500       |
    | 30         | TURNER     | 1500       | 1600       |
    | 30         | ALLEN      | 1600       | 2850       |
    | 30         | BLAKE      | 2850       | NULL       |
    +------------+------------+------------+------------+

PERCENT_RANK

  • Syntax
    percent_rank() over(partition by <col1>[, <col2>…]
    order by <col1>[asc|desc][, <col2>[asc|desc]…])
  • Description

    This function calculates the relative percent ranking of a row in a group of data.

  • Parameters
    • partition by <col1>[, <col2>…]: the columns that define the window. You must specify at least one column.
    • order by col1[asc|desc][, col2[asc|desc]…]: required. This parameter specifies the column based on which the returned results are sorted.
  • Return value

    A value of the DOUBLE type is returned. The value ranges from 0 to 1. The relative percent ranking is calculated by using the following formula: (Rank - 1)/(Number of rows - 1)

  • Examples

    Calculate the relative percent rankings of employees by salary in a group. Sample statement:

    select deptno, ename, sal, percent_rank(sal) over (partition by deptno order by sal desc) as sal_new from emp;
    The following result is returned:
    +------------+------------+------------+------------+
    | deptno     | ename      | sal        | sal_new    |
    +------------+------------+------------+------------+
    | 10         | JACCKA     | 5000       | 0.0        |
    | 10         | KING       | 5000       | 0.0        |
    | 10         | CLARK      | 2450       | 0.4        |
    | 10         | WELAN      | 2450       | 0.4        |
    | 10         | TEBAGE     | 1300       | 0.8        |
    | 10         | MILLER     | 1300       | 0.8        |
    | 20         | SCOTT      | 3000       | 0.0        |
    | 20         | FORD       | 3000       | 0.0        |
    | 20         | JONES      | 2975       | 0.5        |
    | 20         | ADAMS      | 1100       | 0.75       |
    | 20         | SMITH      | 800        | 1.0        |
    | 30         | BLAKE      | 2850       | 0.0        |
    | 30         | ALLEN      | 1600       | 0.2        |
    | 30         | TURNER     | 1500       | 0.4        |
    | 30         | MARTIN     | 1250       | 0.6        |
    | 30         | WARD       | 1250       | 0.6        |
    | 30         | JAMES      | 950        | 1.0        |
    +------------+------------+------------+------------+

ROW_NUMBER

  • Syntax
    row_number() over(partition by <col1>[, <col2>…]
    order by <col1>[asc|desc][, <col2>[asc|desc]…])
  • Description

    This function calculates the ordinal number of a row. Row numbers start from 1.

  • Parameters
    • partition by <col1>[, <col2>…]: the columns that define the window. You must specify at least one column.
    • order by col1[asc|desc][, col2[asc|desc]…]: required. This parameter specifies the column based on which the returned results are sorted.
  • Return value

    A value of the BIGINT type is returned.

  • Examples
    Group all employees based on the deptno column. In each group, sort the employees in descending order based on the sal value to obtain the ordinal numbers of the employees in their own groups. Sample statement:
    select deptno, ename, sal, row_number() over (partition by deptno order by sal desc) as nums from emp;
    The following result is returned:
    +------------+------------+------------+------------+
    | deptno     | ename      | sal        | nums       |
    +------------+------------+------------+------------+
    | 10         | JACCKA     | 5000       | 1          |
    | 10         | KING       | 5000       | 2          |
    | 10         | CLARK      | 2450       | 3          |
    | 10         | WELAN      | 2450       | 4          |
    | 10         | TEBAGE     | 1300       | 5          |
    | 10         | MILLER     | 1300       | 6          |
    | 20         | SCOTT      | 3000       | 1          |
    | 20         | FORD       | 3000       | 2          |
    | 20         | JONES      | 2975       | 3          |
    | 20         | ADAMS      | 1100       | 4          |
    | 20         | SMITH      | 800        | 5          |
    | 30         | BLAKE      | 2850       | 1          |
    | 30         | ALLEN      | 1600       | 2          |
    | 30         | TURNER     | 1500       | 3          |
    | 30         | MARTIN     | 1250       | 4          |
    | 30         | WARD       | 1250       | 5          |
    | 30         | JAMES      | 950        | 6          |
    +------------+------------+------------+------------+

CLUSTER_SAMPLE

  • Syntax
    boolean cluster_sample(bigint <x>[, bigint <y>])
    over (partition by <col1>[, <col2>...])
  • Description

    This function performs cluster sampling on data in a window.

  • Parameters
    • x: required. The value is a constant of the BIGINT type and must be greater than or equal to 1. If y is specified, x indicates that a window is divided into x portions. Otherwise, x indicates that the records of x rows in a window are extracted. In this case, True is returned for the x rows. If x is set to NULL, NULL is returned.
    • y: optional. The value is a constant of the BIGINT type and must meet the requirements of x ≥ y ≥ 1. y indicates that y records of the x portions in a window are extracted. In this case, True is returned for the y records. If y is set to NULL, NULL is returned.
    • partition by <col1>[, <col2>…]: the columns that define the window. You must specify at least one column.
  • Return value

    A value of the BOOLEAN type is returned.

  • Examples
    If you want to extract a sample of about 20% of the values in each group, execute the following statement:
    select deptno, sal
        from (
            select deptno, sal, cluster_sample(5, 1) over (partition by deptno) as flag
            from emp
            ) sub
        where flag = true;
    The following result is returned:
    +------------+------------+
    | deptno     | sal        |
    +------------+------------+
    | 10         | 1300       |
    | 20         | 3000       |
    | 30         | 950        |
    +------------+------------+

CUME_DIST

  • Syntax
    cume_dist() over(partition by <col1>[, <col2>…] order by <col1> [asc|desc][, <col2>[asc|desc]…]])
  • Description

    This function calculates the cumulative distribution. The cumulative distribution is the ratio of rows whose values are greater than or equal to the current value to all rows in a group.

  • Parameters
    • partition by <col1>[, <col2>…]: the columns that define the window. You must specify at least one column.
    • order by col1[asc|desc][, col2[asc|desc]…]: required. This parameter specifies the column based on which the returned results are sorted.
  • Return value

    The ratio of rows whose values are greater than or equal to the current value to all rows in a group is returned.

  • Examples
    Group all employees based on the deptno column and calculate the cumulative distribution of employees in each group by salary. Sample statement:
    select deptno, ename, sal, concat(round(cume_dist() over (partition by deptno order by sal desc)*100,2),'%') as cume_dist from emp;
    The following result is returned:
    +------------+------------+------------+------------+
    | deptno     | ename      | sal        | cume_dist  |
    +------------+------------+------------+------------+
    | 10         | JACCKA     | 5000       | 33.33%     |
    | 10         | KING       | 5000       | 33.33%     |
    | 10         | CLARK      | 2450       | 66.67%     |
    | 10         | WELAN      | 2450       | 66.67%     |
    | 10         | TEBAGE     | 1300       | 100.0%     |
    | 10         | MILLER     | 1300       | 100.0%     |
    | 20         | SCOTT      | 3000       | 40.0%      |
    | 20         | FORD       | 3000       | 40.0%      |
    | 20         | JONES      | 2975       | 60.0%      |
    | 20         | ADAMS      | 1100       | 80.0%      |
    | 20         | SMITH      | 800        | 100.0%     |
    | 30         | BLAKE      | 2850       | 16.67%     |
    | 30         | ALLEN      | 1600       | 33.33%     |
    | 30         | TURNER     | 1500       | 50.0%      |
    | 30         | MARTIN     | 1250       | 83.33%     |
    | 30         | WARD       | 1250       | 83.33%     |
    | 30         | JAMES      | 950        | 100.0%     |
    +------------+------------+------------+------------+