AnalyticDB for MySQL supports the following window functions:

  • Aggregate functions
  • Sorting functions
    • CUME_DIST: returns the cumulative distribution of each value within a set of values.
    • RANK: returns the ranking of each value within a dataset.
    • DENSE_RANK: returns the ranking of each value within a set of values.
    • NTILE: divides data within each window partition into n buckets with bucket numbered from 1 to n.
    • ROW_NUMBER: returns a unique and sequential number for each row based on the sequence of the row within the window partition, starting from 1.
    • PERCENT_RANK: returns the ranking percentage of each value in a dataset in the format of (r - 1)/(n - 1). r is the rank of the current row calculated by RANK(), and n is the total number of rows within the current window partition.
  • Value functions
    • FIRST_VALUE: returns the value of the first row within the window partition.
    • LAST_VALUE: returns the value of the last row within the window partition.
    • LAG: returns the value of the row that precedes the current row by offset rows in the window.
    • LEAD: returns the value of the row that follows the current row by offset rows in the window.
    • NTH_VALUE: returns the value of the row by the specified number of offset rows in the window. The offset starts from 1.

Overview

Window functions calculate based on row data from the query result. Window functions run after the HAVING clause and before the ORDER BY clause. A window function can be triggered after you use an OVER clause to specify a window.

AnalyticDB for MySQL supports three types of window functions: aggregate functions, sorting functions, and value functions.

Syntax

function over (partition by a order by b RANGE|ROWS BETWEEN start AND end)                

A window function contains the following parts:

  • Partition rule: divides input rows to different partitions. The process is similar to the division process of the GROUP BY clause.
  • Sorting rule: determines the order in which input rows are executed in the window function.
  • Window frame: specifies the window boundary of the computed data.

    A window frame supports the RANGE and ROWS modes:

    • RANGE defines the range of column values.
    • ROWS defines the number of rows in a column.
    • For RANGE and ROWS, you can use BETWEEN start AND end to specify the boundary value. Valid values for the arguments in BETWEEN start AND end:
      • CURRENT ROW: the current row
      • N PRECEDING: the preceding n rows
      • UNBOUNDED PRECEDING: till the first row
      • N FOLLOWING: the following n rows
      • UNBOUNDED FOLLOWING: till the last row

For example, the following query calculates the partial sum of profit based on each row of data in the current window.

select year,country,profit,sum(profit) over (partition by country order by year ROWS BETWEEN UNBOUNDED PRECEDING and CURRENT ROW) as slidewindow from testwindow;
+------+---------+--------+-------------+
| year | country | profit | slidewindow |
+------+---------+--------+-------------+
| 2001 | USA     |     50 |          50 |
| 2001 | USA     |   1500 |        1550 |
| 2000 | India   |     75 |          75 |
| 2000 | India   |     75 |         150 |
| 2001 | India   |     79 |         229 |
| 2000 | Finland |   1500 |        1500 |
| 2001 | Finland |     10 |        1510 |        

The following query can only calculate the total sum of profit.

select country,sum(profit) over (partition by country) from testwindow;
+---------+-----------------------------------------+
| country | sum(profit) OVER (PARTITION BY country) |
+---------+-----------------------------------------+
| India   |                                     229 |
| India   |                                     229 |
| India   |                                     229 |
| USA     |                                    1550 |
| USA     |                                    1550 |
| Finland |                                    1510 |
| Finland |                                    1510 |        

Precautions

Pay attention to the following requirements when you set boundary values:

  • start cannot be UNBOUNDED FOLLOWING. Otherwise, the Window frame start cannot be UNBOUNDED FOLLOWING error will be prompted.
  • end cannot be UNBOUNDED PRECEDING. Otherwise, the Window frame end cannot be UNBOUNDED PRECEDING error will be prompted.
  • When start is CURRENT ROW and end is N PRECEDING, the Window frame starting from CURRENT ROW cannot end with PRECEDING error is prompted.
  • When start is N FOLLOWING and end is N PRECEDING, the Window frame starting from FOLLOWING cannot end with PRECEDING error is prompted.
  • When start is N FOLLOWING and end is CURRENT ROW, the Window frame starting from FOLLOWING cannot end with CURRENT ROW error is prompted.

When the window frame is in RANGE mode:

  • When start or end is N PRECEDING, the Window frame RANGE PRECEDING is only supported with UNBOUNDED error is prompted.
  • When start or end is N FOLLOWING, the Window frame RANGE FOLLOWING is only supported with UNBOUNDED error is prompted.

Preparations

The data from the testwindow table is used in the examples for the window functions in this topic.

create table testwindow(year int, country varchar(20), product varchar(20), profit int) distributed by hash(year);        
insert into testwindow values (2000,'Finland','Computer',1500);
insert into testwindow values (2001,'Finland','Phone',10);
insert into testwindow values (2000,'India','Calculator',75);
insert into testwindow values (2000,'India','Calculator',75);
insert into testwindow values (2001,'India','Calculator',79);
insert into testwindow values (2001,'USA','Calculator',50);
insert into testwindow values (2001,'USA','Computer',1500);        
SELECT * FROM testwindow;
+------+---------+------------+--------+
| year | country | product    | profit |
+------+---------+------------+--------+
| 2000 | Finland | Computer   |   1500 |
| 2001 | Finland | Phone      |     10 |
| 2000 | India   | Calculator |     75 |
| 2000 | India   | Calculator |     75 |
| 2001 | India   | Calculator |     79 |
| 2001 | USA     | Calculator |     50 |
| 2001 | USA     | Computer   |   1500 |        

Aggregate functions

All aggregate functions can be used as window functions by adding the OVER clause. An aggregate function computes each row of data based on the rows within the current sliding window. For more information, see Aggregate functions.

For example, the following query produces a rolling sum of order prices by date for each clerk:

SELECT clerk, orderdate, orderkey, totalprice,sum(totalprice) OVER (PARTITION BY clerk ORDER BY orderdate) AS rolling_sum FROM orders ORDER BY clerk, orderdate, orderkey            

CUME_DIST

CUME_DIST()           
  • Description: This function returns the cumulative distribution of each value in a set of values.

    Return result: The dataset after the window is ranked within the window partition, including the current row and the number of data rows preceding the current row. Any associated values in the sorting are calculated to the same distribution value.

  • Return value type: DOUBLE.
  • Example:
        select year,country,product,profit,cume_dist() over (partition by country order by profit) as cume_dist from testwindow;
        +------+---------+------------+--------+--------------------+
        | year | country | product    | profit | cume_dist          |
        +------+---------+------------+--------+--------------------+
        | 2001 | USA     | Calculator |     50 |                0.5 |
        | 2001 | USA     | Computer   |   1500 |                1.0 |
        | 2001 | Finland | Phone      |     10 |                0.5 |
        | 2000 | Finland | Computer   |   1500 |                1.0 |
        | 2000 | India   | Calculator |     75 | 0.6666666666666666 |
        | 2000 | India   | Calculator |     75 | 0.6666666666666666 |
        | 2001 | India   | Calculator |     79 |                1.0 |                

RANK

RANK()            
  • Description: This function returns the rank of each value in a dataset.

    The rank value is the number of a row preceding the current row plus one but does not include the number of the current row. Therefore, associated values in the ordering may produce gaps in the sequence. The ranking is calculated for each window partition.

  • Return value type: BIGINT.
  • Example:
        select year,country,product,profit,rank() over (partition by country order by profit) as rank from testwindow;
        +------+---------+------------+--------+------+
        | year | country | product    | profit | rank |
        +------+---------+------------+--------+------+
        | 2001 | Finland | Phone      |     10 |    1 |
        | 2000 | Finland | Computer   |   1500 |    2 |
        | 2001 | USA     | Calculator |     50 |    1 |
        | 2001 | USA     | Computer   |   1500 |    2 |
        | 2000 | India   | Calculator |     75 |    1 |
        | 2000 | India   | Calculator |     75 |    1 |
        | 2001 | India   | Calculator |     79 |    3 |                    

DENSE_RANK

DENSE_RANK()            
  • Description: This function returns the ranking of each value in a set of values.

    DENSE_RANK() has similar features with RANK(), but the associated values of DENSE_RANK() do not produce gaps in the sequence.

  • Return value type: BIGINT.
  • Example:
        select year,country,product,profit,dense_rank() over (partition by country order by profit) as dense_rank from testwindow;
        +------+---------+------------+--------+------------+
        | year | country | product    | profit | dense_rank |
        +------+---------+------------+--------+------------+
        | 2001 | Finland | Phone      |     10 |          1 |
        | 2000 | Finland | Computer   |   1500 |          2 |
        | 2001 | USA     | Calculator |     50 |          1 |
        | 2001 | USA     | Computer   |   1500 |          2 |
        | 2000 | India   | Calculator |     75 |          1 |
        | 2000 | India   | Calculator |     75 |          1 |
        | 2001 | India   | Calculator |     79 |          2 |                   

NTILE

NTILE(n)            
  • Description: This function divides data within each window partition into n buckets with bucket numbered from 1 to n.

    The maximum interval between bucket numbers is 1. If the data rows within the window partition are not evenly distributed to each bucket, the remaining data will be distributed form the first bucket with 1 row of data for each bucket. For example, if there are six rows and four buckets, rows will be distributed to the buckets in the following manner: 1, 1, 2, 2, 3, and 4.

  • Return value type: BIGINT.
  • Example:
        select year,country,product,profit,ntile(2) over (partition by country order by profit) as ntile2 from testwindow;
        +------+---------+------------+--------+--------+
        | year | country | product    | profit | ntile2 |
        +------+---------+------------+--------+--------+
        | 2001 | USA     | Calculator |     50 |      1 |
        | 2001 | USA     | Computer   |   1500 |      2 |
        | 2001 | Finland | Phone      |     10 |      1 |
        | 2000 | Finland | Computer   |   1500 |      2 |
        | 2000 | India   | Calculator |     75 |      1 |
        | 2000 | India   | Calculator |     75 |      1 |
        | 2001 | India   | Calculator |     79 |      2 |                    

ROW_NUMBER

ROW_NUMBER()            
  • Description: This function returns a unique and sequential number for each row based on the sequence of the row within the window partition, starting from 1.
  • Return value type: BIGINT.
  • Example:
        SELECT year, country, product, profit, ROW_NUMBER() OVER(PARTITION BY country) AS row_num1 FROM testwindow;
        +------+---------+------------+--------+----------+
        | year | country | product    | profit | row_num1 |
        +------+---------+------------+--------+----------+
        | 2001 | USA     | Calculator |     50 |        1 |
        | 2001 | USA     | Computer   |   1500 |        2 |
        | 2000 | India   | Calculator |     75 |        1 |
        | 2000 | India   | Calculator |     75 |        2 |
        | 2001 | India   | Calculator |     79 |        3 |
        | 2000 | Finland | Computer   |   1500 |        1 |
        | 2001 | Finland | Phone      |     10 |        2 |                    

PERCENT_RANK

PERCENT_RANK()            
  • Description: This function returns the ranking percentage of each data in a dataset, which is calculated by (r - 1)/(n - 1). r is the ranking of the current row by RANK() and n is the total number of rows within the current window partition.
  • Return value type: DOUBLE.
  • Example:
        select year,country,product,profit,PERCENT_RANK() over (partition by country order by profit) as ntile3 from testwindow;
        +------+---------+------------+--------+--------+
        | year | country | product    | profit | ntile3 |
        +------+---------+------------+--------+--------+
        | 2001 | Finland | Phone      |     10 |    0.0 |
        | 2000 | Finland | Computer   |   1500 |    1.0 |
        | 2001 | USA     | Calculator |     50 |    0.0 |
        | 2001 | USA     | Computer   |   1500 |    1.0 |
        | 2000 | India   | Calculator |     75 |    0.0 |
        | 2000 | India   | Calculator |     75 |    0.0 |
        | 2001 | India   | Calculator |     79 |    1.0 |                    

FIRST_VALUE

FIRST_VALUE(x)        
  • Description: This function returns the value of the first row within the window partition.
  • Return value type: the same as the input argument type.
  • Example:
        select year,country,product,profit,first_value(profit) over (partition by country order by profit) as firstValue from testwindow;
        +------+---------+------------+--------+------------+
        | year | country | product    | profit | firstValue |
        +------+---------+------------+--------+------------+
        | 2000 | India   | Calculator |     75 |         75 |
        | 2000 | India   | Calculator |     75 |         75 |
        | 2001 | India   | Calculator |     79 |         75 |
        | 2001 | USA     | Calculator |     50 |         50 |
        | 2001 | USA     | Computer   |   1500 |         50 |
        | 2001 | Finland | Phone      |     10 |         10 |
        | 2000 | Finland | Computer   |   1500 |         10 |                

LAST_VALUE

LAST_VALUE(x)            
  • Description: This function returns the value of the last row within the window partition.
  • Return value type: the same as the input argument type.
  • Example:
    select year,country,product,profit,last_value(profit) over (partition by country order by profit) as firstValue from testwindow;
    +----------------+-------------------+-------------------+------------------+----------------------+
    | year           | country           | product           | profit           | firstValue           |
    +----------------+-------------------+-------------------+------------------+----------------------+
    |           2001 | USA               | Calculator        |               50 |                   50 |
    |           2001 | USA               | Computer          |             1500 |                 1500 |
    |           2001 | Finland           | Phone             |               10 |                   10 |
    |           2000 | Finland           | Computer          |             1500 |                 1500 |
    |           2000 | India             | Calculator        |               75 |                   75 |
    |           2000 | India             | Calculator        |               75 |                   75 |
    |           2001 | India             | Calculator        |               79 |                   79 |                 

LAG

LAG(x[, offset[, default_value]])           
  • Descriptions: This function returns the value of the row that precedes the current row by offset rows in the window.

    The starting offset value is 0. The offset starts from the current data row. The offset can be a scalar expression. The default offset is 1.

    If the value of offset is null or is greater than the window length, default_value is returned. If default_value is not specified, null is returned.

  • Return value type: the same as the input argument type.
  • Example:
        select year,country,product,profit,lag(profit) over (partition by country order by profit) as lag from testwindow;
        +------+---------+------------+--------+------+
        | year | country | product    | profit | lag  |
        +------+---------+------------+--------+------+
        | 2001 | USA     | Calculator |     50 | NULL |
        | 2001 | USA     | Computer   |   1500 |   50 |
        | 2000 | India   | Calculator |     75 | NULL |
        | 2000 | India   | Calculator |     75 |   75 |
        | 2001 | India   | Calculator |     79 |   75 |
        | 2001 | Finland | Phone      |     10 | NULL |
        | 2000 | Finland | Computer   |   1500 |   10 |                    

LEAD

LEAD(x[,offset[, default_value]])            
  • Descriptions: This function returns the value of the row that follows the current row by offset rows in the window.

    The starting offset value is 0. The offset starts from the current data row. The offset can be a scalar expression. The default offset is 1.

    If the value of offset is null or is greater than the window length, default_value is returned. If default_value is not specified, null is returned.

  • Return value type: the same as the input argument type.
  • Example:
        select year,country,product,profit,lead(profit) over (partition by country order by profit) as lead from testwindow;
        +------+---------+------------+--------+------+
        | year | country | product    | profit | lead |
        +------+---------+------------+--------+------+
        | 2000 | India   | Calculator |     75 |   75 |
        | 2000 | India   | Calculator |     75 |   79 |
        | 2001 | India   | Calculator |     79 | NULL |
        | 2001 | Finland | Phone      |     10 | 1500 |
        | 2000 | Finland | Computer   |   1500 | NULL |
        | 2001 | USA     | Calculator |     50 | 1500 |
        | 2001 | USA     | Computer   |   1500 | NULL |                    

NTH_VALUE

NTH_VALUE(x, offset)            
  • Description: This function returns the value of the row by the specified number of offset rows in the window. The offset starts from 1.

    If offset is null or greater than the number of values in the window, null is returned. If offset is 0 or negative, an error is prompted.

  • Return value type: the same as the input argument type.
  • Example:
        select year,country,product,profit,nth_value(profit,1) over (partition by country order by profit) as nth_value from testwindow;
        +------+---------+------------+--------+-----------+
        | year | country | product    | profit | nth_value |
        +------+---------+------------+--------+-----------+
        | 2001 | Finland | Phone      |     10 |        10 |
        | 2000 | Finland | Computer   |   1500 |        10 |
        | 2001 | USA     | Calculator |     50 |        50 |
        | 2001 | USA     | Computer   |   1500 |        50 |
        | 2000 | India   | Calculator |     75 |        75 |
        | 2000 | India   | Calculator |     75 |        75 |
        | 2001 | India   | Calculator |     79 |        75 |