You can use window functions to perform complex calculations, such as group rankings, moving averages, and cumulative sums. This topic describes the window function syntax and provides examples on how to use the window functions in AnalyticDB for MySQL.
Sorting functions
CUME_DIST: returns the cumulative distribution of each value in a set of values.
RANK: returns the rank of each value in a dataset.
DENSE_RANK: returns the rank of each value in a set of values.
NTILE: distributes data within each window partition into n buckets. Buckets are numbered from 1 to n.
ROW_NUMBER: returns a unique and sequential number for each row based on the sequence of the row within the window partition, starting from 1.
PERCENT_RANK: returns the ranking percentage of each value in a dataset in the
(r - 1)/(n - 1)format. r is the rank of the current row calculated by using the RANK() function, and n is the total number of rows within the current window partition.
Value functions
FIRST_VALUE: returns the value of the first row within the window partition.
LAST_VALUE: returns the value of the last row within the window partition.
LAG: returns the value of the row that precedes the current row by offset rows in the window.
LEAD: returns the value of the row that follows the current row by offset rows in the window.
NTH_VALUE: returns the value of the row that is offset by the specified number of offset rows in the window. The offset starts from 1.
Overview
Window functions calculate an aggregate value based on row data from the query result. Window functions run after the HAVING clause and before the ORDER BY clause. A window function is triggered after you use an OVER clause to specify a window.
AnalyticDB for MySQL supports three types of window functions: aggregate functions, sorting functions, and value functions.
Syntax
function over ([partition by a] order by b RANGE|ROWS BETWEEN start AND end) A window function contains the following parts:
Partition rule: divides input rows into different partitions. The process is similar to the grouping process of the
GROUP BYclause. The partition rule part is optional.Sorting rule: determines the order in which input rows are executed in the window function.
Window frame: specifies the boundary of the data on which the window function performs calculations.
A window frame supports the
RANGEandROWSmodes:RANGEdefines the range of column values.ROWSdefines the number of rows relative to the current row.For
RANGEandROWS, you can useBETWEEN start AND endto specify the boundary value. Valid values for the arguments inBETWEEN start AND end:CURRENT ROW: the current row.N PRECEDING: the precedingnrows.UNBOUNDED PRECEDING: the rows from thefirstrow to the current row.N FOLLOWING: the followingnrows.UNBOUNDED FOLLOWING: the rows from the current row to thelastrow.
For example, the following query calculates the partial sum of profit based on each row of data in the current window:
select year,country,profit,sum(profit) over (partition by country order by year ROWS BETWEEN UNBOUNDED PRECEDING and CURRENT ROW) as slidewindow from testwindow;
+------+---------+--------+-------------+
| year | country | profit | slidewindow |
+------+---------+--------+-------------+
| 2001 | USA | 50 | 50 |
| 2001 | USA | 1500 | 1550 |
| 2000 | Germany | 75 | 75 |
| 2000 | Germany | 75 | 150 |
| 2001 | Germany | 79 | 229 |
| 2000 | Finland | 1500 | 1500 |
| 2001 | Finland | 10 | 1510 | The following query can calculate only the total sum of profit:
select country,sum(profit) over (partition by country) from testwindow;
+---------+-----------------------------------------+
| country | sum(profit) OVER (PARTITION BY country) |
+---------+-----------------------------------------+
| Germany | 229 |
| Germany | 229 |
| Germany | 229 |
| USA | 1550 |
| USA | 1550 |
| Finland | 1510 |
| Finland | 1510 | Usage notes
Make sure that the boundary values meet the following requirements:
startcannot beUNBOUNDED FOLLOWING. Otherwise, theWindow frame start cannot be UNBOUNDED FOLLOWINGerror is returned.endcannot beUNBOUNDED PRECEDING. Otherwise, theWindow frame end cannot be UNBOUNDED PRECEDINGerror is returned.When
startisCURRENT ROWandendisN PRECEDING, theWindow frame starting from CURRENT ROW cannot end with PRECEDINGerror is returned.When
startisN FOLLOWINGandendisN PRECEDING, theWindow frame starting from FOLLOWING cannot end with PRECEDINGerror is returned.When
startisN FOLLOWINGandendisCURRENT ROW, theWindow frame starting from FOLLOWING cannot end with CURRENT ROWerror is returned.
When the window frame is in RANGE mode, the following rules apply:
When
startorendisN PRECEDING, theWindow frame RANGE PRECEDING is only supported with UNBOUNDEDerror is returned.When
startorendisN FOLLOWING, theWindow frame RANGE FOLLOWING is only supported with UNBOUNDEDerror is returned.
Preparations
In this topic, data from the testwindow table is used in the examples of window functions.
create table testwindow(year int, country varchar(20), product varchar(20), profit int) distributed by hash(year); insert into testwindow values (2000,'Finland','Computer',1500);
insert into testwindow values (2001,'Finland','Phone',10);
insert into testwindow values (2000,'Germany','Calculator',75);
insert into testwindow values (2000,'Germany','Calculator',75);
insert into testwindow values (2001,'Germany','Calculator',79);
insert into testwindow values (2001,'USA','Calculator',50);
insert into testwindow values (2001,'USA','Computer',1500); SELECT * FROM testwindow;
+------+---------+------------+--------+
| year | country | product | profit |
+------+---------+------------+--------+
| 2000 | Finland | Computer | 1500 |
| 2001 | Finland | Phone | 10 |
| 2000 | Germany | Calculator | 75 |
| 2000 | Germany | Calculator | 75 |
| 2001 | Germany | Calculator | 79 |
| 2001 | USA | Calculator | 50 |
| 2001 | USA | Computer | 1500 | Aggregate functions
All aggregate functions can be used as window functions by adding an OVER clause. An aggregate function calculates each row of data based on the rows within the current sliding window. For more information, see Aggregate functions.
For example, the following query produces a rolling sum of order prices by date for each clerk:
SELECT clerk, orderdate, orderkey, totalprice,sum(totalprice) OVER (PARTITION BY clerk ORDER BY orderdate) AS rolling_sum FROM orders ORDER BY clerk, orderdate, orderkey CUME_DIST
CUME_DIST() Description: This function returns the cumulative distribution of each value in a set of values.
Return result: the dataset obtained after sorting within the window partition, including the current row and the number of data rows preceding the current row. Any associated values in the sorting are calculated to the same distribution value.
Data type of the return value: DOUBLE.
Example:
select year,country,product,profit,cume_dist() over (partition by country order by profit) as cume_dist from testwindow; +------+---------+------------+--------+--------------------+ | year | country | product | profit | cume_dist | +------+---------+------------+--------+--------------------+ | 2001 | USA | Calculator | 50 | 0.5 | | 2001 | USA | Computer | 1500 | 1.0 | | 2001 | Finland | Phone | 10 | 0.5 | | 2000 | Finland | Computer | 1500 | 1.0 | | 2000 | Germany | Calculator | 75 | 0.6666666666666666 | | 2000 | Germany | Calculator | 75 | 0.6666666666666666 | | 2001 | Germany | Calculator | 79 | 1.0 |
RANK
RANK() Description: This function returns the rank of each value in a dataset.
The rank value is the number of rows preceding the current row plus one. The current row is not counted. Therefore, the associated values in the sorting may produce gaps in the sequence. The rank is calculated for each window partition.
Data type of the return value: BIGINT.
Example:
select year,country,product,profit,rank() over (partition by country order by profit) as rank from testwindow; +------+---------+------------+--------+------+ | year | country | product | profit | rank | +------+---------+------------+--------+------+ | 2001 | Finland | Phone | 10 | 1 | | 2000 | Finland | Computer | 1500 | 2 | | 2001 | USA | Calculator | 50 | 1 | | 2001 | USA | Computer | 1500 | 2 | | 2000 | Germany | Calculator | 75 | 1 | | 2000 | Germany | Calculator | 75 | 1 | | 2001 | Germany | Calculator | 79 | 3 |
DENSE_RANK
DENSE_RANK() Description: This function returns the rank of each value in a set of values.
DENSE_RANK()andRANK()have similar features, but the associated values ofDENSE_RANK()do not produce gaps in the sequence.Data type of the return value: BIGINT.
Example:
select year,country,product,profit,dense_rank() over (partition by country order by profit) as dense_rank from testwindow; +------+---------+------------+--------+------------+ | year | country | product | profit | dense_rank | +------+---------+------------+--------+------------+ | 2001 | Finland | Phone | 10 | 1 | | 2000 | Finland | Computer | 1500 | 2 | | 2001 | USA | Calculator | 50 | 1 | | 2001 | USA | Computer | 1500 | 2 | | 2000 | Germany | Calculator | 75 | 1 | | 2000 | Germany | Calculator | 75 | 1 | | 2001 | Germany | Calculator | 79 | 2 |
NTILE
NTILE(n) Description: This function distributes data within each window partition into
nbuckets. Buckets are numbered from1ton.The maximum difference between bucket numbers is
1. If the data rows within the window partition are not evenly distributed to each bucket, the remaining data is distributed from thefirstbucket withonerow of data foreachbucket. For example, if six rows and four buckets exist, rows are distributed to the buckets in the following manner:1, 1, 2, 2, 3, and 4.Data type of the return value: BIGINT.
Example:
select year,country,product,profit,ntile(2) over (partition by country order by profit) as ntile2 from testwindow; +------+---------+------------+--------+--------+ | year | country | product | profit | ntile2 | +------+---------+------------+--------+--------+ | 2001 | USA | Calculator | 50 | 1 | | 2001 | USA | Computer | 1500 | 2 | | 2001 | Finland | Phone | 10 | 1 | | 2000 | Finland | Computer | 1500 | 2 | | 2000 | Germany | Calculator | 75 | 1 | | 2000 | Germany | Calculator | 75 | 1 | | 2001 | Germany | Calculator | 79 | 2 |
ROW_NUMBER
ROW_NUMBER() Description: This function returns a unique and sequential number for each row based on the sequence of the row within the window partition, starting from
1.Data type of the return value: BIGINT.
Example:
SELECT year, country, product, profit, ROW_NUMBER() OVER(PARTITION BY country) AS row_num1 FROM testwindow; +------+---------+------------+--------+----------+ | year | country | product | profit | row_num1 | +------+---------+------------+--------+----------+ | 2001 | USA | Calculator | 50 | 1 | | 2001 | USA | Computer | 1500 | 2 | | 2000 | Germany | Calculator | 75 | 1 | | 2000 | Germany | Calculator | 75 | 2 | | 2001 | Germany | Calculator | 79 | 3 | | 2000 | Finland | Computer | 1500 | 1 | | 2001 | Finland | Phone | 10 | 2 |
PERCENT_RANK
PERCENT_RANK() Description: This function returns the ranking percentage of each value in a dataset in the
(r - 1)/(n - 1)format.ris the rank of the current row calculated byRANK(), andnis the total number of rows within the current window partition.Data type of the return value: DOUBLE.
Example:
select year,country,product,profit,PERCENT_RANK() over (partition by country order by profit) as ntile3 from testwindow; +------+---------+------------+--------+--------+ | year | country | product | profit | ntile3 | +------+---------+------------+--------+--------+ | 2001 | Finland | Phone | 10 | 0.0 | | 2000 | Finland | Computer | 1500 | 1.0 | | 2001 | USA | Calculator | 50 | 0.0 | | 2001 | USA | Computer | 1500 | 1.0 | | 2000 | Germany | Calculator | 75 | 0.0 | | 2000 | Germany | Calculator | 75 | 0.0 | | 2001 | Germany | Calculator | 79 | 1.0 |
FIRST_VALUE
FIRST_VALUE(x) Description: This function returns the value of the first row within the window partition.
Data type of the return value: the same as the input argument type.
Example:
select year,country,product,profit,first_value(profit) over (partition by country order by profit) as firstValue from testwindow; +------+---------+------------+--------+------------+ | year | country | product | profit | firstValue | +------+---------+------------+--------+------------+ | 2000 | Germany | Calculator | 75 | 75 | | 2000 | Germany | Calculator | 75 | 75 | | 2001 | Germany | Calculator | 79 | 75 | | 2001 | USA | Calculator | 50 | 50 | | 2001 | USA | Computer | 1500 | 50 | | 2001 | Finland | Phone | 10 | 10 | | 2000 | Finland | Computer | 1500 | 10 |
LAST_VALUE
LAST_VALUE(x) Description: This function returns the value of the last row within the window partition. The default window frame of LAST_VALUE is rows between unbounded preceding and current row, which compares the data in the current row and that in all preceding rows. If you want LAST_VALUE to return the value of the last row, add the following clause after the ORDER BY clause: rows between unbounded preceding and unbounded following.
Data type of the return value: the same as the input argument type.
Example 1:
select year,country,product,profit,last_value(profit) over (partition by country order by profit) as firstValue from testwindow; +----------------+-------------------+-------------------+------------------+----------------------+ | year | country | product | profit | firstValue | +----------------+-------------------+-------------------+------------------+----------------------+ | 2001 | USA | Calculator | 50 | 50 | | 2001 | USA | Computer | 1500 | 1500 | | 2001 | Finland | Phone | 10 | 10 | | 2000 | Finland | Computer | 1500 | 1500 | | 2000 | Germany | Calculator | 75 | 75 | | 2000 | Germany | Calculator | 75 | 75 | | 2001 | Germany | Calculator | 79 | 79 |Example 2:
select year,country,product,profit,last_value(profit) over (partition by country order by profitrows between unbounded preceding and unbounded following) as lastValue from testwindow; +------+---------+-----------+--------+-----------+ | year | country | product | profit | lastValue | +------+---------+-----------+--------+-----------+ | 2001 | Finland | Phone | 10 | 1500 | | 2000 | Finland | Computer | 1500 | 1500 | | 2000 | Germany | Calculator| 75 | 79 | | 2000 | Germany | Calculator| 75 | 79 | | 2001 | Germany | Calculator| 79 | 79 | | 2001 | USA | Calculator| 50 | 1500 | | 2001 | USA | Computer | 1500 | 1500 | +------+---------+-----------+--------+-----------+
LAG
LAG(x[, offset[, default_value]]) Description: This function returns the value of the row that precedes the current row by
offsetrows in the window.The starting offset value is
0, which specifies the current data row. The offset value can be a scalar expression. The defaultoffsetvalue is1.If the offset value is
nullor is greater than the window length,default_valueis returned. If you do not specifydefault_value,nullis returned.Data type of the return value: the same as the input argument type.
Example:
select year,country,product,profit,lag(profit) over (partition by country order by profit) as lag from testwindow; +------+---------+------------+--------+------+ | year | country | product | profit | lag | +------+---------+------------+--------+------+ | 2001 | USA | Calculator | 50 | NULL | | 2001 | USA | Computer | 1500 | 50 | | 2000 | Germany | Calculator | 75 | NULL | | 2000 | Germany | Calculator | 75 | 75 | | 2001 | Germany | Calculator | 79 | 75 | | 2001 | Finland | Phone | 10 | NULL | | 2000 | Finland | Computer | 1500 | 10 |
LEAD
LEAD(x[,offset[, default_value]]) Description: This function returns the value of the row that follows the current row by
offsetrows in the window.The starting
offsetvalue is0, which specifies the current data row. The offset value can be a scalar expression. The defaultoffsetvalue is1.If the offset value is
nullor is greater than the window length,default_valueis returned. If you do not specifydefault_value,nullis returned.Data type of the return value: the same as the input argument type.
Example:
select year,country,product,profit,lead(profit) over (partition by country order by profit) as lead from testwindow; +------+---------+------------+--------+------+ | year | country | product | profit | lead | +------+---------+------------+--------+------+ | 2000 | Germany | Calculator | 75 | 75 | | 2000 | Germany | Calculator | 75 | 79 | | 2001 | Germany | Calculator | 79 | NULL | | 2001 | Finland | Phone | 10 | 1500 | | 2000 | Finland | Computer | 1500 | NULL | | 2001 | USA | Calculator | 50 | 1500 | | 2001 | USA | Computer | 1500 | NULL |
NTH_VALUE
NTH_VALUE(x, offset) Description: This function returns the value of the row that is offset by the specified number of
offsetrows in the window. The offset starts from1.If the
offsetvalue isnullor is greater than the number of values in the window,nullis returned. If theoffsetvalue is0or negative, an error is returned.Data type of the return value: the same as the input argument type.
Example:
select year,country,product,profit,nth_value(profit,1) over (partition by country order by profit) as nth_value from testwindow; +------+---------+------------+--------+-----------+ | year | country | product | profit | nth_value | +------+---------+------------+--------+-----------+ | 2001 | Finland | Phone | 10 | 10 | | 2000 | Finland | Computer | 1500 | 10 | | 2001 | USA | Calculator | 50 | 50 | | 2001 | USA | Computer | 1500 | 50 | | 2000 | Germany | Calculator | 75 | 75 | | 2000 | Germany | Calculator | 75 | 75 | | 2001 | Germany | Calculator | 79 | 75 |