AnalyticDB for MySQL supports the following window functions.
- Aggregate functions
- Sorting functions
- CUME_DIST: returns the cumulative distribution of each value within a set of values.
- RANK: returns the rank of each value within a dataset.
- DENSE_RANK: returns the rank of each value within a set of values.
- NTILE: divides data within each window partition into n buckets. Buckets are numbered from 1 to n.
- ROW_NUMBER: returns a unique and sequential number for each row based on the sequence of the row within the window partition, starting from 1.
- PERCENT_RANK: returns the ranking percentage of each value in a dataset in the format of
(r - 1)/(n - 1)
. r is the rank of the current row calculated by RANK(), and n is the total number of rows within the current window partition.
- Value functions
- FIRST_VALUE: returns the value of the first row within the window partition.
- LAST_VALUE: returns the value of the last row within the window partition.
- LAG: returns the value of the row that precedes the current row by offset rows in the window.
- LEAD: returns the value of the row that follows the current row by offset rows in the window.
- NTH_VALUE: returns the value of the row by the specified number of offset rows in the window. The offset starts from 1.
Overview
Window functions calculate an aggregate value based on row data from the query result.
Window functions run after the HAVING
clause and before the ORDER BY
clause. A window function can be triggered after you use an OVER
clause to specify a window.
AnalyticDB for MySQL supports three types of window functions: aggregate functions, sorting functions, and value functions.
Syntax
function over (partition by a order by b RANGE|ROWS BETWEEN start AND end)
A window function contains the following parts:
- Partition rule: divides input rows into different partitions. The process is similar
to the division process of the
GROUP BY
clause. - Sorting rule: determines the order in which input rows are executed in the window function.
- Window frame: specifies the window boundary of the computed data.
A window frame supports the
RANGE
andROWS
modes:RANGE
defines the range of column values.ROWS
defines the number of rows in a column.- For
RANGE
andROWS
, you can useBETWEEN start AND end
to specify the boundary value. Valid values for the arguments inBETWEEN start AND end
:CURRENT ROW
: the current rowN PRECEDING
: the precedingn
rowsUNBOUNDED PRECEDING
: till thefirst
rowN FOLLOWING
: the followingn
rowsUNBOUNDED FOLLOWING
: till thelast
row
For example, the following query calculates the partial sum of profit
based on each row of data in the current window:
select year,country,profit,sum(profit) over (partition by country order by year ROWS BETWEEN UNBOUNDED PRECEDING and CURRENT ROW) as slidewindow from testwindow;
+------+---------+--------+-------------+
| year | country | profit | slidewindow |
+------+---------+--------+-------------+
| 2001 | USA | 50 | 50 |
| 2001 | USA | 1500 | 1550 |
| 2000 | India | 75 | 75 |
| 2000 | India | 75 | 150 |
| 2001 | India | 79 | 229 |
| 2000 | Finland | 1500 | 1500 |
| 2001 | Finland | 10 | 1510 |
The following query can only calculate the total sum of profit
:
select country,sum(profit) over (partition by country) from testwindow;
+---------+-----------------------------------------+
| country | sum(profit) OVER (PARTITION BY country) |
+---------+-----------------------------------------+
| India | 229 |
| India | 229 |
| India | 229 |
| USA | 1550 |
| USA | 1550 |
| Finland | 1510 |
| Finland | 1510 |
Precautions
Make sure that the boundary values meet the following requirements:
start
cannot beUNBOUNDED FOLLOWING
. Otherwise, theWindow frame start cannot be UNBOUNDED FOLLOWING
error is prompted.end
cannot beUNBOUNDED PRECEDING
. Otherwise, theWindow frame end cannot be UNBOUNDED PRECEDING
error is prompted.- When
start
isCURRENT ROW
andend
isN PRECEDING
, theWindow frame starting from CURRENT ROW cannot end with PRECEDING
error is prompted. - When
start
isN FOLLOWING
andend
isN PRECEDING
, theWindow frame starting from FOLLOWING cannot end with PRECEDING
error is prompted. - When
start
isN FOLLOWING
andend
isCURRENT ROW
, theWindow frame starting from FOLLOWING cannot end with CURRENT ROW
error is prompted.
When the window frame is in RANGE
mode:
- When
start
orend
isN PRECEDING
, theWindow frame RANGE PRECEDING is only supported with UNBOUNDED
error is prompted. - When
start
orend
isN FOLLOWING
, theWindow frame RANGE FOLLOWING is only supported with UNBOUNDED
error is prompted.
Preparations
The data from the testwindow
table is used in the examples for the window functions in this topic.
create table testwindow(year int, country varchar(20), product varchar(20), profit int) distributed by hash(year);
insert into testwindow values (2000,'Finland','Computer',1500);
insert into testwindow values (2001,'Finland','Phone',10);
insert into testwindow values (2000,'India','Calculator',75);
insert into testwindow values (2000,'India','Calculator',75);
insert into testwindow values (2001,'India','Calculator',79);
insert into testwindow values (2001,'USA','Calculator',50);
insert into testwindow values (2001,'USA','Computer',1500);
SELECT * FROM testwindow;
+------+---------+------------+--------+
| year | country | product | profit |
+------+---------+------------+--------+
| 2000 | Finland | Computer | 1500 |
| 2001 | Finland | Phone | 10 |
| 2000 | India | Calculator | 75 |
| 2000 | India | Calculator | 75 |
| 2001 | India | Calculator | 79 |
| 2001 | USA | Calculator | 50 |
| 2001 | USA | Computer | 1500 |
Aggregate functions
All aggregate functions can be used as window functions by adding the OVER
clause. An aggregate function computes each row of data based on the rows within
the current sliding window. For more information, see Aggregate functions.
For example, the following query produces a rolling sum of order prices by date for each clerk:
SELECT clerk, orderdate, orderkey, totalprice,sum(totalprice) OVER (PARTITION BY clerk ORDER BY orderdate) AS rolling_sum FROM orders ORDER BY clerk, orderdate, orderkey
CUME_DIST
CUME_DIST()
- Description: This function returns the cumulative distribution of each value in a
set of values.
Return result: the dataset after the window is ranked within the window partition, including the current row and the number of data rows preceding the current row. Any associated values in the sorting are calculated to the same distribution value.
- Return value type: DOUBLE.
- Example:
select year,country,product,profit,cume_dist() over (partition by country order by profit) as cume_dist from testwindow; +------+---------+------------+--------+--------------------+ | year | country | product | profit | cume_dist | +------+---------+------------+--------+--------------------+ | 2001 | USA | Calculator | 50 | 0.5 | | 2001 | USA | Computer | 1500 | 1.0 | | 2001 | Finland | Phone | 10 | 0.5 | | 2000 | Finland | Computer | 1500 | 1.0 | | 2000 | India | Calculator | 75 | 0.6666666666666666 | | 2000 | India | Calculator | 75 | 0.6666666666666666 | | 2001 | India | Calculator | 79 | 1.0 |
RANK
RANK()
- Description: This function returns the rank of each value in a dataset.
The rank value is the number of a row preceding the current row plus one but does not include the number of the current row. Therefore, associated values in the ordering may produce gaps in the sequence. The rank is calculated for each window partition.
- Return value type: BIGINT.
- Example:
select year,country,product,profit,rank() over (partition by country order by profit) as rank from testwindow; +------+---------+------------+--------+------+ | year | country | product | profit | rank | +------+---------+------------+--------+------+ | 2001 | Finland | Phone | 10 | 1 | | 2000 | Finland | Computer | 1500 | 2 | | 2001 | USA | Calculator | 50 | 1 | | 2001 | USA | Computer | 1500 | 2 | | 2000 | India | Calculator | 75 | 1 | | 2000 | India | Calculator | 75 | 1 | | 2001 | India | Calculator | 79 | 3 |
DENSE_RANK
DENSE_RANK()
- Description: This function returns the rank of each value in a set of values.
DENSE_RANK()
has similar features asRANK()
, but the associated values ofDENSE_RANK()
do not produce gaps in the sequence. - Return value type: BIGINT.
- Example:
select year,country,product,profit,dense_rank() over (partition by country order by profit) as dense_rank from testwindow; +------+---------+------------+--------+------------+ | year | country | product | profit | dense_rank | +------+---------+------------+--------+------------+ | 2001 | Finland | Phone | 10 | 1 | | 2000 | Finland | Computer | 1500 | 2 | | 2001 | USA | Calculator | 50 | 1 | | 2001 | USA | Computer | 1500 | 2 | | 2000 | India | Calculator | 75 | 1 | | 2000 | India | Calculator | 75 | 1 | | 2001 | India | Calculator | 79 | 2 |
NTILE
NTILE(n)
- Description: This function divides data within each window partition into
n
buckets. Buckets are numbered from1
ton
.The maximum interval between bucket numbers is
1
. If the data rows within the window partition are not evenly distributed to each bucket, the remaining data is distributed form thefirst
bucket with1
row of data foreach
bucket. For example, if six rows exist for four buckets, rows are distributed to the buckets in the following manner:1, 1, 2, 2, 3, and 4
. - Return value type: BIGINT.
- Example:
select year,country,product,profit,ntile(2) over (partition by country order by profit) as ntile2 from testwindow; +------+---------+------------+--------+--------+ | year | country | product | profit | ntile2 | +------+---------+------------+--------+--------+ | 2001 | USA | Calculator | 50 | 1 | | 2001 | USA | Computer | 1500 | 2 | | 2001 | Finland | Phone | 10 | 1 | | 2000 | Finland | Computer | 1500 | 2 | | 2000 | India | Calculator | 75 | 1 | | 2000 | India | Calculator | 75 | 1 | | 2001 | India | Calculator | 79 | 2 |
ROW_NUMBER
ROW_NUMBER()
- Description: This function returns a unique and sequential number for each row based
on the sequence of the row within the window partition, starting from
1
. - Return value type: BIGINT.
- Example:
SELECT year, country, product, profit, ROW_NUMBER() OVER(PARTITION BY country) AS row_num1 FROM testwindow; +------+---------+------------+--------+----------+ | year | country | product | profit | row_num1 | +------+---------+------------+--------+----------+ | 2001 | USA | Calculator | 50 | 1 | | 2001 | USA | Computer | 1500 | 2 | | 2000 | India | Calculator | 75 | 1 | | 2000 | India | Calculator | 75 | 2 | | 2001 | India | Calculator | 79 | 3 | | 2000 | Finland | Computer | 1500 | 1 | | 2001 | Finland | Phone | 10 | 2 |
PERCENT_RANK
PERCENT_RANK()
- Description: This function returns the ranking percentage of each value in a dataset
in the format of
(r - 1)/(n - 1)
.r
is the rank of the current row calculated byRANK()
, andn
is the total number of rows within the current window partition. - Return value type: DOUBLE.
- Example:
select year,country,product,profit,PERCENT_RANK() over (partition by country order by profit) as ntile3 from testwindow; +------+---------+------------+--------+--------+ | year | country | product | profit | ntile3 | +------+---------+------------+--------+--------+ | 2001 | Finland | Phone | 10 | 0.0 | | 2000 | Finland | Computer | 1500 | 1.0 | | 2001 | USA | Calculator | 50 | 0.0 | | 2001 | USA | Computer | 1500 | 1.0 | | 2000 | India | Calculator | 75 | 0.0 | | 2000 | India | Calculator | 75 | 0.0 | | 2001 | India | Calculator | 79 | 1.0 |
FIRST_VALUE
FIRST_VALUE(x)
- Description: This function returns the value of the first row within the window partition.
- Return value type: the same as the input argument type.
- Example:
select year,country,product,profit,first_value(profit) over (partition by country order by profit) as firstValue from testwindow; +------+---------+------------+--------+------------+ | year | country | product | profit | firstValue | +------+---------+------------+--------+------------+ | 2000 | India | Calculator | 75 | 75 | | 2000 | India | Calculator | 75 | 75 | | 2001 | India | Calculator | 79 | 75 | | 2001 | USA | Calculator | 50 | 50 | | 2001 | USA | Computer | 1500 | 50 | | 2001 | Finland | Phone | 10 | 10 | | 2000 | Finland | Computer | 1500 | 10 |
LAST_VALUE
LAST_VALUE(x)
- Description: This function returns the value of the last row within the window partition. The default window frame of LAST_VALUE is rows between unbounded preceding and current row, which compares the data in the current row and that in all preceding rows. If you want LAST_VALUE to return the value of the last row, add the following statement after the ORDER BY clause: rows between unbounded preceding and unbounded following.
- Return value type: the same as the input argument type.
- Example 1:
select year,country,product,profit,last_value(profit) over (partition by country order by profit) as firstValue from testwindow; +----------------+-------------------+-------------------+------------------+----------------------+ | year | country | product | profit | firstValue | +----------------+-------------------+-------------------+------------------+----------------------+ | 2001 | USA | Calculator | 50 | 50 | | 2001 | USA | Computer | 1500 | 1500 | | 2001 | Finland | Phone | 10 | 10 | | 2000 | Finland | Computer | 1500 | 1500 | | 2000 | India | Calculator | 75 | 75 | | 2000 | India | Calculator | 75 | 75 | | 2001 | India | Calculator | 79 | 79 |
- Example 2:
select year,country,product,profit,last_value(profit) over (partition by country order by profit rows between unbounded preceding and unbounded following) as lastValue from testwindow; +------+---------+------------+--------+-----------+ | year | country | product | profit | lastValue | +------+---------+------------+--------+-----------+ | 2001 | Finland | Phone | 10 | 1500 | | 2000 | Finland | Computer | 1500 | 1500 | | 2000 | India | Calculator | 75 | 79 | | 2000 | India | Calculator | 75 | 79 | | 2001 | India | Calculator | 79 | 79 | | 2001 | USA | Calculator | 50 | 1500 | | 2001 | USA | Computer | 1500 | 1500 | +------+---------+------------+--------+-----------+
LAG
LAG(x[, offset[, default_value]])
- Description: This function returns the value of the row that precedes the current
row by
offset
rows in the window.The starting offset value is
0
. The offset starts from the current data row. The offset can be a scalar expression. The defaultoffset
value is1
.If the offset value is
null
or is greater than the window length,default_value
is returned. Ifdefault_value
is not specified,null
is returned. - Return value type: the same as the input argument type.
- Example:
select year,country,product,profit,lag(profit) over (partition by country order by profit) as lag from testwindow; +------+---------+------------+--------+------+ | year | country | product | profit | lag | +------+---------+------------+--------+------+ | 2001 | USA | Calculator | 50 | NULL | | 2001 | USA | Computer | 1500 | 50 | | 2000 | India | Calculator | 75 | NULL | | 2000 | India | Calculator | 75 | 75 | | 2001 | India | Calculator | 79 | 75 | | 2001 | Finland | Phone | 10 | NULL | | 2000 | Finland | Computer | 1500 | 10 |
LEAD
LEAD(x[,offset[, default_value]])
- Description: This function returns the value of the row that follows the current row
by
offset
rows in the window.The starting
offset
value is0
. The offset starts from the current data row. The offset can be a scalar expression. The defaultoffset
value is1
.If the offset value is
null
or is greater than the window length,default_value
is returned. Ifdefault_value
is not specified,null
is returned. - Return value type: the same as the input argument type.
- Example:
select year,country,product,profit,lead(profit) over (partition by country order by profit) as lead from testwindow; +------+---------+------------+--------+------+ | year | country | product | profit | lead | +------+---------+------------+--------+------+ | 2000 | India | Calculator | 75 | 75 | | 2000 | India | Calculator | 75 | 79 | | 2001 | India | Calculator | 79 | NULL | | 2001 | Finland | Phone | 10 | 1500 | | 2000 | Finland | Computer | 1500 | NULL | | 2001 | USA | Calculator | 50 | 1500 | | 2001 | USA | Computer | 1500 | NULL |
NTH_VALUE
NTH_VALUE(x, offset)
- Description: This function returns the value of the row by the specified number of
offset
rows in the window. The offset starts from1
.If the
offset
value isnull
or is greater than the number of values in the window,null
is returned. If theoffset
value is0
or negative, an error is prompted. - Return value type: the same as the input argument type.
- Example:
select year,country,product,profit,nth_value(profit,1) over (partition by country order by profit) as nth_value from testwindow; +------+---------+------------+--------+-----------+ | year | country | product | profit | nth_value | +------+---------+------------+--------+-----------+ | 2001 | Finland | Phone | 10 | 10 | | 2000 | Finland | Computer | 1500 | 10 | | 2001 | USA | Calculator | 50 | 50 | | 2001 | USA | Computer | 1500 | 50 | | 2000 | India | Calculator | 75 | 75 | | 2000 | India | Calculator | 75 | 75 | | 2001 | India | Calculator | 79 | 75 |