Window functions - MaxCompute - Alibaba Cloud Documentation Center

This topic describes window functions that are supported by the DataFrame API.

grouped = iris.groupby('name')
grouped.mutate(grouped.sepallength.cumsum(), grouped.sort('sepallength').row_number()).head(10)
          name  sepallength_sum  row_number
0  Iris-setosa            250.3           1
1  Iris-setosa            250.3           2
2  Iris-setosa            250.3           3
3  Iris-setosa            250.3           4
4  Iris-setosa            250.3           5
5  Iris-setosa            250.3           6
6  Iris-setosa            250.3           7
7  Iris-setosa            250.3           8
8  Iris-setosa            250.3           9
9  Iris-setosa            250.3          10

Use window functions to select columns.

iris['name', 'sepallength', iris.groupby('name').sort('sepallength').sepallength.cumcount()].head(5)
          name  sepallength  sepallength_count
0  Iris-setosa          4.3                  1
1  Iris-setosa          4.4                  2
2  Iris-setosa          4.4                  3
3  Iris-setosa          4.4                  4
4  Iris-setosa          4.5                  5

Use window functions to aggregate data by scalar. The processing method is a combination of grouping and aggregation.
```
from odps.df import Scalar
iris.groupby(Scalar(1)).sort('sepallength').sepallength.cumcount()
```

The following table lists the window functions that are supported by the DataFrame API.


Window function	Description
cumsum	Calculates the cumulative sum.
cummean	Calculates the cumulative mean.
cummedian	Calculates the cumulative median.
cumstd	Calculates the cumulative standard deviation.
cummax	Calculates the cumulative maximum.
cummin	Calculates the cumulative minimum.
cumcount	Calculates the cumulative sum.
lag	Retrieves the value of a row at a given offset that precedes the current row. The system determines the number of the row to retrieve the value based on the following formula: Number of the current row - Offset value.
lead	Retrieves the value of a row at a given offset that follows the current row. The system determines the number of the row to retrieve the value based on the following formula: Number of the current row + Offset value.
rank	Calculates the rank of a row in an ordered group.
dense_rank	Calculates the dense rank of a row in an ordered group.
percent_rank	Calculates the relative rank of a row in an ordered group.
row_number	Calculates the row number. Row numbers start from 1.
qcut	Divides a group of data into N bins based on the data sequence and returns the number of the bin that contains the current data. If data is not evenly distributed in bins, more data is distributed to the first bin by default.
nth_value	Retrieves the Nth value in the group.
cume_dist	Calculates the proportion of rows whose values are no greater than the current value to all rows in a group.

The following tables list parameters that are supported by different window functions.

The rank, dense_rank, percent_rank, and row_number window functions support the following parameters.


Parameter	Description
sort	The keyword that is used to sort rows. This parameter is an empty string by default.
ascending	Specifies whether to sort rows in ascending order. This parameter is set to True by default.

The lag and lead window functions support the following parameters in addition to the parameters that are supported by the rank function.


Parameter	Description
offset	The number of rows that are between the current row and the row where you want to retrieve data.
default	The value to return if the row at the specified offset does not exist.

The cumsum, cummax, cummin, cummean, cummedian, cumcount, and cumstd window functions support the following parameters in addition to the parameters that are supported by the rank function.


Parameter	Description
unique	Specifies whether to enable data deduplication. This parameter is set to False by default.
preceding	The start of a window.
following	The end of a window.