All Products
Search
Document Center

MaxCompute:Window functions

Last Updated:Mar 26, 2026

Window functions compute aggregations over partitions of a DataFrame while preserving the original row structure. Unlike groupby aggregations that collapse each group into a single summary row, window functions return one result per input row — making them the right tool for rankings, running totals, and row-offset lookups.

The PyODPS DataFrame API supports window functions through groupby combined with mutate or column selection.

Usage examples

The examples below use the pyodps_iris table. For details on the dataset, see Use DataFrame to process data.

Group by name and apply window functions with mutate

mutate applies window functions across each group and returns a DataFrame with the same number of rows as the original. The following example groups by name and computes a cumulative sum and a row number within each group.

iris = DataFrame(o.get_table('pyodps_iris'))
grouped = iris.groupby('name')
print(grouped.mutate(grouped.sepallength.cumsum(), grouped.sort('sepallength').row_number()).head(10))

Output:

            name  sepallength_sum  row_number
0  Iris-setosa            250.3           1
1  Iris-setosa            250.3           2
2  Iris-setosa            250.3           3
3  Iris-setosa            250.3           4
4  Iris-setosa            250.3           5
5  Iris-setosa            250.3           6
6  Iris-setosa            250.3           7
7  Iris-setosa            250.3           8
8  Iris-setosa            250.3           9
9  Iris-setosa            250.3          10

Select columns using a window function

Pass a window function expression directly as a column in a selection to add computed columns alongside existing ones.

iris = DataFrame(o.get_table('pyodps_iris'))
print(iris['name', 'sepallength', iris.groupby('name').sort('sepallength').sepallength.cumcount()].head(5))

Output:

          name  sepallength  sepallength_count
0  Iris-setosa          4.3                  1
1  Iris-setosa          4.4                  2
2  Iris-setosa          4.4                  3
3  Iris-setosa          4.4                  4
4  Iris-setosa          4.5                  5

Aggregate by scalar

Grouping by Scalar(1) treats the entire DataFrame as a single group, applying the window function globally. The processing logic is identical to group-based aggregation.

from odps.df import Scalar
iris = DataFrame(o.get_table('pyodps_iris'))
iris.groupby(Scalar(1)).sort('sepallength').sepallength.cumcount()

Supported window functions

Ranking functions

FunctionDescription
rankReturns the rank of a row within its group. Rows with equal values receive the same rank, leaving gaps in the sequence.
dense_rankReturns the dense rank of a row within its group. Equal values receive the same rank with no gaps.
percent_rankReturns the relative rank of a row within its group.
row_numberReturns the sequential row number within the group, starting from 1.

Analytic functions

FunctionDescription
lagReturns the value from the row that is offset rows before the current row. If no such row exists, returns default.
leadReturns the value from the row that is offset rows after the current row. If no such row exists, returns default.
nth_valueReturns the Nth value within the group.
cume_distReturns the proportion of rows in the group whose values are less than or equal to the current row's value.
qcutDivides the group into N bins based on the data sequence and returns the bin number for each row. If data cannot be evenly distributed, extra rows go into the first bin by default.

Cumulative functions

FunctionDescription
cumsumCalculates the cumulative sum within each group.
cummeanCalculates the cumulative mean within each group.
cummedianCalculates the cumulative median within each group.
cumstdCalculates the cumulative standard deviation within each group.
cummaxCalculates the cumulative maximum within each group.
cumminCalculates the cumulative minimum within each group.
cumcountCalculates the cumulative count within each group.

Parameters

Ranking functions

rank, dense_rank, percent_rank, and row_number accept the following parameters.

ParameterDefaultDescription
sort"" (empty string)The column to sort by within the group.
ascendingTrueSort order. Set to False for descending order.

lag and lead

In addition to the ranking function parameters, lag and lead accept:

ParameterDefaultDescription
offsetNumber of rows before (for lag) or after (for lead) the current row to retrieve.
defaultValue to return when the target row does not exist (for example, looking before the first row or after the last row).

Cumulative functions

In addition to the ranking function parameters, cumsum, cummean, cummedian, cumstd, cummax, cummin, and cumcount accept:

ParameterDefaultDescription
uniqueFalseSet to True to deduplicate values before computing.
precedingThe start of the window frame (number of preceding rows to include).
followingThe end of the window frame (number of following rows to include).