All Products
Search
Document Center

Description of window functions

Last Updated: Jun 18, 2021

Analytic functions (also called window functions) and aggregate functions perform aggregate operations on a group of rows (a set of rows). The difference is that aggregate functions return a value (a row) for each group, but window functions return multiple values (multiple rows) for each group. A group of rows is also known as a window and is defined by analytic_clause. Window sizes depend on the actual number of rows or a logical interval, such as time.

To trigger an analytic function, you must use a special keyword OVER to specify a window. A window consists of three parts:

  • Partitioning specifications: used to split input rows into different partitions. This process is similar to the splitting process of the GROUP BY clause.

  • Sorting specifications: used to determine the order in which input data rows are executed in the window function.

  • Window boundary: specifies a window boundary for calculating data. The default value is RANGE UNBOUNDED PRECEDING. This boundary contains all data in all the rows that range from the start row to the current row in the current partition.

Analytic functions are the last group of operations that are performed in a query except for the final ORDER BY clause. Before window functions are processed, all the JOIN operations and WHERE, GROUP BY, and HAVING clauses must be completed. Therefore, window functions can appear in only the select list or the ORDER BY clause.

Analytic functions are generally used to calculate cumulative, moving, centered, and reporting aggregates.

Syntax

analytic_function

analytic_function([ arguments ]) OVER (analytic_clause)

analytic_clause

[ query_partition_clause ] [ order_by_clause [ windowing_clause ] ]

query_partition_clause

PARTITION BY { expr[, expr ]... | ( expr[, expr ]... ) }

order_by_clause

ORDER [ SIBLINGS ] BY{ expr | position | c_alias } [ ASC | DESC ] [ NULLS FIRST | NULLS LAST ] [, { expr | position | c_alias } [ ASC | DESC ][ NULLS FIRST | NULLS LAST ]]...

windowing_clause

{ ROWS | RANGE } { BETWEEN { UNBOUNDED PRECEDING | CURRENT ROW | value_expr { PRECEDING | FOLLOWING } } AND{ UNBOUNDED FOLLOWING | CURRENT ROW | value_expr { PRECEDING | FOLLOWING } } | { UNBOUNDED PRECEDING | CURRENT ROW| value_expr PRECEDING}}

The following sections describe the semantics of the syntax.

analytic_function

analytic_function specifies the name of an analytic function.

arguments

The parameters (arguments). Analytic functions use 0 to 3 parameters. The parameters can be numeric data types or non-numeric data types that can be implicitly converted to numeric data types. ApsaraDB for OceanBase determines the parameter that has the highest numeric precedence based on the precedence of data types. Then, it implicitly converts the remaining parameters to the data type of the parameter that has the highest numeric precedence. The return type is also the data type of the parameter that has the highest numeric precedence, unless otherwise noted for a single function.

analytic_clause

The analytic clause (analytic_clause). Use OVER analytic_clause to indicate that the function performs operations on a query result set. This clause is calculated after the FROM, WHERE, GROUP BY, and HAVING clauses. You can use this clause to specify analytic functions in the select list or the ORDER BY clause. If you need to filter the results of a query based on an analytic function, nest these functions in the parent query and filter the results of the nested subquery.

Notice

  • You cannot nest analytic functions by specifying analytic functions in analytic_clause. However, you can specify an analytic function in a subquery and calculate another analytic function over the subquery.

  • You can use user-defined analytic functions and built-in analytic functions to specify analytic_clause.

query_partition_clause

The partitioning clause (query_partition_clause). Use the PARTITION BY clause to partition a query result set to groups based on one or more value_expr. If you omit this clause, the function considers all the rows in the query result set as a single group.

You can specify multiple analytic functions in the same query. Each function has the same or different PARTITION BY keys. If you use query_partition_clause to specify an analytic function and the queried objects have the parallel attribute, the function calculations are also parallelized.

Valid values of value_expr are constants, columns, non-analytic functions, function expressions, or expressions that involve one of them.

order_by_clause

Use the sorting clause order_by_clause to specify how data is sorted in a partition. For all the analytic functions, you can sort values in a partition on multiple keys. Each key is defined by value_expr and is qualified by a sorting sequence.

In each function, you can specify multiple sorting expressions. This is especially useful when you use the functions that sort values.

When order_by_clause generates the identical values for multiple rows, the function has the following behavior:

  • CUME_DIST, DENSE_RANK, NTILE, PERCENT_RANK, and RANK return the same result for each row.

  • ROW_NUMBER assigns a distinct value to each row even if a value that is based on order_by_clause is available. The value is based on the order in which the row is processed. This order may be nondeterministic if ORDER BY cannot implement a total sorting.

  • For other analytic functions, the result depends on the window rules. If you specify a logical window that has the RANGE keyword, the function returns the same result for each row. If you use the ROWS keyword to specify a physical window, the result is nondeterministic.

Limits on the ORDER BY clause

The ORDER BY clause is subject to the following limits:

  • In analytic functions, order_by_clause must use an expression (expr). The SIBLINGS keyword is invalid. This keyword is relevant in only hierarchical queries. Position (position) and column aliases (c_alias) are also invalid. Otherwise, this order_by_clause clause is the same as the sorting command of an overall query or a subquery.

  • An analytic function that uses the RANGE keyword can use multiple sort keys in the ORDER BY clause of this function. You must specify the following windows:

    • RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, RANGE UNBOUNDED PRECEDING for short

    • RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING

    • RANGE BETWEEN CURRENT ROW AND CURRENT ROW

    • RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING

Window boundaries other than the preceding four windows can have only one sort key in the ORDER BY clause of the analytic function. This limit does not apply to the window boundaries that are specified by the ROW keyword.

ASC or DESC keyword

Specify the sorting sequence. ASC refers to the ascending order and DESC refers to the descending order. The default value is the ascending order (ASC).

NULLS FIRST or NULLS LAST keyword

nulls first and nulls last in order_by_clause. nulls first indicates that NULL values are processed as minimum values during sorting. nulls last indicates that NULL values are processed as maximum values during sorting.

windowing_clause

The window function clause (windowing_clause). You can use windowing_clause in some analytic functions. The following keywords are relevant:

ROWS or RANGE keyword

These keywords define a window for calculating the function result for each row. Then, the function is applied to all the rows in the window. The window moves through a query result set or a partition from top to bottom. A window is also called FRAME. ApsaraDB for OceanBase supports the following window statements:

  • ROWS: specifies the window in physical units (rows).

  • RANGE: specifies the window as a logical offset. The default method is RANGE UNBOUNDED PRECEDING. You can use window functions in analytic functions. To use windowing_clause, you must add order_by_clause.

If the window boundaries are defined by the RANGE clause in windowing_clause, you can specify only one expression in order_by_clause. For more information, see the limits on the ORDER BY clause. The value returned by an analytic function that has a logical offset is always deterministic. However, the value returned by an analytic function that has a physical offset may generate nondeterministic results. An analytic function that has a physical offset can return a deterministic value only when the sorting expression returns a unique sorting. Therefore, you must specify multiple columns in order_by_clause to implement the unique sorting.

BETWEEN ... AND keyword

Use the BETWEEN ... AND clause to specify the start point and end point of the window. The first expression (before AND) defines the start point. The second expression (after AND) defines the end point. If you omit BETWEEN and specify only one end point, ApsaraDB for OceanBase considers this point as the start point and the current row as the default end point.

UNBOUNDED PRECEDING keyword

UNBOUNDED PRECEDING indicates that the window starts at the first row of the partition. This is the start point instead of the end point.

UNBOUNDED FOLLOWING keyword

UNBOUNDED FOLLOWING indicates that the window ends at the last row of the partition. This is the end point instead of the start point.

CURRENT ROW keyword

If CURRENT ROW serves as a start point, it specifies that the window starts from the current row or the current value. This depends on whether you have specified ROW or RANGE. In this case, the end point cannot be value_expr PRECEDING. If CURRENT ROW serves as an end point, it specifies that the window ends at the current row or the current value. This depends on whether you have specified ROW or RANGE. In this case, the start point cannot be value_expr FOLLOWING.

value_expr PRECEDING or value_expr FOLLOWING keyword

  • If value_expr FOLLOWING is the start point, the end point must be value_expr FOLLOWING.

  • If value_expr PRECEDING is the end point, the start point must be value_expr PRECEDING.

If you need to define a logical window that is defined by a time interval in the numeric format, you may need to use conversion functions.

If you specify ROWS:

  • value_expr is a physical offset. It must be a constant or an expression and must be calculated as a positive number.

  • If value_expr is part of the start point, it must calculate the part that precedes the start point and the end point as a row.

If you specify RANGE:

  • value_expr is a logical offset. It must be a constant or an expression whose result is a positive numeric value or an interval literal.

  • You can specify only one expression in order_by_clause.

  • If value_expr is a numeric value, ORDER BY expr must be the numeric or DATE data type.

  • If value_expr is an interval value, ORDER BY expr must be the DATE data type. If you completely omit windowing_clause, the default value is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.