Search
Document Center

# Description of window functions

Last Updated: Jun 18, 2021

Analytic functions (also called window functions) and aggregate functions perform aggregate operations on a group of rows (a set of rows). The difference is that aggregate functions return a value (a row) for each group, but window functions return multiple values (multiple rows) for each group. A group of rows is also known as a window and is defined by `analytic_clause`. Window sizes depend on the actual number of rows or a logical interval, such as time.

To trigger an analytic function, you must use a special keyword `OVER` to specify a window. A window consists of three parts:

• Partitioning specifications: used to split input rows into different partitions. This process is similar to the splitting process of the `GROUP BY` clause.

• Sorting specifications: used to determine the order in which input data rows are executed in the window function.

• Window boundary: specifies a window boundary for calculating data. The default value is `RANGE UNBOUNDED PRECEDING`. This boundary contains all data in all the rows that range from the start row to the current row in the current partition.

Analytic functions are the last group of operations that are performed in a query except for the final `ORDER BY` clause. Before window functions are processed, all the `JOIN` operations and `WHERE`, `GROUP BY`, and `HAVING` clauses must be completed. Therefore, window functions can appear in only the select list or the `ORDER BY` clause.

Analytic functions are generally used to calculate cumulative, moving, centered, and reporting aggregates.

## Syntax

analytic_function

``analytic_function([ arguments ]) OVER (analytic_clause)``

analytic_clause

``[ query_partition_clause ] [ order_by_clause [ windowing_clause ] ]``

query_partition_clause

``PARTITION BY { expr[, expr ]... | ( expr[, expr ]... ) }``

order_by_clause

``ORDER [ SIBLINGS ] BY{ expr | position | c_alias } [ ASC | DESC ] [ NULLS FIRST | NULLS LAST ] [, { expr | position | c_alias } [ ASC | DESC ][ NULLS FIRST | NULLS LAST ]]...``

windowing_clause

``{ ROWS | RANGE } { BETWEEN { UNBOUNDED PRECEDING | CURRENT ROW | value_expr { PRECEDING | FOLLOWING } } AND{ UNBOUNDED FOLLOWING | CURRENT ROW | value_expr { PRECEDING | FOLLOWING } } | { UNBOUNDED PRECEDING | CURRENT ROW| value_expr PRECEDING}}``

The following sections describe the semantics of the syntax.

## analytic_function

analytic_function specifies the name of an analytic function.

## arguments

The parameters (arguments). Analytic functions use 0 to 3 parameters. The parameters can be numeric data types or non-numeric data types that can be implicitly converted to numeric data types. ApsaraDB for OceanBase determines the parameter that has the highest numeric precedence based on the precedence of data types. Then, it implicitly converts the remaining parameters to the data type of the parameter that has the highest numeric precedence. The return type is also the data type of the parameter that has the highest numeric precedence, unless otherwise noted for a single function.

## analytic_clause

The analytic clause (analytic_clause). Use `OVER analytic_clause` to indicate that the function performs operations on a query result set. This clause is calculated after the `FROM`, `WHERE`, `GROUP BY`, and `HAVING` clauses. You can use this clause to specify analytic functions in the select list or the ORDER BY clause. If you need to filter the results of a query based on an analytic function, nest these functions in the parent query and filter the results of the nested subquery.

Notice

• You cannot nest analytic functions by specifying analytic functions in analytic_clause. However, you can specify an analytic function in a subquery and calculate another analytic function over the subquery.

• You can use user-defined analytic functions and built-in analytic functions to specify `analytic_clause`.

## query_partition_clause

The partitioning clause (query_partition_clause). Use the `PARTITION BY` clause to partition a query result set to groups based on one or more `value_expr`. If you omit this clause, the function considers all the rows in the query result set as a single group.

You can specify multiple analytic functions in the same query. Each function has the same or different PARTITION BY keys. If you use `query_partition_clause` to specify an analytic function and the queried objects have the parallel attribute, the function calculations are also parallelized.

Valid values of `value_expr` are constants, columns, non-analytic functions, function expressions, or expressions that involve one of them.

## order_by_clause

Use the sorting clause `order_by_clause` to specify how data is sorted in a partition. For all the analytic functions, you can sort values in a partition on multiple keys. Each key is defined by `value_expr` and is qualified by a sorting sequence.

In each function, you can specify multiple sorting expressions. This is especially useful when you use the functions that sort values.

When `order_by_clause` generates the identical values for multiple rows, the function has the following behavior:

• `CUME_DIST`, `DENSE_RANK`, `NTILE`, `PERCENT_RANK`, and `RANK` return the same result for each row.

• `ROW_NUMBER` assigns a distinct value to each row even if a value that is based on `order_by_clause` is available. The value is based on the order in which the row is processed. This order may be nondeterministic if `ORDER BY` cannot implement a total sorting.

• For other analytic functions, the result depends on the window rules. If you specify a logical window that has the `RANGE` keyword, the function returns the same result for each row. If you use the `ROWS` keyword to specify a physical window, the result is nondeterministic.

### Limits on the ORDER BY clause

The `ORDER BY` clause is subject to the following limits:

• In analytic functions, `order_by_clause` must use an expression (expr). The `SIBLINGS` keyword is invalid. This keyword is relevant in only hierarchical queries. Position (position) and column aliases (c_alias) are also invalid. Otherwise, this `order_by_clause` clause is the same as the sorting command of an overall query or a subquery.

• An analytic function that uses the RANGE keyword can use multiple sort keys in the `ORDER BY` clause of this function. You must specify the following windows:

• `RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`, `RANGE UNBOUNDED PRECEDING` for short

• `RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING`

• `RANGE BETWEEN CURRENT ROW AND CURRENT ROW`

• `RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING`

Window boundaries other than the preceding four windows can have only one sort key in the `ORDER BY` clause of the analytic function. This limit does not apply to the window boundaries that are specified by the `ROW` keyword.

### ASC or DESC keyword

Specify the sorting sequence. `ASC` refers to the ascending order and `DESC` refers to the descending order. The default value is the ascending order (ASC).

### NULLS FIRST or NULLS LAST keyword

`nulls first` and `nulls last` in `order_by_clause`. `nulls first` indicates that NULL values are processed as minimum values during sorting. `nulls last` indicates that NULL values are processed as maximum values during sorting.

## windowing_clause

The window function clause (windowing_clause). You can use `windowing_clause` in some analytic functions. The following keywords are relevant:

### ROWS or RANGE keyword

These keywords define a window for calculating the function result for each row. Then, the function is applied to all the rows in the window. The window moves through a query result set or a partition from top to bottom. A window is also called `FRAME`. ApsaraDB for OceanBase supports the following window statements:

• `ROWS`: specifies the window in physical units (rows).

• `RANGE`: specifies the window as a logical offset. The default method is `RANGE UNBOUNDED PRECEDING`. You can use window functions in analytic functions. To use `windowing_clause`, you must add `order_by_clause`.

If the window boundaries are defined by the `RANGE` clause in `windowing_clause`, you can specify only one expression in `order_by_clause`. For more information, see the limits on the ORDER BY clause. The value returned by an analytic function that has a logical offset is always deterministic. However, the value returned by an analytic function that has a physical offset may generate nondeterministic results. An analytic function that has a physical offset can return a deterministic value only when the sorting expression returns a unique sorting. Therefore, you must specify multiple columns in `order_by_clause` to implement the unique sorting.

### BETWEEN ... AND keyword

Use the `BETWEEN ... AND` clause to specify the start point and end point of the window. The first expression (before AND) defines the start point. The second expression (after AND) defines the end point. If you omit `BETWEEN` and specify only one end point, ApsaraDB for OceanBase considers this point as the start point and the current row as the default end point.

### UNBOUNDED PRECEDING keyword

`UNBOUNDED PRECEDING` indicates that the window starts at the first row of the partition. This is the start point instead of the end point.

### UNBOUNDED FOLLOWING keyword

`UNBOUNDED FOLLOWING` indicates that the window ends at the last row of the partition. This is the end point instead of the start point.

### CURRENT ROW keyword

If `CURRENT ROW` serves as a start point, it specifies that the window starts from the current row or the current value. This depends on whether you have specified `ROW` or `RANGE`. In this case, the end point cannot be `value_expr PRECEDING`. If `CURRENT ROW` serves as an end point, it specifies that the window ends at the current row or the current value. This depends on whether you have specified `ROW` or `RANGE`. In this case, the start point cannot be `value_expr FOLLOWING`.

### value_expr PRECEDING or value_expr FOLLOWING keyword

• If `value_expr FOLLOWING` is the start point, the end point must be `value_expr FOLLOWING`.

• If `value_expr PRECEDING` is the end point, the start point must be `value_expr PRECEDING`.

If you need to define a logical window that is defined by a time interval in the numeric format, you may need to use conversion functions.

If you specify `ROWS`:

• `value_expr` is a physical offset. It must be a constant or an expression and must be calculated as a positive number.

• If `value_expr` is part of the start point, it must calculate the part that precedes the start point and the end point as a row.

If you specify `RANGE`:

• `value_expr` is a logical offset. It must be a constant or an expression whose result is a positive numeric value or an interval literal.

• You can specify only one expression in `order_by_clause`.

• If `value_expr` is a numeric value, `ORDER BY expr` must be the numeric or `DATE` data type.

• If `value_expr` is an interval value, `ORDER BY expr` must be the `DATE` data type. If you completely omit `windowing_clause`, the default value is `RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`.