All Products
Search
Document Center

Data Lake Analytics - Deprecated:Window functions

Last Updated:Apr 25, 2021

Window functions perform calculations on rows of the query result. Window functions are executed after the HAVING clause but before the ORDER BY clause. To run a window function, you must first use the OVER clause to specify the window. A window has the following three components:

  • A partition specification that separates the input rows into different partitions. This is similar to how the GROUP BY clause separates rows into different groups for aggregate functions.

  • An ordering specification that determines the sequence of the input row processed by the window function.

  • The window frame that specifies the sliding window of rows to be processed by the function for a given row. The RANGE UNBOUNDED PRECEDING is the default value, which is the same as RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. The window frame contains all rows from the start of the partition to the last peer of the current row.

For example, the following query ranks orders for each clerk by price:

SELECT orderkey, clerk, totalprice, rank() OVER (PARTITION BY clerk ORDER BY totalprice DESC) AS rnk FROM orders ORDER BY clerk, rnk

Aggregate functions

All aggregate functions can perform window functions by adding the OVER clause. The aggregate function is computed for each row over the total rows within the current row’s window frame.

For example, the following query produces a rolling sum of order prices by date for each clerk:

SELECT clerk, orderdate, orderkey, totalprice, sum(totalprice) OVER (PARTITION BY clerk ORDER BY orderdate) AS rolling_sum FROM orders ORDER BY clerk, orderdate, orderkey

Ranking functions

cume_dist () → bigint

Returns the cumulative distribution of a value among a group of values. Any tie values in the ordering are evaluated to the same distribution value.

dense_rank () → bigint

Returns the rank of a value among a group of values. This function is similar to rank(), except tie values do not create gaps in the sequence.

ntile (n) → bigint

Distributes the rows for each window partition into n buckets ranging from 1 to a maximum of n. The maximum interval difference between two bucket values is 1. If the number of rows in the partition is not distributed evenly into the number of buckets, then the remainder values are distributed one per bucket, starting from the first bucket.

percent_rank () → bigint

Returns the percentage ranking of a value among a group of values. The result is (r - 1) / (n - 1) where r is the rank() of the row and n is the total number of rows in the window partition.

rank () → bigint

Returns the rank of a value within a group of values. The rank is one plus the number of rows before the value that are not peered with the row. The ranking is calculated for each window partition

row_number () → bigint

Returns a unique, sequential number for each row, starting with one, based on the row sequence in the window partition.

Value functions

first_value (x) → [Same as input type]

Returns the first value of the window.

last_value (x) → [Same as input type]

Returns the last value of the window.

nth_value (x, offset) → [Same as input type]

Returns the value at the specified offset from the start of the window. Offsets start at 1. If the offset is null or larger than the number of values in the window, NULL is returned. If the offset is zero or negative, an error is returned.

lead (x[, offset[, default_value]]) → [same as input type]

Returns the offset row value following the current row in the window. Offsets starts at 0, which is the current row. The offset can be any scalar expression. If the offset is null or larger than the window, the default_value is returned. If the offset is not specified, null is returned. The default offset is 1.

lag (x[, offset[, default_value]]) → [same as input type]

Returns the offset row value before the current row in the window. Offsets start at 0, which is the current row. The offset can be any scalar expression. If the offset is null or larger than the window length, the default_value is returned. If the offset is not specified, NULL is returned. The default offset is 1.