All Products
Search
Document Center

MaxCompute:Window functions

Last Updated:Dec 31, 2025

Window functions perform aggregation or other calculations on a dynamically defined subset of data. They are often used for tasks such as processing time series data, ranking, and calculating moving averages. This topic describes the command syntax, parameters, and examples of the window functions that MaxCompute SQL supports.

Applicability

  • Window functions can appear only in SELECT statements.

  • A window function cannot be nested with other window functions or aggregate functions.

  • Window functions cannot be used with aggregate functions at the same level.

Index

MaxCompute SQL supports the following window functions.

Function

Features

AVG

Calculates the average value of data in a window.

CLUSTER_SAMPLE

Performs random sampling. Returns true if the row is sampled.

COUNT

Counts the number of records in a window.

CUME_DIST

Calculates the cumulative distribution.

DENSE_RANK

Calculates the rank. The ranks are consecutive.

FIRST_VALUE

Returns the value of the first row in the window frame of the current row.

LAG

Returns the value from the Nth row before the current row in a partition.

LAST_VALUE

Returns the value of the last row in the window frame of the current row.

LEAD

Returns the value from the Nth row after the current row in a partition.

MAX

Calculates the maximum value in a window.

MEDIAN

Calculates the median value in a window.

MIN

Calculates the minimum value in a window.

NTILE

Divides ordered data into N equal-sized groups and returns the group number (from 1 to N) for each row.

NTH_VALUE

Returns the value of the Nth row in the window frame of the current row.

PERCENT_RANK

Calculates the rank as a percentage.

PERCENTILE_CONT

Calculates the exact percentile.

PERCENTILE_DISC

Calculates a given percentile value by sorting the specified column in ascending order.

RANK

Calculates the rank. The ranks may not be consecutive.

ROW_NUMBER

Calculates the row number, starting from 1.

STDDEV

Calculates the population standard deviation. It is an alias for STDDEV_POP.

STDDEV_SAMP

Calculates the sample standard deviation.

SUM

Calculates the sum of data in a window.

Window function syntax

The syntax for a window function is as follows.

<function_name>([distinct][<expression> [, ...]]) over (<window_definition>)
<function_name>([distinct][<expression> [, ...]]) over <window_name>
  • function_name: A built-in window function, an aggregate function, or a user-defined aggregate function (UDAF).

  • expression: The format of the function, which must conform to the function's syntax.

  • windowing_definition: The window definition. For more information about the syntax, see windowing_definition.

  • window_name: The name of the window. You can use the window keyword to define a custom window and assign a name to the windowing_definition. The syntax for a named window definition (named_window_def) is as follows:

    window <window_name> as (<window_definition>)

    The following describes the positions of custom statements in SQL:

    select ... from ... [where ...] [group by ...] [having ...] named_window_def [order by ...] [limit ...]

windowing_definition

The syntax for windowing_definition is as follows.

--partition_clause:
[partition by <expression> [, ...]]
--orderby_clause:
[order by <expression> [asc|desc][nulls {first|last}] [, ...]]
[<frame_clause>]

When you add a window function to a SELECT statement, the data is partitioned and sorted based on the partition by and order by clauses in the window definition. If you do not specify a partition by clause, all data is treated as a single partition. If you do not specify an order by clause, the order of data within a partition is not guaranteed. For each row, referred to as the current row, a segment of data is extracted from the partition based on the frame_clause to form the window for that row. The window function then computes a result for the current row based on the data in its window.

  • partition by <expression> [, ...]: Optional. Specifies the partition. Rows with the same partition key column values are in the same partition. For more information about the format, see Table operations.

  • order by <expression> [asc|desc][nulls {first|last}] [, ...]: Optional. Specifies how data is sorted within a partition.

    Note

    If rows have the same order by values, the sort order is not guaranteed. To ensure a consistent order, make sure that the order by values are as unique as possible.

  • frame_clause: Optional. Defines the boundaries of the window. For more information about frame_clause, see frame_clause.

filter_clause

The syntax for filter_clause is as follows.

FILTER (WHERE filter_condition)

filter_condition is a Boolean expression, which is used in the same way as the WHERE clause in a select ... from ... where statement.

If you provide a FILTER clause, only rows for which the filter_condition evaluates to true are included in the window frame. For aggregate window functions (such as COUNT, SUM, AVG, MAX, and MIN), a value is still returned for every row. However, rows where the FILTER expression does not evaluate to true (such as NULL or false) are not included in the window frame for each row's calculation. NULL is treated as false.

Example

  • Prepare data

    -- Create a table.
    CREATE TABLE IF NOT EXISTS mf_window_fun(key BIGINT,value BIGINT) STORED AS ALIORC;
    
    -- Insert data.
    insert into mf_window_fun values (1,100),(2,200),(1,150),(2,250),(3,300),(4,400),(5,500),(6,600),(7,700);
    
    -- Query data from the mf_window_fun table.
    select * from mf_window_fun;
    
    -- The following result is returned:
    +------------+------------+
    | key        | value      |
    +------------+------------+
    | 1          | 100        |
    | 2          | 200        |
    | 1          | 150        |
    | 2          | 250        |
    | 3          | 300        |
    | 4          | 400        |
    | 5          | 500        |
    | 6          | 600        |
    | 7          | 700        |
    +------------+------------+
  • Query the cumulative sum of rows whose value is greater than 100 within the window.

    select key,sum(value) filter(where value > 100) 
           over (partition by key order by key)  
           from mf_window_fun;

    The following result is returned:

    +------------+------------+
    | key        | _c1        |
    +------------+------------+
    | 1          | NULL       | -- Skipped
    | 1          | 150        |
    | 2          | 200        |
    | 2          | 450        |
    | 3          | 300        |
    | 4          | 400        |
    | 5          | 500        |
    | 6          | 600        |
    | 7          | 700        |
    +------------+------------+
Note
  • The FILTER clause does not remove rows that fail the filter_condition from the query result. It only excludes them from the window function's calculation. To remove these rows from the final output, you must use a select ... from ... where clause. The window function's value for an excluded row is not 0 or NULL. Instead, it inherits the value from the preceding row.

  • You can use the FILTER clause only with aggregate window functions, such as COUNT, SUM, AVG, MAX, MIN, and WM_CONCAT. You cannot use the FILTER clause with non-aggregate functions such as RANK, ROW_NUMBER, or NTILE. Otherwise, a syntax error occurs.

  • To use the FILTER syntax in a window function, you must enable the following session flag: set odps.sql.window.function.newimpl=true;.

frame_clause

The syntax for frame_clause is as follows.

-- Format 1
{ROWS|RANGE|GROUPS} <frame_start> [<frame_exclusion>]
-- Format 2
{ROWS|RANGE|GROUPS} between <frame_start> and <frame_end> [<frame_exclusion>]

frame_clause is a closed interval that defines the boundaries of the window. It includes the rows at the frame_start and frame_end positions.

  • ROWS|RANGE|GROUPS: Required. The type of frame_clause. The implementation rules for frame_start and frame_end vary by type.

    • ROWS: Defines window boundaries based on the number of rows.

    • RANGE: Defines window boundaries by comparing the values of the order by column. Typically, an order by clause is specified in the window definition. If no order by clause is specified, all rows in a partition have the same order by column value. NULL values are considered equal.

    • GROUPS: All rows in a partition with the same order by column value form a GROUP. If no order by clause is specified, all rows in the partition form a single GROUP. NULL values are considered equal.

  • frame_start and frame_end: Specify the start and end boundaries of the window. frame_start is required. frame_end is optional. If omitted, the default value is CURRENT ROW.

    The position specified by frame_start must precede the position specified by frame_end, or match the position of frame_end. In other words, frame_start is closer to the head of the partition than frame_end. The head of the partition is the position of the first row after the data is sorted by the order by statement in the window definition. The following table describes the valid values and logic for frame_start and frame_end when the frame_clause type is ROWS, RANGE, or GROUPS.

    frame_clause type

    frame_start/frame_end value

    Description

    ROWS, RANGE, GROUPS

    UNBOUNDED PRECEDING

    The first row of the partition. Counting starts from 1.

    UNBOUNDED FOLLOWING

    The last row of the partition.

    ROWS

    CURRENT ROW

    The position of the current row. Each row of data corresponds to a window function result. The current row is the row for which the window function result is being calculated.

    offset PRECEDING

    The position that is offset rows before the current row, towards the head of the partition. For example, 0 PRECEDING refers to the current row, and 1 PRECEDING refers to the preceding row. offset must be a non-negative integer.

    offset FOLLOWING

    The position that is offset rows after the current row, towards the tail of the partition. For example, 0 FOLLOWING refers to the current row, and 1 FOLLOWING refers to the next row. offset must be a non-negative integer.

    RANGE

    CURRENT ROW

    • As frame_start, it refers to the position of the first row that has the same order by column value as the current row.

    • As frame_end, it refers to the position of the last row that has the same order by column value as the current row.

    offset PRECEDING

    The positions of frame_start and frame_end depend on the order by sequence. Assume the window is sorted by X. Xi represents the X value of the i-th row, and Xc represents the X value of the current row. The positions are described as follows:

    • When order by is ascending:

      • frame_start: The position of the first row that satisfies Xc - Xi <= offset.

      • frame_end: The position of the last row that satisfies Xc - Xi >= offset.

    • When order by is descending:

      • frame_start: The position of the first row that satisfies Xi - Xc <= offset.

      • frame_end: The position of the last row that satisfies Xi - Xc >= offset.

    The supported data types for the order by column are: TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, DECIMAL, DATETIME, DATE, and TIMESTAMP.

    The syntax for the offset of date types is as follows:

    • N: Represents N days or N seconds. It must be a non-negative integer. For DATETIME and TIMESTAMP, it represents N seconds. For DATE, it represents N days.

    • interval 'N' {YEAR\MONTH\DAY\HOUR\MINUTE\SECOND}: Represents N years, months, days, hours, minutes, or seconds. For example, INTERVAL '3' YEAR represents 3 years.

    • INTERVAL 'N-M' YEAR TO MONTH: Represents N years and M months. For example, INTERVAL '1-3' YEAR TO MONTH represents 1 year and 3 months.

    • INTERVAL 'D[ H[:M[:S[:N]]]]' DAY TO SECOND: Represents D days, H hours, M minutes, S seconds, and N nanoseconds. For example, INTERVAL '1 2:3:4:5' DAY TO SECOND represents 1 day, 2 hours, 3 minutes, 4 seconds, and 5 nanoseconds.

    offset FOLLOWING

    The positions of frame_start and frame_end depend on the order by sequence. Assume the window is sorted by X. Xi represents the X value of the i-th row, and Xc represents the X value of the current row. The positions are described as follows:

    • When order by is ascending:

      • frame_start: The position of the first row that satisfies Xi - Xc >= offset.

      • frame_end: The position of the last row that satisfies Xi - Xc <= offset.

    • When order by is descending:

      • frame_start: The position of the first row that satisfies Xc - Xi >= offset.

      • frame_end: The position of the last row that satisfies Xc - Xi <= offset.

    GROUPS

    CURRENT ROW

    • As frame_start, it refers to the first row of the GROUP to which the current row belongs.

    • As frame_end, it refers to the last row of the GROUP to which the current row belongs.

    offset PRECEDING

    • As frame_start, it refers to the position of the first row in the GROUP that is offset GROUPs before the current row's GROUP, towards the head of the partition.

    • As frame_end, it refers to the position of the last row in the GROUP that is offset GROUPs before the current row's GROUP, towards the head of the partition.

    Note

    You cannot set frame_start to UNBOUNDED FOLLOWING or frame_end to UNBOUNDED PRECEDING.

    offset FOLLOWING

    • As frame_start, it refers to the position of the first row in the GROUP that is offset GROUPs after the current row's GROUP, towards the tail of the partition.

    • As frame_end, it refers to the position of the last row in the GROUP that is offset GROUPs after the current row's GROUP, towards the tail of the partition.

    Note

    You cannot set frame_start to UNBOUNDED FOLLOWING or frame_end to UNBOUNDED PRECEDING.

  • frame_exclusion: Optional. Used to exclude a portion of data from the window. Valid values are:

    • EXCLUDE NO OTHERS: Does not exclude any data.

    • EXCLUDE CURRENT ROW: Excludes the current row.

    • EXCLUDE GROUP: Excludes the entire GROUP, which means all data in the partition that has the same order by value as the current row.

    • EXCLUDE TIES: Excludes all rows that share the same order by value as the current row, except for the current row itself.

Default frame_clause

If you do not specify a frame_clause, MaxCompute uses a default frame_clause to determine the boundaries of the data included in the window. The default frame_clause is:

  • When Hive compatible mode is enabled (set odps.sql.hive.compatible=true;), the default frame_clause is as follows, which is the same as in most other SQL systems.

    RANGE between UNBOUNDED PRECEDING and CURRENT ROW EXCLUDE NO OTHERS
  • When Hive compatible mode is disabled (set odps.sql.hive.compatible=false;), if an order by clause is specified and the window function is AVG, COUNT, MAX, MIN, STDDEV, STDDEV_POP, STDDEV_SAMP, or SUM, the default frame_clause is of the ROWS type.

    ROWS between UNBOUNDED PRECEDING and CURRENT ROW EXCLUDE NO OTHERS

Window boundary examples

Assume a table `tbl` has the structure pid: bigint, oid: bigint, rid: bigint and contains the following data:

+------------+------------+------------+
| pid        | oid        | rid        |
+------------+------------+------------+
| 1          | NULL       | 1          |
| 1          | NULL       | 2          |
| 1          | 1          | 3          |
| 1          | 1          | 4          |
| 1          | 2          | 5          |
| 1          | 4          | 6          |
| 1          | 7          | 7          |
| 1          | 11         | 8          |
| 2          | NULL       | 9          |
| 2          | NULL       | 10         |
+------------+------------+------------+
  • ROW type window

    • Window definition 1

      partition by pid order by oid ROWS between UNBOUNDED PRECEDING and CURRENT ROW
      --SQL statement is as follows.
      select pid, 
             oid, 
             rid, 
      collect_list(rid) over(partition by pid order by 
      oid ROWS between UNBOUNDED PRECEDING and CURRENT ROW) as window from tbl;

      The following result is returned:

      +------------+------------+------------+--------+
      | pid        | oid        | rid        | window |
      +------------+------------+------------+--------+
      | 1          | NULL       | 1          | [1]    |
      | 1          | NULL       | 2          | [1, 2] |
      | 1          | 1          | 3          | [1, 2, 3] |
      | 1          | 1          | 4          | [1, 2, 3, 4] |
      | 1          | 2          | 5          | [1, 2, 3, 4, 5] |
      | 1          | 4          | 6          | [1, 2, 3, 4, 5, 6] |
      | 1          | 7          | 7          | [1, 2, 3, 4, 5, 6, 7] |
      | 1          | 11         | 8          | [1, 2, 3, 4, 5, 6, 7, 8] |
      | 2          | NULL       | 9          | [9]    |
      | 2          | NULL       | 10         | [9, 10] |
      +------------+------------+------------+--------+
    • Window definition 2

      partition by pid order by oid ROWS between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING
      --SQL statement is as follows.
      select pid, 
             oid, 
             rid, 
      collect_list(rid) over(partition by pid order by 
      oid ROWS between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING) as window from tbl;

      The following result is returned:

      +------------+------------+------------+--------+
      | pid        | oid        | rid        | window |
      +------------+------------+------------+--------+
      | 1          | NULL       | 1          | [1, 2, 3, 4, 5, 6, 7, 8] |
      | 1          | NULL       | 2          | [1, 2, 3, 4, 5, 6, 7, 8] |
      | 1          | 1          | 3          | [1, 2, 3, 4, 5, 6, 7, 8] |
      | 1          | 1          | 4          | [1, 2, 3, 4, 5, 6, 7, 8] |
      | 1          | 2          | 5          | [1, 2, 3, 4, 5, 6, 7, 8] |
      | 1          | 4          | 6          | [1, 2, 3, 4, 5, 6, 7, 8] |
      | 1          | 7          | 7          | [1, 2, 3, 4, 5, 6, 7, 8] |
      | 1          | 11         | 8          | [1, 2, 3, 4, 5, 6, 7, 8] |
      | 2          | NULL       | 9          | [9, 10] |
      | 2          | NULL       | 10         | [9, 10] |
      +------------+------------+------------+--------+
    • Window definition 3

      partition by pid order by oid ROWS between 1 FOLLOWING and 3 FOLLOWING
      --SQL statement is as follows.
      select pid, 
             oid, 
             rid, 
      collect_list(rid) over(partition by pid order by 
      oid ROWS between 1 FOLLOWING and 3 FOLLOWING) as window from tbl;

      The following result is returned:

      +------------+------------+------------+--------+
      | pid        | oid        | rid        | window |
      +------------+------------+------------+--------+
      | 1          | NULL       | 1          | [2, 3, 4] |
      | 1          | NULL       | 2          | [3, 4, 5] |
      | 1          | 1          | 3          | [4, 5, 6] |
      | 1          | 1          | 4          | [5, 6, 7] |
      | 1          | 2          | 5          | [6, 7, 8] |
      | 1          | 4          | 6          | [7, 8] |
      | 1          | 7          | 7          | [8]    |
      | 1          | 11         | 8          | NULL   |
      | 2          | NULL       | 9          | [10]   |
      | 2          | NULL       | 10         | NULL   |
      +------------+------------+------------+--------+
    • Window definition 4

      partition by pid order by oid ROWS between UNBOUNDED PRECEDING and CURRENT ROW EXCLUDE CURRENT ROW
      --SQL statement is as follows.
      select pid, 
      oid, 
      rid, 
      collect_list(rid) over(partition by pid order by 
      oid ROWS between UNBOUNDED PRECEDING and CURRENT ROW EXCLUDE CURRENT ROW) as window from tbl;

      The following result is returned:

      +------------+------------+------------+--------+
      | pid        | oid        | rid        | window |
      +------------+------------+------------+--------+
      | 1          | NULL       | 1          | NULL   |
      | 1          | NULL       | 2          | [1]    |
      | 1          | 1          | 3          | [1, 2] |
      | 1          | 1          | 4          | [1, 2, 3] |
      | 1          | 2          | 5          | [1, 2, 3, 4] |
      | 1          | 4          | 6          | [1, 2, 3, 4, 5] |
      | 1          | 7          | 7          | [1, 2, 3, 4, 5, 6] |
      | 1          | 11         | 8          | [1, 2, 3, 4, 5, 6, 7] |
      | 2          | NULL       | 9          | NULL   |
      | 2          | NULL       | 10         | [9]    |
      +------------+------------+------------+--------+
    • Window definition 5

      partition by pid order by oid ROWS between UNBOUNDED PRECEDING and CURRENT ROW EXCLUDE GROUP
      --SQL statement is as follows.
      select pid, 
             oid, 
             rid, 
      collect_list(rid) over(partition by pid order by oid ROWS between UNBOUNDED PRECEDING and CURRENT ROW EXCLUDE GROUP) as window from tbl;

      The following result is returned:

      +------------+------------+------------+--------+
      | pid        | oid        | rid        | window |
      +------------+------------+------------+--------+
      | 1          | NULL       | 1          | NULL   |
      | 1          | NULL       | 2          | NULL   |
      | 1          | 1          | 3          | [1, 2] |
      | 1          | 1          | 4          | [1, 2] |
      | 1          | 2          | 5          | [1, 2, 3, 4] |
      | 1          | 4          | 6          | [1, 2, 3, 4, 5] |
      | 1          | 7          | 7          | [1, 2, 3, 4, 5, 6] |
      | 1          | 11         | 8          | [1, 2, 3, 4, 5, 6, 7] |
      | 2          | NULL       | 9          | NULL   |
      | 2          | NULL       | 10         | NULL   |
      +------------+------------+------------+--------+
    • Window definition 6

      partition by pid order by oid ROWS between UNBOUNDED PRECEDING and CURRENT ROW EXCLUDE TIES
      --SQL statement is as follows.
      select pid, 
             oid, 
             rid, 
      collect_list(rid) over(partition by pid order by 
      oid ROWS between UNBOUNDED PRECEDING and CURRENT ROW EXCLUDE TIES) as window from tbl;                            

      The following result is returned:

      +------------+------------+------------+--------+
      | pid        | oid        | rid        | window |
      +------------+------------+------------+--------+
      | 1          | NULL       | 1          | [1]    |
      | 1          | NULL       | 2          | [2]    |
      | 1          | 1          | 3          | [1, 2, 3] |
      | 1          | 1          | 4          | [1, 2, 4] |
      | 1          | 2          | 5          | [1, 2, 3, 4, 5] |
      | 1          | 4          | 6          | [1, 2, 3, 4, 5, 6] |
      | 1          | 7          | 7          | [1, 2, 3, 4, 5, 6, 7] |
      | 1          | 11         | 8          | [1, 2, 3, 4, 5, 6, 7, 8] |
      | 2          | NULL       | 9          | [9]    |
      | 2          | NULL       | 10         | [10]   |
      +------------+------------+------------+--------+

      By comparing the window results for rows where rid is 2, 4, and 10 in this example and the previous one, you can observe the difference between EXCLUDE CURRENT ROW and EXCLUDE GROUP. For EXCLUDE GROUP, in the same partition (where pid is equal), all data with the same oid as the current row is excluded.

  • RANGE type window

    • Window definition 1

      partition by pid order by oid RANGE between UNBOUNDED PRECEDING and CURRENT ROW
      --SQL statement is as follows.
      select pid, 
             oid, 
             rid, 
      collect_list(rid) over(partition by pid order by 
      oid RANGE between UNBOUNDED PRECEDING and CURRENT ROW) as window from tbl;

      The following result is returned:

      +------------+------------+------------+--------+
      | pid        | oid        | rid        | window |
      +------------+------------+------------+--------+
      | 1          | NULL       | 1          | [1, 2] |
      | 1          | NULL       | 2          | [1, 2] |
      | 1          | 1          | 3          | [1, 2, 3, 4] |
      | 1          | 1          | 4          | [1, 2, 3, 4] |
      | 1          | 2          | 5          | [1, 2, 3, 4, 5] |
      | 1          | 4          | 6          | [1, 2, 3, 4, 5, 6] |
      | 1          | 7          | 7          | [1, 2, 3, 4, 5, 6, 7] |
      | 1          | 11         | 8          | [1, 2, 3, 4, 5, 6, 7, 8] |
      | 2          | NULL       | 9          | [9, 10] |
      | 2          | NULL       | 10         | [9, 10] |
      +------------+------------+------------+--------+

      When CURRENT ROW is used as frame_end, it includes all rows up to the last row that has the same order by value oid as the current row. Therefore, the window result for the record where rid is 1 is [1, 2].

    • Window definition 2

      partition by pid order by oid RANGE between CURRENT ROW and UNBOUNDED FOLLOWING
      --SQL statement is as follows.
      select pid, 
             oid, 
             rid, 
      collect_list(rid) over(partition by pid order by 
      oid RANGE between CURRENT ROW and UNBOUNDED FOLLOWING) as window from tbl;

      The following result is returned:

      +------------+------------+------------+--------+
      | pid        | oid        | rid        | window |
      +------------+------------+------------+--------+
      | 1          | NULL       | 1          | [1, 2, 3, 4, 5, 6, 7, 8] |
      | 1          | NULL       | 2          | [1, 2, 3, 4, 5, 6, 7, 8] |
      | 1          | 1          | 3          | [3, 4, 5, 6, 7, 8] |
      | 1          | 1          | 4          | [3, 4, 5, 6, 7, 8] |
      | 1          | 2          | 5          | [5, 6, 7, 8] |
      | 1          | 4          | 6          | [6, 7, 8] |
      | 1          | 7          | 7          | [7, 8] |
      | 1          | 11         | 8          | [8]    |
      | 2          | NULL       | 9          | [9, 10] |
      | 2          | NULL       | 10         | [9, 10] |
      +------------+------------+------------+--------+
    • Window definition 3

      partition by pid order by oid RANGE between 3 PRECEDING and 1 PRECEDING
      --SQL statement is as follows.
      
      select pid, 
             oid, 
             rid, 
      collect_list(rid) over(partition by pid order by 
      oid RANGE between 3 PRECEDING and 1 PRECEDING) as window from tbl;

      The following result is returned:

      +------------+------------+------------+--------+
      | pid        | oid        | rid        | window |
      +------------+------------+------------+--------+
      | 1          | NULL       | 1          | [1, 2] |
      | 1          | NULL       | 2          | [1, 2] |
      | 1          | 1          | 3          | NULL   |
      | 1          | 1          | 4          | NULL   |
      | 1          | 2          | 5          | [3, 4] |
      | 1          | 4          | 6          | [3, 4, 5] |
      | 1          | 7          | 7          | [6]    |
      | 1          | 11         | 8          | NULL   |
      | 2          | NULL       | 9          | [9, 10] |
      | 2          | NULL       | 10         | [9, 10] |
      +------------+------------+------------+--------+

      For rows where the order by value oid is NULL, if you use offset {PRECEDING|FOLLOWING} and offset is not UNBOUNDED, the boundary is determined as follows: When used as frame_start, it points to the first row with a NULL order by value in the partition. When used as frame_end, it points to the last row with a NULL order by value.

  • GROUPS type window

    The window definition is as follows:

    partition by pid order by oid GROUPS between 2 PRECEDING and CURRENT ROW
    --SQL statement is as follows.
    select pid, 
           oid, 
           rid, 
    collect_list(rid) over(partition by pid order by 
    oid GROUPS between 2 PRECEDING and CURRENT ROW) as window from tbl;

    The following result is returned:

    +------------+------------+------------+--------+
    | pid        | oid        | rid        | window |
    +------------+------------+------------+--------+
    | 1          | NULL       | 1          | [1, 2] |
    | 1          | NULL       | 2          | [1, 2] |
    | 1          | 1          | 3          | [1, 2, 3, 4] |
    | 1          | 1          | 4          | [1, 2, 3, 4] |
    | 1          | 2          | 5          | [1, 2, 3, 4, 5] |
    | 1          | 4          | 6          | [3, 4, 5, 6] |
    | 1          | 7          | 7          | [5, 6, 7] |
    | 1          | 11         | 8          | [6, 7, 8] |
    | 2          | NULL       | 9          | [9, 10] |
    | 2          | NULL       | 10         | [9, 10] |
    +------------+------------+------------+--------+

Sample data

To help you understand how to use each function, this topic provides source data and function examples based on that data. The following example commands create a table named emp and add data to it:

create table if not exists emp
   (empno bigint,
    ename string,
    job string,
    mgr bigint,
    hiredate datetime,
    sal bigint,
    comm bigint,
    deptno bigint);
tunnel upload emp.txt emp;

The emp.txt file contains the following data:

7369,SMITH,CLERK,7902,1980-12-17 00:00:00,800,,20
7499,ALLEN,SALESMAN,7698,1981-02-20 00:00:00,1600,300,30
7521,WARD,SALESMAN,7698,1981-02-22 00:00:00,1250,500,30
7566,JONES,MANAGER,7839,1981-04-02 00:00:00,2975,,20
7654,MARTIN,SALESMAN,7698,1981-09-28 00:00:00,1250,1400,30
7698,BLAKE,MANAGER,7839,1981-05-01 00:00:00,2850,,30
7782,CLARK,MANAGER,7839,1981-06-09 00:00:00,2450,,10
7788,SCOTT,ANALYST,7566,1987-04-19 00:00:00,3000,,20
7839,KING,PRESIDENT,,1981-11-17 00:00:00,5000,,10
7844,TURNER,SALESMAN,7698,1981-09-08 00:00:00,1500,0,30
7876,ADAMS,CLERK,7788,1987-05-23 00:00:00,1100,,20
7900,JAMES,CLERK,7698,1981-12-03 00:00:00,950,,30
7902,FORD,ANALYST,7566,1981-12-03 00:00:00,3000,,20
7934,MILLER,CLERK,7782,1982-01-23 00:00:00,1300,,10
7948,JACCKA,CLERK,7782,1981-04-12 00:00:00,5000,,10
7956,WELAN,CLERK,7649,1982-07-20 00:00:00,2450,,10
7956,TEBAGE,CLERK,7748,1982-12-30 00:00:00,1300,,10

AVG

  • Syntax

    double avg([distinct] double <expr>) over ([partition_clause] [orderby_clause] [frame_clause])
    decimal avg([distinct] decimal <expr>) over ([partition_clause] [orderby_clause] [frame_clause])
  • Description

    Returns the average value of expr in a window.

  • Parameters

    • expr: Required. The expression for which to calculate the result. It must be of the DOUBLE or DECIMAL type.

      • If the input value is of the STRING or BIGINT type, it is implicitly converted to the DOUBLE type for calculation. An error is returned for other data types.

      • If the input value is NULL, the row is not included in the calculation.

      • If you specify the distinct keyword, the function calculates the average of unique values.

    • partition_clause, orderby_clause, and frame_clause: For more information, see windowing_definition.

  • Return value

    If expr is of the DECIMAL type, a DECIMAL value is returned. Otherwise, a DOUBLE value is returned. If all values of expr are NULL, NULL is returned.

  • Examples

    • Example 1: Partition by department (deptno) and calculate the average salary (sal) without sorting. The function calculates the average salary for the entire partition (all rows with the same deptno). The command is as follows:

      select deptno, sal, avg(sal) over (partition by deptno) from emp;

      The following result is returned:

      +------------+------------+------------+
      | deptno     | sal        | _c2        |
      +------------+------------+------------+
      | 10         | 1300       | 2916.6666666666665 |   -- This is the first row of the window. The value is the cumulative average from the first to the sixth row.
      | 10         | 2450       | 2916.6666666666665 |   -- The value is the cumulative average from the first to the sixth row.
      | 10         | 5000       | 2916.6666666666665 |   -- The value is the cumulative average from the first to the sixth row.
      | 10         | 1300       | 2916.6666666666665 |
      | 10         | 5000       | 2916.6666666666665 |
      | 10         | 2450       | 2916.6666666666665 |
      | 20         | 3000       | 2175.0     |
      | 20         | 3000       | 2175.0     |
      | 20         | 800        | 2175.0     |
      | 20         | 1100       | 2175.0     |
      | 20         | 2975       | 2175.0     |
      | 30         | 1500       | 1566.6666666666667 |
      | 30         | 950        | 1566.6666666666667 |
      | 30         | 1600       | 1566.6666666666667 |
      | 30         | 1250       | 1566.6666666666667 |
      | 30         | 1250       | 1566.6666666666667 |
      | 30         | 2850       | 1566.6666666666667 |
      +------------+------------+------------+
    • Example 2: In non-Hive compatible mode, partition by department (deptno), sort by salary (sal), and calculate the average salary. The function calculates a running average from the first row of the partition to the current row. The commands are as follows:

      -- Disable Hive compatible mode.
      set odps.sql.hive.compatible=false;
      -- Execute the following SQL command.
      select deptno, sal, avg(sal) over (partition by deptno order by sal) from emp;

      The following result is returned:

      +------------+------------+------------+
      | deptno     | sal        | _c2        |
      +------------+------------+------------+
      | 10         | 1300       | 1300.0     |           -- First row of the window.
      | 10         | 1300       | 1300.0     |           -- Cumulative average from the first to the second row.
      | 10         | 2450       | 1683.3333333333333 |   -- Cumulative average from the first to the third row.
      | 10         | 2450       | 1875.0     |           -- Cumulative average from the first to the fourth row.
      | 10         | 5000       | 2500.0     |           -- Cumulative average from the first to the fifth row.
      | 10         | 5000       | 2916.6666666666665 |   -- Cumulative average from the first to the sixth row.
      | 20         | 800        | 800.0      |
      | 20         | 1100       | 950.0      |
      | 20         | 2975       | 1625.0     |
      | 20         | 3000       | 1968.75    |
      | 20         | 3000       | 2175.0     |
      | 30         | 950        | 950.0      |
      | 30         | 1250       | 1100.0     |
      | 30         | 1250       | 1150.0     |
      | 30         | 1500       | 1237.5     |
      | 30         | 1600       | 1310.0     |
      | 30         | 2850       | 1566.6666666666667 |
      +------------+------------+------------+
    • Example 3: In Hive compatible mode, partition by department (deptno), sort by salary (sal), and calculate the average salary. The function calculates a running average from the first row of the partition to the last peer of the current row (rows with the same sal have the same average). The commands are as follows:

      -- Enable Hive compatible mode.
      set odps.sql.hive.compatible=true;
      -- Execute the following SQL command.
      select deptno, sal, avg(sal) over (partition by deptno order by sal) from emp;

      The following result is returned:

      +------------+------------+------------+
      | deptno     | sal        | _c2        |
      +------------+------------+------------+
      | 10         | 1300       | 1300.0     |          -- First row of the window. Since the sal of the first and second rows are the same, the average for the first row is the cumulative average of the first two rows.
      | 10         | 1300       | 1300.0     |          -- Cumulative average from the first to the second row.
      | 10         | 2450       | 1875.0     |          -- Since the sal of the third and fourth rows are the same, the average for the third row is the cumulative average of the first four rows.
      | 10         | 2450       | 1875.0     |          -- Cumulative average from the first to the fourth row.
      | 10         | 5000       | 2916.6666666666665 |
      | 10         | 5000       | 2916.6666666666665 |
      | 20         | 800        | 800.0      |
      | 20         | 1100       | 950.0      |
      | 20         | 2975       | 1625.0     |
      | 20         | 3000       | 2175.0     |
      | 20         | 3000       | 2175.0     |
      | 30         | 950        | 950.0      |
      | 30         | 1250       | 1150.0     |
      | 30         | 1250       | 1150.0     |
      | 30         | 1500       | 1237.5     |
      | 30         | 1600       | 1310.0     |
      | 30         | 2850       | 1566.6666666666667 |
      +------------+------------+------------+

CLUSTER_SAMPLE

  • Syntax

    boolean cluster_sample(bigint <N>) OVER ([partition_clause])
    boolean cluster_sample(bigint <N>, bigint <M>) OVER ([partition_clause])
  • Description

    • cluster_sample(bigint <N>): Randomly samples N rows from the partition.

    • cluster_sample(bigint <N>, bigint <M>): Randomly samples a fraction of rows (M/N) from the partition. The number of rows sampled is approximately partition_row_count × M / N, where partition_row_count is the number of rows in the partition.

  • Parameters

    • N: Required. A BIGINT constant. If N is NULL, the return value is NULL.

    • M: Required. A BIGINT constant. If M is NULL, the return value is NULL.

    • partition_clause: Optional. For more information, see windowing_definition.

  • Return value

    Returns a BOOLEAN value.

  • Example

    To sample approximately 20% of the rows from each group, use the following command:

    select deptno, sal
        from (
            select deptno, sal, cluster_sample(5, 1) over (partition by deptno) as flag
            from emp
            ) sub
        where flag = true;

    The following result is returned:

    +------------+------------+
    | deptno     | sal        |
    +------------+------------+
    | 10         | 1300       |
    | 20         | 3000       |
    | 30         | 950        |
    +------------+------------+

COUNT

Syntax

-- Calculate the number of records.
BIGINT COUNT([DISTINCT|ALL] <colname>)

-- Calculate the number of records in a window.
BIGINT COUNT(*) OVER ([partition_clause] [orderby_clause] [frame_clause])
BIGINT COUNT([DISTINCT] <expr>[,...]) OVER ([partition_clause] [orderby_clause] [frame_clause])

Parameters

  • DISTINCT|ALL: Optional. Specifies whether to remove duplicate records before counting. The default value is ALL, which counts all records. If you specify DISTINCT, the function counts only unique records.

  • colname: Required. The column whose values you want to count. This parameter can be of any data type. You can specify * for colname, as in COUNT(*). In this case, the function returns the total number of rows. Rows where the colname value is NULL are not included in the calculation.

  • expr: Required. An expression. This parameter can be of any data type. Rows where the expression evaluates to NULL are not included in the calculation. If you specify the DISTINCT keyword, the function returns the count of unique values.

    COUNT([DISTINCT] <expr>[,...]): Counts the number of rows in the specified window where the values of all specified expressions are not NULL. If you specify the DISTINCT keyword, the function counts the rows after removing duplicate rows.

  • partition_clause, orderby_clause, and frame_clause: For more information, see windowing_definition.

Return value

The function returns a BIGINT value. Rows where colname is NULL are not included in the calculation.

Examples

Prepare test data

If you already have data, you can skip this step.

  1. Download the test data test_data.txt.

  2. Create a test table.

    CREATE TABLE IF NOT EXISTS emp(
      empno BIGINT,
      ename STRING,
      job STRING,
      mgr BIGINT,
      hiredate DATETIME,
      sal BIGINT,
      comm BIGINT,
      deptno BIGINT
    );
  3. Load the data.

    Replace FILE_PATH with the actual path and name of your data file.

    TUNNEL UPLOAD FILE_PATH emp;   

Example 1: Specify a column for the window and return the cumulative count without sorting

This example partitions the window by the salary (sal) column. The data is not sorted. The function returns the cumulative count of all rows in the current window. The window contains all rows that have the same sal value.

  • Sample command

    SELECT sal, COUNT(sal) OVER (PARTITION BY sal) AS count FROM emp;
  • Result

    +------------+------------+
    | sal        | count      | 
    +------------+------------+
    | 800        | 1          | 
    | 950        | 1          | 
    | 1100       | 1          | 
    | 1250       | 2          |  -- The partition contains two rows where sal is 1250. The function returns 2 for both rows.
    | 1250       | 2          |  -- The count is also 2 for the second row in the partition.
    | 1300       | 2          | 
    | 1300       | 2          | 
    | 1500       | 1          | 
    | 1600       | 1          | 
    | 2450       | 2          | 
    | 2450       | 2          | 
    | 2850       | 1          | 
    | 2975       | 1          | 
    | 3000       | 2          | 
    | 3000       | 2          | 
    | 5000       | 2          | 
    | 5000       | 2          | 
    +------------+------------+

Example 2: In non-Hive compatible mode, specify a column for the window and return the cumulative count after sorting

In non-Hive compatible mode, this example partitions the window by the salary (sal) column and sorts the data. The function returns the cumulative count from the first row to the current row in the current window. The window contains all rows that have the same sal value.

  • Sample command

    -- Disable Hive compatible mode.
    SET odps.sql.hive.compatible=false;
    
    SELECT sal, COUNT(sal) OVER (PARTITION BY sal ORDER BY sal) AS count FROM emp;
  • Return value

    +------------+------------+
    | sal        | count      |
    +------------+------------+
    | 800        | 1          |
    | 950        | 1          |
    | 1100       | 1          |
    | 1250       | 1          |   -- The window partition starts. The running count for the first row is 1.
    | 1250       | 2          |   -- The running count for the second row is 2.
    | 1300       | 1          |
    | 1300       | 2          |
    | 1500       | 1          |
    | 1600       | 1          |
    | 2450       | 1          |
    | 2450       | 2          |
    | 2850       | 1          |
    | 2975       | 1          |
    | 3000       | 1          |
    | 3000       | 2          |
    | 5000       | 1          |
    | 5000       | 2          |
    +------------+------------+

Example 3: In Hive compatible mode, specify a column for the window and return the cumulative count after sorting

In Hive compatible mode, this example partitions the window by the salary (sal) column and sorts the data. The function returns the cumulative count from the first row to the last row in the current window. The window contains all rows that have the same sal value.

  • Sample command

    -- Enable Hive compatible mode.
    SET odps.sql.hive.compatible=true;
    
    SELECT sal, COUNT(sal) OVER (PARTITION BY sal ORDER BY sal) AS count FROM emp; 
  • Result

    +------------+------------+
    | sal        | count      |
    +------------+------------+
    | 800        | 1          |
    | 950        | 1          |
    | 1100       | 1          |
    | 1250       | 2          |   -- The partition contains two rows where sal is 1250. The function returns 2 for both rows.
    | 1250       | 2          |   -- The count is also 2 for the second row in the partition.
    | 1300       | 2          |
    | 1300       | 2          |
    | 1500       | 1          |
    | 1600       | 1          |
    | 2450       | 2          |
    | 2450       | 2          |
    | 2850       | 1          |
    | 2975       | 1          |
    | 3000       | 2          |
    | 3000       | 2          |
    | 5000       | 2          |
    | 5000       | 2          |
    +------------+------------+

Example 4: Return the total number of rows

Calculate the total number of employees in all departments.

  • Sample command

    SELECT COUNT(*) FROM emp;
  • Result

    +------------+
    | _c0        |
    +------------+
    | 17         |
    +------------+

Example 5: Group data and calculate the total for each group

This example uses the COUNT function with GROUP BY to group all employees by department (deptno) and calculate the number of employees in each department.

  • Sample command

    SELECT deptno, COUNT(*) FROM emp GROUP BY deptno;
  • Result

    +------------+------------+
    | deptno     | _c1        | 
    +------------+------------+
    | 20         | 5          | 
    | 30         | 6          | 
    | 10         | 6          | 
    +------------+------------+

Example 6: Count unique values

This example uses DISTINCT to remove duplicate values and count the number of departments.

  • Sample command

    SELECT COUNT(DISTINCT deptno) FROM emp;
  • Return value

    +------------+
    | _c0        |
    +------------+
    | 3          |
    +------------+

CUME_DIST

  • Syntax

    double cume_dist() over([partition_clause] [orderby_clause])
  • Description

    Calculates the cumulative distribution of a value within a group of values. The result is the number of rows with values less than or equal to the current row's value, divided by the total number of rows in the partition. The comparison is determined by the orderby_clause.

  • Parameters

    partition_clause and orderby_clause: For more information, see windowing_definition.

  • Return value

    Returns a DOUBLE value. The specific return value is equal to row_number_of_last_peer / partition_row_count, where row_number_of_last_peer is the ROW_NUMBER window function's return value for the last row of the current row's GROUP, and partition_row_count is the number of rows in the partition to which the row belongs.

  • Example

    Partition by department (deptno) and calculate the cumulative distribution of salary (sal) within each department. The command is as follows:

    select deptno, ename, sal, concat(round(cume_dist() over (partition by deptno order by sal desc)*100,2),'%') as cume_dist from emp;

    The following result is returned:

    +------------+------------+------------+------------+
    | deptno     | ename      | sal        | cume_dist  |
    +------------+------------+------------+------------+
    | 10         | JACCKA     | 5000       | 33.33%     |
    | 10         | KING       | 5000       | 33.33%     |
    | 10         | CLARK      | 2450       | 66.67%     |
    | 10         | WELAN      | 2450       | 66.67%     |
    | 10         | TEBAGE     | 1300       | 100.0%     |
    | 10         | MILLER     | 1300       | 100.0%     |
    | 20         | SCOTT      | 3000       | 40.0%      |
    | 20         | FORD       | 3000       | 40.0%      |
    | 20         | JONES      | 2975       | 60.0%      |
    | 20         | ADAMS      | 1100       | 80.0%      |
    | 20         | SMITH      | 800        | 100.0%     |
    | 30         | BLAKE      | 2850       | 16.67%     |
    | 30         | ALLEN      | 1600       | 33.33%     |
    | 30         | TURNER     | 1500       | 50.0%      |
    | 30         | MARTIN     | 1250       | 83.33%     |
    | 30         | WARD       | 1250       | 83.33%     |
    | 30         | JAMES      | 950        | 100.0%     |
    +------------+------------+------------+------------+

DENSE_RANK

  • Syntax

    bigint dense_rank() over ([partition_clause] [orderby_clause])
  • Description

    Calculates the rank of the current row within its partition based on the sort order specified in the orderby_clause. The ranking starts at 1. Rows with the same order by values in a partition are assigned the same rank. The rank increments by 1 each time the order by value changes.

  • Parameters

    partition_clause and orderby_clause: For more information, see windowing_definition.

  • Return value

    Returns a BIGINT value. If an orderby_clause is not specified, all rows receive a rank of 1.

  • Example

    Partition by department (deptno) and rank employees within each department based on their salary (sal) in descending order. The command is as follows:

    select deptno, ename, sal, dense_rank() over (partition by deptno order by sal desc) as nums from emp;

    The following result is returned:

    +------------+------------+------------+------------+
    | deptno     | ename      | sal        | nums       |
    +------------+------------+------------+------------+
    | 10         | JACCKA     | 5000       | 1          |
    | 10         | KING       | 5000       | 1          |
    | 10         | CLARK      | 2450       | 2          |
    | 10         | WELAN      | 2450       | 2          |
    | 10         | TEBAGE     | 1300       | 3          |
    | 10         | MILLER     | 1300       | 3          |
    | 20         | SCOTT      | 3000       | 1          |
    | 20         | FORD       | 3000       | 1          |
    | 20         | JONES      | 2975       | 2          |
    | 20         | ADAMS      | 1100       | 3          |
    | 20         | SMITH      | 800        | 4          |
    | 30         | BLAKE      | 2850       | 1          |
    | 30         | ALLEN      | 1600       | 2          |
    | 30         | TURNER     | 1500       | 3          |
    | 30         | MARTIN     | 1250       | 4          |
    | 30         | WARD       | 1250       | 4          |
    | 30         | JAMES      | 950        | 5          |
    +------------+------------+------------+------------+

FIRST_VALUE

  • Syntax

    first_value(<expr>[, <ignore_nulls>]) over ([partition_clause] [orderby_clause] [frame_clause])
  • Description

    Returns the value of the expression expr from the first row of the window frame.

  • Parameters

    • expr: Required. The expression for which to calculate the result.

    • ignore_nulls: Optional. A BOOLEAN value that specifies whether to ignore NULL values. The default value is False. If this parameter is set to True, the function returns the first non-NULL value of expr in the window frame.

    • partition_clause, orderby_clause, and frame_clause: For more information, see windowing_definition.

  • Return value

    The return value has the same data type as expr.

  • Example

    The following command groups all employees by department and returns the first row of data from each group:

    • Without specifying order by:

      select deptno, ename, sal, first_value(sal) over (partition by deptno) as first_value from emp;

      The following result is returned:

      +------------+------------+------------+-------------+
      | deptno     | ename      | sal        | first_value |
      +------------+------------+------------+-------------+
      | 10         | TEBAGE     | 1300       | 1300        |   -- First row of the current window.
      | 10         | CLARK      | 2450       | 1300        |
      | 10         | KING       | 5000       | 1300        |
      | 10         | MILLER     | 1300       | 1300        |
      | 10         | JACCKA     | 5000       | 1300        |
      | 10         | WELAN      | 2450       | 1300        |
      | 20         | FORD       | 3000       | 3000        |   -- First row of the current window.
      | 20         | SCOTT      | 3000       | 3000        |
      | 20         | SMITH      | 800        | 3000        |
      | 20         | ADAMS      | 1100       | 3000        |
      | 20         | JONES      | 2975       | 3000        |
      | 30         | TURNER     | 1500       | 1500        |   -- First row of the current window.
      | 30         | JAMES      | 950        | 1500        |
      | 30         | ALLEN      | 1600       | 1500        |
      | 30         | WARD       | 1250       | 1500        |
      | 30         | MARTIN     | 1250       | 1500        |
      | 30         | BLAKE      | 2850       | 1500        |
      +------------+------------+------------+-------------+
    • Specifying order by:

      select deptno, ename, sal, first_value(sal) over (partition by deptno order by sal desc) as first_value from emp;

      The following result is returned:

      +------------+------------+------------+-------------+
      | deptno     | ename      | sal        | first_value |
      +------------+------------+------------+-------------+
      | 10         | JACCKA     | 5000       | 5000        |   -- First row of the current window.
      | 10         | KING       | 5000       | 5000        |
      | 10         | CLARK      | 2450       | 5000        |
      | 10         | WELAN      | 2450       | 5000        |
      | 10         | TEBAGE     | 1300       | 5000        |
      | 10         | MILLER     | 1300       | 5000        |
      | 20         | SCOTT      | 3000       | 3000        |   -- First row of the current window.
      | 20         | FORD       | 3000       | 3000        |
      | 20         | JONES      | 2975       | 3000        |
      | 20         | ADAMS      | 1100       | 3000        |
      | 20         | SMITH      | 800        | 3000        |
      | 30         | BLAKE      | 2850       | 2850        |   -- First row of the current window.
      | 30         | ALLEN      | 1600       | 2850        |
      | 30         | TURNER     | 1500       | 2850        |
      | 30         | MARTIN     | 1250       | 2850        |
      | 30         | WARD       | 1250       | 2850        |
      | 30         | JAMES      | 950        | 2850        |
      +------------+------------+------------+-------------+

LAG

  • Syntax

    lag(<expr>[,bigint <offset>[, <default>]]) over([partition_clause] orderby_clause)
  • Description

    Returns the value of the expression expr from the row that is offset rows before the current row (towards the head of the partition). The expression expr can be a column, a column operation, or a function operation.

  • Parameters

    • expr: Required. The expression for which to calculate the result.

    • offset: Optional. The offset, which is a BIGINT constant greater than or equal to 1. A value of 1 indicates the preceding row. The default value is 1. If the input value is of the STRING or DOUBLE type, it is implicitly converted to the BIGINT type for calculation.

    • default: Optional. Specifies a default value to return when the offset is out of bounds. If this parameter is not specified, the default is NULL. The value must be a constant with the same data type as expr. If expr is not a constant, this value is evaluated based on the current row.

    • partition_clause and orderby_clause: For more information, see windowing_definition.

  • Return value

    The return value has the same data type as expr.

  • Example

    Partition by department (deptno) and retrieve the salary (sal) from the preceding row for each employee. The command is as follows:

    select deptno, ename, sal, lag(sal, 1) over (partition by deptno order by sal) as sal_new from emp;

    The following result is returned:

    +------------+------------+------------+------------+
    | deptno     | ename      | sal        | sal_new    |
    +------------+------------+------------+------------+
    | 10         | TEBAGE     | 1300       | NULL       |
    | 10         | MILLER     | 1300       | 1300       |
    | 10         | CLARK      | 2450       | 1300       |
    | 10         | WELAN      | 2450       | 2450       |
    | 10         | KING       | 5000       | 2450       |
    | 10         | JACCKA     | 5000       | 5000       |
    | 20         | SMITH      | 800        | NULL       |
    | 20         | ADAMS      | 1100       | 800        |
    | 20         | JONES      | 2975       | 1100       |
    | 20         | SCOTT      | 3000       | 2975       |
    | 20         | FORD       | 3000       | 3000       |
    | 30         | JAMES      | 950        | NULL       |
    | 30         | MARTIN     | 1250       | 950        |
    | 30         | WARD       | 1250       | 1250       |
    | 30         | TURNER     | 1500       | 1250       |
    | 30         | ALLEN      | 1600       | 1500       |
    | 30         | BLAKE      | 2850       | 1600       |
    +------------+------------+------------+------------+

LAST_VALUE

  • Syntax

    last_value(<expr>[, <ignore_nulls>]) over([partition_clause] [orderby_clause] [frame_clause])
  • Description

    Returns the value of the expression expr from the last row of the window frame.

  • Parameters

    • expr: Required. The expression for which to calculate the result.

    • ignore_nulls: Optional. A BOOLEAN value that specifies whether to ignore NULL values. The default value is False. If this parameter is set to True, the function returns the last non-NULL value of expr in the window frame.

    • partition_clause, orderby_clause, and frame_clause: For more information, see windowing_definition.

  • Return value

    The return value has the same data type as expr.

  • Example

    The following command groups all employees by department and returns the last row of data from each group:

    • Without an order by clause, the window frame includes all rows in the partition. The function returns the value from the last row in the partition.

      select deptno, ename, sal, last_value(sal) over (partition by deptno) as last_value from emp;

      The following result is returned:

      +------------+------------+------------+-------------+
      | deptno     | ename      | sal        | last_value |
      +------------+------------+------------+-------------+
      | 10         | TEBAGE     | 1300       | 2450        |
      | 10         | CLARK      | 2450       | 2450        |
      | 10         | KING       | 5000       | 2450        |
      | 10         | MILLER     | 1300       | 2450        |
      | 10         | JACCKA     | 5000       | 2450        |   
      | 10         | WELAN      | 2450       | 2450        |   -- Last row of the current window.
      | 20         | FORD       | 3000       | 2975        |
      | 20         | SCOTT      | 3000       | 2975        |
      | 20         | SMITH      | 800        | 2975        |
      | 20         | ADAMS      | 1100       | 2975        |
      | 20         | JONES      | 2975       | 2975        |   -- Last row of the current window.
      | 30         | TURNER     | 1500       | 2850        |
      | 30         | JAMES      | 950        | 2850        |
      | 30         | ALLEN      | 1600       | 2850        |
      | 30         | WARD       | 1250       | 2850        |
      | 30         | MARTIN     | 1250       | 2850        |
      | 30         | BLAKE      | 2850       | 2850        |   -- Last row of the current window.
      +------------+------------+------------+-------------+
    • With an order by clause, the default window frame is `RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`. The function returns the value from the current row.

      select deptno, ename, sal, last_value(sal) over (partition by deptno order by sal desc) as last_value from emp;

      The following result is returned:

      +------------+------------+------------+-------------+
      | deptno     | ename      | sal        | last_value |
      +------------+------------+------------+-------------+
      | 10         | JACCKA     | 5000       | 5000        |   -- Current row of the current window.
      | 10         | KING       | 5000       | 5000        |   -- Current row of the current window.
      | 10         | CLARK      | 2450       | 2450        |   -- Current row of the current window.
      | 10         | WELAN      | 2450       | 2450        |   -- Current row of the current window.
      | 10         | TEBAGE     | 1300       | 1300        |   -- Current row of the current window.
      | 10         | MILLER     | 1300       | 1300        |   -- Current row of the current window.
      | 20         | SCOTT      | 3000       | 3000        |   -- Current row of the current window.
      | 20         | FORD       | 3000       | 3000        |   -- Current row of the current window.
      | 20         | JONES      | 2975       | 2975        |   -- Current row of the current window.
      | 20         | ADAMS      | 1100       | 1100        |   -- Current row of the current window.
      | 20         | SMITH      | 800        | 800         |   -- Current row of the current window.
      | 30         | BLAKE      | 2850       | 2850        |   -- Current row of the current window.
      | 30         | ALLEN      | 1600       | 1600        |   -- Current row of the current window.
      | 30         | TURNER     | 1500       | 1500        |   -- Current row of the current window.
      | 30         | MARTIN     | 1250       | 1250        |   -- Current row of the current window.
      | 30         | WARD       | 1250       | 1250        |   -- Current row of the current window.
      | 30         | JAMES      | 950        | 950         |   -- Current row of the current window.
      +------------+------------+------------+-------------+

LEAD

  • Syntax

    lead(<expr>[, bigint <offset>[, <default>]]) over([partition_clause] orderby_clause)
  • Description

    Returns the value of the expression expr from the row that is offset rows after the current row (towards the tail of the partition). The expression expr can be a column, a column operation, or a function operation.

  • Parameters

    • expr: Required. The expression for which to calculate the result.

    • offset: Optional. The offset, which is a BIGINT constant greater than or equal to 0. A value of 0 indicates the current row, and a value of 1 indicates the next row. The default value is 1. If the input value is of the STRING or DOUBLE type, it is implicitly converted to the BIGINT type for calculation.

    • default: Optional. The value to return if the offset is out of bounds. This value must be a constant with the same data type as expr. The default is NULL. If expr is not a constant, the value is evaluated based on the current row.

    • partition_clause and orderby_clause: For more information, see windowing_definition.

  • Return value

    The return value has the same data type as expr.

  • Example

    Partition by department (deptno) and retrieve the salary (sal) from the next row for each employee. The command is as follows:

    select deptno, ename, sal, lead(sal, 1) over (partition by deptno order by sal) as sal_new from emp;

    The following result is returned:

    +------------+------------+------------+------------+
    | deptno     | ename      | sal        | sal_new    |
    +------------+------------+------------+------------+
    | 10         | TEBAGE     | 1300       | 1300       |
    | 10         | MILLER     | 1300       | 2450       |
    | 10         | CLARK      | 2450       | 2450       |
    | 10         | WELAN      | 2450       | 5000       |
    | 10         | KING       | 5000       | 5000       |
    | 10         | JACCKA     | 5000       | NULL       |
    | 20         | SMITH      | 800        | 1100       |
    | 20         | ADAMS      | 1100       | 2975       |
    | 20         | JONES      | 2975       | 3000       |
    | 20         | SCOTT      | 3000       | 3000       |
    | 20         | FORD       | 3000       | NULL       |
    | 30         | JAMES      | 950        | 1250       |
    | 30         | MARTIN     | 1250       | 1250       |
    | 30         | WARD       | 1250       | 1500       |
    | 30         | TURNER     | 1500       | 1600       |
    | 30         | ALLEN      | 1600       | 2850       |
    | 30         | BLAKE      | 2850       | NULL       |
    +------------+------------+------------+------------+

MAX

  • Syntax

    max(<expr>) over([partition_clause] [orderby_clause] [frame_clause])
  • Description

    Returns the maximum value of expr in a window.

  • Parameters

    • expr: Required. The expression used to calculate the maximum value. It can be of any data type except BOOLEAN. If the value is NULL, the row is not included in the calculation.

    • partition_clause, orderby_clause, and frame_clause: For more information, see windowing_definition.

  • Return value

    The return value has the same data type as expr.

  • Examples

    • Example 1: Partition by department (deptno), calculate the maximum salary (sal), and do not sort. The function returns the maximum value of the current partition (rows with the same deptno). The command is as follows:

      select deptno, sal, max(sal) over (partition by deptno) from emp;

      The following result is returned:

      +------------+------------+------------+
      | deptno     | sal        | _c2        |
      +------------+------------+------------+
      | 10         | 1300       | 5000       |   -- First row of the window. The value is the maximum from the first to the sixth row.
      | 10         | 2450       | 5000       |   -- The value is the maximum from the first to the sixth row.
      | 10         | 5000       | 5000       |   -- The value is the maximum from the first to the sixth row.
      | 10         | 1300       | 5000       |
      | 10         | 5000       | 5000       |
      | 10         | 2450       | 5000       |
      | 20         | 3000       | 3000       |
      | 20         | 3000       | 3000       |
      | 20         | 800        | 3000       |
      | 20         | 1100       | 3000       |
      | 20         | 2975       | 3000       |
      | 30         | 1500       | 2850       |
      | 30         | 950        | 2850       |
      | 30         | 1600       | 2850       |
      | 30         | 1250       | 2850       |
      | 30         | 1250       | 2850       |
      | 30         | 2850       | 2850       |
      +------------+------------+------------+
    • Example 2: Partition by department (deptno), calculate the maximum salary (sal), and sort the results. The function returns the maximum value from the first row to the current row of the current partition (rows with the same deptno). The command is as follows:

      select deptno, sal, max(sal) over (partition by deptno order by sal) from emp;

      The following result is returned:

      +------------+------------+------------+
      | deptno     | sal        | _c2        |
      +------------+------------+------------+
      | 10         | 1300       | 1300       |   -- First row of the window.
      | 10         | 1300       | 1300       |   -- Maximum value from the first to the second row.
      | 10         | 2450       | 2450       |   -- Maximum value from the first to the third row.
      | 10         | 2450       | 2450       |   -- Maximum value from the first to the fourth row.
      | 10         | 5000       | 5000       |
      | 10         | 5000       | 5000       |
      | 20         | 800        | 800        |
      | 20         | 1100       | 1100       |
      | 20         | 2975       | 2975       |
      | 20         | 3000       | 3000       |
      | 20         | 3000       | 3000       |
      | 30         | 950        | 950        |
      | 30         | 1250       | 1250       |
      | 30         | 1250       | 1250       |
      | 30         | 1500       | 1500       |
      | 30         | 1600       | 1600       |
      | 30         | 2850       | 2850       |
      +------------+------------+------------+

MEDIAN

  • Syntax

    median(<expr>) over ([partition_clause] [orderby_clause] [frame_clause])
  • Description

    Calculates the median of expr in a window.

  • Parameters

    • expr: Required. The expression for which to calculate the median. It must be of the DOUBLE or DECIMAL type.

      • If the input value is of the STRING or BIGINT type, it is implicitly converted to the DOUBLE type for calculation. An error is returned for other data types.

      • If the input is NULL, the return value is NULL.

    • partition_clause, orderby_clause, and frame_clause: For more information, see windowing_definition.

  • Return value

    Returns a DOUBLE or DECIMAL value. If all expr values are NULL, NULL is returned.

  • Example

    Partition by department (deptno) and calculate the median salary (sal). The function returns the median for the entire partition (all rows with the same deptno). The command is as follows:

    select deptno, sal, median(sal) over (partition by deptno) from emp;

    The following result is returned:

    +------------+------------+------------+
    | deptno     | sal        | _c2        |
    +------------+------------+------------+
    | 10         | 1300       | 2450.0     |   -- First row of the window. The value is the median from the first to the sixth row.
    | 10         | 2450       | 2450.0     |
    | 10         | 5000       | 2450.0     |
    | 10         | 1300       | 2450.0     |
    | 10         | 5000       | 2450.0     |
    | 10         | 2450       | 2450.0     |
    | 20         | 3000       | 2975.0     |
    | 20         | 3000       | 2975.0     |
    | 20         | 800        | 2975.0     |
    | 20         | 1100       | 2975.0     |
    | 20         | 2975       | 2975.0     |
    | 30         | 1500       | 1375.0     |
    | 30         | 950        | 1375.0     |
    | 30         | 1600       | 1375.0     |
    | 30         | 1250       | 1375.0     |
    | 30         | 1250       | 1375.0     |
    | 30         | 2850       | 1375.0     |
    +------------+------------+------------+

MIN

  • Syntax

    min(<expr>) over([partition_clause] [orderby_clause] [frame_clause])
  • Description

    Returns the minimum value of expr in a window.

  • Parameters

    • expr: Required. The expression used to calculate the minimum value. It can be of any data type except BOOLEAN. If the value is NULL, the row is not included in the calculation.

    • partition_clause, orderby_clause, and frame_clause: For more information, see windowing_definition.

  • Return value

    The return value has the same data type as expr.

  • Examples

    • Example 1: Partition by department (deptno), calculate the minimum salary (sal), and do not sort. The function returns the minimum value of the current partition (rows with the same deptno). The command is as follows:

      select deptno, sal, min(sal) over (partition by deptno) from emp;

      The following result is returned:

      +------------+------------+------------+
      | deptno     | sal        | _c2        |
      +------------+------------+------------+
      | 10         | 1300       | 1300       |   -- First row of the window. The value is the minimum from the first to the sixth row.
      | 10         | 2450       | 1300       |   -- The value is the minimum from the first to the sixth row.
      | 10         | 5000       | 1300       |   -- The value is the minimum from the first to the sixth row.
      | 10         | 1300       | 1300       |
      | 10         | 5000       | 1300       |
      | 10         | 2450       | 1300       |
      | 20         | 3000       | 800        |
      | 20         | 3000       | 800        |
      | 20         | 800        | 800        |
      | 20         | 1100       | 800        |
      | 20         | 2975       | 800        |
      | 30         | 1500       | 950        |
      | 30         | 950        | 950        |
      | 30         | 1600       | 950        |
      | 30         | 1250       | 950        |
      | 30         | 1250       | 950        |
      | 30         | 2850       | 950        |
      +------------+------------+------------+
    • Example 2: Partition by department (deptno), calculate the minimum salary (sal), and sort the results. The function returns the minimum value from the first row to the current row of the current partition (rows with the same deptno). The command is as follows:

      select deptno, sal, min(sal) over (partition by deptno order by sal) from emp;

      The following result is returned:

      +------------+------------+------------+
      | deptno     | sal        | _c2        |
      +------------+------------+------------+
      | 10         | 1300       | 1300       |   -- First row of the window.
      | 10         | 1300       | 1300       |   -- Minimum value from the first to the second row.
      | 10         | 2450       | 1300       |   -- Minimum value from the first to the third row.
      | 10         | 2450       | 1300       |
      | 10         | 5000       | 1300       |
      | 10         | 5000       | 1300       |
      | 20         | 800        | 800        |
      | 20         | 1100       | 800        |
      | 20         | 2975       | 800        |
      | 20         | 3000       | 800        |
      | 20         | 3000       | 800        |
      | 30         | 950        | 950        |
      | 30         | 1250       | 950        |
      | 30         | 1250       | 950        |
      | 30         | 1500       | 950        |
      | 30         | 1600       | 950        |
      | 30         | 2850       | 950        |
      +------------+------------+------------+

NTILE

  • Syntax

    bigint ntile(bigint <N>) over ([partition_clause] [orderby_clause])
  • Description

    Divides the ordered rows in a partition into N groups, as equal in size as possible, and returns the group number for each row. If the number of rows is not evenly divisible by N, the first groups (those with smaller group numbers) will have one extra row.

  • Parameters

    • N: Required. The number of groups. A BIGINT value.

    • partition_clause and orderby_clause: For more information, see windowing_definition.

  • Return value

    Returns a BIGINT value.

  • Example

    Divide all employees into 3 groups within each department based on salary (sal) in descending order, and return the group number for each employee. The command is as follows:

    select deptno, ename, sal, ntile(3) over (partition by deptno order by sal desc) as nt3 from emp;

    The following result is returned:

    +------------+------------+------------+------------+
    | deptno     | ename      | sal        | nt3        |
    +------------+------------+------------+------------+
    | 10         | JACCKA     | 5000       | 1          |
    | 10         | KING       | 5000       | 1          |
    | 10         | CLARK      | 2450       | 2          |
    | 10         | WELAN      | 2450       | 2          |
    | 10         | TEBAGE     | 1300       | 3          |
    | 10         | MILLER     | 1300       | 3          |
    | 20         | SCOTT      | 3000       | 1          |
    | 20         | FORD       | 3000       | 1          |
    | 20         | JONES      | 2975       | 2          |
    | 20         | ADAMS      | 1100       | 2          |
    | 20         | SMITH      | 800        | 3          |
    | 30         | BLAKE      | 2850       | 1          |
    | 30         | ALLEN      | 1600       | 1          |
    | 30         | TURNER     | 1500       | 2          |
    | 30         | MARTIN     | 1250       | 2          |
    | 30         | WARD       | 1250       | 3          |
    | 30         | JAMES      | 950        | 3          |
    +------------+------------+------------+------------+

NTH_VALUE

  • Syntax

    nth_value(<expr>, <number> [, <ignore_nulls>]) over ([partition_clause] [orderby_clause] [frame_clause])
  • Description

    Returns the value of the expression expr from the Nth row of the window frame.

  • Parameters

    • expr: Required. The expression for which to calculate the result.

    • number: Required. A BIGINT value. An integer greater than or equal to 1. If the value is 1, this function is equivalent to FIRST_VALUE.

    • ignore_nulls: Optional. A BOOLEAN value that specifies whether to ignore NULL values. The default value is False. If this parameter is set to True, the function returns the Nth non-NULL value of expr in the window frame.

    • partition_clause, orderby_clause, and frame_clause: For more information, see windowing_definition.

  • Return value

    The return value has the same data type as expr.

  • Example

    The following command groups all employees by department and returns the 6th row of data from each group:

    • Without an order by clause, the window frame includes all rows in the partition. The function returns the value of the 6th row in the partition.

      select deptno, ename, sal, nth_value(sal,6) over (partition by deptno) as nth_value from emp;

      The following result is returned:

      +------------+------------+------------+------------+
      | deptno     | ename      | sal        | nth_value  |
      +------------+------------+------------+------------+
      | 10         | TEBAGE     | 1300       | 2450       |
      | 10         | CLARK      | 2450       | 2450       |
      | 10         | KING       | 5000       | 2450       |
      | 10         | MILLER     | 1300       | 2450       |
      | 10         | JACCKA     | 5000       | 2450       |
      | 10         | WELAN      | 2450       | 2450       |   -- 6th row of the current window.
      | 20         | FORD       | 3000       | NULL       |
      | 20         | SCOTT      | 3000       | NULL       |
      | 20         | SMITH      | 800        | NULL       |
      | 20         | ADAMS      | 1100       | NULL       |
      | 20         | JONES      | 2975       | NULL       |   -- The current window does not have a 6th row, so NULL is returned.
      | 30         | TURNER     | 1500       | 2850       |
      | 30         | JAMES      | 950        | 2850       |
      | 30         | ALLEN      | 1600       | 2850       |
      | 30         | WARD       | 1250       | 2850       |
      | 30         | MARTIN     | 1250       | 2850       |
      | 30         | BLAKE      | 2850       | 2850       |   -- 6th row of the current window.
      +------------+------------+------------+------------+
    • With an order by clause, the default window frame is `RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`. The function returns the value of the 6th row in the window frame.

      select deptno, ename, sal, nth_value(sal,6) over (partition by deptno order by sal) as nth_value from emp;

      The following result is returned:

      +------------+------------+------------+------------+
      | deptno     | ename      | sal        | nth_value  |
      +------------+------------+------------+------------+
      | 10         | TEBAGE     | 1300       | NULL       |   
      | 10         | MILLER     | 1300       | NULL       |   -- The current window has only 2 rows, so the 6th row exceeds the window length.
      | 10         | CLARK      | 2450       | NULL       |
      | 10         | WELAN      | 2450       | NULL       |
      | 10         | KING       | 5000       | 5000       |  
      | 10         | JACCKA     | 5000       | 5000       |
      | 20         | SMITH      | 800        | NULL       |
      | 20         | ADAMS      | 1100       | NULL       |
      | 20         | JONES      | 2975       | NULL       |
      | 20         | SCOTT      | 3000       | NULL       |
      | 20         | FORD       | 3000       | NULL       |
      | 30         | JAMES      | 950        | NULL       |
      | 30         | MARTIN     | 1250       | NULL       |
      | 30         | WARD       | 1250       | NULL       |
      | 30         | TURNER     | 1500       | NULL       |
      | 30         | ALLEN      | 1600       | NULL       |
      | 30         | BLAKE      | 2850       | 2850       |
      +------------+------------+------------+------------+

PERCENT_RANK

  • Syntax

    double percent_rank() over([partition_clause] [orderby_clause])
  • Description

    Calculates the percentile rank of the current row within its partition, based on the sort order specified by orderby_clause.

  • Parameters

    partition_clause and orderby_clause: For more information, see windowing_definition.

  • Return value

    Returns a DOUBLE value in the range of [0.0, 1.0]. The specific return value is equal to “(rank - 1) / (partition_row_count - 1)”, where rank is the result of the RANK window function for that row, and partition_row_count is the number of rows in the partition to which the row belongs. If the partition contains only one row, the output is 0.0.

  • Example

    Calculate the percentile rank of each employee's salary within their department. The command is as follows:

    select deptno, ename, sal, percent_rank() over (partition by deptno order by sal desc) as sal_new from emp;

    The following result is returned:

    +------------+------------+------------+------------+
    | deptno     | ename      | sal        | sal_new    |
    +------------+------------+------------+------------+
    | 10         | JACCKA     | 5000       | 0.0        |
    | 10         | KING       | 5000       | 0.0        |
    | 10         | CLARK      | 2450       | 0.4        |
    | 10         | WELAN      | 2450       | 0.4        |
    | 10         | TEBAGE     | 1300       | 0.8        |
    | 10         | MILLER     | 1300       | 0.8        |
    | 20         | SCOTT      | 3000       | 0.0        |
    | 20         | FORD       | 3000       | 0.0        |
    | 20         | JONES      | 2975       | 0.5        |
    | 20         | ADAMS      | 1100       | 0.75       |
    | 20         | SMITH      | 800        | 1.0        |
    | 30         | BLAKE      | 2850       | 0.0        |
    | 30         | ALLEN      | 1600       | 0.2        |
    | 30         | TURNER     | 1500       | 0.4        |
    | 30         | MARTIN     | 1250       | 0.6        |
    | 30         | WARD       | 1250       | 0.6        |
    | 30         | JAMES      | 950        | 1.0        |
    +------------+------------+------------+------------+

PERCENTILE_CONT

  • Syntax

    -- Calculate the exact percentile
    PERCENTILE_CONT(<col_name>, DOUBLE <percentile>[, BOOLEAN <isIgnoreNull>])
    
    -- Calculate the exact percentile in a window
    PERCENTILE_CONT(<col_name>, DOUBLE <percentile>[, BOOLEAN <isIgnoreNull>]) OVER ([partition_clause] [orderby_clause])
  • Description

    Calculates the exact percentile. It uses a linear interpolation algorithm, sorts the specified column in ascending order, and returns the exact value at the specified percentile.

  • Parameters

    • col_name: Required. A column of the DOUBLE or DECIMAL type.

    • percentile: Required. The percentile to calculate. A DOUBLE constant in the range of [0, 1].

    • isIgnoreNull: Optional. Specifies whether to ignore NULL values. A BOOLEAN constant. The default value is TRUE. If set to FALSE, NULL values are treated as the minimum value during sorting.

    • partition_clause and orderby_clause: For more information, see Window functions.

  • Return value

    Returns the calculated percentile value as a DOUBLE.

  • Examples

    • Example 1: Ignore NULL values and calculate the exact percentile in a window.

      SELECT
        PERCENTILE_CONT(x, 0) OVER() AS min,
        PERCENTILE_CONT(x, 0.01) OVER() AS percentile1,
        PERCENTILE_CONT(x, 0.5) OVER() AS median,
        PERCENTILE_CONT(x, 0.9) OVER() AS percentile90,
        PERCENTILE_CONT(x, 1) OVER() AS max
      FROM VALUES(0D),(3D),(NULL),(1D),(2D) AS tbl(x) LIMIT 1;
      
      -- Return result
      +------------+-------------+------------+--------------+------------+
      | min        | percentile1 | median     | percentile90 | max        | 
      +------------+-------------+------------+--------------+------------+
      | 0.0        | 0.03        | 1.5        | 2.7          | 3.0        | 
      +------------+-------------+------------+--------------+------------+
    • Example 2: Do not ignore NULL values. NULL values are treated as the minimum value during sorting. Calculate the exact percentile in a window.

      SELECT
        PERCENTILE_CONT(x, 0, false) OVER() AS min,
        PERCENTILE_CONT(x, 0.01, false) OVER() AS percentile1,
        PERCENTILE_CONT(x, 0.5, false) OVER() AS median,
        PERCENTILE_CONT(x, 0.9, false) OVER() AS percentile90,
        PERCENTILE_CONT(x, 1, false) OVER() AS max
      FROM VALUES(0D),(3D),(NULL),(1D),(2D) AS tbl(x) LIMIT 1;
      
      -- Return result
      +------------+-------------+------------+--------------+------------+
      | min        | percentile1 | median     | percentile90 | max        | 
      +------------+-------------+------------+--------------+------------+
      | NULL       | 0.0         | 1.0        | 2.6          | 3.0        | 
      +------------+-------------+------------+--------------+------------+

PERCENTILE_DISC

  • Syntax

    -- Calculate a given percentile value
    PERCENTILE_DISC(<col_name>, DOUBLE <percentile>[, BOOLEAN <isIgnoreNull>])
    
    -- Calculate the percentile value in a window
    PERCENTILE_DISC(<col_name>, DOUBLE <percentile>[, BOOLEAN <isIgnoreNull>]) OVER ([partition_clause] [orderby_clause])
  • Description

    Calculates a given percentile value. It first sorts the specified column in ascending order and then returns the first value whose cumulative distribution is greater than or equal to the specified percentile.

  • Parameters

    • col_name: Required. A column with any sortable data type.

    • percentile: Required. The percentile to calculate. A DOUBLE constant in the range of [0, 1].

    • isIgnoreNull: Optional. Specifies whether to ignore NULL values. A BOOLEAN constant. The default value is TRUE. If set to FALSE, NULL values are treated as the minimum value during sorting.

    • partition_clause and orderby_clause: For more information, see Window functions.

  • Return value

    Returns the calculated percentile value. The data type is the same as the input col_name column.

  • Examples

    • Example 1: Ignore NULL values and calculate the percentile value in a window.

      SELECT
        x,
        PERCENTILE_DISC(x, 0) OVER() AS min,
        PERCENTILE_DISC(x, 0.5) OVER() AS median,
        PERCENTILE_DISC(x, 1) OVER() AS max
      FROM VALUES('c'),(NULL),('b'),('a') AS tbl(x);
      
      -- Return result
      +------------+------------+------------+------------+
      | x          | min        | median     | max        | 
      +------------+------------+------------+------------+
      | c          | a          | b          | c          | 
      | NULL       | a          | b          | c          | 
      | b          | a          | b          | c          | 
      | a          | a          | b          | c          | 
      +------------+------------+------------+------------+
    • Example 2: Do not ignore NULL values. NULL values are treated as the minimum value during sorting. Calculate the percentile value in a window.

      SELECT
        x,
        PERCENTILE_DISC(x, 0, false) OVER() AS min,
        PERCENTILE_DISC(x, 0.5, false) OVER() AS median,
        PERCENTILE_DISC(x, 1, false) OVER() AS max
      FROM VALUES('c'),(NULL),('b'),('a') AS tbl(x);
      
      -- Return result
      +------------+------------+------------+------------+
      | x          | min        | median     | max        | 
      +------------+------------+------------+------------+
      | c          | NULL       | a          | c          | 
      | NULL       | NULL       | a          | c          | 
      | b          | NULL       | a          | c          | 
      | a          | NULL       | a          | c          | 
      +------------+------------+------------+------------+

RANK

  • Syntax

    bigint rank() over ([partition_clause] [orderby_clause])
  • Description

    Calculates the rank of the current row within its partition, based on the sort order specified by orderby_clause. Counting starts from 1.

  • Parameters

    partition_clause and orderby_clause: For more information, see windowing_definition.

  • Return value

    Returns a BIGINT value. The return values may be duplicated and non-consecutive. The specific return value is the ROW_NUMBER() value of the first row in the GROUP to which the data row belongs. If an orderby_clause is not specified, all rows receive a rank of 1.

  • Example

    Partition by department (deptno) and rank employees within each department based on their salary (sal) in descending order. The command is as follows:

    select deptno, ename, sal, rank() over (partition by deptno order by sal desc) as nums from emp;

    The following result is returned:

    +------------+------------+------------+------------+
    | deptno     | ename      | sal        | nums       |
    +------------+------------+------------+------------+
    | 10         | JACCKA     | 5000       | 1          |
    | 10         | KING       | 5000       | 1          |
    | 10         | CLARK      | 2450       | 3          |
    | 10         | WELAN      | 2450       | 3          |
    | 10         | TEBAGE     | 1300       | 5          |
    | 10         | MILLER     | 1300       | 5          |
    | 20         | SCOTT      | 3000       | 1          |
    | 20         | FORD       | 3000       | 1          |
    | 20         | JONES      | 2975       | 3          |
    | 20         | ADAMS      | 1100       | 4          |
    | 20         | SMITH      | 800        | 5          |
    | 30         | BLAKE      | 2850       | 1          |
    | 30         | ALLEN      | 1600       | 2          |
    | 30         | TURNER     | 1500       | 3          |
    | 30         | MARTIN     | 1250       | 4          |
    | 30         | WARD       | 1250       | 4          |
    | 30         | JAMES      | 950        | 6          |
    +------------+------------+------------+------------+

ROW_NUMBER

  • Syntax

    row_number() over([partition_clause] [orderby_clause])
  • Description

    Calculates the row number of the current row within its partition, starting from 1.

  • Parameters

    For more information, see windowing_definition. The frame_clause is not allowed.

  • Return value

    Returns a BIGINT value.

  • Example

    Partition by department (deptno) and assign a unique, sequential number to each employee within their department, based on salary (sal) in descending order. The command is as follows:

    select deptno, ename, sal, row_number() over (partition by deptno order by sal desc) as nums from emp;

    The following result is returned:

    +------------+------------+------------+------------+
    | deptno     | ename      | sal        | nums       |
    +------------+------------+------------+------------+
    | 10         | JACCKA     | 5000       | 1          |
    | 10         | KING       | 5000       | 2          |
    | 10         | CLARK      | 2450       | 3          |
    | 10         | WELAN      | 2450       | 4          |
    | 10         | TEBAGE     | 1300       | 5          |
    | 10         | MILLER     | 1300       | 6          |
    | 20         | SCOTT      | 3000       | 1          |
    | 20         | FORD       | 3000       | 2          |
    | 20         | JONES      | 2975       | 3          |
    | 20         | ADAMS      | 1100       | 4          |
    | 20         | SMITH      | 800        | 5          |
    | 30         | BLAKE      | 2850       | 1          |
    | 30         | ALLEN      | 1600       | 2          |
    | 30         | TURNER     | 1500       | 3          |
    | 30         | MARTIN     | 1250       | 4          |
    | 30         | WARD       | 1250       | 5          |
    | 30         | JAMES      | 950        | 6          |
    +------------+------------+------------+------------+

STDDEV

  • Syntax

    double stddev|stddev_pop([distinct] <expr>) over ([partition_clause] [orderby_clause] [frame_clause])
    decimal stddev|stddev_pop([distinct] <expr>) over ([partition_clause] [orderby_clause] [frame_clause])
  • Description

    Calculates the population standard deviation. This is an alias for the STDDEV_POP function.

  • Parameters

    • expr: Required. The expression for which to calculate the population standard deviation. It must be of the DOUBLE or DECIMAL type.

      • If the input value is of the STRING or BIGINT type, it is implicitly converted to the DOUBLE type for calculation. An error is returned for other data types.

      • If the input value is NULL, the row is not included in the calculation.

      • If you specify the distinct keyword, the function calculates the population standard deviation of unique values.

    • partition_clause, orderby_clause, and frame_clause: For more information, see windowing_definition.

  • Return value

    The return value has the same data type as expr. If all values of expr are NULL, NULL is returned.

  • Examples

    • Example 1: Partition by department (deptno), calculate the population standard deviation of salary (sal), and do not sort. The function returns the cumulative population standard deviation of the current partition (rows with the same deptno). The command is as follows:

      select deptno, sal, stddev(sal) over (partition by deptno) from emp;

      The following result is returned:

      +------------+------------+------------+
      | deptno     | sal        | _c2        |
      +------------+------------+------------+
      | 10         | 1300       | 1546.1421524412158 |   -- First row of the window. The value is the cumulative population standard deviation from the first to the sixth row.
      | 10         | 2450       | 1546.1421524412158 |   -- The value is the cumulative population standard deviation from the first to the sixth row.
      | 10         | 5000       | 1546.1421524412158 |
      | 10         | 1300       | 1546.1421524412158 |
      | 10         | 5000       | 1546.1421524412158 |
      | 10         | 2450       | 1546.1421524412158 |
      | 20         | 3000       | 1004.7387720198718 |
      | 20         | 3000       | 1004.7387720198718 |
      | 20         | 800        | 1004.7387720198718 |
      | 20         | 1100       | 1004.7387720198718 |
      | 20         | 2975       | 1004.7387720198718 |
      | 30         | 1500       | 610.1001739241042 |
      | 30         | 950        | 610.1001739241042 |
      | 30         | 1600       | 610.1001739241042 |
      | 30         | 1250       | 610.1001739241042 |
      | 30         | 1250       | 610.1001739241042 |
      | 30         | 2850       | 610.1001739241042 |
      +------------+------------+------------+
    • Example 2: In non-Hive compatible mode, partition by department (deptno), calculate the population standard deviation of salary (sal), and sort the results. The function returns the cumulative population standard deviation from the first row to the current row of the current partition (rows with the same deptno). The commands are as follows:

      -- Disable Hive compatible mode.
      set odps.sql.hive.compatible=false;
      -- Execute the following SQL command.
      select deptno, sal, stddev(sal) over (partition by deptno order by sal) from emp;

      The following result is returned:

      +------------+------------+------------+
      | deptno     | sal        | _c2        |
      +------------+------------+------------+
      | 10         | 1300       | 0.0        |           -- First row of the window.
      | 10         | 1300       | 0.0        |           -- Cumulative population standard deviation from the first to the second row.
      | 10         | 2450       | 542.1151989096865 |    -- Cumulative population standard deviation from the first to the third row.
      | 10         | 2450       | 575.0      |           -- Cumulative population standard deviation from the first to the fourth row.
      | 10         | 5000       | 1351.6656391282572 |
      | 10         | 5000       | 1546.1421524412158 |
      | 20         | 800        | 0.0        |
      | 20         | 1100       | 150.0      |
      | 20         | 2975       | 962.4188277460079 |
      | 20         | 3000       | 1024.2947268730811 |
      | 20         | 3000       | 1004.7387720198718 |
      | 30         | 950        | 0.0        |
      | 30         | 1250       | 150.0      |
      | 30         | 1250       | 141.4213562373095 |
      | 30         | 1500       | 194.8557158514987 |
      | 30         | 1600       | 226.71568097509268 |
      | 30         | 2850       | 610.1001739241042 |
      +------------+------------+------------+
    • Example 3: In Hive compatible mode, partition by department (deptno), calculate the population standard deviation of salary (sal), and sort the results. The function returns the cumulative population standard deviation from the first row to the row with the same value as the current row (rows with the same sal have the same population standard deviation). The commands are as follows:

      -- Enable Hive compatible mode.
      set odps.sql.hive.compatible=true;
      -- Execute the following SQL command.
      select deptno, sal, stddev(sal) over (partition by deptno order by sal) from emp;

      The following result is returned:

      +------------+------------+------------+
      | deptno     | sal        | _c2        |
      +------------+------------+------------+
      | 10         | 1300       | 0.0        |           -- First row of the window. Since the sal of the first and second rows are the same, the population standard deviation for the first row is the cumulative population standard deviation of the first two rows.
      | 10         | 1300       | 0.0        |           -- Cumulative population standard deviation from the first to the second row.
      | 10         | 2450       | 575.0      |           -- Since the sal of the third and fourth rows are the same, the population standard deviation for the third row is the cumulative population standard deviation of the first four rows.
      | 10         | 2450       | 575.0      |           -- Cumulative population standard deviation from the first to the fourth row.
      | 10         | 5000       | 1546.1421524412158 |
      | 10         | 5000       | 1546.1421524412158 |
      | 20         | 800        | 0.0        |
      | 20         | 1100       | 150.0      |
      | 20         | 2975       | 962.4188277460079 |
      | 20         | 3000       | 1004.7387720198718 |
      | 20         | 3000       | 1004.7387720198718 |
      | 30         | 950        | 0.0        |
      | 30         | 1250       | 141.4213562373095 |
      | 30         | 1250       | 141.4213562373095 |
      | 30         | 1500       | 194.8557158514987 |
      | 30         | 1600       | 226.71568097509268 |
      | 30         | 2850       | 610.1001739241042 |
      +------------+------------+------------+

STDDEV_SAMP

  • Syntax

    double stddev_samp([distinct] <expr>) over([partition_clause] [orderby_clause] [frame_clause])
    decimal stddev_samp([distinct] <expr>) over([partition_clause] [orderby_clause] [frame_clause])
  • Description

    Calculates the sample standard deviation.

  • Parameters

    • expr: Required. The expression for which to calculate the sample standard deviation. It must be of the DOUBLE or DECIMAL type.

      • If the input value is of the STRING or BIGINT type, it is implicitly converted to the DOUBLE type for calculation. An error is returned for other data types.

      • If the input value is NULL, the row is not included in the calculation.

      • If you specify the distinct keyword, the function calculates the sample standard deviation of unique values.

    • partition_clause, orderby_clause, and frame_clause: For more information, see windowing_definition.

  • Return value

    The return value has the same data type as expr. If all values of expr are NULL, NULL is returned. If the window contains only one non-NULL value for expr, the result is 0.

  • Examples

    • Example 1: Partition by department (deptno), calculate the sample standard deviation of salary (sal), and do not sort. The function returns the cumulative sample standard deviation of the current partition (rows with the same deptno). The command is as follows:

      select deptno, sal, stddev_samp(sal) over (partition by deptno) from emp;

      The following result is returned:

      +------------+------------+------------+
      | deptno     | sal        | _c2        |
      +------------+------------+------------+
      | 10         | 1300       | 1693.7138680032904 |   -- First row of the window. The value is the cumulative sample standard deviation from the first to the sixth row.
      | 10         | 2450       | 1693.7138680032904 |   -- The value is the cumulative sample standard deviation from the first to the sixth row.
      | 10         | 5000       | 1693.7138680032904 |   -- The value is the cumulative sample standard deviation from the first to the sixth row.
      | 10         | 1300       | 1693.7138680032904 |     
      | 10         | 5000       | 1693.7138680032904 |
      | 10         | 2450       | 1693.7138680032904 |
      | 20         | 3000       | 1123.3320969330487 |
      | 20         | 3000       | 1123.3320969330487 |
      | 20         | 800        | 1123.3320969330487 |
      | 20         | 1100       | 1123.3320969330487 |
      | 20         | 2975       | 1123.3320969330487 |
      | 30         | 1500       | 668.331255192114 |
      | 30         | 950        | 668.331255192114 |
      | 30         | 1600       | 668.331255192114 |
      | 30         | 1250       | 668.331255192114 |
      | 30         | 1250       | 668.331255192114 |
      | 30         | 2850       | 668.331255192114 |
      +------------+------------+------------+
    • Example 2: Partition by department (deptno), calculate the sample standard deviation of salary (sal), and sort the results. The function returns the cumulative sample standard deviation from the first row to the current row of the current partition (rows with the same deptno). The command is as follows:

      select deptno, sal, stddev_samp(sal) over (partition by deptno order by sal) from emp;

      The following result is returned:

      +------------+------------+------------+
      | deptno     | sal        | _c2        |
      +------------+------------+------------+
      | 10         | 1300       | 0.0        |          -- First row of the window.
      | 10         | 1300       | 0.0        |          -- Cumulative sample standard deviation from the first to the second row.
      | 10         | 2450       | 663.9528095680697 |   -- Cumulative sample standard deviation from the first to the third row.
      | 10         | 2450       | 663.9528095680696 |
      | 10         | 5000       | 1511.2081259707413 |
      | 10         | 5000       | 1693.7138680032904 |
      | 20         | 800        | 0.0        |
      | 20         | 1100       | 212.13203435596427 |
      | 20         | 2975       | 1178.7175234126282 |
      | 20         | 3000       | 1182.7536725793752 |
      | 20         | 3000       | 1123.3320969330487 |
      | 30         | 950        | 0.0        |
      | 30         | 1250       | 212.13203435596427 |
      | 30         | 1250       | 173.20508075688772 |
      | 30         | 1500       | 225.0      |
      | 30         | 1600       | 253.4758371127315 |
      | 30         | 2850       | 668.331255192114 |
      +------------+------------+------------+

SUM

  • Syntax

    sum([distinct] <expr>) over ([partition_clause] [orderby_clause] [frame_clause])
  • Description

    Returns the sum of expr in a window.

  • Parameters

    • expr: Required. The column for which to calculate the sum. It must be of the DOUBLE, DECIMAL, or BIGINT type.

      • If the input value is of the STRING type, it is implicitly converted to the DOUBLE type for calculation. An error is returned for other data types.

      • If the input value is NULL, the row is not included in the calculation.

      • If you specify the distinct keyword, the function calculates the sum of unique values.

    • partition_clause, orderby_clause, and frame_clause: For more information, see windowing_definition.

  • Return value

    • If the input value is of the BIGINT type, a BIGINT value is returned.

    • If the input value is of the DECIMAL type, a DECIMAL value is returned.

    • If the input value is of the DOUBLE or STRING type, a DOUBLE value is returned.

    • If all input values are NULL, NULL is returned.

  • Examples

    • Example 1: Partition by department (deptno), calculate the sum of salary (sal), and do not sort. The function returns the cumulative sum of the current partition (rows with the same deptno). The command is as follows:

      select deptno, sal, sum(sal) over (partition by deptno) from emp;

      The following result is returned:

      +------------+------------+------------+
      | deptno     | sal        | _c2        |
      +------------+------------+------------+
      | 10         | 1300       | 17500      |   -- First row of the window. The value is the cumulative sum from the first to the sixth row.
      | 10         | 2450       | 17500      |   -- The value is the cumulative sum from the first to the sixth row.
      | 10         | 5000       | 17500      |   -- The value is the cumulative sum from the first to the sixth row.
      | 10         | 1300       | 17500      |
      | 10         | 5000       | 17500      |
      | 10         | 2450       | 17500      |
      | 20         | 3000       | 10875      |
      | 20         | 3000       | 10875      |
      | 20         | 800        | 10875      |
      | 20         | 1100       | 10875      |
      | 20         | 2975       | 10875      |
      | 30         | 1500       | 9400       |
      | 30         | 950        | 9400       |
      | 30         | 1600       | 9400       |
      | 30         | 1250       | 9400       |
      | 30         | 1250       | 9400       |
      | 30         | 2850       | 9400       |
      +------------+------------+------------+
    • Example 2: In non-Hive compatible mode, partition by department (deptno), calculate the sum of salary (sal), and sort the results. The function returns the cumulative sum from the first row to the current row of the current partition (rows with the same deptno). The commands are as follows:

      -- Disable Hive compatible mode.
      set odps.sql.hive.compatible=false;
      -- Execute the following SQL command.
      select deptno, sal, sum(sal) over (partition by deptno order by sal) from emp;

      The following result is returned:

      +------------+------------+------------+
      | deptno     | sal        | _c2        |
      +------------+------------+------------+
      | 10         | 1300       | 1300       |   -- First row of the window.
      | 10         | 1300       | 2600       |   -- Cumulative sum from the first to the second row.
      | 10         | 2450       | 5050       |   -- Cumulative sum from the first to the third row.
      | 10         | 2450       | 7500       |
      | 10         | 5000       | 12500      |
      | 10         | 5000       | 17500      |
      | 20         | 800        | 800        |
      | 20         | 1100       | 1900       |
      | 20         | 2975       | 4875       |
      | 20         | 3000       | 7875       |
      | 20         | 3000       | 10875      |
      | 30         | 950        | 950        |
      | 30         | 1250       | 2200       |
      | 30         | 1250       | 3450       |
      | 30         | 1500       | 4950       |
      | 30         | 1600       | 6550       |
      | 30         | 2850       | 9400       |
      +------------+------------+------------+
    • Example 3: In Hive compatible mode, partition by department (deptno), calculate the sum of salary (sal), and sort the results. The function returns the cumulative sum from the first row to the row with the same value as the current row (rows with the same sal have the same sum). The commands are as follows:

      -- Enable Hive compatible mode.
      set odps.sql.hive.compatible=true;
      -- Execute the following SQL command.
      select deptno, sal, sum(sal) over (partition by deptno order by sal) from emp;

      The following result is returned:

      +------------+------------+------------+
      | deptno     | sal        | _c2        |
      +------------+------------+------------+
      | 10         | 1300       | 2600       |   -- First row of the window. Since the sal of the first and second rows are the same, the sum for the first row is the cumulative sum of the first two rows.
      | 10         | 1300       | 2600       |   -- Cumulative sum from the first to the second row.
      | 10         | 2450       | 7500       |   -- Since the sal of the third and fourth rows are the same, the sum for the third row is the cumulative sum of the firstfour rows.
      | 10         | 5000       | 17500      |
      | 10         | 5000       | 17500      |
      | 20         | 800        | 800        |
      | 20         | 1100       | 1900       |
      | 20         | 2975       | 4875       |
      | 20         | 3000       | 10875      |
      | 20         | 3000       | 10875      |
      | 30         | 950        | 950        |
      | 30         | 1250       | 3450       |
      | 30         | 1250       | 3450       |
      | 30         | 1500       | 4950       |
      | 30         | 1600       | 6550       |
      | 30         | 2850       | 9400       |
      +------------+------------+------------+

References