All Products
Search
Document Center

MaxCompute:Window functions

Last Updated:Dec 22, 2023

You can use window functions in MaxCompute SQL to flexibly process and analyze data of columns in a window. This topic describes the syntax and parameters of window functions that are supported by MaxCompute SQL. This topic also provides examples on how to use window functions. It guides you through data development by using window functions.

The following table lists the window functions that are supported by MaxCompute SQL.

Function

Description

AVG

Calculates the average value of data in a window.

CLUSTER_SAMPLE

Samples random rows of data. If true is returned, the specified row of data is sampled.

COUNT

Calculates the number of rows in a window.

CUME_DIST

Calculates the cumulative distribution of data in a partition.

DENSE_RANK

Calculates the percentile rank of a row in a group of rows. The ranks are consecutive.

FIRST_VALUE

Obtains the calculated result of the first row of data in the window to which the current row belongs.

LAG

Obtains the calculated result of the Nth row of data that precedes the current row at a given offset in a window.

LAST_VALUE

Obtains the calculated result of the last row of data in the window to which the current row belongs.

LEAD

Obtains the calculated result of the Nth row of data that follows the current row at a given offset in a window.

MAX

Calculates the maximum value in a window.

MEDIAN

Calculates the median in a window.

MIN

Calculates the minimum value in a window.

NTILE

Splits rows of data in a partition into N groups of equal size and returns the number of the group to which the current row belongs. The group number ranges from 1 to N.

NTH_VALUE

Obtains the calculated result of the Nth row of data in a window to which the current row belongs.

PERCENT_RANK

Calculates the percentile rank of a row in a group of rows. The function returns a percentage value.

RANK

Calculates the percentile rank of a row in a group of rows. The ranks may not be consecutive integers.

ROW_NUMBER

Calculates the sequence number of a row. The row number starts from 1.

STDDEV

Returns the population standard deviation of all the input values. This function is also called STDDEV_POP.

STDDEV_SAMP

Returns the sample standard deviation of all the input values.

SUM

Calculates the sum of data in a window.

Limits

Before you use window functions, take note of the following limits:

  • Window functions are supported only in SELECT statements.

  • A window function cannot contain nested window functions or aggregate functions.

  • You cannot use window functions together with aggregate functions of the same level.

Syntax

Syntax of window functions:

<function_name>([distinct][<expression> [, ...]]) over (<window_definition>)
<function_name>([distinct][<expression> [, ...]]) over <window_name>
  • function_name: a built-in window function, built-in aggregate function, or user-defined aggregate function (UDAF). For more information about built-in aggregate functions, see Aggregate functions. For more information about UDAFs, see Overview.

  • expression: the format of a window function. The format is subject to the function syntax.

  • windowing_definition: the definition of a window. For information about the syntax of windowing_definition, see windowing_definition.

  • window_name: the name of a window. You can use the window keyword to configure a window and use windowing_definition to specify the name of the window. Syntax of named_window_def:

    window <window_name> as (<window_definition>)

    Position of named_window_def in an SQL statement:

    select ... from ... [where ...] [group by ...] [having ...] named_window_def [order by ...] [limit ...]

windowing_definition

Syntax

-- partition_clause:
[partition by <expression> [, ...]]
-- orderby_clause:
[order by <expression> [asc|desc][nulls {first|last}] [, ...]]
[<frame_clause>]

If you use a window function in a SELECT statement, data is partitioned and sorted based on partition by and order by in windowing_definition when the window function is executed. If the SELECT statement does not include partition by, only one partition exists and the partition contains all data. If the SELECT statement does not include order by, data in a partition is arranged in a random order, and a data stream is generated. After the data stream is generated, a group of rows is extracted from the data stream based on frame_clause in windowing_definition to create a window for the current row. The window function calculates the data included in the window to which the current row belongs.

  • partition by <expression> [, ...]: optional. This parameter specifies the partition information. If the values of partition key columns are the same for a group of rows, these rows are included in the same window. For more information about the format of partition by, see Table operations.

  • order by <expression> [asc|desc][nulls {first|last}] [, ...]: optional. This parameter specifies how to sort rows of data in a window.

    Note

    If the values of the column that is specified in order by are the same, the sorting result may not be accurate. To reduce the random ordering of data, make sure that the values of the column that is specified in order by are unique.

  • frame_clause: optional. This parameter is used to determine the data boundaries of a window. For more information about frame_clause, see frame_clause.

frame_clause

Syntax

-- Syntax 1 
{ROWS|RANGE|GROUPS} <frame_start> [<frame_exclusion>]
-- Syntax 2 
{ROWS|RANGE|GROUPS} between <frame_start> and <frame_end> [<frame_exclusion>]

frame_clause is a closed interval that is used to determine the data boundaries of a window. The data boundaries are determined based on the rows that are specified by frame_start and frame_end.

  • ROWS|RANGE|GROUPS: required. ROWS, RANGE, and GROUPS indicate the types of frame_clause. The implementation rules of frame_start and frame_end vary based on the type of frame_clause.

    • ROWS: The data boundaries of a window are determined based on the number of rows.

    • RANGE: The data boundaries of a window are determined based on the comparison results of the values of the column that is specified in order by. In most cases, order by is specified in windowing_definition. If order by is not specified in windowing_definition, the values of the column that is specified in order by are the same for all rows in a partition. null values are considered equivalent.

    • In a partition, rows that have the same value of the column specified in order by form a group. If order by is not specified, all rows in the partition form a group. null values are considered equivalent.

  • frame_start and frame_end: the start and end rows of a window. frame_start is required. frame_end is optional. If frame_end is not specified, the default value CURRENT ROW is used.

    The row specified by frame_start must precede or be the same as the row specified by frame_end. Compared with the row specified by frame_end, the row specified by frame_start is closer to the first row in a window after all data in the window is sorted based on the column that is specified in order by of windowing_definition. The following table describes the valid values and logic of frame_start and frame_end when the type of frame_clause is ROWS, RANGE, or GROUPS.

    frame_clause type

    frame_start or frame_end value

    Description

    ROWS, RANGE, and GROUPS

    UNBOUNDED PRECEDING

    Indicates the first row of a partition. Rows are counted from 1.

    UNBOUNDED FOLLOWING

    Indicates the last row of a partition.

    ROWS

    CURRENT ROW

    Indicates the current row. Each row of data corresponds to a result calculated by a window function. The current row indicates the row whose data is calculated by using a window function.

    offset PRECEDING

    Indicates the Nth row that precedes the current row at a given offset. For example, if 0 PRECEDING indicates the current row, 1 PRECEDING indicates the previous row. offset must be a non-negative integer.

    offset FOLLOWING

    Indicates the Nth row that follows the current row at a given offset. For example, if 0 FOLLOWING indicates the current row, 1 FOLLOWING indicates the next row. offset must be a non-negative integer.

    RANGE

    CURRENT ROW

    • If frame_start is set to CURRENT ROW, it indicates the first row that has the same value of the column specified in order by as the current row.

    • If frame_end is set to CURRENT ROW, it indicates the last row that has the same value of the column specified in order by as the current row.

    offset PRECEDING

    The rows that are specified by frame_start and frame_end are determined based on the sorting order that is specified by order by. For example, data in a window is sorted based on column X, Xi indicates the X value that corresponds to the ith row, and Xc indicates the X value that corresponds to the current row. Positions of rows specified by frame_start and frame_end:

    • order by is set to asc:

      • frame_start indicates the first row that meets the following requirement: Xc - Xi <= offset.

      • frame_end indicates the last row that meets the following requirement: Xc - Xi >= offset.

    • order by is set to desc:

      • frame_start indicates the first row that meets the following requirement: Xi - Xc <= offset.

      • frame_end indicates the last row that meets the following requirement: Xi - Xc >= offset.

    The column that is specified by order by can be of the following data types: TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, DECIMAL, DATETIME, DATE, and TIMESTAMP.

    Syntax for offset of a date type:

    • N: indicates N days or N seconds. It must be a non-negative integer. For an offset of the DATETIME or TIMESTAMP type, it indicates N seconds. For an offset of the DATE type, it indicates N days.

    • interval 'N' {YEAR\MONTH\DAY\HOUR\MINUTE\SECOND}: indicates N years, months, days, hours, minutes, or seconds. For example, INTERVAL '3' YEAR indicates 3 years.

    • INTERVAL 'N-M' YEAR TO MONTH: indicates N years and M months. For example, INTERVAL '1-3' YEAR TO MONTH indicates 1 year and 3 months.

    • INTERVAL 'D[ H[:M[:S[:N]]]]' DAY TO SECOND: indicates D days, H hours, M minutes, S seconds, and N nanoseconds. For example, INTERVAL '1 2:3:4:5' DAY TO SECOND indicates 1 day, 2 hours, 3 minutes, 4 seconds, and 5 nanoseconds.

    offset FOLLOWING

    The rows that are specified by frame_start and frame_end are determined based on the sorting order that is specified by order by. For example, data in a window is sorted based on column X, Xi indicates the X value that corresponds to the ith row, and Xc indicates the X value that corresponds to the current row. Positions of rows specified by frame_start and frame_end:

    • order by is set to asc:

      • frame_start: indicates the first row that meets the following requirement: Xi - Xc >= offset.

      • frame_end: indicates the last row that meets the following requirement: Xi - Xc <= offset.

    • order by is set to desc:

      • frame_start: indicates the first row that meets the following requirement: Xc - Xi >= offset.

      • frame_end: indicates the last row that meets the following requirement: Xc - Xi <= offset.

    GROUPS

    CURRENT ROW

    • If frame_start is set to CURRENT ROW, it indicates the first row of the group to which the current row belongs.

    • If frame_end is set to CURRENT ROW, it indicates the last row of the group to which the current row belongs.

    offset PRECEDING

    • If frame_start is set to offset PRECEDING, it indicates the first row of the Nth group that precedes the group of the current row at a given offset.

    • If frame_end is set to offset PRECEDING, it indicates the last row of the Nth group that precedes the group of the current row at a given offset.

    Note

    You cannot set frame_start to UNBOUNDED FOLLOWING, and you cannot set frame_end to UNBOUNED PRECEDING.

    offset FOLLOWING

    • If frame_start is set to offset FOLLOWING, it indicates the first row of the Nth group that follows the group of the current row at a given offset.

    • If frame_end is set to offset FOLLOWING, the last row of the Nth group that follows the group of the current row at a specified offset is used.

    Note

    You cannot set frame_start to UNBOUNDED FOLLOWING, and you cannot set frame_end to UNBOUNED PRECEDING.

  • frame_exclusion: optional. This parameter is used to remove specific rows from a window. Valid values:

    • EXCLUDE NO OTHERS: No rows are excluded from the window.

    • EXCLUDE CURRENT ROW: The current row is excluded from the window.

    • EXCLUDE GROUP: An entire group of rows in a partition is excluded from the window. In the group, all rows have the same value of the column that is specified in order by as the current row.

    • EXCLUDE TIES: An entire group of rows, except for the current row, are excluded from the window.

Default frame_clause

If you do not specify frame_clause, MaxCompute uses the default frame_clause to determine the data boundaries of a window. Values of the default frame_clause:

  • If odps.sql.hive.compatible is set to true, the following default frame_clause is used. This rule applies to most SQL systems.

    RANGE between UNBOUNDED PRECEDING and CURRENT ROW EXCLUDE NO OTHERS
  • If odps.sql.hive.compatible is set to false, order by is specified, and one of the following window functions is used, the default frame_clause in ROWS mode is used: AVG, COUNT, MAX, MIN, STDDEV, STEDEV_POP, STDDEV_SAMP, and SUM.

    ROWS between UNBOUNDED PRECEDING and CURRENT ROW EXCLUDE NO OTHERS

Example of data boundaries of a window

In this example, a table named tbl contains three columns that are of the BIGINT type: pid, oid, and rid. The tbl table contains the following data:

+------------+------------+------------+
| pid        | oid        | rid        |
+------------+------------+------------+
| 1          | NULL       | 1          |
| 1          | NULL       | 2          |
| 1          | 1          | 3          |
| 1          | 1          | 4          |
| 1          | 2          | 5          |
| 1          | 4          | 6          |
| 1          | 7          | 7          |
| 1          | 11         | 8          |
| 2          | NULL       | 9          |
| 2          | NULL       | 10         |
+------------+------------+------------+
  • Windows in ROWS mode

    • windowing_definition 1

      partition by pid order by oid ROWS between UNBOUNDED PRECEDING and CURRENT ROW
      -- Sample SQL statement: 
      select pid, 
             oid, 
             rid, 
      collect_list(rid) over(partition by pid order by 
      oid ROWS between UNBOUNDED PRECEDING and CURRENT ROW) as window from tbl;

      The following result is returned:

      +------------+------------+------------+--------+
      | pid        | oid        | rid        | window |
      +------------+------------+------------+--------+
      | 1          | NULL       | 1          | [1]    |
      | 1          | NULL       | 2          | [1, 2] |
      | 1          | 1          | 3          | [1, 2, 3] |
      | 1          | 1          | 4          | [1, 2, 3, 4] |
      | 1          | 2          | 5          | [1, 2, 3, 4, 5] |
      | 1          | 4          | 6          | [1, 2, 3, 4, 5, 6] |
      | 1          | 7          | 7          | [1, 2, 3, 4, 5, 6, 7] |
      | 1          | 11         | 8          | [1, 2, 3, 4, 5, 6, 7, 8] |
      | 2          | NULL       | 9          | [9]    |
      | 2          | NULL       | 10         | [9, 10] |
      +------------+------------+------------+--------+
    • windowing_definition 2

      partition by pid order by oid ROWS between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING
      -- Sample SQL statement: 
      select pid, 
             oid, 
             rid, 
      collect_list(rid) over(partition by pid order by 
      oid ROWS between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING) as window from tbl;

      The following result is returned:

      +------------+------------+------------+--------+
      | pid        | oid        | rid        | window |
      +------------+------------+------------+--------+
      | 1          | NULL       | 1          | [1, 2, 3, 4, 5, 6, 7, 8] |
      | 1          | NULL       | 2          | [1, 2, 3, 4, 5, 6, 7, 8] |
      | 1          | 1          | 3          | [1, 2, 3, 4, 5, 6, 7, 8] |
      | 1          | 1          | 4          | [1, 2, 3, 4, 5, 6, 7, 8] |
      | 1          | 2          | 5          | [1, 2, 3, 4, 5, 6, 7, 8] |
      | 1          | 4          | 6          | [1, 2, 3, 4, 5, 6, 7, 8] |
      | 1          | 7          | 7          | [1, 2, 3, 4, 5, 6, 7, 8] |
      | 1          | 11         | 8          | [1, 2, 3, 4, 5, 6, 7, 8] |
      | 2          | NULL       | 9          | [9, 10] |
      | 2          | NULL       | 10         | [9, 10] |
      +------------+------------+------------+--------+
    • windowing_definition 3

      partition by pid order by oid ROWS between 1 FOLLOWING and 3 FOLLOWING
      -- Sample SQL statement: 
      select pid, 
             oid, 
             rid, 
      collect_list(rid) over(partition by pid order by 
      oid ROWS between 1 FOLLOWING and 3 FOLLOWING) as window from tbl;

      The following result is returned:

      +------------+------------+------------+--------+
      | pid        | oid        | rid        | window |
      +------------+------------+------------+--------+
      | 1          | NULL       | 1          | [2, 3, 4] |
      | 1          | NULL       | 2          | [3, 4, 5] |
      | 1          | 1          | 3          | [4, 5, 6] |
      | 1          | 1          | 4          | [5, 6, 7] |
      | 1          | 2          | 5          | [6, 7, 8] |
      | 1          | 4          | 6          | [7, 8] |
      | 1          | 7          | 7          | [8]    |
      | 1          | 11         | 8          | NULL   |
      | 2          | NULL       | 9          | [10]   |
      | 2          | NULL       | 10         | NULL   |
      +------------+------------+------------+--------+
    • windowing_definition 4

      partition by pid order by oid ROWS between UNBOUNDED PRECEDING and CURRENT ROW EXCLUDE CURRENT ROW
      -- Sample SQL statement: 
      select pid, 
      oid, 
      rid, 
      collect_list(rid) over(partition by pid order by 
      oid ROWS between UNBOUNDED PRECEDING and CURRENT ROW EXCLUDE CURRENT ROW) as window from tbl;

      The following result is returned:

      +------------+------------+------------+--------+
      | pid        | oid        | rid        | window |
      +------------+------------+------------+--------+
      | 1          | NULL       | 1          | NULL   |
      | 1          | NULL       | 2          | [1]    |
      | 1          | 1          | 3          | [1, 2] |
      | 1          | 1          | 4          | [1, 2, 3] |
      | 1          | 2          | 5          | [1, 2, 3, 4] |
      | 1          | 4          | 6          | [1, 2, 3, 4, 5] |
      | 1          | 7          | 7          | [1, 2, 3, 4, 5, 6] |
      | 1          | 11         | 8          | [1, 2, 3, 4, 5, 6, 7] |
      | 2          | NULL       | 9          | NULL   |
      | 2          | NULL       | 10         | [9]    |
      +------------+------------+------------+--------+
    • windowing_definition 5

      partition by pid order by oid ROWS between UNBOUNDED PRECEDING and CURRENT ROW EXCLUDE GROUP
      -- Sample SQL statement: 
      select pid, 
             oid, 
             rid, 
      collect_list(rid) over(partition by pid order by oid ROWS between UNBOUNDED PRECEDING and CURRENT ROW EXCLUDE GROUP) as window from tbl;

      The following result is returned:

      +------------+------------+------------+--------+
      | pid        | oid        | rid        | window |
      +------------+------------+------------+--------+
      | 1          | NULL       | 1          | NULL   |
      | 1          | NULL       | 2          | NULL   |
      | 1          | 1          | 3          | [1, 2] |
      | 1          | 1          | 4          | [1, 2] |
      | 1          | 2          | 5          | [1, 2, 3, 4] |
      | 1          | 4          | 6          | [1, 2, 3, 4, 5] |
      | 1          | 7          | 7          | [1, 2, 3, 4, 5, 6] |
      | 1          | 11         | 8          | [1, 2, 3, 4, 5, 6, 7] |
      | 2          | NULL       | 9          | NULL   |
      | 2          | NULL       | 10         | NULL   |
      +------------+------------+------------+--------+
    • windowing_definition 6

      partition by pid order by oid ROWS between UNBOUNDED PRECEDING and CURRENT ROW EXCLUDE TIES
      -- Sample SQL statement: 
      select pid, 
             oid, 
             rid, 
      collect_list(rid) over(partition by pid order by 
      oid ROWS between UNBOUNDED PRECEDING and CURRENT ROW EXCLUDE TIES) as window from tbl;                            

      The following result is returned:

      +------------+------------+------------+--------+
      | pid        | oid        | rid        | window |
      +------------+------------+------------+--------+
      | 1          | NULL       | 1          | [1]    |
      | 1          | NULL       | 2          | [2]    |
      | 1          | 1          | 3          | [1, 2, 3] |
      | 1          | 1          | 4          | [1, 2, 4] |
      | 1          | 2          | 5          | [1, 2, 3, 4, 5] |
      | 1          | 4          | 6          | [1, 2, 3, 4, 5, 6] |
      | 1          | 7          | 7          | [1, 2, 3, 4, 5, 6, 7] |
      | 1          | 11         | 8          | [1, 2, 3, 4, 5, 6, 7, 8] |
      | 2          | NULL       | 9          | [9]    |
      | 2          | NULL       | 10         | [10]   |
      +------------+------------+------------+--------+

      The differences between EXCLUDE CURRENT ROW and EXCLUDE GROUP can be obtained based on the comparison between the window column values of rows with the rid column values of 2, 4, and 10 in Syntax 5 and Syntax 6. If frame_exclusion is set to EXCLUDE GROUP, the rows that have the same pid column value in a partition are extracted when the rows have the same oid column value as the current row.

  • Windows in RANGE mode

    • windowing_definition 1

      partition by pid order by oid RANGE between UNBOUNDED PRECEDING and CURRENT ROW
      -- Sample SQL statement: 
      select pid, 
             oid, 
             rid, 
      collect_list(rid) over(partition by pid order by 
      oid RANGE between UNBOUNDED PRECEDING and CURRENT ROW) as window from tbl;

      The following result is returned:

      +------------+------------+------------+--------+
      | pid        | oid        | rid        | window |
      +------------+------------+------------+--------+
      | 1          | NULL       | 1          | [1, 2] |
      | 1          | NULL       | 2          | [1, 2] |
      | 1          | 1          | 3          | [1, 2, 3, 4] |
      | 1          | 1          | 4          | [1, 2, 3, 4] |
      | 1          | 2          | 5          | [1, 2, 3, 4, 5] |
      | 1          | 4          | 6          | [1, 2, 3, 4, 5, 6] |
      | 1          | 7          | 7          | [1, 2, 3, 4, 5, 6, 7] |
      | 1          | 11         | 8          | [1, 2, 3, 4, 5, 6, 7, 8] |
      | 2          | NULL       | 9          | [9, 10] |
      | 2          | NULL       | 10         | [9, 10] |
      +------------+------------+------------+--------+

      If frame_end is set to CURRENT ROW, the last row that has the same value of the oid column in order by as the current row is obtained. Therefore, the window column value of the row whose rid column value is 1 is [1, 2].

    • windowing_definition 2

      partition by pid order by oid RANGE between CURRENT ROW and UNBOUNDED FOLLOWING
      -- Sample SQL statement: 
      select pid, 
             oid, 
             rid, 
      collect_list(rid) over(partition by pid order by 
      oid RANGE between CURRENT ROW and UNBOUNDED FOLLOWING) as window from tbl;

      The following result is returned:

      +------------+------------+------------+--------+
      | pid        | oid        | rid        | window |
      +------------+------------+------------+--------+
      | 1          | NULL       | 1          | [1, 2, 3, 4, 5, 6, 7, 8] |
      | 1          | NULL       | 2          | [1, 2, 3, 4, 5, 6, 7, 8] |
      | 1          | 1          | 3          | [3, 4, 5, 6, 7, 8] |
      | 1          | 1          | 4          | [3, 4, 5, 6, 7, 8] |
      | 1          | 2          | 5          | [5, 6, 7, 8] |
      | 1          | 4          | 6          | [6, 7, 8] |
      | 1          | 7          | 7          | [7, 8] |
      | 1          | 11         | 8          | [8]    |
      | 2          | NULL       | 9          | [9, 10] |
      | 2          | NULL       | 10         | [9, 10] |
      +------------+------------+------------+--------+
    • windowing_definition 3

      partition by pid order by oid RANGE between 3 PRECEDING and 1 PRECEDING
      -- Sample SQL statement: 
      
      select pid, 
             oid, 
             rid, 
      collect_list(rid) over(partition by pid order by 
      oid RANGE between 3 PRECEDING and 1 PRECEDING) as window from tbl;

      The following result is returned:

      +------------+------------+------------+--------+
      | pid        | oid        | rid        | window |
      +------------+------------+------------+--------+
      | 1          | NULL       | 1          | [1, 2] |
      | 1          | NULL       | 2          | [1, 2] |
      | 1          | 1          | 3          | NULL   |
      | 1          | 1          | 4          | NULL   |
      | 1          | 2          | 5          | [3, 4] |
      | 1          | 4          | 6          | [3, 4, 5] |
      | 1          | 7          | 7          | [6]    |
      | 1          | 11         | 8          | NULL   |
      | 2          | NULL       | 9          | [9, 10] |
      | 2          | NULL       | 10         | [9, 10] |
      +------------+------------+------------+--------+

      For the row whose value of oid in order by is null, if frame_start is set to offset PRECEDING or offset FOLLOWING, the row is the first row whose value of oid in order by is null. If frame_end is set to offset PRECEDING or offset FOLLOWING, the row is the last row whose value of oid in order by is null.

  • Windows in GROUPS mode

    windowing_definition

    partition by pid order by oid GROUPS between 2 PRECEDING and CURRENT ROW
    -- Sample SQL statement: 
    select pid, 
           oid, 
           rid, 
    collect_list(rid) over(partition by pid order by 
    oid GROUPS between 2 PRECEDING and CURRENT ROW) as window from tbl;

    The following result is returned:

    +------------+------------+------------+--------+
    | pid        | oid        | rid        | window |
    +------------+------------+------------+--------+
    | 1          | NULL       | 1          | [1, 2] |
    | 1          | NULL       | 2          | [1, 2] |
    | 1          | 1          | 3          | [1, 2, 3, 4] |
    | 1          | 1          | 4          | [1, 2, 3, 4] |
    | 1          | 2          | 5          | [1, 2, 3, 4, 5] |
    | 1          | 4          | 6          | [3, 4, 5, 6] |
    | 1          | 7          | 7          | [5, 6, 7] |
    | 1          | 11         | 8          | [6, 7, 8] |
    | 2          | NULL       | 9          | [9, 10] |
    | 2          | NULL       | 10         | [9, 10] |
    +------------+------------+------------+--------+

Sample data

This section provides sample source data and examples for you to understand how to use the functions. Create a table named emp and insert the sample data into the table. Sample commands:

create table if not exists emp
   (empno bigint,
    ename string,
    job string,
    mgr bigint,
    hiredate datetime,
    sal bigint,
    comm bigint,
    deptno bigint);
tunnel upload emp.txt emp;

The emp.txt file contains the following sample data:

7369,SMITH,CLERK,7902,1980-12-17 00:00:00,800,,20
7499,ALLEN,SALESMAN,7698,1981-02-20 00:00:00,1600,300,30
7521,WARD,SALESMAN,7698,1981-02-22 00:00:00,1250,500,30
7566,JONES,MANAGER,7839,1981-04-02 00:00:00,2975,,20
7654,MARTIN,SALESMAN,7698,1981-09-28 00:00:00,1250,1400,30
7698,BLAKE,MANAGER,7839,1981-05-01 00:00:00,2850,,30
7782,CLARK,MANAGER,7839,1981-06-09 00:00:00,2450,,10
7788,SCOTT,ANALYST,7566,1987-04-19 00:00:00,3000,,20
7839,KING,PRESIDENT,,1981-11-17 00:00:00,5000,,10
7844,TURNER,SALESMAN,7698,1981-09-08 00:00:00,1500,0,30
7876,ADAMS,CLERK,7788,1987-05-23 00:00:00,1100,,20
7900,JAMES,CLERK,7698,1981-12-03 00:00:00,950,,30
7902,FORD,ANALYST,7566,1981-12-03 00:00:00,3000,,20
7934,MILLER,CLERK,7782,1982-01-23 00:00:00,1300,,10
7948,JACCKA,CLERK,7782,1981-04-12 00:00:00,5000,,10
7956,WELAN,CLERK,7649,1982-07-20 00:00:00,2450,,10
7956,TEBAGE,CLERK,7748,1982-12-30 00:00:00,1300,,10

AVG

  • Syntax

    double avg([distinct] double <expr>) over ([partition_clause] [orderby_clause] [frame_clause])
    decimal avg([distinct] decimal <expr>) over ([partition_clause] [orderby_clause] [frame_clause])
  • Description

    Returns the average value of expr in a window.

  • Parameters

    • expr: required. The expression that is used to calculate the returned result. A value of the DOUBLE or DECIMAL type.

      • If the input value is of the STRING or BIGINT type, the value is implicitly converted into a value of the DOUBLE type before calculation. If the input value is of another data type, an error is returned.

      • If the input value is null, the row that contains the value is not used for calculation.

      • If the distinct keyword is specified, the average value of distinct values is calculated.

    • partition_clause, orderby_clause, and frame_clause: For more information about these parameters, see windowing_definition.

  • Return value

    If the input value of expr is of the DECIMAL type, a value of the DECIMAL type is returned. If the input value is of another data type, a value of the DOUBLE type is returned. If the input value of expr is null, null is returned.

  • Examples

    • Example 1: Use the deptno column to define a window and calculate the average value of the sal column. The order by clause is not specified. This function returns the cumulative average value of the values from the first row to the last row in the current window. The current window includes the rows that have the same deptno column value. Sample statement:

      select deptno, sal, avg(sal) over (partition by deptno) from emp;

      The following result is returned:

      +------------+------------+------------+
      | deptno     | sal        | _c2        |
      +------------+------------+------------+
      | 10         | 1300       | 2916.6666666666665 |   -- This row is the first row of this window. The return value is the cumulative average value of the values from the first row to the sixth row. 
      | 10         | 2450       | 2916.6666666666665 |   -- The return value is the cumulative average value of the values from the first row to the sixth row. 
      | 10         | 5000       | 2916.6666666666665 |   -- The return value is the cumulative average value of the values from the first row to the sixth row. 
      | 10         | 1300       | 2916.6666666666665 |
      | 10         | 5000       | 2916.6666666666665 |
      | 10         | 2450       | 2916.6666666666665 |
      | 20         | 3000       | 2175.0     |
      | 20         | 3000       | 2175.0     |
      | 20         | 800        | 2175.0     |
      | 20         | 1100       | 2175.0     |
      | 20         | 2975       | 2175.0     |
      | 30         | 1500       | 1566.6666666666667 |
      | 30         | 950        | 1566.6666666666667 |
      | 30         | 1600       | 1566.6666666666667 |
      | 30         | 1250       | 1566.6666666666667 |
      | 30         | 1250       | 1566.6666666666667 |
      | 30         | 2850       | 1566.6666666666667 |
      +------------+------------+------------+
    • Example 2: Disable the Hive-compatible data type edition and use the deptno column to define a window and calculate the average value of the sal column. ORDER BY is specified. This function returns the cumulative average value of the values from the first row to the current row in the current window. The current window includes the rows that have the same deptno value. Sample statement:

      -- Disable the Hive-compatible data type edition. 
      set odps.sql.hive.compatible=false;
      -- Execute the following statement: 
      select deptno, sal, avg(sal) over (partition by deptno order by sal) from emp;

      The following result is returned:

      +------------+------------+------------+
      | deptno     | sal        | _c2        |
      +------------+------------+------------+
      | 10         | 1300       | 1300.0     |           -- This row is the first row of this window. 
      | 10         | 1300       | 1300.0     |           -- The return value is the cumulative average value of the values in the first and second rows. 
      | 10         | 2450       | 1683.3333333333333 |   -- The return value is the cumulative average value of the values from the first row to the third row. 
      | 10         | 2450       | 1875.0     |           -- The return value is the cumulative average value of the values from the first row to the fourth row. 
      | 10         | 5000       | 2500.0     |           -- The return value is the cumulative average value of the values from the first row to the fifth row. 
      | 10         | 5000       | 2916.6666666666665 |   -- The return value is the cumulative average value of the values from the first row to the sixth row. 
      | 20         | 800        | 800.0      |
      | 20         | 1100       | 950.0      |
      | 20         | 2975       | 1625.0     |
      | 20         | 3000       | 1968.75    |
      | 20         | 3000       | 2175.0     |
      | 30         | 950        | 950.0      |
      | 30         | 1250       | 1100.0     |
      | 30         | 1250       | 1150.0     |
      | 30         | 1500       | 1237.5     |
      | 30         | 1600       | 1310.0     |
      | 30         | 2850       | 1566.6666666666667 |
      +------------+------------+------------+
    • Example 3: Enable the Hive-compatible data type edition and use the deptno column to define a window and calculate the average value of the sal column. ORDER BY is specified. This function returns the cumulative average value of the values from the first row to the row that has the same sal value as the current row in the current window. The average values for the rows that have the same sal value are the same. The current window includes the rows that have the same deptno value. Sample statement:

      -- Enable the Hive-compatible data type edition. 
      set odps.sql.hive.compatible=true;
      -- Execute the following statement: 
      select deptno, sal, avg(sal) over (partition by deptno order by sal) from emp;

      The following result is returned:

      +------------+------------+------------+
      | deptno     | sal        | _c2        |
      +------------+------------+------------+
      | 10         | 1300       | 1300.0     |          -- This row is the first row of this window. The average value for the first row is the cumulative average value of the values in the first and second rows because the two rows have the same sal value. 
      | 10         | 1300       | 1300.0     |          -- The return value is the cumulative average value of the values in the first and second rows. 
      | 10         | 2450       | 1875.0     |          -- The average value for the third row is the cumulative average value of the values from the first row to the fourth row because the third and fourth rows have the same sal value. 
      | 10         | 2450       | 1875.0     |          -- The return value is the cumulative average value of the values from the first row to the fourth row. 
      | 10         | 5000       | 2916.6666666666665 |
      | 10         | 5000       | 2916.6666666666665 |
      | 20         | 800        | 800.0      |
      | 20         | 1100       | 950.0      |
      | 20         | 2975       | 1625.0     |
      | 20         | 3000       | 2175.0     |
      | 20         | 3000       | 2175.0     |
      | 30         | 950        | 950.0      |
      | 30         | 1250       | 1150.0     |
      | 30         | 1250       | 1150.0     |
      | 30         | 1500       | 1237.5     |
      | 30         | 1600       | 1310.0     |
      | 30         | 2850       | 1566.6666666666667 |
      +------------+------------+------------+

CLUSTER_SAMPLE

  • Syntax

    boolean cluster_sample(bigint <N>) OVER ([partition_clause])
    boolean cluster_sample(bigint <N>, bigint <M>) OVER ([partition_clause])
  • Description

    • cluster_sample(bigint <N>): specifies that N random rows of data are sampled.

    • cluster_sample(bigint <N>, bigint <M>): specifies that rows are sampled based on a specified ratio (M/N). The number of rows that are sampled is calculated by using the following formula: partition_row_count × M/N. partition_row_count specifies the number of rows in a partition.

  • Parameters

    • N: required. A constant of the BIGINT type. If N is set to null, null is returned.

    • M: required. A constant of the BIGINT type. If M is set to null, null is returned.

    • partition_clause: optional. For more information, see windowing_definition.

  • Return value

    A value of the BOOLEAN type is returned.

  • Examples

    Sample about 20% of data entries in each group. Sample statement:

    select deptno, sal
        from (
            select deptno, sal, cluster_sample(5, 1) over (partition by deptno) as flag
            from emp
            ) sub
        where flag = true;

    The following result is returned:

    +------------+------------+
    | deptno     | sal        |
    +------------+------------+
    | 10         | 1300       |
    | 20         | 3000       |
    | 30         | 950        |
    +------------+------------+

COUNT

  • Syntax

    bigint count(*) over ([partition_clause] [orderby_clause] [frame_clause])
    bigint count([distinct] <expr>[,...]) over ([partition_clause] [orderby_clause] [frame_clause])
  • Description

    • count(*): the total number of rows to return.

    • count([distinct] <expr>[,...]): When you calculate the number of rows, the rows whose expr is null are ignored. If multiple expr parameters exist, any rows whose expr is null are ignored. In addition, if the distinct keyword is specified, the number of rows after deduplication is calculated. Any rows whose expr is null are ignored.

  • Parameters

    • expr: required. This parameter specifies the column whose values you want to count. All data types are supported. If the value for a row is null, this row is not used for the calculation. If the DISTINCT keyword is specified, the count value of distinct values is obtained.

    • partition_clause, orderby_clause, and frame_clause: For more information about these parameters, see windowing_definition.

  • Return value

    A value of the BIGINT type is returned.

  • Examples

    • Example 1: Use the sal column to define a window. The order by clause is not specified. This function returns the cumulative count from the first row to the last row in the current window. The current window includes the rows that have the same sal value. Sample statement:

      select sal, count(sal) over (partition by sal) as count from emp;  

      The following result is returned:

      +------------+------------+
      | sal        | count      |
      +------------+------------+
      | 800        | 1          |
      | 950        | 1          |
      | 1100       | 1          |
      | 1250       | 2          |   -- This row is the first row of this window. The value of COUNT for the first row is the cumulative count for the second row because the two rows have the same sal value. 
      | 1250       | 2          |   -- The return value is the cumulative count from the first row to the second row in the current window. 
      | 1300       | 2          |
      | 1300       | 2          |
      | 1500       | 1          |
      | 1600       | 1          |
      | 2450       | 2          |
      | 2450       | 2          |
      | 2850       | 1          |
      | 2975       | 1          |
      | 3000       | 2          |
      | 3000       | 2          |
      | 5000       | 2          |
      | 5000       | 2          |
      +------------+------------+
    • Example 2: Disable the Hive-compatible data type edition and use the sal column to define a window. ORDER BY is specified. This function returns the cumulative count from the first row to the current row in the current window. The current window includes the rows that have the same sal value. Sample statement:

      -- Disable the Hive-compatible data type edition. 
      set odps.sql.hive.compatible=false;
      -- Execute the following statement: 
      select sal, count(sal) over (partition by sal order by sal) as count from emp;  

      The following result is returned:

      +------------+------------+
      | sal        | count      |
      +------------+------------+
      | 800        | 1          |
      | 950        | 1          |
      | 1100       | 1          |
      | 1250       | 1          |   -- This row is the first row of this window. The cumulative count for the first row is 1. 
      | 1250       | 2          |   -- The cumulative count for the second row is 2. 
      | 1300       | 1          |
      | 1300       | 2          |
      | 1500       | 1          |
      | 1600       | 1          |
      | 2450       | 1          |
      | 2450       | 2          |
      | 2850       | 1          |
      | 2975       | 1          |
      | 3000       | 1          |
      | 3000       | 2          |
      | 5000       | 1          |
      | 5000       | 2          |
      +------------+------------+
    • Example 3: Enable the Hive-compatible data type edition and use the sal column to define a window. ORDER BY is specified. This function returns the cumulative count from the first row to the last row in the current window. The current window includes the rows that have the same sal value. Sample statement:

      -- Enable the Hive-compatible data type edition. 
      set odps.sql.hive.compatible=true;
      -- Execute the following statement: 
      select sal, count(sal) over (partition by sal order by sal) as count from emp; 

      The following result is returned:

      +------------+------------+
      | sal        | count      |
      +------------+------------+
      | 800        | 1          |
      | 950        | 1          |
      | 1100       | 1          |
      | 1250       | 2          |   -- This row is the first row of this window. The value of COUNT for the first row is the cumulative count for the second row because the two rows have the same sal value. 
      | 1250       | 2          |   -- The return value is the cumulative count from the first row to the second row in the current window. 
      | 1300       | 2          |
      | 1300       | 2          |
      | 1500       | 1          |
      | 1600       | 1          |
      | 2450       | 2          |
      | 2450       | 2          |
      | 2850       | 1          |
      | 2975       | 1          |
      | 3000       | 2          |
      | 3000       | 2          |
      | 5000       | 2          |
      | 5000       | 2          |
      +------------+------------+

CUME_DIST

  • Syntax

    double cume_dist() over([partition_clause] [orderby_clause])
  • Description

    Calculates the cumulative distribution. The cumulative distribution indicates the ratio of rows whose values are greater than or equal to the values of the current row to all rows in a partition. The ratio is determined based on orderby_clause.

  • Parameters

    partition_clause and orderby_clause: For more information about these parameters, see windowing_definition.

  • Return value

    A value of the DOUBLE type is returned. The return value is calculated by using the following formula: row_number_of_last_peer/partition_row_count. row_number_of_last_peer indicates the value returned by ROW_NUMBER that corresponds to the last row of the group to which the current row belongs. partition_row_count indicates the number of rows in a partition to which the current row belongs.

  • Examples

    Group all employees based on the deptno column and calculate the cumulative distribution of employees in each group by salary. Sample statement:

    select deptno, ename, sal, concat(round(cume_dist() over (partition by deptno order by sal desc)*100,2),'%') as cume_dist from emp;

    The following result is returned:

    +------------+------------+------------+------------+
    | deptno     | ename      | sal        | cume_dist  |
    +------------+------------+------------+------------+
    | 10         | JACCKA     | 5000       | 33.33%     |
    | 10         | KING       | 5000       | 33.33%     |
    | 10         | CLARK      | 2450       | 66.67%     |
    | 10         | WELAN      | 2450       | 66.67%     |
    | 10         | TEBAGE     | 1300       | 100.0%     |
    | 10         | MILLER     | 1300       | 100.0%     |
    | 20         | SCOTT      | 3000       | 40.0%      |
    | 20         | FORD       | 3000       | 40.0%      |
    | 20         | JONES      | 2975       | 60.0%      |
    | 20         | ADAMS      | 1100       | 80.0%      |
    | 20         | SMITH      | 800        | 100.0%     |
    | 30         | BLAKE      | 2850       | 16.67%     |
    | 30         | ALLEN      | 1600       | 33.33%     |
    | 30         | TURNER     | 1500       | 50.0%      |
    | 30         | MARTIN     | 1250       | 83.33%     |
    | 30         | WARD       | 1250       | 83.33%     |
    | 30         | JAMES      | 950        | 100.0%     |
    +------------+------------+------------+------------+

DENSE_RANK

  • Syntax

    bigint dense_rank() over ([partition_clause] [orderby_clause])
  • Description

    Returns the rank of the current row in a partition based on the order specified by orderby_clause. The rank counts from 1. In a partition, rows with the same value of the column that is specified in order by have the same rank. The rank increases by 1 each time the value of the column that is specified in order by changes.

  • Parameters

    partition_clause and orderby_clause: For more information about these parameters, see windowing_definition.

  • Return value

    A value of the BIGINT type is returned. If orderby_clause is not specified, the values in the returned results are all 1.

  • Examples

    Group all employees based on the deptno column. In each group, sort the employees in descending order based on the sal value to obtain the ordinal numbers of the employees in their own groups. Sample statement:

    select deptno, ename, sal, dense_rank() over (partition by deptno order by sal desc) as nums from emp;

    The following result is returned:

    +------------+------------+------------+------------+
    | deptno     | ename      | sal        | nums       |
    +------------+------------+------------+------------+
    | 10         | JACCKA     | 5000       | 1          |
    | 10         | KING       | 5000       | 1          |
    | 10         | CLARK      | 2450       | 2          |
    | 10         | WELAN      | 2450       | 2          |
    | 10         | TEBAGE     | 1300       | 3          |
    | 10         | MILLER     | 1300       | 3          |
    | 20         | SCOTT      | 3000       | 1          |
    | 20         | FORD       | 3000       | 1          |
    | 20         | JONES      | 2975       | 2          |
    | 20         | ADAMS      | 1100       | 3          |
    | 20         | SMITH      | 800        | 4          |
    | 30         | BLAKE      | 2850       | 1          |
    | 30         | ALLEN      | 1600       | 2          |
    | 30         | TURNER     | 1500       | 3          |
    | 30         | MARTIN     | 1250       | 4          |
    | 30         | WARD       | 1250       | 4          |
    | 30         | JAMES      | 950        | 5          |
    +------------+------------+------------+------------+

FIRST_VALUE

  • Syntax

    first_value(<expr>[, <ignore_nulls>]) over ([partition_clause] [orderby_clause] [frame_clause])
  • Description

    Returns the value of expr that corresponds to the first row of a window.

  • Parameters

    • expr: required. The expression that is used to calculate the returned result.

    • ignore_nulls: optional. A value of the BOOLEAN type. This parameter specifies whether to ignore null values. Default value: false. If this parameter is set to True, a non-null value of expr that corresponds to the first row of a window is returned.

    • partition_clause, orderby_clause, and frame_clause: For more information about these parameters, see windowing_definition.

  • Return value

    A value of the same data type as expr is returned.

  • Examples

    Group all employees by department and return the first row of data in each group. Sample statement:

    • order by is not specified.

      select deptno, ename, sal, first_value(sal) over (partition by deptno) as first_value from emp;

      The following result is returned:

      +------------+------------+------------+-------------+
      | deptno     | ename      | sal        | first_value |
      +------------+------------+------------+-------------+
      | 10         | TEBAGE     | 1300       | 1300        |   -- This row is the first row of this window. 
      | 10         | CLARK      | 2450       | 1300        |
      | 10         | KING       | 5000       | 1300        |
      | 10         | MILLER     | 1300       | 1300        |
      | 10         | JACCKA     | 5000       | 1300        |
      | 10         | WELAN      | 2450       | 1300        |
      | 20         | FORD       | 3000       | 3000        |   -- This row is the first row of this window. 
      | 20         | SCOTT      | 3000       | 3000        |
      | 20         | SMITH      | 800        | 3000        |
      | 20         | ADAMS      | 1100       | 3000        |
      | 20         | JONES      | 2975       | 3000        |
      | 30         | TURNER     | 1500       | 1500        |   -- This row is the first row of this window. 
      | 30         | JAMES      | 950        | 1500        |
      | 30         | ALLEN      | 1600       | 1500        |
      | 30         | WARD       | 1250       | 1500        |
      | 30         | MARTIN     | 1250       | 1500        |
      | 30         | BLAKE      | 2850       | 1500        |
      +------------+------------+------------+-------------+
    • order by is specified.

      select deptno, ename, sal, first_value(sal) over (partition by deptno order by sal desc) as first_value from emp;

      The following result is returned:

      +------------+------------+------------+-------------+
      | deptno     | ename      | sal        | first_value |
      +------------+------------+------------+-------------+
      | 10         | JACCKA     | 5000       | 5000        |   -- This row is the first row of this window. 
      | 10         | KING       | 5000       | 5000        |
      | 10         | CLARK      | 2450       | 5000        |
      | 10         | WELAN      | 2450       | 5000        |
      | 10         | TEBAGE     | 1300       | 5000        |
      | 10         | MILLER     | 1300       | 5000        |
      | 20         | SCOTT      | 3000       | 3000        |   -- This row is the first row of this window. 
      | 20         | FORD       | 3000       | 3000        |
      | 20         | JONES      | 2975       | 3000        |
      | 20         | ADAMS      | 1100       | 3000        |
      | 20         | SMITH      | 800        | 3000        |
      | 30         | BLAKE      | 2850       | 2850        |   -- This row is the first row of this window. 
      | 30         | ALLEN      | 1600       | 2850        |
      | 30         | TURNER     | 1500       | 2850        |
      | 30         | MARTIN     | 1250       | 2850        |
      | 30         | WARD       | 1250       | 2850        |
      | 30         | JAMES      | 950        | 2850        |
      +------------+------------+------------+-------------+

LAG

  • Syntax

    lag(<expr>[,bigint <offset>[, <default>]]) over([partition_clause] orderby_clause)
  • Description

    Returns the value of expr that precedes the current row at a given offset. The expression expr can be a column, a column operation, or a function operation.

  • Parameters

    • expr: required. The expression that is used to calculate the returned result.

    • offset: optional. The value is a constant of the BIGINT type and must be greater than or equal to 0. The value 0 indicates the current row, and the value 1 indicates the previous row. Default value: 1. If the input value is of the STRING or DOUBLE type, it is implicitly converted into a value of the BIGINT type before calculation.

    • default: optional. The default value when the value of offset is out of the valid range. The value of this parameter must be a constant. The default value of this parameter is null. The value of this parameter must be of the same data type as the value of expr. If the value of expr is not a constant, the parameter value is determined based on the current row.

    • partition_clause and orderby_clause: For more information about these parameters, see windowing_definition.

  • Return value

    A value of the same data type as expr is returned.

  • Examples

    Group all employees based on the deptno column and calculate the value of sal for each employee at a given offset. Sample statement:

    select deptno, ename, sal, lag(sal, 1) over (partition by deptno order by sal) as sal_new from emp;

    The following result is returned:

    +------------+------------+------------+------------+
    | deptno     | ename      | sal        | sal_new    |
    +------------+------------+------------+------------+
    | 10         | TEBAGE     | 1300       | NULL       |
    | 10         | MILLER     | 1300       | 1300       |
    | 10         | CLARK      | 2450       | 1300       |
    | 10         | WELAN      | 2450       | 2450       |
    | 10         | KING       | 5000       | 2450       |
    | 10         | JACCKA     | 5000       | 5000       |
    | 20         | SMITH      | 800        | NULL       |
    | 20         | ADAMS      | 1100       | 800        |
    | 20         | JONES      | 2975       | 1100       |
    | 20         | SCOTT      | 3000       | 2975       |
    | 20         | FORD       | 3000       | 3000       |
    | 30         | JAMES      | 950        | NULL       |
    | 30         | MARTIN     | 1250       | 950        |
    | 30         | WARD       | 1250       | 1250       |
    | 30         | TURNER     | 1500       | 1250       |
    | 30         | ALLEN      | 1600       | 1500       |
    | 30         | BLAKE      | 2850       | 1600       |
    +------------+------------+------------+------------+

LAST_VALUE

  • Syntax

    last_value(<expr>[, <ignore_nulls>]) over([partition_clause] [orderby_clause] [frame_clause])
  • Description

    Returns the calculated result of the last row of data in a window using the expression that is specified by expr.

  • Parameters

    • expr: required. The expression that is used to calculate the returned result.

    • ignore_nulls: optional. A value of the BOOLEAN type. This parameter specifies whether to ignore null values. Default value: false. If this parameter is set to true, a non-null value of expr that corresponds to the last row of a window is returned.

    • partition_clause, orderby_clause, and frame_clause: For more information about these parameters, see windowing_definition.

  • Return value

    A value of the same data type as expr is returned.

  • Examples

    Group all employees by department and return the last row of data in each group. Sample statement:

    • If order by is not specified, the rows from the first row to the last row belong to the current window. The value of the last row in the current window is returned.

      select deptno, ename, sal, last_value(sal) over (partition by deptno) as last_value from emp;

      The following result is returned:

      +------------+------------+------------+-------------+
      | deptno     | ename      | sal        | last_value |
      +------------+------------+------------+-------------+
      | 10         | TEBAGE     | 1300       | 2450        |
      | 10         | CLARK      | 2450       | 2450        |
      | 10         | KING       | 5000       | 2450        |
      | 10         | MILLER     | 1300       | 2450        |
      | 10         | JACCKA     | 5000       | 2450        |   
      | 10         | WELAN      | 2450       | 2450        |   -- This row is the last row of this window. 
      | 20         | FORD       | 3000       | 2975        |
      | 20         | SCOTT      | 3000       | 2975        |
      | 20         | SMITH      | 800        | 2975        |
      | 20         | ADAMS      | 1100       | 2975        |
      | 20         | JONES      | 2975       | 2975        |   -- This row is the last row of this window. 
      | 30         | TURNER     | 1500       | 2850        |
      | 30         | JAMES      | 950        | 2850        |
      | 30         | ALLEN      | 1600       | 2850        |
      | 30         | WARD       | 1250       | 2850        |
      | 30         | MARTIN     | 1250       | 2850        |
      | 30         | BLAKE      | 2850       | 2850        |   -- This row is the last row of the current window. 
      +------------+------------+------------+-------------+
    • If order by is specified, the rows from the first row to the current row belong to the current window. The value of the current row in the current window is returned.

      select deptno, ename, sal, last_value(sal) over (partition by deptno order by sal desc) as last_value from emp;

      The following result is returned:

      +------------+------------+------------+-------------+
      | deptno     | ename      | sal        | last_value |
      +------------+------------+------------+-------------+
      | 10         | JACCKA     | 5000       | 5000        |   -- This row is the current row of this window. 
      | 10         | KING       | 5000       | 5000        |   -- This row is the current row of this window. 
      | 10         | CLARK      | 2450       | 2450        |   -- This row is the current row of this window. 
      | 10         | WELAN      | 2450       | 2450        |   -- This row is the current row of this window. 
      | 10         | TEBAGE     | 1300       | 1300        |   -- This row is the current row of this window. 
      | 10         | MILLER     | 1300       | 1300        |   -- This row is the current row of this window. 
      | 20         | SCOTT      | 3000       | 3000        |   -- This row is the current row of this window. 
      | 20         | FORD       | 3000       | 3000        |   -- This row is the current row of this window. 
      | 20         | JONES      | 2975       | 2975        |   -- This row is the current row of this window. 
      | 20         | ADAMS      | 1100       | 1100        |   -- This row is the current row of this window. 
      | 20         | SMITH      | 800        | 800         |   -- This row is the current row of this window. 
      | 30         | BLAKE      | 2850       | 2850        |   -- This row is the current row of this window. 
      | 30         | ALLEN      | 1600       | 1600        |   -- This row is the current row of this window. 
      | 30         | TURNER     | 1500       | 1500        |   -- This row is the current row of this window. 
      | 30         | MARTIN     | 1250       | 1250        |   -- This row is the current row of this window. 
      | 30         | WARD       | 1250       | 1250        |   -- This row is the current row of this window. 
      | 30         | JAMES      | 950        | 950         |   -- This row is the current row of this window. 
      +------------+------------+------------+-------------+

LEAD

  • Syntax

    lead(<expr>[, bigint <offset>[, <default>]]) over([partition_clause] orderby_clause)
  • Description

    Returns the value of expr that corresponds to the Nth row following the current row at a given offset. The expression expr can be a column, a column operation, or a function operation.

  • Parameters

    • expr: required. The expression that is used to calculate the returned result.

    • offset: optional. The value is a constant of the BIGINT type and must be greater than or equal to 0. The value 0 indicates the current row, and the value 1 indicates the next row. Default value: 1. If the input value is of the STRING or DOUBLE type, it is implicitly converted into a value of the BIGINT type before calculation.

    • default: optional. The default value when the value of offset is out of the valid range. The value of this parameter must be a constant. The default value of this parameter is null. The value of this parameter must be of the same data type as the value of expr. If the value of expr is not a constant, the parameter value is determined based on the current row.

    • partition_clause and orderby_clause: For more information about these parameters, see windowing_definition.

  • Return value

    A value of the same data type as expr is returned.

  • Examples

    Group all employees based on the deptno column and calculate the value of sal for each employee at a given offset. Sample statement:

    select deptno, ename, sal, lead(sal, 1) over (partition by deptno order by sal) as sal_new from emp;

    The following result is returned:

    +------------+------------+------------+------------+
    | deptno     | ename      | sal        | sal_new    |
    +------------+------------+------------+------------+
    | 10         | TEBAGE     | 1300       | 1300       |
    | 10         | MILLER     | 1300       | 2450       |
    | 10         | CLARK      | 2450       | 2450       |
    | 10         | WELAN      | 2450       | 5000       |
    | 10         | KING       | 5000       | 5000       |
    | 10         | JACCKA     | 5000       | NULL       |
    | 20         | SMITH      | 800        | 1100       |
    | 20         | ADAMS      | 1100       | 2975       |
    | 20         | JONES      | 2975       | 3000       |
    | 20         | SCOTT      | 3000       | 3000       |
    | 20         | FORD       | 3000       | NULL       |
    | 30         | JAMES      | 950        | 1250       |
    | 30         | MARTIN     | 1250       | 1250       |
    | 30         | WARD       | 1250       | 1500       |
    | 30         | TURNER     | 1500       | 1600       |
    | 30         | ALLEN      | 1600       | 2850       |
    | 30         | BLAKE      | 2850       | NULL       |
    +------------+------------+------------+------------+

MAX

  • Syntax

    max(<expr>) over([partition_clause] [orderby_clause] [frame_clause])
  • Description

    Returns the maximum value of expr in a window.

  • Parameters

    • expr: required. The expression that is used to calculate the maximum value. The input value can be of any data type other than BOOLEAN. If the value for a row is null, this row is not used for calculation.

    • partition_clause, orderby_clause, and frame_clause: For more information about these parameters, see windowing_definition.

  • Return value

    A value of the same type as expr is returned.

  • Examples

    • Example 1: Use the deptno column to define a window and obtain the maximum value of the sal column. The order by clause is not specified. This function returns the maximum value of the current window. The current window includes the rows that have the same deptno value. Sample statement:

      select deptno, sal, max(sal) over (partition by deptno) from emp;

      The following result is returned:

      +------------+------------+------------+
      | deptno     | sal        | _c2        |
      +------------+------------+------------+
      | 10         | 1300       | 5000       |   -- This row is the first row of this window. The return value is the maximum value among the values from the first row to the sixth row. 
      | 10         | 2450       | 5000       |   -- The return value is the maximum value among the values from the first row to the sixth row. 
      | 10         | 5000       | 5000       |   -- The return value is the maximum value among the values from the first row to the sixth row. 
      | 10         | 1300       | 5000       |
      | 10         | 5000       | 5000       |
      | 10         | 2450       | 5000       |
      | 20         | 3000       | 3000       |
      | 20         | 3000       | 3000       |
      | 20         | 800        | 3000       |
      | 20         | 1100       | 3000       |
      | 20         | 2975       | 3000       |
      | 30         | 1500       | 2850       |
      | 30         | 950        | 2850       |
      | 30         | 1600       | 2850       |
      | 30         | 1250       | 2850       |
      | 30         | 1250       | 2850       |
      | 30         | 2850       | 2850       |
      +------------+------------+------------+
    • Example 2: Use the deptno column to define a window and obtain the maximum value of the sal column. The order by clause is specified. This function returns the maximum value among the values from the first row to the current row in the current window. The current window includes the rows that have the same deptno value. Sample statement:

      select deptno, sal, max(sal) over (partition by deptno order by sal) from emp;

      The following result is returned:

      +------------+------------+------------+
      | deptno     | sal        | _c2        |
      +------------+------------+------------+
      | 10         | 1300       | 1300       |   -- This row is the first row of this window. 
      | 10         | 1300       | 1300       |   -- The return value is the maximum value among the values in the first and second rows. 
      | 10         | 2450       | 2450       |   -- The return value is the maximum value among the values from the first row to the third row. 
      | 10         | 2450       | 2450       |   -- The return value is the maximum value among the values from the first row to the fourth row. 
      | 10         | 5000       | 5000       |
      | 10         | 5000       | 5000       |
      | 20         | 800        | 800        |
      | 20         | 1100       | 1100       |
      | 20         | 2975       | 2975       |
      | 20         | 3000       | 3000       |
      | 20         | 3000       | 3000       |
      | 30         | 950        | 950        |
      | 30         | 1250       | 1250       |
      | 30         | 1250       | 1250       |
      | 30         | 1500       | 1500       |
      | 30         | 1600       | 1600       |
      | 30         | 2850       | 2850       |
      +------------+------------+------------+

MEDIAN

  • Syntax

    median(<expr>) over ([partition_clause] [orderby_clause] [frame_clause])
  • Description

    Returns the median of expr in a window.

  • Parameters

    • expr: required. The expression that is used to calculate the median. A value of the DOUBLE or DECIMAL type. The value of this parameter must be 1 to 255 digits in length.

      • If the input value is of the STRING or BIGINT type, the value is implicitly converted into a value of the DOUBLE type before calculation. If it is of another data type, an error is returned.

      • If the input value is null, null is returned.

    • partition_clause, orderby_clause, and frame_clause: For more information about these parameters, see windowing_definition.

  • Return value

    A value of the DOUBLE or DECIMAL type is returned. If the values of all expressions that are specified by expr are null, null is returned.

  • Examples

    Use the deptno column to define a window and calculate the median value of the sal column. This function returns the median value of the current window. The current window includes the rows that have the same deptno value. Sample statement:

    select deptno, sal, median(sal) over (partition by deptno) from emp;

    The following result is returned:

    +------------+------------+------------+
    | deptno     | sal        | _c2        |
    +------------+------------+------------+
    | 10         | 1300       | 2450.0     |   -- This row is the first row of this window. The return value is the median value of the values from the first row to the sixth row. 
    | 10         | 2450       | 2450.0     |
    | 10         | 5000       | 2450.0     |
    | 10         | 1300       | 2450.0     |
    | 10         | 5000       | 2450.0     |
    | 10         | 2450       | 2450.0     |
    | 20         | 3000       | 2975.0     |
    | 20         | 3000       | 2975.0     |
    | 20         | 800        | 2975.0     |
    | 20         | 1100       | 2975.0     |
    | 20         | 2975       | 2975.0     |
    | 30         | 1500       | 1375.0     |
    | 30         | 950        | 1375.0     |
    | 30         | 1600       | 1375.0     |
    | 30         | 1250       | 1375.0     |
    | 30         | 1250       | 1375.0     |
    | 30         | 2850       | 1375.0     |
    +------------+------------+------------+

MIN

  • Syntax

    min(<expr>) over([partition_clause] [orderby_clause] [frame_clause])
  • Description

    Returns the minimum value of expr in a window.

  • Parameters

    • expr: required. The expression that is used to calculate the minimum value. The input value can be of any data type other than BOOLEAN. If the value for a row is null, this row is not used for calculation.

    • partition_clause, orderby_clause, and frame_clause: For more information about these parameters, see windowing_definition.

  • Return value

    A value of the same data type as expr is returned.

  • Examples

    • Example 1: Use the deptno column to define a window and obtain the minimum value of the sal column. The order by clause is not specified. This function returns the minimum value of the current window. The current window includes the rows that have the same deptno value. Sample statement:

      select deptno, sal, min(sal) over (partition by deptno) from emp;

      The following result is returned:

      +------------+------------+------------+
      | deptno     | sal        | _c2        |
      +------------+------------+------------+
      | 10         | 1300       | 1300       |   -- This row is the first row of this window. The return value is the minimum value among the values from the first row to the sixth row. 
      | 10         | 2450       | 1300       |   -- The return value is the minimum value among the values from the first row to the sixth row. 
      | 10         | 5000       | 1300       |   -- The return value is the minimum value among the values from the first row to the sixth row. 
      | 10         | 1300       | 1300       |
      | 10         | 5000       | 1300       |
      | 10         | 2450       | 1300       |
      | 20         | 3000       | 800        |
      | 20         | 3000       | 800        |
      | 20         | 800        | 800        |
      | 20         | 1100       | 800        |
      | 20         | 2975       | 800        |
      | 30         | 1500       | 950        |
      | 30         | 950        | 950        |
      | 30         | 1600       | 950        |
      | 30         | 1250       | 950        |
      | 30         | 1250       | 950        |
      | 30         | 2850       | 950        |
      +------------+------------+------------+
    • Example 2: Use the deptno column to define a window and obtain the minimum value of the sal column. The order by clause is specified. This function returns the minimum value among the values from the first row to the current row in the current window. The current window includes the rows that have the same deptno value. Sample statement:

      select deptno, sal, min(sal) over (partition by deptno order by sal) from emp;

      The following result is returned:

      +------------+------------+------------+
      | deptno     | sal        | _c2        |
      +------------+------------+------------+
      | 10         | 1300       | 1300       |   -- This row is the first row of this window. 
      | 10         | 1300       | 1300       |   -- The return value is the minimum value among the values in the first and second rows. 
      | 10         | 2450       | 1300       |   -- The return value is the minimum value among the values from the first row to the third row. 
      | 10         | 2450       | 1300       |
      | 10         | 5000       | 1300       |
      | 10         | 5000       | 1300       |
      | 20         | 800        | 800        |
      | 20         | 1100       | 800        |
      | 20         | 2975       | 800        |
      | 20         | 3000       | 800        |
      | 20         | 3000       | 800        |
      | 30         | 950        | 950        |
      | 30         | 1250       | 950        |
      | 30         | 1250       | 950        |
      | 30         | 1500       | 950        |
      | 30         | 1600       | 950        |
      | 30         | 2850       | 950        |
      +------------+------------+------------+

NTILE

  • Syntax

    bigint ntile(bigint <N>) over ([partition_clause] [orderby_clause])
  • Description

    Splits rows of data in a partition into N groups of equal size and returns the number of the group to which the specified row belongs. If data in the partition cannot be split into N groups of equal size, one more row is preferentially allocated to the first M groups.

  • Parameters

    • N: required. This parameter specifies the number of splits. A value of the BIGINT type.

    • partition_clause and orderby_clause: For more information about these parameters, see windowing_definition.

  • Return value

    A value of the BIGINT type is returned.

  • Examples

    Divide all employees into three groups based on the sal column in descending order and obtain the number of the group to which each employee belongs. Sample statement:

    select deptno, ename, sal, ntile(3) over (partition by deptno order by sal desc) as nt3 from emp;

    The following result is returned:

    +------------+------------+------------+------------+
    | deptno     | ename      | sal        | nt3        |
    +------------+------------+------------+------------+
    | 10         | JACCKA     | 5000       | 1          |
    | 10         | KING       | 5000       | 1          |
    | 10         | CLARK      | 2450       | 2          |
    | 10         | WELAN      | 2450       | 2          |
    | 10         | TEBAGE     | 1300       | 3          |
    | 10         | MILLER     | 1300       | 3          |
    | 20         | SCOTT      | 3000       | 1          |
    | 20         | FORD       | 3000       | 1          |
    | 20         | JONES      | 2975       | 2          |
    | 20         | ADAMS      | 1100       | 2          |
    | 20         | SMITH      | 800        | 3          |
    | 30         | BLAKE      | 2850       | 1          |
    | 30         | ALLEN      | 1600       | 1          |
    | 30         | TURNER     | 1500       | 2          |
    | 30         | MARTIN     | 1250       | 2          |
    | 30         | WARD       | 1250       | 3          |
    | 30         | JAMES      | 950        | 3          |
    +------------+------------+------------+------------+

NTH_VALUE

  • Syntax

    nth_value(<expr>, <number> [, <ignore_nulls>]) over ([partition_clause] [orderby_clause] [frame_clause])
  • Description

    Returns the value of expr that corresponds to the Nth row in a window.

  • Parameters

    • expr: required. The expression that is used to calculate the returned result.

    • number: required. A value of the BIGINT type. The value must be an integer greater than or equal to 1. If the input value is 1, this function is equivalent to FIRST_VALUE.

    • ignore_nulls: optional. A value of the BOOLEAN type. This parameter specifies whether to ignore null values. Default value: false. If this parameter is set to True, a non-null value of expr that corresponds to the Nth row of a window is returned.

    • partition_clause, orderby_clause, and frame_clause: For more information about these parameters, see windowing_definition.

  • Return value

    A value of the same data type as expr is returned.

  • Examples

    Group all employees by department and return the sixth row of data in each group. Sample statement:

    • If order by is not specified, the rows from the first row to the last row belong to the current window. The value of the sixth row in the current window is returned. Sample statement:

      select deptno, ename, sal, nth_value(sal,6) over (partition by deptno) as nth_value from emp;

      The following result is returned:

      +------------+------------+------------+------------+
      | deptno     | ename      | sal        | nth_value  |
      +------------+------------+------------+------------+
      | 10         | TEBAGE     | 1300       | 2450       |
      | 10         | CLARK      | 2450       | 2450       |
      | 10         | KING       | 5000       | 2450       |
      | 10         | MILLER     | 1300       | 2450       |
      | 10         | JACCKA     | 5000       | 2450       |
      | 10         | WELAN      | 2450       | 2450       |   -- This row is the sixth row of this window. 
      | 20         | FORD       | 3000       | NULL       |
      | 20         | SCOTT      | 3000       | NULL       |
      | 20         | SMITH      | 800        | NULL       |
      | 20         | ADAMS      | 1100       | NULL       |
      | 20         | JONES      | 2975       | NULL       |   -- This current window has less than six rows, and null is returned. 
      | 30         | TURNER     | 1500       | 2850       |
      | 30         | JAMES      | 950        | 2850       |
      | 30         | ALLEN      | 1600       | 2850       |
      | 30         | WARD       | 1250       | 2850       |
      | 30         | MARTIN     | 1250       | 2850       |
      | 30         | BLAKE      | 2850       | 2850       |   -- This row is the sixth row of the current window. 
      +------------+------------+------------+------------+
    • If order by is specified, the rows from the first row to the current row belong to the current window. The value of the sixth row in the current window is returned.

      select deptno, ename, sal, nth_value(sal,6) over (partition by deptno order by sal) as nth_value from emp;

      The following result is returned:

      +------------+------------+------------+------------+
      | deptno     | ename      | sal        | nth_value  |
      +------------+------------+------------+------------+
      | 10         | TEBAGE     | 1300       | NULL       |   
      | 10         | MILLER     | 1300       | NULL       |   -- This window has only two rows, and null is returned. 
      | 10         | CLARK      | 2450       | NULL       |
      | 10         | WELAN      | 2450       | NULL       |
      | 10         | KING       | 5000       | 5000       |  
      | 10         | JACCKA     | 5000       | 5000       |
      | 20         | SMITH      | 800        | NULL       |
      | 20         | ADAMS      | 1100       | NULL       |
      | 20         | JONES      | 2975       | NULL       |
      | 20         | SCOTT      | 3000       | NULL       |
      | 20         | FORD       | 3000       | NULL       |
      | 30         | JAMES      | 950        | NULL       |
      | 30         | MARTIN     | 1250       | NULL       |
      | 30         | WARD       | 1250       | NULL       |
      | 30         | TURNER     | 1500       | NULL       |
      | 30         | ALLEN      | 1600       | NULL       |
      | 30         | BLAKE      | 2850       | 2850       |
      +------------+------------+------------+------------+

PERCENT_RANK

  • Syntax

    double percent_rank() over([partition_clause] [orderby_clause])
  • Description

    Calculates the percentile rank of the current row in a partition based on orderby_clause.

  • Parameters

    partition_clause and orderby_clause: For more information about these parameters, see windowing_definition.

  • Return value

    A value of the DOUBLE type is returned. The valid value range is [0.0, 1.0]. The return value is calculated by using the following formula: "(rank - 1)/(partition_row_count - 1)". rank indicates the return value of the RANK function that corresponds to the current row. partition_row_count indicates the number of rows in the partition to which the current row belongs. If the partition contains only one row of data, 0.0 is returned.

  • Examples

    Calculate the percentile rank of each employee in a group based on the sal column. Sample statement:

    select deptno, ename, sal, percent_rank() over (partition by deptno order by sal desc) as sal_new from emp;

    The following result is returned:

    +------------+------------+------------+------------+
    | deptno     | ename      | sal        | sal_new    |
    +------------+------------+------------+------------+
    | 10         | JACCKA     | 5000       | 0.0        |
    | 10         | KING       | 5000       | 0.0        |
    | 10         | CLARK      | 2450       | 0.4        |
    | 10         | WELAN      | 2450       | 0.4        |
    | 10         | TEBAGE     | 1300       | 0.8        |
    | 10         | MILLER     | 1300       | 0.8        |
    | 20         | SCOTT      | 3000       | 0.0        |
    | 20         | FORD       | 3000       | 0.0        |
    | 20         | JONES      | 2975       | 0.5        |
    | 20         | ADAMS      | 1100       | 0.75       |
    | 20         | SMITH      | 800        | 1.0        |
    | 30         | BLAKE      | 2850       | 0.0        |
    | 30         | ALLEN      | 1600       | 0.2        |
    | 30         | TURNER     | 1500       | 0.4        |
    | 30         | MARTIN     | 1250       | 0.6        |
    | 30         | WARD       | 1250       | 0.6        |
    | 30         | JAMES      | 950        | 1.0        |
    +------------+------------+------------+------------+

RANK

  • Syntax

    bigint rank() over ([partition_clause] [orderby_clause])
  • Description

    Returns the rank of the current row in a partition based on the order specified by orderby_clause. The rank counts from 1.

  • Parameters

    partition_clause and orderby_clause: For more information about these parameters, see windowing_definition.

  • Return value

    A value of the BIGINT type is returned. The return value may be duplicate and inconsecutive. The return value is the sequence number of the first row in the group to which the current row belongs. The sequence number of the first row is calculated by using the ROW_NUMBER() function. If orderby_clause is not specified, the values in the returned results are all 1.

  • Examples

    Group all employees based on the deptno column. In each group, sort the employees in descending order based on the sal value to obtain the ordinal numbers of the employees in their own groups. Sample statement:

    select deptno, ename, sal, rank() over (partition by deptno order by sal desc) as nums from emp;

    The following result is returned:

    +------------+------------+------------+------------+
    | deptno     | ename      | sal        | nums       |
    +------------+------------+------------+------------+
    | 10         | JACCKA     | 5000       | 1          |
    | 10         | KING       | 5000       | 1          |
    | 10         | CLARK      | 2450       | 3          |
    | 10         | WELAN      | 2450       | 3          |
    | 10         | TEBAGE     | 1300       | 5          |
    | 10         | MILLER     | 1300       | 5          |
    | 20         | SCOTT      | 3000       | 1          |
    | 20         | FORD       | 3000       | 1          |
    | 20         | JONES      | 2975       | 3          |
    | 20         | ADAMS      | 1100       | 4          |
    | 20         | SMITH      | 800        | 5          |
    | 30         | BLAKE      | 2850       | 1          |
    | 30         | ALLEN      | 1600       | 2          |
    | 30         | TURNER     | 1500       | 3          |
    | 30         | MARTIN     | 1250       | 4          |
    | 30         | WARD       | 1250       | 4          |
    | 30         | JAMES      | 950        | 6          |
    +------------+------------+------------+------------+

ROW_NUMBER

  • Syntax

    row_number() over([partition_clause] [orderby_clause])
  • Description

    Returns the sequence number of the current row in a partition. The sequence number counts from 1.

  • Parameters

    For more information, see windowing_definition. frame_clause is not supported.

  • Return value

    A value of the BIGINT type is returned.

  • Examples

    Group all employees based on the deptno column. In each group, sort the employees in descending order based on the sal value to obtain the sequence numbers of the employees in their respective groups. Sample statement:

    select deptno, ename, sal, row_number() over (partition by deptno order by sal desc) as nums from emp;

    The following result is returned:

    +------------+------------+------------+------------+
    | deptno     | ename      | sal        | nums       |
    +------------+------------+------------+------------+
    | 10         | JACCKA     | 5000       | 1          |
    | 10         | KING       | 5000       | 2          |
    | 10         | CLARK      | 2450       | 3          |
    | 10         | WELAN      | 2450       | 4          |
    | 10         | TEBAGE     | 1300       | 5          |
    | 10         | MILLER     | 1300       | 6          |
    | 20         | SCOTT      | 3000       | 1          |
    | 20         | FORD       | 3000       | 2          |
    | 20         | JONES      | 2975       | 3          |
    | 20         | ADAMS      | 1100       | 4          |
    | 20         | SMITH      | 800        | 5          |
    | 30         | BLAKE      | 2850       | 1          |
    | 30         | ALLEN      | 1600       | 2          |
    | 30         | TURNER     | 1500       | 3          |
    | 30         | MARTIN     | 1250       | 4          |
    | 30         | WARD       | 1250       | 5          |
    | 30         | JAMES      | 950        | 6          |
    +------------+------------+------------+------------+

STDDEV

  • Syntax

    double stddev|stddev_pop([distinct] <expr>) over ([partition_clause] [orderby_clause] [frame_clause])
    decimal stddev|stddev_pop([distinct] <expr>) over ([partition_clause] [orderby_clause] [frame_clause])
  • Description

    Returns the population standard deviation of all input values. This function is also called STDDEV_POP.

  • Parameters

    • expr: required. The expression that is used to calculate the population standard deviation. A value of the DOUBLE or DECIMAL type.

      • If an input value is of the STRING or BIGINT type, the value is implicitly converted into a value of the DOUBLE type before calculation. If it is of another data type, an error is returned.

      • If the input value is null, the row that contains the value is not used for calculation.

      • If the distinct keyword is specified, the population standard deviation of distinct values is calculated.

    • partition_clause, orderby_clause, and frame_clause: For more information about these parameters, see windowing_definition.

  • Return value

    A value of the same data type as expr is returned. If the values of all expressions that are specified by expr are null, null is returned.

  • Examples

    • Example 1: Use the deptno column to define a window and calculate the population standard deviation of the sal column. The order by clause is not specified. This function returns the cumulative population standard deviation of the current window. The current window includes the rows that have the same deptno value. Sample statement:

      select deptno, sal, stddev(sal) over (partition by deptno) from emp;

      The following result is returned:

      +------------+------------+------------+
      | deptno     | sal        | _c2        |
      +------------+------------+------------+
      | 10         | 1300       | 1546.1421524412158 |   -- This row is the first row of this window. The return value is the cumulative population standard deviation of the values from the first row to the sixth row. 
      | 10         | 2450       | 1546.1421524412158 |   -- The return value is the cumulative population standard deviation of the values from the first row to the sixth row. 
      | 10         | 5000       | 1546.1421524412158 |
      | 10         | 1300       | 1546.1421524412158 |
      | 10         | 5000       | 1546.1421524412158 |
      | 10         | 2450       | 1546.1421524412158 |
      | 20         | 3000       | 1004.7387720198718 |
      | 20         | 3000       | 1004.7387720198718 |
      | 20         | 800        | 1004.7387720198718 |
      | 20         | 1100       | 1004.7387720198718 |
      | 20         | 2975       | 1004.7387720198718 |
      | 30         | 1500       | 610.1001739241042 |
      | 30         | 950        | 610.1001739241042 |
      | 30         | 1600       | 610.1001739241042 |
      | 30         | 1250       | 610.1001739241042 |
      | 30         | 1250       | 610.1001739241042 |
      | 30         | 2850       | 610.1001739241042 |
      +------------+------------+------------+
    • Example 2: Disable the Hive-compatible data type edition and use the deptno column to define a window and calculate the population standard deviation of the sal column. The ORDER BY clause is specified. This function returns the cumulative population standard deviation of the values from the first row to the current row in the current window. The current window includes the rows that have the same deptno value. Sample statement:

      -- Disable the Hive-compatible data type edition. 
      set odps.sql.hive.compatible=false;
      -- Execute the following statement: 
      select deptno, sal, stddev(sal) over (partition by deptno order by sal) from emp;

      The following result is returned:

      +------------+------------+------------+
      | deptno     | sal        | _c2        |
      +------------+------------+------------+
      | 10         | 1300       | 0.0        |           -- This row is the first row of this window. 
      | 10         | 1300       | 0.0        |           -- The return value is the cumulative population standard deviation of the values in the first and second rows. 
      | 10         | 2450       | 542.1151989096865 |    -- The return value is the cumulative population standard deviation of the values from the first row to the third row. 
      | 10         | 2450       | 575.0      |           -- The return value is the cumulative population standard deviation of the values from the first row to the fourth row. 
      | 10         | 5000       | 1351.6656391282572 |
      | 10         | 5000       | 1546.1421524412158 |
      | 20         | 800        | 0.0        |
      | 20         | 1100       | 150.0      |
      | 20         | 2975       | 962.4188277460079 |
      | 20         | 3000       | 1024.2947268730811 |
      | 20         | 3000       | 1004.7387720198718 |
      | 30         | 950        | 0.0        |
      | 30         | 1250       | 150.0      |
      | 30         | 1250       | 141.4213562373095 |
      | 30         | 1500       | 194.8557158514987 |
      | 30         | 1600       | 226.71568097509268 |
      | 30         | 2850       | 610.1001739241042 |
      +------------+------------+------------+
    • Example 3: Enable the Hive-compatible data type edition and use the deptno column to define a window and calculate the population standard deviation of the sal column. The ORDER BY clause is specified. This function returns the cumulative population standard deviation of the values from the first row to the row that has the same sal value as the current row in the current window. The population standard deviations for the rows that have the same sal value are the same. The current window includes the rows that have the same deptno value. Sample statement:

      -- Enable the Hive-compatible data type edition. 
      set odps.sql.hive.compatible=true;
      -- Execute the following statement: 
      select deptno, sal, stddev(sal) over (partition by deptno order by sal) from emp;

      The following result is returned:

      +------------+------------+------------+
      | deptno     | sal        | _c2        |
      +------------+------------+------------+
      | 10         | 1300       | 0.0        |           -- This row is the first row of this window. The population standard deviation for the first row is the cumulative population standard deviation of the values in the first and second rows because the two rows have the same sal value. 
      | 10         | 1300       | 0.0        |           -- The return value is the cumulative population standard deviation of the values in the first and second rows. 
      | 10         | 2450       | 575.0      |           -- The population standard deviation for the third row is the cumulative population standard deviation of the values from the first row to the fourth row because the third and fourth rows have the same sal value. 
      | 10         | 2450       | 575.0      |           -- The return value is the cumulative population standard deviation of the values from the first row to the fourth row. 
      | 10         | 5000       | 1546.1421524412158 |
      | 10         | 5000       | 1546.1421524412158 |
      | 20         | 800        | 0.0        |
      | 20         | 1100       | 150.0      |
      | 20         | 2975       | 962.4188277460079 |
      | 20         | 3000       | 1004.7387720198718 |
      | 20         | 3000       | 1004.7387720198718 |
      | 30         | 950        | 0.0        |
      | 30         | 1250       | 141.4213562373095 |
      | 30         | 1250       | 141.4213562373095 |
      | 30         | 1500       | 194.8557158514987 |
      | 30         | 1600       | 226.71568097509268 |
      | 30         | 2850       | 610.1001739241042 |
      +------------+------------+------------+

STDDEV_SAMP

  • Syntax

    double stddev_samp([distinct] <expr>) over([partition_clause] [orderby_clause] [frame_clause])
    decimal stddev_samp([distinct] <expr>) over([partition_clause] [orderby_clause] [frame_clause])
  • Description

    Returns the sample standard deviation of all the input values.

  • Parameters

    • expr: required. This parameter specifies the expression that is used to calculate the sample standard deviation. A value of the DOUBLE or DECIMAL type.

      • If an input value is of the STRING or BIGINT type, the value is implicitly converted into a value of the DOUBLE type before calculation. If it is of another data type, an error is returned.

      • If the input value is null, the row that contains the value is not used for calculation.

      • If the distinct keyword is specified, the sample standard deviation of distinct values is calculated.

    • partition_clause, orderby_clause, and frame_clause: For more information about these parameters, see windowing_definition.

  • Return value

    A value of the same data type as expr is returned. If the values of all expressions that are specified by expr are null, null is returned. If the window has only one row of data whose expr value is not null, 0 is returned.

  • Examples

    • Example 1: Use the deptno column to define a window and calculate the sample standard deviation of the sal column. The order by clause is not specified. This function returns the cumulative sample standard deviation of the current window. The current window includes the rows that have the same deptno value. Sample statement:

      select deptno, sal, stddev_samp(sal) over (partition by deptno) from emp;

      The following result is returned:

      +------------+------------+------------+
      | deptno     | sal        | _c2        |
      +------------+------------+------------+
      | 10         | 1300       | 1693.7138680032904 |   -- This row is the first row of this window. The return value is the cumulative sample standard deviation of the values from the first row to the sixth row. 
      | 10         | 2450       | 1693.7138680032904 |   -- The return value is the cumulative sample standard deviation of the values from the first row to the sixth row. 
      | 10         | 5000       | 1693.7138680032904 |   -- The return value is the cumulative sample standard deviation of the values from the first row to the sixth row. 
      | 10         | 1300       | 1693.7138680032904 |     
      | 10         | 5000       | 1693.7138680032904 |
      | 10         | 2450       | 1693.7138680032904 |
      | 20         | 3000       | 1123.3320969330487 |
      | 20         | 3000       | 1123.3320969330487 |
      | 20         | 800        | 1123.3320969330487 |
      | 20         | 1100       | 1123.3320969330487 |
      | 20         | 2975       | 1123.3320969330487 |
      | 30         | 1500       | 668.331255192114 |
      | 30         | 950        | 668.331255192114 |
      | 30         | 1600       | 668.331255192114 |
      | 30         | 1250       | 668.331255192114 |
      | 30         | 1250       | 668.331255192114 |
      | 30         | 2850       | 668.331255192114 |
      +------------+------------+------------+
    • Example 2: Use the deptno column to define a window and calculate the sample standard deviation of the sal column. The ORDER BY clause is specified. This function returns the cumulative sample standard deviation of the values from the first row to the current row in the current window. The current window includes the rows that have the same deptno value. Sample statement:

      select deptno, sal, stddev_samp(sal) over (partition by deptno order by sal) from emp;

      The following result is returned:

      +------------+------------+------------+
      | deptno     | sal        | _c2        |
      +------------+------------+------------+
      | 10         | 1300       | 0.0        |          -- This row is the first row of this window. 
      | 10         | 1300       | 0.0        |          -- The return value is the cumulative sample standard deviation of the values in the first and second rows. 
      | 10         | 2450       | 663.9528095680697 |   -- The return value is the cumulative sample standard deviation of the values from the first row to the third row. 
      | 10         | 2450       | 663.9528095680696 |
      | 10         | 5000       | 1511.2081259707413 |
      | 10         | 5000       | 1693.7138680032904 |
      | 20         | 800        | 0.0        |
      | 20         | 1100       | 212.13203435596427 |
      | 20         | 2975       | 1178.7175234126282 |
      | 20         | 3000       | 1182.7536725793752 |
      | 20         | 3000       | 1123.3320969330487 |
      | 30         | 950        | 0.0        |
      | 30         | 1250       | 212.13203435596427 |
      | 30         | 1250       | 173.20508075688772 |
      | 30         | 1500       | 225.0      |
      | 30         | 1600       | 253.4758371127315 |
      | 30         | 2850       | 668.331255192114 |
      +------------+------------+------------+

SUM

  • Syntax

    sum([distinct] <expr>) over ([partition_clause] [orderby_clause] [frame_clause])
  • Description

    Returns the sum of expr in a window.

  • Parameters

    • expr: required. This parameter specifies the column whose sum you want to calculate. The column is of the DOUBLE, DECIMAL, or BIGINT type.

      • If an input value is of the STRING type, the value is implicitly converted into a value of the DOUBLE type before calculation. If it is of another data type, an error is returned.

      • If the input value is null, the row that contains the value is not used for calculation.

      • If the distinct keyword is specified, the sum of distinct values is calculated.

    • partition_clause, orderby_clause, and frame_clause: For more information about these parameters, see windowing_definition.

  • Return value

    • If the input values are of the BIGINT type, a value of the BIGINT type is returned.

    • If the input values are of the DECIMAL type, a value of the DECIMAL type is returned.

    • If input values are of the DOUBLE or STRING type, a value of the DOUBLE type is returned.

    • If input values are null, null is returned.

  • Examples

    • Example 1: Use the deptno column to define a window and calculate the sum of the sal column. The order by clause is not specified. This function returns the cumulative sum of the current window. The current window includes the rows that have the same deptno value. Sample statement:

      select deptno, sal, sum(sal) over (partition by deptno) from emp;

      The following result is returned:

      +------------+------------+------------+
      | deptno     | sal        | _c2        |
      +------------+------------+------------+
      | 10         | 1300       | 17500      |   -- This row is the first row of this window. The return value is the cumulative sum of the values from the first row to the sixth row. 
      | 10         | 2450       | 17500      |   -- The return value is the cumulative sum of the values from the first row to the sixth row. 
      | 10         | 5000       | 17500      |   -- The return value is the cumulative sum of the values from the first row to the sixth row. 
      | 10         | 1300       | 17500      |
      | 10         | 5000       | 17500      |
      | 10         | 2450       | 17500      |
      | 20         | 3000       | 10875      |
      | 20         | 3000       | 10875      |
      | 20         | 800        | 10875      |
      | 20         | 1100       | 10875      |
      | 20         | 2975       | 10875      |
      | 30         | 1500       | 9400       |
      | 30         | 950        | 9400       |
      | 30         | 1600       | 9400       |
      | 30         | 1250       | 9400       |
      | 30         | 1250       | 9400       |
      | 30         | 2850       | 9400       |
      +------------+------------+------------+
    • Example 2: Disable the Hive-compatible data type edition and use the deptno column to define a window and calculate the sum of the sal column. The ORDER BY clause is specified. This function returns the cumulative sum of the values from the first row to the current row in the current window. The current window includes the rows that have the same deptno value. Sample statement:

      -- Disable the Hive-compatible data type edition. 
      set odps.sql.hive.compatible=false;
      -- Execute the following statement: 
      select deptno, sal, sum(sal) over (partition by deptno order by sal) from emp;

      The following result is returned:

      +------------+------------+------------+
      | deptno     | sal        | _c2        |
      +------------+------------+------------+
      | 10         | 1300       | 1300       |   -- This row is the first row of this window. 
      | 10         | 1300       | 2600       |   -- The return value is the cumulative sum of the values in the first and second rows. 
      | 10         | 2450       | 5050       |   -- The return value is the cumulative sum of the values from the first row to the third row. 
      | 10         | 2450       | 7500       |
      | 10         | 5000       | 12500      |
      | 10         | 5000       | 17500      |
      | 20         | 800        | 800        |
      | 20         | 1100       | 1900       |
      | 20         | 2975       | 4875       |
      | 20         | 3000       | 7875       |
      | 20         | 3000       | 10875      |
      | 30         | 950        | 950        |
      | 30         | 1250       | 2200       |
      | 30         | 1250       | 3450       |
      | 30         | 1500       | 4950       |
      | 30         | 1600       | 6550       |
      | 30         | 2850       | 9400       |
      +------------+------------+------------+
    • Example 3: Enable the Hive-compatible data type edition and use the deptno column to define a window and calculate the sum of the sal column. The ORDER BY clause is specified. This function returns the cumulative sum of the values from the first row to the row that has the same sal value as the current row in the current window. The sum values for the rows that have the same sal value are the same. The current window includes the rows that have the same deptno value. Sample statement:

      -- Enable the Hive-compatible data type edition. 
      set odps.sql.hive.compatible=true;
      -- Execute the following statement: 
      select deptno, sal, sum(sal) over (partition by deptno order by sal) from emp;

      The following result is returned:

      +------------+------------+------------+
      | deptno     | sal        | _c2        |
      +------------+------------+------------+
      | 10         | 1300       | 2600       |   -- This row is the first row of this window. The sum for the first row is the cumulative sum of the values in the first and second rows because the two rows have the same sal value. 
      | 10         | 1300       | 2600       |   -- The return value is the cumulative sum of the values in the first and second rows. 
      | 10         | 2450       | 7500       |   -- The sum for the third row is the cumulative sum of the values from the first row to the fourth row because the third and fourth rows have the same sal value. 
      | 10         | 2450       | 7500       |   -- The return value is the cumulative sum of the values from the first row to the fourth row. 
      | 10         | 5000       | 17500      |
      | 10         | 5000       | 17500      |
      | 20         | 800        | 800        |
      | 20         | 1100       | 1900       |
      | 20         | 2975       | 4875       |
      | 20         | 3000       | 10875      |
      | 20         | 3000       | 10875      |
      | 30         | 950        | 950        |
      | 30         | 1250       | 3450       |
      | 30         | 1250       | 3450       |
      | 30         | 1500       | 4950       |
      | 30         | 1600       | 6550       |
      | 30         | 2850       | 9400       |
      +------------+------------+------------+