Window functions supported by the SQL analysis feature - IoT Platform

Limits

Before you use window functions, take note of the following limits:

Window functions are supported only in SELECT statements.
A window function cannot contain nested window functions or nested aggregate functions.
You cannot use window functions together with aggregate functions of the same level.

Syntax

Syntax of window functions:

<function_name>([distinct][<expression> [, ...]]) over (<window_definition>)
<function_name>([distinct][<expression> [, ...]]) over <window_name>

function_name: the name of a built-in window function.
expression: the format of the window function. The format is subject to the function syntax.
windowing_definition: the definition of a window. For information about the syntax of windowing_definition, see windowing_definition.
window_name: the name of a window. You can use the window keyword to configure a window and use windowing_definition to specify the name of the window. Syntax of named_window_def:
```
window <window_name> as (<window_definition>)
```
Position of named_window_def in an SQL statement:
```
select ... from ... [where ...] [group by ...] [having ...] named_window_def [order by ...] [limit ...]
```

windowing_definition

Syntax

--partition_clause:
[partition by <expression> [, ...]]
--orderby_clause:
[order by <expression> [asc|desc][nulls {first|last}] [, ...]]
[<frame_clause>]

If you use a window function in a SELECT statement, data is partitioned and sorted based on PARTITION BY and ORDER BY in windowing_definition when the window function is executed. If the SELECT statement does not include partition by, only one partition exists and the partition contains all data. If the SELECT statement does not include order by, data in a partition is arranged in a random order, and a data stream is generated. After the data stream is generated, a group of rows is extracted from the data stream based on frame_clause in windowing_definition to create a window for the current row. The window function calculates the data included in the window to which the current row belongs.

partition by <expression> [, ...]: optional. This parameter specifies the partition information. If the values of partition key columns are the same for a group of rows, these rows are included in the same window. For more information about the format of PARTITION BY, see Table operations.
order by <expression> [asc|desc][nulls {first|last}] [, ...]: optional. This parameter specifies how to sort rows of data in a window.

Note
If the values of the column that is specified in order by are the same, the sorting result may not be accurate. To reduce the random ordering of data, make sure that the values of the column that is specified in order by are unique.
frame_clause: optional. This parameter is used to determine the data boundaries of a window. For more information about frame_clause, see frame_clause.

frame_clause

Syntax

-- Syntax 1 
{ROWS|RANGE|GROUPS} <frame_start> [<frame_exclusion>]
-- Syntax 2 
{ROWS|RANGE|GROUPS} between <frame_start> and <frame_end> [<frame_exclusion>]

frame_clause is a closed interval that is used to determine the data boundaries of a window. The data boundaries are determined based on the rows that are specified by frame_start and frame_end.

ROWS|RANGE|GROUPS: required. ROWS, RANGE, and GROUPS indicate the types of frame_clause. The implementation rules of frame_start and frame_end vary based on the type of frame_clause. Description:
- ROWS: The data boundaries of a window are determined based on the number of rows.
- RANGE: The data boundaries of a window are determined based on the comparison results of the values of the column that is specified in order by. In most cases, order by is specified in windowing_definition. If order by is not specified in windowing_definition, the values of the column that is specified in order by are the same for all rows in a partition. NULL values are considered equivalent.
- GROUPS: In a partition, rows that have the same value of the column specified in order by form a group. If order by is not specified, all rows in the partition form a group. NULL values are considered equivalent.

frame_start and frame_end: the start and end rows of a window. frame_start is required. frame_end is optional. If frame_end is not specified, the default value CURRENT ROW is used.

The row specified by frame_start must precede or be the same as the row specified by frame_end. Compared with the row specified by frame_end, the row specified by frame_start is closer to the first row in a window after all data in the window is sorted based on the column that is specified in order by of windowing_definition. The following table describes the valid values and logic of frame_start and frame_end when the type of frame_clause is ROWS, RANGE, or GROUPS.

frame_clause type	Value of frame_start or frame_end	Description
ROWS, RANGE, and GROUPS	UNBOUNDED PRECEDING	Indicates the first row of a partition. Rows are counted from 1.
ROWS, RANGE, and GROUPS	UNBOUNDED FOLLOWING	Indicates the last row of a partition.
ROWS	CURRENT ROW	Indicates the current row. Each row of data corresponds to a result calculated by a window function. The current row is the row whose data is calculated by using a window function.
	offset PRECEDING	Indicates the Nth row that precedes the current row at a specified `offset`. For example, if `0 PRECEDING` indicates the current row, `1 PRECEDING` indicates the previous row. `offset` must be a non-negative integer.
	offset FOLLOWING	Indicates the Nth row that follows the current row at a specified `offset`. For example, if `0 FOLLOWING` indicates the current row, `1 FOLLOWING` indicates the next row. `offset` must be a non-negative integer.
RANGE	CURRENT ROW	If frame_start is set to CURRENT ROW, the first row that has the same value of the column specified in `order by` is used as the current row. If frame_end is set to CURRENT ROW, the last row that has the same value of the column specified in `order by` is used as the current row.
	offset PRECEDING	The rows that are specified by frame_start and frame_end are determined based on the sorting order that is specified by `order by`. For example, if data in a window is sorted by X, Xi specifies the X value that corresponds to the ith row, and Xc specifies the X value that corresponds to the current row. Positions of rows specified by frame_start and frame_end: `order by` is set to asc: frame_start specifies the first row that meets the following requirement: `Xc - Xi ≤ offset`. frame_end specifies the last row that meets the following requirement: `Xc - Xi ≥ offset`. `order by` is set to desc: frame_start specifies the first row that meets the following requirement: `Xi - Xc ≤ offset`. frame_end specifies the last row that meets the following requirement: `Xi - Xc ≥ offset`. The column that is specified in `order by` can be of the following data types: TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, DECIMAL, DATETIME, DATE, and TIMESTAMP. Syntax for `offset` of the DATE type: `N`: indicates N days or N seconds. It must be a non-negative integer. For an offset of the DATETIME or TIMESTAMP type, N is N seconds. For an offset of the DATE type, N is N days. `interval 'N' {YEAR\MONTH\DAY\HOUR\MINUTE\SECOND}`: indicates N years, months, days, hours, minutes, or seconds. For example, `INTERVAL '3' YEAR` indicates 3 years. `INTERVAL 'N-M' YEAR TO MONTH`: indicates N years and M months. For example, `INTERVAL '1-3' YEAR TO MONTH` indicates 1 year and 3 months. `INTERVAL 'D[ H[:M[:S[:N]]]]' DAY TO SECOND`: indicates D days, H hours, M minutes, S seconds, and N nanoseconds. For example, `INTERVAL '1 2:3:4:5' DAY TO SECOND` indicates 1 day, 2 hours, 3 minutes, 4 seconds, and 5 nanoseconds.
	offset FOLLOWING	The rows that are specified by frame_start and frame_end are determined based on the sorting order that is specified by `order by`. For example, if data in a window is sorted by X, Xi specifies the X value that corresponds to the ith row, and Xc specifies the X value that corresponds to the current row. Positions of rows specified by frame_start and frame_end: `order by` is set to asc: frame_start specifies the first row that meets the following requirement: `Xi - Xc ≥ offset`. frame_end specifies the last row that meets the following requirement: `Xi - Xc ≤ offset`. `order by` is set to desc: frame_start specifies the first row that meets the following requirement: `Xc - Xi >= offset`. frame_end specifies the last row that meets the following requirement: `Xc - Xi <= offset`.
GROUPS	CURRENT ROW	If frame_start is set to CURRENT ROW, the first row of the group to which the current row belongs is used. If frame_end is set to CURRENT ROW, the last row of the group to which the current row belongs is used.
	offset PRECEDING	If frame_start is set to offset PRECEDING, the first row of the Nth group that precedes the group of the current row at a specified `offset` is used. If frame_end is set to offset PRECEDING, the last row of the Nth group that precedes the group of the current row at a specified `offset` is used. Note You cannot set frame_start to UNBOUNDED FOLLOWING, and you cannot set frame_end to UNBOUNED PRECEDING.
	offset FOLLOWING	If frame_start is set to offset FOLLOWING, the first row of the Nth group that follows the group of the current row at a specified `offset` is used. If frame_end is set to offset FOLLOWING, the last row of the Nth group that follows the group of the current row at a specified `offset` is used. Note You cannot set frame_start to UNBOUNDED FOLLOWING, and you cannot set frame_end to UNBOUNED PRECEDING.

frame_exclusion: optional. This parameter is used to remove specific rows from a window. Valid values:
- EXCLUDE NO OTHERS: No rows are excluded from the window.
- EXCLUDE CURRENT ROW: The current row is excluded from the window.
- EXCLUDE GROUP: indicates that an entire group of rows in a partition is excluded from the window. In the group, all rows have the same value of the column that is specified in order by as the current row.
- EXCLUDE TIES: An entire group of rows, except for the current row, are excluded from the window.

Default frame_clause

If you do not configure frame_clause, the default value of frame_clause in the following format. The format is the same as the format that is used by most SQL-based systems.

RANGE between UNBOUNDED PRECEDING and CURRENT ROW EXCLUDE NO OTHERS

Example of data boundaries of a window

In this example, a table named tbl contains three columns that are of the BIGINT type: pid, oid, and rid. The tbl table contains the following data:

+------------+------------+------------+
| pid        | oid        | rid        |
+------------+------------+------------+
| 1          | NULL       | 1          |
| 1          | NULL       | 2          |
| 1          | 1          | 3          |
| 1          | 1          | 4          |
| 1          | 2          | 5          |
| 1          | 4          | 6          |
| 1          | 7          | 7          |
| 1          | 11         | 8          |
| 2          | NULL       | 9          |
| 2          | NULL       | 10         |
+------------+------------+------------+

You can replace ellipses (...) in the following SQL statements with windowing_definition to display the data in the windows in which each row of data is included.

Note

If a value in the window column in the returned result is NULL, no data is contained in the window.

Windows in ROWS mode

windowing_definition 1

partition by pid order by oid ROWS between UNBOUNDED PRECEDING and CURRENT ROW
-- Sample SQL statement: 
select pid, 
       oid, 
       rid, 
collect_list(rid) over(partition by pid order by 
oid ROWS between UNBOUNDED PRECEDING and CURRENT ROW) as window from tbl;

Sample result:

+------------+------------+------------+--------+
| pid        | oid        | rid        | window |
+------------+------------+------------+--------+
| 1          | NULL       | 1          | [1]    |
| 1          | NULL       | 2          | [1, 2] |
| 1          | 1          | 3          | [1, 2, 3] |
| 1          | 1          | 4          | [1, 2, 3, 4] |
| 1          | 2          | 5          | [1, 2, 3, 4, 5] |
| 1          | 4          | 6          | [1, 2, 3, 4, 5, 6] |
| 1          | 7          | 7          | [1, 2, 3, 4, 5, 6, 7] |
| 1          | 11         | 8          | [1, 2, 3, 4, 5, 6, 7, 8] |
| 2          | NULL       | 9          | [9]    |
| 2          | NULL       | 10         | [9, 10] |
+------------+------------+------------+--------+

windowing_definition 2

partition by pid order by oid ROWS between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING
-- Sample SQL statement: 
select pid, 
       oid, 
       rid, 
collect_list(rid) over(partition by pid order by 
oid ROWS between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING) as window from tbl;

Sample result:

+------------+------------+------------+--------+
| pid        | oid        | rid        | window |
+------------+------------+------------+--------+
| 1          | NULL       | 1          | [1, 2, 3, 4, 5, 6, 7, 8] |
| 1          | NULL       | 2          | [1, 2, 3, 4, 5, 6, 7, 8] |
| 1          | 1          | 3          | [1, 2, 3, 4, 5, 6, 7, 8] |
| 1          | 1          | 4          | [1, 2, 3, 4, 5, 6, 7, 8] |
| 1          | 2          | 5          | [1, 2, 3, 4, 5, 6, 7, 8] |
| 1          | 4          | 6          | [1, 2, 3, 4, 5, 6, 7, 8] |
| 1          | 7          | 7          | [1, 2, 3, 4, 5, 6, 7, 8] |
| 1          | 11         | 8          | [1, 2, 3, 4, 5, 6, 7, 8] |
| 2          | NULL       | 9          | [9, 10] |
| 2          | NULL       | 10         | [9, 10] |
+------------+------------+------------+--------+

windowing_definition 3

partition by pid order by oid ROWS between 1 FOLLOWING and 3 FOLLOWING
-- Sample SQL statement: 
select pid, 
       oid, 
       rid, 
collect_list(rid) over(partition by pid order by 
oid ROWS between 1 FOLLOWING and 3 FOLLOWING) as window from tbl;

Sample result:

+------------+------------+------------+--------+
| pid        | oid        | rid        | window |
+------------+------------+------------+--------+
| 1          | NULL       | 1          | [2, 3, 4] |
| 1          | NULL       | 2          | [3, 4, 5] |
| 1          | 1          | 3          | [4, 5, 6] |
| 1          | 1          | 4          | [5, 6, 7] |
| 1          | 2          | 5          | [6, 7, 8] |
| 1          | 4          | 6          | [7, 8] |
| 1          | 7          | 7          | [8]    |
| 1          | 11         | 8          | NULL   |
| 2          | NULL       | 9          | [10]   |
| 2          | NULL       | 10         | NULL   |
+------------+------------+------------+--------+

windowing_definition 4

partition by pid order by oid ROWS between UNBOUNDED PRECEDING and CURRENT ROW EXCLUDE CURRENT ROW
-- Sample SQL statement: 
select pid, 
oid, 
rid, 
collect_list(rid) over(partition by pid order by 
oid ROWS between UNBOUNDED PRECEDING and CURRENT ROW EXCLUDE CURRENT ROW) as window from tbl;

Sample result:

+------------+------------+------------+--------+
| pid        | oid        | rid        | window |
+------------+------------+------------+--------+
| 1          | NULL       | 1          | NULL   |
| 1          | NULL       | 2          | [1]    |
| 1          | 1          | 3          | [1, 2] |
| 1          | 1          | 4          | [1, 2, 3] |
| 1          | 2          | 5          | [1, 2, 3, 4] |
| 1          | 4          | 6          | [1, 2, 3, 4, 5] |
| 1          | 7          | 7          | [1, 2, 3, 4, 5, 6] |
| 1          | 11         | 8          | [1, 2, 3, 4, 5, 6, 7] |
| 2          | NULL       | 9          | NULL   |
| 2          | NULL       | 10         | [9]    |
+------------+------------+------------+--------+

windowing_definition 5

partition by pid order by oid ROWS between UNBOUNDED PRECEDING and CURRENT ROW EXCLUDE GROUP
-- Sample SQL statement: 
select pid, 
       oid, 
       rid, 
collect_list(rid) over(partition by pid order by oid ROWS between UNBOUNDED PRECEDING and CURRENT ROW EXCLUDE GROUP) as window from tbl;

Sample result:

+------------+------------+------------+--------+
| pid        | oid        | rid        | window |
+------------+------------+------------+--------+
| 1          | NULL       | 1          | NULL   |
| 1          | NULL       | 2          | NULL   |
| 1          | 1          | 3          | [1, 2] |
| 1          | 1          | 4          | [1, 2] |
| 1          | 2          | 5          | [1, 2, 3, 4] |
| 1          | 4          | 6          | [1, 2, 3, 4, 5] |
| 1          | 7          | 7          | [1, 2, 3, 4, 5, 6] |
| 1          | 11         | 8          | [1, 2, 3, 4, 5, 6, 7] |
| 2          | NULL       | 9          | NULL   |
| 2          | NULL       | 10         | NULL   |
+------------+------------+------------+--------+

windowing_definition 6

partition by pid order by oid ROWS between UNBOUNDED PRECEDING and CURRENT ROW EXCLUDE TIES
-- Sample SQL statement: 
select pid, 
       oid, 
       rid, 
collect_list(rid) over(partition by pid order by 
oid ROWS between UNBOUNDED PRECEDING and CURRENT ROW EXCLUDE TIES) as window from tbl;

Sample result:

+------------+------------+------------+--------+
| pid        | oid        | rid        | window |
+------------+------------+------------+--------+
| 1          | NULL       | 1          | [1]    |
| 1          | NULL       | 2          | [2]    |
| 1          | 1          | 3          | [1, 2, 3] |
| 1          | 1          | 4          | [1, 2, 4] |
| 1          | 2          | 5          | [1, 2, 3, 4, 5] |
| 1          | 4          | 6          | [1, 2, 3, 4, 5, 6] |
| 1          | 7          | 7          | [1, 2, 3, 4, 5, 6, 7] |
| 1          | 11         | 8          | [1, 2, 3, 4, 5, 6, 7, 8] |
| 2          | NULL       | 9          | [9]    |
| 2          | NULL       | 10         | [10]   |
+------------+------------+------------+--------+

The differences between EXCLUDE CURRENT ROW and EXCLUDE GROUP can be obtained based on the comparison between the window column values of rows with the rid column values of 2, 4, and 10 in windowing_definition 5 and windowing_definition 6. If frame_exclusion is set to EXCLUDE GROUP, the rows that have the same pid column value in a partition are extracted when the rows have the same oid column value as the current row.

Windows in RANGE mode

windowing_definition 1

partition by pid order by oid RANGE between UNBOUNDED PRECEDING and CURRENT ROW
-- Sample SQL statement: 
select pid, 
       oid, 
       rid, 
collect_list(rid) over(partition by pid order by 
oid RANGE between UNBOUNDED PRECEDING and CURRENT ROW) as window from tbl;

Sample result:

+------------+------------+------------+--------+
| pid        | oid        | rid        | window |
+------------+------------+------------+--------+
| 1          | NULL       | 1          | [1, 2] |
| 1          | NULL       | 2          | [1, 2] |
| 1          | 1          | 3          | [1, 2, 3, 4] |
| 1          | 1          | 4          | [1, 2, 3, 4] |
| 1          | 2          | 5          | [1, 2, 3, 4, 5] |
| 1          | 4          | 6          | [1, 2, 3, 4, 5, 6] |
| 1          | 7          | 7          | [1, 2, 3, 4, 5, 6, 7] |
| 1          | 11         | 8          | [1, 2, 3, 4, 5, 6, 7, 8] |
| 2          | NULL       | 9          | [9, 10] |
| 2          | NULL       | 10         | [9, 10] |
+------------+------------+------------+--------+

If frame_end is set to CURRENT ROW, the last row that has the same value of the oid column in order by as the current row is obtained. Therefore, the window column value of the row whose rid column value is 1 is [1, 2].

windowing_definition 2

partition by pid order by oid RANGE between CURRENT ROW and UNBOUNDED FOLLOWING
-- Sample SQL statement: 
select pid, 
       oid, 
       rid, 
collect_list(rid) over(partition by pid order by 
oid RANGE between CURRENT ROW and UNBOUNDED FOLLOWING) as window from tbl;

Sample result:

+------------+------------+------------+--------+
| pid        | oid        | rid        | window |
+------------+------------+------------+--------+
| 1          | NULL       | 1          | [1, 2, 3, 4, 5, 6, 7, 8] |
| 1          | NULL       | 2          | [1, 2, 3, 4, 5, 6, 7, 8] |
| 1          | 1          | 3          | [3, 4, 5, 6, 7, 8] |
| 1          | 1          | 4          | [3, 4, 5, 6, 7, 8] |
| 1          | 2          | 5          | [5, 6, 7, 8] |
| 1          | 4          | 6          | [6, 7, 8] |
| 1          | 7          | 7          | [7, 8] |
| 1          | 11         | 8          | [8]    |
| 2          | NULL       | 9          | [9, 10] |
| 2          | NULL       | 10         | [9, 10] |
+------------+------------+------------+--------+

windowing_definition 3

partition by pid order by oid RANGE between 3 PRECEDING and 1 PRECEDING
-- Sample SQL statement: 

select pid, 
       oid, 
       rid, 
collect_list(rid) over(partition by pid order by 
oid RANGE between 3 PRECEDING and 1 PRECEDING) as window from tbl;

Sample result:

+------------+------------+------------+--------+
| pid        | oid        | rid        | window |
+------------+------------+------------+--------+
| 1          | NULL       | 1          | [1, 2] |
| 1          | NULL       | 2          | [1, 2] |
| 1          | 1          | 3          | NULL   |
| 1          | 1          | 4          | NULL   |
| 1          | 2          | 5          | [3, 4] |
| 1          | 4          | 6          | [3, 4, 5] |
| 1          | 7          | 7          | [6]    |
| 1          | 11         | 8          | NULL   |
| 2          | NULL       | 9          | [9, 10] |
| 2          | NULL       | 10         | [9, 10] |
+------------+------------+------------+--------+

For the row whose value of oid in order by is NULL, if frame_start is set to offset PRECEDING or offset FOLLOWING, the row is the first row whose value of oid in order by is NULL only if the offset is not unbounded. If frame_end is set to offset PRECEDING or offset FOLLOWING, the row is the last row whose value of oid in order by is NULL only if the offset is not unbounded.

Windows in GROUPS mode

windowing_definition

partition by pid order by oid GROUPS between 2 PRECEDING and CURRENT ROW
-- Sample SQL statement: 
select pid, 
       oid, 
       rid, 
collect_list(rid) over(partition by pid order by 
oid GROUPS between 2 PRECEDING and CURRENT ROW) as window from tbl;

Sample result:

+------------+------------+------------+--------+
| pid        | oid        | rid        | window |
+------------+------------+------------+--------+
| 1          | NULL       | 1          | [1, 2] |
| 1          | NULL       | 2          | [1, 2] |
| 1          | 1          | 3          | [1, 2, 3, 4] |
| 1          | 1          | 4          | [1, 2, 3, 4] |
| 1          | 2          | 5          | [1, 2, 3, 4, 5] |
| 1          | 4          | 6          | [3, 4, 5, 6] |
| 1          | 7          | 7          | [5, 6, 7] |
| 1          | 11         | 8          | [6, 7, 8] |
| 2          | NULL       | 9          | [9, 10] |
| 2          | NULL       | 10         | [9, 10] |
+------------+------------+------------+--------+

Sample data

This section provides sample source data for the function examples below. Create a table named emp and insert the sample data into the table. Sample statement:

create table if not exists emp
   (empno bigint,
    ename string,
    job string,
    mgr bigint,
    hiredate datetime,
    sal bigint,
    comm bigint,
    deptno bigint);
tunnel upload emp.txt emp;

The emp.txt file contains the following sample data:

7369,SMITH,CLERK,7902,1980-12-17 00:00:00,800,,20
7499,ALLEN,SALESMAN,7698,1981-02-20 00:00:00,1600,300,30
7521,WARD,SALESMAN,7698,1981-02-22 00:00:00,1250,500,30
7566,JONES,MANAGER,7839,1981-04-02 00:00:00,2975,,20
7654,MARTIN,SALESMAN,7698,1981-09-28 00:00:00,1250,1400,30
7698,BLAKE,MANAGER,7839,1981-05-01 00:00:00,2850,,30
7782,CLARK,MANAGER,7839,1981-06-09 00:00:00,2450,,10
7788,SCOTT,ANALYST,7566,1987-04-19 00:00:00,3000,,20
7839,KING,PRESIDENT,,1981-11-17 00:00:00,5000,,10
7844,TURNER,SALESMAN,7698,1981-09-08 00:00:00,1500,0,30
7876,ADAMS,CLERK,7788,1987-05-23 00:00:00,1100,,20
7900,JAMES,CLERK,7698,1981-12-03 00:00:00,950,,30
7902,FORD,ANALYST,7566,1981-12-03 00:00:00,3000,,20
7934,MILLER,CLERK,7782,1982-01-23 00:00:00,1300,,10
7948,JACCKA,CLERK,7782,1981-04-12 00:00:00,5000,,10
7956,WELAN,CLERK,7649,1982-07-20 00:00:00,2450,,10
7956,TEBAGE,CLERK,7748,1982-12-30 00:00:00,1300,,10

AVG

Syntax

double avg([distinct] double <expr>) over ([partition_clause] [orderby_clause] [frame_clause])
decimal avg([distinct] decimal <expr>) over ([partition_clause] [orderby_clause] [frame_clause])

Description

Returns the average value of expr in a window.
Description
- expr: required. This parameter specifies the expression that is used to calculate the returned result. The value is of the DOUBLE or DECIMAL type.
  - If the input value is of the STRING or BIGINT type, it is implicitly converted into a value of the DOUBLE type before calculation. If it is of another data type, an error is returned.
  - If the input value is NULL, the row that contains the value is not used for calculation.
  - If the distinct keyword is specified, the average value of distinct values is calculated.
- partition_clause, orderby_clause, and frame_clause: For more information about these parameters, see windowing_definition.
Return value

If the input value of expr is of the DECIMAL type, a value of the DECIMAL type is returned. If the input value is of another data type, a value of the DOUBLE type is returned. If the input value of expr is NULL, NULL is returned.

Examples

Example 1: Use the deptno column to define a window and calculate the average value of the sal column. ORDER BY is not specified. This function returns the cumulative average value of the values from the first row to the last row in the current window. The current window includes the rows that have the same deptno column value. Sample statement:

select deptno, sal, avg(sal) over (partition by deptno) from emp;

Sample result:

+------------+------------+------------+
| deptno     | sal        | _c2        |
+------------+------------+------------+
| 10         | 1300       | 2916.6666666666665 |   -- This row is the first row of this window. The return value is the cumulative average value of the values from the first row to the sixth row. 
| 10         | 2450       | 2916.6666666666665 |   -- The return value is the cumulative average value of the values from the first row to the sixth row. 
| 10         | 5000       | 2916.6666666666665 |   -- The return value is the cumulative average value of the values from the first row to the sixth row. 
| 10         | 1300       | 2916.6666666666665 |
| 10         | 5000       | 2916.6666666666665 |
| 10         | 2450       | 2916.6666666666665 |
| 20         | 3000       | 2175.0     |
| 20         | 3000       | 2175.0     |
| 20         | 800        | 2175.0     |
| 20         | 1100       | 2175.0     |
| 20         | 2975       | 2175.0     |
| 30         | 1500       | 1566.6666666666667 |
| 30         | 950        | 1566.6666666666667 |
| 30         | 1600       | 1566.6666666666667 |
| 30         | 1250       | 1566.6666666666667 |
| 30         | 1250       | 1566.6666666666667 |
| 30         | 2850       | 1566.6666666666667 |
+------------+------------+------------+

Example 2: In Hive-compatible mode, use the deptno column to define a window and calculate the average value of the sal column. ORDER BY is specified. This function returns the cumulative average value of the values from the first row to the row that has the same sal value as the current row in the current window. The average values for the rows that have the same sal value are the same. The current window includes the rows that have the same deptno value. Sample statement:

-- Execute the following statement: 
select deptno, sal, avg(sal) over (partition by deptno order by sal) from emp;

Sample result:

+------------+------------+------------+
| deptno     | sal        | _c2        |
+------------+------------+------------+
| 10         | 1300       | 1300.0     |          -- This row is the first row of this window. The average value for the first row is the cumulative average value of the values in the first and second rows because the two rows have the same sal value. 
| 10         | 1300       | 1300.0     |          -- The return value is the cumulative average value of the values in the first and second rows. 
| 10         | 2450       | 1875.0     |          -- The average value for the third row is the cumulative average value of the values from the first row to the fourth row because the third and fourth rows have the same sal value. 
| 10         | 2450       | 1875.0     |          -- The return value is the cumulative average value of the values from the first row to the fourth row. 
| 10         | 5000       | 2916.6666666666665 |
| 10         | 5000       | 2916.6666666666665 |
| 20         | 800        | 800.0      |
| 20         | 1100       | 950.0      |
| 20         | 2975       | 1625.0     |
| 20         | 3000       | 2175.0     |
| 20         | 3000       | 2175.0     |
| 30         | 950        | 950.0      |
| 30         | 1250       | 1150.0     |
| 30         | 1250       | 1150.0     |
| 30         | 1500       | 1237.5     |
| 30         | 1600       | 1310.0     |
| 30         | 2850       | 1566.6666666666667 |
+------------+------------+------------+

CLUSTER_SAMPLE

Syntax

boolean cluster_sample(bigint <N>) OVER ([partition_clause])
boolean cluster_sample(bigint <N>, bigint <M>) OVER ([partition_clause])

Description
- cluster_sample(bigint <N>): specifies that N random rows of data are sampled.
- cluster_sample(bigint <N>, bigint <M>): specifies that rows are sampled based on a specified ratio (M/N). The number of rows that are sampled is calculated by using the following formula: partition_row_count × M/N. partition_row_count specifies the number of rows in a partition.
Description
- N: required. A constant of the BIGINT type. If N is set to NULL, NULL is returned.
- M: required. A constant of the BIGINT type. If M is set to NULL, NULL is returned.
- partition_clause: optional. For more information, see windowing_definition.
Return value

The return value is a value of the BOOLEAN type.

Examples

Sample about 20% of data entries in each group. Sample statement:

select deptno, sal
    from (
        select deptno, sal, cluster_sample(5, 1) over (partition by deptno) as flag
        from emp
        ) sub
    where flag = true;

Sample result:

+------------+------------+
| deptno     | sal        |
+------------+------------+
| 10         | 1300       |
| 20         | 3000       |
| 30         | 950        |
+------------+------------+

COUNT

Syntax

bigint count(*) over ([partition_clause] [orderby_clause] [frame_clause])
bigint count([distinct] <expr>[,...]) over ([partition_clause] [orderby_clause] [frame_clause])

Description
- count(*): the total number of rows to return.
- count([distinct] <expr>[,...]): When you calculate the number of rows, the rows whose expr is NULL are ignored. If multiple expr parameters exist, any rows whose expr is NULL are ignored. In addition, if the distinct keyword is specified, the number of rows after deduplication is calculated. Any rows whose expr is NULL are ignored.
Description
- expr: required. This parameter specifies the column whose values you want to count. All data types are supported. If the input value is NULL, the row that contains this value is not used for the calculation. If the DISTINCT keyword is specified, the count value of distinct values is obtained.
- partition_clause, orderby_clause, and frame_clause: For more information about these parameters, see windowing_definition.
Return value

A value of the BIGINT type is returned.

Examples

Example 1: Use the sal column to define a window. ORDER BY is not specified. This function returns the cumulative count from the first row to the last row in the current window. The current window includes the rows that have the same sal value. Sample statement:

select sal, count(sal) over (partition by sal) as count from emp;

Sample result:

+------------+------------+
| sal        | count      |
+------------+------------+
| 800        | 1          |
| 950        | 1          |
| 1100       | 1          |
| 1250       | 2          |   -- This row is the first row of this window. The value of COUNT for the first row is the cumulative count for the second row because the two rows have the same sal value. 
| 1250       | 2          |   -- The return value is the cumulative count from the first row to the second row in the current window. 
| 1300       | 2          |
| 1300       | 2          |
| 1500       | 1          |
| 1600       | 1          |
| 2450       | 2          |
| 2450       | 2          |
| 2850       | 1          |
| 2975       | 1          |
| 3000       | 2          |
| 3000       | 2          |
| 5000       | 2          |
| 5000       | 2          |
+------------+------------+

Example 2: Use the sal column to define a window. ORDER BY is specified. This function returns the cumulative count from the first row to the last row in the current window. The current window includes the rows that have the same sal value. Sample statement:

-- Execute the following statement: 
select sal, count(sal) over (partition by sal order by sal) as count from emp;

Sample result:

+------------+------------+
| sal        | count      |
+------------+------------+
| 800        | 1          |
| 950        | 1          |
| 1100       | 1          |
| 1250       | 2          |   -- This row is the first row of this window. The value of COUNT for the first row is the cumulative count for the second row because the two rows have the same sal value. 
| 1250       | 2          |   -- The return value is the cumulative count from the first row to the second row in the current window. 
| 1300       | 2          |
| 1300       | 2          |
| 1500       | 1          |
| 1600       | 1          |
| 2450       | 2          |
| 2450       | 2          |
| 2850       | 1          |
| 2975       | 1          |
| 3000       | 2          |
| 3000       | 2          |
| 5000       | 2          |
| 5000       | 2          |
+------------+------------+

CUME_DIST

Syntax

double cume_dist() over([partition_clause] [orderby_clause])

Description

This function calculates the cumulative distribution. The cumulative distribution indicates the ratio of rows whose values are greater than or equal to the values of the current row to all rows in a partition. The ratio is determined based on orderby_clause.
Description

partition_clause and orderby_clause: For more information about these parameters, see windowing_definition.
Return value

A value of the DOUBLE type is returned. The return value is calculated by using the following formula: row_number_of_last_peer/partition_row_count. row_number_of_last_peer indicates the value returned by ROW_NUMBER that corresponds to the last row of the group to which the current row belongs. partition_row_count indicates the number of rows in a partition to which the current row belongs.

Examples

Group all employees by department (deptno) and calculate the cumulative distribution of employees in each group by salary. Sample statement:

select deptno, ename, sal, concat(round(cume_dist() over (partition by deptno order by sal desc)*100,2),'%') as cume_dist from emp;

Sample result:

+------------+------------+------------+------------+
| deptno     | ename      | sal        | cume_dist  |
+------------+------------+------------+------------+
| 10         | JACCKA     | 5000       | 33.33%     |
| 10         | KING       | 5000       | 33.33%     |
| 10         | CLARK      | 2450       | 66.67%     |
| 10         | WELAN      | 2450       | 66.67%     |
| 10         | TEBAGE     | 1300       | 100.0%     |
| 10         | MILLER     | 1300       | 100.0%     |
| 20         | SCOTT      | 3000       | 40.0%      |
| 20         | FORD       | 3000       | 40.0%      |
| 20         | JONES      | 2975       | 60.0%      |
| 20         | ADAMS      | 1100       | 80.0%      |
| 20         | SMITH      | 800        | 100.0%     |
| 30         | BLAKE      | 2850       | 16.67%     |
| 30         | ALLEN      | 1600       | 33.33%     |
| 30         | TURNER     | 1500       | 50.0%      |
| 30         | MARTIN     | 1250       | 83.33%     |
| 30         | WARD       | 1250       | 83.33%     |
| 30         | JAMES      | 950        | 100.0%     |
+------------+------------+------------+------------+

DENSE_RANK

Syntax

bigint dense_rank() over ([partition_clause] [orderby_clause])

Description

Calculates the rank of a row in an ordered group of rows that are sorted based on orderby_clause. The rank counts from 1. In a partition, rows with the same value of the column that is specified in order by have the same rank. The rank increases by 1 each time the value of the column that is specified in order by changes.
Description

partition_clause and orderby_clause: For more information about these parameters, see windowing_definition.
Return value

A value of the BIGINT type is returned. If orderby_clause is not specified, the values in the returned results are all 1.

Examples

Group all employees based on the deptno column. In each group, sort the employees in descending order based on the sal value to obtain the sequence numbers of the employees in their respective groups. Sample statement:

select deptno, ename, sal, dense_rank() over (partition by deptno order by sal desc) as nums from emp;

Sample result:

+------------+------------+------------+------------+
| deptno     | ename      | sal        | nums       |
+------------+------------+------------+------------+
| 10         | JACCKA     | 5000       | 1          |
| 10         | KING       | 5000       | 1          |
| 10         | CLARK      | 2450       | 2          |
| 10         | WELAN      | 2450       | 2          |
| 10         | TEBAGE     | 1300       | 3          |
| 10         | MILLER     | 1300       | 3          |
| 20         | SCOTT      | 3000       | 1          |
| 20         | FORD       | 3000       | 1          |
| 20         | JONES      | 2975       | 2          |
| 20         | ADAMS      | 1100       | 3          |
| 20         | SMITH      | 800        | 4          |
| 30         | BLAKE      | 2850       | 1          |
| 30         | ALLEN      | 1600       | 2          |
| 30         | TURNER     | 1500       | 3          |
| 30         | MARTIN     | 1250       | 4          |
| 30         | WARD       | 1250       | 4          |
| 30         | JAMES      | 950        | 5          |
+------------+------------+------------+------------+

FIRST_VALUE

Syntax

first_value(<expr>[, <ignore_nulls>]) over ([partition_clause] [orderby_clause] [frame_clause])

Description

Returns the value of expr that corresponds to the first row of a window.
Description
- expr: required. The expression that is used to calculate the returned result.
- ignore_nulls: optional. A value of the BOOLEAN type. This parameter specifies whether to ignore null values. Default value: false. If this parameter is set to true, a non-null value of expr that corresponds to the first row of a window is returned.
- partition_clause, orderby_clause, and frame_clause: For more information about these parameters, see windowing_definition.
Return value

A value of the same data type as expr is returned.

Examples

Group all employees by department and return the first row of data in each group. Sample statement:

ORDER BY is not specified.

select deptno, ename, sal, first_value(sal) over (partition by deptno) as first_value from emp;

Sample result:

+------------+------------+------------+-------------+
| deptno     | ename      | sal        | first_value |
+------------+------------+------------+-------------+
| 10         | TEBAGE     | 1300       | 1300        |   -- This row is the first row of the current window. 
| 10         | CLARK      | 2450       | 1300        |
| 10         | KING       | 5000       | 1300        |
| 10         | MILLER     | 1300       | 1300        |
| 10         | JACCKA     | 5000       | 1300        |
| 10         | WELAN      | 2450       | 1300        |
| 20         | FORD       | 3000       | 3000        |   -- This row is the first row of the current window. 
| 20         | SCOTT      | 3000       | 3000        |
| 20         | SMITH      | 800        | 3000        |
| 20         | ADAMS      | 1100       | 3000        |
| 20         | JONES      | 2975       | 3000        |
| 30         | TURNER     | 1500       | 1500        |   -- This row is the first row of the current window. 
| 30         | JAMES      | 950        | 1500        |
| 30         | ALLEN      | 1600       | 1500        |
| 30         | WARD       | 1250       | 1500        |
| 30         | MARTIN     | 1250       | 1500        |
| 30         | BLAKE      | 2850       | 1500        |
+------------+------------+------------+-------------+

ORDER BY is specified.

select deptno, ename, sal, first_value(sal) over (partition by deptno order by sal desc) as first_value from emp;

Sample result:

+------------+------------+------------+-------------+
| deptno     | ename      | sal        | first_value |
+------------+------------+------------+-------------+
| 10         | JACCKA     | 5000       | 5000        |   -- This row is the first row of the current window. 
| 10         | KING       | 5000       | 5000        |
| 10         | CLARK      | 2450       | 5000        |
| 10         | WELAN      | 2450       | 5000        |
| 10         | TEBAGE     | 1300       | 5000        |
| 10         | MILLER     | 1300       | 5000        |
| 20         | SCOTT      | 3000       | 3000        |   -- This row is the first row of the current window. 
| 20         | FORD       | 3000       | 3000        |
| 20         | JONES      | 2975       | 3000        |
| 20         | ADAMS      | 1100       | 3000        |
| 20         | SMITH      | 800        | 3000        |
| 30         | BLAKE      | 2850       | 2850        |   -- This row is the first row of the current window. 
| 30         | ALLEN      | 1600       | 2850        |
| 30         | TURNER     | 1500       | 2850        |
| 30         | MARTIN     | 1250       | 2850        |
| 30         | WARD       | 1250       | 2850        |
| 30         | JAMES      | 950        | 2850        |
+------------+------------+------------+-------------+

LAG

Syntax

lag(<expr>[,bigint <offset>[, <default>]]) over([partition_clause] orderby_clause)

Description

Returns the value of expr that precedes the current row at a given offset. The expression expr can be a column, a column operation, or a function operation.
Description
- expr: required. The expression that is used to calculate the returned result.
- offset: optional. The value is a constant of the BIGINT type and must be greater than or equal to 0. The value 0 indicates the current row, and the value 1 indicates the previous row. Default value: 1. If the input value is of the STRING or DOUBLE type, it is implicitly converted into a value of the BIGINT type before calculation.
- default: optional. The default value when the value of offset is out of the valid range. The value of this parameter must be a constant. The default value of this parameter is null. The value of this parameter must be of the same data type as the value of expr. If the value of expr is not a constant, the parameter value is determined based on the current row.
- partition_clause and orderby_clause: For more information about these parameters, see windowing_definition.
Return value

A value of the same data type as expr is returned.

Examples

Group all employees based on the deptno column in a window and calculate the value of sal for each employee at a given offset. Sample statement:

select deptno, ename, sal, lag(sal, 1) over (partition by deptno order by sal) as sal_new from emp;

Sample result:

+------------+------------+------------+------------+
| deptno     | ename      | sal        | sal_new    |
+------------+------------+------------+------------+
| 10         | TEBAGE     | 1300       | NULL       |
| 10         | MILLER     | 1300       | 1300       |
| 10         | CLARK      | 2450       | 1300       |
| 10         | WELAN      | 2450       | 2450       |
| 10         | KING       | 5000       | 2450       |
| 10         | JACCKA     | 5000       | 5000       |
| 20         | SMITH      | 800        | NULL       |
| 20         | ADAMS      | 1100       | 800        |
| 20         | JONES      | 2975       | 1100       |
| 20         | SCOTT      | 3000       | 2975       |
| 20         | FORD       | 3000       | 3000       |
| 30         | JAMES      | 950        | NULL       |
| 30         | MARTIN     | 1250       | 950        |
| 30         | WARD       | 1250       | 1250       |
| 30         | TURNER     | 1500       | 1250       |
| 30         | ALLEN      | 1600       | 1500       |
| 30         | BLAKE      | 2850       | 1600       |
+------------+------------+------------+------------+

LAST_VALUE

Syntax

last_value(<expr>[, <ignore_nulls>]) over([partition_clause] [orderby_clause] [frame_clause])

Description

Returns the calculated result of the last row of data in a window using the expression that is specified by expr.
Description
- expr: required. The expression that is used to calculate the returned result.
- ignore_nulls: optional. A value of the BOOLEAN type. This parameter specifies whether to ignore null values. Default value: false. If this parameter is set to true, a non-null value of expr that corresponds to the last row of a window is returned.
- partition_clause, orderby_clause, and frame_clause: For more information about these parameters, see windowing_definition.
Return value

A value of the same data type as expr is returned.

Examples

Group all employees based on the deptno column and return the last row of data in each group. Sample statement:

If order by is not specified, the rows from the first row to the last row belong to the current window. The value of the last row in the current window is returned.

select deptno, ename, sal, last_value(sal) over (partition by deptno) as last_value from emp;

Sample result:

+------------+------------+------------+-------------+
| deptno     | ename      | sal        | last_value |
+------------+------------+------------+-------------+
| 10         | TEBAGE     | 1300       | 2450        |
| 10         | CLARK      | 2450       | 2450        |
| 10         | KING       | 5000       | 2450        |
| 10         | MILLER     | 1300       | 2450        |
| 10         | JACCKA     | 5000       | 2450        |   
| 10         | WELAN      | 2450       | 2450        |   -- This row is the last row of the current window. 
| 20         | FORD       | 3000       | 2975        |
| 20         | SCOTT      | 3000       | 2975        |
| 20         | SMITH      | 800        | 2975        |
| 20         | ADAMS      | 1100       | 2975        |
| 20         | JONES      | 2975       | 2975        |   -- This row is the last row of the current window. 
| 30         | TURNER     | 1500       | 2850        |
| 30         | JAMES      | 950        | 2850        |
| 30         | ALLEN      | 1600       | 2850        |
| 30         | WARD       | 1250       | 2850        |
| 30         | MARTIN     | 1250       | 2850        |
| 30         | BLAKE      | 2850       | 2850        |   -- This row is the last row of the current window. 
+------------+------------+------------+-------------+

If order by is specified, the rows from the first row to the current row belong to the current window. The value of the current row in the current window is returned.

select deptno, ename, sal, last_value(sal) over (partition by deptno order by sal desc) as last_value from emp;

Sample result:

+------------+------------+------------+-------------+
| deptno     | ename      | sal        | last_value |
+------------+------------+------------+-------------+
| 10         | JACCKA     | 5000       | 5000        |   -- This row is the current row of the current window. 
| 10         | KING       | 5000       | 5000        |   -- This row is the current row of the current window. 
| 10         | CLARK      | 2450       | 2450        |   -- This row is the current row of the current window. 
| 10         | WELAN      | 2450       | 2450        |   -- This row is the current row of the current window. 
| 10         | TEBAGE     | 1300       | 1300        |   -- This row is the current row of the current window. 
| 10         | MILLER     | 1300       | 1300        |   -- This row is the current row of the current window. 
| 20         | SCOTT      | 3000       | 3000        |   -- This row is the current row of the current window. 
| 20         | FORD       | 3000       | 3000        |   -- This row is the current row of the current window. 
| 20         | JONES      | 2975       | 2975        |   -- This row is the current row of the current window. 
| 20         | ADAMS      | 1100       | 1100        |   -- This row is the current row of the current window. 
| 20         | SMITH      | 800        | 800         |   -- This row is the current row of the current window. 
| 30         | BLAKE      | 2850       | 2850        |   -- This row is the current row of the current window. 
| 30         | ALLEN      | 1600       | 1600        |   -- This row is the current row of the current window. 
| 30         | TURNER     | 1500       | 1500        |   -- This row is the current row of the current window. 
| 30         | MARTIN     | 1250       | 1250        |   -- This row is the current row of the current window. 
| 30         | WARD       | 1250       | 1250        |   -- This row is the current row of the current window. 
| 30         | JAMES      | 950        | 950         |   -- This row is the current row of the current window. 
+------------+------------+------------+-------------+

LEAD

Syntax

lead(<expr>[, bigint <offset>[, <default>]]) over([partition_clause] orderby_clause)

Description

Returns the value of expr that corresponds to the Nth row following the current row at a given offset. The expression expr can be a column, a column operation, or a function operation.
Description
- expr: required. The expression that is used to calculate the returned result.
- offset: optional. The value is a constant of the BIGINT type and must be greater than or equal to 0. The value 0 indicates the current row, and the value 1 indicates the next row. Default value: 1. If the input value is of the STRING or DOUBLE type, it is implicitly converted into a value of the BIGINT type before calculation.
- default: optional. The default value when the value of offset is out of the valid range. The value of this parameter must be a constant. The default value of this parameter is null. The value of this parameter must be of the same data type as the value of expr. If the value of expr is not a constant, the parameter value is determined based on the current row.
- partition_clause and orderby_clause: For more information about these parameters, see windowing_definition.
Return value

A value of the same data type as expr is returned.

Examples

Group all employees based on the deptno column in a window and calculate the value of sal for each employee at a given offset. Sample statement:

select deptno, ename, sal, lead(sal, 1) over (partition by deptno order by sal) as sal_new from emp;

Sample result:

+------------+------------+------------+------------+
| deptno     | ename      | sal        | sal_new    |
+------------+------------+------------+------------+
| 10         | TEBAGE     | 1300       | 1300       |
| 10         | MILLER     | 1300       | 2450       |
| 10         | CLARK      | 2450       | 2450       |
| 10         | WELAN      | 2450       | 5000       |
| 10         | KING       | 5000       | 5000       |
| 10         | JACCKA     | 5000       | NULL       |
| 20         | SMITH      | 800        | 1100       |
| 20         | ADAMS      | 1100       | 2975       |
| 20         | JONES      | 2975       | 3000       |
| 20         | SCOTT      | 3000       | 3000       |
| 20         | FORD       | 3000       | NULL       |
| 30         | JAMES      | 950        | 1250       |
| 30         | MARTIN     | 1250       | 1250       |
| 30         | WARD       | 1250       | 1500       |
| 30         | TURNER     | 1500       | 1600       |
| 30         | ALLEN      | 1600       | 2850       |
| 30         | BLAKE      | 2850       | NULL       |
+------------+------------+------------+------------+

MAX

Syntax

max(<expr>) over([partition_clause] [orderby_clause] [frame_clause])

Description

Returns the maximum value of expr in a window.
Description
- expr: required. The expression that is used to calculate the maximum value. The input value can be of any data type other than BOOLEAN. If the input value for a row is NULL, this row is not used for calculation.
- partition_clause, orderby_clause, and frame_clause: For more information about these parameters, see windowing_definition.
Return value

A value of the same type as expr is returned.

Examples

Example 1: Use the deptno column to define a window and obtain the maximum value of the sal column. The ORDER BY clause is not specified. This function returns the maximum value of the current window. The current window includes the rows that have the same deptno value. Sample statement:

select deptno, sal, max(sal) over (partition by deptno) from emp;

Sample result:

+------------+------------+------------+
| deptno     | sal        | _c2        |
+------------+------------+------------+
| 10         | 1300       | 5000       |   -- This row is the first row of this window. The return value is the maximum value among the values from the first row to the sixth row. 
| 10         | 2450       | 5000       |   -- The return value is the maximum value among the values from the first row to the sixth row. 
| 10         | 5000       | 5000       |   -- The return value is the maximum value among the values from the first row to the sixth row. 
| 10         | 1300       | 5000       |
| 10         | 5000       | 5000       |
| 10         | 2450       | 5000       |
| 20         | 3000       | 3000       |
| 20         | 3000       | 3000       |
| 20         | 800        | 3000       |
| 20         | 1100       | 3000       |
| 20         | 2975       | 3000       |
| 30         | 1500       | 2850       |
| 30         | 950        | 2850       |
| 30         | 1600       | 2850       |
| 30         | 1250       | 2850       |
| 30         | 1250       | 2850       |
| 30         | 2850       | 2850       |
+------------+------------+------------+

Example 2: Use the deptno column to define a window and obtain the maximum value of the sal column. The ORDER BY clause is specified. This function returns the maximum value among the values from the first row to the current row in the current window. The current window includes the rows that have the same deptno value. Run the following command:

select deptno, sal, max(sal) over (partition by deptno order by sal) from emp;

Sample result:

+------------+------------+------------+
| deptno     | sal        | _c2        |
+------------+------------+------------+
| 10         | 1300       | 1300       |   -- This row is the first row of this window. 
| 10         | 1300       | 1300       |   -- The return value is the maximum value among the values in the first and second rows. 
| 10         | 2450       | 2450       |   -- The return value is the maximum value among the values from the first row to the third row. 
| 10         | 2450       | 2450       |   -- The return value is the maximum value among the values from the first row to the fourth row. 
| 10         | 5000       | 5000       |
| 10         | 5000       | 5000       |
| 20         | 800        | 800        |
| 20         | 1100       | 1100       |
| 20         | 2975       | 2975       |
| 20         | 3000       | 3000       |
| 20         | 3000       | 3000       |
| 30         | 950        | 950        |
| 30         | 1250       | 1250       |
| 30         | 1250       | 1250       |
| 30         | 1500       | 1500       |
| 30         | 1600       | 1600       |
| 30         | 2850       | 2850       |
+------------+------------+------------+

MEDIAN

Syntax

median(<expr>) over ([partition_clause] [orderby_clause] [frame_clause])

Description

Returns the median of expr in a window.
Description
- expr: required. The expression that is used to calculate the median. A value of the DOUBLE or DECIMAL type. The value of this parameter must be 1 to 255 digits in length.
  - If the input value is of the STRING or BIGINT type, it is implicitly converted into a value of the DOUBLE type before calculation. If it is of another data type, an error is returned.
  - If the input value is NULL, NULL is returned.
- partition_clause, orderby_clause, and frame_clause: For more information about these parameters, see windowing_definition.
Return value

A value of the DOUBLE or DECIMAL type is returned. If the values of all expressions that are specified by expr are NULL, NULL is returned.

Examples

Use the deptno column to define a window and calculate the median value of the sal column. This function returns the median value of the current window. The current window includes the rows that have the same deptno value. Run the following command:

select deptno, sal, median(sal) over (partition by deptno) from emp;

Sample result:

+------------+------------+------------+
| deptno     | sal        | _c2        |
+------------+------------+------------+
| 10         | 1300       | 2450.0     |   -- This row is the first row of this window. The return value is the median value of the values from the first row to the sixth row. 
| 10         | 2450       | 2450.0     |
| 10         | 5000       | 2450.0     |
| 10         | 1300       | 2450.0     |
| 10         | 5000       | 2450.0     |
| 10         | 2450       | 2450.0     |
| 20         | 3000       | 2975.0     |
| 20         | 3000       | 2975.0     |
| 20         | 800        | 2975.0     |
| 20         | 1100       | 2975.0     |
| 20         | 2975       | 2975.0     |
| 30         | 1500       | 1375.0     |
| 30         | 950        | 1375.0     |
| 30         | 1600       | 1375.0     |
| 30         | 1250       | 1375.0     |
| 30         | 1250       | 1375.0     |
| 30         | 2850       | 1375.0     |
+------------+------------+------------+

MIN

Syntax

min(<expr>) over([partition_clause] [orderby_clause] [frame_clause])

Description

Returns the minimum value of expr in a window.
Description
- expr: required. The expression that is used to calculate the minimum value. The input value can be of any data type other than BOOLEAN. If the value for a row is null, this row is not used for calculation.
- partition_clause, orderby_clause, and frame_clause: For more information about these parameters, see windowing_definition.
Return value

A value of the same data type as expr is returned.

Examples

Example 1: Use the deptno column to define a window and obtain the minimum value of the sal column. The ORDER BY clause is not specified. This function returns the minimum value of the current window. The current window includes the rows that have the same deptno value. Run the following command:

select deptno, sal, min(sal) over (partition by deptno) from emp;

Sample result:

+------------+------------+------------+
| deptno     | sal        | _c2        |
+------------+------------+------------+
| 10         | 1300       | 1300       |   -- This row is the first row of this window. The return value is the minimum value among the values from the first row to the sixth row. 
| 10         | 2450       | 1300       |   -- The return value is the minimum value among the values from the first row to the sixth row. 
| 10         | 5000       | 1300       |   -- The return value is the minimum value among the values from the first row to the sixth row. 
| 10         | 1300       | 1300       |
| 10         | 5000       | 1300       |
| 10         | 2450       | 1300       |
| 20         | 3000       | 800        |
| 20         | 3000       | 800        |
| 20         | 800        | 800        |
| 20         | 1100       | 800        |
| 20         | 2975       | 800        |
| 30         | 1500       | 950        |
| 30         | 950        | 950        |
| 30         | 1600       | 950        |
| 30         | 1250       | 950        |
| 30         | 1250       | 950        |
| 30         | 2850       | 950        |
+------------+------------+------------+

Example 2: Use the deptno column to define a window and obtain the minimum value of the sal column. The ORDER BY clause is specified. This function returns the minimum value among the values from the first row to the current row in the current window. The current window includes the rows that have the same deptno value. Run the following command:

select deptno, sal, min(sal) over (partition by deptno order by sal) from emp;

Sample result:

+------------+------------+------------+
| deptno     | sal        | _c2        |
+------------+------------+------------+
| 10         | 1300       | 1300       |   -- This row is the first row of this window. 
| 10         | 1300       | 1300       |   -- The return value is the minimum value among the values in the first and second rows. 
| 10         | 2450       | 1300       |   -- The return value is the minimum value among the values from the first row to the third row. 
| 10         | 2450       | 1300       |
| 10         | 5000       | 1300       |
| 10         | 5000       | 1300       |
| 20         | 800        | 800        |
| 20         | 1100       | 800        |
| 20         | 2975       | 800        |
| 20         | 3000       | 800        |
| 20         | 3000       | 800        |
| 30         | 950        | 950        |
| 30         | 1250       | 950        |
| 30         | 1250       | 950        |
| 30         | 1500       | 950        |
| 30         | 1600       | 950        |
| 30         | 2850       | 950        |
+------------+------------+------------+

NTILE

Syntax

bigint ntile(bigint <N>) over ([partition_clause] [orderby_clause])

Description

Splits rows of data in a partition into N groups of equal size and returns the number of the group to which the specified row belongs. If data in the partition cannot be split into N groups of equal size, one more row is preferentially allocated to the first M groups.
Description
- N: required. This parameter specifies the number of splits. The input value is of the BIGINT type.
- partition_clause and orderby_clause: For more information about these parameters, see windowing_definition.
Return value

A value of the BIGINT type is returned.

Examples

Divide all employees into three groups based on the sal column in descending order and obtain the number of the group to which each employee belongs. Sample statement:

select deptno, ename, sal, ntile(3) over (partition by deptno order by sal desc) as nt3 from emp;

Sample result:

+------------+------------+------------+------------+
| deptno     | ename      | sal        | nt3        |
+------------+------------+------------+------------+
| 10         | JACCKA     | 5000       | 1          |
| 10         | KING       | 5000       | 1          |
| 10         | CLARK      | 2450       | 2          |
| 10         | WELAN      | 2450       | 2          |
| 10         | TEBAGE     | 1300       | 3          |
| 10         | MILLER     | 1300       | 3          |
| 20         | SCOTT      | 3000       | 1          |
| 20         | FORD       | 3000       | 1          |
| 20         | JONES      | 2975       | 2          |
| 20         | ADAMS      | 1100       | 2          |
| 20         | SMITH      | 800        | 3          |
| 30         | BLAKE      | 2850       | 1          |
| 30         | ALLEN      | 1600       | 1          |
| 30         | TURNER     | 1500       | 2          |
| 30         | MARTIN     | 1250       | 2          |
| 30         | WARD       | 1250       | 3          |
| 30         | JAMES      | 950        | 3          |
+------------+------------+------------+------------+

NTH_VALUE

Syntax

nth_value(<expr>, <number> [, <ignore_nulls>]) over ([partition_clause] [orderby_clause] [frame_clause])

Description

Returns the value of expr that corresponds to the Nth row in a window.
Description
- expr: required. The expression that is used to calculate the returned result.
- number: required. The value is of the BIGINT type The value must be an integer greater than or equal to 1. If the input value is 1, this function is equivalent to FIRST_VALUE.
- ignore_nulls: optional. A value of the BOOLEAN type. This parameter specifies whether to ignore null values. Default value: false. If this parameter is set to true, a non-null value of expr that corresponds to the Nth row of a window is returned.
- partition_clause, orderby_clause, and frame_clause: For more information about these parameters, see windowing_definition.
Return value

A value of the same data type as expr is returned.

Examples

Group all employees by department and return the sixth row of data in each group. Sample statement:

If order by is not specified, the rows from the first row to the last row belong to the current window. The value of the sixth row in the current window is returned.

select deptno, ename, sal, nth_value(sal,6) over (partition by deptno) as nth_value from emp;

Sample result:

+------------+------------+------------+------------+
| deptno     | ename      | sal        | nth_value  |
+------------+------------+------------+------------+
| 10         | TEBAGE     | 1300       | 2450       |
| 10         | CLARK      | 2450       | 2450       |
| 10         | KING       | 5000       | 2450       |
| 10         | MILLER     | 1300       | 2450       |
| 10         | JACCKA     | 5000       | 2450       |
| 10         | WELAN      | 2450       | 2450       |   -- This row is the sixth row of the current window. 
| 20         | FORD       | 3000       | NULL       |
| 20         | SCOTT      | 3000       | NULL       |
| 20         | SMITH      | 800        | NULL       |
| 20         | ADAMS      | 1100       | NULL       |
| 20         | JONES      | 2975       | NULL       |   -- This current window has less than six rows, and null is returned. 
| 30         | TURNER     | 1500       | 2850       |
| 30         | JAMES      | 950        | 2850       |
| 30         | ALLEN      | 1600       | 2850       |
| 30         | WARD       | 1250       | 2850       |
| 30         | MARTIN     | 1250       | 2850       |
| 30         | BLAKE      | 2850       | 2850       |   -- This row is the sixth row of the current window. 
+------------+------------+------------+------------+

If order by is specified, the rows from the first row to the current row belong to the current window. The value of the sixth row in the current window is returned.

select deptno, ename, sal, nth_value(sal,6) over (partition by deptno order by sal) as nth_value from emp;

Sample result:

+------------+------------+------------+------------+
| deptno     | ename      | sal        | nth_value  |
+------------+------------+------------+------------+
| 10         | TEBAGE     | 1300       | NULL       |   
| 10         | MILLER     | 1300       | NULL       |   -- This window has only two rows, and NULL is returned. 
| 10         | CLARK      | 2450       | NULL       |
| 10         | WELAN      | 2450       | NULL       |
| 10         | KING       | 5000       | 5000       |  
| 10         | JACCKA     | 5000       | 5000       |
| 20         | SMITH      | 800        | NULL       |
| 20         | ADAMS      | 1100       | NULL       |
| 20         | JONES      | 2975       | NULL       |
| 20         | SCOTT      | 3000       | NULL       |
| 20         | FORD       | 3000       | NULL       |
| 30         | JAMES      | 950        | NULL       |
| 30         | MARTIN     | 1250       | NULL       |
| 30         | WARD       | 1250       | NULL       |
| 30         | TURNER     | 1500       | NULL       |
| 30         | ALLEN      | 1600       | NULL       |
| 30         | BLAKE      | 2850       | 2850       |
+------------+------------+------------+------------+

PERCENT_RANK

Syntax

double percent_rank() over([partition_clause] [orderby_clause])

Description

Calculates the percentile rank of the current row in a partition based on orderby_clause.
Description

partition_clause and orderby_clause: For more information about these parameters, see windowing_definition.
Return value

A value of the DOUBLE type is returned. The valid value range is [0.0, 1.0]. The return value is calculated by using the following formula: (rank - 1)/(partition_row_count - 1). rank indicates the return value of the RANK function that corresponds to the current row. partition_row_count indicates the number of rows in the partition to which the current row belongs. If the partition contains only one row of data, 0.0 is returned.

Examples

Calculate the percentile rank of each employee in a group based on the sal column. Sample statement:

select deptno, ename, sal, percent_rank() over (partition by deptno order by sal desc) as sal_new from emp;

Sample result:

+------------+------------+------------+------------+
| deptno     | ename      | sal        | sal_new    |
+------------+------------+------------+------------+
| 10         | JACCKA     | 5000       | 0.0        |
| 10         | KING       | 5000       | 0.0        |
| 10         | CLARK      | 2450       | 0.4        |
| 10         | WELAN      | 2450       | 0.4        |
| 10         | TEBAGE     | 1300       | 0.8        |
| 10         | MILLER     | 1300       | 0.8        |
| 20         | SCOTT      | 3000       | 0.0        |
| 20         | FORD       | 3000       | 0.0        |
| 20         | JONES      | 2975       | 0.5        |
| 20         | ADAMS      | 1100       | 0.75       |
| 20         | SMITH      | 800        | 1.0        |
| 30         | BLAKE      | 2850       | 0.0        |
| 30         | ALLEN      | 1600       | 0.2        |
| 30         | TURNER     | 1500       | 0.4        |
| 30         | MARTIN     | 1250       | 0.6        |
| 30         | WARD       | 1250       | 0.6        |
| 30         | JAMES      | 950        | 1.0        |
+------------+------------+------------+------------+

RANK

Syntax

bigint rank() over ([partition_clause] [orderby_clause])

Description

Calculates the rank of a row in an ordered group of rows that are sorted based on orderby_clause. The rank counts from 1.
Description

partition_clause and orderby_clause: For more information about these parameters, see windowing_definition.
Return value

A value of the BIGINT type is returned. The return value may be duplicate and inconsecutive. The return value is the sequence number of the first row in the group to which the current row belongs. The sequence number of the first row is calculated by using the ROW_NUMBER() function. If orderby_clause is not specified, the values in the returned results are all 1.

Examples

Group all employees based on the deptno column. In each group, sort the employees in descending order based on the sal value to obtain the sequence numbers of the employees in their respective groups. Sample statement:

select deptno, ename, sal, rank() over (partition by deptno order by sal desc) as nums from emp;

Sample result:

+------------+------------+------------+------------+
| deptno     | ename      | sal        | nums       |
+------------+------------+------------+------------+
| 10         | JACCKA     | 5000       | 1          |
| 10         | KING       | 5000       | 1          |
| 10         | CLARK      | 2450       | 3          |
| 10         | WELAN      | 2450       | 3          |
| 10         | TEBAGE     | 1300       | 5          |
| 10         | MILLER     | 1300       | 5          |
| 20         | SCOTT      | 3000       | 1          |
| 20         | FORD       | 3000       | 1          |
| 20         | JONES      | 2975       | 3          |
| 20         | ADAMS      | 1100       | 4          |
| 20         | SMITH      | 800        | 5          |
| 30         | BLAKE      | 2850       | 1          |
| 30         | ALLEN      | 1600       | 2          |
| 30         | TURNER     | 1500       | 3          |
| 30         | MARTIN     | 1250       | 4          |
| 30         | WARD       | 1250       | 4          |
| 30         | JAMES      | 950        | 6          |
+------------+------------+------------+------------+

ROW_NUMBER

Syntax

row_number() over([partition_clause] [orderby_clause])

Description

Returns the sequence number of the current row in a partition. The sequence number counts from 1.
Description

For more information, see windowing_definition. frame_clause is not supported.
Return value

A value of the BIGINT type is returned.

Examples

Group all employees based on the deptno column. In each group, sort the employees in descending order based on the sal value, and obtain the sequence numbers of the employees in their respective groups. Sample statement:

select deptno, ename, sal, row_number() over (partition by deptno order by sal desc) as nums from emp;

Sample result:

+------------+------------+------------+------------+
| deptno     | ename      | sal        | nums       |
+------------+------------+------------+------------+
| 10         | JACCKA     | 5000       | 1          |
| 10         | KING       | 5000       | 2          |
| 10         | CLARK      | 2450       | 3          |
| 10         | WELAN      | 2450       | 4          |
| 10         | TEBAGE     | 1300       | 5          |
| 10         | MILLER     | 1300       | 6          |
| 20         | SCOTT      | 3000       | 1          |
| 20         | FORD       | 3000       | 2          |
| 20         | JONES      | 2975       | 3          |
| 20         | ADAMS      | 1100       | 4          |
| 20         | SMITH      | 800        | 5          |
| 30         | BLAKE      | 2850       | 1          |
| 30         | ALLEN      | 1600       | 2          |
| 30         | TURNER     | 1500       | 3          |
| 30         | MARTIN     | 1250       | 4          |
| 30         | WARD       | 1250       | 5          |
| 30         | JAMES      | 950        | 6          |
+------------+------------+------------+------------+

STDDEV

Syntax

double stddev|stddev_pop([distinct] <expr>) over ([partition_clause] [orderby_clause] [frame_clause])
decimal stddev|stddev_pop([distinct] <expr>) over ([partition_clause] [orderby_clause] [frame_clause])

Description

Returns the population standard deviation of all input values. This function is also called STDDEV_POP.
Description
- expr: required. The expression that is used to calculate the population standard deviation. The input values can be of the DOUBLE or DECIMAL type.
  - If the input value is of the STRING or BIGINT type, it is implicitly converted into a value of the DOUBLE type before calculation. If it is of another data type, an error is returned.
  - If the value in a row is null, this row is not used for calculation.
  - If the distinct keyword is specified, the population standard deviation of distinct values is calculated.
- partition_clause, orderby_clause, and frame_clause: For more information about these parameters, see windowing_definition.
Return value

A value of the same data type as expr is returned. If the values of all expressions that are specified by expr are null, null is returned.

Examples

Example 1: Use the deptno column to define a window and calculate the population standard deviation of the sal column. The ORDER BY clause is not specified. This function returns the cumulative population standard deviation of the current window. The current window includes the rows that have the same deptno value. Sample statement:

select deptno, sal, stddev(sal) over (partition by deptno) from emp;

Sample result:

+------------+------------+------------+
| deptno     | sal        | _c2        |
+------------+------------+------------+
| 10         | 1300       | 1546.1421524412158 |   -- This row is the first row of this window. The return value is the cumulative population standard deviation of the values from the first row to the sixth row. 
| 10         | 2450       | 1546.1421524412158 |   -- The return value is the cumulative population standard deviation of the values from the first row to the sixth row. 
| 10         | 5000       | 1546.1421524412158 |
| 10         | 1300       | 1546.1421524412158 |
| 10         | 5000       | 1546.1421524412158 |
| 10         | 2450       | 1546.1421524412158 |
| 20         | 3000       | 1004.7387720198718 |
| 20         | 3000       | 1004.7387720198718 |
| 20         | 800        | 1004.7387720198718 |
| 20         | 1100       | 1004.7387720198718 |
| 20         | 2975       | 1004.7387720198718 |
| 30         | 1500       | 610.1001739241042 |
| 30         | 950        | 610.1001739241042 |
| 30         | 1600       | 610.1001739241042 |
| 30         | 1250       | 610.1001739241042 |
| 30         | 1250       | 610.1001739241042 |
| 30         | 2850       | 610.1001739241042 |
+------------+------------+------------+

Example 2: Use the deptno column to define a window and calculate the population standard deviation of the sal column. The ORDER BY clause is specified. This function returns the cumulative population standard deviation of the values from the first row to the row that has the same sal value as the current row in the current window. The population standard deviations for the rows that have the same sal value are the same. The current window includes the rows that have the same deptno value. Sample statement:

-- Execute the following statement: 
select deptno, sal, stddev(sal) over (partition by deptno order by sal) from emp;

Sample result:

+------------+------------+------------+
| deptno     | sal        | _c2        |
+------------+------------+------------+
| 10         | 1300       | 0.0        |           -- This row is the first row of this window. The population standard deviation for the first row is the cumulative population standard deviation of the values in the first and second rows because the two rows have the same sal value. 
| 10         | 1300       | 0.0        |           -- The return value is the cumulative population standard deviation of the values in the first and second rows. 
| 10         | 2450       | 575.0      |           -- The population standard deviation for the third row is the cumulative population standard deviation of the values from the first row to the fourth row because the third and fourth rows have the same sal value. 
| 10         | 2450       | 575.0      |           -- The return value is the cumulative population standard deviation of the values from the first row to the fourth row. 
| 10         | 5000       | 1546.1421524412158 |
| 10         | 5000       | 1546.1421524412158 |
| 20         | 800        | 0.0        |
| 20         | 1100       | 150.0      |
| 20         | 2975       | 962.4188277460079 |
| 20         | 3000       | 1004.7387720198718 |
| 20         | 3000       | 1004.7387720198718 |
| 30         | 950        | 0.0        |
| 30         | 1250       | 141.4213562373095 |
| 30         | 1250       | 141.4213562373095 |
| 30         | 1500       | 194.8557158514987 |
| 30         | 1600       | 226.71568097509268 |
| 30         | 2850       | 610.1001739241042 |
+------------+------------+------------+

STDDEV_SAMP

Syntax

double stddev_samp([distinct] <expr>) over([partition_clause] [orderby_clause] [frame_clause])
decimal stddev_samp([distinct] <expr>) over([partition_clause] [orderby_clause] [frame_clause])

Description

This function returns the sample standard deviation of all input values.
Description
- expr: required. This parameter specifies the expression that is used to calculate the sample standard deviation. The input values can be of the DOUBLE or DECIMAL type.
  - If the input value is of the STRING or BIGINT type, it is implicitly converted into a value of the DOUBLE type before calculation. If it is of another data type, an error is returned.
  - If the value for a row is NULL, this row is not used for the calculation.
  - If the distinct keyword is specified, the sample standard deviation of distinct values is calculated.
- partition_clause, orderby_clause, and frame_clause: For more information about these parameters, see windowing_definition.
Return value

A value of the same data type as expr is returned. If the values of all expressions that are specified by expr are NULL, NULL is returned. If the window has only one row of data whose expr value is not NULL, 0 is returned.

Examples

Example 1: Use the deptno column to define a window and calculate the sample standard deviation of the sal column. The ORDER BY clause is not specified. This function returns the cumulative sample standard deviation of the current window. The current window includes the rows that have the same deptno value. Sample statement:

select deptno, sal, stddev_samp(sal) over (partition by deptno) from emp;

Sample result:

+------------+------------+------------+
| deptno     | sal        | _c2        |
+------------+------------+------------+
| 10         | 1300       | 1693.7138680032904 |   -- This row is the first row of this window. The return value is the cumulative sample standard deviation of the values from the first row to the sixth row. 
| 10         | 2450       | 1693.7138680032904 |   -- The return value is the cumulative sample standard deviation of the values from the first row to the sixth row. 
| 10         | 5000       | 1693.7138680032904 |   -- The return value is the cumulative sample standard deviation of the values from the first row to the sixth row. 
| 10         | 1300       | 1693.7138680032904 |     
| 10         | 5000       | 1693.7138680032904 |
| 10         | 2450       | 1693.7138680032904 |
| 20         | 3000       | 1123.3320969330487 |
| 20         | 3000       | 1123.3320969330487 |
| 20         | 800        | 1123.3320969330487 |
| 20         | 1100       | 1123.3320969330487 |
| 20         | 2975       | 1123.3320969330487 |
| 30         | 1500       | 668.331255192114 |
| 30         | 950        | 668.331255192114 |
| 30         | 1600       | 668.331255192114 |
| 30         | 1250       | 668.331255192114 |
| 30         | 1250       | 668.331255192114 |
| 30         | 2850       | 668.331255192114 |
+------------+------------+------------+

Example 2: Use the deptno column to define a window and calculate the sample standard deviation of the sal column. The ORDER BY clause is specified. This function returns the cumulative sample standard deviation of the values from the first row to the current row in the current window. The current window includes the rows that have the same deptno value. Sample statement:

select deptno, sal, stddev_samp(sal) over (partition by deptno order by sal) from emp;

Sample result:

+------------+------------+------------+
| deptno     | sal        | _c2        |
+------------+------------+------------+
| 10         | 1300       | 0.0        |          -- This row is the first row of this window. 
| 10         | 1300       | 0.0        |          -- The return value is the cumulative sample standard deviation of the values in the first and second rows. 
| 10         | 2450       | 663.9528095680697 |   -- The return value is the cumulative sample standard deviation of the values from the first row to the third row. 
| 10         | 2450       | 663.9528095680696 |
| 10         | 5000       | 1511.2081259707413 |
| 10         | 5000       | 1693.7138680032904 |
| 20         | 800        | 0.0        |
| 20         | 1100       | 212.13203435596427 |
| 20         | 2975       | 1178.7175234126282 |
| 20         | 3000       | 1182.7536725793752 |
| 20         | 3000       | 1123.3320969330487 |
| 30         | 950        | 0.0        |
| 30         | 1250       | 212.13203435596427 |
| 30         | 1250       | 173.20508075688772 |
| 30         | 1500       | 225.0      |
| 30         | 1600       | 253.4758371127315 |
| 30         | 2850       | 668.331255192114 |
+------------+------------+------------+

SUM

Syntax

sum([distinct] <expr>) over ([partition_clause] [orderby_clause] [frame_clause])

Description

Returns the sum of expr in a window.
Description
- expr: required. This parameter specifies the column whose sum you want to calculate. The column can be of the DOUBLE, DECIMAL, or BIGINT type.
  - If an input value is of the STRING type, it is implicitly converted into a value of the DOUBLE type before calculation. If it is of another data type, an error is returned.
  - If the input value in a row is null, the row is not used for calculation.
  - If the distinct keyword is specified, the sum of distinct values is calculated.
- partition_clause, orderby_clause, and frame_clause: For more information about these parameters, see windowing_definition.
Return value
- If input values are of the BIGINT type, a value of the BIGINT type is returned.
- If input values are of the DECIMAL type, a value of the DECIMAL type is returned.
- If input values are of the DOUBLE or STRING type, a value of the DOUBLE type is returned.
- If input values are null, null is returned.

Examples

Example 1: Use the deptno column to define a window and calculate the sum of the sal column. The ORDER BY clause is not specified. This function returns the cumulative sum of the current window. The current window includes the rows that have the same deptno value. Sample statement:

select deptno, sal, sum(sal) over (partition by deptno) from emp;

Sample result:

+------------+------------+------------+
| deptno     | sal        | _c2        |
+------------+------------+------------+
| 10         | 1300       | 17500      |   -- This row is the first row of this window. The return value is the cumulative sum of the values from the first row to the sixth row. 
| 10         | 2450       | 17500      |   -- The return value is the cumulative sum of the values from the first row to the sixth row. 
| 10         | 5000       | 17500      |   -- The return value is the cumulative sum of the values from the first row to the sixth row. 
| 10         | 1300       | 17500      |
| 10         | 5000       | 17500      |
| 10         | 2450       | 17500      |
| 20         | 3000       | 10875      |
| 20         | 3000       | 10875      |
| 20         | 800        | 10875      |
| 20         | 1100       | 10875      |
| 20         | 2975       | 10875      |
| 30         | 1500       | 9400       |
| 30         | 950        | 9400       |
| 30         | 1600       | 9400       |
| 30         | 1250       | 9400       |
| 30         | 1250       | 9400       |
| 30         | 2850       | 9400       |
+------------+------------+------------+

Example 2: Use the deptno column to define a window and calculate the sum of the sal column. The ORDER BY clause is specified. This function returns the cumulative sum of the values from the first row to the row that has the same sal value as the current row in the current window. The sum values for the rows that have the same sal value are the same. The current window includes the rows that have the same deptno value. Sample statement:

-- Execute the following statement: 
select deptno, sal, sum(sal) over (partition by deptno order by sal) from emp;

Sample result:

+------------+------------+------------+
| deptno     | sal        | _c2        |
+------------+------------+------------+
| 10         | 1300       | 2600       |   -- This row is the first row of this window. The sum for the first row is the cumulative sum of the values in the first and second rows because the two rows have the same sal value. 
| 10         | 1300       | 2600       |   -- The return value is the cumulative sum of the values in the first and second rows. 
| 10         | 2450       | 7500       |   -- The sum for the third row is the cumulative sum of the values from the first row to the fourth row because the third and fourth rows have the same sal value. 
| 10         | 2450       | 7500       |   -- The return value is the cumulative sum of the values from the first row to the fourth row. 
| 10         | 5000       | 17500      |
| 10         | 5000       | 17500      |
| 20         | 800        | 800        |
| 20         | 1100       | 1900       |
| 20         | 2975       | 4875       |
| 20         | 3000       | 10875      |
| 20         | 3000       | 10875      |
| 30         | 950        | 950        |
| 30         | 1250       | 3450       |
| 30         | 1250       | 3450       |
| 30         | 1500       | 4950       |
| 30         | 1600       | 6550       |
| 30         | 2850       | 9400       |
+------------+------------+------------+

Function	Description
AVG	Calculates the average value of data in a window.
CLUSTER_SAMPLE	Samples random rows of data. If true is returned, the specified row of data is sampled.
COUNT	Calculates the number of rows in a window.
CUME_DIST	Calculates the cumulative distribution of data in a partition.
DENSE_RANK	Calculates the rank of a row in a group of rows. The ranks are consecutive.
FIRST_VALUE	Obtains the calculated result of the first row of data in the window to which the current row belongs.
LAG	Obtains the calculated result of the Nth row of data that precedes the current row at a given offset in a window.
LAST_VALUE	Obtains the calculated result of the last row of data in the window to which the current row belongs.
LEAD	Obtains the calculated result of the Nth row of data that follows the current row at a given offset in a window.
MAX	Calculates the maximum value in a window.
MEDIAN	Calculates the median in a window.
MIN	Calculates the minimum value in a window.
NTILE	Splits rows of data in a partition into N groups of equal size and returns the number of the group to which the current row belongs. The group number ranges from 1 to N.
NTH_VALUE	Obtains the calculated result of the Nth row of data in a window to which the current row belongs.
PERCENT_RANK	Calculates the percentile rank of a row in an ordered group of rows. The function returns a percentage value.
RANK	Calculates the rank of a row in an ordered group of rows. The ranks may not be consecutive integers.
ROW_NUMBER	Calculates the sequence number of a row. The row number starts from 1.
STDDEV	Returns the population standard deviation of all the input values. This function is also called STDDEV_POP.
STDDEV_SAMP	Returns the sample standard deviation of all the input values.
SUM	Calculates the sum of data in a window.