You can use window functions in MaxCompute SQL to flexibly process and analyze data of columns in a window. This topic describes the syntax and parameters of window functions that are supported by MaxCompute SQL. This topic also provides examples on how to use window functions. It guides you through data development by using window functions.
The following table lists the window functions that are supported by MaxCompute SQL.
Function | Description |
---|---|
ROW_NUMBER | Calculates the sequence number of a row. The row number starts from 1. |
RANK | Calculates the rank of a row in an ordered group of rows. The ranks may not be consecutive integers. |
DENSE_RANK | Calculates the rank of a row in an ordered group of rows. The ranks are consecutive integers. |
PERCENT_RANK | Calculates the percentile rank of a row in an ordered group of rows. |
CUME_DIST | Calculates the cumulative distribution of data in a partition. |
NTILE | Splits rows of data in a partition into N groups of equal size and returns the number of the group to which the current row belongs. The group number ranges from 1 to N. |
LAG | Obtains the calculated result of the Nth row of data that precedes the current row at a given offset in a window. |
LEAD | Obtains the calculated result of the Nth row of data that follows the current row at a given offset in a window. |
FIRST_VALUE | Obtains the calculated result of the first row of data in the window to which the current row belongs. |
LAST_VALUE | Obtains the calculated result of the last row of data in the window to which the current row belongs. |
NTH_VALUE | Obtains the calculated result of the Nth row of data in a window to which the current row belongs. |
CLUSTER_SAMPLE | Samples random rows of data. If true is returned, the specified row of data is sampled. |
COUNT | Calculates the number of rows in a window. |
MIN | Calculates the minimum value in a window. |
MAX | Calculates the maximum value in a window. |
AVG | Calculates the average value of data in a window. |
SUM | Calculates the sum of data in a window. |
MEDIAN | Calculates the median in a window. |
STDDEV | Returns the population standard deviation of all input values. This function is also called STDDEV_POP. |
STDDEV_SAMP | Returns the sample standard deviation of all input values. |
Limits
Before you use window functions, take note of the following limits:
- Window functions are supported only in
SELECT
statements. - A window function cannot contain nested window functions or aggregate functions.
- You cannot use window functions together with aggregate functions of the same level.
Syntax
<function_name>([distinct][<expression> [, ...]]) over (<window_definition>)
<function_name>([distinct][<expression> [, ...]]) over <window_name>
- function_name: a built-in window function, built-in aggregate function, or user-defined aggregate function (UDAF). For more information about built-in aggregate functions, see Aggregate functions. For more information about UDAFs, see Overview.
- expression: the format of a window function. The format is subject to the function syntax.
- windowing_definition: the definition of a window. For more information about the syntax of windowing_definition, see windowing_definition.
- window_name: the name of a window. You can use the
window
keyword to configure a window and use windowing_definition to specify the name of the window. Syntax of named_window_def:
Position of named_window_def in an SQL statement:window <window_name> as (<window_definition>)
select ... from ... [where ...] [group by ...] [having ...] named_window_def [order by ...] [limit ...]
windowing_definition
--partition_clause:
[partition by <expression> [, ...]]
--orderby_clause:
[order by <expression> [asc|desc][nulls {first|last}] [, ...]]
[<frame_clause>]
If you use a window function in a SELECT statement, data is partitioned and sorted
based on partition by
and order by
in windowing_definition when the window function is executed. If the SELECT statement
does not include partition by
, only one partition exists and the partition contains all data. If the SELECT statement
does not include order by
, data in a partition is arranged in a random order, and a data stream is generated.
After the data stream is generated, a group of rows is extracted from the data stream
based on frame_clause
in windowing_definition to create a window for the current row. The window function
calculates the data included in the window to which the current row belongs.
- partition by <expression> [, ...]: optional. This parameter specifies the partition information. If the values of partition key columns are the same for a group of rows, these rows are included in the same window. For more information about the format of partition by, see Table operations.
- order by <expression> [asc|desc][nulls {first|last}] [, ...]: optional. This parameter specifies how to sort rows of data in a window.
Note If the values of the column that is specified in
order by
are the same, the sorting result may not be accurate. To reduce the random ordering of data, make sure that the values of the column that is specified inorder by
are unique. - frame_clause: optional. This parameter is used to determine the data boundaries of a window. For more information about frame_clause, see frame_clause.
frame_clause
-- Syntax 1
{ROWS|RANGE|GROUPS} <frame_start> [<frame_exclusion>]
-- Syntax 2
{ROWS|RANGE|GROUPS} between <frame_start> and <frame_end> [<frame_exclusion>]
frame_clause is a closed interval that is used to determine the data boundaries of a window. The
data boundaries are determined based on the rows that are specified by frame_start and frame_end.
- ROWS|RANGE|GROUPS: required. ROWS, RANGE, and GROUPS indicate the types of frame_clause. The implementation rules of frame_start and frame_end vary based on the type of frame_clause.
- ROWS: The data boundaries of a window are determined based on the number of rows.
- RANGE: The data boundaries of a window are determined based on the comparison results
of the values of the column that is specified in
order by
. In most cases,order by
is specified in windowing_definition. Iforder by
is not specified in windowing_definition, the values of the column that is specified inorder by
are the same for all rows in a partition. NULL values are considered equivalent. - GROUPS: In a partition, rows that have the same value of the column specified in
order by
form a group. Iforder by
is not specified, all rows in the partition form a group. NULL values are considered equivalent.
- frame_start and frame_end: the start and end rows of a window. frame_start is required. frame_end is optional. If frame_end is not specified, the default value CURRENT ROW is used.
The row specified by frame_start must precede or be the same as the row specified by frame_end. Compared with the row specified by frame_end, the row specified by frame_start is closer to the first row in a window after all data in the window is sorted based on the column that is specified in
order by
of windowing_definition. The following table describes the valid values and logic of frame_start and frame_end when the type of frame_clause is ROWS, RANGE, or GROUPS.frame_clause type frame_start or frame_end value Description ROWS, RANGE, and GROUPS UNBOUNDED PRECEDING Indicates the first row of a partition. Rows are counted from 1. UNBOUNDED FOLLOWING Indicates the last row of a partition. ROWS CURRENT ROW Indicates the current row. Each row of data corresponds to a result calculated by a window function. The current row indicates the row whose data is calculated by using a window function. offset PRECEDING Indicates the Nth row that precedes the current row at a given offset
. For example, if0 PRECEDING
indicates the current row,1 PRECEDING
indicates the previous row.offset
must be a non-negative integer.offset FOLLOWING Indicates the Nth row that follows the current row at a given offset
. For example, if0 FOLLOWING
indicates the current row,1 FOLLOWING
indicates the next row.offset
must be a non-negative integer.RANGE CURRENT ROW - If frame_start is set to CURRENT ROW, it indicates the first row that has the same value of the
column specified in
order by
as the current row. - If frame_end is set to CURRENT ROW, it indicates the last row that has the same value of the column
specified in
order by
as the current row.
offset PRECEDING The rows that are specified by frame_start and frame_end are determined based on the sorting order that is specified by order by
. For example, if data in a window is sorted by X, Xi indicates the X value that corresponds to the ith row, and Xc indicates the X value that corresponds to the current row.order by
is set to asc:- frame_start indicates the first row that meets the following requirement:
Xc - Xi ≤ offset
. - frame_end indicates the last row that meets the following requirement:
Xc - Xi ≥ offset
.
- frame_start indicates the first row that meets the following requirement:
order by
is set to desc:- frame_start indicates the first row that meets the following requirement:
Xi - Xc ≤ offset
. - frame_end indicates the last row that meets the following requirement:
Xi - Xc ≥ offset
.
- frame_start indicates the first row that meets the following requirement:
The column that is specified in
order by
can be of the following data types: TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, DECIMAL, DATETIME, DATE, and TIMESTAMP.Syntax foroffset
of a date type:N
: indicates N days or N seconds. It must be a non-negative integer. For an offset of the DATETIME or TIMESTAMP type, it indicates N seconds. For an offset of the DATE type, it indicates N days.interval 'N' {YEAR\MONTH\DAY\HOUR\MINUTE\SECOND}
: indicates N years, months, days, hours, minutes, or seconds. For example,INTERVAL '3' YEAR
indicates 3 years.INTERVAL 'N-M' YEAR TO MONTH
: indicates N years and M months. For example,INTERVAL '1-3' YEAR TO MONTH
indicates 1 year and 3 months.INTERVAL 'D[ H[:M[:S[:N]]]]' DAY TO SECOND
: indicates D days, H hours, M minutes, S seconds, and N nanoseconds. For example,INTERVAL '1 2:3:4:5' DAY TO SECOND
indicates 1 day, 2 hours, 3 minutes, 4 seconds, and 5 nanoseconds.
offset FOLLOWING The rows that are specified by frame_start and frame_end are determined based on the sorting order that is specified by order by
. For example, if data in a window is sorted by X, Xi indicates the X value that corresponds to the ith row, and Xc indicates the X value that corresponds to the current row.order by
is set to asc:- frame_start indicates the first row that meets the following requirement:
Xi - Xc ≥ offset
. - frame_end indicates the last row that meets the following requirement:
Xi - Xc ≤ offset
.
- frame_start indicates the first row that meets the following requirement:
order by
is set to desc:- frame_start indicates the first row that meets the following requirement:
Xc - Xi ≥ offset
. - frame_end indicates the last row that meets the following requirement:
Xc - Xi ≤ offset
.
- frame_start indicates the first row that meets the following requirement:
GROUPS CURRENT ROW - If frame_start is set to CURRENT ROW, it indicates the first row of the group to which the current row belongs.
- If frame_end is set to CURRENT ROW, it indicates the last row of the group to which the current row belongs.
offset PRECEDING - If frame_start is set to offset PRECEDING, it indicates the first row of the Nth group that precedes
the group of the current row at a given
offset
. - If frame_end is set to offset PRECEDING, it indicates the last row of the Nth group that precedes
the group of the current row at a given
offset
.
Note You cannot set frame_start to UNBOUNDED FOLLOWING, and you cannot set frame_end to UNBOUNED PRECEDING.offset FOLLOWING - If frame_start is set to offset FOLLOWING, it indicates the first row of the Nth group that follows
the group of the current row at a given
offset
. - If frame_end is set to offset FOLLOWING, it indicates the last row of the Nth group that follows
the group of the current row at a given
offset
.
Note You cannot set frame_start to UNBOUNDED FOLLOWING, and you cannot set frame_end to UNBOUNED PRECEDING. - If frame_start is set to CURRENT ROW, it indicates the first row that has the same value of the
column specified in
- frame_exclusion: optional. This parameter is used to remove specific rows from a window. Valid values:
- EXCLUDE NO OTHERS: indicates that no rows are excluded from the window.
- EXCLUDE CURRENT ROW: indicates that the current row is excluded from the window.
- EXCLUDE GROUP: indicates that an entire group of rows in a partition is excluded from
the window. In the group, all rows have the same value of the column that is specified
in
order by
as the current row. - EXCLUDE TIES: indicates that an entire group of rows, except for the current row, are excluded from the window.
- If
odps.sql.hive.compatible
is set to true, the following default frame_clause is used. This rule applies to most SQL systems.RANGE between UNBOUNDED PRECEDING and CURRENT ROW EXCLUDE NO OTHERS
- If
odps.sql.hive.compatible
is set to false,order by
is specified, and one of the following window functions is used, the default frame_clause in ROWS mode is used: AVG, COUNT, MAX, MIN, STDDEV, STEDEV_POP, STDDEV_SAMP, and SUM.ROWS between UNBOUNDED PRECEDING and CURRENT ROW EXCLUDE NO OTHERS
pid, oid, and rid
. The tbl table contains the following data:+------------+------------+------------+
| pid | oid | rid |
+------------+------------+------------+
| 1 | NULL | 1 |
| 1 | NULL | 2 |
| 1 | 1 | 3 |
| 1 | 1 | 4 |
| 1 | 2 | 5 |
| 1 | 4 | 6 |
| 1 | 7 | 7 |
| 1 | 11 | 8 |
| 2 | NULL | 9 |
| 2 | NULL | 10 |
+------------+------------+------------+
You can replace ellipses (...
) in the following SQL statements with windowing_definition to display the data in the windows in which each row of data is included.
- Windows in ROWS mode
- windowing_definition 1
The following result is returned:partition by pid order by oid ROWS between UNBOUNDED PRECEDING and CURRENT ROW -- Sample SQL statement: select pid, oid, rid, collect_list(rid) over(partition by pid order by oid ROWS between UNBOUNDED PRECEDING and CURRENT ROW) as window from tbl;
+------------+------------+------------+--------+ | pid | oid | rid | window | +------------+------------+------------+--------+ | 1 | NULL | 1 | [1] | | 1 | NULL | 2 | [1, 2] | | 1 | 1 | 3 | [1, 2, 3] | | 1 | 1 | 4 | [1, 2, 3, 4] | | 1 | 2 | 5 | [1, 2, 3, 4, 5] | | 1 | 4 | 6 | [1, 2, 3, 4, 5, 6] | | 1 | 7 | 7 | [1, 2, 3, 4, 5, 6, 7] | | 1 | 11 | 8 | [1, 2, 3, 4, 5, 6, 7, 8] | | 2 | NULL | 9 | [9] | | 2 | NULL | 10 | [9, 10] | +------------+------------+------------+--------+
- windowing_definition 2
The following result is returned:partition by pid order by oid ROWS between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING -- Sample SQL statement: select pid, oid, rid, collect_list(rid) over(partition by pid order by oid ROWS between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING) as window from tbl;
+------------+------------+------------+--------+ | pid | oid | rid | window | +------------+------------+------------+--------+ | 1 | NULL | 1 | [1, 2, 3, 4, 5, 6, 7, 8] | | 1 | NULL | 2 | [1, 2, 3, 4, 5, 6, 7, 8] | | 1 | 1 | 3 | [1, 2, 3, 4, 5, 6, 7, 8] | | 1 | 1 | 4 | [1, 2, 3, 4, 5, 6, 7, 8] | | 1 | 2 | 5 | [1, 2, 3, 4, 5, 6, 7, 8] | | 1 | 4 | 6 | [1, 2, 3, 4, 5, 6, 7, 8] | | 1 | 7 | 7 | [1, 2, 3, 4, 5, 6, 7, 8] | | 1 | 11 | 8 | [1, 2, 3, 4, 5, 6, 7, 8] | | 2 | NULL | 9 | [9, 10] | | 2 | NULL | 10 | [9, 10] | +------------+------------+------------+--------+
- windowing_definition 3
The following result is returned:partition by pid order by oid ROWS between 1 FOLLOWING and 3 FOLLOWING -- Sample SQL statement: select pid, oid, rid, collect_list(rid) over(partition by pid order by oid ROWS between 1 FOLLOWING and 3 FOLLOWING) as window from tbl;
+------------+------------+------------+--------+ | pid | oid | rid | window | +------------+------------+------------+--------+ | 1 | NULL | 1 | [2, 3, 4] | | 1 | NULL | 2 | [3, 4, 5] | | 1 | 1 | 3 | [4, 5, 6] | | 1 | 1 | 4 | [5, 6, 7] | | 1 | 2 | 5 | [6, 7, 8] | | 1 | 4 | 6 | [7, 8] | | 1 | 7 | 7 | [8] | | 1 | 11 | 8 | NULL | | 2 | NULL | 9 | [10] | | 2 | NULL | 10 | NULL | +------------+------------+------------+--------+
- windowing_definition 4
The following result is returned:partition by pid order by oid ROWS between UNBOUNDED PRECEDING and CURRENT ROW EXCLUDE CURRENT ROW -- Sample SQL statement: select pid, oid, rid, collect_list(rid) over(partition by pid order by oid ROWS between UNBOUNDED PRECEDING and CURRENT ROW EXCLUDE CURRENT ROW) as window from tbl;
+------------+------------+------------+--------+ | pid | oid | rid | window | +------------+------------+------------+--------+ | 1 | NULL | 1 | NULL | | 1 | NULL | 2 | [1] | | 1 | 1 | 3 | [1, 2] | | 1 | 1 | 4 | [1, 2, 3] | | 1 | 2 | 5 | [1, 2, 3, 4] | | 1 | 4 | 6 | [1, 2, 3, 4, 5] | | 1 | 7 | 7 | [1, 2, 3, 4, 5, 6] | | 1 | 11 | 8 | [1, 2, 3, 4, 5, 6, 7] | | 2 | NULL | 9 | NULL | | 2 | NULL | 10 | [9] | +------------+------------+------------+--------+
- windowing_definition 5
The following result is returned:partition by pid order by oid ROWS between UNBOUNDED PRECEDING and CURRENT ROW EXCLUDE GROUP -- Sample SQL statement: select pid, oid, rid, collect_list(rid) over(partition by pid order by oid ROWS between UNBOUNDED PRECEDING and CURRENT ROW EXCLUDE GROUP) as window from tbl;
+------------+------------+------------+--------+ | pid | oid | rid | window | +------------+------------+------------+--------+ | 1 | NULL | 1 | NULL | | 1 | NULL | 2 | NULL | | 1 | 1 | 3 | [1, 2] | | 1 | 1 | 4 | [1, 2] | | 1 | 2 | 5 | [1, 2, 3, 4] | | 1 | 4 | 6 | [1, 2, 3, 4, 5] | | 1 | 7 | 7 | [1, 2, 3, 4, 5, 6] | | 1 | 11 | 8 | [1, 2, 3, 4, 5, 6, 7] | | 2 | NULL | 9 | NULL | | 2 | NULL | 10 | NULL | +------------+------------+------------+--------+
- windowing_definition 6
The following result is returned:partition by pid order by oid ROWS between UNBOUNDED PRECEDING and CURRENT ROW EXCLUDE TIES -- Sample SQL statement: select pid, oid, rid, collect_list(rid) over(partition by pid order by oid ROWS between UNBOUNDED PRECEDING and CURRENT ROW EXCLUDE TIES) as window from tbl;
The differences between EXCLUDE CURRENT ROW and EXCLUDE GROUP can be obtained based on the comparison between the+------------+------------+------------+--------+ | pid | oid | rid | window | +------------+------------+------------+--------+ | 1 | NULL | 1 | [1] | | 1 | NULL | 2 | [2] | | 1 | 1 | 3 | [1, 2, 3] | | 1 | 1 | 4 | [1, 2, 4] | | 1 | 2 | 5 | [1, 2, 3, 4, 5] | | 1 | 4 | 6 | [1, 2, 3, 4, 5, 6] | | 1 | 7 | 7 | [1, 2, 3, 4, 5, 6, 7] | | 1 | 11 | 8 | [1, 2, 3, 4, 5, 6, 7, 8] | | 2 | NULL | 9 | [9] | | 2 | NULL | 10 | [10] | +------------+------------+------------+--------+
window
column values of rows with therid
column values of 2, 4, and 10 in Syntax 5 and Syntax 6. If frame_exclusion is set to EXCLUDE GROUP, the rows that have the samepid
column value in a partition are extracted when the rows have the sameoid
column value as the current row.
- windowing_definition 1
- Windows in RANGE mode
- windowing_definition 1
The following result is returned:partition by pid order by oid RANGE between UNBOUNDED PRECEDING and CURRENT ROW -- Sample SQL statement: select pid, oid, rid, collect_list(rid) over(partition by pid order by oid RANGE between UNBOUNDED PRECEDING and CURRENT ROW) as window from tbl;
If frame_end is set to CURRENT ROW, the last row that has the same value of the+------------+------------+------------+--------+ | pid | oid | rid | window | +------------+------------+------------+--------+ | 1 | NULL | 1 | [1, 2] | | 1 | NULL | 2 | [1, 2] | | 1 | 1 | 3 | [1, 2, 3, 4] | | 1 | 1 | 4 | [1, 2, 3, 4] | | 1 | 2 | 5 | [1, 2, 3, 4, 5] | | 1 | 4 | 6 | [1, 2, 3, 4, 5, 6] | | 1 | 7 | 7 | [1, 2, 3, 4, 5, 6, 7] | | 1 | 11 | 8 | [1, 2, 3, 4, 5, 6, 7, 8] | | 2 | NULL | 9 | [9, 10] | | 2 | NULL | 10 | [9, 10] | +------------+------------+------------+--------+
oid
column inorder by
as the current row is obtained. Therefore, thewindow
column value of the row whoserid
column value is 1 is [1, 2]. - windowing_definition 2
The following result is returned:partition by pid order by oid RANGE between CURRENT ROW and UNBOUNDED FOLLOWING -- Sample SQL statement: select pid, oid, rid, collect_list(rid) over(partition by pid order by oid RANGE between CURRENT ROW and UNBOUNDED FOLLOWING) as window from tbl;
+------------+------------+------------+--------+ | pid | oid | rid | window | +------------+------------+------------+--------+ | 1 | NULL | 1 | [1, 2, 3, 4, 5, 6, 7, 8] | | 1 | NULL | 2 | [1, 2, 3, 4, 5, 6, 7, 8] | | 1 | 1 | 3 | [3, 4, 5, 6, 7, 8] | | 1 | 1 | 4 | [3, 4, 5, 6, 7, 8] | | 1 | 2 | 5 | [5, 6, 7, 8] | | 1 | 4 | 6 | [6, 7, 8] | | 1 | 7 | 7 | [7, 8] | | 1 | 11 | 8 | [8] | | 2 | NULL | 9 | [9, 10] | | 2 | NULL | 10 | [9, 10] | +------------+------------+------------+--------+
- windowing_definition 3
The following result is returned:partition by pid order by oid RANGE between 3 PRECEDING and 1 PRECEDING -- Sample SQL statement: select pid, oid, rid, collect_list(rid) over(partition by pid order by oid RANGE between 3 PRECEDING and 1 PRECEDING) as window from tbl;
For the row whose value of+------------+------------+------------+--------+ | pid | oid | rid | window | +------------+------------+------------+--------+ | 1 | NULL | 1 | [1, 2] | | 1 | NULL | 2 | [1, 2] | | 1 | 1 | 3 | NULL | | 1 | 1 | 4 | NULL | | 1 | 2 | 5 | [3, 4] | | 1 | 4 | 6 | [3, 4, 5] | | 1 | 7 | 7 | [6] | | 1 | 11 | 8 | NULL | | 2 | NULL | 9 | [9, 10] | | 2 | NULL | 10 | [9, 10] | +------------+------------+------------+--------+
oid
inorder by
is NULL, if frame_start is set tooffset PRECEDING
or offset FOLLOWING, the row is the first row whose value of oid in order by is NULL. If frame_end is set to offset PRECEDING or offset FOLLOWING, the row is the last row whose value of oid inorder by
is NULL.
- windowing_definition 1
- Windows in GROUPS mode
windowing_definition
The following result is returned:partition by pid order by oid GROUPS between 2 PRECEDING and CURRENT ROW -- Sample SQL statement: select pid, oid, rid, collect_list(rid) over(partition by pid order by oid GROUPS between 2 PRECEDING and CURRENT ROW) as window from tbl;
+------------+------------+------------+--------+ | pid | oid | rid | window | +------------+------------+------------+--------+ | 1 | NULL | 1 | [1, 2] | | 1 | NULL | 2 | [1, 2] | | 1 | 1 | 3 | [1, 2, 3, 4] | | 1 | 1 | 4 | [1, 2, 3, 4] | | 1 | 2 | 5 | [1, 2, 3, 4, 5] | | 1 | 4 | 6 | [3, 4, 5, 6] | | 1 | 7 | 7 | [5, 6, 7] | | 1 | 11 | 8 | [6, 7, 8] | | 2 | NULL | 9 | [9, 10] | | 2 | NULL | 10 | [9, 10] | +------------+------------+------------+--------+
Sample data
create table if not exists emp
(empno bigint,
ename string,
job string,
mgr bigint,
hiredate datetime,
sal bigint,
comm bigint,
deptno bigint);
tunnel upload emp.txt emp;
7369,SMITH,CLERK,7902,1980-12-17 00:00:00,800,,20
7499,ALLEN,SALESMAN,7698,1981-02-20 00:00:00,1600,300,30
7521,WARD,SALESMAN,7698,1981-02-22 00:00:00,1250,500,30
7566,JONES,MANAGER,7839,1981-04-02 00:00:00,2975,,20
7654,MARTIN,SALESMAN,7698,1981-09-28 00:00:00,1250,1400,30
7698,BLAKE,MANAGER,7839,1981-05-01 00:00:00,2850,,30
7782,CLARK,MANAGER,7839,1981-06-09 00:00:00,2450,,10
7788,SCOTT,ANALYST,7566,1987-04-19 00:00:00,3000,,20
7839,KING,PRESIDENT,,1981-11-17 00:00:00,5000,,10
7844,TURNER,SALESMAN,7698,1981-09-08 00:00:00,1500,0,30
7876,ADAMS,CLERK,7788,1987-05-23 00:00:00,1100,,20
7900,JAMES,CLERK,7698,1981-12-03 00:00:00,950,,30
7902,FORD,ANALYST,7566,1981-12-03 00:00:00,3000,,20
7934,MILLER,CLERK,7782,1982-01-23 00:00:00,1300,,10
7948,JACCKA,CLERK,7782,1981-04-12 00:00:00,5000,,10
7956,WELAN,CLERK,7649,1982-07-20 00:00:00,2450,,10
7956,TEBAGE,CLERK,7748,1982-12-30 00:00:00,1300,,10
ROW_NUMBER
- Syntax
row_number() over([partition_clause] [orderby_clause])
- Description
This function returns the sequence number of the current row in a partition. The sequence number counts from 1.
- Parameters
For more information, see windowing_definition. frame_clause is not supported.
- Return value
A value of the BIGINT type is returned.
- Examples
Group all employees based on the deptno column. In each group, sort the employees in descending order based on the sal value to obtain the sequence numbers of the employees in their respective groups. Sample statement:
The following result is returned:select deptno, ename, sal, row_number() over (partition by deptno order by sal desc) as nums from emp;
+------------+------------+------------+------------+ | deptno | ename | sal | nums | +------------+------------+------------+------------+ | 10 | JACCKA | 5000 | 1 | | 10 | KING | 5000 | 2 | | 10 | CLARK | 2450 | 3 | | 10 | WELAN | 2450 | 4 | | 10 | TEBAGE | 1300 | 5 | | 10 | MILLER | 1300 | 6 | | 20 | SCOTT | 3000 | 1 | | 20 | FORD | 3000 | 2 | | 20 | JONES | 2975 | 3 | | 20 | ADAMS | 1100 | 4 | | 20 | SMITH | 800 | 5 | | 30 | BLAKE | 2850 | 1 | | 30 | ALLEN | 1600 | 2 | | 30 | TURNER | 1500 | 3 | | 30 | MARTIN | 1250 | 4 | | 30 | WARD | 1250 | 5 | | 30 | JAMES | 950 | 6 | +------------+------------+------------+------------+
RANK
- Syntax
bigint rank() over ([partition_clause] [orderby_clause])
- Description
This function returns the rank of the current row in a partition based on the order specified by orderby_clause. The rank counts from 1.
- Parameters
partition_clause and orderby_clause: For more information about these parameters, see windowing_definition.
- Return value
A value of the BIGINT type is returned. The return value may be duplicate and inconsecutive. The return value is the sequence number of the first row in the group to which the current row belongs. The sequence number of the first row is calculated by using the ROW_NUMBER() function.
If orderby_clause is not specified, the values in the returned results are all 1.
- Examples
Group all employees based on the deptno column. In each group, sort the employees in descending order based on the sal value to obtain the sequence numbers of the employees in their respective groups. Sample statement:
The following result is returned:select deptno, ename, sal, rank() over (partition by deptno order by sal desc) as nums from emp;
+------------+------------+------------+------------+ | deptno | ename | sal | nums | +------------+------------+------------+------------+ | 10 | JACCKA | 5000 | 1 | | 10 | KING | 5000 | 1 | | 10 | CLARK | 2450 | 3 | | 10 | WELAN | 2450 | 3 | | 10 | TEBAGE | 1300 | 5 | | 10 | MILLER | 1300 | 5 | | 20 | SCOTT | 3000 | 1 | | 20 | FORD | 3000 | 1 | | 20 | JONES | 2975 | 3 | | 20 | ADAMS | 1100 | 4 | | 20 | SMITH | 800 | 5 | | 30 | BLAKE | 2850 | 1 | | 30 | ALLEN | 1600 | 2 | | 30 | TURNER | 1500 | 3 | | 30 | MARTIN | 1250 | 4 | | 30 | WARD | 1250 | 4 | | 30 | JAMES | 950 | 6 | +------------+------------+------------+------------+
DENSE_RANK
- Syntax
bigint dense_rank() over ([partition_clause] [orderby_clause])
- Description
This function calculates the rank of a row in an ordered group of rows that are sorted based on orderby_clause. The rank counts from 1. In a partition, rows with the same value of the column that is specified in
order by
have the same rank. The rank increases by 1 each time the value of the column that is specified inorder by
changes. - Parameters
partition_clause and orderby_clause: For more information about these parameters, see windowing_definition.
- Return value
A value of the BIGINT type is returned. If orderby_clause is not specified, the values in the returned results are all 1.
- Examples
Group all employees based on the deptno column. In each group, sort the employees in descending order based on the sal value to obtain the sequence numbers of the employees in their respective groups. Sample statement:
The following result is returned:select deptno, ename, sal, dense_rank() over (partition by deptno order by sal desc) as nums from emp;
+------------+------------+------------+------------+ | deptno | ename | sal | nums | +------------+------------+------------+------------+ | 10 | JACCKA | 5000 | 1 | | 10 | KING | 5000 | 1 | | 10 | CLARK | 2450 | 2 | | 10 | WELAN | 2450 | 2 | | 10 | TEBAGE | 1300 | 3 | | 10 | MILLER | 1300 | 3 | | 20 | SCOTT | 3000 | 1 | | 20 | FORD | 3000 | 1 | | 20 | JONES | 2975 | 2 | | 20 | ADAMS | 1100 | 3 | | 20 | SMITH | 800 | 4 | | 30 | BLAKE | 2850 | 1 | | 30 | ALLEN | 1600 | 2 | | 30 | TURNER | 1500 | 3 | | 30 | MARTIN | 1250 | 4 | | 30 | WARD | 1250 | 4 | | 30 | JAMES | 950 | 5 | +------------+------------+------------+------------+
PERCENT_RANK
- Syntax
double percent_rank() over([partition_clause] [orderby_clause])
- Description
This function calculates the percentile rank of the current row in a partition based on orderby_clause.
- Parameters
partition_clause and orderby_clause: For more information about these parameters, see windowing_definition.
- Return value
A value of the DOUBLE type is returned. The valid value range is [0.0, 1.0]. The return value is calculated by using the following formula:
(rank - 1)/(partition_row_count - 1)
.rank
indicates the return value of the RANK function that corresponds to the current row.partition_row_count
indicates the number of rows in the partition to which the current row belongs. If the partition contains only one row of data, 0.0 is returned. - Examples
Calculate the percentile rank of each employee in a group based on the sal column. Sample statement:
The following result is returned:select deptno, ename, sal, percent_rank(sal) over (partition by deptno order by sal desc) as sal_new from emp;
+------------+------------+------------+------------+ | deptno | ename | sal | sal_new | +------------+------------+------------+------------+ | 10 | JACCKA | 5000 | 0.0 | | 10 | KING | 5000 | 0.0 | | 10 | CLARK | 2450 | 0.4 | | 10 | WELAN | 2450 | 0.4 | | 10 | TEBAGE | 1300 | 0.8 | | 10 | MILLER | 1300 | 0.8 | | 20 | SCOTT | 3000 | 0.0 | | 20 | FORD | 3000 | 0.0 | | 20 | JONES | 2975 | 0.5 | | 20 | ADAMS | 1100 | 0.75 | | 20 | SMITH | 800 | 1.0 | | 30 | BLAKE | 2850 | 0.0 | | 30 | ALLEN | 1600 | 0.2 | | 30 | TURNER | 1500 | 0.4 | | 30 | MARTIN | 1250 | 0.6 | | 30 | WARD | 1250 | 0.6 | | 30 | JAMES | 950 | 1.0 | +------------+------------+------------+------------+
CUME_DIST
- Syntax
double cume_dist() over([partition_clause] [orderby_clause])
- Description
This function calculates the cumulative distribution. The cumulative distribution indicates the ratio of rows whose values are greater than or equal to the values of the current row to all rows in a partition. The ratio is determined based on orderby_clause.
- Parameters
partition_clause and orderby_clause: For more information about these parameters, see windowing_definition.
- Return value
A value of the DOUBLE type is returned. The return value is calculated by using the following formula:
row_number_of_last_peer/partition_row_count
.row_number_of_last_peer
indicates the value returned by ROW_NUMBER that corresponds to the last row of the group to which the current row belongs.partition_row_count
indicates the number of rows in a partition to which the current row belongs. - Examples
Group all employees based on the deptno column and calculate the cumulative distribution of employees in each group by salary. Sample statement:
The following result is returned:select deptno, ename, sal, concat(round(cume_dist() over (partition by deptno order by sal desc)*100,2),'%') as cume_dist from emp;
+------------+------------+------------+------------+ | deptno | ename | sal | cume_dist | +------------+------------+------------+------------+ | 10 | JACCKA | 5000 | 33.33% | | 10 | KING | 5000 | 33.33% | | 10 | CLARK | 2450 | 66.67% | | 10 | WELAN | 2450 | 66.67% | | 10 | TEBAGE | 1300 | 100.0% | | 10 | MILLER | 1300 | 100.0% | | 20 | SCOTT | 3000 | 40.0% | | 20 | FORD | 3000 | 40.0% | | 20 | JONES | 2975 | 60.0% | | 20 | ADAMS | 1100 | 80.0% | | 20 | SMITH | 800 | 100.0% | | 30 | BLAKE | 2850 | 16.67% | | 30 | ALLEN | 1600 | 33.33% | | 30 | TURNER | 1500 | 50.0% | | 30 | MARTIN | 1250 | 83.33% | | 30 | WARD | 1250 | 83.33% | | 30 | JAMES | 950 | 100.0% | +------------+------------+------------+------------+
NTILE
- Syntax
bigint ntile(bigint <N>) over ([partition_clause] [orderby_clause])
- Description
This function splits rows of data in a partition into N groups of equal size and returns the number of the group to which the specified row belongs. If data in the partition cannot be split into N groups of equal size, one more row is preferentially allocated to the first M groups.
- Parameters
- N: required. This parameter specifies the number of splits. The input value is of the BIGINT type.
- partition_clause and orderby_clause: For more information about these parameters, see windowing_definition.
- Return value
A value of the BIGINT type is returned.
- Examples
Divide all employees into three groups based on the sal column in descending order and obtain the number of the group to which each employee belongs. Sample statement:
The following result is returned:select deptno, ename, sal, ntile(3) over (partition by deptno order by sal desc) as nt3 from emp;
+------------+------------+------------+------------+ | deptno | ename | sal | nt3 | +------------+------------+------------+------------+ | 10 | JACCKA | 5000 | 1 | | 10 | KING | 5000 | 1 | | 10 | CLARK | 2450 | 2 | | 10 | WELAN | 2450 | 2 | | 10 | TEBAGE | 1300 | 3 | | 10 | MILLER | 1300 | 3 | | 20 | SCOTT | 3000 | 1 | | 20 | FORD | 3000 | 1 | | 20 | JONES | 2975 | 2 | | 20 | ADAMS | 1100 | 2 | | 20 | SMITH | 800 | 3 | | 30 | BLAKE | 2850 | 1 | | 30 | ALLEN | 1600 | 1 | | 30 | TURNER | 1500 | 2 | | 30 | MARTIN | 1250 | 2 | | 30 | WARD | 1250 | 3 | | 30 | JAMES | 950 | 3 | +------------+------------+------------+------------+
LAG
- Syntax
lag(<expr>[, bigint <offset>[, <default>]]) over([partition_clause] orderby_clause)
- Description
This function returns the value of expr that precedes the current row at a given offset. The expression expr can be a column, a column operation, or a function operation.
- Parameters
- expr: required. The expression that is used to calculate the returned result.
- offset: optional. The value is a constant of the BIGINT type and must be greater than or equal to 0. The value 0 indicates the current row, and the value 1 indicates the previous row. Default value: 1. If the input value is of the STRING or DOUBLE type, it is implicitly converted into a value of the BIGINT type before calculation.
- default: optional. The default value when the value of offset is out of the valid range. The value of this parameter must be a constant. The default value of this parameter is NULL. The value of this parameter must be of the same data type as the value of expr. If the value of expr is not a constant, the parameter value is determined based on the current row.
- partition_clause and orderby_clause: For more information about these parameters, see windowing_definition.
- Return value
A value of the same data type as expr is returned.
- Examples
Group all employees based on the deptno column and calculate the value of sal for each employee at a given offset. Sample statement:
The following result is returned:select deptno, ename, sal, lag(sal, 1) over (partition by deptno order by sal) as sal_new from emp;
+------------+------------+------------+------------+ | deptno | ename | sal | sal_new | +------------+------------+------------+------------+ | 10 | TEBAGE | 1300 | NULL | | 10 | MILLER | 1300 | 1300 | | 10 | CLARK | 2450 | 1300 | | 10 | WELAN | 2450 | 2450 | | 10 | KING | 5000 | 2450 | | 10 | JACCKA | 5000 | 5000 | | 20 | SMITH | 800 | NULL | | 20 | ADAMS | 1100 | 800 | | 20 | JONES | 2975 | 1100 | | 20 | SCOTT | 3000 | 2975 | | 20 | FORD | 3000 | 3000 | | 30 | JAMES | 950 | NULL | | 30 | MARTIN | 1250 | 950 | | 30 | WARD | 1250 | 1250 | | 30 | TURNER | 1500 | 1250 | | 30 | ALLEN | 1600 | 1500 | | 30 | BLAKE | 2850 | 1600 | +------------+------------+------------+------------+
LEAD
- Syntax
lead(<expr>[, bigint <offset>[, <default>]]) over([partition_clause] orderby_clause)
- Description
This function returns the value of expr that corresponds to the Nth row following the current row at a given offset. The expression expr can be a column, a column operation, or a function operation.
- Parameters
- expr: required. The expression that is used to calculate the returned result.
- offset: optional. The value is a constant of the BIGINT type and must be greater than or equal to 0. The value 0 indicates the current row, and the value 1 indicates the next row. Default value: 1. If the input value is of the STRING or DOUBLE type, it is implicitly converted into a value of the BIGINT type before calculation.
- default: optional. The default value when the value of offset is out of the valid range. The value of this parameter must be a constant. The default value of this parameter is NULL. The value of this parameter must be of the same data type as the value of expr. If the value of expr is not a constant, the parameter value is determined based on the current row.
- partition_clause and orderby_clause: For more information about these parameters, see windowing_definition.
- Return value
A value of the same data type as expr is returned.
- Examples
Group all employees based on the deptno column and calculate the value of sal for each employee at a given offset. Sample statement:
The following result is returned:select deptno, ename, sal, lead(sal, 1) over (partition by deptno order by sal) as sal_new from emp;
+------------+------------+------------+------------+ | deptno | ename | sal | sal_new | +------------+------------+------------+------------+ | 10 | TEBAGE | 1300 | 1300 | | 10 | MILLER | 1300 | 2450 | | 10 | CLARK | 2450 | 2450 | | 10 | WELAN | 2450 | 5000 | | 10 | KING | 5000 | 5000 | | 10 | JACCKA | 5000 | NULL | | 20 | SMITH | 800 | 1100 | | 20 | ADAMS | 1100 | 2975 | | 20 | JONES | 2975 | 3000 | | 20 | SCOTT | 3000 | 3000 | | 20 | FORD | 3000 | NULL | | 30 | JAMES | 950 | 1250 | | 30 | MARTIN | 1250 | 1250 | | 30 | WARD | 1250 | 1500 | | 30 | TURNER | 1500 | 1600 | | 30 | ALLEN | 1600 | 2850 | | 30 | BLAKE | 2850 | NULL | +------------+------------+------------+------------+
FIRST_VALUE
- Syntax
first_value(<expr>[, <ignore_nulls>]) over ([partition_clause] [orderby_clause] [frame_clause])
- Description
This function returns the value of expr that corresponds to the first row of a window.
- Parameters
- expr: required. The expression that is used to calculate the returned result.
- ignore_nulls: optional. A value of the BOOLEAN type. This parameter specifies whether to ignore NULL values. Default value: False. If this parameter is set to True, a non-NULL value of expr that corresponds to the first row of a window is returned.
- partition_clause, orderby_clause, and frame_clause: For more information about these parameters, see windowing_definition.
- Return value
A value of the same data type as expr is returned.
- Examples
Group all employees by department and return the first row of data in each group. Sample statement:
- order by is not specified.
The following result is returned:select deptno, ename, sal, first_value(sal) over (partition by deptno) as first_value from emp;
+------------+------------+------------+-------------+ | deptno | ename | sal | first_value | +------------+------------+------------+-------------+ | 10 | TEBAGE | 1300 | 1300 | -- This row is the first row of the current window. | 10 | CLARK | 2450 | 1300 | | 10 | KING | 5000 | 1300 | | 10 | MILLER | 1300 | 1300 | | 10 | JACCKA | 5000 | 1300 | | 10 | WELAN | 2450 | 1300 | | 20 | FORD | 3000 | 3000 | -- This row is the first row of the current window. | 20 | SCOTT | 3000 | 3000 | | 20 | SMITH | 800 | 3000 | | 20 | ADAMS | 1100 | 3000 | | 20 | JONES | 2975 | 3000 | | 30 | TURNER | 1500 | 1500 | -- This row is the first row of the current window. | 30 | JAMES | 950 | 1500 | | 30 | ALLEN | 1600 | 1500 | | 30 | WARD | 1250 | 1500 | | 30 | MARTIN | 1250 | 1500 | | 30 | BLAKE | 2850 | 1500 | +------------+------------+------------+-------------+
- order by is specified.
The following result is returned:select deptno, ename, sal, first_value(sal) over (partition by deptno order by sal desc) as first_value from emp;
+------------+------------+------------+-------------+ | deptno | ename | sal | first_value | +------------+------------+------------+-------------+ | 10 | JACCKA | 5000 | 5000 | -- This row is the first row of the current window. | 10 | KING | 5000 | 5000 | | 10 | CLARK | 2450 | 5000 | | 10 | WELAN | 2450 | 5000 | | 10 | TEBAGE | 1300 | 5000 | | 10 | MILLER | 1300 | 5000 | | 20 | SCOTT | 3000 | 3000 | -- This row is the first row of the current window. | 20 | FORD | 3000 | 3000 | | 20 | JONES | 2975 | 3000 | | 20 | ADAMS | 1100 | 3000 | | 20 | SMITH | 800 | 3000 | | 30 | BLAKE | 2850 | 2850 | -- This row is the first row of the current window. | 30 | ALLEN | 1600 | 2850 | | 30 | TURNER | 1500 | 2850 | | 30 | MARTIN | 1250 | 2850 | | 30 | WARD | 1250 | 2850 | | 30 | JAMES | 950 | 2850 | +------------+------------+------------+-------------+
- order by is not specified.
LAST_VALUE
- Syntax
last_value(<expr>[, <ignore_nulls>]) over([partition_clause] [orderby_clause] [frame_clause])
- Description
This function returns the value of expr that corresponds to the last row of a window.
- Parameters
- expr: required. The expression that is used to calculate the returned result.
- ignore_nulls: optional. A value of the BOOLEAN type. This parameter specifies whether to ignore NULL values. Default value: False. If this parameter is set to True, a non-NULL value of expr that corresponds to the last row of a window is returned.
- partition_clause, orderby_clause, and frame_clause: For more information about these parameters, see windowing_definition.
- Return value
A value of the same data type as expr is returned.
- Examples
Group all employees by department and return the last row of data in each group. Sample statement:
- If order by is not specified, the rows from the first row to the last row belong to the current
window. The value of the last row in the current window is returned.
The following result is returned:select deptno, ename, sal, last_value(sal) over (partition by deptno) as last_value from emp;
+------------+------------+------------+-------------+ | deptno | ename | sal | last_value | +------------+------------+------------+-------------+ | 10 | TEBAGE | 1300 | 2450 | | 10 | CLARK | 2450 | 2450 | | 10 | KING | 5000 | 2450 | | 10 | MILLER | 1300 | 2450 | | 10 | JACCKA | 5000 | 2450 | | 10 | WELAN | 2450 | 2450 | -- This row is the last row of the current window. | 20 | FORD | 3000 | 2975 | | 20 | SCOTT | 3000 | 2975 | | 20 | SMITH | 800 | 2975 | | 20 | ADAMS | 1100 | 2975 | | 20 | JONES | 2975 | 2975 | -- This row is the last row of the current window. | 30 | TURNER | 1500 | 2850 | | 30 | JAMES | 950 | 2850 | | 30 | ALLEN | 1600 | 2850 | | 30 | WARD | 1250 | 2850 | | 30 | MARTIN | 1250 | 2850 | | 30 | BLAKE | 2850 | 2850 | -- This row is the last row of the current window. +------------+------------+------------+-------------+
- If order by is specified, the rows from the first row to the current row belong to the current
window. The value of the current row in the current window is returned.
The following result is returned:select deptno, ename, sal, last_value(sal) over (partition by deptno order by sal desc) as last_value from emp;
+------------+------------+------------+-------------+ | deptno | ename | sal | last_value | +------------+------------+------------+-------------+ | 10 | JACCKA | 5000 | 5000 | -- This row is the current row of the current window. | 10 | KING | 5000 | 5000 | -- This row is the current row of the current window. | 10 | CLARK | 2450 | 2450 | -- This row is the current row of the current window. | 10 | WELAN | 2450 | 2450 | -- This row is the current row of the current window. | 10 | TEBAGE | 1300 | 1300 | -- This row is the current row of the current window. | 10 | MILLER | 1300 | 1300 | -- This row is the current row of the current window. | 20 | SCOTT | 3000 | 3000 | -- This row is the current row of the current window. | 20 | FORD | 3000 | 3000 | -- This row is the current row of the current window. | 20 | JONES | 2975 | 2975 | -- This row is the current row of the current window. | 20 | ADAMS | 1100 | 1100 | -- This row is the current row of the current window. | 20 | SMITH | 800 | 800 | -- This row is the current row of the current window. | 30 | BLAKE | 2850 | 2850 | -- This row is the current row of the current window. | 30 | ALLEN | 1600 | 1600 | -- This row is the current row of the current window. | 30 | TURNER | 1500 | 1500 | -- This row is the current row of the current window. | 30 | MARTIN | 1250 | 1250 | -- This row is the current row of the current window. | 30 | WARD | 1250 | 1250 | -- This row is the current row of the current window. | 30 | JAMES | 950 | 950 | -- This row is the current row of the current window. +------------+------------+------------+-------------+
- If order by is not specified, the rows from the first row to the last row belong to the current
window. The value of the last row in the current window is returned.
NTH_VALUE
- Syntax
nth_value(<expr>, <number> [, <ignore_nulls>]) over ([partition_clause] [orderby_clause] [frame_clause])
- Description
This function returns the value of expr that corresponds to the Nth row in a window.
- Parameters
- expr: required. The expression that is used to calculate the returned result.
- number: required. The value is of the BIGINT type and must be an integer greater than or equal to 1. If the input value is 1, this function is equivalent to FIRST_VALUE.
- ignore_nulls: optional. A value of the BOOLEAN type. This parameter specifies whether to ignore NULL values. Default value: False. If this parameter is set to True, a non-NULL value of expr that corresponds to the Nth row of a window is returned.
- partition_clause, orderby_clause, and frame_clause: For more information about these parameters, see windowing_definition.
- Return value
A value of the same data type as expr is returned.
- Examples
Group all employees by department and return the sixth row of data in each group. Sample statement:
- If order by is not specified, the rows from the first row to the last row belong to the current
window. The value of the sixth row in the current window is returned.
The following result is returned:select deptno, ename, sal, nth_value(sal,6) over (partition by deptno) as nth_value from emp;
+------------+------------+------------+------------+ | deptno | ename | sal | nth_value | +------------+------------+------------+------------+ | 10 | TEBAGE | 1300 | 2450 | | 10 | CLARK | 2450 | 2450 | | 10 | KING | 5000 | 2450 | | 10 | MILLER | 1300 | 2450 | | 10 | JACCKA | 5000 | 2450 | | 10 | WELAN | 2450 | 2450 | -- This row is the sixth row of the current window. | 20 | FORD | 3000 | NULL | | 20 | SCOTT | 3000 | NULL | | 20 | SMITH | 800 | NULL | | 20 | ADAMS | 1100 | NULL | | 20 | JONES | 2975 | NULL | -- This current window has less than six rows, and NULL is returned. | 30 | TURNER | 1500 | 2850 | | 30 | JAMES | 950 | 2850 | | 30 | ALLEN | 1600 | 2850 | | 30 | WARD | 1250 | 2850 | | 30 | MARTIN | 1250 | 2850 | | 30 | BLAKE | 2850 | 2850 | -- This row is the sixth row of the current window. +------------+------------+------------+------------+
- If order by is specified, the rows from the first row to the current row belong to the current
window. The value of the sixth row in the current window is returned.
The following result is returned:select deptno, ename, sal, nth_value(sal,6) over (partition by deptno order by sal) as nth_value from emp;
+------------+------------+------------+------------+ | deptno | ename | sal | nth_value | +------------+------------+------------+------------+ | 10 | TEBAGE | 1300 | NULL | | 10 | MILLER | 1300 | NULL | -- This window has only two rows, and NULL is returned. | 10 | CLARK | 2450 | NULL | | 10 | WELAN | 2450 | NULL | | 10 | KING | 5000 | 5000 | | 10 | JACCKA | 5000 | 5000 | | 20 | SMITH | 800 | NULL | | 20 | ADAMS | 1100 | NULL | | 20 | JONES | 2975 | NULL | | 20 | SCOTT | 3000 | NULL | | 20 | FORD | 3000 | NULL | | 30 | JAMES | 950 | NULL | | 30 | MARTIN | 1250 | NULL | | 30 | WARD | 1250 | NULL | | 30 | TURNER | 1500 | NULL | | 30 | ALLEN | 1600 | NULL | | 30 | BLAKE | 2850 | 2850 | +------------+------------+------------+------------+
- If order by is not specified, the rows from the first row to the last row belong to the current
window. The value of the sixth row in the current window is returned.
CLUSTER_SAMPLE
- Syntax
boolean cluster_sample(bigint <N>) OVER ([partition_clause]) boolean cluster_sample(bigint <N>, bigint <M>) OVER ([partition_clause])
- Description
cluster_sample(bigint <N>)
: indicates that random N rows are sampled.cluster_sample(bigint <N>, bigint <M>)
: indicates that rows are sampled based on a specified ratio (M/N). The number of rows that are sampled is calculated by using the following formula:partition_row_count × M/N
.partition_row_count
indicates the number of rows in a partition.
- Parameters
- N: required. A constant of the BIGINT type. If N is set to NULL, NULL is returned.
- M: required. A constant of the BIGINT type. If M is set to NULL, NULL is returned.
- partition_clause: optional. For more information, see windowing_definition.
- Return value
A value of the BOOLEAN type is returned.
- Examples
If you want to sample about 20% of data entries in each group, execute the following statement:
The following result is returned:select deptno, sal from ( select deptno, sal, cluster_sample(5, 1) over (partition by deptno) as flag from emp ) sub where flag = true;
+------------+------------+ | deptno | sal | +------------+------------+ | 10 | 1300 | | 20 | 3000 | | 30 | 950 | +------------+------------+
COUNT
- Syntax
bigint count(*) over ([partition_clause] [orderby_clause] [frame_clause]) bigint count([distinct] <expr>[,...]) over ([partition_clause] [orderby_clause] [frame_clause])
- Description
count(*)
: the total number of rows to return.count([distinct] <expr>[,...])
: When you calculate the number of rows, the rows whose expr is NULL are ignored. If multiple expr parameters exist, any rows whose expr is NULL are ignored. In addition, if the distinct keyword is specified, the number of rows after deduplication is calculated. Any rows whose expr is NULL are ignored.
- Parameters
- expr: required. This parameter specifies the column whose values you want to count. All data types are supported. If the value for a row is NULL, this row is not used for the calculation. If the DISTINCT keyword is specified, the count value of distinct values is obtained.
- partition_clause, orderby_clause, and frame_clause: For more information about these parameters, see windowing_definition.
- Return value
A value of the BIGINT type is returned.
- Examples
- Example 1: Use the sal column to define a window. The order by clause is not specified.
This function returns the cumulative count from the first row to the last row in the
current window. The current window includes the rows that have the same sal value.
Sample statement:
The following result is returned:select sal, count(sal) over (partition by sal) as count from emp;
+------------+------------+ | sal | count | +------------+------------+ | 800 | 1 | | 950 | 1 | | 1100 | 1 | | 1250 | 2 | -- This row is the first row of this window. The value of COUNT for the first row is the cumulative count for the second row because the two rows have the same sal value. | 1250 | 2 | -- The return value is the cumulative count from the first row to the second row in the current window. | 1300 | 2 | | 1300 | 2 | | 1500 | 1 | | 1600 | 1 | | 2450 | 2 | | 2450 | 2 | | 2850 | 1 | | 2975 | 1 | | 3000 | 2 | | 3000 | 2 | | 5000 | 2 | | 5000 | 2 | +------------+------------+
- Example 2: In non-Hive-compatible mode, use the sal column to define a window. The
order by clause is specified. This function returns the cumulative count from the
first row to the current row in the current window. The current window includes the
rows that have the same sal value. Sample statements:
The following result is returned:-- Disable the Hive-compatible mode. set odps.sql.hive.compatible=false; -- Execute the following statement: select sal, count(sal) over (partition by sal order by sal) as count from emp;
+------------+------------+ | sal | count | +------------+------------+ | 800 | 1 | | 950 | 1 | | 1100 | 1 | | 1250 | 1 | -- This row is the first row of this window. The cumulative count for the first row is 1. | 1250 | 2 | -- The cumulative count for the second row is 2. | 1300 | 1 | | 1300 | 2 | | 1500 | 1 | | 1600 | 1 | | 2450 | 1 | | 2450 | 2 | | 2850 | 1 | | 2975 | 1 | | 3000 | 1 | | 3000 | 2 | | 5000 | 1 | | 5000 | 2 | +------------+------------+
- Example 3: In Hive-compatible mode, use the sal column to define a window. The order
by clause is specified. This function returns the cumulative count from the first
row to the last row in the current window. The current window includes the rows that
have the same sal value. Sample statements:
The following result is returned:-- Enable the Hive-compatible mode. set odps.sql.hive.compatible=true; -- Execute the following statement: select sal, count(sal) over (partition by sal order by sal) as count from emp;
+------------+------------+ | sal | count | +------------+------------+ | 800 | 1 | | 950 | 1 | | 1100 | 1 | | 1250 | 2 | -- This row is the first row of this window. The value of COUNT for the first row is the cumulative count for the second row because the two rows have the same sal value. | 1250 | 2 | -- The return value is the cumulative count from the first row to the second row in the current window. | 1300 | 2 | | 1300 | 2 | | 1500 | 1 | | 1600 | 1 | | 2450 | 2 | | 2450 | 2 | | 2850 | 1 | | 2975 | 1 | | 3000 | 2 | | 3000 | 2 | | 5000 | 2 | | 5000 | 2 | +------------+------------+
- Example 1: Use the sal column to define a window. The order by clause is not specified.
This function returns the cumulative count from the first row to the last row in the
current window. The current window includes the rows that have the same sal value.
Sample statement:
MIN
- Syntax
min(<expr>) over([partition_clause] [orderby_clause] [frame_clause])
- Description
This function returns the minimum value of expr in a window.
- Parameters
- expr: required. The expression that is used to calculate the minimum value. The input value can be of any data type other than BOOLEAN. If the value for a row is NULL, this row is not used for calculation.
- partition_clause, orderby_clause, and frame_clause: For more information about these parameters, see windowing_definition.
- Return value
A value of the same data type as expr is returned.
- Examples
- Example 1: Use the deptno column to define a window and obtain the minimum value of
the sal column. The order by clause is not specified. This function returns the minimum
value of the current window. The current window includes the rows that have the same
deptno value. Sample statement:
The following result is returned:select deptno, sal, min(sal) over (partition by deptno) from emp;
+------------+------------+------------+ | deptno | sal | _c2 | +------------+------------+------------+ | 10 | 1300 | 1300 | -- This row is the first row of this window. The return value is the minimum value among the values from the first row to the sixth row. | 10 | 2450 | 1300 | -- The return value is the minimum value among the values from the first row to the sixth row. | 10 | 5000 | 1300 | -- The return value is the minimum value among the values from the first row to the sixth row. | 10 | 1300 | 1300 | | 10 | 5000 | 1300 | | 10 | 2450 | 1300 | | 20 | 3000 | 800 | | 20 | 3000 | 800 | | 20 | 800 | 800 | | 20 | 1100 | 800 | | 20 | 2975 | 800 | | 30 | 1500 | 950 | | 30 | 950 | 950 | | 30 | 1600 | 950 | | 30 | 1250 | 950 | | 30 | 1250 | 950 | | 30 | 2850 | 950 | +------------+------------+------------+
- Example 2: Use the deptno column to define a window and obtain the minimum value of
the sal column. The order by clause is specified. This function returns the minimum
value among the values from the first row to the current row in the current window.
The current window includes the rows that have the same deptno value. Sample statement:
The following result is returned:select deptno, sal, min(sal) over (partition by deptno order by sal) from emp;
+------------+------------+------------+ | deptno | sal | _c2 | +------------+------------+------------+ | 10 | 1300 | 1300 | -- This row is the first row of this window. | 10 | 1300 | 1300 | -- The return value is the minimum value among the values in the first and second rows. | 10 | 2450 | 1300 | -- The return value is the minimum value among the values from the first row to the third row. | 10 | 2450 | 1300 | | 10 | 5000 | 1300 | | 10 | 5000 | 1300 | | 20 | 800 | 800 | | 20 | 1100 | 800 | | 20 | 2975 | 800 | | 20 | 3000 | 800 | | 20 | 3000 | 800 | | 30 | 950 | 950 | | 30 | 1250 | 950 | | 30 | 1250 | 950 | | 30 | 1500 | 950 | | 30 | 1600 | 950 | | 30 | 2850 | 950 | +------------+------------+------------+
- Example 1: Use the deptno column to define a window and obtain the minimum value of
the sal column. The order by clause is not specified. This function returns the minimum
value of the current window. The current window includes the rows that have the same
deptno value. Sample statement:
MAX
- Syntax
max(<expr>) over([partition_clause] [orderby_clause] [frame_clause])
- Description
This function returns the maximum value of expr in a window.
- Parameters
- expr: required. The expression that is used to calculate the maximum value. The input value can be of any data type other than BOOLEAN. If the value for a row is NULL, this row is not used for calculation.
- partition_clause, orderby_clause, and frame_clause: For more information about these parameters, see windowing_definition.
- Return value
A value of the same type as expr is returned.
- Examples
- Example 1: Use the deptno column to define a window and obtain the maximum value of
the sal column. The order by clause is not specified. This function returns the maximum
value of the current window. The current window includes the rows that have the same
deptno value. Sample statement:
The following result is returned:select deptno, sal, max(sal) over (partition by deptno) from emp;
+------------+------------+------------+ | deptno | sal | _c2 | +------------+------------+------------+ | 10 | 1300 | 5000 | -- This row is the first row of this window. The return value is the maximum value among the values from the first row to the sixth row. | 10 | 2450 | 5000 | -- The return value is the maximum value among the values from the first row to the sixth row. | 10 | 5000 | 5000 | -- The return value is the maximum value among the values from the first row to the sixth row. | 10 | 1300 | 5000 | | 10 | 5000 | 5000 | | 10 | 2450 | 5000 | | 20 | 3000 | 3000 | | 20 | 3000 | 3000 | | 20 | 800 | 3000 | | 20 | 1100 | 3000 | | 20 | 2975 | 3000 | | 30 | 1500 | 2850 | | 30 | 950 | 2850 | | 30 | 1600 | 2850 | | 30 | 1250 | 2850 | | 30 | 1250 | 2850 | | 30 | 2850 | 2850 | +------------+------------+------------+
- Example 2: Use the deptno column to define a window and obtain the maximum value of
the sal column. The order by clause is specified. This function returns the maximum
value among the values from the first row to the current row in the current window.
The current window includes the rows that have the same deptno value. Sample statement:
The following result is returned:select deptno, sal, max(sal) over (partition by deptno order by sal) from emp;
+------------+------------+------------+ | deptno | sal | _c2 | +------------+------------+------------+ | 10 | 1300 | 1300 | -- This row is the first row of this window. | 10 | 1300 | 1300 | -- The return value is the maximum value among the values in the first and second rows. | 10 | 2450 | 2450 | -- The return value is the maximum value among the values from the first row to the third row. | 10 | 2450 | 2450 | -- The return value is the maximum value among the values from the first row to the fourth row. | 10 | 5000 | 5000 | | 10 | 5000 | 5000 | | 20 | 800 | 800 | | 20 | 1100 | 1100 | | 20 | 2975 | 2975 | | 20 | 3000 | 3000 | | 20 | 3000 | 3000 | | 30 | 950 | 950 | | 30 | 1250 | 1250 | | 30 | 1250 | 1250 | | 30 | 1500 | 1500 | | 30 | 1600 | 1600 | | 30 | 2850 | 2850 | +------------+------------+------------+
- Example 1: Use the deptno column to define a window and obtain the maximum value of
the sal column. The order by clause is not specified. This function returns the maximum
value of the current window. The current window includes the rows that have the same
deptno value. Sample statement:
SUM
- Syntax
sum([distinct] <expr>) over ([partition_clause] [orderby_clause] [frame_clause])
- Description
This function returns the sum of expr in a window.
- Parameters
- expr: required. This parameter specifies the column whose sum you want to calculate. The
column is of the DOUBLE, DECIMAL, or BIGINT type.
- If an input value is of the STRING type, it is implicitly converted into a value of the DOUBLE type before calculation. If it is of another data type, an error is returned.
- If the input value is NULL, this row is not used for calculation.
- If the distinct keyword is specified, the sum of distinct values is calculated.
- partition_clause, orderby_clause, and frame_clause: For more information about these parameters, see windowing_definition.
- expr: required. This parameter specifies the column whose sum you want to calculate. The
column is of the DOUBLE, DECIMAL, or BIGINT type.
- Return value
- If input values are of the BIGINT type, a value of the BIGINT type is returned.
- If input values are of the DECIMAL type, a value of the DECIMAL type is returned.
- If input values are of the DOUBLE or STRING type, a value of the DOUBLE type is returned.
- If input values are NULL, NULL is returned.
- Examples
- Example 1: Use the deptno column to define a window and calculate the sum of the sal
column. The order by clause is not specified. This function returns the cumulative
sum of the current window. The current window includes the rows that have the same
deptno value. Sample statement:
The following result is returned:select deptno, sal, sum(sal) over (partition by deptno) from emp;
+------------+------------+------------+ | deptno | sal | _c2 | +------------+------------+------------+ | 10 | 1300 | 17500 | -- This row is the first row of this window. The return value is the cumulative sum of the values from the first row to the sixth row. | 10 | 2450 | 17500 | -- The return value is the cumulative sum of the values from the first row to the sixth row. | 10 | 5000 | 17500 | -- The return value is the cumulative sum of the values from the first row to the sixth row. | 10 | 1300 | 17500 | | 10 | 5000 | 17500 | | 10 | 2450 | 17500 | | 20 | 3000 | 10875 | | 20 | 3000 | 10875 | | 20 | 800 | 10875 | | 20 | 1100 | 10875 | | 20 | 2975 | 10875 | | 30 | 1500 | 9400 | | 30 | 950 | 9400 | | 30 | 1600 | 9400 | | 30 | 1250 | 9400 | | 30 | 1250 | 9400 | | 30 | 2850 | 9400 | +------------+------------+------------+
- Example 2: In non-Hive-compatible mode, use the deptno column to define a window and
calculate the sum of the sal column. The order by clause is specified. This function
returns the cumulative sum of the values from the first row to the current row in
the current window. The current window includes the rows that have the same deptno
value. Sample statements:
The following result is returned:-- Disable the Hive-compatible mode. set odps.sql.hive.compatible=false; -- Execute the following statement: select deptno, sal, sum(sal) over (partition by deptno order by sal) from emp;
+------------+------------+------------+ | deptno | sal | _c2 | +------------+------------+------------+ | 10 | 1300 | 1300 | -- This row is the first row of this window. | 10 | 1300 | 2600 | -- The return value is the cumulative sum of the values in the first and second rows. | 10 | 2450 | 5050 | -- The return value is the cumulative sum of the values from the first row to the third row. | 10 | 2450 | 7500 | | 10 | 5000 | 12500 | | 10 | 5000 | 17500 | | 20 | 800 | 800 | | 20 | 1100 | 1900 | | 20 | 2975 | 4875 | | 20 | 3000 | 7875 | | 20 | 3000 | 10875 | | 30 | 950 | 950 | | 30 | 1250 | 2200 | | 30 | 1250 | 3450 | | 30 | 1500 | 4950 | | 30 | 1600 | 6550 | | 30 | 2850 | 9400 | +------------+------------+------------+
- Example 3: In Hive-compatible mode, use the deptno column to define a window and calculate
the sum of the sal column. The order by clause is specified. This function returns
the cumulative sum of the values from the first row to the row that has the same sal
value as the current row in the current window. The sum values for the rows that have
the same sal value are the same. The current window includes the rows that have the
same deptno value. Sample statements:
The following result is returned:-- Enable the Hive-compatible mode. set odps.sql.hive.compatible=true; -- Execute the following statement: select deptno, sal, sum(sal) over (partition by deptno order by sal) from emp;
+------------+------------+------------+ | deptno | sal | _c2 | +------------+------------+------------+ | 10 | 1300 | 2600 | -- This row is the first row of this window. The sum for the first row is the cumulative sum of the values in the first and second rows because the two rows have the same sal value. | 10 | 1300 | 2600 | -- The return value is the cumulative sum of the values in the first and second rows. | 10 | 2450 | 7500 | -- The sum for the third row is the cumulative sum of the values from the first row to the fourth row because the third and fourth rows have the same sal value. | 10 | 2450 | 7500 | -- The return value is the cumulative sum of the values from the first row to the fourth row. | 10 | 5000 | 17500 | | 10 | 5000 | 17500 | | 20 | 800 | 800 | | 20 | 1100 | 1900 | | 20 | 2975 | 4875 | | 20 | 3000 | 10875 | | 20 | 3000 | 10875 | | 30 | 950 | 950 | | 30 | 1250 | 3450 | | 30 | 1250 | 3450 | | 30 | 1500 | 4950 | | 30 | 1600 | 6550 | | 30 | 2850 | 9400 | +------------+------------+------------+
- Example 1: Use the deptno column to define a window and calculate the sum of the sal
column. The order by clause is not specified. This function returns the cumulative
sum of the current window. The current window includes the rows that have the same
deptno value. Sample statement:
AVG
- Syntax
double avg([distinct] double <expr>) over ([partition_clause] [orderby_clause] [frame_clause]) decimal avg([distinct] decimal <expr>) over ([partition_clause] [orderby_clause] [frame_clause])
- Description
This function returns the average value of expr in a window.
- Parameters
- expr: required. The expression that is used to calculate the returned result. A value
of the DOUBLE or DECIMAL type.
- If the input value is of the STRING or BIGINT type, it is implicitly converted into a value of the DOUBLE type before calculation. If it is of another data type, an error is returned.
- If the input value is NULL, this row is not used for calculation.
- If the distinct keyword is specified, the average value of distinct values is calculated.
- partition_clause, orderby_clause, and frame_clause: For more information about these parameters, see windowing_definition.
- expr: required. The expression that is used to calculate the returned result. A value
of the DOUBLE or DECIMAL type.
- Return value
If the input value of expr is of the DECIMAL type, a value of the DECIMAL type is returned. If the input value is of another data type, a value of the DOUBLE type is returned. If the input value of expr is NULL, NULL is returned.
- Examples
- Example 1: Use the deptno column to define a window and calculate the average value
of the sal column. The order by clause is not specified. This function returns the
cumulative average value of the values from the first row to the last row in the current
window. The current window includes the rows that have the same deptno column value.
Sample statements:
The following result is returned:select deptno, sal, avg(sal) over (partition by deptno) from emp;
+------------+------------+------------+ | deptno | sal | _c2 | +------------+------------+------------+ | 10 | 1300 | 2916.6666666666665 | -- This row is the first row of this window. The return value is the cumulative average value of the values from the first row to the sixth row. | 10 | 2450 | 2916.6666666666665 | -- The return value is the cumulative average value of the values from the first row to the sixth row. | 10 | 5000 | 2916.6666666666665 | -- The return value is the cumulative average value of the values from the first row to the sixth row. | 10 | 1300 | 2916.6666666666665 | | 10 | 5000 | 2916.6666666666665 | | 10 | 2450 | 2916.6666666666665 | | 20 | 3000 | 2175.0 | | 20 | 3000 | 2175.0 | | 20 | 800 | 2175.0 | | 20 | 1100 | 2175.0 | | 20 | 2975 | 2175.0 | | 30 | 1500 | 1566.6666666666667 | | 30 | 950 | 1566.6666666666667 | | 30 | 1600 | 1566.6666666666667 | | 30 | 1250 | 1566.6666666666667 | | 30 | 1250 | 1566.6666666666667 | | 30 | 2850 | 1566.6666666666667 | +------------+------------+------------+
- Example 2: In non-Hive-compatible mode, use the deptno column to define a window and
calculate the average value of the sal column. The order by clause is specified. This
function returns the cumulative average value of the values from the first row to
the current row in the current window. The current window includes the rows that have
the same deptno value. Sample statements:
The following result is returned:-- Disable the Hive-compatible mode. set odps.sql.hive.compatible=false; -- Execute the following statement: select deptno, sal, avg(sal) over (partition by deptno order by sal) from emp;
+------------+------------+------------+ | deptno | sal | _c2 | +------------+------------+------------+ | 10 | 1300 | 1300.0 | -- This row is the first row of this window. | 10 | 1300 | 1300.0 | -- The return value is the cumulative average value of the values in the first and second rows. | 10 | 2450 | 1683.3333333333333 | -- The return value is the cumulative average value of the values from the first row to the third row. | 10 | 2450 | 1875.0 | -- The return value is the cumulative average value of the values from the first row to the fourth row. | 10 | 5000 | 2500.0 | -- The return value is the cumulative average value of the values from the first row to the fifth row. | 10 | 5000 | 2916.6666666666665 | -- The return value is the cumulative average value of the values from the first row to the sixth row. | 20 | 800 | 800.0 | | 20 | 1100 | 950.0 | | 20 | 2975 | 1625.0 | | 20 | 3000 | 1968.75 | | 20 | 3000 | 2175.0 | | 30 | 950 | 950.0 | | 30 | 1250 | 1100.0 | | 30 | 1250 | 1150.0 | | 30 | 1500 | 1237.5 | | 30 | 1600 | 1310.0 | | 30 | 2850 | 1566.6666666666667 | +------------+------------+------------+
- Example 3: In Hive-compatible mode, use the deptno column to define a window and calculate
the average value of the sal column. The order by clause is specified. This function
returns the cumulative average value of the values from the first row to the row that
has the same sal value as the current row in the current window. The average values
for the rows that have the same sal value are the same. The current window includes
the rows that have the same deptno value. Sample statements:
The following result is returned:-- Enable the Hive-compatible mode. set odps.sql.hive.compatible=true; -- Execute the following statement: select deptno, sal, avg(sal) over (partition by deptno order by sal) from emp;
+------------+------------+------------+ | deptno | sal | _c2 | +------------+------------+------------+ | 10 | 1300 | 1300.0 | -- This row is the first row of this window. The average value for the first row is the cumulative average value of the values in the first and second rows because the two rows have the same sal value. | 10 | 1300 | 1300.0 | -- The return value is the cumulative average value of the values in the first and second rows. | 10 | 2450 | 1875.0 | -- The average value for the third row is the cumulative average value of the values from the first row to the fourth row because the third and fourth rows have the same sal value. | 10 | 2450 | 1875.0 | -- The return value is the cumulative average value of the values from the first row to the fourth row. | 10 | 5000 | 2916.6666666666665 | | 10 | 5000 | 2916.6666666666665 | | 20 | 800 | 800.0 | | 20 | 1100 | 950.0 | | 20 | 2975 | 1625.0 | | 20 | 3000 | 2175.0 | | 20 | 3000 | 2175.0 | | 30 | 950 | 950.0 | | 30 | 1250 | 1150.0 | | 30 | 1250 | 1150.0 | | 30 | 1500 | 1237.5 | | 30 | 1600 | 1310.0 | | 30 | 2850 | 1566.6666666666667 | +------------+------------+------------+
- Example 1: Use the deptno column to define a window and calculate the average value
of the sal column. The order by clause is not specified. This function returns the
cumulative average value of the values from the first row to the last row in the current
window. The current window includes the rows that have the same deptno column value.
Sample statements:
MEDIAN
- Syntax
median(<expr>) over ([partition_clause] [orderby_clause] [frame_clause])
- Description
This function returns the median of expr in a window.
- Parameters
- expr: required. The expression that is used to calculate the median. A value of the DOUBLE
or DECIMAL type. The value of this parameter must be 1 to 255 digits in length.
- If the input value is of the STRING or BIGINT type, it is implicitly converted into a value of the DOUBLE type before calculation. If it is of another data type, an error is returned.
- If the input value is NULL, NULL is returned.
- partition_clause, orderby_clause, and frame_clause: For more information about these parameters, see windowing_definition.
- expr: required. The expression that is used to calculate the median. A value of the DOUBLE
or DECIMAL type. The value of this parameter must be 1 to 255 digits in length.
- Return value
A value of the DOUBLE or DECIMAL type is returned. If the values of all expressions that are specified by expr are NULL, NULL is returned.
- Examples
Use the deptno column to define a window and calculate the median value of the sal column. This function returns the median value of the current window. The current window includes the rows that have the same deptno value. Sample statement:
The following result is returned:select deptno, sal, median(sal) over (partition by deptno) from emp;
+------------+------------+------------+ | deptno | sal | _c2 | +------------+------------+------------+ | 10 | 1300 | 2450.0 | -- This row is the first row of this window. The return value is the median value of the values from the first row to the sixth row. | 10 | 2450 | 2450.0 | | 10 | 5000 | 2450.0 | | 10 | 1300 | 2450.0 | | 10 | 5000 | 2450.0 | | 10 | 2450 | 2450.0 | | 20 | 3000 | 2975.0 | | 20 | 3000 | 2975.0 | | 20 | 800 | 2975.0 | | 20 | 1100 | 2975.0 | | 20 | 2975 | 2975.0 | | 30 | 1500 | 1375.0 | | 30 | 950 | 1375.0 | | 30 | 1600 | 1375.0 | | 30 | 1250 | 1375.0 | | 30 | 1250 | 1375.0 | | 30 | 2850 | 1375.0 | +------------+------------+------------+
STDDEV
- Syntax
double stddev|stddev_pop([distinct] <expr>) over ([partition_clause] [orderby_clause] [frame_clause]) decimal stddev|stddev_pop([distinct] <expr>) over ([partition_clause] [orderby_clause] [frame_clause])
- Description
This function returns the population standard deviation of all input values. This function is also called STDDEV_POP.
- Parameters
- expr: required. The expression that is used to calculate the population standard deviation.
A value of the DOUBLE or DECIMAL type.
- If the input value is of the STRING or BIGINT type, it is implicitly converted into a value of the DOUBLE type before calculation. If it is of another data type, an error is returned.
- If the input value is NULL, this row is not used for calculation.
- If the distinct keyword is specified, the population standard deviation of distinct values is calculated.
- partition_clause, orderby_clause, and frame_clause: For more information about these parameters, see windowing_definition.
- expr: required. The expression that is used to calculate the population standard deviation.
A value of the DOUBLE or DECIMAL type.
- Return value
A value of the same data type as expr is returned. If the values of all expressions that are specified by expr are NULL, NULL is returned.
- Examples
- Example 1: Use the deptno column to define a window and calculate the population standard
deviation of the sal column. The order by clause is not specified. This function returns
the cumulative population standard deviation of the current window. The current window
includes the rows that have the same deptno value. Sample statement:
The following result is returned:select deptno, sal, stddev(sal) over (partition by deptno) from emp;
+------------+------------+------------+ | deptno | sal | _c2 | +------------+------------+------------+ | 10 | 1300 | 1546.1421524412158 | -- This row is the first row of this window. The return value is the cumulative population standard deviation of the values from the first row to the sixth row. | 10 | 2450 | 1546.1421524412158 | -- The return value is the cumulative population standard deviation of the values from the first row to the sixth row. | 10 | 5000 | 1546.1421524412158 | | 10 | 1300 | 1546.1421524412158 | | 10 | 5000 | 1546.1421524412158 | | 10 | 2450 | 1546.1421524412158 | | 20 | 3000 | 1004.7387720198718 | | 20 | 3000 | 1004.7387720198718 | | 20 | 800 | 1004.7387720198718 | | 20 | 1100 | 1004.7387720198718 | | 20 | 2975 | 1004.7387720198718 | | 30 | 1500 | 610.1001739241042 | | 30 | 950 | 610.1001739241042 | | 30 | 1600 | 610.1001739241042 | | 30 | 1250 | 610.1001739241042 | | 30 | 1250 | 610.1001739241042 | | 30 | 2850 | 610.1001739241042 | +------------+------------+------------+
- Example 2: In non-Hive-compatible mode, use the deptno column to define a window and
calculate the population standard deviation of the sal column. The order by clause
is specified. This function returns the cumulative population standard deviation of
the values from the first row to the current row in the current window. The current
window includes the rows that have the same deptno value. Sample statements:
The following result is returned:-- Disable the Hive-compatible mode. set odps.sql.hive.compatible=false; -- Execute the following statement: select deptno, sal, stddev(sal) over (partition by deptno order by sal) from emp;
+------------+------------+------------+ | deptno | sal | _c2 | +------------+------------+------------+ | 10 | 1300 | 0.0 | -- This row is the first row of this window. | 10 | 1300 | 0.0 | -- The return value is the cumulative population standard deviation of the values in the first and second rows. | 10 | 2450 | 542.1151989096865 | -- The return value is the cumulative population standard deviation of the values from the first row to the third row. | 10 | 2450 | 575.0 | -- The return value is the cumulative population standard deviation of the values from the first row to the fourth row. | 10 | 5000 | 1351.6656391282572 | | 10 | 5000 | 1546.1421524412158 | | 20 | 800 | 0.0 | | 20 | 1100 | 150.0 | | 20 | 2975 | 962.4188277460079 | | 20 | 3000 | 1024.2947268730811 | | 20 | 3000 | 1004.7387720198718 | | 30 | 950 | 0.0 | | 30 | 1250 | 150.0 | | 30 | 1250 | 141.4213562373095 | | 30 | 1500 | 194.8557158514987 | | 30 | 1600 | 226.71568097509268 | | 30 | 2850 | 610.1001739241042 | +------------+------------+------------+
- Example 3: In Hive-compatible mode, use the deptno column to define a window and calculate
the population standard deviation of the sal column. The order by clause is specified.
This function returns the cumulative population standard deviation of the values from
the first row to the row that has the same sal value as the current row in the current
window. The population standard deviations for the rows that have the same sal value
are the same. The current window includes the rows that have the same deptno value.
Sample statements:
The following result is returned:-- Enable the Hive-compatible mode. set odps.sql.hive.compatible=true; -- Execute the following statement: select deptno, sal, stddev(sal) over (partition by deptno order by sal) from emp;
+------------+------------+------------+ | deptno | sal | _c2 | +------------+------------+------------+ | 10 | 1300 | 0.0 | -- This row is the first row of this window. The population standard deviation for the first row is the cumulative population standard deviation of the values in the first and second rows because the two rows have the same sal value. | 10 | 1300 | 0.0 | -- The return value is the cumulative population standard deviation of the values in the first and second rows. | 10 | 2450 | 575.0 | -- The population standard deviation for the third row is the cumulative population standard deviation of the values from the first row to the fourth row because the third and fourth rows have the same sal value. | 10 | 2450 | 575.0 | -- The return value is the cumulative population standard deviation of the values from the first row to the fourth row. | 10 | 5000 | 1546.1421524412158 | | 10 | 5000 | 1546.1421524412158 | | 20 | 800 | 0.0 | | 20 | 1100 | 150.0 | | 20 | 2975 | 962.4188277460079 | | 20 | 3000 | 1004.7387720198718 | | 20 | 3000 | 1004.7387720198718 | | 30 | 950 | 0.0 | | 30 | 1250 | 141.4213562373095 | | 30 | 1250 | 141.4213562373095 | | 30 | 1500 | 194.8557158514987 | | 30 | 1600 | 226.71568097509268 | | 30 | 2850 | 610.1001739241042 | +------------+------------+------------+
- Example 1: Use the deptno column to define a window and calculate the population standard
deviation of the sal column. The order by clause is not specified. This function returns
the cumulative population standard deviation of the current window. The current window
includes the rows that have the same deptno value. Sample statement:
STDDEV_SAMP
- Syntax
double stddev_samp([distinct] <expr>) over([partition_clause] [orderby_clause] [frame_clause]) decimal stddev_samp([distinct] <expr>) over([partition_clause] [orderby_clause] [frame_clause])
- Description
This function returns the sample standard deviation of all input values.
- Parameters
- expr: required. This parameter specifies the expression that is used to calculate the
sample standard deviation. A value of the DOUBLE or DECIMAL type.
- If the input value is of the STRING or BIGINT type, it is implicitly converted into a value of the DOUBLE type before calculation. If it is of another data type, an error is returned.
- If the input value is NULL, this row is not used for calculation.
- If the distinct keyword is specified, the sample standard deviation of distinct values is calculated.
- partition_clause, orderby_clause, and frame_clause: For more information about these parameters, see windowing_definition.
- expr: required. This parameter specifies the expression that is used to calculate the
sample standard deviation. A value of the DOUBLE or DECIMAL type.
- Return value
A value of the same data type as expr is returned. If the values of all expressions specified by expr are NULL, NULL is returned. If the window has only one row of data whose expr value is not NULL, 0 is returned.
- Examples
- Example 1: Use the deptno column to define a window and calculate the sample standard
deviation of the sal column. The order by clause is not specified. This function returns
the cumulative sample standard deviation of the current window. The current window
includes the rows that have the same deptno value. Sample statement:
The following result is returned:select deptno, sal, stddev_samp(sal) over (partition by deptno) from emp;
+------------+------------+------------+ | deptno | sal | _c2 | +------------+------------+------------+ | 10 | 1300 | 1693.7138680032904 | -- This row is the first row of this window. The return value is the cumulative sample standard deviation of the values from the first row to the sixth row. | 10 | 2450 | 1693.7138680032904 | -- The return value is the cumulative sample standard deviation of the values from the first row to the sixth row. | 10 | 5000 | 1693.7138680032904 | -- The return value is the cumulative sample standard deviation of the values from the first row to the sixth row. | 10 | 1300 | 1693.7138680032904 | | 10 | 5000 | 1693.7138680032904 | | 10 | 2450 | 1693.7138680032904 | | 20 | 3000 | 1123.3320969330487 | | 20 | 3000 | 1123.3320969330487 | | 20 | 800 | 1123.3320969330487 | | 20 | 1100 | 1123.3320969330487 | | 20 | 2975 | 1123.3320969330487 | | 30 | 1500 | 668.331255192114 | | 30 | 950 | 668.331255192114 | | 30 | 1600 | 668.331255192114 | | 30 | 1250 | 668.331255192114 | | 30 | 1250 | 668.331255192114 | | 30 | 2850 | 668.331255192114 | +------------+------------+------------+
- Example 2: Use the deptno column to define a window and calculate the sample standard
deviation of the sal column. The order by clause is specified. This function returns
the cumulative sample standard deviation of the values from the first row to the current
row in the current window. The current window includes the rows that have the same
deptno value. Sample statement:
The following result is returned:select deptno, sal, stddev_samp(sal) over (partition by deptno order by sal) from emp;
+------------+------------+------------+ | deptno | sal | _c2 | +------------+------------+------------+ | 10 | 1300 | 0.0 | -- This row is the first row of this window. | 10 | 1300 | 0.0 | -- The return value is the cumulative sample standard deviation of the values in the first and second rows. | 10 | 2450 | 663.9528095680697 | -- The return value is the cumulative sample standard deviation of the values from the first row to the third row. | 10 | 2450 | 663.9528095680696 | | 10 | 5000 | 1511.2081259707413 | | 10 | 5000 | 1693.7138680032904 | | 20 | 800 | 0.0 | | 20 | 1100 | 212.13203435596427 | | 20 | 2975 | 1178.7175234126282 | | 20 | 3000 | 1182.7536725793752 | | 20 | 3000 | 1123.3320969330487 | | 30 | 950 | 0.0 | | 30 | 1250 | 212.13203435596427 | | 30 | 1250 | 173.20508075688772 | | 30 | 1500 | 225.0 | | 30 | 1600 | 253.4758371127315 | | 30 | 2850 | 668.331255192114 | +------------+------------+------------+
- Example 1: Use the deptno column to define a window and calculate the sample standard
deviation of the sal column. The order by clause is not specified. This function returns
the cumulative sample standard deviation of the current window. The current window
includes the rows that have the same deptno value. Sample statement: