Window functions perform aggregation or other calculations on a dynamically defined subset of data. They are often used for tasks such as processing time series data, ranking, and calculating moving averages. This topic describes the command syntax, parameters, and examples of the window functions that MaxCompute SQL supports.
Applicability
Window functions can appear only in
SELECTstatements.A window function cannot be nested with other window functions or aggregate functions.
Window functions cannot be used with aggregate functions at the same level.
Index
MaxCompute SQL supports the following window functions.
Function | Features |
Calculates the average value of data in a window. | |
Performs random sampling. Returns true if the row is sampled. | |
Counts the number of records in a window. | |
Calculates the cumulative distribution. | |
Calculates the rank. The ranks are consecutive. | |
Returns the value of the first row in the window frame of the current row. | |
Returns the value from the Nth row before the current row in a partition. | |
Returns the value of the last row in the window frame of the current row. | |
Returns the value from the Nth row after the current row in a partition. | |
Calculates the maximum value in a window. | |
Calculates the median value in a window. | |
Calculates the minimum value in a window. | |
Divides ordered data into N equal-sized groups and returns the group number (from 1 to N) for each row. | |
Returns the value of the Nth row in the window frame of the current row. | |
Calculates the rank as a percentage. | |
PERCENTILE_CONT | Calculates the exact percentile. |
PERCENTILE_DISC | Calculates a given percentile value by sorting the specified column in ascending order. |
Calculates the rank. The ranks may not be consecutive. | |
Calculates the row number, starting from 1. | |
Calculates the population standard deviation. It is an alias for STDDEV_POP. | |
Calculates the sample standard deviation. | |
Calculates the sum of data in a window. |
Window function syntax
The syntax for a window function is as follows.
<function_name>([distinct][<expression> [, ...]]) over (<window_definition>)
<function_name>([distinct][<expression> [, ...]]) over <window_name>function_name: A built-in window function, an aggregate function, or a user-defined aggregate function (UDAF).
expression: The format of the function, which must conform to the function's syntax.
windowing_definition: The window definition. For more information about the syntax, see windowing_definition.
window_name: The name of the window. You can use the
windowkeyword to define a custom window and assign a name to the windowing_definition. The syntax for a named window definition (named_window_def) is as follows:window <window_name> as (<window_definition>)The following describes the positions of custom statements in SQL:
select ... from ... [where ...] [group by ...] [having ...] named_window_def [order by ...] [limit ...]
windowing_definition
The syntax for windowing_definition is as follows.
--partition_clause:
[partition by <expression> [, ...]]
--orderby_clause:
[order by <expression> [asc|desc][nulls {first|last}] [, ...]]
[<frame_clause>]When you add a window function to a SELECT statement, the data is partitioned and sorted based on the partition by and order by clauses in the window definition. If you do not specify a partition by clause, all data is treated as a single partition. If you do not specify an order by clause, the order of data within a partition is not guaranteed. For each row, referred to as the current row, a segment of data is extracted from the partition based on the frame_clause to form the window for that row. The window function then computes a result for the current row based on the data in its window.
partition by <expression> [, ...]: Optional. Specifies the partition. Rows with the same partition key column values are in the same partition. For more information about the format, see Table operations.
order by <expression> [asc|desc][nulls {first|last}] [, ...]: Optional. Specifies how data is sorted within a partition.
NoteIf rows have the same
order byvalues, the sort order is not guaranteed. To ensure a consistent order, make sure that theorder byvalues are as unique as possible.frame_clause: Optional. Defines the boundaries of the window. For more information about frame_clause, see frame_clause.
filter_clause
The syntax for filter_clause is as follows.
FILTER (WHERE filter_condition)filter_condition is a Boolean expression, which is used in the same way as the WHERE clause in a select ... from ... where statement.
If you provide a FILTER clause, only rows for which the filter_condition evaluates to true are included in the window frame. For aggregate window functions (such as COUNT, SUM, AVG, MAX, and MIN), a value is still returned for every row. However, rows where the FILTER expression does not evaluate to true (such as NULL or false) are not included in the window frame for each row's calculation. NULL is treated as false.
Example
Prepare data
-- Create a table. CREATE TABLE IF NOT EXISTS mf_window_fun(key BIGINT,value BIGINT) STORED AS ALIORC; -- Insert data. insert into mf_window_fun values (1,100),(2,200),(1,150),(2,250),(3,300),(4,400),(5,500),(6,600),(7,700); -- Query data from the mf_window_fun table. select * from mf_window_fun; -- The following result is returned: +------------+------------+ | key | value | +------------+------------+ | 1 | 100 | | 2 | 200 | | 1 | 150 | | 2 | 250 | | 3 | 300 | | 4 | 400 | | 5 | 500 | | 6 | 600 | | 7 | 700 | +------------+------------+Query the cumulative sum of rows whose value is greater than 100 within the window.
select key,sum(value) filter(where value > 100) over (partition by key order by key) from mf_window_fun;The following result is returned:
+------------+------------+ | key | _c1 | +------------+------------+ | 1 | NULL | -- Skipped | 1 | 150 | | 2 | 200 | | 2 | 450 | | 3 | 300 | | 4 | 400 | | 5 | 500 | | 6 | 600 | | 7 | 700 | +------------+------------+
The FILTER clause does not remove rows that fail the filter_condition from the query result. It only excludes them from the window function's calculation. To remove these rows from the final output, you must use a
select ... from ... whereclause. The window function's value for an excluded row is not 0 or NULL. Instead, it inherits the value from the preceding row.You can use the FILTER clause only with aggregate window functions, such as COUNT, SUM, AVG, MAX, MIN, and WM_CONCAT. You cannot use the FILTER clause with non-aggregate functions such as RANK, ROW_NUMBER, or NTILE. Otherwise, a syntax error occurs.
To use the FILTER syntax in a window function, you must enable the following session flag:
set odps.sql.window.function.newimpl=true;.
frame_clause
The syntax for frame_clause is as follows.
-- Format 1
{ROWS|RANGE|GROUPS} <frame_start> [<frame_exclusion>]
-- Format 2
{ROWS|RANGE|GROUPS} between <frame_start> and <frame_end> [<frame_exclusion>]frame_clause is a closed interval that defines the boundaries of the window. It includes the rows at the frame_start and frame_end positions.
ROWS|RANGE|GROUPS: Required. The type of frame_clause. The implementation rules for frame_start and frame_end vary by type.
ROWS: Defines window boundaries based on the number of rows.
RANGE: Defines window boundaries by comparing the values of the
order bycolumn. Typically, anorder byclause is specified in the window definition. If noorder byclause is specified, all rows in a partition have the sameorder bycolumn value. NULL values are considered equal.GROUPS: All rows in a partition with the same
order bycolumn value form a GROUP. If noorder byclause is specified, all rows in the partition form a single GROUP. NULL values are considered equal.
frame_start and frame_end: Specify the start and end boundaries of the window. frame_start is required. frame_end is optional. If omitted, the default value is CURRENT ROW.
The position specified by frame_start must precede the position specified by frame_end, or match the position of frame_end. In other words, frame_start is closer to the head of the partition than frame_end. The head of the partition is the position of the first row after the data is sorted by the
order bystatement in the window definition. The following table describes the valid values and logic for frame_start and frame_end when the frame_clause type is ROWS, RANGE, or GROUPS.frame_clause type
frame_start/frame_end value
Description
ROWS, RANGE, GROUPS
UNBOUNDED PRECEDING
The first row of the partition. Counting starts from 1.
UNBOUNDED FOLLOWING
The last row of the partition.
ROWS
CURRENT ROW
The position of the current row. Each row of data corresponds to a window function result. The current row is the row for which the window function result is being calculated.
offset PRECEDING
The position that is
offsetrows before the current row, towards the head of the partition. For example,0 PRECEDINGrefers to the current row, and1 PRECEDINGrefers to the preceding row.offsetmust be a non-negative integer.offset FOLLOWING
The position that is
offsetrows after the current row, towards the tail of the partition. For example,0 FOLLOWINGrefers to the current row, and1 FOLLOWINGrefers to the next row.offsetmust be a non-negative integer.RANGE
CURRENT ROW
As frame_start, it refers to the position of the first row that has the same
order bycolumn value as the current row.As frame_end, it refers to the position of the last row that has the same
order bycolumn value as the current row.
offset PRECEDING
The positions of frame_start and frame_end depend on the
order bysequence. Assume the window is sorted by X. Xi represents the X value of the i-th row, and Xc represents the X value of the current row. The positions are described as follows:When
order byis ascending:frame_start: The position of the first row that satisfies
Xc - Xi <= offset.frame_end: The position of the last row that satisfies
Xc - Xi >= offset.
When
order byis descending:frame_start: The position of the first row that satisfies
Xi - Xc <= offset.frame_end: The position of the last row that satisfies
Xi - Xc >= offset.
The supported data types for the
order bycolumn are: TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, DECIMAL, DATETIME, DATE, and TIMESTAMP.The syntax for the
offsetof date types is as follows:N: Represents N days or N seconds. It must be a non-negative integer. For DATETIME and TIMESTAMP, it represents N seconds. For DATE, it represents N days.interval 'N' {YEAR\MONTH\DAY\HOUR\MINUTE\SECOND}: Represents N years, months, days, hours, minutes, or seconds. For example,INTERVAL '3' YEARrepresents 3 years.INTERVAL 'N-M' YEAR TO MONTH: Represents N years and M months. For example,INTERVAL '1-3' YEAR TO MONTHrepresents 1 year and 3 months.INTERVAL 'D[ H[:M[:S[:N]]]]' DAY TO SECOND: Represents D days, H hours, M minutes, S seconds, and N nanoseconds. For example,INTERVAL '1 2:3:4:5' DAY TO SECONDrepresents 1 day, 2 hours, 3 minutes, 4 seconds, and 5 nanoseconds.
offset FOLLOWING
The positions of frame_start and frame_end depend on the
order bysequence. Assume the window is sorted by X. Xi represents the X value of the i-th row, and Xc represents the X value of the current row. The positions are described as follows:When
order byis ascending:frame_start: The position of the first row that satisfies
Xi - Xc >= offset.frame_end: The position of the last row that satisfies
Xi - Xc <= offset.
When
order byis descending:frame_start: The position of the first row that satisfies
Xc - Xi >= offset.frame_end: The position of the last row that satisfies
Xc - Xi <= offset.
GROUPS
CURRENT ROW
As frame_start, it refers to the first row of the GROUP to which the current row belongs.
As frame_end, it refers to the last row of the GROUP to which the current row belongs.
offset PRECEDING
As frame_start, it refers to the position of the first row in the GROUP that is
offsetGROUPs before the current row's GROUP, towards the head of the partition.As frame_end, it refers to the position of the last row in the GROUP that is
offsetGROUPs before the current row's GROUP, towards the head of the partition.
NoteYou cannot set frame_start to UNBOUNDED FOLLOWING or frame_end to UNBOUNDED PRECEDING.
offset FOLLOWING
As frame_start, it refers to the position of the first row in the GROUP that is
offsetGROUPs after the current row's GROUP, towards the tail of the partition.As frame_end, it refers to the position of the last row in the GROUP that is
offsetGROUPs after the current row's GROUP, towards the tail of the partition.
NoteYou cannot set frame_start to UNBOUNDED FOLLOWING or frame_end to UNBOUNDED PRECEDING.
frame_exclusion: Optional. Used to exclude a portion of data from the window. Valid values are:
EXCLUDE NO OTHERS: Does not exclude any data.
EXCLUDE CURRENT ROW: Excludes the current row.
EXCLUDE GROUP: Excludes the entire GROUP, which means all data in the partition that has the same
order byvalue as the current row.EXCLUDE TIES: Excludes all rows that share the same order by value as the current row, except for the current row itself.
Default frame_clause
If you do not specify a frame_clause, MaxCompute uses a default frame_clause to determine the boundaries of the data included in the window. The default frame_clause is:
When Hive compatible mode is enabled (
set odps.sql.hive.compatible=true;), the default frame_clause is as follows, which is the same as in most other SQL systems.RANGE between UNBOUNDED PRECEDING and CURRENT ROW EXCLUDE NO OTHERSWhen Hive compatible mode is disabled (
set odps.sql.hive.compatible=false;), if anorder byclause is specified and the window function is AVG, COUNT, MAX, MIN, STDDEV, STDDEV_POP, STDDEV_SAMP, or SUM, the default frame_clause is of the ROWS type.ROWS between UNBOUNDED PRECEDING and CURRENT ROW EXCLUDE NO OTHERS
Window boundary examples
Assume a table `tbl` has the structure pid: bigint, oid: bigint, rid: bigint and contains the following data:
+------------+------------+------------+
| pid | oid | rid |
+------------+------------+------------+
| 1 | NULL | 1 |
| 1 | NULL | 2 |
| 1 | 1 | 3 |
| 1 | 1 | 4 |
| 1 | 2 | 5 |
| 1 | 4 | 6 |
| 1 | 7 | 7 |
| 1 | 11 | 8 |
| 2 | NULL | 9 |
| 2 | NULL | 10 |
+------------+------------+------------+ROW type window
Window definition 1
partition by pid order by oid ROWS between UNBOUNDED PRECEDING and CURRENT ROW --SQL statement is as follows. select pid, oid, rid, collect_list(rid) over(partition by pid order by oid ROWS between UNBOUNDED PRECEDING and CURRENT ROW) as window from tbl;The following result is returned:
+------------+------------+------------+--------+ | pid | oid | rid | window | +------------+------------+------------+--------+ | 1 | NULL | 1 | [1] | | 1 | NULL | 2 | [1, 2] | | 1 | 1 | 3 | [1, 2, 3] | | 1 | 1 | 4 | [1, 2, 3, 4] | | 1 | 2 | 5 | [1, 2, 3, 4, 5] | | 1 | 4 | 6 | [1, 2, 3, 4, 5, 6] | | 1 | 7 | 7 | [1, 2, 3, 4, 5, 6, 7] | | 1 | 11 | 8 | [1, 2, 3, 4, 5, 6, 7, 8] | | 2 | NULL | 9 | [9] | | 2 | NULL | 10 | [9, 10] | +------------+------------+------------+--------+Window definition 2
partition by pid order by oid ROWS between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING --SQL statement is as follows. select pid, oid, rid, collect_list(rid) over(partition by pid order by oid ROWS between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING) as window from tbl;The following result is returned:
+------------+------------+------------+--------+ | pid | oid | rid | window | +------------+------------+------------+--------+ | 1 | NULL | 1 | [1, 2, 3, 4, 5, 6, 7, 8] | | 1 | NULL | 2 | [1, 2, 3, 4, 5, 6, 7, 8] | | 1 | 1 | 3 | [1, 2, 3, 4, 5, 6, 7, 8] | | 1 | 1 | 4 | [1, 2, 3, 4, 5, 6, 7, 8] | | 1 | 2 | 5 | [1, 2, 3, 4, 5, 6, 7, 8] | | 1 | 4 | 6 | [1, 2, 3, 4, 5, 6, 7, 8] | | 1 | 7 | 7 | [1, 2, 3, 4, 5, 6, 7, 8] | | 1 | 11 | 8 | [1, 2, 3, 4, 5, 6, 7, 8] | | 2 | NULL | 9 | [9, 10] | | 2 | NULL | 10 | [9, 10] | +------------+------------+------------+--------+Window definition 3
partition by pid order by oid ROWS between 1 FOLLOWING and 3 FOLLOWING --SQL statement is as follows. select pid, oid, rid, collect_list(rid) over(partition by pid order by oid ROWS between 1 FOLLOWING and 3 FOLLOWING) as window from tbl;The following result is returned:
+------------+------------+------------+--------+ | pid | oid | rid | window | +------------+------------+------------+--------+ | 1 | NULL | 1 | [2, 3, 4] | | 1 | NULL | 2 | [3, 4, 5] | | 1 | 1 | 3 | [4, 5, 6] | | 1 | 1 | 4 | [5, 6, 7] | | 1 | 2 | 5 | [6, 7, 8] | | 1 | 4 | 6 | [7, 8] | | 1 | 7 | 7 | [8] | | 1 | 11 | 8 | NULL | | 2 | NULL | 9 | [10] | | 2 | NULL | 10 | NULL | +------------+------------+------------+--------+Window definition 4
partition by pid order by oid ROWS between UNBOUNDED PRECEDING and CURRENT ROW EXCLUDE CURRENT ROW --SQL statement is as follows. select pid, oid, rid, collect_list(rid) over(partition by pid order by oid ROWS between UNBOUNDED PRECEDING and CURRENT ROW EXCLUDE CURRENT ROW) as window from tbl;The following result is returned:
+------------+------------+------------+--------+ | pid | oid | rid | window | +------------+------------+------------+--------+ | 1 | NULL | 1 | NULL | | 1 | NULL | 2 | [1] | | 1 | 1 | 3 | [1, 2] | | 1 | 1 | 4 | [1, 2, 3] | | 1 | 2 | 5 | [1, 2, 3, 4] | | 1 | 4 | 6 | [1, 2, 3, 4, 5] | | 1 | 7 | 7 | [1, 2, 3, 4, 5, 6] | | 1 | 11 | 8 | [1, 2, 3, 4, 5, 6, 7] | | 2 | NULL | 9 | NULL | | 2 | NULL | 10 | [9] | +------------+------------+------------+--------+Window definition 5
partition by pid order by oid ROWS between UNBOUNDED PRECEDING and CURRENT ROW EXCLUDE GROUP --SQL statement is as follows. select pid, oid, rid, collect_list(rid) over(partition by pid order by oid ROWS between UNBOUNDED PRECEDING and CURRENT ROW EXCLUDE GROUP) as window from tbl;The following result is returned:
+------------+------------+------------+--------+ | pid | oid | rid | window | +------------+------------+------------+--------+ | 1 | NULL | 1 | NULL | | 1 | NULL | 2 | NULL | | 1 | 1 | 3 | [1, 2] | | 1 | 1 | 4 | [1, 2] | | 1 | 2 | 5 | [1, 2, 3, 4] | | 1 | 4 | 6 | [1, 2, 3, 4, 5] | | 1 | 7 | 7 | [1, 2, 3, 4, 5, 6] | | 1 | 11 | 8 | [1, 2, 3, 4, 5, 6, 7] | | 2 | NULL | 9 | NULL | | 2 | NULL | 10 | NULL | +------------+------------+------------+--------+Window definition 6
partition by pid order by oid ROWS between UNBOUNDED PRECEDING and CURRENT ROW EXCLUDE TIES --SQL statement is as follows. select pid, oid, rid, collect_list(rid) over(partition by pid order by oid ROWS between UNBOUNDED PRECEDING and CURRENT ROW EXCLUDE TIES) as window from tbl;The following result is returned:
+------------+------------+------------+--------+ | pid | oid | rid | window | +------------+------------+------------+--------+ | 1 | NULL | 1 | [1] | | 1 | NULL | 2 | [2] | | 1 | 1 | 3 | [1, 2, 3] | | 1 | 1 | 4 | [1, 2, 4] | | 1 | 2 | 5 | [1, 2, 3, 4, 5] | | 1 | 4 | 6 | [1, 2, 3, 4, 5, 6] | | 1 | 7 | 7 | [1, 2, 3, 4, 5, 6, 7] | | 1 | 11 | 8 | [1, 2, 3, 4, 5, 6, 7, 8] | | 2 | NULL | 9 | [9] | | 2 | NULL | 10 | [10] | +------------+------------+------------+--------+By comparing the
windowresults for rows whereridis 2, 4, and 10 in this example and the previous one, you can observe the difference between EXCLUDE CURRENT ROW and EXCLUDE GROUP. For EXCLUDE GROUP, in the same partition (wherepidis equal), all data with the sameoidas the current row is excluded.
RANGE type window
Window definition 1
partition by pid order by oid RANGE between UNBOUNDED PRECEDING and CURRENT ROW --SQL statement is as follows. select pid, oid, rid, collect_list(rid) over(partition by pid order by oid RANGE between UNBOUNDED PRECEDING and CURRENT ROW) as window from tbl;The following result is returned:
+------------+------------+------------+--------+ | pid | oid | rid | window | +------------+------------+------------+--------+ | 1 | NULL | 1 | [1, 2] | | 1 | NULL | 2 | [1, 2] | | 1 | 1 | 3 | [1, 2, 3, 4] | | 1 | 1 | 4 | [1, 2, 3, 4] | | 1 | 2 | 5 | [1, 2, 3, 4, 5] | | 1 | 4 | 6 | [1, 2, 3, 4, 5, 6] | | 1 | 7 | 7 | [1, 2, 3, 4, 5, 6, 7] | | 1 | 11 | 8 | [1, 2, 3, 4, 5, 6, 7, 8] | | 2 | NULL | 9 | [9, 10] | | 2 | NULL | 10 | [9, 10] | +------------+------------+------------+--------+When CURRENT ROW is used as frame_end, it includes all rows up to the last row that has the same
order byvalueoidas the current row. Therefore, thewindowresult for the record whereridis 1 is [1, 2].Window definition 2
partition by pid order by oid RANGE between CURRENT ROW and UNBOUNDED FOLLOWING --SQL statement is as follows. select pid, oid, rid, collect_list(rid) over(partition by pid order by oid RANGE between CURRENT ROW and UNBOUNDED FOLLOWING) as window from tbl;The following result is returned:
+------------+------------+------------+--------+ | pid | oid | rid | window | +------------+------------+------------+--------+ | 1 | NULL | 1 | [1, 2, 3, 4, 5, 6, 7, 8] | | 1 | NULL | 2 | [1, 2, 3, 4, 5, 6, 7, 8] | | 1 | 1 | 3 | [3, 4, 5, 6, 7, 8] | | 1 | 1 | 4 | [3, 4, 5, 6, 7, 8] | | 1 | 2 | 5 | [5, 6, 7, 8] | | 1 | 4 | 6 | [6, 7, 8] | | 1 | 7 | 7 | [7, 8] | | 1 | 11 | 8 | [8] | | 2 | NULL | 9 | [9, 10] | | 2 | NULL | 10 | [9, 10] | +------------+------------+------------+--------+Window definition 3
partition by pid order by oid RANGE between 3 PRECEDING and 1 PRECEDING --SQL statement is as follows. select pid, oid, rid, collect_list(rid) over(partition by pid order by oid RANGE between 3 PRECEDING and 1 PRECEDING) as window from tbl;The following result is returned:
+------------+------------+------------+--------+ | pid | oid | rid | window | +------------+------------+------------+--------+ | 1 | NULL | 1 | [1, 2] | | 1 | NULL | 2 | [1, 2] | | 1 | 1 | 3 | NULL | | 1 | 1 | 4 | NULL | | 1 | 2 | 5 | [3, 4] | | 1 | 4 | 6 | [3, 4, 5] | | 1 | 7 | 7 | [6] | | 1 | 11 | 8 | NULL | | 2 | NULL | 9 | [9, 10] | | 2 | NULL | 10 | [9, 10] | +------------+------------+------------+--------+For rows where the
order byvalueoidis NULL, if you useoffset {PRECEDING|FOLLOWING}andoffsetis not UNBOUNDED, the boundary is determined as follows: When used as frame_start, it points to the first row with a NULLorder byvalue in the partition. When used as frame_end, it points to the last row with a NULLorder byvalue.
GROUPS type window
The window definition is as follows:
partition by pid order by oid GROUPS between 2 PRECEDING and CURRENT ROW --SQL statement is as follows. select pid, oid, rid, collect_list(rid) over(partition by pid order by oid GROUPS between 2 PRECEDING and CURRENT ROW) as window from tbl;The following result is returned:
+------------+------------+------------+--------+ | pid | oid | rid | window | +------------+------------+------------+--------+ | 1 | NULL | 1 | [1, 2] | | 1 | NULL | 2 | [1, 2] | | 1 | 1 | 3 | [1, 2, 3, 4] | | 1 | 1 | 4 | [1, 2, 3, 4] | | 1 | 2 | 5 | [1, 2, 3, 4, 5] | | 1 | 4 | 6 | [3, 4, 5, 6] | | 1 | 7 | 7 | [5, 6, 7] | | 1 | 11 | 8 | [6, 7, 8] | | 2 | NULL | 9 | [9, 10] | | 2 | NULL | 10 | [9, 10] | +------------+------------+------------+--------+
Sample data
To help you understand how to use each function, this topic provides source data and function examples based on that data. The following example commands create a table named emp and add data to it:
create table if not exists emp
(empno bigint,
ename string,
job string,
mgr bigint,
hiredate datetime,
sal bigint,
comm bigint,
deptno bigint);
tunnel upload emp.txt emp;The emp.txt file contains the following data:
7369,SMITH,CLERK,7902,1980-12-17 00:00:00,800,,20
7499,ALLEN,SALESMAN,7698,1981-02-20 00:00:00,1600,300,30
7521,WARD,SALESMAN,7698,1981-02-22 00:00:00,1250,500,30
7566,JONES,MANAGER,7839,1981-04-02 00:00:00,2975,,20
7654,MARTIN,SALESMAN,7698,1981-09-28 00:00:00,1250,1400,30
7698,BLAKE,MANAGER,7839,1981-05-01 00:00:00,2850,,30
7782,CLARK,MANAGER,7839,1981-06-09 00:00:00,2450,,10
7788,SCOTT,ANALYST,7566,1987-04-19 00:00:00,3000,,20
7839,KING,PRESIDENT,,1981-11-17 00:00:00,5000,,10
7844,TURNER,SALESMAN,7698,1981-09-08 00:00:00,1500,0,30
7876,ADAMS,CLERK,7788,1987-05-23 00:00:00,1100,,20
7900,JAMES,CLERK,7698,1981-12-03 00:00:00,950,,30
7902,FORD,ANALYST,7566,1981-12-03 00:00:00,3000,,20
7934,MILLER,CLERK,7782,1982-01-23 00:00:00,1300,,10
7948,JACCKA,CLERK,7782,1981-04-12 00:00:00,5000,,10
7956,WELAN,CLERK,7649,1982-07-20 00:00:00,2450,,10
7956,TEBAGE,CLERK,7748,1982-12-30 00:00:00,1300,,10AVG
Syntax
double avg([distinct] double <expr>) over ([partition_clause] [orderby_clause] [frame_clause]) decimal avg([distinct] decimal <expr>) over ([partition_clause] [orderby_clause] [frame_clause])Description
Returns the average value of expr in a window.
Parameters
expr: Required. The expression for which to calculate the result. It must be of the DOUBLE or DECIMAL type.
If the input value is of the STRING or BIGINT type, it is implicitly converted to the DOUBLE type for calculation. An error is returned for other data types.
If the input value is NULL, the row is not included in the calculation.
If you specify the distinct keyword, the function calculates the average of unique values.
partition_clause, orderby_clause, and frame_clause: For more information, see windowing_definition.
Return value
If expr is of the DECIMAL type, a DECIMAL value is returned. Otherwise, a DOUBLE value is returned. If all values of expr are NULL, NULL is returned.
Examples
Example 1: Partition by department (deptno) and calculate the average salary (sal) without sorting. The function calculates the average salary for the entire partition (all rows with the same deptno). The command is as follows:
select deptno, sal, avg(sal) over (partition by deptno) from emp;The following result is returned:
+------------+------------+------------+ | deptno | sal | _c2 | +------------+------------+------------+ | 10 | 1300 | 2916.6666666666665 | -- This is the first row of the window. The value is the cumulative average from the first to the sixth row. | 10 | 2450 | 2916.6666666666665 | -- The value is the cumulative average from the first to the sixth row. | 10 | 5000 | 2916.6666666666665 | -- The value is the cumulative average from the first to the sixth row. | 10 | 1300 | 2916.6666666666665 | | 10 | 5000 | 2916.6666666666665 | | 10 | 2450 | 2916.6666666666665 | | 20 | 3000 | 2175.0 | | 20 | 3000 | 2175.0 | | 20 | 800 | 2175.0 | | 20 | 1100 | 2175.0 | | 20 | 2975 | 2175.0 | | 30 | 1500 | 1566.6666666666667 | | 30 | 950 | 1566.6666666666667 | | 30 | 1600 | 1566.6666666666667 | | 30 | 1250 | 1566.6666666666667 | | 30 | 1250 | 1566.6666666666667 | | 30 | 2850 | 1566.6666666666667 | +------------+------------+------------+Example 2: In non-Hive compatible mode, partition by department (deptno), sort by salary (sal), and calculate the average salary. The function calculates a running average from the first row of the partition to the current row. The commands are as follows:
-- Disable Hive compatible mode. set odps.sql.hive.compatible=false; -- Execute the following SQL command. select deptno, sal, avg(sal) over (partition by deptno order by sal) from emp;The following result is returned:
+------------+------------+------------+ | deptno | sal | _c2 | +------------+------------+------------+ | 10 | 1300 | 1300.0 | -- First row of the window. | 10 | 1300 | 1300.0 | -- Cumulative average from the first to the second row. | 10 | 2450 | 1683.3333333333333 | -- Cumulative average from the first to the third row. | 10 | 2450 | 1875.0 | -- Cumulative average from the first to the fourth row. | 10 | 5000 | 2500.0 | -- Cumulative average from the first to the fifth row. | 10 | 5000 | 2916.6666666666665 | -- Cumulative average from the first to the sixth row. | 20 | 800 | 800.0 | | 20 | 1100 | 950.0 | | 20 | 2975 | 1625.0 | | 20 | 3000 | 1968.75 | | 20 | 3000 | 2175.0 | | 30 | 950 | 950.0 | | 30 | 1250 | 1100.0 | | 30 | 1250 | 1150.0 | | 30 | 1500 | 1237.5 | | 30 | 1600 | 1310.0 | | 30 | 2850 | 1566.6666666666667 | +------------+------------+------------+Example 3: In Hive compatible mode, partition by department (deptno), sort by salary (sal), and calculate the average salary. The function calculates a running average from the first row of the partition to the last peer of the current row (rows with the same sal have the same average). The commands are as follows:
-- Enable Hive compatible mode. set odps.sql.hive.compatible=true; -- Execute the following SQL command. select deptno, sal, avg(sal) over (partition by deptno order by sal) from emp;The following result is returned:
+------------+------------+------------+ | deptno | sal | _c2 | +------------+------------+------------+ | 10 | 1300 | 1300.0 | -- First row of the window. Since the sal of the first and second rows are the same, the average for the first row is the cumulative average of the first two rows. | 10 | 1300 | 1300.0 | -- Cumulative average from the first to the second row. | 10 | 2450 | 1875.0 | -- Since the sal of the third and fourth rows are the same, the average for the third row is the cumulative average of the first four rows. | 10 | 2450 | 1875.0 | -- Cumulative average from the first to the fourth row. | 10 | 5000 | 2916.6666666666665 | | 10 | 5000 | 2916.6666666666665 | | 20 | 800 | 800.0 | | 20 | 1100 | 950.0 | | 20 | 2975 | 1625.0 | | 20 | 3000 | 2175.0 | | 20 | 3000 | 2175.0 | | 30 | 950 | 950.0 | | 30 | 1250 | 1150.0 | | 30 | 1250 | 1150.0 | | 30 | 1500 | 1237.5 | | 30 | 1600 | 1310.0 | | 30 | 2850 | 1566.6666666666667 | +------------+------------+------------+
CLUSTER_SAMPLE
Syntax
boolean cluster_sample(bigint <N>) OVER ([partition_clause]) boolean cluster_sample(bigint <N>, bigint <M>) OVER ([partition_clause])Description
cluster_sample(bigint <N>): Randomly samples N rows from the partition.cluster_sample(bigint <N>, bigint <M>): Randomly samples a fraction of rows (M/N) from the partition. The number of rows sampled is approximatelypartition_row_count × M / N, wherepartition_row_countis the number of rows in the partition.
Parameters
N: Required. A BIGINT constant. If N is NULL, the return value is NULL.
M: Required. A BIGINT constant. If M is NULL, the return value is NULL.
partition_clause: Optional. For more information, see windowing_definition.
Return value
Returns a BOOLEAN value.
Example
To sample approximately 20% of the rows from each group, use the following command:
select deptno, sal from ( select deptno, sal, cluster_sample(5, 1) over (partition by deptno) as flag from emp ) sub where flag = true;The following result is returned:
+------------+------------+ | deptno | sal | +------------+------------+ | 10 | 1300 | | 20 | 3000 | | 30 | 950 | +------------+------------+
COUNT
Syntax
-- Calculate the number of records.
BIGINT COUNT([DISTINCT|ALL] <colname>)
-- Calculate the number of records in a window.
BIGINT COUNT(*) OVER ([partition_clause] [orderby_clause] [frame_clause])
BIGINT COUNT([DISTINCT] <expr>[,...]) OVER ([partition_clause] [orderby_clause] [frame_clause])Parameters
DISTINCT|ALL: Optional. Specifies whether to remove duplicate records before counting. The default value is ALL, which counts all records. If you specify DISTINCT, the function counts only unique records.
colname: Required. The column whose values you want to count. This parameter can be of any data type. You can specify
*for colname, as inCOUNT(*). In this case, the function returns the total number of rows. Rows where the colname value is NULL are not included in the calculation.expr: Required. An expression. This parameter can be of any data type. Rows where the expression evaluates to NULL are not included in the calculation. If you specify the DISTINCT keyword, the function returns the count of unique values.
COUNT([DISTINCT] <expr>[,...]): Counts the number of rows in the specified window where the values of all specified expressions are not NULL. If you specify theDISTINCTkeyword, the function counts the rows after removing duplicate rows.partition_clause, orderby_clause, and frame_clause: For more information, see windowing_definition.
Return value
The function returns a BIGINT value. Rows where colname is NULL are not included in the calculation.
Examples
Prepare test data
If you already have data, you can skip this step.
Download the test data test_data.txt.
Create a test table.
CREATE TABLE IF NOT EXISTS emp( empno BIGINT, ename STRING, job STRING, mgr BIGINT, hiredate DATETIME, sal BIGINT, comm BIGINT, deptno BIGINT );Load the data.
Replace
FILE_PATHwith the actual path and name of your data file.TUNNEL UPLOAD FILE_PATH emp;
Example 1: Specify a column for the window and return the cumulative count without sorting
This example partitions the window by the salary (sal) column. The data is not sorted. The function returns the cumulative count of all rows in the current window. The window contains all rows that have the same sal value.
Sample command
SELECT sal, COUNT(sal) OVER (PARTITION BY sal) AS count FROM emp;Result
+------------+------------+ | sal | count | +------------+------------+ | 800 | 1 | | 950 | 1 | | 1100 | 1 | | 1250 | 2 | -- The partition contains two rows where sal is 1250. The function returns 2 for both rows. | 1250 | 2 | -- The count is also 2 for the second row in the partition. | 1300 | 2 | | 1300 | 2 | | 1500 | 1 | | 1600 | 1 | | 2450 | 2 | | 2450 | 2 | | 2850 | 1 | | 2975 | 1 | | 3000 | 2 | | 3000 | 2 | | 5000 | 2 | | 5000 | 2 | +------------+------------+
Example 2: In non-Hive compatible mode, specify a column for the window and return the cumulative count after sorting
In non-Hive compatible mode, this example partitions the window by the salary (sal) column and sorts the data. The function returns the cumulative count from the first row to the current row in the current window. The window contains all rows that have the same sal value.
Sample command
-- Disable Hive compatible mode. SET odps.sql.hive.compatible=false; SELECT sal, COUNT(sal) OVER (PARTITION BY sal ORDER BY sal) AS count FROM emp;Return value
+------------+------------+ | sal | count | +------------+------------+ | 800 | 1 | | 950 | 1 | | 1100 | 1 | | 1250 | 1 | -- The window partition starts. The running count for the first row is 1. | 1250 | 2 | -- The running count for the second row is 2. | 1300 | 1 | | 1300 | 2 | | 1500 | 1 | | 1600 | 1 | | 2450 | 1 | | 2450 | 2 | | 2850 | 1 | | 2975 | 1 | | 3000 | 1 | | 3000 | 2 | | 5000 | 1 | | 5000 | 2 | +------------+------------+
Example 3: In Hive compatible mode, specify a column for the window and return the cumulative count after sorting
In Hive compatible mode, this example partitions the window by the salary (sal) column and sorts the data. The function returns the cumulative count from the first row to the last row in the current window. The window contains all rows that have the same sal value.
Sample command
-- Enable Hive compatible mode. SET odps.sql.hive.compatible=true; SELECT sal, COUNT(sal) OVER (PARTITION BY sal ORDER BY sal) AS count FROM emp;Result
+------------+------------+ | sal | count | +------------+------------+ | 800 | 1 | | 950 | 1 | | 1100 | 1 | | 1250 | 2 | -- The partition contains two rows where sal is 1250. The function returns 2 for both rows. | 1250 | 2 | -- The count is also 2 for the second row in the partition. | 1300 | 2 | | 1300 | 2 | | 1500 | 1 | | 1600 | 1 | | 2450 | 2 | | 2450 | 2 | | 2850 | 1 | | 2975 | 1 | | 3000 | 2 | | 3000 | 2 | | 5000 | 2 | | 5000 | 2 | +------------+------------+
Example 4: Return the total number of rows
Calculate the total number of employees in all departments.
Sample command
SELECT COUNT(*) FROM emp;Result
+------------+ | _c0 | +------------+ | 17 | +------------+
Example 5: Group data and calculate the total for each group
This example uses the COUNT function with GROUP BY to group all employees by department (deptno) and calculate the number of employees in each department.
Sample command
SELECT deptno, COUNT(*) FROM emp GROUP BY deptno;Result
+------------+------------+ | deptno | _c1 | +------------+------------+ | 20 | 5 | | 30 | 6 | | 10 | 6 | +------------+------------+
Example 6: Count unique values
This example uses DISTINCT to remove duplicate values and count the number of departments.
Sample command
SELECT COUNT(DISTINCT deptno) FROM emp;Return value
+------------+ | _c0 | +------------+ | 3 | +------------+
CUME_DIST
Syntax
double cume_dist() over([partition_clause] [orderby_clause])Description
Calculates the cumulative distribution of a value within a group of values. The result is the number of rows with values less than or equal to the current row's value, divided by the total number of rows in the partition. The comparison is determined by the orderby_clause.
Parameters
partition_clause and orderby_clause: For more information, see windowing_definition.
Return value
Returns a DOUBLE value. The specific return value is equal to
row_number_of_last_peer / partition_row_count, whererow_number_of_last_peeris the ROW_NUMBER window function's return value for the last row of the current row's GROUP, andpartition_row_countis the number of rows in the partition to which the row belongs.Example
Partition by department (deptno) and calculate the cumulative distribution of salary (sal) within each department. The command is as follows:
select deptno, ename, sal, concat(round(cume_dist() over (partition by deptno order by sal desc)*100,2),'%') as cume_dist from emp;The following result is returned:
+------------+------------+------------+------------+ | deptno | ename | sal | cume_dist | +------------+------------+------------+------------+ | 10 | JACCKA | 5000 | 33.33% | | 10 | KING | 5000 | 33.33% | | 10 | CLARK | 2450 | 66.67% | | 10 | WELAN | 2450 | 66.67% | | 10 | TEBAGE | 1300 | 100.0% | | 10 | MILLER | 1300 | 100.0% | | 20 | SCOTT | 3000 | 40.0% | | 20 | FORD | 3000 | 40.0% | | 20 | JONES | 2975 | 60.0% | | 20 | ADAMS | 1100 | 80.0% | | 20 | SMITH | 800 | 100.0% | | 30 | BLAKE | 2850 | 16.67% | | 30 | ALLEN | 1600 | 33.33% | | 30 | TURNER | 1500 | 50.0% | | 30 | MARTIN | 1250 | 83.33% | | 30 | WARD | 1250 | 83.33% | | 30 | JAMES | 950 | 100.0% | +------------+------------+------------+------------+
DENSE_RANK
Syntax
bigint dense_rank() over ([partition_clause] [orderby_clause])Description
Calculates the rank of the current row within its partition based on the sort order specified in the orderby_clause. The ranking starts at 1. Rows with the same
order byvalues in a partition are assigned the same rank. The rank increments by 1 each time theorder byvalue changes.Parameters
partition_clause and orderby_clause: For more information, see windowing_definition.
Return value
Returns a BIGINT value. If an orderby_clause is not specified, all rows receive a rank of 1.
Example
Partition by department (deptno) and rank employees within each department based on their salary (sal) in descending order. The command is as follows:
select deptno, ename, sal, dense_rank() over (partition by deptno order by sal desc) as nums from emp;The following result is returned:
+------------+------------+------------+------------+ | deptno | ename | sal | nums | +------------+------------+------------+------------+ | 10 | JACCKA | 5000 | 1 | | 10 | KING | 5000 | 1 | | 10 | CLARK | 2450 | 2 | | 10 | WELAN | 2450 | 2 | | 10 | TEBAGE | 1300 | 3 | | 10 | MILLER | 1300 | 3 | | 20 | SCOTT | 3000 | 1 | | 20 | FORD | 3000 | 1 | | 20 | JONES | 2975 | 2 | | 20 | ADAMS | 1100 | 3 | | 20 | SMITH | 800 | 4 | | 30 | BLAKE | 2850 | 1 | | 30 | ALLEN | 1600 | 2 | | 30 | TURNER | 1500 | 3 | | 30 | MARTIN | 1250 | 4 | | 30 | WARD | 1250 | 4 | | 30 | JAMES | 950 | 5 | +------------+------------+------------+------------+
FIRST_VALUE
Syntax
first_value(<expr>[, <ignore_nulls>]) over ([partition_clause] [orderby_clause] [frame_clause])Description
Returns the value of the expression expr from the first row of the window frame.
Parameters
expr: Required. The expression for which to calculate the result.
ignore_nulls: Optional. A BOOLEAN value that specifies whether to ignore NULL values. The default value is False. If this parameter is set to True, the function returns the first non-NULL value of expr in the window frame.
partition_clause, orderby_clause, and frame_clause: For more information, see windowing_definition.
Return value
The return value has the same data type as expr.
Example
The following command groups all employees by department and returns the first row of data from each group:
Without specifying order by:
select deptno, ename, sal, first_value(sal) over (partition by deptno) as first_value from emp;The following result is returned:
+------------+------------+------------+-------------+ | deptno | ename | sal | first_value | +------------+------------+------------+-------------+ | 10 | TEBAGE | 1300 | 1300 | -- First row of the current window. | 10 | CLARK | 2450 | 1300 | | 10 | KING | 5000 | 1300 | | 10 | MILLER | 1300 | 1300 | | 10 | JACCKA | 5000 | 1300 | | 10 | WELAN | 2450 | 1300 | | 20 | FORD | 3000 | 3000 | -- First row of the current window. | 20 | SCOTT | 3000 | 3000 | | 20 | SMITH | 800 | 3000 | | 20 | ADAMS | 1100 | 3000 | | 20 | JONES | 2975 | 3000 | | 30 | TURNER | 1500 | 1500 | -- First row of the current window. | 30 | JAMES | 950 | 1500 | | 30 | ALLEN | 1600 | 1500 | | 30 | WARD | 1250 | 1500 | | 30 | MARTIN | 1250 | 1500 | | 30 | BLAKE | 2850 | 1500 | +------------+------------+------------+-------------+Specifying order by:
select deptno, ename, sal, first_value(sal) over (partition by deptno order by sal desc) as first_value from emp;The following result is returned:
+------------+------------+------------+-------------+ | deptno | ename | sal | first_value | +------------+------------+------------+-------------+ | 10 | JACCKA | 5000 | 5000 | -- First row of the current window. | 10 | KING | 5000 | 5000 | | 10 | CLARK | 2450 | 5000 | | 10 | WELAN | 2450 | 5000 | | 10 | TEBAGE | 1300 | 5000 | | 10 | MILLER | 1300 | 5000 | | 20 | SCOTT | 3000 | 3000 | -- First row of the current window. | 20 | FORD | 3000 | 3000 | | 20 | JONES | 2975 | 3000 | | 20 | ADAMS | 1100 | 3000 | | 20 | SMITH | 800 | 3000 | | 30 | BLAKE | 2850 | 2850 | -- First row of the current window. | 30 | ALLEN | 1600 | 2850 | | 30 | TURNER | 1500 | 2850 | | 30 | MARTIN | 1250 | 2850 | | 30 | WARD | 1250 | 2850 | | 30 | JAMES | 950 | 2850 | +------------+------------+------------+-------------+
LAG
Syntax
lag(<expr>[,bigint <offset>[, <default>]]) over([partition_clause] orderby_clause)Description
Returns the value of the expression expr from the row that is offset rows before the current row (towards the head of the partition). The expression expr can be a column, a column operation, or a function operation.
Parameters
expr: Required. The expression for which to calculate the result.
offset: Optional. The offset, which is a BIGINT constant greater than or equal to 1. A value of 1 indicates the preceding row. The default value is 1. If the input value is of the STRING or DOUBLE type, it is implicitly converted to the BIGINT type for calculation.
default: Optional. Specifies a default value to return when the offset is out of bounds. If this parameter is not specified, the default is NULL. The value must be a constant with the same data type as expr. If expr is not a constant, this value is evaluated based on the current row.
partition_clause and orderby_clause: For more information, see windowing_definition.
Return value
The return value has the same data type as expr.
Example
Partition by department (deptno) and retrieve the salary (sal) from the preceding row for each employee. The command is as follows:
select deptno, ename, sal, lag(sal, 1) over (partition by deptno order by sal) as sal_new from emp;The following result is returned:
+------------+------------+------------+------------+ | deptno | ename | sal | sal_new | +------------+------------+------------+------------+ | 10 | TEBAGE | 1300 | NULL | | 10 | MILLER | 1300 | 1300 | | 10 | CLARK | 2450 | 1300 | | 10 | WELAN | 2450 | 2450 | | 10 | KING | 5000 | 2450 | | 10 | JACCKA | 5000 | 5000 | | 20 | SMITH | 800 | NULL | | 20 | ADAMS | 1100 | 800 | | 20 | JONES | 2975 | 1100 | | 20 | SCOTT | 3000 | 2975 | | 20 | FORD | 3000 | 3000 | | 30 | JAMES | 950 | NULL | | 30 | MARTIN | 1250 | 950 | | 30 | WARD | 1250 | 1250 | | 30 | TURNER | 1500 | 1250 | | 30 | ALLEN | 1600 | 1500 | | 30 | BLAKE | 2850 | 1600 | +------------+------------+------------+------------+
LAST_VALUE
Syntax
last_value(<expr>[, <ignore_nulls>]) over([partition_clause] [orderby_clause] [frame_clause])Description
Returns the value of the expression expr from the last row of the window frame.
Parameters
expr: Required. The expression for which to calculate the result.
ignore_nulls: Optional. A BOOLEAN value that specifies whether to ignore NULL values. The default value is False. If this parameter is set to True, the function returns the last non-NULL value of expr in the window frame.
partition_clause, orderby_clause, and frame_clause: For more information, see windowing_definition.
Return value
The return value has the same data type as expr.
Example
The following command groups all employees by department and returns the last row of data from each group:
Without an order by clause, the window frame includes all rows in the partition. The function returns the value from the last row in the partition.
select deptno, ename, sal, last_value(sal) over (partition by deptno) as last_value from emp;The following result is returned:
+------------+------------+------------+-------------+ | deptno | ename | sal | last_value | +------------+------------+------------+-------------+ | 10 | TEBAGE | 1300 | 2450 | | 10 | CLARK | 2450 | 2450 | | 10 | KING | 5000 | 2450 | | 10 | MILLER | 1300 | 2450 | | 10 | JACCKA | 5000 | 2450 | | 10 | WELAN | 2450 | 2450 | -- Last row of the current window. | 20 | FORD | 3000 | 2975 | | 20 | SCOTT | 3000 | 2975 | | 20 | SMITH | 800 | 2975 | | 20 | ADAMS | 1100 | 2975 | | 20 | JONES | 2975 | 2975 | -- Last row of the current window. | 30 | TURNER | 1500 | 2850 | | 30 | JAMES | 950 | 2850 | | 30 | ALLEN | 1600 | 2850 | | 30 | WARD | 1250 | 2850 | | 30 | MARTIN | 1250 | 2850 | | 30 | BLAKE | 2850 | 2850 | -- Last row of the current window. +------------+------------+------------+-------------+With an order by clause, the default window frame is `RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`. The function returns the value from the current row.
select deptno, ename, sal, last_value(sal) over (partition by deptno order by sal desc) as last_value from emp;The following result is returned:
+------------+------------+------------+-------------+ | deptno | ename | sal | last_value | +------------+------------+------------+-------------+ | 10 | JACCKA | 5000 | 5000 | -- Current row of the current window. | 10 | KING | 5000 | 5000 | -- Current row of the current window. | 10 | CLARK | 2450 | 2450 | -- Current row of the current window. | 10 | WELAN | 2450 | 2450 | -- Current row of the current window. | 10 | TEBAGE | 1300 | 1300 | -- Current row of the current window. | 10 | MILLER | 1300 | 1300 | -- Current row of the current window. | 20 | SCOTT | 3000 | 3000 | -- Current row of the current window. | 20 | FORD | 3000 | 3000 | -- Current row of the current window. | 20 | JONES | 2975 | 2975 | -- Current row of the current window. | 20 | ADAMS | 1100 | 1100 | -- Current row of the current window. | 20 | SMITH | 800 | 800 | -- Current row of the current window. | 30 | BLAKE | 2850 | 2850 | -- Current row of the current window. | 30 | ALLEN | 1600 | 1600 | -- Current row of the current window. | 30 | TURNER | 1500 | 1500 | -- Current row of the current window. | 30 | MARTIN | 1250 | 1250 | -- Current row of the current window. | 30 | WARD | 1250 | 1250 | -- Current row of the current window. | 30 | JAMES | 950 | 950 | -- Current row of the current window. +------------+------------+------------+-------------+
LEAD
Syntax
lead(<expr>[, bigint <offset>[, <default>]]) over([partition_clause] orderby_clause)Description
Returns the value of the expression expr from the row that is offset rows after the current row (towards the tail of the partition). The expression expr can be a column, a column operation, or a function operation.
Parameters
expr: Required. The expression for which to calculate the result.
offset: Optional. The offset, which is a BIGINT constant greater than or equal to 0. A value of 0 indicates the current row, and a value of 1 indicates the next row. The default value is 1. If the input value is of the STRING or DOUBLE type, it is implicitly converted to the BIGINT type for calculation.
default: Optional. The value to return if the offset is out of bounds. This value must be a constant with the same data type as expr. The default is NULL. If expr is not a constant, the value is evaluated based on the current row.
partition_clause and orderby_clause: For more information, see windowing_definition.
Return value
The return value has the same data type as expr.
Example
Partition by department (deptno) and retrieve the salary (sal) from the next row for each employee. The command is as follows:
select deptno, ename, sal, lead(sal, 1) over (partition by deptno order by sal) as sal_new from emp;The following result is returned:
+------------+------------+------------+------------+ | deptno | ename | sal | sal_new | +------------+------------+------------+------------+ | 10 | TEBAGE | 1300 | 1300 | | 10 | MILLER | 1300 | 2450 | | 10 | CLARK | 2450 | 2450 | | 10 | WELAN | 2450 | 5000 | | 10 | KING | 5000 | 5000 | | 10 | JACCKA | 5000 | NULL | | 20 | SMITH | 800 | 1100 | | 20 | ADAMS | 1100 | 2975 | | 20 | JONES | 2975 | 3000 | | 20 | SCOTT | 3000 | 3000 | | 20 | FORD | 3000 | NULL | | 30 | JAMES | 950 | 1250 | | 30 | MARTIN | 1250 | 1250 | | 30 | WARD | 1250 | 1500 | | 30 | TURNER | 1500 | 1600 | | 30 | ALLEN | 1600 | 2850 | | 30 | BLAKE | 2850 | NULL | +------------+------------+------------+------------+
MAX
Syntax
max(<expr>) over([partition_clause] [orderby_clause] [frame_clause])Description
Returns the maximum value of expr in a window.
Parameters
expr: Required. The expression used to calculate the maximum value. It can be of any data type except BOOLEAN. If the value is NULL, the row is not included in the calculation.
partition_clause, orderby_clause, and frame_clause: For more information, see windowing_definition.
Return value
The return value has the same data type as expr.
Examples
Example 1: Partition by department (deptno), calculate the maximum salary (sal), and do not sort. The function returns the maximum value of the current partition (rows with the same deptno). The command is as follows:
select deptno, sal, max(sal) over (partition by deptno) from emp;The following result is returned:
+------------+------------+------------+ | deptno | sal | _c2 | +------------+------------+------------+ | 10 | 1300 | 5000 | -- First row of the window. The value is the maximum from the first to the sixth row. | 10 | 2450 | 5000 | -- The value is the maximum from the first to the sixth row. | 10 | 5000 | 5000 | -- The value is the maximum from the first to the sixth row. | 10 | 1300 | 5000 | | 10 | 5000 | 5000 | | 10 | 2450 | 5000 | | 20 | 3000 | 3000 | | 20 | 3000 | 3000 | | 20 | 800 | 3000 | | 20 | 1100 | 3000 | | 20 | 2975 | 3000 | | 30 | 1500 | 2850 | | 30 | 950 | 2850 | | 30 | 1600 | 2850 | | 30 | 1250 | 2850 | | 30 | 1250 | 2850 | | 30 | 2850 | 2850 | +------------+------------+------------+Example 2: Partition by department (deptno), calculate the maximum salary (sal), and sort the results. The function returns the maximum value from the first row to the current row of the current partition (rows with the same deptno). The command is as follows:
select deptno, sal, max(sal) over (partition by deptno order by sal) from emp;The following result is returned:
+------------+------------+------------+ | deptno | sal | _c2 | +------------+------------+------------+ | 10 | 1300 | 1300 | -- First row of the window. | 10 | 1300 | 1300 | -- Maximum value from the first to the second row. | 10 | 2450 | 2450 | -- Maximum value from the first to the third row. | 10 | 2450 | 2450 | -- Maximum value from the first to the fourth row. | 10 | 5000 | 5000 | | 10 | 5000 | 5000 | | 20 | 800 | 800 | | 20 | 1100 | 1100 | | 20 | 2975 | 2975 | | 20 | 3000 | 3000 | | 20 | 3000 | 3000 | | 30 | 950 | 950 | | 30 | 1250 | 1250 | | 30 | 1250 | 1250 | | 30 | 1500 | 1500 | | 30 | 1600 | 1600 | | 30 | 2850 | 2850 | +------------+------------+------------+
MEDIAN
Syntax
median(<expr>) over ([partition_clause] [orderby_clause] [frame_clause])Description
Calculates the median of expr in a window.
Parameters
expr: Required. The expression for which to calculate the median. It must be of the DOUBLE or DECIMAL type.
If the input value is of the STRING or BIGINT type, it is implicitly converted to the DOUBLE type for calculation. An error is returned for other data types.
If the input is NULL, the return value is NULL.
partition_clause, orderby_clause, and frame_clause: For more information, see windowing_definition.
Return value
Returns a DOUBLE or DECIMAL value. If all expr values are NULL, NULL is returned.
Example
Partition by department (deptno) and calculate the median salary (sal). The function returns the median for the entire partition (all rows with the same deptno). The command is as follows:
select deptno, sal, median(sal) over (partition by deptno) from emp;The following result is returned:
+------------+------------+------------+ | deptno | sal | _c2 | +------------+------------+------------+ | 10 | 1300 | 2450.0 | -- First row of the window. The value is the median from the first to the sixth row. | 10 | 2450 | 2450.0 | | 10 | 5000 | 2450.0 | | 10 | 1300 | 2450.0 | | 10 | 5000 | 2450.0 | | 10 | 2450 | 2450.0 | | 20 | 3000 | 2975.0 | | 20 | 3000 | 2975.0 | | 20 | 800 | 2975.0 | | 20 | 1100 | 2975.0 | | 20 | 2975 | 2975.0 | | 30 | 1500 | 1375.0 | | 30 | 950 | 1375.0 | | 30 | 1600 | 1375.0 | | 30 | 1250 | 1375.0 | | 30 | 1250 | 1375.0 | | 30 | 2850 | 1375.0 | +------------+------------+------------+
MIN
Syntax
min(<expr>) over([partition_clause] [orderby_clause] [frame_clause])Description
Returns the minimum value of expr in a window.
Parameters
expr: Required. The expression used to calculate the minimum value. It can be of any data type except BOOLEAN. If the value is NULL, the row is not included in the calculation.
partition_clause, orderby_clause, and frame_clause: For more information, see windowing_definition.
Return value
The return value has the same data type as expr.
Examples
Example 1: Partition by department (deptno), calculate the minimum salary (sal), and do not sort. The function returns the minimum value of the current partition (rows with the same deptno). The command is as follows:
select deptno, sal, min(sal) over (partition by deptno) from emp;The following result is returned:
+------------+------------+------------+ | deptno | sal | _c2 | +------------+------------+------------+ | 10 | 1300 | 1300 | -- First row of the window. The value is the minimum from the first to the sixth row. | 10 | 2450 | 1300 | -- The value is the minimum from the first to the sixth row. | 10 | 5000 | 1300 | -- The value is the minimum from the first to the sixth row. | 10 | 1300 | 1300 | | 10 | 5000 | 1300 | | 10 | 2450 | 1300 | | 20 | 3000 | 800 | | 20 | 3000 | 800 | | 20 | 800 | 800 | | 20 | 1100 | 800 | | 20 | 2975 | 800 | | 30 | 1500 | 950 | | 30 | 950 | 950 | | 30 | 1600 | 950 | | 30 | 1250 | 950 | | 30 | 1250 | 950 | | 30 | 2850 | 950 | +------------+------------+------------+Example 2: Partition by department (deptno), calculate the minimum salary (sal), and sort the results. The function returns the minimum value from the first row to the current row of the current partition (rows with the same deptno). The command is as follows:
select deptno, sal, min(sal) over (partition by deptno order by sal) from emp;The following result is returned:
+------------+------------+------------+ | deptno | sal | _c2 | +------------+------------+------------+ | 10 | 1300 | 1300 | -- First row of the window. | 10 | 1300 | 1300 | -- Minimum value from the first to the second row. | 10 | 2450 | 1300 | -- Minimum value from the first to the third row. | 10 | 2450 | 1300 | | 10 | 5000 | 1300 | | 10 | 5000 | 1300 | | 20 | 800 | 800 | | 20 | 1100 | 800 | | 20 | 2975 | 800 | | 20 | 3000 | 800 | | 20 | 3000 | 800 | | 30 | 950 | 950 | | 30 | 1250 | 950 | | 30 | 1250 | 950 | | 30 | 1500 | 950 | | 30 | 1600 | 950 | | 30 | 2850 | 950 | +------------+------------+------------+
NTILE
Syntax
bigint ntile(bigint <N>) over ([partition_clause] [orderby_clause])Description
Divides the ordered rows in a partition into N groups, as equal in size as possible, and returns the group number for each row. If the number of rows is not evenly divisible by N, the first groups (those with smaller group numbers) will have one extra row.
Parameters
N: Required. The number of groups. A BIGINT value.
partition_clause and orderby_clause: For more information, see windowing_definition.
Return value
Returns a BIGINT value.
Example
Divide all employees into 3 groups within each department based on salary (sal) in descending order, and return the group number for each employee. The command is as follows:
select deptno, ename, sal, ntile(3) over (partition by deptno order by sal desc) as nt3 from emp;The following result is returned:
+------------+------------+------------+------------+ | deptno | ename | sal | nt3 | +------------+------------+------------+------------+ | 10 | JACCKA | 5000 | 1 | | 10 | KING | 5000 | 1 | | 10 | CLARK | 2450 | 2 | | 10 | WELAN | 2450 | 2 | | 10 | TEBAGE | 1300 | 3 | | 10 | MILLER | 1300 | 3 | | 20 | SCOTT | 3000 | 1 | | 20 | FORD | 3000 | 1 | | 20 | JONES | 2975 | 2 | | 20 | ADAMS | 1100 | 2 | | 20 | SMITH | 800 | 3 | | 30 | BLAKE | 2850 | 1 | | 30 | ALLEN | 1600 | 1 | | 30 | TURNER | 1500 | 2 | | 30 | MARTIN | 1250 | 2 | | 30 | WARD | 1250 | 3 | | 30 | JAMES | 950 | 3 | +------------+------------+------------+------------+
NTH_VALUE
Syntax
nth_value(<expr>, <number> [, <ignore_nulls>]) over ([partition_clause] [orderby_clause] [frame_clause])Description
Returns the value of the expression expr from the Nth row of the window frame.
Parameters
expr: Required. The expression for which to calculate the result.
number: Required. A BIGINT value. An integer greater than or equal to 1. If the value is 1, this function is equivalent to FIRST_VALUE.
ignore_nulls: Optional. A BOOLEAN value that specifies whether to ignore NULL values. The default value is False. If this parameter is set to True, the function returns the Nth non-NULL value of expr in the window frame.
partition_clause, orderby_clause, and frame_clause: For more information, see windowing_definition.
Return value
The return value has the same data type as expr.
Example
The following command groups all employees by department and returns the 6th row of data from each group:
Without an order by clause, the window frame includes all rows in the partition. The function returns the value of the 6th row in the partition.
select deptno, ename, sal, nth_value(sal,6) over (partition by deptno) as nth_value from emp;The following result is returned:
+------------+------------+------------+------------+ | deptno | ename | sal | nth_value | +------------+------------+------------+------------+ | 10 | TEBAGE | 1300 | 2450 | | 10 | CLARK | 2450 | 2450 | | 10 | KING | 5000 | 2450 | | 10 | MILLER | 1300 | 2450 | | 10 | JACCKA | 5000 | 2450 | | 10 | WELAN | 2450 | 2450 | -- 6th row of the current window. | 20 | FORD | 3000 | NULL | | 20 | SCOTT | 3000 | NULL | | 20 | SMITH | 800 | NULL | | 20 | ADAMS | 1100 | NULL | | 20 | JONES | 2975 | NULL | -- The current window does not have a 6th row, so NULL is returned. | 30 | TURNER | 1500 | 2850 | | 30 | JAMES | 950 | 2850 | | 30 | ALLEN | 1600 | 2850 | | 30 | WARD | 1250 | 2850 | | 30 | MARTIN | 1250 | 2850 | | 30 | BLAKE | 2850 | 2850 | -- 6th row of the current window. +------------+------------+------------+------------+With an order by clause, the default window frame is `RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`. The function returns the value of the 6th row in the window frame.
select deptno, ename, sal, nth_value(sal,6) over (partition by deptno order by sal) as nth_value from emp;The following result is returned:
+------------+------------+------------+------------+ | deptno | ename | sal | nth_value | +------------+------------+------------+------------+ | 10 | TEBAGE | 1300 | NULL | | 10 | MILLER | 1300 | NULL | -- The current window has only 2 rows, so the 6th row exceeds the window length. | 10 | CLARK | 2450 | NULL | | 10 | WELAN | 2450 | NULL | | 10 | KING | 5000 | 5000 | | 10 | JACCKA | 5000 | 5000 | | 20 | SMITH | 800 | NULL | | 20 | ADAMS | 1100 | NULL | | 20 | JONES | 2975 | NULL | | 20 | SCOTT | 3000 | NULL | | 20 | FORD | 3000 | NULL | | 30 | JAMES | 950 | NULL | | 30 | MARTIN | 1250 | NULL | | 30 | WARD | 1250 | NULL | | 30 | TURNER | 1500 | NULL | | 30 | ALLEN | 1600 | NULL | | 30 | BLAKE | 2850 | 2850 | +------------+------------+------------+------------+
PERCENT_RANK
Syntax
double percent_rank() over([partition_clause] [orderby_clause])Description
Calculates the percentile rank of the current row within its partition, based on the sort order specified by orderby_clause.
Parameters
partition_clause and orderby_clause: For more information, see windowing_definition.
Return value
Returns a DOUBLE value in the range of [0.0, 1.0]. The specific return value is equal to
“(rank - 1) / (partition_row_count - 1)”, whererankis the result of the RANK window function for that row, andpartition_row_countis the number of rows in the partition to which the row belongs. If the partition contains only one row, the output is 0.0.Example
Calculate the percentile rank of each employee's salary within their department. The command is as follows:
select deptno, ename, sal, percent_rank() over (partition by deptno order by sal desc) as sal_new from emp;The following result is returned:
+------------+------------+------------+------------+ | deptno | ename | sal | sal_new | +------------+------------+------------+------------+ | 10 | JACCKA | 5000 | 0.0 | | 10 | KING | 5000 | 0.0 | | 10 | CLARK | 2450 | 0.4 | | 10 | WELAN | 2450 | 0.4 | | 10 | TEBAGE | 1300 | 0.8 | | 10 | MILLER | 1300 | 0.8 | | 20 | SCOTT | 3000 | 0.0 | | 20 | FORD | 3000 | 0.0 | | 20 | JONES | 2975 | 0.5 | | 20 | ADAMS | 1100 | 0.75 | | 20 | SMITH | 800 | 1.0 | | 30 | BLAKE | 2850 | 0.0 | | 30 | ALLEN | 1600 | 0.2 | | 30 | TURNER | 1500 | 0.4 | | 30 | MARTIN | 1250 | 0.6 | | 30 | WARD | 1250 | 0.6 | | 30 | JAMES | 950 | 1.0 | +------------+------------+------------+------------+
PERCENTILE_CONT
Syntax
-- Calculate the exact percentile PERCENTILE_CONT(<col_name>, DOUBLE <percentile>[, BOOLEAN <isIgnoreNull>]) -- Calculate the exact percentile in a window PERCENTILE_CONT(<col_name>, DOUBLE <percentile>[, BOOLEAN <isIgnoreNull>]) OVER ([partition_clause] [orderby_clause])Description
Calculates the exact percentile. It uses a linear interpolation algorithm, sorts the specified column in ascending order, and returns the exact value at the specified percentile.
Parameters
col_name: Required. A column of the DOUBLE or DECIMAL type.
percentile: Required. The percentile to calculate. A DOUBLE constant in the range of [0, 1].
isIgnoreNull: Optional. Specifies whether to ignore NULL values. A BOOLEAN constant. The default value is TRUE. If set to FALSE, NULL values are treated as the minimum value during sorting.
partition_clause and orderby_clause: For more information, see Window functions.
Return value
Returns the calculated percentile value as a DOUBLE.
Examples
Example 1: Ignore NULL values and calculate the exact percentile in a window.
SELECT PERCENTILE_CONT(x, 0) OVER() AS min, PERCENTILE_CONT(x, 0.01) OVER() AS percentile1, PERCENTILE_CONT(x, 0.5) OVER() AS median, PERCENTILE_CONT(x, 0.9) OVER() AS percentile90, PERCENTILE_CONT(x, 1) OVER() AS max FROM VALUES(0D),(3D),(NULL),(1D),(2D) AS tbl(x) LIMIT 1; -- Return result +------------+-------------+------------+--------------+------------+ | min | percentile1 | median | percentile90 | max | +------------+-------------+------------+--------------+------------+ | 0.0 | 0.03 | 1.5 | 2.7 | 3.0 | +------------+-------------+------------+--------------+------------+Example 2: Do not ignore NULL values. NULL values are treated as the minimum value during sorting. Calculate the exact percentile in a window.
SELECT PERCENTILE_CONT(x, 0, false) OVER() AS min, PERCENTILE_CONT(x, 0.01, false) OVER() AS percentile1, PERCENTILE_CONT(x, 0.5, false) OVER() AS median, PERCENTILE_CONT(x, 0.9, false) OVER() AS percentile90, PERCENTILE_CONT(x, 1, false) OVER() AS max FROM VALUES(0D),(3D),(NULL),(1D),(2D) AS tbl(x) LIMIT 1; -- Return result +------------+-------------+------------+--------------+------------+ | min | percentile1 | median | percentile90 | max | +------------+-------------+------------+--------------+------------+ | NULL | 0.0 | 1.0 | 2.6 | 3.0 | +------------+-------------+------------+--------------+------------+
PERCENTILE_DISC
Syntax
-- Calculate a given percentile value PERCENTILE_DISC(<col_name>, DOUBLE <percentile>[, BOOLEAN <isIgnoreNull>]) -- Calculate the percentile value in a window PERCENTILE_DISC(<col_name>, DOUBLE <percentile>[, BOOLEAN <isIgnoreNull>]) OVER ([partition_clause] [orderby_clause])Description
Calculates a given percentile value. It first sorts the specified column in ascending order and then returns the first value whose cumulative distribution is greater than or equal to the specified percentile.
Parameters
col_name: Required. A column with any sortable data type.
percentile: Required. The percentile to calculate. A DOUBLE constant in the range of [0, 1].
isIgnoreNull: Optional. Specifies whether to ignore NULL values. A BOOLEAN constant. The default value is TRUE. If set to FALSE, NULL values are treated as the minimum value during sorting.
partition_clause and orderby_clause: For more information, see Window functions.
Return value
Returns the calculated percentile value. The data type is the same as the input col_name column.
Examples
Example 1: Ignore NULL values and calculate the percentile value in a window.
SELECT x, PERCENTILE_DISC(x, 0) OVER() AS min, PERCENTILE_DISC(x, 0.5) OVER() AS median, PERCENTILE_DISC(x, 1) OVER() AS max FROM VALUES('c'),(NULL),('b'),('a') AS tbl(x); -- Return result +------------+------------+------------+------------+ | x | min | median | max | +------------+------------+------------+------------+ | c | a | b | c | | NULL | a | b | c | | b | a | b | c | | a | a | b | c | +------------+------------+------------+------------+Example 2: Do not ignore NULL values. NULL values are treated as the minimum value during sorting. Calculate the percentile value in a window.
SELECT x, PERCENTILE_DISC(x, 0, false) OVER() AS min, PERCENTILE_DISC(x, 0.5, false) OVER() AS median, PERCENTILE_DISC(x, 1, false) OVER() AS max FROM VALUES('c'),(NULL),('b'),('a') AS tbl(x); -- Return result +------------+------------+------------+------------+ | x | min | median | max | +------------+------------+------------+------------+ | c | NULL | a | c | | NULL | NULL | a | c | | b | NULL | a | c | | a | NULL | a | c | +------------+------------+------------+------------+
RANK
Syntax
bigint rank() over ([partition_clause] [orderby_clause])Description
Calculates the rank of the current row within its partition, based on the sort order specified by orderby_clause. Counting starts from 1.
Parameters
partition_clause and orderby_clause: For more information, see windowing_definition.
Return value
Returns a BIGINT value. The return values may be duplicated and non-consecutive. The specific return value is the
ROW_NUMBER()value of the first row in the GROUP to which the data row belongs. If an orderby_clause is not specified, all rows receive a rank of 1.Example
Partition by department (deptno) and rank employees within each department based on their salary (sal) in descending order. The command is as follows:
select deptno, ename, sal, rank() over (partition by deptno order by sal desc) as nums from emp;The following result is returned:
+------------+------------+------------+------------+ | deptno | ename | sal | nums | +------------+------------+------------+------------+ | 10 | JACCKA | 5000 | 1 | | 10 | KING | 5000 | 1 | | 10 | CLARK | 2450 | 3 | | 10 | WELAN | 2450 | 3 | | 10 | TEBAGE | 1300 | 5 | | 10 | MILLER | 1300 | 5 | | 20 | SCOTT | 3000 | 1 | | 20 | FORD | 3000 | 1 | | 20 | JONES | 2975 | 3 | | 20 | ADAMS | 1100 | 4 | | 20 | SMITH | 800 | 5 | | 30 | BLAKE | 2850 | 1 | | 30 | ALLEN | 1600 | 2 | | 30 | TURNER | 1500 | 3 | | 30 | MARTIN | 1250 | 4 | | 30 | WARD | 1250 | 4 | | 30 | JAMES | 950 | 6 | +------------+------------+------------+------------+
ROW_NUMBER
Syntax
row_number() over([partition_clause] [orderby_clause])Description
Calculates the row number of the current row within its partition, starting from 1.
Parameters
For more information, see windowing_definition. The frame_clause is not allowed.
Return value
Returns a BIGINT value.
Example
Partition by department (deptno) and assign a unique, sequential number to each employee within their department, based on salary (sal) in descending order. The command is as follows:
select deptno, ename, sal, row_number() over (partition by deptno order by sal desc) as nums from emp;The following result is returned:
+------------+------------+------------+------------+ | deptno | ename | sal | nums | +------------+------------+------------+------------+ | 10 | JACCKA | 5000 | 1 | | 10 | KING | 5000 | 2 | | 10 | CLARK | 2450 | 3 | | 10 | WELAN | 2450 | 4 | | 10 | TEBAGE | 1300 | 5 | | 10 | MILLER | 1300 | 6 | | 20 | SCOTT | 3000 | 1 | | 20 | FORD | 3000 | 2 | | 20 | JONES | 2975 | 3 | | 20 | ADAMS | 1100 | 4 | | 20 | SMITH | 800 | 5 | | 30 | BLAKE | 2850 | 1 | | 30 | ALLEN | 1600 | 2 | | 30 | TURNER | 1500 | 3 | | 30 | MARTIN | 1250 | 4 | | 30 | WARD | 1250 | 5 | | 30 | JAMES | 950 | 6 | +------------+------------+------------+------------+
STDDEV
Syntax
double stddev|stddev_pop([distinct] <expr>) over ([partition_clause] [orderby_clause] [frame_clause]) decimal stddev|stddev_pop([distinct] <expr>) over ([partition_clause] [orderby_clause] [frame_clause])Description
Calculates the population standard deviation. This is an alias for the STDDEV_POP function.
Parameters
expr: Required. The expression for which to calculate the population standard deviation. It must be of the DOUBLE or DECIMAL type.
If the input value is of the STRING or BIGINT type, it is implicitly converted to the DOUBLE type for calculation. An error is returned for other data types.
If the input value is NULL, the row is not included in the calculation.
If you specify the distinct keyword, the function calculates the population standard deviation of unique values.
partition_clause, orderby_clause, and frame_clause: For more information, see windowing_definition.
Return value
The return value has the same data type as expr. If all values of expr are NULL, NULL is returned.
Examples
Example 1: Partition by department (deptno), calculate the population standard deviation of salary (sal), and do not sort. The function returns the cumulative population standard deviation of the current partition (rows with the same deptno). The command is as follows:
select deptno, sal, stddev(sal) over (partition by deptno) from emp;The following result is returned:
+------------+------------+------------+ | deptno | sal | _c2 | +------------+------------+------------+ | 10 | 1300 | 1546.1421524412158 | -- First row of the window. The value is the cumulative population standard deviation from the first to the sixth row. | 10 | 2450 | 1546.1421524412158 | -- The value is the cumulative population standard deviation from the first to the sixth row. | 10 | 5000 | 1546.1421524412158 | | 10 | 1300 | 1546.1421524412158 | | 10 | 5000 | 1546.1421524412158 | | 10 | 2450 | 1546.1421524412158 | | 20 | 3000 | 1004.7387720198718 | | 20 | 3000 | 1004.7387720198718 | | 20 | 800 | 1004.7387720198718 | | 20 | 1100 | 1004.7387720198718 | | 20 | 2975 | 1004.7387720198718 | | 30 | 1500 | 610.1001739241042 | | 30 | 950 | 610.1001739241042 | | 30 | 1600 | 610.1001739241042 | | 30 | 1250 | 610.1001739241042 | | 30 | 1250 | 610.1001739241042 | | 30 | 2850 | 610.1001739241042 | +------------+------------+------------+Example 2: In non-Hive compatible mode, partition by department (deptno), calculate the population standard deviation of salary (sal), and sort the results. The function returns the cumulative population standard deviation from the first row to the current row of the current partition (rows with the same deptno). The commands are as follows:
-- Disable Hive compatible mode. set odps.sql.hive.compatible=false; -- Execute the following SQL command. select deptno, sal, stddev(sal) over (partition by deptno order by sal) from emp;The following result is returned:
+------------+------------+------------+ | deptno | sal | _c2 | +------------+------------+------------+ | 10 | 1300 | 0.0 | -- First row of the window. | 10 | 1300 | 0.0 | -- Cumulative population standard deviation from the first to the second row. | 10 | 2450 | 542.1151989096865 | -- Cumulative population standard deviation from the first to the third row. | 10 | 2450 | 575.0 | -- Cumulative population standard deviation from the first to the fourth row. | 10 | 5000 | 1351.6656391282572 | | 10 | 5000 | 1546.1421524412158 | | 20 | 800 | 0.0 | | 20 | 1100 | 150.0 | | 20 | 2975 | 962.4188277460079 | | 20 | 3000 | 1024.2947268730811 | | 20 | 3000 | 1004.7387720198718 | | 30 | 950 | 0.0 | | 30 | 1250 | 150.0 | | 30 | 1250 | 141.4213562373095 | | 30 | 1500 | 194.8557158514987 | | 30 | 1600 | 226.71568097509268 | | 30 | 2850 | 610.1001739241042 | +------------+------------+------------+Example 3: In Hive compatible mode, partition by department (deptno), calculate the population standard deviation of salary (sal), and sort the results. The function returns the cumulative population standard deviation from the first row to the row with the same value as the current row (rows with the same sal have the same population standard deviation). The commands are as follows:
-- Enable Hive compatible mode. set odps.sql.hive.compatible=true; -- Execute the following SQL command. select deptno, sal, stddev(sal) over (partition by deptno order by sal) from emp;The following result is returned:
+------------+------------+------------+ | deptno | sal | _c2 | +------------+------------+------------+ | 10 | 1300 | 0.0 | -- First row of the window. Since the sal of the first and second rows are the same, the population standard deviation for the first row is the cumulative population standard deviation of the first two rows. | 10 | 1300 | 0.0 | -- Cumulative population standard deviation from the first to the second row. | 10 | 2450 | 575.0 | -- Since the sal of the third and fourth rows are the same, the population standard deviation for the third row is the cumulative population standard deviation of the first four rows. | 10 | 2450 | 575.0 | -- Cumulative population standard deviation from the first to the fourth row. | 10 | 5000 | 1546.1421524412158 | | 10 | 5000 | 1546.1421524412158 | | 20 | 800 | 0.0 | | 20 | 1100 | 150.0 | | 20 | 2975 | 962.4188277460079 | | 20 | 3000 | 1004.7387720198718 | | 20 | 3000 | 1004.7387720198718 | | 30 | 950 | 0.0 | | 30 | 1250 | 141.4213562373095 | | 30 | 1250 | 141.4213562373095 | | 30 | 1500 | 194.8557158514987 | | 30 | 1600 | 226.71568097509268 | | 30 | 2850 | 610.1001739241042 | +------------+------------+------------+
STDDEV_SAMP
Syntax
double stddev_samp([distinct] <expr>) over([partition_clause] [orderby_clause] [frame_clause]) decimal stddev_samp([distinct] <expr>) over([partition_clause] [orderby_clause] [frame_clause])Description
Calculates the sample standard deviation.
Parameters
expr: Required. The expression for which to calculate the sample standard deviation. It must be of the DOUBLE or DECIMAL type.
If the input value is of the STRING or BIGINT type, it is implicitly converted to the DOUBLE type for calculation. An error is returned for other data types.
If the input value is NULL, the row is not included in the calculation.
If you specify the distinct keyword, the function calculates the sample standard deviation of unique values.
partition_clause, orderby_clause, and frame_clause: For more information, see windowing_definition.
Return value
The return value has the same data type as expr. If all values of expr are NULL, NULL is returned. If the window contains only one non-NULL value for expr, the result is 0.
Examples
Example 1: Partition by department (deptno), calculate the sample standard deviation of salary (sal), and do not sort. The function returns the cumulative sample standard deviation of the current partition (rows with the same deptno). The command is as follows:
select deptno, sal, stddev_samp(sal) over (partition by deptno) from emp;The following result is returned:
+------------+------------+------------+ | deptno | sal | _c2 | +------------+------------+------------+ | 10 | 1300 | 1693.7138680032904 | -- First row of the window. The value is the cumulative sample standard deviation from the first to the sixth row. | 10 | 2450 | 1693.7138680032904 | -- The value is the cumulative sample standard deviation from the first to the sixth row. | 10 | 5000 | 1693.7138680032904 | -- The value is the cumulative sample standard deviation from the first to the sixth row. | 10 | 1300 | 1693.7138680032904 | | 10 | 5000 | 1693.7138680032904 | | 10 | 2450 | 1693.7138680032904 | | 20 | 3000 | 1123.3320969330487 | | 20 | 3000 | 1123.3320969330487 | | 20 | 800 | 1123.3320969330487 | | 20 | 1100 | 1123.3320969330487 | | 20 | 2975 | 1123.3320969330487 | | 30 | 1500 | 668.331255192114 | | 30 | 950 | 668.331255192114 | | 30 | 1600 | 668.331255192114 | | 30 | 1250 | 668.331255192114 | | 30 | 1250 | 668.331255192114 | | 30 | 2850 | 668.331255192114 | +------------+------------+------------+Example 2: Partition by department (deptno), calculate the sample standard deviation of salary (sal), and sort the results. The function returns the cumulative sample standard deviation from the first row to the current row of the current partition (rows with the same deptno). The command is as follows:
select deptno, sal, stddev_samp(sal) over (partition by deptno order by sal) from emp;The following result is returned:
+------------+------------+------------+ | deptno | sal | _c2 | +------------+------------+------------+ | 10 | 1300 | 0.0 | -- First row of the window. | 10 | 1300 | 0.0 | -- Cumulative sample standard deviation from the first to the second row. | 10 | 2450 | 663.9528095680697 | -- Cumulative sample standard deviation from the first to the third row. | 10 | 2450 | 663.9528095680696 | | 10 | 5000 | 1511.2081259707413 | | 10 | 5000 | 1693.7138680032904 | | 20 | 800 | 0.0 | | 20 | 1100 | 212.13203435596427 | | 20 | 2975 | 1178.7175234126282 | | 20 | 3000 | 1182.7536725793752 | | 20 | 3000 | 1123.3320969330487 | | 30 | 950 | 0.0 | | 30 | 1250 | 212.13203435596427 | | 30 | 1250 | 173.20508075688772 | | 30 | 1500 | 225.0 | | 30 | 1600 | 253.4758371127315 | | 30 | 2850 | 668.331255192114 | +------------+------------+------------+
SUM
Syntax
sum([distinct] <expr>) over ([partition_clause] [orderby_clause] [frame_clause])Description
Returns the sum of expr in a window.
Parameters
expr: Required. The column for which to calculate the sum. It must be of the DOUBLE, DECIMAL, or BIGINT type.
If the input value is of the STRING type, it is implicitly converted to the DOUBLE type for calculation. An error is returned for other data types.
If the input value is NULL, the row is not included in the calculation.
If you specify the distinct keyword, the function calculates the sum of unique values.
partition_clause, orderby_clause, and frame_clause: For more information, see windowing_definition.
Return value
If the input value is of the BIGINT type, a BIGINT value is returned.
If the input value is of the DECIMAL type, a DECIMAL value is returned.
If the input value is of the DOUBLE or STRING type, a DOUBLE value is returned.
If all input values are NULL, NULL is returned.
Examples
Example 1: Partition by department (deptno), calculate the sum of salary (sal), and do not sort. The function returns the cumulative sum of the current partition (rows with the same deptno). The command is as follows:
select deptno, sal, sum(sal) over (partition by deptno) from emp;The following result is returned:
+------------+------------+------------+ | deptno | sal | _c2 | +------------+------------+------------+ | 10 | 1300 | 17500 | -- First row of the window. The value is the cumulative sum from the first to the sixth row. | 10 | 2450 | 17500 | -- The value is the cumulative sum from the first to the sixth row. | 10 | 5000 | 17500 | -- The value is the cumulative sum from the first to the sixth row. | 10 | 1300 | 17500 | | 10 | 5000 | 17500 | | 10 | 2450 | 17500 | | 20 | 3000 | 10875 | | 20 | 3000 | 10875 | | 20 | 800 | 10875 | | 20 | 1100 | 10875 | | 20 | 2975 | 10875 | | 30 | 1500 | 9400 | | 30 | 950 | 9400 | | 30 | 1600 | 9400 | | 30 | 1250 | 9400 | | 30 | 1250 | 9400 | | 30 | 2850 | 9400 | +------------+------------+------------+Example 2: In non-Hive compatible mode, partition by department (deptno), calculate the sum of salary (sal), and sort the results. The function returns the cumulative sum from the first row to the current row of the current partition (rows with the same deptno). The commands are as follows:
-- Disable Hive compatible mode. set odps.sql.hive.compatible=false; -- Execute the following SQL command. select deptno, sal, sum(sal) over (partition by deptno order by sal) from emp;The following result is returned:
+------------+------------+------------+ | deptno | sal | _c2 | +------------+------------+------------+ | 10 | 1300 | 1300 | -- First row of the window. | 10 | 1300 | 2600 | -- Cumulative sum from the first to the second row. | 10 | 2450 | 5050 | -- Cumulative sum from the first to the third row. | 10 | 2450 | 7500 | | 10 | 5000 | 12500 | | 10 | 5000 | 17500 | | 20 | 800 | 800 | | 20 | 1100 | 1900 | | 20 | 2975 | 4875 | | 20 | 3000 | 7875 | | 20 | 3000 | 10875 | | 30 | 950 | 950 | | 30 | 1250 | 2200 | | 30 | 1250 | 3450 | | 30 | 1500 | 4950 | | 30 | 1600 | 6550 | | 30 | 2850 | 9400 | +------------+------------+------------+Example 3: In Hive compatible mode, partition by department (deptno), calculate the sum of salary (sal), and sort the results. The function returns the cumulative sum from the first row to the row with the same value as the current row (rows with the same sal have the same sum). The commands are as follows:
-- Enable Hive compatible mode. set odps.sql.hive.compatible=true; -- Execute the following SQL command. select deptno, sal, sum(sal) over (partition by deptno order by sal) from emp;The following result is returned:
+------------+------------+------------+ | deptno | sal | _c2 | +------------+------------+------------+ | 10 | 1300 | 2600 | -- First row of the window. Since the sal of the first and second rows are the same, the sum for the first row is the cumulative sum of the first two rows. | 10 | 1300 | 2600 | -- Cumulative sum from the first to the second row. | 10 | 2450 | 7500 | -- Since the sal of the third and fourth rows are the same, the sum for the third row is the cumulative sum of the firstfour rows. | 10 | 5000 | 17500 | | 10 | 5000 | 17500 | | 20 | 800 | 800 | | 20 | 1100 | 1900 | | 20 | 2975 | 4875 | | 20 | 3000 | 10875 | | 20 | 3000 | 10875 | | 30 | 950 | 950 | | 30 | 1250 | 3450 | | 30 | 1250 | 3450 | | 30 | 1500 | 4950 | | 30 | 1600 | 6550 | | 30 | 2850 | 9400 | +------------+------------+------------+
References
If the built-in functions do not meet your needs, MaxCompute also supports user-defined functions. For more information about user-defined functions, see MaxCompute UDF overview.
For more information about common issues with MaxCompute SQL, see:
For more information about common error codes and issues with MaxCompute built-in functions, see: