Learn the syntax of window functions and how to use them in SLS analytic queries.
Introduction
Unlike aggregate functions that collapse rows into a single result, window functions compute a result for each row based on a related set of rows. Window functions have three core elements: partition, order, and frame (Window Function Concepts and Syntax).
function over (
[partition by partition_expression]
[order by order_expression]
[frame]
)
-
Partition: The
partition byclause divides rows into partitions. If this clause is omitted, the entire result set is treated as a single partition. -
Order: The
order byclause sorts the rows within each partition.NoteUsing the
order byclause on columns with duplicate values results in a non-deterministic row order. To ensure a consistent sort order, specify multiple columns. For example,order by request_time, request_method. -
Frame: Restricts rows within a partition. Cannot be used with ranking functions. Syntax:
{ rows | range} { frame_start | frame_between }. Example:range between unbounded preceding and unbounded following. Window Function Frame Specification.
Functions
|
Category |
Function |
Syntax |
Description |
SQL |
SPL |
|
Aggregate functions |
N/A |
All aggregate functions can be used as window functions. |
√ |
× |
|
|
Ranking functions |
|
Calculates the cumulative distribution of a value within a partition, which is the fraction of rows with values less than or equal to the current row's value. The return value is in the range (0, 1]. |
√ |
× |
|
|
|
Calculates the rank of a value in a partition. Ties receive the same rank. Ranks are consecutive. For example, if two rows are ranked 1, the next rank is 2. |
√ |
× |
||
|
ntile(n) |
Divides the ordered rows in a partition into a specified number of groups, n. |
√ |
× |
||
|
|
Calculates the percentage rank of each row within a partition. |
√ |
× |
||
|
|
Calculates the rank of a value in a partition. Ties receive the same rank. Ranks are not consecutive. For example, if two rows are ranked 1, the next rank is 3. |
√ |
× |
||
|
|
Assigns a unique, sequential integer to each row in a partition, starting from 1. For example, three rows with the same value are ranked 1, 2, and 3. |
√ |
× |
||
|
Offset functions |
first_value(x) |
Returns the value of x from the first row of a partition. |
√ |
× |
|
|
last_value(x) |
Returns the value of x from the last row of a partition. |
√ |
× |
||
|
lag(x, offset, default_value) |
Returns the value of the row that is offset rows above the current row in the window partition. If the row does not exist, default_value is returned. |
√ |
× |
||
|
lead(x, offset, default_value) |
Returns the value of the row that is offset rows after the current row in the window partition. If the row does not exist, it returns default_value. |
√ |
× |
||
|
nth_value(x, offset) |
Returns the value of the row at the offset position in the window partition. |
√ |
× |
Aggregate functions
All aggregate functions can serve as window functions. The following example uses sum().
Syntax
sum() over (
[partition by partition_expression]
[order by order_expression]
[frame]
)
Parameters
|
Parameter |
Description |
|
partition by partition_expression |
Divides rows into partitions based on the specified partition expression. |
|
order by order_expression |
Sorts rows within each partition based on the specified order expression. |
|
frame |
Window frame, such as |
Return value type
double
Examples
Calculates each employee's salary as a proportion of their department total.
-
Query statement
* | SELECT department, staff_name, salary, round ( salary * 1.0 / sum(salary) over(partition by department), 3) AS salary_percentage -
Query results: The results show data for four employees in the
devdepartment and three in theMarketingdepartment. Thesalary_percentagecolumn shows each employee's salary as a fraction of their department total.
Cume_dist function
Calculates the cumulative distribution of a value within a partition: the ratio of rows with values less than or equal to the current row's value to total rows. Returns a value in the range (0, 1].
Syntax
cume_dist() over (
[partition by partition_expression]
[order by order_expression]
)
Parameters
|
Parameter |
Description |
|
partition by partition_expression |
Partitions the rows based on the specified partition expression. |
|
order by order_expression |
Sorts the rows within each partition based on the specified order expression. |
Return value type
double
Examples
Calculates the cumulative distribution of object sizes in OSS bucket bucket00788.
-
Query statement
bucket=bucket00788 | select object, object_size, cume_dist() over ( partition by object order by object_size ) as cume_dist from oss-log-store
Dense_rank function
Returns the rank of a value within a partition. Ties receive the same rank, and ranks are consecutive without gaps. For example, if two rows share rank 1, the next rank is 2.
Syntax
dense_rank() over (
[partition by partition_expression]
[order by order_expression]
)
Parameters
|
Parameter |
Description |
|
partition by partition_expression |
Divides rows into partitions based on the partition expression. |
|
order by order_expression |
Sorts rows within each partition based on the order expression. |
Return value type
bigint
Examples
Calculates salary rank by department.
-
Query and analysis statement
* | select department, staff_name, salary, dense_rank() over( partition by department order by salary desc ) as salary_rank order by department, salary_rank -
Query and analysis results: In Marketing, Blan Stark and Smith share rank 1 (salary 9000); Achilles is rank 2 (8000). In dev, Rob is rank 1 (9000), Blan rank 2 (8500), Sansa rank 3 (8000).
Ntile function
Divides ordered rows in a partition into a specified number of groups.
Syntax
ntile(n) over (
[partition by partition_expression]
[order by order_expression]
)
Parameters
|
Parameter |
Description |
|
n |
The number of groups to divide the rows into. |
|
partition by partition_expression |
Divides the rows into partitions using the partition expression. |
|
order by order_expression |
Sorts the rows within each partition by the order expression. |
Return value type
bigint
Examples
Divides data for a specified object into three groups.
-
Query statement
object=245-da918c.model | select object, object_size, ntile(3) over ( partition by object order by object_size ) as ntile from oss-log-store -
Query and analysis results: The query returns nine rows. The values for
object_size, in ascending order, are 3396, 3701, 3750, 3757, 3914, 3918, 7440, 7490, and 7521. The corresponding values in thentilecolumn are 1, 1, 1, 2, 2, 2, 3, 3, and 3. As a result,ntile(3)divides the nine rows into three equal groups of three.
percent_rank function
Calculates the percent rank of each row in a partition. Formula: (rank - 1) / (total_rows - 1), where rank is the current row's rank and total_rows is the partition size.
Syntax
percent_rank() over (
[partition by partition_expression]
[order by order_expression]
)
Parameters
|
Parameter |
Description |
|
partition by partition_expression |
Partitions the rows based on the given expression. |
|
order by order_expression |
Sorts the rows within each partition by the given expression. |
Return value type
double
Examples
Calculates the percent rank of an OSS object by size.
-
Query statement
object=245-da918c3e2dd9dc9cb4d9283b%2F555e2441b6a4c7f094099a6dba8e7a5f.model| select object, object_size, percent_rank() over ( partition by object order by object_size ) as ntile FROM oss-log-store -
Query and analysis results: Six rows sorted by
object_size. Thentilecolumn shows percent rank values distributed evenly from 0.0 to 1.0.
Rank function
Returns the rank of each row within a partition. Ties receive the same rank, creating gaps in the sequence. For example, if two rows share rank 1, the next rank is 3.
Syntax
rank() over (
[partition by partition_expression]
[order by order_expression]
)
Parameters
|
Parameter |
Description |
|
partition by partition_expression |
Divides rows into partitions based on the specified partition expression. |
|
order by order_expression |
Sorts rows within each partition based on the specified order expression. |
Return value type
bigint
Examples
Ranks employees by salary within each department.
-
Query and analysis statement
* | select department, staff_name, salary, rank() over( partition by department order by salary desc ) as salary_rank order by department, salary_rank -
Query and analysis results: In Marketing, two employees share rank 1 (salary 9000). The next rank is 3, demonstrating how ties create gaps.
row_number function
Assigns a unique sequential integer to each row within a partition, starting from 1.
Syntax
row_number() over (
[partition by partition_expression]
[order by order_expression]
)
Parameters
|
Parameter |
Description |
|
partition by partition_expression |
Divides the rows into partitions based on the specified partition expression. |
|
order by order_expression |
Sorts the rows within each partition based on the specified order expression. |
Return value type
bigint.
Examples
Ranks employees by salary within each department.
-
Query
* | select department, staff_name, salary, row_number() over( partition by department order by salary desc ) as salary_rank order by department, salary_rank -
Six rows: In Marketing, Blan Stark (9000) is row 1, Smith (9000) row 2, Achilles (8000) row 3. In dev, Rob (9000) is row 1, Blan (8500) row 2, Sansa (8000) row 3.
First_value function
Returns the value of the first row in each partition.
Syntax
first_value(x) over (
[partition by partition_expression]
[order by order_expression]
[frame]
)
Parameters
|
Parameter |
Description |
|
x |
The column name. It can be of any data type. |
|
partition by partition_expression |
Divides rows into window partitions based on the partition expression. |
|
order by order_expression |
Sorts the rows within each window partition based on the order expression. |
|
frame |
Specifies the window frame, a subset of the current window partition. For example, |
Return value type
Returns the same data type as x.
Examples
Returns the minimum size of each object in an OSS bucket.
-
Query and analysis statement
bucket :bucket90 | select object, object_size, first_value(object_size) over ( partition by object order by object_size range between unbounded preceding and unbounded following ) as first_value from oss-log-store -
Query and analysis results: Seven rows partitioned into three groups by object. Within each group, first_value equals the minimum object_size (the first row in ascending order).
Last_value function
Returns the value from the last row in a partition.
Syntax
last_value(x) over (
[partition by partition_expression]
[order by order_expression]
[frame]
)
Parameters
|
Parameter |
Description |
|
x |
The column name. It can be any data type. |
|
partition by partition_expression |
Divides the rows into partitions based on the specified partition expression. |
|
order by order_expression |
Sorts the rows within each partition based on the specified order expression. |
|
frame |
Specifies the window frame, which is a subset of the current partition. For example, |
Return value type
Same type as x.
Examples
Finds the maximum object size in the specified OSS bucket.
-
Query and analysis statement
bucket :bucket90 | select object, object_size, last_value(object_size) over ( partition by object order by object_size range between unbounded preceding and unbounded following ) as last_value from oss-log-store -
The query returns three columns: object, object_size, and last_value. For
245-da918c.model, seven rows showobject_sizefrom 2383 to 6936, withlast_valueof 6936 for all rows. Fordashboard%2F2020%2F05%2F20%2F16%2F47.csv, two rows haveobject_size2435 and 2603, withlast_valueof 2603.
Lag function
Returns the value from a row at a specified offset preceding the current row in a partition.
Syntax
lag(x, offset, default_value) over (
[partition by partition_expression]
[order by order_expression]
[frame]
)
Parameters
|
Parameter |
Description |
|
x |
A column or expression of any data type. |
|
offset |
The offset, which specifies the number of rows preceding the current row. If you set offset to 0, the function returns the value of the current row. |
|
default_value |
Returns default_value if the specified offset row does not exist. |
|
partition by partition_expression |
Divides the rows into window partitions based on the specified partition expression. |
|
order by order_expression |
Sorts the rows within each window partition based on the specified order expression. |
|
frame |
Specifies the window frame, a subset of the current window partition. For example, |
Return value type
Same type as x.
Examples
Calculates daily UVs and day-over-day growth rate.
-
Query statement
* | select day, UV, UV * 1.0 /(lag(UV, 1, 0) over()) as diff_percentage from ( select approx_distinct(client_ip) as UV, date_trunc('day', __time__) as day from log group by day order by day asc ) -
Query and analysis results: Eight records from August 2-9, 2021. The first row's diff_percentage is Infinity because
lagreturns the default value 0, causing division by zero. Subsequent rows show the ratio of current-day to previous-day UVs, such as 2.098 and 0.976.
Lead function
Returns the value from a row at a specified offset after the current row in a partition.
Syntax
lead(x, offset, default_value) over (
[partition by partition_expression]
[order by order_expression]
[frame]
)
Parameters
|
Parameter |
Description |
|
x |
A column or expression of any data type. |
|
offset |
The number of rows after the current row. If offset is 0, the function returns the value of the current row. |
|
default_value |
If the specified offset row does not exist, it returns default_value. |
|
partition by partition_expression |
Divides the rows into window partitions based on the specified partition expression. |
|
order by order_expression |
Sorts the rows within each window partition based on the specified order expression. |
|
frame |
Specifies the window frame, for example, |
Return value type
Returns a value of the same data type as x.
Examples
Calculates the hourly UV ratio on August 26, 2021.
-
Query statement
* | select time, UV, UV * 1.0 /(lead(UV, 1, 0) over()) as diff_percentage from ( select approx_distinct(client_ip) as uv, date_trunc('hour', __time__) as time from log group by time order by time asc ) -
Query and analysis results: The query returns eight rows, showing the hourly unique visitors (
UV) and the ratio of the current hour's UVs to the next hour's UVs (diff_percentage) from 00:00 to 07:00 on August 26, 2021.
nth_value function
Returns the value from the offset-th row in a partition.
Syntax
nth_value(x, offset) over (
[partition by partition_expression]
[order by order_expression]
[frame]
)
Parameters
|
Parameter |
Description |
|
x |
A column or expression. It can be of any data type. |
|
offset |
The row offset. It must be a positive integer. |
|
partition by partition_expression |
Divides the rows into partitions based on the partition expression. |
|
order by order_expression |
Sorts the rows within each partition based on the order expression. |
|
frame |
Specifies the window frame, which is a subset of the current partition. For example, |
Return value type
Same as the data type of x.
Examples
Returns the employee with the second-highest salary per department.
-
Query statement
* | select department, staff_name, salary, nth_value(staff_name, 2) over( partition by department order by salary desc range between unbounded preceding and unbounded following ) as second_highest_salary from log -
Query results: The second_highest_salary column shows Blan (8500) for dev and San (7000) for Marketing across all rows in each partition.