An aggregate function calculates a single value from a set of values. This topic describes the basic syntax of aggregate functions and provides usage examples.
Simple Log Service supports the following aggregate functions.
Parameter | Syntax | Description | SQL support | SPL support |
arbitrary(x) | Returns an arbitrary non-NULL value from x. | √ | × | |
avg(x) | Calculates the arithmetic mean of x. | √ | × | |
bitwise_and_agg(x) | Returns the bitwise AND of all values in x. | √ | × | |
bitwise_or_agg(x) | Returns the bitwise OR of all values in x. | √ | × | |
bool_and(boolean expression) | Returns true if all logs meet the condition. The | √ | × | |
bool_or(boolean expression) | Returns true if at least one log meets the condition. | √ | × | |
checksum(x) | Calculates the checksum of x. | √ | × | |
count(*) | Returns the total number of logs. | √ | × | |
count(1) | Returns the total number of logs, which is equivalent to | √ | × | |
count(x) | Returns the number of logs where the value of x is not NULL. | √ | × | |
count_if(boolean expression) | Returns the number of logs that meet the specified condition. | √ | × | |
every(boolean expression) | Returns true if all logs meet the condition. The | √ | × | |
geometric_mean(x) | Calculates the geometric mean of x. | √ | × | |
kurtosis(x) | Calculates the kurtosis of x. | √ | × | |
map_union(x) | Returns the union of maps in a column. If a key exists in multiple maps, an arbitrary value for that key is chosen. | √ | × | |
max(x) | Returns the maximum value in x. | √ | × | |
max(x, n) | Returns the n largest values in x as an array. | √ | × | |
max_by(x, y) | Returns the value of x associated with the maximum value of y. | √ | × | |
max_by(x, y, n) | Returns an array of the x values corresponding to the n largest values of y. | √ | × | |
min(x) | Returns the minimum value in x. | √ | × | |
min(x, n) | Returns the n smallest values in x as an array. | √ | × | |
min_by(x, y) | Returns the value of x associated with the minimum value of y. | √ | × | |
min_by(x, y, n) | Returns an array of the x values corresponding to the n smallest values of y. | √ | × | |
skewness(x) | Calculates the skewness of x. | √ | × | |
sum(x) | Calculates the sum of x. | √ | × |
Arbitrary function
The arbitrary function returns an arbitrary non-null value from the expression x.
Syntax
arbitrary(x)Parameters
Parameter | Description |
x | An expression of any data type. |
Return type
Matches the data type of the input expression.
Example
Returns an arbitrary non-null value from the request_method field.
Query statement (Debug)
* | SELECT arbitrary(request_method) AS request_methodThe request_method column returns a value such as
GET.
Avg function
The avg function returns the average of the values in x.
Syntax
avg(x)Parameters
Parameter | Description |
x | The column or expression to average. The values must be of a double, bigint, decimal, or real type. |
Return value type
A value of the double type.
Examples
Returns projects with an average latency greater than 1,000 microseconds.
Query statement (Debug)
method: PostLogstoreLogs | SELECT avg(latency) AS avg_latency, Project GROUP BY Project HAVING avg_latency > 1000The query returns a record with an avg_latency of
3223.4162679425835and a Project name that starts withdatalab-14.
bitwise_and_agg function
The bitwise_and_agg function returns the bitwise AND of all values in x.
Syntax
bitwise_and_agg(x)Parameters
Parameter | Description |
x | A bigint value. |
Return value type
A bigint value.
Examples
Performs a bitwise AND operation on all values in the status field.
Query statement (Debug)
* | SELECT bitwise_and_agg(status)The query returns a single column named
_col0with a value of0.
Bitwise_or_agg function
The bitwise_or_agg function returns the result of the bitwise OR operation on all values in x.
Syntax
bitwise_or_agg(x)Parameters
Parameter | Description |
x | A bigint value. |
Return type
A bigint value.
Example
Perform a bitwise OR operation on all values of the request_time field.
Query and analysis statement (Debug)
* | SELECT bitwise_or_agg(request_length)The query and analysis results are returned as a single column
_col0, and the value is16383.
bool_and function
The bool_and function returns true if a boolean expression is true for all logs. This function is equivalent to the every function.
Syntax
bool_and(boolean expression)Parameters
Parameter | Description |
boolean expression | The boolean expression to evaluate. |
Return value type
boolean
Example
This example checks if the request_time value for all logs is less than 100.
Query and analysis statement (Debug)
* | SELECT bool_and(request_time < 100)The query and analysis result is
true, which indicates that therequest_timevalue of all logs is less than 100.
bool_or function
The bool_or function returns true if a boolean expression is true for at least one log.
Syntax
bool_or(boolean expression)Parameters
Parameter | Description |
boolean expression | A boolean expression. |
Return value type
boolean
Examples
To check whether the request_time for any log is less than 20:
Query and analysis statement (Debug)
* | SELECT bool_or(request_time < 20)The query and analysis results return
truefor the_col0column, which indicates that the log contains a record where request_time is less than 20.
Checksum function
The checksum function calculates the checksum of x.
Syntax
checksum(x)Parameters
Parameter | Description |
x | An expression of any data type. |
Return value type
A Base64-encoded string.
Example
Query statement (Debug)
* | SELECT checksum(request_method) AS request_methodThe checksum of the request_method field is
NDXFdgnd8GE=.
Count function
The count function counts logs.
Syntax
Counts the total number of logs.
count(*)Returns the total number of logs. This is equivalent to
count(*).count(1)Counts logs where x is not NULL.
count(x)
Parameters
Parameter | Description |
x | An expression of any data type. |
Return value type
An integer.
Examples
Example 1: Count website page views.
Query statement (Debug)
* | SELECT count(*) AS PVQuery and analysis results: The result table contains a PV column with the value
2009.
Example 2: Count logs where the
request_methodfield is not null.Query statement (Debug)
* | SELECT count(request_method) AS countQuery and analysis results: The result table contains a count column with the value
1954.
count_if function
The count_if function counts logs that meet a specified condition.
Syntax
count_if(boolean expression)Parameters
Parameter | Description |
boolean expression | The condition to evaluate. |
Return type
The count of logs that meet the condition.
Example
Count the number of log entries where the value of the request_uri field ends with file-0.
Query and analysis statement (debug)
* | SELECT count_if(request_uri like '%file-0') AS countQuery and analysis results
geometric_mean function
The geometric_mean function calculates the geometric mean of x.
Syntax
geometric_mean(x)Parameters
Parameter | Description |
x | A value of type double, bigint, or real. |
Return value type
Returns a double value.
Example
Calculate the geometric mean of request time.
Query and analysis statement (Debug)
* | SELECT geometric_mean(request_time) AS timeThe query and analysis results show a time column with a value of
39.443123208882308.
Every function
The every function returns true if a boolean expression evaluates to true for all logs. This function is equivalent to the bool_and function.
Syntax
every(boolean expression)Parameters
Parameter | Description |
boolean expression | An expression that returns a boolean value for each log. |
Return value type
Returns a boolean value.
Example
This example checks if the request_time value in every log is less than 100.
Query and analysis statement (Debug)
* | SELECT every(request_time < 100)Query and analysis results
Kurtosis function
Calculates the kurtosis of x.
Syntax
kurtosis(x)Parameters
Parameter | Description |
x | The value must be a double or bigint. |
Return type
double
Example
Calculates the kurtosis of the request_time field.
Query statement (Debug)
*| SELECT kurtosis(request_time)Query and analysis results
kurtosis -3.0006998250171186
map_union function
The map_union function returns the union of all maps in a column. If the same key exists in multiple input maps, the value for that key in the resulting map is non-deterministically chosen from one of the inputs.
Syntax
map_union(x)Parameters
Parameter | Description |
x | A column of the map type. |
Return value type
Returns a single value of the map type.
Example
This example aggregates map values from the etl_context field into a single map.
Sample field
etl_context: { project:"datalab-148****6461-cn-chengdu" logstore:"internal-etl-log" consumer_group:"etl-83****4d1965" consumer:"etl-b2d40ed****c8d6-291294" shard_id:"0" }Query statement
* | SELECT map_union( try_cast(json_parse(etl_context) AS map(varchar, varchar)) )Query and analysis results: After the query statement is executed, the result column is named
_col0, and the value is a map of key-value pairs parsed from etl_context, containing fields such asconsumer_group,shard_id,project,logstore, andconsumer, and their corresponding values.
Max function
The max function returns the maximum value of x.
Syntax
Returns the maximum value of x.
max(x)Returns an array of the n largest values of x.
max(x, n)
Parameters
Parameter | Description |
x | Any expression. |
n | A positive integer. |
Return value type
The same data type as the input expression.
Examples
Example 1: Find the maximum request time.
Query statement (Debug)
* | SELECT max(request_time) AS max_request_timeQuery and analysis results: The value of
max_request_timeis80.0.
Example 2: Find the top 10 request times.
Query statement (Debug)
* | SELECT max(request_time, 10) AS "top 10"Query and analysis results: The query returns
[80.0,80.0,80.0,80.0,80.0,79.0,79.0,79.0,78.0,78.0]for the top 10 column. The array contains the 10 largestrequest_timevalues in descending order.
Max_by function
The max_by function supports the following two syntaxes.
Syntax
Returns the value of x for the maximum value of y.
max_by(x, y)Returns an array of x values for the n largest values of y.
max_by(x, y, n)
Parameters
Parameter | Description |
x | An expression of any data type. |
y | An expression of any data type. |
n | A positive integer. |
Return value type
Same as the data type of x.
Examples
Example 1: Find the time of the order with the highest pretax amount.
Query and analysis statement
* | SELECT max_by(UsageEndTime, PretaxAmount) AS timeQuery and analysis results: The time column returns
1625731025.
Example 2: Find the request methods for the three requests with the longest request times.
Query and analysis statement (debug)
* | SELECT max_by(request_method, request_time, 3) AS methodQuery and analysis results: The value of method is
["POST","POST","POST"], which indicates that the methods for the three most recent requests sorted byrequest_timeare all POST.
min function
The min function returns the minimum value in x.
Syntax
Returns the minimum value in x.
min(x)Returns the n smallest values in x as an array.
min(x,n)
Parameters
Parameter | Description |
x | An expression of any data type. |
n | A positive integer. |
Return value type
Same as the data type of the input expression.
Examples
Example 1: Find the minimum request time.
Query and analysis statement (Debug)
* | SELECT min(request_time) AS min_request_timeThe query and analysis results return a column named min_request_time with the value
10.0.
Example 2: Find the 10 smallest request times.
Query and analysis statement (Debug)
* | SELECT min(request_time, 10)The query and analysis results return a column named
_col0with the value[10.0,10.0,10.0,10.0,10.0,11.0,12.0,12.0,13.0,13.0].
min_by function
The min_by function has the following two syntaxes.
Syntax
Returns the value of x for the minimum value of y.
min_by(x, y)Returns an array of x values for the n smallest values of y.
min_by(x, y, n)
Parameters
Parameter | Description |
x | An expression of any data type. |
y | An expression of any data type. |
n | A positive integer. |
Return value type
The same data type as the x parameter.
Examples
Example 1: Find the request method for the request with the minimum request time.
Query and analysis statement (Debug)
* | SELECT min_by(request_method, request_time) AS methodQuery and analysis results: The method column returns
GET.
Example 2: Find the request methods for the three requests with the smallest request times.
Query and analysis statement (Debug)
* | SELECT min_by(request_method, request_time, 3) AS methodQuery and analysis results.
Skewness function
The skewness function calculates the skewness of the expression x.
Syntax
skewness(x)Parameters
Parameter | Description |
x | An expression that evaluates to a double or bigint type. |
Return value type
double
Example
Calculate the skewness of the values in the request_time field.
Query and analysis statement (Debug)
*| SELECT skewness(request_time) AS skewnessQuery and analysis results: The skewness value is
0.00042944441602043965.
Sum function
The sum function sums the values in the expression x.
Syntax
sum(x)Parameters
Parameter | Description |
x | An expression that evaluates to a double, bigint, decimal, or real type. |
Return value type
Same as the data type of the input expression.
Examples
Calculate the daily traffic of a website.
Query and analysis statement (debug)
* | SELECT date_trunc('day', __time__) AS time, sum(body_bytes_sent) AS body_bytes_sent GROUP BY time ORDER BY timeQuery and analysis results: After the query and analysis statement is executed, a statistical results table aggregated by day is returned, containing the sum of
body_bytes_sentfor a total of five days from 2023-09-19 to 2023-09-23.