Aggregate functions are used to calculate a result for a group of data.
Except for count(), count_if(), max_by(), min_by(), and approx_distinct(), all other aggregate functions ignore null values and return null for no input rows or when all values are null. For example, the sum() function returns null rather than 0, and the avg() function does not include null values in the count. The coalesce() function converts null into 0.
arbitrary(x) → [same as input]
Returns an arbitrary non-null value of x.
array_agg(x) → array<[same as input]>
Returns an array created from the input x elements.
avg(x) → double
Returns the average (arithmetic mean) of all input values.
bool_and(boolean) → boolean
Returns TRUE if every input value is TRUE. Otherwise, this function returns FALSE.
bool_or(boolean) → boolean
Returns TRUE if one input value is TRUE. Otherwise, this function returns FALSE.
checksum(x) → varbinary
Returns an order-insensitive checksum of the specified values.
count (*) → bigint
Returns the number of input rows.
count(x) → bigint
Returns the number of non-null input values.
count_if(x) → bigint
Returns the number of TRUE input values. This function is equivalent to count(CASE WHEN x THEN 1 END).
every(boolean) → boolean
The alias for bool_and().
geometric_mean(x) → double
Returns the geometric mean of all input values.
max_by(x, y) → [same as x]
Returns the first value of x associated with the maximum value of y over all associations.
max_by(x, y, n) → array<[same as x]>
Returns the first n values of x associated with the first n largest values of all input values of y in descending order.
min_by(x, y) → [same as x]
Returns the first value of x associated with the minimum value of y over all associations.
min_by(x, y, n) → array<[same as x]>
Returns the first n values of x associated with the first n smallest values of all input values of y in ascending order.
max(x) → [same as input]
Returns the maximum value of all input values.
max(x, n) → array<[same as x]>
Returns the first n largest values of all input values of x.
min(x) → [same as input]
Returns the minimum value of all input values.
min(x, n) → array<[same as x]>
Returns the first n smallest values of all input values of x.
sum(x) → [same as input]
Returns the sum of all input values.
Bitwise aggregate functions
bitwise_and_agg(x) → bigint
Returns the bitwise AND of all input values in two's complement representation.
bitwise_or_agg(x) → bigint
Returns the bitwise OR of all input values in two's complement representation.
Map aggregate functions
histogram(x) → map<K,bigint>
Returns a map that contains the number of times each input value occurs.
map_agg(key, value) → map<K,V>
Returns a map created from the input key-value pairs.
map_union(x<K, V>)
→ map<K,V>
Returns the union of all input maps. If a key is found in multiple input maps, the value of the key in the resulting map comes from an arbitrary input map.
multimap_agg(key, value) → map<K,array<V>>
Returns a multimap generated from the input key-value pairs. Each key can be related to multiple values.
Approximate aggregate functions
approx_distinct(x) → bigint
Returns the approximate number of distinct input values. This function provides an approximate calculation of count(DISTINCT x). If all input values are null, 0 is returned.
This function provides a standard error of 2.3%, which is the standard deviation of the approximately normal error distribution over all possible sets. It does not ensure an upper threshold on the error for a specific input set.
approx_distinct(x, e) → bigint
Returns the approximate number of distinct input values. This function provides an approximate calculation of count(DISTINCT x). If all input values are null, 0 is returned.
This function produces a standard error of no more than e, which is the standard deviation of the approximately normal error distribution over all possible sets. It does not ensure an upper threshold on the error for a specific input set. The implementation of this function requires that e be in the range of [0.01150, 0.26000].
approx_percentile(x, percentage) → [same as x]
Returns the approximate percentile for all input values of x at the specified percentage. The value of percentage must range from 0 to 1 and must be constant for all input rows. Example: approx_percentile(x, 0.5).
approx_percentile(x, percentages) → array<[same as x]>
Returns the approximate percentile for all input values of x at each of the specified percentages. Each element of the percentages array must range from 0 to 1, and the array must be constant for all input rows.
approx_percentile(x, w, percentage) → [same as x]
Returns the approximate weighted percentile for all input values of x by using the per-item weight w at the percentage. The weight must contain at least one integer value. It is effectively a replication count for the value x in the percentile set. The value of percentage must range from 0 to 1 and must be constant for all input rows.
approx_percentile(x, w, percentage, accuracy) → [same as x]
Returns the approximate weighted percentile for all input values of x by using the per-item weight w at the percentage, with a maximum rank error of accuracy. The weight must contain at least one integer value. It is effectively a replication count for the value x in the percentile set. The value of percentage must range from 0 to 1 and must be constant for all input rows. The value of accuracy must be a constant between 0 and 1 for all input rows.
approx_percentile(x, w, percentages) → array<[same as x]>
Returns the approximate weighted percentile for all input values of x by using the per-item weight w at each of the given percentages specified in the array. The weight must contain at least one integer value. It is effectively a replication count for the value x in the percentile set. Each element of the percentages array must range from 0 to 1, and the array must be constant for all input rows.
numeric_histogram(buckets, value, weight) → map<double, double>
Computes the approximate histogram based on the number of buckets for all values with a per-item weight of weight. The following algorithm applies:
Yael Ben-Haim and Elad Tom-Tov, "A streaming parallel decision tree algorithm", J.
Machine Learning Research 11 (2010), pp. 849--872.
The value of buckets must be of the BIGINT type. The values of value and weight must be numeric.
numeric_histogram(buckets, value) → map<double, double>
Computes the approximate histogram based on the number of buckets for all values. This function is equivalent to the variant of numeric_histogram() that takes a weight, with a per-item weight of 1.
Statistical aggregate functions
corr(y, x) → double
Returns the correlation coefficient of input values.
covar_pop(y, x) → double
Returns the population covariance of input values.
covar_samp(y, x) → double
Returns the sample covariance of input values.
regr_intercept(y, x) → double
Returns the linear regression intercept of input values. y is the dependent value. x is the independent value.
regr_slope(y, x) → double
Returns the linear regression slope of input values. y is the dependent value. x is the independent value.
stddev(x) → double
The alias for stddev_samp().
stddev_pop(x) → double
Returns the population standard deviation of all input values.
stddev_samp(x) → double
Returns the sample standard deviation of all input values.
variance(x) → double
The alias for var_samp().
var_pop(x) → double
Returns the population variance of all input values.
var_samp(x) → double
Returns the sample variance of all input values.
Characteristic aggregate functions
group_concat
A string aggregate function.
Syntax: GROUP_CONCAT(expr [,expr …])
By default, fields are separated by commas (,).