Aggregate functions have the following features:

  • Input a dataset.
  • Output a single computation result.

A majority of aggregate functions ignore null values during computation and return null when no input is made or all values are null, but there are a few exceptions:

  • count
  • count_if
  • max_by
  • min_by
  • approx_distinct

Basic aggregate functions

Function Syntax Description
arbitrary arbitrary(x) → [same as input] Returns an arbitrary non-null value of x.
array_agg array_agg(x) → array<[same as input]> Returns an array created from the input elements.
avg avg(x) → double Returns the average (arithmetic mean) of all input values.
avg avg(time interval type) → time interval type Returns the average interval length of all input time series.
bool_and bool_and(boolean) → boolean Returns TRUE if every input value is TRUE. Otherwise, FALSE is returned.
bool_or bool_or(boolean) → boolean Returns TRUE if any of the input values is True. Otherwise, FALSE is returned.
checksum checksum(x) → varbinary Returns an order-insensitive checksum of x.
count count(*) → bigint Returns the number of rows.
count count(x) → bigint Returns the number of non-null elements.
count_if count_if(x) → bigint Returns the number of TRUE elements of x. This function is equivalent to count(CASE WHEN x THEN 1 END).
every every(boolean) → boolean This is an alias for bool_and.
geometric_mean geometric_mean(x) → double Returns the geometric mean of x.
max_by max_by(x, y) → [same as x] Returns the value of x associated with the maximum value of y over all input values.
max_by max_by(x, y, n) → array<[same as x]> Returns an array of x associated with the n largest of all input values of y.
min_by min_by(x, y) → [same as x] Returns the value of x associated with the minimum value of y over all input values.
min_by min_by(x, y, n) → array<[same as x]> Returns an array of x associated with the n smallest of all input values of y.
max max(x) → [same as input] Returns the maximum value among all input values.
max max(x, n) → array<[same as x]> Returns the n largest values of all input values of x.
min min(x) → [same as input] Returns the minimum value among all input values.
min min(x, n) → array<[same as x]> Returns the n smallest values of all input values of x.
sum sum(x) → [same as input] Returns the sum of all input values of x.

Bitwise aggregate functions

For bitwise aggregate functions, see bitwise_and_agg and bitwise_or_agg functions described in Bitwise operators.

Map aggregate functions

Function Syntax Description
histogram histogram(x) → map<K,bigint> Creates a statistics histogram.
map_agg map_agg(key, value) → map<K,V> Creates a variable of the MAP type.
map_union map_union(x<K, V>) → map<K,V> Returns the union of all the input maps. If a key is found in multiple input maps, the key value in the resulting map comes from an arbitrary input map.
multimap_agg multimap_agg(key, value) → map<K,array> Creates a variable of the MAP type with multiple mappings.

Approximate aggregate functions

Function Syntax Description
approx_distinct approx_distinct(x, [e]) → bigint Returns the approximate number of rows that contain distinct input values. This function provides an approximation of count(DISTINCT x). 0 is returned if all input values are null. This function produces a standard error of no more than e, which is the standard deviation of the (approximately normal) error distribution over all possible sets. It is optional and set to 2.3% by default. The current implementation of this function requires that e be in the range [0.01150, 0.26000]. It does not guarantee an upper limit on the error for any specific input set.
approx_percentile approx_percentile(x, percentage) → [same as x] Returns the approximate percentile for all input values of x at the given percentage.
approx_percentile approx_percentile(x, percentages) → array<[same as x]> Similar to the preceding function, percentages is an array and returns constant values for all input rows.
approx_percentile approx_percentile(x, w, percentage) → [same as x] Similar to the preceding function, w is the weighted value of x.
approx_percentile approx_percentile(x, w, percentage, accuracy) → [same as x] Similar to the preceding function, accuracy is the upper limit of the estimation accuracy, and the value must be in the range [0, 1].
approx_percentile approx_percentile(x, w, percentages) → array<[same as x]> Similar to the preceding function, percentages is an array and returns constant values for all input rows.
numeric_histogram numeric_histogram(buckets, value, [weight]) → map<double, double> Computes an approximate histogram with up to a given number of buckets. buckets must be a BIGINT. value and weight must be numeric. Weight is optional and set to 1 by default.

Statistical aggregate functions

Function Syntax Description
corr corr(y, x) → double Returns the correlation coefficient.
covar_pop covar_pop(y, x) → double Returns the population covariance of input values.
covar_samp covar_samp(y, x) → double Returns the sample covariance of input values.
kurtosis kurtosis(x) → double Returns the excess kurtosis of all input values. Use the following expression for unbiased estimation:
kurtosis(x) =
              n(n+1)/((n-1)(n-2)(n-3))sum[(x_i-mean)^4]/sttdev(x)^4-3(n-1)^2/((n-2)(n-3))
regr_intercept regr_intercept(y, x) → double Returns the y-intercept of the linear regression line. y is the dependent variable. x is the independent variable.
regr_slope regr_slope(y, x) → double Returns the linear regression slope of input values. y is the dependent variable. X is the independent variable.
skewness skewness(x) → double Returns the skewness of all input values.
sttdev_pop sttdev_pop(x) → double Returns the population standard deviation of all input values.
sttdev_samp sttdev_samp(x) → double Returns the sample standard deviation of all input values.
sttdev sttdev(x) → double This is an alias for sttdev_samp.
var_pop var_pop(x) → double Returns the population variance of all input values.
var_samp var_samp(x) → double Returns the sample variance of all input values.
variance variance(x) → double This is an alias for var_samp.