General-purpose aggregate functions supported by Hologres - Hologres

Hologres is compatible with PostgreSQL and supports the standard PostgreSQL syntax for data development. This topic describes the general-purpose aggregate functions supported by Hologres.

Hologres supports a subset of the functions available in PostgreSQL. For more information about how to use these functions, see Aggregate Functions. The following table lists the general-purpose aggregate functions that Hologres supports.

Function	Description
APPROX_COUNT_DISTINCT	Calculates the number of rows that have distinct values in a column. The result of this function is an approximate value.
APPROX_PERCENTILE	Calculates the approximate percentile of a numeric column. Note Hologres supports APPROX_PERCENTILE from V3.1 and later.
ARRAY_AGG	Aggregates the values of an expression into an array.
AVG	Calculates the average of the non-NULL values returned by a BIGINT, FLOAT8, FLOAT4, or INT expression.
BIT_AND	Performs bitwise AND operations on the non-NULL values returned by a BIGINT or an INT expression.
BIT_OR	Performs bitwise OR operations on the non-NULL values returned by a BIGINT or an INT expression.
BOOL_AND	Checks whether all values returned by a BOOLEAN expression are TRUE.
BOOL_OR	Checks whether any value returned by a BOOLEAN expression is TRUE.
CORR	Calculates the correlation coefficient between two variables.
COUNT	Gets the number of rows in a specified table or the number of input rows from a BIGINT or NUMERIC expression.
COVAR_POP	Calculates the population covariance.
COVAR_SAMP	Calculates the sample covariance.
EVERY	Checks whether all values of a BOOLEAN expression are TRUE.
MAX	Gets the maximum value from a BIGINT, FLOAT8, FLOAT4, INT, or NUMERIC expression.
MAX_BY	Compares the values in a column (column y) to find the maximum value, and returns the value of another specified column (column x) from the same row.
MIN	Gets the minimum value from a BIGINT, FLOAT8, FLOAT4, INT, or NUMERIC expression.
MIN_BY	Compares the values in a column (column y) to find the minimum value, and returns the value of another specified column (column x) from the same row.
REGR_AVGX	Calculates the average of an independent variable (X).
REGR_AVGY	Calculates the average of a dependent variable (Y).
REGR_COUNT	Calculates the number of rows where neither input parameter is NULL.
REGR_R2	Calculates the square of the correlation coefficient.
REGR_INTERCEPT	Calculates the y-intercept of the least-squares-fit linear equation determined by the `(Y, X)` pairs.
REGR_SLOPE	Calculates the slope of the least-squares-fit linear equation.
REGR_SXX	Calculates the sum of squares of the independent variable (X), as `sum(X^2) - sum(X)^2/N`.
REGR_SXY	Calculates the sum of products of the independent variable (X) and dependent variable (Y), as `sum(XY) - sum(X) sum(Y)/N`.
SUM	Gets the sum of all values from a BIGINT, FLOAT8, FLOAT4, INT, or NUMERIC expression.
STDDEV	Calculates the sample standard deviation of the values returned by an INT, NUMERIC, or FLOAT8 expression.
STDDEV_POP	Calculates the population standard deviation of the values returned by an INT, NUMERIC, or FLOAT8 expression.
STDDEV_SAMP	Calculates the sample standard deviation of the values returned by an INT, NUMERIC, or FLOAT8 expression.
STRING_AGG	Concatenates the non-NULL values returned by a TEXT expression into a string using a specified separator.
UNIQ	Calculates the number of rows that have distinct values in a column.
VARIANCE	Calculates the sample variance of the values returned by an INT or NUMERIC expression.
VAR_POP	Calculates the population variance of the values returned by a FLOAT8, INT, or NUMERIC expression.
VAR_SAMP	Calculates the sample variance of the values returned by an INT, FLOAT8, or NUMERIC expression.

Sample data

-- Create a table.
CREATE TABLE example_table(
    c1 INT,           
    c2 BOOLEAN,      
    c3 VARCHAR(10),  
    c4 DECIMAL(3, 1),
    c5 TEXT,
    c6 FLOAT8,
    c7 FLOAT8
);

-- Insert data.
INSERT INTO example_table (c1, c2, c3, c4, c5, c6, C7) VALUES
(1, true, 'a', 1.1, 3, 6, 7),
(2, false, 'b', 2.2, 4, 6, 2.6);

APPROX_COUNT_DISTINCT

Description: Calculates the number of rows that have distinct values in a column. This function returns a single, approximate value.
```
APPROX_COUNT_DISTINCT ( <column> )
```
Parameters
column: Required. Specifies the column for which you want to calculate the approximate number of distinct rows.
Usage notes
The APPROX_COUNT_DISTINCT function uses HyperLogLog cardinality estimation to perform an inexact COUNT DISTINCT calculation. An inexact COUNT DISTINCT calculation improves query performance, especially when the column has many discrete values. The average error rate is typically between 0.1% and 1%. This function is suitable for performance-sensitive scenarios where some degree of error is acceptable. You can adjust the error rate using the following parameter.
```
SET hg_experimental_approx_count_distinct_precision = 20;
```
- The value ranges from 12 to 20. The default value is 17.
- The precision parameter specifies the number of bits for bucketing in the HyperLogLog algorithm. A larger value indicates more buckets and higher theoretical precision.
- A higher precision value also increases the computation time and memory overhead. However, the overhead is much lower than that of an exact COUNT DISTINCT ( column ) statement. Therefore, we recommend that you use APPROX_COUNT_DISTINCT instead of COUNT DISTINCT ( column ).
- When the precision parameter is set to a value greater than 17, Hologres uses the HyperLogLog++ algorithm. This algorithm corrects the return value to further reduce and stabilize the error.
  For example, if you set hg_experimental_approx_count_distinct_precision to 20, the error rate can be reduced to a range of 0.01% to 0.2% in most cases.
Note
You can also perform an exact COUNT DISTINCT calculation using the COUNT DISTINCT ( column ) function. This function consumes more resources.

Example

The following example uses the ORDERS table from the TPC-H public dataset provided by MaxCompute. You can run the following SQL statements directly without any additional preparation.

-- Create a foreign table.
IMPORT FOREIGN SCHEMA public_data
LIMIT TO (ORDERS_10g)
FROM SERVER odps_server INTO public options (if_table_exist 'update');

-- Create an internal table.
BEGIN;
CREATE TABLE ORDERS (
    O_ORDERKEY bigint NOT NULL PRIMARY KEY,
    O_CUSTKEY int NOT NULL,
    O_ORDERSTATUS text NOT NULL,
    O_TOTALPRICE DECIMAL(12, 2) NOT NULL,
    O_ORDERDATE timestamptz NOT NULL,
    O_ORDERPRIORITY text NOT NULL,
    O_CLERK text NOT NULL,
    O_SHIPPRIORITY int NOT NULL,
    O_COMMENT text NOT NULL
);

CALL set_table_property ('ORDERS', 'segment_key', 'O_ORDERDATE');
CALL set_table_property ('ORDERS', 'distribution_key', 'O_ORDERKEY');
CALL set_table_property ('ORDERS', 'bitmap_columns', 'O_ORDERKEY,O_CUSTKEY,O_ORDERSTATUS,O_ORDERPRIORITY,O_CLERK,O_SHIPPRIORITY,O_COMMENT');
CALL set_table_property ('ORDERS', 'dictionary_encoding_columns', 'O_ORDERSTATUS,O_ORDERPRIORITY,O_CLERK,O_COMMENT');

COMMENT ON COLUMN ORDERS.O_ORDERKEY IS 'Order ID';
COMMENT ON COLUMN ORDERS.O_CUSTKEY IS 'Customer ID';
COMMENT ON COLUMN ORDERS.O_ORDERSTATUS IS 'Order status';
COMMENT ON COLUMN ORDERS.O_TOTALPRICE IS 'Total price';
COMMENT ON COLUMN ORDERS.O_ORDERDATE IS 'Order date';
COMMENT ON COLUMN ORDERS.O_ORDERPRIORITY IS 'Order priority';
COMMENT ON COLUMN ORDERS.O_CLERK IS 'Cashier';
COMMENT ON COLUMN ORDERS.O_SHIPPRIORITY IS 'Shipping priority';
COMMENT ON COLUMN ORDERS.O_COMMENT IS 'Comment';

COMMIT;

-- Import data to the internal table.
INSERT INTO ORDERS SELECT * FROM ORDERS_10g;

Calculate the approximate number of distinct rows in the O_CUSTKEY column.

SELECT APPROX_COUNT_DISTINCT ( O_CUSTKEY ) FROM ORDERS;

The following result is returned.

approx_count_distinct
-----------------------
               1000422

Set the global precision to 20 and calculate the approximate number of distinct rows in the O_CUSTKEY column.

ALTER DATABASE dbname SET hg_experimental_approx_count_distinct_precision = 20;
SELECT APPROX_COUNT_DISTINCT ( O_CUSTKEY ) FROM ORDERS;

The following result is returned.

approx_count_distinct
-----------------------
               1000422

Set the session-level precision to 20 and calculate the approximate number of distinct rows in the O_CUSTKEY column.

--Set the session-level precision to 20.
SET hg_experimental_approx_count_distinct_precision = 20;
SELECT APPROX_COUNT_DISTINCT ( O_CUSTKEY ) FROM ORDERS;

The following result is returned.

 approx_count_distinct
-----------------------
                998854

APPROX_PERCENTILE

Note

Hologres supports APPROX_PERCENTILE from V3.1 and later.

Description: Calculates the approximate percentile of a numeric column. This function returns a single, approximate value.
```
APPROX_PERCENTILE (fraction) WITHIN GROUP (ORDER BY sort_expression)
```
Parameters
- fraction: A constant between 0 and 1. For example, 0.5 represents the median.
- sort_expression: Specifies the column for which to calculate the percentile. This column must be sortable.
Usage notes
The APPROX_PERCENTILE function uses a reservoir sampling algorithm. It samples a maximum of 8,192 elements and then calculates the percentile from these samples.

Example

The following example calculates the approximate count of distinct values in the O_CUSTKEY column.

SELECT
  APPROX_PERCENTILE(0.25) WITHIN GROUP (ORDER BY salary) AS q1,
  APPROX_PERCENTILE(0.5)  WITHIN GROUP (ORDER BY salary) AS median,
  APPROX_PERCENTILE(0.75) WITHIN GROUP (ORDER BY salary) AS q3
FROM employees;

ARRAY_AGG

Description: Concatenates the values of an expression into an array.
```
ARRAY_AGG(expression)
```
Parameters
expression: Required. An expression of any data type except JSON, JSONB, TIMETZ, INTERVAL, INET, OID, UUID, and array types.
Return value
Returns a value of the ARRAY type.

Example

SELECT ARRAY_AGG(c1) FROM example_table;

The following result is returned.

 array_agg
-----------
 {2,1}

AVG

Description: Calculates the average of the non-NULL values returned by a BIGINT, FLOAT8, FLOAT4, or INT expression.
```
AVG(expression)
```
Parameters
expression: Required. A BIGINT, FLOAT8, FLOAT4, or INT expression.
Return value
The return value is the same data type as the parameter.

Example

SELECT AVG(c7) FROM example_table;

The following result is returned.

   avg
----------
 1.500000

BIT_AND

Description: Performs bitwise AND operations on the non-NULL values returned by a BIGINT or INT expression.
```
BIT_AND(expression)
```
Parameters
expression: Required. A BIGINT or INT expression.
Return value
The return value is the same data type as the parameter.

Example

SELECT BIT_AND(c1) FROM example_table;

The result is as follows.

 bit_and
---------
       0

BIT_OR

Description: Performs bitwise OR operations on the non-NULL values returned by a BIGINT or INT expression.
```
BIT_OR(expression)
```
Parameters
expression: Required. A BIGINT or INT expression.
Return value
The return value is the same data type as the parameter.

Example

SELECT BIT_OR(c1) FROM example_table;

The following result is returned.

 bit_or
--------
      3

BOOL_AND

Description: Checks whether all values returned by a BOOLEAN expression are TRUE.
```
BOOL_AND(bool)
```
Parameters
bool: Required. A BOOLEAN expression.
Return value
Returns a value of the BOOLEAN type. This function returns TRUE (t) if all values returned by the expression are true, and FALSE (f) otherwise.

Example

SELECT BOOL_AND(c2) FROM example_table;

The following result is returned.

 bool_and
----------
 f

BOOL_OR

Description: Checks whether any value returned by a BOOLEAN expression is TRUE.
```
BOOL_OR(bool)
```
Parameters
bool: Required. A BOOLEAN expression.
Return value
Returns a value of the BOOLEAN type. This function returns TRUE (t) if any value returned by the expression is true, and FALSE (f) otherwise.

Example

SELECT BOOL_OR(c2) FROM example_table;

The following result is returned.

 bool_or
---------
 t

CORR

Description: Calculates the correlation coefficient between two variables.
```
CORR(Y, X)
```
Note
The parameters must be of the DOUBLE PRECISION, FLOAT, or FLOAT8 data type.
Parameters
Y and X: Required. DOUBLE PRECISION, FLOAT, or FLOAT8 expressions.
Return value
The return value is the same data type as the parameters.

Example

SELECT CORR(c6, c7) FROM example_table;

The following result is returned.

 corr
------

COUNT

Description: Retrieves the number of rows in a specified table or the number of input rows from a BIGINT or NUMERIC expression.
- Retrieves the number of rows in a specified table.
```
COUNT(*)
```
- Retrieves the number of input rows from a BIGINT or NUMERIC expression.
```
COUNT(expression)
```
  expression: Required. A non-NULL expression of the BIGINT or NUMERIC data type.
Return value
Returns a value of the BIGINT type.

Example

SELECT COUNT(*) FROM example_table;

The following result is returned.

 count
-------
     2

COVAR_POP

Description: Calculates the population covariance.
```
COVAR_POP(Y, X)
```
Parameters
Y and X: Required. DOUBLE PRECISION, FLOAT, or FLOAT8 expressions.
Return value
The return value is the same data type as the parameters.

Example

SELECT COVAR_POP(c6, c7) FROM example_table;

The following result is returned.

      covar_pop
---------------------
 3.5527136788005e-15

COVAR_SAMP

Description: Calculates the sample covariance.
```
COVAR_SAMP(Y, X)
```
Parameters
Y and X: Required. DOUBLE PRECISION, FLOAT, or FLOAT8 expressions.
Return value
The return value is the same data type as the parameters.

Example

SELECT COVAR_SAMP(c6, c7) FROM example_table;

The following result is returned.

     covar_samp
--------------------
 7.105427357601e-15

EVERY

Description: Checks whether all values of a BOOLEAN expression are TRUE.
```
EVERY(bool)
```
Parameters
bool: Required. A BOOLEAN expression.
Return value
Returns a value of the BOOLEAN type. This function returns TRUE (t) if all values of the expression are true, and FALSE (f) otherwise.

Example

SELECT EVERY(c2) FROM example_table;

The following result is returned.

 every
-------
 f

MAX

Description: Retrieves the maximum value from a BIGINT, FLOAT8, FLOAT4, INT, or NUMERIC expression.
```
MAX(expression)
```
Parameters
expression: Required. A BIGINT, FLOAT8, FLOAT4, INT, or NUMERIC expression.
Return value
The return value is the same data type as the parameter.

Example

SELECT MAX(c1) FROM example_table;

The following result is returned.

 max
-----
   2

MAX_BY

Description: Compares the values in a column (column y) to find the maximum value and returns the value of another specified column (column x) from the corresponding row.
```
MAX_BY(x, y);
```
Note
Numeric types are compared based on their numeric value. Non-numeric types are compared based on their dictionary sort order.
Parameters
- y: Required. Specifies the column from which to retrieve the maximum value. Rows where column y is NULL are not included in the calculation.
- x: Required. Specifies the column from which to retrieve a value that corresponds to the maximum value in column y.
Usage notes
The MAX_BY function is supported only in Hologres V1.3.36 and later.
Note
If your instance version is earlier than V1.3.36, you can join the Hologres DingTalk group to request an instance upgrade. For more information, see How to obtain more online support?
Return value
- The function returns the value from column x that corresponds to the maximum value in column y.
- If all values in column y are NULL, the function returns NULL.
- If the MAX_BY function finds multiple rows with the maximum value in column y, it returns the maximum value from the corresponding values in column x.

Example

--Sample data
DROP TABLE IF EXISTS test;
CREATE TABLE IF NOT EXISTS test (
    id INT,
    name TEXT,
    cost INT
);
INSERT INTO test
    VALUES (1, 'a', 100), (1, 'aa', 200), (1, 'aaa', 300), (2, 'b', 150), (2, 'bb', 300), (3, 'c', 150), (3, 'cc', 50);

Query for the value in the name column that corresponds to the maximum value in the cost column. In this example, multiple maximum values exist.

SELECT max_by(name, cost) FROM test;

The following result is returned.

 max_by
--------
 aaa

MIN

Description: Retrieves the minimum value from a BIGINT, FLOAT8, FLOAT4, INT, or NUMERIC expression.
```
MIN(expression)
```
Parameters
expression: Required. A BIGINT, FLOAT8, FLOAT4, INT, or NUMERIC type.
Return value
The return value is the same data type as the parameter.

Example

SELECT MIN(c1) FROM example_table;

The following result is returned.

 min
-----
   1

MIN_BY

Description: Compares the values in a column (column y) to find the minimum value and returns the value of another specified column (column x) from the corresponding row.
```
MIN_BY(x, y);
```
Note
Numeric types are compared based on their numeric value. Non-numeric types are compared based on their dictionary sort order.
Parameters
- y: Required. Specifies the column from which to retrieve the minimum value. Rows where column y is NULL are not included in the calculation.
- x: Required. Specifies the column from which to retrieve a value that corresponds to the minimum value in column y.
Usage notes
The MAX_BY and MIN_BY functions are supported only in Hologres V1.3.36 and later.
Note
If your instance version is earlier than V1.3.36, you can join the Hologres DingTalk group to request an instance upgrade. For more information, see How to obtain more online support?
Return value
- The function returns the value from column x that corresponds to the minimum value in column y.
- If all values in column y are NULL, the function returns NULL.
- If the MIN_BY function finds multiple rows with the minimum value in column y, it returns the minimum value from the corresponding values in column x.

Example

--Sample data
DROP TABLE IF EXISTS test;
CREATE TABLE IF NOT EXISTS test (
    id INT,
    name TEXT,
    cost INT
);
INSERT INTO test
    VALUES (1, 'a', 100), (1, 'aa', 200), (1, 'aaa', 300), (2, 'b', 150), (2, 'bb', 300), (3, 'c', 150), (3, 'cc', 50);

Query for the value in the name column that corresponds to the minimum value in the cost column.

SELECT MIN_BY(name, cost) FROM test;

The following result is returned.

min_by
--------
 cc
(1 row)

REGR_AVGX

Description: Calculates the average of an independent variable (X).
```
REGR_AVGX(Y, X)
```
Parameters
Y and X: Required. DOUBLE PRECISION, FLOAT, or FLOAT8 expressions.
Return value
The return value is the same data type as the parameters.

Example

SELECT REGR_AVGX(c6, c7) FROM example_table;

The following result is returned.

 regr_avgx
-----------
       4.8

REGR_AVGY

Description: Calculates the average of a dependent variable (Y).
```
REGR_AVGY(Y, X)
```
Parameters
Y and X: Required. DOUBLE PRECISION, FLOAT, or FLOAT8 expressions.
Return value
The return value is the same data type as the parameters.

Example

SELECT REGR_AVGY(c6, c7) FROM example_table;

The following result is returned.

 regr_avgy
-----------
         6

REGR_COUNT

Description: Calculates the number of rows where neither input parameter is NULL.
```
REGR_COUNT(Y, X)
```
Parameters
Y and X: Required. DOUBLE PRECISION, FLOAT, or FLOAT8 expressions.
Return value
Returns a value of the BIGINT type.

Example

SELECT REGR_COUNT(c6, c7) FROM example_table;

The following result is returned.

 regr_count
------------
          2

REGR_R2

Description: Calculates the square of the correlation coefficient.
```
REGR_R2(Y, X)
```
Parameters
Y and X: Required. DOUBLE PRECISION, FLOAT, or FLOAT8 expressions.
Return value
The return value is the same data type as the parameters.

Example

SELECT REGR_R2(c6, c7) FROM example_table;

The following result is returned.

 regr_r2
---------
       1

REGR_INTERCEPT

Description: Calculates the y-intercept of the least-squares-fit linear equation determined by the (Y, X) pairs.
```
REGR_INTERCEPT(Y, X)
```
Parameters
Y and X: Required. DOUBLE PRECISION, FLOAT, or FLOAT8 expressions.
Return value
The return value is the same data type as the parameters.

Example

SELECT REGR_INTERCEPT(c6, c7) FROM example_table;

The following result is returned.

 regr_intercept
----------------
              6

REGR_SLOPE

Description: Calculates the slope of the least-squares-fit linear equation.
```
REGR_SLOPE(Y, X)
```
Parameters
Y and X: Required. DOUBLE PRECISION, FLOAT, or FLOAT8 expressions.
Return value
The return value is the same data type as the parameters.

Example

SELECT REGR_SLOPE(c6, c7) FROM example_table;

The following result is returned.

      regr_slope
----------------------
 7.34031751818285e-16

REGR_SXX

Description: Calculates the sum of squares of the independent variable (X), as sum(X^2) - sum(X)^2/N.
```
REGR_SXX(Y, X)
```
Parameters
Y and X: Required. DOUBLE PRECISION, FLOAT, or FLOAT8 expressions.
Return value
The return value is the same data type as the parameters.

Example

SELECT REGR_SXX(c6, c7) FROM example_table;

The following result is returned.

 regr_sxx
----------
     9.68

REGR_SXY

Description: Calculates the sum of products of the independent variable (X) and dependent variable (Y), as sum(X × Y) - sum(X) × sum(Y)/N.
```
REGR_SXY(Y, X)
```
Parameters
Y and X: Required. DOUBLE PRECISION, FLOAT, or FLOAT8 expressions.
Return value
The return value is the same data type as the parameters.

Example

SELECT REGR_SXY(c6, c7) FROM example_table;

The following result is returned.

      regr_sxy
--------------------
 7.105427357601e-15

SUM

Description: Calculates the sum of all values from a BIGINT, FLOAT8, FLOAT4, INT, or NUMERIC expression.
```
SUM(expression)
```
Parameters
expression: Required. A BIGINT, FLOAT8, FLOAT4, INT, or NUMERIC expression.
Return value
The return value is the same data type as the parameter.

Example

SELECT SUM(c1) FROM example_table;

The following result is returned.

 sum
-----
   3

STDDEV

Description: Calculates the sample standard deviation of the values returned by an INT, NUMERIC, or FLOAT8 expression.
```
STDDEV(expression)
```
Parameters
expression: Required. An INT, NUMERIC, or FLOAT8 expression.
Return value
Returns a value of the NUMERIC or FLOAT8 type.

Example

SELECT STDDEV(c1) FROM example_table;

The following result is returned.

    stddev
--------------
 0.7071067811

STDDEV_POP

Description: Calculates the population standard deviation of the values returned by an INT, NUMERIC, or FLOAT8 expression.
```
STDDEV_POP(expression)	
```
Parameters
expression: Required. An INT, NUMERIC, or FLOAT8 expression.
Return value
Returns a value of the NUMERIC or FLOAT8 type.

Example

SELECT STDDEV_POP(c1) FROM example_table;

The following result is returned.

  stddev_pop
--------------
 0.5000000000

STDDEV_SAMP

Description: Calculates the sample standard deviation of the values returned by an INT, NUMERIC, or FLOAT8 expression.
```
STDDEV_SAMP(expression)	
```
Parameters
expression: Required. An INT, NUMERIC, or FLOAT8 expression.
Return value
Returns a value of the NUMERIC or FLOAT8 type.

Example

SELECT STDDEV_SAMP(c1) FROM example_table;

The following result is returned.

 stddev_samp
--------------
 0.7071067812

STRING_AGG

Description: Concatenates the non-NULL values returned by a TEXT expression into a string using a specified separator.
```
STRING_AGG(<expression> TEXT, <delimiter> TEXT)
```
Parameters
- expression: Required. The source string.
- delimiter: Required. The separator.
Return value
Returns a value of the TEXT type.

Example

SELECT STRING_AGG(c5, '-') FROM example_table;

The following result is returned.

 string_agg
------------
 3-4

UNIQ

Description: Calculates the number of rows that have distinct values in a column.
```
UNIQ ( < column > );
```
Parameters
column: Required. Specifies the column for which you want to calculate the number of rows that have distinct values. This column can be of the following data types: SMALLINT, INTEGER, BIGINT, REAL, DOUBLE PRECISION, TEXT, VARCHAR, TIMESTAMP, TIMESTAMPTZ, DATE, TIMETZ, and UUID.
Usage notes
- The UNIQ function is supported only in Hologres V1.3 and later.
  Note
  If your instance version is earlier than V1.3, see Common errors when preparing for an upgrade or join the Hologres DingTalk group to provide feedback. For more information, see How to obtain more online support?.
- The UNIQ function performs better than the COUNT DISTINCT function only when the SQL statement contains a GROUP BY clause and the values of the GROUP BY field are evenly distributed without data skew.
- Typically, UNIQ performs better than COUNT DISTINCT when the GROUP BY key has high cardinality. UNIQ also consumes less memory than COUNT DISTINCT. If you encounter an out-of-memory error when you use COUNT DISTINCT, you can use UNIQ as an alternative.
- Since Hologres V2.1, the performance of COUNT DISTINCT has been significantly optimized for various scenarios, including those with single or multiple COUNT DISTINCT functions, data skew, or SQL statements without a GROUP BY clause. You no longer need to manually rewrite queries to use UNIQ for better performance. For more information, see Count Distinct optimization.

Example

The following example uses the ORDERS table from the TPC-H public dataset provided by MaxCompute. You can run the following SQL statements directly without any additional preparation.

-- Create a foreign table.
IMPORT FOREIGN SCHEMA public_data
LIMIT TO (ORDERS_10g)
FROM SERVER odps_server INTO public options (if_table_exist 'update');

-- Create an internal table.
BEGIN;
CREATE TABLE ORDERS (
    O_ORDERKEY bigint NOT NULL PRIMARY KEY,
    O_CUSTKEY int NOT NULL,
    O_ORDERSTATUS text NOT NULL,
    O_TOTALPRICE DECIMAL(12, 2) NOT NULL,
    O_ORDERDATE timestamptz NOT NULL,
    O_ORDERPRIORITY text NOT NULL,
    O_CLERK text NOT NULL,
    O_SHIPPRIORITY int NOT NULL,
    O_COMMENT text NOT NULL
);

CALL set_table_property ('ORDERS', 'segment_key', 'O_ORDERDATE');
CALL set_table_property ('ORDERS', 'distribution_key', 'O_ORDERKEY');
CALL set_table_property ('ORDERS', 'bitmap_columns', 'O_ORDERKEY,O_CUSTKEY,O_ORDERSTATUS,O_ORDERPRIORITY,O_CLERK,O_SHIPPRIORITY,O_COMMENT');
CALL set_table_property ('ORDERS', 'dictionary_encoding_columns', 'O_ORDERSTATUS,O_ORDERPRIORITY,O_CLERK,O_COMMENT');

COMMENT ON COLUMN ORDERS.O_ORDERKEY IS 'Order ID';
COMMENT ON COLUMN ORDERS.O_CUSTKEY IS 'Customer ID';
COMMENT ON COLUMN ORDERS.O_ORDERSTATUS IS 'Order status';
COMMENT ON COLUMN ORDERS.O_TOTALPRICE IS 'Total price';
COMMENT ON COLUMN ORDERS.O_ORDERDATE IS 'Order date';
COMMENT ON COLUMN ORDERS.O_ORDERPRIORITY IS 'Order priority';
COMMENT ON COLUMN ORDERS.O_CLERK IS 'Cashier';
COMMENT ON COLUMN ORDERS.O_SHIPPRIORITY IS 'Shipping priority';
COMMENT ON COLUMN ORDERS.O_COMMENT IS 'Comment';

COMMIT;

-- Import data to the internal table.
INSERT INTO ORDERS SELECT * FROM ORDERS_10g;

Calculate the number of distinct rows in the O_CUSTKEY column of the ORDERS table.
```
SELECT UNIQ ( O_CUSTKEY ) FROM ORDERS;
```
The following result is returned.
```
  uniq
--------
 999982
```

Calculate the number of distinct O_CUSTKEY values for each O_ORDERSTATUS in the ORDERS table.

SELECT O_ORDERSTATUS, UNIQ ( O_CUSTKEY ) FROM ORDERS GROUP BY O_ORDERSTATUS;

The following result is returned.

 o_orderstatus |  uniq
---------------+--------
 P             | 313478
 F             | 996258
 O             | 996400

VARIANCE

Description: Calculates the sample variance of the values returned by an INT or NUMERIC expression.
```
VARIANCE(expression)
```
Parameters
expression: Required. An INT or NUMERIC expression.
Return value
Returns a value of the NUMERIC type.

Example

SELECT VARIANCE(c1) FROM example_table;

The following result is returned.

   variance
--------------
 0.5000000000

VAR_POP

Description: Calculates the population variance of the values returned by a FLOAT8, INT, or NUMERIC expression.
```
VAR_POP(expression)
```
Parameters
expression: Required. A FLOAT8, INT, or NUMERIC expression.
Return value
Returns a value of the NUMERIC or FLOAT8 type.

Example

SELECT VAR_POP(c1) FROM example_table;

The following result is returned.

   var_pop
--------------
 0.2500000000

VAR_SAMP

Description: Calculates the sample variance of the values returned by an INT, FLOAT8, or NUMERIC expression.
```
VAR_SAMP(expression)
```
Parameters
expression: Required. An INT, FLOAT8, or NUMERIC expression.
Return value
Returns a value of the NUMERIC or FLOAT8 type.

Example

SELECT VAR_SAMP(c1) FROM example_table;

The following result is returned.

   var_samp
--------------
 0.5000000000