APPROX_COUNT_DISTINCT is an aggregate function. This topic describes the syntax of the APPROX_COUNT_DISTINCT function in Hologres.

Syntax

The APPROX_COUNT_DISTINCT function is invoked to calculate the number of rows that have distinct values in a column of a table. The result of this function is an approximate value.
APPROX_COUNT_DISTINCT ( <column> )
The following table describes the parameter in the syntax.
Parameter Description
column The column for which you want to calculate the approximate number of rows that have distinct values.

The APPROX_COUNT_DISTINCT function uses the HyperLogLog approximating analysis to perform an inaccurate COUNT DISTINCT operation. An inaccurate COUNT DISTINCT operation allows you to improve query performance especially when a large number of discrete values exist in the column. This ensures that the average margin of error is in the range of 0.1% to 1%. The APPROX_COUNT_DISTINCT function applies to scenarios in which high query performance is required and an approximate result is acceptable.

If you can afford the costs of more resources, you can also perform a precise COUNT DISTINCT operation by using the COUNT DISTINCT ( column ) function.

Adjust the margin of error

To adjust the margin of error, you can execute the following statement to change the value of the hg_experimental_approx_count_distinct_precision parameter:
set hg_experimental_approx_count_distinct_precision = 20;
  • Valid values: 12 to 20. Default value: 17.
  • This parameter specifies the number of bits that is used as the index of buckets in the HyperLogLog algorithm. A greater value indicates more buckets and higher accuracy.
  • A greater value indicates that more computing time and memory are consumed. However, the overhead is still smaller than the overhead of the COUNT DISTINCT ( column ) function. Therefore, we recommend that you use the APPROX_COUNT_DISTINCT function rather than the COUNT DISTINCT ( column ) function.
  • If you set this parameter to a value greater than 17, Hologres uses the HyperLogLog++ algorithm to correct the error of the return result in order to reduce the margin of error. For example, if you set the hg_experimental_approx_count_distinct_precision parameter to 20, the margin of error can be reduced to 0.01% to 0.2%.

Examples

Execute the following statement to calculate the approximate number of rows that have distinct values in the O_CUSTKEY column:
SELECT APPROX_COUNT_DISTINCT ( O_CUSTKEY ) FROM ORDERS;

-- Set the hg_experimental_approx_count_distinct_precision parameter to 20 for all sessions and calculate the approximate number of rows that have distinct values in the O_CUSTKEY column.
ALTER DATABASE dbname SET hg_experimental_approx_count_distinct_precision = 20;
SELECT APPROX_COUNT_DISTINCT ( O_CUSTKEY ) FROM ORDERS;

-- Set the hg_experimental_approx_count_distinct_precision parameter to 20 for the current session.
SET hg_experimental_approx_count_distinct_precision = 20;
SELECT APPROX_COUNT_DISTINCT ( O_CUSTKEY ) FROM ORDERS;