APPROX_COUNT_DISTINCT is an aggregate function. This topic describes the syntax of the APPROX_COUNT_DISTINCT function in Hologres.
Syntax
APPROX_COUNT_DISTINCT
function is invoked to calculate the number of rows that have distinct values in
a column of a table. The result of this function is an approximate value. APPROX_COUNT_DISTINCT ( <column> )
The following table describes the parameter in the syntax. Parameter | Description |
---|---|
column | The column for which you want to calculate the approximate number of rows that have distinct values. |
The APPROX_COUNT_DISTINCT
function uses the HyperLogLog approximating analysis to perform an inaccurate COUNT
DISTINCT operation. An inaccurate COUNT DISTINCT operation allows you to improve query
performance especially when a large number of discrete values exist in the column.
This ensures that the average margin of error is in the range of 0.1% to 1%
. The APPROX_COUNT_DISTINCT function applies to scenarios in which high query performance
is required and an approximate result is acceptable.
If you can afford the costs of more resources, you can also perform a precise COUNT
DISTINCT operation by using the COUNT DISTINCT ( column )
function.
Adjust the margin of error
set hg_experimental_approx_count_distinct_precision = 20;
- Valid values: 12 to 20. Default value: 17.
- This parameter specifies the number of bits that is used as the index of buckets in the HyperLogLog algorithm. A greater value indicates more buckets and higher accuracy.
- A greater value indicates that more computing time and memory are consumed. However,
the overhead is still smaller than the overhead of the
COUNT DISTINCT ( column )
function. Therefore, we recommend that you use the APPROX_COUNT_DISTINCT function rather than theCOUNT DISTINCT ( column )
function. - If you set this parameter to a value greater than 17, Hologres uses the HyperLogLog++ algorithm to correct the error of the return result in order to reduce the margin of error. For example, if you set the hg_experimental_approx_count_distinct_precision parameter to 20, the margin of error can be reduced to 0.01% to 0.2%.
Examples
SELECT APPROX_COUNT_DISTINCT ( O_CUSTKEY ) FROM ORDERS;
-- Set the hg_experimental_approx_count_distinct_precision parameter to 20 for all sessions and calculate the approximate number of rows that have distinct values in the O_CUSTKEY column.
ALTER DATABASE dbname SET hg_experimental_approx_count_distinct_precision = 20;
SELECT APPROX_COUNT_DISTINCT ( O_CUSTKEY ) FROM ORDERS;
-- Set the hg_experimental_approx_count_distinct_precision parameter to 20 for the current session.
SET hg_experimental_approx_count_distinct_precision = 20;
SELECT APPROX_COUNT_DISTINCT ( O_CUSTKEY ) FROM ORDERS;