AnalyticDB for PostgreSQL builds on the features of Greenplum Database and supports the hll extension. Counting distinct values exactly requires memory proportional to the cardinality of the dataset — impractical at scale. The hll extension adds the HyperLogLog (HLL) probabilistic algorithm to AnalyticDB for PostgreSQL, letting you estimate cardinality with a small, bounded error while using a fraction of the memory. This makes it practical for high-cardinality workloads such as daily unique visitors (UVs) and page views (PVs) in internet advertising.
Prerequisites
Before you begin, ensure that you have:
An AnalyticDB for PostgreSQL instance
The hll extension installed on the Extensions page of your instance. For details, see Install, update, and uninstall extensions.
Key concepts and data types
The hll extension introduces two data types:
| Type | Description |
|---|---|
hll | Stores a HyperLogLog sketch — a compact probabilistic representation of a distinct-value set |
hll_hashval | Stores a single hashed value before it is added to an hll sketch |
All values must be hashed before you can add them to an hll sketch. Use the hash functions in the Hash functions section to convert raw values to hll_hashval.
Create a table with an hll column to store HyperLogLog sketches:
create table agg (id int primary key, userids hll);Cast an integer to hll_hashval to prepare it for aggregation:
select 1::hll_hashval;Operators
hll operators
The hll type supports the following operators:
| Operator | Description | Example |
|---|---|---|
= | Equality | select hll_add_agg(1::hll_hashval) = hll_add_agg(2::hll_hashval); |
!=, <> | Inequality | — |
|| | Merges two hll sketches into one | select hll_add_agg(1::hll_hashval) || hll_add_agg(2::hll_hashval); |
# | Returns the estimated cardinality (distinct count) of an hll sketch | select #hll_add_agg(1::hll_hashval); |
hll_hashval operators
The hll_hashval type supports =, !=, and <>:
select 1::hll_hashval = 2::hll_hashval;
select 1::hll_hashval <> 2::hll_hashval;Functions
Hash functions
Hash functions convert raw values to hll_hashval. Use the function that matches your column type:
| Function | Input type | Example |
|---|---|---|
hll_hash_boolean | BOOLEAN | select hll_hash_boolean(true); |
hll_hash_smallint | SMALLINT | — |
hll_hash_integer | INTEGER | select hll_hash_integer(1); |
hll_hash_bigint | BIGINT | — |
Aggregation functions
Two aggregation functions build and combine hll sketches. The naming follows a consistent pattern: each function name embeds what it does — hll_add_agg adds values in an aggregate, and hll_union unions two sketches.
hll_add_agg— an aggregate function that adds rawhll_hashvalvalues into an empty hll sketch. Use this to build an hll sketch from a table or result set.select hll_add_agg(1::hll_hashval);hll_union— merges two existing hll sketches into one. Use this to combine hll values already stored in a table.select hll_union(hll_add_agg(1::hll_hashval), hll_add_agg(2::hll_hashval));
Utility functions
hll_set_defaults— sets the precision parameters for hll sketches. Higher precision reduces estimation error at the cost of more memory.select hll_set_defaults(15, 5, -1, 1);hll_print— prints debugging information about an hll sketch.select hll_print(hll_add_agg(1::hll_hashval));
Example: estimate daily unique visitors
This example shows the full workflow — hash and aggregate raw user IDs into daily hll sketches, store them in a table, then query the estimated UV count per day.
Step 1: Create a table to store daily hll sketches.
create table access_date (acc_date date unique, userids hll);Step 2: Aggregate user IDs into hll sketches and insert by date.
insert into access_date select current_date, hll_add_agg(hll_hash_integer(user_id)) from generate_series(1,10000) t(user_id);
insert into access_date select current_date-1, hll_add_agg(hll_hash_integer(user_id)) from generate_series(5000,20000) t(user_id);
insert into access_date select current_date-2, hll_add_agg(hll_hash_integer(user_id)) from generate_series(9000,40000) t(user_id);Step 3: Query the estimated UV count per day using the # operator.
postgres=# select #userids from access_date where acc_date=current_date;
?column?
------------------
9725.85273370708
(1 row)
postgres=# select #userids from access_date where acc_date=current_date-1;
?column?
------------------
14968.6596883279
(1 row)
postgres=# select #userids from access_date where acc_date=current_date-2;
?column?
------------------
29361.5209149911
(1 row)The results are estimates. The # operator returns a floating-point cardinality value, not an exact distinct count. To adjust precision, use hll_set_defaults before inserting data.