COUNT DISTINCT requires a full scan of raw data every time you run it, and it cannot answer multi-segment questions efficiently. The hll extension adds the HyperLogLog (HLL) data type to PolarDB for PostgreSQL, letting you store probabilistic sketches instead of raw rows. You can then estimate distinct-value counts — such as page views (PVs) and unique visitors (UVs) — with bounded error, constant memory usage, and instant multi-day or cross-segment union queries.
In practice, 1,280 bytes of hll data can accurately estimate billions of distinct elements.
Supported versions
The hll extension is supported on PolarDB for PostgreSQL clusters running:
PostgreSQL 14 (revision version 14.5.1.0 or later)
PostgreSQL 11 (revision version 1.1.28 or later)
To check the revision version in use:
PostgreSQL 14:
SELECT version();PostgreSQL 11:
SHOW polar_version;
How it works
HyperLogLog uses a hierarchy of internal representations to minimize memory at low cardinalities and switch to a compact probabilistic structure as cardinality grows:
| Representation | Description |
|---|---|
| EMPTY | Empty set. |
| EXPLICIT | Used at low cardinalities. |
| SPARSE | Used at intermediate cardinalities. |
| FULL | Used at high cardinalities. |
The hll_print function shows which representation is active along with its parameters, which is useful for tuning and debugging.
Data types
| Type | Description |
|---|---|
hll | Stores a HyperLogLog sketch. Supports aggregation, union, and cardinality estimation. |
hll_hashval | Intermediate hash value. Hash functions produce this type before you aggregate values into an hll. |
Install and remove the extension
Install:
CREATE EXTENSION hll;Remove:
DROP EXTENSION hll;Hash functions
Hash functions convert raw column values to hll_hashval before you aggregate them into an hll. Choose the function that matches your source column type.
| Function | Input type | Example |
|---|---|---|
hll_hash_boolean | boolean | hll_hash_boolean(true) |
hll_hash_smallint | smallint | hll_hash_smallint(4) |
hll_hash_integer | integer | hll_hash_integer(1) |
hll_hash_bigint | bigint | hll_hash_bigint(123456789) |
Examples:
SELECT hll_hash_boolean(true);Result:hll_hash_boolean --------------------- 8849112093580131862 (1 row)SELECT hll_hash_integer(1);Result:hll_hash_integer ---------------------- -8604791237420463362 (1 row)
Operators
hll operators
| Operator | Description | Example | Result |
|---|---|---|---|
= | Equality | hll_add_agg(1::hll_hashval) = hll_add_agg(2::hll_hashval) | f |
!= / <> | Inequality | — | — |
|| | Union of two hll values | hll_add_agg(1::hll_hashval) || hll_add_agg(2::hll_hashval) | \x128b7f00000000000000010000000000000002 |
# | Cardinality estimate (distinct count) | #hll_add_agg(1::hll_hashval) | 1 |
hll_hashval operators
| Operator | Description | Example | Result |
|---|---|---|---|
= | Equality | 1::hll_hashval = 2::hll_hashval | f |
!= / <> | Inequality | 1::hll_hashval <> 2::hll_hashval | t |
Basic operations
Convert INT data to hll_hashval:
SELECT 1::hll_hashval;Result:
hll_hashval
-------------
1
(1 row)Functions
hll_add_agg
Aggregates hll_hashval values into a single hll value.
SELECT hll_add_agg(1::hll_hashval);Result:
hll_add_agg
--------------------------
\x128b7f0000000000000001
(1 row)hll_union
Merges two hll values into one.
SELECT hll_union(hll_add_agg(1::hll_hashval), hll_add_agg(2::hll_hashval));Result:
hll_union
------------------------------------------
\x128b7f00000000000000010000000000000002
(1 row)hll_set_defaults
Sets the accuracy parameters.
SELECT hll_set_defaults(15, 5, -1, 1);Result:
hll_set_defaults
------------------
(11,5,-1,1)
(1 row)hll_print
Displays the internal representation and parameters of an hll value. Use this to inspect which algorithm is active and to verify accuracy settings.
SELECT hll_print(hll_add_agg(1::hll_hashval));Result:
hll_print
-----------------------------------------------------------------------------
EXPLICIT, 1 elements, nregs=32768, nbits=5, expthresh=-1(2560), sparseon=1:+
0: 1
(1 row)End-to-end example: daily UV tracking
This example shows the full pattern: hash raw user IDs, aggregate into hll values per day, query daily distinct-user counts, and combine multiple days with a union.
Step 1: Create the aggregate table and populate it
CREATE TABLE access_date (acc_date DATE UNIQUE, userids hll);
-- Day 0: users with IDs 1-10000
INSERT INTO access_date
SELECT current_date, hll_add_agg(hll_hash_integer(user_id))
FROM generate_series(1, 10000) t(user_id);
-- Day -1: users with IDs 5000-20000
INSERT INTO access_date
SELECT current_date - 1, hll_add_agg(hll_hash_integer(user_id))
FROM generate_series(5000, 20000) t(user_id);
-- Day -2: users with IDs 9000-40000
INSERT INTO access_date
SELECT current_date - 2, hll_add_agg(hll_hash_integer(user_id))
FROM generate_series(9000, 40000) t(user_id);Step 2: Query daily distinct-user counts
Use the # operator to estimate UVs for each day:
SELECT #userids FROM access_date WHERE acc_date = current_date;Result:
?column?
------------------
9725.852733707077
(1 row)SELECT #userids FROM access_date WHERE acc_date = current_date - 1;Result:
?column?
------------------
14968.65968832792
(1 row)SELECT #userids FROM access_date WHERE acc_date = current_date - 2;Result:
?column?
------------------
29361.520914991113
(1 row)