Exact COUNT(DISTINCT ...) queries become slow as datasets grow, because the database must track every unique value. The hll extension adds the HyperLogLog (HLL) data type to PolarDB for PostgreSQL (Compatible with Oracle), enabling approximate cardinality estimation that scales to billions of rows without the cost of exact counting. Use it to estimate page views (PVs) and unique visitors (UVs) in analytics workloads.
Prerequisites
Before you begin, ensure that you have:
A PolarDB for PostgreSQL (Compatible with Oracle) cluster running version 2.0 (revision version 2.0.14.1.0 or later)
To check your minor version, run:
SHOW polar_version;How hll works
HyperLogLog is a probabilistic algorithm that estimates the number of distinct elements in a dataset. Instead of storing every value, it stores a compact sketch — 1,280 bytes of hll data can accurately estimate billions of distinct elements.
All hll operations follow the same pipeline:
Hash source values into
hll_hashvalusing a type-specific hash function (for example,hll_hash_integer).Aggregate hashed values into an hll sketch using
hll_add_agg.Merge sketches across time periods or partitions using
hll_unionor the||operator.Extract the cardinality estimate from a sketch using the
#operator.
Pre-aggregate sketches and store them in a table. When you query across multiple time periods, merge the stored sketches — this is significantly faster than scanning raw data.
Install the extension
CREATE EXTENSION hll;Data types
| Data type | Description |
|---|---|
hll | An hll sketch that stores an approximate set of hashed values |
hll_hashval | A hashed representation of a source value |
Functions
Hash functions
Hash functions convert source values into hll_hashval. Use the function that matches your column type.
| Function | Input type |
|---|---|
hll_hash_boolean | Boolean |
hll_hash_smallint | SMALLINT |
hll_hash_integer | INTEGER |
hll_hash_bigint | BIGINT |
Examples:
SELECT hll_hash_boolean(true);hll_hash_boolean
---------------------
8849112093580131862
(1 row)SELECT hll_hash_integer(1);hll_hash_integer
----------------------
-8604791237420463362
(1 row)hll_add_agg
Aggregates hll_hashval values into a single hll sketch.
SELECT hll_add_agg(1::hll_hashval);hll_add_agg
--------------------------
\x128b7f0000000000000001
(1 row)hll_union
Merges two hll sketches into one. Use this to combine sketches stored across multiple rows.
SELECT hll_union(hll_add_agg(1::hll_hashval), hll_add_agg(2::hll_hashval));hll_union
------------------------------------------
\x128b7f00000000000000010000000000000002
(1 row)hll_set_defaults
Adjusts the accuracy parameters for hll sketches. Increasing the precision parameter reduces the error rate at the cost of larger sketch size.
SELECT hll_set_defaults(15,5,-1,1);hll_set_defaults
------------------
(11,5,-1,1)
(1 row)hll_print
Displays debug information about an hll sketch, including its internal representation and element count.
SELECT hll_print(hll_add_agg(1::hll_hashval));hll_print
-----------------------------------------------------------------------------
EXPLICIT, 1 elements, nregs=32768, nbits=5, expthresh=-1(2560), sparseon=1:+
0: 1
(1 row)Operators
Operators for hll
| Operator | Description |
|---|---|
= | Equality |
!= / <> | Inequality |
|| | Merge two hll sketches |
# | Extract cardinality estimate from an hll sketch |
Examples:
-- Check if two sketches are equal
SELECT hll_add_agg(1::hll_hashval) = hll_add_agg(2::hll_hashval);?column?
----------
f
(1 row)-- Merge two sketches
SELECT hll_add_agg(1::hll_hashval) || hll_add_agg(2::hll_hashval);?column?
------------------------------------------
\x128b7f00000000000000010000000000000002
(1 row)-- Estimate cardinality
SELECT #hll_add_agg(1::hll_hashval);?column?
----------
1
(1 row)Operators for hll_hashval
| Operator | Description |
|---|---|
= | Equality |
!= / <> | Inequality |
Examples:
SELECT 1::hll_hashval = 2::hll_hashval;?column?
----------
f
(1 row)SELECT 1::hll_hashval <> 2::hll_hashval;?column?
----------
t
(1 row)Basic operations
Create a table with an hll column:
CREATE TABLE agg (id INT PRIMARY KEY, userids hll);Cast an integer to hll_hashval:
SELECT 1::hll_hashval;hll_hashval
-------------
1
(1 row)Estimate daily unique visitors
This example pre-aggregates UV data by date and queries estimates across multiple days. Pre-aggregating hll sketches is significantly faster than running COUNT(DISTINCT ...) over raw event data at query time.
Step 1: Create the table and insert data.
CREATE TABLE access_date (acc_date DATE UNIQUE, userids hll);
-- Day 1: user IDs 1-10,000
INSERT INTO access_date
SELECT current_date, hll_add_agg(hll_hash_integer(user_id))
FROM generate_series(1, 10000) t(user_id);
-- Day 2: user IDs 5,000-20,000
INSERT INTO access_date
SELECT current_date - 1, hll_add_agg(hll_hash_integer(user_id))
FROM generate_series(5000, 20000) t(user_id);
-- Day 3: user IDs 9,000-40,000
INSERT INTO access_date
SELECT current_date - 2, hll_add_agg(hll_hash_integer(user_id))
FROM generate_series(9000, 40000) t(user_id);Step 2: Query estimated UV counts per day.
SELECT #userids AS estimated_uv FROM access_date WHERE acc_date = current_date;estimated_uv
------------------
9725.852733707077
(1 row)SELECT #userids AS estimated_uv FROM access_date WHERE acc_date = current_date - 1;estimated_uv
------------------
14968.65968832792
(1 row)SELECT #userids AS estimated_uv FROM access_date WHERE acc_date = current_date - 2;estimated_uv
--------------------
29361.520914991113
(1 row)The estimated UV for each day reflects the approximate number of distinct user IDs in that range.
Uninstall the extension
DROP EXTENSION hll;