The hll extension adds the HyperLogLog data type to ApsaraDB RDS for PostgreSQL, letting you estimate the count of distinct values (cardinality) across large datasets with sub-percent error and minimal storage. A single hll value of 1,280 bytes can represent billions of distinct elements—making it practical for high-scale analytics such as counting page views (PV) and unique visitors (UV).
Prerequisites
Before you begin, make sure that:
-
Your RDS instance runs PostgreSQL 11 or later.
The hll extension is not supported on PostgreSQL 17.
-
If your major engine version meets the requirement but the extension is still unavailable, update the minor engine version. See Update the minor engine version.
When to use hll instead of COUNT DISTINCT
COUNT DISTINCT answers exactly one fixed question per query: "How many unique values appeared in this dataset?" If you need to answer that question across different time windows—today, this week, the past 30 days, a rolling 7-day window—you must re-scan raw data for each query.
hll works differently. It stores a compact sketch of a dataset. Once you have stored daily sketches, you can union them together at query time to answer any time-range question without touching raw data. This makes hll well-suited for analytics workloads where:
-
Raw event data is too large to aggregate repeatedly
-
You need UV or PV counts across arbitrary date ranges
-
Approximate results are acceptable (accuracy is configurable via
hll_set_defaults)
For the full algorithm, see HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm. For the upstream extension source, see citusdata/postgresql-hll.
How hashing works
Before adding a value to an hll structure, you must hash it using one of the hll_hash_* functions. hll stores hash values, not raw values—this is how it achieves both compact size and type-agnostic storage.
Attempting to add a raw integer without hashing produces an error:
SELECT 1234 || hll_empty();
-- ERROR: operator does not exist: integer || hll
-- HINT: No operator matches the given name and argument type(s).
-- You might need to add explicit type casts.
Always wrap input values in the appropriate hll_hash_* call before passing them to hll operations.
Enable the hll extension
Connect to your database and run:
CREATE EXTENSION hll;
Quick start
The following example builds a minimal hll set step by step, then retrieves its cardinality.
-- Create a table with an hll column
CREATE TABLE helloworld (id integer, visitors hll);
-- Insert an empty hll
INSERT INTO helloworld (id, visitors) VALUES (1, hll_empty());
-- Add a hashed integer (for example, a numeric user ID)
UPDATE helloworld
SET visitors = hll_add(visitors, hll_hash_integer(12345))
WHERE id = 1;
-- Add a hashed text value (for example, a session token)
UPDATE helloworld
SET visitors = hll_add(visitors, hll_hash_text('session-abc'))
WHERE id = 1;
-- Estimate the number of distinct values
SELECT hll_cardinality(visitors) FROM helloworld WHERE id = 1;
-- hll_cardinality
-- -----------------
-- 2
-- (1 row)
Basic operations
Create a table with hll fields:
create table agg (id int primary key, userids hll);
Convert INT data to hll_hashval data:
select 1::hll_hashval;
Estimate UV across date ranges
This example shows a typical PV/UV analytics pattern. The key idea: store one hll sketch per day, then union sketches at query time to answer any date-range question.
Step 1: Create the table and populate daily sketches
CREATE TABLE access_date (acc_date date UNIQUE, userids hll);
-- Day 0: users 1–10,000
INSERT INTO access_date
SELECT current_date, hll_add_agg(hll_hash_integer(user_id))
FROM generate_series(1, 10000) t(user_id);
-- Day -1: users 5,000–20,000
INSERT INTO access_date
SELECT current_date - 1, hll_add_agg(hll_hash_integer(user_id))
FROM generate_series(5000, 20000) t(user_id);
-- Day -2: users 9,000–40,000
INSERT INTO access_date
SELECT current_date - 2, hll_add_agg(hll_hash_integer(user_id))
FROM generate_series(9000, 40000) t(user_id);
Step 2: Query UV for a single day
The # operator returns the cardinality estimate directly.
SELECT #userids FROM access_date WHERE acc_date = current_date;
-- ?column?
-- ------------------
-- 9725.85273370708
-- (1 row)
SELECT #userids FROM access_date WHERE acc_date = current_date - 1;
-- ?column?
-- ------------------
-- 14968.6596883279
-- (1 row)
SELECT #userids FROM access_date WHERE acc_date = current_date - 2;
-- ?column?
-- ------------------
-- 29361.5209149911
-- (1 row)
Step 3: Query UV across a date range
Union the daily sketches first, then estimate cardinality. This correctly accounts for users who appeared on multiple days.
-- Unique visitors over the past 3 days (deduplicated across days)
SELECT hll_cardinality(hll_union_agg(userids))
FROM access_date
WHERE acc_date >= current_date - 2;
Step 4: Compute a rolling 7-day UV window
Use a window function to compute the rolling unique visitor count for each day without re-scanning raw data.
SELECT
acc_date,
#hll_union_agg(userids) OVER seven_days AS rolling_uv
FROM access_date
WINDOW seven_days AS (ORDER BY acc_date ASC ROWS 6 PRECEDING);
Reference
Hash functions
Hash every input value before adding it to an hll structure. Choose the function that matches your column's data type.
| Function | Input type | Example |
|---|---|---|
hll_hash_boolean |
boolean | hll_hash_boolean(true) |
hll_hash_smallint |
smallint | hll_hash_smallint(4) |
hll_hash_integer |
integer | hll_hash_integer(21474836) |
hll_hash_bigint |
bigint | hll_hash_bigint(9007199254740992) |
hll_hash_text |
text | hll_hash_text('user@example.com') |
All functions return hll_hashval, the only type accepted by hll operators and aggregate functions.
Operators
hll operators
| Operator | Description | Example |
|---|---|---|
= |
Equality | hll_add_agg(1::hll_hashval) = hll_add_agg(2::hll_hashval) |
!= / <> |
Inequality | — |
|| |
Union (merge two hll sets) | hll_add_agg(1::hll_hashval) || hll_add_agg(2::hll_hashval) |
# |
Cardinality estimate | #hll_add_agg(1::hll_hashval) |
hll_hashval operators
| Operator | Description | Example |
|---|---|---|
= |
Equality | 1::hll_hashval = 2::hll_hashval |
!= / <> |
Inequality | 1::hll_hashval <> 2::hll_hashval |
Aggregate and utility functions
| Function | Description | Example |
|---|---|---|
hll_empty() |
Returns an empty hll | hll_empty() |
hll_add(hll, hll_hashval) |
Adds a hash value to an hll | hll_add(set, hll_hash_integer(42)) |
hll_add_agg(hll_hashval) |
Aggregate: builds an hll from a set of hash values | SELECT hll_add_agg(hll_hash_integer(user_id)) FROM events |
hll_cardinality(hll) |
Returns the estimated distinct count | SELECT hll_cardinality(visitors) |
hll_union(hll, hll) |
Merges two hll values | hll_union(hll_add_agg(1::hll_hashval), hll_add_agg(2::hll_hashval)) |
hll_union_agg(hll) |
Aggregate: merges multiple hll values | SELECT hll_union_agg(daily_sketch) FROM daily_uniques |
hll_set_defaults(log2m, regwidth, expthresh, sparseon) |
Configures accuracy and storage tradeoffs | SELECT hll_set_defaults(15, 5, -1, 1) |
hll_print(hll) |
Returns debug information for an hll value | SELECT hll_print(hll_add_agg(1::hll_hashval)) |