All Products
Search
Document Center

PolarDB:hll

Last Updated:Mar 28, 2026

COUNT DISTINCT requires a full scan of raw data every time you run it, and it cannot answer multi-segment questions efficiently. The hll extension adds the HyperLogLog (HLL) data type to PolarDB for PostgreSQL, letting you store probabilistic sketches instead of raw rows. You can then estimate distinct-value counts — such as page views (PVs) and unique visitors (UVs) — with bounded error, constant memory usage, and instant multi-day or cross-segment union queries.

In practice, 1,280 bytes of hll data can accurately estimate billions of distinct elements.

Supported versions

The hll extension is supported on PolarDB for PostgreSQL clusters running:

  • PostgreSQL 14 (revision version 14.5.1.0 or later)

  • PostgreSQL 11 (revision version 1.1.28 or later)

To check the revision version in use:

  • PostgreSQL 14:

    SELECT version();
  • PostgreSQL 11:

    SHOW polar_version;

How it works

HyperLogLog uses a hierarchy of internal representations to minimize memory at low cardinalities and switch to a compact probabilistic structure as cardinality grows:

RepresentationDescription
EMPTYEmpty set.
EXPLICITUsed at low cardinalities.
SPARSEUsed at intermediate cardinalities.
FULLUsed at high cardinalities.

The hll_print function shows which representation is active along with its parameters, which is useful for tuning and debugging.

Data types

TypeDescription
hllStores a HyperLogLog sketch. Supports aggregation, union, and cardinality estimation.
hll_hashvalIntermediate hash value. Hash functions produce this type before you aggregate values into an hll.

Install and remove the extension

Install:

CREATE EXTENSION hll;

Remove:

DROP EXTENSION hll;

Hash functions

Hash functions convert raw column values to hll_hashval before you aggregate them into an hll. Choose the function that matches your source column type.

FunctionInput typeExample
hll_hash_booleanbooleanhll_hash_boolean(true)
hll_hash_smallintsmallinthll_hash_smallint(4)
hll_hash_integerintegerhll_hash_integer(1)
hll_hash_bigintbiginthll_hash_bigint(123456789)

Examples:

  • SELECT hll_hash_boolean(true); Result:

    hll_hash_boolean
    ---------------------
     8849112093580131862
    (1 row)
  • SELECT hll_hash_integer(1); Result:

    hll_hash_integer
    ----------------------
     -8604791237420463362
    (1 row)

Operators

hll operators

OperatorDescriptionExampleResult
=Equalityhll_add_agg(1::hll_hashval) = hll_add_agg(2::hll_hashval)f
!= / <>Inequality
||Union of two hll valueshll_add_agg(1::hll_hashval) || hll_add_agg(2::hll_hashval)\x128b7f00000000000000010000000000000002
#Cardinality estimate (distinct count)#hll_add_agg(1::hll_hashval)1

hll_hashval operators

OperatorDescriptionExampleResult
=Equality1::hll_hashval = 2::hll_hashvalf
!= / <>Inequality1::hll_hashval <> 2::hll_hashvalt

Basic operations

Convert INT data to hll_hashval:

SELECT 1::hll_hashval;

Result:

hll_hashval
-------------
 1
(1 row)

Functions

hll_add_agg

Aggregates hll_hashval values into a single hll value.

SELECT hll_add_agg(1::hll_hashval);

Result:

hll_add_agg
--------------------------
 \x128b7f0000000000000001
(1 row)

hll_union

Merges two hll values into one.

SELECT hll_union(hll_add_agg(1::hll_hashval), hll_add_agg(2::hll_hashval));

Result:

hll_union
------------------------------------------
 \x128b7f00000000000000010000000000000002
(1 row)

hll_set_defaults

Sets the accuracy parameters.

SELECT hll_set_defaults(15, 5, -1, 1);

Result:

hll_set_defaults
------------------
 (11,5,-1,1)
(1 row)

hll_print

Displays the internal representation and parameters of an hll value. Use this to inspect which algorithm is active and to verify accuracy settings.

SELECT hll_print(hll_add_agg(1::hll_hashval));

Result:

hll_print
-----------------------------------------------------------------------------
 EXPLICIT, 1 elements, nregs=32768, nbits=5, expthresh=-1(2560), sparseon=1:+
 0:                    1
(1 row)

End-to-end example: daily UV tracking

This example shows the full pattern: hash raw user IDs, aggregate into hll values per day, query daily distinct-user counts, and combine multiple days with a union.

Step 1: Create the aggregate table and populate it

CREATE TABLE access_date (acc_date DATE UNIQUE, userids hll);

-- Day 0: users with IDs 1-10000
INSERT INTO access_date
    SELECT current_date, hll_add_agg(hll_hash_integer(user_id))
    FROM generate_series(1, 10000) t(user_id);

-- Day -1: users with IDs 5000-20000
INSERT INTO access_date
    SELECT current_date - 1, hll_add_agg(hll_hash_integer(user_id))
    FROM generate_series(5000, 20000) t(user_id);

-- Day -2: users with IDs 9000-40000
INSERT INTO access_date
    SELECT current_date - 2, hll_add_agg(hll_hash_integer(user_id))
    FROM generate_series(9000, 40000) t(user_id);

Step 2: Query daily distinct-user counts

Use the # operator to estimate UVs for each day:

SELECT #userids FROM access_date WHERE acc_date = current_date;

Result:

?column?
------------------
 9725.852733707077
(1 row)
SELECT #userids FROM access_date WHERE acc_date = current_date - 1;

Result:

?column?
------------------
 14968.65968832792
(1 row)
SELECT #userids FROM access_date WHERE acc_date = current_date - 2;

Result:

?column?
------------------
 29361.520914991113
(1 row)