All Products
Search
Document Center

PolarDB:hll

Last Updated:Mar 28, 2026

Exact COUNT(DISTINCT ...) queries become slow as datasets grow, because the database must track every unique value. The hll extension adds the HyperLogLog (HLL) data type to PolarDB for PostgreSQL (Compatible with Oracle), enabling approximate cardinality estimation that scales to billions of rows without the cost of exact counting. Use it to estimate page views (PVs) and unique visitors (UVs) in analytics workloads.

Prerequisites

Before you begin, ensure that you have:

  • A PolarDB for PostgreSQL (Compatible with Oracle) cluster running version 2.0 (revision version 2.0.14.1.0 or later)

To check your minor version, run:

SHOW polar_version;

How hll works

HyperLogLog is a probabilistic algorithm that estimates the number of distinct elements in a dataset. Instead of storing every value, it stores a compact sketch — 1,280 bytes of hll data can accurately estimate billions of distinct elements.

All hll operations follow the same pipeline:

  1. Hash source values into hll_hashval using a type-specific hash function (for example, hll_hash_integer).

  2. Aggregate hashed values into an hll sketch using hll_add_agg.

  3. Merge sketches across time periods or partitions using hll_union or the || operator.

  4. Extract the cardinality estimate from a sketch using the # operator.

Pre-aggregate sketches and store them in a table. When you query across multiple time periods, merge the stored sketches — this is significantly faster than scanning raw data.

Install the extension

CREATE EXTENSION hll;

Data types

Data typeDescription
hllAn hll sketch that stores an approximate set of hashed values
hll_hashvalA hashed representation of a source value

Functions

Hash functions

Hash functions convert source values into hll_hashval. Use the function that matches your column type.

FunctionInput type
hll_hash_booleanBoolean
hll_hash_smallintSMALLINT
hll_hash_integerINTEGER
hll_hash_bigintBIGINT

Examples:

SELECT hll_hash_boolean(true);
hll_hash_boolean
---------------------
 8849112093580131862
(1 row)
SELECT hll_hash_integer(1);
hll_hash_integer
----------------------
 -8604791237420463362
(1 row)

hll_add_agg

Aggregates hll_hashval values into a single hll sketch.

SELECT hll_add_agg(1::hll_hashval);
hll_add_agg
--------------------------
 \x128b7f0000000000000001
(1 row)

hll_union

Merges two hll sketches into one. Use this to combine sketches stored across multiple rows.

SELECT hll_union(hll_add_agg(1::hll_hashval), hll_add_agg(2::hll_hashval));
hll_union
------------------------------------------
 \x128b7f00000000000000010000000000000002
(1 row)

hll_set_defaults

Adjusts the accuracy parameters for hll sketches. Increasing the precision parameter reduces the error rate at the cost of larger sketch size.

SELECT hll_set_defaults(15,5,-1,1);
hll_set_defaults
------------------
 (11,5,-1,1)
(1 row)

hll_print

Displays debug information about an hll sketch, including its internal representation and element count.

SELECT hll_print(hll_add_agg(1::hll_hashval));
hll_print
-----------------------------------------------------------------------------
 EXPLICIT, 1 elements, nregs=32768, nbits=5, expthresh=-1(2560), sparseon=1:+
 0:                    1
(1 row)

Operators

Operators for hll

OperatorDescription
=Equality
!= / <>Inequality
||Merge two hll sketches
#Extract cardinality estimate from an hll sketch

Examples:

-- Check if two sketches are equal
SELECT hll_add_agg(1::hll_hashval) = hll_add_agg(2::hll_hashval);
?column?
----------
 f
(1 row)
-- Merge two sketches
SELECT hll_add_agg(1::hll_hashval) || hll_add_agg(2::hll_hashval);
?column?
------------------------------------------
 \x128b7f00000000000000010000000000000002
(1 row)
-- Estimate cardinality
SELECT #hll_add_agg(1::hll_hashval);
?column?
----------
        1
(1 row)

Operators for hll_hashval

OperatorDescription
=Equality
!= / <>Inequality

Examples:

SELECT 1::hll_hashval = 2::hll_hashval;
?column?
----------
 f
(1 row)
SELECT 1::hll_hashval <> 2::hll_hashval;
?column?
----------
 t
(1 row)

Basic operations

Create a table with an hll column:

CREATE TABLE agg (id INT PRIMARY KEY, userids hll);

Cast an integer to hll_hashval:

SELECT 1::hll_hashval;
hll_hashval
-------------
 1
(1 row)

Estimate daily unique visitors

This example pre-aggregates UV data by date and queries estimates across multiple days. Pre-aggregating hll sketches is significantly faster than running COUNT(DISTINCT ...) over raw event data at query time.

Step 1: Create the table and insert data.

CREATE TABLE access_date (acc_date DATE UNIQUE, userids hll);

-- Day 1: user IDs 1-10,000
INSERT INTO access_date
    SELECT current_date, hll_add_agg(hll_hash_integer(user_id))
    FROM generate_series(1, 10000) t(user_id);

-- Day 2: user IDs 5,000-20,000
INSERT INTO access_date
    SELECT current_date - 1, hll_add_agg(hll_hash_integer(user_id))
    FROM generate_series(5000, 20000) t(user_id);

-- Day 3: user IDs 9,000-40,000
INSERT INTO access_date
    SELECT current_date - 2, hll_add_agg(hll_hash_integer(user_id))
    FROM generate_series(9000, 40000) t(user_id);

Step 2: Query estimated UV counts per day.

SELECT #userids AS estimated_uv FROM access_date WHERE acc_date = current_date;
estimated_uv
------------------
 9725.852733707077
(1 row)
SELECT #userids AS estimated_uv FROM access_date WHERE acc_date = current_date - 1;
estimated_uv
------------------
 14968.65968832792
(1 row)
SELECT #userids AS estimated_uv FROM access_date WHERE acc_date = current_date - 2;
estimated_uv
--------------------
 29361.520914991113
(1 row)

The estimated UV for each day reflects the approximate number of distinct user IDs in that range.

Uninstall the extension

DROP EXTENSION hll;