All Products
Search
Document Center

AnalyticDB:hll

Last Updated:Mar 28, 2026

AnalyticDB for PostgreSQL builds on the features of Greenplum Database and supports the hll extension. Counting distinct values exactly requires memory proportional to the cardinality of the dataset — impractical at scale. The hll extension adds the HyperLogLog (HLL) probabilistic algorithm to AnalyticDB for PostgreSQL, letting you estimate cardinality with a small, bounded error while using a fraction of the memory. This makes it practical for high-cardinality workloads such as daily unique visitors (UVs) and page views (PVs) in internet advertising.

Prerequisites

Before you begin, ensure that you have:

Key concepts and data types

The hll extension introduces two data types:

TypeDescription
hllStores a HyperLogLog sketch — a compact probabilistic representation of a distinct-value set
hll_hashvalStores a single hashed value before it is added to an hll sketch

All values must be hashed before you can add them to an hll sketch. Use the hash functions in the Hash functions section to convert raw values to hll_hashval.

Create a table with an hll column to store HyperLogLog sketches:

create table agg (id int primary key, userids hll);

Cast an integer to hll_hashval to prepare it for aggregation:

select 1::hll_hashval;

Operators

hll operators

The hll type supports the following operators:

OperatorDescriptionExample
=Equalityselect hll_add_agg(1::hll_hashval) = hll_add_agg(2::hll_hashval);
!=, <>Inequality
||Merges two hll sketches into oneselect hll_add_agg(1::hll_hashval) || hll_add_agg(2::hll_hashval);
#Returns the estimated cardinality (distinct count) of an hll sketchselect #hll_add_agg(1::hll_hashval);

hll_hashval operators

The hll_hashval type supports =, !=, and <>:

select 1::hll_hashval = 2::hll_hashval;
select 1::hll_hashval <> 2::hll_hashval;

Functions

Hash functions

Hash functions convert raw values to hll_hashval. Use the function that matches your column type:

FunctionInput typeExample
hll_hash_booleanBOOLEANselect hll_hash_boolean(true);
hll_hash_smallintSMALLINT
hll_hash_integerINTEGERselect hll_hash_integer(1);
hll_hash_bigintBIGINT

Aggregation functions

Two aggregation functions build and combine hll sketches. The naming follows a consistent pattern: each function name embeds what it does — hll_add_agg adds values in an aggregate, and hll_union unions two sketches.

  • hll_add_agg — an aggregate function that adds raw hll_hashval values into an empty hll sketch. Use this to build an hll sketch from a table or result set.

    select hll_add_agg(1::hll_hashval);
  • hll_union — merges two existing hll sketches into one. Use this to combine hll values already stored in a table.

    select hll_union(hll_add_agg(1::hll_hashval), hll_add_agg(2::hll_hashval));

Utility functions

  • hll_set_defaults — sets the precision parameters for hll sketches. Higher precision reduces estimation error at the cost of more memory.

    select hll_set_defaults(15, 5, -1, 1);
  • hll_print — prints debugging information about an hll sketch.

    select hll_print(hll_add_agg(1::hll_hashval));

Example: estimate daily unique visitors

This example shows the full workflow — hash and aggregate raw user IDs into daily hll sketches, store them in a table, then query the estimated UV count per day.

Step 1: Create a table to store daily hll sketches.

create table access_date (acc_date date unique, userids hll);

Step 2: Aggregate user IDs into hll sketches and insert by date.

insert into access_date select current_date, hll_add_agg(hll_hash_integer(user_id)) from generate_series(1,10000) t(user_id);
insert into access_date select current_date-1, hll_add_agg(hll_hash_integer(user_id)) from generate_series(5000,20000) t(user_id);
insert into access_date select current_date-2, hll_add_agg(hll_hash_integer(user_id)) from generate_series(9000,40000) t(user_id);

Step 3: Query the estimated UV count per day using the # operator.

postgres=# select #userids from access_date where acc_date=current_date;
     ?column?
------------------
 9725.85273370708
(1 row)

postgres=# select #userids from access_date where acc_date=current_date-1;
     ?column?
------------------
 14968.6596883279
(1 row)

postgres=# select #userids from access_date where acc_date=current_date-2;
     ?column?
------------------
 29361.5209149911
(1 row)

The results are estimates. The # operator returns a floating-point cardinality value, not an exact distinct count. To adjust precision, use hll_set_defaults before inserting data.

See also