All Products
Search
Document Center

Hologres:Calculate UV with dynamic table and roaring bitmaps

Last Updated:Mar 26, 2026

RoaringBitmap pre-aggregates unique visitor (UV) counts into compressed bitmaps during each incremental refresh, reducing query-time work to a cardinality calculation over bitmaps. Combined with Dynamic Table's incremental refresh—which processes only newly arrived rows, requires no manual scheduling, and handles any time-range query—this approach delivers sub-second UV calculations on hundreds of millions of records under high-QPS, low-latency workloads.

Choose your implementation based on the UID field type:

UID typeMethod
intMethod 1: RoaringBitmap UV for INT fields
textMethod 2: RoaringBitmap UV for TEXT fields
RoaringBitmap operates on integer inputs. For int UID fields, no conversion is needed. For text UID fields (Method 2), a mapping table converts text UIDs to int4 values before aggregation.

When to use this approach

Use RoaringBitmap with Dynamic Table when:

  • Data volume is in the hundreds of millions of records or more

  • Queries require high QPS with low latency

  • UV calculations need to span arbitrary time ranges

Prerequisites

Before you begin, ensure that you have:

  • A Hologres instance with the required permissions to create tables and Dynamic Tables

  • The RoaringBitmap extension installed

Install the extension before creating any tables or Dynamic Tables:

CREATE EXTENSION IF NOT EXISTS roaringbitmap;

Method 1: RoaringBitmap UV for INT fields

Use this method when the UID field is of type int. It supports intersection, union, and difference operations on user groups based on any tag or property.

How it works

  1. Store granular event data in a detail table partitioned by day.

  2. A Dynamic Table reads the detail table incrementally, groups rows by business dimensions, and aggregates UIDs into a RoaringBitmap per group using RB_BUILD_AGG. Each row in the Dynamic Table holds a bitmap representing all distinct UIDs for that dimension combination and date.

  3. At query time, RB_OR_AGG merges bitmaps across the requested date range—deduplicating UIDs across days—and RB_CARDINALITY returns the UV count.

Create the detail table

DROP TABLE IF EXISTS ods_app_detail;
BEGIN;
CREATE TABLE IF NOT EXISTS ods_app_detail (
  uid     int,
  country text,
  prov    text,
  city    text,
  ymd     text NOT NULL
)
LOGICAL PARTITION BY LIST (ymd);
CALL set_table_property('ods_app_detail', 'orientation',        'column');
CALL set_table_property('ods_app_detail', 'bitmap_columns',     'country,prov,city,ymd');
CALL set_table_property('ods_app_detail', 'distribution_key',   'uid');
CALL set_table_property('ods_app_detail', 'clustering_key',     'ymd');
CALL set_table_property('ods_app_detail', 'event_time_column',  'ymd');
COMMIT;

The table is partitioned by ymd (format YYYYMMDD) for efficient partition pruning at query time. The bitmap_columns property accelerates filtering on country, prov, city, and ymd.

Create the Dynamic Table

CREATE DYNAMIC TABLE dt_dws_app_rb
LOGICAL PARTITION BY LIST(ymd)
WITH (
  freshness                          = '5 minutes',
  auto_refresh_mode                  = 'incremental',
  auto_refresh_partition_active_time = '1 days',
  partition_key_time_format          = 'YYYYMMDD'
)
AS
SELECT
  RB_BUILD_AGG(uid) AS rb_uid,  -- aggregates all UIDs in the group into a RoaringBitmap
  country,
  prov,
  city,
  ymd,
  COUNT(1) AS pv
FROM ods_app_detail
GROUP BY country, prov, city, ymd;

The Dynamic Table refreshes only the active partition (within the last day) every 5 minutes, processing only newly arrived rows. After each refresh, each row in dt_dws_app_rb contains a rb_uid bitmap for a specific (country, prov, city, ymd) combination—representing all distinct UIDs that appeared on that date with those dimension values. For example, the row for (US, California, San Francisco, 20251223) holds a single bitmap with every de-duplicated UID that visited on that date.

Refresh historical partitions

New partitions refresh automatically. For historical partitions outside the active window, run a full refresh manually. For example, to backfill December 1, 2025:

REFRESH DYNAMIC TABLE public.dt_dws_app_rb PARTITION(20251201) WITH (refresh_mode = 'full');

Query UV over any time range

Both queries use RB_OR_AGG to union bitmaps across rows—deduplicating UIDs across dimension groups and dates—then RB_CARDINALITY to count distinct UIDs.

-- UV and PV for a single day
SELECT
  RB_CARDINALITY(RB_OR_AGG(rb_uid)) AS uv,  -- union bitmaps across rows, then count distinct UIDs
  country,
  prov,
  city,
  SUM(pv) AS pv
FROM dt_dws_app_rb
WHERE ymd = '20251223'
GROUP BY country, prov, city;

-- UV and PV for a month
SELECT
  RB_CARDINALITY(RB_OR_AGG(rb_uid)) AS uv,
  country,
  prov,
  city,
  SUM(pv) AS pv
FROM dt_dws_app_rb
WHERE ymd >= '20251201' AND ymd <= '20251230'
GROUP BY country, prov, city;

Method 2: RoaringBitmap UV for TEXT fields

Use this method when the UID field is of type text. Because RoaringBitmap requires integer inputs, a user mapping table stores the text-to-int4 mapping. The hg_id_encoding_int4 function handles this conversion automatically inside the Dynamic Table—looking up or inserting each text UID in the mapping table and returning its integer equivalent—so you do not need to manage the dictionary manually or run separate join jobs.

Create the user mapping table

BEGIN;
CREATE TABLE uid_mapping (
  uid       text NOT NULL,
  uid_int32 serial,
  PRIMARY KEY (uid)
);
CALL set_table_property('uid_mapping', 'clustering_key',   'uid');
CALL set_table_property('uid_mapping', 'distribution_key', 'uid');
CALL set_table_property('uid_mapping', 'orientation',      'row');
COMMIT;

The mapping table uses row store for fast point lookups by uid. The uid_int32 serial column auto-assigns a unique integer to each new text UID.

Create the Dynamic Table

CREATE DYNAMIC TABLE dt_dws_app_rb
WITH (
  freshness                          = '5 minutes',
  auto_refresh_mode                  = 'incremental',
  auto_refresh_partition_active_time = '1 days',
  partition_key_time_format          = 'YYYYMMDD'
)
LOGICAL PARTITION BY LIST (ymd)
AS
SELECT
  country,
  prov,
  city,
  RB_BUILD_AGG(uid_int4) AS uid_rb,  -- aggregates mapped integer UIDs into a RoaringBitmap
  ymd
FROM (
  SELECT
    country,
    prov,
    city,
    hg_id_encoding_int4(uid, 'uid_mapping') AS uid_int4,  -- converts text UID to int4, writes new entries to uid_mapping
    ymd
  FROM ods_app_detail
) a
GROUP BY country, prov, city, ymd;

The subquery calls hg_id_encoding_int4(uid, 'uid_mapping') to look up or insert each text UID in uid_mapping, returning its integer equivalent. The outer query aggregates those integers into bitmaps per dimension group.

Query UV over any time range

SELECT
  country,
  prov,
  city,
  RB_CARDINALITY(RB_OR_AGG(uid_rb)) AS uv  -- union bitmaps across rows, then count distinct UIDs
FROM dt_dws_app_rb
WHERE ymd = '20251223'
GROUP BY country, prov, city;

Extend the WHERE clause to any date range, the same as Method 1.

What's next