All Products
Search
Document Center

Hologres:BSI functions

Last Updated:Mar 26, 2026

Bit-sliced index (BSI) functions are extension functions in Hologres that compress key-value data into a bitmap-based structure for high-performance aggregation and filtering on large user segments.

How it works

A BSI compresses key-value pairs in cid::int, value::bigint format. Each decimal value is converted to binary, and the binary data is traversed from the low bit to the high bit. For each bit position, the roaring bitmap of all cid values whose bit is 1 is recorded as a slice, forming the BSI structure.

The following sample data illustrates the principle:

cidValue (decimal)Value (binary)
130011
260110
340100
4101010
570111
6NULL-

The maximum decimal value is 10, which requires 4 bits. The BSI generated from this data contains four slices:

  • slice 0: roaringbitmap {1,5}

  • slice 1: roaringbitmap {1,2,4,5}

  • slice 2: roaringbitmap {2,3,5}

  • slice 3: roaringbitmap {4}

Each BSI also stores the following internal metadata:

FieldDescriptionValue in the example
EBM (existence bitmap)Roaring bitmap of all non-NULL cid values{1,2,3,4,5}
maxMaximum value10
minMinimum value3
bitCountNumber of bits used for binary representation4

Example calculations for a selected crowd `{1,3,5}`:

  • Sum: rb_and_cardinality(crowd, slice0) × 2<sup>0</sup> + ... + rb_and_cardinality(crowd, slice3) × 2<sup>3</sup> = 14

  • Top 2 values:

    crowd & slice3 = NULL
    crowd & slice2 = {3,5}  -- top 2 values are {3,5}
    {3,5} & slice1 = {5}    -- top 1 value is {5}

BSI combines roaring bitmap set operations with value calculations, which makes it efficient for user profile analysis scenarios that use both attribute tags and behavior tags. For a practical example, see BSI (beta).

Limitations

  • Version requirement: Hologres V2.1 and later. To upgrade an earlier instance, see Upgrade instances.

  • Value range: Each value in a BSI must be an integer from 1 to 2<sup>31</sup> − 1.

  • Extensions: A superuser must create two extensions at the database level before using BSI. Create each extension once per database.

    Important

    Do not use DROP EXTENSION <extension_name> CASCADE;. The CASCADE option drops not only the extension but also all dependent objects and their data, including PostGIS data, roaring bitmap data, Proxima data, binary log data, BSI data, metadata, tables, views, and server data.

    -- Create extensions. Create roaring bitmaps before creating BSI.
    CREATE EXTENSION roaringbitmap;
    CREATE EXTENSION bsi;
    
    -- Drop the extensions.
    DROP EXTENSION bsi;
    DROP EXTENSION roaringbitmap;

Functions

BSI constructors

bsi_build

Builds a BSI from two parallel one-dimensional arrays.

Syntax

bsi_build(cids integer[], values bigint[]) → bsi

Arguments

ArgumentTypeDescription
cidsinteger[]Array of key values (cid)
valuesbigint[]Array of corresponding values, aligned with cids

Returns: bsi

Example

SELECT bsi_iterate(bsi_build('{1,2,3}', '{2,4,6}'));

Output:

{1,2}
{2,4}
{3,6}

bsi_add_value

Adds a single key-value pair to an existing BSI.

Syntax

bsi_add_value(b bsi, cid integer, value bigint) → bsi

Arguments

ArgumentTypeDescription
bbsiThe existing BSI
cidintegerThe key to add
valuebigintThe value associated with cid

Returns: bsi

Example

SELECT bsi_iterate(bsi_add_value(bsi_build('{1,2,3}', '{2,4,6}'), 4, 8));

Output:

{1,2}
{2,4}
{3,6}
{4,8}

BSI expansion functions

bsi_iterate

Expands a BSI into its key-value pairs, returning one row per pair.

Syntax

bsi_iterate(b bsi) → set of integer[]

Arguments

ArgumentTypeDescription
bbsiThe BSI to expand

Returns: set of integer[] — each row is {cid, value}

Example

SELECT bsi_iterate(bsi_build('{1,2,3}', '{2,4,6}'));

Output:

{1,2}
{2,4}
{3,6}

bsi_show

Returns the first N key-value pairs from a BSI as a formatted text string.

Syntax

bsi_show(b bsi, n integer) → text

Arguments

ArgumentTypeDescription
bbsi or byteaThe BSI to inspect
nintegerNumber of key-value pairs to display

Returns: text — the first N pairs followed by the count of remaining pairs

Example

SELECT bsi_show(bsi_build('{1,2,3}', '{2,4,6}'), 2);

Output:

1=2,2=4...left 1

BSI query functions

The comparison functions (bsi_eq, bsi_ge, bsi_gt, bsi_le, bsi_lt, bsi_neq, bsi_range) all support an optional bytea filter parameter. When provided, the function first intersects the BSI's EBM with the filter bitmap, then performs the comparison on the resulting subset.

bsi_ebm

Returns the existence bitmap (EBM) of a BSI — the roaring bitmap of all non-NULL cid values.

Syntax

bsi_ebm(b bsi) → roaringbitmap

Example

SELECT rb_to_array(bsi_ebm(bsi_build('{1,2,3}', '{2,4,6}')));

Output: {1,2,3}

bsi_filter

Returns a new BSI containing only the cid values that appear in both the BSI's EBM and the given bitmap.

Syntax

bsi_filter(b bsi, crowd bytea) → bsi

Arguments

ArgumentTypeDescription
bbsi or byteaThe source BSI
crowdbyteaA roaring bitmap serialized to bytea; used to filter the EBM

Returns: bsi

Example

SELECT bsi_iterate(bsi_filter(bsi_build('{1,2,3}', '{2,4,6}'), rb_build('{1,2}')));

Output:

{1,2}
{2,4}

bsi_eq

Returns the cid values whose BSI value equals the given threshold.

Syntax

bsi_eq(b bsi, threshold bigint [, crowd bytea]) → roaringbitmap

Example

SELECT rb_to_array(bsi_eq(bsi_build('{1,2,3,4}', '{2,4,4,8}'), 4));

Output: {2,3}

bsi_neq

Returns the cid values whose BSI value is not equal to the given threshold.

Syntax

bsi_neq(b bsi, threshold bigint [, crowd bytea]) → roaringbitmap

Example

SELECT rb_to_array(bsi_neq(bsi_build('{1,2,3,4}', '{2,4,4,8}'), 4));

Output: {1,4}

bsi_lt

Returns the cid values whose BSI value is less than the given threshold.

Syntax

bsi_lt(b bsi, threshold bigint [, crowd bytea]) → roaringbitmap

Example

SELECT rb_to_array(bsi_lt(bsi_build('{1,2,3,4}', '{2,4,4,8}'), 4));

Output: {1}

bsi_le

Returns the cid values whose BSI value is less than or equal to the given threshold.

Syntax

bsi_le(b bsi, threshold bigint [, crowd bytea]) → roaringbitmap

Example

SELECT rb_to_array(bsi_le(bsi_build('{1,2,3,4}', '{2,4,4,8}'), 4));

Output: {1,2,3}

bsi_gt

Returns the cid values whose BSI value is greater than the given threshold.

Syntax

bsi_gt(b bsi, threshold bigint [, crowd bytea]) → roaringbitmap

Example

SELECT rb_to_array(bsi_gt(bsi_build('{1,2,3,4}', '{2,4,4,8}'), 4));

Output: {4}

bsi_ge

Returns the cid values whose BSI value is greater than or equal to the given threshold.

Syntax

bsi_ge(b bsi, threshold bigint [, crowd bytea]) → roaringbitmap

Example

SELECT rb_to_array(bsi_ge(bsi_build('{1,2,3,4}', '{2,4,4,8}'), 4));

Output: {2,3,4}

bsi_range

Returns the cid values whose BSI value falls between the two specified threshold values.

Syntax

bsi_range(b bsi, lower bigint, upper bigint [, crowd bytea]) → roaringbitmap

Arguments

ArgumentTypeDescription
bbsiThe source BSI
lowerbigintLower bound of the range
upperbigintUpper bound of the range
crowdbytea(Optional) Serialized roaring bitmap to pre-filter the EBM

Example

SELECT rb_to_array(bsi_range(bsi_build('{1,2,3,4}', '{2,4,4,8}'), 3, 5));

Output: {2,3}

bsi_compare

A unified comparison function that supports all comparison types through a single entry point.

Syntax

bsi_compare(op text, b bsi [, crowd bytea], val1 bigint [, val2 bigint]) → roaringbitmap

Arguments

ArgumentTypeDescription
optextComparison type: LT, LE, GT, GE, EQ, NEQ, or RANGE
bbsiThe source BSI
crowdbytea(Optional) Serialized roaring bitmap to pre-filter the EBM
val1bigintThe comparison value (lower bound for RANGE)
val2bigint(Required for RANGE only) The upper bound

Example

SELECT rb_to_array(bsi_compare('RANGE', bsi_build('{1,2,3,4}', '{2,4,4,8}'), 3, 5));

Output: {2,3}

BSI aggregate and analytic functions

bsi_sum

Returns the total sum of BSI values and the count of matching cid values (EBM cardinality) as a two-element array.

Syntax

bsi_sum(b bsi [, crowd bytea]) → bigint[]

Returns: bigint[]{sum, cardinality}

Example

SELECT bsi_sum(bsi_build('{1,2,3,4}', '{2,4,6,8}'));

Output: {20,4}

bsi_topk

Returns the cid values that have the top K largest BSI values.

Syntax

bsi_topk(b bsi [, crowd bytea], k integer) → roaringbitmap

Arguments

ArgumentTypeDescription
bbsi or byteaThe source BSI
crowdbytea(Optional) Serialized roaring bitmap to pre-filter the EBM
kintegerNumber of top values to return

Example

SELECT rb_to_array(bsi_topk(bsi_build('{1,2,3,4,5}', '{2,4,6,8,10}'), 3));

Output: {3,4,5}

bsi_stat

Returns the value distribution of a BSI across user-defined intervals.

Syntax

bsi_stat(boundaries bigint[], b bsi [, crowd bytea]) → text

Arguments

ArgumentTypeDescription
boundariesbigint[]Array of boundary values that define the distribution intervals
bbsi or byteaThe source BSI
crowdbytea(Optional) Serialized roaring bitmap to pre-filter the EBM

Returns: text — interval ranges with counts, in the format (lower,upper]=count

Example

SELECT bsi_stat('{1,3,5}', bsi_build('{1,2,3,4}', '{2,4,6,8}'));

Output: (0,1]=0;(1,3]=1;(3,5]=1;(5,8]=2

bsi_add

Adds the values of two BSIs for cid values they share in their EBMs, and returns a new BSI.

Syntax

bsi_add(b1 bsi, b2 bsi) → bsi

Arguments

ArgumentTypeDescription
b1bsiFirst BSI
b2bsiSecond BSI

Example

SELECT bsi_iterate(bsi_add(bsi_build('{1,2,3}', '{2,4,6}'), bsi_build('{1,2}', '{2,4}')));

Output:

{1,4}
{2,8}
{3,6}

bsi_add_agg

Aggregate function that sums BSI values across multiple rows.

Syntax

bsi_add_agg(b bsi) → bsi

Example

SELECT bsi_iterate(bsi_add_agg(bsi_build('{1,2,3}', '{2,4,6}')));

Output:

{1,2}
{2,4}
{3,6}

bsi_merge

Merges two BSIs whose EBMs have no overlap. Use this when combining disjoint data partitions.

Syntax

bsi_merge(b1 bsi, b2 bsi) → bsi

Example

SELECT bsi_iterate(bsi_merge(bsi_build('{1,2}', '{2,4}'), bsi_build('{3,4}', '{6,8}')));

Output:

{1,2}
{2,4}
{3,6}
{4,8}

bsi_merge_agg

Aggregate function that merges multiple BSI rows into one. The input BSIs must have non-overlapping EBMs.

Syntax

bsi_merge_agg(b bsi) → bsi

Example

SELECT bsi_iterate(bsi_merge_agg(bsi_build('{1,2,3}', '{2,4,6}')));

Output:

{1,2}
{2,4}
{3,6}

bsi_transpose

Returns the deduplicated set of distinct values in the BSI as a roaring bitmap.

Syntax

bsi_transpose(b bsi [, crowd bytea]) → roaringbitmap

Example

SELECT rb_to_array(bsi_transpose(bsi_build('{1,2,3,4,5}', '{2,4,4,8,8}')));

Output: {2,4,8}

bsi_transpose_with_count

Returns a new BSI where each distinct value is mapped to the number of cid entries that hold that value.

Syntax

bsi_transpose_with_count(b bsi [, crowd bytea]) → bsi

Example

SELECT bsi_iterate(bsi_transpose_with_count(bsi_build('{1,2,3,4,5}', '{2,4,4,8,8}')));

Output:

{2,1}
{4,2}
{8,2}

The output {value, count} means: value 2 appears once, value 4 appears twice, and value 8 appears twice.

What's next

  • BSI (beta) — end-to-end example of user profile analysis using BSI functions