Bit-sliced index (BSI) functions are extension functions in Hologres that compress key-value data into a bitmap-based structure for high-performance aggregation and filtering on large user segments.
How it works
A BSI compresses key-value pairs in cid::int, value::bigint format. Each decimal value is converted to binary, and the binary data is traversed from the low bit to the high bit. For each bit position, the roaring bitmap of all cid values whose bit is 1 is recorded as a slice, forming the BSI structure.
The following sample data illustrates the principle:
| cid | Value (decimal) | Value (binary) |
|---|---|---|
| 1 | 3 | 0011 |
| 2 | 6 | 0110 |
| 3 | 4 | 0100 |
| 4 | 10 | 1010 |
| 5 | 7 | 0111 |
| 6 | NULL | - |
The maximum decimal value is 10, which requires 4 bits. The BSI generated from this data contains four slices:
slice 0: roaringbitmap
{1,5}slice 1: roaringbitmap
{1,2,4,5}slice 2: roaringbitmap
{2,3,5}slice 3: roaringbitmap
{4}
Each BSI also stores the following internal metadata:
| Field | Description | Value in the example |
|---|---|---|
EBM (existence bitmap) | Roaring bitmap of all non-NULL cid values | {1,2,3,4,5} |
max | Maximum value | 10 |
min | Minimum value | 3 |
bitCount | Number of bits used for binary representation | 4 |
Example calculations for a selected crowd `{1,3,5}`:
Sum:
rb_and_cardinality(crowd, slice0) × 2<sup>0</sup>+ ... + rb_and_cardinality(crowd, slice3) × 2<sup>3</sup>= 14Top 2 values:
crowd & slice3 = NULL crowd & slice2 = {3,5} -- top 2 values are {3,5} {3,5} & slice1 = {5} -- top 1 value is {5}
BSI combines roaring bitmap set operations with value calculations, which makes it efficient for user profile analysis scenarios that use both attribute tags and behavior tags. For a practical example, see BSI (beta).
Limitations
Version requirement: Hologres V2.1 and later. To upgrade an earlier instance, see Upgrade instances.
Value range: Each value in a BSI must be an integer from 1 to 2<sup>31</sup> − 1.
Extensions: A superuser must create two extensions at the database level before using BSI. Create each extension once per database.
ImportantDo not use
DROP EXTENSION <extension_name> CASCADE;. TheCASCADEoption drops not only the extension but also all dependent objects and their data, including PostGIS data, roaring bitmap data, Proxima data, binary log data, BSI data, metadata, tables, views, and server data.-- Create extensions. Create roaring bitmaps before creating BSI. CREATE EXTENSION roaringbitmap; CREATE EXTENSION bsi; -- Drop the extensions. DROP EXTENSION bsi; DROP EXTENSION roaringbitmap;
Functions
BSI constructors
bsi_build
Builds a BSI from two parallel one-dimensional arrays.
Syntax
bsi_build(cids integer[], values bigint[]) → bsiArguments
| Argument | Type | Description |
|---|---|---|
cids | integer[] | Array of key values (cid) |
values | bigint[] | Array of corresponding values, aligned with cids |
Returns: bsi
Example
SELECT bsi_iterate(bsi_build('{1,2,3}', '{2,4,6}'));Output:
{1,2}
{2,4}
{3,6}bsi_add_value
Adds a single key-value pair to an existing BSI.
Syntax
bsi_add_value(b bsi, cid integer, value bigint) → bsiArguments
| Argument | Type | Description |
|---|---|---|
b | bsi | The existing BSI |
cid | integer | The key to add |
value | bigint | The value associated with cid |
Returns: bsi
Example
SELECT bsi_iterate(bsi_add_value(bsi_build('{1,2,3}', '{2,4,6}'), 4, 8));Output:
{1,2}
{2,4}
{3,6}
{4,8}BSI expansion functions
bsi_iterate
Expands a BSI into its key-value pairs, returning one row per pair.
Syntax
bsi_iterate(b bsi) → set of integer[]Arguments
| Argument | Type | Description |
|---|---|---|
b | bsi | The BSI to expand |
Returns: set of integer[] — each row is {cid, value}
Example
SELECT bsi_iterate(bsi_build('{1,2,3}', '{2,4,6}'));Output:
{1,2}
{2,4}
{3,6}bsi_show
Returns the first N key-value pairs from a BSI as a formatted text string.
Syntax
bsi_show(b bsi, n integer) → textArguments
| Argument | Type | Description |
|---|---|---|
b | bsi or bytea | The BSI to inspect |
n | integer | Number of key-value pairs to display |
Returns: text — the first N pairs followed by the count of remaining pairs
Example
SELECT bsi_show(bsi_build('{1,2,3}', '{2,4,6}'), 2);Output:
1=2,2=4...left 1BSI query functions
The comparison functions (bsi_eq, bsi_ge, bsi_gt, bsi_le, bsi_lt, bsi_neq, bsi_range) all support an optional bytea filter parameter. When provided, the function first intersects the BSI's EBM with the filter bitmap, then performs the comparison on the resulting subset.
bsi_ebm
Returns the existence bitmap (EBM) of a BSI — the roaring bitmap of all non-NULL cid values.
Syntax
bsi_ebm(b bsi) → roaringbitmapExample
SELECT rb_to_array(bsi_ebm(bsi_build('{1,2,3}', '{2,4,6}')));Output: {1,2,3}
bsi_filter
Returns a new BSI containing only the cid values that appear in both the BSI's EBM and the given bitmap.
Syntax
bsi_filter(b bsi, crowd bytea) → bsiArguments
| Argument | Type | Description |
|---|---|---|
b | bsi or bytea | The source BSI |
crowd | bytea | A roaring bitmap serialized to bytea; used to filter the EBM |
Returns: bsi
Example
SELECT bsi_iterate(bsi_filter(bsi_build('{1,2,3}', '{2,4,6}'), rb_build('{1,2}')));Output:
{1,2}
{2,4}bsi_eq
Returns the cid values whose BSI value equals the given threshold.
Syntax
bsi_eq(b bsi, threshold bigint [, crowd bytea]) → roaringbitmapExample
SELECT rb_to_array(bsi_eq(bsi_build('{1,2,3,4}', '{2,4,4,8}'), 4));Output: {2,3}
bsi_neq
Returns the cid values whose BSI value is not equal to the given threshold.
Syntax
bsi_neq(b bsi, threshold bigint [, crowd bytea]) → roaringbitmapExample
SELECT rb_to_array(bsi_neq(bsi_build('{1,2,3,4}', '{2,4,4,8}'), 4));Output: {1,4}
bsi_lt
Returns the cid values whose BSI value is less than the given threshold.
Syntax
bsi_lt(b bsi, threshold bigint [, crowd bytea]) → roaringbitmapExample
SELECT rb_to_array(bsi_lt(bsi_build('{1,2,3,4}', '{2,4,4,8}'), 4));Output: {1}
bsi_le
Returns the cid values whose BSI value is less than or equal to the given threshold.
Syntax
bsi_le(b bsi, threshold bigint [, crowd bytea]) → roaringbitmapExample
SELECT rb_to_array(bsi_le(bsi_build('{1,2,3,4}', '{2,4,4,8}'), 4));Output: {1,2,3}
bsi_gt
Returns the cid values whose BSI value is greater than the given threshold.
Syntax
bsi_gt(b bsi, threshold bigint [, crowd bytea]) → roaringbitmapExample
SELECT rb_to_array(bsi_gt(bsi_build('{1,2,3,4}', '{2,4,4,8}'), 4));Output: {4}
bsi_ge
Returns the cid values whose BSI value is greater than or equal to the given threshold.
Syntax
bsi_ge(b bsi, threshold bigint [, crowd bytea]) → roaringbitmapExample
SELECT rb_to_array(bsi_ge(bsi_build('{1,2,3,4}', '{2,4,4,8}'), 4));Output: {2,3,4}
bsi_range
Returns the cid values whose BSI value falls between the two specified threshold values.
Syntax
bsi_range(b bsi, lower bigint, upper bigint [, crowd bytea]) → roaringbitmapArguments
| Argument | Type | Description |
|---|---|---|
b | bsi | The source BSI |
lower | bigint | Lower bound of the range |
upper | bigint | Upper bound of the range |
crowd | bytea | (Optional) Serialized roaring bitmap to pre-filter the EBM |
Example
SELECT rb_to_array(bsi_range(bsi_build('{1,2,3,4}', '{2,4,4,8}'), 3, 5));Output: {2,3}
bsi_compare
A unified comparison function that supports all comparison types through a single entry point.
Syntax
bsi_compare(op text, b bsi [, crowd bytea], val1 bigint [, val2 bigint]) → roaringbitmapArguments
| Argument | Type | Description |
|---|---|---|
op | text | Comparison type: LT, LE, GT, GE, EQ, NEQ, or RANGE |
b | bsi | The source BSI |
crowd | bytea | (Optional) Serialized roaring bitmap to pre-filter the EBM |
val1 | bigint | The comparison value (lower bound for RANGE) |
val2 | bigint | (Required for RANGE only) The upper bound |
Example
SELECT rb_to_array(bsi_compare('RANGE', bsi_build('{1,2,3,4}', '{2,4,4,8}'), 3, 5));Output: {2,3}
BSI aggregate and analytic functions
bsi_sum
Returns the total sum of BSI values and the count of matching cid values (EBM cardinality) as a two-element array.
Syntax
bsi_sum(b bsi [, crowd bytea]) → bigint[]Returns: bigint[] — {sum, cardinality}
Example
SELECT bsi_sum(bsi_build('{1,2,3,4}', '{2,4,6,8}'));Output: {20,4}
bsi_topk
Returns the cid values that have the top K largest BSI values.
Syntax
bsi_topk(b bsi [, crowd bytea], k integer) → roaringbitmapArguments
| Argument | Type | Description |
|---|---|---|
b | bsi or bytea | The source BSI |
crowd | bytea | (Optional) Serialized roaring bitmap to pre-filter the EBM |
k | integer | Number of top values to return |
Example
SELECT rb_to_array(bsi_topk(bsi_build('{1,2,3,4,5}', '{2,4,6,8,10}'), 3));Output: {3,4,5}
bsi_stat
Returns the value distribution of a BSI across user-defined intervals.
Syntax
bsi_stat(boundaries bigint[], b bsi [, crowd bytea]) → textArguments
| Argument | Type | Description |
|---|---|---|
boundaries | bigint[] | Array of boundary values that define the distribution intervals |
b | bsi or bytea | The source BSI |
crowd | bytea | (Optional) Serialized roaring bitmap to pre-filter the EBM |
Returns: text — interval ranges with counts, in the format (lower,upper]=count
Example
SELECT bsi_stat('{1,3,5}', bsi_build('{1,2,3,4}', '{2,4,6,8}'));Output: (0,1]=0;(1,3]=1;(3,5]=1;(5,8]=2
bsi_add
Adds the values of two BSIs for cid values they share in their EBMs, and returns a new BSI.
Syntax
bsi_add(b1 bsi, b2 bsi) → bsiArguments
| Argument | Type | Description |
|---|---|---|
b1 | bsi | First BSI |
b2 | bsi | Second BSI |
Example
SELECT bsi_iterate(bsi_add(bsi_build('{1,2,3}', '{2,4,6}'), bsi_build('{1,2}', '{2,4}')));Output:
{1,4}
{2,8}
{3,6}bsi_add_agg
Aggregate function that sums BSI values across multiple rows.
Syntax
bsi_add_agg(b bsi) → bsiExample
SELECT bsi_iterate(bsi_add_agg(bsi_build('{1,2,3}', '{2,4,6}')));Output:
{1,2}
{2,4}
{3,6}bsi_merge
Merges two BSIs whose EBMs have no overlap. Use this when combining disjoint data partitions.
Syntax
bsi_merge(b1 bsi, b2 bsi) → bsiExample
SELECT bsi_iterate(bsi_merge(bsi_build('{1,2}', '{2,4}'), bsi_build('{3,4}', '{6,8}')));Output:
{1,2}
{2,4}
{3,6}
{4,8}bsi_merge_agg
Aggregate function that merges multiple BSI rows into one. The input BSIs must have non-overlapping EBMs.
Syntax
bsi_merge_agg(b bsi) → bsiExample
SELECT bsi_iterate(bsi_merge_agg(bsi_build('{1,2,3}', '{2,4,6}')));Output:
{1,2}
{2,4}
{3,6}bsi_transpose
Returns the deduplicated set of distinct values in the BSI as a roaring bitmap.
Syntax
bsi_transpose(b bsi [, crowd bytea]) → roaringbitmapExample
SELECT rb_to_array(bsi_transpose(bsi_build('{1,2,3,4,5}', '{2,4,4,8,8}')));Output: {2,4,8}
bsi_transpose_with_count
Returns a new BSI where each distinct value is mapped to the number of cid entries that hold that value.
Syntax
bsi_transpose_with_count(b bsi [, crowd bytea]) → bsiExample
SELECT bsi_iterate(bsi_transpose_with_count(bsi_build('{1,2,3,4,5}', '{2,4,4,8,8}')));Output:
{2,1}
{4,2}
{8,2}The output {value, count} means: value 2 appears once, value 4 appears twice, and value 8 appears twice.
What's next
BSI (beta) — end-to-end example of user profile analysis using BSI functions