All Products
Search
Document Center

AnalyticDB:User segmentation functions (Roaring Bitmap)

Last Updated:Mar 30, 2026

Roaring Bitmap is a compressed bitmap format optimized for high-performance set operations. Use it in AnalyticDB for MySQL for deduplication, tag-based user filtering, and time-series calculations over large datasets.

When to use Roaring Bitmap

Roaring Bitmap outperforms standard COUNT(DISTINCT ...) queries when each group in the bitmap contains fewer than 100 million entries. For larger UID spaces, increase the number of groups accordingly (for example, a UID space of 10 billion requires at least 100 groups).

Version requirements

Table type Minimum version
OSS external tables 3.1.6.4
Internal tables 3.2.1.0

To view or update your cluster version, go to the Configuration Information section on the Cluster Information page in the AnalyticDB for MySQL console.

Limitations

  • No direct SELECT on ROARINGBITMAP columns. Use UNNEST to view the elements:

    SELECT * FROM unnest(RB_BUILD(ARRAY[1,2,3]));
  • Clusters earlier than V3.2.1.0 support ROARINGBITMAP only in OSS external tables. To use bitmap operations on internal tables in those versions, store bitmaps as VARBINARY and convert at query time using RB_BUILD_VARBINARY:

    -- Define an internal table with VARBINARY
    CREATE TABLE test_rb_cstore (id INT, rb VARBINARY);
    
    -- Query using bitmap functions
    SELECT RB_CARDINALITY(RB_BUILD_VARBINARY(rb)) FROM test_rb_cstore;

Build a Roaring Bitmap

Three functions construct a Roaring Bitmap, each for a different input source:

Function Input Use when
RB_BUILD(array) Integer array Building a bitmap from a literal array
RB_BUILD_AGG(integer) Integer (aggregate) Aggregating row-level integers into a bitmap
RB_BUILD_VARBINARY(varbinary) VARBINARY Reading bitmaps stored as VARBINARY in internal tables

Functions

Scalar functions

Function Input type Output type Description Example
RB_BUILD INT ROARING BITMAP Builds a Roaring Bitmap from an integer array. RB_BUILD(ARRAY[1,2,3])
RB_BUILD_RANGE INT, INT ROARING BITMAP Builds a Roaring Bitmap from a range: start (inclusive) to end (exclusive). RB_BUILD_RANGE(0, 10000)
RB_BUILD_VARBINARY VARBINARY ROARING BITMAP Builds a Roaring Bitmap from VARBINARY data. RB_BUILD_VARBINARY(RB_TO_VARBINARY(RB_BUILD(ARRAY[1,2,3])))
RB_CARDINALITY ROARING BITMAP BIGINT Returns the number of elements in the bitmap. RB_CARDINALITY(RB_BUILD(ARRAY[1,2,3]))
RB_CONTAINS ROARING BITMAP, INT BOOLEAN Returns true if the bitmap contains the specified integer. RB_CONTAINS(RB_BUILD(ARRAY[1,2,3]), 3)
RB_CONTAINS ROARING BITMAP, ROARING BITMAP BOOLEAN Returns true if the first bitmap contains all elements of the second. RB_CONTAINS(RB_BUILD(ARRAY[1,2,3]), RB_BUILD(ARRAY[3]))
RB_AND ROARING BITMAP, ROARING BITMAP ROARING BITMAP Returns the intersection (AND) of two bitmaps. RB_AND(RB_BUILD(ARRAY[1,2,3]), RB_BUILD(ARRAY[2,3,4]))
RB_OR ROARING BITMAP, ROARING BITMAP ROARING BITMAP Returns the union (OR) of two bitmaps. RB_OR(RB_BUILD(ARRAY[1,2,3]), RB_BUILD(ARRAY[2,3,4]))
RB_XOR ROARING BITMAP, ROARING BITMAP ROARING BITMAP Returns the exclusive OR (XOR) of two bitmaps. RB_XOR(RB_BUILD(ARRAY[1,2,3]), RB_BUILD(ARRAY[2,3,4]))
RB_AND_NULL2EMPTY ROARING BITMAP, ROARING BITMAP ROARING BITMAP AND operation with null-safe handling: if one input is NULL, the other is used; if one input is {}, the result is {}. RB_AND_NULL2EMPTY(RB_BUILD(null), RB_BUILD(ARRAY[3,4,5]))
RB_OR_NULL2EMPTY ROARING BITMAP, ROARING BITMAP ROARING BITMAP OR operation with null-safe handling: NULL inputs are treated as {}. RB_OR_NULL2EMPTY(RB_BUILD(null), RB_BUILD(ARRAY[3,4,5]))
RB_ANDNOT_NULL2EMPTY ROARING BITMAP, ROARING BITMAP ROARING BITMAP ANDNOT operation with null-safe handling: NULL inputs are treated as {}. RB_ANDNOT_NULL2EMPTY(RB_BUILD(null), RB_BUILD(ARRAY[3,4,5]))
RB_AND_CARDINALITY ROARING BITMAP, ROARING BITMAP INTEGER Returns the cardinality of the AND result. RB_AND_CARDINALITY(RB_BUILD(ARRAY[1,2,3]), RB_BUILD(ARRAY[3,4,5]))
RB_AND_NULL2EMPTY_CARDINALITY ROARING BITMAP, ROARING BITMAP INTEGER Returns the cardinality of the AND result; NULL inputs are treated as {}. RB_AND_NULL2EMPTY_CARDINALITY(RB_BUILD(null), RB_BUILD(ARRAY[3,4,5]))
RB_OR_CARDINALITY ROARING BITMAP, ROARING BITMAP INTEGER Returns the cardinality of the OR result. RB_OR_CARDINALITY(RB_BUILD(ARRAY[1,2,3]), RB_BUILD(ARRAY[3,4,5]))
RB_OR_NULL2EMPTY_CARDINALITY ROARING BITMAP, ROARING BITMAP INTEGER Returns the cardinality of the OR result; NULL inputs are treated as {}. RB_OR_NULL2EMPTY_CARDINALITY(RB_BUILD(null), RB_BUILD(ARRAY[3,4,5]))
RB_XOR_CARDINALITY ROARING BITMAP, ROARING BITMAP INTEGER Returns the cardinality of the XOR result. RB_XOR_CARDINALITY(RB_BUILD(ARRAY[1,2,3]), RB_BUILD(ARRAY[3,4,5]))
RB_ANDNOT_CARDINALITY ROARING BITMAP, ROARING BITMAP INTEGER Returns the cardinality of the ANDNOT result. RB_ANDNOT_CARDINALITY(RB_BUILD(ARRAY[1,2,3]), RB_BUILD(ARRAY[3,4,5]))
RB_ANDNOT_NULL2EMPTY_CARDINALITY ROARING BITMAP, ROARING BITMAP INTEGER Returns the cardinality of the ANDNOT result; NULL inputs are treated as {}. RB_ANDNOT_NULL2EMPTY_CARDINALITY(RB_BUILD(ARRAY[1,2,3]), RB_BUILD(ARRAY[3,4,5]))
RB_IS_EMPTY ROARING BITMAP BOOLEAN Returns true if the bitmap is empty. RB_IS_EMPTY(RB_BUILD(ARRAY[]))
RB_CLEAR ROARING BITMAP, BIGINT, BIGINT ROARING BITMAP Removes elements in the specified range: start (inclusive) to end (exclusive). RB_CLEAR(RB_BUILD('{1,2,3}'), 2, 3)
RB_FLIP ROARING BITMAP, INTEGER, INTEGER ROARING BITMAP Flips bits in the specified offset range. RB_FLIP(RB_BUILD(ARRAY[1,2,3,4,5]), 2, 5)
RB_MINIMUM ROARING BITMAP INTEGER Returns the smallest element. Returns an error if the bitmap is empty. RB_MINIMUM(RB_BUILD(ARRAY[1,2,3]))
RB_MAXIMUM ROARING BITMAP INTEGER Returns the largest element. Returns an error if the bitmap is empty. RB_MAXIMUM(RB_BUILD(ARRAY[1,2,3]))
RB_RANK ROARING BITMAP, INTEGER INTEGER Returns the count of elements less than or equal to the specified offset. RB_RANK(RB_BUILD(ARRAY[1,2,3]), 2)
RB_TO_ARRAY ROARING BITMAP INTEGER Returns the elements as an integer array. RB_TO_ARRAY(RB_BUILD(ARRAY[1,2,3]))
RB_TO_VARBINARY ROARING BITMAP VARBINARY Returns the bitmap serialized as VARBINARY. RB_TO_VARBINARY(RB_BUILD(ARRAY[1,2,3]))
RB_RANGE_CARDINALITY ROARING BITMAP, INTEGER, INTEGER INTEGER Returns the cardinality for elements at positions start (inclusive) to end (exclusive). Position is 1-based. Requires version 3.1.10.0 or later. RB_RANGE_CARDINALITY(RB_BUILD(ARRAY[1,2,3]), 2, 3)
RB_SELECT ROARING BITMAP, INTEGER, INTEGER ROARING BITMAP Returns elements at positions start (inclusive) to end (exclusive). Position is 1-based. Requires version 3.1.10.0 or later. RB_SELECT(RB_BUILD(ARRAY[1,3,4,5,7,9]), 2, 3)

Aggregate functions

Function Input type Output type Description Example
RB_BUILD_AGG INTEGER ROARING BITMAP Aggregates integer values from multiple rows into a bitmap. RB_CARDINALITY(RB_BUILD_AGG(1))
RB_OR_AGG ROARING BITMAP ROARING BITMAP Performs OR aggregation across multiple bitmaps. RB_CARDINALITY(RB_OR_AGG(RB_BUILD(ARRAY[1,2,3])))
RB_AND_AGG ROARING BITMAP ROARING BITMAP Performs AND aggregation across multiple bitmaps. RB_CARDINALITY(RB_AND_AGG(RB_BUILD(ARRAY[1,2,3])))
RB_XOR_AGG ROARING BITMAP ROARING BITMAP Performs XOR aggregation across multiple bitmaps. RB_CARDINALITY(RB_XOR_AGG(RB_BUILD(ARRAY[1,2,3])))
RB_OR_CARDINALITY_AGG ROARING BITMAP INTEGER Performs OR aggregation and returns the cardinality. RB_OR_CARDINALITY_AGG(RB_BUILD(ARRAY[1,2,3]))
RB_AND_CARDINALITY_AGG ROARING BITMAP INTEGER Performs AND aggregation and returns the cardinality. RB_AND_CARDINALITY_AGG(RB_BUILD(ARRAY[1,2,3]))
RB_XOR_CARDINALITY_AGG ROARING BITMAP INTEGER Performs XOR aggregation and returns the cardinality. RB_XOR_CARDINALITY_AGG(RB_BUILD(ARRAY[1,2,3]))

Basic usage

These examples show how to create a table, insert bitmap data, and run scalar and aggregate queries.

Internal table

  1. Create an internal table with a ROARINGBITMAP column.

    CREATE TABLE `test_rb` (
      `id` INT,
      `rb` ROARINGBITMAP
    );
  2. Insert bitmap data.

    INSERT INTO test_rb VALUES (1, '[1, 2, 3]');
    INSERT INTO test_rb VALUES (2, '[2, 3, 4, 5, 6]');
  3. Get the cardinality of each row.

    SELECT id, RB_CARDINALITY(rb) FROM test_rb;
    +------+--------------------+
    | id   | rb_cardinality(rb) |
    +------+--------------------+
    |    2 |                  5 |
    |    1 |                  3 |
    +------+--------------------+
  4. Get the cardinality of the union across all rows.

    SELECT RB_OR_CARDINALITY_AGG(rb) FROM test_rb;
    +---------------------------+
    | rb_or_cardinality_agg(rb) |
    +---------------------------+
    |                         6 |
    +---------------------------+

External table

  1. Create an OSS external table with a ROARINGBITMAP column.

    CREATE TABLE `test_rb` (
      `id` INT,
      `rb` ROARINGBITMAP
    ) engine = 'oss'
    TABLE_PROPERTIES = '{
      "endpoint": "oss-cn-zhangjiakou.aliyuncs.com",
      "accessid": "************",
      "accesskey": "************",
      "url": "oss://testBucketName/roaringbitmap/test_for_user/",
      "format": "parquet"
    }';

    For external table parameters, see OSS non-partitioned external table.

  2. Insert bitmap data.

    Important

    INSERT INTO is inefficient for large-scale writes. For large datasets, generate Parquet files using an ETL tool and upload them to the OSS path before creating the external table.

    INSERT INTO test_rb SELECT 1, rb_build(ARRAY[1,2,3]);
    INSERT INTO test_rb SELECT 2, rb_build(ARRAY[2,3,4,5]);
  3. Get the cardinality of each row.

    SELECT id, RB_CARDINALITY(rb) FROM test_rb;
    +------+--------------------+
    | id   | rb_cardinality(rb) |
    +------+--------------------+
    |    2 |                  4 |
    |    1 |                  3 |
    +------+--------------------+
  4. Get the cardinality of the union across all rows.

    SELECT RB_OR_CARDINALITY_AGG(rb) FROM test_rb;
    +---------------------------+
    | rb_or_cardinality_agg(rb) |
    +---------------------------+
    |                         5 |
    +---------------------------+

User profiling tutorial

This tutorial walks through a complete user profiling workflow: building a tag table from raw user data, converting it to bitmap format for efficient set operations, and running multi-dimensional analyses.

Workflow overview: original label table converted to Roaring Bitmap label table for computations

Step 1: Prepare the source tag table

  1. Create the source table users_base.

    CREATE TABLE users_base (
      uid  INT,
      tag1 STRING,  -- Valid values: x, y, z
      tag2 STRING,  -- Valid values: a, b
      tag3 INT      -- Valid values: 1 to 10
    );
  2. Generate 100 million rows of random test data using a cross join of two bitmap ranges (10,000 × 10,000 = 100,000,000 rows).

    SUBMIT JOB
    INSERT OVERWRITE users_base
    SELECT
      CAST(ROW_NUMBER() OVER (ORDER BY c1) AS INT) AS uid,
      SUBSTRING('xyz', FLOOR(RAND() * 3) + 1, 1) AS tag1,
      SUBSTRING('ab',  FLOOR(RAND() * 2) + 1, 1) AS tag2,
      CAST(FLOOR(RAND() * 10) + 1 AS INT)         AS tag3
    FROM (
      SELECT A.c1 FROM
        UNNEST(RB_BUILD_RANGE(0, 10000)) AS A(c1)
        JOIN (SELECT c1 FROM UNNEST(RB_BUILD_RANGE(0, 10000)) AS B(c1))
    );
  3. Verify the data.

    SELECT * FROM users_base LIMIT 10;
    +--------+------+------+------+
    | uid    | tag1 | tag2 | tag3 |
    +--------+------+------+------+
    |  74526 | y    | b    |    3 |
    |  75611 | z    | b    |   10 |
    |  80850 | x    | b    |    5 |
    |  81656 | z    | b    |    7 |
    | 163845 | x    | b    |    2 |
    | 167007 | y    | b    |    4 |
    | 170541 | y    | b    |    9 |
    | 213108 | x    | a    |   10 |
    |  66056 | y    | b    |    4 |
    |  67761 | z    | a    |    2 |
    +--------+------+------+------+

Step 2: Add grouping fields

Bitmap operations in a distributed engine run in parallel across groups. Add a user_group field to partition UIDs across groups, and an offset field to encode the UID position within each group.

This example uses 16 groups with the formula uid = 16 × offset + user_group:

  • user_group = uid % 16 — the group the UID belongs to

  • offset = uid / 16 — the UID's position within the group

Sizing your groups: Each group's bitmap should contain fewer than 100 million entries. For a UID space of 10 billion, use 100 groups of 100 million each. Adjust the number of groups based on your cluster's ACUs and total UID space.

Note

The grouping formula above is for illustration only. Design your own grouping function based on your data distribution.

  1. Create the users table with the grouping fields.

    CREATE TABLE users (
      uid        INT,
      tag1       STRING,
      tag2       STRING,
      tag3       INT,
      user_group INT,  -- Grouping field: uid % 16
      offset     INT   -- Offset field: uid / 16
    );
  2. Populate users from users_base.

    SUBMIT JOB INSERT OVERWRITE users
    SELECT
      uid, tag1, tag2, tag3,
      CAST(uid % 16 AS INT),
      CAST(FLOOR(uid / 16) AS INT)
    FROM users_base;
  3. Verify the data.

    SELECT * FROM users LIMIT 10;
    +---------+------+------+------+------------+--------+
    | uid     | tag1 | tag2 | tag3 | user_group | offset |
    +---------+------+------+------+------------+--------+
    |  377194 | z    | b    |   10 |         10 |  23574 |
    |  309440 | x    | a    |    1 |          0 |  19340 |
    |  601745 | z    | a    |    7 |          1 |  37609 |
    |  753751 | z    | b    |    3 |          7 |  47109 |
    |  988186 | y    | a    |   10 |         10 |  61761 |
    |  883822 | x    | a    |    9 |         14 |  55238 |
    |  325065 | x    | b    |    6 |          9 |  20316 |
    | 1042875 | z    | a    |   10 |         11 |  65179 |
    |  928606 | y    | b    |    5 |         14 |  58037 |
    |  990858 | z    | a    |    8 |         10 |  61928 |
    +---------+------+------+------+------------+--------+

Step 3: Build bitmap tag tables

For each tag dimension, create a bitmap tag table where each row stores one bitmap per (tag_value, user_group) pair. The bitmap encodes the offsets of all matching users in that group.

Internal table

  1. Create and populate the bitmap tag table for tag1.

    CREATE TABLE `tag_tbl_1` (
      `tag1`       STRING,
      `rb`         ROARINGBITMAP,
      `user_group` INT
    );
    
    INSERT OVERWRITE tag_tbl_1
    SELECT tag1, RB_BUILD_AGG(offset), user_group
    FROM users
    GROUP BY tag1, user_group;
  2. Verify the tag table.

    SELECT tag1, user_group, RB_CARDINALITY(rb) FROM tag_tbl_1;
    +------+------------+--------------------+
    | tag1 | user_group | rb_cardinality(rb) |
    +------+------------+--------------------+
    | y    |         13 |             563654 |
    | x    |         11 |             565013 |
    | z    |          2 |             564428 |
    | x    |          4 |             564377 |
    ...
    | z    |          5 |             564333 |
    | x    |          8 |             564808 |
    | x    |          0 |             564228 |
    | y    |          3 |             563325 |
    +------+------------+--------------------+
  3. Create and populate the bitmap tag table for tag2.

    CREATE TABLE `tag_tbl_2` (
      `tag2`       STRING,
      `rb`         ROARINGBITMAP,
      `user_group` INT
    );
    
    INSERT OVERWRITE tag_tbl_2
    SELECT tag2, RB_BUILD_AGG(offset), user_group
    FROM users
    GROUP BY tag2, user_group;
  4. Verify the tag table.

    SELECT tag2, user_group, RB_CARDINALITY(rb) FROM tag_tbl_2;
    +------+------------+--------------------+
    | tag2 | user_group | rb_cardinality(rb) |
    +------+------------+--------------------+
    | a    |          9 |            3123039 |
    | a    |          5 |            3123973 |
    | a    |         12 |            3122414 |
    | a    |          7 |            3127218 |
    | a    |         15 |            3125403 |
    ...
    | a    |         10 |            3122698 |
    | b    |          4 |            3126091 |
    | b    |          3 |            3124626 |
    | b    |          9 |            3126961 |
    | b    |         14 |            3125351 |
    +------+------------+--------------------+

External table

  1. Create and populate the bitmap tag table for tag1 in OSS.

    CREATE TABLE `tag_tbl_1` (
      `tag1`       STRING,
      `rb`         ROARINGBITMAP,
      `user_group` INT
    ) engine = 'oss'
    TABLE_PROPERTIES = '{
      "endpoint":  "oss-cn-zhangjiakou.aliyuncs.com",
      "accessid":  "************",
      "accesskey": "************",
      "url":       "oss://testBucketName/roaringbitmap/tag_tbl_1/",
      "format":    "parquet"
    }';
    
    INSERT OVERWRITE tag_tbl_1
    SELECT tag1, RB_BUILD_AGG(offset), user_group
    FROM users
    GROUP BY tag1, user_group;
  2. Verify the tag table.

    SELECT tag1, user_group, RB_CARDINALITY(rb) FROM tag_tbl_1;
    +------+------------+--------------------+
    | tag1 | user_group | rb_cardinality(rb) |
    +------+------------+--------------------+
    | z    |          7 |            2082608 |
    | x    |         10 |            2082953 |
    | y    |          7 |            2084730 |
    | x    |         14 |            2084856 |
    ...
    | z    |         15 |            2084535 |
    | z    |          5 |            2083204 |
    | x    |         11 |            2085239 |
    | z    |          1 |            2084879 |
    +------+------------+--------------------+
  3. Create and populate the bitmap tag table for tag2 in OSS.

    CREATE TABLE `tag_tbl_2` (
      `tag2`       STRING,
      `rb`         ROARINGBITMAP,
      `user_group` INT
    ) engine = 'oss'
    TABLE_PROPERTIES = '{
      "endpoint":  "oss-cn-zhangjiakou.aliyuncs.com",
      "accessid":  "************",
      "accesskey": "************",
      "url":       "oss://testBucketName/roaringbitmap/tag_tbl_2/",
      "format":    "parquet"
    }';
    
    INSERT OVERWRITE tag_tbl_2
    SELECT tag2, RB_BUILD_AGG(offset), user_group
    FROM users
    GROUP BY tag2, user_group;
  4. Verify the tag table.

    SELECT tag2, user_group, RB_CARDINALITY(rb) FROM tag_tbl_2;
    +------+------------+--------------------+
    | tag2 | user_group | rb_cardinality(rb) |
    +------+------------+--------------------+
    | b    |         11 |            3121361 |
    | a    |          6 |            3124750 |
    | a    |          1 |            3125433 |
    ...
    | b    |          2 |            3126523 |
    | b    |         12 |            3123452 |
    | a    |          4 |            3126111 |
    | a    |         13 |            3123316 |
    | a    |          2 |            3123477 |
    +------+------------+--------------------+

Step 4: Analyze with bitmap tag tables

The following scenarios show common analysis patterns using the bitmap tag tables built in Step 3.

Scenario 1: Filter and group

Count users where tag1 IN ('x', 'y'), grouped by tag2.

All queries follow the same pattern: join bitmap tag tables on user_group, apply set operations per group, then aggregate across groups.

  1. Inspect the per-group intermediate results.

    SELECT
      tag2,
      t1.user_group AS user_group,
      RB_CARDINALITY(RB_AND(t2.rb, rb1)) AS rb
    FROM tag_tbl_2 AS t2
    JOIN (
      SELECT user_group, rb AS rb1
      FROM tag_tbl_1
      WHERE tag1 IN ('x', 'y')
    ) AS t1 ON t1.user_group = t2.user_group;
    +------+------------+---------+
    | tag2 | user_group | rb      |
    +------+------------+---------+
    | b    |          3 | 1041828 |
    | a    |         15 | 1039859 |
    | a    |          9 | 1039140 |
    | b    |          1 | 1041524 |
    | a    |          4 | 1041599 |
    | b    |          1 | 1041381 |
    | b    |         10 | 1041026 |
    | b    |          6 | 1042289 |
    +------+------------+---------+
  2. Sum the per-group counts to get the final totals.

    SELECT tag2, SUM(cnt) FROM (
      SELECT
        tag2,
        t1.user_group AS user_group,
        RB_CARDINALITY(RB_AND(t2.rb, rb1)) AS cnt
      FROM tag_tbl_2 AS t2
      JOIN (
        SELECT user_group, rb AS rb1
        FROM tag_tbl_1
        WHERE tag1 IN ('x', 'y')
      ) AS t1 ON t1.user_group = t2.user_group
    ) GROUP BY tag2;
    +------+----------+
    | tag2 | sum(cnt) |
    +------+----------+
    | a    | 33327868 |
    | b    | 33335220 |
    +------+----------+

Scenario 2: Intersection between two bitmap tag tables

Find users where (tag1 = 'x' OR tag1 = 'y') AND tag2 = 'b'.

Both inputs come from bitmap tag tables. First OR-aggregate within each tag table, then AND the results across the two.

SELECT user_group, RB_CARDINALITY(rb) FROM (
  SELECT
    t1.user_group AS user_group,
    RB_AND(rb1, rb2) AS rb
  FROM (
    SELECT user_group, RB_OR_AGG(rb) AS rb1
    FROM tag_tbl_1
    WHERE tag1 = 'x' OR tag1 = 'y'
    GROUP BY user_group
  ) AS t1
  JOIN (
    SELECT user_group, RB_OR_AGG(rb) AS rb2
    FROM tag_tbl_2
    WHERE tag2 = 'b'
    GROUP BY user_group
  ) AS t2 ON t1.user_group = t2.user_group
  GROUP BY user_group
);
+------------+--------------------+
| user_group | rb_cardinality(rb) |
+------------+--------------------+
|         10 |            2083679 |
|          3 |            2082370 |
|          9 |            2082847 |
|          2 |            2086511 |
...
|          1 |            2082291 |
|          4 |            2083290 |
|         14 |            2083581 |
|         15 |            2084110 |
+------------+--------------------+

Scenario 3: Intersection between a bitmap tag table and the source table

Find users where (tag1 = 'x' OR tag1 = 'y') AND tag2 = 'b', where the second condition is filtered from the raw users table rather than a pre-built bitmap tag table.

Use RB_BUILD_AGG to build a bitmap on the fly from the users table, then AND it with the pre-built bitmap from tag_tbl_1.

SELECT user_group, RB_CARDINALITY(rb) FROM (
  SELECT
    t1.user_group AS user_group,
    RB_AND(rb1, rb2) AS rb
  FROM (
    SELECT user_group, RB_OR_AGG(rb) AS rb1
    FROM tag_tbl_1
    WHERE tag1 = 'x' OR tag1 = 'y'
    GROUP BY user_group
  ) AS t1
  JOIN (
    SELECT user_group, RB_BUILD_AGG(offset) AS rb2
    FROM users
    WHERE tag2 = 'b'
    GROUP BY user_group
  ) AS t2 ON t1.user_group = t2.user_group
  GROUP BY user_group
);
+------------+--------------------+
| user_group | rb_cardinality(rb) |
+------------+--------------------+
|          3 |            2082370 |
|          1 |            2082291 |
|          0 |            2082383 |
|          4 |            2083290 |
|         11 |            2081662 |
|         13 |            2085280 |
...
|         14 |            2083581 |
|         15 |            2084110 |
|          9 |            2082847 |
|          8 |            2084860 |
|          5 |            2083056 |
|          7 |            2083275 |
+------------+--------------------+

Scenario 4: Export bitmap results to OSS

Save the bitmap results from Scenario 2 to an OSS external table for downstream use.

  1. Create the output table.

    CREATE TABLE `tag_tbl_3` (
      `user_group` INT,
      `rb`         ROARINGBITMAP
    ) engine = 'oss'
    TABLE_PROPERTIES = '{
      "endpoint":  "oss-cn-zhangjiakou.aliyuncs.com",
      "accessid":  "************",
      "accesskey": "************",
      "url":       "oss://testBucketName/roaringbitmap/tag_tbl_3/",
      "format":    "parquet"
    }';
  2. Write the Scenario 2 results to tag_tbl_3.

    INSERT OVERWRITE tag_tbl_3
    SELECT
      t1.user_group AS user_group,
      RB_AND(rb1, rb2) AS rb
    FROM (
      SELECT user_group, RB_OR_AGG(rb) AS rb1
      FROM tag_tbl_1
      WHERE tag1 = 'x' OR tag1 = 'y'
      GROUP BY user_group
    ) AS t1
    JOIN (
      SELECT user_group, RB_OR_AGG(rb) AS rb2
      FROM tag_tbl_2
      WHERE tag2 = 'b'
      GROUP BY user_group
    ) AS t2 ON t1.user_group = t2.user_group;

    After the query completes, the result is stored in Parquet format at oss://testBucketName/roaringbitmap/tag_tbl_3/.

Scenario 5: Accelerate queries with an internal cache table (for external tables)

Import bitmap data from an OSS external table into an internal table to speed up repeated queries. Because internal tables earlier than V3.2.1.0 do not support the ROARINGBITMAP type natively, store the bitmaps as VARBINARY and convert at query time.

  1. Create the internal cache table with a VARBINARY column for the bitmap.

    CREATE TABLE `tag_tbl_1_cstore` (
      `tag1`       VARCHAR,
      `rb`         VARBINARY,
      `user_group` INT
    );
  2. Import from the OSS table, serializing each bitmap to VARBINARY.

    INSERT INTO tag_tbl_1_cstore
    SELECT tag1, RB_TO_VARBINARY(rb), user_group
    FROM tag_tbl_1;
  3. Query the cache table, deserializing VARBINARY back to bitmaps at query time.

    SELECT
      tag1,
      user_group,
      RB_CARDINALITY(RB_OR_AGG(RB_BUILD_VARBINARY(rb)))
    FROM tag_tbl_1_cstore
    GROUP BY tag1, user_group;
    +------+------------+---------------------------------------------------+
    | tag1 | user_group | rb_cardinality(rb_or_agg(rb_build_varbinary(rb))) |
    +------+------------+---------------------------------------------------+
    | y    |          3 |                                           2082919 |
    | x    |          9 |                                           2083085 |
    | x    |          3 |                                           2082140 |
    | y    |         11 |                                           2082268 |
    | z    |          4 |                                           2082451 |
    ...
    | z    |          2 |                                           2081560 |
    | y    |          6 |                                           2082194 |
    | z    |          7 |                                           2082608 |
    +------+------------+---------------------------------------------------+