Roaring Bitmap is a compressed bitmap format optimized for high-performance set operations. Use it in AnalyticDB for MySQL for deduplication, tag-based user filtering, and time-series calculations over large datasets.
When to use Roaring Bitmap
Roaring Bitmap outperforms standard COUNT(DISTINCT ...) queries when each group in the bitmap contains fewer than 100 million entries. For larger UID spaces, increase the number of groups accordingly (for example, a UID space of 10 billion requires at least 100 groups).
Version requirements
| Table type | Minimum version |
|---|---|
| OSS external tables | 3.1.6.4 |
| Internal tables | 3.2.1.0 |
To view or update your cluster version, go to the Configuration Information section on the Cluster Information page in the AnalyticDB for MySQL console.
Limitations
-
No direct SELECT on ROARINGBITMAP columns. Use UNNEST to view the elements:
SELECT * FROM unnest(RB_BUILD(ARRAY[1,2,3])); -
Clusters earlier than V3.2.1.0 support ROARINGBITMAP only in OSS external tables. To use bitmap operations on internal tables in those versions, store bitmaps as VARBINARY and convert at query time using
RB_BUILD_VARBINARY:-- Define an internal table with VARBINARY CREATE TABLE test_rb_cstore (id INT, rb VARBINARY); -- Query using bitmap functions SELECT RB_CARDINALITY(RB_BUILD_VARBINARY(rb)) FROM test_rb_cstore;
Build a Roaring Bitmap
Three functions construct a Roaring Bitmap, each for a different input source:
| Function | Input | Use when |
|---|---|---|
RB_BUILD(array) |
Integer array | Building a bitmap from a literal array |
RB_BUILD_AGG(integer) |
Integer (aggregate) | Aggregating row-level integers into a bitmap |
RB_BUILD_VARBINARY(varbinary) |
VARBINARY | Reading bitmaps stored as VARBINARY in internal tables |
Functions
Scalar functions
| Function | Input type | Output type | Description | Example |
|---|---|---|---|---|
RB_BUILD |
INT | ROARING BITMAP | Builds a Roaring Bitmap from an integer array. | RB_BUILD(ARRAY[1,2,3]) |
RB_BUILD_RANGE |
INT, INT | ROARING BITMAP | Builds a Roaring Bitmap from a range: start (inclusive) to end (exclusive). | RB_BUILD_RANGE(0, 10000) |
RB_BUILD_VARBINARY |
VARBINARY | ROARING BITMAP | Builds a Roaring Bitmap from VARBINARY data. | RB_BUILD_VARBINARY(RB_TO_VARBINARY(RB_BUILD(ARRAY[1,2,3]))) |
RB_CARDINALITY |
ROARING BITMAP | BIGINT | Returns the number of elements in the bitmap. | RB_CARDINALITY(RB_BUILD(ARRAY[1,2,3])) |
RB_CONTAINS |
ROARING BITMAP, INT | BOOLEAN | Returns true if the bitmap contains the specified integer. |
RB_CONTAINS(RB_BUILD(ARRAY[1,2,3]), 3) |
RB_CONTAINS |
ROARING BITMAP, ROARING BITMAP | BOOLEAN | Returns true if the first bitmap contains all elements of the second. |
RB_CONTAINS(RB_BUILD(ARRAY[1,2,3]), RB_BUILD(ARRAY[3])) |
RB_AND |
ROARING BITMAP, ROARING BITMAP | ROARING BITMAP | Returns the intersection (AND) of two bitmaps. | RB_AND(RB_BUILD(ARRAY[1,2,3]), RB_BUILD(ARRAY[2,3,4])) |
RB_OR |
ROARING BITMAP, ROARING BITMAP | ROARING BITMAP | Returns the union (OR) of two bitmaps. | RB_OR(RB_BUILD(ARRAY[1,2,3]), RB_BUILD(ARRAY[2,3,4])) |
RB_XOR |
ROARING BITMAP, ROARING BITMAP | ROARING BITMAP | Returns the exclusive OR (XOR) of two bitmaps. | RB_XOR(RB_BUILD(ARRAY[1,2,3]), RB_BUILD(ARRAY[2,3,4])) |
RB_AND_NULL2EMPTY |
ROARING BITMAP, ROARING BITMAP | ROARING BITMAP | AND operation with null-safe handling: if one input is NULL, the other is used; if one input is {}, the result is {}. |
RB_AND_NULL2EMPTY(RB_BUILD(null), RB_BUILD(ARRAY[3,4,5])) |
RB_OR_NULL2EMPTY |
ROARING BITMAP, ROARING BITMAP | ROARING BITMAP | OR operation with null-safe handling: NULL inputs are treated as {}. |
RB_OR_NULL2EMPTY(RB_BUILD(null), RB_BUILD(ARRAY[3,4,5])) |
RB_ANDNOT_NULL2EMPTY |
ROARING BITMAP, ROARING BITMAP | ROARING BITMAP | ANDNOT operation with null-safe handling: NULL inputs are treated as {}. |
RB_ANDNOT_NULL2EMPTY(RB_BUILD(null), RB_BUILD(ARRAY[3,4,5])) |
RB_AND_CARDINALITY |
ROARING BITMAP, ROARING BITMAP | INTEGER | Returns the cardinality of the AND result. | RB_AND_CARDINALITY(RB_BUILD(ARRAY[1,2,3]), RB_BUILD(ARRAY[3,4,5])) |
RB_AND_NULL2EMPTY_CARDINALITY |
ROARING BITMAP, ROARING BITMAP | INTEGER | Returns the cardinality of the AND result; NULL inputs are treated as {}. |
RB_AND_NULL2EMPTY_CARDINALITY(RB_BUILD(null), RB_BUILD(ARRAY[3,4,5])) |
RB_OR_CARDINALITY |
ROARING BITMAP, ROARING BITMAP | INTEGER | Returns the cardinality of the OR result. | RB_OR_CARDINALITY(RB_BUILD(ARRAY[1,2,3]), RB_BUILD(ARRAY[3,4,5])) |
RB_OR_NULL2EMPTY_CARDINALITY |
ROARING BITMAP, ROARING BITMAP | INTEGER | Returns the cardinality of the OR result; NULL inputs are treated as {}. |
RB_OR_NULL2EMPTY_CARDINALITY(RB_BUILD(null), RB_BUILD(ARRAY[3,4,5])) |
RB_XOR_CARDINALITY |
ROARING BITMAP, ROARING BITMAP | INTEGER | Returns the cardinality of the XOR result. | RB_XOR_CARDINALITY(RB_BUILD(ARRAY[1,2,3]), RB_BUILD(ARRAY[3,4,5])) |
RB_ANDNOT_CARDINALITY |
ROARING BITMAP, ROARING BITMAP | INTEGER | Returns the cardinality of the ANDNOT result. | RB_ANDNOT_CARDINALITY(RB_BUILD(ARRAY[1,2,3]), RB_BUILD(ARRAY[3,4,5])) |
RB_ANDNOT_NULL2EMPTY_CARDINALITY |
ROARING BITMAP, ROARING BITMAP | INTEGER | Returns the cardinality of the ANDNOT result; NULL inputs are treated as {}. |
RB_ANDNOT_NULL2EMPTY_CARDINALITY(RB_BUILD(ARRAY[1,2,3]), RB_BUILD(ARRAY[3,4,5])) |
RB_IS_EMPTY |
ROARING BITMAP | BOOLEAN | Returns true if the bitmap is empty. |
RB_IS_EMPTY(RB_BUILD(ARRAY[])) |
RB_CLEAR |
ROARING BITMAP, BIGINT, BIGINT | ROARING BITMAP | Removes elements in the specified range: start (inclusive) to end (exclusive). | RB_CLEAR(RB_BUILD('{1,2,3}'), 2, 3) |
RB_FLIP |
ROARING BITMAP, INTEGER, INTEGER | ROARING BITMAP | Flips bits in the specified offset range. | RB_FLIP(RB_BUILD(ARRAY[1,2,3,4,5]), 2, 5) |
RB_MINIMUM |
ROARING BITMAP | INTEGER | Returns the smallest element. Returns an error if the bitmap is empty. | RB_MINIMUM(RB_BUILD(ARRAY[1,2,3])) |
RB_MAXIMUM |
ROARING BITMAP | INTEGER | Returns the largest element. Returns an error if the bitmap is empty. | RB_MAXIMUM(RB_BUILD(ARRAY[1,2,3])) |
RB_RANK |
ROARING BITMAP, INTEGER | INTEGER | Returns the count of elements less than or equal to the specified offset. | RB_RANK(RB_BUILD(ARRAY[1,2,3]), 2) |
RB_TO_ARRAY |
ROARING BITMAP | INTEGER | Returns the elements as an integer array. | RB_TO_ARRAY(RB_BUILD(ARRAY[1,2,3])) |
RB_TO_VARBINARY |
ROARING BITMAP | VARBINARY | Returns the bitmap serialized as VARBINARY. | RB_TO_VARBINARY(RB_BUILD(ARRAY[1,2,3])) |
RB_RANGE_CARDINALITY |
ROARING BITMAP, INTEGER, INTEGER | INTEGER | Returns the cardinality for elements at positions start (inclusive) to end (exclusive). Position is 1-based. Requires version 3.1.10.0 or later. | RB_RANGE_CARDINALITY(RB_BUILD(ARRAY[1,2,3]), 2, 3) |
RB_SELECT |
ROARING BITMAP, INTEGER, INTEGER | ROARING BITMAP | Returns elements at positions start (inclusive) to end (exclusive). Position is 1-based. Requires version 3.1.10.0 or later. | RB_SELECT(RB_BUILD(ARRAY[1,3,4,5,7,9]), 2, 3) |
Aggregate functions
| Function | Input type | Output type | Description | Example |
|---|---|---|---|---|
RB_BUILD_AGG |
INTEGER | ROARING BITMAP | Aggregates integer values from multiple rows into a bitmap. | RB_CARDINALITY(RB_BUILD_AGG(1)) |
RB_OR_AGG |
ROARING BITMAP | ROARING BITMAP | Performs OR aggregation across multiple bitmaps. | RB_CARDINALITY(RB_OR_AGG(RB_BUILD(ARRAY[1,2,3]))) |
RB_AND_AGG |
ROARING BITMAP | ROARING BITMAP | Performs AND aggregation across multiple bitmaps. | RB_CARDINALITY(RB_AND_AGG(RB_BUILD(ARRAY[1,2,3]))) |
RB_XOR_AGG |
ROARING BITMAP | ROARING BITMAP | Performs XOR aggregation across multiple bitmaps. | RB_CARDINALITY(RB_XOR_AGG(RB_BUILD(ARRAY[1,2,3]))) |
RB_OR_CARDINALITY_AGG |
ROARING BITMAP | INTEGER | Performs OR aggregation and returns the cardinality. | RB_OR_CARDINALITY_AGG(RB_BUILD(ARRAY[1,2,3])) |
RB_AND_CARDINALITY_AGG |
ROARING BITMAP | INTEGER | Performs AND aggregation and returns the cardinality. | RB_AND_CARDINALITY_AGG(RB_BUILD(ARRAY[1,2,3])) |
RB_XOR_CARDINALITY_AGG |
ROARING BITMAP | INTEGER | Performs XOR aggregation and returns the cardinality. | RB_XOR_CARDINALITY_AGG(RB_BUILD(ARRAY[1,2,3])) |
Basic usage
These examples show how to create a table, insert bitmap data, and run scalar and aggregate queries.
Internal table
-
Create an internal table with a ROARINGBITMAP column.
CREATE TABLE `test_rb` ( `id` INT, `rb` ROARINGBITMAP ); -
Insert bitmap data.
INSERT INTO test_rb VALUES (1, '[1, 2, 3]'); INSERT INTO test_rb VALUES (2, '[2, 3, 4, 5, 6]'); -
Get the cardinality of each row.
SELECT id, RB_CARDINALITY(rb) FROM test_rb;+------+--------------------+ | id | rb_cardinality(rb) | +------+--------------------+ | 2 | 5 | | 1 | 3 | +------+--------------------+ -
Get the cardinality of the union across all rows.
SELECT RB_OR_CARDINALITY_AGG(rb) FROM test_rb;+---------------------------+ | rb_or_cardinality_agg(rb) | +---------------------------+ | 6 | +---------------------------+
External table
-
Create an OSS external table with a ROARINGBITMAP column.
CREATE TABLE `test_rb` ( `id` INT, `rb` ROARINGBITMAP ) engine = 'oss' TABLE_PROPERTIES = '{ "endpoint": "oss-cn-zhangjiakou.aliyuncs.com", "accessid": "************", "accesskey": "************", "url": "oss://testBucketName/roaringbitmap/test_for_user/", "format": "parquet" }';For external table parameters, see OSS non-partitioned external table.
-
Insert bitmap data.
ImportantINSERT INTO is inefficient for large-scale writes. For large datasets, generate Parquet files using an ETL tool and upload them to the OSS path before creating the external table.
INSERT INTO test_rb SELECT 1, rb_build(ARRAY[1,2,3]); INSERT INTO test_rb SELECT 2, rb_build(ARRAY[2,3,4,5]); -
Get the cardinality of each row.
SELECT id, RB_CARDINALITY(rb) FROM test_rb;+------+--------------------+ | id | rb_cardinality(rb) | +------+--------------------+ | 2 | 4 | | 1 | 3 | +------+--------------------+ -
Get the cardinality of the union across all rows.
SELECT RB_OR_CARDINALITY_AGG(rb) FROM test_rb;+---------------------------+ | rb_or_cardinality_agg(rb) | +---------------------------+ | 5 | +---------------------------+
User profiling tutorial
This tutorial walks through a complete user profiling workflow: building a tag table from raw user data, converting it to bitmap format for efficient set operations, and running multi-dimensional analyses.
Step 1: Prepare the source tag table
-
Create the source table
users_base.CREATE TABLE users_base ( uid INT, tag1 STRING, -- Valid values: x, y, z tag2 STRING, -- Valid values: a, b tag3 INT -- Valid values: 1 to 10 ); -
Generate 100 million rows of random test data using a cross join of two bitmap ranges (10,000 × 10,000 = 100,000,000 rows).
SUBMIT JOB INSERT OVERWRITE users_base SELECT CAST(ROW_NUMBER() OVER (ORDER BY c1) AS INT) AS uid, SUBSTRING('xyz', FLOOR(RAND() * 3) + 1, 1) AS tag1, SUBSTRING('ab', FLOOR(RAND() * 2) + 1, 1) AS tag2, CAST(FLOOR(RAND() * 10) + 1 AS INT) AS tag3 FROM ( SELECT A.c1 FROM UNNEST(RB_BUILD_RANGE(0, 10000)) AS A(c1) JOIN (SELECT c1 FROM UNNEST(RB_BUILD_RANGE(0, 10000)) AS B(c1)) ); -
Verify the data.
SELECT * FROM users_base LIMIT 10;+--------+------+------+------+ | uid | tag1 | tag2 | tag3 | +--------+------+------+------+ | 74526 | y | b | 3 | | 75611 | z | b | 10 | | 80850 | x | b | 5 | | 81656 | z | b | 7 | | 163845 | x | b | 2 | | 167007 | y | b | 4 | | 170541 | y | b | 9 | | 213108 | x | a | 10 | | 66056 | y | b | 4 | | 67761 | z | a | 2 | +--------+------+------+------+
Step 2: Add grouping fields
Bitmap operations in a distributed engine run in parallel across groups. Add a user_group field to partition UIDs across groups, and an offset field to encode the UID position within each group.
This example uses 16 groups with the formula uid = 16 × offset + user_group:
-
user_group = uid % 16— the group the UID belongs to -
offset = uid / 16— the UID's position within the group
Sizing your groups: Each group's bitmap should contain fewer than 100 million entries. For a UID space of 10 billion, use 100 groups of 100 million each. Adjust the number of groups based on your cluster's ACUs and total UID space.
The grouping formula above is for illustration only. Design your own grouping function based on your data distribution.
-
Create the
userstable with the grouping fields.CREATE TABLE users ( uid INT, tag1 STRING, tag2 STRING, tag3 INT, user_group INT, -- Grouping field: uid % 16 offset INT -- Offset field: uid / 16 ); -
Populate
usersfromusers_base.SUBMIT JOB INSERT OVERWRITE users SELECT uid, tag1, tag2, tag3, CAST(uid % 16 AS INT), CAST(FLOOR(uid / 16) AS INT) FROM users_base; -
Verify the data.
SELECT * FROM users LIMIT 10;+---------+------+------+------+------------+--------+ | uid | tag1 | tag2 | tag3 | user_group | offset | +---------+------+------+------+------------+--------+ | 377194 | z | b | 10 | 10 | 23574 | | 309440 | x | a | 1 | 0 | 19340 | | 601745 | z | a | 7 | 1 | 37609 | | 753751 | z | b | 3 | 7 | 47109 | | 988186 | y | a | 10 | 10 | 61761 | | 883822 | x | a | 9 | 14 | 55238 | | 325065 | x | b | 6 | 9 | 20316 | | 1042875 | z | a | 10 | 11 | 65179 | | 928606 | y | b | 5 | 14 | 58037 | | 990858 | z | a | 8 | 10 | 61928 | +---------+------+------+------+------------+--------+
Step 3: Build bitmap tag tables
For each tag dimension, create a bitmap tag table where each row stores one bitmap per (tag_value, user_group) pair. The bitmap encodes the offsets of all matching users in that group.
Internal table
-
Create and populate the bitmap tag table for
tag1.CREATE TABLE `tag_tbl_1` ( `tag1` STRING, `rb` ROARINGBITMAP, `user_group` INT ); INSERT OVERWRITE tag_tbl_1 SELECT tag1, RB_BUILD_AGG(offset), user_group FROM users GROUP BY tag1, user_group; -
Verify the tag table.
SELECT tag1, user_group, RB_CARDINALITY(rb) FROM tag_tbl_1;+------+------------+--------------------+ | tag1 | user_group | rb_cardinality(rb) | +------+------------+--------------------+ | y | 13 | 563654 | | x | 11 | 565013 | | z | 2 | 564428 | | x | 4 | 564377 | ... | z | 5 | 564333 | | x | 8 | 564808 | | x | 0 | 564228 | | y | 3 | 563325 | +------+------------+--------------------+ -
Create and populate the bitmap tag table for
tag2.CREATE TABLE `tag_tbl_2` ( `tag2` STRING, `rb` ROARINGBITMAP, `user_group` INT ); INSERT OVERWRITE tag_tbl_2 SELECT tag2, RB_BUILD_AGG(offset), user_group FROM users GROUP BY tag2, user_group; -
Verify the tag table.
SELECT tag2, user_group, RB_CARDINALITY(rb) FROM tag_tbl_2;+------+------------+--------------------+ | tag2 | user_group | rb_cardinality(rb) | +------+------------+--------------------+ | a | 9 | 3123039 | | a | 5 | 3123973 | | a | 12 | 3122414 | | a | 7 | 3127218 | | a | 15 | 3125403 | ... | a | 10 | 3122698 | | b | 4 | 3126091 | | b | 3 | 3124626 | | b | 9 | 3126961 | | b | 14 | 3125351 | +------+------------+--------------------+
External table
-
Create and populate the bitmap tag table for
tag1in OSS.CREATE TABLE `tag_tbl_1` ( `tag1` STRING, `rb` ROARINGBITMAP, `user_group` INT ) engine = 'oss' TABLE_PROPERTIES = '{ "endpoint": "oss-cn-zhangjiakou.aliyuncs.com", "accessid": "************", "accesskey": "************", "url": "oss://testBucketName/roaringbitmap/tag_tbl_1/", "format": "parquet" }'; INSERT OVERWRITE tag_tbl_1 SELECT tag1, RB_BUILD_AGG(offset), user_group FROM users GROUP BY tag1, user_group; -
Verify the tag table.
SELECT tag1, user_group, RB_CARDINALITY(rb) FROM tag_tbl_1;+------+------------+--------------------+ | tag1 | user_group | rb_cardinality(rb) | +------+------------+--------------------+ | z | 7 | 2082608 | | x | 10 | 2082953 | | y | 7 | 2084730 | | x | 14 | 2084856 | ... | z | 15 | 2084535 | | z | 5 | 2083204 | | x | 11 | 2085239 | | z | 1 | 2084879 | +------+------------+--------------------+ -
Create and populate the bitmap tag table for
tag2in OSS.CREATE TABLE `tag_tbl_2` ( `tag2` STRING, `rb` ROARINGBITMAP, `user_group` INT ) engine = 'oss' TABLE_PROPERTIES = '{ "endpoint": "oss-cn-zhangjiakou.aliyuncs.com", "accessid": "************", "accesskey": "************", "url": "oss://testBucketName/roaringbitmap/tag_tbl_2/", "format": "parquet" }'; INSERT OVERWRITE tag_tbl_2 SELECT tag2, RB_BUILD_AGG(offset), user_group FROM users GROUP BY tag2, user_group; -
Verify the tag table.
SELECT tag2, user_group, RB_CARDINALITY(rb) FROM tag_tbl_2;+------+------------+--------------------+ | tag2 | user_group | rb_cardinality(rb) | +------+------------+--------------------+ | b | 11 | 3121361 | | a | 6 | 3124750 | | a | 1 | 3125433 | ... | b | 2 | 3126523 | | b | 12 | 3123452 | | a | 4 | 3126111 | | a | 13 | 3123316 | | a | 2 | 3123477 | +------+------------+--------------------+
Step 4: Analyze with bitmap tag tables
The following scenarios show common analysis patterns using the bitmap tag tables built in Step 3.
Scenario 1: Filter and group
Count users where tag1 IN ('x', 'y'), grouped by tag2.
All queries follow the same pattern: join bitmap tag tables on user_group, apply set operations per group, then aggregate across groups.
-
Inspect the per-group intermediate results.
SELECT tag2, t1.user_group AS user_group, RB_CARDINALITY(RB_AND(t2.rb, rb1)) AS rb FROM tag_tbl_2 AS t2 JOIN ( SELECT user_group, rb AS rb1 FROM tag_tbl_1 WHERE tag1 IN ('x', 'y') ) AS t1 ON t1.user_group = t2.user_group;+------+------------+---------+ | tag2 | user_group | rb | +------+------------+---------+ | b | 3 | 1041828 | | a | 15 | 1039859 | | a | 9 | 1039140 | | b | 1 | 1041524 | | a | 4 | 1041599 | | b | 1 | 1041381 | | b | 10 | 1041026 | | b | 6 | 1042289 | +------+------------+---------+ -
Sum the per-group counts to get the final totals.
SELECT tag2, SUM(cnt) FROM ( SELECT tag2, t1.user_group AS user_group, RB_CARDINALITY(RB_AND(t2.rb, rb1)) AS cnt FROM tag_tbl_2 AS t2 JOIN ( SELECT user_group, rb AS rb1 FROM tag_tbl_1 WHERE tag1 IN ('x', 'y') ) AS t1 ON t1.user_group = t2.user_group ) GROUP BY tag2;+------+----------+ | tag2 | sum(cnt) | +------+----------+ | a | 33327868 | | b | 33335220 | +------+----------+
Scenario 2: Intersection between two bitmap tag tables
Find users where (tag1 = 'x' OR tag1 = 'y') AND tag2 = 'b'.
Both inputs come from bitmap tag tables. First OR-aggregate within each tag table, then AND the results across the two.
SELECT user_group, RB_CARDINALITY(rb) FROM (
SELECT
t1.user_group AS user_group,
RB_AND(rb1, rb2) AS rb
FROM (
SELECT user_group, RB_OR_AGG(rb) AS rb1
FROM tag_tbl_1
WHERE tag1 = 'x' OR tag1 = 'y'
GROUP BY user_group
) AS t1
JOIN (
SELECT user_group, RB_OR_AGG(rb) AS rb2
FROM tag_tbl_2
WHERE tag2 = 'b'
GROUP BY user_group
) AS t2 ON t1.user_group = t2.user_group
GROUP BY user_group
);+------------+--------------------+
| user_group | rb_cardinality(rb) |
+------------+--------------------+
| 10 | 2083679 |
| 3 | 2082370 |
| 9 | 2082847 |
| 2 | 2086511 |
...
| 1 | 2082291 |
| 4 | 2083290 |
| 14 | 2083581 |
| 15 | 2084110 |
+------------+--------------------+
Scenario 3: Intersection between a bitmap tag table and the source table
Find users where (tag1 = 'x' OR tag1 = 'y') AND tag2 = 'b', where the second condition is filtered from the raw users table rather than a pre-built bitmap tag table.
Use RB_BUILD_AGG to build a bitmap on the fly from the users table, then AND it with the pre-built bitmap from tag_tbl_1.
SELECT user_group, RB_CARDINALITY(rb) FROM (
SELECT
t1.user_group AS user_group,
RB_AND(rb1, rb2) AS rb
FROM (
SELECT user_group, RB_OR_AGG(rb) AS rb1
FROM tag_tbl_1
WHERE tag1 = 'x' OR tag1 = 'y'
GROUP BY user_group
) AS t1
JOIN (
SELECT user_group, RB_BUILD_AGG(offset) AS rb2
FROM users
WHERE tag2 = 'b'
GROUP BY user_group
) AS t2 ON t1.user_group = t2.user_group
GROUP BY user_group
);+------------+--------------------+
| user_group | rb_cardinality(rb) |
+------------+--------------------+
| 3 | 2082370 |
| 1 | 2082291 |
| 0 | 2082383 |
| 4 | 2083290 |
| 11 | 2081662 |
| 13 | 2085280 |
...
| 14 | 2083581 |
| 15 | 2084110 |
| 9 | 2082847 |
| 8 | 2084860 |
| 5 | 2083056 |
| 7 | 2083275 |
+------------+--------------------+
Scenario 4: Export bitmap results to OSS
Save the bitmap results from Scenario 2 to an OSS external table for downstream use.
-
Create the output table.
CREATE TABLE `tag_tbl_3` ( `user_group` INT, `rb` ROARINGBITMAP ) engine = 'oss' TABLE_PROPERTIES = '{ "endpoint": "oss-cn-zhangjiakou.aliyuncs.com", "accessid": "************", "accesskey": "************", "url": "oss://testBucketName/roaringbitmap/tag_tbl_3/", "format": "parquet" }'; -
Write the Scenario 2 results to
tag_tbl_3.INSERT OVERWRITE tag_tbl_3 SELECT t1.user_group AS user_group, RB_AND(rb1, rb2) AS rb FROM ( SELECT user_group, RB_OR_AGG(rb) AS rb1 FROM tag_tbl_1 WHERE tag1 = 'x' OR tag1 = 'y' GROUP BY user_group ) AS t1 JOIN ( SELECT user_group, RB_OR_AGG(rb) AS rb2 FROM tag_tbl_2 WHERE tag2 = 'b' GROUP BY user_group ) AS t2 ON t1.user_group = t2.user_group;After the query completes, the result is stored in Parquet format at
oss://testBucketName/roaringbitmap/tag_tbl_3/.
Scenario 5: Accelerate queries with an internal cache table (for external tables)
Import bitmap data from an OSS external table into an internal table to speed up repeated queries. Because internal tables earlier than V3.2.1.0 do not support the ROARINGBITMAP type natively, store the bitmaps as VARBINARY and convert at query time.
-
Create the internal cache table with a VARBINARY column for the bitmap.
CREATE TABLE `tag_tbl_1_cstore` ( `tag1` VARCHAR, `rb` VARBINARY, `user_group` INT ); -
Import from the OSS table, serializing each bitmap to VARBINARY.
INSERT INTO tag_tbl_1_cstore SELECT tag1, RB_TO_VARBINARY(rb), user_group FROM tag_tbl_1; -
Query the cache table, deserializing VARBINARY back to bitmaps at query time.
SELECT tag1, user_group, RB_CARDINALITY(RB_OR_AGG(RB_BUILD_VARBINARY(rb))) FROM tag_tbl_1_cstore GROUP BY tag1, user_group;+------+------------+---------------------------------------------------+ | tag1 | user_group | rb_cardinality(rb_or_agg(rb_build_varbinary(rb))) | +------+------------+---------------------------------------------------+ | y | 3 | 2082919 | | x | 9 | 2083085 | | x | 3 | 2082140 | | y | 11 | 2082268 | | z | 4 | 2082451 | ... | z | 2 | 2081560 | | y | 6 | 2082194 | | z | 7 | 2082608 | +------+------------+---------------------------------------------------+