TairRoaring is Alibaba Cloud Tair's implementation of the Roaring bitmap data structure. It delivers high query throughput at low memory cost by combining three optimizations:
Two-level indexes and dynamic containers: balance performance against space usage across a wide range of data distributions.
SIMD (single instruction, multiple data), vectorization, and popcount algorithms: improve computational efficiency in both time and space.
Tair's storage engine: provides the computing performance and stability that production workloads require.
For a full command reference, see TairRoaring.
Test environment
All benchmarks in this document run against a 16 GB Tair (Enterprise Edition) DRAM-based instance. Network latency between the client and instance is under 0.1 ms.
For cluster architecture tests, four DRAM-based cluster instances are used, each shard sized at 4 GB:
| Instance | Shards |
|---|---|
| 8 GB | 2 |
| 16 GB | 4 |
| 32 GB | 8 |
| 64 GB | 16 |
The client is deployed in the same zone as the instances and connects through proxy endpoints. All instances are configured with a maximum bandwidth of 2,048 Mbit/s to prevent network saturation during testing.
The QPS values throughout this document are reference values measured under specific test conditions. Actual performance depends on your instance specification, data distribution, key count, network environment, and command mix.
Test tool
Download: redis-benchmark.tar.gz
The tool is written in Go and follows the same design as the redis-benchmark utility, which keeps it easy to use and modify. It generates two independent random values (__RAND__ and __RAND2__) per request and outputs results as histograms.
Parameters:
| Parameter | Default | Description |
|---|---|---|
| -h | 127.0.0.1 | Instance endpoint |
| -port | 6379 | Instance port |
| -a | None | Password in <user>:<password> format |
| -d | 30 | Test duration (seconds) |
| -c | None | Total number of requests; overrides -d when set |
| -p | 4 | Concurrent connections |
| -r | 100000000 | Range for __RAND__ |
| -r2 | 100000000 | Range for __RAND2__ |
| -command | TR.GETBIT foo-__RAND__ | Command to run; use __RAND__ or __RAND2__ as a suffix of the command or of any of its parameters to insert random values |
| -batching | None | Pipeline batch size; wraps commands in MULTI/EXEC |
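The placeholder mechanism can be pictured with a short Python sketch. It is not the tool's actual Go code: the function name expand_command is hypothetical, and the sketch assumes that values are drawn from 0 up to (but not including) the configured range and that every occurrence of a placeholder within one request receives the same value. The real tool may instead draw a fresh value per occurrence.

```python
import random


def expand_command(template: str, r: int, r2: int, rng: random.Random) -> str:
    """Fill __RAND__ and __RAND2__ with values from [0, r) and [0, r2)."""
    rand1 = rng.randrange(r)    # for example, -r 100000 yields 0..99999
    rand2 = rng.randrange(r2)   # drawn independently of rand1
    # Assumption: all occurrences of a placeholder share one value per request.
    return template.replace("__RAND2__", str(rand2)).replace("__RAND__", str(rand1))


rng = random.Random(42)  # fixed seed so repeated runs produce the same command
command = expand_command("TR.GETBIT foo-__RAND__ __RAND2__", 100_000, 10_000_000, rng)
print(command)
```

With -r 100000 and -r2 10000000, this produces keys in the range foo-0 to foo-99999 and offsets from 0 to 9999999, matching the ranges described for the example command.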
Example:
# Run TR.SETBIT with 20 concurrent connections for 30 seconds.
# Keys: foo-0 to foo-99999, bit offsets: 0 to 9999999.
./redis -h r-**********0d7f.redis.zhangbei.rds.aliyuncs.com \
-a user:password \
-d 30 \
-r 100000 \
-r2 10000000 \
-command "TR.SETBIT foo-__RAND__ bar-__RAND2__" \
-p 20 \
-c 1
Standard architecture benchmarks
All tests in this section use a single 16 GB standard-architecture instance.
Single-key tests
Read commands
Each test runs 20 concurrent connections against a single pre-populated key.
Sample command:
./redis -h r-**********0d7f.redis.zhangbei.rds.aliyuncs.com \
-a user:password \
-d 30 \
-r 100000 \
-r2 10000000 \
-command "TR.GETBIT foo-__RAND__ bar-__RAND2__" \
-p 20 \
-c 1
Results:
| Command | Records per request | Parameter range | QPS | Avg latency (ms) |
|---|---|---|---|---|
| TR.GETBIT | 1 | 1–10000000 | 255,000 | 0.21 |
| TR.GETBITS | 100 | 1–10000000 | 54,000 | 0.97 |
| TR.RANK | 1 | 1–10000000 | 161,000 | 0.24 |
| TR.RANGE | 100 | 0–100 | 129,000 | 0.46 |
| TR.SCAN | 100 | 1–10000000 | 130,000 | 0.38 |
Write commands
Sample command:
./redis -h r-**********0d7f.redis.zhangbei.rds.aliyuncs.com \
-a user:password \
-d 30 \
-r 100000 \
-r2 10000000 \
-command "TR.SETBIT foo-__RAND__ bar-__RAND2__" \
-p 20 \
-c 1
Results:
| Command | Records per request | Parameter range | QPS | Avg latency (ms) |
|---|---|---|---|---|
| TR.SETBIT | 1 | 1–10000000 | 145,000 | 0.37 |
| TR.SETBITS | 100 | 1–10000000 | 22,000 | 0.71 |
| TR.SETBITS (ordered) | 100 (max offset 2^32) | 1–6000 | 28,000 | 0.66 |
| TR.APPENDBITARRAY | 1000 (half bits = 1) | 1–10000000 | 10,000 | 0.38 |
| TR.SETRANGE | — | 1000 | 130,000 | 0.30 |
| TR.FLIPRANGE | — | 1–10000000 | 100,000 | 0.46 |
Pipeline tests
Pipeline mode groups multiple commands between MULTI and EXEC, increasing throughput at the cost of higher per-request latency.
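As a rough illustration of what a batch looks like on the wire, the following Python sketch wraps a list of commands between MULTI and EXEC. The helper name build_batch is hypothetical, and the exact framing used by the benchmark tool is an assumption based on the description of -batching.

```python
def build_batch(commands: list[str]) -> list[str]:
    """Wrap the given commands in a MULTI/EXEC transaction."""
    return ["MULTI", *commands, "EXEC"]


# Ten single-bit writes, as with -batching 10 (the offsets here are illustrative).
batch = build_batch([f"TR.SETBIT foo-{offset} 1" for offset in range(10)])
for line in batch:
    print(line)
```

Each batch sends its commands together, so the server handles more bits per network exchange, while every individual command waits for the whole batch to finish. That is the throughput and latency trade-off visible in the tables that follow.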
TR.SETBIT pipeline
./redis -h r-**********0d7f.redis.zhangbei.rds.aliyuncs.com \
-a user:password \
-d 30 \
-r 10000000 \
-command "TR.SETBIT foo-__RAND__ 1" \
-batching 10 \
-p 20 \
-c 1
| Concurrent connections | Batch size | Bits/s | Avg latency (ms) |
|---|---|---|---|
| 20 | 10 | 460,000 | 0.42 |
| 10 | 50 | 665,000 | 0.72 |
| 6 | 100 | 660,000 | 0.85 |
| 3 | 200 | 680,000 | 0.79 |
| 3 | 500 | 681,500 | 1.96 |
| 2 | 100 | 658,000 | 2.60 |

TR.GETBIT pipeline
./redis -h r-**********0d7f.redis.zhangbei.rds.aliyuncs.com \
-a user:password \
-d 30 \
-r 100000 \
-r2 10000000 \
-command "TR.GETBIT foo-__RAND__ __RAND2__" \
-batching 10 \
-p 20 \
-c 1
If the key foo-__RAND__ does not exist, it is treated as an empty key. Populate keys before running this test.
| Concurrent connections | Batch size | Bits/s | Avg latency (ms) |
|---|---|---|---|
| 20 | 10 | 572,700 | 0.34 |
| 10 | 50 | 725,900 | 0.65 |
| 7 | 100 | 772,000 | 0.85 |
| 7 | 200 | 788,800 | 1.67 |
| 5 | 500 | 746,000 | 3.10 |
| 2 | 100 | 770,000 | 2.10 |

Memory usage
Run TR.STAT foo JSON to inspect the container distribution for a key. TairRoaring stores data in three container types — array containers, bitset containers, and RLE (run-length encoding) containers — and selects the most space-efficient type based on data distribution.
Array containers are automatically converted to bitset containers when they exceed 4,096 elements.
Sparse data (bit density < 6.25%)
Sparse keys primarily use array containers. Memory scales linearly with cardinality, at 2 bytes per set bit: heap memory = cardinality × 2 bytes.
| Cardinality | RLE containers | Array containers | Bitset containers | Heap memory (bytes) |
|---|---|---|---|---|
| 37,700,484 | — | 65,536 | — | 75,400,968 |
| 75,011,384 | — | 65,536 | — | 150,022,768 |
| 100,403,264 | — | 65,536 | — | 200,806,528 |
| 163,090,592 | — | 65,536 | — | 326,181,184 |
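The linear relationship can be checked against the measured rows with a few lines of Python. The function name sparse_heap_bytes is illustrative. The density check assumes a 32-bit offset space (65,536 containers of 65,536 bits each), which is the standard Roaring layout and not something the table itself states.

```python
BYTES_PER_ARRAY_ELEMENT = 2   # from the formula: cardinality x 2 bytes
OFFSET_SPACE = 2 ** 32        # assumption: 32-bit offsets


def sparse_heap_bytes(cardinality: int) -> int:
    """Estimate heap memory for a key stored only in array containers."""
    return cardinality * BYTES_PER_ARRAY_ELEMENT


# (cardinality, measured heap bytes) pairs copied from the table above.
measured = [
    (37_700_484, 75_400_968),
    (75_011_384, 150_022_768),
    (100_403_264, 200_806_528),
    (163_090_592, 326_181_184),
]

for cardinality, heap in measured:
    density = cardinality / OFFSET_SPACE
    print(cardinality, sparse_heap_bytes(cardinality) == heap, f"{density:.2%}")
```

Every row matches the estimate exactly, and under the 32-bit assumption each key stays below the 6.25% density bound.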
Random data
As cardinality increases, array containers convert to bitset containers. Heap memory stabilizes at 536,870,912 bytes when all containers are bitset containers.
| Cardinality | RLE containers | Array containers | Bitset containers | Heap memory (bytes) |
|---|---|---|---|---|
| 253,104,088 | — | 65,534 | 2 | 506,208,102 |
| 261,169,659 | — | 63,273 | 2,263 | 522,227,634 |
| 267,974,804 | — | 35,932 | 29,604 | 533,159,296 |
| 273,694,253 | — | 6,607 | 58,929 | 536,491,922 |
| 343,504,134 | — | 0 | 65,536 | 536,870,912 |
| 535,589,835 | — | 0 | 65,536 | 536,870,912 |
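The plateau follows from a simple per-container cost model, sketched below in Python. The model is an approximation under stated assumptions: each container covers 65,536 bits (standard Roaring layout), an array container costs 2 bytes per element, a bitset container costs a fixed 8,192 bytes, and RLE containers and index overhead are ignored. The names container_bytes and heap_bytes are illustrative.

```python
ARRAY_MAX_ELEMENTS = 4096                 # from the text: larger arrays become bitsets
BITS_PER_CONTAINER = 65_536               # assumption: 2**16 bits per container
BITSET_BYTES = BITS_PER_CONTAINER // 8    # 8,192 bytes, independent of cardinality


def container_bytes(elements: int) -> int:
    """Approximate payload size of one container holding `elements` set bits."""
    if elements <= ARRAY_MAX_ELEMENTS:
        return elements * 2               # array container: 2 bytes per element
    return BITSET_BYTES                   # bitset container: fixed size


def heap_bytes(per_container_elements) -> int:
    """Sum the payload of all containers in a key."""
    return sum(container_bytes(n) for n in per_container_elements)


# 65,536 containers, each past the array threshold, so all are bitsets.
saturated = heap_bytes([5000] * 65_536)
print(saturated)
```

At exactly 4,096 elements, an array container costs the same 8,192 bytes as a bitset, which is why that value works as the conversion threshold. Once every container is a bitset, adding more bits no longer changes the total, which matches the last two rows of the table.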
Consecutive bits
Memory usage for consecutive bit distributions is closely tied to the specific bit pattern and has high variance. These results are not representative for general planning.
Multi-key tests
Write and read commands
TR.SETBITS writes 100 random bits per call. Each test runs 20 concurrent connections across 100,000 distinct keys.
Sample command:
./redis -h r-**********0d7f.redis.zhangbei.rds.aliyuncs.com \
-a user:password \
-d 30 \
-r 10000000 \
-r2 100000 \
-command "TR.SETBITS foo-__RAND2__ __RAND__ __RAND__ ... (100 times)" \
-p 20
| Command | Records per request | QPS | Avg latency (ms) |
|---|---|---|---|
| TR.SETBIT | 1 | 91,003 | 0.43 |
| TR.SETBITS | 100 | 15,127 | 0.96 |
| TR.GETBIT | 1 | 154,941 | 0.32 |
| TR.GETBITS | 100 | 40,166 | 1.08 |
| TR.SCAN | 100 | 104,637 | 0.47 |
| TR.RANK | — | 151,161 | 0.32 |

Bitwise operations
TR.BITOP
TR.BITOP performs a bitwise operation on multiple keys and stores the result in a new key. Supported operations: AND, OR, NOT, DIFF, and XOR.
Sample command:
./redis -h r-**********0d7f.redis.zhangbei.rds.aliyuncs.com \
-a user:password \
-d 30 \
-r 100000 \
-r2 10000000 \
-command "TR.BITOP dest-__RAND__ AND foo-__RAND__ foo-__RAND__" \
-p 3 \
-c 1
| Operation | Concurrent connections | QPS | Avg latency (ms) |
|---|---|---|---|
| AND | 3 | 940 | 3.18 |
| OR | 2 | 595 | 3.45 |
| XOR | 2 | 551 | 3.61 |
| DIFF | 3 | 3,577 | 0.83 |
| NOT | 1 | 1,281 | 0.77 |

TR.BITOPCARD
TR.BITOPCARD performs a bitwise operation on multiple keys and returns the count of bits set to 1, without writing a result key. Supported operations: AND, OR, NOT, DIFF, and XOR.
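The difference between TR.BITOP and TR.BITOPCARD can be pictured with Python sets, where each set holds the offsets of the bits set to 1. This is only a model of the semantics, not of TairRoaring's implementation. The function names are illustrative, DIFF is assumed to mean "first key minus the others", and NOT is left out because the text does not describe its operand range.

```python
def bitop(operation: str, first: set[int], *rest: set[int]) -> set[int]:
    """Model TR.BITOP: combine bitmaps and return the resulting bitmap."""
    result = set(first)
    for other in rest:
        if operation == "AND":
            result &= other
        elif operation == "OR":
            result |= other
        elif operation == "XOR":
            result ^= other
        elif operation == "DIFF":
            result -= other   # assumption: first key minus each later key
        else:
            raise ValueError(f"unsupported operation: {operation}")
    return result


def bitopcard(operation: str, first: set[int], *rest: set[int]) -> int:
    """Model TR.BITOPCARD: return only the number of bits set in the result."""
    return len(bitop(operation, first, *rest))


a = {1, 2, 3}
b = {2, 3, 4}
print(bitop("AND", a, b), bitopcard("AND", a, b))
print(bitop("XOR", a, b), bitopcard("XOR", a, b))
```

In this model the cardinality variant still builds the intermediate set. The real command avoids writing a result key, which is one plausible reason its latency is lower for some operations in the tables.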
Sample command:
./redis -h r-**********0d7f.redis.zhangbei.rds.aliyuncs.com \
-a user:password \
-d 30 \
-r 100000 \
-r2 10000000 \
-command "TR.BITOPCARD AND foo-__RAND__ foo-__RAND__" \
-p 2 \
-c 1
| Operation | Concurrent connections | QPS | Avg latency (ms) |
|---|---|---|---|
| AND | 2 | 971 | 2.05 |
| OR | 2 | 609 | 3.27 |
| XOR | 2 | 572 | 3.48 |
| DIFF | 2 | 4,290 | 0.46 |
| NOT | 2 | 3,577 | 0.55 |

Cluster architecture benchmarks
These tests measure horizontal scaling across four DRAM-based cluster instances. Each test runs single-key commands with 20 concurrent connections. The commands tested are TR.GETBIT, TR.GETBITS, TR.SETBIT, and TR.SETBITS.
Sample command:
./redis -h r-**********0d7f.redis.zhangbei.rds.aliyuncs.com \
-a user:password \
-d 30 \
-r 100000 \
-r2 10000000 \
-command "TR.SETBIT foo-__RAND__ bar-__RAND2__" \
-p 20 \
-c 1
Results:
| Command | 2 shards (8 GB) | 4 shards (16 GB) | 8 shards (32 GB) | 16 shards (64 GB) |
|---|---|---|---|---|
| TR.GETBIT | 590,742 | 567,738 | 569,610 | 555,178 |
| TR.GETBITS | 53,900 | 91,991 | 172,969 | 229,214 |
| TR.SETBIT | 316,753 | 530,367 | 577,406 | 558,301 |
| TR.SETBITS | 31,917 | 57,843 | 116,614 | 160,891 |

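TR.GETBIT QPS is roughly flat across shard counts because single-bit reads already saturate per-shard capacity at 2 shards. TR.SETBIT QPS rises from 2 to 4 shards and then levels off in the same range. TR.GETBITS and TR.SETBITS keep scaling with shard count, reaching about 4.3 times and 5.0 times their 2-shard QPS at 16 shards (an 8-fold increase in shard count), because each multi-record request distributes work across shards through the proxy.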
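To quantify the scaling, the following Python sketch computes the speedup from 2 to 16 shards and the resulting scaling efficiency (speedup divided by the 8-fold growth in shard count) from the QPS values in the table. The variable and function names are illustrative.

```python
SHARDS = [2, 4, 8, 16]

# QPS per shard count, copied from the results table.
qps = {
    "TR.GETBIT": [590_742, 567_738, 569_610, 555_178],
    "TR.GETBITS": [53_900, 91_991, 172_969, 229_214],
    "TR.SETBIT": [316_753, 530_367, 577_406, 558_301],
    "TR.SETBITS": [31_917, 57_843, 116_614, 160_891],
}


def speedup(values: list[int]) -> float:
    """QPS at the largest shard count relative to the smallest."""
    return values[-1] / values[0]


shard_growth = SHARDS[-1] / SHARDS[0]   # 8.0
speedups = {command: speedup(values) for command, values in qps.items()}

for command, factor in speedups.items():
    print(f"{command}: {factor:.2f}x speedup, {factor / shard_growth:.0%} efficiency")
```

A perfectly linear result would show an 8x speedup and 100% efficiency. The multi-record commands reach a little over half of that, while the single-bit commands stay close to their ceiling regardless of shard count.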