Tair (Redis® OSS-Compatible): TairRoaring performance whitepaper

Last Updated: Mar 28, 2026

TairRoaring is Alibaba Cloud Tair's implementation of the Roaring bitmap data structure. It delivers high query throughput at low memory cost by combining three optimizations:

  • Two-level indexes and dynamic containers: balance performance against space complexity across a wide range of data distributions.

  • SIMD (single instruction, multiple data) instructions, vectorization, and popcount algorithms: improve computational efficiency in both time and space.

  • Tair's storage engine: provides the computing performance and stability that production workloads require.

For a full command reference, see TairRoaring.

Test environment

All benchmarks in this document run against a 16 GB Tair (Enterprise Edition) DRAM-based instance. Network latency between the client and instance is under 0.1 ms.

For cluster architecture tests, four DRAM-based cluster instances are used, each shard sized at 4 GB:

Instance    Shards
8 GB        2
16 GB       4
32 GB       8
64 GB       16

The client is deployed in the same zone as the instances and connects through proxy endpoints. All instances are configured with a maximum bandwidth of 2,048 Mbit/s to prevent network saturation during testing.

The QPS values throughout this document are reference values measured under specific test conditions. Actual performance depends on your instance specification, data distribution, key count, network environment, and command mix.

Test tool

Download: redis-benchmark.tar.gz

The tool is written in Go and follows the design of the redis-benchmark utility, so it is easy to use and modify. For each request it generates two independent random values, substituted for the __RAND__ and __RAND2__ placeholders, and it reports results as histograms.

Parameters:

Parameter    Default                   Description
-h           127.0.0.1                 Instance endpoint
-port        6379                      Instance port
-a           (none)                    Password in <user>:<password> format
-d           30                        Test duration (seconds)
-c           (none)                    Total number of requests; overrides -d when set
-p           4                         Concurrent connections
-r           100000000                 Range for __RAND__
-r2          100000000                 Range for __RAND2__
-command     TR.GETBIT foo-__RAND__    Command to run; use __RAND__ or __RAND2__ as a suffix of a key or argument to substitute random values
-batching    (none)                    Pipeline batch size; wraps commands in MULTI/EXEC
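
The placeholder substitution described in the table can be sketched as follows. This is a minimal Python illustration, not the tool's Go source: the function name `expand_placeholders` is hypothetical, and it assumes each placeholder receives one value per request, drawn uniformly from 0 to range - 1.

```python
import random


def expand_placeholders(template, rand_range, rand2_range, rng=random):
    """Substitute __RAND__ and __RAND2__ in a command template.

    Assumption: one independent value per placeholder per request,
    so repeated occurrences of the same placeholder share a value.
    """
    rand = rng.randrange(rand_range)    # 0 .. rand_range - 1
    rand2 = rng.randrange(rand2_range)  # 0 .. rand2_range - 1
    command = template.replace("__RAND2__", str(rand2))
    return command.replace("__RAND__", str(rand))


if __name__ == "__main__":
    # Mirrors -r 100000 -r2 10000000 with a GETBIT-style template.
    print(expand_placeholders("TR.GETBIT foo-__RAND__ __RAND2__", 100000, 10000000))
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible, which helps when comparing two configurations against the same key sequence.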

Example:

# Run TR.SETBIT with 20 concurrent connections for 30 seconds.
# Keys: foo-0 to foo-99999, bit offsets: 0 to 9999999.
./redis -h r-**********0d7f.redis.zhangbei.rds.aliyuncs.com \
  -a user:password \
  -d 30 \
  -r 100000 \
  -r2 10000000 \
  -command "TR.SETBIT foo-__RAND__ bar-__RAND2__" \
  -p 20 \
  -c 1

Standard architecture benchmarks

All tests in this section use a single 16 GB standard-architecture instance.

Single-key tests

Read commands

Each test runs 20 concurrent connections against a single pre-populated key.

Sample command:

./redis -h r-**********0d7f.redis.zhangbei.rds.aliyuncs.com \
  -a user:password \
  -d 30 \
  -r 100000 \
  -r2 10000000 \
  -command "TR.GETBIT foo-__RAND__ bar-__RAND2__" \
  -p 20 \
  -c 1

Results:

Command       Records per request    Parameter range    QPS        Avg latency (ms)
TR.GETBIT     1                      1–10000000         255,000    0.21
TR.GETBITS    100                    1–10000000         54,000     0.97
TR.RANK       1                      1–10000000         161,000    0.24
TR.RANGE      100                    0–100              129,000    0.46
TR.SCAN       100                    1–10000000         130,000    0.38

Write commands

Sample command:

./redis -h r-**********0d7f.redis.zhangbei.rds.aliyuncs.com \
  -a user:password \
  -d 30 \
  -r 100000 \
  -r2 10000000 \
  -command "TR.SETBIT foo-__RAND__ bar-__RAND2__" \
  -p 20 \
  -c 1

Results:

Command                 Records per request      Parameter range    QPS        Avg latency (ms)
TR.SETBIT               1                        1–10000000         145,000    0.37
TR.SETBITS              100                      1–10000000         22,000     0.71
TR.SETBITS (ordered)    100 (max offset 2^32)    1–6000             28,000     0.66
TR.APPENDBITARRAY       1000 (half bits = 1)     1–10000000         10,000     0.38
TR.SETRANGE             1000                     -                  130,000    0.30
TR.FLIPRANGE            -                        1–10000000         100,000    0.46

Pipeline tests

Pipeline mode groups multiple commands between MULTI and EXEC, increasing throughput at the cost of higher per-request latency.
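
To make the batching concrete, the sketch below encodes a batch in RESP, the Redis wire protocol, with MULTI first and EXEC last so the whole batch travels in one network write. It is an illustration only: the helper names are hypothetical, and the exact framing used by the benchmark tool is an assumption.

```python
def encode_command(*args):
    """Encode one command as a RESP array of bulk strings."""
    out = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = str(arg).encode()
        out.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(out)


def encode_batch(commands):
    """Wrap a list of commands in MULTI ... EXEC and concatenate them."""
    payload = [encode_command("MULTI")]
    payload += [encode_command(*command) for command in commands]
    payload.append(encode_command("EXEC"))
    return b"".join(payload)


if __name__ == "__main__":
    # A batch of 10 reads, comparable to -batching 10.
    batch = [("TR.GETBIT", "foo-1", offset) for offset in range(10)]
    print(len(encode_batch(batch)), "bytes in one write")
```

Because the server replies only after EXEC, each request waits for the whole batch, which is consistent with the latency growth seen at larger batch sizes.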

TR.SETBIT pipeline

./redis -h r-**********0d7f.redis.zhangbei.rds.aliyuncs.com \
  -a user:password \
  -d 30 \
  -r 10000000 \
  -command "TR.SETBIT foo-__RAND__ 1" \
  -batching 10 \
  -p 20 \
  -c 1

Results:

Concurrent connections    Batch size    Bits/s     Avg latency (ms)
20                        10            460,000    0.42
10                        50            665,000    0.72
6                         100           660,000    0.85
3                         200           680,000    0.79
3                         500           681,500    1.96
2                         100           658,000    2.60

TR.GETBIT pipeline

./redis -h r-**********0d7f.redis.zhangbei.rds.aliyuncs.com \
  -a user:password \
  -d 30 \
  -r 100000 \
  -r2 10000000 \
  -command "TR.GETBIT foo-__RAND__ __RAND2__" \
  -batching 10 \
  -p 20 \
  -c 1
If the key foo-__RAND__ does not exist, it is treated as an empty key. Populate keys before running this test.

Results:

Concurrent connections    Batch size    Bits/s     Avg latency (ms)
20                        10            572,700    0.34
10                        50            725,900    0.65
7                         100           772,000    0.85
7                         200           788,800    1.67
5                         500           746,000    3.10
2                         100           770,000    2.10

Memory usage

Run TR.STAT foo JSON to inspect the container distribution for a key. TairRoaring stores data in three container types — array containers, bitset containers, and RLE (run-length encoding) containers — and selects the most space-efficient type based on data distribution.

Array containers are automatically converted to bitset containers when they exceed 4,096 elements.

Sparse data (bit density < 6.25%)

Sparse keys primarily use array containers. Memory scales linearly with cardinality: heap memory = cardinality × 2 bytes.
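
The 6.25% threshold and the 2-bytes-per-element rule can be checked with a few lines of arithmetic. The sketch below is illustrative: the function names are hypothetical, and it assumes each container spans 2^16 offsets, as in the standard Roaring layout.

```python
ARRAY_LIMIT = 4096        # elements before an array container converts (from the text)
CONTAINER_SPAN = 1 << 16  # assumption: each container covers 65,536 offsets
BYTES_PER_ELEMENT = 2     # from the text: cardinality x 2 bytes


def density_threshold():
    """Bit density at which an array container reaches its element limit."""
    return ARRAY_LIMIT / CONTAINER_SPAN


def sparse_heap_bytes(cardinality):
    """Heap estimate when every element sits in an array container."""
    return cardinality * BYTES_PER_ELEMENT


if __name__ == "__main__":
    print(f"threshold: {density_threshold():.2%}")
    for cardinality in (37_700_484, 75_011_384, 100_403_264, 163_090_592):
        print(cardinality, sparse_heap_bytes(cardinality))
```

The four cardinalities are the ones listed in the sparse-data table, so the printed estimates can be compared directly with its heap memory column.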

Cardinality    RLE containers    Array containers    Bitset containers    Heap memory (bytes)
37,700,484     0                 65,536              0                    75,400,968
75,011,384     0                 65,536              0                    150,022,768
100,403,264    0                 65,536              0                    200,806,528
163,090,592    0                 65,536              0                    326,181,184

Random data

As cardinality increases, array containers convert to bitset containers. Heap memory stabilizes at 536,870,912 bytes (65,536 containers × 8,192 bytes) once all containers are bitset containers.

Cardinality    RLE containers    Array containers    Bitset containers    Heap memory (bytes)
253,104,088    0                 65,534              2                    506,208,102
261,169,659    0                 63,273              2,263                522,227,634
267,974,804    0                 35,932              29,604               533,159,296
273,694,253    0                 6,607               58,929               536,491,922
343,504,134    0                 0                   65,536               536,870,912
535,589,835    0                 0                   65,536               536,870,912
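
The plateau in the table follows from a simple per-container cost model, sketched below. This is a simplified estimate, not Tair's allocator: the names are hypothetical, RLE containers are ignored, and the 8,192-byte bitset size is inferred from 536,870,912 / 65,536.

```python
ARRAY_LIMIT = 4096   # an array container converts above this many elements
BITSET_BYTES = 8192  # inferred: 536,870,912 bytes / 65,536 containers
MAX_CONTAINERS = 65536


def container_bytes(cardinality):
    """Estimated size of one container holding `cardinality` elements."""
    if cardinality == 0:
        return 0
    if cardinality > ARRAY_LIMIT:
        return BITSET_BYTES          # fixed cost, independent of cardinality
    return 2 * cardinality           # array container: 2 bytes per element


def heap_bytes(cardinalities):
    """Sum the estimated size over all containers of a key."""
    return sum(container_bytes(c) for c in cardinalities)


if __name__ == "__main__":
    # Every container above the limit: memory stops growing with cardinality.
    print(heap_bytes([5000] * MAX_CONTAINERS))
    print(heap_bytes([60000] * MAX_CONTAINERS))
```

Both calls print the same total, which is why the last two rows of the table report identical heap memory despite very different cardinalities.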

Consecutive bits

Memory usage for consecutive-bit distributions depends heavily on the specific bit pattern and varies widely, so results for this case are not representative for general capacity planning.

Multi-key tests

Write and read commands

TR.SETBITS writes 100 random bits per call. Each test runs 20 concurrent connections across 100,000 distinct keys.
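
A request of this shape can be generated as sketched below. The helper `build_setbits` is hypothetical, and drawing a separate random offset for each of the 100 positions is an assumption about how the workload is produced.

```python
import random


def build_setbits(key_range, offset_range, count=100, rng=random):
    """Build one TR.SETBITS command with `count` random bit offsets."""
    key = f"foo-{rng.randrange(key_range)}"
    # Assumption: each position gets its own random offset; repeats are possible.
    offsets = [str(rng.randrange(offset_range)) for _ in range(count)]
    return " ".join(["TR.SETBITS", key, *offsets])


if __name__ == "__main__":
    # 100,000 distinct keys and offsets below 10,000,000, as in the test setup.
    command = build_setbits(100000, 10000000)
    print(command[:60], "...")
```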

Sample command:

./redis -h r-**********0d7f.redis.zhangbei.rds.aliyuncs.com \
  -a user:password \
  -d 30 \
  -r 10000000 \
  -r2 100000 \
  -command "TR.SETBITS foo-__RAND2__ __RAND__ __RAND__ ... (100 times)" \
  -p 20

Results:

Command       Records per request    QPS        Avg latency (ms)
TR.SETBIT     1                      91,003     0.43
TR.SETBITS    100                    15,127     0.96
TR.GETBIT     1                      154,941    0.32
TR.GETBITS    100                    40,166     1.08
TR.SCAN       100                    104,637    0.47
TR.RANK       1                      51,161     0.32

Bitwise operations

TR.BITOP

TR.BITOP performs a bitwise operation on multiple keys and stores the result in a new key. Supported operations: AND, OR, NOT, DIFF, and XOR.
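
The semantics of the five operations can be modeled on Python sets of bit offsets. This is a conceptual sketch, not the server's implementation: the function `bitop` is hypothetical, DIFF is assumed to mean "first operand minus the rest", and NOT is assumed to flip a single bitmap within a caller-supplied universe of offsets.

```python
from functools import reduce


def bitop(op, *bitmaps, universe=None):
    """Combine sets of bit offsets the way a bitwise operation combines bitmaps."""
    if op == "NOT":
        # Assumption: NOT takes exactly one bitmap and a known offset universe.
        (only,) = bitmaps
        return set(universe) - only
    combine = {
        "AND": lambda a, b: a & b,
        "OR": lambda a, b: a | b,
        "XOR": lambda a, b: a ^ b,
        "DIFF": lambda a, b: a - b,
    }[op]
    return reduce(combine, bitmaps)


if __name__ == "__main__":
    a, b = {1, 2, 3}, {2, 3, 4}
    for op in ("AND", "OR", "XOR", "DIFF"):
        print(op, sorted(bitop(op, a, b)))
    print("NOT", sorted(bitop("NOT", a, universe=range(6))))
```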

Sample command:

./redis -h r-**********0d7f.redis.zhangbei.rds.aliyuncs.com \
  -a user:password \
  -d 30 \
  -r 100000 \
  -r2 10000000 \
  -command "TR.BITOP dest-__RAND__ AND foo-__RAND__ foo-__RAND__" \
  -p 3 \
  -c 1

Results:

Operation    Concurrent connections    QPS      Avg latency (ms)
AND          3                         940      3.18
OR           2                         595      3.45
XOR          2                         551      3.61
DIFF         3                         3,577    0.83
NOT          1                         1,281    0.77

TR.BITOPCARD

TR.BITOPCARD performs a bitwise operation on multiple keys and returns the count of bits set to 1, without writing a result key. Supported operations: AND, OR, NOT, DIFF, and XOR.
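
The idea of counting result bits without storing the result can be illustrated with Python integers used as bitmaps and a population count. The helpers below are hypothetical and only model the concept; NOT is omitted because it needs a bitmap width, which the text does not specify.

```python
def to_bitmap(offsets):
    """Pack an iterable of bit offsets into a Python int."""
    bitmap = 0
    for offset in offsets:
        bitmap |= 1 << offset
    return bitmap


def bitopcard(op, a, b):
    """Return the number of set bits in (a op b); nothing is written back."""
    results = {
        "AND": a & b,
        "OR": a | b,
        "XOR": a ^ b,
        "DIFF": a & ~b,  # assumption: bits set in a but not in b
    }
    return bin(results[op]).count("1")  # population count


if __name__ == "__main__":
    a = to_bitmap([1, 2, 3])
    b = to_bitmap([2, 3, 4])
    for op in ("AND", "OR", "XOR", "DIFF"):
        print(op, bitopcard(op, a, b))
```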

Sample command:

./redis -h r-**********0d7f.redis.zhangbei.rds.aliyuncs.com \
  -a user:password \
  -d 30 \
  -r 100000 \
  -r2 10000000 \
  -command "TR.BITOPCARD AND foo-__RAND__ foo-__RAND__" \
  -p 2 \
  -c 1

Results:

Operation    Concurrent connections    QPS      Avg latency (ms)
AND          2                         971      2.05
OR           2                         609      3.27
XOR          2                         572      3.48
DIFF         2                         4,290    0.46
NOT          2                         3,577    0.55

Cluster architecture benchmarks

These tests measure horizontal scaling across four DRAM-based cluster instances. Each test runs single-key commands with 20 concurrent connections. The commands tested are TR.GETBIT, TR.GETBITS, TR.SETBIT, and TR.SETBITS.

Sample command:

./redis -h r-**********0d7f.redis.zhangbei.rds.aliyuncs.com \
  -a user:password \
  -d 30 \
  -r 100000 \
  -r2 10000000 \
  -command "TR.SETBIT foo-__RAND__ bar-__RAND2__" \
  -p 20 \
  -c 1

Results:

Command       2 shards (8 GB)    4 shards (16 GB)    8 shards (32 GB)    16 shards (64 GB)
TR.GETBIT     590,742            567,738             569,610             555,178
TR.GETBITS    53,900             91,991              172,969             229,214
TR.SETBIT     316,753            530,367             577,406             558,301
TR.SETBITS    31,917             57,843              116,614             160,891

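TR.GETBIT QPS is roughly flat across shard counts (about 555,000 to 591,000) because single-bit reads already saturate per-shard capacity at 2 shards. TR.SETBIT rises from 2 to 4 shards and then levels off in the same range. TR.GETBITS and TR.SETBITS keep scaling with shard count, although less than linearly: going from 2 to 16 shards (8 times the shards) raises throughput by about 4.3× and 5.0× respectively, because each multi-record request distributes work across shards through the proxy.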
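
The scaling behavior can be quantified directly from the results table. The sketch below computes the speedup from 2 to 16 shards and the corresponding scaling efficiency (speedup divided by the 8× growth in shards); the function names are illustrative and the data is copied from the table.

```python
SHARDS = [2, 4, 8, 16]
QPS = {
    "TR.GETBIT": [590742, 567738, 569610, 555178],
    "TR.GETBITS": [53900, 91991, 172969, 229214],
    "TR.SETBIT": [316753, 530367, 577406, 558301],
    "TR.SETBITS": [31917, 57843, 116614, 160891],
}


def speedup(command):
    """QPS at 16 shards relative to QPS at 2 shards."""
    values = QPS[command]
    return values[-1] / values[0]


def efficiency(command):
    """Fraction of ideal linear scaling achieved (1.0 means perfectly linear)."""
    return speedup(command) / (SHARDS[-1] / SHARDS[0])


if __name__ == "__main__":
    for command in QPS:
        print(f"{command:<11} speedup {speedup(command):.2f}x  "
              f"efficiency {efficiency(command):.0%}")
```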