TairBloom is a space-efficient probabilistic data structure for membership testing at scale. A negative result guarantees the element is absent. A positive result means the element is probably present — a small fraction of positive results are false positives. This trade-off lets TairBloom perform membership checks across millions of records using a fraction of the memory that a hash set requires.
TairBloom implements Scalable Bloom Filters (SBFs), which stack multiple Bloom filter layers to grow dynamically while keeping the false positive rate stable.
Key concepts
Bloom filter
A Bloom filter is a space-efficient probabilistic data structure, conceived by Burton Howard Bloom in 1970, used to test whether an element is a member of a set. It is a bit array of _m_ bits (initially all 0) paired with _k_ hash functions. Adding an element sets _k_ bits to 1. Querying an element checks those _k_ bits — if all are 1, the element is probably present; if any is 0, the element is definitely absent. Because bits are shared across elements, false positives are possible. False negatives are not. Elements cannot be removed once added.
Scalable Bloom Filter (SBF)
As more elements are added to a fixed Bloom filter, the false positive rate rises. An SBF avoids this by stacking layers: when the current layer (for example, BF0) fills to the point where it can no longer maintain the target false positive rate, a new layer (BF1) is created automatically. New elements are always written to the latest layer. Queries traverse from the latest layer back to BF0.
TairBloom is built on SBFs. Each new layer has double the capacity of the previous one but occupies four times the memory.
Use cases
TairBloom fits any system that needs to skip an expensive lookup when an item is definitely new. The probabilistic false positive is acceptable because the cost of occasionally re-processing a known item is lower than the cost of always querying the database.
TairBloom can be used for recommendation and crawler systems in industries such as live-streaming, music, and e-commerce.
Recommendation system
Problem: How do you avoid recommending the same article to the same user twice, without querying a database on every recommendation request?
Implementation: Store each recommended article ID per user in a TairBloom key. Before sending a recommendation, call
BF.EXISTS. If the result is 0, the article has never been recommended — send it and record it withBF.ADD. If the result is 1, skip to the next candidate.Benefit: A single in-memory check replaces a database round-trip for the vast majority of requests. Occasional false positives (skipping an article that was not actually recommended) are acceptable.
void recommendedSystem(userid) {
while (true) {
// Get a candidate article ID.
docid = getDocByRandom()
if (bf.exists(userid, docid)) {
// Probably already recommended — skip.
continue;
} else {
// Definitely not recommended — send it.
sendRecommendMsg(docid);
// Record the recommendation.
bf.add(userid, docid);
break;
}
}
}Crawler system
Problem: How do you prevent re-crawling URLs that have already been processed?
Implementation: Store all crawled URLs in a TairBloom key. Before downloading a URL, call
BF.EXISTS. If the result is 0, crawl it and add it to the filter. If the result is 1, skip it.Benefit: Cuts duplicate work without maintaining a full URL database in memory.
bool crawlerSystem() {
while (true) {
url = getURLFromQueue()
if (bf.exists(url_bloom, url)) {
// Probably already crawled — skip.
continue;
} else {
doDownload(url)
bf.add(url_bloom, url);
}
}
}For more examples, see Use Bloom filters to manage game event push notifications.
Prerequisites
Before you begin, ensure that you have:
A Tair DRAM-based instance
The latest minor version provides more features and higher stability. Update your instance to the latest minor version. For more information, see Update the minor version of an instance. For cluster or read/write splitting instances, also update the proxy nodes to the latest minor version to ensure all commands run as expected.
Usage notes
Choose the right creation command
The most important decision before writing data is whether to use BF.ADD or BF.RESERVE. Getting the initial capacity right avoids unnecessary scale-ups.
BF.ADD / BF.MADD | BF.RESERVE / BF.INSERT | |
|---|---|---|
| Initial capacity | 100 (default) | Specified by you |
| False positive rate | 0.01 (default) | Specified by you |
| When to use | Prototyping or small datasets (fewer than 100 elements) | Any production use case where you know the approximate dataset size |
If you undersize the initial capacity, TairBloom adds new layers as the filter fills up. This is not an error — it is how SBFs work. However, each additional layer increases query latency because queries must traverse more layers. If you oversize, you reserve more memory than needed.
For 10,000,000 elements at a false positive rate of 0.01:
BF.ADD(default capacity 100, auto-scaling): 176 MBBF.RESERVE(capacity set to 10,000,000 upfront): 16 MB
Use the following table to estimate the memory required for a given capacity and target false positive rate:
| Capacity | False positive: 0.01 | False positive: 0.001 | False positive: 0.0001 |
|---|---|---|---|
| 100,000 | 0.12 MB | 0.25 MB | 0.25 MB |
| 1,000,000 | 2 MB | 2 MB | 4 MB |
| 10,000,000 | 16 MB | 32 MB | 32 MB |
| 100,000,000 | 128 MB | 256 MB | 256 MB |
| 1,000,000,000 | 2 GB | 2 GB | 4 GB |
Ultra-high capacity with ultra-high precision may fail to be created if the instance does not have enough available memory.
Monitor for scale-up
Run BF.INFO to check whether a key is approaching a scale-up. When the items count of the latest layer equals its capacity, a scale-up is imminent.
Treat scale-up as a safety mechanism, not normal operating behavior. Each scale-up doubles capacity but quadruples the memory of the new layer. After an unexpected scale-up, rebuild the key promptly using DEL and BF.RESERVE to restore single-layer performance and reduce the risk of future scale-ups. Make sure the instance has enough free memory before writing to a scaled-up key — insufficient memory can trigger prolonged data eviction and block requests.
Manage key size
Elements can only be added to a key, never removed. The key size grows over time and cannot shrink. To prevent out of memory (OOM) errors:
Split data across multiple keys. Splitting prevents large keys from concentrating traffic on a single node. In cluster instances, distribute keys across nodes to avoid hot keys.
Rebuild keys periodically. Delete the key with
DEL, then repopulate it from your backend database. Alternatively, maintain two keys and rotate between them — this keeps each key bounded in size at the cost of additional memory.
Supported commands
| Command | Syntax | Description |
|---|---|---|
| BF.RESERVE | BF.RESERVE key error_rate capacity | Creates an empty TairBloom key with the specified capacity and false positive rate |
| BF.ADD | BF.ADD key item | Adds an element to a TairBloom key |
| BF.MADD | BF.MADD key item [item ...] | Adds multiple elements to a TairBloom key |
| BF.EXISTS | BF.EXISTS key item | Checks whether an element exists in a TairBloom key |
| BF.MEXISTS | BF.MEXISTS key item [item ...] | Checks whether multiple elements exist in a TairBloom key |
| BF.INSERT | BF.INSERT key [CAPACITY cap] [ERROR error] [NOCREATE] ITEMS item [item ...] | Adds multiple elements to a TairBloom key; optionally creates the key with a specified capacity and false positive rate |
| BF.INFO | BF.INFO key | Returns information about a TairBloom key, including layer count, items per layer, and false positive rate |
| DEL | DEL key [key ...] | Deletes one or more TairBloom keys |
Elements already added to a key cannot be deleted individually. DEL removes the entire key.Command syntax conventions
UPPERCASE: command keyword_italic_: variable[options]: optional parameter; parameters not in brackets are requiredA|B: mutually exclusive parameters; only one can be specified...: the preceding parameter can be repeated
BF.RESERVE
Time complexity| Item | Description |
|---|---|
| Syntax | BF.RESERVE key error_rate capacity |
| O(1) | |
| Description | Creates an empty TairBloom key with the specified initial capacity and false positive rate |
Parameters
| Parameter | Description |
|---|---|
key | The key name |
error_rate | The target false positive rate. Must be between 0 and 1. Lower values improve accuracy but increase memory usage and CPU utilization. |
capacity | The initial capacity. Specifies the maximum number of elements before a new layer is added. Set this to your expected dataset size to minimize the number of layers and maximize query performance. |
Output
OKon successAn error message otherwise
Example
BF.RESERVE BFKEY 0.01 100OKBF.ADD
| Item | Description |
|---|---|
| Syntax | BF.ADD key item |
| Time complexity | O(log N), where N is the number of Bloom filter layers |
| Description | Adds an element to a TairBloom key. If the key does not exist, it is created automatically with a default capacity of 100 and a false positive rate of 0.01. |
Parameters
| Parameter | Description |
|---|---|
key | The key name |
item | The element to add |
Output
| Return value | Meaning |
|---|---|
1 | Element added (did not exist in the filter) |
0 | Element not added (may already exist) |
| Error | Operation failed |
Example
BF.ADD BFKEY item1(integer) 1BF.MADD
| Item | Description |
|---|---|
| Syntax | BF.MADD key item [item ...] |
| Time complexity | O(log N), where N is the number of Bloom filter layers |
| Description | Adds multiple elements to a TairBloom key. If the key does not exist, it is created automatically with a default capacity of 100 and a false positive rate of 0.01. |
Parameters
| Parameter | Description |
|---|---|
key | The key name |
item | One or more elements to add |
Output
Returns one value per element, in order:
| Return value | Meaning |
|---|---|
1 | Element added (did not exist in the filter) |
0 | Element not added (may already exist) |
| Error | Operation failed |
Example
BF.MADD BFKEY item1 item2 item3(integer) 1
(integer) 1
(integer) 1BF.EXISTS
| Item | Description |
|---|---|
| Syntax | BF.EXISTS key item |
| Time complexity | O(log N), where N is the number of Bloom filter layers |
| Description | Checks whether an element exists in a TairBloom key |
Parameters
| Parameter | Description |
|---|---|
key | The key name |
item | The element to check |
Output
| Return value | Meaning |
|---|---|
0 | Element is definitely absent |
1 | Element is probably present (may be a false positive) |
| Error | Operation failed |
Example
BF.EXISTS BFKEY item1(integer) 1BF.MEXISTS
| Item | Description |
|---|---|
| Syntax | BF.MEXISTS key item [item ...] |
| Time complexity | O(log N), where N is the number of Bloom filter layers |
| Description | Checks whether multiple elements exist in a TairBloom key |
Parameters
| Parameter | Description |
|---|---|
key | The key name |
item | One or more elements to check |
Output
Returns one value per element, in order:
| Return value | Meaning |
|---|---|
0 | Element is definitely absent |
1 | Element is probably present (may be a false positive) |
| Error | Operation failed |
Example
BF.MEXISTS BFKEY item1 item5(integer) 1
(integer) 0BF.INSERT
| Item | Description |
|---|---|
| Syntax | BF.INSERT key [CAPACITY cap] [ERROR error] [NOCREATE] ITEMS item [item ...] |
| Time complexity | O(log N), where N is the number of Bloom filter layers |
| Description | Adds multiple elements to a TairBloom key. Optionally creates the key with a specified capacity and false positive rate, or prevents auto-creation. |
Parameters
| Parameter | Description |
|---|---|
key | The key name |
CAPACITY cap | The initial capacity. Ignored if the key already exists. If the number of elements exceeds this value, a new Bloom filter layer is added automatically. |
ERROR error | The target false positive rate. Must be between 0 and 1. |
NOCREATE | Prevents the key from being created if it does not exist. Cannot be used together with CAPACITY or ERROR. |
ITEMS item [item ...] | One or more elements to add |
Output
Returns one value per element, in order:
| Return value | Meaning |
|---|---|
1 | Element added (did not exist in the filter) |
0 | Element not added (may already exist) |
| Error | Operation failed |
Example
BF.INSERT bfkey1 CAPACITY 10000 ERROR 0.001 ITEMS item1 item2 item3(integer) 1
(integer) 1
(integer) 1BF.INFO
| Item | Description |
|---|---|
| Syntax | BF.INFO key |
| Time complexity | O(log N), where N is the number of Bloom filter layers |
| Description | Returns information about a TairBloom key, including the number of layers, items per layer, and false positive rate |
Parameters
| Parameter | Description |
|---|---|
key | The key name |
Output
Key information on success
An error message otherwise
Example
BF.INFO bk11) "total_items:6,num_blooms:2"
2) "bytes:4 bits:32 hashes:7 hashwidth:64 capacity:3 items:3 error_ratio:0.01"
3) "bytes:16 bits:128 hashes:9 hashwidth:64 capacity:10 items:3 error_ratio:0.0025"Response fields
The first line is a summary. Each subsequent line describes one Bloom filter layer.
| Field | Description |
|---|---|
total_items | Total number of elements across all layers |
num_blooms | Total number of Bloom filter layers |
bytes | Memory occupied by this layer, in bytes |
bits | Number of bits (bytes x 8) |
hashes | Number of hash functions |
hashwidth | Width of the hash functions |
capacity | Capacity of this layer |
items | Number of elements in this layer |
error_ratio | False positive rate of this layer |
Tip: Whenitemsequalscapacityin the latest layer, a scale-up is imminent.