
Realtime Compute for Apache Flink: State backend

Last Updated: Mar 26, 2026

GeminiStateBackend is a key-value (KV) storage engine built for stream processing, and the default state backend for Realtime Compute for Apache Flink. This topic describes its core design and compares its performance with RocksDBStateBackend.

When to use GeminiStateBackend

As the default backend, GeminiStateBackend is particularly well suited for the following scenarios:

| Scenario | Why GeminiStateBackend helps |
| --- | --- |
| Jobs with large state: state data exceeds or risks exceeding local disk capacity | Decouples storage and compute so state operates independently of local disks |
| Dual-stream or multi-stream joins: low join success rates or large state values | KV separation significantly improves throughput in state-intensive join workloads |
| Jobs sensitive to checkpoint duration: slow or unstable checkpoints in large-state jobs | Decouples checkpoints from the LSM compaction mechanism to make them faster and more predictable |
| Operators with varied access patterns: different operators in the same job need different tuning | Adaptive parameter tuning eliminates manual configuration across all operators |

How it works

Stream processing places two demands on state storage:

  • High random-access volume, few range queries — state lookups are mostly point reads, not scans.

  • Dynamic traffic and hot spots — access patterns shift frequently, and different concurrent instances of the same operator can behave differently.

GeminiStateBackend addresses these demands with an architecture built on a Log-Structured Merge-tree (LSM tree). It combines three mechanisms:

  • Adaptive adjustments based on data scale and access patterns

  • Tiered storage for hot and cold data

  • Flexible switching between anti-caching and caching architectures

A hash-based storage structure handles random queries on top of this foundation.
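
To make the LSM idea concrete, the following is a minimal sketch of a generic LSM-style store, not GeminiStateBackend's actual implementation. The class name `ToyLSM` and its parameters are hypothetical, and plain dictionaries stand in for on-disk files so that point reads are hash lookups. Writes go to a small in-memory table that is frozen into an immutable run when full. Reads check the newest data first. Compaction merges runs and keeps the newest value per key.

```python
class ToyLSM:
    """Toy LSM-style store (illustrative only; names are hypothetical)."""

    def __init__(self, memtable_limit=2):
        self.memtable_limit = memtable_limit
        self.memtable = {}  # newest writes, hash-indexed for point reads
        self.runs = []      # flushed immutable runs, newest first

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            # Flush: freeze the memtable into an immutable run.
            self.runs.insert(0, dict(self.memtable))
            self.memtable = {}

    def get(self, key):
        # Point read: check the newest data first, then older runs.
        if key in self.memtable:
            return self.memtable[key]
        for run in self.runs:
            if key in run:
                return run[key]
        return None

    def compact(self):
        # Merge all runs into one, keeping the newest value per key.
        merged = {}
        for run in reversed(self.runs):  # oldest first, newer runs overwrite
            merged.update(run)
        self.runs = [merged] if merged else []
```

A real engine keeps runs on disk and bounds how many runs a read must visit. The sketch only shows why point reads dominate the design: every `get` is a sequence of keyed lookups, never a scan.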

Key capabilities

Storage and compute decoupling

Problem: When local disk space is limited, jobs with large state fail from disk exhaustion. With RocksDBStateBackend, working around the disk limit typically means adding resources, such as increasing concurrency, purely to gain disk space.

Solution: GeminiStateBackend decouples state storage from local disks, so state size is no longer bounded by local disk capacity and jobs do not fail when state outgrows it.

For configuration details, see Storage and compute decoupling configuration.
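
The decoupling idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the product's implementation: `DecoupledStateStore` is a hypothetical name, a dictionary stands in for remote storage, and a write-through policy with least-recently-used eviction is assumed for the bounded local tier.

```python
from collections import OrderedDict


class DecoupledStateStore:
    """Bounded local cache in front of remote storage (hypothetical sketch)."""

    def __init__(self, local_capacity):
        self.local_capacity = local_capacity
        self.local = OrderedDict()  # stands in for the limited local disk
        self.remote = {}            # stands in for remote storage; holds all state

    def put(self, key, value):
        # Assumed write-through: remote storage is the source of truth.
        self.remote[key] = value
        self._cache(key, value)

    def get(self, key):
        if key in self.local:
            self.local.move_to_end(key)  # mark as recently used
            return self.local[key]
        value = self.remote.get(key)
        if value is not None:
            self._cache(key, value)      # pull the entry into the local tier
        return value

    def _cache(self, key, value):
        self.local[key] = value
        self.local.move_to_end(key)
        while len(self.local) > self.local_capacity:
            self.local.popitem(last=False)  # evict the least recently used entry
```

Because the local tier only caches, total state can exceed `local_capacity` without any write failing. The cost is a remote read on a cache miss.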

Adaptive KV separation

Problem: Dual-stream and multi-stream joins are among the most state-intensive workloads in stream processing. When join success rates are low or state values are large, state storage becomes the bottleneck.

Solution: GeminiStateBackend introduces KV separation to address this. The feature is fully adaptive and requires no extra configuration or tuning. Results verified by Alibaba Group's core services during the Double 11 shopping festival:

  • Job throughput capacity increased by 50%–70%

  • Average compute resource utilization increased by 50%

  • In the most-improved scenarios, utilization increased by 100%–200%

For configuration details, see KV separation configuration.
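
This topic does not describe how GeminiStateBackend implements KV separation internally, so the following is only a sketch of the general technique. All names are hypothetical. Large values are appended to a separate log, and the index keeps only a small pointer per key, so reorganizing the index does not move the values.

```python
class KVSeparatedStore:
    """General KV-separation sketch (illustrative; not the Gemini implementation)."""

    def __init__(self):
        self.index = {}                # key -> (offset, length); small entries
        self.value_log = bytearray()   # append-only log holding the values

    def put(self, key, value: bytes):
        offset = len(self.value_log)
        self.value_log += value        # values are only ever appended
        self.index[key] = (offset, len(value))

    def get(self, key):
        entry = self.index.get(key)
        if entry is None:
            return None
        offset, length = entry
        return bytes(self.value_log[offset:offset + length])
```

Overwriting a key leaves the old value in the log as garbage until a separate cleanup pass reclaims it. The sketch omits that pass.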

Lightweight job snapshots

Problem: LSM compaction can interfere with checkpoint and snapshot completion, making them slower and less stable for large-state jobs.

Solution: GeminiStateBackend supports finer-grained job snapshots and decouples checkpoints from the LSM compaction mechanism, which makes checkpoints faster and more stable. It also supports native incremental savepoints. Combined with the native snapshots provided by Realtime Compute for Apache Flink, savepoint performance approaches that of checkpoints, which greatly improves snapshot availability for large-state jobs.
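
As a rough illustration of what "incremental" means for a snapshot, the sketch below models state as a set of immutable files and uploads only the files that durable storage does not yet hold. It is a generic model with hypothetical names, not the GeminiStateBackend snapshot format.

```python
class IncrementalSnapshotter:
    """Generic incremental-snapshot model (hypothetical, for illustration)."""

    def __init__(self):
        self.uploaded = set()  # files already present in durable storage

    def checkpoint(self, live_files):
        # Upload only files that were not part of an earlier checkpoint.
        new_files = sorted(set(live_files) - self.uploaded)
        self.uploaded |= set(new_files)
        # The manifest lists every file needed to restore this checkpoint.
        return {"uploaded_now": new_files, "manifest": sorted(live_files)}
```

In this model, any file rewritten between two checkpoints counts as new and must be uploaded again, so the amount of file rewriting directly affects checkpoint cost.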

Adaptive parameter tuning

Problem: Different operators within the same job often have different state access patterns and require different parameter combinations for optimal performance. Manual tuning across all operators is impractical at scale.

Solution: GeminiStateBackend automatically adjusts parameters at runtime based on current access patterns and traffic. Results verified by Alibaba Group's core services during the Double 11 shopping festival:

  • Eliminates manual tuning in over 95% of cases

  • Increases single-core throughput capacity by 10%–40%

For configuration details, see Adaptive parameter tuning configuration.

Nexmark performance comparison

The following results are from Nexmark state-bottlenecked use cases, tested on identical hardware. Performance is measured as single-core throughput capacity (TPS/Core).

Note: The Nexmark website is a third-party site. Access may be slow or unavailable.

| Case | Gemini TPS/Core | RocksDB TPS/Core | Improvement |
| --- | --- | --- | --- |
| q4 | 83.63 K/s | 53.26 K/s | 57.02% |
| q5 | 84.52 K/s | 57.86 K/s | 46.08% |
| q8 | 468.96 K/s | 361.37 K/s | 29.77% |
| q9 | 59.42 K/s | 26.56 K/s | 123.72% |
| q11 | 93.08 K/s | 48.82 K/s | 90.66% |
| q18 | 150.93 K/s | 87.37 K/s | 72.75% |
| q19 | 143.46 K/s | 58.5 K/s | 145.23% |
| q20 | 75.69 K/s | 22.44 K/s | 237.30% |

In five of the eight cases (q9, q11, q18, q19, and q20), GeminiStateBackend outperforms RocksDBStateBackend by over 70%.
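
The Improvement column appears to be the relative gain in single-core throughput, that is, Gemini divided by RocksDB, minus one. The short script below recomputes it from the published figures as a check. The helper name `improvement_pct` is introduced here for illustration only.

```python
# (case, Gemini K/s per core, RocksDB K/s per core), copied from the results table
RESULTS = [
    ("q4", 83.63, 53.26),
    ("q5", 84.52, 57.86),
    ("q8", 468.96, 361.37),
    ("q9", 59.42, 26.56),
    ("q11", 93.08, 48.82),
    ("q18", 150.93, 87.37),
    ("q19", 143.46, 58.5),
    ("q20", 75.69, 22.44),
]


def improvement_pct(gemini, rocksdb):
    """Relative throughput gain of Gemini over RocksDB, in percent."""
    return (gemini / rocksdb - 1) * 100


# Cases where the gain exceeds 70%.
over_70 = [case for case, g, r in RESULTS if improvement_pct(g, r) > 70]

for case, g, r in RESULTS:
    print(f"{case}: {improvement_pct(g, r):.2f}%")
```

Under this formula, the recomputed values match the table, and `over_70` contains five of the eight cases.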

What's next