
Realtime Compute for Apache Flink:Data latency and consistency

Last Updated: Mar 26, 2026

Apache Paimon (Paimon) uses snapshot-based reads and a two-phase commit protocol to deliver predictable data latency and well-defined consistency guarantees in Flink streaming pipelines.

How it works

Snapshots

A snapshot captures the state of a Paimon table at a specific point in time. All reads — whether in streaming or batch mode — are served from snapshots rather than from in-flight write buffers.

Snapshots serve two purposes depending on how you query the table:

  • Streaming mode: Adjust the consumer offset to start reading from a specific point in time.

  • Batch mode: Implement time travel to query the table as it existed at a past snapshot.

To query available snapshots and their creation times, see the Snapshots table.
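As a sketch in Flink SQL (the table name `orders` is hypothetical), you can list snapshots through Paimon's `$snapshots` system table, time-travel to one of them in batch mode, or start a streaming read from it:

```sql
-- List available snapshots and their commit times (batch mode).
SELECT snapshot_id, commit_time FROM orders$snapshots;

-- Batch mode: time travel to snapshot 5.
SELECT * FROM orders /*+ OPTIONS('scan.snapshot-id' = '5') */;

-- Streaming mode: start consuming from snapshot 5 onward.
SELECT * FROM orders
/*+ OPTIONS('scan.mode' = 'from-snapshot', 'scan.snapshot-id' = '5') */;
```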

By default, snapshots are retained for 1 hour before deletion. If the retention period is too short or consumption is too slow, a snapshot can be deleted while a consumer is still reading it, which causes an error. To prevent this, increase the snapshot retention period, specify a consumer ID to protect an active consumer, or optimize Paimon table performance so that consumption keeps pace with snapshot generation.
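For example (option names as in current Paimon releases; the table name `orders` and consumer ID are hypothetical), the retention window and a protected consumer can be configured like this:

```sql
-- Keep snapshots for 2 hours instead of the 1-hour default.
ALTER TABLE orders SET ('snapshot.time-retained' = '2 h');

-- Register a consumer ID so snapshots still needed by this
-- consumer are not expired while it is reading.
SELECT * FROM orders /*+ OPTIONS('consumer-id' = 'downstream-etl') */;
```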

Data latency

Data latency is the delay between when data is written to a Paimon table and when it becomes visible to downstream consumers.

Data moves through the pipeline as follows:

  1. The writer caches incoming data in memory and temporary files.

  2. A Flink checkpoint triggers the writer to commit the cached data.

  3. The commit produces a new snapshot file.

  4. Downstream consumers detect the new snapshot and read from it.

Because a new snapshot is produced at each checkpoint (when no backpressure occurs), data latency equals the checkpoint interval.
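The steps above can be sketched as a minimal Flink SQL write job; the checkpoint interval set here is also the data latency of the pipeline (the warehouse path and table names are hypothetical):

```sql
-- Commit cached data (and produce a new snapshot) every 5 minutes.
SET 'execution.checkpointing.interval' = '5min';

CREATE CATALOG paimon WITH (
  'type' = 'paimon',
  'warehouse' = 'oss://my-bucket/warehouse'  -- hypothetical path
);
USE CATALOG paimon;

-- Each successful checkpoint commits the writer's buffered data,
-- producing one new snapshot that downstream consumers can read.
INSERT INTO orders SELECT * FROM source_table;
```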

Checkpoint interval guidance

| Interval | Effect |
| --- | --- |
| Too short (excessively small) | May degrade the performance of the Flink deployment |
| 1–10 minutes (recommended) | Balanced latency and throughput for most workloads |
| Longer within the range | Better read and write efficiency; acceptable for latency-tolerant pipelines |

Set the checkpoint interval between 1 and 10 minutes. Increase it toward the upper end of the range when read and write efficiency matters more than data freshness.

Consistency guarantees

Paimon uses a two-phase commit protocol to atomically commit data.

When two Flink deployments write concurrently to the same Paimon table, the consistency guarantee depends on whether they write to the same bucket.

Concurrent write scenarios

| Scenario | Guarantee | What happens |
| --- | --- | --- |
| Writers target different buckets | Sequential consistency | Concurrent commits succeed independently; no conflict resolution is needed |
| Writers target the same bucket | Snapshot isolation | Paimon triggers a failover to resolve the conflict; no data changes are lost, but the final table state may contain commits from both deployments |

To achieve sequential consistency, partition writes across deployments so that each deployment targets a distinct set of buckets.
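A sketch of that setup (table and column names are hypothetical): give the table a fixed bucket count and route each deployment to a disjoint set of partitions, so the two writers never commit to the same bucket:

```sql
CREATE TABLE orders (
  order_id BIGINT,
  region   STRING,
  amount   DOUBLE,
  PRIMARY KEY (order_id, region) NOT ENFORCED
) PARTITIONED BY (region) WITH (
  'bucket' = '4'  -- fixed number of buckets per partition
);

-- Deployment 1 writes only the 'east' partition ...
INSERT INTO orders
SELECT order_id, region, amount FROM src WHERE region = 'east';

-- ... while deployment 2 writes only 'west'; because buckets are
-- scoped to partitions, the two jobs never touch the same bucket.
INSERT INTO orders
SELECT order_id, region, amount FROM src WHERE region = 'west';
```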