Data Lake Formation: Time travel and version rollback

Last Updated: Mar 26, 2026

Apache Paimon tables in DLF record every data commit as a separate snapshot. You can therefore query the exact state of a table at any point in its retained history, or roll the table back to a specific version, without maintaining separate backup copies.

How it works

Each commit creates a snapshot that captures the schema ID and a manifest list index for that version. The manifest list tracks every physical data file belonging to that snapshot, so a read against a historical version accesses only the files that made up the table at that point in time, not the current dataset.
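The relationship between commits, snapshots, and data files can be illustrated with a small in-memory model. This is a minimal sketch for intuition only: the `Snapshot` and `Table` classes, their fields, and the file names are hypothetical and are not part of the Paimon or DLF API, and real manifest lists are more elaborate than a tuple of file names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    schema_id: int
    manifest_list: tuple[str, ...]  # data files visible in this version


class Table:
    """Toy model of a snapshot-versioned table (not the Paimon API)."""

    def __init__(self) -> None:
        self.snapshots: list[Snapshot] = []

    def commit(self, added_files: list[str], schema_id: int = 0) -> Snapshot:
        # A new snapshot lists the previous files plus the newly written ones.
        previous = self.snapshots[-1].manifest_list if self.snapshots else ()
        snap = Snapshot(len(self.snapshots) + 1, schema_id, previous + tuple(added_files))
        self.snapshots.append(snap)
        return snap

    def files_at(self, snapshot_id: int) -> tuple[str, ...]:
        # A historical read resolves files from that snapshot only.
        for snap in self.snapshots:
            if snap.snapshot_id == snapshot_id:
                return snap.manifest_list
        raise KeyError(f"snapshot {snapshot_id} not found")


table = Table()
table.commit(["data-1.parquet"])
table.commit(["data-2.parquet"])
print(table.files_at(1))  # only the file written by the first commit
```

Because each snapshot carries its own file list, reading snapshot 1 never touches the file added by the second commit.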

Paimon's Log-Structured Merge-Tree (LSM tree) storage engine underpins this mechanism. It supports append-only writes and multi-version coexistence natively, and its integrated changelog enables row-level change tracking across any two snapshots.

Use cases

  • Data repair: Roll back a table to the last clean snapshot when incorrect writes or dirty data are detected.

  • Historical analysis: Query the table state at a specific point in time for year-over-year or month-over-month trend analysis.

  • ML feature backfilling: Reproduce a dataset as it existed at a historical moment for model training or feature validation.

View and roll back versions

The DLF console provides a visual version management interface that lists all snapshots that have not yet expired.

Each snapshot in the list shows the following metadata:

| Field | Description |
| --- | --- |
| Snapshot ID | Unique identifier for the snapshot |
| Commit information | Commit time and commit type (Append or Compact) |
| Data statistics | Total row count for that version, and the number of rows added in that commit |
| Schema version | Schema version ID associated with the snapshot |

Search for a snapshot by snapshot ID, tag name, or time range. After selecting a target snapshot, you can perform a one-click rollback to reset the table to that version.
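The effect of a rollback on the snapshot list can be sketched as follows. This is an illustrative model, not the console's implementation: the `rollback` function is hypothetical, and it assumes that rolling back discards every snapshot newer than the target and that expired snapshots cannot be selected.

```python
def rollback(snapshot_ids: list[int], target: int) -> list[int]:
    """Return the snapshot history after rolling back to `target`.

    Assumption: rollback discards every snapshot newer than the target.
    """
    if target not in snapshot_ids:
        raise ValueError(f"snapshot {target} does not exist or has expired")
    return [sid for sid in snapshot_ids if sid <= target]


history = [3, 4, 5, 6]          # snapshots 1 and 2 have already expired
history = rollback(history, 4)
print(history)                  # [3, 4]
```

Under this assumption a rollback cannot be undone by a second rollback, so it is worth confirming the target snapshot before resetting the table.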

Query historical data with SQL

You can also query historical data directly with SQL, without using the console.

Query by timestamp

Reads the latest snapshot committed at or before the specified point in time. Useful for auditing or reproducing a historical state.

SELECT * FROM t TIMESTAMP AS OF '2026-01-01 11:30:00'
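How a timestamp might be resolved to a snapshot can be sketched with a binary search over commit times. This sketch assumes the engine picks the latest snapshot committed at or before the requested time, which is how Apache Paimon documents timestamp-based time travel. The `snapshot_as_of` function and the sample commit times are hypothetical.

```python
from bisect import bisect_right
from datetime import datetime


def snapshot_as_of(commits: list[tuple[datetime, int]], ts: datetime) -> int:
    """Return the ID of the latest snapshot committed at or before `ts`.

    `commits` holds (commit time, snapshot ID) pairs sorted by commit time.
    """
    times = [commit_time for commit_time, _ in commits]
    idx = bisect_right(times, ts)
    if idx == 0:
        raise LookupError("no snapshot exists at or before the requested time")
    return commits[idx - 1][1]


commits = [
    (datetime(2026, 1, 1, 9, 0), 1),
    (datetime(2026, 1, 1, 11, 0), 2),
    (datetime(2026, 1, 1, 12, 0), 3),
]
print(snapshot_as_of(commits, datetime(2026, 1, 1, 11, 30)))  # 2
```

A timestamp earlier than the oldest retained snapshot has nothing to resolve to, which the sketch reports as an error.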

Query by snapshot ID

Reads the complete data for a specific snapshot. Useful for precise version validation.

SELECT * FROM t VERSION AS OF 2

Query incremental changes

Reads the changelog between two timestamps and returns only the rows that changed in that window. Useful for detailed change analysis.

SELECT * FROM t TIMESTAMP BETWEEN '2026-01-01 10:00' AND '2026-01-07 10:00'
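The idea of an incremental read can be sketched as a filter over changelog entries by commit time. Everything here is illustrative: the `changes_between` function, the sample rows, and the change-kind labels (`+I` for insert, `-U` and `+U` for the before and after images of an update) are assumptions, as is the choice to exclude the start bound and include the end bound. The exact boundary semantics of the SQL clause are not stated in this document.

```python
from datetime import datetime

# Hypothetical changelog entries: (commit time, change kind, row).
changelog = [
    (datetime(2026, 1, 1, 9, 0), "+I", {"id": 1, "qty": 5}),
    (datetime(2026, 1, 3, 8, 0), "-U", {"id": 1, "qty": 5}),
    (datetime(2026, 1, 3, 8, 0), "+U", {"id": 1, "qty": 7}),
    (datetime(2026, 1, 8, 12, 0), "+I", {"id": 2, "qty": 1}),
]


def changes_between(log, start, end):
    """Return (kind, row) pairs committed after `start` and up to `end`."""
    return [(kind, row) for ts, kind, row in log if start < ts <= end]


window = changes_between(
    changelog,
    datetime(2026, 1, 1, 10, 0),
    datetime(2026, 1, 7, 10, 0),
)
for kind, row in window:
    print(kind, row)
```

Only the update committed on January 3 falls inside the window. The initial insert and the later insert are outside it and are not returned.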