Apache Paimon tables in DLF record every data commit as a separate snapshot. This lets you query the exact state of a table at any point in its history or roll it back to a specific version — without maintaining separate backup copies.
How it works
Each commit creates a snapshot that records the schema ID and a pointer to the manifest list for that version. The manifest list, through the manifest files it references, tracks every physical data file belonging to that snapshot, so a read against a historical version accesses only the files from that point in time, not the current dataset.
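The relationship between a snapshot, its manifest list, and the data files can be pictured with a small in-memory model. This is only an illustrative sketch: the `Snapshot` class, the `MANIFEST_LISTS` and `SNAPSHOTS` dictionaries, and all file names are hypothetical and do not reflect Paimon's on-disk format or any DLF API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Toy stand-in for a snapshot: a schema ID plus a pointer to a manifest list."""
    snapshot_id: int
    schema_id: int
    manifest_list: str  # hypothetical name of the manifest list for this version


# Hypothetical manifest lists: each resolves to the data files visible in one version.
MANIFEST_LISTS = {
    "manifest-list-1": ["data-a.parquet"],
    "manifest-list-2": ["data-a.parquet", "data-b.parquet"],
    "manifest-list-3": ["data-c.parquet"],  # e.g. a later commit rewrote a and b
}

SNAPSHOTS = {
    1: Snapshot(1, schema_id=0, manifest_list="manifest-list-1"),
    2: Snapshot(2, schema_id=0, manifest_list="manifest-list-2"),
    3: Snapshot(3, schema_id=1, manifest_list="manifest-list-3"),
}


def files_for_snapshot(snapshot_id: int) -> list[str]:
    """Return the data files a read of the given snapshot would touch."""
    snapshot = SNAPSHOTS[snapshot_id]
    return list(MANIFEST_LISTS[snapshot.manifest_list])


if __name__ == "__main__":
    # Reading version 2 resolves to that version's files, not the latest ones.
    print(files_for_snapshot(2))
```

Because each version resolves to its own file list, older versions stay readable even after later commits replace files in the current version.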
Paimon's Log-Structured Merge-Tree (LSM tree) storage engine underpins this mechanism. It supports append-only writes and multi-version coexistence natively, and its integrated changelog enables row-level change tracking across any two snapshots.
Use cases
- Data repair: Roll back a table to the last clean snapshot when incorrect writes or dirty data are detected.
- Historical analysis: Query the table state at a specific point in time for year-over-year or month-over-month trend analysis.
- ML feature backfilling: Reproduce a dataset as it existed at a historical moment for model training or feature validation.
View and roll back versions
The DLF console provides a visual version management interface that lists all snapshots that have not yet expired.
Each snapshot in the list shows the following metadata:
| Field | Description |
|---|---|
| Snapshot ID | Unique identifier for the snapshot |
| Commit information | Commit time and commit type (Append or Compact) |
| Data statistics | Total row count for that version, and the number of rows added in that commit |
| Schema version | Schema version ID associated with the snapshot |
Search for a snapshot by snapshot ID, tag name, or time range. After selecting a target snapshot, you can perform a one-click rollback to reset the table to that version.
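Conceptually, a rollback resets the table's snapshot history to the chosen version. The sketch below models that under one stated assumption: the target snapshot and everything before it are kept, and every later snapshot is discarded. The `roll_back` function is hypothetical and is not a DLF or Paimon API; confirm the console's exact behavior before relying on it.

```python
from __future__ import annotations


def roll_back(snapshot_ids: list[int], target_id: int) -> list[int]:
    """Return the snapshot IDs that remain after rolling back to target_id.

    Assumption: rollback keeps the target and all earlier snapshots and
    discards every snapshot committed after the target.
    """
    if target_id not in snapshot_ids:
        # Expired snapshots are no longer listed, so they cannot be a target.
        raise ValueError(f"snapshot {target_id} is not available (it may have expired)")
    return [sid for sid in sorted(snapshot_ids) if sid <= target_id]


if __name__ == "__main__":
    history = [3, 4, 5, 6]  # hypothetical: snapshots 1 and 2 have already expired
    print(roll_back(history, 4))
```

Under this model, rolling back is destructive for the newer versions, so it is worth checking the target's row counts and schema version in the snapshot list first.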
Query historical data with SQL
You can also query historical data directly with SQL, without using the console.
Query by timestamp
Reads the latest snapshot committed at or before the specified point in time. Useful for auditing or reproducing a historical state.
SELECT * FROM t TIMESTAMP AS OF '2026-01-01 11:30:00'
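How a timestamp maps to a snapshot can be sketched as a lookup over commit times. The example assumes the query resolves to the latest snapshot committed at or before the given time, which is the rule open-source Paimon documents for timestamp time travel; DLF's exact behavior should be confirmed. The `COMMITS` list and `snapshot_as_of` function are hypothetical.

```python
from __future__ import annotations

from bisect import bisect_right
from datetime import datetime

# Hypothetical commit log: (snapshot_id, commit_time), sorted by commit time.
COMMITS = [
    (1, datetime(2026, 1, 1, 9, 0, 0)),
    (2, datetime(2026, 1, 1, 11, 0, 0)),
    (3, datetime(2026, 1, 1, 12, 0, 0)),
]


def snapshot_as_of(ts: datetime) -> int | None:
    """Return the ID of the latest snapshot committed at or before ts.

    Returns None when ts is earlier than the first available snapshot.
    """
    commit_times = [commit_time for _, commit_time in COMMITS]
    # Number of snapshots whose commit time is <= ts.
    idx = bisect_right(commit_times, ts)
    if idx == 0:
        return None
    return COMMITS[idx - 1][0]


if __name__ == "__main__":
    # 11:30 falls between snapshot 2 (11:00) and snapshot 3 (12:00).
    print(snapshot_as_of(datetime(2026, 1, 1, 11, 30, 0)))
```

With this rule, a timestamp that falls between two commits never returns data written after the requested moment.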
Query by snapshot ID
Reads the complete data for a specific snapshot. Useful for precise version validation.
SELECT * FROM t VERSION AS OF 2
Query incremental changes
Reads the changelog between two timestamps — that is, only the rows that changed in that window. Useful for detailed change analysis.
SELECT * FROM t TIMESTAMP BETWEEN '2026-01-01 10:00' AND '2026-01-07 10:00'
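To make "only the rows that changed" concrete, the sketch below computes a net row-level difference between two versions of a keyed table. It is a simplification under stated assumptions: the `diff_rows` function and the sample data are hypothetical, the row-kind labels (`+I`, `-U`, `+U`, `-D`) follow the changelog convention used by Flink and Paimon, and a real changelog may also record intermediate changes that a net difference hides.

```python
from __future__ import annotations


def diff_rows(before: dict, after: dict) -> list[tuple]:
    """Return net row-level changes between two versions, keyed by primary key.

    Each change is (row_kind, key, value):
      +I insert, -U value before an update, +U value after an update, -D delete.
    """
    changes = []
    for key in sorted(before.keys() | after.keys()):
        if key not in before:
            changes.append(("+I", key, after[key]))
        elif key not in after:
            changes.append(("-D", key, before[key]))
        elif before[key] != after[key]:
            changes.append(("-U", key, before[key]))
            changes.append(("+U", key, after[key]))
        # Unchanged rows produce no output.
    return changes


if __name__ == "__main__":
    start = {1: "a", 2: "b", 3: "c"}  # table state at the start of the window
    end = {1: "a", 2: "B", 4: "d"}    # table state at the end of the window
    for change in diff_rows(start, end):
        print(change)
```

Row 1 is unchanged in both versions and therefore does not appear in the result, which is the property that makes incremental reads cheaper than comparing two full scans.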