Understand Hot and Cold Data Separation in Lindorm

This page answers common questions about the hot and cold data separation feature in LindormTable.

When does data become cold?

Lindorm archives cold data asynchronously via a compaction process. The archival interval is calculated based on your hot-cold data boundary:

Parameter	Value
Default interval	Half the hot-cold data boundary
Minimum interval	1 day
Maximum interval	Half the major compaction period
Default major compaction period	20 days

For example, if your hot-cold data boundary is 3 days, a compaction runs every 1.5 days. If the boundary is 1 day, half of it (0.5 days) falls below the 1-day minimum, so a compaction runs once per day.
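The interval rules in the table can be sketched as a simple clamp. This is a minimal illustration, not Lindorm's implementation: the function name archival_interval is hypothetical, and it assumes the default, minimum, and maximum combine as "half the boundary, limited to the range from 1 day to half the major compaction period".

```python
from datetime import timedelta

def archival_interval(boundary: timedelta,
                      major_compaction_period: timedelta = timedelta(days=20)) -> timedelta:
    """Half the hot-cold boundary, clamped to [1 day, half the major compaction period].

    Assumption: the three rows of the table combine as a clamp.
    """
    minimum = timedelta(days=1)
    maximum = major_compaction_period / 2
    return min(max(boundary / 2, minimum), maximum)

print(archival_interval(timedelta(days=3)))   # 1 day, 12:00:00
print(archival_interval(timedelta(days=1)))   # 1 day, 0:00:00
print(archival_interval(timedelta(days=60)))  # 10 days, 0:00:00
```

The last call shows the upper limit under the same assumption: with the default 20-day major compaction period, a 60-day boundary would give an interval of 10 days rather than 30.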

Can I manually trigger a compaction?

Yes. Run major_compact 'tableName' in HBase Shell to force a compaction and immediately archive cold data to cold storage.

Important

The major_compact 'tableName' command increases I/O load. Avoid running it frequently.

Why isn't data cold after a compaction?

This can happen if the data you wrote is still in memory and has not yet been flushed to disk. Run a flush first to persist the in-memory data to disk, then run a compaction to move it to cold storage. To trigger a compaction:

In HBase Shell, run the major_compact command.
In Lindorm-cli, use ALTER TABLE syntax.

Why is cold data archiving slowly?

Compaction backlog. If compactions are queuing up faster than they complete, hot-to-cold archiving slows down. Check the Compaction Queue Size(count) metric on the Instance Monitoring page under Table Metrics-Cluster Load. If the value is consistently above 0 and rising, you have a backlog. To resolve it, scale out or upgrade your instance to increase CPU resources. For details, see View monitoring information.

Outdated LindormTable version. Newer versions archive cold data faster. Check your current version against Release notes of LindormTable, then follow the steps in Minor version update to upgrade.

If neither resolves the issue, contact technical support.
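The backlog criterion above ("consistently above 0 and rising") can be expressed as a small check over metric samples. This is only a sketch: has_backlog is a hypothetical helper, the samples are assumed to be Compaction Queue Size(count) values that you have copied from the monitoring page in chronological order, and "rising" is interpreted here as the last sample being larger than the first.

```python
def has_backlog(samples):
    """Return True if the queue size never drops to 0 and ends higher than it started.

    samples: compaction queue sizes in chronological order (assumed input).
    """
    if len(samples) < 2:
        return False  # not enough data to see a trend
    always_positive = all(s > 0 for s in samples)
    rising = samples[-1] > samples[0]
    return always_positive and rising

print(has_backlog([3, 5, 8, 12]))  # True: never empty and growing
print(has_backlog([4, 0, 2, 0]))   # False: the queue drains to 0
print(has_backlog([9, 7, 4, 2]))   # False: the queue is shrinking
```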

Does cold data remain cold after an update (custom time column)?

It depends on whether the update changes the custom time column.

Update affects time column?	Result
No	Data remains cold
Yes	System re-evaluates hot/cold status based on the new value

Example: A table has primary key columns p1 and p2, and non-primary key columns c1 and c2. A row has values p1=row1, p2=2023-01-28, c1="c1", and c2="c2". The hot-cold data boundary is 1 day and the current date is 2023-01-30, so this row is cold.

Updating c1 or c2: row remains cold.
Updating p2 to 2023-01-30: row becomes hot, then cold again on February 1, 2023.

To keep cold data cold: Avoid updating the custom time column on rows you want to keep archived.
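The example above can be replayed with a few lines of Python. This is an illustrative sketch rather than Lindorm's logic: is_cold is a hypothetical helper, and it assumes a row counts as cold once its custom time value is more than one boundary older than the current date. It models classification only, not when the periodic archival actually moves the data.

```python
from datetime import date, timedelta

BOUNDARY = timedelta(days=1)  # hot-cold data boundary from the example

def is_cold(time_column_value: date, today: date,
            boundary: timedelta = BOUNDARY) -> bool:
    """Assumed rule: cold when the custom time value is older than the boundary."""
    return today - time_column_value > boundary

row = {"p1": "row1", "p2": date(2023, 1, 28), "c1": "c1", "c2": "c2"}
today = date(2023, 1, 30)
print(is_cold(row["p2"], today))             # True: the row starts out cold

row["c1"] = "new value"                      # update that leaves p2 untouched
print(is_cold(row["p2"], today))             # True: still cold

row["p2"] = date(2023, 1, 30)                # update that changes the time column
print(is_cold(row["p2"], today))             # False: the row is hot again
print(is_cold(row["p2"], date(2023, 2, 1)))  # True: cold again on February 1, 2023
```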

Is data separated without a custom time value?

No. Without a value in the custom time column, the system has no basis to classify the row, so it always stays in hot storage.

Does cold data remain cold after an update (timestamp-based)?

No. Any update to timestamp-based data refreshes its timestamp, making it hot again.

Why do hot-data queries return cold data?

Cold data archiving is periodic, not instant. When you query with the HOT_ONLY setting or the _l_hot_only_ hint, some data that has crossed the hot-cold boundary may still physically reside in hot storage and be included in results.

Add a time range filter to restrict results to the intended window:

-- Set _l_ts_min_ (start time) and _l_ts_max_ (end time) using consistent units.
SELECT /*+ _l_hot_only_(true), _l_ts_min_(1000), _l_ts_max_(2001) */ * FROM test WHERE p1>1;

Why do hot-data queries with HOT_ONLY time out?

This typically happens right after a data migration or after enabling hot and cold data separation on an existing table. The system may not have run an archival pass yet, leaving a large volume of cold data in hot storage. This increases the total data scanned and causes timeouts.

Run a major compaction on the table to force archival. For the syntax, see ALTER TABLE.

With HOT_ONLY or _l_hot_only_(true), why do index and primary table queries differ?

The primary table and the index table each have independent archival processes that run on separate periodic schedules. At any given moment, the amount of cold data remaining in hot storage can differ between the two tables, causing inconsistent query results.

Add a time range filter to your query conditions to align results across both tables.

Why might a compaction trigger immediately after enabling separation?

When you enable separation, the system checks whether the age of the oldest data file exceeds the configured archival period. If it does, a compaction runs immediately, so tables that already hold old data files are the most likely to compact right away.
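The check described above amounts to a single comparison. The sketch below is an assumption-laden illustration: compaction_due is a hypothetical name, and the file creation times and the 1.5-day archival period are made-up sample values.

```python
from datetime import datetime, timedelta

def compaction_due(oldest_file_created: datetime, now: datetime,
                   archival_period: timedelta) -> bool:
    """True when the oldest data file is older than the archival period."""
    return now - oldest_file_created > archival_period

now = datetime(2023, 1, 30, 12, 0)
period = timedelta(days=1, hours=12)  # sample archival period of 1.5 days

# A table whose oldest file was written 30 days ago compacts right away.
print(compaction_due(now - timedelta(days=30), now, period))  # True

# A table whose oldest file is 6 hours old does not.
print(compaction_due(now - timedelta(hours=6), now, period))  # False
```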

Can I query cold data with a scan?

Yes. Specify a time range in your Scan operation to retrieve cold data.

For timestamp-based separation, see Query data separated by timestamp.
For custom time column separation, see Scan and query data separated by a custom time column.

Capacity-optimized and archive storage types have relatively low read IOPS, so Scan queries against cold data are slower than against hot storage.