
Lindorm: Separately store hot data and cold data based on timestamps

Last Updated: Mar 28, 2026

LindormTable automatically routes data between hot storage and cold storage based on how long the data has been stored. Set a time boundary on a table or column family — data newer than the boundary stays in hot storage, and older data is archived to cold storage during compaction.

How it works

When data is written to a table, it lands in hot storage (Standard or Performance type). Lindorm periodically runs a compaction operation in the background. During compaction, Lindorm compares each row's timestamp against the configured boundary (COLD_BOUNDARY for HBase Shell and the Java API, or CHS for Lindorm SQL). Rows whose timestamps fall outside the boundary are moved to cold storage.

Archival is asynchronous — data is not moved the moment it ages past the boundary. To force an immediate transfer, run major_compact manually.

By default, the timestamp of a row is the system time when the data is written. If you use the ApsaraDB for HBase API for Java, you can specify a custom timestamp. See Operational considerations for how custom timestamps affect storage routing.
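The routing rule described above can be sketched as a standalone check. This is a simplified illustration of the logic only, not Lindorm's actual implementation; the class and method names are hypothetical:

```java
// Simplified illustration of the routing rule: a row becomes eligible for
// cold storage once its age exceeds the configured boundary. Lindorm applies
// this comparison during compaction; this sketch only mirrors the arithmetic.
public class ColdBoundaryDemo {
    // boundarySeconds corresponds to COLD_BOUNDARY (HBase) or CHS (SQL).
    public static boolean eligibleForCold(long rowTsMillis, long nowMillis, long boundarySeconds) {
        return nowMillis - rowTsMillis > boundarySeconds * 1000L;
    }

    public static void main(String[] args) {
        long now = 1_700_000_000_000L;          // fixed "current" time for the demo
        long boundary = 86_400L;                // one day, as in COLD_BOUNDARY='86400'
        long oneHourAgo = now - 3_600_000L;     // well within the boundary: hot
        long twoDaysAgo = now - 172_800_000L;   // past the boundary: cold-eligible
        System.out.println(eligibleForCold(oneHourAgo, now, boundary)); // false
        System.out.println(eligibleForCold(twoDaysAgo, now, boundary)); // true
    }
}
```

Because archival happens during compaction, a row for which this check returns true may still sit in hot storage until the next compaction runs.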

Prerequisites

Before you begin, ensure that you have:

Set the hot and cold data boundary

Choose the method that matches your client.

Use Apache HBase Shell

Step 1: Create a table with a boundary

HBase(main):002:0> create 'chsTable', {NAME=>'f', COLD_BOUNDARY=>'86400'}
Parameter descriptions:

NAME: The column family to configure.
COLD_BOUNDARY: The boundary in seconds. Data stored longer than this value is archived to cold storage during compaction. For example, 86400 equals one day. To disable hot and cold data separation, set this parameter to "" (an empty string).

Step 2 (optional): Change the boundary

Important

After you change the boundary, data moves between cold and hot storage during the next compaction. To trigger an immediate transfer, run major_compact 'chsTable'.

HBase(main):005:0> alter 'chsTable', {NAME=>'f', COLD_BOUNDARY=>'42300'}

Step 3 (optional): Disable hot and cold data separation

Important

After you disable hot and cold data separation, data moves back from cold to hot storage during the next compaction. To trigger an immediate transfer, run major_compact 'chsTable'.

HBase(main):004:0> alter 'chsTable', {NAME=>'f', COLD_BOUNDARY=>""}

Use ApsaraDB for HBase API for Java

Step 1: Create a table with a boundary

Admin admin = connection.getAdmin();
TableName tableName = TableName.valueOf("chsTable");
HTableDescriptor descriptor = new HTableDescriptor(tableName);
HColumnDescriptor cf = new HColumnDescriptor("f");
cf.setValue(AliHBaseConstants.COLD_BOUNDARY, "86400");
descriptor.addFamily(cf);
admin.createTable(descriptor);

Step 2 (optional): Change the boundary

Important

After you change the boundary, data moves between cold and hot storage during the next compaction.

HTableDescriptor descriptor = admin
    .getTableDescriptor(tableName);
HColumnDescriptor cf = descriptor.getFamily("f".getBytes());
cf.setValue(AliHBaseConstants.COLD_BOUNDARY, "42300"); // the new boundary value
admin.modifyTable(tableName, descriptor);

Step 3 (optional): Disable hot and cold data separation

Important

After you disable hot and cold data separation, data moves back from cold to hot storage during the next compaction.

HTableDescriptor descriptor = admin
    .getTableDescriptor(tableName);
HColumnDescriptor cf = descriptor.getFamily("f".getBytes());
cf.setValue(AliHBaseConstants.COLD_BOUNDARY, null);
admin.modifyTable(tableName, descriptor);

Use lindorm-cli (Lindorm SQL)

Step 1: Create a table with a boundary

Use the CREATE TABLE statement:

CREATE TABLE dt (
  p1 integer, p2 integer, c1 varchar, c2 bigint,
  constraint pk primary key(p1 desc))
WITH (COMPRESSION = 'ZSTD', CHS = '86400', CHS_L2 = 'storagetype=COLD');

Or enable hot and cold data separation on an existing table using ALTER TABLE:

-- Enable hot and cold data separation on an existing table.
-- The table was created without CHS:
-- CREATE TABLE dt (p1 integer, p2 integer, c1 varchar, c2 bigint,
--   constraint pk primary key(p1 desc)) WITH (COMPRESSION = 'ZSTD');

ALTER TABLE dt SET 'CHS' ='86400', 'CHS_L2' = 'storagetype=COLD';
Parameter descriptions:

CHS: The boundary in seconds. Data stored longer than this value is archived to cold storage during compaction. For example, 86400 equals one day. To disable hot and cold data separation, set this parameter to '' (an empty string).
CHS_L2: The layer-2 storage attribute. Set to storagetype=COLD.
COMPRESSION: The compression algorithm applied to all data in the table. The value is not case-sensitive. Default value: NONE.

Step 2 (optional): Change the boundary

Important

After you change the boundary, data moves between cold and hot storage during the next compaction. To trigger an immediate transfer, run major_compact on the table. For the syntax, see ALTER TABLE.

ALTER TABLE dt SET 'CHS'='1000';

Step 3 (optional): Disable hot and cold data separation

Important

After you disable hot and cold data separation, data moves back from cold to hot storage during the next compaction. To trigger an immediate transfer, run major_compact on the table.

ALTER TABLE dt SET 'CHS'='', 'CHS_L2' = '';

Write data

Write data to a hot-cold separated table the same way you write to a standard wide table. Data is always written to hot storage first, then moved to cold storage during the next compaction once it exceeds the configured boundary.

If you use the ApsaraDB for HBase API for Java, you can set a custom timestamp on each row. The timestamp determines which storage tier the row lands in — not the time of the write operation itself. See Custom timestamp behavior for details.

Query data

LindormTable stores hot and cold data in the same table, so all data is accessible through a single query. By default, a query without a time range may read cold storage, and throughput is throttled by the cold storage specification.

To limit a query to hot storage, use one of the following:

  • The HOT_ONLY parameter (Apache HBase Shell and Java API)

  • The _l_hot_only_ hint (Lindorm SQL)

  • The TimeRange parameter, set to a range that falls within the hot data window

Important

If a field in a cold storage row is updated, the updated field is stored in hot storage and the original data stays in cold storage. Querying with HOT_ONLY or a narrow TimeRange returns only the updated field, not the full row. To return the complete row, query without HOT_ONLY and make sure TimeRange covers the time from initial insert to the latest update. Avoid updating data stored in cold storage.
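The split described in the note can be pictured as the hot-tier fields layered on top of the cold-tier fields. The following is an illustration of that merge in plain Java (a hypothetical sketch using maps, not Lindorm code):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration of why a partially updated row spans two tiers: the updated
// field lives in hot storage, the untouched fields stay in cold storage, and
// the complete row is the hot fields layered over the cold ones.
public class PartialUpdateDemo {
    public static Map<String, String> mergeRow(Map<String, String> coldFields,
                                               Map<String, String> hotFields) {
        Map<String, String> full = new LinkedHashMap<>(coldFields);
        full.putAll(hotFields); // newer (hot) values override cold ones
        return full;
    }

    public static void main(String[] args) {
        Map<String, String> cold = new LinkedHashMap<>();
        cold.put("f:status", "created");
        cold.put("f:amount", "100");
        Map<String, String> hot = new LinkedHashMap<>();
        hot.put("f:status", "shipped"); // the only field that was updated
        // A HOT_ONLY query would see just {f:status=shipped};
        // the complete row requires both tiers:
        System.out.println(mergeRow(cold, hot)); // {f:status=shipped, f:amount=100}
    }
}
```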

GET queries

Apache HBase Shell

# Query all data (may include cold storage)
HBase(main):013:0> get 'chsTable', 'row1'

# Query only hot storage
HBase(main):015:0> get 'chsTable', 'row1', {HOT_ONLY=>true}

# Query by time range (UNIX timestamp in milliseconds)
HBase(main):016:0> get 'chsTable', 'row1', {TIMERANGE => [0, 1568203111265]}

The TimeRange parameter accepts UNIX timestamps in milliseconds since January 1, 1970, 00:00:00 UTC. Lindorm compares TimeRange against COLD_BOUNDARY to determine which storage tiers to read.

ApsaraDB for HBase API for Java

// Query all data (may include cold storage)
Get get = new Get("row1".getBytes());
System.out.println("result: " + table.get(get));

// Query only hot storage
get = new Get("row1".getBytes());
get.setAttribute(AliHBaseConstants.HOT_ONLY, Bytes.toBytes(true));
System.out.println("hot-only result: " + table.get(get));

// Query by time range (note the long literal suffix)
get = new Get("row1".getBytes());
get.setTimeRange(0, 1568203111265L);
System.out.println("time-range result: " + table.get(get));

Lindorm SQL

SELECT /*+ _l_hot_only_ */ * FROM dt WHERE pk IN (1, 2, 3);

SCAN queries

Note

Only Apache HBase Shell and ApsaraDB for HBase API for Java support range SCAN queries. Without HOT_ONLY or a TimeRange that covers only hot data, Lindorm reads both storage tiers and merges the results.

Apache HBase Shell

# Scan all data (may include cold storage)
Lindorm(main):017:0> scan 'chsTable', {STARTROW =>'row1', STOPROW=>'row9'}

# Scan only hot storage
Lindorm(main):018:0> scan 'chsTable', {STARTROW =>'row1', STOPROW=>'row9', HOT_ONLY=>true}

# Scan by time range
Lindorm(main):019:0> scan 'chsTable', {STARTROW =>'row1', STOPROW=>'row9', TIMERANGE => [0, 1568203111265]}

ApsaraDB for HBase API for Java

TableName tableName = TableName.valueOf("chsTable");
Table table = connection.getTable(tableName);

// Scan all data (may include cold storage)
Scan scan = new Scan();
ResultScanner scanner = table.getScanner(scan);
for (Result result : scanner) {
    System.out.println("scan result:" + result);
}

// Scan only hot storage
scan = new Scan();
scan.setAttribute(AliHBaseConstants.HOT_ONLY, Bytes.toBytes(true));
scanner = table.getScanner(scan);

// Scan by time range (note the long literal suffix)
scan = new Scan();
scan.setTimeRange(0, 1568203111265L);
scanner = table.getScanner(scan);

Prioritize hot data in SCAN queries

In SCAN queries that retrieve ordered data, such as all orders or messages for a user, LindormTable normally reads both hot and cold storage, which increases response time. Enable hot data prioritization with COLD_HOT_MERGE to scan hot storage first; cold storage is read only when hot storage does not return enough rows to satisfy the query.

Note

COLD_HOT_MERGE is available only in Apache HBase Shell and ApsaraDB for HBase API for Java.

Apache HBase Shell

Lindorm(main):002:0> scan 'chsTable', {COLD_HOT_MERGE=>true}
Parameter descriptions:

COLD_HOT_MERGE=true: Scan hot storage first. Cold storage is read only when hot storage does not return enough rows to satisfy the query.
COLD_HOT_MERGE=false: Disable hot data prioritization. Hot and cold storage are read and merged as usual.

ApsaraDB for HBase API for Java

scan = new Scan();
scan.setAttribute(AliHBaseConstants.COLD_HOT_MERGE, Bytes.toBytes(true));
scanner = table.getScanner(scan);

Usage notes

  • Row key order is not guaranteed. When COLD_HOT_MERGE is enabled, hot rows are returned before cold rows. The result set contains hot rows and cold rows sorted separately by row key — the overall result is not globally sorted by row key.

    # Normal scan (lexicographic order)
    HBase(main):001:0> scan 'chsTable'
    ROW                 COLUMN+CELL
     coldRow            column=f:value, timestamp=1560578400000, value=cold_value
     hotRow             column=f:value, timestamp=1565848800000, value=hot_value
    2 row(s)
    
    # With COLD_HOT_MERGE=true (hot rows first)
    HBase(main):002:0> scan 'chsTable', {COLD_HOT_MERGE=>true}
    ROW                 COLUMN+CELL
     hotRow             column=f:value, timestamp=1565848800000, value=hot_value
     coldRow            column=f:value, timestamp=1560578400000, value=cold_value
    2 row(s)

    To maintain a meaningful order in your application, design row keys that encode ordering information — for example, a composite key of customer ID and order creation time.

  • Rows with updated fields appear twice. If a row in cold storage has been partially updated, the row spans both hot and cold storage. With COLD_HOT_MERGE enabled, results include two entries for the same row key — one from hot storage, one from cold storage.

Operational considerations

Custom timestamp behavior

The timestamp you set on a row determines which storage tier it lands in — not the time of the write. Examples:

  • A row written with a timestamp three days in the future is archived to cold storage three days later than a row written with the current timestamp.

  • A row written with a timestamp three days in the past, when the boundary is set to three days, is archived to cold storage asynchronously shortly after the write — not after three more days.

Use custom timestamps carefully. Rows written with past timestamps may move to cold storage immediately after the next compaction, which can be unexpected if your query patterns assume recently written data is always in hot storage.
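The effect of a custom timestamp can be reasoned about as simple arithmetic: a row becomes cold-eligible at its timestamp plus the boundary, independent of when the write was issued. The sketch below assumes that model (the names are hypothetical; this is not Lindorm internals):

```java
// Assumed model: a row is eligible for archival at rowTimestamp + boundary,
// regardless of when the put was actually issued.
public class CustomTimestampDemo {
    public static long coldEligibleAtMillis(long rowTsMillis, long boundarySeconds) {
        return rowTsMillis + boundarySeconds * 1000L;
    }

    public static void main(String[] args) {
        long boundary = 259_200L;                  // three days, in seconds
        long writeTime = 1_700_000_000_000L;       // when the put is issued
        long backdated = writeTime - 259_200_000L; // timestamp three days in the past
        // Eligible already at write time: the next compaction may archive it.
        System.out.println(coldEligibleAtMillis(backdated, boundary) <= writeTime); // true
        long futureTs = writeTime + 259_200_000L;  // timestamp three days ahead
        // Not eligible until long after the write.
        System.out.println(coldEligibleAtMillis(futureTs, boundary) > writeTime);   // true
    }
}
```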

Frequently accessed cold data

If queries consistently hit cold storage, check whether the COLD_BOUNDARY (or CHS) value matches your actual access patterns. Cold storage throughput is throttled by the cold storage specification, so frequently accessed cold data degrades query performance.

FAQ

After I update a cold row, is it still cold data?

No. Updating a row refreshes its timestamp, so LindormTable treats it as hot data.

I set HOT_ONLY, but my query still returns cold data. Why?

Data is archived to cold storage asynchronously during compaction. Some rows that are old enough to be cold may not have been archived yet, so they are still physically in hot storage and returned by queries.

To exclude this data, specify an explicit time range in addition to HOT_ONLY. Use _l_ts_min_ and _l_ts_max_ hints in Lindorm SQL to define the window precisely:

-- _l_ts_min_: difference between current time and the hot/cold boundary
-- _l_ts_max_: current system time
-- Both values must use the same unit.
SELECT /*+ _l_hot_only_(true), _l_ts_min_(1000), _l_ts_max_(2001) */ * FROM test WHERE p1>1;
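Computing the two hint values follows directly from the comments above: _l_ts_max_ is the current time and _l_ts_min_ is the current time minus the boundary, in the same unit. A small sketch of that calculation (the helper name is hypothetical; the hints themselves are Lindorm SQL syntax):

```java
// Computes the [_l_ts_min_, _l_ts_max_] window for a hot-only query, in
// milliseconds: max is "now", min is "now" minus the hot/cold boundary.
public class HotWindowDemo {
    public static long[] hotWindowMillis(long nowMillis, long boundarySeconds) {
        return new long[] { nowMillis - boundarySeconds * 1000L, nowMillis };
    }

    public static void main(String[] args) {
        long now = 1_700_000_000_000L;
        long[] w = hotWindowMillis(now, 86_400L); // one-day boundary
        System.out.println("_l_ts_min_=" + w[0] + ", _l_ts_max_=" + w[1]);
    }
}
```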

My query times out even though I specified HOT_ONLY and a TimeRange. Why?

This typically happens after you migrate data into a table or enable hot and cold data separation for the first time. A large amount of cold data is still physically in hot storage because compaction has not run yet. Run major compaction on the table to force the data to move. For the syntax, see ALTER TABLE.