ApsaraDB for HBase Performance-enhanced Edition automatically separates hot and cold data across different storage tiers based on a time boundary you define. Hot data stays on fast storage for quick access; cold data moves to cost-efficient cold storage—reducing storage costs by two thirds compared with ultra disks.
When to use this feature
Cold and hot data separation works well for:
Time-series workloads where recent data is accessed frequently but older data is rarely queried—for example, order records or monitoring metrics
Use cases where you need to query cold and hot data from a single table without maintaining separate tables
Avoid this feature if:
Your workload involves frequent updates to historical data. Updating a field in cold storage moves that field back to hot storage, which can cause unexpected query results.
How it works
When data is written to a table, ApsaraDB for HBase Performance-enhanced Edition compares each record's timestamp against the COLD_BOUNDARY value you configure. The timestamp is the time at which the record was written to the table. New data lands in hot storage (standard disks). As data ages past the boundary, the system automatically moves it to cold storage during the next major compaction—transparent to your application.
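The tier decision described above can be sketched in a few lines. This is a hypothetical illustration of the rule, not the product's internal code: a record is cold once its write timestamp is more than COLD_BOUNDARY seconds in the past. The class and method names are invented for the example.

```java
public class ColdBoundaryDemo {
    // coldBoundarySeconds mirrors the COLD_BOUNDARY table property (in seconds);
    // cell timestamps, as in HBase, are in milliseconds.
    static boolean isCold(long cellTimestampMillis, long coldBoundarySeconds, long nowMillis) {
        return cellTimestampMillis < nowMillis - coldBoundarySeconds * 1000L;
    }

    public static void main(String[] args) {
        long now = 1568203111265L;   // an example "current" time in milliseconds
        long boundary = 86400;       // one day, as in COLD_BOUNDARY=86400
        // A record written two days before "now" is cold; one written an hour ago is hot.
        System.out.println(isCold(now - 2 * 86400000L, boundary, now)); // true
        System.out.println(isCold(now - 3600000L, boundary, now));      // false
    }
}
```

Note the unit difference: COLD_BOUNDARY is in seconds, while timestamps are in milliseconds, so the boundary is multiplied by 1000 before comparing.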
Data movement works in both directions: cold-to-hot and hot-to-cold.
Cold storage throughput is lower than hot storage throughput. Design your queries to target hot data when response time matters.
Prerequisites
Before you begin, make sure you have:
ApsaraDB for HBase Performance-enhanced Edition upgraded to V2.1.8 or later
Cold storage enabled on your cluster. See Cold storage.
If you use the Java API: AliHBase-Connector 1.x later than 1.0.7, or AliHBase-Connector 2.x later than 2.0.7. See Use the ApsaraDB for HBase Java API to access an ApsaraDB for HBase Performance-enhanced Edition instance.
If you use HBase Shell: version later than alihbase-2.0.7-bin.tar.gz. See Use HBase Shell to access an ApsaraDB for HBase Performance-enhanced Edition instance.
Configure the time boundary
The COLD_BOUNDARY parameter specifies how long data stays in hot storage before moving to cold storage. The value is in seconds.
For example, COLD_BOUNDARY=86400 means data written more than 86,400 seconds (one day) ago is treated as cold data.
Use HBase Shell or the Java API to create a table with cold and hot data separation or change the boundary on an existing table.
HBase Shell
// Create a table with cold and hot data separation.
hbase(main):002:0> create 'chsTable', {NAME=>'f', COLD_BOUNDARY=>'86400'}
// Change the time boundary on an existing table.
hbase(main):005:0> alter 'chsTable', {NAME=>'f', COLD_BOUNDARY=>'86400'}
// Disable cold and hot data separation.
hbase(main):004:0> alter 'chsTable', {NAME=>'f', COLD_BOUNDARY=>""}
Before moving data from cold storage back to hot storage, run a major compaction on the table.
Java API
Admin admin = connection.getAdmin();
TableName tableName = TableName.valueOf("chsTable");
// Create a table with cold and hot data separation.
// COLD_BOUNDARY is in seconds. This example archives data as cold after one day.
HTableDescriptor descriptor = new HTableDescriptor(tableName);
HColumnDescriptor cf = new HColumnDescriptor("f");
cf.setValue(AliHBaseConstants.COLD_BOUNDARY, "86400");
descriptor.addFamily(cf);
admin.createTable(descriptor);
// Change the time boundary on an existing table.
descriptor = admin.getTableDescriptor(tableName);
cf = descriptor.getFamily("f".getBytes());
cf.setValue(AliHBaseConstants.COLD_BOUNDARY, "86400");
admin.modifyTable(tableName, descriptor);
// Disable cold and hot data separation.
// Run a major compaction before moving cold data back to hot storage.
descriptor = admin.getTableDescriptor(tableName);
cf = descriptor.getFamily("f".getBytes());
cf.setValue(AliHBaseConstants.COLD_BOUNDARY, null);
admin.modifyTable(tableName, descriptor);
Do not set the column family property to COLD. If it is already set, remove it. For details, see Cold storage.
Write data
Write data the same way you would for a standard table—no changes to your client code are needed. The timestamp of each record determines which storage tier it lands in. See Use the HBase Java API to access ApsaraDB for HBase Performance-enhanced Edition clusters and Use the multi-language API to access ApsaraDB for HBase Performance-enhanced Edition clusters.
Query data
All queries target a single table—no need to query cold and hot storage separately. The system routes each query automatically.
You have three query patterns:
| Pattern | How it works | Use when |
|---|---|---|
| Default (no hint) | Scans both hot and cold data and merges the results | You need complete results regardless of storage tier |
| HOT_ONLY=true | Scans only hot storage; returns no result if the row is in cold storage | You want fast responses and only need recent data |
| TimeRange | System determines the storage tier from the time range and COLD_BOUNDARY; scans only the relevant tier | You know the time range of the data you need |
TIMERANGE values in GET and Scan operations are in milliseconds.
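Because COLD_BOUNDARY is in seconds while TIMERANGE values are in milliseconds, it is easy to mix units when building a time range that targets only hot data. A minimal sketch of the conversion, with invented names (the HBase client itself is not required here):

```java
public class HotRangeDemo {
    // Returns the [min, max) millisecond range that covers only hot data,
    // assuming anything older than coldBoundarySeconds is cold. Illustrative only.
    static long[] hotOnlyRange(long coldBoundarySeconds, long nowMillis) {
        long cutoff = nowMillis - coldBoundarySeconds * 1000L; // oldest hot timestamp
        return new long[] { cutoff, nowMillis + 1 };           // upper bound is exclusive
    }

    public static void main(String[] args) {
        long[] range = hotOnlyRange(86400, System.currentTimeMillis());
        System.out.println("hot range: [" + range[0] + ", " + range[1] + ")");
        // With the HBase client, such a range would feed scan.setTimeRange(range[0], range[1]).
    }
}
```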
Get examples
HBase Shell
// Default: may scan cold data.
hbase(main):013:0> get 'chsTable', 'row1'
// HOT_ONLY: scans only hot storage. No result is returned if the row is in cold storage.
hbase(main):015:0> get 'chsTable', 'row1', {HOT_ONLY=>true}
// TimeRange: system determines scope from TimeRange and COLD_BOUNDARY. Value is in milliseconds.
hbase(main):016:0> get 'chsTable', 'row1', {TIMERANGE => [0, 1568203111265]}
Java API
Table table = connection.getTable(TableName.valueOf("chsTable"));
// Default: may scan cold data.
Get get = new Get("row1".getBytes());
System.out.println("result: " + table.get(get));
// HOT_ONLY: scans only hot storage. No result is returned if the row is in cold storage.
get = new Get("row1".getBytes());
get.setAttribute(AliHBaseConstants.HOT_ONLY, Bytes.toBytes(true));
System.out.println("result: " + table.get(get));
// TimeRange: system determines scope from TimeRange and COLD_BOUNDARY. Value is in milliseconds.
get = new Get("row1".getBytes());
get.setTimeRange(0, 1568203111265L);
System.out.println("result: " + table.get(get));
Scan examples
Without HOT_ONLY or a time range, a Scan operation reads both hot and cold data and merges the results.
HBase Shell
// Default: scans both hot and cold data.
hbase(main):017:0> scan 'chsTable', {STARTROW =>'row1', STOPROW=>'row9'}
// HOT_ONLY: scans only hot storage.
hbase(main):018:0> scan 'chsTable', {STARTROW =>'row1', STOPROW=>'row9', HOT_ONLY=>true}
// TimeRange: system determines scope from TimeRange and COLD_BOUNDARY. Value is in milliseconds.
hbase(main):019:0> scan 'chsTable', {STARTROW =>'row1', STOPROW=>'row9', TIMERANGE => [0, 1568203111265]}
Java API
TableName tableName = TableName.valueOf("chsTable");
Table table = connection.getTable(tableName);
// Default: scans both hot and cold data.
Scan scan = new Scan();
ResultScanner scanner = table.getScanner(scan);
for (Result result : scanner) {
System.out.println("scan result:" + result);
}
// HOT_ONLY: scans only hot storage.
scan = new Scan();
scan.setAttribute(AliHBaseConstants.HOT_ONLY, Bytes.toBytes(true));
scanner = table.getScanner(scan);
// TimeRange: system determines scope from TimeRange and COLD_BOUNDARY. Value is in milliseconds.
scan = new Scan();
scan.setTimeRange(0, 1568203111265L);
scanner = table.getScanner(scan);
Prioritize hot data in scan results
For workloads like querying all orders or chat messages for a customer, results are typically sorted by timestamp in descending order—so recent (hot) data appears first. By default, the system still scans both tiers, which increases query latency when cold data is involved.
Setting COLD_HOT_MERGE=true tells the system to scan hot storage first. Cold data is fetched only if more results are needed (for example, when a user pages further back in results). This reduces cold data access and improves initial response time.
HBase Shell
hbase(main):002:0> scan 'chsTable', {COLD_HOT_MERGE=>true}
Java API
scan = new Scan();
scan.setAttribute(AliHBaseConstants.COLD_HOT_MERGE, Bytes.toBytes(true));
scanner = table.getScanner(scan);
When COLD_HOT_MERGE=true, the results for hot data and cold data are returned separately, each sorted by row key. The overall result set is not globally sorted. The following example shows the difference:
// Default scan: rows sorted lexicographically. coldRow appears before hotRow.
hbase(main):001:0> scan 'chsTable'
ROW COLUMN+CELL
coldRow column=f:value, timestamp=1560578400000, value=cold_value
hotRow column=f:value, timestamp=1565848800000, value=hot_value
2 row(s)
// COLD_HOT_MERGE=true: hot data is returned first, then cold data.
hbase(main):002:0> scan 'chsTable', {COLD_HOT_MERGE=>true}
ROW COLUMN+CELL
hotRow column=f:value, timestamp=1565848800000, value=hot_value
coldRow column=f:value, timestamp=1560578400000, value=cold_value
2 row(s)
If a row has been partially updated (some fields moved back to hot storage), COLD_HOT_MERGE=true returns two entries for that row key—one from hot storage and one from cold storage. To ensure consistent ordering within a tier, use a composite row key. For example, for an orders table, combine the customer ID and order creation time in the row key so that orders for a given customer are sorted by time.
Usage notes
Use `HOT_ONLY` or `TimeRange` for most queries. Cold storage is designed for archival, not frequent access. If your cluster sees many queries hitting cold data, check whether the `COLD_BOUNDARY` value is appropriate.
Avoid updating cold data. Updating a field in a cold storage row moves that field back to hot storage. Subsequent queries using `HOT_ONLY` or a time range targeting hot data return only the updated field, not the full row. If you need to return the full row, remove the `HOT_ONLY` hint or set a time range that covers the original write time through the last update time. If frequent updates to cold data are unavoidable, adjust `COLD_BOUNDARY` to move the affected data back to hot storage first.
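The composite row key suggested earlier (customer ID plus order creation time) can be sketched as follows. The names and the key layout here are illustrative assumptions; the key point is that subtracting the timestamp from Long.MAX_VALUE and zero-padding it makes lexicographic row-key order equal to reverse time order, so a customer's newest orders sort first within each storage tier.

```java
public class OrderRowKeyDemo {
    // Hypothetical composite row key: customerId + '#' + reversed, zero-padded timestamp.
    static String rowKey(String customerId, long orderTimeMillis) {
        // Long.MAX_VALUE fits in 19 decimal digits, so %019d gives fixed-width keys.
        return String.format("%s#%019d", customerId, Long.MAX_VALUE - orderTimeMillis);
    }

    public static void main(String[] args) {
        String newer = rowKey("cust42", 1565848800000L);
        String older = rowKey("cust42", 1560578400000L);
        // The newer order sorts lexicographically before the older one.
        System.out.println(newer.compareTo(older) < 0); // true
    }
}
```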
Check cold and hot data sizes
View the sizes of cold and hot data per table on the User tables tab in ClusterManager. For details, see Cluster management system.
If no data appears in cold storage, the data may still be buffered in memory (the memstore). Run the flush command to flush data to disk, then run a major compaction, and check again.