This topic describes how to separately store hot data and cold data in Lindorm based on timestamps.
Background information
Lindorm can archive data to cold storage based on the timestamp and the specified hot and cold data boundary. If you do not specify a timestamp, the current system timestamp is used to determine whether to archive data to cold storage.
Prerequisites
- Cold storage is enabled for the Lindorm instance. For more information, see Overview.
- The version of LindormTable is 2.1.8 and later. For more information about how to view or upgrade the version of LindormTable, see Release notes of LindormTable and Upgrade the minor engine version of a Lindorm instance.
- You are connected to LindormTable by using Lindorm-cli.
- For more information about how to use HBase Shell to connect to LindormTable, see Use HBaseue Shell to connect to and use LindormTable.
- For more information about how to use the ApsaraDB for HBase API for Java to connect to LindormTable, see Use the ApsaraDB for HBase API for Java to connect to and use the wide table engine LindormTable.
- For more information about how to use Lindorm-cli to connect to LindormTable, see Use Lindorm-cli to connect to and use LindormTable.
- The attribute of the tables or column families is not COLD. For more information about how to modify the attribute of column families, see Configure cold storage.
Procedure
Method 1: Use Apache HBase Shell to implement hot and cold data separation
- Create a table and specify a hot data and cold data boundary for the table.
HBase(main):002:0> create 'chsTable', {NAME=>'f', COLD_BOUNDARY=>'86400'}
Parameters
- NAME: The name of the column family for which you want to implement hot and cold data separation.
- COLD_BOUNDARY: The boundary based on which hot and cold data separation is performed. The unit of the value of this parameter is seconds. For example, if the value of the COLD_BOUNDARY parameter is set to 86400, data is archived to cold storage after the data has been stored in hot storage for 86,400 seconds, which is equal to one day.
- Optional:Change the hot and cold data boundary.
HBase(main):005:0> alter 'chsTable', {NAME=>'f', COLD_BOUNDARY=>'86400'}
- Optional:Disable cold and hot data separation.
HBase(main):004:0> alter 'chsTable', {NAME=>'f', COLD_BOUNDARY=>""}
Note After you change the hot and cold data boundary or disable cold and hot data boundary, the data is transferred from cold storage to hot storage after Lindorm performs thecompaction
operation. To immediately transfer the data from cold storage to hot storage, manually run the major_compact command.
Method 2: Use ApsaraDB for HBase API for Java to implement hot and cold data separation
- Create a table and specify a hot data and cold data boundary for the table.
Admin admin = connection.getAdmin(); TableName tableName = TableName.valueOf("chsTable"); HTableDescriptor descriptor = new HTableDescriptor(tableName); HColumnDescriptor cf = new HColumnDescriptor("f"); cf.setValue(AliHBaseConstants.COLD_BOUNDARY, "86400"); descriptor.addFamily(cf); admin.createTable(descriptor);
- Optional:Change the hot and cold data boundary.
HTableDescriptor descriptor = admin .getTableDescriptor(tableName); HColumnDescriptor cf = descriptor.getFamily("f".getBytes()); cf.setValue(AliHBaseConstants.COLD_BOUNDARY, "86400"); admin.modifyTable(tableName, descriptor);
- Optional:Disable cold and hot data separation.
HTableDescriptor descriptor = admin .getTableDescriptor(tableName); HColumnDescriptor cf = descriptor.getFamily("f".getBytes()); cf.setValue(AliHBaseConstants.COLD_BOUNDARY, null); admin.modifyTable(tableName, descriptor);
Note After you change the hot and cold data boundary or disable cold and hot data boundary, the data is transferred from cold storage to hot storage after Lindorm performs thecompaction
operation.
Method 3: Use lindorm-cli to implement hot and cold data separation
- Create a table and specify a hot data and cold data boundary for the table.
USE test; CREATE TABLE dt ( p1 integer, p2 integer, c1 varchar, c2 bigint, constraint pk primary key(p1 desc)) WITH (COMPRESSION = 'ZSTD', CHS = '86400', CHS_L2 = 'storagetype=COLD');
Parameters
- CHS: The hot and cold data boundary for the table. Data that has been stored in the table for a period longer than this period is archived in cold storage. The unit of the value of this arameter is seconds. For example, if the value of the CHS parameter is set to 86400, data is archived to cold storage after the data is stored in hot storage for 86,400 seconds, which is equal to one day.
- COMPRESSION: The algorithm used to compress the data. The value of this parameter is not case-sensitive. Valid values: ZSTD. Default value: NONE.
- CHS_L2: The storage type. Set this parameter to storagetype=COLD.
- Optional:Change the hot and cold data boundary.
ALTER TABLE dt SET 'CHS'='1000';
- Optional:Disable cold and hot data separation.
ALTER TABLE dt SET 'CHS'='', 'CHS_L2' = '';
Note After you change the hot and cold data boundary or disable cold and hot data boundary, the data is transferred from cold storage to hot storage after Lindorm performs thecompaction
operation. To immediately transfer the data from cold storage to hot storage, manually run themajor_compact
command. For more information, see ALTER TABLE.
Write data
compaction
operation. Query data
Lindorm uses one table to store hot data and cold data. This way, you can query all data within one table. When you query data, you can configure the TimeRange parameter to specify a time range for the query. Lindorm determines whether to query only hot storage or cold storage, or query both hot and cold storage based on the time range that you specify. If you do not specify a time range in a query, only cold data is queried. In this case, the throughput of the query is throttled by the specification of cold storage. For more information, see Overview
If you want to query data in hot storage, you can use the _l_hot_only_ hint in the query or configure the HOT_ONLY parameter to query only data in hot storage.
- Data in cold storage is intended for archiving and should be rarely accessed. If data in cold storage is frequently accessed by a large number of requests, check whether the hot and cold data boundary specified by the COLD_BOUNDARY is properly configured. If large amounts of data that needs to be frequently queried is archived to cold storage based on the boundary, the performance of the queries may be throttled.
- If a field in a row of data stored in cold storage is updated, the updated field is stored in hot storage. If you specify the HOT_ONLY or TimeRange parameter to query only data in hot storage, only the updated field is returned. If you want LindormTable to return the data of the entire row, you must perform the query without specifying the HOT_ONLY or TimeRange parameter, or make sure that the time range specified by TimeRange covers the time period from the point in time when the row was inserted to the point in time when the row was last updated. Therefore, we recommend that you do not update data stored in cold storage.
Currently, the GET and SCAN methods are supported to query data.
Use the GET method to query data
- Method 1: Use Apache HBase Shell to query data
- Query data without specifying the HOT_ONLY parameter. In this case, data in cold storage may be queried.
HBase(main):013:0> get 'chsTable', 'row1'
- Query data by specifying the HOT_ONLY parameter. In this case, only data in hot storage is queried.
HBase(main):015:0> get 'chsTable', 'row1', {HOT_ONLY=>true}
- Query data by specifying the TimeRange parameter. Lindorm compares the values of TimeRange and COLD_BOUNDARY to determine whether to query only hot storage or cold storage, or query both hot and cold storage.
HBase(main):016:0> get 'chsTable', 'row1', {TIMERANGE => [0, 1568203111265]}
Parameters
TimeRange: the time range within which the query is performed. The time in the range is a UNIX timestamp representing the number of milliseconds that have elapsed since the epoch time January 1, 1970, 00:00:00 UTC.
- Query data without specifying the HOT_ONLY parameter. In this case, data in cold storage may be queried.
- Method 2: Use ApsaraDB for HBase API for Java to query data
- Query data without specifying the HOT_ONLY parameter. In this case, data in cold storage may be queried.
Get get = new Get("row1".getBytes()); System.out.println("result: " + table.get(get));
- Query data by specifying the HOT_ONLY parameter. In this case, only data in hot storage is queried.
get = new Get("row1".getBytes()); get.setAttribute(AliHBaseConstants.HOT_ONLY, Bytes.toBytes(true));
- Query data by specifying the TimeRange parameter. Lindorm compares the values of TimeRange and COLD_BOUNDARY to determine whether to query only hot storage or cold storage, or query both hot and cold storage.
get = new Get("row1".getBytes()); get.setTimeRange(0, 1568203111265)
- Query data without specifying the HOT_ONLY parameter. In this case, data in cold storage may be queried.
- Method 3: Use Lindorm SQL to query data
SELECT /*+ _l_hot_only_ */ * FROM dt WHERE pk IN (1, 2, 3);
Use the SCAN method to query data within a specified range
- If you do not configure the HOT_ONLY and TimeRange parameters, or the value of TimeRange covers the timestamps of data in cold storage, Lindorm queries data in both cold storage and hot storage and merge the query results.
- Only Apache HBase Shell and ApsaraDB for HBase API for Java support queries within specified ranges.
- Method 1: Use Apache HBase Shell to query data
- Query data without specifying the HOT_ONLY parameter. In this case, data in cold storage is queried.
Lindorm(main):017:0> scan 'chsTable', {STARTROW =>'row1', STOPROW=>'row9'}
- Query data by specifying the HOT_ONLY parameter. In this case, only data in hot storage is queried.
Lindorm(main):018:0> scan 'chsTable', {STARTROW =>'row1', STOPROW=>'row9', HOT_ONLY=>true}
- Query data by specifying the TimeRange parameter. Lindorm compares the values of TimeRange and COLD_BOUNDARY to determine whether to query only hot storage or cold storage, or query both hot and cold storage.
Lindorm(main):019:0> scan 'chsTable', {STARTROW =>'row1', STOPROW=>'row9', TIMERANGE => [0, 1568203111265]}
- Query data without specifying the HOT_ONLY parameter. In this case, data in cold storage is queried.
- Method 2: Use ApsaraDB for HBase API for Java to query data
TableName tableName = TableName.valueOf("chsTable"); Table table = connection.getTable(tableName); // In this example, the HOT_ONLY parameter is not configured. Data in cold storage is queried. Scan scan = new Scan(); ResultScanner scanner = table.getScanner(scan); for (Result result : scanner) { System.out.println("scan result:" + result); } // In this example, the HOT_ONLY parameter is configured. Only data in hot storage is queried. scan = new Scan(); scan.setAttribute(AliLindormConstants.HOT_ONLY, Bytes.toBytes(true)); // In this example, the TimeRange parameter is configured. Lindorm compares the values of TimeRange and COLD_BOUNDARY to determine whether to query only hot storage or cold storage, or query both hot and cold storage. scan = new Scan(); scan.setTimeRange(0, 1568203111265);
Preferentially query hot data
In a SCAN query that is performed to query information such as all orders or chat records of a customer, LindormTable may scan data in both hot storage and cold storage. The query results are paginated based on the timestamps when the data rows were written to the table in descending order. In most cases, hot data appears before cold data. If you do not configure the HOT_ONLY parameter in a SCAN query, LindormTable scans hot data and cold data. As a result, the query response time increases. If you enable the hot data prioritization feature, LindormTable preferentially queries data in hot storage. Data in cold storage is queried only when the number of rows in hot storage is less than the minimum number of data rows to query. This way, access to data in cold storage is reduced and the response speed is improved.
- Method 1: Use Apache HBase Shell to query data
Lindorm(main):002:0> scan 'chsTable', {COLD_HOT_MERGE=>true}
Parameters
COLD_HOT_MERGE: Specifies whether to enable the hot data prioritization feature.- true: enable the hot data prioritization feature. When the COLD_HOT_MERGE parameter is set to true, LindormTable preferentially scans hot data. LindormTable scans cold data only when the number of data rows in hot storage is less than the number of rows that you want to query.
- false: disable the hot data prioritization feature.
- Method 2: Use ApsaraDB for HBase API for Java to query data
scan = new Scan(); scan.setAttribute(AliHBaseConstants.COLD_HOT_MERGE, Bytes.toBytes(true)); scanner = table.getScanner(scan);
Usage notes
- In scenarios in which data in specific fields of a row is updated, the row stores hot data and cold data at the same time. If you enable the hot data prioritization feature, the query results are returned in two batches. You can find two results of the same row key in the result set.
- After you enable the hot data prioritization feature, hot data is returned before cold data. Therefore, the row key values of specific returned rows of cold data may be smaller than the row key values of specific returned rows of hot data because LindormTable returns hot data before cold data. The results that are returned for a SCAN query are not sequentially sorted. The rows of hot data and the rows of cold data are separately sorted based on row key values. For more information about how returned rows are sorted, see the following sample results. In some scenarios, you can specify a row key to ensure the order of the results of a SCAN query. For example, you use a table to store the information about orders that your customers make. You can specify a row key that consists of the column that store customer IDs and the column that stores the order creation time. This way, when you query the orders that were made by a customer, the returned orders are sorted based on the points in time when the orders were created.
// In the following example, the row whose row key value is coldRow stores cold data and the row whose row key value is hotRow stores hot data. // In most cases, the coldRow row is returned before the hotRow row because rows in Lindorm are sorted in lexicographic order. HBase(main):001:0> scan 'chsTable' ROW COLUMN+CELL coldRow column=f:value, timestamp=1560578400000, value=cold_value hotRow column=f:value, timestamp=1565848800000, value=hot_value 2 row(s) // If you set the value of the COLD_HOT_MERGE parameter to true, LindormTable scans the row whose row key value is hotRow first. As a result, the hotRow row is returned before the coldRow row. HBase(main):002:0> scan 'chsTable', {COLD_HOT_MERGE=>true} ROW COLUMN+CELL hotRow column=f:value, timestamp=1565848800000, value=hot_value coldRow column=f:value, timestamp=1560578400000, value=cold_value 2 row(s)
FAQ
- Q: Is a row of cold data still cold data after it is updated?
A: No. After a row of cold data is updated, the timestamp of this row is updated. Therefore, the cold data becomes hot data.
- Q: Why cold data is returned for my query even if I want to query only hot data? A: You can configure the HOT_ONLY parameter or the _l_hot_only_ hint to query only hot data. Data is periodically archived to cold storage based on its timestamps. Therefore, some cold data may have not been archived to cold storage yet when you query the data. In this case, cold data is returned. To resolve this issue, you can specify a time range for the hot data that you want to query. Example:
// You must use the _l_ts_min_ and _l_ts_max_ hints to specify a time range. The _l_ts_min_ hint indicates the difference between the current system time and the hot and cold data boundary. The _l_ts_max_ hint indicates the current system time. The unit of the hints must be the same. SELECT /*+ _l_hot_only_(true), _l_ts_min_(1000), _l_ts_max_(2001) */ * FROM test WHERE p1>1;
- Q: Why does my query time out even if I specify a time range and use the HOT_ONLY hint in my query?
A: This issue generally occurs after you migrate data to the table or enable hot and cold separation for the table. In this case, cold data is completely archived to cold storage and large mounts of cold data is still stored in hot storage. Therefore, the query may time out. To solve this issue, you must perform the
major compaction
operation on the table. For more information about the syntax used to perform this operation, see ALTER TABLE.