This topic provides answers to some frequently asked questions about the hot and cold data separation feature of LindormTable.
How does LindormTable identify cold data and transfer it to cold storage?
LindormTable asynchronously transfers cold data from hot storage to cold storage when a compaction operation is performed. By default, LindormTable automatically performs compaction operations at an interval equal to half of the hot and cold data boundary. The minimum interval is 1 day, and the maximum interval is half of the interval at which the major compaction operation is performed. By default, the major compaction operation is performed at an interval of 20 days. For example, if you set the hot and cold data boundary to 3 days, LindormTable performs compaction operations to archive cold data at an interval of 1.5 days. If you set the hot and cold data boundary to 1 day, LindormTable performs compaction operations at an interval of 1 day.
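The interval rule described above can be sketched as follows. This is an illustrative helper of our own naming, not part of any Lindorm API; all values are in days.

```python
def archiving_interval(boundary_days, major_compaction_interval_days=20):
    """Interval at which cold data is archived: half of the hot/cold data
    boundary, clamped to a minimum of 1 day and a maximum of half of the
    major compaction interval (20 days by default)."""
    interval = boundary_days / 2
    interval = max(interval, 1)                                   # minimum: 1 day
    interval = min(interval, major_compaction_interval_days / 2)  # maximum: half of major compaction interval
    return interval

archiving_interval(3)  # boundary of 3 days -> archived every 1.5 days
archiving_interval(1)  # boundary of 1 day  -> clamped up to the 1-day minimum
```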
Can I manually trigger a compaction operation?
Yes, you can manually trigger a compaction operation. You can use an HBase shell to run the major_compact 'tableName' command on a table to trigger a compaction operation. This way, cold data is transferred from hot storage to cold storage.
The execution of the major_compact 'tableName' command increases I/O loads. Therefore, we recommend that you do not run this command frequently.
Why is cold data not transferred to cold storage after the compaction operation is performed?
This issue occurs because the data has not been written to the disk. To solve this issue, perform the flush operation to write the data to the disk and then perform the compaction operation to transfer the data to cold storage. You can use an HBase shell or lindorm-cli to perform the compaction operation.
If you use an HBase shell, run the major_compact command.
If you use lindorm-cli, refer to ALTER TABLE to view the syntax used to perform the compaction operation.
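For example, the flush-then-compact sequence can be run in an HBase shell as follows. The table name test is a placeholder; flush and major_compact are standard HBase shell commands.

```
# In an HBase shell session:
flush 'test'          # write in-memory (memstore) data to disk
major_compact 'test'  # trigger a major compaction so cold data is archived
```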
Why is cold data archived more slowly than hot data?
The speed at which cold data is archived to cold storage decreases when multiple compaction operations are accumulated. In this case, you must add CPU cores to your instance by scaling out or upgrading the instance to handle the accumulated compaction operations.
Note: You can check whether compaction operations are accumulated in the Lindorm console by performing the following steps: Go to the details page of your instance. In the left-side navigation pane, click Instance Monitoring. On the page that appears, check the Compaction Queue Size(count) metric in the Table Metrics-Cluster Load section. If the number shown in the chart is larger than 0 and keeps increasing, compaction operations are accumulated. For more information about how to view the monitoring information of a Lindorm instance, see View monitoring information.
Cold data is archived to cold storage more quickly in the latest version of LindormTable. Upgrade the LindormTable of your Lindorm instance to the latest version. For more information about how to view or upgrade the version of LindormTable, see Release notes of LindormTable and Upgrade the minor engine version of a Lindorm instance.
If you have other questions, contact technical support.
In a table for which hot and cold data separation is implemented based on custom time columns, is a row of cold data still stored in the cold storage after it is updated?
If the data that you update is not in the custom time column, the row is still stored in the cold storage. If the data that you update is in the custom time column, Lindorm determines whether to store the row in the cold storage based on the updated value in the custom time column. For example, a table contains the p1, p2, c1, and c2 columns, and the primary key of the table consists of the p1 and p2 columns. In a row of data, the values in the p1, p2, c1, and c2 columns are row1, 2023.1.28, c1, and c2, respectively. The hot and cold data boundary for the table is one day, and the current date is January 30, 2023. In this case, this row is determined as cold data and is stored in the cold storage. If you update data in the c1 and c2 columns, this row is still stored in the cold storage. If you update the value in the p2 column to 2023.1.30, this row is determined as hot data until February 2, 2023 based on the hot and cold data boundary.
Is a row archived to the cold storage if no value is specified for the custom time column of the row?
No, the row is not archived to the cold storage. Hot and cold data separation is implemented based on the value in the custom time column. If no value is specified for the custom time column of a row, the row is stored in the hot storage.
In a table for which hot and cold data separation is implemented based on timestamps, is a row of cold data still cold data after it is updated?
No. After a row of cold data is updated, the timestamp of this row is updated. Therefore, the cold data becomes hot data.
Why is cold data returned for my query even if I want to query only hot data?
You can configure the HOT_ONLY parameter or the _l_hot_only_ hint to query only hot data. Data is periodically archived to cold storage based on its timestamps. Therefore, some cold data may have not been archived to cold storage yet when you query the data. In this case, cold data is returned. To resolve this issue, you can specify a time range for the hot data that you want to query. The following example shows how to use this function:
// You must use the _l_ts_min_ and _l_ts_max_ hints to specify a time range. The _l_ts_min_ hint indicates the difference between the current system time and the hot and cold data boundary. The _l_ts_max_ hint indicates the current system time. The unit of the hints must be the same.
SELECT /*+ _l_hot_only_(true), _l_ts_min_(1000), _l_ts_max_(2001) */ * FROM test WHERE p1>1;
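The hint values in the statement above (1000 and 2001) are placeholders. The following sketch shows one way to derive the _l_ts_min_ and _l_ts_max_ values, assuming millisecond timestamps; the helper name is our own and is not part of any Lindorm API.

```python
import time

def hot_range_millis(boundary_days, now_millis=None):
    """Derive the _l_ts_min_ and _l_ts_max_ hint values in milliseconds:
    _l_ts_min_ is the current system time minus the hot and cold data
    boundary, and _l_ts_max_ is the current system time."""
    if now_millis is None:
        now_millis = int(time.time() * 1000)  # current system time in ms
    boundary_millis = boundary_days * 24 * 60 * 60 * 1000
    return now_millis - boundary_millis, now_millis

ts_min, ts_max = hot_range_millis(boundary_days=3)
sql = (f"SELECT /*+ _l_hot_only_(true), _l_ts_min_({ts_min}), "
       f"_l_ts_max_({ts_max}) */ * FROM test WHERE p1>1;")
```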
Why does my query time out even if I specify a time range and use the HOT_ONLY hint in my query?
This issue generally occurs after you migrate data to a table or enable hot and cold data separation for a table. In this case, cold data is not completely archived to cold storage and large amounts of cold data are still stored in hot storage. Therefore, the query may time out. To solve this issue, you must perform the major compaction operation on the table. For more information about the syntax used to perform this operation, see ALTER TABLE.
Why are the query results for the index table and the primary table different even if hot and cold data separation is enabled for the index table and HOT_ONLY and _l_hot_only_(true) are configured in the query?
The cold data in the index table and the primary table is separately archived on a regular basis. Therefore, the data in the hot storage of the index table and the primary table may be different within specific time periods. In this case, the query results for the index table and the primary table are different. To avoid this kind of problem, you can specify a time range for the hot data in the query.
Why is the compaction operation triggered immediately after hot and cold data separation is enabled?
The compaction operation is triggered when the difference between the current time and the generation time of the earliest file is larger than the specified archiving interval. In this case, the compaction operation is triggered immediately after hot and cold data separation is enabled to transfer cold data to the cold storage.
Can I use the SCAN method to query cold data?
Yes. Cold data is stored based on timestamps, so you can use the SCAN method to query cold data within a specified time range. However, the query process may be time-consuming for Capacity storage and archive storage because their read IOPS is low.
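For example, a ranged scan can be issued in an HBase shell as follows. The table name test and the timestamp values are placeholders; TIMERANGE is a standard HBase shell scan option that takes millisecond timestamps.

```
# In an HBase shell session, scan rows within a timestamp range:
scan 'test', {TIMERANGE => [1674864000000, 1674950400000]}
```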