Hot and cold data separation

Lindorm supports hot and cold data separation, which lets you store data on different storage media based on how frequently the data is accessed. For example, you can store infrequently accessed cold data in cost-effective Capacity storage and reserve more expensive storage with higher I/O performance for frequently accessed hot data. This reduces overall storage costs.

Background information

In big data scenarios, a table may store large amounts of historical data, such as orders and monitoring data. This historical data becomes cold over time and is rarely accessed, so the cost of storing it becomes a challenge. To reduce this cost, Lindorm supports the hot and cold data separation feature, which allows you to store hot and cold data on different types of storage media. Cold data is stored in storage of the Capacity type. Hot data is stored in storage of one of the following types: Standard, Performance, local SSDs, or local HDDs. The unit price of Capacity storage is 80% lower than the unit price of Standard storage, so the storage cost of cold data is significantly reduced.
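The effect of the 80% price difference can be estimated with simple arithmetic. The following is a minimal sketch, not an official pricing calculator: the function name, its parameters, and the example price are hypothetical, and it assumes that hot data is billed at the Standard unit price.

```python
# Capacity storage is stated to be 80% cheaper than Standard storage,
# so its unit price is 20% of the Standard unit price.
CAPACITY_PRICE_RATIO = 0.2


def estimate_storage_cost(total_gb, cold_fraction, standard_price_per_gb):
    """Return (cost_all_standard, cost_with_separation).

    total_gb              -- total table size (hypothetical input)
    cold_fraction         -- share of the data that is cold, from 0.0 to 1.0
    standard_price_per_gb -- Standard unit price (hypothetical input)
    """
    if not 0.0 <= cold_fraction <= 1.0:
        raise ValueError("cold_fraction must be between 0.0 and 1.0")

    cold_gb = total_gb * cold_fraction
    hot_gb = total_gb - cold_gb
    capacity_price_per_gb = standard_price_per_gb * CAPACITY_PRICE_RATIO

    cost_all_standard = total_gb * standard_price_per_gb
    cost_with_separation = (
        hot_gb * standard_price_per_gb + cold_gb * capacity_price_per_gb
    )
    return cost_all_standard, cost_with_separation


if __name__ == "__main__":
    baseline, separated = estimate_storage_cost(1000, 0.9, 1.0)
    print(f"all Standard: {baseline:.2f}, with separation: {separated:.2f}")
```

Under these assumptions, a table in which 90% of the data is cold costs about 28% of the all-Standard price (0.1 + 0.9 × 0.2), because only the hot 10% is billed at the full rate.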

How it works

Lindorm stores the hot data and cold data of the same table separately. Each piece of data is placed in hot storage or cold storage based on its timestamp or custom time column and on the hot and cold data boundary specified for the table. Lindorm first stores new data in hot storage and then transfers it to cold storage after the age of the data exceeds the hot and cold data boundary.
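The boundary rule described above can be sketched as a comparison between the age of the data and the boundary. This is an illustration of the rule only, not Lindorm's implementation: the function name and parameters are hypothetical, and it assumes that data whose age exactly equals the boundary still counts as hot.

```python
from datetime import datetime, timedelta, timezone


def storage_tier(data_time, boundary, now):
    """Return "cold" if the data is older than the boundary, else "hot".

    data_time -- timestamp or custom time column value of the data
    boundary  -- hot and cold data boundary as a timedelta
    now       -- the moment at which the tier is evaluated
    """
    age = now - data_time
    # "Exceeds" is read as strictly greater than (an assumption).
    return "cold" if age > boundary else "hot"


if __name__ == "__main__":
    now = datetime(2024, 1, 10, tzinfo=timezone.utc)
    one_week = timedelta(days=7)
    old_row = datetime(2024, 1, 1, tzinfo=timezone.utc)
    new_row = datetime(2024, 1, 5, tzinfo=timezone.utc)
    print(storage_tier(old_row, one_week, now))
    print(storage_tier(new_row, one_week, now))
```

Because the tier depends on the current time, the same row moves from hot to cold as it ages, without any change to the row itself.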

You access a table for which hot and cold data separation is enabled in the same way as you access a normal table. When you query such a table, you can specify hints or a time range to query only hot data.

Lindorm supports two methods of hot and cold data separation: one based on custom time columns and one based on timestamps.

Hot and cold data separation based on custom time columns: You configure a custom time column for the data and specify a hot and cold data boundary for the table. Lindorm determines whether to store each row in cold storage or hot storage based on the value of the custom time column and the specified boundary. If a row has no value in the custom time column, the row is stored in hot storage. For more information, see Separately store hot data and cold data based on custom time columns.
Hot and cold data separation based on timestamps: You can specify a timestamp for data when you write the data to a table. Lindorm determines whether to store the data in cold storage or hot storage based on the timestamp of the data and the hot and cold data boundary specified for the table. If you do not specify a timestamp, the time at which the data is written to the table is used to determine when the data is archived to cold storage. For more information, see Separately store hot data and cold data based on timestamps.
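The two methods differ mainly in how a missing time value is handled. The following minimal sketch contrasts them. All function and parameter names are hypothetical, and the strict "older than the boundary" comparison is an assumption rather than documented behavior.

```python
from datetime import datetime, timedelta, timezone


def tier_by_custom_time_column(time_column_value, boundary, now):
    """Custom time column method: a row without a value stays hot."""
    if time_column_value is None:
        return "hot"
    return "cold" if now - time_column_value > boundary else "hot"


def tier_by_timestamp(explicit_timestamp, write_time, boundary, now):
    """Timestamp method: fall back to the write time if no timestamp is given."""
    effective_time = (
        explicit_timestamp if explicit_timestamp is not None else write_time
    )
    return "cold" if now - effective_time > boundary else "hot"


if __name__ == "__main__":
    now = datetime(2024, 3, 1, tzinfo=timezone.utc)
    boundary = timedelta(days=30)
    written = datetime(2024, 1, 1, tzinfo=timezone.utc)

    # No custom time column value: the row is kept in hot storage.
    print(tier_by_custom_time_column(None, boundary, now))
    # No explicit timestamp: the write time (60 days ago) decides the tier.
    print(tier_by_timestamp(None, written, boundary, now))
```

In this sketch, a row with an empty custom time column never becomes cold, whereas a row written without an explicit timestamp ages out based on when it was written.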

Limits

Hot and cold data separation based on custom time columns: Only tables created by using SQL statements are supported. Tables created by using the HBase shell or the HBase API are not supported. If your table is created by using SQL statements, we recommend that you use this method.
Hot and cold data separation based on timestamps: Tables created by using SQL statements, the HBase shell, or the HBase API are supported. This method applies to scenarios in which custom time columns cannot be configured for the table. If your table is created by using the HBase shell or the HBase API, we recommend that you use this method.

Usage notes

Capacity storage is suitable for scenarios in which data is not frequently queried because the IOPS of Capacity storage is limited.
The write throughput of Capacity storage is close to that of Standard storage.
Capacity storage is not suitable for processing a large number of concurrent read requests. An error may occur if Capacity storage is used to process a large number of concurrent read requests.
If you purchase a large capacity of Capacity storage for your Lindorm instance, you can adjust the read IOPS based on your business requirements. For more information, contact technical support.
We recommend that you store no more than 30 TB of cold data on each node. To increase the size of cold data that can be stored on each node, contact technical support.
If more than 95% of the Capacity storage of an instance is used, data can no longer be written to Capacity storage. Monitor the utilization of the Capacity storage of your instance. For more information, see View the size of cold storage.
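The capacity-related notes above can be turned into a simple monitoring check. The following is a minimal sketch under stated assumptions: the function, its parameters, and the early-warning ratio of 85% are hypothetical, the usage figures must come from your own monitoring data, and cold data is assumed to be spread evenly across nodes.

```python
# Thresholds taken from the usage notes above.
WRITE_BLOCK_RATIO = 0.95           # writes stop above 95% utilization
RECOMMENDED_COLD_TB_PER_NODE = 30  # recommended maximum cold data per node


def check_cold_storage(used_tb, total_tb, node_count, warn_ratio=0.85):
    """Evaluate Capacity storage usage against the documented limits.

    warn_ratio is a hypothetical early-warning threshold; it is not
    a documented Lindorm value.
    """
    if total_tb <= 0 or node_count <= 0:
        raise ValueError("total_tb and node_count must be positive")

    utilization = used_tb / total_tb
    # Assumes cold data is evenly distributed across nodes.
    cold_tb_per_node = used_tb / node_count

    return {
        "utilization": utilization,
        "near_limit": utilization > warn_ratio,
        "writes_blocked": utilization > WRITE_BLOCK_RATIO,
        "over_node_recommendation": cold_tb_per_node > RECOMMENDED_COLD_TB_PER_NODE,
    }


if __name__ == "__main__":
    status = check_cold_storage(used_tb=90, total_tb=100, node_count=4)
    for key, value in status.items():
        print(f"{key}: {value}")
```

Raising an alert at the lower warning ratio leaves time to expand Capacity storage before writes to cold storage are rejected.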

For more information about the read performance of Capacity storage, see Limits on the read IOPS of Capacity storage.