ApsaraDB for ClickHouse provides tiered storage of hot data and cold data to reduce storage costs for large amounts of data. This feature is available only in ApsaraDB for ClickHouse V20.8 and later.

Definitions of hot data and cold data

Tiered storage of hot data and cold data provided by ApsaraDB for ClickHouse offers a more cost-effective method for single clusters to store hot data and cold data. This method also allows you to migrate data between different storage media.

Hot data is frequently accessed and stored in enhanced SSDs (ESSDs) or ultra disks. Hot data storage is suitable for business that has high requirements on real-time and high-performance queries.

Cold data is less frequently queried and accessed. Cold data is stored at low costs to meet the requirement for cost-effective storage.

By default, ApsaraDB for ClickHouse stores newly written data as hot data to enable efficient queries. When the amount of stored hot data reaches the storage threshold, earlier written data is automatically stored as cold data to free up storage space for newly written hot data.

Note

  • The data formats in which the data of different versions of ApsaraDB for ClickHouse clusters is stored are not compatible with each other. Therefore, the kernel version of an ApsaraDB for ClickHouse cluster cannot be upgraded.
  • If you want to use tiered storage of hot data and cold data on a cluster of a version that does not support this feature, create a cluster of V20.8.7.15 and migrate data to the new cluster.
  • The tiered storage of hot data and cold data feature cannot be disabled after you enable it. You can adjust the tiered storage policy to adjust the amount of available storage.

Billing

You are billed for cold data storage based on the pay-as-you-go billing method. You do not need to purchase storage before you use the storage. You are billed on an hourly basis based on the amount of storage that you use. The price is CNY 0.0002 per GB per hour.

For example, if 2,000 GB of cold data is stored between 19:00 and 20:00, the total fee is calculated based on the following formula: CNY 0.0002/GB/hour × 2000 GB × 1 hour = CNY 0.4. If you release 500 GB of cold data between 20:00 and 21:00, only 1,500 GB storage is used. In this case, the total fee incurred between 20:00 and 21:00 is calculated based on the following formula: CNY 0.0002/GB/hour × 1500 GB × 1 hour = CNY 0.3.

Enable tiered storage of hot data and cold data

  1. Use an Alibaba Cloud account to log on to the ApsaraDB for ClickHouse console.
  2. In the upper-left corner of the page, select the region where the cluster for which you want to enable tiered storage of hot and cold data is deployed.
  3. On the Clusters page, click the ID of the cluster. The Cluster Information page appears.
  4. In the left-side navigation pane, click Layered Storage of Hot and Cold Data. Click Click to Enable Layered Storage. After tiered storage of hot data and cold data is enabled, the cluster restarts. After the feature is enabled, the page displays the default tiered storage polices.
  5. The name of a tiered storage policy cannot be modified. By default, the move_factor parameter is set to 0.1. You can click Edit next to the parameter to modify the parameter. The value must be a floating-point number that is greater than 0 and smaller than or equal to 1. After this parameter is modified, click OK to save the modification.
    Note The tiered storage policy depends on the move_factor parameter. If the percentage of available storage for hot data in relation to the total storage for hot data is smaller than the specified move_factor value, the earlier written data is migrated from the storage space for hot data to the storage space for cold data. The default value is 0.1, which indicates that data is migrated when the percentage of available storage is smaller than 10%.

Query the information of disks

After you enable the tiered storage of hot data and cold data, you can query the following information about hot data disks and cold data disks.

select * from system.disks;

Sample responses:

┌─name──────┬─path───────────────────────────────────┬───────────free_space─┬──────────total_space─┬───used_space─┬─keep_free_space─┐
│ cold_disk │ /clickhouse/data/data/disks/cold_disk/ │ 18446744073709551615 │ 18446744073709551615 │ 115312080698 │               0 │
│ default   │ /clickhouse/data/data/                 │         201663209472 │         207083249664 │   5420040192 │               0 │
└───────────┴────────────────────────────────────────┴──────────────────────┴──────────────────────┴──────────────┴─────────────────┘
The following list describes the response parameters:
  • cold_disk: the name of the cold data disk.
  • default: the name of the hot data disk.
  • free_space: the remaining space of the disk. The available space of the cold data disk is not limited.
  • total_space: the total space of the disk. The available space of the cold data disk is not limited.
  • used_space: the used space of the disk.

Specify a storage policy

ApsaraDB for ClickHouse provides the following methods to specify storage policies.

Note After a storage policy is specified, the policy cannot be modified.
Configure a threshold for available storage of a disk to implement automatic tiered storage of hot data and cold data
The automatic migration and storage policy hot_to_cold is predefined in the system. You can apply the policy to a table when you create the table. When data is written, it is first stored as hot data in a disk. When the percentage of available storage in relation to the total storage is smaller than the move_factor value, earlier written data is migrated to the storage space for cold data. When you enable tiered storage of hot data and cold data, you must specify the move_factor parameter. The following example shows the statement that is used to create a table:
CREATE TABLE hot_cold_test_tbl
(
    `f1` String,
    `f2` String,
    `f3` Int64,
    `f4` Float64,
    `date` Date
)
ENGINE = MergeTree()
PARTITION BY date
ORDER BY f1
SETTINGS storage_policy = 'hot_to_cold';
Specify a TTL to implement tiered storage of hot data and cold data
If you specify a time to live (TTL), the data lifecycle is associated with tiered storage of hot data and cold data. This way, historical data is retained and storage costs are reduced. For example, frequently accessed data generated within the last 90 days is stored in the storage space for hot data. Infrequently accessed data generated 90 days ago is automatically migrated to the storage space for cold data.

In the following example that shows how to create a table, the move_factor and TTL limits take effect at the same time. Data is migrated if one of the limits is met.

CREATE TABLE ttl_test_tbl
(
    `f1` String,
    `f2` String,
    `f3` Int64,
    `f4` Float64,
    `date` Date
)
ENGINE = MergeTree()
PARTITION BY date
ORDER BY f1
TTL date + INTERVAL 90 DAY TO DISK 'cold_disk'
SETTINGS storage_policy = 'hot_to_cold';

View the data on a hot data disk and view the data on a cold data disk

Execute the following statement to view the data on a cold data disk:

select * from system.parts where database = '<db_name>' and table = '<tbl_name>' and disk_name ='cold_disk' and active = 1;

Execute the following statement to view the data on a hot data disk:

select * from system.parts where database = '<db_name>' and table = '<tbl_name>' and disk_name ='default' and active = 1;