All Products
Search
Document Center

ApsaraDB for ClickHouse:Tiered storage of hot data and cold data

Last Updated:Jan 17, 2024

To reduce data storage costs and improve query performance and system resource utilization, ApsaraDB for ClickHouse provides tiered storage of hot data and cold data. You can store data in different types of disks based on query frequencies. This topic describes how to enable tiered storage of hot data and cold data and configure tiered storage policies.

Prerequisites

The version of an ApsaraDB for ClickHouse cluster is 20.8 or later.

Note
  • If you want to use tiered storage of hot data and cold data on an ApsaraDB for ClickHouse cluster whose version does not support this feature, you can create an ApsaraDB for ClickHouse cluster of V20.8 or later and migrate data to the new cluster. For more information, see Migrate data between ApsaraDB for ClickHouse clusters.

  • ApsaraDB for ClickHouse Enterprise Edition automatically stores hot data in the cache and cold data in shared storage. Tiered storage of hot data and cold data does not depend on the storage rules of hot data and cold data.

Background information

  • Hot data is the data that is frequently accessed and stored on enhanced SSDs (ESSDs) or ultra disks. These disks are selected when you create clusters. Hot data storage is suitable for business scenarios that require high-performance queries.

  • Cold data is the data that is not frequently accessed. Cold data is stored on cold data disks at low storage costs to meet the requirement of cost-effective storage.

    Tiered storage of hot data and cold data provides the following storage policies.

    Storage policy

    Description

    Default storage policy

    Newly written data is stored on hot data disks to support efficient queries. If the amount of stored hot data reaches the storage threshold, the part data file whose size is the largest in the hot data disk is automatically migrated to a cold data disk. This way, newly written hot data can be stored on hot data disks.

    Storage policy based on time to live (TTL)

    Add a TTL clause to the default storage policy. This way, all data generated before the specified period in the TTL clause is automatically migrated to a cold data disk.

Precautions

  • If you enabled the data backup feature for an ApsaraDB for ClickHouse cluster, you cannot enable tiered storage of hot data and cold data for this cluster.

  • When recommend that you enable tiered storage of hot data and cold data only during off-peak hours. Enabling tiered storage of hot data and cold data causes the restart of the cluster. During the restart, you can only view the cluster information. You cannot upgrade, scale out, or scale in the cluster, or modify parameter configurations.

  • To use the tiered storage of hot data and cold data feature, you must add the SETTINGS storage_policy = 'hot_to_cold'; clause to specify a storage policy when you create a table. After a storage policy is specified, the policy cannot be modified.

  • The tiered storage of hot data and cold data feature cannot be disabled after you enable this feature.

Billing

  • You are billed for the tiered storage of hot data and cold data feature based on the following rules.

    Storage type

    Billing details

    Hot data storage

    The storage space that you purchase for a cluster is the storage space for hot data. You do not need to purchase storage for hot data after you purchase a cluster.

    Cold data storage

    You are billed for cold data storage based on the pay-as-you-go billing method. You do not need to purchase storage for cold data before you use the storage. You are billed for cold data storage on an hourly basis based on the used storage.

  • For more information about the unit prices of cold data storage, see Storage fees of ApsaraDB for ClickHouse clusters.

  • Billing formula: Fee of cold storage space = Unit price of cold data storage × Used cold data storage × Usage duration.

  • For example, if the unit price of cold data storage is USD 0.000026 per GB-hour and 2,000 GB of cold data is stored from 19:00 to 20:00, the total fee is calculated based on the following formula: USD 0.000026 per GB-hour × 2,000 GB × 1 hour = USD 0.052. If you release 500 GB of storage space for cold data between 20:00 and 21:00, only 1,500 GB storage is used. In this case, the total fee incurred between 20:00 and 21:00 is calculated based on the following formula: USD 0.000026 per GB-hour × 1,500 GB × 1 hour = USD 0.039.

Note

The preceding prices are provided only for your reference. In actual billing scenarios, the prices displayed in the ApsaraDB for ClickHouse console shall prevail.

Enable tiered storage of hot data and cold data

  1. Log on to the ApsaraDB for ClickHouse console.

  2. In the top navigation bar, select the region where your cluster is deployed.

  3. On the Clusters page, click the Default Instances tab and click the ID of the cluster.

  4. In the left-side navigation pane, click Cold and hot layered storage.

  5. Click Click to Enable Layered Storage.

  6. In the Enable Layered Storage message, click OK.

    Then, the system restarts the cluster. After the state of the cluster changes from Changing Configuration to Running, tiered storage of hot data and cold data is enabled.

Default storage policy used after tiered storage of hot data and cold data is enabled

The following table describes the parameters for the default storage policy that is used after tiered storage of hot data and cold data is enabled.

Parameter

Description

move_factor

If the percentage of available storage for hot data to the total storage for hot data is smaller than the value of the move_factor parameter, the part data file whose size is the largest in the hot data disk is automatically migrated to a cold data disk.

Valid values: [0,1]. If the move_factor parameter is set to 0, the tiered storage of hot data and cold data feature that is based on the disk storage capacity is disabled. The default value is 0.1. The default value specifies that data is automatically migrated when the percentage of available storage for hot data is less than 10%. You can also add a TTL-based tiered storage policy. For more information, see Add a TTL-based tiered storage policy in this topic.

Note

For more information about how to view the disk space, see View the disk space in this topic.

prefer_not_to_merge

Specifies whether to merge data on a cold data disk. Valid values:

  • true: Data on a cold data disk is not merged. This is the default value.

  • false: Data on a cold data disk is merged.

Add a TTL-based tiered storage policy

Add a TTL clause to the default storage policy. This way, all data generated before the specified period in the TTL clause is automatically migrated to a cold data disk.

The following sample code provides an example on how to add a TTL clause:

TTL <time_column> + INTERVAL <number> TO DISK 'cold_disk'

The following table describes the parameters.

Parameter

Description

time_column

The column of the DATE or DATETIME data type.

number

The period before which the data to be migrated was generated.

Common units are day, week, month, and year.

Note

For more information about TTL clauses, see TTL.

Example

Migrate all data that was generated 90 days ago to a cold data disk based on the date column. The date column contains the date when data was generated. Execute the following statement to create a table:

CREATE TABLE ttl_test_tbl
(
    `f1` String,
    `f2` String,
    `f3` Int64,
    `f4` Float64,
    `date` Date
)
ENGINE = MergeTree()
PARTITION BY date
ORDER BY f1
TTL date + INTERVAL 90 DAY TO DISK 'cold_disk'
SETTINGS storage_policy = 'hot_to_cold';

Modify TTL-based tiered storage of hot data and cold data

You can change the column of the DATE or DATETIME data type and the period in a TTL clause to modify the TTL-based tiered storage policy.

Note
  • By default, historical data and new data are stored based on the new storage policy after the TTL-based tiered storage policy is modified.

  • If you do not want to modify the TTL-based tiered storage policy for historical data, you can execute the set materialize_ttl_after_modify=0; statement before you modify the TTL-based tiered storage policy. After the statement is executed, you can modify the TTL-based tiered storage policy. Historical data is stored based on the original storage policy. Only new data is stored based on the new storage policy.

  • After you modify the TTL-based tiered storage policy, data that is migrated to a cold data disk based on the original storage policy is not automatically migrated to a hot data disk. If you want to migrate such cold data to a hot data disk, see Migrate data on hot data disks and cold data disks in this topic.

The following sample code provides an example on how to modify a TTL-based tiered storage policy:

ALTER TABLE <table_name> ON CLUSTER default MODIFY TTL <time_column> + INTERVAL <number> TO DISK 'cold_disk';

The following table describes the parameters.

Parameter

Description

table_name

The table name.

time_column

The new column of the DATE or DATETIME data type.

number

The new period before which the data to be migrated was generated.

Common units are day, week, month, and year.

Migrate data on hot data disks and cold data disks

  • Execute the following statement to migrate data from a hot data disk to a cold data disk:

    ALTER TABLE <table_name> ON CLUSTER default MOVE PARTITION <partition> TO DISK 'cold_disk';
  • Execute the following statement to migrate data from a cold data disk to a hot data disk:

    ALTER TABLE <table_name> ON CLUSTER default MOVE PARTITION <partition> TO DISK 'default';
Note

partition specifies a partition name. You can view the value in the system.parts system table.

View data on hot data disks and cold data disks

  • Execute the following statement to view data on a hot data disk:

    SELECT * FROM system.parts WHERE database = '<db_name>' AND TABLE = '<tbl_name>' AND disk_name ='default' AND active = 1;
  • Execute the following statement to view data on a cold data disk:

    SELECT * FROM system.parts WHERE database = '<db_name>' AND TABLE = '<tbl_name>' AND disk_name ='cold_disk' AND active = 1;

View the disk space

After tiered storage of hot data and cold data is enabled, you can execute the following statement to view the disk space:

SELECT * FROM system.disks;

The following result is returned:

┌─name─────┬─────────path───────────────────────────┬─free_space───────────┬─total_space ─────────┬─used_space───┬─keep_free_space───┬──type──┐
│ cold_disk│ /clickhouse/data/data/disks/cold_disk/ │ 18446744073709551615 │ 18446744073709551615 │ 115312080698 │   0               │   OSS  │
│ default  │ /clickhouse/data/data/                 │ 201663209472         │ 207083249664         │ 5420040192   │   0               │  local │ 
└──────────┴────────────────────────────────────────┴──────────────────────┴──────────────────────┴──────────────┴───────────────────┴────────┘

The following table describes the parameters.

Parameter

Description

name

The disk name. Valid values:

  • cold_disk: the name of the cold data disk.

  • default: the name of the hot data disk.

path

The path in which data on the disk is stored.

free_space

The available space on the disk. The available space on the cold data disk is not limited. Unit: bytes.

total_space

The total disk space. The available space on the cold data disk is not limited. Unit: bytes.

used_space

The space used on the disk. Unit: bytes.

keep_free_space

The reserved available space on the disk. Unit: bytes.

type

The storage medium. Valid values:

  • OSS: Cold data is stored in Object Storage Service (OSS).

  • local: Data is stored on a local disk.