This topic describes how to separate hot data and cold data in a ClickHouse cluster of Alibaba Cloud E-MapReduce (EMR) by using Object Storage Service (OSS). The operations described in this topic allow the cluster to automatically maintain hot data and cold data, fully utilize the computing and storage resources of the cluster, and reduce costs, without affecting the read and write performance of the cluster.

Prerequisites

A ClickHouse cluster of EMR V5.7.0 or later is created in the EMR console. For more information, see Create a ClickHouse cluster.

Limits

You can perform the operations that are described in this topic only in a ClickHouse cluster of EMR V5.7.0 or later.

Procedure

  1. Step 1: Add an OSS disk in the EMR console
  2. Step 2: Check the configuration
  3. Step 3: Separate hot data and cold data

Step 1: Add an OSS disk in the EMR console

  1. Go to the Configure tab on the ClickHouse service page of the cluster.
    1. Log on to the EMR on ECS console.
    2. In the top navigation bar, select the region where your cluster resides and select a resource group based on your business requirements.
    3. On the EMR on ECS page, find the cluster that you want to manage and click Services in the Actions column.
    4. On the Services tab, click Configure in the ClickHouse section.
  2. On the Configure tab, click the server-metrika tab.
  3. Change the value of the storage_configuration parameter.
    1. Add a disk in disks.
      Sample code:
      <disk_oss>
          <type>s3</type>
          <endpoint>http(s)://${yourBucketName}.${yourEndpoint}/${yourFileName}</endpoint>
          <access_key_id>${yourAccessKeyId}</access_key_id>
          <secret_access_key>${yourAccessKeySecret}</secret_access_key>
          <send_metadata>false</send_metadata>
          <metadata_path>${yourMetadataPath}</metadata_path>
          <cache_enabled>true</cache_enabled>
          <cache_path>${yourCachePath}</cache_path>
          <skip_access_check>false</skip_access_check>
          <min_bytes_for_seek>1048576</min_bytes_for_seek>
          <thread_pool_size>16</thread_pool_size>
          <list_object_keys_size>1000</list_object_keys_size>
      </disk_oss>
      The following table describes the parameters that are used in the preceding code.

      Parameter | Required | Description
      disk_oss | Yes | The name of the disk. You can specify a custom name.
      type | Yes | The type of the disk. Set the value to s3.
      endpoint | Yes | The URL of the OSS object, in the format of http(s)://${yourBucketName}.${yourEndpoint}/${yourFileName}. The value must start with http or https. ${yourBucketName} specifies the name of your OSS bucket, ${yourEndpoint} specifies the endpoint of your OSS bucket, and ${yourFileName} specifies the name of the object in your OSS bucket. Example: http://clickhouse.oss-cn-hangzhou-internal.aliyuncs.com/test.
      access_key_id | Yes | The AccessKey ID of your Alibaba Cloud account. For more information about how to obtain an AccessKey ID, see Obtain an AccessKey pair.
      secret_access_key | Yes | The AccessKey secret of your Alibaba Cloud account. The AccessKey secret is used to encrypt and verify signature strings. For more information about how to obtain an AccessKey secret, see Obtain an AccessKey pair.
      send_metadata | No | Specifies whether to add metadata when you perform operations on an OSS object. Valid values: true and false. Default value: false.
      metadata_path | No | The path that stores the mappings between local files and OSS objects. Default value: ${path}/disks/<disk_name>/, where <disk_name> is the name of the disk that you add, such as disk_oss.
      cache_enabled | No | Specifies whether to enable the data caching feature. Valid values: true and false. Default value: true. For more information about the data caching feature, see the description that follows this table.
      cache_path | No | The path of the local cache. Default value: ${path}/disks/<disk_name>/cache/.
      skip_access_check | No | Specifies whether to check that you have read and write permissions on the disk when the disk is loaded. Valid values: true and false. Default value: true.
      min_bytes_for_seek | No | The minimum number of bytes that are required to perform a seek operation. If the number of bytes is less than this value, a skip operation is performed instead of a seek operation. Default value: 1048576.
      thread_pool_size | No | The size of the thread pool that is used to process restore requests on the disk. Default value: 16.
      list_object_keys_size | No | The maximum number of objects that can be listed in an object directory at the same time. Default value: 1000.

      The data caching feature works as follows:
      • Only objects in the following formats are cached locally: .idx, .mrk, .mrk2, .mrk3, .txt, and .dat. Objects in other formats are accessed from OSS rather than from the local cache.
      • The capacity of the local cache varies based on the capacity of the storage disk.
      • Data in the local cache is not cleared by cache eviction algorithms, such as the Least Recently Used (LRU) algorithm. Whether cached data is retained depends on the time-to-live (TTL) of the corresponding object.
      • If data that is read from OSS for the first time does not exist in the local cache, the object that contains the data is downloaded from OSS to the local cache.
      • The first time data is written, the data is written only to the local cache. The data is then written from the local cache to OSS.
      • If you delete an object from OSS, the object is automatically deleted from the local cache. If you rename an object in OSS, the object is automatically renamed in the local cache.
    2. Add a storage policy in policies.
      Sample code:
      <oss_ttl>
          <volumes>
            <local>
              <!-- Include all disks that use the default storage policy. -->
              <disk>disk1</disk>
              <disk>disk2</disk>
              <disk>disk3</disk>
              <disk>disk4</disk>
            </local>
            <remote>
              <disk>disk_oss</disk>
            </remote>
          </volumes>
          <move_factor>0.2</move_factor>
      </oss_ttl>
      Note You can also add the preceding code to the default storage policy. The move_factor parameter specifies the ratio of free space on a volume. If the ratio of free space falls below this value, data parts are automatically moved to the next volume in the storage policy.
  4. Save the configuration.
    1. Click Save in the upper-right corner of the Service Configuration section.
    2. In the Confirm Changes dialog box, configure Description, turn on the Auto-update Configuration switch, and then click OK.
  5. Deploy the client configuration.
    1. On the Configure tab of the ClickHouse service page, click Deploy Client Configuration.
    2. In the Cluster Activities dialog box, configure Description and click OK.
    3. In the Confirm message, click OK.

Step 2: Check the configuration

  1. Log on to the ClickHouse cluster in SSH mode. For more information, see Log on to a cluster.
  2. Run the following command to start the ClickHouse client:
    clickhouse-client -h core-1-1 -m
    Note In the sample command, core-1-1 indicates the name of the core node that you log on to. If you have multiple core nodes, you can log on to one of the nodes.
  3. Execute the following statement to view disk information:
    select * from system.disks;
    The following output is returned:
    ┌─name─────┬─path────────────────────────────────┬───────────free_space─┬──────────total_space─┬─keep_free_space─┬─type──┐
    │ default  │ /var/lib/clickhouse/                │          83868921856 │          84014424064 │               0 │ local │
    │ disk1    │ /mnt/disk1/clickhouse/              │          83858436096 │          84003938304 │        10485760 │ local │
    │ disk2    │ /mnt/disk2/clickhouse/              │          83928215552 │          84003938304 │        10485760 │ local │
    │ disk3    │ /mnt/disk3/clickhouse/              │          83928301568 │          84003938304 │        10485760 │ local │
    │ disk4    │ /mnt/disk4/clickhouse/              │          83928301568 │          84003938304 │        10485760 │ local │
    │ disk_oss │ /var/lib/clickhouse/disks/disk_oss/ │ 18446744073709551615 │ 18446744073709551615 │               0 │ oss   │
    └──────────┴─────────────────────────────────────┴──────────────────────┴──────────────────────┴─────────────────┴───────┘
                                
  4. Execute the following statement to view storage policies:
    select * from system.storage_policies;
    The following output is returned:
    ┌─policy_name─┬─volume_name─┬─volume_priority─┬─disks─────────────────────────────┬─volume_type─┬─max_data_part_size─┬─move_factor─┬─prefer_not_to_merge─┐
    │ default     │ single      │               1 │ ['disk1','disk2','disk3','disk4'] │ JBOD        │                  0 │           0 │                   0 │
    │ oss_ttl     │ local       │               1 │ ['disk1','disk2','disk3','disk4'] │ JBOD        │                  0 │         0.2 │                   0 │
    │ oss_ttl     │ remote      │               2 │ ['disk_oss']                      │ JBOD        │                  0 │         0.2 │                   0 │
    └─────────────┴─────────────┴─────────────────┴───────────────────────────────────┴─────────────┴────────────────────┴─────────────┴─────────────────────┘
    If the preceding output is returned, the OSS disk and the oss_ttl storage policy are added. You can also run more targeted queries, as shown in the following example.
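    If you only want to confirm that the new disk and storage policy are registered, you can filter the system tables by name. The following queries are a minimal sketch that assumes the disk is named disk_oss and the storage policy is named oss_ttl, as configured in Step 1:
    -- Show only the OSS disk that was added in Step 1.
    SELECT name, type, path FROM system.disks WHERE name = 'disk_oss';

    -- Show only the storage policy that references the OSS disk.
    SELECT policy_name, volume_name, disks FROM system.storage_policies WHERE policy_name = 'oss_ttl';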

Step 3: Separate hot data and cold data

Reconstruct an existing table

  1. Execute the following statement on the ClickHouse client to view the current storage policy:
    SELECT
      storage_policy
    FROM system.tables
    WHERE database='<yourDatabaseName>' AND name='<yourTableName>';

    In the preceding statement, <yourDatabaseName> specifies the database name and <yourTableName> specifies the table name.

    If the table uses the default storage policy, whose configuration contains only local disks as shown in the following sample, you must perform the next step to add a volume:
    <default>
      <volumes>
        <single>
          <disk>disk1</disk>
          <disk>disk2</disk>
          <disk>disk3</disk>
          <disk>disk4</disk>
        </single>
      </volumes>
    </default>
  2. Add a volume.
    On the Configure tab of the ClickHouse service page in the EMR console, add a volume to volumes. Sample code:
    <default>
      <volumes>
        <single>
          <disk>disk1</disk>
          <disk>disk2</disk>
          <disk>disk3</disk>
          <disk>disk4</disk>
        </single>
        <!-- The following volume named remote is added. -->
        <remote>
          <disk>disk_oss</disk>
        </remote>
      </volumes>
      <!-- If you want to add multiple volumes, specify move_factor. -->
      <move_factor>0.2</move_factor>
    </default>
  3. Execute the following statement to change the TTL (to move existing data parts to OSS immediately, see the optional statements after this procedure):
    ALTER TABLE <yourDatabaseName>.<yourTableName>
      MODIFY TTL toStartOfMinute(addMinutes(t, 5)) TO VOLUME 'remote';
  4. Execute the following statement to view the distribution of each data part:
    select partition, name, path from system.parts where database='<yourDatabaseName>' and table='<yourTableName>' and active=1;
    The following output is returned:
    
    ┌─partition───────────┬─name─────────────────────┬─path──────────────────────────────────────────────────────────────────────────────────────────────────────┐
    │ 2022-01-11 19:55:00 │ 1641902100_1_90_3_193    │ /var/lib/clickhouse/disks/disk_oss/store/fc5/fc50a391-4c16-406b-a396-6e1104873f68/1641902100_1_90_3_193/  │
    │ 2022-01-11 19:55:00 │ 1641902100_91_96_1_193   │ /var/lib/clickhouse/disks/disk_oss/store/fc5/fc50a391-4c16-406b-a396-6e1104873f68/1641902100_91_96_1_193/ │
    │ 2022-01-11 20:00:00 │ 1641902400_97_124_2_193  │ /mnt/disk3/clickhouse/store/fc5/fc50a391-4c16-406b-a396-6e1104873f68/1641902400_97_124_2_193/             │
    │ 2022-01-11 20:00:00 │ 1641902400_125_152_2_193 │ /mnt/disk2/clickhouse/store/fc5/fc50a391-4c16-406b-a396-6e1104873f68/1641902400_125_152_2_193/            │
    │ 2022-01-11 20:00:00 │ 1641902400_153_180_2_193 │ /mnt/disk4/clickhouse/store/fc5/fc50a391-4c16-406b-a396-6e1104873f68/1641902400_153_180_2_193/            │
    │ 2022-01-11 20:00:00 │ 1641902400_181_186_1_193 │ /mnt/disk3/clickhouse/store/fc5/fc50a391-4c16-406b-a396-6e1104873f68/1641902400_181_186_1_193/            │
    │ 2022-01-11 20:00:00 │ 1641902400_187_192_1_193 │ /mnt/disk4/clickhouse/store/fc5/fc50a391-4c16-406b-a396-6e1104873f68/1641902400_187_192_1_193/            │
    └─────────────────────┴──────────────────────────┴───────────────────────────────────────────────────────────────────────────────────────────────────────────┘
    
    7 rows in set. Elapsed: 0.002 sec.
    Note If an output similar to the preceding one is returned, hot data and cold data are separated based on the TTL. Hot data is stored on local disks, and cold data is stored in OSS.

    In the path column, /var/lib/clickhouse/disks/disk_oss is the default value of the metadata_path parameter of the OSS disk, and /mnt/disk{1..4}/clickhouse is the path of the local disks.
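
(Optional) Move existing data immediately. Depending on your ClickHouse version and settings, the TTL change in step 3 may be applied to existing data parts only after background merges or moves run. The following statements are a sketch of how you can trigger the move manually by using standard ClickHouse syntax. <yourPartitionId> is a hypothetical placeholder for a partition value that appears in the output of step 4:
  -- Re-evaluate the modified TTL expression for existing data parts.
  ALTER TABLE <yourDatabaseName>.<yourTableName> MATERIALIZE TTL;

  -- Alternatively, move a specific partition to the remote volume directly.
  ALTER TABLE <yourDatabaseName>.<yourTableName> MOVE PARTITION <yourPartitionId> TO VOLUME 'remote';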

Create a table

  • Syntax
    CREATE TABLE <yourDatabaseName>.<yourTableName> [ON CLUSTER cluster_emr]
    (
      column1 Type1,
      column2 Type2,
      ...
    ) Engine = MergeTree() -- or Replicated*MergeTree()
    PARTITION BY <yourPartitionKey>
    ORDER BY <yourPartitionKey>
    TTL <yourTtlKey> TO VOLUME 'remote'
    SETTINGS storage_policy='oss_ttl';
    Note In the preceding syntax, <yourDatabaseName> and <yourTableName> specify the database name and the table name, <yourPartitionKey> specifies the partition key of the table, and <yourTtlKey> specifies the TTL expression that determines when data is moved to the volume named remote.
  • Example
    CREATE TABLE test.test
    (
        `id` UInt32,
        `t` DateTime
    )
    ENGINE = MergeTree()
    PARTITION BY toStartOfFiveMinute(t)
    ORDER BY id
    TTL toStartOfMinute(addMinutes(t, 5)) TO VOLUME 'remote'
    SETTINGS storage_policy='oss_ttl';
    Note In this example, the table test.test stores only the data of the most recent 5 minutes on local disks. Data is moved to the volume named remote, which resides in OSS, 5 minutes after the data is generated. You can check where the data parts of the table are stored, as shown in the following optional example.
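  • Verify data placement (optional)
    The following statements are a sketch based on the example table test.test that is created above. They insert a sample row and then query system.parts to show the disk on which each active data part is stored. Parts whose data is older than 5 minutes are expected to move to disk_oss after the background move completes:
    -- Insert a sample row into the example table.
    INSERT INTO test.test VALUES (1, now());

    -- Show the disk that stores each active data part of the table.
    SELECT partition, name, disk_name
    FROM system.parts
    WHERE database = 'test' AND table = 'test' AND active = 1;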

Other parameters

  • server-config

    merge_tree.allow_remote_fs_zero_copy_replication: Set the value to true. This way, engines of the Replicated*MergeTree type replicate only the metadata that points to the data on the OSS disk, and multiple replicas of the same shard in the ClickHouse cluster share the data in OSS.

  • server-users
    • profile.${your-profile-name}.s3_min_upload_part_size: If the amount of data in the write buffer is greater than the value of this parameter, data is written to OSS.
    • profile.${your-profile-name}.s3_max_single_part_upload_size: If the amount of data in the write buffer is greater than the value of this parameter, the multipart upload (MultipartUpload) operation is used. For more information, see .
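
  You can check the effective values of the preceding parameters on the ClickHouse client. The following queries are a sketch that assumes your ClickHouse version exposes these settings in the standard system tables:
    -- MergeTree-level setting that enables zero-copy replication for data on remote disks.
    SELECT name, value FROM system.merge_tree_settings WHERE name = 'allow_remote_fs_zero_copy_replication';

    -- Profile-level settings that control how buffered data is uploaded to OSS.
    SELECT name, value FROM system.settings WHERE name IN ('s3_min_upload_part_size', 's3_max_single_part_upload_size');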