Use Object Storage Service (OSS) as a cold storage tier in your E-MapReduce (EMR) ClickHouse cluster to automatically move aging data off local disks — reducing storage costs without affecting read and write performance.
ClickHouse TTL supports three data management strategies:
- Delete old data: Remove rows or columns after a specified time interval.
- Move data between storage tiers (this guide): Automatically migrate data from fast local disks to lower-cost OSS after a set period.
- Roll up data: Aggregate older data before deleting it.
This guide covers the "move" strategy — keeping recent (hot) data on local disks and automatically moving older (cold) data to OSS.
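For orientation, the sketch below shows how the "delete" and "move" strategies can be combined in a single table definition. It is illustrative only: the database, table, and column names are made up, and the 'remote' volume and oss_ttl storage policy are the ones configured later in this guide.

```sql
CREATE TABLE example_db.events
(
    `site_id` UInt32,
    `event_time` DateTime,
    `hits` UInt64
)
ENGINE = MergeTree()
PARTITION BY toYYYYMMDD(event_time)
ORDER BY (site_id, event_time)
TTL event_time + INTERVAL 1 WEEK TO VOLUME 'remote', -- move: tier week-old parts to OSS
    event_time + INTERVAL 6 MONTH DELETE             -- delete: drop very old rows entirely
SETTINGS storage_policy = 'oss_ttl';
```

A rollup TTL uses the same clause with GROUP BY ... SET ... to aggregate old rows instead of deleting them.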
Prerequisites
Before you begin, make sure you have:
- An EMR ClickHouse cluster running EMR V5.7.0 or later. See Create a ClickHouse cluster.
- An OSS bucket in the same region as your cluster.
- An AccessKey pair with read and write access to the bucket. See Obtain an AccessKey pair.
How it works
Hot/cold separation relies on two ClickHouse storage concepts:
- Storage policy: Defines a sequence of volumes (storage tiers). ClickHouse writes new data to the first volume and moves it to subsequent volumes based on TTL rules or disk usage thresholds.
- TTL rule with `TO VOLUME`: Specifies when data moves to a named volume. Once the TTL expression evaluates to true for a data part, ClickHouse moves it to the target volume.
The move_factor setting controls threshold-based migration: when a volume's free space drops below this fraction of its total capacity (default: 0.2, or 20%), ClickHouse begins moving data to the next volume. For example, on a 4 TB volume with move_factor 0.2, moves start once free space falls below 0.8 TB. ClickHouse sorts data parts from largest to smallest and moves enough parts to bring free space back above the threshold. If the total size of all parts is insufficient, all parts are moved.
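You can inspect the move_factor of each configured policy from SQL; the value is exposed in the system.storage_policies table that Step 2 also uses for verification. A minimal sketch:

```sql
SELECT policy_name, volume_name, disks, move_factor
FROM system.storage_policies;
```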
Step 1: Add an OSS disk
Configure OSS as a disk in the storage_configuration section of your ClickHouse service.
- Go to the Configure tab for your ClickHouse service.
  - Log in to the EMR on ECS console.
  - In the top navigation bar, select the region where your cluster resides and select a resource group.
  - On the EMR on ECS page, find your cluster and click Services in the Actions column.
  - On the Services tab, click Configure in the ClickHouse section.
- On the Configure tab, click the server-metrika tab.
- In storage_configuration, add an OSS disk under disks:

  ```xml
  <disk_oss>
      <type>s3</type>
      <endpoint>http(s)://${yourBucketName}.${yourEndpoint}/${yourFileName}</endpoint>
      <access_key_id>${yourAccessKeyId}</access_key_id>
      <secret_access_key>${yourAccessKeySecret}</secret_access_key>
      <send_metadata>false</send_metadata>
      <metadata_path>${yourMetadataPath}</metadata_path>
      <cache_enabled>true</cache_enabled>
      <cache_path>${yourCachePath}</cache_path>
      <skip_access_check>false</skip_access_check>
      <min_bytes_for_seek>1048576</min_bytes_for_seek>
      <thread_pool_size>16</thread_pool_size>
      <list_object_keys_size>1000</list_object_keys_size>
  </disk_oss>
  ```

  Replace the placeholders with your actual values:

  | Placeholder | Description | Example |
  | --- | --- | --- |
  | ${yourBucketName} | OSS bucket name | clickhouse |
  | ${yourEndpoint} | OSS bucket endpoint | oss-cn-hangzhou-internal.aliyuncs.com |
  | ${yourFileName} | Object path prefix in the bucket | test |
  | ${yourAccessKeyId} | AccessKey ID | LTAI5tXxx |
  | ${yourAccessKeySecret} | AccessKey secret | None |
  | ${yourMetadataPath} | Local path for OSS-to-local file mappings | Defaults to ${path}/disks/<disk_name>/ |
  | ${yourCachePath} | Local cache directory | Defaults to ${path}/disks/<disk_name>/cache/ |

  For example:

  ```xml
  <endpoint>http://clickhouse.oss-cn-hangzhou-internal.aliyuncs.com/test</endpoint>
  ```

  Required parameters:

  | Parameter | Description |
  | --- | --- |
  | disk_oss | Disk name. You can use any custom name; the storage policy references this name. |
  | type | Must be s3. ClickHouse uses the S3-compatible protocol to access OSS. |
  | endpoint | OSS object URL. Must start with http or https. |
  | access_key_id | AccessKey ID for your Alibaba Cloud account. |
  | secret_access_key | AccessKey secret for your Alibaba Cloud account. |

  Optional parameters:

  | Parameter | Default | Description |
  | --- | --- | --- |
  | send_metadata | false | Whether to attach metadata to OSS operations. |
  | metadata_path | ${path}/disks/<disk_name>/ | Local path storing the mapping between local files and OSS objects. <disk_name> corresponds to the value of disk_oss. |
  | cache_enabled | true | Enables local caching of OSS data. See Local cache behavior. |
  | cache_path | ${path}/disks/<disk_name>/cache/ | Local directory for cached data. |
  | skip_access_check | false | If true, skips the read/write permission check when the disk loads. |
  | min_bytes_for_seek | 1048576 | Minimum bytes required to use a seek operation. Below this threshold, ClickHouse skips instead of seeking. |
  | thread_pool_size | 16 | Thread pool size for processing restore requests on this disk. |
  | list_object_keys_size | 1000 | Maximum number of objects listed in an object directory at once. |
- Add a storage policy under policies:

  Note: You can also add the remote volume to the existing default storage policy instead of creating a new oss_ttl policy.

  ```xml
  <oss_ttl>
      <volumes>
          <local>
              <!-- Include all disks that use the default storage policy. -->
              <disk>disk1</disk>
              <disk>disk2</disk>
              <disk>disk3</disk>
              <disk>disk4</disk>
          </local>
          <remote>
              <disk>disk_oss</disk>
          </remote>
      </volumes>
      <move_factor>0.2</move_factor>
  </oss_ttl>
  ```
- Save the configuration.
  - Click Save in the upper-right corner of the Service Configuration section.
  - In the Confirm Changes dialog, fill in Description, turn on Auto-update Configuration, and click OK.
- Deploy the client configuration.
  - On the Configure tab, click Deploy Client Configuration.
  - In the Cluster Activities dialog, fill in Description and click OK.
  - In the confirmation message, click OK.
Local cache behavior
When cache_enabled is true, ClickHouse maintains a local cache for the OSS disk:
- Only files of these types are cached locally: .idx, .mrk, .mrk2, .mrk3, .txt, and .dat. All other file types are read directly from OSS.
- Cache capacity scales with the capacity of the local disk that stores the cache.
- Cache entries are not evicted by algorithms such as least recently used (LRU). Entries expire based on the object's TTL.
- On first read (cache miss), the object is downloaded from OSS to the local cache.
- On first write, data is written to the local cache and then flushed to OSS.
- When an OSS object is deleted or renamed, the local cache reflects the change automatically.
Step 2: Verify the configuration
After deploying, confirm that the OSS disk and storage policy are active.
- Log in to a core node via SSH. See Log on to a cluster.
- Start the ClickHouse client:

  ```shell
  clickhouse-client -h core-1-1 -m
  ```

  core-1-1 is the name of the core node you logged in to. If your cluster has multiple core nodes, log in to any one of them.
- Check that the OSS disk appears in the disk list:

  ```sql
  SELECT name, type FROM system.disks;
  ```

  Confirm that disk_oss appears with type = oss:

  ```
  ┌─name─────┬─type──┐
  │ default  │ local │
  │ disk1    │ local │
  │ disk2    │ local │
  │ disk3    │ local │
  │ disk4    │ local │
  │ disk_oss │ oss   │
  └──────────┴───────┘
  ```
- Check that the oss_ttl storage policy is active:

  ```sql
  SELECT policy_name, volume_name, disks FROM system.storage_policies;
  ```

  Confirm that oss_ttl appears with volumes local and remote:

  ```
  ┌─policy_name─┬─volume_name─┬─disks─────────────────────────────┐
  │ default     │ single      │ ['disk1','disk2','disk3','disk4'] │
  │ oss_ttl     │ local       │ ['disk1','disk2','disk3','disk4'] │
  │ oss_ttl     │ remote      │ ['disk_oss']                      │
  └─────────────┴─────────────┴───────────────────────────────────┘
  ```
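After the disk and policy are visible, you may also want to see how much active data currently sits on each disk as TTL moves take effect. This is a small sketch against the system.parts table; run it once your tables hold data.

```sql
SELECT
    disk_name,
    count() AS parts,
    formatReadableSize(sum(bytes_on_disk)) AS size
FROM system.parts
WHERE active
GROUP BY disk_name;
```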
Step 3: Apply hot/cold separation
Choose the approach that matches your situation.
Option A: Update an existing table
Use this option if you have an existing table that currently stores all data on local disks.
- Check the table's current storage policy:

  ```sql
  SELECT storage_policy FROM system.tables WHERE database = '<yourDatabaseName>' AND name = '<yourTableName>';
  ```
- If the table uses the default policy (which has no remote volume), add the remote volume to the default policy in the EMR console. On the server-metrika tab, update the default policy under policies:

  ```xml
  <default>
      <volumes>
          <single>
              <disk>disk1</disk>
              <disk>disk2</disk>
              <disk>disk3</disk>
              <disk>disk4</disk>
          </single>
          <!-- Add the remote volume for OSS -->
          <remote>
              <disk>disk_oss</disk>
          </remote>
      </volumes>
      <!-- Required when defining multiple volumes -->
      <move_factor>0.2</move_factor>
  </default>
  ```
- Set a TTL rule to move data to the remote volume:

  ```sql
  ALTER TABLE <yourDatabaseName>.<yourTableName> MODIFY TTL toStartOfMinute(addMinutes(t, 5)) TO VOLUME 'remote';
  ```

  This example moves data to OSS 5 minutes after it is generated. To re-evaluate parts that already exist, see the sketch after this list.
- Verify that data is distributed across local and OSS storage:

  ```sql
  SELECT partition, name, path FROM system.parts WHERE database = '<yourDatabaseName>' AND table = '<yourTableName>' AND active = 1;
  ```

  The following output is returned:

  ```
  ┌─partition───────────┬─name─────────────────────┬─path──────────────────────────────────────────────────────────────────────────────────────────────────────┐
  │ 2022-01-11 19:55:00 │ 1641902100_1_90_3_193    │ /var/lib/clickhouse/disks/disk_oss/store/fc5/fc50a391-4c16-406b-a396-6e1104873f68/1641902100_1_90_3_193/   │
  │ 2022-01-11 19:55:00 │ 1641902100_91_96_1_193   │ /var/lib/clickhouse/disks/disk_oss/store/fc5/fc50a391-4c16-406b-a396-6e1104873f68/1641902100_91_96_1_193/  │
  │ 2022-01-11 20:00:00 │ 1641902400_97_124_2_193  │ /mnt/disk3/clickhouse/store/fc5/fc50a391-4c16-406b-a396-6e1104873f68/1641902400_97_124_2_193/              │
  │ 2022-01-11 20:00:00 │ 1641902400_125_152_2_193 │ /mnt/disk2/clickhouse/store/fc5/fc50a391-4c16-406b-a396-6e1104873f68/1641902400_125_152_2_193/             │
  │ 2022-01-11 20:00:00 │ 1641902400_153_180_2_193 │ /mnt/disk4/clickhouse/store/fc5/fc50a391-4c16-406b-a396-6e1104873f68/1641902400_153_180_2_193/             │
  │ 2022-01-11 20:00:00 │ 1641902400_181_186_1_193 │ /mnt/disk3/clickhouse/store/fc5/fc50a391-4c16-406b-a396-6e1104873f68/1641902400_181_186_1_193/             │
  │ 2022-01-11 20:00:00 │ 1641902400_187_192_1_193 │ /mnt/disk4/clickhouse/store/fc5/fc50a391-4c16-406b-a396-6e1104873f68/1641902400_187_192_1_193/             │
  └─────────────────────┴──────────────────────────┴─────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

  7 rows in set. Elapsed: 0.002 sec.
  ```

  If output like this is returned, hot data and cold data are separated based on the TTL: hot data is stored on local disks, and cold data is stored in OSS. /var/lib/clickhouse/disks/disk_oss is the default value of the metadata_path parameter for the OSS disk, and /mnt/disk{1..4}/clickhouse is the local disk path.
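Modifying the TTL of an existing table does not necessarily rewrite parts that were created earlier; depending on your ClickHouse version and settings, older parts may only move during later merges or background moves. The following sketch shows two standard statements you can use to speed this up; the database, table, and partition values are placeholders.

```sql
-- Re-evaluate the new TTL for existing parts (may rewrite and move a lot of data):
ALTER TABLE <yourDatabaseName>.<yourTableName> MATERIALIZE TTL;

-- Or move one specific partition to the OSS-backed volume immediately:
ALTER TABLE <yourDatabaseName>.<yourTableName> MOVE PARTITION <yourPartition> TO VOLUME 'remote';
```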
Option B: Create a new table with hot/cold separation
Use this option when creating a new table. Define the TTL and storage policy at creation time.
```sql
CREATE TABLE <yourDatabaseName>.<yourTableName> [ON CLUSTER cluster_emr]
(
    column1 Type1,
    column2 Type2,
    ...
) ENGINE = MergeTree() -- or Replicated*MergeTree()
PARTITION BY <yourPartitionKey>
ORDER BY <yourSortKey>
TTL <yourTtlKey> TO VOLUME 'remote'
SETTINGS storage_policy = 'oss_ttl';
```
Example:
```sql
CREATE TABLE test.test
(
    `id` UInt32,
    `t` DateTime
)
ENGINE = MergeTree()
PARTITION BY toStartOfFiveMinute(t)
ORDER BY id
TTL toStartOfMinute(addMinutes(t, 5)) TO VOLUME 'remote'
SETTINGS storage_policy = 'oss_ttl';
```
This table keeps the last 5 minutes of data on local disks. Data generated more than 5 minutes ago is automatically moved to the remote volume in OSS.
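As a quick smoke test of such a table, you can insert a couple of rows and watch where the resulting parts land. The sketch below assumes the test.test table above; parts older than the TTL move to OSS only after the background move runs, so the split may take a little while to appear.

```sql
INSERT INTO test.test VALUES (1, now()), (2, now() - INTERVAL 10 MINUTE);

-- See which disk each active part of test.test currently lives on:
SELECT name, disk_name, path
FROM system.parts
WHERE database = 'test' AND table = 'test' AND active;
```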
Advanced configuration
Additional server parameters
server-config
- merge_tree.allow_remote_fs_zero_copy_replication: Set to true to enable Replicated*MergeTree tables to replicate only metadata (not data) to OSS. This creates multiple metadata replicas pointing to the same OSS objects for each shard, reducing replication traffic.
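To confirm the value the server is actually running with, note that this is a MergeTree-level setting, so on recent ClickHouse versions it should be visible in system.merge_tree_settings. A hedged sketch (the setting may be absent on older versions):

```sql
SELECT name, value
FROM system.merge_tree_settings
WHERE name = 'allow_remote_fs_zero_copy_replication';
```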
server-users (profile settings)
- profile.${your-profile-name}.s3_min_upload_part_size: If the data in the write buffer exceeds this size, ClickHouse writes the data to OSS.
- profile.${your-profile-name}.s3_max_single_part_upload_size: If the data in the write buffer exceeds this size, ClickHouse uses MultipartUpload instead of a single-part upload.
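Both parameters are ordinary ClickHouse settings, so in addition to placing them in a users profile you can usually override them per session or per query. A minimal sketch; the byte values are arbitrary examples, not recommendations:

```sql
-- Override the OSS upload thresholds for the current session (illustrative values):
SET s3_min_upload_part_size = 33554432;          -- 32 MiB
SET s3_max_single_part_upload_size = 67108864;   -- 64 MiB
```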