As data ages, queries shift from recent records to historical archives. Keeping all data on local disks is expensive, while moving everything to HDFS sacrifices query performance. E-MapReduce (EMR) ClickHouse clusters solve this by supporting tiered storage through Hadoop Distributed File System (HDFS): recent data stays on local disks for fast queries, and older data moves automatically to HDFS to reduce storage costs — without affecting read or write performance.
Prerequisites
Before you begin, ensure that you have:
- A ClickHouse cluster of EMR V5.5.0 or later. See Create a ClickHouse cluster.
- An HDFS service deployed in the same virtual private cloud (VPC) as the ClickHouse cluster, such as the HDFS service in an EMR Hadoop cluster.
- Read and write permissions on the HDFS service.
Limitations
Hot/cold data separation using HDFS is only supported on ClickHouse clusters of EMR V5.5.0 or later.
Key concepts
Understanding the storage hierarchy helps you read the configuration code in the steps that follow.
| Concept | Description |
|---|---|
| Disk | A storage device registered in ClickHouse — either a local disk or a remote store such as HDFS. |
| Volume | An ordered set of disks. Data moves across volumes according to storage policy rules. |
| Storage policy | A named set of volumes with rules that control how data moves between them, including time-to-live (TTL) expiration. |
In this guide, you define an HDFS disk, group local disks into a local volume and the HDFS disk into a remote volume, then apply a TTL rule to move data from local to remote after a specified time.
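Each of these levels is visible on a running cluster through system tables, which is also how the verification in Step 2 works. A quick sketch of where each level surfaces:

```sql
-- Disks: every storage device registered with ClickHouse, local or remote.
SELECT name, type FROM system.disks;

-- Volumes and storage policies: which disks each volume of each policy groups together.
SELECT policy_name, volume_name, disks FROM system.storage_policies;
```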
Step 1: Add an HDFS disk in the EMR console
1. Go to the Configure tab of the ClickHouse service.
    1. Log on to the EMR console. In the left-side navigation pane, click EMR on ECS.
    2. In the top navigation bar, select the region where your cluster resides and select a resource group.
    3. On the EMR on ECS page, find your cluster and click Services in the Actions column.
    4. On the Services tab, click Configure in the ClickHouse section.
2. Click the server-metrika tab.
3. Update the storage_configuration parameter.
    1. In disks, add an HDFS disk entry:

        ```xml
        <disk_hdfs>
            <type>hdfs</type>
            <endpoint>hdfs://${your-hdfs-url}</endpoint>
            <min_bytes_for_seek>1048576</min_bytes_for_seek>
            <thread_pool_size>16</thread_pool_size>
            <objects_chunk_size_to_delete>1000</objects_chunk_size_to_delete>
        </disk_hdfs>
        ```

        | Parameter | Required | Description |
        |---|---|---|
        | disk_hdfs | Yes | Name of the disk. Customize as needed. |
        | type | Yes | Disk type. Must be hdfs. |
        | endpoint | Yes | Endpoint of the HDFS service, typically the NameNode address. If the NameNode runs in high availability (HA) mode, the port is usually 8020; otherwise, it is 9000. |
        | min_bytes_for_seek | No | Minimum positive seek offset. If the actual offset is smaller than this value, ClickHouse reads sequentially instead of seeking. Default: 1048576. |
        | thread_pool_size | No | Number of threads used to process restore requests on this disk. Default: 16. |
        | objects_chunk_size_to_delete | No | Maximum number of HDFS files deleted in a single batch. Default: 1000. |
    2. In policies, add a storage policy that maps local disks to a local volume and the HDFS disk to a remote volume:

        ```xml
        <hdfs_ttl>
            <volumes>
                <local>
                    <!-- Add all disks that currently use the default storage policy. -->
                    <disk>disk1</disk>
                    <disk>disk2</disk>
                    <disk>disk3</disk>
                    <disk>disk4</disk>
                </local>
                <remote>
                    <disk>disk_hdfs</disk>
                </remote>
            </volumes>
            <move_factor>0.2</move_factor>
        </hdfs_ttl>
        ```

        The move_factor setting controls space-based movement: when the share of free space on a volume drops below this ratio (here, 0.2), ClickHouse starts moving parts to the next volume, independent of any TTL rule.

        Note: You can also add this policy to the existing default storage policy instead of creating a new one.
4. Save the configuration.
    1. Click Save.
    2. In the dialog box, enter an execution reason, turn on Save and Deliver Configuration, and click Save.
5. Deploy the client configuration.
    1. On the Configure tab, click Deploy Client Configuration.
    2. In the dialog box, enter an execution reason and click OK.
    3. In the Confirm dialog box, click OK.
Step 2: Verify the configuration
1. Log on to the ClickHouse cluster over SSH. See Log on to a cluster.
2. Start the ClickHouse client:

    ```
    clickhouse-client -h core-1-1 -m
    ```

    Note: Replace core-1-1 with the name of the core node you logged on to. If the cluster has multiple core nodes, use any one.
3. Check that the HDFS disk appears in system.disks:

    ```sql
    SELECT * FROM system.disks;
    ```

    The configuration is correct if the output includes a row where name is disk_hdfs and type is hdfs:

    ```
    ┌─name──────┬─path─────────────────────────────────┬───────────free_space─┬──────────total_space─┬─keep_free_space─┬─type──┐
    │ default   │ /var/lib/clickhouse/                 │          83868921856 │          84014424064 │               0 │ local │
    │ disk1     │ /mnt/disk1/clickhouse/               │          83858436096 │          84003938304 │        10485760 │ local │
    │ disk2     │ /mnt/disk2/clickhouse/               │          83928215552 │          84003938304 │        10485760 │ local │
    │ disk3     │ /mnt/disk3/clickhouse/               │          83928301568 │          84003938304 │        10485760 │ local │
    │ disk4     │ /mnt/disk4/clickhouse/               │          83928301568 │          84003938304 │        10485760 │ local │
    │ disk_hdfs │ /var/lib/clickhouse/disks/disk_hdfs/ │ 18446744073709551615 │ 18446744073709551615 │               0 │ hdfs  │
    └───────────┴──────────────────────────────────────┴──────────────────────┴──────────────────────┴─────────────────┴───────┘
    ```
4. Check that the hdfs_ttl storage policy appears in system.storage_policies:

    ```sql
    SELECT * FROM system.storage_policies;
    ```

    The configuration is correct if the output includes two rows for hdfs_ttl, one for the local volume and one for the remote volume:

    ```
    ┌─policy_name─┬─volume_name─┬─volume_priority─┬─disks─────────────────────────────┬─volume_type─┬─max_data_part_size─┬─move_factor─┬─prefer_not_to_merge─┐
    │ default     │ single      │               1 │ ['disk1','disk2','disk3','disk4'] │ JBOD        │                  0 │           0 │                   0 │
    │ hdfs_ttl    │ local       │               1 │ ['disk1','disk2','disk3','disk4'] │ JBOD        │                  0 │         0.2 │                   0 │
    │ hdfs_ttl    │ remote      │               2 │ ['disk_hdfs']                     │ JBOD        │                  0 │         0.2 │                   0 │
    └─────────────┴─────────────┴─────────────────┴───────────────────────────────────┴─────────────┴────────────────────┴─────────────┴─────────────────────┘
    ```
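Once both checks pass, you can also see at any time which tables actually use the new policy. A small helper query (it returns no rows until you apply the policy to a table in Step 3):

```sql
SELECT database, name, storage_policy
FROM system.tables
WHERE storage_policy = 'hdfs_ttl';
```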
Step 3: Apply hot/cold separation to tables
Choose one of the following approaches depending on whether you are working with an existing table or creating a new one.
Apply to an existing table
1. Check the current storage policy of the table:

    ```sql
    SELECT storage_policy FROM system.tables
    WHERE database = '<database_name>' AND name = '<table_name>';
    ```

    If the output shows the default policy, the table stores all data on local disks. Continue to the next step to add a remote volume.
2. In the EMR console, add a remote volume to the default storage policy under storage_configuration:

    ```xml
    <default>
        <volumes>
            <single>
                <disk>disk1</disk>
                <disk>disk2</disk>
                <disk>disk3</disk>
                <disk>disk4</disk>
            </single>
            <!-- Add this remote volume. -->
            <remote>
                <disk>disk_hdfs</disk>
            </remote>
        </volumes>
        <!-- Required when the policy has more than one volume. -->
        <move_factor>0.2</move_factor>
    </default>
    ```
3. Modify the TTL on the table to move data older than 5 minutes to the remote volume (see the sketch after this procedure for optional follow-up queries):

    ```sql
    ALTER TABLE <yourDataName>.<yourTableName>
    MODIFY TTL toStartOfMinute(addMinutes(t, 5)) TO VOLUME 'remote';
    ```
4. Verify that data parts are distributed across volumes:

    ```sql
    SELECT partition, name, path FROM system.parts
    WHERE database = '<yourDataName>' AND table = '<yourTableName>' AND active = 1;
    ```

    The separation is working if you see paths on both local disks (/mnt/disk{1..4}/clickhouse/...) and on the HDFS metadata path (/var/lib/clickhouse/disks/disk_hdfs/...). Hot data (within the TTL window) stays on local disks; cold data (past the TTL) moves to HDFS. Example output:

    ```
    ┌─partition───────────┬─name─────────────────┬─path──────────────────────────────────────────────────────────────────────────────────────────────────┐
    │ 2022-01-12 11:30:00 │ 1641958200_1_96_3    │ /var/lib/clickhouse/disks/disk_hdfs/store/156/156008ff-41bf-460c-8848-e34fad88c25d/1641958200_1_96_3/ │
    │ 2022-01-12 11:35:00 │ 1641958500_97_124_2  │ /mnt/disk3/clickhouse/store/156/156008ff-41bf-460c-8848-e34fad88c25d/1641958500_97_124_2/             │
    │ 2022-01-12 11:35:00 │ 1641958500_125_152_2 │ /mnt/disk4/clickhouse/store/156/156008ff-41bf-460c-8848-e34fad88c25d/1641958500_125_152_2/            │
    │ 2022-01-12 11:35:00 │ 1641958500_153_180_2 │ /mnt/disk1/clickhouse/store/156/156008ff-41bf-460c-8848-e34fad88c25d/1641958500_153_180_2/            │
    │ 2022-01-12 11:35:00 │ 1641958500_181_186_1 │ /mnt/disk4/clickhouse/store/156/156008ff-41bf-460c-8848-e34fad88c25d/1641958500_181_186_1/            │
    │ 2022-01-12 11:35:00 │ 1641958500_187_192_1 │ /mnt/disk3/clickhouse/store/156/156008ff-41bf-460c-8848-e34fad88c25d/1641958500_187_192_1/            │
    └─────────────────────┴──────────────────────┴────────────────────────────────────────────────────────────────────────────────────────────────────────┘
    ```
Create a new table with hot/cold separation
Use the hdfs_ttl storage policy and a TTL ... TO VOLUME 'remote' clause when creating the table.
Syntax
```sql
CREATE TABLE <yourDataName>.<yourTableName> [ON CLUSTER cluster_emr]
(
    column1 Type1,
    column2 Type2,
    ...
)
ENGINE = MergeTree() -- Replicated*MergeTree engines are also supported.
PARTITION BY <yourPartitionKey>
ORDER BY <yourOrderKey>
TTL <yourTtlKey> TO VOLUME 'remote'
SETTINGS storage_policy = 'hdfs_ttl';
```
Example
The following table keeps data from the last 5 minutes on local disks. After 5 minutes, data moves to the remote volume on HDFS.
```sql
CREATE TABLE test.test
(
    `id` UInt32,
    `t` DateTime
)
ENGINE = MergeTree()
PARTITION BY toStartOfFiveMinute(t)
ORDER BY id
TTL toStartOfMinute(addMinutes(t, 5)) TO VOLUME 'remote'
SETTINGS storage_policy = 'hdfs_ttl';
```
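To watch the policy work, you can insert one row that is already past the 5-minute window and one that is inside it, then check where the resulting parts live. A quick sketch against the test.test table above (the move to HDFS happens in the background, so the cold part may take a moment to relocate):

```sql
-- One cold row (10 minutes old) and one hot row (current time):
INSERT INTO test.test VALUES (1, now() - INTERVAL 10 MINUTE), (2, now());

-- The part holding the cold row should eventually report disk_name = 'disk_hdfs':
SELECT name, disk_name
FROM system.parts
WHERE database = 'test' AND table = 'test' AND active = 1;
```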
Additional parameters
The following optional parameters give you finer control over replication behavior.
server-config
merge_tree.allow_remote_fs_zero_copy_replication: Set this parameter to true to enable zero-copy replication for Replicated*MergeTree engines. Instead of duplicating the data itself, ClickHouse replicates only the metadata that points to the files on the HDFS disk, so multiple replicas of the same shard share a single copy of the data.
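To confirm the flag is active after the configuration is delivered and the service restarts, you can query the merge tree settings table. A quick check, assuming the setting is exposed under this name in your ClickHouse version:

```sql
SELECT name, value
FROM system.merge_tree_settings
WHERE name = 'allow_remote_fs_zero_copy_replication';
```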
server-users
profile.${your-profile-name}.hdfs_replication: Controls the number of data replicas written to HDFS.
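Because this is a profile-level setting, you can confirm the value your session resolves to directly from the client. A quick check, assuming the setting is exposed under this name in your ClickHouse version:

```sql
SELECT name, value, changed
FROM system.settings
WHERE name = 'hdfs_replication';
```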