Configure the ClickHouse service in an E-MapReduce (EMR) ClickHouse cluster using Extensible Markup Language (XML) configuration files. Parameters are organized across three configuration tabs in the EMR console: client-config, server-config, and server-metrika. User permission parameters are managed separately.
Prerequisites
Before you begin, make sure that you have:
- A running EMR ClickHouse cluster. See Create a ClickHouse cluster.
Usage notes
All three configuration tabs generate XML files. Take note of these rules when adding custom parameters:
- If a parameter belongs to the yandex tag, add it directly; no console action is needed.
- For nested parameters, separate each layer with a period (.). For example, to set a password for a user named aliyun on the server-users tab, use users.aliyun.password as the parameter name.
- Do not specify parameter names or values in XML format when adding custom parameters through the console.
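
The period-separated naming rule can be pictured as a path through nested XML elements. The following Python sketch shows one plausible expansion of users.aliyun.password under a yandex root. The function name, the sample value, and the exact layout of the generated file are assumptions for illustration; the console performs the real conversion.

```python
import xml.etree.ElementTree as ET


def dotted_to_xml(name: str, value: str, root_tag: str = "yandex") -> str:
    """Expand a period-separated parameter name into nested XML elements.

    Hypothetical helper: it mirrors the naming rule described above, not the
    console's actual implementation.
    """
    root = ET.Element(root_tag)
    node = root
    # Each layer of the dotted name becomes one nested element.
    for part in name.split("."):
        node = ET.SubElement(node, part)
    node.text = value
    return ET.tostring(root, encoding="unicode")


# "example-password" is a placeholder value.
print(dotted_to_xml("users.aliyun.password", "example-password"))
```

This is why the console expects a plain dotted name and a plain value: the XML nesting is derived from the name, so supplying XML markup yourself would be redundant.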
Configuration tabs reference
| Tab | Generated file | Description |
|---|---|---|
| client-config | config.xml | ClickHouse client configuration |
| server-config | config.xml | ClickHouse server configuration |
| server-metrika | metrika.xml | Extended parameters, including ZooKeeper, shards, replicas, and compression. Referenced by the server config.xml by default. |
| User permissions | - | See Configure user permissions. |
client-config
Parameters on the client-config tab configure the ClickHouse client. Navigate to the ClickHouse service page in the EMR console, click the Configure tab, and then click client-config.
| Parameter | Default | Description |
|---|---|---|
| user | default | Username for logging on to the ClickHouse client. |
| password | (blank) | Password for logging on to the ClickHouse client. |
| prompt_by_server_display_name.production | - | Custom prompt for the ClickHouse client. The prompt changes based on the display_name value set on the server-config tab. If display_name is set to default, the client uses prompt_by_server_display_name.default. For prompt color options, see Color prompts with readline and tip_colors_and_formatting. |
| prompt_by_server_display_name.default | - | Prompt used when display_name is default. |
| prompt_by_server_display_name.test | - | Prompt used when display_name is test. |
server-config
Parameters on the server-config tab configure the ClickHouse server. Navigate to the ClickHouse service page in the EMR console, click the Configure tab, and then click server-config.
Network
| Parameter | Default | Description |
|---|---|---|
| tcp_port | 9000 | TCP port for ClickHouse client communication. |
| http_port | 8123 | HTTP port for ClickHouse server communication. Open source ClickHouse Java Database Connectivity (JDBC) drivers (see clickhouse-jdbc) also use this port. |
| listen_host | 0.0.0.0 | IP address the ClickHouse server listens on. Accepts IPv4 or IPv6 addresses. Set to :: to listen on all interfaces. Separate multiple addresses with commas, for example: 127.0.0.1,localhost. |
| keep_alive_timeout | 10 | Seconds ClickHouse waits for incoming HTTP requests before closing an idle connection. |
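
As an illustration of how http_port is used, the sketch below builds a request URL for the ClickHouse HTTP interface, which accepts a statement in the query URL parameter. The host name is hypothetical and the helper names are illustrative. run_query is defined but not called here because it needs a reachable server.

```python
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def clickhouse_http_url(host: str, query: str, port: int = 8123) -> str:
    """Build a URL for the ClickHouse HTTP interface (default http_port 8123)."""
    # quote_via=quote encodes spaces as %20 rather than '+'.
    params = urlencode({"query": query}, quote_via=quote)
    return f"http://{host}:{port}/?{params}"


def run_query(host: str, query: str, port: int = 8123, timeout: float = 10.0) -> str:
    """Send a read-only query over HTTP and return the response body as text."""
    with urlopen(clickhouse_http_url(host, query, port), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Hypothetical host name; replace it with a node of your cluster.
print(clickhouse_http_url("emr-header-1.cluster-12345", "SELECT 1"))
```

If http_port is changed on the server-config tab, pass the new value as port. Clients that use the native protocol connect to tcp_port instead.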
Logging
The logger.level parameter sets log verbosity. The logger.size and logger.count parameters control the log rotation policy.
| Parameter | Default | Description |
|---|---|---|
| logger.level | information | Log level. Valid values in order of urgency: none (logging disabled), fatal, critical, error, warning, notice, information, debug, trace. |
| logger.size | 1000M | Maximum size of a log file. When this limit is reached, ClickHouse archives and renames the file and creates a new one. |
| logger.count | 10 | Maximum number of archived log files to keep. When the limit is reached, the oldest archived files are deleted. |
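
The rotation settings put a rough ceiling on the disk space that one log file and its archives can occupy. The sketch below estimates that ceiling as the active file plus logger.count archives. It assumes the M suffix denotes 1024 × 1024 bytes and that files are rotated close to the size limit; both are assumptions, and the helper names are illustrative.

```python
# Assumption: size suffixes are binary multiples (K = 1024 bytes, and so on).
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text: str) -> int:
    """Convert a size such as '1000M' into bytes."""
    text = text.strip()
    suffix = text[-1].upper() if text else ""
    if suffix in UNITS:
        return int(text[:-1]) * UNITS[suffix]
    return int(text)


def max_log_bytes(size: str, count: int) -> int:
    """Approximate upper bound: one active file plus `count` archived files."""
    return parse_size(size) * (count + 1)


# Default values from the table above.
estimate = max_log_bytes("1000M", 10)
print(f"{estimate} bytes, about {estimate / 1024 ** 3:.2f} GiB")
```

An estimate like this helps when sizing the log disk before raising logger.count or logger.size.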
Sessions and queries
| Parameter | Default | Unit | Description |
|---|---|---|---|
| max_session_timeout | 3600 | seconds | Maximum session timeout. |
| default_session_timeout | 60 | seconds | Default session timeout. |
| max_concurrent_queries | 0 | - | Maximum number of queries processed in parallel. 0 means no limit. |
Caching
| Parameter | Default | Description |
|---|---|---|
| uncompressed_cache_size | 0 | Cache size for decompressed blocks when using the MergeTree table engine. 0 disables caching. |
| mark_cache_size | 5368709120 | Approximate size (in bytes) of the mark index cache for MergeTree table engine tables. |
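
Because mark_cache_size is expressed in bytes, it can help to derive the value from a size in GiB. The short sketch below shows that the default of 5368709120 corresponds to 5 GiB. The helper name is illustrative.

```python
def gib_to_bytes(gib: int) -> int:
    """Convert a size in GiB to bytes, the unit mark_cache_size expects."""
    return gib * 1024 ** 3


# The default mark_cache_size equals 5 GiB.
assert gib_to_bytes(5) == 5368709120
print(gib_to_bytes(5))
```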
Database
| Parameter | Default | Description |
|---|---|---|
| default_database | default | Default database name. |
| default_profile | default | Default settings profile name. |
| timezone | Asia/Shanghai | Time zone of the ClickHouse server. |
Distributed DDL
| Parameter | Default | Description |
|---|---|---|
| distributed_ddl.path | /clickhouse/task_queue/ddl | ZooKeeper path for the DDL query queue. By default, CREATE, DROP, ALTER, and RENAME statements affect only the node that processes the query. Setting distributed_ddl parameters enables these statements to run across the entire ClickHouse cluster. Requires ZooKeeper to be enabled. |
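
With distributed DDL configured, a statement is directed to every node by adding the ON CLUSTER clause of ClickHouse SQL. The sketch below only builds such a statement as a string. The database name and the cluster name are hypothetical; the cluster name must match one defined in clickhouse_remote_servers.

```python
import re

# Conservative identifier check to avoid building malformed SQL.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def create_database_on_cluster(database: str, cluster: str) -> str:
    """Build a CREATE DATABASE statement that runs on all nodes of a cluster."""
    for name in (database, cluster):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"invalid identifier: {name!r}")
    return f"CREATE DATABASE IF NOT EXISTS {database} ON CLUSTER {cluster}"


# "demo_db" and "cluster_emr" are placeholder names for illustration.
print(create_database_on_cluster("demo_db", "cluster_emr"))
```

Without the ON CLUSTER clause, the same statement affects only the node that receives it.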
Replication
| Parameter | Default | Description |
|---|---|---|
| merge_tree.allow_remote_fs_zero_copy_replication | - | Set to true to enable metadata replication for the Replicated\*MergeTree engine type. With this enabled, metadata pointing to the Hadoop Distributed File System (HDFS) disk is replicated to generate multiple metadata replicas for the same shard. |
| transaction.enable_public_ip | - | By default, ClickHouse uses a private IP address to identify transactions. Set to true to use a public IP address instead. All nodes must have public IP addresses assigned. |
server-metrika
Parameters on the server-metrika tab generate the metrika.xml file, which the ClickHouse server config.xml references by default. Navigate to the ClickHouse service page in the EMR console, click the Configure tab, and then click server-metrika.
| Parameter | Default | Description |
|---|---|---|
| clickhouse_compression | (blank) | Data compression settings for MergeTree engine tables. By default, this parameter is left empty. Set this parameter to enable data compression. For more information, see Server Settings. |
| storage_configuration | - | Custom disk configuration. |
| zookeeper_servers | ZooKeeper server created at cluster creation | ZooKeeper server information for the ClickHouse cluster. Separate multiple servers with commas, for example: emr-header-1.cluster-12345:2181,emr-worker-1.cluster-12345:2181,emr-worker-2.cluster-12345:2181. |
| quotas_default | - | Quota settings for resource limits. Configures the quota named default. Add custom quotas as needed. |
| clickhouse_remote_servers | Topology based on shard and replica count at cluster creation | Shard and replica configuration for the ClickHouse cluster. Important: Change only when necessary. Manually modifying the shard count, replica count, or topology may cause errors when writing data to or querying data from the cluster. |
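
The comma-separated zookeeper_servers format can be checked before it is entered in the console. The sketch below splits a value into host and port pairs and rejects entries without a port. The helper name is illustrative, and the requirement that every entry carries a port is an assumption based on the example above.

```python
from typing import List, Tuple


def parse_zookeeper_servers(value: str) -> List[Tuple[str, int]]:
    """Split 'host:port,host:port' into a list of (host, port) pairs."""
    servers = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        host, sep, port = item.rpartition(":")
        if not sep or not host:
            raise ValueError(f"expected host:port, got {item!r}")
        servers.append((host, int(port)))
    return servers


# Example value taken from the zookeeper_servers description.
example = (
    "emr-header-1.cluster-12345:2181,"
    "emr-worker-1.cluster-12345:2181,"
    "emr-worker-2.cluster-12345:2181"
)
for host, port in parse_zookeeper_servers(example):
    print(host, port)
```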
What to do next
To modify or add configuration parameters, see Manage configuration items.