The configuration parameters for the ClickHouse service of an E-MapReduce (EMR) ClickHouse cluster include client parameters, server parameters, user permission parameters, and extended parameters. This topic describes how to configure the ClickHouse client, ClickHouse server, and extended parameters for the ClickHouse service.

Background information

The following table lists where to find the configuration parameters for each part of the ClickHouse service.
Item References
ClickHouse client client-config
ClickHouse server server-config
Extended parameters server-metrika
User permissions Configure user permissions

Before you begin

An EMR ClickHouse cluster is created. For more information, see Create a ClickHouse cluster.

Usage notes

Extensible Markup Language (XML) files are used to configure the ClickHouse service. An XML file can contain nested parameters and nested parameter values. Take note of the following rules when you add custom parameters:
  • If a parameter belongs directly under the yandex root tag, add the parameter by its name only. You do not need to specify the yandex tag in the EMR console.
  • If a nested parameter is used, separate layers in the nested parameter with periods (.).

    For example, to set a password for a new user named aliyun, add the nested parameter users.aliyun.password on the server-users tab and set its value to the password that you want to use.

  • When you add a custom parameter, do not specify the parameter name and parameter value in XML format.
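The naming rules above can be illustrated with a short sketch that expands period-separated parameter names into nested XML elements under the yandex root tag. This is a minimal model, not the code that EMR uses to generate configuration files. The function name build_config and the values MyPassw0rd and default are hypothetical and used only for illustration.

```python
import xml.etree.ElementTree as ET


def build_config(params: dict[str, str], root_tag: str = "yandex") -> str:
    """Expand period-separated parameter names into nested XML elements.

    Hypothetical helper that models the naming rule; the files EMR
    actually generates may contain additional elements.
    """
    root = ET.Element(root_tag)
    for dotted_name, value in params.items():
        node = root
        # Each period-separated segment becomes one level of nesting.
        for part in dotted_name.split("."):
            child = node.find(part)
            if child is None:
                child = ET.SubElement(node, part)
            node = child
        # The parameter value becomes the text of the innermost element.
        node.text = value
    return ET.tostring(root, encoding="unicode")


# Both parameters share the users.aliyun prefix, so they end up
# under the same <aliyun> element.
print(build_config({
    "users.aliyun.password": "MyPassw0rd",
    "users.aliyun.profile": "default",
}))
# prints <yandex><users><aliyun><password>MyPassw0rd</password><profile>default</profile></aliyun></users></yandex>
```

This is why you enter users.aliyun.password as the parameter name rather than writing the XML elements yourself.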

client-config

The parameters on the client-config tab are used to generate the config.xml file that is used by a ClickHouse client. You can go to the ClickHouse service page of the EMR console, click client-config on the Configure tab, and then set the following parameters.

Parameter Description
user The username that is used to log on to the ClickHouse client. Default value: default.
password The password that is used to log on to the ClickHouse client. By default, this parameter is left empty.
prompt_by_server_display_name.production The custom prompts of the ClickHouse client. The client selects a prompt based on the value of the display_name parameter on the server-config tab. For example, if you set the display_name parameter to default, the client uses the value of the prompt_by_server_display_name.default parameter on the client-config tab as the prompt. For more information about prompt colors, see Color prompts with readline and tip_colors_and_formatting.
prompt_by_server_display_name.default
prompt_by_server_display_name.test
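The prompt selection described above can be modeled with a short sketch. It assumes that the client uses the first non-default key that occurs in the server's display_name, falls back to the default entry otherwise, and substitutes {display_name} in the chosen template. This is a simplified model, not the ClickHouse client's implementation, and the exact matching rules may differ between ClickHouse versions. The function name choose_prompt and the prompt templates are hypothetical.

```python
def choose_prompt(prompts: dict[str, str], display_name: str) -> str:
    """Pick a prompt template for a server display_name (simplified model).

    prompts maps the suffix of prompt_by_server_display_name.<suffix>
    (for example, default, test, production) to a prompt template.
    """
    # Assumption: a non-default key that occurs in display_name wins.
    for key, template in prompts.items():
        if key != "default" and key in display_name:
            return template.replace("{display_name}", display_name)
    # Otherwise fall back to prompt_by_server_display_name.default.
    return prompts.get("default", "").replace("{display_name}", display_name)


# Hypothetical templates without color codes.
prompts = {
    "default": "{display_name} :) ",
    "test": "{display_name} [test] :) ",
    "production": "{display_name} [prod] :) ",
}
print(choose_prompt(prompts, "production-01"))  # prints production-01 [prod] :)
```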

server-config

The parameters on the server-config tab are used to generate the config.xml file that is used by a ClickHouse server. You can go to the ClickHouse service page of the EMR console, click server-config on the Configure tab, and then set the following parameters.

Parameter Description
tcp_port The TCP port that is used to communicate with the ClickHouse client. Default value: 9000.
logger.count The maximum number of archived ClickHouse log files. If the number of archived log files reaches the value of this parameter, ClickHouse deletes the earliest archived log files. Default value: 10.
logger.level The logging level. Default value: information. Valid values, in descending order of severity: none, fatal, critical, error, warning, notice, information, debug, and trace. A value of none indicates that logging is disabled.
logger.size The maximum size of a log file. If the size of a log file reaches the value of this parameter, ClickHouse archives and renames the log file and creates another log file. Default value: 1000M.
distributed_ddl.path The ZooKeeper path in which the queue of distributed DDL queries is stored. Default value: /clickhouse/task_queue/ddl. By default, CREATE, DROP, ALTER, and RENAME statements affect only the server on which they are executed. To run such a statement on all servers in the ClickHouse cluster, add an ON CLUSTER clause to the statement. The parameters that are prefixed with distributed_ddl configure how these distributed DDL queries are processed. These parameters take effect only if ZooKeeper is enabled.
default_database The name of the default database. Default value: default.
uncompressed_cache_size The size of the cache for uncompressed blocks of tables that use the MergeTree table engine. Unit: bytes. Default value: 0.

The default value 0 disables this cache.

timezone The time zone of the ClickHouse server. Default value: Asia/Shanghai.
max_session_timeout The maximum session timeout. Unit: seconds. Default value: 3600.
default_session_timeout The default session timeout. Unit: seconds. Default value: 60.
max_concurrent_queries The maximum number of queries that can be processed concurrently. Default value: 0. A value of 0 indicates that the number is not limited.
keep_alive_timeout The number of seconds that ClickHouse waits for an incoming request before it closes an idle HTTP keep-alive connection. Unit: seconds. Default value: 10.
http_port The HTTP port that is used to communicate with the ClickHouse server. Default value: 8123.

The open source ClickHouse Java Database Connectivity (JDBC) driver also connects to a ClickHouse cluster through this port. For more information, see clickhouse-jdbc.

listen_host The IP address on which the ClickHouse server listens. You can set this parameter to an IPv4 or IPv6 address. If you set this parameter to ::, all IP addresses are allowed. You can configure multiple IP addresses. Separate multiple IP addresses with commas (,). Example: 127.0.0.1,localhost. Default value: 0.0.0.0.
default_profile The name of the default settings profile. Default value: default.
mark_cache_size The approximate size of the cache for marks of tables that use the MergeTree table engine. Unit: bytes. Default value: 5368709120 (5 GiB).
merge_tree.allow_remote_fs_zero_copy_replication Specifies whether to enable zero-copy replication for data that is stored on an HDFS disk. If you set this parameter to true, tables that use a Replicated*MergeTree engine replicate only the metadata that points to the data on the HDFS disk. This way, the replicas of the same shard in the ClickHouse cluster share one copy of the data.

transaction.enable_public_ip Specifies whether the ClickHouse server uses a public IP address to identify transactions. By default, a private IP address is used.

To use public IP addresses, set this parameter to true. In this case, you must assign a public IP address to every node in the cluster.
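To illustrate the http_port parameter, the following sketch sends a query to the HTTP interface of open source ClickHouse, which accepts the SQL statement in the query URL parameter and reads credentials from the X-ClickHouse-User and X-ClickHouse-Key headers. It is a minimal sketch that assumes the default port 8123 and that the client can reach the node over the network. The helper names build_query_url and run_query are hypothetical. For production use, a maintained client library or the JDBC driver is usually a better choice.

```python
import urllib.parse
import urllib.request


def build_query_url(host: str, port: int, query: str) -> str:
    """Build an HTTP interface URL that passes the SQL text as ?query=..."""
    return f"http://{host}:{port}/?" + urllib.parse.urlencode({"query": query})


def run_query(host: str, query: str, port: int = 8123,
              user: str = "default", password: str = "") -> str:
    """Send a GET request (read-only in ClickHouse) and return the body as text."""
    request = urllib.request.Request(
        build_query_url(host, port, query),
        # Credentials go in headers so that they do not appear in the URL.
        headers={"X-ClickHouse-User": user, "X-ClickHouse-Key": password},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8")
```

For example, run_query("emr-header-1.cluster-12345", "SELECT version()") returns the server version as text. The host name is hypothetical; replace it with a node of your cluster, and pass the user and password values that you configured.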

server-metrika

The parameters on the server-metrika tab are used to generate the metrika.xml file. By default, the metrika.xml file is referenced by the config.xml file of the ClickHouse server. You can go to the ClickHouse service page of the EMR console, click server-metrika on the Configure tab, and then set the following parameters.

Parameter Description
clickhouse_compression The data compression settings for tables that use the MergeTree table engine. For more information, see Server Settings. By default, this parameter is left empty.

Set this parameter if you want to customize how data is compressed, for example, to use a different compression method.

storage_configuration The custom disk and storage policy settings.
zookeeper_servers The ZooKeeper servers that are used by the ClickHouse cluster. By default, this parameter is set to the ZooKeeper server that is created along with the ClickHouse cluster. You can specify multiple ZooKeeper servers in the host:port format and separate them with commas (,). Example: emr-header-1.cluster-12345:2181,emr-worker-1.cluster-12345:2181,emr-worker-2.cluster-12345:2181.
quotas_default The settings of the quota named default. Quotas limit the resources that users can consume. You can configure multiple quotas to adjust resource limits flexibly, and you can add custom quota settings.
clickhouse_remote_servers The shard and replica topology of the ClickHouse cluster. By default, the topology is generated based on the numbers of shards and replicas that you specify when you create the ClickHouse cluster.
Important Change the value of this parameter only when necessary. We recommend that you do not manually change the number of shards, the number of replicas, or the topology. Otherwise, errors may occur when you write data to or query data from the ClickHouse cluster.
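The zookeeper_servers and clickhouse_remote_servers parameters correspond to the zookeeper and remote_servers sections of an open source ClickHouse configuration. The following sketch builds those two sections from a comma-separated ZooKeeper list and a list of hosts that are grouped into shards. It is illustrative only: this topic does not show the exact metrika.xml content that EMR generates, and the function names, the cluster name cluster_emr, and the host names h1 to h4 are hypothetical.

```python
import xml.etree.ElementTree as ET


def zookeeper_element(servers: str) -> ET.Element:
    """Expand a zookeeper_servers value ("host:port,host:port") into <zookeeper>."""
    zk = ET.Element("zookeeper")
    for entry in servers.split(","):
        host, sep, port = entry.strip().rpartition(":")
        if not sep:
            raise ValueError(f"expected host:port, got {entry!r}")
        node = ET.SubElement(zk, "node")
        ET.SubElement(node, "host").text = host
        ET.SubElement(node, "port").text = port
    return zk


def remote_servers_element(cluster: str, hosts: list[str], shards: int,
                           replicas: int, tcp_port: int = 9000) -> ET.Element:
    """Group hosts into shards of `replicas` hosts each, in list order."""
    if len(hosts) != shards * replicas:
        raise ValueError("expected exactly shards * replicas hosts")
    remote = ET.Element("remote_servers")
    cluster_el = ET.SubElement(remote, cluster)
    for s in range(shards):
        shard = ET.SubElement(cluster_el, "shard")
        # true: a Distributed table writes each row to one replica and relies
        # on the Replicated*MergeTree engine to copy it to the other replicas.
        ET.SubElement(shard, "internal_replication").text = "true"
        for host in hosts[s * replicas:(s + 1) * replicas]:
            replica = ET.SubElement(shard, "replica")
            ET.SubElement(replica, "host").text = host
            # tcp_port on the server-config tab; 9000 by default.
            ET.SubElement(replica, "port").text = str(tcp_port)
    return remote


zk = zookeeper_element("emr-header-1.cluster-12345:2181,emr-worker-1.cluster-12345:2181")
topology = remote_servers_element("cluster_emr", ["h1", "h2", "h3", "h4"], shards=2, replicas=2)
print(ET.tostring(topology, encoding="unicode"))
```

The strict host-count check reflects the advice above: the topology depends on the numbers of shards and replicas together, so changing one of them by hand can leave the cluster definition inconsistent.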

References

For more information about the ClickHouse parameters, see the following official documentation:

What to do next

For more information about how to modify or add parameters, see Manage configuration items.