The configuration parameters for the ClickHouse service of an E-MapReduce (EMR) ClickHouse cluster include client parameters, server parameters, user permission parameters, and extended parameters. This topic describes how to configure the ClickHouse client, ClickHouse server, and extended parameters for the ClickHouse service.
Background information
Item | References |
---|---|
ClickHouse client | client-config |
ClickHouse server | server-config |
Extended parameters | server-metrika |
User permissions | Configure user permissions |
Before you begin
An EMR ClickHouse cluster is created. For more information, see Create a ClickHouse cluster.
Usage notes
- If you can add parameters to the yandex tag, directly add the parameters. This eliminates the need to add the parameters in the EMR console.
- If a nested parameter is used, separate the layers in the nested parameter with periods (.).
For example, on the server-users tab, you can configure the nested parameter users.aliyun.password for a newly added user named aliyun. The value of this parameter is a password. You can specify a custom password.
- When you add a custom parameter, do not specify the parameter name and parameter value in XML format.
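The period-separated naming described above can be pictured as a path through nested XML elements. The following Python sketch is only an illustration of that correspondence, not the logic that EMR uses to render configuration files. The helper name add_dotted_parameter, the sample password, and the second parameter users.aliyun.profile are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def add_dotted_parameter(root: ET.Element, dotted_name: str, value: str) -> ET.Element:
    """Create (or reuse) one nested element per period-separated layer,
    then store the value as the text of the innermost element."""
    node = root
    for layer in dotted_name.split("."):
        child = node.find(layer)
        if child is None:
            child = ET.SubElement(node, layer)
        node = child
    node.text = value
    return node


# The yandex root tag comes from the usage notes; the values are illustrative.
root = ET.Element("yandex")
add_dotted_parameter(root, "users.aliyun.password", "MyCustomPassword")
add_dotted_parameter(root, "users.aliyun.profile", "default")  # hypothetical extra parameter

rendered = ET.tostring(root, encoding="unicode")
print(rendered)
```

In this sketch, both parameters end up under the same users and aliyun elements, which is why the name and value are entered as a plain dotted key and a plain value rather than as XML.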
client-config
The parameters on the client-config tab are used to generate the config.xml file that is used by a ClickHouse client. You can go to the ClickHouse service page of the EMR console, click client-config on the Configure tab, and then set the following parameters.
Parameter | Description |
---|---|
user | The username that is used to log on to the ClickHouse client. Default value: default. |
password | The password that is used to log on to the ClickHouse client. By default, this parameter is left empty. |
prompt_by_server_display_name.production | The prompt that is customized for the ClickHouse client. The prompt varies based on the value of the display_name parameter on the server-config tab. For example, if you set the display_name parameter to default, the prompt is the value of the prompt_by_server_display_name.default parameter on the client-config tab. For more information about the color of prompts, see Color prompts with readline and tip_colors_and_formatting. |
prompt_by_server_display_name.default | See the description of prompt_by_server_display_name.production. |
prompt_by_server_display_name.test | See the description of prompt_by_server_display_name.production. |
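The relationship between display_name and the prompt_by_server_display_name parameters can be sketched as a simple lookup. The following Python snippet is a hedged illustration only. The prompt strings are made up, the {display_name} placeholder is modeled on the prompt templates of the open source ClickHouse client, and the snippet does not model what the client does when no entry matches.

```python
from typing import Dict, Optional

# Hypothetical prompt templates keyed by the server's display_name.
PROMPTS: Dict[str, str] = {
    "production": "{display_name} [prod] :) ",
    "default": "{display_name} :) ",
    "test": "{display_name} [test] :) ",
}


def select_prompt(display_name: str, prompts: Dict[str, str]) -> Optional[str]:
    """Return the prompt configured for display_name, or None if there is no entry."""
    template = prompts.get(display_name)
    if template is None:
        return None
    # Assumption: the placeholder is replaced with the server's display name.
    return template.replace("{display_name}", display_name)


prompt = select_prompt("default", PROMPTS)
print(repr(prompt))
```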
server-config
The parameters on the server-config tab are used to generate the config.xml file that is used by a ClickHouse server. You can go to the ClickHouse service page of the EMR console, click server-config on the Configure tab, and then set the following parameters.
Parameter | Description |
---|---|
tcp_port | The TCP port that is used to communicate with the ClickHouse client. Default value: 9000. |
logger.count | The maximum number of archived ClickHouse log files. If the number of archived log files reaches the value of this parameter, ClickHouse deletes the earliest archived log files. Default value: 10. |
logger.level | The log level. Default value: information. Valid values, listed from least to most verbose: none, fatal, critical, error, warning, notice, information, debug, and trace. A value of none indicates that logging is disabled. |
logger.size | The maximum size of a log file. If the size of a log file reaches the value of this parameter, ClickHouse archives and renames the log file and creates another log file. Default value: 1000M. |
distributed_ddl.path | The path in ZooKeeper that stores the queue of DDL queries. Default value: /clickhouse/task_queue/ddl. Unless otherwise specified, the CREATE, DROP, ALTER, and RENAME statements that are executed in the ClickHouse cluster affect only the server that processes the query. You can set the parameters that are prefixed with distributed_ddl to allow these statements to be run across the ClickHouse cluster. These parameters take effect only if ZooKeeper is enabled. |
default_database | The name of the default database. Default value: default. |
uncompressed_cache_size | The cache size of the decompressed block if the MergeTree table engine is used. Default value: 0. If you use the default value, caching is disabled. |
timezone | The time zone of the ClickHouse server. Default value: Asia/Shanghai. |
max_session_timeout | The maximum session timeout. Unit: seconds. Default value: 3600. |
default_session_timeout | The default session timeout. Unit: seconds. Default value: 60. |
max_concurrent_queries | The maximum number of queries that can be processed in parallel. Default value: 0. |
keep_alive_timeout | The period of time that the ClickHouse server waits for an incoming request before it closes the existing connection. Unit: seconds. Default value: 10. |
http_port | The HTTP port that is used to communicate with the ClickHouse server. Default value: 8123. The open source ClickHouse JDBC driver also uses this port to access a ClickHouse cluster. For more information, see clickhouse-jdbc. |
listen_host | The IP address on which the ClickHouse server listens. You can set this parameter to an IPv4 or IPv6 address. If you set this parameter to ::, all IP addresses are allowed. You can configure multiple IP addresses. Separate multiple IP addresses with commas (,). Example: 127.0.0.1,localhost. Default value: 0.0.0.0. |
default_profile | The default name of the profile. Default value: default. |
mark_cache_size | The approximate size of the cache that is used by the mark index if the MergeTree table engine is used. Default value: 5368709120. Unit: bytes. |
merge_tree.allow_remote_fs_zero_copy_replication | Set the value to true. This way, tables that use an engine of the Replicated*MergeTree type replicate the metadata that points to the HDFS disk, which generates multiple metadata replicas for the same shard in the ClickHouse cluster. |
transaction.enable_public_ip | Specifies whether a public IP address is used to identify a transaction in the ClickHouse server. By default, a private IP address is used. Set the value to true to use a public IP address. In this case, you must assign public IP addresses to all nodes. |
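As a quick way to see how the http_port parameter is used, the following Python sketch builds a request URL for the HTTP interface of a ClickHouse server and, optionally, sends a query to it. It is a minimal sketch under stated assumptions: the hostname is a placeholder, the function names are hypothetical, the server is assumed to be reachable from your machine, and the default user is assumed to have an empty password.

```python
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def build_query_url(host: str, query: str, http_port: int = 8123) -> str:
    """Build a URL that passes a SQL statement in the query parameter."""
    params = urlencode({"query": query}, quote_via=quote)
    return f"http://{host}:{http_port}/?{params}"


def run_query(host: str, query: str, http_port: int = 8123, timeout: float = 5.0) -> str:
    """Send the query over HTTP and return the response body as text.
    This performs a network call, so it is not invoked in this sketch."""
    with urlopen(build_query_url(host, query, http_port), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Placeholder hostname; replace it with the address of a node in your cluster.
url = build_query_url("emr-header-1.cluster-12345", "SELECT 1")
print(url)
```

If you change the value of http_port on the server-config tab, pass the new port to build_query_url instead of relying on the default of 8123.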
server-metrika
The parameters on the server-metrika tab are used to generate the metrika.xml file. By default, the metrika.xml file is referenced by the config.xml file of the ClickHouse server. You can go to the ClickHouse service page of the EMR console, click server-metrika on the Configure tab, and then set the following parameters.
Parameter | Description |
---|---|
clickhouse_compression | The data compression settings for tables that use the MergeTree engine. For more information, see Server Settings. By default, this parameter is left empty. You can set this parameter if you want to enable data compression. |
storage_configuration | The custom disk information. |
zookeeper_servers | The information about ZooKeeper servers that are used to configure a ClickHouse cluster. The default value is the information of a ZooKeeper server that is created when you create a ClickHouse cluster. You can specify multiple ZooKeeper servers. Separate the information of the ZooKeeper servers with commas (,), such as emr-header-1.cluster-12345:2181,emr-worker-1.cluster-12345:2181,emr-worker-2.cluster-12345:2181. |
quotas_default | You can configure multiple quotas to flexibly adjust resource limits. This parameter specifies the value of the quota that is named default. You can add custom quota settings. |
clickhouse_remote_servers | The information about shards and replicas that you configure for a ClickHouse cluster. The default value is the topology that is generated based on the numbers of shards and replicas that are configured when you create the ClickHouse cluster. Important: Change the value of this parameter only when it is necessary. We recommend that you do not manually change the number of shards, the number of replicas, or the topology. Otherwise, errors may occur when you write data to or query data from a ClickHouse cluster. |
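To make the comma-separated format of the zookeeper_servers parameter concrete, the following Python sketch splits such a value into host and port pairs and renders them as node elements. It is an illustration under assumptions: the function names are hypothetical, and the element layout (a parent element that contains node, host, and port) follows the ZooKeeper section of an open source ClickHouse configuration file, so the file that EMR generates may differ.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def parse_zookeeper_servers(value: str) -> List[Tuple[str, int]]:
    """Split 'host:port,host:port' into a list of (host, port) pairs."""
    servers = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        host, sep, port = item.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected host:port, got {item!r}")
        servers.append((host, int(port)))
    return servers


def to_zookeeper_xml(servers: List[Tuple[str, int]], parent_tag: str = "zookeeper") -> str:
    """Render one node element per server (assumed open source layout)."""
    parent = ET.Element(parent_tag)
    for host, port in servers:
        node = ET.SubElement(parent, "node")
        ET.SubElement(node, "host").text = host
        ET.SubElement(node, "port").text = str(port)
    return ET.tostring(parent, encoding="unicode")


servers = parse_zookeeper_servers(
    "emr-header-1.cluster-12345:2181,emr-worker-1.cluster-12345:2181,emr-worker-2.cluster-12345:2181"
)
print(servers)
print(to_zookeeper_xml(servers))
```

A value that lacks a port, such as emr-header-1.cluster-12345, raises ValueError in this sketch, which is a design choice of the example and not documented EMR behavior.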