The configuration parameters for the ClickHouse service of an E-MapReduce (EMR) ClickHouse cluster include client parameters, server parameters, user permission parameters, and extended parameters. This topic describes how to configure the ClickHouse client, ClickHouse server, and extended parameters for the ClickHouse service.
Background information
Item | References |
---|---|
ClickHouse client | client-config |
ClickHouse server | server-config |
Extended parameters | server-metrika |
User permissions | Configure user permissions |
Prerequisites
A ClickHouse cluster is created. For more information, see Create a cluster.
Precautions
- If you can add parameters to the yandex tag, directly add the parameters. This eliminates the need to add the parameters in the EMR console.
- If a nested parameter is used, separate the layers of the nested parameter with periods (.).
For example, on the server-users tab, you can configure the nested parameter users.aliyun.password for a newly added user named aliyun. The value of this parameter is a password. You can specify a custom password.
- When you add a custom parameter, do not specify the parameter name and parameter value in XML format.
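
To illustrate the nested-parameter rule above, the following is a minimal sketch of the XML that a dotted name such as users.aliyun.password might expand to in the generated user configuration file. It assumes that each period-separated layer becomes one nested element under the yandex tag. The password value is a placeholder, not a real default.

```xml
<!-- Hypothetical expansion of the custom parameter users.aliyun.password.
     In the console you would enter only the dotted name and a plain value,
     not this XML. -->
<yandex>
  <users>
    <aliyun>
      <!-- Placeholder value; replace with your own password. -->
      <password>your_password_here</password>
    </aliyun>
  </users>
</yandex>
```

This is also why the last precaution asks you not to enter names or values in XML format: the nesting is derived from the dotted name.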
client-config
The parameters on the client-config tab are used to generate the config.xml file that is used by a ClickHouse client. You can go to the ClickHouse service page of the EMR console, click client-config on the Configure tab, and then configure the following parameters.
Parameter | Description |
---|---|
user | The username that is used to log on to the ClickHouse client. Default value: default. |
password | The password that is used to log on to the ClickHouse client. This parameter is left empty by default. |
prompt_by_server_display_name.production | The prompt that is customized for the ClickHouse client. The prompt varies based on the value of the display_name parameter on the server-config tab. For example, if you set the display_name parameter to default, the prompt is the value of the prompt_by_server_display_name.default parameter on the client-config tab. For more information about the color of prompts, see Color prompts with readline and tip_colors_and_formatting. |
prompt_by_server_display_name.default | See prompt_by_server_display_name.production. |
prompt_by_server_display_name.test | See prompt_by_server_display_name.production. |
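
As a rough sketch of how the client-config parameters could appear in the generated client config.xml file, consider the fragment below. The root element name and the prompt strings are assumptions for illustration: the prompts follow the escape-sequence style of the sample client configuration in open source ClickHouse (green for test, red for production) and are not values confirmed for EMR.

```xml
<!-- Illustrative client configuration; values other than the documented
     defaults (user = default, empty password) are assumptions. -->
<config>
  <user>default</user>
  <password></password>
  <prompt_by_server_display_name>
    <!-- Used when the server display_name matches "default". -->
    <default>{display_name} :) </default>
    <!-- Used when the server display_name matches "test" (green smiley). -->
    <test>{display_name} \x01\e[1;32m\x02:)\x01\e[0m\x02 </test>
    <!-- Used when the server display_name matches "production" (red smiley). -->
    <production>{display_name} \x01\e[1;31m\x02:)\x01\e[0m\x02 </production>
  </prompt_by_server_display_name>
</config>
```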
server-config
The parameters on the server-config tab are used to generate the config.xml file that is used by a ClickHouse server. You can go to the ClickHouse service page of the EMR console, click server-config on the Configure tab, and then configure the following parameters.
Parameter | Description |
---|---|
tcp_port | The TCP port that is used to communicate with the ClickHouse client. Default value: 9000. |
logger.count | The maximum number of archived ClickHouse log files. If the number of archived log files reaches the value of this parameter, ClickHouse deletes the earliest archived log files. Default value: 10. |
logger.errorlog | The path that is used by the ClickHouse server to store error logs. Default value: /var/log/clickhouse-server/clickhouse-server.err.log. |
logger.level | The level of the logs. The default level is information. Valid values (sorted based on the urgency degree): none, fatal, critical, error, warning, notice, information, debug, and trace. none indicates that logging is disabled. |
logger.size | The maximum size of a log file. If the size of a log file reaches the value of this parameter, ClickHouse archives and renames the log file and creates another log file. Default value: 1000M. |
logger.path | The path that is used by the ClickHouse server to store common logs. The default path is /var/log/clickhouse-server/clickhouse-server.log. The log file records logs of the level specified by the logger.level parameter. |
access_control_path | The path of the folder that is used by the ClickHouse server to store the configurations of the users and roles that are created by executing SQL statements. Default value: /var/lib/clickhouse/access/. |
user_files_path | The path that stores user files. This parameter is used in the file() function of a table. Default value: /var/lib/clickhouse/user_files/. |
path_to_regions_hierarchy_file | The path that stores regional hierarchy files. This parameter is used by the ClickHouse internal dictionary. This parameter is left empty by default. |
path_to_regions_names_files | The path that stores files that contain region names. This parameter is used by the ClickHouse internal dictionary. This parameter is left empty by default. |
distributed_ddl.path | The path that is used by ZooKeeper to store Data Definition Language (DDL) query queues. Default value: /clickhouse/task_queue/ddl. Unless otherwise specified, the CREATE, DROP, ALTER, and RENAME statements that are executed in the ClickHouse cluster affect only the machine that is used to process queries. You can configure the parameters that are prefixed with distributed_ddl to allow the queries to be run in the ClickHouse cluster. These parameters take effect only if ZooKeeper is enabled. |
tmp_policy | The policy that is used to store temporary data that is generated when large table queries are processed. This parameter is left empty by default. You can set this parameter to a disk policy that is specified by the storage_configuration parameter on the server-metrika tab. Note: If this parameter is left empty, the tmp_path parameter takes effect. If this parameter is specified, the tmp_path parameter is ignored. |
path | The path to the directory of data files. You must add a forward slash (/) to the end of the path. Default value: /var/lib/clickhouse/. |
https_port | The HTTPS port that is used to communicate with the ClickHouse server. Parameters related to OpenSSL are required only if you configure the https_port parameter. If you specify both the https_port parameter and the http_port parameter, the https_port parameter is ignored. This parameter is left empty by default. |
query_log.flush_interval_milliseconds | If log_queries=1 is configured for the profile that you use, information about queries is stored in a table. The parameters that are prefixed with query_log configure how this information is stored. |
query_log.engine | |
query_log.partition_by | |
query_log.database | |
query_log.table | |
interserver_http_credentials.user | The credentials that replicas use to authenticate with each other. By default, replication of tables whose engine name is prefixed with Replicated does not require authentication. You can configure these parameters to enable authentication. The credentials are used only for communication between replicas and are independent of the credentials of the ClickHouse client. |
interserver_http_credentials.password | |
mlock_executable | Specifies whether to call the mlockall function after the ClickHouse service is started. Calling mlockall reduces the latency of the first queries and prevents the executable file of the ClickHouse service from being paged out when the I/O load is high. Default value: false. Note: We recommend that you set this parameter to true. However, the time that is required to start the ClickHouse service increases by several seconds if you set this parameter to true. |
trace_log.table | If the value of either the query_profiler_real_time_period_ns or query_profiler_cpu_time_period_ns parameter for the profile that you use is not 0, the stack traces that are recorded by the query profiler are stored in a table. The parameters that are prefixed with trace_log configure how this information is stored. |
trace_log.database | |
trace_log.partition_by | |
trace_log.engine | |
trace_log.flush_interval_milliseconds | |
disable_internal_dns_cache | Specifies whether to disable the internal DNS cache. The internal DNS cache is disabled if you set this parameter to a value that is not 0. Default value: 0. Note: We recommend that you disable the cache in environments in which the infrastructure changes frequently, such as Kubernetes. |
listen_reuse_port | Specifies whether to allow a port to be reused among sockets. |
query_thread_log.table | If log_query_threads=1 is configured for the profile that you use, the information about threads that are used for queries is stored in a table. The parameters that are prefixed with query_thread_log configure how this information is stored. |
query_thread_log.database | |
query_thread_log.partition_by | |
query_thread_log.engine | |
query_thread_log.flush_interval_milliseconds | |
default_database | The name of the default database. Default value: default. |
http_server_default_response | The page that is automatically returned when you access the HTTP server of the ClickHouse service. |
display_name | The default prompt that is configured for the ClickHouse client. This parameter is left empty by default. |
builtin_dictionaries_reload_interval | The interval at which the built-in dictionary is reloaded. Unit: seconds. Default value: 3600. |
umask | The mask of file permissions. The default value of this parameter is 027, which specifies that other operating system users cannot read files such as log and data files, and that users in the same group have read-only access to the files. |
uncompressed_cache_size | The size of the cache for decompressed blocks if the MergeTree table engine is used. Default value: 0. If you use the default value, caching is disabled. |
timezone | The time zone of the ClickHouse server. Default value: Asia/Shanghai. |
max_session_timeout | The maximum session timeout. Unit: seconds. Default value: 3600. |
default_session_timeout | The default session timeout. Unit: seconds. Default value: 60. |
max_open_files | The maximum number of files that you can open. Default value: 262144. Note: The valid values of this parameter vary based on the operating system that you use. If you leave this parameter empty, ClickHouse uses the value of the max_open_files parameter that is configured for the operating system. |
tmp_path | The path that stores temporary data that is generated when large table queries are processed. You must add a forward slash (/) to the end of the path. Default value: /var/lib/clickhouse/tmp/. |
max_concurrent_queries | The maximum number of queries that can be processed in parallel. Default value: 100. |
tcp_port_secure | The TCP port that is used for secure (SSL/TLS) communication with the ClickHouse client. This parameter is left empty by default. Note: Parameters related to OpenSSL are required only if you configure the tcp_port_secure parameter. |
listen_try | Specifies whether to immediately exit if a protocol, such as IPv4 or IPv6, that is specified by listen_host cannot be used. |
mysql_port | The MySQL port that is used to communicate with the ClickHouse client. |
keep_alive_timeout | The period of time that the ClickHouse service waits for an incoming request before it closes an existing connection. Unit: seconds. Default value: 3. |
max_connections | The maximum number of connections allowed. Default value: 4096. |
dns_cache_update_period | The interval at which the IP addresses that are stored in the internal DNS cache of the ClickHouse service are updated. Unit: seconds. Default value: 15. The update is performed asynchronously in a separate system thread. |
include_from | The configuration file of the ClickHouse server is written in XML. Some XML tags contain the incl attribute. The content of these XML tags can be replaced by the configurations in the file that is referenced by include_from. Default value: /etc/ecm/clickhouse-conf/clickhouse-server/metrika.xml. |
interserver_http_port | The port that is used for data exchange between ClickHouse servers. Default value: 9009. |
dictionaries_config | The path that stores the configuration file of the external dictionary. The path can contain wildcards such as asterisks (*) and question marks (?). Default value: *_dictionary.xml. |
http_port | The HTTP port that is used to communicate with the ClickHouse server. Default value: 8123. The Java Database Connectivity (JDBC) driver of open source ClickHouse also uses this port to access a ClickHouse cluster. For more information, see clickhouse-jdbc. |
users_config | The path that stores the user configuration, access control configuration, resource limit configuration, and setting configuration files. Default value: users.xml. |
dictionaries_lazy_load | Specifies whether to delay the creation of a dictionary. |
listen_host | The IP address on which the ClickHouse server listens. You can set this parameter to an IPv4 or IPv6 address. If you set this parameter to ::, all IP addresses are allowed. You can configure multiple IP addresses. Separate multiple IP addresses with commas (,), such as 127.0.0.1,localhost. Default value: 0.0.0.0. |
default_profile | The default name of the profile. Default value: default. |
mark_cache_size | The approximate size of the cache that is used by the mark index if the MergeTree table engine is used. Default value: 5368709120. Unit: bytes. |
listen_backlog | The backlog of the listen socket, which is the maximum number of pending connections in the queue. Default value: 64. |
format_schema_path | The path that stores the schema of input data. Default value: /var/lib/clickhouse/format_schemas/. |
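
To make the mapping between the server-config tab and the generated server config.xml file more concrete, here is a minimal sketch of a few of the parameters above written as XML. It assumes that dotted parameter names expand into nested elements under the yandex tag. The query_log values and the interserver credentials are illustrative assumptions rather than documented EMR defaults; the ports, logger values, and the distributed_ddl path are the defaults listed in the table.

```xml
<yandex>
  <!-- Ports (defaults from the table above). -->
  <tcp_port>9000</tcp_port>
  <http_port>8123</http_port>
  <interserver_http_port>9009</interserver_http_port>

  <!-- logger.level, logger.errorlog, logger.size, logger.count -->
  <logger>
    <level>information</level>
    <errorlog>/var/log/clickhouse-server/clickhouse-server.err.log</errorlog>
    <size>1000M</size>
    <count>10</count>
  </logger>

  <!-- query_log.*: takes effect only when log_queries=1 in the profile.
       The values below are assumptions for illustration. -->
  <query_log>
    <database>system</database>
    <table>query_log</table>
    <partition_by>toYYYYMM(event_date)</partition_by>
    <flush_interval_milliseconds>7500</flush_interval_milliseconds>
  </query_log>

  <!-- distributed_ddl.path: ZooKeeper path of the DDL query queue. -->
  <distributed_ddl>
    <path>/clickhouse/task_queue/ddl</path>
  </distributed_ddl>

  <!-- interserver_http_credentials.*: placeholder credentials that are
       used only between replicas. -->
  <interserver_http_credentials>
    <user>replica_user</user>
    <password>replace_with_a_secret</password>
  </interserver_http_credentials>
</yandex>
```

The sketch sets only query_log.partition_by and omits query_log.engine, because open source ClickHouse does not accept both for the same log table.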
server-metrika
The parameters on the server-metrika tab are used to generate the metrika.xml file. By default, the metrika.xml file is referenced by the config.xml file of the ClickHouse server. You can go to the ClickHouse service page of the EMR console, click server-metrika on the Configure tab, and then configure the following parameters.
Parameter | Description |
---|---|
clickhouse_compression | The data compression settings for tables that use the MergeTree engine. For more information, see Server Settings. This parameter is left empty by default. You can configure this parameter if you want to enable data compression. |
storage_configuration | The custom disk information. Alibaba Cloud EMR automatically creates a ClickHouse data directory for each disk and creates the hdd_in_order disk policy for the disks. |
zookeeper_servers | The information about ZooKeeper servers that are used to configure a ClickHouse cluster. The default value is the information of a ZooKeeper server that is created when you create a ClickHouse cluster. You can specify multiple ZooKeeper servers. Separate the information of the ZooKeeper servers with commas (,), such as emr-header-1.cluster-12345:2181,emr-worker-1.cluster-12345:2181,emr-worker-2.cluster-12345:2181. |
quotas_default | You can configure multiple quotas to flexibly adjust resource limits. This parameter specifies the value of the quota that is named default. You can add custom quota settings. |
clickhouse_remote_servers | The information about shards and replicas that you configure for a ClickHouse cluster. The default value is the topology that is generated based on the numbers of shards and replicas that are configured when you create the ClickHouse cluster. |
Sample value of the clickhouse_remote_servers parameter:
<cluster_emr>
  <shard>
    <weight>1</weight>
    <internal_replication>true</internal_replication>
    <replica>
      <host>emr-header-1.cluster-12345</host>
      <port>9000</port>
    </replica>
    <replica>
      <host>emr-worker-1.cluster-12345</host>
      <port>9000</port>
    </replica>
  </shard>
  <shard>
    <weight>1</weight>
    <internal_replication>true</internal_replication>
    <replica>
      <host>emr-worker-2.cluster-12345</host>
      <port>9000</port>
    </replica>
    <replica>
      <host>emr-worker-3.cluster-12345</host>
      <port>9000</port>
    </replica>
  </shard>
</cluster_emr>
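
The following sketch suggests how the server config.xml file might pull values such as the cluster topology from metrika.xml. It assumes the substitution mechanism of open source ClickHouse, in which an element that carries an incl attribute is replaced by the element of the same name in the file that include_from points to. The exact elements that EMR generates may differ.

```xml
<!-- Hypothetical excerpt of the server config.xml file. -->
<yandex>
  <!-- File that holds the substitution values (default from the
       server-config table). -->
  <include_from>/etc/ecm/clickhouse-conf/clickhouse-server/metrika.xml</include_from>

  <!-- Replaced by the content of <clickhouse_remote_servers> in metrika.xml,
       for example the cluster_emr topology. -->
  <remote_servers incl="clickhouse_remote_servers" />

  <!-- Replaced by the content of <clickhouse_compression> in metrika.xml. -->
  <compression incl="clickhouse_compression" />
</yandex>
```

Under this assumption, changing the shard or replica layout touches only metrika.xml, and config.xml stays unchanged.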
Sample value of the storage_configuration parameter:
<disks>
  <disk1>
    <path>/mnt/disk1/clickhouse/</path>
    <keep_free_space_bytes>10485760</keep_free_space_bytes>
  </disk1>
  <disk2>
    <path>/mnt/disk2/clickhouse/</path>
    <keep_free_space_bytes>10485760</keep_free_space_bytes>
  </disk2>
  <disk3>
    <path>/mnt/disk3/clickhouse/</path>
    <keep_free_space_bytes>10485760</keep_free_space_bytes>
  </disk3>
  <disk4>
    <path>/mnt/disk4/clickhouse/</path>
    <keep_free_space_bytes>10485760</keep_free_space_bytes>
  </disk4>
</disks>
<policies>
  <hdd_in_order>
    <volumes>
      <single>
        <disk>disk1</disk>
        <disk>disk2</disk>
        <disk>disk3</disk>
        <disk>disk4</disk>
      </single>
    </volumes>
  </hdd_in_order>
</policies>
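
As a small illustration of how a disk policy can be reused, the sketch below sets the tmp_policy parameter from the server-config tab to the hdd_in_order policy. It assumes that the policy name in storage_configuration is referenced verbatim. Whether this policy suits temporary data on your cluster is a separate decision.

```xml
<!-- Hypothetical excerpt of the server config.xml file. -->
<yandex>
  <!-- Store temporary data for large queries on the disks of the
       hdd_in_order policy. When tmp_policy is set, tmp_path is ignored. -->
  <tmp_policy>hdd_in_order</tmp_policy>
</yandex>
```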