This topic describes how to configure timeout-related parameters in ApsaraDB for ClickHouse. This topic also describes how to query the cluster status when timeouts occur.

Background information

The kernel of ApsaraDB for ClickHouse provides several timeout-related parameters and allows you to interact with it by using multiple protocols. This topic describes how to configure timeout-related parameters of HTTP and TCP.

HTTP (port number: 8123)

HTTP is the most popular interaction protocol that is used by ApsaraDB for ClickHouse clusters in production environments. A number of tools and services adopt HTTP, such as the Java Database Connectivity (JDBC) driver that is provided by ClickHouse, Alibaba Cloud Data Management (DMS), and DataGrip. The frequently used HTTP port is 8123.

Handle timeouts by modifying the distributed_ddl_task_timeout parameter

The distributed_ddl_task_timeout parameter specifies the timeout period of distributed data definition language (DDL) queries that contain the on cluster clause. The default value is 180 seconds. You can run the following command in DMS to globally modify this parameter. After the parameter is modified, you must restart your cluster.

set global on cluster default distributed_ddl_task_timeout = 1800;

Distributed DDL queries are asynchronously executed based on task queues that are created on ZooKeeper nodes. The timeout of a distributed DDL query does not mean that the query has failed. After a distributed DDL query times out, the query is still waiting to be executed. You do not need to submit the query again.

Handle timeouts by modifying the max_execution_time parameter
The max_execution_time parameter specifies the timeout period of queries. The default value is 7,200 seconds in DMS and 30 seconds in the JDBC driver and DataGrip. If the waiting time of a query exceeds the specified timeout period, the query is automatically canceled. You can modify this parameter directly in query statements. For example, you can write the following query statement: select * from system.numbers settings max_execution_time = 3600. You can also run the following command in DMS to globally modify this parameter:
set global on cluster default max_execution_time = 3600;
Handle timeouts by modifying the socket_timeout parameter
The socket_timeout parameter specifies the timeout period when the client listens to sockets over HTTP and waits for return results. The default value is 7,200 seconds in DMS and 30 seconds in the JDBC driver and DataGrip. This parameter is not a built-in parameter of ClickHouse, but a parameter of the JDBC driver for HTTP. This parameter sets a time limit for the client to wait for results. The max_execution_time parameter may become invalid due to the setting of this parameter. In most cases, when you modify the max_execution_time parameter, you also need to modify the socket_timeout parameter. Set the value of the socket_timeout parameter slightly greater than the value of the max_execution_time parameter. To set the socket_timeout parameter, add the socket_timeout property to the JDBC connection string. Example: 'jdbc:clickhouse://127.0.0.1:8123/default?socket_timeout=3600000'.
Handle SLB disconnections
If no data packet is transmitted over a connection between a Server Load Balancer (SLB) instance and a client within the specified period, the client receives the "read timeout" error message. In this case, queries that are sent over the connection cannot be traced. You can run the following commands in DMS to globally modify the send_progress_in_http_headers and http_headers_progress_interval_ms parameters. After the parameters are modified, you must restart your cluster.
set global on cluster default send_progress_in_http_headers = 1; 
set global on cluster default http_headers_progress_interval_ms = 60000; 
If the kernel version of your cluster is 20.8, you need to run the following commands. If the parameters fail to be set after you run the following commands, you can apply for a minor version update.
set global on cluster default http_server_enable_tcp_keep_alive = 1; 
set global on cluster default tcp_keep_alive_timeout = 60; 

After the send_progress_in_http_headers parameter is set to 1, the ApsaraDB for ClickHouse server continuously sends messages that contain the query execution progress information in an HTTP header to the client. This way, data packets are transmitted over the connection all the time to prevent the connection timeout.

If the client is disconnected from the network, the queries that are sent over HTTP can still be executed. You can query system tables to check whether the queries are executed.
  • Query the queries that are being executed on all the nodes in the ApsaraDB for ClickHouse cluster.
    select * from remote(default, system, processes) where query like 'XXX'
  • Query the execution results of the queries that are executed on the current day. The execution results include both successful query results and error information of failed queries.
    select * from remote(default, system, query_log) where event_date = toDate(now()) and query like 'XXX'
Handle the hanging of a client that connects to the ApsaraDB for ClickHouse server by using the server IP address

When the JDBC client on an Elastic Compute Service (ECS) instance accesses the ApsaraDB for ClickHouse server in another security group, a silent connection error may occur on the ECS instance. The cause is that the IP address of the ApsaraDB for ClickHouse server is not added to the whitelist of the security group to which the ECS instance belongs. If the system takes a long time to obtain the query result for a request of the client, the returned result may fail to be sent to the client because no route is found in the routing table. In this case, the client is hung.

You can resolve this issue in the same way that you handle SLB disconnections by setting the send_progress_in_http_headers parameter to 1. This method works in most cases. In rare cases, this issue persists after you set the send_progress_in_http_headers parameter to 1. If this happens, you can add the IP address of the ApsaraDB for ClickHouse server to the whitelist of the security group to which the ECS instance belongs. Then, check whether the issue is resolved.

TCP (port number: 3306)

TCP is mostly used in scenarios where interactive analysis is performed by using the built-in command-line tool of ClickHouse. The frequently used TCP port is 3306. TCP uses keepalive packets that are regularly sent to ensure the liveness of a connection. Therefore, TCP does not involve socket timeouts. The only timeout-related parameters of TCP that you need to configure are distributed_ddl_task_timeout and max_execution_time. You can configure the two parameters in the same way that you configure the parameters for HTTP connections.