This topic provides answers to frequently asked questions about ApsaraDB for ClickHouse.

FAQ

What do I do if a timeout error occurs when I use a client tool to connect to an ApsaraDB for ClickHouse cluster?

To resolve the timeout error, perform the following operations:
  • Check whether the network connection can be established. You can run the ping command to check the network connection status. You can also run the telnet command to check whether port 3306 and port 8123 of the ApsaraDB for ClickHouse cluster are enabled.
  • Check whether a whitelist is configured for the ApsaraDB for ClickHouse cluster to which you want to connect. For more information, see Configure a whitelist.
  • Check whether the real IP address of the client is used. In most enterprises, dynamic IP addresses are assigned to clients that are connected to internal networks. In this case, the IP address that you can obtain may not be the real IP address of the client. You can use an IP address checker such as WhatsMyIP to obtain the real IP address of your client. For more information, see http://www.whatsmyip.net/.

What do I do if a different query result is returned each time I execute the same statement to query data from a table when no data is being written to the table?

Problem description: The number returned by the select count(*) statement is approximately half of the total number of data records, or different query results are returned for the same statement.

Solution:
  • Check whether your ApsaraDB for ClickHouse cluster is a multi-node cluster. If your ApsaraDB for ClickHouse cluster has multiple nodes, you must create distributed tables on the nodes. Then, you can write data to the distributed tables and query data from the distributed tables. This way, the same query result is returned each time you execute the same statement to query data from the distributed tables. However, if distributed tables are not used, data of a different shard can be queried each time you execute the same statement. As a result, different query results are returned for the same statement. For more information, see Create a distributed table.
  • Check whether your cluster is an ApsaraDB for ClickHouse cluster of the Double-replica Edition. In an ApsaraDB for ClickHouse cluster of the Double-replica Edition, you must create replicated tables on nodes. This ensures data consistency between the replicas of a node. However, if you do not create replicated tables, the query result may vary based on the replica that is queried. For information about how to create a replicated table, see Table engines.

What do I do if a syntax error occurs when I execute the set global on cluster default statement?

The syntax error can occur because of one of the following causes. A solution is provided for each cause.
  • Cause 1: The ApsaraDB for ClickHouse client parses the syntax of statements, but the set global on cluster default statement is implemented only on the ApsaraDB for ClickHouse server. If the client version does not match the server version, the client reports a syntax error for the set global on cluster default statement.

    Solution:
    • Use tools that do not parse the syntax of statements on a client. For example, you can use DataGrip or DBeaver with a Java Database Connectivity (JDBC) driver.
    • You can also write a JDBC program to execute the set global on cluster default statement.
  • Cause 2: You specify a string value in the set global on cluster default statement. However, the value is not enclosed in single quotation marks (').

    Solution: Enclose a value of the string type in single quotation marks (').
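    For example, a minimal sketch that assumes a hypothetical string-typed parameter named some_string_parameter; replace the parameter name and value with your own:
    set global on cluster default some_string_parameter = 'value_text';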

What do I do if the resources are insufficient in the selected zone when I create an ApsaraDB for ClickHouse cluster?

You can select another zone in the same region or submit a ticket. ApsaraDB for ClickHouse clusters that are connected to the same virtual private cloud (VPC) can use the resources of different zones in the same region. Services can be delivered without perceptible network latency.

If I want to purchase an ApsaraDB for ClickHouse cluster, which version is recommended?

ApsaraDB for ClickHouse provides services based on the long-term support (LTS) kernel versions released by the open source community. In most cases, if a kernel version remains stable for over three months after release, ApsaraDB for ClickHouse allows you to purchase a cluster that uses the kernel version. We recommend that you purchase an ApsaraDB for ClickHouse cluster of V20.8.7.15 or later. An ApsaraDB for ClickHouse cluster of V20.8.7.15 allows you to enable tiered storage of hot and cold data and use the MaterializeMySQL database engine. For more information, see Tiered storage of hot data and cold data and MaterializeMySQL.

What is the difference between an ApsaraDB for ClickHouse cluster of the Single-replica Edition and an ApsaraDB for ClickHouse cluster of the Double-replica Edition?

  • In an ApsaraDB for ClickHouse cluster of the Single-replica Edition, each shard has no replica shard, so high availability cannot be ensured. However, the cluster replicates data into multiple copies and stores the copies on Alibaba Cloud disks, which ensures data security in a cost-effective manner.
  • In an ApsaraDB for ClickHouse cluster of the Double-replica Edition, each shard has a replica shard. This enables disaster recovery. If a shard in the cluster is unavailable, the replica shard continues to provide services.

What do I do if a table that is created is not found when I query data from the table?

Cause: A DDL statement is executed on only one node.

Solution: Check whether the DDL statement that you execute contains the on cluster keyword. If this keyword is not contained in the DDL statement, recreate the table by executing the DDL statement that contains the on cluster keyword. For more information, see Syntax of statements that are used to create a table.

What do I do if the "longer than distributed_ddl_task_timeout (=xxx) seconds" error is returned when I execute a DDL statement to create a distributed table?

To resolve the error, execute the set global on cluster default distributed_ddl_task_timeout=xxx statement to modify the default timeout period. Replace xxx in the statement with the timeout period that you want to specify. The timeout period is measured in seconds. For more information about how to modify global parameters, see Modify parameters for an ApsaraDB for ClickHouse cluster.
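
For example, the following statement sets the timeout to an illustrative value of 1800 seconds; adjust the value to your scenario:
set global on cluster default distributed_ddl_task_timeout = 1800;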

What are the unique features of ApsaraDB for ClickHouse compared with the open source ClickHouse?

Multiple bugs in the kernel versions released by the open source ClickHouse community are fixed in ApsaraDB for ClickHouse. This ensures stable performance. ApsaraDB for ClickHouse also provides kernel features that are not available in open source ClickHouse.

Why am I unable to connect an external table in an ApsaraDB for ClickHouse cluster to a table on platforms such as MySQL, Hadoop Distributed File System (HDFS), and Kafka?

When you create an external table in an ApsaraDB for ClickHouse cluster of V20.3 or V20.8, the system automatically verifies the network connection between the ApsaraDB for ClickHouse cluster and the platform. If the external table is created, the network connection is established. The external table can fail to be created because of one of the following reasons:
  • The instance that hosts the source table and the ApsaraDB for ClickHouse cluster are not connected to the same VPC. In this case, the network connection fails to be established.
  • A whitelist is configured for the MySQL instance. However, the CIDR block of the VPC to which the ApsaraDB for ClickHouse cluster is connected is not specified in the whitelist. In this case, you must add the CIDR block to the whitelist.

For example, you create an external table in an ApsaraDB for ClickHouse cluster. The external table is connected to a table in a Kafka instance. However, when you query data from the external table, no results are returned. A common cause is that data in the Kafka instance fails to be parsed based on the fields and the format defined in the schema of the external table. The error message provides the location of the parse error.

Why does the amount of data in the local table that stores the data of a Kafka external table remain unchanged after I write data to the external table?

To determine the cause, execute the select * from statement to query data from the external table. If an error message is reported, you can determine the cause based on the error message. In most cases, the error message indicates that data failed to be parsed. If a query result without errors is returned, check whether the fields in the local table and the fields in the external table are consistent. If data cannot be written to the local table, the fields in the local table and the fields in the external table are inconsistent. You can execute the following sample statement to write data to the local table:

insert into <Local table> select * from <External table that is connected to a table in a Kafka instance>;

What do I do if an error occurs because the memory usage exceeds the limit after I execute the insert into select statement?

The error can occur because of one of the following causes. A solution is provided for each cause.
  • Cause 1: The memory usage is high.

    Solution: Reduce the value of the max_insert_threads parameter to reduce the memory usage, as shown in the example after this list.

  • Cause 2: You execute the insert into select statement to import data from one cluster to another cluster in ApsaraDB for ClickHouse.

    Solution: Migrate data by importing a file to the destination cluster. For more information, see Migrate data between two ClickHouse databases.
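
The following sketch shows the parameter change mentioned in the solution to Cause 1. The value 1 is only illustrative; choose a value that fits your workload:
set global on cluster default max_insert_threads = 1;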

Why does an out-of-memory error or another memory-related error occur when I import data in formats such as ORC and Parquet from an Object Storage Service (OSS) object to an external table in an ApsaraDB for ClickHouse cluster?

A common cause is that the memory usage is high.

To resolve the error, perform the following operations:
  • Split the OSS object into multiple objects. Then, import the data from these objects to the external table in the ApsaraDB for ClickHouse cluster.
  • Scale up the memory capacity of the ApsaraDB for ClickHouse cluster. For more information, see Change configurations and scale resources.

What do I do if an error occurs because a large number of data parts are generated in an ApsaraDB for ClickHouse cluster?

A data part is generated each time you write data to a table in an ApsaraDB for ClickHouse cluster. If you write only one data record or a small amount of data each time, a large number of data parts are generated in your ApsaraDB for ClickHouse cluster. This causes high CPU loads that slow down merge operations and queries. To prevent this scenario, ApsaraDB for ClickHouse limits the number of data parts. If a large number of data parts are generated, the "too many parts" error occurs. To resolve this error, increase the amount of data that is written in each batch. If the batch size cannot be increased, modify the max_partitions_per_insert_block parameter based on your business requirements.
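
For example, the following statement sets the parameter to an illustrative value of 500; adjust the value based on your data layout:
set global on cluster default max_partitions_per_insert_block = 500;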

What causes slow performance when I use DataX to import data to a table in an ApsaraDB for ClickHouse cluster?

The slow performance can be caused by one of the following reasons. A solution is provided for each cause.
  • Cause 1: The specified parameter values are not suitable for the scenario. In ApsaraDB for ClickHouse, we recommend that you write data in batches. Make sure that each batch contains a large amount of data and that only a small number of concurrent write tasks run at the same time. In most cases, a batch can contain tens of thousands or even hundreds of thousands of rows with a row size of about 1 KB. Estimate the row size and calculate the number of rows in a batch based on that size.

    Solution: Set the maximum number of concurrent write requests to 10, adjust the related parameters, and then try again.

  • Cause 2: In an exclusive resource group that is created in DataWorks, Elastic Compute Service (ECS) instances have low specifications that cannot ensure high performance. In an exclusive resource group, the low specifications of CPU cores or memory can impose limits on the request concurrency or outbound bandwidth. If you specify a large batch size and have insufficient memory capacity, garbage collection in Java can occur during DataWorks processes.

    Solution: You can check the specifications of an ECS instance based on the output logs of DataWorks.

  • Cause 3: The read operation in a data source is time-consuming.

    Solution: Search for totalWaitReaderTime and totalWaitWriterTime in output logs of DataWorks. If the value of totalWaitReaderTime is much greater than the value of totalWaitWriterTime, the read operation is more time-consuming than the write operation.

  • Cause 4: A public endpoint is used. The bandwidth provided by a public endpoint is limited. This cannot ensure high-performance data import or export.

    Solution: Replace the public endpoint with a VPC endpoint.

Why is the number of rows inconsistent between a table in a Hive instance and an external table in an ApsaraDB for ClickHouse cluster after data is synchronized between the tables?

To identify the cause of the inconsistency in the number of rows between the tables, perform the following steps:
  1. Check whether an error is reported during data synchronization based on a system table named query_log. If an error is reported, data loss may occur.
  2. Check whether a table engine that supports data deduplication, such as ReplacingMergeTree, is used. If such an engine is used, the external table in the ApsaraDB for ClickHouse cluster may contain fewer rows than the table in the Hive instance.
  3. Check whether the number of rows in the table that is stored on the Hive instance is correct. The number that you have obtained may be incorrect.

Why is the number of rows inconsistent between a table in a Kafka instance and an external table in an ApsaraDB for ClickHouse cluster after data is synchronized between the tables?

To identify the cause of the inconsistency in the number of rows between the tables, perform the following steps:
  1. Check whether an error is reported during data synchronization based on a system table named query_log. If an error is reported, data loss may occur.
  2. Check whether a table engine that supports data deduplication, such as ReplacingMergeTree, is used. If such an engine is used, the external table in the ApsaraDB for ClickHouse cluster may contain fewer rows than the table in the Kafka instance.
  3. Check whether the kafka_skip_broken_messages parameter is configured for the external table. If this parameter is configured, ApsaraDB for ClickHouse may skip the Kafka messages that fail to be parsed. As a result, the external table in the ApsaraDB for ClickHouse cluster may contain fewer rows than the table in the Kafka instance.

How do I use Spark or Flink to import data to an ApsaraDB for ClickHouse cluster?

How do I migrate data from a self-managed ClickHouse database to an ApsaraDB for ClickHouse cluster?

You can use one of the following solutions:

Why am I unable to find the data that I write to a table in an ApsaraDB for ClickHouse cluster?

In most cases, this is because a local table and its associated distributed table use different schemas. You can query a system table named system.distribution_queue to check whether an error occurs when you write data to the distributed table.
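
For example, the following query checks the queue of data to be written to the shards; the error_count filter is optional:
select * from system.distribution_queue where error_count > 0;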

How do I add, delete, or modify a column by executing a DDL statement in ApsaraDB for ClickHouse?

For a local table, you can execute a DDL statement to add, delete, or modify a column. For a distributed table, you can choose different methods to add, delete, or modify a column based on the scenario:
  • If no data is being written to the distributed table, you can perform an add, delete, or modify operation on a column in the local table and then in the distributed table.
  • If data is being written to the distributed table, you can use one of the following procedures to perform different operations.
    • Add a nullable column, or modify the data type of a column if data type conversion is supported (see the example after this list):
      1. Perform the add or modify operation on the column in the local table.
      2. Perform the add or modify operation on the column in the distributed table.
    • Delete a nullable column:
      1. Delete the column from the distributed table.
      2. Delete the column from the local table.
    • Add a non-nullable column, delete a non-nullable column, or modify the name of a column:
      1. Stop writing data.
      2. Execute the SYSTEM FLUSH DISTRIBUTED statement.
      3. Perform the add, delete, or modify operation on the column in the local table.
      4. Perform the add, delete, or modify operation on the column in the distributed table.
      5. Resume writing data to the distributed table.
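    The following sketch shows the nullable-column procedure from the list above. The table names test (local table) and test_dis (distributed table) and the column name new_col are only examples:
    ALTER TABLE test ON CLUSTER default ADD COLUMN new_col Nullable(String);
    ALTER TABLE test_dis ON CLUSTER default ADD COLUMN new_col Nullable(String);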

Why am I unable to find the table that I created?

The issue can be caused by one of the following reasons. A solution is provided for each cause.
  • Cause 1: The DDL statement that is used to create a table in an ApsaraDB for ClickHouse cluster is incorrect. A create table statement that is not used together with the on cluster default clause cannot be used to create a table in a distributed ApsaraDB for ClickHouse cluster. For example, if you create a table in a self-managed ClickHouse cluster by executing the create table statement, the table is created only on the server to which your client is connected even if you can query data from the table. If you connect your client to another server, you cannot find the table.

    Solution: When you create a table, execute the create table <Table name> on cluster default statement. The on cluster default clause applies the statement to all servers in the cluster named default. The following sample code provides an example.
    Create table test on cluster default (a UInt64) Engine = MergeTree() order by tuple();
  • Cause 2: A ReplicatedMergeTree table is incorrectly configured. The ReplicatedMergeTree table engine is developed based on the MergeTree table engine and supports data synchronization between replicated tables. Only MergeTree tables can be created in an ApsaraDB for ClickHouse cluster of the Single-replica Edition. Only ReplicatedMergeTree tables can be created in an ApsaraDB for ClickHouse cluster of the Double-replica Edition.

    Solution: ApsaraDB for ClickHouse allows you to synchronize data between replicated tables. Only ReplicatedMergeTree supports asynchronous replication. When you configure parameters for a ReplicatedMergeTree table, two parameters are related to paths in ZooKeeper. The following sample code provides an example:
    ReplicatedMergeTree('/clickhouse/tables/{database}/{table}/{shard}', '{replica}')
    The following list describes the parameters in the sample code:
    • /clickhouse/tables/{database}/{table}/{shard}: the unique path in which the metadata of the shard is stored.
    • {replica}: the path in which the metadata of the replica shard is stored.
    Notice
    • To define a ReplicatedMergeTree table, configure the parameters in the preceding fixed formats or execute ReplicatedMergeTree(). If you execute ReplicatedMergeTree(), the values of the parameters are set to default values.
    • Make sure that you configure the parameters based on your business requirements. Otherwise, data may not be replicated as expected.

What do I do if a different query result is returned each time I execute the same statement?

The error can occur because of one of the following causes. A solution is provided for each cause.
  • Cause 1: The DDL statement that is used to create a table in an ApsaraDB for ClickHouse cluster is incorrect. A create table statement that is not used together with the on cluster default clause cannot be used to create a table in a distributed ApsaraDB for ClickHouse cluster. For example, if you create a table in a self-managed ClickHouse cluster by executing the create table statement, the table is created only on the server to which your client is connected even if you can query data from the table. If you connect your client to another server, you cannot find the table.

    Solution:
    1. To create a table, execute the create table <Table name> on cluster default statement. The on cluster default clause applies the statement to all servers in the cluster named default. You can use the following sample code:
      Create table test on cluster default (a UInt64) Engine = MergeTree() order by tuple();
    2. To create a distributed table named test_dis on a table named test, execute the following statement:
      Create table test_dis on cluster default as test Engine = Distributed(default, default, test, cityHash64(a));
  • Cause 2: A ReplicatedMergeTree table is incorrectly configured. The ReplicatedMergeTree table engine is developed based on the MergeTree table engine and supports data synchronization between replicated tables. Only MergeTree tables can be created in an ApsaraDB for ClickHouse cluster of the Single-replica Edition. Only ReplicatedMergeTree tables can be created in an ApsaraDB for ClickHouse cluster of the Double-replica Edition.

    Solution: ApsaraDB for ClickHouse allows you to synchronize data between replicated tables. Only ReplicatedMergeTree supports asynchronous replication. When you configure parameters for a ReplicatedMergeTree table, two parameters are related to paths in ZooKeeper. The following sample code provides an example:
    ReplicatedMergeTree('/clickhouse/tables/{database}/{table}/{shard}', '{replica}')
    The following list describes the parameters in the sample code:
    • /clickhouse/tables/{database}/{table}/{shard}: the unique path in which the metadata of the shard is stored.
    • {replica}: the path in which the metadata of the replica shard is stored.
    Notice
    • To define a ReplicatedMergeTree table, configure the parameters in the preceding fixed formats or execute ReplicatedMergeTree(). If you execute ReplicatedMergeTree(), the values of the parameters are set to default values.
    • Make sure that you configure the parameters based on your business requirements. Otherwise, data may not be replicated as expected.

Why are the primary keys not merged after the OPTIMIZE statement is executed?

Before the OPTIMIZE statement is executed, make sure that the following conditions are met. This ensures that the logic used for primary key merges is valid.
  • When you define the table that stores the data, make sure that the fields specified in PARTITION BY are included in the fields specified in ORDER BY. Primary keys cannot be merged across partitions.
  • When you define a distributed table, make sure that the fields used for hashing are included in the fields specified in ORDER BY. Primary keys cannot be merged across nodes.
The following list describes the common OPTIMIZE statements.
  • optimize table test;
    Attempts to merge the primary keys in some data parts of a MergeTree table. The system may return a result without merging primary keys, and the execution of this statement does not ensure that all primary keys of the table are merged. In most cases, this statement is not used.
  • optimize table test partition tuple();
    Merges the primary keys in all data parts of the specified partition. The system may return a result without merging primary keys. After this statement is executed, all data parts of the specified partition are merged into a single data part. If this statement is executed on a non-partitioned table, all primary keys of the table are merged. However, data that is written during the execution of this statement is not merged. If the specified partition contains only one data part, the execution of this statement does not take effect.
    Note: If a table does not have a partition key, the default partition of the table is partition tuple().
  • optimize table test final;
    Forcibly merges all primary keys in all partitions of a table. This statement takes effect even on a partition that contains only one data part. You can execute this statement to forcibly remove expired data records. Among the three statements, this statement consumes the most resources. The system may still return a result without merging primary keys.

When you execute one of the preceding statements, you can configure the optimize_throw_if_noop parameter. This parameter specifies whether to throw an exception if a statement does not perform primary key merges.
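
For example, the following sketch raises an exception if the statement does not merge any data parts; the table name test matches the examples above:
set optimize_throw_if_noop = 1;
optimize table test partition tuple();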

Why does TTL expiration not take effect after the OPTIMIZE statement is executed?

The issue can be caused by one of the following reasons. A solution is provided for each cause.
  • Cause 1: Time to live (TTL) eviction takes place when primary keys are being merged. If primary keys in data parts are not merged for a long period of time, the expired data cannot be deleted.

    Solution:
    • You can execute optimize table test final; or optimize table test partition tuple(); to trigger primary key merges.
    • When you create a table, you can configure parameters such as merge_with_ttl_timeout and ttl_only_drop_parts to increase the merge frequency of primary keys in data parts that contain expired data.
  • Cause 2: The TTL for a table is added or modified. This can cause incomplete or missing TTL information in existing data parts. This can also cause expired data to not be deleted.

    Solution:
    • You can execute the alter table materialize ttl statement to modify the TTL.
    • You can execute the optimize table test partition tuple(); statement to update the TTL.
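    For example, a minimal sketch that assumes a table named test:
    alter table test materialize ttl;
    optimize table test partition tuple();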

Why is data not updated or deleted after the OPTIMIZE statement is executed?

ApsaraDB for ClickHouse asynchronously updates or deletes data. No measures can be taken to intervene in the progress of data updates or deletion. You can view the progress of each operation in the system.mutations table.
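
For example, the following query lists the mutations that are still in progress:
select database, table, command, create_time, is_done from system.mutations where is_done = 0;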

Why does the OPTIMIZE statement take a long time to be executed?

The execution of the OPTIMIZE statement consumes a large number of CPU resources and increases the disk throughput. High CPU utilization and high disk throughput degrade query performance and slow the execution of the OPTIMIZE statement. If the OPTIMIZE statement is executed on a node that is handling a heavy load, the execution of this statement takes a long time. No solution is available to this issue.

What do I do if the memory space consumed by a query exceeds the limit?

clickhouse-server equips each query thread with a memory tracker. The memory trackers of all threads for each query report the memory usage to a memory tracker named memory tracker for query. Then, memory tracker for query reports the obtained information to the upper-layer memory tracker named memory tracker for total. To resolve the issue, perform the following operations based on your scenario:
  • The query consumes more than 70 percent of the memory resources of your cluster, and the message Memory limit (for query) exceeded is reported. In this case, we recommend that you scale up your cluster by upgrading the memory specifications.
  • More than 90 percent of the memory resources of your cluster are consumed, and the message Memory limit (for total) exceeded is reported. In this case, you can reduce the number of concurrent queries. If the issue persists after you perform this operation, a large number of memory resources may be consumed by asynchronous tasks in the backend. In most cases, primary key merges after data writes consume a large number of resources. We recommend that you scale up your cluster by upgrading the memory specifications.
Note For more information about how to scale up a cluster, see Change configurations and scale resources.

How do I query the size of cold data?

You can use the following sample code:
select * from system.disks;

How do I query which data is in cold storage?

You can use the following sample code:
select * from system.parts where disk_name = 'cold_disk';

How do I migrate data in a partition to cold storage?

You can use the following sample code:
ALTER TABLE table_name MOVE PARTITION partition_expr TO DISK 'cold_disk';

What do I do if the maximum number of concurrent queries is exceeded?

By default, the maximum number of concurrent queries for clickhouse-server is 100. You can change the value in the ApsaraDB for ClickHouse console. To change the maximum number of concurrent queries, perform the following steps:
  1. Log on to the ApsaraDB for ClickHouse console.
  2. On the Clusters page, click the ID of the cluster that you want to manage.
  3. In the left-side navigation pane of the cluster details page, click Parameter Configuration.
  4. Change the value of the max_concurrent_queries parameter. Enter a valid value and click OK.
  5. Click Submit Parameters.
  6. Click OK.

What do I do if the timestamp data queried from a table is different from that written to the table?

Execute the SELECT timezone() statement to query the time zone of your cluster. If your cluster does not use the local time zone, set the timezone parameter to the local time zone. For more information, see Change parameter values.
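
For example, the following query returns the current server time and time zone so that you can compare them with the values on your client:
SELECT now(), timezone();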

Why is the time displayed in clickhouse-client different from that displayed in clickhouse-server and why is the time zone of clickhouse-client different from that of clickhouse-server?

The use_client_time_zone parameter is configured for the client and the parameter value is not set to the local time zone.

How do I find queries that consume a large number of CPU and memory resources?

You can view the query logs generated during the period of time when a large number of memory and CPU resources are consumed. The query logs are recorded in the system.query_log table. You can query the table to view the CPU and memory resources consumed by each query. For more information, see system.query_log.
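
For example, the following query lists the 10 queries that consumed the most memory over the last day; the time filter is only illustrative:
select query, memory_usage, query_duration_ms
from system.query_log
where type = 'QueryFinish' and event_time > now() - interval 1 day
order by memory_usage desc
limit 10;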

What determines the amount of time required for horizontal scaling?

If you perform horizontal scaling, data needs to be migrated. The amount of time required for horizontal scaling depends on the amount of cluster data to be migrated. The larger the amount of data, the longer the migration takes.

How is a cluster affected during cluster scaling?

When you scale a cluster, only read operations can be performed on the cluster. This ensures data consistency after data is migrated.

What do I need to take note of during horizontal scaling?

Horizontal scaling requires a long period of time. If the cluster performance cannot meet your business requirements, we recommend that you scale up your cluster. For more information about how to scale up or scale down a cluster, see Change configurations and scale resources.

Why does my application fail to connect to an ApsaraDB for ClickHouse cluster?

This issue can be caused by one of the following reasons:
  • Your application may fail to connect to a cluster due to network issues. Check whether an issue exists in a virtual private cloud (VPC) or the Internet. If your application and the cluster are deployed in the same VPC, you can connect your application to the cluster over an internal network. If your application and the cluster are deployed in different VPCs, apply for a public endpoint for the cluster.
  • The IP address of your application is not added to the whitelist of the cluster.
  • The security group of the Elastic Compute Service (ECS) instance on which the cluster runs denies access traffic.
  • Your company uses a network firewall.

How do I handle a timeout in ApsaraDB for ClickHouse?

For more information, see Handle timeouts in ApsaraDB for ClickHouse.

Why is cluster monitoring data missing during a period of time?

This issue can be caused by one of the following reasons:
  • Data queries trigger an out of memory (OOM) error.
  • The configuration of your cluster is modified. This causes a cluster restart.
  • The specifications of your cluster are upgraded or downgraded. This causes a cluster restart.

What are the common system tables?

The following list describes the common system tables.
  • system.processes: contains the SQL statements that are being executed.
  • system.query_log: contains the SQL statements that were executed.
  • system.merges: contains information about merges in the cluster.
  • system.mutations: contains information about mutations in the cluster.

How do I modify system parameters? After I modify system parameters, do I need to restart my cluster? What are the effects on the cluster?

The config.xml file contains system settings. You can configure the system parameters in the file. To modify a system parameter, perform the following steps:

  1. Log on to the ApsaraDB for ClickHouse console.
  2. On the Clusters page, click the ID of the cluster that you want to manage.
  3. In the left-side navigation pane of the cluster details page, click Parameter Configuration.
  4. Find the parameter that you want to modify, enter a valid value, and click OK.
  5. Click Submit Parameters.
  6. Click OK.

After you click OK, clickhouse-server is automatically restarted. The restart causes a transient disconnection that lasts about 1 minute.

How do I modify user parameters?

The users.xml file contains user settings. You can configure the user parameters in the file. To modify a user parameter, execute the following sample statement:
set global on cluster default ${key}=${value};

Unless otherwise specified, the new parameter value immediately takes effect when the statement is executed.

How do I modify a resource quota?

You can append the following sample settings clause to a statement to increase a resource quota:
settings max_memory_usage = XXX;
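
For example, appended to a query (the table name test and the value are only illustrative):
select * from test settings max_memory_usage = 20000000000;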

How does each port work?

  • Port 3306: used for communication with clients over the TCP protocol.
  • Port 8123: used for communication with clients over the HTTP protocol.

What is the relationship between a port and a driver?

  • Java driver: port 8123.
  • Python driver: port 3306.
  • Go driver: port 3306.

What are the recommended drivers for Go or Python?

For more information, see Client libraries from third-party developers.

What are the recommended business intelligence (BI) tools?

We recommend that you use Quick BI.

What are the recommended integrated development environments (IDEs) for data queries?

We recommend that you use DataGrip or DBeaver.

Does an ApsaraDB for ClickHouse cluster of V20.8 or later support smooth upgrades that do not require data migration?

Yes, this feature is supported.

Why does the execution of DDL statements take a long time or get stuck?

If you execute global DDL statements, the statements are executed in serial mode. A complex query may cause a deadlock.

To resolve the issue, perform the following operations based on your scenario:
  • Wait until all DDL statements are executed.
  • Stop DDL statements from being executed in the ApsaraDB for ClickHouse console.
  • In the ApsaraDB for ClickHouse console, go to the cluster details page. In the left-side navigation pane, click Parameter Configuration. On the page that appears, modify a parameter but keep its original value, and then click Submit Parameters. This restarts clickhouse-server, which can clear the stuck DDL statements.
    Note For more information about how to change parameter values, see Change parameter values.
  • If the issue persists after you perform these operations, submit a ticket.