ApsaraDB for ClickHouse Community-Compatible Edition supports vertical scaling (scale up or scale down) and horizontal scaling (scale out or scale in). Vertical scaling changes node resources. Horizontal scaling changes the number of nodes.
Choose a scaling approach
Vertical scaling is faster and causes less disruption. If your cluster lacks CPU, memory, or disk resources, scale up first.
| Approach | What changes | When to use | Impact | Duration |
| --- | --- | --- | --- | --- |
| Scale up | Node specs, storage capacity, storage type, or ZooKeeper specs | CPU, memory, or disk resources are insufficient | Storage changes: none. Spec changes: cluster restarts. | Storage: immediate. Specs: 10-15 minutes. |
| Scale down | Node specs or ZooKeeper specs | Resources are over-provisioned | Same as scale up | 10-15 minutes |
| Scale out | Adds nodes | More compute capacity needed; redistribute data across nodes | DDL operations blocked. CPU and memory usage increases during migration. | 30+ minutes (depends on data volume) |
| Scale in | Removes nodes | Reduce costs by running fewer nodes | Same as scale out | 30+ minutes (depends on data volume) |
Storage space cannot be reduced through vertical scaling. To reduce storage, scale in a node (multi-node clusters) or create a new instance and migrate data (standalone clusters).
Prerequisites
The cluster is a Community-Compatible Edition cluster.
The cluster is in the Running status.
No unpaid renewal orders exist. To check, log on to the ApsaraDB for ClickHouse console. In the upper-right corner, choose Expenses > Expenses and Costs. In the left navigation pane, click Orders. Pay for or cancel outstanding orders.
For master-replica clusters: confirm that your application has a retry mechanism for transient connection interruptions during spec changes.
For standalone clusters: the cluster is fully unavailable during spec changes. Stop write operations before you start.
Billing
Scaling changes the cluster billing. The actual cost appears on the console during the operation. For details, see Billing for configuration changes.
Scale up or scale down (vertical scaling)
Vertical scaling changes node specifications, storage capacity, storage type, or ZooKeeper specifications.
Constraints
ZooKeeper specification changes are supported only for clusters created after December 1, 2021. For pricing, see Pricing for ZooKeeper specifications of the Community-Compatible Edition.
Upgrading storage capacity or storage type does not restart the cluster (applies only to clusters created after December 1, 2021). Changing cluster specifications or ZooKeeper specifications restarts the cluster.
You cannot change Server Specifications or ZooKeeper Specifications in the same operation as an upgrade of Storage Capacity or Storage Type.
Changing ZooKeeper specifications during peak hours may cause inconsistencies between table metadata and actual data. Perform this change during off-peak hours or when write operations are stopped.
Impact by cluster type
| Cluster type | Behavior during spec changes |
| --- | --- |
| Master-replica | Transient connection interruptions occur as requests switch between replicas. Schedule changes during off-peak hours. |
| Standalone | The cluster is unavailable for the entire upgrade. Schedule changes during off-peak hours or stop write operations first. |
Procedure
Log on to the ApsaraDB for ClickHouse console.
In the upper-left corner, select the region where your cluster resides.
On the Clusters page, click the Clusters of Community-compatible Edition tab.
Find the target cluster. In the Actions column, click Change.
In the Change dialog box, select Scale Up or Scale Down, and click OK.
On the Upgrade/Downgrade page, select the desired configurations. By default, a cluster runs a ZooKeeper service with 4 cores and 8 GB of memory. To check for resource bottlenecks, go to the Monitoring and Alerting page and view the ZooKeeper metrics on the Cluster Monitoring panel. If the default specifications are insufficient, upgrade them.
Click Buy Now and complete the payment.
On the The order is complete page, click Console.
Check the cluster status in the Status column of the Clusters of Community-compatible Edition list.
Storage Capacity changes take effect immediately. The cluster status remains Running.
Server Specifications or ZooKeeper Specifications changes take 10 to 15 minutes. The status changes from Changing Specification to Running when the operation completes.
Post-scaling behavior
After changing cluster or ZooKeeper specifications, the cluster restarts. Restart duration depends on the number of databases, tables, and the volume of cold data. High-frequency merge operations may run for a period after the change, increasing I/O usage and potentially increasing request latency. For information about estimating merge duration, see Calculate the merge duration after migration.
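To gauge the remaining merge activity after the restart, you can watch the `system.merges` table. This is standard ClickHouse tooling rather than an ApsaraDB-specific feature, so treat it as a sketch:

```sql
-- List merges currently running, with their progress and elapsed time.
-- An empty result means the post-change merge backlog has drained.
SELECT database, `table`, round(progress, 2) AS progress, elapsed
FROM system.merges
ORDER BY elapsed DESC;
```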
Scale out or scale in (horizontal scaling)
Horizontal scaling adds or removes cluster nodes. This involves data migration and requires additional preparation.
Scale-out methods
| Method | Console label | Data migration | When to use |
| --- | --- | --- | --- |
| Scale-out with data migration | Migration Expansion (default) | Migrates and redistributes existing data across all nodes | Most scenarios. Ensures balanced data distribution. |
| Simple scale-out | Simple Expansion | No redistribution. New data goes to new nodes only. | Data was written directly to local tables or to distributed tables with a `rand()` sharding key. |
Do not use simple scale-out for clusters with ReplacingMergeTree, CollapsingMergeTree, or VersionedCollapsingMergeTree tables. These engines merge data on the same node. Simple scale-out disperses data across nodes and prevents merges from completing correctly.
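Before choosing simple scale-out, you can check whether any such tables exist in your cluster. The query below is a sketch; the `LIKE` patterns are meant to also catch the `Replicated*` and `Versioned*` variants of these engines:

```sql
-- Find tables whose engines merge rows within a single node,
-- which makes simple scale-out unsafe.
SELECT database, name, engine
FROM system.tables
WHERE engine LIKE '%ReplacingMergeTree'
   OR engine LIKE '%CollapsingMergeTree';
```

If this query returns any rows, use scale-out with data migration instead.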
Scale-in methods
| Method | Behavior | Data loss |
| --- | --- | --- |
| Standard scale-in | Removes nodes randomly. Data is migrated and redistributed. | No |
| Scale-in by specifying nodes | Removes specified nodes. Available only for clusters using local disks. | Yes -- data on the removed nodes is lost. |
Migration scope
During scale-out with data migration or standard scale-in, the following data is migrated to the new cluster configuration.
Supported:
Databases, dictionaries, and materialized views
Table schemas for all tables except Kafka and RabbitMQ engine tables
Data from MergeTree engine family tables (incremental migration)
Not supported:
Kafka and RabbitMQ engine tables and their data
Data from non-MergeTree tables (such as external tables and Log family engine tables)
During scale-out or scale-in, data is migrated to a new instance and traffic is switched. To prevent Kafka and RabbitMQ data from being split, delete these engine tables from the source cluster before the operation. Recreate them after the operation completes.
Constraints
DDL operations are prohibited during the entire scale-out or scale-in process.
CPU and memory usage increases during the operation. Estimated overhead: less than 5 cores and 20 GB of memory per node.
After scaling, high-frequency merge operations continue for a period, increasing I/O usage and potentially increasing request latency. For information about estimating merge duration, see Calculate the merge duration after migration.
After scaling out, internal cluster node IP addresses change. If your application connects to specific node IP addresses, retrieve the VPC CIDR block again. For details, see Obtain the VPC CIDR block of a cluster.
Step 1: Handle Kafka and RabbitMQ engine tables
Skip this step if your cluster has no Kafka or RabbitMQ engine tables.
Log on to the cluster. For connection instructions, see Connect to a ClickHouse cluster using DMS.
Query for Kafka and RabbitMQ engine tables:
```sql
SELECT * FROM `system`.`tables` WHERE engine IN ('RabbitMQ', 'Kafka');
```

Back up the `CREATE TABLE` statement for each table:

```sql
SHOW CREATE TABLE <table_name>;
```

Delete the Kafka and RabbitMQ engine tables.

> Important: When deleting a Kafka table, also delete the materialized views that reference it. Otherwise, the scale-out or scale-in operation fails.
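As an illustration of the drop order, assume a hypothetical Kafka engine table `default.kafka_src` that is consumed by a materialized view `default.kafka_mv`; both names are placeholders for your own tables:

```sql
-- Drop the materialized view first, then the Kafka engine table it reads from.
-- Dropping in the reverse order leaves a view that references a missing table.
DROP TABLE default.kafka_mv;
DROP TABLE default.kafka_src;
```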
Step 2: Back up data from non-MergeTree tables
Skip this step if your cluster has no non-MergeTree tables with data that needs to be preserved.
Identify non-MergeTree tables whose data requires migration:
```sql
SELECT `database` AS database_name, `name` AS table_name, `engine`
FROM `system`.`tables`
WHERE (`engine` NOT LIKE '%MergeTree%')
  AND (`engine` != 'Distributed')
  AND (`engine` != 'MaterializedView')
  AND (`engine` NOT IN ('Kafka', 'RabbitMQ'))
  AND (`database` NOT IN ('system', 'INFORMATION_SCHEMA', 'information_schema'))
  AND (`database` NOT IN (
    SELECT `name` FROM `system`.`databases`
    WHERE `engine` IN ('MySQL', 'MaterializedMySQL', 'MaterializeMySQL', 'Lazy',
                       'PostgreSQL', 'MaterializedPostgreSQL', 'SQLite')
  ));
```

Back up the data from the identified tables. For instructions, see Back up data to OSS.
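For a small table, one possible export path is the `oss` table function (an alias of `s3` in recent ClickHouse versions). Whether it is available depends on your cluster version, and the endpoint, credentials, and table name below are placeholders, so treat this as a sketch rather than the documented backup procedure:

```sql
-- Export a table's rows to an OSS object in CSV format.
-- Endpoint, bucket, credentials, and table name are placeholders.
INSERT INTO FUNCTION oss(
    'https://<bucket>.oss-cn-hangzhou.aliyuncs.com/backup/my_log_table.csv',
    '<access_key_id>', '<access_key_secret>', 'CSV')
SELECT * FROM default.my_log_table;
```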
Step 3: Scale out or scale in from the console
Log on to the ApsaraDB for ClickHouse console.
In the upper-left corner, select the region where your cluster resides.
On the Clusters page, select the Clusters of Community-compatible Edition tab.
Find the target cluster. In the Actions column, click Change.
In the Change dialog box, select Scale Out or Scale In, and click OK.
In the check window, review the check status. For a scale-out, Migration Expansion is selected by default. To use Simple Expansion instead, click Previous, select Simple Expansion in the Scale-out dialog box, and click Next.
If the check passes, click Next.
If the check fails, fix the reported issues and click Retry Check. Click Next after the check passes. Common check failure reasons:
| Failure | Resolution |
| --- | --- |
| Missing unique distributed table | A local table has no corresponding distributed table. Create one. |
| Corresponding distributed table is not unique | A local table has more than one distributed table. Keep only one. |
| Kafka/RabbitMQ engine tables exist | Delete these tables first (see Step 1). |
| Non-replicated `*MergeTree` tables in a master-replica instance | Data is inconsistent between replicas. Resolve the inconsistency. |
| Distributed and local table columns are inconsistent | Align the column definitions. |
| Table is missing on some nodes | Create matching tables on all shards. For materialized view inner tables, rename the inner table and rebuild the materialized view. For details, see The inner table of a materialized view is inconsistent across shards. |
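For example, to resolve a missing distributed table, you can create one over the existing local table. The table names, cluster name `default`, and `rand()` sharding key below are hypothetical; substitute your own:

```sql
-- Create a distributed table over an existing local table on every node.
-- Distributed(cluster, database, local_table, sharding_key)
CREATE TABLE default.events_dist ON CLUSTER default
AS default.events_local
ENGINE = Distributed(default, default, events_local, rand());
```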
On the Upgrade/Downgrade page, configure the Server Nodes count and the write suspension window. Write suspension window requirements:
Set the write suspension time to at least 30 minutes.
The operation must complete within 5 days. The end date of Stopping Data Writing for the source cluster must be no later than the current date plus 5 days.
Set the write suspension window during off-peak hours to reduce business impact.
Click Buy Now and complete the payment.
On the The order is complete page, click Console.
Check the cluster status in the Status column of the Clusters of Community-compatible Edition list. The operation completes when the status changes from Scaling to Running. A scale-out or scale-in operation takes at least 30 minutes. The exact duration depends on data volume.
Step 4: Recreate Kafka and RabbitMQ engine tables
Skip this step if you did not delete any Kafka or RabbitMQ engine tables in Step 1.
Log on to the cluster and run the CREATE TABLE statements you backed up in Step 1. For connection instructions, see Connect to a ClickHouse cluster using DMS.
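The shape of the statements is roughly as follows; every name, column, and setting here is a placeholder, and you should run the exact `CREATE TABLE` output you saved in Step 1 instead. Recreate the Kafka table before the materialized view that consumes it:

```sql
-- Recreate the Kafka engine table first.
CREATE TABLE default.kafka_src
(
    `id`  UInt64,
    `msg` String
)
ENGINE = Kafka
SETTINGS kafka_broker_list = '<broker_host>:9092',
         kafka_topic_list  = '<topic>',
         kafka_group_name  = '<consumer_group>',
         kafka_format      = 'JSONEachRow';

-- Then recreate the materialized view that writes into the target table.
CREATE MATERIALIZED VIEW default.kafka_mv TO default.events_local
AS SELECT id, msg FROM default.kafka_src;
```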
Step 5: Restore data to non-MergeTree tables
Skip this step if you did not back up any non-MergeTree table data in Step 2.
Log on to the cluster and import the data you backed up in Step 2. For instructions, see Import data from OSS.
Post-scaling behavior
Merge operations: High-frequency merge operations continue for a period after the operation, increasing I/O usage and potentially increasing request latency. For information about estimating merge duration, see Calculate the merge duration after migration.
IP addresses: After scaling out, internal cluster node IP addresses change. Update your application configuration if it connects to specific node IPs. For details, see Obtain the VPC CIDR block of a cluster.
Verify data: After the operation completes and the cluster status returns to Running, verify that your data and tables are intact.
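As a sketch of both checks, you can query the system tables after the cluster returns to Running. The cluster name `default` and the distributed table name are placeholders:

```sql
-- Confirm the new node topology and IP addresses.
SELECT host_name, host_address
FROM system.clusters
WHERE cluster = 'default';

-- Spot-check per-node row counts through a distributed table;
-- compare the total against the count taken before scaling.
SELECT hostName() AS node, count() AS rows
FROM default.events_dist
GROUP BY node;
```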