If an Elasticsearch (ES) cluster experiences consistently high CPU, memory, or disk utilization, or if its query and write performance cannot meet business needs, you can upgrade the cluster configuration. An upgrade lets you restore service stability by increasing the number of nodes, upgrading node specifications, adding disk space, or adding new node types.
Before you upgrade
An upgrade operation can cause service latency, configuration conflicts, and billing changes. Read the following information carefully before you proceed.
Service stability
Service stability rules during a cluster change:
Cluster: Normal load + Replicas exist (normal load: CPU ≤ 60%, heap memory ≤ 50%, load < number of cores)
Service status: Service continues. Performance may slightly decrease.
Action: No extra action is required.
Cluster: High load + No replicas (high load: high concurrency for writes or queries during the upgrade, CPU > 60%, heap memory > 50%)
Service status: Occasional access timeouts
Action: Enable the retry mechanism on the client. Increase the number of index replicas before the upgrade.
Cluster: High load + Abnormal status
Service status: Occasional access timeouts or jitter
Action: Fix the cluster status before changing the configuration.
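As an illustration of the load thresholds listed above, the following Python sketch flags nodes that exceed them. It assumes the rows come from `GET _cat/nodes?format=json` and include the `name`, `cpu`, and `heap.percent` columns. The function name `overloaded_nodes` and the sample values are hypothetical, and treating a breach of either threshold as high load is an assumption, not a rule stated in this document.

```python
from typing import Dict, List

CPU_LIMIT = 60   # percent, threshold taken from the rules above
HEAP_LIMIT = 50  # percent, threshold taken from the rules above


def overloaded_nodes(rows: List[Dict[str, str]]) -> List[str]:
    """Return the names of nodes whose CPU or heap usage exceeds the limits.

    Assumption: a node counts as highly loaded if EITHER limit is exceeded.
    """
    flagged = []
    for row in rows:
        cpu = int(row["cpu"])            # _cat APIs return numbers as strings
        heap = int(row["heap.percent"])
        if cpu > CPU_LIMIT or heap > HEAP_LIMIT:
            flagged.append(row["name"])
    return flagged


# Hypothetical sample shaped like rows from GET _cat/nodes?format=json
sample = [
    {"name": "node-1", "cpu": "35", "heap.percent": "42"},
    {"name": "node-2", "cpu": "72", "heap.percent": "48"},
    {"name": "node-3", "cpu": "60", "heap.percent": "50"},
]
print(overloaded_nodes(sample))  # ['node-2']
```

A node that sits exactly at 60% CPU and 50% heap is still treated as normal load, which matches the "≤" in the rules above.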
Operation window: Perform the operation during off-peak hours.
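The client-side retry mechanism recommended for high-load clusters can take many forms. The following Python sketch shows one generic option, retrying on timeouts with exponential backoff. The helper name `with_retry`, the default attempt count, and the delays are all illustrative assumptions; a real client library may already provide its own retry settings, which are preferable if available.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def with_retry(call: Callable[[], T], attempts: int = 3, base_delay: float = 0.5,
               sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `call`, retrying on TimeoutError with exponential backoff."""
    for attempt in range(attempts):
        try:
            return call()
        except TimeoutError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the timeout to the caller
            sleep(base_delay * 2 ** attempt)  # 0.5 s, 1 s, 2 s, ...
    raise ValueError("attempts must be at least 1")


# Hypothetical request that times out twice and then succeeds.
outcomes = [TimeoutError(), TimeoutError(), "ok"]
delays: List[float] = []


def flaky_request() -> str:
    outcome = outcomes.pop(0)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


result = with_retry(flaky_request, sleep=delays.append)  # record delays instead of sleeping
print(result, delays)
```

Passing `sleep` as a parameter keeps the example testable without real waiting. In production code the default `time.sleep` applies.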
Capacity planning
Evaluate the required cluster capacity.
Configuration constraints
Upgrades do not support version changes.
An upgrade operation can change only one type of node at a time.
Cost impact
After you submit an upgrade order, the system bills you based on the new configuration. For more information about billing rules, see Pay-as-you-go and Subscription.
Pre-upgrade checks
Upgrading the cluster without completing the following checks can lead to a cluster crash, data loss, or service unavailability. You must check and verify each item.
Cluster health
Run GET _cluster/health to ensure the cluster status is GREEN. If the status is not GREEN, see Cluster change error: Unhealthy cluster status.
Load safety
Run GET _cat/nodes?v to check the load on each node. The recommended CPU utilization is 60% or less. If it exceeds this value, enable the retry mechanism on the client and increase the number of index replicas.
Index readiness
Check for indices in the CLOSE state by running GET /_cat/indices?v. If any exist, temporarily open them by running POST /<index_name>/_open. Otherwise, the configuration change may fail for the following reasons:
If an index is in the CLOSE state, the cluster status cannot become GREEN. ES requires the cluster status to be GREEN before it performs certain sensitive configuration changes, such as adjusting shard allocation rules.
During a configuration change, the cluster reallocates shards:
Shards of a closed index cannot be reallocated.
This prevents the cluster status from reaching GREEN. The highest status it can reach is YELLOW.
As a result, operations that depend on the GREEN status fail.
Run GET _cat/indices?v to check if the number of replicas for each index is at least 1.
For multi-zone instances, ensure that the number of replicas for any index in the cluster is less than the number of zones during the upgrade. For example, you can set the number of replicas to 1. After the upgrade is complete, you must manually increase the number of replicas.
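The index readiness checks above can be sketched in Python as follows. The sketch assumes the rows come from `GET _cat/indices?format=json` and include the `index`, `status`, and `rep` columns. All function names and sample values are hypothetical. The dictionary returned by `replica_settings_body` is intended as the body of a `PUT /<index_name>/_settings` request.

```python
from typing import Dict, List

Row = Dict[str, str]


def closed_indices(rows: List[Row]) -> List[str]:
    """Indices that must be opened before the upgrade."""
    return [r["index"] for r in rows if r.get("status") == "close"]


def indices_without_replicas(rows: List[Row]) -> List[str]:
    """Open indices with fewer than one replica."""
    return [r["index"] for r in rows
            if r.get("status") == "open" and int(r.get("rep") or 0) < 1]


def replicas_fit_zones(replicas: int, zones: int) -> bool:
    """Multi-zone rule from the text: replicas must be less than the zone count."""
    return replicas < zones


def replica_settings_body(replicas: int) -> Dict[str, Dict[str, int]]:
    """Request body for PUT /<index_name>/_settings."""
    return {"index": {"number_of_replicas": replicas}}


# Hypothetical sample shaped like rows from GET _cat/indices?format=json
sample = [
    {"index": "logs-1", "status": "open", "rep": "1"},
    {"index": "logs-2", "status": "close", "rep": "1"},
    {"index": "metrics", "status": "open", "rep": "0"},
]
print(closed_indices(sample))            # ['logs-2']
print(indices_without_replicas(sample))  # ['metrics']
print(replicas_fit_zones(1, 3))          # True
print(replica_settings_body(1))
```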
Shard balance
Run GET _cat/shards?v to check for unbalanced shards.
Important: Checking for a balanced shard distribution before an upgrade is a key step to prevent performance degradation or a cluster crash during or after the process.
prirep: Check if any replica shards (r) are UNASSIGNED.
state: Check if any shards are stuck in the RELOCATING state for a long time.
These issues prevent new nodes from receiving shards correctly. This causes the cluster status to remain YELLOW or RED after the upgrade. If these issues exist, see Solutions for uneven cluster load to resolve them.
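A minimal Python sketch of the two shard checks above, assuming the rows come from `GET _cat/shards?format=json` and include the `index`, `shard`, `prirep`, and `state` columns. The function name `problem_shards` and the sample rows are hypothetical. A single snapshot cannot tell whether a shard is stuck in RELOCATING, so the sketch only lists relocating shards. Comparing two snapshots taken some time apart would be needed to detect a stuck shard.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]
ShardId = Tuple[str, str]  # (index name, shard number)


def problem_shards(rows: List[Row]) -> Dict[str, List[ShardId]]:
    """Group shards that need attention before an upgrade."""
    unassigned_replicas = [
        (r["index"], r["shard"]) for r in rows
        if r["prirep"] == "r" and r["state"] == "UNASSIGNED"
    ]
    relocating = [
        (r["index"], r["shard"]) for r in rows
        if r["state"] == "RELOCATING"
    ]
    return {"unassigned_replicas": unassigned_replicas, "relocating": relocating}


# Hypothetical sample shaped like rows from GET _cat/shards?format=json
sample = [
    {"index": "logs-1", "shard": "0", "prirep": "p", "state": "STARTED"},
    {"index": "logs-1", "shard": "0", "prirep": "r", "state": "UNASSIGNED"},
    {"index": "logs-1", "shard": "1", "prirep": "p", "state": "RELOCATING"},
]
report = problem_shards(sample)
print(report)
```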
Method 1: Upgrade in the console
On the Instances page, click Upgrade.
Alternatively, on the Basic Information page, choose .
On the Upgrade/Downgrade page, adjust the configuration parameters as needed.
Important: The available configuration parameters vary based on the cluster type and version. The actual parameters are displayed on the Upgrade/Downgrade page.
The following rules apply when you change the number of zones: If the inventory for a specific instance type is insufficient in a zone, you must migrate the nodes in that zone before you upgrade.
Increase: You can increase the number of zones from one to two or three.
You can upgrade the node specifications (storage class). The following options are sorted by performance from lowest to highest:
Previous-generation disks: basic disk, ultra disk, and standard SSD.
Note: These disks are being phased out in some regions and zones. When you select a disk type, we recommend that you choose enterprise SSDs (ESSDs).
ESSD: ESSDs combine 25 Gigabit Ethernet (GbE) networks and Remote Direct Memory Access (RDMA) technology. They provide up to 1 million random read/write I/O operations per second (IOPS) per disk and low single-link latency.
Local disks.
Note: A local disk is a local hard disk device on the physical server where an ECS instance resides. It provides local storage access for the ECS instance. Local disks are suitable for scenarios that require high storage I/O performance and cost-effective mass storage.
Smart Change (enabled by default): The system automatically selects the optimal change method based on the configuration items. You can manually disable this feature to specify a change method:
Change method: Blue-green change
Principle: Add new nodes → Copy data → Switch seamlessly
Time required: Longer
Service impact and scenarios: Node IP addresses change. Cluster performance may fluctuate briefly. Suitable for scenarios that are not sensitive to the change duration but require high cluster availability.
Change method: In-place change
Principle: Perform a rolling update of nodes (no data copy required).
Time required: Shorter
Service impact and scenarios: Node IP addresses do not change. Cluster performance may fluctuate briefly. Suitable for scenarios where the cluster has a performance bottleneck and a fast change is desired.
Important: If the resource utilization is high (for example, CPU > 60%), use in-place changes with caution.
Forced Change: Skips the health check but triggers a forced cluster restart. This may cause a prolonged service interruption. The recovery time depends on the data volume. Use this method only for emergency scale-outs when the cluster is already unavailable.
Review and agree to the Terms Of Service and Service Level Agreement, and then click Buy Now. The system charges you based on the billing method.
During the upgrade, the cluster status changes to Activating. Cluster performance may fluctuate briefly, and transient disconnections may occur. After the upgrade is complete, the cluster status changes to Normal.
Method 2: Upgrade by calling an API
For information about how to upgrade a cluster by calling an API, see UpdateInstance.
Monitor progress and verify after the upgrade
After the upgrade starts, you can view the progress in the console under Instances > Instance Basic Information.
Click Show Details.
After the upgrade is complete, confirm that the new configuration is applied. On the Basic Information page of the cluster, check the following items:
The cluster status is Normal.
Zone
Number of nodes and storage specifications: Confirm that the new nodes have joined the cluster and that the storage specifications are correct.
Shard balance: Run GET _cat/allocation?v to check the shard distribution. If the shards are unbalanced, see Solutions for uneven cluster load to resolve the issue.
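One way to quantify the shard distribution after the upgrade is to compare per-node shard counts. The Python sketch below assumes the rows come from `GET _cat/allocation?format=json` and include the `node` and `shards` columns. The function name `shard_spread` and the sample values are hypothetical, and this document does not define what spread counts as unbalanced, so any threshold you apply is your own choice.

```python
from typing import Dict, List

Row = Dict[str, str]


def shard_spread(rows: List[Row]) -> int:
    """Difference between the highest and lowest shard count across nodes."""
    counts = {
        r["node"]: int(r["shards"])
        for r in rows
        if r["node"] != "UNASSIGNED"  # skip the summary row for unassigned shards
    }
    if not counts:
        return 0
    return max(counts.values()) - min(counts.values())


# Hypothetical sample shaped like rows from GET _cat/allocation?format=json
sample = [
    {"node": "node-1", "shards": "12"},
    {"node": "node-2", "shards": "11"},
    {"node": "node-3", "shards": "4"},   # e.g. a newly added node still filling up
    {"node": "UNASSIGNED", "shards": "2"},
]
print(shard_spread(sample))  # 8
```

A large spread shortly after an upgrade can be temporary while shards relocate, so it is worth rechecking once relocation has finished.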
FAQ
Does Alibaba Cloud ES support version upgrades or downgrades?
After I change the number of nodes, does the cluster automatically rebalance the shards?
Does changing the cluster configuration affect the ES service?
What do I do if I selected the wrong configuration when purchasing an ES instance?
When upgrading a cluster, I receive the message "UpgradeVersionMustFromConsole". What should I do?
What do I do if an error or timeout occurs when upgrading a cluster?
Will changing the disk type of an ES instance cause data loss?
Can I directly upgrade the CPU for an ES instance to avoid data migration?