Elasticsearch: Upgrade a cluster

Last Updated: Oct 31, 2025

If an Elasticsearch (ES) cluster experiences consistently high CPU, memory, or disk utilization, or if its query and write performance cannot meet business needs, you can upgrade the cluster configuration. An upgrade lets you restore service stability by increasing the number of nodes, upgrading node specifications, adding disk space, or adding new node types.

Before you upgrade

Important

An upgrade operation can cause service latency, configuration conflicts, and billing changes. Read the following information carefully before you proceed.

  • Service stability

    • Service stability rules during a cluster change:

      • Cluster state: Normal load + Replicas exist (normal load: CPU ≤ 60%, heap memory ≤ 50%, load < number of cores)

        Service status: Service continues. Performance may slightly decrease.

        Action: No extra action is required.

      • Cluster state: High load + No replicas (high load: high concurrency for writes or queries during the upgrade, CPU > 60%, heap memory > 50%)

        Service status: Occasional access timeouts.

        Action: Enable the retry mechanism on the client, and increase the number of index replicas before the upgrade.

      • Cluster state: High load + Abnormal status

        Service status: Occasional access timeouts or jitter.

        Action: Fix the cluster status before changing the configuration.

    • Operation window: Perform the operation during off-peak hours.

  • Capacity planning

    Evaluate the required cluster capacity.

  • Configuration constraints

    • Upgrades do not support version changes.

    • An upgrade operation can change only one type of node at a time.

  • Cost impact

    After you submit an upgrade order, the system bills you based on the new configuration. For more information about billing rules, see Pay-as-you-go and Subscription.
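
The service stability rules above can be sketched as a small decision helper. This is only an illustration: the `ClusterSnapshot` type and the function names are hypothetical, any status other than green is treated as "abnormal", and combinations that the rules do not list (for example, high load with replicas) are mapped conservatively to the client retry recommendation.

```python
from dataclasses import dataclass


@dataclass
class ClusterSnapshot:
    """Hypothetical summary of the metrics named in the stability rules."""
    cpu_percent: float   # highest CPU utilization across nodes
    heap_percent: float  # highest JVM heap utilization across nodes
    load: float          # highest load average across nodes
    cores: int           # CPU cores per node
    min_replicas: int    # lowest replica count across all indices
    health: str          # "green", "yellow", or "red"


def is_normal_load(s: ClusterSnapshot) -> bool:
    """Normal load as defined above: CPU <= 60%, heap <= 50%, load < cores."""
    return s.cpu_percent <= 60 and s.heap_percent <= 50 and s.load < s.cores


def recommended_actions(s: ClusterSnapshot) -> list:
    """Return the actions to take before the upgrade (empty list: none)."""
    # Assumption: any non-green status counts as "abnormal status".
    if s.health.lower() != "green":
        return ["Fix the cluster status before changing the configuration."]
    if is_normal_load(s) and s.min_replicas >= 1:
        return []
    # Assumption: unlisted combinations get the client retry recommendation.
    actions = ["Enable the retry mechanism on the client."]
    if s.min_replicas < 1:
        actions.append("Increase the number of index replicas before the upgrade.")
    return actions


if __name__ == "__main__":
    busy = ClusterSnapshot(cpu_percent=85, heap_percent=70, load=6.0,
                           cores=4, min_replicas=0, health="green")
    for action in recommended_actions(busy):
        print(action)
```

How the metrics are collected is left open here; they could come from the monitoring console or from the check commands described in the next section.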

Pre-upgrade checks

Important

Upgrading the cluster without completing the following checks can lead to a cluster crash, data loss, or service unavailability. You must check and verify each item.

  • Cluster health

    Run GET _cluster/health to ensure the cluster status is GREEN. If the status is not GREEN, see Cluster change error: Unhealthy cluster status.

  • Load safety

    Run GET _cat/nodes?v and check the cpu column. The recommended CPU utilization is 60% or less. If it exceeds this value, enable the retry mechanism on the client and increase the number of index replicas.

  • Index readiness

    • Check for indices in the CLOSE state by running GET /_cat/indices?v. If any exist, temporarily open them by running POST /<index_name>/_open. Otherwise, the configuration change may fail for the following reasons:

      • If an index is in the CLOSE state, the cluster status cannot become GREEN. ES requires the cluster status to be GREEN before it performs certain sensitive configuration changes, such as adjusting shard allocation rules.

      • During a configuration change, the cluster reallocates shards, but shards of a closed index cannot be reallocated:

        • This prevents the cluster status from reaching GREEN. The highest status it can reach is YELLOW.

        • As a result, operations that depend on the GREEN status fail.

    • Run GET _cat/indices?v to check if the number of replicas for each index is at least 1.

      For multi-zone instances, ensure that the number of replicas for every index in the cluster is less than the number of zones during the upgrade. For example, you can set the number of replicas to 1. If you need more replicas, increase the number manually after the upgrade is complete.

  • Shard balance

    Run GET _cat/shards?v to check for unbalanced shards.

    Important

    Checking for a balanced shard distribution before an upgrade is a key step to prevent performance degradation or a cluster crash during or after the process.

    • prirep: Check if any replica shards (r) are UNASSIGNED.

    • state: Check if any shards are stuck in the RELOCATING state for a long time.

    These issues prevent new nodes from receiving shards correctly. This causes the cluster status to remain YELLOW or RED after the upgrade. If these issues exist, see Solutions for uneven cluster load to resolve them.
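
The checks in this section can be combined into one report. The sketch below assumes that you have already fetched the responses as JSON (for example, by appending format=json to the _cat requests) and that the rows use the standard column names index, status, rep, prirep, and state. The function names are hypothetical, and fetching the data is deliberately left out.

```python
def closed_indices(cat_indices):
    """Names of indices whose status column reads 'close'."""
    return [row["index"] for row in cat_indices if row.get("status") == "close"]


def replica_violations(cat_indices, zones=1):
    """Open indices with no replica, or with replicas >= zones (multi-zone)."""
    bad = []
    for row in cat_indices:
        if row.get("status") == "close" or row.get("rep") is None:
            continue  # closed indices are reported separately
        rep = int(row["rep"])  # cat APIs return numbers as strings
        if rep < 1 or (zones > 1 and rep >= zones):
            bad.append(row["index"])
    return bad


def problem_shards(cat_shards):
    """Unassigned replica shards, and shards relocating at this moment.

    A single snapshot cannot tell whether a shard is *stuck* in RELOCATING;
    compare two snapshots taken some minutes apart to decide that.
    """
    unassigned = [(r["index"], r["shard"]) for r in cat_shards
                  if r["prirep"] == "r" and r["state"] == "UNASSIGNED"]
    relocating = [(r["index"], r["shard"]) for r in cat_shards
                  if r["state"] == "RELOCATING"]
    return unassigned, relocating


def pre_upgrade_report(health, cat_indices, cat_shards, zones=1):
    """Collect every failed check; an empty list means all checks passed."""
    problems = []
    if str(health.get("status", "")).lower() != "green":
        problems.append(f"cluster status is {health.get('status')}, not green")
    for name in closed_indices(cat_indices):
        problems.append(f"index {name} is closed")
    for name in replica_violations(cat_indices, zones):
        problems.append(f"index {name} has an unsuitable replica count")
    unassigned, relocating = problem_shards(cat_shards)
    for index, shard in unassigned:
        problems.append(f"replica shard {index}[{shard}] is UNASSIGNED")
    for index, shard in relocating:
        problems.append(f"shard {index}[{shard}] is RELOCATING")
    return problems


if __name__ == "__main__":
    sample_indices = [{"index": "logs", "status": "open", "rep": "1"},
                      {"index": "archive", "status": "close"}]
    sample_shards = [{"index": "logs", "shard": "0", "prirep": "r",
                      "state": "UNASSIGNED"}]
    for line in pre_upgrade_report({"status": "yellow"}, sample_indices, sample_shards):
        print(line)
```

Returning a list of problems rather than a single pass/fail flag keeps every failed item visible, which matches the requirement to verify each check individually.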

Method 1: Upgrade in the console

  1. On the Instances page, click Upgrade.

    Alternatively, on the Basic Information page, choose Configuration Change > Cluster Upgrade.

  2. On the Upgrade/Downgrade page, adjust the configuration parameters as needed.

    Important

    The available configuration parameters vary based on the cluster type and version. The actual parameters are displayed on the Upgrade/Downgrade page.

    • Number of zones: You can increase the number of zones from one to two or three. If the inventory for a specific instance type is insufficient in a zone, you must migrate the nodes in that zone before you upgrade.

    • You can upgrade the node specifications (storage class). The following options are sorted by performance from lowest to highest:

      1. Previous-generation disks: basic disk, ultra disk, and standard SSD.

        Note

        These disks are being phased out in some regions and zones. When you select a disk type, we recommend that you choose enterprise SSDs (ESSDs).

      2. ESSD: ESSDs combine 25 Gigabit Ethernet (GbE) networks and Remote Direct Memory Access (RDMA) technology. They provide up to 1 million random read/write I/O operations per second (IOPS) per disk and low single-link latency.

      3. Local disks.

        Note

        A local disk is a local hard disk device on the physical server where an ECS instance resides. It provides local storage access for the ECS instance. Local disks are suitable for scenarios that require high storage I/O performance and cost-effective mass storage.

    • Smart Change (enabled by default): The system automatically selects the optimal change method based on the configuration items. You can manually disable this feature to specify a change method:

      • Blue-green change

        Principle: Add new nodes → Copy data → Switch seamlessly.

        Time required: Longer.

        Service impact: Node IP addresses change. Cluster performance may fluctuate briefly.

        Scenarios: Suitable for scenarios that are not sensitive to the change duration but require high cluster availability.

      • In-place change

        Principle: Perform a rolling update of nodes (no data copy required).

        Time required: Shorter.

        Service impact: Node IP addresses do not change. Cluster performance may fluctuate briefly.

        Scenarios: Suitable for scenarios where the cluster has a performance bottleneck and a fast change is desired.

        Important

        If the resource utilization is high (for example, CPU > 60%), use in-place changes with caution.

    • Forced Change: Skips the health check but triggers a forced cluster restart. This may cause a prolonged service interruption. The recovery time depends on the data volume. Use this method only for emergency scale-outs when the cluster is already unavailable.

  3. Review and agree to the Terms Of Service and Service Level Agreement, and then click Buy Now. The system charges you based on the billing method.

    During the upgrade, the cluster status changes to Activating. Cluster performance may fluctuate briefly, and transient disconnections may occur. After the upgrade is complete, the cluster status changes to Normal.

Method 2: Upgrade by calling an API

For information about how to upgrade a cluster by calling an API, see UpdateInstance.
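
Because an upgrade operation can change only one type of node at a time, it can help to validate a planned change before you submit it through the API. The sketch below is a plain pre-flight check and does not call UpdateInstance. The dictionary layout, the node type keys ("data", "master"), and the spec strings are hypothetical and are not the request schema of the API.

```python
def changed_node_types(current, target):
    """Node types whose settings differ between the two configurations."""
    node_types = sorted(set(current) | set(target))
    return [t for t in node_types if current.get(t) != target.get(t)]


def single_changed_node_type(current, target):
    """Return the one node type being changed, or raise ValueError."""
    changed = changed_node_types(current, target)
    if len(changed) != 1:
        raise ValueError(
            f"expected exactly one changed node type, got {changed}")
    return changed[0]


if __name__ == "__main__":
    # Hypothetical configurations, keyed by node type.
    current = {"data": {"count": 3, "spec": "spec-a"},
               "master": {"count": 3, "spec": "spec-m"}}
    target = {"data": {"count": 5, "spec": "spec-a"},
              "master": {"count": 3, "spec": "spec-m"}}
    print(single_changed_node_type(current, target))  # prints "data"
```

A plan that touches two node types would need to be split into two consecutive upgrade operations.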

Monitor progress and verify after the upgrade

  • After the upgrade starts, you can view the progress in the console under Instances > Instance Basic Information. Click Show Details to see the detailed progress.

  • After the upgrade is complete, confirm that the new configuration is applied. On the Basic Information page of the cluster, check the following items:

    • The cluster status is Normal.

    • Zone: Confirm that the zones match the configuration that you selected.

    • Number of nodes and storage specifications: Confirm that the new nodes have joined the cluster and that the storage specifications are correct.

    • Shard balance: Run GET _cat/allocation?v to check the shard distribution. If the shards are unbalanced, see Solutions for uneven cluster load to resolve the issue.
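
The shard balance check can be approximated with a short script. This sketch assumes the response of GET _cat/allocation?format=json, where each row has a node name and a shards count and unassigned shards appear under the node name UNASSIGNED. The function name and the `max_spread` threshold are hypothetical; the documentation does not define a numeric limit for "unbalanced", so choose a value that fits your cluster.

```python
def shard_balance(cat_allocation, max_spread=2):
    """Summarize shards per node from _cat/allocation rows.

    max_spread is an illustrative threshold: the largest accepted
    difference between the fullest and the emptiest node.
    """
    per_node = {}
    unassigned = 0
    for row in cat_allocation:
        shards = int(row["shards"])  # cat APIs return numbers as strings
        if row.get("node") == "UNASSIGNED":
            unassigned += shards
        else:
            per_node[row["node"]] = shards
    if per_node:
        spread = max(per_node.values()) - min(per_node.values())
    else:
        spread = 0
    return {
        "per_node": per_node,
        "spread": spread,
        "unassigned": unassigned,
        "balanced": unassigned == 0 and spread <= max_spread,
    }


if __name__ == "__main__":
    rows = [{"node": "node-1", "shards": "10"},
            {"node": "node-2", "shards": "11"},
            {"node": "node-3", "shards": "4"}]
    print(shard_balance(rows))
```

Comparing raw shard counts is a rough signal only. It ignores shard size and treats all nodes as equivalent, so clusters with several node types may need a per-type comparison.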

FAQ