For applications that require high availability, a single-zone deployment is a single point of failure, as an outage in its physical data center can cause a complete service disruption. A cross-zone deployment improves reliability by distributing the nodes of an Alibaba Cloud Elasticsearch instance across multiple, physically isolated availability zones within the same region. This architecture provides data center-level disaster recovery. If one availability zone fails, the cluster remains operational using nodes and replica shards in the other zones, ensuring business continuity.
How it works
A cross-zone deployment uses Elasticsearch's built-in shard allocation awareness feature.
When you create a multi-availability zone instance, the system automatically adds a property named `zone_id` to nodes deployed in different availability zones. It also configures the cluster with `cluster.routing.allocation.awareness.attributes: zone_id`, which instructs Elasticsearch to consider this node property when allocating shards.
This mechanism ensures that the primary and replica shards of an index are distributed across different availability zones. If an entire availability zone fails, a copy of its shards exists in the other zones, ensuring data redundancy and service availability.
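The redundancy guarantee described above can be checked against a shard listing. The following Python sketch groups shard copies by zone from a `GET _cat/shards`-style result; the function names and sample data are illustrative, not part of the Alibaba Cloud or Elasticsearch API.

```python
# Minimal sketch: verify zone-aware shard placement from a shard-to-zone map.
# The sample data below is fabricated for illustration.
from collections import defaultdict

def copies_per_zone(shards):
    """Group the copies of each (index, shard) pair by availability zone.

    `shards` is a list of dicts with keys: index, shard, prirep ('p' or 'r'),
    and zone_id -- roughly the columns of GET _cat/shards.
    """
    zones = defaultdict(set)
    for s in shards:
        zones[(s["index"], s["shard"])].add(s["zone_id"])
    return zones

def is_zone_redundant(shards):
    """True if every shard has copies in at least two distinct zones."""
    return all(len(z) >= 2 for z in copies_per_zone(shards).values())

# Example: one primary/replica pair split across two zones.
sample = [
    {"index": "logs", "shard": 0, "prirep": "p", "zone_id": "zone-a"},
    {"index": "logs", "shard": 0, "prirep": "r", "zone_id": "zone-b"},
]
print(is_zone_redundant(sample))  # True: copies span two zones
```

If both copies of a shard landed in the same zone, the check would fail, which is exactly the situation that shard allocation awareness prevents.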
Deployment modes
Select a deployment mode that matches your availability requirements and budget.
| Deployment mode | Architecture | Disaster recovery | Use cases |
| --- | --- | --- | --- |
| Single availability zone | All nodes are located in a single availability zone. | A failure in the availability zone causes a complete service outage. | Non-critical workloads such as development and testing. |
| Two availability zones | Nodes are distributed across two availability zones. | Service continues if one availability zone fails. | Production environments with high availability requirements. |
| Three availability zones | Nodes are distributed across three availability zones. | Service continues if one availability zone fails. | Core production applications with very high availability requirements. |
Create a cross-zone instance
Go to the Create Alibaba Cloud Elasticsearch instance page.
In the Number of Availability Zones section, select two or three availability zones.
Node count constraint: The number of data nodes, cold data nodes, or coordinating nodes must be a multiple of the number of selected availability zones to ensure even distribution.
Dedicated master nodes: You must purchase three dedicated master nodes to ensure the stability of a multi-zone architecture.
The availability zone you select in the console (for example, Zone A) serves as the primary access point for the cluster. The system automatically and evenly distributes the nodes across the specified number of availability zones based on real-time resource availability. For example, if you select two availability zones, the system might deploy nodes in Zone A and Zone B.
Upgrade to a multi-zone deployment (V3 clusters only)
Before you upgrade, verify the following conditions:
- Run `GET _cluster/health` to ensure the cluster health status is GREEN. If the cluster is not healthy, see Cluster modification error: Unhealthy cluster status to resolve the issue.
- Optimize client connection distribution. Avoid concentrating long-lived connections in a single availability zone, which can exhaust resources on nodes in that zone while leaving nodes in other zones idle. You can optimize the connection distribution by setting a connection validity period, restarting clients in batches, or using a separate coordinating node. For more information, see Analysis and solutions for uneven cluster load.
- Run `GET _cluster/settings` and confirm that the output includes `"cluster.routing.allocation.enable": "all"`, which allows Elasticsearch to automatically allocate shards. If the setting is different, run the following command to enable automatic shard allocation:

```
PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.enable": "all"
  }
}
```
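The two API checks above can be combined into a small readiness probe. The sketch below assumes the responses of `GET _cluster/health` and `GET _cluster/settings` have already been fetched into Python dicts (for example, with any HTTP client); the helper names are illustrative, not an official API.

```python
# Hypothetical helpers: decide upgrade readiness from already-fetched
# cluster API responses. Names and structure are illustrative.

def flatten(settings, prefix=""):
    """Flatten nested cluster-settings dicts into dotted keys."""
    out = {}
    for k, v in settings.items():
        key = f"{prefix}.{k}" if prefix else k
        if isinstance(v, dict):
            out.update(flatten(v, key))
        else:
            out[key] = v
    return out

def ready_for_upgrade(health: dict, settings: dict) -> bool:
    """GREEN health plus automatic shard allocation enabled.

    Allocation defaults to "all" when the setting is absent from both the
    persistent and transient sections of GET _cluster/settings.
    """
    if health.get("status") != "green":
        return False
    merged = {}
    for section in ("persistent", "transient"):
        merged.update(flatten(settings.get(section, {})))
    return merged.get("cluster.routing.allocation.enable", "all") == "all"

print(ready_for_upgrade({"status": "green"}, {"transient": {}}))  # True
```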
On the Instance List page, click Upgrade.
Alternatively, start the upgrade from the instance's Basic Information page.
On the configuration page, in the Number of Availability Zones area, select two or three availability zones, and then complete the payment.
During the upgrade, the system automatically enables dedicated master nodes (if not already enabled) and may add data nodes to meet the even distribution requirement for the selected number of availability zones. These new nodes will incur additional fees. Refer to your bill for details.
For example, if you upgrade a single-zone instance with two data nodes to a three-zone deployment, the system automatically adds one data node. This brings the total to three, ensuring one data node can be allocated to each availability zone.
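The node-count adjustment in this example is simple rounding: the system brings the data-node count up to the next multiple of the zone count. A hypothetical helper illustrating the arithmetic:

```python
import math

def min_valid_node_count(desired_nodes: int, zone_count: int) -> int:
    """Round a desired node count up to the nearest multiple of the zone
    count, as required for even distribution across availability zones."""
    return math.ceil(desired_nodes / zone_count) * zone_count

# 2 data nodes upgraded to a three-zone deployment -> 3 nodes total.
print(min_valid_node_count(2, 3))  # 3
```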
Migrate an availability zone
If you need to upgrade your cluster but the current availability zone has insufficient resources, you can migrate the nodes to a new availability zone with adequate resources before performing the upgrade.
Migrating an availability zone triggers a cluster restart. The cluster remains available during the restart, but service instability may occur. Perform this operation during off-peak hours.
Before you migrate, verify the following conditions:
- Run `GET _cluster/health` to ensure the cluster health status is GREEN. If the cluster is not healthy, see Cluster modification error: Unhealthy cluster status to resolve the issue.
- Run `GET /_cat/indices?v` to check for any closed indices. If any indices are closed, you must temporarily open them by running `POST /<index_name>/_open`. Otherwise, the upgrade may fail, because the cluster health status cannot be GREEN if indices are closed.
- Run `GET _cluster/settings` and confirm that the output includes `"cluster.routing.allocation.enable": "all"`, which allows Elasticsearch to automatically allocate shards. If the setting is different, run the following command to enable automatic shard allocation:

```
PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.enable": "all"
  }
}
```
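The closed-index check can be scripted against the text output of `GET /_cat/indices?v`. The sketch below parses a fabricated sample response and assumes every row has all columns populated; real output formatting can vary, so treat this as illustrative only.

```python
# Sketch: find closed indices in the text output of GET /_cat/indices?v.
# The sample response below is fabricated for illustration.

def closed_indices(cat_indices_text: str):
    """Return the names of indices whose status column is 'close'."""
    lines = cat_indices_text.strip().splitlines()
    header = lines[0].split()
    status_col = header.index("status")
    index_col = header.index("index")
    closed = []
    for line in lines[1:]:
        cols = line.split()
        if cols[status_col] == "close":
            closed.append(cols[index_col])
    return closed

sample = """health status index  uuid pri rep
green  open   logs-1 aaaa 1   1
green  close  logs-2 bbbb 1   1
"""
print(closed_indices(sample))  # ['logs-2']
```

Each index returned would then need a `POST /<index_name>/_open` call before the migration.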
Perform the migration:
Go to the Basic Information page of the target instance. In the node visualization area, hover over the availability zone you want to migrate and click Migrate.
In the dialog box that appears, select the destination availability zone and vSwitch. You can only migrate one availability zone at a time.
Agree to the data migration service agreement and click Confirm.
After you confirm, the cluster will restart. During this process, you may experience brief performance fluctuations. The system first provisions new master nodes in the destination availability zone, so the old and new availability zones will coexist temporarily.
After the migration is complete, the cluster returns to normal. However, the console (on the instance information or configuration pages) might still show the old availability zone due to a display delay. This does not affect the cluster's operation in the new availability zone. Note that the node IP addresses will change.
Availability zone failover and recovery
If you detect a failed availability zone, you can perform a failover to redirect client traffic to the remaining zones. After the failed zone recovers, you can recover it to rejoin the cluster.
Failover (isolate a failed zone)
In the node visualization area of your instance, hover over the availability zone you want to isolate and click Failover.
In the dialog box that appears, click Confirm.
Important: An availability zone failover isolates all nodes in the affected zone. After the failover, requests are handled only by nodes in the remaining zones. The system attempts to provision additional resources in the remaining zones to compensate, but success is not guaranteed due to factors like resource availability and scheduling concurrency. Monitor your cluster load and implement traffic throttling measures if necessary.
If your indices were configured with replicas before the failover, but the cluster status is YELLOW (unhealthy) after the failover, you can connect to the cluster through Kibana and run the following command. This forces the reallocation of shards from the failed zone to the remaining zones.
```
PUT /_cluster/settings
{
  "persistent" : {
    "cluster.routing.allocation.awareness.force.zone_id.values" : {"0": null, "1": null, "2": null}
  }
}
```

After the shards are reallocated, the cluster health status returns to GREEN.
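The request body above nulls out the forced-awareness zone values so that shards from the failed zone can reallocate to the remaining zones. A hypothetical helper that builds the same payload for an arbitrary set of zone keys (JSON renders Python's `None` as `null`):

```python
import json

def clear_forced_awareness(zone_ids):
    """Build a cluster-settings body that nulls out the forced-awareness
    zone values, allowing shards from a failed zone to reallocate."""
    return {
        "persistent": {
            "cluster.routing.allocation.awareness.force.zone_id.values":
                {z: None for z in zone_ids}
        }
    }

print(json.dumps(clear_forced_awareness(["0", "1", "2"]), indent=2))
```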
Recovery (rejoin a zone)
After confirming that the failed availability zone has recovered, hover over the offline zone in the node visualization area and click Recovery.
In the dialog box that appears, click Confirm. The cluster will restart. After the recovery, any temporary nodes added during the failover process are removed, and the cluster architecture returns to its original state.