For a multi-availability zone cloud-native instance, if the primary zone has two or more nodes (and the total number of nodes is three or more), the high availability (HA) system prioritizes failover within the primary zone when the master node fails. This feature prevents the increase in application access latency that results from a failover to a secondary zone.
This feature is enabled by default for cloud-native instances that use the standard or cluster architecture, and it requires no manual configuration. This topic uses the cluster architecture as an example.
Background information
By default, a multi-availability zone deployment places the master and replica nodes of each shard in a cluster instance into different availability zones within the same region. These zones are physically isolated locations with independent power and networking, which gives the instance strong disaster recovery capabilities.
When the master node of a shard fails, the instance automatically triggers a failover to minimize the impact. Clients are typically deployed in the primary zone. Before a failover, the client and master node are in the same availability zone, resulting in the lowest possible access latency. The following diagram shows a typical three-shard cluster architecture.
In this default deployment, each shard has only one node in the primary zone. When the master node of a shard fails, the HA system must therefore promote the replica node in the secondary zone to become the new master node. This forces the client to access the instance across availability zones, which significantly increases access latency.
Access latency between different availability zones is much higher than within a single availability zone. For details about the average latency between Alibaba Cloud availability zones, see Cloud Network Performance.
Because Tair (Redis OSS-compatible) is a high-performance, low-latency in-memory database, high network latency directly affects your business response times. Therefore, we recommend that you add an extra replica node to the primary zone of your cluster instance. This approach balances high performance and stability with strong disaster recovery capabilities.
- When the master node fails, the instance preferentially fails over to a replica node within the same availability zone. After the failover, the new master node remains in the primary zone, which prevents increased access latency.
- If a zone-level failure occurs in the primary zone, the instance performs a cross-zone failover to the secondary zone, which preserves disaster recovery.
How it works
Tair (Redis OSS-compatible) allows you to configure 2 to 5 nodes for each shard of a cluster instance.
- With 2 nodes, one node is deployed in the primary zone and the other in the secondary zone by default.
- With 3 nodes, two nodes are deployed in the primary zone and one node is deployed in the secondary zone by default.
- With 4 or 5 nodes, you can distribute the remaining nodes between the primary and secondary zones.
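The default placements listed above can be written as a small lookup. The following Python sketch is for illustration only: `default_placement` and `DEFAULT_PLACEMENT` are hypothetical names, not part of any Tair API, and the function returns `None` for 4 or 5 nodes because the split in that case is chosen by the user.

```python
from typing import Optional, Tuple

# Default (primary zone nodes, secondary zone nodes) per shard, as listed above.
DEFAULT_PLACEMENT = {2: (1, 1), 3: (2, 1)}


def default_placement(nodes_per_shard: int) -> Optional[Tuple[int, int]]:
    """Return the default zone split, or None when the user chooses the split."""
    if not 2 <= nodes_per_shard <= 5:
        raise ValueError("each shard supports 2 to 5 nodes")
    return DEFAULT_PLACEMENT.get(nodes_per_shard)
```

With this lookup, `default_placement(3)` gives `(2, 1)`, which is the layout used in the example that follows.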
This topic uses a three-shard, three-node cluster architecture (two nodes in the primary zone and one node in the secondary zone) as an example, as shown in the following figure.
If the master node of a shard fails, the high availability (HA) system preferentially promotes the replica node in the primary zone to become the new master node. The client continues to access the instance within the same availability zone, which prevents increased access latency, as shown in the following figure.
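The selection rule described above can be sketched in a few lines of Python. This is a simplified model, not the actual HA implementation: the `Node` class, the `pick_new_master` function, and the zone names `zone-a` and `zone-b` are all assumptions made for illustration, and the real system may weigh other factors that this sketch ignores.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    zone: str
    healthy: bool = True


def pick_new_master(replicas: List[Node], primary_zone: str) -> Optional[Node]:
    """Choose a replica to promote, preferring one in the primary zone."""
    healthy = [node for node in replicas if node.healthy]
    # Prefer an in-zone replica so that clients keep same-zone latency.
    for node in healthy:
        if node.zone == primary_zone:
            return node
    # No healthy replica is left in the primary zone (for example, after a
    # zone-level failure), so fall back to a cross-zone failover.
    return healthy[0] if healthy else None


# One shard of the three-node layout after its master in zone-a has failed.
replicas = [Node("replica-1", "zone-b"), Node("replica-2", "zone-a")]
in_zone = pick_new_master(replicas, primary_zone="zone-a")

# Simulate a zone-level failure in the primary zone.
replicas[1].healthy = False
cross_zone = pick_new_master(replicas, primary_zone="zone-a")
```

In this model, `in_zone` is `replica-2`, so the new master stays in the primary zone. After the simulated zone-level failure, `cross_zone` is `replica-1` in the secondary zone.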
Procedure
- If you do not have an instance, create a multi-availability zone, cloud-native instance. For more information, see Create an instance.
  To ensure the primary zone has at least two nodes, configure the following settings on the settings page:
  - Intelligent Read/Write Splitting: Select Disabled or Enabled. This feature is not available for cluster architecture instances in proxy-less direct connection mode.
  - Number of Nodes: Select the number of nodes. For example, 3 Nodes provides one master and two replica nodes for each shard in a cluster architecture. If read/write splitting is enabled, replica nodes can also serve read requests.
  - Node Allocation (Primary AZ): Set the number of nodes in the primary zone. The number of nodes in the secondary zone is calculated as: Number of Nodes - Nodes in Primary AZ.
  We recommend configuring at least two nodes in both the primary and secondary zones, for a total of at least four nodes.
- If you have an existing single-availability zone, cloud-native instance, first migrate the instance to a multi-availability zone deployment. Then, on the Add or Remove Replica Nodes page, increase the number of nodes in the primary zone to at least two.
- If you have an existing multi-availability zone, cloud-native instance, increase the number of nodes in the primary zone to at least two on the Add or Remove Replica Nodes page. In the Node Distribution (Availability Zone) section, click Modify in the Actions column for the primary zone.
- If you have an instance that uses the Classic deployment mode, create a new instance that meets the preceding requirements and migrate data to the new instance by using DTS to perform one-way synchronization in the same Alibaba Cloud account.
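Before changing the node settings, it can help to check a planned allocation against the rules in this procedure. The following Python sketch applies the formula Number of Nodes - Nodes in Primary AZ together with the two-node thresholds stated above. The function `plan_allocation` and its result keys are hypothetical, and the check that each zone holds at least one node is an assumption about multi-availability zone deployments rather than a documented constraint.

```python
def plan_allocation(total_nodes: int, primary_az_nodes: int) -> dict:
    """Derive the secondary zone node count and check it against the guidance."""
    if not 2 <= total_nodes <= 5:
        raise ValueError("Number of Nodes must be between 2 and 5")
    # Assumption: a multi-zone instance keeps at least one node in each zone.
    if not 1 <= primary_az_nodes < total_nodes:
        raise ValueError("each zone must hold at least one node")

    # Number of Nodes - Nodes in Primary AZ
    secondary_az_nodes = total_nodes - primary_az_nodes
    return {
        "secondary_az_nodes": secondary_az_nodes,
        # In-zone failover needs the master plus one replica in the primary zone.
        "in_zone_failover": primary_az_nodes >= 2,
        # Recommended: at least two nodes in both zones.
        "meets_recommendation": primary_az_nodes >= 2 and secondary_az_nodes >= 2,
    }
```

For example, `plan_allocation(3, 2)` reports one secondary zone node with in-zone failover available but the recommendation not met, while `plan_allocation(4, 2)` satisfies both conditions.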