ApsaraDB for Redis monitors the health of nodes. If the master node of an instance becomes unavailable, ApsaraDB for Redis automatically triggers a master-replica switchover, which swaps the roles of the master and replica nodes to keep the instance highly available. You can also trigger a master-replica switchover manually, for example to run a disaster recovery drill or, when the nodes are deployed in different zones, to let your application access the nearer node.
- Manual switchover
A master-replica switchover is manually performed by you or an authorized Alibaba Cloud technical expert. For more information, see Manually switch workloads from a master node to a replica node.
- Risk mitigation
Alibaba Cloud automatically detects vulnerabilities in an ApsaraDB for Redis instance that may prevent the instance from running as expected. In this case, ApsaraDB for Redis fixes the vulnerabilities and performs a master-replica switchover during a specified maintenance window.
You can find the events that are triggered under the preceding conditions in logs. For more information, see Query history events. You can also manage pending events of master-replica switchovers. For more information, see Query and manage pending events.
- Instance failure
Alibaba Cloud detects failures in an ApsaraDB for Redis instance that prevent the instance from running as expected. In this case, ApsaraDB for Redis performs a master-replica switchover to move your workloads to the replica node, which minimizes the impact of the failures.
You are notified of such events with internal messages in the following format:
[Alibaba Cloud] Dear ******: Your ApsaraDB for Redis instance r-bp1zxszhcgatnx**** (name: ****) has an error. A switchover is triggered to ensure that your instance runs as expected. We recommend that you check whether your application is still connected to your instance and configure your application to automatically reconnect to the instance.
Note: Make sure that your applications are configured to automatically reconnect to the instance or handle exceptions. Otherwise, your applications may receive error messages during a switchover.
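The reconnect advice can be sketched in application code as follows. This is a minimal illustration in Python that uses only the standard library; it is not an official client configuration. The names `call_with_reconnect`, `connect`, `operation`, and `FlakyClient` are hypothetical, and the built-in `ConnectionError` and `TimeoutError` stand in for whatever connection exceptions your Redis client library actually raises.

```python
import time


def call_with_reconnect(connect, operation, retries=5, base_delay=0.1,
                        retry_on=(ConnectionError, TimeoutError)):
    """Run operation(client); on a connection error, reconnect and retry.

    connect   -- hypothetical factory that opens a new client connection
    operation -- callable that takes the client and performs one command
    """
    client = None
    for attempt in range(retries + 1):
        try:
            if client is None:
                client = connect()  # (re)open the connection to the endpoint
            return operation(client)
        except retry_on:
            if attempt == retries:
                raise  # out of retries: surface the error to the caller
            client = None  # discard the broken connection
            time.sleep(base_delay * (2 ** attempt))  # exponential backoff


class FlakyClient:
    """Stand-in for a real client: fails a few times, then succeeds."""
    failures_left = 2

    def get(self, key):
        if FlakyClient.failures_left > 0:
            FlakyClient.failures_left -= 1
            raise ConnectionError("connection reset during switchover")
        return "value-of-" + key


# Two simulated failures are absorbed before the call succeeds.
print(call_with_reconnect(FlakyClient, lambda c: c.get("greeting"),
                          base_delay=0.01))
```

Many client libraries ship their own retry or reconnect options. Where those exist, they are usually preferable to a hand-written wrapper like this one.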
- Q: What is the principle behind the master-replica switchover triggered by an instance
A: The detection mechanism of the High Availability (HA) system is used to detect failures. The following table describes the detection mechanism.
| Event | Description |
| --- | --- |
| Health check | The HA system checks whether master and replica nodes are healthy. |
| Master node failure | When a master node is determined to be unavailable, a replica node acts as the master node. At the same time, the virtual IP address (VIP) of the master node is switched to the replica node. Another replica node is created to ensure data synchronization. |
| Replica node failure | When a replica node is determined to be unavailable, another replica node is created to ensure data synchronization and maintain the data persistence of the master-replica architecture. |

Note: Some data that was recently written to a master node may be lost because the synchronization between the master and replica nodes is asynchronously implemented.
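The following toy model illustrates the sequence described above and why asynchronous replication can lose recent writes. It is only a sketch under simplifying assumptions and does not show how the HA system is actually implemented. The classes `Node` and `Instance`, the `pending` queue, and the node names are all hypothetical.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.healthy = True


class Instance:
    """Toy master-replica pair with asynchronous replication."""

    def __init__(self):
        self.master = Node("node-1")
        self.replica = Node("node-2")
        self.vip_target = self.master  # the VIP points at the current master
        self.pending = []              # writes not yet copied to the replica
        self._next_id = 3

    def write(self, key, value):
        # Clients write through the VIP; replication happens later.
        self.vip_target.data[key] = value
        self.pending.append((key, value))

    def replicate(self):
        # Asynchronous step: ship queued writes to the replica.
        for key, value in self.pending:
            self.replica.data[key] = value
        self.pending.clear()

    def health_check(self):
        if not self.master.healthy:
            # Master failure: promote the replica and move the VIP to it.
            self.master = self.replica
            self.vip_target = self.master
            self.pending.clear()  # unreplicated writes are lost with the old master
            self.replica = self._new_replica()
        elif not self.replica.healthy:
            # Replica failure: create a fresh replica from the master.
            self.replica = self._new_replica()
            self.pending.clear()  # the full copy already contains these writes

    def _new_replica(self):
        node = Node(f"node-{self._next_id}")
        self._next_id += 1
        node.data = dict(self.master.data)  # full copy of the master's data
        return node


inst = Instance()
inst.write("a", 1)
inst.replicate()            # "a" reaches the replica
inst.write("b", 2)          # "b" is still pending when the master fails
inst.master.healthy = False
inst.health_check()
print(inst.master.name, inst.master.data)
```

In this model the write of `b` never reaches the replica before the failure, so it is missing after the promotion. That is the kind of loss the note refers to.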
- Q: Does a master-replica switchover affect the use of read replicas in read/write splitting instances?
A: No. A master-replica switchover does not affect the use of read replicas in read/write splitting instances.
- Q: Does a master-replica switchover triggered for a specific data shard in an instance affect the instance as a whole if the instance is a cluster master-replica instance?
A: The instance as a whole is not affected. Only the data shard is affected. For more information, see Impacts.