Disaster recovery - ApsaraDB for Redis - Alibaba Cloud Documentation Center

ApsaraDB for Redis is a high-performance key-value database service that can store large amounts of important data in a variety of scenarios. This topic describes the disaster recovery solutions that ApsaraDB for Redis provides.

Roadmap of disaster recovery for ApsaraDB for Redis

An ApsaraDB for Redis instance may fail for unexpected reasons, such as a device failure or a power outage in the data center. When such a failure occurs, disaster recovery helps ensure data consistency and service availability. ApsaraDB for Redis provides several disaster recovery solutions to meet the requirements of different business scenarios.

Disaster recovery solution	Protection level	Description
Single-zone HA solution	★★★☆☆	The master node and replica node are deployed on different machines in the same zone. If the master node fails, the high availability (HA) system performs a failover to prevent service interruption caused by a single point of failure (SPOF).
Zone-disaster recovery solution	★★★★☆	The master node and replica node are deployed in two different zones in the same region. If the zone in which the master node resides becomes unavailable because of an event beyond your control, such as a power outage or a network failure, the HA system performs a failover to keep the entire instance available.
Cross-region disaster recovery solution	★★★★★	In the architecture of Global Distributed Cache for Redis, a distributed instance consists of multiple child instances that synchronize data with each other in real time over synchronization channels. The channel manager monitors the health status of the child instances and handles events that occur on them, such as a switchover between the primary and secondary databases. Global Distributed Cache for Redis is suitable for scenarios such as geo-disaster recovery, active geo-redundancy, nearby application access, and load sharing.

Single-zone HA solution

All ApsaraDB for Redis instances support a single-zone HA architecture. The HA system monitors the health status of the master node and replica node and performs failovers to prevent service interruption caused by SPOFs.
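The monitor-and-failover behavior described above can be sketched in a few lines of Python. This is a minimal model under stated assumptions, not the actual HA system: the `Node` and `HAGroup` classes, the rule that a failover happens only after several consecutive failed health checks, and the threshold value of 3 are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    healthy: bool = True


@dataclass
class HAGroup:
    """Hypothetical model of one master-replica pair watched by an HA monitor."""
    master: Node
    replica: Node
    # Assumption: consecutive failed health checks required before failover.
    failure_threshold: int = 3
    _failures: int = field(default=0, init=False)

    def check(self) -> bool:
        """Run one health probe on the master; return True if a failover happened."""
        if self.master.healthy:
            self._failures = 0
            return False
        self._failures += 1
        if self._failures >= self.failure_threshold and self.replica.healthy:
            self.failover()
            return True
        return False

    def failover(self) -> None:
        # The replica takes over the master role; the old master is kept as
        # the replica and serves in that role once it recovers.
        self.master, self.replica = self.replica, self.master
        self._failures = 0


group = HAGroup(Node("node-a"), Node("node-b"))
group.master.healthy = False
print([group.check() for _ in range(3)], group.master.name)  # [False, False, True] node-b
```

Requiring several consecutive failures before switching is one common way to avoid failing over on a single dropped probe; the real detection criteria are not described in this topic.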

Instance type	Description
Standard master-replica instance	[Figure 2. HA architecture for a standard master-replica instance] A standard master-replica instance runs in a master-replica architecture. If the HA system detects a failure on the master node, it switches workloads from the master node to the replica node, and the replica node takes over the role of the master node. After the original master node recovers, it serves as the replica node.
Master-replica cluster instance	[Figure 3. HA architecture for a master-replica cluster instance] In a master-replica cluster instance, data is stored on data shards. Each data shard consists of a master node and a replica node, which are deployed on different machines in an HA architecture. If the master node fails, the HA system performs a failover to keep the service highly available. For more information about the components of a cluster instance, see Cluster master-replica instances.
Read/write splitting instance	[Figure 4. HA architecture for a read/write splitting instance] The HA system monitors the health status of each node. If the master node fails, the HA system performs a failover between the master node and the replica node. If a read-only replica fails, the HA system creates another read-only replica to process read requests and updates the routing and weight information accordingly. Proxy servers also monitor the status of each read-only replica in real time and stop routing traffic to a read-only replica that is in either of the following states: (1) Abnormal: the proxy servers first reduce the traffic weight of the read-only replica. If connections to the read-only replica fail a specified number of times, the proxy servers stop routing traffic to it until it recovers. (2) Full data synchronization: while full data is being synchronized to the read-only replica, the proxy servers stop routing traffic to it until the synchronization is complete. For more information about the components of a read/write splitting instance, see Read/write splitting instances.
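The proxy behavior for read/write splitting instances can be modeled roughly as follows. The topic states only that the traffic weight of an abnormal read-only replica is reduced, that routing stops after a specified number of failed connections, and that routing pauses during full data synchronization. The class names, the halving weight penalty, and the failure limit of 3 are assumptions for this sketch.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ReadReplica:
    name: str
    weight: int = 100          # relative share of read traffic (assumed scale)
    failed_connects: int = 0   # consecutive connection failures
    full_sync: bool = False    # True while full data synchronization runs


class ReadRouter:
    """Hypothetical sketch of a proxy's routing decision for read-only replicas."""

    def __init__(self, replicas: List[ReadReplica], max_failures: int = 3):
        self.replicas = replicas
        self.max_failures = max_failures  # assumed "specified number of times"

    def report_failure(self, replica: ReadReplica) -> None:
        # Abnormal replica: halve its weight (assumed policy) and count the failure.
        replica.failed_connects += 1
        replica.weight = max(1, replica.weight // 2)

    def report_recovery(self, replica: ReadReplica, weight: int = 100) -> None:
        replica.failed_connects = 0
        replica.weight = weight

    def routable(self) -> List[ReadReplica]:
        """Replicas that may currently receive read traffic."""
        return [
            r for r in self.replicas
            if not r.full_sync and r.failed_connects < self.max_failures
        ]

    def pick(self, rng: random.Random) -> Optional[ReadReplica]:
        candidates = self.routable()
        if not candidates:
            return None  # no read-only replica is currently eligible
        return rng.choices(candidates, weights=[r.weight for r in candidates])[0]


rng = random.Random(42)
ro1, ro2 = ReadReplica("ro-1"), ReadReplica("ro-2", full_sync=True)
router = ReadRouter([ro1, ro2])
print([r.name for r in router.routable()])  # ['ro-1'] because ro-2 is in full sync
for _ in range(3):
    router.report_failure(ro1)
print(router.pick(rng))  # None because ro-1 reached max_failures
```

Weighted random selection is only one plausible way to apply the weights; the actual proxy load-balancing algorithm is not specified here.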

Zone-disaster recovery solution

ApsaraDB for Redis standard instances and cluster instances support zone-disaster recovery across two data centers. If your workloads are deployed in a single region and require disaster recovery, you can select the zones that support zone-disaster recovery when you create an ApsaraDB for Redis instance. For more information, see Step 1: Create an ApsaraDB for Redis instance.

Create a zone-disaster recovery instance

When you create a multi-zone instance, the master node and the replica node, which have the same specifications, are deployed in different zones. The master node synchronizes data to the replica node over a dedicated channel.

If a power failure or a network error affects the master node, the replica node takes over the role of the master node, and the system calls an API operation on the configuration server to update the routing information of the proxy servers.

In addition, ApsaraDB for Redis uses an optimized Redis synchronization mechanism. Similar to global transaction identifiers (GTIDs) in MySQL, it uses global operation identifiers (OpIDs) to mark synchronization offsets and runs lock-free background threads to search for OpIDs. The master node asynchronously synchronizes append-only file (AOF) binary logs (binlogs) to the replica node. Synchronization can be throttled to protect service performance.
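The idea behind OpID-based offsets can be illustrated with a minimal sketch. Each write on the master is tagged with an increasing OpID, and the replica records the last OpID it applied, so it can resume from that offset instead of copying all data again. The `Master` and `Replica` classes, the in-memory log, and the per-call `limit` (a crude stand-in for throttling) are assumptions. The sketch does not model the lock-free background threads, the asynchronous transfer, or the real AOF binlog format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Master:
    """Hypothetical master that tags every write with a monotonically increasing OpID."""
    data: Dict[str, int] = field(default_factory=dict)
    log: List[Tuple[int, str, int]] = field(default_factory=list)  # (opid, key, value)
    next_opid: int = 1

    def write(self, key: str, value: int) -> None:
        self.data[key] = value
        self.log.append((self.next_opid, key, value))
        self.next_opid += 1

    def ops_after(self, opid: int, limit: int) -> List[Tuple[int, str, int]]:
        # OpIDs here start at 1 and are contiguous, so entries newer than
        # `opid` start at list index `opid`. `limit` caps the batch size.
        return self.log[opid:opid + limit]


@dataclass
class Replica:
    data: Dict[str, int] = field(default_factory=dict)
    last_opid: int = 0  # synchronization offset, analogous to a GTID position

    def sync_from(self, master: Master, limit: int = 100) -> int:
        """Apply one batch of entries newer than last_opid; return how many were applied."""
        batch = master.ops_after(self.last_opid, limit)
        for opid, key, value in batch:
            self.data[key] = value
            self.last_opid = opid
        return len(batch)


m, r = Master(), Replica()
for i in range(5):
    m.write(f"k{i}", i)
r.sync_from(m, limit=2)  # applies OpIDs 1-2
r.sync_from(m, limit=2)  # resumes after OpID 2 and applies OpIDs 3-4
print(r.last_opid, r.data == {"k0": 0, "k1": 1, "k2": 2, "k3": 3})  # 4 True
```

Because the replica stores only an offset, a reconnect after a brief interruption needs just the entries after `last_opid`, which is the same motivation as GTID-based replication in MySQL.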

Cross-region disaster recovery solution

As your business grows to cover a wide range of regions, long-distance cross-region access introduces high latency that degrades user experience. The Global Distributed Cache for Redis feature of Alibaba Cloud can help you reduce this latency. Global Distributed Cache for Redis has the following benefits:

You can directly create child instances or specify the child instances to be synchronized, without implementing redundancy in your business logic. This greatly reduces the complexity of business design and lets you focus on developing upper-layer services.
Geo-replication lets you implement geo-disaster recovery or active geo-redundancy.
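Nearby access combined with geo-disaster recovery can be sketched on the client side. The sketch assumes the application measures its own latency to each child instance and connects to the closest reachable one. Because child instances synchronize data with each other, another region can still serve the synchronized data when the nearest one is unavailable. The endpoint names and the `pick_child_instance` helper are hypothetical and are not part of any Alibaba Cloud API.

```python
from typing import Dict, Iterable, Optional


def pick_child_instance(
    latencies_ms: Dict[str, float],
    unavailable: Iterable[str] = (),
) -> Optional[str]:
    """Return the reachable child instance with the lowest measured latency.

    `latencies_ms` maps a hypothetical child-instance endpoint to a latency
    that the application measured itself. Returns None if nothing is reachable.
    """
    down = set(unavailable)
    candidates = {ep: ms for ep, ms in latencies_ms.items() if ep not in down}
    if not candidates:
        return None
    return min(candidates, key=lambda ep: candidates[ep])


latency = {"child-hangzhou": 3.0, "child-singapore": 45.0, "child-frankfurt": 180.0}
print(pick_child_instance(latency))                      # child-hangzhou
print(pick_child_instance(latency, ["child-hangzhou"]))  # child-singapore
```

Keeping the choice in a single helper makes the failover rule easy to replace, for example with health checks or weighted load sharing across regions.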

This feature applies to cross-region data synchronization scenarios and global business deployment in industries such as multimedia, gaming, and e-commerce. For more information, see Overview.

Architecture of Global Distributed Cache for Redis