Data is a core element of many businesses, and as data storage systems, databases bear a critical responsibility. ApsaraDB for Redis is a high-performance key-value database that is often used to store large volumes of important service data. This topic describes the disaster recovery mechanism used by ApsaraDB for Redis in detail.

Evolution of the disaster recovery architecture based on ApsaraDB for Redis

Some issues may occur when programs are running, such as software bugs, device faults, and power failures at data centers. An excellent disaster recovery mechanism can ensure data consistency and service availability in these cases. ApsaraDB for Redis improves the disaster recovery capability to ensure high availability (HA) of services, and provides high-availability solutions in various scenarios.

The following figure shows the evolution of the disaster recovery architecture based on ApsaraDB for Redis.

Figure 1. Evolution of the disaster recovery architecture based on ApsaraDB for Redis

All of these solutions are available. You can choose them as needed. The following sections describe these solutions in details.

Single-zone high-availability mechanism

All types of ApsaraDB for Redis instances support the single-zone high-availability architecture. An HA monitoring module uses an independent platform architecture, and provides a high-availability mechanism across zones. Therefore, ApsaraDB for Redis serves more stably than on-premises Redis systems.

Standard dual-replica edition

A standard dual-replica instance uses a master-replica architecture. When detecting a failure on a master node, the HA monitoring module automatically performs the failover operation. The replica node takes over services and becomes a master node, and upon recovery from the failure, the original master node works as a new replica node. The instances support data persistence by default, and allows automatic data replication. You can use the replication files to roll back or clone instances.

Figure 2. High-availability architecture of the master-replica instance

Master-replica instances

A Master-replica instance consists of a configuration server, multiple proxy servers, and multiple shard servers. These servers are described as follows:

  • The configuration server is a cluster management tool that provides global routing and configuration information. This server uses a triple-replica cluster architecture that follows the Raft protocol.
  • A proxy server uses a single-node architecture. A cluster edition contains multiple proxy servers. ApsaraDB for Redis performs load balancing and failover operations for all proxy servers.
  • A shard server uses a dual-replica high-availability architecture. Similar to a standard dual-replica instance, after the master node of the shard server fails, the HA module automatically performs the failover operation to ensure high availability of services, and updates information on the proxy servers and configuration server.
Figure 3. High-availability architecture of master-replica clusters

Zone-disaster recovery mechanism

The standard and cluster editions support zone-disaster recovery between two data centers. You can deploy your business in a single region, and require excellent disaster recovery. In this case, you need to select a zone that supports zone-disaster recovery when you create an ApsaraDB for Redis instance. For example, you can select Singapore Zone (B+C) as shown in the following figure.

Figure 4. Create a zone-disaster recovery instance

When you create instances that run in multiple zones, the replica instance at the replica data center is the same type of instance at the master data center. The instances at master and replica data centers synchronize data to each other through a specialized replication channel.

If power or network failures occur at the master data center, the replica instance takes over services and becomes the master instance. The system calls an operation on the configuration server to update routing information for the proxy server. The system performs the failover operation at the network layer according to the routing precision. In normal conditions, the system transmits data directly to the instance at the master data center through precise Classless Inter-Domain Routing (CIDR) blocks. However, the master data center does not upload routing details to the backbone when failures occur. The backbone only provides lower-precision CIDR blocks of the replica data center. The system has to route requests to the replica data center during failover.

ApsaraDB for Redis optimizes Redis synchronization mechanism. Similar to global transaction identifiers (GTIDs) of MySQL, ApsaraDB for Redis uses global operation identifiers (OpIDs) to indicate synchronization offsets and uses background lock-free threads to search OpIDs. The system synchronizes the append-only file (AOF) binary log (binlog) asynchronously. You can throttle this synchronization to ensure service performance.