Data is a core asset for most businesses, and databases play a key role in storing and managing it. ApsaraDB for Redis is a high-availability (HA) key-value database service that helps you store large amounts of important data in various scenarios. This topic describes the disaster recovery solutions provided by ApsaraDB for Redis.

Disaster recovery roadmap of ApsaraDB for Redis

A variety of problems may occur during data management, such as software bugs, device malfunctions, or power failures at data centers. A disaster recovery solution helps maintain data consistency and service availability when such problems occur. ApsaraDB for Redis provides optimized disaster recovery solutions to achieve high availability in different scenarios.

The following figure shows the disaster recovery roadmap of ApsaraDB for Redis.

Figure 1. Disaster recovery solutions

ApsaraDB for Redis provides three solutions to meet different requirements: single-zone high availability, zone-disaster recovery, and geo-disaster recovery. The following sections describe these solutions in detail.

Single-zone high availability

All ApsaraDB for Redis instances support a single-zone HA architecture. The HA system runs on an independent platform to guarantee high availability across zones. As a result, ApsaraDB for Redis provides more stable database management than self-managed Redis databases.

Standard master-replica instances

A standard master-replica instance runs in a master-replica architecture. If the HA system detects a failure on the master node, the system switches the workloads from the master node to the replica node, and the replica node takes over the role of the master node. After the original master node recovers, it works as the replica node. By default, data persistence is enabled for the instance, and the system automatically creates backup files for the instance. You can use the backup files to roll back or clone the instance. This mechanism helps prevent data loss caused by user errors and enables reliable disaster recovery.

Figure 2. High-availability solution of a standard master-replica instance
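
The role switch described above can be illustrated with a minimal Python sketch. It is a toy model, not the actual HA system of ApsaraDB for Redis: the `Node` and `MasterReplicaInstance` classes, the `healthy` flag, and the node names are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    role: str
    healthy: bool = True


class MasterReplicaInstance:
    """Toy model of a two-node instance watched by an HA system (hypothetical)."""

    def __init__(self, master: Node, replica: Node) -> None:
        self.master = master
        self.replica = replica

    def check_and_failover(self) -> bool:
        """Swap roles if the master is down and the replica is up.

        Returns True if a failover was performed.
        """
        if self.master.healthy or not self.replica.healthy:
            return False
        self.master, self.replica = self.replica, self.master
        self.master.role = "master"
        self.replica.role = "replica"
        return True


node_a = Node("node-a", "master")
node_b = Node("node-b", "replica")
instance = MasterReplicaInstance(node_a, node_b)

node_a.healthy = False                    # simulated failure on the master
switched = instance.check_and_failover()  # node-b is promoted to master
node_a.healthy = True                     # node-a recovers and stays a replica
```

In this sketch, the recovered node does not reclaim the master role, which mirrors the behavior described above: the original master node works as the replica node after recovery.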

Master-replica cluster instances

A master-replica cluster instance consists of a configuration server, multiple proxy servers, and multiple data shards.

  • The configuration server is a cluster management tool that provides global routing and configuration information. This server uses a cluster architecture with three replica nodes and follows the Raft protocol.
  • A proxy server runs in a standalone architecture. A cluster contains multiple proxy servers. The cluster automatically balances loads and performs failovers among these proxy servers.
  • A data shard runs in a master-replica high-availability architecture. Similar to a standard master-replica instance, if the master node fails, the HA system performs a failover to ensure high availability, and updates the information on the proxy servers and configuration server.

Figure 3. High-availability solution of a master-replica cluster
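
The following Python sketch shows one way to picture how routing information changes when a data shard fails over. It is an assumption-laden illustration rather than the actual implementation: the `RoutingTable` class, the node names, and the use of CRC32 to map keys to shards are all hypothetical stand-ins, and the source does not describe the real hashing scheme.

```python
import zlib


class RoutingTable:
    """Hypothetical routing information that a proxy server might hold."""

    def __init__(self, shards):
        # Each shard is a dict with the names of its master and replica nodes.
        self.shards = [dict(shard) for shard in shards]

    def shard_for(self, key: str) -> int:
        # CRC32 is only a stand-in hash used for illustration.
        return zlib.crc32(key.encode("utf-8")) % len(self.shards)

    def master_for(self, key: str) -> str:
        return self.shards[self.shard_for(key)]["master"]

    def fail_over(self, shard_id: int) -> None:
        # After a failover, the routing information points to the promoted replica.
        shard = self.shards[shard_id]
        shard["master"], shard["replica"] = shard["replica"], shard["master"]


table = RoutingTable([
    {"master": "shard0-a", "replica": "shard0-b"},
    {"master": "shard1-a", "replica": "shard1-b"},
])

key = "user:42"
shard_id = table.shard_for(key)
old_master = table.master_for(key)
table.fail_over(shard_id)           # the HA system promotes the replica of this shard
new_master = table.master_for(key)  # same shard, different node
```

The key stays on the same shard before and after the failover. Only the node that serves the shard changes, which is why clients that connect through proxy servers do not need to change how they address keys.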

Zone-disaster recovery

Standard instances and cluster instances support zone-disaster recovery across two data centers. If your workloads are deployed in a single region and require disaster recovery, you can select zones that support zone-disaster recovery when you create an ApsaraDB for Redis instance. For example, you can select China (Hangzhou) Zone (B+F) or China (Hangzhou) Zone (G+H) from the Zone drop-down list in the console.

Figure 4. Create a zone-disaster recovery instance

When you create a multi-zone instance, the master node and the replica node are deployed in different zones and have the same specifications. The master node synchronizes data to the replica node through a dedicated channel.

If a power failure or a network error occurs on the master node, the replica node takes over the role of the master node. The system calls an API operation on the configuration server to update the routing information for the proxy servers. The underlying network performs the failover based on how specific the routes available in the backbone network are. The master node advertises more specific CIDR blocks than the replica node. Under normal conditions, requests are routed to the master node because its CIDR blocks are more specific. If the master node fails, it no longer advertises routing information to the backbone network, so only the less specific CIDR blocks of the replica node remain, and requests are routed to the replica node.
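
The route selection described above behaves like longest-prefix matching. The following Python sketch illustrates the idea with the standard `ipaddress` module. The CIDR blocks, the address, and the zone labels are hypothetical values chosen for illustration and do not come from ApsaraDB for Redis.

```python
import ipaddress


def pick_route(destination, routes):
    """Return the target of the most specific advertised route that covers
    the destination, or None if no route covers it."""
    address = ipaddress.ip_address(destination)
    matches = []
    for cidr, target in routes.items():
        network = ipaddress.ip_network(cidr)
        if address in network:
            matches.append((network.prefixlen, target))
    if not matches:
        return None
    # The longest prefix is the most specific route.
    return max(matches)[1]


# Hypothetical advertisements: the master zone advertises a /25 block,
# and the replica zone advertises the enclosing /24 block.
routes = {
    "10.0.0.0/25": "master-zone",
    "10.0.0.0/24": "replica-zone",
}
service_address = "10.0.0.10"

before = pick_route(service_address, routes)  # the more specific /25 wins

# The master fails and its route is no longer advertised.
del routes["10.0.0.0/25"]
after = pick_route(service_address, routes)   # only the /24 remains
```

No client-side change is needed in this model. The same address is reachable before and after the failure, and only the set of advertised routes decides which zone receives the traffic.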

ApsaraDB for Redis provides an optimized Redis synchronization mechanism. Similar to global transaction identifiers (GTIDs) in MySQL, ApsaraDB for Redis uses global operation identifiers (OpIDs) to indicate synchronization offsets and runs lock-free threads in the background to search for OpIDs. The system asynchronously synchronizes append-only file (AOF) binary logs (binlogs) from the master node to the replica node. You can throttle synchronization to protect service performance.
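
The following Python sketch shows how an operation identifier can serve as a synchronization offset and how a batch limit can throttle synchronization. It is a simplified model under stated assumptions: the `Master` and `Replica` classes, the in-memory log, and the `limit` parameter are hypothetical, and the sketch does not reproduce the lock-free background threads or the AOF binlog format of the real service.

```python
class Master:
    """Keeps an ordered log of (opid, command) entries (hypothetical model)."""

    def __init__(self):
        self.log = []
        self.next_opid = 1

    def write(self, command):
        self.log.append((self.next_opid, command))
        self.next_opid += 1

    def entries_after(self, opid, limit):
        # Return at most `limit` entries whose OpID is greater than `opid`.
        return [entry for entry in self.log if entry[0] > opid][:limit]


class Replica:
    """Remembers the last applied OpID so that it can resume from there."""

    def __init__(self):
        self.last_opid = 0
        self.applied = []

    def pull(self, master, limit):
        batch = master.entries_after(self.last_opid, limit)
        for opid, command in batch:
            self.applied.append(command)
            self.last_opid = opid
        return len(batch)


master = Master()
commands = ["SET a 1", "SET b 2", "INCR a", "DEL b", "SET c 3"]
for command in commands:
    master.write(command)

replica = Replica()
first_batch = replica.pull(master, limit=2)  # throttled: only two operations

# Synchronization can stop here and resume later from replica.last_opid.
while replica.pull(master, limit=2) > 0:
    pass
```

Because the replica stores only the last applied OpID, it can resume after an interruption without replaying operations that it has already applied.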

Geo-disaster recovery

ApsaraDB for Redis supports the Redis Global Replica solution. You can use a global replica instance of ApsaraDB for Redis to run multiple child instances simultaneously across regions worldwide. These child instances exchange data in real time. Unlike the preceding disaster recovery solutions, all child instances in this solution work as master nodes at the same time.

Note: Redis Global Replica is tested on the Alibaba Cloud China site. Other Alibaba Cloud sites do not support this solution.

The global replica instance of ApsaraDB for Redis consists of multiple child instances, multiple synchronization channels, and a channel manager.
  • A child instance is a basic service unit on the global replica instance. All child instances can process read and write requests.
  • The synchronization channels enable real-time two-way synchronization between child instances. If synchronization is interrupted, the system can resume the synchronization from the last breakpoint within a few days after the interruption.
  • The channel manager controls the lifecycle of the synchronization channels. If a child instance fails, the channel manager switches the workloads from the failed child instance to another child instance, and creates a new child instance. This failover process guarantees high availability of your workloads.
Note: The global replica instance asynchronously replicates data among child instances to minimize the impact on the service performance.

When you manage the global replica instance, you can configure failover in your application. If a failure occurs on a child instance in a region, the system switches the workloads from the failed child instance to a child instance in another region to ensure service availability.
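
Application-side failover can be sketched in Python as follows. This is a minimal illustration, not an ApsaraDB for Redis API: the `run_with_failover` helper, the `fake_get` operation, and the endpoint names are hypothetical. In a real application, the operation would be a call made through your Redis client.

```python
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")


def run_with_failover(
    endpoints: Sequence[str],
    operation: Callable[[str], T],
) -> Tuple[str, T]:
    """Try the operation on each child instance in order of preference and
    return the endpoint that served it together with the result."""
    last_error = None
    for endpoint in endpoints:
        try:
            return endpoint, operation(endpoint)
        except ConnectionError as exc:
            last_error = exc  # remember the failure and try the next region
    raise ConnectionError("no child instance is reachable") from last_error


# Hypothetical endpoints of child instances in two regions.
endpoints = [
    "redis-hangzhou.example.internal",
    "redis-beijing.example.internal",
]
unreachable = {"redis-hangzhou.example.internal"}  # simulated regional failure


def fake_get(endpoint: str) -> str:
    # Stand-in for a real client call such as a GET command.
    if endpoint in unreachable:
        raise ConnectionError(f"{endpoint} is unreachable")
    return f"value served by {endpoint}"


used_endpoint, value = run_with_failover(endpoints, fake_get)
```

Because child instances replicate data asynchronously, an application that fails over in this way may not immediately see the most recent writes that were made to the failed child instance.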

ApsaraDB for Redis provides multiple disaster recovery solutions to enable high availability at the instance level, zone level, and region level. You can select a solution to meet your business requirements.