ApsaraDB for Redis read/write splitting instances support multiple read-only replicas and deliver high performance for read-heavy, write-light workloads.

Background

In ApsaraDB for Redis, in both the master-replica edition and the cluster edition, the replica node serves as a standby and does not serve client requests. When high availability is enabled and the master fails, the replica is promoted to master and takes over read and write operations. In this architecture, all reads and writes are served by the master node, which ensures strong consistency, but performance is limited by the number of master nodes. As a result, users often have to upgrade the cluster specification because of high traffic or concurrency, even when their data volume is small.

For business scenarios with more reads than writes, ApsaraDB for Redis provides a read/write splitting specification that is transparent, flexible, highly available, and high-performance, helping users reduce costs.

Architecture

The Redis cluster architecture has several roles: redis-proxy, master, replica, and HA. A read/write splitting instance adds a read-only replica role that takes over read traffic, while the replica remains a hot standby that does not serve requests. This architecture remains compatible with existing cluster specifications. The proxy forwards each request to the master node or a read-only replica according to the request type and the configured weights. The high availability (HA) module monitors the health of the nodes. When an exception occurs, it fails over to the replica or rebuilds the affected read-only replica, and then updates the routes.

Depending on how data is synchronized from the master node to the read-only replicas, there are two replication topologies: star replication and cascading replication.

Star replication

In star replication, every read-only replica replicates directly from the master node, in parallel. Because the master node connects directly to every read-only replica, the failure of one node does not require failing over other replica nodes, which shortens recovery time.

Redis uses a single-threaded, single-process model, and replication between the master node and its replicas is handled in the main thread. The CPU that the master node spends on data synchronization therefore grows with the number of read-only replicas, and the write performance of the cluster degrades as more read-only replicas are added. In star replication, the outbound bandwidth of the master node also grows with the number of read-only replicas. The tradeoff between the two replication types is one of latency versus throughput: because of the high CPU utilization and heavy network load on the master node, low-latency star replication delivers lower throughput than cascading replication, and the performance of the entire cluster is limited by the master node.
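To make the load argument concrete, the following Python sketch counts the replication streams the master node must feed in each topology and converts that count into outbound bandwidth. It is a back-of-the-envelope model, not a measurement: it assumes one standby replica plus N read-only replicas, and that every replication stream carries the full write traffic (`write_mb_per_s` is a hypothetical input).

```python
def master_outbound_streams(topology: str, read_only_replicas: int) -> int:
    """Number of replication streams the master node must feed.

    Assumes one standby replica plus `read_only_replicas` read-only replicas.
    """
    if read_only_replicas < 0:
        raise ValueError("read_only_replicas must be non-negative")
    if topology == "star":
        # The master feeds the standby replica and every read-only replica.
        return 1 + read_only_replicas
    if topology == "cascading":
        # The master feeds the standby replica and only the head of the chain.
        return 1 + min(read_only_replicas, 1)
    raise ValueError(f"unknown topology: {topology}")


def master_outbound_mb_per_s(topology: str, read_only_replicas: int,
                             write_mb_per_s: float) -> float:
    # Simplifying assumption: each stream carries the full write traffic.
    return master_outbound_streams(topology, read_only_replicas) * write_mb_per_s


for n in (1, 3, 5):
    print(n,
          master_outbound_mb_per_s("star", n, 10.0),
          master_outbound_mb_per_s("cascading", n, 10.0))
```

Under these assumptions, with 10 MB/s of writes the star topology needs 20, 40, and 60 MB/s of master outbound bandwidth for 1, 3, and 5 read-only replicas, while the cascading topology stays at 20 MB/s.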

Cascading replication

In cascading replication, the read-only replicas form a chain in which each node replicates from the one before it, through the intermediate nodes to the tail node, as shown in the following figure. The master node only needs to synchronize data to the replica and to the first read-only replica in the chain.
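The chain can be described as a map from each node to the node it replicates from. The sketch below builds that map for a given number of read-only replicas; the node names (`replica`, `ro1`, `ro2`, and so on) are hypothetical labels, not the names ApsaraDB for Redis uses internally.

```python
def build_cascading_topology(read_only_replicas: int) -> dict:
    """Map each node to its upstream (replication source) in a cascading chain."""
    # The standby replica always replicates directly from the master.
    upstream = {"replica": "master"}
    previous = "master"
    for i in range(1, read_only_replicas + 1):
        node = f"ro{i}"
        # Each read-only replica follows the node before it in the chain.
        upstream[node] = previous
        previous = node
    return upstream


def direct_downstreams(upstream: dict, node: str) -> list:
    """Nodes that replicate directly from `node`."""
    return [child for child, parent in upstream.items() if parent == node]


topology = build_cascading_topology(3)
print(topology)
print(direct_downstreams(topology, "master"))
```

However many read-only replicas are added, `direct_downstreams(topology, "master")` returns only `['replica', 'ro1']`, which is the property that lets the chain grow without adding replication load to the master.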

Cascading replication solves the scalability problem of star replication. In theory, the number of read-only replicas can grow without limit, and the performance of the entire cluster grows accordingly.

In chain replication, the longer the chain, the greater the delay between the master node and the read-only replica at the end of the chain. This shortcoming is usually acceptable, because read/write splitting is mainly used in scenarios with low consistency requirements. However, if a node in the chain fails, the data on all downstream nodes falls significantly behind. Worse, the failure may trigger a full synchronization that propagates to the end of the chain and degrades service performance. To solve this problem, Redis read/write splitting uses an optimized binlog replication mechanism provided by Alibaba Cloud to minimize the probability of full synchronization.
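The following sketch illustrates why lag accumulates along the chain and why a failed node affects everything behind it. It assumes each hop adds an independent delay (`per_hop_lag_ms`, a hypothetical input) and that nodes downstream of a failed node fall behind by the length of the outage; it does not model the binlog replication mechanism itself.

```python
from itertools import accumulate
from typing import List, Optional


def chain_lags(per_hop_lag_ms: List[float]) -> List[float]:
    """Lag behind the master at each read-only replica, head of the chain first."""
    # The lag at node i is the sum of the delays of every hop before it.
    return list(accumulate(per_hop_lag_ms))


def lags_with_outage(per_hop_lag_ms: List[float], failed_index: int,
                     outage_ms: float) -> List[Optional[float]]:
    """Lags while the node at `failed_index` is down for `outage_ms`."""
    result: List[Optional[float]] = []
    for i, lag in enumerate(chain_lags(per_hop_lag_ms)):
        if i < failed_index:
            result.append(lag)              # upstream nodes are unaffected
        elif i == failed_index:
            result.append(None)             # the failed node serves nothing
        else:
            result.append(lag + outage_ms)  # downstream nodes stop receiving data
    return result


print(chain_lags([1.0] * 5))
print(lags_with_outage([1.0] * 5, failed_index=1, outage_ms=500.0))
```

With 1 ms per hop, the tail of a five-node chain trails the master by 5 ms; if the second node is down for 500 ms, every node after it trails by more than 500 ms.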

Based on the preceding comparison, ApsaraDB for Redis uses a cascading replication architecture for read/write splitting.

Advantages of Redis read/write splitting

Transparent and compatible

Redis read/write splitting uses redis-proxy to forward requests. Some restrictions apply to commands that span multiple shards. Upgrades from the master-replica edition to single-shard read/write splitting, and from the cluster edition to multi-shard read/write splitting, are fully compatible.

Clients connect to redis-proxy, a Redis proxy that supports read/write splitting. The proxy determines whether each request is a read or a write and load-balances it according to the configured weights: write requests go to the master, and read requests go to the read-only replicas. By default, the master also serves read requests, and its share of reads is controlled by its weight.
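The routing rule can be sketched as "classify, then pick by weight". This is a minimal sketch that assumes weighted random selection; the actual balancing algorithm inside redis-proxy is not described here and may differ. The command set, node names, and weights are hypothetical.

```python
import random
from typing import Dict

# Hypothetical and deliberately incomplete: a real proxy classifies every
# command using its own command table.
WRITE_COMMANDS = {"SET", "DEL", "INCR", "HSET", "LPUSH", "EXPIRE"}


def is_write(command: str) -> bool:
    """Classify a request by its command name (the first token)."""
    return command.split()[0].upper() in WRITE_COMMANDS


def pick_node(command: str, read_weights: Dict[str, int],
              rng: random.Random) -> str:
    """Send writes to the master; spread reads over nodes by weight."""
    if is_write(command):
        return "master"
    nodes = list(read_weights)
    weights = [read_weights[n] for n in nodes]
    # Weighted random choice: a node with weight 3 gets about 3x the reads of weight 1.
    return rng.choices(nodes, weights=weights, k=1)[0]


rng = random.Random(42)
weights = {"master": 1, "ro1": 3, "ro2": 3}  # the master still takes some reads
print(pick_node("SET user:1 alice", weights, rng))  # always "master"
reads = [pick_node("GET user:1", weights, rng) for _ in range(7000)]
print({n: reads.count(n) for n in weights})
```

Setting the master's weight to 0 in this sketch removes it from the read pool entirely, which corresponds to the weight control described above.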

You can purchase a read/write splitting instance and use it directly with any client, without modifying your application, and gain improved performance at almost no cost.

Highly available

The high availability (HA) module monitors the health of all nodes to ensure instance availability. If the master node fails, the HA module switches requests to a new master node. If a read-only replica fails, the HA module detects the failure promptly, creates a new read-only replica, and takes the failed node offline.
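The two failure paths can be sketched as a small state machine over the instance's roles. This is an illustration only: node names are hypothetical, and two details are assumptions not stated above: that a fresh standby is provisioned after the replica is promoted, and that a rebuilt read-only replica takes the failed node's position.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, Iterator, List


@dataclass
class Instance:
    master: str
    replica: str
    read_only: List[str]
    _ids: Iterator[int] = field(default_factory=lambda: count(1), repr=False)

    def _provision(self, prefix: str) -> str:
        # Hypothetical stand-in for creating a new node.
        return f"{prefix}-new{next(self._ids)}"

    def on_failure(self, node: str) -> None:
        if node == self.master:
            # Promote the standby replica (provisioning a new standby is an assumption).
            self.master = self.replica
            self.replica = self._provision("replica")
        elif node in self.read_only:
            # Rebuild the failed read-only replica and take the old one offline.
            idx = self.read_only.index(node)
            self.read_only[idx] = self._provision("ro")
        else:
            raise ValueError(f"unknown node: {node}")

    def routes(self) -> Dict[str, object]:
        # Route table the proxy would use after each change.
        return {"write": self.master, "read": [self.master, *self.read_only]}


inst = Instance(master="m1", replica="r1", read_only=["ro1", "ro2"])
inst.on_failure("m1")
inst.on_failure("ro1")
print(inst.routes())
```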

In addition to the HA module, redis-proxy also monitors the state of each read-only replica in real time. When a read-only replica fails, redis-proxy automatically reduces its weight. If a read-only replica fails repeatedly, redis-proxy temporarily blocks it. After the node recovers, its weight is restored to the normal level.
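The reduce, block, and restore behavior can be sketched as a per-replica health record. The source does not specify the policy, so both numbers here are hypothetical: the weight is halved on each failure, and the node is blocked after three consecutive failures.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicaHealth:
    base_weight: int
    block_threshold: int = 3  # hypothetical: failures before the node is blocked
    weight: int = field(default=0, init=False)
    failures: int = field(default=0, init=False)
    blocked: bool = field(default=False, init=False)

    def __post_init__(self) -> None:
        self.weight = self.base_weight

    def record_failure(self) -> None:
        self.failures += 1
        # Hypothetical policy: halve the weight on every failure.
        self.weight //= 2
        if self.failures >= self.block_threshold:
            # Temporarily block the node so it receives no reads.
            self.blocked = True

    def record_recovery(self) -> None:
        # Restore the normal weight once the node is healthy again.
        self.failures = 0
        self.blocked = False
        self.weight = self.base_weight

    def effective_weight(self) -> int:
        return 0 if self.blocked else self.weight


h = ReplicaHealth(base_weight=8)
for _ in range(3):
    h.record_failure()
    print(h.effective_weight())  # prints 4, then 2, then 0 (blocked)
h.record_recovery()
print(h.effective_weight())  # prints 8
```

Reducing the weight first and blocking only after repeated failures lets a replica with a transient problem keep serving a small share of reads instead of being dropped at the first error.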

HA and redis-proxy work together to minimize the impact of backend exceptions on applications and to improve service availability.

High performance

In business scenarios with more reads than writes, using the cluster edition directly is not the best solution. Read/write splitting provides more options: you can choose the specification that best fits your business scenario and make full use of the read-only replicas.

Multiple specifications are available: 1 master + 1 read-only replica, 1 master + 3 read-only replicas, and 1 master + 5 read-only replicas. If you need a different specification, you can submit a ticket. The largest specification provides up to 0.6 million QPS and 192 MB/s of bandwidth, which exceeds the resource limits of a single machine while remaining fully compatible with all commands. Later versions will remove the specification limit, allowing users to add or remove read-only replicas based on business traffic.

Specification                      QPS                                     Bandwidth (MB/s)
1 master                           80,000 to 100,000 (reads and writes)    10 to 48
1 master + 1 read-only replica     100,000 writes + 100,000 reads          20 to 64
1 master + 3 read-only replicas    100,000 writes + 300,000 reads          40 to 128
1 master + 5 read-only replicas    100,000 writes + 500,000 reads          60 to 192
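The table implies a simple sizing rule: writes stay at roughly 100,000 QPS on the master, and each read-only replica adds roughly 100,000 read QPS. The sketch below applies that rule to pick a specification. It assumes read capacity scales linearly with the number of read-only replicas and ignores reads served by the master; the figures are nominal, not guarantees.

```python
# Approximate per-node figures read off the table above; not guarantees.
WRITE_QPS_PER_MASTER = 100_000
READ_QPS_PER_READ_ONLY_REPLICA = 100_000
OFFERED_READ_ONLY_COUNTS = (1, 3, 5)  # standard specifications


def estimated_qps(read_only_replicas: int) -> dict:
    """Nominal write, read, and total QPS for 1 master + N read-only replicas."""
    reads = READ_QPS_PER_READ_ONLY_REPLICA * read_only_replicas
    return {"writes": WRITE_QPS_PER_MASTER, "reads": reads,
            "total": WRITE_QPS_PER_MASTER + reads}


def replicas_needed(target_read_qps: int) -> int:
    """Smallest N whose nominal read capacity covers the target (ceiling division)."""
    return -(-target_read_qps // READ_QPS_PER_READ_ONLY_REPLICA)


def smallest_offered_spec(target_read_qps: int):
    """Smallest standard specification that fits, or None if a ticket is needed."""
    needed = replicas_needed(target_read_qps)
    for n in OFFERED_READ_ONLY_COUNTS:
        if n >= needed:
            return n
    return None


print(estimated_qps(5))
print(smallest_offered_spec(250_000), smallest_offered_spec(700_000))
```

For 1 master + 5 read-only replicas, the rule reproduces the 0.6 million QPS total quoted above; a read target beyond the largest standard specification returns `None`, matching the advice to submit a ticket.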

Concluding remarks

Because replication from the master to the read-only replicas is asynchronous, reads from a read-only replica may return stale data, so read/write splitting requires the application to tolerate some degree of data inconsistency. Later editions will give users more flexibility in parameter configuration, such as the maximum allowed delay.
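A maximum-delay setting could work by excluding lagging replicas from the read pool. The sketch below is entirely hypothetical: the parameter does not exist yet, and the function assumes the proxy can observe each read-only replica's replication lag and that the master serves reads, as it does by default.

```python
from typing import Dict, List


def eligible_read_nodes(replica_lag_ms: Dict[str, float],
                        max_delay_ms: float) -> List[str]:
    """Nodes allowed to serve reads under a maximum-delay threshold (hypothetical)."""
    # The master always has the latest data, so it is always eligible.
    nodes = ["master"]
    # Keep only read-only replicas whose replication lag is within the limit.
    nodes += [name for name, lag in replica_lag_ms.items() if lag <= max_delay_ms]
    return nodes


print(eligible_read_nodes({"ro1": 2.0, "ro2": 15.0, "ro3": 40.0}, max_delay_ms=20.0))
```

A tighter threshold trades read capacity for fresher data: in this sketch, lowering `max_delay_ms` to 10 would leave only the master and `ro1` serving reads.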