
Tair: Read/write splitting architecture

Last Updated: Sep 20, 2024

Tair provides the read/write splitting architecture to handle read-heavy workloads. This architecture offers a high level of availability, performance, and flexibility for read/write splitting services, and it allows a large number of clients to concurrently read hot data from read replicas. In addition, read/write splitting instances use the proxy component developed by the Alibaba Cloud Tair team to provide services such as request distribution and failover, which reduces O&M costs.

Components

A read/write splitting instance contains a master node, multiple read replicas, multiple proxy nodes, and a high availability (HA) system.

Figure 1. Cloud-native read/write splitting architecture

Figure 2. Classic read/write splitting architecture (retired)

The following sections describe each component of a read/write splitting instance in the cloud-native read/write splitting architecture (recommended) and in the classic read/write splitting architecture.

Master node (both architectures)

The master node processes all write requests. It also processes specific read requests together with the read replicas.

Read replica

Cloud-native architecture: Read replicas handle read requests and have the following benefits:

  • All read replicas can be used as replica nodes to back up data and ensure disaster recovery.

  • Read replicas synchronize data from the master node by using star replication. As a result, the synchronization latency of cloud-native instances is far lower than that of classic instances, which use chained replication.

  • You can adjust the number of read replicas in a read/write splitting instance within the range of 1 to 9.

Classic architecture: Read replicas handle read requests and have the following characteristics:

  • Read replicas use chained replication. If your instance contains a large number of read replicas, the read replicas at the end of the chain experience higher synchronization latency.

  • The number of read replicas can be set to 1, 3, or 5.

Replica node

Cloud-native architecture: No replica nodes are provided. Read replicas also serve as replica nodes. If the master node fails, requests are switched to a random read replica. Because no separate replica nodes are required, cloud-native read/write splitting instances cost less than classic read/write splitting instances that have the same specifications.

Classic architecture: A replica node serves as a cold standby node to back up data and does not provide services. If the master node fails, requests are switched to the replica node.

Proxy node (both architectures)

When a client connects to a proxy node, the proxy node automatically identifies the type of each request and distributes the traffic based on predefined weights: write requests are forwarded to the master node, and read requests are forwarded to the master node and the read replicas.

Note
  • Clients must connect to proxy nodes instead of connecting to other nodes directly.

  • The system evenly distributes read requests among the master node and the read replicas, and you cannot change the weights. For example, if you purchase an instance that has three read replicas, the master node and each read replica all have a weight of 25%.
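
The following minimal sketch shows how an application might interact with a read/write splitting instance through its proxy endpoint. The endpoint, port, and password are placeholders, and the Python redis-py client is only an assumption for illustration; the proxy decides on its own whether each command is served by the master node or by a read replica.

    # A minimal sketch, assuming the hypothetical endpoint and credentials below.
    # Requires the redis-py package (pip install redis).
    import redis

    client = redis.Redis(
        host="r-example.redis.rds.aliyuncs.com",   # proxy endpoint of the instance (placeholder)
        port=6379,
        password="your_password",                  # instance password (placeholder)
        decode_responses=True,
    )

    client.set("product:1001:views", 0)         # write request, forwarded to the master node
    views = client.get("product:1001:views")    # read request, forwarded to the master node or a read replica
    print(views)

No special client-side configuration is needed for the routing itself; from the application's point of view, the instance behaves like a single Redis endpoint.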

HA system

The HA system monitors the status of each node. If the master node fails, the HA system performs a switchover between the master node and the replica node. If a read replica fails, the HA system creates another read replica to process read requests. During a switchover, the HA system updates the routing and weight information.

Description of read/write splitting instances in dual-zone deployment mode

Cloud-native read/write splitting architecture (recommended)

Both the primary and secondary zones provide services with the following minimum configurations:

  • Primary zone: one primary node and one read replica

  • Secondary zone: one read replica

Separate endpoints are available for both the primary and secondary zones. Each endpoint supports both read and write operations. Read requests are routed to the primary node or read replicas within the same zone from which the requests originated. This ensures that the requests are served by the geographically closest nodes. Write requests are always routed to the primary node in the primary zone. The following figure shows the architecture.

Figure: Dual-zone deployment of the cloud-native read/write splitting architecture
Note

We recommend that you configure at least two nodes in both the primary and secondary zones:

  • Primary zone: one primary node and one read replica

  • Secondary zone: two read replicas

Classic read/write splitting architecture

Both the primary node and the read replica are deployed in the primary zone. Only the replica node is deployed in the secondary zone. The replica node serves as a cold standby node to back up data and does not provide services. If the master node fails, requests are switched to the replica node.
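
As an illustration of the zone-level endpoints in the dual-zone cloud-native deployment described above, the following sketch connects one client per zone. Both endpoints and the password are placeholders, and the Python redis-py client is an assumption for illustration only.

    # A minimal sketch, assuming the two hypothetical zone endpoints below.
    import redis

    # Application servers in the primary zone use the primary-zone endpoint.
    primary_zone = redis.Redis(host="r-example-primary.redis.rds.aliyuncs.com",
                               port=6379, password="your_password")

    # Application servers in the secondary zone use the secondary-zone endpoint.
    # Reads are served by nodes in the same zone; writes are still routed to the
    # master node in the primary zone.
    secondary_zone = redis.Redis(host="r-example-secondary.redis.rds.aliyuncs.com",
                                 port=6379, password="your_password")

    secondary_zone.set("session:42", "active")   # routed to the primary zone's master node
    secondary_zone.get("session:42")             # served within the secondary zone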

Benefits

  • Compatibility

    You can upgrade standard instances to read/write splitting instances that use proxy nodes to forward requests. After the upgrade, you can connect to the instances from any Redis clients without modifying your application. Read/write splitting instances are fully compatible with Redis commands. For information about the limits on commands supported by read/write splitting instances, see Limits on commands supported by read/write splitting instances.

  • HA system

    • Alibaba Cloud has developed an HA system for read/write splitting instances. The HA system monitors the status of all nodes of an instance to ensure HA. If the master node fails, the HA system switches the workloads from the master node to the replica node and updates the instance topology. If a read replica fails, the HA system creates another read replica. The HA system synchronizes data, forwards read requests to the new read replica, and suspends the faulty read replica.

    • A proxy node monitors the status of each read replica in real time. If a read replica is unavailable due to an exception, the proxy node reduces the weight of this read replica. If a read replica fails to be connected for a specified number of times, the system suspends the read replica and forwards read requests to available read replicas. The proxy node continues to monitor the status of the unavailable read replica. After the read replica recovers, the proxy node adds it to the list of available read replicas and forwards requests to it.

  • High performance

    The read/write splitting architecture supports chained replication, which allows you to scale out read replicas to increase the read capacity. The replication process is optimized at the Redis source-code level to maximize workload stability during replication and to make full use of the physical resources of each read replica.

Scenarios

High queries per second (QPS)

The standard architecture of Tair is not designed for high QPS scenarios. If your application is read-heavy, you can select a read/write splitting instance and deploy multiple read replicas to resolve performance bottlenecks caused by the single-node standard architecture. A read/write splitting instance can handle QPS that is up to nine times that of a standard instance.

Note

Data is synchronized to read replicas with a delay. Therefore, read/write splitting instances are suitable for applications that can tolerate a certain amount of stale data. In scenarios that require high data consistency, we recommend that you use the cluster architecture.
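
The following sketch illustrates what this replication latency means for read-after-write behavior. The endpoint and key are placeholders and the Python redis-py client is an assumption; if a workflow must observe its own write, it can poll briefly as shown, and otherwise it should simply tolerate a briefly stale value.

    # A minimal sketch, assuming a hypothetical proxy endpoint.
    import time
    import redis

    client = redis.Redis(host="r-example.redis.rds.aliyuncs.com", port=6379,
                         password="your_password", decode_responses=True)

    client.set("config:feature_flag", "on")   # handled by the master node

    # A GET issued immediately afterwards may be routed to a read replica that has
    # not yet replicated the write, so it can still return the previous value.
    for _ in range(10):
        if client.get("config:feature_flag") == "on":
            break
        time.sleep(0.01)   # replica lag is usually small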

Usage notes

  • If a read replica fails, requests are forwarded to other available read replicas. If all read replicas are unavailable, requests are forwarded to the master node. Read replica failures may result in increased workloads on the master node and prolonged response time. To process a large number of read requests, we recommend that you use multiple read replicas.

  • If an error occurs on a read replica, the HA system suspends the read replica and creates another read replica. This process involves resource allocation, data synchronization, and service loading. The amount of time that is required for a switchover depends on the system workloads and data volume. Tair does not guarantee a specific amount of time required for data restoration by using read replicas.

  • Full data synchronization among read replicas is triggered in specific scenarios. For example, it can be triggered when a switchover occurs on the master node. During full data synchronization, read replicas are unavailable. If your requests are forwarded to the read replicas, the following error message is returned: -LOADING Redis is loading the dataset in memory\r\n. You can retry such requests, as shown in the sketch after this list.

  • For more information about routing methods, see Features of proxy nodes.
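
The following sketch shows one way to retry requests that hit a read replica during full data synchronization. It assumes the Python redis-py client, which surfaces the -LOADING reply as a BusyLoadingError, and uses a hypothetical endpoint; the retry count and backoff are placeholders to tune for your workload.

    # A minimal sketch, assuming a hypothetical proxy endpoint.
    import time
    import redis
    from redis.exceptions import BusyLoadingError

    client = redis.Redis(host="r-example.redis.rds.aliyuncs.com", port=6379,
                         password="your_password", decode_responses=True)

    def get_with_retry(key, attempts=5, delay=0.5):
        """Retry a GET that may land on a replica that is still loading its dataset."""
        for attempt in range(attempts):
            try:
                return client.get(key)
            except BusyLoadingError:
                time.sleep(delay * (attempt + 1))   # back off before the next attempt
        raise RuntimeError(f"{key}: read replicas still loading after {attempts} attempts")

    print(get_with_retry("product:1001:views"))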

Purchase method

If you have created a cloud-native standard instance, you can enable the read/write splitting feature for the instance. For more information, see Enable read/write splitting.