Tair (Redis® OSS-Compatible) Imperceptible Switchover Technology Ensures Business Continuity

Tair (Redis® OSS-Compatible) is a database system that has been iterated on for more than 15 years at Alibaba and Alibaba Cloud. With core strengths in high performance, low latency, distributed design, and high reliability, it is widely used in industries such as AI, e-commerce, gaming, transportation, education, and healthcare.

At critical moments such as e-commerce sales promotions, gaming battles, or financial transactions, every database O&M change is like "walking a tightrope." The master-slave (Standalone) architecture of the open source Redis® community has long had a shortcoming that troubles developers: during version upgrades or high availability switchovers, connection interruptions and errors lasting several seconds are unavoidable. In modern business scenarios that demand an excellent user experience, even a few seconds of service jitter can mean lost orders or disconnected users.

Can a truly imperceptible switchover be achieved, with no errors, no connection interruptions, and the business continuously online? The Alibaba Cloud Tair team has answered yes. This article analyzes Tair's innovative practices in imperceptible switchover technology. The feature significantly reduces instance unavailability during O&M operations such as minor version upgrades and high availability switchovers, so that clients do not perceive obvious performance fluctuations.

1. Overview of the high availability architecture of the Redis®/Valkey community

For the open source database Valkey or Redis®, the two currently supported architectures are as follows:

Standalone: In the standard master-slave architecture, the primary database usually provides read and write services, and the secondary database serves only as a replacement node in case of failure. The mainstream high availability solution in the community is to introduce the Sentinel component as an independent monitoring and arbitration layer to continuously detect the survival status and replication health of the primary database and secondary database. When the primary database is detected to be continuously unreachable, failover is automatically initiated after a majority of Sentinel instances reach a consensus.

Cluster: The cluster architecture integrates sharding and high availability. The entire keyspace is divided into 16,384 slots distributed across multiple master nodes, and each master can be configured with multiple replicas. When a master fails, one of its replicas is automatically promoted to the new master through the voting and election mechanism within the cluster and takes over all slots managed by that master, which achieves automatic master-slave switchover. Cluster nodes synchronize status via the gossip protocol, and the client switches routes automatically when it receives a MOVED/ASK redirection.
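
The slot layout can be made concrete with a short sketch. It assumes the key-to-slot rule of the open source cluster specification (CRC16 of the key, modulo 16384, with optional {hash tag} handling); the class and method names are illustrative only and are not taken from Tair or from any client library.

```java
import java.nio.charset.StandardCharsets;

/** Sketch of cluster key-to-slot mapping: CRC16 (XMODEM) of the key, modulo 16384. */
class KeySlot {
    static final int SLOT_COUNT = 16384;

    /** CRC16-CCITT (XMODEM): polynomial 0x1021, initial value 0, no bit reflection. */
    static int crc16(byte[] data) {
        int crc = 0;
        for (byte b : data) {
            crc ^= (b & 0xFF) << 8;
            for (int i = 0; i < 8; i++) {
                crc = ((crc & 0x8000) != 0) ? ((crc << 1) ^ 0x1021) : (crc << 1);
                crc &= 0xFFFF; // keep the register at 16 bits
            }
        }
        return crc;
    }

    /** If the key contains a non-empty {hash tag}, only the tag is hashed. */
    static int slotOf(String key) {
        int open = key.indexOf('{');
        if (open >= 0) {
            int close = key.indexOf('}', open + 1);
            if (close > open + 1) {
                key = key.substring(open + 1, close);
            }
        }
        return crc16(key.getBytes(StandardCharsets.UTF_8)) % SLOT_COUNT;
    }

    public static void main(String[] args) {
        System.out.println("slot(foo) = " + slotOf("foo"));
        // Keys sharing a hash tag land in the same slot, so multi-key commands work on them.
        System.out.println(slotOf("{user1000}.following") == slotOf("{user1000}.followers"));
    }
}
```

Hash tags are what allow multi-key commands in a cluster: keys that share a tag are guaranteed to live in the same slot.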

Figure 1: High-availability mechanisms of Standalone and Cluster architectures

2. Imperceptible switchover and ecosystem support in cluster architecture

In open source Redis®, which uses the standard RESP protocol, after a primary/secondary failover occurs in the cluster architecture, the secondary database (the demoted former primary) returns a MOVED error to requests that arrive on existing connections. On receiving MOVED, the client refreshes its route table and retries the command, so the switchover is imperceptible to the application.
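
A minimal sketch of the client side of this flow follows. It assumes the MOVED error has the form "MOVED <slot> <host>:<port>" as in the open source cluster specification; the route table and the method names are hypothetical and much simpler than a real client, which would typically refresh the whole slot map rather than a single entry.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of client-side MOVED handling; the route table layout is hypothetical. */
class MovedRouting {
    /** Hypothetical route table: slot number -> "host:port" of the node that owns it. */
    static final Map<Integer, String> ROUTES = new HashMap<>();

    /**
     * Handles an error reply. For "MOVED <slot> <host>:<port>" it records the
     * new owner and returns its address so the caller can retry the command there.
     * Returns null for any other error, which the caller should surface as usual.
     */
    static String handleError(String error) {
        String msg = error.startsWith("-") ? error.substring(1) : error;
        String[] parts = msg.trim().split("\\s+");
        if (parts.length != 3 || !parts[0].equals("MOVED")) {
            return null;
        }
        int slot = Integer.parseInt(parts[1]);
        ROUTES.put(slot, parts[2]);
        return parts[2];
    }

    public static void main(String[] args) {
        String retryAt = handleError("-MOVED 100 10.0.0.1:6379");
        System.out.println("retry at " + retryAt + ", routes=" + ROUTES);
    }
}
```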

When implementing the direct connection cluster architecture (non-Proxy cluster architecture), Tair did not adopt the open source architecture as is but reworked it. First, it introduced the virtual IP address (VIP) mechanism, which keeps the link reachable after a switchover and makes imperceptible switchover possible. Second, it abandoned the gossip protocol and uses a centralized Config Server to distribute the route table uniformly. This "strong management" pattern eliminates the uncertainty of distributed protocols and significantly improves cluster stability.

Figure 2: Schematic of Alibaba Cloud Tair direct connection cluster architecture

key1: 100 indicates that key1 is located in slot 100. A request for key1 sent to the master node at VIP1-1 is accepted directly.
key1: 100. A request for key1 sent to the replica node at VIP1-2 returns MOVED.
key2: 200. A request for key2 sent to the master node at VIP2-1 returns MOVED, because that node manages slots 8192 to 16383, which does not include the slot where the key is located.
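
The three cases can be expressed as a small decision function. This is a toy model rather than Tair's implementation: the Node class is hypothetical, and the slot range 0 to 8191 for the first shard is an assumption inferred from the 8192 to 16383 range stated for the second shard.

```java
/** Toy model of the routing decision in Figure 2; node names and ranges are illustrative. */
class SlotOwnerCheck {
    static final class Node {
        final String name;
        final boolean master;
        final int firstSlot;
        final int lastSlot;

        Node(String name, boolean master, int firstSlot, int lastSlot) {
            this.name = name;
            this.master = master;
            this.firstSlot = firstSlot;
            this.lastSlot = lastSlot;
        }

        /** Returns "ACCEPT" if this node serves the slot, otherwise a MOVED that names the owner. */
        String route(int slot, String ownerName) {
            boolean inRange = slot >= firstSlot && slot <= lastSlot;
            return (master && inRange) ? "ACCEPT" : "MOVED " + slot + " " + ownerName;
        }
    }

    public static void main(String[] args) {
        Node vip11 = new Node("VIP1-1", true, 0, 8191);      // master of shard 1 (assumed range)
        Node vip12 = new Node("VIP1-2", false, 0, 8191);     // replica of shard 1
        Node vip21 = new Node("VIP2-1", true, 8192, 16383);  // master of shard 2
        System.out.println(vip11.route(100, "VIP1-1")); // key1 on its master
        System.out.println(vip12.route(100, "VIP1-1")); // key1 on the replica
        System.out.println(vip21.route(200, "VIP1-1")); // key2 on the wrong shard
    }
}
```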

3. Implementation of imperceptible switchover in master-slave architecture

Unlike the cluster architecture, which has redirection built in, the master-slave architecture faces two major "genetic defects" in achieving imperceptible switchover:

Incompatibility between master-slave and cluster protocols: The Redis® community first created a simple master-slave architecture. When it began to support the cluster architecture in version 3.0, the master-slave architecture did not gain the slot routing mechanism unique to the cluster, so the protocols of the two are incompatible. Forcing users to upgrade to the cluster version would mean extremely high code modification costs.
The Sentinel solution cannot be completely seamless: As an external monitoring component, Sentinel itself involves a certain degree of deployment difficulty and cost. It is responsible only for detection and cannot control command behavior at the kernel connection level. The Master-Replica version of Alibaba Cloud uses a standalone high availability component for the lifecycle keep-alive detection of instances and does not depend on Sentinel.

Facing these constraints, Tair chose a more difficult path: it coordinated the kernel, the client, and the network layer in a full-link refactoring, and successfully promoted a new standard for master-replica seamless switching in the community.

3.1 Promotion of community standard protocols

Kernel-level support for the master-replica seamless switching protocol was first submitted to the Redis® community [1]. After more than a hundred rounds of discussion, and after Redis® was closed-sourced, it was finally merged in the new Valkey community [2]. The discussion ran into three major obstacles:

• Architecture bias: Users outside China mostly use the Cluster Edition, so the community's appetite for master-replica optimization is not strong. However, Alibaba Cloud has a large number of Standard Edition customers because the Standard Edition is more flexible to use; for example, multi-key commands are not subject to slot limitations.

• Inconsistent experience standards: Regarding seamless experience, some customers choose to tolerate breaks, but the Alibaba Cloud Tair team attaches great importance to customers' increasing demands on response time (RT) and a fully seamless experience.

• Ecosystem inertia: The community is relatively resistant to introducing new protocols because clients then need to adapt again. To dispel this concern, Tair entered the Redis® client ecosystem directly and is currently the owner of the valkey-java [3] client.

Redis® was closed-sourced during this period. Upholding the open source spirit, the Alibaba Cloud Tair team joined multiple core contributors from the former Redis® community and major vendors to establish the Valkey community, and succeeded in getting seamless switching developed and merged there, making it a standard capability of the Valkey database. The new protocol format is: REDIRECT HOST PORT, where REDIRECT indicates that the client needs to redirect the request to the target node.
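
On the client side, handling this reply comes down to recognizing the error and extracting the target. The sketch below is illustrative and is not the Valkey-Java implementation. It assumes the reply arrives as an error string that starts with REDIRECT, and it accepts the host and port separated by either whitespace or a colon, because the exact wire format should be confirmed against the Valkey PR [2].

```java
/** Sketch of parsing a REDIRECT error reply; the accepted formats are assumptions. */
class RedirectReply {
    /**
     * Returns the redirect target as "host:port", or null if the error is not a REDIRECT.
     * Accepts "REDIRECT host port" and "REDIRECT host:port", with or without a leading '-'.
     */
    static String target(String error) {
        String msg = error.startsWith("-") ? error.substring(1) : error;
        String[] parts = msg.trim().split("\\s+");
        if (parts.length < 2 || !parts[0].equals("REDIRECT")) {
            return null;
        }
        if (parts.length == 3) {
            return parts[1] + ":" + parts[2];      // host and port as separate tokens
        }
        if (parts.length == 2 && parts[1].lastIndexOf(':') > 0) {
            return parts[1];                       // already in host:port form
        }
        return null;                               // malformed REDIRECT
    }

    public static void main(String[] args) {
        System.out.println(target("-REDIRECT 10.0.0.2 6379"));
        System.out.println(target("-READONLY You can't write against a read only replica."));
    }
}
```

A client that gets a non-null target would reconnect to that address and replay the command there instead of surfacing an error to the application.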

Figure 3: Support PR for Valkey seamless switching

3.2 Follow-up of software development kit (SDK) technology

After Redis® was closed-sourced, the core contributors of the Jedis community from the Tair team forked Jedis to create the Valkey-Java client, providing customers with continued client services. Support for master-replica seamless switching was implemented on Valkey-Java. Customers only need to upgrade the client to the following version to use master-replica seamless switching:

<dependency>
    <groupId>io.valkey</groupId>
    <artifactId>valkey-java</artifactId>
    <version>5.3.0</version>
</dependency>

The seamless switching support pays particular attention to in-flight connections. Connections in the pool are used concurrently, so one or more of them may receive -REDIRECT at the same time. Re-initialization of the connection pool is therefore controlled by a two-layer lock:

The first layer uses tryLock of ReentrantLock. Only a thread whose tryLock succeeds becomes the candidate to renew the connection pool.
The second layer uses ReentrantReadWriteLock, which controls reads and writes of the connection pool. Once the write lock is held, external API requests block until the connection pool update completes, which keeps concurrent access safe.

Figure 4: Valkey-Java Code for handling in-flight connections
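
The two-layer locking described above can be sketched as follows. This is a simplified illustration and not the actual Valkey-Java code: the class name, the method names, and the use of a plain "host:port" string in place of a real connection pool are all assumptions made for the example.

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Function;

/** Sketch of a pool that is rebuilt once when many connections see -REDIRECT together. */
class RedirectAwarePool {
    private final ReentrantLock renewLock = new ReentrantLock();                  // layer 1: elects one renewer
    private final ReentrantReadWriteLock poolLock = new ReentrantReadWriteLock(); // layer 2: guards the pool
    private String target;   // stands in for a real connection pool bound to host:port
    private int renewCount;

    RedirectAwarePool(String initialTarget) {
        this.target = initialTarget;
    }

    /** Normal request path: many threads may hold the read lock at the same time. */
    <T> T withPool(Function<String, T> action) {
        poolLock.readLock().lock();
        try {
            return action.apply(target);
        } finally {
            poolLock.readLock().unlock();
        }
    }

    /**
     * Called by a thread that received -REDIRECT, after it has left withPool
     * (a read lock cannot be upgraded to a write lock).
     * Returns true only if this call actually rebuilt the pool.
     */
    boolean onRedirect(String newTarget) {
        if (!renewLock.tryLock()) {
            return false;                      // another thread is already renewing
        }
        try {
            poolLock.writeLock().lock();       // blocks new requests until the swap is done
            try {
                if (newTarget.equals(target)) {
                    return false;              // an earlier renewer already switched
                }
                target = newTarget;
                renewCount++;
                return true;
            } finally {
                poolLock.writeLock().unlock();
            }
        } finally {
            renewLock.unlock();
        }
    }

    int renewCount() {
        return withPool(addr -> renewCount);
    }

    public static void main(String[] args) throws InterruptedException {
        RedirectAwarePool pool = new RedirectAwarePool("old-primary:6379");
        Thread[] workers = new Thread[8];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> pool.onRedirect("new-primary:6379"));
            workers[i].start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        System.out.println(pool.withPool(addr -> addr) + " renewals=" + pool.renewCount());
    }
}
```

Using tryLock rather than lock in the first layer means that threads which lose the race do not queue up to rebuild the pool again; they simply retry their request once the write lock is released.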

Seamless switching has now become a standard capability of the Valkey client community. In addition to the Valkey-Java client, other clients in the community are adding support for seamless switching one after another:

Client	URL	Imperceptible Switchover Support	Starting Version
Valkey-Java	https://github.com/valkey-io/valkey-java	Supported	5.3.0
Valkey-Go	https://github.com/valkey-io/valkey-go	Supported	1.0.67
Valkey-py	https://github.com/valkey-io/valkey-py	In progress	-
Glide	https://github.com/valkey-io/valkey-glide	In progress	-

3.3 Network technology: connection draining

Beyond the modifications and optimizations at the kernel layer, the network environment is another important difference between ApsaraDB and self-managed databases. Alibaba Cloud Tair adopts an "LB (load balancing) + VIP" architecture. The LB is responsible for connecting the VPC network. Regardless of how the backend primary and secondary nodes change, the client only needs to access a fixed VIP, which greatly reduces access costs.

However, in a standard high availability switchover, when the VIP mapping changes, the LB usually resets (RST) the old connections immediately. As a result, the old primary database loses the connection before it has time to send the REDIRECT instruction to the client.

Tair's imperceptible primary-secondary switchover therefore relies on Connection Draining, a "graceful shutdown" mechanism that acts as a delay protector. It keeps old connections open for a configurable period of time after the VIP switch. This gives the old primary database enough time to deliver the REDIRECT instruction to the client before disconnecting gracefully, which achieves a truly imperceptible switchover.
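
The effect of the drain window can be illustrated with a toy timeline model. It is not a description of the Alibaba Cloud LB: the class, the three outcomes, and the millisecond parameters are hypothetical and only capture the ordering of events described above.

```java
/** Toy timeline model of connection draining; all names and numbers are illustrative. */
class DrainModel {
    enum Outcome { SERVED, REDIRECTED, RESET }

    /**
     * Outcome for a request that arrives on an OLD connection at requestAtMs,
     * when the VIP switches at switchAtMs and the LB keeps old connections open
     * for drainMs afterwards. drainMs = 0 models an immediate RST on switchover.
     */
    static Outcome outcome(long requestAtMs, long switchAtMs, long drainMs) {
        if (requestAtMs < switchAtMs) {
            return Outcome.SERVED;        // old primary is still the primary
        }
        if (requestAtMs < switchAtMs + drainMs) {
            return Outcome.REDIRECTED;    // old primary can still answer with REDIRECT
        }
        return Outcome.RESET;             // connection is gone, client sees an error
    }

    public static void main(String[] args) {
        System.out.println(outcome(1500, 1000, 0));     // no draining
        System.out.println(outcome(1500, 1000, 3000));  // within the drain window
    }
}
```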

Figure 5: Imperceptible primary-secondary switchover of Alibaba Cloud Tair Standard Edition architecture

4. Testing results of imperceptible switchover

After imperceptible switchover was released, tair-pulse [4] was used to run a comparative test on an instance during a switchover. The feature brings two improvements:

The original switchover solution encounters 5 types of errors, such as readonly and connection reset, whereas the new switchover solution encounters none.
The original switchover solution is unavailable for about 5 s during the switchover, while the impact of the new solution is an RT increase of about 1 s. Note that the new solution uses pause write to block writes to the instance, which may cause command timeouts of up to about 1 s. To avoid handling any errors during the switchover, increase the client timeout accordingly.

Figure 6: The old solution has 5 types of errors during the switchover procedure, and the unavailable time is 5 s

Figure 7: The imperceptible switchover solution has 0 errors, and the unavailable time is about 1 s

Summary and Outlook

There is still room for optimization in the primary-secondary imperceptible switchover of the Tair cloud architecture. We will continue to work with the network team to further reduce the switchover duration and provide high-quality services to customers. You are welcome to try out the Tair primary-secondary imperceptible switchover capability and share your feedback.

Appendix: Practical guide for Tair imperceptible switchover:

Tair supports primary-secondary imperceptible switchover starting from major version 7.0. If your major version is lower than 7.0, upgrade it to 7.0. If you are already on major version 7.0, upgrade the minor version to 0.2.9. Note that you need Valkey-Java version 5.3.0 or later to get the complete imperceptible switchover capability.
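
A pre-flight check for these requirements can be sketched as a dotted-version comparison. The helper below is hypothetical and handles purely numeric versions only. Treating the minor version requirement as "0.2.9 or later" is an assumption, so confirm the exact requirement in the Tair console or documentation.

```java
/** Sketch of a numeric dotted-version comparison; helper names are hypothetical. */
class VersionCheck {
    /** Compares versions such as "0.2.9" and "0.2.10" component by component. */
    static int compare(String a, String b) {
        String[] x = a.split("\\.");
        String[] y = b.split("\\.");
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? Integer.parseInt(x[i]) : 0; // missing parts count as 0
            int yi = i < y.length ? Integer.parseInt(y[i]) : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    static boolean atLeast(String actual, String required) {
        return compare(actual, required) >= 0;
    }

    public static void main(String[] args) {
        // Example values; in practice read them from the instance details and the build file.
        boolean ready = atLeast("7.0", "7.0")       // major version
                && atLeast("0.2.9", "0.2.9")        // minor version (assumed "or later")
                && atLeast("5.3.0", "5.3.0");       // Valkey-Java client version
        System.out.println("imperceptible switchover prerequisites met: " + ready);
    }
}
```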

References:

[1] https://github.com/redis/redis/pull/12192
[2] https://github.com/valkey-io/valkey/pull/325
[3] https://github.com/valkey-io/valkey-java
[4] https://github.com/tair-opensource/tair-tools/tree/main/tair-pulse
