[Others] Master-slave hot standby HA scheme of ApsaraDB for Redis

Posted time: Mar 1, 2017 13:13 PM
High availability is essential for any online production environment. Alibaba Cloud ApsaraDB for Redis, as a mature and stable database product, provides high availability tailored to the characteristics of Redis. This article introduces how Redis Cloud implements this scheme.
Redis Cloud currently has two architectures, namely the master-slave architecture and the cluster architecture. This article focuses on the master-slave version for HA analysis.
The figure below demonstrates a master-slave architecture.
We can learn from the figure that the Redis Cloud instance has a master node and a slave node. Usually only the master node provides services, and the slave node serves as the hot backup without providing access. The slave node is mounted to the master node through the slaveof command and keeps receiving data from the master node to guarantee that Redis Cloud can continue functioning when the master node goes down.
Every Redis Cloud instance is assigned a VIP, which is bound to a DNS name. Behind the Server Load Balancer, the VIP points directly at the master node, with no other layers in between. The route for accessing Redis is DNS -> VIP -> Server Load Balancer -> Redis (master).
The cluster architecture follows basically the same underlying HA principle as the master-slave architecture and differs only in implementation. Because the route has an additional proxy forwarding layer between the VIP and the backend Redis, a backend RS switchover does not require redirecting the VIP; only the proxy route table has to be updated.
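
To make the difference concrete, here is a minimal Python sketch of what "updating the proxy route table" could look like. The article does not describe the real table, so the shape used here (a mapping from shard name to backend address), the function name repoint_backend, and all addresses are assumptions for illustration only.

```python
def repoint_backend(route_table, failed_backend, new_backend):
    """Return a copy of the route table with entries for failed_backend repointed.

    route_table is a hypothetical mapping of shard name -> "ip:port".
    The VIP is not touched; only the proxy's view of the backends changes.
    """
    return {
        shard: (new_backend if backend == failed_backend else backend)
        for shard, backend in route_table.items()
    }


# Hypothetical shard names and addresses, for illustration only.
routes = {"shard-0": "10.0.0.1:6379", "shard-1": "10.0.0.2:6379"}
updated = repoint_backend(routes, "10.0.0.1:6379", "10.0.0.11:6379")
```

Returning a new table instead of mutating the old one is a design choice of this sketch: the proxy could swap the table in one step, so no request sees a half-updated route.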
HA module
HA, as an independent system module, remotely detects the health condition of Redis Cloud and initiates master-slave switchover promptly in case of instance unavailability to ensure the service quality.
Health check
The health check logic is quite simple: connect to Redis via a client and send the ping command. If PONG is returned, Redis is healthy; otherwise, Redis is experiencing an exception. The checking logic can be described through pseudocode:
    try:
        //Specify the ip:port of the Redis to be connected and set the timeout values. The ip:port can be VIP:VPORT, or the physical ip:port of the master or slave node, because HA has multi-dimensional detection.
        //If the connection fails or times out, an exception is thrown.
        client = Redis(ip, port, connection_timeout, socket_timeout)
        //Send the ping command to Redis. If a non-PONG result is returned or the read times out, an exception is thrown.
        res = client.ping()
        if res == PONG:
            //Redis is healthy.
            return OK
    except Exception as e:
        //If the exception is one of the pre-defined errors, Redis is really experiencing an exception and ERROR is returned. HA then proceeds to the switchover.
        if e.message in ERRORS:
            return ERROR
        //Any other exception is not treated as a Redis failure.
        else:
            return OK
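
For readers who want something executable, here is a minimal Python sketch of the same decision logic. It assumes the probe is passed in as a zero-argument callable named ping (a hypothetical stand-in for a real client call) and that errors are matched by substring. The message list is taken from the errors this article treats as real failures; the names health_check and SWITCHOVER_ERRORS are illustrative and not part of the HA module.

```python
OK, ERROR = "OK", "ERROR"

# Error messages that this article treats as real failures.
SWITCHOVER_ERRORS = (
    "Connection refused",
    "Connection reset",
    "connect timed out",
    "Read timed out",
    "LOADING Redis is loading the dataset in memory",
    "READONLY You can't write against a read only slave",
)


def health_check(ping):
    """Classify one probe. `ping` returns the server reply or raises."""
    try:
        reply = ping()
        if reply != "PONG":
            # Mirror the pseudocode: a non-PONG reply becomes an exception.
            raise RuntimeError("unexpected reply: %r" % (reply,))
        return OK
    except Exception as exc:
        message = str(exc)
        # Only the pre-defined errors count as a real Redis failure.
        if any(known in message for known in SWITCHOVER_ERRORS):
            return ERROR
        return OK


def refused():
    raise ConnectionRefusedError("Connection refused")


def unrelated():
    raise ValueError("some client-side problem")


results = [health_check(lambda: "PONG"), health_check(refused), health_check(unrelated)]
```

Note that, as in the pseudocode, an exception outside the pre-defined list does not trigger a switchover. Exact message texts differ between client libraries, which is why this sketch matches substrings instead of whole messages.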

The exceptions that require a real switchover by HA include the following:
/* The machine with the specified IP address cannot be found (that is, the current machine has no route to that IP address), or the IP address exists but nothing is listening on the specified port. In this case, the Redis host may be down or the Redis process may have stopped */
"Connection refused"
/* The number of concurrent connections exceeds the server's capacity, and the server actively closes some of the connections */
"Connection reset"
/* The connection attempt times out */
"connect timed out"
/* Reading data times out */
"Read timed out"
/* Redis is loading data */
"LOADING Redis is loading the dataset in memory"
/* The accessed Redis is the slave */
"READONLY You can't write against a read only slave"

Upgrading scheme of health check
Redis uses a single-threaded model, so all the commands it receives are queued and processed serially. Time-consuming commands such as flushall and keys * can easily cause wait timeouts on the client. If the timeout value is not set appropriately, HA may mistakenly diagnose Redis as unhealthy in its remote detection. In addition, a single ping command cannot fully reflect the state of the Redis process.
Sentinel, the HA scheme built into Redis, also relies on the ping command for health checks. You can configure sentinel down-after-milliseconds mymaster with a time length in milliseconds. If no valid response to ping is received within that time length, Redis is identified as down. This practice carries some risk. For example, flushall on a 64 GB instance may take two minutes to wipe the data. If the timeout value is set to 60s, misjudgment easily occurs.
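
The arithmetic behind this misjudgment can be sketched in a few lines of Python. The model is deliberately simplified and is an assumption of this sketch: the instance stops answering ping the moment the long command starts and answers again as soon as it finishes. The function name first_false_down_ms is hypothetical.

```python
def first_false_down_ms(block_start_ms, block_duration_ms, down_after_ms):
    """Return when a ping-only monitor would mark the instance down, or None.

    down_after_ms plays the role of a setting such as:
        sentinel down-after-milliseconds mymaster 60000
    """
    if block_duration_ms > down_after_ms:
        # No valid reply arrives for longer than the threshold allows.
        return block_start_ms + down_after_ms
    return None


# The example from the text: a two-minute flushall against a 60 s threshold.
misjudged_at = first_false_down_ms(0, 120_000, 60_000)
# A threshold longer than the blocking command avoids the false alarm.
not_misjudged = first_false_down_ms(0, 120_000, 180_000)
```

With a 60 s threshold, the healthy but busy instance is declared down one minute into the flushall. Raising the threshold avoids that, but it also delays the detection of real failures by the same amount.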
We modified the Redis kernel to solve this problem by adding a state thread dedicated to serving health checks for HA, together with a state port. The state thread listens on this port, and HA interacts with Redis through this port during detection, with no impact on the primary service thread of Redis. The new state thread also allows many additional detection operations, such as read/write performance and disk IO checks.
We call the port that the Redis main thread listens to redis_port, and the port that the state thread listens to status_port. The new health check logic is as follows:
    try:
        //Here the port to be connected is the state port status_port.
        //If the connection fails or times out, an exception is thrown.
        client = Redis(ip, status_port, connection_timeout, socket_timeout)
        //Send the customized health check command to Redis. If the result is not True, an exception is thrown.
        res = client.health_check()
        if res == True:
            //Redis is healthy.
            return OK
    except Exception as e:
        //If the exception is one of the pre-defined errors, Redis is really experiencing an exception and ERROR is returned. HA then proceeds to the switchover.
        if e.message in ERRORS:
            return ERROR
        //Any other exception is not treated as a Redis failure.
        else:
            return OK
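
The idea of a dedicated state thread on its own port can be illustrated with a toy Python model. This is not the actual kernel change, which lives inside Redis itself. The command name HEALTH_CHECK, the reply format, and the helper names start_status_thread and probe are all assumptions of this sketch.

```python
import socket
import socketserver
import threading


class StatusHandler(socketserver.StreamRequestHandler):
    """Answers health probes without involving the main service thread."""

    def handle(self):
        line = self.rfile.readline().strip()
        if line.upper() == b"HEALTH_CHECK":
            self.wfile.write(b"+OK\r\n")
        else:
            self.wfile.write(b"-ERR unknown command\r\n")


def start_status_thread(host="127.0.0.1", port=0):
    """Start a listener on status_port in a background thread (port 0 = any free port)."""
    server = socketserver.ThreadingTCPServer((host, port), StatusHandler)
    server.daemon_threads = True
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def probe(host, port, timeout=1.0):
    """Connect to the status port and return True if the instance reports healthy."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(b"HEALTH_CHECK\r\n")
        return conn.makefile("rb").readline().strip() == b"+OK"


server = start_status_thread()
status_host, status_port = server.server_address
healthy = probe(status_host, status_port)
server.shutdown()
server.server_close()
```

Because the probe is served by a separate thread, a long-running command on the main service port does not delay the reply. That is the property the state thread is meant to provide.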

Preparations before master-slave switchover
When the health check finds that Redis is not available, HA does not switch over immediately. The following checks are carried out before the switchover:
Check Redis health status through VIP
if VIP is unhealthy:
    Check the slave status
    if the slave is healthy:
        Check the VIP status again
        if VIP is healthy:
            No master-slave switchover is required.
        else:
            Execute the master-slave switchover.
    else:
        The slave is unhealthy, so the switchover is not executable.
else:
    No master-slave switchover is required.
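
The decision tree above can be expressed as a small Python function. This is a sketch under stated assumptions: the two checks are passed in as hypothetical zero-argument callables that return True when the target is healthy, and the returned strings are illustrative labels, not values used by the HA module.

```python
def pre_switch_decision(check_vip, check_slave):
    """Decide whether a master-slave switchover should be executed."""
    if check_vip():
        return "no switchover required"
    if not check_slave():
        # Switching to an unhealthy slave would not restore service.
        return "switchover not executable"
    # Re-check the VIP to filter out a transient failure.
    if check_vip():
        return "no switchover required"
    return "execute switchover"


# A VIP that fails once and then recovers does not trigger a switchover.
flapping_vip = iter([False, True])
decision_flapping = pre_switch_decision(lambda: next(flapping_vip), lambda: True)
decision_down = pre_switch_decision(lambda: False, lambda: True)
decision_both_down = pre_switch_decision(lambda: False, lambda: False)
```

Passing the checks in as callables also makes it possible to substitute a check on the master node for the VIP check without changing the decision logic.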

Before the switchover, the slave node state must be checked to ensure that the instance is serviceable after the switchover. Otherwise the switchover does not help, for example in the extreme situation where both hosts are down.
VPC-type instances are handled specially. Because HA cannot access the VIP inside a user-defined network, the health check on the VIP is replaced by a health check on the master node.
Execute the master-slave switchover
When Redis becomes unavailable and the switchover conditions are met, HA can start the master-slave switchover. The switchover supports both active task switching and passive failover. The main difference between the two is whether the switchover waits for the slave node to finish synchronizing with the master. The master-slave switchover proceeds as follows:
1. Check the master state once again
if the master is healthy:
    if failover:
        No switchover is required. Success is returned.
    else:
        a. Set the master node to read-only
        b. Check the master-slave synchronization status through the info replication command, that is, whether the master_repl_offset is equal to the slave offset.
        if the offsets are still inconsistent when the wait times out:
            Reset the master node to read-write and return an exception.
else:
    Log that the master node cannot be connected.
2. Redirect the VIP to the slave node
if the VIP switchover fails:
    Log the action and return an exception.
3. Update the master and slave metainformation
4. Send the slaveof no one command to the slave node to promote it to the new master node.
5. Try to send the slaveof command to the original master node to downgrade it to the new slave node.
    At this time, the original master node may already be down, so a failure here is only logged and no failure processing is initiated.
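
The five steps can be sketched in Python as follows. Everything the function calls on ops is a hypothetical adapter interface invented for this sketch (the article does not name the HA module's internals), and FakeOps is an in-memory stand-in used only to exercise the flow.

```python
import time


class SwitchoverError(Exception):
    """Raised when the switchover has to be aborted."""


def switch_master(ops, failover, sync_timeout=5.0, poll_interval=0.1):
    # Step 1: check the master state once again.
    if ops.master_healthy():
        if failover:
            return "no switchover required"
        ops.set_master_read_only(True)                     # step 1a
        deadline = time.monotonic() + sync_timeout
        while ops.master_offset() != ops.slave_offset():   # step 1b
            if time.monotonic() >= deadline:
                ops.set_master_read_only(False)
                raise SwitchoverError("slave did not catch up in time")
            time.sleep(poll_interval)
    else:
        ops.log("master is unreachable")
    # Step 2: redirect the VIP to the slave node.
    if not ops.redirect_vip_to_slave():
        ops.log("VIP redirect failed")
        raise SwitchoverError("VIP redirect failed")
    ops.swap_roles_in_metadata()                           # step 3
    ops.promote_slave()                                    # step 4: slaveof no one
    try:
        ops.demote_old_master()                            # step 5: slaveof new master
    except Exception as exc:
        ops.log("could not demote old master: %s" % exc)   # only logged
    return "switched"


class FakeOps:
    """In-memory stand-in for the real operations, for illustration only."""

    def __init__(self, master_up=True, master_off=100, slave_off=100, vip_ok=True):
        self.master_up, self.m_off, self.s_off = master_up, master_off, slave_off
        self.vip_ok, self.read_only, self.events = vip_ok, False, []

    def master_healthy(self): return self.master_up
    def set_master_read_only(self, flag): self.read_only = flag
    def master_offset(self): return self.m_off
    def slave_offset(self): return self.s_off
    def redirect_vip_to_slave(self): return self.vip_ok
    def swap_roles_in_metadata(self): self.events.append("metadata")
    def promote_slave(self): self.events.append("promote")
    def log(self, message): self.events.append("log: " + message)

    def demote_old_master(self):
        if not self.master_up:
            raise ConnectionError("Connection refused")
        self.events.append("demote")


crashed = FakeOps(master_up=False)
outcome = switch_master(crashed, failover=True)
```

In this run the master is down, so the function logs that fact, redirects the VIP, promotes the slave, and merely logs the failed attempt to demote the old master.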

With the above scheme, the SLA of ApsaraDB for Redis can reach 99.99%. It is only unavailable in the extreme situation where both the master and slave nodes go down.
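
To put the 99.99% figure in perspective, the following Python snippet converts an availability percentage into a downtime budget. The measurement windows (a 365-day year and a 30-day month) are assumptions of this sketch; the article does not state how the SLA is measured.

```python
def allowed_downtime_minutes(sla_percent, period_days):
    """Minutes of unavailability that still satisfy the given SLA over the period."""
    return (100.0 - sla_percent) / 100.0 * period_days * 24 * 60


per_year = allowed_downtime_minutes(99.99, 365)   # about 52.56 minutes
per_month = allowed_downtime_minutes(99.99, 30)   # about 4.32 minutes
```

Under these assumptions the budget is roughly 52.6 minutes per year, or about 4.3 minutes per 30-day month.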
Switchover scenarios
Downtime or abnormal exit of processes is the most severe case. At this time, HA is unable to connect to Redis in its remote detection and will immediately initiate the switchover. The entire switchover process can be completed within seconds in actual tests.
After the master-slave switchover, the original master node becomes the slave node. To keep Redis Cloud highly available, a separate component monitors the availability of the slave. When the slave (the backup database) is down, a rebuild of the slave is triggered to maintain the two-node master-slave hot standby.
Comparison with Redis-Sentinel
The Redis-Sentinel HA solution that comes with Redis 2.8 monitors the master and slave nodes through a Sentinel system composed of one or more Sentinel instances. Its design is well documented elsewhere, so I will not go into detail here.
We didn't adopt Sentinel in our production environment, mainly for the following reasons:
1. Although Sentinel itself is a Redis process, an additional role has to be configured and maintained to run it in its special mode, which increases the complexity of the system.
2. Sentinel has its own high availability problem. A single-point Sentinel carries the risk that failover itself fails, so a Sentinel cluster is needed, which increases the management costs.
3. Sentinel itself is stateful. It needs to load a configuration file at startup to get the master information. If you build a distributed Sentinel cluster, dynamic addition, deletion and change operations also incur consistency problems. This does not suit elastic resources on the cloud and cannot cope with instance creation, deletion, configuration change, migration and other actions.
4. The health check of Sentinel is not flexible enough. We have discussed in the previous section that a single ping command cannot cope with flushall, keys * and other time-consuming commands. Our upgraded health check scheme deals better with the complex environments on the cloud.

Because of the aforementioned problems, Sentinel cannot serve as a uniform HA system that manages all the Redis Cloud resources. Creating a separate Sentinel system for each Redis Cloud instance would waste resources and increase management complexity. For these reasons, we developed this HA system for Redis Cloud.
Ending remarks
This article introduces the HA scheme of ApsaraDB for Redis which ensures the service's high availability by master-slave dual-node hot standby. It enables timely master-slave switchover in the case of health check exceptions to guarantee effective business operation.