The high-availability service consists of several modules, including the Detection, Repair, and Notification modules. Together, these modules guarantee the availability of the data link services and handle any internal database exceptions.
In addition, RDS can improve the performance of its high-availability service by migrating to a region that supports multiple zones and by adopting the appropriate high-availability policies.
The Detection module checks whether the master and slave nodes of the DB Engine are providing their services normally. The HA (high-availability) node uses heartbeat information, acquired at an interval of 8 to 10 seconds, to check the health status of the master node. This information, combined with the health status of the slave node and heartbeat information from other HA nodes, allows the Detection module to rule out misjudgments caused by exceptions such as network jitter, so that an exception switchover can be completed within 30 seconds.
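The way several HA nodes' observations guard against a false alarm can be sketched as follows. This is a minimal illustration, not the actual RDS implementation: the names `Observation` and `should_switch_over`, the strict-majority rule, and the 20-second staleness threshold are all assumptions chosen only to fit the 8 to 10 second heartbeat interval and the 30-second switchover window described above.

```python
from dataclasses import dataclass
from typing import List

# Assumption: a master is treated as suspect after roughly two missed
# heartbeats (the source gives only the 8-10 s interval, not a threshold).
STALE_AFTER_S = 20.0


@dataclass
class Observation:
    """One HA node's view of the master node (hypothetical structure)."""
    ha_node: str
    seconds_since_last_heartbeat: float

    def master_looks_down(self) -> bool:
        return self.seconds_since_last_heartbeat > STALE_AFTER_S


def should_switch_over(observations: List[Observation], slave_healthy: bool) -> bool:
    """Switch only if a strict majority of HA nodes see the master as down
    and the slave node is healthy enough to take over."""
    if not observations or not slave_healthy:
        return False
    down_votes = sum(1 for o in observations if o.master_looks_down())
    return down_votes * 2 > len(observations)
```

With three HA nodes, a single node that loses heartbeats because of network jitter is outvoted by the other two, so no switchover is triggered.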
The Repair module maintains the replication relationship between the master and slave nodes of the DB Engine. It can also repair any errors that may occur on either node.
- Automatic restoration of master/slave replication in case of disconnection
- Automatic repair of table-level damage to the master or slave nodes
- On-site saving and automatic repair if the master or slave nodes crash
The Notification module informs the SLB or Proxy of status changes to the master and slave nodes to guarantee that you can continue to access the correct node.
For example, the Detection module discovers that the master node has an exception and instructs the Repair module to fix it. If the Repair module fails to resolve the problem, it directs the Notification module to initiate traffic switching. The Notification module then forwards the switching request to the SLB or Proxy, which begins to redirect all traffic to the slave node. Simultaneously, the Repair module creates a new slave node on another physical server and synchronizes this change back to the Detection module. The Detection module then incorporates this new information and starts to recheck the health status of the instance.
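The hand-off between the three modules in this example can be traced with a short sketch. It is illustrative only: the function `handle_master_exception` and the log messages are hypothetical, and the steps run sequentially here even though the text describes the creation of the new slave node as happening at the same time as the traffic switch.

```python
from typing import Callable, List


def handle_master_exception(repair_master: Callable[[], bool], log: List[str]) -> str:
    """Return which node serves traffic after the exception is handled.

    repair_master is a stand-in for the Repair module's attempt to fix
    the master node; it returns True if the fix succeeded.
    """
    log.append("detection: master exception found")
    if repair_master():
        log.append("repair: master fixed in place")
        return "master"

    log.append("repair: fix failed, requesting traffic switch")
    log.append("notification: switching request forwarded to SLB/Proxy")
    log.append("slb/proxy: traffic redirected to slave")
    log.append("repair: new slave created on another physical server")
    log.append("detection: topology updated, health checks restarted")
    return "slave"
```

Calling `handle_master_exception(lambda: False, [])` walks the failure path and returns `"slave"`, while a successful repair stops after the second step and leaves traffic on the master.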
Multi-zone refers to the physical area that is formed by combining multiple individual zones within the same region. Multi-zone RDS instances can withstand higher-level disasters than single-zone instances. For example, a single-zone RDS instance can withstand server and rack failures, while a multi-zone RDS instance can survive the failure of an entire data center.
Currently, multi-zone RDS instances incur no extra charge. Users in a region where multi-zone is enabled can purchase multi-zone RDS instances directly or convert single-zone RDS instances into multi-zone RDS instances by using inter-zone migration.
There may be a certain amount of network latency between zones. As a result, when a multi-zone RDS instance uses a semi-synchronous data replication solution, its response time to any individual update may be longer than that of a single-zone instance. In this case, the best way to improve overall throughput is to increase concurrency.
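A rough calculation shows why higher concurrency compensates for a longer response time. The sketch assumes a simple closed-loop model in which each connection waits for its update to finish before sending the next one, and in which the instance itself is not saturated. The function name and all figures are made up for illustration and are not measured RDS values.

```python
def updates_per_second(concurrency: int, response_time_ms: float) -> float:
    """Closed-loop throughput: connections divided by per-update response time."""
    return concurrency * 1000.0 / response_time_ms


# Hypothetical figures: inter-zone replication doubles the response time.
single_zone = updates_per_second(concurrency=16, response_time_ms=2.0)
multi_zone = updates_per_second(concurrency=16, response_time_ms=4.0)
multi_zone_more_conns = updates_per_second(concurrency=32, response_time_ms=4.0)

print(single_zone, multi_zone, multi_zone_more_conns)  # 8000.0 4000.0 8000.0
```

Under these assumptions, doubling the number of concurrent connections restores the original throughput even though each individual update still takes longer.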
The high-availability policies use a combination of service priorities and data replication modes to meet your business needs.
The service priorities are as follows:
- RTO (Recovery Time Objective) priority: The database must restore services as soon as possible within a specified time frame. This is best for users who require their databases to provide uninterrupted online service.
- RPO (Recovery Point Objective) priority: The database must guarantee data reliability, that is, lose as little data as possible. This is best for users whose highest priority is data consistency.
There are three data replication modes:
- Asynchronous replication (Async):
In this mode, the master node does not immediately synchronize data to the slave node. When an application initiates an update request, which may include add, delete, or modify operations, the master node responds to the application immediately after completing the operation but does not necessarily replicate that data to the slave node right away. This means that the operation of the master node is not affected if the slave node is unavailable, but data inconsistencies may occur if the master node becomes unavailable.
- Forced synchronous replication (Sync):
When an application initiates an update (add, delete, or modify) request, the master node completes the operation and then replicates the data to the slave node. After the slave node receives the data, it returns a success message to the master node. The master node waits for this acknowledgment from the slave node before responding to the application. This means that the operation of the master node is affected if the slave node is unavailable, but the data on the master and slave nodes is always consistent.
- Semi-synchronous replication (Semi-Sync):
This functions as a hybrid of the two preceding replication modes. In this mode, when both nodes are functioning normally, data replication is identical to the forced synchronous replication mode. However, when there is an exception, such as the slave node becoming unavailable or a network exception occurring between the two nodes, the master node keeps attempting to replicate data to the slave node and suspends its response to the application for a set period of time. Once this period has timed out, the master node degrades to asynchronous replication. At this point, if the master node becomes unavailable and the application reads or updates its data from the slave node, that data may not be consistent with the data on the master node. When data replication between the two nodes returns to normal, because the slave node or the network connection has recovered, forced synchronous replication is reinstated. The amount of time it takes for the nodes to return to forced synchronous replication depends on how the semi-synchronous replication mode is implemented. For instance, ApsaraDB for MySQL 5.5 differs from ApsaraDB for MySQL 5.6 in this regard.
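The degrade-and-recover behavior of semi-synchronous replication can be modeled as a small state machine. The sketch below is a toy model under stated assumptions, not the ApsaraDB implementation: the class `SemiSyncMaster`, the one-second timeout, and the rule that a single timely acknowledgment restores synchronous mode are all hypothetical. As noted above, the real recovery timing depends on the engine version.

```python
from enum import Enum
from typing import Optional


class Mode(Enum):
    SYNC = "sync"
    ASYNC = "async"


class SemiSyncMaster:
    """Toy model of the master node's replication mode (hypothetical names)."""

    def __init__(self, ack_timeout_s: float = 1.0) -> None:
        self.ack_timeout_s = ack_timeout_s  # assumed value, not from the source
        self.mode = Mode.SYNC

    def commit(self, slave_ack_delay_s: Optional[float]) -> float:
        """Return how long the application waits for one update.

        slave_ack_delay_s is the time the slave takes to acknowledge,
        or None if the slave is unreachable.
        """
        ack_in_time = (
            slave_ack_delay_s is not None
            and slave_ack_delay_s <= self.ack_timeout_s
        )
        if self.mode is Mode.ASYNC:
            # Degraded: respond at once; return to sync once the slave
            # acknowledges in time again (assumed recovery rule).
            if ack_in_time:
                self.mode = Mode.SYNC
            return 0.0
        if ack_in_time:
            return float(slave_ack_delay_s)
        # No acknowledgment within the timeout: degrade to asynchronous mode.
        self.mode = Mode.ASYNC
        return self.ack_timeout_s
```

In this model, the first update after the slave becomes unreachable waits for the full timeout, later updates return immediately, and the master resumes waiting for acknowledgments once the slave responds in time.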
Several combinations of service priorities and data replication modes are available to meet your database and business needs. The characteristics of key combinations are detailed in the following table.
|Cloud data engine|Service priority|Data replication mode|Combination characteristics|
|MySQL 5.7|X|X|Currently this engine does not support policy adjustments.|
|SQL Server 2008 R2|X|X|Currently this engine does not support policy adjustments.|
|SQL Server 2012|X|X|Currently this engine does not support policy adjustments.|
|PostgreSQL|X|X|Currently this engine does not support policy adjustments.|
|PPAS|X|X|Currently this engine does not support policy adjustments.|