MySQL group replication (MGR) is a distributed replication mode built on MySQL's binary log mechanism and implemented using the Paxos protocol. ApsaraDB RDS for MySQL instances running RDS Cluster Edition support MGR.
Traditional replication is asynchronous or semi-synchronous: the primary node writes a transaction to its binary log and then propagates it to secondary nodes, with no guarantee that secondaries have received the data before a failure occurs. MGR takes a fundamentally different approach — before a transaction commits, a majority of nodes must acknowledge receipt of its binary logs. This majority-acknowledgment model gives MGR stronger data guarantees than traditional replication modes.
| Feature | MGR | Semi-synchronous replication | Asynchronous replication |
|---|---|---|---|
| Data reliability | ★★★★★ | ★★★ | ★ |
| Data consistency | Ensured | Not ensured | Not ensured |
| Global transaction consistency | Supported | Not supported | Not supported |
Advantages
High data reliability
MGR uses the majority rule of the Paxos protocol: a transaction commits only after most nodes in the RDS cluster have received its binary logs. This prevents data loss even when a minority of nodes become faulty.
For example, in a 5-node cluster where 3 nodes receive the binary logs and 2 do not:
- If the 2 faulty nodes are among those that received the binary logs, at least 1 node holding the data is still running.
- If the 2 faulty nodes are among those that did not receive the binary logs, all 3 nodes holding the data are still running.
Strong data consistency
In traditional primary-secondary replication, if the primary node fails after writing a transaction to its binary log but before transmitting it to secondary nodes, data inconsistency occurs. MGR prevents this: transactions are transmitted to other nodes before being written to the binary log, so every committed transaction is already on a majority of nodes.
If the primary node fails and restarts, it automatically rejoins the cluster and syncs any missing binary logs to catch up.
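On a self-managed MySQL 8.0 deployment, whether a restarted member starts group replication again on its own (and therefore rejoins the cluster) is controlled by the group_replication_start_on_boot variable. ApsaraDB RDS manages this behavior automatically, so the statement below is only an illustrative sketch for self-managed instances.

```sql
-- Sketch for self-managed MySQL 8.0 (handled automatically on ApsaraDB RDS):
-- start Group Replication when the server boots, so a recovered node rejoins the
-- cluster and catches up on missing binary logs through distributed recovery.
SET PERSIST group_replication_start_on_boot = ON;
```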
Global transaction consistency
Set the `group_replication_consistency` parameter to control the consistency level for read and write operations (see the example after this list):
- Strong read consistency (`group_replication_consistency=BEFORE` on secondary nodes): a query executes only after all transactions that preceded it on the primary node have completed. This guarantees that reads on secondary nodes are never stale.
- Strong write consistency (`group_replication_consistency=AFTER` on the primary node): a write transaction commits only after it has been successfully applied on all nodes.
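As a concrete sketch, on a self-managed MySQL 8.0 instance these consistency levels are selected with ordinary SET statements. On ApsaraDB RDS the parameter is typically configured through the console, so treat the statements below as illustrative only.

```sql
-- On a secondary (read) node: a query waits until all transactions that preceded it
-- on the primary node have been applied, so it never returns stale data.
SET GLOBAL group_replication_consistency = 'BEFORE';

-- On the primary node: a write transaction commits only after its changes have been
-- applied on all other nodes.
SET GLOBAL group_replication_consistency = 'AFTER';
```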
Deployment methods
MGR supports two deployment modes. The table below compares their capabilities and trade-offs to help you choose the right mode for your workload.
| Item | Multiple leader | Single leader |
|---|---|---|
| Write nodes | All nodes | Primary node only |
| Read nodes | All nodes | All non-primary nodes |
| Write throughput | Higher (parallel writes across nodes) | Lower (single write path) |
| Read throughput | Standard | Higher (read-only nodes optimized) |
| Availability when a minority of nodes fail | Degraded — nodes after the failing node in polling order cannot send data | Unaffected — cluster continues as long as a majority of nodes are available |
| Primary/secondary switchover on primary failure | N/A | Automatic, based on Paxos protocol |
| Data consistency guarantee | Strong | Strong |
Multiple leader
All nodes process both read and write requests. The multiple leader mode increases write throughput because multiple nodes can send write transactions in parallel, while Paxos-based ordering and row-level conflict detection ensure that all nodes receive and apply data in the same order.
Limitation: When a node experiences jitter or a fault, nodes listed after it in the polling order cannot send data, which makes the cluster temporarily unavailable. This is an inherent limitation of the multiple leader mode that cannot be eliminated.
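For reference, on a self-managed MySQL 8.0 instance the multiple leader (multi-primary) mode is typically enabled before group replication is started, together with the stricter update checks it requires. ApsaraDB RDS manages these parameters internally, so the snippet below is a sketch rather than a procedure for RDS clusters.

```sql
-- Sketch for self-managed MySQL 8.0: enable multi-primary mode and enforce
-- update-everywhere checks. Apply these before START GROUP_REPLICATION is issued.
SET PERSIST group_replication_single_primary_mode = OFF;
SET PERSIST group_replication_enforce_update_everywhere_checks = ON;
```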
Single leader
Only one node processes write requests; all other nodes process read requests. The single leader mode uses the single-leader replication strategy of the Paxos protocol to maximize read throughput and maintain high availability:
- If a secondary node fails and a majority of nodes remain available, the cluster continues operating without interruption.
- If the primary node fails, the cluster automatically performs a primary/secondary switchover based on the Paxos protocol to maintain strong data consistency.
ApsaraDB RDS for MySQL supports creating RDS clusters that use MGR in single leader mode. In this mode, read-only nodes are optimized to improve cluster performance while ensuring high data reliability and strong data consistency.
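To check which mode a cluster runs in and which member currently acts as the primary, the standard group replication Performance Schema views can be queried. The statements below assume a MySQL 8.0 instance on which these views are exposed.

```sql
-- 1 (ON) indicates single leader (single-primary) mode.
SELECT @@group_replication_single_primary_mode;

-- List the members and their roles; exactly one member reports PRIMARY in single leader mode.
SELECT MEMBER_HOST, MEMBER_ROLE, MEMBER_STATE
FROM performance_schema.replication_group_members;
```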
Architecture
MGR adds three layers beneath MySQL's server layer and replication layer:
| Layer | Role |
|---|---|
| Group replication logic layer | Interacts with MySQL's server layer to send and receive transactions to and from the group communication system (GCS) layer, and applies transactions |
| GCS layer | Delivers messages, detects faults, and manages cluster membership |
| XCom layer | Ensures data order consistency and majority-node availability, based on the Paxos protocol |
Paxos protocol
The Paxos protocol performs two functions in MGR:
- Ensures that all nodes in the cluster receive the binary logs of a transaction in the same order, which is essential for multiple leader mode.
- Ensures that a transaction commits only after a majority of nodes acknowledge receipt of its binary logs.
The XCom layer relies on the Mencius protocol, a Paxos-based variant that replaces Paxos's lock-based ordering with a polling mechanism. This achieves load balancing across nodes and improves overall efficiency.
How multiple leader mode works
In multiple leader mode, each node belongs to a Paxos group. The XCom layer enforces a consistent global data order using two mechanisms:
- Serial sending within a group: Data within each group is sent serially to ensure ordering consistency within the group.
- Polling across groups: When multiple groups send data, a polling mechanism enforces a consistent global order. For example, data is sent in the order (1,1), (1,2), (1,3), then (2,1), (2,2), (2,3), and so on.
In the notation (m,n), n is the group number and m is the sequence number of the data record sent by that group. For example, (2,1) means Group 1 is sending its second data record.
The noop mechanism handles idle nodes: if a node has no data to send but data from nodes listed after it has already been received by a majority of the cluster, the node broadcasts a noop state to skip itself and unblock the next node in the polling order.
How single leader mode works
Because only one node sends write transactions in single leader mode, only one Paxos group needs to be active. The receiver ignores all other Paxos groups during polling. This eliminates the noop-related availability issue present in multiple leader mode: as long as a majority of nodes are available, the cluster remains operational even if individual nodes fail.
Secondary nodes in single leader mode do not send transactions; they send only cluster management information. Before a secondary node sends any data, it must request a sending slot from the primary node (such a slot request is represented by the notation <3,1>). Because cluster management messages are infrequent, the latency overhead of this slot-request mechanism has no meaningful impact on cluster throughput.
Group replication logic layer
The group replication logic layer handles transaction propagation and conflict detection on both primary and secondary nodes.
Primary node flow:
Before a transaction commits, the group replication logic layer sends its binary logs to the XCom layer, which broadcasts them to all other nodes. After a majority of nodes acknowledge receipt, conflict detection runs:
- Pass: the transaction is written to the binary log file and committed.
- Fail: the transaction is rolled back.
Secondary node flow:
After a majority of nodes acknowledge a transaction, the XCom layer forwards it to the group replication logic layer for conflict detection:
- Pass: the transaction is written to the relay log file and applied by the applier thread.
- Fail: the transaction data is discarded.
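To observe the secondary-node flow described above, the application of relayed transactions can be monitored through the group_replication_applier channel. This is standard MySQL 8.0 Performance Schema output and assumes the views are exposed on your instance.

```sql
-- On a secondary node: check what the applier workers are doing with relayed transactions.
SELECT WORKER_ID, LAST_APPLIED_TRANSACTION, APPLYING_TRANSACTION
FROM performance_schema.replication_applier_status_by_worker
WHERE CHANNEL_NAME = 'group_replication_applier';
```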
Conflict detection
When conflict detection is required:
- Multiple leader mode: Required for every write operation, because multiple nodes can accept writes simultaneously.
- Single leader mode: Required only when a primary/secondary switchover occurs and write transactions arrive on the new primary before its relay logs from the original primary have been fully applied.
How it works:
Each node maintains a hash array of transaction authentication information:
- Key: The hash value of a modified data row (based on its primary key).
- Value: The union of the global transaction identifier (GTID) of the current transaction and `gtid_executed`, the set of all GTIDs committed on the source node before the current transaction was committed.
Before a transaction commits, each node builds a dependency set by looking up the hash values of all rows modified by the transaction in its authentication array. The dependency set represents all transactions that must have completed before the current transaction can safely commit.
The system then compares the transaction's commitment set (transactions committed before this one on the source node) against the dependency set:
- If the commitment set contains the dependency set, the transaction passes conflict detection and is committed on the source node and written to relay logs on other nodes.
- If the commitment set does not contain the dependency set, the transaction fails conflict detection, is rolled back on the source node, and its relay log data is discarded on other nodes.
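The containment check behaves like MySQL's built-in GTID_SUBSET() function, which returns 1 when every GTID in the first set is also present in the second set. The GTID sets below are hypothetical values used only to illustrate the rule.

```sql
-- Hypothetical dependency set: GTIDs of the transactions that last modified the rows
-- touched by the new transaction (looked up in the authentication array).
SET @dependency_set = 'aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-3';

-- Hypothetical commitment set: the gtid_executed snapshot on the source node when the
-- new transaction was committed.
SET @commitment_set = 'aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-5';

-- 1 means the commitment set contains the dependency set, so conflict detection passes.
SELECT GTID_SUBSET(@dependency_set, @commitment_set) AS passes_conflict_detection;
```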
Deletion of stale authentication data:
Once a transaction has been applied on all nodes in the cluster, its rows can be removed from the authentication array. MGR purges stale authentication data every 60 seconds.
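The amount of authentication data currently held for conflict detection, and the GTID set already applied on every member (only transactions in this set are eligible for the periodic purge), are exposed through standard group replication monitoring, as sketched below.

```sql
-- Rows currently retained for conflict detection, and the GTIDs committed on all
-- members, which bound what the periodic purge can discard.
SELECT MEMBER_ID,
       COUNT_TRANSACTIONS_ROWS_VALIDATING,
       TRANSACTIONS_COMMITTED_ALL_MEMBERS
FROM performance_schema.replication_group_member_stats;
```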
AliSQL optimizations for MGR stability
Single leader mode significantly improves MGR stability, but a specific scenario remains problematic: when a secondary node experiences high latency, it cannot apply transactions fast enough, causing authentication information to accumulate. This leads to two issues:
- High memory consumption, with a risk of out-of-memory (OOM) errors.
- High overhead from periodically purging the accumulated authentication data, which degrades cluster performance.
AliSQL addresses this by reducing the volume of authentication information maintained on each node:
- Primary node: The authentication array on the primary node is never used for conflict detection in single leader mode. AliSQL removes it entirely, eliminating its memory and purge overhead.
- Secondary node: Authentication information is needed only when `group_replication_consistency` is set to `EVENTUAL`. The `EVENTUAL` setting allows a newly elected primary to serve external traffic immediately without waiting for relay log playback to complete, which can cause data conflicts and is rarely used in production. When `EVENTUAL` is not used, AliSQL removes the authentication array from secondary nodes as well, substantially reducing memory consumption and improving cluster stability.