Cross-region high availability architecture

The "three data centers across two regions" architecture deploys a PolarDB-X instance across two primary data centers in one region and one secondary data center in a second region. This topology provides cross-region high availability with a recovery point objective (RPO) of zero, and is designed to meet Level 4 through Level 6 disaster recovery requirements in the financial industry.

This topic describes how the architecture works, the mechanisms that underpin it, and the operations you need to manage it.

Supported version

This architecture requires Apsara Stack DBStack V1.2.1 or later.

Disaster recovery levels

Level	Recovery time objective (RTO)	Deployment requirement
Level 4	≤ 30 minutes	Zone-disaster recovery or geo-disaster recovery
Level 5	≤ 15 minutes	Geo-disaster recovery, at least one replica in the remote region
Level 6	≤ 1 minute	Geo-disaster recovery, at least two replicas in the remote region
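
The table can be read as a lookup from a measured RTO and the number of remote replicas to the highest level those two figures satisfy. The following is a minimal sketch of that reading, not an official classification tool; the function name `highest_dr_level` and the tuple layout are assumptions made for illustration.

```python
from typing import Optional

# (level, maximum RTO in seconds, minimum replicas in the remote region).
# Values are copied from the table above. Level 4 also permits zone-disaster
# recovery, so it is modeled here as needing no remote replica.
DR_LEVELS = [
    (6, 60, 2),
    (5, 15 * 60, 1),
    (4, 30 * 60, 0),
]


def highest_dr_level(rto_seconds: float, remote_replicas: int) -> Optional[int]:
    """Return the highest level whose RTO and replica requirements are both met."""
    for level, max_rto, min_remote in DR_LEVELS:
        if rto_seconds <= max_rto and remote_replicas >= min_remote:
            return level
    return None


print(highest_dr_level(45, 2))    # 6
print(highest_dr_level(45, 1))    # 5
print(highest_dr_level(1200, 0))  # 4
print(highest_dr_level(3600, 2))  # None
```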

PolarDB-X replicates data across five replicas using the Paxos majority consensus protocol to achieve RPO = 0 across regions. If both primary data centers fail completely, the remote secondary instance restores service within the RTO constraints above.

How it works

A PolarDB-X instance in this topology runs five replicas: four replicas distributed across two primary data centers in one region, and one replica in a secondary data center in another region. Majority synchronization requires responses from at least three replicas.

Within the primary data centers, the four replicas communicate over low-latency local networks, and majority synchronization typically completes in approximately 1 millisecond. Network latency between the primary and secondary data centers is approximately 30 milliseconds, which is typical for the cross-region links used in industries such as financial services.
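
To see why the remote replica does not slow down normal writes, it helps to work through the majority arithmetic. The sketch below assumes a simplified model in which the leader counts as one acknowledgement and each follower has a fixed round-trip time; the function names and the exact latency figures are illustrative, not part of PolarDB-X.

```python
from typing import Sequence


def majority(voters: int) -> int:
    """Smallest number of replicas that forms a strict majority."""
    return voters // 2 + 1


def commit_latency_ms(follower_rtts_ms: Sequence[float], voters: int) -> float:
    """Time until the leader holds a majority of acknowledgements.

    The leader counts as one acknowledgement, so it only waits for the
    (majority - 1)-th fastest reachable follower.
    """
    needed = majority(voters) - 1
    if needed == 0:
        return 0.0
    if needed > len(follower_rtts_ms):
        raise ValueError("not enough reachable followers for a majority")
    return sorted(follower_rtts_ms)[needed - 1]


print(majority(5))                                          # 3
# Healthy topology: three local followers (~1 ms) and one remote follower (~30 ms).
print(commit_latency_ms([1.0, 1.0, 1.0, 30.0], voters=5))   # 1.0
```

Because two local acknowledgements plus the leader already form the majority of three, the 30 ms remote acknowledgement never sits on the critical path in this model.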

Four mechanisms make this architecture work reliably:

Weighted election mechanism: Keeps leader elections local to avoid unnecessary cross-region latency.
Dynamic replica adjustment: Restores low-latency majority synchronization after a data center failure.
Forced start of a single replica: Lets the secondary data center serve requests when both primary data centers fail.
Remote secondary instance: Provides geo-disaster recovery to meet RTO requirements.

Failure scenarios and responses

Failure scenario	Scope	Response
Leader replica failure	Primary data centers	Leader re-election is triggered. A follower in the same data center is prioritized to minimize traffic rerouting.
Follower replica failure	Primary data centers	No action required.
Follower replica failure	Secondary data center	No action required.
Primary data center failure	One of the two primary data centers	Five replicas are dynamically downgraded to three replicas. Cross-region synchronization may occur.
Secondary data center failure	—	Four replicas remain in the primary data centers. Paxos protocol performance is unaffected.
Both primary data centers fail	Regional failure	One replica remains in the secondary data center. Run `force_single_mode` to start that replica in single-replica mode. Switch business traffic to the remote secondary instance.
Secondary data center fails	Regional failure	No action required.

Key mechanisms

Weighted election mechanism

PolarDB-X applies a weighted election mechanism so that leader re-elections prefer replicas in the same data center, avoiding unnecessary cross-region latency.

Replica election weights:

Data center	Replica role	Election weight
Primary Data Center 1	Leader	9
Primary Data Center 1	Follower	7
Primary Data Center 2	Follower	5
Primary Data Center 2	Follower	3
Secondary Data Center	Follower	1

The mechanism has two parts:

Optimistic weighted election: Each node waits a calculated delay before initiating a leader election. The delay is inversely proportional to the node's weight, so higher-weight nodes initiate elections first.
Mandatory weighted election: When a new leader discovers it does not have the highest weight among all nodes, it enters an abdication phase instead of immediately accepting writes. During this phase, the node sends heartbeat signals (for example, every 1 to 2 seconds). If a higher-weight node responds before the abdication phase ends, leadership transfers to that node.

For example, if the leader in Primary Data Center 1 fails, the follower in the same data center (weight 7) is prioritized over all others, keeping traffic local.
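
The two phases can be sketched in a few lines. This is a simplified model rather than the PolarDB-X implementation: the node names and `BASE_DELAY_MS` are hypothetical, the delay formula is just one way to make the delay inversely proportional to the weight, and "a higher-weight node responds" is reduced to "a higher-weight node is alive".

```python
from typing import Dict, Set

# Election weights from the table above; node names are hypothetical.
WEIGHTS: Dict[str, int] = {
    "dc1-a": 9,    # current leader in Primary Data Center 1
    "dc1-b": 7,
    "dc2-a": 5,
    "dc2-b": 3,
    "remote": 1,
}

BASE_DELAY_MS = 900.0  # hypothetical constant, chosen only for illustration


def election_delay_ms(weight: int) -> float:
    """Optimistic phase: the delay is inversely proportional to the weight."""
    return BASE_DELAY_MS / weight


def first_candidate(alive: Set[str]) -> str:
    """The alive node with the shortest delay starts the election first."""
    return min(alive, key=lambda n: election_delay_ms(WEIGHTS[n]))


def must_abdicate(leader: str, alive: Set[str]) -> bool:
    """Mandatory phase: a new leader yields if a higher-weight node is alive."""
    return any(WEIGHTS[n] > WEIGHTS[leader] for n in alive)


alive = set(WEIGHTS) - {"dc1-a"}        # the leader in Primary Data Center 1 fails
print(first_candidate(alive))           # dc1-b
print(must_abdicate("dc2-a", alive))    # True
print(must_abdicate("dc1-b", alive))    # False
```

The second phase acts as a safety net: even if a lower-weight node such as `dc2-a` wins an election because of timing, it hands leadership back to the weight-7 node instead of keeping traffic in the other data center.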

Dynamic adjustment of replica quantities

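When a primary data center fails, only three of the five replicas remain reachable. Because the majority of five is three, every majority synchronization must then include the secondary data center replica, which adds approximately 30 milliseconds of cross-region latency. Downgrading to three replicas lowers the majority to two, so synchronization can again complete without waiting for the remote replica. Use the following commands to adjust the replica count: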

Transition	Command	Notes
Five replicas to three replicas	`downgrade_follower`	Converts two followers to learners
Three replicas to five replicas	`upgrade_learner`	Converts two learners back to followers; make sure replication logs are current before upgrading
One replica to three replicas	`add_follower`	Adds new replicas as learners; they are automatically promoted to followers once their logs are current
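
The effect of the five-to-three downgrade on write latency can be illustrated with a small model. The sketch assumes that the leader's data center survives and that the two followers converted to learners are the unreachable ones; the `Replica` class, the replica names, and the latency figures are illustrative and do not reflect the syntax of `downgrade_follower`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Replica:
    name: str
    rtt_ms: float           # round-trip time from the leader (0 for the leader itself)
    reachable: bool = True
    voter: bool = True      # False models a learner, which does not count toward the majority


def commit_latency_ms(replicas: List[Replica]) -> float:
    """Latency until a majority of voters (leader included) has acknowledged."""
    voters = [r for r in replicas if r.voter]
    needed = len(voters) // 2 + 1
    acks = sorted(r.rtt_ms for r in voters if r.reachable)
    if len(acks) < needed:
        raise RuntimeError("majority unavailable")
    return acks[needed - 1]


replicas = [
    Replica("dc1-leader", 0.0),
    Replica("dc1-follower", 1.0),
    Replica("dc2-follower-a", 1.0),
    Replica("dc2-follower-b", 1.0),
    Replica("remote-follower", 30.0),
]

# Primary Data Center 2 fails: three of five voters remain reachable.
for r in replicas[2:4]:
    r.reachable = False
before = commit_latency_ms(replicas)   # majority of 5 is 3: must wait for the remote replica

# Model of the downgrade: the two unreachable followers become learners.
for r in replicas[2:4]:
    r.voter = False
after = commit_latency_ms(replicas)    # majority of 3 is 2: leader plus local follower

print(before, after)  # 30.0 1.0
```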

Forced start of a single replica

When both primary data centers fail, the single remaining replica in the secondary data center cannot satisfy the majority consensus requirement on its own. Run `force_single_mode` to force the system into single-replica mode, sidelining all follower replicas so the remaining replica can serve requests.

Once the primary data centers recover, PolarDB-X rebuilds the distributed system incrementally:

Scale from one replica to three: run `add_follower`.
Scale from three replicas to five: run `upgrade_learner`.
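
The need for a forced start, and the shape of the rebuild, follow from the majority rule. The sketch below models only the voter counts at each step. The step labels reuse the command names from this topic, but the tuples are a model and not the command syntax, and `has_quorum` is a hypothetical helper.

```python
def has_quorum(voters: int, reachable_voters: int) -> bool:
    """A Paxos group can commit only if a strict majority of voters is reachable."""
    return reachable_voters >= voters // 2 + 1


# Both primary data centers fail: 5 voters configured, only the remote one reachable.
print(has_quorum(voters=5, reachable_voters=1))   # False

# Recovery path described above, as (step label, voters configured afterwards).
RECOVERY_PATH = [
    ("force_single_mode", 1),
    ("add_follower", 3),
    ("upgrade_learner", 5),
]

for step, voters in RECOVERY_PATH:
    # After each step, all configured voters are assumed reachable and caught up.
    print(step, voters, voters // 2 + 1, has_quorum(voters, voters))
# force_single_mode 1 1 True
# add_follower 3 2 True
# upgrade_learner 5 3 True
```

Shrinking the configured voter set to one is what lets the lone replica form a majority by itself; each later step enlarges the voter set only after the added replicas can contribute to the majority.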

Remote secondary instance

In distributed database systems, replication progress can differ across Paxos groups during a distributed transaction. Without coordination, data from a partially replicated transaction could appear in the secondary instance, causing transaction inconsistency.

PolarDB-X addresses this using Change Data Capture (CDC) log nodes deployed in the remote region. These nodes sort and reorganize distributed transactions to guarantee atomic replication—no transaction is partially committed when data moves from the primary to the secondary instance. This guarantee holds during both routine disaster recovery drills and real failovers.

Two design points govern how replication and failover work:

Primary instance replication: The primary instance uses the cross-region Paxos protocol, requiring responses from at least three replicas. Because four replicas are in the primary data centers, majority synchronization normally completes locally. The remote replica in the secondary data center responds asynchronously, so cross-region latency does not affect primary instance write performance.
Secondary instance replication: The secondary instance is in the remote region. CDC log nodes replicate data in near real time across regions. Replication latency may occur, but atomic transaction replication ensures the secondary instance never reflects a partially committed transaction.
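
The idea of atomic transaction replication can be sketched as a buffer that releases a transaction only after every participating Paxos group has delivered its part. This is a conceptual illustration, not the CDC implementation: the `LogEvent` fields, the `participants` count, and the function name are assumptions, and a production system would also need to hold back complete transactions that commit after a still-incomplete one.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class LogEvent:
    txn_id: str
    commit_ts: int       # global commit timestamp of the transaction
    group: str           # Paxos group that produced this event
    participants: int    # number of groups taking part in the transaction
    change: str


def atomic_batches(events: Iterable[LogEvent]) -> List[Tuple[str, List[str]]]:
    """Group events by transaction and release only complete transactions,
    ordered by commit timestamp."""
    by_txn: Dict[str, List[LogEvent]] = defaultdict(list)
    for ev in events:
        by_txn[ev.txn_id].append(ev)

    complete = []
    for txn_id, evs in by_txn.items():
        groups_seen = {ev.group for ev in evs}
        if len(groups_seen) == evs[0].participants:
            complete.append((evs[0].commit_ts, txn_id, [ev.change for ev in evs]))
    complete.sort()
    return [(txn_id, changes) for _, txn_id, changes in complete]


events = [
    LogEvent("t2", 20, "g1", 2, "debit C 5"),      # t2's part from g2 has not arrived yet
    LogEvent("t1", 10, "g1", 2, "debit A 100"),
    LogEvent("t1", 10, "g2", 2, "credit B 100"),
]
print(atomic_batches(events))
# [('t1', ['debit A 100', 'credit B 100'])]
```

In this model the secondary instance never sees the debit from `t2` without its matching change from the other group, which is the partial-commit case the paragraph above rules out.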

Common operations and maintenance (O&M)

Quick reference: scenario to command

Scenario	Action
One primary data center fails	Run `downgrade_follower` to reduce five replicas to three
Failed primary data center recovers	Run `upgrade_learner` to restore five replicas
Both primary data centers fail	Run `force_single_mode` to start single-replica mode; switch traffic to secondary instance
Primary data centers recover after regional failure	Run `add_follower` to grow from one replica to three, then `upgrade_learner` to restore five

Create an instance with this topology

When creating a PolarDB-X instance, set the Topology parameter to Three Data Centers Across Two Regions.

View instance topology

On the Basic Information page of the instance, find the Topology Information section to see the zone details of all resources.

Perform a failover

Before you begin

Schedule the failover during low-traffic periods to reduce the impact on write performance.
On the Basic Information page, verify the current topology and confirm which data center you want to designate as the primary zone.

Steps

Log on to the PolarDB for Xscale console.
In the top navigation bar, select the region where the target instance is located.
On the Instances page, click the PolarDB-X 2.0 tab.
Find the target instance and click its ID.
On the Basic Information page, click Specify Primary Zone in the upper-right corner of the Topology Information section.
In the Specify Primary Zone dialog box, set the Data Center, Primary Zone, and Switch Mode parameters.
Click OK.

After the failover

Verify that the Topology Information section reflects the new primary zone before resuming normal write traffic.