
PolarDB: Three data centers across two regions

Last Updated: Mar 28, 2026

The three data centers across two regions architecture deploys a PolarDB-X instance across two primary data centers in one region and one secondary data center in a second region. This topology provides cross-region high availability with a recovery point objective (RPO) of zero, meeting Level 4 through Level 6 disaster recovery requirements in the financial industry.

This topic describes how the architecture works, the mechanisms that underpin it, and the operations you need to manage it.

Supported version

This architecture requires Apsara Stack DBStack V1.2.1 or later.

Disaster recovery levels

| Level | Recovery time objective (RTO) | RPO | Deployment requirement |
|---|---|---|---|
| Level 4 | ≤ 30 minutes | 0 | Zone-disaster recovery or geo-disaster recovery |
| Level 5 | ≤ 15 minutes | 0 | Geo-disaster recovery, at least one replica in the remote region |
| Level 6 | ≤ 1 minute | 0 | Geo-disaster recovery, at least two replicas in the remote region |

PolarDB-X uses five replicas based on the Paxos majority consensus protocol to achieve RPO = 0 across regions. When the primary data centers fail completely, the remote secondary instance restores service within the RTO constraints above.

How it works

A PolarDB-X instance in this topology runs five replicas: four replicas distributed across two primary data centers in one region, and one replica in a secondary data center in another region. Majority synchronization requires responses from at least three replicas.

Within the primary data centers, the four replicas communicate over low-latency local networks, so majority synchronization typically completes in approximately 1 millisecond. Network latency between the primary and secondary data centers is approximately 30 milliseconds, which is typical for cross-region connections in industries such as financial services.
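The latency figures above follow from how a majority is counted. The following is a minimal sketch, not PolarDB-X code: it assumes the leader counts as one vote and commits once the fastest remaining followers have acknowledged. The function names and the round-trip values passed in are illustrative assumptions.

```python
def majority(replica_count: int) -> int:
    """Smallest number of replicas that forms a Paxos majority."""
    return replica_count // 2 + 1


def commit_latency_ms(follower_rtts_ms: list[float], replica_count: int) -> float:
    """Time until the leader holds a majority of acknowledgements.

    Assumption: the leader counts as one vote, so it only waits for
    the (majority - 1) fastest follower responses.
    """
    needed = majority(replica_count) - 1
    if needed == 0:
        return 0.0
    if len(follower_rtts_ms) < needed:
        raise ValueError("not enough live followers to reach a majority")
    return sorted(follower_rtts_ms)[needed - 1]


# Healthy topology: three local followers (~1 ms) and one remote follower (~30 ms).
healthy_latency = commit_latency_ms([1.0, 1.0, 1.0, 30.0], replica_count=5)
```

With five replicas the leader needs two follower acknowledgements, and both come from local followers, so the slow remote follower does not sit on the commit path.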

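Four mechanisms make this architecture work reliably: the weighted election mechanism, dynamic adjustment of replica quantities, forced start of a single replica, and the remote secondary instance. Each is described in Key mechanisms.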

Failure scenarios and responses

| Failure scenario | Scope | Response |
|---|---|---|
| Leader replica failure | Primary data centers | Leader re-election is triggered. A follower in the same data center is prioritized to minimize traffic rerouting. |
| Follower replica failure | Primary data centers | No action required. |
| Follower replica failure | Secondary data center | No action required. |
| Primary data center failure | One of the two primary data centers | Five replicas are dynamically downgraded to three replicas. Cross-region synchronization may occur. |
| Secondary data center failure | | Four replicas remain in the primary data centers. Paxos protocol performance is unaffected. |
| Both primary data centers fail | Regional failure | One replica remains in the secondary data center. Run force_single_mode to start that replica in single-replica mode. Switch business traffic to the remote secondary instance. |
| Secondary data center fails | Regional failure | No action required. |
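The data-center-level rows of this table can be checked with simple arithmetic. The sketch below is an illustration only: it takes the replica placement described in this topic (two replicas per primary data center, one in the secondary data center) and reports whether the surviving replicas still form a majority of five. The dictionary keys and function names are hypothetical.

```python
# Replica placement as described in this topic; the key names are illustrative.
PLACEMENT = {"primary_dc_1": 2, "primary_dc_2": 2, "secondary_dc": 1}


def live_replicas(failed_dcs: set[str]) -> int:
    """Count replicas located outside the failed data centers."""
    return sum(count for dc, count in PLACEMENT.items() if dc not in failed_dcs)


def has_majority(failed_dcs: set[str]) -> bool:
    """True if the surviving replicas still form a majority of all replicas."""
    total = sum(PLACEMENT.values())
    return live_replicas(failed_dcs) >= total // 2 + 1


one_primary_down = has_majority({"primary_dc_1"})                     # 3 of 5 remain
secondary_down = has_majority({"secondary_dc"})                       # 4 of 5 remain
both_primaries_down = has_majority({"primary_dc_1", "primary_dc_2"})  # 1 of 5 remains
```

Only the loss of both primary data centers leaves the instance without a majority, which is the case where force_single_mode is required.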

Key mechanisms

Weighted election mechanism

PolarDB-X applies a weighted election mechanism so that leader re-elections prefer replicas in the same data center, avoiding unnecessary cross-region latency.

Replica election weights:

| Data center | Replica role | Election weight |
|---|---|---|
| Primary Data Center 1 | Leader | 9 |
| Primary Data Center 1 | Follower | 7 |
| Primary Data Center 2 | Follower | 5 |
| Primary Data Center 2 | Follower | 3 |
| Secondary Data Center | Follower | 1 |

The mechanism has two parts:

  • Optimistic weighted election: Each node waits a calculated delay before initiating a leader election. The delay is inversely proportional to the node's weight, so higher-weight nodes initiate elections first.

  • Mandatory weighted election: When a new leader discovers it does not have the highest weight among all nodes, it enters an abdication phase instead of immediately accepting writes. During this phase, the node sends heartbeat signals (for example, every 1 to 2 seconds). If a higher-weight node responds before the abdication phase ends, leadership transfers to that node.

For example, if the leader in Primary Data Center 1 fails, the follower in the same data center (weight 7) is prioritized over all others, keeping traffic local.
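The two parts of the mechanism can be sketched as follows. This is a simplified model under stated assumptions, not the PolarDB-X implementation: the node names, the BASE_DELAY_MS constant, and the exact delay formula (base delay divided by weight) are hypothetical. The weights come from the table above.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Node:
    name: str
    weight: int
    alive: bool = True


BASE_DELAY_MS = 900.0  # hypothetical constant; the real value is not documented here


def election_delay_ms(node: Node) -> float:
    """Optimistic weighted election: the wait is inversely proportional to weight."""
    return BASE_DELAY_MS / node.weight


def first_candidate(nodes: list[Node]) -> Node:
    """The live node with the shortest delay starts its election first."""
    return min((n for n in nodes if n.alive), key=election_delay_ms)


def abdication_target(leader: Node, nodes: list[Node]) -> Optional[Node]:
    """Mandatory weighted election: return the live node the leader should
    hand over to, or None if the leader already has the highest weight."""
    best = max((n for n in nodes if n.alive), key=lambda n: n.weight)
    return best if best.weight > leader.weight else None


nodes = [
    Node("dc1-leader", 9),
    Node("dc1-follower", 7),
    Node("dc2-follower-a", 5),
    Node("dc2-follower-b", 3),
    Node("remote-follower", 1),
]

# The leader in Primary Data Center 1 fails.
after_failure = [replace(n, alive=False) if n.name == "dc1-leader" else n for n in nodes]
winner = first_candidate(after_failure)
```

In this model, winner is the weight-7 follower in Primary Data Center 1. If a lower-weight node such as dc2-follower-b won an election anyway, abdication_target would point it back to that follower.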

Dynamic adjustment of replica quantities

When a primary data center fails and only three replicas remain, majority synchronization must include the secondary data center replica, adding approximately 30 milliseconds of cross-region latency. Adjust replica counts using the following commands to manage this tradeoff:

| Transition | Command | Notes |
|---|---|---|
| Five replicas to three replicas | downgrade_follower | Converts two followers to learners |
| Three replicas to five replicas | upgrade_learner | Converts two learners back to followers; make sure replication logs are current before upgrading |
| One replica to three replicas | add_follower | Adds new replicas as learners; they are automatically promoted to followers once their logs are current |
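The following sketch illustrates why the downgrade matters after a primary data center fails. It is a model only. It assumes that learners do not vote, that leadership has moved to the surviving primary data center, and that the two followers converted to learners are the failed ones. The Replica class and the downgrade_follower function here are Python stand-ins for illustration, not the command's real interface.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    local: bool              # True if in the same region as the leader
    alive: bool = True
    role: str = "follower"   # "leader", "follower", or "learner"


def quorum_needs_remote(replicas: list[Replica]) -> bool:
    """True if the live local voters alone cannot form a majority."""
    voters = [r for r in replicas if r.role != "learner"]  # assumption: learners do not vote
    quorum = len(voters) // 2 + 1
    live_local = sum(1 for r in voters if r.alive and r.local)
    return live_local < quorum


def downgrade_follower(replicas: list[Replica]) -> None:
    """Model of the five-to-three transition: failed followers become learners."""
    for r in replicas:
        if r.role == "follower" and not r.alive:
            r.role = "learner"


# Primary Data Center 1 has failed; the leader now runs in Primary Data Center 2.
replicas = [
    Replica("dc1-a", local=True, alive=False),
    Replica("dc1-b", local=True, alive=False),
    Replica("dc2-a", local=True, role="leader"),
    Replica("dc2-b", local=True),
    Replica("remote", local=False),
]

before = quorum_needs_remote(replicas)  # 2 live local voters, majority of 5 is 3
downgrade_follower(replicas)
after = quorum_needs_remote(replicas)   # 2 live local voters, majority of 3 is 2
```

Before the downgrade, every write waits for the remote replica. Afterward, the two surviving local replicas form a majority on their own.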

Forced start of a single replica

When both primary data centers fail, the single remaining replica in the secondary data center cannot satisfy the majority consensus requirement on its own. Run force_single_mode to force the system into single-replica mode, sidelining all follower replicas so the remaining replica can serve requests.

Once the primary data centers recover, PolarDB-X rebuilds the distributed system incrementally:

  1. Add replicas from one to three: run add_follower.

  2. Add replicas from three to five: run upgrade_learner.
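The recovery sequence can be viewed as a small state machine over the replica count. The sketch below only models the transitions named in this topic; it does not show how the commands are invoked. Allowing force_single_mode from any replica count is an assumption made for brevity.

```python
# (command, current replica count) -> resulting replica count
TRANSITIONS = {
    ("downgrade_follower", 5): 3,
    ("upgrade_learner", 3): 5,
    ("add_follower", 1): 3,
}


def apply_command(command: str, replicas: int) -> int:
    """Return the replica count after a command, or raise if it does not apply."""
    if command == "force_single_mode":
        return 1  # assumption: allowed from any state in this model
    try:
        return TRANSITIONS[(command, replicas)]
    except KeyError:
        raise ValueError(f"{command} does not apply with {replicas} replicas") from None


def run(commands: list[str], replicas: int) -> list[int]:
    """Apply commands in order and return the replica count after each step."""
    history = []
    for command in commands:
        replicas = apply_command(command, replicas)
        history.append(replicas)
    return history


# Regional failure followed by full recovery.
recovery = run(["force_single_mode", "add_follower", "upgrade_learner"], replicas=5)
```

The resulting history is 1, then 3, then 5, matching the two rebuild steps above. Running upgrade_learner directly from single-replica mode raises an error in this model, which reflects the required ordering.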

Remote secondary instance

In distributed database systems, replication progress can differ across Paxos groups during a distributed transaction. Without coordination, data from a partially replicated transaction could appear in the secondary instance, causing transaction inconsistency.

PolarDB-X addresses this using Change Data Capture (CDC) log nodes deployed in the remote region. These nodes sort and reorganize distributed transactions to guarantee atomic replication: no transaction is partially committed when data moves from the primary to the secondary instance. This guarantee holds during both routine disaster recovery drills and real failovers.

Two design points govern how replication and failover work:

  • Primary instance replication: The primary instance uses the cross-region Paxos protocol, requiring responses from at least three replicas. Because four replicas are in the primary data centers, majority synchronization normally completes locally. The remote replica in the secondary data center responds asynchronously, so cross-region latency does not affect primary instance write performance.

  • Secondary instance replication: The secondary instance is in the remote region. CDC log nodes replicate data in near real time across regions. Replication latency may occur, but atomic transaction replication ensures the secondary instance never reflects a partially committed transaction.
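The idea of atomic replication can be illustrated with a small buffer that releases a transaction only after every participating Paxos group has delivered its part. This is a conceptual sketch, not the CDC implementation: the class name, the receive signature, and the way participants are announced are hypothetical, and ordering between different transactions is not modeled.

```python
from collections import defaultdict


class AtomicReplicator:
    """Buffers per-group changes and applies a transaction only when complete."""

    def __init__(self) -> None:
        self._pending: dict[str, dict[str, list[str]]] = defaultdict(dict)
        self.applied: list[tuple[str, list[str]]] = []

    def receive(self, txn_id: str, group: str,
                participants: set[str], changes: list[str]) -> bool:
        """Record one group's changes; apply the transaction if all parts arrived."""
        self._pending[txn_id][group] = changes
        if set(self._pending[txn_id]) != participants:
            return False  # still waiting for other groups; nothing is visible yet
        parts = self._pending.pop(txn_id)
        merged = [change for g in sorted(parts) for change in parts[g]]
        self.applied.append((txn_id, merged))
        return True


replicator = AtomicReplicator()
groups = {"group-1", "group-2"}

# A transfer touches two Paxos groups whose logs arrive at different times.
first = replicator.receive("txn-1", "group-1", groups, ["debit A"])
visible_midway = list(replicator.applied)
second = replicator.receive("txn-1", "group-2", groups, ["credit B"])
```

After the first call nothing has been applied, so a reader of the secondary instance never sees the debit without the matching credit.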

Common operations and maintenance (O&M)

Quick reference: scenario to command

| Scenario | Action |
|---|---|
| One primary data center fails | Run downgrade_follower to reduce five replicas to three |
| Failed primary data center recovers | Run upgrade_learner to restore five replicas |
| Both primary data centers fail | Run force_single_mode to start single-replica mode; switch traffic to the secondary instance |
| Primary data centers recover after regional failure | Run add_follower to go from one replica to three, then upgrade_learner to go from three to five |

Create an instance with this topology

When creating a PolarDB-X instance, set the Topology parameter to Three Data Centers Across Two Regions.

View instance topology

On the Basic Information page of the instance, find the Topology Information section to see the zone details of all resources.

Perform a failover

Before you begin

  • Schedule the failover during low-traffic periods to reduce the impact on write performance.

  • On the Basic Information page, verify the current topology and confirm which data center you want to designate as the primary zone.

Steps

  1. Log on to the PolarDB for Xscale console.

  2. In the top navigation bar, select the region where the target instance is located.

  3. On the Instances page, click the PolarDB-X 2.0 tab.

  4. Find the target instance and click its ID.

  5. On the Basic Information page, click Specify Primary Zone in the upper-right corner of the Topology Information section.

  6. In the Specify Primary Zone dialog box, set the Data Center, Primary Zone, and Switch Mode parameters.

  7. Click OK.

After the failover

Verify that the Topology Information section reflects the new primary zone before resuming normal write traffic.