PolarDB: Solutions and customer use cases

Last Updated: Mar 30, 2026

Active geo-redundancy is a multi-region database architecture that keeps all data centers active simultaneously — each handling live traffic, each capable of absorbing traffic from a failed peer within minutes. This document covers the architecture of the active geo-redundancy solution for PolarDB-X and two production deployments: the State Administration of Taxation's individual income taxation system and China Unicom's customer service platform.

Database solution for active geo-redundancy

Background

Traditional disaster recovery architectures — zone-disaster recovery, active-active zone-disaster recovery, geo-disaster recovery, and three data centers across two zones — share a common limitation: the secondary data center sits idle until a failure occurs. Under China's national disaster recovery standard GB/T20988-2007 (released in 2007 by the Information Office of the State Council of China and applied across eight industries, including banking, electric power, civil aviation, railway, and securities), most of these solutions cannot meet grade 5 or grade 6 requirements.

Active geo-redundancy addresses this by making every data center a production data center. All regions handle live traffic continuously, so failover is a traffic rerouting operation, not a system activation.

Recovery targets with active geo-redundancy:

| Metric | Typical result |
| --- | --- |
| Recovery time objective (RTO) | Failover within minutes |
| Recovery point objective (RPO) | No data loss; two-way real-time synchronization via Data Transmission Service (DTS) |
| Disaster recovery grade | Grade 6 per GB/T20988-2007 |

How it works

The architecture applies a top-down traffic isolation model. Business traffic is split along a defined dimension (such as user ID or region) and routed to different unit clusters. A unit cluster is the set of databases deployed in a single region. One unit cluster acts as the logical center, providing centralized services such as sequence distribution and strong consistency for read operations.

Each unit cluster is deployed in three layers, from traffic access down to data storage, and requests stay isolated within their own unit cluster at every layer.
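As a minimal sketch of the splitting model, the routing below maps each request to a unit cluster by a stable function of the split key (a user ID here). The unit names and the routing function are hypothetical; in practice this dispatch is handled by platform components such as MSHA rather than application code.

```python
from dataclasses import dataclass

@dataclass
class UnitCluster:
    name: str
    region: str

# Hypothetical deployment: two unit clusters, one per region.
UNITS = [
    UnitCluster("unit-a", "cn-hangzhou"),
    UnitCluster("unit-b", "cn-beijing"),
]

def route_by_user_id(user_id: int) -> UnitCluster:
    """Map a request to a unit cluster via a stable modulus on the split key."""
    return UNITS[user_id % len(UNITS)]

# Requests for the same user always land in the same unit cluster,
# so that user's reads and writes stay within a single region.
assert route_by_user_id(42) is route_by_user_id(42)
```

Because the mapping is deterministic, traffic for a given key never crosses unit boundaries, which is what makes per-unit failover and isolation possible.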

Use cases

| Scenario | Why active geo-redundancy fits |
| --- | --- |
| High disaster recovery requirements | Meets grade 6 of GB/T20988-2007; suitable for core business systems and traffic-sensitive workloads |
| Fine-grained traffic management | Supports custom splitting policies by user attribute, location, or any dimension you define |
| Rapid business growth | Add a unit cluster by deploying from a configuration image; no need to redesign the architecture |
| Read-heavy workloads | Asynchronous replication lag is mitigated when reads outnumber writes; migration cost stays low |

Benefits

Business systems double as disaster recovery systems. In traditional geo-disaster recovery and three-data-center architectures, the remote secondary center handles no production traffic, so there is no ongoing evidence that it will actually work when a disaster strikes. With active geo-redundancy, every data center runs production traffic continuously: each one is simultaneously a business system and a disaster recovery system.

Failovers complete within minutes. Because all data centers are live, a failover routes traffic rather than activating a dormant system. The more unit clusters you deploy, the smaller the traffic share each one carries — meaning a single unit cluster's failure affects a proportionally smaller fraction of total traffic, and the remaining clusters absorb it without disruption.
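The load-spreading effect of adding unit clusters can be quantified with a small helper (an illustrative model, not part of the product): after a failure, the failed unit's share is redistributed evenly across the survivors.

```python
def traffic_share_after_failover(n_units: int, failed: int = 1) -> float:
    """Fraction of total traffic each surviving unit cluster carries after
    the failed units' shares are redistributed evenly among survivors."""
    if n_units <= failed:
        raise ValueError("at least one unit cluster must survive")
    return 1.0 / (n_units - failed)

# With 2 units, one failure doubles the survivor's load (0.5 -> 1.0);
# with 5 units, each survivor only goes from 0.20 to 0.25.
```

This is why more unit clusters make failover less disruptive: each survivor absorbs a smaller increment of traffic.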

Scale horizontally on demand. When a single region approaches resource limits, active geo-redundancy prevents a single point of failure (SPOF) at the data layer. Each data center accepts reads and writes, so you can add regions or scale existing unit clusters without a complete architecture overhaul.

Isolate traffic for safe rollouts. The architecture's inherent top-to-bottom isolation lets you route a small traffic slice — for example, 1% — to a single unit cluster. Use that cluster to validate infrastructure upgrades or new technologies without exposing the rest of your traffic to risk.
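A canary slice like the 1% example can be expressed as a splitting rule on the same key used for unit routing. The function below is a hypothetical sketch of such a rule; actual splitting policies are configured in the platform, not hand-coded.

```python
def is_canary(user_id: int, percent: float = 1.0) -> bool:
    """Route roughly `percent`% of users to the canary unit cluster.
    A stable modulus on the split key keeps assignment sticky, so a
    given user is always in (or always out of) the canary slice."""
    return (user_id % 10_000) < percent * 100

# Exactly 1% of a 10,000-key space falls in the default canary slice.
canary_count = sum(is_canary(u) for u in range(10_000))
```

Because membership is deterministic, the canary unit can be used to validate upgrades while the other 99% of traffic is never exposed to it.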

Cost stays below 200% of your primary center. Traditional geo-disaster recovery requires the secondary center to match primary capacity, effectively doubling costs. Active geo-redundancy distributes load across all data centers, so each center is sized for its actual traffic share plus modest failover headroom rather than for the full workload, keeping total implementation cost below 200% of a single center's cost.
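To make the cost claim concrete, here is an illustrative sizing model (my assumption, not a published formula): each of n unit clusters carries 1/n of the traffic and holds enough headroom to absorb an even share of one failed peer's traffic.

```python
def total_capacity(n_units: int) -> float:
    """Total provisioned capacity, as a multiple of the full workload,
    when each unit carries 1/n of traffic plus headroom for an even
    share of one failed peer. Illustrative sizing model only."""
    per_unit = 1.0 / n_units + (1.0 / n_units) / (n_units - 1)
    return n_units * per_unit

# Traditional geo-DR provisions 2.0x (a full idle standby). Under this
# model: 2 units -> 2.0x, 3 units -> 1.5x, 5 units -> 1.25x, so total
# capacity drops toward 1.0x as unit clusters are added.
```

In other words, the redundancy overhead shrinks as 1/(n-1), which is why spreading production traffic across more active centers is cheaper than maintaining one full-capacity standby.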

Case 1: State Administration of Taxation individual income taxation system

Background

The individual income taxation system stores basic information for approximately 780 million natural persons and sensitive data for approximately 360 million active taxpayers. As a large-scale cloud platform serving public-sector operations, it requires grade 6 disaster recovery per the China national standard and must handle explosive user growth without architectural limitations.

Architecture

The system runs on a hybrid transactional and analytical processing (HTAP) and active geo-redundancy architecture, using multiple Alibaba Cloud services in combination:

| Service | Role |
| --- | --- |
| ApsaraDB RDS for MySQL | Transactional data processing |
| PolarDB-X | Transactional data processing |
| AnalyticDB for MySQL | Analytical data processing |
| Data Transmission Service (DTS) | Cross-region real-time data synchronization |
| Data Management (DMS) | Routine operations and data change management |
| Multi-site High Availability (MSHA) | Traffic throttling and failover execution |
| Database Backup | Data backup to a third-party database |

Results

  • Traffic splitting by business module: Online business of the Natural Person Electronic Tax Department splits traffic by file number of the natural person. Offline business splits by region.

  • Grade 6 disaster recovery: The system meets GB/T20988-2007 grade 6 requirements. Failovers complete within seconds with no data loss.

  • Traffic throttling and failover via MSHA: MSHA handles traffic throttling for databases that use the active geo-redundancy architecture and performs failovers.

  • Full resource utilization: Two unit clusters are deployed, each handling half the total traffic — no idle standby capacity.

  • Canary releases supported: Configurable traffic splitting rules let the team route a small slice of key-service traffic for staged rollouts before full deployment.

Case 2: China Unicom new customer service system

Background

China Unicom's new customer service system supports its customer service operations across China. The major workload is transactional. This project marks China Unicom's transition to a high-availability architecture for its core customer-facing systems.

Architecture

The system uses active geo-redundancy across seven business centers, with the following services:

| Service | Role |
| --- | --- |
| ApsaraDB RDS for MySQL | Business data processing |
| PolarDB-X | Business data processing; centralized console for management |
| Data Transmission Service (DTS) | Cross-region real-time data synchronization with per-task status reporting |
| Multi-site High Availability (MSHA) | Traffic throttling and failover execution |

Results

  • Seven business systems connected: Including the inbound call center, outbound call center, and business support center, with traffic distributed by region.

  • Cross-province failover within seconds: Multiple disaster recovery drills confirm failovers complete across provinces with no data loss.

  • Full resource utilization: Two unit clusters are deployed, each handling half the traffic.