Two-way data synchronization keeps two ApsaraDB for MongoDB sharded cluster instances in sync simultaneously, enabling active geo-redundancy (unit-based) and geo-disaster recovery scenarios. This topic describes how to configure a two-way synchronization task using Data Transmission Service (DTS).
Plan your data writes before you start. Assign disjoint primary key ranges to each instance so that records with the same primary key are updated on only one node at a time — for example, instance A owns records with primary keys 1, 3, and 5, and instance B owns records with primary keys 2, 4, and 6. This prevents primary key conflicts and mutual overwrites.
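The ownership rule in the example above can be sketched as a small routing check. This is only an illustration: the function names are hypothetical, and the odd/even split is an assumption taken from the 1, 3, 5 versus 2, 4, 6 example, not a rule that DTS enforces.

```python
def owner_for_key(key: int) -> str:
    """Return the instance that is allowed to write the record with this key."""
    # Assumption: odd keys belong to instance A and even keys to instance B,
    # mirroring the 1/3/5 and 2/4/6 example in the text.
    return "A" if key % 2 == 1 else "B"


def can_write(instance: str, key: int) -> bool:
    """True only when `instance` owns `key`, so the other node never updates it."""
    return owner_for_key(key) == instance
```

An application could call `can_write` before each write and redirect the request to the owning instance when it returns `False`.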
How it works
A two-way synchronization instance runs two tasks simultaneously:
- Forward task: synchronizes data from instance A to instance B.
- Reverse task: synchronizes data from instance B to instance A.
When both tasks share objects, only one task performs both full data synchronization and incremental data synchronization. The other task performs incremental data synchronization only. Synchronized data is not re-synchronized in the opposite direction.
DDL operations are synchronized in the forward direction only.
Prerequisites
Before you begin, make sure that:
- Both the source and destination ApsaraDB for MongoDB sharded cluster instances are created. See Create a sharded cluster instance.
- Each shard in the source instance has an endpoint assigned, and all shards share the same account and password. See Apply for an endpoint for a shard or ConfigServer node.

  For supported database versions, see Overview of data synchronization scenarios.
- (Recommended) The destination instance has at least 10% more available storage space than the total data size of the source instance.
- The `replication.oplogGlobalIdEnabled` parameter is set to `true` for the shard and ConfigServer nodes of both instances. See Configure database parameters for an instance.

  If `replication.oplogGlobalIdEnabled` is not `true`, the precheck fails or the error `two-way mongo must have gid` is returned.
- The databases and collections to be sharded are created in both instances, data sharding is configured, the balancer is enabled, and pre-sharding is performed. See Configure sharding to maximize the performance of shards and the FAQ.

  Pre-sharding and enabling the balancer before synchronization ensure that synced data is evenly distributed across shards, preventing data skew.
- The database accounts used for DTS have the required permissions. See the account permissions section.
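The storage recommendation in the list above is simple arithmetic. As a minimal sketch, assuming you already know both sizes in bytes (the function and parameter names are hypothetical):

```python
def has_recommended_headroom(source_data_bytes: int, dest_available_bytes: int) -> bool:
    """Check the recommended 10% storage headroom on the destination."""
    # Recommendation from the prerequisites: available space on the destination
    # should be at least 110% of the source data size.
    # Integer arithmetic avoids floating-point rounding at the boundary.
    return dest_available_bytes * 10 >= source_data_bytes * 11
```

For example, a source holding 100 GB of data calls for at least 110 GB of available space on the destination.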
Account permissions
Create or verify that the following accounts exist before configuring the task.
Source instance
The account must have read permissions on the source, config, admin, and local databases.
Destination instance
The account must have the dbAdminAnyDatabase permission, read and write permissions on the destination database, and read permissions on the local database.
Limitations
Source and destination database requirements
- The server to which the source database is deployed must have sufficient outbound bandwidth. Otherwise, the data synchronization speed is affected.
- Collections must have PRIMARY KEY or UNIQUE constraints, and all fields must be unique. Otherwise, the destination may contain duplicate records.
- The `_id` field must be unique across all records in a collection. Otherwise, data inconsistency may occur.
- If you select collections as the objects to synchronize and need to edit collections in the destination database, such as renaming collections, a single task supports up to 1,000 collections. For more than 1,000 collections, configure multiple tasks or synchronize entire databases instead.
- A single data entry cannot exceed 16 MB. Tasks fail if this limit is exceeded.
- The source cannot be an Azure Cosmos DB for MongoDB cluster or an Amazon DocumentDB elastic cluster.
- The oplog must be enabled and retain at least seven days of data. Alternatively, enable change streams covering the last seven days. If neither condition is met, DTS may fail to obtain data changes, causing synchronization failure or data loss. Issues that arise from this are not covered by the DTS service level agreement (SLA).

  Important
  - Use the oplog to record data changes (recommended). Change streams are supported only for MongoDB 4.0 and later, and two-way synchronization is not supported when using change streams.
  - For non-elastic Amazon DocumentDB clusters, enable change streams and set Migration Method to ChangeStream and Architecture to Sharded Cluster.
- MongoDB sharded cluster instances cannot be scaled during an active synchronization task. Scaling causes the task to fail.
- The source MongoDB sharded cluster cannot have more than 10 Mongos nodes.
- Collections with time to live (TTL) indexes cannot be synchronized. If the source has TTL indexes, data inconsistency may occur after synchronization.
- Make sure neither the source nor the destination has orphaned documents. Orphaned documents can cause data inconsistency or task failure. See Orphaned document and How do I delete orphaned documents?
- Both instances must be ApsaraDB for MongoDB instances with the same architecture. Two-way synchronization is not supported for self-managed MongoDB databases or instances with different architectures.
- During schema synchronization and full data synchronization, do not modify database or collection schemas.
- During full data synchronization only (without incremental synchronization), do not write to the source database.
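When more than 1,000 collections need name edits, the limit above means the objects have to be spread over several tasks. A minimal sketch of that split, assuming the collection names are already available as a list (the helper name is hypothetical and is not part of DTS):

```python
from typing import List

# Limit stated above for a task that edits collections in the destination.
MAX_COLLECTIONS_PER_TASK = 1000


def split_into_tasks(collections: List[str],
                     limit: int = MAX_COLLECTIONS_PER_TASK) -> List[List[str]]:
    """Group collection names into batches of at most `limit` items each."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    # Slice the list in order so every collection lands in exactly one batch.
    return [collections[i:i + limit] for i in range(0, len(collections), limit)]
```

Each returned batch would then be configured as the selected objects of a separate synchronization task.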
Other requirements
- Add shard keys to all data before starting the task. INSERT operations must include shard keys, and UPDATE operations cannot modify shard keys.
- The destination MongoDB version must be the same as or later than the source version. A lower destination version may cause compatibility issues.
- If a destination collection has a unique index or has `capped` set to `true`, the collection supports only single-thread writes and does not support concurrent replay during incremental synchronization. This may increase synchronization latency.
- DTS cannot synchronize data from the `admin` or `local` database.
- Transactions are not retained. During synchronization, each transaction is converted to a single record in the destination.
- Disable the MongoDB balancer on the source instance during full data synchronization. Enable the balancer only after incremental synchronization starts. Otherwise, data inconsistency may occur. See Manage the ApsaraDB for MongoDB balancer.
- If data sharding is already configured for the destination and you do not need schema synchronization, do not select Schema Synchronization under Synchronization Types. Selecting it may cause shard conflicts or data inconsistency.
- We recommend that you synchronize data during off-peak hours. Full data synchronization consumes read and write resources on both instances and increases database load.
- Data is written to the destination database concurrently. As a result, the storage space occupied in the destination database is 5% to 10% larger than the size of the data in the source database.
- During full data synchronization, concurrent INSERT operations create fragmentation in destination collections. After full data synchronization is complete, the storage space for collections of the destination database is larger than that of the source database.
- Do not write data from other sources to the destination during synchronization. For example, running online DDL statements through Data Management (DMS) while other sources write to the destination may cause data loss.
- To query document counts in the destination, use: `db.$table_name.aggregate([{ $count:"myCount"}])`.
- Make sure the destination does not have records with the same primary key (`_id`) as the source. If conflicts exist, delete the conflicting records from the destination without interrupting DTS.
- If a DTS task fails, DTS support attempts to restore it within 8 hours. During restoration, the task may restart and task parameters (not database parameters) may be modified.
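The `_id` overlap requirement above amounts to a set intersection. The sketch below assumes the `_id` values of both sides have already been exported into Python iterables. How they are exported is left out, and the function name is hypothetical.

```python
from typing import Hashable, Iterable, Set


def conflicting_ids(source_ids: Iterable[Hashable],
                    dest_ids: Iterable[Hashable]) -> Set[Hashable]:
    """Return the `_id` values that exist on both the source and the destination."""
    # Any value in the intersection identifies a destination record that
    # would collide with a source record during synchronization.
    return set(source_ids) & set(dest_ids)
```

An empty result means that no destination record shares an `_id` with the source. A non-empty result lists the records to review and delete from the destination.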
Billing
| Synchronization type | Fee |
|---|---|
| Schema synchronization and full data synchronization | Free |
| Incremental data synchronization | Charged. See Billing overview. |
Supported topology
DTS supports two-way synchronization between exactly two ApsaraDB for MongoDB sharded cluster instances. Synchronization among more than two instances is not supported.
Conflict detection
To maintain data consistency, update records with the same primary key, business primary key, or unique key on only one synchronization node at a time. If both nodes update the same record, DTS resolves the conflict based on the policy you configure.
DTS detects the following conflict types:
- INSERT uniqueness conflicts: If the same primary key is inserted into both nodes at nearly the same time, one INSERT fails because the primary key already exists on the other node.
- UPDATE inconsistency:
  - If the record to update does not exist in the destination, DTS converts the UPDATE to an INSERT. This may trigger a uniqueness conflict.
  - The primary or unique key of the inserted record may conflict with existing destination records.
- DELETE on non-existent records: If the record to delete does not exist in the destination, DTS ignores the DELETE operation regardless of the conflict resolution policy.

System time differences and synchronization latency between the source and destination mean DTS cannot guarantee that conflict detection prevents all conflicts. Design your application so that records with the same primary key or unique key are updated on only one node.

In this scenario (two-way synchronization for MongoDB sharded clusters), only the Ignore conflict resolution policy is supported.
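The conflict cases above can be pictured with a toy in-memory model. This is a simplified sketch under stated assumptions, not how DTS is implemented: the destination is a plain dictionary keyed by primary key, the function name and return labels are hypothetical, and an UPDATE that is converted to an INSERT writes only the fields carried by the update.

```python
from typing import Any, Dict, Tuple

Doc = Dict[str, Any]


def apply_with_ignore(dest: Dict[Any, Doc], op: Tuple[str, Any, Doc]) -> str:
    """Replay one operation on `dest`, modeling the Ignore policy."""
    kind, key, doc = op
    if kind == "insert":
        if key in dest:
            return "ignored"       # uniqueness conflict: keep the destination record
        dest[key] = dict(doc)
        return "applied"
    if kind == "update":
        if key not in dest:
            dest[key] = dict(doc)  # missing record: the UPDATE becomes an INSERT
            return "inserted"
        dest[key].update(doc)
        return "applied"
    if kind == "delete":
        if key not in dest:
            return "ignored"       # DELETE on a non-existent record is skipped
        del dest[key]
        return "applied"
    raise ValueError(f"unknown operation: {kind}")
```

With `dest = {1: {"v": "b"}}`, replaying `("insert", 1, {"v": "a"})` leaves the existing record untouched, which is why writes for a given key should come from only one node.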
Synchronization types
| Type | Description |
|---|---|
| Schema synchronization | Synchronizes the schemas of selected objects from the source to the destination. |
| Full data synchronization | Synchronizes all existing data of selected objects. Supports databases and collections. |
| Incremental data synchronization | Synchronizes data changes: CREATE COLLECTION, CREATE INDEX, DROP COLLECTION, DROP INDEX, RENAME COLLECTION, and insert, update, and delete operations on documents. |
Forward and reverse task settings at a glance
Before you start, review the key differences between the forward and reverse tasks:
| Setting | Forward task (A to B) | Reverse task (B to A) |
|---|---|---|
| Synchronization types | Schema synchronization + full data synchronization + incremental data synchronization | Incremental data synchronization only (full sync is not required if data was already synced in the forward direction) |
| Source and destination | Source: instance A; destination: instance B | Source: instance B; destination: instance A (swap from forward) |
| Object name mapping | Allowed | Avoid — may cause data inconsistency |
| DDL operations | Configurable | Ignored (DDL is forward-only) |
| Objects selected | Your chosen objects | Cannot overlap with forward task objects |
| Instance Region | Configurable | Cannot be modified |
Configure two-way data synchronization
In this procedure, you configure the DTS task before purchasing a DTS instance, so you do not need to specify the number of shards in the source instance upfront. If you purchase the DTS instance first, you must specify the number of shards at purchase time.
Step 1: Create and configure the forward task
1. Go to the Data Synchronization page of the DTS console.

   Alternatively, log on to the DMS console. In the top navigation bar, move the pointer over Data Development and choose DTS (DTS) > Data Synchronization.
2. In the upper-left corner, select the region where the synchronization instance will reside.
3. Click Create Task. In the Create Task wizard, configure the source and destination databases.

   Warning: After you configure the source and destination databases, read the Limits displayed on the page before proceeding. Skipping this step may cause task failure or data inconsistency.

   Source database

   | Parameter | Description |
   |---|---|
   | Select an existing DMS database instance | Optional. If you select an existing DMS instance, DTS populates the parameters automatically. |
   | Database Type | Select MongoDB. |
   | Connection Type | Select Alibaba Cloud Instance. |
   | Instance Region | Select the region of the source ApsaraDB for MongoDB instance. |
   | Replicate Data Across Alibaba Cloud Accounts | Select No for same-account synchronization. |
   | Architecture | Select Sharded Cluster. |
   | Instance ID | Select the ID of the source instance. |
   | Authentication Database | The database that stores the account and password. Default: `admin`. |
   | Database Account | An account with read permissions on the source, `config`, `admin`, and `local` databases. |
   | Database Password | Password for the database account. |
   | Shard Account | Account for accessing shards in the source instance. |
   | Shard Password | Password for the shard account. |

   Destination database

   | Parameter | Description |
   |---|---|
   | Select an existing DMS database instance | Optional. If you select an existing DMS instance, DTS populates the parameters automatically. |
   | Database Type | Select MongoDB. |
   | Connection Type | Select Alibaba Cloud Instance. |
   | Instance Region | Select the region of the destination ApsaraDB for MongoDB instance. |
   | Architecture | Select Sharded Cluster. |
   | Instance ID | Select the ID of the destination instance. |
   | Authentication Database | The database that stores the account and password. Default: `admin`. |
   | Database Account | An account with the `dbAdminAnyDatabase` permission, read and write permissions on the destination database, and read permissions on the `local` database. |
   | Database Password | Password for the database account. |
4. Click Test Connectivity and Proceed. DTS automatically adds its server CIDR blocks to the whitelist of Alibaba Cloud database instances or ECS security groups. For self-managed databases or third-party cloud databases, manually add the DTS CIDR blocks. See Add the CIDR blocks of DTS servers.

   Warning: Adding DTS CIDR blocks to whitelists or security groups introduces security exposure. Take preventive measures: use strong credentials, restrict exposed ports, authenticate API calls, regularly audit whitelist rules, and consider connecting through Express Connect, VPN Gateway, or Smart Access Gateway.
5. Configure objects and synchronization settings.

   | Parameter | Description |
   |---|---|
   | Synchronization Types | Select Schema Synchronization, Full Data Synchronization, and Incremental Data Synchronization. Full data synchronization seeds the destination with historical data before incremental synchronization begins. |
   | Processing Mode of Conflicting Tables | Precheck and Report Errors (default): the precheck fails if the source and destination have collections with identical names. Ignore Errors and Proceed: skips the precheck for identical names. Warning: Selecting Ignore Errors and Proceed may cause data inconsistency. Existing destination records with the same primary or unique key are retained and not overwritten. |
   | Synchronization Topology | Select Two-way Synchronization. |
   | Exclude DDL Operations | Yes: excludes DDL operations. No: synchronizes DDL operations. DDL operations are synchronized in the forward direction only. |
   | Conflict Resolution Policy | Select how to handle conflicts. TaskFailed: the task stops on conflict and you resolve it manually. Ignore: keeps the destination record and skips the conflicting statement. Overwrite: overwrites the destination record. Note: Only Ignore is supported for two-way MongoDB sharded cluster synchronization. |
   | Source Objects | Select databases or collections to synchronize, then click the arrow icon to add them to Selected Objects. |
   | Selected Objects | To rename a single object, right-click it. See Map the name of a single object. To rename multiple objects at once, click Batch Edit. See Map multiple object names at a time. To filter data by condition (full synchronization only), right-click a table and specify conditions. See Specify filter conditions. Note: Renaming a database or collection may cause dependent objects to fail synchronization. |
6. Click Next: Advanced Settings and configure the following parameters.

   Data verification

   See Configure a data verification task.

   Advanced settings

   | Parameter | Description |
   |---|---|
   | Dedicated Cluster for Task Scheduling | By default, DTS uses the shared cluster. For higher stability, purchase a dedicated cluster. See What is a DTS dedicated cluster. |
   | Set Alerts | No: disables alerting. Yes: sends notifications when the task fails or synchronization latency exceeds the threshold. See Configure monitoring and alerting. |
   | Retry Time for Failed Connections | How long DTS retries failed connections after the task starts. Valid values: 10 to 1440 minutes. Default: 720. Set a value greater than 30 minutes. If the reconnection succeeds within this window, the task resumes; otherwise, it fails. Note: If multiple tasks share the same source or destination database, the shortest retry window takes effect. DTS charges for the instance during retries. |
   | Retry Time for Other Issues | How long DTS retries failed DDL or DML operations. Valid values: 1 to 1440 minutes. Default: 10. Set a value greater than 10 minutes. Must be smaller than Retry Time for Failed Connections. |
   | Enable Throttling for Full Data Migration | Limit queries per second (QPS), records per second (RPS), and migration speed during full data synchronization to reduce load on source and destination servers. Displayed only when Full Data Synchronization is selected. |
   | Enable Throttling for Incremental Data Synchronization | Limit RPS and synchronization speed during incremental synchronization to reduce load on the destination server. |
   | Environment Tag | Tag the DTS instance for environment identification. Optional. |
   | Configure ETL | Yes: enables the extract, transform, and load (ETL) feature. Enter processing statements in the code editor. See Configure ETL. No: disables ETL. |
7. Click Next: Save Task Settings and Precheck.

   - To preview the API parameters for this configuration, hover over Next: Save Task Settings and Precheck and click Preview OpenAPI parameters before proceeding.
   - DTS runs a precheck before the task starts. The task can only start after passing the precheck.
   - If the precheck fails, click View Details next to each failed item, troubleshoot the issue, and then rerun the precheck.
   - If an alert appears during the precheck: for critical alerts, fix the issue and rerun the precheck. For ignorable alerts, click Confirm Alert Details, click Ignore in the dialog box, click OK, and then click Precheck Again.
8. Wait until Success Rate reaches 100%, then click Next: Purchase Instance.
9. On the purchase page, configure the billing method and instance class.

   | Parameter | Description |
   |---|---|
   | Billing Method | Subscription: pay upfront for a fixed term, which is more cost-effective for long-term use. Pay-as-you-go: billed hourly, suitable for short-term or temporary use. |
   | Resource Group Settings | The resource group for the synchronization instance. Default: default resource group. See What is Resource Management? |
   | Instance Class | DTS offers multiple instance classes with different synchronization speeds. See Instance classes of data synchronization instances. |
   | Subscription Duration | Available only for the subscription billing method. Options: 1 to 9 months, or 1, 2, 3, or 5 years. |
10. Read and select Data Transmission Service (Pay-as-you-go) Service Terms.
11. Click Buy and Start. The forward synchronization task starts. Monitor its progress in the task list.
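The value ranges given for the two retry parameters in the advanced settings can be checked before you enter them in the console. A minimal sketch, assuming both values are expressed in minutes (the function name and messages are hypothetical and are not part of any DTS API):

```python
from typing import List


def validate_retry_settings(connection_retry_min: int, other_retry_min: int) -> List[str]:
    """Return the problems found with the two retry settings (empty if none)."""
    problems = []
    # Retry Time for Failed Connections: valid values are 10 to 1440 minutes.
    if not 10 <= connection_retry_min <= 1440:
        problems.append("connection retry time must be 10 to 1440 minutes")
    # Retry Time for Other Issues: valid values are 1 to 1440 minutes.
    if not 1 <= other_retry_min <= 1440:
        problems.append("other-issue retry time must be 1 to 1440 minutes")
    # The second value must be smaller than the first.
    if other_retry_min >= connection_retry_min:
        problems.append("other-issue retry time must be smaller than connection retry time")
    return problems
```

The defaults of 720 and 10 minutes pass all three checks.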
Step 2: Configure the reverse task
1. Wait until the forward synchronization task enters the Running state. Find the reverse synchronization task in the task list and click Configure Task.
2. Configure the reverse task by repeating steps 3 through 7 of Step 1 with the following differences:

   Important
   - Swap the source and destination instances: the destination instance of the forward task becomes the source of the reverse task, and vice versa.
   - The Instance Region parameter cannot be modified.
   - Do not use the object name mapping feature. Using it for the reverse task may cause data inconsistency.
   - DTS ignores collections already synchronized to the destination in the forward task during the reverse precheck.
   - Do not select objects that are already selected in the forward task.
   - The reverse task ignores DDL operations.
3. Wait until the precheck success rate reaches 100%, then click Back.
4. Wait until both the forward and reverse tasks enter the Running state. Two-way data synchronization is now active.
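The differences between the forward and reverse tasks can be written down as a small checklist. The sketch below models a task as a plain data class. The class, its fields, and the messages are hypothetical and do not correspond to any DTS API.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class TaskConfig:
    source: str                      # instance used as the source
    destination: str                 # instance used as the destination
    objects: Set[str]                # selected databases or collections
    uses_name_mapping: bool = False  # whether object name mapping is configured


def check_reverse_task(forward: TaskConfig, reverse: TaskConfig) -> List[str]:
    """Return the reverse-task rules from this topic that `reverse` violates."""
    problems = []
    # The reverse task must swap the source and destination of the forward task.
    if (reverse.source, reverse.destination) != (forward.destination, forward.source):
        problems.append("source and destination must be swapped")
    # Object name mapping may cause data inconsistency in the reverse task.
    if reverse.uses_name_mapping:
        problems.append("object name mapping should not be used")
    # Objects already selected in the forward task must not be selected again.
    if forward.objects & reverse.objects:
        problems.append("objects overlap with the forward task")
    return problems
```

An empty list means that the reverse configuration follows the three rules checked here. Rules that concern the console itself, such as the fixed Instance Region, are outside this sketch.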
What to do next
- After full data synchronization completes and incremental synchronization begins, re-enable the MongoDB balancer on the source instance.