Synchronize data from a MongoDB replica set instance to a MongoDB replica set or sharded cluster instance

Use Data Transmission Service (DTS) to continuously sync data from a MongoDB replica set to another replica set or sharded cluster—without taking your source database offline.

Supported source and destination databases

Source	Destination	Notes
ApsaraDB for MongoDB replica set instance	ApsaraDB for MongoDB replica set or sharded cluster instance	This topic uses this combination as the example.
Self-managed MongoDB replica set on an Elastic Compute Service (ECS) instance	Self-managed MongoDB replica set or sharded cluster on an ECS instance	Follow the same procedure.
Self-managed MongoDB replica set connected over Express Connect, VPN Gateway, or Smart Access Gateway (SAG)	Self-managed MongoDB replica set or sharded cluster connected over Express Connect, VPN Gateway, or SAG	Follow the same procedure.

For supported MongoDB version combinations, see Overview of data synchronization scenarios. The destination version must be the same as or later than the source version.

Prerequisites

Before you begin, ensure that you have:

Created both the source ApsaraDB for MongoDB replica set instance and the destination replica set or sharded cluster instance. See Create a replica set instance and Create a sharded cluster instance.
Verified that the destination instance has at least 10% more available storage than the total data size of the source instance.
(For sharded cluster destinations) Created the databases and collections to be sharded, configured sharding, enabled the balancer, and performed pre-sharding. See Configure sharding to maximize the performance of shards.

Pre-sharding distributes synchronized data across shards and prevents data skew. If adding shard keys to the source data is not possible, see Synchronize MongoDB (without shard keys) to MongoDB (sharded cluster architecture).
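
As a rough illustration of the sharding prerequisite, the following Python sketch builds the two admin commands that enable sharding for a database and shard an empty collection on a hashed key. The namespace, shard key field, and chunk count are hypothetical values, and `numInitialChunks` is assumed to be accepted by your MongoDB version (it applies only to hashed shard keys on empty collections). The procedure in the linked sharding topic takes precedence.

```python
from typing import Any, Dict, List


def build_presharding_commands(
    namespace: str, shard_key_field: str, initial_chunks: int
) -> List[Dict[str, Any]]:
    """Return admin commands that shard an empty collection on a hashed key.

    namespace has the form "<database>.<collection>". All values passed in
    are illustrative; choose a shard key that suits your own data.
    """
    database, _, collection = namespace.partition(".")
    if not database or not collection:
        raise ValueError("namespace must look like 'database.collection'")
    if initial_chunks < 1:
        raise ValueError("initial_chunks must be a positive integer")
    return [
        # Enable sharding for the database that holds the collection.
        {"enableSharding": database},
        # Shard the collection on a hashed key and ask for chunks up front,
        # so that synchronized data is spread across shards from the start.
        {
            "shardCollection": namespace,
            "key": {shard_key_field: "hashed"},
            "numInitialChunks": initial_chunks,
        },
    ]


if __name__ == "__main__":
    for command in build_presharding_commands("appdb.orders", "customer_id", 8):
        print(command)
```

With PyMongo, each dictionary could be passed to `client.admin.command(...)` on a connection to a mongos node. The command name must stay the first key, which the dictionaries above preserve.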

Billing

Synchronization type	Fee
Schema synchronization and full data synchronization	Free
Incremental data synchronization	Charged. See Billing overview.

Supported synchronization topologies

One-way one-to-one synchronization
One-way one-to-many synchronization
One-way many-to-one synchronization
One-way cascade synchronization

For details, see Synchronization topologies.

Synchronization types

Type	What DTS synchronizes
Schema synchronization	Schemas of the selected objects from source to destination
Full data synchronization	Historical data of the selected objects. Supported objects: databases and collections.
Incremental data synchronization	Ongoing changes after the full sync completes. Databases created after the task starts are not included. Transactions are converted into a single record, so transaction context is not preserved. The supported operations depend on how DTS captures changes. With the oplog: CREATE COLLECTION, CREATE INDEX, DROP DATABASE, DROP COLLECTION, DROP INDEX, RENAME COLLECTION, and document-level insert, update, and delete. With change streams: DROP DATABASE, DROP COLLECTION, RENAME COLLECTION, and document-level insert, update, and delete. In both cases, only document updates that use the `$set` operator are supported.
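
Because only `$set` updates are supported for documents, it can help to audit the update documents your application sends before you start the task. The helper below is a minimal client-side sketch, not a description of how DTS itself classifies operations. The function name is hypothetical.

```python
from collections.abc import Mapping
from typing import Any


def is_set_only_update(update: Any) -> bool:
    """Return True when an update document uses the $set operator and nothing else.

    Replacement documents (no operators), aggregation-pipeline updates
    (lists), and updates that mix $set with other operators return False.
    """
    if not isinstance(update, Mapping):
        return False
    return set(update.keys()) == {"$set"}


if __name__ == "__main__":
    print(is_set_only_update({"$set": {"status": "paid"}}))
    print(is_set_only_update({"$inc": {"retries": 1}}))
```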

Limitations

Review these constraints before configuring the task. Violating them can cause task failure or data inconsistency.

Source database requirements

Constraint	Detail
Outbound bandwidth	The source server must have enough outbound bandwidth. Insufficient bandwidth slows synchronization.
Primary key or unique key	Collections must have PRIMARY KEY or UNIQUE constraints with no duplicate field values. Otherwise the destination may contain duplicate records.
Collection count (when editing destination collections)	Up to 1,000 collections per task when you select collections as the sync objects and plan to rename them on the destination. For more collections, run multiple tasks or sync at the database level.
Single document size	Cannot exceed 16 MB. Larger documents cause task failure.
Unsupported sources	Azure Cosmos DB for MongoDB clusters and Amazon DocumentDB elastic clusters are not supported.
oplog or change streams	The oplog must be enabled and retain at least 7 days of logs. Alternatively, enable change streams covering the last 7 days. If neither condition is met, DTS may fail to capture changes, which can cause data loss—not covered by the DTS SLA. Use the oplog where possible; change streams require MongoDB 4.0 or later and do not support two-way synchronization. For non-elastic Amazon DocumentDB clusters, enable change streams and set Migration Method to ChangeStream and Architecture to Sharded Cluster.
TTL indexes	Collections with TTL indexes cannot be synchronized. Attempting to do so may cause data inconsistency.
Schema changes during sync	Do not change database or collection schemas (including array type updates) while schema synchronization or full data synchronization is running.
Writes during full-only sync	Do not write to the source database during full data synchronization if you are not running incremental synchronization.
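
The 7-day oplog requirement can be checked by comparing the timestamps of the oldest and newest oplog entries. The sketch below assumes you have already read those two timestamps yourself, for example from the first and last documents of `local.oplog.rs` or from the output of `rs.printReplicationInfo()`. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Retention required by the constraint in the table above.
REQUIRED_RETENTION = timedelta(days=7)


def oplog_window(oldest_entry: datetime, newest_entry: datetime) -> timedelta:
    """Return the time span currently covered by the oplog."""
    if newest_entry < oldest_entry:
        raise ValueError("newest_entry must not be earlier than oldest_entry")
    return newest_entry - oldest_entry


def meets_retention(
    oldest_entry: datetime,
    newest_entry: datetime,
    required: timedelta = REQUIRED_RETENTION,
) -> bool:
    """Return True when the oplog covers at least the required retention."""
    return oplog_window(oldest_entry, newest_entry) >= required


if __name__ == "__main__":
    newest = datetime.now(timezone.utc)
    oldest = newest - timedelta(days=3)  # illustrative value only
    print("window:", oplog_window(oldest, newest))
    print("meets 7-day requirement:", meets_retention(oldest, newest))
```

If the window is shorter than 7 days, increase the oplog size on the source before you configure the task.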

Destination database requirements

Constraint	Detail
Sharded cluster: orphaned documents	Clear all orphaned documents before starting. Orphaned documents degrade performance and can cause `_id` conflicts or task failure.
Sharded cluster: shard keys	Add shard keys to the source data before starting. During sync, INSERT operations must include shard keys; UPDATE operations cannot modify shard keys.
Replica set: connection endpoint	If connecting over Express Connect, VPN Gateway, SAG, a public IP address, or Cloud Enterprise Network (CEN), set Domain Name or IP and Port Number to the primary node's IP and port, or use a high-availability endpoint. See Create a DTS task with a high-availability MongoDB database. If connecting over a self-managed ECS instance, set Port Number to the primary node's port.
Unique index or capped collections	Collections with a unique index or `capped: true` support only single-thread writes. This disables concurrent replay during incremental sync and may increase latency.
Excluded databases	DTS does not synchronize the `admin` or `local` database.
Primary key conflicts	Make sure the destination has no documents with the same `_id` as the source. If conflicts exist, delete the conflicting documents from the destination before starting the task.
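
One way to look for `_id` conflicts before starting the task is to compare the `_id` values of the source and destination collections. The sketch below assumes the values are hashable (ObjectId, string, or integer values, for example) and that both sets fit in memory, which will not hold for very large collections. The function name is hypothetical.

```python
from typing import Hashable, Iterable, Set


def conflicting_ids(
    source_ids: Iterable[Hashable], destination_ids: Iterable[Hashable]
) -> Set[Hashable]:
    """Return the _id values that exist in both source and destination."""
    return set(source_ids) & set(destination_ids)


if __name__ == "__main__":
    # Illustrative values; in practice, read the _id field from each side.
    print(conflicting_ids(["a1", "a2", "a3"], ["a3", "b7"]))
```

Any value returned identifies a destination document to delete before the task starts.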

General constraints

Constraint	Detail
Performance impact	Before you synchronize data, evaluate the impact on the performance of the source and destination databases. We recommend that you synchronize data during off-peak hours. During full data synchronization, DTS uses read and write resources of the source and destination databases, which may increase the loads on the database servers.
Storage expansion	Concurrent writes during full sync cause collection fragmentation. Destination storage ends up 5%–10% larger than the source.
Writes from other sources	Do not write to the destination from other sources (for example, using Data Management (DMS) for online DDL operations) during synchronization. Concurrent external writes can cause data loss.
Count query syntax	Use `db.$table_name.aggregate([{ $count:"myCount"}])` to count documents on the destination.
Task recovery	If a DTS task fails, DTS support attempts restoration within 8 hours. The task may be restarted and task parameters may be modified during recovery. Database parameters are not modified.
Retry time for failed connections	Valid range: 10–1,440 minutes. Default: 720 minutes. Set to a value greater than 30 minutes. If the connection is restored within this period, DTS resumes the task. If multiple tasks share the same source or destination, the shortest retry time applies. DTS instance charges continue during retries.
Retry time for other issues	Valid range: 1–1,440 minutes. Default: 10 minutes. Set to a value greater than 10 minutes. Must be less than Retry Time for Failed Connections.
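
The count query from the table can be wrapped in a small helper. This sketch assumes a PyMongo-style collection object whose `aggregate` method accepts a pipeline and yields result documents. The helper name is hypothetical.

```python
from typing import Any


def count_documents_with_aggregate(collection: Any, field: str = "myCount") -> int:
    """Count documents by running the $count aggregation stage.

    `collection` is assumed to expose aggregate(pipeline) and to yield
    dictionaries, as a PyMongo collection does.
    """
    pipeline = [{"$count": field}]
    for result in collection.aggregate(pipeline):
        return int(result[field])
    # The $count stage emits no document when the collection is empty.
    return 0
```

Running the same helper against the source and the destination gives two numbers that can be compared after the full data synchronization completes.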

Self-managed MongoDB constraints

If the source database is a self-managed MongoDB instance, note the following:

Constraint	Detail
Primary/secondary switchover	If a primary/secondary switchover occurs on the source while the task is running, the task fails.
Synchronization latency accuracy	DTS calculates latency based on the timestamp of the latest synced data in the destination and the current timestamp in the source. If no updates occur on the source for an extended period, the reported latency may be inaccurate. To reset the latency reading, perform an update on the source. If you select an entire database as the sync object, you can create a heartbeat table that is updated every second.
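
A heartbeat writer for the latency case above could look like the following sketch. It assumes a PyMongo-style collection with an `update_one` method. The collection, the `_id` value `dts_heartbeat`, and the `updated_at` field are hypothetical names. The update uses only `$set`, in line with the supported update operations.

```python
import time
from datetime import datetime, timezone
from typing import Any, Callable, Optional


def write_heartbeat(collection: Any, now: Optional[datetime] = None) -> datetime:
    """Upsert a single heartbeat document with the current UTC time."""
    stamp = now or datetime.now(timezone.utc)
    collection.update_one(
        {"_id": "dts_heartbeat"},          # hypothetical document key
        {"$set": {"updated_at": stamp}},   # $set-only update
        upsert=True,
    )
    return stamp


def run_heartbeat(
    collection: Any,
    beats: int,
    interval_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Write `beats` heartbeats, pausing `interval_seconds` between writes."""
    for index in range(beats):
        write_heartbeat(collection)
        if index < beats - 1:
            sleep(interval_seconds)
```

The heartbeat collection must live in a database that is selected as a synchronization object. Otherwise its updates do not reach the destination and the latency reading does not change.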

Configure the data synchronization task

Step 1: Go to the Data Synchronization Tasks page

Log on to the Data Management (DMS) console.
In the top navigation bar, click Data + AI.
In the left-side navigation pane, choose DTS (DTS) > Data Synchronization.

Steps may vary based on the DMS console mode. See Simple mode and Customize the layout and style of the DMS console. You can also go directly to the Data Synchronization Tasks page in the new DTS console.

Step 2: Select the region

On the right side of Data Synchronization Tasks, select the region where the synchronization instance resides.

In the new DTS console, select the region from the top navigation bar.

Step 3: Configure source and destination databases

Click Create Task. On the Create Task page, configure the parameters described in the following tables.

Warning

After configuring the source and destination databases, read the Limits shown on the page before proceeding. Skipping this step may cause task failure or data inconsistency.

Source database

Parameter	Description
Task Name	A name for the DTS task. DTS generates a default name. Specify a descriptive name to help identify the task. The name does not need to be unique.
Select a DMS database instance	Select an existing database instance to auto-populate its parameters, or leave blank and configure the following parameters manually.
Database Type	Select MongoDB.
Access Method	Select Alibaba Cloud Instance.
Instance Region	The region where the source instance resides.
Replicate Data Across Alibaba Cloud Accounts	Select No for same-account synchronization.
Architecture	Select Replica Set.
Instance ID	The ID of the source ApsaraDB for MongoDB instance.
Authentication Database	The database that stores the account credentials. Default: `admin`.
Database Account	An account with read access to the source database, the `config` database, the `admin` database, and the `local` database.
Database Password	The password for the database account.
Encryption	Select Non-encrypted or SSL-encrypted. This parameter applies only to Replica Set instances. If you select SSL-encrypted for a self-managed replica set, you can upload a CA certificate to verify the connection.

Destination database

Parameter	Description
Select a DMS database instance	Select an existing database instance to auto-populate its parameters, or leave blank and configure the following parameters manually.
Database Type	Select MongoDB.
Access Method	Select Alibaba Cloud Instance.
Instance Region	The region where the destination instance resides.
Architecture	The architecture of the destination instance (Replica Set or Sharded Cluster).
Instance ID	The ID of the destination ApsaraDB for MongoDB instance.
Authentication Database	The database that stores the account credentials. Default: `admin`.
Database Account	An account with the `dbAdminAnyDatabase` permission, read and write access to the destination database, and read access to the `local` database.
Database Password	The password for the database account.
Encryption	Select Non-encrypted or SSL-encrypted. This parameter applies only to Replica Set instances. If you select SSL-encrypted for a self-managed replica set, you can upload a CA certificate to verify the connection.
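
For a self-managed MongoDB deployment, the account permissions listed in the two tables could be expressed as `createUser` commands like the ones sketched below. The user names, passwords, and database names are placeholders. On ApsaraDB for MongoDB you may prefer to manage accounts through the console, and this topic does not state which method to use.

```python
from typing import Any, Dict


def source_account_command(user: str, password: str, source_db: str) -> Dict[str, Any]:
    """Build a createUser command with read access for the source account."""
    databases = (source_db, "config", "admin", "local")
    return {
        "createUser": user,
        "pwd": password,
        "roles": [{"role": "read", "db": name} for name in databases],
    }


def destination_account_command(
    user: str, password: str, destination_db: str
) -> Dict[str, Any]:
    """Build a createUser command for the destination account."""
    return {
        "createUser": user,
        "pwd": password,
        "roles": [
            {"role": "dbAdminAnyDatabase", "db": "admin"},
            {"role": "readWrite", "db": destination_db},
            {"role": "read", "db": "local"},
        ],
    }


if __name__ == "__main__":
    # Placeholder credentials; never hard-code real passwords.
    print(source_account_command("dts_reader", "<password>", "appdb"))
    print(destination_account_command("dts_writer", "<password>", "appdb"))
```

Both commands are meant to be run against the authentication database (`admin` by default), which matches the Authentication Database parameter.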

Step 4: Test connectivity

Click Test Connectivity and Proceed.

DTS automatically adds its server CIDR blocks to the whitelist of Alibaba Cloud database instances or to the security group rules of ECS-hosted databases. For self-managed databases in on-premises data centers or hosted by third-party providers, manually add the DTS server CIDR blocks to the database whitelist. See Add the CIDR blocks of DTS servers.

Warning

Adding DTS CIDR blocks to your whitelist or security group rules introduces security risks. Before proceeding, take preventive measures: use strong credentials, limit exposed ports, authenticate API calls, review whitelist rules regularly, and remove unauthorized CIDR blocks. For higher security, connect through Express Connect, VPN Gateway, or SAG instead of using public IP access.

Step 5: Select objects and configure settings

Configure the following parameters:

Parameter	Description
Synchronization Types	Select Schema Synchronization, Full Data Synchronization, and Incremental Data Synchronization. Incremental data synchronization is selected by default. Schema and full sync run first; incremental sync starts after full sync completes. For more information, see Synchronization types.
Processing Mode of Conflicting Tables	Precheck and Report Errors: checks for collections with identical names in source and destination before starting. If a conflict exists, the task does not start. To resolve naming conflicts, use the object name mapping feature. See Rename an object to be synchronized. Ignore Errors and Proceed: skips the conflict check. If a record in the destination has the same primary key or unique key value as a source record, the destination record is kept and the source record is skipped. This may cause data inconsistency.
Synchronization Topology	Select One-way Synchronization.
Capitalization of Object Names in Destination Instance	Controls the capitalization of database and collection names in the destination. Default: DTS default policy. See Specify the capitalization of object names.
Source Objects	Select databases or collections to synchronize, then click the arrow icon to move them to Selected Objects.
Selected Objects	To rename a single object, right-click it. To rename multiple objects at once, click Batch Edit. See Map object names. To filter data with WHERE conditions, right-click an object. See Set filter conditions.
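
To anticipate what the Precheck and Report Errors mode would flag, you can compare collection names on both sides yourself, taking any planned renames into account. The following is a minimal sketch of that comparison, not the DTS precheck itself. The function name and the rename map are hypothetical.

```python
from typing import Dict, Iterable, List, Optional


def name_conflicts(
    source_names: Iterable[str],
    destination_names: Iterable[str],
    rename_map: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Return destination names that clash with source names after renaming.

    rename_map maps a source collection name to the name it should have in
    the destination, mirroring the object name mapping feature.
    """
    renames = rename_map or {}
    mapped = {renames.get(name, name) for name in source_names}
    return sorted(mapped & set(destination_names))


if __name__ == "__main__":
    print(name_conflicts(["orders", "users"], ["users", "audit"]))
    print(name_conflicts(["orders", "users"], ["users", "audit"], {"users": "users_v2"}))
```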

Step 6: Configure advanced settings

Click Next: Advanced Settings and configure the following:

Data verification

For data verification setup, see Configure data verification.

Advanced settings

Parameter	Description
Dedicated Cluster for Task Scheduling	By default, DTS uses the shared cluster. For higher stability, purchase a dedicated cluster. See What is a DTS dedicated cluster.
Set Alerts	No: no alerts. Yes: configure alerting. Specify the latency threshold and notification contacts. See Configure monitoring and alerting.
Retry Time for Failed Connections	How long DTS retries after a connection failure. Range: 10–1,440 minutes. Default: 720 minutes. Set to a value greater than 30 minutes. If the connection is restored within this period, DTS resumes the task. If multiple tasks share the same source or destination, the shortest retry time applies. DTS instance charges continue during retries.
Retry Time for Other Issues	How long DTS retries after DDL or DML failures. Range: 1–1,440 minutes. Default: 10 minutes. Set to a value greater than 10 minutes. Must be less than Retry Time for Failed Connections.
Enable Throttling for Full Data Migration	Limits the read/write load on source and destination during full data synchronization. Configure Queries per second (QPS) to the source database, RPS of Full Data Migration, and Data migration speed for full migration (MB/s). Visible only when Full Data Synchronization is selected.
Enable Throttling for Incremental Data Synchronization	Limits the load during incremental sync. Configure RPS of Incremental Data Synchronization and Data synchronization speed for incremental synchronization (MB/s).
Environment Tag	A tag to identify the DTS instance. Optional.
Configure ETL	Yes: enable extract, transform, and load (ETL) and enter data processing statements. See Configure ETL. No: skip ETL. For an ETL overview, see What is ETL?.
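
The rules for the two retry parameters can be summarized in a small validation sketch. The ranges and recommendations come from the table above. The function name and message wording are illustrative.

```python
from typing import List

FAILED_CONNECTIONS_RANGE = (10, 1440)  # valid range in minutes
OTHER_ISSUES_RANGE = (1, 1440)         # valid range in minutes


def check_retry_times(failed_connections: int, other_issues: int) -> List[str]:
    """Return a list of problems; an empty list means both values pass."""
    problems: List[str] = []

    low, high = FAILED_CONNECTIONS_RANGE
    if not low <= failed_connections <= high:
        problems.append(f"Retry Time for Failed Connections must be {low}-{high} minutes")
    elif failed_connections <= 30:
        problems.append("Retry Time for Failed Connections should be greater than 30 minutes")

    low, high = OTHER_ISSUES_RANGE
    if not low <= other_issues <= high:
        problems.append(f"Retry Time for Other Issues must be {low}-{high} minutes")
    elif other_issues <= 10:
        problems.append("Retry Time for Other Issues should be greater than 10 minutes")

    if other_issues >= failed_connections:
        problems.append(
            "Retry Time for Other Issues must be less than Retry Time for Failed Connections"
        )
    return problems


if __name__ == "__main__":
    for message in check_retry_times(720, 10):
        print(message)
```

Note that the default of 10 minutes for Retry Time for Other Issues sits exactly at the recommended lower bound, so the sketch reports it as a value to raise.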

Step 7: Save settings and run the precheck

Click Next: Save Task Settings and Precheck.

To preview the OpenAPI parameters for this task, hover over Next: Save Task Settings and Precheck and click Preview OpenAPI parameters before you click the button itself.

DTS runs a precheck before the task starts. If the precheck fails:

Click View Details next to each failed item, resolve the issues, and click Precheck Again.
For alert items that can be safely ignored, click Confirm Alert Details > Ignore > OK, then click Precheck Again. Ignoring alerts may cause data inconsistency.

Step 8: Wait for the precheck to complete

Wait until Success Rate reaches 100%, then click Next: Purchase Instance.

Step 9: Purchase the synchronization instance

Configure the following parameters:

Parameter	Description
Billing Method	Subscription: pay upfront for a fixed term. More cost-effective for long-term use. Pay-as-you-go: billed hourly. More flexible for short-term use. Release the instance when no longer needed to stop charges.
Resource Group Settings	The resource group for the instance. Default: default resource group. See What is Resource Management?.
Instance Class	The synchronization speed varies by instance class. Choose based on your data volume and latency requirements. See Instance classes of data synchronization instances.
Subscription Duration	Available only for the Subscription billing method. Options: 1–9 months, 1 year, 2 years, 3 years, or 5 years.

Step 10: Start the task

Read and select Data Transmission Service (Pay-as-you-go) Service Terms.
Click Buy and Start.
In the confirmation dialog, click OK.

The task appears in the task list. Monitor progress from there.

FAQ

Why do task latency and data inconsistency occur even when no data is written to the database?

Cause: A conflict between the automatic deletion mechanism of TTL indexes in MongoDB collections and the data synchronization mechanism of DTS can cause latency and data inconsistency in synchronization or migration tasks.

Redundant DELETE operations reduce efficiency: When the TTL index on the source instance deletes expired data, it writes a DELETE record to the oplog, and DTS synchronizes that DELETE operation. If the TTL index on the destination instance has already deleted the same data, the synchronized DELETE finds nothing to delete, and the MongoDB engine returns an unexpected number of affected rows. This triggers an exception handling process and slows the task.

Data inconsistency caused by asynchronous deletion of expired data: A TTL index does not delete data in real time. Expired data might still exist on the source instance when it has already been deleted on the destination instance. This causes data inconsistency.

Example:

The MongoDB oplog and change streams record only the updated fields for an UPDATE operation, not the full document before and after the update. Therefore, if an UPDATE operation cannot find the target data on the destination, DTS ignores the operation.

Timing	Source instance	Destination instance
1	Service inserts data
2		DTS synchronizes the INSERT operation
3	Data has expired but is not yet deleted by the TTL index
4	Service updates the data (for example, updates the TTL index field to change the expiration time)
5		TTL index deletes the data
6		DTS synchronizes the UPDATE, but the data is not found. The operation is ignored.

As a result, this document is missing from the destination MongoDB instance.

Solution: You need to temporarily modify the expiration time of the TTL index in the destination during synchronization or migration to ensure efficiency and consistency. For more information, see Best practices for synchronizing/migrating collections with TTL indexes when MongoDB is the source.
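
As a sketch of the solution, the helpers below first list the TTL indexes of a collection and then build a `collMod` command that changes the expiration time of one of them. The index information is assumed to have the shape returned by PyMongo's `index_information()`. The collection name, field name, and new expiration value are hypothetical, and the linked best practices topic remains the reference for which value to use and when to restore the original.

```python
from typing import Any, Dict, Tuple


def ttl_indexes(index_info: Dict[str, Dict[str, Any]]) -> Dict[str, Tuple[Dict[str, Any], int]]:
    """Return {index name: (key pattern, expireAfterSeconds)} for TTL indexes."""
    found = {}
    for name, info in index_info.items():
        if "expireAfterSeconds" in info:
            # PyMongo reports the key as a list of (field, direction) pairs.
            found[name] = (dict(info["key"]), int(info["expireAfterSeconds"]))
    return found


def extend_ttl_command(
    collection: str, key_pattern: Dict[str, Any], expire_after_seconds: int
) -> Dict[str, Any]:
    """Build a collMod command that sets a new TTL on an existing index."""
    return {
        "collMod": collection,
        "index": {
            "keyPattern": key_pattern,
            "expireAfterSeconds": expire_after_seconds,
        },
    }


if __name__ == "__main__":
    info = {
        "_id_": {"key": [("_id", 1)], "v": 2},
        "createdAt_1": {"key": [("createdAt", 1)], "expireAfterSeconds": 3600, "v": 2},
    }
    for name, (pattern, seconds) in ttl_indexes(info).items():
        print(name, "currently expires after", seconds, "seconds")
        # 30 days is an illustrative temporary value, not a recommendation.
        print(extend_ttl_command("sessions", pattern, 30 * 24 * 3600))
```

Recording the original `expireAfterSeconds` value before you change it makes it straightforward to restore the index once the synchronization or migration finishes.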