Use Data Transmission Service (DTS) to continuously sync data from a MongoDB replica set to another replica set or sharded cluster—without taking your source database offline.
Supported source and destination databases
| Source | Destination | Notes |
|---|---|---|
| ApsaraDB for MongoDB replica set instance | ApsaraDB for MongoDB replica set or sharded cluster instance | This topic uses this combination as the example. |
| Self-managed MongoDB replica set on an Elastic Compute Service (ECS) instance | Self-managed MongoDB replica set or sharded cluster on an ECS instance | Follow the same procedure. |
| Self-managed MongoDB replica set connected over Express Connect, VPN Gateway, or Smart Access Gateway (SAG) | Self-managed MongoDB replica set or sharded cluster connected over Express Connect, VPN Gateway, or SAG | Follow the same procedure. |
For supported MongoDB version combinations, see Overview of data synchronization scenarios. The destination version must be the same as or later than the source version.
Prerequisites
Before you begin, ensure that you have:
-
Created both the source ApsaraDB for MongoDB replica set instance and the destination replica set or sharded cluster instance. See Create a replica set instance and Create a sharded cluster instance.
-
Verified that the destination instance has at least 10% more available storage than the total data size of the source instance.
-
(For sharded cluster destinations) Created the databases and collections to be sharded, configured sharding, enabled the balancer, and performed pre-sharding. See Configure sharding to maximize the performance of shards.
Pre-sharding distributes synchronized data across shards and prevents data skew. If adding shard keys to the source data is not possible, see Synchronize MongoDB (without shard keys) to MongoDB (sharded cluster architecture).
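The 10% storage requirement above can be checked with simple arithmetic before you create the task. The following is a minimal sketch; the function name and the byte-based inputs are illustrative assumptions, and you would obtain the actual sizes from your own monitoring or instance details.

```python
def has_enough_storage(source_data_bytes: int,
                       dest_available_bytes: int,
                       headroom_percent: int = 10) -> bool:
    """Return True if the destination's free space is at least
    `headroom_percent` larger than the source's total data size."""
    if source_data_bytes < 0 or dest_available_bytes < 0:
        raise ValueError("sizes must be non-negative")
    # Integer arithmetic avoids floating-point rounding at the boundary.
    return dest_available_bytes * 100 >= source_data_bytes * (100 + headroom_percent)


GIB = 2 ** 30
# Hypothetical sizes: 500 GiB of source data, 600 GiB free on the destination.
print(has_enough_storage(500 * GIB, 600 * GIB))
```

Integer math is used so that a destination with exactly 10% extra space passes the check.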
Billing
| Synchronization type | Fee |
|---|---|
| Schema synchronization and full data synchronization | Free |
| Incremental data synchronization | Charged. See Billing overview. |
Supported synchronization topologies
-
One-way one-to-one synchronization
-
One-way one-to-many synchronization
-
One-way many-to-one synchronization
-
One-way cascade synchronization
For details, see Synchronization topologies.
Synchronization types
| Type | What DTS synchronizes |
|---|---|
| Schema synchronization | Schemas of the selected objects from source to destination |
| Full data synchronization | Historical data of the selected objects. Supported objects: databases and collections. |
| Incremental data synchronization | Ongoing changes after the full sync completes. Supported operations: CREATE COLLECTION, CREATE INDEX, DROP DATABASE, DROP COLLECTION, DROP INDEX, RENAME COLLECTION, and document-level insert, update, and delete. Databases created after the task starts are not included. Transactions are converted into a single record; transaction context is not preserved. |
Limitations
Review these constraints before configuring the task. Violating them can cause task failure or data inconsistency.
Source database requirements
| Constraint | Detail |
|---|---|
| Outbound bandwidth | The source server must have enough outbound bandwidth. Insufficient bandwidth slows synchronization. |
| Primary key or unique key | Collections must have PRIMARY KEY or UNIQUE constraints with no duplicate field values. Otherwise the destination may contain duplicate records. |
| Collection count (when editing destination collections) | Up to 1,000 collections per task when you select collections as the sync objects and plan to rename them on the destination. For more collections, run multiple tasks or sync at the database level. |
| Single document size | Cannot exceed 16 MB. Larger documents cause task failure. |
| Unsupported sources | Azure Cosmos DB for MongoDB clusters and Amazon DocumentDB elastic clusters are not supported. |
| oplog or change streams | The oplog must be enabled and retain at least 7 days of logs. Alternatively, enable change streams covering the last 7 days. If neither condition is met, DTS may fail to capture changes, which can cause data loss—not covered by the DTS SLA. Use the oplog where possible; change streams require MongoDB 4.0 or later and do not support two-way synchronization. For non-elastic Amazon DocumentDB clusters, enable change streams and set Migration Method to ChangeStream and Architecture to Sharded Cluster. |
| TTL indexes | Collections with TTL indexes cannot be synchronized. Attempting to do so may cause data inconsistency. |
| Schema changes during sync | Do not change database or collection schemas (including array type updates) while schema synchronization or full data synchronization is running. |
| Writes during full-only sync | Do not write to the source database during full data synchronization if you are not running incremental synchronization. |
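The 7-day oplog retention requirement in the table above can be verified from the timestamps of the oldest and newest oplog entries (for example, the first and last event times that `rs.printReplicationInfo()` reports in the MongoDB shell). The sketch below assumes you have already obtained both timestamps; the function names are illustrative.

```python
from datetime import datetime, timedelta, timezone

REQUIRED_RETENTION = timedelta(days=7)


def oplog_window(first_entry: datetime, last_entry: datetime) -> timedelta:
    """Time span covered by the oplog, from its oldest to its newest entry."""
    if last_entry < first_entry:
        raise ValueError("last_entry must not be earlier than first_entry")
    return last_entry - first_entry


def retention_is_sufficient(first_entry: datetime,
                            last_entry: datetime,
                            required: timedelta = REQUIRED_RETENTION) -> bool:
    """Return True if the oplog covers at least the required window."""
    return oplog_window(first_entry, last_entry) >= required


# Hypothetical timestamps for illustration only.
first = datetime(2024, 1, 1, tzinfo=timezone.utc)
last = datetime(2024, 1, 9, tzinfo=timezone.utc)
print(retention_is_sufficient(first, last))
```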
Destination database requirements
| Constraint | Detail |
|---|---|
| Sharded cluster: orphaned documents | Clear all orphaned documents before starting. Orphaned documents degrade performance and can cause _id conflicts or task failure. |
| Sharded cluster: shard keys | Add shard keys to the source data before starting. During sync, INSERT operations must include shard keys; UPDATE operations cannot modify shard keys. |
| Replica set: connection endpoint | If connecting over Express Connect, VPN Gateway, SAG, a public IP address, or Cloud Enterprise Network (CEN), set Domain Name or IP and Port Number to the primary node's IP and port, or use a high-availability endpoint. See Create a DTS task with a high-availability MongoDB database. If the database is self-managed on an ECS instance, set Port Number to the primary node's port. |
| Unique index or capped collections | Collections with a unique index or capped: true support only single-thread writes. This disables concurrent replay during incremental sync and may increase latency. |
| Excluded databases | DTS does not synchronize the admin or local database. |
| Primary key conflicts | Make sure the destination has no documents with the same _id as the source. If conflicts exist, delete the conflicting documents from the destination before starting the task. |
General constraints
| Constraint | Detail |
|---|---|
| Performance impact | Full data synchronization consumes read and write resources on both the source and destination databases, which increases server load. Evaluate the impact beforehand and run the task during off-peak hours. |
| Storage expansion | Concurrent writes during full sync cause collection fragmentation. Destination storage ends up 5%–10% larger than the source. |
| Writes from other sources | Do not write to the destination from other sources (for example, using Data Management (DMS) for online DDL operations) during synchronization. Concurrent external writes can cause data loss. |
| Count query syntax | Use db.$table_name.aggregate([{ $count:"myCount"}]) to count documents on the destination. |
| Task recovery | If a DTS task fails, DTS support attempts restoration within 8 hours. The task may be restarted and task parameters may be modified during recovery. Database parameters are not modified. |
| Retry time for failed connections | Valid range: 10–1,440 minutes. Default: 720 minutes. Set to a value greater than 30 minutes. If the connection is restored within this period, DTS resumes the task. If multiple tasks share the same source or destination, the shortest retry time applies. DTS instance charges continue during retries. |
| Retry time for other issues | Valid range: 1–1,440 minutes. Default: 10 minutes. Set to a value greater than 10 minutes. Must be less than Retry Time for Failed Connections. |
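The two retry settings in the table above are constrained both by their ranges and by each other. The following sketch encodes only the hard rules stated in the table (the valid ranges, the "less than" relationship, and the shortest-value rule for shared instances). The function names are illustrative assumptions, and the recommended minimums are not enforced here.

```python
from typing import Iterable, List


def check_retry_times(connection_minutes: int, other_minutes: int) -> List[str]:
    """Return a list of rule violations; an empty list means the pair is valid."""
    problems: List[str] = []
    if not 10 <= connection_minutes <= 1440:
        problems.append("connection retry time must be within 10-1,440 minutes")
    if not 1 <= other_minutes <= 1440:
        problems.append("retry time for other issues must be within 1-1,440 minutes")
    if other_minutes >= connection_minutes:
        problems.append("retry time for other issues must be less than "
                        "the connection retry time")
    return problems


def effective_connection_retry(task_settings: Iterable[int]) -> int:
    """When several tasks share a source or destination, the shortest value applies."""
    return min(task_settings)


print(check_retry_times(720, 60))
print(effective_connection_retry([720, 45, 120]))
```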
Self-managed MongoDB constraints
If the source database is a self-managed MongoDB instance, note the following:
| Constraint | Detail |
|---|---|
| Primary/secondary switchover | If a primary/secondary switchover occurs on the source while the task is running, the task fails. |
| Synchronization latency accuracy | DTS calculates latency based on the timestamp of the latest synced data in the destination and the current timestamp in the source. If no updates occur on the source for an extended period, the reported latency may be inaccurate. To reset the latency reading, perform an update on the source. If you select an entire database as the sync object, you can create a heartbeat table that is updated every second. |
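To see why an idle source inflates the reported latency, consider a simplified model of the calculation described above: the current source timestamp minus the timestamp of the latest synced data in the destination. This is only an illustration of the stated formula, not the actual DTS implementation, and the timestamps are hypothetical seconds on a shared clock.

```python
def reported_latency(source_now: float, latest_synced_ts: float) -> float:
    """Latency as described above: current source time minus the timestamp
    of the most recent data already applied to the destination."""
    return max(0.0, source_now - latest_synced_ts)


# Idle source: the last write happened at t=0 and was synced immediately,
# yet ten minutes later the reading is 600 seconds with nothing pending.
print(reported_latency(source_now=600.0, latest_synced_ts=0.0))

# With a heartbeat written every second, the latest synced timestamp stays
# close to the current time, so the reading stays small.
print(reported_latency(source_now=600.0, latest_synced_ts=599.0))
```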
Configure the data synchronization task
Step 1: Go to the Data Synchronization Tasks page
-
Log on to the Data Management (DMS) console.
-
In the top navigation bar, click Data + AI.
-
In the left-side navigation pane, choose Data Transmission Service (DTS) > Data Synchronization.
Steps may vary based on the DMS console mode. See Simple mode and Customize the layout and style of the DMS console. You can also go directly to the Data Synchronization Tasks page in the new DTS console.
Step 2: Select the region
On the right side of Data Synchronization Tasks, select the region where the synchronization instance resides.
In the new DTS console, select the region from the top navigation bar.
Step 3: Configure source and destination databases
Click Create Task. On the Create Task page, configure the parameters described in the following tables.
After configuring the source and destination databases, read the Limits shown on the page before proceeding. Skipping this step may cause task failure or data inconsistency.
Source database
| Parameter | Description |
|---|---|
| Task Name | A name for the DTS task. DTS generates a default name. Specify a descriptive name to help identify the task. The name does not need to be unique. |
| Select a DMS database instance | Select an existing database instance to auto-populate its parameters, or leave blank and configure the following parameters manually. |
| Database Type | Select MongoDB. |
| Access Method | Select Alibaba Cloud Instance. |
| Instance Region | The region where the source instance resides. |
| Replicate Data Across Alibaba Cloud Accounts | Select No for same-account synchronization. |
| Architecture | Select Replica Set. |
| Instance ID | The ID of the source ApsaraDB for MongoDB instance. |
| Authentication Database | The database that stores the account credentials. Default: admin. |
| Database Account | An account with read access to the source database, the config database, the admin database, and the local database. |
| Database Password | The password for the database account. |
| Encryption | Select Non-encrypted or SSL-encrypted. This parameter applies only to Replica Set instances. If you select SSL-encrypted for a self-managed replica set, you can upload a CA certificate to verify the connection. |
Destination database
| Parameter | Description |
|---|---|
| Select a DMS database instance | Select an existing database instance to auto-populate its parameters, or leave blank and configure the following parameters manually. |
| Database Type | Select MongoDB. |
| Access Method | Select Alibaba Cloud Instance. |
| Instance Region | The region where the destination instance resides. |
| Architecture | The architecture of the destination instance (Replica Set or Sharded Cluster). |
| Instance ID | The ID of the destination ApsaraDB for MongoDB instance. |
| Authentication Database | The database that stores the account credentials. Default: admin. |
| Database Account | An account with the dbAdminAnyDatabase permission, read and write access to the destination database, and read access to the local database. |
| Database Password | The password for the database account. |
| Encryption | Select Non-encrypted or SSL-encrypted. This parameter applies only to Replica Set instances. If you select SSL-encrypted for a self-managed replica set, you can upload a CA certificate to verify the connection. |
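For self-managed databases, the account permissions listed in the two tables above could be expressed as MongoDB `createUser` command documents. The sketch below only builds the documents. Mapping "read access" to the built-in `read` role and "read and write access" to `readWrite` is an assumption, as are the function names and the example database name. For ApsaraDB for MongoDB instances, accounts are typically managed through the console instead.

```python
from typing import Any, Dict


def source_user_command(username: str, password: str, source_db: str) -> Dict[str, Any]:
    """createUser document for an account that can read the source database
    plus the config, admin, and local databases."""
    return {
        "createUser": username,
        "pwd": password,
        "roles": [
            {"role": "read", "db": source_db},
            {"role": "read", "db": "config"},
            {"role": "read", "db": "admin"},
            {"role": "read", "db": "local"},
        ],
    }


def destination_user_command(username: str, password: str, dest_db: str) -> Dict[str, Any]:
    """createUser document for an account with dbAdminAnyDatabase, read/write on
    the destination database, and read on the local database."""
    return {
        "createUser": username,
        "pwd": password,
        "roles": [
            {"role": "dbAdminAnyDatabase", "db": "admin"},
            {"role": "readWrite", "db": dest_db},
            {"role": "read", "db": "local"},
        ],
    }


# "shop" is a hypothetical database name.
print(source_user_command("dts_reader", "change-me", "shop")["roles"])
```

A document like this would be run against the authentication database (admin by default) with a driver or the MongoDB shell.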
Step 4: Test connectivity
Click Test Connectivity and Proceed.
DTS automatically adds its server CIDR blocks to the whitelist of Alibaba Cloud database instances or to the security group rules of ECS-hosted databases. For self-managed databases in on-premises data centers or hosted by third-party providers, manually add the DTS server CIDR blocks to the database whitelist. See Add the CIDR blocks of DTS servers.
Adding DTS CIDR blocks to your whitelist or security group rules introduces security risks. Before proceeding, take preventive measures: use strong credentials, limit exposed ports, authenticate API calls, review whitelist rules regularly, and remove unauthorized CIDR blocks. For higher security, connect through Express Connect, VPN Gateway, or SAG instead of using public IP access.
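Reviewing whitelist rules regularly, as recommended above, can be partly automated. The following sketch flags whitelist entries that fall outside a set of approved CIDR blocks. The function name is illustrative and the addresses are reserved documentation ranges, not real DTS server CIDR blocks.

```python
import ipaddress
from typing import Iterable, List


def unauthorized_entries(whitelist: Iterable[str],
                         approved_blocks: Iterable[str]) -> List[str]:
    """Return whitelist entries that are not contained in any approved block."""
    approved = [ipaddress.ip_network(block) for block in approved_blocks]
    flagged: List[str] = []
    for entry in whitelist:
        # A bare IP address is treated as a single-host network.
        network = ipaddress.ip_network(entry, strict=False)
        contained = any(
            network.version == block.version and network.subnet_of(block)
            for block in approved
        )
        if not contained:
            flagged.append(entry)
    return flagged


# Placeholder values for illustration only.
approved = ["192.0.2.0/24"]
current_whitelist = ["192.0.2.16/28", "198.51.100.7", "0.0.0.0/0"]
print(unauthorized_entries(current_whitelist, approved))
```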
Step 5: Select objects and configure settings
Configure the following parameters:
| Parameter | Description |
|---|---|
| Synchronization Types | Select Schema Synchronization, Full Data Synchronization, and Incremental Data Synchronization. Incremental data synchronization is selected by default. Schema and full sync run first; incremental sync starts after full sync completes. For more information, see Synchronization types. |
| Processing Mode of Conflicting Tables | Precheck and Report Errors: checks for collections with identical names in source and destination before starting. If a conflict exists, the task does not start. To resolve naming conflicts, use the object name mapping feature. See Rename an object to be synchronized. Ignore Errors and Proceed: skips the conflict check. If a record in the destination has the same primary key or unique key value as a source record, the destination record is kept and the source record is skipped. This may cause data inconsistency. |
| Synchronization Topology | Select One-way Synchronization. |
| Capitalization of Object Names in Destination Instance | Controls the capitalization of database and collection names in the destination. Default: DTS default policy. See Specify the capitalization of object names. |
| Source Objects | Select databases or collections to synchronize, then click the arrow icon to add them to Selected Objects. |
| Selected Objects | To rename a single object, right-click it. To rename multiple objects at once, click Batch Edit. See Map object names. To filter data with WHERE conditions, right-click an object. See Set filter conditions. |
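The naming conflict that Precheck and Report Errors looks for can be approximated ahead of time by comparing collection namespaces on both sides. This is a simplified model under stated assumptions: namespaces are given as `database.collection` strings, and the optional mapping stands in for the object name mapping feature. It does not reproduce the actual precheck logic.

```python
from typing import Dict, Iterable, List, Optional


def conflicting_collections(source_namespaces: Iterable[str],
                            dest_namespaces: Iterable[str],
                            name_mapping: Optional[Dict[str, str]] = None) -> List[str]:
    """Return destination namespaces that already exist and would collide
    with a synchronized source collection (after any renaming)."""
    mapping = name_mapping or {}
    existing = set(dest_namespaces)
    targets = (mapping.get(ns, ns) for ns in source_namespaces)
    return sorted(target for target in targets if target in existing)


# Hypothetical namespaces for illustration.
source = ["shop.orders", "shop.users"]
destination = ["shop.orders"]
print(conflicting_collections(source, destination))
print(conflicting_collections(source, destination,
                              {"shop.orders": "shop.orders_v2"}))
```

Renaming the colliding collection through the mapping removes the conflict, which mirrors the resolution suggested in the table.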
Step 6: Configure advanced settings
Click Next: Advanced Settings and configure the following:
Data verification
For data verification setup, see Configure data verification.
Advanced settings
| Parameter | Description |
|---|---|
| Dedicated Cluster for Task Scheduling | By default, DTS uses the shared cluster. For higher stability, purchase a dedicated cluster. See What is a DTS dedicated cluster. |
| Set Alerts | No: no alerts. Yes: configure alerting. Specify the latency threshold and notification contacts. See Configure monitoring and alerting. |
| Retry Time for Failed Connections | How long DTS retries after a connection failure. Range: 10–1,440 minutes. Default: 720 minutes. Set to a value greater than 30 minutes. If the connection is restored within this period, DTS resumes the task. If multiple tasks share the same source or destination, the shortest retry time applies. DTS instance charges continue during retries. |
| Retry Time for Other Issues | How long DTS retries after DDL or DML failures. Range: 1–1,440 minutes. Default: 10 minutes. Set to a value greater than 10 minutes. Must be less than Retry Time for Failed Connections. |
| Enable Throttling for Full Data Migration | Limits the read/write load on source and destination during full data synchronization. Configure Queries per second (QPS) to the source database, RPS of Full Data Migration, and Data migration speed for full migration (MB/s). Visible only when Full Data Synchronization is selected. |
| Enable Throttling for Incremental Data Synchronization | Limits the load during incremental sync. Configure RPS of Incremental Data Synchronization and Data synchronization speed for incremental synchronization (MB/s). |
| Environment Tag | A tag to identify the DTS instance. Optional. |
| Configure ETL | Yes: enable extract, transform, and load (ETL) and enter data processing statements. See Configure ETL. No: skip ETL. For an ETL overview, see What is ETL?. |
Step 7: Save settings and run the precheck
Click Next: Save Task Settings and Precheck.
To preview the OpenAPI parameters for this task, hover over the button and click Preview OpenAPI parameters before clicking through.
DTS runs a precheck before the task starts. If the precheck fails:
-
Click View Details next to each failed item, resolve the issues, and click Precheck Again.
-
For alert items that can be safely ignored, click Confirm Alert Details > Ignore > OK, then click Precheck Again. Ignoring alerts may cause data inconsistency.
Step 8: Wait for the precheck to complete
Wait until Success Rate reaches 100%, then click Next: Purchase Instance.
Step 9: Purchase the synchronization instance
Configure the following parameters:
| Parameter | Description |
|---|---|
| Billing Method | Subscription: pay upfront for a fixed term. More cost-effective for long-term use. Pay-as-you-go: billed hourly. More flexible for short-term use. Release the instance when no longer needed to stop charges. |
| Resource Group Settings | The resource group for the instance. Default: default resource group. See What is Resource Management?. |
| Instance Class | The synchronization speed varies by instance class. Choose based on your data volume and latency requirements. See Instance classes of data synchronization instances. |
| Subscription Duration | Available only for the Subscription billing method. Options: 1–9 months, 1 year, 2 years, 3 years, or 5 years. |
Step 10: Start the task
-
Read and select Data Transmission Service (Pay-as-you-go) Service Terms.
-
Click Buy and Start.
-
In the confirmation dialog, click OK.
The task appears in the task list. Monitor progress from there.
FAQ
-
Why do task latency and data inconsistency occur even when no data is written to the database?
-
Cause: A conflict between the automatic deletion mechanism of TTL indexes in MongoDB collections and the data synchronization mechanism of DTS can cause latency and data inconsistency in synchronization or migration tasks.
Missed DELETE operations during incremental writes reduce efficiency: When the TTL index on the source instance deletes expired data, it generates a DELETE record in the Oplog. DTS then synchronizes this DELETE operation. If the TTL index on the destination instance has already deleted the same data, the DELETE operation from DTS will not find the data to delete. The MongoDB engine then returns an unexpected number of affected rows. This triggers an exception handling process and reduces migration efficiency.
Data inconsistency caused by asynchronous deletion of expired data: A TTL index does not delete data in real time. Expired data might still exist on the source instance when it has already been deleted on the destination instance. This causes data inconsistency.
Example:
The MongoDB Oplog or ChangeStream records only the updated fields for an UPDATE operation. It does not record the full document before and after the update. Therefore, if an UPDATE operation cannot find the target data on the destination, DTS ignores the operation.
| Timing | Source instance | Destination instance |
|---|---|---|
| 1 | Service inserts data | |
| 2 | | DTS synchronizes the INSERT operation |
| 3 | Data has expired but is not yet deleted by the TTL index | |
| 4 | Service updates the data (for example, updates the TTL index field to change the expiration time) | |
| 5 | | TTL index deletes the data |
| 6 | | DTS synchronizes the UPDATE, but the data is not found. The operation is ignored. |
As a result, this document is missing from the destination MongoDB instance.
-
Solution: You need to temporarily modify the expiration time of the TTL index in the destination during synchronization or migration to ensure efficiency and consistency. For more information, see Best practices for synchronizing/migrating collections with TTL indexes when MongoDB is the source.
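The timing sequence described in the cause can be reproduced with a small in-memory simulation. The sketch below models each instance as a dictionary keyed by `_id` and uses hypothetical helper names and integer timestamps. It illustrates the race only and is not how DTS or MongoDB is implemented.

```python
from typing import Any, Dict, Tuple

Collection = Dict[int, Dict[str, Any]]


def ttl_sweep(collection: Collection, now: int) -> None:
    """Delete every document whose expireAt is at or before `now`."""
    for doc_id in [k for k, d in collection.items() if d["expireAt"] <= now]:
        del collection[doc_id]


def apply_update(collection: Collection, doc_id: int, changed: Dict[str, Any]) -> bool:
    """Replay an update that carries only the changed fields.
    Returns False (operation ignored) when the target document is missing."""
    if doc_id not in collection:
        return False
    collection[doc_id].update(changed)
    return True


def simulate() -> Tuple[Collection, Collection]:
    source: Collection = {}
    destination: Collection = {}
    source[1] = {"_id": 1, "expireAt": 100}       # 1. service inserts data
    destination[1] = dict(source[1])              # 2. INSERT is synchronized
    now = 150                                     # 3. expired, source TTL has not run yet
    source[1]["expireAt"] = 200                   # 4. service extends the expiration
    ttl_sweep(destination, now)                   # 5. destination TTL deletes the old copy
    apply_update(destination, 1, {"expireAt": 200})  # 6. UPDATE finds nothing, ignored
    return source, destination


src, dst = simulate()
print(1 in src, 1 in dst)
```

After the run, the document still exists on the source with its new expiration time but is absent from the destination, which is the inconsistency the FAQ describes.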