Data Transmission Service (DTS) lets you synchronize data one-way from an ApsaraDB for MongoDB sharded cluster instance to an ApsaraDB for MongoDB replica set or sharded cluster instance. This topic walks you through the full configuration process.
DTS supports one-way data synchronization between only two ApsaraDB for MongoDB sharded cluster instances. DTS does not support one-way data synchronization among multiple ApsaraDB for MongoDB instances.
How it works
DTS supports three synchronization types for this scenario: schema synchronization, full data synchronization, and incremental data synchronization. You can combine these types based on your requirements.
Full data synchronization covers databases and collections.
For incremental data, DTS reads changes from the source database using either the oplog or change streams:
- Oplog (recommended): Pulls log data from the source MongoDB oplog. Lower latency and broader operation coverage. By default, the oplog feature is enabled for both self-managed MongoDB databases and ApsaraDB for MongoDB instances.
- Change streams: Subscribes to change events from the source. Required for Amazon DocumentDB sources. Available on MongoDB 4.0 and later. Two-way synchronization is not supported with change streams.
After DTS starts a task, it does not synchronize incremental data from databases created after the task starts.
Operations captured via oplog:
- CREATE COLLECTION, CREATE INDEX
- DROP DATABASE, DROP COLLECTION, DROP INDEX
- RENAME COLLECTION
- Insert, update, and delete documents

For incremental document updates, only changes made with the `$set` command are synchronized.

Operations captured via change streams:
- DROP DATABASE, DROP COLLECTION
- RENAME COLLECTION
- Insert, update, and delete documents

For incremental document updates, only changes made with the `$set` command are synchronized.
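The two operation lists above can be encoded as a small lookup, for example to check whether a workload that relies on a given operation needs the oplog method. This is a minimal sketch in plain JavaScript: the names `CAPTURED_OPERATIONS`, `isCaptured`, and `oplogOnlyOperations` are hypothetical and not part of any DTS API, and the document-level operations are labeled `INSERT`, `UPDATE`, and `DELETE` here for brevity.

```javascript
// Operations each incremental method captures, transcribed from the lists above.
// The object and function names are illustrative only, not a DTS API.
const CAPTURED_OPERATIONS = {
  oplog: new Set([
    "CREATE COLLECTION", "CREATE INDEX",
    "DROP DATABASE", "DROP COLLECTION", "DROP INDEX",
    "RENAME COLLECTION",
    "INSERT", "UPDATE", "DELETE",
  ]),
  changeStream: new Set([
    "DROP DATABASE", "DROP COLLECTION",
    "RENAME COLLECTION",
    "INSERT", "UPDATE", "DELETE",
  ]),
};

// Returns true if the given method captures the given operation.
function isCaptured(method, operation) {
  const ops = CAPTURED_OPERATIONS[method];
  if (!ops) {
    throw new Error("Unknown method: " + method);
  }
  return ops.has(operation.trim().toUpperCase());
}

// Lists the operations that only the oplog method captures.
function oplogOnlyOperations() {
  return [...CAPTURED_OPERATIONS.oplog].filter(
    (op) => !CAPTURED_OPERATIONS.changeStream.has(op)
  );
}
```

For instance, `isCaptured("changeStream", "CREATE INDEX")` returns `false`, which suggests that index creation on the source would need to be repeated manually on the destination when change streams are used.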
Billing
| Synchronization type | Fee |
|---|---|
| Schema synchronization and full data synchronization | Free |
| Incremental data synchronization | Charged. See Billing overview. |
Limitations
Source and destination database limits
- The server hosting the source database must have sufficient outbound bandwidth. Otherwise, synchronization speed is affected.
- Collections to be synchronized must have PRIMARY KEY or UNIQUE constraints, and all fields must be unique. Otherwise, the destination database may contain duplicate records.
- The `_id` field in each synchronized collection must be unique. Otherwise, data inconsistency may occur.
- If you select collections as the objects to synchronize and need to rename them in the destination, a single task supports up to 1,000 collections. For more than 1,000 collections, configure multiple tasks in batches, or synchronize at the database level instead.
- A single data entry to be synchronized cannot exceed 16 MB. Otherwise, the task fails.
- The source database cannot be an Azure Cosmos DB for MongoDB cluster or an Amazon DocumentDB elastic cluster.
- The oplog must be enabled on the source database and must retain data for at least 7 days. Alternatively, change streams must be enabled to ensure DTS can subscribe to changes within the last 7 days. Otherwise, DTS may fail to read source changes, causing synchronization failure, data inconsistency, or data loss. Issues in such circumstances are not covered by the DTS service level agreement (SLA).
  - Use the oplog to record changes in the source database.
  - Only MongoDB 4.0 and later support using change streams to read source changes. Two-way synchronization is not supported with change streams.
  - If the source is a non-elastic Amazon DocumentDB cluster, enable change streams and set Migration Method to ChangeStream and Architecture to Sharded Cluster.
- MongoDB sharded cluster databases involved in a running task cannot be scaled. Otherwise, the task fails.
- If the source is a self-managed MongoDB database in the sharded cluster architecture, Access Method can be set only to "Express Connect, VPN Gateway, or Smart Access Gateway" or to "Cloud Enterprise Network (CEN)".
- The source MongoDB sharded cluster cannot have more than 10 mongos nodes.
- Collections with time to live (TTL) indexes cannot be synchronized. If the source has TTL indexes, data inconsistency may occur between source and destination.
- Make sure no orphaned documents exist in source or destination databases. Otherwise, data inconsistency or task failure may occur. See Glossary of MongoDB and How do I delete orphaned documents of a MongoDB database deployed in the sharded cluster architecture?
- During schema synchronization and full data synchronization, do not change the schemas of databases or collections (including array type updates). Otherwise, the task fails or data inconsistency may occur.
- If you perform only full data synchronization, do not write data to the source database during synchronization. Otherwise, data inconsistency occurs.
- If the balancer of the source database is enabled to balance data, the DTS task may experience delays.
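Several of the source limits above are numeric and can be checked before a task is created. The following is a minimal sketch, not a DTS feature: `LIMITS` and `checkSourceLimits` are hypothetical names, the caller is assumed to gather the input facts separately (for example, the oplog window in hours is reported as `timeDiffHours` by `db.getReplicationInfo()` in the MongoDB shell), and 16 MB is assumed to mean 16 × 1024 × 1024 bytes.

```javascript
// Numeric source limits transcribed from the list above.
// Assumption: "16 MB" is interpreted as 16 * 1024 * 1024 bytes.
const LIMITS = {
  maxDocumentBytes: 16 * 1024 * 1024,
  maxRenamedCollections: 1000,
  maxMongosNodes: 10,
  minOplogWindowHours: 7 * 24,
};

// Takes facts the caller has collected about the source and returns a list
// of human-readable problems. Facts that are omitted are simply not checked.
function checkSourceLimits(facts) {
  const problems = [];
  if (facts.largestDocumentBytes > LIMITS.maxDocumentBytes) {
    problems.push("A document exceeds 16 MB; the task would fail.");
  }
  if (facts.renamedCollectionCount > LIMITS.maxRenamedCollections) {
    problems.push("More than 1,000 renamed collections; split into multiple tasks or synchronize at the database level.");
  }
  if (facts.mongosCount > LIMITS.maxMongosNodes) {
    problems.push("The source cluster has more than 10 mongos nodes.");
  }
  if (facts.oplogWindowHours < LIMITS.minOplogWindowHours) {
    problems.push("The oplog retains less than 7 days of changes.");
  }
  if (facts.hasTtlIndexes === true) {
    problems.push("TTL indexes are present; source and destination may diverge.");
  }
  return problems;
}
```

An empty result only means that none of the supplied facts violates a listed limit. It does not replace the DTS precheck.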
Other limits
- If the destination is a replica set instance:
  - If connected over Public IP Address, Cloud Enterprise Network (CEN), or "Express Connect, VPN Gateway, or Smart Access Gateway": Set Domain Name or IP and Port Number to the primary node's IP address and port, or configure a high-availability endpoint. See Create a DTS task in which the source or destination database is a high-availability MongoDB database.
  - If connected over Self-managed Database on ECS: Set Port Number to the primary node's port.
- Add shard keys to the data to be synchronized in the source database before starting the task. During synchronization, INSERT operations must include shard keys, and UPDATE operations cannot modify shard keys.
- The destination MongoDB version must be the same as or later than the source version. Earlier versions may cause compatibility issues.
- DTS cannot synchronize data from the `admin` or `local` database.
- Transaction information is not retained. Transactions synchronized to the destination are converted to single records.
- Evaluate the impact on source and destination performance before synchronizing data. Run synchronization during off-peak hours. During full data synchronization, DTS uses read and write resources on both instances, which may increase server load.
- During full data synchronization, concurrent INSERT operations cause collection fragmentation in the destination database. After full synchronization completes, the destination collections occupy more storage space than the source.
- If a destination collection has a unique index or its `capped` attribute is `true`, that collection supports only single-thread writes and does not support concurrent replay during incremental synchronization. This may increase synchronization latency.
- Data is written concurrently to the destination, so the destination storage space is 5%–10% larger than the source data size.
- Use the `db.$table_name.aggregate([{ $count:"myCount"}])` syntax to query a count on the destination MongoDB database.
- Make sure the destination MongoDB database does not have the same primary key as the source. The default primary key is `_id`. If the destination has matching primary keys, delete those records from the destination without interrupting DTS. For example, if the same `_id` exists, delete the destination records with that `_id`.
- Disable the MongoDB balancer of the source database during full data synchronization. Do not re-enable the balancer until full synchronization is complete and incremental synchronization starts. Otherwise, data inconsistency may occur. See Manage the ApsaraDB for MongoDB balancer.
- If data sharding is configured for the destination and you do not need schema synchronization, do not select Schema Synchronization in the Synchronization Types parameter. Otherwise, shard conflicts may cause data inconsistency or task failure.
- If a DTS task fails, DTS technical support will attempt to restore it within 8 hours. During restoration, the task may be restarted and its parameters may be modified (database parameters are not changed). Modified parameters may include those in the Modify instance parameters section.
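The count query mentioned in the list above can be wrapped in two small helpers so that source and destination counts are read the same way. This is a sketch under stated assumptions: `countPipeline`, `readCount`, and `countsMatch` are hypothetical helper names, and the only MongoDB behavior relied on is that the `$count` stage emits a single document with the chosen field name, or no document at all when the collection is empty.

```javascript
// Builds the aggregation pipeline from the text: [{ $count: "myCount" }].
function countPipeline(fieldName = "myCount") {
  return [{ $count: fieldName }];
}

// Reads the count out of the array returned by aggregate(...).toArray().
// $count emits no document for an empty collection, so that case maps to 0.
function readCount(results, fieldName = "myCount") {
  return results.length === 0 ? 0 : results[0][fieldName];
}

// Compares the results of running the same pipeline on source and destination.
function countsMatch(sourceResults, destinationResults, fieldName = "myCount") {
  return readCount(sourceResults, fieldName) === readCount(destinationResults, fieldName);
}
```

In the MongoDB shell, the pipeline could be run as `db.getCollection("orders").aggregate(countPipeline()).toArray()`, where `orders` is a placeholder collection name, and the returned array passed to `readCount`.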
Prerequisites
Before you begin, ensure that you have:
- A destination ApsaraDB for MongoDB replica set or sharded cluster instance. See Create a replica set instance or Create a sharded cluster instance.

  Important: Use a destination instance whose available storage space is at least 10% larger than the total data size in the source instance.

  For information about supported instance versions, see Overview of data synchronization scenarios.
- Endpoints assigned to all shard nodes in the source sharded cluster instance, with all shard nodes sharing the same account and password. See Apply for an endpoint for a shard or ConfigServer component.
- (For sharded cluster destinations) Databases and collections to be sharded created in the destination, with data sharding configured, the balancer enabled, and pre-sharding performed. See Configure sharding to maximize the performance of shards and the What do I do if the data of a MongoDB database deployed in the sharded cluster architecture is not evenly distributed? FAQ.

  Configuring sharding before the task ensures synchronized data distributes across shards, maximizing sharded cluster performance. The balancer and pre-sharding also prevent data skew.
- Orphaned documents deleted from the source MongoDB database (see Delete orphaned documents below).
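The 10% storage requirement in the prerequisites can be checked with simple arithmetic. The sketch below is illustrative only: `requiredDestinationBytes` and `hasStorageHeadroom` are hypothetical names, and both sizes are assumed to be supplied in bytes by the caller (for instance, from `db.stats().dataSize` summed across the source databases). Integer percentages are used to avoid floating-point rounding at the boundary.

```javascript
// Minimum free bytes needed on the destination for a given source data size.
// headroomPercent defaults to 10, matching the prerequisite above.
function requiredDestinationBytes(sourceDataBytes, headroomPercent = 10) {
  return Math.ceil((sourceDataBytes * (100 + headroomPercent)) / 100);
}

// True if the destination's available space meets the requirement.
// Compared in whole percentages so that an exact 10% margin passes.
function hasStorageHeadroom(sourceDataBytes, destinationFreeBytes, headroomPercent = 10) {
  return destinationFreeBytes * 100 >= sourceDataBytes * (100 + headroomPercent);
}
```

For a source holding 100 GB of data, this works out to at least 110 GB of available space on the destination.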
Delete orphaned documents
Delete orphaned documents from the source MongoDB database before synchronization. If orphaned documents remain, synchronization performance degrades and documents may have duplicate _id values, causing unintended data to be synchronized.
ApsaraDB for MongoDB instances
Running a cleanup script on an ApsaraDB for MongoDB instance with a major version earlier than 4.2, or a minor version earlier than 4.0.6, returns an error. To check your version, see Release notes for the minor versions of ApsaraDB for MongoDB. To upgrade, see Upgrade the major version of an instance and Update the minor version of an instance.
The cleanupOrphaned command removes orphaned documents. The script varies by MongoDB version.
MongoDB 4.4 and later
- Create a JavaScript file named `cleanupOrphaned.js` on a server that can connect to the sharded cluster instance.

  This script deletes orphaned documents from all collections across multiple databases in multiple shards. To target a specific collection, modify the script parameters.

      // The names of shards.
      var shardNames = ["shardName1", "shardName2"];
      // The databases from which you want to delete orphaned documents.
      var databasesToProcess = ["database1", "database2", "database3"];

      shardNames.forEach(function(shardName) {
          // Traverse the specified databases.
          databasesToProcess.forEach(function(dbName) {
              var dbInstance = db.getSiblingDB(dbName);
              // Obtain the names of all collections of the specified databases.
              var collectionNames = dbInstance.getCollectionNames();

              // Traverse all collections.
              collectionNames.forEach(function(collectionName) {
                  // The complete collection name.
                  var fullCollectionName = dbName + "." + collectionName;
                  // Build the cleanupOrphaned command.
                  var command = {
                      runCommandOnShard: shardName,
                      command: { cleanupOrphaned: fullCollectionName }
                  };
                  // Run the cleanupOrphaned command.
                  var result = db.adminCommand(command);
                  if (result.ok) {
                      print("Cleaned up orphaned documents for collection " + fullCollectionName + " on shard " + shardName);
                      printjson(result);
                  } else {
                      print("Failed to clean up orphaned documents for collection " + fullCollectionName + " on shard " + shardName);
                  }
              });
          });
      });

  Replace the following parameters:

  | Parameter | Description |
  |---|---|
  | `shardNames` | The IDs of the shards to clean. Find these IDs in the Shard List section on the Basic Information page of the sharded cluster instance. Example: `d-bp15a3796d3a****`. |
  | `databasesToProcess` | The names of the databases from which to delete orphaned documents. |
- From the directory containing `cleanupOrphaned.js`, run:

      mongo --host <Mongoshost> --port <Primaryport> --authenticationDatabase <database> -u <username> -p <password> cleanupOrphaned.js > output.txt

  | Parameter | Description |
  |---|---|
  | `<Mongoshost>` | The endpoint of the mongos node. Format: `s-bp14423a2a51****.mongodb.rds.aliyuncs.com`. |
  | `<Primaryport>` | The port of the mongos node. Default: 3717. |
  | `<database>` | The authentication database for the account. |
  | `<username>` | The database account. |
  | `<password>` | The account password. |
  | `output.txt` | The file where execution results are saved. |
MongoDB 4.2 and earlier
- Create a JavaScript file named `cleanupOrphaned.js` on a server that can connect to the sharded cluster instance.

  This script deletes orphaned documents from a specific collection in a database across multiple shards. To clean multiple collections, modify the `fullCollectionName` parameter and run the script multiple times, or extend the script to iterate over all collections.

      function cleanupOrphanedOnShard(shardName, fullCollectionName) {
          var nextKey = { };
          var result;

          while ( nextKey != null ) {
              var command = {
                  runCommandOnShard: shardName,
                  command: { cleanupOrphaned: fullCollectionName, startingFromKey: nextKey }
              };

              result = db.adminCommand(command);
              printjson(result);

              if (result.ok != 1 || !(result.results.hasOwnProperty(shardName)) || result.results[shardName].ok != 1 ) {
                  print("Unable to complete at this time: failure or timeout.");
                  break;
              }

              nextKey = result.results[shardName].stoppedAtKey;
          }

          print("cleanupOrphaned done for coll: " + fullCollectionName + " on shard: " + shardName);
      }

      var shardNames = ["shardName1", "shardName2", "shardName3"];
      var fullCollectionName = "database.collection";

      shardNames.forEach(function(shardName) {
          cleanupOrphanedOnShard(shardName, fullCollectionName);
      });

  Replace the following parameters:

  | Parameter | Description |
  |---|---|
  | `shardNames` | The IDs of the shards to clean. Find these IDs in the Shard List section on the Basic Information page of the sharded cluster instance. Example: `d-bp15a3796d3a****`. |
  | `fullCollectionName` | The collection to clean. Format: `database name.collection name`. |
- From the directory containing `cleanupOrphaned.js`, run:

      mongo --host <Mongoshost> --port <Primaryport> --authenticationDatabase <database> -u <username> -p <password> cleanupOrphaned.js > output.txt

  | Parameter | Description |
  |---|---|
  | `<Mongoshost>` | The endpoint of the mongos node. Format: `s-bp14423a2a51****.mongodb.rds.aliyuncs.com`. |
  | `<Primaryport>` | The port of the mongos node. Default: 3717. |
  | `<database>` | The authentication database for the account. |
  | `<username>` | The database account. |
  | `<password>` | The account password. |
  | `output.txt` | The file where execution results are saved. |
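The resumable loop used for MongoDB 4.2 and earlier keeps calling `cleanupOrphaned` with the last `stoppedAtKey` until no key is returned. To reason about that control flow without a cluster, the loop can be written with the command runner passed in as a parameter. This is a sketch, not a replacement for the script above: the `runAdminCommand` parameter and the `{ completed, rounds }` return shape are assumptions introduced here, and in the MongoDB shell the runner would be something like `function (cmd) { return db.adminCommand(cmd); }`.

```javascript
// Repeats cleanupOrphaned on one shard until the server stops returning a
// stoppedAtKey, mirroring the loop in the script above.
// runAdminCommand is injected so the loop can be exercised without a cluster.
function cleanupOrphanedOnShard(runAdminCommand, shardName, fullCollectionName) {
  let nextKey = {};
  let rounds = 0;

  // "!= null" also stops the loop when stoppedAtKey is undefined.
  while (nextKey != null) {
    const result = runAdminCommand({
      runCommandOnShard: shardName,
      command: { cleanupOrphaned: fullCollectionName, startingFromKey: nextKey },
    });
    rounds += 1;

    const shardResult = result.results && result.results[shardName];
    if (result.ok !== 1 || !shardResult || shardResult.ok !== 1) {
      // Failure or timeout: report it and let the caller decide whether to retry.
      return { completed: false, rounds: rounds };
    }

    nextKey = shardResult.stoppedAtKey;
  }

  return { completed: true, rounds: rounds };
}
```

Returning a status object instead of printing makes it straightforward to rerun only the shards or collections that did not complete.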
Self-managed MongoDB databases
- Download the cleanupOrphaned.js script on a server that can connect to the self-managed MongoDB database.

      wget "https://docs-aliyun.cn-hangzhou.oss.aliyun-inc.com/assets/attach/120562/cn_zh/1564451237979/cleanupOrphaned.js"
- Replace `test` in the script with the name of the database from which to delete orphaned documents.

  Important: To clean multiple databases, repeat steps 2 and 3 for each database.
- On each shard, run the following command to delete orphaned documents from all collections in the specified database. Repeat this step for each shard.

      mongo --host <Shardhost> --port <Primaryport> --authenticationDatabase <database> -u <username> -p <password> cleanupOrphaned.js

  | Parameter | Description |
  |---|---|
  | `<Shardhost>` | The IP address of the shard. |
  | `<Primaryport>` | The service port of the primary node in the shard. |
  | `<database>` | The authentication database for the account. |
  | `<username>` | The account for the self-managed MongoDB database. |
  | `<password>` | The password for the account. |

  Example: A self-managed MongoDB database with three shards:

      mongo --host 172.16.1.10 --port 27018 --authenticationDatabase admin -u dtstest -p 'Test123456' cleanupOrphaned.js
      mongo --host 172.16.1.11 --port 27021 --authenticationDatabase admin -u dtstest -p 'Test123456' cleanupOrphaned.js
      mongo --host 172.16.1.12 --port 27024 --authenticationDatabase admin -u dtstest -p 'Test123456' cleanupOrphaned.js
Configure the synchronization task
In this example, the DTS task is configured before purchasing a DTS instance. You do not need to specify the number of shards in the source sharded cluster instance. If you purchase a DTS instance before configuring the task, specify the number of shards when purchasing.
- Go to the Data Synchronization page and select the region where the instance resides. In the DMS console:
  - Log on to the DMS console.
  - In the top navigation bar, move the pointer over Data + AI and choose DTS (DTS) > Data Synchronization.
  - From the drop-down list to the right of Data Synchronization Tasks, select the region where the instance resides.

  The actual operations may vary based on the mode and layout of the DMS console. See Simple mode and Customize the layout and style of the DMS console.
- Click Create Task.
- (Optional) Click New Configuration Page in the upper-right corner.

  Skip this step if the Back to Previous Version button is already displayed. Use the new configuration page when available.
- Configure the source and destination databases.

  | Section | Parameter | Description |
  |---|---|---|
  | N/A | Task Name | A name for the DTS task. DTS generates a default name. Specify a descriptive name to make the task easier to identify. The name does not need to be unique. |
  | Source Database | Select Existing Connection | Select an existing registered database to auto-populate parameters, or leave blank to configure manually. To register a database: in the DTS console, use the Database Connections page (see Manage database connections); in the DMS console, select from the Select a DMS database instance list or click Add DMS Database Instance (see Register an Alibaba Cloud database instance and Register a database hosted on a third-party cloud service or a self-managed database). |
  | | Database Type | Select MongoDB. |
  | | Access Method | Select Alibaba Cloud Instance. |
  | | Instance Region | The region of the source ApsaraDB for MongoDB instance. |
  | | Replicate Data Across Alibaba Cloud Accounts | Select No for this example (same account). |
  | | Architecture | Select Sharded Cluster. |
  | | Migration Method | The method used to synchronize incremental data. Oplog (recommended): by default, the oplog feature is enabled for both self-managed MongoDB databases and ApsaraDB for MongoDB instances. Enables low-latency synchronization due to fast log pulling. ChangeStream: available when change streams are enabled. See Change Streams. Required for non-elastic Amazon DocumentDB clusters. If Sharded Cluster is selected for Architecture, the Shard account and Shard password parameters are not required. |
  | | Instance ID | The ID of the source ApsaraDB for MongoDB instance. |
  | | Authentication Database | The authentication database name. Default: `admin`. |
  | | Database Account | The source database account. Must have read permissions on the source database, the `config` database, the `admin` database, and the `local` database. |
  | | Database Password | The account password. |
  | | Shard account | The account for accessing shard nodes. Required if the source is a self-managed MongoDB database. |
  | | Shard password | The password for accessing shard nodes. |
  | | Encryption | Whether to encrypt the source database connection. Select Non-encrypted, SSL-encrypted, or Mongo Atlas SSL. Available options depend on the Access Method and Architecture values. If Architecture is Sharded Cluster and Migration Method is Oplog, SSL-encrypted is unavailable. If the source is a self-managed MongoDB database using Replica Set architecture and the Access Method is not Alibaba Cloud Instance, you can upload a CA certificate when SSL-encrypted is selected. |
  | Destination Database | Select Existing Connection | Select an existing registered database to auto-populate parameters, or leave blank to configure manually. |
  | | Database Type | Select MongoDB. |
  | | Access Method | Select Alibaba Cloud Instance. |
  | | Instance Region | The region of the destination ApsaraDB for MongoDB instance. |
  | | Replicate Data Across Alibaba Cloud Accounts | Select No for this example (same account). |
  | | Architecture | The architecture of the destination instance. |
  | | Instance ID | The ID of the destination ApsaraDB for MongoDB instance. |
  | | Authentication Database | The authentication database name. Default: `admin`. |
  | | Database Account | The destination database account. Must have the `dbAdminAnyDatabase` permission, read and write permissions on the destination database, and read permissions on the `local` database. |
  | | Database Password | The account password. |
  | | Encryption | Whether to encrypt the destination database connection. Select Non-encrypted, SSL-encrypted, or Mongo Atlas SSL. If the destination is an ApsaraDB for MongoDB instance with Sharded Cluster architecture, SSL-encrypted is unavailable. If the destination is a self-managed MongoDB database using Replica Set architecture and the Access Method is not Alibaba Cloud Instance, you can upload a CA certificate when SSL-encrypted is selected. |
- Click Test Connectivity and Proceed.

  Make sure DTS server CIDR blocks can access the source and destination databases. They are added automatically when you use Alibaba Cloud instances, or you can add them manually. See Add the CIDR blocks of DTS servers. If the source or destination is a self-managed database not connected via Alibaba Cloud Instance, click Test Connectivity in the CIDR Blocks of DTS Servers dialog box.
- Configure the objects to synchronize.
  - In the Configure Objects step, set the following parameters.

    | Parameter | Description |
    |---|---|
    | Synchronization Types | The types of synchronization to perform. Incremental Data Synchronization is selected by default. Also select Schema Synchronization and Full Data Synchronization to synchronize historical data as the basis for subsequent incremental synchronization. See How it works for details on synchronization types. |
    | Processing Mode of Conflicting Tables | Precheck and Report Errors (default): checks for collection name conflicts between source and destination before starting. The task fails the precheck if identical names are found. To synchronize to a destination collection that cannot be deleted or renamed, use object name mapping. See Rename an object to be synchronized. Ignore Errors and Proceed: skips the conflict check. Warning: If you select this option, data inconsistency may occur. Records in the destination with the same primary key or unique key as the source are not overwritten; the existing destination records are retained. This may also cause initialization failures or partial column synchronization. |
    | Synchronization Topology | Select One-way Synchronization. |
    | Capitalization of Object Names in Destination Instance | The capitalization policy for database and collection names in the destination. Default: DTS default policy. See Specify the capitalization of object names in the destination instance. |
    | Source Objects | Select databases or collections from Source Objects and click the move icon to add them to Selected Objects. |
    | Selected Objects | To rename an object in the destination, right-click it. See Map object names. To remove an object, click it and then click the remove icon. To set incremental synchronization scope by database or collection, right-click Selected Objects. To filter data during full synchronization, right-click a collection in Selected Objects and configure filter conditions. Note that filters do not apply during incremental synchronization. See Specify filter conditions. Renaming a database or collection using object name mapping may cause dependent objects to fail synchronization. |
  - Click Next: Advanced Settings and configure the following parameters.

    | Parameter | Description |
    |---|---|
    | Dedicated Cluster for Task Scheduling | By default, DTS uses the shared cluster. For improved stability, purchase a dedicated cluster. See What is a DTS dedicated cluster. |
    | Retry Time for Failed Connections | How long DTS retries a connection if the source or destination database becomes unreachable. Valid values: 10–1440 minutes. Default: 720 minutes. Set this to more than 30 minutes. If DTS reconnects within the retry window, the task resumes; otherwise, the task fails. If multiple tasks share the same source or destination, the shortest retry window applies. DTS continues to charge during retries. |
    | Retry Time for Other Issues | How long DTS retries if DDL or DML operations fail. Valid values: 1–1440 minutes. Default: 10 minutes. Set this to more than 10 minutes. This value must be less than Retry Time for Failed Connections. |
    | Enable Throttling for Full Data Migration | Limits DTS read/write rate during full synchronization to reduce load on the database servers. Configure Queries per second (QPS) to the source database, RPS of Full Data Migration, and Data migration speed for full migration (MB/s). Displayed only when Full Data Synchronization is selected. |
    | Only one data type for primary key _id in a single table | Whether the `_id` field has a single data type across all records in a collection. Yes: DTS skips scanning `_id` data types during full synchronization. No: DTS scans `_id` data types. Displayed only when Full Data Synchronization is selected. |
    | Enable Throttling for Incremental Data Synchronization | Limits DTS rate during incremental synchronization. Configure RPS of Incremental Data Synchronization and Data synchronization speed for incremental synchronization (MB/s). |
    | Environment Tag | A tag to identify the DTS instance. Optional. |
    | Configure ETL | Whether to enable the extract, transform, and load (ETL) feature. Yes: configure ETL with data processing statements. See Configure ETL in a data migration or data synchronization task. No: skip ETL configuration. |
    | Monitoring and Alerting | Whether to enable alerting for the task. Yes: configure alert thresholds and contacts. See Configure monitoring and alerting when you create a DTS task. No: disable alerting. |
  - Click Next Step: Data Verification to configure data verification. See Configure a data verification task.
- Save the task settings and run a precheck.
  - To preview the API parameters for this configuration, hover over Next: Save Task Settings and Precheck and click Preview OpenAPI parameters.
  - Click Next: Save Task Settings and Precheck.

    DTS runs a precheck before the task can start. If the precheck fails, click View Details next to each failed item to diagnose and resolve the issue, then rerun the precheck. If an alert is triggered for an item: if the alert cannot be ignored, resolve the issue and recheck; if it can be ignored, click Confirm Alert Details, then Ignore, then OK, then Precheck Again.
- Purchase an instance.
  - Wait for Success Rate to reach 100%, then click Next: Purchase Instance.
  - On the buy page, configure the following parameters.

    | Section | Parameter | Description |
    |---|---|---|
    | New Instance Class | Billing Method | Subscription: pay upfront for a set duration. More cost-effective for long-term use. Pay-as-you-go: billed hourly. Suitable for short-term use. Release the instance when no longer needed to stop charges. |
    | | Resource Group Settings | The resource group for the instance. Default: default resource group. See What is Resource Management?. |
    | | Instance Class | Select a class based on your synchronization speed requirements. See Instance classes of data synchronization instances. |
    | | Subscription Duration | Available only for the Subscription billing method. Options: 1–9 months, 1 year, 2 years, 3 years, or 5 years. |
  - Read and select Data Transmission Service (Pay-as-you-go) Service Terms.
  - Click Buy and Start, then click OK in the dialog box.
  - View the task progress in the task list.
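The retry parameters in the Advanced Settings step carry a few numeric constraints that are easy to get wrong when tasks are created in bulk. The following sketch encodes them as stated in that step. `validateRetrySettings` and `effectiveConnectionRetry` are hypothetical helpers written for illustration, not part of the DTS console or API, and the "more than 30 minutes" and "more than 10 minutes" guidance is treated here as a warning rather than a hard limit.

```javascript
// Validates the two retry windows (in minutes) against the rules in the
// Advanced Settings table. Defaults mirror the documented defaults.
function validateRetrySettings({ connectionRetryMinutes = 720, otherRetryMinutes = 10 } = {}) {
  const errors = [];
  const warnings = [];

  if (connectionRetryMinutes < 10 || connectionRetryMinutes > 1440) {
    errors.push("Retry Time for Failed Connections must be 10 to 1440 minutes.");
  }
  if (otherRetryMinutes < 1 || otherRetryMinutes > 1440) {
    errors.push("Retry Time for Other Issues must be 1 to 1440 minutes.");
  }
  if (otherRetryMinutes >= connectionRetryMinutes) {
    errors.push("Retry Time for Other Issues must be less than Retry Time for Failed Connections.");
  }
  if (connectionRetryMinutes <= 30) {
    warnings.push("Set Retry Time for Failed Connections to more than 30 minutes.");
  }
  if (otherRetryMinutes <= 10) {
    warnings.push("Set Retry Time for Other Issues to more than 10 minutes.");
  }
  return { errors: errors, warnings: warnings };
}

// When several tasks share a source or destination, the shortest
// connection retry window is the one that applies.
function effectiveConnectionRetry(windowsInMinutes) {
  return Math.min(...windowsInMinutes);
}
```

With the documented defaults, the result has no errors and one warning, because the default of 10 minutes for other issues is not above the suggested minimum.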