
Data Transmission Service:Synchronize data from an ApsaraDB for MongoDB sharded cluster instance to an ApsaraDB for MongoDB replica set or sharded cluster instance

Last Updated: Mar 30, 2026

Data Transmission Service (DTS) supports one-way synchronization from an ApsaraDB for MongoDB sharded cluster instance to an ApsaraDB for MongoDB replica set or sharded cluster instance. This topic describes how to configure that synchronization task.

Supported synchronization paths

Source architecture Destination architecture Supported
Sharded cluster Replica set Yes
Sharded cluster Sharded cluster Yes
DTS supports one-way data synchronization only between two ApsaraDB for MongoDB instances (one source and one destination). One-way synchronization among three or more ApsaraDB for MongoDB instances is not supported.

Billing

Synchronization type Fee
Schema synchronization and full data synchronization Free of charge
Incremental data synchronization Charged. For more information, see Billing overview.

Synchronization types

Type Description
Schema synchronization DTS synchronizes the schemas of the selected objects from the source to the destination instance.
Full data synchronization DTS synchronizes historical data of the selected objects. Supported objects: databases and collections.
Incremental data synchronization DTS synchronizes ongoing data changes from the source to the destination instance.

Supported incremental operations

DTS synchronizes incremental data only from databases that exist when the task starts. Databases created after the task starts are not synchronized.

When using oplog:

  • CREATE COLLECTION and CREATE INDEX

  • DROP DATABASE, DROP COLLECTION, and DROP INDEX

  • RENAME COLLECTION

  • Insert, update, and delete operations on documents

During incremental data synchronization, only update operations that use the $set command are synchronized.

When using change streams:

  • DROP DATABASE and DROP COLLECTION

  • RENAME COLLECTION

  • Insert, update, and delete operations on documents

During incremental data synchronization, only update operations that use the $set command are synchronized.
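The $set-only rule above applies to both methods: field-level updates expressed with $set are synchronized, while other update shapes may not be. The following sketch (collection and field names are hypothetical) contrasts the two shapes and shows a simple client-side check:

```javascript
// A field-level update using $set: covered by the $set-only rule above.
const setUpdate = { $set: { status: "shipped", updatedAt: new Date(0) } };

// A full-document replacement (no update operators): not covered by the
// rule, so avoid it on collections that are being synchronized.
const replacement = { status: "shipped", updatedAt: new Date(0) };

// Returns true only if the update document uses the $set operator alone.
function usesOnlySet(update) {
  const keys = Object.keys(update);
  return keys.length > 0 && keys.every(k => k === "$set");
}

console.log(usesOnlySet(setUpdate));   // true
console.log(usesOnlySet(replacement)); // false
```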

Prerequisites

If the destination is a sharded cluster instance, complete the following steps before you start the DTS task:

  • Without schema synchronization: Manually create the databases and collections to be sharded, configure data sharding, enable the balancer, and perform pre-sharding. For more information, see Configure sharding to maximize the performance of shards.

  • With schema synchronization: After schema synchronization completes, enable the balancer and perform pre-sharding. For more information, see the FAQ.

Configuring sharding distributes synchronized data across different shards to maximize sharded cluster performance. Enabling the balancer and performing pre-sharding also prevents data skew.
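The sharding steps above map onto standard MongoDB admin commands. The following is a minimal sketch that only builds the command documents; the database, collection, and shard-key names are hypothetical, and in practice you would run each document with db.adminCommand(...) in mongosh against a mongos node of the destination instance:

```javascript
// Build the admin commands behind the pre-sharding steps described above.
function buildDestinationShardingPlan(dbName, collName, shardKey) {
  const ns = `${dbName}.${collName}`;
  return [
    { enableSharding: dbName },             // allow sharding on the database
    { shardCollection: ns, key: shardKey }, // shard the collection on the key
    { balancerStart: 1 }                    // re-enable the balancer afterwards
  ];
}

const plan = buildDestinationShardingPlan("appdb", "orders", { _id: "hashed" });
console.log(plan.map(c => Object.keys(c)[0]).join(",")); // "enableSharding,shardCollection,balancerStart"
```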

Limitations

Source and destination database requirements

Limitation Details
Bandwidth The source database server must have sufficient outbound bandwidth. Insufficient bandwidth reduces synchronization speed.
Unique constraints Collections to be synchronized must have primary key or unique key constraints with all fields unique. Otherwise, the destination may contain duplicate records.
_id uniqueness The _id field in each synchronized collection must be unique. Otherwise, data inconsistency may occur.
Collection limit If you select collections as synchronization objects and need to edit collections in the destination database, such as renaming them, a single task can synchronize a maximum of 1,000 collections. Exceeding this limit causes a request error. Configure multiple tasks in batches, or synchronize entire databases instead.
Document size A single document cannot exceed 16 MB. Larger documents cause the task to fail.
Unsupported sources Azure Cosmos DB for MongoDB clusters and Amazon DocumentDB elastic clusters are not supported as source databases.
Mongos nodes The number of mongos nodes in the source MongoDB sharded cluster cannot exceed 10.
Self-managed sharded cluster access If the source is a self-managed MongoDB sharded cluster, set Access Method to Express Connect, VPN Gateway, or Smart Access Gateway, or to Cloud Enterprise Network (CEN).
Scaling MongoDB sharded cluster databases in a running DTS task cannot be scaled. Scaling during the task causes it to fail.
TTL indexes If the source contains TTL indexes, data inconsistency may occur between the source and destination after synchronization.
Orphaned documents No orphaned documents can exist in the source or destination database. Their presence can cause data inconsistency or task failure. For more information, see Glossary of MongoDB and How do I delete orphaned documents of a MongoDB database deployed in the sharded cluster architecture?
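The 1,000-collection limit above means large object lists must be split across several DTS tasks. A minimal batching helper (collection names are hypothetical) illustrates the split:

```javascript
// Split a list of collections into DTS-task-sized batches of at most
// maxPerTask entries each, per the 1,000-collection limit above.
function splitIntoTasks(collections, maxPerTask = 1000) {
  const tasks = [];
  for (let i = 0; i < collections.length; i += maxPerTask) {
    tasks.push(collections.slice(i, i + maxPerTask));
  }
  return tasks;
}

// 2,500 collections -> 3 tasks of 1000, 1000, and 500 collections.
const colls = Array.from({ length: 2500 }, (_, i) => `coll_${i}`);
const tasks = splitIntoTasks(colls);
console.log(tasks.map(t => t.length).join(",")); // "1000,1000,500"
```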

oplog and change streams:

The source database must have oplog enabled and retain log data for at least 7 days, or have change streams enabled with at least 7 days of data change history. Without this, DTS may fail to obtain data changes, causing synchronization failure or data loss. Issues that occur in such circumstances are not covered by the DTS service level agreement (SLA).

Important
  • Use oplog to record data changes in the source database (recommended over change streams).

  • Change streams require MongoDB 4.0 or later and do not support two-way synchronization.

  • If the source is a non-elastic Amazon DocumentDB cluster, you must enable change streams and set Migration Method to ChangeStream and Architecture to Sharded Cluster.
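The 7-day retention requirement above can be checked against the timestamps of the first and last oplog entries (obtained in mongosh, for example via queries on the local.oplog.rs collection). The following sketch uses sample timestamp values, not real ones:

```javascript
// Compute the oplog window in hours from the first and last oplog entry
// timestamps (in seconds since the epoch).
function oplogWindowHours(tsFirstSeconds, tsLastSeconds) {
  return (tsLastSeconds - tsFirstSeconds) / 3600;
}

// The requirement above: at least 7 days (168 hours) of log data.
function meetsRetentionRequirement(hours, requiredDays = 7) {
  return hours >= requiredDays * 24;
}

const hours = oplogWindowHours(1_700_000_000, 1_700_700_000); // ~194.4 hours
console.log(meetsRetentionRequirement(hours)); // true
console.log(meetsRetentionRequirement(100));   // false
```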

Balancer:

Make sure that the MongoDB balancer of the source database is disabled during full data synchronization. Do not enable the balancer until all full data synchronization is complete and incremental data synchronization starts. Running the balancer during full sync may cause data inconsistency. For more information, see Manage the ApsaraDB for MongoDB balancer.

If the balancer of the source database is enabled to balance data, the DTS task may be delayed.
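The balancer rule above corresponds to the standard balancerStop, balancerStatus, and balancerStart admin commands. This block only builds and inspects the command documents; in practice you would run them with db.adminCommand(...) against the admin database of a mongos node:

```javascript
const stopBalancer  = { balancerStop: 1 };   // before full data synchronization
const checkBalancer = { balancerStatus: 1 }; // verify state during full sync
const startBalancer = { balancerStart: 1 };  // after incremental sync begins

// Pick the appropriate command for the current task phase, per the rule above.
function balancerCommandForPhase(phase) {
  return phase === "full" ? stopBalancer : startBalancer;
}

console.log(Object.keys(balancerCommandForPhase("full"))[0]);        // "balancerStop"
console.log(Object.keys(balancerCommandForPhase("incremental"))[0]); // "balancerStart"
```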

Schema changes during sync:

Do not modify the schemas of databases or collections, including array type updates, during schema synchronization or full data synchronization. Doing so may cause the task to fail or result in data inconsistency.

Writes during full sync:

Do not write data to the source database during a task that performs only full data synchronization (without incremental synchronization). Doing so causes data inconsistency.

Other limits

  • DTS cannot synchronize data from the admin, config, or local database.

  • The destination MongoDB version must be the same as or later than the source version. An earlier destination version may cause compatibility issues.

  • If the destination is a sharded cluster with data sharding already configured: Do not select Schema Synchronization in the Configure Objects step. Doing so may cause data inconsistency or task failure due to shard conflicts.

  • Add shard keys to all data to be synchronized in the source database before starting the task. During synchronization, INSERT operations must include shard keys, and UPDATE operations cannot modify shard keys.

  • Transaction information is not retained. Transactions are converted into individual records in the destination database.

  • If a primary key or unique key conflict occurs when DTS writes to the destination collection, DTS skips the write and retains the existing record in the destination.

  • We recommend that you synchronize data during off-peak hours. During full data synchronization, DTS uses read and write resources of the source and destination databases. This may increase the loads on the database servers.

  • During full data synchronization, concurrent INSERT operations cause fragmentation in destination collections. After full sync completes, the destination storage space may be larger than the source.

  • If a destination collection has a unique index or the capped attribute set to true, the collection supports only single-threaded writes during incremental synchronization. This may increase synchronization latency.

  • Concurrent writes to the destination database cause the destination storage to be 5%–10% larger than the source data size.

  • To query the row count on the destination MongoDB database, use: db.$table_name.aggregate([{ $count: "myCount" }]).

  • Make sure the destination MongoDB database does not already contain documents with the same primary key (_id) as the source. If duplicates exist, delete them from the destination without stopping the DTS task.

  • If a DTS task fails, DTS technical support attempts to restore it within 8 hours. During restoration, the task may be restarted and task parameters may be modified. Database parameters are not modified. For parameters that may be changed, see the "Modify instance parameters" section of the Modify the parameters of a DTS instance topic.
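The shard-key rules above (INSERT operations must include the shard key; UPDATE operations cannot modify it) can be expressed as a small client-side check. Field names and the operation shape are hypothetical illustrations:

```javascript
// Returns true if a write operation violates the shard-key rules above.
function violatesShardKeyRules(op, shardKeyField) {
  if (op.type === "insert") {
    // INSERT operations must include the shard key.
    return !(shardKeyField in op.document);
  }
  if (op.type === "update") {
    // UPDATE operations cannot modify the shard key via $set.
    const set = op.update.$set || {};
    return shardKeyField in set;
  }
  return false;
}

const badInsert  = { type: "insert", document: { _id: 1, name: "a" } };     // missing shard key
const goodInsert = { type: "insert", document: { _id: 1, region: "eu" } };  // includes shard key
const badUpdate  = { type: "update", update: { $set: { region: "us" } } };  // modifies shard key

console.log(violatesShardKeyRules(badInsert, "region"));  // true
console.log(violatesShardKeyRules(goodInsert, "region")); // false
console.log(violatesShardKeyRules(badUpdate, "region"));  // true
```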

Delete orphaned documents

Delete all orphaned documents from the source MongoDB database before starting the synchronization task.

Important

Orphaned documents in the source can degrade synchronization performance, introduce duplicate _id values, and cause unintended data to be synchronized.

ApsaraDB for MongoDB instances

Running the cleanup script on an ApsaraDB for MongoDB instance with a major version earlier than 4.2 or a minor version earlier than 4.0.6 causes an error. To check the version, see MongoDB minor version release notes. To upgrade, see Upgrade the major version of an instance and Update the minor version of an instance.

MongoDB 4.4 and later

  1. Create a JavaScript file named cleanupOrphaned.js on a server that can connect to the sharded cluster instance.

    This script deletes orphaned documents from all collections in multiple databases across multiple shards. To target a specific collection, modify the parameters in the script.
    Parameter Description
    shardNames The IDs of the shards to clean up. Find them in the Shard List section on the Basic Information page of the sharded cluster instance. Example: d-bp15a3796d3a****.
    databasesToProcess The names of the databases from which to delete orphaned documents.
    // The names of shards.
    var shardNames = ["shardName1", "shardName2"];
    // The databases from which you want to delete orphaned documents.
    var databasesToProcess = ["database1", "database2", "database3"];
    
    shardNames.forEach(function(shardName) {
        // Traverse the specified databases.
        databasesToProcess.forEach(function(dbName) {
            var dbInstance = db.getSiblingDB(dbName);
            // Obtain the names of all collections of the specified databases.
            var collectionNames = dbInstance.getCollectionNames();
    
            // Traverse all collections.
            collectionNames.forEach(function(collectionName) {
                // The complete collection name.
                var fullCollectionName = dbName + "." + collectionName;
                // Build the cleanupOrphaned command.
                var command = {
                    runCommandOnShard: shardName,
                    command: { cleanupOrphaned: fullCollectionName }
                };
    
                // Run the cleanupOrphaned command.
                var result = db.adminCommand(command);
                if (result.ok) {
                    print("Cleaned up orphaned documents for collection " + fullCollectionName + " on shard " + shardName);
                    printjson(result);
                } else {
                    print("Failed to clean up orphaned documents for collection " + fullCollectionName + " on shard " + shardName);
                }
            });
        });
    });

  2. In the directory where cleanupOrphaned.js is stored, run:

    Parameter Description
    <Mongoshost> The endpoint of the mongos node. Format: s-bp14423a2a51****.mongodb.rds.aliyuncs.com.
    <Primaryport> The port number of the mongos node. Default: 3717.
    <database> The name of the authentication database for the account.
    <username> The database account.
    <password> The password for the database account.
    output.txt The file that stores execution results.
    mongo --host <Mongoshost> --port <Primaryport> --authenticationDatabase <database> -u <username> -p <password> cleanupOrphaned.js > output.txt

MongoDB 4.2 and earlier

  1. Create a JavaScript file named cleanupOrphaned.js on a server that can connect to the sharded cluster instance.

    This script deletes orphaned documents from a specific collection across multiple shards. To process multiple collections, update the fullCollectionName parameter and run the script again, or modify the script to iterate over all collections.
    Parameter Description
    shardNames The IDs of the shards to clean up. Find them in the Shard List section on the Basic Information page of the sharded cluster instance. Example: d-bp15a3796d3a****.
    fullCollectionName The full name of the collection to clean up. Format: database name.collection name.
    function cleanupOrphanedOnShard(shardName, fullCollectionName) {
        var nextKey = { };
        var result;
    
        while ( nextKey != null ) {
            var command = {
                runCommandOnShard: shardName,
                command: { cleanupOrphaned: fullCollectionName, startingFromKey: nextKey }
            };
    
            result = db.adminCommand(command);
            printjson(result);
    
            if (result.ok != 1 || !(result.results.hasOwnProperty(shardName)) || result.results[shardName].ok != 1 ) {
                print("Unable to complete at this time: failure or timeout.")
                break
            }
    
            nextKey = result.results[shardName].stoppedAtKey;
        }
    
        print("cleanupOrphaned done for coll: " + fullCollectionName + " on shard: " + shardName)
    }
    
    var shardNames = ["shardName1", "shardName2", "shardName3"]
    var fullCollectionName = "database.collection"
    
    shardNames.forEach(function(shardName) {
        cleanupOrphanedOnShard(shardName, fullCollectionName);
    });

  2. In the directory where cleanupOrphaned.js is stored, run:

    Parameter Description
    <Mongoshost> The endpoint of the mongos node. Format: s-bp14423a2a51****.mongodb.rds.aliyuncs.com.
    <Primaryport> The port number of the mongos node. Default: 3717.
    <database> The name of the authentication database for the account.
    <username> The database account.
    <password> The password for the database account.
    output.txt The file that stores execution results.
    mongo --host <Mongoshost> --port <Primaryport> --authenticationDatabase <database> -u <username> -p <password> cleanupOrphaned.js > output.txt

Self-managed MongoDB databases

  1. Download the cleanupOrphaned.js script on a server that can connect to the self-managed MongoDB database:

    wget "https://docs-aliyun.cn-hangzhou.oss.aliyun-inc.com/assets/attach/120562/cn_zh/1564451237979/cleanupOrphaned.js"
  2. In the cleanupOrphaned.js file, replace test with the name of the database from which to delete orphaned documents.

    Important

    To process multiple databases, repeat steps 2 and 3 for each database.

  3. On each shard, run the following command to delete orphaned documents from all collections in the specified database:

    Parameter Description
    <Shardhost> The IP address of the shard.
    <Primaryport> The service port of the primary node in the shard.
    <database> The name of the authentication database for the account.
    <username> The account used to log in to the self-managed MongoDB database.
    <password> The password used to log in to the self-managed MongoDB database.
    mongo --host <Shardhost> --port <Primaryport> --authenticationDatabase <database> -u <username> -p <password> cleanupOrphaned.js

    Example: For a self-managed MongoDB database with three shards, run the command once per shard:

    mongo --host 172.16.1.10 --port 27018 --authenticationDatabase admin -u dtstest -p 'Test123456' cleanupOrphaned.js
    
    mongo --host 172.16.1.11 --port 27021 --authenticationDatabase admin -u dtstest -p 'Test123456' cleanupOrphaned.js
    
    mongo --host 172.16.1.12 --port 27024 --authenticationDatabase admin -u dtstest -p 'Test123456' cleanupOrphaned.js

Configure the synchronization task

Important

The following procedure configures a DTS task before purchasing a DTS instance. When you configure the task first, you do not need to specify the number of shards in the source sharded cluster. If you purchase a DTS instance before configuring the task, specify the number of shards at purchase time.

Step 1: Go to the data synchronization page

Use one of the following consoles:

DTS console

  1. Log in to the DTS console.

  2. In the left-side navigation pane, click Data Synchronization.

  3. In the upper-left corner, select the region where the synchronization task resides.

Data Management Service (DMS) console

The actual steps may vary based on the DMS console mode and layout. For more information, see Simple mode and Customize the layout and style of the DMS console.
  1. Log in to the DMS console.

  2. In the top navigation bar, move the pointer over Data + AI and choose DTS (DTS) > Data Synchronization.

  3. From the drop-down list to the right of Data Synchronization Tasks, select the region where the synchronization instance resides.

Step 2: Configure source and destination databases

  1. Click Create Task to open the task configuration page.

  2. Configure the task name, source database, and destination database using the following parameters:

    Section Parameter Description
    N/A Task Name A name for the DTS task. DTS generates a name automatically. Specify a descriptive name to make the task easy to identify. Unique names are not required.
    Source Database Select Existing Connection If the instance is registered with DTS, select it from the drop-down list — DTS populates the remaining parameters automatically. Otherwise, configure the parameters below. In the DMS console, select the instance from the Select a DMS database instance drop-down list.
    Database Type Select MongoDB.
    Access Method Select Alibaba Cloud Instance.
    Instance Region The region where the source ApsaraDB for MongoDB instance resides.
    Replicate Data Across Alibaba Cloud Accounts Select No if using the current Alibaba Cloud account.
    Architecture Select Sharded Cluster.
    Migration Method The method for synchronizing incremental data. Select based on your requirements: Oplog (recommended) — available if oplog is enabled. Oplog synchronizes incremental data at low latency and is enabled by default for both self-managed MongoDB databases and ApsaraDB for MongoDB instances. ChangeStream — available if change streams are enabled. For more information, see Change Streams. If the source is a non-elastic Amazon DocumentDB cluster, select ChangeStream only. If Architecture is set to Sharded Cluster, the Shard account and Shard password parameters are not required.
    Instance ID The ID of the source ApsaraDB for MongoDB instance.
    Authentication Database The name of the database that stores the account credentials. Default: admin.
    Database Account The account used to access the source database. The account must have read permissions on the source database and the config, admin, and local databases.
    Database Password The password for the database account.
    Shard account The account used to access the shard nodes. Required only for self-managed MongoDB databases.
    Shard password The password used to access the shard nodes. Required only for self-managed MongoDB databases.
    Encryption Whether to encrypt the connection. Options: Non-encrypted, SSL-encrypted, or Mongo Atlas SSL. Available options depend on the Access Method and Architecture settings — refer to the DTS console for the options displayed. If Architecture is Sharded Cluster and Migration Method is Oplog, SSL-encrypted is unavailable. If the source is a self-managed MongoDB database using replica set architecture with Encryption set to SSL-encrypted, upload a CA certificate to verify the connection.
    Destination Database Select Existing Connection If the instance is registered with DTS, select it from the drop-down list. Otherwise, configure the parameters below.
    Database Type Select MongoDB.
    Access Method Select Alibaba Cloud Instance.
    Instance Region The region where the destination ApsaraDB for MongoDB instance resides.
    Replicate Data Across Alibaba Cloud Accounts Select No if using the current Alibaba Cloud account.
    Architecture The architecture of the destination instance.
    Instance ID The ID of the destination ApsaraDB for MongoDB instance.
    Authentication Database The name of the database that stores the account credentials. Default: admin.
    Database Account The account used to access the destination database. The account must have the dbAdminAnyDatabase permission, read and write permissions on the destination database, and read permissions on the local database.
    Database Password The password for the database account.
    Encryption Whether to encrypt the connection. Options: Non-encrypted, SSL-encrypted, or Mongo Atlas SSL. If the destination is an ApsaraDB for MongoDB sharded cluster, SSL-encrypted is unavailable. If the destination is a self-managed MongoDB database using replica set architecture with Encryption set to SSL-encrypted, upload a CA certificate to verify the connection.
  3. Click Test Connectivity and Proceed.

    • Make sure DTS server CIDR blocks are added to the security settings of both the source and destination databases. For more information, see Add the CIDR blocks of DTS servers.

    • If the source or destination is a self-managed database with an access method other than Alibaba Cloud Instance, click Test Connectivity in the CIDR Blocks of DTS Servers dialog box.

Step 3: Configure synchronization objects

  1. In the Configure Objects step, set the following parameters:

    Parameter Description
    Synchronization Types Select all three types: Schema Synchronization, Full Data Synchronization, and Incremental Data Synchronization. After the precheck completes, DTS synchronizes historical data from the source to the destination as the basis for incremental synchronization. If data sharding is already configured in the destination sharded cluster and you do not need DTS schema synchronization, do not select Schema Synchronization. Doing so may cause data inconsistency or task failure due to shard conflicts.
    Processing Mode of Conflicting Tables Precheck and Report Errors: checks whether the destination already has collections with the same names as in the source. If matching names are found, an error is returned and the task cannot start. To resolve naming conflicts without deleting or renaming destination collections, use the object name mapping feature. For more information, see Rename an object to be synchronized. Ignore Errors and Proceed: skips the name conflict check. If a record in the destination has the same primary key or unique key as a record in the source, DTS does not overwrite the destination record.
    Warning

    Selecting this option may cause data inconsistency.

    Synchronization Topology Select One-way Synchronization.
    Capitalization of Object Names in Destination Instance The capitalization policy for database and collection names in the destination. Default: DTS default policy. For more information, see Specify the capitalization of object names in the destination instance.
    Source Objects Select the databases or collections to synchronize, then click the arrow icon to add them to Selected Objects.
    Selected Objects To rename a synchronized object in the destination or map it to a different object, right-click it in Selected Objects. For more information, see Map object names. To remove an object, click it and then click the remove icon to move it back to Source Objects. To configure incremental sync by database or collection, right-click Selected Objects and set options in the dialog box. To filter data in a collection, right-click it in Selected Objects and configure filter conditions. Filters apply only during full data synchronization, not during incremental synchronization. For more information, see Specify filter conditions. If you use object name mapping to rename databases or collections, other objects that depend on them may fail to synchronize.
  2. Click Next: Advanced Settings.

Step 4: Configure advanced settings

Parameter Description
Dedicated Cluster for Task Scheduling By default, DTS schedules the task to the shared cluster. To improve stability, purchase a dedicated cluster. For more information, see What is a DTS dedicated cluster.
Retry Time for Failed Connections The time range for DTS to retry failed connections. Valid values: 10–1440 minutes. Default: 720 minutes. Set this to at least 30 minutes. If DTS reconnects within the specified window, the task resumes. Otherwise, it fails. If multiple tasks share the same source or destination database, the shortest retry window takes precedence. DTS charges apply during the retry period.
Retry Time for Other Issues The time range for DTS to retry failed DDL or DML operations. Valid values: 1–1440 minutes. Default: 10 minutes. Set this to at least 10 minutes. This value must be less than Retry Time for Failed Connections.
Enable Throttling for Full Data Synchronization Limits the read/write load on source and destination databases during full synchronization. Configure Queries per second (QPS) to the source database, RPS of Full Data Migration, and Data migration speed for full migration (MB/s) to reduce load. Available only when Full Data Synchronization is selected.
Only one data type for primary key _id in a single table Indicates whether _id in a collection uses a single data type. Yes: DTS skips scanning the _id data type and synchronizes only one data type per collection. No: DTS scans all _id data types and synchronizes all data. Configure this based on your actual data. Incorrect configuration may cause data loss. Available only when Full Data Synchronization is selected.
Enable Throttling for Incremental Data Synchronization Limits the load on the destination database during incremental synchronization. Configure RPS of Incremental Data Synchronization and Data synchronization speed for incremental synchronization (MB/s).
Environment Tag A tag to identify the DTS instance. Select based on your requirements.
Configure ETL Whether to enable the extract, transform, and load (ETL) feature. Yes: enables ETL. Enter data processing statements in the code editor. For more information, see Configure ETL in a data migration or data synchronization task. No: disables ETL. For more information about ETL, see What is ETL?
Monitoring and Alerting Whether to configure alerts for the task. Yes: sends alerts when the task fails or synchronization latency exceeds the threshold. Configure the alert threshold and notification settings. For more information, see the "Configure monitoring and alerting when you create a DTS task" section of the Configure monitoring and alerting topic. No: disables alerting.

Step 5: Configure data verification (optional)

Click Next Step: Data Verification to configure data verification. For more information, see Configure a data verification task.

Step 6: Save settings and run a precheck

  • To view the API parameters for this task configuration, hover over Next: Save Task Settings and Precheck and click Preview OpenAPI parameters.

  • Click Next: Save Task Settings and Precheck.

DTS runs a precheck before starting the synchronization task. The task can start only after it passes the precheck.

  • If the precheck fails, click View Details next to each failed item, diagnose the issue, fix it, and run the precheck again.

  • If an alert appears during the precheck:

    • For alerts that cannot be ignored: click View Details, fix the issue, and run the precheck again.

    • For alerts that can be ignored: click Confirm Alert Details, click Ignore in the View Details dialog box, click OK, and then click Precheck Again. Ignoring alerts may cause data inconsistency.

Step 7: Purchase an instance

  1. Wait until Success Rate reaches 100%, then click Next: Purchase Instance.

  2. On the buy page, configure the following parameters:

    Section Parameter Description
    New Instance Class Billing Method Subscription: pay upfront for a set period. More cost-effective for long-term use. Pay-as-you-go: billed hourly. Suitable for short-term use. Release the instance when no longer needed to stop charges.
    Resource Group Settings The resource group for the synchronization instance. Default: default resource group. For more information, see What is Resource Management?
    Instance Class DTS provides instance classes with varying synchronization speeds. Select based on your requirements. For more information, see Instance classes of data synchronization instances.
    Subscription Duration Available only for Subscription billing. Select 1–9 months, 1 year, 2 years, 3 years, or 5 years. Also specify the number of instances to create.
  3. Read and select Data Transmission Service (Pay-as-you-go) Service Terms.

  4. Click Buy and Start, then click OK in the confirmation dialog box.

The task appears in the task list. Once the precheck passes and the instance is purchased, DTS starts the synchronization task automatically.

What's next

  • Monitor synchronization status and latency in the DTS console task list.

  • After full data synchronization completes and incremental synchronization starts, re-enable the MongoDB balancer on the source database if needed.

  • If the destination is a sharded cluster, enable the balancer and perform pre-sharding to prevent data skew. For more information, see Configure sharding to maximize the performance of shards.