Use Data Transmission Service (DTS) to keep a MongoDB replica set instance continuously in sync with another replica set or sharded cluster instance, with no application downtime.
Supported source and destination databases
DTS supports the following source and destination combinations. The primary example in this topic uses ApsaraDB for MongoDB instances, but the same procedure applies to the other combinations.
| Source | Destination | Notes |
|---|---|---|
| ApsaraDB for MongoDB replica set instance | ApsaraDB for MongoDB replica set or sharded cluster instance | Primary example in this topic. If the destination is a sharded cluster, configure shard keys and enable the balancer before starting the task. |
| Self-managed MongoDB replica set on Elastic Compute Service (ECS) | Self-managed MongoDB replica set or sharded cluster on ECS | |
| Self-managed MongoDB replica set connected over Express Connect, VPN Gateway, or Smart Access Gateway | Self-managed MongoDB replica set or sharded cluster connected over Express Connect, VPN Gateway, or Smart Access Gateway | |
Prerequisites
Before you create a synchronization task, make sure the following requirements are met:
The source ApsaraDB for MongoDB replica set instance and the destination ApsaraDB for MongoDB replica set or sharded cluster instance are created. For more information, see Create a replica set instance and Create a sharded cluster instance.
The available storage space of the destination instance is at least 10% larger than the total size of the data in the source instance.
If the destination instance is a sharded cluster, create the databases and collections to be sharded, configure data sharding, enable the balancer, and perform pre-sharding before starting the task. For more information, see Configure sharding to maximize the performance of shards and the FAQ topic.
Configuring sharding distributes synchronized data across shards, maximizing sharded cluster performance and preventing data skew.
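The storage prerequisite above can be checked with simple arithmetic. The following is a minimal sketch, not part of DTS: the function name and parameters are hypothetical, and it assumes you have already obtained the source data size and the destination's available space yourself (for example, from the console). Integer arithmetic is used so that the 10% margin is not affected by floating-point rounding.

```python
def has_enough_storage(source_data_bytes: int,
                       destination_free_bytes: int,
                       headroom_percent: int = 10) -> bool:
    """Return True if destination free space is at least
    (100 + headroom_percent)% of the source data size."""
    if source_data_bytes < 0 or destination_free_bytes < 0:
        raise ValueError("sizes must be non-negative")
    # Compare in integer space: free * 100 >= source * 110 for a 10% margin.
    return destination_free_bytes * 100 >= source_data_bytes * (100 + headroom_percent)


if __name__ == "__main__":
    gib = 1024 ** 3
    # 100 GiB of source data needs at least 110 GiB free in the destination.
    print(has_enough_storage(100 * gib, 110 * gib))
    print(has_enough_storage(100 * gib, 105 * gib))
```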
Billing
| Synchronization type | Cost |
|---|---|
| Schema synchronization and full data synchronization | Free |
| Incremental data synchronization | Charged. For details, see Billing overview. |
Synchronization types
| Synchronization type | Description |
|---|---|
| Schema synchronization | Copies the schemas of selected databases and collections from source to destination. |
| Full data synchronization | Copies historical data of selected databases and collections. Supports databases and collections as synchronization objects. |
| Incremental data synchronization | Continuously replicates changes from source to destination after full data synchronization completes. |
Supported synchronization topologies
One-way one-to-one synchronization
One-way one-to-many synchronization
One-way many-to-one synchronization
One-way cascade synchronization
For a full description of each topology, see Synchronization topologies.
Supported incremental operations
oplog
The following operations are captured when DTS uses oplog:
CREATE COLLECTION and CREATE INDEX
DROP DATABASE, DROP COLLECTION, and DROP INDEX
RENAME COLLECTION
Insert, update, and delete operations on documents
DTS does not synchronize incremental data from databases created after the task starts. For file-level incremental data, only the `$set` command is supported.
Change streams
The following operations are captured when DTS uses change streams:
DROP DATABASE and DROP COLLECTION
RENAME COLLECTION
Insert, update, and delete operations on documents
For file-level incremental data, only the `$set` command is supported.
Limitations
Review the following limitations before you configure a task.
Source database limitations
Bandwidth: The server hosting the source database must have enough outbound bandwidth. Insufficient bandwidth slows synchronization.
Primary key or unique constraint required: Collections must have a PRIMARY KEY or UNIQUE constraint with all fields unique. Without this, duplicate records may appear in the destination.
Collection limit: If you select collections as synchronization objects and need to rename them in the destination, a single task can synchronize up to 1,000 collections. For more than 1,000 collections, split the work across multiple tasks, or synchronize at the database level instead.
Document size limit: Individual documents cannot exceed 16 MB. Oversized documents cause the task to fail.
Unsupported source databases: Azure Cosmos DB for MongoDB clusters and Amazon DocumentDB elastic clusters are not supported as source databases.
oplog retention — at least 7 days required: The oplog must be enabled and retain data for at least 7 days so that DTS can read changes that occurred before the task caught up. If the oplog is flushed before DTS reads it, the task fails and data loss may occur. This type of failure falls outside the DTS service level agreement (SLA). Alternatively, enable change streams. Keep in mind:
Change streams require MongoDB 4.0 or later.
Two-way synchronization is not supported when using change streams.
For non-elastic Amazon DocumentDB clusters, change streams are required. Set Migration Method to ChangeStream and Architecture to Sharded Cluster.
oplog is the recommended method. It pulls logs faster, resulting in lower synchronization latency.
TTL indexes: If the source has TTL indexes, data may become inconsistent between source and destination after synchronization.
Schema changes during schema or full synchronization: Do not change database or collection schemas (including array type updates) while schema synchronization or full data synchronization is running. Doing so may cause the task to fail or result in data inconsistency.
Writes during full-only synchronization: If you run only full data synchronization (without incremental), do not write to the source database during the process. Concurrent writes cause data inconsistency.
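Two of the source-side limits above lend themselves to a quick check before you create a task: the 1,000-collection limit per task (which applies when you select collections and rename them in the destination) and the 7-day oplog retention requirement. The sketch below is illustrative only. The function names are hypothetical, and it assumes you supply the collection names and the timestamps of the oldest and newest oplog entries yourself.

```python
from datetime import datetime, timedelta
from typing import List, Sequence

MAX_COLLECTIONS_PER_TASK = 1000          # limit stated in this topic
MIN_OPLOG_WINDOW = timedelta(days=7)     # retention stated in this topic


def split_into_tasks(collections: Sequence[str],
                     limit: int = MAX_COLLECTIONS_PER_TASK) -> List[List[str]]:
    """Group collection names into batches of at most `limit`, one batch per task."""
    return [list(collections[i:i + limit])
            for i in range(0, len(collections), limit)]


def oplog_window_ok(oldest_entry: datetime,
                    newest_entry: datetime,
                    minimum: timedelta = MIN_OPLOG_WINDOW) -> bool:
    """Return True if the oplog spans at least `minimum` of history."""
    return (newest_entry - oldest_entry) >= minimum


if __name__ == "__main__":
    names = [f"coll_{i}" for i in range(2500)]
    print([len(batch) for batch in split_into_tasks(names)])
    print(oplog_window_ok(datetime(2024, 1, 1), datetime(2024, 1, 8)))
```

Note that the oplog window reflects how much history is currently retained. It shrinks as the write rate grows, so a check like this is only a snapshot.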
Sharded cluster destination limitations
Orphaned documents: Clear all orphaned documents before starting the task. Orphaned documents degrade synchronization performance, and `_id` conflicts may cause data inconsistency or task failure.
Shard keys required before task starts: Add shard keys to all data before starting the task. If adding shard keys is not possible, see Synchronize MongoDB (without shard keys) to MongoDB (sharded cluster architecture).
INSERT operations: Documents must include the shard key.
UPDATE operations: Shard keys cannot be modified.
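The INSERT and UPDATE rules above can be expressed as two small checks that an application might run on its own writes while a task is active. This is a sketch under stated assumptions: the helper names are hypothetical, only top-level field names are considered (dotted paths are not resolved), and updates are assumed to use operator form such as `{"$set": {...}}`.

```python
from typing import Any, Iterable, List, Mapping


def missing_shard_key_fields(document: Mapping[str, Any],
                             shard_key: Iterable[str]) -> List[str]:
    """Return the shard key fields that are absent from the document."""
    return [field for field in shard_key if field not in document]


def update_touches_shard_key(update: Mapping[str, Mapping[str, Any]],
                             shard_key: Iterable[str]) -> bool:
    """Return True if any update operator targets a shard key field."""
    keys = set(shard_key)
    return any(field in keys
               for fields in update.values()
               for field in fields)


if __name__ == "__main__":
    key = ["region", "user_id"]
    print(missing_shard_key_fields({"_id": 1, "region": "eu"}, key))
    print(update_touches_shard_key({"$set": {"region": "us"}}, key))
```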
Replica set destination limitations
Connection over Express Connect, VPN Gateway, Smart Access Gateway, public IP address, or Cloud Enterprise Network (CEN): Set Domain Name or IP and Port Number to the IP address and port of the primary node, or configure a high-availability endpoint. For more information, see Create a DTS task with a high-availability MongoDB database.
Self-managed database on ECS: Set Port Number to the primary node's port.
General limitations
Version compatibility: The destination MongoDB version must be the same as or later than the source version. An older destination version may cause compatibility issues.
SRV connection strings: DTS cannot connect to MongoDB using an SRV connection string.
Excluded databases: DTS cannot synchronize data from the `admin`, `config`, or `local` database.
Capped collections and unique indexes: Collections with a unique index or the `capped` attribute set to `true` support only single-thread writes and do not support concurrent replay during incremental synchronization. This may increase synchronization latency.
Transactions: Transaction information is not retained. Transactions in the source are written as individual records in the destination.
Primary key or unique key conflicts: If a conflict occurs during a write to the destination, DTS skips the conflicting write and retains the existing data in the destination.
Synchronization timing: Full data synchronization uses read and write resources on both source and destination. Run synchronization during off-peak hours to reduce load. After full data synchronization, concurrent INSERT operations may cause fragmentation, so the destination storage may be slightly larger than the source.
Data from other sources during synchronization: Writing data from other sources to the destination while DTS is running may cause data inconsistency or loss.
Post-synchronization storage size: The destination database uses 5%–10% more storage space than the source after synchronization.
Count queries: Use `db.$table_name.aggregate([{ $count: "myCount" }])` to query document counts in the destination MongoDB database.
Duplicate primary keys: The default primary key is `_id`. Make sure no documents in the destination share an `_id` with documents in the source before starting the task. If duplicates exist, remove the conflicting documents from the destination first.
Task failure recovery: If a task fails, DTS technical support will attempt to restore it within 8 hours. The task may be restarted and task parameters may be modified during restoration.
Only DTS task parameters may be modified. Database parameters are not changed. For the parameters that may be modified, see Modify the parameters of a DTS instance.
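The `$count` aggregation recommended above can also be issued from a script to compare source and destination. The sketch below is an assumption-laden illustration: `count_documents` and `counts_match` are hypothetical helpers that accept any object exposing an `aggregate(pipeline)` method (a driver collection object such as PyMongo's accepts the same pipeline), and `InMemoryCollection` is a stand-in used only so the example runs without a database.

```python
from typing import Any, Dict, Iterable, Iterator, List


def count_documents(collection: Any, field: str = "myCount") -> int:
    """Count documents with a single-stage $count pipeline."""
    pipeline = [{"$count": field}]
    for result in collection.aggregate(pipeline):
        return int(result[field])
    # $count emits no result document when the input is empty.
    return 0


def counts_match(source: Any, destination: Any) -> bool:
    """Return True if both collections report the same document count."""
    return count_documents(source) == count_documents(destination)


class InMemoryCollection:
    """Stand-in for demonstration; supports only a single $count stage."""

    def __init__(self, documents: Iterable[Dict[str, Any]]) -> None:
        self._documents = list(documents)

    def aggregate(self, pipeline: List[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
        (stage,) = pipeline
        field = stage["$count"]
        results = [{field: len(self._documents)}] if self._documents else []
        return iter(results)


if __name__ == "__main__":
    src = InMemoryCollection([{"_id": 1}, {"_id": 2}])
    dst = InMemoryCollection([{"_id": 1}, {"_id": 2}])
    print(counts_match(src, dst))
```

Matching counts are a necessary but not sufficient sign of consistency. For a content-level comparison, use a data verification task.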
Self-managed source database limitations
Primary/secondary switchover: If a primary/secondary switchover occurs on the source while the task is running, the task fails.
Latency accuracy: DTS calculates synchronization latency based on the timestamp of the last synchronized document in the destination versus the current timestamp in the source. If no updates occur on the source for a long time, the latency reading may be inaccurate. To correct the latency reading, perform an update on the source.
If you synchronize an entire database, create a heartbeat table. DTS updates the heartbeat table every second, keeping the latency reading accurate.
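The latency rule above explains why an idle source inflates the reading. The following sketch models the described calculation. It is an interpretation of the text, not DTS source code, and the function name is hypothetical.

```python
from datetime import datetime, timedelta


def reported_latency(last_synced_doc_time: datetime,
                     source_now: datetime) -> timedelta:
    """Latency as described: source clock minus the timestamp of the
    last document synchronized to the destination (never negative)."""
    return max(source_now - last_synced_doc_time, timedelta(0))


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, 0)
    # Source idle for 10 minutes: the reading grows although nothing is pending.
    print(reported_latency(datetime(2024, 1, 1, 11, 50, 0), now))
    # With a heartbeat written every second, the last synced document is recent.
    print(reported_latency(now - timedelta(seconds=1), now))
```

In the first call, nothing is waiting to be synchronized, yet the reading is ten minutes. A heartbeat write every second keeps the last synchronized timestamp close to the source clock, which is why the reading stays accurate.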
Create a data synchronization task
Step 1: Go to the Data Synchronization page
Use one of the following consoles to navigate to the Data Synchronization page.
DTS console
Log on to the DTS console.
In the left-side navigation pane, click Data Synchronization.
In the upper-left corner, select the region where the synchronization task will run.
DMS console
The exact navigation path depends on the DMS console mode and layout. For details, see Simple mode and Customize the layout and style of the DMS console.
Log on to the DMS console.
In the top navigation bar, move the pointer over Data + AI and choose DTS (DTS) > Data Synchronization.
From the drop-down list to the right of Data Synchronization Tasks, select the region where the synchronization instance resides.
Step 2: Configure source and destination databases
Click Create Task.
On the task configuration page, configure the source and destination databases.
Warning: After configuring the source and destination databases, read the Limits displayed on the page. Skipping this step may cause the task to fail or result in data inconsistency.
Source database parameters:
| Parameter | Description |
|---|---|
| Task Name | A name for the DTS task. DTS generates a name automatically. Specify a descriptive name for easier identification. The name does not need to be unique. |
| Select Existing Connection | If the source instance is registered with DTS, select it from the drop-down list, and DTS fills in the remaining parameters automatically. Otherwise, configure the parameters below. In the DMS console, use the Select a DMS database instance drop-down list. |
| Database Type | Select MongoDB. |
| Connection Type | Select Alibaba Cloud Instance. |
| Instance Region | The region where the source ApsaraDB for MongoDB instance resides. |
| Replicate Data Across Alibaba Cloud Accounts | Select No (this example uses a single Alibaba Cloud account). |
| Architecture | Select Replica Set. |
| Migration Method | The method DTS uses to capture incremental changes. Oplog (recommended) requires the oplog feature to be enabled on the source. ChangeStream requires change streams to be enabled. For details on oplog, see the oplog section above. Note: If you select Sharded Cluster for the Architecture parameter, you do not need to configure the Shard account and Shard password parameters. |
| Instance ID | The ID of the source ApsaraDB for MongoDB instance. |
| Authentication Database | The database that stores account credentials. Default: `admin`. |
| Database Account | The account must have read permissions on the source database and the `config`, `admin`, and `local` databases. |
| Database Password | The password for the database account. |
| Encryption | Choose Non-encrypted, SSL-encrypted, or Mongo Atlas SSL based on your requirements. Available options depend on the Connection Type and Architecture settings; the DTS console shows only valid options. If Architecture is Sharded Cluster and Migration Method is Oplog, SSL-encrypted is unavailable. If the source is a self-managed replica set not using Alibaba Cloud Instance access and SSL-encrypted is selected, upload a certification authority (CA) certificate to verify the connection. |

Destination database parameters:
| Parameter | Description |
|---|---|
| Select Existing Connection | If the destination instance is registered with DTS, select it from the drop-down list. Otherwise, configure the parameters below. |
| Database Type | Select MongoDB. |
| Connection Type | Select Alibaba Cloud Instance. |
| Instance Region | The region where the destination ApsaraDB for MongoDB instance resides. |
| Replicate Data Across Alibaba Cloud Accounts | Select No (this example uses a single Alibaba Cloud account). |
| Architecture | The architecture of the destination instance (replica set or sharded cluster). |
| Instance ID | The ID of the destination ApsaraDB for MongoDB instance. |
| Authentication Database | The database that stores account credentials. Default: `admin`. |
| Database Account | The account must have the `dbAdminAnyDatabase` permission, read and write permissions on the destination database, and read permissions on the `local` database. |
| Database Password | The password for the database account. |
| Encryption | Choose Non-encrypted, SSL-encrypted, or Mongo Atlas SSL. If the destination is an ApsaraDB for MongoDB sharded cluster, SSL-encrypted is unavailable. If the destination is a self-managed replica set not using Alibaba Cloud Instance access and SSL-encrypted is selected, upload a CA certificate. |

Click Test Connectivity and Proceed.
DTS servers need access to your source and destination databases. Add the DTS server CIDR blocks to the security settings of both databases. For details, see Add the CIDR blocks of DTS servers. If the source or destination is a self-managed database (not Alibaba Cloud Instance access), click Test Connectivity in the CIDR Blocks of DTS Servers dialog box.
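Before testing connectivity, you can check whether a database allowlist already covers the CIDR blocks that DTS requires. The sketch below uses only Python's standard `ipaddress` module. The function name is hypothetical, and the addresses are documentation placeholders, not real DTS server ranges. Take the actual blocks from Add the CIDR blocks of DTS servers.

```python
import ipaddress
from typing import Iterable, List


def uncovered_blocks(required_blocks: Iterable[str],
                     allowlist: Iterable[str]) -> List[str]:
    """Return the required CIDR blocks that no single allowlist entry contains."""
    allowed = [ipaddress.ip_network(entry, strict=False) for entry in allowlist]
    missing = []
    for block in required_blocks:
        network = ipaddress.ip_network(block, strict=False)
        covered = any(
            # subnet_of() raises TypeError across IP versions, so compare versions first.
            network.version == entry.version and network.subnet_of(entry)
            for entry in allowed
        )
        if not covered:
            missing.append(block)
    return missing


if __name__ == "__main__":
    required = ["192.0.2.0/25", "198.51.100.0/24"]   # placeholder blocks
    current_allowlist = ["192.0.2.0/24"]             # placeholder allowlist
    print(uncovered_blocks(required, current_allowlist))
```

This check treats a block as covered only when one allowlist entry contains it entirely. A block covered by the union of several smaller entries is reported as missing.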
Step 3: Select objects to synchronize
In the Configure Objects step, set the following parameters:
| Parameter | Description |
|---|---|
| Synchronization Type | Incremental Data Synchronization is selected by default. Also select Schema Synchronization and Full Data Synchronization. DTS first synchronizes historical data (the baseline), then continuously replicates changes on top of it. |
| Processing Mode for Existing Destination Tables | Precheck and Report Errors: checks whether the destination has collections with the same names as the source. If identical names exist, the precheck fails and the task cannot start. If needed, use the object name mapping feature to rename conflicting collections. See Rename an object to be synchronized. Ignore Errors and Proceed: skips the conflict check. If a destination record shares a primary key or unique key with a source record, DTS does not overwrite it, and the existing destination record is kept. Warning: This mode may cause data inconsistency. Data may fail to be initialized, only specific columns are synchronized, or the data synchronization instance may fail. |
| Synchronization Topology | Select One-way Synchronization. |
| Case Policy for Destination Object Names | Controls how database and collection names are capitalized in the destination. Default: DTS default policy. For details, see Specify the capitalization of object names. |
| Source Objects | Select databases or collections to synchronize, then click the icon to move them to Selected Objects. |
| Selected Objects | Right-click an object to configure its name in the destination or specify which destination object receives the data. To remove an object, click it and then click the icon to move it back. Right-click to filter data by condition during full data synchronization. Filtering is not available during incremental synchronization. For details, see Specify filter conditions. Renaming a database or collection in the destination may break other objects that depend on it. |

Click Next: Advanced Settings.
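The Ignore Errors and Proceed behavior described in the Configure Objects step (a conflicting write is skipped and the existing destination record is kept) can be modeled in a few lines. This is a conceptual sketch only. It uses a dictionary keyed by `_id` as a stand-in for the destination collection, and the function name is hypothetical. It does not reflect how DTS is implemented.

```python
from typing import Any, Dict, Iterable, List

Document = Dict[str, Any]


def apply_ignoring_conflicts(destination: Dict[Any, Document],
                             incoming: Iterable[Document]) -> List[Any]:
    """Write incoming documents, skipping any whose _id already exists.

    Returns the _id values that were skipped.
    """
    skipped = []
    for doc in incoming:
        key = doc["_id"]
        if key in destination:
            skipped.append(key)       # existing destination record is kept
        else:
            destination[key] = doc    # no conflict: write the document
    return skipped


if __name__ == "__main__":
    dest = {1: {"_id": 1, "v": "old"}}
    skipped_ids = apply_ignoring_conflicts(
        dest, [{"_id": 1, "v": "new"}, {"_id": 2, "v": "x"}]
    )
    print(skipped_ids, dest[1]["v"])
```

The skipped list is why this mode can leave source and destination inconsistent: the source value for a conflicting `_id` never reaches the destination.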
Step 4: Configure advanced settings
| Parameter | Description |
|---|---|
| Dedicated Cluster for Task Scheduling | By default, DTS schedules to the shared cluster. For higher stability, purchase a dedicated cluster. See What is a DTS dedicated cluster. |
| Retry Time for Failed Connections | How long DTS retries after a connection failure. Range: 10–1440 minutes. Default: 720. Set to more than 30 minutes. If DTS reconnects within this window, the task resumes automatically. If not, the task fails. If multiple tasks share a source or destination database, the shortest retry window applies. Charges continue during retries. We recommend that you release the DTS instance promptly after the source and destination instances are released. |
| Retry Time for Other Issues | How long DTS retries after DDL or DML failures. Range: 1–1440 minutes. Default: 10. Set to more than 10 minutes. Must be less than Retry Time for Failed Connections. |
| Enable Throttling for Full Data Synchronization | Throttle full data synchronization to reduce load on source and destination databases. Configure Queries per second (QPS) to the source database, RPS of Full Data Migration, and Data migration speed for full migration (MB/s). Available only when Full Data Synchronization is selected. |
| Only one data type for primary key _id in a single table | Whether all documents in a collection use the same data type for `_id`. Yes: DTS does not scan `_id` data types and, for each collection, synchronizes only the documents whose `_id` is of one data type. No: DTS scans `_id` data types and synchronizes all documents. Set this parameter based on your data, because an incorrect configuration may cause data loss. Available only when Full Data Synchronization is selected. |
| Enable Throttling for Incremental Data Synchronization | Throttle incremental data synchronization. Configure RPS of Incremental Data Synchronization and Data synchronization speed for incremental synchronization (MB/s). |
| Environment Tag | An optional tag to identify the DTS instance environment. |
| Configure ETL | Enable the extract, transform, and load (ETL) feature. Yes: enter data processing statements in the code editor. See Configure ETL in a data migration or synchronization task. No: skip ETL. |
| Monitoring and Alerting | Configure alerts for task failures or high synchronization latency. Yes: set an alert threshold and notification contacts. See Configure monitoring and alerting. No: skip alerting. |
Click Next Step: Data Verification to optionally configure data verification. For details, see Configure a data verification task.
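The two retry parameters in the advanced settings have ranges and an ordering rule that are easy to get wrong. The sketch below encodes only the constraints stated in the table (ranges in minutes, and Retry Time for Other Issues being less than Retry Time for Failed Connections). The function name is hypothetical and is not part of any DTS API.

```python
from typing import List


def check_retry_settings(connection_retry: int, other_retry: int) -> List[str]:
    """Return a list of violated constraints; an empty list means the values are valid."""
    problems = []
    if not 10 <= connection_retry <= 1440:
        problems.append("Retry Time for Failed Connections must be 10-1440 minutes")
    if not 1 <= other_retry <= 1440:
        problems.append("Retry Time for Other Issues must be 1-1440 minutes")
    if other_retry >= connection_retry:
        problems.append(
            "Retry Time for Other Issues must be less than "
            "Retry Time for Failed Connections"
        )
    return problems


if __name__ == "__main__":
    print(check_retry_settings(720, 10))   # the documented defaults
    print(check_retry_settings(30, 60))
```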
Step 5: Run the precheck
To preview the API parameters for this task, hover over Next: Save Task Settings and Precheck and click Preview OpenAPI parameters. Otherwise, click Next: Save Task Settings and Precheck directly.
Wait for the precheck to complete.
If the task passes, proceed to step 6.
If the task fails, click View Details next to each failed item, resolve the issue, and click Precheck Again.
If an alert is triggered and the alert item can be ignored, click Confirm Alert Details, then Ignore, then OK, and then Precheck Again. Ignoring alerts may cause data inconsistency.
Step 6: Purchase an instance
Wait until Success Rate reaches 100%, then click Next: Purchase Instance.
On the purchase page, configure the following parameters:
| Parameter | Description |
|---|---|
| Billing Method | Subscription: pay upfront for a fixed term. More cost-effective for long-term use. Pay-as-you-go: billed hourly. Suitable for short-term use. Release the instance when it is no longer needed to stop charges. |
| Resource Group Settings | The resource group for this instance. Default: default resource group. For details, see What is Resource Management? |
| Instance Class | Determines synchronization speed. Select based on your data volume and latency requirements. For details, see Instance classes of data synchronization instances. |
| Subscription Duration | (Subscription billing only) Choose 1–9 months, or 1, 2, 3, or 5 years. |

Read and select Data Transmission Service (Pay-as-you-go) Service Terms.
Click Buy and Start, then click OK in the confirmation dialog box.
The task appears in the task list. You can monitor its progress there.
What's next
Monitor synchronization latency and task status in the DTS console.
After the task stabilizes, configure monitoring and alerting to get notified of failures or high latency. See Configure monitoring and alerting.
To verify data consistency between source and destination, run a data verification task. See Configure a data verification task.