Data Transmission Service (DTS) supports migrating data from an ApsaraDB for MongoDB instance with a replica set architecture (no shard keys) to an ApsaraDB for MongoDB instance with a sharded cluster architecture. During migration, you can assign default shard key values to collections that lack shard keys.
Prerequisites
Before you begin, make sure that:
-
The destination ApsaraDB for MongoDB instance (sharded cluster architecture) is created. For more information, see Create a sharded cluster instance.
-
The storage capacity of the destination instance exceeds that of the source instance by at least 10%.
-
(Optional) To prevent all data from landing on the same shard, create the target databases and collections in the destination instance, configure data sharding, enable the Balancer, and run pre-sharding. For more information, see Configure sharding to maximize shard performance and Address uneven data distribution in a MongoDB sharded cluster.
-
If the source is a sharded cluster instance, get the endpoint for each shard node and make sure the accounts and passwords are consistent across all shards. For more information, see Apply for an endpoint for a shard node. For supported MongoDB version combinations, see Migration scenarios overview.
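The optional pre-sharding step above can be sketched in mongosh. This is a minimal sketch, not the full procedure: the database name testdb, collection name orders, shard key customer_id, and chunk count are placeholders for your own objects.

```javascript
// Run against a mongos of the destination sharded cluster.
sh.enableSharding("testdb")

// A hashed shard key lets pre-split chunks spread writes evenly
// across shards instead of landing on a single shard.
sh.shardCollection("testdb.orders", { customer_id: "hashed" }, false, { numInitialChunks: 128 })

// Confirm that the balancer is enabled so chunks can be rebalanced.
sh.getBalancerState()
```

See Configure sharding to maximize shard performance for the full guidance on choosing a shard key.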
Billing
| Migration type | Link configuration fee | Data transfer cost |
|---|---|---|
| Schema migration and full data migration | Free | Free when migrating between Alibaba Cloud instances. Charges apply when data exits Alibaba Cloud via the Internet. For details, see Billable items. |
| Incremental data migration | Charged. For details, see Billing overview. | — |
Migration types
| Type | Supported objects | Description |
|---|---|---|
| Schema migration | DATABASE, COLLECTION, INDEX | Migrates the schema from the source to the destination. |
| Full data migration | DATABASE, COLLECTION | Migrates all existing data from the source to the destination. |
| Incremental data migration | See below | Migrates ongoing changes after full data migration completes. |
Incremental migration via Oplog supports: CREATE COLLECTION/INDEX, DROP DATABASE/COLLECTION/INDEX, RENAME COLLECTION, and insert/update/delete operations. Only $set updates are supported.
Incremental migration via ChangeStream supports: DROP DATABASE/COLLECTION, RENAME COLLECTION, and insert/update/delete operations. Only $set updates are supported.
Incremental migration (Oplog) does not support databases created after the task starts running.
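To verify that the source retains enough operation logs for incremental migration, you can inspect the oplog window in mongosh. A sketch, assuming a replica set source:

```javascript
// On the source replica set primary: check how far back the oplog reaches.
// The "log length start to end" value should cover at least 7 days.
rs.printReplicationInfo()

// Alternatively, inspect the oldest oplog entry directly.
db.getSiblingDB("local").oplog.rs.find().sort({ $natural: 1 }).limit(1)
```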
Required database account permissions
| Database | Schema migration | Full data migration | Incremental data migration |
|---|---|---|---|
| Source ApsaraDB for MongoDB | Read on the source database and the config database | Read on the source database and the config database | Read on the source database, the admin database, and the local database |
| Destination ApsaraDB for MongoDB | dbAdminAnyDatabase, readWrite on the destination database, read on the local database, and read on the config database | Same as schema migration | Same as schema migration |
For more information on creating and authorizing database accounts, see Manage user permissions on MongoDB databases.
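The permission sets above can be granted in mongosh with db.createUser(). A sketch; the user names, passwords, and the database name testdb are placeholders:

```javascript
// On the source instance: a read-only account for DTS.
use admin
db.createUser({
  user: "dts_source",
  pwd: "Your_password1",
  roles: [
    { role: "read", db: "testdb" },   // the database to migrate
    { role: "read", db: "admin" },
    { role: "read", db: "local" }
  ]
})

// On the destination instance: an account DTS can write with.
use admin
db.createUser({
  user: "dts_dest",
  pwd: "Your_password2",
  roles: [
    { role: "dbAdminAnyDatabase", db: "admin" },
    { role: "readWrite", db: "testdb" },
    { role: "read", db: "local" },
    { role: "read", db: "config" }
  ]
})
```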
Usage notes
Source database limits
-
The source server must have sufficient outbound bandwidth. Insufficient bandwidth reduces migration speed.
-
Each collection to be migrated must have a primary key or a UNIQUE constraint, and the constrained fields must be unique. Otherwise, duplicate data may appear in the destination.
-
When the migration granularity is collection with name mapping, a single task supports up to 1,000 collections. Exceeding this limit triggers a request error. In that case, split the collections across multiple tasks, or migrate entire databases instead.
-
A single document in the source cannot exceed 16 MB.
-
If the source is a sharded cluster, the number of source Mongos nodes cannot exceed 10.
-
If the source is an Azure Cosmos DB for MongoDB cluster or an Amazon DocumentDB elastic cluster, only full data migration is supported.
-
For incremental data migration, one of the following must be met:
-
The oplog feature is enabled and operation logs are retained for at least 7 days.
-
Change streams are enabled and DTS can subscribe to changes from the last 7 days.
Important: Use the oplog feature to get data changes from the source when possible. Change streams require the source to run MongoDB V4.0 or later. If the source is a non-elastic Amazon DocumentDB cluster, change streams are required: set Migration Method to ChangeStream and Architecture to Sharded Cluster when configuring the task.
-
During schema migration and full data migration, do not change the schema of any database or collection (including array type updates). Doing so causes task failure or data inconsistency.
-
If you run only full data migration (without incremental), do not write new data to the source during migration.
-
Collections with time to live (TTL) indexes cannot be migrated. If such collections exist in the source, data inconsistency may occur after migration.
-
If the source is a sharded cluster, make sure there are no orphaned documents. Orphaned documents can cause data inconsistency or task failure. See Orphaned document and How to clean orphaned documents.
-
If the source is a sharded cluster with an active Balancer (data balancing in progress), migration latency may increase.
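Two of the source checks above can be run from mongosh. This is a sketch under stated assumptions: testdb.orders is a placeholder namespace, the $bsonSize operator requires MongoDB 4.4 or later, and cleanupOrphaned behavior differs before 4.4 (where it must be run repeatedly with startingFromKey).

```javascript
// Find documents approaching the 16 MB BSON limit (MongoDB 4.4+).
db.getSiblingDB("testdb").orders.aggregate([
  { $match: { $expr: { $gt: [ { $bsonSize: "$$ROOT" }, 15 * 1024 * 1024 ] } } },
  { $project: { _id: 1 } }
])

// On each shard's primary (not through mongos): clean orphaned documents
// left behind by chunk migrations. On MongoDB 4.4+ this waits for the
// shard's range deleter to finish.
db.getSiblingDB("admin").runCommand({ cleanupOrphaned: "testdb.orders" })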
Other limits
-
New collections added to the source after the task starts do not support default shard key values.
-
If the destination MongoDB (sharded cluster) version is lower than 4.4, the default ShardKey value takes effect: DTS adds the specified default value to the original data and writes it to the destination.
-
If the destination MongoDB (sharded cluster) version is 4.4 or later, the default ShardKey value does not take effect. DTS writes original data to the destination as-is.
-
Keep the source and destination MongoDB versions consistent, or migrate from a lower version to a higher version. Migrating from a higher version to a lower version may cause compatibility issues.
-
The admin and local databases cannot be migrated.
-
Transaction information is not retained. Transactions are converted to individual records in the destination.
-
Run the migration during off-peak hours. During full data migration, DTS uses read and write resources on both the source and destination, which increases database load.
-
Full data migration uses concurrent INSERT operations, which causes fragmentation in the destination collections. As a result, the destination storage space after full migration is typically 5–10% larger than that of the source.
-
DTS attempts to resume failed tasks from the last 7 days. Before switching your workloads to the destination, stop or release the task, or revoke the write permissions from the DTS accounts on the destination — otherwise the task may resume automatically and overwrite destination data.
-
If a destination collection has a unique index or its capped attribute is true, the collection supports only single-thread writes and does not support concurrent replay during incremental migration. This may increase latency.
-
To query the document count in the destination, use db.$table_name.aggregate([{ $count: "myCount" }]).
-
Make sure the destination does not already contain documents with the same primary key (_id) as the source. If it does, delete those documents from the destination before starting migration.
-
If a DTS task fails, DTS technical support will attempt to restore it within 8 hours. Task parameters (but not database parameters) may be modified during restoration.
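The check for conflicting _id values mentioned above can be sketched in mongosh. This is only an illustrative spot check, not a complete comparison; testdb.orders is a placeholder, and the two halves run against the source and destination instances respectively:

```javascript
// On the source: sample some _id values.
const sourceIds = db.getSiblingDB("testdb").orders
  .find({}, { _id: 1 }).limit(1000).toArray().map(d => d._id)

// On the destination: look up those _id values.
const conflicts = db.getSiblingDB("testdb").orders
  .find({ _id: { $in: sourceIds } }, { _id: 1 }).toArray()

// Delete conflicting documents before starting the migration, e.g.:
// db.getSiblingDB("testdb").orders.deleteMany({ _id: { $in: conflicts.map(d => d._id) } })
```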
Special cases (self-managed source)
-
If the source performs a primary-secondary switch during migration, the task fails.
-
If the source has no update operations for a long period, the reported latency may be inaccurate. Run an update on the source to refresh the latency, or create a heartbeat that writes data every second.
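The heartbeat suggested above can be a simple mongosh loop that upserts a timestamp once per second, so DTS always sees recent oplog activity. A minimal sketch; the collection name dts_heartbeat is a placeholder, and the loop should be stopped after cutover:

```javascript
while (true) {
  db.getSiblingDB("testdb").dts_heartbeat.updateOne(
    { _id: "heartbeat" },
    { $set: { ts: new Date() } },
    { upsert: true }
  )
  sleep(1000)  // mongosh built-in; argument is in milliseconds
}
```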
Migrate a MongoDB replica set to a sharded cluster
Step 1: Go to the Data Migration page
Use one of the following methods:
DTS console
-
Log on to the DTS console.
-
In the left-side navigation pane, click Data Migration.
-
In the upper-left corner, select the region where the data migration instance resides.
DMS console
The exact navigation path depends on the mode and layout of your DMS console. For more information, see Simple mode and Customize the layout and style of the DMS console.
-
Log on to the DMS console.
-
In the top navigation bar, choose Data + AI > DTS (DTS) > Data Migration.
-
From the drop-down list next to Data Migration Tasks, select the region where the instance resides.
Step 2: Create a task
-
Click Create Task.
-
(Optional) If New Configuration Page appears in the upper-right corner of the page, click it to switch to the new version. Skip this step if Back to Previous Version is displayed instead.
Step 3: Configure the source and destination databases
Configure the following parameters:
Source database
| Parameter | Description |
|---|---|
| Task Name | Enter a descriptive name for the task. Names do not need to be unique. |
| Select Existing Connection | Select an existing registered instance, or configure the database manually. |
| Database Type | Select MongoDB. |
| Access Method | Select Alibaba Cloud Instance. |
| Instance Region | Select the region of the source instance. |
| Replicate Data Across Alibaba Cloud Accounts | Select No if the source and destination are in the same Alibaba Cloud account. |
| Architecture | Select Replica Set. If you select Sharded Cluster, also fill in Shard account and Shard password. |
| Migration Method | Select how DTS reads incremental changes from the source: Oplog (recommended) or ChangeStream. Use ChangeStream only if the source does not support oplog, or if the source is Amazon DocumentDB (non-elastic cluster). When Architecture is set to Sharded Cluster and Migration Method is set to ChangeStream, the Shard account and Shard password fields are not required. |
| Instance ID | Select the instance ID. |
| Authentication Database | Enter the authentication database name. The default is admin. |
| Database Account | Enter the database account. For permission requirements, see Required database account permissions. |
| Database Password | Enter the database password. |
| Encryption | Select Non-encrypted, SSL-encrypted, or Mongo Atlas SSL. Available options depend on Access Method and Architecture. If Architecture is Sharded Cluster and Migration Method is Oplog, SSL-encrypted is unavailable. |
Destination database
| Parameter | Description |
|---|---|
| Select Existing Connection | Select an existing registered instance, or configure the database manually. |
| Database Type | Select MongoDB. |
| Access Method | Select Alibaba Cloud Instance. |
| Instance Region | Select the region of the destination instance. |
| Replicate Data Across Alibaba Cloud Accounts | Select No if the source and destination are in the same Alibaba Cloud account. |
| Architecture | Select Sharded Cluster. |
| Instance ID | Select the instance ID. |
| Authentication Database | Enter the authentication database name. The default is admin. |
| Database Account | Enter the database account. For permission requirements, see Required database account permissions. |
| Database Password | Enter the database password. |
| Encryption | Select Non-encrypted, SSL-encrypted, or Mongo Atlas SSL. If the destination is an ApsaraDB for MongoDB sharded cluster instance, SSL-encrypted is unavailable. |
Step 4: Test connectivity
Click Test Connectivity and Proceed.
DTS server CIDR blocks must be added to the security settings of both the source and destination databases. DTS can add them automatically for Alibaba Cloud instances. For self-managed databases, see Add the CIDR blocks of DTS servers. If the source or destination database uses an access method other than Alibaba Cloud Instance, click Test Connectivity in the CIDR Blocks of DTS Servers dialog box.
Step 5: Configure objects to migrate
On the Configure Objects page, set the following parameters:
| Parameter | Description |
|---|---|
| Migration Types | Select Schema Migration and Full Data Migration for full migration only. To maintain service continuity, also select Incremental Data Migration. If you skip Schema Migration, create the target databases and collections in the destination manually and enable object name mapping in Selected Objects. |
| Processing Mode of Conflicting Tables | Precheck and Report Errors: Flags identical collection names between source and destination before migration starts. Use object name mapping to rename conflicting collections. Ignore Errors and Proceed: Skips this check. DTS does not migrate records with matching primary keys, and data consistency is not guaranteed. |
| Capitalization of Object Names in Destination Instance | Controls the capitalization of database, table, and column names in the destination. Default is DTS default policy. For details, see Specify the capitalization of object names. |
| Source Objects | Select databases or collections and click the right-arrow icon to add them to Selected Objects. |
| Selected Objects | To rename a database: right-click the database under Selected Objects, then update Schema Name in the Edit Schema dialog box. To rename a collection: right-click the collection, then update Table Name in the Edit Table dialog box. Note that object name mapping may cause dependent objects to fail migration. To set filter conditions for full migration, right-click a table in Selected Objects and configure the conditions. For details, see Set filter conditions. |
Step 6: Configure advanced settings
Click Next: Advanced Settings and configure the following:
| Parameter | Description |
|---|---|
| Dedicated Cluster for Task Scheduling | By default, DTS uses the shared cluster. For higher stability, purchase a dedicated cluster. See What is a DTS dedicated cluster. |
| Retry Time for Failed Connections | How long DTS retries after a connection failure. Valid range: 10–1,440 minutes. Default: 720 minutes. Set to more than 30 minutes. |
| Retry Time for Other Issues | How long DTS retries after DDL or DML operation failures. Valid range: 1–1,440 minutes. Default: 10 minutes. Set to more than 10 minutes. Must be less than Retry Time for Failed Connections. |
| Enable Throttling for Full Data Migration | Limits DTS read/write usage during full migration. Configure QPS to the source database, RPS of Full Data Migration, and Data migration speed for full migration (MB/s). Available only when Full Data Migration is selected. |
| Only one data type for primary key _id in a single table | Whether the _id data type is unique within a collection. Select Yes to skip type scanning during full migration. Select No to scan the _id data type. Available only when Full Data Migration is selected. |
| Enable Throttling for Incremental Data Migration | Limits DTS usage during incremental migration. Configure RPS of Incremental Data Migration and Data migration speed for incremental migration (MB/s). Available only when Incremental Data Migration is selected. |
| Environment Tag | Select an optional tag to identify the instance. |
| Configure ETL | Select Yes to configure extract, transform, and load (ETL) processing. See Configure ETL in a data migration or synchronization task. |
| Monitoring and Alerting | Select Yes to receive alerts when the task fails or latency exceeds a threshold. Configure the alert threshold and notification settings. See Configure monitoring and alerting. |
Step 7: Configure data verification (optional)
Click Next Step: Data Verification to set up a data verification task. For details, see Configure a data verification task.
Step 8: Set default shard key values
Click Next: Configure Database and Table Fields. For each target collection that has shard keys (where Number of Shard Keys is not 0), assign a default value:
-
Click Set Default Value in the row of the target collection.
-
Select a Shard key default value type. Supported types: string and int.
-
Enter the Default Value for the shard key.
-
The default shard key value takes effect only if the destination MongoDB version is lower than 4.4. For version 4.4 and later, DTS writes original data to the destination without applying the default value.
-
Assign default values to all shard keys in the migration scope. Missing values trigger an alert during precheck and may cause the task to fail.
Step 9: Run a precheck
Click Next: Save Task Settings and Precheck.
DTS performs a precheck before starting the task. The task can only start after the precheck passes.
If an item fails, click View Details to review the cause, fix the issue, and run the precheck again.
If an item triggers an alert and can be ignored, click Confirm Alert Details, then click Ignore > OK > Precheck Again. Ignoring alerts may lead to data inconsistency.
To preview the API parameters for this configuration, hover over Next: Save Task Settings and Precheck and click Preview OpenAPI parameters.
Step 10: Purchase and start the instance
-
Wait until Success Rate reaches 100%, then click Next: Purchase Instance.
-
On the Purchase Instance page, configure the instance class:
| Parameter | Description |
|---|---|
| Resource Group | The resource group for the instance. Default: default resource group. See What is Resource Management? |
| Instance Class | Determines migration speed. See Instance classes of data migration instances. |
-
Read and select the Data Transmission Service (Pay-as-you-go) Service Terms check box.
-
Click Buy and Start, then click OK in the confirmation dialog box.
Monitor task progress on the Data Migration page.
What's next
Before switching your workloads to the destination instance, stop or release the task, or revoke the write permissions from the DTS accounts on the destination to prevent the task from resuming automatically and overwriting destination data.