Use Data Transmission Service (DTS) to migrate data from an ApsaraDB for MongoDB replica set to a PolarDB for MySQL cluster. DTS supports full data migration and incremental data migration.
How it works
Migrating from MongoDB to PolarDB for MySQL is a cross-paradigm migration: MongoDB stores data as flexible BSON documents, while PolarDB for MySQL stores data in fixed-schema relational tables. DTS handles this by letting you define field-level mappings using bson_value() expressions.
For each collection you migrate, DTS maps each MongoDB field to a column in the destination table. For example:
| MongoDB document field | PolarDB for MySQL column |
|---|---|
"_id": "62cd344c85c1ea6a2a9f****" | mongo_id (varchar, primary key) |
"person.name": "neo" | person_name (varchar) |
"person.age": 26 | person_age (decimal) |
Before you start, create the destination table with the correct schema. DTS writes data into that table — it does not create or modify the table structure.
Prerequisites
Before you begin, make sure you have:
A PolarDB for MySQL cluster with available storage space that exceeds the total size of the source data. Allocate at least 10% more storage than the source data size. See Purchase an Enterprise Edition cluster or Purchase a subscription cluster.
A database and table with a single-column primary key created in the destination cluster. Composite primary keys are not supported.
Do not name any column
_idor_value. Otherwise, the data migration task fails.Make sure destination column types are compatible with the corresponding MongoDB data types. For example, the MongoDB
_idfield (ObjectId type) must map to avarcharcolumn. For the full mapping, see Data type mapping.See Manage databases.
A database account on the source ApsaraDB for MongoDB instance with the permissions listed in Required permissions.
A database account on the destination PolarDB for MySQL cluster with read and write permissions on the destination database. See Create and manage a database account.
(Sharded cluster source only) Endpoints applied for all shard nodes. All shard nodes must share the same account password and endpoint. See Apply for an endpoint for a shard or ConfigServer component.
Migration types
| Migration type | Description |
|---|---|
| Full data migration | Migrates all historical data from the source ApsaraDB for MongoDB instance to the destination PolarDB for MySQL cluster. The link configuration fee is free. |
| Incremental data migration | After full migration completes, continuously migrates new changes from the source to the destination. Supported operations: insert, update, and delete on collection documents. Only documents updated using the $set command have their incremental changes migrated. A link configuration fee applies. |
For billing details, see Billing overview. A data transfer fee applies when migrating data out of Alibaba Cloud over the Internet.
Required permissions
| Database | Full data migration | Incremental data migration |
|---|---|---|
| Source ApsaraDB for MongoDB instance | Read on the source database | Read on the source database, the admin database, and the local database |
| Destination PolarDB for MySQL cluster | Read and write on the destination database | Read and write on the destination database |
For instructions on managing MongoDB accounts, see Account management. For PolarDB for MySQL accounts, see Create and manage a database account.
Limitations
Source database limitations
| Limitation | Details |
|---|---|
| Outbound bandwidth | The server hosting the source database must have sufficient outbound bandwidth. Insufficient bandwidth reduces migration speed. |
| Collection count | If you rename collections during migration, a single task supports a maximum of 1,000 collections. Split larger migrations into multiple tasks. |
| Full migration only | The following source types support only full data migration — incremental migration is not available: standalone ApsaraDB for MongoDB instances, Azure Cosmos DB for MongoDB clusters, Amazon DocumentDB elastic clusters. |
| Incremental migration requirements | To run incremental data migration, one of the following must be true: (1) The oplog is enabled on the source database and operation logs are retained for at least 7 days. (2) Change streams are enabled and DTS can subscribe to data changes within the last 7 days. If neither condition is met, DTS may fail to obtain operation logs, causing the task to fail or data inconsistency to occur. This is outside the scope of the DTS service level agreement (SLA). Use the oplog method when possible — it has lower incremental migration latency. Change streams require MongoDB V4.0 or later. |
| Amazon DocumentDB (non-elastic cluster) | Only the ChangeStream method is supported. When configuring the task, set Migration Method to ChangeStreamChange Streams and Architecture to Sharded Cluster. |
Sharded cluster — _id uniqueness | The _id field in each collection must be unique. Non-unique _id values cause data inconsistency. |
| Sharded cluster — Mongos node count | The source sharded cluster cannot have more than 10 Mongos nodes and must not contain orphaned documents. See the MongoDB documentation and the DTS FAQ. |
| Sharded cluster — balancer | If the MongoDB balancer is enabled on a sharded cluster, the instance may experience delays. |
| Operations during full migration | Do not change database or collection schemas. Do not modify data of the ARRAY type. If you run full migration only (without incremental), do not write to the source database during migration. Writes cause data inconsistency between source and destination. |
Other limitations
| Limitation | Details |
|---|---|
| Migration objects | Only collections can be selected as migration objects. |
| Primary key requirement | The destination table must have a single-column unique primary key. Assign bson_value("_id") to that primary key column. |
| Excluded databases | DTS cannot migrate data from the admin or local database. |
| Transactions | Transaction information is not preserved. Transactions are converted to individual records in the destination database. |
| 4-byte characters | If the source data contains rare characters or emojis (4-byte UTF-8), the destination database and tables must use the utf8mb4 character set. If you use DTS schema migration, set the character_set_server parameter in the destination database to utf8mb4. |
| FLOAT and DOUBLE precision | DTS reads FLOAT and DOUBLE values using ROUND(COLUMN, PRECISION). If no precision is specified, DTS uses 38 digits for FLOAT and 308 digits for DOUBLE. Confirm these defaults meet your requirements before starting the migration. |
| Task auto-resume | DTS automatically resumes a failed migration task within 7 days. Before switching your business workload to the destination instance, end or release the DTS task, or revoke the DTS account's write permissions on the destination instance using the revoke command. Otherwise, the resumed task may overwrite data in the destination. |
| Incremental latency display | DTS calculates incremental migration latency based on the timestamp of the latest migrated record and the current source timestamp. If no updates occur on the source for a long time, the displayed latency may be inaccurate. Perform an update on the source to refresh the latency reading. |
| DTS task failure recovery | If a DTS task fails, DTS technical support attempts to restore it within 8 hours. During restoration, the task may restart and task-level parameters (not database parameters) may be modified. |
Configure the migration task
Step 1: Open the Data Migration page
Use one of the following methods to navigate to the Data Migration page.
DTS console
Log on to the DTS console.DTS console
In the left-side navigation pane, click Data Migration.
In the upper-left corner, select the region where the migration instance resides.
DMS console
The exact navigation path may vary based on your DMS console mode and layout. See Simple mode and Customize the layout and style of the DMS console.
Log on to the DMS console.DMS console
In the top navigation bar, move the pointer over Data + AI > DTS (DTS) > Data Migration.
From the drop-down list to the right of Data Migration Tasks, select the region where the migration instance resides.
Step 2: Create a task
Click Create Task.
If the New Configuration Page button appears in the upper-right corner, click it to switch to the new configuration interface.
Skip this step if Back to Previous Version is displayed instead.
Step 3: Configure source and destination databases
Configure the following parameters.
| Category | Parameter | Description |
|---|---|---|
| N/A | Task Name | A name for the DTS task. DTS auto-generates a name. Specify a descriptive name that makes the task easy to identify. The name does not need to be unique. |
| Source Database | Select Existing Connection | Select an existing registered instance or configure a new one manually. If you select an existing instance, DTS auto-fills the database parameters. To register a database, see Manage database connections. In the DMS console, select a database from the Select a DMS database instance drop-down list, or click Add DMS Database Instance. See Register an Alibaba Cloud database instance and Register a database hosted on a third-party cloud service or a self-managed database. |
| Database Type | Select MongoDB. | |
| Access Method | Select Alibaba Cloud Instance. | |
| Instance Region | The region where the source ApsaraDB for MongoDB instance resides. | |
| Replicate Data Across Alibaba Cloud Accounts | Select No (same account). | |
| Architecture | The architecture of the source instance. Select Replica Set for this example. If the source is a Sharded Cluster, also specify Shard account and Shard password when using the Oplog method. | |
| Migration Method | The method for capturing incremental changes. Oplog (recommended): requires oplog to be enabled on the source. ChangeStream: requires change streams to be enabled; required for Amazon DocumentDB non-elastic clusters. When using ChangeStream with a Sharded Cluster architecture, Shard account and Shard password are not required. | |
| Instance ID | The ID of the source ApsaraDB for MongoDB instance. | |
| Authentication Database | The database that stores the account credentials. The default is admin. | |
| Database Account | The account used to connect to the source database. | |
| Database Password | The password for the account. | |
| Encryption | The connection encryption method: Non-encrypted, SSL-encrypted, or Mongo Atlas SSL. The available options depend on the values of Access Method and Architecture. SSL-encrypted is not available when Architecture is Sharded Cluster and Migration Method is Oplog. For self-managed MongoDB replica sets using SSL encryption, upload a CA certificate to verify the connection. | |
| Destination Database | Select Existing Connection | Select an existing registered instance or configure a new one manually. To register a database, see Manage database connections. |
| Database Type | Select PolarDB for MySQL. | |
| Access Method | Select Alibaba Cloud Instance. | |
| Instance Region | The region where the destination PolarDB for MySQL cluster resides. | |
| Replicate Data Across Alibaba Cloud Accounts | Select No (same account). | |
| PolarDB Cluster ID | The ID of the destination PolarDB for MySQL cluster. | |
| Database Account | The account used to connect to the destination cluster. | |
| Database Password | The password for the account. | |
| Encryption | The connection encryption method. See Configure SSL encryption. |
After completing the configuration, click Test Connectivity and Proceed at the bottom of the page.
Make sure the CIDR blocks of DTS servers are added to the security settings of the source and destination databases. See Add the CIDR blocks of DTS servers. If the source or destination is a self-managed database, click Test Connectivity in the CIDR Blocks of DTS Servers dialog box.
Step 4: Configure migration objects
On the Configure Objects page, configure the following settings.
| Parameter | Description |
|---|---|
| Migration Types | Select Full Data Migration to migrate historical data only. Select both Full Data Migration and Incremental Data Migration to keep the destination synchronized during migration and minimize downtime. If you run full migration only, do not write to the source database during migration to avoid data inconsistency. |
| Processing Mode of Conflicting Tables | Precheck and Report Errors (recommended): DTS checks for collection name conflicts before starting. If conflicts exist, the precheck fails and the task cannot start. To resolve conflicts, use the object name mapping feature to rename collections in the destination. Ignore Errors and Proceed: DTS skips the name conflict check. Records with duplicate primary keys in the destination are not migrated. Data inconsistency may occur, data may fail to be initialized, only specific columns may be migrated, or the data migration task may fail. |
| Capitalization of object names in destination instance | Controls the capitalization of database names, table names, and column names in the destination. The default is DTS default policy. See Specify the capitalization of object names in the destination instance. |
| Source Objects | Select the collections to migrate and click the right-arrow icon to move them to Selected Objects. |
After selecting your objects, complete the following configuration in Selected Objects:
Map the database name: In Selected Objects, right-click the source database and change Database Name to the target schema name in the PolarDB for MySQL cluster. Click OK.
Map the collection (table) name: Right-click the collection and change Table Name to the target table name in the PolarDB for MySQL cluster.
(Optional) Specify filter conditions to migrate a subset of data. See Specify filter conditions.
(Optional) Select which DML operations to include during incremental migration.
Configure field mappings: DTS auto-generates
bson_value()expressions for each field. Review and adjust Column Name, Type, Length, and Precision for each column.To remove a field that does not need to be migrated, click the delete icon next to the field row.
If the auto-generated expression does not match your field structure (for example, for nested fields), click the delete icon to remove the row, click + Add Column, and enter the correct
bson_value()expression manually. See Field mapping example for reference.
ImportantAssign
bson_value("_id")to the primary key column of the destination table. Specify nested fields using their full hierarchical path inbson_value(). For example, usebson_value("person","name")for thenamesubfield ofperson. Using onlybson_value("person")maps the entirepersonobject to a single column and causes incremental changes to subfields to be lost.Click OK.
Click Next: Advanced Settings.
Step 5: Configure advanced settings
| Parameter | Description |
|---|---|
| Dedicated Cluster for Task Scheduling | By default, DTS schedules the task on a shared cluster. To improve stability, purchase a dedicated cluster. See What is a DTS dedicated cluster. |
| Select the engine type of the destination database | The storage engine for the destination database. InnoDB (default) is the standard engine. X-Engine is optimized for online transaction processing (OLTP) workloads. |
| Retry Time for Failed Connections | How long DTS retries after a connection failure. Valid range: 10–1,440 minutes. Default: 720 minutes. Set this to at least 30 minutes. If multiple tasks share a source or destination database, the most recently set value applies. DTS charges for the instance during retries. |
| Retry Time for Other Issues | How long DTS retries after DDL or DML failures. Valid range: 1–1,440 minutes. Default: 10 minutes. Set this to at least 10 minutes. This value must be less than Retry Time for Failed Connections. |
| Enable Throttling for Full Data Migration | Limits DTS read/write throughput during full migration to reduce load on your databases. Configure Queries per second (QPS) to the source database, RPS of Full Data Migration, and Data migration speed for full migration (MB/s). Available only when Full Data Migration is selected. |
| Only one data type for primary key _id in a single table | Whether the _id field has a consistent data type within a collection. Yesalert notification settings: DTS skips data type scanning during full migration. No: DTS scans and handles multiple data types. Available only when Full Data Migration is selected. |
| Enable Throttling for Incremental Data Migration | Limits DTS throughput during incremental migration. Configure RPS of Incremental Data Migration and Data migration speed for incremental migration (MB/s). Available only when Incremental Data Migration is selected. |
| Environment Tag | An optional tag for the task environment. |
| Configure ETL | Whether to enable extract, transform, and load (ETL) processing. If Yes, enter data transformation statements in the code editor. See Configure ETL in a data migration or data synchronization task. |
| Monitoring and Alerting | Whether to configure alerts for task failures or latency thresholds. If Yes, configure the alert threshold and notification settings. See Configure monitoring and alerting. |
Step 6: Save and run the precheck
Click Next: Save Task Settings and Precheck.
To preview the API parameters for this configuration, hover over Next: Save Task Settings and Precheck and click Preview OpenAPI parameters.
DTS runs a precheck before starting the migration task. The task can only start after the precheck passes.
If the precheck fails, click View Details next to each failed item, fix the issues, and click Precheck Again.
If an alert is triggered: fix the issue and recheck, or click Confirm Alert Details > Ignore > OK > Precheck Again to proceed despite the alert. Ignoring alert items may cause data inconsistency.
Step 7: Purchase an instance and start the task
Wait for Success Rate to reach 100%, then click Next: Purchase Instance.
On the Purchase Instance page, configure the following parameters.
Parameter Description Resource Group The resource group for the migration instance. Default: default resource group. See What is Resource Management? Instance Class The instance class determines migration speed. Select based on your workload. See Instance classes of data migration instances. Read and accept the Data Transmission Service (Pay-as-you-go) Service Terms.
Click Buy and Start, then click OK in the confirmation dialog.
The migration task appears on the Data Migration page. Monitor progress from there.
Before running the migration task, evaluate the performance of the source and destination databases. Run migrations during off-peak hours when possible. During full data migration, DTS uses read and write resources on both databases, which may increase their load. Also be aware that concurrent INSERT operations during full migration cause table fragmentation in the destination. After full migration completes, space usage in the destination may be larger than in the source.
Data type mapping
| MongoDB data type | PolarDB for MySQL data type |
|---|---|
| ObjectId | VARCHAR |
| String | VARCHAR |
| Document | VARCHAR |
| DbPointer | VARCHAR |
| Array | VARCHAR |
| Date | DATETIME |
| TimeStamp | DATETIME |
| Double | DOUBLE |
| 32-bit integer (BsonInt32) | INTEGER |
| 64-bit integer (BsonInt64) | BIGINT |
| Decimal128 | DECIMAL |
| Boolean | BOOLEAN |
| Null | VARCHAR |
Field mapping example
The following example shows how to map a nested MongoDB document to a PolarDB for MySQL table.
Source MongoDB data structure
{
"_id": "62cd344c85c1ea6a2a9f****",
"person": {
"name": "neo",
"age": 26,
"sex": "male"
}
}Destination PolarDB for MySQL table schema
| Column name | Type |
|---|---|
| mongo_id | varchar (primary key) |
| person_name | varchar |
| person_age | decimal |
New column configuration
Use the full nested path in bson_value() expressions. Using bson_value("person") maps the entire person object to a single column and causes incremental changes to person.name, person.age, and person.sex to be lost.
| Column name | Type | Assign value |
|---|---|---|
| mongo_id | STRING | bson_value("_id") |
| person_name | STRING | bson_value("person","name") |
| person_age | DECIMAL | bson_value("person","age") |