Data Transmission Service (DTS) synchronizes data from an ApsaraDB for MongoDB replica set to a PolarDB for MySQL cluster. Use this topic to create and run a synchronization task from start to finish.
Before you begin
Before creating a synchronization task, complete the following preparation steps.
Set up the destination cluster
Create a PolarDB for MySQL cluster with available storage larger than the total size of source data. (Recommended: at least 10% larger.) See Custom purchase and Purchase a subscription cluster.
Create a database and a table with a primary key column in the destination cluster. See Manage databases.
When designing the destination table schema:
- Use varchar for any column that maps to the MongoDB ObjectId _id field.
- Do not name any column _id or _value.
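The schema rules above can be checked with a short script before you create the destination table. The following is a minimal sketch; the helper name and the sample column types are illustrative, not part of any DTS API:

```python
# Hypothetical helper that checks a proposed destination-table schema
# against the rules above: columns mapped from a MongoDB ObjectId should
# be varchar, and no column may be named "_id" or "_value".
RESERVED_NAMES = {"_id", "_value"}

def validate_destination_schema(columns, objectid_columns):
    """columns: {name: sql_type}; objectid_columns: names mapped from ObjectId."""
    errors = []
    for name in columns:
        if name in RESERVED_NAMES:
            errors.append(f"column name '{name}' is reserved")
    for name in objectid_columns:
        if not columns.get(name, "").lower().startswith("varchar"):
            errors.append(f"column '{name}' maps an ObjectId and should be varchar")
    return errors

# Example: "mongo_id" stores the MongoDB _id, so it must be varchar.
schema = {"mongo_id": "varchar(24)", "person_name": "varchar(64)"}
print(validate_destination_schema(schema, objectid_columns=["mongo_id"]))  # []
```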
Configure accounts with the required permissions
| Database | Required permissions | Reference |
|---|---|---|
| Source ApsaraDB for MongoDB | Read on the source database, the admin database, and the local database | Account management |
| Destination PolarDB for MySQL | Read and write on the destination database | Create and manage a database account |
(Sharded cluster only) Apply for shard endpoints
If the source is a sharded cluster, apply for an endpoint for each shard node. All shard nodes must use the same account and password. See Apply for an endpoint for a shard.
Billing
| Synchronization type | Fee |
|---|---|
| Full data synchronization | Free |
| Incremental data synchronization | Charged. See Billing overview. |
Synchronization types
| Type | Description |
|---|---|
| Full data synchronization | Synchronizes historical data from the source ApsaraDB for MongoDB instance to the destination PolarDB for MySQL cluster. |
| Incremental data synchronization | After full data synchronization completes, continuously synchronizes insert, update, and delete operations. Only documents updated using the $set command are included. |
Limitations
Source database limitations
| Limitation | Details |
|---|---|
| Outbound bandwidth | The source server must have sufficient outbound bandwidth. Insufficient bandwidth reduces synchronization speed. |
| Collection limit | A single task supports up to 1,000 collections when object renaming is required. For more than 1,000 collections, configure multiple tasks. |
| Unsupported databases | DTS cannot synchronize data from the admin, config, or local databases. |
| Unsupported source types | Standalone ApsaraDB for MongoDB instances, Azure Cosmos DB for MongoDB clusters, and Amazon DocumentDB elastic clusters are not supported. |
| Oplog or change stream | The oplog feature must be enabled with operation logs retained for at least 7 days, OR change streams must be enabled and DTS must be able to subscribe to changes within the last 7 days. If neither condition is met, DTS may fail to obtain logs, causing task failure or data inconsistency. |
| Change stream version | Change streams require MongoDB V4.0 or later. |
| Amazon DocumentDB inelastic clusters | Set Migration Method to ChangeStream and Architecture to Sharded Cluster. |
| Operations during full synchronization | Do not change the schemas of databases or collections, or modify data of the ARRAY type. If running full synchronization only (no incremental), do not write to the source database. |
Sharded cluster additional limitations:
- The _id field in each collection must be unique. Duplicate _id values cause data inconsistency.
- The number of mongos nodes cannot exceed 10.
- The instance must not contain orphaned documents. See the MongoDB documentation and the FAQ topic.
- If the ApsaraDB for MongoDB balancer is enabled, the instance may experience synchronization delays.
Destination database and task limitations
| Limitation | Details |
|---|---|
| Sync object type | Only collections can be selected as synchronization objects. |
| Primary key requirement | The destination table must have a unique single-column primary key (composite primary keys are not supported). Assign bson_value("_id") to the primary key column. |
| Reserved column names | The destination table cannot have columns named _id or _value. |
| Transactions | Transactions are not retained. Synchronized transactions are converted to single records. |
| Character set | If the data includes rare characters or emojis (4-byte characters), the destination database and tables must use the UTF8mb4 character set. If you use DTS schema synchronization, set the character_set_server parameter to UTF8mb4. |
| FLOAT/DOUBLE precision | DTS uses ROUND(COLUMN,PRECISION) to handle FLOAT and DOUBLE values. If no precision is specified, DTS defaults to 38 digits for FLOAT and 308 digits for DOUBLE. Verify these defaults before starting synchronization. |
| Off-peak hours | Run synchronization during off-peak hours. Full data synchronization uses read and write resources of both databases, which may increase server load. |
| Post-synchronization storage | After full synchronization, concurrent INSERT operations may cause fragmentation in destination collections, resulting in higher storage usage than in the source. |
| Failed task resume | DTS attempts to resume failed tasks for up to 7 days. Before switching workloads to the destination, stop or release any failed tasks, or revoke DTS write permissions using REVOKE. Otherwise, source data may overwrite destination data when the task resumes. |
| Latency calculation | DTS calculates incremental synchronization latency based on the timestamp of the latest synced data in the destination and the current timestamp in the source. Extended periods without source updates may cause inaccurate latency readings. Perform an update on the source to refresh the latency. |
| Task failure recovery | If a DTS task fails, DTS technical support attempts to restore it within 8 hours. The task may be restarted and task parameters (not database parameters) may be modified. |
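The FLOAT/DOUBLE limitation above means values pass through a ROUND(COLUMN,PRECISION) step. The behavior can be previewed locally; this sketch uses Python's decimal module with half-up rounding as a stand-in for the SQL ROUND() function (an approximation for positive values, not the DTS implementation):

```python
from decimal import Decimal, ROUND_HALF_UP

def sql_round(value, precision):
    # Approximates ROUND(COLUMN, PRECISION): quantize to the requested
    # number of decimal places, rounding halves up.
    q = Decimal(10) ** -precision
    return Decimal(str(value)).quantize(q, rounding=ROUND_HALF_UP)

print(sql_round(3.14159, 2))  # 3.14
print(sql_round(2.675, 2))    # 2.68
```

If you rely on a specific precision, set it explicitly rather than depending on the 38-digit (FLOAT) and 308-digit (DOUBLE) defaults.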
Create a data synchronization task
The task configuration consists of five steps:
Go to the data synchronization page.
Configure source and destination databases.
Configure objects to synchronize.
Run a precheck.
Purchase an instance.
Step 1: Go to the data synchronization page
Use one of the following methods.
DTS console
Log on to the DTS console.
In the left-side navigation pane, click Data Synchronization.
In the upper-left corner, select the region where the synchronization task resides.
DMS console
The actual steps may vary based on the mode and layout of the DMS console. See Simple mode and Customize the layout and style of the DMS console.
Log on to the DMS console.
In the top navigation bar, move the pointer over Data + AI and choose DTS (DTS) > Data Synchronization.
From the drop-down list to the right of Data Synchronization Tasks, select the region where the task resides.
Step 2: Configure source and destination databases
Click Create Task.
(Optional) Click New Configuration Page in the upper-right corner.
- If Back to Previous Version is displayed, the page is already on the new version. Skip this step.
- Use the new version of the configuration page when possible.
Configure the source and destination databases using the following parameters.
General
| Parameter | Description |
|---|---|
| Task Name | The name of the DTS task. DTS generates a name automatically. Specify a descriptive name to identify the task. The name does not need to be unique. |
Source database
| Parameter | Description |
|---|---|
| Select Existing Connection | If the source instance is already registered with DTS, select it from the drop-down list. DTS auto-populates the remaining parameters. Otherwise, configure the parameters below manually. In the DMS console, select the instance from the Select a DMS database instance drop-down list. |
| Database Type | Select MongoDB. |
| Access Method | Select Alibaba Cloud Instance. |
| Instance Region | The region where the source ApsaraDB for MongoDB instance resides. |
| Replicate Data Across Alibaba Cloud Accounts | Select No if the source database belongs to the current Alibaba Cloud account. |
| Architecture | The architecture of the source instance. Select Replica Set for this example. If the source is a Sharded Cluster, also specify Shard account and Shard password. |
| Migration Method | The method used to synchronize incremental data. Options: Oplog (recommended) or ChangeStream. <br>- Oplog: Requires the oplog feature to be enabled. Oplog is enabled by default for ApsaraDB for MongoDB instances and delivers low synchronization latency due to fast log-pulling speed.<br>- ChangeStream: Requires change streams to be enabled. Available for MongoDB V4.0 or later. For inelastic Amazon DocumentDB clusters, use ChangeStream only. If Architecture is Sharded Cluster, the Shard account and Shard password parameters are not required. See Change Streams. |
| Instance ID | The ID of the source ApsaraDB for MongoDB instance. |
| Authentication Database | The database that stores the account credentials. Default: admin. |
| Database Account | The account with the required permissions. |
| Database Password | The password for the database account. |
| Encryption | The connection encryption method: Non-encrypted, SSL-encrypted, or Mongo Atlas SSL. Available options depend on the Access Method and Architecture settings — the options displayed in the console apply. Note SSL-encrypted is unavailable when Architecture is Sharded Cluster and Migration Method is Oplog. For self-managed MongoDB using Replica Set architecture with a non-Alibaba Cloud access method and SSL encryption, upload a CA certificate to verify the connection. |
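The source-database parameters above amount to a standard MongoDB connection string. The following sketch shows how the pieces fit together; the endpoint, port, and account are hypothetical placeholders, and this is not required by DTS itself (DTS connects using the console parameters):

```python
from urllib.parse import quote_plus

def build_mongo_uri(host, port, user, password, auth_db="admin"):
    # Credentials are percent-encoded so special characters survive the URI.
    # auth_db corresponds to the Authentication Database parameter.
    return (f"mongodb://{quote_plus(user)}:{quote_plus(password)}"
            f"@{host}:{port}/?authSource={auth_db}")

# Hypothetical endpoint and account, for illustration only.
uri = build_mongo_uri("dds-bp1example.mongodb.rds.aliyuncs.com", 3717,
                      "dtsuser", "p@ss w0rd")
print(uri)
```

A URI built this way can be used with a driver such as pymongo to verify the account's read permissions on the source, admin, and local databases before configuring the task.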
Destination database
| Parameter | Description |
|---|---|
| Select Existing Connection | If the destination instance is already registered with DTS, select it from the drop-down list. DTS auto-populates the remaining parameters. Otherwise, configure the parameters below manually. |
| Database Type | Select PolarDB for MySQL. |
| Access Method | Select Alibaba Cloud Instance. |
| Instance Region | The region where the destination PolarDB for MySQL cluster resides. |
| Replicate Data Across Alibaba Cloud Accounts | Select No if the destination database belongs to the current Alibaba Cloud account. |
| PolarDB Cluster ID | The ID of the destination PolarDB for MySQL cluster. |
| Database Account | The account with the required permissions. |
| Database Password | The password for the database account. |
| Encryption | The connection encryption method. See Configure SSL encryption. |
Click Test Connectivity and Proceed.
- The CIDR blocks of DTS servers must be added to the security settings of both databases. DTS adds them automatically; you can also add them manually. See Add the CIDR blocks of DTS servers.
- For self-managed databases whose Access Method is not Alibaba Cloud Instance, click Test Connectivity in the CIDR Blocks of DTS Servers dialog box.
Step 3: Configure objects to synchronize
In the Configure Objects step, set the following parameters.
| Parameter | Description |
|---|---|
| Synchronization Types | Incremental Data Synchronization is selected by default. Select Full Data Synchronization if needed. Schema Synchronization cannot be selected. |
| Processing Mode of Conflicting Tables | Precheck and Report Errors (default): Checks for table name conflicts before starting. If identical table names exist, the precheck fails and the task cannot start. Use the object name mapping feature to rename conflicting tables if they cannot be deleted or renamed. <br>Ignore Errors and Proceed: Skips the conflict check. Warning This option risks data inconsistency. During full synchronization, existing destination records with matching primary or unique keys are retained. During incremental synchronization, they are overwritten. If schemas differ, initialization may fail or only partial columns are synchronized. |
| Capitalization of Object Names in Destination Instance | The capitalization policy for database names, table names, and column names in the destination. Default: DTS default policy. See Specify the capitalization of object names. |
| Source Objects | Select one or more collections in the Source Objects section, then click the icon to move them to the Selected Objects section. |
In the Selected Objects section, configure object mapping.
(Optional) To remove fields that do not need to be synchronized, click the icon after the corresponding row.
Rename the database:
- Right-click the database in the Selected Objects section.
- Change Schema Name to the name of the target database in PolarDB for MySQL.
- Click OK.
Rename collections:
- Right-click a collection in the Selected Objects section.
- Change Table Name to the name of the target table in PolarDB for MySQL.
(Optional) Specify filter conditions. See Specify filter conditions.
(Optional) In the Select DDL and DML Operations to Be Synchronized section, select the incremental operations to synchronize.
Map fields: DTS automatically maps collection data and generates bson_value() expressions in the Assign Value column. Verify that the expressions meet your requirements, then configure Column Name, Type, Length, and Precision for each field.
Important:
- Assign bson_value("_id") to the primary key column of the destination table.
- Specify both the field and its subfields in each bson_value() expression, following the document hierarchy. Specifying only a parent field (for example, bson_value("person")) does not synchronize its subfields to the destination.
Fields with correct expressions:
- Set Column Name to the corresponding column name in the destination PolarDB for MySQL table.
- Select a Type compatible with the source data. For data type mappings, see the Data type mapping section.
- (Optional) Set Length and Precision.
- Repeat for each field.
Fields with incorrect expressions:
- Click the icon in the Actions column of the row.
- Click + Add Column.
- Set Column Name, Type, Length, and Precision.
- Enter the correct bson_value() expression in Assign Value. For examples, see the Field mapping examples section.
- Repeat for each field.
Click OK.
Click Next: Advanced Settings and configure the following parameters.
| Parameter | Description |
|---|---|
| Dedicated Cluster for Task Scheduling | By default, DTS schedules tasks to a shared cluster. Purchase a dedicated cluster to improve synchronization stability. See What is a DTS dedicated cluster. |
| Select the engine type of the destination database | The storage engine of the destination database. Options: InnoDB (default) or X-Engine (for OLTP workloads). |
| Retry Time for Failed Connections | The time range in which DTS retries failed connections. Valid values: 10–1440 minutes. Default: 720. Set to a value greater than 30. If DTS reconnects within this range, the task resumes. Otherwise, the task fails. If multiple tasks share the same source or destination database with different retry ranges, the shortest range takes precedence. DTS continues to charge during retries. |
| Retry Time for Other Issues | The time range in which DTS retries failed DDL or DML operations. Valid values: 1–1440 minutes. Default: 10. Set to a value greater than 10 and less than the Retry Time for Failed Connections value. |
| Enable Throttling for Full Data Synchronization | Limits read/write resource usage during full synchronization to reduce database server load. Configure Queries per second (QPS) to the source database, RPS of Full Data Migration, and Data migration speed for full migration (MB/s). Available only when Full Data Synchronization is selected. |
| Only one data type for primary key _id in a table of the data to be synchronized | Controls whether DTS scans the _id data type during full synchronization. Yes: Skip the scan. No: Scan the type. Displayed only when Full Data Synchronization is selected. |
| Enable Throttling for Incremental Data Synchronization | Limits resource usage during incremental synchronization. Configure RPS of Incremental Data Synchronization and Data synchronization speed for incremental synchronization (MB/s). |
| Environment Tag | An optional tag for categorizing the task. |
| Configure ETL | Specifies whether to enable the extract, transform, and load (ETL) feature. Yes: Enter data processing statements in the code editor. See Configure ETL. No: Disable ETL. |
| Monitoring and Alerting | Specifies whether to configure alerting. Yes: Set alert thresholds and notification contacts. No: No alerts. See Configure monitoring and alerting. |
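The two retry parameters above have interdependent ranges that are easy to get wrong. A small sketch of the documented constraints (the function is illustrative, not a DTS API; values are in minutes):

```python
def validate_retry_settings(conn_retry_min, other_retry_min):
    """Check retry settings against the documented ranges:
    connection retry 10-1440 (recommended > 30); other-issue retry
    1-1440, greater than 10 and below the connection retry value."""
    problems = []
    if not 10 <= conn_retry_min <= 1440:
        problems.append("connection retry must be 10-1440 minutes")
    elif conn_retry_min <= 30:
        problems.append("connection retry should be greater than 30 minutes")
    if not 1 <= other_retry_min <= 1440:
        problems.append("other-issue retry must be 1-1440 minutes")
    elif not (10 < other_retry_min < conn_retry_min):
        problems.append("other-issue retry should be >10 and below the connection retry value")
    return problems

print(validate_retry_settings(720, 30))  # []
```

Note that the default of 10 minutes for Retry Time for Other Issues sits at the boundary of the recommendation, so raising it slightly is usually advisable.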
Step 4: Save settings and run a precheck
To preview the API parameters for this task, move the pointer over Next: Save Task Settings and Precheck and click Preview OpenAPI parameters.
To proceed, click Next: Save Task Settings and Precheck.
DTS runs a precheck before starting the task. The task starts only after passing the precheck.
If the precheck fails, click View Details next to each failed item, fix the issue, and click Precheck Again.
If an alert is triggered: for items that cannot be ignored, fix the issue and rerun the precheck. For ignorable items, click Confirm Alert Details > Ignore > OK > Precheck Again. Ignoring alerts may cause data inconsistency.
Step 5: Purchase an instance
Wait until Success Rate reaches 100%, then click Next: Purchase Instance.
On the buy page, configure the following parameters.
| Parameter | Description |
|---|---|
| Billing Method | Subscription: Pay upfront. More cost-effective for long-term use. Pay-as-you-go: Billed hourly. Suitable for short-term use. Release the instance when no longer needed to stop charges. |
| Resource Group Settings | The resource group for the synchronization instance. Default: default resource group. See What is Resource Management? |
| Instance Class | The instance class determines synchronization speed. See Instance classes. |
| Subscription Duration | Available only for the Subscription billing method. Options: 1–9 months, or 1, 2, 3, or 5 years. |
Read and select Data Transmission Service (Pay-as-you-go) Service Terms.
Click Buy and Start, then click OK in the dialog box.
After the task starts, monitor its progress in the task list.
Data type mapping
The following table shows how MongoDB data types map to PolarDB for MySQL data types.
| MongoDB data type | PolarDB for MySQL data type | Notes |
|---|---|---|
| ObjectId | VARCHAR | Stored as a string representation. |
| String | VARCHAR | |
| Document | VARCHAR | |
| DbPointer | VARCHAR | |
| Array | VARCHAR | |
| Date | DATETIME | |
| TimeStamp | DATETIME | |
| Double | DOUBLE | Precision defaults to 308 digits via ROUND(COLUMN,PRECISION) if not specified. |
| 32-bit integer (BsonInt32) | INTEGER | |
| 64-bit integer (BsonInt64) | BIGINT | |
| Decimal128 | DECIMAL | |
| Boolean | BOOLEAN | |
| Null | VARCHAR |
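For scripting against the mapping, the table above can be expressed as a simple lookup. This dict is just a convenience restatement of the table, not a DTS API:

```python
# MongoDB (BSON) type name -> PolarDB for MySQL column type,
# transcribed from the data type mapping table above.
MONGO_TO_MYSQL = {
    "ObjectId": "VARCHAR",   # stored as a string representation
    "String": "VARCHAR",
    "Document": "VARCHAR",
    "DbPointer": "VARCHAR",
    "Array": "VARCHAR",
    "Date": "DATETIME",
    "TimeStamp": "DATETIME",
    "Double": "DOUBLE",      # default precision 308 digits if unspecified
    "BsonInt32": "INTEGER",
    "BsonInt64": "BIGINT",
    "Decimal128": "DECIMAL",
    "Boolean": "BOOLEAN",
    "Null": "VARCHAR",
}

print(MONGO_TO_MYSQL["ObjectId"])  # VARCHAR
```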
Field mapping examples
The examples below use the following source document structure and destination table schema.
Data structure of the source ApsaraDB for MongoDB instance:

```json
{
  "_id": "62cd344c85c1ea6a2a9f****",
  "person": {
    "name": "neo",
    "age": 26,
    "sex": "male"
  }
}
```

Table schema of the destination PolarDB for MySQL cluster
| Column name | Type | Notes |
|---|---|---|
| mongo_id | varchar | Primary key |
| person_name | varchar | |
| person_age | decimal |
Configuration of new columns
The person_name and person_age columns require nested field expressions because person is a parent field containing subfields; mongo_id maps the top-level _id field directly.
| Column name | Type | Assign value |
|---|---|---|
| mongo_id | STRING | bson_value("_id") |
| person_name | STRING | bson_value("person","name") |
| person_age | DECIMAL | bson_value("person","age") |
Specifying only bson_value("person") does not synchronize the subfields name, age, or sex to individual columns. Always use the full hierarchical path in the bson_value() expression.
After this configuration, the destination table receives data in the following structure:
| mongo_id | person_name | person_age |
|---|---|---|
| 62cd344c85c1ea6a2a9f**** | neo | 26 |
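The path-resolution behavior of the expressions above can be modeled in a few lines. This is a rough Python illustration of how hierarchical bson_value() paths resolve against the sample document, not the DTS implementation:

```python
# Sample source document from the example above.
doc = {
    "_id": "62cd344c85c1ea6a2a9f****",
    "person": {"name": "neo", "age": 26, "sex": "male"},
}

def bson_value(document, *path):
    # Walk the document hierarchy one key at a time.
    value = document
    for key in path:
        value = value[key]
    return value

# The three Assign Value expressions from the configuration table.
row = {
    "mongo_id": bson_value(doc, "_id"),
    "person_name": bson_value(doc, "person", "name"),
    "person_age": bson_value(doc, "person", "age"),
}
print(row)

# A parent-only path returns the whole subdocument as a single value,
# which is why subfields are not flattened into separate columns:
print(bson_value(doc, "person"))  # {'name': 'neo', 'age': 26, 'sex': 'male'}
```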