Use Data Transmission Service (DTS) to synchronize data from an ApsaraDB for MongoDB replica set or sharded cluster instance to a Lindorm wide table. This guide covers both full and incremental synchronization.
Prerequisites
Before you begin, make sure you have:
- An ApsaraDB for MongoDB instance in the Germany (Frankfurt) region, deployed in the replica set or sharded cluster architecture.
  Important: If the source is a sharded cluster instance, apply for endpoints for each shard node first. All shard nodes share the same account and password. For details, see Apply for an endpoint for a shard.
- A Lindorm instance using the wide table engine, with available storage space larger than the total data size in the source MongoDB instance. For best results, provision at least 10% more storage than the source data size. See Create an instance.
- A wide table created in the Lindorm instance. See Use Lindorm-cli to connect to and use LindormTable and Use Lindorm Shell to connect to LindormTable.
  The table must comply with Quotas and limits. If you create the table using the Apache HBase API, add column mappings before configuring the synchronization task. See Add column mappings for an Apache HBase API table.
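The storage prerequisite above is simple arithmetic: provision the source data size plus at least 10% headroom. A minimal sketch, with an illustrative function name and example figures:

```python
def min_lindorm_storage_gb(source_data_gb: float, headroom: float = 0.10) -> float:
    """Minimum Lindorm storage to provision, in GB: the source data
    size plus the 10% headroom suggested in the prerequisite above."""
    return source_data_gb * (1.0 + headroom)

# For example, a 500 GB source data set calls for at least 550 GB.
print(round(min_lindorm_storage_gb(500), 2))
```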
Billing
| Synchronization type | Cost |
|---|---|
| Full data synchronization | Free |
| Incremental data synchronization | Charged. See Billing overview. |
Supported synchronization types
| Type | Description |
|---|---|
| Full data synchronization | Copies all existing data from selected databases or collections in the source instance to the destination Lindorm instance. |
| Incremental data synchronization | Continuously replicates insert, update, and delete operations on collections. During incremental synchronization, only the $set command of update operations is synchronized. |
Schema synchronization is not supported.
Required permissions
| Database | Required permissions | How to grant |
|---|---|---|
| Source ApsaraDB for MongoDB | Read permissions on the source, admin, and local databases | Manage the permissions of MongoDB database users |
| Destination Lindorm | Read and write permissions on the target namespaces | Permission management for access control |
Limitations
Source database limitations
- The source server must have sufficient outbound bandwidth. Otherwise, the synchronization speed is affected.
- Collections to be synchronized must have PRIMARY KEY or UNIQUE constraints, and all field values must be unique. Otherwise, duplicate records may appear in the destination.
- DTS cannot connect to a MongoDB instance over an SRV endpoint.
- The source cannot be an Azure Cosmos DB for MongoDB cluster or an Amazon DocumentDB elastic cluster.
- A single data entry cannot exceed 16 MB.
- If the source is a sharded cluster instance:
  - The _id field in each collection must be unique. Otherwise, data inconsistency may occur.
  - The number of Mongos nodes cannot exceed 10.
  - Do not run the following commands during synchronization: shardCollection, reshardCollection, unshardCollection, moveCollection, or movePrimary. These commands change data distribution and can cause data inconsistency.
  - Make sure the source instance has no orphaned documents before starting the task. See the MongoDB documentation and the DTS FAQ for how to remove them.
  - If the balancer is active on the source, it may cause latency.
- If the source contains TTL indexes, data inconsistency may occur after synchronization.
- The oplog must be enabled and retain at least 7 days of data. Alternatively, change streams must be enabled and cover the last 7 days. If neither condition is met, DTS may fail to obtain incremental changes, which can result in data loss or inconsistency that is not covered by the DTS service level agreement (SLA).
  - Use the oplog to record data changes when possible.
  - Change streams require MongoDB 4.0 or later and do not support two-way synchronization.
  - For non-elastic Amazon DocumentDB clusters, use change streams and set Migration Method to ChangeStream and Architecture to Sharded Cluster.
- During full data synchronization, do not modify database or collection schemas, or data of the ARRAY type. If the task runs full synchronization only, do not write to the source database while it runs.
- If you select collections as the objects to synchronize and need to rename them in the destination, a single task supports up to 1,000 collections. To synchronize more than 1,000 collections, configure multiple tasks or select the entire database as the synchronization object.
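The 16 MB per-entry limit above can be screened for before you start the task. The following is a rough, stdlib-only sketch that approximates document size with JSON encoding; the real limit applies to the BSON size, which a MongoDB driver can measure exactly (for example, len(bson.encode(doc)) with PyMongo):

```python
import json

MAX_ENTRY_BYTES = 16 * 1024 * 1024  # the 16 MB per-entry limit noted above

def likely_oversized(doc: dict, limit: int = MAX_ENTRY_BYTES) -> bool:
    """Rough pre-check: the UTF-8 JSON size approximates, but does not
    exactly equal, the BSON size that the limit actually applies to."""
    return len(json.dumps(doc).encode("utf-8")) > limit

print(likely_oversized({"_id": 1, "name": "cindy0"}))          # False
print(likely_oversized({"_id": 2, "blob": "x" * (17 << 20)}))  # True
```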
Destination and task limitations
- Only tasks within the Germany (Frankfurt) region are supported.
- DTS cannot synchronize data from the admin, config, or local databases.
- The destination Lindorm instance cannot have tables with _id or _value columns.
- Transactions are not retained. DTS converts each transaction into individual records at the destination.
- The data written to Lindorm must meet the requirements in Limits on data requests.
- For UPDATE and DELETE operations in incremental synchronization:
  - If the wide table is created using Lindorm SQL, add a non-primary key column named _mongo_id_ when creating the table. The column's data type must match the _id column type in the source. Create a secondary index on this column.
  - If the wide table is created using the Apache HBase API, add a non-primary key column named _mongo_id_ with column family f. The column's data type must match the _id column type in the source. Create a secondary index on this column. If you plan to add extra columns and use the extract, transform, and load (ETL) feature, make sure the Lindorm instance has no duplicate data.
- DTS uses ROUND(COLUMN,PRECISION) to retrieve values from FLOAT and DOUBLE columns. The default precision is 38 digits for FLOAT and 308 digits for DOUBLE. Verify that these precision settings meet your requirements.
- DTS attempts to resume failed tasks for up to 7 days. Before switching workloads to the destination, stop or release any failed tasks, or run REVOKE to remove DTS write permissions on the destination. Otherwise, resumed tasks will overwrite destination data with source data.
- Run the task during off-peak hours. Full data synchronization reads and writes both source and destination databases, which increases server load and causes temporary fragmentation in destination collections.
- Synchronization latency is the difference between the timestamp of the latest synchronized data and the current timestamp on the source. If the source has no writes for an extended period, this metric may be inaccurate. Perform a write on the source to refresh the latency value.
- If a DTS task fails, technical support attempts to restore it within 8 hours. The task may be restarted, and certain task parameters may be modified during restoration. Database parameters are not modified.
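The _mongo_id_ requirement for SQL-created wide tables can be sketched in Lindorm SQL. This is an illustrative sketch only: the table and index names are made up, the column types assume a string _id in the source, and the exact DDL should be verified against your Lindorm version (see Use Lindorm-cli to connect to and use LindormTable):

```sql
-- Assumes the source _id is a string, so _mongo_id_ is VARCHAR to match.
CREATE TABLE example_person (
  id VARCHAR,
  person_name VARCHAR,
  person_age INT,
  _mongo_id_ VARCHAR,   -- non-primary-key column mirroring the source _id
  PRIMARY KEY (id)
);

-- Secondary index on _mongo_id_, needed for incremental UPDATE and DELETE.
CREATE INDEX idx_mongo_id ON example_person (_mongo_id_);
```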
Create a synchronization task
Step 1: Open the data synchronization page
Use one of the following methods.
DTS console
1. Log on to the DTS console.
2. In the left-side navigation pane, click Data Synchronization.
3. In the upper-left corner, select the region where the synchronization task resides.
DMS console
The exact steps may vary based on the DMS console mode and layout. See Simple mode and Customize the layout and style of the DMS console.
1. Log on to the DMS console.
2. In the top navigation bar, move the pointer over Data + AI and choose DTS (DTS) > Data Synchronization.
3. From the drop-down list next to Data Synchronization Tasks, select the region where the synchronization instance resides.
Step 2: Configure source and destination databases
Click Create Task, then configure the parameters.
Task settings:
| Parameter | Description |
|---|---|
| Task Name | A name for the DTS task. DTS auto-generates a name. Specify a descriptive name to make the task easy to identify. The name does not need to be unique. |
Source database:
| Parameter | Description |
|---|---|
| Select Existing Connection | Select a registered database instance, or leave blank and configure the parameters below. In the DMS console, select from the Select a DMS database instance drop-down list. |
| Database Type | Select MongoDB. |
| Access Method | Select Alibaba Cloud Instance. |
| Instance Region | The region of the source ApsaraDB for MongoDB instance. |
| Replicate Data Across Alibaba Cloud Accounts | Select No for within-account synchronization. |
| Architecture | Select Replica Set or Sharded Cluster based on your source instance. If you select Sharded Cluster, also configure the Shard account and Shard password parameters. |
| Migration Method | How DTS reads incremental data from the source. Oplog (recommended) reads the operations log and offers low latency. ChangeStream uses MongoDB change streams (requires MongoDB 4.0 or later; does not support two-way synchronization). When using ChangeStream, if you select Sharded Cluster for the Architecture parameter, you do not need to configure the Shard account and Shard password parameters. |
| Instance ID | The ID of the source ApsaraDB for MongoDB instance. |
| Authentication Database | The database that stores the account credentials. The default is admin. |
| Database Account | The database account for the source instance. |
| Database Password | The password for the account. |
| Encryption | Select Non-encrypted, SSL-encrypted, or Mongo Atlas SSL. Available options depend on the Access Method and Architecture values. If Architecture is Sharded Cluster and Migration Method is Oplog, SSL-encrypted is unavailable. |
Destination database:
| Parameter | Description |
|---|---|
| Select Existing Connection | Select a registered database instance, or leave blank and configure the parameters below. In the DMS console, select from the Select a DMS database instance drop-down list. |
| Database Type | Select Lindorm. |
| Access Method | Select Alibaba Cloud Instance. |
| Instance Region | The region of the destination Lindorm instance. |
| Instance ID | The ID of the destination Lindorm instance. |
| Database Account | The database account for the destination instance. |
| Database Password | The password for the account. |
Step 3: Test connectivity
Click Test Connectivity and Proceed.
DTS automatically adds its server CIDR blocks to the security settings of the source and destination databases. If you are using a self-managed database with an access method other than Alibaba Cloud Instance, click Test Connectivity in the CIDR Blocks of DTS Servers dialog box to confirm connectivity. For details, see Add the CIDR blocks of DTS servers.
Step 4: Configure objects to synchronize
In the Configure Objects step, set the following parameters.
| Parameter | Description |
|---|---|
| Synchronization Types | Incremental Data Synchronization is selected by default. You can also select Full Data Synchronization only. Schema synchronization is not available. |
| Processing Mode of Conflicting Tables | Keep the default setting. |
| Capitalization of Object Names in Destination Instance | Controls the capitalization of database and collection names at the destination. Default is DTS default policy. See Specify the capitalization of object names in the destination instance. |
| Source Objects | Select one or more objects in the Source Objects section, then click the rightwards arrow to move them to the Selected Objects section. |
| Selected Objects | Configure column mappings for each collection. Columns not added to the destination wide table are not synchronized. |
Map MongoDB fields to Lindorm columns:
When you add a collection to Selected Objects, DTS pre-populates a bson_value() expression for each field. Follow these steps to finalize the column mapping.
1. Edit the database name (optional): Right-click the database in Selected Objects, enter the target schema name in the Edit Schema dialog, and select the DML operations to synchronize. Click OK.
2. Edit the collection name (optional): Right-click the collection in Selected Objects, enter the target table name in the Edit Table Name dialog, and optionally specify filter conditions or select DML operations. Click OK.
3. Configure column mappings: DTS auto-generates a bson_value() expression for each row. Check each row and specify Column Name, Type, Length, and Precision. In each bson_value() expression, the string enclosed in double quotation marks is the field name in the source MongoDB document. For example, bson_value("age") maps the age field. To remove a field from synchronization, click the delete icon next to that row.
   If the expression meets the requirements:
   1. Set Column Name:
      - For SQL-created tables: use the destination column name.
      - For Apache HBase API tables: use ROW for the primary key column, and Column family:Column name (for example, person:name) for non-primary key columns. Create column mappings first if needed. See Add column mappings for an Apache HBase API table.
   2. Select the Type for each column. Make sure it is compatible with the source data type.
   3. Optionally set Length and Precision.
   4. Repeat for all columns.
   If the expression does not meet the requirements:
   1. Click the edit icon next to the row, then click + New Column.
   2. Set Column Name, Type, Length, and Precision.
   3. Enter the bson_value() expression in Assign Value. See Example of value assignment for reference. Important: Assign bson_value("_id") to the primary key column. For nested fields, specify the full hierarchical path in bson_value(). For example, use bson_value("person","name") instead of bson_value("person"). Using only the parent field causes data loss or task failure for incremental operations.
   4. Repeat for all columns.
4. Click OK.
Step 5: Configure advanced settings
Click Next: Advanced Settings and configure the following parameters.
| Parameter | Description |
|---|---|
| Dedicated Cluster for Task Scheduling | By default, DTS schedules the task to the shared cluster. For higher stability, purchase a dedicated cluster. See What is a DTS dedicated cluster. |
| Retry Time for Failed Connections | How long DTS retries after a connection failure. Valid values: 10–1440 minutes. Default: 720 minutes. Set this to at least 30 minutes. If multiple tasks share the same source or destination, the shortest retry time takes effect. DTS continues to charge during retry. |
| Retry Time for Other Issues | How long DTS retries after DDL or DML failures. Valid values: 1–1440 minutes. Default: 10 minutes. Set this to at least 10 minutes. This value must be smaller than Retry Time for Failed Connections. |
| Enable Throttling for Full Data Synchronization | Limits the load on source and destination databases during full synchronization. Configure Queries per second (QPS) to the source database, RPS of Full Data Migration, and Data migration speed for full migration (MB/s). Available only when Full Data Synchronization is selected. |
| Only one data type for primary key _id in a table of the data to be synchronized | Set to Yes if all documents in a collection use the same data type for _id; DTS then skips the type scan for faster full synchronization. Set to No if _id types vary; DTS then scans all types before synchronizing. Specify this correctly to avoid data loss. Available only when Full Data Synchronization is selected. |
| Enable Throttling for Incremental Data Synchronization | Limits the load on the destination during incremental synchronization. Configure RPS of Incremental Data Synchronization and Data synchronization speed for incremental synchronization (MB/s). |
| Environment Tag | An optional tag to identify the DTS instance environment. |
| Configure ETL | Enable the ETL feature to transform data during synchronization. Set to Yes to enter processing statements in the code editor. See Configure ETL in a data migration or data synchronization task and ETL example for Apache HBase API tables. If the destination table is created using the Apache HBase API, specify which columns to include and exclude in the ETL script. By default, top-level fields are stored in the f column family. The following code shows how to write data rows of columns other than _id and name as dynamic columns to the destination table: script:e_expand_bson_value("*", "_id,name"). DTS does not synchronize extra columns or columns outside the ETL task. |
| Monitoring and Alerting | Set to Yes to receive alerts when the task fails or synchronization latency exceeds the threshold. Configure the alert threshold and notification settings. See Configure monitoring and alerting when you create a DTS task. |
Step 6: Run the precheck
Click Next: Save Task Settings and Precheck.
To view the OpenAPI parameters for this task configuration, hover over Next: Save Task Settings and Precheck and click Preview OpenAPI parameters before proceeding.
DTS runs a precheck before starting the task. If the precheck fails:
- Click View Details next to the failed item, resolve the issue, and click Precheck Again.
- If an item shows an alert that can be ignored, click Confirm Alert Details. In the View Details dialog, click Ignore > OK, then click Precheck Again. Note that ignoring alerts may result in data inconsistency.
Step 7: Purchase an instance
1. Wait until Success Rate reaches 100%, then click Next: Purchase Instance.
2. On the buy page, configure the following parameters.
| Parameter | Description |
|---|---|
| Billing Method | Subscription: pay upfront for a fixed term; more cost-effective for long-term use. Pay-as-you-go: billed hourly; suited for short-term use. Release the instance when no longer needed to avoid unnecessary charges. |
| Resource Group Settings | The resource group for the synchronization instance. Default: default resource group. See What is Resource Management? |
| Instance Class | The synchronization throughput class. See Instance classes of data synchronization instances. |
| Subscription Duration | Available for the Subscription billing method. Options: 1–9 months, or 1, 2, 3, or 5 years. |
3. Read and accept Data Transmission Service (Pay-as-you-go) Service Terms.
4. Click Buy and Start, then click OK in the confirmation dialog.
The task appears in the task list. You can monitor its progress there.
Example of adding column mappings for a table created by calling the Apache HBase API
The following example uses SQL Shell. The Lindorm instance must be version 2.4.0 or later.
1. Create column mappings:

   ```sql
   ALTER TABLE test MAP DYNAMIC COLUMN f:_mongo_id_ HSTRING/HINT/..., person:name HSTRING, person:age HINT;
   ```

2. Create a secondary index:

   ```sql
   CREATE INDEX idx ON test(f:_mongo_id_);
   ```
Example of configuring an ETL task for a table created by calling the Apache HBase API
Source document in ApsaraDB for MongoDB:
```json
{
  "_id": 0,
  "person": {
    "name": "cindy0",
    "age": 0,
    "student": true
  }
}
```
ETL script that expands all top-level fields except _id as dynamic columns:

```
script:e_expand_bson_value("*", "_id")
```
Synchronization result: the fields nested under person are written to the destination table as dynamic columns.
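As a mental model only (not the DTS implementation, and the actual dynamic-column names should be checked in your destination table), the expansion performed by the ETL script above can be pictured like this:

```python
def expand_bson_value(doc: dict, exclude: set) -> dict:
    """Illustrative model: expand top-level fields (except the excluded
    ones) into flat dynamic columns; nested fields become column names
    joined with '.'."""
    out = {}
    for key, value in doc.items():
        if key in exclude:
            continue
        if isinstance(value, dict):
            for sub_key, sub_val in value.items():
                out[f"{key}.{sub_key}"] = sub_val
        else:
            out[key] = value
    return out

doc = {"_id": 0, "person": {"name": "cindy0", "age": 0, "student": True}}
print(expand_bson_value(doc, {"_id"}))
# {'person.name': 'cindy0', 'person.age': 0, 'person.student': True}
```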
Example of value assignment
Use this example as a reference when configuring bson_value() expressions for nested fields.
Source document structure:
```json
{
  "_id": "62cd344c85c1ea6a2a9f****",
  "person": {
    "name": "neo",
    "age": "26",
    "sex": "male"
  }
}
```
Destination table schema in Lindorm:
| Column name | Type |
|---|---|
| id | STRING |
| person_name | STRING |
| person_age | INT |
Column configuration:
Always specify the full hierarchical path in bson_value(). Using bson_value("person") instead of bson_value("person","name") causes DTS to fail when writing incremental data to nested fields, which may result in data loss or task failure.
| Column name | Type | Assign Value |
|---|---|---|
| id | STRING | bson_value("_id") |
| person_name | STRING | bson_value("person","name") |
| person_age | INT | bson_value("person","age") |
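The full-path rule can be pictured with a small helper that resolves a hierarchical field path against a document, in the spirit of bson_value() (an illustrative model, not the DTS implementation):

```python
def bson_value(doc: dict, *path: str):
    """Resolve a hierarchical field path, mimicking bson_value("a","b")."""
    value = doc
    for key in path:
        if not isinstance(value, dict):
            raise KeyError(f"{'.'.join(path)}: {key} is not reachable")
        value = value[key]
    return value

doc = {
    "_id": "62cd344c85c1ea6a2a9f****",
    "person": {"name": "neo", "age": "26", "sex": "male"},
}

print(bson_value(doc, "_id"))             # 62cd344c85c1ea6a2a9f****
print(bson_value(doc, "person", "name"))  # neo
# bson_value(doc, "person") returns the whole nested document, which cannot
# be written into a scalar STRING/INT column; hence the full-path rule.
```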