Data Transmission Service: Precautions and limits for synchronizing data from a MongoDB database

Last Updated: Mar 11, 2024

This topic describes the precautions and limits when you use Data Transmission Service (DTS) to synchronize data from a MongoDB database, such as a self-managed MongoDB database or an ApsaraDB for MongoDB instance. To ensure that your data synchronization task runs as expected, you must read the precautions and limits before you configure the task.

Scenarios of synchronizing data from a MongoDB database

The precautions and limits vary by synchronization scenario. Go to the section for your scenario to view the precautions and limits that apply to it.

Synchronize data from a MongoDB replica set database to another MongoDB replica set database or MongoDB sharded cluster database

The following table describes the precautions and limits when you synchronize data to a MongoDB database, such as a self-managed MongoDB database or an ApsaraDB for MongoDB instance.

Category

Description

Limits on the source database

  • Bandwidth requirements: The server to which the source database is deployed must have sufficient outbound bandwidth. Otherwise, the data synchronization speed is affected.

  • The collections to be synchronized must have PRIMARY KEY or UNIQUE constraints and all fields must be unique. Otherwise, the destination database may contain duplicate data records.

  • If you select collections as the objects to be synchronized and you need to edit collections in the destination database, such as renaming collections, up to 1,000 collections can be synchronized in a single data synchronization task. If you run a task to synchronize more than 1,000 collections, a request error occurs. In this case, we recommend that you configure multiple tasks to synchronize the collections in batches or configure a task to synchronize the entire database.

  • The oplog feature must be enabled.

    Note

    The operation logs of the source database must be retained for at least seven days. Otherwise, DTS may fail to obtain the operation logs and the task may fail. In exceptional circumstances, data inconsistency or loss may occur. Make sure that you set the retention period of operation logs based on the preceding requirements. Otherwise, the service level agreement (SLA) of DTS does not guarantee service reliability or performance.

  • You cannot synchronize collections that contain time to live (TTL) indexes. If the database to be synchronized contains TTL indexes, data inconsistency may occur between the source and destination databases because their time zones and clocks may differ.
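The TTL limit above can be checked before you configure a task. The following is a minimal sketch, not a DTS feature: it assumes you have already collected the index specifications of a collection, for example from db.<collection>.getIndexes() in mongosh, and it treats any index that carries an expireAfterSeconds option as a TTL index. The function name and the sample specifications are hypothetical.

```javascript
// Return the TTL indexes found in a list of index specifications.
// `indexes` has the shape returned by db.<collection>.getIndexes() in mongosh.
function findTtlIndexes(indexes) {
  // A TTL index is an index whose specification has an expireAfterSeconds option.
  return indexes.filter((ix) => ix.expireAfterSeconds !== undefined);
}

// Hypothetical index specifications for a "sessions" collection.
const sampleIndexes = [
  { v: 2, key: { _id: 1 }, name: "_id_" },
  { v: 2, key: { lastSeen: 1 }, name: "lastSeen_1", expireAfterSeconds: 3600 },
];

const ttlIndexes = findTtlIndexes(sampleIndexes);
console.log(ttlIndexes.map((ix) => ix.name)); // [ 'lastSeen_1' ]
```

A collection for which this returns a non-empty list falls under the limit and should be excluded from the objects to be synchronized.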

Other limits

  • If the destination database is a sharded cluster database, take note of the following limits:

    • Orphaned documents must be cleared. Otherwise, the synchronization performance is compromised. During data synchronization, if an _id conflict exists between the documents of the source and destination databases, data inconsistency may occur, or the data synchronization task may fail.

    • Before you start the data synchronization task, you must add sharding keys to the data to be synchronized in the source database. If you cannot add sharding keys to the data in the source database, you can synchronize data from a MongoDB database without sharding keys. For more information, see Synchronize data from a MongoDB instance without a sharding key to a MongoDB sharded cluster instance.

    • During data synchronization, every document that you insert into the collections being synchronized must contain the sharding key. When you update documents in those collections, you cannot modify the sharding key.

  • To ensure compatibility, the version of the destination MongoDB database must be the same as or later than the version of the source MongoDB database. If the version of the destination database is earlier than that of the source database, database compatibility issues may occur.

  • DTS cannot synchronize data from the admin or local database.

  • If a collection of the destination database has a unique index or the capped attribute of a collection of the destination database is true, the collection supports only single-thread data writing and does not support concurrent replay during incremental data synchronization. This may increase synchronization latency.

  • Transaction information is not retained. When transactions are synchronized to the destination database, transactions are converted into a single record.

  • Before you synchronize data, evaluate the impact of data synchronization on the performance of the source and destination databases. We recommend that you synchronize data during off-peak hours. During full data synchronization, DTS uses read and write resources of the source and destination databases. This may increase the loads on the database servers.

  • During full data synchronization, concurrent INSERT operations cause fragmentation in the collections of the destination database. After full data synchronization is complete, the storage space for collections of the destination database is larger than that of the source database.

  • During data synchronization, we recommend that you use only DTS to write data to the destination database. This prevents data inconsistency between the source and destination databases. For example, if you use tools other than DTS to write data to the destination database, data loss may occur in the destination database when you use Data Management (DMS) to perform online DDL operations.

  • The data is concurrently written to the destination database. Therefore, the storage space occupied in the destination database is 5% to 10% larger than the size of the data in the source database.

  • To query the number of documents in a collection of the destination MongoDB database, you must use the db.$table_name.aggregate([{ $count:"myCount"}]) syntax.

  • Make sure that the destination MongoDB database does not contain data that has the same primary key as data in the source database. Otherwise, data may be lost. The default primary key is _id. If the data in the destination database has the same primary key as that in the source database, clear the conflicting data in the destination database without interrupting the services of DTS. For example, if the primary key is _id, you can delete the documents in the destination database whose _id values also exist in the source database.
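The $count syntax mentioned in the list above can be wrapped in a small helper. This is a sketch under assumptions: countWithAggregate and fakeCollection are hypothetical names, and the helper expects an object that exposes aggregate(pipeline).toArray() the way a mongosh collection does. The in-memory stand-in exists only so that the sketch runs without a database.

```javascript
// Count the documents of a collection with the $count aggregation stage.
// `collection` is anything that exposes aggregate(pipeline).toArray(),
// such as db.getCollection("orders") in mongosh.
function countWithAggregate(collection) {
  const result = collection.aggregate([{ $count: "myCount" }]).toArray();
  // $count emits no document at all when the collection is empty.
  return result.length === 0 ? 0 : result[0].myCount;
}

// Hypothetical in-memory stand-in for a collection.
function fakeCollection(docs) {
  return {
    aggregate(pipeline) {
      const field = pipeline[0].$count;
      const out = docs.length === 0 ? [] : [{ [field]: docs.length }];
      return { toArray: () => out };
    },
  };
}

console.log(countWithAggregate(fakeCollection([{ _id: 1 }, { _id: 2 }]))); // 2
console.log(countWithAggregate(fakeCollection([]))); // 0
```

The empty-result branch matters when you compare source and destination counts, because an empty collection yields an empty result set rather than a document with a zero count.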

Special cases

If the source database is a self-managed MongoDB database, take note of the following limits:

  • If you perform a primary/secondary switchover on the source database when the data synchronization task is running, the task fails.

  • DTS calculates synchronization latency based on the timestamp of the latest synchronized data in the destination database and the current timestamp in the source database. If no update operation is performed on the source database for an extended period of time, the synchronization latency may be inaccurate. If the latency of the synchronization task is excessively high, you can perform an update operation on the source database to update the latency.

Note

If you select an entire database as the object to be synchronized, you can create a heartbeat table. The heartbeat table is updated or receives data every second.
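The latency calculation and the effect of a heartbeat can be illustrated with a little arithmetic. The sketch below is an illustration only, not the DTS implementation: the function name and the timeline are hypothetical, timestamps are in milliseconds, and the heartbeat case assumes that each heartbeat write is replayed on the destination promptly.

```javascript
// Latency as described above: the current timestamp in the source minus the
// timestamp of the latest data that has been synchronized to the destination.
function syncLatencyMs(latestSyncedAtMs, sourceNowMs) {
  return Math.max(0, sourceNowMs - latestSyncedAtMs);
}

// Hypothetical timeline: the last real write on the source happened at t = 0.
const lastWriteMs = 0;
const nowMs = 10 * 60 * 1000; // ten minutes later, the source has been idle

// Without a heartbeat, the figure reflects idle time rather than real lag.
const withoutHeartbeat = syncLatencyMs(lastWriteMs, nowMs);

// With a heartbeat written every second, the latest synchronized change
// is about one second old.
const lastHeartbeatMs = nowMs - 1000;
const withHeartbeat = syncLatencyMs(lastHeartbeatMs, nowMs);

console.log(withoutHeartbeat, withHeartbeat); // 600000 1000
```

This is why an update operation on an idle source brings the reported latency back down: it moves the latest synchronized timestamp forward.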

Configure two-way data synchronization between MongoDB sharded cluster databases

The following table describes the precautions and limits when you synchronize data to a MongoDB database, such as a self-managed MongoDB database or an ApsaraDB for MongoDB instance.

Category

Description

Limits on the source and destination databases

  • Bandwidth requirements: The server to which the source database is deployed must have sufficient outbound bandwidth. Otherwise, the data synchronization speed is affected.

  • The collections to be synchronized must have PRIMARY KEY or UNIQUE constraints and all fields must be unique. Otherwise, the destination database may contain duplicate data records.

  • If you select collections as the objects to be synchronized and you need to edit collections in the destination database, such as renaming collections, up to 1,000 collections can be synchronized in a single data synchronization task. If you run a task to synchronize more than 1,000 collections, a request error occurs. In this case, we recommend that you configure multiple tasks to synchronize the collections in batches or configure a task to synchronize the entire database.

  • The oplog feature must be enabled.

    Note

    The operation logs of the source database must be retained for at least seven days. Otherwise, DTS may fail to obtain the operation logs and the task may fail. In exceptional circumstances, data inconsistency or loss may occur. Make sure that you set the retention period of operation logs based on the preceding requirements. Otherwise, the service level agreement (SLA) of DTS does not guarantee service reliability or performance.

  • During a data synchronization task, MongoDB sharded cluster databases involved in the task cannot be scaled. Otherwise, the task fails.

  • If the source database is a self-managed MongoDB database that uses the sharded cluster architecture, you can set the Access Method parameter only to Express Connect, VPN Gateway, or Smart Access Gateway, or to Cloud Enterprise Network (CEN) when you configure the DTS task.

  • The number of Mongos nodes in the source MongoDB sharded cluster database cannot exceed 10.

  • You cannot synchronize collections that contain time to live (TTL) indexes. If the database to be synchronized contains TTL indexes, data inconsistency may occur between the source and destination databases because their time zones and clocks may differ.

  • Make sure that no orphaned document exists in the source and destination databases. Otherwise, data inconsistency or even task failure may occur. For more information, see orphaned document in official MongoDB documentation and the How do I delete orphaned documents of a MongoDB database deployed in the sharded cluster architecture? section of the "FAQ" topic.

Other limits

  • Before you start the data synchronization task, you must add sharding keys to the data to be synchronized in the source database. During data synchronization, every document that you insert into the collections being synchronized must contain the sharding key. When you update documents in those collections, you cannot modify the sharding key.

  • To ensure compatibility, the version of the destination MongoDB database must be the same as or later than the version of the source MongoDB database. If the version of the destination database is earlier than that of the source database, database compatibility issues may occur.

  • If the source or the destination database resides in a region outside the Chinese mainland, two-way data synchronization is supported only between databases within the same region. For example, if the source database resides in the Japan (Tokyo) region, data can be synchronized only within the Japan (Tokyo) region and cannot be synchronized to or from the Germany (Frankfurt) region in two-way synchronization scenarios.

  • If a collection of the destination database has a unique index or the capped attribute of a collection of the destination database is true, the collection supports only single-thread data writing and does not support concurrent replay during incremental data synchronization. This may increase synchronization latency.

  • DTS cannot synchronize data from the admin or local database.

  • Transaction information is not retained. When transactions are synchronized to the destination database, transactions are converted into a single record.

  • Before you synchronize data, evaluate the impact of data synchronization on the performance of the source and destination databases. We recommend that you synchronize data during off-peak hours. During full data synchronization, DTS uses read and write resources of the source and destination databases. This may increase the loads on the database servers.

  • During full data synchronization, concurrent INSERT operations cause fragmentation in the collections of the destination database. After full data synchronization is complete, the storage space for collections of the destination database is larger than that of the source database.

  • During data synchronization, we recommend that you use only DTS to write data to the destination database. This prevents data inconsistency between the source and destination databases. For example, if you use tools other than DTS to write data to the destination database, data loss may occur in the destination database when you use DMS to perform online DDL operations.

  • A two-way synchronization instance contains a forward synchronization task and a reverse synchronization task. If an object is to be synchronized in both the forward and reverse synchronization tasks when you configure or reset the instance, the following rules apply:

    • Only one of the tasks can synchronize both the full data and incremental data of the object. The other task synchronizes only the incremental data of the object.

    • The source data of the current task can be synchronized only to the destination of the task. The synchronized data is not used as the source data of the other task.

  • The data is concurrently written to the destination database. Therefore, the storage space occupied in the destination database is 5% to 10% larger than the size of the data in the source database.

  • To query the number of documents in a collection of the destination MongoDB database, you must use the db.$table_name.aggregate([{ $count:"myCount"}]) syntax.

  • Make sure that the destination MongoDB database does not contain data that has the same primary key as data in the source database. Otherwise, data may be lost. The default primary key is _id. If the data in the destination database has the same primary key as that in the source database, clear the conflicting data in the destination database without interrupting the services of DTS. For example, if the primary key is _id, you can delete the documents in the destination database whose _id values also exist in the source database.

  • Make sure that the MongoDB balancer of the source database is disabled during full data synchronization. Do not enable the balancer until all full data synchronization is complete and incremental data synchronization starts. Otherwise, data inconsistency may occur. For more information about the MongoDB balancer, see Manage the ApsaraDB for MongoDB balancer.

  • If data sharding is configured for the destination instance and you do not need to use the schema synchronization feature of DTS, do not select Schema Synchronization as one of the Synchronization Types in the Configure Objects and Advanced Settings step. Otherwise, data inconsistency may occur or the task may fail due to shard conflicts.
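The rule that only one of the two tasks may synchronize both the full data and the incremental data of an object can be checked on a planned configuration. The sketch below is an assumption-laden illustration: DTS does not expose its configuration in this form, and the task layout, phase names, and object names are all hypothetical.

```javascript
// Each task maps an object name to the phases it synchronizes.
// Returns the objects for which both tasks include full data synchronization.
function findDoubleFullSync(forwardTask, reverseTask) {
  return Object.keys(forwardTask).filter(
    (name) =>
      name in reverseTask &&
      forwardTask[name].includes("full") &&
      reverseTask[name].includes("full")
  );
}

// Hypothetical plan for a two-way synchronization instance.
const forwardTask = {
  "shop.orders": ["full", "incremental"],
  "shop.users": ["full", "incremental"],
};
const reverseTask = {
  "shop.orders": ["incremental"],
  "shop.users": ["full", "incremental"], // violates the rule
};

console.log(findDoubleFullSync(forwardTask, reverseTask)); // [ 'shop.users' ]
```

An empty result means that every shared object has full data synchronization in at most one direction.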

Configure one-way data synchronization between MongoDB sharded cluster databases

The following table describes the precautions and limits when you synchronize data to a MongoDB database, such as a self-managed MongoDB database or an ApsaraDB for MongoDB instance.

Category

Description

Limits on the source and destination databases

  • Bandwidth requirements: The server to which the source database is deployed must have sufficient outbound bandwidth. Otherwise, the data synchronization speed is affected.

  • The collections to be synchronized must have PRIMARY KEY or UNIQUE constraints and all fields must be unique. Otherwise, the destination database may contain duplicate data records.

  • If you select collections as the objects to be synchronized and you need to edit collections in the destination database, such as renaming collections, up to 1,000 collections can be synchronized in a single data synchronization task. If you run a task to synchronize more than 1,000 collections, a request error occurs. In this case, we recommend that you configure multiple tasks to synchronize the collections in batches or configure a task to synchronize the entire database.

  • The oplog feature must be enabled.

    Note

    The operation logs of the source database must be retained for at least seven days. Otherwise, DTS may fail to obtain the operation logs and the task may fail. In exceptional circumstances, data inconsistency or loss may occur. Make sure that you set the retention period of operation logs based on the preceding requirements. Otherwise, the service level agreement (SLA) of DTS does not guarantee service reliability or performance.

  • During a data synchronization task, MongoDB sharded cluster databases involved in the task cannot be scaled. Otherwise, the task fails.

  • If the source database is a self-managed MongoDB database that uses the sharded cluster architecture, you can set the Access Method parameter only to Express Connect, VPN Gateway, or Smart Access Gateway, or to Cloud Enterprise Network (CEN) when you configure the DTS task.

  • The number of Mongos nodes in the source MongoDB sharded cluster database cannot exceed 10.

  • You cannot synchronize collections that contain time to live (TTL) indexes. If the database to be synchronized contains TTL indexes, data inconsistency may occur between the source and destination databases because their time zones and clocks may differ.

  • Make sure that no orphaned document exists in the source and destination databases. Otherwise, data inconsistency or even task failure may occur. For more information, see orphaned document in official MongoDB documentation and the How do I delete orphaned documents of a MongoDB database deployed in the sharded cluster architecture? section of the "FAQ" topic.
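The recommendation to synchronize more than 1,000 collections in batches can be sketched as a simple split. This is an illustration only: the function name and the generated collection names are hypothetical, and how you map each batch to a separate task is up to you.

```javascript
const MAX_COLLECTIONS_PER_TASK = 1000;

// Split a list of collection names into batches that each fit into one task.
function splitIntoBatches(collections, size = MAX_COLLECTIONS_PER_TASK) {
  const batches = [];
  for (let i = 0; i < collections.length; i += size) {
    batches.push(collections.slice(i, i + size));
  }
  return batches;
}

// Hypothetical list of 2,500 collection names.
const collectionNames = Array.from({ length: 2500 }, (_, i) => `db1.coll_${i}`);
const batches = splitIntoBatches(collectionNames);

console.log(batches.map((b) => b.length)); // [ 1000, 1000, 500 ]
```

Each batch then becomes the object list of its own synchronization task. If the collections all belong to one database, selecting the entire database as the object avoids the split altogether.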

Other limits

  • Before you start the data synchronization task, you must add sharding keys to the data to be synchronized in the source database. During data synchronization, every document that you insert into the collections being synchronized must contain the sharding key. When you update documents in those collections, you cannot modify the sharding key.

  • To ensure compatibility, the version of the destination MongoDB database must be the same as or later than the version of the source MongoDB database. If the version of the destination database is earlier than that of the source database, database compatibility issues may occur.

  • DTS cannot synchronize data from the admin or local database.

  • Transaction information is not retained. When transactions are synchronized to the destination database, transactions are converted into a single record.

  • Before you synchronize data, evaluate the impact of data synchronization on the performance of the source and destination databases. We recommend that you synchronize data during off-peak hours. During full data synchronization, DTS uses read and write resources of the source and destination databases. This may increase the loads on the database servers.

  • During full data synchronization, concurrent INSERT operations cause fragmentation in the collections of the destination database. After full data synchronization is complete, the storage space for collections of the destination database is larger than that of the source database.

  • If a collection of the destination database has a unique index or the capped attribute of a collection of the destination database is true, the collection supports only single-thread data writing and does not support concurrent replay during incremental data synchronization. This may increase synchronization latency.

  • The data is concurrently written to the destination database. Therefore, the storage space occupied in the destination database is 5% to 10% larger than the size of the data in the source database.

  • To query the number of documents in a collection of the destination MongoDB database, you must use the db.$table_name.aggregate([{ $count:"myCount"}]) syntax.

  • Make sure that the destination MongoDB database does not contain data that has the same primary key as data in the source database. Otherwise, data may be lost. The default primary key is _id. If the data in the destination database has the same primary key as that in the source database, clear the conflicting data in the destination database without interrupting the services of DTS. For example, if the primary key is _id, you can delete the documents in the destination database whose _id values also exist in the source database.

  • Make sure that the MongoDB balancer of the source database is disabled during full data synchronization. Do not enable the balancer until all full data synchronization is complete and incremental data synchronization starts. Otherwise, data inconsistency may occur. For more information about the MongoDB balancer, see Manage the ApsaraDB for MongoDB balancer.

  • If data sharding is configured for the destination instance and you do not need to use the schema synchronization feature of DTS, do not select Schema Synchronization as one of the Synchronization Types in the Configure Objects and Advanced Settings step. Otherwise, data inconsistency may occur or the task may fail due to shard conflicts.
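The sharding key rules in the list above can be enforced in application code before writes reach the source database. The following is a minimal sketch with hypothetical function and field names. It assumes that the sharding key consists of top-level fields and that updates use operator-style update documents such as { $set: { ... } }; replacement-style updates and dotted field paths are not handled.

```javascript
// `shardKey` is the list of field names in the sharding key, e.g. ["tenantId"].
// An inserted document must contain every sharding key field.
function insertHasShardKey(doc, shardKey) {
  return shardKey.every((field) => doc[field] !== undefined);
}

// Returns true if an operator-style update document touches a sharding key field.
function updateTouchesShardKey(update, shardKey) {
  return Object.values(update).some((fields) =>
    Object.keys(fields || {}).some((field) => shardKey.includes(field))
  );
}

// Hypothetical sharding key and writes.
const shardKey = ["tenantId"];

console.log(insertHasShardKey({ _id: 1, tenantId: "t1" }, shardKey)); // true
console.log(insertHasShardKey({ _id: 2 }, shardKey)); // false
console.log(updateTouchesShardKey({ $set: { status: "paid" } }, shardKey)); // false
console.log(updateTouchesShardKey({ $set: { tenantId: "t2" } }, shardKey)); // true
```

Writes for which the first check returns false, or the second returns true, are the ones that the limits above rule out while the task is running.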

Configure two-way data synchronization between MongoDB replica set databases

The following table describes the precautions and limits when you synchronize data to a MongoDB database, such as a self-managed MongoDB database or an ApsaraDB for MongoDB instance.

Category

Description

Limits on the source and destination databases

  • Bandwidth requirements: The server to which the source database is deployed must have sufficient outbound bandwidth. Otherwise, the data synchronization speed is affected.

  • The collections to be synchronized must have PRIMARY KEY or UNIQUE constraints and all fields must be unique. Otherwise, the destination database may contain duplicate data records.

  • If you select collections as the objects to be synchronized and you need to edit collections in the destination database, such as renaming collections, up to 1,000 collections can be synchronized in a single data synchronization task. If you run a task to synchronize more than 1,000 collections, a request error occurs. In this case, we recommend that you configure multiple tasks to synchronize the collections in batches or configure a task to synchronize the entire database.

  • The oplog feature must be enabled.

    Note

    The operation logs of the source database must be retained for at least seven days. Otherwise, DTS may fail to obtain the operation logs and the task may fail. In exceptional circumstances, data inconsistency or loss may occur. Make sure that you set the retention period of operation logs based on the preceding requirements. Otherwise, the service level agreement (SLA) of DTS does not guarantee service reliability or performance.

  • You cannot synchronize collections that contain time to live (TTL) indexes. If the database to be synchronized contains TTL indexes, data inconsistency may occur between the source and destination databases because their time zones and clocks may differ.
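The seven-day oplog retention requirement in the list above can be checked against the current oplog window of the source. The sketch below is a rough check, not a guarantee: it assumes an object shaped like the result of db.getReplicationInfo() in mongosh, whose timeDiff field is the span between the first and last oplog entries in seconds. The function names and sample values are hypothetical. Because the oplog is bounded, the window reflects recent write volume and can shrink under heavier load.

```javascript
const REQUIRED_OPLOG_DAYS = 7;
const SECONDS_PER_DAY = 86400;

// `info` has the shape returned by db.getReplicationInfo() in mongosh.
function oplogWindowDays(info) {
  return info.timeDiff / SECONDS_PER_DAY;
}

// Returns true if the current oplog window covers the required retention period.
function hasEnoughOplog(info, requiredDays = REQUIRED_OPLOG_DAYS) {
  return oplogWindowDays(info) >= requiredDays;
}

// Hypothetical samples: a three-day window and a nine-day window.
const shortWindow = { timeDiff: 3 * SECONDS_PER_DAY };
const longWindow = { timeDiff: 9 * SECONDS_PER_DAY };

console.log(hasEnoughOplog(shortWindow), hasEnoughOplog(longWindow)); // false true
```

A result of false suggests that the oplog should be enlarged or its retention extended before the task is started.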

Other limits

  • To ensure compatibility, the version of the destination MongoDB database must be the same as or later than the version of the source MongoDB database. If the version of the destination database is earlier than that of the source database, database compatibility issues may occur.

  • If the source or the destination instance resides in a region outside the Chinese mainland, two-way data synchronization is supported only between databases within the same region. For example, if the source database resides in the Japan (Tokyo) region, data can be synchronized only within the Japan (Tokyo) region and cannot be synchronized to or from the Germany (Frankfurt) region in two-way synchronization scenarios.

  • DTS cannot synchronize data from the admin or local database.

  • Transaction information is not retained. When transactions are synchronized to the destination database, transactions are converted into a single record.

  • Before you synchronize data, evaluate the impact of data synchronization on the performance of the source and destination databases. We recommend that you synchronize data during off-peak hours. During full data synchronization, DTS uses read and write resources of the source and destination databases. This may increase the loads on the database servers.

  • During full data synchronization, concurrent INSERT operations cause fragmentation in the collections of the destination database. After full data synchronization is complete, the storage space for collections of the destination database is larger than that of the source database.

  • During data synchronization, we recommend that you use only DTS to write data to the destination database. This prevents data inconsistency between the source and destination databases. For example, if you use tools other than DTS to write data to the destination database, data loss may occur in the destination database when you use DMS to perform online DDL operations.

  • If a collection of the destination database has a unique index or the capped attribute of a collection of the destination database is true, the collection supports only single-thread data writing and does not support concurrent replay during incremental data synchronization. This may increase synchronization latency.

  • A two-way synchronization instance contains a forward synchronization task and a reverse synchronization task. If an object is to be synchronized in both the forward and reverse synchronization tasks when you configure or reset the instance, the following rules apply:

    • Only one of the tasks can synchronize both the full data and incremental data of the object. The other task synchronizes only the incremental data of the object.

    • The source data of the current task can be synchronized only to the destination of the task. The synchronized data is not used as the source data of the other task.

  • The data is concurrently written to the destination database. Therefore, the storage space occupied in the destination database is 5% to 10% larger than the size of the data in the source database.

  • To query the number of documents in a collection of the destination MongoDB database, you must use the db.$table_name.aggregate([{ $count:"myCount"}]) syntax.

  • Make sure that the destination MongoDB database does not contain data that has the same primary key as data in the source database. Otherwise, data may be lost. The default primary key is _id. If the data in the destination database has the same primary key as that in the source database, clear the conflicting data in the destination database without interrupting the services of DTS. For example, if the primary key is _id, you can delete the documents in the destination database whose _id values also exist in the source database.
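The primary key check in the list above amounts to finding _id values that exist on both sides. The following is a minimal sketch with a hypothetical function name and sample values. It assumes that you have already exported the _id values of a collection from both databases, and it compares them by their string form, which suits ObjectId values but would treat the number 1 and the string "1" as equal.

```javascript
// Return the destination _id values that also exist in the source.
function findIdConflicts(sourceIds, destinationIds) {
  const seen = new Set(sourceIds.map(String));
  return destinationIds.filter((id) => seen.has(String(id)));
}

// Hypothetical _id values exported from one collection on each side.
const sourceIds = ["a1", "a2", "a3"];
const destinationIds = ["a3", "b1", "a1"];

console.log(findIdConflicts(sourceIds, destinationIds)); // [ 'a3', 'a1' ]
```

The returned values identify the documents to delete from the destination database before the task writes to that collection.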