This topic describes the precautions and limits that apply when you synchronize data from a MongoDB database, such as a self-managed MongoDB database or an ApsaraDB for MongoDB instance. Read these precautions and limits before you configure a data synchronization task to ensure that the task runs as expected.

Scenarios of synchronizing data from a MongoDB database

The precautions and limits are listed separately for each of the following synchronization scenarios:

Synchronize data from an ApsaraDB for MongoDB instance (replica set architecture) to another ApsaraDB for MongoDB instance (replica set architecture or sharded cluster architecture)

The following table describes the precautions and limits when you synchronize data to a MongoDB database, such as a self-managed MongoDB database or an ApsaraDB for MongoDB instance.
Category Description
Limits on the source database
  • Bandwidth requirements: The server to which the source database belongs must have a sufficient egress bandwidth. Otherwise, the data synchronization speed is affected.
  • The collections to be synchronized must have PRIMARY KEY or UNIQUE constraints and all fields must be unique. Otherwise, the destination database may contain duplicate data records.
  • If you select collections as the objects to synchronize and you want to edit the collections (for example, rename them), a single data synchronization task can synchronize up to 1,000 collections. If a task attempts to synchronize more than 1,000 collections, a request error occurs. In this case, we recommend that you split the collections across multiple tasks, or configure a task to synchronize the entire database.
  • The following requirements must be met:
    • The oplog feature must be enabled.
    • For an incremental data synchronization task, the oplogs of the source database must be stored for more than 24 hours. If you perform both full data synchronization and incremental data synchronization, the oplogs of the source database must be stored for at least 7 days. After the full data synchronization is complete, you can reduce the retention period, provided that it remains longer than 24 hours. If these requirements are not met, Data Transmission Service (DTS) may fail to obtain the oplogs and the task may fail. In exceptional circumstances, data inconsistency or loss may occur. Set the oplog retention period in accordance with these requirements. Otherwise, the Service Level Agreement (SLA) of DTS does not guarantee service reliability and performance.
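
The retention requirement above can be checked roughly by comparing the timestamps of the oldest and newest oplog entries. The following is a minimal sketch, not part of DTS: the function name `oplog_window_ok` and its parameters are hypothetical, and reading the two timestamps from the source database is left to your own tooling.

```python
from datetime import datetime, timedelta, timezone

# Thresholds taken from the requirements listed above.
INCREMENTAL_ONLY_MIN = timedelta(hours=24)      # "more than 24 hours"
FULL_PLUS_INCREMENTAL_MIN = timedelta(days=7)   # "at least 7 days"


def oplog_window_ok(first_entry, last_entry, full_sync):
    """Return True if the observed oplog window meets the requirement.

    first_entry / last_entry: datetimes of the oldest and newest oplog
    entries (hypothetical inputs; obtain them with your own tooling).
    full_sync: True if the task performs full + incremental synchronization.
    """
    window = last_entry - first_entry
    if full_sync:
        return window >= FULL_PLUS_INCREMENTAL_MIN
    # Incremental-only tasks need strictly more than 24 hours.
    return window > INCREMENTAL_ONLY_MIN


newest = datetime(2024, 1, 8, tzinfo=timezone.utc)
print(oplog_window_ok(newest - timedelta(days=7), newest, full_sync=True))
print(oplog_window_ok(newest - timedelta(hours=24), newest, full_sync=False))
```

The observed window only reflects recent write volume, because the oplog is a capped collection. A burst of writes can shorten the window, so treat a passing check as a lower bar rather than a guarantee.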

Other limits
  • To ensure compatibility, the version of the destination MongoDB database must be the same as or later than the version of the source MongoDB database. If the version of the destination database is earlier than the version of the source database, database compatibility issues may occur.
  • DTS cannot synchronize data from the admin or local database.
  • Transaction information is not retained. When transactions are synchronized to the destination database, they are converted into a single record.
  • Before you synchronize data, evaluate the impact of data synchronization on the performance of the source and destination databases. We recommend that you synchronize data during off-peak hours. During full data synchronization, DTS uses read and write resources of the source and destination databases, which may increase the load on the database servers.
  • During full data synchronization, concurrent INSERT operations cause fragmentation in the collections of the destination database. After the full data synchronization is complete, the storage usage of collections in the destination database is larger than that of collections in the source database.
  • During data synchronization, we recommend that you use only DTS to write data to the destination database. This prevents data inconsistency between the source and destination databases. If you use tools other than DTS to write data to the destination database, data loss may occur in the destination database when you use Data Management (DMS) to perform online DDL operations.
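
Two of the limits above, the version requirement and the exclusion of the admin and local databases, can be checked before a task is configured. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, version strings are assumed to be plain dotted numbers such as `4.4.6`, and nothing here calls DTS or MongoDB.

```python
# Databases that DTS cannot synchronize, as stated above.
EXCLUDED_DATABASES = {"admin", "local"}


def parse_version(version):
    """Turn '4.4' or '4.4.6' into a comparable tuple, padded to 3 parts."""
    parts = [int(part) for part in version.split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def destination_version_ok(source_version, destination_version):
    """True if the destination version is the same as or later than the source."""
    return parse_version(destination_version) >= parse_version(source_version)


def syncable_databases(names):
    """Drop the databases that cannot be synchronized, keeping the order."""
    return [name for name in names if name not in EXCLUDED_DATABASES]


print(destination_version_ok("4.4.6", "5.0"))
print(syncable_databases(["admin", "orders", "local", "users"]))
```

Versions are compared as tuples of integers rather than as strings, so that a version such as `10.0` is correctly treated as later than `9.0`.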
Special cases
If the source database is a self-managed MongoDB database, take note of the following limits:
  • If you perform a primary/secondary switchover on the source database when the data synchronization task is running, the task fails.
  • DTS calculates synchronization latency based on the timestamp of the latest synchronized data in the destination database and the current timestamp in the source database. If no update operation is performed on the source database for a long time, the synchronization latency may be inaccurate. If the latency of the synchronization task is too high, you can perform an update operation on the source database to update the latency.
Note: If you select an entire database as the object to synchronize, you can create a heartbeat. The heartbeat is updated or receives data every second.
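
The latency calculation described above can be modeled in a few lines to show why an idle source inflates the reported value. This is a simplified sketch of the description, not the actual DTS implementation; `reported_latency` is a hypothetical name, and the heartbeat case assumes that each heartbeat write is synchronized promptly.

```python
from datetime import datetime, timedelta, timezone


def reported_latency(source_now, latest_synced):
    """Simplified model: current source timestamp minus the timestamp of
    the latest synchronized data in the destination (never negative)."""
    return max(source_now - latest_synced, timedelta(0))


now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)

# Source idle for 10 minutes: nothing is pending, yet the value keeps growing.
idle_latency = reported_latency(now, now - timedelta(minutes=10))

# With a heartbeat written every second, the latest synchronized record
# is always recent, so the value stays small.
heartbeat_latency = reported_latency(now, now - timedelta(seconds=1))

print(idle_latency, heartbeat_latency)
```

In this model, any write on the source, whether a manual update or a heartbeat, moves `latest_synced` forward once it is synchronized, which is why it brings the reported value back down.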

Configure two-way data synchronization between ApsaraDB for MongoDB sharded cluster instances

The following table describes the precautions and limits when you synchronize data to a MongoDB database, such as a self-managed MongoDB database or an ApsaraDB for MongoDB instance.
Category Description
Limits on the source and destination databases
  • Bandwidth requirements: The server to which the source database belongs must have a sufficient egress bandwidth. Otherwise, the data synchronization speed is affected.
  • The collections to be synchronized must have PRIMARY KEY or UNIQUE constraints and all fields must be unique. Otherwise, the destination database may contain duplicate data records.
  • If you select collections as the objects to synchronize and you want to edit the collections (for example, rename them), a single data synchronization task can synchronize up to 1,000 collections. If a task attempts to synchronize more than 1,000 collections, a request error occurs. In this case, we recommend that you split the collections across multiple tasks, or configure a task to synchronize the entire database.
  • The following requirements must be met:
    • The oplog feature must be enabled.
    • For an incremental data synchronization task, the oplogs of the source database must be stored for more than 24 hours. If you perform both full data synchronization and incremental data synchronization, the oplogs of the source database must be stored for at least 7 days. After the full data synchronization is complete, you can reduce the retention period, provided that it remains longer than 24 hours. If these requirements are not met, Data Transmission Service (DTS) may fail to obtain the oplogs and the task may fail. In exceptional circumstances, data inconsistency or loss may occur. Set the oplog retention period in accordance with these requirements. Otherwise, the Service Level Agreement (SLA) of DTS does not guarantee service reliability and performance.

  • During a data synchronization task, ApsaraDB for MongoDB sharded cluster instances involved in the task cannot be scaled. Otherwise, the task fails.
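
When more than 1,000 collections need to be edited and synchronized, the collection limit above suggests splitting them across several tasks. The following is a minimal sketch of that split; `split_into_tasks` is a hypothetical helper that only partitions a list of names and does not create DTS tasks.

```python
# Per-task limit stated above for collection-level synchronization with edits.
MAX_COLLECTIONS_PER_TASK = 1000


def split_into_tasks(collections, limit=MAX_COLLECTIONS_PER_TASK):
    """Partition collection names into batches of at most `limit` items."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return [collections[i:i + limit] for i in range(0, len(collections), limit)]


# Illustrative collection names only.
names = [f"coll_{i}" for i in range(2500)]
batches = split_into_tasks(names)
print([len(batch) for batch in batches])
```

Each batch would then be configured as its own synchronization task. If no collection needs to be edited, synchronizing the entire database in one task avoids the split altogether.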
Other limits
  • To ensure compatibility, the version of the destination MongoDB database must be the same as or later than the version of the source MongoDB database. If the version of the destination database is earlier than the version of the source database, database compatibility issues may occur.
  • If the source or the destination instance is located in a region outside the Chinese mainland, two-way data synchronization is supported only between instances located within the same region. For example, two-way data synchronization is supported between instances within the Japan (Tokyo) region. Two-way data synchronization between an instance in the Japan (Tokyo) region and another instance in the Germany (Frankfurt) region is not supported.
  • DTS cannot synchronize data from the admin or local database.
  • Transaction information is not retained. When transactions are synchronized to the destination database, they are converted into a single record.
  • Before you synchronize data, evaluate the impact of data synchronization on the performance of the source and destination databases. We recommend that you synchronize data during off-peak hours. During full data synchronization, DTS uses read and write resources of the source and destination databases, which may increase the load on the database servers.
  • During full data synchronization, concurrent INSERT operations cause fragmentation in the collections of the destination database. After the full data synchronization is complete, the storage usage of collections in the destination database is larger than that of collections in the source database.
  • During data synchronization, we recommend that you use only DTS to write data to the destination database. This prevents data inconsistency between the source and destination databases. If you use tools other than DTS to write data to the destination database, data loss may occur in the destination database when you use DMS to perform online DDL operations.