You can use the MongoShake tool developed by Alibaba Cloud to synchronize data between ApsaraDB for MongoDB instances. This feature can be used for data analysis, disaster recovery, and multi-active scenarios. This topic describes how to perform one-way data synchronization between replica set instances.

Note: To implement two-way data synchronization between replica set instances, you must submit a ticket.

MongoShake overview

MongoShake is a general-purpose platform service tool that Alibaba Cloud developed in Go. It replicates MongoDB data by reading the operation log (oplog) of the source database.

MongoShake also supports the subscription and consumption of log data. MongoShake can be connected through flexible methods such as SDKs, Kafka, and MetaQ. It is suitable for scenarios such as log subscription, data center synchronization, and asynchronous cache eviction.

Note: For more information about MongoShake, see the MongoShake homepage on GitHub.

Supported databases

Source database | Destination database
User-created MongoDB database on ECS | User-created MongoDB database on ECS
User-created local MongoDB database | User-created local MongoDB database
ApsaraDB for MongoDB instance | ApsaraDB for MongoDB instance
Third-party MongoDB database on the cloud | Third-party MongoDB database on the cloud

Real-time data synchronization between ApsaraDB for MongoDB instances is used in this topic to illustrate the configuration procedure. This procedure also applies to data synchronization between user-created databases.

Precautions

  • Do not perform DDL operations on the source database before full data synchronization is complete. Otherwise, data inconsistency may occur.
  • Data in the admin and local databases cannot be synchronized.

User permissions for databases

Data source | Required permission
Source ApsaraDB for MongoDB instance | Permissions to read ANY DATABASE, to read the local database, and to read and write the mongoshake database.
Destination ApsaraDB for MongoDB instance | Permissions to read and write ANY DATABASE or to read and write the destination database.
Note: For more information about how to create and authorize MongoDB users, see Use DMS to manage MongoDB users or db.createUser().
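
The permissions listed above correspond closely to MongoDB built-in roles (readAnyDatabase, read, readWrite, and readWriteAnyDatabase). The following is a minimal sketch, not an official procedure: it prints a mongo shell script that calls db.createUser() with those roles. The function name sync_user_js, the user names, and the choice to create the users in the admin database are assumptions made for illustration, and the password is assumed to contain no double quotes or backslashes.

```shell
#!/bin/sh
# sync_user_js SIDE USER PASSWORD
# Prints a mongo shell script that creates a user for one side of the sync.
# SIDE is "source" or "destination". All names here are hypothetical.
sync_user_js() {
    side=$1 user=$2 pass=$3
    case $side in
        source)
            # read any database, read local, read and write mongoshake
            roles='{ role: "readAnyDatabase", db: "admin" }, { role: "read", db: "local" }, { role: "readWrite", db: "mongoshake" }'
            ;;
        destination)
            # read and write any database
            roles='{ role: "readWriteAnyDatabase", db: "admin" }'
            ;;
        *)
            echo "usage: sync_user_js source|destination USER PASSWORD" >&2
            return 1
            ;;
    esac
    cat <<EOF
db.getSiblingDB("admin").createUser({
  user: "$user",
  pwd: "$pass",
  roles: [ $roles ]
})
EOF
}

# Example (assumes the mongo shell is installed; replace the placeholders):
#   sync_user_js source shake_src 'YourPassword' | mongo "mongodb://root:<password>@<source-host>:3717/admin"
```

Printing the script instead of running it directly makes it possible to review the roles before they are applied.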

Preparations

  1. Create an ApsaraDB for MongoDB replica set instance as the synchronization destination. For more information, see Create a replica set instance.
    Note: Deploy the instance in the same VPC as the source ApsaraDB for MongoDB instance so that the ECS instance can connect to both instances over the VPC.
  2. Create an ECS instance where MongoShake runs. For more information, see Create an ECS instance.
    Note: Set the operating system of the ECS instance to Linux and deploy the ECS instance in the same VPC as the new ApsaraDB for MongoDB instance.
  3. Add the IP address of the ECS instance to the whitelists of the source and destination MongoDB instances. Make sure that the ECS instance can connect to the source and destination ApsaraDB for MongoDB instances.
    Note: We recommend that you use VPCs for connection between instances to minimize network latency.
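
To confirm that the whitelist and VPC settings took effect, you can probe each node of a connection string URI from the ECS instance. The following sketch is one possible way to do that. The function names uri_hosts and check_uri are hypothetical, credentials in the URI are assumed to be percent-encoded, and the nc command with the -z and -w options is assumed to be available on the ECS instance.

```shell
#!/bin/sh
# uri_hosts URI: print one host:port pair per line from a mongodb:// URI.
uri_hosts() {
    rest=${1#mongodb://}   # drop the scheme
    rest=${rest%%/*}       # drop "/database?options", if present
    rest=${rest%%\?*}      # drop "?options" that follow the hosts directly
    rest=${rest##*@}       # drop "user:password@", if present
    printf '%s\n' "$rest" | tr ',' '\n'
}

# check_uri URI: try a TCP connection to every node listed in the URI.
check_uri() {
    uri_hosts "$1" | while IFS=: read -r host port; do
        # 27017 is the MongoDB default port, used when the URI omits one
        if nc -z -w 3 "$host" "${port:-27017}"; then
            echo "reachable:   $host"
        else
            echo "unreachable: $host"
        fi
    done
}

# Example (replace with your own connection string URI):
#   check_uri 'mongodb://root:<password>@<node1-host>:3717,<node2-host>:3717'
```

A probe like this only shows that the port accepts TCP connections. It does not verify the user name, password, or permissions.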

Procedure

  1. Log on to the ECS instance.
  2. Run the following command to download the MongoShake program:
    wget https://github.com/alibaba/MongoShake/releases/download/release-v2.0.7-20190817/mongo-shake-2.0.7.tar.gz
    Note: We recommend that you download the latest MongoShake program. For more information, see the Releases page.
  3. Run the following command to extract the MongoShake program:
    tar xvf mongo-shake-2.0.7.tar.gz
  4. Run the vim command to modify the collector.conf configuration file of MongoShake. The main parameters are described as follows.
    Parameter: mongo_urls
    Description: The connection string URI of the source instance.
    Note:
    • We recommend that you use internal connection string URIs for connection between instances to minimize network latency.
    • For more information about the connection string URI format, see Overview of replica set instance connections.
    Example value: mongo_urls = mongodb://root:Ftxxxxxx@dds-bpxxxxxxxx.mongodb.rds.aliyuncs.com:3717,dds-bpxxxxxxxx.mongodb.rds.aliyuncs.com:3717

    Parameter: tunnel.address
    Description: The connection string URI of the destination instance.
    Example value: tunnel.address = mongodb://root:Ftxxxxxx@dds-bpxxxxxxxx.mongodb.rds.aliyuncs.com:3717,dds-bpxxxxxxxx.mongodb.rds.aliyuncs.com:3717

    Parameter: sync_mode
    Description: The data synchronization method. Valid values:
    • all: both full data synchronization and incremental data synchronization
    • document: only full data synchronization
    • oplog: only incremental data synchronization
    Note: The default value is oplog.
    Example value: sync_mode = all

    Parameter: replayer.dml_only
    Description: Indicates whether only DML operations are synchronized. Valid values:
    • false: DML and DDL operations are synchronized.
    • true: Only DML operations are synchronized.
    Note: The default value is true.
    Example value: replayer.dml_only = false

    Parameter: filter.namespace.black
    Description: The blacklist for data synchronization. The specified namespaces are not synchronized to the destination database. Separate multiple namespaces with semicolons (;).
    Note: A namespace is the standard name of a collection or index in ApsaraDB for MongoDB. It is the combination of a database name and a collection or index name. Example: mongodbtest.customer.
    Example value: filter.namespace.black = mongodbtest.customer;testdata.test123

    Parameter: filter.namespace.white
    Description: The whitelist for data synchronization. Only the specified namespaces are synchronized to the destination database. Separate multiple namespaces with semicolons (;).
    Example value: filter.namespace.white = mongodbtest.customer;test123
  5. Run the following command to start the data synchronization task and generate the log information:
    ./collector -conf=collector.conf -verbose
  6. Check the log information. The following log entry indicates that full data synchronization is complete and incremental data synchronization has started.
    [09:38:57 CST 2019/06/20] [INFO] (mongoshake/collector.(*ReplicationCoordinator).Run:80) finish full sync, start incr sync with timestamp: fullBeginTs[1560994443], fullFinishTs[1560994737]
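
Steps 4 to 6 can also be scripted instead of being performed by hand in vim. The following is a minimal sketch under stated assumptions: set_conf and full_sync_done are hypothetical helper names, collector.conf is assumed to use "key = value" lines as in the examples above, and the collector output is assumed to be redirected to a file named collector.log.

```shell
#!/bin/sh
# set_conf KEY VALUE FILE
# Replace an existing "KEY = ..." line in FILE, or append one if none exists.
set_conf() {
    tmp=$(mktemp) || return 1
    # KEY and VALUE are passed through the environment so that characters
    # such as "/", "&", and "@" in a connection string URI are kept verbatim.
    KEY="$1" VALUE="$2" awk '
        BEGIN { k = ENVIRON["KEY"]; v = ENVIRON["VALUE"]; done = 0 }
        {
            line = $0
            sub(/^[ \t]+/, "", line)
            # literal prefix match, so dots in the key are not regex wildcards
            if (index(line, k) == 1 && substr(line, length(k) + 1) ~ /^[ \t]*=/) {
                print k " = " v
                done = 1
                next
            }
            print
        }
        END { if (!done) print k " = " v }
    ' "$3" > "$tmp" && cat "$tmp" > "$3"
    status=$?
    rm -f "$tmp"
    return $status
}

# full_sync_done LOGFILE
# Succeeds once LOGFILE contains the message shown in step 6.
full_sync_done() {
    grep -q 'finish full sync, start incr sync' "$1"
}

# Example:
#   set_conf sync_mode all collector.conf
#   set_conf replayer.dml_only false collector.conf
#   ./collector -conf=collector.conf -verbose > collector.log 2>&1 &
#   until full_sync_done collector.log; do sleep 10; done
```

Commented-out lines in collector.conf are left untouched, because only lines that begin with the parameter name are replaced.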

Monitor the MongoShake status

After incremental data synchronization starts, open another command-line window and run the following command to monitor MongoShake:
./mongoshake-stat --port=9100

Monitoring output example.

Parameter | Description
logs_get/sec | The number of oplog entries obtained per second.
logs_repl/sec | The number of oplog entries replayed per second.
logs_success/sec | The number of oplog entries successfully replayed per second.
lsn.time | The time when the last oplog entry was sent.
lsn_ack.time | The time when the destination database acknowledged the write operation.
lsn_ckpt.time | The time when the checkpoint was persisted.
now.time | The current time.
replset | The name of the replica set for the source database.
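
One way to read these values is to compare lsn_ack.time with now.time: a large gap suggests that the destination is falling behind. The sketch below only does the subtraction. The function name lag_seconds is hypothetical, the timestamps are assumed to be formatted as "YYYY-MM-DD HH:MM:SS" (check the actual monitoring output before relying on this), and GNU date is assumed to be available.

```shell
#!/bin/sh
# lag_seconds EARLIER LATER
# Prints LATER minus EARLIER in seconds. Both values are parsed as UTC,
# which is harmless because only their difference is used.
lag_seconds() {
    start=$(date -u -d "$1" +%s) || return 1
    end=$(date -u -d "$2" +%s) || return 1
    echo $((end - start))
}

# Example, with values copied by hand from the monitoring output:
#   lag_seconds "<lsn_ack.time value>" "<now.time value>"
```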