You can use the MongoShake tool developed by Alibaba Cloud to synchronize data between ApsaraDB for MongoDB instances. This feature can be used for data analysis, disaster recovery, and multi-active scenarios. This topic describes how to perform one-way data synchronization between replica set instances.
MongoShake is a general-purpose platform service tool developed by Alibaba Cloud in Go. It copies ApsaraDB for MongoDB data by reading the operation log (oplog) of the source instance and replaying the recorded operations on the destination.
MongoShake also supports subscribing to and consuming log data. It can deliver data through channels such as SDKs, Kafka, and MetaQ, which makes it suitable for scenarios such as log subscription, data center synchronization, and asynchronous cache eviction.
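To make the idea of oplog-based replication concrete, the following Go program is a toy model, not MongoShake's implementation. It assumes a heavily simplified oplog record (only the op type, namespace, a string _id, and a document body) and models updates as full document replacements, whereas real oplog updates can carry partial modifications. The op codes "i", "u", and "d" are the ones MongoDB uses for insert, update, and delete; commands ("c") and no-ops ("n") are simply skipped here.

```go
package main

import "fmt"

// OplogEntry is a heavily trimmed oplog record (simplified for illustration).
type OplogEntry struct {
	Op  string // "i" insert, "u" update, "d" delete; others are ignored here
	NS  string // namespace, "db.collection"
	ID  string // _id of the affected document (simplified to a string)
	Doc string // document body; updates are modeled as full replacements
}

// Store is an in-memory stand-in for the destination: namespace -> _id -> document.
type Store map[string]map[string]string

// Replay applies oplog entries in order, re-executing the source writes
// on the destination the way an oplog-based replicator does.
func (s Store) Replay(entries []OplogEntry) {
	for _, e := range entries {
		switch e.Op {
		case "i", "u":
			if s[e.NS] == nil {
				s[e.NS] = make(map[string]string)
			}
			s[e.NS][e.ID] = e.Doc
		case "d":
			delete(s[e.NS], e.ID) // deleting from a nil map is a no-op
		default:
			// "c" (commands such as DDL) and "n" (no-op) are skipped in this toy.
		}
	}
}

func main() {
	oplog := []OplogEntry{
		{Op: "i", NS: "mongodbtest.customer", ID: "1", Doc: `{"name":"a"}`},
		{Op: "u", NS: "mongodbtest.customer", ID: "1", Doc: `{"name":"b"}`},
		{Op: "i", NS: "mongodbtest.customer", ID: "2", Doc: `{"name":"c"}`},
		{Op: "d", NS: "mongodbtest.customer", ID: "2"},
	}
	dst := Store{}
	dst.Replay(oplog)
	dst.Replay(oplog) // replaying the same batch again leaves the same state
	fmt.Println(dst["mongodbtest.customer"])
}
```

Because every step in this model either sets or deletes a document by _id, replaying the same batch twice yields the same result. MongoDB designs oplog entries to be idempotent for the same reason: a replicator can safely resume from a checkpoint that is slightly behind.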
The following table lists the source and destination databases that MongoShake supports.
|Source database|Destination database|
|---|---|
|User-created MongoDB database on ECS|User-created MongoDB database on ECS|
|User-created local MongoDB database|User-created local MongoDB database|
|ApsaraDB for MongoDB instance|ApsaraDB for MongoDB instance|
|Third-party MongoDB database on the cloud|Third-party MongoDB database on the cloud|
This topic uses real-time data synchronization between two ApsaraDB for MongoDB instances as an example to illustrate the configuration procedure. The procedure also applies to data synchronization between user-created databases.
Precautions
- Do not perform DDL operations on the source database before full data synchronization is complete. Otherwise, data inconsistency may occur.
- Data in the admin and local databases cannot be synchronized.
User permissions for databases
|Instance|Required permission|
|---|---|
|Source ApsaraDB for MongoDB instance|Permissions to read ANY DATABASE, to read the local database, and to read and write the mongoshake database.|
|Destination ApsaraDB for MongoDB instance|Permissions to read and write ANY DATABASE, or to read and write the destination database.|
Preparations
- Create an ApsaraDB for MongoDB replica set instance as the synchronization destination. For more information, see Create a replica set instance.
Note Set its network type to VPC and deploy it in the same VPC as the source ApsaraDB for MongoDB instance so that the ECS instance can connect to both instances over the VPC.
- Create an ECS instance where MongoShake runs. For more information, see Create an ECS instance.
Note Select a Linux operating system, and deploy the ECS instance in the same VPC as the source and destination ApsaraDB for MongoDB instances.
- Add the IP address of the ECS instance to the whitelists of the source and destination ApsaraDB for MongoDB instances. Make sure that the ECS instance can connect to both instances.
Note We recommend that you use VPCs for connection between instances to minimize network latency.
Procedure
- Log on to the ECS instance.
- Run the following command to download the MongoShake program:
wget https://github.com/alibaba/MongoShake/releases/download/release-v2.0.7-20190817/mongo-shake-2.0.7.tar.gz
Note We recommend that you download the latest MongoShake program. For more information, see the Releases page.
- Run the following command to extract the MongoShake program:
tar xvf mongo-shake-2.0.7.tar.gz
- Run the vim command to modify the collector.conf configuration file of MongoShake. The following table describes the main parameters.
|Parameter|Description|Example value|
|---|---|---|
|mongo_urls|The connection string URI of the source instance. Note: We recommend that you use internal connection string URIs for connections between instances to minimize network latency. For more information about the connection string URI format, see Overview of replica set instance connections.|mongo_urls = mongodb://root:Ftxxxxxx@dds-bpxxxxxxxx.mongodb.rds.aliyuncs.com:3717,dds-bpxxxxxxxx.mongodb.rds.aliyuncs.com:3717|
|tunnel.address|The connection string URI of the destination instance.|tunnel.address = mongodb://root:Ftxxxxxx@dds-bpxxxxxxxx.mongodb.rds.aliyuncs.com:3717,dds-bpxxxxxxxx.mongodb.rds.aliyuncs.com:3717|
|sync_mode|The data synchronization mode. Valid values: all (full data synchronization followed by incremental data synchronization), document (full data synchronization only), and oplog (incremental data synchronization only). Default value: oplog.|sync_mode = all|
|replayer.dml_only|Specifies whether only DML operations are synchronized. Valid values: true (only DML operations are synchronized and DDL operations are skipped) and false (DDL operations are also synchronized). Default value: true.|replayer.dml_only = false|
|filter.namespace.black|The blacklist for data synchronization. The specified namespaces are not synchronized to the destination database. Separate multiple namespaces with semicolons (;). Note: A namespace is the standard name of a collection or index in ApsaraDB for MongoDB. It combines a database name with a collection or index name, for example, mongodbtest.customer.|filter.namespace.black = mongodbtest.customer;testdata.test123|
|filter.namespace.white|The whitelist for data synchronization. Only the specified namespaces are synchronized to the destination database. Separate multiple namespaces with semicolons (;).|filter.namespace.white = mongodbtest.customer;test123|
- Run the following command to start the data synchronization task and print log information:
./collector -conf=collector.conf -verbose
- Check the log information. If the following log is displayed, full data synchronization is complete and incremental data synchronization has started:
[09:38:57 CST 2019/06/20] [INFO] (mongoshake/collector.(*ReplicationCoordinator).Run:80) finish full sync, start incr sync with timestamp: fullBeginTs, fullFinishTs
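The mongo_urls and tunnel.address values in collector.conf use the standard mongodb://user:password@host:port,host:port format. MongoDB connection strings require reserved characters such as @, :, and / in the user name or password to be percent-encoded, which is easy to miss when a password contains them. The following Go sketch builds such a string with the standard net/url package. The hosts and password are placeholders.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// buildMongoURI assembles a replica set connection string. url.UserPassword
// percent-encodes '@', ':', '/', '?' and other unsafe characters in the
// credentials so that the URI still parses correctly.
func buildMongoURI(user, password string, hosts []string) string {
	return "mongodb://" + url.UserPassword(user, password).String() + "@" + strings.Join(hosts, ",")
}

func main() {
	// Hypothetical hosts and password.
	hosts := []string{
		"dds-bpxxxxxxxx.mongodb.rds.aliyuncs.com:3717",
		"dds-bpyyyyyyyy.mongodb.rds.aliyuncs.com:3717",
	}
	fmt.Println("mongo_urls = " + buildMongoURI("root", "Ft@example", hosts))
}
```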
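The filter.namespace.black and filter.namespace.white parameters in collector.conf decide which namespaces reach the destination. The following Go sketch models that decision under stated assumptions: an entry without a dot names a whole database, an entry with a dot names a single collection, only one of the two lists is set, and the admin and local databases are always excluded. This is a simplified model for checking a filter value. It is not MongoShake's own matching code, which may differ in details such as prefix handling.

```go
package main

import (
	"fmt"
	"strings"
)

// parseList splits a semicolon-separated filter.namespace.* value.
func parseList(v string) []string {
	var out []string
	for _, ns := range strings.Split(v, ";") {
		if ns = strings.TrimSpace(ns); ns != "" {
			out = append(out, ns)
		}
	}
	return out
}

// matches reports whether namespace ns ("db.collection") is covered by entry.
// Assumption: a dot-free entry names a database; otherwise it names a collection.
func matches(entry, ns string) bool {
	if !strings.Contains(entry, ".") {
		return ns == entry || strings.HasPrefix(ns, entry+".")
	}
	return ns == entry
}

// shouldSync models whether ns is synchronized. Assumption: only one of
// black and white is non-empty.
func shouldSync(ns string, black, white []string) bool {
	db := ns
	if i := strings.Index(ns, "."); i >= 0 {
		db = ns[:i]
	}
	if db == "admin" || db == "local" {
		return false // these databases are never synchronized
	}
	if len(white) > 0 {
		for _, e := range white {
			if matches(e, ns) {
				return true
			}
		}
		return false
	}
	for _, e := range black {
		if matches(e, ns) {
			return false
		}
	}
	return true
}

func main() {
	white := parseList("mongodbtest.customer;test123")
	for _, ns := range []string{"mongodbtest.customer", "mongodbtest.orders", "test123.logs", "admin.system.users"} {
		fmt.Printf("%-22s sync=%v\n", ns, shouldSync(ns, nil, white))
	}
}
```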
Monitor the MongoShake status
The following table describes the metrics in the MongoShake monitoring output.
|Metric|Description|
|---|---|
|logs_get/sec|The number of oplog entries obtained per second.|
|logs_repl/sec|The number of oplog entries replayed per second.|
|logs_success/sec|The number of oplog entries successfully replayed per second.|
|lsn.time|The time when the last oplog entry was sent.|
|lsn_ack.time|The time when the destination database acknowledged the write operation.|
|lsn_ckpt.time|The time when the checkpoint was last persisted.|
|now.time|The current time.|
|replset|The name of the replica set of the source database.|