
ApsaraDB for MongoDB:Use MongoShake to perform one-way synchronization between MongoDB instances

Last Updated:Nov 24, 2025

You can use MongoShake, an open source tool developed by Alibaba Cloud, to synchronize data between MongoDB databases. This feature is suitable for data analytics, disaster recovery, and active-active scenarios. This topic uses real-time data synchronization between ApsaraDB for MongoDB instances as an example to describe the configuration procedure.

MongoShake overview

MongoShake is a general-purpose Platform as a Service (PaaS) tool developed by Alibaba Cloud in the Go language. It reads the operation logs (oplogs) from a MongoDB database to replicate data for various purposes.

MongoShake also provides features to subscribe to and consume log data. You can connect to it using various methods, such as SDKs, Kafka, and MetaQ. This makes it suitable for scenarios such as log subscription, data center synchronization, and asynchronous cache eviction.

Note

For more information about MongoShake, see the MongoShake homepage on GitHub.

Supported data sources

Both the source database and the destination database can be any of the following:

  • Self-managed MongoDB database on an ECS instance

  • On-premises self-managed MongoDB database

  • ApsaraDB for MongoDB instance

  • Third-party cloud MongoDB database

Usage notes

  • Do not perform Data Definition Language (DDL) operations on the source database before full data synchronization is complete. Otherwise, data inconsistency may occur.

  • The local database cannot be synchronized. The admin database can be synchronized. For more information, see Migrate business data from the admin database to a non-admin database.

Required permissions for database users

  • Source MongoDB instance: requires the readAnyDatabase permission, the read permission on the local database, and the readWrite permission on the mongoshake database.

    Note

    The mongoshake database is automatically created in the source instance by the MongoShake program when incremental synchronization starts.

  • Destination MongoDB instance: requires the readWriteAnyDatabase permission, or the readWrite permission on the destination database.

Note

For more information about how to create MongoDB database users and grant permissions to them, see Use DMS to manage MongoDB database users or the db.createUser command.
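
The following mongo shell example shows one way to create such accounts. It is a minimal sketch: the account names and passwords are placeholders, and you must adapt them to your environment.

// On the source instance: read access to all databases, read on local (oplog access),
// and readWrite on mongoshake (checkpoint storage).
use admin
db.createUser({
  user: "shake_source",
  pwd: "****",
  roles: [
    { role: "readAnyDatabase", db: "admin" },
    { role: "read", db: "local" },
    { role: "readWrite", db: "mongoshake" }
  ]
})

// On the destination instance: write access to all databases.
use admin
db.createUser({
  user: "shake_dest",
  pwd: "****",
  roles: [ { role: "readWriteAnyDatabase", db: "admin" } ]
})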

Preparations

  1. For optimal synchronization performance, ensure the source ApsaraDB for MongoDB replica set instance uses a virtual private cloud (VPC). If the instance uses the classic network, switch the network type to VPC. For more information, see Switch the network type of an instance from classic network to VPC.

  2. Create an ApsaraDB for MongoDB replica set instance as the synchronization destination. When you create the instance, select the same VPC as the source ApsaraDB for MongoDB replica set instance to minimize network latency. For more information, see Create a replica set instance.

  3. Create an ECS instance to run MongoShake. When you create the instance, select the same VPC as the source ApsaraDB for MongoDB instance to minimize network latency. For more information, see Create an ECS instance.

  4. Add the private IP address of the ECS instance to the whitelists of the source and destination MongoDB instances. Ensure that the ECS instance can connect to the source and destination MongoDB instances. For more information, see Modify a whitelist.

Note

If your network type does not meet the preceding requirements, you can apply for public endpoints for the source and destination MongoDB instances. Then, add the public IP address of the ECS instance to the whitelists of the source and destination instances. This lets you synchronize data over the Internet. For more information, see Apply for a public endpoint and Modify a whitelist.
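
Before you continue, you can verify from the ECS instance that the source and destination instances are reachable. The following command is only an example; replace the endpoint with the connection address of your instance:

nc -zv dds-bp19f409d7512****.mongodb.rds.aliyuncs.com 3717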

Procedure

In this example, MongoShake is installed in the /test/mongoshake directory.

  1. Log on to the ECS server.

    Note

    Select a logon method based on your business scenario. For more information, see Overview of logon methods for ECS servers.

  2. Run the following command to download the MongoShake program and rename it to mongoshake.tar.gz.

    wget "http://docs-aliyun.cn-hangzhou.oss.aliyun-inc.com/assets/attach/196977/jp_ja/1608863913991/mongo-shake-v2.4.16.tar.gz" -O mongoshake.tar.gz
    Note

    This topic provides the link to download MongoShake 2.4.16. To download the latest version of MongoShake, see the releases page.

  3. Run the following command to decompress the MongoShake package to the /test/mongoshake directory.

    tar zxvf mongoshake.tar.gz && mv mongo-shake-v2.4.16 /test/mongoshake && cd /test/mongoshake/mongo-shake-v2.4.16
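    The preceding command assumes that the /test/mongoshake directory already exists. If it does not, create it first so that the mv and cd commands succeed:

    mkdir -p /test/mongoshake
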
  4. Run the vi collector.conf command to modify the collector.conf configuration file of MongoShake. The following list describes the main parameters:

    • mongo_urls: the connection string URI of the source MongoDB instance. In this example, the database account is test and it belongs to the admin database.

      Example: mongo_urls = mongodb://test:****@dds-bp19f409d7512****.mongodb.rds.aliyuncs.com:3717,dds-bp19f409d7512****.mongodb.rds.aliyuncs.com:3717

      Note

      The password cannot contain the at sign (@). Otherwise, the connection fails.

    • tunnel.address: the connection string URI of the destination MongoDB instance. In this example, the database account is test and it belongs to the admin database.

      Example: tunnel.address = mongodb://test:****@dds-bp19f409d7512****.mongodb.rds.aliyuncs.com:3717,dds-bp19f409d7512****.mongodb.rds.aliyuncs.com:3717

      Note

      The password cannot contain the at sign (@). Otherwise, the connection fails.

    • sync_mode: the data synchronization method. Valid values:

      • all: performs both full and incremental data synchronization.

      • full: performs only full data synchronization.

      • incr: performs only incremental data synchronization.

      The default value is incr.

      Example: sync_mode = all

    Note

    For a complete list of parameters in the collector.conf file, see the Appendix.
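
    Based on these parameters, the modified part of collector.conf in this example looks similar to the following excerpt. The endpoints and passwords are masked placeholders, and all other parameters keep their default values:

    mongo_urls = mongodb://test:****@dds-bp19f409d7512****.mongodb.rds.aliyuncs.com:3717,dds-bp19f409d7512****.mongodb.rds.aliyuncs.com:3717
    tunnel.address = mongodb://test:****@dds-bp19f409d7512****.mongodb.rds.aliyuncs.com:3717,dds-bp19f409d7512****.mongodb.rds.aliyuncs.com:3717
    sync_mode = all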

  5. Run the following command to start the sync task and output log information.

    ./collector.linux -conf=collector.conf -verbose
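
    If you want the task to keep running after you close the shell session, you can start MongoShake in the background instead and follow its log file. The log path below assumes the default log.dir (./logs/) and log.file (collector.log) settings that are described in the Appendix:

    nohup ./collector.linux -conf=collector.conf > nohup.out 2>&1 &
    tail -f ./logs/collector.log
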
  6. Observe the log information. When the following log entry appears, full data synchronization is complete and incremental data synchronization has started.

    [09:38:57 CST 2019/06/20] [INFO] (mongoshake/collector.(*ReplicationCoordinator).Run:80) finish full sync, start incr sync with timestamp: fullBeginTs[1560994443], fullFinishTs[1560994737]

Monitor the MongoShake status

After incremental data synchronization starts, open another command-line window and run the following command to monitor MongoShake.

cd /test/mongoshake && ./mongoshake-stat --port=9100
Note

mongoshake-stat is a Python script. Before you run the script, install Python 2.7. For more information, see the Python official website.
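
If Python 2.7 is not available, you can query the HTTP status port of MongoShake directly. The /repl path below is based on the MongoShake documentation on GitHub and is an assumption for this version; the port must match the incr_sync.http_port setting (9100 by default).

curl http://127.0.0.1:9100/repl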

Sample monitoring output (figure: monitoring results).

Parameter descriptions:

  • logs_get/sec: the number of oplogs obtained per second.

  • logs_repl/sec: the number of oplog replay operations per second.

  • logs_success/sec: the number of successful oplog replay operations per second.

  • lsn.time: the time when the last oplog was sent.

  • lsn_ack.time: the time when the destination confirmed the write operation.

  • lsn_ckpt.time: the time when the checkpoint was persisted.

  • now.time: the current time.

  • replset: the name of the replica set of the source database.

Migrate business data from the admin database to a non-admin database

MongoDB does not recommend storing business data in the admin database. This is because locking behavior and conflicts with internal commands can degrade instance performance.

MongoShake supports synchronizing business data from the admin database to a non-admin database.

Follow the steps in the Procedure section. In Step 4, when you modify the collector.conf file, add the following configuration items:

filter.pass.special.db = admin

# Migrate all business collections from the admin database to newDB.
transform.namespace = admin:newDB
# Alternatively, migrate the abc collection from the admin database to the def collection in the target database. You can configure multiple rules.
transform.namespace = admin.abc:target.def
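
After the synchronization task has run, you can spot-check the result on the destination instance by using the mongo shell. The database and collection names below follow the example rules above and are placeholders:

// Confirm that collections from the admin database now appear in newDB (first rule)
// and that the def collection exists in the target database (second rule).
db.getSiblingDB("newDB").getCollectionNames()
db.getSiblingDB("target").getCollectionNames()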

Appendix

Table 1. collector.conf parameters

The parameters are grouped by category. Each entry lists the parameter name, its description, and an example value.

conf.version

The version number of the current configuration file. Do not modify this value.

conf.version = 4

Global configuration options

id

The ID of the sync task. You can customize this value. It is used for the log name, the name of the database that stores checkpoint information for resumable transmission, and the name of the destination database.

id = mongoshake

master_quorum

A high availability option. When a primary and a standby MongoShake node synchronize data from the same source, set this parameter to true for the primary MongoShake node.

Valid values:

  • true: enabled

  • false: disabled

Note

The default value is false.

master_quorum = false

full_sync.http_port

The HTTP port. Open this port to view the current status of full synchronization from the Internet.

Note

The default value is 9101.

full_sync.http_port = 9101

incr_sync.http_port

The HTTP port. Open this port to view the current status of incremental synchronization from the Internet.

Note

The default value is 9100.

incr_sync.http_port = 9100

system_profile_port

The profiling port, used to view internal stack information.

system_profile_port = 9200
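
If MongoShake exposes Go's standard net/http/pprof handlers on this port (an assumption that is not confirmed by this topic), you can dump goroutine stacks as follows:

curl "http://127.0.0.1:9200/debug/pprof/goroutine?debug=2"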

log.level

The log level. Valid values:

  • error: Logs that contain error-level information.

  • warning: Logs that contain warning-level information.

  • info: Logs that reflect the current system status.

  • debug: Logs that contain debugging information.

Default value: info.

log.level = info

log.dir

The directory for log files and PID files. If this parameter is not set, the logs directory in the current path is used by default.

Note

The path for this parameter must be an absolute path.

log.dir = ./logs/

log.file

The name of the log file. You can customize this value.

Note

The default value is collector.log.

log.file = collector.log

log.flush

The refresh rate of logs on the screen. Valid values:

  • true: Prints every log entry. This affects performance.

  • false: Does not guarantee that every log is printed, but ensures performance.

Note

The default value is false.

log.flush = false

sync_mode

The data synchronization method. Valid values:

  • all: Performs both full and incremental data synchronization.

  • full: Performs only full data synchronization.

  • incr: Performs only incremental data synchronization.

Note

The default value is incr.

sync_mode = all

mongo_urls

The connection string URI of the source MongoDB instance. In this example, the database account is test and it belongs to the admin database.

Note

mongo_urls = mongodb://test:****@dds-bp19f409d7512****.mongodb.rds.aliyuncs.com:3717,dds-bp19f409d7512****.mongodb.rds.aliyuncs.com:3717

mongo_cs_url

If the source is a sharded cluster instance, you must enter the endpoint of a Configserver (CS) node. To apply for an endpoint for a Configserver node, see Apply for an endpoint for a shard.

In this example, the database account is test and it belongs to the admin database.

mongo_cs_url = mongodb://test:****@dds-bp19f409d7512****-csxxx.mongodb.rds.aliyuncs.com:3717,dds-bp19f409d7512****-csxxx.mongodb.rds.aliyuncs.com:3717/admin

mongo_s_url

If the source is a sharded cluster instance, you must enter the endpoint of at least one Mongos node. Separate multiple Mongos addresses with a comma (,). To apply for an endpoint for a Mongos node, see Apply for an endpoint for a shard.

In this example, the database account is test and it belongs to the admin database.

mongo_s_url = mongodb://test:****@s-bp19f409d7512****.mongodb.rds.aliyuncs.com:3717,s-bp19f409d7512****.mongodb.rds.aliyuncs.com:3717/admin

tunnel

The type of channel for synchronization. Valid values:

  • direct: Synchronizes data directly to the destination MongoDB instance.

  • rpc: Synchronizes data using NET/RPC.

  • tcp: Synchronizes data using TCP.

  • file: Synchronizes data using file transfer.

  • kafka: Synchronizes data using Kafka.

  • mock: Used only for testing. Data is not written to the channel.

Note

The default value is direct.

tunnel = direct

tunnel.address

The endpoint of the destination. The following addresses are supported:

  • If you set the tunnel parameter to direct, enter the connection string URI of the destination MongoDB instance.

  • If you set the tunnel parameter to rpc, enter the RPC receiver address of the destination instance.

  • If you set the tunnel parameter to tcp, enter the TCP receiver address of the destination instance.

  • If you set the tunnel parameter to file, enter the file path for the data of the destination instance.

  • If you set the tunnel parameter to kafka, enter the Kafka address, for example, topic@brokers1,brokers2.

  • If you set the tunnel parameter to mock, leave this parameter empty.

In this example, the database account is test and it belongs to the admin database.

tunnel.address = mongodb://test:****@dds-bp19f409d7512****.mongodb.rds.aliyuncs.com:3717,dds-bp19f409d7512****.mongodb.rds.aliyuncs.com:3717

tunnel.message

The type of data in the channel. This parameter is valid only when the tunnel parameter is set to kafka or file. Valid values:

  • raw: The default type. Data is written and read in aggregation mode.

  • json: Writes data to Kafka in JSON format, which allows users to read it directly.

  • bson: Writes data to Kafka in BSON binary format.

Note

The default value is raw.

tunnel.message = raw

mongo_connect_mode

The connection mode for the MongoDB instance. This parameter is valid only when the tunnel parameter is set to direct. Valid values:

  • primary: Pulls data from the primary node.

  • secondaryPreferred: Pulls data from a secondary node.

  • standalone: Pulls data from a specified single node.

Note

The default value is secondaryPreferred.

mongo_connect_mode = secondaryPreferred

filter.namespace.black

Specifies the blacklist for data synchronization. The specified namespaces are not synchronized to the destination database. Separate multiple namespaces with a semicolon (;).

Note

A namespace is the canonical name for a collection or index in MongoDB. It is a combination of the database name and the collection or index name, for example, mongodbtest.customer.

filter.namespace.black = mongodbtest.customer;testdata.test123

filter.namespace.white

Specifies the whitelist for data synchronization. Only the specified namespaces are synchronized to the destination database. Separate multiple namespaces with a semicolon (;).

filter.namespace.white = mongodbtest.customer;test123

filter.pass.special.db

Enables synchronization for special databases. During normal synchronization, databases such as admin, local, mongoshake, config, and system.views are filtered out by the system. You can enable synchronization for these databases for special requirements. Separate multiple database names with a semicolon (;).

filter.pass.special.db = admin;mongoshake

filter.ddl_enable

Specifies whether to enable DDL synchronization. Valid values:

  • true: enabled

  • false: disabled

Note

This feature is not supported when the source is a MongoDB sharded cluster instance.

filter.ddl_enable = false

checkpoint.storage.url

Configures the checkpoint storage address to support resumable transmission. If this is not configured, the program writes to the following databases based on the instance type:

  • MongoDB replica set instance: Writes to the mongoshake database.

  • MongoDB sharded cluster instance: Writes to the admin database on the Configserver node.

In this example, the database account is test and it belongs to the admin database.

checkpoint.storage.url = mongodb://test:****@dds-bp19f409d7512****.mongodb.rds.aliyuncs.com:3717,dds-bp19f409d7512****.mongodb.rds.aliyuncs.com:3717

checkpoint.storage.db

The name of the database that stores checkpoints.

Note

The default value is mongoshake.

checkpoint.storage.db = mongoshake

checkpoint.storage.collection

The name of the collection that stores checkpoints. When you enable a primary and a standby MongoShake node to synchronize data from the same source, you can modify this table name to prevent conflicts caused by duplicate names.

Note

The default value is ckpt_default.

checkpoint.storage.collection = ckpt_default

checkpoint.start_position

The start position for resumable transmission. This parameter is invalid if a checkpoint already exists. The value format is YYYY-MM-DDTHH:MM:SSZ.

Note

The default value is 1970-01-01T00:00:00Z.

checkpoint.start_position = 1970-01-01T00:00:00Z

transform.namespace

Renames a source database or collection during synchronization. For example, you can rename collection B in database A of the source so that it is written to the destination as collection D in database C.

transform.namespace = fromA.fromB:toC.toD

Full data synchronization options

full_sync.reader.collection_parallel

Sets the maximum number of collections that MongoShake pulls concurrently.

full_sync.reader.collection_parallel = 6

full_sync.reader.write_document_parallel

Sets the number of concurrent threads for MongoShake to write to a single collection.

full_sync.reader.write_document_parallel = 8

full_sync.reader.document_batch_size

Sets the batch size for a single write of documents to the destination. For example, 128 means that 128 documents are aggregated before being written.

full_sync.reader.document_batch_size = 128

full_sync.collection_exist_drop

Sets how to handle a collection in the destination database that has the same name as a source collection. Valid values:

  • true: Deletes the destination collection with the same name and then synchronizes.

    Warning

    This operation deletes the collection in the destination. Back up your data in advance.

  • false: Reports an error and exits if a collection with the same name is detected in the destination database.

full_sync.collection_exist_drop = true

full_sync.create_index

Specifies whether to create an index after synchronization is complete. Valid values:

  • foreground: Creates a foreground index.

  • background: Creates a background index.

  • none: Does not create an index.

full_sync.create_index = none

full_sync.executor.insert_on_dup_update

Specifies whether to change an INSERT statement to an UPDATE statement if a duplicate _id field exists in the destination database. Valid values:

  • true: change

  • false: do not change

full_sync.executor.insert_on_dup_update = false

full_sync.executor.filter.orphan_document

Specifies whether to filter orphaned documents when the source is a sharded cluster instance. Valid values:

  • true: filter

  • false: do not filter

full_sync.executor.filter.orphan_document = false

full_sync.executor.majority_enable

Specifies whether to enable the majority write feature on the destination. Valid values:

  • true: enable

  • false: disable

full_sync.executor.majority_enable = false

Incremental data synchronization options

incr_sync.mongo_fetch_method

Configures the method for pulling incremental data. Valid values:

  • oplog: Pulls oplogs from the source database.

  • change_stream: Pulls change events from the source database. This method is supported only for MongoDB 4.0 and later.

Default value: oplog

incr_sync.mongo_fetch_method = oplog

incr_sync.oplog.gids

Used to set up bidirectional replication for cloud clusters.

incr_sync.oplog.gids = xxxxxxxxxxxx

incr_sync.shard_key

The method MongoShake uses to handle concurrency internally. Do not modify this parameter.

incr_sync.shard_key = collection

incr_sync.worker

The number of concurrent threads for transmitting oplogs. If the host performance is sufficient, you can increase the number of threads.

Note

If the source is a sharded cluster instance, the number of threads must be equal to the number of shards.

incr_sync.worker = 8

incr_sync.worker.oplog_compressor

Enables data compression to reduce network bandwidth consumption. Valid values:

  • none: no compression

  • gzip: compresses in gzip format

  • zlib: compresses in zlib format

  • deflate: compresses in deflate format

Note

This parameter can be used only when the tunnel parameter is not set to direct. When tunnel is set to direct, set this parameter to none.

incr_sync.worker.oplog_compressor = none

incr_sync.target_delay

Sets a delayed synchronization between the source and destination. Changes in the source are typically synchronized to the destination in real-time. To prevent accidental operations, you can set this parameter to delay synchronization. For example, incr_sync.target_delay = 1800 sets a 30-minute delay. The unit is seconds.

Note

A value of 0 indicates that delayed synchronization is disabled.

incr_sync.target_delay = 1800

incr_sync.worker.batch_queue_size, incr_sync.adaptive.batching_max_size, incr_sync.fetcher.buffer_capacity

Configuration parameters for MongoShake's internal queues. Do not modify them unless necessary.

incr_sync.worker.batch_queue_size = 64

incr_sync.adaptive.batching_max_size = 1024

incr_sync.fetcher.buffer_capacity = 256

MongoDB synchronization options (for direct mode only)

incr_sync.executor.upsert

Specifies whether to change an UPDATE statement to an INSERT statement when no document with the same _id or unique index exists in the destination database. Valid values:

  • true: change

  • false: do not change

incr_sync.executor.upsert = false

incr_sync.executor.insert_on_dup_update

Specifies whether to change an INSERT statement to an UPDATE statement when a document with the same _id or unique index already exists in the destination database. Valid values:

  • true: change

  • false: do not change

incr_sync.executor.insert_on_dup_update = false

incr_sync.conflict_write_to

Specifies whether to record conflicting documents if a write conflict occurs during synchronization. Valid values:

  • none: do not record

  • db: writes the conflicting documents to the mongoshake_conflict database

  • sdk: writes conflict logs to the SDK

incr_sync.conflict_write_to = none

incr_sync.executor.majority_enable

Specifies whether to enable majority write on the destination. Valid values:

  • true: enable

  • false: disable

Note

Enabling this feature affects performance.

incr_sync.executor.majority_enable = false

FAQ

Refer to the MongoShake FAQ first. If you encounter any other issues when you use MongoShake, provide feedback directly in GitHub Issues.