Use MongoShake to implement delayed synchronization - ApsaraDB for MongoDB

Real-time synchronization with MongoShake means that a misoperation on the primary instance — such as accidentally dropping a collection or writing corrupt data — is replicated to secondary instances within seconds, leaving no time to react. The incr_sync.target_delay parameter in MongoShake 2.4.6 and later introduces a fixed delay between the primary and its synchronization targets, giving you a window to detect the problem, stop synchronization, and redirect traffic to a clean secondary before the bad data arrives.

This topic covers the incr_sync.target_delay parameter. For general MongoShake setup, see Use MongoShake to perform one-way synchronization between ApsaraDB for MongoDB instances.

Prerequisites

Before you begin, make sure you have:

MongoShake 2.4.6 or later. Download from the MongoShake releases page.
A source ApsaraDB for MongoDB replica set instance in a virtual private cloud (VPC). If the instance is on the classic network, switch it to VPC first.
A destination ApsaraDB for MongoDB replica set instance in the same VPC as the source, to minimize network latency. See Create a replica set instance.
An Elastic Compute Service (ECS) instance in the same VPC to run MongoShake, to minimize network latency. See Create an ECS instance.
The private IP address of the ECS instance added to the whitelists of both the source and destination instances, so MongoShake can connect to them. See Configure a whitelist.

If your network setup doesn't meet the VPC requirements, apply for public endpoints for both instances and add the ECS instance's public IP address to their whitelists instead. See Apply for a public endpoint and Configure a whitelist.
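
Because incr_sync.target_delay is available only in MongoShake 2.4.6 and later, it can help to compare a package's version against that minimum before you deploy it. The following is a minimal sketch, assuming GNU sort (for its -V version ordering) is available on the ECS instance. The function name version_ge is hypothetical and is not part of MongoShake.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of MongoShake): succeeds when version $1 is
# greater than or equal to version $2. Relies on GNU sort's -V ordering.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

required="2.4.6"   # first release with incr_sync.target_delay, per this topic
for candidate in 2.0.7 2.4.6; do
  if version_ge "$candidate" "$required"; then
    echo "$candidate: supports delayed synchronization"
  else
    echo "$candidate: too old, upgrade to $required or later"
  fi
done
```

The comparison uses version ordering rather than plain string ordering, so a release such as 2.10.0 is correctly treated as newer than 2.4.6.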

How it works

By default, MongoShake tails the primary instance's oplog and applies changes to secondary instances in near real time. This means a misoperation propagates to secondaries almost immediately.

Setting incr_sync.target_delay tells MongoShake to hold each change for the specified number of seconds before applying it. For example, with a delay of 1,800 seconds (30 minutes), at 10:00 the secondary reflects the primary's data as of 09:30, and a misoperation made on the primary at 10:00 is not applied to the secondary until 10:30. If you detect the misoperation within that 30-minute window, you can stop synchronization and redirect traffic to the secondary before the bad data arrives.
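
The timing in the example above can be worked through with a little arithmetic. The following sketch assumes GNU date and uses two hypothetical helper functions, secondary_view and applied_at. They only compute times and do not interact with MongoShake. The sample timestamp is chosen purely for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for reasoning about the delay; they only do arithmetic
# and do not talk to MongoShake. Assumes GNU date (for -d "@<epoch>").

# Print the point in time (UTC) that the delayed secondary reflects.
secondary_view() {
  local now_epoch=$1 delay_seconds=$2
  date -u -d "@$((now_epoch - delay_seconds))" '+%Y-%m-%d %H:%M:%S'
}

# Print the time (UTC) at which a change made at op_epoch reaches the secondary.
applied_at() {
  local op_epoch=$1 delay_seconds=$2
  date -u -d "@$((op_epoch + delay_seconds))" '+%Y-%m-%d %H:%M:%S'
}

delay=1800            # incr_sync.target_delay, in seconds
now=1593597600        # 2020-07-01 10:00:00 UTC, chosen for illustration
secondary_view "$now" "$delay"   # prints 2020-07-01 09:30:00
applied_at "$now" "$delay"       # prints 2020-07-01 10:30:00
```

The second value is the deadline that matters during an incident: synchronization must be stopped before that time for the secondary to stay clean.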

Configure delayed synchronization

The following steps use an Ubuntu ECS instance as the MongoShake host.

Log on to the ECS instance. See Connect to a Linux instance by using a username and password.

Download the MongoShake package:

```
wget <download-url>
```

For example:

```
wget https://github.com/alibaba/MongoShake/releases/download/release-v2.0.7-20190817/mongo-shake-2.0.7.tar.gz
```

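Find the latest download URL on the MongoShake releases page. The example URL only illustrates the format: version 2.0.7 predates the incr_sync.target_delay parameter, so download version 2.4.6 or later.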

Extract the package:

```
tar xvf <package-name>
```

For example:

```
tar xvf mongo-shake-2.0.7.tar.gz
```

Open collector.conf for editing:
```
vi collector.conf
```
Configure the source and destination connection strings and other required parameters. For the full parameter reference, see MongoShake parameters.
Set incr_sync.target_delay to the desired delay in seconds. The following example sets a 30-minute buffer:
```
incr_sync.target_delay = 1800
```
Save and close collector.conf.
Start MongoShake:
```
./collector.linux -conf=collector.conf -verbose
```
MongoShake now applies changes to the secondary instance 30 minutes after they occur on the primary.
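
If you prefer to script the edit rather than use vi, the delay can be set non-interactively. The following is a minimal sketch, assuming GNU sed and grep and a collector.conf that uses the `name = value` form shown above. The function set_target_delay is hypothetical and is not shipped with MongoShake.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with MongoShake): set or add the
# incr_sync.target_delay line in a collector.conf-style file.
# Assumes GNU sed (for -i and -E).
set_target_delay() {
  local conf=$1 seconds=$2
  # Reject anything that is not a non-negative whole number.
  case $seconds in
    ''|*[!0-9]*) echo "delay must be a whole number of seconds" >&2; return 1 ;;
  esac
  if grep -Eqs '^[[:space:]]*incr_sync\.target_delay[[:space:]]*=' "$conf"; then
    # Replace the existing setting in place.
    sed -i -E "s/^[[:space:]]*incr_sync\.target_delay[[:space:]]*=.*/incr_sync.target_delay = ${seconds}/" "$conf"
  else
    # The parameter is absent, so append it.
    printf 'incr_sync.target_delay = %s\n' "$seconds" >> "$conf"
  fi
}

# Example (commented out so the sketch has no side effects):
# set_target_delay collector.conf 1800 && grep target_delay collector.conf
```

Restart MongoShake after changing the file so that the new value is read at startup.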

Recover from a misoperation

When a misoperation is detected on the primary instance — for example, an accidental write or a dropped collection — follow these steps to stop synchronization before the bad data reaches the secondary, then redirect traffic to the clean secondary.

Query the primary instance's oplog to identify when the misoperation occurred. The following example retrieves oplog entries whose createTime field falls between June 1 and June 2, 2020:
```
use local  // Switch to the local database
// JavaScript months are zero-based, so 5 means June
db.oplog.rs.find({"o.createTime": {$gte: new Date(2020,5,1), $lte: new Date(2020,5,2)}})
```
For full oplog query syntax, see the MongoDB documentation.
Inject an ExitPoint to stop MongoShake at a specific point in time, before the misoperation is applied to the secondary:
```
curl -X POST --data '{"ExitPoint": <unix-timestamp>}' <mongoshake-host>:<port>/sentinel/options
```
For example:
```
curl -X POST --data '{"ExitPoint": 1593534600}' 127.0.0.1:9100/sentinel/options
```
The timestamp 1593534600 corresponds to 16:30:00 UTC on June 30, 2020. MongoShake exits automatically when it reaches that point in the oplog.
Open collector.conf and swap the addresses of the primary and secondary instances, so that the former secondary becomes the new synchronization source and the former primary becomes its target.

Restart MongoShake:

```
./collector.linux -conf=collector.conf -verbose
```

Redirect your application's connection string to the new primary instance to complete the switchover.
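
The ExitPoint value in the steps above is a Unix timestamp, which is easy to get wrong when typed by hand during an incident. The following is a minimal sketch, assuming GNU date, that derives the timestamp from a UTC date-time and builds the request body. The helper names to_epoch and exit_point_payload are hypothetical and are not part of MongoShake.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of MongoShake). Assumes GNU date.

# Convert a UTC date-time such as '2020-06-30 16:30:00' to a Unix timestamp.
to_epoch() {
  date -u -d "$1" +%s
}

# Build the JSON body for the /sentinel/options request used to set ExitPoint.
exit_point_payload() {
  printf '{"ExitPoint": %s}' "$1"
}

stop_at=$(to_epoch '2020-06-30 16:30:00')
exit_point_payload "$stop_at"; echo    # prints {"ExitPoint": 1593534600}

# To send it, reuse the curl command from the recovery steps, for example:
# curl -X POST --data "$(exit_point_payload "$stop_at")" 127.0.0.1:9100/sentinel/options
```

Choose a stop time that is earlier than the misoperation's time on the primary, so that MongoShake exits before reaching the bad entry in the oplog.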

Monitor the MongoShake status

For monitoring instructions, see Monitor the MongoShake status.