How to migrate data from Amazon DynamoDB to Alibaba Cloud using NimoShake - ApsaraDB for MongoDB

NimoShake (also known as DynamoShake) is a data synchronization tool developed by Alibaba Cloud that migrates Amazon DynamoDB databases to ApsaraDB for MongoDB. It supports full migration, incremental migration, or both in a single run.

Feature support

Feature	Supported
Full migration	Yes
Incremental migration	Yes
Resumable transmission (incremental)	Yes
Resumable transmission (full)	No
Index migration (full phase only)	Yes
Index migration (incremental phase)	No
Schema-only migration	Yes
Collection filtering	Yes

Prerequisites

Before you begin, make sure you have:

An ApsaraDB for MongoDB replica set instance or sharded cluster instance. See Create a replica set instance or Create a sharded cluster instance.
The AccessKey ID and AccessKey secret for Amazon DynamoDB.
Enough storage space in ApsaraDB for MongoDB to hold all data from the source DynamoDB database. The destination storage capacity must exceed the size of the source data.

How it works

NimoShake runs full migration and incremental migration as separate phases.

Full migration

Full migration consists of two parts: data migration followed by index migration.

Data migration uses three thread types in a pipeline:

Thread	Description
Fetcher	Calls Amazon's protocol conversion driver to batch-retrieve data from the source table and place it in queues. Only one fetcher thread runs at a time.
Parser	Reads data from queues and converts it to BSON format, then passes it to executors. Default: 2 threads. Controlled by `full.document.parser`.
Executor	Pulls data from queues, aggregates up to 16 MB or 1,024 entries, and writes to the destination. Default: 4 threads. Controlled by `full.document.concurrency`.
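The aggregation step performed by an executor can be illustrated with a minimal sketch. It assumes documents arrive as already-encoded byte strings and that a batch is flushed once it holds 1,024 entries or adding another document would push it past 16 MB. The function name `batch_documents` is hypothetical and is not part of NimoShake.

```python
from typing import Iterable, Iterator, List

MAX_BATCH_ENTRIES = 1024              # entry limit quoted in the table above
MAX_BATCH_BYTES = 16 * 1024 * 1024    # 16 MB limit quoted in the table above


def batch_documents(
    docs: Iterable[bytes],
    max_entries: int = MAX_BATCH_ENTRIES,
    max_bytes: int = MAX_BATCH_BYTES,
) -> Iterator[List[bytes]]:
    """Group encoded documents into batches bounded by count and total size."""
    batch: List[bytes] = []
    size = 0
    for doc in docs:
        # Flush first if the batch is full or this document would exceed the size limit.
        if batch and (len(batch) >= max_entries or size + len(doc) > max_bytes):
            yield batch
            batch, size = [], 0
        batch.append(doc)
        size += len(doc)
    if batch:
        yield batch  # flush the final partial batch


# 2,500 small documents split into two full batches and one partial batch.
sizes = [len(b) for b in batch_documents([b"x" * 10] * 2500)]
print(sizes)
```

In this sketch a single document larger than the size limit still forms its own batch. How NimoShake itself treats that case is not described here.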

Index migration runs after data migration completes and creates the following indexes:

Auto-generated indexes:
- If the source table has a partition key and a sort key: a unique compound index on both keys, plus a hashed index on the partition key.
- If the source table has only a partition key: a hashed index and a unique index on the partition key.
User-created indexes: A hashed index based on the primary key is created for each user-defined index.
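The auto-generated index rules above can be expressed as a small planning function. This is a sketch only: the tuple layout and the function name `auto_generated_indexes` are hypothetical, and ascending order (`1`) for the unique indexes is an assumption, since the text does not state a sort direction.

```python
from typing import List, Optional, Tuple, Union

# One planned index: (list of (field, index type) pairs, is_unique).
# 1 means ascending and "hashed" means a hashed index, as in MongoDB key specs.
KeySpec = List[Tuple[str, Union[int, str]]]
IndexPlan = Tuple[KeySpec, bool]


def auto_generated_indexes(partition_key: str, sort_key: Optional[str] = None) -> List[IndexPlan]:
    """Return the indexes the rules above would create for a source table."""
    if sort_key is not None:
        return [
            ([(partition_key, 1), (sort_key, 1)], True),   # unique compound index on both keys
            ([(partition_key, "hashed")], False),          # hashed index on the partition key
        ]
    return [
        ([(partition_key, "hashed")], False),              # hashed index on the partition key
        ([(partition_key, 1)], True),                      # unique index on the partition key
    ]


for keys, unique in auto_generated_indexes("user_id", "created_at"):
    print(keys, "unique" if unique else "non-unique")
```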

Incremental migration

Incremental migration captures ongoing changes from the source and writes them to ApsaraDB for MongoDB. It does not migrate indexes created during the incremental phase.

Thread	Description
Fetcher	Monitors shard changes in the stream.
Manager	Manages message notification and creates a Dispatcher for each shard.
Dispatcher	Retrieves incremental data from the source, resuming from the last checkpoint when resumable transmission is active.
Batcher	Parses, packages, and aggregates incremental data from Dispatcher threads.
Executor	Writes aggregated data to the destination ApsaraDB for MongoDB instance and updates the checkpoint.

Resumable transmission and checkpoints

Incremental migration supports resumable transmission through checkpoints. If a connection is lost and recovered quickly, migration resumes from the last checkpoint. A prolonged disconnection or loss of the checkpoint may trigger a full migration again.

By default, checkpoints are stored in the destination ApsaraDB for MongoDB instance, in a database named `nimo-shake-checkpoint`. Each collection has its own checkpoint table, and a `status_table` records whether the current sync is a full or incremental task.

Full migration does not support resumable transmission. If a full migration is interrupted, it restarts from the beginning.
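The idea behind checkpoint-based resumption can be sketched as follows. The example assumes each change record carries an increasing sequence number and that the checkpoint is advanced only after a successful write. The record shape, the in-memory `checkpoints` dictionary, and the function name are all hypothetical stand-ins, not NimoShake's actual data structures.

```python
from typing import Dict, List, Tuple

Record = Tuple[int, str]  # (sequence number, payload); hypothetical record shape


def replay_from_checkpoint(
    shard_id: str,
    records: List[Record],
    checkpoints: Dict[str, int],
    applied: List[str],
    fail_at: int = -1,
) -> None:
    """Apply records newer than the stored checkpoint, advancing it after each write."""
    last_seq = checkpoints.get(shard_id, -1)
    for seq, payload in records:
        if seq <= last_seq:
            continue  # already applied before the interruption
        if seq == fail_at:
            raise ConnectionError(f"simulated disconnect at sequence {seq}")
        applied.append(payload)       # stands in for the write to the destination
        checkpoints[shard_id] = seq   # record progress only after the write succeeds


records = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
checkpoints: Dict[str, int] = {}
applied: List[str] = []

try:
    replay_from_checkpoint("shard-0", records, checkpoints, applied, fail_at=3)
except ConnectionError:
    pass  # the first run stops after sequences 1 and 2 were applied

replay_from_checkpoint("shard-0", records, checkpoints, applied)  # resumed run
print(applied, checkpoints)
```

Because progress is recorded per shard, the resumed run skips everything at or below the stored sequence number and no record is applied twice.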

Migrate DynamoDB to ApsaraDB for MongoDB

The following steps use Ubuntu as an example.

Step 1: Download NimoShake

wget https://github.com/alibaba/NimoShake/releases/download/release-v1.0.14-20250704/nimo-shake-v1.0.14.tar.gz

The command above downloads v1.0.14. To get a newer version, see the NimoShake releases page.

Step 2: Extract the package

tar zxvf nimo-shake-v1.0.14.tar.gz

Step 3: Enter the directory

cd nimo-shake-v1.0.14

Step 4: Configure NimoShake

Open the configuration file:

vi nimo-shake.conf

The tables below describe all configuration parameters, grouped by category. Start with the required parameters, then adjust optional parameters as needed.

Required parameters

Parameter	Description	Example
`source.access_key_id`	The AccessKey ID for the Amazon DynamoDB database.	`source.access_key_id = AKIAIOSFODNN7EXAMPLE`
`source.secret_access_key`	The AccessKey secret for the Amazon DynamoDB database.	`source.secret_access_key = wJalrXUtnFEMI/K7MDENG`
`source.region`	The AWS region of the DynamoDB database. Optional if the region is auto-detected or not applicable.	`source.region = us-east-2`
`target.type`	The type of the destination database. Set to `mongodb` for ApsaraDB for MongoDB. Set to `aliyun_dynamo_proxy` for a DynamoDB-compatible ApsaraDB for MongoDB instance.	`target.type = mongodb`
`target.address`	The connection string of the destination database. See Connect to a replica set instance or Connect to a sharded cluster instance.	`target.address = mongodb://username:password@s-*****-pub.mongodb.rds.aliyuncs.com:3717`
`target.mongodb.type`	The type of the destination ApsaraDB for MongoDB instance. `replica` for a replica set instance. `sharding` for a sharded cluster instance.	`target.mongodb.type = sharding`
`sync_mode`	The migration mode. `all`: runs full migration followed by incremental migration. `full`: runs full migration only. Default: `all`. Note: Only `full` is supported when the source is a DynamoDB-compatible ApsaraDB for MongoDB instance.	`sync_mode = all`
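Before starting a migration, it can help to check that the required parameters are present and hold one of the documented values. The sketch below is a hypothetical pre-flight check, not a NimoShake feature. It assumes the configuration file uses `key = value` lines and that lines starting with `#` are comments. It treats `source.region` as optional, as the table describes.

```python
from typing import Dict, List

REQUIRED_KEYS = [
    "source.access_key_id",
    "source.secret_access_key",
    "target.type",
    "target.address",
    "target.mongodb.type",
    "sync_mode",
]

# Allowed values taken from the parameter descriptions above.
ALLOWED_VALUES = {
    "target.type": {"mongodb", "aliyun_dynamo_proxy"},
    "target.mongodb.type": {"replica", "sharding"},
    "sync_mode": {"all", "full"},
}


def parse_conf(text: str) -> Dict[str, str]:
    """Parse `key = value` lines, skipping blanks and `#` comments (assumed syntax)."""
    settings: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")  # split on the first "=" only
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def check_settings(settings: Dict[str, str]) -> List[str]:
    """Return a list of human-readable problems; an empty list means no problem found."""
    problems = [f"missing: {key}" for key in REQUIRED_KEYS if not settings.get(key)]
    for key, allowed in ALLOWED_VALUES.items():
        value = settings.get(key)
        if value and value not in allowed:
            problems.append(f"invalid value for {key}: {value}")
    return problems


sample = """
source.access_key_id = AKIAIOSFODNN7EXAMPLE
source.secret_access_key = wJalrXUtnFEMI/K7MDENG
target.type = mongodb
target.address = mongodb://username:password@s-*****-pub.mongodb.rds.aliyuncs.com:3717
target.mongodb.type = sharding
sync_mode = all
"""
print(check_settings(parse_conf(sample)))
```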

General settings

Parameter	Default	Required	Description	Example
`id`	—	Optional	The ID of the migration task. Used for PID files, log names, the checkpoint database name, and the destination database name.	`id = nimo-shake`
`log.file`	stdout	Optional	The path of the log file. If not set, logs are printed to stdout.	`log.file = nimo-shake.log`
`log.level`	`info`	Optional	The log level. Valid values: `none`, `error`, `warn`, `info`, `debug`.	`log.level = info`
`log.buffer`	`true`	Optional	Whether to enable log buffering. `true`: high performance, but may lose the last few log entries on exit. `false`: all log entries are flushed, but performance may decrease.	`log.buffer = true`
`system_profile`	—	Optional	The pprof port, used for debugging and for viewing goroutine stack information.	`system_profile = 9330`
`full_sync.http_port`	—	Optional	The RESTful port for the full migration phase. Use `curl` to view monitoring statistics. See the wiki.	`full_sync.http_port = 9341`
`incr_sync.http_port`	—	Optional	The RESTful port for the incremental migration phase. Use `curl` to view monitoring statistics. See the wiki.	`incr_sync.http_port = 9340`

Source connection settings

Parameter	Default	Required	Description	Example
`source.session_token`	—	Optional	The temporary session token for accessing DynamoDB. Required only when using temporary credentials.	`source.session_token = AQoXnyc4lcK4w4...`
`source.endpoint_url`	—	Optional	The endpoint URL, if the source is an endpoint type. Setting this parameter overrides all other `source.*` parameters.	`source.endpoint_url = "http://192.168.0.1:1010"`
`source.session.max_retries`	—	Optional	The maximum number of retries after a session failure.	`source.session.max_retries = 3`
`source.session.timeout`	—	Optional	The session timeout in milliseconds. Set to `0` to disable the timeout.	`source.session.timeout = 3000`

Collection filtering

Parameter	Default	Required	Description	Example
`filter.collection.white`	—	Optional	Whitelist of collections to migrate. Only the listed collections are migrated.	`filter.collection.white = c1;c2`
`filter.collection.black`	—	Optional	Blacklist of collections to exclude. All other collections are migrated. Cannot be used together with `filter.collection.white`. If both are set, all collections are migrated.	`filter.collection.black = c1;c2`
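The filtering behavior described in the table, including the case where both lists are set, can be summarized in a short sketch. The function names are hypothetical. The only assumptions are the semicolon-separated format shown in the examples and the rule that setting both lists results in all collections being migrated.

```python
from typing import Iterable, List


def parse_list(value: str) -> List[str]:
    """Split a semicolon-separated value such as 'c1;c2' into names."""
    return [name.strip() for name in value.split(";") if name.strip()]


def collections_to_migrate(all_collections: Iterable[str], white: str = "", black: str = "") -> List[str]:
    """Apply the whitelist and blacklist rules from the table above."""
    names = list(all_collections)
    allow, deny = parse_list(white), parse_list(black)
    if allow and deny:
        return names  # both set: per the table, all collections are migrated
    if allow:
        return [c for c in names if c in allow]
    if deny:
        return [c for c in names if c not in deny]
    return names


source_tables = ["c1", "c2", "c3"]
print(collections_to_migrate(source_tables, white="c1;c2"))
print(collections_to_migrate(source_tables, black="c1;c2"))
```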

Destination settings

Parameter	Default	Required	Description	Example
`target.db.exist`	Error	Optional	How to handle existing collections with the same name at the destination. `rename`: renames the existing collection by appending a timestamp suffix (for example, `c1` becomes `c1.2019-07-01Z12:10:11`). `drop`: deletes the existing collection. If not set, the migration stops with an error if a same-name collection exists.	`target.db.exist = drop`
`sync_schema_only`	`false`	Optional	Whether to migrate only the table schema without data.	`sync_schema_only = false`
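The three outcomes of `target.db.exist` can be sketched as a decision function. This is an illustration only: the function name and return shape are hypothetical, and the timestamp format is inferred from the single example in the table (`c1.2019-07-01Z12:10:11`).

```python
from datetime import datetime
from typing import Optional, Tuple


def resolve_existing_collection(name: str, policy: Optional[str], now: datetime) -> Tuple[str, Optional[str]]:
    """Decide what to do with a same-name collection at the destination.

    Returns (action, new_name); new_name is set only for the rename action.
    """
    if policy == "rename":
        suffix = now.strftime("%Y-%m-%dZ%H:%M:%S")  # format inferred from the table's example
        return ("rename", f"{name}.{suffix}")
    if policy == "drop":
        return ("drop", None)
    if policy is None:
        # Parameter not set: the migration stops with an error.
        raise RuntimeError(f"collection {name!r} already exists at the destination")
    raise ValueError(f"unsupported target.db.exist value: {policy!r}")


print(resolve_existing_collection("c1", "rename", datetime(2019, 7, 1, 12, 10, 11)))
```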

Full migration performance

Parameter	Default	Required	Description	Example
`full.concurrency`	`4`	Optional	The maximum number of collections migrated concurrently.	`full.concurrency = 4`
`full.read.concurrency`	`1`	Optional	The number of concurrent threads reading from a single source table. Corresponds to the `TotalSegments` parameter of the DynamoDB Scan API.	`full.read.concurrency = 1`
`full.document.concurrency`	`4`	Optional	The number of concurrent writer threads per table.	`full.document.concurrency = 4`
`full.document.write.batch`	—	Optional	The number of entries aggregated per write. When the destination is a DynamoDB-compatible database, the maximum value is 25.	`full.document.write.batch = 25`
`full.document.parser`	`2`	Optional	The number of concurrent parser threads for converting DynamoDB data to the destination protocol.	`full.document.parser = 2`
`full.enable_index.user`	`true`	Optional	Whether to migrate user-defined indexes.	`full.enable_index.user = true`
`full.executor.insert_on_dup_update`	`true`	Optional	Whether to convert an `INSERT` to `UPDATE` when a duplicate key exists at the destination.	`full.executor.insert_on_dup_update = true`
`qps.full`	`1000`	Optional	The maximum number of `Scan` command calls per second during full migration.	`qps.full = 1000`
`qps.full.batch_num`	`128`	Optional	The number of data entries pulled per second during full migration.	`qps.full.batch_num = 128`
`full.read.filter_expression`	—	Optional	A DynamoDB filter expression for full migration. Variables start with a colon, for example `:begin` and `:end`. Specify the variable values in `full.read.filter_attributevalues`.	`full.read.filter_expression = create_time > :begin AND create_time < :end`
`full.read.filter_attributevalues`	—	Optional	The values for variables in `full.read.filter_expression`. `N` represents Number, `S` represents String.	`full.read.filter_attributevalues = begin```N```1646724207280~~~end```N```1646724207283`
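To see how `full.read.concurrency` and the two filter parameters relate to the DynamoDB Scan API, the sketch below builds the request parameters for each scan segment. `TableName`, `Segment`, `TotalSegments`, `FilterExpression`, and `ExpressionAttributeValues` are real Scan parameters. The parsing of the attribute-value string is an assumption based on the example value, in which three backticks separate name, type, and value, and `~~~` separates variables. How NimoShake builds its requests internally is not shown here, and no request is actually sent.

```python
from typing import Dict, List

FIELD_SEP = "`" * 3   # three backticks between name, type, and value (from the example)
PAIR_SEP = "~~~"      # separator between variables (from the example)


def parse_attribute_values(raw: str) -> Dict[str, Dict[str, str]]:
    """Convert the configured string into DynamoDB's ExpressionAttributeValues shape."""
    values: Dict[str, Dict[str, str]] = {}
    for pair in raw.split(PAIR_SEP):
        name, type_code, value = pair.split(FIELD_SEP)
        values[":" + name] = {type_code: value}  # for example {":begin": {"N": "1646724207280"}}
    return values


def scan_requests(table: str, total_segments: int, filter_expression: str, raw_values: str) -> List[dict]:
    """Build one Scan request per segment; segments are numbered from 0."""
    values = parse_attribute_values(raw_values)
    return [
        {
            "TableName": table,
            "Segment": segment,
            "TotalSegments": total_segments,
            "FilterExpression": filter_expression,
            "ExpressionAttributeValues": values,
        }
        for segment in range(total_segments)
    ]


raw = FIELD_SEP.join(["begin", "N", "1646724207280"]) + PAIR_SEP + FIELD_SEP.join(["end", "N", "1646724207283"])
for request in scan_requests("orders", 2, "create_time > :begin AND create_time < :end", raw):
    print(request["Segment"], request["ExpressionAttributeValues"])
```

The table name `orders` is a placeholder. With `full.read.concurrency = 1`, the list would hold a single request covering the whole table.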

Incremental migration performance

Skip this section if you are running full migration only (`sync_mode = full`).

Parameter	Default	Required	Description	Example
`incr_sync_parallel`	`false`	Optional	Whether to enable parallel incremental migration. `true`: uses more memory. `false`: standard mode.	`incr_sync_parallel = false`
`increase.concurrency`	`16`	Optional	The maximum number of shards captured concurrently.	`increase.concurrency = 16`
`increase.executor.insert_on_dup_update`	`true`	Optional	Whether to convert an `INSERT` to `UPDATE` when the same keys exist at the destination.	`increase.executor.insert_on_dup_update = true`
`increase.executor.upsert`	`true`	Optional	Whether to convert an `UPDATE` to `UPSERT` when no matching keys exist at the destination. An `UPSERT` updates the record if the key exists, or inserts it if it does not.	`increase.executor.upsert = true`
`qps.incr`	`1000`	Optional	The maximum number of `GetRecords` command calls per second during incremental migration.	`qps.incr = 1000`
`qps.incr.batch_num`	`128`	Optional	The number of data entries pulled per second during incremental migration.	`qps.incr.batch_num = 128`
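The effect of `increase.executor.insert_on_dup_update` and `increase.executor.upsert` can be illustrated with a dictionary standing in for the destination collection. This is a simplified sketch, not NimoShake's executor. The function name and returned action labels are hypothetical, an update is modeled as replacing the whole document, and an update that matches nothing while `upsert` is off is modeled as a no-op.

```python
from typing import Dict

Store = Dict[str, dict]  # key -> document; stands in for the destination collection


def apply_event(
    store: Store,
    op: str,
    key: str,
    doc: dict,
    insert_on_dup_update: bool = True,
    upsert: bool = True,
) -> str:
    """Apply one change event and return a label for the action actually taken."""
    if op == "INSERT":
        if key in store:
            if not insert_on_dup_update:
                raise KeyError(f"duplicate key: {key}")
            store[key] = dict(doc)   # INSERT converted to UPDATE
            return "update"
        store[key] = dict(doc)
        return "insert"
    if op == "UPDATE":
        if key not in store:
            if not upsert:
                return "no-op"       # nothing matched and upsert is disabled
            store[key] = dict(doc)   # UPDATE converted to UPSERT, which inserts
            return "upsert"
        store[key] = dict(doc)
        return "update"
    raise ValueError(f"unsupported operation: {op}")


store: Store = {}
print(apply_event(store, "UPDATE", "k1", {"v": 1}))   # key missing, upsert enabled
print(apply_event(store, "INSERT", "k1", {"v": 2}))   # key present, converted to update
print(store)
```

With both options enabled, replaying the same events after a restart leaves the destination in the same state, which fits the resume-from-checkpoint behavior of the incremental phase.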

Checkpoint settings