Performance Comparison Between Alibaba Cloud DTS and MongoShake

The article focuses on the performance comparison of DTS MongoDB-to-MongoDB and MongoShake data synchronization.

Introduction

MongoShake is a general-purpose Platform as a Service (PaaS) tool for MongoDB that acts as a bridge between the nodes of a closed-loop deployment. Written in Go, it replicates MongoDB cluster data by reading oplogs to fulfill specific requirements.

Compared with MongoShake, DTS MongoDB-to-MongoDB synchronization has the following advantages:

• You can monitor the synchronization process in the Alibaba Cloud DTS console. This includes tracking the status of the synchronization task, such as whether it is in the process of structure migration, full migration, or incremental migration, and viewing synchronized RPS and BPS, as well as identifying any failures or latency in the synchronization task.

• As Alibaba Cloud DTS is a third-party migration tool, the source MongoDB and destination MongoDB can belong to different VPC networks.

• It supports ETL, object name mapping, and data filtering, enabling synchronization of specific table data based on defined conditions and the ability to synchronize only certain DML or DDL operations.

• You can pause and restart a synchronization task at any time, and the task resumes from where it stopped (resumable transmission).

• During full migration, it reads a single table concurrently in slices, even when the table contains _id values of multiple types.

• For source instances within a MongoDB sharded cluster, DTS reads data from shards and performs full and incremental data synchronization using oplogs. During incremental migration, there is no need to disable the balancer for the source instance.

This article focuses on the performance of DTS MongoDB-to-MongoDB and MongoShake data synchronization, comparing replica set/sharded cluster architecture, single table/multiple tables, and full/incremental synchronization.

1. Replica Set

1.1 Instance Configuration

Source: MongoDB 4.4, 8 cores, 16 GB memory, 300 GB storage.

Destination: MongoDB 4.4, 8 cores, 16 GB memory, 300 GB storage.

MongoShake 2.6.6 is deployed on Alibaba Cloud ECS CentOS 7.

Instance type: ecs.u1-c1m2.2xlarge (8 cores, 16 GB memory). It connects to the source or destination MongoDB instance by using a private connection in a virtual private cloud (VPC).

DTS connects to the source/destination MongoDB instance through the reverse VPC.

1.2 Single-table Full Migration

1.2.1 Source Data Preparation

Use the open source tool ycsb to generate test data for the source instance.

For the usage of ycsb, see https://github.com/brianfrankcooper/YCSB/

In this test, the source instance holds a single table that contains 10 million documents. Each document contains 10 fields, and each field is 100 bytes.

Use the ycsb workloada template to generate data. The following are some parameters:

recordcount=10000000
operationcount=10000000
workload=site.ycsb.workloads.CoreWorkload
readallfields=true
fieldcount=10
fieldlength=100
readproportion=0
updateproportion=0.3
scanproportion=0
insertproportion=0.7
requestdistribution=zipfian
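
As a rough sanity check on these parameters, the sketch below estimates the raw payload they produce. It counts only the field values; BSON overhead, field names, the _id key, and indexes are ignored, so the on-disk size will differ.

```python
# Back-of-the-envelope size of the YCSB data set described above.
record_count = 10_000_000   # recordcount
field_count = 10            # fieldcount
field_length = 100          # fieldlength, in bytes


def payload_bytes(records: int, fields: int, length: int) -> int:
    """Raw field payload only; ignores BSON overhead, field names, _id, and indexes."""
    return records * fields * length


total = payload_bytes(record_count, field_count, field_length)
print(f"per document: {field_count * field_length} bytes")
print(f"total payload: {total / 1e9:.1f} GB")
```

Under these assumptions the payload is well below the 300 GB of storage configured on each instance.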

The following is a sample command to run ycsb:

cd <directory where ycsb is stored>
./bin/ycsb load mongodb -s -P workloads/workloada -p mongodb.url=mongodb://user:password@server1.example.com:9999,server2.example.com:9999/dbname -p table=tablename -threads 64

1.2.2 MongoShake Single-table Full Migration Test

For the usage of MongoShake, see https://www.alibabacloud.com/blog/mongoshake-–-a-mongodb-based-cross-data-center-data-replication-platform_593813

1) MongoShake Parameter Settings

Modify the MongoShake collector.config. The following are some parameters:

# --------------------------- full sync configuration ---------------------------
# the number of collection concurrence
# The maximum number of tables that can be pulled concurrently. For example, 6 indicates that a maximum of six tables can be pulled by shake at the same time. 
full_sync.reader.collection_parallel = 6
# the number of document writer thread in each collection.
# The number of concurrent threads to write data to the same table. For example, 8 indicates that eight write threads concurrently write data to a table. 
full_sync.reader.write_document_parallel = 40
# number of documents in a batch insert in a document concurrence
# The size of the batch written to the destination instance. For example, 128 indicates that a thread aggregates 128 documents at a time and then writes them to the destination. 
full_sync.reader.document_batch_size = 1024
# max number of fetching thread per table. default is 1
# The maximum number of threads for pulling a table. By default, it is single-thread pulling. The splitVector permission is required. 
# Note: Only single tables whose values corresponding to indexes are of the same type are supported. If the table has different types of values, do not enable this configuration. 
full_sync.reader.parallel_thread = 3
# the parallel query index if set full_sync.reader.parallel_thread. index should only have
# 1 field.
# If full_sync.reader.parallel_thread is configured, set this parameter to the index that is scanned when data is pulled in parallel.
# The index values must be of the same type. For replica sets, _id is recommended. For cluster editions, shard_key is recommended. A key can have only one field. 
full_sync.reader.parallel_index = _id
# enable majority write in full sync.
# the performance will degrade if enable.
# Whether majority write is enabled on the write side in the full phase.
full_sync.executor.majority_enable = true
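
To make the interaction of these settings concrete, the following sketch parses the values above and derives the thread ceilings they imply for the single-table and five-table tests. The parser and the `thread_ceiling` helper are hypothetical illustrations based only on the comments in the configuration; they are not part of MongoShake.

```python
def parse_conf(text: str) -> dict[str, str]:
    """Parse 'key = value' lines, skipping blank lines and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


SAMPLE = """
# full sync configuration
full_sync.reader.collection_parallel = 6
full_sync.reader.write_document_parallel = 40
full_sync.reader.document_batch_size = 1024
full_sync.reader.parallel_thread = 3
full_sync.reader.parallel_index = _id
full_sync.executor.majority_enable = true
"""

conf = parse_conf(SAMPLE)
tables = int(conf["full_sync.reader.collection_parallel"])
writers = int(conf["full_sync.reader.write_document_parallel"])
readers = int(conf["full_sync.reader.parallel_thread"])


def thread_ceiling(table_count: int) -> tuple[int, int]:
    """Upper bound on (reader, writer) threads for a given number of source tables."""
    active = min(table_count, tables)
    return active * readers, active * writers


print(thread_ceiling(1))  # single-table test
print(thread_ceiling(5))  # five-table test
```

With these values, a single table is limited to 3 reader threads and 40 writer threads, while five tables pulled concurrently can use up to five times as many.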

2) MongoShake Test Result

During the migration, the CPU, memory, and internal bandwidth of the ECS instance where MongoShake is located encounter no bottlenecks.

According to the monitoring information of the destination MongoDB instance, the QPS of the preceding MongoShake single-table full migration is about 8,000.

1.2.3 DTS Single-table Full Migration Test

At present, DTS MongoDB-to-MongoDB full migration is available as a free trial. A DTS link of the 2xlarge specification is used for the incremental migration test.

By default, the memory size is 8 GB, and 40 threads concurrently write data to the destination instance.

According to the monitoring information of the destination MongoDB instance, the QPS of the preceding DTS single-table full migration is about 11,000.

1.3 Multi-table Full Migration

1.3.1 Source Data Preparation

Use the open source tool ycsb to generate test data for the source instance.

In this test, the source instance holds 5 tables. Each table contains 5 million documents, each document contains 10 fields, and each field is 100 bytes.

Use the ycsb workloada template to generate data. The following are some parameters:

recordcount=5000000
operationcount=5000000
workload=site.ycsb.workloads.CoreWorkload
readallfields=true
fieldcount=10
fieldlength=100
readproportion=0
updateproportion=0.3
scanproportion=0
insertproportion=0.7
requestdistribution=zipfian
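
One way to load several tables is to repeat the load command once per table. The sketch below only prints the commands; it reuses the placeholder URL and flags from the single-table sample command, and the table names are invented for illustration.

```python
import shlex

# Hypothetical values: the URL is the placeholder from the sample command,
# and the table names are invented for illustration.
MONGO_URL = "mongodb://user:password@server1.example.com:9999,server2.example.com:9999/dbname"
TABLES = [f"tablename{i}" for i in range(1, 6)]


def load_command(table: str, threads: int = 64) -> list[str]:
    """Build the argument list for loading one table with ycsb."""
    return [
        "./bin/ycsb", "load", "mongodb", "-s",
        "-P", "workloads/workloada",
        "-p", f"mongodb.url={MONGO_URL}",
        "-p", f"table={table}",
        "-p", "recordcount=5000000",
        "-threads", str(threads),
    ]


for table in TABLES:
    # shlex.join quotes each argument so the line can be pasted into a shell.
    print(shlex.join(load_command(table)))
```

Building the command as a list and quoting it with `shlex.join` avoids shell-quoting mistakes when the connection string contains special characters.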

1.3.2 MongoShake Multi-table Full Migration Test

During the migration, the CPU, memory, and internal bandwidth of the ECS instance where MongoShake is located encounter no bottlenecks.

According to the monitoring information of the destination MongoDB instance, the QPS of the preceding MongoShake multi-table full migration is about 18,000.

1.3.3 DTS Multi-table Full Migration Test

At present, DTS MongoDB-to-MongoDB full migration is available as a free trial. A DTS link of the 2xlarge specification is used for the incremental migration test.

By default, the memory size is 8 GB, and 40 threads concurrently write data to the destination instance. Considering the stress of reading data on the source instance, DTS limits the number of threads that pull data from the source instance.

According to the monitoring information of the destination MongoDB instance, the QPS of the preceding DTS multi-table full migration is about 15,000 with the default settings. After the number of DTS read threads is raised to match the MongoShake setting of three read threads per table, the QPS of the multi-table full migration is about 20,000.

1.4 Incremental Data Migration

1.4.1 Source Data Preparation

Use the open source tool ycsb to generate test data for the source instance.

In this test, the source instance holds a single table that contains 5 million documents. Each document contains 10 fields, and each field is 100 bytes.

Use the ycsb workloada template to generate data. The parameters are the same as those in the previous section.

1.4.2 MongoShake Incremental Data Migration Test

1) MongoShake Parameter Settings

Modify the MongoShake collector.config. The following are some parameters:

# --------------------------- incremental sync configuration ---------------------------
# fetch method:
# oplog: fetch oplog from source mongodb (default)
# change_stream: use change stream to receive change event from source mongodb, support MongoDB >= 4.0.
# we recommend to use change_stream if possible.
incr_sync.mongo_fetch_method = oplog
# The synchronization mode. All indicates full and incremental synchronization, full indicates full synchronization, and incr indicates incremental synchronization. 
sync_mode = incr
# The hash method. id indicates hash by document, collection indicates hash by table, and auto indicates the hash type is automatically selected. 
# If there is no index, id is recommended to achieve high synchronization performance. Otherwise, select collection. 
incr_sync.shard_key = auto
# The number of workers sent internally (written to the destination DB). If the machine has sufficient performance, the number of workers can be increased. 
incr_sync.worker = 40
# batch_queue_size: The queue length of each worker thread. The worker threads pull tasks from this queue.
# batching_max_size: The number of documents contained in a task distributed to the worker at a time.
# buffer_capacity: The minimum number of documents contained in a buffer in the PendingQueue queue for serialization.
incr_sync.worker.batch_queue_size = 64
incr_sync.adaptive.batching_max_size = 1024
incr_sync.fetcher.buffer_capacity = 256
# Whether majority write is enabled on the write side in the incremental phase.
incr_sync.executor.majority_enable = true
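
The difference between hashing by document (id) and hashing by table (collection) can be illustrated with a small simulation. This is a sketch only: the CRC32 hash, the `route` helper, and the synthetic oplog entries are assumptions made for illustration and do not reflect the MongoShake implementation.

```python
import zlib
from collections import Counter

WORKERS = 40  # mirrors incr_sync.worker


def worker_for(key: str, workers: int = WORKERS) -> int:
    """Map a routing key to a worker index with a stable hash (CRC32, chosen for illustration)."""
    return zlib.crc32(key.encode("utf-8")) % workers


def route(oplog: dict, mode: str) -> int:
    """Mode 'id' hashes the document _id; mode 'collection' hashes the namespace."""
    key = str(oplog["_id"]) if mode == "id" else oplog["ns"]
    return worker_for(key)


# Synthetic oplog entries that all belong to one table.
oplogs = [{"ns": "dbname.tablename", "_id": f"user{i}"} for i in range(10_000)]

by_id = Counter(route(op, "id") for op in oplogs)
by_collection = Counter(route(op, "collection") for op in oplogs)

print("workers used when hashing by id:", len(by_id))
print("workers used when hashing by collection:", len(by_collection))
```

In this model, hashing by id spreads the changes of a single table across many workers while keeping all changes to one document on the same worker. Hashing by collection sends every change for a table to one worker, which preserves per-table order but leaves the other workers idle in a single-table test.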

2) MongoShake Test Result

During the migration, the CPU, memory, and internal bandwidth of the ECS instance where MongoShake is located encounter no bottlenecks.

According to the monitoring information of the destination MongoDB instance, the QPS of the preceding MongoShake incremental migration is about 44,000.

1.4.3 DTS Incremental Migration Test

A DTS link of the 2xlarge specification is used for the incremental migration test, and 40 threads concurrently write data to the destination instance.

According to the monitoring information of the destination MongoDB instance, the QPS of the preceding DTS incremental migration is about 49,000.

1.5 Summary

Migration type | MongoShake | DTS
Single-table full migration | 8,000 QPS | 11,000 QPS
Multi-table full migration | 18,000 QPS | 20,000 QPS (read threads tuned; about 15,000 QPS by default)
Incremental migration | 44,000 QPS | 49,000 QPS
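
The relative difference between the two tools can be derived from these figures. The short sketch below uses only the QPS values reported in this section; the `gain_percent` helper is introduced here for illustration.

```python
# QPS figures copied from the summary above: (MongoShake, DTS)
RESULTS = {
    "Single-table full migration": (8_000, 11_000),
    "Multi-table full migration": (18_000, 20_000),
    "Incremental migration": (44_000, 49_000),
}


def gain_percent(baseline: int, candidate: int) -> float:
    """Relative difference of candidate over baseline, in percent."""
    return (candidate - baseline) / baseline * 100


for name, (mongoshake, dts) in RESULTS.items():
    print(f"{name}: DTS is {gain_percent(mongoshake, dts):.1f}% higher")
```

The multi-table figure for DTS is the one measured after the read threads were tuned. With the default read threads, DTS measured about 15,000 QPS, which is lower than the 18,000 QPS measured for MongoShake.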

2. Sharded Cluster

2.1 Architecture Description

If the source instance is a MongoDB sharded cluster, MongoShake supports two incremental synchronization solutions:

1) Directly read the oplogs of each shard and replay them. However, the balancer for the source instance must be disabled.

2) Use Mongos to obtain the change stream. The balancer for the source instance may be enabled, but the performance is lower than the oplog solution.

DTS directly pulls oplogs for incremental synchronization and uses distributed subtasks to coordinate synchronization links corresponding to each shard. This solves the problem caused by chunk moves between source instance shards during synchronization, allowing the balancer for the source instance to be enabled during incremental migration.

Because reading the oplogs directly from each shard is similar to the case where the source is a replica set, this section mainly compares the performance of MongoShake incremental synchronization (using change streams) with the performance of DTS incremental synchronization when the balancer for the source instance is enabled.

2.2 Instance Configuration

Source: MongoDB 4.2; two Mongos (8 cores, 16 GB memory); three shards (8 cores, 16 GB memory, 200 GB storage).

Destination: MongoDB 4.4; two Mongos (8 cores, 16 GB memory); three shards (8 cores, 16 GB memory, 200 GB storage).

MongoShake 2.6.6 is deployed on Alibaba Cloud ECS CentOS 7. Instance type: ecs.u1-c1m2.2xlarge (8 cores, 16 GB memory). It connects to the source or destination MongoDB instance by using a private connection in a virtual private cloud (VPC).

DTS connects to the source/destination MongoDB instance through the reverse VPC.

2.3 Incremental Migration

2.3.1 Source Data Preparation

The shard key {"_id":"hashed"} for the source and destination MongoDB instances is pre-configured.

Use the open source tool ycsb to generate test data for the source instance.

In this test, the source instance holds a single table that contains 10 million documents. Each document contains 10 fields, and each field is 100 bytes.

Use the ycsb workloada template to generate data. The following are some parameters:

recordcount=10000000
operationcount=10000000
workload=site.ycsb.workloads.CoreWorkload
readallfields=true
fieldcount=10
fieldlength=100
readproportion=0
updateproportion=0.3
scanproportion=0
insertproportion=0.7
requestdistribution=zipfian

2.3.2 MongoShake Change Streams for Incremental Migration Test

1) MongoShake Parameter Settings

Modify the MongoShake collector.config. The following are some parameters:

# --------------------------- incremental sync configuration ---------------------------
# fetch method:
# oplog: fetch oplog from source mongodb (default)
# change_stream: use change stream to receive change event from source mongodb, support MongoDB >= 4.0.
# we recommend to use change_stream if possible.
incr_sync.mongo_fetch_method = change_stream
# The synchronization mode. All indicates full and incremental synchronization, full indicates full synchronization, and incr indicates incremental synchronization. 
sync_mode = incr
# The hash method. id indicates hash by document, collection indicates hash by table, and auto indicates the hash type is automatically selected. 
# If there is no index, id is recommended to achieve high synchronization performance. Otherwise, select collection. 
incr_sync.shard_key = auto
# The number of workers sent internally (written to the destination DB). If the machine has sufficient performance, the number of workers can be increased. 
incr_sync.worker = 40
# batch_queue_size: The queue length of each worker thread. The worker threads pull tasks from this queue.
# batching_max_size: The number of documents contained in a task distributed to the worker at a time.
# buffer_capacity: The minimum number of documents contained in a buffer in the PendingQueue queue for serialization.
incr_sync.worker.batch_queue_size = 64
incr_sync.adaptive.batching_max_size = 1024
incr_sync.fetcher.buffer_capacity = 256
# Whether majority write is enabled on the write side in the incremental phase.
incr_sync.executor.majority_enable = true

2) MongoShake Test Result

When the balancer for the MongoDB sharded cluster is enabled, MongoShake only supports one incremental synchronization solution, that is, to use Mongos to obtain the change stream.

During the migration, the CPU, memory, and internal bandwidth of the ECS instance where MongoShake is located encounter no bottlenecks.

According to the monitoring information of the destination MongoDB instance, during the preceding MongoShake incremental migration, the QPS of Mongos1 is about 28,000, while no write traffic is found on Mongos2. The total migration speed is therefore about 28,000 QPS.

Mongos1: The average QPS is 28,000.

Mongos2: No write traffic for incremental migration is found.

2.3.3 DTS Incremental Migration Test

DTS directly pulls oplogs for incremental synchronization and creates distributed subtasks to coordinate synchronization links corresponding to each shard. This allows the balancer for the MongoDB sharded cluster to be enabled during incremental migration.

A DTS link of the 2xlarge specification is used for the incremental migration test, and 40 threads concurrently write data to the destination instance. According to the monitoring information of the destination MongoDB instance, during the preceding incremental migration, the QPS of Mongos1 and Mongos2 is about 23,000 each, and the total migration speed is about 46,000 QPS.

Mongos1: The average QPS is 23,000.

Mongos2: The average QPS is 23,000.

2.4 Summary

Migration type | MongoShake (change stream) | DTS
Incremental migration | 28,000 QPS | 46,000 QPS
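
The totals follow from the per-Mongos averages reported in sections 2.3.2 and 2.3.3. The sketch below recomputes them and shows how the write traffic is distributed across the two destination Mongos nodes; the `summarize` helper is introduced here for illustration.

```python
# Average destination QPS per Mongos, as reported in sections 2.3.2 and 2.3.3.
PER_MONGOS = {
    "MongoShake (change stream)": {"Mongos1": 28_000, "Mongos2": 0},
    "DTS": {"Mongos1": 23_000, "Mongos2": 23_000},
}


def summarize(per_router: dict[str, int]) -> tuple[int, dict[str, float]]:
    """Return the total QPS and each router's share of it."""
    total = sum(per_router.values())
    shares = {name: qps / total for name, qps in per_router.items()}
    return total, shares


totals = {}
for tool, per_router in PER_MONGOS.items():
    total, shares = summarize(per_router)
    totals[tool] = total
    share_text = ", ".join(f"{name} {share:.0%}" for name, share in shares.items())
    print(f"{tool}: total {total} QPS ({share_text})")

ratio = totals["DTS"] / totals["MongoShake (change stream)"]
print(f"DTS / MongoShake: {ratio:.2f}x")
```

In the measured runs, the MongoShake write traffic went through a single Mongos, whereas the DTS traffic was split evenly across both.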

3. Overall Summary

The article focuses on the performance comparison of DTS MongoDB-to-MongoDB and MongoShake data synchronization. It compares the performance in terms of replica set/sharded cluster architecture, single table/multiple tables, and full/incremental synchronization. Overall, DTS demonstrates better synchronization performance than MongoShake. The specific results are as follows:

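Architecture | Migration type | MongoShake | DTS
Replica set | Single-table full migration | 8,000 QPS | 11,000 QPS
Replica set | Multi-table full migration | 18,000 QPS | 20,000 QPS
Replica set | Incremental migration | 44,000 QPS | 49,000 QPS
Sharded cluster | Incremental migration | 28,000 QPS (change stream) | 46,000 QPS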

The Data Transmission Service (DTS) supports various data sources such as relational databases, NoSQL, and big data (OLAP), offering integrated data migration, subscription, real-time synchronization, and data verification functions. It addresses the challenges of long-distance, near-real-time asynchronous data transmission in public cloud and hybrid cloud scenarios.

Learn more: https://www.alibabacloud.com/product/data-transmission-service

ApsaraDB