ApsaraDB for MongoDB: Why does the data size change after data is migrated by using DTS?

Last Updated: Nov 10, 2023

After data is migrated from a source instance to a destination instance by using Data Transmission Service (DTS), the data size on the destination instance may differ from the data size on the source instance. This topic describes how to identify the cause of the difference.

Troubleshooting logic and methods

If the data sizes of the source and destination instances are different after you migrate data, you can check the following items in sequence:

  • Whether the number of documents in the table is consistent.

  • Whether the number and size of indexes are consistent.

  • Whether the reusable space size of the WiredTiger engine is consistent.

We recommend that you use mongo shell to separately connect to the source and destination instances and run the following commands to confirm the cause of the data size difference:

  • Number of documents

    db.<collection>.stats().count
  • Indexes

    db.<collection>.stats().indexSizes
  • Reusable space

    db.<collection>.stats().wiredTiger["block-manager"]["file bytes available for reuse"]
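The three checks above can be collected in one pass. The following is a minimal sketch, not an official tool: summarizeStats is a hypothetical helper name, and it assumes that the numeric fields returned by stats() are plain JavaScript numbers. If your shell returns 64-bit wrapper types, convert them to numbers first.

```javascript
// Hypothetical helper: extract the metrics used for troubleshooting
// from the document returned by db.<collection>.stats().
function summarizeStats(stats) {
  // The block-manager section exists only for WiredTiger collections.
  const blockManager =
    (stats.wiredTiger && stats.wiredTiger["block-manager"]) || {};
  const reusable = blockManager["file bytes available for reuse"] || 0;
  return {
    count: stats.count,             // number of documents
    size: stats.size,               // logical size in bytes
    storageSize: stats.storageSize, // physical size in bytes
    indexSizes: stats.indexSizes,   // per-index size in bytes
    reusable: reusable,             // reusable space in bytes
    // Physical size that is actually occupied by data.
    storageMinusReusable: stats.storageSize - reusable
  };
}

// Usage in mongo shell, run once on each instance (collection name assumed):
//   printjson(summarizeStats(db.getCollection("test1").stats()));
```

Comparing the storageMinusReusable values of the two instances, rather than the raw storageSize values, removes the effect of reusable space from the comparison.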

Test results

The following table describes the environment for the migration test that is performed in this topic.

Item | Description
---- | -----------
Region and zone | The source and destination instances are deployed in Zone H of the China (Hangzhou) region.
Instance version and specifications | The source and destination instances use the same version and specifications. Storage Engine: WiredTiger; Engine Version: MongoDB 4.2; Specifications: 4 Cores, 8 GB; Storage Type: Local SSD; Storage: 500 GB.
Number of databases or tables | Four tables are used: test1, test2, test3, and test4.
Test load | YCSB is used to generate the test load.
Number of documents in a table | Each table contains 4 million documents.
Document size | Each document is about 1.1 KB (4671518072 bytes of logical size / 4000000 documents per table) and consists of 10 fields that store pure binary data.

The following table compares the data in the source and destination instances after data migration.

Item | Table | Source instance | Destination instance | Difference (%)
---- | ----- | --------------- | -------------------- | --------------
count (documents) | All | 4 × 4000000 | 4 × 4000000 | 0%
size (logical storage size, bytes) | All | 4 × 4671518072 | 4 × 4671518072 | 0%
storageSize (physical storage size, bytes) | test1 | 10284060672 | 4786364416 | 5.1 GB (53%)
storageSize | test2 | 9059483648 | 4786380800 | 3.9 GB (47%)
storageSize | test3 | 8771850240 | 4801978368 | 3.7 GB (45%)
storageSize | test4 | 9562763264 | 4786257920 | 4.4 GB (50%)
indexSize (index size, bytes) | test1 | {"_id_": 197046272} | {"_id_": 97783808} | 0.09 GB (50%)
indexSize | test2 | {"_id_": 210997248} | {"_id_": 97742848} | 0.11 GB (53%)
indexSize | test3 | {"_id_": 211079168} | {"_id_": 102924288} | 0.1 GB (51%)
indexSize | test4 | {"_id_": 201854976} | {"_id_": 97742848} | 0.1 GB (52%)
reuse size (reusable space size, bytes) | test1 | 4987596800 | 237568 | N/A
reuse size | test2 | 3646660608 | 237568 | N/A
reuse size | test3 | 3984535552 | 176128 | N/A
reuse size | test4 | 4774940672 | 237568 | N/A
storage-reuse (physical storage size - reusable space size, bytes) | test1 | 5296463872 | 4786126848 | 0.47 GB (9.6%)
storage-reuse | test2 | 5412823040 | 4786143232 | 0.58 GB (11.6%)
storage-reuse | test3 | 4787314688 | 4801802240 | -13.4 MB (-0.3%)
storage-reuse | test4 | 4787822592 | 4786020352 | 1.7 MB (< 0.1%)

The following conclusions can be drawn from the preceding table:

  • The number of documents and the logical storage size are identical on the source and destination instances, which indicates that DTS has migrated all data from the source instance to the destination instance.

  • The data and indexes of the source instance occupy more storage space. This is caused by a batch of UPDATE operations that were performed on the source instance before the migration and that left a large amount of reusable space in the WiredTiger files.

  • Tables that have the same number of documents and the same logical size still have slightly different physical sizes.

  • Even after the index size and the reusable space size are excluded, the storage space sizes of the source and destination instances may still differ by about 10%.

Different write patterns can produce different storage sizes on two instances that hold the same number of documents. Contributing factors include the storage and page-splitting mechanism of WiredTiger, the way in which indexes are generated, the internal fragmentation caused by padding for alignment, and the compression ratio of data blocks. Therefore, a difference of up to about 10% between the storage space sizes of the source and destination instances is considered reasonable.
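The storage-reuse comparison can be reproduced with simple arithmetic. The following sketch uses the test1 values from the comparison table; the function names are illustrative only.

```javascript
// test1 values from the comparison table, in bytes.
const source = { storageSize: 10284060672, reusable: 4987596800 };
const destination = { storageSize: 4786364416, reusable: 237568 };

// Physical size minus reusable space: the space actually occupied by data.
function effectiveSize(s) {
  return s.storageSize - s.reusable;
}

// Difference relative to the source instance, as a fraction.
function relativeDifference(src, dst) {
  return (effectiveSize(src) - effectiveSize(dst)) / effectiveSize(src);
}

const diffBytes = effectiveSize(source) - effectiveSize(destination);
const diffPercent = (relativeDifference(source, destination) * 100).toFixed(1);

console.log(diffBytes);         // 510337024 (about 0.47 GB)
console.log(diffPercent + "%"); // 9.6%
```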

How DTS works

Unlike the physical backup and snapshot backup methods of ApsaraDB for MongoDB, DTS uses logical synchronization to remain compatible across different versions and architectures. All documents are reinserted into the destination instance, and as a result all indexes are rebuilt there.

The data migration process in DTS consists of two steps: full migration and incremental migration. Both steps use logical synchronization, which is conceptually similar to mongodump and mongorestore in MongoDB. The two steps can be summarized as follows:

  • Full migration: DTS scans all records of each table in the source instance and batch inserts the records into the corresponding table in the destination instance. In principle, tables are migrated concurrently, and each table is handled by a single thread. If a single table contains a large amount of data, DTS splits the table into segments based on the _id field and assigns a separate thread to each segment. DTS caps the total number of threads used for migration.

  • Incremental migration: DTS pulls oplogs from the source instance and replays the oplogs on the destination instance.
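The two steps can be illustrated with a toy model. The following sketch is not the DTS implementation: splitIntoSegments and replay are hypothetical names, the segmenting rule (equal-sized contiguous _id ranges) is an assumption, and the entries are a simplified form of oplog records that uses only the op, o, and o2 fields.

```javascript
// Full migration (assumed rule): split a sorted list of _id values into
// at most maxThreads contiguous segments, one per worker thread.
function splitIntoSegments(sortedIds, maxThreads) {
  const segmentCount = Math.min(maxThreads, sortedIds.length);
  const segments = [];
  if (segmentCount === 0) return segments;
  const segmentSize = Math.ceil(sortedIds.length / segmentCount);
  for (let start = 0; start < sortedIds.length; start += segmentSize) {
    const chunk = sortedIds.slice(start, start + segmentSize);
    segments.push({
      minId: chunk[0],
      maxId: chunk[chunk.length - 1],
      count: chunk.length
    });
  }
  return segments;
}

// Incremental migration: replay simplified oplog-style entries, in order,
// onto an in-memory Map that stands in for the destination table.
// op "i" = insert, "u" = update ($set only), "d" = delete.
function replay(entries, table) {
  for (const entry of entries) {
    if (entry.op === "i") {
      table.set(entry.o._id, Object.assign({}, entry.o));
    } else if (entry.op === "u") {
      const current = table.get(entry.o2._id);
      if (current) Object.assign(current, entry.o.$set);
    } else if (entry.op === "d") {
      table.delete(entry.o._id);
    }
  }
  return table;
}
```

Because both steps write documents through ordinary inserts and updates, the destination files are built fresh, which is consistent with the small reusable space observed on the destination instance.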

For more information about DTS, see What is DTS?