After you use Data Transmission Service (DTS) to migrate data from a source instance to a destination instance, the data size on the destination instance may differ from the data size on the source instance. This topic describes how to troubleshoot this difference.
Troubleshooting logic and methods
If the data sizes of the source and destination instances differ after you migrate data, check the following items in sequence:

1. Whether the number of documents in each table is consistent.
2. Whether the number and size of indexes are consistent.
3. Whether the reusable space size reported by the WiredTiger engine is consistent.
We recommend that you use the mongo shell to connect to the source and destination instances separately and run the following commands to identify the cause of the data size difference:

- Number of documents: `db.<collection>.stats().count`
- Indexes: `db.<collection>.stats().indexSizes`
- Reusable space: `db.<collection>.stats().wiredTiger["block-manager"]["file bytes available for reuse"]`
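The three checks above can also be scripted. The following Python sketch compares the metrics for one collection; the two dictionaries stand in for `db.<collection>.stats()` output fetched from the source and destination instances, and the sample values are hypothetical:

```python
# Compare the three metrics that explain most data-size differences.
# src_stats and dst_stats stand in for db.<collection>.stats() output
# from the source and destination instances (hypothetical sample values).

def reuse_bytes(stats):
    """Reusable space tracked by the WiredTiger block manager."""
    return stats["wiredTiger"]["block-manager"]["file bytes available for reuse"]

def compare(src, dst):
    """Return (source, destination) value pairs for each check."""
    return {
        "count": (src["count"], dst["count"]),
        "indexSizes": (src["indexSizes"], dst["indexSizes"]),
        "reuse": (reuse_bytes(src), reuse_bytes(dst)),
    }

src_stats = {
    "count": 4000000,
    "indexSizes": {"_id_": 197046272},
    "wiredTiger": {"block-manager": {"file bytes available for reuse": 4987596800}},
}
dst_stats = {
    "count": 4000000,
    "indexSizes": {"_id_": 97783808},
    "wiredTiger": {"block-manager": {"file bytes available for reuse": 237568}},
}

result = compare(src_stats, dst_stats)
print(result["count"])  # → (4000000, 4000000): counts should match after migration
```

If the document counts match but the reusable space differs sharply, the size gap usually comes from fragmentation on the source instance rather than from missing data.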
Test results
The following table describes the environment for the migration test that is performed in this topic.
| Item | Description |
| --- | --- |
| Region and zone | The source and destination instances are deployed in Zone H of the China (Hangzhou) region. |
| Instance version and specifications | The version and specifications of the source instance are the same as those of the destination instance. |
| Number of databases or tables | Four tables are used, named test1, test2, test3, and test4. |
| Test load | YCSB is used to perform stress tests. |
| Number of documents in a table | Each table contains 4 million documents. |
| Total document size | The total document size is around 10 MB. Each document consists of 10 fields that store pure binary data. |
The following table compares the data in the source and destination instances after data migration.
| Item | Source instance | Destination instance | Difference |
| --- | --- | --- | --- |
| count (number of documents) | 4 × 4000000 | 4 × 4000000 | 0% |
| size (logical storage size) | 4 × 4671518072 | 4 × 4671518072 | 0% |
| storageSize (physical storage size) | test1: 10284060672<br>test2: 9059483648<br>test3: 8771850240<br>test4: 9562763264 | test1: 4786364416<br>test2: 4786380800<br>test3: 4801978368<br>test4: 4786257920 | test1: 5.1 GB (53%)<br>test2: 3.9 GB (47%)<br>test3: 3.7 GB (45%)<br>test4: 4.4 GB (50%) |
| indexSize (index size) | test1: {"_id_": 197046272}<br>test2: {"_id_": 210997248}<br>test3: {"_id_": 211079168}<br>test4: {"_id_": 201854976} | test1: {"_id_": 97783808}<br>test2: {"_id_": 97742848}<br>test3: {"_id_": 102924288}<br>test4: {"_id_": 97742848} | test1: 0.09 GB (50%)<br>test2: 0.11 GB (53%)<br>test3: 0.1 GB (51%)<br>test4: 0.1 GB (52%) |
| reuse size (reusable space size) | test1: 4987596800<br>test2: 3646660608<br>test3: 3984535552<br>test4: 4774940672 | test1: 237568<br>test2: 237568<br>test3: 176128<br>test4: 237568 | N/A |
| storage - reuse (physical storage size minus reusable space size) | test1: 5296463872<br>test2: 5412823040<br>test3: 4787314688<br>test4: 4787822592 | test1: 4786126848<br>test2: 4786143232<br>test3: 4801802240<br>test4: 4786020352 | test1: 0.47 GB (9.6%)<br>test2: 0.58 GB (11.6%)<br>test3: -13.4 MB (-0.3%)<br>test4: 1.7 MB (< 0.1%) |
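The `storage - reuse` row can be reproduced from the raw numbers. A minimal Python check for test1, using the values from the table above:

```python
# Effective storage = physical storage size minus reusable space.
# Raw byte counts for table test1, taken from the comparison table.
src_storage, src_reuse = 10284060672, 4987596800  # source instance
dst_storage, dst_reuse = 4786364416, 237568       # destination instance

src_effective = src_storage - src_reuse           # 5296463872
dst_effective = dst_storage - dst_reuse           # 4786126848

diff_bytes = src_effective - dst_effective
diff_pct = diff_bytes / src_effective * 100

print(f"{diff_bytes / 1024**3:.2f} GB ({diff_pct:.1f}%)")  # prints "0.48 GB (9.6%)"
```

After the reusable space is subtracted, the test1 gap shrinks from about 53% to under 10%, which is why the reusable space check comes before any deeper investigation.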
The following conclusions can be drawn from the preceding table:

- The number of documents and the logical storage size are identical on the source and destination instances, which indicates that DTS migrated all data from the source instance to the destination instance.
- The data and indexes of the source instance occupy more storage space. This is because a batch of UPDATE operations was performed on the source instance before DTS migrated the data.
- The actual physical sizes of collections that have the same number of documents and the same logical size still vary slightly.
- After the index size and the reusable space size are excluded, the storage sizes of the source and destination instances may still differ by about 10%.
Different write patterns can cause two instances with the same number of documents to differ in storage size. This is caused by factors such as the storage and page-splitting mechanism of WiredTiger, the way in which indexes are generated, the amount of internal fragmentation incurred by padding for alignment, and the compression ratio of data blocks. Therefore, a difference of up to about 10% between the storage sizes of the source and destination instances is considered reasonable.
How DTS works
Unlike the physical backup and snapshot backup methods of ApsaraDB for MongoDB, DTS uses logical synchronization, which allows for compatibility across different versions and architectures. All document data is reinserted into the destination instance, and indexes are regenerated there.
The data migration process in DTS consists of two steps: full migration and incremental migration. Both steps use logical synchronization, similar in concept to mongodump and mongorestore in MongoDB. The two steps can be summarized as follows:

- Full migration: DTS scans all records of each table in the source instance and batch-inserts the records into the corresponding table in the destination instance. By default, each table is migrated by a single thread, and multiple tables are migrated concurrently. If a single table contains a large amount of data, DTS splits the table into segments based on the _id field and assigns a separate thread to each segment. DTS caps the total number of threads used for migration.
- Incremental migration: DTS pulls oplog entries from the source instance and replays them on the destination instance.
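The segment-splitting approach used in full migration can be sketched in Python. This is only an illustration of the idea, not the actual DTS implementation; the segment size and thread cap are made-up parameters:

```python
from concurrent.futures import ThreadPoolExecutor

def split_by_id(ids, segment_size):
    """Split a sorted list of _id values into (lower, upper) segments."""
    ranges = []
    for i in range(0, len(ids), segment_size):
        chunk = ids[i:i + segment_size]
        ranges.append((chunk[0], chunk[-1]))
    return ranges

def migrate_segment(rng):
    # In a real migration, this would scan the source range and
    # batch-insert the documents into the destination table.
    lower, upper = rng
    return f"migrated _id {lower}..{upper}"

ids = list(range(1, 4001))         # stand-in for a table's sorted _id values
segments = split_by_id(ids, 1000)  # 4 segments of 1000 documents each

with ThreadPoolExecutor(max_workers=4) as pool:  # capped thread pool
    results = list(pool.map(migrate_segment, segments))

print(len(segments))  # → 4
```

Because each segment is reinserted independently and indexes are rebuilt on the destination, the destination instance starts with compact, freshly written storage, which explains the smaller physical and index sizes observed in the test results.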
For more information about DTS, see What is DTS?