Frequent writes and deletions of large amounts of data in ApsaraDB for MongoDB databases can fragment the disk space. The fragments occupy disk space and lower effective disk utilization. You can rewrite and defragment all the data and indexes in a collection to release idle space, which improves disk utilization and query performance.

Prerequisites

The ApsaraDB for MongoDB instance uses WiredTiger as the storage engine.

Important notes

  • We recommend that you back up data in ApsaraDB for MongoDB databases before defragmentation. For more information, see Manually back up an ApsaraDB for MongoDB instance.
  • During defragmentation, the database where the collection is stored is locked and read and write operations are not allowed in the database. We recommend that you defragment the disk space during off-peak hours.
    Note The time required for defragmenting the disk space through the compact command depends on multiple factors, such as the data volume of the collection and the system load.

Background

How the disk space is reclaimed

Generally, if you run the db.collection.remove({}, {multi: true}) command to delete documents, the documents are removed from the B-tree but the disk space that they occupied is not reclaimed. If you run the remove command to delete a large number of documents and write little data afterward, the freed space stays idle inside the data files and effective disk utilization drops. In this case, you can run the compact command to reclaim the idle disk space.

Note
  • Newly written data reuses the disk space that has not been reclaimed. Therefore, you do not need to frequently run the compact command for defragmentation in scenarios where data is continuously written.
  • If you run the db.collection.drop() command to delete a collection, the data files of the collection are deleted and the disk space occupied by the files is reclaimed.
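
The behavior described above can be illustrated with a toy model. The sketch below is only a simplified illustration and does not reflect how WiredTiger is implemented internally; the class name ToyDataFile and its methods are hypothetical. It shows why deletions leave the file size unchanged, why later writes consume the idle space first, and why compaction is what shrinks the file.

```javascript
// Toy model only: NOT the actual WiredTiger implementation.
// All names here are hypothetical and exist purely for illustration.
class ToyDataFile {
  constructor() {
    this.fileBytes = 0;     // size of the data file on disk
    this.reusableBytes = 0; // space freed by deletes that is still inside the file
  }

  // New data fills reusable space first; the file grows only for the remainder.
  write(bytes) {
    const reused = Math.min(bytes, this.reusableBytes);
    this.reusableBytes -= reused;
    this.fileBytes += bytes - reused;
  }

  // Deleting data marks space as reusable but does not shrink the file.
  remove(bytes) {
    this.reusableBytes += bytes;
  }

  // Compaction releases the idle space, so the file shrinks.
  compact() {
    this.fileBytes -= this.reusableBytes;
    this.reusableBytes = 0;
  }
}
```

In this model, a workload that keeps writing after deletions drains reusableBytes on its own, which matches the note that compact is rarely needed when data is continuously written.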

Estimate the disk space to be reclaimed

  1. Connect to the ApsaraDB for MongoDB instance by using the mongo shell.
  2. Run the following command to switch to the database where the collection is stored:
    use <database_name>
    Note <database_name>: the name of the database.
  3. Run the following command to query the disk space that can be reclaimed from the collection:
    db.<collection_name>.stats().wiredTiger["block-manager"]["file bytes available for reuse"]
    Note <collection_name>: the name of the collection.

    Sample command:

    db.customer.stats().wiredTiger["block-manager"]["file bytes available for reuse"]

    Sample result:

    207806464
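
As a rough illustration, the query in step 3 can be wrapped in a small helper that also expresses the reusable bytes in MiB and as a share of the collection's storage size. The function name reclaimableSummary is hypothetical and not part of the mongo shell. The sketch assumes that the stats() output also contains the standard storageSize field and that both values convert cleanly with Number(); pass the shell's db object as the first argument.

```javascript
// Hypothetical helper (not a built-in shell function): summarizes how much of
// a collection's data file WiredTiger reports as available for reuse.
function reclaimableSummary(db, collectionName) {
  const stats = db.getCollection(collectionName).stats();

  // Same field that step 3 reads directly.
  const reusable = Number(
    stats.wiredTiger["block-manager"]["file bytes available for reuse"]
  );

  // Assumption: storageSize is present in the stats() output.
  const storage = Number(stats.storageSize);

  return {
    collection: collectionName,
    reusableBytes: reusable,
    reusableMiB: reusable / (1024 * 1024),
    // Fraction of the collection's storage that is idle (0 when storage is 0).
    reusableShare: storage > 0 ? reusable / storage : 0
  };
}
```

In the mongo shell, a call such as printjson(reclaimableSummary(db, "customer")) prints the summary for the customer collection used in the sample command.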

Defragment a standalone instance or a replica set instance

  1. Connect to the primary node of an ApsaraDB for MongoDB instance by using the mongo shell. For more information, see Connect to a replica set instance by using the mongo shell.
  2. Run the following command to switch to the database where the collection is stored:
    use <database_name>
    Note <database_name>: the name of the database.
  3. Run the db.stats() command to view the disk space occupied by the database before defragmentation.
  4. Run the following command to defragment a collection:
    db.runCommand({compact:"<collection_name>",force:true})
    Note
    • <collection_name>: the name of the collection.
    • The force parameter is optional. To run the compact command on the primary node of a replica set instance, you must set the force parameter to true.
  5. Wait until {"ok":1} is returned, which indicates that the command has completed.
    Note The compact command executed on the primary node does not affect a secondary node. For a replica set instance, repeat the preceding steps to connect to a secondary node through the mongo shell and run the compact command.
    After defragmentation is complete, you can run the db.stats() command to view the disk space occupied by the database and compare it with the value from before defragmentation.
    [Figure: Storage size before and after defragmentation]
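
Steps 3 to 5 can be combined into one sketch that records the database storage size before and after compaction. The wrapper name compactAndReport is hypothetical, and the sketch assumes that db.stats() returns a storageSize field that converts cleanly with Number(). It uses only the db.stats() and db.runCommand({compact: ...}) calls shown in the steps above.

```javascript
// Hypothetical wrapper (not a built-in shell function): runs compact on one
// collection and reports how much the database storage size changed.
function compactAndReport(db, collectionName, force) {
  // Step 3: storage size before defragmentation.
  const before = Number(db.stats().storageSize);

  // Step 4: build the compact command; force is needed on a replica set primary.
  const command = { compact: collectionName };
  if (force) {
    command.force = true;
  }
  const result = db.runCommand(command);

  // Step 5: anything other than ok: 1 is treated as a failure and returned as-is.
  if (result.ok !== 1) {
    return { ok: false, result: result };
  }

  const after = Number(db.stats().storageSize);
  return {
    ok: true,
    beforeBytes: before,
    afterBytes: after,
    freedBytes: before - after
  };
}
```

After you switch to the target database, a call such as printjson(compactAndReport(db, "<collection_name>", true)) runs the sequence on the node that the shell is connected to. Because the call blocks until compact returns, the same off-peak recommendation applies.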

Defragment a sharded cluster instance

  1. Connect to any mongos node in the sharded cluster instance by using the mongo shell. For more information, see Connect to a sharded cluster instance by using the mongo shell.
  2. Run the db.stats() command to view the disk space occupied by the database before defragmentation.
  3. Run the following command to defragment a collection on the primary node of a shard:
    db.runCommand({runCommandOnShard:"<Shard ID>","command":{compact:"<collection_name>",force:true}})
    Note
    • <Shard ID>: the ID of the shard.
    • <collection_name>: the name of the collection.
  4. Run the following command to defragment a collection on a secondary node of a shard:
    db.runCommand({runCommandOnShard:"<Shard ID>","command":{compact:"<collection_name>"},$queryOptions: {$readPreference: {mode: 'secondary'}}})
    Note
    • <Shard ID>: the ID of the shard.
    • <collection_name>: the name of the collection.
    After defragmentation is complete, you can run the db.runCommand({dbstats:1}) command to view the disk space occupied by the database.
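
When several shards need to be defragmented, the two command documents from steps 3 and 4 can be generated by a small helper instead of being typed for each shard. The function name buildShardCompactCommands is hypothetical; the command shapes are copied from the steps above and nothing beyond them is assumed about runCommandOnShard.

```javascript
// Hypothetical helper: builds the command documents from steps 3 and 4 for one
// shard. It only constructs objects; it does not contact the cluster.
function buildShardCompactCommands(shardId, collectionName) {
  return {
    // Step 3: compact on the primary node of the shard (force: true).
    primary: {
      runCommandOnShard: shardId,
      command: { compact: collectionName, force: true }
    },
    // Step 4: compact on a secondary node of the shard.
    secondary: {
      runCommandOnShard: shardId,
      command: { compact: collectionName },
      $queryOptions: { $readPreference: { mode: "secondary" } }
    }
  };
}
```

On a mongos connection, you can then pass either document to db.runCommand(), for example db.runCommand(buildShardCompactCommands("<Shard ID>", "<collection_name>").primary), and check that {"ok":1} is returned before you move on to the next shard.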