When data is deleted from an ApsaraDB for MongoDB instance, the disk space that the data occupied is not reclaimed. This unreclaimed disk space is called disk fragments. Fragments can accumulate when you frequently delete data or delete large amounts of data. This topic describes how to defragment a disk to improve disk usage.

Prerequisites

The storage engine of the instance is WiredTiger.

Background information

When you run the db.runCommand({collStats: <collection_name>}) command to access a node, the following keywords are returned: size and storageSize. size indicates the logical storage size of the collection. storageSize indicates the physical storage size of the collection. If you run the remove command to delete documents, the size value decreases, but the storageSize value does not necessarily decrease. By comparing the size and storageSize values, you can determine whether disk fragments are generated.
Note
  • For ApsaraDB for MongoDB instances that run MongoDB 4.4 or later, you can also use the freeStorageSize keyword in the results to view the disk space that is idle in disk fragments and can be reclaimed.
  • For more information about the size, storageSize, and freeStorageSize keywords, see collStats-Output.
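
The comparison described above can be sketched as a small helper that takes the document returned by collStats. The function name fragmentationSummary and the reusableRatio field are hypothetical names introduced for illustration, and the sketch assumes that the size fields are returned as plain numbers.

```javascript
// Minimal sketch: summarize fragmentation from a collStats result.
// `stats` is the document returned by db.runCommand({collStats: ...}).
// Assumption: size, storageSize, and freeStorageSize are plain numbers.
function fragmentationSummary(stats) {
  const summary = {
    size: stats.size,               // logical storage size of the collection
    storageSize: stats.storageSize, // physical storage size of the collection
    freeStorageSize: null,          // stays null if the field is not reported
    reusableRatio: null             // hypothetical field: free / physical size
  };
  // freeStorageSize is available only on MongoDB 4.4 or later.
  if (typeof stats.freeStorageSize === "number" && stats.storageSize > 0) {
    summary.freeStorageSize = stats.freeStorageSize;
    summary.reusableRatio = stats.freeStorageSize / stats.storageSize;
  }
  return summary;
}
```

In the mongo shell, the helper could be called as fragmentationSummary(db.runCommand({collStats: "test_database_collection"})). Because storageSize reflects compressed data, a storageSize value that is smaller than size does not by itself rule out fragments.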

The compact command can be used in ApsaraDB for MongoDB to defragment data. You can run the compact command to reclaim disk fragments that are generated after data is deleted to improve disk usage. For more information about the compact command, see compact.

If the percentage of disk fragments is low, the compact command has little effect because of the limits of the WiredTiger storage engine of ApsaraDB for MongoDB instances. Therefore, we recommend that you do not run the compact command frequently.

When a collection contains a small amount of data, you can copy the data to a new collection, and then run the db.collection.drop() command to delete the existing collection to improve disk usage. If you run the db.collection.drop() command to delete a collection, all documents in the collection are deleted and the disk space occupied by the documents is reclaimed.
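
The copy-and-drop approach for a small collection can be sketched as follows. The function name copyThenDrop and both collection names are hypothetical. The sketch copies documents only, so indexes on the original collection are not recreated, and it assumes that no other client writes to the collection during the copy.

```javascript
// Minimal sketch: copy a small collection to a new one, then drop the original.
// `database` is a mongo shell database handle such as the global `db`.
function copyThenDrop(database, sourceName, targetName) {
  const source = database.getCollection(sourceName);
  const target = database.getCollection(targetName);
  // Refuse to copy into a collection that already contains documents.
  if (Number(target.countDocuments({})) !== 0) {
    throw new Error("target collection is not empty");
  }
  let copied = 0;
  source.find().forEach(function (doc) {
    target.insertOne(doc);
    copied += 1;
  });
  // Drop the original only if every copied document is present in the target.
  if (Number(target.countDocuments({})) !== copied) {
    throw new Error("copy incomplete; source collection was not dropped");
  }
  source.drop();
  return copied;
}
```

In the mongo shell, a call might look like copyThenDrop(db, "old_collection", "new_collection"). Back up the data before you run a sketch like this, because dropping the original collection cannot be undone.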

Precautions

  • Before you defragment a disk, we recommend that you first back up data. For more information about how to back up data, see Manually back up data of an ApsaraDB for MongoDB instance.
  • During defragmentation, the database where the collection is stored is locked and read and write operations are not allowed on the database. We recommend that you defragment disks during off-peak hours.
    Note The time required to defragment a disk by running the compact command depends on factors such as the amount of data in the collection and the system load.

Estimate disk space to be reclaimed

  1. Use the mongo shell to connect to an ApsaraDB for MongoDB instance. Connection methods vary based on the instance architecture. For more information, see the following topics:
  2. Switch to the database where the collection is stored.
    Syntax:
    use <database_name>
    <database_name>: the name of the database to which the collection belongs.
    Note You can run the show dbs command to list the databases in the instance.

    Example:

    Switch to the test_database database.
    use test_database
  3. View the disk space to be reclaimed for the collection.
    Syntax:
    db.<collection_name>.stats().wiredTiger["block-manager"]["file bytes available for reuse"]
    <collection_name>: the name of the collection.
    Note You can run the show tables command to list the collections in the current database.
    Example:
    db.test_database_collection.stats().wiredTiger["block-manager"]["file bytes available for reuse"]
    Sample result:
    207806464

    This result indicates that the estimated disk space to be reclaimed is 207,806,464 bytes.
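
A byte count of this size is easier to read in larger units. The following sketch reads the same field from a stats document and converts it to MiB. The function names reusableBytes and toMiB are hypothetical helpers introduced for illustration.

```javascript
// Minimal sketch: read the reusable byte count and format it in MiB.
// `stats` is the document returned by db.<collection_name>.stats().
function reusableBytes(stats) {
  return stats.wiredTiger["block-manager"]["file bytes available for reuse"];
}

// Convert a byte count to mebibytes (1 MiB = 1024 * 1024 bytes).
function toMiB(bytes) {
  return (bytes / (1024 * 1024)).toFixed(2) + " MiB";
}
```

For the sample result above, toMiB(207806464) returns "198.18 MiB".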

Defragment the disk of a standalone instance or a replica set instance

  • A standalone instance has only one node. You can connect to the primary node and run the compact command to defragment the disk of the primary node.
  • A replica set instance has multiple nodes. You must connect to the primary node and to the secondary node separately, and run the compact command on each node to defragment its disk.
    Note If a replica set instance has a read-only node, you must also defragment the disk of the read-only node in the same way as you defragment the disks of the primary and secondary nodes.
  1. Connect to a standalone or replica set instance by using the mongo shell. Connection methods vary based on the instance architecture. For more information, see the following topics:
  2. Switch to the database where the collection is stored.
    Syntax:
    use <database_name>
    <database_name>: the name of the database to which the collection belongs.
    Note You can run the show dbs command to list the databases in the instance.

    Example:

    Switch to the replica_database database.
    use replica_database
  3. View the disk space occupied by the database before defragmentation.
    db.stats()
    Note This command can be used without any changes.
  4. Defragment the disk for the collection.
    Syntax:
    db.runCommand({compact:"<collection_name>",force:true})
    Parameters:
    • <collection_name>: the name of the collection.
      Note You can run the show tables command to list the collections in the current database.
    • force: Optional. Set the value to true.

      This parameter is required if you run the command on the primary node of an ApsaraDB for MongoDB instance that runs MongoDB 4.2 or earlier.

    Example:

    db.runCommand({compact:"sharded_collection",force:true})
    Sample result:
    { "ok" : 1 }
  5. View the disk space occupied by the database after defragmentation.
    db.stats()
    Note This command can be used without any changes.
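
The steps above can be sketched as two small helpers: one that builds the compact command document, and one that compares the db.stats() results from before and after defragmentation. The function names buildCompactCommand and storageDelta are hypothetical, and the sketch assumes that the storageSize field of db.stats() is a plain number.

```javascript
// Minimal sketch: build the command document passed to db.runCommand().
// The command name must be the first key, so it is added first.
function buildCompactCommand(collectionName, force) {
  const command = { compact: collectionName };
  // force is needed on the primary node for MongoDB 4.2 or earlier.
  if (force) {
    command.force = true;
  }
  return command;
}

// Difference in physical storage between two db.stats() results, in bytes.
// A positive value means that storage shrank after defragmentation.
function storageDelta(statsBefore, statsAfter) {
  return statsBefore.storageSize - statsAfter.storageSize;
}
```

In the mongo shell, a run might save var before = db.stats(), call db.runCommand(buildCompactCommand("sharded_collection", true)), and then evaluate storageDelta(before, db.stats()).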

Defragment the disk of a sharded cluster instance

For a sharded cluster instance, you need to defragment only the disks of the shard nodes. The mongos and Configserver nodes do not store user data, and they mostly perform insert and update operations and few delete operations. Therefore, you do not need to defragment the disks of the mongos and Configserver nodes.
Note The compact command is not supported on the read-only nodes of a sharded cluster instance. Therefore, the disks of read-only nodes cannot be defragmented.
  1. Use the mongo shell to connect to a sharded cluster instance. For more information, see Connect to a sharded cluster ApsaraDB for MongoDB instance by using the mongo shell.
  2. Switch to the database where the collection is stored.
    Syntax:
    use <database_name>
    <database_name>: the name of the database to which the collection belongs.
    Note You can run the show dbs command to list the databases in the instance.

    Example:

    Switch to the sharded_database database.
    use sharded_database
  3. View the disk space occupied by the database before defragmentation.
    db.stats()
    Note This command can be used without any changes.
  4. Defragment the disk for the collection.

    You must defragment the disks of the primary and secondary nodes in the shard node.

    • Defragment the disk of the primary node in the shard node.
      Syntax:
      db.runCommand({runCommandOnShard:"<Shard ID>","command":{compact:"<collection_name>",force:true}})
      Parameters:
      • <Shard ID>: the ID of the shard node.
        Note You can log on to the ApsaraDB for MongoDB console and view the ID of the shard node in the Shard List section on the Basic Information page.
      • <collection_name>: the name of the collection.
        Note You can run the show tables command to list the collections in the current database.
      • force: Optional. Set the value to true.

        This parameter is required if you run the command on a sharded cluster instance that runs MongoDB 4.2 or earlier.

      Example:
      db.runCommand({runCommandOnShard:"shard01","command":{compact:"sharded_collection",force:true}})
    • Defragment the disk of the secondary node in the shard node.
      Syntax:
      db.runCommand({runCommandOnShard:"<Shard ID>","command":{compact:"<collection_name>"},$queryOptions: {$readPreference: {mode: 'secondary'}}})
      Parameters:
      • <Shard ID>: the ID of the shard node.
        Note You can log on to the ApsaraDB for MongoDB console and view the ID of the shard node in the Shard List section on the Basic Information page.
      • <collection_name>: the name of the collection.
        Note You can run the show tables command to list the collections in the current database.
      • force: Optional. Set the value to true.
      Example:
      db.runCommand({runCommandOnShard:"shard01","command":{compact:"sharded_collection"},$queryOptions: {$readPreference: {mode: 'secondary'}}})
  5. View the disk space occupied by the database after defragmentation.
    db.stats()
    Note This command can be used without any changes.
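
The primary and secondary variants of the shard command differ only in the force parameter and the read preference, so both can be built by one helper. The following is a minimal sketch that mirrors the syntax shown above. The function name buildShardCompactCommand and its options argument are hypothetical.

```javascript
// Minimal sketch: build the runCommandOnShard document for one shard node.
// options.force:     add force:true to the inner compact command
// options.secondary: route the command to the secondary node of the shard
function buildShardCompactCommand(shardId, collectionName, options) {
  const opts = options || {};
  const inner = { compact: collectionName };
  if (opts.force) {
    inner.force = true;
  }
  const command = { runCommandOnShard: shardId, command: inner };
  if (opts.secondary) {
    command.$queryOptions = { $readPreference: { mode: "secondary" } };
  }
  return command;
}
```

In the mongo shell, db.runCommand(buildShardCompactCommand("shard01", "sharded_collection", {force: true})) corresponds to the primary node example, and passing {secondary: true} instead corresponds to the secondary node example.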