
ApsaraDB for MongoDB: Defragment a disk of an ApsaraDB for MongoDB instance to increase disk utilization

Last Updated: Sep 20, 2024

After data is deleted from an ApsaraDB for MongoDB instance, the storage that the deleted data occupied is marked as free. Newly written data may be stored in this free storage, or it may be appended to the end of a data file, which expands the file. As a result, part of the free storage is never reused. This unused free storage constitutes disk fragments. The more disk fragments there are, the lower the disk utilization.

You can use the storage analysis feature to defragment a disk of an ApsaraDB for MongoDB instance. You can also use the feature to view the storage overview, storage trends, exceptions, and data space of the instance. For more information, see Space analysis.

Prerequisites

The storage engine of the instance is WiredTiger.

Background information

When you run the db.runCommand({collStats: "<collection_name>"}) command on a node, the output includes the size and storageSize fields. size indicates the logical size of the data in the collection, and storageSize indicates the physical storage size of the collection. If you run the remove command to delete documents, the size value decreases, but the storageSize value does not necessarily decrease. If the storageSize value is greater than the size value, the collection may contain disk fragments.

Note
  • For ApsaraDB for MongoDB instances that run MongoDB 4.4 or later, you can also use the freeStorageSize field in the output to view the amount of free disk space in disk fragments that can be reclaimed.

  • For more information about the size, storageSize, and freeStorageSize fields, see collStats-Output.
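The fields described above can also be read from a script. The following JavaScript sketch takes the object returned by db.<collection_name>.stats() and reports how many bytes could be reclaimed, preferring freeStorageSize (MongoDB 4.4 or later) and falling back to the WiredTiger block manager counter that is used later in this topic. The function name summarizeFragmentation is hypothetical and is not part of MongoDB, and the sketch assumes that the size fields arrive as plain JavaScript numbers.

```javascript
// Hypothetical helper, not a MongoDB API. Pass it the object returned by
// db.<collection_name>.stats() or db.runCommand({collStats: "<collection_name>"}).
// Assumes the size fields are plain JavaScript numbers.
function summarizeFragmentation(stats) {
  // freeStorageSize is reported by MongoDB 4.4 or later.
  let reclaimable = stats.freeStorageSize;
  let source = "freeStorageSize";

  if (typeof reclaimable !== "number") {
    // Fallback for earlier versions: the WiredTiger block manager counter.
    const wt = stats.wiredTiger || {};
    const blockManager = wt["block-manager"] || {};
    reclaimable = blockManager["file bytes available for reuse"];
    source = "file bytes available for reuse";
  }

  if (typeof reclaimable !== "number") {
    // Neither field is present in the given object.
    reclaimable = 0;
    source = "unavailable";
  }

  return {
    size: stats.size,               // logical size of the data
    storageSize: stats.storageSize, // physical storage size on disk
    reclaimableBytes: reclaimable,
    source: source,
  };
}

// Usage in the shell (the collection name is an example):
// summarizeFragmentation(db.test_database_collection.stats())
```

The fallback order reflects the version note above: on instances earlier than MongoDB 4.4, freeStorageSize is absent, so the block manager value is used instead.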

In ApsaraDB for MongoDB, you can run the compact command to reclaim the disk fragments that are generated after data is deleted, which increases disk utilization. For more information about the compact command, see compact.

Usage notes

  • Before you defragment a disk of an ApsaraDB for MongoDB instance, we recommend that you back up the data of the instance. For more information, see Manually back up data of an ApsaraDB for MongoDB instance.

  • If you run the compact command on collections in an ApsaraDB for MongoDB instance that runs a version earlier than MongoDB 4.4, the database to which the collections belong is locked, and read and write operations on the database are blocked. We recommend that you run the command during off-peak hours. For more information, see compact.

    Note

    The time required to defragment a disk of the instance by running the compact command depends on factors such as the amount of collection data and the system load.

  • In an instance that runs MongoDB 3.4.x, MongoDB 4.0.x, MongoDB 4.0.22 or earlier, or MongoDB 5.0.6 or earlier, a node on which the compact command is running enters the RECOVERING state. If the node remains in this state for a long period of time, the instance detection component identifies it as unhealthy and triggers a rebuild of the node. In instances that run later MongoDB versions, a node on which the compact command is running remains in the SECONDARY state, and no rebuild is triggered. For more information, see MongoDB documentation. For more information about MongoDB versions, see Release notes for the minor versions of ApsaraDB for MongoDB.

  • The compact command cannot be executed in the following scenarios. For more information, see Open source code.

    • The size of a physical collection is less than 1 MB.

    • The free storage in the first 80% of the file is less than 20% of the file size, and the free storage in the first 90% of the file is less than 10% of the file size.

  • A single run of the compact command may release less storage than the free storage of the instance. In this case, you can run the compact command again to further defragment the disk. However, we recommend that you do not run the compact command frequently.

  • The compact command can be executed even when an ApsaraDB for MongoDB instance is locked because its disk is full.
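The two skip conditions in the usage notes can be expressed as a small predicate. The following JavaScript sketch is a simplified model written for this topic, not the WiredTiger source code. Its inputs, the free bytes that fall within the first 80% and the first 90% of the file, are assumptions: MongoDB does not report them directly, so the function only illustrates how the thresholds combine.

```javascript
const ONE_MB = 1024 * 1024;

// Hypothetical model of the two skip conditions listed in the usage notes;
// it is not the WiredTiger implementation.
// fileBytes:     physical size of the collection file, in bytes.
// freeInFirst80: free bytes located in the first 80% of the file.
// freeInFirst90: free bytes located in the first 90% of the file.
function compactWouldSkip(fileBytes, freeInFirst80, freeInFirst90) {
  // Condition 1: the physical collection is smaller than 1 MB.
  if (fileBytes < ONE_MB) {
    return true;
  }
  // Condition 2: less than 20% free in the first 80% of the file AND
  // less than 10% free in the first 90% of the file.
  // Multiplying instead of dividing keeps the comparison exact for integer byte counts.
  const enoughInFirst80 = freeInFirst80 * 5 >= fileBytes;  // at least 20%
  const enoughInFirst90 = freeInFirst90 * 10 >= fileBytes; // at least 10%
  return !enoughInFirst80 && !enoughInFirst90;
}
```

For example, compactWouldSkip(100 * ONE_MB, 15 * ONE_MB, 12 * ONE_MB) returns false: 15 MB is below the 20 MB threshold for the first 80% of the file, but 12 MB meets the 10 MB threshold for the first 90%, so only one condition has to be satisfied for compaction to proceed.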

Estimate disk space to be reclaimed

  1. Use the mongo shell to connect to an ApsaraDB for MongoDB instance. Connection methods vary based on the instance architecture. For more information, see the following topics:

  2. Switch to the database where a specified collection is stored.

    Syntax:

    use <database_name>

    Parameter in the preceding command: <database_name>: the name of the database to which the collection belongs.

    Note

    You can run the show dbs command to list the databases of the instance.

    Example:

    Switch to the test_database database.

    use test_database
  3. View the disk space to be reclaimed for the collection.

    Syntax:

    db.<collection_name>.stats().wiredTiger["block-manager"]["file bytes available for reuse"]

    Parameter in the preceding command: <collection_name>: the name of the collection.

    Note

    You can run the show tables command to list the collections in the current database.

    Example:

    db.test_database_collection.stats().wiredTiger["block-manager"]["file bytes available for reuse"]

    The following result is returned.

    207806464

    This result indicates that the estimated disk space to be reclaimed is 207,806,464 bytes.
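To run the same estimate for every collection in the current database instead of one at a time, you can loop over the collection names. The following JavaScript sketch is a hypothetical helper, not part of MongoDB; it assumes the shell's db object, or any object that provides getCollectionNames() and getCollection(name).stats(), and it skips entries such as views that have no WiredTiger block statistics. With the example result above, 207,806,464 bytes is reported as about 198.2 MB.

```javascript
// Hypothetical helper, not part of MongoDB. `database` is the shell's `db`
// object, or any object that provides getCollectionNames() and
// getCollection(name).stats().
function reclaimableByCollection(database) {
  const rows = [];
  database.getCollectionNames().forEach(function (name) {
    let stats;
    try {
      stats = database.getCollection(name).stats();
    } catch (e) {
      return; // for example, a view, which has no storage statistics
    }
    const wt = stats.wiredTiger;
    const blockManager = wt ? wt["block-manager"] : undefined;
    const bytes = blockManager ? blockManager["file bytes available for reuse"] : undefined;
    if (typeof bytes === "number") {
      rows.push({
        collection: name,
        bytes: bytes,
        megabytes: Number((bytes / (1024 * 1024)).toFixed(1)),
      });
    }
  });
  // Largest reclaimable space first.
  rows.sort(function (a, b) { return b.bytes - a.bytes; });
  return rows;
}

// Usage in the shell, after switching to the database:
// printjson(reclaimableByCollection(db))
```

Sorting by reclaimable bytes puts the collections where compact is likely to help most at the top of the list.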

Defragment a disk of a standalone or replica set instance

  • A standalone instance has only one node. You can connect to the primary node and run the compact command to defragment a disk of the primary node.

  • A replica set instance has multiple nodes. The compact command defragments only the node on which it runs, so you must connect to the primary node and to each secondary node and run the compact command on each of them.

    Note

    If a replica set instance has a read-only node, you must also defragment the disk of the read-only node in the same way as the disks of the primary and secondary nodes.
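Because each node must be compacted separately, it can help to list the nodes first. The following JavaScript sketch is a hypothetical helper that filters the output of rs.status() down to the members currently in the PRIMARY or SECONDARY state. It is an assumption that these addresses are the ones you can connect to: the host names reported inside an ApsaraDB for MongoDB instance may differ from the endpoints shown in the console, and read-only nodes may not be included.

```javascript
// Hypothetical helper, not part of MongoDB. Pass it the output of rs.status().
// Returns only the members currently in the PRIMARY or SECONDARY state;
// arbiters (which hold no data) and members in any other state are left out.
function dataBearingMembers(status) {
  return status.members
    .filter(function (member) {
      return member.stateStr === "PRIMARY" || member.stateStr === "SECONDARY";
    })
    .map(function (member) {
      return { host: member.name, state: member.stateStr };
    });
}

// Usage in the shell:
// dataBearingMembers(rs.status()).forEach(function (m) { print(m.state + "  " + m.host); });
```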

  1. Connect to a standalone or replica set instance by using the mongo shell. Connection methods vary based on the instance architecture. For more information, see the following topics:

  2. Switch to the database where a specified collection is stored.

    Syntax:

    use <database_name>

    Parameter in the preceding command: <database_name>: the name of the database to which the collection belongs.

    Note

    You can run the show dbs command to list the databases of the instance.

    Example:

    Switch to the replica_database database.

    use replica_database
  3. View the disk space occupied by the database before defragmentation.

    db.stats()
    Note

    This command can be used without any changes.

  4. Defragment the collection.

    Syntax:

    db.runCommand({compact:"<collection_name>",force:true})

    Parameters in the preceding command:

    • <collection_name>: the name of the collection.

      Note

      You can run the show tables command to list the collections in the current database.

    • force: optional. If you specify this parameter, set it to true.

      This parameter is required if you run the command on the primary node of an ApsaraDB for MongoDB instance that runs MongoDB 4.2 or earlier.

    Example:

    db.runCommand({compact:"sharded_collection"})

    The following result is returned:

    { "ok" : 1 }
  5. View the disk space occupied by the database after defragmentation.

    db.stats()
    Note

    This command can be used without any changes.
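Steps 3 to 5 can be combined into one script for the node you are connected to. The following JavaScript sketch is a hypothetical helper, not part of MongoDB: it records db.stats() before and after, runs db.runCommand({compact: ...}) for each collection you list, and reports the difference in storageSize plus indexSize. The failure handling is an assumption about client behavior: mongosh typically throws when a command fails, while the legacy mongo shell returns a document with ok: 0, so both cases are recorded the same way.

```javascript
// Hypothetical helper, not part of MongoDB. Runs compact on each listed
// collection of the current database on the node you are connected to, and
// compares db.stats() before and after. `database` is the shell's `db` object.
function compactAndReport(database, collectionNames, force) {
  const before = database.stats();

  const results = collectionNames.map(function (name) {
    const cmd = { compact: name };
    if (force) {
      cmd.force = true; // required on a primary node running MongoDB 4.2 or earlier
    }
    try {
      return { collection: name, result: database.runCommand(cmd) };
    } catch (e) {
      // mongosh typically throws on a failed command; the legacy shell returns { ok: 0 }.
      return { collection: name, result: { ok: 0, errmsg: String(e && e.message ? e.message : e) } };
    }
  });

  const after = database.stats();
  // Data files and index files together.
  const onDiskBefore = before.storageSize + before.indexSize;
  const onDiskAfter = after.storageSize + after.indexSize;

  return {
    results: results,
    onDiskBefore: onDiskBefore,
    onDiskAfter: onDiskAfter,
    reclaimedBytes: onDiskBefore - onDiskAfter,
  };
}

// Usage in the shell, on one node at a time (the collection name is an example):
// printjson(compactAndReport(db, ["sharded_collection"], true))
```

Listing collections explicitly, rather than compacting every collection in the database, keeps each run short, which matters on versions where compact locks the database.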

Defragment a disk of a sharded cluster instance

For a sharded cluster instance, you need to defragment only the disks of the shard components. The mongos and ConfigServer components do not store user data, and the operations performed on them are mostly insert and update operations rather than delete operations. Therefore, you do not need to defragment the disks of the mongos and ConfigServer components.

Note

The compact command is not supported on the read-only nodes of a sharded cluster instance. Therefore, the disks of read-only nodes cannot be defragmented.

  1. Use the mongo shell to connect to a sharded cluster instance. For more information, see Connect to a sharded cluster ApsaraDB for MongoDB instance by using the mongo shell.

  2. Switch to the database where a specified collection is stored.

    Syntax:

    use <database_name>

    Parameter in the preceding command: <database_name>: the name of the database to which the collection belongs.

    Note

    You can run the show dbs command to list the databases of the instance.

    Example:

    Switch to the sharded_database database.

    use sharded_database
  3. View the disk space occupied by the database before defragmentation.

    db.stats()
    Note

    This command can be used without any changes.

  4. Defragment the collection.

    You must defragment the disks of both the primary and secondary nodes in each shard component.

    • Defragment a disk of the primary node in the shard component.

      Syntax:

      db.runCommand({runCommandOnShard:"<Shard ID>","command":{compact:"<collection_name>",force:true}})

      Parameters in the preceding command:

      • <Shard ID>: the ID of the shard component.

        Note

        You can log on to the ApsaraDB for MongoDB console and view the ID of the shard component in the Shard List section on the Basic Information page.

      • <collection_name>: the name of the collection.

        Note

        You can run the show tables command to list the collections in the current database.

      • force: optional. If you specify this parameter, set it to true.

        This parameter is required if you run the command on a sharded cluster instance that runs MongoDB 4.2 or earlier.

      Example:

      db.runCommand({runCommandOnShard:"shard01","command":{compact:"sharded_collection",force:true}})
    • Defragment a disk of a secondary node in the shard component.

      The command differs between the legacy mongo shell and mongosh. Use the variant that matches your client.

      mongo shell

      Syntax:

      db.runCommand({runCommandOnShard:"<Shard ID>","command":{compact:"<collection_name>"},$queryOptions: {$readPreference: {mode: 'secondary'}}})

      Parameters in the preceding command:

      • <Shard ID>: the ID of the shard component.

        Note

        You can log on to the ApsaraDB for MongoDB console and view the ID of the shard component in the Shard List section on the Basic Information page.

      • <collection_name>: the name of the collection.

        Note

        You can run the show tables command to list the collections in the current database.

      Example:

      db.runCommand({runCommandOnShard:"shard01","command":{compact:"sharded_collection"},$queryOptions: {$readPreference: {mode: 'secondary'}}})

      mongosh

      Note

      mongosh 2.x does not support the runCommandOnShard command. Run this command in mongosh 1.x.

      Syntax:

      db.getMongo().setReadPref('secondary')
      db.runCommand({runCommandOnShard:"<Shard ID>","command":{compact:"<collection_name>"}})

      Parameters in the preceding command:

      • <Shard ID>: the ID of the shard component.

        Note

        You can log on to the ApsaraDB for MongoDB console and view the ID of the shard component in the Shard List section on the Basic Information page.

      • <collection_name>: the name of the collection.

        Note

        You can run the show tables command to list the collections in the current database.

      Example:

      db.getMongo().setReadPref('secondary')
      db.runCommand({runCommandOnShard:"d-2ze91ae9d55d6604","command":{compact:"test"}})
  5. View the disk space occupied by the database after defragmentation.

    db.stats()
    Note

    This command can be used without any changes.
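When an instance has several shard components, the runCommandOnShard documents from step 4 must be repeated for each shard ID. The following JavaScript sketch is a hypothetical helper that builds those documents in the legacy mongo shell form used in this topic: the primary variant with an optional force field, and the secondary variant with $queryOptions. runCommandOnShard is the ApsaraDB for MongoDB command described above; the helper name and the second shard ID in the usage comment are hypothetical. For mongosh 1.x, this topic instead sets the read preference with db.getMongo().setReadPref('secondary') and sends the command without $queryOptions.

```javascript
// Hypothetical helper: builds the runCommandOnShard documents shown in this
// topic for one shard, so they can be reused for every shard ID.
function shardCompactCommands(shardId, collectionName, force) {
  const primaryCompact = { compact: collectionName };
  if (force) {
    primaryCompact.force = true; // required on MongoDB 4.2 or earlier
  }
  return {
    // Runs compact on the primary node of the shard.
    primary: { runCommandOnShard: shardId, command: primaryCompact },
    // Legacy mongo shell form that routes compact to a secondary node.
    secondary: {
      runCommandOnShard: shardId,
      command: { compact: collectionName },
      $queryOptions: { $readPreference: { mode: "secondary" } },
    },
  };
}

// Usage in the legacy mongo shell, connected to mongos
// ("shard02" is a hypothetical second shard ID):
// ["shard01", "shard02"].forEach(function (id) {
//   const cmds = shardCompactCommands(id, "sharded_collection", true);
//   printjson(db.runCommand(cmds.primary));
//   printjson(db.runCommand(cmds.secondary));
// });
```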