ApsaraDB for MongoDB:Defragment a disk to improve disk usage

Last Updated:Mar 15, 2024

After data is deleted from an ApsaraDB for MongoDB instance, the storage space used by the deleted data is marked as free storage space. Newly written data may be stored directly in the free storage space, or appended to the end of files after the files are expanded. As a result, part of the free storage space remains unused. This unused free storage space constitutes disk fragments. The more disk fragments there are, the lower the disk usage.

You can use the storage analysis feature to defragment a disk for an ApsaraDB for MongoDB instance. You can also use the feature to view the storage overview, storage trends, exceptions, and data space of the instance. For more information, see Space analysis.

Prerequisites

The storage engine of the instance is WiredTiger.

Background information

When you run the db.runCommand({collStats: <collection_name>}) command on a node, the output includes the size and storageSize keywords. size indicates the logical storage size of the collection, and storageSize indicates the physical storage size of the collection. If you run the remove command to delete documents, the size value decreases, but the storageSize value does not necessarily decrease. If the storageSize value is greater than the size value, disk fragments exist.

Note
  • For ApsaraDB for MongoDB instances that run MongoDB 4.4 or later, you can also use the freeStorageSize keyword in the results to view the disk space that is free and can be reclaimed in disk fragments.

  • For more information about the size, storageSize, and freeStorageSize keywords, see collStats-Output.
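The comparison described above can be sketched in the mongo shell as follows. This is a minimal sketch, not an official tool: fragmentationSummary is a hypothetical helper name, test_database_collection is only the example collection name, and the returned values are assumed to be convertible to plain numbers.

```javascript
// Sketch: summarize fragmentation from a collStats result document.
// fragmentationSummary is a hypothetical helper, not a built-in shell method.
function fragmentationSummary(stats) {
  var storageSize = Number(stats.storageSize || 0);
  // freeStorageSize is reported only on MongoDB 4.4 or later.
  var hasFree = stats.freeStorageSize !== undefined && stats.freeStorageSize !== null;
  var free = hasFree ? Number(stats.freeStorageSize) : null;
  return {
    size: Number(stats.size || 0),
    storageSize: storageSize,
    freeStorageSize: free,
    // Share of the physical file that is free; null when it cannot be computed.
    freeRatio: hasFree && storageSize > 0 ? free / storageSize : null
  };
}

// Runs only inside the mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
  var collStatsResult = db.runCommand({ collStats: "test_database_collection" });
  printjson(fragmentationSummary(collStatsResult));
}
```

A high freeRatio suggests that a large share of the file can be reclaimed. On versions earlier than MongoDB 4.4 the ratio is null because freeStorageSize is not returned.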

The compact command can be used in ApsaraDB for MongoDB to defragment data. You can run the compact command to reclaim disk fragments that are generated after data is deleted to improve disk usage. For more information about the compact command, see compact.

Usage notes

  • Before you defragment a disk, we recommend that you first back up data. For more information about how to back up data, see Manually back up data of an ApsaraDB for MongoDB instance.

  • If you run the compact command on an instance that runs a MongoDB version earlier than MongoDB 4.4, the database that contains the collection is locked, and read and write operations on the database are blocked. We recommend that you run this command during off-peak hours. For more information, see MongoDB documentation.

    Note

    The time required to defragment a disk by running the compact command depends on factors such as the amount of data in the collection and the system load.

  • In instances that run MongoDB 3.4 or MongoDB 4.2 whose kernel version is 4.0.22 or earlier, or instances that run MongoDB 4.4 whose kernel version is 5.0.6 or earlier, a node on which the compact command is running enters the RECOVERING state. If the node remains in this state for a long time, the instance detection component identifies the node as unhealthy and triggers a rebuild. In instances that run a later version, a node on which the compact command is running remains in the SECONDARY state, and no rebuild is triggered. For more information, see MongoDB documentation. For more information about MongoDB versions, see Release notes for the minor versions of ApsaraDB for MongoDB.

  • The compact command cannot be executed in the following scenarios. For more information, see Open source code.

    • The size of a physical collection is less than 1 MB.

    • Among the first 80% of the file storage space, the free storage space is less than 20%. Among the first 90% of the file storage space, the free storage space is less than 10%.

  • After the compact command is executed, the released storage space may be smaller than the free storage space. In this case, you can run the compact command again to reclaim more space. However, we recommend that you do not run the compact command frequently.

  • The compact command can be executed even when an instance is locked because the disk is full.
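Some of the usage notes above can be checked before you run compact. The following is a rough sketch under stated assumptions: compactPrecheck and its input fields are hypothetical names, 1 MB is read as 1,048,576 bytes, and the check on free space within the first 80% or 90% of the file is not covered because collection statistics do not expose that distribution.

```javascript
// Sketch of a pre-check based on the usage notes above.
// compactPrecheck and its input fields are hypothetical names.
var ONE_MB = 1024 * 1024; // assumption: 1 MB is read as 1,048,576 bytes

// Extract the major and minor numbers from a version string such as "4.2.10".
function parseMajorMinor(version) {
  var parts = String(version).split(".");
  return { major: parseInt(parts[0], 10), minor: parseInt(parts[1] || "0", 10) };
}

function isEarlierThan(v, major, minor) {
  return v.major < major || (v.major === major && v.minor < minor);
}

function compactPrecheck(input) {
  var v = parseMajorMinor(input.version);
  return {
    // Earlier than 4.4: compact blocks reads and writes on the database.
    blocksDatabase: isEarlierThan(v, 4, 4),
    // 4.2 or earlier on the primary node: force:true is required.
    needsForce: Boolean(input.isPrimary) && isEarlierThan(v, 4, 3),
    // Collections smaller than 1 MB on disk are not compacted.
    tooSmall: Number(input.storageSize) < ONE_MB
  };
}

// Runs only inside the mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
  printjson(compactPrecheck({
    version: db.version(),
    isPrimary: db.isMaster().ismaster,
    storageSize: db.getCollection("test_database_collection").stats().storageSize
  }));
}
```

The result is advisory only. The server makes the final decision on whether compaction runs.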

Estimate disk space to be reclaimed

  1. Use the mongo shell to connect to an ApsaraDB for MongoDB instance. Connection methods vary based on the instance architecture. For more information, see the following topics:

  2. Switch to the database where the collection is stored.

    Syntax:

    use <database_name>

    <database_name>: the name of the database where the collection is stored.

    Note

    You can run the show dbs command to list the databases in the instance.

    Example:

    Switch to the test_database database.

    use test_database
  3. View the disk space to be reclaimed for the collection.

    Syntax:

    db.<collection_name>.stats().wiredTiger["block-manager"]["file bytes available for reuse"]

    <collection_name>: the name of the collection.

    Note

    You can run the show tables command to list the collections in the current database.

    Example:

    db.test_database_collection.stats().wiredTiger["block-manager"]["file bytes available for reuse"]

    The following output is returned:

    207806464

    This output indicates that the estimated disk space to be reclaimed is 207,806,464 bytes.
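To estimate the reclaimable space for every collection in the current database rather than one collection at a time, the lookup above can be wrapped in a loop. This is a sketch only: reusableBytes and toMiB are hypothetical helper names, and collections whose statistics cannot be read (for example, views) are counted as 0.

```javascript
// Hypothetical helper: read the WiredTiger reuse counter from a stats() result.
function reusableBytes(stats) {
  var wt = stats && stats.wiredTiger;
  var bm = wt && wt["block-manager"];
  var value = bm && bm["file bytes available for reuse"];
  return value === undefined || value === null ? 0 : Number(value);
}

// Hypothetical helper: format a byte count in MiB with two decimals.
function toMiB(bytes) {
  return (bytes / (1024 * 1024)).toFixed(2);
}

// Runs only inside the mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
  var totalReusable = 0;
  db.getCollectionNames().forEach(function (name) {
    var bytes = 0;
    try {
      bytes = reusableBytes(db.getCollection(name).stats());
    } catch (e) {
      // stats() may fail for some namespaces; treat them as 0.
    }
    totalReusable += bytes;
    print(name + ": " + toMiB(bytes) + " MiB available for reuse");
  });
  print("total: " + toMiB(totalReusable) + " MiB available for reuse");
}
```

For the example value above, toMiB(207806464) returns "198.18".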

Defragment a disk of a standalone or replica set instance

  • A standalone instance has only one node. You can connect to the primary node and run the compact command to defragment a disk of the primary node.

  • A replica set instance has multiple nodes. You must connect to the primary node and to each secondary node separately, and run the compact command on each node to defragment its disk.

    Note

    If a replica set instance has a read-only node, you must also defragment the disk of the read-only node in the same way that you defragment the disks of the primary and secondary nodes.

  1. Connect to a standalone or replica set instance by using the mongo shell. Connection methods vary based on the instance architecture. For more information, see the following topics:

  2. Switch to the database where the collection is stored.

    Syntax:

    use <database_name>

    <database_name>: the name of the database where the collection is stored.

    Note

    You can run the show dbs command to list the databases in the instance.

    Example:

    Switch to the replica_database database.

    use replica_database
  3. View the disk space occupied by the database before defragmentation.

    db.stats()
    Note

    This command can be used without any changes.

  4. Defragment a disk of the collection.

    Syntax:

    db.runCommand({compact:"<collection_name>",force:true})

    • <collection_name>: the name of the collection.

      Note

      You can run the show tables command to list the collections in the current database.

    • force: optional. The only valid value is true.

      This parameter is required if you run the command on the primary node of an ApsaraDB for MongoDB instance that runs MongoDB 4.2 or earlier.

    Example:

    db.runCommand({compact:"sharded_collection"})

    The following output is returned:

    { "ok" : 1 }
  5. View the disk space occupied by the database after defragmentation.

    db.stats()
    Note

    This command can be used without any changes.
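Steps 3 to 5 above can be combined into one short script that reports how much storage the compact command released. This is a sketch under stated assumptions: storageDelta is a hypothetical helper name, the collection name is only an example, and the comparison uses the storageSize field of the db.stats() output.

```javascript
// Hypothetical helper: compare two db.stats() result documents.
function storageDelta(before, after) {
  var beforeSize = Number(before.storageSize);
  var reclaimed = beforeSize - Number(after.storageSize);
  return {
    reclaimedBytes: reclaimed,
    // Percentage of the original storage size, rounded to two decimals.
    reclaimedPercent: beforeSize > 0
      ? Math.round((reclaimed / beforeSize) * 10000) / 100
      : 0
  };
}

// Runs only inside the mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
  var collectionName = "test_database_collection"; // example name; replace with your collection
  var statsBefore = db.stats();
  // On the primary node of MongoDB 4.2 or earlier, add force: true.
  var compactResult = db.runCommand({ compact: collectionName });
  if (compactResult.ok === 1) {
    printjson(storageDelta(statsBefore, db.stats()));
  } else {
    printjson(compactResult);
  }
}
```

The script measures only the node that you are connected to, so repeat it on each node of a replica set instance.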

Defragment a disk of a sharded cluster instance

For a sharded cluster instance, you need to defragment only the disks of the shard nodes. The mongos and Configserver nodes do not store user data, and they mostly handle add and update operations with few delete operations. Therefore, you do not need to defragment the disks of the mongos and Configserver nodes.

Note

The compact command is not supported on the read-only nodes of a sharded cluster instance. Therefore, the disks of read-only nodes cannot be defragmented.

  1. Use the mongo shell to connect to a sharded cluster instance. For more information, see Connect to a sharded cluster ApsaraDB for MongoDB instance by using the mongo shell.

  2. Switch to the database where the collection is stored.

    Syntax:

    use <database_name>

    <database_name>: the name of the database where the collection is stored.

    Note

    You can run the show dbs command to list the databases in the instance.

    Example:

    Switch to the sharded_database database.

    use sharded_database
  3. View the disk space occupied by the database before defragmentation.

    db.stats()
    Note

    This command can be used without any changes.

  4. Defragment a disk of the collection.

    You must defragment the disks of the primary and secondary nodes in the shard node.

    • Defragment a disk of the primary node in the shard node.

      Syntax:

      db.runCommand({runCommandOnShard:"<Shard ID>","command":{compact:"<collection_name>",force:true}})

      • <Shard ID>: the ID of the shard node.

        Note

        You can log on to the ApsaraDB for MongoDB console and view the ID of the shard node in the Shard List section on the Basic Information page.

      • <collection_name>: the name of the collection.

        Note

        You can run the show tables command to list the collections in the current database.

      • force: optional. The only valid value is true.

        This parameter is required if you run the command on a sharded cluster instance that runs MongoDB 4.2 or earlier.

      Example:

      db.runCommand({runCommandOnShard:"shard01","command":{compact:"sharded_collection",force:true}})
    • Defragment a disk of the secondary node in the shard node.

      Syntax:

      db.runCommand({runCommandOnShard:"<Shard ID>","command":{compact:"<collection_name>"},$queryOptions: {$readPreference: {mode: 'secondary'}}})

      • <Shard ID>: the ID of the shard node.

        Note

        You can log on to the ApsaraDB for MongoDB console and view the ID of the shard node in the Shard List section on the Basic Information page.

      • <collection_name>: the name of the collection.

        Note

        You can run the show tables command to list the collections in the current database.

      Example:

      db.runCommand({runCommandOnShard:"shard01","command":{compact:"sharded_collection"},$queryOptions: {$readPreference: {mode: 'secondary'}}})
  5. View the disk space occupied by the database after defragmentation.

    db.stats()
    Note

    This command can be used without any changes.
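Because the primary and secondary commands in step 4 differ only in the force field and the read preference, a small helper can build both from the same inputs. This is a sketch only: buildShardCompact is a hypothetical helper name, and shard01 and sharded_collection are the example values used above.

```javascript
// Hypothetical helper that builds the runCommandOnShard document from step 4.
function buildShardCompact(shardId, collectionName, options) {
  var opts = options || {};
  var inner = { compact: collectionName };
  if (opts.force) {
    inner.force = true;
  }
  // runCommandOnShard must stay the first key of the command document.
  var cmd = { runCommandOnShard: shardId, command: inner };
  if (opts.secondary) {
    // Route the command to the secondary node of the shard.
    cmd.$queryOptions = { $readPreference: { mode: "secondary" } };
  }
  return cmd;
}

// Runs only inside the mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
  // Primary node of the shard, then the secondary node.
  printjson(db.runCommand(buildShardCompact("shard01", "sharded_collection", { force: true })));
  printjson(db.runCommand(buildShardCompact("shard01", "sharded_collection", { secondary: true })));
}
```

To cover every shard, call the helper once per shard ID listed in the console.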