Data operations, such as inserts, updates, and deletions, generate disk fragments over time. This topic describes how to use the compact command to reclaim disk fragments and improve disk utilization. You can use this command to reclaim fragments from primary and secondary nodes.
Use the storage analysis feature to reclaim disk fragments. This method is easier to perform in the console and has a smaller impact on your services. Storage analysis reclaims fragments only from hidden nodes. To reclaim fragments from primary and secondary nodes, you must first perform a primary/secondary failover.
Only the following MongoDB versions support using the storage analysis feature to reclaim disk fragments:
MongoDB 8.0: all minor versions.
MongoDB 7.0: all minor versions.
MongoDB 6.0: all minor versions.
MongoDB 5.0: all minor versions.
MongoDB 4.4: minor version 5.0.7 or later.
MongoDB 4.2: minor version 4.0.23 or later.
Prerequisites
The instance must use the WiredTiger storage engine.
Usage notes
Data backup: Before you reclaim disk fragments, back up your database.
Impacts of the compact command:
The compact command (https://www.mongodb.com/docs/manual/reference/command/compact/#dbcmd.compact) compacts data and reclaims space from disk fragmentation.
Read/write blocking and performance impact
In versions earlier than MongoDB 4.4, running the compact command locks the database that contains the collection and blocks all read and write operations on that database. If there is excessive fragmentation, the compact command can take a long time to run. This creates a risk of high replication delay on hidden nodes. We recommend that you perform this operation during off-peak hours. If necessary, you can increase the oplog size based on your write workload or upgrade the major version of the database to MongoDB 4.4 or later.
In MongoDB 4.4 and later, the compact command does not block read and write operations. However, it can affect performance during execution. We recommend that you perform this operation during off-peak hours.
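The version rule above can be sketched as a small check. This is a minimal illustration, not part of the product: compactBlocksDatabase is a hypothetical helper name, and it assumes the version is passed as a "major.minor" string such as "4.2". It runs in Node.js or mongosh.

```javascript
// Hypothetical helper: returns true when, per the rule above, the compact
// command locks the database (versions earlier than MongoDB 4.4).
function compactBlocksDatabase(version) {
  // Assumption: version looks like "4.2" or "4.4.6"; only major.minor is used.
  const [major, minor] = version.split(".").map(Number);
  return major < 4 || (major === 4 && minor < 4);
}

for (const v of ["3.4", "4.0", "4.2", "4.4", "5.0"]) {
  console.log(v, compactBlocksDatabase(v) ? "blocks reads and writes" : "non-blocking");
}
```

A result of true suggests scheduling the operation for off-peak hours and checking the oplog size first.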
Node rebuilding
For instances running MongoDB 3.4, MongoDB 4.0, minor versions of MongoDB 4.2 up to 4.0.22, or minor versions of MongoDB 4.4 up to 5.0.6, a node that is running the compact command enters the RECOVERING state. If the node remains in this state for too long, the instance health check component considers the node unhealthy and triggers a rebuild operation. For more information about MongoDB versions, see MongoDB minor version release notes.
For MongoDB instances of later versions, a node that is running the compact command remains in the SECONDARY state and does not trigger a rebuild operation.
Invalid compact command:
The compact command might fail in the following scenarios. For more information, see block_compact.
The physical size of the collection is less than 1 MB.
The fragmentation rate is less than 20%.
Less than 20% of the first 80% of the file's storage space is idle. Less than 10% of the first 90% of the file's storage space is idle.
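As a rough pre-check, the first two conditions can be evaluated from collStats output before you run the command. The sketch below is an assumption-laden illustration: compactLikelySkipped is a hypothetical helper, the fragmentation rate is approximated as freeStorageSize / storageSize, and the checks on the first 80% and 90% of the file are not modeled because collStats does not expose them.

```javascript
const ONE_MB = 1024 * 1024;

// Hypothetical helper: returns the reasons why compact may have no effect.
// stats is expected to carry storageSize and freeStorageSize in bytes.
function compactLikelySkipped(stats) {
  const reasons = [];
  if (stats.storageSize < ONE_MB) {
    reasons.push("physical size is less than 1 MB");
  }
  // Assumption: fragmentation rate = freeStorageSize / storageSize.
  const rate = stats.storageSize > 0 ? stats.freeStorageSize / stats.storageSize : 0;
  if (rate < 0.2) {
    reasons.push("fragmentation rate is less than 20%");
  }
  return reasons;
}

// Made-up sample values for illustration only.
console.log(compactLikelySkipped({ storageSize: 100 * ONE_MB, freeStorageSize: 10 * ONE_MB }));
```

An empty array means neither of the two modeled conditions applies; it does not guarantee that space will be released.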
Reclamation time: The time required to reclaim disk fragments by running the compact command depends on factors such as the amount of data in the collection and the system load.
Other notes:
When you run the compact command, the released storage space might be smaller than the idle storage space. If this occurs, ensure that the previous compact command is complete before you start the next one. Avoid running the command repeatedly.
The compact command can be run when an instance is locked because the disk is full.
Background information
Why are disk fragments generated?
Formation: When you delete data from an ApsaraDB for MongoDB instance, the storage space used by the deleted data is marked as idle. New data might be stored directly in this idle space. Alternatively, the storage file is extended, and the new data is stored at the end of the file. In these cases, some idle storage space is not reused. This unused idle space creates disk fragments.
Impact: A large number of disk fragments lowers the effective disk utilization. For example, if the disk size is 100 GB, fragments occupy 20 GB, and business data occupies 60 GB, the disk utilization of the database instance is 80%. However, the effective disk utilization is only 60%.
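The arithmetic in this example can be written out as follows. The function name diskUtilization and its parameter names are illustrative only.

```javascript
// Disk utilization counts data plus fragments; effective utilization counts data only.
function diskUtilization(diskGB, dataGB, fragmentGB) {
  return {
    utilization: (dataGB + fragmentGB) / diskGB,
    effectiveUtilization: dataGB / diskGB,
  };
}

// Values from the example above: 100 GB disk, 60 GB data, 20 GB fragments.
const usage = diskUtilization(100, 60, 20);
console.log(usage.utilization, usage.effectiveUtilization);
```

The gap between the two ratios is the share of the disk held by fragments.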
When to reclaim disk fragments
Deleting a large amount of data at once
After you delete a large amount of data, the disk space that the documents occupied is not immediately returned to the operating system. Instead, it is reserved for future writes. This can lead to a large amount of fragmented space on the disk that is not used effectively.
Important: Neither manual data deletion nor the automatic expiration (TTL) mechanism triggers disk fragment reclamation automatically. You must reclaim disk fragments manually.
Running a high write workload for a long time
If your instance runs a high write workload for a long time, such as frequent insert, update, and delete operations, the amount of fragmented space on the disk increases over time. This leads to the accumulation of many disk fragments.
Disk space is low and fragmentation exceeds 20%
When the disk space of your database instance is low, for example, when disk utilization reaches 85% to 90% or higher, reclaiming disk fragments can free up space occupied by fragments. This reduces disk utilization and eases pressure on disk space.
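The thresholds in this scenario can be combined into a simple decision sketch. Everything here is an assumption made for illustration: assessReclaim is a hypothetical helper, the fragmentation ratio is taken as reusable bytes divided by used bytes, and the projected utilization is a best case because the released space can be smaller than the idle space.

```javascript
// Hypothetical helper: decide whether reclaiming is worthwhile and estimate
// the best-case disk utilization afterwards. All inputs use the same unit.
function assessReclaim(diskSize, usedSize, reusableSize) {
  const utilization = usedSize / diskSize;
  // Assumption: fragmentation ratio = reusable space / used space.
  const fragmentation = usedSize > 0 ? reusableSize / usedSize : 0;
  return {
    worthReclaiming: utilization >= 0.85 && fragmentation > 0.2,
    bestCaseUtilization: (usedSize - reusableSize) / diskSize,
  };
}

// Made-up sample: 100 GB disk, 90 GB used, 25 GB reusable.
console.log(assessReclaim(100, 90, 25));
```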
View disk storage space
View the storage status of a specific collection
You can run the db.runCommand({collStats: <collection_name>}) command to view the storage status of a specific collection. The following list describes some of the keywords:
size: The logical storage size of the collection.
storageSize: The physical storage size of the collection.
freeStorageSize: The size of the idle storage space in the disk fragments that can be reclaimed. This keyword is supported on ApsaraDB for MongoDB 4.4 and later.
After you run the remove command to delete documents, the value of size decreases, but the value of storageSize does not necessarily decrease. You can observe the ratio of freeStorageSize to storageSize. A higher ratio indicates a higher disk fragmentation rate.
For more information about the size, storageSize, and freeStorageSize keywords, see collStats-Output.
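The ratio described above can be computed with a short helper. This is a sketch under assumptions: fragmentationRatio is a hypothetical name, the input is a collStats result document, and null is returned when freeStorageSize is not reported (for example, on versions earlier than 4.4).

```javascript
// Hypothetical helper: ratio of freeStorageSize to storageSize, or null
// when the fields needed for the calculation are missing.
function fragmentationRatio(collStats) {
  const free = Number(collStats.freeStorageSize);
  const total = Number(collStats.storageSize);
  if (!Number.isFinite(free) || !(total > 0)) {
    return null;
  }
  return free / total;
}

// Made-up sample document; in the shell you could instead pass
// db.runCommand({ collStats: "<collection_name>" }).
console.log(fragmentationRatio({ size: 300, storageSize: 1000, freeStorageSize: 250 }));
```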
Estimate the disk fragment space to be reclaimed
Connect to the ApsaraDB for MongoDB instance using the mongo shell. For a replica set instance, connect to a secondary node to minimize the impact on your services. The connection methods for different instance types are as follows:
Switch to the database that contains the collection.
Syntax:
use <database_name>
Parameters:
<database_name> is the name of the database that contains the collection.
Note: You can run the show dbs command to query existing databases.
Example:
Switch to the test_database database.
use test_database
View the disk fragment space to be reclaimed for the collection.
Syntax:
db.<collection_name>.stats().wiredTiger["block-manager"]["file bytes available for reuse"]
Parameters:
<collection_name> is the name of the collection.
Note: You can run the show tables command to query existing collections.
Example:
db.test_database_collection.stats().wiredTiger["block-manager"]["file bytes available for reuse"]
The result is as follows:
207806464
This output indicates that the estimated reclaimable disk fragment space is 207,806,464 bytes.
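Raw byte counts are hard to read, so the following sketch extracts the same field from a stats document and converts it to MiB. The helper names reusableBytes and toMiB are illustrative, and the sample object only mimics the part of the stats() output used here.

```javascript
// Reads the reusable-bytes counter from a collection stats document.
function reusableBytes(stats) {
  return stats.wiredTiger["block-manager"]["file bytes available for reuse"];
}

// Converts bytes to mebibytes (1 MiB = 1024 * 1024 bytes).
function toMiB(bytes) {
  return bytes / (1024 * 1024);
}

// Sample shaped like the relevant part of db.<collection_name>.stats().
const sampleStats = {
  wiredTiger: { "block-manager": { "file bytes available for reuse": 207806464 } },
};
console.log(toMiB(reusableBytes(sampleStats)).toFixed(2) + " MiB");
```

For the value shown in the example, this is roughly 198 MiB.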
Reclaim fragmented disk space
Standalone or replica set instances
A standalone instance has only one node. To reclaim disk fragments, connect to this node and run the compact command.
A replica set instance has multiple nodes. You must perform the reclamation operation on both the primary and secondary nodes.
Important: To minimize the impact on your services, first reclaim fragments from the secondary nodes. Then, perform a primary/secondary failover to switch the primary node to a secondary node and reclaim fragments from the new secondary node. For more information about how to perform a primary/secondary failover, see Configure a primary/secondary failover for a replica set instance.
If the replica set instance has read-only nodes, you must also reclaim disk fragments from the read-only nodes. The command to run is the same as the one for reclaiming fragments from primary and secondary nodes.
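The recommended order for a replica set can be sketched as a sort over the member list. This is only an illustration: compactOrder is a hypothetical helper, and it assumes member documents with name and stateStr fields as returned by rs.status().members, which may not be accessible with your account on a managed instance.

```javascript
// Hypothetical helper: list member names with non-primary nodes first and the
// current primary last. The primary should be compacted only after a
// primary/secondary failover has turned it into a secondary node.
function compactOrder(members) {
  const rank = (member) => (member.stateStr === "PRIMARY" ? 1 : 0);
  return members
    .slice() // copy so the input array is not reordered
    .sort((a, b) => rank(a) - rank(b))
    .map((member) => member.name);
}

// Made-up member list for illustration.
const members = [
  { name: "node-a:3717", stateStr: "PRIMARY" },
  { name: "node-b:3717", stateStr: "SECONDARY" },
  { name: "node-c:3717", stateStr: "SECONDARY" },
];
console.log(compactOrder(members));
```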
Connect to a standalone or replica set instance using the mongo shell. The connection methods for different instance types are as follows:
Switch to the database that contains the collection.
Syntax:
use <database_name>
Parameters:
<database_name> is the name of the database that contains the collection.
Note: You can run the show dbs command to query existing databases.
Example:
Switch to the replica_database database.
use replica_database
View the disk space occupied by the database before you reclaim disk fragments.
db.stats()
Note: You can copy and run this command directly without modification.
Reclaim disk fragments from the collection.
Syntax:
db.runCommand({compact:"<collection_name>",force:true})
Parameters:
<collection_name>: The name of the collection.
Note: You can run the show tables command to query existing collections.
force: Optional. The value is fixed to true. This parameter is required if you run this command on the primary node of an ApsaraDB for MongoDB instance that is version 4.2 or earlier.
Example:
db.runCommand({compact:"sharded_collection"})
A successful operation returns the following result:
{ "ok" : 1 }
View the disk space occupied by the database after you reclaim disk fragments.
db.stats()
Note: You can copy and run this command directly without modification.
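To quantify the result, you can compare the two db.stats() outputs. The sketch below assumes that you saved both result documents and that each contains the storageSize field; reclaimedBytes is a hypothetical helper name and the sample numbers are made up.

```javascript
// Hypothetical helper: bytes released between two db.stats() snapshots.
function reclaimedBytes(before, after) {
  return Number(before.storageSize) - Number(after.storageSize);
}

// In the shell you could capture the snapshots as:
//   const before = db.stats();  ...run compact...  const after = db.stats();
const beforeStats = { storageSize: 524288000 };
const afterStats = { storageSize: 335544320 };
console.log(reclaimedBytes(beforeStats, afterStats) + " bytes released");
```

A value close to zero may indicate that one of the conditions listed in the usage notes applied and the command had no effect.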
Sharded cluster instances
For a sharded cluster instance, you need to reclaim disk fragments only from the corresponding nodes in the shard components. The Mongos and Configserver components do not store user data. They also have more insert and update operations and fewer delete operations. Therefore, you do not need to reclaim their disk fragments.
The compact command is not supported on the read-only nodes of a sharded cluster instance. Therefore, you cannot reclaim disk fragments from read-only nodes.
Connect to a sharded cluster instance using the mongo shell. For more information, see Connect to a MongoDB sharded cluster instance using the mongo shell.
Switch to the database that contains the collection.
Syntax:
use <database_name>
Parameters:
<database_name> is the name of the database that contains the collection.
Note: You can run the show dbs command to query existing databases.
Example:
Switch to the sharded_database database.
use sharded_database
View the disk space occupied by the database before you reclaim disk fragments.
db.stats()
Note: You can copy and run this command directly without modification.
Reclaim disk fragments from the collection.
You must perform the reclamation operation on both the primary and secondary nodes in the shard component.
Important: To minimize the impact on your services, first reclaim fragments from the secondary nodes. Then, perform a primary/secondary failover to switch the primary node to a secondary node and reclaim fragments from the new secondary node. For more information about how to perform a primary/secondary failover, see Configure a primary/secondary failover for a sharded cluster instance.
Reclaim disk fragments from the secondary nodes in the shard component.
The execution of this operation differs between the mongo shell and mongosh. Choose the method that corresponds to your client.
Note: Compared with mongosh 1.x, mongosh 2.x adds support for setting the read preference parameter. For more information, see Read Preference.
mongo shell
Syntax:
db.runCommand({runCommandOnShard:"<Shard ID>","command":{compact:"<collection_name>"},$queryOptions: {$readPreference: {mode: 'secondary'}}})
Parameters:
<Shard ID>: The ID of the shard component.
Note: Log on to the MongoDB console. On the Basic Information page for the target instance, find the shard ID in the Shard List section.
<collection_name>: The name of the collection.
Note: You can run the show tables command to query existing collections.
Example:
db.runCommand({runCommandOnShard:"shard01","command":{compact:"sharded_collection"},$queryOptions: {$readPreference: {mode: 'secondary'}}})
mongosh 1.x
Syntax:
db.getMongo().setReadPref('secondary')
db.runCommand({runCommandOnShard:"<Shard ID>","command":{compact:"<collection_name>"}})
Parameters:
<Shard ID>: The ID of the shard component.
Note: Log on to the MongoDB console. On the Basic Information page for the target instance, find the shard ID in the Shard List section.
<collection_name>: The name of the collection.
Note: You can run the show tables command to query existing collections.
Example:
db.getMongo().setReadPref('secondary')
db.runCommand({runCommandOnShard:"d-2ze91ae9d55d6604","command":{compact:"test"}})
mongosh 2.x
Syntax:
db.runCommand({runCommandOnShard:"<Shard ID>","command":{compact:"<collection_name>"}},{readPreference: "secondary"})
Parameters:
<Shard ID>: The ID of the shard component.
Note: Log on to the MongoDB console. On the Basic Information page for the target instance, find the shard ID in the Shard List section.
<collection_name>: The name of the collection.
Note: You can run the show tables command to query existing collections.
Example:
db.runCommand({runCommandOnShard:"d-2ze657bce53fb6d4","command":{compact:"test_collection"}}, { readPreference: "secondary" })
Reclaim disk fragments from the primary node in the shard component.
Syntax:
db.runCommand({runCommandOnShard:"<Shard ID>","command":{compact:"<collection_name>",force:true}})
Parameters:
<Shard ID>: The ID of the shard component.
Note: Log on to the MongoDB console. On the Basic Information page for the target instance, find the shard ID in the Shard List section.
<collection_name>: The name of the collection.
Note: You can run the show tables command to query existing collections.
force: Optional. The value is fixed to true. This parameter is required if your sharded cluster instance is version 4.2 or earlier.
Example:
db.runCommand({runCommandOnShard:"shard01","command":{compact:"sharded_collection",force:true}})
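When several shards or collections need the same treatment, the command document can be assembled by a small builder instead of being typed by hand. This is a sketch only: buildShardCompact is a hypothetical helper, and it simply reproduces the runCommandOnShard document shape used in the syntax above.

```javascript
// Hypothetical helper: build the command document for one shard and collection.
// Set force to true for the primary node on version 4.2 or earlier.
function buildShardCompact(shardId, collectionName, force = false) {
  const command = { compact: collectionName };
  if (force) {
    command.force = true;
  }
  return { runCommandOnShard: shardId, command };
}

// In the shell, the result can be passed to db.runCommand(...).
console.log(JSON.stringify(buildShardCompact("shard01", "sharded_collection", true)));
```

Run the generated commands one at a time and wait for each to finish, in line with the note about not running compact repeatedly.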
View the disk space occupied by the database after you reclaim disk fragments.
db.stats()
Note: You can copy and run this command directly without modification.
FAQ
Q: The command fails with the error message "Compaction interrupted on table:xxx due to cache eviction pressure' on server xxx".
A: When you run the compact command on an instance with low specifications that runs an earlier version, the command may fail due to cache pressure. We recommend that you perform this operation during off-peak hours.