When disk utilization on an ApsaraDB for MongoDB instance reaches 100%, the instance becomes unavailable and writes are blocked. Act when utilization exceeds 80%: either reduce disk usage or expand storage before the instance is taken offline.
This topic explains how to identify what is consuming disk space and how to resolve the most common causes of high disk utilization.
Check disk usage
Replica set instances
Open the ApsaraDB for MongoDB console and use one of the following methods.
Overview
Go to Basic Information and find the Specification Information section. The Disk Space and Utilization fields show current usage at a glance.
Monitoring charts
In the left-side navigation pane, click Monitoring Data. Select a node to view the Disk Usage (Bytes) and Disk Usage (%) metrics.
A replica set instance includes a primary node (read/write), one or more high-availability secondary nodes, a hidden node, and optional read-only nodes. Disk space on each node follows this formula:
ins_size = data_size + log_size

| Component | Contents |
|---|---|
| data_size | Physical data files (names start with collection), index files (names start with index), and metadata files such as WiredTiger.wt. Excludes data in the local database. |
| log_size | Physical size of the local database, MongoDB runtime logs, and some audit logs. |
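The formula above can be sketched as a quick calculation, combined with the 80% action threshold from the introduction. All sizes below are hypothetical:

```javascript
// Illustrative only: node-level formula ins_size = data_size + log_size,
// checked against the 80% action threshold. Sizes are hypothetical.
const GB = 1024 ** 3;

function diskUtilization(dataSize, logSize, provisionedSize) {
  const insSize = dataSize + logSize; // ins_size = data_size + log_size
  return insSize / provisionedSize;
}

const util = diskUtilization(65 * GB, 20 * GB, 100 * GB);
console.log(`utilization: ${(util * 100).toFixed(1)}%`); // utilization: 85.0%
if (util > 0.8) {
  console.log("above 80%: reduce disk usage or expand storage");
}
```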
Detailed analysis
For a breakdown by collection, use MongoDB commands or CloudDBA:
Run db.stats() to see database-level storage statistics. Run db.$collection_name.stats() for collection-level detail, including index size, data size, compression ratio, and average document size. See the MongoDB reference for db.stats(), db.collection.stats(), db.collection.storageSize(), db.collection.totalIndexSize(), and db.collection.totalSize().

Alternatively, go to CloudDBA > Storage Analysis. The Storage Analysis page shows disk usage by database and collection, average daily growth, predicted days until storage is exhausted, and details of anomalous collections.
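As a sketch of how you might act on those numbers offline, the snippet below ranks collections by on-disk footprint from documents shaped like db.collection.stats() output. Only the storageSize and totalIndexSize fields are used; the namespaces and byte values are made up:

```javascript
// Illustrative only: ranking collections by on-disk footprint.
// Each entry mimics two fields of db.collection.stats() output (in bytes);
// the namespaces and values are hypothetical.
const stats = [
  { ns: "app.orders",   storageSize: 9_500_000_000, totalIndexSize: 1_200_000_000 },
  { ns: "app.events",   storageSize: 4_100_000_000, totalIndexSize:   300_000_000 },
  { ns: "app.sessions", storageSize:   800_000_000, totalIndexSize:   150_000_000 },
];

function rankByDiskUse(collStats) {
  return collStats
    .map(s => ({ ns: s.ns, bytes: s.storageSize + s.totalIndexSize }))
    .sort((a, b) => b.bytes - a.bytes); // largest first
}

for (const { ns, bytes } of rankByDiskUse(stats)) {
  console.log(`${ns}: ${(bytes / 1024 ** 3).toFixed(2)} GiB`);
}
```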
Sharded cluster instances
Open the ApsaraDB for MongoDB console and use one of the following methods.
Monitoring charts
On the Monitoring Data page, select a node to view the Disk Usage (Bytes) and Disk Usage (%) metrics for that node.
Commands
Run db.stats() and db.$collection_name.stats() on each node to analyze disk usage per shard.
Common causes and resolutions
Disk fragmentation after compact operations
Running db.runCommand({compact:"collectionName"}) reclaims fragmented space, but disk usage increases temporarily while the command runs. If a collection has accumulated significant fragmentation, compact is the right tool.
Resolve
Run compact on a secondary node first, then trigger a primary/secondary switchover to minimize impact on your application:
db.runCommand({compact: "<collectionName>"})

Replace <collectionName> with the actual collection name. For large collections, run compact during off-peak hours.
For step-by-step instructions, see Defragment the disks of an instance to increase disk utilization.
Verify
After compact completes, re-run db.$collection_name.stats() and confirm that storageSize has decreased. You can also check Disk Usage (Bytes) in Monitoring Data to confirm the reduction.
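One way to quantify the verification, sketched below: record storageSize before and after compact and compute the reclaimed space. The before/after values would come from db.$collection_name.stats(); the numbers here are hypothetical:

```javascript
// Illustrative only: quantifying what compact reclaimed.
// before/after would come from db.$collection_name.stats().storageSize.
const GB = 1024 ** 3;

function reclaimedBytes(beforeStorageSize, afterStorageSize) {
  return Math.max(0, beforeStorageSize - afterStorageSize);
}

const before = 120 * GB; // storageSize before compact (hypothetical)
const after = 80 * GB;   // storageSize after compact (hypothetical)
console.log(`reclaimed: ${reclaimedBytes(before, after) / GB} GiB`); // reclaimed: 40 GiB
```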
Excessive log space usage
Journals growing without bound (MongoDB earlier than 4.0)
In MongoDB versions earlier than 4.0, if the number of open files on the host reaches the system limit, the cleaner threads on the MongoDB log server exit silently. Journal files then grow without limit.
Look for entries like the following in the instance's runtime logs:
2019-08-25T09:45:16.867+0800 I NETWORK [thread1] Listener: accept() returns -1 Too many open files in system
2019-08-25T09:45:17.000+0800 I - [ftdc] Assertion: 13538:couldn't open [/proc/55692/stat] Too many open files in system src/mongo/util/processinfo_linux.cpp 74
2019-08-25T09:45:17.002+0800 W FTDC [ftdc] Uncaught exception in 'Location13538: couldn't open [/proc/55692/stat] Too many open files in system' in full-time diagnostic data capture subsystem. Shutting down the full-time diagnostic data capture subsystem.

Resolve
Upgrade MongoDB to 4.0 or later. As a temporary measure, restart the mongod process. See the upstream bug report: WT-4083.
Verify
After upgrading or restarting, confirm that journal files are no longer growing by checking Disk Usage (Bytes) in Monitoring Data over a 10–15 minute window.
Oplog consuming growing space after replication lag or physical backup
Two scenarios cause oplog space to expand and not shrink automatically:
Replication lag: When secondary nodes fall behind, the available oplog space is no longer capped by the fixed collection size in the configuration file. It can reach up to 20% of the disk space provisioned for the instance. After the lag clears, the physical space is not automatically released.
Physical backup on a hidden node: A large number of checkpoints are generated during backup, producing substantial log data.
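The 20% ceiling mentioned above can be sketched as a quick calculation (the provisioned size is hypothetical):

```javascript
// Illustrative only: during replication lag the oplog may grow to
// up to 20% of the instance's provisioned disk space.
const GB = 1024 ** 3;

function maxOplogBytes(provisionedBytes) {
  return 0.2 * provisionedBytes; // the 20% cap described above
}

// A 500 GiB instance can see the oplog grow to roughly 100 GiB.
console.log(maxOplogBytes(500 * GB) / GB);
```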
Resolve
Run compact on the oplog collection:
All write operations are blocked during the compact operation.
db.grantRolesToUser("root", [{db: "local", role: "dbAdmin"}])
use local
db.runCommand({ compact: "oplog.rs", force: true })

Verify
After the operation, check Disk Usage (Bytes) in Monitoring Data to confirm that log space has decreased on the affected node.
Uneven disk usage across shards
Poor shard key choice: low cardinality
If most data ends up in a small number of chunks—while chunk count across shards stays balanced—the root cause is a low-cardinality shard key.
When a shard key has very few distinct values, the balancer can split and migrate chunks but cannot split a chunk whose documents all share the same key value. The result: chunk counts look balanced, but data sizes are heavily skewed.
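A quick cardinality check can be sketched against a document sample. The field names and values below are made up; on a real instance you could use db.collection.distinct() or an aggregation instead:

```javascript
// Illustrative only: estimating shard key cardinality from a document
// sample. Field names and values are hypothetical.
function keyCardinality(docs, keyField) {
  return new Set(docs.map(d => String(d[keyField]))).size;
}

const sample = [
  { batch: "201908260000", item: 1 },
  { batch: "201908260000", item: 2 },
  { batch: "201908260200", item: 3 },
  { batch: "201908260000", item: 4 },
];

// Only 2 distinct values across 4 documents: a low-cardinality key.
console.log(keyCardinality(sample, "batch"));
```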
Look for warnings in the output of sh.status():
2019-08-27T13:31:22.076+0800 W SHARDING [conn12681919] possible low cardinality key detected in superHotItemPool.haodanku_all - key is { batch: "201908260000" }
2019-08-27T13:31:22.076+0800 W SHARDING [conn12681919] possible low cardinality key detected in superHotItemPool.haodanku_all - key is { batch: "201908260200" }
2019-08-27T13:31:22.076+0800 W SHARDING [conn12681919] possible low cardinality key detected in superHotItemPool.haodanku_all - key is { batch: "201908260230" }

Resolve
Redesign the shard key using a field with high cardinality. Consider hashed sharding, which distributes data evenly by applying a hash function to shard key values. Ranged sharding distributes data by value range, which tends to concentrate writes on a single chunk. See shard key concepts, hashed sharding, and ranged sharding.
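The contrast can be sketched with a toy model: under ranged sharding, monotonically increasing keys all land on the shard owning the top range, while a hash scatters them. The hash function, boundaries, and shard count below are stand-ins, not MongoDB's actual implementation:

```javascript
// Illustrative only: ranged vs hashed placement with a toy hash.
function rangedShard(id, boundaries) {
  // boundaries are ascending upper bounds; the last shard takes the rest
  let s = 0;
  while (s < boundaries.length && id >= boundaries[s]) s++;
  return s;
}

function hashedShard(id, numShards) {
  // Knuth-style multiplicative hash as a stand-in for a hashed index
  return (id * 2654435761 % 2 ** 32) % numShards;
}

const boundaries = [1000, 2000]; // 3 shards: <1000, <2000, the rest
const newIds = [2500, 2501, 2502, 2503];

console.log(newIds.map(id => rangedShard(id, boundaries))); // all on shard 2
console.log(newIds.map(id => hashedShard(id, 3)));          // scattered
```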
By default, MongoDB splits a chunk when it grows beyond 64 MB, but a chunk whose documents all share one shard key value cannot be split; such chunks are flagged as jumbo and skipped by the balancer. If chunk counts are balanced but data sizes differ greatly across shards, a low-cardinality shard key is the likely cause.
Unsharded databases creating a jumbo shard
Data in an unsharded database is stored entirely on one shard. If that database is large, one shard ends up holding significantly more data than the others. The same situation can occur when data is imported into a sharded cluster instance that was not sharded before the import.
Resolve
Choose the appropriate action based on your situation:
| Situation | Action |
|---|---|
| Import not yet started | Shard the destination instance before importing data |
| Multiple unsharded databases with similar sizes | Run the movePrimary command to distribute each database to a different shard |
| A single large unsharded database | Shard the database, or migrate it to a dedicated replica set instance |
| Disk space is sufficient | No action required |
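For the first three actions in the table, the mongosh commands look roughly like the following. This is a sketch: the database, collection, shard key, and shard names are placeholders, and movePrimary should be run against a mongos during a maintenance window because it migrates the database's unsharded data.

```javascript
// Sketch with placeholder names; run against a mongos via mongosh.
// Shard the destination before importing data:
sh.enableSharding("mydb")
sh.shardCollection("mydb.mycoll", { userId: "hashed" })

// Or move an entire unsharded database to a different shard:
db.adminCommand({ movePrimary: "mydb", to: "shard-b" })
```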
For more on how chunks are partitioned and split, see Data partitioning with chunks and Split chunks in a sharded cluster.
Disk fragmentation from moveChunk operations
When the balancer migrates a chunk, it removes source documents after writing them to the destination. By default, that removal does not release the physical disk space—the data files and index files for the WiredTiger engine retain the space until explicitly reclaimed. This is common when sharding is added to an instance that has been running for some time.
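A rough way to estimate how much a compact might reclaim, sketched below: WiredTiger reports reusable-but-unreleased space per file in the block-manager section of db.collection.stats() output. The values here are hypothetical, and the ratio is only an estimate:

```javascript
// Illustrative only: estimating reclaimable space from WiredTiger's
// block-manager statistics, as reported under
// db.collection.stats().wiredTiger["block-manager"]
// ("file bytes available for reuse" and "file size in bytes").
function fragmentationRatio(bytesAvailableForReuse, fileSizeInBytes) {
  return bytesAvailableForReuse / fileSizeInBytes;
}

const GB = 1024 ** 3;
const ratio = fragmentationRatio(30 * GB, 100 * GB); // hypothetical values
console.log(`${(ratio * 100).toFixed(0)}% of the file is reusable space`);
// A high ratio suggests compact is worth running on this collection.
```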
Resolve
Run compact on each affected shard to reclaim fragmented space:
db.runCommand({compact: "<collectionName>"})

See Migrate ranges in a sharded cluster and Manage sharded cluster balancer for context on moveChunk behavior.
Verify
After compact completes on each shard, compare the Disk Usage (Bytes) values across shards in Monitoring Data to confirm the distribution is more even.
What's next
If disk usage continues to grow after resolving the immediate cause, expand the storage space of the instance from the ApsaraDB for MongoDB console.
Review your shard key design to prevent data skew from recurring.
Schedule compact operations periodically during off-peak hours to keep fragmentation in check.