ApsaraDB for MongoDB: Troubleshoot high IOPS utilization issues on an instance

Last Updated: Mar 28, 2026

When input/output operations per second (IOPS) utilization reaches or approaches 100% on an ApsaraDB for MongoDB instance, the instance may respond slowly or become unavailable. This topic explains how to check IOPS utilization and resolve high IOPS issues.

Background information

To prevent hosts from competing for I/O resources, most cloud database providers use techniques such as control groups (cgroups) to isolate I/O resources and limit IOPS. The upper limit of IOPS varies based on instance specifications.

Monitoring limitations

The IOPS Usage and IOPS Usage (%) metrics cannot be displayed in the ApsaraDB for MongoDB console for the following instance types:

  • Standalone instances

  • Replica set instances that run MongoDB 4.2 and use cloud disks

  • Sharded cluster instances that run MongoDB 4.2 and use cloud disks

For these instances, both metrics appear as 0 on the Monitoring Data page. The value 0 does not reflect actual IOPS usage.

Check IOPS utilization

Check the IOPS Usage and IOPS Usage (%) metrics on the Monitoring Data page in the ApsaraDB for MongoDB console. Sustained utilization at or near 100% indicates an I/O bottleneck.

Common triggers

High IOPS is typically caused by one of the following:

  • Insufficient memory: A larger cache holds more hot data, which reduces the disk I/O required and lowers the probability of an I/O bottleneck. A smaller cache holds less hot data, so the system flushes dirty pages to disk more frequently, increasing I/O pressure and the probability of a bottleneck. You can inspect cache pressure directly, as shown in the sketch after this list.

  • Parameter or configuration issues: Common misconfigurations include frequent refreshes of journal logs or runtime logs, an improperly configured Write Concern, or invalid moveChunk operations on a sharded cluster instance.
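If you suspect that the cache is too small, the WiredTiger section of the serverStatus output reports how full and how dirty the cache is. The following is a minimal sketch using pymongo; the connection URI and the thresholds in the comments are illustrative, not ApsaraDB-specific values.

from pymongo import MongoClient

# Hypothetical connection string; replace with your instance's URI.
client = MongoClient("mongodb://user:password@example-host:3717/admin")
status = client.admin.command("serverStatus")

cache = status["wiredTiger"]["cache"]
total = cache["maximum bytes configured"]
used = cache["bytes currently in the cache"]
dirty = cache["tracked dirty bytes in the cache"]

print(f"cache used:  {used / total:.1%}")
print(f"cache dirty: {dirty / total:.1%}")

# WiredTiger defaults: background eviction starts at about 80% used and
# 5% dirty; application threads are recruited for eviction near 95% used
# or 20% dirty. Sustained values near those limits suggest the cache is
# too small for the working set, which drives up disk I/O.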

Fix the immediate problem

Take the following actions to reduce IOPS immediately:

Control concurrent write and read threads

MongoDB is a multi-threaded application. High volumes of concurrent writes and complex queries can cause IOPS bottlenecks and introduce continuous replication lag on secondary nodes. To horizontally scale out write throughput, upgrade to a sharded cluster instance, which distributes data across shards.
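As a stopgap before upgrading, you can cap write concurrency on the client side. The following is a minimal sketch using pymongo; the connection URI, collection name, and worker count are illustrative.

from concurrent.futures import ThreadPoolExecutor
from pymongo import MongoClient

MAX_WRITERS = 8  # bound on concurrent write batches; tune to your instance

# Hypothetical connection string; replace with your instance's URI.
client = MongoClient("mongodb://user:password@example-host:3717/admin")
coll = client["appdb"]["events"]

def write_batch(docs):
    coll.insert_many(docs, ordered=False)

batches = [[{"batch": b, "n": i} for i in range(1000)] for b in range(100)]

# The pool keeps at most MAX_WRITERS batches in flight, smoothing the
# write load the server must absorb instead of letting it spike.
with ThreadPoolExecutor(max_workers=MAX_WRITERS) as pool:
    list(pool.map(write_batch, batches))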

Schedule batch operations during off-peak hours

Regular batch writes or bulk data persistence can spike IOPS to the instance maximum. If peak write loads exceed the current instance capacity, upgrade the instance configuration to meet peak write requirements. To smooth the write pattern and avoid simultaneous bursts, add a random time offset to each batch write operation, as shown in the sketch after the figure below.

(Figure: IOPS peak data chart)
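The following is a minimal sketch of the random-offset technique, assuming a job that persists a batch on a fixed schedule; the connection URI, collection name, and the 0-600 second jitter window are illustrative.

import random
import time

from pymongo import MongoClient

# Hypothetical connection string; replace with your instance's URI.
client = MongoClient("mongodb://user:password@example-host:3717/admin")
coll = client["appdb"]["metrics"]

def run_scheduled_batch(docs):
    # Sleep a random 0-10 minutes so that many clients scheduled at the
    # same wall-clock time do not hit the instance simultaneously.
    time.sleep(random.uniform(0, 600))
    coll.insert_many(docs, ordered=False)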

Schedule O&M operations during off-peak hours

Operations and maintenance (O&M) tasks that heavily affect I/O performance include batch writes, updates, and deletes; index builds; compact operations on collections; and batch data exports. Perform these tasks during off-peak hours to avoid competing with production workloads.
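For example, a compact operation can be issued from a maintenance script that runs during off-peak hours. The following is a minimal sketch using pymongo; the connection URI, database, and collection names are illustrative. Note that compact rewrites a collection's data files and is itself I/O-intensive.

from pymongo import MongoClient

# Hypothetical connection string; replace with your instance's URI.
client = MongoClient("mongodb://user:password@example-host:3717/admin")
db = client["appdb"]

# compact must be run against the database that owns the collection; on a
# replica set, run it on each node in turn, secondaries first.
result = db.command({"compact": "events"})
print(result)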

Implement a long-term solution

Address the root cause with these strategies:

Right-size your instance

Size your instance so that daily peak CPU and IOPS utilization both stay below 50%. Because the ratio of hot data to cache capacity is difficult to predict in advance, this headroom absorbs unexpected growth in the working set.

Optimize indexes

Full table scans and poorly chosen indexes are common sources of high IOPS. Specific causes include:

  • Queries that scan entire collections consume large volumes of I/O.

  • Oversized indexes reduce the amount of hot data the WiredTiger cache can hold.

  • Every write must also update all associated indexes, so each document write costs multiple I/O operations.

Create appropriate indexes to reduce unnecessary I/O and keep frequently accessed data in cache.
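To find queries that scan entire collections, check the winning plan with explain and add a matching index. The following is a minimal sketch using pymongo; the connection URI, collection, and field names are illustrative.

from pymongo import ASCENDING, MongoClient

# Hypothetical connection string; replace with your instance's URI.
client = MongoClient("mongodb://user:password@example-host:3717/admin")
coll = client["appdb"]["orders"]

# A COLLSCAN stage in the winning plan means the query reads the entire
# collection, pulling cold data from disk.
plan = coll.find({"customer_id": 42}).explain()
print(plan["queryPlanner"]["winningPlan"])

# A matching index turns the scan into an IXSCAN, sharply reducing I/O.
coll.create_index([("customer_id", ASCENDING)])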