
ApsaraDB for MongoDB: Troubleshoot high IOPS usage (%) on an ApsaraDB for MongoDB instance

Last Updated: Jun 29, 2023

The input/output operations per second (IOPS) usage on an ApsaraDB for MongoDB instance is a key metric. If the IOPS usage (%) of an instance reaches or approaches 100%, the instance may respond slowly or even become unavailable. This topic describes how to view IOPS usage (%) and troubleshoot high IOPS usage (%) on an ApsaraDB for MongoDB instance.

Background information

Typically, to prevent instances on the same host from competing for I/O resources, most cloud database providers use techniques such as control groups (cgroups) to isolate I/O resources and limit IOPS. The upper limit of IOPS varies based on instance specifications.

Usage notes

The IOPS Usage and IOPS Usage (%) metrics of the following instances cannot be displayed in the ApsaraDB for MongoDB console: standalone instances, replica set instances that run MongoDB 4.2 and use cloud disks, and sharded cluster instances that run MongoDB 4.2 and use cloud disks.

For these instances, the IOPS Usage and IOPS Usage (%) metrics are displayed as 0 on the Monitoring Data page in the console. These values do not indicate the actual IOPS usage.

View IOPS usage (%)

View IOPS usage (%) in monitoring charts

  • Log on to the ApsaraDB for MongoDB console. In the Specification Information section on the Basic Information page of an instance, check the maximum IOPS of the instance. For more information about the upper limit of IOPS for different instance specifications, see Instance types.

  • Log on to the ApsaraDB for MongoDB console. On the Monitoring Data page, view the IOPS Usage and IOPS Usage (%) metrics to see how close the instance is to its maximum IOPS. In most cases, the data files and log files of an ApsaraDB for MongoDB instance are stored on the same disk, so the IOPS usage of data files is the same as that of log files. A hedged command-line cross-check follows this list.
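
The console metrics above are the authoritative source for IOPS usage (%). As a rough cross-check from within the database, you can sample the WiredTiger block manager counters in serverStatus. The following is a minimal sketch that assumes pymongo and a placeholder connection string; these counters reflect storage-engine reads and writes rather than the exact disk IOPS shown in the console.

    # A minimal sketch, assuming pymongo; the connection string is a placeholder.
    from pymongo import MongoClient

    client = MongoClient("mongodb://user:password@example.mongodb.rds.aliyuncs.com:3717/admin")
    status = client.admin.command("serverStatus")

    # The block manager counters grow monotonically; to estimate a rate,
    # sample them twice and take the difference.
    bm = status["wiredTiger"]["block-manager"]
    print("blocks read:   ", bm["blocks read"])
    print("blocks written:", bm["blocks written"])
    print("bytes read:    ", bm["bytes read"])
    print("bytes written: ", bm["bytes written"])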

Common causes of I/O issues

I/O issues are commonly caused by the following factors:

  • The memory is insufficient. I/O issues are closely related to the size of the in-memory cache. A larger cache holds more hot data, so the system requires fewer disk I/O operations and is less likely to encounter I/O bottlenecks. A smaller cache holds less hot data, so the system must frequently flush dirty pages to disk, which increases the probability of I/O bottlenecks. A hedged sketch for checking cache and dirty-page statistics follows this list.

  • Parameters and configurations related to disk I/O are improperly set. For example, journal logs and runtime logs are flushed too frequently, the WriteConcern parameter is improperly configured, or improper moveChunk operations are performed on a sharded cluster instance. A hedged write concern example also follows this list.

    For more information about journal files, see Journaling.

    For more information about WriteConcern, see Write Concern.
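
As noted for the first cause above, you can check how full and how dirty the WiredTiger cache is from serverStatus. The following is a minimal sketch that assumes pymongo; the connection string is a placeholder, and the printed ratios are only a rough indicator.

    # A minimal sketch, assuming pymongo; the connection string is a placeholder.
    from pymongo import MongoClient

    client = MongoClient("mongodb://user:password@example.mongodb.rds.aliyuncs.com:3717/admin")
    cache = client.admin.command("serverStatus")["wiredTiger"]["cache"]

    configured = cache["maximum bytes configured"]
    in_cache = cache["bytes currently in the cache"]
    dirty = cache["tracked dirty bytes in the cache"]

    # A persistently high dirty ratio suggests the cache is too small for the
    # write workload, so dirty pages are flushed to disk frequently.
    print(f"cache fill ratio: {in_cache / configured:.1%}")
    print(f"dirty ratio:      {dirty / configured:.1%}")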
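
For the WriteConcern example above, the following sketch shows how a write concern can be set per collection in pymongo. The database and collection names are placeholders, and the chosen values are illustrative rather than recommendations for every workload.

    # A minimal sketch, assuming pymongo; "mydb" and "orders" are placeholders.
    from pymongo import MongoClient
    from pymongo.write_concern import WriteConcern

    client = MongoClient("mongodb://user:password@example.mongodb.rds.aliyuncs.com:3717/admin")
    db = client["mydb"]

    # j=True acknowledges a write only after it is flushed to the journal, which
    # is safer but adds journal I/O; w="majority" waits for a majority of the
    # replica set members. Tune both to your durability requirements instead of
    # disabling journaling outright.
    orders = db.get_collection("orders", write_concern=WriteConcern(w="majority", j=True))
    orders.insert_one({"sku": "A-100", "qty": 2})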

Optimization strategies for I/O issues

We recommend that you select appropriate specifications for your ApsaraDB for MongoDB instances and optimize the indexing and write policies of your applications.

  • Configure appropriate specifications for ApsaraDB for MongoDB instances

    The ratio of hot data size to cache size is difficult to predict. In most cases, we recommend that you choose specifications that keep the daily peak CPU utilization and IOPS usage (%) both below 50%.

  • Optimize indexes

    Queries that cause full table scans or use inappropriate indexes result in high IOPS usage (%). For example, exporting full table data consumes large amounts of I/O resources. Too many indexes also inflate the data volume, which reduces the amount of hot data that fits in the WiredTiger cache, and each business data write requires additional I/O operations to update the indexes, which degrades I/O performance. In this case, we recommend that you create appropriate indexes. For more information, see Indexes. A hedged example of checking a query plan and creating an index is provided at the end of this topic.

  • Optimize business architecture and O&M

    Take the following optimization measures to prevent disk I/O from becoming a bottleneck in your business architecture:

    • Control the number of concurrent write/read threads

      MongoDB is a multi-threaded application. Highly concurrent writes and complex queries may cause IOPS bottlenecks and even lead to continuous replication latency on secondary nodes. If I/O bottlenecks are caused by large volumes of written data, we recommend that you upgrade your instance to an ApsaraDB for MongoDB sharded cluster instance. This way, data is horizontally split across shards to linearly scale out the write performance of the instance. A hedged sharding sketch is provided at the end of this topic.

    • Write data during off-peak hours

      Scheduled writes or batch data persistence may push IOPS to its peak. If the current instance configurations cannot absorb such write peaks, we recommend that you upgrade the configurations so that business data can be written smoothly. You can also stagger the load, for example by adding a random delay to each batch write operation, as sketched at the end of this topic.

    • Perform O&M operations during off-peak hours

      O&M operations that heavily affect performance may push IOPS to its peak. Such operations include batch writes, updates, or deletions, index creation, compact operations on collections, and batch data exports. We recommend that you perform these operations during off-peak hours.
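
The following sketch shows one way to check whether a query uses an index and to create one if it does not, as referenced in the "Optimize indexes" item above. It assumes pymongo, and names such as "mydb", "orders", "user_id", and "created_at" are placeholders.

    # A minimal sketch, assuming pymongo; database, collection, and field names
    # are placeholders for illustration only.
    from pymongo import ASCENDING, DESCENDING, MongoClient

    client = MongoClient("mongodb://user:password@example.mongodb.rds.aliyuncs.com:3717/admin")
    orders = client["mydb"]["orders"]

    # Explain the query: a COLLSCAN stage in the winning plan indicates a full
    # table scan, which drives up IOPS on large collections.
    plan = orders.find({"user_id": 42}).explain()
    print(plan["queryPlanner"]["winningPlan"])

    # Create a compound index that matches the query and sort pattern so that
    # the query can use an IXSCAN instead of a COLLSCAN.
    orders.create_index([("user_id", ASCENDING), ("created_at", DESCENDING)])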
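
For the "Control the number of concurrent write/read threads" item above, the following sketch shows how a collection might be sharded so that writes are spread across shards. It assumes a sharded cluster instance and pymongo; "mydb.events" and the hashed "user_id" shard key are illustrative assumptions, and the shard key should be chosen based on your own query patterns.

    # A minimal sketch, assuming a sharded cluster (mongos) endpoint and pymongo;
    # the namespace and shard key below are illustrative assumptions.
    from pymongo import MongoClient

    client = MongoClient("mongodb://user:password@example-mongos.mongodb.rds.aliyuncs.com:3717/admin")

    client.admin.command("enableSharding", "mydb")
    client.admin.command("shardCollection", "mydb.events", key={"user_id": "hashed"})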
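
For the "Write data during off-peak hours" item above, the following sketch illustrates staggering batch writes with a random delay so that batches from many workers do not hit the disk at the same moment. It assumes pymongo, and the batch size and jitter range are illustrative assumptions rather than recommended values.

    # A minimal sketch, assuming pymongo; batch size and jitter range are
    # illustrative assumptions.
    import random
    import time

    from pymongo import MongoClient

    client = MongoClient("mongodb://user:password@example.mongodb.rds.aliyuncs.com:3717/admin")
    events = client["mydb"]["events"]

    def write_in_batches(docs, batch_size=1000):
        for i in range(0, len(docs), batch_size):
            # Random jitter spreads the batches out over time instead of issuing
            # one large synchronized burst of writes.
            time.sleep(random.uniform(0.0, 2.0))
            events.insert_many(docs[i:i + batch_size], ordered=False)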