High CPU utilization on an ApsaraDB for MongoDB instance can slow queries and, if left unaddressed, cause the instance to stop serving requests. This topic explains the common causes and gives you a step-by-step path from immediate relief to permanent fix.
How CPU utilization is measured
Monitor CPU utilization from the Monitoring Data page in the ApsaraDB for MongoDB console. For details on the monitoring interval and how to navigate to the page, see Basic monitoring.
Each node type in an instance has its own CPU utilization metric:
Replica set instances — a primary node, one or more secondary nodes, a hidden node, and optional read-only nodes.
Sharded cluster instances — one or more shard components, a ConfigServer component, and one or more mongos components. The CPU behavior of a shard component is the same as that of a replica set instance. ConfigServer stores only configuration metadata and is rarely a CPU bottleneck. The CPU utilization of mongos scales with aggregation result-set size and the number of concurrent requests.
CPU utilization is always shown as a percentage of the instance's total cores. An 8-core instance running at 100% means all 8 cores are exhausted — the metric never exceeds 100%.
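As a quick illustration with hypothetical numbers, the single percentage shown in the console is the average across all cores:

```python
def overall_cpu_utilization(per_core_percentages):
    """Average per-core utilization into the single percentage the console shows."""
    return sum(per_core_percentages) / len(per_core_percentages)

# Hypothetical 8-core instance: four saturated cores, four idle cores.
print(overall_cpu_utilization([100, 100, 100, 100, 0, 0, 0, 0]))  # 50.0
```

So a reading of 100% means every core is saturated, not that a single hot core is pinned.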
Common causes
Too many documents scanned
ApsaraDB for MongoDB uses multi-threading. When a single query scans a large number of documents, its thread holds CPU resources for a longer time. At high concurrency, this accumulates and drives overall CPU utilization up. The total CPU load on an instance correlates directly with the total number of documents scanned across all queries.
Two patterns cause excessive document scans:
Full collection scans (COLLSCAN)
A COLLSCAN in a slow query log or in the system.profile collection means the query read every document in a collection. The system.profile collection is only created when you enable database profiling. See Explain Results and Cursor Methods for how to interpret query execution plans.
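To detect a COLLSCAN programmatically, you can walk the winningPlan tree returned by explain(). The sketch below uses simplified plan documents; real explain output contains many more fields, and plans with $or predicates use an inputStages array rather than a single inputStage:

```python
def has_collscan(plan):
    """Recursively check an explain() winningPlan for a COLLSCAN stage.
    Simplified: follows only the single inputStage chain."""
    if plan.get("stage") == "COLLSCAN":
        return True
    inner = plan.get("inputStage")
    return has_collscan(inner) if inner else False

# Illustrative plan shapes, not captured from a real server.
indexed = {"stage": "FETCH", "inputStage": {"stage": "IXSCAN", "indexName": "age_1"}}
unindexed = {"stage": "COLLSCAN", "direction": "forward"}
print(has_collscan(indexed), has_collscan(unindexed))  # False True
```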
Inefficient index use (high `docsExamined`)
When docsExamined exceeds 1,000 in a frequently executed query, the query warrants attention. Common causes:
Multiple filter conditions without a compound index, or without satisfying the index prefix rule.
Complex queries or heavy aggregation pipelines that prevent effective index use.
A field with skewed data distribution (low selectivity) used as a query filter.
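The index prefix rule mentioned above says a compound index can only serve equality filters that cover a leading prefix of its keys. A minimal sketch (index and field names are illustrative):

```python
def usable_prefix(index_keys, filter_fields):
    """Return the leading index keys that a query's equality filters can use.
    Once a key is missing from the filters, later keys cannot be used."""
    prefix = []
    fields = set(filter_fields)
    for key in index_keys:
        if key not in fields:
            break
        prefix.append(key)
    return prefix

idx = ["userId", "status", "createdAt"]
print(usable_prefix(idx, {"userId", "status"}))    # ['userId', 'status']
print(usable_prefix(idx, {"status", "createdAt"}))  # [] -- prefix rule not satisfied
```

A query filtering only on status and createdAt gets no help from this index and falls back to scanning many documents.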
High concurrency
When the number of concurrent requests is genuinely high and no query issues are present, additional CPU capacity is needed. See Add CPU capacity.
Other causes
Short-lived connection storms
In MongoDB 3.x and later, the default authentication mechanism is SCRAM-SHA-1, which requires a CPU-intensive hash computation for each connection. When many short-lived connections are established simultaneously, the hash overhead multiplies and can exhaust all CPU resources. In this scenario, operational logs contain many saslStart error messages. ApsaraDB for MongoDB reduces this overhead at the kernel layer by optimizing its built-in random functions.
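The expensive step in SCRAM-SHA-1 is a PBKDF2 key derivation, which is deliberately slow. A rough sketch of the per-connection cost (the iteration count and salt here are illustrative, not your server's actual settings):

```python
import hashlib
import os
import time

def scram_salted_password(password: bytes, salt: bytes, iterations: int) -> bytes:
    # PBKDF2-HMAC-SHA-1 is the core computation of a SCRAM-SHA-1 handshake.
    return hashlib.pbkdf2_hmac("sha1", password, salt, iterations)

salt = os.urandom(16)
start = time.perf_counter()
for _ in range(100):  # simulate 100 short-lived connections authenticating
    scram_salted_password(b"secret", salt, 10_000)
print(f"100 handshakes: {time.perf_counter() - start:.3f}s of pure hashing")
```

Multiply that by thousands of connections per second and the authentication work alone can saturate the CPU.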
TTL index oplog replay on secondary nodes
When you use time-to-live (TTL) indexes to expire data, MongoDB converts each background delete into its own oplog entry. Secondary nodes replay these oplog entries using multi-threaded replication (controlled by the replWriterThreadCount parameter, which defaults to 16 in MongoDB 3.2 and later). Secondary nodes do not handle business-critical write workloads. However, oplog replay is less efficient than the original write, so a secondary node's CPU utilization can run higher than the primary node's. In this case, we recommend that you ignore the high CPU utilization of the node.
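Conceptually, a TTL pass deletes every document whose indexed timestamp is older than expireAfterSeconds, and each of those deletes becomes a separate oplog entry for secondaries to replay. A minimal sketch of that selection logic (field names are illustrative):

```python
from datetime import datetime, timedelta, timezone

def ttl_expired(docs, field, expire_after_seconds, now=None):
    """Return the _ids a TTL pass would delete: documents whose timestamp
    field is older than expireAfterSeconds. Each delete is replicated as
    its own oplog entry that secondaries must replay."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(seconds=expire_after_seconds)
    return [d["_id"] for d in docs if d[field] < cutoff]

now = datetime.now(timezone.utc)
docs = [
    {"_id": 1, "createdAt": now - timedelta(hours=2)},
    {"_id": 2, "createdAt": now - timedelta(minutes=5)},
]
print(ttl_expired(docs, "createdAt", 3600, now))  # [1]
```

With a one-hour TTL, only document 1 has expired, so one delete (and one oplog entry) results.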
Troubleshoot high CPU utilization
Use the following table to choose the right tool for your situation.
| Goal | Tool |
|---|---|
| Stop ongoing slow queries immediately | CloudDBA > Sessions in the console, or db.killOp() |
| Identify slow queries after the fact | Logs > Slow Query Logs |
| Audit all requests (including non-slow ones) | Data Security > Audit Logs |
Stop active sessions
If CPU utilization is near 100%, terminate the sessions causing the spike before investigating the root cause.
Option 1 — ApsaraDB for MongoDB console (recommended)
In the ApsaraDB for MongoDB console, click the instance ID.
In the left-side navigation pane, choose CloudDBA > Sessions.
Review the active sessions, identify operations that have run longer than expected, and terminate them.
The Sessions page lets you view and kill operations in the same interface, without needing shell access. This is the fastest option during a CPU spike.
Option 2 — MongoDB shell
Run db.currentOp() to list active operations, then run db.killOp() to terminate a specific operation. For syntax details, see db.currentOp() and db.killOp().
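If you script this step, the idea is to filter the inprog array from db.currentOp() down to long-running user operations and pass each opid to db.killOp(). The sketch below operates on a simplified, hand-written inprog list rather than real server output:

```python
def long_running_opids(inprog, threshold_secs):
    """Select opids of active operations exceeding the threshold.
    Skips the local database; a real script should also exclude other
    internal/system operations before killing anything."""
    return [
        op["opid"]
        for op in inprog
        if op.get("active")
        and op.get("secs_running", 0) > threshold_secs
        and not op.get("ns", "").startswith("local.")
    ]

# Illustrative currentOp()-style entries.
inprog = [
    {"opid": 11, "active": True, "secs_running": 120, "ns": "app.orders"},
    {"opid": 12, "active": True, "secs_running": 2, "ns": "app.users"},
    {"opid": 13, "active": True, "secs_running": 300, "ns": "local.oplog.rs"},
]
print(long_running_opids(inprog, 60))  # [11]
```

Killing operations indiscriminately is risky; always inspect the candidates before terminating them.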
Analyze logs to find the root cause
After stopping the immediate spike, identify the queries that triggered it.
Audit logs
In the ApsaraDB for MongoDB console, choose Data Security > Audit Logs to enable audit logging. For setup instructions, see Enable the audit log feature.
Slow query logs
Slow query logs are retained for seven days. If your instance was purchased after June 6, 2021, enable the audit log feature and select the admin and slow operation types before you can view slow query logs. Only logs generated after you enable the feature are available.
In the console, choose Parameters > Parameter List and configure the following parameters. Queries that exceed the threshold are recorded in the system.profile collection. For parameter details, see Database Profiler.

| Parameter | Description |
|---|---|
| operationProfiling.mode | Sets the profiling level: disabled (no data collected), all requests (all operations recorded in system.profile), or slow queries only (only queries exceeding the threshold recorded in system.profile). |
| operationProfiling.slowOpThresholdMs | Sets the threshold in milliseconds above which a query is classified as slow. |

Choose Logs > Slow Query Logs to view the logs.
In the logs, look for:
COLLSCAN — indicates a full collection scan.
docsExamined values that are much higher than expected — indicates low index efficiency.
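Both signals can be flagged mechanically from profiler output. The sketch below works on simplified system.profile-style entries (the planSummary and docsExamined fields exist in real profiler documents, but these sample values are made up):

```python
def flag_suspect_queries(profile_entries, docs_examined_limit=1000):
    """Flag profile entries that did a full collection scan or examined
    far more documents than the configured limit."""
    suspects = []
    for e in profile_entries:
        if e.get("planSummary") == "COLLSCAN":
            suspects.append((e["ns"], "COLLSCAN"))
        elif e.get("docsExamined", 0) > docs_examined_limit:
            suspects.append((e["ns"], f"docsExamined={e['docsExamined']}"))
    return suspects

entries = [
    {"ns": "app.orders", "planSummary": "COLLSCAN", "docsExamined": 500000},
    {"ns": "app.users", "planSummary": "IXSCAN { email: 1 }", "docsExamined": 2},
]
print(flag_suspect_queries(entries))  # [('app.orders', 'COLLSCAN')]
```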
Fix high CPU utilization
Optimize indexes
Index optimization is the most effective way to reduce documents scanned per query. Start with the queries that show COLLSCAN or high docsExamined values.
Resources for index optimization:
Compound Indexes — for queries with multiple filter conditions
cursor.hint() — to force a specific index
Create Queries that Ensure Selectivity — for fields with skewed data distribution
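For the skewed-distribution case, a quick way to gauge whether a field is worth indexing on its own is its selectivity, the ratio of distinct values to documents. A rough sketch over sample documents (field name and data are illustrative):

```python
def selectivity(docs, field):
    """Distinct-value ratio: near 1.0 means the field is highly selective;
    near 0 means a filter on it discards few documents, so an index on it
    alone reduces docsExamined very little."""
    values = [d.get(field) for d in docs]
    return len(set(values)) / len(values)

# 98% of documents share one status value: a poor standalone index key.
docs = [{"status": "done"}] * 98 + [{"status": "open"}, {"status": "failed"}]
print(round(selectivity(docs, "status"), 2))  # 0.03
```

Low-selectivity fields are better placed after more selective keys in a compound index rather than indexed alone.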
If indexing alone does not help, limit the data volume in the affected collections or reduce how frequently the full collection scan runs.
Add CPU capacity
If queries are already well-optimized and CPU is high because of genuine traffic volume, scale up the instance. Common approaches:
Scale up the instance for more read and write headroom.
Enable read/write splitting or add read-only nodes to a replica set instance to distribute the read load.
Upgrade to a sharded cluster instance for linear scale-out across multiple shards.
Add mongos nodes if CPU is exhausted on the mongos tier, and configure load balancing across them. For details, see the Balancer section in the sharded cluster introduction.
For steps to change instance configurations, see the configuration change documentation for your instance type.
Use persistent connections
Short-lived connections trigger SCRAM-SHA-1 authentication on every connect, which is CPU-intensive at scale. Use a connection pool with persistent connections to eliminate per-connection authentication overhead.
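The pooling idea can be sketched generically: pay the expensive connect-and-authenticate cost once per pool slot, then reuse those connections across requests. Here connect is a stand-in for your driver's connect call, with a counter in place of the real SCRAM handshake:

```python
import queue

class ConnectionPool:
    """Minimal pool sketch: authenticate once per slot, reuse thereafter.
    connect_fn is a placeholder for a driver's connect-and-authenticate call."""
    def __init__(self, connect_fn, size=4):
        self._pool = queue.Queue()
        for _ in range(size):
            self._pool.put(connect_fn())  # expensive handshake, paid once per slot

    def acquire(self):
        return self._pool.get()

    def release(self, conn):
        self._pool.put(conn)

handshakes = 0
def connect():
    global handshakes
    handshakes += 1  # stands in for the CPU-intensive SCRAM handshake
    return object()

pool = ConnectionPool(connect, size=2)
for _ in range(100):  # 100 requests, but only 2 handshakes
    conn = pool.acquire()
    pool.release(conn)
print(handshakes)  # 2
```

In practice most MongoDB drivers already pool connections internally, so the usual fix is simply to create one long-lived client object and share it, rather than constructing a new client per request.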