
ApsaraDB for MongoDB: Parameter tuning recommendations

Last Updated: Mar 28, 2026

ApsaraDB for MongoDB lets you modify instance parameters from the console. Misconfigured parameters can cause performance degradation or application errors. This topic covers key parameters, their default values, the symptoms that indicate a problem, and concrete tuning guidance.

Note

This topic covers kernel parameters only. Client-side driver parameters such as socketTimeout are not included.

Important

Some parameters require an instance restart to take effect. A restart causes a brief connection interruption. Before modifying any parameter, check the Restart required field and schedule the change during off-peak hours if needed.

Parameter quick reference

The following table summarizes all parameters covered in this topic. "Applicable scope" indicates whether the parameter applies to mongod nodes (replica set members and shard nodes) or Mongos routers.

| Parameter | Default | Restart required | Applicable scope | Recommendation |
| --- | --- | --- | --- | --- |
| operationProfiling.mode | off | Yes | mongod | Keep default; enable only for targeted debugging |
| operationProfiling.slowOpThresholdMs | 100 ms | No | mongod, mongos | Adjust to slightly above average latency of core queries |
| replication.oplogGlobalIdEnabled | false | Yes | mongod | Enable only for two-way sync with DTS or mongoShake |
| replication.oplogSizeMB | 10% of disk | No | mongod | Keep default; increase for high-update-rate workloads |
| setParameter.cursorTimeoutMillis | 600000 ms | No | mongod, mongos | Do not increase; consider decreasing to 300000 |
| setParameter.flowControlTargetLagSeconds | 10 | No | mongod | Increase if throttling is confirmed; investigate root cause if it persists |
| setParameter.oplogFetcherUsesExhaust | true | Yes | mongod | Do not change |
| setParameter.maxTransactionLockRequestTimeoutMillis | 5 ms | No | mongod | Increase if lock timeout errors are frequent |
| setParameter.replWriterThreadCount | 16 | Yes | mongod | Do not adjust; contact support for guidance |
| setParameter.tcmallocAggressiveMemoryDecommit | 0 | No | mongod | Enable only for confirmed OOM or fragmentation; monitor closely |
| setParameter.transactionLifetimeLimitSeconds | 60 | No | mongod | Decrease (e.g., to 30); never increase |
| storage.oplogMinRetentionHours | 0 | No | mongod | Keep default for stable workloads; use a float > 1.0 for variable workloads |
| storage.wiredTiger.collectionConfig.blockCompressor | snappy | Yes | mongod | Change based on workload; use zstd for cold data |
| setParameter.minSnapshotHistoryWindowInSeconds / maxTargetSnapshotHistoryWindowInSeconds | 300 | No | mongod | Set to 0 if atClusterTime reads are not used |
| rsconf.chainingAllowed | true | No | mongod | See guidance based on cluster size |
| setParameter.internalQueryMaxPushBytes / internalQueryMaxAddToSetBytes | 104857600 (100 MB) | No | mongod | Increase only if hitting the limit for a specific query |
| setParameter.migrateCloneInsertionBatchSize | 0 | No | mongod (shard) | Adjust if chunk migration causes latency spikes |
| setParameter.rangeDeleterBatchDelayMS | 20 ms | No | mongod (shard) | Increase (e.g., to 200) if CPU spikes during balancing |
| setParameter.rangeDeleterBatchSize | 0 (auto ~128) | No | mongod (shard) | Adjust if CPU spikes during balancing |
| setParameter.receiveChunkWaitForRangeDeleterTimeoutMS | 10000 ms | No | mongod (shard) | Increase if timeout errors appear during balancing |
| setParameter.ShardingTaskExecutorPoolMaxConnecting | 2 | Yes (≤4.0) / No (≥4.2) | mongos | Do not adjust |
| setParameter.ShardingTaskExecutorPoolMaxSize | 2^64 - 1 | Yes (≤4.0) / No (≥4.2) | mongos | No adjustment needed |
| setParameter.ShardingTaskExecutorPoolMinSize | 1 | Yes (≤4.0) / No (≥4.2) | mongos | Set to a value in [10, 50] |

Replica sets

operationProfiling.mode

| Attribute | Value |
| --- | --- |
| Applicable versions | 3.0 and later |
| Restart required | Yes |
| Default | off |

Controls the query profiler level.

Symptoms

  • Setting this to all or slowOp under high query load degrades instance performance.

  • A system.profile collection appears in a database, indicating the profiler was left enabled.

  • Some users mistakenly assume this parameter must be set to slowOp to generate slow query logs.

Recommendation

Keep the default value (off). Enabling the query profiler adds overhead, and slow query logs typically provide equivalent diagnostic information. Enable the profiler only for targeted debugging sessions, and disable it immediately after analysis.

For details on profiler overhead, see the MongoDB Database Profiler documentation.

operationProfiling.slowOpThresholdMs

| Attribute | Value |
| --- | --- |
| Applicable versions | 3.0 and later |
| Restart required | No |
| Default | 100 (ms) |

Defines the threshold above which a query is classified as slow.

Symptoms

  • Value too small: Excessive slow query and audit log entries create noise that makes real problems harder to identify.

  • Value too large: Genuinely slow queries go unrecorded, obscuring performance issues.

Recommendation

Set this slightly above the average latency of your most critical queries.

| Business profile | Typical query time | Suggested value |
| --- | --- | --- |
| Latency-sensitive | ~30 ms | 50 ms |
| Analytical workload | 300–400 ms | 500 ms |
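As a sketch of the sizing rule above, the following Python helper (hypothetical, not part of any MongoDB tooling) derives a threshold slightly above the average latency of sampled core queries; the 1.5× headroom factor and 50 ms floor are illustrative assumptions:

```python
def suggest_slow_op_threshold_ms(latencies_ms, headroom=1.5, floor_ms=50):
    """Suggest a slowOpThresholdMs value slightly above the average
    latency of the sampled core queries (hypothetical helper)."""
    avg = sum(latencies_ms) / len(latencies_ms)
    return max(floor_ms, round(avg * headroom))

# Latency-sensitive workload averaging ~30 ms
print(suggest_slow_op_threshold_ms([28, 30, 32]))  # 50
```

Sampling should cover the queries your business actually depends on, not the full traffic mix.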

replication.oplogGlobalIdEnabled

| Attribute | Value |
| --- | --- |
| Applicable versions | 4.0 and later |
| Restart required | Yes |
| Default | false |

A self-developed parameter that adds global IDs (GIDs) to oplog entries. GIDs enable two-way synchronization with DTS (Data Transmission Service) or mongoShake by breaking circular synchronization loops.

Recommendation

Enable only when two-way synchronization is required. Because a restart is needed, schedule the change during off-peak hours.

replication.oplogSizeMB

| Attribute | Value |
| --- | --- |
| Applicable versions | 3.0 and later |
| Restart required | No |
| Default | 10% of instance disk space (e.g., 50 GB for a 500 GB disk) |

Sets the maximum logical size of the oplog collection, which stores replication change records.

Symptoms

If the oplog is too small:

  • Secondary nodes fall behind and enter the RECOVERING state.

  • Log backups miss oplog records, creating gaps that prevent point-in-time restore.

Recommendation

Keep the default. Never decrease it. Increase it if your workload has a low data volume but a high update rate — such workloads generate oplog entries quickly and can exhaust a small oplog window. As a rule of thumb, size the oplog to cover at least one hour of writes.

This parameter is not changed via the MongoDB configuration file. The Alibaba Cloud control plane resizes the oplog using the replsetResizeOplog command.
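The one-hour rule of thumb can be sanity-checked with a small calculation. This sketch assumes you have measured the oplog generation rate, for example from the oplog window metric in monitoring:

```python
def oplog_window_hours(oplog_size_mb, oplog_write_rate_mb_per_hour):
    """Estimated replication window: how many hours of changes the
    oplog can hold at the observed oplog generation rate."""
    return oplog_size_mb / oplog_write_rate_mb_per_hour

# 500 GB disk -> 50 GB default oplog; 10 GB of oplog generated per hour
print(oplog_window_hours(50 * 1024, 10 * 1024))  # 5.0 hours
```

A result below 1.0 at peak write rate indicates the oplog should be enlarged.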

setParameter.cursorTimeoutMillis

| Attribute | Value |
| --- | --- |
| Applicable versions | 3.0 and later |
| Restart required | No |
| Default | 600000 (10 minutes) |

The idle timeout for server-side cursors, in milliseconds. MongoDB automatically closes cursors that exceed this threshold.

Symptom

Accessing a cursor after it has been closed returns:

Message: "cursor id xxxxxxx not found"
ErrorCode: CursorNotFound(43)

Recommendation

Do not increase this value. To reduce the resource overhead of idle cursors, lower it — for example, to 300000. Regardless of this setting, avoid holding cursors idle on the application side.

setParameter.flowControlTargetLagSeconds

| Attribute | Value |
| --- | --- |
| Applicable versions | 4.2 and later |
| Restart required | No |
| Default | 10 |

The replication-lag threshold, in seconds, at which MongoDB activates flow control. Flow control throttles writes on the primary to prevent secondary nodes from falling too far behind.

Symptom

Slow query logs show that durationMillis is nearly equal to flowControl.timeAcquiringMicros, indicating that the request was throttled — not slow due to query execution:

{
  "t": { "$date": "2024-04-25T13:28:45.840+08:00" },
  "s": "I",
  "c": "WRITE",
  "id": 51803,
  "ctx": "conn199253",
  "msg": "Slow query",
  "attr": {
    "type": "update",
    "ns": "xxx.xxxxx",
    "command": "...",
    "planSummary": "IDHACK",
    "flowControl": {
      "acquireCount": 1,
      "acquireWaitCount": 1,
      "timeAcquiringMicros": 959000
    },
    "durationMillis": 959
  }
}

Recommendation

Use the following decision path:

| Observation | Action |
| --- | --- |
| durationMillis ≈ flowControl.timeAcquiringMicros in slow query logs | Confirm flow control is the cause; increase flowControlTargetLagSeconds to reduce sensitivity |
| Throttling continues after increasing the value | The instance has a deeper primary-secondary synchronization bottleneck; investigate replication lag |
| Replication lag root cause confirmed | Options include upgrading the instance, reducing write throughput, or setting write concern to {w: majority} |
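The throttling check can be automated against structured log lines. This is an illustrative sketch, not an official tool; the field names match the sample log entry above (note that timeAcquiringMicros is in microseconds while durationMillis is in milliseconds), and the 0.9 ratio is an arbitrary cutoff:

```python
import json

def throttled_by_flow_control(log_entry, ratio=0.9):
    """Return True when flow-control wait time accounts for nearly all
    of the reported duration of the slow operation."""
    attr = log_entry["attr"]
    wait_ms = attr.get("flowControl", {}).get("timeAcquiringMicros", 0) / 1000
    return wait_ms >= attr["durationMillis"] * ratio

entry = json.loads('''{"attr": {"type": "update",
  "flowControl": {"acquireCount": 1, "acquireWaitCount": 1,
                  "timeAcquiringMicros": 959000},
  "durationMillis": 959}}''')
print(throttled_by_flow_control(entry))  # True
```

Entries with no flowControl section, or with a small wait, are genuinely slow queries and need query-level analysis instead.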

setParameter.oplogFetcherUsesExhaust

| Attribute | Value |
| --- | --- |
| Applicable versions | 4.4 and later |
| Restart required | Yes |
| Default | true |

Controls whether stream replication is used for primary-secondary oplog transfer. When disabled, secondary nodes revert to the pull model, where each batch requires a separate network round trip.

Symptom

In some environments, stream replication produces extra CPU or network bandwidth overhead.

Recommendation

Do not change this parameter. Stream replication reduces replication lag in high-load and high-latency environments, reduces the risk of data loss when a primary with {w: 1} write concern goes down unexpectedly, and lowers write latency for {w: majority} and {w: >1} write concerns.

setParameter.maxTransactionLockRequestTimeoutMillis

| Attribute | Value |
| --- | --- |
| Applicable versions | 4.0 and later |
| Restart required | No |
| Default | 5 (ms) |

The time a transaction waits to acquire a lock before automatically aborting, in milliseconds.

Symptom

The client or server logs contain:

Message: "Unable to acquire lock '{8442595743001781021: Database, 1525066715360699165}' within a max lock request timeout of '5ms' milliseconds."
ErrorCode: LockTimeout(24)

Drivers that support TransientTransactionError retry automatically, so the error may only appear in server logs.

Recommendation

If lock timeout errors are frequent, increase this parameter to reduce aborts from transient lock contention. If errors persist after the increase, address the root cause in application logic:

  • Avoid concurrent modifications to the same document within a transaction.

  • Audit the transaction for operations that hold locks for long periods, such as Data Definition Language (DDL) operations or unoptimized queries without index coverage.

setParameter.replWriterThreadCount

| Attribute | Value |
| --- | --- |
| Applicable versions | 3.2 and later |
| Restart required | Yes |
| Default | 16 |

The maximum number of threads used for parallel oplog application on secondary nodes. The effective ceiling is twice the number of CPU cores of the instance type.

Symptom

In extreme cases, secondary nodes accumulate replication lag continuously because oplog application cannot keep up with write volume on the primary.

Recommendation

Do not adjust this parameter in normal operations. If replication lag persists despite tuning other parameters, contact Alibaba Cloud support for guidance specific to your workload.

setParameter.tcmallocAggressiveMemoryDecommit

| Attribute | Value |
| --- | --- |
| Applicable versions | 4.2 and later |
| Restart required | No |
| Default | 0 (disabled) |

Controls whether TCMalloc uses aggressive memory decommit. When enabled, MongoDB actively merges contiguous free memory blocks and returns them to the operating system.

Symptoms

  • An out-of-memory (OOM) error occurs on a mongod node because memory cannot be reclaimed fast enough to keep up with query load.

  • Heap fragmentation causes memory usage to rise slowly past 80% and continue climbing steadily.

Recommendation

Do not adjust this parameter in normal operations. If OOM errors or heap fragmentation are confirmed, enable this parameter during off-peak hours.

Important

Enabling aggressive memory decommit may reduce throughput depending on your workload. After enabling, monitor performance and roll back promptly if business impact is observed.

setParameter.transactionLifetimeLimitSeconds

| Attribute | Value |
| --- | --- |
| Applicable versions | 4.0 and later |
| Restart required | No |
| Default | 60 |

The maximum duration for an open transaction, in seconds. Transactions that exceed this limit are marked as expired and aborted by a background cleanup thread.

Symptom

The client receives:

Message: "Aborting transaction with txnNumber xxx on session with lsid xxxxxxxxxx because it has been running for longer than 'transactionLifetimeLimitSeconds'"

Recommendation

Decrease this value (for example, to 30) rather than increase it. Long-running uncommitted transactions hold WiredTiger cache resources — an overloaded cache causes request latency spikes, database stalls, and full CPU utilization.

To address transaction timeouts without increasing the limit:

  • Break large transactions into smaller units that complete within the configured time.

  • Optimize queries inside the transaction to use indexes, reducing execution time.

For best practices on transactions, see Transactions and Read/Write Concern.
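Breaking a large transaction into smaller units can be as simple as batching the write operations; each batch below would then run in its own short transaction. This is an illustrative sketch, and the right batch size depends on your workload:

```python
def chunk_operations(ops, batch_size):
    """Split a long list of writes into batches small enough that each
    batch commits well within transactionLifetimeLimitSeconds."""
    return [ops[i:i + batch_size] for i in range(0, len(ops), batch_size)]

print(chunk_operations(list(range(10)), 4))
# [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Note that splitting a transaction trades atomicity across the whole set for shorter lock and cache hold times, so confirm your business logic tolerates partial completion between batches.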

storage.oplogMinRetentionHours

| Attribute | Value |
| --- | --- |
| Applicable versions | 4.4 and later |
| Restart required | No |
| Default | 0 (disabled; oplog size governed entirely by replication.oplogSizeMB) |

The minimum number of hours the oplog collection is retained, regardless of the replication.oplogSizeMB limit.

Symptoms

  • Setting this too high causes the oplog collection to consume disk space beyond the size cap.

  • Forgetting about a non-zero value here can make disk usage appear to fluctuate unexpectedly.

Recommendation

For stable write workloads, keep the default (0). For workloads with highly variable write volumes, set this to a floating-point number greater than 1.0. Before setting a value, estimate peak disk usage to avoid triggering a disk-full lock.
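A rough peak-usage estimate, assuming you know your burst oplog generation rate, looks like this (illustrative sketch; real on-disk size also depends on compression):

```python
def peak_oplog_disk_mb(oplog_size_mb, min_retention_hours,
                       peak_write_rate_mb_per_hour):
    """Worst-case oplog footprint: the retention floor can push the
    collection past oplogSizeMB during write bursts."""
    retained_mb = min_retention_hours * peak_write_rate_mb_per_hour
    return max(oplog_size_mb, retained_mb)

# 50 GB cap, 2.5 h retention floor, 30 GB/h burst -> 75 GB on disk
print(peak_oplog_disk_mb(50 * 1024, 2.5, 30 * 1024) / 1024)  # 75.0
```

If the estimate approaches free disk space, lower the retention value or expand storage first.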

storage.wiredTiger.collectionConfig.blockCompressor

| Attribute | Value |
| --- | --- |
| Applicable versions | 3.0 and later |
| Restart required | Yes |
| Default | snappy |
| Supported algorithms | none, snappy, zlib, zstd (zstd requires MongoDB 4.2 and later) |

Sets the compression algorithm for new collections. Existing collections are not affected by this change.

Recommendation

Change based on your workload characteristics. Higher compression ratios come with higher CPU cost for compression and decompression — measure the trade-off in your own environment. For instances used primarily to store cold data, zstd offers a significantly higher compression ratio.

To use different compression algorithms for individual collections, use the createCollection command with explicit storage engine options. See the MongoDB documentation.
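For reference, the per-collection override is expressed as a WiredTiger configString in the createCollection options. The sketch below only shows the shape of that options document (how you pass it depends on your driver); block_compressor=zstd is the documented WiredTiger setting:

```python
# Options document for creating a single collection with zstd
# compression, overriding the instance-wide blockCompressor default.
create_options = {
    "storageEngine": {
        "wiredTiger": {"configString": "block_compressor=zstd"}
    }
}
print(create_options["storageEngine"]["wiredTiger"]["configString"])
```

Remember that, as noted above, this only affects the new collection; existing collections keep the algorithm they were created with.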

setParameter.minSnapshotHistoryWindowInSeconds / setParameter.maxTargetSnapshotHistoryWindowInSeconds

| Attribute | Value |
| --- | --- |
| Applicable versions | 4.4 and later |
| Restart required | No |
| Default | 300 (5 minutes) |

The duration, in seconds, for which WiredTiger retains snapshot history. Setting this to 0 disables the snapshot history window. This parameter primarily supports reads at a specific cluster time using atClusterTime.

Symptom

This parameter adds pressure to the WiredTiger cache (WT cache), particularly when the same documents are updated frequently.

Recommendation

No adjustment is needed in most cases.

  • If your workload does not use atClusterTime reads, set this to 0 to reduce WT cache pressure.

  • If you need to read snapshot data older than 5 minutes, increase this value — but account for the additional memory and CPU overhead.

If the snapshot window is smaller than the age of the snapshot you request, MongoDB returns a SnapshotTooOld error.

rsconf.chainingAllowed

| Attribute | Value |
| --- | --- |
| Applicable versions | 4.0 and later |
| Restart required | No |
| Default | true |

Controls whether secondary nodes in the replica set can sync from another secondary (chained replication) rather than always syncing directly from the primary.

Symptoms

  • Disabling chained replication increases the primary node's CPU utilization and network traffic.

  • Enabling chained replication makes it easier for secondary nodes to accumulate replication lag.

Recommendation

| Cluster size | Guidance |
| --- | --- |
| 4 or fewer nodes | Enable or disable based on your network topology and latency requirements |
| 5 or more nodes with {w: majority} | Disabling chained replication improves write performance but significantly increases primary node load; evaluate the trade-off for your workload |

setParameter.internalQueryMaxPushBytes / setParameter.internalQueryMaxAddToSetBytes

| Attribute | Value |
| --- | --- |
| Applicable versions | 4.2 and later |
| Restart required | No |
| Default | 104857600 (100 MB) |

The maximum memory that the $push and $addToSet accumulator operators can use per query.

Symptom

A query using $push or $addToSet fails with:

"errMsg": "$push used too much memory and cannot spill to disk. Memory limit: 104857600...

Recommendation

No adjustment is needed in most cases. If you consistently hit this limit for a specific query, increase the value. Setting this to a very large value risks out-of-memory (OOM) errors on the mongod node.
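Before raising the limit, it can help to estimate whether the largest group actually exceeds 100 MB. This back-of-the-envelope sketch ignores BSON per-element overhead, so treat the result as a lower bound:

```python
def push_fits_in_memory(avg_doc_bytes, docs_per_group,
                        limit_bytes=104857600):
    """Rough check: will a $push accumulator for the largest group stay
    under internalQueryMaxPushBytes? (estimate, not exact BSON math)."""
    return avg_doc_bytes * docs_per_group < limit_bytes

# 2 KB documents, 100,000 documents in the largest group -> ~200 MB
print(push_fits_in_memory(2048, 100_000))  # False
```

If the estimate is far over the limit, restructuring the aggregation (for example, pushing only the fields you need) is usually safer than a large increase.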

Sharded clusters (Shard)

setParameter.migrateCloneInsertionBatchSize

| Attribute | Value |
| --- | --- |
| Applicable versions | 4.0 and later |
| Restart required | No |
| Default | 0 (bounded by the 16 MB BSON document size limit) |

The maximum number of documents per batch during the clone phase of a chunk migration.

Symptom

Chunk migrations during balancing cause latency spikes on the affected shard.

Recommendation

No adjustment is needed in most cases. If chunk migration consistently causes performance fluctuations during balancing, set this to a fixed batch size to control migration throughput.

setParameter.rangeDeleterBatchDelayMS

| Attribute | Value |
| --- | --- |
| Applicable versions | 4.0 and later |
| Restart required | No |
| Default | 20 (ms) |

The pause between consecutive batch deletions during the cleanup phase of a chunk migration. Also applies to the cleanupOrphaned command.

Symptoms

  • Asynchronous post-migration document deletion causes a CPU spike on the shard.

  • Setting this value too high delays orphaned document cleanup, and may result in a timeout:

    Message: "OperationFailed: Data transfer error: ExceededTimeLimit: Failed to delete orphaned <db>.<collection> range [xxxxxx,xxxxx] :: caused by :: operation exceeded time limit"

Recommendation

No adjustment is needed in most cases. If CPU spikes during balancing are traced to orphaned document deletion, increase this value — for example, to 200 — to throttle the deletion rate.

This parameter works together with setParameter.rangeDeleterBatchSize. Adjust them separately or in combination to control the overall deletion throughput.
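The combined effect of the two parameters can be approximated as a cap on the deletion rate. This sketch ignores the execution time of each batch, so real throughput is lower than the bound it computes:

```python
def range_deleter_docs_per_sec(batch_size, batch_delay_ms):
    """Upper bound on orphaned-document deletion rate, ignoring the
    time each batch itself takes to execute."""
    return batch_size / (batch_delay_ms / 1000)

# Defaults: ~128 docs per batch, 20 ms delay -> up to 6400 docs/s
print(range_deleter_docs_per_sec(128, 20))   # 6400.0
# Raising the delay to 200 ms throttles it to 640 docs/s
print(range_deleter_docs_per_sec(128, 200))  # 640.0
```

Pick values so that cleanup still finishes between migrations; throttling too hard triggers the timeout described above.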

setParameter.rangeDeleterBatchSize

| Attribute | Value |
| --- | --- |
| Applicable versions | 4.0 and later |
| Restart required | No |
| Default | 0 (auto-selected, typically 128 documents per batch) |

The maximum number of documents per batch for asynchronous orphaned document deletion after chunk migration.

Symptom

Asynchronous post-migration deletion causes CPU utilization spikes on the shard.

Recommendation

No adjustment is needed in most cases. If CPU spikes during balancing are traced to orphaned document deletion, set this to a fixed batch size. Use this parameter together with setParameter.rangeDeleterBatchDelayMS to fine-tune deletion throughput.

setParameter.receiveChunkWaitForRangeDeleterTimeoutMS

| Attribute | Value |
| --- | --- |
| Applicable versions | 4.4 and later |
| Restart required | No |
| Default | 10000 (10 seconds) |

The time a moveChunk operation waits for the range deleter to finish clearing orphaned documents before a migration starts, in milliseconds.

Symptom

The balancer logs a timeout error:

ExceededTimeLimit: Failed to delete orphaned <db.collection> range [{ <shard_key>: MinKey }, { <shard_key>: -9186000910690368367 }) :: caused by :: operation exceeded time limit

Recommendation

No adjustment is needed in most cases. If this timeout error appears consistently, increase this value to give the range deleter more time to complete before the next migration begins.

setParameter.minSnapshotHistoryWindowInSeconds / setParameter.maxTargetSnapshotHistoryWindowInSeconds

Same behavior and recommendation as the replica set section above. Applies to each shard's mongod nodes.

rsconf.chainingAllowed

Same behavior and recommendation as the replica set section above. Applies to each shard's replica set.

setParameter.internalQueryMaxPushBytes / setParameter.internalQueryMaxAddToSetBytes

Same behavior and recommendation as the replica set section above. Applies to shard nodes.

Sharded clusters (Mongos)

operationProfiling.slowOpThresholdMs

| Attribute | Value |
| --- | --- |
| Applicable versions | 3.0 and later |
| Restart required | No |
| Default | 100 (ms) |

Same behavior and recommendation as the replica set section above. Applies to Mongos nodes.

setParameter.ShardingTaskExecutorPoolMaxConnecting

| Attribute | Value |
| --- | --- |
| Applicable versions | 3.6 and later |
| Restart required | Yes (3.6 and 4.0) / No (4.2 and later) |
| Default | 2 |

The maximum number of concurrent connection handshakes in the TaskExecutor connection pool on a Mongos node. This controls the rate at which Mongos establishes new connections to mongod nodes.

Symptom

When many connections are created simultaneously, the Mongos node experiences a CPU spike.

Recommendation

Do not adjust this parameter.

setParameter.ShardingTaskExecutorPoolMaxSize

| Attribute | Value |
| --- | --- |
| Applicable versions | 3.6 and later |
| Restart required | Yes (3.6 and 4.0) / No (4.2 and later) |
| Default | 2^64 - 1 (maximum 64-bit integer) |

The maximum number of connections per TaskExecutor connection pool on a Mongos node.

Recommendation

No adjustment is needed. If you need to cap the number of Mongos-to-shard connections, set a lower value — but avoid setting it too low. An exhausted connection pool causes requests on Mongos to queue and stall.

setParameter.ShardingTaskExecutorPoolMinSize

| Attribute | Value |
| --- | --- |
| Applicable versions | 3.6 and later |
| Restart required | Yes (3.6 and 4.0) / No (4.2 and later) |
| Default | 1 |

The minimum number of connections maintained per TaskExecutor connection pool on a Mongos node.

Symptom

A sudden burst of requests forces the connection pool to create many new connections at once, causing a CPU spike and request latency increase on the Mongos node.

Recommendation

Set this to a value in the range [10, 50]. The right value depends on your shard topology — the number of shards and the number of nodes per shard. Keep in mind that Mongos consumes a small amount of memory to maintain idle connections to each shard.
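To reason about the cost side, you can estimate the idle connections each Mongos would hold. This sketch assumes one connection pool per mongod host, which is a simplification of the TaskExecutor pool model:

```python
def idle_connections_per_mongos(pool_min_size, shards, nodes_per_shard):
    """Approximate connections each Mongos keeps open when
    ShardingTaskExecutorPoolMinSize is raised (one pool per mongod)."""
    return pool_min_size * shards * nodes_per_shard

# minSize=20 on a 4-shard cluster with 3 nodes per shard
print(idle_connections_per_mongos(20, 4, 3))  # 240
```

Multiply by the number of Mongos nodes to gauge the connection load each shard member must accept.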

setParameter.cursorTimeoutMillis

| Attribute | Value |
| --- | --- |
| Applicable versions | 3.0 and later |
| Restart required | No |
| Default | 600000 (10 minutes) |

Same behavior and recommendation as the replica set section above. Applies to Mongos nodes.