Replica Set Metrics Reference for CloudMonitor - ApsaraDB for MongoDB

This page lists the CloudMonitor metrics available for ApsaraDB for MongoDB replica set instances. Use these metrics to configure alert rules and monitor instance health.

Before you begin

When calling CloudMonitor API operations, set the following parameters:

Namespace: acs_mongodb
Period: the query interval in seconds. Must be an integer multiple of 60. Default: 60.

All metrics share the same Dimensions (userId, instanceId, role) and Statistics (Maximum, Minimum, Average).
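The parameter rules above can be sketched as a small helper that validates Period before a request is sent. This is a minimal illustration, not an official client: the function name `build_metric_request` is hypothetical, and the assumption that the API takes `Namespace`, `MetricName`, and `Period` as flat request parameters should be checked against the CloudMonitor API reference.

```python
NAMESPACE = "acs_mongodb"
# Statistics listed on this page as available for every metric.
STATISTICS = ("Maximum", "Minimum", "Average")


def build_metric_request(metric_name, period=60):
    """Return a request-parameter dict for one metric (hypothetical helper).

    Raises ValueError if period is not a positive integer multiple of 60.
    """
    if not isinstance(period, int) or period <= 0 or period % 60 != 0:
        raise ValueError("Period must be a positive integer multiple of 60 seconds")
    return {
        "Namespace": NAMESPACE,
        "MetricName": metric_name,
        "Period": period,
    }


if __name__ == "__main__":
    print(build_metric_request("CPUUtilization"))
    print(build_metric_request("ReplicationLag", period=300))
```

Rejecting an invalid Period locally keeps the failure close to the caller instead of surfacing it as a remote API error.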

Dimensions

Each metric supports the following dimensions for filtering CloudMonitor data:

Dimension	Description
`userId`	Filters data for a specific Alibaba Cloud account.
`instanceId`	Filters data for a specific replica set instance.
`role`	Filters data by node role. Use this dimension to monitor primary and secondary nodes separately.
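As an illustration of how the three dimensions combine into a filter, the sketch below serializes them to JSON. The helper name `build_dimensions`, the JSON-array-of-objects shape, and the sample values (including the role string `"Primary"`) are assumptions for demonstration only; confirm the exact format and accepted role values in the CloudMonitor API reference.

```python
import json


def build_dimensions(user_id, instance_id, role=None):
    """Serialize a dimension filter to a JSON string (hypothetical helper).

    role is optional: omit it to leave the node role unfiltered.
    """
    dimensions = {"userId": user_id, "instanceId": instance_id}
    if role is not None:
        dimensions["role"] = role
    # Assumed wire format: a JSON array containing one object per resource.
    return json.dumps([dimensions])


if __name__ == "__main__":
    # Placeholder IDs and role value, not real resources.
    print(build_dimensions("1234567890", "dds-example", role="Primary"))
```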

Resource utilization metrics

Metric in alert rules	Indicator	Unit	MetricName	Description
CPU utilization	cpu_usage	%	CPUUtilization	Monitor to detect sustained high CPU load that may degrade query performance.
Memory usage	mem_usage	%	MemoryUtilization	Monitor to identify memory pressure that could cause increased disk I/O or OOM conditions.
Disk usage	disk_usage	%	DiskUtilization	Monitor to prevent the instance from running out of disk space and becoming unavailable.
IOPS usage	iops_usage	%	IOPSUtilization	Monitor to detect when disk I/O operations per second approach the provisioned IOPS limit.
Disk size occupied by data	data_size	Byte	DataDiskAmount	Monitor to track data growth and plan capacity scaling.
Disk size occupied by instances	ins_size	Byte	InstanceDiskAmount	Monitor to understand total instance disk consumption across data, logs, and indexes.
Disk size occupied by logs	log_size	Byte	LogDiskAmount	Monitor to detect abnormal log growth caused by replication errors or high write loads.

Connection metrics

Metric in alert rules	Indicator	Unit	MetricName	Description
Number of used connections	current_conn	Count	ConnectionAmount	Monitor to determine whether the current connection limit is sufficient for your workload.
Connection usage	conn_usage	%	ConnectionUtilization	Monitor to detect when the instance is approaching its maximum connection count.

Traffic metrics

Metric in alert rules	Indicator	Unit	MetricName	Description
Internal inbound traffic	bytes_in	Byte	IntranetIn	Monitor to track data ingestion rates and detect unexpected traffic spikes.
Internal outbound traffic	bytes_out	Byte	IntranetOut	Monitor to track data egress and identify read-heavy workloads or hot data access patterns.

Operations metrics

QPS is the sum of all six operation types: insert, delete, update, query, getmore, and command.

Metric in alert rules	Indicator	Unit	MetricName	Description
Queries per second (QPS)	insert+delete+update+query+getmore+command	Count/s	QPS	Monitor overall throughput. A sudden spike or drop may indicate a workload change or an incident.
Number of requests	num_requests	Count	NumberRequests	Monitor total request volume to understand cumulative load on the instance.
Number of insert operations	insert	Count/s	OpInsert	Monitor to track write load. Compare the combined insert, update, and delete rates with the query rate to analyze the read/write ratio.
Number of query operations	query	Count/s	OpQuery	Monitor to identify query-heavy workloads that may benefit from index optimization.
Number of update operations	update	Count/s	OpUpdate	Monitor alongside insert and delete to understand mutation patterns.
Number of delete operations	delete	Count/s	OpDelete	Monitor for unexpected deletion spikes that may indicate application bugs or data pipeline issues.
Number of getMore operations	getmore	Count/s	OpGetmore	Monitor to detect cursor-heavy workloads that can exhaust memory on large result sets.
Number of command operations	command	Count/s	OpCommand	Monitor to track administrative and aggregation commands that may affect overall performance.
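The QPS definition at the start of this section can be checked with a few lines of arithmetic. The sketch below sums the six per-operation rates and also derives a write share; the sample rates are made up, and counting getmore as a read is an assumption of this example rather than something this page defines.

```python
OPERATION_TYPES = ("insert", "delete", "update", "query", "getmore", "command")
WRITE_OPS = ("insert", "update", "delete")
READ_OPS = ("query", "getmore")  # assumption: getmore counted as a read


def total_qps(op_rates):
    """Sum the six per-operation rates (Count/s); missing keys count as 0."""
    return sum(op_rates.get(op, 0) for op in OPERATION_TYPES)


def write_share(op_rates):
    """Return writes / (reads + writes), or None when there is no such traffic."""
    writes = sum(op_rates.get(op, 0) for op in WRITE_OPS)
    reads = sum(op_rates.get(op, 0) for op in READ_OPS)
    total = reads + writes
    return writes / total if total else None


if __name__ == "__main__":
    # Illustrative datapoint, not measured values.
    sample = {"insert": 120, "delete": 5, "update": 75,
              "query": 600, "getmore": 40, "command": 160}
    print(total_qps(sample))    # 1000
    print(write_share(sample))
```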

Replication metrics

Metric in alert rules	Indicator	Unit	MetricName	Description
Replication lag	repl_lag	Seconds	ReplicationLag	Monitor to detect when secondary nodes fall behind the primary. High lag may indicate replication issues that require investigation.
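To separate sustained lag from a brief spike, one common approach is to require several consecutive datapoints above a threshold before alerting. The sketch below shows that logic on a plain list of ReplicationLag values; the function name, the 10-second threshold, and the three-datapoint window are hypothetical choices, not recommended values from this page.

```python
def lag_is_sustained(datapoints, threshold_seconds, consecutive=3):
    """Return True if lag exceeds the threshold for `consecutive` datapoints in a row."""
    run = 0
    for value in datapoints:
        # Extend the current run on a breach, otherwise reset it.
        run = run + 1 if value > threshold_seconds else 0
        if run >= consecutive:
            return True
    return False


if __name__ == "__main__":
    # Illustrative lag samples in seconds, one per Period.
    print(lag_is_sustained([1, 12, 15, 11, 2], threshold_seconds=10))
    print(lag_is_sustained([1, 12, 2, 15, 11], threshold_seconds=10))
```

The first sample contains three consecutive breaches and the second does not, so only the first would trigger under these assumed settings.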