This page lists the CloudMonitor metrics available for ApsaraDB for MongoDB replica set instances. Use these metrics to configure alert rules and monitor instance health.
Before you begin
When calling CloudMonitor API operations, set the following parameters:
Namespace: acs_mongodb
Period: an integer multiple of 60. Default: 60. Unit: seconds.
All metrics share the same Dimensions (userId, instanceId, role) and Statistics (Maximum, Minimum, Average).
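
The parameter rules above can be sketched as a small helper that assembles and validates the request parameters before they are passed to a CloudMonitor client. This is a minimal sketch: the helper name `build_metric_query` is hypothetical, and how the resulting dictionary is sent depends on the SDK or HTTP client you use, which is not shown here.

```python
NAMESPACE = "acs_mongodb"
# Statistics shared by all metrics on this page.
STATISTICS = ("Maximum", "Minimum", "Average")


def build_metric_query(metric_name: str, period: int = 60) -> dict:
    """Return request parameters for one metric (hypothetical helper).

    Period must be an integer multiple of 60 seconds; the default is 60.
    """
    if period <= 0 or period % 60 != 0:
        raise ValueError("Period must be a positive integer multiple of 60 seconds")
    return {
        "Namespace": NAMESPACE,
        "MetricName": metric_name,
        "Period": period,
    }


# Example: CPU utilization sampled at 5-minute granularity.
params = build_metric_query("CPUUtilization", period=300)
print(params)
```

Validating `Period` locally is a design choice that surfaces an invalid value before any request is made.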
Dimensions
Each metric supports the following dimensions for filtering CloudMonitor data:
| Dimension | Description |
|---|---|
| userId | Filters data for a specific Alibaba Cloud account. |
| instanceId | Filters data for a specific replica set instance. |
| role | Filters data by node role. Use this dimension to monitor primary and secondary nodes separately. |
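
As an illustration of the dimensions above, the following sketch builds a dimension filter for a single instance and node role. It assumes the filter is passed as a JSON array of objects; the helper name `build_dimensions`, the instance ID `dds-example`, and the role value `Primary` are hypothetical placeholders, so check the actual role values reported for your instance.

```python
import json
from typing import Optional


def build_dimensions(instance_id: str,
                     user_id: Optional[str] = None,
                     role: Optional[str] = None) -> str:
    """Serialize a dimension filter as a JSON string (hypothetical helper)."""
    dims = {"instanceId": instance_id}
    if user_id is not None:
        dims["userId"] = user_id
    if role is not None:
        dims["role"] = role
    # Assumption: the API expects a JSON array containing one filter object.
    return json.dumps([dims])


# Placeholder values; replace with your own instance ID and role.
print(build_dimensions("dds-example", role="Primary"))
```
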
Resource utilization metrics
| Metric in alert rules | Indicator | Unit | MetricName | Description |
|---|---|---|---|---|
| CPU utilization | cpu_usage | % | CPUUtilization | Monitor to detect sustained high CPU load that may degrade query performance. |
| Memory usage | mem_usage | % | MemoryUtilization | Monitor to identify memory pressure that could cause increased disk I/O or OOM conditions. |
| Disk usage | disk_usage | % | DiskUtilization | Monitor to prevent the instance from running out of disk space and becoming unavailable. |
| IOPS usage | iops_usage | % | IOPSUtilization | Monitor to detect when disk I/O operations approach the provisioned IOPS limit. |
| Disk size occupied by data | data_size | Byte | DataDiskAmount | Monitor to track data growth and plan capacity scaling. |
| Disk size occupied by instances | ins_size | Byte | InstanceDiskAmount | Monitor to understand total instance disk consumption across data, logs, and indexes. |
| Disk size occupied by logs | log_size | Byte | LogDiskAmount | Monitor to detect abnormal log growth caused by replication errors or high write loads. |
Connection metrics
| Metric in alert rules | Indicator | Unit | MetricName | Description |
|---|---|---|---|---|
| Number of used connections | current_conn | Count | ConnectionAmount | Monitor to determine whether the current connection limit is sufficient for your workload. |
| Connection usage | conn_usage | % | ConnectionUtilization | Monitor to detect when the instance is approaching its maximum connection count. |
Traffic metrics
| Metric in alert rules | Indicator | Unit | MetricName | Description |
|---|---|---|---|---|
| Internal inbound traffic | bytes_in | Byte | IntranetIn | Monitor to track data ingestion rates and detect unexpected traffic spikes. |
| Internal outbound traffic | bytes_out | Byte | IntranetOut | Monitor to track data egress and identify read-heavy workloads or hot data access patterns. |
Operations metrics
QPS is the sum of all six operation types: insert, delete, update, query, getmore, and command.
| Metric in alert rules | Indicator | Unit | MetricName | Description |
|---|---|---|---|---|
| Queries per second (QPS) | insert+delete+update+query+getmore+command | Count/s | QPS | Monitor overall throughput. A sudden spike or drop may indicate a workload change or an incident. |
| Number of requests | num_requests | Count | NumberRequests | Monitor total request volume to understand cumulative load on the instance. |
| Number of insert operations | insert | Count/s | OpInsert | Monitor to track write load. Combine with update and delete metrics to analyze the read/write ratio. |
| Number of query operations | query | Count/s | OpQuery | Monitor to identify query-heavy workloads that may benefit from index optimization. |
| Number of update operations | update | Count/s | OpUpdate | Monitor alongside insert and delete to understand mutation patterns. |
| Number of delete operations | delete | Count/s | OpDelete | Monitor for unexpected deletion spikes that may indicate application bugs or data pipeline issues. |
| Number of getMore operations | getmore | Count/s | OpGetmore | Monitor to detect cursor-heavy workloads that can exhaust memory on large result sets. |
| Number of command operations | command | Count/s | OpCommand | Monitor to track administrative and aggregation commands that may affect overall performance. |
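
The QPS definition above (the sum of the six per-operation rates) can be checked against the individual metrics with a short sketch. The function name `total_qps` and the sample values are hypothetical and serve only to illustrate the arithmetic.

```python
# The six operation indicators that make up QPS, as listed in the table above.
OPERATION_INDICATORS = ("insert", "delete", "update", "query", "getmore", "command")


def total_qps(rates: dict) -> float:
    """Sum the per-second rates of the six operation types."""
    missing = [name for name in OPERATION_INDICATORS if name not in rates]
    if missing:
        raise KeyError(f"missing operation rates: {missing}")
    return sum(rates[name] for name in OPERATION_INDICATORS)


# Made-up sample rates in Count/s.
sample = {"insert": 120, "delete": 5, "update": 40,
          "query": 300, "getmore": 15, "command": 20}
print(total_qps(sample))  # 500
```
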
Replication metrics
| Metric in alert rules | Indicator | Unit | MetricName | Description |
|---|---|---|---|---|
| Replication lag | repl_lag | Seconds | ReplicationLag | Monitor to detect when secondary nodes fall behind the primary. High lag may indicate replication issues that require investigation. |