ApsaraMQ for Kafka: Cloud Monitor

Last Updated: Mar 11, 2026

When your Kafka workloads experience message accumulation, disk pressure, or traffic throttling, you need visibility into what is happening and a way to get notified before the problem escalates. ApsaraMQ for Kafka integrates with Cloud Monitor to provide real-time metrics for instances, topics, and consumer groups, along with configurable alert rules that notify you through phone calls, text messages, emails, or DingTalk chatbot messages.

Cloud Monitor is free for ApsaraMQ for Kafka.

Prerequisites

Before you begin, make sure you have:

  • The service-linked role AliyunServiceRoleForAlikafka with the AliyunServiceRolePolicyForAlikafka policy attached. This role allows ApsaraMQ for Kafka to access other Alibaba Cloud services, such as Cloud Monitor and Application Real-Time Monitoring Service (ARMS), for monitoring and dashboard features.

For details, see Service-linked roles.

View monitoring data

  1. Log on to the ApsaraMQ for Kafka console. In the Resource Distribution section of the Overview page, select the region where your instance resides.

  2. On the Instances page, click the name of the target instance.

  3. In the left-side navigation pane, choose Observability > CloudMonitor.

  4. On the Monitoring Chart tab, set a time range.

Charts for all metrics of the selected resource are displayed automatically.

Create an alert rule

  1. Log on to the ApsaraMQ for Kafka console.

  2. In the Resource Distribution section of the Overview page, select the region where your instance resides.

  3. On the Instances page, click the name of the target instance.

  4. In the left-side navigation pane, choose Observability > CloudMonitor.

  5. Click the Alert Rule tab, and then click Create Alert Rule.

  6. In the Create Alert Rule panel, configure the alert rule and notification method, and then click OK.

To modify an existing rule, find the rule and click Modify in the Actions column.
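
As a rough mental model of what an alert rule does, the sketch below evaluates a threshold against recent one-minute datapoints. It is a local illustration only, not Cloud Monitor's implementation or API: the `AlertRule` class, its fields, and the "trigger after N consecutive periods" behavior are assumptions made for this example, and the datapoints are invented.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class AlertRule:
    """Hypothetical, simplified alert rule (not the Cloud Monitor data model)."""
    metric_id: str            # for example "DiskInstanceRatioV3"
    threshold: float          # alert when the value is strictly above this
    consecutive_periods: int  # number of most recent periods that must all breach

    def __post_init__(self) -> None:
        if self.consecutive_periods < 1:
            raise ValueError("consecutive_periods must be at least 1")


def should_alert(rule: AlertRule, datapoints: Sequence[float]) -> bool:
    """Return True if the most recent N datapoints all exceed the threshold."""
    if len(datapoints) < rule.consecutive_periods:
        return False
    recent = datapoints[-rule.consecutive_periods:]
    return all(value > rule.threshold for value in recent)


# Invented disk usage samples (percent), one per minute.
rule = AlertRule(metric_id="DiskInstanceRatioV3", threshold=80.0, consecutive_periods=3)
print(should_alert(rule, [70.0, 82.0, 85.0, 91.0]))  # True
print(should_alert(rule, [82.0, 85.0, 79.0]))        # False
```

Requiring several consecutive breaches, rather than a single one, is a common way to avoid notifications for short spikes.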

View alert details

  1. Log on to the ApsaraMQ for Kafka console.

  2. In the Resource Distribution section of the Overview page, select the region where your instance resides.

  3. On the Instances page, click the name of the target instance.

  4. In the left-side navigation pane, choose Observability > CloudMonitor.

  5. Click the Alert Rule tab. Find the target rule and click Details in the Actions column.

Metrics reference

All metrics are aggregated at one-minute intervals. Traffic metrics are reported in bytes per second (B/s) and represent the average value over each one-minute period. Metric data has a one-minute latency.
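
Because each traffic datapoint is a one-minute average in B/s, the total volume for that minute is the average multiplied by 60. The public network bandwidth metrics in this reference use bit/s instead, so divide by 8 before comparing them with B/s values. A minimal sketch of both conversions follows; the function names and sample readings are hypothetical.

```python
SECONDS_PER_PERIOD = 60  # metrics are aggregated at one-minute intervals


def bytes_in_period(avg_bytes_per_second: float) -> float:
    """Total bytes transferred in one aggregation period, given its average B/s."""
    return avg_bytes_per_second * SECONDS_PER_PERIOD


def bits_to_bytes_per_second(bits_per_second: float) -> float:
    """Convert a bit/s reading to B/s so it can be compared with traffic metrics."""
    return bits_per_second / 8


# Invented readings, not real monitoring data.
print(bytes_in_period(2_000_000))           # 120000000
print(bits_to_bytes_per_second(8_000_000))  # 1000000.0
```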

Instance metrics

All instance metrics use the dimensions userId and instanceId.

Traffic metrics

Metric name | Metric ID | Unit
--- | --- | ---
Inbound traffic of the instance cluster (including replication traffic) | ClusterMessageInputV3 | B/s
Actual inbound traffic of the instance | InstanceMessageInputV3 | B/s
Actual outbound traffic of the instance | InstanceMessageOutputV3 | B/s
Number of messages produced for the instance | InstanceMessageNumInputV3 | count/s
Number of messages consumed for the instance | InstanceMessageNumOutputV3 | count/s
Number of message production requests for the instance | InstanceReqsInputV3 | count/s
Number of message consumption requests for the instance | InstanceReqsOutputV3 | count/s
Public network write bandwidth of the instance | InstanceInternetTxRateV3 | bit/s
Public network read bandwidth of the instance | InstanceInternetRxRateV3 | bit/s

Storage metrics

Metric name | Metric ID | Unit
--- | --- | ---
Instance disk usage | DiskInstanceRatioV3 | %
Instance storage size | InstanceDiskLogSizeV3 | B

Connection metrics

Metric name | Metric ID | Unit
--- | --- | ---
Maximum connections on a single node (public and private networks) | InstanceMaxConnectionV3 | count
Maximum connections on a single node (public network) | InstanceMaxInternetConnectionV3 | count
Total connections of the instance (public and private networks) | InstanceTotalConnectionV3 | count
Total connections of the instance (public network) | InstanceTotalInternetConnectionV3 | count
Usage of maximum connections on a single node (public and private networks) | InstanceMaxConnectionRatioV3 | %
Usage of maximum connections on a single node (public network) | InstanceMaxInternetConnectionRatioV3 | %

Capacity ratio metrics

Metric name | Metric ID | Unit
--- | --- | ---
Ratio of production traffic on the busiest node to the elastic limit of the node | InstanceMaxNodeInputRatioV3 | %
Ratio of consumption traffic on the busiest node to the elastic limit of the node | InstanceMaxNodeOutputRatioV3 | %
Ratio of production traffic to the elastic limit | InstanceMessageInputRatioV3 | %
Ratio of consumption traffic to the elastic limit | InstanceMessageOutputRatioV3 | %
Instance partition usage | PartitionInstanceRatioV3 | %
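
Each ratio metric above expresses a current value as a percentage of a limit. The arithmetic can be sketched as follows; the function name and the sample traffic and limit values are hypothetical, and the actual elastic limit depends on your instance specification.

```python
def usage_ratio_percent(current: float, limit: float) -> float:
    """Return current usage as a percentage of the limit."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return current / limit * 100


# Invented example: 45 MB/s of production traffic against a 60 MB/s limit.
print(usage_ratio_percent(45, 60))  # 75.0
```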

Throttling metrics

Metric name | Metric ID | Unit
--- | --- | ---
Production throttling duration of the instance | InstanceThrottleTimeP99InputV3 | ms
Consumption throttling duration of the instance | InstanceThrottleTimeP99OutputV3 | ms

Consumer group metrics

Consumer group metrics track message accumulation (lag) and consumption throughput. A rising accumulation value means consumers are falling behind producers. If the value keeps rising, scale out your consumer group or investigate processing bottlenecks.

Metric name | Metric ID | Dimensions | Unit
--- | --- | --- | ---
Message accumulation | MessageAccumulationV3 | userId, instanceId, consumerGroup | count
Number of unconsumed messages of a topic in a consumer group | MessageAccumulationOnetopicV3 | userId, instanceId, consumerGroup, topic | count
MessageNumOutputV3 | GroupMessageNumOutputV3 | userId, instanceId, consumerGroup | count/s
MessageNumOutputOnetopicV3 | GroupMessageNumOutputOnetopicV3 | userId, instanceId, consumerGroup, topic | count/s
MessageNumOutputOnetopicOnepartitionV3 | GroupMessageNumOutputOnetopicOnepartitionV3 | userId, instanceId, consumerGroup, topic, partition | count/s
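
To illustrate what a rising accumulation value looks like in practice, the sketch below checks whether the most recent MessageAccumulationV3 samples increase strictly from one minute to the next. The function name, the window size, and the sample values are assumptions for illustration; a production check would likely also tolerate small dips.

```python
from typing import Sequence


def lag_is_rising(accumulation: Sequence[int], min_points: int = 3) -> bool:
    """Return True if the last `min_points` samples are strictly increasing."""
    if min_points < 2 or len(accumulation) < min_points:
        return False
    window = accumulation[-min_points:]
    # Compare each sample with the one that follows it.
    return all(earlier < later for earlier, later in zip(window, window[1:]))


# Invented one-minute accumulation samples for a single consumer group.
print(lag_is_rising([100, 120, 150, 200]))  # True
print(lag_is_rising([100, 90, 95]))         # False
```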

Topic metrics

Metric name | Metric ID | Dimensions | Unit
--- | --- | --- | ---
Number of partitions with abnormal HA in a topic | TopicAbnormalHaPartitionNumV3 | userId, instanceId, topic | count
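
Each metric expects a specific set of dimensions, as listed in the tables in this reference. The sketch below validates that a caller supplies every required dimension for a few of these metric IDs before building a query. The helper name, the sample values, and the choice to serialize the dimensions as a JSON string are assumptions for illustration, not a documented request format.

```python
import json

# Required dimensions for a few metric IDs, copied from the tables in this reference.
REQUIRED_DIMENSIONS = {
    "InstanceMessageInputV3": ("userId", "instanceId"),
    "MessageAccumulationV3": ("userId", "instanceId", "consumerGroup"),
    "MessageAccumulationOnetopicV3": ("userId", "instanceId", "consumerGroup", "topic"),
    "TopicAbnormalHaPartitionNumV3": ("userId", "instanceId", "topic"),
}


def build_dimensions(metric_id: str, **values: str) -> str:
    """Return the required dimensions for `metric_id` as a JSON string."""
    required = REQUIRED_DIMENSIONS[metric_id]
    missing = [name for name in required if name not in values]
    if missing:
        raise ValueError(f"{metric_id} is missing dimensions: {', '.join(missing)}")
    # Keep only the required keys, in the documented order.
    return json.dumps({name: values[name] for name in required})


# Hypothetical identifiers, not real resources.
print(build_dimensions(
    "TopicAbnormalHaPartitionNumV3",
    userId="123", instanceId="alikafka_demo", topic="orders",
))
```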
