ApsaraMQ for Kafka provides built-in dashboards that display instance health, topic throughput, and consumer group lag. The dashboards are powered by the metric storage and display capabilities of Managed Service for Prometheus and Grafana. You can view dashboards directly in the ApsaraMQ for Kafka console or in a Managed Service for Grafana workspace.
Prerequisites
Before you begin, make sure that you have:
A service-linked role with the following configuration. For details, see Service-linked role.
| Item | Value |
|---|---|
| Role name | AliyunServiceRoleForAlikafka |
| Policy name | AliyunServiceRolePolicyForAlikafka |
| Permissions | Allows ApsaraMQ for Kafka to access services such as CloudMonitor and ARMS on your behalf. Required for CloudMonitor and dashboard features. |
Billing
Dashboard metrics for ApsaraMQ for Kafka are classified as basic metrics in Managed Service for Prometheus and are free of charge. No additional fees apply for using the dashboard feature.
For details, see Metric description and Pay-as-you-go.
View dashboards in the ApsaraMQ for Kafka console
Log on to the ApsaraMQ for Kafka console. In the Resource Distribution section of the Overview page, select the region where your instance resides.
On the Instances page, click the name of the target instance.
In the left-side navigation pane, choose .
On the Dashboard page, set a time range in the upper-right corner to view metrics such as instance storage size, partition count, connection count, and inbound and outbound traffic.
Topic and consumer group dashboards
On the Topic Details page, click the Dashboard tab to view production metrics and producer client metrics for the topic.
On the Group Details page, click the Dashboard tab to view consumption metrics and consumer client metrics for the consumer group.
View dashboards in Managed Service for Grafana
Log on to the Managed Service for Grafana console. In the left-side navigation pane, click Workspace Management.
On the Workspace Management page, click the URL in the Endpoint column of the workspace you want to open.
Follow the instructions for your Grafana version:
Grafana 9.x
In the left-side navigation pane, click the icon, then click Browse.
On the Browse tab, open a dashboard in the target folder. For example, to view instance monitoring data in the China (Hangzhou) region, open kafka-dashboard-v3 in the folder whose name starts with cloud-product-prometheus_cn-hangzhou.
Enter the instance ID to filter the dashboard.
Grafana 10.x
Click the icon in the upper-left corner.
In the left-side navigation pane, select Dashboards, then open a dashboard in the target folder. For example, to view instance monitoring data in the China (Hangzhou) region, open kafka-dashboard-v3 in the folder whose name starts with cloud-product-prometheus_cn-hangzhou.
Enter the instance ID to filter the dashboard.
Metric reference
Each metric has a type, a name, a description, and one or more labels. The following sections list all available metrics by scope: instance, topic, and consumer group.
Metric types
| Type | Behavior | Example |
|---|---|---|
| Counter | Cumulative value that only increases, except that it resets to zero when the process exporting it restarts. | Total producer requests |
| Gauge | Point-in-time value that can increase or decrease. | Instance connection count |
| Summary | Samples observed values and reports their count, sum, and quantiles. Similar to a histogram. | Request body size |
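Because a counter is cumulative, dashboards usually plot its per-second rate rather than its raw value. The following Python sketch illustrates the idea on raw (timestamp, value) samples, treating any drop in value as a counter reset. It is a simplified model for illustration only: PromQL's rate() also extrapolates to the edges of the query window, so its results can differ slightly.

```python
from typing import Sequence, Tuple

# A sample is (timestamp_in_seconds, counter_value); samples are ordered oldest first.
Sample = Tuple[float, float]


def counter_rate(samples: Sequence[Sample]) -> float:
    """Average per-second increase of a counter across the given samples.

    A decrease between two samples is treated as a counter reset: the value
    after the reset is counted as the increase since the reset.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        # Normal case: add the delta. After a reset: the counter restarted from 0.
        increase += curr - prev if curr >= prev else curr
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("samples must span a positive time interval")
    return increase / elapsed
```

For example, kafka_server_brokertopicmetrics_bytesin_total samples taken at 0 s, 30 s, and 60 s with values 1000, 4000, and 500 (a reset before the last sample) give (3000 + 500) / 60, about 58.3 bytes per second. Gauges need no such conversion; their current value is plotted directly.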
Labels
Labels let you filter and aggregate metrics. The following labels are available across metrics:
| Label | Description |
|---|---|
| tenant_userid | Your Alibaba Cloud account ID |
| instance_id | ApsaraMQ for Kafka instance ID |
| instance_name | ApsaraMQ for Kafka instance name |
| topic | Topic name |
| partition | Partition number |
| group_id | Consumer group ID |
| authentication_type | Connection authentication method: VPC_PLAINTEXT, PUB_SASL_SSL, VPC_SASL_PLAINTEXT, or VPC_SASL_SSL |
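When you query these metrics directly rather than through a dashboard, each label becomes an exact-match filter in a PromQL selector. The Python sketch below builds such a selector and escapes label values using PromQL's double-quoted string rules. The helper name selector and the instance ID alikafka_pre-cn-example in the usage example are hypothetical.

```python
from typing import Mapping


def _escape(value: str) -> str:
    # PromQL double-quoted strings use Go-style escapes for \, ", and newlines.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def selector(metric: str, labels: Mapping[str, str]) -> str:
    """Build a PromQL instant-vector selector with exact-match label filters."""
    if not labels:
        return metric
    # Sort label names so the generated selector is deterministic.
    matchers = ",".join(
        f'{name}="{_escape(value)}"' for name, value in sorted(labels.items())
    )
    return f"{metric}{{{matchers}}}"
```

For example, filtering the connection count by instance_id alikafka_pre-cn-example and authentication_type VPC_SASL_SSL yields kafka_server_socket_server_metrics_connection_count{authentication_type="VPC_SASL_SSL",instance_id="alikafka_pre-cn-example"}.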
Instance metrics
| Type | Metric name | Description | Labels |
|---|---|---|---|
| Gauge | kafka_disk_log_size | Storage size of the instance, in bytes. | tenant_userid, instance_id, instance_name |
| Gauge | kafka_server_cloudenhancedreplicamanager_allreplicascount | Total partition count across all replicas. | tenant_userid, instance_id, instance_name |
| Gauge | kafka_server_socket_server_metrics_connection_count | Number of connections. | tenant_userid, instance_id, instance_name, authentication_type |
| Gauge | kafka_instance_io_spec_write | Reserved write (production) traffic specification of the instance. | tenant_userid, instance_id, instance_name |
| Gauge | kafka_instance_io_spec_read | Reserved read (consumption) traffic specification of the instance. | tenant_userid, instance_id, instance_name |
| Counter | kafka_server_brokertopicmetrics_bytesin_total | Production traffic (jmx-exporter), in bytes. | tenant_userid, instance_id, instance_name, authentication_type |
| Counter | kafka_server_brokertopicmetrics_bytesout_total | Consumption traffic (jmx-exporter), in bytes. | tenant_userid, instance_id, instance_name, authentication_type |
| Counter | kafka_server_brokertopicmetrics_failedproducerequests_total | Number of failed producer requests. | tenant_userid, instance_id, instance_name |
| Counter | kafka_server_brokertopicmetrics_totalproducerequests_total | Total number of producer requests. | tenant_userid, instance_id, instance_name, authentication_type |
| Counter | kafka_server_brokertopicmetrics_failedfetchrequests_total | Number of failed consumer requests. | tenant_userid, instance_id, instance_name, authentication_type |
| Counter | kafka_server_brokertopicmetrics_totalfetchrequests_total | Total number of consumer requests. | tenant_userid, instance_id, instance_name, authentication_type |
| Gauge | kafka_network_socketserver_expiredconnectionskilledcount | Number of connections closed by the server because they were used after their authentication session expired. | tenant_userid, instance_id, instance_name |
| Summary | kafka_network_requestmetrics_requestbytes | Request body size distribution. | tenant_userid, instance_id, instance_name |
| Counter | kafka_server_brokertopicmetrics_bytesrejected_total | Traffic rejected because a record batch exceeded the topic's max.message.bytes limit, in bytes. | tenant_userid, instance_id, instance_name |
| Counter | kafka_server_brokertopicmetrics_nokeycompactedtopicrecords_total | Number of records in a compacted topic that lack a key. | tenant_userid, instance_id, instance_name |
| Counter | kafka_server_brokertopicmetrics_invalidmessagecrcrecords_total | Number of cyclic redundancy check (CRC) failures. | tenant_userid, instance_id, instance_name |
| Counter | kafka_server_brokertopicmetrics_invalidmagicnumberrecords_total | Number of message version verification failures. | tenant_userid, instance_id, instance_name |
| Counter | kafka_server_brokertopicmetrics_invalidoffsetorsequencerecords_total | Number of records that failed validation because of invalid offsets or sequence numbers, such as non-consecutive values. | tenant_userid, instance_id, instance_name |
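A common derived signal is the fraction of produce or fetch requests that fail. The following Python sketch composes a PromQL expression for it from the counters above. It assumes, as the tables suggest, that instance-level series carry no topic label: the matcher topic="" selects only series without that label, which avoids double counting the per-topic series that share the name kafka_server_brokertopicmetrics_totalfetchrequests_total. Summing by instance_id also aligns the failed and total series, whose label sets differ. The function name and instance ID are hypothetical.

```python
# (failed, total) counter pairs taken from the instance metrics table.
_REQUEST_COUNTERS = {
    "produce": (
        "kafka_server_brokertopicmetrics_failedproducerequests_total",
        "kafka_server_brokertopicmetrics_totalproducerequests_total",
    ),
    "fetch": (
        "kafka_server_brokertopicmetrics_failedfetchrequests_total",
        "kafka_server_brokertopicmetrics_totalfetchrequests_total",
    ),
}


def failed_request_ratio(kind: str, instance_id: str, window: str = "5m") -> str:
    """Return a PromQL expression for the fraction of failed produce or fetch requests."""
    if kind not in _REQUEST_COUNTERS:
        raise ValueError(f"kind must be one of {sorted(_REQUEST_COUNTERS)}")
    failed, total = _REQUEST_COUNTERS[kind]
    # topic="" matches only series that have no topic label (instance level).
    # instance_id is inserted verbatim and is assumed to contain no quotes.
    matchers = f'{{instance_id="{instance_id}",topic=""}}'
    return (
        f"sum by (instance_id) (rate({failed}{matchers}[{window}]))"
        f" / sum by (instance_id) (rate({total}{matchers}[{window}]))"
    )
```

The result can be pasted into a Grafana panel or an alert rule; a value near 0 means almost all requests succeed.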
Topic metrics
| Type | Metric name | Description | Labels |
|---|---|---|---|
| Gauge | kafka_log_log_size | Storage size of the topic partition, in bytes. | tenant_userid, instance_id, instance_name, topic, partition |
| Gauge | kafka_topic_partition_current_offset | Maximum offset of the partition. | tenant_userid, instance_id, instance_name, topic, partition |
| Gauge | kafka_topic_partition_oldest_offset | Minimum offset of the partition. | tenant_userid, instance_id, instance_name, topic, partition |
| Gauge | kafka_consumergroup_lag | Message accumulation for the topic. A sustained increase may indicate that consumers cannot keep up with producers. | tenant_userid, instance_id, instance_name, topic |
| Counter | kafka_server_brokertopicmetrics_totalfetchrequests_total | Total number of fetch requests for the topic. | tenant_userid, instance_id, instance_name, topic, authentication_type |
| Counter | kafka_server_brokertopicmetrics_bytesin_total | Production traffic for the topic (jmx-exporter), in bytes. | tenant_userid, instance_id, instance_name, topic, authentication_type |
| Counter | kafka_server_brokertopicmetrics_bytesout_total | Outbound traffic for the topic, in bytes. | tenant_userid, instance_id, instance_name, topic, authentication_type |
| Counter | kafka_server_brokertopicmetrics_messagesin_total | Number of inbound messages for the topic. Unit: messages. | tenant_userid, instance_id, instance_name, topic, authentication_type |
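For a single partition, the difference between kafka_topic_partition_current_offset and kafka_topic_partition_oldest_offset approximates how many messages are retained. It is an upper bound rather than an exact count, because log compaction removes records without renumbering offsets and transaction markers also occupy offsets. The Python sketch below sums that span per topic from sample values; the topic name orders is hypothetical. Because both metrics carry the same labels, an equivalent PromQL query is sum by (topic) (kafka_topic_partition_current_offset - kafka_topic_partition_oldest_offset).

```python
from typing import Dict, Tuple

PartitionKey = Tuple[str, int]  # (topic, partition)


def retained_offsets(current: Dict[PartitionKey, int],
                     oldest: Dict[PartitionKey, int]) -> Dict[str, int]:
    """Sum, per topic, the offset span (current - oldest) of each partition."""
    totals: Dict[str, int] = {}
    for key, end in current.items():
        if key not in oldest:
            continue  # skip partitions without a matching oldest-offset sample
        topic, _ = key
        totals[topic] = totals.get(topic, 0) + max(end - oldest[key], 0)
    return totals
```

For example, partition 0 of orders spanning offsets 1000 to 1500 and an empty partition 1 (both offsets 900) give {"orders": 500}.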
Consumer group metrics
| Type | Metric name | Description | Labels |
|---|---|---|---|
| Gauge | kafka_consumergroup_lag | Total message accumulation (lag) of the consumer group, that is, the number of messages produced but not yet consumed. Steadily growing lag may indicate that consumers are processing too slowly. | tenant_userid, instance_id, instance_name, topic, group_id |
| Counter | kafka_consumergroup_current_offset | Current consumer offset of the group. | tenant_userid, instance_id, instance_name, topic, partition, group_id |
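Consumer lag is conventionally computed per partition as the partition's latest offset minus the group's committed offset. The following Python sketch applies that definition to kafka_topic_partition_current_offset and kafka_consumergroup_current_offset samples and adds a deliberately simple trend check for steadily growing lag. It assumes that kafka_consumergroup_lag follows the same definition; in practice, read that metric directly. The strictly-increasing rule and the three-sample minimum are illustrative choices, not an ApsaraMQ for Kafka alerting rule.

```python
from typing import Dict, Sequence, Tuple

PartitionKey = Tuple[str, int]  # (topic, partition)


def group_lag(log_end: Dict[PartitionKey, int],
              committed: Dict[PartitionKey, int]) -> int:
    """Total lag of one consumer group: sum of (latest offset - committed offset)."""
    return sum(
        max(log_end[key] - offset, 0)
        for key, offset in committed.items()
        if key in log_end  # ignore partitions with no latest-offset sample
    )


def lag_is_growing(lag_samples: Sequence[int], min_samples: int = 3) -> bool:
    """Return True if every lag sample is higher than the one before it."""
    if len(lag_samples) < min_samples:
        return False
    return all(later > earlier for earlier, later in zip(lag_samples, lag_samples[1:]))
```

A strict monotonic check reacts only to uninterrupted growth; a production alert would more likely compare lag over a longer window or use a threshold.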
FAQ
How do I obtain dashboard metric data?
Integrate ApsaraMQ for Kafka with Managed Service for Prometheus through the ARMS Integration Center.
Log on to the ARMS console with your Alibaba Cloud account.
In the left-side navigation pane, click Integration Center.
Search for Kafka, then select Alibaba Cloud Kafka Message Queue Service. For details, see Step 1: Integrate monitoring data of an Alibaba Cloud service.
Note: To collect metric data for ApsaraMQ for Kafka Serverless instances, turn on the Advanced Monitoring Metrics switch in the Configuration Information section during integration. Otherwise, metric data is collected only for non-Serverless instances.
After the integration succeeds, click Integration Management in the left-side navigation pane.
Click the Cloud Service Region Environment tab.
Click the name of the target environment to open its details page.
On the Component Management tab, in the Basic Information section, click the region of the Prometheus Instance.
On the Settings tab, view the available data access methods.
How do I connect dashboard metric data to a self-managed Grafana instance?
All ApsaraMQ for Kafka metric data is stored in your Managed Service for Prometheus instance. You can use the APIs provided by Managed Service for Prometheus to connect the dashboard metric data to your self-managed Grafana instance.
For details, see Connect Prometheus data to a Grafana instance by using an HTTP API endpoint.
Before connecting, make sure that Managed Service for Prometheus in the region where your ApsaraMQ for Kafka instance resides is integrated with Alibaba Cloud Kafka Message Queue Service.
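Once you have the HTTP API endpoint from the Settings tab, any client that speaks the standard Prometheus HTTP query API can read these metrics, not only Grafana. The Python sketch below uses only the standard library to call GET /api/v1/query and parse an instant-vector response. The base URL is a placeholder, and the headers argument is where you would pass whatever authentication your endpoint requires; the exact credentials Managed Service for Prometheus expects are not covered here, so treat that part as an assumption.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List, Optional, Tuple


def build_query_url(base_url: str, promql: str) -> str:
    """Compose a Prometheus HTTP API instant-query URL (GET /api/v1/query)."""
    return base_url.rstrip("/") + "/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def parse_vector(body: str) -> List[Tuple[Dict[str, str], float]]:
    """Extract (labels, value) pairs from an instant-vector query response."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector result, got {data['resultType']}")
    # Each sample value is a [unix_timestamp, "string_value"] pair.
    return [(item["metric"], float(item["value"][1])) for item in data["result"]]


def query(base_url: str, promql: str, headers: Optional[Dict[str, str]] = None,
          timeout: float = 10.0) -> List[Tuple[Dict[str, str], float]]:
    """Run an instant query; pass any authentication headers your endpoint needs."""
    request = urllib.request.Request(build_query_url(base_url, promql), headers=headers or {})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_vector(response.read().decode("utf-8"))
```

For example, query(base_url, 'sum by (group_id) (kafka_consumergroup_lag{instance_id="alikafka_pre-cn-example"})') would return one (labels, value) pair per consumer group, where base_url and the instance ID are hypothetical.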