ApsaraMQ for RocketMQ provides a dashboard for real-time data statistics that uses the metric storage and display capabilities of Alibaba Cloud ARMS Managed Service for Prometheus and Grafana. This feature helps you centrally collect and observe metrics from multiple dimensions to quickly understand the operational status of your business. This topic describes the scenarios, billing, metrics, and usage of the dashboard.
Scenarios
Scenario 1: You need to receive alerts and locate issues in a timely manner when exceptions occur during online message consumption.
Scenario 2: You need to check whether messages are sent as expected in the messaging system when the status of specific online orders is abnormal.
Scenario 3: You need to analyze the change trend of message traffic, the characteristics of traffic distribution, or message volume to help you analyze the business trend and make business plans.
Scenario 4: You need to view and analyze the upstream and downstream dependency topologies of applications to upgrade, optimize, or transform the architecture.
Prerequisites
Create a service-linked role.
Role name: AliyunServiceRoleForOns
Policy name: AliyunServiceRolePolicyForOns
Permissions: Allows ApsaraMQ for RocketMQ to use this role to access other Alibaba Cloud services, such as CloudMonitor and ARMS, to implement features for monitoring, alerting, and dashboards.
For more information, see Service-linked Role.
Billing
The dashboard metrics for ApsaraMQ for RocketMQ are basic metrics in ARMS Managed Service for Prometheus. Basic metrics are free of charge. Therefore, the dashboard feature is also free.
For more information, see Metrics and Pay-as-you-go.
Concepts
Before you view dashboard metrics, you need to understand the following concepts related to message accumulation.
The following figure shows the status of each message in a queue of a specific topic.

In the preceding figure, ApsaraMQ for RocketMQ calculates the number of messages and the processing duration at different processing stages. The metrics that are used in this process reflect the processing rate and message accumulation in the queue. By monitoring the metrics, you can determine whether exceptions occur during consumption. The following table describes the details of the metrics and the formulas that are used to calculate the metrics.
Category | Metric | Description | Calculation formula |
Message quantity | Inflight messages | The messages that a consumer client is processing and for which the client has not returned the consumption results. | Number of inflight messages = Offset of the latest pulled message - Offset of the latest acknowledged message |
Message quantity | Ready messages | The messages that are visible to consumers and are ready for consumption on the ApsaraMQ for RocketMQ broker. | Number of ready messages = Maximum offset - Offset of the latest pulled message |
Message quantity | Consumer lag | The total number of messages that are being processed or are ready to be processed. | Consumer lag = Number of inflight messages + Number of ready messages |
Duration | Ready time | | N/A |
Duration | Ready message queue time | The interval between the current point in time and the ready time of the earliest ready message. This metric indicates how soon a consumer pulls messages. | Ready message queue time = Current time - Ready time of the earliest ready message |
Duration | Consumer lag time | The interval between the ready time of the earliest unacknowledged message and the current time. This metric indicates how soon a consumer processes messages. | Consumer lag time = Current time - Ready time of the earliest unacknowledged message |
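The formulas in the preceding table can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the broker's implementation. The class name, field names, and sample offsets are hypothetical and chosen for readability.

```python
from dataclasses import dataclass


@dataclass
class QueueOffsets:
    """Offsets of one queue as seen by one consumer group (hypothetical names)."""
    max_offset: int          # maximum offset in the queue
    last_pulled_offset: int  # offset of the latest pulled message
    last_acked_offset: int   # offset of the latest acknowledged message


def inflight_messages(o: QueueOffsets) -> int:
    # Pulled by a consumer, but the consumption result is not yet returned.
    return o.last_pulled_offset - o.last_acked_offset


def ready_messages(o: QueueOffsets) -> int:
    # Visible on the broker, but not yet pulled by a consumer.
    return o.max_offset - o.last_pulled_offset


def consumer_lag(o: QueueOffsets) -> int:
    # Consumer lag = inflight messages + ready messages.
    return inflight_messages(o) + ready_messages(o)


def elapsed_ms(now_ms: int, ready_time_ms: int) -> int:
    # Shared shape of both duration formulas: current time - a ready time.
    return now_ms - ready_time_ms


offsets = QueueOffsets(max_offset=1000, last_pulled_offset=940, last_acked_offset=900)
print(inflight_messages(offsets))  # 40
print(ready_messages(offsets))     # 60
print(consumer_lag(offsets))       # 100
```

For the ready message queue time, pass the ready time of the earliest ready message to elapsed_ms. For the consumer lag time, pass the ready time of the earliest unacknowledged message.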
Metric details
The ApsaraMQ for RocketMQ dashboard provides the following metrics:
Producer: View metrics for a topic, such as the number of messages sent, send success rate, and send latency.
Consumer: View metrics related to a group's subscription to a specific topic, such as consumption volume, consumption success rate, and message accumulation.
Instance Top 20 overview: View the top 20 topics or groups for specific metric values within an instance.
Billing metrics: View metrics for an instance, such as message TPS, API calls, and average message size. These metrics can be used as a reference for estimating billing items.
The collection period for all metrics is 1 minute. ApsaraMQ for RocketMQ supports queries for data from the last 15 days. The maximum time range for a single query is 24 hours.
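If you query metric data programmatically, these limits mean that a long time range must be split into several queries. The following Python sketch shows one way to do this, assuming that you want consecutive windows of at most 24 hours and that data older than 15 days should be dropped. The function and constant names are hypothetical.

```python
from datetime import datetime, timedelta

MAX_WINDOW = timedelta(hours=24)  # maximum time range of a single query
RETENTION = timedelta(days=15)    # data is queryable for the last 15 days


def split_query_range(start: datetime, end: datetime, now: datetime):
    """Split [start, end) into consecutive windows of at most 24 hours.

    The range is first clamped to the last 15 days relative to `now`.
    """
    if end <= start:
        raise ValueError("end must be later than start")
    start = max(start, now - RETENTION)
    end = min(end, now)
    windows = []
    cursor = start
    while cursor < end:
        window_end = min(cursor + MAX_WINDOW, end)
        windows.append((cursor, window_end))
        cursor = window_end
    return windows


now = datetime(2024, 1, 20)
for window_start, window_end in split_query_range(
    datetime(2024, 1, 17), datetime(2024, 1, 19, 12), now
):
    print(window_start, "->", window_end)
```

In this example, a range of two and a half days is split into three windows: two full 24-hour windows and one 12-hour window.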
Producer
Metric | Description |
Message Production Rate | The message production rate and the API call rate for message production for a topic. Units:
|
Peak Message Production Rate | The maximum message production rate. Unit: messages/second. |
Total Messages Produced | The total number of messages produced in a specific instance. Unit: messages. |
Message Production Call Success Rate | The success rate of message production for a topic. |
Message Production Call Latency | The latency of message production for a topic. Unit: ms. |
Consumer
Metric | Description |
Average Consumption Success Rate | The consumption success rate for all messages in a specific instance. |
Accumulated Messages (Ready + Inflight) | The total number of accumulated messages in a specific instance, including ready and inflight messages. Unit: messages. |
Inflight Messages | The number of messages that are being processed by a consumer client but for which a success response has not been returned. Unit: messages. |
Ready Messages | The number of messages that are ready on the ApsaraMQ for RocketMQ server and can be consumed. This metric reflects the scale of messages that have not yet been processed by consumers. Unit: messages. |
Ready Message Queue Time | The time difference between the current time and the ready time of the earliest ready message. This metric reflects the latency of unprocessed messages and is a critical measure for time-sensitive services. The metric value in the overview represents the average ready message queue time for the instance. The metric value in a specific chart represents the ready message queue time for a specific group subscribing to a specific topic. Unit: ms. |
Message Consumption Rate | The rate at which a group consumes messages. Unit: messages/second. |
Peak Message Consumption Rate | The maximum message consumption rate. Unit: messages/second. |
Total Messages Consumed | The total number of messages consumed in a specific instance. Unit: messages. |
Consumption Accumulation | The number of accumulated messages for a group, including ready and inflight messages. Unit: messages. |
Message Processing Latency | The time it takes for a group to process a message, from the start of consumption to completion. Unit: ms. |
Consumer Local Wait Time | The time it takes for a message to be processed after it arrives at the consumer client. Unit: ms. |
Consumption Success Rate | The success rate of message consumption. |
Consumer Client Access Protocol Ratio | The ratio of consumed messages by protocol type. |
Instance Top 20 overview
Metric | Description |
Top 20 Topics by Message Production Rate | The top 20 topics with the highest message production rate. Unit: messages/second. |
Top 20 GroupIDs by Message Consumption Rate | The top 20 groups with the highest message consumption rate. Unit: messages/second. |
Top 20 GroupIDs by Number of Ready Messages | The top 20 groups with the most ready messages. Unit: messages. |
Top 20 GroupIDs by Ready Message Queue Time | The top 20 groups with the longest ready message queue time. Unit: ms. |
Top 20 GroupIDs by Number of Accumulated Messages (Ready + Inflight) | The top 20 groups with the most accumulated messages. Unit: messages. |
Top 20 GroupIDs by Number of Inflight Messages | The top 20 groups with the most inflight messages. Unit: messages. |
Top 20 GroupIDs by Consumption Processing Latency | The top 20 groups with the longest consumption processing latency. Unit: ms. |
Top 20 GroupIDs by Consumer Local Wait Time | The top 20 groups with the longest consumer local wait time. Unit: ms. |
Top 20 Topics by Message Production Call Failure Rate | The top 20 topics with the highest failure rate for message production. |
Top 20 GroupIDs by Message Consumption Failure Rate | The top 20 groups with the highest failure rate for message consumption. |
Billing metrics
The values of the following billing metrics include multipliers for large messages and advanced features.
Large message multiplier: The unit of measurement is 4 KB. For example, if you send a 16 KB message, the number of API calls is calculated as 16 KB / 4 KB = 4.
Advanced feature multiplier: The number of API calls for messages with advanced features, such as ordered, scheduled, delayed, and transactional messages, is five times the number of API calls for normal messages.
Metric | Description |
Peak Production TPS | The maximum message production TPS. This metric can be used as a reference for estimating the peak TPS specification in the instance's billing items. Unit: calls/second. |
Peak Consumption TPS | The maximum message consumption TPS. This metric can be used as a reference for estimating the peak TPS specification in the instance's billing items. Unit: calls/second. |
Peak TPS | The maximum value of the sum of message production TPS and message consumption TPS. This metric can be used as a reference for estimating the peak TPS specification in the instance's billing items. Unit: calls/second. |
Total API Calls | The total number of API calls. This metric can be used as a reference for estimating the number of API calls in the instance's billing items. Unit: calls. |
Average Message Size | The average size of all produced messages. Unit: bytes. |
Production And Consumption TPS | The sum of message production TPS and message consumption TPS. Unit: calls/second. |
Daily API Calls | The daily total number of API calls for message production and consumption. Unit: calls. |
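The two multipliers described before the table can be sketched as a small Python function. This topic does not state how a message size that is not a multiple of 4 KB is rounded, or how the two multipliers combine. The sketch assumes that the size is rounded up to the next 4 KB unit, that 1 KB equals 1,024 bytes, and that the two multipliers are multiplied together. Treat it as an estimate, not as the billing rule. The function and constant names are hypothetical.

```python
import math

BASE_UNIT_BYTES = 4 * 1024  # one 4 KB unit, assuming 1 KB = 1,024 bytes
ADVANCED_MULTIPLIER = 5     # ordered, scheduled, delayed, transactional messages


def estimated_api_calls(message_size_bytes: int, advanced: bool = False) -> int:
    """Estimate the API calls counted for sending one message."""
    # Assumption: partial 4 KB units are rounded up, with a minimum of 1.
    size_units = max(1, math.ceil(message_size_bytes / BASE_UNIT_BYTES))
    # Assumption: the size and feature multipliers are multiplied together.
    return size_units * (ADVANCED_MULTIPLIER if advanced else 1)


print(estimated_api_calls(16 * 1024))                # 4
print(estimated_api_calls(4 * 1024, advanced=True))  # 5
```

The first call reproduces the 16 KB example: 16 KB / 4 KB = 4 API calls. The second call shows a 4 KB message with an advanced feature, which counts as five API calls.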
Metrics Details
When calculating metrics related to message TPS, the number of messages sent and received, or the total number of messages, the base unit is a 4 KB normal message. Multipliers for message size and advanced message types are applied to this base unit.
The following table describes the fields in the metrics.
Field | Value |
Metric type | Gauge: A metric that can increase or decrease. Its value represents an instantaneous measurement of the statistical object. For example, the TPS of API calls. |
Label |
|
Server-side metrics
Metric type | Metric name | Unit | Description | Label |
Gauge | rocketmq_instance_requests_threshold | count/s | Instance throttling threshold. |
|
Gauge | rocketmq_instance_requests_max | count/s | The maximum TPS of an instance per minute. Requests that are throttled are not included. Rule: The maximum value among the 60 TPS samples taken within 1 minute. |
|
Producer metrics
Metric type | Metric name | Unit | Description | Label |
Gauge | rocketmq_producer_requests (commercialCount, billable requests) | count | Number of API calls related to sending messages. |
|
Gauge | rocketmq_producer_messages | message | Number of sent messages. |
|
Gauge | rocketmq_producer_message_size_bytes | byte | Total size of sent messages. |
|
Gauge | rocketmq_producer_send_success_rate | % | Send success rate. |
|
Gauge | rocketmq_producer_failure_api_calls | count | Number of failed API calls for sending messages. |
|
Gauge | rocketmq_producer_send_rt_milliseconds_avg | ms | Average latency of sending messages. |
|
Gauge | rocketmq_producer_send_rt_milliseconds_min | ms | Minimum latency of sending messages. |
|
Gauge | rocketmq_producer_send_rt_milliseconds_max | ms | Maximum latency of sending messages. |
|
Gauge | rocketmq_producer_send_rt_milliseconds_p95 | ms | P95 latency of sending messages. |
|
Gauge | rocketmq_producer_send_rt_milliseconds_p99 | ms | P99 latency of sending messages. |
|
Consumer metrics
Metric type | Metric name | Unit | Description | Label |
Gauge | rocketmq_consumer_requests | count | Number of API calls related to consuming messages. |
|
Gauge | rocketmq_consumer_send_back_requests | count | Number of API calls to send back messages that failed to be consumed. |
|
Gauge | rocketmq_consumer_send_back_messages | message | Messages that failed to be consumed and were sent back by consumers. |
|
Gauge | rocketmq_consumer_messages | message | Number of consumed messages. |
|
Gauge | rocketmq_consumer_message_size_bytes | byte | Size of consumed messages (accumulated over one minute). |
|
Gauge | rocketmq_consumer_ready_and_inflight_messages | message | Message consumption lag (includes ready and inflight messages). |
|
Gauge | rocketmq_consumer_ready_messages | message | Number of ready messages. Actual accumulation: maxOffset - lastPullOffset |
|
Gauge | rocketmq_consumer_inflight_messages | message | Number of inflight messages. Rule: lastPullOffset - committedOffset |
|
Gauge | rocketmq_consumer_queue_time_milliseconds | ms | Message queue time. |
|
Gauge | rocketmq_consumer_message_await_time_milliseconds_avg | ms | Average time that a message waits for processing resources on the consumer client. |
|
Gauge | rocketmq_consumer_message_await_time_milliseconds_min | ms | Minimum time that a message waits for processing resources on the consumer client. |
|
Gauge | rocketmq_consumer_message_await_time_milliseconds_max | ms | Maximum time that a message waits for processing resources on the consumer client. |
|
Gauge | rocketmq_consumer_message_await_time_milliseconds_p95 | ms | P95 time that a message waits for processing resources on the consumer client. |
|
Gauge | rocketmq_consumer_message_await_time_milliseconds_p99 | ms | P99 time that a message waits for processing resources on the consumer client. |
|
Gauge | rocketmq_consumer_message_process_time_milliseconds_avg | ms | Average message processing latency for a consumer. |
|
Gauge | rocketmq_consumer_message_process_time_milliseconds_min | ms | Minimum message processing latency for a consumer. |
|
Gauge | rocketmq_consumer_message_process_time_milliseconds_max | ms | Maximum message processing latency for a consumer. |
|
Gauge | rocketmq_consumer_message_process_time_milliseconds_p95 | ms | P95 message processing latency for a consumer. |
|
Gauge | rocketmq_consumer_message_process_time_milliseconds_p99 | ms | P99 message processing latency for a consumer. |
|
Gauge | rocketmq_consumer_consume_success_rate | % | Message consumption success rate. |
|
Gauge | rocketmq_consumer_failure_api_calls | count | Number of failed API calls for consumption. |
|
Gauge | rocketmq_consumer_to_dlq_messages | message | Number of messages sent to the dead-letter queue (DLQ). |
|
View the dashboard
Log on to the ApsaraMQ for RocketMQ console. In the left-side navigation pane, click Instances.
In the top navigation bar, select a region, such as China (Hangzhou). On the Instances page, click the name of the instance that you want to manage.
Use one of the following methods to view the dashboard:
On the Instance Details page, click the Dashboard tab.
In the left-side navigation pane of the Instance Details page, click Dashboard.
In the left-side navigation pane of the Instance Details page, click Topics. On the page that appears, click the name of the topic that you want to manage. On the Topic Details page, click the Dashboard tab.
In the left-side navigation pane of the Instance Details page, click Groups. On the page that appears, click the name of the group that you want to manage. On the Group Details page, click the Dashboard tab.
Dashboard FAQ
How do I obtain dashboard metric data?
Log on to the ARMS console with your Alibaba Cloud account.
In the navigation pane on the left, click Integration Center.
On the Integration Center page, enter RocketMQ in the search box and click the search icon.
In the search results, select the Alibaba Cloud service that you want to integrate, such as Alibaba Cloud RocketMQ (4.0) Service. For more information, see Step 1: Integrate monitoring data of an Alibaba Cloud service.
After the integration is successful, click Provisioning in the navigation pane on the left.
In the Cloud Service Area Environment list, click the name of the target environment to go to its details page.
On the Component Management tab, in the Basic Information section, click the region of the Prometheus Instance.
On the Settings tab, you can find different data access methods.
How do I integrate metric data provided by the dashboard of ApsaraMQ for RocketMQ into a self-managed Grafana system?
All metric data on the dashboard of ApsaraMQ for RocketMQ is stored in Alibaba Cloud Managed Service for Prometheus. Follow the procedure in the "How do I obtain dashboard metric data?" section to integrate the monitoring data of ApsaraMQ for RocketMQ into Managed Service for Prometheus and obtain the environment name and HTTP API URL. Then, use the HTTP API URL to integrate the metric data into a self-managed Grafana system. For more information, see Use an HTTP API URL to connect a Prometheus instance to a self-managed Grafana system.
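Outside Grafana, you can also read the same metrics with a script. The following Python sketch builds a range-query URL in the format of the open source Prometheus HTTP API (the /api/v1/query_range path with the query, start, end, and step parameters). It assumes that the HTTP API URL you obtained accepts this standard path. The base URL in the example is a placeholder, the function name is hypothetical, and authentication is not covered.

```python
from urllib.parse import urlencode


def build_query_range_url(base_url: str, promql: str,
                          start_ts: int, end_ts: int,
                          step_seconds: int = 60) -> str:
    """Build a Prometheus range-query URL.

    step_seconds defaults to 60 to match the 1-minute collection period.
    """
    params = urlencode({
        "query": promql,
        "start": start_ts,  # Unix timestamp in seconds
        "end": end_ts,      # Unix timestamp in seconds
        "step": step_seconds,
    })
    return f"{base_url.rstrip('/')}/api/v1/query_range?{params}"


# Placeholder base URL: replace it with the HTTP API URL from the Settings tab.
url = build_query_range_url(
    "https://example.invalid/prometheus/",
    "rocketmq_consumer_ready_messages",
    1700000000,
    1700003600,
)
print(url)
```

The query in this example uses a bare metric name. In practice, you would usually add label filters, such as filters for an instance, topic, or group, by using the label names that your Prometheus instance exposes.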
How do I understand the average TPS and max TPS of an instance?
Average TPS = Total requests in 1 minute / 60 seconds
Max TPS: Within a 1-minute statistical period, the TPS value is sampled once per second. The max TPS is the highest of these 60 sampled values.
For example:
Assume that an instance produces 60 messages in 1 minute. All messages are normal messages and each is 4 KB in size. The production rate of the instance is 60 messages per minute.
Average instance TPS = 60 calls / 60 seconds = 1 call per second
The max instance TPS is calculated as follows:
If the 60 messages are sent in the first second, the TPS values for each second of that minute are 60, 0, 0, ..., 0.
Max instance TPS = 60 calls per second.
If 40 messages are sent in the first second and 20 messages are sent in the next second, the TPS values for each second of that minute are 40, 20, 0, 0, ..., 0.
Max instance TPS = 40 calls per second.
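The two cases in this example can be reproduced with a short Python sketch. It assumes that you already have the 60 per-second request counts for one minute. The function names are hypothetical.

```python
SECONDS_PER_MINUTE = 60


def average_tps(per_second_counts):
    # Average TPS = total requests in 1 minute / 60 seconds.
    return sum(per_second_counts) / SECONDS_PER_MINUTE


def max_tps(per_second_counts):
    # Max TPS = the highest of the 60 per-second samples.
    return max(per_second_counts)


# Case 1: all 60 messages are sent in the first second.
burst = [60] + [0] * 59
# Case 2: 40 messages in the first second, 20 in the next second.
split = [40, 20] + [0] * 58

print(average_tps(burst), max_tps(burst))  # 1.0 60
print(average_tps(split), max_tps(split))  # 1.0 40
```

Both cases have the same average TPS because the total number of requests in the minute is the same. Only the max TPS reflects how unevenly the requests are distributed within the minute.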