Message Queue for Apache RocketMQ uses Prometheus Service and Grafana Service, both provided by Application Real-Time Monitoring Service (ARMS), to offer the dashboard feature. Prometheus Service collects and stores metrics, and Grafana Service displays them. The dashboard feature lets you monitor metrics and collect metric data in one place and across multiple dimensions, so that you can track the status of your business. This topic describes the billing, scenarios, and available metrics of the dashboard feature, and explains how to use the feature.

Prerequisites

  • ARMS Prometheus Service is activated. For more information, see Activate and upgrade ARMS.
  • The service-linked role is created.
    • Role name: AliyunServiceRoleForOns.
    • Role policy name: AliyunServiceRolePolicyForOns.
    • Permission description: allows Message Queue for Apache RocketMQ to assume this role to access CloudMonitor and ARMS to implement the monitoring, alerting, and dashboard features.
    • Reference: Service linked role for.

Billing

Dashboard metrics that are used in Message Queue for Apache RocketMQ are basic metrics that are used in Alibaba Cloud ARMS Prometheus Service. You are not charged for basic metrics. Therefore, you are not charged for using the dashboard feature.

For more information, see Basic metrics and Billing description of Alibaba Cloud Prometheus Monitoring.

Scenarios

  • Troubleshooting message accumulation: The dashboard feature collects and displays metrics that describe message accumulation, including the number of ready messages, the ready message queuing duration, and the message consumption duration. For example, when a large number of messages are accumulated, you can compare the ready message queuing duration with the message consumption duration to determine whether the accumulation is caused by the consumption logic or by insufficient consumer client capacity.
  • O&M inspection: You can view the overview information that is generated during message delivery and subscription for a specified instance, as well as the data of produced messages in all topics and consumed messages in all groups of the instance. Message trends and errors in the query results help you evaluate the status of the instance and the impact on your business. Monitoring overall data across multiple dimensions also helps you identify abnormal metrics and improves inspection efficiency.
  • Capacity and cost evaluation: You can view relevant metrics to evaluate the capacity or consumption of an instance and release unnecessary resources to reduce costs. For example, you can sort all topics in a specified instance by the number of sent and received messages. If no messages are sent to a topic over a long time range, you can check whether the service that uses the topic is disconnected and whether the corresponding resources can be released.
  • Business planning: You can plan your business architecture based on changes in relevant metrics. For example, you can observe the peaks and troughs of metrics such as the number of sent and received messages and the message transactions per second (TPS) on a daily, weekly, or monthly basis. This helps you analyze workload trends and provides a reference for subsequent business optimization.

Concepts

This section introduces concepts related to metrics that are used in message accumulation. These concepts can help you understand the dashboard metrics that are used in Message Queue for Apache RocketMQ.
[Figure: Concepts]
The preceding figure shows the status of each message in a queue of a specific topic.
  • Inflight message: A message that is being processed on the consumer client and for which no success response is returned.
  • Ready message: A message that is ready on the Message Queue for Apache RocketMQ broker and can be consumed by consumers.

    The Ready messages metric reflects the number of messages that have not been processed by consumers.

    • Ready time of a ready message
      • For a normal message, the ready time equals the point in time when the normal message is stored.
      • For a scheduled message, the ready time equals the point in time that is scheduled for the broker to deliver the message. For a delayed message, the ready time equals the point in time when the specified delay period elapses.
      • For a transactional message, the ready time equals the point in time when the transaction is committed.
    • Ready message queuing duration: the offset between the current point in time and the point in time when the earliest message is ready.

      The Ready message queue time metric reflects the delay period for ready messages before they are processed. This metric is important for time-sensitive workloads.

      For example, in the preceding figure, the ready time of the first ready message M1 is 12:00:00, and the ready time of the last ready message M2 is 12:00:30. If the current point in time is 12:00:50, the ready message queuing duration can be calculated based on the following formula: Current point in time (12:00:50) - Ready time of M1 (12:00:00) = 50 seconds.
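
The calculation in the example above can be sketched in a few lines of Python. This is only an illustration of the definition: the function name `ready_queuing_duration` and the calendar date are hypothetical, and the sketch is not how the broker computes the metric.

```python
from datetime import datetime, timedelta
from typing import List


def ready_queuing_duration(now: datetime, ready_times: List[datetime]) -> timedelta:
    """Return the offset between `now` and the earliest ready time.

    Hypothetical helper: returns a zero duration when no message is ready.
    """
    if not ready_times:
        return timedelta(0)
    # Only the earliest ready message matters; later ones (such as M2) do not.
    return now - min(ready_times)


# Times from the example above; the date itself is arbitrary.
m1 = datetime(2024, 1, 1, 12, 0, 0)    # ready time of the first ready message
m2 = datetime(2024, 1, 1, 12, 0, 30)   # ready time of the last ready message
now = datetime(2024, 1, 1, 12, 0, 50)  # current point in time

duration = ready_queuing_duration(now, [m1, m2])
print(duration.total_seconds())  # 50.0
```

Note that the ready time of M2 does not affect the result, because the metric is driven by the earliest ready message only.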

Metrics

In Message Queue for Apache RocketMQ, the metrics that are used in the dashboard of an instance are displayed in the following sections:
  • Producer: displays metrics that collect message production statistics for a specific topic or all topics, such as the number of sent messages, the success rate of message sending, and the sending duration.
  • Consumer: displays metrics that collect message consumption statistics for a specific group or all groups, such as the number of consumed messages from a specific topic, the success rate of message consumption, and message accumulation.
  • Instance top 20 info: displays the top 20 values of some metrics for a specified instance and the topic or group to which each value corresponds.
  • Billing metrics overview: displays metrics that collect billing statistics for a specified instance, such as the message sending TPS, the message consumption TPS, the number of API calls, and the average message size. These metrics can be used to estimate the billable items of the instance.
Notice: Message Queue for Apache RocketMQ collects data of each metric every minute. You can query message data that is generated within the last 15 days. The maximum time range for a query is 24 hours.
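
The two query limits can be checked before you pick a custom time range. The following Python sketch is an illustration only: the function `is_valid_query_range` is hypothetical and not part of any Message Queue for Apache RocketMQ or ARMS API, and it assumes that both limits are inclusive.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(days=15)        # data of the last 15 days can be queried
MAX_QUERY_SPAN = timedelta(hours=24)  # a single query covers at most 24 hours


def is_valid_query_range(start: datetime, end: datetime, now: datetime) -> bool:
    """Check a custom time range against the limits stated in the notice.

    Hypothetical helper; assumes the 15-day and 24-hour limits are inclusive.
    """
    if start >= end or end > now:
        return False  # empty, reversed, or future range
    if now - start > RETENTION:
        return False  # starts before the retained data
    return end - start <= MAX_QUERY_SPAN


now = datetime(2024, 1, 20, 12, 0, 0)  # arbitrary reference time
print(is_valid_query_range(now - timedelta(hours=6), now, now))  # True
print(is_valid_query_range(now - timedelta(days=2), now, now))   # False
```

To inspect a period longer than 24 hours under these limits, you would query consecutive windows of at most 24 hours each.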

Producer

Metric Description
Send message rate

The rate at which messages are sent to a specified topic or all topics and the rate at which API operations are called to send messages.

Unit:
  • Message sending rate: messages per second.
  • API call rate: calls per second.
Max send message rate

The maximum rate at which messages are sent.

Unit: messages per second.

Total sent messages

The total number of messages produced in a specified instance.

Unit: messages.

Send API call success rate

The percentage of successful API calls that are made to send messages to a specified topic or all topics.
Send RT

The amount of time that is used to send a message to a topic.

Unit: milliseconds.

Consumer

Metric Description
Avg consumption success rate

The percentage of all messages that are successfully consumed in a specified instance.
Consumer lag

The total number of accumulated messages in a specified instance, including ready messages and inflight messages.

Unit: messages.

Inflight messages

The number of messages that are being processed on the consumer clients and for which no success response is returned.

Unit: messages.

Ready messages

The number of messages that are ready on the Message Queue for Apache RocketMQ broker and can be consumed by consumers.

This metric reflects the number of messages that have not been processed by consumers.

Unit: messages.

Ready message queue time

The offset between the current point in time and the point in time when the earliest message is ready.

This metric reflects the delay period for ready messages before they are processed. This metric is important for time-sensitive workloads.

The metric value in the overview information indicates the average queuing duration of ready messages in a specified instance. The metric value in a specific chart indicates the queuing duration of ready messages in a specified topic to which a specified group subscribes.

Unit: milliseconds.

Receive message rate

The rate at which a specified group or all groups consume messages.

Unit: messages per second.

Max receive message rate

The maximum rate at which a specified group or all groups consume messages.

Unit: messages per second.

Total received messages

The total number of consumed messages in a specified instance.

Unit: messages.

Consumer lag

The number of accumulated messages for a specified group or all groups, including ready messages and inflight messages.

Unit: messages.

Message processing time

The amount of time that is used to consume a message in a specified group or all groups.

Unit: milliseconds.

Wait to process time

The amount of time between the point in time when the consumer client receives a message and the point in time when the group starts to consume the message.

Unit: milliseconds.

Consumption success rate

The percentage of messages that are successfully consumed.

Consumption messages each protocol

The proportion of consumed messages of each client protocol.

Instance top 20 info

Metric Description
Send message rate per Topic

The top 20 topics with the highest message sending rate.

Unit: messages per second.

Receive message rate per GroupID

The top 20 groups with the highest message consumption rate.

Unit: messages per second.

Ready messages per GroupID

The top 20 groups with the largest number of ready messages.

Unit: messages.

Ready message queue time per GroupID

The top 20 groups with the longest queuing duration of ready messages.

Unit: milliseconds.

Consumer lag per GroupID

The top 20 groups with the largest number of accumulated messages.

Unit: messages.

Inflight messages per GroupID

The top 20 groups with the largest number of messages that are being processed.

Unit: messages.

Message processing time per GroupID

The top 20 groups with the longest message consumption duration.

Unit: milliseconds.

Message wait time per GroupID

The top 20 groups with the longest wait duration before a message is consumed.

Unit: milliseconds.

Send API call failure rate per Topic

The top 20 topics with the highest failure rate for API calls that are made to send messages.

Consumption failure rate per GroupID

The top 20 groups with the highest failure rate for message consumption.

Billing metrics overview

Note: The values of the following billing metrics are the results that are calculated by using the large message multiple and the featured message multiple.
  • Large message multiple: The message body in each API request has a size limit of 4 KB. If you need to send a message that is larger than 4 KB, you must use multiple API requests to send the message. For example, if you need to send a message of 16 KB, the number of API calls is calculated by using the following formula: Message size (16 KB)/4 KB = 4 calls.
  • Featured message multiple: Each API call that is made to send or subscribe to a featured message is counted as five API calls for a normal message. Featured messages include ordered messages, scheduled messages, delayed messages, and transactional messages.
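
The two multiples can be illustrated with a short Python sketch. Several points here are assumptions that the text above does not state: 1 KB is taken as 1,024 bytes, a message whose size is not an exact multiple of 4 KB is rounded up to the next whole call, and the two multiples are combined by multiplication. The function name `billed_api_calls` is hypothetical.

```python
CHUNK_BYTES = 4 * 1024   # 4 KB limit per API request (assumes 1 KB = 1,024 bytes)
FEATURED_MULTIPLE = 5    # featured messages are counted five times


def billed_api_calls(message_bytes: int, featured: bool = False) -> int:
    """Estimate the number of counted API calls for one message.

    Hypothetical helper. Assumptions: partial 4 KB chunks round up, an empty
    message still counts as one call, and the two multiples are multiplied.
    """
    # Integer ceiling division avoids floating-point rounding.
    large_multiple = max(1, (message_bytes + CHUNK_BYTES - 1) // CHUNK_BYTES)
    featured_multiple = FEATURED_MULTIPLE if featured else 1
    return large_multiple * featured_multiple


print(billed_api_calls(16 * 1024))                 # 4, as in the 16 KB example above
print(billed_api_calls(16 * 1024, featured=True))  # 20 under the stated assumptions
```

Treat the result as an estimate only and refer to the billing documentation for the authoritative calculation.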
Metric Description
Max send TPS

The maximum TPS for sending messages to a specified topic or all topics. This metric can be used to estimate the maximum TPS specification in the billable items for an instance.

Unit: TPS.

Max receive TPS

The maximum TPS for message consumption of a specified group or all groups. This metric can be used to estimate the maximum TPS specification in the billable items for an instance.

Unit: TPS.

Max TPS

The maximum sum of the message sending TPS and the message consumption TPS. This metric can be used to estimate the maximum TPS specification in the billable items for an instance.

Unit: TPS.

Total API calls

The total number of API calls. This metric can be used to estimate the number of API calls in the billable items for an instance.

Unit: calls.

Average message size

The average size of all messages that are produced.

Unit: bytes.

Send and receive TPS

The sum of the message sending TPS and the message consumption TPS.

Unit: TPS.

Total API calls per day

The sum of the number of API calls that are made to send messages and the number of API calls that are made to subscribe to messages on a daily basis.

Unit: calls.
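
The difference between Max send TPS, Max receive TPS, and Max TPS can be easy to miss. The following Python sketch reads the Max TPS description as the peak of the combined send and receive TPS over time, which is not necessarily the sum of the two individual peaks. This reading is an interpretation of the description above, the sample values are made up, and the function name `max_tps` is hypothetical.

```python
from typing import Sequence


def max_tps(send_tps: Sequence[float], receive_tps: Sequence[float]) -> float:
    """Return the peak of (send TPS + receive TPS) over aligned samples.

    Hypothetical helper; assumes both series are sampled at the same points in time.
    """
    return max(s + r for s, r in zip(send_tps, receive_tps))


# Made-up samples for three consecutive intervals.
send = [100, 300, 150]
receive = [250, 100, 200]

print(max(send))               # 300, the peak send TPS
print(max(receive))            # 250, the peak receive TPS
print(max_tps(send, receive))  # 400, the peak of the combined TPS
```

In this sample, the send and receive peaks occur in different intervals, so the combined peak (400) is lower than the sum of the two peaks (550).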

View the dashboard of an instance

  1. Log on to the Message Queue for Apache RocketMQ console.
  2. In the left-side navigation pane, click Instances.
  3. In the top navigation bar, select a region such as China (Hangzhou).
  4. On the Instances page, click the name of the instance that you want to manage.
  5. Use one of the following methods to view the dashboard:
    • View the dashboard on the Instance Details page: Navigate to the Instance Details page and click the Statistics tab.
    • View the dashboard on the Dashboard page: In the left-side navigation pane, click Dashboard.
  6. At the top of the dashboard, click the Topic and GroupID drop-down lists. In each drop-down list, select the topic or group that you want to view. If you do not select a topic or group, the metrics of all topics or groups in the instance are queried by default.
    Notice: When you view the dashboard on the Instance Details page, you cannot select a topic or group. By default, the metrics of all topics and groups in the specified instance are queried.
  7. In the upper-right corner of the dashboard, click the time range drop-down list and select a predefined relative time range from the drop-down list. You can also specify the start time and end time to customize an absolute time range.
    After the configuration is complete, the Dashboard page displays the metric data within the specified time range. For more information about the metrics, see Metrics.