ApsaraMQ for RocketMQ: Dashboard

Last Updated: Oct 31, 2023

ApsaraMQ for RocketMQ integrates with Managed Service for Prometheus and Managed Service for Grafana, both of which are provided by Application Real-Time Monitoring Service (ARMS), to provide the dashboard feature. Managed Service for Prometheus collects and stores metrics, and Managed Service for Grafana displays them. The dashboard feature lets you monitor metrics across multiple dimensions in one place so that you can track the status of your business. This topic describes the common scenarios, prerequisites, and billing of the dashboard feature, the available dashboard metrics, and how to view the dashboard.

Common scenarios

  • Scenario 1: You need to receive alerts and locate issues in a timely manner when exceptions occur during online message consumption.

  • Scenario 2: You need to check whether messages are sent as expected in the messaging system when the status of some online orders is abnormal.

  • Scenario 3: You need to analyze message traffic trends, traffic distribution, or message volume to understand business trends and make business plans.

  • Scenario 4: You need to view and analyze the upstream and downstream dependency topologies of applications to upgrade, optimize, or transform the architecture.

Prerequisites

  • Managed Service for Prometheus is activated. For more information, see Activate ARMS.

  • The service-linked role is created.

    • Role name: AliyunServiceRoleForOns.

    • Role policy name: AliyunServiceRolePolicyForOns.

    • Permission description: Allows ApsaraMQ for RocketMQ to assume the role to access CloudMonitor and ARMS to implement the monitoring, alerting, and dashboard features.

    • For more information, see Service-linked roles.

Billing

Dashboard metrics that are used in ApsaraMQ for RocketMQ are basic metrics in Managed Service for Prometheus. You are not charged for basic metrics in Managed Service for Prometheus. Therefore, you can use the dashboard feature of ApsaraMQ for RocketMQ free of charge.

For more information, see Metrics and Pay-as-you-go.

Terms

Before you use the dashboard feature, make sure that you understand the following terms, which are related to message accumulation.

The following figure shows the status of each message in a queue of a specified topic.

[Figure: Status of messages in a queue]

In the preceding figure, ApsaraMQ for RocketMQ calculates the number of messages and the processing duration at different processing stages. The metrics that are used in this process reflect the processing rate and message accumulation in the queue. By monitoring the metrics, you can determine whether exceptions occur during consumption. The following table describes the details of the metrics and the formulas that are used to calculate the metrics.

| Category | Metric | Definition | Calculation formula |
| --- | --- | --- | --- |
| Message quantity | Inflight messages | The messages that a consumer client is processing and for which the client has not yet returned a consumption result. | Number of inflight messages = Offset of the latest pulled message - Offset of the latest acknowledged message |
| Message quantity | Ready messages | The messages that are visible to consumers and are ready for consumption on the ApsaraMQ for RocketMQ broker. | Number of ready messages = Maximum offset - Offset of the latest pulled message |
| Message quantity | Consumer lag | The total number of messages that are being processed or are ready to be processed. | Consumer lag = Number of inflight messages + Number of ready messages |
| Duration | Ready time | Normal or ordered message: the time when the message is stored to the broker. Scheduled message: the time that is scheduled for the broker to deliver the message. Delayed message: the time when the specified delay period elapses. Transactional message: the time when the transaction is committed. | N/A |
| Duration | Ready message queue time | The interval between the ready time of the earliest ready message and the current time. This metric indicates how promptly consumers pull messages. | Ready message queue time = Current time - Ready time of the earliest ready message |
| Duration | Consumer lag time | The interval between the ready time of the earliest unacknowledged message and the current time. This metric indicates how promptly consumers process messages. | Consumer lag time = Current time - Ready time of the earliest unacknowledged message |
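The calculation formulas for message quantity and duration can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the QueueState class, its field names, and the sample offsets are hypothetical and are not part of any ApsaraMQ for RocketMQ SDK.

```python
from dataclasses import dataclass


@dataclass
class QueueState:
    """Hypothetical snapshot of one queue as seen by one consumer group."""
    max_offset: int            # maximum offset in the queue
    latest_pulled_offset: int  # offset of the latest pulled message
    latest_acked_offset: int   # offset of the latest acknowledged message


def inflight_messages(q: QueueState) -> int:
    # Pulled by consumers but not yet acknowledged.
    return q.latest_pulled_offset - q.latest_acked_offset


def ready_messages(q: QueueState) -> int:
    # Visible on the broker but not yet pulled.
    return q.max_offset - q.latest_pulled_offset


def consumer_lag(q: QueueState) -> int:
    return inflight_messages(q) + ready_messages(q)


def elapsed_ms(now_ms: int, ready_time_ms: int) -> int:
    """Current time minus a ready time.

    Pass the ready time of the earliest ready message to get the ready
    message queue time, or the ready time of the earliest unacknowledged
    message to get the consumer lag time.
    """
    return now_ms - ready_time_ms


state = QueueState(max_offset=1000, latest_pulled_offset=900, latest_acked_offset=850)
print(inflight_messages(state))  # 50
print(ready_messages(state))     # 100
print(consumer_lag(state))       # 150
```

Because the consumer lag is the sum of the other two quantities, it also equals the maximum offset minus the offset of the latest acknowledged message.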

Metric details

The following categories of metrics are displayed on the dashboard of ApsaraMQ for RocketMQ:

  • Producer: displays metrics that collect message production statistics for a specific topic or all topics, such as the number of sent messages, the success rate of message sending, and the sending duration.

  • Consumer: displays metrics that collect message consumption statistics for a specific group or all groups, such as the number of consumed messages from a specific topic, the success rate of message consumption, and message accumulation.

  • Instance top 20 info: displays the top 20 values of some metrics for a specific instance and the topic or group to which each value corresponds.

  • Billing metrics overview: displays metrics that collect billing statistics for a specific instance, such as the message sending TPS, the message consumption TPS, the number of API calls, and the average message size. These metrics can be used to estimate the billable items of the instance.

Important

ApsaraMQ for RocketMQ collects the data of each metric every minute. You can query metric data that was generated in the previous 15 days. The maximum time range for a query is 24 hours.
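If you need to review a period longer than 24 hours, you can split it into consecutive queries. The following Python sketch shows one way to do this under the limits described above. The query_windows function and its parameters are hypothetical helpers for illustration and are not part of any ApsaraMQ for RocketMQ API.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(days=15)    # data from the previous 15 days can be queried
MAX_WINDOW = timedelta(hours=24)  # maximum time range of a single query


def query_windows(start: datetime, end: datetime, now: datetime):
    """Split [start, end] into consecutive windows of at most 24 hours.

    The start time is clamped to the 15-day retention boundary.
    """
    start = max(start, now - RETENTION)
    windows = []
    while start < end:
        stop = min(start + MAX_WINDOW, end)
        windows.append((start, stop))
        start = stop
    return windows


now = datetime(2023, 10, 31, 12, 0)
windows = query_windows(now - timedelta(days=3), now, now)
print(len(windows))  # 3
```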

Producer

| Metric | Description |
| --- | --- |
| Send message rate | The rate at which messages are sent to a specific topic or all topics, and the rate at which API operations are called to send messages. Units: messages per second (message sending rate); calls per second (API call rate). |
| Max send message rate | The maximum rate at which messages are sent. Unit: messages per second. |
| Total sent messages | The total number of messages that are sent on a specific instance. Unit: messages. |
| Send API call success rate | The percentage of successful API calls that are initiated to send messages to a specific topic or all topics. |
| Send RT | The amount of time that is used to send a message to a topic. Unit: milliseconds. |

Consumer

| Metric | Description |
| --- | --- |
| Avg consumption success rate | The percentage of messages that are successfully consumed on a specific instance. |
| Consumer lag | The total number of accumulated messages on a specific instance, including ready messages and inflight messages. Unit: messages. |
| Inflight messages | The number of messages that consumer clients are processing and for which no success response has been returned. Unit: messages. |
| Ready messages | The number of messages that are ready for consumption on the ApsaraMQ for RocketMQ broker. This metric reflects the number of messages that consumers have not yet started to process. Unit: messages. |
| Ready message queue time | The interval between the current time and the time when the earliest ready message became ready. This metric reflects how long ready messages wait before they are processed and is important for time-sensitive workloads. The value in the overview information is the average queuing time of ready messages on a specific instance. The value in a specific chart is the queuing time of ready messages in a specific topic to which a specific group subscribes. Unit: milliseconds. |
| Receive message rate | The rate at which a specific group or all groups consume messages. Unit: messages per second. |
| Max receive message rate | The maximum rate at which a specific group or all groups consume messages. Unit: messages per second. |
| Total received messages | The total number of consumed messages on a specific instance. Unit: messages. |
| Consumer lag | The number of accumulated messages in a specific group or all groups, including ready messages and inflight messages. Unit: messages. |
| Message processing time | The amount of time that is used to consume a message in a specific group or all groups. Unit: milliseconds. |
| Wait to process time | The time between when a consumer client receives a message and when the client starts to consume it. Unit: milliseconds. |
| Consumption success rate | The percentage of messages that are successfully consumed. |
| Consumption messages each protocol | The proportion of consumed messages for each client protocol. |

Instance top 20 info

| Metric | Description |
| --- | --- |
| Send message rate per Topic | The top 20 topics that have the highest message sending rate. Unit: messages per second. |
| Receive message rate per GroupID | The top 20 groups that have the highest message consumption rate. Unit: messages per second. |
| Ready messages per GroupID | The top 20 groups that have the largest number of ready messages. Unit: messages. |
| Ready message queue time per GroupID | The top 20 groups whose ready messages have the longest queuing time. Unit: milliseconds. |
| Consumer lag per GroupID | The top 20 groups that have the largest number of accumulated messages. Unit: messages. |
| Inflight messages per GroupID | The top 20 groups that have the largest number of inflight messages. Unit: messages. |
| Message processing time per GroupID | The top 20 groups that have the longest message consumption duration. Unit: milliseconds. |
| Message wait time per GroupID | The top 20 groups that have the longest wait duration before a message is consumed. Unit: milliseconds. |
| Send API call failure rate per Topic | The top 20 topics that have the highest failure rate of API calls that are initiated to send messages. |
| Consumption failure rate per GroupID | The top 20 groups that have the highest failure rate of message consumption. |

Billing metrics overview

Note

The values of the billing metrics in the following table are calculated by using the multiple for large messages and the multiple for advance-featured messages.

  • Large messages: The number of API calls that are initiated to send and receive a large message is calculated in units of 4 KB. For example, sending a 16 KB message is counted as four API calls: 16/4 = 4.

  • Advance-featured messages: The number of API calls that are initiated to send and receive an advance-featured message is counted as five times the number of API calls that are initiated to send and receive a normal message. Advance-featured messages include ordered messages, scheduled messages, delayed messages, and transactional messages.
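The two multiples can be illustrated with a short Python sketch. It rests on two assumptions that this topic does not state: message sizes are rounded up to the next 4 KB unit, and the two multiples combine by multiplication when a message is both large and advance-featured. The billed_api_calls function is a hypothetical helper, not an ApsaraMQ for RocketMQ API.

```python
import math

BASE_SIZE_KB = 4       # size unit used to count large messages
ADVANCED_MULTIPLE = 5  # multiple for advance-featured messages


def billed_api_calls(size_kb: float, advanced: bool = False) -> int:
    """Estimate the API calls counted for sending or receiving one message."""
    # Assumption: sizes round up to the next 4 KB unit, with a minimum of one call.
    size_multiple = max(1, math.ceil(size_kb / BASE_SIZE_KB))
    # Assumption: the size multiple and the feature multiple are multiplied.
    feature_multiple = ADVANCED_MULTIPLE if advanced else 1
    return size_multiple * feature_multiple


print(billed_api_calls(16))                 # 4, as in the 16 KB example
print(billed_api_calls(4, advanced=True))   # 5
print(billed_api_calls(16, advanced=True))  # 20 under the multiplication assumption
```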

| Metric | Description |
| --- | --- |
| Max send TPS | The maximum TPS for sending messages. This metric can be used to estimate the peak TPS in the billable items of an instance. Unit: TPS. |
| Max receive TPS | The maximum TPS for receiving messages. This metric can be used to estimate the peak TPS in the billable items of an instance. Unit: TPS. |
| Max TPS | The maximum sum of the TPS for sending messages and the TPS for receiving messages. This metric can be used to estimate the peak TPS in the billable items of an instance. Unit: TPS. |
| Total API calls | The total number of API calls. This metric can be used to estimate the number of API calls in the billable items of an instance. Unit: calls. |
| Average message size | The average size of all messages that are sent. Unit: bytes. |
| Send and receive TPS | The sum of the TPS for sending messages and the TPS for receiving messages. Unit: TPS. |
| Total API calls per day | The sum of the number of API calls that are initiated to send messages and the number of API calls that are initiated to receive messages on a daily basis. Unit: calls. |

View the dashboard

  1. Log on to the ApsaraMQ for RocketMQ console. In the left-side navigation pane, click Instances.

  2. In the top navigation bar, select a region. Example: China (Hangzhou). Then, click the ID of the instance that you want to manage.

  3. Use one of the following methods to view the dashboard:

    • On the Instance Details page, click the Dashboard tab.

    • In the left-side navigation pane of the Instance Details page, click Dashboard.

    • In the left-side navigation pane of the Instance Details page, click Topics. On the page that appears, click the name of the topic that you want to manage. On the Topic Details page, click the Dashboard tab.

    • In the left-side navigation pane of the Instance Details page, click Groups. On the page that appears, click the name of the group that you want to manage. On the Group Details page, click the Dashboard tab.

FAQ about the dashboard

How do I import metric data on the dashboard of ApsaraMQ for RocketMQ to a self-managed Grafana system?

All metric data on the dashboard of ApsaraMQ for RocketMQ is stored in Alibaba Cloud Managed Service for Prometheus. You can use the APIs provided by Managed Service for Prometheus to import this metric data to a self-managed Grafana system.

For more information, see Import data from Managed Service for Prometheus to a local Grafana system.

What are the average TPS and maximum TPS of an instance?

  • Average TPS = (Total number of API calls in a minute)/60

  • Maximum TPS: The system records one TPS value per second over each 1-minute cycle. The largest of the 60 values is the maximum TPS.

Examples:

An ApsaraMQ for RocketMQ instance produces 60 normal messages in a specific minute. If each message is 4 KB in size, each message is counted as one API call, and the message production rate of the instance is 60 messages per minute. The following example shows how to calculate the average TPS and maximum TPS of the instance.

Average TPS: 60/60 = 1

Maximum TPS:

  • If all 60 messages are sent in the first second, the TPS value for the first second is 60, and the TPS values for the other 59 seconds are all 0.

    In this case, the maximum TPS of the instance is 60.

  • If 40 messages are sent in the first second and 20 messages are sent in the next second, the TPS value for the first second is 40, the TPS value for the next second is 20, and the TPS values for the other 58 seconds are all 0.

    In this case, the maximum TPS of the instance is 40.
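The two cases in this example can be reproduced with a minimal Python sketch. The function names and the per-second lists are hypothetical; each list holds the number of API calls counted in each of the 60 seconds of the minute.

```python
def average_tps(calls_per_second):
    """Total number of API calls in the minute divided by 60."""
    return sum(calls_per_second) / 60


def maximum_tps(calls_per_second):
    """Largest of the 60 per-second TPS values."""
    return max(calls_per_second)


burst = [60] + [0] * 59          # all 60 messages sent in the first second
two_step = [40, 20] + [0] * 58   # 40 messages, then 20 messages

print(average_tps(burst), maximum_tps(burst))        # 1.0 60
print(average_tps(two_step), maximum_tps(two_step))  # 1.0 40
```

The average TPS is the same in both cases because the total number of API calls is the same. Only the maximum TPS reflects how the calls are distributed within the minute.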