ApsaraMQ for RocketMQ: Configure the monitoring and alerting feature for risk warning

Last Updated: Jan 09, 2025

ApsaraMQ for RocketMQ allows you to configure alert rules by using CloudMonitor. Alert rules help you monitor the status and key metrics of your instance in real time and notify you of exceptions promptly, which serves as risk warning in production environments.

Background information

ApsaraMQ for RocketMQ provides fully managed messaging services and a Service Level Agreement (SLA) for each instance edition. After you purchase an ApsaraMQ for RocketMQ instance, the capabilities of the instance, such as the messaging transactions per second (TPS) and message storage capabilities, are guaranteed.

You do not need to worry about the instance performance. However, you must monitor your instance usage in production environments to make sure that you do not exceed the limits that are specified for your instance. ApsaraMQ for RocketMQ integrates with CloudMonitor to provide monitoring and alerting services that are free of charge and ready for immediate use. You can use the services to monitor the following items:

  • Instance usage

    If your actual instance usage exceeds the specification limit, ApsaraMQ for RocketMQ forcibly throttles the instance. To prevent faults that are caused by instance throttling, you can configure the instance usage alert in advance and upgrade your instance configurations when an excess usage risk is detected.

  • Business logic errors

    Errors may occur when you send and receive messages. You can configure the invocation error alert to detect and fix errors and prevent negative impacts on your business.

  • Performance metrics

    If performance metrics such as response time (RT) and message delay are required for your message system, you can configure the corresponding metric alerts in advance to prevent business risks.

Rules for configuring alerts

ApsaraMQ for RocketMQ provides various metrics and monitoring and alerting items. For more information, see Dashboard and Monitoring and alerting. Monitoring items can be divided into the following categories: resource usage, messaging performance, and messaging errors.

Based on accumulated best practices in production environments, we recommend that you follow the rules that are described in the following table to configure alerts.

Note

The following monitoring items are basic configurations that are recommended by Alibaba Cloud. ApsaraMQ for RocketMQ also provides other monitoring items. You can configure fine-grained alerts based on your business requirements. For more information, see Monitoring and alerting.

| Category | Monitoring item | Configuration timing and reason | Related personnel |
| --- | --- | --- | --- |
| Resource usage | API calls on an instance | We recommend that you configure this item immediately after an instance is created. The resource usage of an instance is not determined by one topic or group. You must consider the overall resource usage of the instance. | Resource operators |
| Messaging performance | Message sending TPS in a topic; message receiving TPS in a consumer group; message accumulation in a consumer group; consumption delay time in a consumer group | We recommend that you configure these items immediately after your business is launched. After your business is launched, you must estimate the messaging performance of your business. | Resource operators; business developers |
| Messaging errors | Generation of dead-letter messages; number of times that throttling occurs | We recommend that you configure these items immediately after your business is launched. After your business is launched, you must predict failures that may occur during message production. This helps you troubleshoot issues. | Resource operators; business developers |

Procedure for configuring alerts

  1. Log on to the ApsaraMQ for RocketMQ console. In the left-side navigation pane, click Instances.

  2. In the top navigation bar, select a region, such as China (Hangzhou). On the Instances page, click the name of the instance that you want to manage.

  3. In the left-side navigation pane, click Monitoring and Alerts. In the upper-left corner of the page that appears, click Create Alert Rule.

Best practices

Configure alerts about the number of API calls on an instance

  • Background: In ApsaraMQ for RocketMQ, the number of API calls that you can initiate to send and receive messages on an instance is measured by messaging TPS. A peak messaging TPS is specified for each instance. If the messaging TPS on your instance exceeds the specification limit, the instance is throttled. For example, a peak TPS of 5,000 is specified for a Standard Edition instance. If the limit is exceeded, the instance is throttled.

  • Risks caused by not configuring the alerts: If you do not configure the alerts, you receive no warning before the number of API calls exceeds the specification limit. As a result, your instance is throttled and some messages fail to be sent or received.

  • Configuration timing: We recommend that you configure the alerts after the instance is created.

  • Recommended threshold: We recommend that you set the alert threshold to 70% of the peak messaging TPS of the instance. For example, if the peak messaging TPS of the instance that you purchased is 10,000, set the alert threshold to 7,000. You can view the peak messaging TPS of an ApsaraMQ for RocketMQ instance on the Instance Details page in the ApsaraMQ for RocketMQ console.

  • Alert handling: After you receive an alert about the number of API calls, we recommend that you perform the following steps to handle the alert:

    1. On the Instance Details page, click the Dashboard tab.

    2. In the Overview of instance message volume section, view the TPS Max value curve in Metrics related to instance request times (production and consumption).

    3. In the Message Business Metrics Overview section, view the curves in Message production rate top20 Topics (bar/minute) and message consumption rate top20 GroupIDs (per minute). Then, find the topic and group whose data is abnormal and analyze whether the business changes are normal.

    4. If the business changes are abnormal, contact your users for further analysis.

    5. If the business changes are normal, the computing specification of your instance is insufficient to maintain normal business operations. In this case, we recommend that you upgrade your instance configurations. For more information, see Upgrade or downgrade instance configurations.
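
The 70% rule described in this section can be sketched as a small calculation. This is a minimal illustration only: the function names `alert_threshold` and `should_alert` are hypothetical helpers, not part of any ApsaraMQ for RocketMQ or CloudMonitor API, and the 0.7 ratio is the recommendation from the text.

```python
def alert_threshold(peak_tps: int, ratio: float = 0.7) -> int:
    """Return the alert threshold for a given peak messaging TPS.

    The default ratio of 0.7 follows the recommendation in this topic.
    """
    if peak_tps <= 0:
        raise ValueError("peak_tps must be positive")
    if not 0 < ratio <= 1:
        raise ValueError("ratio must be in the range (0, 1]")
    return round(peak_tps * ratio)


def should_alert(observed_tps: float, peak_tps: int, ratio: float = 0.7) -> bool:
    """Return True when the observed TPS reaches the alert threshold."""
    return observed_tps >= alert_threshold(peak_tps, ratio)


# Example from the text: a peak messaging TPS of 10,000 gives a threshold of 7,000.
print(alert_threshold(10_000))
```

Alerting at 70% rather than 100% leaves headroom to upgrade the instance before throttling starts.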

Configure alerts about the number of messages sent by producers or received by consumers per minute

  • Background: ApsaraMQ for RocketMQ provides metrics to monitor messaging TPS by topic and consumer group. You can use the metrics to monitor messaging TPS in a specific business item and understand your business scale.

  • Risks caused by not configuring the alerts: Messaging TPS in a topic or consumer group reflects the number of API calls that are initiated to send and receive messages in that topic or group. If you do not configure the alerts, you are not notified when traffic drops to zero or spikes. This may cause unexpected risks.

  • Configuration timing: We recommend that you configure the alerts after your business stabilizes.

Configure alerts about the number of messages sent by producers per minute

  • Recommended threshold: We recommend that you configure the threshold based on the traffic volume after your business stabilizes.

  • Alert handling: After you receive an alert about the number of messages sent by producers per minute, we recommend that you perform the following steps to handle the alert:

    1. On the Topics page, click the name of the topic configured in the alert rule.

    2. On the Topic Details page, click the Dashboard tab.

    3. View the Production curve in Message volume (pieces/minute). Then, determine whether the changes are normal based on the business model.

Configure alerts about the number of messages received by consumers per minute

  • Recommended threshold: We recommend that you configure the threshold based on the traffic volume after your business stabilizes.

  • Alert handling: After you receive an alert about the number of messages received by consumers per minute, we recommend that you perform the following steps to handle the alert:

    1. On the Groups page, click the ID of the group configured in the alert rule.

    2. On the Group Details page, click the Dashboard tab.

    3. View the Consumption curve in Trend of message production and consumption rate (bar/minute). Then, determine whether the changes are normal based on the business model.
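
The two conditions that these alerts are meant to catch, traffic dropping to zero and traffic spikes, can be sketched as a comparison against a baseline taken after your business stabilizes. This is an illustrative sketch only: `classify_traffic` and the `spike_factor` default of 2.0 are assumptions chosen for the example, not values or APIs defined by ApsaraMQ for RocketMQ.

```python
from statistics import mean


def classify_traffic(history: list[int], current: int, spike_factor: float = 2.0) -> str:
    """Classify the latest per-minute message count against a stable baseline.

    history: messages per minute sampled after the business stabilized.
    current: message count in the most recent minute.
    spike_factor: hypothetical multiplier above the baseline that counts as a spike.
    """
    if not history:
        raise ValueError("history must not be empty")
    baseline = mean(history)
    if current == 0 and baseline > 0:
        return "dropped_to_zero"
    if current > baseline * spike_factor:
        return "spike"
    return "normal"


stable_minutes = [100, 120, 110]
print(classify_traffic(stable_minutes, 0))
print(classify_traffic(stable_minutes, 400))
```

In practice, the threshold depends on your business model, which is why the recommendation is to derive it from traffic observed after the business stabilizes.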

Configure message accumulation alerts

Note

Statistics about message accumulation may fluctuate and contain errors. We recommend that you do not set the threshold for accumulated messages to less than 100. If your business is affected even when the number of accumulated messages is small, we recommend that you configure consumption delay time alerts to monitor message accumulation.

  • Background: ApsaraMQ for RocketMQ allows you to monitor message accumulation by consumer group. You can use message accumulation alerts to prevent faults that are caused by message accumulation.

  • Risks caused by not configuring the alerts: Message accumulation is a typical scenario and capability of ApsaraMQ for RocketMQ. In scenarios in which messages must be processed in real time, you must monitor and manage the number of accumulated messages to prevent negative impacts caused by message accumulation on your business.

  • Configuration timing: We recommend that you configure the alerts after your business stabilizes.

  • Recommended threshold: We recommend that you configure the threshold based on the actual performance of your business.

  • Alert handling: After you receive a message accumulation alert, we recommend that you perform the following steps to handle the alert:

    1. On the Groups page, click the ID of the group configured in the alert rule.

    2. On the Group Details page, click the Dashboard tab.

    3. View the Accumulation curve in Stacking related indicators (bars). Then, analyze the change trend of accumulated messages and find the start time of message accumulation.

    4. Analyze the cause of message accumulation based on business changes and application logs. For more information, see How can I handle accumulated messages?

    5. Determine whether to scale out consumer applications or fix the consumption logic defect based on the cause of message accumulation.
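
The note at the start of this section implies a simple choice between alert types. The following sketch expresses it as code; `recommend_alert_type` and the returned labels are hypothetical names used for illustration, and only the minimum of 100 comes from the text.

```python
# Minimum accumulation threshold recommended in the note of this section.
MIN_ACCUMULATION_THRESHOLD = 100


def recommend_alert_type(desired_threshold: int) -> str:
    """Suggest which alert to configure for a desired accumulation threshold.

    Accumulation statistics may fluctuate, so thresholds below 100 are
    better served by a consumption delay time alert.
    """
    if desired_threshold < 0:
        raise ValueError("desired_threshold must not be negative")
    if desired_threshold < MIN_ACCUMULATION_THRESHOLD:
        return "consumption_delay_time"
    return "message_accumulation"


print(recommend_alert_type(50))
print(recommend_alert_type(500))
```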

Configure consumption delay time alerts

Note

Consumption delay time is calculated based on the delay time of the first unconsumed message in a consumer group. Consumption delay time is cumulative and sensitive to business changes. After you receive a consumption delay time alert, you must determine whether a small number of messages or all messages are delayed.

  • Background: ApsaraMQ for RocketMQ allows you to monitor consumption delay by consumer group. The consumption delay time alert provides a detailed metric for analyzing message accumulation.

  • Risks caused by not configuring the alerts: Message accumulation is a typical scenario and capability of ApsaraMQ for RocketMQ. In scenarios in which messages must be processed in real time, you must monitor and manage the number of accumulated messages to prevent negative impacts caused by message accumulation on your business.

  • Configuration timing: We recommend that you configure the alerts after your business stabilizes.

  • Recommended threshold: We recommend that you configure the threshold based on the actual performance of your business.

  • Alert handling: After you receive a consumption delay time alert, we recommend that you perform the following steps to handle the alert:

    1. On the Groups page, click the ID of the group configured in the alert rule.

    2. On the Group Details page, click the Dashboard tab.

    3. View the Accumulation curve in Stacking related indicators (bars). Then, analyze the change trend of accumulated messages and find the start time of message accumulation.

    4. Analyze the cause of message accumulation based on business changes and application logs. For more information, see How can I handle accumulated messages?

    5. Determine whether to scale out consumer applications or fix the consumption logic defect based on the cause of message accumulation.
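
As the note in this section explains, consumption delay time is based on the first unconsumed message, so one stuck message can trigger an alert even if the rest are fresh. The sketch below shows that calculation and a way to tell the two cases apart. It is an assumption-laden illustration: the function names and the timestamp list are hypothetical, and the service computes the metric on the server side.

```python
def consumption_delay_seconds(unconsumed_ready_times: list[float], now: float) -> float:
    """Delay of the earliest unconsumed message, or 0.0 if nothing is pending.

    unconsumed_ready_times: times (in seconds) at which each unconsumed
    message became available for consumption.
    """
    if not unconsumed_ready_times:
        return 0.0
    return max(0.0, now - min(unconsumed_ready_times))


def delayed_fraction(unconsumed_ready_times: list[float], now: float,
                     threshold_seconds: float) -> float:
    """Fraction of unconsumed messages whose delay exceeds the threshold."""
    if not unconsumed_ready_times:
        return 0.0
    delayed = sum(1 for t in unconsumed_ready_times if now - t > threshold_seconds)
    return delayed / len(unconsumed_ready_times)


pending = [400.0, 990.0, 995.0, 998.0]
print(consumption_delay_seconds(pending, now=1000.0))
print(delayed_fraction(pending, now=1000.0, threshold_seconds=60.0))
```

A large delay combined with a small delayed fraction suggests that a few messages are stuck, whereas a large fraction suggests that consumers are falling behind overall.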

Configure dead-letter message alerts

  • Background: ApsaraMQ for RocketMQ supports dead-letter messages. Messages that still fail to be consumed after the specified maximum number of retries is reached are sent to dead-letter queues, where you can manage them. Monitoring the number of dead-letter messages helps you detect unexpected issues and exceptions in your business.

  • Risks caused by not configuring the alerts: Dead-letter messages are messages that cannot be correctly processed by consumers. Consumption applications must handle dead-letter messages. If you do not configure dead-letter message alerts, message consumption may be incomplete.

  • Configuration timing: We recommend that you configure the alerts after your business stabilizes.

  • Recommended threshold: We recommend that you configure the threshold based on the traffic volume after your business stabilizes.

  • Alert handling: After you receive an alert about dead-letter messages, we recommend that you perform the following steps to handle the alert:

    1. Query dead-letter messages and analyze the original messages. For more information, see Dead-letter queues.

    2. Query the consumption traces of the messages and analyze the cause of consumption failure based on the topics and message IDs. For more information, see Query a message trace.

    3. Determine the appropriate solution based on the cause of message consumption failure.
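
The dead-letter behavior described in this section can be illustrated with a small simulation. This sketch does not use the RocketMQ client SDK: `consume_with_retries`, the `handler` callback, and the in-memory `dead_letter_queue` list are hypothetical stand-ins, and in the real service the broker performs the retries and the routing to dead-letter queues.

```python
from typing import Callable


def consume_with_retries(message: str, handler: Callable[[str], bool],
                         max_retries: int, dead_letter_queue: list[str]) -> bool:
    """Deliver a message once, then retry up to max_retries times.

    Returns True if the handler eventually succeeds. Otherwise the message
    is appended to dead_letter_queue and False is returned.
    """
    for _attempt in range(1 + max_retries):
        if handler(message):
            return True
    dead_letter_queue.append(message)
    return False


dlq: list[str] = []
consume_with_retries("order-1", lambda m: False, max_retries=3, dead_letter_queue=dlq)
print(dlq)
```

A growing dead-letter queue means that some messages were never processed successfully, which is why the alert handling steps start by querying the dead-letter messages and their consumption traces.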

Configure alerts about the number of times that throttling occurs

  • Background: ApsaraMQ for RocketMQ allows you to use events that trigger throttling on a specific instance as alert metrics. This helps you understand negative impacts on your business.

  • Risks caused by not configuring the alerts: Frequent throttling indicates that your traffic regularly exceeds the specification limit, in which case we recommend that you upgrade your instance configurations. If you do not configure the alerts, frequent throttling can go unnoticed.

  • Configuration timing: The recommended timing depends on the scope of the alerts:

    • We recommend that you configure alerts about the number of times that throttling occurs on an instance after the instance is created.

    • We recommend that you configure alerts about the number of times that throttling occurs in a topic or consumer group after your business stabilizes.

  • Recommended threshold: We recommend that you configure the threshold based on the actual performance of your business.

  • Alert handling: After you receive an alert about the number of times that throttling occurs, we recommend that you perform the following steps to handle the alert:

    1. On the Instance Details page, click the Dashboard tab.

    2. In the Overview of instance message volume section, view the curve in Number of throttling requests. Then, analyze the time when throttling occurs and the rules for throttling.

    3. In the Message Business Metrics Overview section, view the curve in Message production rate top20 Topics (bar/minute). Then, find the topic whose data is abnormal based on the time when throttling occurs and the rules for throttling and view the curve of the topic to determine whether the traffic increase meets your business requirements.

    4. If the traffic increase meets your business requirements, upgrade your instance configurations. Otherwise, troubleshoot the issue.