- On the Group Details page, the value of the Real-time Accumulated Messages field for the group ID is higher than expected.
- A message trace query shows that some messages were sent to the broker but were not delivered to consumers. To run the query, click Message Tracing in the left-side navigation pane, click Create Query Task on the page that appears, and then click the Query by Message ID tab in the Create Query Task dialog box and set the parameters.
In Message Queue for Apache RocketMQ, messages are first sent to the broker. The client that is configured with the group ID then pulls a batch of messages from the broker to the local machine for consumption, based on the current consumer offset. In most cases, pulling messages from the broker does not cause accumulation. Messages accumulate when the client cannot consume them fast enough, typically because consuming each message takes too long or the consumption concurrency is too low. For more information about the consumption mechanism and message accumulation causes, see Message accumulation and latency.
If messages are accumulated, perform the following operations for troubleshooting:
- Determine whether messages are accumulated on the Message Queue for Apache RocketMQ broker or on the client.
  Check the local log file ons.log of the client and search for the following information:
  the cached message count exceeds the threshold
  - If the preceding information is found, the local buffer queue on the client is full and messages are accumulated on the client. Go to Step 2.
  - If the preceding information is not found, messages are not accumulated on the client. In this case, submit a ticket to contact Alibaba Cloud Customer Services.
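The log check in Step 1 can be scripted. The following is a minimal sketch, not an official tool: the function name `count_threshold_hits` is hypothetical, and the location of ons.log depends on your client configuration, so the path is passed in as an argument rather than assumed.

```shell
# Count how many lines of a client log report that the local buffer is full.
# $1 = path to the client's ons.log file (location depends on your setup).
count_threshold_hits() {
  log_file="$1"
  # -F treats the pattern as a fixed string; -c prints the number of matching lines.
  grep -c -F "the cached message count exceeds the threshold" "$log_file"
}
```

A count greater than 0 suggests that messages are accumulated on the client, in which case you can continue with Step 2.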
- Check whether the message consumption time is reasonable.
  You can view the consumption time by using one of the following methods:
  - Log on to the Message Queue for Apache RocketMQ console and perform message tracing. View the consumption time of a single message in the Consumer section. For more information, see Query a message trace.
  - Log on to the Message Queue for Apache RocketMQ console and view the consumption status. In the Connection Information section, view the business processing time to obtain the average consumption time. For more information, see View the status of consumers.
  - Use other Alibaba Cloud monitoring services such as Application Real-Time Monitoring Service (ARMS) to collect the message consumption time.
  Then, proceed based on the result:
  - If consumption takes too long, go to Step 3 to view the client stack information and troubleshoot the specific business logic.
  - If the consumption time is normal, messages may be accumulated due to low consumption concurrency. Gradually increase the number of consumption threads or add nodes.
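When the consumption time is normal and you need to raise concurrency, a rough starting point can be estimated from the message rate and the average consumption time. The following sketch assumes the relationship threads ≈ message rate × average consumption time, which is a general queueing estimate rather than a formula from the product documentation; the function name and both inputs are hypothetical.

```shell
# Rough estimate of the total consumption threads needed across all consumer nodes.
# $1 = incoming messages per second (assumed to be measured by you)
# $2 = average consumption time per message, in milliseconds
estimate_consume_threads() {
  tps="$1"
  avg_ms="$2"
  # ceil(tps * avg_ms / 1000) using integer arithmetic
  echo $(( (tps * avg_ms + 999) / 1000 ))
}
```

For example, `estimate_consume_threads 500 40` prints 20. Treat the result as a lower bound and increase the thread count gradually while watching the accumulation value, because downstream systems may not tolerate a sudden jump in concurrency.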
- View the client stack information. You only need to look at the threads named ConsumeMessageThread, which run the business logic that consumes messages. For more information about how to determine the thread status and modify the business logic based on specific problems, see the official Java documentation.
  You can obtain the client stack information by using one of the following methods:
  - Log on to the Message Queue for Apache RocketMQ console and view the consumer status. In the Connection Information section, view the stack information. For more information, see View the status of consumers.
  - Use the jstack tool to print stack information.
    - Obtain the host IP address of the consumer instance that has accumulated messages, and log on to the host. For more information, see View the consumer status.
    - Run one of the following commands to view the PID of the Java process and note it:
      ps -ef | grep java
      jps -lm
    - Run the following command to save the stack information to a file. Replace pid with the PID that you noted:
      jstack -l pid > /tmp/pid.jstack
    - Run the following command to view information about the threads named ConsumeMessageThread:
      cat /tmp/pid.jstack|grep ConsumeMessageThread -A 10 --color
  Typical stack information is similar to the following examples:
  - Example 1: The consumption thread is idle and no messages are accumulated.
    When the consumption thread is idle, it is in the WAITING state and waits to obtain messages from the consumption task queue.
  - Example 2: The consumption logic is stuck in situations such as lock contention or sleep.
    The consumption thread is blocked on an internal sleep() method, resulting in slow consumption.
  - Example 3: The consumption logic is stuck when operations are performed on external storage devices such as databases.
    In this example, the consumption thread is blocked on an external HTTP call, causing slow consumption.
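To get a quick overview before reading individual stacks, you can tally the states of the consumption threads in a saved dump. The following is a minimal sketch that assumes the usual jstack text layout, in which the `java.lang.Thread.State:` line directly follows each thread header line; the function name `summarize_consume_threads` is hypothetical.

```shell
# Count ConsumeMessageThread threads per thread state in a jstack dump file.
# $1 = path to the dump, for example the file written by jstack in the steps above.
summarize_consume_threads() {
  dump_file="$1"
  awk '
    # A thread header line starts with the quoted thread name.
    /^"ConsumeMessageThread/ {
      # Read the next line, which is expected to hold the thread state.
      getline
      if ($1 == "java.lang.Thread.State:") count[$2]++
    }
    END { for (state in count) print state, count[state] }
  ' "$dump_file" | sort
}
```

If most threads are WAITING, the consumer is likely idle, as in Example 1. A large share of TIMED_WAITING or BLOCKED threads, or RUNNABLE threads parked in I/O calls, points to the business logic described in Examples 2 and 3.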
- If the accumulated messages have affected your business and can be discarded, you can reset the consumer offset to skip the accumulated messages and resume consumption. The consumer client must be online when you reset the consumer offset. For more information, see Reset consumer offsets.