The working mechanism, version compatibility, and usage notes of the consumption retry feature of ApsaraMQ for RocketMQ

If a message fails to be consumed, ApsaraMQ for RocketMQ redelivers the message based on the consumption retry policy. This topic describes the common scenarios, working mechanism, version compatibility, and usage notes of the consumption retry feature.

Common scenarios

The consumption retry feature of ApsaraMQ for RocketMQ ensures consumption integrity when consumption logic fails. This feature is a protective measure against consumption logic failures and cannot be used to control the business process.

Recommended scenarios
- Business processing fails because of the message content. For example, the transaction status that corresponds to the message has not been returned yet, and consumption is expected to succeed after a period of time.
- The consumption failure of one message does not necessarily cause the consumption failure of other messages. If one message fails to be consumed, the probability of consumption success for subsequent messages is high. In this case, you can perform retries on the failed message to prevent processes from being blocked.

Unrecommended scenarios
- Consumption failures are used as conditions to determine the consumption logic that needs to be executed. This can cause issues because the consumption logic assumes that a large number of messages may fail to be consumed.
- Consumption failures are used to limit the rate of message processing. Throttling is meant to temporarily hold excess messages in the queue for later processing, not to send them into the retry flow.

Purpose

When message-oriented middleware is used for asynchronous decoupling, a challenge that needs to be overcome is how to ensure the integrity of the invocation chain if the downstream service fails to process messages. As a financial-grade message-oriented middleware, ApsaraMQ for RocketMQ supports reliable transmission and uses a well-designed message acknowledgment mechanism and retry policies to ensure that every message is processed as expected.

Understanding the message acknowledgment mechanism and retry policies of ApsaraMQ for RocketMQ helps resolve the following issues:

- How to ensure the integrity of message consumption. You can use the message acknowledgment mechanism and retry policies to ensure that every message is fully processed. This prevents messages from being ignored when exceptions occur and ensures status consistency.
- How to restore the status of inflight messages when exceptions occur. You can use the message acknowledgment mechanism and retry policies to restore the status of inflight messages and check the status consistency when exceptions such as service failures occur.

Consumption retry policies

A consumption retry policy defines the maximum number of retries and the interval between two consecutive retries after a message fails to be consumed.

Trigger conditions

A message fails to be consumed. In this case, the consumer returns a failure status or the system throws an exception.
A timeout error occurs during the processing of a message, including a timeout error in the push consumer queue.

Behaviors

Retry status: controls the status and the logic of status changes during message retries.
Retry interval: the interval from the point in time when the consumption failure or timeout of a message occurs to the point in time when the next consumption of the message starts.
Maximum retries: the maximum number of times that a message can be retried for consumption.

Policy differences

Consumption retry policies use different retry mechanisms and configuration methods based on consumer types. The following table describes the differences between the policies.

Consumer type	Retry status	Retry interval	Maximum retries
Push consumer	Ready, Inflight, WaitingRetry, Commit, DLQ, Discard	Specified in the metadata when a consumer group is created. Unordered messages: incremental. Ordered messages: fixed.	Specified in the ApsaraMQ for RocketMQ console or by calling API operations. For more information, see Change maximum retries.
Simple consumer	Ready, Inflight, Commit, DLQ, Discard	Specified in the InvisibleDuration parameter in the corresponding API operation.	Specified in the ApsaraMQ for RocketMQ console or by calling API operations. For more information, see Change maximum retries.

For more information, see Retry policies for messages consumed in Push mode and Retry policies for messages consumed in Simple mode.

Retry policies for messages consumed in Push mode

Retry status

If a message is consumed in Push mode, the message can be in one of the following states.

Figure: state machine for messages consumed in Push mode

Ready
The message is ready to be consumed in the ApsaraMQ for RocketMQ broker.
Inflight
The message is obtained and being consumed by the consumer, but the consumption result is not returned.
WaitingRetry
This state is exclusive to messages consumed in Push mode. If a message fails to be consumed or the consumption of the message times out, the consumption retry logic is triggered. As long as the maximum number of retries has not been reached, the message enters the WaitingRetry state. After the retry interval elapses, the message returns to the Ready state and can be consumed again. You can increase the interval between two consecutive retries to prevent frequent retries.
Commit
The message is consumed. After the consumer returns a success response, the consumption is complete.
DLQ
A mechanism that is used to ensure the implementation of consumption logic. If the dead-letter message feature is enabled, a message that fails to be consumed after the maximum number of retries is reached is sent to the dead-letter topic. You can consume dead-letter messages to restore your business. For more information, see Dead-letter messages.
Discard
If the dead-letter message feature is not enabled, a message that fails to be consumed after the maximum number of retries is reached is discarded.
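
The state transitions described above can be sketched as a small model. This is a minimal illustration under stated assumptions, not the broker's implementation: the class `PushRetryStateMachine`, its enum, and both methods are hypothetical names introduced here, and timing is left out entirely.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the Push-mode message states; not an SDK or broker type. */
final class PushRetryStateMachine {
    enum State { READY, INFLIGHT, WAITING_RETRY, COMMIT, DLQ, DISCARD }

    /**
     * Returns the state that follows one finished consumption attempt.
     * retriesSoFar counts the retries already performed, excluding the original delivery.
     */
    static State afterAttempt(boolean success, int retriesSoFar, int maxRetries, boolean dlqEnabled) {
        if (success) {
            return State.COMMIT;
        }
        if (retriesSoFar < maxRetries) {
            // The message waits for the retry interval, then becomes READY again.
            return State.WAITING_RETRY;
        }
        // Retries are exhausted: dead-letter topic if enabled, otherwise discard.
        return dlqEnabled ? State.DLQ : State.DISCARD;
    }

    /** Traces the states of a message whose every consumption attempt fails. */
    static List<State> traceAlwaysFailing(int maxRetries, boolean dlqEnabled) {
        List<State> trace = new ArrayList<>();
        int retries = 0;
        while (true) {
            trace.add(State.READY);
            trace.add(State.INFLIGHT);
            State next = afterAttempt(false, retries, maxRetries, dlqEnabled);
            trace.add(next);
            if (next != State.WAITING_RETRY) {
                return trace;
            }
            retries++;
        }
    }
}
```

With a maximum of three retries, `traceAlwaysFailing(3, true)` passes through INFLIGHT four times (one original delivery plus three retries) before it ends in DLQ.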

Figure: message retry interval

The preceding figure shows a sample retry process. In the figure, the message remains in the Ready state for 5 seconds and requires 6 seconds to be consumed.

Each time the message is retried, the status of the message changes from Ready to Inflight and then to WaitingRetry. A retry interval refers to the interval from the point in time when the consumption failure or timeout of a message occurs to the point in time when the next consumption of the message starts. The interval between two consecutive consumptions includes the retry interval, the consumption duration, and the period of time for which the message remains in the Ready state. Example:

The first time a message is delivered for consumption, the message enters the Ready state at the 0th second.
The message is pulled at the 5th second. After 6 seconds of processing, at the 11th second, a consumption error occurs and the client returns a consumption failure.
The message cannot be immediately retried because a 10-second retry interval is specified.
At the 21st second, the message re-enters the Ready state.
Five seconds later, the client starts to consume the message again.

The consumption interval is calculated as 21 seconds by using the following formula: Consumption interval = Consumption duration + Retry interval + Duration in the Ready state = 6 + 10 + 5 = 21.
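
The formula above can be checked with a few lines of arithmetic. The sketch below only restates the formula with the values from the example; the class and method names are hypothetical and introduced for illustration.

```java
import java.time.Duration;

/** Hypothetical helper that restates the consumption interval formula. */
final class ConsumptionIntervalExample {
    /** Consumption interval = consumption duration + retry interval + duration in the Ready state. */
    static Duration consumptionInterval(Duration consumption, Duration retryInterval, Duration ready) {
        return consumption.plus(retryInterval).plus(ready);
    }

    public static void main(String[] args) {
        Duration interval = consumptionInterval(
                Duration.ofSeconds(6),   // consumption duration
                Duration.ofSeconds(10),  // retry interval
                Duration.ofSeconds(5));  // duration in the Ready state
        System.out.println(interval.getSeconds() + " seconds"); // prints "21 seconds"
    }
}
```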

Retry intervals

The retry intervals for unordered messages are incremental. The following table describes the details.

Retry number	Interval	Retry number	Interval
1	10 seconds	9	7 minutes
2	30 seconds	10	8 minutes
3	1 minute	11	9 minutes
4	2 minutes	12	10 minutes
5	3 minutes	13	20 minutes
6	4 minutes	14	30 minutes
7	5 minutes	15	1 hour
8	6 minutes	16	2 hours

Note

If the number of retries exceeds 16, the interval for all subsequent retries is 2 hours.
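
As a rough illustration, the table and the note above can be expressed as a lookup. The class `UnorderedRetryIntervals` and the method `intervalFor` are hypothetical names; the values are copied from the table, and the broker's actual implementation may differ.

```java
import java.time.Duration;

/** Hypothetical lookup of the incremental retry intervals for unordered messages. */
final class UnorderedRetryIntervals {
    // Index 0 holds the interval before retry 1, index 15 the interval before retry 16.
    private static final Duration[] INTERVALS = {
        Duration.ofSeconds(10), Duration.ofSeconds(30), Duration.ofMinutes(1), Duration.ofMinutes(2),
        Duration.ofMinutes(3), Duration.ofMinutes(4), Duration.ofMinutes(5), Duration.ofMinutes(6),
        Duration.ofMinutes(7), Duration.ofMinutes(8), Duration.ofMinutes(9), Duration.ofMinutes(10),
        Duration.ofMinutes(20), Duration.ofMinutes(30), Duration.ofHours(1), Duration.ofHours(2)
    };

    /** Returns the interval before the given retry (1-based); retries beyond 16 use 2 hours. */
    static Duration intervalFor(int retryNumber) {
        if (retryNumber < 1) {
            throw new IllegalArgumentException("retryNumber must be at least 1");
        }
        int index = Math.min(retryNumber, INTERVALS.length) - 1;
        return INTERVALS[index];
    }
}
```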

The retry interval for ordered messages is fixed. For more information, see Limits on parameters.

Maximum retries

Default value: 16.
Maximum value: 1000.

The maximum number of retries for messages consumed in Push mode is specified in the metadata when the consumer group is created. For more information, see Change maximum retries.

For example, if the maximum number of retries is specified as three, the message can be delivered four times, including one original attempt and three retries.

Example

If a message is consumed in Push mode, a consumption retry is triggered when the message listener returns a failure status. If the listener throws an unexpected exception, the SDK catches the exception and the message is also retried.

import org.apache.rocketmq.client.apis.consumer.ConsumeResult;
import org.apache.rocketmq.client.apis.consumer.MessageListener;
import org.apache.rocketmq.client.apis.message.MessageView;

// Consumption example: When a normal message is consumed in Push mode, consumption retry can be triggered if the message fails to be consumed.
MessageListener messageListener = new MessageListener() {
    @Override
    public ConsumeResult consume(MessageView messageView) {
        System.out.println(messageView);
        // Returning FAILURE triggers a retry. The system retries the message until the message is consumed or the maximum number of retries is reached.
        return ConsumeResult.FAILURE;
    }
};

View consumption retry logs

If ordered messages are consumed in Push mode, the messages are retried on the consumer client and the broker cannot obtain the details of the retry logs. If the delivery result displayed in the trace of an ordered message indicates that the message delivery failed, you must check the information about the maximum number of retries and the consumer client in the consumer client logs.

For information about the log path of a consumer client, see Log configuration.

You can search the following keywords to query the information about consumption failures in client logs:

Message listener raised an exception while consuming messages
Failed to consume fifo message finally, run out of attempt times

Retry policies for messages consumed in Simple mode

Retry status

When a message is consumed in Simple mode, the message can be in one of the following states.

Figure: state machine for messages consumed in Simple mode

Ready
The message is ready to be consumed in the ApsaraMQ for RocketMQ broker.
Inflight
The message is obtained and being consumed by the consumer, but the consumption result is not returned.
Commit
The message is consumed. After the consumer returns a success response, the consumption is complete.
DLQ
A mechanism that is used to ensure the implementation of consumption logic. If the dead-letter message feature is enabled, a message that fails to be consumed after the maximum number of retries is reached is sent to the dead-letter topic. You can consume dead-letter messages to restore your business. For more information, see Dead-letter messages.
Discard
If the dead-letter message feature is not enabled, a message that fails to be consumed after the maximum number of retries is reached is discarded.

The retry interval of a message that is consumed in Simple mode is preset. Before you call the corresponding API operation to obtain the message, you must configure the InvisibleDuration parameter, which specifies the maximum period for processing the message. If the message fails to be consumed and a retry is triggered, the value of the InvisibleDuration parameter is reused, and you do not need to specify a separate retry interval.

Figure: retry of a message consumed in Simple mode

If the preset invisible period does not meet your business requirements, you can call the corresponding API operation to change the period.

For example, if you set the InvisibleDuration parameter to 20 milliseconds but a message cannot be processed in the period, you can change the value to a larger value to increase the retry interval.

Before you change the value of the InvisibleDuration parameter, you must make sure that the following conditions are met:

The message has not timed out.
The consumption result has not been committed.

The following figure shows that after you change the value of the InvisibleDuration parameter, the invisible period is immediately recalculated when you call the corresponding API operation.

Figure: changing the invisible duration
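
The recalculation described above can be modeled as follows. This is a simplified sketch, not the SDK operation itself: `InvisibleWindow` and its methods are hypothetical names, and the model only captures the two preconditions (not timed out, not committed) and the rule that the new period is counted from the moment of the call.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical model of the invisible period of one received message. */
final class InvisibleWindow {
    private Instant visibleAt;
    private boolean committed;

    InvisibleWindow(Instant receivedAt, Duration invisibleDuration) {
        this.visibleAt = receivedAt.plus(invisibleDuration);
    }

    /** Marks the consumption result as committed. */
    void commit() {
        committed = true;
    }

    /** Restarts the invisible period from the time of the call. */
    void change(Instant now, Duration newInvisibleDuration) {
        if (committed) {
            throw new IllegalStateException("consumption result is already committed");
        }
        if (!now.isBefore(visibleAt)) {
            throw new IllegalStateException("message has already timed out");
        }
        // The new period is counted from now, not from the original receive time.
        visibleAt = now.plus(newInvisibleDuration);
    }

    Instant visibleAt() {
        return visibleAt;
    }
}
```

For instance, if a message is received at time 0 with a 20-millisecond period and the period is changed to 20 milliseconds again at the 15th millisecond, the message becomes visible at the 35th millisecond in this model.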

Retry intervals

Retry interval = The value of InvisibleDuration - Actual message processing duration

The retry interval for messages consumed in Simple mode is determined by the InvisibleDuration parameter. For example, you set the value of the InvisibleDuration parameter to 30 milliseconds. In actual business scenarios, a consumption failure is returned 10 milliseconds after the consumption starts, and the system waits for 20 milliseconds to start the next retry. In this case, the retry interval is 20 milliseconds. If no consumption result is returned until the 30th millisecond, a timeout error occurs and a retry is triggered. Then, the retry interval becomes 0 ms.
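
The two cases in the preceding paragraph can be reproduced with a short calculation. The helper below is a hypothetical sketch of the formula; clamping at zero reflects the timeout case, where the whole invisible period is spent on processing.

```java
import java.time.Duration;

/** Hypothetical helper for the Simple-mode retry interval formula. */
final class SimpleRetryInterval {
    /** Retry interval = InvisibleDuration - actual processing duration, never below zero. */
    static Duration retryInterval(Duration invisibleDuration, Duration processingDuration) {
        Duration remaining = invisibleDuration.minus(processingDuration);
        return remaining.isNegative() ? Duration.ZERO : remaining;
    }

    public static void main(String[] args) {
        // Failure returned after 10 ms of a 30 ms invisible period.
        System.out.println(retryInterval(Duration.ofMillis(30), Duration.ofMillis(10)).toMillis()); // prints 20
        // No result until the 30th millisecond: timeout, immediate retry.
        System.out.println(retryInterval(Duration.ofMillis(30), Duration.ofMillis(30)).toMillis()); // prints 0
    }
}
```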

Maximum retries

Default value: 16.
Maximum value: 1000.

The maximum number of retries for messages consumed in Simple mode is specified in the metadata when the consumer group is created. For more information, see Change maximum retries.

For example, if the maximum number of retries is specified as three, the message can be delivered four times, including one original attempt and three retries.

Example

No operation is required for the retries for messages consumed in Simple mode.

import java.time.Duration;
import java.util.List;
import org.apache.rocketmq.client.apis.ClientException;
import org.apache.rocketmq.client.apis.message.MessageView;

// Consumption example: When a normal message is consumed in Simple mode, the consumer needs to only wait until the consumption times out. After a consumption timeout occurs, the broker automatically retries the message.
// simpleConsumer is assumed to be an initialized SimpleConsumer instance.
try {
    // Receive up to 10 messages. Each message stays invisible for 30 seconds.
    List<MessageView> messageViewList = simpleConsumer.receive(10, Duration.ofSeconds(30));
    messageViewList.forEach(messageView -> {
        System.out.println(messageView);
        // If you want a message to be retried after the message fails to be consumed, ignore the failure and wait until the message is visible again.
    });
} catch (ClientException e) {
    // If the message fails to be pulled due to throttling or other reasons, you must re-initiate the request to obtain the message.
    e.printStackTrace();
}

Change maximum retries

You can use one of the following methods to change the maximum number of retries for messages consumed in Push and Simple modes:

- Call the UpdateConsumerGroup API operation.
- Use the ApsaraMQ for RocketMQ console. Path: Instances > Instance Details > Groups.

Usage notes

Do not use the consumption retry feature to deal with consumption throttling

As mentioned in Common scenarios, consumption retry is suitable for scenarios in which the probability of message consumption failure is small. Consumption retry is not suitable for scenarios in which the failure of one message causes the failure of subsequent messages, such as consumption throttling.

Incorrect example
Return consumption failures to trigger retries if the current consumption rate is higher than the specified limit.
Correct example
Obtain and consume messages at a later time if the current consumption rate is higher than the specified limit.
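
One way to follow the correct example is to pace the pulls on the consumer side instead of returning failures. The sketch below is an assumption-laden illustration, not part of the SDK: `PullPacer` is a hypothetical fixed-window limiter that tells the caller how long to wait before obtaining the next message.

```java
import java.time.Duration;

/** Hypothetical fixed-window pacer: delays the next pull instead of failing the message. */
final class PullPacer {
    private final int maxPerSecond;
    private long windowStartMillis;
    private int consumedInWindow;

    PullPacer(int maxPerSecond, long nowMillis) {
        if (maxPerSecond < 1) {
            throw new IllegalArgumentException("maxPerSecond must be at least 1");
        }
        this.maxPerSecond = maxPerSecond;
        this.windowStartMillis = nowMillis;
    }

    /** Returns zero if a message may be pulled now, otherwise the time left in the current window. */
    Duration delayBeforeNextPull(long nowMillis) {
        if (nowMillis - windowStartMillis >= 1000) {
            // A new one-second window starts; reset the counter.
            windowStartMillis = nowMillis;
            consumedInWindow = 0;
        }
        if (consumedInWindow < maxPerSecond) {
            consumedInWindow++;
            return Duration.ZERO;
        }
        // Over the limit: wait for the window to end rather than trigger a retry.
        return Duration.ofMillis(1000 - (nowMillis - windowStartMillis));
    }
}
```

Messages that are not pulled yet simply stay in the queue, so no retry count is consumed while the consumer waits.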

FAQ

How do I configure a timeout period for message consumption?

You can configure a timeout period for message consumption on a consumer client.

If you consume messages in Simple mode, you can configure a timeout period that ranges from 10 seconds to 12 hours.

Sample code:

private long minInvisiableTimeMillsForRecv = Duration.ofSeconds(10).toMillis();
private long maxInvisiableTimeMills = Duration.ofHours(12).toMillis();
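
As an illustration of the range stated above, a client could check a value before it is used. The class `InvisibleDurationRange` and the method `isAllowed` are hypothetical; only the 10-second and 12-hour bounds come from this topic.

```java
import java.time.Duration;

/** Hypothetical check of the Simple-mode timeout range (10 seconds to 12 hours). */
final class InvisibleDurationRange {
    static final Duration MIN = Duration.ofSeconds(10);
    static final Duration MAX = Duration.ofHours(12);

    /** Returns true if the duration lies within the range, bounds included. */
    static boolean isAllowed(Duration invisibleDuration) {
        return invisibleDuration.compareTo(MIN) >= 0 && invisibleDuration.compareTo(MAX) <= 0;
    }
}
```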

If you consume messages in Push mode, the timeout period for message consumption is 1 minute. You cannot change the value.