Message Retry & Flow Control Overview - ApsaraMQ for RocketMQ

When a producer sends a message to the ApsaraMQ for RocketMQ broker, the request can fail due to network issues, broker restarts, or capacity limits. The client SDK handles these failures through two built-in mechanisms:

Sending retry: Re-sends failed messages automatically until they succeed or the retry limit is reached.
Throttling: Protects the broker from overload by rejecting requests when capacity is insufficient.

Both mechanisms work together: when throttling triggers a rejection, the retry mechanism uses exponential backoff to re-send the message without overwhelming the broker further.

Sending retry

Retry process

The client SDK includes built-in retry logic. When a send request fails, the SDK re-sends the message automatically, so no application-level retry code is needed.

Set the maximum number of retries when you initialize the producer. If a request fails, the SDK retries until the message is delivered or the retry limit is reached. After the final retry fails, the SDK returns an error to your application.

The retry behavior differs by sending mode:

Sending mode	Thread behavior	On final failure
Synchronous	Calling thread blocks for the entire retry sequence	SDK throws an exception
Asynchronous	Calling thread is not blocked	SDK delivers a failure callback event
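
The retry process and the two sending modes can be sketched in a few lines. This is a minimal simulation, not the SDK's implementation: `send_with_retries`, `send_async`, `SendError`, and the use of `ConnectionError` as the retryable failure are hypothetical names chosen for illustration, and the sketch assumes the retry limit counts re-sends after the first attempt.

```python
from concurrent.futures import ThreadPoolExecutor


class SendError(Exception):
    """Hypothetical error raised once every attempt has failed."""


def send_with_retries(send_once, message, max_retries):
    """Synchronous mode: block until success or until retries run out."""
    last_error = None
    # Assumption: one initial attempt plus up to max_retries re-sends.
    for _attempt in range(max_retries + 1):
        try:
            return send_once(message)
        except ConnectionError as err:  # stand-in for a retryable failure
            last_error = err
    raise SendError(f"gave up after {max_retries} retries") from last_error


def send_async(executor, send_once, message, max_retries, on_failure):
    """Asynchronous mode: run the same sequence off the calling thread."""
    future = executor.submit(send_with_retries, send_once, message, max_retries)

    def notify(done):
        # Deliver the final failure as a callback instead of an exception.
        if done.exception() is not None:
            on_failure(done.exception())

    future.add_done_callback(notify)
    return future
```

In the synchronous path the caller sees `SendError` as an exception; in the asynchronous path the caller returns immediately and the same error arrives through `on_failure`.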

Retry triggers

Retries are triggered by two categories of failure:

Client-side failures

A network exception causes a connection failure or request timeout.
The broker is restarting or being undeployed, causing connection failures.
The broker is running slowly, causing request timeouts.

Broker-side errors

System logic error: An internal processing error on the broker.
System throttling error: The broker rejects the request because it has exceeded capacity. See Throttling.

Note

Transactional messages only support transparent retries. The SDK does not retry transactional messages on network exceptions or timeouts.

Retry interval

The retry interval depends on the error type:

Error type	Retry interval
All errors except throttling	Immediate (no delay)
System throttling error	Exponential backoff with jitter

For throttling errors, the SDK uses exponential backoff with the following parameters:

Parameter	Description	Default
`INITIAL_BACKOFF`	Delay before the first retry	1 second
`MULTIPLIER`	Factor by which the delay increases after each retry	1.6
`JITTER`	Randomization factor applied to each delay	0.2
`MAX_BACKOFF`	Maximum delay between retries	120 seconds
`MIN_CONNECT_TIMEOUT`	Minimum time allowed for a connection attempt to complete	20 seconds

The backoff algorithm works as follows:

ConnectWithBackoff()
  current_backoff = INITIAL_BACKOFF
  current_deadline = now() + INITIAL_BACKOFF
  while (TryConnect(Max(current_deadline, now() + MIN_CONNECT_TIMEOUT)) != SUCCESS)
    SleepUntil(current_deadline)
    current_backoff = Min(current_backoff * MULTIPLIER, MAX_BACKOFF)
    current_deadline = now() + current_backoff +
      UniformRandom(-JITTER * current_backoff, JITTER * current_backoff)

For the full specification, see gRPC connection backoff.
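
As a rough illustration of the delays this algorithm produces, the following sketch generates one sleep duration per retry using the default parameters from the table. It is an assumption-laden simplification: `backoff_delays` is a hypothetical helper, not an SDK function, and `MIN_CONNECT_TIMEOUT` is left out because it bounds the connection attempt rather than the wait between attempts.

```python
import random

INITIAL_BACKOFF = 1.0   # seconds
MULTIPLIER = 1.6
JITTER = 0.2
MAX_BACKOFF = 120.0     # seconds


def backoff_delays(retries, rng=random):
    """Yield one sleep duration per retry, following the pseudocode."""
    current_backoff = INITIAL_BACKOFF
    delay = INITIAL_BACKOFF  # the first deadline carries no jitter
    for _ in range(retries):
        yield delay
        # Grow the base delay, capped at MAX_BACKOFF.
        current_backoff = min(current_backoff * MULTIPLIER, MAX_BACKOFF)
        # Spread retries out by up to +/- JITTER of the base delay.
        delay = current_backoff + rng.uniform(
            -JITTER * current_backoff, JITTER * current_backoff
        )
```

Passing a seeded `random.Random` instance as `rng` makes the sequence reproducible in tests. Because jitter is applied after the cap, an individual delay can exceed `MAX_BACKOFF` by up to 20%.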

Understand the total retry time budget

The SDK exposes only one retry control: the maximum number of retries. In synchronous mode, the calling thread blocks for the entire retry sequence, so the total blocking time depends on the relationship between your per-request timeout and the maximum retry count:

Total blocking time (worst case) = max_retries × per_request_timeout + sum_of_backoff_delays

For non-throttling errors (immediate retry), the backoff delay is zero:

Total blocking time = max_retries × per_request_timeout

For throttling errors, each backoff delay is 1.6 times the previous one until it reaches `MAX_BACKOFF`, so the cumulative delay grows quickly. For example, with the default parameters and 5 retries (jitter omitted):

Retry	Backoff delay (approximate)	Cumulative delay
1	1 s	1 s
2	1.6 s	2.6 s
3	2.56 s	5.16 s
4	4.1 s	9.26 s
5	6.55 s	15.81 s

Evaluate your per-request timeout and maximum retries together to avoid blocking the calling thread for too long in synchronous mode.
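
To evaluate a candidate configuration, the formula above can be turned into a small calculator. This is a sketch under stated assumptions: `worst_case_blocking_seconds` is a hypothetical helper, jitter is ignored, and the result follows the document's formula literally. If your SDK counts the initial attempt separately from the retries, add one more `per_request_timeout` to the result.

```python
def worst_case_blocking_seconds(max_retries, per_request_timeout,
                                throttled=False, initial_backoff=1.0,
                                multiplier=1.6, max_backoff=120.0):
    """Estimate how long a synchronous send can block, in seconds."""
    # Every retry may run until its per-request timeout expires.
    total = max_retries * per_request_timeout
    if throttled:
        # Throttling errors add an exponentially growing wait per retry.
        backoff = initial_backoff
        for _ in range(max_retries):
            total += backoff
            backoff = min(backoff * multiplier, max_backoff)
    return total
```

With an illustrative 3-second timeout and 5 retries, the non-throttled estimate is 15 seconds, and the throttled estimate adds the 15.81 seconds of cumulative backoff from the table.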

Handle failed messages after exhausted retries

Built-in retries do not guarantee delivery. If all retries fail, the SDK returns an error. Catch this error in your application and implement a fallback strategy:

Write the failed message to a local log or dead-letter store for later reprocessing.
Alert your monitoring system so you can investigate the root cause.
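
One possible shape for such a fallback is shown below. It is only a sketch: `send_or_park` and the JSON Lines file used as a dead-letter store are hypothetical, `send` stands in for whatever SDK call your application makes, and the message is assumed to be JSON-serializable.

```python
import json
import logging
import time

logger = logging.getLogger("producer.fallback")


def send_or_park(send, message, dead_letter_path):
    """Try to send; on final failure, append the message to a local file."""
    try:
        send(message)
        return True
    except Exception as err:  # the error returned after retries are exhausted
        record = {"failed_at": time.time(), "error": str(err), "message": message}
        # One JSON object per line keeps the file easy to replay later.
        with open(dead_letter_path, "a", encoding="utf-8") as fh:
            fh.write(json.dumps(record) + "\n")
        # An ERROR-level log line is a simple hook for monitoring alerts.
        logger.error("send failed, parked for reprocessing: %s", err)
        return False
```

A separate job can read the file line by line and re-send each parked message once the root cause is resolved.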

Handle duplicate messages from retries

When a send request times out, the SDK cannot determine whether the broker already received and stored the message. A retry may produce a duplicate on the broker. This is a fundamental trade-off in at-least-once delivery systems.

To handle duplicates, design your consumers for idempotent processing:

Assign each message a unique business key (such as an order ID or transaction ID).
Before processing, check whether the key has already been processed.
Use database constraints or deduplication caches to enforce uniqueness.
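
The three steps above can be sketched with a database uniqueness constraint. The example uses Python's built-in `sqlite3` module purely for illustration; `open_store`, `process_once`, and the `processed` table are hypothetical names, and a production consumer would use its own database.

```python
import sqlite3


def open_store(path=":memory:"):
    """Create the table that records which business keys were handled."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS processed (business_key TEXT PRIMARY KEY)"
    )
    return conn


def process_once(conn, business_key, handler):
    """Run handler only if business_key has not been recorded yet."""
    try:
        with conn:  # commits on success, rolls back if anything raises
            # The PRIMARY KEY constraint rejects a second insert of the key.
            conn.execute(
                "INSERT INTO processed (business_key) VALUES (?)",
                (business_key,),
            )
            handler()
        return True
    except sqlite3.IntegrityError:
        return False  # duplicate delivery; skip it
```

Recording the key and running the handler inside one transaction means that a handler failure rolls the key back, so a later redelivery of the same message is processed rather than skipped. The handler's own side effects are covered by that rollback only if they go through the same database connection.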

Throttling

Throttling is a normal operational mechanism in cloud messaging systems. When system capacity is insufficient or usage exceeds a predefined threshold, the ApsaraMQ for RocketMQ broker immediately rejects the request and returns a system throttling error. The SDK's built-in retry logic then handles the rejected request using exponential backoff.

Throttling triggers

Throttling is triggered in the following scenarios:

Storage pressure surge: A consumer group normally starts consuming from the maximum offset of a queue. In scenarios such as business rollouts where a consumer group must instead begin consuming from a specific point in time, storage pressure on the queue spikes and throttling is triggered. For more information, see Consumer progress management.
Message accumulation: When consumers cannot keep up with the rate of incoming messages, unconsumed messages accumulate in the queue. If the accumulation exceeds the threshold, the broker triggers throttling to reduce pressure on the downstream system.

Error codes and retry behavior by client type

When throttling is triggered, the error code and retry behavior depend on your client protocol.

gRPC clients

Item	Value
Error code	`530`
Error message keyword	`TOO_MANY_REQUESTS`
Retry behavior	Automatic retry with exponential backoff

Remoting clients

Item	Value
Error code	`215`
Error message keyword	`messages flow control`

Retry behavior for Remoting clients varies by SDK version:

SDK	Retry behavior on throttling
ApsaraMQ for RocketMQ TCP client SDK for Java < 1.9.0.Final	No retry
ApsaraMQ for RocketMQ TCP client SDK for Java >= 1.9.0.Final	Automatic retry with exponential backoff
Open-source Apache RocketMQ SDK (producer)	No retry
Open-source Apache RocketMQ SDK (consumer)	Automatic retry with exponential backoff

If your SDK version does not retry automatically on throttling errors, implement retry logic with exponential backoff in your application code.
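
A minimal sketch of such application-level retry logic follows. Everything here is illustrative: `BrokerError` is a hypothetical wrapper that exposes the error code and message from your SDK's exception, `send` stands in for your actual send call, and the backoff constants reuse the defaults listed under Retry interval. The sketch treats an error as throttling only when both the code and the keyword from the tables above match.

```python
import random
import time

# Error code -> message keyword, taken from the tables above.
THROTTLE_SIGNATURES = {530: "TOO_MANY_REQUESTS", 215: "messages flow control"}


class BrokerError(Exception):
    """Hypothetical wrapper exposing an SDK error's code and message."""

    def __init__(self, code, message):
        super().__init__(message)
        self.code = code
        self.message = message


def is_throttled(code, message):
    keyword = THROTTLE_SIGNATURES.get(code)
    return keyword is not None and keyword in message


def send_with_backoff(send, payload, max_retries, sleep=time.sleep, rng=random):
    """Retry throttled sends with exponential backoff and jitter."""
    backoff = 1.0
    for attempt in range(max_retries + 1):
        try:
            return send(payload)
        except BrokerError as err:
            # Give up on non-throttling errors or when retries are exhausted.
            if not is_throttled(err.code, err.message) or attempt == max_retries:
                raise
        sleep(backoff + rng.uniform(-0.2 * backoff, 0.2 * backoff))
        backoff = min(backoff * 1.6, 120.0)
```

The `sleep` and `rng` parameters exist so that tests can record delays instead of waiting. Unlike the pseudocode under Retry interval, this version applies jitter to the first delay as well.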

Note

For supported client versions, see SDK compatibility.

Prevent and handle throttling

Monitor capacity before traffic spikes

Use the ApsaraMQ for RocketMQ observability features to monitor system usage and capacity. Before business rollouts or anticipated traffic spikes:

Verify that your instance has sufficient resources for expected traffic.
Check consumer group lag to identify accumulation risks.
Scale your instance or optimize consumer throughput if needed.

Handle unexpected throttling at runtime

If throttling occurs unexpectedly and the SDK's built-in retries cannot recover:

Route requests to a fallback system until the throttling condition clears.
Log throttling events (look for error code `530` / `TOO_MANY_REQUESTS` for gRPC or `215` / `messages flow control` for Remoting) to help diagnose the root cause.