When a producer sends a message to the ApsaraMQ for RocketMQ broker, the request can fail due to network issues, broker restarts, or capacity limits. The client SDK handles these failures through two built-in mechanisms:
Sending retry re-sends failed messages automatically until they succeed or the retry limit is reached.
Throttling protects the broker from overload by rejecting requests when capacity is insufficient.
Both mechanisms work together: when throttling triggers a rejection, the retry mechanism uses exponential backoff to re-send the message without overwhelming the broker further.
Sending retry
Retry process
The client SDK includes built-in retry logic. When a send request fails, the SDK re-sends the message automatically, so you do not need to write application-level retry code.
Set the maximum number of retries when you initialize the producer. If a request fails, the SDK retries until the message is delivered or the retry limit is reached. After the final retry fails, the SDK returns an error to your application.
The retry behavior differs by sending mode:
| Sending mode | Thread behavior | On final failure |
|---|---|---|
| Synchronous | Calling thread blocks for the entire retry sequence | SDK throws an exception |
| Asynchronous | Calling thread is not blocked | SDK delivers a failure callback event |
Retry triggers
Retries are triggered by two categories of failure:
Client-side failures
A network exception causes a connection failure or request timeout.
The broker is restarting or being undeployed, causing connection failures.
The broker is running slowly, causing request timeouts.
Broker-side errors
System logic error: An internal processing error on the broker.
System throttling error: The broker rejects the request because it has exceeded capacity. See Throttling.
Transactional messages only support transparent retries. The SDK does not retry transactional messages on network exceptions or timeouts.
Retry interval
The retry interval depends on the error type:
| Error type | Retry interval |
|---|---|
| All errors except throttling | Immediate (no delay) |
| System throttling error | Exponential backoff with jitter |
For throttling errors, the SDK uses exponential backoff with the following parameters:
| Parameter | Description | Default |
|---|---|---|
| INITIAL_BACKOFF | Delay before the first retry | 1 second |
| MULTIPLIER | Factor by which the delay increases after each retry | 1.6 |
| JITTER | Randomization factor applied to each delay | 0.2 |
| MAX_BACKOFF | Maximum delay between retries | 120 seconds |
| MIN_CONNECT_TIMEOUT | Minimum connection timeout | 20 seconds |
The backoff algorithm works as follows:

    ConnectWithBackoff()
      current_backoff = INITIAL_BACKOFF
      current_deadline = now() + INITIAL_BACKOFF
      while (TryConnect(Max(current_deadline, now() + MIN_CONNECT_TIMEOUT)) != SUCCESS)
        SleepUntil(current_deadline)
        current_backoff = Min(current_backoff * MULTIPLIER, MAX_BACKOFF)
        current_deadline = now() + current_backoff +
          UniformRandom(-JITTER * current_backoff, JITTER * current_backoff)

For the full specification, see gRPC connection backoff.
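The delay sequence that this pseudocode produces can be sketched in Python as follows. This is a minimal illustration that assumes the default parameters from the table above. The function name `backoff_delays` and the injectable `rng` parameter are hypothetical and are not part of any SDK. The sketch models only the waiting times, not the connection attempts or `MIN_CONNECT_TIMEOUT`.

```python
import random

INITIAL_BACKOFF = 1.0   # seconds
MULTIPLIER = 1.6
JITTER = 0.2
MAX_BACKOFF = 120.0     # seconds


def backoff_delays(retries, rng=random.uniform):
    """Yield one delay (in seconds) per retry, following the pseudocode above.

    rng(low, high) must return a value in [low, high]; it is injectable so
    that the sequence can be made deterministic in tests.
    """
    current_backoff = INITIAL_BACKOFF
    delay = INITIAL_BACKOFF  # the first wait carries no jitter in the pseudocode
    for _ in range(retries):
        yield delay
        # Grow the nominal backoff, capped at MAX_BACKOFF.
        current_backoff = min(current_backoff * MULTIPLIER, MAX_BACKOFF)
        # Spread the next wait within +/- JITTER of the nominal backoff.
        delay = current_backoff + rng(-JITTER * current_backoff,
                                      JITTER * current_backoff)


if __name__ == "__main__":
    for attempt, delay in enumerate(backoff_delays(5), start=1):
        print(f"retry {attempt}: wait about {delay:.2f} s")
```

Because of the jitter, each run prints slightly different values. The nominal backoff stops growing once it reaches `MAX_BACKOFF`.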
Understand the total retry time budget
The SDK exposes only one retry control: the maximum number of retries. In synchronous mode, the calling thread blocks for the entire retry sequence, so the total blocking time depends on the relationship between your per-request timeout and the maximum retry count:

    Total blocking time (worst case) = max_retries x per_request_timeout + sum_of_backoff_delays

For non-throttling errors (immediate retry), the backoff delay is zero:

    Total blocking time = max_retries x per_request_timeout

For throttling errors, backoff delays accumulate exponentially. For example, with the default parameters and 5 retries:
| Retry | Backoff delay (approximate) | Cumulative delay |
|---|---|---|
| 1 | 1 s | 1 s |
| 2 | 1.6 s | 2.6 s |
| 3 | 2.56 s | 5.16 s |
| 4 | 4.1 s | 9.26 s |
| 5 | 6.55 s | 15.81 s |
Evaluate your per-request timeout and maximum retries together to avoid blocking the calling thread for too long in synchronous mode.
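To estimate the budget for your own settings, the formula in this section can be evaluated with a short script. This is a rough sketch: the function name `worst_case_blocking_time` is hypothetical, jitter is ignored, and the 3-second per-request timeout in the usage lines is an assumed example value, not an SDK default.

```python
def worst_case_blocking_time(max_retries, per_request_timeout, throttled,
                             initial_backoff=1.0, multiplier=1.6,
                             max_backoff=120.0):
    """Return the worst-case blocking time in seconds, ignoring jitter.

    Implements: max_retries x per_request_timeout + sum_of_backoff_delays,
    where the backoff sum is zero for non-throttling errors.
    """
    total = max_retries * per_request_timeout
    if throttled:
        backoff = initial_backoff
        for _ in range(max_retries):
            total += backoff
            backoff = min(backoff * multiplier, max_backoff)
    return total


if __name__ == "__main__":
    # Assumed example: 5 retries with a 3-second per-request timeout.
    print(worst_case_blocking_time(5, 3.0, throttled=False))
    print(round(worst_case_blocking_time(5, 3.0, throttled=True), 2))
```

With a timeout of 0, the throttled result reduces to the cumulative backoff delay shown in the last row of the table.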
Handle failed messages after exhausted retries
Built-in retries do not guarantee delivery. If all retries fail, the SDK returns an error. Catch this error in your application and implement a fallback strategy:
Write the failed message to a local log or dead-letter store for later reprocessing.
Alert your monitoring system so you can investigate the root cause.
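The two fallback steps above can be sketched as follows. This is an illustration under stated assumptions, not SDK code: `send` is a hypothetical callable that wraps your producer's synchronous send and raises an exception once the SDK's retries are exhausted, and the JSON Lines file stands in for whatever local log or dead-letter store you use. An error-level log entry is used here as the hook for alerting.

```python
import json
import logging
import time

logger = logging.getLogger("producer.fallback")


def send_with_fallback(send, topic, body, fallback_path):
    """Try to send; on failure, persist the message locally and log an error.

    `send` is a hypothetical callable wrapping the SDK's synchronous send.
    It is assumed to raise once the SDK's built-in retries are exhausted.
    Returns True if the message was sent, False if it was stored for replay.
    """
    try:
        send(topic, body)
        return True
    except Exception as exc:  # the concrete exception type depends on the SDK
        record = {
            "topic": topic,
            "body": body,
            "error": str(exc),
            "failed_at": time.time(),
        }
        # Append one JSON object per line so a replay job can read it back.
        with open(fallback_path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
        # An error-level log line that a monitoring system can alert on.
        logger.error("send failed after retries, stored for replay: %s", exc)
        return False
```

A separate job can later read the file line by line and re-send each record. Because a stored message may already have reached the broker, replay relies on the consumer-side deduplication described in the next section.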
Handle duplicate messages from retries
When a send request times out, the SDK cannot determine whether the broker already received and stored the message. A retry may produce a duplicate on the broker. This is a fundamental trade-off in at-least-once delivery systems.
To handle duplicates, design your consumers for idempotent processing:
Assign each message a unique business key (such as an order ID or transaction ID).
Before processing, check whether the key has already been processed.
Use database constraints or deduplication caches to enforce uniqueness.
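One way to combine these three steps is to let a database uniqueness constraint act as the "already processed" check. The sketch below uses Python's built-in `sqlite3` module for illustration only. The table name `processed` and the helpers `open_dedup_store` and `process_once` are hypothetical, and a production system would typically use its own business database.

```python
import sqlite3


def open_dedup_store(path=":memory:"):
    """Open a store whose primary key enforces one row per business key."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS processed (business_key TEXT PRIMARY KEY)"
    )
    return conn


def process_once(conn, business_key, handler):
    """Run handler only if business_key is new; return True if it ran.

    The key insert and the handler share one transaction, so a handler
    failure rolls the key back and the message can be processed again later.
    """
    try:
        with conn:  # commits on success, rolls back on any exception
            conn.execute(
                "INSERT INTO processed (business_key) VALUES (?)",
                (business_key,),
            )
            handler()
    except sqlite3.IntegrityError:
        # The key already exists: this message is a duplicate, skip it.
        return False
    return True


if __name__ == "__main__":
    store = open_dedup_store()
    print(process_once(store, "order-1001", lambda: print("charging order")))
    print(process_once(store, "order-1001", lambda: print("charging order")))
```

The second call skips the handler because the key is already recorded. The rollback only covers work that the handler does in the same database, so side effects in other systems still need their own idempotency guarantees.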
Throttling
Throttling is a normal operational mechanism in cloud messaging systems. When system capacity is insufficient or usage exceeds a predefined threshold, the ApsaraMQ for RocketMQ broker immediately rejects the request and returns a system throttling error. The SDK's built-in retry logic then handles the rejected request using exponential backoff.
Throttling triggers
Throttling is triggered in the following scenarios:
Storage pressure surge: By default, a consumer group starts consuming from the maximum offset of a queue. In scenarios such as business rollouts, a consumer group may instead need to start consuming from a specific earlier point in time, and storage pressure on the queue spikes as a result. For more information, see Consumer progress management.
Message accumulation: When consumers cannot keep up with the rate of incoming messages, unconsumed messages accumulate in the queue. If the accumulation exceeds the threshold, the broker triggers throttling to reduce pressure on the downstream system.
Error codes and retry behavior by client type
When throttling is triggered, the error code and retry behavior depend on your client protocol.
gRPC clients
| Item | Value |
|---|---|
| Error code | 530 |
| Error message keyword | TOO_MANY_REQUESTS |
| Retry behavior | Automatic retry with exponential backoff |
Remoting clients
| Item | Value |
|---|---|
| Error code | 215 |
| Error message keyword | messages flow control |
Retry behavior for Remoting clients varies by SDK version:
| SDK | Retry behavior on throttling |
|---|---|
| ApsaraMQ for RocketMQ TCP client SDK for Java < 1.9.0.Final | No retry |
| ApsaraMQ for RocketMQ TCP client SDK for Java >= 1.9.0.Final | Automatic retry with exponential backoff |
| Open-source Apache RocketMQ SDK (producer) | No retry |
| Open-source Apache RocketMQ SDK (consumer) | Automatic retry with exponential backoff |
If your SDK version does not retry automatically on throttling errors, implement retry logic with exponential backoff in your application code.
For supported client versions, see SDK compatibility.
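For SDK versions that do not retry on throttling, an application-level retry with exponential backoff might look like the sketch below. It is a minimal illustration: `ThrottledError` is a hypothetical stand-in for however your SDK reports error code 215, `send` is a hypothetical zero-argument callable that performs one send attempt, and the default parameters simply mirror the backoff table in the Retry interval section.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for the SDK's throttling error."""


def send_with_backoff(send, max_retries=5, initial=1.0, multiplier=1.6,
                      jitter=0.2, max_backoff=120.0, sleep=time.sleep):
    """Call send(); on ThrottledError, wait with exponential backoff and retry.

    Makes at most max_retries + 1 attempts and re-raises the last
    ThrottledError if every attempt is throttled.
    """
    backoff = initial
    for attempt in range(max_retries + 1):
        try:
            return send()
        except ThrottledError:
            if attempt == max_retries:
                raise  # retries exhausted: let the caller run its fallback
            # Randomize the wait so many producers do not retry in lockstep.
            sleep(backoff + random.uniform(-jitter * backoff, jitter * backoff))
            backoff = min(backoff * multiplier, max_backoff)
```

Only throttling errors are retried with a delay here. Other exceptions propagate immediately, so the caller decides how to handle them.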
Prevent and handle throttling
Monitor capacity before traffic spikes
Use the ApsaraMQ for RocketMQ observability features to monitor system usage and capacity. Before business rollouts or anticipated traffic spikes:
Verify that your instance has sufficient resources for expected traffic.
Check consumer group lag to identify accumulation risks.
Scale your instance or optimize consumer throughput if needed.
Handle unexpected throttling at runtime
If throttling occurs unexpectedly and the SDK's built-in retries cannot recover:
Route requests to a fallback system until the throttling condition clears.
Log throttling events (look for error code 530/TOO_MANY_REQUESTS for gRPC or 215/messages flow control for Remoting) to help diagnose the root cause.
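A small helper can make throttling events easy to find in application logs. The sketch below matches the error codes and message keywords listed in the tables above. The function names and the way the code and message are passed in are assumptions, because how an error exposes these fields depends on the SDK you use.

```python
import logging

logger = logging.getLogger("producer.throttling")

# Error code and message keyword per client protocol, as listed above.
THROTTLING_SIGNATURES = {
    "grpc": (530, "TOO_MANY_REQUESTS"),
    "remoting": (215, "messages flow control"),
}


def is_throttling_error(protocol, code, message):
    """Return True if the code or the message matches the throttling signature."""
    expected_code, keyword = THROTTLING_SIGNATURES[protocol]
    return code == expected_code or keyword in (message or "")


def log_if_throttled(protocol, code, message, topic):
    """Log a warning for throttling errors; return whether one was logged."""
    if is_throttling_error(protocol, code, message):
        logger.warning("throttled: protocol=%s code=%s topic=%s message=%s",
                       protocol, code, topic, message)
        return True
    return False
```

Logging the topic alongside the error code helps correlate throttling events with the consumer group lag and capacity metrics described earlier.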