Fault tolerance and error handling mechanisms - ApsaraMQ for RabbitMQ

When a Message Integration task fails to deliver a message, the retry policy controls how and when delivery is reattempted. After all retries are exhausted, the fault tolerance policy takes over. Depending on the configuration, the message is routed to a dead-letter queue or discarded, or the task is paused.

How retry, fault tolerance, and dead-letter queues interact

When a message fails to deliver:

1. The retry policy reattempts delivery at the configured intervals until the retry limit is reached.
2. If all retries are exhausted, the fault tolerance policy takes effect:
   - Fault tolerance allowed, dead-letter queue enabled: The message is routed to the dead-letter queue.
   - Fault tolerance allowed, dead-letter queue disabled: The message is discarded.
   - Fault tolerance prohibited: The task pauses and its status changes to Ready.

Note

If a retry cannot run due to invalid resource configurations, the task status changes to Start Failed.
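The decision flow above can be summarized as a small function. This is only an illustrative sketch: the names `Outcome` and `outcome_after_retries` are hypothetical and are not part of any ApsaraMQ API, and the sketch assumes the dead-letter queue setting is not consulted when fault tolerance is prohibited, because the flow lists no dead-letter condition for that case.

```python
from enum import Enum


class Outcome(Enum):
    """Possible results once every retry has failed (hypothetical names)."""
    DEAD_LETTERED = "routed to dead-letter queue"
    DISCARDED = "discarded"
    TASK_PAUSED = "task paused, status Ready"


def outcome_after_retries(fault_tolerance_allowed: bool, dlq_enabled: bool) -> Outcome:
    """Return what happens to a message after all retries are exhausted."""
    if not fault_tolerance_allowed:
        # Fault tolerance prohibited: the task pauses.
        # Assumption: the dead-letter queue setting is ignored here.
        return Outcome.TASK_PAUSED
    if dlq_enabled:
        # Fault tolerance allowed and a dead-letter queue is enabled.
        return Outcome.DEAD_LETTERED
    # Fault tolerance allowed but no dead-letter queue: the message is dropped.
    return Outcome.DISCARDED
```

For example, `outcome_after_retries(True, False)` returns `Outcome.DISCARDED`, which is the case to watch for when messages must not be lost.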

Retry policies

A retry policy controls how failed messages are retried within a Message Integration task. Message Integration supports two retry policies: backoff retry and exponential decay retry.

Backoff retry (default)

Retries a failed message up to 3 times. Each retry waits a random interval between 10 and 20 seconds.

Use backoff retry for transient failures that resolve within seconds, such as brief network interruptions or temporary service unavailability.
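A minimal sketch of the backoff retry behavior follows, written from the client's point of view to make the timing concrete. The function name `deliver_with_backoff` and its `deliver`, `sleep`, and `rng` parameters are hypothetical, and drawing the wait from a uniform distribution is an assumption; the service only states that the interval is random between 10 and 20 seconds.

```python
import random
import time
from typing import Callable, Optional

MAX_RETRIES = 3      # retry limit stated for backoff retry
MIN_WAIT_S = 10.0    # lower bound of the random interval, in seconds
MAX_WAIT_S = 20.0    # upper bound of the random interval, in seconds


def deliver_with_backoff(
    deliver: Callable[[], bool],
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> bool:
    """Attempt delivery once, then retry up to MAX_RETRIES times.

    `deliver` returns True on success. Returns False if the initial
    attempt and every retry fail.
    """
    rng = rng or random.Random()
    if deliver():
        return True
    for _ in range(MAX_RETRIES):
        # Assumption: the wait is uniformly distributed in [10, 20] seconds.
        sleep(rng.uniform(MIN_WAIT_S, MAX_WAIT_S))
        if deliver():
            return True
    return False
```

Passing a custom `sleep` callable makes the sketch testable without waiting: for instance, `sleep=waits.append` records the chosen intervals in a list instead of pausing.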

Exponential decay retry

Retries a failed message up to 176 times over one day. The interval doubles with each attempt, from 1 second up to 512 seconds:

1s, 2s, 4s, 8s, 16s, 32s, 64s, 128s, 256s, 512s

Once the interval reaches 512 seconds, it stays there. The first 9 retries use the intervals from 1 second to 256 seconds, and the remaining 167 retries all use the 512-second interval.
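The schedule can be checked with a short calculation. The sketch below assumes 176 retries in total, with the interval doubling from 1 second and capped at 512 seconds; the name `retry_intervals` is hypothetical.

```python
MAX_RETRIES = 176      # retry limit stated for exponential decay retry
MAX_INTERVAL_S = 512   # the interval stops doubling at this value


def retry_intervals(max_retries: int = MAX_RETRIES, cap: int = MAX_INTERVAL_S) -> list[int]:
    """Return the wait, in seconds, before each retry."""
    # 2**i doubles the wait for each retry; min() applies the 512-second cap.
    return [min(2 ** i, cap) for i in range(max_retries)]


intervals = retry_intervals()
total_seconds = sum(intervals)

print(intervals[:10])        # [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]
print(intervals.count(512))  # 167
print(total_seconds)         # 86015
```

Under these assumptions the waits add up to 86,015 seconds, roughly 23.9 hours, which is consistent with the one-day retry window.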

Use exponential decay retry when failures may persist for minutes to hours, such as downstream service outages or rate-limiting.

Compare retry policies

Attribute	Backoff retry	Exponential decay retry
Max retries	3	176
Retry window	Not specified	1 day
Interval	Random, 10 to 20 s	Doubles from 1 s to 512 s
Best for	Transient failures that resolve within seconds	Failures that may persist for minutes to hours

Fault tolerance policies

A fault tolerance policy determines how the task responds after all retries are exhausted. Message Integration supports two fault tolerance policies.

Policy	Behavior after retries are exhausted	Task continues?
Fault tolerance allowed	The failed message is delivered to the dead-letter queue (if configured) or discarded.	Yes
Fault tolerance prohibited	The task stops processing. Its status changes to Ready.	No

Dead-letter queues

A dead-letter queue preserves messages that still fail to be processed or sent after all retries are exhausted. Each message is stored in the dead-letter queue as raw message data.

Dead-letter queues are disabled by default. Each dead-letter queue is scoped to a single Message Integration task.

Supported queue types

The following services can serve as dead-letter queues:

Service
ApsaraMQ for RocketMQ
Simple Message Queue (formerly MNS)
ApsaraMQ for Kafka
Event buses in EventBridge
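The settings described on this page can be modeled as a small configuration object, which may help when validating task definitions before creating them. This is a sketch only: `DlqService`, `ErrorHandlingConfig`, and the policy strings `"backoff"` and `"exponential_decay"` are hypothetical names, not identifiers from the ApsaraMQ console or API. The defaults mirror the text: backoff retry is the default policy and dead-letter queues are disabled by default. No default is stated for fault tolerance, so that field is required.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DlqService(Enum):
    """Services listed as supported dead-letter queue targets."""
    ROCKETMQ = "ApsaraMQ for RocketMQ"
    SMQ = "Simple Message Queue (formerly MNS)"
    KAFKA = "ApsaraMQ for Kafka"
    EVENTBRIDGE_BUS = "Event buses in EventBridge"


@dataclass(frozen=True)
class ErrorHandlingConfig:
    """Hypothetical model of one task's error handling settings."""
    fault_tolerance_allowed: bool              # no default stated, so required
    retry_policy: str = "backoff"              # backoff retry is the default
    dlq_service: Optional[DlqService] = None   # None means the queue is disabled

    def __post_init__(self) -> None:
        # Only the two documented retry policies are accepted.
        if self.retry_policy not in ("backoff", "exponential_decay"):
            raise ValueError(f"unknown retry policy: {self.retry_policy!r}")

    @property
    def dlq_enabled(self) -> bool:
        """True when a dead-letter queue target has been chosen."""
        return self.dlq_service is not None
```

For example, `ErrorHandlingConfig(fault_tolerance_allowed=True, dlq_service=DlqService.KAFKA)` describes a task that keeps running and routes exhausted messages to an ApsaraMQ for Kafka dead-letter queue.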