
EventBridge: Retry policies and dead-letter queues

Last Updated: Mar 11, 2026

Event streams in EventBridge use retry policies, fault tolerance policies, and dead-letter queues (DLQs) to handle event delivery failures. When delivery to a target fails, EventBridge retries based on the configured retry policy. If all retries are exhausted, the fault tolerance policy determines whether to skip the failed event or block the stream. Route undeliverable events to a DLQ to preserve them for later inspection.

The following diagram shows how these three policies interact:

Event delivery fails
       |
       v
  Retry policy
  (backoff or exponential decay)
       |
  All retries exhausted?
     /        \
   No          Yes
   |            |
  Retry     Fault tolerance policy
  again       /              \
        Allowed           Prohibited
          |                    |
     DLQ configured?     Stream blocked,
       /       \         task status -> Ready
     Yes        No
      |          |
  Send to     Discard
   DLQ        event
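The flow in the diagram can be sketched as a small Python model. This is a minimal simulation under stated assumptions, not EventBridge's implementation. `handle_event`, `FaultTolerance`, and the `deliver` callback are hypothetical names, a Python list stands in for the DLQ, and the waiting time between retries is omitted.

```python
from enum import Enum
from typing import Callable, Optional


class FaultTolerance(Enum):
    """Hypothetical enum for the two fault tolerance policies."""
    ALLOWED = "allowed"
    PROHIBITED = "prohibited"


def handle_event(
    event: dict,
    deliver: Callable[[dict], bool],
    max_retries: int,
    fault_tolerance: FaultTolerance,
    dlq: Optional[list] = None,
) -> str:
    """Model the outcome for one event (hypothetical, not an EventBridge API).

    Returns "delivered", "sent_to_dlq", "discarded", or "blocked".
    """
    # One initial attempt followed by up to max_retries retries.
    # Waiting between attempts is omitted to keep the model short.
    for _ in range(1 + max_retries):
        if deliver(event):
            return "delivered"

    # All retries are exhausted: the fault tolerance policy takes over.
    if fault_tolerance is FaultTolerance.PROHIBITED:
        return "blocked"  # stream stops and the task status changes to Ready

    if dlq is not None:
        dlq.append(event)  # keep the raw event for later inspection
        return "sent_to_dlq"
    return "discarded"  # no DLQ configured: the event is lost


if __name__ == "__main__":
    dead_letters: list = []
    outcome = handle_event({"id": "evt-1"}, lambda e: False, 3,
                           FaultTolerance.ALLOWED, dead_letters)
    print(outcome, dead_letters)  # sent_to_dlq [{'id': 'evt-1'}]
```

Each branch of the function corresponds to one leaf of the diagram, so the model makes it easy to see that an event is lost only when fault tolerance is allowed and no DLQ is configured.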

Retry policies

A retry policy controls how EventBridge reattempts delivery after a failure. Each event stream supports two retry policies:

Policy | Max retries | Retry interval | Max duration | Default
Backoff retry | 3 | Random, 10 to 20 seconds between attempts | N/A | Yes
Exponential decay retry | 176 | Starts at 1 s and doubles up to 512 s | 1 day | No

Backoff retry

Backoff retry is the default policy. EventBridge retries a failed event up to 3 times, with a random interval of 10 to 20 seconds between consecutive attempts. Use this policy when you expect transient failures that resolve quickly.
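Backoff retry keeps the total wait short: three retries at 10 to 20 seconds each add about 30 to 60 seconds of waiting, excluding the time spent on the delivery attempts themselves, before the fault tolerance policy applies. The sketch below generates such a schedule. `backoff_intervals` is a hypothetical helper, and the uniform distribution is an assumption, since this page only says the interval is random.

```python
import random
from typing import List, Optional


def backoff_intervals(max_retries: int = 3, low: float = 10.0, high: float = 20.0,
                      rng: Optional[random.Random] = None) -> List[float]:
    """Return hypothetical wait times, in seconds, before each backoff retry.

    Assumes a uniform draw in [low, high]; the documentation only says "random".
    """
    if rng is None:
        rng = random.Random()
    return [rng.uniform(low, high) for _ in range(max_retries)]


waits = backoff_intervals()
print([round(w, 1) for w in waits])
print(f"total wait: {sum(waits):.1f} s")  # always between 30 s and 60 s
```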

Exponential decay retry

Exponential decay retry provides a longer retry window for targets that may take longer to recover. EventBridge retries a failed event up to 176 times over a maximum period of 1 day. The interval doubles with each attempt, up to a ceiling of 512 seconds:

1 s, 2 s, 4 s, 8 s, 16 s, 32 s, 64 s, 128 s, 256 s, 512 s

Once the interval reaches 512 seconds, it stays there: the first nine retries use intervals of 1 s through 256 s, and the remaining 167 retries all wait 512 seconds, for a total of 176 retries.
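The retry counts add up as follows: nine retries at 1 s through 256 s (511 s in total), plus 167 retries at 512 s (85,504 s), gives 176 retries and 86,015 s of waiting, just under the 1-day limit of 86,400 s. The sketch below reproduces that arithmetic. `exponential_intervals` is a hypothetical helper; it assumes exact intervals with no jitter and ignores the time spent on each delivery attempt.

```python
from typing import List


def exponential_intervals(max_retries: int = 176, first: int = 1,
                          ceiling: int = 512) -> List[int]:
    """Return hypothetical wait times, in seconds, before each exponential decay retry."""
    intervals = []
    wait = first
    for _ in range(max_retries):
        intervals.append(wait)
        wait = min(wait * 2, ceiling)  # double, but never exceed the ceiling
    return intervals


schedule = exponential_intervals()
print(schedule[:10])        # [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]
print(schedule.count(512))  # 167
print(sum(schedule))        # 86015 (seconds), just under 86400 s = 1 day
```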

Non-retryable errors

Note

If retries cannot be performed due to errors such as invalid resource configurations, the task status changes to Start Failed regardless of the retry or fault tolerance policy. EventBridge does not retry these errors because the underlying issue requires manual intervention.
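To illustrate why these errors bypass both policies, the following hypothetical model stops at the first non-retryable error instead of consuming the retry budget. `NonRetryableError` and `attempt_delivery` are illustrative names, not part of any EventBridge SDK.

```python
from typing import Callable


class NonRetryableError(Exception):
    """Hypothetical marker for errors such as an invalid resource configuration."""


def attempt_delivery(deliver: Callable[[], bool], max_retries: int) -> str:
    """Hypothetical model of one event's delivery attempts.

    Returns "delivered", "retries_exhausted", or "Start Failed".
    """
    for _ in range(1 + max_retries):
        try:
            if deliver():
                return "delivered"
        except NonRetryableError:
            # No retries: the task status changes to Start Failed regardless
            # of the configured retry or fault tolerance policy.
            return "Start Failed"
    # Only ordinary (retryable) failures reach this point; the fault
    # tolerance policy decides what happens next.
    return "retries_exhausted"


def invalid_config() -> bool:
    raise NonRetryableError("invalid resource configuration")


print(attempt_delivery(invalid_config, max_retries=176))  # Start Failed
```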

Fault tolerance policies

A fault tolerance policy controls how EventBridge handles an event that still fails after all retries are exhausted. Each event stream supports two fault tolerance policies:

Policy | Behavior after retries are exhausted | Effect on subsequent events
Fault tolerance allowed | Event is sent to the DLQ (if configured) or discarded | Processing continues
Fault tolerance prohibited | Task status changes to Ready | Processing is blocked until you resolve the issue

Fault tolerance allowed

When fault tolerance is allowed, delivery failures do not block event processing. After all retries are exhausted, EventBridge sends the event to the DLQ if one is configured, or discards it otherwise, and then continues with the next event.

Choose this policy when event loss is acceptable or when you have a DLQ configured to capture failed events.

Fault tolerance prohibited

When fault tolerance is prohibited, delivery failures block event processing after all retries are exhausted. The task status changes to Ready, and no further events are processed until you resolve the issue.

Choose this policy when every event must be delivered and you prefer to halt processing rather than lose events.
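The difference between the two policies shows up in what happens to the events that follow a failure. The sketch below processes a stream in order. `process_stream` and its `deliver` callback are hypothetical, and `deliver` returning False stands for an event that still fails after all retries.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def process_stream(
    events: Iterable[str],
    deliver: Callable[[str], bool],
    fault_tolerance_allowed: bool,
    dlq: Optional[list] = None,
) -> Tuple[List[str], bool]:
    """Hypothetical model: return (delivered events, whether the stream is blocked).

    `deliver` returns False for an event that still fails after all retries.
    """
    delivered: List[str] = []
    for event in events:
        if deliver(event):
            delivered.append(event)
            continue
        if not fault_tolerance_allowed:
            # Prohibited: stop here. The task status changes to Ready and the
            # remaining events wait until the issue is resolved.
            return delivered, True
        # Allowed: send the failed event to the DLQ if one is configured,
        # otherwise drop it, and move on to the next event.
        if dlq is not None:
            dlq.append(event)
    return delivered, False


def fails_on_e2(event: str) -> bool:
    return event != "e2"


events = ["e1", "e2", "e3"]
print(process_stream(events, fails_on_e2, fault_tolerance_allowed=True))   # (['e1', 'e3'], False)
print(process_stream(events, fails_on_e2, fault_tolerance_allowed=False))  # (['e1'], True)
```

With fault tolerance allowed, `e3` is still delivered after `e2` fails; with fault tolerance prohibited, processing stops at `e2` and `e3` is never attempted.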

Dead-letter queues

A dead-letter queue (DLQ) captures events that fail delivery after all retries are exhausted. When you enable a DLQ on a task, EventBridge sends the raw event data to the DLQ instead of discarding it. The DLQ feature is disabled by default.

Supported DLQ targets

The following services are supported as DLQ targets:

Service | Description
ApsaraMQ for RocketMQ | Message queue service
Simple Message Queue (formerly MNS) | Lightweight message queue service
ApsaraMQ for Kafka | Kafka-compatible message queue service
EventBridge event bus | Routes failed events to another event bus for further processing

When to enable a DLQ

Enable a DLQ when you need to:

  • Inspect and debug events that failed delivery

  • Reprocess failed events after fixing the root cause

  • Maintain a record of all delivery failures for auditing

Note

If you use the Fault tolerance allowed policy without a DLQ, failed events are permanently discarded after retries are exhausted. To prevent data loss, configure a DLQ before enabling fault tolerance.
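The check described in this note can be expressed as a simple configuration lint. `StreamConfig`, `data_loss_warnings`, and the topic name `example-dlq-topic` are hypothetical and do not correspond to EventBridge API objects; the defaults reflect only what this page states (backoff retry is the default retry policy, and the DLQ is disabled by default).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StreamConfig:
    """Hypothetical stand-in for an event stream's failure-handling settings."""
    fault_tolerance_allowed: bool
    retry_policy: str = "backoff"       # "backoff" (default) or "exponential_decay"
    dlq_target: Optional[str] = None    # None means the DLQ is disabled (the default)


def data_loss_warnings(cfg: StreamConfig) -> List[str]:
    """Return warnings for settings under which failed events can be lost."""
    warnings = []
    if cfg.fault_tolerance_allowed and cfg.dlq_target is None:
        warnings.append(
            "Fault tolerance is allowed without a DLQ: failed events are "
            "permanently discarded after retries are exhausted."
        )
    return warnings


print(data_loss_warnings(StreamConfig(fault_tolerance_allowed=True)))
print(data_loss_warnings(StreamConfig(fault_tolerance_allowed=True,
                                      dlq_target="example-dlq-topic")))  # []
```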