Throttling policy and threshold details - Simple Message Queue (formerly MNS)

Simple Message Queue (formerly MNS) applies a throttling policy to requests that exceed the throttling threshold to prevent excessive pressure on the underlying resources. Understanding this policy helps you plan a reasonable message send and receive rate and respond appropriately when throttling is triggered.

Throttling behavior

When traffic approaches or reaches the throttling threshold, the server automatically and elastically adjusts the threshold based on real-time resource usage. In most scenarios, this lets the service absorb higher concurrency without any noticeable impact on users. If temporary throttling is triggered (for example, by a sudden traffic spike or a cluster resource bottleneck), the system restores processing capacity and raises the throttling threshold accordingly once automatic scale-out completes.

When throttling is triggered, the system activates a backpressure mechanism: requests that exceed the threshold are held briefly on the server before a 429 (TooManyRequests) error is returned. The server adjusts the hold duration dynamically based on real-time load; it is typically 10 to 500 milliseconds and at most 5 seconds. This keeps the system from being overloaded, which could degrade overall performance and stability.

Error code

When the throttling policy is triggered, the Simple Message Queue (formerly MNS) server returns the following error.

HTTP status code	Error code	Error message
429	TooManyRequests	The request is denied by cluster flow limiter for too many requests.
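A client can recognize throttling by checking the status code and the error code together. The Python sketch below assumes the error body is XML with an `Error` root element that contains a `Code` child (the general shape of MNS error responses; confirm the exact format against the API reference for the interface you call). `is_throttled` is a hypothetical helper, not part of an Alibaba Cloud SDK.

```python
import xml.etree.ElementTree as ET

THROTTLE_STATUS = 429
THROTTLE_CODE = "TooManyRequests"


def is_throttled(status: int, body: str) -> bool:
    """Return True if an HTTP response looks like a throttling error.

    Assumption: the error body is XML of the form
    <Error><Code>...</Code><Message>...</Message></Error>,
    possibly with a default namespace on the root element.
    """
    if status != THROTTLE_STATUS:
        return False
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        # No parseable body: fall back to the status code alone.
        return True
    for elem in root.iter():
        # Strip any "{namespace}" prefix before comparing the tag name.
        if elem.tag.rsplit("}", 1)[-1] == "Code":
            return (elem.text or "").strip() == THROTTLE_CODE
    # 429 without a Code element: treat it as throttling.
    return True
```

Checking the error code as well as the status keeps other 429 responses, if any, from being retried as if they were throttling.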

Throttling thresholds

Abnormal consumption throttling policy in queue consumption mode

In the standard queue consumption pattern, after a client successfully processes a message, it should send a request to the server to delete the message. If a client repeatedly exhibits the non-standard behavior of "receiving messages without sending delete requests", the system flags this as abnormal consumption and triggers a throttling mechanism to ensure system stability. After throttling is applied, the rate at which the client receives new messages is significantly reduced.

Throttling is triggered when any of the following conditions is met:

Duration: The abnormal consumption persists for more than 30 minutes.
Message count: The cumulative number of received-but-not-deleted messages reaches 5,000.
Traffic rate: The instantaneous rate of received-but-not-deleted messages exceeds 1,000 TPS.
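To notice abnormal consumption before the server does, a consumer can track the messages it has received but not yet deleted. The sketch below does this for a single client process and flags when it approaches the three conditions above. `UndeletedTracker`, `on_receive`, `on_delete`, and `warnings` are hypothetical names. The duration check is an approximation (the age of the oldest undeleted message), because the exact server-side accounting is not described here.

```python
import time
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple

# Thresholds taken from the abnormal consumption rules above.
MAX_UNDELETED = 5_000        # cumulative received-but-not-deleted messages
MAX_DURATION_S = 30 * 60     # 30 minutes
MAX_UNDELETED_RATE = 1_000   # received-but-not-deleted messages per second


class UndeletedTracker:
    """Hypothetical client-side approximation of the abnormal-consumption rules.

    It only sees the receives and deletes made by this process, so it cannot
    reproduce the server's exact accounting.
    """

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # receipt handle -> time the message was received
        self._pending: Dict[str, float] = {}
        # (receive time, receipt handle) pairs within the last second
        self._recent: Deque[Tuple[float, str]] = deque()

    def _prune(self, now: float) -> None:
        # Drop receive events that fall outside the 1-second rate window.
        while self._recent and now - self._recent[0][0] >= 1.0:
            self._recent.popleft()

    def on_receive(self, receipt_handle: str) -> None:
        now = self._clock()
        self._pending[receipt_handle] = now
        self._recent.append((now, receipt_handle))
        self._prune(now)

    def on_delete(self, receipt_handle: str) -> None:
        # Call this after a successful delete request for the message.
        self._pending.pop(receipt_handle, None)

    def warnings(self) -> List[str]:
        """Return the conditions this process is at risk of meeting."""
        now = self._clock()
        found: List[str] = []
        if len(self._pending) >= MAX_UNDELETED:
            found.append("count")
        # Approximate "persists for more than 30 minutes" by the age of the
        # oldest message that was received but never deleted.
        if self._pending and now - min(self._pending.values()) > MAX_DURATION_S:
            found.append("duration")
        self._prune(now)
        rate = sum(1 for _, handle in self._recent if handle in self._pending)
        if rate > MAX_UNDELETED_RATE:
            found.append("rate")
        return found
```

Call `on_receive` for each receipt handle returned by a receive call, `on_delete` after each successful delete, and check `warnings()` periodically, for example from an existing metrics loop.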

Throttling policy for high-traffic requests

The default throttling threshold per Alibaba Cloud account per region is 20,000 TPS. If your traffic exceeds 20,000 TPS, log on to the Quota Center console to apply for an increase to the maximum TPS per region. For detailed steps, see Submit an application to increase a quota.

Request count rules:

Each API call counts as one request.
TPS calculation in batch send scenarios: When you call the BatchSendMessage operation against a queue, BatchSendMessage TPS = actual BatchSendMessage requests per second × number of messages per request. For example, if BatchSendMessage is called 100 times per second and each call contains 10 messages, the TPS consumed by a single queue = 100 × 10 = 1,000.
TPS calculation in batch consumption scenarios: When you call the BatchReceiveMessage operation against a queue, BatchReceiveMessage TPS = actual BatchReceiveMessage requests per second, regardless of how many messages each call returns. For example, if BatchReceiveMessage is called 100 times per second and each call returns 10 messages, the TPS consumed by a single queue = 100.
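These counting rules can be turned into a quick capacity estimate. The Python sketch below sums the TPS that a workload consumes in one region and compares it with the default quota of 20,000 TPS. The function names are hypothetical. Only BatchSendMessage is counted per message; every other operation, including BatchReceiveMessage, is counted once per call, following the rule that each API call counts as one request.

```python
from typing import Iterable, Tuple

DEFAULT_REGION_TPS_QUOTA = 20_000  # default per Alibaba Cloud account per region


def consumed_tps(operation: str, calls_per_second: float, messages_per_call: int = 1) -> float:
    """TPS that one operation consumes under the counting rules above."""
    if operation == "BatchSendMessage":
        # Batch sends are counted per message.
        return calls_per_second * messages_per_call
    # BatchReceiveMessage and other calls are counted per call,
    # independent of how many messages each call carries or returns.
    return calls_per_second


def total_tps(workload: Iterable[Tuple[str, float, int]]) -> float:
    """Sum consumed TPS across all queues and operations in one region."""
    return sum(consumed_tps(op, rate, n) for op, rate, n in workload)


# Example: 100 BatchSendMessage calls/s with 10 messages each, 100
# BatchReceiveMessage calls/s returning 10 messages each, and one
# DeleteMessage call per received message (1,000 calls/s).
workload = [
    ("BatchSendMessage", 100, 10),
    ("BatchReceiveMessage", 100, 10),
    ("DeleteMessage", 1_000, 1),
]
usage = total_tps(workload)
print(usage, usage / DEFAULT_REGION_TPS_QUOTA)  # 2100 0.105
```

The example also shows that per-message deletes can cost more TPS than the batch receives that fetched the messages, so include every API call in the estimate, not only sends and receives.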

Avoid throttling impact

To avoid the impact of the throttling policy on your business, focus on the following two aspects:

Plan traffic carefully and communicate peak traffic in advance: If you expect significant traffic growth and the required TPS exceeds the maximum that you can apply for in the Quota Center console, submit a ticket to technical support in advance so that additional resources can be reserved and throttling is avoided.
Monitoring and alerting: We recommend that you create alert rules for Simple Message Queue (formerly MNS). For details, see Create an alert rule. You can also view the real-time TPS usage of each queue or topic in the CloudMonitor console (CloudMonitor console > Cloud Service Monitoring > Message Service MNS) to detect when usage approaches the throttling threshold.

FAQ

Does the service support only 20,000 TPS?

No. 20,000 TPS is the default guaranteed value. The actual supported TPS may be higher, depending on cluster load and elastic scaling capabilities.

Why does traffic slightly above 20,000 TPS sometimes trigger throttling and sometimes not?

When the server decides whether to throttle, it applies an elastic margin. Taking the default threshold of 20,000 TPS as an example, the TPS at which throttling is actually triggered is not exactly 20,000 and may be slightly higher (for example, 20,000 to 20,200; this range is only illustrative, and the actual range is adjusted dynamically). The margin is not fixed: when cluster load is high, it narrows automatically, and in extreme cases it can shrink to 0, which means throttling is triggered as soon as traffic reaches the threshold.

We strongly recommend that you configure alerts on the maximum API QPS or the API QPS watermark so that you are warned before throttling causes message send or receive failures. For details, see Create an alert rule. The relevant metrics are:

Metric	Metric name
API QPS maximum per minute	MaxApiQpsPerUser
API QPS watermark percentage	WatermarkOfApiQps
Console API QPS maximum per minute	MaxConsoleApiQpsPerUser
Console API QPS watermark percentage	WatermarkOfConsoleApiQps
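Alerts on these metrics are configured in CloudMonitor. To reason about headroom locally, for example during load tests, you can compute a comparable percentage yourself. The sketch below is a hypothetical helper and does not reproduce how CloudMonitor computes WatermarkOfApiQps: it takes the highest per-second request count in the last minute of samples and expresses it as a percentage of the TPS quota. The 80% threshold is an arbitrary example.

```python
from typing import Sequence

DEFAULT_REGION_TPS_QUOTA = 20_000


def peak_per_minute(per_second_counts: Sequence[float]) -> float:
    """Highest per-second request count among the last 60 samples."""
    return max(per_second_counts[-60:], default=0.0)


def watermark_percent(peak: float, quota: float = DEFAULT_REGION_TPS_QUOTA) -> float:
    """Peak usage as a percentage of the quota."""
    return 100.0 * peak / quota


def should_alert(
    per_second_counts: Sequence[float],
    threshold_percent: float = 80.0,
    quota: float = DEFAULT_REGION_TPS_QUOTA,
) -> bool:
    """True if the peak of the last minute reaches the chosen watermark."""
    return watermark_percent(peak_per_minute(per_second_counts), quota) >= threshold_percent
```

Feed it per-second counts that already follow the TPS counting rules (BatchSendMessage counted per message), otherwise the percentage understates usage.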

Does throttling affect business?

Throttled requests fail with a 429 error, so throttling can cause message send or receive failures if your client does not handle it. Throttling is an overload protection mechanism that prevents sudden traffic spikes from overloading the cluster and affecting overall stability. We recommend that clients use an exponential backoff retry strategy when they receive 429 errors (for example, retrying after 1s, 2s, and 4s) instead of retrying immediately at scale, which would add to the pressure.
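The exponential backoff strategy can be sketched as follows. `ThrottledError` and `call_with_backoff` are hypothetical names: map your SDK's 429 (TooManyRequests) error to `ThrottledError`, or adapt the `except` clause. With the defaults, retries wait 1s, 2s, and 4s, matching the example intervals above.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ThrottledError(Exception):
    """Hypothetical exception raised by your client on HTTP 429 / TooManyRequests."""


def call_with_backoff(
    op: Callable[[], T],
    max_retries: int = 3,
    base_delay_s: float = 1.0,
    max_delay_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run op, retrying on throttling with exponentially growing delays."""
    attempt = 0
    while True:
        try:
            return op()
        except ThrottledError:
            if attempt >= max_retries:
                # Give up and let the caller decide what to do.
                raise
            # Delays grow as base, 2*base, 4*base, ... capped at max_delay_s.
            delay = min(base_delay_s * (2 ** attempt), max_delay_s)
            sleep(delay)
            attempt += 1
```

The `sleep` parameter exists so that tests can record delays instead of waiting. A common refinement is to add random jitter to each delay so that many clients throttled at the same moment do not retry in lockstep.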