Simple Message Queue (formerly MNS) elastically scales throughput based on real-time cluster resources. When requests exceed the cluster's current capacity, throttling temporarily limits traffic to maintain stability.
How throttling works
The throttling threshold is not a fixed cap. As traffic approaches the threshold, the server automatically scales resources and raises the limit. In most cases, this elastic adjustment absorbs traffic spikes without affecting your workload.
If a sudden burst exceeds what elastic scaling can absorb, the server activates a backpressure mechanism. It holds excess requests for approximately 500 ms before returning an error. After automatic scale-out completes, the system resumes normal processing at a higher threshold.
Default throughput
Each Alibaba Cloud account has a default guaranteed throughput of 20,000 transactions per second (TPS) per region. When cluster resources are available, the system elastically supports throughput above this baseline.
To raise the default threshold, submit a ticket.
How TPS is calculated
Each API operation counts as one request, with one exception: batch operations are counted by the number of messages, not the number of API calls.
| Operation | TPS formula | Example |
|---|---|---|
| Single-message operations | 1 request = 1 TPS | 100 requests/sec = 100 TPS |
| BatchSendMessage | Requests/sec × messages per request | 100 requests/sec × 10 messages = 1,000 TPS |
| BatchReceiveMessage | Requests/sec × messages per request | 100 requests/sec × 10 messages = 1,000 TPS |
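The formulas in the table can be sketched as a small helper for estimating how much of the throughput quota a workload consumes. This is an illustrative calculation only: the function name `effective_tps`, the constant `DEFAULT_TPS_LIMIT`, and the sample workload are hypothetical, and the sketch assumes that all operations in a region count toward the same quota.

```python
# Default guaranteed throughput per account per region, as stated above.
DEFAULT_TPS_LIMIT = 20_000


def effective_tps(requests_per_sec, messages_per_request=1):
    """Return the TPS that a stream of API calls counts as.

    Single-message operations use messages_per_request=1.
    Batch operations are counted by messages, not by API calls.
    """
    if requests_per_sec < 0 or messages_per_request < 1:
        raise ValueError("rates must be non-negative and batches non-empty")
    return requests_per_sec * messages_per_request


# Hypothetical workload: (operation, requests per second, messages per request)
workload = [
    ("SendMessage", 100, 1),
    ("BatchSendMessage", 100, 10),
    ("BatchReceiveMessage", 100, 10),
]

total_tps = sum(effective_tps(rps, batch) for _, rps, batch in workload)
headroom = DEFAULT_TPS_LIMIT - total_tps

print(f"total: {total_tps} TPS, headroom: {headroom} TPS")
```

A negative `headroom` would suggest that the workload relies on elastic capacity above the guaranteed baseline.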
Throttling for abnormal queue consumption
In a standard queue consumption pattern, clients delete messages after processing them. If a client repeatedly receives messages without sending delete requests, the system flags this as abnormal behavior. It then reduces the rate at which that client can receive messages.
Throttling activates when any one of the following conditions is met:
| Condition | Threshold |
|---|---|
| Duration | Abnormal behavior persists for more than 30 minutes |
| Message count | The total number of received-but-not-deleted messages reaches 5,000 |
| Rate | Instantaneous receive-without-delete rate exceeds 1,000 TPS |
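The three conditions above can be expressed as a simple check, for example in a client-side monitor that warns before throttling starts. This is a minimal sketch, not the server's actual detection logic: the `ConsumptionStats` fields and the `triggered_conditions` function are hypothetical names, and how you collect the statistics is left to your own metrics.

```python
from dataclasses import dataclass


@dataclass
class ConsumptionStats:
    """Hypothetical snapshot of one client's receive-without-delete behavior."""
    abnormal_minutes: float      # how long the abnormal pattern has lasted
    undeleted_messages: int      # total received-but-not-deleted messages
    undeleted_rate_tps: float    # current receive-without-delete rate


def triggered_conditions(stats):
    """Return the names of the throttling conditions that the stats meet."""
    hits = []
    if stats.abnormal_minutes > 30:        # persists for more than 30 minutes
        hits.append("duration")
    if stats.undeleted_messages >= 5000:   # reaches 5,000 messages
        hits.append("message_count")
    if stats.undeleted_rate_tps > 1000:    # exceeds 1,000 TPS
        hits.append("rate")
    return hits


# Any single condition is enough to activate throttling.
stats = ConsumptionStats(abnormal_minutes=5, undeleted_messages=5000,
                         undeleted_rate_tps=200)
print(triggered_conditions(stats))
```

The simplest way to stay clear of all three conditions is to delete each message as soon as it has been processed.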
Error response
When throttling is triggered, the server returns the following error:
| HTTP status code | Error code | Error message |
|---|---|---|
| 429 | TooManyRequests | The request is denied by cluster flow limiter for too many requests. |
Handle this error by waiting briefly and retrying the request. Throttling is temporary: the system automatically scales out to restore capacity.
Mitigate throttling
Plan ahead for traffic spikes. If you expect a large increase in traffic, submit a ticket in advance. The team can pre-allocate resources to prevent throttling during peak periods.
Set up monitoring and alerts. Use the monitoring tools for Simple Message Queue to track real-time traffic and throttling status. Early detection helps you respond before throttling affects your workload.
Implement retry with exponential backoff. When a 429 error occurs, wait before retrying, and increase the wait after each failed attempt (for example, by doubling it). This gives the system time to scale out and prevents retry storms from worsening throttling.
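A retry loop with exponential backoff might look like the following minimal sketch. It is not tied to any particular SDK: `ThrottledError` is a hypothetical exception standing in for however your client surfaces the 429 `TooManyRequests` response, and the attempt count and delay values are illustrative assumptions rather than recommended settings. The random jitter is an addition that is not described above; it spreads out retries from many clients so that they do not all arrive at the same moment.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for an HTTP 429 / TooManyRequests response."""


def call_with_backoff(operation, max_attempts=5, base_delay=0.1,
                      max_delay=5.0, sleep=time.sleep):
    """Call operation(), retrying with exponential backoff when throttled.

    The delay ceiling doubles after each throttled attempt, capped at
    max_delay. The actual wait is a random value up to that ceiling.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, ceiling))


# Usage with a fake operation that is throttled twice, then succeeds.
attempts = {"count": 0}


def flaky_send():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ThrottledError("429 TooManyRequests")
    return "sent"


print(call_with_backoff(flaky_send))
```

In real code, `operation` would wrap the actual send or receive call, and only throttling errors should be retried this way; other failures usually need different handling.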