To ensure fair access to model calls, Model Studio sets baseline rate limits. Rate limits are model-specific and apply to the Alibaba Cloud account that calls the model. Calls made with every API key in the account count toward the same limit. If your account exceeds a limit, its API requests fail until your usage falls back below the limit.
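A throttled request fails instead of waiting in a queue, so clients typically retry after a delay. The following is a minimal sketch of capped exponential backoff with full jitter. `RateLimitError` and `call_with_backoff` are hypothetical names and are not part of any Model Studio SDK. The sketch assumes you map whatever your client returns for a throttled call onto `RateLimitError`. For HTTP APIs this is usually status 429 Too Many Requests; check the Model Studio error-code reference for the exact code.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical: raised by your client when a call is rejected for exceeding a rate limit."""


def call_with_backoff(
    fn: Callable[[], T],
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying with capped exponential backoff plus full jitter on RateLimitError."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries:
                raise  # give up after the last allowed retry
            # Delay ceiling doubles each attempt: base_delay * 2**attempt, capped at max_delay.
            cap = min(max_delay, base_delay * (2 ** attempt))
            # Full jitter: sleep a random time in [0, cap] so throttled workers do not retry in lockstep.
            sleep(random.uniform(0, cap))
    raise AssertionError("unreachable")


# Usage: a stand-in call that is throttled twice, then succeeds.
attempts = {"n": 0}


def flaky() -> str:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimitError("throttled")
    return "ok"


print(call_with_backoff(flaky, sleep=lambda s: None))  # prints "ok"
```

Jitter matters here because every API key in the account draws on the same limit. Without it, several workers that were throttled at the same moment tend to retry at the same moment and get throttled again.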
Text generation
Qwen
Qwen LLMs
Requests are throttled when either limit is exceeded.

| Name | Queries per minute (QPM) | Tokens per minute (TPM) |
|---|---|---|
| qwq-plus | 60 | 100,000 |
| qwen-max | 600 | 1,000,000 |
| qwen-max-latest | 60 | 100,000 |
| qwen-max-2025-01-25 (qwen-max-0125) | 60 | 100,000 |
| qwen-plus | 600 | 1,000,000 |
| qwen-plus-latest | 60 | 100,000 |
| qwen-plus-2025-04-28 (qwen-plus-0428) | 60 | 100,000 |
| qwen-plus-2025-01-25 (qwen-plus-0125) | 60 | 100,000 |
| qwen-turbo | 600 | 5,000,000 |
| qwen-turbo-latest | 60 | 5,000,000 |
| qwen-turbo-2025-04-28 (qwen-turbo-0428) | 60 | 5,000,000 |
| qwen-turbo-2024-11-01 (qwen-turbo-1101) | 60 | 5,000,000 |
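Because throttling starts when either limit is hit, the highest steady request rate a model supports is min(QPM, floor(TPM / T)), where T is the average number of tokens per request. The sketch below works this out for a few models in the table above. The limit values come from the table. The average of 2,000 tokens per request is an illustrative assumption, and so is the premise that TPM counts input and output tokens together. `RateLimit` and `max_requests_per_minute` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RateLimit:
    qpm: int  # queries per minute
    tpm: int  # tokens per minute (assumed: input + output tokens)


def max_requests_per_minute(limit: RateLimit, avg_tokens_per_request: int) -> int:
    """Highest steady request rate that stays under both the QPM and the TPM limit."""
    if avg_tokens_per_request <= 0:
        raise ValueError("avg_tokens_per_request must be positive")
    by_tokens = limit.tpm // avg_tokens_per_request  # requests per minute allowed by TPM alone
    return min(limit.qpm, by_tokens)


# Values from the Qwen LLMs table.
LIMITS = {
    "qwen-max": RateLimit(qpm=600, tpm=1_000_000),
    "qwen-turbo": RateLimit(qpm=600, tpm=5_000_000),
    "qwen-max-latest": RateLimit(qpm=60, tpm=100_000),
}

for model, limit in LIMITS.items():
    print(model, max_requests_per_minute(limit, avg_tokens_per_request=2_000))
# qwen-max 500          (TPM binds: 1,000,000 / 2,000 = 500 < 600)
# qwen-turbo 600        (QPM binds: 5,000,000 / 2,000 = 2,500 > 600)
# qwen-max-latest 50    (TPM binds: 100,000 / 2,000 = 50 < 60)
```

With long prompts, TPM rather than QPM is often the limit you reach first. This is why the table lists both values.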
Qwen VL (visual understanding/image-to-text)
Requests are throttled when either limit is exceeded.

| Name | Queries per minute (QPM) | Tokens per minute (TPM) |
|---|---|---|
| qvq-max | 60 | 100,000 |
| qvq-max-latest | 60 | 100,000 |
| qvq-max-2025-03-25 (qvq-max-0325) | 60 | 100,000 |
| qwen-vl-max | 1,200 | 1,000,000 |
| qwen-vl-max-latest | 1,200 | 1,000,000 |
| qwen-vl-max-2025-04-08 (qwen-vl-max-0408) | 1,200 | 1,000,000 |
| qwen-vl-plus | 1,200 | 1,000,000 |
| qwen-vl-plus-latest | 1,200 | 1,000,000 |
| qwen-vl-plus-2025-05-07 (qwen-vl-plus-0507) | 120 | 1,000,000 |
| qwen-vl-plus-2025-01-25 (qwen-vl-plus-0125) | 1,200 | 1,000,000 |
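You can avoid most failed calls by throttling on the client side before a request is sent. The following is a minimal sketch of a sliding-window guard that admits a request only if the past 60 seconds stay under both QPM and TPM. Because all API keys in an account share one limit, you would keep a single instance per model and share it across all keys and workers. The class name `SlidingWindowLimiter` is hypothetical. The 60-second sliding window is an assumption, since the exact window Model Studio uses for its own accounting is not stated here.

```python
import time
from collections import deque
from typing import Callable, Deque, Tuple


class SlidingWindowLimiter:
    """Admits a request only if the last 60 s of admitted requests stay under QPM and TPM."""

    def __init__(self, qpm: int, tpm: int, clock: Callable[[], float] = time.monotonic) -> None:
        self.qpm = qpm
        self.tpm = tpm
        self.clock = clock
        self.window: Deque[Tuple[float, int]] = deque()  # (timestamp, tokens) per admitted request
        self.tokens_in_window = 0

    def _evict(self, now: float) -> None:
        # Drop requests that are 60 seconds old or older from both counters.
        while self.window and now - self.window[0][0] >= 60.0:
            _, tokens = self.window.popleft()
            self.tokens_in_window -= tokens

    def try_acquire(self, tokens: int) -> bool:
        """Record the request and return True if it fits under both limits; otherwise return False."""
        now = self.clock()
        self._evict(now)
        if len(self.window) + 1 > self.qpm:
            return False  # would exceed queries per minute
        if self.tokens_in_window + tokens > self.tpm:
            return False  # would exceed tokens per minute
        self.window.append((now, tokens))
        self.tokens_in_window += tokens
        return True


# Example: one shared guard for qwen-vl-max (1,200 QPM, 1,000,000 TPM per the table above).
vl_max_guard = SlidingWindowLimiter(qpm=1_200, tpm=1_000_000)
print(vl_max_guard.try_acquire(tokens=3_000))  # True on a fresh limiter; 3,000 is an illustrative estimate
```

Output tokens are not known until the call finishes. A conservative choice is to pass an estimate such as input tokens plus the maximum output length you request. When `try_acquire` returns False, wait and try again rather than sending the request.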
Open-source Qwen
Open-source Qwen LLMs
Requests are throttled when either limit is exceeded.

| Name | Queries per minute (QPM) | Tokens per minute (TPM) |
|---|---|---|
| qwen3-235b-a22b | 600 | 1,000,000 |
| qwen3-32b | 600 | 1,000,000 |
| qwen3-30b-a3b | 600 | 1,000,000 |
| qwen3-14b | 600 | 1,000,000 |
| qwen3-8b | 600 | 1,000,000 |
| qwen3-4b | 600 | 1,000,000 |
| qwen3-1.7b | 600 | 1,000,000 |
| qwen3-0.6b | 600 | 1,000,000 |
| qwen2.5-14b-instruct-1m | 1,200 | 5,000,000 |
| qwen2.5-7b-instruct-1m | 1,200 | 5,000,000 |
| qwen2.5-72b-instruct | 1,200 | 1,000,000 |
| qwen2.5-32b-instruct | 1,200 | 1,000,000 |
| qwen2.5-14b-instruct | 1,200 | 1,000,000 |
| qwen2.5-7b-instruct | 1,200 | 1,000,000 |
| qwen2-72b-instruct (deprecated) | 60 | 150,000 |
| qwen2-57b-a14b-instruct (deprecated) | 60 | 30,000 |
| qwen2-7b-instruct (deprecated) | 60 | 30,000 |
| qwen1.5-110b-chat (deprecated) | 10 | 20,000 |
| qwen1.5-72b-chat (deprecated) | 120 | 200,000 |
| qwen1.5-32b-chat (deprecated) | 10 | 20,000 |
| qwen1.5-14b-chat (deprecated) | 120 | 200,000 |
| qwen1.5-7b-chat (deprecated) | 120 | 200,000 |
Open-source Qwen VL (visual understanding/image-to-text)
Requests are throttled when either limit is exceeded.

| Name | Queries per minute (QPM) | Tokens per minute (TPM) |
|---|---|---|
| qwen2.5-vl-72b-instruct | 1,200 | 1,000,000 |
| qwen2.5-vl-32b-instruct | 60 | 100,000 |
| qwen2.5-vl-7b-instruct | 1,200 | 1,000,000 |
| qwen2.5-vl-3b-instruct | 1,200 | 1,000,000 |
Qwen Omni (multimodal)
Requests are throttled when either limit is exceeded.

| Name | Queries per minute (QPM) | Tokens per minute (TPM) |
|---|---|---|
| qwen2.5-omni-7b | 60 | 100,000 |
Image generation
Wan
| Name | Task submissions per second | Concurrent tasks |
|---|---|---|
| wan2.1-t2i-turbo | 2 | 2 |
| wan2.1-t2i-plus | 2 | 2 |
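Image and video generation are limited by how often you submit tasks and how many run at once, not by QPM and TPM. The following is a minimal asyncio sketch that enforces both limits on the client side. A semaphore caps the number of tasks in flight, and a lock spaces submissions at least 1 / rate seconds apart. It assumes each generation is a task you submit and then wait on until it finishes. `TaskGate` and `fake_task` are hypothetical names, and `fake_task` stands in for your real submit-and-poll call.

```python
import asyncio
import time
from typing import Awaitable, Callable, List, TypeVar

T = TypeVar("T")


class TaskGate:
    """Caps concurrent tasks and spaces out submissions (e.g. 2 concurrent, 2 submissions/s)."""

    def __init__(self, max_concurrent: int, submissions_per_second: float) -> None:
        # Create inside a running event loop (see main below) for compatibility with older Pythons.
        self._slots = asyncio.Semaphore(max_concurrent)
        self._interval = 1.0 / submissions_per_second
        self._lock = asyncio.Lock()
        self._next_submit = 0.0

    async def run(self, job: Callable[[], Awaitable[T]]) -> T:
        async with self._slots:  # hold a concurrency slot until the task finishes
            async with self._lock:  # serialize submissions so they are spaced out
                now = time.monotonic()
                wait = self._next_submit - now
                if wait > 0:
                    await asyncio.sleep(wait)
                self._next_submit = max(now, self._next_submit) + self._interval
            return await job()


async def main() -> List[int]:
    # Limits for wan2.1-t2i-turbo from the table above: 2 submissions per second, 2 concurrent tasks.
    gate = TaskGate(max_concurrent=2, submissions_per_second=2)

    async def fake_task(i: int) -> int:
        await asyncio.sleep(0.1)  # stand-in for submitting a task and polling until it completes
        return i

    return await asyncio.gather(*(gate.run(lambda i=i: fake_task(i)) for i in range(4)))


print(asyncio.run(main()))  # [0, 1, 2, 3]
```

The semaphore is held for the whole task, not only for the submission. This matters because a running task still counts toward the concurrent-task limit after it has been submitted.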
Video generation
Wan
| Name | Task submissions per second | Concurrent tasks |
|---|---|---|
| wan2.1-t2v-turbo | 2 | 2 |
| wan2.1-t2v-plus | 2 | 2 |
| wan2.1-i2v-turbo | 2 | 2 |
| wan2.1-i2v-plus | 2 | 2 |
| wan2.1-kf2v-plus | 2 | 2 |
| wan2.1-vace-plus | 2 | 2 |
Embedding models
General text embedding
Requests are throttled when either limit is exceeded.

| Name | Queries per minute (QPM) | Tokens per minute (TPM) |
|---|---|---|
| text-embedding-v3 | 6,000 | 24,000,000 |
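QPM counts requests, not texts. If one embedding request can carry several input texts, batching reduces QPM usage while TPM usage stays the same. The following sketch works through that arithmetic for text-embedding-v3 using the limits in the table above. It assumes that a multi-text request counts as a single query and that a batch of 10 texts per request is allowed. The real maximum batch size is an assumption here, so check the embedding API reference. `batches` and `min_minutes` are hypothetical helpers, and 200 tokens per text is an illustrative figure.

```python
import math
from typing import List, Sequence, TypeVar

T = TypeVar("T")

QPM = 6_000       # text-embedding-v3, from the table above
TPM = 24_000_000  # text-embedding-v3, from the table above


def batches(items: Sequence[T], size: int) -> List[List[T]]:
    """Split items into consecutive batches of at most `size` elements."""
    return [list(items[i : i + size]) for i in range(0, len(items), size)]


def min_minutes(num_texts: int, tokens_per_text: int, batch_size: int) -> int:
    """Lower bound on whole minutes needed, whichever of QPM or TPM binds first."""
    requests = math.ceil(num_texts / batch_size)
    total_tokens = num_texts * tokens_per_text
    return max(math.ceil(requests / QPM), math.ceil(total_tokens / TPM))


print(len(batches(list(range(25)), 10)))         # 3 batches: 10 + 10 + 5
print(min_minutes(100_000, 200, batch_size=1))   # 17: 100,000 requests, QPM binds
print(min_minutes(100_000, 200, batch_size=10))  # 2: 10,000 requests
```

In both cases the 20,000,000 tokens fit within a single minute of TPM. The request count therefore decides the run time, and that count is what batching reduces.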