To ensure fair access to model calls, Model Studio sets baseline throttling thresholds. Throttling is model-specific and applies to the Alibaba Cloud account that calls the model: calls made with any API key in the account count toward the same total for that model. If your account exceeds a threshold, API requests fail until your request frequency falls below the threshold.
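One way to stay under both per-minute limits is to pace calls on the client side. The following is a minimal sketch, not part of Model Studio: the class `DualWindowLimiter` and its methods are hypothetical names, it assumes a rolling 60-second window (the exact windowing used by the service is not stated here), and it assumes the caller can estimate the token count of a request before sending it.

```python
import time
from collections import deque


class DualWindowLimiter:
    """Hypothetical client-side guard for a QPM limit and a TPM limit."""

    def __init__(self, qpm, tpm, window=60.0, clock=time.monotonic):
        if qpm < 1 or tpm < 1:
            raise ValueError("qpm and tpm must be at least 1")
        self.qpm = qpm
        self.tpm = tpm
        self.window = window      # assumed rolling window, in seconds
        self.clock = clock        # injectable so the logic can be tested
        self.events = deque()     # (timestamp, tokens) of recorded calls
        self.tokens_in_window = 0

    def _evict(self, now):
        # Drop calls that have aged out of the rolling window.
        while self.events and now - self.events[0][0] >= self.window:
            _, tokens = self.events.popleft()
            self.tokens_in_window -= tokens

    def wait_time(self, tokens):
        """Seconds to wait until a call costing `tokens` fits under both limits."""
        if tokens > self.tpm:
            raise ValueError("a single request exceeds the TPM limit")
        now = self.clock()
        self._evict(now)
        count = len(self.events)
        total = self.tokens_in_window
        if count < self.qpm and total + tokens <= self.tpm:
            return 0.0
        # Walk the oldest calls until enough capacity would be freed.
        for timestamp, used in self.events:
            count -= 1
            total -= used
            if count < self.qpm and total + tokens <= self.tpm:
                return max(0.0, timestamp + self.window - now)
        return 0.0  # not reached when qpm >= 1 and tokens <= tpm

    def record(self, tokens):
        """Register a call that is about to be sent."""
        now = self.clock()
        self._evict(now)
        self.events.append((now, tokens))
        self.tokens_in_window += tokens
```

A caller would typically sleep for `wait_time(n)` seconds and then call `record(n)` just before sending the request. Because the thresholds apply to the whole account, a limiter like this only helps if it sees every call made to the model from that account.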
Text generation
Qwen
Qwen LLMs
Throttling is triggered when either limit is exceeded.
Name | Queries per minute (QPM) | Tokens consumed per minute (TPM) |
qwq-plus | 60 | 100,000 |
qwen-max | 600 | 1,000,000 |
qwen-max-latest | 60 | 100,000 |
qwen-max-2025-01-25 (qwen-max-0125) | 60 | 100,000 |
qwen-plus | 600 | 1,000,000 |
qwen-plus-latest | 60 | 100,000 |
qwen-plus-2025-01-25 (qwen-plus-0125) | 60 | 100,000 |
qwen-turbo | 600 | 5,000,000 |
qwen-turbo-latest | 60 | 5,000,000 |
qwen-turbo-2024-11-01 (qwen-turbo-1101) | 60 | 5,000,000 |
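Because throttling is triggered by whichever limit is reached first, the request rate a workload can sustain depends on how many tokens each request consumes. The sketch below works through that arithmetic for the qwen-max limits listed above (600 QPM, 1,000,000 TPM). The function name and the per-request token figures are hypothetical, and how tokens are counted toward TPM is not specified here, so treat the result as an estimate.

```python
def max_requests_per_minute(qpm, tpm, tokens_per_request):
    """Estimate the sustainable request rate under both limits.

    tokens_per_request is an assumed average cost per call.
    """
    if tokens_per_request <= 0:
        raise ValueError("tokens_per_request must be positive")
    # Whichever limit is smaller in requests-per-minute terms is the binding one.
    return min(qpm, tpm // tokens_per_request)


# qwen-max: 600 QPM, 1,000,000 TPM
print(max_requests_per_minute(600, 1_000_000, 500))    # 600: QPM is the binding limit
print(max_requests_per_minute(600, 1_000_000, 4_000))  # 250: TPM is the binding limit
```

With short requests the QPM limit binds first; with long prompts or long outputs the TPM limit can cap throughput well below the QPM figure.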
Qwen VL (visual understanding/image-to-text)
Throttling is triggered when either limit is exceeded.
Name | Queries per minute (QPM) | Tokens consumed per minute (TPM) |
qvq-max | 60 | 100,000 |
qvq-max-latest | 60 | 100,000 |
qvq-max-2025-03-25 | 60 | 100,000 |
qwen-vl-plus | 1,200 | 1,000,000 |
qwen-vl-max | 1,200 | 1,000,000 |
Qwen-MT
Throttling is triggered when either limit is exceeded.
Name | Queries per minute (QPM) | Tokens consumed per minute (TPM) |
qwen-mt-plus | 60 | 23,797 |
qwen-mt-turbo | 60 | 19,020 |
Open source Qwen
Open source Qwen LLM
Throttling is triggered when either limit is exceeded.
Name | Queries per minute (QPM) | Tokens consumed per minute (TPM) |
qwen2.5-14b-instruct-1m | 1,200 | 5,000,000 |
qwen2.5-7b-instruct-1m | 1,200 | 5,000,000 |
qwen2.5-72b-instruct | 1,200 | 1,000,000 |
qwen2.5-32b-instruct | 1,200 | 1,000,000 |
qwen2.5-14b-instruct | 1,200 | 1,000,000 |
qwen2.5-7b-instruct | 1,200 | 1,000,000 |
qwen2-72b-instruct | 60 | 150,000 |
qwen2-57b-a14b-instruct | 60 | 30,000 |
qwen2-7b-instruct | 60 | 30,000 |
qwen1.5-110b-chat | 10 | 20,000 |
qwen1.5-72b-chat | 120 | 200,000 |
qwen1.5-32b-chat | 10 | 20,000 |
qwen1.5-14b-chat | 120 | 200,000 |
qwen1.5-7b-chat | 120 | 200,000 |
Open source Qwen VL (visual understanding/image-to-text)
Throttling is triggered when either limit is exceeded.
Name | Queries per minute (QPM) | Tokens consumed per minute (TPM) |
qwen2.5-vl-72b-instruct | 1,200 | 1,000,000 |
qwen2.5-vl-32b-instruct | 60 | 100,000 |
qwen2.5-vl-7b-instruct | 1,200 | 1,000,000 |
qwen2.5-vl-3b-instruct | 1,200 | 1,000,000 |
Qwen Omni (Multi-modal)
Throttling is triggered when either limit is exceeded.
Name | Queries per minute (QPM) | Tokens consumed per minute (TPM) |
qwen2.5-omni-7b | 60 | 100,000 |
Embedding models
General text embedding
Throttling is triggered when either limit is exceeded.
Name | Queries per minute (QPM) | Tokens consumed per minute (TPM)/Jobs |
text-embedding-v3 | 1,800 | 600,000 |