Alibaba Cloud Model Studio: Throttling

Last Updated: Mar 28, 2025

To ensure fair access to model calls, Model Studio sets baseline throttling thresholds. Thresholds are model-specific and apply per Alibaba Cloud account: calls to a model made with any API key in the account count toward the same total. If your account exceeds a threshold, your API requests fail until your request frequency falls back below the threshold.
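Since throttled requests fail until the request rate drops, a common client-side response is to retry with exponential backoff. The following is a minimal sketch, not an official client: `ThrottledError` and `call_with_backoff` are hypothetical names, and mapping a real throttling response to `ThrottledError` is left to the caller because this page does not specify the error format.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical error type standing in for a throttled API response."""


def call_with_backoff(send, max_attempts=5, base_delay=1.0, max_delay=60.0,
                      sleep=time.sleep):
    """Call send() and retry with exponential backoff while it is throttled."""
    for attempt in range(max_attempts):
        try:
            return send()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # give up after the last attempt
            # Double the wait on each attempt, cap it, and add jitter so that
            # parallel workers do not all retry at the same moment.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay * random.uniform(0.5, 1.0))
```

Here `send` is any zero-argument callable that performs one request and raises `ThrottledError` when the request is rejected for throttling. Because thresholds are shared by all API keys in the account, every worker that uses the account should back off, not just the one that saw the failure.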

Text generation

Qwen

Qwen LLMs

Throttling is triggered when either limit is exceeded.

| Name | Queries per minute (QPM) | Tokens consumed per minute (TPM) |
| --- | --- | --- |
| qwq-plus | 60 | 100,000 |
| qwen-max | 600 | 1,000,000 |
| qwen-max-latest | 60 | 100,000 |
| qwen-max-2025-01-25 (qwen-max-0125) | 60 | 100,000 |
| qwen-plus | 600 | 1,000,000 |
| qwen-plus-latest | 60 | 100,000 |
| qwen-plus-2025-01-25 (qwen-plus-0125) | 60 | 100,000 |
| qwen-turbo | 600 | 5,000,000 |
| qwen-turbo-latest | 60 | 5,000,000 |
| qwen-turbo-2024-11-01 (qwen-turbo-1101) | 60 | 5,000,000 |
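Because throttling is triggered when either the QPM or the TPM limit is exceeded, a client can track both counts over the last minute before sending a request. The sketch below is one way to do that locally. `MinuteWindowLimiter` is a hypothetical helper, and the 60-second sliding window is an assumption: this page does not say how the service measures a minute or exactly which tokens it counts.

```python
import time
from collections import deque


class MinuteWindowLimiter:
    """Client-side guard for a QPM limit and a TPM limit (hypothetical helper)."""

    def __init__(self, qpm, tpm, clock=time.monotonic):
        self.qpm = qpm
        self.tpm = tpm
        self.clock = clock
        self.events = deque()  # (timestamp, tokens) for requests in the window
        self.tokens_in_window = 0

    def _expire(self, now):
        # Drop requests that have left the assumed 60-second window.
        while self.events and now - self.events[0][0] >= 60.0:
            _, tokens = self.events.popleft()
            self.tokens_in_window -= tokens

    def try_acquire(self, tokens):
        """Record the request and return True if it fits under both limits."""
        now = self.clock()
        self._expire(now)
        if len(self.events) + 1 > self.qpm:
            return False  # one more request would exceed QPM
        if self.tokens_in_window + tokens > self.tpm:
            return False  # these tokens would exceed TPM
        self.events.append((now, tokens))
        self.tokens_in_window += tokens
        return True
```

For example, `MinuteWindowLimiter(qpm=60, tpm=100_000)` mirrors the qwq-plus row. A single limiter instance only sees its own process, so workloads spread over several processes or API keys in the same account would need a shared counter to stay under the account-level thresholds.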

Qwen VL (visual understanding/image-to-text)

Throttling is triggered when either limit is exceeded.

| Name | Queries per minute (QPM) | Tokens consumed per minute (TPM) |
| --- | --- | --- |
| qvq-max | 60 | 100,000 |
| qvq-max-latest | 60 | 100,000 |
| qvq-max-2025-03-25 | 60 | 100,000 |
| qwen-vl-plus | 1,200 | 1,000,000 |
| qwen-vl-max | 1,200 | 1,000,000 |

Qwen-MT

Throttling is triggered when either limit is exceeded.

| Name | Queries per minute (QPM) | Tokens consumed per minute (TPM) |
| --- | --- | --- |
| qwen-mt-plus | 60 | 23,797 |
| qwen-mt-turbo | 60 | 19,020 |
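Which limit is reached first depends on the average number of tokens per request. For qwen-mt-plus (60 QPM, 23,797 TPM), the break-even point is 23,797 / 60 ≈ 396.6 tokens per request: above that average, TPM is exhausted before QPM. The helper below is a hypothetical illustration of this arithmetic and assumes every request consumes the same number of tokens.

```python
def sustainable_requests_per_minute(qpm, tpm, avg_tokens_per_request):
    """Return the request rate allowed by both limits and the limit that binds."""
    by_tokens = tpm // avg_tokens_per_request  # requests that fit in the token budget
    if by_tokens < qpm:
        return by_tokens, "TPM"
    return qpm, "QPM"


# qwen-mt-plus limits from the table: 60 QPM, 23,797 TPM.
print(sustainable_requests_per_minute(60, 23_797, 500))  # (47, 'TPM')
print(sustainable_requests_per_minute(60, 23_797, 300))  # (60, 'QPM')
```

At 500 tokens per request the token budget allows only 47 requests per minute, so TPM is the effective limit. At 300 tokens per request the full 60 QPM can be used.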

Open-source Qwen

Open-source Qwen LLM

Throttling is triggered when either limit is exceeded.

| Name | Queries per minute (QPM) | Tokens consumed per minute (TPM) |
| --- | --- | --- |
| qwen2.5-14b-instruct-1m | 1,200 | 5,000,000 |
| qwen2.5-7b-instruct-1m | 1,200 | 5,000,000 |
| qwen2.5-72b-instruct | 1,200 | 1,000,000 |
| qwen2.5-32b-instruct | 1,200 | 1,000,000 |
| qwen2.5-14b-instruct | 1,200 | 1,000,000 |
| qwen2.5-7b-instruct | 1,200 | 1,000,000 |
| qwen2-72b-instruct | 60 | 150,000 |
| qwen2-57b-a14b-instruct | 60 | 30,000 |
| qwen2-7b-instruct | 60 | 30,000 |
| qwen1.5-110b-chat | 10 | 20,000 |
| qwen1.5-72b-chat | 120 | 200,000 |
| qwen1.5-32b-chat | 10 | 20,000 |
| qwen1.5-14b-chat | 120 | 200,000 |
| qwen1.5-7b-chat | 120 | 200,000 |

Open-source Qwen VL (visual understanding/image-to-text)

Throttling is triggered when either limit is exceeded.

| Name | Queries per minute (QPM) | Tokens consumed per minute (TPM) |
| --- | --- | --- |
| qwen2.5-vl-72b-instruct | 1,200 | 1,000,000 |
| qwen2.5-vl-32b-instruct | 60 | 100,000 |
| qwen2.5-vl-7b-instruct | 1,200 | 1,000,000 |
| qwen2.5-vl-3b-instruct | 1,200 | 1,000,000 |

Qwen Omni (Multi-modal)

Throttling is triggered when either limit is exceeded.

| Name | Queries per minute (QPM) | Tokens consumed per minute (TPM) |
| --- | --- | --- |
| qwen2.5-omni-7b | 60 | 100,000 |

Embedding models

General text embedding

Throttling is triggered when either limit is exceeded.

| Name | Queries per minute (QPM) | Tokens consumed per minute (TPM)/Jobs |
| --- | --- | --- |
| text-embedding-v3 | 1,800 | 600,000 |