Alibaba Cloud Model Studio: Rate limits

Last Updated: Jun 04, 2025

To ensure fair access to model calls, Model Studio sets baseline rate limits. Rate limits are model-specific and tied to the Alibaba Cloud account that calls the model. Each limit counts the total calls to a model made with all API keys in that account. If your account exceeds a limit, API requests fail until your request frequency falls below the limit.
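Because throttled requests fail until traffic drops back under the limit, one common client-side response is to retry with exponential backoff. The sketch below is illustrative only: `RateLimitError` and the `send` callable are hypothetical placeholders, since this page does not specify which error the API returns when a limit is exceeded. Map them to whatever your client library actually raises.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for the error your client raises when throttled."""


def call_with_backoff(send, max_attempts=5, base_delay=1.0, max_delay=60.0,
                      sleep=time.sleep):
    """Call send() and retry with exponential backoff and jitter when throttled."""
    for attempt in range(max_attempts):
        try:
            return send()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Double the upper bound of the wait on each retry, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: a random wait so parallel clients do not retry in lockstep.
            sleep(random.uniform(0, delay))
```

Backoff only reacts to failures. Because the limit is shared by every API key in the account, retries from several services can add up, so keep `max_attempts` small.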

Text generation

Qwen

Qwen LLMs

Rate limit (triggered when either limit is exceeded):

| Name | Queries per minute (QPM) | Tokens consumed per minute (TPM) |
| --- | --- | --- |
| qwq-plus | 60 | 100,000 |
| qwen-max | 600 | 1,000,000 |
| qwen-max-latest | 60 | 100,000 |
| qwen-max-2025-01-25 (qwen-max-0125) | 60 | 100,000 |
| qwen-plus | 600 | 1,000,000 |
| qwen-plus-latest | 60 | 100,000 |
| qwen-plus-2025-04-28 (qwen-plus-0428) | 60 | 100,000 |
| qwen-plus-2025-01-25 (qwen-plus-0125) | 60 | 100,000 |
| qwen-turbo | 600 | 5,000,000 |
| qwen-turbo-latest | 60 | 5,000,000 |
| qwen-turbo-2025-04-28 (qwen-turbo-0428) | 60 | 5,000,000 |
| qwen-turbo-2024-11-01 (qwen-turbo-1101) | 60 | 5,000,000 |
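Since a request is throttled when either the QPM or the TPM limit is exceeded, a client that wants to stay under the limits has to track both counters. The following is a minimal sketch of a client-side sliding-window check. It rests on assumptions: the class name and methods are hypothetical, the 60-second sliding window is a guess at how the service measures a minute, and the token count passed in is the caller's own estimate.

```python
import time
from collections import deque


class DualRateLimiter:
    """Client-side sliding window over request count (QPM) and tokens (TPM)."""

    def __init__(self, qpm, tpm, window=60.0, clock=time.monotonic):
        self.qpm = qpm
        self.tpm = tpm
        self.window = window  # assumed window length in seconds
        self.clock = clock
        self.events = deque()  # (timestamp, tokens) for each admitted request
        self.tokens_in_window = 0

    def _evict(self, now):
        # Drop admitted requests that have aged out of the window.
        while self.events and now - self.events[0][0] >= self.window:
            _, tokens = self.events.popleft()
            self.tokens_in_window -= tokens

    def try_acquire(self, tokens):
        """Record the request and return True only if it fits under both limits."""
        now = self.clock()
        self._evict(now)
        if len(self.events) + 1 > self.qpm:
            return False  # would exceed the request limit
        if self.tokens_in_window + tokens > self.tpm:
            return False  # would exceed the token limit
        self.events.append((now, tokens))
        self.tokens_in_window += tokens
        return True


# Example with the qwq-plus limits listed above (60 QPM, 100,000 TPM).
limiter = DualRateLimiter(qpm=60, tpm=100_000)
allowed = limiter.try_acquire(tokens=1_500)  # 1_500 is an estimated token count
```

A local limiter like this only sees one process. The account-wide limit also counts calls made by other processes and API keys, so treat it as a way to reduce throttling rather than a guarantee.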

Qwen VL (visual understanding/image-to-text)

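Rate limit (triggered when either limit is exceeded):

| Name | Queries per minute (QPM) | Tokens consumed per minute (TPM) |
| --- | --- | --- |
| qvq-max | 60 | 100,000 |
| qvq-max-latest | 60 | 100,000 |
| qvq-max-2025-03-25 (qvq-max-0325) | 60 | 100,000 |
| qwen-vl-max | 1,200 | 1,000,000 |
| qwen-vl-max-latest | 1,200 | 1,000,000 |
| qwen-vl-max-2025-04-08 (qwen-vl-max-0408) | 1,200 | 1,000,000 |
| qwen-vl-plus | 1,200 | 1,000,000 |
| qwen-vl-plus-latest | 1,200 | 1,000,000 |
| qwen-vl-plus-2025-05-07 (qwen-vl-plus-0507) | 120 | 1,000,000 |
| qwen-vl-plus-2025-01-25 (qwen-vl-plus-0125) | 1,200 | 1,000,000 |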

Open source Qwen

Open source Qwen LLM

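Rate limit (triggered when either limit is exceeded):

| Name | Queries per minute (QPM) | Tokens consumed per minute (TPM) |
| --- | --- | --- |
| qwen3-235b-a22b | 600 | 1,000,000 |
| qwen3-32b | 600 | 1,000,000 |
| qwen3-30b-a3b | 600 | 1,000,000 |
| qwen3-14b | 600 | 1,000,000 |
| qwen3-8b | 600 | 1,000,000 |
| qwen3-4b | 600 | 1,000,000 |
| qwen3-1.7b | 600 | 1,000,000 |
| qwen3-0.6b | 600 | 1,000,000 |
| qwen2.5-14b-instruct-1m | 1,200 | 5,000,000 |
| qwen2.5-7b-instruct-1m | 1,200 | 5,000,000 |
| qwen2.5-72b-instruct | 1,200 | 1,000,000 |
| qwen2.5-32b-instruct | 1,200 | 1,000,000 |
| qwen2.5-14b-instruct | 1,200 | 1,000,000 |
| qwen2.5-7b-instruct | 1,200 | 1,000,000 |
| qwen2-72b-instruct (Deprecated) | 60 | 150,000 |
| qwen2-57b-a14b-instruct (Deprecated) | 60 | 30,000 |
| qwen2-7b-instruct (Deprecated) | 60 | 30,000 |
| qwen1.5-110b-chat (Deprecated) | 10 | 20,000 |
| qwen1.5-72b-chat (Deprecated) | 120 | 200,000 |
| qwen1.5-32b-chat (Deprecated) | 10 | 20,000 |
| qwen1.5-14b-chat (Deprecated) | 120 | 200,000 |
| qwen1.5-7b-chat (Deprecated) | 120 | 200,000 |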

Open source Qwen VL (visual understanding/image-to-text)

Rate limit (triggered when either limit is exceeded):

| Name | Queries per minute (QPM) | Tokens consumed per minute (TPM) |
| --- | --- | --- |
| qwen2.5-vl-72b-instruct | 1,200 | 1,000,000 |
| qwen2.5-vl-32b-instruct | 60 | 100,000 |
| qwen2.5-vl-7b-instruct | 1,200 | 1,000,000 |
| qwen2.5-vl-3b-instruct | 1,200 | 1,000,000 |

Qwen Omni (Multi-modal)

Rate limit (triggered when either limit is exceeded):

| Name | Queries per minute (QPM) | Tokens consumed per minute (TPM) |
| --- | --- | --- |
| qwen2.5-omni-7b | 60 | 100,000 |

Image generation

Wan

| Name | Task submissions per second | Number of concurrent tasks |
| --- | --- | --- |
| wan2.1-t2i-turbo | 2 | 2 |
| wan2.1-t2i-plus | 2 | 2 |

Video generation

Wan

| Name | Task submissions per second | Number of concurrent tasks |
| --- | --- | --- |
| wan2.1-t2v-turbo | 2 | 2 |
| wan2.1-t2v-plus | 2 | 2 |
| wan2.1-i2v-turbo | 2 | 2 |
| wan2.1-i2v-plus | 2 | 2 |
| wan2.1-kf2v-plus | 2 | 2 |
| wan2.1-vace-plus | 2 | 2 |
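The Wan limits are expressed as task submissions per second and concurrent tasks rather than QPM and TPM, so a client has to bound both how fast it submits and how many tasks it has in flight. The sketch below shows one way to do that with `asyncio`. `TaskGate` and `task_fn` are hypothetical names, and `task_fn` stands for whatever coroutine submits a task and waits for its result. The defaults mirror the values of 2 listed above.

```python
import asyncio
import time


class TaskGate:
    """Caps concurrent tasks and spaces out submissions on the client side."""

    def __init__(self, max_concurrent=2, submits_per_second=2):
        self._slots = asyncio.Semaphore(max_concurrent)
        self._min_gap = 1.0 / submits_per_second  # seconds between submissions
        self._submit_lock = asyncio.Lock()
        self._last_submit = None

    async def run(self, task_fn):
        """Wait for a free slot, pace the submission, then await task_fn()."""
        async with self._slots:
            # Only one coroutine at a time may pass the pacing check.
            async with self._submit_lock:
                if self._last_submit is not None:
                    elapsed = time.monotonic() - self._last_submit
                    if elapsed < self._min_gap:
                        await asyncio.sleep(self._min_gap - elapsed)
                self._last_submit = time.monotonic()
            # The slot is held until the task finishes, bounding concurrency.
            return await task_fn()
```

Create the gate inside the running event loop, for example within the coroutine passed to `asyncio.run`, and route every task through `await gate.run(task_fn)`. As with any local guard, it cannot see tasks submitted by other processes using the same account.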

Embedding models

General text embedding

Rate limit (triggered when either limit is exceeded):

| Name | Queries per minute (QPM) | Tokens consumed per minute (TPM) |
| --- | --- | --- |
| text-embedding-v3 | 6,000 | 24,000,000 |