Alibaba Cloud Model Studio: Rate limits

Last Updated: Dec 05, 2025

To ensure fair use, Alibaba Cloud Model Studio applies basic rate limits. These limits are model-specific and linked to your Alibaba Cloud account. The limit is calculated based on the total calls to a model from all RAM users, workspaces, and API keys under your account. If you exceed the limit, API requests will fail. You must wait for the limit to reset before making another call.

Rules

  • Account-level limits: Rate limits apply at the Alibaba Cloud account level. They are calculated based on the total calls from all RAM users, workspaces, and API keys under the account.

  • Model-specific limits: Each model has an independent rate limit. See the tables below for details.

FAQ

Why is rate limiting triggered?

Check the error message:

  • Requests rate limit exceeded or You exceeded your current requests list: This error indicates that the call frequency limit was triggered.

  • Allocated quota exceeded or You exceeded your current quota: This error indicates that the token consumption limit was triggered.

  • Request rate increased too quickly: This error indicates that a sudden surge in call frequency triggered the system's stability protection, even if the Requests Per Minute (RPM) or Tokens Per Minute (TPM) limits were not reached.

  • For other errors, see Error messages to identify the cause.
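The mapping above can be sketched as a small helper that inspects an error message and reports which limit it points to. This is only an illustration: the names `LimitKind` and `classify_rate_limit_message` are hypothetical, and the matching relies solely on the message fragments quoted in the list above, not on any documented error schema.

```python
from enum import Enum


class LimitKind(Enum):
    REQUEST_RATE = "call frequency limit"
    TOKEN_QUOTA = "token consumption limit"
    BURST = "stability protection (sudden surge)"
    OTHER = "not one of the rate limit messages listed above"


# Message fragments copied from the list above (assumed to appear verbatim).
_PATTERNS = [
    ("Request rate increased too quickly", LimitKind.BURST),
    ("Requests rate limit exceeded", LimitKind.REQUEST_RATE),
    ("You exceeded your current requests list", LimitKind.REQUEST_RATE),
    ("Allocated quota exceeded", LimitKind.TOKEN_QUOTA),
    ("You exceeded your current quota", LimitKind.TOKEN_QUOTA),
]


def classify_rate_limit_message(message: str) -> LimitKind:
    """Return which limit an error message points to (case-insensitive substring match)."""
    lowered = message.lower()
    for needle, kind in _PATTERNS:
        if needle.lower() in lowered:
            return kind
    return LimitKind.OTHER
```

A caller could use the result to decide whether to slow down, shorten the request, or smooth out bursts.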

Note: In addition to RPM and TPM, rate limits may also be enforced at the per-second level: Requests Per Second (RPS), calculated as RPM/60, and Tokens Per Second (TPS), calculated as TPM/60. A burst of requests in a short period can therefore trigger rate limiting, even if the total number of calls is below the per-minute limit.
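As a worked example of the per-second conversion, the sketch below divides per-minute limits by 60 and checks whether a one-second burst would exceed the derived request limit. The function names are hypothetical, and the 600 RPM / 1,000,000 TPM figures are used only for illustration. Whether the service enforces exactly these derived values is not stated here.

```python
def per_second_limits(rpm: int, tpm: int) -> tuple[float, float]:
    """Convert per-minute limits to per-second equivalents (RPM/60, TPM/60)."""
    return rpm / 60, tpm / 60


def burst_exceeds_rps(requests_in_one_second: int, rpm: int) -> bool:
    """True if a one-second burst is larger than the derived per-second limit."""
    return requests_in_one_second > rpm / 60


# Illustrative limit: 600 RPM and 1,000,000 TPM.
rps, tps = per_second_limits(600, 1_000_000)
print(f"RPS: {rps:.1f}, TPS: {tps:.0f}")  # prints "RPS: 10.0, TPS: 16667"

# 30 requests in one second is far below 600 per minute,
# but it is above the derived limit of 10 per second.
print(burst_exceeds_rps(30, 600))  # prints "True"
```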

How do I view model call usage?

One hour after you call a model, go to the Model Observation (Singapore or Beijing) page. Set the query conditions, such as the time range and workspace. Then, in the Models area, find the target model and click Monitor in the Actions column to view the model's call statistics. For more information, see the Model Observation document.

Data is updated hourly. During peak periods, updates may be delayed by an hour or more.

How long does it take to recover after a rate limit is triggered?

The limit typically resets within one minute. If other errors occur, see Error messages for solutions.
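Because the limit usually resets within about a minute, one reasonable client-side approach is to retry with exponential backoff and give up once roughly a minute of waiting has accumulated. The following is a minimal sketch under stated assumptions: `RateLimited` is a hypothetical stand-in for whatever exception your client raises on a rate limit error, and the 1-second base delay and 60-second cap are illustrative choices, not documented values.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical stand-in for the error your client raises on HTTP 429."""


def retry_with_backoff(
    call: Callable[[], T],
    base_delay: float = 1.0,
    max_total_wait: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `call` on RateLimited, doubling the delay after each failure.

    Re-raises the last error once the next sleep would push the total
    time slept past `max_total_wait`.
    """
    delay = base_delay
    waited = 0.0
    while True:
        try:
            return call()
        except RateLimited:
            if waited + delay > max_total_wait:
                raise
            sleep(delay)
            waited += delay
            delay *= 2
```

With the defaults, the delays are 1, 2, 4, 8, and 16 seconds (31 seconds in total) before the error is re-raised. In production, adding random jitter to each delay helps prevent many clients from retrying at the same instant.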

How do I avoid rate limiting?

  1. Choose a model with a higher rate limit: Stable or latest versions typically have higher rate limits than older snapshot versions. Check the tables below for the exact values.

  2. Optimize your calling strategy

    • Adjust the call frequency: If you receive a "Requests rate limit exceeded" or "You exceeded your current requests list" error, reduce the call frequency.

    • Reduce token consumption: If you receive an "Allocated quota exceeded" or "You exceeded your current quota" error, shorten the input or output length.

    • Smooth the request rate: If a sudden increase in call frequency triggers system stability protection, you may receive a "Request rate increased too quickly" error. In this case, optimize your client-side calling logic. You can adopt a request smoothing strategy, such as uniform scheduling, exponential backoff, or a request queue buffer. This strategy distributes requests evenly over the time window and avoids instantaneous peaks.

  3. Add a backup model

    If you encounter a rate limit error, switch to a backup model to continue generation. This improves concurrency and reduces the failure rate. The following code shows an example of retrying a request with qwen-plus-2025-07-14 after a rate limit is triggered for qwen-plus-2025-07-28.

    Sample code

    import os
    import asyncio
    from openai import AsyncOpenAI, APIStatusError
    
    # Configuration
    API_KEY = os.getenv("DASHSCOPE_API_KEY")
    # Primary model
    MODEL = "qwen-plus-2025-07-28"
    # Backup model
    BACKUP_MODEL = "qwen-plus-2025-07-14"
    # Test question
    QUESTION = "Who are you?"
    # Concurrency setting
    NUM_REQUESTS = 10
    
    client = AsyncOpenAI(
        api_key=API_KEY,
        base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
    )
    
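    async def send_request(model):
        """Send a single request. Return True on success, False if rate-limited (HTTP 429) or on a non-API error; other API status errors are re-raised."""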
        try:
            await client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": QUESTION}]
            )
            return True
        except APIStatusError as e:
            if e.status_code == 429:
                print(f"[Rate limit triggered] Model {model}")
                return False
            raise
        except Exception as e:
            print(f"[Request failed] Model {model}, Error: {e}")
            return False
    
    async def task(i):
        # Try the primary model first
        if await send_request(MODEL):
            return True
        # The primary request was rate-limited or failed: retry once with the backup model
        return await send_request(BACKUP_MODEL)
    
    async def main():
        results = await asyncio.gather(*(task(i) for i in range(NUM_REQUESTS)))
        print(f"Successful requests: {sum(results)}, Failed requests: {len(results) - sum(results)}")
    
    if __name__ == "__main__":
        asyncio.run(main())
  4. Split tasks: Processing long conversations or large documents can consume many tokens quickly. Split large batch tasks into smaller batches and submit them at different times.

  5. Use batch inference: If you do not need real-time results, use batch inference (Batch API). It is not subject to real-time rate limits, but you must consider queuing and processing time.
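The uniform scheduling idea from the "Smooth the request rate" item above can be sketched as a small asyncio pacer that spaces request starts evenly instead of releasing them all at once. This is a hedged illustration, not an official client feature: `UniformPacer` and `demo` are hypothetical names, and the 600 RPM value is only an example.

```python
import asyncio
import time


class UniformPacer:
    """Spaces out request starts so that at most `rpm` begin per minute, evenly."""

    def __init__(self, rpm: int) -> None:
        self._interval = 60.0 / rpm
        self._next_slot = time.monotonic()

    async def wait_turn(self) -> None:
        # There is no await between reading and updating _next_slot, so
        # coroutines on the same event loop each reserve a distinct slot.
        now = time.monotonic()
        slot = max(now, self._next_slot)
        self._next_slot = slot + self._interval
        await asyncio.sleep(slot - now)


async def demo() -> list[float]:
    pacer = UniformPacer(rpm=600)  # 600 RPM -> one request start every 0.1 s
    start = time.monotonic()
    started_at: list[float] = []

    async def worker() -> None:
        await pacer.wait_turn()
        started_at.append(time.monotonic() - start)
        # The model call would go here.

    await asyncio.gather(*(worker() for _ in range(5)))
    return started_at


if __name__ == "__main__":
    print([round(t, 1) for t in asyncio.run(demo())])
```

Calling `await pacer.wait_turn()` before each model request keeps the start times roughly 60/RPM seconds apart, which avoids the instantaneous peaks that can trigger stability protection.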

Text generation - Qwen

Qwen language models

International (Singapore)

Model

Rate limit (triggered if any value is exceeded)

The following are per-minute limits. The service may also enforce limits based on RPS (RPM/60) and TPS (TPM/60).

RPM

TPM

Includes input and output tokens

qwen3-max

600

1,000,000

qwen3-max-2025-09-23

60

100,000

qwen3-max-preview

600

1,000,000

qwen-max

600

1,000,000

qwen-max-latest

60

100,000

qwen-max-2025-01-25

(qwen-max-0125)

qwen-plus

600

1,000,000

qwen-plus-latest

60

100,000

qwen-plus-2025-12-01

1,000,000

qwen-plus-2025-09-11

120

qwen-plus-2025-07-28

60

100,000

qwen-plus-2025-07-14

(qwen-plus-0714)

qwen-plus-2025-04-28

(qwen-plus-0428)

qwen-plus-2025-01-25

(qwen-plus-0125)

qwen-flash

600

5,000,000

qwen-flash-2025-07-28

600

5,000,000

qwq-plus

60

100,000

qwen-turbo

600

5,000,000

qwen-turbo-latest

60

qwen-turbo-2025-04-28

(qwen-turbo-0428)

qwen-turbo-2024-11-01

(qwen-turbo-1101)

Mainland China (Beijing)

Model

Rate limit (triggered if any value is exceeded)

The following are per-minute limits. The service may also enforce limits based on RPS (RPM/60) and TPS (TPM/60).

RPM

TPM

Includes input and output tokens

qwen3-max

600

1,000,000

qwen3-max-2025-09-23

60

100,000

qwen3-max-preview

600

1,000,000

qwen-max

1,200

1,000,000

qwen-max-latest

qwen-max-2025-01-25

(qwen-max-0125)

60

100,000

qwen-max-2024-09-19

(qwen-max-0919)

qwen-plus

15,000

5,000,000

qwen-plus-latest

1,200,000

qwen-plus-2025-12-01

60

1,000,000

qwen-plus-2025-09-11

qwen-plus-2025-07-28

(qwen-plus-0728)

qwen-plus-2025-07-14

(qwen-plus-0714)

100,000

qwen-plus-2025-04-28

(qwen-plus-0428)

1,000,000

qwen-plus-2025-01-25

(qwen-plus-0125)

150,000

qwen-plus-2025-01-12

(qwen-plus-0112)

qwen-plus-2024-12-20

(qwen-plus-1220)

qwen-plus-2024-11-27

(qwen-plus-1127)

qwen-plus-2024-11-25

(qwen-plus-1125)

qwen-plus-2024-09-19

(qwen-plus-0919)

qwen-plus-2024-08-06

(qwen-plus-0806)

qwen-flash

15,000

10,000,000

qwen-flash-2025-07-28

60

1,000,000

qwq-plus

600

1,000,000

qwq-plus-latest

qwq-plus-2025-03-05

60

100,000

qwen-turbo

1,200

5,000,000

qwen-turbo-latest

qwen-turbo-2025-04-28

(qwen-turbo-0428)

60

1,000,000

qwen-turbo-2025-02-11

(qwen-turbo-0211)

5,000,000

qwen-turbo-2024-11-01

(qwen-turbo-1101)

qwen-turbo-2024-09-19

(qwen-turbo-0919)

150,000

qwen-long-latest

1,200

60,000

qwen-long-2025-01-25

(qwen-long-0125)

3

7,500

Qwen-Omni (omni-modal)

International (Singapore)

Model

Rate limit (triggered if any value is exceeded)

The following are per-minute limits. The service may also enforce limits based on RPS (RPM/60) and TPS (TPM/60).

RPM

TPM

Includes input and output tokens

qwen3-omni-flash

60

100,000

qwen3-omni-flash-2025-12-01

qwen3-omni-flash-2025-09-15

qwen-omni-turbo

qwen-omni-turbo-latest

qwen-omni-turbo-2025-03-26

Mainland China (Beijing)

Model

Rate limit (triggered if any value is exceeded)

The following are per-minute limits. The service may also enforce limits based on RPS (RPM/60) and TPS (TPM/60).

RPM

TPM

Includes input and output tokens

qwen3-omni-flash

60

100,000

qwen3-omni-flash-2025-12-01

qwen3-omni-flash-2025-09-15

qwen-omni-turbo

qwen-omni-turbo-latest

qwen-omni-turbo-2025-03-26

(qwen-omni-turbo-0326)

qwen-omni-turbo-2025-01-19

(qwen-omni-turbo-0119)

Qwen-Omni-Realtime (real-time multimodal)

International (Singapore)

Model

Rate limit (triggered if any value is exceeded)

The following are per-minute limits. The service may also enforce limits based on RPS (RPM/60) and TPS (TPM/60).

RPM

TPM

Includes input and output tokens

qwen3-omni-flash-realtime

60

100,000

qwen3-omni-flash-realtime-2025-12-01

qwen3-omni-flash-realtime-2025-09-15

qwen-omni-turbo-realtime

qwen-omni-turbo-realtime-latest

qwen-omni-turbo-realtime-2025-05-08

Mainland China (Beijing)

Model

Rate limit (triggered if any value is exceeded)

The following are per-minute limits. The service may also enforce limits based on RPS (RPM/60) and TPS (TPM/60).

RPM

TPM

Includes input and output tokens

qwen3-omni-flash-realtime

60

100,000

qwen3-omni-flash-realtime-2025-12-01

qwen3-omni-flash-realtime-2025-09-15

qwen-omni-turbo-realtime

qwen-omni-turbo-realtime-latest

qwen-omni-turbo-realtime-2025-05-08

Qwen-VL (visual understanding/image-to-text)

International (Singapore)

Model

Rate limit (triggered if any value is exceeded)

The following are per-minute limits. The service may also enforce limits based on RPS (RPM/60) and TPS (TPM/60).

RPM

TPM

Includes input and output tokens

qvq-max

60

100,000

qvq-max-latest

qvq-max-2025-03-25

(qvq-max-0325)

qwen-vl-max

1,200

1,000,000

qwen-vl-max-latest

qwen-vl-max-2025-08-13

(qwen-vl-max-0813)

60

100,000

qwen-vl-max-2025-04-08

(qwen-vl-max-0408)

1,200

1,000,000

qwen3-vl-plus

qwen-vl-plus

qwen-vl-plus-latest

qwen3-vl-plus-2025-09-23

60

100,000

qwen-vl-plus-2025-08-15

(qwen-vl-plus-0815)

120

1,000,000

qwen-vl-plus-2025-05-07

(qwen-vl-plus-0507)

qwen-vl-plus-2025-01-25

(qwen-vl-plus-0125)

1,200

qwen3-vl-flash

1,200

1,000,000

qwen3-vl-flash-2025-10-15

120

1,000,000

Mainland China (Beijing)

Model

Rate limit (triggered if any value is exceeded)

The following are per-minute limits. The service may also enforce limits based on RPS (RPM/60) and TPS (TPM/60).

RPM

TPM

Includes input and output tokens

qvq-max

60

100,000

qvq-max-latest

qvq-max-2025-05-15

(qvq-max-0515)

qvq-max-2025-03-25

(qvq-max-0325)

qvq-plus

qvq-plus-latest

qvq-plus-2025-05-15

(qvq-plus-0515)

qwen-vl-max

1,200

1,000,000

qwen-vl-max-latest

qwen-vl-max-2025-08-13

(qwen-vl-max-0813)

60

100,000

qwen-vl-max-2025-04-08

(qwen-vl-max-0408)

qwen-vl-max-2025-04-02

(qwen-vl-max-0402)

qwen-vl-max-2025-01-25

(qwen-vl-max-0125)

qwen-vl-max-2024-12-30

(qwen-vl-max-1230)

qwen-vl-max-2024-11-19

(qwen-vl-max-1119)

qwen-vl-max-2024-10-30

(qwen-vl-max-1030)

qwen-vl-max-2024-08-09

(qwen-vl-max-0809)

15

25,000

qwen3-vl-plus

1,200

1,000,000

qwen-vl-plus

qwen-vl-plus-latest

qwen3-vl-plus-2025-09-23

60

100,000

qwen-vl-plus-2025-08-15

(qwen-vl-plus-0815)

qwen-vl-plus-2025-07-10

(qwen-vl-plus-0710)

qwen-vl-plus-2025-05-07

(qwen-vl-plus-0507)

qwen-vl-plus-2025-01-25

(qwen-vl-plus-0125)

qwen-vl-plus-2025-01-02

(qwen-vl-plus-0102)

qwen-vl-plus-2024-08-09

(qwen-vl-plus-0809)

qwen3-vl-flash

1,200

1,000,000

qwen3-vl-flash-2025-10-15

60

100,000

Qwen-OCR (text extraction)

International (Singapore)

Model

Rate limit (triggered if any value is exceeded)

The following are per-minute limits. The service may also enforce limits based on RPS (RPM/60) and TPS (TPM/60).

RPM

TPM

Includes input and output tokens

qwen-vl-ocr

600

6,000,000

qwen-vl-ocr-2025-11-20

1,200

Mainland China (Beijing)

Model

Rate limit (triggered if any value is exceeded)

The following are per-minute limits. The service may also enforce limits based on RPS (RPM/60) and TPS (TPM/60).

RPM

TPM

Includes input and output tokens

qwen-vl-ocr

600

6,000,000

qwen-vl-ocr-latest

1,200

qwen-vl-ocr-2025-11-20

qwen-vl-ocr-2025-04-13

6,00

qwen-vl-ocr-2024-10-28

Qwen-Math

Note

Supported only in the China (Beijing) region.

Model

Rate limit (triggered if any value is exceeded)

The following are per-minute limits. The service may also enforce limits based on RPS (RPM/60) and TPS (TPM/60).

RPM

TPM

Includes input and output tokens

qwen-math-plus

1,200

1,000,000

qwen-math-plus-latest

qwen-math-plus-2024-09-19

(qwen-math-plus-0919)

60

100,000

qwen-math-plus-2024-08-16

(qwen-math-plus-0816)

10

20,000

qwen-math-turbo

1,200

1,000,000

qwen-math-turbo-latest

qwen-math-turbo-2024-09-19

(qwen-math-turbo-0919)

60

100,000

Qwen-Coder

International (Singapore)

Model

Rate limit (triggered if any value is exceeded)

The following are per-minute limits. The service may also enforce limits based on RPS (RPM/60) and TPS (TPM/60).

RPM

TPM

Includes input and output tokens

qwen3-coder-plus

2,400

2,000,000

qwen3-coder-plus-2025-09-23

60

1,000,000

qwen3-coder-plus-2025-07-22

60

1,000,000

qwen3-coder-flash

600

5,000,000

qwen3-coder-flash-2025-07-28

600

5,000,000

Mainland China (Beijing)

Model

Rate limit (triggered if any value is exceeded)

The following are per-minute limits. The service may also enforce limits based on RPS (RPM/60) and TPS (TPM/60).

RPM

TPM

Includes input and output tokens

qwen3-coder-plus

2,400

2,000,000

qwen3-coder-plus-2025-09-23

60

1,000,000

qwen3-coder-plus-2025-07-22

qwen3-coder-flash

1,200

qwen3-coder-flash-2025-07-28

60

qwen-coder-plus

1,200

qwen-coder-plus-latest

qwen-coder-plus-2024-11-06

(qwen-coder-plus-1106)

60

100,000

qwen-coder-turbo

1,200

1,000,000

qwen-coder-turbo-latest

qwen-coder-turbo-2024-09-19

(qwen-coder-turbo-0919)

60

100,000

Qwen-MT

International (Singapore)

Model

Rate limit (triggered if any value is exceeded)

The following are per-minute limits. The service may also enforce limits based on RPS (RPM/60) and TPS (TPM/60).

RPM

TPM

Includes input and output tokens

qwen-mt-plus

60

100,000

qwen-mt-flash

qwen-mt-lite

qwen-mt-turbo

Mainland China (Beijing)

Model

Rate limit (triggered if any value is exceeded)

The following are per-minute limits. The service may also enforce limits based on RPS (RPM/60) and TPS (TPM/60).

RPM

TPM

Includes input and output tokens

qwen-mt-plus

60

25,000

qwen-mt-flash

35,000

qwen-mt-lite

100,000

qwen-mt-turbo

35,000

Qwen data mining model

Note

Supported only in the China (Beijing) region.

Model

Rate limit (triggered if any value is exceeded)

The following are per-minute limits. The service may also enforce limits based on RPS (RPM/60) and TPS (TPM/60).

RPM

TPM

Includes input and output tokens

qwen-doc-turbo

600

3,000,000

Qwen deep research models

Note

Supported only in the China (Beijing) region.

Model

Rate limit (triggered if any value is exceeded)

The following are per-minute limits. The service may also enforce limits based on RPS (RPM/60) and TPS (TPM/60).

RPM

TPM

Includes input and output tokens

qwen-deep-research

120

1,200,000

Text generation - Open-source Qwen

Open-source Qwen language models

International (Singapore)

Model

Rate limit (triggered if any value is exceeded)

The following are per-minute limits. The service may also enforce limits based on RPS (RPM/60) and TPS (TPM/60).

RPM

TPM

Includes input and output tokens

qwen3-next-80b-a3b-thinking

600

1,000,000

qwen3-next-80b-a3b-instruct

qwen3-235b-a22b-thinking-2507

qwen3-235b-a22b-instruct-2507

qwen3-30b-a3b-thinking-2507

qwen3-30b-a3b-instruct-2507

qwen3-235b-a22b

qwen3-32b

qwen3-30b-a3b

qwen3-14b

qwen3-8b

qwen3-4b

qwen3-1.7b

qwen3-0.6b

qwen2.5-14b-instruct-1m

60

1,000,000

qwen2.5-7b-instruct-1m

qwen2.5-72b-instruct

100,000

qwen2.5-32b-instruct

qwen2.5-14b-instruct

qwen2.5-7b-instruct

Mainland China (Beijing)

Model

Rate limit (triggered if any value is exceeded)

The following are per-minute limits. The service may also enforce limits based on RPS (RPM/60) and TPS (TPM/60).

RPM

TPM

Includes input and output tokens

qwen3-next-80b-a3b-thinking

600

1,000,000

qwen3-next-80b-a3b-instruct

qwen3-235b-a22b-thinking-2507

qwen3-235b-a22b-instruct-2507

qwen3-30b-a3b-thinking-2507

qwen3-30b-a3b-instruct-2507

qwen3-235b-a22b

qwen3-30b-a3b

qwen3-32b

qwen3-14b

qwen3-8b

qwen3-4b

qwen3-1.7b

qwen3-0.6b

qwq-32b

qwq-32b-preview

1,200

qwen2.5-72b-instruct

qwen2.5-32b-instruct

qwen2.5-14b-instruct

qwen2.5-14b-instruct-1m

qwen2.5-7b-instruct

qwen2.5-7b-instruct-1m

qwen2.5-3b-instruct

2,000,000

qwen2.5-1.5b-instruct

qwen2.5-0.5b-instruct

Qwen3-Omni

International (Singapore)

Model

Rate limit (triggered if any value is exceeded)

The following are per-minute limits. The service may also enforce limits based on RPS (RPM/60) and TPS (TPM/60).

RPM

TPM

Includes input and output tokens

qwen2.5-omni-7b

60

100,000

Mainland China (Beijing)

Model

Rate limit (triggered if any value is exceeded)

The following are per-minute limits. The service may also enforce limits based on RPS (RPM/60) and TPS (TPM/60).

RPM

TPM

Includes input and output tokens

qwen2.5-omni-7b

60

100,000

Qwen3-Omni-Captioner

International (Singapore)

Model

Rate limit (triggered if any value is exceeded)

The following are per-minute limits. The service may also enforce limits based on RPS (RPM/60) and TPS (TPM/60).

RPM

TPM

Includes input and output tokens

qwen3-omni-30b-a3b-captioner

60

100,000

Mainland China (Beijing)

Model

Rate limit (triggered if any value is exceeded)

The following are per-minute limits. The service may also enforce limits based on RPS (RPM/60) and TPS (TPM/60).

RPM

TPM

Includes input and output tokens

qwen3-omni-30b-a3b-captioner

60

100,000

Qwen-VL (visual understanding/image-to-text)

International (Singapore)

Model

Rate limit (triggered if any value is exceeded)

The following are per-minute limits. The service may also enforce limits based on RPS (RPM/60) and TPS (TPM/60).

RPM

TPM

Includes input and output tokens

qwen3-vl-32b-thinking

60

100,000

qwen3-vl-32b-instruct

qwen3-vl-30b-a3b-thinking

qwen3-vl-30b-a3b-instruct

qwen3-vl-8b-thinking

qwen3-vl-8b-instruct

qwen3-vl-235b-a22b-thinking

qwen3-vl-235b-a22b-instruct

qwen2.5-vl-72b-instruct

qwen2.5-vl-32b-instruct

qwen2.5-vl-7b-instruct

qwen2.5-vl-3b-instruct

Mainland China (Beijing)

Model

Rate limit (triggered if any value is exceeded)

The following are per-minute limits. The service may also enforce limits based on RPS (RPM/60) and TPS (TPM/60).

RPM

TPM

Includes input and output tokens

qwen3-vl-32b-thinking

600

1,000,000

qwen3-vl-32b-instruct

qwen3-vl-30b-a3b-thinking

qwen3-vl-30b-a3b-instruct

qwen3-vl-8b-thinking

qwen3-vl-8b-instruct

qwen3-vl-235b-a22b-thinking

60

100,000

qwen3-vl-235b-a22b-instruct

qwen2.5-vl-72b-instruct

qwen2.5-vl-32b-instruct

qwen2.5-vl-7b-instruct

1,200

1,000,000

qwen2.5-vl-3b-instruct

qwen2-vl-72b-instruct

60

100,000

qwen2-vl-7b-instruct

1,200

1,000,000

qwen2-vl-2b-instruct

qvq-72b-preview

60

100,000

Qwen-Math

Note

Supported only in the China (Beijing) region.

Model

Rate limit (triggered if any value is exceeded)

The following are per-minute limits. The service may also enforce limits based on RPS (RPM/60) and TPS (TPM/60).

RPM

TPM

Includes input and output tokens

qwen2.5-math-72b-instruct

1,200

1,000,000

qwen2.5-math-7b-instruct

qwen2.5-math-1.5b-instruct

Qwen-Coder

International (Singapore)

Model

Rate limit (triggered if any value is exceeded)

The following are per-minute limits. The service may also enforce limits based on RPS (RPM/60) and TPS (TPM/60).

RPM

TPM

Includes input and output tokens

qwen3-coder-480b-a35b-instruct

600

1,000,000

qwen3-coder-30b-a3b-instruct

600

1,000,000

Mainland China (Beijing)

Model

Rate limit (triggered if any value is exceeded)

The following are per-minute limits. The service may also enforce limits based on RPS (RPM/60) and TPS (TPM/60).

RPM

TPM

Includes input and output tokens

qwen3-coder-480b-a35b-instruct

600

1,000,000

qwen3-coder-30b-a3b-instruct

600

qwen2.5-coder-32b-instruct

1,200

qwen2.5-coder-14b-instruct

qwen2.5-coder-7b-instruct

qwen2.5-coder-3b-instruct

2,000,000

qwen2.5-coder-1.5b-instruct

qwen2.5-coder-0.5b-instruct

Text generation - Third-party models

DeepSeek

Note

Supported only in the China (Beijing) region.

Model

Rate limit (triggered if any value is exceeded)

The following are per-minute limits. The service may also enforce limits based on RPS (RPM/60) and TPS (TPM/60).

RPM

TPM

Includes input and output tokens

deepseek-v3.2

15,000

1,200,000

deepseek-v3.2-exp

15,000

1,200,000

deepseek-v3.1

15,000

1,200,000

deepseek-r1-0528

60

100,000

deepseek-r1

15,000

1,200,000

deepseek-v3

deepseek-r1-distill-qwen-7b

deepseek-r1-distill-qwen-14b

deepseek-r1-distill-qwen-32b

deepseek-r1-distill-qwen-1.5b

60

100,000

deepseek-r1-distill-llama-8b

deepseek-r1-distill-llama-70b

Kimi

Note

Supported only in the China (Beijing) region.

Model

Rate limit (triggered if any value is exceeded)

The following are per-minute limits. The service may also enforce limits based on RPS (RPM/60) and TPS (TPM/60).

RPM

TPM

Includes input and output tokens

kimi-k2-thinking

60

100,000

Moonshot-Kimi-K2-Instruct

60

100,000

Image generation

Qwen (Qwen-Image)

International (Singapore)

Service

Model

Rate limit (triggered if any value is exceeded)

Task submission RPS limit

Concurrent tasks

Text-to-image

qwen-image-plus

2

2

qwen-image

2

2

Image editing

qwen-image-edit-plus

2

No limit for sync APIs

qwen-image-edit-plus-2025-10-30

2

No limit for sync APIs

qwen-image-edit

2

No limit for sync APIs

Mainland China (Beijing)

Service

Model

Rate limit (triggered if any value is exceeded)

Task submission RPS limit

Concurrent tasks

Text-to-image

qwen-image-plus

2

2

qwen-image

2

2

Image editing

qwen-image-edit-plus

2

No limit for sync APIs

qwen-image-edit-plus-2025-10-30

2

No limit for sync APIs

qwen-image-edit

2

No limit for sync APIs

Image translation

qwen-mt-image

1

2

Wan

International (Singapore)

Service

Model

Rate limit (triggered if any value is exceeded)

Task submission RPS limit

Concurrent tasks

Text-to-image

wan2.5-t2i-preview

5

5

wan2.2-t2i-flash

2

2

wan2.2-t2i-plus

wan2.1-t2i-turbo

wan2.1-t2i-plus

Image editing

wan2.5-i2i-preview

5

5

Mainland China (Beijing)

Service

Model

Rate limit (triggered if any value is exceeded)

Task submission RPS limit

Concurrent tasks

Text-to-image

wan2.5-t2i-preview

5

5

wanx2.0-t2i-turbo

2

2

wanx2.1-t2i-turbo

wanx2.1-t2i-plus

wan2.2-t2i-flash

wan2.2-t2i-plus

General image editing

wan2.5-i2i-preview

5

5

wanx2.1-imageedit

2

2

OutfitAnyone

Note

Supported only in the China (Beijing) region.

Model

Rate limit (triggered if any value is exceeded)

Task submission RPS limit

Concurrent tasks

aitryon-plus

10

5

aitryon-parsing-v1

10

No limit for sync APIs

Video generation

Wan series

International (Singapore)

Service

Model

Rate limit (triggered if any value is exceeded)

Task submission RPS limit

Concurrent tasks

Text-to-video

wan2.5-t2v-preview

5

5

wan2.2-t2v-plus

2

2

wan2.1-t2v-turbo

wan2.1-t2v-plus

Image-to-video - first frame

wan2.5-i2v-preview

5

5

wan2.2-i2v-flash

2

2

wan2.1-i2v-plus

wan2.1-i2v-turbo

wan2.2-i2v-plus

Image-to-video - first and last frames

wan2.1-kf2v-plus

General video editing

wan2.1-vace-plus

Animate image

wan2.2-animate-move

5

1

Video character swap

wan2.2-animate-mix

5

1

Mainland China (Beijing)

Service

Model

Rate limit (triggered if any value is exceeded)

Task submission RPS limit

Concurrent tasks

Text-to-video

wan2.5-t2v-preview

5

5

wan2.2-t2v-plus

2

2

wanx2.1-t2v-turbo

wanx2.1-t2v-plus

Image-to-video - first frame

wan2.5-i2v-preview

5

5

wan2.2-i2v-plus

2

2

wanx2.1-i2v-turbo

wanx2.1-i2v-plus

Image-to-video - first and last frames

wanx2.1-kf2v-plus

General video editing

wanx2.1-vace-plus

Digital human

wan2.2-s2v-detect

5

No limit for sync APIs

wan2.2-s2v

1

Animate image

wan2.2-animate-move

5

1

Video character swap

wan2.2-animate-mix

5

1

AnimateAnyone

Note

Supported only in the China (Beijing) region.

Model

Task submission RPS limit

Concurrent tasks

animate-anyone-detect-gen2

5

No limit for sync APIs

animate-anyone-template-gen2

1

At any given time, only one task is running. Other tasks in the queue are in a pending state.

animate-anyone-gen2

EMO

Note

Supported only in the China (Beijing) region.

Model

Task submission RPS limit

Concurrent tasks

emo-detect-v1

5

No limit for sync APIs

emo-v1

1

At any given time, only one task is running. Other tasks in the queue are in a pending state.

LivePortrait

Note

Supported only in the China (Beijing) region.

Model

Task submission RPS limit

Concurrent tasks

liveportrait-detect

5

No limit for sync APIs

liveportrait

1

At any given time, only one task is running. Other tasks in the queue are in a pending state.

VideoRetalk

Note

Supported only in the China (Beijing) region.

Model

Task submission RPS limit

Concurrent tasks

videoretalk

1

1

At any given time, only one task is running. Other tasks in the queue are in a pending state.

Emoji

Note

Supported only in the China (Beijing) region.

Model

Task submission RPS limit

Concurrent tasks

emoji-detect-v1

1

No limit for sync APIs

emoji-v1

1

At any given time, only one task is running. Other tasks in the queue are in a pending state.

Video style transform

Note

Supported only in the China (Beijing) region.

Model

Task submission RPS limit

Concurrent tasks

video-style-transform

2

1

At any given time, only one task is running. Other tasks in the queue are in a pending state.

Speech synthesis (text-to-speech)

Qwen speech synthesis

International (Singapore)

Model

Rate limit (triggered if any value is exceeded)

The following are per-minute throttling conditions. The service may impose limits based on RPS, calculated as RPM/60.

RPM

qwen3-tts-flash

10

qwen3-tts-flash-2025-11-27

180

qwen3-tts-flash-2025-09-18

10

Mainland China (Beijing)

Qwen3-TTS-Flash

Model

RPM

qwen3-tts-flash

10

qwen3-tts-flash-2025-11-27

180

qwen3-tts-flash-2025-09-18

10

Qwen-TTS

Model

Rate limit (triggered if any value is exceeded)

The following are per-minute throttling conditions. The service may impose limits based on RPS, calculated as RPM/60, and TPS, calculated as TPM/60.

RPM

TPM

Includes input and output tokens

qwen-tts

10

100,000

qwen-tts-latest

qwen-tts-2025-05-22

qwen-tts-2025-04-10

Qwen real-time speech synthesis

International (Singapore)

Qwen3-TTS-VC-Realtime

Model

RPM

qwen3-tts-vc-realtime-2025-11-27

180

Qwen3-TTS-Flash-Realtime

Model

RPM

qwen3-tts-flash-realtime

10

qwen3-tts-flash-realtime-2025-11-27

180

qwen3-tts-flash-realtime-2025-09-18

10

Mainland China (Beijing)

Qwen3-TTS-VC-Realtime

Model

RPM

qwen3-tts-vc-realtime-2025-11-27

180

Qwen3-TTS-Flash-Realtime

Model

RPM

qwen3-tts-flash-realtime

10

qwen3-tts-flash-realtime-2025-11-27

180

qwen3-tts-flash-realtime-2025-09-18

10

Qwen-TTS-Realtime

Model

Rate limit (triggered if any value is exceeded)

The following are per-minute throttling conditions. The service may impose limits based on RPS, calculated as RPM/60, and TPS, calculated as TPM/60.

RPM

TPM

Includes input and output tokens

qwen-tts-realtime

10

100,000

qwen-tts-realtime-latest

qwen-tts-realtime-2025-07-15

Qwen voice cloning

International (Singapore)

Model

Task submission RPS limit

qwen-voice-enrollment

3

Mainland China (Beijing)

Model

Task submission RPS limit

qwen-voice-enrollment

3

CosyVoice speech synthesis

Note

Supported only in the China (Beijing) region.

Speech synthesis

Model

Task submission RPS limit

cosyvoice-v3-plus

3

cosyvoice-v3-flash

cosyvoice-v2

Voice cloning

Model

Task submission RPS limit

cosyvoice-v3-plus

10

The total concurrent request limit for the voice cloning feature is 10 RPS. This limit applies whether you call a single model version or multiple model versions at the same time. This means:

  • If you call only v2, its maximum concurrent request rate is 10 RPS.

  • If you call v2 and v3 at the same time, their combined request rate cannot exceed 10 RPS. For example, if v2 uses 7 RPS, v3 can use a maximum of 3 RPS.

cosyvoice-v3-flash

cosyvoice-v2

Speech recognition (speech-to-text) and translation (speech-to-translation)

Qwen3-LiveTranslate-Flash-Realtime

International (Singapore)

Model

Rate limit (triggered if any value is exceeded)

The following are per-minute throttling conditions. The service may impose limits based on RPS, calculated as RPM/60, and TPS, calculated as TPM/60.

RPM

TPM

Includes input and output tokens

qwen3-livetranslate-flash-realtime

10

100,000

qwen3-livetranslate-flash-realtime-2025-09-22

Mainland China (Beijing)

Model

Rate limit (triggered if any value is exceeded)

The following are per-minute throttling conditions. The service may impose limits based on RPS, calculated as RPM/60, and TPS, calculated as TPM/60.

RPM

TPM

Includes input and output tokens

qwen3-livetranslate-flash-realtime

10

100,000

qwen3-livetranslate-flash-realtime-2025-09-22

Qwen audio file recognition

International (Singapore)

Qwen3-ASR-Flash-Filetrans

Model

RPM

qwen3-asr-flash-filetrans

100

qwen3-asr-flash-filetrans-2025-11-17

Qwen3-ASR-Flash

Model

RPM

qwen3-asr-flash

100

qwen3-asr-flash-2025-09-08

Mainland China (Beijing)

Qwen3-ASR-Flash-Filetrans

Model

RPM

qwen3-asr-flash-filetrans

100

qwen3-asr-flash-filetrans-2025-11-17

Qwen3-ASR-Flash

Model

RPM

qwen3-asr-flash

100

qwen3-asr-flash-2025-09-08

Qwen real-time speech recognition

International (Singapore)

Model

RPS

qwen3-asr-flash-realtime

20

qwen3-asr-flash-realtime-2025-10-27

Mainland China (Beijing)

Model

RPS

qwen3-asr-flash-realtime

20

qwen3-asr-flash-realtime-2025-10-27

Paraformer speech recognition

Note

Supported only in the China (Beijing) region.

Model

Task submission RPS limit

paraformer-realtime-v2

20

paraformer-realtime-8k-v2

Model

Task submission RPS limit

Task query RPS limit

paraformer-v2

20

20

paraformer-8k-v2

20

Fun-ASR audio file recognition

International (Singapore)

Model

Task submission RPS limit

Task query RPS limit

fun-asr

10

20

fun-asr-2025-11-07

fun-asr-2025-08-25

Mainland China (Beijing)

Model

Task submission RPS limit

Task query RPS limit

fun-asr

10

20

fun-asr-2025-11-07

fun-asr-2025-08-25

fun-asr-mtl

fun-asr-mtl-2025-08-25

Fun-ASR real-time speech recognition

Note

Supported only in the China (Beijing) region.

Model

Task submission RPS limit

fun-asr-realtime

20

fun-asr-realtime-2025-11-07

fun-asr-realtime-2025-09-15

Text embedding

International (Singapore)

Model

Rate limit (triggered if any value is exceeded)

The following are per-minute limits. The service may also enforce limits based on RPS (RPM/60) and TPS (TPM/60).

RPM

TPM/Tasks

Includes input and output tokens

text-embedding-v4

1,800

1,000,000

text-embedding-v3

6,000

24,000,000

Mainland China (Beijing)

Model

Rate limit (triggered if any value is exceeded)

RPS

TPM/Tasks

Includes input and output tokens

text-embedding-v4

30

1,200,000

Multimodal embedding

Note

Supported only in the China (Beijing) region.

Model

Rate limit

The following are per-minute limits. The service may also enforce limits based on RPS (RPM/60) and TPS (TPM/60).

RPM

TPM

Input tokens only

multimodal-embedding-v1

120

200,000

Text rerank

Note

Supported only in the China (Beijing) region.

Model

Rate limit (triggered if any value is exceeded)

The following are per-minute limits. The service may also enforce limits based on RPS (RPM/60) and TPS (TPM/60).

RPM

TPM

Includes input and output tokens

gte-rerank-v2

5,040

4,980,000,000

Domain specific

Intention recognition

Note

Supported only in the China (Beijing) region.

Model

Rate limit (triggered if any value is exceeded)

The following are per-minute limits. The service may also enforce limits based on RPS (RPM/60) and TPS (TPM/60).

RPM

TPM

Includes input and output tokens

tongyi-intent-detect-v3

1,200

1,000,000

Role playing

International (Singapore)

Model

Rate limit (triggered if any value is exceeded)

The following are per-minute limits. The service may also enforce limits based on RPS (RPM/60) and TPS (TPM/60).

RPM

TPM

Includes input and output tokens

qwen-plus-character-ja

60

100,000

Mainland China (Beijing)

Model

Rate limit (triggered if any value is exceeded)

The following are per-minute limits. The service may also enforce limits based on RPS (RPM/60) and TPS (TPM/60).

RPM

TPM

Includes input and output tokens

qwen-plus-character

120

20,000

Retired models

For more information, see Model deprecation.

Retired on August 20, 2025

Category

Model

Rate limit (triggered if any value is exceeded)

RPM

TPM

Includes input and output tokens

Text generation - Qwen

qwen2-72b-instruct

0

0

qwen2-57b-a14b-instruct

qwen2-7b-instruct

qwen1.5-110b-chat

qwen1.5-72b-chat

qwen1.5-32b-chat

qwen1.5-14b-chat

qwen1.5-7b-chat