Rate limits - Alibaba Cloud Model Studio

To ensure fair use, Alibaba Cloud Model Studio applies basic rate limits. These limits are model-specific and linked to your Alibaba Cloud account. The limit is calculated based on the total calls to a model from all RAM users, workspaces, and API keys under your account. If you exceed the limit, API requests will fail. You must wait for the limit to reset before making another call.

Rules

Account-level limits: Rate limits apply at the Alibaba Cloud account level. They are calculated based on the total calls from all RAM users, workspaces, and API keys under the account.
Model-specific limits: Each model has an independent rate limit. See the tables below for details.

FAQ

Why is rate limiting triggered?

Check the error message:

"Requests rate limit exceeded" or "You exceeded your current requests list": The call frequency limit was triggered.
"Allocated quota exceeded" or "You exceeded your current quota": The token consumption limit was triggered.
"Request rate increased too quickly": A sudden surge in call frequency triggered the system's stability protection, even though the Requests Per Minute (RPM) and Tokens Per Minute (TPM) limits were not reached.
For other errors, see Error messages to identify the cause.

Note: In addition to RPM and TPM, rate limits may also be enforced per second: Requests per Second (RPS), calculated as RPM/60, and Tokens per Second (TPS), calculated as TPM/60. A burst of requests in a short period can therefore trigger rate limiting even if the total number of calls is below the per-minute limit.
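The following minimal sketch applies the RPM/60 and TPM/60 arithmetic from the note above: it derives the per-second ceilings from a model's per-minute limits, plus the smallest spacing between requests that keeps a client at or below the RPS ceiling. The helper name and the example values (600 RPM and 1,000,000 TPM, the qwen-plus row in the International table) are illustrative assumptions. The service does not document exactly how its per-second window is measured.

```python
from dataclasses import dataclass


@dataclass
class PerSecondLimits:
    rps: float             # requests per second, RPM / 60
    tps: float             # tokens per second, TPM / 60
    min_interval_s: float  # smallest gap between requests that keeps the rate <= rps


def per_second_limits(rpm: int, tpm: int) -> PerSecondLimits:
    """Derive per-second ceilings from per-minute limits (hypothetical helper)."""
    if rpm <= 0 or tpm <= 0:
        raise ValueError("rpm and tpm must be positive")
    rps = rpm / 60
    tps = tpm / 60
    return PerSecondLimits(rps=rps, tps=tps, min_interval_s=1 / rps)


if __name__ == "__main__":
    limits = per_second_limits(rpm=600, tpm=1_000_000)
    print(f"RPS ceiling: {limits.rps:.0f}")               # RPS ceiling: 10
    print(f"TPS ceiling: {limits.tps:,.0f}")              # TPS ceiling: 16,667
    print(f"Min spacing: {limits.min_interval_s:.2f} s")  # Min spacing: 0.10 s
```

With these numbers, a client that sends 20 requests in the same second may be throttled even though 20 is far below 600 RPM. Spacing requests about 0.1 s apart helps avoid that kind of burst.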

How to view model call usage?

One hour after you call a model, go to the Monitoring (Singapore or Beijing) page. Set the query conditions, such as the time range and workspace. Then, in the Models area, find the target model and click Monitor in the Actions column to view the model's call statistics. For more information, see the Monitoring document.

Data is updated hourly. During peak periods, updates may be delayed by an hour or more.

How long does it take to recover after a rate limit is triggered?

The limit typically resets within one minute. If other errors occur, see Error messages for solutions.

How to avoid rate limiting?

Choose a model with a higher rate limit: Stable or latest versions usually have higher rate limits than dated snapshot versions. Check the tables below for the model and region you use.
Optimize your calling strategy:
- Adjust the call frequency: If you receive a "Requests rate limit exceeded" or "You exceeded your current requests list" error, reduce the call frequency.
- Reduce token consumption: If you receive an "Allocated quota exceeded" or "You exceeded your current quota" error, shorten the input or output length.
- Smooth the request rate: If a sudden increase in call frequency triggers system stability protection, you may receive a "Request rate increased too quickly" error. Adopt a client-side smoothing strategy, such as uniform scheduling, exponential backoff, or a request queue buffer, to distribute requests evenly over the time window and avoid instantaneous peaks.
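The following sketch shows one possible form of the exponential backoff strategy mentioned above. It retries an async call with exponentially growing, randomly jittered delays ("full jitter"). The helper `with_backoff`, its parameters, and the 60-second delay cap are assumptions chosen for illustration; the cap loosely matches the note that limits typically reset within one minute. `FakeRateLimit` only stands in for a real 429 error so the demo runs offline.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def with_backoff(
    make_call: Callable[[], Awaitable[T]],
    is_retryable: Callable[[BaseException], bool],
    max_retries: int = 5,
    base_delay_s: float = 1.0,
    max_delay_s: float = 60.0,
) -> T:
    """Retry make_call() with exponential backoff and full jitter.

    Before retry n (0-based), sleep a random time in [0, min(max_delay_s, base_delay_s * 2**n)].
    Non-retryable errors, and the last retryable error once retries run out, are re-raised.
    """
    for attempt in range(max_retries + 1):
        try:
            return await make_call()
        except Exception as exc:
            if attempt == max_retries or not is_retryable(exc):
                raise
            cap = min(max_delay_s, base_delay_s * 2 ** attempt)
            await asyncio.sleep(random.uniform(0, cap))
    raise AssertionError("unreachable")


# Possible use with an AsyncOpenAI client such as the one in the sample code:
#   from openai import RateLimitError
#   resp = await with_backoff(
#       lambda: client.chat.completions.create(model=MODEL, messages=msgs),
#       is_retryable=lambda e: isinstance(e, RateLimitError),
#   )


class FakeRateLimit(Exception):
    """Stand-in for a 429 error so the demo runs without network access."""


async def _demo() -> None:
    calls = 0

    async def flaky() -> str:
        nonlocal calls
        calls += 1
        if calls < 3:
            raise FakeRateLimit("429")
        return "ok"

    result = await with_backoff(flaky, lambda e: isinstance(e, FakeRateLimit), base_delay_s=0.01)
    print(result, "after", calls, "calls")  # ok after 3 calls


if __name__ == "__main__":
    asyncio.run(_demo())
```

Jitter spreads retries from many concurrent callers across the time window. Without it, all callers would retry at the same instant and recreate the burst that triggered the limit.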

Add a backup model

If a model is rate-limited, switch to a backup model to continue generation. Because each model has an independent rate limit, this increases the total throughput available to you and reduces the failure rate. The following sample retries a request with qwen-plus-2025-07-14 when qwen-plus-2025-07-28 is rate-limited.

Sample code

```python
import os
import asyncio
from openai import AsyncOpenAI, RateLimitError

# Configuration
API_KEY = os.getenv("DASHSCOPE_API_KEY")
# Primary model
MODEL = "qwen-plus-2025-07-28"
# Backup model (each model has its own independent rate limit)
BACKUP_MODEL = "qwen-plus-2025-07-14"
# Test question
QUESTION = "Who are you?"
# Number of concurrent requests
NUM_REQUESTS = 10

client = AsyncOpenAI(
    api_key=API_KEY,
    # International (Singapore) endpoint
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)


async def send_request(model):
    """Send a single request.

    Return True on success and False if the model is rate-limited (HTTP 429).
    Other errors propagate to the caller.
    """
    try:
        await client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": QUESTION}],
        )
        return True
    except RateLimitError:
        print(f"[Rate limit triggered] Model {model}")
        return False


async def task():
    try:
        # Try the primary model first
        if await send_request(MODEL):
            return True
        # The primary model is rate-limited: retry once with the backup model
        return await send_request(BACKUP_MODEL)
    except Exception as e:
        # Any other error (network, authentication, ...) counts as a failure
        print(f"[Request failed] Error: {e}")
        return False


async def main():
    results = await asyncio.gather(*(task() for _ in range(NUM_REQUESTS)))
    succeeded = sum(results)
    print(f"Successful requests: {succeeded}, Failed requests: {len(results) - succeeded}")


if __name__ == "__main__":
    asyncio.run(main())
```

Split tasks: Processing long conversations or large documents can consume many tokens quickly. Split large batch tasks into smaller batches and submit them at different times.
Use batch inference: If you do not need real-time results, use batch inference (Batch API). It is not subject to real-time rate limits, but you must consider queuing and processing time.
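The following minimal sketch shows one way to split tasks. It groups items into consecutive batches whose estimated token count stays under a per-batch budget, so each batch can be submitted at a different time. Token counts use a rough four-characters-per-token heuristic, which is an assumption; real counts depend on the model's tokenizer. The helper names are hypothetical.

```python
from typing import Iterable, Iterator, List


def estimate_tokens(text: str) -> int:
    """Very rough estimate (assumption): about 4 characters per token, minimum 1."""
    return max(1, len(text) // 4)


def token_budget_batches(items: Iterable[str], budget: int) -> Iterator[List[str]]:
    """Yield consecutive batches whose estimated token total does not exceed budget.

    An item that alone exceeds the budget is yielded as its own batch;
    split such items further (for example, by paragraph) before sending.
    """
    batch: List[str] = []
    used = 0
    for item in items:
        cost = estimate_tokens(item)
        if batch and used + cost > budget:
            yield batch
            batch, used = [], 0
        batch.append(item)
        used += cost
    if batch:
        yield batch


if __name__ == "__main__":
    docs = ["a" * 400, "b" * 400, "c" * 400]  # about 100 tokens each under the heuristic
    for i, batch in enumerate(token_budget_batches(docs, budget=250)):
        print(f"batch {i}: {len(batch)} item(s)")
        # Submit the batch here, then wait (for example, time.sleep(60)) before the next one.
    # batch 0: 2 item(s)
    # batch 1: 1 item(s)
```

Because TPM counts both input and output tokens, set the budget well below the model's TPM to leave room for the generated output.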

Text generation - Qwen

Qwen language models

Global

In the Global deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are dynamically scheduled worldwide.

Rate limits (triggered if any value is exceeded). The following are per-minute limits. The service may also enforce limits based on RPS (RPM/60) and TPS (TPM/60).
Model	RPM	TPM (input and output tokens)
qwen3-max	600	1,000,000
qwen3-max-2025-09-23	60	100,000
qwen3-max-preview	600	1,000,000
qwen-plus	15,000	5,000,000
qwen-plus-2025-12-01	60	1,000,000
qwen-plus-2025-09-11
qwen-plus-2025-07-28
qwen-flash	15,000	10,000,000
qwen-flash-2025-07-28	60	1,000,000

International

In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.

Rate limits (triggered if any value is exceeded). The following are per-minute limits. The service may also enforce limits based on RPS (RPM/60) and TPS (TPM/60).
Model	RPM	TPM (input and output tokens)
qwen3-max	600	1,000,000
qwen3-max-2026-01-23	600	1,000,000
qwen3-max-2025-09-23	60	100,000
qwen3-max-preview	600	1,000,000
qwen-max	600	1,000,000
qwen-max-latest	60	100,000
qwen-max-2025-01-25 (qwen-max-0125)	60	100,000
qwen-plus	600	1,000,000
qwen-plus-latest	60	100,000
qwen-plus-2025-12-01	60	1,000,000
qwen-plus-2025-09-11	120	1,000,000
qwen-plus-2025-07-28	60	100,000
qwen-plus-2025-07-14 (qwen-plus-0714)
qwen-plus-2025-04-28 (qwen-plus-0428)
qwen-plus-2025-01-25 (qwen-plus-0125)
qwen-flash	600	5,000,000
qwen-flash-2025-07-28	600	5,000,000
qwq-plus	60	100,000
qwen-turbo	600	5,000,000
qwen-turbo-latest	60
qwen-turbo-2025-04-28 (qwen-turbo-0428)
qwen-turbo-2024-11-01 (qwen-turbo-1101)
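A limit is triggered when any value is exceeded, so a workload must fit under both RPM and TPM. The following sketch checks a planned steady-state load against one table row. The 600 RPM and 1,000,000 TPM figures are the qwen-plus row above. The average tokens per request is a hypothetical input you would measure from your own traffic, and the helper name is illustrative.

```python
from typing import List


def check_workload(requests_per_min: float, avg_tokens_per_request: float,
                   rpm_limit: int, tpm_limit: int) -> List[str]:
    """Return the per-minute limits a steady workload would exceed (empty if it fits).

    avg_tokens_per_request must include both input and output tokens,
    because TPM counts both.
    """
    exceeded = []
    if requests_per_min > rpm_limit:
        exceeded.append("RPM")
    if requests_per_min * avg_tokens_per_request > tpm_limit:
        exceeded.append("TPM")
    return exceeded


if __name__ == "__main__":
    # 100 requests/min is well under 600 RPM, but
    # 100 * 12,000 = 1,200,000 tokens/min exceeds 1,000,000 TPM.
    print(check_workload(100, 12_000, rpm_limit=600, tpm_limit=1_000_000))  # ['TPM']
```

This check covers per-minute averages only. Bursts within a minute can still hit the RPS and TPS limits.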

US

In the US deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are limited to the United States.

Rate limits (triggered if any value is exceeded). The following are per-minute limits. The service may also enforce limits based on RPS (RPM/60) and TPS (TPM/60).
Model	RPM	TPM (input and output tokens)
qwen-plus-us	600	1,000,000
qwen-plus-2025-12-01-us	60
qwen-flash-us	600	5,000,000
qwen-flash-2025-07-28-us

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.

Rate limits (triggered if any value is exceeded). The following are per-minute limits. The service may also enforce limits based on RPS (RPM/60) and TPS (TPM/60).
Model	RPM	TPM (input and output tokens)
qwen3-max	30,000	5,000,000
qwen3-max-2026-01-23	600	1,000,000
qwen3-max-2025-09-23	60	100,000
qwen3-max-preview	600	1,000,000
qwen-max	1,200	1,000,000
qwen-max-latest	1,200	1,000,000
qwen-max-2025-01-25 (qwen-max-0125)	60	100,000
qwen-max-2024-09-19 (qwen-max-0919)
qwen-plus	30,000	5,000,000
qwen-plus-latest	15,000	1,200,000
qwen-plus-2025-12-01	60	1,000,000
qwen-plus-2025-09-11
qwen-plus-2025-07-28 (qwen-plus-0728)
qwen-plus-2025-07-14 (qwen-plus-0714)		100,000
qwen-plus-2025-04-28 (qwen-plus-0428)		1,000,000
qwen-plus-2025-01-25 (qwen-plus-0125)		150,000
qwen-plus-2025-01-12 (qwen-plus-0112)
qwen-plus-2024-12-20 (qwen-plus-1220)
qwen-flash	30,000	10,000,000
qwen-flash-2025-07-28	60	1,000,000
qwq-plus	600	1,000,000
qwq-plus-latest	600	1,000,000
qwq-plus-2025-03-05	60	100,000
qwen-turbo	1,200	5,000,000
qwen-turbo-latest	1,200	5,000,000
qwen-turbo-2025-04-28 (qwen-turbo-0428)	60	1,000,000
qwen-turbo-2025-02-11 (qwen-turbo-0211)		5,000,000
qwen-turbo-2024-11-01 (qwen-turbo-1101)		5,000,000
qwen-long-latest	1,200	60,000
qwen-long-2025-01-25 (qwen-long-0125)	3	7,500

Qwen-VL (visual understanding/image-to-text)

Global

In the Global deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are dynamically scheduled worldwide.

Rate limits (triggered if any value is exceeded). The following are per-minute limits. The service may also enforce limits based on RPS (RPM/60) and TPS (TPM/60).
Model	RPM	TPM (input and output tokens)
qwen3-vl-plus	1,200	1,000,000
qwen3-vl-plus-2025-09-23	60	100,000
qwen3-vl-flash	1,200	1,000,000
qwen3-vl-flash-2025-10-15	60	100,000

International

Rate limits (triggered if any value is exceeded). The following are per-minute limits. The service may also enforce limits based on RPS (RPM/60) and TPS (TPM/60).
Model	RPM	TPM (input and output tokens)
qwen3-vl-plus	1,200	1,000,000
qwen3-vl-plus-2025-12-19	60	100,000
qwen3-vl-plus-2025-09-23	60	100,000
qwen3-vl-flash	1,200	1,000,000
qwen3-vl-flash-2026-01-22	60	100,000
qwen3-vl-flash-2025-10-15	120	1,000,000
qwen-vl-max	1,200	1,000,000
qwen-vl-max-latest	1,200	1,000,000
qwen-vl-max-2025-08-13 (qwen-vl-max-0813)	60	100,000
qwen-vl-max-2025-04-08 (qwen-vl-max-0408)	1,200	1,000,000
qwen-vl-plus
qwen-vl-plus-latest
qwen-vl-plus-2025-08-15 (qwen-vl-plus-0815)	120	1,000,000
qwen-vl-plus-2025-05-07 (qwen-vl-plus-0507)	120
qwen-vl-plus-2025-01-25 (qwen-vl-plus-0125)	1,200
qvq-max	60	100,000
qvq-max-latest
qvq-max-2025-03-25 (qvq-max-0325)

US

In the US deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are limited to the United States.

Rate limits (triggered if any value is exceeded). The following are per-minute limits. The service may also enforce limits based on RPS (RPM/60) and TPS (TPM/60).
Model	RPM	TPM (input and output tokens)
qwen3-vl-flash-us	1,200	1,000,000
qwen3-vl-flash-2025-10-15-us	120	1,000,000

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.

Rate limits (triggered if any value is exceeded). The following are per-minute limits. The service may also enforce limits based on RPS (RPM/60) and TPS (TPM/60).
Model	RPM	TPM (input and output tokens)
qwen3-vl-plus	3,000	5,000,000
qwen3-vl-plus-2025-12-19	60	100,000
qwen3-vl-plus-2025-09-23	60	100,000
qwen3-vl-flash	3,000	5,000,000
qwen3-vl-flash-2026-01-22	60	100,000
qwen3-vl-flash-2025-10-15	60	100,000
qwen-vl-max	1,200	1,000,000
qwen-vl-max-latest	1,200	1,000,000
qwen-vl-max-2025-08-13 (qwen-vl-max-0813)	60	100,000
qwen-vl-max-2025-04-08 (qwen-vl-max-0408)
qwen-vl-max-2025-04-02 (qwen-vl-max-0402)
qwen-vl-max-2025-01-25 (qwen-vl-max-0125)
qwen-vl-max-2024-12-30 (qwen-vl-max-1230)
qwen-vl-max-2024-11-19 (qwen-vl-max-1119)
qwen-vl-plus	1,200	1,000,000
qwen-vl-plus-latest	1,200	1,000,000
qwen-vl-plus-2025-08-15 (qwen-vl-plus-0815)	60	100,000
qwen-vl-plus-2025-07-10 (qwen-vl-plus-0710)
qwen-vl-plus-2025-05-07 (qwen-vl-plus-0507)
qwen-vl-plus-2025-01-25 (qwen-vl-plus-0125)
qwen-vl-plus-2025-01-02 (qwen-vl-plus-0102)
qvq-max
qvq-max-latest
qvq-max-2025-05-15 (qvq-max-0515)
qvq-max-2025-03-25 (qvq-max-0325)
qvq-plus
qvq-plus-latest
qvq-plus-2025-05-15 (qvq-plus-0515)

Qwen-Omni (omni-modal)

International

Rate limits (triggered if any value is exceeded). The following are per-minute limits. The service may also enforce limits based on RPS (RPM/60) and TPS (TPM/60).
Model	RPM	TPM (input and output tokens)
qwen3-omni-flash	60	100,000
qwen3-omni-flash-2025-12-01
qwen3-omni-flash-2025-09-15
qwen-omni-turbo
qwen-omni-turbo-latest
qwen-omni-turbo-2025-03-26

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.

Rate limits (triggered if any value is exceeded). The following are per-minute limits. The service may also enforce limits based on RPS (RPM/60) and TPS (TPM/60).
Model	RPM	TPM (input and output tokens)
qwen3-omni-flash	60	100,000
qwen3-omni-flash-2025-12-01
qwen3-omni-flash-2025-09-15
qwen-omni-turbo
qwen-omni-turbo-latest
qwen-omni-turbo-2025-03-26 (qwen-omni-turbo-0326)
qwen-omni-turbo-2025-01-19 (qwen-omni-turbo-0119)

Qwen-Omni-Realtime (real-time multimodal)

International

Rate limits (triggered if any value is exceeded). The following are per-minute limits. The service may also enforce limits based on RPS (RPM/60) and TPS (TPM/60).
Model	RPM	TPM (input and output tokens)
qwen3-omni-flash-realtime	60	100,000
qwen3-omni-flash-realtime-2025-12-01
qwen3-omni-flash-realtime-2025-09-15
qwen-omni-turbo-realtime
qwen-omni-turbo-realtime-latest
qwen-omni-turbo-realtime-2025-05-08

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.

Rate limits (triggered if any value is exceeded). The following are per-minute limits. The service may also enforce limits based on RPS (RPM/60) and TPS (TPM/60).
Model	RPM	TPM (input and output tokens)
qwen3-omni-flash-realtime	60	100,000
qwen3-omni-flash-realtime-2025-12-01
qwen3-omni-flash-realtime-2025-09-15
qwen-omni-turbo-realtime
qwen-omni-turbo-realtime-latest
qwen-omni-turbo-realtime-2025-05-08

Qwen-OCR (text extraction)

Global

In the Global deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are dynamically scheduled worldwide.

Rate limits (triggered if any value is exceeded). The following are per-minute limits. The service may also enforce limits based on RPS (RPM/60) and TPS (TPM/60).
Model	RPM	TPM (input and output tokens)
qwen-vl-ocr	600	6,000,000
qwen-vl-ocr-2025-11-20	1,200

International

Rate limits (triggered if any value is exceeded). The following are per-minute limits. The service may also enforce limits based on RPS (RPM/60) and TPS (TPM/60).
Model	RPM	TPM (input and output tokens)
qwen-vl-ocr	600	6,000,000
qwen-vl-ocr-2025-11-20	1,200

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.

Rate limits (triggered if any value is exceeded). The following are per-minute limits. The service may also enforce limits based on RPS (RPM/60) and TPS (TPM/60).
Model	RPM	TPM (input and output tokens)
qwen-vl-ocr	600	6,000,000
qwen-vl-ocr-latest	1,200
qwen-vl-ocr-2025-11-20
qwen-vl-ocr-2025-04-13	600
qwen-vl-ocr-2024-10-28

Qwen-Math

Note

Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Rate limits (triggered if any value is exceeded). The following are per-minute limits. The service may also enforce limits based on RPS (RPM/60) and TPS (TPM/60).
Model	RPM	TPM (input and output tokens)
qwen-math-plus	1,200	1,000,000
qwen-math-plus-latest	1,200	1,000,000
qwen-math-plus-2024-09-19 (qwen-math-plus-0919)	60	100,000
qwen-math-plus-2024-08-16 (qwen-math-plus-0816)	10	20,000
qwen-math-turbo	1,200	1,000,000
qwen-math-turbo-latest	1,200	1,000,000
qwen-math-turbo-2024-09-19 (qwen-math-turbo-0919)	60	100,000

Qwen-Coder

Global

In the Global deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are dynamically scheduled worldwide.

Rate limits (triggered if any value is exceeded). The following are per-minute limits. The service may also enforce limits based on RPS (RPM/60) and TPS (TPM/60).
Model	RPM	TPM (input and output tokens)
qwen3-coder-plus	2,400	2,000,000
qwen3-coder-plus-2025-09-23	60	1,000,000
qwen3-coder-plus-2025-07-22	60
qwen3-coder-flash	1,200
qwen3-coder-flash-2025-07-28	60

International

Rate limits (triggered if any value is exceeded). The following are per-minute limits. The service may also enforce limits based on RPS (RPM/60) and TPS (TPM/60).
Model	RPM	TPM (input and output tokens)
qwen3-coder-plus	2,400	2,000,000
qwen3-coder-plus-2025-09-23	60	1,000,000
qwen3-coder-plus-2025-07-22	60	1,000,000
qwen3-coder-flash	600	5,000,000
qwen3-coder-flash-2025-07-28	600	5,000,000

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.

Rate limits (triggered if any value is exceeded). The following are per-minute limits. The service may also enforce limits based on RPS (RPM/60) and TPS (TPM/60).
Model	RPM	TPM (input and output tokens)
qwen3-coder-plus	5,000	5,000,000
qwen3-coder-plus-2025-09-23	60	1,000,000
qwen3-coder-plus-2025-07-22	60	1,000,000
qwen3-coder-flash	5,000	5,000,000
qwen3-coder-flash-2025-07-28	60	1,000,000
qwen-coder-plus	1,200
qwen-coder-plus-latest	1,200
qwen-coder-plus-2024-11-06 (qwen-coder-plus-1106)	60	100,000
qwen-coder-turbo	1,200	1,000,000
qwen-coder-turbo-latest	1,200	1,000,000
qwen-coder-turbo-2024-09-19 (qwen-coder-turbo-0919)	60	100,000

Qwen translation

Global

In the Global deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are dynamically scheduled worldwide.

Rate limits (triggered if any value is exceeded). The following are per-minute limits. The service may also enforce limits based on RPS (RPM/60) and TPS (TPM/60).
Model	RPM	TPM (input and output tokens)
qwen-mt-plus	60	25,000
qwen-mt-flash		35,000
qwen-mt-lite		100,000

International

Rate limits (triggered if any value is exceeded). The following are per-minute limits. The service may also enforce limits based on RPS (RPM/60) and TPS (TPM/60).
Model	RPM	TPM (input and output tokens)
qwen-mt-plus	60	100,000
qwen-mt-flash
qwen-mt-lite
qwen-mt-turbo

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.

Rate limits (triggered if any value is exceeded). The following are per-minute limits. The service may also enforce limits based on RPS (RPM/60) and TPS (TPM/60).
Model	RPM	TPM (input and output tokens)
qwen-mt-plus	60	25,000
qwen-mt-flash		35,000
qwen-mt-lite		100,000
qwen-mt-turbo		35,000

Qwen data mining

Rate limits (triggered if any value is exceeded). The following are per-minute limits. The service may also enforce limits based on RPS (RPM/60) and TPS (TPM/60).
Model	RPM	TPM (input and output tokens)
qwen-doc-turbo	600	3,000,000

Qwen deep research

Rate limits (triggered if any value is exceeded). The following are per-minute limits. The service may also enforce limits based on RPS (RPM/60) and TPS (TPM/60).
Model	RPM	TPM (input and output tokens)
qwen-deep-research	120	1,200,000

Text generation - Qwen - Open source

Open-source Qwen language models

Global

In the Global deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are dynamically scheduled worldwide.

Rate limits (triggered if any value is exceeded). The following are per-minute limits. The service may also enforce limits based on RPS (RPM/60) and TPS (TPM/60).
Model	RPM	TPM (input and output tokens)
qwen3-next-80b-a3b-thinking	600	1,000,000
qwen3-next-80b-a3b-instruct
qwen3-235b-a22b-thinking-2507
qwen3-235b-a22b-instruct-2507
qwen3-30b-a3b-thinking-2507
qwen3-30b-a3b-instruct-2507
qwen3-235b-a22b
qwen3-30b-a3b
qwen3-32b
qwen3-14b
qwen3-8b

International

Rate limits (triggered if any value is exceeded). The following are per-minute limits. The service may also enforce limits based on RPS (RPM/60) and TPS (TPM/60).
Model	RPM	TPM (input and output tokens)
qwen3-next-80b-a3b-thinking	600	1,000,000
qwen3-next-80b-a3b-instruct
qwen3-235b-a22b-thinking-2507
qwen3-235b-a22b-instruct-2507
qwen3-30b-a3b-thinking-2507
qwen3-30b-a3b-instruct-2507
qwen3-235b-a22b
qwen3-32b
qwen3-30b-a3b
qwen3-14b
qwen3-8b
qwen3-4b
qwen3-1.7b
qwen3-0.6b
qwen2.5-14b-instruct-1m	60	1,000,000
qwen2.5-7b-instruct-1m		1,000,000
qwen2.5-72b-instruct		100,000
qwen2.5-32b-instruct
qwen2.5-14b-instruct
qwen2.5-7b-instruct

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.

Rate limits (triggered if any value is exceeded). The following are per-minute limits. The service may also enforce limits based on RPS (RPM/60) and TPS (TPM/60).
Model	RPM	TPM (input and output tokens)
qwen3-next-80b-a3b-thinking	600	1,000,000
qwen3-next-80b-a3b-instruct
qwen3-235b-a22b-thinking-2507
qwen3-235b-a22b-instruct-2507
qwen3-30b-a3b-thinking-2507
qwen3-30b-a3b-instruct-2507
qwen3-235b-a22b
qwen3-30b-a3b
qwen3-32b
qwen3-14b
qwen3-8b
qwen3-4b
qwen3-1.7b
qwen3-0.6b
qwq-32b
qwq-32b-preview	1,200
qwen2.5-72b-instruct
qwen2.5-32b-instruct
qwen2.5-14b-instruct
qwen2.5-14b-instruct-1m
qwen2.5-7b-instruct
qwen2.5-7b-instruct-1m
qwen2.5-3b-instruct		2,000,000
qwen2.5-1.5b-instruct
qwen2.5-0.5b-instruct

Qwen-VL (visual understanding/image-to-text)

International

Rate limits (triggered if any value is exceeded). The following are per-minute limits. The service may also enforce limits based on RPS (RPM/60) and TPS (TPM/60).
Model	RPM	TPM (input and output tokens)
qwen3-vl-32b-thinking	60	100,000
qwen3-vl-32b-instruct
qwen3-vl-30b-a3b-thinking
qwen3-vl-30b-a3b-instruct
qwen3-vl-8b-thinking
qwen3-vl-8b-instruct
qwen3-vl-235b-a22b-thinking
qwen3-vl-235b-a22b-instruct
qwen2.5-vl-72b-instruct
qwen2.5-vl-32b-instruct
qwen2.5-vl-7b-instruct
qwen2.5-vl-3b-instruct

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.

Rate limits (triggered if any value is exceeded). The following are per-minute limits. The service may also enforce limits based on RPS (RPM/60) and TPS (TPM/60).
Model	RPM	TPM (input and output tokens)
qwen3-vl-32b-thinking	600	1,000,000
qwen3-vl-32b-instruct
qwen3-vl-30b-a3b-thinking
qwen3-vl-30b-a3b-instruct
qwen3-vl-8b-thinking
qwen3-vl-8b-instruct
qwen3-vl-235b-a22b-thinking	60	100,000
qwen3-vl-235b-a22b-instruct
qwen2.5-vl-72b-instruct
qwen2.5-vl-32b-instruct
qwen2.5-vl-7b-instruct	1,200	1,000,000
qwen2.5-vl-3b-instruct
qwen2-vl-72b-instruct	60	100,000
qwen2-vl-7b-instruct	1,200	1,000,000
qwen2-vl-2b-instruct
qvq-72b-preview	60	100,000

Qwen-Omni

International

Rate limits (triggered if any value is exceeded). The following are per-minute limits. The service may also enforce limits based on RPS (RPM/60) and TPS (TPM/60).
Model	RPM	TPM (input and output tokens)
qwen2.5-omni-7b		100,000

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.

Rate limits (triggered if any value is exceeded). The following are per-minute limits. The service may also enforce limits based on RPS (RPM/60) and TPS (TPM/60).
Model	RPM	TPM (input and output tokens)
qwen2.5-omni-7b		100,000

Qwen3-Omni-Captioner

International

Rate limits (triggered if any value is exceeded). The following are per-minute limits. The service may also enforce limits based on RPS (RPM/60) and TPS (TPM/60).
Model	RPM	TPM (input and output tokens)
qwen3-omni-30b-a3b-captioner		100,000

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.

Rate limits (triggered if any value is exceeded). The following are per-minute limits. The service may also enforce limits based on RPS (RPM/60) and TPS (TPM/60).
Model	RPM	TPM (input and output tokens)
qwen3-omni-30b-a3b-captioner		100,000

Qwen-Math

Rate limits (triggered if any value is exceeded). The following are per-minute limits. The service may also enforce limits based on RPS (RPM/60) and TPS (TPM/60).
Model	RPM	TPM (input and output tokens)
qwen2.5-math-72b-instruct	1,200	1,000,000
qwen2.5-math-7b-instruct
qwen2.5-math-1.5b-instruct

Qwen-Coder

Global

In the Global deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are dynamically scheduled worldwide.

Rate limits (triggered if any value is exceeded). The following are per-minute limits. The service may also enforce limits based on RPS (RPM/60) and TPS (TPM/60).
Model	RPM	TPM (input and output tokens)
qwen3-coder-480b-a35b-instruct	600	1,000,000
qwen3-coder-30b-a3b-instruct

International

Rate limits (triggered if any value is exceeded). The following are per-minute limits. The service may also enforce limits based on RPS (RPM/60) and TPS (TPM/60).
Model	RPM	TPM (input and output tokens)
qwen3-coder-480b-a35b-instruct	600	1,000,000
qwen3-coder-30b-a3b-instruct

Mainland China

Rate limits (triggered if any value is exceeded). The following are per-minute limits. The service may also enforce limits based on RPS (RPM/60) and TPS (TPM/60).
Model	RPM	TPM (input and output tokens)
qwen3-coder-480b-a35b-instruct	600	1,000,000
qwen3-coder-30b-a3b-instruct
qwen2.5-coder-32b-instruct	1,200
qwen2.5-coder-14b-instruct
qwen2.5-coder-7b-instruct
qwen2.5-coder-3b-instruct		2,000,000
qwen2.5-coder-1.5b-instruct
qwen2.5-coder-0.5b-instruct

Text generation - Third-party

DeepSeek

Rate limits (triggered if any value is exceeded). The following are per-minute limits. The service may also enforce limits based on RPS (RPM/60) and TPS (TPM/60).
Model	RPM	TPM (input and output tokens)
deepseek-v3.2	15,000	1,200,000
deepseek-v3.2-exp	15,000	1,200,000
deepseek-v3.1	15,000	1,200,000
deepseek-r1-0528	60	100,000
deepseek-r1	15,000	1,200,000
deepseek-v3
deepseek-r1-distill-qwen-7b
deepseek-r1-distill-qwen-14b
deepseek-r1-distill-qwen-32b
deepseek-r1-distill-qwen-1.5b	60	100,000
deepseek-r1-distill-llama-8b
deepseek-r1-distill-llama-70b

Kimi

Note

Supported only in Mainland China.

Rate limits (triggered if any value is exceeded). The following are per-minute limits. The service may also enforce limits based on RPS (RPM/60) and TPS (TPM/60).
Model	RPM	TPM (input and output tokens)
kimi-k2.5	60	100,000
kimi-k2-thinking	60	100,000
Moonshot-Kimi-K2-Instruct	60	100,000

GLM

Rate limits (triggered if any value is exceeded). The following are per-minute limits. The service may also enforce limits based on RPS (RPM/60) and TPS (TPM/60).
Model	RPM	TPM (input and output tokens)
glm-4.7	60	1,000,000
glm-4.6

Image generation

Qwen (Qwen-Image)

International

Rate limits (triggered if any value is exceeded):
Service	Model	Task submission limit	Concurrent tasks
Text-to-image	qwen-image-max	2 per minute	No limit for sync API
	qwen-image-max-2025-12-30	2 per minute	No limit for sync API
	qwen-image-plus	2 per second	No limit for sync API / 2 for async API
	qwen-image-plus-2026-01-09	2 per second	No limit for sync API
	qwen-image	2 per second	No limit for sync API / 2 for async API
Image editing	qwen-image-edit-max	2 per minute	No limit for sync API
	qwen-image-edit-max-2026-01-16	2 per minute	No limit for sync API
	qwen-image-edit-plus	2 per second	No limit for sync API
	qwen-image-edit-plus-2025-12-15	2 per second	No limit for sync API
	qwen-image-edit-plus-2025-10-30	2 per second	No limit for sync API
	qwen-image-edit	2 per second	No limit for sync API

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.

Rate limits (triggered if any value is exceeded):
Service	Model	Task submission limit	Concurrent tasks
Text-to-image	qwen-image-max	2 per minute	No limit for sync API
	qwen-image-max-2025-12-30	2 per minute	No limit for sync API
	qwen-image-plus	2 per second	No limit for sync API / 2 for async API
	qwen-image-plus-2026-01-09	2 per second	No limit for sync API
	qwen-image	2 per second	No limit for sync API / 2 for async API
Image editing	qwen-image-edit-max	2 per minute	No limit for sync API
	qwen-image-edit-max-2026-01-16	2 per minute	No limit for sync API
	qwen-image-edit-plus	2 per second	No limit for sync API
	qwen-image-edit-plus-2025-12-15	2 per second	No limit for sync API
	qwen-image-edit-plus-2025-10-30	2 per second	No limit for sync API
	qwen-image-edit	2 per second	No limit for sync API
Image translation	qwen-mt-image	1 per second	2
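For models that have both a task submission limit and a concurrent-task limit, a client can enforce both on its own side. For example, qwen-image-plus on the async API allows 2 submissions per second and 2 concurrent tasks. In the following sketch, a semaphore caps tasks in flight and a pacer spaces submissions uniformly. `run_task` is a hypothetical placeholder for submitting a task and polling it until completion; it is not a Model Studio SDK function. The class and helper names are also illustrative.

```python
import asyncio
import time
from typing import Awaitable, Callable, List, TypeVar

T = TypeVar("T")


class SubmissionPacer:
    """Space submissions at least 1/rate_per_s seconds apart (uniform scheduling)."""

    def __init__(self, rate_per_s: float) -> None:
        self._interval = 1.0 / rate_per_s
        self._next_at = 0.0
        self._lock = asyncio.Lock()

    async def wait(self) -> None:
        async with self._lock:  # serialize callers so slots are handed out in order
            now = time.monotonic()
            delay = self._next_at - now
            if delay > 0:
                await asyncio.sleep(delay)
            self._next_at = max(now, self._next_at) + self._interval


async def run_all(jobs: List[Callable[[], Awaitable[T]]],
                  max_concurrent: int, submits_per_s: float) -> List[T]:
    """Run jobs with at most max_concurrent in flight and paced submissions."""
    sem = asyncio.Semaphore(max_concurrent)
    pacer = SubmissionPacer(submits_per_s)

    async def guarded(job: Callable[[], Awaitable[T]]) -> T:
        async with sem:         # hold a slot from submission until the task finishes
            await pacer.wait()  # then respect the submission rate
            return await job()

    return await asyncio.gather(*(guarded(j) for j in jobs))


if __name__ == "__main__":
    async def run_task(i: int) -> int:
        # Hypothetical placeholder: submit an image task and poll until it finishes.
        await asyncio.sleep(0.1)
        return i

    jobs = [lambda i=i: run_task(i) for i in range(5)]
    print(asyncio.run(run_all(jobs, max_concurrent=2, submits_per_s=2)))  # [0, 1, 2, 3, 4]
```

The semaphore slot is held for the whole task, not just the submission call, because the concurrent-task limit counts tasks that are still running. The sync API rows list no concurrency limit, so for those models only the pacer would apply.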

Tongyi - text-to-image - Z-Image

International

Rate limits (triggered if any value is exceeded):
Model	Task submission limit (RPS)	Concurrent tasks
z-image-turbo	2	No limit for sync API

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.

Rate limits (triggered if any value is exceeded):
Model	Task submission limit (RPS)	Concurrent tasks
z-image-turbo	2	No limit for sync API

Wan

Global

In the Global deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are dynamically scheduled worldwide.

Rate limits (triggered if any value is exceeded):
Service	Model	Task submission limit (RPS)	Concurrent tasks
Text-to-image	wan2.6-t2i	5	5
Image generation	wan2.6-image	5	5

International

Rate limits (triggered if any value is exceeded):
Service	Model	Task submission limit (RPS)	Concurrent tasks
Text-to-image	wan2.6-t2i	5	5
	wan2.5-t2i-preview
	wan2.2-t2i-flash	2	2
	wan2.2-t2i-plus
	wan2.1-t2i-turbo
	wan2.1-t2i-plus
Image editing	wan2.5-i2i-preview	5	5
Image generation	wan2.6-image	5	5

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.

Rate limits (triggered if any value is exceeded):
Service	Model	Task submission limit (RPS)	Concurrent tasks
Text-to-image	wan2.6-t2i	1	5
	wan2.5-t2i-preview	5	5
	wanx2.0-t2i-turbo	2	2
	wanx2.1-t2i-turbo
	wanx2.1-t2i-plus
	wan2.2-t2i-flash
	wan2.2-t2i-plus
General image editing	wan2.5-i2i-preview	5	5
General image editing	wanx2.1-imageedit	2	2
Image generation	wan2.6-image	5	5

OutfitAnyone

Note

Supported only in Mainland China.

Rate limits (triggered if any value is exceeded):
Model	Task submission limit (RPS)	Concurrent tasks
aitryon-plus	10	5
aitryon-parsing-v1	10	No limit for sync API

Video generation

Wan series

Global

In the Global deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are dynamically scheduled worldwide.

Rate limit (triggered if any value is exceeded):
Service	Model	Task submission RPS limit	Concurrent tasks
Text-to-video	wan2.6-t2v	5	5
Image-to-video - first frame	wan2.6-i2v
Reference-to-video	wan2.6-r2v

International

Rate limit (triggered if any value is exceeded):
Service	Model	Task submission RPS limit	Concurrent tasks
Text-to-video	wan2.6-t2v	5	5
	wan2.5-t2v-preview
	wan2.2-t2v-plus	2	2
	wan2.1-t2v-turbo
	wan2.1-t2v-plus
Image-to-video - first frame	wan2.6-i2v-flash	5	5
	wan2.6-i2v
	wan2.5-i2v-preview
	wan2.2-i2v-flash	2	2
	wan2.1-i2v-plus
	wan2.1-i2v-turbo
	wan2.2-i2v-plus
Image-to-video - first and last frames	wan2.2-kf2v-flash
	wan2.1-kf2v-plus
General video editing	wan2.1-vace-plus
Reference-to-video	wan2.6-r2v-flash	5	5
	wan2.6-r2v	5	5
Animate image	wan2.2-animate-move	5	1
Video character swap	wan2.2-animate-mix	5	1

US

In the US deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are limited to the United States.

Rate limit (triggered if any value is exceeded):
Service	Model	Task submission RPS limit	Concurrent tasks
Text-to-video	wan2.6-t2v-us	5	5
Image-to-video - first frame	wan2.6-i2v-us

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.

Rate limit (triggered if any value is exceeded):
Service	Model	Task submission RPS limit	Concurrent tasks
Text-to-video	wan2.6-t2v	5	5
	wan2.5-t2v-preview
	wan2.2-t2v-plus	2	2
	wanx2.1-t2v-turbo
	wanx2.1-t2v-plus
Image-to-video - first frame	wan2.6-i2v-flash	5	5
	wan2.6-i2v
	wan2.5-i2v-preview
	wan2.2-i2v-plus	2	2
	wanx2.1-i2v-turbo
	wanx2.1-i2v-plus
Image-to-video - first and last frames	wan2.2-kf2v-flash
	wanx2.1-kf2v-plus
General video editing	wanx2.1-vace-plus
Reference-to-video	wan2.6-r2v-flash	5	5
	wan2.6-r2v	5	5
Digital human	wan2.2-s2v-detect	5	No limit for sync API
	wan2.2-s2v		1
Animate image	wan2.2-animate-move	5	1
Video character swap	wan2.2-animate-mix	5	1

AnimateAnyone

Model	Task submission RPS limit	Concurrent tasks
animate-anyone-detect-gen2	5	No limit for sync API
animate-anyone-template-gen2		1 (at any given time, only one task runs; other tasks in the queue stay pending)
animate-anyone-gen2
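
When concurrency is 1, extra tasks wait in a pending state until the running task finishes, so queueing time accumulates quickly. The sketch below estimates start time, finish time, and waiting time for tasks sent to such a single-slot queue. It assumes first-in, first-out ordering and known run times; the documentation does not state the scheduling order, and `single_slot_schedule` is a hypothetical helper.

```python
def single_slot_schedule(tasks):
    """tasks: list of (submit_time, run_seconds) in submission order.

    Returns (start, finish, wait) for each task, assuming FIFO order and
    exactly one running task at a time.
    """
    slot_free_at = 0.0
    result = []
    for submit, run in tasks:
        start = max(submit, slot_free_at)   # stays pending until the slot frees up
        finish = start + run
        result.append((start, finish, start - submit))
        slot_free_at = finish               # the next task cannot start before this
    return result
```

For example, two 30-second tasks submitted together followed by a 10-second task at t=40 give start times of 0, 30, and 60 seconds, so the third task waits 20 seconds even though it arrived after the first one had finished.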

EMO

Model	Task submission RPS limit	Concurrent tasks
emo-detect-v1		No limit for sync API
emo-v1		1 (at any given time, only one task runs; other tasks in the queue stay pending)

LivePortrait

Model	Task submission RPS limit	Concurrent tasks
liveportrait-detect		No limit for sync API
liveportrait		1 (at any given time, only one task runs; other tasks in the queue stay pending)

VideoRetalk

Model	Task submission RPS limit	Concurrent tasks
videoretalk		1 (at any given time, only one task runs; other tasks in the queue stay pending)

Emoji

Model	Task submission RPS limit	Concurrent tasks
emoji-detect-v1		No limit for sync API
emoji-v1		1 (at any given time, only one task runs; other tasks in the queue stay pending)

Video style transform

Model	Task submission RPS limit	Concurrent tasks
video-style-transform		1 (at any given time, only one task runs; other tasks in the queue stay pending)

Speech synthesis (text-to-speech)

Qwen speech synthesis

International

In the international deployment mode, the endpoint and data storage are both located in the Singapore region. The inference compute resources are dynamically scheduled worldwide (excluding Mainland China).

Model	RPM
qwen3-tts-flash	180
qwen3-tts-flash-2025-11-27	180
qwen3-tts-flash-2025-09-18	10
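
The speech synthesis limits above are expressed only in RPM. A sliding-window counter is one common way to stay under such a limit on the client side. Because this page notes for several other models that the service may also enforce RPS = RPM/60, the sketch also adds a one-second window so that a minute's quota is not spent in a single burst. `MultiWindowLimiter` and `rpm_limiter` are hypothetical names, and whether these TTS models are subject to per-second enforcement is an assumption.

```python
import time
from collections import deque


class MultiWindowLimiter:
    """Accept a request only if every (limit, window_seconds) rule has room."""

    def __init__(self, rules, clock=time.monotonic):
        self.rules = [(limit, window, deque()) for limit, window in rules]
        self.clock = clock

    def try_acquire(self) -> bool:
        now = self.clock()
        for limit, window, sent in self.rules:
            while sent and now - sent[0] >= window:   # expire timestamps outside the window
                sent.popleft()
            if len(sent) >= limit:
                return False                          # reject without recording anything
        for _, _, sent in self.rules:
            sent.append(now)                          # record the request in every window
        return True


def rpm_limiter(rpm: int, clock=time.monotonic) -> MultiWindowLimiter:
    # Per-minute rule plus a per-second rule of RPM/60, floored at 1 request per
    # second; for small RPM values the per-minute rule is the one that binds.
    per_second = max(1, rpm // 60)
    return MultiWindowLimiter([(rpm, 60.0), (per_second, 1.0)], clock=clock)
```

For qwen3-tts-flash (180 RPM), `rpm_limiter(180)` accepts at most 3 requests in any one-second window and 180 in any 60-second window. Call `try_acquire()` before each request and wait briefly when it returns False.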

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are both located in the Beijing region. The inference compute resources are limited to Mainland China.

Qwen3-TTS-Flash

Model	RPM
qwen3-tts-flash	180
qwen3-tts-flash-2025-11-27	180
qwen3-tts-flash-2025-09-18	10

Qwen-TTS

Rate limit (triggered if any value is exceeded). These are per-minute limits; the service may also enforce per-second limits of RPS (RPM/60) and TPS (TPM/60).
Model	RPM	TPM (input and output tokens)
qwen-tts	10	100,000
qwen-tts-latest
qwen-tts-2025-05-22
qwen-tts-2025-04-10
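
The per-second limits mentioned for this table are simple divisions of the per-minute values. For qwen-tts (10 RPM, 100,000 TPM), that works out to about 0.17 requests and about 1,667 tokens per second. The sketch below computes these derived limits and checks whether a steady workload stays within them; `MinuteLimits` and `fits` are illustrative names, not SDK types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MinuteLimits:
    rpm: int
    tpm: int  # counts input and output tokens, per the table header

    @property
    def rps(self) -> float:
        return self.rpm / 60  # derived per-second request limit

    @property
    def tps(self) -> float:
        return self.tpm / 60  # derived per-second token limit


def fits(limits: MinuteLimits, requests_per_sec: float, tokens_per_request: int) -> bool:
    """True if a steady workload stays within both derived per-second limits."""
    return (requests_per_sec <= limits.rps
            and requests_per_sec * tokens_per_request <= limits.tps)
```

With these numbers, one qwen-tts request every 10 seconds (0.1 RPS) fits, while one every 5 seconds (0.2 RPS) already exceeds the derived request limit regardless of request size.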

Qwen real-time speech synthesis

International

Qwen3-TTS-VD-Realtime

Model	RPM
qwen3-tts-vd-realtime-2025-12-16	180

Qwen3-TTS-VC-Realtime

Model	RPM
qwen3-tts-vc-realtime-2025-11-27	180

Qwen3-TTS-Flash-Realtime

Model	RPM
qwen3-tts-flash-realtime	180
qwen3-tts-flash-realtime-2025-11-27	180
qwen3-tts-flash-realtime-2025-09-18	10

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are both located in the Beijing region. The inference compute resources are limited to Mainland China.

Qwen3-TTS-VD-Realtime

Model	RPM
qwen3-tts-vd-realtime-2025-12-16	180

Qwen3-TTS-VC-Realtime

Model	RPM
qwen3-tts-vc-realtime-2025-11-27	180

Qwen3-TTS-Flash-Realtime

Model	RPM
qwen3-tts-flash-realtime	180
qwen3-tts-flash-realtime-2025-11-27	180
qwen3-tts-flash-realtime-2025-09-18	10

Qwen-TTS-Realtime

Rate limit (triggered if any value is exceeded). These are per-minute limits; the service may also enforce per-second limits of RPS (RPM/60) and TPS (TPM/60).
Model	RPM	TPM (input and output tokens)
qwen-tts-realtime	10	100,000
qwen-tts-realtime-latest
qwen-tts-realtime-2025-07-15

Qwen voice cloning

International

Model	RPM
qwen-voice-enrollment	180

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are both located in the Beijing region. The inference compute resources are limited to Mainland China.

Model	RPM
qwen-voice-enrollment	180

Qwen voice design

International

Model	RPM
qwen-voice-design	180

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are both located in the Beijing region. The inference compute resources are limited to Mainland China.

Model	RPM
qwen-voice-design	180

CosyVoice speech synthesis

Speech synthesis

Model	Task submission RPS limit
cosyvoice-v3-plus	3
cosyvoice-v3-flash
cosyvoice-v2

Voice cloning

Model	Task submission RPS limit
cosyvoice-v3-plus	10 (combined across all versions)
cosyvoice-v3-flash
cosyvoice-v2

The 10 RPS limit for voice cloning is a combined limit for all model versions. If you call only one version, such as cosyvoice-v2, it can use up to 10 RPS. If you call several versions at the same time, their combined request rate cannot exceed 10 RPS. For example, if cosyvoice-v2 uses 7 RPS, the cosyvoice-v3 models together can use at most 3 RPS.
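
Because the voice cloning limit is shared across versions, a client-side limiter should be keyed by the feature rather than by the model name. The sketch keeps a single token bucket for all three CosyVoice versions. `SharedRpsBudget`, `VOICE_CLONING_GROUP`, and `may_call_voice_cloning` are hypothetical, and the token bucket is a common client-side technique, not the service's documented algorithm.

```python
import time

# Hypothetical grouping: voice cloning calls to these models share one 10 RPS budget.
VOICE_CLONING_GROUP = {"cosyvoice-v3-plus", "cosyvoice-v3-flash", "cosyvoice-v2"}


class SharedRpsBudget:
    """One token bucket shared by every model in a group."""

    def __init__(self, rps: float, clock=time.monotonic):
        self.rps = rps
        self.tokens = rps          # start with one second's worth of capacity
        self.clock = clock
        self.last = clock()

    def try_acquire(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, capped at one second's worth.
        self.tokens = min(self.rps, self.tokens + (now - self.last) * self.rps)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


cloning_budget = SharedRpsBudget(rps=10)


def may_call_voice_cloning(model: str, budget: SharedRpsBudget = cloning_budget) -> bool:
    if model not in VOICE_CLONING_GROUP:
        raise ValueError(f"{model} is not in the shared voice cloning group")
    return budget.try_acquire()   # every version draws from the same bucket
```

Seven cosyvoice-v2 calls followed by three cosyvoice-v3-plus calls in the same instant use up the whole budget, matching the 7 + 3 example for this table; an eleventh call is refused until the bucket refills.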

Speech recognition (speech-to-text) and translation (speech-to-translation)

Qwen3-LiveTranslate-Flash

International

Rate limit (triggered if any value is exceeded). These are per-minute limits; the service may also enforce per-second limits of RPS (RPM/60) and TPS (TPM/60).
Model	RPM	TPM (input and output tokens)
qwen3-livetranslate-flash	100	100,000
qwen3-livetranslate-flash-2025-12-01

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are both located in the Beijing region. The inference compute resources are limited to Mainland China.

Rate limit (triggered if any value is exceeded). These are per-minute limits; the service may also enforce per-second limits of RPS (RPM/60) and TPS (TPM/60).
Model	RPM	TPM (input and output tokens)
qwen3-livetranslate-flash	100	100,000
qwen3-livetranslate-flash-2025-12-01

Qwen3-LiveTranslate-Flash-Realtime

International

Rate limit (triggered if any value is exceeded). These are per-minute limits; the service may also enforce per-second limits of RPS (RPM/60) and TPS (TPM/60).
Model	RPM	TPM (input and output tokens)
qwen3-livetranslate-flash-realtime	10	100,000
qwen3-livetranslate-flash-realtime-2025-09-22

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are both located in the Beijing region. The inference compute resources are limited to Mainland China.

Rate limit (triggered if any value is exceeded). These are per-minute limits; the service may also enforce per-second limits of RPS (RPM/60) and TPS (TPM/60).
Model	RPM	TPM (input and output tokens)
qwen3-livetranslate-flash-realtime	10	100,000
qwen3-livetranslate-flash-realtime-2025-09-22

Qwen audio file recognition

International

Qwen3-ASR-Flash-Filetrans

Model	RPM
qwen3-asr-flash-filetrans	100
qwen3-asr-flash-filetrans-2025-11-17	100

Qwen3-ASR-Flash

Model	RPM
qwen3-asr-flash	100
qwen3-asr-flash-2025-09-08	100

US

In the US deployment mode, the endpoint and data storage are both located in the US (Virginia) region. The inference compute resources are limited to the United States.

Model	RPM
qwen3-asr-flash-us	100
qwen3-asr-flash-2025-09-08-us	100

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are both located in the Beijing region. The inference compute resources are limited to Mainland China.

Qwen3-ASR-Flash-Filetrans

Model	RPM
qwen3-asr-flash-filetrans	100
qwen3-asr-flash-filetrans-2025-11-17	100

Qwen3-ASR-Flash

Model	RPM
qwen3-asr-flash	100
qwen3-asr-flash-2025-09-08	100

Qwen real-time speech recognition

International

Model	RPS
qwen3-asr-flash-realtime	20
qwen3-asr-flash-realtime-2025-10-27	20

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are both located in the Beijing region. The inference compute resources are limited to Mainland China.

Model	RPS
qwen3-asr-flash-realtime	20
qwen3-asr-flash-realtime-2025-10-27	20

Paraformer speech recognition

Model	Task submission RPS limit
paraformer-realtime-v2	20
paraformer-realtime-8k-v2	20

Model	Task submission RPS limit	Task query RPS limit
paraformer-v2	20	20
paraformer-8k-v2	20	20

Fun-ASR audio file recognition

International

Model	Task submission RPS limit	Task query RPS limit
fun-asr	10	20
fun-asr-2025-11-07
fun-asr-2025-08-25

Model	Task submission RPM limit	Task query RPS limit
fun-asr-mtl	100	20
fun-asr-mtl-2025-08-25	100	20
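
Fun-ASR file recognition is asynchronous: you submit a task and then query its status, and the two operations have separate limits. With a task query limit of 20 RPS, polling N pending tasks in round-robin order means each task can be checked at most once every N/20 seconds. The helpers below are a sketch under that round-robin assumption; the function names are hypothetical.

```python
def min_poll_interval(pending_tasks: int, query_rps: float = 20.0) -> float:
    """Shortest per-task polling interval (seconds) that keeps round-robin
    status queries for all pending tasks within the task query RPS limit."""
    if pending_tasks <= 0:
        return 0.0
    return pending_tasks / query_rps


def poll_schedule(task_ids, query_rps: float = 20.0, rounds: int = 1):
    """Yield (time_offset_seconds, task_id), spacing queries 1/query_rps apart."""
    index = 0
    for _ in range(rounds):
        for task_id in task_ids:
            yield (index / query_rps, task_id)  # compute from the index to avoid float drift
            index += 1
```

For example, with 100 pending fun-asr tasks, `min_poll_interval(100)` returns 5.0, so polling each task more often than every 5 seconds would exceed the 20 RPS query limit.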

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are both located in the Beijing region. The inference compute resources are limited to Mainland China.

Model	Task submission RPS limit	Task query RPS limit
fun-asr	10	20
fun-asr-2025-11-07
fun-asr-2025-08-25
fun-asr-mtl
fun-asr-mtl-2025-08-25

Fun-ASR real-time speech recognition

International

Model	Task submission RPS limit
fun-asr-realtime	20
fun-asr-realtime-2025-11-07	20

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are both located in the Beijing region. The inference compute resources are limited to Mainland China.

Model	Task submission RPS limit
fun-asr-realtime	20
fun-asr-realtime-2025-11-07
fun-asr-realtime-2025-09-15

Text embedding

International

Rate limit (triggered if any value is exceeded). These are per-minute limits; the service may also enforce per-second limits of RPS (RPM/60) and TPS (TPM/60).
Model	RPM	TPM/Number of tasks (input and output tokens)
text-embedding-v4	1,800	1,000,000
text-embedding-v3	6,000	24,000,000
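
When a table lists both RPM and TPM, the limit you reach first depends on request size. Dividing TPM by the tokens per request gives a second ceiling on requests per minute, and the lower of the two applies. For text-embedding-v4 (1,800 RPM, 1,000,000 TPM), requests averaging 1,000 tokens are capped by TPM at 1,000 per minute, while requests averaging 400 tokens are capped by RPM at 1,800. The function below is an illustrative helper that assumes a constant token count per request.

```python
def effective_rpm(rpm: int, tpm: int, tokens_per_request: int):
    """Return (requests_per_minute, binding_limit) given both per-minute limits."""
    if tokens_per_request <= 0:
        raise ValueError("tokens_per_request must be positive")
    by_tokens = tpm // tokens_per_request   # how many requests the token budget allows
    if by_tokens < rpm:
        return by_tokens, "TPM"
    return rpm, "RPM"
```

If TPM is the binding limit, raising the request rate will not help; reducing the tokens per request (for example, by sending shorter texts) is what increases throughput.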

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.

Rate limit (triggered if any value is exceeded):
Model	RPS	TPM/Number of tasks (input and output tokens)
text-embedding-v4		1,200,000

Multimodal embedding

Rate limit. These are per-minute limits; the service may also enforce per-second limits of RPS (RPM/60) and TPS (TPM/60).
Model	RPM	TPM (input tokens only)
qwen3-vl-embedding	1,200	600,000
multimodal-embedding-v1	120	200,000

Text rerank

Rate limit (triggered if any value is exceeded). These are per-minute limits; the service may also enforce per-second limits of RPS (RPM/60) and TPS (TPM/60).
Model	RPM	TPM (input and output tokens)
gte-rerank-v2	5,040	4,980,000,000

Domain specific

Intent recognition

Rate limit (triggered if any value is exceeded). These are per-minute limits; the service may also enforce per-second limits of RPS (RPM/60) and TPS (TPM/60).
Model	RPM	TPM (input and output tokens)
tongyi-intent-detect-v3	1,200	1,000,000

Role playing

International

Rate limit (triggered if any value is exceeded). These are per-minute limits; the service may also enforce per-second limits of RPS (RPM/60) and TPS (TPM/60).
Model	RPM	TPM (input and output tokens)
qwen-plus-character	120	500,000
qwen-flash-character
qwen-plus-character-ja

Mainland China

Rate limit (triggered if any value is exceeded). These are per-minute limits; the service may also enforce per-second limits of RPS (RPM/60) and TPS (TPM/60).
Model	RPM	TPM (input and output tokens)
qwen-plus-character	120	500,000

Retired models

For more information, see Model unpublishing mechanism.

Retired on January 30, 2026

Rate limit (triggered if any value is exceeded):
Category	Model	RPM	TPM (input and output tokens)
Qwen-Plus	qwen-plus-2024-11-27	0	0
	qwen-plus-2024-11-25
	qwen-plus-2024-09-19
	qwen-plus-2024-08-06
Qwen-Turbo	qwen-turbo-2024-09-19
Qwen-VL	qwen-vl-max-2024-10-30
	qwen-vl-max-2024-08-09
	qwen-vl-plus-2024-08-09

Retired on August 20, 2025

Rate limit (triggered if any value is exceeded):
Category	Model	RPM	TPM (input and output tokens)
Text generation - Qwen	qwen2-72b-instruct	0	0
	qwen2-57b-a14b-instruct
	qwen2-7b-instruct
	qwen1.5-110b-chat
	qwen1.5-72b-chat
	qwen1.5-32b-chat
	qwen1.5-14b-chat
	qwen1.5-7b-chat