Rate limiting

Alibaba Cloud Model Studio applies rate limiting to model calls at the Alibaba Cloud account level, aggregating usage across all RAM users, workspaces, and API keys under the account. Requests are rejected when the limit is exceeded and typically recover automatically within one minute.

Rate limiting rules

Account-level rate limiting: Rate limits are applied at the root account level. The usage of all RAM users, workspaces, and API keys under the account is combined.
Model-specific rate limiting: Each model has its own rate limit. For more information, see the tables below.
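Because a model's RPM and TPM limits apply at the same time, the sustainable request rate is bounded by whichever limit is reached first. The following is a minimal sketch of that arithmetic. The function name and the average tokens per request are illustrative assumptions; the limits are the Singapore values for qwen-plus-2025-07-28 from the tables in this topic.

```python
def max_requests_per_minute(rpm_limit: int, tpm_limit: int, avg_tokens_per_request: int) -> int:
    """Return the highest per-minute request count that stays within both limits."""
    if avg_tokens_per_request <= 0:
        raise ValueError("avg_tokens_per_request must be positive")
    # TPM counts input and output tokens, so it caps requests at tpm_limit // avg_tokens.
    token_bound = tpm_limit // avg_tokens_per_request
    return min(rpm_limit, token_bound)


# qwen-plus-2025-07-28 in Singapore: 60 RPM, 100,000 TPM.
print(max_requests_per_minute(60, 100_000, 500))    # limited by RPM: 60
print(max_requests_per_minute(60, 100_000, 2_500))  # limited by TPM: 40
```

With short requests the RPM limit is reached first; with long requests the TPM limit is reached first.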

FAQ

Why is rate limiting triggered?

You can identify the type of rate limit triggered based on the error message:

"Requests rate limit exceeded" or "You exceeded your current requests list": The requests per minute (RPM) limit was triggered.
"Allocated quota exceeded" or "You exceeded your current quota": The tokens per minute (TPM) limit was triggered.
"Request rate increased too quickly": The request frequency surged in a short period and triggered system stability protection. This can occur even if the total number of calls has not reached the RPM or TPM limit.
For other errors, see Error codes to confirm the cause.
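The mapping from error message to limit type can be sketched as a small helper. This is an illustration only: it assumes the messages listed above appear as substrings of the error text your client receives, and the function name and the labels BURST and OTHER are hypothetical.

```python
def classify_rate_limit(message: str) -> str:
    """Map an error message to the kind of limit it indicates."""
    text = message.lower()
    if "requests rate limit exceeded" in text or "exceeded your current requests list" in text:
        return "RPM"    # requests per minute
    if "allocated quota exceeded" in text or "exceeded your current quota" in text:
        return "TPM"    # tokens per minute
    if "request rate increased too quickly" in text:
        return "BURST"  # stability protection against sudden surges
    return "OTHER"      # not a known rate limit message; check Error codes


print(classify_rate_limit("Requests rate limit exceeded, please try again later."))  # RPM
print(classify_rate_limit("You exceeded your current quota."))                        # TPM
```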

In addition to the per-minute RPM and TPM limits, the service may enforce per-second limits: requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60). A burst of requests in a short time can therefore trigger rate limiting even if the per-minute total stays within the limit.
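As a worked example of the per-second conversion, the following sketch uses the Singapore limits for qwen-max from the tables in this topic (600 RPM, 1,000,000 TPM). The function names are illustrative, and the source only states that per-second limits may be enforced, so treat the burst check as an approximation.

```python
def per_second_limits(rpm: int, tpm: int) -> tuple[float, float]:
    """Convert per-minute limits to the per-second limits RPS = RPM/60 and TPS = TPM/60."""
    return rpm / 60, tpm / 60


def burst_exceeds_rps(requests_in_one_second: int, rpm: int) -> bool:
    """Return True if a one-second burst is larger than the per-second request limit."""
    return requests_in_one_second > rpm / 60


rps, tps = per_second_limits(600, 1_000_000)
print(f"RPS = {rps:.1f}, TPS = {tps:.1f}")  # RPS = 10.0, TPS = 16666.7
print(burst_exceeds_rps(30, 600))  # True: 30 requests in one second exceeds 10 RPS
print(burst_exceeds_rps(8, 600))   # False
```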

How to view model usage

One hour after you call a model, go to the Monitoring (Singapore or Beijing) page. Set the query conditions, such as the time range and workspace. Then, in the Models area, find the target model and click Monitor in the Actions column to view the model's call statistics. For more information, see the Monitoring document.

Data is updated hourly. During peak periods, the data may lag by an hour or more.

How long does it take to recover from rate limiting?

Recovery usually occurs within one minute. If other errors occur, see Error codes for troubleshooting.

How to avoid rate limiting

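Choose models with higher rate limits: Stable or latest versions generally have higher rate limits than dated snapshot versions. For example, in Singapore, qwen3.7-plus allows 15,000 RPM, whereas the snapshot qwen3.7-plus-2026-05-26 allows 60 RPM.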
Optimize your call strategy
- Reduce call frequency: If you receive a "Requests rate limit exceeded" or "You exceeded your current requests list" error, lower the API call frequency.
- Reduce token consumption: If you receive an "Allocated quota exceeded" or "You exceeded your current quota" error, shorten the input or limit the output length.
- Smooth the request rate: If you receive a "Request rate increased too quickly" error, use uniform scheduling, exponential backoff, or a request queue to distribute requests evenly and avoid sudden peaks.
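Uniform scheduling and exponential backoff can be sketched as two small helpers. This is a minimal illustration, not a prescribed implementation: the function names, the 1-second base delay, and the 60-second cap (chosen because recovery usually occurs within one minute) are assumptions.

```python
import random


def uniform_offsets(num_requests: int, rpm_limit: int) -> list[float]:
    """Return send times (seconds from start) spaced evenly at the RPM limit."""
    interval = 60.0 / rpm_limit
    return [i * interval for i in range(num_requests)]


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0, jitter: bool = True) -> float:
    """Exponential backoff: base * 2**attempt, capped, optionally with full jitter."""
    delay = min(cap, base * (2 ** attempt))
    # Jitter spreads retries from concurrent callers so they do not retry in lockstep.
    return random.uniform(0, delay) if jitter else delay


print(uniform_offsets(4, 60))                              # [0.0, 1.0, 2.0, 3.0]
print([backoff_delay(a, jitter=False) for a in range(4)])  # [1.0, 2.0, 4.0, 8.0]
```

In asynchronous code, each request can wait with `await asyncio.sleep(offset)` before sending, and a request that receives HTTP 429 can sleep for `backoff_delay(attempt)` before retrying.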

Add a backup model

If rate limiting is triggered, you can switch to a backup model to continue generation. This can reduce the probability of failure and increase throughput. The following code automatically retries with qwen-plus-2025-07-14 after a rate limit is triggered for qwen-plus-2025-07-28.

Sample code

import os
import asyncio
from openai import AsyncOpenAI, APIStatusError

# Configuration
API_KEY = os.getenv("DASHSCOPE_API_KEY")
# Primary model
MODEL = "qwen-plus-2025-07-28"
# Backup model
BACKUP_MODEL = "qwen-plus-2025-07-14"
# Test question
QUESTION = "Who are you?"
# Concurrency setting
NUM_REQUESTS = 10

client = AsyncOpenAI(
    api_key=API_KEY,
    # When calling, replace {WorkspaceId} with your actual workspace ID.
    base_url="https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/compatible-mode/v1"
)

async def send_request(model):
    """Sends a single request."""
    try:
        await client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": QUESTION}]
        )
        return True
    except APIStatusError as e:
        if e.status_code == 429:
            print(f"[Rate limit triggered] Model {model}")
            return False
        raise
    except Exception as e:
        print(f"[Request failed] Model {model}, Error: {e}")
        return False

async def task(i):
    # Try the primary model.
    if await send_request(MODEL):
        return True
    # If rate limited, try the backup model.
    return await send_request(BACKUP_MODEL)

async def main():
    results = await asyncio.gather(*(task(i) for i in range(NUM_REQUESTS)))
    print(f"Successful requests: {sum(results)}, Failed requests: {len(results) - sum(results)}")

if __name__ == "__main__":
    asyncio.run(main())

Split tasks: Long conversations or large documents can consume many tokens quickly. You can split large batch tasks into smaller batches and submit them at different times.
Use batch inference: For tasks that do not require real-time responses, you can use the Batch API. Batch requests are not subject to real-time rate limits, but you must consider queuing and processing time.
Increase rate limits: If the default rate limits are insufficient, you can increase the temporary TPM quota for a model on the Increase Rate Limits page in the Model Studio console. The increase takes effect immediately. For more information, see Increase temporary rate limits.
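The "Split tasks" suggestion above can be sketched as grouping work items into batches whose estimated token totals stay within a budget, such as a model's TPM limit, and submitting the batches at different times. The function name and the token estimates are illustrative assumptions; how you estimate tokens per item depends on your tokenizer and prompts.

```python
def batch_by_token_budget(token_counts: list[int], budget: int) -> list[list[int]]:
    """Group item indexes into batches whose estimated tokens stay within the budget."""
    batches: list[list[int]] = []
    current: list[int] = []
    used = 0
    for index, tokens in enumerate(token_counts):
        # Start a new batch when adding this item would exceed the budget.
        if current and used + tokens > budget:
            batches.append(current)
            current, used = [], 0
        current.append(index)
        used += tokens
    if current:
        batches.append(current)
    return batches


# Estimated tokens per document, batched against a 100,000 TPM limit.
print(batch_by_token_budget([40_000, 50_000, 30_000, 90_000, 10_000], 100_000))
# [[0, 1], [2], [3, 4]]
```

An item whose estimate alone exceeds the budget is placed in a batch by itself, so it still needs to be shortened or split separately.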

How to control token usage or costs

Rate limiting only restricts the request rate per unit of time; it does not cap cumulative usage. To control token usage or costs, use the following methods:

Set a spending limit and cost alerts: On the Billing card, configure Cost alerts to enable a monthly spending limit and threshold notifications. You are notified when the threshold is reached, which helps you avoid overspending. For more information, see Query bills and manage costs.
Enable stop when the free quota is used up: For models that offer a free quota, you can enable stop when the free quota is used up so that calls stop automatically once the free quota is exhausted, which prevents additional charges. For more information, see Free quota.
Monitor model usage: Regularly check the token usage of each model to detect abnormal growth in time. See How to view model usage above.
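As a client-side complement to these console settings, an application can keep its own running token count and stop calling once a self-imposed cap is reached. The class below is a hypothetical sketch, not a Model Studio feature. It assumes that each response reports its token usage, for example through the `usage.total_tokens` field that OpenAI-compatible chat completion responses normally carry.

```python
class TokenBudget:
    """Tracks cumulative token usage against a self-imposed cap."""

    def __init__(self, cap: int):
        self.cap = cap
        self.used = 0

    def record(self, total_tokens: int) -> None:
        """Add the tokens reported for one completed call."""
        self.used += total_tokens

    @property
    def remaining(self) -> int:
        return max(0, self.cap - self.used)

    def exhausted(self) -> bool:
        return self.used >= self.cap


budget = TokenBudget(cap=10_000)
budget.record(4_000)  # for example, response.usage.total_tokens
budget.record(6_500)
print(budget.remaining, budget.exhausted())  # 0 True
```

A counter like this only covers calls made by the process that holds it. Spending limits and cost alerts in the console remain the account-wide control.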

Increase temporary rate limits

If the default rate limits are insufficient, you can increase a model's temporary TPM quota in the Model Studio console. The increase takes effect immediately and is valid for 30 days. After it expires, the quota automatically reverts to the system default.

This feature is currently available in the China (Beijing) and Singapore regions.

Log on to the Model Studio console and go to the Increase Rate Limits page.
Click Increase Temporary Model Rate Limit in the upper-right corner.
In the dialog box that appears, select a Model and enter the desired value for Token Account Limit (Tokens/60s). The dialog box displays the current quota and the maximum configurable limit.
Click OK. The increased quota takes effect immediately.

After the quota increase takes effect, you can confirm it in the following ways:

On the Increase Rate Limits page, view the models with increased quotas and their corresponding rate limit data in the list.
In the Model List, go to the details page of the corresponding model to view the updated rate limit data.

Note

The models for which you can temporarily increase quotas are listed in the dialog box on the Increase Rate Limits page.
Submitting another request for a model that already has an increased quota is considered a new application, and the validity period is reset to 30 days.
Request a quota based on your actual needs. If the provisioned capacity significantly exceeds actual usage for a long time, the system may restore it to the default value after prior notification.
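The 30-day validity period and its reset on reapplication can be illustrated with simple date arithmetic. This sketch assumes the period is counted in whole days from the application date; the exact expiry time is not stated here, and the function name and dates are hypothetical.

```python
from datetime import date, timedelta

VALIDITY_DAYS = 30  # validity period of a temporary quota increase


def quota_expiry(applied_on: date) -> date:
    """Return the date on which an increase applied on the given date would revert."""
    return applied_on + timedelta(days=VALIDITY_DAYS)


first = quota_expiry(date(2026, 1, 10))
print(first)    # 2026-02-09

# Reapplying for the same model on 2026-02-01 resets the validity period.
renewed = quota_expiry(date(2026, 2, 1))
print(renewed)  # 2026-03-03
```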

Text generation - Qwen

Qwen language model

Singapore

Rate limiting is triggered if either value is exceeded. The limits below are per minute; the service may also enforce per-second limits (RPS = RPM/60, TPS = TPM/60).
Model name	Service deployment scope	Requests per minute (RPM)	Tokens per minute (TPM), including input and output tokens
qwen3.7-max	International	600	1,000,000
qwen3.7-max-2026-06-08	International	60	1,000,000
qwen3.7-max-2026-05-20	International	60	1,000,000
qwen3.7-max-preview	International	600	1,000,000
qwen3.7-max-2026-05-17	International	600	1,000,000
qwen3.6-max-preview	International	600	1,000,000
qwen3-max	International	600	1,000,000
qwen3-max-2026-01-23	International	600	1,000,000
qwen3-max-2025-09-23	International	60	100,000
qwen3-max-preview	International	600	1,000,000
qwen-max Rate limiting does not apply to service calls made using the Batch API.	International	600	1,000,000
qwen3.7-plus	International	15,000	5,000,000
qwen3.7-plus-2026-05-26	International	60	1,000,000
qwen3.6-plus	International	15,000	5,000,000
qwen3.6-plus-2026-04-02	International	60	1,000,000
qwen3.6-flash	International	15,000	5,000,000
qwen3.6-flash-2026-04-16	International	60	1,000,000
qwen3.5-plus	International	15,000	6,000,000
qwen3.5-plus-2026-04-20	International	600	1,000,000
qwen3.5-plus-2026-02-15	International	60	1,000,000
qwen-plus Rate limiting does not apply to service calls made using the Batch API.	International	600	1,500,000
qwen-plus-latest	International	600	1,000,000
qwen-plus-2025-12-01	International	120	1,000,000
qwen-plus-2025-09-11	International	120	1,000,000
qwen-plus-2025-07-28	International	60	100,000
qwen-plus-2025-07-14 (qwen-plus-0714)	International	60	100,000
qwen-plus-2025-04-28 (qwen-plus-0428)	International	60	1,000,000
qwen-plus-2025-01-25 (qwen-plus-0125)	International	60	100,000
qwen3.5-flash	International	15,000	5,000,000
qwen3.5-flash-2026-02-23	International	60	1,000,000
qwen-flash Rate limiting does not apply to service calls made using the Batch API.	International	600	5,000,000
qwen-flash-2025-07-28	International	600	5,000,000
qwq-plus	International	60	100,000
qwen-turbo Rate limiting does not apply to service calls made using the Batch API.	International	600	5,000,000

US (Virginia)

Rate limiting is triggered if either value is exceeded. The limits below are per minute; the service may also enforce per-second limits (RPS = RPM/60, TPS = TPM/60).
Model name	Service deployment scope	Requests per minute (RPM)	Tokens per minute (TPM), including input and output tokens
qwen3.7-max	Global	30,000	5,000,000
qwen3.7-max-2026-06-08	Global	600	1,000,000
qwen3.7-max-2026-05-20	Global	600	1,000,000
qwen3-max	Global	600	1,000,000
qwen3-max-preview	Global	600	1,000,000
qwen3-max-2025-09-23	Global	60	100,000
qwen3.7-plus	Global	30,000	5,000,000
qwen3.7-plus-2026-05-26	Global	600	1,000,000
qwen3.6-plus	Global	30,000	5,000,000
qwen3.6-plus-2026-04-02	Global	600	1,000,000
qwen3.6-flash	Global	15,000	5,000,000
qwen3.6-flash-2026-04-16	Global	60	1,000,000
qwen3.5-plus	Global	30,000	5,000,000
qwen3.5-plus-2026-02-15	Global	600	1,000,000
qwen-plus	Global	15,000	5,000,000
qwen-plus-us	US	600	1,000,000
qwen-plus-2025-12-01	Global	60	1,000,000
qwen-plus-2025-09-11	Global	60	1,000,000
qwen-plus-2025-07-28	Global	60	1,000,000
qwen-plus-2025-12-01-us	US	60	1,000,000
qwen3.5-flash	Global	30,000	10,000,000
qwen3.5-flash-2026-02-23	Global	600	1,000,000
qwen-flash	Global	15,000	10,000,000
qwen-flash-us	US	600	5,000,000
qwen-flash-2025-07-28	Global	60	1,000,000
qwen-flash-2025-07-28-us	US	600	5,000,000

China (Beijing)

Rate limiting is triggered if either value is exceeded. The limits below are per minute; the service may also enforce per-second limits (RPS = RPM/60, TPS = TPM/60).
Model name	Service deployment scope	Requests per minute (RPM)	Tokens per minute (TPM), including input and output tokens
qwen3.7-max Rate limiting does not apply to service calls made using the Batch API.	The Chinese mainland	30,000	5,000,000
qwen3.7-max-2026-06-08	The Chinese mainland	600	1,000,000
qwen3.7-max-2026-05-20	The Chinese mainland	600	1,000,000
qwen3.6-max-preview	The Chinese mainland	600	1,000,000
qwen3-max Rate limiting does not apply to service calls made using the Batch API.	The Chinese mainland	30,000	5,000,000
qwen3-max-2026-01-23	The Chinese mainland	600	1,000,000
qwen3-max-2025-09-23	The Chinese mainland	60	100,000
qwen3-max-preview	The Chinese mainland	600	1,000,000
qwen-max Rate limiting does not apply to service calls made using the Batch API.	The Chinese mainland	1,200	1,000,000
qwen3.7-plus	The Chinese mainland	30,000	5,000,000
qwen3.7-plus-2026-05-26	The Chinese mainland	600	1,000,000
qwen3.6-plus Rate limiting does not apply to service calls made using the Batch API.	The Chinese mainland	30,000	5,000,000
qwen3.6-plus-2026-04-02	The Chinese mainland	600	1,000,000
qwen3.6-flash Rate limiting does not apply to service calls made using the Batch API.	The Chinese mainland	30,000	10,000,000
qwen3.6-flash-2026-04-16	The Chinese mainland	600	1,000,000
qwen3.5-plus Rate limiting does not apply to service calls made using the Batch API.	The Chinese mainland	30,000	5,000,000
qwen3.5-plus-2026-04-20	The Chinese mainland	600	1,000,000
qwen3.5-plus-2026-02-15	The Chinese mainland	600	1,000,000
qwen-plus Rate limiting does not apply to service calls made using the Batch API.	The Chinese mainland	30,000	5,000,000
qwen-plus-latest Rate limiting does not apply to service calls made using the Batch API.	The Chinese mainland	15,000	1,200,000
qwen-plus-2025-12-01	The Chinese mainland	120	1,000,000
qwen-plus-2025-09-11	The Chinese mainland	60	1,000,000
qwen-plus-2025-07-28 (qwen-plus-0728)	The Chinese mainland	60	1,000,000
qwen-plus-2025-07-14 (qwen-plus-0714)	The Chinese mainland	60	100,000
qwen-plus-2025-04-28 (qwen-plus-0428)	The Chinese mainland	60	1,000,000
qwen-plus-2025-01-25 (qwen-plus-0125)	The Chinese mainland	60	150,000
qwen-plus-2025-01-12 (qwen-plus-0112)	The Chinese mainland	60	150,000
qwen-plus-2024-12-20 (qwen-plus-1220)	The Chinese mainland	60	150,000
qwen3.5-flash Rate limiting does not apply to service calls made using the Batch API.	The Chinese mainland	30,000	10,000,000
qwen3.5-flash-2026-02-23	The Chinese mainland	600	1,000,000
qwen-flash Rate limiting does not apply to service calls made using the Batch API.	The Chinese mainland	30,000	10,000,000
qwen-flash-2025-07-28	The Chinese mainland	60	1,000,000
qwq-plus Rate limiting does not apply to service calls made using the Batch API.	The Chinese mainland	600	1,000,000
qwen-turbo	The Chinese mainland	1,200	5,000,000
qwen-long-latest Rate limiting does not apply to service calls made using the Batch API.	The Chinese mainland	1,200	60,000
qwen-long-2025-01-25 (qwen-long-0125)	The Chinese mainland	3	7,500

Germany (Frankfurt)

Rate limiting is triggered if either value is exceeded. The limits below are per minute; the service may also enforce per-second limits (RPS = RPM/60, TPS = TPM/60).
Model name	Service deployment scope	Requests per minute (RPM)	Tokens per minute (TPM), including input and output tokens
qwen3.7-max	Global	30,000	5,000,000
qwen3.7-max-2026-06-08	Global	600	1,000,000
qwen3.7-max-2026-05-20	Global	600	1,000,000
qwen3-max	Global	600	1,000,000
qwen3-max	EU	600	1,000,000
qwen3-max-preview	Global	600	1,000,000
qwen3-max-2026-01-23	EU	600	1,000,000
qwen3-max-2025-09-23	Global	60	100,000
qwen3.7-plus	Global	30,000	5,000,000
qwen3.7-plus-2026-05-26	Global	600	1,000,000
qwen3.6-plus	Global	30,000	5,000,000
qwen3.6-plus-2026-04-02	Global	600	1,000,000
qwen3.6-flash	Global	15,000	5,000,000
qwen3.6-flash-2026-04-16	Global	60	1,000,000
qwen3.5-plus	Global	30,000	5,000,000
qwen3.5-plus-2026-02-15	Global	600	1,000,000
qwen-plus	Global	15,000	5,000,000
qwen-plus	EU	600	1,000,000
qwen-plus-2025-12-01	Global	60	1,000,000
qwen-plus-2025-12-01	EU	120	1,000,000
qwen-plus-2025-09-11	Global	60	1,000,000
qwen-plus-2025-07-28	Global	60	1,000,000
qwen3.5-flash	Global	30,000	10,000,000
qwen3.5-flash	EU	30,000	10,000,000
qwen3.5-flash-2026-02-23	Global	600	1,000,000
qwen3.5-flash-2026-02-23	EU	600	1,000,000
qwen-flash	Global	15,000	10,000,000
qwen-flash-2025-07-28	Global	60	1,000,000

Hong Kong (China)

Rate limiting is triggered if either value is exceeded. The limits below are per minute; the service may also enforce per-second limits (RPS = RPM/60, TPS = TPM/60).
Model name	Service deployment scope	Requests per minute (RPM)	Tokens per minute (TPM), including input and output tokens
qwen3-max	Hong Kong (China)	600	1,000,000
qwen3-max-2026-01-23	Hong Kong (China)	600	1,000,000
qwen3.6-plus	Global	30,000	5,000,000
qwen3.6-flash	Global	15,000	5,000,000
qwen-plus	Hong Kong (China)	600	1,000,000
qwen-plus-2025-12-01	Hong Kong (China)	120	1,000,000
qwen3.5-flash	Hong Kong (China)	15,000	5,000,000
qwen3.5-flash-2026-02-23	Hong Kong (China)	60	1,000,000

Japan (Tokyo)

Rate limiting is triggered if either value is exceeded. The limits below are per minute; the service may also enforce per-second limits (RPS = RPM/60, TPS = TPM/60).
Model name	Service deployment scope	Requests per minute (RPM)	Tokens per minute (TPM), including input and output tokens
qwen3.7-max	Global	30,000	5,000,000
qwen3.7-max-2026-05-20	Global	600	1,000,000
qwen3.7-plus	Global	30,000	5,000,000
qwen3.7-plus-2026-05-26	Global	600	1,000,000
qwen3.7-plus	Japan	15,000	5,000,000
qwen3.7-plus-2026-05-26	Japan	60	1,000,000
qwen3.6-plus	Global	30,000	5,000,000
qwen3.6-plus-2026-04-02	Global	600	1,000,000
qwen3.6-flash	Global	15,000	5,000,000
qwen3.6-flash-2026-04-16	Global	60	1,000,000

Qwen-VL (visual understanding/image-to-text)

Singapore

Rate limiting is triggered if either value is exceeded. The limits below are per minute; the service may also enforce per-second limits (RPS = RPM/60, TPS = TPM/60).
Model name	Service deployment scope	Requests per minute (RPM)	Tokens per minute (TPM), including input and output tokens
qwen3-vl-plus	International	1,200	1,000,000
qwen3-vl-plus-2025-12-19	International	60	100,000
qwen3-vl-plus-2025-09-23	International	120	1,000,000
qwen3-vl-flash	International	1,200	1,000,000
qwen3-vl-flash-2026-01-22	International	60	100,000
qwen3-vl-flash-2025-10-15	International	120	1,000,000
qwen-vl-max	International	1,200	1,000,000
qwen-vl-plus	International	1,200	1,000,000
qvq-max	International	60	100,000

US (Virginia)

Rate limiting is triggered if either value is exceeded. The limits below are per minute; the service may also enforce per-second limits (RPS = RPM/60, TPS = TPM/60).
Model name	Service deployment scope	Requests per minute (RPM)	Tokens per minute (TPM), including input and output tokens
qwen3-vl-plus	Global	1,200	1,000,000
qwen3-vl-plus-2025-09-23	Global	60	100,000
qwen3-vl-flash	Global	1,200	1,000,000
qwen3-vl-flash-us	US	1,200	1,000,000
qwen3-vl-flash-2025-10-15	Global	60	100,000
qwen3-vl-flash-2026-01-22-us	US	120	1,000,000
qwen3-vl-flash-2025-10-15-us	US	120	1,000,000

China (Beijing)

Rate limiting is triggered if either value is exceeded. The limits below are per minute; the service may also enforce per-second limits (RPS = RPM/60, TPS = TPM/60).
Model name	Service deployment scope	Requests per minute (RPM)	Tokens per minute (TPM), including input and output tokens
qwen3-vl-plus Rate limiting does not apply to service calls made using the Batch API.	The Chinese mainland	3,000	5,000,000
qwen3-vl-plus-2025-12-19	The Chinese mainland	60	100,000
qwen3-vl-plus-2025-09-23	The Chinese mainland	60	100,000
qwen3-vl-flash Rate limiting does not apply to service calls made using the Batch API.	The Chinese mainland	3,000	5,000,000
qwen3-vl-flash-2026-01-22	The Chinese mainland	60	100,000
qwen3-vl-flash-2025-10-15	The Chinese mainland	60	100,000
qwen-vl-max Rate limiting does not apply to service calls made using the Batch API.	The Chinese mainland	1,200	1,000,000
qwen-vl-plus Rate limiting does not apply to service calls made using the Batch API.	The Chinese mainland	1,200	1,000,000
qvq-max	The Chinese mainland	60	100,000
qvq-plus	The Chinese mainland	60	100,000

Germany (Frankfurt)

Rate limiting is triggered if either value is exceeded. The limits below are per minute; the service may also enforce per-second limits (RPS = RPM/60, TPS = TPM/60).
Model name	Service deployment scope	Requests per minute (RPM)	Tokens per minute (TPM), including input and output tokens
qwen3-vl-plus	Global	1,200	1,000,000
qwen3-vl-plus	EU	1,200	1,000,000
qwen3-vl-plus-2025-09-23	Global	60	100,000
qwen3-vl-flash	Global	1,200	1,000,000
qwen3-vl-flash	EU	1,200	1,000,000
qwen3-vl-flash-2026-01-22	EU	60	100,000
qwen3-vl-flash-2025-10-15	Global	60	100,000
qwen3-vl-flash-2025-10-15	EU	60	100,000

Hong Kong (China)

Rate limiting is triggered if either value is exceeded. The limits below are per minute; the service may also enforce per-second limits (RPS = RPM/60, TPS = TPM/60).
Model name	Service deployment scope	Requests per minute (RPM)	Tokens per minute (TPM), including input and output tokens
qwen3-vl-plus	Hong Kong (China)	1,200	1,000,000
qwen3-vl-plus-2025-12-19	Hong Kong (China)	60	100,000

Qwen-Omni (omni-modal)

Singapore

Rate limiting is triggered if either value is exceeded. The limits below are per minute; the service may also enforce per-second limits (RPS = RPM/60, TPS = TPM/60).
Model name	Service deployment scope	Requests per minute (RPM)	Tokens per minute (TPM), including input and output tokens
qwen3.5-omni-flash	International	60	100,000
qwen3.5-omni-flash-2026-03-15	International	60	100,000
qwen3.5-omni-plus	International	60	100,000
qwen3.5-omni-plus-2026-03-15	International	60	100,000
qwen3-omni-flash	International	60	100,000
qwen3-omni-flash-2025-12-01	International	60	100,000
qwen3-omni-flash-2025-09-15	International	60	100,000
qwen-omni-turbo	International	60	100,000
qwen-omni-turbo-latest	International	60	100,000
qwen-omni-turbo-2025-03-26	International	60	100,000

China (Beijing)

Rate limiting is triggered if either value is exceeded. The limits below are per minute; the service may also enforce per-second limits (RPS = RPM/60, TPS = TPM/60).
Model name	Service deployment scope	Requests per minute (RPM)	Tokens per minute (TPM), including input and output tokens
qwen3.5-omni-flash	The Chinese mainland	60	100,000
qwen3.5-omni-flash-2026-03-15	The Chinese mainland	60	100,000
qwen3.5-omni-plus	The Chinese mainland	60	100,000
qwen3.5-omni-plus-2026-03-15	The Chinese mainland	60	100,000
qwen3-omni-flash	The Chinese mainland	60	100,000
qwen3-omni-flash-2025-12-01	The Chinese mainland	60	100,000
qwen3-omni-flash-2025-09-15	The Chinese mainland	60	100,000
qwen-omni-turbo	The Chinese mainland	60	100,000
qwen-omni-turbo-latest	The Chinese mainland	60	100,000
qwen-omni-turbo-2025-03-26 (qwen-omni-turbo-0326)	The Chinese mainland	60	100,000
qwen-omni-turbo-2025-01-19 (qwen-omni-turbo-0119)	The Chinese mainland	60	100,000

Qwen-Omni-Realtime (real-time multimodal)

Singapore

Rate limiting is triggered if either value is exceeded. The limits below are per minute; the service may also enforce per-second limits (RPS = RPM/60, TPS = TPM/60).
Model name	Service deployment scope	Requests per minute (RPM)	Tokens per minute (TPM), including input and output tokens
qwen3.5-omni-plus-realtime	International	60	100,000
qwen3.5-omni-plus-realtime-2026-03-15	International	60	100,000
qwen3.5-omni-flash-realtime	International	60	100,000
qwen3.5-omni-flash-realtime-2026-03-15	International	60	100,000
qwen3-omni-flash-realtime	International	60	100,000
qwen3-omni-flash-realtime-2025-12-01	International	60	100,000
qwen3-omni-flash-realtime-2025-09-15	International	60	100,000
qwen-omni-turbo-realtime	International	60	10,000
qwen-omni-turbo-realtime-latest	International	60	10,000
qwen-omni-turbo-realtime-2025-05-08	International	60	10,000

China (Beijing)

Rate limiting is triggered if either value is exceeded. The limits below are per minute; the service may also enforce per-second limits (RPS = RPM/60, TPS = TPM/60).
Model name	Service deployment scope	Requests per minute (RPM)	Tokens per minute (TPM), including input and output tokens
qwen3.5-omni-plus-realtime	The Chinese mainland	60	100,000
qwen3.5-omni-plus-realtime-2026-03-15	The Chinese mainland	60	100,000
qwen3.5-omni-flash-realtime	The Chinese mainland	60	100,000
qwen3.5-omni-flash-realtime-2026-03-15	The Chinese mainland	60	100,000
qwen3-omni-flash-realtime	The Chinese mainland	60	100,000
qwen3-omni-flash-realtime-2025-12-01	The Chinese mainland	60	100,000
qwen3-omni-flash-realtime-2025-09-15	The Chinese mainland	60	100,000
qwen-omni-turbo-realtime	The Chinese mainland	60	100,000
qwen-omni-turbo-realtime-latest	The Chinese mainland	60	100,000
qwen-omni-turbo-realtime-2025-05-08	The Chinese mainland	60	100,000

Qwen-OCR (text extraction)

Singapore

Rate limiting is triggered if either value is exceeded. The limits below are per minute; the service may also enforce per-second limits (RPS = RPM/60, TPS = TPM/60).
Model name	Service deployment scope	Requests per minute (RPM)	Tokens per minute (TPM), including input and output tokens
qwen-vl-ocr	International	600	6,000,000
qwen-vl-ocr-2025-11-20	International	1,200	6,000,000

US (Virginia)

Rate limiting is triggered if either value is exceeded. The limits below are per minute; the service may also enforce per-second limits (RPS = RPM/60, TPS = TPM/60).
Model name	Service deployment scope	Requests per minute (RPM)	Tokens per minute (TPM), including input and output tokens
qwen-vl-ocr	Global	600	6,000,000
qwen-vl-ocr-2025-11-20	Global	1,200	6,000,000

China (Beijing)

Rate limiting is triggered if either value is exceeded. The limits below are per minute; the service may also enforce per-second limits (RPS = RPM/60, TPS = TPM/60).
Model name	Service deployment scope	Requests per minute (RPM)	Tokens per minute (TPM), including input and output tokens
qwen3.5-ocr	The Chinese mainland	6,000	30,000,000
qwen-vl-ocr Rate limiting does not apply to service calls made using the Batch API.	The Chinese mainland	600	6,000,000
qwen-vl-ocr-latest	The Chinese mainland	1,200	6,000,000
qwen-vl-ocr-2025-11-20	The Chinese mainland	1,200	6,000,000
qwen-vl-ocr-2025-04-13	The Chinese mainland	600	6,000,000
qwen-vl-ocr-2024-10-28	The Chinese mainland	600	6,000,000

Germany (Frankfurt)

Rate limiting is triggered if either value is exceeded. The limits below are per minute; the service may also enforce per-second limits (RPS = RPM/60, TPS = TPM/60).
Model name	Service deployment scope	Requests per minute (RPM)	Tokens per minute (TPM), including input and output tokens
qwen-vl-ocr	Global	600	6,000,000
qwen-vl-ocr-2025-11-20	Global	1,200	6,000,000

Qwen math model

China (Beijing)

Rate limiting is triggered if either value is exceeded. The limits below are per minute; the service may also enforce per-second limits (RPS = RPM/60, TPS = TPM/60).
Model name	Service deployment scope	Requests per minute (RPM)	Tokens per minute (TPM), including input and output tokens
qwen-math-plus	The Chinese mainland	1,200	1,000,000
qwen-math-plus-latest	The Chinese mainland	1,200	1,000,000
qwen-math-plus-2024-09-19 (qwen-math-plus-0919)	The Chinese mainland	60	100,000
qwen-math-plus-2024-08-16 (qwen-math-plus-0816)	The Chinese mainland	10	20,000
qwen-math-turbo	The Chinese mainland	1,200	1,000,000

Qwen-Coder

Singapore

Rate limiting is triggered if either value is exceeded. The limits below are per minute; the service may also enforce per-second limits (RPS = RPM/60, TPS = TPM/60).
Model name	Service deployment scope	Requests per minute (RPM)	Tokens per minute (TPM), including input and output tokens
qwen3-coder-plus	International	2,400	2,000,000
qwen3-coder-plus-2025-09-23	International	600	1,000,000
qwen3-coder-plus-2025-07-22	International	60	1,000,000
qwen3-coder-flash	International	600	5,000,000
qwen3-coder-flash-2025-07-28	International	600	5,000,000

US (Virginia)

Rate limiting is triggered if either value is exceeded. The limits below are per minute; the service may also enforce per-second limits (RPS = RPM/60, TPS = TPM/60).
Model name	Service deployment scope	Requests per minute (RPM)	Tokens per minute (TPM), including input and output tokens
qwen3-coder-plus	Global	2,400	2,000,000
qwen3-coder-plus-2025-09-23	Global	60	1,000,000
qwen3-coder-plus-2025-07-22	Global	60	1,000,000
qwen3-coder-flash	Global	1,200	1,000,000
qwen3-coder-flash-2025-07-28	Global	60	1,000,000

China (Beijing)

Rate limiting is triggered if either value is exceeded. The limits below are per minute; the service may also enforce per-second limits (RPS = RPM/60, TPS = TPM/60).
Model name	Service deployment scope	Requests per minute (RPM)	Tokens per minute (TPM), including input and output tokens
qwen3-coder-plus	The Chinese mainland	5,000	5,000,000
qwen3-coder-plus-2025-09-23	The Chinese mainland	60	1,000,000
qwen3-coder-plus-2025-07-22	The Chinese mainland	60	1,000,000
qwen3-coder-flash	The Chinese mainland	5,000	5,000,000
qwen3-coder-flash-2025-07-28	The Chinese mainland	60	1,000,000
qwen-coder-plus	The Chinese mainland	1,200	1,000,000
qwen-coder-turbo	The Chinese mainland	1,200	1,000,000

Germany (Frankfurt)

Rate limiting is triggered if either value is exceeded. The limits below are per minute; the service may also enforce per-second limits (RPS = RPM/60, TPS = TPM/60).
Model name	Service deployment scope	Requests per minute (RPM)	Tokens per minute (TPM), including input and output tokens
qwen3-coder-plus	Global	2,400	2,000,000
qwen3-coder-plus-2025-09-23	Global	60	1,000,000
qwen3-coder-plus-2025-07-22	Global	60	1,000,000
qwen3-coder-flash	Global	1,200	1,000,000
qwen3-coder-flash-2025-07-28	Global	60	1,000,000

Qwen translation model

Singapore

Rate limiting is triggered if either value is exceeded. The limits below are per minute; the service may also enforce per-second limits (RPS = RPM/60, TPS = TPM/60).
Model name	Service deployment scope	Requests per minute (RPM)	Tokens per minute (TPM), including input and output tokens
qwen-mt-plus	International	60	100,000
qwen-mt-flash	International	60	100,000
qwen-mt-lite	International	60	100,000
qwen-mt-turbo	International	60	100,000

US (Virginia)

Model name	Service deployment scope	Rate limiting conditions (triggered if any value is exceeded) The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).
Model name	Service deployment scope	Requests per minute (RPM)	Tokens per minute (TPM) Includes input and output tokens.
qwen-mt-plus	Global	60	25,000
qwen-mt-flash	Global	60	35,000
qwen-mt-lite	Global	60	100,000
qwen-mt-lite-us	US	60	100,000

China (Beijing)

Model name	Service deployment scope	Rate limiting conditions (triggered if any value is exceeded) The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).
Model name	Service deployment scope	Requests per minute (RPM)	Tokens per minute (TPM) Includes input and output tokens.
qwen-mt-plus	The Chinese mainland	60	25,000
qwen-mt-flash	The Chinese mainland	60	35,000
qwen-mt-lite	The Chinese mainland	60	100,000
qwen-mt-turbo	The Chinese mainland	60	35,000

Germany (Frankfurt)

Model name	Service deployment scope	Rate limiting conditions (triggered if any value is exceeded) The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).
Model name	Service deployment scope	Requests per minute (RPM)	Tokens per minute (TPM) Includes input and output tokens.
qwen-mt-plus	Global	60	25,000
qwen-mt-flash	Global	60	35,000
qwen-mt-lite	Global	60	100,000
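
Because rate limiting is triggered if any value is exceeded, the lower of the two limits determines how many requests fit in a minute. The sketch below estimates that ceiling for an assumed average request size; `max_requests_per_minute` and the 1,000-token average are illustrative assumptions, not values from the service.

```python
def max_requests_per_minute(rpm: int, tpm: int, avg_tokens_per_request: int) -> int:
    """Estimate sustainable requests per minute when both RPM and TPM apply.

    avg_tokens_per_request is an assumed average of input plus output tokens.
    """
    if avg_tokens_per_request <= 0:
        raise ValueError("avg_tokens_per_request must be positive")
    token_bound = tpm // avg_tokens_per_request  # requests the TPM limit allows
    return min(rpm, token_bound)


# qwen-mt-plus (Global): 60 RPM, 25,000 TPM, assuming about 1,000 tokens per request.
print(max_requests_per_minute(60, 25_000, 1_000))   # 25: TPM is the binding limit
# qwen-mt-lite (Global): 60 RPM, 100,000 TPM with the same assumed request size.
print(max_requests_per_minute(60, 100_000, 1_000))  # 60: RPM is the binding limit
```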

Qwen data mining model

China (Beijing)

Model name	Service deployment scope	Rate limiting conditions (triggered if any value is exceeded) The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).
Model name	Service deployment scope	Requests per minute (RPM)	Tokens per minute (TPM) Includes input and output tokens.
qwen-doc-turbo	The Chinese mainland	600	3,000,000

Qwen deep research model

China (Beijing)

Model name	Service deployment scope	Rate limiting conditions (triggered if any value is exceeded) The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).
Model name	Service deployment scope	Requests per minute (RPM)	Tokens per minute (TPM) Includes input and output tokens.
qwen-deep-research	The Chinese mainland	120	1,200,000

Text generation - Qwen - Open source

Qwen language model open source

Singapore

Model name	Service deployment scope	Rate limiting conditions (triggered if any value is exceeded) The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).
Model name	Service deployment scope	Requests per minute (RPM)	Tokens per minute (TPM) Includes input and output tokens.
qwen3.6-35b-a3b	International	600	1,000,000
qwen3.6-27b	International	600	1,000,000
qwen3.5-397b-a17b	International	600	1,000,000
qwen3.5-122b-a10b	International	600	1,000,000
qwen3.5-27b	International	600	1,000,000
qwen3.5-35b-a3b	International	600	5,000,000
qwen3-next-80b-a3b-thinking	International	600	1,000,000
qwen3-next-80b-a3b-instruct	International	600	1,000,000
qwen3-235b-a22b-thinking-2507	International	600	1,000,000
qwen3-235b-a22b-instruct-2507	International	600	1,000,000
qwen3-30b-a3b-thinking-2507	International	600	5,000,000
qwen3-30b-a3b-instruct-2507	International	600	5,000,000
qwen3-235b-a22b	International	600	1,000,000
qwen3-32b	International	600	1,000,000
qwen3-30b-a3b	International	600	1,000,000
qwen3-14b	International	600	1,000,000
qwen3-8b	International	600	1,000,000
qwen3-4b	International	600	1,000,000
qwen3-1.7b	International	600	1,000,000
qwen3-0.6b	International	600	1,000,000

US (Virginia)

Model name	Service deployment scope	Rate limiting conditions (triggered if any value is exceeded) The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).
Model name	Service deployment scope	Requests per minute (RPM)	Tokens per minute (TPM) Includes input and output tokens.
qwen3.5-397b-a17b	Global	600	1,000,000
qwen3.5-122b-a10b	Global	600	1,000,000
qwen3.5-27b	Global	600	1,000,000
qwen3.6-35b-a3b	Global	600	1,000,000
qwen3.5-35b-a3b	Global	600	1,000,000
qwen3-next-80b-a3b-thinking	Global	600	1,000,000
qwen3-next-80b-a3b-instruct	Global	600	1,000,000
qwen3-235b-a22b-thinking-2507	Global	600	1,000,000
qwen3-235b-a22b-instruct-2507	Global	600	1,000,000
qwen3-30b-a3b-thinking-2507	Global	600	1,000,000
qwen3-30b-a3b-instruct-2507	Global	600	1,000,000
qwen3-235b-a22b	Global	600	1,000,000
qwen3-30b-a3b	Global	600	1,000,000
qwen3-32b	Global	600	1,000,000
qwen3-14b	Global	600	1,000,000
qwen3-8b	Global	600	1,000,000

China (Beijing)

Model name	Service deployment scope	Rate limiting conditions (triggered if any value is exceeded) The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).
Model name	Service deployment scope	Requests per minute (RPM)	Tokens per minute (TPM) Includes input and output tokens.
qwen3.6-35b-a3b	The Chinese mainland	600	1,000,000
qwen3.6-27b	The Chinese mainland	600	1,000,000
qwen3.5-397b-a17b	The Chinese mainland	600	1,000,000
qwen3.5-122b-a10b	The Chinese mainland	600	1,000,000
qwen3.5-27b	The Chinese mainland	600	1,000,000
qwen3.5-35b-a3b	The Chinese mainland	600	1,000,000
qwen3-next-80b-a3b-thinking	The Chinese mainland	600	1,000,000
qwen3-next-80b-a3b-instruct	The Chinese mainland	600	1,000,000
qwen3-235b-a22b-thinking-2507	The Chinese mainland	600	1,000,000
qwen3-235b-a22b-instruct-2507	The Chinese mainland	600	1,000,000
qwen3-30b-a3b-thinking-2507	The Chinese mainland	600	1,000,000
qwen3-30b-a3b-instruct-2507	The Chinese mainland	600	1,000,000
qwen3-235b-a22b	The Chinese mainland	600	1,000,000
qwen3-30b-a3b	The Chinese mainland	600	1,000,000
qwen3-32b	The Chinese mainland	2,400	1,000,000
qwen3-14b	The Chinese mainland	600	1,000,000
qwen3-8b	The Chinese mainland	600	1,000,000
qwen3-4b	The Chinese mainland	600	1,000,000
qwen3-1.7b	The Chinese mainland	600	1,000,000
qwen3-0.6b	The Chinese mainland	600	1,000,000
qwen2.5-3b-instruct	The Chinese mainland	1,200	2,000,000
qwen2.5-1.5b-instruct	The Chinese mainland	1,200	2,000,000
qwen2.5-0.5b-instruct	The Chinese mainland	1,200	2,000,000

Germany (Frankfurt)

Model name	Service deployment scope	Rate limiting conditions (triggered if any value is exceeded) The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).
Model name	Service deployment scope	Requests per minute (RPM)	Tokens per minute (TPM) Includes input and output tokens.
qwen3.5-397b-a17b	Global	600	1,000,000
qwen3.5-122b-a10b	Global	600	1,000,000
qwen3.5-27b	Global	600	1,000,000
qwen3.6-35b-a3b	Global	600	1,000,000
qwen3.5-35b-a3b	Global	600	1,000,000
qwen3-next-80b-a3b-thinking	Global	600	1,000,000
qwen3-next-80b-a3b-instruct	Global	600	1,000,000
qwen3-235b-a22b-thinking-2507	Global	600	1,000,000
qwen3-235b-a22b-instruct-2507	Global	600	1,000,000
qwen3-30b-a3b-thinking-2507	Global	600	1,000,000
qwen3-30b-a3b-instruct-2507	Global	600	1,000,000
qwen3-235b-a22b	Global	600	1,000,000
qwen3-30b-a3b	Global	600	1,000,000
qwen3-32b	Global	600	1,000,000
qwen3-14b	Global	600	1,000,000
qwen3-8b	Global	600	1,000,000
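
One common client-side way to stay under a per-second request ceiling is a token bucket that refills at RPM/60 tokens per second. The following is a sketch only: the `RequestBucket` class is hypothetical, it models RPM but not TPM, and it does not reproduce how the service itself counts requests.

```python
import time


class RequestBucket:
    """Client-side token bucket refilled at rpm / 60 requests per second."""

    def __init__(self, rpm: int, burst: int = 1, clock=time.monotonic):
        self.rate = rpm / 60          # tokens added per second
        self.capacity = float(burst)  # maximum tokens that can accumulate
        self.tokens = float(burst)
        self.clock = clock
        self.last = clock()

    def try_acquire(self) -> bool:
        """Return True if a request may be sent now, consuming one token."""
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# 600 RPM (the limit listed for most models above) is 10 requests per second.
bucket = RequestBucket(rpm=600, burst=1)
sent = 0
deadline = time.monotonic() + 0.5
while time.monotonic() < deadline:
    if bucket.try_acquire():
        sent += 1  # a real client would send the request here
    else:
        time.sleep(0.01)
print(f"sent {sent} requests in 0.5 s")
```

Keeping `burst` small spreads requests evenly, which may help avoid the burst-triggered rejections that can occur even when the per-minute total is within the limit.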

Qwen-VL (visual understanding/image-to-text)

Singapore

Model name	Service deployment scope	Rate limiting conditions (triggered if any value is exceeded) The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).
Model name	Service deployment scope	Requests per minute (RPM)	Tokens per minute (TPM) Includes input and output tokens.
qwen3-vl-32b-thinking	International	60	100,000
qwen3-vl-32b-instruct	International	60	100,000
qwen3-vl-30b-a3b-thinking	International	60	100,000
qwen3-vl-30b-a3b-instruct	International	60	100,000
qwen3-vl-8b-thinking	International	60	100,000
qwen3-vl-8b-instruct	International	60	100,000
qwen3-vl-235b-a22b-thinking	International	60	100,000
qwen3-vl-235b-a22b-instruct	International	60	100,000

US (Virginia)

Model name	Service deployment scope	Rate limiting conditions (triggered if any value is exceeded) The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).
Model name	Service deployment scope	Requests per minute (RPM)	Tokens per minute (TPM) Includes input and output tokens.
qwen3-vl-235b-a22b-thinking	Global	60	100,000
qwen3-vl-235b-a22b-instruct	Global	60	100,000
qwen3-vl-32b-thinking	Global	600	1,000,000
qwen3-vl-32b-instruct	Global	600	1,000,000
qwen3-vl-30b-a3b-thinking	Global	600	1,000,000
qwen3-vl-30b-a3b-instruct	Global	600	1,000,000
qwen3-vl-8b-thinking	Global	600	1,000,000
qwen3-vl-8b-instruct	Global	600	1,000,000

China (Beijing)

Model name	Service deployment scope	Rate limiting conditions (triggered if any value is exceeded) The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).
Model name	Service deployment scope	Requests per minute (RPM)	Tokens per minute (TPM) Includes input and output tokens.
qwen3-vl-32b-thinking	The Chinese mainland	600	1,000,000
qwen3-vl-32b-instruct	The Chinese mainland	600	1,000,000
qwen3-vl-30b-a3b-thinking	The Chinese mainland	600	1,000,000
qwen3-vl-30b-a3b-instruct	The Chinese mainland	600	1,000,000
qwen3-vl-8b-thinking	The Chinese mainland	600	1,000,000
qwen3-vl-8b-instruct	The Chinese mainland	600	1,000,000
qwen3-vl-235b-a22b-thinking	The Chinese mainland	60	100,000
qwen3-vl-235b-a22b-instruct	The Chinese mainland	60	100,000
qwen2-vl-72b-instruct	The Chinese mainland	1,200	1,000,000
qwen2-vl-7b-instruct	The Chinese mainland	1,200	1,000,000
qwen2-vl-2b-instruct	The Chinese mainland	1,200	1,000,000

Germany (Frankfurt)

Model name	Service deployment scope	Rate limiting conditions (triggered if any value is exceeded) The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).
Model name	Service deployment scope	Requests per minute (RPM)	Tokens per minute (TPM) Includes input and output tokens.
qwen3-vl-235b-a22b-thinking	Global	60	100,000
qwen3-vl-235b-a22b-instruct	Global	60	100,000
qwen3-vl-32b-thinking	Global	600	1,000,000
qwen3-vl-32b-instruct	Global	600	1,000,000
qwen3-vl-30b-a3b-thinking	Global	600	1,000,000
qwen3-vl-30b-a3b-instruct	Global	600	1,000,000
qwen3-vl-8b-thinking	Global	600	1,000,000
qwen3-vl-8b-instruct	Global	600	1,000,000
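
The same model can carry different limits depending on the region, so a client that calls several regions may want to look limits up by region and model rather than by model alone. A small illustrative lookup, with values copied from the Qwen-VL tables above (the `LIMITS` mapping and `limits_for` helper are hypothetical names):

```python
# (region, model) -> (RPM, TPM), copied from the Qwen-VL tables above.
LIMITS = {
    ("Singapore", "qwen3-vl-32b-instruct"): (60, 100_000),
    ("US (Virginia)", "qwen3-vl-32b-instruct"): (600, 1_000_000),
    ("China (Beijing)", "qwen3-vl-32b-instruct"): (600, 1_000_000),
    ("Germany (Frankfurt)", "qwen3-vl-32b-instruct"): (600, 1_000_000),
}


def limits_for(region: str, model: str) -> tuple[int, int]:
    """Return (RPM, TPM) for a model in a region, or raise KeyError if not listed."""
    try:
        return LIMITS[(region, model)]
    except KeyError:
        raise KeyError(f"no limits recorded for {model!r} in {region!r}") from None


rpm, tpm = limits_for("Singapore", "qwen3-vl-32b-instruct")
print(rpm, tpm)  # 60 100000
```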

Qwen3-Omni

Singapore

Model name	Service deployment scope	Rate limiting conditions (triggered if any value is exceeded) The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).
Model name	Service deployment scope	Requests per minute (RPM)	Tokens per minute (TPM) Includes input and output tokens.
qwen2.5-omni-7b	International		100,000

China (Beijing)

Model name	Service deployment scope	Rate limiting conditions (triggered if any value is exceeded) The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).
Model name	Service deployment scope	Requests per minute (RPM)	Tokens per minute (TPM) Includes input and output tokens.
qwen2.5-omni-7b	The Chinese mainland		100,000

Qwen3-Omni-Captioner

Singapore

Model name	Service deployment scope	Rate limiting conditions (triggered if any value is exceeded) The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).
Model name	Service deployment scope	Requests per minute (RPM)	Tokens per minute (TPM) Includes input and output tokens.
qwen3-omni-30b-a3b-captioner	International		100,000

China (Beijing)

Model name	Service deployment scope	Rate limiting conditions (triggered if any value is exceeded) The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).
Model name	Service deployment scope	Requests per minute (RPM)	Tokens per minute (TPM) Includes input and output tokens.
qwen3-omni-30b-a3b-captioner	The Chinese mainland		100,000

Qwen-Math

China (Beijing)

Model name	Service deployment scope	Rate limiting conditions (triggered if any value is exceeded) The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).
Model name	Service deployment scope	Requests per minute (RPM)	Tokens per minute (TPM) Includes input and output tokens.

Qwen-Coder

Singapore

Model name	Service deployment scope	Rate limiting conditions (triggered if any value is exceeded) The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).
Model name	Service deployment scope	Requests per minute (RPM)	Tokens per minute (TPM) Includes input and output tokens.
qwen3-coder-next	International	600	1,000,000
qwen3-coder-480b-a35b-instruct	International	600	1,000,000
qwen3-coder-30b-a3b-instruct	International	600	1,000,000

US (Virginia)

Model name	Service deployment scope	Rate limiting conditions (triggered if any value is exceeded) The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).
Model name	Service deployment scope	Requests per minute (RPM)	Tokens per minute (TPM) Includes input and output tokens.
qwen3-coder-480b-a35b-instruct	Global	600	1,000,000
qwen3-coder-30b-a3b-instruct	Global	600	1,000,000

China (Beijing)

Model name	Service deployment scope	Rate limiting conditions (triggered if any value is exceeded) The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).
Model name	Service deployment scope	Requests per minute (RPM)	Tokens per minute (TPM) Includes input and output tokens.
qwen3-coder-next	The Chinese mainland	600	1,000,000
qwen3-coder-480b-a35b-instruct	The Chinese mainland	600	1,000,000
qwen3-coder-30b-a3b-instruct	The Chinese mainland	600	1,000,000

Germany (Frankfurt)

Model name	Service deployment scope	Rate limiting conditions (triggered if any value is exceeded) The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).
Model name	Service deployment scope	Requests per minute (RPM)	Tokens per minute (TPM) Includes input and output tokens.
qwen3-coder-480b-a35b-instruct	Global	600	1,000,000
qwen3-coder-30b-a3b-instruct	Global	600	1,000,000
qwen3-coder-next	EU	600	1,000,000

Text generation - Third-party models

DeepSeek

Singapore

Model name	Service deployment scope	Rate limiting conditions (triggered if any value is exceeded) The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).
Model name	Service deployment scope	Requests per minute (RPM)	Tokens per minute (TPM) Includes input and output tokens.
deepseek-v4-pro	International	10,000	1,200,000
deepseek-v4-flash	International	10,000	1,200,000
deepseek-v3.2	International	10,000	1,200,000

US (Virginia)

Model name	Service deployment scope	Rate limiting conditions (triggered if any value is exceeded) The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).
Model name	Service deployment scope	Requests per minute (RPM)	Tokens per minute (TPM) Includes input and output tokens.
deepseek-v4-pro	Global	15,000	1,200,000
deepseek-v4-flash	Global	15,000	1,200,000

China (Beijing)

Model name	Service deployment scope	Rate limiting conditions (triggered if any value is exceeded) The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).
Model name	Service deployment scope	Requests per minute (RPM)	Tokens per minute (TPM) Includes input and output tokens.
deepseek-v4-pro	The Chinese mainland	15,000	1,200,000
deepseek-v4-flash	The Chinese mainland	15,000	1,200,000
deepseek-v3.2 (Rate limiting does not apply to service calls made using the Batch API.)	The Chinese mainland	15,000	1,200,000
deepseek-v3.2-exp	The Chinese mainland	15,000	1,200,000
deepseek-v3.1	The Chinese mainland	15,000	1,200,000
deepseek-r1-0528	The Chinese mainland	60	100,000
deepseek-r1 (Rate limiting does not apply to service calls made using the Batch API.)	The Chinese mainland	15,000	1,200,000
deepseek-v3 (Rate limiting does not apply to service calls made using the Batch API.)	The Chinese mainland	15,000	1,200,000
deepseek-r1-distill-qwen-7b	The Chinese mainland	15,000	1,200,000
deepseek-r1-distill-qwen-14b	The Chinese mainland	15,000	1,200,000
deepseek-r1-distill-qwen-32b	The Chinese mainland	15,000	1,200,000
deepseek-r1-distill-qwen-1.5b	The Chinese mainland	60	100,000
deepseek-r1-distill-llama-8b	The Chinese mainland	60	100,000
deepseek-r1-distill-llama-70b	The Chinese mainland	60	100,000

Germany (Frankfurt)

Model name	Service deployment scope	Rate limiting conditions (triggered if any value is exceeded) The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).
Model name	Service deployment scope	Requests per minute (RPM)	Tokens per minute (TPM) Includes input and output tokens.
deepseek-v4-pro	Global	15,000	1,200,000
deepseek-v4-flash	Global	15,000	1,200,000

Japan (Tokyo)

Model name	Service deployment scope	Rate limiting conditions (triggered if any value is exceeded) The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).
Model name	Service deployment scope	Requests per minute (RPM)	Tokens per minute (TPM) Includes input and output tokens.
deepseek-v4-pro	Japan	10,000	1,200,000
deepseek-v4-flash	Japan	10,000	1,200,000
deepseek-v4-pro	Global	15,000	1,200,000
deepseek-v4-flash	Global	15,000	1,200,000

Kimi

China (Beijing)

Model name	Service deployment scope	Rate limiting conditions (triggered if any value is exceeded) The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).
Model name	Service deployment scope	Requests per minute (RPM)	Tokens per minute (TPM) Includes input and output tokens.
kimi-k2.7-code	The Chinese mainland	500	1,000,000
kimi-k2.6	The Chinese mainland	500	1,000,000
kimi-k2.5	The Chinese mainland	500	1,000,000
kimi-k2-thinking	The Chinese mainland	500	1,000,000
Moonshot-Kimi-K2-Instruct	The Chinese mainland	500	1,000,000

US (Virginia)

Model name	Service deployment scope	Rate limiting conditions (triggered if any value is exceeded) The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).
Model name	Service deployment scope	Requests per minute (RPM)	Tokens per minute (TPM) Includes input and output tokens.
kimi-k2.7-code	Global	500	1,000,000
kimi-k2.5	Global	500	1,000,000

Germany (Frankfurt)

Model name	Service deployment scope	Rate limiting conditions (triggered if any value is exceeded) The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).
Model name	Service deployment scope	Requests per minute (RPM)	Tokens per minute (TPM) Includes input and output tokens.
kimi-k2.7-code	Global	500	1,000,000
kimi-k2.5	Global	500	1,000,000

Hong Kong (China)

Model name	Service deployment scope	Rate limiting conditions (triggered if any value is exceeded) The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).
Model name	Service deployment scope	Requests per minute (RPM)	Tokens per minute (TPM) Includes input and output tokens.
kimi-k2.7-code	Global	500	1,000,000

Japan (Tokyo)

Model name	Service deployment scope	Rate limiting conditions (triggered if any value is exceeded) The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).
Model name	Service deployment scope	Requests per minute (RPM)	Tokens per minute (TPM) Includes input and output tokens.
kimi-k2.5	Global	500	1,000,000

MiniMax

China (Beijing)

Model name	Service deployment scope	Rate limiting conditions (triggered if any value is exceeded) The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).
Model name	Service deployment scope	Requests per minute (RPM)	Tokens per minute (TPM) Includes input and output tokens.
MiniMax-M2.5	The Chinese mainland	500	1,000,000

GLM

US (Virginia)

Model name	Service deployment scope	Rate limiting conditions (triggered if any value is exceeded) The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).
Model name	Service deployment scope	Requests per minute (RPM)	Tokens per minute (TPM) Includes input and output tokens.
glm-5.2	Global	500	1,000,000
glm-5.1	Global	500	1,000,000

China (Beijing)

Model name	Service deployment scope	Rate limiting conditions (triggered if any value is exceeded) The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).
Model name	Service deployment scope	Requests per minute (RPM)	Tokens per minute (TPM) Includes input and output tokens.
glm-5.2	The Chinese mainland	500	1,000,000
glm-5.1	The Chinese mainland	500	1,000,000
glm-5	The Chinese mainland	500	1,000,000
glm-4.7	The Chinese mainland	500	1,000,000
glm-4.6	The Chinese mainland	60	1,000,000

Germany (Frankfurt)

Model name	Service deployment scope	Rate limiting conditions (triggered if any value is exceeded) The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).
Model name	Service deployment scope	Requests per minute (RPM)	Tokens per minute (TPM) Includes input and output tokens.
glm-5.2	Global	500	1,000,000
glm-5.1	Global	500	1,000,000

Singapore

Model name	Rate limiting conditions (triggered if any value is exceeded) The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).
Model name	Requests per minute (RPM)	Tokens per minute (TPM) Includes input and output tokens.
glm-5.1	500	1,000,000

Hong Kong (China)

Model name	Service deployment scope	Rate limiting conditions (triggered if any value is exceeded) The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).
Model name	Service deployment scope	Requests per minute (RPM)	Tokens per minute (TPM) Includes input and output tokens.
glm-5.2	Global	500	1,000,000

Japan (Tokyo)

Model name	Rate limiting conditions (triggered if any value is exceeded) The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).
Model name	Requests per minute (RPM)	Tokens per minute (TPM) Includes input and output tokens.
glm-5.1	500	1,000,000

Image generation

Qwen-Image

Singapore

Model name	Service deployment scope	Rate limiting conditions (triggered if any value is exceeded)
Model name	Service deployment scope	Task submission API call limit	Number of concurrent tasks (concurrency)
qwen-image-2.0-pro	International	2 times/minute	No limit for synchronous APIs
qwen-image-2.0-pro-2026-04-22	International	2 times/minute	No limit for synchronous APIs
qwen-image-2.0-pro-2026-03-03	International	2 times/minute	No limit for synchronous APIs
qwen-image-2.0	International	2 times/second	No limit for synchronous APIs
qwen-image-2.0-2026-03-03	International	2 times/second	No limit for synchronous APIs
qwen-image-max	International	2 times/minute	No limit for synchronous APIs
qwen-image-max-2025-12-30	International	2 times/minute	No limit for synchronous APIs
qwen-image-plus	International	2 times/second	No limit for synchronous APIs / 2 for asynchronous APIs
qwen-image-plus-2026-01-09	International	2 times/second	No limit for synchronous APIs
qwen-image	International	2 times/second	No limit for synchronous APIs / 2 for asynchronous APIs
qwen-image-edit-max	International	2 times/minute	No limit for synchronous APIs
qwen-image-edit-max-2026-01-16	International	2 times/minute	No limit for synchronous APIs
qwen-image-edit-plus	International	2 times/second	No limit for synchronous APIs
qwen-image-edit-plus-2025-12-15	International	2 times/second	No limit for synchronous APIs
qwen-image-edit-plus-2025-10-30	International	2 times/second	No limit for synchronous APIs
qwen-image-edit	International	2 times/second	No limit for synchronous APIs

China (Beijing)

Model name	Service deployment scope	Rate limiting conditions (triggered if any value is exceeded)
Model name	Service deployment scope	Task submission API call limit	Number of concurrent tasks (concurrency)
qwen-image-2.0-pro	The Chinese mainland	2 times/minute	No limit for synchronous APIs
qwen-image-2.0-pro-2026-04-22	The Chinese mainland	2 times/minute	No limit for synchronous APIs
qwen-image-2.0-pro-2026-03-03	The Chinese mainland	2 times/minute	No limit for synchronous APIs
qwen-image-2.0	The Chinese mainland	2 times/second	No limit for synchronous APIs
qwen-image-2.0-2026-03-03	The Chinese mainland	2 times/second	No limit for synchronous APIs
qwen-image-max	The Chinese mainland	2 times/minute	No limit for synchronous APIs
qwen-image-max-2025-12-30	The Chinese mainland	2 times/minute	No limit for synchronous APIs
qwen-image-plus	The Chinese mainland	2 times/second	No limit for synchronous APIs / 2 for asynchronous APIs
qwen-image-plus-2026-01-09	The Chinese mainland	2 times/second	No limit for synchronous APIs
qwen-image	The Chinese mainland	2 times/second	No limit for synchronous APIs / 2 for asynchronous APIs
qwen-image-edit-max	The Chinese mainland	2 times/minute	No limit for synchronous APIs
qwen-image-edit-max-2026-01-16	The Chinese mainland	2 times/minute	No limit for synchronous APIs
qwen-image-edit-plus	The Chinese mainland	2 times/second	No limit for synchronous APIs
qwen-image-edit-plus-2025-12-15	The Chinese mainland	2 times/second	No limit for synchronous APIs
qwen-image-edit-plus-2025-10-30	The Chinese mainland	2 times/second	No limit for synchronous APIs
qwen-image-edit	The Chinese mainland	2 times/second	No limit for synchronous APIs
qwen-mt-image	The Chinese mainland	1 time/second	2
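
For models with a concurrency limit on asynchronous tasks (for example, 2 for qwen-image-plus), a client can cap its in-flight tasks with a semaphore. This is a sketch under assumptions: `fake_image_task` stands in for a real submit-and-poll call, which is not shown, and `MAX_CONCURRENT` mirrors the table value.

```python
import asyncio

MAX_CONCURRENT = 2  # asynchronous concurrency listed for qwen-image-plus


async def fake_image_task(task_id: int, state: dict) -> int:
    """Placeholder for submitting an image task and waiting for its result."""
    state["running"] += 1
    state["peak"] = max(state["peak"], state["running"])
    await asyncio.sleep(0.01)  # simulated processing time
    state["running"] -= 1
    return task_id


async def run_all(n_tasks: int) -> dict:
    """Run n_tasks while keeping at most MAX_CONCURRENT in flight."""
    semaphore = asyncio.Semaphore(MAX_CONCURRENT)
    state = {"running": 0, "peak": 0, "done": []}

    async def guarded(task_id: int) -> None:
        async with semaphore:
            state["done"].append(await fake_image_task(task_id, state))

    await asyncio.gather(*(guarded(i) for i in range(n_tasks)))
    return state


result = asyncio.run(run_all(6))
print(f"completed {len(result['done'])} tasks, peak concurrency {result['peak']}")
```

The semaphore only bounds tasks started by this one process. Because limits are aggregated across the whole account, other clients that share the account still count toward the same concurrency.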

Text-to-image - Z-Image

Singapore

Model name	Service deployment scope	Rate limiting conditions (triggered if any value is exceeded)
Model name	Service deployment scope	RPS limit for task submission API	Number of concurrent tasks (concurrency)
z-image-turbo	International	2	No limit for synchronous APIs

China (Beijing)

Model name	Service deployment scope	Rate limiting conditions (triggered if any value is exceeded)
Model name	Service deployment scope	RPS limit for task submission API	Number of concurrent tasks (concurrency)
z-image-turbo	The Chinese mainland	2	No limit for synchronous APIs

Wanxiang

Singapore

Model name	Service deployment scope	Rate limiting conditions (triggered if any value is exceeded)
Model name	Service deployment scope	RPS limit for task submission API	Number of concurrent tasks (concurrency)
wan2.7-image-pro	International	5	5
wan2.7-image	International	5	5
wan2.6-image	International	5	5
wan2.6-t2i	International	5	5
wan2.5-t2i-preview	International	5	5
wan2.2-t2i-flash	International	2	2
wan2.2-t2i-plus	International	2	2
wan2.1-t2i-turbo	International	2	2
wan2.1-t2i-plus	International	2	2
wan2.5-i2i-preview	International	5	5

US (Virginia)

Model name	Service deployment scope	Rate limiting conditions (triggered if any value is exceeded)
Model name	Service deployment scope	RPS limit for task submission API	Number of concurrent tasks (concurrency)
wan2.6-t2i	Global	5	5
wan2.6-image	Global	5	5

China (Beijing)

Model name	Service deployment scope	Rate limiting conditions (triggered if any value is exceeded)
Model name	Service deployment scope	RPS limit for task submission API	Number of concurrent tasks (concurrency)
wan2.7-image-pro	The Chinese mainland	5	5
wan2.7-image	The Chinese mainland	5	5
wan2.6-image	The Chinese mainland	5	5
wan2.6-t2i	The Chinese mainland	1	5
wan2.5-t2i-preview	The Chinese mainland	5	5
wanx2.0-t2i-turbo	The Chinese mainland	2	2
wanx2.1-t2i-turbo	The Chinese mainland	2	2
wanx2.1-t2i-plus	The Chinese mainland	2	2
wan2.2-t2i-flash	The Chinese mainland	2	2
wan2.2-t2i-plus	The Chinese mainland	2	2
wan2.5-i2i-preview	The Chinese mainland	5	5
wanx2.1-imageedit	The Chinese mainland	2	2

Germany (Frankfurt)

Model name	Service deployment scope	Rate limiting conditions (triggered if any value is exceeded)
Model name	Service deployment scope	RPS limit for task submission API	Number of concurrent tasks (concurrency)
wan2.6-t2i	Global	5	5
wan2.6-image	Global	5	5

OutfitAnyone

China (Beijing)

Model name	Service deployment scope	Rate limiting conditions (triggered if any value is exceeded)
Model name	Service deployment scope	RPS limit for job submission API	Number of concurrent tasks
aitryon-plus	The Chinese mainland	10	5
aitryon-parsing-v1	The Chinese mainland	10	No limit for synchronous APIs

Video generation

HappyHorse series

Singapore

Model name	Service deployment scope	Rate limiting conditions (triggered if any value is exceeded)
Model name	Service deployment scope	RPS limit for task submission API	Number of concurrent tasks (concurrency)
happyhorse-1.1-t2v	International	10	5
happyhorse-1.1-i2v	International	10	5
happyhorse-1.1-r2v	International	10	5
happyhorse-1.0-t2v	International	10	5
happyhorse-1.0-i2v	International	10	5
happyhorse-1.0-r2v	International	10	5
happyhorse-1.0-video-edit	International	10	5

US (Virginia)

Model name	Service deployment scope	Rate limiting conditions (triggered if any value is exceeded)
Model name	Service deployment scope	RPS limit for task submission API	Number of concurrent tasks (concurrency)
happyhorse-1.0-t2v	Global	10	5
happyhorse-1.0-i2v	Global	10	5
happyhorse-1.0-r2v	Global	10	5
happyhorse-1.0-video-edit	Global	10	5

China (Beijing)

Model name	Service deployment scope	Rate limiting conditions (triggered if any value is exceeded)
Model name	Service deployment scope	RPS limit for task submission API	Number of concurrent tasks (concurrency)
happyhorse-1.1-t2v	The Chinese mainland	10	5
happyhorse-1.1-i2v	The Chinese mainland	10	5
happyhorse-1.1-r2v	The Chinese mainland	10	5
happyhorse-1.0-t2v	The Chinese mainland	10	5
happyhorse-1.0-i2v	The Chinese mainland	10	5
happyhorse-1.0-r2v	The Chinese mainland	10	5
happyhorse-1.0-video-edit	The Chinese mainland	10	5

Germany (Frankfurt)

Model name	Service deployment scope	Rate limiting conditions (triggered if any value is exceeded)
Model name	Service deployment scope	RPS limit for task submission API	Number of concurrent tasks (concurrency)
happyhorse-1.0-t2v	Global	10	5
happyhorse-1.0-i2v	Global	10	5
happyhorse-1.0-r2v	Global	10	5
happyhorse-1.0-video-edit	Global	10	5
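
When a model has both a submission RPS limit and a concurrency limit, sustained throughput is bounded by whichever is smaller: the RPS limit, or the concurrency divided by the time each task takes. The sketch below works through that arithmetic; the 60-second task duration is an assumed figure for illustration, and `sustained_tasks_per_second` is a hypothetical helper.

```python
def sustained_tasks_per_second(rps_limit: float, concurrency: int, task_seconds: float) -> float:
    """Upper bound on steady-state task throughput.

    With at most `concurrency` tasks in flight and each lasting `task_seconds`,
    no more than concurrency / task_seconds tasks can finish per second.
    """
    if task_seconds <= 0:
        raise ValueError("task_seconds must be positive")
    return min(rps_limit, concurrency / task_seconds)


# HappyHorse models above: RPS 10, concurrency 5; assume each video takes 60 s.
rate = sustained_tasks_per_second(10, 5, 60.0)
print(f"{rate * 60:.1f} tasks per minute")  # 5.0 tasks per minute
```

Under that assumption the concurrency limit, not the RPS limit, is what bounds throughput, so submitting faster than about five tasks per minute would only produce rejected submissions.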

Wanxiang series

Singapore

Model name	Service deployment scope	Rate limiting conditions (triggered if any value is exceeded)
Model name	Service deployment scope	RPS limit for task submission API	Number of concurrent tasks (concurrency)
wan2.7-t2v-2026-04-25	International	5	5
wan2.7-t2v	International	5	5
wan2.6-t2v	International	5	5
wan2.5-t2v-preview	International	5	5
wan2.2-t2v-plus	International	2	2
wan2.1-t2v-turbo	International	2	2
wan2.1-t2v-plus	International	2	2
wan2.7-i2v-2026-04-25	International	5	5
wan2.7-i2v	International	5	5
wan2.6-i2v-flash	International	5	5
wan2.6-i2v	International	5	5
wan2.5-i2v-preview	International	5	5
wan2.2-i2v-flash	International	2	2
wan2.1-i2v-plus	International	2	2
wan2.1-i2v-turbo	International	2	2
wan2.2-i2v-plus	International	2	2
wan2.2-kf2v-flash	International	2	2
wan2.1-kf2v-plus	International	1	2
wan2.1-vace-plus	International	2	2
wan2.7-videoedit	International	5	5
wan2.7-r2v	International	5	5
wan2.6-r2v-flash	International	5	5
wan2.6-r2v	International	5	5
wan2.2-animate-move	International	5	1
wan2.2-animate-mix	International	5	1

US (Virginia)

Model name	Service deployment scope	Rate limiting conditions (triggered if any value is exceeded)
Model name	Service deployment scope	RPS limit for task submission API	Number of concurrent tasks (concurrency)
wan2.6-t2v	Global	5	5
wan2.6-i2v	Global	5	5
wan2.6-r2v	Global	5	5
wan2.6-t2v-us	US	5	5
wan2.6-i2v-us	US	5	5

China (Beijing)

Model name	Service deployment scope	Rate limiting conditions (triggered if any value is exceeded)
Model name	Service deployment scope	RPS limit for task submission API	Number of concurrent tasks (concurrency)
wan2.7-t2v-2026-04-25	The Chinese mainland	5	5
wan2.7-t2v	The Chinese mainland	5	5
wan2.6-t2v	The Chinese mainland	5	5
wan2.5-t2v-preview	The Chinese mainland	5	5
wan2.2-t2v-plus	The Chinese mainland	2	2
wanx2.1-t2v-turbo	The Chinese mainland	2	2
wanx2.1-t2v-plus	The Chinese mainland	2	2
wan2.7-i2v-2026-04-25	The Chinese mainland	5	5
wan2.7-i2v	The Chinese mainland	5	5
wan2.6-i2v-flash	The Chinese mainland	5	5
wan2.6-i2v	The Chinese mainland	5	5
wan2.5-i2v-preview	The Chinese mainland	5	5
wan2.2-i2v-plus	The Chinese mainland	2	2
wanx2.1-i2v-turbo	The Chinese mainland	2	2
wanx2.1-i2v-plus	The Chinese mainland	2	2
wan2.2-kf2v-flash	The Chinese mainland	2	2
wanx2.1-kf2v-plus	The Chinese mainland	2	2
wanx2.1-vace-plus	The Chinese mainland	2	2
wan2.7-videoedit	The Chinese mainland	5	5
wan2.7-r2v	The Chinese mainland	5	5
wan2.6-r2v-flash	The Chinese mainland	5	5
wan2.6-r2v	The Chinese mainland	5	5
wan2.2-s2v-detect	The Chinese mainland	5	No limit for synchronous APIs
wan2.2-s2v	The Chinese mainland	5	1
wan2.2-animate-move	The Chinese mainland	5	1
wan2.2-animate-mix	The Chinese mainland	5	1

Germany (Frankfurt)

Model name	Service deployment scope	Rate limiting conditions (triggered if any value is exceeded)
Model name	Service deployment scope	RPS limit for task submission API	Number of concurrent tasks (concurrency)
wan2.6-t2v	Global	5	5
wan2.6-i2v	Global	5	5
wan2.6-r2v	Global	5	5

AnimateAnyone

China (Beijing)

Model name	Service deployment scope	RPS limit for task submission API	Number of concurrent tasks
animate-anyone-detect-gen2	The Chinese mainland	5	No limit for synchronous APIs
animate-anyone-template-gen2	The Chinese mainland	5	1 Only one job runs at a time. Other jobs in the queue are in a waiting state.
animate-anyone-gen2	The Chinese mainland	5	1 Only one job runs at a time. Other jobs in the queue are in a waiting state.
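
With a concurrency of 1, a submitted job may wait for every job ahead of it. The sketch below estimates start times under the assumption that waiting jobs are served in submission order, which this page does not state. `estimated_start_times` is a hypothetical helper and the durations are made-up inputs, not measured values.

```python
import heapq


def estimated_start_times(durations, concurrency=1):
    """Estimate when each queued job starts, assuming first-in, first-out service."""
    free_at = [0.0] * concurrency      # time at which each running slot becomes free
    heapq.heapify(free_at)
    starts = []
    for duration in durations:
        start = heapq.heappop(free_at)  # earliest free slot takes the next job
        starts.append(start)
        heapq.heappush(free_at, start + duration)
    return starts


# Three jobs of 30, 45, and 20 seconds with one running slot.
print(estimated_start_times([30, 45, 20]))  # [0.0, 30.0, 75.0]
```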

EMO

China (Beijing)

Model name	Service deployment scope	RPS limit for task submission API	Number of concurrent tasks
emo-detect-v1	The Chinese mainland		No limit for synchronous APIs
emo-v1	The Chinese mainland		Only one job runs at a time. Other jobs in the queue are in a waiting state.

LivePortrait

China (Beijing)

Model name	Service deployment scope	RPS limit for task submission API	Number of concurrent tasks
liveportrait-detect	The Chinese mainland		No limit for synchronous APIs
liveportrait	The Chinese mainland		Only one job runs at a time. Other jobs in the queue are in a waiting state.

VideoRetalk

China (Beijing)

Model name	Service deployment scope	RPS limit for task submission API	Number of concurrent tasks
videoretalk	The Chinese mainland		Only one job runs at a time. Other jobs in the queue are in a waiting state.

Emoji

China (Beijing)

Model name	Service deployment scope	RPS limit for task submission API	Number of concurrent tasks
emoji-detect-v1	The Chinese mainland		No limit for synchronous APIs
emoji-v1	The Chinese mainland		Only one job runs at a time. Other jobs in the queue are in a waiting state.

Video style transform

China (Beijing)

Model name	Service deployment scope	RPS limit for task submission API	Number of concurrent tasks
video-style-transform	The Chinese mainland		Only one job runs at a time. Other jobs in the queue are in a waiting state.

Music generation

China (Beijing)

Model name	Service deployment scope	Requests per minute (RPM)
fun-music-preview	The Chinese mainland	180
fun-music-v1	The Chinese mainland	180

Speech synthesis (text-to-speech)

Qwen speech synthesis

Singapore

Qwen3-TTS-Instruct-Flash

Model name	Service deployment scope	Requests per minute (RPM)
qwen3-tts-instruct-flash	International	180
qwen3-tts-instruct-flash-2026-01-26	International	180

Qwen3-TTS-VD

Model name	Service deployment scope	Requests per minute (RPM)
qwen3-tts-vd-2026-01-26	International	180

Qwen3-TTS-VC

Model name	Service deployment scope	Requests per minute (RPM)
qwen3-tts-vc-2026-01-22	International	180

Qwen3-TTS-Flash

Model name	Service deployment scope	Requests per minute (RPM)
qwen3-tts-flash	International	180
qwen3-tts-flash-2025-11-27	International	180
qwen3-tts-flash-2025-09-18	International	10

China (Beijing)

Qwen3-TTS-Instruct-Flash

Model name	Service deployment scope	Requests per minute (RPM)
qwen3-tts-instruct-flash	Mainland China	180
qwen3-tts-instruct-flash-2026-01-26	Mainland China	180

Qwen3-TTS-VD

Model name	Service deployment scope	Requests per minute (RPM)
qwen3-tts-vd-2026-01-26	Mainland China	180

Qwen3-TTS-VC

Model name	Service deployment scope	Requests per minute (RPM)
qwen3-tts-vc-2026-01-22	Mainland China	180

Qwen3-TTS-Flash

Model name	Service deployment scope	Requests per minute (RPM)
qwen3-tts-flash	Mainland China	180
qwen3-tts-flash-2025-11-27	Mainland China	180
qwen3-tts-flash-2025-09-18	Mainland China	10

Qwen-TTS

Model name	Service deployment scope	Rate limiting conditions (rate limiting is triggered when any value is exceeded) The following are per-minute rate limiting conditions. The service may also enforce RPS (RPM/60) and TPS (TPM/60) limits
		Requests per minute (RPM)	Tokens consumed per minute (TPM) Including input and output tokens
qwen-tts	Mainland China	10	100,000
qwen-tts-latest	Mainland China
qwen-tts-2025-05-22	Mainland China
qwen-tts-2025-04-10	Mainland China
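
The per-second limits mentioned in the table header follow directly from the per-minute values. A small worked example for the Qwen-TTS row (RPM 10, TPM 100,000) is sketched below. `per_second_limits` is a hypothetical helper, and whether the service enforces the per-second values for a given model is not stated here.

```python
def per_second_limits(rpm: int, tpm: int):
    """Derive RPS = RPM/60 and TPS = TPM/60 from per-minute limits."""
    return rpm / 60, tpm / 60


rpm, tpm = 10, 100_000
rps, tps = per_second_limits(rpm, tpm)

# Minimum gap between requests if the per-second limit is enforced strictly.
min_gap_seconds = 60 / rpm

print(f"RPS: {rps:.4f}")                 # about 0.1667 requests per second
print(f"TPS: {tps:.2f}")                 # about 1666.67 tokens per second
print(f"Gap: {min_gap_seconds} seconds")  # 6.0 seconds between requests
```

Under these numbers, sending all 10 requests in the first second of a minute stays within the RPM limit but can still exceed the derived RPS value.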

Qwen real-time speech synthesis

Singapore

Qwen3-TTS-Instruct-Flash-Realtime

Model name	Service deployment scope	Requests per minute (RPM)
qwen3-tts-instruct-flash-realtime	International	180
qwen3-tts-instruct-flash-realtime-2026-01-22	International	180

Qwen3-TTS-VD-Realtime

Model name	Service deployment scope	Requests per minute (RPM)
qwen3-tts-vd-realtime-2026-01-15	International	180
qwen3-tts-vd-realtime-2025-12-16	International	180

Qwen3-TTS-VC-Realtime

Model name	Service deployment scope	Requests per minute (RPM)
qwen3-tts-vc-realtime-2026-01-15	International	180
qwen3-tts-vc-realtime-2025-11-27	International	180

Qwen3-TTS-Flash-Realtime

Model name	Service deployment scope	Requests per minute (RPM)
qwen3-tts-flash-realtime	International	180
qwen3-tts-flash-realtime-2025-11-27	International	180
qwen3-tts-flash-realtime-2025-09-18	International	10

China (Beijing)

Qwen3-TTS-Instruct-Flash-Realtime

Model name	Service deployment scope	Requests per minute (RPM)
qwen3-tts-instruct-flash-realtime	Mainland China	180
qwen3-tts-instruct-flash-realtime-2026-01-22	Mainland China	180

Qwen3-TTS-VD-Realtime

Model name	Service deployment scope	Requests per minute (RPM)
qwen3-tts-vd-realtime-2026-01-15	Mainland China	180
qwen3-tts-vd-realtime-2025-12-16	Mainland China	180

Qwen3-TTS-VC-Realtime

Model name	Service deployment scope	Requests per minute (RPM)
qwen3-tts-vc-realtime-2026-01-15	Mainland China	180
qwen3-tts-vc-realtime-2025-11-27	Mainland China	180

Qwen3-TTS-Flash-Realtime

Model name	Service deployment scope	Requests per minute (RPM)
qwen3-tts-flash-realtime	Mainland China	180
qwen3-tts-flash-realtime-2025-11-27	Mainland China	180
qwen3-tts-flash-realtime-2025-09-18	Mainland China	10

Qwen-TTS-Realtime

Model name	Service deployment scope	Rate limiting conditions (rate limiting is triggered when any value is exceeded) The following are per-minute rate limiting conditions. The service may also enforce RPS (RPM/60) and TPS (TPM/60) limits
		Requests per minute (RPM)	Tokens consumed per minute (TPM) Including input and output tokens
qwen-tts-realtime	Mainland China	10	100,000
qwen-tts-realtime-latest	Mainland China
qwen-tts-realtime-2025-07-15	Mainland China

Qwen voice cloning

Singapore

Model name	Service deployment scope	Requests per minute (RPM)
qwen-voice-enrollment	International	180

China (Beijing)

Model name	Service deployment scope	Requests per minute (RPM)
qwen-voice-enrollment	Mainland China	180

Qwen voice design

Singapore

Model name	Service deployment scope	Requests per minute (RPM)
qwen-voice-design	International	180

China (Beijing)

Model name	Service deployment scope	Requests per minute (RPM)
qwen-voice-design	Mainland China	180

CosyVoice speech synthesis

Singapore

Model name	Service deployment scope	Job submission API RPS limit
cosyvoice-v3-plus	International	3
cosyvoice-v3-flash	International	3

China (Beijing)

Model name	Service deployment scope	Job submission API RPS limit
cosyvoice-v3.5-plus	Mainland China	3
cosyvoice-v3.5-flash	Mainland China
cosyvoice-v3-plus	Mainland China
cosyvoice-v3-flash	Mainland China
cosyvoice-v2	Mainland China

CosyVoice voice cloning/design

CosyVoice voice cloning and voice design use a single model and share one rate limit quota.

Singapore

Model name	Service deployment scope	Job submission API RPS limit
voice-enrollment	International	10

China (Beijing)

Model name	Service deployment scope	Job submission API RPS limit
voice-enrollment	Mainland China	10

Speech recognition (speech-to-text) and translation (speech to text in a specified language)

Qwen3-LiveTranslate-Flash

Singapore

Model name	Service deployment scope	Rate limiting conditions (rate limiting is triggered when any value is exceeded) The following are per-minute rate limiting conditions. The service may also enforce RPS (RPM/60) and TPS (TPM/60) limits
Model name	Service deployment scope	Requests per minute (RPM)	Tokens consumed per minute (TPM) Including input and output tokens
qwen3-livetranslate-flash	International	100	100,000
qwen3-livetranslate-flash-2025-12-01	International	6,000	1,000,000

China (Beijing)

Model name	Service deployment scope	Rate limiting conditions (rate limiting is triggered when any value is exceeded) The following are per-minute rate limiting conditions. The service may also enforce RPS (RPM/60) and TPS (TPM/60) limits
		Requests per minute (RPM)	Tokens consumed per minute (TPM) Including input and output tokens
qwen3-livetranslate-flash	Mainland China	100	100,000
qwen3-livetranslate-flash-2025-12-01	Mainland China

Qwen-LiveTranslate-Flash-Realtime

Singapore

Model name	Service deployment scope	Rate limiting conditions (rate limiting is triggered when any value is exceeded) The following are per-minute rate limiting conditions. The service may also enforce RPS (RPM/60) and TPS (TPM/60) limits
		Requests per minute (RPM)	Tokens consumed per minute (TPM) Including input and output tokens
qwen3.5-livetranslate-flash-realtime	International	10	100,000
qwen3.5-livetranslate-flash-realtime-2026-05-19	International
qwen3-livetranslate-flash-realtime	International
qwen3-livetranslate-flash-realtime-2025-09-22	International

China (Beijing)

Model name	Service deployment scope	Rate limiting conditions (rate limiting is triggered when any value is exceeded) The following are per-minute rate limiting conditions. The service may also enforce RPS (RPM/60) and TPS (TPM/60) limits
		Requests per minute (RPM)	Tokens consumed per minute (TPM) Including input and output tokens
qwen3.5-livetranslate-flash-realtime	Mainland China	10	100,000
qwen3.5-livetranslate-flash-realtime-2026-05-19	Mainland China
qwen3-livetranslate-flash-realtime	Mainland China
qwen3-livetranslate-flash-realtime-2025-09-22	Mainland China

Qwen audio file recognition

Singapore

Qwen3-ASR-Flash-Filetrans

Model name	Service deployment scope	Requests per minute (RPM)
qwen3-asr-flash-filetrans	International	100
qwen3-asr-flash-filetrans-2025-11-17	International	100

Qwen3-ASR-Flash

Model name	Service deployment scope	Requests per minute (RPM)
qwen3-asr-flash	International	100
qwen3-asr-flash-2026-02-10	International
qwen3-asr-flash-2025-09-08	International

US (Virginia)

Model name	Service deployment scope	Requests per minute (RPM)
qwen3-asr-flash-us	US	100
qwen3-asr-flash-2025-09-08-us	US	100

China (Beijing)

Qwen3-ASR-Flash-Filetrans

Model name	Service deployment scope	Requests per minute (RPM)
qwen3-asr-flash-filetrans	Mainland China	100
qwen3-asr-flash-filetrans-2025-11-17	Mainland China	100

Qwen3-ASR-Flash

Model name	Service deployment scope	Requests per minute (RPM)
qwen3-asr-flash	Mainland China	100
qwen3-asr-flash-2026-02-10	Mainland China
qwen3-asr-flash-2025-09-08	Mainland China

Qwen real-time speech recognition

Singapore

Model name	Service deployment scope	Requests per second (RPS)
qwen3-asr-flash-realtime	International	20
qwen3-asr-flash-realtime-2026-02-10	International
qwen3-asr-flash-realtime-2025-10-27	International

China (Beijing)

Model name	Service deployment scope	Requests per second (RPS)
qwen3-asr-flash-realtime	Mainland China	20
qwen3-asr-flash-realtime-2026-02-10	Mainland China
qwen3-asr-flash-realtime-2025-10-27	Mainland China

Paraformer speech recognition

China (Beijing)

Model name	Service deployment scope	Job submission API RPS limit
paraformer-realtime-v2	Mainland China	20
paraformer-realtime-8k-v2	Mainland China	20

Model name	Service deployment scope	Requests per minute (RPM)
paraformer-v2	Mainland China	1,200

Model name	Service deployment scope	Job submission API RPS limit	Number of tasks being processed simultaneously (concurrency)
paraformer-8k-v2	Mainland China	20	100

Fun-ASR audio file recognition

Singapore

Model name	Service deployment scope	Requests per minute (RPM)
fun-asr	International	600
fun-asr-2025-11-07	International	600
fun-asr-2025-08-25	International	600
fun-asr-mtl	International	100
fun-asr-mtl-2025-08-25	International	100
fun-asr-flash-2026-06-15	International	600

China (Beijing)

Model name	Service deployment scope	Requests per minute (RPM)
fun-asr	Mainland China	600
fun-asr-2025-11-07	Mainland China
fun-asr-2025-08-25	Mainland China
fun-asr-mtl	Mainland China
fun-asr-mtl-2025-08-25	Mainland China
fun-asr-flash-2026-06-15	Mainland China

Fun-ASR real-time speech recognition

Singapore

Model name	Service deployment scope	Job submission API RPS limit
fun-asr-realtime	International	20
fun-asr-realtime-2025-11-07	International	20

China (Beijing)

Model name	Service deployment scope	Job submission API RPS limit
fun-asr-realtime	Mainland China	20
fun-asr-realtime-2026-02-28	Mainland China
fun-asr-realtime-2025-11-07	Mainland China
fun-asr-realtime-2025-09-15	Mainland China
fun-asr-flash-8k-realtime	Mainland China
fun-asr-flash-8k-realtime-2026-01-28	Mainland China

Text embedding

Singapore

Model name	Service deployment scope	Rate limiting conditions (triggered if any value is exceeded) The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).
Model name	Service deployment scope	Requests per minute (RPM)	Tokens per minute (TPM)/Number of jobs Includes input and output tokens.
text-embedding-v4	International	1,800	1,000,000
text-embedding-v3	International	6,000	24,000,000
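
When a large set of texts must be embedded, both the RPM and the TPM condition apply, and whichever is reached first ends the budget for that minute. The sketch below groups requests into per-minute batches under the assumption that limits are counted over fixed one-minute windows, which this page does not specify. `schedule_by_minute` is a hypothetical helper, and the token counts are illustrative inputs that a real client would obtain from a tokenizer or from usage data.

```python
def schedule_by_minute(token_counts, rpm, tpm):
    """Group request indexes into minutes so that neither RPM nor TPM is exceeded."""
    minutes = [[]]
    used_tokens = 0
    for index, tokens in enumerate(token_counts):
        if tokens > tpm:
            raise ValueError(f"request {index} alone exceeds the TPM limit")
        # Start a new minute when either limit would be exceeded.
        if len(minutes[-1]) >= rpm or used_tokens + tokens > tpm:
            minutes.append([])
            used_tokens = 0
        minutes[-1].append(index)
        used_tokens += tokens
    return minutes


# text-embedding-v4 (Singapore): RPM 1,800 and TPM 1,000,000.
plan = schedule_by_minute([400_000, 400_000, 400_000], rpm=1_800, tpm=1_000_000)
print(plan)  # [[0, 1], [2]]
```

Here the third request moves to the next minute because of TPM, even though only two of the 1,800 allowed requests were used.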

China (Beijing)

Model name	Service deployment scope	Rate limiting conditions (triggered if any value is exceeded)
Model name	Service deployment scope	Requests per second (RPS)	Tokens per minute (TPM)/Number of jobs Includes input and output tokens.
text-embedding-v4 Rate limiting does not apply to service calls made using the Batch API.	The Chinese mainland		1,200,000

Hong Kong (China)

Model name	Service deployment scope	Rate limiting conditions (triggered if any value is exceeded) The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).
Model name	Service deployment scope	Requests per minute (RPM)	Tokens per minute (TPM)/Number of jobs Includes input and output tokens.
text-embedding-v4	Hong Kong (China)	1,800	1,000,000

Multimodal embedding

Singapore

Model name	Service deployment scope	Rate limiting conditions The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).
Model name	Service deployment scope	Requests per minute (RPM)	Tokens per minute (TPM) Input tokens only.
tongyi-embedding-vision-plus	International	600	200,000
tongyi-embedding-vision-flash	International	600	200,000

China (Beijing)

Model name	Service deployment scope	Rate limiting conditions The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).
Model name	Service deployment scope	Requests per minute (RPM)	Tokens per minute (TPM) Input tokens only.
qwen3-vl-embedding	The Chinese mainland	2,400	1,200,000
multimodal-embedding-v1	The Chinese mainland	120	100,000

Sorting model

Singapore

Model name	Service deployment scope	Rate limiting conditions The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).
Model name	Service deployment scope	Requests per minute (RPM)	Tokens per minute (TPM) Input tokens only.
qwen3-rerank	International	5,400	5,000,000,000

China (Beijing)

Model name	Service deployment scope	Rate limiting conditions The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).
Model name	Service deployment scope	Requests per minute (RPM)	Tokens per minute (TPM) Input tokens only.
qwen3-vl-rerank	The Chinese mainland	600	9,000,000
gte-rerank-v2	The Chinese mainland	5,040	4,980,000,000

Industry

Intention recognition

China (Beijing)

Model name	Service deployment scope	Rate limiting conditions (triggered if any value is exceeded) The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).
Model name	Service deployment scope	Requests per minute (RPM)	Tokens per minute (TPM) Includes input and output tokens.
tongyi-intent-detect-v3	The Chinese mainland	1,200	1,000,000

Role assumption

Singapore

Model name	Service deployment scope	Rate limiting conditions (triggered if any value is exceeded) The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).
Model name	Service deployment scope	Requests per minute (RPM)	Tokens per minute (TPM) Includes input and output tokens.
qwen-plus-character	International	120	500,000
qwen-flash-character	International	120	500,000
qwen-plus-character-ja	International	120	500,000

China (Beijing)

Model name	Service deployment scope	Rate limiting conditions (triggered if any value is exceeded) The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).
Model name	Service deployment scope	Requests per minute (RPM)	Tokens per minute (TPM) Includes input and output tokens.
qwen-plus-character	The Chinese mainland	120	500,000
qwen-flash-character	The Chinese mainland	120	500,000

Offline models

For more information, see Model unpublishing policy.

Offline on January 30, 2026

Category	Model name	Rate limiting conditions (triggered if any value is exceeded)
		Requests per minute (RPM)	Tokens per minute (TPM) Includes input and output tokens.
Qwen-Plus	qwen-plus-2024-11-27	0	0
	qwen-plus-2024-11-25
	qwen-plus-2024-09-19
	qwen-plus-2024-08-06
Qwen-Turbo	qwen-turbo-2024-09-19
Qwen-VL	qwen-vl-max-2024-10-30
	qwen-vl-max-2024-08-09
	qwen-vl-plus-2024-08-09

Offline on August 20, 2025

Category	Model name	Rate limiting conditions (triggered if any value is exceeded)
		Requests per minute (RPM)	Tokens per minute (TPM) Includes input and output tokens.
Text generation - Qwen	qwen2-72b-instruct	0	0
	qwen2-57b-a14b-instruct
	qwen2-7b-instruct
	qwen1.5-110b-chat
	qwen1.5-72b-chat
	qwen1.5-32b-chat
	qwen1.5-14b-chat
	qwen1.5-7b-chat