
Alibaba Cloud Model Studio: Rate limiting

Last Updated: Jun 25, 2026

Alibaba Cloud Model Studio applies rate limits to model calls at the Alibaba Cloud account level: usage from all RAM users, workspaces, and API keys under the account counts toward the same limit. Requests that exceed a limit are rejected, and calls typically succeed again within one minute.

Rate limiting rules

  • Account-level rate limiting: Rate limits are applied at the root account level. The usage of all RAM users, workspaces, and API keys under the account is combined.

  • Model-specific rate limiting: Each model has its own rate limit. For more information, see the tables below.

FAQ

Why is rate limiting triggered?

You can identify which rate limit was triggered from the error message:

  • "Requests rate limit exceeded" or "You exceeded your current requests list": the requests per minute (RPM) limit was triggered.

  • "Allocated quota exceeded" or "You exceeded your current quota": the tokens per minute (TPM) limit was triggered.

  • "Request rate increased too quickly": the request rate surged within a short period, which triggered system stability protection. This can occur even if the total number of calls has not reached the RPM or TPM limit.

  • For other errors, see Error codes to confirm the cause.
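If you want your application to log which limit was hit, a small helper can map the error messages listed above to a limit type. This is a minimal sketch: the function name and the returned labels ("RPM", "TPM", "burst", "other") are illustrative choices, not part of any Model Studio SDK, and it only matches the message substrings quoted in this FAQ.

```python
def classify_rate_limit(message: str) -> str:
    """Map a rate-limit error message to the limit that was triggered.

    The substrings come from the error messages listed in this FAQ;
    the return labels are hypothetical names chosen for this sketch.
    """
    text = message.lower()
    if ("requests rate limit exceeded" in text
            or "exceeded your current requests list" in text):
        return "RPM"
    if "allocated quota exceeded" in text or "exceeded your current quota" in text:
        return "TPM"
    if "request rate increased too quickly" in text:
        return "burst"  # Stability protection against a sudden surge.
    return "other"  # Check the Error codes document for anything else.


print(classify_rate_limit("Allocated quota exceeded"))  # prints TPM
```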

In addition to the per-minute RPM and TPM limits, the service may enforce per-second limits: requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60). As a result, a short burst of requests can trigger rate limiting even when the total for the minute stays below the limit.
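As a worked example of the RPM/60 and TPM/60 arithmetic, the sketch below derives per-second budgets from a model's per-minute limits. It assumes the per-second check uses exactly these derived values, which the text above only says "may" be enforced; the 600 RPM and 1,500,000 TPM figures are the qwen-plus values from the Singapore table.

```python
def per_second_limits(rpm: int, tpm: int) -> tuple[float, float]:
    """Return (RPS, TPS) derived from per-minute limits as RPM/60 and TPM/60."""
    return rpm / 60, tpm / 60


def burst_exceeds(requests_in_one_second: int, rpm: int) -> bool:
    """True if a one-second burst is larger than the derived RPS budget."""
    rps, _ = per_second_limits(rpm, 0)
    return requests_in_one_second > rps


# qwen-plus in Singapore: 600 RPM, 1,500,000 TPM.
rps, tps = per_second_limits(600, 1_500_000)
print(rps, tps)                # 10.0 25000.0
print(burst_exceeds(20, 600))  # True: 20 requests in one second > 10 RPS
```

Twenty requests are far below the 600 RPM limit, yet sending them within a single second is twice the derived RPS budget, which is why bursts can be throttled.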

How to view model usage

One hour after you call a model, go to the Monitoring (Singapore or Beijing) page. Set the query conditions, such as the time range and workspace. Then, in the Models area, find the target model and click Monitor in the Actions column to view the model's call statistics. For more information, see the Monitoring document.

Data is updated hourly. During peak periods, there may be an hour-level latency.


How long does it take to recover from rate limiting?

Recovery usually occurs within one minute. If other errors occur, see Error codes for troubleshooting.
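Because recovery usually takes less than a minute, a common client-side pattern is to retry rate-limited calls with exponential backoff. The sketch below is an illustration under stated assumptions, not a Model Studio feature: `call_with_backoff` and its parameters are hypothetical names, and the caller decides which exception types count as rate limiting (for example, `openai.RateLimitError`, which the OpenAI Python SDK raises for HTTP 429).

```python
import asyncio
import random


async def call_with_backoff(make_call, retry_on, max_attempts=6,
                            base_delay=1.0, max_delay=60.0):
    """Await make_call(), retrying with exponential backoff and full jitter.

    make_call: zero-argument function that returns a new awaitable per attempt.
    retry_on:  tuple of exception types treated as rate limiting.
    The delay cap grows as base_delay * 2**attempt and never exceeds max_delay,
    so no single wait is longer than the typical one-minute recovery time.
    """
    for attempt in range(max_attempts):
        try:
            return await make_call()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the last error to the caller.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: a random wait in [0, delay] spreads out retries
            # from many concurrent callers instead of synchronizing them.
            await asyncio.sleep(random.uniform(0, delay))
```

With an `AsyncOpenAI` client you might call `await call_with_backoff(lambda: client.chat.completions.create(model=..., messages=...), retry_on=(RateLimitError,))`. The OpenAI SDK also retries some failures on its own (the `max_retries` client option), so decide deliberately whether to combine both retry layers.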

How to avoid rate limiting

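  1. Choose models with higher rate limits: Stable and latest versions generally have higher rate limits than dated snapshot versions. For example, in the Singapore region, qwen-plus allows 600 RPM and 1,500,000 TPM, while qwen-plus-2025-07-28 allows 60 RPM and 100,000 TPM.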

  2. Optimize your call strategy

    • Reduce call frequency: If you receive a "Requests rate limit exceeded" or "You exceeded your current requests list" error, lower your API call frequency.

    • Reduce token consumption: If you receive an "Allocated quota exceeded" or "You exceeded your current quota" error, shorten the input or limit the output length.

    • Smooth the request rate: If you receive a "Request rate increased too quickly" error, use uniform scheduling, exponential backoff, or a request queue to spread requests evenly and avoid sudden peaks.

  3. Add a backup model

    Because each model has its own rate limit, you can switch to a backup model when the primary model is rate limited and continue generating. This reduces the probability of failure and increases overall throughput. The following code retries a request with qwen-plus-2025-07-14 when qwen-plus-2025-07-28 returns a rate limit error (HTTP 429).

    Sample code

    import os
    import asyncio
    from openai import AsyncOpenAI, APIStatusError, RateLimitError
    
    # Configuration
    API_KEY = os.getenv("DASHSCOPE_API_KEY")
    # Primary model
    MODEL = "qwen-plus-2025-07-28"
    # Backup model
    BACKUP_MODEL = "qwen-plus-2025-07-14"
    # Test question
    QUESTION = "Who are you?"
    # Number of concurrent requests
    NUM_REQUESTS = 10
    
    client = AsyncOpenAI(
        api_key=API_KEY,
        # Replace {WorkspaceId} with your actual workspace ID.
        base_url="https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/compatible-mode/v1",
        # Disable the SDK's built-in retries so that a 429 response
        # switches to the backup model immediately.
        max_retries=0,
    )
    
    async def send_request(model):
        """Sends a single request. Returns "ok", "rate_limited", or "error"."""
        try:
            await client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": QUESTION}],
            )
            return "ok"
        except RateLimitError:
            # HTTP 429: the rate limit for this model was triggered.
            print(f"[Rate limit triggered] Model {model}")
            return "rate_limited"
        except APIStatusError as e:
            # Other HTTP errors: see Error codes for the cause.
            print(f"[Request failed] Model {model}, HTTP {e.status_code}: {e}")
            return "error"
        except Exception as e:
            # Network errors, timeouts, and other unexpected failures.
            print(f"[Request failed] Model {model}, Error: {e}")
            return "error"
    
    async def task():
        # Try the primary model first.
        result = await send_request(MODEL)
        if result == "rate_limited":
            # The primary model is rate limited: try the backup model.
            result = await send_request(BACKUP_MODEL)
        return result == "ok"
    
    async def main():
        results = await asyncio.gather(*(task() for _ in range(NUM_REQUESTS)))
        succeeded = sum(results)
        print(f"Successful requests: {succeeded}, Failed requests: {len(results) - succeeded}")
    
    if __name__ == "__main__":
        asyncio.run(main())

  4. Split tasks: Long conversations and large documents can consume many tokens quickly. Split large batch tasks into smaller batches and submit them at different times.

  5. Use batch inference: For tasks that do not require real-time responses, you can use the Batch API. Batch requests are not subject to real-time rate limits, but you must consider queuing and processing time.

  6. Increase rate limits: If the default rate limits are insufficient, you can increase the temporary TPM quota for a model on the Increase Rate Limits page in the Model Studio console. The increase takes effect immediately. For more information, see Increase temporary rate limits.
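Reducing call frequency, smoothing the request rate, and splitting tasks can be combined in a client-side limiter that holds each request until it fits within a request budget and a token budget over a sliding window. This is a minimal sketch, assuming you can estimate each request's tokens in advance (for example, input length plus the maximum output length); `SlidingWindowLimiter` and `acquire` are hypothetical names, and the server-side limits remain authoritative.

```python
import asyncio
import time
from collections import deque


class SlidingWindowLimiter:
    """Client-side sliding-window limiter for a request budget and a token budget.

    Sketch only: token counts are caller-supplied estimates, and the
    server-side limits stay authoritative.
    """

    def __init__(self, max_requests: int, max_tokens: int, window: float = 60.0):
        if max_requests <= 0 or max_tokens <= 0:
            raise ValueError("budgets must be positive")
        self.max_requests = max_requests
        self.max_tokens = max_tokens
        self.window = window
        self.events = deque()  # (timestamp, tokens) for each admitted request
        self.tokens_in_window = 0
        self.lock = asyncio.Lock()  # Admits waiting callers one at a time.

    def _evict(self, now: float) -> None:
        # Forget requests that are older than the window.
        while self.events and now - self.events[0][0] >= self.window:
            _, tokens = self.events.popleft()
            self.tokens_in_window -= tokens

    async def acquire(self, estimated_tokens: int) -> None:
        """Wait until one more request of estimated_tokens fits in both budgets."""
        if estimated_tokens > self.max_tokens:
            # Such a request can never fit: split the task instead.
            raise ValueError("request exceeds the whole token budget; split it")
        async with self.lock:
            while True:
                now = time.monotonic()
                self._evict(now)
                if (len(self.events) < self.max_requests
                        and self.tokens_in_window + estimated_tokens <= self.max_tokens):
                    self.events.append((now, estimated_tokens))
                    self.tokens_in_window += estimated_tokens
                    return
                # Sleep until the oldest admitted request leaves the window.
                await asyncio.sleep(self.window - (now - self.events[0][0]))
```

For qwen-plus in Singapore (600 RPM and 1,500,000 TPM in the table), you might create `SlidingWindowLimiter(600, 1_500_000)` and `await limiter.acquire(estimate)` before each call. A second instance with `window=1.0` and the derived per-second values (10 requests, 25,000 tokens) approximates RPS/TPS smoothing.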

How to control token usage or costs

Rate limiting only restricts the request rate per unit of time; it does not cap cumulative usage. To control token usage or costs, use the following methods:

  • Set a spending limit and cost alerts: On the Billing card, configure Cost alerts to enable a monthly spending limit and threshold notifications. You are notified when the threshold is reached, which helps you avoid overspending. For more information, see Query bills and manage costs.

  • Enable stop when the free quota is used up: For models that offer a free quota, you can enable stop when the free quota is used up so that calls stop automatically once the free quota is exhausted, which prevents additional charges. For more information, see Free quota.

  • Monitor model usage: Regularly check the token usage of each model to detect abnormal growth in time. See How to view model usage above.
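Between console checks, an application can also keep its own running total. The sketch below is a hypothetical client-side guard, assuming you feed it `response.usage.total_tokens` from each OpenAI-compatible response; it does not replace Billing cost alerts or the free-quota stop setting, which are enforced by the platform.

```python
class TokenBudget:
    """Hypothetical client-side guard that tracks tokens spent against a budget."""

    def __init__(self, max_tokens: int):
        self.max_tokens = max_tokens
        self.used = 0

    def record(self, total_tokens: int) -> None:
        """Add the tokens reported by one response (e.g. usage.total_tokens)."""
        self.used += total_tokens

    def remaining(self) -> int:
        return max(0, self.max_tokens - self.used)

    def allows(self, estimated_tokens: int) -> bool:
        """True if a request of estimated_tokens still fits in the budget."""
        return self.used + estimated_tokens <= self.max_tokens


budget = TokenBudget(max_tokens=10_000)
budget.record(4_000)  # e.g. response.usage.total_tokens from one call
print(budget.remaining(), budget.allows(7_000))  # 6000 False
```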

Increase temporary rate limits

If the default rate limits are insufficient, you can increase a model's temporary TPM quota in the Model Studio console. The increase takes effect immediately and is valid for 30 days. After it expires, the quota automatically reverts to the system default.

This feature is currently available in the China (Beijing) and Singapore regions.

  1. Log on to the Model Studio console and go to the Increase Rate Limits page.

  2. Click Increase Temporary Model Rate Limit in the upper-right corner.

  3. In the dialog box that appears, select a Model and enter the desired value for Token Account Limit (Tokens/60s). The dialog box displays the current quota and the maximum configurable limit.

  4. Click OK. The increased quota takes effect immediately.

After the quota increase takes effect, you can confirm it in the following ways:

  • On the Increase Rate Limits page, view the models with increased quotas and their corresponding rate limit data in the list.

  • In the Model List, go to the details page of the corresponding model to view the updated rate limit data.

Note
  • The models for which you can temporarily increase quotas are listed in the dialog box on the Increase Rate Limits page.

  • Submitting another request for a model that already has an increased quota is considered a new application, and the validity period is reset to 30 days.

  • Request a quota based on your actual needs. If the provisioned capacity significantly exceeds actual usage for a long time, the system may restore it to the default value after prior notification.

Text generation - Qwen

Qwen language model

Singapore

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).

Requests per minute (RPM)

Tokens per minute (TPM)

Includes input and output tokens.

qwen3.7-max

International

600

1,000,000

qwen3.7-max-2026-06-08

International

60

1,000,000

qwen3.7-max-2026-05-20

International

60

1,000,000

qwen3.7-max-preview

International

600

1,000,000

qwen3.7-max-2026-05-17

International

600

1,000,000

qwen3.6-max-preview

International

600

1,000,000

qwen3-max

International

600

1,000,000

qwen3-max-2026-01-23

International

600

1,000,000

qwen3-max-2025-09-23

International

60

100,000

qwen3-max-preview

International

600

1,000,000

qwen-max

Rate limiting does not apply to service calls made using the Batch API.

International

600

1,000,000

qwen3.7-plus

International

15,000

5,000,000

qwen3.7-plus-2026-05-26

International

60

1,000,000

qwen3.6-plus

International

15,000

5,000,000

qwen3.6-plus-2026-04-02

International

60

1,000,000

qwen3.6-flash

International

15,000

5,000,000

qwen3.6-flash-2026-04-16

International

60

1,000,000

qwen3.5-plus

International

15,000

6,000,000

qwen3.5-plus-2026-04-20

International

600

1,000,000

qwen3.5-plus-2026-02-15

International

60

1,000,000

qwen-plus

Rate limiting does not apply to service calls made using the Batch API.

International

600

1,500,000

qwen-plus-latest

International

600

1,000,000

qwen-plus-2025-12-01

International

120

1,000,000

qwen-plus-2025-09-11

International

120

1,000,000

qwen-plus-2025-07-28

International

60

100,000

qwen-plus-2025-07-14

(qwen-plus-0714)

International

60

100,000

qwen-plus-2025-04-28

(qwen-plus-0428)

International

60

1,000,000

qwen-plus-2025-01-25

(qwen-plus-0125)

International

60

100,000

qwen3.5-flash

International

15,000

5,000,000

qwen3.5-flash-2026-02-23

International

60

1,000,000

qwen-flash

Rate limiting does not apply to service calls made using the Batch API.

International

600

5,000,000

qwen-flash-2025-07-28

International

600

5,000,000

qwq-plus

International

60

100,000

qwen-turbo

Rate limiting does not apply to service calls made using the Batch API.

International

600

5,000,000

US (Virginia)

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).

Requests per minute (RPM)

Tokens per minute (TPM)

Includes input and output tokens.

qwen3.7-max

Global

30,000

5,000,000

qwen3.7-max-2026-06-08

Global

600

1,000,000

qwen3.7-max-2026-05-20

Global

600

1,000,000

qwen3-max

Global

600

1,000,000

qwen3-max-preview

Global

600

1,000,000

qwen3-max-2025-09-23

Global

60

100,000

qwen3.7-plus

Global

30,000

5,000,000

qwen3.7-plus-2026-05-26

Global

600

1,000,000

qwen3.6-plus

Global

30,000

5,000,000

qwen3.6-plus-2026-04-02

Global

600

1,000,000

qwen3.6-flash

Global

15,000

5,000,000

qwen3.6-flash-2026-04-16

Global

60

1,000,000

qwen3.5-plus

Global

30,000

5,000,000

qwen3.5-plus-2026-02-15

Global

600

1,000,000

qwen-plus

Global

15,000

5,000,000

qwen-plus-us

US

600

1,000,000

qwen-plus-2025-12-01

Global

60

1,000,000

qwen-plus-2025-09-11

Global

60

1,000,000

qwen-plus-2025-07-28

Global

60

1,000,000

qwen-plus-2025-12-01-us

US

60

1,000,000

qwen3.5-flash

Global

30,000

10,000,000

qwen3.5-flash-2026-02-23

Global

600

1,000,000

qwen-flash

Global

15,000

10,000,000

qwen-flash-us

US

600

5,000,000

qwen-flash-2025-07-28

Global

60

1,000,000

qwen-flash-2025-07-28-us

US

600

5,000,000

China (Beijing)

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).

Requests per minute (RPM)

Tokens per minute (TPM)

Includes input and output tokens.

qwen3.7-max

Rate limiting does not apply to service calls made using the Batch API.

The Chinese mainland

30,000

5,000,000

qwen3.7-max-2026-06-08

The Chinese mainland

600

1,000,000

qwen3.7-max-2026-05-20

The Chinese mainland

600

1,000,000

qwen3.6-max-preview

The Chinese mainland

600

1,000,000

qwen3-max

Rate limiting does not apply to service calls made using the Batch API.

The Chinese mainland

30,000

5,000,000

qwen3-max-2026-01-23

The Chinese mainland

600

1,000,000

qwen3-max-2025-09-23

The Chinese mainland

60

100,000

qwen3-max-preview

The Chinese mainland

600

1,000,000

qwen-max

Rate limiting does not apply to service calls made using the Batch API.

The Chinese mainland

1,200

1,000,000

qwen3.7-plus

The Chinese mainland

30,000

5,000,000

qwen3.7-plus-2026-05-26

The Chinese mainland

600

1,000,000

qwen3.6-plus

Rate limiting does not apply to service calls made using the Batch API.

The Chinese mainland

30,000

5,000,000

qwen3.6-plus-2026-04-02

The Chinese mainland

600

1,000,000

qwen3.6-flash

Rate limiting does not apply to service calls made using the Batch API.

The Chinese mainland

30,000

10,000,000

qwen3.6-flash-2026-04-16

The Chinese mainland

600

1,000,000

qwen3.5-plus

Rate limiting does not apply to service calls made using the Batch API.

The Chinese mainland

30,000

5,000,000

qwen3.5-plus-2026-04-20

The Chinese mainland

600

1,000,000

qwen3.5-plus-2026-02-15

The Chinese mainland

600

1,000,000

qwen-plus

Rate limiting does not apply to service calls made using the Batch API.

The Chinese mainland

30,000

5,000,000

qwen-plus-latest

Rate limiting does not apply to service calls made using the Batch API.

The Chinese mainland

15,000

1,200,000

qwen-plus-2025-12-01

The Chinese mainland

120

1,000,000

qwen-plus-2025-09-11

The Chinese mainland

60

1,000,000

qwen-plus-2025-07-28

(qwen-plus-0728)

The Chinese mainland

60

1,000,000

qwen-plus-2025-07-14

(qwen-plus-0714)

The Chinese mainland

60

100,000

qwen-plus-2025-04-28

(qwen-plus-0428)

The Chinese mainland

60

1,000,000

qwen-plus-2025-01-25

(qwen-plus-0125)

The Chinese mainland

60

150,000

qwen-plus-2025-01-12

(qwen-plus-0112)

The Chinese mainland

60

150,000

qwen-plus-2024-12-20

(qwen-plus-1220)

The Chinese mainland

60

150,000

qwen3.5-flash

Rate limiting does not apply to service calls made using the Batch API.

The Chinese mainland

30,000

10,000,000

qwen3.5-flash-2026-02-23

The Chinese mainland

600

1,000,000

qwen-flash

Rate limiting does not apply to service calls made using the Batch API.

The Chinese mainland

30,000

10,000,000

qwen-flash-2025-07-28

The Chinese mainland

60

1,000,000

qwq-plus

Rate limiting does not apply to service calls made using the Batch API.

The Chinese mainland

600

1,000,000

qwen-turbo

The Chinese mainland

1,200

5,000,000

qwen-long-latest

Rate limiting does not apply to service calls made using the Batch API.

The Chinese mainland

1,200

60,000

qwen-long-2025-01-25

(qwen-long-0125)

The Chinese mainland

3

7,500

Germany (Frankfurt)

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).

Requests per minute (RPM)

Tokens per minute (TPM)

Includes input and output tokens.

qwen3.7-max

Global

30,000

5,000,000

qwen3.7-max-2026-06-08

Global

600

1,000,000

qwen3.7-max-2026-05-20

Global

600

1,000,000

qwen3-max

Global

600

1,000,000

qwen3-max

EU

600

1,000,000

qwen3-max-preview

Global

600

1,000,000

qwen3-max-2026-01-23

EU

600

1,000,000

qwen3-max-2025-09-23

Global

60

100,000

qwen3.7-plus

Global

30,000

5,000,000

qwen3.7-plus-2026-05-26

Global

600

1,000,000

qwen3.6-plus

Global

30,000

5,000,000

qwen3.6-plus-2026-04-02

Global

600

1,000,000

qwen3.6-flash

Global

15,000

5,000,000

qwen3.6-flash-2026-04-16

Global

60

1,000,000

qwen3.5-plus

Global

30,000

5,000,000

qwen3.5-plus-2026-02-15

Global

600

1,000,000

qwen-plus

Global

15,000

5,000,000

qwen-plus

EU

600

1,000,000

qwen-plus-2025-12-01

Global

60

1,000,000

qwen-plus-2025-12-01

EU

120

1,000,000

qwen-plus-2025-09-11

Global

60

1,000,000

qwen-plus-2025-07-28

Global

60

1,000,000

qwen3.5-flash

Global

30,000

10,000,000

qwen3.5-flash

EU

30,000

10,000,000

qwen3.5-flash-2026-02-23

Global

600

1,000,000

qwen3.5-flash-2026-02-23

EU

600

1,000,000

qwen-flash

Global

15,000

10,000,000

qwen-flash-2025-07-28

Global

60

1,000,000

Hong Kong (China)

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).

Requests per minute (RPM)

Tokens per minute (TPM)

Includes input and output tokens.

qwen3-max

Hong Kong (China)

600

1,000,000

qwen3-max-2026-01-23

Hong Kong (China)

600

1,000,000

qwen3.6-plus

Global

30,000

5,000,000

qwen3.6-flash

Global

15,000

5,000,000

qwen-plus

Hong Kong (China)

600

1,000,000

qwen-plus-2025-12-01

Hong Kong (China)

120

1,000,000

qwen3.5-flash

Hong Kong (China)

15,000

5,000,000

qwen3.5-flash-2026-02-23

Hong Kong (China)

60

1,000,000

Japan (Tokyo)

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).

Requests per minute (RPM)

Tokens per minute (TPM)

Includes input and output tokens.

qwen3.7-max

Global

30,000

5,000,000

qwen3.7-max-2026-05-20

Global

600

1,000,000

qwen3.7-plus

Global

30,000

5,000,000

qwen3.7-plus-2026-05-26

Global

600

1,000,000

qwen3.7-plus

Japan

15,000

5,000,000

qwen3.7-plus-2026-05-26

Japan

60

1,000,000

qwen3.6-plus

Global

30,000

5,000,000

qwen3.6-plus-2026-04-02

Global

600

1,000,000

qwen3.6-flash

Global

15,000

5,000,000

qwen3.6-flash-2026-04-16

Global

60

1,000,000

Qwen-VL (visual understanding/image-to-text)

Singapore

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).

Requests per minute (RPM)

Tokens per minute (TPM)

Includes input and output tokens.

qwen3-vl-plus

International

1,200

1,000,000

qwen3-vl-plus-2025-12-19

International

60

100,000

qwen3-vl-plus-2025-09-23

International

120

1,000,000

qwen3-vl-flash

International

1,200

1,000,000

qwen3-vl-flash-2026-01-22

International

60

100,000

qwen3-vl-flash-2025-10-15

International

120

1,000,000

qwen-vl-max

International

1,200

1,000,000

qwen-vl-plus

International

1,200

1,000,000

qvq-max

International

60

100,000

US (Virginia)

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).

Requests per minute (RPM)

Tokens per minute (TPM)

Includes input and output tokens.

qwen3-vl-plus

Global

1,200

1,000,000

qwen3-vl-plus-2025-09-23

Global

60

100,000

qwen3-vl-flash

Global

1,200

1,000,000

qwen3-vl-flash-us

US

1,200

1,000,000

qwen3-vl-flash-2025-10-15

Global

60

100,000

qwen3-vl-flash-2026-01-22-us

US

120

1,000,000

qwen3-vl-flash-2025-10-15-us

US

120

1,000,000

China (Beijing)

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).

Requests per minute (RPM)

Tokens per minute (TPM)

Includes input and output tokens.

qwen3-vl-plus

Rate limiting does not apply to service calls made using the Batch API.

The Chinese mainland

3,000

5,000,000

qwen3-vl-plus-2025-12-19

The Chinese mainland

60

100,000

qwen3-vl-plus-2025-09-23

The Chinese mainland

60

100,000

qwen3-vl-flash

Rate limiting does not apply to service calls made using the Batch API.

The Chinese mainland

3,000

5,000,000

qwen3-vl-flash-2026-01-22

The Chinese mainland

60

100,000

qwen3-vl-flash-2025-10-15

The Chinese mainland

60

100,000

qwen-vl-max

Rate limiting does not apply to service calls made using the Batch API.

The Chinese mainland

1,200

1,000,000

qwen-vl-plus

Rate limiting does not apply to service calls made using the Batch API.

The Chinese mainland

1,200

1,000,000

qvq-max

The Chinese mainland

60

100,000

qvq-plus

The Chinese mainland

60

100,000

Germany (Frankfurt)

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).

Requests per minute (RPM)

Tokens per minute (TPM)

Includes input and output tokens.

qwen3-vl-plus

Global

1,200

1,000,000

qwen3-vl-plus

EU

1,200

1,000,000

qwen3-vl-plus-2025-09-23

Global

60

100,000

qwen3-vl-flash

Global

1,200

1,000,000

qwen3-vl-flash

EU

1,200

1,000,000

qwen3-vl-flash-2026-01-22

EU

60

100,000

qwen3-vl-flash-2025-10-15

Global

60

100,000

qwen3-vl-flash-2025-10-15

EU

60

100,000

Hong Kong (China)

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).

Requests per minute (RPM)

Tokens per minute (TPM)

Includes input and output tokens.

qwen3-vl-plus

Hong Kong (China)

1,200

1,000,000

qwen3-vl-plus-2025-12-19

Hong Kong (China)

60

100,000

Qwen-Omni (omni-modal)

Singapore

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).

Requests per minute (RPM)

Tokens per minute (TPM)

Includes input and output tokens.

qwen3.5-omni-flash

International

60

100,000

qwen3.5-omni-flash-2026-03-15

International

60

100,000

qwen3.5-omni-plus

International

60

100,000

qwen3.5-omni-plus-2026-03-15

International

60

100,000

qwen3-omni-flash

International

60

100,000

qwen3-omni-flash-2025-12-01

International

60

100,000

qwen3-omni-flash-2025-09-15

International

60

100,000

qwen-omni-turbo

International

60

100,000

qwen-omni-turbo-latest

International

60

100,000

qwen-omni-turbo-2025-03-26

International

60

100,000

China (Beijing)

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).

Requests per minute (RPM)

Tokens per minute (TPM)

Includes input and output tokens.

qwen3.5-omni-flash

The Chinese mainland

60

100,000

qwen3.5-omni-flash-2026-03-15

The Chinese mainland

60

100,000

qwen3.5-omni-plus

The Chinese mainland

60

100,000

qwen3.5-omni-plus-2026-03-15

The Chinese mainland

60

100,000

qwen3-omni-flash

The Chinese mainland

60

100,000

qwen3-omni-flash-2025-12-01

The Chinese mainland

60

100,000

qwen3-omni-flash-2025-09-15

The Chinese mainland

60

100,000

qwen-omni-turbo

The Chinese mainland

60

100,000

qwen-omni-turbo-latest

The Chinese mainland

60

100,000

qwen-omni-turbo-2025-03-26

(qwen-omni-turbo-0326)

The Chinese mainland

60

100,000

qwen-omni-turbo-2025-01-19

(qwen-omni-turbo-0119)

The Chinese mainland

60

100,000

Qwen-Omni-Realtime (real-time multimodal)

Singapore

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).

Requests per minute (RPM)

Tokens per minute (TPM)

Includes input and output tokens.

qwen3.5-omni-plus-realtime

International

60

100,000

qwen3.5-omni-plus-realtime-2026-03-15

International

60

100,000

qwen3.5-omni-flash-realtime

International

60

100,000

qwen3.5-omni-flash-realtime-2026-03-15

International

60

100,000

qwen3-omni-flash-realtime

International

60

100,000

qwen3-omni-flash-realtime-2025-12-01

International

60

100,000

qwen3-omni-flash-realtime-2025-09-15

International

60

100,000

qwen-omni-turbo-realtime

International

60

10,000

qwen-omni-turbo-realtime-latest

International

60

10,000

qwen-omni-turbo-realtime-2025-05-08

International

60

10,000

China (Beijing)

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).

Requests per minute (RPM)

Tokens per minute (TPM)

Includes input and output tokens.

qwen3.5-omni-plus-realtime

The Chinese mainland

60

100,000

qwen3.5-omni-plus-realtime-2026-03-15

The Chinese mainland

60

100,000

qwen3.5-omni-flash-realtime

The Chinese mainland

60

100,000

qwen3.5-omni-flash-realtime-2026-03-15

The Chinese mainland

60

100,000

qwen3-omni-flash-realtime

The Chinese mainland

60

100,000

qwen3-omni-flash-realtime-2025-12-01

The Chinese mainland

60

100,000

qwen3-omni-flash-realtime-2025-09-15

The Chinese mainland

60

100,000

qwen-omni-turbo-realtime

The Chinese mainland

60

100,000

qwen-omni-turbo-realtime-latest

The Chinese mainland

60

100,000

qwen-omni-turbo-realtime-2025-05-08

The Chinese mainland

60

100,000

Qwen-OCR (text extraction)

Singapore

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).

Requests per minute (RPM)

Tokens per minute (TPM)

Includes input and output tokens.

qwen-vl-ocr

International

600

6,000,000

qwen-vl-ocr-2025-11-20

International

1,200

6,000,000

US (Virginia)

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).

Requests per minute (RPM)

Tokens per minute (TPM)

Includes input and output tokens.

qwen-vl-ocr

Global

600

6,000,000

qwen-vl-ocr-2025-11-20

Global

1,200

6,000,000

China (Beijing)

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).

Requests per minute (RPM)

Tokens per minute (TPM)

Includes input and output tokens.

qwen3.5-ocr

The Chinese mainland

6,000

30,000,000

qwen-vl-ocr

Rate limiting does not apply to service calls made using the Batch API.

The Chinese mainland

600

6,000,000

qwen-vl-ocr-latest

The Chinese mainland

1,200

6,000,000

qwen-vl-ocr-2025-11-20

The Chinese mainland

1,200

6,000,000

qwen-vl-ocr-2025-04-13

The Chinese mainland

600

6,000,000

qwen-vl-ocr-2024-10-28

The Chinese mainland

600

6,000,000

Germany (Frankfurt)

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).

Requests per minute (RPM)

Tokens per minute (TPM)

Includes input and output tokens.

qwen-vl-ocr

Global

600

6,000,000

qwen-vl-ocr-2025-11-20

Global

1,200

6,000,000

Qwen math model

China (Beijing)

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).

Requests per minute (RPM)

Tokens per minute (TPM)

Includes input and output tokens.

qwen-math-plus

The Chinese mainland

1,200

1,000,000

qwen-math-plus-latest

The Chinese mainland

1,200

1,000,000

qwen-math-plus-2024-09-19

(qwen-math-plus-0919)

The Chinese mainland

60

100,000

qwen-math-plus-2024-08-16

(qwen-math-plus-0816)

The Chinese mainland

10

20,000

qwen-math-turbo

The Chinese mainland

1,200

1,000,000

Qwen-Coder

Singapore

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).

Requests per minute (RPM)

Tokens per minute (TPM)

Includes input and output tokens.

qwen3-coder-plus

International

2,400

2,000,000

qwen3-coder-plus-2025-09-23

International

600

1,000,000

qwen3-coder-plus-2025-07-22

International

60

1,000,000

qwen3-coder-flash

International

600

5,000,000

qwen3-coder-flash-2025-07-28

International

600

5,000,000

US (Virginia)

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).

Requests per minute (RPM)

Tokens per minute (TPM)

Includes input and output tokens.

qwen3-coder-plus

Global

2,400

2,000,000

qwen3-coder-plus-2025-09-23

Global

60

1,000,000

qwen3-coder-plus-2025-07-22

Global

60

1,000,000

qwen3-coder-flash

Global

1,200

1,000,000

qwen3-coder-flash-2025-07-28

Global

60

1,000,000

China (Beijing)

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).

Requests per minute (RPM)

Tokens per minute (TPM)

Includes input and output tokens.

qwen3-coder-plus

The Chinese mainland

5,000

5,000,000

qwen3-coder-plus-2025-09-23

The Chinese mainland

60

1,000,000

qwen3-coder-plus-2025-07-22

The Chinese mainland

60

1,000,000

qwen3-coder-flash

The Chinese mainland

5,000

5,000,000

qwen3-coder-flash-2025-07-28

The Chinese mainland

60

1,000,000

qwen-coder-plus

The Chinese mainland

1,200

1,000,000

qwen-coder-turbo

The Chinese mainland

1,200

1,000,000

Germany (Frankfurt)

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).

Requests per minute (RPM)

Tokens per minute (TPM)

Includes input and output tokens.

qwen3-coder-plus

Global

2,400

2,000,000

qwen3-coder-plus-2025-09-23

Global

60

1,000,000

qwen3-coder-plus-2025-07-22

Global

60

1,000,000

qwen3-coder-flash

Global

1,200

1,000,000

qwen3-coder-flash-2025-07-28

Global

60

1,000,000

Qwen translation model

Singapore

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).

Requests per minute (RPM)

Tokens per minute (TPM)

Includes input and output tokens.

qwen-mt-plus

International

60

100,000

qwen-mt-flash

International

60

100,000

qwen-mt-lite

International

60

100,000

qwen-mt-turbo

International

60

100,000

US (Virginia)

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).

Requests per minute (RPM)

Tokens per minute (TPM)

Includes input and output tokens.

qwen-mt-plus

Global

60

25,000

qwen-mt-flash

Global

60

35,000

qwen-mt-lite

Global

60

100,000

qwen-mt-lite-us

US

60

100,000
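For models whose TPM value is small relative to RPM, as in the translation tables above, the token limit usually binds first. The Python sketch below computes the sustained request rate permitted by whichever limit is reached first; the 1,000 input plus 1,000 output tokens per call is a hypothetical workload, not a figure from this page.

```python
def max_requests_per_minute(rpm: int, tpm: int, tokens_per_request: int) -> int:
    """Sustained requests per minute allowed by whichever limit binds first.
    tokens_per_request counts input and output tokens, as TPM does."""
    return min(rpm, tpm // tokens_per_request)


# qwen-mt-plus, US (Virginia): 60 RPM, 25,000 TPM; 2,000 tokens per call
print(max_requests_per_minute(60, 25_000, 2_000))   # 12, so TPM binds, not RPM
# qwen-mt-lite, US (Virginia): 60 RPM, 100,000 TPM; same workload
print(max_requests_per_minute(60, 100_000, 2_000))  # 50
```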

China (Beijing)

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).

Requests per minute (RPM)

Tokens per minute (TPM)

Includes input and output tokens.

qwen-mt-plus

The Chinese mainland

60

25,000

qwen-mt-flash

The Chinese mainland

60

35,000

qwen-mt-lite

The Chinese mainland

60

100,000

qwen-mt-turbo

The Chinese mainland

60

35,000

Germany (Frankfurt)

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).

Requests per minute (RPM)

Tokens per minute (TPM)

Includes input and output tokens.

qwen-mt-plus

Global

60

25,000

qwen-mt-flash

Global

60

35,000

qwen-mt-lite

Global

60

100,000

Qwen data mining model

China (Beijing)

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).

Requests per minute (RPM)

Tokens per minute (TPM)

Includes input and output tokens.

qwen-doc-turbo

The Chinese mainland

600

3,000,000

Qwen deep research model

China (Beijing)

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).

Requests per minute (RPM)

Tokens per minute (TPM)

Includes input and output tokens.

qwen-deep-research

The Chinese mainland

120

1,200,000

Text generation - Qwen - Open source

Qwen open-source language models

Singapore

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).

Requests per minute (RPM)

Tokens per minute (TPM)

Includes input and output tokens.

qwen3.6-35b-a3b

International

600

1,000,000

qwen3.6-27b

International

600

1,000,000

qwen3.5-397b-a17b

International

600

1,000,000

qwen3.5-122b-a10b

International

600

1,000,000

qwen3.5-27b

International

600

1,000,000

qwen3.5-35b-a3b

International

600

5,000,000

qwen3-next-80b-a3b-thinking

International

600

1,000,000

qwen3-next-80b-a3b-instruct

International

600

1,000,000

qwen3-235b-a22b-thinking-2507

International

600

1,000,000

qwen3-235b-a22b-instruct-2507

International

600

1,000,000

qwen3-30b-a3b-thinking-2507

International

600

5,000,000

qwen3-30b-a3b-instruct-2507

International

600

5,000,000

qwen3-235b-a22b

International

600

1,000,000

qwen3-32b

International

600

1,000,000

qwen3-30b-a3b

International

600

1,000,000

qwen3-14b

International

600

1,000,000

qwen3-8b

International

600

1,000,000

qwen3-4b

International

600

1,000,000

qwen3-1.7b

International

600

1,000,000

qwen3-0.6b

International

600

1,000,000

US (Virginia)

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).

Requests per minute (RPM)

Tokens per minute (TPM)

Includes input and output tokens.

qwen3.5-397b-a17b

Global

600

1,000,000

qwen3.5-122b-a10b

Global

600

1,000,000

qwen3.5-27b

Global

600

1,000,000

qwen3.6-35b-a3b

Global

600

1,000,000

qwen3.5-35b-a3b

Global

600

1,000,000

qwen3-next-80b-a3b-thinking

Global

600

1,000,000

qwen3-next-80b-a3b-instruct

Global

600

1,000,000

qwen3-235b-a22b-thinking-2507

Global

600

1,000,000

qwen3-235b-a22b-instruct-2507

Global

600

1,000,000

qwen3-30b-a3b-thinking-2507

Global

600

1,000,000

qwen3-30b-a3b-instruct-2507

Global

600

1,000,000

qwen3-235b-a22b

Global

600

1,000,000

qwen3-30b-a3b

Global

600

1,000,000

qwen3-32b

Global

600

1,000,000

qwen3-14b

Global

600

1,000,000

qwen3-8b

Global

600

1,000,000

China (Beijing)

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).

Requests per minute (RPM)

Tokens per minute (TPM)

Includes input and output tokens.

qwen3.6-35b-a3b

The Chinese mainland

600

1,000,000

qwen3.6-27b

The Chinese mainland

600

1,000,000

qwen3.5-397b-a17b

The Chinese mainland

600

1,000,000

qwen3.5-122b-a10b

The Chinese mainland

600

1,000,000

qwen3.5-27b

The Chinese mainland

600

1,000,000

qwen3.5-35b-a3b

The Chinese mainland

600

1,000,000

qwen3-next-80b-a3b-thinking

The Chinese mainland

600

1,000,000

qwen3-next-80b-a3b-instruct

The Chinese mainland

600

1,000,000

qwen3-235b-a22b-thinking-2507

The Chinese mainland

600

1,000,000

qwen3-235b-a22b-instruct-2507

The Chinese mainland

600

1,000,000

qwen3-30b-a3b-thinking-2507

The Chinese mainland

600

1,000,000

qwen3-30b-a3b-instruct-2507

The Chinese mainland

600

1,000,000

qwen3-235b-a22b

The Chinese mainland

600

1,000,000

qwen3-30b-a3b

The Chinese mainland

600

1,000,000

qwen3-32b

The Chinese mainland

2,400

1,000,000

qwen3-14b

The Chinese mainland

600

1,000,000

qwen3-8b

The Chinese mainland

600

1,000,000

qwen3-4b

The Chinese mainland

600

1,000,000

qwen3-1.7b

The Chinese mainland

600

1,000,000

qwen3-0.6b

The Chinese mainland

600

1,000,000

qwen2.5-3b-instruct

The Chinese mainland

1,200

2,000,000

qwen2.5-1.5b-instruct

The Chinese mainland

1,200

2,000,000

qwen2.5-0.5b-instruct

The Chinese mainland

1,200

2,000,000

Germany (Frankfurt)

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).

Requests per minute (RPM)

Tokens per minute (TPM)

Includes input and output tokens.

qwen3.5-397b-a17b

Global

600

1,000,000

qwen3.5-122b-a10b

Global

600

1,000,000

qwen3.5-27b

Global

600

1,000,000

qwen3.6-35b-a3b

Global

600

1,000,000

qwen3.5-35b-a3b

Global

600

1,000,000

qwen3-next-80b-a3b-thinking

Global

600

1,000,000

qwen3-next-80b-a3b-instruct

Global

600

1,000,000

qwen3-235b-a22b-thinking-2507

Global

600

1,000,000

qwen3-235b-a22b-instruct-2507

Global

600

1,000,000

qwen3-30b-a3b-thinking-2507

Global

600

1,000,000

qwen3-30b-a3b-instruct-2507

Global

600

1,000,000

qwen3-235b-a22b

Global

600

1,000,000

qwen3-30b-a3b

Global

600

1,000,000

qwen3-32b

Global

600

1,000,000

qwen3-14b

Global

600

1,000,000

qwen3-8b

Global

600

1,000,000

Qwen-VL (visual understanding/image-to-text)

Singapore

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).

Requests per minute (RPM)

Tokens per minute (TPM)

Includes input and output tokens.

qwen3-vl-32b-thinking

International

60

100,000

qwen3-vl-32b-instruct

International

60

100,000

qwen3-vl-30b-a3b-thinking

International

60

100,000

qwen3-vl-30b-a3b-instruct

International

60

100,000

qwen3-vl-8b-thinking

International

60

100,000

qwen3-vl-8b-instruct

International

60

100,000

qwen3-vl-235b-a22b-thinking

International

60

100,000

qwen3-vl-235b-a22b-instruct

International

60

100,000

US (Virginia)

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).

Requests per minute (RPM)

Tokens per minute (TPM)

Includes input and output tokens.

qwen3-vl-235b-a22b-thinking

Global

60

100,000

qwen3-vl-235b-a22b-instruct

Global

60

100,000

qwen3-vl-32b-thinking

Global

600

1,000,000

qwen3-vl-32b-instruct

Global

600

1,000,000

qwen3-vl-30b-a3b-thinking

Global

600

1,000,000

qwen3-vl-30b-a3b-instruct

Global

600

1,000,000

qwen3-vl-8b-thinking

Global

600

1,000,000

qwen3-vl-8b-instruct

Global

600

1,000,000

China (Beijing)

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).

Requests per minute (RPM)

Tokens per minute (TPM)

Includes input and output tokens.

qwen3-vl-32b-thinking

The Chinese mainland

600

1,000,000

qwen3-vl-32b-instruct

The Chinese mainland

600

1,000,000

qwen3-vl-30b-a3b-thinking

The Chinese mainland

600

1,000,000

qwen3-vl-30b-a3b-instruct

The Chinese mainland

600

1,000,000

qwen3-vl-8b-thinking

The Chinese mainland

600

1,000,000

qwen3-vl-8b-instruct

The Chinese mainland

600

1,000,000

qwen3-vl-235b-a22b-thinking

The Chinese mainland

60

100,000

qwen3-vl-235b-a22b-instruct

The Chinese mainland

60

100,000

qwen2-vl-72b-instruct

The Chinese mainland

1,200

1,000,000

qwen2-vl-7b-instruct

The Chinese mainland

1,200

1,000,000

qwen2-vl-2b-instruct

The Chinese mainland

1,200

1,000,000

Germany (Frankfurt)

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).

Requests per minute (RPM)

Tokens per minute (TPM)

Includes input and output tokens.

qwen3-vl-235b-a22b-thinking

Global

60

100,000

qwen3-vl-235b-a22b-instruct

Global

60

100,000

qwen3-vl-32b-thinking

Global

600

1,000,000

qwen3-vl-32b-instruct

Global

600

1,000,000

qwen3-vl-30b-a3b-thinking

Global

600

1,000,000

qwen3-vl-30b-a3b-instruct

Global

600

1,000,000

qwen3-vl-8b-thinking

Global

600

1,000,000

qwen3-vl-8b-instruct

Global

600

1,000,000

Qwen3-Omni

Singapore

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).

Requests per minute (RPM)

Tokens per minute (TPM)

Includes input and output tokens.

qwen2.5-omni-7b

International

60

100,000

China (Beijing)

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).

Requests per minute (RPM)

Tokens per minute (TPM)

Includes input and output tokens.

qwen2.5-omni-7b

The Chinese mainland

60

100,000

Qwen3-Omni-Captioner

Singapore

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).

Requests per minute (RPM)

Tokens per minute (TPM)

Includes input and output tokens.

qwen3-omni-30b-a3b-captioner

International

60

100,000

China (Beijing)

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).

Requests per minute (RPM)

Tokens per minute (TPM)

Includes input and output tokens.

qwen3-omni-30b-a3b-captioner

The Chinese mainland

60

100,000

Qwen-Math

China (Beijing)

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).

Requests per minute (RPM)

Tokens per minute (TPM)

Includes input and output tokens.

Qwen-Coder

Singapore

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).

Requests per minute (RPM)

Tokens per minute (TPM)

Includes input and output tokens.

qwen3-coder-next

International

600

1,000,000

qwen3-coder-480b-a35b-instruct

International

600

1,000,000

qwen3-coder-30b-a3b-instruct

International

600

1,000,000

US (Virginia)

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).

Requests per minute (RPM)

Tokens per minute (TPM)

Includes input and output tokens.

qwen3-coder-480b-a35b-instruct

Global

600

1,000,000

qwen3-coder-30b-a3b-instruct

Global

600

1,000,000

China (Beijing)

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).

Requests per minute (RPM)

Tokens per minute (TPM)

Includes input and output tokens.

qwen3-coder-next

The Chinese mainland

600

1,000,000

qwen3-coder-480b-a35b-instruct

The Chinese mainland

600

1,000,000

qwen3-coder-30b-a3b-instruct

The Chinese mainland

600

1,000,000

Germany (Frankfurt)

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).

Requests per minute (RPM)

Tokens per minute (TPM)

Includes input and output tokens.

qwen3-coder-480b-a35b-instruct

Global

600

1,000,000

qwen3-coder-30b-a3b-instruct

Global

600

1,000,000

qwen3-coder-next

EU

600

1,000,000

Text generation - Third-party models

DeepSeek

Singapore

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).

Requests per minute (RPM)

Tokens per minute (TPM)

Includes input and output tokens.

deepseek-v4-pro

International

10,000

1,200,000

deepseek-v4-flash

International

10,000

1,200,000

deepseek-v3.2

International

10,000

1,200,000

US (Virginia)

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).

Requests per minute (RPM)

Tokens per minute (TPM)

Includes input and output tokens.

deepseek-v4-pro

Global

15,000

1,200,000

deepseek-v4-flash

Global

15,000

1,200,000

China (Beijing)

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).

Requests per minute (RPM)

Tokens per minute (TPM)

Includes input and output tokens.

deepseek-v4-pro

The Chinese mainland

15,000

1,200,000

deepseek-v4-flash

The Chinese mainland

15,000

1,200,000

deepseek-v3.2

Rate limiting does not apply to service calls made using the Batch API.

The Chinese mainland

15,000

1,200,000

deepseek-v3.2-exp

The Chinese mainland

15,000

1,200,000

deepseek-v3.1

The Chinese mainland

15,000

1,200,000

deepseek-r1-0528

The Chinese mainland

60

100,000

deepseek-r1

Rate limiting does not apply to service calls made using the Batch API.

The Chinese mainland

15,000

1,200,000

deepseek-v3

Rate limiting does not apply to service calls made using the Batch API.

The Chinese mainland

15,000

1,200,000

deepseek-r1-distill-qwen-7b

The Chinese mainland

15,000

1,200,000

deepseek-r1-distill-qwen-14b

The Chinese mainland

15,000

1,200,000

deepseek-r1-distill-qwen-32b

The Chinese mainland

15,000

1,200,000

deepseek-r1-distill-qwen-1.5b

The Chinese mainland

60

100,000

deepseek-r1-distill-llama-8b

The Chinese mainland

60

100,000

deepseek-r1-distill-llama-70b

The Chinese mainland

60

100,000
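Limits vary widely even within one region (for example, 15,000 RPM for deepseek-v3.1 but 60 RPM for deepseek-r1-0528 above), so a client may still be throttled occasionally. Since the limits are measured per minute, retrying with increasing, randomized delays capped at about one minute is a common way to recover. The Python sketch below assumes a hypothetical RateLimitError that you raise or map from your client's throttling error; it does not use any Model Studio SDK class.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical placeholder: raise or map your client's throttling error to this."""


def call_with_backoff(fn: Callable[[], T], max_retries: int = 5,
                      base_delay: float = 1.0, max_delay: float = 60.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying on RateLimitError with capped exponential backoff and full jitter."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries:
                raise  # give up after the last allowed retry
            # Full jitter: sleep a random time in [0, min(max_delay, base_delay * 2**attempt)].
            sleep(random.uniform(0, min(max_delay, base_delay * 2 ** attempt)))
    raise AssertionError("unreachable")


# Usage: a stand-in call that is throttled twice and then succeeds.
attempts = {"count": 0}


def flaky_call() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RateLimitError("Requests rate limit exceeded")
    return "ok"


print(call_with_backoff(flaky_call, sleep=lambda seconds: None))  # ok
```

The jitter spreads retries from many workers under the same account, which matters because usage across all API keys is combined.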

Germany (Frankfurt)

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).

Requests per minute (RPM)

Tokens per minute (TPM)

Includes input and output tokens.

deepseek-v4-pro

Global

15,000

1,200,000

deepseek-v4-flash

Global

15,000

1,200,000

Japan (Tokyo)

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).

Requests per minute (RPM)

Tokens per minute (TPM)

Includes input and output tokens.

deepseek-v4-pro

Japan

10,000

1,200,000

deepseek-v4-flash

Japan

10,000

1,200,000

deepseek-v4-pro

Global

15,000

1,200,000

deepseek-v4-flash

Global

15,000

1,200,000

Kimi

China (Beijing)

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).

Requests per minute (RPM)

Tokens per minute (TPM)

Includes input and output tokens.

kimi-k2.7-code

The Chinese mainland

500

1,000,000

kimi-k2.6

The Chinese mainland

500

1,000,000

kimi-k2.5

The Chinese mainland

500

1,000,000

kimi-k2-thinking

The Chinese mainland

500

1,000,000

Moonshot-Kimi-K2-Instruct

The Chinese mainland

500

1,000,000

US (Virginia)

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).

Requests per minute (RPM)

Tokens per minute (TPM)

Includes input and output tokens.

kimi-k2.7-code

Global

500

1,000,000

kimi-k2.5

Global

500

1,000,000

Germany (Frankfurt)

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).

Requests per minute (RPM)

Tokens per minute (TPM)

Includes input and output tokens.

kimi-k2.7-code

Global

500

1,000,000

kimi-k2.5

Global

500

1,000,000

Hong Kong (China)

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).

Requests per minute (RPM)

Tokens per minute (TPM)

Includes input and output tokens.

kimi-k2.7-code

Global

500

1,000,000

Japan (Tokyo)

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).

Requests per minute (RPM)

Tokens per minute (TPM)

Includes input and output tokens.

kimi-k2.5

Global

500

1,000,000

MiniMax

China (Beijing)

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).

Requests per minute (RPM)

Tokens per minute (TPM)

Includes input and output tokens.

MiniMax-M2.5

The Chinese mainland

500

1,000,000

GLM

US (Virginia)

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).

Requests per minute (RPM)

Tokens per minute (TPM)

Includes input and output tokens.

glm-5.2

Global

500

1,000,000

glm-5.1

Global

500

1,000,000

China (Beijing)

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).

Requests per minute (RPM)

Tokens per minute (TPM)

Includes input and output tokens.

glm-5.2

The Chinese mainland

500

1,000,000

glm-5.1

The Chinese mainland

500

1,000,000

glm-5

The Chinese mainland

500

1,000,000

glm-4.7

The Chinese mainland

500

1,000,000

glm-4.6

The Chinese mainland

60

1,000,000

Germany (Frankfurt)

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).

Requests per minute (RPM)

Tokens per minute (TPM)

Includes input and output tokens.

glm-5.2

Global

500

1,000,000

glm-5.1

Global

500

1,000,000

Singapore

Model name

Rate limiting conditions (triggered if any value is exceeded)

The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).

Requests per minute (RPM)

Tokens per minute (TPM)

Includes input and output tokens.

glm-5.1

500

1,000,000

Hong Kong (China)

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).

Requests per minute (RPM)

Tokens per minute (TPM)

Includes input and output tokens.

glm-5.2

Global

500

1,000,000

Japan (Tokyo)

Model name

Rate limiting conditions (triggered if any value is exceeded)

The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).

Requests per minute (RPM)

Tokens per minute (TPM)

Includes input and output tokens.

glm-5.1

500

1,000,000

Image generation

Qwen-Image

Singapore

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

Task submission API call limit

Number of concurrent tasks (concurrency)

qwen-image-2.0-pro

International

2 times/minute

No limit for synchronous APIs

qwen-image-2.0-pro-2026-04-22

International

2 times/minute

No limit for synchronous APIs

qwen-image-2.0-pro-2026-03-03

International

2 times/minute

No limit for synchronous APIs

qwen-image-2.0

International

2 times/second

No limit for synchronous APIs

qwen-image-2.0-2026-03-03

International

2 times/second

No limit for synchronous APIs

qwen-image-max

International

2 times/minute

No limit for synchronous APIs

qwen-image-max-2025-12-30

International

2 times/minute

No limit for synchronous APIs

qwen-image-plus

International

2 times/second

No limit for synchronous APIs / 2 for asynchronous APIs

qwen-image-plus-2026-01-09

International

2 times/second

No limit for synchronous APIs

qwen-image

International

2 times/second

No limit for synchronous APIs / 2 for asynchronous APIs

qwen-image-edit-max

International

2 times/minute

No limit for synchronous APIs

qwen-image-edit-max-2026-01-16

International

2 times/minute

No limit for synchronous APIs

qwen-image-edit-plus

International

2 times/second

No limit for synchronous APIs

qwen-image-edit-plus-2025-12-15

International

2 times/second

No limit for synchronous APIs

qwen-image-edit-plus-2025-10-30

International

2 times/second

No limit for synchronous APIs

qwen-image-edit

International

2 times/second

No limit for synchronous APIs
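The task submission limits above are written as "N times/minute" or "N times/second". Spacing submissions evenly is one conservative way to respect them. The Python sketch below converts such a table cell into the minimum interval between submissions; the function is hypothetical, and even spacing is an assumption (the service may tolerate some burstiness, but that is not documented here).

```python
import re

_UNIT_SECONDS = {"second": 1.0, "minute": 60.0}


def min_submit_interval(limit: str) -> float:
    """Convert a table cell such as '2 times/minute' or '1 time/second'
    into the minimum number of seconds between task submissions."""
    match = re.fullmatch(r"\s*(\d+)\s+times?/(second|minute)\s*", limit)
    if match is None:
        raise ValueError(f"unrecognized limit: {limit!r}")
    count, unit = int(match.group(1)), match.group(2)
    return _UNIT_SECONDS[unit] / count


print(min_submit_interval("2 times/minute"))  # 30.0 (qwen-image-2.0-pro)
print(min_submit_interval("2 times/second"))  # 0.5 (qwen-image-plus)
```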

China (Beijing)

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

Task submission API call limit

Number of concurrent tasks (concurrency)

qwen-image-2.0-pro

The Chinese mainland

2 times/minute

No limit for synchronous APIs

qwen-image-2.0-pro-2026-04-22

The Chinese mainland

2 times/minute

No limit for synchronous APIs

qwen-image-2.0-pro-2026-03-03

The Chinese mainland

2 times/minute

No limit for synchronous APIs

qwen-image-2.0

The Chinese mainland

2 times/second

No limit for synchronous APIs

qwen-image-2.0-2026-03-03

The Chinese mainland

2 times/second

No limit for synchronous APIs

qwen-image-max

The Chinese mainland

2 times/minute

No limit for synchronous APIs

qwen-image-max-2025-12-30

The Chinese mainland

2 times/minute

No limit for synchronous APIs

qwen-image-plus

The Chinese mainland

2 times/second

No limit for synchronous APIs / 2 for asynchronous APIs

qwen-image-plus-2026-01-09

The Chinese mainland

2 times/second

No limit for synchronous APIs

qwen-image

The Chinese mainland

2 times/second

No limit for synchronous APIs / 2 for asynchronous APIs

qwen-image-edit-max

The Chinese mainland

2 times/minute

No limit for synchronous APIs

qwen-image-edit-max-2026-01-16

The Chinese mainland

2 times/minute

No limit for synchronous APIs

qwen-image-edit-plus

The Chinese mainland

2 times/second

No limit for synchronous APIs

qwen-image-edit-plus-2025-12-15

The Chinese mainland

2 times/second

No limit for synchronous APIs

qwen-image-edit-plus-2025-10-30

The Chinese mainland

2 times/second

No limit for synchronous APIs

qwen-image-edit

The Chinese mainland

2 times/second

No limit for synchronous APIs

qwen-mt-image

The Chinese mainland

1 time/second

2

Text-to-image - Z-Image

Singapore

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

RPS limit for task submission API

Number of concurrent tasks (concurrency)

z-image-turbo

International

2

No limit for synchronous APIs

China (Beijing)

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

RPS limit for task submission API

Number of concurrent tasks (concurrency)

z-image-turbo

The Chinese mainland

2

No limit for synchronous APIs

Wanxiang

Singapore

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

RPS limit for task submission API

Number of concurrent tasks (concurrency)

wan2.7-image-pro

International

5

5

wan2.7-image

International

5

5

wan2.6-image

International

5

5

wan2.6-t2i

International

5

5

wan2.5-t2i-preview

International

5

5

wan2.2-t2i-flash

International

2

2

wan2.2-t2i-plus

International

2

2

wan2.1-t2i-turbo

International

2

2

wan2.1-t2i-plus

International

2

2

wan2.5-i2i-preview

International

5

5
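Asynchronous image and video models carry two separate limits: the RPS of the task submission API and the number of tasks in flight at the same time. The Python asyncio sketch below enforces both on the client side. TaskGate is a hypothetical helper, and job stands for a hypothetical coroutine function that submits one task and polls until it finishes; it assumes a concurrency slot is held until the task completes, not only until it is submitted.

```python
import asyncio
import time


class TaskGate:
    """Client-side sketch that limits task submissions per second (RPS) and
    the number of tasks in flight at once (concurrency)."""

    def __init__(self, rps: float, concurrency: int):
        # Create inside a running event loop (e.g. from an async function).
        self._interval = 1.0 / rps
        self._next_slot = 0.0
        self._lock = asyncio.Lock()
        self._slots = asyncio.Semaphore(concurrency)

    async def run(self, job):
        """Run job() once both a concurrency slot and a submission slot are free."""
        async with self._slots:
            async with self._lock:
                now = time.monotonic()
                wait = self._next_slot - now
                if wait > 0:
                    await asyncio.sleep(wait)  # keep submissions 1/RPS seconds apart
                self._next_slot = max(now, self._next_slot) + self._interval
            return await job()


async def fake_task(i: int) -> int:
    await asyncio.sleep(0.05)  # hypothetical stand-in for submit + poll until done
    return i


async def main() -> None:
    gate = TaskGate(rps=5, concurrency=5)  # wan2.5-i2i-preview, Singapore
    results = await asyncio.gather(*(gate.run(lambda i=i: fake_task(i)) for i in range(8)))
    print(results)  # [0, 1, 2, 3, 4, 5, 6, 7]


asyncio.run(main())
```

Acquiring the concurrency slot before the submission lock means a caller waiting for a free slot does not block other callers' pacing.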

US (Virginia)

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

RPS limit for task submission API

Number of concurrent tasks (concurrency)

wan2.6-t2i

Global

5

5

wan2.6-image

Global

5

5

China (Beijing)

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

RPS limit for task submission API

Number of concurrent tasks (concurrency)

wan2.7-image-pro

The Chinese mainland

5

5

wan2.7-image

The Chinese mainland

5

5

wan2.6-image

The Chinese mainland

5

5

wan2.6-t2i

The Chinese mainland

1

5

wan2.5-t2i-preview

The Chinese mainland

5

5

wanx2.0-t2i-turbo

The Chinese mainland

2

2

wanx2.1-t2i-turbo

The Chinese mainland

2

2

wanx2.1-t2i-plus

The Chinese mainland

2

2

wan2.2-t2i-flash

The Chinese mainland

2

2

wan2.2-t2i-plus

The Chinese mainland

2

2

wan2.5-i2i-preview

The Chinese mainland

5

5

wanx2.1-imageedit

The Chinese mainland

2

2

Germany (Frankfurt)

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

RPS limit for task submission API

Number of concurrent tasks (concurrency)

wan2.6-t2i

Global

5

5

wan2.6-image

Global

5

5

OutfitAnyone

China (Beijing)

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

RPS limit for task submission API

Number of concurrent tasks (concurrency)

aitryon-plus

The Chinese mainland

10

5

aitryon-parsing-v1

The Chinese mainland

10

No limit for synchronous APIs

Video generation

HappyHorse series

Singapore

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

RPS limit for task submission API

Number of concurrent tasks (concurrency)

happyhorse-1.1-t2v

International

10

5

happyhorse-1.1-i2v

International

10

5

happyhorse-1.1-r2v

International

10

5

happyhorse-1.0-t2v

International

10

5

happyhorse-1.0-i2v

International

10

5

happyhorse-1.0-r2v

International

10

5

happyhorse-1.0-video-edit

International

10

5

US (Virginia)

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

RPS limit for task submission API

Number of concurrent tasks (concurrency)

happyhorse-1.0-t2v

Global

10

5

happyhorse-1.0-i2v

Global

10

5

happyhorse-1.0-r2v

Global

10

5

happyhorse-1.0-video-edit

Global

10

5

China (Beijing)

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

RPS limit for task submission API

Number of concurrent tasks (concurrency)

happyhorse-1.1-t2v

The Chinese mainland

10

5

happyhorse-1.1-i2v

The Chinese mainland

10

5

happyhorse-1.1-r2v

The Chinese mainland

10

5

happyhorse-1.0-t2v

The Chinese mainland

10

5

happyhorse-1.0-i2v

The Chinese mainland

10

5

happyhorse-1.0-r2v

The Chinese mainland

10

5

happyhorse-1.0-video-edit

The Chinese mainland

10

5

Germany (Frankfurt)

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

RPS limit for task submission API

Number of concurrent tasks (concurrency)

happyhorse-1.0-t2v

Global

10

5

happyhorse-1.0-i2v

Global

10

5

happyhorse-1.0-r2v

Global

10

5

happyhorse-1.0-video-edit

Global

10

5

Wanxiang series

Singapore

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

RPS limit for task submission API

Number of concurrent tasks (concurrency)

wan2.7-t2v-2026-04-25

International

5

5

wan2.7-t2v

International

5

5

wan2.6-t2v

International

5

5

wan2.5-t2v-preview

International

5

5

wan2.2-t2v-plus

International

2

2

wan2.1-t2v-turbo

International

2

2

wan2.1-t2v-plus

International

2

2

wan2.7-i2v-2026-04-25

International

5

5

wan2.7-i2v

International

5

5

wan2.6-i2v-flash

International

5

5

wan2.6-i2v

International

5

5

wan2.5-i2v-preview

International

5

5

wan2.2-i2v-flash

International

2

2

wan2.1-i2v-plus

International

2

2

wan2.1-i2v-turbo

International

2

2

wan2.2-i2v-plus

International

2

2

wan2.2-kf2v-flash

International

2

2

wan2.1-kf2v-plus

International

1

2

wan2.1-vace-plus

International

2

2

wan2.7-videoedit

International

5

5

wan2.7-r2v

International

5

5

wan2.6-r2v-flash

International

5

5

wan2.6-r2v

International

5

5

wan2.2-animate-move

International

5

1

wan2.2-animate-mix

International

5

1

US (Virginia)

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

RPS limit for task submission API

Number of concurrent tasks (concurrency)

wan2.6-t2v

Global

5

5

wan2.6-i2v

Global

5

5

wan2.6-r2v

Global

5

5

wan2.6-t2v-us

US

5

5

wan2.6-i2v-us

US

5

5

China (Beijing)

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

RPS limit for task submission API

Number of concurrent tasks (concurrency)

wan2.7-t2v-2026-04-25

The Chinese mainland

5

5

wan2.7-t2v

The Chinese mainland

5

5

wan2.6-t2v

The Chinese mainland

5

5

wan2.5-t2v-preview

The Chinese mainland

5

5

wan2.2-t2v-plus

The Chinese mainland

2

2

wanx2.1-t2v-turbo

The Chinese mainland

2

2

wanx2.1-t2v-plus

The Chinese mainland

2

2

wan2.7-i2v-2026-04-25

The Chinese mainland

5

5

wan2.7-i2v

The Chinese mainland

5

5

wan2.6-i2v-flash

The Chinese mainland

5

5

wan2.6-i2v

The Chinese mainland

5

5

wan2.5-i2v-preview

The Chinese mainland

5

5

wan2.2-i2v-plus

The Chinese mainland

2

2

wanx2.1-i2v-turbo

The Chinese mainland

2

2

wanx2.1-i2v-plus

The Chinese mainland

2

2

wan2.2-kf2v-flash

The Chinese mainland

2

2

wanx2.1-kf2v-plus

The Chinese mainland

2

2

wanx2.1-vace-plus

The Chinese mainland

2

2

wan2.7-videoedit

The Chinese mainland

5

5

wan2.7-r2v

The Chinese mainland

5

5

wan2.6-r2v-flash

The Chinese mainland

5

5

wan2.6-r2v

The Chinese mainland

5

5

wan2.2-s2v-detect

The Chinese mainland

5

No limit for synchronous APIs

wan2.2-s2v

The Chinese mainland

5

1

wan2.2-animate-move

The Chinese mainland

5

1

wan2.2-animate-mix

The Chinese mainland

5

1

Germany (Frankfurt)

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

RPS limit for task submission API

Number of concurrent tasks (concurrency)

wan2.6-t2v

Global

5

5

wan2.6-i2v

Global

5

5

wan2.6-r2v

Global

5

5

AnimateAnyone

China (Beijing)

Model name

Service deployment scope

RPS limit for task submission API

Number of concurrent tasks

animate-anyone-detect-gen2

The Chinese mainland

5

No limit for synchronous APIs

animate-anyone-template-gen2

The Chinese mainland

5

1

Only one job runs at a time; other queued jobs wait until it finishes.

animate-anyone-gen2

The Chinese mainland

5

1

Only one job runs at a time; other queued jobs wait until it finishes.
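When the concurrency is 1, jobs start one after another, so the last job in a batch finishes only after every earlier job has run. The Python sketch below estimates finish times for jobs submitted together. It assumes first-in, first-out scheduling and known run times. Neither is documented here, so treat the function `simulate_queue` and its inputs as hypothetical.

```python
import heapq
from typing import List


def simulate_queue(durations: List[float], concurrency: int = 1) -> List[float]:
    """Return each job's finish time for jobs all submitted at t=0.

    Jobs start in submission order as soon as one of `concurrency` slots is
    free; with concurrency=1 they run strictly one after another.
    """
    slots = [0.0] * concurrency  # time at which each slot next becomes free
    heapq.heapify(slots)
    finishes = []
    for duration in durations:
        start = heapq.heappop(slots)  # earliest available slot
        end = start + duration
        finishes.append(end)
        heapq.heappush(slots, end)  # the slot is busy until this job ends
    return finishes
```

For three jobs that take 30, 20, and 10 seconds, `simulate_queue([30, 20, 10], 1)` returns `[30.0, 50.0, 60.0]`: the third job waits 50 seconds before it starts.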

EMO

China (Beijing)

Model name

Service deployment scope

RPS limit for task submission API

Number of concurrent tasks

emo-detect-v1

The Chinese mainland

5

No limit for synchronous APIs

emo-v1

The Chinese mainland

5

1

Only one job runs at a time; other queued jobs wait until it finishes.

LivePortrait

China (Beijing)

Model name

Service deployment scope

RPS limit for task submission API

Number of concurrent tasks

liveportrait-detect

The Chinese mainland

5

No limit for synchronous APIs

liveportrait

The Chinese mainland

5

1

Only one job runs at a time; other queued jobs wait until it finishes.

VideoRetalk

China (Beijing)

Model name

Service deployment scope

RPS limit for task submission API

Number of concurrent tasks

videoretalk

The Chinese mainland

1

1

Only one job runs at a time; other queued jobs wait until it finishes.

Emoji

China (Beijing)

Model name

Service deployment scope

RPS limit for task submission API

Number of concurrent tasks

emoji-detect-v1

The Chinese mainland

1

No limit for synchronous APIs

emoji-v1

The Chinese mainland

1

1

Only one job runs at a time; other queued jobs wait until it finishes.

Video style transform

China (Beijing)

Model name

Service deployment scope

RPS limit for task submission API

Number of concurrent tasks

video-style-transform

The Chinese mainland

20

2

Up to two jobs run at a time; other queued jobs wait until a slot is free.

Music generation

China (Beijing)

Model name

Service deployment scope

Requests per minute (RPM)

fun-music-preview

The Chinese mainland

180

fun-music-v1

The Chinese mainland

180

Speech synthesis (text-to-speech)

Qwen speech synthesis

Singapore

Qwen3-TTS-Instruct-Flash

Model name

Service deployment scope

Requests per minute (RPM)

qwen3-tts-instruct-flash

International

180

qwen3-tts-instruct-flash-2026-01-26

International

180

Qwen3-TTS-VD

Model name

Service deployment scope

Requests per minute (RPM)

qwen3-tts-vd-2026-01-26

International

180

Qwen3-TTS-VC

Model name

Service deployment scope

Requests per minute (RPM)

qwen3-tts-vc-2026-01-22

International

180

Qwen3-TTS-Flash

Model name

Service deployment scope

Requests per minute (RPM)

qwen3-tts-flash

International

180

qwen3-tts-flash-2025-11-27

International

180

qwen3-tts-flash-2025-09-18

International

10

China (Beijing)

Qwen3-TTS-Instruct-Flash

Model name

Service deployment scope

Requests per minute (RPM)

qwen3-tts-instruct-flash

Mainland China

180

qwen3-tts-instruct-flash-2026-01-26

Mainland China

180

Qwen3-TTS-VD

Model name

Service deployment scope

Requests per minute (RPM)

qwen3-tts-vd-2026-01-26

Mainland China

180

Qwen3-TTS-VC

Model name

Service deployment scope

Requests per minute (RPM)

qwen3-tts-vc-2026-01-22

Mainland China

180

Qwen3-TTS-Flash

Model name

Service deployment scope

Requests per minute (RPM)

qwen3-tts-flash

Mainland China

180

qwen3-tts-flash-2025-11-27

Mainland China

180

qwen3-tts-flash-2025-09-18

Mainland China

10

Qwen-TTS

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).

Requests per minute (RPM)

Tokens per minute (TPM)

Includes input and output tokens.

qwen-tts

Mainland China

10

100,000

qwen-tts-latest

Mainland China

qwen-tts-2025-05-22

Mainland China

qwen-tts-2025-04-10

Mainland China
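Tables with both RPM and TPM columns trigger rate limiting when either value is exceeded, and the service may also apply the per-second equivalents. The Python sketch below keeps a 60-second sliding window on the client and refuses a request that would push either the request count or the token total over its limit. Several parts of it are assumptions for illustration: the class name `MinuteWindowLimiter`, the sliding-window approach, and the `tokens` argument. The `tokens` value has to be an estimate of input plus expected output tokens, because the client does not know the output size before the call.

```python
from collections import deque
from typing import Deque, Tuple


class MinuteWindowLimiter:
    """Hypothetical client-side check against per-minute RPM and TPM limits.

    A request is refused if admitting it would exceed EITHER limit.
    """

    def __init__(self, rpm: int, tpm: int) -> None:
        self.rpm = rpm
        self.tpm = tpm
        self._events: Deque[Tuple[float, int]] = deque()  # (timestamp, tokens)
        self._tokens_in_window = 0

    def _evict(self, now: float) -> None:
        # Forget requests that are 60 seconds old or older.
        while self._events and now - self._events[0][0] >= 60.0:
            _, tokens = self._events.popleft()
            self._tokens_in_window -= tokens

    def allow(self, now: float, tokens: int) -> bool:
        """Record the request and return True if it fits both limits."""
        self._evict(now)
        if len(self._events) + 1 > self.rpm:
            return False  # RPM would be exceeded
        if self._tokens_in_window + tokens > self.tpm:
            return False  # TPM would be exceeded
        self._events.append((now, tokens))
        self._tokens_in_window += tokens
        return True


def per_second_limits(rpm: int, tpm: int) -> Tuple[float, float]:
    """Per-second equivalents (RPS = RPM/60, TPS = TPM/60)."""
    return rpm / 60, tpm / 60
```

For qwen-tts, `per_second_limits(10, 100_000)` gives about 0.17 requests and 1,667 tokens per second. Spacing calls about six seconds apart (60/RPM) avoids bursts that a per-second check could reject even when the per-minute totals are within the limits.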

Qwen real-time speech synthesis

Singapore

Qwen3-TTS-Instruct-Flash-Realtime

Model name

Service deployment scope

Requests per minute (RPM)

qwen3-tts-instruct-flash-realtime

International

180

qwen3-tts-instruct-flash-realtime-2026-01-22

International

180

Qwen3-TTS-VD-Realtime

Model name

Service deployment scope

Requests per minute (RPM)

qwen3-tts-vd-realtime-2026-01-15

International

180

qwen3-tts-vd-realtime-2025-12-16

International

Qwen3-TTS-VC-Realtime

Model name

Service deployment scope

Requests per minute (RPM)

qwen3-tts-vc-realtime-2026-01-15

International

180

qwen3-tts-vc-realtime-2025-11-27

International

Qwen3-TTS-Flash-Realtime

Model name

Service deployment scope

Requests per minute (RPM)

qwen3-tts-flash-realtime

International

180

qwen3-tts-flash-realtime-2025-11-27

International

180

qwen3-tts-flash-realtime-2025-09-18

International

10

China (Beijing)

Qwen3-TTS-Instruct-Flash-Realtime

Model name

Service deployment scope

Requests per minute (RPM)

qwen3-tts-instruct-flash-realtime

Mainland China

180

qwen3-tts-instruct-flash-realtime-2026-01-22

Mainland China

180

Qwen3-TTS-VD-Realtime

Model name

Service deployment scope

Requests per minute (RPM)

qwen3-tts-vd-realtime-2026-01-15

Mainland China

180

qwen3-tts-vd-realtime-2025-12-16

Mainland China

Qwen3-TTS-VC-Realtime

Model name

Service deployment scope

Requests per minute (RPM)

qwen3-tts-vc-realtime-2026-01-15

Mainland China

180

qwen3-tts-vc-realtime-2025-11-27

Mainland China

Qwen3-TTS-Flash-Realtime

Model name

Service deployment scope

Requests per minute (RPM)

qwen3-tts-flash-realtime

Mainland China

180

qwen3-tts-flash-realtime-2025-11-27

Mainland China

180

qwen3-tts-flash-realtime-2025-09-18

Mainland China

10

Qwen-TTS-Realtime

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).

Requests per minute (RPM)

Tokens per minute (TPM)

Includes input and output tokens.

qwen-tts-realtime

Mainland China

10

100,000

qwen-tts-realtime-latest

Mainland China

qwen-tts-realtime-2025-07-15

Mainland China

Qwen voice cloning

Singapore

Model name

Service deployment scope

Requests per minute (RPM)

qwen-voice-enrollment

International

180

China (Beijing)

Model name

Service deployment scope

Requests per minute (RPM)

qwen-voice-enrollment

Mainland China

180

Qwen voice design

Singapore

Model name

Service deployment scope

Requests per minute (RPM)

qwen-voice-design

International

180

China (Beijing)

Model name

Service deployment scope

Requests per minute (RPM)

qwen-voice-design

Mainland China

180

CosyVoice speech synthesis

Singapore

Model name

Service deployment scope

Job submission API RPS limit

cosyvoice-v3-plus

International

3

cosyvoice-v3-flash

International

China (Beijing)

Model name

Service deployment scope

Job submission API RPS limit

cosyvoice-v3.5-plus

Mainland China

3

cosyvoice-v3.5-flash

Mainland China

cosyvoice-v3-plus

Mainland China

cosyvoice-v3-flash

Mainland China

cosyvoice-v2

Mainland China

CosyVoice voice cloning/design

CosyVoice voice cloning and voice design use the same model, voice-enrollment, and share a single rate limit quota.

Singapore

Model name

Service deployment scope

Job submission API RPS limit

voice-enrollment

International

10

China (Beijing)

Model name

Service deployment scope

Job submission API RPS limit

voice-enrollment

Mainland China

10
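Cloning and design both go through voice-enrollment, so they draw from the same 10 RPS quota. A client-side limiter therefore has to be keyed by model name rather than by feature. The Python sketch below illustrates this with a sliding one-second window. `PerModelRpsLimiter` is a hypothetical helper, not an SDK API.

```python
from collections import deque
from typing import Deque, Dict


class PerModelRpsLimiter:
    """Hypothetical client-side RPS limiter keyed by model name.

    Features that call the same model share one window, and therefore
    one quota.
    """

    def __init__(self, limits: Dict[str, int]) -> None:
        self.limits = limits  # model name -> allowed requests per second
        self._windows: Dict[str, Deque[float]] = {m: deque() for m in limits}

    def allow(self, model: str, now: float) -> bool:
        """Return True and record the call if the model's RPS allows it.

        `model` must be one of the names passed to the constructor.
        """
        window = self._windows[model]
        # Drop calls that are 1 s or older.
        while window and now - window[0] >= 1.0:
            window.popleft()
        if len(window) >= self.limits[model]:
            return False
        window.append(now)
        return True
```

For example, with `PerModelRpsLimiter({"voice-enrollment": 10})`, six cloning calls and four design calls in the same second use up the whole quota, and an eleventh call in that second is refused whichever feature makes it.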

Speech recognition (speech-to-text) and translation (speech to text in a specified language)

Qwen3-LiveTranslate-Flash

Singapore

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).

Requests per minute (RPM)

Tokens per minute (TPM)

Includes input and output tokens.

qwen3-livetranslate-flash

International

100

100,000

qwen3-livetranslate-flash-2025-12-01

International

6,000

1,000,000

China (Beijing)

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).

Requests per minute (RPM)

Tokens per minute (TPM)

Includes input and output tokens.

qwen3-livetranslate-flash

Mainland China

100

100,000

qwen3-livetranslate-flash-2025-12-01

Mainland China

Qwen-LiveTranslate-Flash-Realtime

Singapore

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).

Requests per minute (RPM)

Tokens per minute (TPM)

Includes input and output tokens.

qwen3.5-livetranslate-flash-realtime

International

10

100,000

qwen3.5-livetranslate-flash-realtime-2026-05-19

International

qwen3-livetranslate-flash-realtime

International

qwen3-livetranslate-flash-realtime-2025-09-22

International

China (Beijing)

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).

Requests per minute (RPM)

Tokens per minute (TPM)

Includes input and output tokens.

qwen3.5-livetranslate-flash-realtime

Mainland China

10

100,000

qwen3.5-livetranslate-flash-realtime-2026-05-19

Mainland China

qwen3-livetranslate-flash-realtime

Mainland China

qwen3-livetranslate-flash-realtime-2025-09-22

Mainland China

Qwen audio file recognition

Singapore

Qwen3-ASR-Flash-Filetrans

Model name

Service deployment scope

Requests per minute (RPM)

qwen3-asr-flash-filetrans

International

100

qwen3-asr-flash-filetrans-2025-11-17

International

Qwen3-ASR-Flash

Model name

Service deployment scope

Requests per minute (RPM)

qwen3-asr-flash

International

100

qwen3-asr-flash-2026-02-10

International

qwen3-asr-flash-2025-09-08

International

US (Virginia)

Model name

Service deployment scope

Requests per minute (RPM)

qwen3-asr-flash-us

US

100

qwen3-asr-flash-2025-09-08-us

US

China (Beijing)

Qwen3-ASR-Flash-Filetrans

Model name

Service deployment scope

Requests per minute (RPM)

qwen3-asr-flash-filetrans

Mainland China

100

qwen3-asr-flash-filetrans-2025-11-17

Mainland China

Qwen3-ASR-Flash

Model name

Service deployment scope

Requests per minute (RPM)

qwen3-asr-flash

Mainland China

100

qwen3-asr-flash-2026-02-10

Mainland China

qwen3-asr-flash-2025-09-08

Mainland China

Qwen real-time speech recognition

Singapore

Model name

Service deployment scope

Requests per second (RPS)

qwen3-asr-flash-realtime

International

20

qwen3-asr-flash-realtime-2026-02-10

International

qwen3-asr-flash-realtime-2025-10-27

International

China (Beijing)

Model name

Service deployment scope

Requests per second (RPS)

qwen3-asr-flash-realtime

Mainland China

20

qwen3-asr-flash-realtime-2026-02-10

Mainland China

qwen3-asr-flash-realtime-2025-10-27

Mainland China

Paraformer speech recognition

China (Beijing)

Model name

Service deployment scope

Job submission API RPS limit

paraformer-realtime-v2

Mainland China

20

paraformer-realtime-8k-v2

Mainland China

Model name

Service deployment scope

Requests per minute (RPM)

paraformer-v2

Mainland China

1,200

Model name

Service deployment scope

Job submission API RPS limit

Number of tasks being processed simultaneously (concurrency)

paraformer-8k-v2

Mainland China

20

100

Fun-ASR audio file recognition

Singapore

Model name

Service deployment scope

Requests per minute (RPM)

fun-asr

International

600

fun-asr-2025-11-07

International

600

fun-asr-2025-08-25

International

600

fun-asr-mtl

International

100

fun-asr-mtl-2025-08-25

International

100

fun-asr-flash-2026-06-15

International

600

China (Beijing)

Model name

Service deployment scope

Requests per minute (RPM)

fun-asr

Mainland China

600

fun-asr-2025-11-07

Mainland China

fun-asr-2025-08-25

Mainland China

fun-asr-mtl

Mainland China

fun-asr-mtl-2025-08-25

Mainland China

fun-asr-flash-2026-06-15

Mainland China

Fun-ASR real-time speech recognition

Singapore

Model name

Service deployment scope

Job submission API RPS limit

fun-asr-realtime

International

20

fun-asr-realtime-2025-11-07

International

China (Beijing)

Model name

Service deployment scope

Job submission API RPS limit

fun-asr-realtime

Mainland China

20

fun-asr-realtime-2026-02-28

Mainland China

fun-asr-realtime-2025-11-07

Mainland China

fun-asr-realtime-2025-09-15

Mainland China

fun-asr-flash-8k-realtime

Mainland China

fun-asr-flash-8k-realtime-2026-01-28

Mainland China

Text embedding

Singapore

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).

Requests per minute (RPM)

Tokens per minute (TPM)/Number of jobs

Includes input and output tokens.

text-embedding-v4

International

1,800

1,000,000

text-embedding-v3

International

6,000

24,000,000

China (Beijing)

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

Requests per second (RPS)

Tokens per minute (TPM)/Number of jobs

Includes input and output tokens.

text-embedding-v4

These rate limits do not apply to calls made through the Batch API.

The Chinese mainland

30

1,200,000

Hong Kong (China)

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).

Requests per minute (RPM)

Tokens per minute (TPM)/Number of jobs

Includes input and output tokens.

text-embedding-v4

Hong Kong (China)

1,800

1,000,000
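Here is a worked check of the RPS = RPM/60 conversion. The Singapore and Hong Kong (China) tables list text-embedding-v4 at 1,800 RPM, and the China (Beijing) table lists it at 30 RPS. On average these request limits are the same. Only the TPM values differ (1,000,000 versus 1,200,000). Note that this shows equal average rates only: a per-second limit still rejects bursts that a per-minute limit alone would allow.

```latex
\text{RPS} = \frac{\text{RPM}}{60} = \frac{1{,}800}{60} = 30,
\qquad
\text{TPS} = \frac{\text{TPM}}{60} = \frac{1{,}000{,}000}{60} \approx 16{,}667
```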

Multimodal embedding

Singapore

Model name

Service deployment scope

Rate limiting conditions

The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).

Requests per minute (RPM)

Tokens per minute (TPM)

Input tokens only.

tongyi-embedding-vision-plus

International

600

200,000

tongyi-embedding-vision-flash

International

600

200,000

China (Beijing)

Model name

Service deployment scope

Rate limiting conditions

The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).

Requests per minute (RPM)

Tokens per minute (TPM)

Input tokens only.

qwen3-vl-embedding

The Chinese mainland

2,400

1,200,000

multimodal-embedding-v1

The Chinese mainland

120

100,000

Reranking

Singapore

Model name

Service deployment scope

Rate limiting conditions

The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).

Requests per minute (RPM)

Tokens per minute (TPM)

Input tokens only.

qwen3-rerank

International

5,400

5,000,000,000

China (Beijing)

Model name

Service deployment scope

Rate limiting conditions

The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).

Requests per minute (RPM)

Tokens per minute (TPM)

Input tokens only.

qwen3-vl-rerank

The Chinese mainland

600

9,000,000

gte-rerank-v2

The Chinese mainland

5,040

4,980,000,000

Industry

Intent recognition

China (Beijing)

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).

Requests per minute (RPM)

Tokens per minute (TPM)

Includes input and output tokens.

tongyi-intent-detect-v3

The Chinese mainland

1,200

1,000,000

Role-playing

Singapore

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).

Requests per minute (RPM)

Tokens per minute (TPM)

Includes input and output tokens.

qwen-plus-character

International

120

500,000

qwen-flash-character

International

120

500,000

qwen-plus-character-ja

International

120

500,000

China (Beijing)

Model name

Service deployment scope

Rate limiting conditions (triggered if any value is exceeded)

The following limits are per minute. The service may also enforce limits based on requests per second (RPS = RPM/60) and tokens per second (TPS = TPM/60).

Requests per minute (RPM)

Tokens per minute (TPM)

Includes input and output tokens.

qwen-plus-character

The Chinese mainland

120

500,000

qwen-flash-character

The Chinese mainland

120

500,000

Offline models

For more information, see Model unpublishing policy.

Offline on January 30, 2026

Category

Model name

Rate limiting conditions (triggered if any value is exceeded)

Requests per minute (RPM)

Tokens per minute (TPM)

Includes input and output tokens.

Qwen-Plus

qwen-plus-2024-11-27

0

0

qwen-plus-2024-11-25

qwen-plus-2024-09-19

qwen-plus-2024-08-06

Qwen-Turbo

qwen-turbo-2024-09-19

Qwen-VL

qwen-vl-max-2024-10-30

qwen-vl-max-2024-08-09

qwen-vl-plus-2024-08-09

Offline on August 20, 2025

Category

Model name

Rate limiting conditions (triggered if any value is exceeded)

Requests per minute (RPM)

Tokens per minute (TPM)

Includes input and output tokens.

Text generation - Qwen

qwen2-72b-instruct

0

0

qwen2-57b-a14b-instruct

qwen2-7b-instruct

qwen1.5-110b-chat

qwen1.5-72b-chat

qwen1.5-32b-chat

qwen1.5-14b-chat

qwen1.5-7b-chat