Text generation - Qwen
Qwen-Max
Billing is based on the number of input and output tokens.
If a model supports batch calling, batch requests are billed at 50% of the real-time price for both input and output tokens. If a model supports context cache, the cache discount applies to input tokens only. The two discounts cannot be combined.
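As a rough illustration of the token-based billing and the 50% batch rule, the sketch below computes the cost of a single request. The function name `request_cost` and its parameters are hypothetical, and the prices are the qwen-max International rates listed in this section. The context cache discount is left out because its rate is not stated here.

```python
from decimal import Decimal

TOKENS_PER_MILLION = Decimal(1_000_000)
BATCH_MULTIPLIER = Decimal("0.5")  # batch price is 50% of the real-time price


def request_cost(input_tokens, output_tokens, input_price, output_price, batch=False):
    """Return the USD cost of one request.

    input_price and output_price are USD per 1M tokens, passed as strings
    or Decimals to avoid floating-point rounding.
    """
    cost = (
        Decimal(input_tokens) * Decimal(input_price)
        + Decimal(output_tokens) * Decimal(output_price)
    ) / TOKENS_PER_MILLION
    return cost * BATCH_MULTIPLIER if batch else cost


# qwen-max, International: $1.6 input and $6.4 output per 1M tokens.
realtime = request_cost(10_000, 2_000, "1.6", "6.4")
batched = request_cost(10_000, 2_000, "1.6", "6.4", batch=True)
print(realtime, batched)
```

Using `Decimal` rather than `float` keeps sub-cent amounts exact, which matters when many small requests are summed.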
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Chinese Mainland.
Model | Mode | Input tokens per request | Input price (per 1M tokens) | Output price (per 1M tokens, CoT + response) | Free quota (Note) |
qwen3-max Batch calling 50% off Context cache discount applicable | Thinking and non-thinking | 0<Token≤32K | $1.2 | $6 | 1 million tokens each. Validity: 90 days after activating Model Studio |
32K<Token≤128K | $2.4 | $12 | |||
128K<Token≤252K | $3 | $15 | |||
qwen3-max-2026-01-23 | Thinking and non-thinking | 0<Token≤32K | $1.2 | $6 | |
32K<Token≤128K | $2.4 | $12 | |||
128K<Token≤252K | $3 | $15 | |||
qwen3-max-2025-09-23 | Non-thinking only | 0<Token≤32K | $1.2 | $6 | |
32K<Token≤128K | $2.4 | $12 | |||
128K<Token≤252K | $3 | $15 | |||
qwen3-max-preview Context cache discount applicable | Thinking and non-thinking | 0<Token≤32K | $1.2 | $6 | |
32K<Token≤128K | $2.4 | $12 | |||
128K<Token≤252K | $3 | $15 |
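The tiered rows above can be read as a lookup keyed on the input tokens of a request. The following is a minimal sketch under two assumptions that the table does not spell out: "K" is taken to mean 1,000 tokens, and the whole request (input and output) is billed at the rates of the tier its input size falls into. The names `QWEN3_MAX_INTL_TIERS` and `tiered_cost` are illustrative only.

```python
from decimal import Decimal

K = 1_000  # assumption: "32K" means 32,000 tokens

# (upper bound on input tokens, input USD per 1M, output USD per 1M)
# for qwen3-max in the International deployment mode.
QWEN3_MAX_INTL_TIERS = [
    (32 * K, Decimal("1.2"), Decimal("6")),
    (128 * K, Decimal("2.4"), Decimal("12")),
    (252 * K, Decimal("3"), Decimal("15")),
]


def tiered_cost(input_tokens, output_tokens, tiers):
    """Pick the tier by input size, then bill the whole request at its rates."""
    if input_tokens <= 0:
        raise ValueError("input_tokens must be positive")
    for upper, in_price, out_price in tiers:
        if input_tokens <= upper:
            return (input_tokens * in_price + output_tokens * out_price) / 1_000_000
    raise ValueError("input exceeds the largest listed tier")


print(tiered_cost(20_000, 1_000, QWEN3_MAX_INTL_TIERS))   # first tier
print(tiered_cost(100_000, 1_000, QWEN3_MAX_INTL_TIERS))  # second tier
```

Under these assumptions, crossing a tier boundary raises the price of every token in the request, including the output tokens, so trimming a prompt just below a boundary can lower the bill noticeably.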
More models
Model | Mode | Input tokens per request | Input price (per 1M tokens) | Output price (per 1M tokens) | Free quota (Note) |
qwen-max Batch calling 50% off | Non-thinking only | No tiered pricing | $1.6 | $6.4 | 1 million tokens each |
qwen-max-latest | Non-thinking only | No tiered pricing | $1.6 | $6.4 | |
qwen-max-2025-01-25 | Non-thinking only | No tiered pricing | $1.6 | $6.4 |
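The free quota can be modelled as an allowance that is consumed before any tokens become billable. The sketch below is hypothetical: it assumes "1 million tokens each" means separate allowances for input and output tokens, applies the 90-day validity stated for qwen3-max, and treats the 90th day after activation as still valid. How the platform actually deducts quota may differ.

```python
from datetime import date, timedelta

FREE_QUOTA = 1_000_000         # tokens, per the "Free quota" column
VALIDITY = timedelta(days=90)  # counted from Model Studio activation


def billable_tokens(tokens, quota_left, activated_on, today):
    """Return (billable tokens, quota left afterwards) for one token type."""
    if today > activated_on + VALIDITY:  # assumption: the 90th day is still valid
        return tokens, 0
    covered = min(tokens, quota_left)
    return tokens - covered, quota_left - covered


activated = date(2026, 1, 1)
print(billable_tokens(300_000, FREE_QUOTA, activated, date(2026, 2, 1)))
print(billable_tokens(300_000, 200_000, activated, date(2026, 2, 1)))
print(billable_tokens(300_000, FREE_QUOTA, activated, date(2026, 4, 2)))
```

Call the function once for input tokens and once for output tokens, each with its own remaining allowance.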
Global
In the Global deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are dynamically scheduled worldwide.
Models in the Global deployment mode do not have a free quota.
Model | Mode | Input tokens per request | Input price (per 1M tokens) | Output price (per 1M tokens, CoT + response) |
qwen3-max Context cache discount applicable | Non-thinking only | 0<Token≤32K | $0.359 | $1.434 |
32K<Token≤128K | $0.574 | $2.294 | ||
128K<Token≤252K | $1.004 | $4.014 | ||
qwen3-max-2025-09-23 | Non-thinking only | 0<Token≤32K | $0.861 | $3.441 |
32K<Token≤128K | $1.434 | $5.735 | ||
128K<Token≤252K | $2.151 | $8.602 | ||
qwen3-max-preview Context cache discount applicable | Thinking and non-thinking | 0<Token≤32K | $0.861 | $3.441 |
32K<Token≤128K | $1.434 | $5.735 | ||
128K<Token≤252K | $2.151 | $8.602 |
Chinese Mainland
In the Chinese Mainland deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Chinese Mainland.
Models in the Chinese Mainland deployment mode do not have a free quota.
Model | Mode | Input tokens per request | Input price (per 1M tokens) | Output price (per 1M tokens, CoT + response) |
qwen3-max Batch calling 50% off Context cache discount applicable | Thinking and non-thinking | 0<Token≤32K | $0.359 | $1.434 |
32K<Token≤128K | $0.574 | $2.294 | ||
128K<Token≤252K | $1.004 | $4.014 | ||
qwen3-max-2026-01-23 | Thinking and non-thinking | 0<Token≤32K | $0.359 | $1.434 |
32K<Token≤128K | $0.574 | $2.294 | ||
128K<Token≤252K | $1.004 | $4.014 | ||
qwen3-max-2025-09-23 | Non-thinking only | 0<Token≤32K | $0.861 | $3.441 |
32K<Token≤128K | $1.434 | $5.735 | ||
128K<Token≤252K | $2.151 | $8.602 | ||
qwen3-max-preview Context cache discount applicable | Thinking and non-thinking | 0<Token≤32K | $0.861 | $3.441 |
32K<Token≤128K | $1.434 | $5.735 | ||
128K<Token≤252K | $2.151 | $8.602 |
More models
Model | Mode | Input tokens per request | Input price (per 1M tokens) | Output price (per 1M tokens) |
qwen-max | Non-thinking only | No tiered pricing | $0.345 | $1.377 |
qwen-max-latest | Non-thinking only | No tiered pricing | $0.345 | $1.377 |
qwen-max-2025-01-25 | Non-thinking only | No tiered pricing | $0.345 | $1.377 |
qwen-max-2024-09-19 | Non-thinking only | No tiered pricing | $2.868 | $8.602 |
Qwen-Plus
Billing is based on the number of input and output tokens.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Chinese Mainland.
Model | Input tokens per request | Input price (per 1M tokens) | Output price, non-thinking mode (per 1M tokens) | Output price, thinking mode (per 1M tokens, CoT + response) | Free quota (Note) |
qwen3.5-plus | 0<Token≤256K | $0.4 | $2.4 | $2.4 | 1 million tokens each |
256K<Token≤1M | $0.5 | $3 | $3 | ||
qwen3.5-plus-2026-02-15 | 0<Token≤256K | $0.4 | $2.4 | $2.4 | |
256K<Token≤1M | $0.5 | $3 | $3 | ||
qwen-plus | 0<Token≤256K | $0.4 | $1.2 | $4 | |
256K<Token≤1M | $1.2 | $3.6 | $12 | ||
qwen-plus-latest | 0<Token≤256K | $0.4 | $1.2 | $4 | |
256K<Token≤1M | $1.2 | $3.6 | $12 | ||
qwen-plus-2025-12-01 | 0<Token≤256K | $0.4 | $1.2 | $4 | |
256K<Token≤1M | $1.2 | $3.6 | $12 | ||
qwen-plus-2025-09-11 | 0<Token≤256K | $0.4 | $1.2 | $4 | |
256K<Token≤1M | $1.2 | $3.6 | $12 | ||
qwen-plus-2025-07-28 | 0<Token≤256K | $0.4 | $1.2 | $4 | |
256K<Token≤1M | $1.2 | $3.6 | $12 | ||
qwen-plus-2025-07-14 | No tiered pricing | $0.4 | $1.2 | $4 | |
qwen-plus-2025-04-28 | No tiered pricing | $0.4 | $1.2 | $4 | |
qwen-plus-2025-01-25 | No tiered pricing | $0.4 | $1.2 | - | |
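For Qwen-Plus, the output rate depends on whether the request runs in thinking mode, and in thinking mode the billed output covers both the chain of thought and the final response. A small sketch of that distinction, using the qwen-plus International rates for the 0<Token≤256K tier; the names `OUTPUT_PRICE` and `qwen_plus_cost` are made up for this example.

```python
from decimal import Decimal

# qwen-plus, International, 0<Token≤256K tier (USD per 1M tokens).
INPUT_PRICE = Decimal("0.4")
OUTPUT_PRICE = {"non_thinking": Decimal("1.2"), "thinking": Decimal("4")}


def qwen_plus_cost(input_tokens, output_tokens, mode):
    """Return the USD cost of one request.

    In "thinking" mode, output_tokens must include the chain-of-thought
    tokens as well as the response tokens.
    """
    if mode not in OUTPUT_PRICE:
        raise ValueError(f"unknown mode: {mode!r}")
    total = input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE[mode]
    return total / 1_000_000


print(qwen_plus_cost(50_000, 4_000, "non_thinking"))
print(qwen_plus_cost(50_000, 4_000, "thinking"))
```

The two calls use the same output count only to isolate the rate difference. In practice a thinking-mode request usually produces more output tokens because the chain of thought is counted.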
Global
In the Global deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are dynamically scheduled worldwide.
Models in the Global deployment mode do not have a free quota.
Model | Input tokens per request | Input price (per 1M tokens) | Output price, non-thinking mode (per 1M tokens) | Output price, thinking mode (per 1M tokens, CoT + response) |
qwen3.5-plus | 0<Token≤128K | $0.115 | $0.688 | $0.688 |
128K<Token≤256K | $0.287 | $1.72 | $1.72 | |
256K<Token≤1M | $0.573 | $3.44 | $3.44 | |
qwen3.5-plus-2026-02-15 | 0<Token≤128K | $0.115 | $0.688 | $0.688 |
128K<Token≤256K | $0.287 | $1.72 | $1.72 | |
256K<Token≤1M | $0.573 | $3.44 | $3.44 | |
qwen-plus | 0<Token≤128K | $0.115 | $0.287 | $1.147 |
128K<Token≤256K | $0.345 | $2.868 | $3.441 | |
256K<Token≤1M | $0.689 | $6.881 | $9.175 | |
qwen-plus-2025-12-01 | 0<Token≤128K | $0.115 | $0.287 | $1.147 |
128K<Token≤256K | $0.345 | $2.868 | $3.441 | |
256K<Token≤1M | $0.689 | $6.881 | $9.175 | |
qwen-plus-2025-09-11 | 0<Token≤128K | $0.115 | $0.287 | $1.147 |
128K<Token≤256K | $0.345 | $2.868 | $3.441 | |
256K<Token≤1M | $0.689 | $6.881 | $9.175 | |
qwen-plus-2025-07-28 | 0<Token≤128K | $0.115 | $0.287 | $1.147 |
128K<Token≤256K | $0.345 | $2.868 | $3.441 | |
256K<Token≤1M | $0.689 | $6.881 | $9.175 | |
US
In the US deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are limited to the United States.
Models in the US deployment mode do not have a free quota.
Model | Input tokens per request | Input price (per 1M tokens) | Output price, non-thinking mode (per 1M tokens) | Output price, thinking mode (per 1M tokens, CoT + response) |
qwen-plus-us | 0<Token≤256K | $0.4 | $1.2 | $4 |
256K<Token≤1M | $1.2 | $3.6 | $12 | |
qwen-plus-2025-12-01-us | 0<Token≤256K | $0.4 | $1.2 | $4 |
256K<Token≤1M | $1.2 | $3.6 | $12 | |
Chinese Mainland
In the Chinese Mainland deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Chinese Mainland.
Models in the Chinese Mainland deployment mode do not have a free quota.
Model | Input tokens per request | Input price (per 1M tokens) | Output price, non-thinking mode (per 1M tokens) | Output price, thinking mode (per 1M tokens, CoT + response) |
qwen3.5-plus | 0<Token≤128K | $0.115 | $0.688 | $0.688 |
128K<Token≤256K | $0.287 | $1.72 | $1.72 | |
256K<Token≤1M | $0.573 | $3.44 | $3.44 | |
qwen3.5-plus-2026-02-15 | 0<Token≤128K | $0.115 | $0.688 | $0.688 |
128K<Token≤256K | $0.287 | $1.72 | $1.72 | |
256K<Token≤1M | $0.573 | $3.44 | $3.44 | |
qwen-plus | 0<Token≤128K | $0.115 | $0.287 | $1.147 |
128K<Token≤256K | $0.345 | $2.868 | $3.441 | |
256K<Token≤1M | $0.689 | $6.881 | $9.175 | |
qwen-plus-latest | 0<Token≤128K | $0.115 | $0.287 | $1.147 |
128K<Token≤256K | $0.345 | $2.868 | $3.441 | |
256K<Token≤1M | $0.689 | $6.881 | $9.175 | |
qwen-plus-2025-12-01 | 0<Token≤128K | $0.115 | $0.287 | $1.147 |
128K<Token≤256K | $0.345 | $2.868 | $3.441 | |
256K<Token≤1M | $0.689 | $6.881 | $9.175 | |
qwen-plus-2025-09-11 | 0<Token≤128K | $0.115 | $0.287 | $1.147 |
128K<Token≤256K | $0.345 | $2.868 | $3.441 | |
256K<Token≤1M | $0.689 | $6.881 | $9.175 | |
qwen-plus-2025-07-28 | 0<Token≤128K | $0.115 | $0.287 | $1.147 |
128K<Token≤256K | $0.345 | $2.868 | $3.441 | |
256K<Token≤1M | $0.689 | $6.881 | $9.175 | |
qwen-plus-2025-07-14 | No tiered pricing | $0.115 | $0.287 | $1.147 |
qwen-plus-2025-04-28 | No tiered pricing | $0.115 | $0.287 | $1.147 |
More models
Model | Input tokens per request | Input price (per 1M tokens) | Output price (per 1M tokens) |
qwen-plus-2025-01-25 | No tiered pricing | $0.115 | $0.287 |
qwen-plus-2025-01-12 | No tiered pricing | $0.115 | $0.287 |
qwen-plus-2024-12-20 | No tiered pricing | $0.115 | $0.287 |
Qwen-Flash
Billing is based on the number of input and output tokens.
If a model supports batch calling, batch requests are billed at 50% of the real-time price for both input and output tokens. If a model supports context cache, the cache discount applies to input tokens only. The two discounts cannot be combined.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Chinese Mainland.
Model | Input tokens per request | Input price (per 1M tokens) | Output price (per 1M tokens) | Free quota (Note) |
qwen3.5-flash Batch calling 50% off Context cache discount | 0<Token≤1M | $0.1 | $0.4 | 1 million tokens each |
qwen3.5-flash-2026-02-23 | 0<Token≤1M | $0.1 | $0.4 | |
qwen-flash Batch calling 50% off Context cache discount applicable | 0<Token≤256K | $0.05 | $0.4 | |
256K<Token≤1M | $0.25 | $2 | ||
qwen-flash-2025-07-28 | 0<Token≤256K | $0.05 | $0.4 | |
256K<Token≤1M | $0.25 | $2 |
Global
In the Global deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are dynamically scheduled worldwide.
Models in the Global deployment mode do not have a free quota.
Model | Input tokens per request | Input price (per 1M tokens) | Output price (per 1M tokens) |
qwen3.5-flash | 0<Token≤128K | $0.029 | $0.287 |
128K<Token≤256K | $0.115 | $1.147 | |
256K<Token≤1M | $0.172 | $1.72 | |
qwen3.5-flash-2026-02-23 | 0<Token≤128K | $0.029 | $0.287 |
128K<Token≤256K | $0.115 | $1.147 | |
256K<Token≤1M | $0.172 | $1.72 | |
qwen-flash Context cache discount applicable | 0<Token≤128K | $0.022 | $0.216 |
128K<Token≤256K | $0.087 | $0.861 | |
256K<Token≤1M | $0.173 | $1.721 | |
qwen-flash-2025-07-28 | 0<Token≤128K | $0.022 | $0.216 |
128K<Token≤256K | $0.087 | $0.861 | |
256K<Token≤1M | $0.173 | $1.721 |
US
In the US deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are limited to the United States.
Models in the US deployment mode do not have a free quota.
Model | Input tokens per request | Input price (per 1M tokens) | Output price (per 1M tokens) |
qwen-flash | 0<Token≤256K | $0.05 | $0.4 |
256K<Token≤1M | $0.25 | $2 | |
qwen-flash-2025-07-28 | 0<Token≤256K | $0.05 | $0.4 |
256K<Token≤1M | $0.25 | $2 |
Chinese Mainland
In the Chinese Mainland deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Chinese Mainland.
Models in the Chinese Mainland deployment mode do not have a free quota.
Model | Input tokens per request | Input price (per 1M tokens) | Output price (per 1M tokens) |
qwen3.5-flash | 0<Token≤128K | $0.029 | $0.287 |
128K<Token≤256K | $0.115 | $1.147 | |
256K<Token≤1M | $0.172 | $1.72 | |
qwen3.5-flash-2026-02-23 | 0<Token≤128K | $0.029 | $0.287 |
128K<Token≤256K | $0.115 | $1.147 | |
256K<Token≤1M | $0.172 | $1.72 | |
qwen-flash Context cache discount applicable | 0<Token≤128K | $0.022 | $0.216 |
128K<Token≤256K | $0.087 | $0.861 | |
256K<Token≤1M | $0.173 | $1.721 | |
qwen-flash-2025-07-28 | 0<Token≤128K | $0.022 | $0.216 |
128K<Token≤256K | $0.087 | $0.861 | |
256K<Token≤1M | $0.173 | $1.721 |
Qwen-Turbo
Qwen-Turbo will no longer be updated. We recommend Qwen-Flash instead.
Billing is based on the number of input and output tokens.
If a model supports batch calling, batch requests are billed at 50% of the real-time price for both input and output tokens.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Chinese Mainland.
Model | Input price (per 1M tokens) | Output price, non-thinking mode (per 1M tokens) | Output price, thinking mode (per 1M tokens, CoT + response) | Free quota (Note) |
qwen-turbo Batch calling 50% off | $0.05 | $0.2 | $0.5 | 1 million tokens each |
qwen-turbo-latest | $0.05 | $0.2 | $0.5 | |
qwen-turbo-2025-04-28 | $0.05 | $0.2 | $0.5 | |
More models
Model | Input price (per 1M tokens) | Output price (per 1M tokens) | Free quota (Note) |
qwen-turbo-2024-11-01 | $0.05 | $0.2 | 1 million tokens each |
Chinese Mainland
In the Chinese Mainland deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Chinese Mainland.
Models in the Chinese Mainland deployment mode do not have a free quota.
Model | Input price (per 1M tokens) | Output price, non-thinking mode (per 1M tokens) | Output price, thinking mode (per 1M tokens, CoT + response) |
qwen-turbo | $0.044 | $0.087 | $0.431 |
qwen-turbo-latest | $0.044 | $0.087 | $0.431 |
qwen-turbo-2025-07-15 | $0.044 | $0.087 | $0.431 |
qwen-turbo-2025-04-28 | $0.044 | $0.087 | $0.431 |
QwQ
Billing is based on the number of input and output tokens.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Chinese Mainland.
Model | Input price (per 1M tokens) | Output price (per 1M tokens) | Free quota (Note) |
qwq-plus | $0.8 | $2.4 | 1 million tokens |
Chinese Mainland
In the Chinese Mainland deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Chinese Mainland.
Models in the Chinese Mainland deployment mode do not have a free quota.
Model | Input price (per 1M tokens) | Output price (per 1M tokens) |
qwq-plus | $0.230 | $0.574 |
qwq-plus-latest | $0.230 | $0.574 |
qwq-plus-2025-03-05 | $0.230 | $0.574 |
Qwen-Long
Only the Chinese Mainland deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Chinese Mainland.
Billing is based on the number of input and output tokens.
Model | Input price (per 1M tokens) | Output price (per 1M tokens) | Free quota (Note) |
qwen-long-latest | $0.072 | $0.287 | No free quota |
qwen-long-2025-01-25 | $0.072 | $0.287 |
Qwen-Omni
Billing rule: Charges are calculated per input and output token. For token calculation rules for different modalities, see Qwen-Omni.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Chinese Mainland.
Model | Mode | Input: Text (per 1M tokens) | Input: Audio (per 1M tokens) | Input: Image/Video (per 1M tokens) | Output: Text, plain text input (per 1M tokens) | Output: Text, multimodal input (per 1M tokens) | Output: Text+Audio, only audio is billed (per 1M tokens) | Free quota (Note) |
qwen3-omni-flash | Thinking and non-thinking | $0.43 | $3.81 | $0.78 | $1.66 | $3.06 | $15.11 | 1 million tokens each (regardless of modality). Validity: 90 days after activating Model Studio |
qwen3-omni-flash-2025-12-01 | Thinking and non-thinking | $0.43 | $3.81 | $0.78 | $1.66 | $3.06 | $15.11 | |
qwen3-omni-flash-2025-09-15 | Thinking and non-thinking | $0.43 | $3.81 | $0.78 | $1.66 | $3.06 | $15.11 | |
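Because Qwen-Omni prices each modality separately, a request's cost is a sum over modalities, and the output rate depends on what was sent in. The sketch below uses the qwen3-omni-flash International rates. It assumes that "only audio is billed" means text tokens in a text+audio response are not charged, and that any audio or image/video input makes the request "multimodal" for output pricing. The `omni_cost` function and its parameters are hypothetical.

```python
from decimal import Decimal

# qwen3-omni-flash, International (USD per 1M tokens).
PRICE = {
    "in_text": Decimal("0.43"),
    "in_audio": Decimal("3.81"),
    "in_visual": Decimal("0.78"),             # image/video input
    "out_text_plain": Decimal("1.66"),        # text output, plain text input
    "out_text_multimodal": Decimal("3.06"),   # text output, multimodal input
    "out_audio": Decimal("15.11"),            # text+audio output, audio billed
}


def omni_cost(text_in=0, audio_in=0, visual_in=0, text_out=0, audio_out=0):
    """Return the USD cost of one request from per-modality token counts."""
    total = (
        text_in * PRICE["in_text"]
        + audio_in * PRICE["in_audio"]
        + visual_in * PRICE["in_visual"]
    )
    if audio_out > 0:
        # Assumption: text tokens in a text+audio response are not billed.
        total += audio_out * PRICE["out_audio"]
    elif audio_in > 0 or visual_in > 0:
        total += text_out * PRICE["out_text_multimodal"]
    else:
        total += text_out * PRICE["out_text_plain"]
    return total / 1_000_000


print(omni_cost(text_in=1_000, text_out=500))
print(omni_cost(text_in=1_000, audio_in=2_000, text_out=500))
print(omni_cost(text_in=1_000, text_out=200, audio_out=1_000))
```

Token counts per modality have to come from the API's usage report or from the modality-specific token calculation rules, which this sketch does not reproduce.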
More models
Model | Input: Text (per 1M tokens) | Input: Audio (per 1M tokens) | Input: Image/Video (per 1M tokens) | Output: Text, plain text input (per 1M tokens) | Output: Text, multimodal input (per 1M tokens) | Output: Text+Audio, only audio is billed (per 1M tokens) | Free quota (Note) |
qwen-omni-turbo | $0.07 | $4.44 | $0.21 | $0.27 | $0.63 | $8.89 | 1 million tokens each (regardless of modality). Validity: 90 days after activating Model Studio |
qwen-omni-turbo-latest | $0.07 | $4.44 | $0.21 | $0.27 | $0.63 | $8.89 | |
qwen-omni-turbo-2025-03-26 | $0.07 | $4.44 | $0.21 | $0.27 | $0.63 | $8.89 | |
Chinese Mainland
In the Chinese Mainland deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Chinese Mainland.
Models in the Chinese Mainland deployment mode do not have a free quota.
Model | Mode | Input: Text (per 1M tokens) | Input: Audio, audio part is billed separately (per 1M tokens) | Input: Image/Video (per 1M tokens) | Output: Text, plain text input (per 1M tokens) | Output: Text, multimodal input (per 1M tokens) | Output: Text+Audio, only audio is billed (per 1M tokens) |
qwen3-omni-flash | Thinking and non-thinking | $0.258 | $2.265 | $0.473 | $0.989 | $1.821 | $8.974 |
qwen3-omni-flash-2025-12-01 | Thinking and non-thinking | $0.258 | $2.265 | $0.473 | $0.989 | $1.821 | $8.974 |
qwen3-omni-flash-2025-09-15 | Thinking and non-thinking | $0.258 | $2.265 | $0.473 | $0.989 | $1.821 | $8.974 |
More models
Model | Input: Text (per 1M tokens) | Input: Audio, audio part is billed separately (per 1M tokens) | Input: Image/Video (per 1M tokens) | Output: Text, plain text input (per 1M tokens) | Output: Text, multimodal input (per 1M tokens) | Output: Text+Audio, only audio is billed (per 1M tokens) |
qwen-omni-turbo | $0.058 | $3.584 | $0.216 | $0.230 | $0.646 | $7.168 |
qwen-omni-turbo-latest | $0.058 | $3.584 | $0.216 | $0.230 | $0.646 | $7.168 |
qwen-omni-turbo-2025-03-26 | $0.058 | $3.584 | $0.216 | $0.230 | $0.646 | $7.168 |
qwen-omni-turbo-2025-01-19 | $0.058 | $3.584 | $0.216 | $0.230 | $0.646 | $7.168 |
Qwen-Omni-Realtime
Billing rule: Charges are calculated per input and output token. For token calculation rules for different modalities, see Billing and rate limiting.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Chinese Mainland.
Model | Input: Text (per 1M tokens) | Input: Audio, audio part is billed separately (per 1M tokens) | Input: Image (per 1M tokens) | Output: Text, plain text input (per 1M tokens) | Output: Text, multimodal input (per 1M tokens) | Output: Text+Audio, only audio is billed (per 1M tokens) | Free quota (Note) |
qwen3-omni-flash-realtime | $0.52 | $4.57 | $0.94 | $1.99 | $3.67 | $18.13 | 1 million tokens each (regardless of modality). Validity: 90 days after activating Model Studio |
qwen3-omni-flash-realtime-2025-12-01 | $0.52 | $4.57 | $0.94 | $1.99 | $3.67 | $18.13 | |
qwen3-omni-flash-realtime-2025-09-15 | $0.52 | $4.57 | $0.94 | $1.99 | $3.67 | $18.13 | |
qwen-omni-turbo-realtime | $0.270 | $4.440 | $0.840 | $1.070 | $2.520 | $8.890 | |
qwen-omni-turbo-realtime-latest | $0.270 | $4.440 | $0.840 | $1.070 | $2.520 | $8.890 | |
qwen-omni-turbo-realtime-2025-05-08 | $0.270 | $4.440 | $0.840 | $1.070 | $2.520 | $8.890 | |
Chinese Mainland
In the Chinese Mainland deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Chinese Mainland.
Models in the Chinese Mainland deployment mode do not have a free quota.
Model | Input: Text (per 1M tokens) | Input: Audio, audio part is billed separately (per 1M tokens) | Input: Image (per 1M tokens) | Output: Text, plain text input (per 1M tokens) | Output: Text, multimodal input (per 1M tokens) | Output: Text+Audio, only audio is billed (per 1M tokens) |
qwen3-omni-flash-realtime | $0.315 | $2.709 | $0.559 | $1.19 | $2.179 | $10.766 |
qwen3-omni-flash-realtime-2025-12-01 | $0.315 | $2.709 | $0.559 | $1.19 | $2.179 | $10.766 |
qwen3-omni-flash-realtime-2025-09-15 | $0.315 | $2.709 | $0.559 | $1.19 | $2.179 | $10.766 |
qwen-omni-turbo-realtime | $0.230 | $3.584 | $0.861 | $0.918 | $2.581 | $7.168 |
qwen-omni-turbo-realtime-latest | $0.230 | $3.584 | $0.861 | $0.918 | $2.581 | $7.168 |
qwen-omni-turbo-realtime-2025-05-08 | $0.230 | $3.584 | $0.861 | $0.918 | $2.581 | $7.168 |
QVQ
Billing rule: Charges are calculated per input and output token. For token calculation rules for different modalities, see Billing and rate limiting.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Chinese Mainland.
Model | Input price (per 1M tokens) | Output price (per 1M tokens) | Free quota (Note) |
qvq-max | $1.2 | $4.8 | 1 million tokens each |
qvq-max-latest | $1.2 | $4.8 | |
qvq-max-2025-03-25 | $1.2 | $4.8 |
Chinese Mainland
In the Chinese Mainland deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Chinese Mainland.
Models in the Chinese Mainland deployment mode do not have a free quota.
Model | Input price (per 1M tokens) | Output price (per 1M tokens) |
qvq-max | $1.147 | $4.588 |
qvq-max-latest | $1.147 | $4.588 |
qvq-max-2025-05-15 | $1.147 | $4.588 |
qvq-max-2025-03-25 | $1.147 | $4.588 |
qvq-plus | $0.287 | $0.717 |
qvq-plus-latest | $0.287 | $0.717 |
qvq-plus-2025-05-15 | $0.287 | $0.717 |
Qwen-VL
Billing is based on the number of input and output tokens.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Chinese Mainland.
Model | Mode | Input tokens per request | Input price (per 1M tokens) | Output price (per 1M tokens, CoT + response) | Free quota (Note) |
qwen3-vl-plus Context cache discount applicable | Thinking and non-thinking | 0<Token≤32K | $0.2 | $1.6 | 1 million tokens each |
32K<Token≤128K | $0.3 | $2.4 | |||
128K<Token≤256K | $0.6 | $4.8 | |||
qwen3-vl-plus-2025-12-19 | Thinking and non-thinking | 0<Token≤32K | $0.2 | $1.6 | |
32K<Token≤128K | $0.3 | $2.4 | |||
128K<Token≤256K | $0.6 | $4.8 | |||
qwen3-vl-plus-2025-09-23 | Thinking and non-thinking | 0<Token≤32K | $0.2 | $1.6 | |
32K<Token≤128K | $0.3 | $2.4 | |||
128K<Token≤256K | $0.6 | $4.8 | |||
qwen3-vl-flash Context cache discount applicable | Thinking and non-thinking | 0<Token≤32K | $0.05 | $0.4 | |
32K<Token≤128K | $0.075 | $0.6 | |||
128K<Token≤256K | $0.12 | $0.96 | |||
qwen3-vl-flash-2026-01-22 | Thinking and non-thinking | 0<Token≤32K | $0.05 | $0.4 | |
32K<Token≤128K | $0.075 | $0.6 | |||
128K<Token≤256K | $0.12 | $0.96 | |||
qwen3-vl-flash-2025-10-15 | Thinking and non-thinking | 0<Token≤32K | $0.05 | $0.4 | |
32K<Token≤128K | $0.075 | $0.6 | |||
128K<Token≤256K | $0.12 | $0.96 |
More models
Model | Input tokens per request | Input price (per 1M tokens) | Output price (per 1M tokens) | Free quota (Note) |
qwen-vl-max Context cache discount applicable | No tiered pricing | $0.8 | $3.2 | 1 million tokens each. Validity: 90 days after activating Model Studio |
qwen-vl-max-latest | No tiered pricing | $0.8 | $3.2 | |
qwen-vl-max-2025-08-13 | No tiered pricing | $0.8 | $3.2 | |
qwen-vl-max-2025-04-08 | No tiered pricing | $0.8 | $3.2 | |
qwen-vl-plus Context cache discount applicable | No tiered pricing | $0.21 | $0.63 | |
qwen-vl-plus-latest | No tiered pricing | $0.21 | $0.63 | |
qwen-vl-plus-2025-08-15 | No tiered pricing | $0.21 | $0.63 | |
qwen-vl-plus-2025-05-07 | No tiered pricing | $0.21 | $0.63 | |
qwen-vl-plus-2025-01-25 | No tiered pricing | $0.21 | $0.63 |
Global
In the Global deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are dynamically scheduled worldwide.
Models in the Global deployment mode do not have a free quota.
Model | Mode | Input tokens per request | Input price (per 1M tokens) | Output price (per 1M tokens, CoT + response) |
qwen3-vl-plus Context cache discount applicable | Thinking and non-thinking | 0<Token≤32K | $0.143 | $1.434 |
32K<Token≤128K | $0.215 | $2.15 | ||
128K<Token≤256K | $0.43 | $4.301 | ||
qwen3-vl-plus-2025-09-23 | Thinking and non-thinking | 0<Token≤32K | $0.143 | $1.434 |
32K<Token≤128K | $0.215 | $2.15 | ||
128K<Token≤256K | $0.43 | $4.301 | ||
qwen3-vl-flash Context cache discount applicable | Thinking and non-thinking | 0<Token≤32K | $0.022 | $0.215 |
32K<Token≤128K | $0.043 | $0.43 | ||
128K<Token≤256K | $0.086 | $0.859 | ||
qwen3-vl-flash-2025-10-15 | Thinking and non-thinking | 0<Token≤32K | $0.022 | $0.215 |
32K<Token≤128K | $0.043 | $0.43 | ||
128K<Token≤256K | $0.086 | $0.859 |
US
In the US deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are limited to the United States.
Models in the US deployment mode do not have a free quota.
Model | Mode | Input tokens per request | Input price (per 1M tokens) | Output price (per 1M tokens, CoT + response) |
qwen3-vl-flash-us Context cache discount applicable | Thinking and non-thinking | 0<Token≤32K | $0.05 | $0.4 |
32K<Token≤128K | $0.075 | $0.6 | ||
128K<Token≤256K | $0.12 | $0.96 | ||
qwen3-vl-flash-2025-10-15-us | Thinking and non-thinking | 0<Token≤32K | $0.05 | $0.4 |
32K<Token≤128K | $0.075 | $0.6 | ||
128K<Token≤256K | $0.12 | $0.96 |
Chinese Mainland
In the Chinese Mainland deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Chinese Mainland.
Models in the Chinese Mainland deployment mode do not have a free quota.
Model | Mode | Input tokens per request | Input price (per 1M tokens) | Output price (per 1M tokens, CoT + response) |
qwen3-vl-plus Context cache discount applicable | Thinking and non-thinking | 0<Token≤32K | $0.143 | $1.434 |
32K<Token≤128K | $0.215 | $2.15 | ||
128K<Token≤256K | $0.43 | $4.301 | ||
qwen3-vl-plus-2025-12-19 | Thinking and non-thinking | 0<Token≤32K | $0.143 | $1.434 |
32K<Token≤128K | $0.215 | $2.15 | ||
128K<Token≤256K | $0.43 | $4.301 | ||
qwen3-vl-plus-2025-09-23 | Thinking and non-thinking | 0<Token≤32K | $0.143 | $1.434 |
32K<Token≤128K | $0.215 | $2.15 | ||
128K<Token≤256K | $0.43 | $4.301 | ||
qwen3-vl-flash Context cache discount applicable | Thinking and non-thinking | 0<Token≤32K | $0.022 | $0.215 |
32K<Token≤128K | $0.043 | $0.43 | ||
128K<Token≤256K | $0.086 | $0.859 | ||
qwen3-vl-flash-2026-01-22 | Thinking and non-thinking | 0<Token≤32K | $0.022 | $0.215 |
32K<Token≤128K | $0.043 | $0.43 | ||
128K<Token≤256K | $0.086 | $0.859 | ||
qwen3-vl-flash-2025-10-15 | Thinking and non-thinking | 0<Token≤32K | $0.022 | $0.215 |
32K<Token≤128K | $0.043 | $0.43 | ||
128K<Token≤256K | $0.086 | $0.859 |
More models
Model | Input tokens per request | Input price (per 1M tokens) | Output price (per 1M tokens) |
qwen-vl-max Context cache discount applicable | No tiered pricing | $0.23 | $0.574 |
qwen-vl-max-latest | No tiered pricing | $0.23 | $0.574 |
qwen-vl-max-2025-08-13 | No tiered pricing | $0.23 | $0.574 |
qwen-vl-max-2025-04-08 | No tiered pricing | $0.431 | $1.291 |
qwen-vl-max-2025-04-02 | No tiered pricing | $0.431 | $1.291 |
qwen-vl-max-2025-01-25 | No tiered pricing | $0.431 | $1.291 |
qwen-vl-max-2024-12-30 | No tiered pricing | $0.431 | $1.291 |
qwen-vl-max-2024-11-19 | No tiered pricing | $0.431 | $1.291 |
qwen-vl-plus Context cache discount applicable | No tiered pricing | $0.115 | $0.287 |
qwen-vl-plus-latest | No tiered pricing | $0.115 | $0.287 |
qwen-vl-plus-2025-08-15 | No tiered pricing | $0.115 | $0.287 |
qwen-vl-plus-2025-07-10 | No tiered pricing | $0.022 | $0.216 |
qwen-vl-plus-2025-05-07 | No tiered pricing | $0.216 | $0.646 |
qwen-vl-plus-2025-01-25 | No tiered pricing | $0.216 | $0.646 |
qwen-vl-plus-2025-01-02 | No tiered pricing | $0.216 | $0.646 |
Qwen-OCR
Billing is based on the number of input and output tokens.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Chinese Mainland.
Model | Input price (per 1M tokens) | Output price (per 1M tokens) | Free quota (Note) |
qwen-vl-ocr | $0.07 | $0.16 | 1 million tokens each |
qwen-vl-ocr-2025-11-20 |
Global
In the Global deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are dynamically scheduled worldwide.
Models in the Global deployment mode do not have a free quota.
Model | Input price (per 1M tokens) | Output price (per 1M tokens) |
qwen-vl-ocr | $0.043 | $0.072 |
qwen-vl-ocr-2025-11-20 | $0.043 | $0.072 |
Chinese Mainland
In the Chinese Mainland deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Chinese Mainland.
Models in the Chinese Mainland deployment mode do not have a free quota.
Model | Input price (per 1M tokens) | Output price (per 1M tokens) |
qwen-vl-ocr | $0.717 | $0.717 |
qwen-vl-ocr-latest | $0.043 | $0.072 |
qwen-vl-ocr-2025-11-20 | ||
qwen-vl-ocr-2025-08-28 | $0.717 | $0.717 |
qwen-vl-ocr-2025-04-13 | ||
qwen-vl-ocr-2024-10-28 |
Qwen-Math
Only the Chinese Mainland deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Chinese Mainland.
Billing is based on the number of input and output tokens.
Model | Input price (per 1M tokens) | Output price (per 1M tokens) | Free quota (Note) |
qwen-math-plus | $0.574 | $1.721 | No free quota |
qwen-math-plus-latest | $0.574 | $1.721 | |
qwen-math-plus-2024-09-19 | $0.574 | $1.721 | |
qwen-math-plus-2024-08-16 | $0.574 | $1.721 | |
qwen-math-turbo | $0.287 | $0.861 | |
qwen-math-turbo-latest | $0.287 | $0.861 | |
qwen-math-turbo-2024-09-19 | $0.287 | $0.861 |
Qwen-Coder
Billing is based on the number of input and output tokens.
If a model supports context cache, the cache discount applies to input tokens only.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Chinese Mainland.
Model | Input tokens per request | Input price (per 1M tokens) | Output price (per 1M tokens) | Free quota (Note) |
qwen3-coder-plus Context cache discount applicable | 0<Token≤32K | $1 | $5 | 1 million tokens each |
32K<Token≤128K | $1.8 | $9 | ||
128K<Token≤256K | $3 | $15 | ||
256K<Token≤1M | $6 | $60 | ||
qwen3-coder-plus-2025-09-23 | 0<Token≤32K | $1 | $5 | |
32K<Token≤128K | $1.8 | $9 | ||
128K<Token≤256K | $3 | $15 | ||
256K<Token≤1M | $6 | $60 | ||
qwen3-coder-plus-2025-07-22 | 0<Token≤32K | $1 | $5 | |
32K<Token≤128K | $1.8 | $9 | ||
128K<Token≤256K | $3 | $15 | ||
256K<Token≤1M | $6 | $60 | ||
qwen3-coder-flash | 0<Token≤32K | $0.3 | $1.5 | |
32K<Token≤128K | $0.5 | $2.5 | ||
128K<Token≤256K | $0.8 | $4 | ||
256K<Token≤1M | $1.6 | $9.6 | ||
qwen3-coder-flash-2025-07-28 | 0<Token≤32K | $0.3 | $1.5 | |
32K<Token≤128K | $0.5 | $2.5 | ||
128K<Token≤256K | $0.8 | $4 | ||
256K<Token≤1M | $1.6 | $9.6 |
Global
In the Global deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are dynamically scheduled worldwide.
Models in the Global deployment mode do not have a free quota.
Model | Input tokens per request | Input price (per 1M tokens) | Output price (per 1M tokens) |
qwen3-coder-plus Context cache discount applicable | 0<Token≤32K | $0.574 | $2.294 |
32K<Token≤128K | $0.861 | $3.441 | |
128K<Token≤256K | $1.434 | $5.735 | |
256K<Token≤1M | $2.868 | $28.671 | |
qwen3-coder-plus-2025-09-23 | 0<Token≤32K | $0.574 | $2.294 |
32K<Token≤128K | $0.861 | $3.441 | |
128K<Token≤256K | $1.434 | $5.735 | |
256K<Token≤1M | $2.868 | $28.671 | |
qwen3-coder-plus-2025-07-22 | 0<Token≤32K | $0.574 | $2.294 |
32K<Token≤128K | $0.861 | $3.441 | |
128K<Token≤256K | $1.434 | $5.735 | |
256K<Token≤1M | $2.868 | $28.671 | |
qwen3-coder-flash Context cache discount applicable | 0<Token≤32K | $0.144 | $0.574 |
32K<Token≤128K | $0.216 | $0.861 | |
128K<Token≤256K | $0.359 | $1.434 | |
256K<Token≤1M | $0.717 | $3.584 | |
qwen3-coder-flash-2025-07-28 | 0<Token≤32K | $0.144 | $0.574 |
32K<Token≤128K | $0.216 | $0.861 | |
128K<Token≤256K | $0.359 | $1.434 | |
256K<Token≤1M | $0.717 | $3.584 |
Chinese Mainland
In the Chinese Mainland deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Chinese Mainland.
Models in Chinese Mainland deployment mode do not have a free quota.
qwen3-coder series
Model | Input tokens per request | Input price (per 1M tokens) | Output price (per 1M tokens) |
qwen3-coder-plus Context cache discount applicable | 0<Token≤32K | $0.574 | $2.294 |
32K<Token≤128K | $0.861 | $3.441 | |
128K<Token≤256K | $1.434 | $5.735 | |
256K<Token≤1M | $2.868 | $28.671 | |
qwen3-coder-plus-2025-09-23 | 0<Token≤32K | $0.574 | $2.294 |
32K<Token≤128K | $0.861 | $3.441 | |
128K<Token≤256K | $1.434 | $5.735 | |
256K<Token≤1M | $2.868 | $28.671 | |
qwen3-coder-plus-2025-07-22 | 0<Token≤32K | $0.574 | $2.294 |
32K<Token≤128K | $0.861 | $3.441 | |
128K<Token≤256K | $1.434 | $5.735 | |
256K<Token≤1M | $2.868 | $28.671 | |
qwen3-coder-flash | 0<Token≤32K | $0.144 | $0.574 |
32K<Token≤128K | $0.216 | $0.861 | |
128K<Token≤256K | $0.359 | $1.434 | |
256K<Token≤1M | $0.717 | $3.584 | |
qwen3-coder-flash-2025-07-28 | 0<Token≤32K | $0.144 | $0.574 |
32K<Token≤128K | $0.216 | $0.861 | |
128K<Token≤256K | $0.359 | $1.434 | |
256K<Token≤1M | $0.717 | $3.584 |
Earlier qwen-coder series
Model | Input tokens per request | Input price (per 1M tokens) | Output price (per 1M tokens) |
qwen-coder-plus | No tiered pricing | $0.502 | $1.004 |
qwen-coder-plus-latest | No tiered pricing | $0.502 | $1.004 |
qwen-coder-plus-2024-11-06 | No tiered pricing | $0.502 | $1.004 |
qwen-coder-turbo | No tiered pricing | $0.287 | $0.861 |
qwen-coder-turbo-latest | No tiered pricing | $0.287 | $0.861 |
qwen-coder-turbo-2024-09-19 | No tiered pricing | $0.287 | $0.861 |
Qwen-MT
Billing is based on the number of input and output tokens.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Chinese Mainland.
Model | Input price (per 1M tokens) | Output price (per 1M tokens) | Free quota (Note) |
qwen-mt-plus | $2.46 | $7.37 | 1 million tokens each |
qwen-mt-flash | $0.16 | $0.49 | |
qwen-mt-lite | $0.12 | $0.36 | |
qwen-mt-turbo | $0.16 | $0.49 |
Global
In the Global deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are dynamically scheduled worldwide.
Models in the Global deployment mode do not have a free quota.
Model | Input price (per 1M tokens) | Output price (per 1M tokens) |
qwen-mt-plus | $0.259 | $0.775 |
qwen-mt-flash | $0.101 | $0.280 |
qwen-mt-lite | $0.086 | $0.229 |
Chinese Mainland
In the Chinese Mainland deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Chinese Mainland.
Models in Chinese Mainland deployment mode do not have a free quota.
Model | Input price (per 1M tokens) | Output price (per 1M tokens) |
qwen-mt-plus | $0.259 | $0.775 |
qwen-mt-flash | $0.101 | $0.280 |
qwen-mt-lite | $0.086 | $0.229 |
qwen-mt-turbo | $0.101 | $0.280 |
Qwen data mining
Only the Chinese Mainland deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Chinese Mainland.
Billing is based on the number of input and output tokens.
Model | Input price (per 1M tokens) | Output price (per 1M tokens) | Free quota (Note) |
qwen-doc-turbo | $0.087 | $0.144 | No free quota |
Qwen deep research
Only the Chinese Mainland deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Chinese Mainland.
Billing is based on the number of input and output tokens.
Model | Input price (per 1M tokens) | Output price (per 1M tokens) | Free quota (Note) |
qwen-deep-research | $7.742 | $23.367 | No free quota |
Text generation - Qwen - Open source
Qwen3.5
Billing is based on the number of input and output tokens.
Global
In the Global deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are dynamically scheduled worldwide.
Model | Input tokens per request | Input price (per 1M tokens) | Output price (per 1M tokens) | |
Non-thinking | Thinking (CoT + response) | |||
qwen3.5-397b-a17b | 0<Token≤128K | $0.172 | $1.032 | $1.032 |
128K<Token≤256K | $0.43 | $2.58 | $2.58 | |
qwen3.5-122b-a10b | 0<Token≤128K | $0.115 | $0.917 | $0.917 |
128K<Token≤256K | $0.287 | $2.294 | $2.294 | |
qwen3.5-27b | 0<Token≤128K | $0.086 | $0.688 | $0.688 |
128K<Token≤256K | $0.258 | $2.064 | $2.064 | |
qwen3.5-35b-a3b | 0<Token≤128K | $0.057 | $0.459 | $0.459 |
128K<Token≤256K | $0.229 | $1.835 | $1.835 | |
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Chinese Mainland.
Model | Input tokens per request | Input price (per 1M tokens) | Output price (per 1M tokens) | Free quota (Note) | |
Non-thinking | Thinking (CoT + response) | ||||
qwen3.5-397b-a17b | 0<Token≤256K | $0.6 | $3.6 | $3.6 | 1 million tokens each |
qwen3.5-122b-a10b | 0<Token≤256K | $0.4 | $3.2 | $3.2 | |
qwen3.5-27b | 0<Token≤256K | $0.3 | $2.4 | $2.4 | |
qwen3.5-35b-a3b | 0<Token≤256K | $0.25 | $2 | $2 | |
Chinese Mainland
In the Chinese Mainland deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Chinese Mainland.
Model | Input tokens per request | Input price (per 1M tokens) | Output price (per 1M tokens) | |
Non-thinking | Thinking (CoT + response) | |||
qwen3.5-397b-a17b | 0<Token≤128K | $0.172 | $1.032 | $1.032 |
128K<Token≤256K | $0.43 | $2.58 | $2.58 | |
qwen3.5-122b-a10b | 0<Token≤128K | $0.115 | $0.917 | $0.917 |
128K<Token≤256K | $0.287 | $2.294 | $2.294 | |
qwen3.5-27b | 0<Token≤128K | $0.086 | $0.688 | $0.688 |
128K<Token≤256K | $0.258 | $2.064 | $2.064 | |
qwen3.5-35b-a3b | 0<Token≤128K | $0.057 | $0.459 | $0.459 |
128K<Token≤256K | $0.229 | $1.835 | $1.835 | |
Qwen3
Billing is based on the number of input and output tokens.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Chinese Mainland.
Model | Mode | Input price (per 1M tokens) | Output price (per 1M tokens) | Free quota (Note) | |
Non-thinking mode | Thinking mode | ||||
qwen3-next-80b-a3b-thinking | Thinking only | $0.15 | - | $1.2 | 1 million tokens each |
qwen3-next-80b-a3b-instruct | Non-thinking only | $0.15 | $1.2 | - | |
qwen3-235b-a22b-thinking-2507 | Thinking only | $0.23 | - | $2.3 | |
qwen3-235b-a22b-instruct-2507 | Non-thinking only | $0.23 | $0.92 | - | |
qwen3-30b-a3b-thinking-2507 | Thinking only | $0.2 | - | $2.4 | |
qwen3-30b-a3b-instruct-2507 | Non-thinking only | $0.2 | $0.8 | - | |
qwen3-235b-a22b | Thinking and non-thinking | $0.7 | $2.8 | $8.4 | |
qwen3-32b | Thinking and non-thinking | $0.16 | $0.64 | $0.64 | |
qwen3-30b-a3b | Thinking and non-thinking | $0.2 | $0.8 | $2.4 | |
qwen3-14b | Thinking and non-thinking | $0.35 | $1.4 | $4.2 | |
qwen3-8b | Thinking and non-thinking | $0.18 | $0.7 | $2.1 | |
qwen3-4b | Thinking and non-thinking | $0.11 | $0.42 | $1.26 | |
qwen3-1.7b | Thinking and non-thinking | $0.11 | $0.42 | $1.26 | |
qwen3-0.6b | Thinking and non-thinking | $0.11 | $0.42 | $1.26 | |
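For models that support both modes, the output price depends on whether the request runs in thinking mode. The sketch below compares the two for qwen3-235b-a22b using the International prices from the table above. It assumes that in thinking mode all output tokens (chain of thought plus response) are billed at the thinking-mode price; the function name `request_cost` and the mode keys are hypothetical.

```python
from decimal import Decimal

# USD per 1M tokens for qwen3-235b-a22b (International), from the table above.
INPUT_PRICE = Decimal("0.7")
OUTPUT_PRICE = {
    "non_thinking": Decimal("2.8"),
    "thinking": Decimal("8.4"),
}
TOKENS_PER_UNIT = Decimal(1_000_000)


def request_cost(input_tokens, output_tokens, mode):
    """Estimate the fee in USD for one request (hypothetical helper).

    Assumption: in thinking mode, every output token (CoT + response)
    is billed at the thinking-mode output price.
    """
    if mode not in OUTPUT_PRICE:
        raise ValueError(f"unknown mode: {mode}")
    total = input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE[mode]
    return total / TOKENS_PER_UNIT


if __name__ == "__main__":
    for mode in ("non_thinking", "thinking"):
        print(mode, request_cost(10_000, 2_000, mode))
```

Thinking mode usually also produces more output tokens than non-thinking mode for the same prompt, so the real difference in cost can be larger than the price ratio alone suggests.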
Global
In the Global deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are dynamically scheduled worldwide.
Models in the Global deployment mode do not have a free quota.
Model | Mode | Input price (per 1M tokens) | Output price (per 1M tokens) | |
Non-thinking mode | Thinking mode (CoT + response) | |||
qwen3-next-80b-a3b-thinking | Thinking only | $0.144 | - | $1.434 |
qwen3-next-80b-a3b-instruct | Non-thinking only | $0.144 | $0.574 | - |
qwen3-235b-a22b-thinking-2507 | Thinking only | $0.287 | - | $2.3 |
qwen3-235b-a22b-instruct-2507 | Non-thinking only | $0.287 | $0.92 | - |
qwen3-30b-a3b-thinking-2507 | Thinking only | $0.108 | - | $1.076 |
qwen3-30b-a3b-instruct-2507 | Non-thinking only | $0.108 | $0.431 | - |
qwen3-235b-a22b | Thinking and non-thinking | $0.287 | $1.147 | $2.868 |
qwen3-32b | Thinking and non-thinking | $0.287 | $0.64 | $0.64 |
qwen3-30b-a3b | Thinking and non-thinking | $0.108 | $0.431 | $1.076 |
qwen3-14b | Thinking and non-thinking | $0.144 | $0.574 | $1.434 |
qwen3-8b | Thinking and non-thinking | $0.072 | $0.287 | $0.717 |
Chinese Mainland
In the Chinese Mainland deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Chinese Mainland.
Models in Chinese Mainland deployment mode do not have a free quota.
Model | Mode | Input price (per 1M tokens) | Output price (per 1M tokens) | |
Non-thinking mode | Thinking mode (CoT + response) | |||
qwen3-next-80b-a3b-thinking | Thinking only | $0.144 | - | $1.434 |
qwen3-next-80b-a3b-instruct | Non-thinking only | $0.144 | $0.574 | - |
qwen3-235b-a22b-thinking-2507 | Thinking only | $0.287 | - | $2.868 |
qwen3-235b-a22b-instruct-2507 | Non-thinking only | $0.287 | $1.147 | - |
qwen3-30b-a3b-thinking-2507 | Thinking only | $0.108 | - | $1.076 |
qwen3-30b-a3b-instruct-2507 | Non-thinking only | $0.108 | $0.431 | - |
qwen3-235b-a22b | Thinking and non-thinking | $0.287 | $1.147 | $2.868 |
qwen3-32b | Thinking and non-thinking | $0.287 | $1.147 | $2.868 |
qwen3-30b-a3b | Thinking and non-thinking | $0.108 | $0.431 | $1.076 |
qwen3-14b | Thinking and non-thinking | $0.144 | $0.574 | $1.434 |
qwen3-8b | Thinking and non-thinking | $0.072 | $0.287 | $0.717 |
qwen3-4b | Thinking and non-thinking | $0.044 | $0.173 | $0.431 |
qwen3-1.7b | Thinking and non-thinking | $0.044 | $0.173 | $0.431 |
qwen3-0.6b | Thinking and non-thinking | $0.044 | $0.173 | $0.431 |
QwQ - Open source
Billing is based on the number of input and output tokens.
Model | Input price (per 1M tokens) | Output price (per 1M tokens) | Free quota (Note) |
qwq-32b | $0.287 | $0.861 | No free quota |
QwQ-Preview
Only the Chinese Mainland deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Chinese Mainland.
Billing is based on the number of input and output tokens.
Model | Input price (per 1M tokens) | Output price (per 1M tokens) | Free quota (Note) |
qwq-32b-preview | $0.287 | $0.861 | No free quota |
Qwen2.5
Billing is based on the number of input and output tokens.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Chinese Mainland.
Model | Input price (per 1M tokens) | Output price (per 1M tokens) | Free quota (Note) |
qwen2.5-14b-instruct-1m | $0.805 | $3.22 | 1 million tokens each |
qwen2.5-7b-instruct-1m | $0.368 | $1.47 | |
qwen2.5-72b-instruct | $1.4 | $5.6 | |
qwen2.5-32b-instruct | $0.7 | $2.8 | |
qwen2.5-14b-instruct | $0.35 | $1.4 | |
qwen2.5-7b-instruct | $0.175 | $0.7 |
Chinese Mainland
In the Chinese Mainland deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Chinese Mainland.
Models in Chinese Mainland deployment mode do not have a free quota.
Model | Input price (per 1M tokens) | Output price (per 1M tokens) |
qwen2.5-14b-instruct-1m | $0.144 | $0.431 |
qwen2.5-7b-instruct-1m | $0.072 | $0.144 |
qwen2.5-72b-instruct | $0.574 | $1.721 |
qwen2.5-32b-instruct | $0.287 | $0.861 |
qwen2.5-14b-instruct | $0.144 | $0.431 |
qwen2.5-7b-instruct | $0.072 | $0.144 |
qwen2.5-3b-instruct | $0.044 | $0.130 |
qwen2.5-1.5b-instruct | Limited time free | |
qwen2.5-0.5b-instruct | ||
QVQ
Only the Chinese Mainland deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Chinese Mainland.
Billing is based on the number of input and output tokens.
Model | Input price (per 1M tokens) | Output price (per 1M tokens) | Free quota (Note) |
qvq-72b-preview | $1.721 | $5.161 | No free quota |
Qwen-Omni
Billing rule: Charges are calculated per input and output token. For token calculation rules for different modalities, see Qwen-Omni.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Chinese Mainland.
Model | Input price (per 1M tokens) | Output price (per 1M tokens) | Free quota (Note) | ||||
Input: Text | Input: Audio | Input: Image/Video | Output: Text Plain text input | Output: Text Multimodal input | Output: Text+Audio Only audio is billed | ||
qwen2.5-omni-7b | $0.10 | $6.76 | $0.28 | $0.40 | $0.84 | $13.51 | 1 million tokens (regardless of modality) Validity: 90 days after activating Model Studio |
Chinese Mainland
In the Chinese Mainland deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Chinese Mainland.
Models in Chinese Mainland deployment mode do not have a free quota.
Model | Input price (per 1M tokens) | Output price (per 1M tokens) | ||||
Input: Text | Input: Audio | Input: Image/Video | Output: Text Plain text input | Output: Text Multimodal input | Output: Text+Audio Only audio is billed | |
qwen2.5-omni-7b | $0.087 | $5.448 | $0.287 | $0.345 | $0.861 | $10.895 |
Qwen3-Omni-Captioner
Billing is based on the number of input and output tokens.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Chinese Mainland.
Model | Input price (per 1M tokens) | Output price (per 1M tokens) | Free quota (Note) |
qwen3-omni-30b-a3b-captioner | $3.81 | $3.06 | 1 million tokens |
Chinese Mainland
In the Chinese Mainland deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Chinese Mainland.
Models in Chinese Mainland deployment mode do not have a free quota.
Model | Input price (per 1M tokens) | Output price (per 1M tokens) |
qwen3-omni-30b-a3b-captioner | $2.265 | $1.821 |
Qwen-VL
Billing is based on the number of input and output tokens.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Chinese Mainland.
Model | Mode | Input price (per 1M tokens) | Output price (per 1M tokens) CoT + response | Free quota (Note) |
qwen3-vl-235b-a22b-thinking | Thinking only | $0.4 | $4 | 1 million tokens each |
qwen3-vl-235b-a22b-instruct | Non-thinking only | $0.4 | $1.6 | |
qwen3-vl-32b-thinking | Thinking only | $0.16 | $0.64 | |
qwen3-vl-32b-instruct | Non-thinking only | $0.16 | $0.64 | |
qwen3-vl-30b-a3b-thinking | Thinking only | $0.2 | $2.4 | |
qwen3-vl-30b-a3b-instruct | Non-thinking only | $0.2 | $0.8 | |
qwen3-vl-8b-thinking | Thinking only | $0.18 | $2.1 | |
qwen3-vl-8b-instruct | Non-thinking only | $0.18 | $0.7 |
More models
Model | Input price (per 1M tokens) | Output price (per 1M tokens) | Free quota (Note) |
qwen2.5-vl-72b-instruct | $2.8 | $8.4 | 1 million tokens each |
qwen2.5-vl-32b-instruct | $1.4 | $4.2 | |
qwen2.5-vl-7b-instruct | $0.35 | $1.05 | |
qwen2.5-vl-3b-instruct | $0.21 | $0.63 |
Global
In the Global deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are dynamically scheduled worldwide.
Models in the Global deployment mode do not have a free quota.
Model | Mode | Input price (per 1M tokens) | Output price (per 1M tokens) CoT + response |
qwen3-vl-235b-a22b-thinking | Thinking only | $0.287 | $2.867 |
qwen3-vl-235b-a22b-instruct | Non-thinking only | $0.287 | $1.147 |
qwen3-vl-32b-thinking | Thinking only | $0.16 | $0.64 |
qwen3-vl-32b-instruct | Non-thinking only | $0.16 | $0.64 |
qwen3-vl-30b-a3b-thinking | Thinking only | $0.108 | $1.075 |
qwen3-vl-30b-a3b-instruct | Non-thinking only | $0.108 | $0.43 |
qwen3-vl-8b-thinking | Thinking only | $0.072 | $0.717 |
qwen3-vl-8b-instruct | Non-thinking only | $0.072 | $0.287 |
Chinese Mainland
In the Chinese Mainland deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Chinese Mainland.
Models in Chinese Mainland deployment mode do not have a free quota.
Model | Mode | Input price (per 1M tokens) | Output price (per 1M tokens) CoT + response |
qwen3-vl-235b-a22b-thinking | Thinking only | $0.287 | $2.8677 |
qwen3-vl-235b-a22b-instruct | Non-thinking only | $0.287 | $1.147 |
qwen3-vl-32b-thinking | Thinking only | $0.287 | $2.868 |
qwen3-vl-32b-instruct | Non-thinking only | $0.287 | $1.147 |
qwen3-vl-30b-a3b-thinking | Thinking only | $0.108 | $1.076 |
qwen3-vl-30b-a3b-instruct | Non-thinking only | $0.108 | $0.431 |
qwen3-vl-8b-thinking | Thinking only | $0.072 | $0.717 |
qwen3-vl-8b-instruct | Non-thinking only | $0.072 | $0.287 |
More models
Model | Input price (per 1M tokens) | Output price (per 1M tokens) |
qwen2.5-vl-72b-instruct | $2.294 | $6.881 |
qwen2.5-vl-32b-instruct | $1.147 | $3.441 |
qwen2.5-vl-7b-instruct | $0.287 | $0.717 |
qwen2.5-vl-3b-instruct | $0.173 | $0.517 |
qwen2-vl-72b-instruct | $2.294 | $6.881 |
qwen2-vl-7b-instruct | Limited time free | |
qwen2-vl-2b-instruct | ||
Qwen-Math
Only the Chinese Mainland deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Chinese Mainland.
Billing is based on the number of input and output tokens.
Model | Input price (per 1M tokens) | Output price (per 1M tokens) | Free quota (Note) |
qwen2.5-math-72b-instruct | $0.574 | $1.721 | No free quota |
qwen2.5-math-7b-instruct | $0.144 | $0.287 | |
qwen2.5-math-1.5b-instruct | Limited time free | ||
Qwen-Coder
Billing is based on the number of input and output tokens.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Chinese Mainland.
Model | Input tokens per request | Input price (per 1M tokens) | Output price (per 1M tokens) | Free quota (Note) |
qwen3-coder-next | 0<Token≤32K | $0.3 | $1.5 | 1 million tokens each |
32K<Token≤128K | $0.5 | $2.5 | ||
128K<Token≤256K | $0.8 | $4 | ||
qwen3-coder-480b-a35b-instruct | 0<Token≤32K | $1.5 | $7.5 | |
32K<Token≤128K | $2.7 | $13.5 | ||
128K<Token≤200K | $4.5 | $22.5 | ||
qwen3-coder-30b-a3b-instruct | 0<Token≤32K | $0.45 | $2.25 | |
32K<Token≤128K | $0.75 | $3.75 | ||
128K<Token≤200K | $1.2 | $6 |
Global
In the Global deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are dynamically scheduled worldwide.
Models in the Global deployment mode do not have a free quota.
Model | Input tokens per request | Input price (per 1M tokens) | Output price (per 1M tokens) |
qwen3-coder-480b-a35b-instruct | 0<Token≤32K | $0.861 | $3.441 |
32K<Token≤128K | $1.291 | $5.161 | |
128K<Token≤200K | $2.151 | $8.602 | |
qwen3-coder-30b-a3b-instruct | 0<Token≤32K | $0.216 | $0.861 |
32K<Token≤128K | $0.323 | $1.291 | |
128K<Token≤200K | $0.538 | $2.151 |
Chinese Mainland
In the Chinese Mainland deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Chinese Mainland.
Models in Chinese Mainland deployment mode do not have a free quota.
Model | Input tokens per request | Input price (per 1M tokens) | Output price (per 1M tokens) |
qwen3-coder-next | 0<Token≤32K | $0.144 | $0.574 |
32K<Token≤128K | $0.216 | $0.861 | |
128K<Token≤256K | $0.359 | $1.434 | |
qwen3-coder-480b-a35b-instruct | 0<Token≤32K | $0.861 | $3.441 |
32K<Token≤128K | $1.291 | $5.161 | |
128K<Token≤200K | $2.151 | $8.602 | |
qwen3-coder-30b-a3b-instruct | 0<Token≤32K | $0.216 | $0.861 |
32K<Token≤128K | $0.323 | $1.291 | |
128K<Token≤200K | $0.538 | $2.151 | |
qwen2.5-coder-32b-instruct | No tiered pricing | $0.287 | $0.861 |
qwen2.5-coder-14b-instruct | No tiered pricing | $0.287 | $0.861 |
qwen2.5-coder-7b-instruct | No tiered pricing | $0.144 | $0.287 |
qwen2.5-coder-3b-instruct | No tiered pricing | Limited time free | |
qwen2.5-coder-1.5b-instruct | No tiered pricing | ||
qwen2.5-coder-0.5b-instruct | No tiered pricing | ||
Text generation - Third party
DeepSeek
Only the Chinese Mainland deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Chinese Mainland.
Billing is based on the number of input and output tokens.
Model | Input price (per 1M tokens) | Output price (per 1M tokens) | Free quota (Note) |
deepseek-v3.2 | $0.287 | $0.431 | No free quota |
deepseek-v3.2-exp | $0.287 | $0.431 | |
deepseek-v3.1 | $0.574 | $1.721 | |
deepseek-r1 | $0.574 | $2.294 | |
deepseek-r1-0528 | $0.574 | $2.294 | |
deepseek-v3 | $0.287 | $1.147 | |
deepseek-r1-distill-qwen-1.5b | Limited time free | ||
deepseek-r1-distill-qwen-7b | $0.072 | $0.144 | No free quota |
deepseek-r1-distill-qwen-14b | $0.144 | $0.431 | |
deepseek-r1-distill-qwen-32b | $0.287 | $0.861 | |
deepseek-r1-distill-llama-8b | Limited time free | ||
deepseek-r1-distill-llama-70b | Limited time free | ||
Kimi
Only the Chinese Mainland deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Chinese Mainland.
Billing is based on the number of input and output tokens.
Model | Input price (per 1M tokens) | Output price (per 1M tokens) | Free quota (Note) |
kimi-k2.5 | $0.574 | $3.011 | |
kimi-k2-thinking | $0.574 | $2.294 | No free quota |
Moonshot-Kimi-K2-Instruct | $0.574 | $2.294 |
GLM
Only the Chinese Mainland deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Chinese Mainland.
Billing is based on the number of input and output tokens.
Model | Mode | Input tokens per request | Input price (per 1M tokens) | Output price (per 1M tokens) CoT and response |
glm-5 | Thinking and non-thinking | 0<Token≤32K | $0.573 | $2.58 |
32K<Token≤166K | $0.86 | $3.154 | ||
glm-4.7 | Thinking and non-thinking | 0<Token≤32K | $0.431 | $2.007 |
32K<Token≤166K | $0.574 | $2.294 | ||
glm-4.6 | Thinking and non-thinking | 0<Token≤32K | $0.431 | $2.007 |
32K<Token≤166K | $0.574 | $2.294 |
Image generation
Inputs are not billed. Billing is based on the number of successfully generated images in the output.
Billing formula: Fee = Unit price per image × Number of successfully generated images.
Billing details:
- The fee is not affected by the resolution or aspect ratio of the output images.
- Failed requests do not incur fees or consume the free quota.
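The billing formula above can be sketched as follows. This is an illustrative calculation only: the function name `image_fee` and the list of success flags are hypothetical, and the unit price of $0.03 per image is the International price listed for qwen-image-plus on this page. Free quota deduction is not modeled.

```python
from decimal import Decimal


def image_fee(unit_price_per_image, succeeded_flags):
    """Fee = unit price per image x number of successfully generated images.

    succeeded_flags holds one boolean per requested image (hypothetical
    representation); failed images are not billed.
    """
    succeeded = sum(1 for ok in succeeded_flags if ok)
    return unit_price_per_image * succeeded


if __name__ == "__main__":
    # Four images requested, one failed: only three are billed.
    print(image_fee(Decimal("0.03"), [True, False, True, True]))
```

Resolution and aspect ratio do not appear as parameters because, per the billing details above, they do not affect the fee.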
Qwen-Image
Only output is billed. For rules, see Image generation.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Chinese Mainland.
Model | Output price | Free quota (Note) |
qwen-image-2.0-pro | $0.075/image | 100 images each |
qwen-image-2.0-pro-2026-03-03 | $0.075/image | |
qwen-image-2.0 | $0.035/image | |
qwen-image-2.0-2026-03-03 | $0.035/image | |
qwen-image-max | $0.075/image | |
qwen-image-max-2025-12-30 | $0.075/image | |
qwen-image-plus | $0.03/image | |
qwen-image-plus-2026-01-09 | $0.03/image | |
qwen-image | $0.035/image |
Chinese Mainland
In the Chinese Mainland deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Chinese Mainland.
Models in Chinese Mainland deployment mode do not have a free quota.
Model | Output price |
qwen-image-2.0-pro | $0.071676/image |
qwen-image-2.0-pro-2026-03-03 | $0.071676/image |
qwen-image-2.0 | $0.028671/image |
qwen-image-2.0-2026-03-03 | $0.028671/image |
qwen-image-max | $0.071677/image |
qwen-image-max-2025-12-30 | $0.071677/image |
qwen-image-plus | $0.028671/image |
qwen-image-plus-2026-01-09 | $0.028671/image |
qwen-image | $0.035/image |
Qwen-Image-Edit
Only output is billed. For rules, see Image generation.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Chinese Mainland.
Model | Output price | Free quota (Note) |
qwen-image-2.0-pro | $0.075/image | 100 images each |
qwen-image-2.0-pro-2026-03-03 | $0.075/image | |
qwen-image-2.0 | $0.035/image | |
qwen-image-2.0-2026-03-03 | $0.035/image | |
qwen-image-edit-max | $0.075/image | |
qwen-image-edit-max-2026-01-16 | $0.075/image | |
qwen-image-edit-plus | $0.03/image | |
qwen-image-edit-plus-2025-12-15 | $0.03/image | |
qwen-image-edit-plus-2025-10-30 | $0.03/image | |
qwen-image-edit | $0.045/image |
Chinese Mainland
In the Chinese Mainland deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Chinese Mainland.
Models in Chinese Mainland deployment mode do not have a free quota.
Model | Output price |
qwen-image-2.0-pro | $0.071676/image |
qwen-image-2.0-pro-2026-03-03 | $0.071676/image |
qwen-image-2.0 | $0.028671/image |
qwen-image-2.0-2026-03-03 | $0.028671/image |
qwen-image-edit-max | $0.071677/image |
qwen-image-edit-max-2026-01-16 | $0.071677/image |
qwen-image-edit-plus | $0.028671/image |
qwen-image-edit-plus-2025-12-15 | $0.028671/image |
qwen-image-edit-plus-2025-10-30 | $0.028671/image |
qwen-image-edit | $0.043/image |
Qwen-MT-Image
Only the Chinese Mainland deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Chinese Mainland.
Only output is billed. For rules, see Image generation.
Model | Output price | Free quota (Note) |
qwen-mt-image | $0.000431/image | No free quota |
Text-to-image - Z-Image
Only output is billed. For rules, see Image generation.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Chinese Mainland.
Model | Output price | Free quota (Note) |
z-image-turbo | Prompt rewriting disabled ( Prompt rewriting enabled ( | 100 images Validity: 90 days after activating Model Studio |
Chinese Mainland
In the Chinese Mainland deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Chinese Mainland.
Models in Chinese Mainland deployment mode do not have a free quota.
Model | Output price |
z-image-turbo | Prompt rewriting disabled ( Prompt rewriting enabled ( |
Wan text-to-image
Only output is billed. For rules, see Image generation.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Chinese Mainland.
Model | Output price | Free quota (Note) |
wan2.6-t2i | $0.03/image | 50 images |
wan2.5-t2i-preview | $0.03/image | 50 images |
wan2.2-t2i-plus | $0.05/image | 100 images |
wan2.2-t2i-flash | $0.025/image | 100 images |
wan2.1-t2i-plus | $0.05/image | 200 images |
wan2.1-t2i-turbo | $0.025/image | 200 images |
Global
In the Global deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are dynamically scheduled worldwide.
Models in the Global deployment mode do not have a free quota.
Model | Output price |
wan2.6-t2i | $0.028671/image |
Chinese Mainland
In the Chinese Mainland deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Chinese Mainland.
Models in Chinese Mainland deployment mode do not have a free quota.
Model | Output price |
wan2.6-t2i | $0.028671/image |
wan2.5-t2i-preview | $0.028671/image |
wan2.2-t2i-plus | $0.020070/image |
wan2.2-t2i-flash | $0.028671/image |
wanx2.1-t2i-plus | $0.028671/image |
wanx2.1-t2i-turbo | $0.020070/image |
wanx2.0-t2i-turbo | $0.005735/image |
Wan image generation and editing
Only output is billed. For rules, see Image generation.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Chinese Mainland.
Model | Output price | Free quota (Note) |
wan2.6-image | $0.03/image | 50 images |
Global
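In the Global deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are dynamically scheduled worldwide.
Models in the Global deployment mode do not have a free quota.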
Model | Output price |
wan2.6-image | $0.028671/image |
Chinese Mainland
In the Chinese Mainland deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Chinese Mainland.
Models in Chinese Mainland deployment mode do not have a free quota.
Model | Output price |
wan2.6-image | $0.028671/image |
Wan general image editing
Only output is billed. For rules, see Image generation.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Chinese Mainland.
Service | Model | Output price | Free quota (Note) |
General image editing 2.5 | wan2.5-i2i-preview | $0.03/image | 50 images |
Chinese Mainland
In the Chinese Mainland deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Chinese Mainland.
Models in Chinese Mainland deployment mode do not have a free quota.
Service | Model | Output price |
General image editing 2.5 | wan2.5-i2i-preview | $0.028671/image |
General image editing 2.1 | wanx2.1-imageedit | $0.020070/image |
OutfitAnyone
Only the Chinese Mainland deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Chinese Mainland.
aitryon-plus: Only output is billed. For rules, see Image generation.
aitryon-parsing-v1: Only input is billed. Billing is based on the number of input images. Failed requests are not billed.
Service | Model | Price | Free quota (Note) |
OutfitAnyone - Plus | aitryon-plus | $0.071677/image | No free quota |
OutfitAnyone - Image parsing | aitryon-parsing-v1 | $0.000574/image |
Video generation
Inputs are not billed. Billing is based on the duration in seconds of successfully generated videos in the output.
Billing formula: Fee = Unit price (per second) × Duration of successfully generated video (in seconds).
Billing details:
- For some models, the price is based on the output video resolution. The prices for different resolutions, such as 480P, 720P, and 1080P, vary.
- For some models, the price is based on the output video mode. The prices for different video modes, such as Standard Edition and Professional Edition, vary.
- For some models, the price is based on the output video aspect ratio. The prices for different video aspect ratios, such as 1:1 and 3:4, vary.
- Some models use uniform pricing, regardless of resolution, mode, or aspect ratio.
- Failed requests do not incur fees or consume the free quota.
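The billing formula above can be sketched as follows. This is a minimal illustration, assuming the per-second unit price has already been looked up for the model and its resolution, mode, or aspect ratio. The function name `video_fee`, its parameters, and the example rate are hypothetical and not part of any Model Studio API.

```python
from decimal import Decimal


def video_fee(unit_price_per_second: str, duration_seconds: int, succeeded: bool = True) -> Decimal:
    """Fee = unit price per second x duration of the successfully generated video."""
    if not succeeded:
        # Failed requests do not incur fees.
        return Decimal("0")
    return Decimal(unit_price_per_second) * duration_seconds


# Hypothetical example: a 5-second video billed at $0.15/second.
print(video_fee("0.15", 5))  # 0.75
```

Prices are passed as strings and handled with `Decimal` so that six-decimal rates such as $0.086012/second are not distorted by binary floating point.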
Wan - text-to-video
Only output is billed. For rules, see Video generation.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Chinese Mainland.
Model | Output video resolution | Output price | Free quota (Note) Validity: 90 days after activating Model Studio |
wan2.6-t2v | 720P | $0.10/second | 50 seconds |
1080P | $0.15/second | ||
wan2.5-t2v-preview | 480P | $0.05/second | 50 seconds |
720P | $0.10/second | ||
1080P | $0.15/second | ||
wan2.2-t2v-plus | 480P | $0.02/second | 50 seconds |
1080P | $0.10/second | ||
wan2.1-t2v-turbo | 480P | $0.036/second | 200 seconds |
720P | $0.036/second | ||
wan2.1-t2v-plus | 720P | $0.10/second | 200 seconds |
Global
In the Global deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are dynamically scheduled worldwide.
Models in the Global deployment mode do not have a free quota.
Model | Output video resolution | Output price |
wan2.6-t2v | 720P | $0.086012/second |
1080P | $0.143353/second |
US
In the US deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are limited to the United States.
Models in the US deployment mode do not have a free quota.
Model | Output video resolution | Output price |
wan2.6-t2v-us | 720P | $0.1/second |
1080P | $0.15/second |
Chinese Mainland
In the Chinese Mainland deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Chinese Mainland.
Models in Chinese Mainland deployment mode do not have a free quota.
Model | Output video resolution | Output price |
wan2.6-t2v | 720P | $0.086012/second |
1080P | $0.143353/second | |
wan2.5-t2v-preview | 480P | $0.043006/second |
720P | $0.086012/second | |
1080P | $0.143353/second | |
wan2.2-t2v-plus | 480P | $0.02007/second |
1080P | $0.100347/second | |
wanx2.1-t2v-turbo | 480P | $0.034405/second |
720P | $0.034405/second | |
wanx2.1-t2v-plus | 720P | $0.100347/second |
Wan - image-to-video - first frame
Only output is billed. For rules, see Video generation.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Chinese Mainland.
Model | Output video type | Output video resolution | Output price | Free quota (Note) Validity: 90 days after activating Model Studio |
wan2.6-i2v-flash | Video with audio | 720P | $0.05/second | 50 seconds |
1080P | $0.075/second | |||
Video without audio | 720P | $0.025/second | ||
1080P | $0.0375/second | |||
wan2.6-i2v | Video with audio | 720P | $0.10/second | 50 seconds |
1080P | $0.15/second | |||
wan2.5-i2v-preview | Video with audio | 480P | $0.05/second | 50 seconds |
720P | $0.10/second | |||
1080P | $0.15/second | |||
wan2.2-i2v-flash | Video without audio | 480P | $0.015/second | 50 seconds |
720P | $0.036/second | |||
wan2.2-i2v-plus | Video without audio | 480P | $0.02/second | 50 seconds |
1080P | $0.10/second | |||
wan2.1-t2v-turbo | Video without audio | 480P | $0.036/second | 200 seconds |
720P | $0.036/second | |||
wan2.1-t2v-plus | Video without audio | 720P | $0.10/second | 200 seconds |
Global
In the Global deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are dynamically scheduled worldwide.
Models in the Global deployment mode do not have a free quota.
Model | Output video resolution | Output price |
wan2.6-i2v | 720P | $0.086012/second |
1080P | $0.143353/second |
US
In the US deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are limited to the United States.
Models in the US deployment mode do not have a free quota.
Model | Output video resolution | Output price |
wan2.6-i2v-us | 720P | $0.1/second |
1080P | $0.15/second |
Chinese Mainland
In the Chinese Mainland deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Chinese Mainland.
Models in Chinese Mainland deployment mode do not have a free quota.
Model | Output video type | Output video resolution | Output price |
wan2.6-i2v-flash | Video with audio | 720P | $0.043006/second |
1080P | $0.071676/second | ||
Video without audio | 720P | $0.021503/second | |
1080P | $0.035838/second | ||
wan2.6-i2v | Video with audio | 720P | $0.086012/second |
1080P | $0.143353/second | ||
wan2.5-i2v-preview | Video with audio | 480P | $0.043006/second |
720P | $0.086012/second | ||
1080P | $0.143353/second | ||
wan2.2-i2v-plus | Video without audio | 480P | $0.02007/second |
1080P | $0.100347/second | ||
wanx2.1-t2v-turbo | Video without audio | 480P | $0.034405/second |
720P | $0.034405/second | ||
wanx2.1-t2v-plus | Video without audio | 720P | $0.100347/second |
Wan - image-to-video - first and last frames
Only output is billed. For rules, see Video generation.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Chinese Mainland.
Model | Output video resolution | Output price | Free quota (Note) Validity: 90 days after activating Model Studio |
wan2.2-kf2v-flash | 480P | $0.015/second | 50 seconds |
720P | $0.036/second | ||
1080P | $0.07/second | ||
wan2.1-kf2v-plus | 720P | $0.10/second | 200 seconds |
Chinese Mainland
In the Chinese Mainland deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Chinese Mainland.
Models in Chinese Mainland deployment mode do not have a free quota.
Model | Output video resolution | Output price |
wan2.2-kf2v-flash | 480P | $0.014335/second |
720P | $0.028671/second | |
1080P | $0.068809/second | |
wanx2.1-kf2v-plus | 720P | $0.100347/second |
Wan - reference-to-video
Billing rule: Charges apply to both input and output videos by seconds of video duration. Failed generations are not billed and do not consume the free quota.
Formula: Billable duration = input video duration (up to 5 seconds) + output video duration.
The input video is billed for no more than 5 seconds. For specific rules, see Wan - reference-to-video.
The output video is billed based on seconds of successfully generated video.
Pricing description: The unit price is determined by the output resolution tier and the audio option, regardless of the input video's resolution or audio.
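The billable-duration rule above can be sketched as follows. This is an illustration only: it assumes durations are whole seconds and that one unit price applies to both the input and output portions, as the rule states. The names `r2v_billable_seconds` and `r2v_fee` and the example rate are hypothetical.

```python
from decimal import Decimal

MAX_BILLED_INPUT_SECONDS = 5  # the input video is billed for no more than 5 seconds


def r2v_billable_seconds(input_seconds: int, output_seconds: int) -> int:
    """Billable duration = input video duration (capped at 5 s) + output video duration."""
    return min(input_seconds, MAX_BILLED_INPUT_SECONDS) + output_seconds


def r2v_fee(unit_price_per_second: str, input_seconds: int, output_seconds: int) -> Decimal:
    """Apply one per-second unit price to the whole billable duration."""
    return Decimal(unit_price_per_second) * r2v_billable_seconds(input_seconds, output_seconds)


# Hypothetical example: 8-second reference video, 10-second output, $0.10/second.
print(r2v_billable_seconds(8, 10))  # 15
print(r2v_fee("0.10", 8, 10))       # 1.50
```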
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Chinese Mainland.
Model | Output specification | Output resolution | Input & output price | Free quota (Note) Validity: 90 days after activating Model Studio |
wan2.6-r2v-flash | Video with audio | 720P | $0.05/second | 50 seconds |
1080P | $0.075/second | |||
Video without audio | 720P | $0.025/second | ||
1080P | $0.0375/second | |||
wan2.6-r2v | Video with audio | 720P | $0.10/second | 50 seconds |
1080P | $0.15/second |
Global
In the Global deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are dynamically scheduled worldwide.
Models in the Global deployment mode do not have a free quota.
Model | Output specification | Output resolution | Input & output price |
wan2.6-r2v | Video with audio | 720P | $0.086012/second |
1080P | $0.143353/second |
Chinese Mainland
In the Chinese Mainland deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Chinese Mainland.
Models in Chinese Mainland deployment mode do not have a free quota.
Model | Output specification | Output resolution | Input & output price |
wan2.6-r2v-flash | Video with audio | 720P | $0.043006/second |
1080P | $0.071676/second | ||
Video without audio | 720P | $0.021503/second | |
1080P | $0.035838/second | ||
wan2.6-r2v | Video with audio | 720P | $0.086012/second |
1080P | $0.143353/second |
Wan - general video editing
Only output is billed. For rules, see Video generation.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Chinese Mainland.
Model | Output video resolution | Output price | Free quota (Note) |
wan2.1-vace-plus | 720P | $0.10/second | 50 seconds Validity: 90 days after activating Model Studio |
Chinese Mainland
In the Chinese Mainland deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Chinese Mainland.
Models in Chinese Mainland deployment mode do not have a free quota.
Model | Output video resolution | Output price |
wanx2.1-vace-plus | 720P | $0.100347/second |
Wan - digital human
Only the Chinese Mainland deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Chinese Mainland.
wan2.2-s2v-detect: Charges apply to input only. Billing based on the number of detected images. Each input image is charged once, regardless of detection success.
wan2.2-s2v: Charges apply to output only. Billing based on the duration of successfully generated video in seconds. For billing rules, see Video generation.
Service | Model | Price | Free quota (Note) |
Image detection | wan2.2-s2v-detect | Input image: $0.000574/image | No free quota |
Video generation | wan2.2-s2v | Output video:
|
Wan - image-to-action
Only output is billed. For rules, see Video generation.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Chinese Mainland.
Model | Output video mode | Output price | Free quota (Note) |
wan2.2-animate-move | Standard mode | $0.12/second | 50 seconds Validity: 90 days after activating Model Studio |
Professional mode | $0.18/second |
Chinese Mainland
In the Chinese Mainland deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Chinese Mainland.
Models in Chinese Mainland deployment mode do not have a free quota.
Model | Output video mode | Output price |
wan2.2-animate-move | Standard mode | $0.06/second |
Professional mode | $0.09/second |
Wan - Video character swap
Only output is billed. For rules, see Video generation.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Chinese Mainland.
Model | Output video mode | Output price | Free quota (Note) |
wan2.2-animate-mix | Standard mode | $0.18/second | 50 seconds Validity: 90 days after activating Model Studio |
Professional mode | $0.26/second |
Chinese Mainland
In the Chinese Mainland deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Chinese Mainland.
Models in Chinese Mainland deployment mode do not have a free quota.
Model | Output video mode | Output price |
wan2.2-animate-mix | Standard mode | $0.09/second |
Professional mode | $0.13/second |
AnimateAnyone
Only the Chinese Mainland deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Chinese Mainland.
animate-anyone-detect-gen2: Charges apply to input only. Billing based on the number of detected images. Each input image is charged once, regardless of detection success.
animate-anyone-template-gen2: Charges apply to output only. Billing based on the duration of successfully generated video in seconds. For billing rules, see Video generation.
animate-anyone-gen2: Charges apply to output only. Billing based on the duration of successfully generated video in seconds. For billing rules, see Video generation.
Service | Model | Price | Free quota (Note) |
Image detection | animate-anyone-detect-gen2 | Input image: $0.000574/image | No free quota |
Action template generation | animate-anyone-template-gen2 | Output video: $0.011469/second | |
Video generation | animate-anyone-gen2 | Output video: $0.011469/second |
EMO
Only the Chinese Mainland deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Chinese Mainland.
emo-detect-v1: Charges apply to input only. Billing based on the number of detected images. Each input image is charged once, regardless of detection success.
emo-v1: Charges apply to output only. Billing based on the duration of successfully generated video in seconds. For billing rules, see Video generation.
Service | Model | Price | Free quota (Note) |
Image detection | emo-detect-v1 | Input image: $0.000574/image | No free quota |
Video generation | emo-v1 | Output video:
|
LivePortrait
Only the Chinese Mainland deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Chinese Mainland.
liveportrait-detect: Charges apply to input only. Billing based on the number of detected images. Each input image is charged once, regardless of detection success.
liveportrait: Charges apply to output only. Billing based on the duration of successfully generated video in seconds. For billing rules, see Video generation.
Service | Model | Price | Free quota (Note) |
Image detection | liveportrait-detect | Input image: $0.000574/image | No free quota |
Video generation | liveportrait | Output video: $0.002868/second |
Emoji
Only the Chinese Mainland deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Chinese Mainland.
emoji-detect-v1: Charges apply to input only. Billing based on the number of detected images. Each input image is charged once, regardless of detection success.
emoji-v1: Charges apply to output only. Billing based on the duration of successfully generated video in seconds. For billing rules, see Video generation.
Service | Model | Price | Free quota (Note) |
Image detection | emoji-detect-v1 | Input image: $0.000574/image | No free quota |
Video generation | emoji-v1 | Output video: $0.011469/second |
VideoRetalk
Only the Chinese Mainland deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Chinese Mainland.
Only output is billed. For rules, see Video generation.
Model | Output price | Free quota (Note) |
videoretalk | $0.011469/second | No free quota |
Video style transform
Only the Chinese Mainland deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Chinese Mainland.
Only output is billed. For rules, see Video generation.
Model | Output video resolution | Output price | Free quota (Note) |
video-style-transform | 540P | $0.028671/second | No free quota |
720P | $0.071677/second |
Speech synthesis (text-to-speech)
Qwen-TTS
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Chinese Mainland.
Qwen3-TTS-Instruct-Flash
Billing rule: Charges apply per input text character. Output is not billed.
Model | Input price (per 10K characters) | Free quota (Note) |
qwen3-tts-instruct-flash | $0.115 | 10,000 characters Validity: 90 days after activating Model Studio |
qwen3-tts-instruct-flash-2026-01-26 | $0.115 |
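Because the price is quoted per 10K characters, the charge for a request scales linearly with the input text length. The sketch below assumes the billable character count is already known. How individual characters are counted is not specified in this section, so `char_count` is taken as given. The function name `tts_fee` is hypothetical.

```python
from decimal import Decimal

CHARS_PER_PRICE_UNIT = 10_000  # prices are quoted per 10K input characters


def tts_fee(char_count: int, price_per_10k_chars: str) -> Decimal:
    """Charge for input text only; the synthesized audio output is not billed."""
    return Decimal(price_per_10k_chars) * char_count / CHARS_PER_PRICE_UNIT


# Hypothetical example: 25,000 billable characters at $0.115 per 10K characters.
fee = tts_fee(25_000, "0.115")
print(fee)
```

For this example the charge works out to 2.5 × $0.115 = $0.2875.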
Qwen3-TTS-VD
Billing rule: Charges apply per input text character. Output is not billed.
Model | Input price (per 10K characters) | Free quota (Note) |
qwen3-tts-vd-2026-01-26 | $0.115 | 10,000 characters Validity: 90 days after activating Model Studio |
Qwen3-TTS-VC
Billing rule: Charges apply per input text character. Output is not billed.
Model | Input price (per 10K characters) | Free quota (Note) |
qwen3-tts-vc-2026-01-22 | $0.115 | 10,000 characters Validity: 90 days after activating Model Studio |
Qwen3-TTS-Flash
Billing rule: Charges apply per input text character. Output is not billed.
Model | Input price (per 10K characters) | Free quota (Note) |
qwen3-tts-flash | $0.1 | 10,000 characters Validity: 90 days after activating Model Studio |
qwen3-tts-flash-2025-11-27 | $0.1 | |
qwen3-tts-flash-2025-09-18 | $0.1 | Model Studio activated before 00:00 on November 13, 2025: 2000 characters Model Studio activated after 00:00 on November 13, 2025: 10,000 characters Validity: 90 days after activating Model Studio |
Chinese Mainland
In the Chinese Mainland deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Chinese Mainland.
Models in Chinese Mainland deployment mode do not have a free quota.
Qwen3-TTS-Instruct-Flash
Billing rule: Charges apply per input text character. Output is not billed.
Model | Input price (per 10K characters) | Output price (per 10K characters) |
qwen3-tts-instruct-flash | $0.115 | Not charged |
qwen3-tts-instruct-flash-2026-01-26 | $0.115 | Not charged |
Qwen3-TTS-VD
Billing rule: Charges apply per input text character. Output is not billed.
Model | Input price (per 10K characters) | Output price (per 10K characters) |
qwen3-tts-vd-2026-01-26 | $0.115 | Not charged |
Qwen3-TTS-VC
Billing rule: Charges apply per input text character. Output is not billed.
Model | Input price (per 10K characters) | Output price (per 10K characters) |
qwen3-tts-vc-2026-01-22 | $0.115 | Not charged |
Qwen3-TTS-Flash
Billing rule: Charges apply per input text character. Output is not billed.
Model | Input price (per 10K characters) | Output price (per 10K characters) |
qwen3-tts-flash | $0.114682 | Not charged |
qwen3-tts-flash-2025-11-27 | $0.114682 | Not charged |
qwen3-tts-flash-2025-09-18 | $0.114682 | Not charged |
Qwen-TTS
Billing rule: Charges are calculated per input and output token.
Model | Input price (per 1M tokens) | Output price (per 1M tokens) |
qwen-tts-flash | $0.23 | $1.434 |
qwen-tts-latest | $0.23 | $1.434 |
qwen-tts-2025-05-22 | $0.23 | $1.434 |
qwen-tts-2025-04-10 | $0.23 | $1.434 |
Qwen-TTS-Realtime
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Chinese Mainland.
Qwen3-TTS-Instruct-Flash-Realtime
Billing rule: Charges apply per input text character. Output is not billed.
Model | Input price (per 10K characters) | Free quota (Note) |
qwen3-tts-instruct-flash-realtime | $0.143 | 10,000 characters Validity: 90 days after activating Model Studio |
qwen3-tts-instruct-flash-realtime-2026-01-22 | $0.143 | 10,000 characters Validity: 90 days after activating Model Studio |
Qwen3-TTS-VD-Realtime
Billing rule: Charges apply per input text character. Output is not billed.
Model | Input price (per 10K characters) | Free quota (Note) |
qwen3-tts-vd-realtime-2026-01-15 | $0.143353 | 10,000 characters Validity: 90 days after activating Model Studio |
qwen3-tts-vd-realtime-2025-12-16 | $0.143353 | 10,000 characters Validity: 90 days after activating Model Studio |
Qwen3-TTS-VC-Realtime
Billing rule: Charges apply per input text character. Output is not billed.
Model | Input price (per 10K characters) | Free quota (Note) |
qwen3-tts-vc-realtime-2026-01-15 | $0.13 | 10,000 characters Validity: 90 days after activating Model Studio |
qwen3-tts-vc-realtime-2025-11-27 |
Qwen3-TTS-Flash-Realtime
Billing rule: Charges apply per input text character. Output is not billed.
Model | Input price (per 10K characters) | Free quota (Note) |
qwen3-tts-flash-realtime | $0.13 | Model Studio activated before 00:00 on November 13, 2025: 2000 characters Model Studio activated after 00:00 on November 13, 2025: 10,000 characters Validity: 90 days after activating Model Studio |
qwen3-tts-flash-realtime-2025-11-27 | $0.13 | 10,000 characters Validity: 90 days after activating Model Studio |
qwen3-tts-flash-realtime-2025-09-18 | $0.13 | Model Studio activated before 00:00 on November 13, 2025: 2000 characters Model Studio activated after 00:00 on November 13, 2025: 10,000 characters Validity: 90 days after activating Model Studio |
Chinese Mainland
In the Chinese Mainland deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Chinese Mainland.
Models in Chinese Mainland deployment mode do not have a free quota.
Qwen3-TTS-Instruct-Flash-Realtime
Billing rule: Charges apply per input text character. Output is not billed.
Model | Input price (per 10K characters) | Output price |
qwen3-tts-instruct-flash-realtime | $0.143 | Not charged |
qwen3-tts-instruct-flash-realtime-2026-01-22 | $0.143 | Not charged |
Qwen3-TTS-VD-Realtime
Billing rule: Charges apply per input text character. Output is not billed.
Model | Input price (per 10K characters) | Output price |
qwen3-tts-vd-realtime-2026-01-15 | $0.143353 | Not charged |
qwen3-tts-vd-realtime-2025-12-16 | $0.143353 | Not charged |
Qwen3-TTS-VC-Realtime
Billing rule: Charges apply per input text character. Output is not billed.
Model | Input price (per 10K characters) | Output price |
qwen3-tts-vc-realtime-2026-01-15 | $0.143353 | Not charged |
qwen3-tts-vc-realtime-2025-11-27 |
Qwen3-TTS-Flash-Realtime
Billing rule: Charges apply per input text character. Output is not billed.
Model | Input price (per 10K characters) | Output price |
qwen3-tts-flash-realtime | $0.143353 | Not charged |
qwen3-tts-flash-realtime-2025-11-27 | $0.143353 | Not charged |
qwen3-tts-flash-realtime-2025-09-18 | $0.143353 | Not charged |
Qwen-TTS-Realtime
Billing rule: Charges are calculated per input and output token.
Model | Input price (per 1M tokens) | Output price (per 1M tokens) |
qwen-tts-realtime | $0.345 | $1.721 |
qwen-tts-realtime-latest | $0.345 | $1.721 |
qwen-tts-realtime-2025-07-15 | $0.345 | $1.721 |
Qwen-TTS voice cloning
Billing rule: Charges apply to the number of new voices created.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Chinese Mainland.
Model | Price (per voice) | Free quota (Note) |
qwen-voice-enrollment | $0.01 | 1000 voices/account |
Chinese Mainland
In the Chinese Mainland deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Chinese Mainland.
Models in Chinese Mainland deployment mode do not have a free quota.
Model | Price (per voice) |
qwen-voice-enrollment | $0.01 |
Qwen-TTS voice design
Billing rule: Charges apply to the number of new voices created.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Chinese Mainland.
Model | Price (per voice) | Free quota (Note) |
qwen-voice-design | $0.2 | 10 voices/account |
Chinese Mainland
In the Chinese Mainland deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Chinese Mainland.
Models in Chinese Mainland deployment mode do not have a free quota.
Model | Price (per voice) |
qwen-voice-design | $0.2 |
CosyVoice
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Chinese Mainland.
Billing rule: Charges apply per input text character. Output is not billed.
Model | Input price (per 10K characters) | Free quota (Note) |
cosyvoice-v3-plus | $0.26 | 10,000 characters Validity: 90 days after activating Model Studio |
cosyvoice-v3-flash | $0.13 |
Chinese Mainland
Models in Chinese Mainland deployment mode do not have a free quota.
Billing rule: Charges apply per input text character. Output is not billed.
Model | Input price (per 10K characters) | Free quota (Note) |
cosyvoice-v3.5-plus | $0.22 | No free quota |
cosyvoice-v3.5-flash | $0.116 | |
cosyvoice-v3-plus | $0.286706 | |
cosyvoice-v3-flash | $0.14335 | |
cosyvoice-v2 | $0.286706 |
Speech recognition (speech-to-text) and translation (speech-to-translation)
Qwen3-LiveTranslate-Flash-Realtime
Billing rule: Charges are calculated per input and output token. For token calculation rules for different modalities, see Billing.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Chinese Mainland.
Model | Input price (per 1M tokens) | Output price (per 1M tokens) | Free quota (Note) | ||
Input: Audio | Input: Image | Output: Text | Output: Audio | ||
qwen3-livetranslate-flash-realtime | $10 | $1.3 | $10 | $38 | 1 million tokens each |
qwen3-livetranslate-flash-realtime-2025-09-22 | $10 | $1.3 | $10 | $38 | |
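Since each modality has its own rate, the total charge is the sum of per-modality token counts multiplied by their per-1M-token prices. The sketch below uses the International rates from the table above. The dictionary keys, the function name `livetranslate_fee`, and the token counts in the example are hypothetical, and the token counts are assumed to have been computed already under the rules in Billing.

```python
from decimal import Decimal

TOKENS_PER_PRICE_UNIT = Decimal(1_000_000)  # prices are quoted per 1M tokens

# International rates for qwen3-livetranslate-flash-realtime, per 1M tokens.
RATES = {
    "input_audio": Decimal("10"),
    "input_image": Decimal("1.3"),
    "output_text": Decimal("10"),
    "output_audio": Decimal("38"),
}


def livetranslate_fee(token_counts: dict, rates: dict = RATES) -> Decimal:
    """Sum tokens x rate over every modality present in the request."""
    total = sum((rates[kind] * count for kind, count in token_counts.items()), Decimal(0))
    return total / TOKENS_PER_PRICE_UNIT


# Hypothetical session: 200K audio tokens in, 50K text and 100K audio tokens out.
usage = {"input_audio": 200_000, "output_text": 50_000, "output_audio": 100_000}
print(livetranslate_fee(usage))
```

For this example the charge is $2.00 + $0.50 + $3.80 = $6.30.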
Chinese Mainland
In the Chinese Mainland deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Chinese Mainland.
Models in Chinese Mainland deployment mode do not have a free quota.
Model | Input price (per 1M tokens) | Output price (per 1M tokens) | ||
Input: Audio | Input: Image | Output: Text | Output: Audio | |
qwen3-livetranslate-flash-realtime | $9.175 | $1.147 | $9.175 | $34.405 |
qwen3-livetranslate-flash-realtime-2025-09-22 | $9.175 | $1.147 | $9.175 | $34.405 |
Qwen-ASR
Billing rule: Charges apply per second of input audio duration. Output is not billed.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Chinese Mainland.
Model | Input price | Free quota (Note) |
qwen3-asr-flash-filetrans | $0.000035/second | 36,000 seconds (10 hours) |
qwen3-asr-flash-filetrans-2025-11-17 | ||
qwen3-asr-flash | ||
qwen3-asr-flash-2025-09-08 |
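As a rough illustration of per-second billing with a free quota, the sketch below assumes that any remaining free seconds are consumed before paid usage begins. That ordering is an assumption made for the example and is not stated in this section. The function name `asr_fee` and its parameters are hypothetical.

```python
from decimal import Decimal


def asr_fee(audio_seconds: int, price_per_second: str, free_seconds_left: int = 0) -> Decimal:
    """Charge for input audio duration only; the recognized text output is not billed."""
    # Assumption: remaining free quota is deducted first, and never goes negative.
    billable_seconds = max(audio_seconds - free_seconds_left, 0)
    return Decimal(price_per_second) * billable_seconds


# One hour of audio at $0.000035/second with no free quota left.
print(asr_fee(3_600, "0.000035"))  # 0.126000
```

With the full 36,000-second quota still available, 50,000 seconds of audio would leave 14,000 billable seconds, or $0.49 at the same rate.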
US
In the US deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are limited to the United States.
Models in the US deployment mode do not have a free quota.
Model | Input price |
qwen3-asr-flash-us | $0.000035/second |
qwen3-asr-flash-2025-09-08-us |
Chinese Mainland
In the Chinese Mainland deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Chinese Mainland.
Models in Chinese Mainland deployment mode do not have a free quota.
Model | Input price |
qwen3-asr-flash-filetrans | $0.000032/second |
qwen3-asr-flash-filetrans-2025-11-17 | |
qwen3-asr-flash | |
qwen3-asr-flash-2025-09-08 |
Qwen-ASR-Realtime
Billing rule: Charges apply per second of input audio duration. Output is not billed.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Chinese Mainland.
Model | Input price | Free quota (Note) |
qwen3-asr-flash-realtime | $0.000090/second | 36,000 seconds (10 hours) |
qwen3-asr-flash-realtime-2026-02-10 | $0.000090/second | |
qwen3-asr-flash-realtime-2025-10-27 | $0.000090/second |
Chinese Mainland
In the Chinese Mainland deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Chinese Mainland.
Models in Chinese Mainland deployment mode do not have a free quota.
Model | Input price |
qwen3-asr-flash-realtime | $0.000047/second |
qwen3-asr-flash-realtime-2026-02-10 | |
qwen3-asr-flash-realtime-2025-10-27 |
Fun-ASR
Audio file recognition
Billing rule: Charges apply per second of input audio duration. Output is not billed.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Chinese Mainland.
Model | Input price | Free quota (Note) |
fun-asr | $0.000035/second | 36,000 seconds (10 hours) |
fun-asr-2025-11-07 | ||
fun-asr-2025-08-25 | ||
fun-asr-mtl | ||
fun-asr-mtl-2025-08-25 |
Chinese Mainland
In the Chinese Mainland deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Chinese Mainland.
Models in Chinese Mainland deployment mode do not have a free quota.
Model | Input price |
fun-asr | $0.000032/second |
fun-asr-2025-11-07 | |
fun-asr-2025-08-25 | |
fun-asr-mtl | |
fun-asr-mtl-2025-08-25 |
Real-time speech recognition
Billing rule: Charges apply per second of input audio duration. Output is not billed.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Chinese Mainland.
Model | Input price | Free quota (Note) |
fun-asr-realtime | $0.00009/second | 36,000 seconds (10 hours) Valid for 90 days |
fun-asr-realtime-2025-11-07 |
Chinese Mainland
In the Chinese Mainland deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Chinese Mainland.
Models in Chinese Mainland deployment mode do not have a free quota.
Model | Input price |
fun-asr-realtime | $0.000047/second |
fun-asr-realtime-2025-11-07 | |
fun-asr-realtime-2025-09-15 |
Paraformer
Audio file recognition
Only the Chinese Mainland deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Chinese Mainland.
Billing rule: Charges apply per second of input audio duration. Output is not billed.
Model | Input price |
paraformer-v2 | $0.000012/second |
paraformer-8k-v2 |
Real-time speech recognition
Only the Chinese Mainland deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Chinese Mainland.
Billing rule: Charges apply per second of input audio duration. Output is not billed.
Model | Input price | Free quota (Note) |
paraformer-realtime-v2 | $0.000035/second | No free quota |
paraformer-realtime-8k-v2 |
Text embedding
Billing rule: Charges apply per input token. Output is not billed.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Chinese Mainland.
Model | Input price (per 1M tokens) | Free quota (Note) |
text-embedding-v4 | $0.07 | 1 million tokens |
text-embedding-v3 | $0.07 | 500,000 tokens |
Chinese Mainland
In the Chinese Mainland deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Chinese Mainland.
Models in the Chinese Mainland deployment mode do not have a free quota.
Model | Input price (per 1M tokens) |
text-embedding-v4 | $0.072 |
Multimodal embedding
Billing rule: Charges apply per input token. Output is not billed.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Chinese Mainland.
Model | Input price (per 1M tokens) | Free quota (Note) |
tongyi-embedding-vision-plus | $0.09 | 1 million tokens Validity: 90 days after activating Model Studio |
tongyi-embedding-vision-flash | Image/Video: $0.03 Text: $0.09 |
Chinese Mainland
In the Chinese Mainland deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Chinese Mainland.
Model | Input price (per 1M tokens) | Free quota (Note) |
qwen3-vl-embedding | Image/Video: $0.258 Text: $0.1 | No free quota |
multimodal-embedding-v1 | Free trial |
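Where a model lists separate Image/Video and Text prices, the charge is the sum of the two parts. The sketch below illustrates that arithmetic with the prices from the tables above. The function name `multimodal_embedding_cost` is hypothetical, and the token counts are treated as given inputs, because how images and videos are converted to tokens is not described here.

```python
from decimal import Decimal

MILLION = Decimal(1_000_000)

# (image/video price, text price) in USD per 1M input tokens, from the tables above.
PRICES = {
    "tongyi-embedding-vision-flash": (Decimal("0.03"), Decimal("0.09")),  # International
    "qwen3-vl-embedding": (Decimal("0.258"), Decimal("0.1")),  # Chinese Mainland
}


def multimodal_embedding_cost(model, visual_tokens, text_tokens):
    """Estimate the input charge in USD; output is not billed."""
    visual_price, text_price = PRICES[model]
    return (visual_tokens * visual_price + text_tokens * text_price) / MILLION


# 2M image/video tokens plus 500K text tokens on tongyi-embedding-vision-flash.
flash_cost = multimodal_embedding_cost("tongyi-embedding-vision-flash", 2_000_000, 500_000)

# 1M image/video tokens plus 1M text tokens on qwen3-vl-embedding.
vl_cost = multimodal_embedding_cost("qwen3-vl-embedding", 1_000_000, 1_000_000)

print(flash_cost, vl_cost)
```

With these inputs the estimates are $0.105 and $0.358 respectively. Free quota is ignored in this sketch.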
Text rerank
Billing rule: Charges apply per input token. Output is not billed.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Chinese Mainland.
Model | Input price (per 1M tokens) | Free quota (Note) |
qwen3-rerank | $0.1 | 1 million tokens Validity: 90 days after activating Model Studio |
Chinese Mainland
In the Chinese Mainland deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Chinese Mainland.
Models in the Chinese Mainland deployment mode do not have a free quota.
Model | Input price (per 1M tokens) |
gte-rerank-v2 | $0.115 |
Domain specific
Intent recognition
Only the Chinese Mainland deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Chinese Mainland.
Model | Input price (per 1M tokens) | Output price (per 1M tokens) | Free quota (Note) |
tongyi-intent-detect-v3 | $0.058 | $0.144 | No free quota |
Role playing
Billing is based on the number of input and output tokens.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Chinese Mainland.
Model | Input price (per 1M tokens) | Output price (per 1M tokens) | Free quota (Note) |
qwen-plus-character | $0.5 | $1.4 | No free quota |
qwen-flash-character | $0.05 | $0.4 | |
qwen-plus-character-ja | $0.5 | $1.4 |
Chinese Mainland
In the Chinese Mainland deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Chinese Mainland.
Model | Input price (per 1M tokens) | Output price (per 1M tokens) | Free quota (Note) |
qwen-plus-character | $0.115 | $0.287 | No free quota |
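Because role-playing models bill input and output tokens at different rates, a cost estimate needs both counts. The following is an illustrative sketch using the prices in the tables above. The function name `chat_cost` and the deployment keys are hypothetical, and the token counts are assumed to be known already, for example from the usage figures returned with a response.

```python
from decimal import Decimal

MILLION = Decimal(1_000_000)

# (input price, output price) in USD per 1M tokens, from the tables above.
PRICES = {
    ("international", "qwen-plus-character"): (Decimal("0.5"), Decimal("1.4")),
    ("international", "qwen-flash-character"): (Decimal("0.05"), Decimal("0.4")),
    ("international", "qwen-plus-character-ja"): (Decimal("0.5"), Decimal("1.4")),
    ("chinese_mainland", "qwen-plus-character"): (Decimal("0.115"), Decimal("0.287")),
}


def chat_cost(deployment, model, input_tokens, output_tokens):
    """Estimate the charge in USD for one workload of input and output tokens."""
    input_price, output_price = PRICES[(deployment, model)]
    return (input_tokens * input_price + output_tokens * output_price) / MILLION


# The same workload (200K input tokens, 50K output tokens) in both deployment modes.
intl_cost = chat_cost("international", "qwen-plus-character", 200_000, 50_000)
cn_cost = chat_cost("chinese_mainland", "qwen-plus-character", 200_000, 50_000)

print(intl_cost, cn_cost)
```

For this workload the estimate is $0.17 in the International deployment mode and $0.03735 in the Chinese Mainland deployment mode.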