Text generation - Qwen
Qwen-Max
Billing is based on the number of input and output tokens.
If the model supports batch calls, the batch price for input and output tokens is 50% of the real-time price. If the model supports context cache, only input tokens are eligible for a discount. These two discounts cannot be applied at the same time.
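As a rough illustration of the batch rule above, the sketch below prices one request at the real-time rate and at the batch rate. The function name `request_cost` and its parameters are hypothetical. The prices in the usage lines are the qwen-max International prices listed later in this section ($1.6 input, $6.4 output per million tokens). The context cache discount rate is not stated here, so it is left out rather than guessed.

```python
def request_cost(input_tokens, output_tokens, input_price, output_price, batch=False):
    """Cost in USD for one request; prices are in USD per million tokens."""
    cost = (input_tokens * input_price + output_tokens * output_price) / 1_000_000
    # Batch calls are billed at 50% of the real-time price for both input and
    # output tokens. This only applies to models that support batch calls.
    return cost * 0.5 if batch else cost


# Illustrative token counts; prices taken from the qwen-max International row.
realtime = request_cost(200_000, 50_000, input_price=1.6, output_price=6.4)
batch = request_cost(200_000, 50_000, input_price=1.6, output_price=6.4, batch=True)
print(f"real-time: ${realtime:.2f}, batch: ${batch:.2f}")
```

Because the batch and context cache discounts cannot be combined, a caller would pick one pricing path per request rather than stacking both.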
Global
In the Global deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are dynamically scheduled worldwide.
Models in the Global deployment mode do not have a free quota.
Model | Mode | Input tokens per request | Input price (Million tokens) | Output price (Million tokens, CoT + response) |
qwen3-max (context cache discount applicable) | Non-thinking only | 0<Token≤32K | $1.2 | $6 |
32K<Token≤128K | $2.4 | $12 | ||
128K<Token≤252K | $3 | $15 | ||
qwen3-max-2025-09-23 | Non-thinking only | 0<Token≤32K | $1.2 | $6 |
32K<Token≤128K | $2.4 | $12 | ||
128K<Token≤252K | $3 | $15 | ||
qwen3-max-preview (context cache discount applicable) | Thinking and non-thinking | 0<Token≤32K | $1.2 | $6 |
32K<Token≤128K | $2.4 | $12 | ||
128K<Token≤252K | $3 | $15 |
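The tiered prices in the table above can be turned into a small cost calculator. This is a minimal sketch under two assumptions that the source does not confirm: "K" is read as 1,000 tokens, and the tier selected by the input token count of a request sets the price for all input and output tokens in that request. The names `QWEN3_MAX_GLOBAL_TIERS` and `tiered_cost` are hypothetical.

```python
# Tiers for qwen3-max in the Global deployment mode, copied from the table:
# (upper bound on input tokens per request, input USD/M tokens, output USD/M tokens).
# Assumption: "K" means 1,000 tokens.
QWEN3_MAX_GLOBAL_TIERS = [
    (32_000, 1.2, 6.0),
    (128_000, 2.4, 12.0),
    (252_000, 3.0, 15.0),
]


def tiered_cost(input_tokens, output_tokens, tiers=QWEN3_MAX_GLOBAL_TIERS):
    """Return the USD cost of one request under tiered pricing."""
    if input_tokens <= 0:
        raise ValueError("input_tokens must be positive")
    for upper, input_price, output_price in tiers:
        if input_tokens <= upper:
            # Assumption: the tier chosen by input size prices the whole request.
            return (input_tokens * input_price + output_tokens * output_price) / 1_000_000
    raise ValueError("input exceeds the largest listed tier")


small = tiered_cost(10_000, 2_000)   # falls in the 0<Token≤32K tier
large = tiered_cost(100_000, 2_000)  # falls in the 32K<Token≤128K tier
print(f"small: ${small:.4f}, large: ${large:.4f}")
```

The same function works for other tiered models in this section if a different tier list is passed in.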
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Mode | Input tokens per request | Input price (Million tokens) | Output price (Million tokens, CoT + response) | Free quota (Note) |
qwen3-max (batch calling half price; context cache discount applicable) | Non-thinking only | 0<Token≤32K | $1.2 | $6 | 1 million tokens each; validity: 90 days after activating Model Studio |
32K<Token≤128K | $2.4 | $12 | |||
128K<Token≤252K | $3 | $15 | |||
qwen3-max-2025-09-23 | Non-thinking only | 0<Token≤32K | $1.2 | $6 | |
32K<Token≤128K | $2.4 | $12 | |||
128K<Token≤252K | $3 | $15 | |||
qwen3-max-preview (context cache discount applicable) | Thinking and non-thinking | 0<Token≤32K | $1.2 | $6 | |
32K<Token≤128K | $2.4 | $12 | |||
128K<Token≤252K | $3 | $15 |
More models
Model | Mode | Input tokens per request | Input price (Million tokens) | Output price (Million tokens) | Free quota (Note) |
qwen-max (batch calling half price) | Non-thinking only | No tiered pricing | $1.6 | $6.4 | 1 million tokens each |
qwen-max-latest | Non-thinking only | No tiered pricing | $1.6 | $6.4 | |
qwen-max-2025-01-25 | Non-thinking only | No tiered pricing | $1.6 | $6.4 |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Models in the Mainland China deployment mode do not have a free quota.
Model | Mode | Input tokens per request | Input price (Million tokens) | Output price (Million tokens, CoT + response) |
qwen3-max (batch calling half price; context cache discount applicable) | Non-thinking only | 0<Token≤32K | $0.459 | $1.836 |
32K<Token≤128K | $0.918 | $3.672 | ||
128K<Token≤252K | $1.377 | $5.508 | ||
qwen3-max-2025-09-23 | Non-thinking only | 0<Token≤32K | $0.861 | $3.441 |
32K<Token≤128K | $1.434 | $5.735 | ||
128K<Token≤252K | $2.151 | $8.602 | ||
qwen3-max-preview (context cache discount applicable) | Thinking and non-thinking | 0<Token≤32K | $0.861 | $3.441 |
32K<Token≤128K | $1.434 | $5.735 | ||
128K<Token≤252K | $2.151 | $8.602 |
More models
Model | Mode | Input tokens per request | Input price (Million tokens) | Output price (Million tokens) |
qwen-max | Non-thinking only | No tiered pricing | $0.345 | $1.377 |
qwen-max-latest | Non-thinking only | No tiered pricing | $0.345 | $1.377 |
qwen-max-2025-01-25 | Non-thinking only | No tiered pricing | $0.345 | $1.377 |
qwen-max-2024-09-19 | Non-thinking only | No tiered pricing | $2.868 | $8.602 |
Qwen-Plus
Billing is based on the number of input and output tokens.
Global
In the Global deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are dynamically scheduled worldwide.
Models in the Global deployment mode do not have a free quota.
Model | Input tokens per request | Input price (Million tokens) | Output price (Million tokens): Non-thinking mode | Output price (Million tokens): Thinking mode (CoT + response) |
qwen-plus | 0<Token≤256K | $0.4 | $1.2 | $4 |
256K<Token≤1M | $1.2 | $3.6 | $12 | |
qwen-plus-2025-12-01 | 0<Token≤256K | $0.4 | $1.2 | $4 |
256K<Token≤1M | $1.2 | $3.6 | $12 | |
qwen-plus-2025-09-11 | 0<Token≤256K | $0.4 | $1.2 | $4 |
256K<Token≤1M | $1.2 | $3.6 | $12 | |
qwen-plus-2025-07-28 | 0<Token≤256K | $0.4 | $1.2 | $4 |
256K<Token≤1M | $1.2 | $3.6 | $12 | |
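The gap between the two output price columns matters most when a request produces a long chain of thought, because in thinking mode the CoT tokens are billed as output together with the response. The sketch below compares both modes for qwen-plus in the first tier of the Global table. The function name `qwen_plus_cost` and the token counts are hypothetical, chosen only for illustration.

```python
# qwen-plus, Global deployment mode, 0<Token≤256K tier (values from the table).
INPUT_PRICE = 0.4                # USD per million input tokens
OUTPUT_PRICE_NON_THINKING = 1.2  # USD per million output tokens
OUTPUT_PRICE_THINKING = 4.0      # USD per million CoT + response tokens


def qwen_plus_cost(input_tokens, response_tokens, cot_tokens=0, thinking=False):
    """Return the USD cost of one request in either mode."""
    if thinking:
        # In thinking mode the chain of thought counts as output too.
        output_tokens = cot_tokens + response_tokens
        output_price = OUTPUT_PRICE_THINKING
    else:
        output_tokens = response_tokens
        output_price = OUTPUT_PRICE_NON_THINKING
    return (input_tokens * INPUT_PRICE + output_tokens * output_price) / 1_000_000


plain = qwen_plus_cost(20_000, 1_000)
reasoned = qwen_plus_cost(20_000, 1_000, cot_tokens=4_000, thinking=True)
print(f"non-thinking: ${plain:.4f}, thinking: ${reasoned:.4f}")
```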
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Input tokens per request | Input price (Million tokens) | Output price (Million tokens): Non-thinking mode | Output price (Million tokens): Thinking mode (CoT + response) | Free quota (Note) |
qwen-plus | 0<Token≤256K | $0.4 | $1.2 | $4 | 1 million tokens each |
256K<Token≤1M | $1.2 | $3.6 | $12 | ||
qwen-plus-latest | 0<Token≤256K | $0.4 | $1.2 | $4 | |
256K<Token≤1M | $1.2 | $3.6 | $12 | ||
qwen-plus-2025-12-01 | 0<Token≤256K | $0.4 | $1.2 | $4 | |
256K<Token≤1M | $1.2 | $3.6 | $12 | ||
qwen-plus-2025-09-11 | 0<Token≤256K | $0.4 | $1.2 | $4 | |
256K<Token≤1M | $1.2 | $3.6 | $12 | ||
qwen-plus-2025-07-28 | 0<Token≤256K | $0.4 | $1.2 | $4 | |
256K<Token≤1M | $1.2 | $3.6 | $12 | ||
qwen-plus-2025-07-14 | No tiered pricing | $0.4 | $1.2 | $4 | |
qwen-plus-2025-04-28 | No tiered pricing | $0.4 | $1.2 | $4 | |
qwen-plus-2025-01-25 | No tiered pricing | $0.4 | $1.2 | - | |
US
In the US deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are limited to the United States.
Models in the US deployment mode do not have a free quota.
Model | Input tokens per request | Input price (Million tokens) | Output price (Million tokens): Non-thinking mode | Output price (Million tokens): Thinking mode (CoT + response) |
qwen-plus-us | 0<Token≤256K | $0.4 | $1.2 | $4 |
256K<Token≤1M | $1.2 | $3.6 | $12 | |
qwen-plus-2025-12-01-us | 0<Token≤256K | $0.4 | $1.2 | $4 |
256K<Token≤1M | $1.2 | $3.6 | $12 | |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Models in the Mainland China deployment mode do not have a free quota.
Model | Input tokens per request | Input price (Million tokens) | Output price (Million tokens): Non-thinking mode | Output price (Million tokens): Thinking mode (CoT + response) |
qwen-plus | 0<Token≤128K | $0.115 | $0.287 | $1.147 |
128K<Token≤256K | $0.345 | $2.868 | $3.441 | |
256K<Token≤1M | $0.689 | $6.881 | $9.175 | |
qwen-plus-latest | 0<Token≤128K | $0.115 | $0.287 | $1.147 |
128K<Token≤256K | $0.345 | $2.868 | $3.441 | |
256K<Token≤1M | $0.689 | $6.881 | $9.175 | |
qwen-plus-2025-12-01 | 0<Token≤128K | $0.115 | $0.287 | $1.147 |
128K<Token≤256K | $0.345 | $2.868 | $3.441 | |
256K<Token≤1M | $0.689 | $6.881 | $9.175 | |
qwen-plus-2025-09-11 | 0<Token≤128K | $0.115 | $0.287 | $1.147 |
128K<Token≤256K | $0.345 | $2.868 | $3.441 | |
256K<Token≤1M | $0.689 | $6.881 | $9.175 | |
qwen-plus-2025-07-28 | 0<Token≤128K | $0.115 | $0.287 | $1.147 |
128K<Token≤256K | $0.345 | $2.868 | $3.441 | |
256K<Token≤1M | $0.689 | $6.881 | $9.175 | |
qwen-plus-2025-07-14 | No tiered pricing | $0.115 | $0.287 | $1.147 |
qwen-plus-2025-04-28 | No tiered pricing | $0.115 | $0.287 | $1.147 |
More models
Model | Input tokens per request | Input price (Million tokens) | Output price (Million tokens) |
qwen-plus-2025-01-25 | No tiered pricing | $0.115 | $0.287 |
qwen-plus-2025-01-12 | No tiered pricing | $0.115 | $0.287 |
qwen-plus-2024-12-20 | No tiered pricing | $0.115 | $0.287 |
qwen-plus-2024-11-27 | No tiered pricing | $0.115 | $0.287 |
qwen-plus-2024-11-25 | No tiered pricing | $0.115 | $0.287 |
qwen-plus-2024-09-19 | No tiered pricing | $0.115 | $0.287 |
qwen-plus-2024-08-06 | No tiered pricing | $0.574 | $1.721 |
Qwen-Flash
Billing is based on the number of input and output tokens.
If the model supports batch calls, the batch price for input and output tokens is 50% of the real-time price. If the model supports context cache, only input tokens are eligible for a discount. These two discounts cannot be applied at the same time.
Global
In the Global deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are dynamically scheduled worldwide.
Models in the Global deployment mode do not have a free quota.
Model | Input tokens per request | Input price (Million tokens) | Output price (Million tokens) |
qwen-flash (context cache discount applicable) | 0<Token≤256K | $0.05 | $0.4 |
256K<Token≤1M | $0.25 | $2 | |
qwen-flash-2025-07-28 | 0<Token≤256K | $0.05 | $0.4 |
256K<Token≤1M | $0.25 | $2 |
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Input tokens per request | Input price (Million tokens) | Output price (Million tokens) | Free quota (Note) |
qwen-flash (batch calling half price; context cache discount applicable) | 0<Token≤256K | $0.05 | $0.4 | 1 million tokens each |
256K<Token≤1M | $0.25 | $2 | ||
qwen-flash-2025-07-28 | 0<Token≤256K | $0.05 | $0.4 | |
256K<Token≤1M | $0.25 | $2 |
US
In the US deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are limited to the United States.
Models in the US deployment mode do not have a free quota.
Model | Input tokens per request | Input price (Million tokens) | Output price (Million tokens) |
qwen-flash | 0<Token≤256K | $0.05 | $0.4 |
256K<Token≤1M | $0.25 | $2 | |
qwen-flash-2025-07-28 | 0<Token≤256K | $0.05 | $0.4 |
256K<Token≤1M | $0.25 | $2 |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Models in the Mainland China deployment mode do not have a free quota.
Model | Input tokens per request | Input price (Million tokens) | Output price (Million tokens) |
qwen-flash (context cache discount applicable) | 0<Token≤128K | $0.022 | $0.216 |
128K<Token≤256K | $0.087 | $0.861 | |
256K<Token≤1M | $0.173 | $1.721 | |
qwen-flash-2025-07-28 | 0<Token≤128K | $0.022 | $0.216 |
128K<Token≤256K | $0.087 | $0.861 | |
256K<Token≤1M | $0.173 | $1.721 |
Qwen-Turbo
Qwen-Turbo will no longer be updated. We recommend Qwen-Flash instead.
Billing is based on the number of input and output tokens.
If the model supports batch calls, the batch price for input and output tokens is 50% of the real-time price.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Input price (Million tokens) | Output price (Million tokens): Non-thinking mode | Output price (Million tokens): Thinking mode (CoT + response) | Free quota (Note) |
qwen-turbo (batch calling half price) | $0.05 | $0.2 | $0.5 | 1 million tokens each |
qwen-turbo-latest | $0.05 | $0.2 | $0.5 | |
qwen-turbo-2025-04-28 | $0.05 | $0.2 | $0.5 | |
More models
Model | Input price (Million tokens) | Output price (Million tokens) | Free quota (Note) |
qwen-turbo-2024-11-01 | $0.05 | $0.2 | 1 million tokens each |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Models in the Mainland China deployment mode do not have a free quota.
Model | Input price (Million tokens) | Output price (Million tokens): Non-thinking mode | Output price (Million tokens): Thinking mode (CoT + response) |
qwen-turbo | $0.044 | $0.087 | $0.431 |
qwen-turbo-latest | $0.044 | $0.087 | $0.431 |
qwen-turbo-2025-07-15 | $0.044 | $0.087 | $0.431 |
qwen-turbo-2025-04-28 | $0.044 | $0.087 | $0.431 |
QwQ
Billing is based on the number of input and output tokens.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Input price (Million tokens) | Output price (Million tokens) | Free quota (Note) |
qwq-plus | $0.8 | $2.4 | 1 million tokens |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Models in the Mainland China deployment mode do not have a free quota.
Model | Input price (Million tokens) | Output price (Million tokens) |
qwq-plus | $0.230 | $0.574 |
qwq-plus-latest | $0.230 | $0.574 |
qwq-plus-2025-03-05 | $0.230 | $0.574 |
Qwen-Long
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Billing is based on the number of input and output tokens.
Model | Input price (Million tokens) | Output price (Million tokens) | Free quota (Note) |
qwen-long-latest | $0.072 | $0.287 | No free quota |
qwen-long-2025-01-25 | $0.072 | $0.287 |
Qwen-Omni
Billing rule: You are charged based on the number of input and output tokens. For token calculation rules for different modalities, see Billing and rate limiting.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Mode | Input price (Million tokens): Text | Input price (Million tokens): Audio | Input price (Million tokens): Image/Video | Output price (Million tokens): Text, plain text input | Output price (Million tokens): Text, multimodal input | Output price (Million tokens): Text+Audio, only audio is billed | Free quota (Note) |
qwen3-omni-flash | Thinking and non-thinking | $0.43 | $3.81 | $0.78 | $1.66 | $3.06 | $15.11 | 1 million tokens each (regardless of modality); validity: 90 days after activating Model Studio |
qwen3-omni-flash-2025-12-01 | Thinking and non-thinking | $0.43 | $3.81 | $0.78 | $1.66 | $3.06 | $15.11 | |
qwen3-omni-flash-2025-09-15 | Thinking and non-thinking | $0.43 | $3.81 | $0.78 | $1.66 | $3.06 | $15.11 | |
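Because each modality has its own price, the cost of a multimodal request is a sum over modalities. The sketch below shows one reading of the qwen3-omni-flash International prices above, under these assumptions: text output is billed at the "plain text input" rate only when the request has no audio, image, or video input; and when the output includes audio, only the audio output tokens are billed. The function name `omni_cost` and its parameters are hypothetical, and the token counting rules per modality are defined elsewhere (see Billing and rate limiting).

```python
# qwen3-omni-flash, International deployment mode, USD per million tokens.
INPUT_TEXT = 0.43
INPUT_AUDIO = 3.81
INPUT_IMAGE_VIDEO = 0.78
OUTPUT_TEXT_PLAIN_INPUT = 1.66
OUTPUT_TEXT_MULTIMODAL_INPUT = 3.06
OUTPUT_AUDIO = 15.11


def omni_cost(text_in=0, audio_in=0, visual_in=0, text_out=0, audio_out=0):
    """Return the USD cost of one request, given token counts per modality."""
    cost = text_in * INPUT_TEXT + audio_in * INPUT_AUDIO + visual_in * INPUT_IMAGE_VIDEO
    if audio_out > 0:
        # Text+audio output: assumed that only the audio tokens are billed.
        cost += audio_out * OUTPUT_AUDIO
    elif audio_in > 0 or visual_in > 0:
        cost += text_out * OUTPUT_TEXT_MULTIMODAL_INPUT
    else:
        cost += text_out * OUTPUT_TEXT_PLAIN_INPUT
    return cost / 1_000_000


text_only = omni_cost(text_in=1_000, text_out=500)
with_audio_input = omni_cost(text_in=1_000, audio_in=2_000, text_out=500)
with_audio_output = omni_cost(text_in=1_000, text_out=500, audio_out=3_000)
print(text_only, with_audio_input, with_audio_output)
```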
More models
Model | Input price (Million tokens): Text | Input price (Million tokens): Audio | Input price (Million tokens): Image/Video | Output price (Million tokens): Text, plain text input | Output price (Million tokens): Text, multimodal input | Output price (Million tokens): Text+Audio, only audio is billed | Free quota (Note) |
qwen-omni-turbo | $0.07 | $4.44 | $0.21 | $0.27 | $0.63 | $8.89 | 1 million tokens each (regardless of modality); validity: 90 days after activating Model Studio |
qwen-omni-turbo-latest | $0.07 | $4.44 | $0.21 | $0.27 | $0.63 | $8.89 | |
qwen-omni-turbo-2025-03-26 | $0.07 | $4.44 | $0.21 | $0.27 | $0.63 | $8.89 | |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Models in the Mainland China deployment mode do not have a free quota.
Model | Mode | Input price (Million tokens): Text | Input price (Million tokens): Audio, audio part is billed separately | Input price (Million tokens): Image/Video | Output price (Million tokens): Text, plain text input | Output price (Million tokens): Text, multimodal input | Output price (Million tokens): Text+Audio, only audio is billed |
qwen3-omni-flash | Thinking and non-thinking | $0.258 | $2.265 | $0.473 | $0.989 | $1.821 | $8.974 |
qwen3-omni-flash-2025-12-01 | Thinking and non-thinking | $0.258 | $2.265 | $0.473 | $0.989 | $1.821 | $8.974 |
qwen3-omni-flash-2025-09-15 | Thinking and non-thinking | $0.258 | $2.265 | $0.473 | $0.989 | $1.821 | $8.974 |
More models
Model | Input price (Million tokens): Text | Input price (Million tokens): Audio, audio part is billed separately | Input price (Million tokens): Image/Video | Output price (Million tokens): Text, plain text input | Output price (Million tokens): Text, multimodal input | Output price (Million tokens): Text+Audio, only audio is billed |
qwen-omni-turbo | $0.058 | $3.584 | $0.216 | $0.230 | $0.646 | $7.168 |
qwen-omni-turbo-latest | $0.058 | $3.584 | $0.216 | $0.230 | $0.646 | $7.168 |
qwen-omni-turbo-2025-03-26 | $0.058 | $3.584 | $0.216 | $0.230 | $0.646 | $7.168 |
qwen-omni-turbo-2025-01-19 | $0.058 | $3.584 | $0.216 | $0.230 | $0.646 | $7.168 |
Qwen-Omni-Realtime
Billing rule: You are charged based on the number of input and output tokens. For token calculation rules for different modalities, see Billing and rate limiting.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Input price (Million tokens): Text | Input price (Million tokens): Audio, audio part is billed separately | Input price (Million tokens): Image | Output price (Million tokens): Text, plain text input | Output price (Million tokens): Text, multimodal input | Output price (Million tokens): Text+Audio, only audio is billed | Free quota (Note) |
qwen3-omni-flash-realtime | $0.52 | $4.57 | $0.94 | $1.99 | $3.67 | $18.13 | 1 million tokens each (regardless of modality); validity: 90 days after activating Model Studio |
qwen3-omni-flash-realtime-2025-12-01 | $0.52 | $4.57 | $0.94 | $1.99 | $3.67 | $18.13 | |
qwen3-omni-flash-2025-09-15-realtime | $0.52 | $4.57 | $0.94 | $1.99 | $3.67 | $18.13 | |
qwen-omni-turbo-realtime | $0.270 | $4.440 | $0.840 | $1.070 | $2.520 | $8.890 | |
qwen-omni-turbo-realtime-latest | $0.270 | $4.440 | $0.840 | $1.070 | $2.520 | $8.890 | |
qwen-omni-turbo-realtime-2025-05-08 | $0.270 | $4.440 | $0.840 | $1.070 | $2.520 | $8.890 | |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Models in the Mainland China deployment mode do not have a free quota.
Model | Input price (Million tokens): Text | Input price (Million tokens): Audio, audio part is billed separately | Input price (Million tokens): Image | Output price (Million tokens): Text, plain text input | Output price (Million tokens): Text, multimodal input | Output price (Million tokens): Text+Audio, only audio is billed |
qwen3-omni-flash-realtime | $0.315 | $2.709 | $0.559 | $1.19 | $2.179 | $10.766 |
qwen3-omni-flash-realtime-2025-12-01 | $0.315 | $2.709 | $0.559 | $1.19 | $2.179 | $10.766 |
qwen3-omni-flash-realtime-2025-09-15 | $0.315 | $2.709 | $0.559 | $1.19 | $2.179 | $10.766 |
qwen-omni-turbo-realtime | $0.230 | $3.584 | $0.861 | $0.918 | $2.581 | $7.168 |
qwen-omni-turbo-realtime-latest | $0.230 | $3.584 | $0.861 | $0.918 | $2.581 | $7.168 |
qwen-omni-turbo-realtime-2025-05-08 | $0.230 | $3.584 | $0.861 | $0.918 | $2.581 | $7.168 |
QVQ
Billing rule: You are charged based on the number of input and output tokens. For token calculation rules for different modalities, see Billing and rate limiting.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Input price (Million tokens) | Output price (Million tokens) | Free quota (Note) |
qvq-max | $1.2 | $4.8 | 1 million tokens each |
qvq-max-latest | $1.2 | $4.8 | |
qvq-max-2025-03-25 | $1.2 | $4.8 |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Models in the Mainland China deployment mode do not have a free quota.
Model | Input price (Million tokens) | Output price (Million tokens) |
qvq-max | $1.147 | $4.588 |
qvq-max-latest | $1.147 | $4.588 |
qvq-max-2025-05-15 | $1.147 | $4.588 |
qvq-max-2025-03-25 | $1.147 | $4.588 |
qvq-plus | $0.287 | $0.717 |
qvq-plus-latest | $0.287 | $0.717 |
qvq-plus-2025-05-15 | $0.287 | $0.717 |
Qwen-VL
Billing is based on the number of input and output tokens.
Global
In the Global deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are dynamically scheduled worldwide.
Models in the Global deployment mode do not have a free quota.
Model | Mode | Input tokens per request | Input price (Million tokens) | Output price (Million tokens, CoT + response) |
qwen3-vl-plus | Thinking and non-thinking | 0<Token≤32K | $0.2 | $1.6 |
32K<Token≤128K | $0.3 | $2.4 | ||
128K<Token≤256K | $0.6 | $4.8 | ||
qwen3-vl-plus-2025-09-23 | Thinking and non-thinking | 0<Token≤32K | $0.2 | $1.6 |
32K<Token≤128K | $0.3 | $2.4 | ||
128K<Token≤256K | $0.6 | $4.8 | ||
qwen3-vl-flash | Thinking and non-thinking | 0<Token≤32K | $0.05 | $0.4 |
32K<Token≤128K | $0.075 | $0.6 | ||
128K<Token≤256K | $0.12 | $0.96 | ||
qwen3-vl-flash-2025-10-15 | Thinking and non-thinking | 0<Token≤32K | $0.05 | $0.4 |
32K<Token≤128K | $0.075 | $0.6 | ||
128K<Token≤256K | $0.12 | $0.96 |
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Mode | Input tokens per request | Input price (Million tokens) | Output price (Million tokens, CoT + response) | Free quota (Note) |
qwen3-vl-plus | Thinking and non-thinking | 0<Token≤32K | $0.2 | $1.6 | 1 million tokens each |
32K<Token≤128K | $0.3 | $2.4 | |||
128K<Token≤256K | $0.6 | $4.8 | |||
qwen3-vl-plus-2025-12-19 | Thinking and non-thinking | 0<Token≤32K | $0.2 | $1.6 | |
32K<Token≤128K | $0.3 | $2.4 | |||
128K<Token≤256K | $0.6 | $4.8 | |||
qwen3-vl-plus-2025-09-23 | Thinking and non-thinking | 0<Token≤32K | $0.2 | $1.6 | |
32K<Token≤128K | $0.3 | $2.4 | |||
128K<Token≤256K | $0.6 | $4.8 | |||
qwen3-vl-flash | Thinking and non-thinking | 0<Token≤32K | $0.05 | $0.4 | |
32K<Token≤128K | $0.075 | $0.6 | |||
128K<Token≤256K | $0.12 | $0.96 | |||
qwen3-vl-flash-2025-10-15 | Thinking and non-thinking | 0<Token≤32K | $0.05 | $0.4 | |
32K<Token≤128K | $0.075 | $0.6 | |||
128K<Token≤256K | $0.12 | $0.96 |
More models
Model | Input tokens per request | Input price (Million tokens) | Output price (Million tokens) | Free quota (Note) |
qwen-vl-max | No tiered pricing | $0.8 | $3.2 | 1 million tokens each; validity: 90 days after activating Model Studio |
qwen-vl-max-latest | No tiered pricing | $0.8 | $3.2 | |
qwen-vl-max-2025-08-13 | No tiered pricing | $0.8 | $3.2 | |
qwen-vl-max-2025-04-08 | No tiered pricing | $0.8 | $3.2 | |
qwen-vl-plus | No tiered pricing | $0.21 | $0.63 | |
qwen-vl-plus-latest | No tiered pricing | $0.21 | $0.63 | |
qwen-vl-plus-2025-08-15 | No tiered pricing | $0.21 | $0.63 | |
qwen-vl-plus-2025-05-07 | No tiered pricing | $0.21 | $0.63 | |
qwen-vl-plus-2025-01-25 | No tiered pricing | $0.21 | $0.63 |
US
In the US deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are limited to the United States.
Models in the US deployment mode do not have a free quota.
Model | Mode | Input tokens per request | Input price (Million tokens) | Output price (Million tokens, CoT + response) |
qwen3-vl-flash-us | Thinking and non-thinking | 0<Token≤32K | $0.05 | $0.4 |
32K<Token≤128K | $0.075 | $0.6 | ||
128K<Token≤256K | $0.12 | $0.96 | ||
qwen3-vl-flash-2025-10-15-us | Thinking and non-thinking | 0<Token≤32K | $0.05 | $0.4 |
32K<Token≤128K | $0.075 | $0.6 | ||
128K<Token≤256K | $0.12 | $0.96 |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Models in the Mainland China deployment mode do not have a free quota.
Model | Mode | Input tokens per request | Input price (Million tokens) | Output price (Million tokens, CoT + response) |
qwen3-vl-plus | Thinking and non-thinking | 0<Token≤32K | $0.143 | $1.434 |
32K<Token≤128K | $0.215 | $2.15 | ||
128K<Token≤256K | $0.43 | $4.301 | ||
qwen3-vl-plus-2025-12-19 | Thinking and non-thinking | 0<Token≤32K | $0.143 | $1.434 |
32K<Token≤128K | $0.215 | $2.15 | ||
128K<Token≤256K | $0.43 | $4.301 | ||
qwen3-vl-plus-2025-09-23 | Thinking and non-thinking | 0<Token≤32K | $0.143 | $1.434 |
32K<Token≤128K | $0.215 | $2.15 | ||
128K<Token≤256K | $0.43 | $4.301 | ||
qwen3-vl-flash | Thinking and non-thinking | 0<Token≤32K | $0.022 | $0.215 |
32K<Token≤128K | $0.043 | $0.43 | ||
128K<Token≤256K | $0.086 | $0.859 | ||
qwen3-vl-flash-2025-10-15 | Thinking and non-thinking | 0<Token≤32K | $0.022 | $0.215 |
32K<Token≤128K | $0.043 | $0.43 | ||
128K<Token≤256K | $0.086 | $0.859 |
More models
Model | Input tokens per request | Input price (Million tokens) | Output price (Million tokens) |
qwen-vl-max | No tiered pricing | $0.23 | $0.574 |
qwen-vl-max-latest | No tiered pricing | $0.23 | $0.574 |
qwen-vl-max-2025-08-13 | No tiered pricing | $0.23 | $0.574 |
qwen-vl-max-2025-04-08 | No tiered pricing | $0.431 | $1.291 |
qwen-vl-max-2025-04-02 | No tiered pricing | $0.431 | $1.291 |
qwen-vl-max-2025-01-25 | No tiered pricing | $0.431 | $1.291 |
qwen-vl-max-2024-12-30 | No tiered pricing | $0.431 | $1.291 |
qwen-vl-max-2024-11-19 | No tiered pricing | $0.431 | $1.291 |
qwen-vl-max-2024-10-30 | No tiered pricing | $2.868 | $2.868 |
qwen-vl-max-2024-08-09 | No tiered pricing | $2.868 | $2.868 |
qwen-vl-plus | No tiered pricing | $0.115 | $0.287 |
qwen-vl-plus-latest | No tiered pricing | $0.115 | $0.287 |
qwen-vl-plus-2025-08-15 | No tiered pricing | $0.115 | $0.287 |
qwen-vl-plus-2025-07-10 | No tiered pricing | $0.022 | $0.216 |
qwen-vl-plus-2025-05-07 | No tiered pricing | $0.216 | $0.646 |
qwen-vl-plus-2025-01-25 | No tiered pricing | $0.216 | $0.646 |
qwen-vl-plus-2025-01-02 | No tiered pricing | $0.216 | $0.646 |
qwen-vl-plus-2024-08-09 | No tiered pricing | $0.216 | $0.646 |
Qwen-OCR
Billing is based on the number of input and output tokens.
Global
In the Global deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are dynamically scheduled worldwide.
Models in the Global deployment mode do not have a free quota.
Model | Input price (Million tokens) | Output price (Million tokens) |
qwen-vl-ocr | $0.07 | $0.16 |
qwen-vl-ocr-2025-11-20 | $0.07 | $0.16 |
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Input price (Million tokens) | Output price (Million tokens) | Free quota (Note) |
qwen-vl-ocr | $0.72 | $0.72 | 1 million tokens each |
qwen-vl-ocr-2025-11-20 | $0.07 | $0.16 |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Models in the Mainland China deployment mode do not have a free quota.
Model | Input price (Million tokens) | Output price (Million tokens) |
qwen-vl-ocr | $0.717 | $0.717 |
qwen-vl-ocr-latest | $0.043 | $0.072 |
qwen-vl-ocr-2025-11-20 | ||
qwen-vl-ocr-2025-04-13 | $0.717 | $0.717 |
qwen-vl-ocr-2024-10-28 |
Qwen-Math
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Billing is based on the number of input and output tokens.
Model | Input price (Million tokens) | Output price (Million tokens) | Free quota (Note) |
qwen-math-plus | $0.574 | $1.721 | No free quota |
qwen-math-plus-latest | $0.574 | $1.721 | |
qwen-math-plus-2024-09-19 | $0.574 | $1.721 | |
qwen-math-plus-2024-08-16 | $0.574 | $1.721 | |
qwen-math-turbo | $0.287 | $0.861 | |
qwen-math-turbo-latest | $0.287 | $0.861 | |
qwen-math-turbo-2024-09-19 | $0.287 | $0.861 |
Qwen-Coder
Billing is based on the number of input and output tokens.
If the model supports context cache, only input tokens are eligible for a discount.
Global
In the Global deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are dynamically scheduled worldwide.
Models in the Global deployment mode do not have a free quota.
Model | Input tokens per request | Input price (Million tokens) | Output price (Million tokens) |
qwen3-coder-plus (context cache discount applicable) | 0<Token≤32K | $1 | $5 |
32K<Token≤128K | $1.8 | $9 | |
128K<Token≤256K | $3 | $15 | |
256K<Token≤1M | $6 | $60 | |
qwen3-coder-plus-2025-09-23 | 0<Token≤32K | $1 | $5 |
32K<Token≤128K | $1.8 | $9 | |
128K<Token≤256K | $3 | $15 | |
256K<Token≤1M | $6 | $60 | |
qwen3-coder-plus-2025-07-22 | 0<Token≤32K | $1 | $5 |
32K<Token≤128K | $1.8 | $9 | |
128K<Token≤256K | $3 | $15 | |
256K<Token≤1M | $6 | $60 | |
qwen3-coder-flash (context cache discount applicable) | 0<Token≤32K | $0.3 | $1.5 |
32K<Token≤128K | $0.5 | $2.5 | |
128K<Token≤256K | $0.8 | $4 | |
256K<Token≤1M | $1.6 | $9.6 | |
qwen3-coder-flash-2025-07-28 | 0<Token≤32K | $0.3 | $1.5 |
32K<Token≤128K | $0.5 | $2.5 | |
128K<Token≤256K | $0.8 | $4 | |
256K<Token≤1M | $1.6 | $9.6 |
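One consequence of tiered pricing is a jump in cost at each tier boundary, most visibly at 256K for qwen3-coder-plus, where the output price goes from $15 to $60 per million tokens. The sketch below looks up the tier with `bisect` and compares a request just below and just above that boundary. It assumes "K" means 1,000 tokens, "1M" means 1,000,000 tokens, and that the tier chosen by input size prices the whole request; none of this is confirmed by the source, and the names `coder_plus_prices` and `coder_plus_cost` are hypothetical.

```python
from bisect import bisect_left

# qwen3-coder-plus, Global deployment mode (values from the table above).
# Assumption: K = 1,000 tokens and 1M = 1,000,000 tokens.
UPPER_BOUNDS = [32_000, 128_000, 256_000, 1_000_000]
PRICES = [(1.0, 5.0), (1.8, 9.0), (3.0, 15.0), (6.0, 60.0)]  # (input, output) USD/M


def coder_plus_prices(input_tokens):
    """Return (input_price, output_price) for the tier containing input_tokens."""
    if not 0 < input_tokens <= UPPER_BOUNDS[-1]:
        raise ValueError("input_tokens outside the listed tiers")
    # bisect_left maps a value equal to an upper bound into that tier,
    # which matches the inclusive "≤" bounds in the table.
    return PRICES[bisect_left(UPPER_BOUNDS, input_tokens)]


def coder_plus_cost(input_tokens, output_tokens):
    input_price, output_price = coder_plus_prices(input_tokens)
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


below = coder_plus_cost(256_000, 10_000)  # last request in the third tier
above = coder_plus_cost(256_001, 10_000)  # first request in the fourth tier
print(f"below: ${below:.3f}, above: ${above:.3f}")
```

Under these assumptions, keeping a prompt at or below a tier boundary can cost noticeably less than exceeding it by a single token.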
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Input tokens per request | Input price (Million tokens) | Output price (Million tokens) | Free quota (Note) |
qwen3-coder-plus (context cache discount applicable) | 0<Token≤32K | $1 | $5 | 1 million tokens each |
32K<Token≤128K | $1.8 | $9 | ||
128K<Token≤256K | $3 | $15 | ||
256K<Token≤1M | $6 | $60 | ||
qwen3-coder-plus-2025-09-23 | 0<Token≤32K | $1 | $5 | |
32K<Token≤128K | $1.8 | $9 | ||
128K<Token≤256K | $3 | $15 | ||
256K<Token≤1M | $6 | $60 | ||
qwen3-coder-plus-2025-07-22 | 0<Token≤32K | $1 | $5 | |
32K<Token≤128K | $1.8 | $9 | ||
128K<Token≤256K | $3 | $15 | ||
256K<Token≤1M | $6 | $60 | ||
qwen3-coder-flash | 0<Token≤32K | $0.3 | $1.5 | |
32K<Token≤128K | $0.5 | $2.5 | ||
128K<Token≤256K | $0.8 | $4 | ||
256K<Token≤1M | $1.6 | $9.6 | ||
qwen3-coder-flash-2025-07-28 | 0<Token≤32K | $0.3 | $1.5 | |
32K<Token≤128K | $0.5 | $2.5 | ||
128K<Token≤256K | $0.8 | $4 | ||
256K<Token≤1M | $1.6 | $9.6 |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Models in the Mainland China deployment mode do not have a free quota.
qwen3-coder series
Model | Input tokens per request | Input price (Million tokens) | Output price (Million tokens) |
qwen3-coder-plus (context cache discount applicable) | 0<Token≤32K | $0.574 | $2.294 |
32K<Token≤128K | $0.861 | $3.441 | |
128K<Token≤256K | $1.434 | $5.735 | |
256K<Token≤1M | $2.868 | $28.671 | |
qwen3-coder-plus-2025-09-23 | 0<Token≤32K | $0.574 | $2.294 |
32K<Token≤128K | $0.861 | $3.441 | |
128K<Token≤256K | $1.434 | $5.735 | |
256K<Token≤1M | $2.868 | $28.671 | |
qwen3-coder-plus-2025-07-22 | 0<Token≤32K | $0.574 | $2.294 |
32K<Token≤128K | $0.861 | $3.441 | |
128K<Token≤256K | $1.434 | $5.735 | |
256K<Token≤1M | $2.868 | $28.671 | |
qwen3-coder-flash | 0<Token≤32K | $0.144 | $0.574 |
32K<Token≤128K | $0.216 | $0.861 | |
128K<Token≤256K | $0.359 | $1.434 | |
256K<Token≤1M | $0.717 | $3.584 | |
qwen3-coder-flash-2025-07-28 | 0<Token≤32K | $0.144 | $0.574 |
32K<Token≤128K | $0.216 | $0.861 | |
128K<Token≤256K | $0.359 | $1.434 | |
256K<Token≤1M | $0.717 | $3.584 |
Earlier qwen-coder series
Model | Input tokens per request | Input price (Million tokens) | Output price (Million tokens) |
qwen-coder-plus | No tiered pricing | $0.502 | $1.004 |
qwen-coder-plus-latest | No tiered pricing | $0.502 | $1.004 |
qwen-coder-plus-2024-11-06 | No tiered pricing | $0.502 | $1.004 |
qwen-coder-turbo | No tiered pricing | $0.287 | $0.861 |
qwen-coder-turbo-latest | No tiered pricing | $0.287 | $0.861 |
qwen-coder-turbo-2024-09-19 | No tiered pricing | $0.287 | $0.861 |
Qwen-MT
Billing is based on the number of input and output tokens.
Global
In the Global deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are dynamically scheduled worldwide.
Models in the Global deployment mode do not have a free quota.
Model | Input price (Million tokens) | Output price (Million tokens) |
qwen-mt-plus | $2.46 | $7.37 |
qwen-mt-flash | $0.16 | $0.49 |
qwen-mt-lite | $0.12 | $0.36 |
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Input price (Million tokens) | Output price (Million tokens) | Free quota (Note) |
qwen-mt-plus | $2.46 | $7.37 | 1 million tokens each |
qwen-mt-flash | $0.16 | $0.49 | |
qwen-mt-lite | $0.12 | $0.36 | |
qwen-mt-turbo | $0.16 | $0.49 |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Models in Mainland China deployment mode do not have a free quota.
Model | Input price (Million tokens) | Output price (Million tokens) |
qwen-mt-plus | $0.259 | $0.775 |
qwen-mt-flash | $0.101 | $0.280 |
qwen-mt-lite | $0.086 | $0.229 |
qwen-mt-turbo | $0.101 | $0.280 |
Qwen data mining
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Billing is based on the number of input and output tokens.
Model | Input price (Million tokens) | Output price (Million tokens) | Free quota (Note) |
qwen-doc-turbo | $0.087 | $0.144 | No free quota |
Qwen deep research
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Billing is based on the number of input and output tokens.
Model | Input price (Million tokens) | Output price (Million tokens) | Free quota (Note) |
qwen-deep-research | $7.742 | $23.367 | No free quota |
Text generation - Qwen - Open source
Qwen3
Billing is based on the number of input and output tokens.
Global
In the Global deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are dynamically scheduled worldwide.
Models in the Global deployment mode do not have a free quota.
Model | Mode | Input price (Million tokens) | Output price (Million tokens) | |
Non-thinking mode | Thinking mode (CoT + response) | |||
qwen3-next-80b-a3b-thinking | Thinking only | $0.15 | - | $1.2 |
qwen3-next-80b-a3b-instruct | Non-thinking only | $0.15 | $1.2 | - |
qwen3-235b-a22b-thinking-2507 | Thinking only | $0.23 | - | $2.3 |
qwen3-235b-a22b-instruct-2507 | Non-thinking only | $0.23 | $0.92 | - |
qwen3-30b-a3b-thinking-2507 | Thinking only | $0.2 | - | $2.4 |
qwen3-30b-a3b-instruct-2507 | Non-thinking only | $0.2 | $0.8 | - |
qwen3-235b-a22b | Thinking and non-thinking | $0.7 | $2.8 | $8.4 |
qwen3-32b | Thinking and non-thinking | $0.16 | $0.64 | $0.64 |
qwen3-30b-a3b | Thinking and non-thinking | $0.2 | $0.8 | $2.4 |
qwen3-14b | Thinking and non-thinking | $0.35 | $1.4 | $4.2 |
qwen3-8b | Thinking and non-thinking | $0.18 | $0.7 | $2.1 |
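For models that support both modes, the output price depends on whether the request runs in thinking mode, where the billed output covers the chain of thought (CoT) plus the response. The sketch below illustrates the difference for two rows of the table above. The function name `request_cost` and the dictionary layout are hypothetical, and the token counts are illustrative only.

```python
from decimal import Decimal

MILLION = Decimal(1_000_000)

# Global prices in USD per million tokens, copied from the table above.
# None marks a mode the model does not support ("-" in the table).
QWEN3_GLOBAL_PRICES = {
    "qwen3-235b-a22b": {
        "input": Decimal("0.7"),
        "non_thinking": Decimal("2.8"),
        "thinking": Decimal("8.4"),
    },
    "qwen3-next-80b-a3b-instruct": {
        "input": Decimal("0.15"),
        "non_thinking": Decimal("1.2"),
        "thinking": None,
    },
}


def request_cost(model: str, input_tokens: int, output_tokens: int,
                 thinking: bool) -> Decimal:
    """Estimate the USD cost of one request.

    In thinking mode, output_tokens is expected to include CoT and response tokens.
    """
    prices = QWEN3_GLOBAL_PRICES[model]
    output_price = prices["thinking" if thinking else "non_thinking"]
    if output_price is None:
        raise ValueError(f"{model} does not support the requested mode")
    return (input_tokens * prices["input"] + output_tokens * output_price) / MILLION


# Same token counts, different modes.
print(request_cost("qwen3-235b-a22b", 10_000, 2_000, thinking=False))
print(request_cost("qwen3-235b-a22b", 10_000, 2_000, thinking=True))
```

In practice a thinking-mode request usually also produces more output tokens, so the real gap between the two modes tends to be larger than the price ratio alone suggests.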
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Mode | Input price (Million tokens) | Output price (Million tokens) | Free quota (Note) | |
Non-thinking mode | Thinking mode | ||||
qwen3-next-80b-a3b-thinking | Thinking only | $0.15 | - | $1.2 | 1 million tokens each |
qwen3-next-80b-a3b-instruct | Non-thinking only | $0.15 | $1.2 | - | |
qwen3-235b-a22b-thinking-2507 | Thinking only | $0.23 | - | $2.3 | |
qwen3-235b-a22b-instruct-2507 | Non-thinking only | $0.23 | $0.92 | - | |
qwen3-30b-a3b-thinking-2507 | Thinking only | $0.2 | - | $2.4 | |
qwen3-30b-a3b-instruct-2507 | Non-thinking only | $0.2 | $0.8 | - | |
qwen3-235b-a22b | Thinking and non-thinking | $0.7 | $2.8 | $8.4 | |
qwen3-32b | Thinking and non-thinking | $0.16 | $0.64 | $0.64 | |
qwen3-30b-a3b | Thinking and non-thinking | $0.2 | $0.8 | $2.4 | |
qwen3-14b | Thinking and non-thinking | $0.35 | $1.4 | $4.2 | |
qwen3-8b | Thinking and non-thinking | $0.18 | $0.7 | $2.1 | |
qwen3-4b | Thinking and non-thinking | $0.11 | $0.42 | $1.26 | |
qwen3-1.7b | Thinking and non-thinking | $0.11 | $0.42 | $1.26 | |
qwen3-0.6b | Thinking and non-thinking | $0.11 | $0.42 | $1.26 | |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Models in Mainland China deployment mode do not have a free quota.
Model | Mode | Input price (Million tokens) | Output price (Million tokens) | |
Non-thinking mode | Thinking mode (CoT + response) | |||
qwen3-next-80b-a3b-thinking | Thinking only | $0.144 | - | $1.434 |
qwen3-next-80b-a3b-instruct | Non-thinking only | $0.144 | $0.574 | - |
qwen3-235b-a22b-thinking-2507 | Thinking only | $0.287 | - | $2.868 |
qwen3-235b-a22b-instruct-2507 | Non-thinking only | $0.287 | $1.147 | - |
qwen3-30b-a3b-thinking-2507 | Thinking only | $0.108 | - | $1.076 |
qwen3-30b-a3b-instruct-2507 | Non-thinking only | $0.108 | $0.431 | - |
qwen3-235b-a22b | Thinking and non-thinking | $0.287 | $1.147 | $2.868 |
qwen3-32b | Thinking and non-thinking | $0.287 | $1.147 | $2.868 |
qwen3-30b-a3b | Thinking and non-thinking | $0.108 | $0.431 | $1.076 |
qwen3-14b | Thinking and non-thinking | $0.144 | $0.574 | $1.434 |
qwen3-8b | Thinking and non-thinking | $0.072 | $0.287 | $0.717 |
qwen3-4b | Thinking and non-thinking | $0.044 | $0.173 | $0.431 |
qwen3-1.7b | Thinking and non-thinking | $0.044 | $0.173 | $0.431 |
qwen3-0.6b | Thinking and non-thinking | $0.044 | $0.173 | $0.431 |
QwQ - Open source
Billing is based on the number of input and output tokens.
Model | Input price (Million tokens) | Output price (Million tokens) | Free quota (Note) |
qwq-32b | $0.287 | $0.861 | No free quota |
QwQ-Preview
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Billing is based on the number of input and output tokens.
Model | Input price (Million tokens) | Output price (Million tokens) | Free quota (Note) |
qwq-32b-preview | $0.287 | $0.861 | No free quota |
Qwen2.5
Billing is based on the number of input and output tokens.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Input price (Million tokens) | Output price (Million tokens) | Free quota (Note) |
qwen2.5-14b-instruct-1m | $0.805 | $3.22 | 1 million tokens each |
qwen2.5-7b-instruct-1m | $0.368 | $1.47 | |
qwen2.5-72b-instruct | $1.4 | $5.6 | |
qwen2.5-32b-instruct | $0.7 | $2.8 | |
qwen2.5-14b-instruct | $0.35 | $1.4 | |
qwen2.5-7b-instruct | $0.175 | $0.7 |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Models in Mainland China deployment mode do not have a free quota.
Model | Input price (Million tokens) | Output price (Million tokens) |
qwen2.5-14b-instruct-1m | $0.144 | $0.431 |
qwen2.5-7b-instruct-1m | $0.072 | $0.144 |
qwen2.5-72b-instruct | $0.574 | $1.721 |
qwen2.5-32b-instruct | $0.287 | $0.861 |
qwen2.5-14b-instruct | $0.144 | $0.431 |
qwen2.5-7b-instruct | $0.072 | $0.144 |
qwen2.5-3b-instruct | $0.044 | $0.130 |
qwen2.5-1.5b-instruct | Limited time free | |
qwen2.5-0.5b-instruct | ||
QVQ
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Billing is based on the number of input and output tokens.
Model | Input price (Million tokens) | Output price (Million tokens) | Free quota (Note) |
qvq-72b-preview | $1.721 | $5.161 | No free quota |
Qwen-Omni
Billing rule: You are charged based on the number of input and output tokens. For token calculation rules for different modalities, see Billing and rate limiting.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Input price (Million tokens) | Output price (Million tokens) | Free quota (Note) | ||||
Input: Text | Input: Audio | Input: Image/Video | Output: Text Plain text input | Output: Text Multimodal input | Output: Text+Audio Only audio is billed | ||
qwen2.5-omni-7b | $0.10 | $6.76 | $0.28 | $0.40 | $0.84 | $13.51 | 1 million tokens (regardless of modality) Validity: 90 days after activating Model Studio |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Models in Mainland China deployment mode do not have a free quota.
Model | Input price (Million tokens) | Output price (Million tokens) | ||||
Input: Text | Input: Audio | Input: Image/Video | Output: Text Plain text input | Output: Text Multimodal input | Output: Text+Audio Only audio is billed | |
qwen2.5-omni-7b | $0.087 | $5.448 | $0.287 | $0.345 | $0.861 | $10.895 |
Qwen3-Omni-Captioner
Billing is based on the number of input and output tokens.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Input price (Million tokens) | Output price (Million tokens) | Free quota (Note) |
qwen3-omni-30b-a3b-captioner | $3.81 | $3.06 | 1 million tokens |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Models in Mainland China deployment mode do not have a free quota.
Model | Input price (Million tokens) | Output price (Million tokens) |
qwen3-omni-30b-a3b-captioner | $2.265 | $1.821 |
Qwen-VL
Billing is based on the number of input and output tokens.
Global
In the Global deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are dynamically scheduled worldwide.
Models in the Global deployment mode do not have a free quota.
Model | Mode | Input price (Million tokens) | Output price (Million tokens) CoT + response |
qwen3-vl-235b-a22b-thinking | Thinking only | $0.287 | $2.867 |
qwen3-vl-235b-a22b-instruct | Non-thinking only | $0.287 | $1.147 |
qwen3-vl-32b-thinking | Thinking only | $0.287 | $2.867 |
qwen3-vl-32b-instruct | Non-thinking only | $0.287 | $1.147 |
qwen3-vl-30b-a3b-thinking | Thinking only | $0.108 | $1.075 |
qwen3-vl-30b-a3b-instruct | Non-thinking only | $0.108 | $0.43 |
qwen3-vl-8b-thinking | Thinking only | $0.072 | $0.717 |
qwen3-vl-8b-instruct | Non-thinking only | $0.072 | $0.287 |
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Mode | Input price (Million tokens) | Output price (Million tokens) CoT + response | Free quota (Note) |
qwen3-vl-235b-a22b-thinking | Thinking only | $0.4 | $4 | 1 million tokens each |
qwen3-vl-235b-a22b-instruct | Non-thinking only | $0.4 | $1.6 | |
qwen3-vl-32b-thinking | Thinking only | $0.16 | $0.64 | |
qwen3-vl-32b-instruct | Non-thinking only | $0.16 | $0.64 | |
qwen3-vl-30b-a3b-thinking | Thinking only | $0.2 | $2.4 | |
qwen3-vl-30b-a3b-instruct | Non-thinking only | $0.2 | $0.8 | |
qwen3-vl-8b-thinking | Thinking only | $0.18 | $2.1 | |
qwen3-vl-8b-instruct | Non-thinking only | $0.18 | $0.7 |
More models
Model | Input price (Million tokens) | Output price (Million tokens) | Free quota (Note) |
qwen2.5-vl-72b-instruct | $2.8 | $8.4 | 1 million tokens each |
qwen2.5-vl-32b-instruct | $1.4 | $4.2 | |
qwen2.5-vl-7b-instruct | $0.35 | $1.05 | |
qwen2.5-vl-3b-instruct | $0.21 | $0.63 |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Models in Mainland China deployment mode do not have a free quota.
Model | Mode | Input price (Million tokens) | Output price (Million tokens) CoT + response |
qwen3-vl-235b-a22b-thinking | Thinking only | $0.287 | $2.868 |
qwen3-vl-235b-a22b-instruct | Non-thinking only | $0.287 | $1.147 |
qwen3-vl-32b-thinking | Thinking only | $0.287 | $2.868 |
qwen3-vl-32b-instruct | Non-thinking only | $0.287 | $1.147 |
qwen3-vl-30b-a3b-thinking | Thinking only | $0.108 | $1.076 |
qwen3-vl-30b-a3b-instruct | Non-thinking only | $0.108 | $0.431 |
qwen3-vl-8b-thinking | Thinking only | $0.072 | $0.717 |
qwen3-vl-8b-instruct | Non-thinking only | $0.072 | $0.287 |
More models
Model | Input price (Million tokens) | Output price (Million tokens) |
qwen2.5-vl-72b-instruct | $2.294 | $6.881 |
qwen2.5-vl-32b-instruct | $1.147 | $3.441 |
qwen2.5-vl-7b-instruct | $0.287 | $0.717 |
qwen2.5-vl-3b-instruct | $0.173 | $0.517 |
qwen2-vl-72b-instruct | $2.294 | $6.881 |
qwen2-vl-7b-instruct | Limited time free | |
qwen2-vl-2b-instruct | ||
Qwen-Math
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Billing is based on the number of input and output tokens.
Model | Input price (Million tokens) | Output price (Million tokens) | Free quota (Note) |
qwen2.5-math-72b-instruct | $0.574 | $1.721 | No free quota |
qwen2.5-math-7b-instruct | $0.144 | $0.287 | |
qwen2.5-math-1.5b-instruct | Limited time free | ||
Qwen-Coder
Billing is based on the number of input and output tokens.
Global
In the Global deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are dynamically scheduled worldwide.
Models in the Global deployment mode do not have a free quota.
Model | Input tokens per request | Input price (Million tokens) | Output price (Million tokens) |
qwen3-coder-480b-a35b-instruct | 0<Token≤32K | $1.5 | $7.5 |
32K<Token≤128K | $2.7 | $13.5 | |
128K<Token≤200K | $4.5 | $22.5 | |
qwen3-coder-30b-a3b-instruct | 0<Token≤32K | $0.45 | $2.25 |
32K<Token≤128K | $0.75 | $3.75 | |
128K<Token≤200K | $1.2 | $6 |
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Input tokens per request | Input price (Million tokens) | Output price (Million tokens) | Free quota (Note) |
qwen3-coder-480b-a35b-instruct | 0<Token≤32K | $1.5 | $7.5 | 1 million tokens each |
32K<Token≤128K | $2.7 | $13.5 | ||
128K<Token≤200K | $4.5 | $22.5 | ||
qwen3-coder-30b-a3b-instruct | 0<Token≤32K | $0.45 | $2.25 | |
32K<Token≤128K | $0.75 | $3.75 | ||
128K<Token≤200K | $1.2 | $6 |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Models in Mainland China deployment mode do not have a free quota.
Model | Input tokens per request | Input price (Million tokens) | Output price (Million tokens) |
qwen3-coder-480b-a35b-instruct | 0<Token≤32K | $0.861 | $3.441 |
32K<Token≤128K | $1.291 | $5.161 | |
128K<Token≤200K | $2.151 | $8.602 | |
qwen3-coder-30b-a3b-instruct | 0<Token≤32K | $0.216 | $0.861 |
32K<Token≤128K | $0.323 | $1.291 | |
128K<Token≤200K | $0.538 | $2.151 | |
qwen2.5-coder-32b-instruct | No tiered pricing | $0.287 | $0.861 |
qwen2.5-coder-14b-instruct | No tiered pricing | $0.287 | $0.861 |
qwen2.5-coder-7b-instruct | No tiered pricing | $0.144 | $0.287 |
qwen2.5-coder-3b-instruct | No tiered pricing | Limited time free | |
qwen2.5-coder-1.5b-instruct | No tiered pricing | ||
qwen2.5-coder-0.5b-instruct | No tiered pricing | ||
Text generation - Third party
DeepSeek
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Billing is based on the number of input and output tokens.
Model | Input price (Million tokens) | Output price (Million tokens) | Free quota (Note) |
deepseek-v3.2 | $0.287 | $0.431 | No free quota |
deepseek-v3.2-exp | $0.287 | $0.431 | |
deepseek-v3.1 | $0.574 | $1.721 | |
deepseek-r1 | $0.574 | $2.294 | |
deepseek-r1-0528 | $0.574 | $2.294 | |
deepseek-v3 | $0.287 | $1.147 | |
deepseek-r1-distill-qwen-1.5b | Limited time free | ||
deepseek-r1-distill-qwen-7b | $0.072 | $0.144 | No free quota |
deepseek-r1-distill-qwen-14b | $0.144 | $0.431 | |
deepseek-r1-distill-qwen-32b | $0.287 | $0.861 | |
deepseek-r1-distill-llama-8b | Limited time free | ||
deepseek-r1-distill-llama-70b | Limited time free | ||
Kimi
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Billing is based on the number of input and output tokens.
Model | Input price (Million tokens) | Output price (Million tokens) | Free quota (Note) |
kimi-k2-thinking | $0.574 | $2.294 | No free quota |
Moonshot-Kimi-K2-Instruct | $0.574 | $2.294 |
Image generation
Inputs are not billed. Billing is based on the number of successfully generated images in the output.
Billing formula: Fee = Unit price per image × Number of successfully generated images.
Billing details:
The fee is not affected by the resolution or aspect ratio of the output images.
Failed requests do not incur fees or consume the free quota.
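The billing formula above can be expressed as a short function. This is a minimal sketch: the function name `image_fee` and the use of a list of success flags to represent a batch of generation results are assumptions made for illustration, and free quota is not modeled.

```python
from decimal import Decimal


def image_fee(unit_price_per_image: Decimal, results: list[bool]) -> Decimal:
    """Fee = unit price per image x number of successfully generated images.

    results holds one flag per requested image: True if it was generated,
    False if generation failed. Failed images are not billed.
    """
    successful = sum(1 for ok in results if ok)
    return unit_price_per_image * successful


# Four images requested at $0.03/image, one of which failed.
fee = image_fee(Decimal("0.03"), [True, True, False, True])
print(fee)  # 0.09
```

Resolution and aspect ratio do not appear as parameters because, as stated above, they do not affect the fee.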
Qwen-Image
Only output is billed. For rules, see Image generation.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Output price | Free quota (Note) |
qwen-image-max | $0.075/image | 100 images each |
qwen-image-max-2025-12-30 | $0.075/image | |
qwen-image-plus | $0.03/image | |
qwen-image | $0.035/image |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Models in Mainland China deployment mode do not have a free quota.
Model | Output price |
qwen-image-max | $0.071677/image |
qwen-image-max-2025-12-30 | $0.071677/image |
qwen-image-plus | $0.028671/image |
qwen-image | $0.035/image |
Qwen-Image-Edit
Only output is billed. For rules, see Image generation.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Output price | Free quota (Note) |
qwen-image-edit-plus | $0.03/image | 100 images each |
qwen-image-edit-plus-2025-12-15 | $0.03/image | |
qwen-image-edit-plus-2025-10-30 | $0.03/image | |
qwen-image-edit | $0.045/image |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Models in Mainland China deployment mode do not have a free quota.
Model | Output price |
qwen-image-edit-plus | $0.028671/image |
qwen-image-edit-plus-2025-12-15 | $0.028671/image |
qwen-image-edit-plus-2025-10-30 | $0.028671/image |
qwen-image-edit | $0.043/image |
Qwen-MT-Image
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Only output is billed. For rules, see Image generation.
Model | Output price | Free quota (Note) |
qwen-mt-image | $0.000431/image | No free quota |
Tongyi - text-to-image - Z-Image
Only output is billed. For rules, see Image generation.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Output price | Free quota (Note) |
z-image-turbo | Prompt rewriting disabled ( Prompt rewriting enabled ( | 100 images Validity: 90 days after activating Model Studio |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Models in Mainland China deployment mode do not have a free quota.
Model | Output price |
z-image-turbo | Prompt rewriting disabled ( Prompt rewriting enabled ( |
Wan text-to-image
Only output is billed. For rules, see Image generation.
Global
In the Global deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are dynamically scheduled worldwide.
Models in the Global deployment mode do not have a free quota.
Model | Output price |
wan2.6-t2i | $0.03/image |
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Output price | Free quota (Note) |
wan2.6-t2i | $0.03/image | 50 images |
wan2.5-t2i-preview | $0.03/image | 50 images |
wan2.2-t2i-plus | $0.05/image | 100 images |
wan2.2-t2i-flash | $0.025/image | 100 images |
wan2.1-t2i-plus | $0.05/image | 200 images |
wan2.1-t2i-turbo | $0.025/image | 200 images |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Models in Mainland China deployment mode do not have a free quota.
Model | Output price |
wan2.6-t2i | $0.028671/image |
wan2.5-t2i-preview | $0.028671/image |
wan2.2-t2i-plus | $0.020070/image |
wan2.2-t2i-flash | $0.028671/image |
wanx2.1-t2i-plus | $0.028671/image |
wanx2.1-t2i-turbo | $0.020070/image |
wanx2.0-t2i-turbo | $0.005735/image |
Wan image generation and editing
Only output is billed. For rules, see Image generation.
Global
Global (Virginia) models do not offer a free quota.
Model | Output price |
wan2.6-image | $0.03/image |
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Output price | Free quota (Note) |
wan2.6-image | $0.03/image | 50 images |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Models in Mainland China deployment mode do not have a free quota.
Model | Output price |
wan2.6-image | $0.028671/image |
Wan general image editing
Only output is billed. For rules, see Image generation.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
Service | Model | Output price | Free quota (Note) |
General image editing 2.5 | wan2.5-i2i-preview | $0.03/image | 50 images |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Models in Mainland China deployment mode do not have a free quota.
Service | Model | Output price |
General image editing 2.5 | wan2.5-i2i-preview | $0.028671/image |
General image editing 2.1 | wanx2.1-imageedit | $0.020070/image |
OutfitAnyone
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
aitryon-plus: Input is not billed; only output is billed. For billing rules, see Image generation.
aitryon-parsing-v1: Input is billed; output is not billed. Billing is based on the number of input images. Failed requests are not billed.
Service | Model | Price | Free quota (Note) |
OutfitAnyone - Plus | aitryon-plus | $0.071677/image | No free quota |
OutfitAnyone - Image parsing | aitryon-parsing-v1 | $0.000574/image |
Video generation
Inputs are not billed. Billing is based on the duration in seconds of successfully generated videos in the output.
Billing formula: Fee = Unit price per second × Duration of successfully generated video (in seconds).
Billing details:
For some models, the price is based on the output video resolution. The prices for different resolutions, such as 480P, 720P, and 1080P, vary.
For some models, the price is based on the output video mode. The prices for different video modes, such as Standard Edition and Professional Edition, vary.
For some models, the price is based on the output video aspect ratio. The prices for different video aspect ratios, such as 1:1 and 3:4, vary.
Some models use uniform pricing, regardless of resolution, mode, or aspect ratio.
Failed requests do not incur fees or consume the free quota.
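As a rough illustration of duration-based billing with resolution-dependent prices, the sketch below uses the wan2.6-t2v Global per-second prices listed in this document. The function name `video_fee` and the `succeeded` flag are hypothetical, and free quota is not modeled.

```python
from decimal import Decimal

# USD per second of output video for wan2.6-t2v (Global deployment mode),
# taken from the pricing table in this document.
WAN26_T2V_GLOBAL = {"720P": Decimal("0.1"), "1080P": Decimal("0.15")}


def video_fee(resolution: str, seconds: int, succeeded: bool = True,
              prices=WAN26_T2V_GLOBAL) -> Decimal:
    """Fee = per-second price for the output resolution x seconds generated."""
    if not succeeded:
        # Failed requests do not incur fees.
        return Decimal("0")
    return prices[resolution] * seconds


print(video_fee("720P", 5))    # 0.5
print(video_fee("1080P", 10))  # 1.50
```

Models priced by video mode or aspect ratio would need a different lookup key, but the multiplication by duration stays the same.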
Wan - text-to-video
Only output is billed. For rules, see Video generation.
Global
In the Global deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are dynamically scheduled worldwide.
Models in the Global deployment mode do not have a free quota.
Model | Output video resolution | Output price |
wan2.6-t2v | 720P | $0.1/second |
1080P | $0.15/second |
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Output video resolution | Output price | Free quota (Note) Validity: 90 days after activating Model Studio |
wan2.6-t2v | 720P | $0.10/second | 50 seconds |
1080P | $0.15/second | ||
wan2.5-t2v-preview | 480P | $0.05/second | 50 seconds |
720P | $0.10/second | ||
1080P | $0.15/second | ||
wan2.2-t2v-plus | 480P | $0.02/second | 50 seconds |
1080P | $0.10/second | ||
wan2.1-t2v-turbo | 480P | $0.036/second | 200 seconds |
720P | $0.036/second | ||
wan2.1-t2v-plus | 720P | $0.10/second | 200 seconds |
US
In the US deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are limited to the United States.
Models in the US deployment mode do not have a free quota.
Model | Output video resolution | Output price |
wan2.6-t2v-us | 720P | $0.1/second |
1080P | $0.15/second |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Models in Mainland China deployment mode do not have a free quota.
Model | Output video resolution | Output price |
wan2.6-t2v | 720P | $0.086012/second |
1080P | $0.143353/second | |
wan2.5-t2v-preview | 480P | $0.043006/second |
720P | $0.086012/second | |
1080P | $0.143353/second | |
wan2.2-t2v-plus | 480P | $0.02007/second |
1080P | $0.100347/second | |
wanx2.1-t2v-turbo | 480P | $0.034405/second |
720P | $0.034405/second | |
wanx2.1-t2v-plus | 720P | $0.100347/second |
Wan - image-to-video - first frame
Only output is billed. For rules, see Video generation.
Global
In the Global deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are dynamically scheduled worldwide.
Models in the Global deployment mode do not have a free quota.
Model | Output video resolution | Output price |
wan2.6-i2v | 720P | $0.1/second |
1080P | $0.15/second |
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Output video resolution | Output price | Free quota (Note) Validity: 90 days after activating Model Studio |
wan2.6-i2v | 720P | $0.10/second | 50 seconds |
1080P | $0.15/second | ||
wan2.5-i2v-preview | 480P | $0.05/second | 50 seconds |
720P | $0.10/second | ||
1080P | $0.15/second | ||
wan2.2-i2v-flash | 480P | $0.015/second | 50 seconds |
720P | $0.036/second | ||
wan2.2-i2v-plus | 480P | $0.02/second | 50 seconds |
1080P | $0.10/second | ||
wan2.1-i2v-turbo | 480P | $0.036/second | 200 seconds |
720P | $0.036/second | ||
wan2.1-i2v-plus | 720P | $0.10/second | 200 seconds |
US
In the US deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are limited to the United States.
Models in the US deployment mode do not have a free quota.
Model | Output video resolution | Output price |
wan2.6-i2v-us | 720P | $0.1/second |
1080P | $0.15/second |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Models in Mainland China deployment mode do not have a free quota.
Model | Output video resolution | Output price |
wan2.6-i2v | 720P | $0.086012/second |
1080P | $0.143353/second | |
wan2.5-i2v-preview | 480P | $0.043006/second |
720P | $0.086012/second | |
1080P | $0.143353/second | |
wan2.2-i2v-plus | 480P | $0.02007/second |
1080P | $0.100347/second | |
wanx2.1-i2v-turbo | 480P | $0.034405/second |
720P | $0.034405/second | |
wanx2.1-i2v-plus | 720P | $0.100347/second |
Wan - image-to-video - first and last frames
Only output is billed. For rules, see Video generation.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Output video resolution | Output price | Free quota (Note) Validity: 90 days after activating Model Studio |
wan2.2-kf2v-flash | 480P | $0.015/second | 50 seconds |
720P | $0.036/second | ||
1080P | $0.07/second | ||
wan2.1-kf2v-plus | 720P | $0.10/second | 200 seconds |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Models in Mainland China deployment mode do not have a free quota.
Model | Output video resolution | Output price |
wan2.2-kf2v-flash | 480P | $0.014335/second |
720P | $0.028671/second | |
1080P | $0.068809/second | |
wanx2.1-kf2v-plus | 720P | $0.100347/second |
Wan - reference-to-video
Billing rule: Both the input video and the output video are billed by duration in seconds. Failed generations are not billed and do not consume the free quota.
The input video is billed for no more than 5 seconds. For specific rules, see Billing and rate limiting.
The output video is billed based on seconds of successfully generated video.
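The two rules above combine into a single estimate. The sketch below assumes that the billed input duration is simply the smaller of the actual input duration and 5 seconds; the exact rules are in Billing and rate limiting and may differ. The function name `r2v_fee` is hypothetical, and the prices are the wan2.6-r2v Global prices listed in this document.

```python
from decimal import Decimal

# Assumption: input duration beyond this cap is not billed.
MAX_BILLED_INPUT_SECONDS = 5

# (input USD/second, output USD/second) for wan2.6-r2v, Global deployment mode.
WAN26_R2V_GLOBAL = {
    "720P": (Decimal("0.1"), Decimal("0.1")),
    "1080P": (Decimal("0.15"), Decimal("0.15")),
}


def r2v_fee(resolution: str, input_seconds: int, output_seconds: int,
            succeeded: bool = True) -> Decimal:
    """Estimate the fee for one reference-to-video generation."""
    if not succeeded:
        # Failed generations are not billed.
        return Decimal("0")
    input_price, output_price = WAN26_R2V_GLOBAL[resolution]
    billed_input = min(input_seconds, MAX_BILLED_INPUT_SECONDS)
    return input_price * billed_input + output_price * output_seconds


# An 8-second reference video is billed as 5 seconds of input.
print(r2v_fee("720P", 8, 10))  # 1.5
```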
Global
In the Global deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are dynamically scheduled worldwide.
Models in the Global deployment mode do not have a free quota.
Model | Video resolution | Input price | Output price |
wan2.6-r2v | 720P | $0.1/second | $0.1/second |
1080P | $0.15/second | $0.15/second |
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Video resolution | Input price | Output price | Free quota (Note) Validity: 90 days after activating Model Studio |
wan2.6-r2v | 720P | $0.10/second | $0.10/second | 50 seconds |
1080P | $0.15/second | $0.15/second |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Models in Mainland China deployment mode do not have a free quota.
Model | Video resolution | Input price | Output price |
wan2.6-r2v | 720P | $0.086012/second | $0.086012/second |
1080P | $0.143353/second | $0.143353/second |
Wan - general video editing
Only output is billed. For rules, see Video generation.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Output video resolution | Output price | Free quota (Note) |
wan2.1-vace-plus | 720P | $0.10/second | 50 seconds Validity: 90 days after activating Model Studio |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Models in Mainland China deployment mode do not have a free quota.
Model | Output video resolution | Output price |
wanx2.1-vace-plus | 720P | $0.100347/second |
Wan - digital human
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
wan2.2-s2v-detect: Input is charged; output is not. You are charged based on the number of detected images. Each input image is charged once, regardless of whether the detection is successful.
wan2.2-s2v: Output is charged; input is not. You are charged based on the duration of the successfully generated video in seconds. For billing rules, see Video generation.
Service | Model | Price | Free quota (Note) |
Image detection | wan2.2-s2v-detect | Input image: $0.000574/image | No free quota |
Video generation | wan2.2-s2v | Output video:
|
Wan - image-to-action
Only output is billed. For rules, see Video generation.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Output video mode | Output price | Free quota (Note) |
wan2.2-animate-move | Standard mode | $0.12/second | 50 seconds Validity: 90 days after activating Model Studio |
Professional mode | $0.18/second |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Models in Mainland China deployment mode do not have a free quota.
Model | Output video mode | Output price |
wan2.2-animate-move | Standard mode | $0.06/second |
Professional mode | $0.09/second |
Wan - video character swap
Only output is billed. For rules, see Video generation.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Output video mode | Output price | Free quota (Note) |
wan2.2-animate-mix | Standard mode | $0.18/second | 50 seconds Validity: 90 days after activating Model Studio |
Professional mode | $0.26/second |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Models in Mainland China deployment mode do not have a free quota.
Model | Output video mode | Output price |
wan2.2-animate-mix | Standard mode | $0.09/second |
Professional mode | $0.13/second |
AnimateAnyone
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
animate-anyone-detect-gen2: Input is charged; output is not. You are charged based on the number of detected images. Each input image is charged once, regardless of whether the detection is successful.
animate-anyone-template-gen2: Output is charged; input is not. You are charged based on the duration of the successfully generated video in seconds. For billing rules, see Video generation.
animate-anyone-gen2: Output is charged; input is not. You are charged based on the duration of the successfully generated video in seconds. For billing rules, see Video generation.
Service | Model | Price | Free quota (Note) |
Image detection | animate-anyone-detect-gen2 | Input image: $0.000574/image | No free quota |
Action template generation | animate-anyone-template-gen2 | Output video: $0.011469/second | |
Video generation | animate-anyone-gen2 | Output video: $0.011469/second |
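As a rough illustration of how the two billing bases combine, the sketch below adds a per-image detection charge to a per-second generation charge, using the AnimateAnyone prices from the table above. The function name and parameters are hypothetical, and rounding of video durations is not specified here, so the durations are used as given.

```python
from decimal import Decimal

# Prices from the AnimateAnyone table above (Mainland China deployment mode).
DETECT_PRICE_PER_IMAGE = Decimal("0.000574")
VIDEO_PRICE_PER_SECOND = Decimal("0.011469")


def animate_anyone_cost(detected_images, generated_seconds):
    """Estimate a combined detection plus generation charge.

    detected_images: number of images submitted for detection; each one is
        charged whether or not detection succeeds.
    generated_seconds: durations, in seconds, of the videos that were
        generated successfully; failed generations are left out.
    """
    detect_cost = DETECT_PRICE_PER_IMAGE * detected_images
    total_seconds = sum(Decimal(str(s)) for s in generated_seconds)
    video_cost = VIDEO_PRICE_PER_SECOND * total_seconds
    return detect_cost + video_cost
```

The same shape applies to the other detect-then-generate model pairs in this section, with their own prices substituted.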
EMO
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
emo-detect-v1: Input is charged; output is not. You are charged based on the number of detected images. Each input image is charged once, regardless of whether the detection is successful.
emo-v1: Output is charged; input is not. You are charged based on the duration of the successfully generated video in seconds. For billing rules, see Video generation.
Service | Model | Price | Free quota (Note) |
Image detection | emo-detect-v1 | Input image: $0.000574/image | No free quota |
Video generation | emo-v1 | Output video:
|
LivePortrait
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
liveportrait-detect: Input is charged; output is not. You are charged based on the number of detected images. Each input image is charged once, regardless of whether the detection is successful.
liveportrait: Output is charged; input is not. You are charged based on the duration of the successfully generated video in seconds. For billing rules, see Video generation.
Service | Model | Price | Free quota (Note) |
Image detection | liveportrait-detect | Input image: $0.000574/image | No free quota |
Video generation | liveportrait | Output video: $0.002868/second |
Emoji
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
emoji-detect-v1: Input is charged; output is not. You are charged based on the number of detected images. Each input image is charged once, regardless of whether the detection is successful.
emoji-v1: Output is charged; input is not. You are charged based on the duration of the successfully generated video in seconds. For billing rules, see Video generation.
Service | Model | Price | Free quota (Note) |
Image detection | emoji-detect-v1 | Input image: $0.000574/image | No free quota |
Video generation | emoji-v1 | Output video: $0.011469/second |
VideoRetalk
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Only output is billed. For rules, see Video generation.
Model | Output price | Free quota (Note) |
videoretalk | $0.011469/second | No free quota |
Video style reform
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Only output is billed. For rules, see Video generation.
Model | Output video resolution | Output price | Free quota (Note) |
video-style-transform | 540P | $0.028671/second | No free quota |
720P | $0.071677/second |
Speech synthesis (text-to-speech)
Qwen-TTS
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
qwen3-tts series
Billing rule: You are charged based on the number of input text characters. You are not charged for the output.
Model | Input price | Free quota (Note) |
qwen3-tts-flash | $0.1/10,000 characters | If Model Studio was activated before 00:00 on November 13, 2025: 2,000 characters. If Model Studio was activated after 00:00 on November 13, 2025: 10,000 characters. Validity: 90 days after activating Model Studio |
qwen3-tts-flash-2025-11-27 | $0.1/10,000 characters | 10,000 characters. Validity: 90 days after activating Model Studio |
qwen3-tts-flash-2025-09-18 | $0.1/10,000 characters | If Model Studio was activated before 00:00 on November 13, 2025: 2,000 characters. If Model Studio was activated after 00:00 on November 13, 2025: 10,000 characters. Validity: 90 days after activating Model Studio |
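Character-based billing with a free quota can be estimated along the following lines. This is a minimal sketch: the function name is hypothetical, and it takes an already-counted number of billable characters, because the character-counting rules are not given in this section.

```python
from decimal import Decimal


def tts_cost(billable_chars, price_per_10k, free_chars_left=0):
    """Estimate the input charge for a character-billed TTS model.

    billable_chars: characters counted according to the service's rules.
    price_per_10k: price per 10,000 characters, as a string such as "0.1".
    free_chars_left: unused free quota; assumed to be consumed first.
    """
    charged_chars = max(billable_chars - free_chars_left, 0)
    return Decimal(charged_chars) * Decimal(price_per_10k) / Decimal(10000)
```

With 25,000 billable characters, a price of $0.1 per 10,000 characters, and a full 10,000-character free quota, 15,000 characters would be charged.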
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Models in Mainland China deployment mode do not have a free quota.
qwen3-tts series
Billing rule: You are charged based on the number of input text characters. You are not charged for the output.
Model | Input price (10,000 characters) | Output price (10,000 characters) |
qwen3-tts-flash | $0.114682 | Not charged |
qwen3-tts-flash-2025-11-27 | $0.114682 | Not charged |
qwen3-tts-flash-2025-09-18 | $0.114682 | Not charged |
qwen-tts series
Billing rule: You are charged based on the number of input and output tokens.
Model | Input price (Million tokens) | Output price (Million tokens) |
qwen-tts-flash | $0.23 | $1.434 |
qwen-tts-latest | $0.23 | $1.434 |
qwen-tts-2025-05-22 | $0.23 | $1.434 |
qwen-tts-2025-04-10 | $0.23 | $1.434 |
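For the token-billed qwen-tts series, the charge is the sum of the input and output token counts, each multiplied by its per-million price. The helper below is a hypothetical sketch of that arithmetic, not part of any SDK; how audio output is converted to tokens is not covered in this section.

```python
from decimal import Decimal

MILLION = Decimal(1_000_000)


def token_cost(input_tokens, output_tokens, input_price, output_price):
    """Estimate a charge from token counts and per-million-token prices.

    Prices are passed as strings (for example "0.23") so that Decimal
    keeps them exact.
    """
    input_cost = Decimal(input_tokens) * Decimal(input_price)
    output_cost = Decimal(output_tokens) * Decimal(output_price)
    return (input_cost + output_cost) / MILLION
```

Using the prices in the table above, 200,000 input tokens and 1,000,000 output tokens would come to $0.046 plus $1.434.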
Qwen-TTS-Realtime
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
qwen3-tts-vd realtime series
Billing rule: You are charged based on the number of input text characters. You are not charged for the output.
Model | Input price | Free quota (Note) |
qwen3-tts-vd-realtime-2025-12-16 | $0.143353/10,000 characters | 10,000 characters Validity: 90 days after activating Model Studio |
qwen3-tts-vc realtime series
Billing rule: You are charged based on the number of input text characters. You are not charged for the output.
Model | Input price | Free quota (Note) |
qwen3-tts-vc-realtime-2025-11-27 | $0.13/10,000 characters | 10,000 characters Validity: 90 days after activating Model Studio |
qwen3-tts realtime series
Billing rule: You are charged based on the number of input text characters. You are not charged for the output.
Model | Input price | Free quota (Note) |
qwen3-tts-flash-realtime | $0.13/10,000 characters | If Model Studio was activated before 00:00 on November 13, 2025: 2,000 characters. If Model Studio was activated after 00:00 on November 13, 2025: 10,000 characters. Validity: 90 days after activating Model Studio |
qwen3-tts-flash-realtime-2025-11-27 | $0.13/10,000 characters | 10,000 characters. Validity: 90 days after activating Model Studio |
qwen3-tts-flash-realtime-2025-09-18 | $0.13/10,000 characters | If Model Studio was activated before 00:00 on November 13, 2025: 2,000 characters. If Model Studio was activated after 00:00 on November 13, 2025: 10,000 characters. Validity: 90 days after activating Model Studio |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Models in Mainland China deployment mode do not have a free quota.
qwen3-tts-vd realtime series
Billing rule: You are charged based on the number of input text characters. You are not charged for the output.
Model | Input price (10,000 characters) | Output price |
qwen3-tts-vd-realtime-2025-12-16 | $0.143353 | Not charged |
qwen3-tts-vc realtime series
Billing rule: You are charged based on the number of input text characters. You are not charged for the output.
Model | Input price (10,000 characters) | Output price |
qwen3-tts-vc-realtime-2025-11-27 | $0.143353 | Not charged |
qwen3-tts realtime series
Billing rule: You are charged based on the number of input text characters. You are not charged for the output.
Model | Input price (10,000 characters) | Output price |
qwen3-tts-flash-realtime | $0.143353 | Not charged |
qwen3-tts-flash-realtime-2025-11-27 | $0.143353 | Not charged |
qwen3-tts-flash-realtime-2025-09-18 | $0.143353 | Not charged |
qwen-tts realtime series
Billing rule: You are charged based on the number of input and output tokens.
Model | Input price (Million tokens) | Output price (Million tokens) |
qwen-tts-realtime | $0.345 | $1.721 |
qwen-tts-realtime-latest | $0.345 | $1.721 |
qwen-tts-realtime-2025-07-15 | $0.345 | $1.721 |
Qwen-TTS voice cloning
Billing rule: You are charged based on the number of new voices created.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Price (per voice) | Free quota (Note) |
qwen-voice-enrollment | $0.01 | 1000 voices/account |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Models in Mainland China deployment mode do not have a free quota.
Model | Price (per voice) |
qwen-voice-enrollment | $0.01 |
Qwen-TTS voice design
Billing rule: You are charged based on the number of new voices created.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Price (per voice) | Free quota (Note) |
qwen-voice-design | $0.2 | 10 voices/account |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Models in Mainland China deployment mode do not have a free quota.
Model | Price (per voice) |
qwen-voice-design | $0.2 |
CosyVoice
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Billing rule: You are charged based on the number of input text characters. You are not charged for the output.
Model | Input price | Free quota (Note) |
cosyvoice-v3-plus | $0.286706/10,000 characters | No free quota |
cosyvoice-v3-flash | $0.14335/10,000 characters | |
cosyvoice-v2 | $0.286706/10,000 characters |
Speech recognition (speech-to-text) and translation (speech-to-translation)
Qwen3-LiveTranslate-Flash-Realtime
Billing rule: You are charged based on the number of input and output tokens. For token calculation rules for different modalities, see Billing.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Input price (Million tokens) | Output price (Million tokens) | Free quota (Note) | ||
Input: Audio | Input: Image | Output: Text | Output: Audio | ||
qwen3-livetranslate-flash-realtime | $10 | $1.3 | $10 | $38 | 1 million tokens each |
qwen3-livetranslate-flash-realtime-2025-09-22 | $10 | $1.3 | $10 | $38 | |
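Because each modality has its own price, a request's charge is a sum over modalities. The sketch below uses the International prices from the table above; the dictionary keys are hypothetical labels chosen for this example and are not field names from any API response.

```python
from decimal import Decimal

# International prices from the table above, in USD per million tokens.
# The keys are illustrative labels, not API field names.
PRICES = {
    "input_audio": Decimal("10"),
    "input_image": Decimal("1.3"),
    "output_text": Decimal("10"),
    "output_audio": Decimal("38"),
}


def livetranslate_cost(usage, prices=PRICES):
    """Sum per-modality token counts weighted by per-million prices.

    usage maps a modality label to a token count; missing labels count as 0.
    """
    total = Decimal(0)
    for modality, price in prices.items():
        total += Decimal(usage.get(modality, 0)) * price
    return total / Decimal(1_000_000)
```

For the Mainland China deployment mode, the same function can be called with a second price dictionary built from that table.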
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Models in Mainland China deployment mode do not have a free quota.
Model | Input price (Million tokens) | Output price (Million tokens) | ||
Input: Audio | Input: Image | Output: Text | Output: Audio | |
qwen3-livetranslate-flash-realtime | $9.175 | $1.147 | $9.175 | $34.405 |
qwen3-livetranslate-flash-realtime-2025-09-22 | $9.175 | $1.147 | $9.175 | $34.405 |
Qwen-ASR
Billing rule: You are charged based on the duration of the input audio in seconds. You are not charged for the output.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Input price | Free quota (Note) |
qwen3-asr-flash-filetrans | $0.000035/second | 36,000 seconds (10 hours) |
qwen3-asr-flash-filetrans-2025-11-17 | ||
qwen3-asr-flash | ||
qwen3-asr-flash-2025-09-08 |
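Per-second prices are easier to compare as hourly rates. The following sketch converts a per-second price to an hourly one and estimates a charge after deducting any remaining free quota. The function names are hypothetical, and whether partial seconds are rounded is not stated in this section, so durations are used as given.

```python
from decimal import Decimal

SECONDS_PER_HOUR = 3600


def hourly_rate(price_per_second):
    """Convert a per-second price (string) into a per-hour price."""
    return Decimal(price_per_second) * SECONDS_PER_HOUR


def asr_cost(audio_seconds, price_per_second, free_seconds_left=0):
    """Estimate the charge for input audio billed by duration.

    free_seconds_left: unused free quota; assumed to be consumed first.
    """
    billable = max(audio_seconds - free_seconds_left, 0)
    return Decimal(str(billable)) * Decimal(price_per_second)
```

At $0.000035/second, one hour of audio corresponds to 3,600 billed seconds, and the 36,000-second free quota covers the first 10 hours.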
US
In the US deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are limited to the United States.
Models in the US deployment mode do not have a free quota.
Model | Input price |
qwen3-asr-flash-us | $0.000035/second |
qwen3-asr-flash-2025-09-08-us |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Models in Mainland China deployment mode do not have a free quota.
Model | Input price |
qwen3-asr-flash-filetrans | $0.000032/second |
qwen3-asr-flash-filetrans-2025-11-17 | |
qwen3-asr-flash | |
qwen3-asr-flash-2025-09-08 |
Qwen-ASR-Realtime
Billing rule: You are charged based on the duration of the input audio in seconds. You are not charged for the output.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Input price | Free quota (Note) |
qwen3-asr-flash-realtime | $0.000090/second | 36,000 seconds (10 hours) |
qwen3-asr-flash-realtime-2025-10-27 | $0.000090/second |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Models in Mainland China deployment mode do not have a free quota.
Model | Input price |
qwen3-asr-flash-realtime | $0.000047/second |
qwen3-asr-flash-realtime-2025-10-27 |
Fun-ASR
Audio file recognition
Billing rule: You are charged based on the duration of the input audio in seconds. You are not charged for the output.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Input price | Free quota (Note) |
fun-asr | $0.000035/second | 36,000 seconds (10 hours) |
fun-asr-2025-11-07 | ||
fun-asr-2025-08-25 | ||
fun-asr-mtl | ||
fun-asr-mtl-2025-08-25 |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Models in Mainland China deployment mode do not have a free quota.
Model | Input price |
fun-asr | $0.000032/second |
fun-asr-2025-11-07 | |
fun-asr-2025-08-25 | |
fun-asr-mtl | |
fun-asr-mtl-2025-08-25 |
Real-time speech recognition
Billing rule: You are charged based on the duration of the input audio in seconds. You are not charged for the output.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Input price | Free quota (Note) |
fun-asr-realtime | $0.00009/second | 36,000 seconds (10 hours) Valid for 90 days |
fun-asr-realtime-2025-11-07 |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Models in Mainland China deployment mode do not have a free quota.
Model | Input price |
fun-asr-realtime | $0.000047/second |
fun-asr-realtime-2025-11-07 | |
fun-asr-realtime-2025-09-15 |
Paraformer
Audio file recognition
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Billing rule: You are charged based on the duration of the input audio in seconds. You are not charged for the output.
Model | Input price |
paraformer-v2 | $0.000012/second |
paraformer-8k-v2 |
Real-time speech recognition
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Billing rule: You are charged based on the duration of the input audio in seconds. You are not charged for the output.
Model | Input price | Free quota (Note) |
paraformer-realtime-v2 | $0.000035/second | No free quota |
paraformer-realtime-8k-v2 |
Text embedding
Billing rule: You are charged based on the number of input tokens. You are not charged for the output.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Input price (Million tokens) | Free quota (Note) |
text-embedding-v4 | $0.07 | 1 million tokens |
text-embedding-v3 | $0.07 | 500,000 tokens |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Models in Mainland China deployment mode do not have a free quota.
Model | Input price (Million tokens) |
text-embedding-v4 | $0.072 |
Multimodal embedding
Billing rule: You are charged based on the number of input tokens. You are not charged for the output.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Input price (Million tokens) | Free quota (Note) |
tongyi-embedding-vision-plus | $0.09 | 1 million tokens Validity: 90 days after activating Model Studio |
tongyi-embedding-vision-flash | Image/Video: $0.03 Text: $0.09 |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Model | Input price (Million tokens) | Free quota (Note) |
multimodal-embedding-v1 | Free trial | No token quota |
Text rerank
Billing rule: You are charged based on the number of input tokens. You are not charged for the output.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Input price (Million tokens) | Free quota (Note) |
qwen3-rerank | $0.1 | 1 million tokens Validity: 90 days after activating Model Studio |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Models in Mainland China deployment mode do not have a free quota.
Model | Input price (Million tokens) |
gte-rerank-v2 | $0.115 |
Domain specific
Intent recognition
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Model | Input price (Million tokens) | Output price (Million tokens) | Free quota (Note) |
tongyi-intent-detect-v3 | $0.058 | $0.144 | No free quota |
Role playing
Billing is based on the number of input and output tokens.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Input price (Million tokens) | Output price (Million tokens) | Free quota (Note) |
qwen-plus-character-ja | $0.5 | $1.4 | No free quota |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Model | Input price (Million tokens) | Output price (Million tokens) | Free quota (Note) |
qwen-plus-character | $0.115 | $0.287 | No free quota |