Text generation - Qwen
Qwen-Max
Billing is based on the number of input and output tokens.
If the model supports batch calls, the batch price for input and output tokens is 50% of the real-time price. If the model supports context cache, only input tokens are eligible for a discount. These two discounts cannot be applied at the same time.
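The discount rule above can be sketched as a small cost function. This is a minimal illustration, not the billing system's implementation: the function name, its parameters, and `cache_input_multiplier` are assumptions, because this section does not state the size of the context cache discount. The sketch also assumes the cache discount applies only to the cached part of the input.

```python
from decimal import Decimal

MILLION = Decimal(1_000_000)


def request_cost(input_tokens, output_tokens, input_price, output_price,
                 batch=False, cached_input_tokens=0,
                 cache_input_multiplier=None):
    """Estimate the USD cost of one request. Prices are per 1M tokens.

    cache_input_multiplier is hypothetical: the text only says that
    input tokens are eligible for a context cache discount.
    """
    if batch and cached_input_tokens:
        # The two discounts cannot be applied at the same time.
        raise ValueError("batch and context cache discounts cannot be combined")

    input_price = Decimal(str(input_price))
    output_price = Decimal(str(output_price))

    if batch:
        # Batch calls: input and output tokens are billed at 50%.
        input_price /= 2
        output_price /= 2

    input_cost = Decimal(input_tokens) * input_price / MILLION
    if cached_input_tokens:
        if cache_input_multiplier is None:
            raise ValueError("cache_input_multiplier is required for cached input")
        multiplier = Decimal(str(cache_input_multiplier))
        uncached = input_tokens - cached_input_tokens
        # Only input tokens are discounted; output stays at full price.
        input_cost = (Decimal(uncached) * input_price
                      + Decimal(cached_input_tokens) * input_price * multiplier) / MILLION

    return input_cost + Decimal(output_tokens) * output_price / MILLION
```

For example, with an input price of $1.6 and an output price of $6.4 per 1M tokens, a request with 100,000 input tokens and 20,000 output tokens comes to 0.288 USD in real time and 0.144 USD as a batch call.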
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Mode | Input tokens per request | Input price (per 1M tokens) | Output price (per 1M tokens, CoT + response) | Free quota (Note) |
qwen3-max (Batch calling 50% off; Context cache discount applicable) | Thinking and non-thinking | 0<Token≤32K | $1.2 | $6 | 1 million tokens each; validity: 90 days after activating Model Studio |
32K<Token≤128K | $2.4 | $12 | |||
128K<Token≤252K | $3 | $15 | |||
qwen3-max-2026-01-23 | Thinking and non-thinking | 0<Token≤32K | $1.2 | $6 | |
32K<Token≤128K | $2.4 | $12 | |||
128K<Token≤252K | $3 | $15 | |||
qwen3-max-2025-09-23 | Non-thinking only | 0<Token≤32K | $1.2 | $6 | |
32K<Token≤128K | $2.4 | $12 | |||
128K<Token≤252K | $3 | $15 | |||
qwen3-max-preview (Context cache discount applicable) | Thinking and non-thinking | 0<Token≤32K | $1.2 | $6 | |
32K<Token≤128K | $2.4 | $12 | |||
128K<Token≤252K | $3 | $15 |
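The tiered table above can be read as a lookup on the input tokens of a single request. The following is a minimal sketch using the qwen3-max International prices. The names are hypothetical, and two points are assumptions rather than statements from this page: that K means 1,000, and that the matched tier's prices apply to every input and output token of the request.

```python
from decimal import Decimal

# (upper bound of input tokens per request, input price, output price),
# USD per 1M tokens. Assumes K = 1,000.
QWEN3_MAX_INTL_TIERS = [
    (32_000, Decimal("1.2"), Decimal("6")),
    (128_000, Decimal("2.4"), Decimal("12")),
    (252_000, Decimal("3"), Decimal("15")),
]


def tiered_cost(input_tokens, output_tokens, tiers=QWEN3_MAX_INTL_TIERS):
    """Return the estimated USD cost of one request under tiered pricing."""
    if input_tokens <= 0:
        raise ValueError("input_tokens must be positive")
    for upper_bound, input_price, output_price in tiers:
        if input_tokens <= upper_bound:
            # Assumption: the matched tier prices the whole request.
            return (input_tokens * input_price
                    + output_tokens * output_price) / Decimal(1_000_000)
    raise ValueError("input exceeds the largest listed tier")
```

Under these assumptions, a request with 10,000 input tokens and 2,000 output tokens costs 0.024 USD, while the same output with 50,000 input tokens falls into the second tier and is priced at $2.4 and $12 per 1M tokens.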
More models
Model | Mode | Input tokens per request | Input price (per 1M tokens) | Output price (per 1M tokens) | Free quota (Note) |
qwen-max (Batch calling 50% off) | Non-thinking only | No tiered pricing | $1.6 | $6.4 | 1 million tokens each |
qwen-max-latest | Non-thinking only | No tiered pricing | $1.6 | $6.4 | |
qwen-max-2025-01-25 | Non-thinking only | No tiered pricing | $1.6 | $6.4 |
Global
In the Global deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are dynamically scheduled worldwide.
Models in the Global deployment mode do not have a free quota.
Model | Mode | Input tokens per request | Input price (per 1M tokens) | Output price (per 1M tokens, CoT + response) |
qwen3-max (Context cache discount applicable) | Non-thinking only | 0<Token≤32K | $1.2 | $6 |
32K<Token≤128K | $2.4 | $12 | ||
128K<Token≤252K | $3 | $15 | ||
qwen3-max-2025-09-23 | Non-thinking only | 0<Token≤32K | $1.2 | $6 |
32K<Token≤128K | $2.4 | $12 | ||
128K<Token≤252K | $3 | $15 | ||
qwen3-max-preview (Context cache discount applicable) | Thinking and non-thinking | 0<Token≤32K | $1.2 | $6 |
32K<Token≤128K | $2.4 | $12 | ||
128K<Token≤252K | $3 | $15 |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Models in the Mainland China deployment mode do not have a free quota.
Model | Mode | Input tokens per request | Input price (per 1M tokens) | Output price (per 1M tokens, CoT + response) |
qwen3-max (Batch calling 50% off; Context cache discount applicable) | Thinking and non-thinking | 0<Token≤32K | $0.359 | $1.434 |
32K<Token≤128K | $0.574 | $2.294 | ||
128K<Token≤252K | $1.004 | $4.014 | ||
qwen3-max-2026-01-23 | Thinking and non-thinking | 0<Token≤32K | $0.359 | $1.434 |
32K<Token≤128K | $0.574 | $2.294 | ||
128K<Token≤252K | $1.004 | $4.014 | ||
qwen3-max-2025-09-23 | Non-thinking only | 0<Token≤32K | $0.861 | $3.441 |
32K<Token≤128K | $1.434 | $5.735 | ||
128K<Token≤252K | $2.151 | $8.602 | ||
qwen3-max-preview (Context cache discount applicable) | Thinking and non-thinking | 0<Token≤32K | $0.861 | $3.441 |
32K<Token≤128K | $1.434 | $5.735 | ||
128K<Token≤252K | $2.151 | $8.602 |
More models
Model | Mode | Input tokens per request | Input price (per 1M tokens) | Output price (per 1M tokens) |
qwen-max | Non-thinking only | No tiered pricing | $0.345 | $1.377 |
qwen-max-latest | Non-thinking only | No tiered pricing | $0.345 | $1.377 |
qwen-max-2025-01-25 | Non-thinking only | No tiered pricing | $0.345 | $1.377 |
qwen-max-2024-09-19 | Non-thinking only | No tiered pricing | $2.868 | $8.602 |
Qwen-Plus
Billing is based on the number of input and output tokens.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Input tokens per request | Input price (per 1M tokens) | Output price, non-thinking mode (per 1M tokens) | Output price, thinking mode, CoT + response (per 1M tokens) | Free quota (Note) |
qwen-plus | 0<Token≤256K | $0.4 | $1.2 | $4 | 1 million tokens each |
256K<Token≤1M | $1.2 | $3.6 | $12 | ||
qwen-plus-latest | 0<Token≤256K | $0.4 | $1.2 | $4 | |
256K<Token≤1M | $1.2 | $3.6 | $12 | ||
qwen-plus-2025-12-01 | 0<Token≤256K | $0.4 | $1.2 | $4 | |
256K<Token≤1M | $1.2 | $3.6 | $12 | ||
qwen-plus-2025-09-11 | 0<Token≤256K | $0.4 | $1.2 | $4 | |
256K<Token≤1M | $1.2 | $3.6 | $12 | ||
qwen-plus-2025-07-28 | 0<Token≤256K | $0.4 | $1.2 | $4 | |
256K<Token≤1M | $1.2 | $3.6 | $12 | ||
qwen-plus-2025-07-14 | No tiered pricing | $0.4 | $1.2 | $4 | |
qwen-plus-2025-04-28 | No tiered pricing | $0.4 | $1.2 | $4 | |
qwen-plus-2025-01-25 | No tiered pricing | $0.4 | $1.2 | - | |
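As a rough illustration of how the thinking-mode column affects a bill, the sketch below prices one qwen-plus request in the International deployment mode in either mode. The function and constant names are hypothetical. The code assumes that K means 1,000 and M means 1,000,000, that the tier is chosen by the input tokens of the request, and that `output_tokens` in thinking mode already includes the chain-of-thought tokens (CoT + response).

```python
from decimal import Decimal

# qwen-plus, International: (input-token upper bound, input price,
# non-thinking output price, thinking output price), USD per 1M tokens.
QWEN_PLUS_INTL = [
    (256_000, Decimal("0.4"), Decimal("1.2"), Decimal("4")),
    (1_000_000, Decimal("1.2"), Decimal("3.6"), Decimal("12")),
]


def qwen_plus_cost(input_tokens, output_tokens, thinking):
    """Estimate the USD cost of one qwen-plus request."""
    for upper, in_price, out_plain, out_thinking in QWEN_PLUS_INTL:
        if 0 < input_tokens <= upper:
            # The input price is the same in both modes; only the
            # output price depends on whether thinking mode is on.
            out_price = out_thinking if thinking else out_plain
            return (input_tokens * in_price
                    + output_tokens * out_price) / Decimal(1_000_000)
    raise ValueError("input_tokens is outside the listed tiers")
```

With 100,000 input tokens and 5,000 output tokens, this gives 0.046 USD in non-thinking mode and 0.06 USD in thinking mode. In practice, thinking mode usually also produces more output tokens, so the real gap tends to be larger than the price ratio alone suggests.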
Global
In the Global deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are dynamically scheduled worldwide.
Models in the Global deployment mode do not have a free quota.
Model | Input tokens per request | Input price (per 1M tokens) | Output price, non-thinking mode (per 1M tokens) | Output price, thinking mode, CoT + response (per 1M tokens) |
qwen-plus | 0<Token≤256K | $0.4 | $1.2 | $4 |
256K<Token≤1M | $1.2 | $3.6 | $12 | |
qwen-plus-2025-12-01 | 0<Token≤256K | $0.4 | $1.2 | $4 |
256K<Token≤1M | $1.2 | $3.6 | $12 | |
qwen-plus-2025-09-11 | 0<Token≤256K | $0.4 | $1.2 | $4 |
256K<Token≤1M | $1.2 | $3.6 | $12 | |
qwen-plus-2025-07-28 | 0<Token≤256K | $0.4 | $1.2 | $4 |
256K<Token≤1M | $1.2 | $3.6 | $12 | |
US
In the US deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are limited to the United States.
Models in the US deployment mode do not have a free quota.
Model | Input tokens per request | Input price (per 1M tokens) | Output price, non-thinking mode (per 1M tokens) | Output price, thinking mode, CoT + response (per 1M tokens) |
qwen-plus-us | 0<Token≤256K | $0.4 | $1.2 | $4 |
256K<Token≤1M | $1.2 | $3.6 | $12 | |
qwen-plus-2025-12-01-us | 0<Token≤256K | $0.4 | $1.2 | $4 |
256K<Token≤1M | $1.2 | $3.6 | $12 | |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Models in the Mainland China deployment mode do not have a free quota.
Model | Input tokens per request | Input price (per 1M tokens) | Output price, non-thinking mode (per 1M tokens) | Output price, thinking mode, CoT + response (per 1M tokens) |
qwen-plus | 0<Token≤128K | $0.115 | $0.287 | $1.147 |
128K<Token≤256K | $0.345 | $2.868 | $3.441 | |
256K<Token≤1M | $0.689 | $6.881 | $9.175 | |
qwen-plus-latest | 0<Token≤128K | $0.115 | $0.287 | $1.147 |
128K<Token≤256K | $0.345 | $2.868 | $3.441 | |
256K<Token≤1M | $0.689 | $6.881 | $9.175 | |
qwen-plus-2025-12-01 | 0<Token≤128K | $0.115 | $0.287 | $1.147 |
128K<Token≤256K | $0.345 | $2.868 | $3.441 | |
256K<Token≤1M | $0.689 | $6.881 | $9.175 | |
qwen-plus-2025-09-11 | 0<Token≤128K | $0.115 | $0.287 | $1.147 |
128K<Token≤256K | $0.345 | $2.868 | $3.441 | |
256K<Token≤1M | $0.689 | $6.881 | $9.175 | |
qwen-plus-2025-07-28 | 0<Token≤128K | $0.115 | $0.287 | $1.147 |
128K<Token≤256K | $0.345 | $2.868 | $3.441 | |
256K<Token≤1M | $0.689 | $6.881 | $9.175 | |
qwen-plus-2025-07-14 | No tiered pricing | $0.115 | $0.287 | $1.147 |
qwen-plus-2025-04-28 | No tiered pricing | $0.115 | $0.287 | $1.147 |
More models
Model | Input tokens per request | Input price (per 1M tokens) | Output price (per 1M tokens) |
qwen-plus-2025-01-25 | No tiered pricing | $0.115 | $0.287 |
qwen-plus-2025-01-12 | No tiered pricing | $0.115 | $0.287 |
qwen-plus-2024-12-20 | No tiered pricing | $0.115 | $0.287 |
Qwen-Flash
Billing is based on the number of input and output tokens.
If the model supports batch calls, the batch price for input and output tokens is 50% of the real-time price. If the model supports context cache, only input tokens are eligible for a discount. These two discounts cannot be applied at the same time.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Input tokens per request | Input price (per 1M tokens) | Output price (per 1M tokens) | Free quota (Note) |
qwen-flash (Batch calling 50% off; Context cache discount applicable) | 0<Token≤256K | $0.05 | $0.4 | 1 million tokens each |
256K<Token≤1M | $0.25 | $2 | ||
qwen-flash-2025-07-28 | 0<Token≤256K | $0.05 | $0.4 | |
256K<Token≤1M | $0.25 | $2 |
Global
In the Global deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are dynamically scheduled worldwide.
Models in the Global deployment mode do not have a free quota.
Model | Input tokens per request | Input price (per 1M tokens) | Output price (per 1M tokens) |
qwen-flash (Context cache discount applicable) | 0<Token≤256K | $0.05 | $0.4 |
256K<Token≤1M | $0.25 | $2 | |
qwen-flash-2025-07-28 | 0<Token≤256K | $0.05 | $0.4 |
256K<Token≤1M | $0.25 | $2 |
US
In the US deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are limited to the United States.
Models in the US deployment mode do not have a free quota.
Model | Input tokens per request | Input price (per 1M tokens) | Output price (per 1M tokens) |
qwen-flash | 0<Token≤256K | $0.05 | $0.4 |
256K<Token≤1M | $0.25 | $2 | |
qwen-flash-2025-07-28 | 0<Token≤256K | $0.05 | $0.4 |
256K<Token≤1M | $0.25 | $2 |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Models in the Mainland China deployment mode do not have a free quota.
Model | Input tokens per request | Input price (per 1M tokens) | Output price (per 1M tokens) |
qwen-flash (Context cache discount applicable) | 0<Token≤128K | $0.022 | $0.216 |
128K<Token≤256K | $0.087 | $0.861 | |
256K<Token≤1M | $0.173 | $1.721 | |
qwen-flash-2025-07-28 | 0<Token≤128K | $0.022 | $0.216 |
128K<Token≤256K | $0.087 | $0.861 | |
256K<Token≤1M | $0.173 | $1.721 |
Qwen-Turbo
Qwen-Turbo will no longer be updated. We recommend Qwen-Flash instead.
Billing is based on the number of input and output tokens.
If the model supports batch calls, the batch price for input and output tokens is 50% of the real-time price.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Input price (per 1M tokens) | Output price, non-thinking mode (per 1M tokens) | Output price, thinking mode, CoT + response (per 1M tokens) | Free quota (Note) |
qwen-turbo (Batch calling 50% off) | $0.05 | $0.2 | $0.5 | 1 million tokens each |
qwen-turbo-latest | $0.05 | $0.2 | $0.5 | |
qwen-turbo-2025-04-28 | $0.05 | $0.2 | $0.5 | |
More models
Model | Input price (per 1M tokens) | Output price (per 1M tokens) | Free quota (Note) |
qwen-turbo-2024-11-01 | $0.05 | $0.2 | 1 million tokens each |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Models in the Mainland China deployment mode do not have a free quota.
Model | Input price (per 1M tokens) | Output price, non-thinking mode (per 1M tokens) | Output price, thinking mode, CoT + response (per 1M tokens) |
qwen-turbo | $0.044 | $0.087 | $0.431 |
qwen-turbo-latest | $0.044 | $0.087 | $0.431 |
qwen-turbo-2025-07-15 | $0.044 | $0.087 | $0.431 |
qwen-turbo-2025-04-28 | $0.044 | $0.087 | $0.431 |
QwQ
Billing is based on the number of input and output tokens.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Input price (per 1M tokens) | Output price (per 1M tokens) | Free quota (Note) |
qwq-plus | $0.8 | $2.4 | 1 million tokens |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Models in the Mainland China deployment mode do not have a free quota.
Model | Input price (per 1M tokens) | Output price (per 1M tokens) |
qwq-plus | $0.230 | $0.574 |
qwq-plus-latest | $0.230 | $0.574 |
qwq-plus-2025-03-05 | $0.230 | $0.574 |
Qwen-Long
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Billing is based on the number of input and output tokens.
Model | Input price (per 1M tokens) | Output price (per 1M tokens) | Free quota (Note) |
qwen-long-latest | $0.072 | $0.287 | No free quota |
qwen-long-2025-01-25 | $0.072 | $0.287 |
Qwen-Omni
Billing rule: Charges are calculated per input and output token. For token calculation rules for different modalities, see Billing and rate limiting.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Mode | Input price (per 1M tokens): Text | Input price: Audio | Input price: Image/Video | Output price (per 1M tokens): Text, plain text input | Output price: Text, multimodal input | Output price: Text+Audio, only audio is billed | Free quota (Note) |
qwen3-omni-flash | Thinking and non-thinking | $0.43 | $3.81 | $0.78 | $1.66 | $3.06 | $15.11 | 1 million tokens each (regardless of modality); validity: 90 days after activating Model Studio |
qwen3-omni-flash-2025-12-01 | Thinking and non-thinking | $0.43 | $3.81 | $0.78 | $1.66 | $3.06 | $15.11 | |
qwen3-omni-flash-2025-09-15 | Thinking and non-thinking | $0.43 | $3.81 | $0.78 | $1.66 | $3.06 | $15.11 | |
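One way to read the per-modality columns above is sketched below for qwen3-omni-flash in the International deployment mode. This is an interpretation of the column labels, not a confirmed billing algorithm: it assumes that each input modality is billed at its own rate, that text output is billed at the multimodal rate whenever any non-text input is present, and that when audio is produced only the audio output tokens are billed. Token counts per modality are taken as given; how they are derived is covered by Billing and rate limiting. All names are hypothetical.

```python
from decimal import Decimal

# qwen3-omni-flash, International, USD per 1M tokens.
INPUT_PRICE = {
    "text": Decimal("0.43"),
    "audio": Decimal("3.81"),
    "image_video": Decimal("0.78"),
}
OUTPUT_TEXT_PLAIN = Decimal("1.66")       # text output, plain text input
OUTPUT_TEXT_MULTIMODAL = Decimal("3.06")  # text output, multimodal input
OUTPUT_AUDIO = Decimal("15.11")           # text+audio output, only audio billed


def omni_cost(input_tokens, output_text_tokens=0, output_audio_tokens=0):
    """Estimate USD cost; input_tokens maps a modality name to a token count."""
    cost = sum(INPUT_PRICE[m] * n for m, n in input_tokens.items())
    if output_audio_tokens:
        # Assumed reading: with audio output, text output is not billed.
        cost += OUTPUT_AUDIO * output_audio_tokens
    else:
        multimodal = any(n for m, n in input_tokens.items() if m != "text")
        rate = OUTPUT_TEXT_MULTIMODAL if multimodal else OUTPUT_TEXT_PLAIN
        cost += rate * output_text_tokens
    return cost / Decimal(1_000_000)
```

For instance, 1,000 text input tokens with 500 text output tokens come to 0.00126 USD, while adding 2,000 audio input tokens raises both the input cost and the text output rate, for a total of 0.00958 USD.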
More models
Model | Input price (per 1M tokens): Text | Input price: Audio | Input price: Image/Video | Output price (per 1M tokens): Text, plain text input | Output price: Text, multimodal input | Output price: Text+Audio, only audio is billed | Free quota (Note) |
qwen-omni-turbo | $0.07 | $4.44 | $0.21 | $0.27 | $0.63 | $8.89 | 1 million tokens each (regardless of modality); validity: 90 days after activating Model Studio |
qwen-omni-turbo-latest | $0.07 | $4.44 | $0.21 | $0.27 | $0.63 | $8.89 | |
qwen-omni-turbo-2025-03-26 | $0.07 | $4.44 | $0.21 | $0.27 | $0.63 | $8.89 | |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Models in the Mainland China deployment mode do not have a free quota.
Model | Mode | Input price (per 1M tokens): Text | Input price: Audio (audio part is billed separately) | Input price: Image/Video | Output price (per 1M tokens): Text, plain text input | Output price: Text, multimodal input | Output price: Text+Audio, only audio is billed |
qwen3-omni-flash | Thinking and non-thinking | $0.258 | $2.265 | $0.473 | $0.989 | $1.821 | $8.974 |
qwen3-omni-flash-2025-12-01 | Thinking and non-thinking | $0.258 | $2.265 | $0.473 | $0.989 | $1.821 | $8.974 |
qwen3-omni-flash-2025-09-15 | Thinking and non-thinking | $0.258 | $2.265 | $0.473 | $0.989 | $1.821 | $8.974 |
More models
Model | Input price (per 1M tokens): Text | Input price: Audio (audio part is billed separately) | Input price: Image/Video | Output price (per 1M tokens): Text, plain text input | Output price: Text, multimodal input | Output price: Text+Audio, only audio is billed |
qwen-omni-turbo | $0.058 | $3.584 | $0.216 | $0.230 | $0.646 | $7.168 |
qwen-omni-turbo-latest | $0.058 | $3.584 | $0.216 | $0.230 | $0.646 | $7.168 |
qwen-omni-turbo-2025-03-26 | $0.058 | $3.584 | $0.216 | $0.230 | $0.646 | $7.168 |
qwen-omni-turbo-2025-01-19 | $0.058 | $3.584 | $0.216 | $0.230 | $0.646 | $7.168 |
Qwen-Omni-Realtime
Billing rule: Charges are calculated per input and output token. For token calculation rules for different modalities, see Billing and rate limiting.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Input price (per 1M tokens): Text | Input price: Audio (audio part is billed separately) | Input price: Image | Output price (per 1M tokens): Text, plain text input | Output price: Text, multimodal input | Output price: Text+Audio, only audio is billed | Free quota (Note) |
qwen3-omni-flash-realtime | $0.52 | $4.57 | $0.94 | $1.99 | $3.67 | $18.13 | 1 million tokens each (regardless of modality); validity: 90 days after activating Model Studio |
qwen3-omni-flash-realtime-2025-12-01 | $0.52 | $4.57 | $0.94 | $1.99 | $3.67 | $18.13 | |
qwen3-omni-flash-realtime-2025-09-15 | $0.52 | $4.57 | $0.94 | $1.99 | $3.67 | $18.13 | |
qwen-omni-turbo-realtime | $0.270 | $4.440 | $0.840 | $1.070 | $2.520 | $8.890 | |
qwen-omni-turbo-realtime-latest | $0.270 | $4.440 | $0.840 | $1.070 | $2.520 | $8.890 | |
qwen-omni-turbo-realtime-2025-05-08 | $0.270 | $4.440 | $0.840 | $1.070 | $2.520 | $8.890 | |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Models in the Mainland China deployment mode do not have a free quota.
Model | Input price (per 1M tokens): Text | Input price: Audio (audio part is billed separately) | Input price: Image | Output price (per 1M tokens): Text, plain text input | Output price: Text, multimodal input | Output price: Text+Audio, only audio is billed |
qwen3-omni-flash-realtime | $0.315 | $2.709 | $0.559 | $1.19 | $2.179 | $10.766 |
qwen3-omni-flash-realtime-2025-12-01 | $0.315 | $2.709 | $0.559 | $1.19 | $2.179 | $10.766 |
qwen3-omni-flash-realtime-2025-09-15 | $0.315 | $2.709 | $0.559 | $1.19 | $2.179 | $10.766 |
qwen-omni-turbo-realtime | $0.230 | $3.584 | $0.861 | $0.918 | $2.581 | $7.168 |
qwen-omni-turbo-realtime-latest | $0.230 | $3.584 | $0.861 | $0.918 | $2.581 | $7.168 |
qwen-omni-turbo-realtime-2025-05-08 | $0.230 | $3.584 | $0.861 | $0.918 | $2.581 | $7.168 |
QVQ
Billing rule: Charges are calculated per input and output token. For token calculation rules for different modalities, see Billing and rate limiting.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Input price (per 1M tokens) | Output price (per 1M tokens) | Free quota (Note) |
qvq-max | $1.2 | $4.8 | 1 million tokens each |
qvq-max-latest | $1.2 | $4.8 | |
qvq-max-2025-03-25 | $1.2 | $4.8 |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Models in the Mainland China deployment mode do not have a free quota.
Model | Input price (per 1M tokens) | Output price (per 1M tokens) |
qvq-max | $1.147 | $4.588 |
qvq-max-latest | $1.147 | $4.588 |
qvq-max-2025-05-15 | $1.147 | $4.588 |
qvq-max-2025-03-25 | $1.147 | $4.588 |
qvq-plus | $0.287 | $0.717 |
qvq-plus-latest | $0.287 | $0.717 |
qvq-plus-2025-05-15 | $0.287 | $0.717 |
Qwen-VL
Billing is based on the number of input and output tokens.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Mode | Input tokens per request | Input price (per 1M tokens) | Output price (per 1M tokens, CoT + response) | Free quota (Note) |
qwen3-vl-plus (Context cache discount applicable) | Thinking and non-thinking | 0<Token≤32K | $0.2 | $1.6 | 1 million tokens each |
32K<Token≤128K | $0.3 | $2.4 | |||
128K<Token≤256K | $0.6 | $4.8 | |||
qwen3-vl-plus-2025-12-19 | Thinking and non-thinking | 0<Token≤32K | $0.2 | $1.6 | |
32K<Token≤128K | $0.3 | $2.4 | |||
128K<Token≤256K | $0.6 | $4.8 | |||
qwen3-vl-plus-2025-09-23 | Thinking and non-thinking | 0<Token≤32K | $0.2 | $1.6 | |
32K<Token≤128K | $0.3 | $2.4 | |||
128K<Token≤256K | $0.6 | $4.8 | |||
qwen3-vl-flash (Context cache discount applicable) | Thinking and non-thinking | 0<Token≤32K | $0.05 | $0.4 | |
32K<Token≤128K | $0.075 | $0.6 | |||
128K<Token≤256K | $0.12 | $0.96 | |||
qwen3-vl-flash-2026-01-22 | Thinking and non-thinking | 0<Token≤32K | $0.05 | $0.4 | |
32K<Token≤128K | $0.075 | $0.6 | |||
128K<Token≤256K | $0.12 | $0.96 | |||
qwen3-vl-flash-2025-10-15 | Thinking and non-thinking | 0<Token≤32K | $0.05 | $0.4 | |
32K<Token≤128K | $0.075 | $0.6 | |||
128K<Token≤256K | $0.12 | $0.96 |
More models
Model | Input tokens per request | Input price (per 1M tokens) | Output price (per 1M tokens) | Free quota (Note) |
qwen-vl-max (Context cache discount applicable) | No tiered pricing | $0.8 | $3.2 | 1 million tokens each; validity: 90 days after activating Model Studio |
qwen-vl-max-latest | No tiered pricing | $0.8 | $3.2 | |
qwen-vl-max-2025-08-13 | No tiered pricing | $0.8 | $3.2 | |
qwen-vl-max-2025-04-08 | No tiered pricing | $0.8 | $3.2 | |
qwen-vl-plus (Context cache discount applicable) | No tiered pricing | $0.21 | $0.63 | |
qwen-vl-plus-latest | No tiered pricing | $0.21 | $0.63 | |
qwen-vl-plus-2025-08-15 | No tiered pricing | $0.21 | $0.63 | |
qwen-vl-plus-2025-05-07 | No tiered pricing | $0.21 | $0.63 | |
qwen-vl-plus-2025-01-25 | No tiered pricing | $0.21 | $0.63 |
Global
In the Global deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are dynamically scheduled worldwide.
Models in the Global deployment mode do not have a free quota.
Model | Mode | Input tokens per request | Input price (per 1M tokens) | Output price (per 1M tokens, CoT + response) |
qwen3-vl-plus (Context cache discount applicable) | Thinking and non-thinking | 0<Token≤32K | $0.2 | $1.6 |
32K<Token≤128K | $0.3 | $2.4 | ||
128K<Token≤256K | $0.6 | $4.8 | ||
qwen3-vl-plus-2025-09-23 | Thinking and non-thinking | 0<Token≤32K | $0.2 | $1.6 |
32K<Token≤128K | $0.3 | $2.4 | ||
128K<Token≤256K | $0.6 | $4.8 | ||
qwen3-vl-flash (Context cache discount applicable) | Thinking and non-thinking | 0<Token≤32K | $0.05 | $0.4 |
32K<Token≤128K | $0.075 | $0.6 | ||
128K<Token≤256K | $0.12 | $0.96 | ||
qwen3-vl-flash-2025-10-15 | Thinking and non-thinking | 0<Token≤32K | $0.05 | $0.4 |
32K<Token≤128K | $0.075 | $0.6 | ||
128K<Token≤256K | $0.12 | $0.96 |
US
In the US deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are limited to the United States.
Models in the US deployment mode do not have a free quota.
Model | Mode | Input tokens per request | Input price (per 1M tokens) | Output price (per 1M tokens, CoT + response) |
qwen3-vl-flash-us (Context cache discount applicable) | Thinking and non-thinking | 0<Token≤32K | $0.05 | $0.4 |
32K<Token≤128K | $0.075 | $0.6 | ||
128K<Token≤256K | $0.12 | $0.96 | ||
qwen3-vl-flash-2025-10-15-us | Thinking and non-thinking | 0<Token≤32K | $0.05 | $0.4 |
32K<Token≤128K | $0.075 | $0.6 | ||
128K<Token≤256K | $0.12 | $0.96 |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Models in the Mainland China deployment mode do not have a free quota.
Model | Mode | Input tokens per request | Input price (per 1M tokens) | Output price (per 1M tokens, CoT + response) |
qwen3-vl-plus (Context cache discount applicable) | Thinking and non-thinking | 0<Token≤32K | $0.143 | $1.434 |
32K<Token≤128K | $0.215 | $2.15 | ||
128K<Token≤256K | $0.43 | $4.301 | ||
qwen3-vl-plus-2025-12-19 | Thinking and non-thinking | 0<Token≤32K | $0.143 | $1.434 |
32K<Token≤128K | $0.215 | $2.15 | ||
128K<Token≤256K | $0.43 | $4.301 | ||
qwen3-vl-plus-2025-09-23 | Thinking and non-thinking | 0<Token≤32K | $0.143 | $1.434 |
32K<Token≤128K | $0.215 | $2.15 | ||
128K<Token≤256K | $0.43 | $4.301 | ||
qwen3-vl-flash (Context cache discount applicable) | Thinking and non-thinking | 0<Token≤32K | $0.022 | $0.215 |
32K<Token≤128K | $0.043 | $0.43 | ||
128K<Token≤256K | $0.086 | $0.859 | ||
qwen3-vl-flash-2026-01-22 | Thinking and non-thinking | 0<Token≤32K | $0.022 | $0.215 |
32K<Token≤128K | $0.043 | $0.43 | ||
128K<Token≤256K | $0.086 | $0.859 | ||
qwen3-vl-flash-2025-10-15 | Thinking and non-thinking | 0<Token≤32K | $0.022 | $0.215 |
32K<Token≤128K | $0.043 | $0.43 | ||
128K<Token≤256K | $0.086 | $0.859 |
More models
Model | Input tokens per request | Input price (per 1M tokens) | Output price (per 1M tokens) |
qwen-vl-max (Context cache discount applicable) | No tiered pricing | $0.23 | $0.574 |
qwen-vl-max-latest | No tiered pricing | $0.23 | $0.574 |
qwen-vl-max-2025-08-13 | No tiered pricing | $0.23 | $0.574 |
qwen-vl-max-2025-04-08 | No tiered pricing | $0.431 | $1.291 |
qwen-vl-max-2025-04-02 | No tiered pricing | $0.431 | $1.291 |
qwen-vl-max-2025-01-25 | No tiered pricing | $0.431 | $1.291 |
qwen-vl-max-2024-12-30 | No tiered pricing | $0.431 | $1.291 |
qwen-vl-max-2024-11-19 | No tiered pricing | $0.431 | $1.291 |
qwen-vl-plus (Context cache discount applicable) | No tiered pricing | $0.115 | $0.287 |
qwen-vl-plus-latest | No tiered pricing | $0.115 | $0.287 |
qwen-vl-plus-2025-08-15 | No tiered pricing | $0.115 | $0.287 |
qwen-vl-plus-2025-07-10 | No tiered pricing | $0.022 | $0.216 |
qwen-vl-plus-2025-05-07 | No tiered pricing | $0.216 | $0.646 |
qwen-vl-plus-2025-01-25 | No tiered pricing | $0.216 | $0.646 |
qwen-vl-plus-2025-01-02 | No tiered pricing | $0.216 | $0.646 |
Qwen-OCR
Billing is based on the number of input and output tokens.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Input price (per 1M tokens) | Output price (per 1M tokens) | Free quota (Note) |
qwen-vl-ocr | $0.07 | $0.16 | 1 million tokens each |
qwen-vl-ocr-2025-11-20 |
Global
In the Global deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are dynamically scheduled worldwide.
Models in the Global deployment mode do not have a free quota.
Model | Input price (per 1M tokens) | Output price (per 1M tokens) |
qwen-vl-ocr | $0.07 | $0.16 |
qwen-vl-ocr-2025-11-20 | $0.07 | $0.16 |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Models in the Mainland China deployment mode do not have a free quota.
Model | Input price (per 1M tokens) | Output price (per 1M tokens) |
qwen-vl-ocr | $0.717 | $0.717 |
qwen-vl-ocr-latest | $0.043 | $0.072 |
qwen-vl-ocr-2025-11-20 | ||
qwen-vl-ocr-2025-08-28 | $0.717 | $0.717 |
qwen-vl-ocr-2025-04-13 | ||
qwen-vl-ocr-2024-10-28 |
Qwen-Math
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Billing is based on the number of input and output tokens.
Model | Input price (per 1M tokens) | Output price (per 1M tokens) | Free quota (Note) |
qwen-math-plus | $0.574 | $1.721 | No free quota |
qwen-math-plus-latest | $0.574 | $1.721 | |
qwen-math-plus-2024-09-19 | $0.574 | $1.721 | |
qwen-math-plus-2024-08-16 | $0.574 | $1.721 | |
qwen-math-turbo | $0.287 | $0.861 | |
qwen-math-turbo-latest | $0.287 | $0.861 | |
qwen-math-turbo-2024-09-19 | $0.287 | $0.861 |
Qwen-Coder
Billing is based on the number of input and output tokens.
If the model supports context cache, only input tokens are eligible for a discount.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Input tokens per request | Input price (per 1M tokens) | Output price (per 1M tokens) | Free quota (Note) |
qwen3-coder-plus (Context cache discount applicable) | 0<Token≤32K | $1 | $5 | 1 million tokens each |
32K<Token≤128K | $1.8 | $9 | ||
128K<Token≤256K | $3 | $15 | ||
256K<Token≤1M | $6 | $60 | ||
qwen3-coder-plus-2025-09-23 | 0<Token≤32K | $1 | $5 | |
32K<Token≤128K | $1.8 | $9 | ||
128K<Token≤256K | $3 | $15 | ||
256K<Token≤1M | $6 | $60 | ||
qwen3-coder-plus-2025-07-22 | 0<Token≤32K | $1 | $5 | |
32K<Token≤128K | $1.8 | $9 | ||
128K<Token≤256K | $3 | $15 | ||
256K<Token≤1M | $6 | $60 | ||
qwen3-coder-flash | 0<Token≤32K | $0.3 | $1.5 | |
32K<Token≤128K | $0.5 | $2.5 | ||
128K<Token≤256K | $0.8 | $4 | ||
256K<Token≤1M | $1.6 | $9.6 | ||
qwen3-coder-flash-2025-07-28 | 0<Token≤32K | $0.3 | $1.5 | |
32K<Token≤128K | $0.5 | $2.5 | ||
128K<Token≤256K | $0.8 | $4 | ||
256K<Token≤1M | $1.6 | $9.6 |
Global
In the Global deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are dynamically scheduled worldwide.
Models in the Global deployment mode do not have a free quota.
Model | Input tokens per request | Input price (per 1M tokens) | Output price (per 1M tokens) |
qwen3-coder-plus (Context cache discount applicable) | 0<Token≤32K | $1 | $5 |
32K<Token≤128K | $1.8 | $9 | |
128K<Token≤256K | $3 | $15 | |
256K<Token≤1M | $6 | $60 | |
qwen3-coder-plus-2025-09-23 | 0<Token≤32K | $1 | $5 |
32K<Token≤128K | $1.8 | $9 | |
128K<Token≤256K | $3 | $15 | |
256K<Token≤1M | $6 | $60 | |
qwen3-coder-plus-2025-07-22 | 0<Token≤32K | $1 | $5 |
32K<Token≤128K | $1.8 | $9 | |
128K<Token≤256K | $3 | $15 | |
256K<Token≤1M | $6 | $60 | |
qwen3-coder-flash (Context cache discount applicable) | 0<Token≤32K | $0.3 | $1.5 |
32K<Token≤128K | $0.5 | $2.5 | |
128K<Token≤256K | $0.8 | $4 | |
256K<Token≤1M | $1.6 | $9.6 | |
qwen3-coder-flash-2025-07-28 | 0<Token≤32K | $0.3 | $1.5 |
32K<Token≤128K | $0.5 | $2.5 | |
128K<Token≤256K | $0.8 | $4 | |
256K<Token≤1M | $1.6 | $9.6 |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Models in the Mainland China deployment mode do not have a free quota.
qwen3-coder series
Model | Input tokens per request | Input price (per 1M tokens) | Output price (per 1M tokens) |
qwen3-coder-plus (Context cache discount applicable) | 0<Token≤32K | $0.574 | $2.294 |
32K<Token≤128K | $0.861 | $3.441 | |
128K<Token≤256K | $1.434 | $5.735 | |
256K<Token≤1M | $2.868 | $28.671 | |
qwen3-coder-plus-2025-09-23 | 0<Token≤32K | $0.574 | $2.294 |
32K<Token≤128K | $0.861 | $3.441 | |
128K<Token≤256K | $1.434 | $5.735 | |
256K<Token≤1M | $2.868 | $28.671 | |
qwen3-coder-plus-2025-07-22 | 0<Token≤32K | $0.574 | $2.294 |
32K<Token≤128K | $0.861 | $3.441 | |
128K<Token≤256K | $1.434 | $5.735 | |
256K<Token≤1M | $2.868 | $28.671 | |
qwen3-coder-flash | 0<Token≤32K | $0.144 | $0.574 |
32K<Token≤128K | $0.216 | $0.861 | |
128K<Token≤256K | $0.359 | $1.434 | |
256K<Token≤1M | $0.717 | $3.584 | |
qwen3-coder-flash-2025-07-28 | 0<Token≤32K | $0.144 | $0.574 |
32K<Token≤128K | $0.216 | $0.861 | |
128K<Token≤256K | $0.359 | $1.434 | |
256K<Token≤1M | $0.717 | $3.584 |
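The tiers in these tables are selected by the number of input tokens in a single request. The following is a minimal sketch of how a per-request cost could be estimated for qwen3-coder-plus in the Mainland China deployment mode. It assumes that the tier matched by the input size sets the unit price for all input and output tokens of that request, that 1K means 1,000 tokens, and that no context cache discount applies. The function and constant names are illustrative and are not part of any SDK.

```python
# Tiers for qwen3-coder-plus (Mainland China), copied from the table above:
# (upper bound of input tokens, input USD per 1M tokens, output USD per 1M tokens).
# Assumption: 1K = 1,000 tokens and 1M = 1,000,000 tokens.
QWEN3_CODER_PLUS_CN_TIERS = [
    (32_000, 0.574, 2.294),
    (128_000, 0.861, 3.441),
    (256_000, 1.434, 5.735),
    (1_000_000, 2.868, 28.671),
]


def estimate_tiered_cost(input_tokens, output_tokens, tiers=QWEN3_CODER_PLUS_CN_TIERS):
    """Estimate the cost in USD of one request under tiered pricing.

    Assumption: the tier matched by the request's input token count sets the
    unit price for every input and output token of that request.
    """
    if input_tokens <= 0:
        raise ValueError("input_tokens must be positive")
    for upper_bound, input_price, output_price in tiers:
        if input_tokens <= upper_bound:
            return (input_tokens * input_price + output_tokens * output_price) / 1_000_000
    raise ValueError("input_tokens exceeds the largest tier")


# 50,000 input tokens fall in the 32K < Token <= 128K tier.
print(f"${estimate_tiered_cost(50_000, 2_000):.6f}")
```

Keeping the tiers in a list ordered by upper bound means another model's table can be passed in through the `tiers` argument without changing the function.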
Earlier qwen-coder series
Model | Input tokens per request | Input price (per 1M tokens) | Output price (per 1M tokens) |
qwen-coder-plus | No tiered pricing | $0.502 | $1.004 |
qwen-coder-plus-latest | No tiered pricing | $0.502 | $1.004 |
qwen-coder-plus-2024-11-06 | No tiered pricing | $0.502 | $1.004 |
qwen-coder-turbo | No tiered pricing | $0.287 | $0.861 |
qwen-coder-turbo-latest | No tiered pricing | $0.287 | $0.861 |
qwen-coder-turbo-2024-09-19 | No tiered pricing | $0.287 | $0.861 |
Qwen-MT
Billing is based on the number of input and output tokens.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Input price (per 1M tokens) | Output price (per 1M tokens) | Free quota (Note) |
qwen-mt-plus | $2.46 | $7.37 | 1 million tokens each |
qwen-mt-flash | $0.16 | $0.49 | |
qwen-mt-lite | $0.12 | $0.36 | |
qwen-mt-turbo | $0.16 | $0.49 |
Global
In the Global deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are dynamically scheduled worldwide.
Models in the Global deployment mode do not have a free quota.
Model | Input price (per 1M tokens) | Output price (per 1M tokens) |
qwen-mt-plus | $2.46 | $7.37 |
qwen-mt-flash | $0.16 | $0.49 |
qwen-mt-lite | $0.12 | $0.36 |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Models in Mainland China deployment mode do not have a free quota.
Model | Input price (per 1M tokens) | Output price (per 1M tokens) |
qwen-mt-plus | $0.259 | $0.775 |
qwen-mt-flash | $0.101 | $0.280 |
qwen-mt-lite | $0.086 | $0.229 |
qwen-mt-turbo | $0.101 | $0.280 |
Qwen data mining
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Billing is based on the number of input and output tokens.
Model | Input price (per 1M tokens) | Output price (per 1M tokens) | Free quota (Note) |
qwen-doc-turbo | $0.087 | $0.144 | No free quota |
Qwen deep research
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Billing is based on the number of input and output tokens.
Model | Input price (per 1M tokens) | Output price (per 1M tokens) | Free quota (Note) |
qwen-deep-research | $7.742 | $23.367 | No free quota |
Text generation - Qwen - Open source
Qwen3
Billing is based on the number of input and output tokens.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Mode | Input price (per 1M tokens) | Output price (per 1M tokens) | Free quota (Note) | |
Non-thinking mode | Thinking mode | ||||
qwen3-next-80b-a3b-thinking | Thinking only | $0.15 | - | $1.2 | 1 million tokens each |
qwen3-next-80b-a3b-instruct | Non-thinking only | $0.15 | $1.2 | - | |
qwen3-235b-a22b-thinking-2507 | Thinking only | $0.23 | - | $2.3 | |
qwen3-235b-a22b-instruct-2507 | Non-thinking only | $0.23 | $0.92 | - | |
qwen3-30b-a3b-thinking-2507 | Thinking only | $0.2 | - | $2.4 | |
qwen3-30b-a3b-instruct-2507 | Non-thinking only | $0.2 | $0.8 | - | |
qwen3-235b-a22b | Thinking and non-thinking | $0.7 | $2.8 | $8.4 | |
qwen3-32b | Thinking and non-thinking | $0.16 | $0.64 | $0.64 | |
qwen3-30b-a3b | Thinking and non-thinking | $0.2 | $0.8 | $2.4 | |
qwen3-14b | Thinking and non-thinking | $0.35 | $1.4 | $4.2 | |
qwen3-8b | Thinking and non-thinking | $0.18 | $0.7 | $2.1 | |
qwen3-4b | Thinking and non-thinking | $0.11 | $0.42 | $1.26 | |
qwen3-1.7b | Thinking and non-thinking | $0.11 | $0.42 | $1.26 | |
qwen3-0.6b | Thinking and non-thinking | $0.11 | $0.42 | $1.26 | |
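For models that support both modes, the output unit price depends on whether the request runs in thinking mode. The sketch below compares the two modes for qwen3-235b-a22b using the International prices from the table above. It assumes that in thinking mode the billed output is the chain of thought plus the response, and that the input price is the same in both modes. The function name and parameters are hypothetical.

```python
# Prices for qwen3-235b-a22b (International), copied from the table above,
# in USD per 1M tokens.
INPUT_PRICE = 0.7
OUTPUT_PRICE = {"non_thinking": 2.8, "thinking": 8.4}


def estimate_qwen3_cost(input_tokens, response_tokens, reasoning_tokens=0, mode="non_thinking"):
    """Estimate the cost in USD of one request.

    Assumption: in thinking mode, reasoning (CoT) tokens and response tokens
    are both billed at the thinking-mode output price.
    """
    if mode not in OUTPUT_PRICE:
        raise ValueError(f"unknown mode: {mode}")
    if mode == "non_thinking" and reasoning_tokens:
        raise ValueError("non-thinking requests produce no reasoning tokens")
    output_tokens = response_tokens + reasoning_tokens
    return (input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE[mode]) / 1_000_000


# Same prompt and response length, with and without 4,000 reasoning tokens.
plain = estimate_qwen3_cost(10_000, 1_000)
thinking = estimate_qwen3_cost(10_000, 1_000, reasoning_tokens=4_000, mode="thinking")
print(f"non-thinking: ${plain:.4f}, thinking: ${thinking:.4f}")
```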
Global
In the Global deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are dynamically scheduled worldwide.
Models in the Global deployment mode do not have a free quota.
Model | Mode | Input price (per 1M tokens) | Output price (per 1M tokens) | |
Non-thinking mode | Thinking mode (CoT + response) | |||
qwen3-next-80b-a3b-thinking | Thinking only | $0.15 | - | $1.2 |
qwen3-next-80b-a3b-instruct | Non-thinking only | $0.15 | $1.2 | - |
qwen3-235b-a22b-thinking-2507 | Thinking only | $0.23 | - | $2.3 |
qwen3-235b-a22b-instruct-2507 | Non-thinking only | $0.23 | $0.92 | - |
qwen3-30b-a3b-thinking-2507 | Thinking only | $0.2 | - | $2.4 |
qwen3-30b-a3b-instruct-2507 | Non-thinking only | $0.2 | $0.8 | - |
qwen3-235b-a22b | Thinking and non-thinking | $0.7 | $2.8 | $8.4 |
qwen3-32b | Thinking and non-thinking | $0.16 | $0.64 | $0.64 |
qwen3-30b-a3b | Thinking and non-thinking | $0.2 | $0.8 | $2.4 |
qwen3-14b | Thinking and non-thinking | $0.35 | $1.4 | $4.2 |
qwen3-8b | Thinking and non-thinking | $0.18 | $0.7 | $2.1 |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Models in Mainland China deployment mode do not have a free quota.
Model | Mode | Input price (per 1M tokens) | Output price (per 1M tokens) | |
Non-thinking mode | Thinking mode (CoT + response) | |||
qwen3-next-80b-a3b-thinking | Thinking only | $0.144 | - | $1.434 |
qwen3-next-80b-a3b-instruct | Non-thinking only | $0.144 | $0.574 | - |
qwen3-235b-a22b-thinking-2507 | Thinking only | $0.287 | - | $2.868 |
qwen3-235b-a22b-instruct-2507 | Non-thinking only | $0.287 | $1.147 | - |
qwen3-30b-a3b-thinking-2507 | Thinking only | $0.108 | - | $1.076 |
qwen3-30b-a3b-instruct-2507 | Non-thinking only | $0.108 | $0.431 | - |
qwen3-235b-a22b | Thinking and non-thinking | $0.287 | $1.147 | $2.868 |
qwen3-32b | Thinking and non-thinking | $0.287 | $1.147 | $2.868 |
qwen3-30b-a3b | Thinking and non-thinking | $0.108 | $0.431 | $1.076 |
qwen3-14b | Thinking and non-thinking | $0.144 | $0.574 | $1.434 |
qwen3-8b | Thinking and non-thinking | $0.072 | $0.287 | $0.717 |
qwen3-4b | Thinking and non-thinking | $0.044 | $0.173 | $0.431 |
qwen3-1.7b | Thinking and non-thinking | $0.044 | $0.173 | $0.431 |
qwen3-0.6b | Thinking and non-thinking | $0.044 | $0.173 | $0.431 |
QwQ - Open source
Billing is based on the number of input and output tokens.
Model | Input price (per 1M tokens) | Output price (per 1M tokens) | Free quota (Note) |
qwq-32b | $0.287 | $0.861 | No free quota |
QwQ-Preview
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Billing is based on the number of input and output tokens.
Model | Input price (per 1M tokens) | Output price (per 1M tokens) | Free quota (Note) |
qwq-32b-preview | $0.287 | $0.861 | No free quota |
Qwen2.5
Billing is based on the number of input and output tokens.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Input price (per 1M tokens) | Output price (per 1M tokens) | Free quota (Note) |
qwen2.5-14b-instruct-1m | $0.805 | $3.22 | 1 million tokens each |
qwen2.5-7b-instruct-1m | $0.368 | $1.47 | |
qwen2.5-72b-instruct | $1.4 | $5.6 | |
qwen2.5-32b-instruct | $0.7 | $2.8 | |
qwen2.5-14b-instruct | $0.35 | $1.4 | |
qwen2.5-7b-instruct | $0.175 | $0.7 |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Models in Mainland China deployment mode do not have a free quota.
Model | Input price (per 1M tokens) | Output price (per 1M tokens) |
qwen2.5-14b-instruct-1m | $0.144 | $0.431 |
qwen2.5-7b-instruct-1m | $0.072 | $0.144 |
qwen2.5-72b-instruct | $0.574 | $1.721 |
qwen2.5-32b-instruct | $0.287 | $0.861 |
qwen2.5-14b-instruct | $0.144 | $0.431 |
qwen2.5-7b-instruct | $0.072 | $0.144 |
qwen2.5-3b-instruct | $0.044 | $0.130 |
qwen2.5-1.5b-instruct | Limited time free | |
qwen2.5-0.5b-instruct | ||
QVQ
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Billing is based on the number of input and output tokens.
Model | Input price (per 1M tokens) | Output price (per 1M tokens) | Free quota (Note) |
qvq-72b-preview | $1.721 | $5.161 | No free quota |
Qwen-Omni
Billing rule: Charges are calculated per input and output token. For token calculation rules for different modalities, see Billing and rate limiting.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Input price (per 1M tokens) | Output price (per 1M tokens) | Free quota (Note) | ||||
Input: Text | Input: Audio | Input: Image/Video | Output: Text Plain text input | Output: Text Multimodal input | Output: Text+Audio Only audio is billed | ||
qwen2.5-omni-7b | $0.10 | $6.76 | $0.28 | $0.40 | $0.84 | $13.51 | 1 million tokens (regardless of modality) Validity: 90 days after activating Model Studio |
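Because each modality has its own unit price, a request to qwen2.5-omni-7b is billed as a sum over modalities. The sketch below shows one way to read the International table above. It assumes that "multimodal input" means the request contains any audio or image/video tokens, and that when audio is returned only the audio output tokens are billed. Token counts per modality are taken as given, and the function name is hypothetical.

```python
# Prices for qwen2.5-omni-7b (International), copied from the table above,
# in USD per 1M tokens.
OMNI_PRICES = {
    "in_text": 0.10,
    "in_audio": 6.76,
    "in_image_video": 0.28,
    "out_text_plain_input": 0.40,
    "out_text_multimodal_input": 0.84,
    "out_audio": 13.51,
}


def estimate_omni_cost(text_in=0, audio_in=0, image_video_in=0, text_out=0, audio_out=0):
    """Estimate the cost in USD of one request from per-modality token counts."""
    p = OMNI_PRICES
    cost = text_in * p["in_text"] + audio_in * p["in_audio"] + image_video_in * p["in_image_video"]
    if audio_out > 0:
        # Text + audio output: only the audio tokens are billed (per the table header).
        cost += audio_out * p["out_audio"]
    else:
        # Assumption: any audio or image/video input makes the request multimodal.
        multimodal = audio_in > 0 or image_video_in > 0
        key = "out_text_multimodal_input" if multimodal else "out_text_plain_input"
        cost += text_out * p[key]
    return cost / 1_000_000


# A short audio question answered in text only.
print(f"${estimate_omni_cost(text_in=1_000, audio_in=500, text_out=200):.6f}")
```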
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Models in Mainland China deployment mode do not have a free quota.
Model | Input price (per 1M tokens) | Output price (per 1M tokens) | ||||
Input: Text | Input: Audio | Input: Image/Video | Output: Text Plain text input | Output: Text Multimodal input | Output: Text+Audio Only audio is billed | |
qwen2.5-omni-7b | $0.087 | $5.448 | $0.287 | $0.345 | $0.861 | $10.895 |
Qwen3-Omni-Captioner
Billing is based on the number of input and output tokens.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Input price (per 1M tokens) | Output price (per 1M tokens) | Free quota (Note) |
qwen3-omni-30b-a3b-captioner | $3.81 | $3.06 | 1 million tokens |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Models in Mainland China deployment mode do not have a free quota.
Model | Input price (per 1M tokens) | Output price (per 1M tokens) |
qwen3-omni-30b-a3b-captioner | $2.265 | $1.821 |
Qwen-VL
Billing is based on the number of input and output tokens.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Mode | Input price (per 1M tokens) | Output price (per 1M tokens) CoT + response | Free quota (Note) |
qwen3-vl-235b-a22b-thinking | Thinking only | $0.4 | $4 | 1 million tokens each |
qwen3-vl-235b-a22b-instruct | Non-thinking only | $0.4 | $1.6 | |
qwen3-vl-32b-thinking | Thinking only | $0.16 | $0.64 | |
qwen3-vl-32b-instruct | Non-thinking only | $0.16 | $0.64 | |
qwen3-vl-30b-a3b-thinking | Thinking only | $0.2 | $2.4 | |
qwen3-vl-30b-a3b-instruct | Non-thinking only | $0.2 | $0.8 | |
qwen3-vl-8b-thinking | Thinking only | $0.18 | $2.1 | |
qwen3-vl-8b-instruct | Non-thinking only | $0.18 | $0.7 |
More models
Model | Input price (per 1M tokens) | Output price (per 1M tokens) | Free quota (Note) |
qwen2.5-vl-72b-instruct | $2.8 | $8.4 | 1 million tokens each |
qwen2.5-vl-32b-instruct | $1.4 | $4.2 | |
qwen2.5-vl-7b-instruct | $0.35 | $1.05 | |
qwen2.5-vl-3b-instruct | $0.21 | $0.63 |
Global
In the Global deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are dynamically scheduled worldwide.
Models in the Global deployment mode do not have a free quota.
Model | Mode | Input price (per 1M tokens) | Output price (per 1M tokens) CoT + response |
qwen3-vl-235b-a22b-thinking | Thinking only | $0.287 | $2.867 |
qwen3-vl-235b-a22b-instruct | Non-thinking only | $0.287 | $1.147 |
qwen3-vl-32b-thinking | Thinking only | $0.287 | $2.867 |
qwen3-vl-32b-instruct | Non-thinking only | $0.287 | $1.147 |
qwen3-vl-30b-a3b-thinking | Thinking only | $0.108 | $1.075 |
qwen3-vl-30b-a3b-instruct | Non-thinking only | $0.108 | $0.43 |
qwen3-vl-8b-thinking | Thinking only | $0.072 | $0.717 |
qwen3-vl-8b-instruct | Non-thinking only | $0.072 | $0.287 |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Models in Mainland China deployment mode do not have a free quota.
Model | Mode | Input price (per 1M tokens) | Output price (per 1M tokens) CoT + response |
qwen3-vl-235b-a22b-thinking | Thinking only | $0.287 | $2.868 |
qwen3-vl-235b-a22b-instruct | Non-thinking only | $0.287 | $1.147 |
qwen3-vl-32b-thinking | Thinking only | $0.287 | $2.868 |
qwen3-vl-32b-instruct | Non-thinking only | $0.287 | $1.147 |
qwen3-vl-30b-a3b-thinking | Thinking only | $0.108 | $1.076 |
qwen3-vl-30b-a3b-instruct | Non-thinking only | $0.108 | $0.431 |
qwen3-vl-8b-thinking | Thinking only | $0.072 | $0.717 |
qwen3-vl-8b-instruct | Non-thinking only | $0.072 | $0.287 |
More models
Model | Input price (per 1M tokens) | Output price (per 1M tokens) |
qwen2.5-vl-72b-instruct | $2.294 | $6.881 |
qwen2.5-vl-32b-instruct | $1.147 | $3.441 |
qwen2.5-vl-7b-instruct | $0.287 | $0.717 |
qwen2.5-vl-3b-instruct | $0.173 | $0.517 |
qwen2-vl-72b-instruct | $2.294 | $6.881 |
qwen2-vl-7b-instruct | Limited time free | |
qwen2-vl-2b-instruct | ||
Qwen-Math
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Billing is based on the number of input and output tokens.
Model | Input price (per 1M tokens) | Output price (per 1M tokens) | Free quota (Note) |
qwen2.5-math-72b-instruct | $0.574 | $1.721 | No free quota |
qwen2.5-math-7b-instruct | $0.144 | $0.287 | |
qwen2.5-math-1.5b-instruct | Limited time free | ||
Qwen-Coder
Billing is based on the number of input and output tokens.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Input tokens per request | Input price (per 1M tokens) | Output price (per 1M tokens) | Free quota (Note) |
qwen3-coder-480b-a35b-instruct | 0<Token≤32K | $1.5 | $7.5 | 1 million tokens each |
32K<Token≤128K | $2.7 | $13.5 | ||
128K<Token≤200K | $4.5 | $22.5 | ||
qwen3-coder-30b-a3b-instruct | 0<Token≤32K | $0.45 | $2.25 | |
32K<Token≤128K | $0.75 | $3.75 | ||
128K<Token≤200K | $1.2 | $6 |
Global
In the Global deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are dynamically scheduled worldwide.
Models in the Global deployment mode do not have a free quota.
Model | Input tokens per request | Input price (per 1M tokens) | Output price (per 1M tokens) |
qwen3-coder-480b-a35b-instruct | 0<Token≤32K | $1.5 | $7.5 |
32K<Token≤128K | $2.7 | $13.5 | |
128K<Token≤200K | $4.5 | $22.5 | |
qwen3-coder-30b-a3b-instruct | 0<Token≤32K | $0.45 | $2.25 |
32K<Token≤128K | $0.75 | $3.75 | |
128K<Token≤200K | $1.2 | $6 |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Models in Mainland China deployment mode do not have a free quota.
Model | Input tokens per request | Input price (per 1M tokens) | Output price (per 1M tokens) |
qwen3-coder-480b-a35b-instruct | 0<Token≤32K | $0.861 | $3.441 |
32K<Token≤128K | $1.291 | $5.161 | |
128K<Token≤200K | $2.151 | $8.602 | |
qwen3-coder-30b-a3b-instruct | 0<Token≤32K | $0.216 | $0.861 |
32K<Token≤128K | $0.323 | $1.291 | |
128K<Token≤200K | $0.538 | $2.151 | |
qwen2.5-coder-32b-instruct | No tiered pricing | $0.287 | $0.861 |
qwen2.5-coder-14b-instruct | No tiered pricing | $0.287 | $0.861 |
qwen2.5-coder-7b-instruct | No tiered pricing | $0.144 | $0.287 |
qwen2.5-coder-3b-instruct | No tiered pricing | Limited time free | |
qwen2.5-coder-1.5b-instruct | No tiered pricing | ||
qwen2.5-coder-0.5b-instruct | No tiered pricing | ||
Text generation - Third party
DeepSeek
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Billing is based on the number of input and output tokens.
Model | Input price (per 1M tokens) | Output price (per 1M tokens) | Free quota (Note) |
deepseek-v3.2 | $0.287 | $0.431 | No free quota |
deepseek-v3.2-exp | $0.287 | $0.431 | |
deepseek-v3.1 | $0.574 | $1.721 | |
deepseek-r1 | $0.574 | $2.294 | |
deepseek-r1-0528 | $0.574 | $2.294 | |
deepseek-v3 | $0.287 | $1.147 | |
deepseek-r1-distill-qwen-1.5b | Limited time free | ||
deepseek-r1-distill-qwen-7b | $0.072 | $0.144 | No free quota |
deepseek-r1-distill-qwen-14b | $0.144 | $0.431 | |
deepseek-r1-distill-qwen-32b | $0.287 | $0.861 | |
deepseek-r1-distill-llama-8b | Limited time free | ||
deepseek-r1-distill-llama-70b | Limited time free | ||
Kimi
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Billing is based on the number of input and output tokens.
Model | Input price (per 1M tokens) | Output price (per 1M tokens) | Free quota (Note) |
kimi-k2.5 | $0.574 | $3.011 | |
kimi-k2-thinking | $0.574 | $2.294 | No free quota |
Moonshot-Kimi-K2-Instruct | $0.574 | $2.294 |
GLM
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Billing is based on the number of input and output tokens.
Model | Mode | Input tokens per request | Input price (per 1M tokens) | Output price (per 1M tokens) CoT and response |
glm-4.7 | Thinking and non-thinking | 0<Token≤32K | $0.431 | $2.007 |
32K<Token≤166K | $0.574 | $2.294 | ||
glm-4.6 | Thinking and non-thinking | 0<Token≤32K | $0.431 | $2.007 |
32K<Token≤166K | $0.574 | $2.294 |
Image generation
Inputs are not billed. Billing is based on the number of successfully generated images in the output.
Billing formula: Fee = Unit price per image × Number of successfully generated images.
Billing details:
The fee is not affected by the resolution or aspect ratio of the output images.
Failed requests do not incur fees or consume the free quota.
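The formula above can be sketched as a small helper. This is an illustration only: the function name is hypothetical, and the example unit price of $0.075 is the International price listed for qwen-image-max in the Qwen-Image table in this document. Free quota is not modeled.

```python
def image_generation_fee(unit_price_per_image, results):
    """Return the fee in USD for a set of image generation attempts.

    `results` is an iterable of booleans, one per requested image, where True
    means the image was generated successfully. Failed images are not billed,
    and resolution and aspect ratio do not affect the fee.
    """
    successful = sum(1 for ok in results if ok)
    return unit_price_per_image * successful


# Four images requested, one of which failed: three images are billed.
fee = image_generation_fee(0.075, [True, True, False, True])
print(f"${fee:.3f}")
```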
Qwen-Image
Only output is billed. For rules, see Image generation.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Output price (per image) | Free quota (Note) |
qwen-image-max | $0.075 | 100 images each |
qwen-image-max-2025-12-30 | $0.075 | |
qwen-image-plus | $0.03 | |
qwen-image-plus-2026-01-09 | $0.03 | |
qwen-image | $0.035 |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Models in Mainland China deployment mode do not have a free quota.
Model | Output price (per image) |
qwen-image-max | $0.071677 |
qwen-image-max-2025-12-30 | $0.071677 |
qwen-image-plus | $0.028671 |
qwen-image-plus-2026-01-09 | $0.028671 |
qwen-image | $0.035 |
Qwen-Image-Edit
Only output is billed. For rules, see Image generation.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Output price (per image) | Free quota (Note) |
qwen-image-edit-max | $0.075 | 100 images each |
qwen-image-edit-max-2026-01-16 | $0.075 | |
qwen-image-edit-plus | $0.03 | |
qwen-image-edit-plus-2025-12-15 | $0.03 | |
qwen-image-edit-plus-2025-10-30 | $0.03 | |
qwen-image-edit | $0.045 |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Models in Mainland China deployment mode do not have a free quota.
Model | Output price (per image) |
qwen-image-edit-max | $0.071677 |
qwen-image-edit-max-2026-01-16 | $0.071677 |
qwen-image-edit-plus | $0.028671 |
qwen-image-edit-plus-2025-12-15 | $0.028671 |
qwen-image-edit-plus-2025-10-30 | $0.028671 |
qwen-image-edit | $0.043 |
Qwen-MT-Image
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Only output is billed. For rules, see Image generation.
Model | Output price | Free quota (Note) |
qwen-mt-image | $0.000431/image | No free quota |
Tongyi - text-to-image - Z-Image
Only output is billed. For rules, see Image generation.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Output price | Free quota (Note) |
z-image-turbo | Prompt rewriting disabled ( Prompt rewriting enabled ( | 100 images Validity: 90 days after activating Model Studio |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Models in Mainland China deployment mode do not have a free quota.
Model | Output price |
z-image-turbo | Prompt rewriting disabled ( Prompt rewriting enabled ( |
Wan text-to-image
Only output is billed. For rules, see Image generation.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Output price | Free quota (Note) |
wan2.6-t2i | $0.03/image | 50 images |
wan2.5-t2i-preview | $0.03/image | 50 images |
wan2.2-t2i-plus | $0.05/image | 100 images |
wan2.2-t2i-flash | $0.025/image | 100 images |
wan2.1-t2i-plus | $0.05/image | 200 images |
wan2.1-t2i-turbo | $0.025/image | 200 images |
Global
In the Global deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are dynamically scheduled worldwide.
Models in the Global deployment mode do not have a free quota.
Model | Output price |
wan2.6-t2i | $0.03/image |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Models in Mainland China deployment mode do not have a free quota.
Model | Output price |
wan2.6-t2i | $0.028671/image |
wan2.5-t2i-preview | $0.028671/image |
wan2.2-t2i-plus | $0.020070/image |
wan2.2-t2i-flash | $0.028671/image |
wanx2.1-t2i-plus | $0.028671/image |
wanx2.1-t2i-turbo | $0.020070/image |
wanx2.0-t2i-turbo | $0.005735/image |
Wan image generation and editing
Only output is billed. For rules, see Image generation.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Output price | Free quota (Note) |
wan2.6-image | $0.03/image | 50 images |
Global
Global (Virginia) models do not offer a free quota.
Model | Output price |
wan2.6-image | $0.03/image |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Models in Mainland China deployment mode do not have a free quota.
Model | Output price |
wan2.6-image | $0.028671/image |
Wan general image editing
Only output is billed. For rules, see Image generation.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
Service | Model | Output price | Free quota (Note) |
General image editing 2.5 | wan2.5-i2i-preview | $0.03/image | 50 images |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Models in Mainland China deployment mode do not have a free quota.
Service | Model | Output price |
General image editing 2.5 | wan2.5-i2i-preview | $0.028671/image |
General image editing 2.1 | wanx2.1-imageedit | $0.020070/image |
OutfitAnyone
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
aitryon-plus: Charges apply to output only. For billing rules, see Image generation.
aitryon-parsing-v1: Charges apply to input only. Billing is based on the number of input images. Failed requests are not billed.
Service | Model | Price | Free quota (Note) |
OutfitAnyone - Plus | aitryon-plus | $0.071677/image | No free quota |
OutfitAnyone - Image parsing | aitryon-parsing-v1 | $0.000574/image |
Video generation
Inputs are not billed. Billing is based on the duration in seconds of successfully generated videos in the output.
Billing formula: Fee = Unit price per second × Duration of successfully generated video (in seconds).
Billing details:
For some models, the price is based on the output video resolution. The prices for different resolutions, such as 480P, 720P, and 1080P, vary.
For some models, the price is based on the output video mode. The prices for different video modes, such as Standard Edition and Professional Edition, vary.
For some models, the price is based on the output video aspect ratio. The prices for different video aspect ratios, such as 1:1 and 3:4, vary.
Some models use uniform pricing, regardless of resolution, mode, or aspect ratio.
Failed requests do not incur fees or consume the free quota.
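As a rough illustration of duration-based billing with resolution-dependent prices, the sketch below uses the International per-second prices listed for wan2.6-t2v in the text-to-video table in this document. The function name and the `succeeded` flag are hypothetical, and free quota is not modeled.

```python
# Per-second prices for wan2.6-t2v (International), taken from the
# Wan text-to-video table in this document, in USD.
WAN26_T2V_INTL_PRICES = {"720P": 0.10, "1080P": 0.15}


def video_generation_fee(resolution, duration_seconds, succeeded=True, prices=WAN26_T2V_INTL_PRICES):
    """Return the fee in USD for one video generation request.

    Only the output is billed: per-second unit price for the output
    resolution times the duration of the generated video. Failed
    requests cost nothing.
    """
    if not succeeded:
        return 0.0
    if resolution not in prices:
        raise ValueError(f"no price listed for resolution {resolution}")
    return prices[resolution] * duration_seconds


# A 5-second 1080P clip.
print(f"${video_generation_fee('1080P', 5):.2f}")
```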
Wan - text-to-video
Only output is billed. For rules, see Video generation.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Output video resolution | Output price | Free quota (Note) Validity: 90 days after activating Model Studio |
wan2.6-t2v | 720P | $0.10/second | 50 seconds |
1080P | $0.15/second | ||
wan2.5-t2v-preview | 480P | $0.05/second | 50 seconds |
720P | $0.10/second | ||
1080P | $0.15/second | ||
wan2.2-t2v-plus | 480P | $0.02/second | 50 seconds |
1080P | $0.10/second | ||
wan2.1-t2v-turbo | 480P | $0.036/second | 200 seconds |
720P | $0.036/second | ||
wan2.1-t2v-plus | 720P | $0.10/second | 200 seconds |
Global
In the Global deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are dynamically scheduled worldwide.
Models in the Global deployment mode do not have a free quota.
Model | Output video resolution | Output price |
wan2.6-t2v | 720P | $0.1/second |
1080P | $0.15/second |
US
In the US deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are limited to the United States.
Models in the US deployment mode do not have a free quota.
Model | Output video resolution | Output price |
wan2.6-t2v-us | 720P | $0.1/second |
1080P | $0.15/second |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Models in Mainland China deployment mode do not have a free quota.
Model | Output video resolution | Output price |
wan2.6-t2v | 720P | $0.086012/second |
1080P | $0.143353/second | |
wan2.5-t2v-preview | 480P | $0.043006/second |
720P | $0.086012/second | |
1080P | $0.143353/second | |
wan2.2-t2v-plus | 480P | $0.02007/second |
1080P | $0.100347/second | |
wanx2.1-t2v-turbo | 480P | $0.034405/second |
720P | $0.034405/second | |
wanx2.1-t2v-plus | 720P | $0.100347/second |
Wan - image-to-video - first frame
Only output is billed. For rules, see Video generation.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Output video type | Output video resolution | Output price | Free quota (Note) Validity: 90 days after activating Model Studio |
wan2.6-i2v-flash | Video with audio | 720P | $0.05/second | 50 seconds |
1080P | $0.075/second | |||
Video without audio | 720P | $0.025/second | ||
1080P | $0.0375/second | |||
wan2.6-i2v | Video with audio | 720P | $0.10/second | 50 seconds |
1080P | $0.15/second | |||
wan2.5-i2v-preview | Video with audio | 480P | $0.05/second | 50 seconds |
720P | $0.10/second | |||
1080P | $0.15/second | |||
wan2.2-i2v-flash | Video without audio | 480P | $0.015/second | 50 seconds |
720P | $0.036/second | |||
wan2.2-i2v-plus | Video without audio | 480P | $0.02/second | 50 seconds |
1080P | $0.10/second | |||
wan2.1-t2v-turbo | Video without audio | 480P | $0.036/second | 200 seconds |
720P | $0.036/second | |||
wan2.1-t2v-plus | Video without audio | 720P | $0.10/second | 200 seconds |
Global
In the Global deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are dynamically scheduled worldwide.
Models in the Global deployment mode do not have a free quota.
Model | Output video resolution | Output price |
wan2.6-i2v | 720P | $0.1/second |
1080P | $0.15/second |
US
In the US deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are limited to the United States.
Models in the US deployment mode do not have a free quota.
Model | Output video resolution | Output price |
wan2.6-i2v-us | 720P | $0.1/second |
1080P | $0.15/second |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Models in Mainland China deployment mode do not have a free quota.
Model | Output video type | Output video resolution | Output price |
wan2.6-i2v-flash | Video with audio | 720P | $0.043006/second |
1080P | $0.071676/second | ||
Video without audio | 720P | $0.021503/second | |
1080P | $0.035838/second | ||
wan2.6-i2v | Video with audio | 720P | $0.086012/second |
1080P | $0.143353/second | ||
wan2.5-i2v-preview | Video with audio | 480P | $0.043006/second |
720P | $0.086012/second | ||
1080P | $0.143353/second | ||
wan2.2-i2v-plus | Video without audio | 480P | $0.02007/second |
1080P | $0.100347/second | ||
wanx2.1-t2v-turbo | Video without audio | 480P | $0.034405/second |
720P | $0.034405/second | ||
wanx2.1-t2v-plus | Video without audio | 720P | $0.100347/second |
Wan - image-to-video - first and last frames
Only output is billed. For rules, see Video generation.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Output video resolution | Output price | Free quota (Note) Validity: 90 days after activating Model Studio |
wan2.2-kf2v-flash | 480P | $0.015/second | 50 seconds |
720P | $0.036/second | ||
1080P | $0.07/second | ||
wan2.1-kf2v-plus | 720P | $0.10/second | 200 seconds |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Models in Mainland China deployment mode do not have a free quota.
Model | Output video resolution | Output price |
wan2.2-kf2v-flash | 480P | $0.014335/second |
720P | $0.028671/second | |
1080P | $0.068809/second | |
wanx2.1-kf2v-plus | 720P | $0.100347/second |
Wan - reference-to-video
Billing rule: Both input and output videos are billed per second of video duration. Failed generations are not billed and do not consume the free quota.
Formula: Billable duration = input video duration (up to 5 seconds) + output video duration.
The input video is billed for no more than 5 seconds. For specific rules, see Billing and rate limits.
The output video is billed based on seconds of successfully generated video.
Pricing description: The unit price is determined by the resolution tier and the audio option, regardless of the input video's resolution or audio.
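The formula above can be sketched in a few lines of Python. This is a minimal illustration, not an official calculator: the function names are hypothetical, durations are assumed to be whole seconds, and the unit price is whatever the tables below list for the chosen model, resolution, and audio option.

```python
from decimal import Decimal

# The 5-second cap on billed input duration comes from the formula above.
INPUT_CAP_SECONDS = 5

def r2v_billable_seconds(input_seconds: int, output_seconds: int) -> int:
    """Billable duration = min(input duration, 5) + output duration."""
    return min(input_seconds, INPUT_CAP_SECONDS) + output_seconds

def r2v_cost(input_seconds: int, output_seconds: int, unit_price: str) -> Decimal:
    """Cost in USD for one successful generation at a per-second unit price."""
    return r2v_billable_seconds(input_seconds, output_seconds) * Decimal(unit_price)

# An 8-second reference video and a 10-second output at $0.10/second:
# 5 (capped input) + 10 (output) = 15 billable seconds.
print(r2v_cost(8, 10, "0.10"))  # 1.50
```

Prices are passed as strings and handled with Decimal so that small per-second amounts are not distorted by binary floating-point rounding.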
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Output specification | Output resolution | Input & output price | Free quota (Note) Validity: 90 days after activating Model Studio |
wan2.6-r2v-flash | Video with audio | 720P | $0.05/second | 50 seconds |
1080P | $0.075/second | |||
Video without audio | 720P | $0.025/second | ||
1080P | $0.0375/second | |||
wan2.6-r2v | Video with audio | 720P | $0.10/second | 50 seconds |
1080P | $0.15/second |
Global
In the Global deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are dynamically scheduled worldwide.
Models in the Global deployment mode do not have a free quota.
Model | Output specification | Output resolution | Input & output price |
wan2.6-r2v | Video with audio | 720P | $0.1/second |
1080P | $0.15/second |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Models in Mainland China deployment mode do not have a free quota.
Model | Output specification | Output resolution | Input & output price |
wan2.6-r2v-flash | Video with audio | 720P | $0.043006/second |
1080P | $0.071676/second | ||
Video without audio | 720P | $0.021503/second | |
1080P | $0.035838/second | ||
wan2.6-r2v | Video with audio | 720P | $0.086012/second |
1080P | $0.143353/second |
Wan - general video editing
Only output is billed. For rules, see Video generation.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Output video resolution | Output price | Free quota (Note) |
wan2.1-vace-plus | 720P | $0.10/second | 50 seconds Validity: 90 days after activating Model Studio |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Models in Mainland China deployment mode do not have a free quota.
Model | Output video resolution | Output price |
wanx2.1-vace-plus | 720P | $0.100347/second |
Wan - digital human
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
wan2.2-s2v-detect: Only input is billed, based on the number of detected images. Each input image is charged once, regardless of whether detection succeeds.
wan2.2-s2v: Only output is billed, based on the duration in seconds of successfully generated video. For billing rules, see Video generation.
Service | Model | Price | Free quota (Note) |
Image detection | wan2.2-s2v-detect | Input image: $0.000574/image | No free quota |
Video generation | wan2.2-s2v | Output video:
|
Wan - image-to-action
Only output is billed. For rules, see Video generation.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Output video mode | Output price | Free quota (Note) |
wan2.2-animate-move | Standard mode | $0.12/second | 50 seconds Validity: 90 days after activating Model Studio |
Professional mode | $0.18/second |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Models in Mainland China deployment mode do not have a free quota.
Model | Output video mode | Output price |
wan2.2-animate-move | Standard mode | $0.06/second |
Professional mode | $0.09/second |
Wan - Video character swap
Only output is billed. For rules, see Video generation.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Output video mode | Output price | Free quota (Note) |
wan2.2-animate-mix | Standard mode | $0.18/second | 50 seconds Validity: 90 days after activating Model Studio |
Professional mode | $0.26/second |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Models in Mainland China deployment mode do not have a free quota.
Model | Output video mode | Output price |
wan2.2-animate-mix | Standard mode | $0.09/second |
Professional mode | $0.13/second |
AnimateAnyone
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
animate-anyone-detect-gen2: Only input is billed, based on the number of detected images. Each input image is charged once, regardless of whether detection succeeds.
animate-anyone-template-gen2: Only output is billed, based on the duration in seconds of successfully generated video. For billing rules, see Video generation.
animate-anyone-gen2: Only output is billed, based on the duration in seconds of successfully generated video. For billing rules, see Video generation.
Service | Model | Price | Free quota (Note) |
Image detection | animate-anyone-detect-gen2 | Input image: $0.000574/image | No free quota |
Action template generation | animate-anyone-template-gen2 | Output video: $0.011469/second | |
Video generation | animate-anyone-gen2 | Output video: $0.011469/second |
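As a rough illustration of how the three AnimateAnyone charges combine, the sketch below adds the per-image detection fee to the per-second fees for template and video output. The function name and the idea of pricing one end-to-end job are assumptions made for the example; the prices are copied from the table above, and durations are assumed to be whole seconds of successfully generated video.

```python
from decimal import Decimal

# Prices from the AnimateAnyone table above (Mainland China deployment mode).
DETECT_PRICE_PER_IMAGE = Decimal("0.000574")  # animate-anyone-detect-gen2
VIDEO_PRICE_PER_SECOND = Decimal("0.011469")  # animate-anyone-template-gen2 and animate-anyone-gen2

def animate_anyone_cost(images_submitted: int, template_seconds: int, video_seconds: int) -> Decimal:
    """Hypothetical helper: total USD cost of detection plus generated output."""
    # Every submitted image is charged once, even if detection fails.
    detect = images_submitted * DETECT_PRICE_PER_IMAGE
    # Template and final video share the same per-second output price.
    video = (template_seconds + video_seconds) * VIDEO_PRICE_PER_SECOND
    return detect + video

# Two detection calls, a 10-second template, and a 10-second video.
print(animate_anyone_cost(2, 10, 10))  # 0.230528
```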
EMO
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
emo-detect-v1: Only input is billed, based on the number of detected images. Each input image is charged once, regardless of whether detection succeeds.
emo-v1: Only output is billed, based on the duration in seconds of successfully generated video. For billing rules, see Video generation.
Service | Model | Price | Free quota (Note) |
Image detection | emo-detect-v1 | Input image: $0.000574/image | No free quota |
Video generation | emo-v1 | Output video:
|
LivePortrait
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
liveportrait-detect: Only input is billed, based on the number of detected images. Each input image is charged once, regardless of whether detection succeeds.
liveportrait: Only output is billed, based on the duration in seconds of successfully generated video. For billing rules, see Video generation.
Service | Model | Price | Free quota (Note) |
Image detection | liveportrait-detect | Input image: $0.000574/image | No free quota |
Video generation | liveportrait | Output video: $0.002868/second |
Emoji
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
emoji-detect-v1: Only input is billed, based on the number of detected images. Each input image is charged once, regardless of whether detection succeeds.
emoji-v1: Only output is billed, based on the duration in seconds of successfully generated video. For billing rules, see Video generation.
Service | Model | Price | Free quota (Note) |
Image detection | emoji-detect-v1 | Input image: $0.000574/image | No free quota |
Video generation | emoji-v1 | Output video: $0.011469/second |
VideoRetalk
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Only output is billed. For rules, see Video generation.
Model | Output price | Free quota (Note) |
videoretalk | $0.011469/second | No free quota |
Video style transform
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Only output is billed. For rules, see Video generation.
Model | Output video resolution | Output price | Free quota (Note) |
video-style-transform | 540P | $0.028671/second | No free quota |
720P | $0.071677/second |
Speech synthesis (text-to-speech)
Qwen-TTS
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
qwen3-tts series
Billing rule: Charges apply per input text character. Output is not billed.
Model | Input price | Free quota (Note) |
qwen3-tts-flash | $0.1/10,000 characters | If Model Studio activated before 00:00 on November 13, 2025: 2000 characters If Model Studio activated after 00:00 on November 13, 2025: 10,000 characters Validity: 90 days after activating Model Studio |
qwen3-tts-flash-2025-11-27 | $0.1/10,000 characters | 10,000 characters Validity: 90 days after activating Model Studio |
qwen3-tts-flash-2025-09-18 | $0.1/10,000 characters | If Model Studio activated before 00:00 on November 13, 2025: 2000 characters If Model Studio activated after 00:00 on November 13, 2025: 10,000 characters Validity: 90 days after activating Model Studio |
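For per-character billing, the charge scales with the number of input characters divided by 10,000. The sketch below is illustrative only: the function name is hypothetical, it takes an already-counted number of billable characters (how characters are counted is not specified here), and it ignores any remaining free quota.

```python
from decimal import Decimal

def tts_character_cost(billable_chars: int, price_per_10k: str) -> Decimal:
    """USD cost for input text billed at a price per 10,000 characters."""
    return Decimal(billable_chars) * Decimal(price_per_10k) / Decimal(10_000)

# qwen3-tts-flash (International) at $0.1 per 10,000 characters.
print(tts_character_cost(25_000, "0.1"))  # 0.25
print(tts_character_cost(800, "0.1"))     # 0.008
```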
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Models in Mainland China deployment mode do not have a free quota.
qwen3-tts series
Billing rule: Charges apply per input text character. Output is not billed.
Model | Input price (10,000 characters) | Output price (10,000 characters) |
qwen3-tts-flash | $0.114682 | Not charged |
qwen3-tts-flash-2025-11-27 | $0.114682 | Not charged |
qwen3-tts-flash-2025-09-18 | $0.114682 | Not charged |
qwen-tts series
Billing rule: Charges are calculated per input and output token.
Model | Input price (per 1M tokens) | Output price (per 1M tokens) |
qwen-tts-flash | $0.23 | $1.434 |
qwen-tts-latest | $0.23 | $1.434 |
qwen-tts-2025-05-22 | $0.23 | $1.434 |
qwen-tts-2025-04-10 | $0.23 | $1.434 |
Qwen-TTS-Realtime
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
qwen3-tts-vd realtime series
Billing rule: Charges apply per input text character. Output is not billed.
Model | Input price | Free quota (Note) |
qwen3-tts-vd-realtime-2025-12-16 | $0.143353/10,000 characters | 10,000 characters Validity: 90 days after activating Model Studio |
qwen3-tts-vc realtime series
Billing rule: Charges apply per input text character. Output is not billed.
Model | Input price | Free quota (Note) |
qwen3-tts-vc-realtime-2026-01-15 | $0.13/10,000 characters | 10,000 characters Validity: 90 days after activating Model Studio |
qwen3-tts-vc-realtime-2025-11-27 |
qwen3-tts realtime series
Billing rule: Charges apply per input text character. Output is not billed.
Model | Input price | Free quota (Note) |
qwen3-tts-flash-realtime | $0.13/10,000 characters | If Model Studio activated before 00:00 on November 13, 2025: 2000 characters If Model Studio activated after 00:00 on November 13, 2025: 10,000 characters Validity: 90 days after activating Model Studio |
qwen3-tts-flash-realtime-2025-11-27 | $0.13/10,000 characters | 10,000 characters Validity: 90 days after activating Model Studio |
qwen3-tts-flash-realtime-2025-09-18 | $0.13/10,000 characters | If Model Studio activated before 00:00 on November 13, 2025: 2000 characters If Model Studio activated after 00:00 on November 13, 2025: 10,000 characters Validity: 90 days after activating Model Studio |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Models in Mainland China deployment mode do not have a free quota.
qwen3-tts-vd realtime series
Billing rule: Charges apply per input text character. Output is not billed.
Model | Input price (10,000 characters) | Output price |
qwen3-tts-vd-realtime-2025-12-16 | $0.143353 | Not charged |
qwen3-tts-vc realtime series
Billing rule: Charges apply per input text character. Output is not billed.
Model | Input price (10,000 characters) | Output price |
qwen3-tts-vc-realtime-2026-01-15 | $0.143353 | Not charged |
qwen3-tts-vc-realtime-2025-11-27 |
qwen3-tts realtime series
Billing rule: Charges apply per input text character. Output is not billed.
Model | Input price (10,000 characters) | Output price |
qwen3-tts-flash-realtime | $0.143353 | Not charged |
qwen3-tts-flash-realtime-2025-11-27 | $0.143353 | Not charged |
qwen3-tts-flash-realtime-2025-09-18 | $0.143353 | Not charged |
qwen-tts realtime series
Billing rule: Charges are calculated per input and output token.
Model | Input price (per 1M tokens) | Output price (per 1M tokens) |
qwen-tts-realtime | $0.345 | $1.721 |
qwen-tts-realtime-latest | $0.345 | $1.721 |
qwen-tts-realtime-2025-07-15 | $0.345 | $1.721 |
Qwen-TTS voice cloning
Billing rule: Charges are based on the number of new voices created.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Price (per voice) | Free quota (Note) |
qwen-voice-enrollment | $0.01 | 1000 voices/account |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Models in Mainland China deployment mode do not have a free quota.
Model | Price (per voice) |
qwen-voice-enrollment | $0.01 |
Qwen-TTS voice design
Billing rule: Charges are based on the number of new voices created.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Price (per voice) | Free quota (Note) |
qwen-voice-design | $0.2 | 10 voices/account |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Models in Mainland China deployment mode do not have a free quota.
Model | Price (per voice) |
qwen-voice-design | $0.2 |
CosyVoice
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Billing rule: Charges apply per input text character. Output is not billed.
Model | Input price | Free quota (Note) |
cosyvoice-v3-plus | $0.286706/10,000 characters | No free quota |
cosyvoice-v3-flash | $0.14335/10,000 characters | |
cosyvoice-v2 | $0.286706/10,000 characters |
Speech recognition (speech-to-text) and translation (speech-to-translation)
Qwen3-LiveTranslate-Flash-Realtime
Billing rule: Charges are calculated per input and output token. For token calculation rules for different modalities, see Billing.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Input price (per 1M tokens) | Output price (per 1M tokens) | Free quota (Note) | ||
Input: Audio | Input: Image | Output: Text | Output: Audio | ||
qwen3-livetranslate-flash-realtime | $10 | $1.3 | $10 | $38 | 1 million tokens each |
qwen3-livetranslate-flash-realtime-2025-09-22 | $10 | $1.3 | $10 | $38 | |
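Because each modality has its own rate, a session's cost is the sum of four token counts, each multiplied by its per-million price. The sketch below assumes the token counts per modality are already known (for example, from a usage report); the dictionary keys and the function name are hypothetical labels chosen for this example, and the prices are the International rates from the table above.

```python
from decimal import Decimal

# International prices per 1M tokens for qwen3-livetranslate-flash-realtime.
PRICES_PER_M = {
    "input_audio": Decimal("10"),
    "input_image": Decimal("1.3"),
    "output_text": Decimal("10"),
    "output_audio": Decimal("38"),
}

def livetranslate_cost(usage: dict[str, int]) -> Decimal:
    """Sum tokens * price / 1M over every modality present in `usage`."""
    return sum(
        (Decimal(tokens) * PRICES_PER_M[kind] / Decimal(1_000_000)
         for kind, tokens in usage.items()),
        Decimal(0),
    )

usage = {"input_audio": 30_000, "input_image": 5_000,
         "output_text": 2_000, "output_audio": 20_000}
# 0.3 + 0.0065 + 0.02 + 0.76
print(livetranslate_cost(usage))  # 1.0865
```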
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Models in Mainland China deployment mode do not have a free quota.
Model | Input price (per 1M tokens) | Output price (per 1M tokens) | ||
Input: Audio | Input: Image | Output: Text | Output: Audio | |
qwen3-livetranslate-flash-realtime | $9.175 | $1.147 | $9.175 | $34.405 |
qwen3-livetranslate-flash-realtime-2025-09-22 | $9.175 | $1.147 | $9.175 | $34.405 |
Qwen-ASR
Billing rule: Charges apply per second of input audio duration. Output is not billed.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Input price | Free quota (Note) |
qwen3-asr-flash-filetrans | $0.000035/second | 36,000 seconds (10 hours) |
qwen3-asr-flash-filetrans-2025-11-17 | ||
qwen3-asr-flash | ||
qwen3-asr-flash-2025-09-08 |
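A per-second audio charge with a free quota might be estimated as follows. This is a sketch under stated assumptions: the function name is hypothetical, audio duration is given in whole seconds, and the remaining free quota is assumed to be consumed first and to still be within its validity period.

```python
from decimal import Decimal

def asr_cost(audio_seconds: int, price_per_second: str, free_seconds_left: int = 0) -> Decimal:
    """USD cost of input audio after subtracting any remaining free seconds."""
    charged_seconds = max(audio_seconds - free_seconds_left, 0)
    return charged_seconds * Decimal(price_per_second)

# 20 hours of audio on qwen3-asr-flash (International, $0.000035/second)
# with the full 36,000-second free quota still available.
print(asr_cost(72_000, "0.000035", 36_000))  # 1.260000
```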
US
In the US deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are limited to the United States.
Models in the US deployment mode do not have a free quota.
Model | Input price |
qwen3-asr-flash-us | $0.000035/second |
qwen3-asr-flash-2025-09-08-us |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Models in Mainland China deployment mode do not have a free quota.
Model | Input price |
qwen3-asr-flash-filetrans | $0.000032/second |
qwen3-asr-flash-filetrans-2025-11-17 | |
qwen3-asr-flash | |
qwen3-asr-flash-2025-09-08 |
Qwen-ASR-Realtime
Billing rule: Charges apply per second of input audio duration. Output is not billed.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Input price | Free quota (Note) |
qwen3-asr-flash-realtime | $0.000090/second | 36,000 seconds (10 hours) |
qwen3-asr-flash-realtime-2025-10-27 | $0.000090/second |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Models in Mainland China deployment mode do not have a free quota.
Model | Input price |
qwen3-asr-flash-realtime | $0.000047/second |
qwen3-asr-flash-realtime-2025-10-27 |
Fun-ASR
Audio file recognition
Billing rule: Charges apply per second of input audio duration. Output is not billed.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Input price | Free quota (Note) |
fun-asr | $0.000035/second | 36,000 seconds (10 hours) |
fun-asr-2025-11-07 | ||
fun-asr-2025-08-25 | ||
fun-asr-mtl | ||
fun-asr-mtl-2025-08-25 |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Models in Mainland China deployment mode do not have a free quota.
Model | Input price |
fun-asr | $0.000032/second |
fun-asr-2025-11-07 | |
fun-asr-2025-08-25 | |
fun-asr-mtl | |
fun-asr-mtl-2025-08-25 |
Real-time speech recognition
Billing rule: Charges apply per second of input audio duration. Output is not billed.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Input price | Free quota (Note) |
fun-asr-realtime | $0.00009/second | 36,000 seconds (10 hours) Valid for 90 days |
fun-asr-realtime-2025-11-07 |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Models in Mainland China deployment mode do not have a free quota.
Model | Input price |
fun-asr-realtime | $0.000047/second |
fun-asr-realtime-2025-11-07 | |
fun-asr-realtime-2025-09-15 |
Paraformer
Audio file recognition
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Billing rule: Charges apply per second of input audio duration. Output is not billed.
Model | Input price |
paraformer-v2 | $0.000012/second |
paraformer-8k-v2 |
Real-time speech recognition
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Billing rule: Charges apply per second of input audio duration. Output is not billed.
Model | Input price | Free quota (Note) |
paraformer-realtime-v2 | $0.000035/second | No free quota |
paraformer-realtime-8k-v2 |
Text embedding
Billing rule: Charges apply per input token. Output is not billed.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Input price (per 1M tokens) | Free quota (Note) |
text-embedding-v4 | $0.07 | 1 million tokens |
text-embedding-v3 | $0.07 | 500,000 tokens |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Models in Mainland China deployment mode do not have a free quota.
Model | Input price (per 1M tokens) |
text-embedding-v4 | $0.072 |
Multimodal embedding
Billing rule: Charges apply per input token. Output is not billed.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Input price (per 1M tokens) | Free quota (Note) |
tongyi-embedding-vision-plus | $0.09 | 1 million tokens Validity: 90 days after activating Model Studio |
tongyi-embedding-vision-flash | Image/Video: $0.03 Text: $0.09 |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Model | Input price (per 1M tokens) | Free quota (Note) |
qwen3-vl-embedding | Image/Video: $0.258 Text: $0.1 | No free quota |
multimodal-embedding-v1 | Free trial |
Text rerank
Billing rule: Charges apply per input token. Output is not billed.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Input price (per 1M tokens) | Free quota (Note) |
qwen3-rerank | $0.1 | 1 million tokens Validity: 90 days after activating Model Studio |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Models in Mainland China deployment mode do not have a free quota.
Model | Input price (per 1M tokens) |
gte-rerank-v2 | $0.115 |
Domain specific
Intent recognition
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Model | Input price (per 1M tokens) | Output price (per 1M tokens) | Free quota (Note) |
tongyi-intent-detect-v3 | $0.058 | $0.144 | No free quota |
Role playing
Billing is based on the number of input and output tokens.
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Input price (per 1M tokens) | Output price (per 1M tokens) | Free quota (Note) |
qwen-plus-character | $0.5 | $1.4 | No free quota |
qwen-flash-character | $0.05 | $0.4 | |
qwen-plus-character-ja | $0.5 | $1.4 |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Model | Input price (per 1M tokens) | Output price (per 1M tokens) | Free quota (Note) |
qwen-plus-character | $0.115 | $0.287 | No free quota |