Flagship models
Global
In Global deployment mode, the access point and data storage are located in the US (Virginia) region, and inference computing resources are dynamically scheduled globally.
Flagship models |
Ideal for complex tasks. The most powerful model. |
A balance of performance, speed, and cost. |
Ideal for simple jobs. Fast and low-cost. |
An excellent code model that excels at tool calling and environment interaction. |
Max context window (tokens) | 262,144 | 1,000,000 | 1,000,000 | 1,000,000 |
Min input cost (per 1M tokens) | $1.2 | $0.4 | $0.05 | $0.3 |
Min output cost (per 1M tokens) | $6 | $1.2 | $0.4 | $1.5 |
International
In International deployment mode, the access point and data storage are located in the Singapore region, and inference computing resources are dynamically scheduled globally (excluding Mainland China).
Flagship models |
Ideal for complex tasks. The most powerful model. |
A balance of performance, speed, and cost. |
Ideal for simple jobs. Fast and low-cost. |
An excellent code model that excels at tool calling and environment interaction. |
Max context window (tokens) | 262,144 | 1,000,000 | 1,000,000 | 1,000,000 |
Min input cost (per 1M tokens) | $1.2 | $0.4 | $0.05 | $0.3 |
Min output cost (per 1M tokens) | $6 | $1.2 | $0.4 | $1.5 |
US
In US deployment mode, the access point and data storage are located in the US (Virginia) region, and inference computing resources are limited to the US.
Flagship models |
A balance of performance, speed, and cost. |
Ideal for simple jobs. Fast and low-cost. |
Max context window (tokens) | 1,000,000 | 1,000,000 |
Min input cost (per 1M tokens) | $0.4 | $0.05 |
Min output cost (per 1M tokens) | $1.2 | $0.4 |
Mainland China
In Mainland China deployment mode, the access point and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Flagship models |
Ideal for complex tasks. The most powerful model. |
A balance of performance, speed, and cost. |
Ideal for simple jobs. Fast and low-cost. |
An excellent code model that excels at tool calling and environment interaction. |
Max context window (tokens) | 262,144 | 1,000,000 | 1,000,000 | 1,000,000 |
Min input cost (per 1M tokens) | $0.459 | $0.115 | $0.022 | $0.144 |
Min output cost (per 1M tokens) | $1.836 | $0.287 | $0.216 | $0.574 |
Model overview
Global
In Global deployment mode, the access point and data storage are located in the US (Virginia) region, and inference computing resources are dynamically scheduled globally.
Category | Subcategory | Description |
Text generation | Qwen large language models: Commercial models (Qwen-Max, Qwen-Plus, Qwen-Flash), open source models (Qwen3) | |
Visual understanding model Qwen-VL | ||
Image generation |
| |
| ||
Video generation | Generates high-quality videos with rich styles from a single sentence. | |
First-frame-to-video: Uses an input image as the first frame and generates a video based on a prompt. | ||
Reference-to-video: Generates a video that maintains character consistency using a prompt and the appearance and voice from an input video. |
International
In International deployment mode, the access point and data storage are located in the Singapore region, and inference computing resources are dynamically scheduled globally (excluding Mainland China).
Category | Subcategory | Description |
Text generation | Qwen large language models: Commercial models (Qwen-Max, Qwen-Plus, Qwen-Flash), open source models (Qwen3, Qwen2.5) | |
Visual understanding model Qwen-VL, visual reasoning model QVQ, omni-modal model Qwen-Omni, and real-time multi-modal model Qwen-Omni-Realtime | ||
Image generation |
| |
| ||
Speech synthesis and recognition | Qwen speech synthesis and Qwen realtime speech synthesis can be used for text-to-speech in scenarios such as intelligent voice customer service, audiobooks, in-car navigation, and educational tutoring. | |
Qwen realtime speech recognition, Qwen audio file recognition, Qwen3-LiveTranslate-Flash-Realtime, and Fun-ASR speech recognition can perform speech-to-text for scenarios such as real-time meeting records, real-time live stream captions, and telephone customer service. | ||
Video generation | Generates high-quality videos with rich styles from a single sentence. | |
| ||
Reference-to-video: Generates a video that maintains character consistency using a prompt and the appearance and voice from an input video. | ||
General video editing: Performs various video editing tasks based on input text, images, and videos. For example, it can generate a new video by extracting motion features from an input video and combining them with a prompt. | ||
Embedding | Converts text into a set of numbers that represent the text. It is suitable for search, clustering, recommendation, and classification tasks. |
US
In US deployment mode, the access point and data storage are located in the US (Virginia) region, and inference computing resources are limited to the US.
Category | Subcategory | Description |
Text generation | Qwen large language models: Commercial models (Qwen-Plus, Qwen-Flash) | |
Visual understanding model Qwen-VL | ||
Video generation | Generates high-quality videos with rich styles from a single sentence. | |
First-frame-to-video: Uses an input image as the first frame and generates a video based on a prompt. | ||
Speech recognition | Qwen audio file recognition can perform speech-to-text for scenarios such as meeting transcription and live stream captioning. |
Mainland China
In Mainland China deployment mode, the access point and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Category | Model | Description |
Text generation | ||
Visual understanding model Qwen-VL, visual reasoning model QVQ, and omni-modal model Qwen-Omni | ||
Code model, Mathematical model, Translation model, Data mining model, Research model, Intention recognition model, Role-playing model | ||
Image generation |
| |
General-purpose models:
More models: Qwen Image Translation, OutfitAnyone | ||
Speech synthesis and recognition | Qwen speech synthesis, Qwen realtime speech synthesis, and CosyVoice speech synthesis convert text to speech for scenarios such as voice-based customer service, audiobooks, in-car navigation, and educational tutoring. | |
Qwen realtime speech recognition, Qwen audio file recognition, Fun-ASR speech recognition, and Paraformer speech recognition convert speech to text for scenarios such as real-time meeting transcription, real-time live stream captioning, and customer service calls. | ||
Video editing and generation | Generates high-quality videos with rich styles from a single sentence. | |
| ||
Reference-to-video: Generates a video that maintains character consistency using a prompt and the appearance and voice from an input video. | ||
| ||
Vector | Converts text into a set of numbers that represent the text. It is used for search, clustering, recommendation, and classification. | |
Converts text, images, and speech into a set of numbers. It is used for audio and video classification, image classification, and image-text retrieval. |
Text generation - Qwen
The following are the Qwen commercial models. Compared to the open-source versions, the commercial models offer the latest capabilities and improvements.
The parameter sizes of the commercial models are not disclosed.
Each model is updated periodically. To use a fixed version, you can select a snapshot version. A snapshot version is typically maintained for one month after the release of the next snapshot version.
We recommend that you use the stable or latest version for more lenient rate limiting conditions.
Qwen-Max
The most powerful model in the Qwen series, ideal for complex, multi-step tasks. Usage | Thinking | API reference | Try online
International
In international deployment mode, the endpoint and data storage are both located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding Mainland China).
Model | Version | Mode | Context window | Max input | Max CoT | Max output | Input cost | Output cost CoT + output | Free quota |
(tokens) | (per 1M tokens) | ||||||||
qwen3-max Currently qwen3-max-2026-01-23 Part of Qwen3 series Supports calling built-in tools | Stable | Thinking | 262,144 | 258,048 | 81,920 | 32,768 | Tiered pricing. See details below. | 1 million tokens each Valid for 90 days after activating Model Studio | |
Non-thinking | - | 65,536 | |||||||
qwen3-max-2026-01-23 Thinking mode aka Qwen3-Max-Thinking Part of Qwen3 series Supports calling built-in tools | Snapshot | Thinking | 81,920 | 32,768 | |||||
Non-thinking | - | 65,536 | |||||||
qwen3-max-2025-09-23 Part of Qwen3 series | Snapshot | Non-thinking only | |||||||
qwen3-max-preview Part of Qwen3 series | Preview | Thinking | 81,920 | 32,768 | |||||
Non-thinking | - | 65,536 | |||||||
The models above use tiered pricing based on the number of input tokens in the current request.
Input tokens per request | Input cost (per 1M tokens) qwen3-max and qwen3-max-preview support context cache. | Output cost (per 1M tokens) |
0<Token≤32K | $1.2 | $6 |
32K<Token≤128K | $2.4 | $12 |
128K<Token≤252K | $3 | $15 |
Global
In global deployment mode, the endpoint and data storage are both located in the US (Virginia) region. Model inference compute resources are dynamically scheduled worldwide.
Model | Version | Mode | Context window | Max input | Max CoT | Max output | Input cost | Output cost | Free quota |
(tokens) | (per 1K tokens) | ||||||||
qwen3-max Currently qwen3-max-2025-09-23 context cache discount available | Stable | Non-thinking only | 262,144 | 258,048 | - | 65,536 | Tiered pricing. See details below. | None | |
qwen3-max-2025-09-23 | Snapshot | Non-thinking only | |||||||
qwen3-max-preview Context cache discount available | Preview | Thinking | 81,920 | 32,768 | |||||
Non-thinking | - | 65,536 | |||||||
The models above use tiered pricing based on the number of input tokens in the current request.
Input tokens per request | Input cost (per 1M tokens) | Output cost (per 1M tokens) CoT + response |
0<Token≤32K | $1.2 | $6 |
32K<Token≤128K | $2.4 | $12 |
128K<Token≤252K | $3 | $15 |
Mainland China
In Mainland China deployment mode, the endpoint and data storage are both located in the Beijing region. Model inference compute resources are limited to Mainland China.
Model | Version | Mode | Context window | Max input | Max CoT | Max output | Input cost | Output cost |
(tokens) | (per 1M tokens) | |||||||
qwen3-max Currently qwen3-max-2026-01-23 Part of Qwen3 series Supports calling built-in tools | Stable | Thinking | 262,144 | 258,048 | 81,920 | 32,768 | Tiered pricing. See details below. | |
Non-thinking | - | 65,536 | ||||||
qwen3-max-2026-01-23 Thinking mode aka Qwen3-Max-Thinking Part of Qwen3 series Supports calling built-in tools | Snapshot | Thinking | 81,920 | 32,768 | ||||
Non-thinking | - | 65,536 | ||||||
qwen3-max-2025-09-23 Part of Qwen3 series | Snapshot | Non-thinking only | ||||||
qwen3-max-preview Part of Qwen3 series | Preview | Thinking | 81,920 | 32,768 | ||||
Non-thinking | - | 65,536 | ||||||
The models above use tiered pricing based on the number of input tokens in the current request.
Model | Input tokens per request | Input cost (per 1M tokens) | Output cost (per 1M tokens) CoT + response |
qwen3-max Batch calls at half price context cache discount available | 0<Token≤32K | $0.359 | $1.434 |
32K<Token≤128K | $0.574 | $2.294 | |
128K<Token≤252K | $1.004 | $4.014 | |
qwen3-max-2026-01-23 | 0<Token≤32K | $0.359 | $1.434 |
32K<Token≤128K | $0.574 | $2.294 | |
128K<Token≤252K | $1.004 | $4.014 | |
qwen3-max-2025-09-23 | 0<Token≤32K | $0.861 | $3.441 |
32K<Token≤128K | $1.434 | $5.735 | |
128K<Token≤252K | $2.151 | $8.602 | |
qwen3-max-preview context cache discount available | 0<Token≤32K | $0.861 | $3.441 |
32K<Token≤128K | $1.434 | $5.735 | |
128K<Token≤252K | $2.151 | $8.602 |
qwen3-max-2026-01-23 thinking mode: Compared to the snapshot from September 23, 2025, it effectively integrates thinking and non-thinking modes, significantly improving overall model performance. In thinking mode, the model integrates three tools—web search, web extractor, and code interpreter—to achieve higher accuracy on complex problems by leveraging external tools during reasoning.
qwen3-max, qwen3-max-2026-01-23, and qwen3-max-2025-09-23 natively support the search agent feature, see Web Search.
Qwen-Plus
A balanced model with inference performance, cost, and speed between Qwen-Max and Qwen-Flash, ideal for moderately complex tasks. Usage | Thinking | API reference | Try online
International
In international deployment mode, the endpoint and data storage are both located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding Mainland China).
Model | Version | Context window | Max input | Max output | Input cost | Output cost | Free quota |
(tokens) | (per 1M tokens) | ||||||
qwen-plus Currently qwen-plus-2025-12-01 Part of Qwen3 series Batch calls at half price | Stable | 1,000,000 | Thinking 995,904 Non-thinking mode 997,952 | 32,768 Max CoT: 81,920 | Tiered pricing. See details below. | 1 million tokens each Valid for 90 days after activating Model Studio | |
qwen-plus-latest Currently qwen-plus-2025-12-01 Part of Qwen3 series | Latest | Thinking 995,904 Non-thinking mode 997,952 | |||||
qwen-plus-2025-12-01 Part of Qwen3 series | Snapshot | Thinking 995,904 Non-thinking mode 997,952 | |||||
qwen-plus-2025-09-11 Part of Qwen3 series | |||||||
qwen-plus-2025-07-28 Also known as qwen-plus-0728 Part of Qwen3 series | |||||||
qwen-plus-2025-07-14 Also known as qwen-plus-0714 Part of Qwen3 series | 131,072 | Thinking 98,304 Non-thinking mode 129,024 | 16,384 Max CoT: 38,912 | $0.4 | Thinking $4 Non-thinking mode $1.2 | ||
qwen-plus-2025-04-28 Also known as qwen-plus-0428 Part of Qwen3 series | |||||||
qwen-plus-2025-01-25 Also known as qwen-plus-0125 | 129,024 | 8,192 | $1.2 | ||||
qwen-plus, qwen-plus-latest, qwen-plus-2025-12-01, qwen-plus-2025-09-11, and qwen-plus-2025-07-28 use tiered pricing based on the number of input tokens in the current request.
Input tokens per request | Input cost (per 1M tokens) | Mode | Output cost (per 1M tokens) |
0<Token≤256K | $0.4 | Non-thinking mode | $1.2 |
Thinking | $4 | ||
256K<Token≤1M | $1.2 | Non-thinking mode | $3.6 |
Thinking | $12 |
Global
In global deployment mode, the endpoint and data storage are both located in the US (Virginia) region. Model inference compute resources are dynamically scheduled worldwide.
Model | Version | Context window | Max input | Max output | Input cost | Output cost |
(tokens) | (per 1M tokens) | |||||
qwen-plus Currently qwen-plus-2025-12-01 Part of Qwen3 series | Stable | 1,000,000 | Thinking 995,904 Non-thinking mode 997,952 | 32,768 Max CoT: 81,920 | Tiered pricing. See the description below the table. | |
qwen-plus-2025-12-01 Part of Qwen3 series | Snapshot | Thinking 995,904 Non-thinking mode 997,952 | ||||
qwen-plus-2025-09-11 Part of the Qwen3 series | ||||||
qwen-plus-2025-07-28 Part of the Qwen3 series | ||||||
The models above use tiered pricing based on the number of input tokens in the current request. qwen-plus supports context cache.
Input tokens per request | Input cost (per 1M tokens) | Mode | Output cost (per 1M tokens) |
0<Token≤256 KB | $0.4 | Non-thinking mode | $1.2 |
Thinking | $4 | ||
256K<Token≤1M | $1.2 | Non-thinking mode | $3.6 |
Thinking | $12 |
US
In US deployment mode, the endpoint and data storage are both located in the US (Virginia) region. Model inference compute resources are limited to the US.
Model | Version | Context window | Max input | Max output | Input cost | Output cost | Free quota |
(tokens) | (per 1M tokens) | ||||||
qwen-plus-us Currently qwen-plus-2025-12-01-us Part of the Qwen3 series | Stable | 1,000,000 | Thinking 995,904 Non-thinking mode 997,952 | 32,768 Max CoT: 81,920 | Tiered pricing. See details below. | None | |
qwen-plus-2025-12-01-us Part of the Qwen3 series | Snapshot | Thinking 995,904 Non-thinking mode 997,952 | |||||
The models above use tiered pricing based on the number of input tokens in the current request. qwen-plus-us supports context cache.
Input tokens per request | Input cost (per 1M tokens) | Mode | Output cost (per 1M tokens) |
0<Token≤256K | $0.4 | Non-thinking mode | $1.2 |
Thinking | $4 | ||
256K<Token≤1M | $1.2 | Non-thinking mode | $3.6 |
Thinking | $12 |
Mainland China
In Mainland China deployment mode, the endpoint and data storage are both located in the Beijing region, and model inference compute resources are limited to Mainland China.
Model | Version | Context window | Max input | Max output | Input cost | Output cost |
(tokens) | (per 1M tokens) | |||||
qwen-plus Currently qwen-plus-2025-12-01 Part of the Qwen3 series Batch calls at half price | Stable | 1,000,000 | Thinking 995,904 Non-thinking mode 997,952 | 32,768 Max CoT: 81,920 | Tiered pricing. See details below. | |
qwen-plus-latest Currently qwen-plus-2025-12-01 Part of the Qwen3 series Batch calls at half price | Latest | Thinking 995,904 Non-thinking mode 997,952 | ||||
qwen-plus-2025-12-01 Part of the Qwen3 series | Snapshot | Thinking 995,904 Non-thinking mode 997,952 | ||||
qwen-plus-2025-09-11 Part of the Qwen3 series | ||||||
qwen-plus-2025-07-28 Also known as qwen-plus-0728 Part of the Qwen3 series | ||||||
qwen-plus-2025-07-14 Also known as qwen-plus-0714 Part of the Qwen3 series | 131,072 | Thinking 98,304 Non-thinking mode 129,024 | 16,384 Max CoT: 38,912 | $0.115 | Thinking $1.147 Non-thinking mode $0.287 | |
qwen-plus-2025-04-28 Also known as qwen-plus-0428 Part of the Qwen3 series | ||||||
qwen-plus, qwen-plus-latest, qwen-plus-2025-12-01, qwen-plus-2025-09-11, and qwen-plus-2025-07-28 use tiered pricing based on the number of input tokens in the current request.
Input tokens per request | Input cost (per 1M tokens) | Mode | Output cost (per 1M tokens) |
0<Token≤128K | $0.115 | Non-thinking mode | $0.287 |
Thinking | $1.147 | ||
128K<Token≤256K | $0.345 | Non-thinking mode | $2.868 |
Thinking | $3.441 | ||
256K<Token≤1M | $0.689 | Non-thinking mode | $6.881 |
Thinking | $9.175 |
The models above support both thinking and non-thinking modes. You can switch between modes using the enable_thinking parameter. Additionally, these models offer the following significant improvements:
Reasoning ability: Significantly outperforms QwQ and similarly sized non-reasoning models in evaluations for math, code, and logical reasoning, achieving top-tier industry performance for a model of its size.
Human preference alignment: Features greatly enhanced capabilities for creative writing, role assumption, multi-turn conversation, and instruction following. Its general abilities significantly surpass those of similarly sized models.
Agent capabilities: Achieves industry-leading performance in both thinking and non-thinking modes and enables precise external tool invocation.
Multilingual support: Supports over 100 languages and dialects, and provides notable improvements in multilingual translation, instruction understanding, and commonsense reasoning.
Response formatting: Resolves issues found in previouss, such as incorrect Markdown formatting, response truncation, and incorrectly formatted boxed output.
For the models above, if thinking mode is enabled but no reasoning process is output, billing applies at the non-thinking mode rate.
Qwen-Flash
The fastest and lowest-cost model in the Qwen series, ideal for simple jobs. Qwen-Flash uses flexible tiered pricing, which provides more cost-effective billing than Qwen-Turbo. Usage | API reference | Try online | Thinking
International
In international deployment mode, the endpoint and data storage are both located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding Mainland China).
Model | Version | Mode | Context window | Max input | Max CoT | Max output | Input cost | Output cost CoT + output | Free quota |
(tokens) | (per 1K tokens) | ||||||||
qwen-flash Currently qwen-flash-2025-07-28 Part of the Qwen3 series Batch calls at half price | Stable | Thinking | 1,000,000 | 995,904 | 81,920 | 32,768 | Tiered pricing. See details below. | 1 million tokens each Valid for 90 days after activating Model Studio | |
Non-thinking | 997,952 | - | |||||||
qwen-flash-2025-07-28 Part of the Qwen3 series | Snapshot | Thinking | 995,904 | 81,920 | |||||
Non-thinking | 997,952 | - | |||||||
The models above use tiered pricing based on the number of input tokens in the current request. qwen-flash supports cache and batch calls.
Input tokens per request | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
0 < Tokens ≤ 256K | $0.05 | $0.4 |
256K < Tokens ≤ 1M | $0.25 | $2 |
Global
In global deployment mode, the endpoint and data storage are both located in the US (Virginia) region. Model inference compute resources are dynamically scheduled worldwide.
Model | Version | Mode | Context window | Max input | Max CoT | Max output | Input cost | Output cost CoT + output |
(tokens) | (per 1K tokens) | |||||||
qwen-flash Currently qwen-flash-2025-07-28 Part of the Qwen3 series | Stable | Thinking | 1,000,000 | 995,904 | 81,920 | 32,768 | Tiered pricing. See details below. | |
Non-thinking | 997,952 | - | ||||||
qwen-flash-2025-07-28 Part of the Qwen3 series | Snapshot | Thinking | 995,904 | 81,920 | ||||
Non-thinking | 997,952 | - | ||||||
The models above use tiered pricing based on the number of input tokens in the current request. qwen-flash supports context cache.
Input tokens per request | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
0 < Tokens ≤ 256K | $0.05 | $0.4 |
256K < Tokens ≤ 1M | $0.25 | $2 |
US
In US deployment mode, the endpoint and data storage are both located in the US (Virginia) region. Model inference compute resources are limited to the US.
Model | Version | Mode | Context window | Max input | Max CoT | Max output | Input cost | Output cost CoT + output | Free quota |
(tokens) | (per 1K tokens) | ||||||||
qwen-flash-us Currently qwen-flash-2025-07-28-us Part of the Qwen3 series | Stable | Thinking | 1,000,000 | 995,904 | 81,920 | 32,768 | Tiered pricing. See details below. | None | |
Non-thinking | 997,952 | - | |||||||
qwen-flash-2025-07-28-us Part of the Qwen3 series | Snapshot | Thinking | 995,904 | 81,920 | |||||
Non-thinking | 997,952 | - | |||||||
The models above use tiered pricing based on the number of input tokens in the current request.
Input tokens per request | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
0 < Tokens ≤ 256K | $0.05 | $0.4 |
256K < Tokens ≤ 1M | $0.25 | $2 |
Mainland China
In Mainland China deployment mode, the endpoint and data storage are both located in the Beijing region. Model inference compute resources are limited to Mainland China.
Model | Version | Mode | Context window | Max input | Max CoT | Max output | Input cost | Output cost CoT + output |
(tokens) | (per 1K tokens) | |||||||
qwen-flash Currently qwen-flash-2025-07-28 Part of the Qwen3 series Batch calls at half price | Stable | Thinking | 1,000,000 | 995,904 | 81,920 | 32,768 | Tiered pricing. See details below. | |
Non-thinking | 997,952 | - | ||||||
qwen-flash-2025-07-28 Part of the Qwen3 series | Snapshot | Thinking | 995,904 | 81,920 | ||||
Non-thinking | 997,952 | - | ||||||
The models above use tiered pricing based on the number of input tokens in the current request. qwen-flash supports context cache.
Input tokens per request | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
0 < Tokens ≤ 128K | $0.022 | $0.216 |
128K < Tokens ≤ 256K | $0.087 | $0.861 |
256K < Tokens ≤ 1M | $0.173 | $1.721 |
Qwen-Turbo
Qwen-Turbo will no longer receive updates. Replace it with Qwen-Flash. Qwen-Flash uses flexible tiered pricing for more cost-effective billing. Usage | API reference | Try online|Thinking
International
In international deployment mode, the endpoint and data storage are both located in the Singapore region. Model inference computing resources are dynamically scheduled worldwide (excluding Mainland China).
Model | Version | Context window | Max input | Max output | Input cost | Output cost | Free quota |
(tokens) | (per 1M tokens) | ||||||
qwen-turbo Currently qwen-turbo-2025-04-28 Part of the Qwen3 series Batch calls at half price | Stable | Thinking 131,072 Non-thinking mode 1,000,000 | Thinking 98,304 Non-thinking mode 1,000,000 | 16,384 Max CoT: 38,912 | $0.05 | Thinking mode: $0.5 Non-thinking mode: $0.2 | 1 million tokens each Valid for 90 days after activating Model Studio |
qwen-turbo-latest Always the latest snapshot Part of the Qwen3 series | Latest | $0.05 | Thinking mode: $0.5 Non-thinking mode: $0.2 | ||||
qwen-turbo-2025-04-28 Also known as qwen-turbo-0428 Part of the Qwen3 series | Snapshot | ||||||
qwen-turbo-2024-11-01 Also known as qwen-turbo-1101 | 1,000,000 | 1,000,000 | 8,192 | $0.2 | |||
Mainland China
In Mainland China deployment mode, the endpoint and data storage are both located in the Beijing region. Model inference computing resources are limited to Mainland China.
Model | Version | Context window | Max input | Max output | Input cost | Output cost |
(tokens) | (per 1M tokens) | |||||
qwen-turbo Currently qwen-turbo-2025-04-28 Part of the Qwen3 series | Stable | Thinking 131,072 Non-thinking mode 1,000,000 | Thinking 98,304 Non-thinking mode 1,000,000 | 16,384 Max CoT: 38,912 | $0.044 | Thinking $0.431 Non-thinking mode $0.087 |
qwen-turbo-latest Always the latest snapshot Part of the Qwen3 series | Latest | |||||
qwen-turbo-2025-07-15 Also known as qwen-turbo-0715 Part of the Qwen3 series | Snapshot | |||||
qwen-turbo-2025-04-28 Also known as qwen-turbo-0428 Part of the Qwen3 series | ||||||
QwQ
QwQ is a reasoning model trained on the Qwen2.5 base and significantly enhanced through reinforcement learning. It achieves performance comparable to the full-capacity DeepSeek-R1 on core metrics, such as AIME 24/25 and LiveCodeBench, and on certain general benchmarks, such as IFEval and LiveBench. Usage
International
In international deployment mode, the endpoint and data storage are both located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding Mainland China).
Model | Version | Context window | Max input | Max CoT | Max response | Input cost | Output cost | Free quota |
(tokens) | (per 1M tokens) | |||||||
qwq-plus | Stable | 131,072 | 98,304 | 32,768 | 8,192 | $0.8 | $2.4 | 1 million tokens Valid for 90 days after activating Model Studio |
Mainland China
In Mainland China deployment mode, the endpoint and data storage are both located in the Beijing region. Model inference compute resources are limited to Mainland China.
Model | Version | Context window | Max input | Max CoT | Max response | Input cost | Output cost |
(tokens) | (per 1M tokens) | ||||||
qwq-plus Currently qwq-plus-2025-03-05 Batch calls at half price | Stable | 131,072 | 98,304 | 32,768 | 8,192 | $0.230 | $0.574 |
qwq-plus-latest Always the latest snapshot | Latest | ||||||
qwq-plus-2025-03-05 Also known as qwq-plus-0305 | Snapshot | ||||||
Qwen-Long
This Qwen series model features the longest context window, balanced capabilities, and a low cost. It is ideal for long-text analysis, information extraction, summarization, and classification tasks. Usage | Try online
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Model | Version | Context window | Max input | Max output | Input cost | Output cost |
(tokens) | (per 1M tokens) | |||||
qwen-long-latest Always the latest snapshot Batch calls at half price | Stable | 10,000,000 | 10,000,000 | 32,768 | $0.072 | $0.287 |
qwen-long-2025-01-25 Also known as qwen-long-0125 | Snapshot | |||||
Qwen-Omni
Qwen-Omni accepts multimodal inputs, such as text, images, audio, and video, and generates text or speech responses. It offers multiple expressive, human-like voice options and supports multilingual and dialect speech output. This makes it suitable for audiovisual chat scenarios, such as visual recognition, emotion sensing, and education. Usage|API reference
International
In international deployment mode, the endpoint and data storage are both located in the Singapore region. Model inference computing resources are dynamically scheduled worldwide (excluding Mainland China).
Model | Version | Mode | Context window | Max input | Max CoT | Max output | Free quota |
(tokens) | |||||||
qwen3-omni-flash This model has the same capabilities as qwen3-omni-flash-2025-12-01. | Stable | Thinking | 65,536 | 16,384 | 32,768 | 16,384 | 1 million tokens (regardless of modality) Valid for 90 days after activating Model Studio |
Non-thinking | 49,152 | - | |||||
qwen3-omni-flash-2025-12-01 | Snapshot | Thinking | 65,536 | 16,384 | 32,768 | 16,384 | |
Non-thinking | 49,152 | - | |||||
qwen3-omni-flash-2025-09-15 Also known as qwen3-omni-flash-0915 | Snapshot | Thinking | 65,536 | 16,384 | 32,768 | 16,384 | |
Non-thinking | 49,152 | - | |||||
After the free quota is used up, input and output are billed as follows. The pricing is the same for thinking and non-thinking modes. Audio output is not supported in thinking mode.
|
|
Mainland China
In Mainland China deployment mode, the endpoint and data storage are both located in the Beijing region. Model inference computing resources are limited to Mainland China.
Model | Version | Mode | Context window | Max input | Max CoT | Max output | Free quota |
(tokens) | |||||||
qwen3-omni-flash Currently qwen3-omni-flash-2025-12-01 | Stable | Thinking | 65,536 | 16,384 | 32,768 | 16,384 | No free quota |
Non-thinking | 49,152 | - | |||||
qwen3-omni-flash-2025-12-01 | Snapshot | Thinking | 65,536 | 16,384 | 32,768 | 16,384 | |
Non-thinking | 49,152 | - | |||||
qwen3-omni-flash-2025-09-15 Also known as qwen3-omni-flash-0915 | Snapshot | Thinking | 65,536 | 16,384 | 32,768 | 16,384 | |
Non-thinking | 49,152 | - | |||||
After the free quota is used up, input and output are billed as follows. The pricing is the same for thinking and non-thinking modes. Audio output is not supported in thinking mode.
|
|
Use the Qwen3-Omni-Flash model for its significant capability improvements over Qwen-Omni-Turbo, which is no longer updated:
It is a hybrid thinking model that supports both thinking and non-thinking modes. Switch between modes using the
enable_thinkingparameter. By default, thinking mode is disabled.Audio output is not supported in thinking mode. For audio output in non-thinking mode:
qwen3-omni-flash-2025-12-01 supports up to 49 voice options, qwen3-omni-flash-2025-09-15 and qwen3-omni-flash support up to 17 voice options, and Qwen-Omni-Turbo supports only 4.
Supports up to 10 languages, while Qwen-Omni-Turbo supports only 2.
Qwen-Omni-Realtime
Compared to Qwen-Omni, Qwen-Omni-Realtime supports streaming audio input and includes built-in Voice Activity Detection (VAD) to automatically detect the start and end of user speech. Usage|Client events|Server events
International
In international deployment mode, the endpoint and data storage are both located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding Mainland China).
Model | Version | Context window | Max input | Max output | Free quota |
(tokens) | |||||
qwen3-omni-flash-realtime Currently qwen3-omni-flash-realtime-2025-12-01. | Stable | 65,536 | 49,152 | 16,384 | 1 million tokens (regardless of modality) Valid for 90 days after activating Model Studio |
qwen3-omni-flash-realtime-2025-12-01 | Snapshot | ||||
qwen3-omni-flash-realtime-2025-09-15 | |||||
After the free quota is used up, input and output are billed as follows:
|
|
Mainland China
In Mainland China deployment mode, the endpoint and data storage are both located in the Beijing region. Model inference compute resources are limited to Mainland China.
Model | Version | Context window | Max input | Max output | Free quota |
(tokens) | |||||
qwen3-omni-flash-realtime This model currently has the same capabilities as qwen3-omni-flash-realtime-2025-12-01. | Stable | 65,536 | 49,152 | 16,384 | No free quota |
qwen3-omni-flash-realtime-2025-12-01 | Snapshot | ||||
qwen3-omni-flash-realtime-2025-09-15 | |||||
After the free quota is used up, input and output are billed as follows:
|
|
Use the Qwen3-Omni-Flash-Realtime model instead of Qwen-Omni-Turbo-Realtime, which will no longer be updated. Qwen3-Omni-Flash-Realtime offers significant capability improvements. For audio output:
qwen3-omni-flash-realtime-2025-12-01 supports 49 voices. qwen3-omni-flash-realtime-2025-09-15 and qwen3-omni-realtime-flash support 17 voices. Qwen-Omni-Turbo-Realtime supports only 4.
Supports 10 languages, compared to Qwen-Omni-Turbo-Realtime's 2.
QVQ
QVQ is a visual reasoning model that supports visual input and CoT output. It demonstrates stronger capabilities in math, programming, visual analysis, creation, and general tasks. Usage | Try online
International
In international deployment mode, the endpoint and data storage are both located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding Mainland China).
Model | Version | Context window | Max input | Max CoT | Max response | Input cost | Output cost | Free quota |
(tokens) | (per 1M tokens) | |||||||
qvq-max Currently qvq-max-2025-03-25. | Stable | 131,072 | 106,496 Max per image: 16,384 | 16,384 | 8,192 | $1.2 | $4.8 | 1 million input tokens each Valid for 90 days after activating Model Studio |
qvq-max-latest Always the latest snapshot. | Latest | |||||||
qvq-max-2025-03-25 Also known as qvq-max-0325. | Snapshot | |||||||
Mainland China
In Mainland China deployment mode, the endpoint and data storage are both located in the Beijing region. Model inference compute resources are limited to Mainland China.
Model | Version | Context window | Max input | Max CoT | Max response | Input cost | Output cost |
(tokens) | (per 1M tokens) | ||||||
qvq-max Offers stronger visual reasoning and instruction-following capabilities than qvq-plus and delivers optimal performance for more complex tasks. Currently qvq-max-2025-03-25 | Stable | 131,072 | 106,496 Max per image: 16,384 | 16,384 | 8,192 | $1.147 | $4.588 |
qvq-max-latest Always the latest snapshot. | Latest | ||||||
qvq-max-2025-05-15 Also known as qvq-max-0515. | Snapshot | ||||||
qvq-max-2025-03-25 Also known as qvq-max-0325. | |||||||
qvq-plus Currently qvq-plus-2025-05-15 | Stable | $0.287 | $0.717 | ||||
qvq-plus-latest Always the latest snapshot. | Latest | ||||||
qvq-plus-2025-05-15 Also known as qvq-plus-0515. | Snapshot | ||||||
Qwen-VL
Qwen-VL is a text generation model with visual (image) understanding capabilities. It performs OCR, and can further summarize and reason. For example, it extracts attributes from product photos or solves problems based on exercise diagrams. Usage | API reference | Try online
Qwen-VL models are billed based on the total number of input and output tokens. For more information about image token calculation rules, see Visual Understanding.
International
In international deployment mode, the access point and data storage are both located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding Mainland China).
Model | Version | Mode | Context window | Max input | Max CoT | Max output | Input cost | Output cost CoT and output | Free quota |
(tokens) | (per 1M tokens) | ||||||||
qwen3-vl-plus Currently qwen3-vl-plus-2025-12-19 | Stable | Thinking | 262,144 | 258,048 Max per image: 16,384 | 81,920 | 32,768 | Tiered pricing. See details below. | 1 million input tokens and 1 million output tokens Valid for 90 days after activating Model Studio | |
Non-thinking | 260,096 Max per image: 16,384 | - | |||||||
qwen3-vl-plus-2025-12-19 | Snapshot | Thinking | 258,048 Max per image: 16,384 | 81,920 | |||||
Non-thinking | 260,096 Max per image: 16,384 | - | |||||||
qwen3-vl-plus-2025-09-23 | Snapshot | Thinking | 258,048 Max per image: 16,384 | 81,920 | |||||
Non-thinking | 260,096 Max per image: 16,384 | - | |||||||
qwen3-vl-flash Currently qwen3-vl-flash-2025-10-15 | Stable | Thinking | 258,048 Max per image: 16,384 | 81,920 | |||||
Non-thinking | 260,096 Max per image: 16,384 | - | |||||||
qwen3-vl-flash-2026-01-22 | Snapshot | Thinking | 258,048 Max per image: 16,384 | 81,920 | |||||
Non-thinking | 260,096 Max per image: 16,384 | - | |||||||
qwen3-vl-flash-2025-10-15 | Snapshot | Thinking | 258,048 Max per image: 16,384 | 81,920 | |||||
Non-thinking | 260,096 Max per image: 16,384 | - | |||||||
The models above use tiered pricing based on the number of input tokens in the current request. The input and output prices are the same for thinking and non-thinking modes. In addition, qwen3-vl-plus and qwen3-vl-flash models support context cache.
qwen3-vl-plus series
Input tokens per request | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
0 < Tokens ≤ 32K | $0.2 | $1.6 |
32K < Tokens ≤ 128K | $0.3 | $2.4 |
128K < Tokens ≤ 256K | $0.6 | $4.8 |
qwen3-vl-flash series
Input tokens per request | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
0 < Tokens ≤ 32,000 | $0.05 | $0.40 |
32,000 < Tokens ≤ 128,000 | $0.075 | $0.6 |
128,000 < Tokens ≤ 256,000 | $0.12 | $0.96 |
Global
In global deployment mode, the access point and data storage are both located in the US (Virginia) region. Model inference compute resources are dynamically scheduled worldwide.
Model | Version | Mode | Context window | Max input | Max CoT | Max output | Input cost | Output cost CoT and output |
(tokens) | (per 1M tokens) | |||||||
qwen3-vl-plus Currently qwen3-vl-plus-2025-12-19. | Stable | Thinking | 262,144 | 258,048 Max per image: 16,384. | 81,920 | 32,768 | Tiered pricing. See details below. | |
Non-thinking | 260,096 Max per image: 16,384. | - | ||||||
qwen3-vl-plus-2025-09-23 | Snapshot | Thinking | 258,048 Max per image: 16,384. | 81,920 | ||||
Non-thinking | 260,096 Max per image: 16,384. | - | ||||||
qwen3-vl-flash Currently qwen3-vl-flash-2025-10-15. | Stable | Thinking | 258,048 Max per image: 16,384. | 81,920 | ||||
Non-thinking | 260,096 Max per image: 16,384. | - | ||||||
qwen3-vl-flash-2025-10-15 | Snapshot | Thinking | 258,048 Max per image: 16,384. | 81,920 | ||||
Non-thinking | 260,096 Max per image: 16,384. | - | ||||||
The models above use tiered pricing based on the number of input tokens in the current request. The input and output prices are the same for thinking and non-thinking modes. In addition, qwen3-vl-plus and qwen3-vl-flash models support context cache.
qwen3-vl-plus series
Input tokens per request | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
0 < Tokens ≤ 32,000 | $0.20 | $1.6 |
32,000 < Tokens ≤ 128,000 | $0.30 | $2.40 |
128,000 < Tokens ≤ 256,000 | $0.60 | $4.80 |
qwen3-vl-flash series
Input tokens per request | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
0 < Tokens ≤ 32K | $0.05 | $0.4 |
32K < Tokens ≤ 128K | $0.075 | $0.6 |
128K < Tokens ≤ 256K | $0.12 | $0.96 |
US
In US deployment mode, the access point and data storage are both located in the US (Virginia) region. Model inference compute resources are limited to the US.
Model | Version | Mode | Context window | Max input | Longest CoT | Max output | Input cost | Output cost For CoT and final output |
(tokens) | (per 1M tokens) | |||||||
qwen3-vl-flash-us Offers the same capabilities as qwen3-vl-flash-2025-10-15-us. | Stable | Thinking | 258,048 Max per image: 16,384 | 81,920 | 32,768 | Tiered pricing. See details below. | ||
Non-thinking | 260,096 Max per image: 16,384 | - | ||||||
qwen3-vl-flash-2025-10-15-us | Snapshot | Thinking | 258,048 Max per image: 16,384 | 81,920 | ||||
Non-thinking | 260,096 Max per image: 16,384 | - | ||||||
The models above use tiered pricing based on the number of input tokens in the current request. The input and output prices are the same for thinking and non-thinking modes. In addition, qwen3-vl-flash-us model supports context cache.
Input tokens per request | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
0 < Tokens ≤ 32,000 | $0.05 | $0.4 |
32,000 < Tokens ≤ 128,000 | $0.075 | $0.6 |
128,000 < Tokens ≤ 256,000 | $0.12 | $0.96 |
Mainland China
In Mainland China deployment mode, the access point and data storage are both located in the Beijing region. Model inference compute resources are limited to Mainland China.
Model | Version | Mode | Context window (tokens) | Max input (tokens) | Max CoT | Max output (tokens) | Input cost | Output cost | Free quota |
Token count | per 1 M tokens | ||||||||
qwen3-vl-plus Currently qwen3-vl-plus-2025-12-19 Batch calls at half price | Stable | Thinking | 262,144 | 258,048 Max per image: 16,384 | 81,920 | 32,768 | Tiered pricing. See details below. | No free quota | |
Non-thinking | 260,096 Max per image: 16,384 | - | |||||||
qwen3-vl-plus-2025-12-19 | Snapshot | Thinking | 258,048 Max per image: 16,384 | 81,920 | |||||
Non-thinking | 260,096 Max per image: 16,384 | - | |||||||
qwen3-vl-plus-2025-09-23 | Snapshot | Thinking | 258,048 Max per image: 16,384 | 81,920 | |||||
Non-thinking | 260,096 Max per image: 16,384 | - | |||||||
qwen3-vl-flash Currently qwen3-vl-flash-2025-10-15 Batch calls at half price | Stable | Thinking | 258,048 Max per image: 16,384 | 81,920 | |||||
Non-thinking | 260,096 Max per image: 16,384 | - | |||||||
qwen3-vl-flash-2026-01-22 | Snapshot | Thinking | 258,048 Max per image: 16,384 | 81,920 | |||||
Non-thinking | 260,096 Max per image: 16,384 | - | |||||||
qwen3-vl-flash-2025-10-15 | Snapshot | Thinking | 258,048 Max per image: 16,384 | 81,920 | |||||
Non-thinking | 260,096 Max per image: 16,384 | - | |||||||
The models above use tiered pricing based on the number of input tokens in the current request. The input and output prices are the same for thinking and non-thinking modes. In addition, qwen3-vl-plus and qwen3-vl-flash models support context cache.
qwen3-vl-plus series
Input tokens per request | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
0 < Tokens ≤ 32K | $0.143 | $1.434 |
32K < Tokens ≤ 128K | $0.215 | $2.15 |
128K < Tokens ≤ 256K | $0.43 | $4.301 |
qwen3-vl-flash series
Input tokens per request | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
0 < Tokens ≤ 32,000 | $0.022 | $0.215 |
32,000 < Tokens ≤ 128,000 | $0.043 | $0.43 |
128,000 < Tokens ≤ 256,000 | $0.086 | $0.859 |
The qwen3-vl-flash-2026-01-22 model effectively integrates thinking and non-thinking modes. Compared to the snapshot of October 15, 2025, it significantly improves the model's overall performance. It achieves higher inference accuracy in business scenarios such as general visual recognition, security, store inspection, patrol inspection, and photo-based problem solving.
Qwen-OCR
Qwen-OCR is a model that specializes in text extraction. Compared to Qwen-VL, it focuses more on extracting text from images of items such as documents, tables, exam questions, and handwriting. It can recognize multiple languages, including English, French, Japanese, Korean, German, Russian, and Italian. Usage | API reference|Try online
International
In international deployment mode, the endpoint and data storage are both located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding Mainland China).
Model | Version | Context window | Max input | Max output | Input price | Output price | Free quota |
(tokens) | (per 1M tokens) | ||||||
qwen-vl-ocr Equivalent to qwen-vl-ocr-2025-11-20. | Stable | 38,192 | 30,000 Max per image: 30,000 | 8,192 | $0.07 | $0.16 | 1 million input tokens and 1 million output tokens Valid for 90 days after activating Model Studio |
qwen-vl-ocr-2025-11-20 Also known as qwen-vl-ocr-1120. Based on the Qwen3-VL architecture, this model significantly improves document parsing and text localization. | Snapshot | ||||||
Global
In global deployment mode, the endpoint and data storage are both located in the US (Virginia) region. Model inference compute resources are dynamically scheduled worldwide.
Model | Version | Context window | Max input | Max output | Input price | Output price |
(tokens) | (per 1M tokens) | |||||
qwen-vl-ocr Equivalent to qwen-vl-ocr-2025-11-20. | Stable | 38,192 | 30,000 Max per image: 30,000 | 8,192 | $0.07 | $0.16 |
qwen-vl-ocr-2025-11-20 Also known as qwen-vl-ocr-1120. Based on the Qwen3-VL architecture, this model significantly improves document parsing and text localization. | Snapshot | |||||
Mainland China
In Mainland China deployment mode, the endpoint and data storage are both located in the Beijing region. Model inference compute resources are limited to Mainland China.
Model | Version | Context window | Max input | Max output | Input price | Output price | Free quota |
(tokens) | (per 1M tokens) | ||||||
qwen-vl-ocr Currently qwen-vl-ocr-2025-11-20. Batch calls are available at half price. | Stable | 38,192 | 30,000 Max per image: 30,000 | 8,192 | $0.043 | $0.072 | No free quota |
qwen-vl-ocr-latest Always the latest | Latest | ||||||
qwen-vl-ocr-2025-11-20 Also known as qwen-vl-ocr-1120. Based on the Qwen3-VL architecture, this model significantly improves document parsing and text localization. | Snapshot | ||||||
qwen-vl-ocr-2025-08-28 Also known as qwen-vl-ocr-0828. | 34,096 | 4,096 | $0.717 | $0.717 | |||
qwen-vl-ocr-2025-04-13 Also known as qwen-vl-ocr-0413. | |||||||
qwen-vl-ocr-2024-10-28 Also known as qwen-vl-ocr-1028. | |||||||
Qwen-Math
Qwen-Math is a language model that specializes in solving mathematical problems. Usage | API reference | Try online
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Model | Version | Context window | Max input | Max output | Input cost | Output cost |
(tokens) | (per 1M tokens) | |||||
qwen-math-plus This model currently has the same capabilities as qwen-math-plus-2024-09-19. | Stable | 4,096 | 3,072 | 3,072 | $0.574 | $1.721 |
qwen-math-plus-latest Always the latest snapshot | Latest | |||||
qwen-math-plus-2024-09-19 Also known as qwen-math-plus-0919 | Snapshot | |||||
qwen-math-plus-2024-08-16 Also known as qwen-math-plus-0816 | ||||||
qwen-math-turbo Currently qwen-math-turbo-2024-09-19. | Stable | $0.287 | $0.861 | |||
qwen-math-turbo-latest Always the latest snapshot | Latest | |||||
qwen-math-turbo-2024-09-19 Also known as qwen-math-turbo-0919 | Snapshot | |||||
Qwen-Coder
Qwen-Coder is a code generation model. The latest Qwen3-Coder-Plus series builds on Qwen3 and delivers advanced coding agent capabilities. It excels at tool calling, environment interaction, and autonomous programming—combining strong coding proficiency with general-purpose intelligence. Usage | API reference | Try online
International
In international deployment mode, the endpoint and data storage are both located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Version | Context window | Max input | Max output | Input cost | Output cost | Free quota |
(tokens) | (per 1M tokens) | ||||||
qwen3-coder-plus Currently qwen3-coder-plus-2025-09-23 | Stable | 1,000,000 | 997,952 | 65,536 | Pricing is tiered. See the notes below the table. | 1 million tokens each Validity period: 90 days after you activate Alibaba Cloud Model Studio | |
qwen3-coder-plus-2025-09-23 | Snapshot | ||||||
qwen3-coder-plus-2025-07-22 | Snapshot | ||||||
qwen3-coder-flash Currently qwen3-coder-flash-2025-07-28 | Stable | ||||||
qwen3-coder-flash-2025-07-28 | Snapshot | ||||||
The models above use tiered pricing based on the number of input tokens in the current request.
qwen3-coder-plus series
qwen3-coder-plus, qwen3-coder-plus-2025-09-23, and qwen3-coder-plus-2025-07-22 are priced as follows. qwen3-coder-plus supports context cache. Input text that hits the implicit cache is billed at 20% of the unit price, while input text that hits the explicit cache is billed at 10% of the unit price.
Input tokens per request | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
0<Token≤32K | $1 | $5 |
32,000 < Tokens ≤ 128,000 | $1.80 | $9 |
128,000 < Tokens ≤ 256,000 | $3 | $15 |
256,000 < Tokens ≤ 1,000,000 | $6 | $60 |
qwen3-coder-flash series
qwen3-coder-flash and qwen3-coder-flash-2025-07-28 are priced as follows. qwen3-coder-flash supports context cache. Input text that hits the implicit cache is billed at 20% of the unit price, while input text that hits the explicit cache is billed at 10% of the unit price.
Input tokens per request | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
Up to 32,000 | $0.30 | $1.50 |
32,000 < Tokens ≤ 128,000 | $0.50 | $2.50 |
128,000 < Tokens ≤ 256,000 | $0.80 | $4.00 |
256,000 < Tokens ≤ 1,000,000 | $1.6 | $9.60 |
Global
In global deployment mode, the endpoint and data storage are both located in the US (Virginia) region. Model inference compute resources are dynamically scheduled worldwide.
Model | Version | Context window | Max input | Max output | Input cost | Output cost |
(tokens) | (per 1M tokens) | |||||
qwen3-coder-plus Currently qwen3-coder-plus-2025-09-23 | Stable | 1,000,000 | 997,952 | 65,536 | Pricing is tiered. See the note below the table. | |
qwen3-coder-plus-2025-09-23 | Snapshot | |||||
qwen3-coder-plus-2025-07-22 | Snapshot | |||||
qwen3-coder-flash Currently qwen3-coder-flash-2025-07-28 | Stable | |||||
qwen3-coder-flash-2025-07-28 | Snapshot | |||||
The models above use tiered pricing based on the number of input tokens in the current request.
qwen3-coder-plus series
qwen3-coder-plus, qwen3-coder-plus-2025-09-23, and qwen3-coder-plus-2025-07-22 are priced as follows. qwen3-coder-plus supports context cache. Input text that hits the implicit cache is billed at 20% of the unit price.
Input tokens per request | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
0<Token≤32K | $1 | $5 |
32,000 < Tokens ≤ 128,000 | $1.80 | $9 |
128,000 < Tokens ≤ 256,000 | $3 | $15 |
256,000 < Tokens ≤ 1,000,000 | $6 | $60 |
qwen3-coder-flash series
qwen3-coder-flash and qwen3-coder-flash-2025-07-28 are priced as follows. qwen3-coder-flash supports context cache. Input text that hits the cache is billed at 20% of the unit price.
Input tokens per request | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
0 < Token ≤ 32K | $0.3 | $1.5 |
32K < Tokens ≤ 128K | $0.5 | $2.5 |
128K < Tokens ≤ 256K | $0.8 | $4 |
256K < Tokens ≤ 1M | $1.6 | $9.6 |
Mainland China
In Mainland China deployment mode, the endpoint and data storage are both located in the Beijing region. Model inference compute resources are limited to Mainland China.
Model | Version | Context window | Max input | Max output | Input cost | Output cost |
(tokens) | (per 1M tokens) | |||||
qwen3-coder-plus Currently qwen3-coder-plus-2025-09-23 | Stable | 1,000,000 | 997,952 | 65,536 | Tiered pricing. See details below. | |
qwen3-coder-plus-2025-09-23 | Snapshot | |||||
qwen3-coder-plus-2025-07-22 | Snapshot | |||||
qwen3-coder-flash Currently qwen3-coder-flash-2025-07-28 | Stable | |||||
qwen3-coder-flash-2025-07-28 | Snapshot | |||||
The models above use tiered pricing based on the number of input tokens in the current request.
qwen3-coder-plus series
qwen3-coder-plus, qwen3-coder-plus-2025-09-23, and qwen3-coder-plus-2025-07-22 are priced as follows. qwen3-coder-plus supports context cache. Input text that hits the implicit cache is billed at 20% of the unit price, while input text that hits the explicit cache is billed at 10% of the unit price.
Input tokens per request | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
0<Token≤32K | $0.574 | $2.294 |
32K < Tokens ≤ 128K | $0.861 | $3.441 |
128K < Tokens ≤ 256K | $1.434 | $5.735 |
256K < Tokens ≤ 1M | $2.868 | $28.671 |
qwen3-coder-flash series
qwen3-coder-flash and qwen3-coder-flash-2025-07-28 are priced as follows. qwen3-coder-flash supports context cache. Input text that hits the implicit cache is billed at 20% of the unit price, while input text that hits the explicit cache is billed at 10% of the unit price.
Input tokens per request | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
0 < Token ≤ 32K | $0.144 | $0.574 |
32 K < Tokens ≤ 128 K | $0.216 | $0.861 |
128 K < Tokens ≤ 256 K | $0.359 | $1.434 |
256 K < Tokens ≤ 1 M | $0.717 | $3.584 |
Qwen-MT
Qwen-MT is a flagship Large Language Model (LLM) for translation, fully upgraded from Qwen 3. It supports translation between 92 languages, such as Chinese, English, Japanese, Korean, French, Spanish, German, Thai, Indonesian, Vietnamese, and Arabic. It features comprehensive upgrades in model performance and translation quality. The model offers more stable glossary customization, format retention, and domain-specific prompting, making translations more accurate and natural. Usage
International
In international deployment mode, the endpoint and data storage are both located in the Singapore region. Model inference computing resources are dynamically scheduled worldwide (excluding Mainland China).
Model | Context window | Max input | Max output | Input cost | Output cost | Free quota |
(tokens) | (per 1M tokens) | |||||
qwen-mt-plus Part of Qwen3-MT | 16,384 | 8,192 | 8,192 | $2.46 | $7.37 | 1 million tokens Valid for 90 days after activating Model Studio |
qwen-mt-flash Part of Qwen3-MT | $0.16 | $0.49 | ||||
qwen-mt-lite Part of Qwen3-MT | $0.12 | $0.36 | ||||
qwen-mt-turbo Part of Qwen3-MT | $0.16 | $0.49 | ||||
Global
In global deployment mode, the endpoint and data storage are both located in the US (Virginia) region. Model inference computing resources are dynamically scheduled worldwide.
Model | Context window | Max input | Max output | Input cost | Output cost |
(tokens) | (per 1M tokens) | ||||
qwen-mt-plus Part of Qwen3-MT | 16,384 | 8,192 | 8,192 | $2.46 | $7.37 |
qwen-mt-flash Part of Qwen3-MT | $0.16 | $0.49 | |||
qwen-mt-lite Part of Qwen3-MT | $0.12 | $0.36 | |||
Mainland China
In Mainland China deployment mode, the endpoint and data storage are both located in the Beijing region. Model inference computing resources are limited to Mainland China.
Model | Context window | Max input | Max output | Input cost | Output cost |
(tokens) | (per 1M tokens) | ||||
qwen-mt-plus Belongs toQwen3-MT | 16,384 | 8,192 | 8,192 | $0.259 | $0.775 |
qwen-mt-flash Belongs toQwen3-MT | $0.101 | $0.280 | |||
qwen-mt-lite Belongs toQwen3-MT | $0.086 | $0.229 | |||
qwen-mt-turbo Belongs toQwen3-MT | $0.101 | $0.280 | |||
Qwen data mining model
The Qwen data mining model extracts structured information from documents for use in data annotation, content moderation, and other applications. Usage | API reference
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Model | Context window | Max input | Max output | Input cost | Output cost | Free quota |
(tokens) | (per 1M tokens) | |||||
qwen-doc-turbo | 262,144 | 253,952 | 32,768 | $0.087 | $0.144 | No free quota |
Qwen deep research model
The Qwen deep research model can break down complex problems, perform reasoning and analysis using web searches, and generate research reports.Usage | API reference
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Model | Context window | Max input | Max output | Input cost | Output cost |
(tokens) | (per 1K tokens) | ||||
qwen-deep-research | 1,000,000 | 997,952 | 32,768 | $0.007742 | $0.023367 |
Text generation - Qwen - Open source
In the model names, xxb indicates the parameter size. For example, qwen2-72b-instruct indicates a parameter size of 72 billion (72B).
Model Studio supports invoking the open-source versions of Qwen. You do not need to deploy the models locally. For open-source versions, we recommend using the Qwen3 and Qwen2.5 models.
Qwen3
The qwen3-next-80b-a3b-thinking model, released in September 2025, supports only thinking mode. It improves instruction-following capabilities and delivers more concise summary responses than qwen3-235b-a22b-thinking-2507.
The qwen3-next-80b-a3b-instruct model, released in September 2025, supports only non-thinking mode. It enhances Chinese understanding, logical reasoning, and text generation capabilities compared to qwen3-235b-a22b-instruct-2507.
The qwen3-235b-a22b-thinking-2507 and qwen3-30b-a3b-thinking-2507 models, released in July 2025, support only thinking mode and are upgrades of qwen3-235b-a22b (thinking mode) and qwen3-30b-a3b (thinking mode), respectively.
The qwen3-235b-a22b-instruct-2507 and qwen3-30b-a3b-instruct-2507 models, released in July 2025, support only non-thinking mode and are upgrades of qwen3-235b-a22b (non-thinking mode) and qwen3-30b-a3b (non-thinking mode), respectively.
The Qwen3 models, released in April 2025, support both thinking and non-thinking modes. You can switch between modes using the enable_thinking parameter. Additionally, Qwen3 models deliver significant improvements in the following areas:
Reasoning ability: Significantly outperforms QwQ and similarly sized non-reasoning models on evaluations for math, code, and logical reasoning, achieving top-tier industry performance for a model of its size.
Human preference alignment: Features greatly enhanced capabilities for creative writing, role assumption, multi-turn conversation, and instruction following. Its general abilities significantly surpass those of similarly sized models.
Agent capabilities: Achieves industry-leading performance in both thinking and non-thinking modes and enables precise external tool invocation.
Multilingual support: Supports over 100 languages and dialects and provides notable improvements in multilingual translation, instruction understanding, and commonsense reasoning.
Response formatting: Fixes issues found in previouss, such as incorrect Markdown rendering, response truncation, and incorrectly formatted boxed output.
Qwen3 open-source models released in April 2025 do not support non-streaming output in thinking mode.
If you enable thinking mode for Qwen3 open-source models and no reasoning process appears in the output, billing applies at the non-thinking mode rate.
Thinking | Non-thinking mode | Usage
International
In international deployment mode, the endpoint and data storage are both located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding Mainland China).
Model | Mode | Context window | Max input | Max CoT | Max output | Input cost | Output cost | Free quota |
(tokens) | (per 1M tokens) | |||||||
qwen3-next-80b-a3b-thinking | Thinking only | 131,072 | 126,976 | 81,920 | 32,768 | $0.15 | $1.2 | 1 million tokens each Valid for 90 days after activating Model Studio |
qwen3-next-80b-a3b-instruct | Non-thinking | 129,024 | - | |||||
qwen3-235b-a22b-thinking-2507 | Thinking only | 126,976 | 81,920 | $0.23 | $2.3 | |||
qwen3-235b-a22b-instruct-2507 | Non-thinking | 129,024 | - | $0.92 | ||||
qwen3-30b-a3b-thinking-2507 | Thinking only | 126,976 | 81,920 | $0.2 | $2.4 | |||
qwen3-30b-a3b-instruct-2507 | Non-thinking | 129,024 | - | $0.8 | ||||
qwen3-235b-a22b This model and the following models were released in April 2025. | Non-thinking | 129,024 | - | 16,384 | $0.7 | $2.8 | ||
Thinking | 98,304 | 38,912 | $8.4 | |||||
qwen3-32b | Non-thinking | 129,024 | - | $0.16 | $0.64 | |||
Thinking | 98,304 | 38,912 | ||||||
qwen3-30b-a3b | Non-thinking | 129,024 | - | $0.2 | $0.8 | |||
Thinking | 98,304 | 38,912 | $2.4 | |||||
qwen3-14b | Non-thinking | 129,024 | - | 8,192 | $0.35 | $1.4 | ||
Thinking | 98,304 | 38,912 | $4.2 | |||||
qwen3-8b | Non-thinking | 129,024 | - | $0.18 | $0.7 | |||
Thinking | 98,304 | 38,912 | $2.1 | |||||
qwen3-4b | Non-thinking | 129,024 | - | $0.11 | $0.42 | |||
Thinking | 98,304 | 38,912 | $1.26 | |||||
qwen3-1.7b | Non-thinking | 32,768 | 30,720 | - | $0.42 | |||
Thinking | 28,672 | The sum of the values must not exceed 30,720. | $1.26 | |||||
qwen3-0.6b | Non-thinking | 30,720 | - | $0.42 | ||||
Thinking | 28,672 | The sum of the inputs cannot exceed 30,720. | $1.26 | |||||
Global
In global deployment mode, the endpoint and data storage are both located in the US (Virginia) region. Model inference compute resources are dynamically scheduled worldwide.
Model | Mode | Context window | Max input | Max CoT | Max output | Input cost | Output cost | Free quota |
(tokens) | (per 1M Tokens) | |||||||
qwen3-next-80b-a3b-thinking | Thinking only | 131,072 | 126,976 | 81,920 | 32,768 | $0.15 | $1.2 | No free quota |
qwen3-next-80b-a3b-instruct | Non-thinking | 129,024 | - | |||||
qwen3-235b-a22b-thinking-2507 | Thinking only | 126,976 | 81,920 | $0.23 | $2.3 | |||
qwen3-235b-a22b-instruct-2507 | Non-thinking | 129,024 | - | $0.92 | ||||
qwen3-30b-a3b-thinking-2507 | Thinking only | 126,976 | 81,920 | $0.2 | $2.4 | |||
qwen3-30b-a3b-instruct-2507 | Non-thinking | 129,024 | - | $0.8 | ||||
qwen3-235b-a22b | Non-thinking | 129,024 | - | 16,384 | $0.7 | $2.8 | ||
Thinking | 98,304 | 38,912 | $8.4 | |||||
qwen3-32b | Non-thinking | 129,024 | - | $0.16 | $0.64 | |||
Thinking | 98,304 | 38,912 | ||||||
qwen3-30b-a3b | Non-thinking | 129,024 | - | $0.2 | $0.8 | |||
Thinking | 98,304 | 38,912 | $2.4 | |||||
qwen3-14b | Non-thinking | 129,024 | - | 8,192 | $0.35 | $1.4 | ||
Thinking | 98,304 | 38,912 | $4.2 | |||||
qwen3-8b | Non-thinking | 129,024 | - | $0.18 | $0.7 | |||
Thinking | 98,304 | 38,912 | $2.1 | |||||
Mainland China
In Mainland China deployment mode, the endpoint and data storage are both located in the Beijing region. Model inference compute resources are limited to Mainland China.
Model | Mode | Context window | Max input | Max CoT | Max output | Input cost | Output cost | Free quota |
(tokens) | (per 1M tokens) | |||||||
qwen3-next-80b-a3b-thinking | Thinking only | 131,072 | 126,976 | 81,920 | 32,768 | $0.144 | $1.434 | No free quota |
qwen3-next-80b-a3b-instruct | Thinking mode is unavailable. | 129,024 | - | $0.574 | ||||
qwen3-235b-a22b-thinking-2507 | Thinking only | 126,976 | 81,920 | $0.287 | $2.868 | |||
qwen3-235b-a22b-instruct-2507 | Non-thinking | 129,024 | - | $1.147 | ||||
qwen3-30b-a3b-thinking-2507 | Thinking only | 126,976 | 81,920 | $0.108 | $1.076 | |||
qwen3-30b-a3b-instruct-2507 | Non-thinking | 129,024 | - | $0.431 | ||||
qwen3-235b-a22b | Non-thinking | 129,024 | - | 16,384 | $0.287 | $1.147 | ||
Thinking | 98,304 | 38,912 | $2.868 | |||||
qwen3-32b | Non-thinking | 129,024 | - | $0.287 | $1.147 | |||
Thinking | 98,304 | 38,912 | $2.868 | |||||
qwen3-30b-a3b | Non-thinking | 129,024 | - | $0.108 | $0.431 | |||
Thinking | 98,304 | 38,912 | $1.076 | |||||
qwen3-14b | Non-thinking | 129,024 | - | 8,192 | $0.144 | $0.574 | ||
Thinking | 98,304 | 38,912 | $1.434 | |||||
qwen3-8b | Non-thinking | 129,024 | - | $0.072 | $0.287 | |||
Thinking | 98,304 | 38,912 | $0.717 | |||||
qwen3-4b | Non-thinking | 129,024 | - | $0.044 | $0.173 | |||
Thinking | 98,304 | 38,912 | $0.431 | |||||
qwen3-1.7b | Non-thinking | 32,768 | 30,720 | - | $0.173 | |||
Thinking | 28,672 | The sum of the input values must not exceed 30,720. | $0.431 | |||||
qwen3-0.6b | Non-thinking | 30,720 | - | $0.173 | ||||
Thinking | 28,672 | The sum of the input must not exceed 30,720. | $0.431 | |||||
QwQ - Open source
The QwQ reasoning model is trained on Qwen2.5-32B. Reinforcement learning has significantly improved its inference capabilities. Core metrics for math and code (AIME 24/25, LiveCodeBench) and some general metrics (IFEval, LiveBench) are comparable to the full-power version of DeepSeek-R1. All metrics significantly exceed those of DeepSeek-R1-Distill-Qwen-32B, which is also based on Qwen2.5-32B. Usage | API reference
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Model | Context window | Max input | Max chain-of-thought | Max response | Input cost | Output cost |
(tokens) | (per 1M tokens) | |||||
qwq-32b | 131,072 | 98,304 | 32,768 | 8,192 | $0.287 | $0.861 |
QwQ-Preview
The qwq-32b-preview model is an experimental research model developed by the Qwen team in 2024. It focuses on enhancing AI reasoning capabilities, especially in math and programming. For more information about the limitations of the qwq-32b-preview model, see the QwQ official blog. Usage | API reference | Try it online
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Model | Context window | Max input | Max output | Input cost | Output cost |
(tokens) | (per 1M tokens) | ||||
qwq-32b-preview | 32,768 | 30,720 | 16,384 | $0.287 | $0.861 |
Qwen2.5
QVQ
The qvq-72b-preview model is an experimental research model developed by the Qwen team. It focuses on enhancing visual reasoning capabilities, especially in mathematical reasoning. For more information about the limitations of the qvq-72b-preview model, see the QVQ official blog.Usage | API reference
To have the model output its thinking process before the final answer, you can use the commercial version of the QVQ model.
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Model | Context window | Max input | Max output | Input cost | Output cost |
(tokens) | (per 1M tokens) | ||||
qvq-72b-preview | 32,768 | 16,384 Max 16,384 tokens per image | 16,384 | $1.721 | $5.161 |
Qwen-Omni
This is a new multimodal large model for understanding and generation, trained on Qwen2.5. It supports text, image, speech, and video inputs, and can generate text and speech simultaneously in a stream. Its multimodal content understanding speed is significantly improved.Usage | API reference
International
In International deployment mode, the access point and data storage are located in the Singapore region, and inference computing resources are dynamically scheduled globally (excluding Mainland China).
Model | Context window | Max input | Max output | Free quota |
(tokens) | ||||
qwen2.5-omni-7b | 32,768 | 30,720 | 2,048 | 1 million tokens (regardless of modality) Valid for 90 days after activating Model Studio. |
After the free quota is used up, the following billing rules apply to inputs and outputs:
|
|
Mainland China
In Mainland China deployment mode, the access point and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Model | Context window | Max input | Max output |
(tokens) | |||
qwen2.5-omni-7b | 32,768 | 30,720 | 2,048 |
The billing rules for inputs and outputs are as follows:
|
|
Qwen3-Omni-Captioner
Qwen3-Omni-Captioner is an open-source model based on Qwen3-Omni. Without any prompts, it automatically generates accurate and comprehensive descriptions for complex audio, such as speech, ambient sounds, music, and sound effects. It can identify speaker emotions, musical elements (such as style and instruments), and sensitive information, making it suitable for applications such as audio content analysis, security audits, intent recognition, and audio editing. Usage | API reference
International
In international deployment mode, the endpoint and data storage are both located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Context window | Max input | Max output | Input cost | Output cost | Free quota |
(tokens) | (per 1M tokens) | |||||
qwen3-omni-30b-a3b-captioner | 65,536 | 32,768 | 32,768 | $3.81 | $3.06 | 1 million tokens Valid for 90 days after activating Model Studio |
Mainland China
In Mainland China deployment mode, the endpoint and data storage are both located in the Beijing region. Model inference compute resources are limited to Mainland China.
Model | Context window | Max input | Max output | Input cost | Output cost | Free quota |
(tokens) | (per 1M tokens) | |||||
qwen3-omni-30b-a3b-captioner | 65,536 | 32,768 | 32,768 | $2.265 | $1.821 | No free quota. |
Qwen-VL
This is the open-source version of Alibaba Cloud's Qwen-VL. Usage | API reference
Compared to Qwen2.5-VL, Qwen3-VL delivers significant improvements:
Agent interaction: It can operate computer or mobile interfaces, recognize GUI elements, understand their functions, and call tools to perform tasks, achieving top-tier performance in evaluations such as OS World.
Visual coding: It generates code from images or videos and supports creating HTML, CSS, and JavaScript code from design mockups, website screenshots, and similar inputs.
Spatial intelligence: It supports 2D and 3D positioning and accurately judges object orientation, perspective changes, and occlusion relationships.
Long video understanding: It supports understanding video content up to 20 minutes long and provides precise localization down to the second.
Deep thinking: It has deep thinking capabilities and excels at capturing fine details and analyzing cause-and-effect relationships, achieving top-tier performance in evaluations such as MathVista and MMMU.
OCR: Language support is expanded to 33 languages. The model delivers more stable performance in scenarios with complex lighting, blur, or tilted text. It also provides significantly improved accuracy for rare characters, ancient texts, and professional terminology.
International
In international deployment mode, the endpoint and data storage are both located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Mode | Context window | Max input | Max CoT | Max output | Input cost | Output cost CoT + output | Free quota |
(tokens) | (per 1M tokens) | |||||||
qwen3-vl-235b-a22b-thinking | Thinking only | 126,976 | 81,920 | $0.4 | $4 | 1 million tokens each Valid for 90 days after activating Model Studio | ||
qwen3-vl-235b-a22b-instruct | Non-thinking | 129,024 | - | $1.6 | ||||
qwen3-vl-32b-thinking | Thinking only | 131,072 | 126,976 | 81,920 | 32,768 | $0.16 | $0.64 | |
qwen3-vl-32b-instruct | Non-thinking only | 129,024 | - | |||||
qwen3-vl-30b-a3b-thinking | Thinking only | 126,976 | 81,920 | $0.2 | $2.4 | |||
qwen3-vl-30b-a3b-instruct | Non-thinking | 129,024 | - | $0.8 | ||||
qwen3-vl-8b-thinking | Thinking | 126,976 | 81,920 | $0.18 | $2.1 | |||
qwen3-vl-8b-instruct | Non-thinking | 129,024 | - | $0.7 | ||||
Global
In global deployment mode, the endpoint and data storage are both located in the US (Virginia) region. Model inference compute resources are dynamically scheduled worldwide.
Model | Mode | Context window | Max input | Max CoT | Max output | Input cost | Output cost CoT + output |
(tokens) | (per 1M tokens) | ||||||
qwen3-vl-235b-a22b-thinking | Thinking only | 126,976 | 81,920 | $0.4 | $4 | ||
qwen3-vl-235b-a22b-instruct | Non-thinking only | 129,024 | - | $1.6 | |||
qwen3-vl-32b-thinking | Thinking only | 131,072 | 126,976 | 81,920 | 32,768 | $0.16 | $0.64 |
qwen3-vl-32b-instruct | Non-thinking only | 129,024 | - | ||||
qwen3-vl-30b-a3b-thinking | Thinking only | 126,976 | 81,920 | $0.2 | $2.4 | ||
qwen3-vl-30b-a3b-instruct | Non-thinking only | 129,024 | - | $0.8 | |||
qwen3-vl-8b-thinking | Thinking only | 126,976 | 81,920 | $0.18 | $2.1 | ||
qwen3-vl-8b-instruct | Non-thinking only | 129,024 | - | $0.7 | |||
Mainland China
In Mainland China deployment mode, the endpoint and data storage are both located in the Beijing region. Model inference compute resources are limited to Mainland China.
Model | Mode | Context window | Max input | Max CoT | Max output | Input cost | Output cost CoT + output | Free quota |
(tokens) | (per 1M tokens) | |||||||
qwen3-vl-235b-a22b-thinking | Thinking only | 131,072 | 126,976 | 81,920 | $0.287 | $2.867 | No free quota | |
qwen3-vl-235b-a22b-instruct | Non-thinking only | 129,024 | - | $1.147 | ||||
qwen3-vl-32b-thinking | Thinking only | 131,072 | 126,976 | 81,920 | 32,768 | $0.287 | $2.868 | |
qwen3-vl-32b-instruct | Non-thinking only | 129,024 | - | $1.147 | ||||
qwen3-vl-30b-a3b-thinking | Thinking only | 126,976 | 81,920 | $0.108 | $1.076 | |||
qwen3-vl-30b-a3b-instruct | Non-thinking only | 129,024 | - | $0.431 | ||||
qwen3-vl-8b-thinking | Thinking only | 126,976 | 81,920 | $0.072 | $0.717 | |||
qwen3-vl-8b-instruct | Non-thinking only | 129,024 | - | $0.287 | ||||
Qwen-Math
This is a language model built on the Qwen model that is specialized for solving mathematical problems. Qwen2.5-Math supports Chinese and English and integrates multiple reasoning methods, such as Chain of Thought (CoT), Program of Thought (PoT), and Tool-Integrated Reasoning (TIR). Usage | API reference | Try it online
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Model | Context | Max input | Max output | Input cost | Output cost |
(tokens) | (per 1M tokens) | ||||
qwen2.5-math-72b-instruct | 4,096 | 3,072 | 3,072 | $0.574 | $1.721 |
qwen2.5-math-7b-instruct | $0.144 | $0.287 | |||
qwen2.5-math-1.5b-instruct | Free for a limited time | ||||
Qwen-Coder
Qwen-Coder is an open-source code model from the Qwen series. The latest Qwen3-Coder series has powerful coding agent capabilities. It excels at tool calling, environment interaction, and autonomous programming. The model combines excellent coding skills with general-purpose capabilities. Usage | API reference
International
In international deployment mode, the endpoint and data storage are both located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide, excluding Mainland China.
Model | Context window | Max input | Max output | Input cost | Output cost | Free quota |
(Number of tokens) | ||||||
qwen3-coder-480b-a35b-instruct | 1 million tokens each Valid for 90 days after activating Model Studio | Tiered pricing. See the note below the table. | 65,536 | 204,800 | 262,144 | |
qwen3-coder-30b-a3b-instruct | ||||||
The above models use tiered pricing based on the number of input tokens in the current request.
Model | Input tokens per request | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
qwen3-coder-480b-a35b-instruct | 0 < tokens ≤ 32K | $1.5 | $7.5 |
32K < tokens ≤ 128K | $2.7 | $13.5 | |
128K < tokens ≤ 200K | $4.5 | $22.5 | |
qwen3-coder-30b-a3b-instruct | 0 < tokens ≤ 32K | $0.45 | $2.25 |
32K < tokens ≤ 128K | $0.75 | $3.75 | |
128K < tokens ≤ 200K | $1.2 | $6 |
Global
In global deployment mode, the endpoint and data storage are both located in the US (Virginia) region. Model inference compute resources are dynamically scheduled worldwide.
Model | Context window | Max input | Max output | Input cost | Output cost |
(tokens) | (per 1M tokens) | ||||
qwen3-coder-480b-a35b-instruct | 262,144 | 204,800 | 65,536 | Pricing is tiered. See the note below the table. | |
qwen3-coder-30b-a3b-instruct | |||||
qwen3-coder-480b-a35b-instruct and qwen3-coder-30b-a3b-instruct use tiered pricing based on the number of input tokens in the current request.
Model | Input tokens per request | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
qwen3-coder-480b-a35b-instruct | 0 < Tokens ≤ 32K | $1.50 | $7.50 |
32K < Tokens ≤ 128K | $2.70 | $13.50 | |
128K < Tokens ≤ 200K | $4.50 | $22.50 | |
qwen3-coder-30b-a3b-instruct | 0 < Tokens ≤ 32K | $0.45 | $2.25 |
32K < Tokens ≤ 128K | $0.75 | $3.75 | |
128K < Tokens ≤ 200K | $1.2 | $6 |
Mainland China
In Mainland China deployment mode, the endpoint and data storage are both located in the Beijing region. Model inference compute resources are limited to Mainland China.
Model | Context window | Max input | Max output | Input cost | Output cost |
(tokens) | (per 1M tokens) | ||||
qwen3-coder-480b-a35b-instruct | Tiered pricing. See the description below the table. | 65,536 | 204,800 | 262,144 | |
qwen3-coder-30b-a3b-instruct | |||||
The above models use tiered pricing based on the number of input tokens in the current request.
Model | Input tokens per request | Cost per 1M input tokens | Cost per 1M output tokens |
qwen3-coder-480b-a35b-instruct | 0 < Tokens ≤ 32K | $0.861 | $3.441 |
32K < Tokens ≤ 128K | $1.291 | $5.161 | |
128K < Tokens ≤ 200K | $2.151 | $8.602 | |
qwen3-coder-30b-a3b-instruct | 0 < Tokens ≤ 32K | $0.216 | $0.861 |
32K < Tokens ≤ 128K | $0.323 | $1.291 | |
128K < Tokens ≤ 200K | $0.538 | $2.151 |
Text generation - Third-party
DeepSeek
DeepSeek is a large language model from DeepSeek AI. API reference | Try it online
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Model | Context window | Max input | Max chain-of-thought | Max response | Input cost | Output cost |
(tokens) | (per 1M tokens) | |||||
deepseek-v3.2 685B full-power version Context cache discounts | 131,072 | 98,304 | 32,768 | 65,536 | $0.287 | $0.431 |
deepseek-v3.2-exp 685B full-power version | ||||||
deepseek-v3.1 685B full-power version | $0.574 | $1.721 | ||||
deepseek-r1 685B full-power version Batch half price | 16,384 | $2.294 | ||||
deepseek-r1-0528 685B full-power version | ||||||
deepseek-v3 671B full-power version Batch half price | 131,072 | N/A | $0.287 | $1.147 | ||
deepseek-r1-distill-qwen-1.5b Based on Qwen2.5-Math-1.5B | 32,768 | 32,768 | 16,384 | 16,384 | Free trial for a limited time | |
deepseek-r1-distill-qwen-7b Based on Qwen2.5-Math-7B | $0.072 | $0.144 | ||||
deepseek-r1-distill-qwen-14b Based on Qwen2.5-14B | $0.144 | $0.431 | ||||
deepseek-r1-distill-qwen-32b Based on Qwen2.5-32B | $0.287 | $0.861 | ||||
deepseek-r1-distill-llama-8b Based on Llama-3.1-8B | Free trial for a limited time | |||||
deepseek-r1-distill-llama-70b Based on Llama-3.3-70B | ||||||
Kimi
Kimi-K2 is a large language model launched by Moonshot AI. It has excellent coding and tool-calling capabilities. Usage | Try it online
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Model | Mode | Context window | Max input | Max CoT | Max response | Input cost | Output cost |
(tokens) | (per 1M tokens) | ||||||
kimi-k2.5 | Thinking mode | 262,144 | 258,048 | 32,768 | 32,768 | $0.574 | $3.011 |
kimi-k2.5 | Non-thinking mode | 262,144 | 260,096 | - | 32,768 | $0.574 | $3.011 |
kimi-k2-thinking | Thinking mode | 262,144 | 229,376 | 32,768 | 16,384 | $0.574 | $2.294 |
Moonshot-Kimi-K2-Instruct | Non-thinking mode | 131,072 | 131,072 | - | 8,192 | $0.574 | $2.294 |
GLM
The GLM series models are hybrid reasoning models from Zhipu AI that are designed for agents and support two modes: thinking and non-thinking. GLM
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Model | Context window | Max input | Max chain-of-thought | Max response | Input cost | Output cost |
(tokens) | (per 1M tokens) | |||||
glm-4.7 | 202,752 | 169,984 | 32,768 | 16,384 | Tiered pricing, see the table below. | |
glm-4.6 | ||||||
The above models use tiered pricing based on input tokens per request.
Model | Input tokens per request | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
glm-4.7 | 0<Token<=32K | $0.431 | $2.007 |
32K<Token<=166K | $0.574 | $2.294 | |
glm-4.6 | 0<Token<=32K | $0.431 | $2.007 |
32K<Token<=166K | $0.574 | $2.294 |
The models are not integrated third-party services, but deployed on Model Studio servers.
GLM models have the same prices under both thinking and non-thinking modes.
Image generation
Qwen-Image
The Qwen text-to-image model excels at rendering complex text, especially in Chinese and English. API reference
International
In International deployment mode, the access point and data storage are located in the Singapore region, and inference computing resources are dynamically scheduled globally (excluding Mainland China).
Model | Unit price | Free quota |
qwen-image-max Currently has the same capabilities as qwen-image-max-2025-12-30 | $0.075/image | Free quota: 100 images for each model Valid for 90 days after activating Model Studio |
qwen-image-max-2025-12-30 | $0.075/image | |
qwen-image-plus Currently has the same capabilities as qwen-image | $0.03/image | |
qwen-image-plus-2026-01-09 | $0.03/image | |
qwen-image | $0.035/image |
Mainland China
In Mainland China deployment mode, the access point and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Model | Unit price | Free quota |
qwen-image-max Currently has the same capabilities as qwen-image-max-2025-12-30 | $0.071677/image | No free quota |
qwen-image-max-2025-12-30 | $0.071677/image | |
qwen-image-plus Currently has the same capabilities as qwen-image | $0.028671/image | |
qwen-image-plus-2026-01-09 | $0.028671/image | |
qwen-image | $0.035/image |
Input prompt | Output image |
Healing-style hand-drawn poster featuring three puppies playing with a ball on lush green grass, adorned with decorative elements such as birds and stars. The main title “Come Play Ball!” is prominently displayed at the top in bold, blue cartoon font. Below it, the subtitle “Come [Show Off Your Skills]!” appears in green font. A speech bubble adds playful charm with the text: “Hehe, watch me amaze my little friends next!” At the bottom, supplementary text reads: “We get to play ball with our friends again!” The color palette centers on fresh greens and blues, accented with bright pink and yellow tones to highlight a cheerful, childlike atmosphere. |
|
Qwen-Image-Edit
The Qwen image editing model supports precise text editing in Chinese and English. It also supports operations such as color adjustment, detail enhancement, style transfer, adding or removing objects, and changing positions and actions. These features enable complex editing of images and text. API reference
International
In International deployment mode, the access point and data storage are located in the Singapore region, and inference computing resources are dynamically scheduled globally (excluding Mainland China).
Model | Unit price | Free quota |
qwen-image-edit-max Currently has the same capabilities as qwen-image-edit-max-2026-01-16 | $0.075/image | Free quota: 100 images for each model Valid for 90 days after activating Model Studio |
qwen-image-edit-max-2026-01-16 | $0.075/image | |
qwen-image-edit-plus Currently has the same capabilities as qwen-image-edit-plus-2025-10-30 | $0.03/image | |
qwen-image-edit-plus-2025-12-15 | $0.03/image | |
qwen-image-edit-plus-2025-10-30 | $0.03/image | |
qwen-image-edit | $0.045/image |
Mainland China
In Mainland China deployment mode, the access point and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Model | Unit price | Free quota |
qwen-image-edit-max Currently has the same capabilities as qwen-image-edit-max-2026-01-16 | $0.071677/image | No free quota |
qwen-image-edit-max-2026-01-16 | $0.071677/image | |
qwen-image-edit-plus Currently has the same capabilities as qwen-image-edit-plus-2025-10-30 | $0.028671/image | |
qwen-image-edit-plus-2025-12-15 | $0.028671/image | |
qwen-image-edit-plus-2025-10-30 | $0.028671/image | |
qwen-image-edit | $0.043/image |
Original image |
Make the person bend over and hold the dog's front paw. |
Original image |
Change the text on the letter blocks from 'HEALTH INSURANCE' to 'Tomorrow will be better'. |
Original image |
Change the dotted shirt to a light blue shirt. |
Original image |
Change the background to Antarctica. |
Original image |
Create a cartoon-style profile picture of the person. |
Original image |
Remove the hair from the dinner plate. |
Qwen-MT-Image
The Qwen image translation model supports translating text from images in 11 languages into Chinese or English. It accurately preserves the original layout and content information and provides custom features such as term definition, sensitive word filtering, and image entity detection. API reference
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Model | Unit price | Free quota |
qwen-mt-image | $0.000431/image | No free quota |
Original image |
Japanese |
Portuguese |
Arabic |
Tongyi - text-to-Image - Z-Image
Tongyi - text-to-image - Z-Image is a lightweight model that quickly generates high-quality images. The model supports Chinese and English text rendering, complex semantic understanding, various styles, and multiple resolutions and aspect ratios. API reference
International
In International deployment mode, the access point and data storage are located in the Singapore region, and inference computing resources are dynamically scheduled globally (excluding Mainland China).
Model | Unit price | Free quota (Note) Valid for 90 days after activating Model Studio |
z-image-turbo | Prompt extension disabled ( Prompt extension enabled ( | 100 images |
Mainland China
In Mainland China deployment mode, the access point and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Model | Unit price | Free quota |
z-image-turbo | Prompt extension disabled ( Prompt extension enabled ( | No free quota |
Input prompt | Output image |
Photo of a stylish young woman with short black hair standing confidently in front of a vibrant cartoon-style mural wall. She wears an all-black outfit: a puffed bomber jacket with a ruffled collar, cargo shorts, fishnet tights, and chunky black Doc Martens, with a gold chain dangling from her waist. The background features four colorful comic-style panels: one reads “GRAND STAGE” and includes sneakers and a Gatorade bottle; another displays green Nike sneakers and a slice of pizza; the third reads “HARAJUKU st” with floating shoes; and the fourth shows a blue mouse riding a skateboard with the text “Takeshita WELCOME.” Dominant bright colors include yellow, teal, orange, pink, and green. Speech bubbles, halftone patterns, and playful characters enhance the urban street-art aesthetic. Daylight evenly illuminates the scene, and the ground beneath her feet is white tiled pavement. Full-body portrait, centered composition, slightly tilted stance, direct eye contact with the camera. High detail, sharp focus, dynamic framing. |
|
Wan text-to-image
The Wan text-to-image model generates high-quality images from text. API reference | Try it online
Global
In Global deployment mode, the access point and data storage are located in the US (Virginia) region, and inference computing resources are dynamically scheduled globally.
Model | Description | Unit price | Free quota (Note) Valid for 90 days after activating Model Studio |
wan2.6-t2i | Wan 2.6. Supports new synchronous interfaces and lets you freely select dimensions within the constraints of total pixel area and aspect ratio. | $0.03/image | No free quota |
International
In International deployment mode, the access point and data storage are located in the Singapore region, and inference computing resources are dynamically scheduled globally (excluding Mainland China).
Model | Description | Unit price | Free quota (Note) Valid for 90 days after activating Model Studio |
wan2.6-t2i | Wan 2.6. Supports new synchronous interfaces and lets you freely select dimensions within the constraints of total pixel area and aspect ratio. | $0.03/image | 50 images |
wan2.5-t2i-preview | Wan 2.5 preview. Removes single-side length limits and lets you freely select dimensions within the constraints of total pixel area and aspect ratio. | $0.03/image | 50 images |
wan2.2-t2i-plus | Wan 2.2 Professional Edition. Fully upgraded in creativity, stability, and realistic texture. | $0.05/image | 100 images |
wan2.2-t2i-flash | Wan 2.2 Flash Edition. Fully upgraded in creativity, stability, and realistic texture. | $0.025/image | 100 images |
wan2.1-t2i-plus | Wan 2.1 Professional Edition. Supports multiple styles and generates images with rich details. | $0.05/image | 200 images |
wan2.1-t2i-turbo | Wan 2.1 Turbo Edition. Supports multiple styles and offers fast generation speed. | $0.025/image | 200 images |
Mainland China
In Mainland China deployment mode, the access point and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Model | Description | Unit price | Free quota (Note) Valid for 90 days after activating Model Studio |
wan2.6-t2i | Wan 2.6. Supports new synchronous interfaces and lets you freely select dimensions within the constraints of total pixel area and aspect ratio. | $0.028671/image | No free quota |
wan2.5-t2i-preview | Wan 2.5 preview. Removes single-side length limits and lets you freely select dimensions within the constraints of total pixel area and aspect ratio. | $0.028671/image | No free quota |
wan2.2-t2i-plus | Wan 2.2 Professional Edition. Fully upgraded in creativity, stability, and realistic texture. | $0.02007/image | No free quota |
wan2.2-t2i-flash | Wan 2.2 Flash Edition. Fully upgraded in creativity, stability, and realistic texture. | $0.028671/image | No free quota |
wanx2.1-t2i-plus | Wan 2.1 Professional Edition. Supports multiple styles and generates images with rich details. | $0.028671/image | No free quota |
wanx2.1-t2i-turbo | Wan 2.1 Turbo Edition. Supports multiple styles and offers fast generation speed. | $0.020070/image | No free quota |
wanx2.0-t2i-turbo | Wan 2.0 Turbo Edition. Excels at textured portraits and creative designs. It is cost-effective. | $0.005735/image | No free quota |
Input prompt | Output image |
A needle-felted Santa Claus holding a gift and a white cat standing next to him against a background of colorful gifts and green plants, creating a cute, warm, and cozy scene. |
|
Wan2.6 image generation and editing
The Wan2.6 image generation model supports image editing and can generate outputs that contain both text and images to meet various generation and integration requirements. API reference.
Global
In Global deployment mode, the access point and data storage are in the US (Virginia) region, and inference computing resources are dynamically scheduled globally.
Model | Unit price | Free quota |
wan2.6-image | $0.03/image | No free quota |
International
In International deployment mode, the access point and data storage are in the Singapore region, and inference computing resources are dynamically scheduled globally (excluding Mainland China).
Model | Unit price | Free quota (Note) Valid for 90 days after activating Model Studio |
wan2.6-image | $0.03/image | 50 images |
Mainland China
In Mainland China deployment mode, the access point and data storage are in the Beijing region, and inference computing resources are limited to Mainland China.
Model | Unit price | Free quota |
wan2.6-image | $0.028671/image | No free quota |
Wan general image editing 2.5
The Wan2.5 general image editing model supports entity-consistent image editing and multi-image fusion. It accepts text, a single image, or multiple images as input. API reference.
International
In International deployment mode, the access point and data storage are located in the Singapore region, and inference computing resources are dynamically scheduled globally (excluding Mainland China).
Model | Unit price | Free quota (Note) Valid for 90 days after activating Model Studio |
wan2.5-i2i-preview | $0.03/image | 50 units |
Mainland China
In Mainland China deployment mode, the access point and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Model | Unit price | Free quota |
wan2.5-i2i-preview | $0.028671/image | No free quota |
|
Feature |
Input example |
Output image |
|
Single-image editing |
|
Change the floral dress to a vintage-style lace long dress with exquisite embroidery details on the collar and cuffs. |
|
Multi-image fusion |
|
Place the alarm clock from Image 1 next to the vase on the dining table in Image 2. |
Wan general image editing 2.1
The Wan2.1 general image editing model performs diverse image editing with simple instructions. It is suitable for scenarios such as outpainting, watermark removal, style transfer, image restoration, and image enhancement. Usage | API reference
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Model | Unit Price | Free Quota |
wanx2.1-imageedit | $0.020070 per image | No free quota |
The general image editing model currently supports the following features:
Model Features | Input image | Input prompt | Output image |
Global stylization |
| French picture book style. |
|
Local stylization |
| Change the house to a wooden plank style. |
|
Instruction-based editing |
| Change the girl's hair to red. |
|
Inpainting | Input image
Masked image (The white area is the mask)
| A ceramic rabbit holding a ceramic flower. | Output image
|
Text watermark removal |
| Remove the text from the image. |
|
Outpainting |
| A green fairy. |
|
Image super-resolution | Blurry image
| Image super-resolution. | Clear image
|
Image colorization |
| Blue background, yellow leaves. |
|
Line art to image |
| A living room in a minimalist Nordic style. |
|
Placeholder Image |
| A cartoon character cautiously peeks out, spying on a brilliant blue gem inside the room. |
|
OutfitAnyone
Compared to the basic version, the OutfitAnyone-Plus model offers improvements in image definition, clothing texture details, and logo restoration. However, it takes longer to generate images and is suitable for scenarios that are not time-sensitive. API reference | Try it online
OutfitAnyone-Image Parsing supports parsing model and clothing images, which can be used for pre-processing and post-processing of OutfitAnyone images. API reference
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Model | Description | Sample input | Sample output |
aitryon-plus | OutfitAnyone-Plus |
|
|
aitryon-parsing-v1 | OutfitAnyone image parsing |
OutfitAnyone pricing
Service | Model | Unit price | Discount | Tier |
OutfitAnyone - Plus | aitryon-plus | $0.071677/image | None | None |
OutfitAnyone - Image parsing | aitryon-parsing-v1 | $0.000574/image | None | None |
Video generation - Wan
Text-to-video
The Wan text-to-video model generates videos from a single sentence. The videos feature rich artistic styles and cinematic quality. API reference | Try it online
Global
In Global deployment mode, the access point and data storage are located in the US (Virginia) region, and inference computing resources are dynamically scheduled globally.
Model | Description | Unit price | Free quota |
wan2.6-t2v | Wan 2.6. Introduces multi-shot narrative capability and supports automatic voiceover and custom audio file input. | 720P: $0.1/second 1080P: $0.15/second | No free quota |
International
In International deployment mode, the access point and data storage are located in the Singapore region, and inference computing resources are dynamically scheduled globally (excluding Mainland China).
Model | Description | Unit price | Free quota (Claim) Valid for 90 days after activating Model Studio |
wan2.6-t2v | Wan 2.6. Introduces multi-shot narrative capability and supports automatic voiceover and custom audio file input. | 720P: $0.10/second 1080P: $0.15/second | 50 seconds |
wan2.5-t2v-preview | Wan 2.5 preview. Supports automatic voiceover and custom audio file input. | 480P: $0.05/second 720P: $0.10/second 1080P: $0.15/second | 50 seconds |
wan2.2-t2v-plus | Wan 2.2 Professional Edition. Significantly improved image detail and motion stability. | 480P: $0.02/second 1080P: $0.10/second | 50 seconds |
wan2.1-t2v-turbo | Wan 2.1 Turbo Edition. Fast generation speed and balanced performance. | $0.036/second | 200 seconds |
wan2.1-t2v-plus | Wan 2.1 Professional Edition. Generates rich details and higher-quality visuals. | $0.10/second | 200 seconds |
US
In US deployment mode, the access point and data storage are located in the US (Virginia) region, and inference computing resources are limited to the US.
Model | Description | Unit price | Free quota |
wan2.6-t2v-us | Wan 2.6. Introduces multi-shot narrative capability and supports automatic voiceover and custom audio file input. | 720P: $0.1/second 1080P: $0.15/second | No free quota |
Mainland China
In Mainland China deployment mode, the access point and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Model | Description | Unit price | Free quota |
wan2.6-t2v | Wan 2.6. Introduces multi-shot narrative capability and supports automatic voiceover and custom audio file input. | 720P: $0.086012/second 1080p: 0.143353 per second | No free quota |
wan2.5-t2v-preview | Wan 2.5 preview. Supports automatic voiceover and custom audio file input. | 480P: $0.043006/second 720P: $0.086012/second 1080P: $0.143353/second | No free quota |
wan2.2-t2v-plus | Wan 2.2 Professional Edition. Significantly improved image detail and motion stability. | 480P: $0.02007/second 1080P: $0.100347/second | No free quota |
wanx2.1-t2v-turbo | Faster generation speed and balanced performance. | $0.034405/second | No free quota |
wanx2.1-t2v-plus | Generates richer details and higher-quality visuals. | $0.100347/second | No free quota |
Input prompt | Output video (wan2.6, multi-shot video) |
Shot from a low angle, in a medium close-up, with warm tones, mixed lighting (the practical light from the desk lamp blends with the overcast light from the window), side lighting, and a central composition. In a classic detective office, wooden bookshelves are filled with old case files and ashtrays. A green desk lamp illuminates a case file spread out in the center of the desk. A fox, wearing a dark brown trench coat and a light gray fedora, sits in a leather chair, its fur crimson, its tail resting lightly on the edge, its fingers slowly turning yellowed pages. Outside, a steady drizzle falls beneath a blue sky, streaking the glass with meandering streaks. It slowly raises its head, its ears twitching slightly, its amber eyes gazing directly at the camera, its mouth clearly moving as it speaks in a smooth, cynical voice: 'The case was cold, colder than a fish in winter. But every chicken has its secrets, and I, for one, intended to find them '. |
Image-to-video - first frame
The Wan image-to-video model uses an input image as the first frame of a video. It then generates the rest of the video based on a prompt. The videos feature rich artistic styles and cinematic quality. API reference | Try it online
Global
In Global deployment mode, the access point and data storage are located in the US (Virginia) region, and inference computing resources are dynamically scheduled globally.
Model | Description | Unit price | Free quota |
wan2.6-i2v | Wan 2.6. Introduces multi-shot narrative capability and supports automatic voiceover and custom audio file input. | 720P: $0.1/second 1080P: $0.15/second | No free quota |
International
In International deployment mode, the access point and data storage are located in the Singapore region, and inference computing resources are dynamically scheduled globally (excluding Mainland China).
Model | Description | Unit price | Free quota (Note) Valid for 90 days after activating Model Studio |
wan2.6-i2v-flash | Wan 2.6. Introduces multi-shot narrative capability and supports automatic voiceover and custom audio file input. | Output video with audio
Output video without audio
| 50 seconds |
wan2.6-i2v | Wan 2.6. Introduces multi-shot narrative capability and supports automatic voiceover and custom audio file input. | 720P: $0.10/second 1080P: $0.15/second | 50 seconds |
wan2.5-i2v-preview | Wan 2.5 preview. Supports automatic dubbing and custom audio file uploads. | 480P: $0.05/second 720P: $0.10/second 1080P: $0.15/second | 50 seconds |
wan2.2-i2v-flash | Wan 2.2 Flash Edition. Extremely fast generation speed with significant improvements in visual detail and motion stability. | 480P: $0.015/second 720P: $0.036/second | 50 seconds |
wan2.2-i2v-plus | Wan 2.2 Professional Edition. Delivers significant improvements in visual detail and motion stability. | 480P: $0.02/second 1080P: $0.10/second | 50 seconds |
wan2.1-i2v-turbo | Wan 2.1 Turbo Edition. Fast generation speed with balanced performance. | $0.036/second | 200 seconds |
wan2.1-i2v-plus | Wan 2.1 Professional Edition. Generates rich details and produces higher-quality, more textured visuals. | $0.10/second | 200 seconds |
US
In US deployment mode, the access point and data storage are located in the US (Virginia) region, and inference computing resources are limited to the US.
Model | Description | Unit price | Free quota |
wan2.6-i2v-us | Wan 2.6. Introduces multi-shot narrative capability and supports automatic voiceover and custom audio file input. | 720P: $0.1/second 1080P: $0.15/second | No free quota |
Mainland China
In Mainland China deployment mode, the access point and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Model | Description | Unit price | Free quota |
wan2.6-i2v-flash | Wan 2.6. Introduces multi-shot narrative capability and supports automatic voiceover and custom audio file input. | Output video with audio
Output video without audio
| No free quota |
wan2.6-i2v | Wan 2.6. Introduces multi-shot narrative capability and supports automatic voiceover and custom audio file input. | 720P: $0.086012/second 1080P: $0.143353/second | No free quota |
wan2.5-i2v-preview | Wan 2.5 preview. Supports automatic dubbing and custom audio file uploads. | 480P: $0.043006/second 720P: $0.086012/second 1080P: $0.143353/second | No free quota |
wan2.2-i2v-plus | Wan 2.2 Professional Edition. Delivers significant improvements in visual detail and motion stability. | 480P: $0.02007/second 1080P: $0.100347/second | No free quota |
wanx2.1-i2v-turbo | Wan 2.1 Turbo Edition. Fast generation speed with balanced performance. | $0.034405/second | No free quota |
wanx2.1-i2v-plus | Wan 2.1 Professional Edition. Generates rich details and produces higher-quality, more textured visuals. | $0.100347/second | No free quota |
Input first frame image and audio | Output video (wan2.6, multi-shot video) |
Input audio: | |
Input prompt: A scene of urban fantasy art. A dynamic graffiti art character. A boy made of spray paint comes to life from a concrete wall. He raps an English song at high speed while striking a classic, energetic rapper pose. The scene is set under an urban railway bridge at night. The lighting comes from a single street lamp, creating a cinematic atmosphere full of high energy and amazing detail. The audio of the video consists entirely of his rap, with no other dialogue or noise. | |
Image-to-video - first and last frames
The Wan first-and-last-frame video model generates a smooth, dynamic video from a prompt. You only need to provide the first and last frame images. The videos feature rich artistic styles and cinematic quality. API reference | Try it online
International
In International deployment mode, the access point and data storage are located in the Singapore region, and inference computing resources are dynamically scheduled globally (excluding Mainland China).
Model | Unit price | Free quota (Note) Valid for 90 days after activating Model Studio |
wan2.2-kf2v-flash | 480P: $0.015/second 720P: $0.036/second 1080P: $0.07/second | 50 seconds |
wan2.1-kf2v-plus | $0.10/second | 200 seconds |
Mainland China
In Mainland China deployment mode, the access point and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Model | Unit price | Free quota (Note) |
wan2.2-kf2v-flash | 480P: $0.014335/second 720P: $0.028671/second 1080P: $0.068809/second | No free quota |
wanx2.1-kf2v-plus | $0.100347/second | No free quota |
Example input | Output video | ||
First frame | Last frame | Prompt | |
|
| In a realistic style, the camera starts at eye level on a small black cat looking up at the sky, then gradually moves upward to a top-down shot that focuses on the cat's curious eyes. | |
Reference-to-video
The Wan reference-to-video model uses a character's appearance and voice from an input video and a prompt to generate a new video that maintains character consistency. API reference
Billing rule: Both input and output videos are billed by the second. Failed jobs are not billed and do not consume the free quota.
The billable duration of the input video does not exceed 5 seconds. For more information, see Wan - reference-to-video.
The billable duration of the output video is the duration in seconds of the successfully generated video.
Global
In Global deployment mode, the access point and data storage are located in the US (Virginia) region, and inference computing resources are dynamically scheduled globally.
Model | Output video type | Input & output price | Free quota (Note) |
wan2.6-r2v | Video with audio | 720P: $0.1/second 1080P: $0.15/second | No free quota |
International
In International deployment mode, the access point and data storage are located in the Singapore region, and inference computing resources are dynamically scheduled globally (excluding Mainland China).
Model | Output video type | Input & output price | Free quota (Note) |
wan2.6-r2v-flash | Video with audio
| 720P: $0.05/second 1080P: $0.075/second | 50 seconds Valid for 90 days after activating Model Studio |
Video without audio
| 720P: $0.025/second 1080P: $0.0375/second | ||
wan2.6-r2v | Video with audio | 720P: $0.10/second 1080P: $0.15/second | 50 seconds Valid for 90 days after activating Model Studio |
Mainland China
In Mainland China deployment mode, the access point and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Model | Output video type | Input & output price | Free quota (Note) |
wan2.6-r2v-flash | Video with audio
| 720P: $0.043006/second 1080P: $0.071676/second | No free quota |
Video without audio
| 720P: $0.021503/second 1080P: $0.035838/second | ||
wan2.6-r2v | Video with audio | 720P: $0.086012/second 1080P: $0.143353/second | No free quota |
General video editing
The Wan general video editing model supports multimodal inputs, including text, images, and videos. It can perform video generation and general editing tasks. API reference | Try it online
International
In International deployment mode, the access point and data storage are located in the Singapore region, and inference computing resources are dynamically scheduled globally (excluding Mainland China).
Model | Unit price | Free quota (Note) |
wan2.1-vace-plus | $0.1/second | 50 seconds Valid for 90 days after activating Model Studio |
Mainland China
In Mainland China deployment mode, the access point and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Model | Unit price | Free quota (Note) |
wanx2.1-vace-plus | $0.100347/second | No free quota |
The general video editing model supports the following features:
Feature | Input reference image | Input prompt | Output video |
Multi-image reference | Reference image 1 (for entity)
Reference image 2 (for background)
| In the video, a girl gracefully walks out from the depths of an ancient, misty forest. Her steps are light, and the camera captures her every nimble movement. When the girl stops and looks around at the lush woods, she breaks into a smile of surprise and joy. This moment is captured in the interplay of light and shadow, recording the wonderful encounter between the girl and nature. | Output video |
Video restyling | The video shows a black steampunk-style car driven by a gentleman, adorned with gears and copper pipes. The background is a steam-powered candy factory with retro elements, creating a vintage and playful scene. | ||
Local editing | Input video Input mask image (The white area indicates the editing region)
| The video shows a Parisian-style French cafe where a lion in a suit is elegantly sipping coffee. It holds a coffee cup in one hand, drinking with a look of contentment. The cafe is tastefully decorated, with soft tones and warm lighting illuminating the area where the lion is. | The content in the editing region is modified based on the prompt |
Video extension | Input initial video segment (1 second) | A dog wearing sunglasses skateboards on a street, 3D cartoon. | Output extended video (5 seconds) |
Video outpainting | An elegant lady is passionately playing the violin, with a full symphony orchestra behind her. |
Wan - digital human
This feature generates natural-looking videos of people speaking, singing, or performing, based on a single character image and an audio file. To use this feature, you can call the following models in sequence. wan2.2-s2v image detection | wan2.2-s2v video generation
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Model | Description | Unit price |
wan2.2-s2v-detect | Checks if an input image meets requirements, such as sufficient definition, a single person, and a frontal view. | $0.000574/image |
wan2.2-s2v | Generates a dynamic video of a person from a valid image and an audio clip. | 480p: $0.071677/second 720p: $0.129018/second |
Sample input | Output video |
Input audio: |
Wan - animate image
Available in standard and professional modes. The model transfers the actions and expressions from a reference video to a character image, generating a video that animates the character from the image. API reference.
International
In International deployment mode, the access point and data storage are located in the Singapore region, and inference computing resources are dynamically scheduled globally (excluding Mainland China).
Model | Service | Description | Unit price | Free quota (View) |
wan2.2-animate-move | Standard mode | A cost-effective service with fast generation speeds. Suitable for basic needs, such as simple animation demos. | $0.12/second | The total time for both patterns is 50 seconds. |
Professional mode | Delivers high animation smoothness and natural transitions for actions and expressions. The output resembles a live-action video. | $0.18/second |
Mainland China
In Mainland China deployment mode, the access point and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Model | Service | Description | Unit price | Free quota (View) |
wan2.2-animate-move | Standard mode | Fast generation. Ideal for basic needs, such as simple animation demos. Cost-effective. | $0.06/second | No free quota |
Professional mode | Provides high-quality, smooth animation with natural transitions for actions and expressions. The output is similar to a live-action video. | $0.09/second |
Character image | Reference video | Standard video | Output Video (Professional Mode) |
|
Wan - video character swap
Available in standard and professional modes. The model replaces the main character in a video with a character from an image. It preserves the original video's scene, lighting, and hue. API reference.
International
In International deployment mode, the access point and data storage are located in the Singapore region, and inference computing resources are dynamically scheduled globally (excluding Mainland China).
Model | Service | Description | Unit price | Free quota (View) |
wan2.2-animate-mix | Standard mode | Generates animations quickly. Ideal for basic requirements, such as simple demos. Highly cost-effective. | $0.18/s | The combined duration of both services is 50 seconds. |
Professional mode | Produces highly smooth animations with natural transitions for actions and expressions. The result closely resembles a live-action video. | $0.26/s |
Mainland China
In Mainland China deployment mode, the access point and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Model | Service | Description | Unit price | Free quota (View) |
wan2.2-animate-mix | Standard mode | Generates animations quickly. Ideal for basic requirements, such as simple demos. Highly cost-effective. | $0.09/s | No free quota |
Professional mode | Produces highly smooth animations with natural transitions for actions and expressions. The result closely resembles a live-action video. | $0.13/s |
Character image | Reference video | Standard output video | Professional output video |
|
AnimateAnyone
This feature generates character motion videos based on a character image and a motion template. To use this feature, you can call the following three models in sequence. AnimateAnyone image detection API details | AnimateAnyone motion template generation | AnimateAnyone video generation API details
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Model | Description | Unit price |
animate-anyone-detect-gen2 | Detects whether an input image meets the requirements. | $0.000574/image |
animate-anyone-template-gen2 | Extracts character motion from a video and generates a motion template. | $0.011469/second |
animate-anyone-gen2 | Generates a character action video from a character image and an action template. |
Input: Character image | Input: Motion video | Outputs (generated from the image background) | Outputs Generated by Video Background |
|
The preceding example was generated by the Tongyi App, which integrates AnimateAnyone.
The content generated by the AnimateAnyone model is video only and does not include audio.
EMO
This feature generates dynamic portrait videos based on a portrait image and a human voice audio file. To use this feature, you can call the following models in sequence. EMO image detection | EMO video generation
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Model | Description | Unit price |
emo-detect-v1 | Detects whether an input image meets the required specifications. This model can be called directly without deployment. | $0.000574/image |
emo-v1 | Generates a dynamic portrait video. This model can be called directly without deployment. |
|
Input: Portrait image and human voice audio file | Output: Dynamic portrait video |
Portrait:
Human voice audio: See the video on the right. | Character video: Style level: active ("style_level": "active") |
LivePortrait
This model quickly and efficiently generates dynamic portrait videos based on a portrait image and a human voice audio file. Compared to the EMO model, it generates videos faster and at a lower cost, but the quality is not as good. To use this feature, you can call the following two models in sequence. LivePortrait image detection | LivePortrait video generation
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Model | Description | Unit price |
liveportrait-detect | Detects whether an input image meets the requirements. | $0.000574/image |
liveportrait | Generates a dynamic portrait video. | $0.002868/second |
Input: Portrait image and voice audio | Output: Animated portrait video |
Portrait image:
Voice audio: Sourced from the video on the right. | Portrait video: |
Emoji
This feature generates dynamic face videos based on a face image and preset facial motion templates. This capability can be used for scenarios such as creating emojis and generating video materials. To use this feature, you can call the following models in sequence. Emoji image detection | Emoji video generation
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Model | Description | Unit price |
emoji-detect-v1 | Detects whether an input image meets specified requirements. | $0.000574/image |
emoji-v1 | Generates custom emojis based on a portrait image and a specified emoji template. | $0.011469/second |
Input: Portrait image | Output: Dynamic portrait video |
| Parameter for the "Happy" emoji template: ("input.driven_id": "mengwa_kaixin") |
VideoRetalk
This feature generates a video where the character's lip movements match the input audio, based on a character video and a human voice audio file. To use this feature, you can call the following model. API reference
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Model | Description | Unit price |
videoretalk | Synchronizes a character's lip movements with input audio to generate a new video. | $0.011469/second |
Video style transform
This model generates videos in different styles that match the semantic description of user-input text, or restyles a user-input video. API reference
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Model | Description | Unit price | |
video-style-transform | Transforms an input video into styles such as Japanese comic and American comic. | 720P | $0.071677/second |
540P | $0.028671/second | ||
Input video | Output video (Manga style) |
Speech synthesis (text-to-speech)
Qwen speech synthesis
Supports mixed-language text input and streaming audio output. Usage | API reference
International
In the international deployment mode, the endpoints and data storage are located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding Mainland China).
Qwen3-TTS-Instruct-Flash
|
Model |
Version |
Unit price |
Max input characters |
Free quota (Note) |
|
qwen3-tts-instruct-flash Currently, qwen3-tts-instruct-flash-2026-01-26. |
Stable |
$0.115/10,000 characters |
600 |
10,000 characters Valid for 90 days after activating Model Studio |
|
qwen3-tts-instruct-flash-2026-01-26 |
Snapshot |
-
Supported languages: Chinese (Mandarin), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese
-
Character calculation rules: Billing is based on the number of input characters. The rules are as follows:
-
One Chinese character (including simplified/traditional Chinese, Japanese Kanji, and Korean Hanja) = 2 characters
-
Other characters, such as an English letter, a punctuation mark, or a space = 1 character
-
Qwen3-TTS-VD
|
Model |
Version |
Unit price |
Max input characters |
Free quota (Note) |
|
qwen3-tts-vd-2026-01-26 |
Snapshot |
$0.115 per 10,000 characters |
600 |
10,000 characters Valid for 90 days after activating Model Studio |
-
Supported languages: Chinese (Mandarin), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese
-
Character calculation rules: Billing is based on the number of input characters. The rules are as follows:
-
One Chinese character (including simplified/traditional Chinese, Japanese Kanji, and Korean Hanja) = 2 characters
-
Other characters, such as an English letter, a punctuation mark, or a space = 1 character
-
Qwen3-TTS-VC
|
Model |
Version |
Unit price |
Max input characters |
Free quota (Note) |
|
qwen3-tts-vc-2026-01-22 |
Snapshot |
$0.115/10,000 characters |
600 |
10,000 characters Valid for 90 days after activating Model Studio. |
-
Supported languages: Chinese (Mandarin), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese
-
Character calculation rules: Billing is based on the number of input characters. The rules are as follows:
-
One Chinese character (including simplified/traditional Chinese, Japanese Kanji, and Korean Hanja) = 2 characters
-
Other characters, such as an English letter, a punctuation mark, or a space = 1 character
-
Qwen3-TTS-Flash
|
Model |
Version |
Unit price |
Max input characters |
Free quota (Note) |
|
qwen3-tts-flash Currently, qwen3-tts-flash-2025-11-27. |
Stable |
$0.10 per 10,000 characters |
600 |
10,000 characters Valid for 90 days after activating Model Studio |
|
qwen3-tts-flash-2025-11-27 |
Snapshot |
|||
|
qwen3-tts-flash-2025-09-18 |
Snapshot |
If you activate Alibaba Cloud Model Studio before 00:00 on November 13, 2025: 2,000 characters If you activate Alibaba Cloud Model Studio after 00:00 on November 13, 2025: 10,000 characters Valid for 90 days after activating Model Studio. |
-
Supported languages: Chinese (Mandarin, Beijing, Shanghai, Sichuan, Nanjing, Shaanxi, Minnan, Tianjin, Cantonese), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese
-
Character calculation rules: Billing is based on the number of input characters. The rules are as follows:
-
One Chinese character (including simplified/traditional Chinese, Japanese Kanji, and Korean Hanja) = 2 characters
-
Other characters, such as an English letter, a punctuation mark, or a space = 1 character
-
Mainland China
In the Mainland China deployment mode, the endpoints and data storage are located in the Beijing region. Model inference compute resources are limited to Mainland China.
Qwen3-TTS-Instruct-Flash
|
Model |
Version |
Unit price |
Max input characters |
Free quota (Note) |
|
qwen3-tts-instruct-flash Currently, qwen3-tts-instruct-flash-2026-01-26. |
Stable |
$0.115/10,000 characters |
600 |
No free quota is available. |
|
qwen3-tts-instruct-flash-2026-01-26 |
Snapshot |
-
Supported languages: Chinese (Mandarin), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese
-
Character calculation rules: Billing is based on the number of input characters. The rules are as follows:
-
One Chinese character (including simplified/traditional Chinese, Japanese Kanji, and Korean Hanja) = 2 characters
-
Other characters, such as an English letter, a punctuation mark, or a space = 1 character
-
Qwen3-TTS-VD
|
Model |
Version |
Unit price |
Max input characters |
Free quota (Note) |
|
qwen3-tts-vd-2026-01-26 |
Snapshot |
$0.115/10,000 characters |
600 |
No free quota is available. |
-
Supported languages: Chinese (Mandarin), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese
-
Character calculation rules: Billing is based on the number of input characters. The rules are as follows:
-
One Chinese character (including simplified/traditional Chinese, Japanese Kanji, and Korean Hanja) = 2 characters
-
Other characters, such as an English letter, a punctuation mark, or a space = 1 character
-
Qwen3-TTS-VC
|
Model |
Version |
Unit price |
Max input characters |
Free quota (Note) |
|
qwen3-tts-vc-2026-01-22 |
Snapshot |
$0.115/10,000 characters |
600 |
No free quota is available. |
-
Supported languages: Chinese (Mandarin), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese
-
Character calculation rules: Billing is based on the number of input characters. The rules are as follows:
-
One Chinese character (including simplified/traditional Chinese, Japanese Kanji, and Korean Hanja) = 2 characters
-
Other characters, such as an English letter, a punctuation mark, or a space = 1 character
-
Qwen3-TTS-Flash
|
Model |
Version |
Unit price |
Max input characters |
Free quota (Note) |
|
qwen3-tts-flash Currently, qwen3-tts-flash-2025-11-27. |
Stable |
$0.114682 per 10,000 characters |
600 |
No free quota is available. |
|
qwen3-tts-flash-2025-11-27 |
Snapshot |
|||
|
qwen3-tts-flash-2025-09-18 |
Snapshot |
-
Supported languages: Chinese (Mandarin, Beijing, Shanghai, Sichuan, Nanjing, Shaanxi, Minnan, Tianjin, Cantonese), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese
-
Character calculation rules: Billing is based on the number of input characters. The rules are as follows:
-
One Chinese character (including simplified/traditional Chinese, Japanese Kanji, and Korean Hanja) = 2 characters
-
Other characters, such as an English letter, a punctuation mark, or a space = 1 character
-
Qwen-TTS
|
Model |
Version |
Context window |
Max input |
Max output |
Input cost |
Output cost |
Free quota (Note) |
|
(tokens) |
(Per 1,000 tokens) |
||||||
|
qwen-tts Provides the same capabilities as qwen-tts-2025-04-10. |
Stable |
8,192 |
512 |
7,680 |
$0.230 |
$1.434 |
No free quota is available. |
|
qwen-tts-latest Provides the same capabilities as the latest snapshot version. |
Latest |
||||||
|
qwen-tts-2025-05-22 |
Snapshot |
||||||
|
qwen-tts-2025-04-10 |
|||||||
Audio-to-token conversion rule: Each second of audio corresponds to 50 tokens. Audio shorter than 1 second is calculated as 50 tokens.
Qwen real-time speech synthesis
Supports streaming text input and streaming audio output. It can automatically adjust the speech rate based on the text content and punctuation. Usage | API reference
Qwen3-TTS-Instruct-Flash-Realtime supports Qwen real-time speech synthesis and can only use the default voice. It does not support cloned or designed voices.
Qwen3-TTS-VD-Realtime supports using voices from Voice Design (Qwen) for real-time speech synthesis, but does not support the default voice.
Qwen3-TTS-VC-Realtime supports using voices from Voice Cloning (Qwen) for real-time speech synthesis, but does not support the default voice.
Qwen3-TTS-Flash-Realtime and Qwen-TTS-Realtime can only use the default voice. They do not support cloned or designed voices.
International
In the international deployment mode, the endpoints and data storage are located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding Mainland China).
Qwen3-TTS-Instruct-Flash-Realtime
|
Model |
Version |
Unit price |
Free quota (Note) |
|
qwen3-tts-instruct-flash-realtime Currently, qwen3-tts-instruct-flash-realtime-2026-01-22. |
Stable |
$0.143/10,000 characters |
10,000 characters Valid for 90 days after activating Model Studio. |
|
qwen3-tts-instruct-flash-realtime-2026-01-22 |
Snapshot |
-
Supported languages: Chinese (Mandarin), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese
-
Character calculation rules: Billing is based on the number of input characters. The rules are as follows:
-
One Chinese character (including simplified/traditional Chinese, Japanese Kanji, and Korean Hanja) = 2 characters
-
Other characters, such as an English letter, a punctuation mark, or a space = 1 character
-
Qwen3-TTS-VD-Realtime
|
Model |
Version |
Unit price |
Free quota (Note) |
|
qwen3-tts-vd-realtime-2026-01-15 |
Snapshot |
$0.143353 per 10,000 characters |
10,000 characters Valid for 90 days after activating Model Studio |
|
qwen3-tts-vd-realtime-2025-12-16 |
Snapshot |
-
Supported languages: Chinese (Mandarin), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese
-
Character calculation rules: Billing is based on the number of input characters. The rules are as follows:
-
One Chinese character (including simplified/traditional Chinese, Japanese Kanji, and Korean Hanja) = 2 characters
-
Other characters, such as an English letter, a punctuation mark, or a space = 1 character
-
Qwen3-TTS-VC-Realtime
|
Model |
Version |
Unit price |
Free quota(Note) |
|
qwen3-tts-vc-realtime-2026-01-15 |
Snapshot |
$0.13/10,000 characters |
10,000 characters Valid for 90 days after activating Model Studio. |
|
qwen3-tts-vc-realtime-2025-11-27 |
Snapshot |
-
Supported languages: Chinese (Mandarin), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese
-
Character calculation rules: Billing is based on the number of input characters. The rules are as follows:
-
One Chinese character (including simplified/traditional Chinese, Japanese Kanji, and Korean Hanja) = 2 characters
-
Other characters, such as an English letter, a punctuation mark, or a space = 1 character
-
Qwen3-TTS-Flash-Realtime
|
Model |
Version |
Unit price |
Free quota (Note) |
|
qwen3-tts-flash-realtime Currently, qwen3-tts-flash-realtime-2025-11-27. |
Stable |
$0.13 per 10,000 characters |
10,000 characters Valid for 90 days after activating Model Studio |
|
qwen3-tts-flash-realtime-2025-11-27 |
Snapshot |
||
|
qwen3-tts-flash-realtime-2025-09-18 |
Snapshot |
If you activate Alibaba Cloud Model Studio before 00:00 on November 13, 2025: 2,000 characters If you activate Alibaba Cloud Model Studio after 00:00 on November 13, 2025: 10,000 characters Valid for 90 days after activating Model Studio |
-
Supported languages: Chinese (Mandarin, Beijing, Shanghai, Sichuan, Nanjing, Shaanxi, Minnan, Tianjin, Cantonese), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese
-
Character calculation rules: Billing is based on the number of input characters. The rules are as follows:
-
One Chinese character (including simplified/traditional Chinese, Japanese Kanji, and Korean Hanja) = 2 characters
-
Other characters, such as an English letter, a punctuation mark, or a space = 1 character
-
Mainland China
In the Mainland China deployment mode, the endpoints and data storage are located in the Beijing region. Model inference compute resources are limited to Mainland China.
Qwen3-TTS-Instruct-Flash-Realtime
|
Model |
Version |
Unit price |
Free quota (Note) |
|
qwen3-tts-instruct-flash-realtime Current capabilities match qwen3-tts-instruct-flash-realtime-2026-01-22. |
Stable version |
$0.143 per 10,000 characters |
No free quota |
|
qwen3-tts-instruct-flash-realtime-2026-01-22 |
Snapshot version |
-
Supported languages: Chinese (Mandarin), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese
-
Character calculation rules: Billing is based on the number of input characters. The rules are as follows:
-
One Chinese character (including simplified/traditional Chinese, Japanese Kanji, and Korean Hanja) = 2 characters
-
Other characters, such as an English letter, a punctuation mark, or a space = 1 character
-
Qwen3-TTS-VD-Realtime
|
Model |
Version |
Unit price |
Free quota (Note) |
|
qwen3-tts-vd-realtime-2026-01-15 |
Snapshot |
$0.143353 per 10,000 characters |
No free quota |
|
qwen3-tts-vd-realtime-2025-12-16 |
Snapshot |
-
Supported languages: Chinese (Mandarin), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese
-
Character calculation rules: Billing is based on the number of input characters. The rules are as follows:
-
One Chinese character (including simplified/traditional Chinese, Japanese Kanji, and Korean Hanja) = 2 characters
-
Other characters, such as an English letter, a punctuation mark, or a space = 1 character
-
Qwen3-TTS-VC-Realtime
|
Model |
Version |
Unit price |
Free quota (Note) |
|
qwen3-tts-vc-realtime-2026-01-15 |
Snapshot |
$0.143353 per 10,000 characters |
No free quota is available. |
|
qwen3-tts-vc-realtime-2025-11-27 |
Snapshot |
-
Supported languages: Chinese (Mandarin), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese
-
Character calculation rules: Billing is based on the number of input characters. The rules are as follows:
-
One Chinese character (including simplified/traditional Chinese, Japanese Kanji, and Korean Hanja) = 2 characters
-
Other characters, such as an English letter, a punctuation mark, or a space = 1 character
-
Qwen3-TTS-Flash-Realtime
|
Model |
Version |
Unit price |
Free quota (Note) |
|
qwen3-tts-flash-realtime Currently, qwen3-tts-flash-realtime-2025-11-27. |
Stable |
$0.143353 per 10,000 characters |
No free quota is available. |
|
qwen3-tts-flash-realtime-2025-11-27 |
Snapshot |
||
|
qwen3-tts-flash-realtime-2025-09-18 |
Snapshot |
-
Supported languages: Chinese (Mandarin, Beijing, Shanghai, Sichuan, Nanjing, Shaanxi, Minnan, Tianjin, Cantonese), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese
-
Character calculation rules: Billing is based on the number of input characters. The rules are as follows:
-
One Chinese character (including simplified/traditional Chinese, Japanese Kanji, and Korean Hanja) = 2 characters
-
Other characters, such as an English letter, a punctuation mark, or a space = 1 character
-
Qwen-TTS-Realtime
|
Model |
Version |
Context window |
Max input |
Max output |
Input cost |
Output cost |
Supported languages |
Free quota (Note) |
|
(tokens) |
(Per 1,000 tokens) |
|||||||
|
qwen-tts-realtime Currently, qwen-tts-realtime-2025-07-15. |
Stable |
8,192 |
512 |
7,680 |
$0.345 |
$1.721 |
Chinese, English |
No free quota is available. |
|
qwen-tts-realtime-latest Currently, qwen-tts-realtime-2025-07-15. |
Latest |
Chinese, English |
||||||
|
qwen-tts-realtime-2025-07-15 |
Snapshot |
Chinese, English |
||||||
Audio-to-token conversion rule: Each second of audio corresponds to 50 tokens. Audio shorter than 1 second is calculated as 50 tokens.
Qwen voice cloning
Voice cloning uses a large model for feature extraction, allowing you to clone voices without training. Provide 10 to 20 seconds of audio to generate a highly similar and natural-sounding custom voice. Usage | API reference
International
In the international deployment mode, the endpoints and data storage are located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding Mainland China).
|
Model |
Unit price |
Free quota (Note) |
|
qwen-voice-enrollment |
$0.01 per voice |
1,000 voices Valid for 90 days after activating Model Studio. |
Mainland China
In the Mainland China deployment mode, the endpoints and data storage are located in the Beijing region. Model inference compute resources are limited to Mainland China.
|
Model |
Unit price |
Free quota (Note) |
|
qwen-voice-enrollment |
$0.01 per sound |
No free quota is available. |
Qwen voice design
Voice design generates custom voices from text descriptions. It supports multi-language and multi-dimensional voice feature definitions, making it suitable for applications such as ad dubbing, character creation, and audio content production. Usage | API reference
International
In the international deployment mode, the endpoints and data storage are located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding Mainland China).
|
Model |
Unit price |
Free quota (Note) |
|
qwen-voice-design |
$0.2 per voice |
10 voices Valid for 90 days after activating Model Studio. |
Mainland China
In the Mainland China deployment mode, the endpoints and data storage are located in the Beijing region. Model inference compute resources are limited to Mainland China.
|
Model |
Unit price |
Free quota (Note) |
|
qwen-voice-design |
$0.20 per voice |
No free quota is available. |
CosyVoice speech synthesis
CosyVoice is a next-generation generative speech synthesis large language model (LLM) from Alibaba Cloud. It deeply integrates text understanding and speech generation based on a large-scale pre-trained language model and supports real-time streaming text-to-speech synthesis. Usage | API reference
International
In the international deployment mode, the endpoints and data storage are located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding Mainland China).
|
Model |
Unit price |
Free quota (Note) |
|
cosyvoice-v3-plus |
$0.26/10,000 characters |
10,000 characters Valid for 90 days after activating Model Studio. |
|
cosyvoice-v3-flash |
$0.13/10,000 characters |
Character calculation rules: Chinese characters (including simplified/traditional Chinese, Japanese Kanji, and Korean Hanja) are counted as 2 characters. All other characters (such as letters, numbers, and Japanese/Korean syllabaries) are counted as 1 character. SSML tag content is not billed.
Mainland China
In the Mainland China deployment mode, the endpoints and data storage are located in the Beijing region. Model inference compute resources are limited to Mainland China.
|
Model |
Unit price |
Free quota (Note) |
|
cosyvoice-v3-plus |
$0.286706/10,000 characters |
No free quota |
|
cosyvoice-v3-flash |
$0.14335/10,000 characters |
|
|
cosyvoice-v2 |
$0.286706/10,000 characters |
Character calculation rules: Chinese characters (including simplified/traditional Chinese, Japanese Kanji, and Korean Hanja) are counted as 2 characters. All other characters (such as letters, numbers, and Japanese/Korean syllabaries) are counted as 1 character. SSML tag content is not billed.
Speech recognition (speech-to-text) and translation (speech-to-translation)
Qwen3-LiveTranslate-Flash
Qwen3-LiveTranslate-Flash is an audio and video translation model based on the Qwen3-Omni architecture. It supports translation between 18 languages, including Chinese, English, Russian, and French. The model can use visual context to improve translation accuracy and outputs both text and speech. Usage | API reference
International
In the international deployment mode, the endpoints and data storage are located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide, excluding Mainland China.
|
Model |
Version |
Context window |
Max input |
Max output |
Free quota (Note) |
|
(tokens) |
|||||
|
qwen3-livetranslate-flash Currently, qwen3-livetranslate-flash-2025-12-01. |
Stable |
53,248 |
49,152 |
4,096 |
1 million tokens each Valid for 90 days after activating Model Studio |
|
qwen3-livetranslate-flash-2025-12-01 |
Snapshot |
||||
The billing rules for input and output are as follows:
|
|
Mainland China
In the Mainland China deployment mode, the endpoints and data storage are located in the Beijing region. Model inference compute resources are limited to Mainland China.
|
Model |
Version |
Context window |
Max input |
Max output |
Free quota (Note) |
|
(tokens) |
|||||
|
qwen3-livetranslate-flash Currently, qwen3-livetranslate-flash-2025-12-01. |
Stable |
53,248 |
49,152 |
4,096 |
No free quota is available. |
|
qwen3-livetranslate-flash-2025-12-01 |
Snapshot |
||||
The billing rules for input and output are as follows:
|
|
Qwen3-LiveTranslate-Flash-Realtime
Qwen3-LiveTranslate-Flash-Realtime is a multilingual, real-time audio and video translation model. It can recognize 18 languages and translate them into audio in 10 languages in real time.
Core features:
-
Multi-language support: Supports 18 languages, such as Chinese, English, French, German, Russian, Japanese, and Korean, and 6 Chinese dialects, including Mandarin, Cantonese, and Sichuanese.
-
Visual enhancement: Uses visual content to improve translation accuracy. The model analyzes lip movements, actions, and on-screen text to improve translation in noisy environments or for words with multiple meanings.
-
Low latency: Achieves simultaneous interpretation latency as low as 3 seconds.
-
High-quality simultaneous interpretation: Addresses cross-language word order issues using semantic unit prediction technology. The real-time translation quality is comparable to offline translation results.
-
Natural voice: Generates natural-sounding, human-like speech. The model adapts its tone and emotion based on the source speech content.
International
In the international deployment mode, the endpoints and data storage are located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide, excluding Mainland China.
|
Model |
Version |
Context window |
Max input |
Max output |
Free quota |
|
(tokens) |
|||||
|
qwen3-livetranslate-flash-realtime Currently, qwen3-livetranslate-flash-realtime-2025-09-22. |
Stable |
53,248 |
49,152 |
4,096 |
1 million tokens Valid for 90 days after activating Model Studio. |
|
qwen3-livetranslate-flash-realtime-2025-09-22 |
Snapshot |
||||
After the free quota is used up, the billing rules for input and output are as follows:
|
|
Token calculation rules:
-
Audio: Each second of audio input or output consumes 12.5 tokens.
-
Image: Each 28×28 pixel input consumes 0.5 tokens.
Mainland China
In the Mainland China deployment mode, the endpoints and data storage are located in the Beijing region. Model inference compute resources are limited to Mainland China.
|
Model |
Version |
Context window |
Max input |
Max output |
Free quota (Note) |
|
(tokens) |
|||||
|
qwen3-livetranslate-flash-realtime Currently, qwen3-livetranslate-flash-realtime-2025-09-22. |
Stable |
53,248 |
49,152 |
4,096 |
No free quota is available. |
|
qwen3-livetranslate-flash-realtime-2025-09-22 |
Snapshot |
||||
The billing rules for input and output are as follows:
|
|
Token calculation rules:
-
Audio: Each second of audio input or output consumes 12.5 tokens.
-
Image: Each 28×28 pixel input consumes 0.5 tokens.
Qwen audio file recognition
Based on the Qwen multimodal foundation model, this model supports features such as multi-language recognition, singing recognition, and noise rejection. Usage | API reference
International
In the international deployment mode, the endpoints and data storage are located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide, excluding Mainland China.
Qwen3-ASR-Flash-Filetrans
|
Model |
Version |
Unit price |
Free quota (Note) |
|
qwen3-asr-flash-filetrans Currently, qwen3-asr-flash-filetrans-2025-11-17. |
Stable |
$0.000035/second |
36,000 seconds (10 hours) Valid for 90 days after activating Model Studio. |
|
qwen3-asr-flash-filetrans-2025-11-17 |
Snapshot |
-
Supported languages: Chinese (Mandarin, Sichuanese, Minnan, Wu, Cantonese), English, Japanese, German, Korean, Russian, French, Portuguese, Arabic, Italian, Spanish, Hindi, Indonesian, Thai, Turkish, Ukrainian, Vietnamese, Czech, Danish, Filipino, Finnish, Icelandic, Malay, Norwegian, Polish, Swedish
-
Supported sample rates: Any
Qwen3-ASR-Flash
|
Model |
Version |
Unit price |
Free quota (Note) |
|
qwen3-asr-flash Its capabilities match those of qwen3-asr-flash-2025-09-08. |
Stable |
$0.000035 per second |
36,000 seconds (10 hours) Valid for 90 days after activating Model Studio. |
|
qwen3-asr-flash-2025-09-08 |
Snapshot |
-
Supported languages: Chinese (Mandarin, Sichuanese, Minnan, Wu, Cantonese), English, Japanese, German, Korean, Russian, French, Portuguese, Arabic, Italian, Spanish, Hindi, Indonesian, Thai, Turkish, Ukrainian, Vietnamese, Czech, Danish, Filipino, Finnish, Icelandic, Malay, Norwegian, Polish, Swedish
-
Supported sample rates: Any
US
In the US deployment mode, the endpoints and data storage are located in the US (Virginia) region. Model inference compute resources are limited to the US.
|
Model |
Version |
Unit price |
Free quota (Note) |
|
qwen3-asr-flash-us Currently, qwen3-asr-flash-2025-09-08-us. |
Stable |
$0.000035/second |
No free quota is available. |
|
qwen3-asr-flash-2025-09-08-us |
Snapshot |
-
Supported languages: Chinese (Mandarin, Sichuanese, Minnan, Wu, Cantonese), English, Japanese, German, Korean, Russian, French, Portuguese, Arabic, Italian, Spanish, Hindi, Indonesian, Thai, Turkish, Ukrainian, Vietnamese, Czech, Danish, Filipino, Finnish, Icelandic, Malay, Norwegian, Polish, Swedish
-
Supported sample rates: Any
Mainland China
In the Mainland China deployment mode, the endpoints and data storage are located in the Beijing region. Model inference compute resources are limited to Mainland China.
Qwen3-ASR-Flash-Filetrans
|
Model |
Version |
Unit price |
Free quota (Note) |
|
qwen3-asr-flash-filetrans It offers the same capabilities as qwen3-asr-flash-filetrans-2025-11-17. |
Stable |
$0.000032/second |
No free quota is available. |
|
qwen3-asr-flash-filetrans-2025-11-17 |
Snapshot |
-
Supported languages: Chinese (Mandarin, Sichuanese, Minnan, Wu, Cantonese), English, Japanese, German, Korean, Russian, French, Portuguese, Arabic, Italian, Spanish, Hindi, Indonesian, Thai, Turkish, Ukrainian, Vietnamese, Czech, Danish, Filipino, Finnish, Icelandic, Malay, Norwegian, Polish, Swedish
-
Supported sample rates: Any
Qwen3-ASR-Flash
|
Model |
Version |
Unit price |
Free quota (Note) |
|
qwen3-asr-flash Currently, qwen3-asr-flash-2025-09-08. |
Stable |
$0.000032/second |
No free quota is available. |
|
qwen3-asr-flash-2025-09-08 |
Snapshot |
-
Supported languages: Chinese (Mandarin, Sichuanese, Minnan, Wu, Cantonese), English, Japanese, German, Korean, Russian, French, Portuguese, Arabic, Italian, Spanish, Hindi, Indonesian, Thai, Turkish, Ukrainian, Vietnamese, Czech, Danish, Filipino, Finnish, Icelandic, Malay, Norwegian, Polish, Swedish
-
Supported sample rates: Any
Qwen real-time speech recognition
Qwen Real-Time Speech Recognition is a Large Language Model (LLM) with automatic language detection. It supports 11 languages and delivers accurate transcription even in complex audio environments. How to use | API reference
International
In international deployment mode, endpoints and data storage are located in the Singapore region. Model inference compute resources are dynamically scheduled across global regions, excluding Mainland China.
|
Model |
Version |
Unit price |
Free quota (Note) |
|
qwen3-asr-flash-realtime Currently, qwen3-asr-flash-realtime-2025-10-27 |
Stable |
$0.00009/second |
36,000 seconds (10 hours) Valid for 90 days after activating Model Studio. |
|
qwen3-asr-flash-realtime-2025-10-27 |
Snapshot |
-
Languages supported: Chinese (Mandarin, Sichuanese, Minnan, Wu, Cantonese), English, Japanese, German, Korean, Russian, French, Portuguese, Arabic, Italian, Spanish, Hindi, Indonesian, Thai, Türkçe, Ukrainian, Vietnamese, Czech, Danish, Filipino, Finnish, Icelandic, Malay, Norwegian, Polish, Swedish
-
Sample rates supported: 8 kHz, 16 kHz
Mainland China
In Mainland China deployment mode, endpoints and data storage are located in the Beijing region. Model inference compute resources are limited to Mainland China only.
|
Model |
Version |
Unit price |
Free quota (Note) |
|
qwen3-asr-flash-realtime Currently, qwen3-asr-flash-realtime-2025-10-27 |
Stable |
$0.000047/second |
No free quota |
|
qwen3-asr-flash-realtime-2025-10-27 |
Snapshot |
-
Languages supported: Chinese (Mandarin, Sichuanese, Minnan, Wu, Cantonese), English, Japanese, German, Korean, Russian, French, Portuguese, Arabic, Italian, Spanish, Hindi, Indonesian, Thai, Türkçe, Ukrainian, Vietnamese, Czech, Danish, Filipino, Finnish, Icelandic, Malay, Norwegian, Polish, Swedish
-
Sample rates supported: 8 kHz, 16 kHz
Paraformer speech recognition
Paraformer speech recognition offers two versions: recorded file recognition and real-time speech recognition.
Recorded File Recognition
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
|
Model |
Unit price |
Free quota (Note) |
|
paraformer-v2 |
$0.000012/second |
No free quota |
|
paraformer-8k-v2 |
-
Languages supported:
-
paraformer-v2: Chinese (Mandarin, Cantonese, Wu, Minnan, Northeastern, Gansu, Guizhou, Henan, Hubei, Hunan, Ningxia, Shanxi, Shaanxi, Shandong, Sichuan, Tianjin, Jiangxi, Yunnan, Shanghai), English, Japanese, Korean, German, French, Russian
-
paraformer-8k-v2: Mandarin Chinese
-
-
Sample rates supported:
-
paraformer-v2: Any
-
paraformer-8k-v2: 8 kHz
-
-
Audio formats supported: AAC, AMR, AVI, FLAC, FLV, M4A, MKV, MOV, MP3, MP4, MPEG, OGG, OPUS, WAV, WEBM, WMA, WMV
Real-Time Speech Recognition
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
|
Model |
Unit price |
Free quota (Note) |
|
paraformer-realtime-v2 |
$0.000035/second |
No free quota |
|
paraformer-realtime-8k-v2 |
-
Languages supported:
-
paraformer-realtime-v2: Chinese (Mandarin, Cantonese, Wu, Minnan, Northeastern, Gansu, Guizhou, Henan, Hubei, Hunan, Ningxia, Shanxi, Shaanxi, Shandong, Sichuan, Tianjin, Jiangxi, Yunnan, Shanghai), English, Japanese, Korean, German, French, Russian
-
paraformer-realtime-8k-v2: Mandarin Chinese
-
-
Sample rates supported:
-
paraformer-realtime-v2: Any
-
paraformer-realtime-8k-v2: 8 kHz
-
-
Audio formats supported: PCM, WAV, MP3, OPUS, SPEEX, AAC, AMR
Fun-ASR speech recognition
Fun-ASR speech recognition offers two versions: audio file recognition and real-time speech recognition.
Audio File Recognition
Usage instructions | API reference
International
In the international deployment mode, endpoints and data storage are in the Singapore region. Model inference compute resources are dynamically scheduled globally, excluding Mainland China.
|
Model |
Version |
Unit price |
Free quota (Note) |
|
fun-asr Currently, fun-asr-2025-11-07 |
Stable |
$0.000035/second |
36,000 seconds (10 hours) Valid for 90 days |
|
fun-asr-2025-11-07 Improved far-field VAD over fun-asr-2025-08-25 for higher accuracy |
Snapshot |
||
|
fun-asr-2025-08-25 |
|||
|
fun-asr-mtl Currently, fun-asr-mtl-2025-08-25 |
Stable |
||
|
fun-asr-mtl-2025-08-25 |
Snapshot |
-
Languages supported:
-
fun-asr and fun-asr-2025-11-07: Mandarin, Cantonese, Wu, Minnan, Hakka, Gan, Xiang, and Jin. Also supports Mandarin accents from Zhongyuan, Southwest, Jilu, Jianghuai, Lanyin, Jiaoliao, Northeast, Beijing, and Hong Kong–Taiwan regions—including Henan, Shaanxi, Hubei, Sichuan, Chongqing, Yunnan, Guizhou, Guangdong, Guangxi, Hebei, Tianjin, Shandong, Anhui, Nanjing, Jiangsu, Hangzhou, Gansu, and Ningxia. Also supports English and Japanese.
-
fun-asr-2025-08-25: Mandarin and English.
-
fun-asr-mtl and fun-asr-mtl-2025-08-25: Mandarin, Cantonese, English, Japanese, Korean, Vietnamese, Indonesian, Thai, Malay, Filipino, Arabic, Hindi, Bulgarian, Croatian, Czech, Danish, Dutch, Estonian, Finnish, Greek, Hungarian, Irish, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovak, Slovenian, and Swedish.
-
-
Sample rates supported: Any
-
Audio formats supported: aac, amr, avi, flac, flv, m4a, mkv, mov, mp3, mp4, mpeg, ogg, opus, wav, webm, wma, wmv
Mainland China
In the Mainland China deployment mode, endpoints and data storage are in the Beijing region. Model inference compute resources are limited to Mainland China.
|
Model |
Version |
Unit price |
Free quota (Note) |
|
fun-asr Currently, fun-asr-2025-11-07 |
Stable |
$0.000032 / second |
No free quota |
|
fun-asr-2025-11-07 Improved far-field VAD over fun-asr-2025-08-25 for higher accuracy |
Snapshot |
||
|
fun-asr-2025-08-25 |
|||
|
fun-asr-mtl Currently, fun-asr-mtl-2025-08-25 |
Stable |
||
|
fun-asr-mtl-2025-08-25 |
Snapshot |
-
Languages supported:
-
fun-asr and fun-asr-2025-11-07: Mandarin, Cantonese, Wu, Minnan, Hakka, Gan, Xiang, and Jin. Also supports Mandarin accents from Zhongyuan, Southwest, Jilu, Jianghuai, Lanyin, Jiaoliao, Northeast, Beijing, and Hong Kong–Taiwan regions—including Henan, Shaanxi, Hubei, Sichuan, Chongqing, Yunnan, Guizhou, Guangdong, Guangxi, Hebei, Tianjin, Shandong, Anhui, Nanjing, Jiangsu, Hangzhou, Gansu, and Ningxia. Also supports English and Japanese.
-
fun-asr-2025-08-25: Mandarin and English.
-
fun-asr-mtl and fun-asr-mtl-2025-08-25: Mandarin, Cantonese, English, Japanese, Korean, Vietnamese, Indonesian, Thai, Malay, Filipino, Arabic, Hindi, Bulgarian, Croatian, Czech, Danish, Dutch, Estonian, Finnish, Greek, Hungarian, Irish, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovak, Slovenian, and Swedish.
-
-
Sample rates supported: Any
-
Audio formats supported: aac, amr, avi, flac, flv, m4a, mkv, mov, mp3, mp4, mpeg, ogg, opus, wav, webm, wma, wmv
Real-Time Speech Recognition
Usage instructions | API reference
International
In the international deployment mode, endpoints and data storage are in the Singapore region. Model inference compute resources are dynamically scheduled globally, excluding Mainland China.
|
Model |
Version |
Unit price |
Free quota (Note) |
|
fun-asr-realtime Currently, fun-asr-realtime-2025-11-07 |
Stable |
$0.00009/second |
36,000 seconds (10 hours) Valid for 90 days |
|
fun-asr-realtime-2025-11-07 |
Snapshot |
-
Languages supported: Mandarin, Cantonese, Wu, Minnan, Hakka, Gan, Xiang, and Jin. Also supports Mandarin accents from Zhongyuan, Southwest, Jilu, Jianghuai, Lanyin, Jiaoliao, Northeast, Beijing, and Hong Kong–Taiwan regions—including Henan, Shaanxi, Hubei, Sichuan, Chongqing, Yunnan, Guizhou, Guangdong, Guangxi, Hebei, Tianjin, Shandong, Anhui, Nanjing, Jiangsu, Hangzhou, Gansu, and Ningxia. Also supports English and Japanese.
-
Sample rates supported: 16 kHz
-
Audio formats supported: pcm, wav, mp3, opus, speex, aac, amr
Mainland China
In the Mainland China deployment mode, endpoints and data storage are in the Beijing region. Model inference compute resources are limited to Mainland China.
|
Model |
Version |
Unit price |
Free quota (Note) |
|
fun-asr-realtime Currently, fun-asr-realtime-2025-11-07 |
Stable |
$0.000047/second |
No free quota |
|
fun-asr-realtime-2025-11-07 Improved far-field VAD compared to fun-asr-realtime-2025-09-15 for higher accuracy. |
Snapshot |
||
|
fun-asr-realtime-2025-09-15 |
-
Languages supported:
-
fun-asr-realtime and fun-asr-realtime-2025-11-07: Mandarin, Cantonese, Wu, Minnan, Hakka, Gan, Xiang, and Jin. Also supports Mandarin accents from Zhongyuan, Southwest, Jilu, Jianghuai, Lanyin, Jiaoliao, Northeast, Beijing, and Hong Kong–Taiwan regions—including Henan, Shaanxi, Hubei, Sichuan, Chongqing, Yunnan, Guizhou, Guangdong, Guangxi, Hebei, Tianjin, Shandong, Anhui, Nanjing, Jiangsu, Hangzhou, Gansu, and Ningxia. Also supports English and Japanese.
-
fun-asr-realtime-2025-09-15: Mandarin, Cantonese, English, Japanese, Korean, Vietnamese, Indonesian, and Thai.
-
-
Sample rates supported: 16 kHz
-
Audio formats supported: pcm, wav, mp3, opus, speex, aac, amr
Text embedding
Text embedding models convert text into numerical representations for tasks such as search, clustering, recommendation, and classification. Billing for these models is based on the number of input tokens. API reference
International
In the international deployment mode, endpoints and data storage are located in the Singapore region. Inference computing resources are scheduled globally, excluding Mainland China.
Model | Embedding dimensions | Batch size | Max tokens per batch (Note) | Supported languages | Price (1M input tokens) | Free quota (Note) |
text-embedding-v4 Part of the Qwen3-Embedding series | 2,048, 1,536, 1,024 (default), 768, 512, 256, 128, or 64 | 10 | 8,192 | More than 100 major languages, such as Chinese, English, Spanish, French, Portuguese, Indonesian, Japanese, Korean, German, and Russian, and various programming languages | $0.07 | 1 million tokens Valid for 90 days after you activate Model Studio. |
text-embedding-v3 | 1,024 (default), 768, or 512 | 10 | 8,192 | Over 50 languages, such as Chinese, English, Spanish, French, Portuguese, Indonesian, Japanese, Korean, German, and Russian | 500,000 tokens Valid for 90 days after you activate Model Studio. |
Mainland China
In the Mainland China deployment mode, endpoints and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Model | Embedding dimensions | Batch size | Max tokens per batch (Note) | Supported languages | Price (1M input tokens) | Free quota |
text-embedding-v4 Part of the Qwen3-Embedding series Batch half price | 2,048, 1,536, 1,024 (default), 768, 512, 256, 128, or 64 | 10 | 8,192 | More than 100 major languages, such as Chinese, English, Spanish, French, Portuguese, Indonesian, Japanese, Korean, German, and Russian, and various programming languages | $0.072 | No free quota |
Batch size is the max number of texts that a single API call can process. For example, the batch size for text-embedding-v4 is 10. This means a single request can vectorize up to 10 texts, and each text cannot exceed 8,192 tokens. This limit applies to:
String array input: The array can contain up to 10 elements.
File input: The text file can contain up to 10 lines of text.
Multimodal embedding
A multimodal embedding model converts text, images, and videos into a vector of floating-point numbers. The model is suitable for applications such as video classification, image classification, and image-text retrieval. API reference
International
In the international deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are scheduled globally, excluding Mainland China.
Model | Data type | Embedding dimensions | Unit price (1M input tokens) | Free quota (Note) |
tongyi-embedding-vision-plus | float(32) | 1,152 | $0.09 | 1 million tokens Valid for 90 days after you activate Model Studio. |
tongyi-embedding-vision-flash | float(32) | 768 | Image/Video: $0.03 Text: $0.09 |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Model | Data type | Embedding dimensions | Price (1,000 input tokens) |
multimodal-embedding-v1 | float(32) | 1,024 | Free trial |
Text rerank
This feature is typically used for semantic retrieval. Given a query, it sorts a list of candidate documents in descending order of their semantic relevance. API reference
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Only available in Mainland China (Beijing) region.
Model | Max number of documents | Max input tokens per item | Max input tokens | Supported languages | Price (1M input tokens) |
gte-rerank-v2 | 500 | 4,000 | 30,000 | More than 50 languages, such as Chinese, English, Japanese, Korean, Thai, Spanish, French, Portuguese, German, Indonesian, and Arabic | $0.115 |
Max input tokens per item: Each query or document is limited to 4,000 tokens. Input that exceeds this limit is truncated.
Max number of documents: Each request is limited to 500 documents.
Max input tokens: The total number of tokens for all queries and documents in a single request is limited to 30,000.
Domain specific
Intent recognition
The Qwen intent recognition model can quickly and accurately parse user intents in milliseconds and select the appropriate tools to resolve user issues. API reference | Usage
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Model | Context window | Max input | Max output | Input cost | Output cost |
(tokens) | (per 1M tokens) | ||||
tongyi-intent-detect-v3 | 8,192 | 8,192 | 1,024 | $0.058 | $0.144 |
Role playing
Qwen's role-playing model is ideal for scenarios that require human-like conversation, such as virtual social interactions, NPCs in games, replicating IP characters, hardware, toys, and in-vehicle systems. Compared to other Qwen models, this model offers enhanced capabilities in character fidelity, conversation progression, and empathetic listening. Usage
International
In the international deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled globally, excluding Mainland China.
Model | Context window | Max input | Max output | Input cost | Output cost |
(tokens) | (per 1M tokens) | ||||
qwen-plus-character | 32,768 | 30,000 | 4,000 | $0.5 | $1.4 |
qwen-flash-character | 8,192 | 8,000 | 4,096 | $0.05 | $0.4 |
qwen-plus-character-ja | 7,680 | 512 | $0.5 | $1.4 | |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Model | Context window | Max input | Max output | Input cost | Output cost |
(tokens) | (per 1M tokens) | ||||
qwen-plus-character | 32,768 | 32,000 | 4,096 | $0.115 | $0.287 |
Retired models
Retired on January 30, 2026
Category | Model | Context window | Max input | Max output | Input cost (per 1M tokens) | Output cost (per 1M tokens) | Alternative model |
(tokens) | |||||||
Qwen-Plus | qwen-plus-2024-11-27 | 131,072 | 129,024 | 8,192 | $0.115 | $0.287 | qwen-plus-2025-12-01 |
qwen-plus-2024-11-25 | |||||||
qwen-plus-2024-09-19 | |||||||
qwen-plus-2024-08-06 | 128,000 | $0.574 | $1.721 | ||||
Qwen-Turbo | qwen-turbo-2024-09-19 | 131,072 | 129,023 | 8,192 | $0.044 | $0.087 | qwen-flash-2025-07-28 |
Qwen-VL | qwen-vl-max-2024-10-30 | 32,768 | 30,720 Max 16384 per image | 2,048 | $2.868 | $2.868 | qwen3-vl-plus-2025-12-19 |
qwen-vl-max-2024-08-09 | |||||||
qwen-vl-plus-2024-08-09 | $0.216 | $0.646 | qwen3-vl-flash-2025-10-15 | ||||


























































