Flagship models
Global
In Global deployment mode, the access point and data storage are located in the US (Virginia) region, and inference computing resources are dynamically scheduled globally.
Flagship models | Qwen-Max | Qwen-Plus | Qwen-Flash | Qwen3-Coder |
Description | Ideal for complex tasks. The most powerful model. | A balance of performance, speed, and cost. | Ideal for simple jobs. Fast and low-cost. | An excellent code model that excels at tool calling and environment interaction. |
Max context window (Tokens) | 262,144 | 1,000,000 | 1,000,000 | 1,000,000 |
Min input cost (Million tokens) | $1.2 | $0.4 | $0.05 | $0.3 |
Min output cost (Million tokens) | $6 | $1.2 | $0.4 | $1.5 |
International
In International deployment mode, the access point and data storage are located in the Singapore region, and inference computing resources are dynamically scheduled globally (excluding Mainland China).
Flagship models | Qwen-Max | Qwen-Plus | Qwen-Flash | Qwen3-Coder |
Description | Ideal for complex tasks. The most powerful model. | A balance of performance, speed, and cost. | Ideal for simple jobs. Fast and low-cost. | An excellent code model that excels at tool calling and environment interaction. |
Max context window (Tokens) | 262,144 | 1,000,000 | 1,000,000 | 1,000,000 |
Min input cost (Million tokens) | $1.2 | $0.4 | $0.05 | $0.3 |
Min output cost (Million tokens) | $6 | $1.2 | $0.4 | $1.5 |
US
In US deployment mode, the access point and data storage are located in the US (Virginia) region, and inference computing resources are limited to the US.
Flagship models | Qwen-Plus | Qwen-Flash |
Description | A balance of performance, speed, and cost. | Ideal for simple jobs. Fast and low-cost. |
Max context window (Tokens) | 1,000,000 | 1,000,000 |
Min input cost (Million tokens) | $0.4 | $0.05 |
Min output cost (Million tokens) | $1.2 | $0.4 |
Mainland China
In Mainland China deployment mode, the access point and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Flagship models | Qwen-Max | Qwen-Plus | Qwen-Flash | Qwen3-Coder |
Description | Ideal for complex tasks. The most powerful model. | A balance of performance, speed, and cost. | Ideal for simple jobs. Fast and low-cost. | An excellent code model that excels at tool calling and environment interaction. |
Max context window (Tokens) | 262,144 | 1,000,000 | 1,000,000 | 1,000,000 |
Min input cost (Million tokens) | $0.459 | $0.115 | $0.022 | $0.144 |
Min output cost (Million tokens) | $1.836 | $0.287 | $0.216 | $0.574 |
Model overview
Global
In Global deployment mode, the access point and data storage are located in the US (Virginia) region, and inference computing resources are dynamically scheduled globally.
Category | Subcategory | Description |
Text generation | Qwen large language models: commercial models (Qwen-Max, Qwen-Plus, Qwen-Flash) and open-source models (Qwen3) | |
| Visual understanding model Qwen-VL | |
Image generation | | |
Video generation | Generates high-quality videos with rich styles from a single sentence. | |
| First-frame-to-video: Uses an input image as the first frame and generates a video based on a prompt. | |
| Reference-to-video: Generates a video that maintains character consistency using a prompt and the appearance and voice from an input video. | |
International
In International deployment mode, the access point and data storage are located in the Singapore region, and inference computing resources are dynamically scheduled globally (excluding Mainland China).
Category | Subcategory | Description |
Text generation | Qwen large language models: commercial models (Qwen-Max, Qwen-Plus, Qwen-Flash) and open-source models (Qwen3, Qwen2.5) | |
| Visual understanding model Qwen-VL, visual reasoning model QVQ, omni-modal model Qwen-Omni, and real-time multi-modal model Qwen-Omni-Realtime | |
Image generation | | |
Speech synthesis and recognition | Qwen speech synthesis and Qwen realtime speech synthesis can be used for text-to-speech in scenarios such as intelligent voice customer service, audiobooks, in-car navigation, and educational tutoring. | |
| Qwen realtime speech recognition, Qwen audio file recognition, Qwen3-LiveTranslate-Flash-Realtime, and Fun-ASR speech recognition can perform speech-to-text for scenarios such as real-time meeting transcription, real-time live stream captions, and telephone customer service. | |
Video generation | Generates high-quality videos with rich styles from a single sentence. | |
| First-frame-to-video: Uses an input image as the first frame and generates a video based on a prompt. | |
| Reference-to-video: Generates a video that maintains character consistency using a prompt and the appearance and voice from an input video. | |
| General video editing: Performs various video editing tasks based on input text, images, and videos. For example, it can generate a new video by extracting motion features from an input video and combining them with a prompt. | |
Embedding | Converts text into a set of numbers that represent the text. It is suitable for search, clustering, recommendation, and classification tasks. | |
US
In US deployment mode, the access point and data storage are located in the US (Virginia) region, and inference computing resources are limited to the US.
Category | Subcategory | Description |
Text generation | Qwen large language models: commercial models (Qwen-Plus, Qwen-Flash) | |
| Visual understanding model Qwen-VL | |
Video generation | Generates high-quality videos with rich styles from a single sentence. | |
| First-frame-to-video: Uses an input image as the first frame and generates a video based on a prompt. | |
Speech recognition | Qwen audio file recognition can perform speech-to-text for scenarios such as meeting transcription and live stream captioning. | |
Mainland China
In Mainland China deployment mode, the access point and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Category | Model | Description |
Text generation | Qwen large language models | |
| Visual understanding model Qwen-VL, visual reasoning model QVQ, and omni-modal model Qwen-Omni | |
| Code model, mathematical model, translation model, data mining model, research model, intention recognition model, role-playing model | |
Image generation | General-purpose models: | |
| More models: Qwen Image Translation, OutfitAnyone | |
Speech synthesis and recognition | Qwen speech synthesis, Qwen realtime speech synthesis, and CosyVoice speech synthesis convert text to speech for scenarios such as voice-based customer service, audiobooks, in-car navigation, and educational tutoring. | |
| Qwen realtime speech recognition, Qwen audio file recognition, Fun-ASR speech recognition, and Paraformer speech recognition convert speech to text for scenarios such as real-time meeting transcription, real-time live stream captioning, and customer service calls. | |
Video editing and generation | Generates high-quality videos with rich styles from a single sentence. | |
| First-frame-to-video: Uses an input image as the first frame and generates a video based on a prompt. | |
| Reference-to-video: Generates a video that maintains character consistency using a prompt and the appearance and voice from an input video. | |
| General video editing: Performs various video editing tasks based on input text, images, and videos. | |
Vector | Converts text into a set of numbers that represent the text. It is used for search, clustering, recommendation, and classification. | |
| Converts text, images, and speech into a set of numbers. It is used for audio and video classification, image classification, and image-text retrieval. | |
Text generation - Qwen
The following are the Qwen commercial models. Compared to the open-source versions, the commercial models offer the latest capabilities and improvements.
The parameter sizes of the commercial models are not disclosed.
Each model is updated periodically. To use a fixed version, you can select a snapshot version. A snapshot version is typically maintained for one month after the release of the next snapshot version.
We recommend the stable or latest version, which has more lenient rate limits.
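For example, the following sketch calls both the stable alias and a pinned snapshot through the OpenAI-compatible interface. The endpoint URL and the DASHSCOPE_API_KEY environment variable are assumptions that depend on your deployment mode; check the usage guide for the exact values.

```python
# Sketch: the stable alias tracks capability updates; the dated name pins a snapshot.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),  # assumed environment variable name
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",  # assumed International endpoint
)

for model in ("qwen3-max", "qwen3-max-2025-09-23"):
    reply = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Say hello in five words."}],
    )
    print(model, "->", reply.choices[0].message.content)
```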
Qwen-Max
The top-performing model in the Qwen series, suitable for complex and multi-step tasks. Usage | API reference | Try it online
Global
In the global deployment mode, endpoints and data storage are both located in the US (Virginia) region, while inference compute resources are dynamically scheduled globally.
Model | Version | Mode | Context window | Max input | Max chain-of-thought | Max output | Input cost | Output cost | Free quota |
(Tokens) | (1,000 tokens) | ||||||||
qwen3-max Currently has the same capabilities as qwen3-max-2025-09-23 Context cache is available at a discount | Stable | Non-thinking only | 262,144 | 258,048 | - | 65,536 | Tiered pricing, see the description below the table. | None | |
qwen3-max-2025-09-23 | Snapshot | Non-thinking only | |||||||
qwen3-max-preview Context cache is available at a discount | Preview | Thinking | 81,920 | 32,768 | |||||
Non-thinking | - | 65,536 | |||||||
The preceding models use tiered pricing based on the number of input tokens per request.
Input tokens per request | Input price (Million tokens) | Output price (Million tokens) Chain-of-thought + response |
0<Token≤32K | $1.2 | $6 |
32K<Token≤128K | $2.4 | $12 |
128K<Token≤252K | $3 | $15 |
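To make the tiers concrete, here is a small cost check. It is a sketch that assumes the single tier matching a request's input size applies to all tokens in that request, and that "32K" means 32,768 tokens (consistent with the 258,048-token max input equaling 252K); verify both readings against the billing documentation.

```python
# Sketch: per-request cost for qwen3-max (Global) under the tiered prices above.
# Assumption: one tier, chosen by total input size, prices the whole request.
TIERS = [  # (upper bound in tokens, $ per million input, $ per million output)
    (32_768, 1.2, 6.0),    # 0 < tokens <= 32K
    (131_072, 2.4, 12.0),  # 32K < tokens <= 128K
    (258_048, 3.0, 15.0),  # 128K < tokens <= 252K
]

def request_cost(input_tokens: int, output_tokens: int) -> float:
    for bound, in_price, out_price in TIERS:
        if input_tokens <= bound:
            return (input_tokens * in_price + output_tokens * out_price) / 1_000_000
    raise ValueError("input exceeds the 252K tier")

# A 40,000-token prompt lands in the middle tier:
# 40,000 x $2.4/M = $0.096 input, plus 2,000 x $12/M = $0.024 output.
print(f"${request_cost(40_000, 2_000):.3f}")  # $0.120
```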
International
In the International deployment mode, endpoints and data storage are located in the Singapore region, and inference compute resources are dynamically scheduled globally (excluding Mainland China).
Model | Version | Mode | Context window | Max input | Max chain-of-thought | Max output | Input cost | Output cost | Free quota |
(Tokens) | (1,000 tokens) | ||||||||
qwen3-max Currently has the same capabilities as qwen3-max-2025-09-23 Batch calls half price | Stable | Non-thinking only | 262,144 | 258,048 | - | 65,536 | Tiered pricing, see the description below the table. | 1 million tokens each Validity: 90 days after you activate Model Studio | |
qwen3-max-2025-09-23 | Snapshot | Non-thinking only | |||||||
qwen3-max-preview | Preview | Thinking | 81,920 | 32,768 | |||||
Non-thinking | - | 65,536 | |||||||
The preceding models use tiered pricing based on the number of input tokens per request.
Input tokens per request | Input price (Million tokens) qwen3-max, qwen3-max-preview support context cache. | Output price (Million tokens) |
0<Token≤32K | $1.2 | $6 |
32K<Token≤128K | $2.4 | $12 |
128K<Token≤252K | $3 | $15 |
Mainland China
In Mainland China deployment mode, endpoints and data storage are both located in the Beijing region, and inference compute resources are limited to Mainland China.
Model | Version | Mode | Context window | Max input | Max chain-of-thought | Max output | Input cost | Output cost |
(Tokens) | (1,000 Tokens) | |||||||
qwen3-max Currently has the same capabilities as qwen3-max-2025-09-23 Batch calls are half price. | Stable | Non-thinking only | 262,144 | 258,048 | - | 65,536 | Tiered pricing, see the description below the table. | |
qwen3-max-2025-09-23 | Snapshot | Non-thinking only | ||||||
qwen3-max-preview | Preview | Thinking | 81,920 | 32,768 | ||||
Non-thinking | - | 65,536 | ||||||
The preceding models use tiered pricing based on the number of input tokens per request.
Model | Input tokens per request | Price (Million tokens) | Output price (Million tokens) Chain-of-thought + response |
qwen3-max Batch calls: Half price Context cache is discounted | 0<Token≤32K | $0.459 | $1.836 |
32K<Token≤128K | $0.918 | $3.672 | |
128K<Token≤252K | $1.377 | $5.508 | |
qwen3-max-2025-09-23 | 0<Token≤32K | $0.861 | $3.441 |
32K<Token≤128K | $1.434 | $5.735 | |
128K<Token≤252K | $2.151 | $8.602 | |
qwen3-max-preview Context cache is discounted | 0<Token≤32K | $0.861 | $3.441 |
32K<Token≤128K | $1.434 | $5.735 | |
128K<Token≤252K | $2.151 | $8.602 |
The `thinking mode` of the qwen3-max-preview model significantly improves its overall reasoning capabilities, with better performance on agentic programming, common-sense reasoning, and general math and science tasks.
Qwen-Plus
A balanced model offering a mix of performance, cost, and speed, positioned between Qwen-Max and Qwen-Flash. It is ideal for tasks of moderate complexity. Usage | API reference | Try it online | Thinking mode
Global
In the Global deployment mode, the access point and data storage are located in the US (Virginia) region, and inference compute resources are dynamically scheduled globally.
Model | Version | Context window | Max input | Max output | Input cost | Output cost |
(Tokens) | (Million tokens) | |||||
qwen-plus Currently has the same capabilities as qwen-plus-2025-07-28 Belongs to the Qwen3 series | Stable | 1,000,000 | Thinking mode 995,904 Non-thinking mode 997,952 The default value is 131,072, which you can change using the max_input_tokens parameter. | 32,768 Max chain-of-thought is 81,920 | Tiered pricing, see the description below the table. | |
qwen-plus-2025-12-01 Part of the Qwen3 series | Snapshot | Thinking mode 995,904 Non-thinking mode 997,952 | ||||
qwen-plus-2025-09-11 Part of the Qwen3 series | ||||||
qwen-plus-2025-07-28 Part of the Qwen3 series | ||||||
The preceding models use tiered pricing based on the number of input tokens per request. qwen-plus supports context cache.
Input tokens per request | Input price (Million tokens) | Mode | Output price (Million tokens) |
0<Token≤256K | $0.4 | Non-thinking mode | $1.2 |
Thinking mode | $4 | ||
256K<Token≤1M | $1.2 | Non-thinking mode | $3.6 |
Thinking mode | $12 |
International
In the International deployment mode, endpoints and data storage are located in the Singapore region, and inference compute resources are dynamically scheduled globally (excluding Mainland China).
Model | Version | Context window | Max input | Max output | Input cost | Output cost | Free quota |
(Tokens) | (Million tokens) | ||||||
qwen-plus Currently has the same capabilities as qwen-plus-2025-07-28 Belongs to the Qwen3 series | Stable | 1,000,000 | Thinking mode 995,904 Non-thinking mode 997,952 The default value is 262,144. You can adjust this value with the max_input_tokens parameter. | 32,768 Max chain-of-thought is 81,920 | Tiered pricing, see the description below the table. | 1 million tokens each Validity: 90 days after you activate Model Studio | |
qwen-plus-latest Currently has the same capabilities as qwen-plus-2025-12-01 Belongs to the Qwen3 series | Latest | Thinking mode 995,904 Non-thinking mode 997,952 | |||||
qwen-plus-2025-12-01 Belongs to the Qwen3 series | Snapshot | Thinking mode 995,904 Non-thinking mode 997,952 | |||||
qwen-plus-2025-09-11 Belongs to the Qwen3 series | |||||||
qwen-plus-2025-07-28 Also known as qwen-plus-0728 Belongs to the Qwen3 series | |||||||
qwen-plus-2025-07-14 Also known as qwen-plus-0714 Belongs to the Qwen3 series | 131,072 | Thinking mode 98,304 Non-thinking mode 129,024 | 16,384 Max chain-of-thought is 38,912 | $0.4 | Thinking mode $4 Non-thinking mode $1.2 | ||
qwen-plus-2025-04-28 Also known as qwen-plus-0428 Belongs to the Qwen3 series | |||||||
qwen-plus-2025-01-25 Also known as qwen-plus-0125 | 129,024 | 8,192 | $1.2 | ||||
The qwen-plus, qwen-plus-latest, qwen-plus-2025-12-01, qwen-plus-2025-09-11, and qwen-plus-2025-07-28 models use tiered pricing based on the number of input tokens per request.
Input tokens per request | Input price (Million tokens) | Mode | Output price (Million tokens) |
0<Token≤256K | $0.4 | Non-thinking mode | $1.2 |
Thinking mode | $4 | ||
256K<Token≤1M | $1.2 | Non-thinking mode | $3.6 |
Thinking mode | $12 |
US
In the US deployment mode, endpoints and data storage are both located in the US (Virginia) region, and inference compute resources are limited to the United States.
Model | Version | Context window | Max input | Max output | Input cost | Output cost | Free quota |
(Tokens) | (Million tokens) | ||||||
qwen-plus-us Currently has the same capabilities as qwen-plus-2025-12-01-us Part of the Qwen3 series | Stable | 1,000,000 | Thinking mode 995,904 Non-thinking mode 997,952 The default value is 262,144. You can adjust this value with the max_input_tokens parameter. | 32,768 Max chain-of-thought is 81,920 | Tiered pricing, see the description below the table. | None | |
qwen-plus-2025-12-01-us Belongs to the Qwen3 series | Snapshot | Thinking mode 995,904 Non-thinking mode 997,952 | |||||
These models use tiered pricing based on the number of input tokens per request. qwen-plus-us supports context cache.
Input tokens per request | Input price (Million tokens) | Mode | Output price (Million tokens) |
0<Token≤256K | $0.4 | Non-thinking mode | $1.2 |
Thinking mode | $4 | ||
256K<Token≤1M | $1.2 | Non-thinking mode | $3.6 |
Thinking mode | $12 |
Mainland China
In Mainland China deployment mode, the endpoints and data storage are both located in the Beijing region, while the inference compute resources are limited to Mainland China.
Model | Version | Context window | Max input | Max output | Input cost | Output cost |
(Tokens) | (Million tokens) | |||||
qwen-plus Currently has the same capabilities as qwen-plus-2025-07-28 Belongs to the Qwen3 series | Stable | 1,000,000 | Thinking mode 995,904 Non-thinking mode 997,952 The default value is 131,072. You can adjust this value using the max_input_tokens parameter. | 32,768 Max chain-of-thought is 81,920 | Tiered pricing, see the description below the table. | |
qwen-plus-latest Currently has the same capabilities as qwen-plus-2025-12-01 Part of the Qwen3 series | Latest | Thinking mode 995,904 Non-thinking mode 997,952 | ||||
qwen-plus-2025-12-01 Part of the Qwen3 series | Snapshot | Thinking mode 995,904 Non-thinking mode 997,952 | ||||
qwen-plus-2025-09-11 Part of the Qwen3 series | ||||||
qwen-plus-2025-07-28 Also known as qwen-plus-0728 Part of the Qwen3 series | ||||||
qwen-plus-2025-07-14 Also known as qwen-plus-0714 Part of the Qwen3 series | 131,072 | Thinking mode 98,304 Non-thinking mode 129,024 | 16,384 Max chain-of-thought is 38,912 | $0.115 | Thinking mode $1.147 Non-thinking mode $0.287 | |
qwen-plus-2025-04-28 Also known as qwen-plus-0428 Part of the Qwen3 series | ||||||
The qwen-plus, qwen-plus-latest, qwen-plus-2025-12-01, qwen-plus-2025-09-11, and qwen-plus-2025-07-28 models use tiered pricing based on the number of input tokens per request.
Input tokens per request | Input price (Million tokens) | Mode | Output price (Million tokens) |
0<Token≤128K | $0.115 | Non-thinking mode | $0.287 |
Thinking mode | $1.147 | ||
128K<Token≤256K | $0.345 | Non-thinking mode | $2.868 |
Thinking mode | $3.441 | ||
256K<Token≤1M | $0.689 | Non-thinking mode | $6.881 |
Thinking mode | $9.175 |
The preceding models support both `thinking` and `non-thinking` modes. You can switch between the two modes using the enable_thinking parameter. In addition, these models' capabilities are significantly improved:
Reasoning: Significantly outperforms QwQ and non-reasoning models of the same size in math, code, and logical-reasoning evaluations, reaching top-tier industry performance for its scale.
Human preference: Creative writing, role-playing, multi-turn conversation, and instruction following are greatly improved, and general capabilities significantly exceed those of other models of the same size.
Agent capabilities: Achieves industry-leading performance in both `thinking` and `non-thinking` modes and can accurately call external tools.
Multilingual capabilities: Supports over 100 languages and dialects, with significant improvements in multilingual translation, instruction understanding, and common-sense reasoning.
Response format: Fixes response format issues from previous versions, such as abnormal Markdown, mid-sentence truncation, and incorrect boxed output.
For the preceding models, if `thinking mode` is enabled but no thought process is output, the `non-thinking mode` price is charged.
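For example, the following sketch toggles the mode on qwen-plus. The extra_body field enable_thinking and the streamed reasoning_content deltas follow the usage guide's conventions but should be treated as assumptions here; the endpoint URL depends on your deployment mode.

```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",  # assumed endpoint
)

stream = client.chat.completions.create(
    model="qwen-plus",
    messages=[{"role": "user", "content": "Which is larger, 9.11 or 9.9?"}],
    stream=True,  # thinking mode generally requires streaming output
    extra_body={"enable_thinking": True},  # set False for non-thinking pricing
)

for chunk in stream:
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta
    if getattr(delta, "reasoning_content", None):  # chain-of-thought tokens
        print(delta.reasoning_content, end="")
    elif delta.content:  # final response tokens
        print(delta.content, end="")
```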
Qwen-Flash
The fastest and most cost-effective model in the Qwen series, ideal for simple tasks. Qwen-Flash uses flexible tiered pricing, which offers better value than Qwen-Turbo. Usage | API reference | Try it online | Thinking mode
Global
In the Global deployment mode, the endpoint and data storage are both located in the US (Virginia) region, and inference compute resources are dynamically scheduled globally.
Model | Version | Mode | Context window | Max input | Max chain-of-thought | Max output | Input cost | Output cost Chain-of-thought + output |
(Tokens) | (1,000 Tokens) | |||||||
qwen-flash Currently has the same capabilities as qwen-flash-2025-07-28 Part of the Qwen3 series | Stable | Thinking | 1,000,000 | 995,904 | 81,920 | 32,768 | Tiered pricing, see the description below the table. | |
Non-thinking | 997,952 | - | ||||||
qwen-flash-2025-07-28 Part of the Qwen3 series | Snapshot | Thinking | 995,904 | 81,920 | ||||
Non-thinking | 997,952 | - | ||||||
These models use tiered billing based on the number of tokens in the request. qwen-flash supports context cache.
Input tokens per request | Input price (Million tokens) | Output price (Million tokens) |
0<Token≤256K | $0.05 | $0.4 |
256K<Token≤1M | $0.25 | $2 |
International
In the International deployment mode, both endpoints and data storage are located in the Singapore region, and inference compute resources are dynamically scheduled globally (excluding Mainland China).
Model | Version | Mode | Context window | Max input | Max chain-of-thought | Max output | Input cost | Output cost Chain-of-thought + output | Free quota |
(Tokens) | (1,000 tokens) | ||||||||
qwen-flash Currently has the same capabilities as qwen-flash-2025-07-28 Part of the Qwen3 series Batch calls at half price | Stable | Thinking | 1,000,000 | 995,904 | 81,920 | 32,768 | Tiered pricing, see the description below the table. | 1 million tokens each Validity: 90 days after you activate Model Studio | |
Non-thinking | 997,952 | - | |||||||
qwen-flash-2025-07-28 Part of the Qwen3 series | Snapshot | Thinking | 995,904 | 81,920 | |||||
Non-thinking | 997,952 | - | |||||||
These models use tiered billing based on the number of input tokens per request. The qwen-flash model supports context cache and batch calls.
Input tokens per request | Input price (Million tokens) | Output price (Million tokens) |
0<Token≤256K | $0.05 | $0.4 |
256K<Token≤1M | $0.25 | $2 |
US
In the US deployment mode, the endpoint and data storage are both located in the US (Virginia) region, and inference compute resources are limited to the United States.
Model | Version | Mode | Context window | Max input | Max chain-of-thought | Max output | Input cost | Output cost Chain-of-thought + output | Free quota |
(Tokens) | (1,000 tokens) | ||||||||
qwen-flash-us Currently has the same capabilities as qwen-flash-2025-07-28-us Part of the Qwen3 series | Stable | Thinking | 1,000,000 | 995,904 | 81,920 | 32,768 | Tiered pricing, see the description below the table. | None | |
Non-thinking | 997,952 | - | |||||||
qwen-flash-2025-07-28-us Part of the Qwen3 series | Snapshot | Thinking | 995,904 | 81,920 | |||||
Non-thinking | 997,952 | - | |||||||
The preceding models use tiered pricing based on the number of input tokens per request.
Input tokens per request | Input price (Million tokens) | Output price (Million tokens) |
0<Token≤256K | $0.05 | $0.4 |
256K<Token≤1M | $0.25 | $2 |
Mainland China
In Mainland China deployment mode, the endpoint and data storage are both located in the Beijing region, and inference compute resources are limited to Mainland China.
Model | Version | Mode | Context window | Max input | Max chain-of-thought | Max output | Input cost | Output cost Chain-of-thought + output |
(Tokens) | (1,000 tokens) |
qwen-flash Currently has the same capabilities as qwen-flash-2025-07-28 Part of the Qwen3 series | Stable | Thinking | 1,000,000 | 995,904 | 81,920 | 32,768 | Tiered pricing, see the description below the table. | |
Non-thinking | 997,952 | - | ||||||
qwen-flash-2025-07-28 Part of the Qwen3 series | Snapshot | Thinking | 995,904 | 81,920 | ||||
Non-thinking | 997,952 | - | ||||||
The preceding models use tiered billing based on the number of tokens per request. The qwen-flash model supports context cache.
Input tokens per request | Input price (Million tokens) | Output price (Million tokens) |
0<Token≤128K | $0.022 | $0.216 |
128K<Token≤256K | $0.087 | $0.861 |
256K<Token≤1M | $0.173 | $1.721 |
Qwen-Turbo
Qwen-Turbo will no longer be updated. We recommend that you replace it with Qwen-Flash, which uses flexible tiered pricing and offers better value. Usage | API reference | Try it online | Thinking mode
International
In the International deployment mode, both endpoints and data storage are located in the Singapore region, and inference compute resources are dynamically scheduled globally (excluding Mainland China).
Model | Version | Context window | Max input | Max output | Input cost | Output cost | Free quota |
(Tokens) | (Million tokens) | ||||||
qwen-turbo Currently has the same capabilities as qwen-turbo-2025-04-28 Part of the Qwen3 series | Stable | Thinking mode 131,072 Non-thinking mode 1,000,000 | Thinking mode 98,304 Non-thinking mode 1,000,000 | 16,384 Max chain-of-thought is 38,912 | $0.05 Batch calls are half price. | Thinking mode: $0.5 Non-thinking mode: $0.2 Batch calls are half price. | 1 million tokens each Validity: 90 days after you activate Model Studio |
qwen-turbo-latest Always has the same capabilities as the latest snapshot version Part of the Qwen3 series | Latest | $0.05 | Thinking mode: $0.5 Non-thinking mode: $0.2 | ||||
qwen-turbo-2025-04-28 Also known as qwen-turbo-0428 Part of the Qwen3 series | Snapshot | ||||||
qwen-turbo-2024-11-01 Also known as qwen-turbo-1101 | 1,000,000 | 1,000,000 | 8,192 | $0.2 | |||
Mainland China
In Mainland China deployment mode, endpoints and data storage are both located in the Beijing region, and inference compute resources are limited to Mainland China.
Model | Version | Context window | Max input | Max output | Input cost | Output cost |
(Tokens) | (Million tokens) | |||||
qwen-turbo Currently has the same capabilities as qwen-turbo-2025-04-28 Part of the Qwen3 series | Stable | Thinking mode 131,072 Non-thinking mode 1,000,000 | Thinking mode 98,304 Non-thinking mode 1,000,000 | 16,384 Max chain-of-thought is 38,912 | $0.044 | Thinking mode $0.431 Non-thinking mode $0.087 |
qwen-turbo-latest Always has the same capabilities as the latest snapshot version Part of the Qwen3 series | Latest | |||||
qwen-turbo-2025-07-15 Also known as qwen-turbo-0715 Part of the Qwen3 series | Snapshot | |||||
qwen-turbo-2025-04-28 Also known as qwen-turbo-0428 Part of the Qwen3 series | ||||||
QwQ
The QwQ reasoning model is trained based on the Qwen2.5 model. Its reasoning capabilities are significantly enhanced through reinforcement learning. Its performance on core metrics for math and code (AIME 24/25 and LiveCodeBench) and some general metrics (IFEval and LiveBench) is on par with the full-power DeepSeek-R1. Usage
International
In the International deployment mode, endpoints and data storage are located in the Singapore region, and inference compute resources are dynamically scheduled globally (excluding Mainland China).
Model | Version | Context window | Max input | Max chain-of-thought | Max response | Input cost | Output cost | Free quota |
(Tokens) | (Million tokens) | |||||||
qwq-plus | Stable | 131,072 | 98,304 | 32,768 | 8,192 | $0.8 | $2.4 | 1 million tokens Validity: 90 days after you activate Model Studio |
Mainland China
In Mainland China deployment mode, both the endpoint and data storage are located in the Beijing region, and inference compute resources are restricted to Mainland China.
Model | Version | Context window | Max input | Max chain-of-thought | Max response | Input cost | Output cost |
(Tokens) | (Million tokens) | ||||||
qwq-plus Currently has the same capabilities as qwq-plus-2025-03-05 | Stable | 131,072 | 98,304 | 32,768 | 8,192 | $0.230 | $0.574 |
qwq-plus-latest Always has the same capabilities as the latest snapshot version | Latest | ||||||
qwq-plus-2025-03-05 Also known as qwq-plus-0305 | Snapshot | ||||||
Qwen-Long
The model in the Qwen series with the longest context window. It offers balanced capabilities at a low cost, making it ideal for tasks such as long-text analytics, information extraction, summarization, and classification. Usage | Try it online
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Model | Version | Context window | Max input | Max output | Input cost | Output cost |
(Tokens) | (Million tokens) | |||||
qwen-long-latest Always has the same capabilities as the latest snapshot version | Stable | 10,000,000 | 10,000,000 | 32,768 | $0.072 | $0.287 |
qwen-long-2025-01-25 Also known as qwen-long-0125 | Snapshot | |||||
Qwen-Omni
The Qwen-Omni model can accept a combination of inputs in multiple modalities, such as text, images, audio, and video, and generate replies in text or speech format. It provides a variety of expressive, human-like voices and supports speech output in multiple languages and dialects. This model can be used in audio-video chat scenarios for tasks such as visual recognition, emotion analysis, education, and training. Usage | API reference
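A minimal sketch of requesting a spoken reply follows. The modalities and audio parameters and the "Cherry" voice name mirror typical Model Studio examples but are assumptions here; Omni models return their output as a stream.

```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",  # assumed endpoint
)

stream = client.chat.completions.create(
    model="qwen3-omni-flash",
    messages=[{"role": "user", "content": "Greet the user in one short sentence."}],
    modalities=["text", "audio"],                # ask for text plus speech
    audio={"voice": "Cherry", "format": "wav"},  # assumed voice name
    stream=True,                                 # Omni output is streamed
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")  # audio arrives in separate deltas
```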
International
In the International deployment mode, the access point and data storage are located in the Singapore region, and inference compute resources are dynamically scheduled globally (excluding Mainland China).
Model | Version | Mode | Context window | Max input | Max chain-of-thought | Max output | Free quota |
(Tokens) | |||||||
qwen3-omni-flash Currently has the same capabilities as qwen3-omni-flash-2025-12-01 | Stable | Thinking mode | 65,536 | 16,384 | 32,768 | 16,384 | 1 million tokens each (regardless of modality) Validity: 90 days after you activate Model Studio |
Non-thinking mode | 49,152 | - | |||||
qwen3-omni-flash-2025-12-01 | Snapshot | Thinking mode | 65,536 | 16,384 | 32,768 | 16,384 | |
Non-thinking mode | 49,152 | - | |||||
qwen3-omni-flash-2025-09-15 Also known as qwen3-omni-flash-0915 | Snapshot | Thinking mode | 65,536 | 16,384 | 32,768 | 16,384 | |
Non-thinking mode | 49,152 | - | |||||
After the free quota is used, input and output are billed at the same rates in both `thinking` and `non-thinking` modes. Audio output is not supported in `thinking mode`.
Mainland China
In Mainland China deployment mode, both endpoints and data storage are located in the Beijing region, and inference compute resources are limited to Mainland China.
Model | Version | Mode | Context window | Max input | Max chain-of-thought | Max output | Free quota |
(Tokens) | |||||||
qwen3-omni-flash This model has the same capabilities as qwen3-omni-flash-2025-12-01. | Stable | Thinking mode | 65,536 | 16,384 | 32,768 | 16,384 | No free quota |
Non-thinking mode | 49,152 | - | |||||
qwen3-omni-flash-2025-12-01 | Snapshot | Thinking mode | 65,536 | 16,384 | 32,768 | 16,384 | |
Non-thinking mode | 49,152 | - | |||||
qwen3-omni-flash-2025-09-15 Also known as qwen3-omni-flash-0915 | Snapshot | Thinking mode | 65,536 | 16,384 | 32,768 | 16,384 | |
Non-thinking mode | 49,152 | - | |||||
Input and output are billed at the same rates in both `thinking` and `non-thinking` modes. Audio output is not supported in `thinking mode`.
We recommend the Qwen3-Omni-Flash model. Compared to Qwen-Omni-Turbo, which will no longer be updated, its capabilities are significantly improved:
It is a hybrid thinking model that supports both `thinking` and `non-thinking` modes. You can switch between the two modes using the enable_thinking parameter. `Thinking mode` is disabled by default. Audio output is not supported in `thinking mode`. In `non-thinking mode`, for the model's audio output:
qwen3-omni-flash-2025-12-01 supports up to 49 voices, qwen3-omni-flash-2025-09-15 and qwen3-omni-flash support up to 17 voices, while Qwen-Omni-Turbo supports only 4.
The number of supported languages is increased to 10, while Qwen-Omni-Turbo supports only 2.
Qwen-Omni-Realtime
Compared to Qwen-Omni, this model supports streaming audio input and has a built-in Voice Activity Detection (VAD) feature, which automatically detects the start and end of user speech. Usage | Client events | Server-side events
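A heavily hedged sketch of a realtime session is shown below. The WebSocket URL shape and the event names are assumptions modeled on OpenAI-Realtime-style APIs; confirm every field against the client and server event references before use.

```python
import asyncio, base64, json, os
import websockets  # pip install "websockets>=13"

URL = ("wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime"
       "?model=qwen3-omni-flash-realtime")  # assumed endpoint shape

async def main() -> None:
    headers = {"Authorization": f"Bearer {os.getenv('DASHSCOPE_API_KEY')}"}
    async with websockets.connect(URL, additional_headers=headers) as ws:
        # With server-side VAD, the service detects speech start/end itself,
        # so the client only needs to stream raw audio.
        await ws.send(json.dumps({
            "type": "session.update",  # assumed event name
            "session": {"turn_detection": {"type": "server_vad"}},
        }))
        with open("question.pcm", "rb") as f:  # hypothetical PCM capture
            await ws.send(json.dumps({
                "type": "input_audio_buffer.append",  # assumed event name
                "audio": base64.b64encode(f.read()).decode(),
            }))
        async for message in ws:
            print(json.loads(message).get("type"))  # inspect server events

asyncio.run(main())
```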
International
In the International deployment mode, endpoints and data storage are located in the Singapore region, and inference compute resources are dynamically scheduled globally (excluding Mainland China).
Model | Version | Context window | Max input | Max output | Free quota |
(Tokens) | |||||
qwen3-omni-flash-realtime Offers the same capabilities as qwen3-omni-flash-realtime-2025-12-01. | Stable | 65,536 | 49,152 | 16,384 | 1 million tokens (regardless of modality) Validity: 90 days after you activate Model Studio |
qwen3-omni-flash-realtime-2025-12-01 | Snapshot | ||||
qwen3-omni-flash-realtime-2025-09-15 | |||||
After the free quota is used, input and output are billed at the model's standard rates.
Mainland China
In Mainland China deployment mode, endpoints and data storage are both located in the Beijing region, and inference compute resources are limited to Mainland China.
Model | Version | Context window | Max input | Max output | Free quota |
(Tokens) | |||||
qwen3-omni-flash-realtime This model provides the same capabilities as qwen3-omni-flash-realtime-2025-12-01. | Stable | 65,536 | 49,152 | 16,384 | No free quota |
qwen3-omni-flash-realtime-2025-12-01 | Snapshot | ||||
qwen3-omni-flash-realtime-2025-09-15 | |||||
Input and output are billed at the model's standard rates.
We recommend the Qwen3-Omni-Flash-Realtime model. Compared to Qwen-Omni-Turbo-Realtime, which will no longer be updated, its capabilities are significantly improved. For the model's audio output:
qwen3-omni-flash-realtime-2025-12-01 supports up to 49 voices, and qwen3-omni-flash-realtime-2025-09-15 and qwen3-omni-flash-realtime support up to 17 voices, while Qwen-Omni-Turbo-Realtime supports only 4.
The number of supported languages is increased to 10, while Qwen-Omni-Turbo-Realtime supports only 2.
QVQ
QVQ is a visual reasoning model that supports visual input and chain-of-thought output. It demonstrates enhanced capabilities in math, programming, visual analysis, creation, and other general tasks. Usage | Try it online
International
In the International deployment mode, endpoints and data storage are both located in the Singapore region, and inference compute resources are dynamically scheduled globally (excluding Mainland China).
Model | Version | Context window | Max input | Max chain-of-thought | Max response | Input cost | Output cost | Free quota |
(Tokens) | (Million tokens) | |||||||
qvq-max Has the same capabilities as qvq-max-2025-03-25. | Stable | 131,072 | 106,496 A max of 16,384 tokens per image. | 16,384 | 8,192 | $1.2 | $4.8 | 1 million tokens for each model Valid for 90 days after you activate Model Studio. |
qvq-max-latest Has the same capabilities as the latest snapshot version. | Latest | |||||||
qvq-max-2025-03-25 Also known as qvq-max-0325. | Snapshot | |||||||
Mainland China
In Mainland China deployment mode, the endpoint and data storage are located in the Beijing region, and inference compute resources are limited to Mainland China.
Model | Version | Context window | Max input | Max chain-of-thought | Max response | Input cost | Output cost |
(Tokens) | (Million tokens) | ||||||
qvq-max Compared to qvq-plus, this model provides superior visual reasoning and instruction-following capabilities and is optimized for complex tasks. This model currently provides the same capabilities as qvq-max-2025-03-25. | Stable | 131,072 | 106,496 A max of 16,384 tokens per image. | 16,384 | 8,192 | $1.147 | $4.588 |
qvq-max-latest This model always provides the same capabilities as the latest snapshot version. | Latest | ||||||
qvq-max-2025-05-15 Also known as qvq-max-0515. | Snapshot | ||||||
qvq-max-2025-03-25 Also known as qvq-max-0325. | |||||||
qvq-plus This model currently provides the same capabilities as qvq-plus-2025-05-15. | Stable | $0.287 | $0.717 | ||||
qvq-plus-latest This model always provides the same capabilities as the latest snapshot version. | Latest | ||||||
qvq-plus-2025-05-15 Also known as qvq-plus-0515. | Snapshot | ||||||
Qwen-VL
Qwen-VL is a text generation model with visual (image) understanding capabilities. It can perform Optical Character Recognition (OCR), summarization, and reasoning. For example, it can extract attributes from product photos or solve problems based on diagrams in exercises. Usage | API reference | Try it online
The Qwen-VL model is billed based on the total number of input and output tokens. For more information about the calculation rules for image tokens, see Visual understanding.
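A minimal sketch of a visual-understanding call follows: the image URL is a placeholder, and the content-part layout follows the standard OpenAI-compatible vision format; the endpoint URL depends on your deployment mode.

```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",  # assumed endpoint
)

completion = client.chat.completions.create(
    model="qwen3-vl-plus",
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url",
             "image_url": {"url": "https://example.com/product.jpg"}},  # placeholder image
            {"type": "text",
             "text": "List the product attributes visible in this photo."},
        ],
    }],
)
print(completion.choices[0].message.content)
```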
Global
In Global deployment mode, endpoints and data storage are both located in the US (Virginia) region, and inference compute resources are dynamically scheduled globally.
Model | Version | Mode | Context window | Max input | Max chain-of-thought | Max output | Input cost | Output cost Chain-of-thought and output |
(Tokens) | (Million tokens) | |||||||
qwen3-vl-plus Currently provides the same capabilities as qwen3-vl-plus-2025-09-23 | Stable | Thinking | 262,144 | 258,048 Max of 16,384 tokens per image | 81,920 | 32,768 | Tiered pricing applies. For more information, see the description below the table. | |
Non-thinking | 260,096 Max of 16,384 tokens per image | - | ||||||
qwen3-vl-plus-2025-09-23 | Snapshot | Thinking | 258,048 Max of 16,384 tokens per image | 81,920 ||||
Non-thinking | 260,096 Max of 16,384 tokens per image | - ||||||
qwen3-vl-flash Currently provides the same capabilities as qwen3-vl-flash-2025-10-15 | Stable | Thinking | 258,048 Max of 16,384 tokens per image | 81,920 ||||
Non-thinking | 260,096 Max of 16,384 tokens per image | - ||||||
qwen3-vl-flash-2025-10-15 | Snapshot | Thinking | 258,048 Max of 16,384 tokens per image | 81,920 | ||||
Non-thinking | 260,096 Max of 16,384 tokens per image | - | ||||||
The preceding models use tiered billing based on the number of input tokens for each request. The input and output prices are the same for both `thinking` and `non-thinking` modes.
qwen3-vl-plus series
Input tokens per request | Input price (Million tokens) | Output price (Million tokens) |
0 < Tokens ≤ 32K | $0.2 | $1.6 |
32K < Tokens ≤ 128K | $0.3 | $2.4 |
128K < Tokens ≤ 256K | $0.6 | $4.8 |
qwen3-vl-flash series
Input tokens per request | Input price (Million tokens) | Output price (Million tokens) |
0 < Tokens ≤ 32K | $0.05 | $0.4 |
32K < Tokens ≤ 128K | $0.075 | $0.6 |
128K < Tokens ≤ 256K | $0.12 | $0.96 |
International
In International deployment mode, the endpoint and data storage are located in the Singapore region, and inference compute resources are dynamically scheduled globally (excluding Mainland China).
Model | Version | Mode | Context window | Max input | Max chain-of-thought | Max output | Input cost | Output cost Chain-of-thought and output | Free quota |
(Tokens) | (Million tokens) | ||||||||
qwen3-vl-plus Offers the same capabilities as qwen3-vl-plus-2025-09-23 | Stable | Thinking | 262,144 | 258,048 Max 16,384 per image | 81,920 | 32,768 | Tiered pricing, see the description below the table. | Each contains 1 million tokens. Validity: 90 days after you activate Model Studio | |
Non-thinking | 260,096 Max 16,384 per image | - | |||||||
qwen3-vl-plus-2025-12-19 | Snapshot | Thinking | 258,048 Max 16,384 per image | 81,920 | |||||
Non-thinking | 260,096 Max 16,384 per image | - | |||||||
qwen3-vl-plus-2025-09-23 | Snapshot | Thinking | 258,048 Max 16,384 per image | 81,920 | |||||
Non-thinking | 260,096 Max 16,384 per image | - | |||||||
qwen3-vl-flash Offers the same capabilities as qwen3-vl-flash-2025-10-15 | Stable | Thinking | 258,048 Max 16,384 per image | 81,920 | |||||
Non-thinking | 260,096 Max 16,384 per image | - | |||||||
qwen3-vl-flash-2025-10-15 | Snapshot | Thinking | 258,048 Max 16,384 per image | 81,920 | |||||
Non-thinking | 260,096 Max 16,384 per image | - | |||||||
The preceding models use tiered pricing based on the number of input tokens per request. The input and output prices are the same for both `thinking` and `non-thinking` modes.
qwen3-vl-plus series
Input tokens per request | Input price (Million tokens) | Output price (Million tokens) |
0 < Tokens ≤ 32K | $0.20 | $1.60 |
32K < Tokens ≤ 128K | $0.30 | $2.40 |
128K < Tokens ≤ 256K | $0.60 | $4.80 |
qwen3-vl-flash series
Input tokens per request | Input price (Million tokens) | Output price (Million tokens) |
0 < Tokens ≤ 32K | $0.05 | $0.4 |
32K < Tokens ≤ 128K | $0.075 | $0.6 |
128K < Tokens ≤ 256K | $0.12 | $0.96 |
US
In the US deployment mode, the endpoint and data storage are located in the US (Virginia) region, and inference compute resources are limited to the United States.
Model | Version | Mode | Context window | Max input | Max chain-of-thought | Max output | Input cost | Output cost Chain-of-thought and output |
(Tokens) | (Million tokens) | |||||||
qwen3-vl-flash-us Same capabilities as qwen3-vl-flash-2025-10-15-us | Stable | Thinking | 258,048 Max 16,384 per image | 81,920 | 32,768 | Tiered pricing, see the description below the table. | ||
Non-thinking | 260,096 Max 16,384 per image | - | ||||||
qwen3-vl-flash-2025-10-15-us | Snapshot | Thinking | 258,048 Max 16,384 per image | 81,920 | ||||
Non-thinking | 260,096 Max 16,384 per image | - | ||||||
The preceding models are billed on a tiered basis according to the number of input tokens in the request. The input and output prices are the same for both thinking mode and non-thinking mode. The qwen3-vl-flash-us model supports context cache.
Input tokens per request | Input price (Million tokens) | Output price (Million tokens) |
0 < Tokens ≤ 32K | $0.05 | $0.4 |
32K < Tokens ≤ 128K | $0.075 | $0.6 |
128K < Tokens ≤ 256K | $0.12 | $0.96 |
Mainland China
In Mainland China deployment mode, the endpoint and data storage are both located in the Beijing region, and inference compute resources are limited to Mainland China.
Model | Version | Mode | Context window | Max input | Max chain-of-thought | Max output | Input cost | Output cost | Free quota |
(Tokens) | (Million tokens) | ||||||||
qwen3-vl-plus Has the same capabilities as qwen3-vl-plus-2025-09-23. | Stable | Thinking | 262,144 | 258,048 A max of 16,384 per image | 81,920 | 32,768 | Tiered pricing applies. For details, see the description below this table. | No free quota | |
Non-thinking | 260,096 A max of 16,384 per image | - | |||||||
qwen3-vl-plus-2025-12-19 | Snapshot | Thinking | 258,048 A max of 16,384 per image | 81,920 | |||||
Non-thinking | 260,096 A max of 16,384 per image | - | |||||||
qwen3-vl-plus-2025-09-23 | Snapshot | Thinking | 258,048 A max of 16,384 per image | 81,920 | |||||
Non-thinking | 260,096 A max of 16,384 per image | - | |||||||
qwen3-vl-flash Has the same capabilities as qwen3-vl-flash-2025-10-15. | Stable | Thinking | 258,048 A max of 16,384 per image | 81,920 | |||||
Non-thinking | 260,096 A max of 16,384 per image | - | |||||||
qwen3-vl-flash-2025-10-15 | Snapshot | Thinking | 258,048 A max of 16,384 per image | 81,920 | |||||
Non-thinking | 260,096 A max of 16,384 per image | - | |||||||
The preceding models use tiered pricing based on the number of input tokens per request. The input and output prices are the same for both `thinking` and `non-thinking` modes.
qwen3-vl-plus series
Input tokens per request | Input price (Million tokens) | Output price (Million tokens) |
0 < tokens ≤ 32K | $0.143 | $1.434 |
32K < tokens ≤ 128K | $0.215 | $2.15 |
128K < tokens ≤ 256K | $0.43 | $4.301 |
qwen3-vl-flash series
Input tokens per request | Input price (Million tokens) | Output price (Million tokens) |
0 < Tokens ≤ 32K | $0.022 | $0.215 |
32K < Tokens ≤ 128K | $0.043 | $0.43 |
128K < Tokens ≤ 256K | $0.086 | $0.859 |
Qwen-OCR
The Qwen-OCR model specializes in text extraction. Compared to the Qwen-VL model, Qwen-OCR focuses more on extracting text from images of documents, tables, exam questions, and handwriting. It can recognize multiple languages, including English, French, Japanese, Korean, German, Russian, and Italian. Usage | API reference | Try it online
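The request shape mirrors the Qwen-VL example above; a minimal sketch (placeholder image URL, assumed endpoint):

```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",  # assumed endpoint
)

completion = client.chat.completions.create(
    model="qwen-vl-ocr",
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url",
             "image_url": {"url": "https://example.com/receipt.jpg"}},  # placeholder image
            {"type": "text", "text": "Extract all text from this image."},
        ],
    }],
)
print(completion.choices[0].message.content)
```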
Global
In the global deployment mode, both the endpoint and data storage are located in the US (Virginia) region, and inference compute resources are dynamically scheduled globally.
Model | Version | Context window | Max input | Max output | Input price | Output price |
(Tokens) | (Million tokens) | |||||
qwen-vl-ocr Provides the same capabilities as qwen-vl-ocr-2025-11-20. | Stable | 34,096 | 30,000 Max of 30,000 per image. | 4,096 | $0.07 | $0.16 |
qwen-vl-ocr-2025-11-20 Also known as qwen-vl-ocr-1120. This model is based on the Qwen3-VL architecture and provides significantly improved document parsing and text localization capabilities. | Snapshot | 38,192 | 8,192 | |||
International
In the International deployment mode, endpoints and data storage are located in the Singapore region, and inference compute resources are dynamically scheduled globally (excluding Mainland China).
Model | Version | Context window | Max input | Max output | Input price | Output price | Free quota |
(Tokens) | (Million tokens) | ||||||
qwen-vl-ocr | Stable | 34,096 | 30,000 Max of 30,000 per image. | 4,096 | $0.72 | $0.72 | 1 million input tokens and 1 million output tokens Valid for 90 days after you activate Model Studio. |
qwen-vl-ocr-2025-11-20 Also known as qwen-vl-ocr-1120. Based on the Qwen3-VL architecture, this model offers significantly improved document parsing and text localization. | Snapshot | 38,192 | 8,192 | $0.07 | $0.16 | ||
Mainland China
In Mainland China deployment mode, endpoints and data storage are located in the Beijing region, and inference compute resources are limited to Mainland China.
Model | Version | Context window | Max input | Max output | Input price | Output price | Free quota |
(Tokens) | (Million tokens) | ||||||
qwen-vl-ocr Equivalent to qwen-vl-ocr-2025-08-28. | Stable | 34,096 | 30,000 Max 30,000 per image. | 4,096 | $0.717 | $0.717 | No free quota |
qwen-vl-ocr-latest It always has the capabilities of the latest version. | Latest | 38,192 | 8,192 | $0.043 | $0.072 | ||
qwen-vl-ocr-2025-11-20 Also known as qwen-vl-ocr-1120. Based on the Qwen3-VL architecture, this version provides significantly improved document parsing and text localization capabilities. | Snapshot | ||||||
qwen-vl-ocr-2025-08-28 Also known as qwen-vl-ocr-0828. | 34,096 | 4,096 | $0.717 | $0.717 | |||
qwen-vl-ocr-2025-04-13 Also known as qwen-vl-ocr-0413. | |||||||
qwen-vl-ocr-2024-10-28 Also known as qwen-vl-ocr-1028. | |||||||
Qwen-Math
Qwen-Math is a language model that specializes in solving math problems. Usage | API reference | Try it online
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Model | Version | Context window | Max input | Max output | Input cost | Output cost |
(Tokens) | (Million tokens) | |||||
qwen-math-plus Equivalent to qwen-math-plus-2024-09-19 | Stable | 4,096 | 3,072 | 3,072 | $0.574 | $1.721 |
qwen-math-plus-latest Equivalent to the latest snapshot version | Latest | |||||
qwen-math-plus-2024-09-19 Also known as qwen-math-plus-0919 | Snapshot | |||||
qwen-math-plus-2024-08-16 Also known as qwen-math-plus-0816 | ||||||
qwen-math-turbo Equivalent to qwen-math-turbo-2024-09-19 | Stable | $0.287 | $0.861 | |||
qwen-math-turbo-latest Equivalent to the latest snapshot version | Latest | |||||
qwen-math-turbo-2024-09-19 Also known as qwen-math-turbo-0919 | Snapshot | |||||
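A minimal Qwen-Math call might look like the following sketch. The Beijing endpoint URL is an assumption based on the Mainland-China-only deployment note above; keep prompts short given the 4,096-token context window.

```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",  # assumed Beijing endpoint
)

completion = client.chat.completions.create(
    model="qwen-math-plus",
    messages=[{"role": "user", "content": "Solve 2x + 6 = 20 and show each step."}],
)
print(completion.choices[0].message.content)
```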
Qwen-Coder
Qwen-Coder is a code model. The latest Qwen3-Coder-Plus series, built on Qwen3, has powerful coding-agent capabilities: it excels at tool calling and environment interaction and can perform autonomous programming, combining excellent coding ability with strong general-purpose capabilities. Usage | API reference | Try it online
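Tool calling with the coder models uses the standard OpenAI-compatible function-calling format, as in this sketch; the run_tests tool is hypothetical, and the endpoint URL depends on your deployment mode.

```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",  # assumed endpoint
)

tools = [{
    "type": "function",
    "function": {
        "name": "run_tests",  # hypothetical tool exposed to the model
        "description": "Run the project's unit tests and return any failures.",
        "parameters": {"type": "object", "properties": {}, "required": []},
    },
}]

completion = client.chat.completions.create(
    model="qwen3-coder-plus",
    messages=[{"role": "user", "content": "Diagnose why the test suite is failing."}],
    tools=tools,
)
message = completion.choices[0].message
print(message.tool_calls or message.content)  # the model may choose to call run_tests
```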
Global
In the Global deployment mode, endpoints and data storage are located in the US (Virginia) region, while inference compute resources are dynamically scheduled globally.
Model | Version | Context window | Max input | Max output | Input cost | Output cost |
(Tokens) | (Million tokens) | |||||
qwen3-coder-plus Currently identical to qwen3-coder-plus-2025-09-23. | Stable | 1,000,000 | 997,952 | 65,536 | Tiered pricing applies. For more information, see the description below this table. | |
qwen3-coder-plus-2025-09-23 | Snapshot | |||||
qwen3-coder-plus-2025-07-22 | Snapshot | |||||
qwen3-coder-flash Currently identical to qwen3-coder-flash-2025-07-28. | Stable | |||||
qwen3-coder-flash-2025-07-28 | Snapshot | |||||
The preceding models use tiered pricing based on the number of input tokens per request.
qwen3-coder-plus series
The prices for qwen3-coder-plus, qwen3-coder-plus-2025-09-23, and qwen3-coder-plus-2025-07-22 are as follows. qwen3-coder-plus supports context cache. Input text that hits the implicit cache is billed at 20% of the price.
Input tokens per request | Input cost (Million tokens) | Output cost (Million tokens) |
0 < Tokens ≤ 32K | $1 | $5 |
32K < Tokens ≤ 128K | $1.8 | $9 |
128K < Tokens ≤ 256K | $3 | $15 |
256K < Tokens ≤ 1M | $6 | $60 |
qwen3-coder-flash series
The prices for qwen3-coder-flash and qwen3-coder-flash-2025-07-28 are as follows. qwen3-coder-flash supports context cache. Input text that hits the cache is billed at 20% of the price.
Input tokens per request | Input cost (Million tokens) | Output cost (Million tokens) |
0 < Tokens ≤ 32K | $0.3 | $1.5 |
32K < Tokens ≤ 128K | $0.5 | $2.5 |
128K < Tokens ≤ 256K | $0.8 | $4 |
256K < Tokens ≤ 1M | $1.6 | $9.6 |
International
In the International deployment mode, endpoints and data storage are located in the Singapore region, and inference compute resources are dynamically scheduled globally (excluding Mainland China).
Model | Version | Context window | Max input | Max output | Input cost | Output cost | Free quota |
(Tokens) | (Million tokens) | ||||||
qwen3-coder-plus Same capabilities as qwen3-coder-plus-2025-09-23 | Stable | 1,000,000 | 997,952 | 65,536 | Tiered pricing, see the description below the table. | 1 million tokens each Validity: 90 days after you activate Model Studio | |
qwen3-coder-plus-2025-09-23 | Snapshot | ||||||
qwen3-coder-plus-2025-07-22 | Snapshot | ||||||
qwen3-coder-flash Same capabilities as qwen3-coder-flash-2025-07-28 | Stable | ||||||
qwen3-coder-flash-2025-07-28 | Snapshot | ||||||
The preceding models use tiered pricing based on the number of input tokens per request.
qwen3-coder-plus series
The prices for qwen3-coder-plus, qwen3-coder-plus-2025-09-23, and qwen3-coder-plus-2025-07-22 are as follows. qwen3-coder-plus supports context cache. Input text that hits the implicit cache is billed at 20% of the price, and input text that hits the explicit cache is billed at 10% of the price.
Input tokens per request | Input cost (Million tokens) | Output cost (Million tokens) |
0 < tokens ≤ 32K | $1 | $5 |
32K < tokens ≤ 128K | $1.8 | $9 |
128K < tokens ≤ 256K | $3 | $15 |
256K < tokens ≤ 1M | $6 | $60 |
qwen3-coder-flash series
The prices for qwen3-coder-flash and qwen3-coder-flash-2025-07-28 are as follows. qwen3-coder-flash supports context cache. Input text that hits the implicit cache is billed at 20% of the price, and input text that hits the explicit cache is billed at 10% of the price.
Input tokens per request | Input cost (Million tokens) | Output cost (Million tokens) |
0 < Tokens ≤ 32K | $0.30 | $1.50 |
32K < Tokens ≤ 128K | $0.50 | $2.50 |
128K < Tokens ≤ 256K | $0.80 | $4.00 |
256K < Tokens ≤ 1M | $1.6 | $9.6 |
Mainland China
In Mainland China deployment mode, endpoints and data storage are both located in the Beijing region, and inference compute resources are limited to Mainland China.
Model | Version | Context window | Max input | Max output | Input cost | Output cost |
(Tokens) | (Million Tokens) | |||||
qwen3-coder-plus This is functionally equivalent to qwen3-coder-plus-2025-09-23. | Stable | 1,000,000 | 997,952 | 65,536 | Tiered pricing applies. See the notes below the table. | |
qwen3-coder-plus-2025-09-23 | Snapshot | |||||
qwen3-coder-plus-2025-07-22 | Snapshot | |||||
qwen3-coder-flash The current capabilities are the same as those of qwen3-coder-flash-2025-07-28. | Stable | |||||
qwen3-coder-flash-2025-07-28 | Snapshot | |||||
The preceding models use tiered pricing based on the number of input tokens per request.
qwen3-coder-plus series
The prices for qwen3-coder-plus, qwen3-coder-plus-2025-09-23, and qwen3-coder-plus-2025-07-22 are as follows. qwen3-coder-plus supports context cache. Input text that hits the implicit cache is billed at 20% of the price, and input text that hits the explicit cache is billed at 10% of the price.
Input tokens per request | Input cost (Million tokens) | Output cost (Million tokens) |
0 < Tokens ≤ 32K | $0.574 | $2.294 |
32K < Tokens ≤ 128K | $0.861 | $3.441 |
128K < Tokens ≤ 256K | $1.434 | $5.735 |
256K < Tokens ≤ 1M | $2.868 | $28.671 |
qwen3-coder-flash series
The prices for qwen3-coder-flash and qwen3-coder-flash-2025-07-28 are as follows. qwen3-coder-flash supports context cache. Input text that hits the implicit cache is billed at 20% of the price, and input text that hits the explicit cache is billed at 10% of the price.
Input tokens per request | Input cost (Million tokens) | Output cost (Million tokens) |
0 < Tokens ≤ 32K | $0.144 | $0.574 |
32K < Tokens ≤ 128K | $0.216 | $0.861 |
128K < Tokens ≤ 256K | $0.359 | $1.434 |
256K < Tokens ≤ 1M | $0.717 | $3.584 |
Qwen-MT
Qwen-MT is a flagship translation model, fully upgraded based on Qwen3. It supports translation among 92 languages, including Chinese, English, Japanese, Korean, French, Spanish, German, Thai, Indonesian, Vietnamese, and Arabic. Performance and translation quality are comprehensively improved, with more stable term customization, format preservation, and domain-prompt capabilities for more accurate, natural translations. Usage
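A minimal translation sketch follows: the translation_options extra-body field and its source_lang/target_lang keys follow the usage guide's conventions but are assumptions here.

```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",  # assumed endpoint
)

completion = client.chat.completions.create(
    model="qwen-mt-flash",
    # Chinese source text: "I didn't laugh after watching this video"
    messages=[{"role": "user", "content": "我看到这个视频后没有笑"}],
    extra_body={"translation_options": {  # assumed field names
        "source_lang": "auto",
        "target_lang": "English",
    }},
)
print(completion.choices[0].message.content)
```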
Global
In the global deployment mode, the endpoint and data storage are both located in the US (Virginia) region, while inference compute resources are dynamically scheduled globally.
Model | Context window | Max input | Max output | Input cost | Output cost |
(Tokens) | (Million tokens) | ||||
qwen-mt-plus Part of the Qwen3-MT series | 16,384 | 8,192 | 8,192 | $2.46 | $7.37 |
qwen-mt-flash Part of the Qwen3-MT series | $0.16 | $0.49 |||
qwen-mt-lite Part of the Qwen3-MT series | $0.12 | $0.36 |||
International
In the International deployment mode, the endpoint and data storage are both located in the Singapore region, and inference compute resources are dynamically scheduled globally (excluding Mainland China).
Model | Context window | Max input | Max output | Input cost | Output cost | Free quota |
(Tokens) | (Million tokens) | |||||
qwen-mt-plus Part of the Qwen3-MT series | 16,384 | 8,192 | 8,192 | $2.46 | $7.37 | 1 million tokens Valid for 90 days after you activate Model Studio. |
qwen-mt-flash Part of the Qwen3-MT series | $0.16 | $0.49 ||||
qwen-mt-lite Part of the Qwen3-MT series | $0.12 | $0.36 ||||
qwen-mt-turbo Part of the Qwen3-MT series | $0.16 | $0.49 ||||
Mainland China
In Mainland China deployment mode, endpoints and data storage are both located in the Beijing region, and inference compute resources are limited to Mainland China.
Model | Context window | Max input | Max output | Input cost | Output cost |
(tokens) | (Million tokens) | ||||
qwen-mt-plus Part of the Qwen3-MT series | 16,384 | 8,192 | 8,192 | $0.259 | $0.775 |
qwen-mt-flash Part of the Qwen3-MT series | $0.101 | $0.280 |||
qwen-mt-lite Part of the Qwen3-MT series | $0.086 | $0.229 |||
qwen-mt-turbo Part of the Qwen3-MT series | $0.101 | $0.280 |||
Qwen data mining model
The Qwen data mining model can extract structured information from documents and can be used in fields such as data annotation and content moderation. Usage | API reference
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Model | Context window | Max input | Max output | Input cost | Output cost | Free quota |
(Tokens) | (Million tokens) | |||||
qwen-doc-turbo | 262,144 | 253,952 | 32,768 | $0.087 | $0.144 | No free quota |
Qwen deep research model
The Qwen deep research model can break down complex problems, conduct reasoning and analysis in combination with internet searches, and generate research reports. Usage | API reference
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Model | Context window | Max input | Max output | Input cost | Output cost |
(Tokens) | (1,000 tokens) | ||||
qwen-deep-research | 1,000,000 | 997,952 | 32,768 | $0.007742 | $0.023367 |
Text generation - Qwen - Open source
In the model names, xxb indicates the parameter size. For example, qwen2-72b-instruct indicates a parameter size of 72 billion (72B).
Model Studio supports invoking the open-source versions of Qwen. You do not need to deploy the models locally. For open-source versions, we recommend using the Qwen3 and Qwen2.5 models.
Qwen3
The qwen3-next-80b-a3b-thinking model, released in September 2025, supports only `thinking mode`. It offers improved instruction-following capabilities and more concise summaries compared to qwen3-235b-a22b-thinking-2507.
The qwen3-next-80b-a3b-instruct model, released in September 2025, supports only `non-thinking mode`. It provides enhanced Chinese comprehension, logical reasoning, and text generation capabilities compared to qwen3-235b-a22b-instruct-2507.
The qwen3-235b-a22b-thinking-2507 and qwen3-30b-a3b-thinking-2507 models, released in July 2025, support only `thinking mode`. They are upgraded versions of qwen3-235b-a22b (`thinking mode`) and qwen3-30b-a3b (`thinking mode`).
The qwen3-235b-a22b-instruct-2507 and qwen3-30b-a3b-instruct-2507 models, released in July 2025, support only `non-thinking mode`. They are upgraded versions of qwen3-235b-a22b (`non-thinking mode`) and qwen3-30b-a3b (`non-thinking mode`).
The Qwen3 models, released in April 2025, support both `thinking` and `non-thinking` modes. You can use the enable_thinking parameter to switch between the modes. The Qwen3 models also feature significant capability enhancements:
Reasoning capabilities: In evaluations for math, code, and logical reasoning, these models significantly outperform QwQ and non-reasoning models of the same size, reaching top-tier industry performance for their scale.
Human preference alignment: These models show greatly improved capabilities in creative writing, role-playing, multi-turn conversation, and instruction following. Their general capabilities significantly surpass those of other models of the same size.
Agent capabilities: These models reach industry-leading levels in both `thinking` and `non-thinking` modes, enabling precise external tool calling.
Multilingual capabilities: These models support over 100 languages and dialects, with significant improvements in multilingual translation, instruction understanding, and common-sense reasoning.
Response format fixes: These models have fixed response format issues from previous versions, such as abnormal Markdown, mid-response truncation, and incorrect boxed output.
The open source Qwen3 models released in April 2025 do not support non-streaming output in `thinking mode`.
If an open source Qwen3 model is in `thinking mode` but does not output a chain-of-thought, it is billed at the `non-thinking mode` rate.
Thinking mode | Non-thinking mode | Usage
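A minimal sketch of switching modes with the enable_thinking parameter follows. It assumes the OpenAI-compatible endpoint and the reasoning_content streaming field that Model Studio documents for Qwen3; verify both against the Usage page. Streaming is used because, as noted above, the April 2025 open source Qwen3 models do not support non-streaming output in thinking mode.

```python
import os
from openai import OpenAI

# Minimal sketch: toggling an April 2025 open source Qwen3 model between
# thinking and non-thinking mode. The endpoint URL and the reasoning_content
# field are assumptions to verify against the Usage page linked above.
client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

stream = client.chat.completions.create(
    model="qwen3-32b",
    messages=[{"role": "user", "content": "Is 9.11 larger than 9.9?"}],
    extra_body={"enable_thinking": True},  # set False for non-thinking mode
    stream=True,  # thinking mode does not support non-streaming output
)
for chunk in stream:
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta
    # The chain-of-thought arrives in reasoning_content; the answer in content.
    print(getattr(delta, "reasoning_content", None) or delta.content or "", end="")
```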
Global
In the Global deployment mode, endpoints and data storage are located in the US (Virginia) region, and inference compute resources are dynamically scheduled globally.
Model | Mode | Context window | Max input | Max chain-of-thought | Max response | Input cost | Output cost | Free quota |
(Tokens) | (Million tokens) | |||||||
qwen3-next-80b-a3b-thinking | Thinking only | 131,072 | 126,976 | 81,920 | 32,768 | $0.15 | $1.2 | No free quota |
qwen3-next-80b-a3b-instruct | Non-thinking only | 129,024 | - | |||||
qwen3-235b-a22b-thinking-2507 | Thinking only | 126,976 | 81,920 | $0.23 | $2.30 | |||
qwen3-235b-a22b-instruct-2507 | Non-thinking only | 129,024 | - | $0.92 | ||||
qwen3-30b-a3b-thinking-2507 | Thinking only | 126,976 | 81,920 | $0.20 | $2.40 | |||
qwen3-30b-a3b-instruct-2507 | Non-thinking only | 129,024 | - | $0.80 | ||||
qwen3-235b-a22b | Non-thinking | 129,024 | - | 16,384 | $0.70 | $2.8 | ||
Thinking | 98,304 | 38,912 | $8.40 | |||||
qwen3-32b | Non-thinking | 129,024 | - | $0.16 | $0.64 | |||
Thinking | 98,304 | 38,912 | ||||||
qwen3-30b-a3b | Non-thinking | 129,024 | - | $0.20 | $0.80 | |||
Thinking | 98,304 | 38,912 | $2.40 | |||||
qwen3-14b | Non-thinking | 129,024 | - | 8,192 | $0.35 | $1.40 | ||
Thinking | 98,304 | 38,912 | $4.2 | |||||
qwen3-8b | Non-thinking | 129,024 | - | $0.18 | $0.70 | |||
Thinking | 98,304 | 38,912 | $2.1 | |||||
International
In the International deployment mode, endpoints and data storage are located in the Singapore region, while inference compute resources are dynamically scheduled globally (excluding Mainland China).
Model | Mode | Context window | Max input | Max chain-of-thought | Max response | Input cost | Output cost | Free quota |
(Tokens) | (Million tokens) | |||||||
qwen3-next-80b-a3b-thinking | Thinking only | 131,072 | 126,976 | 81,920 | 32,768 | $0.15 | $1.2 | 1 million tokens for each model Valid for 90 days after you activate Model Studio |
qwen3-next-80b-a3b-instruct | Non-thinking only | 129,024 | - | |||||
qwen3-235b-a22b-thinking-2507 | Thinking only | 126,976 | 81,920 | $0.23 | $2.3 | |||
qwen3-235b-a22b-instruct-2507 | Non-thinking only | 129,024 | - | $0.92 | ||||
qwen3-30b-a3b-thinking-2507 | Thinking only | 126,976 | 81,920 | $0.2 | $2.4 | |||
qwen3-30b-a3b-instruct-2507 | Non-thinking only | 129,024 | - | $0.8 | ||||
qwen3-235b-a22b This model and the following models were released in April 2025. | Non-thinking | 129,024 | - | 16,384 | $0.7 | $2.8 | ||
Thinking | 98,304 | 38,912 | $8.4 | |||||
qwen3-32b | Non-thinking | 129,024 | - | $0.16 | $0.64 | |||
Thinking | 98,304 | 38,912 | ||||||
qwen3-30b-a3b | Non-thinking | 129,024 | - | $0.2 | $0.8 | |||
Thinking | 98,304 | 38,912 | $2.4 | |||||
qwen3-14b | Non-thinking | 129,024 | - | 8,192 | $0.35 | $1.4 | ||
Thinking | 98,304 | 38,912 | $4.2 | |||||
qwen3-8b | Non-thinking | 129,024 | - | $0.18 | $0.7 | |||
Thinking | 98,304 | 38,912 | $2.1 | |||||
qwen3-4b | Non-thinking | 129,024 | - | $0.11 | $0.42 | |||
Thinking | 98,304 | 38,912 | $1.26 | |||||
qwen3-1.7b | Non-thinking | 32,768 | 30,720 | - | $0.42 | |||
Thinking | 28,672 | Combined chain-of-thought and response cannot exceed 30,720 tokens. | $1.26 |
qwen3-0.6b | Non-thinking | 30,720 | - | $0.42 | ||||
Thinking | 28,672 | Combined chain-of-thought and response cannot exceed 30,720 tokens. | $1.26 |
Mainland China
In Mainland China deployment mode, the endpoint and data storage are both located in the Beijing region, and inference compute resources are limited to Mainland China.
Model | Mode | Context window | Max input | Max chain-of-thought | Max response | Input cost | Output cost | Free quota |
(Tokens) | (Million tokens) | |||||||
qwen3-next-80b-a3b-thinking | Thinking only | 131,072 | 126,976 | 81,920 | 32,768 | $0.144 | $1.434 | No free quota |
qwen3-next-80b-a3b-instruct | Non-thinking only | 129,024 | - | $0.574 | ||||
qwen3-235b-a22b-thinking-2507 | Thinking only | 126,976 | 81,920 | $0.287 | $2.868 | |||
qwen3-235b-a22b-instruct-2507 | Non-thinking only | 129,024 | - | $1.147 | ||||
qwen3-30b-a3b-thinking-2507 | Thinking only | 126,976 | 81,920 | $0.108 | $1.076 | |||
qwen3-30b-a3b-instruct-2507 | Non-thinking only | 129,024 | - | $0.431 | ||||
qwen3-235b-a22b | Non-thinking | 129,024 | - | 16,384 | $0.287 | $1.147 | ||
Thinking | 98,304 | 38,912 | $2.868 | |||||
qwen3-32b | Non-thinking | 129,024 | - | $0.287 | $1.147 | |||
Thinking | 98,304 | 38,912 | $2.868 | |||||
qwen3-30b-a3b | Non-thinking | 129,024 | - | $0.108 | $0.431 | |||
Thinking | 98,304 | 38,912 | $1.076 | |||||
qwen3-14b | Non-thinking | 129,024 | - | 8,192 | $0.144 | $0.574 | ||
Thinking | 98,304 | 38,912 | $1.434 | |||||
qwen3-8b | Non-thinking | 129,024 | - | $0.072 | $0.287 | |||
Thinking | 98,304 | 38,912 | $0.717 | |||||
qwen3-4b | Non-thinking | 129,024 | - | $0.044 | $0.173 | |||
Thinking | 98,304 | 38,912 | $0.431 | |||||
qwen3-1.7b | Non-thinking | 32,768 | 30,720 | - | $0.173 | |||
Thinking | 28,672 | Combined chain-of-thought and response cannot exceed 30,720 tokens. | $0.431 |
qwen3-0.6b | Non-thinking | 30,720 | - | $0.173 | ||||
Thinking | 28,672 | Combined chain-of-thought and response cannot exceed 30,720 tokens. | $0.431 |
QwQ - Open source
The QwQ reasoning model is trained on Qwen2.5-32B. Reinforcement learning has significantly improved its reasoning capabilities. Core metrics for math and code (AIME 24/25, LiveCodeBench) and some general metrics (IFEval, LiveBench) are comparable to those of the full-power version of DeepSeek-R1. All metrics significantly exceed those of DeepSeek-R1-Distill-Qwen-32B, which is also based on Qwen2.5-32B. Usage | API reference
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Model | Context window | Max input | Max chain-of-thought | Max response | Input cost | Output cost |
(Tokens) | (Million tokens) | |||||
qwq-32b | 131,072 | 98,304 | 32,768 | 8,192 | $0.287 | $0.861 |
QwQ-Preview
The qwq-32b-preview model is an experimental research model developed by the Qwen team in 2024. It focuses on enhancing AI reasoning capabilities, especially in math and programming. For more information about the limitations of the qwq-32b-preview model, see the QwQ official blog. Usage | API reference | Try it online
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Model | Context window | Max input | Max output | Input cost | Output cost |
(Tokens) | (Million tokens) | ||||
qwq-32b-preview | 32,768 | 30,720 | 16,384 | $0.287 | $0.861 |
Qwen2.5
QVQ
The qvq-72b-preview model is an experimental research model developed by the Qwen team. It focuses on enhancing visual reasoning capabilities, especially in mathematical reasoning. For more information about the limitations of the qvq-72b-preview model, see the QVQ official blog. Usage | API reference
To have the model output its thinking process before the final answer, you can use the commercial version of the QVQ model.
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Model | Context window | Max input | Max output | Input cost | Output cost |
(Tokens) | (Million tokens) | ||||
qvq-72b-preview | 32,768 | 16,384 Max 16,384 tokens per image | 16,384 | $1.721 | $5.161 |
Qwen-Omni
This is a new multimodal large model for understanding and generation, trained on Qwen2.5. It supports text, image, speech, and video inputs, and can generate text and speech simultaneously in a stream. Its multimodal content understanding speed is significantly improved. Usage | API reference
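A minimal sketch of a streaming call follows. It assumes the OpenAI-compatible endpoint and the modalities/audio parameters that Model Studio documents for Qwen-Omni; treat the voice name, audio format, and endpoint as assumptions to verify against the Usage page.

```python
import os
from openai import OpenAI

# Minimal sketch of a Qwen-Omni call that streams text and speech together.
# The International endpoint, voice name, and audio format are assumptions
# to verify against the Usage page linked above.
client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

stream = client.chat.completions.create(
    model="qwen2.5-omni-7b",
    messages=[{"role": "user", "content": "Introduce yourself in one sentence."}],
    modalities=["text", "audio"],          # request simultaneous text + speech
    audio={"voice": "Chelsie", "format": "wav"},
    stream=True,                           # Qwen-Omni outputs are stream-only
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```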
International
In International deployment mode, the access point and data storage are located in the Singapore region, and inference computing resources are dynamically scheduled globally (excluding Mainland China).
Model | Context window | Max input | Max output | Free quota |
(Tokens) | ||||
qwen2.5-omni-7b | 32,768 | 30,720 | 2,048 | 1 million tokens (regardless of modality) Valid for 90 days after activating Model Studio. |
After the free quota is used up, the following billing rules apply to inputs and outputs:
Mainland China
In Mainland China deployment mode, the access point and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Model | Context window | Max input | Max output |
(Tokens) | |||
qwen2.5-omni-7b | 32,768 | 30,720 | 2,048 |
The billing rules for inputs and outputs are as follows:
Qwen3-Omni-Captioner
Qwen3-Omni-Captioner is an open-source model based on Qwen3-Omni. Without any prompts, it automatically generates accurate and comprehensive descriptions for complex audio, such as speech, ambient sounds, music, and sound effects. It can identify speaker emotions, musical elements (such as style and instruments), and sensitive information, making it suitable for applications such as audio content analysis, security audits, intent recognition, and audio editing. Usage | API reference
International
In the International deployment mode, the endpoint and data storage are located in the Singapore region, and inference compute resources are dynamically scheduled globally (excluding Mainland China).
Model | Context window | Max input | Max output | Input cost | Output cost | Free quota |
(Tokens) | (Million tokens) | |||||
qwen3-omni-30b-a3b-captioner | 65,536 | 32,768 | 32,768 | $3.81 | $3.06 | 1 million tokens Valid for 90 days after you activate Alibaba Cloud Model Studio |
Mainland China
In Mainland China deployment mode, the endpoint and data storage are both located in the Beijing region, and inference compute resources are limited to Mainland China.
Model | Context window | Max input | Max output | Input cost | Output cost | Free quota |
(Tokens) | (Million tokens) |
qwen3-omni-30b-a3b-captioner | 65,536 | 32,768 | 32,768 | $2.265 | $1.821 | No free quota |
Qwen-VL
This is the open-source version of Alibaba Cloud's Qwen-VL. Usage | API reference
The Qwen3-VL model offers significant improvements over Qwen2.5-VL:
Agent interaction: This model operates computer and mobile interfaces by recognizing graphical user interface (GUI) elements, understanding their functions, and calling tools to perform tasks. It achieves top-tier performance in evaluations such as OSWorld.
Visual coding: This model generates code from images or videos. For example, it can create HTML, CSS, and JavaScript code from design files or website screenshots.
Spatial intelligence: This model supports 2D and 3D positioning. It accurately determines object orientation, perspective changes, and occlusion relationships.
Long video understanding: This model understands video content up to 20 minutes long. It can pinpoint specific moments with second-level precision.
Deep thinking: This model excels at capturing details and analyzing cause and effect. It achieves top-tier performance in evaluations such as MathVista and MMMU.
OCR: This model supports 33 languages. It performs more stably in scenarios with complex lighting, blur, or tilt. The model significantly improves recognition accuracy for rare characters, ancient scripts, and technical terms.
Global
In the global deployment mode, the endpoint and data storage are located in the US (Virginia) region, and inference compute resources are dynamically scheduled globally.
Model | Mode | Context window | Max input | Max chain-of-thought | Max response | Input cost | Output cost (chain-of-thought + response) |
(Tokens) | (Million tokens) | ||||||
qwen3-vl-235b-a22b-thinking | Thinking only | 126,976 | 81,920 | $0.4 | $4 |
qwen3-vl-235b-a22b-instruct | Non-thinking only | 129,024 | - | $1.6 |
qwen3-vl-32b-thinking | Thinking only | 131,072 | 126,976 | 81,920 | 32,768 | $0.16 | $0.64 |
qwen3-vl-32b-instruct | Non-thinking only | 129,024 | - |
qwen3-vl-30b-a3b-thinking | Thinking only | 126,976 | 81,920 | $0.2 | $2.4 |
qwen3-vl-30b-a3b-instruct | Non-thinking only | 129,024 | - | $0.8 |
qwen3-vl-8b-thinking | Thinking only | 126,976 | 81,920 | $0.18 | $2.1 |
qwen3-vl-8b-instruct | Non-thinking only | 129,024 | - | $0.7 |
International
In the International deployment mode, endpoints and data storage are located in the Singapore region, and inference compute resources are dynamically scheduled globally (excluding Mainland China).
Model | Mode | Context window | Max input | Max chain-of-thought | Max response | Input cost | Output cost (chain-of-thought + response) | Free quota |
(Tokens) | (Million tokens) | |||||||
qwen3-vl-235b-a22b-thinking | Thinking only | 126,976 | 81,920 | $0.4 | $4 | 1 million tokens for each model Valid for 90 days after you activate Model Studio |
qwen3-vl-235b-a22b-instruct | Non-thinking only | 129,024 | - | $1.6 |
qwen3-vl-32b-thinking | Thinking only | 131,072 | 126,976 | 81,920 | 32,768 | $0.16 | $0.64 |
qwen3-vl-32b-instruct | Non-thinking only | 129,024 | - |
qwen3-vl-30b-a3b-thinking | Thinking only | 126,976 | 81,920 | $0.2 | $2.4 |
qwen3-vl-30b-a3b-instruct | Non-thinking only | 129,024 | - | $0.8 |
qwen3-vl-8b-thinking | Thinking only | 126,976 | 81,920 | $0.18 | $2.1 |
qwen3-vl-8b-instruct | Non-thinking only | 129,024 | - | $0.7 |
Mainland China
In Mainland China deployment mode, the endpoint and data storage are both located in the Beijing region, and inference compute resources are restricted to Mainland China.
Model | Mode | Context window | Max input | Max chain-of-thought | Max response | Input cost | Output cost (chain-of-thought + response) | Free quota |
(Tokens) | (Million tokens) | |||||||
qwen3-vl-235b-a22b-thinking | Thinking only | 131,072 | 126,976 | 81,920 | $0.287 | $2.867 | No free quota |
qwen3-vl-235b-a22b-instruct | Non-thinking only | 129,024 | - | $1.147 |
qwen3-vl-32b-thinking | Thinking only | 131,072 | 126,976 | 81,920 | 32,768 | $0.287 | $2.868 |
qwen3-vl-32b-instruct | Non-thinking only | 129,024 | - | $1.147 |
qwen3-vl-30b-a3b-thinking | Thinking only | 126,976 | 81,920 | $0.108 | $1.076 |
qwen3-vl-30b-a3b-instruct | Non-thinking only | 129,024 | - | $0.431 |
qwen3-vl-8b-thinking | Thinking only | 126,976 | 81,920 | $0.072 | $0.717 |
qwen3-vl-8b-instruct | Non-thinking only | 129,024 | - | $0.287 |
Qwen-Math
Qwen-Math is a language model built on Qwen and specialized in solving mathematical problems. Qwen2.5-Math supports Chinese and English and integrates multiple reasoning methods, such as Chain of Thought (CoT), Program of Thought (PoT), and Tool-Integrated Reasoning (TIR). Usage | API reference | Try it online
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Model | Context window | Max input | Max output | Input cost | Output cost |
(Tokens) | (Million tokens) | ||||
qwen2.5-math-72b-instruct | 4,096 | 3,072 | 3,072 | $0.574 | $1.721 |
qwen2.5-math-7b-instruct | $0.144 | $0.287 | |||
qwen2.5-math-1.5b-instruct | Free for a limited time | ||||
Qwen-Coder
Qwen-Coder is an open-source code model from the Qwen series. The latest Qwen3-Coder series has powerful coding agent capabilities. It excels at tool calling, environment interaction, and autonomous programming. The model combines excellent coding skills with general-purpose capabilities. Usage | API reference
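Because the Qwen3-Coder series is built around tool calling, a request typically passes a tools array in the standard OpenAI function-calling format, as in the sketch below. The run_tests tool is hypothetical, and the endpoint URL is an assumption to verify against the Usage page linked above.

```python
import os
from openai import OpenAI

# Minimal tool-calling sketch for the Qwen3-Coder series via the
# OpenAI-compatible endpoint. The run_tests tool is hypothetical, and the
# endpoint URL is an assumption to verify against the Usage page.
client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

tools = [{
    "type": "function",
    "function": {
        "name": "run_tests",  # hypothetical tool, for illustration only
        "description": "Run the project's test suite and return the results.",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string", "description": "test directory"}},
            "required": ["path"],
        },
    },
}]

resp = client.chat.completions.create(
    model="qwen3-coder-480b-a35b-instruct",
    messages=[{"role": "user", "content": "Run the tests under ./tests and summarize any failures."}],
    tools=tools,
)
# The model is expected to respond with a tool call rather than plain text.
print(resp.choices[0].message.tool_calls)
```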
Global
In Global Deployment Mode, endpoints and data storage are both located in the US (Virginia) region, and inference compute resources are dynamically scheduled globally.
Model | Context window | Max input | Max output | Input cost | Output cost |
(Tokens) | (Million tokens) | ||||
qwen3-coder-480b-a35b-instruct | 262,144 | 204,800 | 65,536 | Tiered pricing applies. For more information, see the description below this table. | |
qwen3-coder-30b-a3b-instruct | |||||
qwen3-coder-480b-a35b-instruct and qwen3-coder-30b-a3b-instruct use tiered pricing based on the number of input tokens per request.
Model | Input tokens per request | Input cost (Million tokens) | Output cost (Million tokens) |
qwen3-coder-480b-a35b-instruct | 0 < tokens ≤ 32K | $1.5 | $7.5 |
32K < tokens ≤ 128K | $2.7 | $13.5 |
128K < tokens ≤ 200K | $4.5 | $22.5 |
qwen3-coder-30b-a3b-instruct | 0 < tokens ≤ 32K | $0.45 | $2.25 |
32K < tokens ≤ 128K | $0.75 | $3.75 |
128K < tokens ≤ 200K | $1.2 | $6 |
International
In the International deployment mode, the endpoint and data storage are both located in the Singapore region, and inference compute resources are dynamically scheduled globally (excluding Mainland China).
Model | Context window | Max input | Max output | Input cost | Output cost | Free quota |
(Tokens) | ||||||
qwen3-coder-480b-a35b-instruct | 262,144 | 204,800 | 65,536 | Tiered pricing applies. See the description below the table. | 1 million tokens each Validity: 90 days after you activate Model Studio |
qwen3-coder-30b-a3b-instruct | ||||||
qwen3-coder-480b-a35b-instruct and qwen3-coder-30b-a3b-instruct use tiered pricing based on the number of input tokens per request.
Model | Input tokens per request | Input cost (Million tokens) | Output cost (Million tokens) |
qwen3-coder-480b-a35b-instruct | 0 < tokens ≤ 32K | $1.5 | $7.5 |
32K < tokens ≤ 128K | $2.7 | $13.5 | |
128K < tokens ≤ 200K | $4.5 | $22.5 | |
qwen3-coder-30b-a3b-instruct | 0 < tokens ≤ 32K | $0.45 | $2.25 |
32K < tokens ≤ 128K | $0.75 | $3.75 | |
128K < tokens ≤ 200K | $1.2 | $6 |
Mainland China
In Mainland China deployment mode, endpoints and data storage are both located in the Beijing region, and inference compute resources are limited to Mainland China.
Model | Context window | Max input | Max output | Input cost | Output cost |
(Tokens) | (Million tokens) |
qwen3-coder-480b-a35b-instruct | 262,144 | 204,800 | 65,536 | Tiered pricing applies. See the notes below the table. | |
qwen3-coder-30b-a3b-instruct | |||||
qwen2.5-coder-32b-instruct | 131,072 | 129,024 | 8,192 | $0.287 | $0.861 |
qwen2.5-coder-14b-instruct | |||||
qwen2.5-coder-7b-instruct | $0.144 | $0.287 | |||
qwen2.5-coder-3b-instruct | 32,768 | 30,720 | Limited-time free trial | ||
qwen2.5-coder-1.5b-instruct | |||||
qwen2.5-coder-0.5b-instruct | |||||
qwen3-coder-480b-a35b-instruct and qwen3-coder-30b-a3b-instruct use tiered pricing based on the number of input tokens per request.
Model | Input tokens per request | Input cost (Million tokens) | Output cost (Million tokens) |
qwen3-coder-480b-a35b-instruct | 0 < tokens ≤ 32K | $0.861 | $3.441 |
32K < tokens ≤ 128K | $1.291 | $5.161 |
128K < tokens ≤ 200K | $2.151 | $8.602 |
qwen3-coder-30b-a3b-instruct | 0 < tokens ≤ 32K | $0.216 | $0.861 |
32K < tokens ≤ 128K | $0.323 | $1.291 |
128K < tokens ≤ 200K | $0.538 | $2.151 |
Text generation - Third-party
DeepSeek
DeepSeek is a large language model from DeepSeek AI. API reference | Try it online
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Model | Context window | Max input | Max chain-of-thought | Max response | Input cost | Output cost |
(Tokens) | (Million tokens) | |||||
deepseek-v3.2 685B full-power version | 131,072 | 98,304 | 32,768 | 65,536 | $0.287 | $0.431 |
deepseek-v3.2-exp 685B full-power version | ||||||
deepseek-v3.1 685B full-power version | $0.574 | $1.721 | ||||
deepseek-r1 685B full-power version | 16,384 | $2.294 | ||||
deepseek-r1-0528 685B full-power version | ||||||
deepseek-v3 671B full-power version | 131,072 | N/A | $0.287 | $1.147 | ||
deepseek-r1-distill-qwen-1.5b Based on Qwen2.5-Math-1.5B | 32,768 | 32,768 | 16,384 | 16,384 | Free trial for a limited time | |
deepseek-r1-distill-qwen-7b Based on Qwen2.5-Math-7B | $0.072 | $0.144 | ||||
deepseek-r1-distill-qwen-14b Based on Qwen2.5-14B | $0.144 | $0.431 | ||||
deepseek-r1-distill-qwen-32b Based on Qwen2.5-32B | $0.287 | $0.861 | ||||
deepseek-r1-distill-llama-8b Based on Llama-3.1-8B | Free trial for a limited time | |||||
deepseek-r1-distill-llama-70b Based on Llama-3.3-70B | ||||||
Kimi
Kimi-K2 is a large language model launched by Moonshot AI. It has excellent coding and tool-calling capabilities. Usage | Try it online
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Model | Context window | Max input | Max chain-of-thought | Max response | Input cost | Output cost |
(Tokens) | (Million tokens) | |||||
kimi-k2-thinking | 262,144 | 229,376 | 32,768 | 16,384 | $0.574 | $2.294 |
Moonshot-Kimi-K2-Instruct | 131,072 | 131,072 | - | 8,192 | $0.574 | $2.294 |
Image generation
Qwen-Image
The Qwen text-to-image model excels at rendering complex text, especially in Chinese and English. API reference
International
In International deployment mode, the access point and data storage are located in the Singapore region, and inference computing resources are dynamically scheduled globally (excluding Mainland China).
Model | Unit price | Free quota |
qwen-image-max Currently has the same capabilities as qwen-image-max-2025-12-30 | $0.075/image | Free quota: 100 images for each model Validity period: Within 90 days after you activate Model Studio |
qwen-image-max-2025-12-30 | $0.075/image | |
qwen-image-plus Currently has the same capabilities as qwen-image | $0.03/image | |
qwen-image-plus-2026-01-09 | $0.03/image | |
qwen-image | $0.035/image |
Mainland China
In Mainland China deployment mode, the access point and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Model | Unit price | Free quota |
qwen-image-max Currently has the same capabilities as qwen-image-max-2025-12-30 | $0.071677/image | No free quota |
qwen-image-max-2025-12-30 | $0.071677/image | |
qwen-image-plus Currently has the same capabilities as qwen-image | $0.028671/image | |
qwen-image-plus-2026-01-09 | $0.028671/image | |
qwen-image | $0.035/image |
Input prompt | Output image |
Healing-style hand-drawn poster featuring three puppies playing with a ball on lush green grass, adorned with decorative elements such as birds and stars. The main title “Come Play Ball!” is prominently displayed at the top in bold, blue cartoon font. Below it, the subtitle “Come [Show Off Your Skills]!” appears in green font. A speech bubble adds playful charm with the text: “Hehe, watch me amaze my little friends next!” At the bottom, supplementary text reads: “We get to play ball with our friends again!” The color palette centers on fresh greens and blues, accented with bright pink and yellow tones to highlight a cheerful, childlike atmosphere. |
|
Qwen-Image-Edit
The Qwen image editing model supports precise text editing in Chinese and English. It also supports operations such as color adjustment, detail enhancement, style transfer, adding or removing objects, and changing positions and actions. These features enable complex editing of images and text. API reference
International
In International deployment mode, the access point and data storage are located in the Singapore region, and inference computing resources are dynamically scheduled globally (excluding Mainland China).
Model | Unit price | Free quota |
qwen-image-edit-plus Currently has the same capabilities as qwen-image-edit-plus-2025-10-30 | $0.03/image | Free quota: 100 images for each model Validity period: Within 90 days after you activate Model Studio |
qwen-image-edit-plus-2025-12-15 | $0.03/image | |
qwen-image-edit-plus-2025-10-30 | $0.03/image | |
qwen-image-edit | $0.045/image |
Mainland China
In Mainland China deployment mode, the access point and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Model | Unit price | Free quota |
qwen-image-edit-plus Currently has the same capabilities as qwen-image-edit-plus-2025-10-30 | $0.028671/image | No free quota |
qwen-image-edit-plus-2025-12-15 | $0.028671/image | |
qwen-image-edit-plus-2025-10-30 | $0.028671/image | |
qwen-image-edit | $0.043/image |
Original image |
Make the person bend over and hold the dog's front paw. |
Original image |
Change the text on the letter blocks from 'HEALTH INSURANCE' to 'Tomorrow will be better'. |
Original image |
Change the dotted shirt to a light blue shirt. |
Original image |
Change the background to Antarctica. |
Original image |
Create a cartoon-style profile picture of the person. |
Original image |
Remove the hair from the dinner plate. |
Qwen-MT-Image
The Qwen image translation model supports translating text from images in 11 languages into Chinese or English. It accurately preserves the original layout and content information and provides custom features such as term definition, sensitive word filtering, and image entity detection. API reference
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Model | Unit price | Free quota |
qwen-mt-image | $0.000431/image | No free quota |
Original image |
Japanese |
Portuguese |
Arabic |
Tongyi - text-to-image - Z-Image
Tongyi - text-to-image - Z-Image is a lightweight model that quickly generates high-quality images. The model supports Chinese and English text rendering, complex semantic understanding, various styles, and multiple resolutions and aspect ratios. API reference
International
In International deployment mode, the access point and data storage are located in the Singapore region, and inference computing resources are dynamically scheduled globally (excluding Mainland China).
Model | Unit price | Free quota (Note) Validity period: Within 90 days after you activate Model Studio |
z-image-turbo | Prompt extension disabled / Prompt extension enabled | 100 images |
Mainland China
In Mainland China deployment mode, the access point and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Model | Unit price | Free quota |
z-image-turbo | Prompt extension disabled / Prompt extension enabled | No free quota |
Input prompt | Output image |
Photo of a stylish young woman with short black hair standing confidently in front of a vibrant cartoon-style mural wall. She wears an all-black outfit: a puffed bomber jacket with a ruffled collar, cargo shorts, fishnet tights, and chunky black Doc Martens, with a gold chain dangling from her waist. The background features four colorful comic-style panels: one reads “GRAND STAGE” and includes sneakers and a Gatorade bottle; another displays green Nike sneakers and a slice of pizza; the third reads “HARAJUKU st” with floating shoes; and the fourth shows a blue mouse riding a skateboard with the text “Takeshita WELCOME.” Dominant bright colors include yellow, teal, orange, pink, and green. Speech bubbles, halftone patterns, and playful characters enhance the urban street-art aesthetic. Daylight evenly illuminates the scene, and the ground beneath her feet is white tiled pavement. Full-body portrait, centered composition, slightly tilted stance, direct eye contact with the camera. High detail, sharp focus, dynamic framing. |
|
Wan text-to-image
The Wan text-to-image model generates high-quality images from text. API reference | Try it online
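A minimal sketch of a synchronous text-to-image call with the DashScope Python SDK follows. The ImageSynthesis interface matches the SDK's documented text-to-image pattern, but treat the model name, size value, and International endpoint as assumptions to verify against the API reference.

```python
from http import HTTPStatus
import dashscope
from dashscope import ImageSynthesis

# Minimal synchronous text-to-image sketch with the DashScope Python SDK.
# The endpoint override, model name, and size value are assumptions to verify
# against the API reference; the API key is read from DASHSCOPE_API_KEY.
dashscope.base_http_api_url = "https://dashscope-intl.aliyuncs.com/api/v1"

rsp = ImageSynthesis.call(
    model="wan2.2-t2i-flash",
    prompt="A needle-felted Santa Claus holding a gift, warm and cozy scene",
    n=1,                # each generated image is billed at the unit price
    size="1024*1024",
)
if rsp.status_code == HTTPStatus.OK:
    for result in rsp.output.results:
        print(result.url)   # temporary URL of the generated image
else:
    print(rsp.code, rsp.message)
```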
Global
In Global deployment mode, the access point and data storage are located in the US (Virginia) region, and inference computing resources are dynamically scheduled globally.
Model | Description | Unit price | Free quota (Note) Validity period: Within 90 days after you activate Model Studio |
wan2.6-t2i | Wan 2.6. Supports new synchronous interfaces and lets you freely select dimensions within the constraints of total pixel area and aspect ratio. | $0.03/image | No free quota |
International
In International deployment mode, the access point and data storage are located in the Singapore region, and inference computing resources are dynamically scheduled globally (excluding Mainland China).
Model | Description | Unit price | Free quota (Note) Validity period: Within 90 days after you activate Model Studio |
wan2.6-t2i | Wan 2.6. Supports new synchronous interfaces and lets you freely select dimensions within the constraints of total pixel area and aspect ratio. | $0.03/image | 50 images |
wan2.5-t2i-preview | Wan 2.5 preview. Removes single-side length limits and lets you freely select dimensions within the constraints of total pixel area and aspect ratio. | $0.03/image | 50 images |
wan2.2-t2i-plus | Wan 2.2 Professional Edition. Fully upgraded in creativity, stability, and realistic texture. | $0.05/image | 100 images |
wan2.2-t2i-flash | Wan 2.2 Flash Edition. Fully upgraded in creativity, stability, and realistic texture. | $0.025/image | 100 images |
wan2.1-t2i-plus | Wan 2.1 Professional Edition. Supports multiple styles and generates images with rich details. | $0.05/image | 200 images |
wan2.1-t2i-turbo | Wan 2.1 Turbo Edition. Supports multiple styles and offers fast generation speed. | $0.025/image | 200 images |
Mainland China
In Mainland China deployment mode, the access point and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Model | Description | Unit price | Free quota (Note) Validity period: Within 90 days after you activate Model Studio |
wan2.6-t2i | Wan 2.6. Supports new synchronous interfaces and lets you freely select dimensions within the constraints of total pixel area and aspect ratio. | $0.028671/image | No free quota |
wan2.5-t2i-preview | Wan 2.5 preview. Removes single-side length limits and lets you freely select dimensions within the constraints of total pixel area and aspect ratio. | $0.028671/image | No free quota |
wan2.2-t2i-plus | Wan 2.2 Professional Edition. Fully upgraded in creativity, stability, and realistic texture. | $0.02007/image | No free quota |
wan2.2-t2i-flash | Wan 2.2 Flash Edition. Fully upgraded in creativity, stability, and realistic texture. | $0.028671/image | No free quota |
wanx2.1-t2i-plus | Wan 2.1 Professional Edition. Supports multiple styles and generates images with rich details. | $0.028671/image | No free quota |
wanx2.1-t2i-turbo | Wan 2.1 Turbo Edition. Supports multiple styles and offers fast generation speed. | $0.020070/image | No free quota |
wanx2.0-t2i-turbo | Wan 2.0 Turbo Edition. Excels at textured portraits and creative designs. It is cost-effective. | $0.005735/image | No free quota |
Input prompt | Output image |
A needle-felted Santa Claus holding a gift and a white cat standing next to him against a background of colorful gifts and green plants, creating a cute, warm, and cozy scene. |
|
Wan2.6 image generation and editing
The Wan2.6 image generation model supports image editing and can generate outputs that contain both text and images to meet various generation and integration requirements. API reference.
Global
In Global deployment mode, the access point and data storage are in the US (Virginia) region, and inference computing resources are dynamically scheduled globally.
Model | Unit price | Free quota |
wan2.6-image | $0.03/image | No free quota |
International
In International deployment mode, the access point and data storage are in the Singapore region, and inference computing resources are dynamically scheduled globally (excluding Mainland China).
Model | Unit price | Free quota (Note) Validity period: Within 90 days after you activate Model Studio |
wan2.6-image | $0.03/image | 50 images |
Mainland China
In Mainland China deployment mode, the access point and data storage are in the Beijing region, and inference computing resources are limited to Mainland China.
Model | Unit price | Free quota |
wan2.6-image | $0.028671/image | No free quota |
Wan general image editing 2.5
The Wan2.5 general image editing model supports entity-consistent image editing and multi-image fusion. It accepts text, a single image, or multiple images as input. API reference.
International
In International deployment mode, the access point and data storage are located in the Singapore region, and inference computing resources are dynamically scheduled globally (excluding Mainland China).
Model | Unit price | Free quota (Note) Validity period: Within 90 days after you activate Model Studio |
wan2.5-i2i-preview | $0.03/image | 50 images |
Mainland China
In Mainland China deployment mode, the access point and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Model | Unit price | Free quota |
wan2.5-i2i-preview | $0.028671/image | No free quota |
Feature | Input example | Output image |
Single-image editing |
|
Change the floral dress to a vintage-style lace long dress with exquisite embroidery details on the collar and cuffs. |
Multi-image fusion |
|
Place the alarm clock from Image 1 next to the vase on the dining table in Image 2. |
Wan general image editing 2.1
The Wan2.1 general image editing model performs diverse image editing with simple instructions. It is suitable for scenarios such as outpainting, watermark removal, style transfer, image restoration, and image enhancement. Usage | API reference
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Model | Unit price | Free quota |
wanx2.1-imageedit | $0.020070/image | No free quota |
The general image editing model currently supports the following features:
Model Features | Input image | Input prompt | Output image |
Global stylization |
| French picture book style. |
|
Local stylization |
| Change the house to a wooden plank style. |
|
Instruction-based editing |
| Change the girl's hair to red. |
|
Inpainting | Input image
Masked image (The white area is the mask)
| A ceramic rabbit holding a ceramic flower. | Output image
|
Text watermark removal |
| Remove the text from the image. |
|
Outpainting |
| A green fairy. |
|
Image super-resolution | Blurry image
| Image super-resolution. | Clear image
|
Image colorization |
| Blue background, yellow leaves. |
|
Line art to image |
| A living room in a minimalist Nordic style. |
|
Placeholder Image |
| A cartoon character cautiously peeks out, spying on a brilliant blue gem inside the room. |
|
OutfitAnyone
Compared to the basic version, the OutfitAnyone-Plus model offers improvements in image definition, clothing texture details, and logo restoration. However, it takes longer to generate images and is suitable for scenarios that are not time-sensitive. API reference | Try it online
OutfitAnyone-Image Parsing supports parsing model and clothing images, which can be used for pre-processing and post-processing of OutfitAnyone images. API reference
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Model | Description | Sample input | Sample output |
aitryon-plus | OutfitAnyone-Plus |
|
|
aitryon-parsing-v1 | OutfitAnyone image parsing |
OutfitAnyone pricing
Service | Model | Unit price | Discount | Tier |
OutfitAnyone - Plus | aitryon-plus | $0.071677/image | None | None |
OutfitAnyone - Image parsing | aitryon-parsing-v1 | $0.000574/image | None | None |
Video generation - Wan
Text-to-video
The Wan text-to-video model generates videos from a single sentence. The videos feature rich artistic styles and cinematic quality. API reference | Try it online
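Video generation runs as an asynchronous job: you submit a task, then poll until it completes. A minimal sketch with the DashScope Python SDK follows; the VideoSynthesis interface matches the SDK's documented pattern, but treat the model name, size value, and endpoint as assumptions to verify against the API reference.

```python
from http import HTTPStatus
import dashscope
from dashscope import VideoSynthesis

# Minimal asynchronous text-to-video sketch with the DashScope Python SDK:
# submit a job, then block until it completes. The endpoint override, model
# name, and size value are assumptions to verify against the API reference.
dashscope.base_http_api_url = "https://dashscope-intl.aliyuncs.com/api/v1"

task = VideoSynthesis.async_call(
    model="wan2.2-t2v-plus",
    prompt="A kitten running in the moonlight",
    size="1920*1080",
)
rsp = VideoSynthesis.wait(task)  # polls the task until it finishes
if rsp.status_code == HTTPStatus.OK:
    print(rsp.output.video_url)  # output is billed per second of video
else:
    print(rsp.code, rsp.message)
```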
Global
In Global deployment mode, the access point and data storage are located in the US (Virginia) region, and inference computing resources are dynamically scheduled globally.
Model | Description | Unit price | Free quota |
wan2.6-t2v | Wan 2.6. Introduces a multi-shot narrative feature and supports automatic voiceover and the import of custom audio files. | 720P: $0.10/second 1080P: $0.15/second | No free quota |
International
In International deployment mode, the access point and data storage are located in the Singapore region, and inference computing resources are dynamically scheduled globally (excluding Mainland China).
Model | Description | Unit price | Free quota (Claim) Valid for 90 days after you activate Alibaba Cloud Model Studio |
wan2.6-t2v | Wan 2.6. Introduces a multi-shot narrative feature and supports automatic voiceover and the import of custom audio files. | 720P: $0.10/second 1080P: $0.15/second | 50 seconds |
wan2.5-t2v-preview | Wan 2.5 preview. Supports automatic voiceover and custom audio file input. | 480p: $0.05/second 720p: $0.10/second 1080p: $0.15/second | 50 seconds |
wan2.2-t2v-plus | Wan 2.2 Professional Edition. Significantly improved image detail and motion stability. | 480p: $0.02/second 1080p: $0.10/second | 50 seconds |
wan2.1-t2v-turbo | Wan 2.1 Turbo Edition. Fast generation speed and balanced performance. | $0.036/second | 200 seconds |
wan2.1-t2v-plus | Wan 2.1 Professional Edition. Generates rich details and higher-quality images. | $0.10/second | 200 seconds |
US
In US deployment mode, the access point and data storage are located in the US (Virginia) region, and inference computing resources are limited to the US.
Model | Description | Unit price | Free quota |
wan2.6-t2v-us | Wan 2.6. Introduces a multi-shot narrative feature and supports automatic voiceover and the import of custom audio files. | 720P: $0.10/second 1080P: $0.15/second | No free quota |
Mainland China
In Mainland China deployment mode, the access point and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Model | Description | Unit price | Free quota |
wan2.6-t2v | Wan 2.6. Introduces a multi-shot narrative feature and supports automatic voiceover and the import of custom audio files. | 720P: $0.086012/second 1080P: $0.143353/second | No free quota |
wan2.5-t2v-preview | Wan 2.5 preview. Supports automatic voiceover and custom audio file input. | 480p: $0.043006/second 720p: $0.086012/second 1080p: $0.143353/second | No free quota |
wan2.2-t2v-plus | Wan 2.2 Professional Edition. Significantly improved image detail and motion stability. | 480p: $0.02007/second 1080p: $0.100347/second | No free quota |
wanx2.1-t2v-turbo | Wan 2.1 Turbo Edition. Fast generation speed and balanced performance. | $0.034405/second | No free quota |
wanx2.1-t2v-plus | Wan 2.1 Professional Edition. Generates rich details and higher-quality images. | $0.100347/second | No free quota |
Input prompt | Output video (wan2.6, multi-shot video) |
Shot from a low angle, in a medium close-up, with warm tones, mixed lighting (the practical light from the desk lamp blends with the overcast light from the window), side lighting, and a central composition. In a classic detective office, wooden bookshelves are filled with old case files and ashtrays. A green desk lamp illuminates a case file spread out in the center of the desk. A fox, wearing a dark brown trench coat and a light gray fedora, sits in a leather chair, its fur crimson, its tail resting lightly on the edge, its fingers slowly turning yellowed pages. Outside, a steady drizzle falls beneath a blue sky, leaving meandering streaks on the glass. It slowly raises its head, its ears twitching slightly, its amber eyes gazing directly at the camera, its mouth clearly moving as it speaks in a smooth, cynical voice: 'The case was cold, colder than a fish in winter. But every chicken has its secrets, and I, for one, intended to find them.' |
Image-to-video - first frame
The Wan image-to-video model uses an input image as the first frame of a video. It then generates the rest of the video based on a prompt. The videos feature rich artistic styles and cinematic quality. API reference | Try it online
Global
In Global deployment mode, the access point and data storage are located in the US (Virginia) region, and inference computing resources are dynamically scheduled globally.
Model | Description | Unit price | Free quota |
wan2.6-i2v | Wan 2.6. Introduces a multi-shot narrative feature and supports automatic voiceover and the import of custom audio files. | 720P: $0.10/second 1080P: $0.15/second | No free quota |
International
In International deployment mode, the access point and data storage are located in the Singapore region, and inference computing resources are dynamically scheduled globally (excluding Mainland China).
Model | Description | Unit price | Free quota (Note) Validity: Within 90 days after you activate Alibaba Cloud Model Studio |
wan2.6-i2v | Wan 2.6. Introduces a multi-shot narrative feature and supports automatic voiceover and the import of custom audio files. | 720P: $0.10/second 1080P: $0.15/second | 50 seconds |
wan2.5-i2v-preview | Wan 2.5 preview. Supports automatic dubbing and custom audio file uploads. | 480P: $0.05/second 720P: $0.10/second 1080P: $0.15/second | 50 seconds |
wan2.2-i2v-flash | Wan 2.2 Flash Edition. Delivers extremely fast generation speed with significant improvements in visual detail and motion stability. | 480P: $0.015/second 720P: $0.036/second | 50 seconds |
wan2.2-i2v-plus | Wan 2.2 Professional Edition. Delivers significant improvements in visual detail and motion stability. | 480P: $0.02/second 1080P: $0.10/second | 50 seconds |
wan2.1-i2v-turbo | Wan 2.1 Turbo Edition. Fast generation speed with balanced performance. | $0.036/second | 200 seconds |
wan2.1-i2v-plus | Wan 2.1 Professional Edition. Generates rich details and produces higher-quality, more textured visuals. | $0.10/second | 200 seconds |
US
In US deployment mode, the access point and data storage are located in the US (Virginia) region, and inference computing resources are limited to the US.
Model | Description | Unit price | Free quota |
wan2.6-i2v-us | Wan 2.6. Introduces a multi-shot narrative feature and supports automatic voiceover and the import of custom audio files. | 720P: $0.10/second 1080P: $0.15/second | No free quota |
Mainland China
In Mainland China deployment mode, the access point and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Model | Description | Unit price | Free quota |
wan2.6-i2v | Wan 2.6. Introduces a multi-shot narrative feature and supports automatic voiceover and the import of custom audio files. | 720P: $0.086012/second 1080P: $0.143353/second | No free quota |
wan2.5-i2v-preview | Wan 2.5 preview. Supports automatic dubbing and custom audio file uploads. | 480P: $0.043006/second 720P: $0.086012/second 1080P: $0.143353/second | No free quota |
wan2.2-i2v-plus | Wan 2.2 Professional Edition. Delivers significant improvements in visual detail and motion stability. | 480P: $0.02007/second 1080P: $0.100347/second | No free quota |
wanx2.1-i2v-turbo | Wan 2.1 Turbo Edition. Fast generation speed with balanced performance. | $0.034405/second | No free quota |
wanx2.1-i2v-plus | Wan 2.1 Professional Edition. Generates rich details and produces higher-quality, more textured visuals. | $0.100347/second | No free quota |
Input first frame image and audio | Output video (wan2.6, multi-shot video) |
Input audio: | |
Input prompt: A scene of urban fantasy art. A dynamic graffiti art character. A boy painted with spray paint comes to life from a concrete wall. He sings an English rap song at a very fast pace while striking a classic, energetic rapper pose. The scene is set under an urban railway bridge at night. The lighting comes from a single streetlight, creating a cinematic atmosphere full of high energy and amazing detail. The audio of the video consists entirely of his rap, with no other dialogue or noise. | |
Image-to-video - first and last frames
The Wan first-and-last-frame video model generates a smooth, dynamic video from a prompt. You only need to provide the first and last frame images. The videos feature rich artistic styles and cinematic quality. API reference | Try it online
International
In International deployment mode, the access point and data storage are located in the Singapore region, and inference computing resources are dynamically scheduled globally (excluding Mainland China).
Model | Unit price | Free quota (Note) Validity: 90 days after you activate Model Studio |
wan2.2-kf2v-flash | 480P: $0.015/second 720P: $0.036/second 1080P: $0.07/second | 50 seconds |
wan2.1-kf2v-plus | $0.10/second | 200 seconds |
Mainland China
In Mainland China deployment mode, the access point and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Model | Unit price | Free quota (Note) |
wan2.2-kf2v-flash | 480P: $0.014335/second 720P: $0.028671/second 1080P: $0.068809/second | No free quota |
wanx2.1-kf2v-plus | $0.100347/second | No free quota |
Example input | Output video | ||
First frame | Last frame | Prompt | |
|
| In a realistic style, the camera starts at eye level on a small black cat looking up at the sky, then gradually moves upward to a top-down shot that focuses on the cat's curious eyes. | |
Reference-to-video
The Wan reference-to-video model uses a character's appearance and voice from an input video and a prompt to generate a new video that maintains character consistency. API reference
Billing rule: Both input and output videos are billed by the second. Failed jobs are not billed and do not consume the free quota.
The billable duration of the input video does not exceed 5 seconds. For more information, see Billing and rate limits.
The billable duration of the output video is the duration in seconds of the successfully generated video.
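For example, at the International 720P rates ($0.10/second for both input and output), an 8-second input video and a successfully generated 10-second output would be billed as min(8, 5) × $0.10 + 10 × $0.10 = $1.50.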
Global
In Global deployment mode, the access point and data storage are located in the US (Virginia) region, and inference computing resources are dynamically scheduled globally.
Model | Input price | Output price | Free quota (Note) |
wan2.6-r2v | 720P: $0.086012/second 1080P: $0.143353/second | 720P: $0.10/second 1080P: $0.15/second | No free quota |
International
In International deployment mode, the access point and data storage are located in the Singapore region, and inference computing resources are dynamically scheduled globally (excluding Mainland China).
Model | Input price | Output price | Free quota (Note) |
wan2.6-r2v | 720P: $0.10/second 1080P: $0.15/second | 720P: $0.10/second 1080P: $0.15/second | 50 seconds Validity: 90 days after you activate Model Studio |
Mainland China
In Mainland China deployment mode, the access point and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Model | Input price | Output price | Free quota (Note) |
wan2.6-r2v | 720P: $0.086012/second 1080P: $0.143353/second | 720P: $0.086012/second 1080P: $0.143353/second | No free quota |
General video editing
The Wan general video editing model supports multimodal inputs, including text, images, and videos. It can perform video generation and general editing tasks. API reference | Try it online
International
In International deployment mode, the access point and data storage are located in the Singapore region, and inference computing resources are dynamically scheduled globally (excluding Mainland China).
Model | Unit price | Free quota (Note) |
wan2.1-vace-plus | $0.10/second | 50 seconds Validity: 90 days after you activate Model Studio. |
Mainland China
In Mainland China deployment mode, the access point and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Model | Unit price | Free quota (Note) |
wanx2.1-vace-plus | $0.100347/second | No free quota |
The general video editing model supports the following features:
Feature | Input reference image | Input prompt | Output video |
Multi-image reference | Reference image 1 (reference entity)
Reference image 2 (reference background)
| In the video, a girl gracefully walks out from a misty, ancient forest. Her steps are light, and the camera captures her every nimble moment. When the girl stops and looks around at the lush woods, a smile of surprise and joy blossoms on her face. This scene, frozen in a moment of interplay between light and shadow, records her wonderful encounter with nature. | Output video |
Video repainting | The video shows a black steampunk-style car driven by a gentleman. The car is decorated with gears and copper pipes. The background features a steam-powered candy factory and retro elements, creating a vintage and playful scene. | ||
Local editing | Input video Input mask image (The white area indicates the editing area)
| The video shows a Parisian-style French cafe where a lion in a suit is elegantly sipping coffee. It holds a coffee cup in one hand, taking a gentle sip with a relaxed expression. The cafe is tastefully decorated, with soft hues and warm lighting illuminating the area where the lion is. | The content in the editing area is modified based on the prompt. |
Video extension | Input first clip (1 second) | A dog wearing sunglasses is skateboarding on the street, 3D cartoon. | Output extended video (5 seconds) |
Video outpainting | An elegant lady is passionately playing the violin, with a full symphony orchestra behind her. |
Wan - digital human
This feature generates natural-looking videos of people speaking, singing, or performing, based on a single character image and an audio file. To use this feature, you can call the following models in sequence. wan2.2-s2v image detection | wan2.2-s2v video generation
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Model | Description | Unit price |
wan2.2-s2v-detect | Checks if an input image meets requirements, such as sufficient definition, a single person, and a frontal view. | $0.000574/image |
wan2.2-s2v | Generates a dynamic video of a person from a valid image and an audio clip. | 480p: $0.071677/second 720p: $0.129018/second |
Sample input | Output video |
Input audio: |
Wan - animate image
Available in standard and professional modes. The model transfers the actions and expressions from a reference video to a character image, generating a video that animates the character from the image. API reference.
International
In International deployment mode, the access point and data storage are located in the Singapore region, and inference computing resources are dynamically scheduled globally (excluding Mainland China).
Model | Service | Description | Unit price | Free quota (View) |
wan2.2-animate-move | Standard mode | A cost-effective service with fast generation speeds. Suitable for basic needs, such as simple animation demos. | $0.12/second | The combined free quota for both modes is 50 seconds. |
Professional mode | Delivers high animation smoothness and natural transitions for actions and expressions. The output resembles a live-action video. | $0.18/second |
Mainland China
In Mainland China deployment mode, the access point and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Model | Service | Description | Unit price | Free quota (View) |
wan2.2-animate-move | Standard mode | Fast generation. Ideal for basic needs, such as simple animation demos. Cost-effective. | $0.06/second | No free quota |
Professional mode | Provides high-quality, smooth animation with natural transitions for actions and expressions. The output is similar to a live-action video. | $0.09/second |
Character image | Reference video | Output video (standard mode) | Output video (professional mode) |
|
Wan - video character swap
Available in standard and professional modes. The model replaces the main character in a video with a character from an image. It preserves the original video's scene, lighting, and hue. API reference.
International
In International deployment mode, the access point and data storage are located in the Singapore region, and inference computing resources are dynamically scheduled globally (excluding Mainland China).
Model | Service | Description | Unit price | Free quota (View) |
wan2.2-animate-mix | Standard mode | Generates animations quickly. Ideal for basic requirements, such as simple demos. Highly cost-effective. | $0.18/second | The combined free quota for both modes is 50 seconds. |
Professional mode | Produces highly smooth animations with natural transitions for actions and expressions. The result closely resembles a live-action video. | $0.26/second |
Mainland China
In Mainland China deployment mode, the access point and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Model | Service | Description | Unit price | Free quota (View) |
wan2.2-animate-mix | Standard mode | Generates animations quickly. Ideal for basic requirements, such as simple demos. Highly cost-effective. | $0.09/second | No free quota |
Professional mode | Produces highly smooth animations with natural transitions for actions and expressions. The result closely resembles a live-action video. | $0.13/second |
Character image | Reference video | Standard output video | Professional output video |
|
AnimateAnyone
This feature generates character motion videos based on a character image and a motion template. To use this feature, you can call the following three models in sequence. AnimateAnyone image detection API details | AnimateAnyone motion template generation | AnimateAnyone video generation API details
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Model | Description | Unit price |
animate-anyone-detect-gen2 | Detects whether an input image meets the requirements. | $0.000574/image |
animate-anyone-template-gen2 | Extracts character motion from a video and generates a motion template. | $0.011469/second |
animate-anyone-gen2 | Generates a character motion video from a character image and a motion template. |
Input: Character image | Input: Motion video | Output (generated from the image background) | Output (generated from the video background) |
The preceding example was generated by the Tongyi App, which integrates AnimateAnyone.
The content generated by the AnimateAnyone model is video only and does not include audio.
EMO
This feature generates dynamic portrait videos based on a portrait image and a human voice audio file. To use this feature, call the following models in sequence: EMO image detection | EMO video generation
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Model | Description | Unit price |
emo-detect-v1 | Detects whether an input image meets the required specifications. This model can be called directly without deployment. | $0.000574/image |
emo-v1 | Generates a dynamic portrait video. This model can be called directly without deployment. |
Input: Portrait image and human voice audio file (the audio is the voice heard in the output video) | Output: Dynamic portrait video generated with style level "active" ("style_level": "active") |
LivePortrait
This model generates dynamic portrait videos based on a portrait image and a human voice audio file. Compared to the EMO model, it generates videos faster and at lower cost, but with lower quality. To use this feature, call the following two models in sequence: LivePortrait image detection | LivePortrait video generation
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Model | Description | Unit price |
liveportrait-detect | Detects whether an input image meets the requirements. | $0.000574/image |
liveportrait | Generates a dynamic portrait video. | $0.002868/second |
Input: Portrait image and voice audio (the audio is the voice heard in the output video) | Output: Animated portrait video |
Emoji
This feature generates dynamic face videos based on a face image and preset facial motion templates. It can be used for scenarios such as creating emojis and generating video materials. To use this feature, call the following models in sequence: Emoji image detection | Emoji video generation
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Model | Description | Unit price |
emoji-detect-v1 | Detects whether an input image meets specified requirements. | $0.000574/image |
emoji-v1 | Generates custom emojis based on a portrait image and a specified emoji template. | $0.011469/second |
Input: Portrait image | Output: Dynamic portrait video generated with the "Happy" emoji template ("input.driven_id": "mengwa_kaixin") |
VideoRetalk
This feature generates a video where the character's lip movements match the input audio, based on a character video and a human voice audio file. To use this feature, you can call the following model. API reference
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Model | Description | Unit price |
videoretalk | Synchronizes a character's lip movements with input audio to generate a new video. | $0.011469/second |
Video style transform
This model generates videos in different styles that match the semantic description of user-input text, or restyles a user-input video. API reference
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Model | Description | Resolution | Unit price |
video-style-transform | Transforms an input video into styles such as Japanese comic and American comic. | 720P | $0.071677/second |
540P | $0.028671/second | ||
Input video | Output video (Manga style) |
Speech synthesis (text-to-speech)
Qwen speech synthesis
This feature supports multilingual mixed-text input and provides streaming audio output. Usage | API reference
International
In the international deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled globally, excluding Mainland China.
Model | Version | Price | Maximum input characters | Supported languages | Free quota (Note) |
qwen3-tts-flash Same capabilities as qwen3-tts-flash-2025-09-18. | Stable | $0.10 per 10,000 characters | 600 | Chinese (Mandarin, Beijing, Shanghai, Sichuan, Nanjing, Shaanxi, Minnan, Tianjin, Cantonese), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese | 2,000 characters if you activate Model Studio before 00:00 on November 13, 2025. 10,000 characters if you activate Model Studio on or after 00:00 on November 13, 2025. Valid for 90 days after you activate Model Studio. |
qwen3-tts-flash-2025-11-27 | Snapshot | 10,000 characters Valid for 90 days after you activate Model Studio. | |||
qwen3-tts-flash-2025-09-18 | Snapshot | 2,000 characters if you activate Model Studio before 00:00 on November 13, 2025. 10,000 characters if you activate Model Studio on or after 00:00 on November 13, 2025. Valid for 90 days after you activate Model Studio. |
Billing is based on the number of input characters. The calculation rules are as follows:
Each Chinese character, including simplified and traditional Chinese, Japanese Kanji, and Korean Hanja, counts as 2 characters.
Any other character, such as an English letter, a punctuation mark, or a space, counts as 1 character.
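For estimating costs client-side, this counting rule can be approximated in code. The following Python sketch is illustrative only; the Unicode ranges used to detect CJK ideographs are an assumption, not the service's exact classification.

def billable_characters(text: str) -> int:
    # Approximate the billing rule: CJK ideographs (Chinese Hanzi, Japanese
    # Kanji, Korean Hanja) count as 2 characters; everything else counts as 1.
    total = 0
    for ch in text:
        cp = ord(ch)
        is_ideograph = (
            0x4E00 <= cp <= 0x9FFF      # CJK Unified Ideographs
            or 0x3400 <= cp <= 0x4DBF   # CJK Extension A
            or 0xF900 <= cp <= 0xFAFF   # CJK Compatibility Ideographs
        )
        total += 2 if is_ideograph else 1
    return total

print(billable_characters("Hello, 世界!"))  # 12: eight 1-count characters plus two ideographs at 2 each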
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the China (Beijing) region, and inference computing resources are restricted to Mainland China.
Qwen3-TTS-Flash
Model | Version | Price | Max input characters | Supported languages | Free quota (Note) |
qwen3-tts-flash Same capabilities as qwen3-tts-flash-2025-09-18. | Stable | $0.114682/10,000 characters | 600 | Chinese (Mandarin, Beijing, Shanghai, Sichuan, Nanjing, Shaanxi, Minnan, Tianjin, Cantonese), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese | No free quota |
qwen3-tts-flash-2025-11-27 | Snapshot | ||||
qwen3-tts-flash-2025-09-18 | Snapshot |
Billing is based on the number of input characters. The calculation rules are as follows:
Each Chinese character, including simplified and traditional Chinese, Japanese Kanji, and Korean Hanja, counts as 2 characters.
Any other character, such as an English letter, a punctuation mark, or a space, counts as 1 character.
Qwen-TTS
Model | Version | Context window (tokens) | Max input (tokens) | Max output (tokens) | Input cost (per 1,000 tokens) | Output cost (per 1,000 tokens) | Free quota (Note) |
qwen-tts This model has the same capabilities as qwen-tts-2025-04-10. | Stable | 8,192 | 512 | 7,680 | $0.230 | $1.434 | No free quota |
qwen-tts-latest This model always has the same capabilities as the latest snapshot version. | Latest | ||||||
qwen-tts-2025-05-22 | Snapshot | ||||||
qwen-tts-2025-04-10 | |||||||
Audio is converted to tokens at a rate of 50 tokens per second. Audio clips shorter than 1 second are billed as 50 tokens.
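As a worked example of this rule, the sketch below estimates tokens and cost at the qwen-tts output price of $1.434 per 1,000 tokens from the table above; rounding partial seconds up is an assumption.

import math

def qwen_tts_audio_tokens(duration_seconds: float) -> int:
    # 50 tokens per second of audio; clips under 1 second bill as 50 tokens.
    if duration_seconds <= 0:
        return 0
    return max(50, math.ceil(duration_seconds * 50))  # ceiling is an assumption

tokens = qwen_tts_audio_tokens(12.3)   # 615 tokens
print(tokens, tokens / 1000 * 1.434)   # 615 0.88191 -> about $0.88 of output audio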
Qwen real-time speech synthesis
This feature supports streaming text input and streaming audio output. It automatically adjusts the speech rate based on the text content and punctuation. Usage | API reference
Qwen3-TTS-VD-Realtime supports real-time speech synthesis with voice design voices but does not support default voices.
Qwen3-TTS-VC-Realtime supports real-time speech synthesis with cloned voices but does not support default voices.
Qwen3-TTS-Flash-Realtime and Qwen-TTS-Realtime support only default voices and do not support cloned or designed voices.
International
In the international deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled globally, excluding Mainland China.
Qwen3-TTS-VD-Realtime
Model | Version | Price | Supported languages | Free quota (Note) |
qwen3-tts-vd-realtime-2025-12-16 | Snapshot | $0.143353/10,000 characters | Chinese, English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese | 10,000 characters. Valid for 90 days after you activate Model Studio |
Billing is based on the number of input characters. The calculation rules are as follows:
Each Chinese character, including simplified and traditional Chinese, Japanese Kanji, and Korean Hanja, counts as 2 characters.
Any other character, such as an English letter, a punctuation mark, or a space, counts as 1 character.
Qwen3-TTS-VC-Realtime
Model | Version | Price | Supported languages | Free quota (Note) |
qwen3-tts-vc-realtime-2025-11-27 | Snapshot | $0.13/10,000 characters | Chinese, English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese | 10,000 characters Valid for 90 days after you activate Model Studio. |
Billing is based on the number of input characters. The calculation rules are as follows:
Each Chinese character, including simplified and traditional Chinese, Japanese Kanji, and Korean Hanja, counts as 2 characters.
Any other character, such as an English letter, a punctuation mark, or a space, counts as 1 character.
Qwen3-TTS-Flash-Realtime
Model | Version | Price | Supported languages | Free quota (Note) |
qwen3-tts-flash-realtime This model has the same capabilities as qwen3-tts-flash-realtime-2025-09-18. | Stable | $0.13/10,000 characters | Chinese (Mandarin, Beijing, Shanghai, Sichuan, Nanjing, Shaanxi, Minnan, Tianjin, Cantonese), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese | If you activate Model Studio before 00:00 on November 13, 2025: 2,000 characters If you activate Model Studio on or after 00:00 on November 13, 2025: 10,000 characters Valid for 90 days after you activate Model Studio |
qwen3-tts-flash-realtime-2025-11-27 | Snapshot | 10,000 characters Valid for 90 days after you activate Model Studio | ||
qwen3-tts-flash-realtime-2025-09-18 | Snapshot | If you activate Model Studio before 00:00 on November 13, 2025: 2,000 characters If you activate Model Studio on or after 00:00 on November 13, 2025: 10,000 characters Valid for 90 days after you activate Model Studio |
Billing is based on the number of input characters. The calculation rules are as follows:
Each Chinese character, including simplified and traditional Chinese, Japanese Kanji, and Korean Hanja, counts as 2 characters.
Any other character, such as an English letter, a punctuation mark, or a space, counts as 1 character.
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the China (Beijing) region, and inference computing resources are restricted to Mainland China.
Qwen3-TTS-VD-Realtime
Model | Version | Price | Supported languages | Free quota (Note) |
qwen3-tts-vd-realtime-2025-12-16 | Snapshot | $0.143353 per 10,000 characters | Chinese, English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese | No free quota |
Billing is based on the number of input characters. The calculation rules are as follows:
Each Chinese character, including simplified and traditional Chinese, Japanese Kanji, and Korean Hanja, counts as 2 characters.
Any other character, such as an English letter, a punctuation mark, or a space, counts as 1 character.
Qwen3-TTS-VC-Realtime
Model | Version | Price | Supported languages | Free quota (Note) |
qwen3-tts-vc-realtime-2025-11-27 | Snapshot | $0.143353/10,000 characters | Chinese, English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese | No free quota |
Billing is based on the number of input characters. The calculation rules are as follows:
Each Chinese character, including simplified and traditional Chinese, Japanese Kanji, and Korean Hanja, counts as 2 characters.
Any other character, such as an English letter, a punctuation mark, or a space, counts as 1 character.
Qwen3-TTS-Flash-Realtime
Model | Version | Price | Supported languages | Free quota (Note) |
qwen3-tts-flash-realtime Functionally identical to qwen3-tts-flash-realtime-2025-09-18. | Stable | $0.143353/10,000 characters | Chinese (Mandarin, Beijing, Shanghai, Sichuan, Nanjing, Shaanxi, Minnan, Tianjin, Cantonese), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese | No free quota |
qwen3-tts-flash-realtime-2025-11-27 | Snapshot | |||
qwen3-tts-flash-realtime-2025-09-18 | Snapshot |
Billing is based on the number of input characters. The calculation rules are as follows:
Each Chinese character, including simplified and traditional Chinese, Japanese Kanji, and Korean Hanja, counts as 2 characters.
Any other character, such as an English letter, a punctuation mark, or a space, counts as 1 character.
Qwen-TTS-Realtime
Model | Version | Context window (tokens) | Max input (tokens) | Max output (tokens) | Input cost (per 1,000 tokens) | Output cost (per 1,000 tokens) | Supported languages | Free quota (Note) |
qwen-tts-realtime This model has the same capabilities as qwen-tts-realtime-2025-07-15. | Stable | 8,192 | 512 | 7,680 | $0.345 | $1.721 | Chinese, English | No free quota |
qwen-tts-realtime-latest This model has the same capabilities as qwen-tts-realtime-2025-07-15. | Latest | Chinese, English | ||||||
qwen-tts-realtime-2025-07-15 | Snapshot | Chinese, English | ||||||
Audio is converted to tokens at a rate of 50 tokens per second. Audio clips shorter than 1 second are billed as 50 tokens.
Qwen voice cloning
Voice cloning uses a model to extract voice features and clone a voice without training. As little as 10 to 20 seconds of audio can generate a highly similar, natural-sounding custom voice. Usage | API reference
International
In the international deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled globally, excluding Mainland China.
Model | Price | Free quota (Note) |
qwen-voice-enrollment | $0.01/voice | 1,000 voices Valid for 90 days after you activate Model Studio |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the China (Beijing) region, and inference computing resources are restricted to Mainland China.
Model | Price | Free quota (Note) |
qwen-voice-enrollment | $0.01/voice | No free quota |
Qwen voice design
Voice design generates custom voices from text descriptions. It supports multilingual and multi-dimensional voice feature definitions. This feature is suitable for various applications, such as ad dubbing, character creation, and audio content creation. Usage | API reference
International
In the international deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled globally, excluding Mainland China.
Model | Price | Free quota (Note) |
qwen-voice-design | $0.20 per voice | 10 timbres Valid for 90 days after you activate Model Studio |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the China (Beijing) region, and inference computing resources are restricted to Mainland China.
Model | Price | Free quota (Note) |
qwen-voice-design | $0.20 per voice | No free quota |
CosyVoice speech synthesis
CosyVoice is a next-generation generative speech synthesis model from Tongyi Lab. Built on large-scale pre-trained language models, CosyVoice deeply integrates text understanding with speech generation and supports real-time, streaming text-to-speech synthesis. Usage | API reference
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Model | Price | Free quota (Note) |
cosyvoice-v3-plus | $0.286706 per 10,000 characters | No free quota |
cosyvoice-v3-flash | $0.14335 per 10,000 characters | |
cosyvoice-v2 | $0.286706 per 10,000 characters |
Characters are calculated as follows:
Each Chinese character, including simplified and traditional Chinese, Japanese Kanji, and Korean Hanja, counts as 2 characters.
Any other character, such as letters, numbers, Japanese Kana, and Korean Hangul, counts as 1 character.
Content within SSML tags is not billed.
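A minimal counting sketch for this rule follows; treating the SSML exclusion as simply dropping the tag markup before counting is an assumption about how it is applied.

import re

def cosyvoice_billable_characters(text: str) -> int:
    # Remove SSML tag markup before counting (assumption about the exclusion).
    text = re.sub(r"<[^>]+>", "", text)
    total = 0
    for ch in text:
        cp = ord(ch)
        # Ideographs (Hanzi, Kanji, Hanja) count as 2; Kana, Hangul, letters,
        # numbers, and punctuation count as 1. Ranges are approximate.
        is_ideograph = (
            0x4E00 <= cp <= 0x9FFF
            or 0x3400 <= cp <= 0x4DBF
            or 0xF900 <= cp <= 0xFAFF
        )
        total += 2 if is_ideograph else 1
    return total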
Speech recognition (speech-to-text) and translation (speech-to-translation)
Qwen3-LiveTranslate-Flash
Qwen3-LiveTranslate-Flash-Realtime
Qwen3-LiveTranslate-Flash-Realtime is a multilingual model for real-time audio and video translation. It recognizes 18 languages and provides real-time audio translations in 10 languages.
Core features:
Multilingual support: Supports 18 languages and 6 Chinese dialects, including Chinese, English, French, German, Russian, Japanese, and Korean. It also supports dialects such as Mandarin, Cantonese, and Sichuanese.
Vision enhancement: Uses visual content to improve translation accuracy. The model analyzes visual cues, such as lip movements, actions, and on-screen text, to enhance translation accuracy in noisy environments or when speech is ambiguous.
3-second latency: Achieves a simultaneous interpretation latency of as low as 3 seconds.
Lossless simultaneous interpretation: Resolves cross-lingual word order issues using semantic unit prediction technology. The quality of real-time translation is comparable to that of offline translation.
Natural voice: Generates speech with a natural, human-like voice. The model automatically adjusts its tone and emotion based on the source audio content.
International
In the international deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled globally, excluding Mainland China.
Model | Version | Context window (tokens) | Max input (tokens) | Max output (tokens) | Free quota |
qwen3-livetranslate-flash-realtime This model has the same capabilities as qwen3-livetranslate-flash-realtime-2025-09-22. | Stable | 53,248 | 49,152 | 4,096 | 1 million tokens for each version Valid for 90 days after you activate Model Studio |
qwen3-livetranslate-flash-realtime-2025-09-22 | Snapshot | ||||
After the free quota is used, input and output are billed based on token usage.
Token calculation:
Audio: Each second of input or output audio consumes 12.5 tokens.
Image: Each 28 × 28 pixel input consumes 0.5 tokens.
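Under these rules, total token usage can be estimated as follows. Rounding partial seconds and partial 28 x 28 blocks up is an assumption in this sketch.

import math

def estimate_livetranslate_tokens(audio_seconds=0.0, image_sizes=()):
    tokens = audio_seconds * 12.5                # 12.5 tokens per second of audio
    for width, height in image_sizes:
        blocks = math.ceil(width / 28) * math.ceil(height / 28)
        tokens += blocks * 0.5                   # 0.5 tokens per 28 x 28 block
    return tokens

# Example: 60 seconds of audio plus one 1280 x 720 video frame.
print(estimate_livetranslate_tokens(60, [(1280, 720)]))  # 750 + 1196 * 0.5 = 1348.0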
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the China (Beijing) region, and inference computing resources are restricted to Mainland China.
Model | Version | Context window (tokens) | Max input (tokens) | Max output (tokens) | Free quota (Note) |
qwen3-livetranslate-flash-realtime This model has the same capabilities as qwen3-livetranslate-flash-realtime-2025-09-22. | Stable | 53,248 | 49,152 | 4,096 | No free quota |
qwen3-livetranslate-flash-realtime-2025-09-22 | Snapshot | ||||
Input and output are billed based on token usage.
Token calculation:
Audio: Each second of input or output audio consumes 12.5 tokens.
Image: Each 28 × 28 pixel input consumes 0.5 tokens.
Qwen audio file recognition
Based on the Qwen multimodal foundation model, this feature supports multilingual recognition, singing recognition, and noise rejection. Usage | API reference
International
In the international deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled globally, excluding Mainland China.
Qwen3-ASR-Flash-Filetrans
Model | Version | Supported languages | Supported sample rates | Price | Free quota (Note) |
qwen3-asr-flash-filetrans Provides the same capabilities as qwen3-asr-flash-filetrans-2025-11-17. | Stable | Chinese (Mandarin, Sichuanese, Minnan, Wu, and Cantonese), English, Japanese, German, Korean, Russian, French, Portuguese, Arabic, Italian, Spanish, Hindi, Indonesian, Thai, Turkish, Ukrainian, Vietnamese, Czech, Danish, Filipino, Finnish, Icelandic, Malay, Norwegian, Polish, and Swedish | Any | $0.000035/second | 36,000 seconds (10 hours) Valid for 90 days after you activate Model Studio |
qwen3-asr-flash-filetrans-2025-11-17 | Snapshot |
Qwen3-ASR-Flash
Model | Version | Supported languages | Supported sample rates | Unit price | Free quota (Note) |
qwen3-asr-flash Same capabilities as qwen3-asr-flash-2025-09-08. | Stable | Chinese (Mandarin, Sichuanese, Minnan, Wu, Cantonese), English, Japanese, German, Korean, Russian, French, Portuguese, Arabic, Italian, Spanish, Hindi, Indonesian, Thai, Turkish, Ukrainian, Vietnamese, Czech, Danish, Filipino, Finnish, Icelandic, Malay, Norwegian, Polish, and Swedish | Any | $0.000035/second | 36,000 seconds (10 hours). Valid for 90 days after you activate Model Studio |
qwen3-asr-flash-2025-09-08 | Snapshot |
US
In the US deployment mode, the endpoint and data storage are located in the US (Virginia) region, and inference computing resources are restricted to the United States.
Model | Version | Supported languages | Supported sample rates | Price | Free quota (Note) |
qwen3-asr-flash-us This model provides the same capabilities as qwen3-asr-flash-2025-09-08-us. | Stable | Chinese (Mandarin, Sichuanese, Minnan, Wu, Cantonese), English, Japanese, German, Korean, Russian, French, Portuguese, Arabic, Italian, Spanish, Hindi, Indonesian, Thai, Turkish, Ukrainian, Vietnamese, Czech, Danish, Filipino, Finnish, Icelandic, Malay, Norwegian, Polish, and Swedish | Any | $0.000035/second | No free quota |
qwen3-asr-flash-2025-09-08-us | Snapshot |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the China (Beijing) region, and inference computing resources are restricted to Mainland China.
Qwen3-ASR-Flash-Filetrans
Model | Version | Supported languages | Supported sample rates | Price | Free quota (Note) |
qwen3-asr-flash-filetrans Current equivalent: qwen3-asr-flash-filetrans-2025-11-17 | Stable | Chinese (Mandarin, Sichuanese, Minnan, Wu, and Cantonese), English, Japanese, German, Korean, Russian, French, Portuguese, Arabic, Italian, Spanish, Hindi, Indonesian, Thai, Turkish, Ukrainian, Vietnamese, Czech, Danish, Filipino, Finnish, Icelandic, Malay, Norwegian, Polish, and Swedish | Any | $0.000032/second | No free quota |
qwen3-asr-flash-filetrans-2025-11-17 | Snapshot |
Qwen3-ASR-Flash
Model | Version | Supported languages | Supported sample rates | Price | Free quota (Note) |
qwen3-asr-flash Functionally identical to qwen3-asr-flash-2025-09-08. | Stable | Chinese (Mandarin, Sichuanese, Minnan, Wu, Cantonese), English, Japanese, German, Korean, Russian, French, Portuguese, Arabic, Italian, Spanish, Hindi, Indonesian, Thai, Turkish, Ukrainian, Vietnamese, Czech, Danish, Filipino, Finnish, Icelandic, Malay, Norwegian, Polish, and Swedish | Any | $0.000032/second | No free quota |
qwen3-asr-flash-2025-09-08 | Snapshot |
Qwen real-time speech recognition
The Qwen real-time speech recognition model provides automatic language detection, detects 11 languages, and accurately transcribes audio in complex environments. Usage | API reference
International
In the international deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled globally, excluding Mainland China.
Model | Version | Supported languages | Supported sample rates | Price | Free quota (Note) |
qwen3-asr-flash-realtime Same capabilities as qwen3-asr-flash-realtime-2025-10-27. | Stable | Chinese (Mandarin, Sichuanese, Minnan, Wu, and Cantonese), English, Japanese, German, Korean, Russian, French, Portuguese, Arabic, Italian, Spanish, Hindi, Indonesian, Thai, Turkish, Ukrainian, Vietnamese, Czech, Danish, Filipino, Finnish, Icelandic, Malay, Norwegian, Polish, and Swedish | 8 kHz, 16 kHz | $0.00009/second | 36,000 seconds (10 hours) Valid for 90 days after you activate Model Studio |
qwen3-asr-flash-realtime-2025-10-27 | Snapshot |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the China (Beijing) region, and inference computing resources are restricted to Mainland China.
Model | Version | Supported languages | Supported sample rates | Price | Free quota (Note) |
qwen3-asr-flash-realtime This model has the same capabilities as qwen3-asr-flash-realtime-2025-10-27. | Stable | Chinese (Mandarin, Sichuanese, Minnan, Wu, and Cantonese), English, Japanese, German, Korean, Russian, French, Portuguese, Arabic, Italian, Spanish, Hindi, Indonesian, Thai, Turkish, Ukrainian, Vietnamese, Czech, Danish, Filipino, Finnish, Icelandic, Malay, Norwegian, Polish, and Swedish | 8 kHz, 16 kHz | $0.000047/second | No free quota |
qwen3-asr-flash-realtime-2025-10-27 | Snapshot |
Paraformer speech recognition
Paraformer is a speech recognition model from Tongyi Lab. It is available in two versions: audio file recognition and real-time speech recognition.
Audio file recognition
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Model | Supported languages | Supported sample rates | Scenarios | Supported audio formats | Price | Free quota (Note) |
paraformer-v2 | Chinese (Mandarin, Cantonese, Wu, Minnan, Northeastern, Gansu, Guizhou, Henan, Hubei, Hunan, Ningxia, Shanxi, Shaanxi, Shandong, Sichuan, Tianjin, Jiangxi, Yunnan, Shanghai), English, Japanese, Korean, German, French, Russian | Any | Live stream | aac, amr, avi, flac, flv, m4a, mkv, mov, mp3, mp4, mpeg, ogg, opus, wav, webm, wma, wmv | $0.000012/second | No free quota |
paraformer-8k-v2 | Chinese (Mandarin) | 8 kHz | Phone calls |
Real-time speech recognition
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Model | Supported languages | Supported sample rates | Scenarios | Supported audio formats | Price | Free quota (Note) |
paraformer-realtime-v2 | Chinese (Mandarin, Cantonese, Wu, Minnan, Northeastern, Gansu, Guizhou, Henan, Hubei, Hunan, Ningxia, Shanxi, Shaanxi, Shandong, Sichuan, Tianjin, Jiangxi, Yunnan, and Shanghai), English, Japanese, Korean, German, French, and Russian You can switch between multiple languages. | Any | Live video streaming and conferences | pcm, wav, mp3, opus, speex, aac, amr | $0.000035/second | No free quota |
paraformer-realtime-8k-v2 | Chinese (Mandarin) | 8 kHz | Call centers and more |
Fun-ASR speech recognition
Fun-ASR is a speech recognition model from the Tongyi Fun series. It is available in two versions: audio file recognition and real-time speech recognition.
Audio file recognition
International
In the international deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled globally, excluding Mainland China.
Model | Version | Supported languages | Supported sample rates | Scenarios | Supported audio formats | Price | Free quota (Note) |
fun-asr Its capabilities are the same as fun-asr-2025-11-07. | Stable | Chinese (Mandarin, Cantonese, Wu, Minnan, Hakka, Gan, Xiang, and Jin), Mandarin accents from regions such as Zhongyuan, Southwest, Ji-Lu, Jianghuai, Lan-Yin, Jiao-Liao, Northeast, Beijing, Hong Kong, and Taiwan (including accents from Henan, Shaanxi, Hubei, Sichuan, Chongqing, Yunnan, Guizhou, Guangdong, Guangxi, Hebei, Tianjin, Shandong, Anhui, Nanjing, Jiangsu, Hangzhou, Gansu, and Ningxia), English, and Japanese | Any | Live stream, phone calls, conference interpretation, and more | aac, amr, avi, flac, flv, m4a, mkv, mov, mp3, mp4, mpeg, ogg, opus, wav, webm, wma, and wmv | $0.000035/second | 36,000 seconds (10 hours) Valid for 90 days |
fun-asr-2025-11-07 Compared to fun-asr-2025-08-25, this version is optimized for far-field VAD to improve recognition accuracy. | Snapshot | ||||||
fun-asr-2025-08-25 | Chinese (Mandarin), English | ||||||
fun-asr-mtl Its capabilities are the same as fun-asr-mtl-2025-08-25. | Stable | Chinese (Mandarin and Cantonese), English, Japanese, Korean, Vietnamese, Indonesian, Thai, Malay, Filipino, Arabic, Hindi, Bulgarian, Croatian, Czech, Danish, Dutch, Estonian, Finnish, Greek, Hungarian, Irish, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovak, Slovenian, and Swedish | |||||
fun-asr-mtl-2025-08-25 | Snapshot |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the China (Beijing) region, and inference computing resources are restricted to Mainland China.
Model | Version | Supported languages | Supported sample rates | Scenarios | Supported audio formats | Price | Free quota (Note) |
fun-asr Same capabilities as fun-asr-2025-11-07. | Stable | Chinese (Mandarin, Cantonese, Wu, Minnan, Hakka, Gan, Xiang, and Jin), Mandarin accents from regions such as Zhongyuan, Southwest, Ji-Lu, Jianghuai, Lan-Yin, Jiao-Liao, Northeast, Beijing, Hong Kong, and Taiwan (including accents from Henan, Shaanxi, Hubei, Sichuan, Chongqing, Yunnan, Guizhou, Guangdong, Guangxi, Hebei, Tianjin, Shandong, Anhui, Nanjing, Jiangsu, Hangzhou, Gansu, and Ningxia), English, and Japanese | Any | Live stream, phone calls, conference interpretation, and more | aac, amr, avi, flac, flv, m4a, mkv, mov, mp3, mp4, mpeg, ogg, opus, wav, webm, wma, and wmv | $0.000032/second | No free quota |
fun-asr-2025-11-07 Compared to fun-asr-2025-08-25, this version is optimized for far-field VAD to improve recognition accuracy. | Snapshot | ||||||
fun-asr-2025-08-25 | Chinese (Mandarin), English | ||||||
fun-asr-mtl Same capabilities as fun-asr-mtl-2025-08-25. | Stable | Chinese (Mandarin and Cantonese), English, Japanese, Korean, Vietnamese, Indonesian, Thai, Malay, Filipino, Arabic, Hindi, Bulgarian, Croatian, Czech, Danish, Dutch, Estonian, Finnish, Greek, Hungarian, Irish, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovak, Slovenian, and Swedish | |||||
fun-asr-mtl-2025-08-25 | Snapshot |
Real-time speech recognition
International
In the international deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled globally, excluding Mainland China.
Model | Version | Supported languages | Supported sample rates | Scenarios | Supported audio formats | Price | Free quota (Note) |
fun-asr-realtime The capabilities of this model are the same as fun-asr-realtime-2025-11-07. | Stable | Chinese (Mandarin, Cantonese, Wu, Minnan, Hakka, Gan, Xiang, and Jin), English, and Japanese. This model also supports Mandarin accents from regions such as Zhongyuan, Southwest, Ji-Lu, Jianghuai, Lan-Yin, Jiao-Liao, Northeast, Beijing, Hong Kong, and Taiwan. Additionally, it supports accents from areas such as Henan, Shaanxi, Hubei, Sichuan, Chongqing, Yunnan, Guizhou, Guangdong, Guangxi, Hebei, Tianjin, Shandong, Anhui, Nanjing, Jiangsu, Hangzhou, Gansu, and Ningxia. | 16 kHz | Live video streaming, video conferencing, call centers, and more | pcm, wav, mp3, opus, speex, aac, and amr | $0.00009/second | 36,000 seconds (10 hours) Valid for 90 days |
fun-asr-realtime-2025-11-07 | Snapshot |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the China (Beijing) region, and inference computing resources are restricted to Mainland China.
Model | Version | Supported languages | Supported sample rates | Scenarios | Supported audio formats | Price | Free quota (Note) |
fun-asr-realtime This model has the same capabilities as fun-asr-realtime-2025-11-07. | Stable | Chinese (Mandarin, Cantonese, Wu, Minnan, Hakka, Gan, Xiang, and Jin), English, and Japanese. This model also supports Mandarin accents from regions such as Zhongyuan, Southwest, Ji-Lu, Jianghuai, Lan-Yin, Jiao-Liao, Northeast, Beijing, Hong Kong, and Taiwan. Additionally, it supports accents from areas such as Henan, Shaanxi, Hubei, Sichuan, Chongqing, Yunnan, Guizhou, Guangdong, Guangxi, Hebei, Tianjin, Shandong, Anhui, Nanjing, Jiangsu, Hangzhou, Gansu, and Ningxia. | 16 kHz | Live video streaming, video conferences, call centers, and more | pcm, wav, mp3, opus, speex, aac, and amr | $0.000047/second | No free quota |
fun-asr-realtime-2025-11-07 Compared to fun-asr-realtime-2025-09-15, this version is optimized for far-field VAD to improve recognition accuracy. | Snapshot | ||||||
fun-asr-realtime-2025-09-15 | Chinese (Mandarin), English |
Text embedding
Text embedding models convert text into numerical representations for tasks such as search, clustering, recommendation, and classification. Billing for these models is based on the number of input tokens. API reference
International
In the international deployment mode, endpoints and data storage are located in the Singapore region. Inference computing resources are scheduled globally, excluding Mainland China.
Model | Embedding dimensions | Batch size | Max tokens per batch (Note) | Supported languages | Price (Million input tokens) | Free quota (Note) |
text-embedding-v4 Part of the Qwen3-Embedding series | 2,048, 1,536, 1,024 (default), 768, 512, 256, 128, or 64 | 10 | 8,192 | More than 100 major languages, such as Chinese, English, Spanish, French, Portuguese, Indonesian, Japanese, Korean, German, and Russian, and various programming languages | $0.07 | 1 million tokens Valid for 90 days after you activate Model Studio. |
text-embedding-v3 | 1,024 (default), 768, or 512 | 10 | 8,192 | Over 50 languages, such as Chinese, English, Spanish, French, Portuguese, Indonesian, Japanese, Korean, German, and Russian | 500,000 tokens Valid for 90 days after you activate Model Studio. |
Mainland China
In the Mainland China deployment mode, endpoints and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Model | Embedding dimensions | Batch size | Max tokens per batch (Note) | Supported languages | Price (Million input tokens) | Free quota |
text-embedding-v4 Part of the Qwen3-Embedding series | 2,048, 1,536, 1,024 (default), 768, 512, 256, 128, or 64 | 10 | 8,192 | More than 100 major languages, such as Chinese, English, Spanish, French, Portuguese, Indonesian, Japanese, Korean, German, and Russian, and various programming languages | $0.072 | No free quota |
Batch size is the max number of texts that a single API call can process. For example, the batch size for text-embedding-v4 is 10. This means a single request can vectorize up to 10 texts, and each text cannot exceed 8,192 tokens. This limit applies to:
String array input: The array can contain up to 10 elements.
File input: The text file can contain up to 10 lines of text.
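A minimal batching sketch under these limits is shown below. embed_batch is a hypothetical stand-in for whichever embedding client you use, not a real SDK call, and enforcing the 8,192-token per-text limit is left to the caller.

def embed_all(texts, embed_batch, batch_size=10):
    # Split the input into chunks of at most 10 texts (the batch-size limit)
    # and issue one embedding call per chunk.
    vectors = []
    for i in range(0, len(texts), batch_size):
        vectors.extend(embed_batch(texts[i:i + batch_size]))
    return vectors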
Multimodal embedding
A multimodal embedding model converts text, images, and videos into a vector of floating-point numbers. The model is suitable for applications such as video classification, image classification, and image-text retrieval. API reference
International
In the international deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are scheduled globally, excluding Mainland China.
Model | Data type | Embedding dimensions | Unit price (Million input tokens) | Free quota (Note) |
tongyi-embedding-vision-plus | float(32) | 1,152 | $0.09 | 1 million tokens Valid for 90 days after you activate Model Studio. |
tongyi-embedding-vision-flash | float(32) | 768 | Image/Video: $0.03 Text: $0.09 |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Model | Data type | Embedding dimensions | Price (1,000 input tokens) | Free quota (Note) |
multimodal-embedding-v1 | float(32) | 1,024 | Free trial | No token limit |
Text rerank
This feature is typically used for semantic retrieval. Given a query, it sorts a list of candidate documents in descending order of their semantic relevance. API reference
International
In the international deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled globally, excluding Mainland China.
Model | Max number of documents | Max input tokens per item | Max input tokens | Supported languages | Price (Million input tokens) |
qwen3-rerank | 500 | 4,000 | 30,000 | Over 100 major languages, such as Chinese, English, Spanish, French, Portuguese, Indonesian, Japanese, Korean, German, and Russian | $0.1 |
Max input tokens per item: Each query or document is limited to 4,000 tokens. Input that exceeds this limit is truncated.
Max number of documents: Each request is limited to 500 documents.
Max input tokens: The total number of tokens for all queries and documents in a single request is limited to 30,000.
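These limits can be checked client-side before a request is sent. The sketch below assumes you already have per-item token counts from your own tokenizer; the server's token counting may differ slightly.

MAX_DOCUMENTS = 500
MAX_TOKENS_PER_ITEM = 4_000
MAX_TOTAL_TOKENS = 30_000

def validate_rerank_request(query_tokens, doc_token_counts):
    if len(doc_token_counts) > MAX_DOCUMENTS:
        raise ValueError(f"{len(doc_token_counts)} documents exceeds the {MAX_DOCUMENTS}-document limit")
    for i, count in enumerate([query_tokens, *doc_token_counts]):
        if count > MAX_TOKENS_PER_ITEM:
            # Over-long items are truncated server-side; fail early instead.
            raise ValueError(f"item {i} has {count} tokens, over the {MAX_TOKENS_PER_ITEM} limit")
    if query_tokens + sum(doc_token_counts) > MAX_TOTAL_TOKENS:
        raise ValueError(f"request exceeds the {MAX_TOTAL_TOKENS}-token total limit")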
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Model | Max number of documents | Max input tokens per item | Max input tokens | Supported languages | Price (Million input tokens) |
gte-rerank-v2 | 500 | 4,000 | 30,000 | More than 50 languages, such as Chinese, English, Japanese, Korean, Thai, Spanish, French, Portuguese, German, Indonesian, and Arabic | $0.115 |
Max input tokens per item: Each query or document is limited to 4,000 tokens. Input that exceeds this limit is truncated.
Max number of documents: Each request is limited to 500 documents.
Max input tokens: The total number of tokens for all queries and documents in a single request is limited to 30,000.
Domain specific
Intent recognition
The Qwen intent recognition model parses user intent in milliseconds and selects the appropriate tools to resolve user issues. API reference | Usage
Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.
Model | Context window (tokens) | Max input (tokens) | Max output (tokens) | Input cost (per million tokens) | Output cost (per million tokens) |
tongyi-intent-detect-v3 | 8,192 | 8,192 | 1,024 | $0.058 | $0.144 |
Role playing
Qwen's role-playing model is ideal for scenarios that require human-like conversation, such as virtual social interaction, game NPCs, IP character replication, and embedded devices such as toys and in-vehicle systems. Compared to other Qwen models, it offers stronger character fidelity, conversation progression, and empathetic listening. Usage
International
In the international deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled globally, excluding Mainland China.
Model | Context window (tokens) | Max input (tokens) | Max output (tokens) | Input cost (per million tokens) | Output cost (per million tokens) |
qwen-plus-character-ja | 8,192 | 7,680 | 512 | $0.5 | $1.4 |
Mainland China
In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.
Model | Context window (tokens) | Max input (tokens) | Max output (tokens) | Input cost (per million tokens) | Output cost (per million tokens) |
qwen-plus-character | 32,768 | 32,000 | 4,096 | $0.115 | $0.287 |