Flagship models
International
In the International deployment mode, endpoints and data storage are both located in the Singapore region. Model inference compute resources are dynamically scheduled globally (excluding the Chinese mainland).
Qwen3.5-Plus supports text, image, and video inputs. Its text performance is comparable to Qwen3-Max, but faster and more cost-effective. Its multimodal capabilities also significantly outperform the Qwen3-VL series.
Flagship models | Best for complex tasks, most capable | Balanced performance, speed, and cost | Best for simple tasks, fast and cost-effective |
Max context window (tokens) | 262,144 | 1,000,000 | 1,000,000 |
Min input price (per 1M tokens) | $1.2 | $0.4 | $0.1 |
Min output price (per 1M tokens) | $6 | $2.4 | $0.4 |
Global
In the Global deployment mode, endpoints and data storage are located in the US (Virginia) or Germany (Frankfurt) region, and model inference compute resources are dynamically scheduled globally.
Qwen3.5-Plus supports text, image, and video inputs. Its text performance is comparable to Qwen3-Max, but faster and more cost-effective. Its multimodal capabilities also significantly outperform the Qwen3-VL series.
Flagship models | Best for complex tasks, most capable | Balanced performance, speed, and cost | Best for simple tasks, fast and cost-effective |
Max context window (tokens) | 262,144 | 1,000,000 | 1,000,000 |
Min input price (per 1M tokens) | $0.359 | $0.115 | $0.029 |
Min output price (per 1M tokens) | $1.434 | $0.688 | $0.287 |
US
In the US deployment mode, endpoints and data storage are both located in the US (Virginia) region. Model inference compute resources are limited to the United States.
Flagship models | Balanced performance, speed, and cost | Best for simple tasks, fast and cost-effective |
Max context window (tokens) | 1,000,000 | 1,000,000 |
Min input price (per 1M tokens) | $0.4 | $0.05 |
Min output price (per 1M tokens) | $1.2 | $0.4 |
Chinese mainland
In the Chinese mainland deployment mode, endpoints and data storage are both located in the Beijing region. Model inference compute resources are limited to the Chinese mainland.
Qwen3.5-Plus supports text, image, and video inputs. Its text performance is comparable to Qwen3-Max, but faster and more cost-effective. Its multimodal capabilities also significantly outperform the Qwen3-VL series.
Flagship models | Best for complex tasks, most capable | Balanced performance, speed, and cost | Best for simple tasks, fast and cost-effective |
Max context window (tokens) | 262,144 | 1,000,000 | 1,000,000 |
Min input price (per 1M tokens) | $0.359 | $0.115 | $0.029 |
Min output price (per 1M tokens) | $1.434 | $0.688 | $0.287 |
China (Hong Kong)
In the China (Hong Kong) deployment mode, endpoints and data storage are both located in China (Hong Kong). Model inference compute resources are limited to China (Hong Kong).
Flagship models | Best for complex tasks, most capable | Balanced performance, speed, and cost | Best for simple tasks, fast and cost-effective |
Max context window (tokens) | 262,144 | 1,000,000 | 1,000,000 |
Min input price (per 1M tokens) | $1.2 | $0.4 | $0.1 |
Min output price (per 1M tokens) | $6 | $1.2 | $0.4 |
EU
In the EU deployment mode, endpoints and data storage are both located in Germany (Frankfurt). Model inference compute resources are limited to the EU.
Flagship models | Best for complex tasks, most capable | Balanced performance, speed, and cost | Best for simple tasks, fast and cost-effective |
Max context window (tokens) | 262,144 | 1,000,000 | 1,000,000 |
Min input price (per 1M tokens) | $1.2 | $0.4 | $0.1 |
Min output price (per 1M tokens) | $6 | $2.4 | $0.4 |
Model overview
International
In the International deployment mode, endpoints and data storage are both located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding the Chinese mainland).
Category | Subcategory | Description |
Text generation | Qwen large language model: | |
Visual understanding models (Qwen-Plus, Qwen-VL, QVQ), omni-modal model Qwen-Omni, and real-time multimodal model Qwen-Omni-Realtime | ||
Image generation | | |
Speech synthesis and recognition | Qwen speech synthesis and Qwen real-time speech synthesis can convert text to speech. They are suitable for scenarios such as intelligent voice assistants, audiobooks, in-car navigation, and educational tutoring. | |
Qwen real-time speech recognition, Qwen audio file recognition, Qwen3-LiveTranslate-Flash-Realtime, and Fun-ASR speech recognition can convert speech to text. They are suitable for scenarios such as real-time meeting transcription, live streaming captions, and call center services. | ||
Video generation | Generates videos from a single sentence, offering a wide range of styles and high-quality visuals. | |
Reference-to-video: Generates a performance video based on a prompt by referencing the character's appearance from an input video or image, and can also reference the timbre from the video. | ||
General-purpose video editing: Performs various video editing tasks based on input text prompts, images, and videos. For example, it can extract motion features from an input video and generate a new video based on a prompt. | ||
Embedding | Converts text into a set of numbers that represent the text. It is suitable for search, clustering, recommendation, and classification tasks. |
Global
In the Global deployment mode, endpoints and data storage are located in the US (Virginia) or Germany (Frankfurt) region, and model inference compute resources are dynamically scheduled globally.
Category | Subcategory | Description |
Text generation | Qwen large language model: | |
Visual understanding model Qwen-VL | ||
Image generation | | |
Video generation | Generates videos from a single sentence, offering a wide range of styles and high-quality visuals. | |
First-frame-to-video: Uses an input image as the first frame and generates a video based on a prompt. | ||
Reference-to-video: Generates a performance video based on a prompt by referencing the character's appearance from an input video or image, and can also reference the timbre from the video. |
US
In the US deployment mode, endpoints and data storage are both located in the US (Virginia) region. Model inference compute resources are restricted to the United States.
Category | Subcategory | Description |
Text generation | Qwen large language model: Commercial (Qwen-Plus, Qwen-Flash) | |
Visual understanding model Qwen-VL | ||
Video generation | Generates videos from a single sentence, offering a wide range of styles and high-quality visuals. | |
First-frame-to-video: Uses an input image as the first frame and generates a video based on a prompt. | ||
Speech recognition | Qwen audio file recognition can convert speech to text. It is suitable for scenarios such as meeting transcription and live streaming captions. |
Chinese mainland
In the Chinese mainland deployment mode, endpoints and data storage are both located in the Beijing region. Model inference compute resources are restricted to the Chinese mainland.
Category | Model | Description |
Text generation | ||
Visual understanding models (Qwen-Plus, Qwen-VL, QVQ), omni-modal model Qwen-Omni | ||
Coder models, math model, translation models, data mining model, in-depth research model, intention recognition model, role-playing models | ||
Image generation | General-purpose models: | |
More models: Qwen image translation, OutfitAnyone | ||
Speech synthesis and recognition | Qwen speech synthesis, Qwen real-time speech synthesis, and CosyVoice speech synthesis can convert text to speech. They are suitable for scenarios such as intelligent voice assistants, audiobooks, in-car navigation, and educational tutoring. | |
Qwen real-time speech recognition, Qwen audio file recognition, Fun-ASR speech recognition, and Paraformer speech recognition can convert speech to text. They are suitable for scenarios such as real-time meeting transcription, live streaming captions, and call center services. | ||
Video editing and generation | Generates videos from a single sentence, offering a wide range of styles and high-quality visuals. | |
Reference-to-video: Generates a performance video based on a prompt by referencing the character's appearance from an input video or image, and can also reference the timbre from the video. | ||
Embedding | Converts text into a set of numbers that represent the text. It is used for search, clustering, recommendation, and classification. | |
Converts text, images, and speech into a set of numbers. It is used for audio and video classification, image classification, and image-text retrieval. |
China (Hong Kong)
In the China (Hong Kong) deployment mode, endpoints and data storage are both located in China (Hong Kong). Model inference compute resources are restricted to China (Hong Kong).
Category | Subcategory | Description |
Text generation | Qwen large language model: Commercial (Qwen-Plus, Qwen-Flash) | |
Visual understanding model Qwen-VL | ||
Embedding | Converts text into a set of numbers that represent the text. It is suitable for search, clustering, recommendation, and classification tasks. |
EU
In the EU deployment mode, endpoints and data storage are both located in Germany (Frankfurt). Model inference compute resources are restricted to the European Union.
Category | Subcategory | Description |
Text generation | Qwen large language model: Commercial (Qwen-Plus, Qwen-Flash) | |
Visual understanding model Qwen-VL | ||
Text generation – Qwen
This is the commercial version of the Qwen model. Compared with the open-source version, it offers the latest capabilities and improvements.
The parameter count for commercial models is not disclosed.
Models are updated periodically. To use a fixed version, select a snapshot version. Snapshot versions are typically maintained until one month after the next snapshot version is released.
We recommend the stable or latest version because their rate limits are less restrictive.
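The version policy above can be sketched as a small helper. The model-name strings are examples taken from the tables on this page; availability varies by deployment mode, so check the console for the snapshots you can actually call.

```python
# Stable aliases track the newest snapshot automatically; dated names pin
# a fixed snapshot, which is typically kept for about one month after its
# successor ships. Names below are examples from this page.
STABLE = "qwen-plus"               # auto-updates, less restrictive rate limits
SNAPSHOT = "qwen-plus-2025-12-01"  # frozen behavior, useful for reproducibility

def pick_model(pin_version: bool) -> str:
    """Prefer the stable alias unless reproducibility requires a pin."""
    return SNAPSHOT if pin_version else STABLE
```

Pinning trades away automatic improvements for stable behavior; plan to migrate before a pinned snapshot is retired.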
Qwen-Max
Qwen-Max is the highest-performing model in the Qwen series and excels at complex, multi-step tasks. Usage | Thinking | API reference | Try online
International
In the International deployment mode, endpoints and data storage are both located in the Singapore region. Model inference compute resources are dynamically scheduled globally (excluding the Chinese mainland).
Model | Version | Mode | Context window | Max input | Max CoT | Max output | Input cost | Output cost | Free quota |
(tokens) | (per 1M tokens) | ||||||||
qwen3-max Currently qwen3-max-2026-01-23 Part of the Qwen3 series Supports tool calling | Stable | Thinking | 262,144 | 258,048 | 81,920 | 32,768 | Tiered pricing. See details below. | 1 million tokens each Valid for 90 days after activating Model Studio | |
Non-thinking | - | 65,536 | |||||||
qwen3-max-2026-01-23 Thinking mode also known as Qwen3-Max-Thinking Part of the Qwen3 series Supports tool calling | Snapshot | Thinking | 81,920 | 32,768 |
Non-thinking | - | 65,536 | |||||||
qwen3-max-2025-09-23 Part of the Qwen3 series | Snapshot | Non-thinking only | |||||||
qwen3-max-preview Part of the Qwen3 series | Preview | Thinking | 81,920 | 32,768 | |||||
Non-thinking | - | 65,536 | |||||||
The models above use tiered pricing based on the number of input tokens in your request.
Input tokens per request | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
0 < Tokens ≤ 32K | $1.2 | $6 |
32K < Tokens ≤ 128K | $2.4 | $12 |
128K < Tokens ≤ 252K | $3 | $15 |
qwen3-max and qwen3-max-preview support context cache.
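To make the tier rule concrete, here is an illustrative calculator for the International qwen3-max tiers above. It is a sketch, not an official billing formula: it assumes K = 1,024 (consistent with the 252K tier cap matching the 258,048-token maximum input) and that the whole request is billed at the tier selected by its input-token count.

```python
def qwen3_max_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate one request's cost from the tiered table above (sketch)."""
    # (tier ceiling in tokens, input $/1M tokens, output $/1M tokens)
    tiers = [
        (32 * 1024, 1.2, 6.0),
        (128 * 1024, 2.4, 12.0),
        (252 * 1024, 3.0, 15.0),  # 252 * 1024 = 258,048, the max input
    ]
    for ceiling, in_rate, out_rate in tiers:
        if input_tokens <= ceiling:
            return (input_tokens * in_rate + output_tokens * out_rate) / 1e6
    raise ValueError("input exceeds the 258,048-token maximum")

# A 10K-token prompt with a 2K-token reply lands in the first tier:
# 10,000 * $1.2/1M + 2,000 * $6/1M = $0.024
```

The same shape applies to the other tiered tables on this page; only the boundaries and rates differ.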
Global
In the Global deployment mode, endpoints and data storage are located in the US (Virginia) or Germany (Frankfurt) region, and model inference compute resources are dynamically scheduled globally.
Model | Version | Mode | Context window | Max input | Max CoT | Max output | Input cost | Output cost | Free quota |
(tokens) | (per 1M tokens) | ||||||||
qwen3-max Currently qwen3-max-2025-09-23 Context Cache discount available | Stable | Non-thinking only | 262,144 | 258,048 | - | 65,536 | Tiered pricing. See details below. | None | |
qwen3-max-2025-09-23 | Snapshot | Non-thinking only | |||||||
qwen3-max-preview Context Cache discount available | Preview | Thinking | 81,920 | 32,768 | |||||
Non-thinking | - | 65,536 | |||||||
The models above use tiered pricing based on the number of input tokens in your request.
Model | Input tokens per request | Input cost (per 1M tokens) | Output cost (per 1M tokens) CoT + response |
qwen3-max Context Cache discount available | 0 < Tokens ≤32K | $0.359 | $1.434 |
32K < Tokens ≤128K | $0.574 | $2.294 | |
128K < Tokens ≤252K | $1.004 | $4.014 | |
qwen3-max-2025-09-23 | 0 < Tokens ≤32K | $0.861 | $3.441 |
32K < Tokens ≤128K | $1.434 | $5.735 | |
128K < Tokens ≤252K | $2.151 | $8.602 | |
qwen3-max-preview Context Cache discount available | 0 < Tokens ≤32K | $0.861 | $3.441 |
32K < Tokens ≤128K | $1.434 | $5.735 | |
128K < Tokens ≤252K | $2.151 | $8.602 |
Chinese mainland
In the Chinese mainland deployment mode, endpoints and data storage are both located in the Beijing region. Model inference compute resources are limited to the Chinese mainland.
Model | Version | Mode | Context window | Max input | Max CoT | Max output | Input cost | Output cost CoT + response |
(tokens) | (per 1M tokens) | |||||||
qwen3-max Currently qwen3-max-2026-01-23 Part of the Qwen3 series Supports tool calling | Stable | Thinking | 262,144 | 258,048 | 81,920 | 32,768 | Tiered pricing. See details below. | |
Non-thinking | - | 65,536 | ||||||
qwen3-max-2026-01-23 Thinking mode also known as Qwen3-Max-Thinking Part of the Qwen3 series Supports tool calling | Snapshot | Thinking | 81,920 | 32,768 |
Non-thinking | - | 65,536 | ||||||
qwen3-max-2025-09-23 Part of the Qwen3 series | Snapshot | Non-thinking only | ||||||
qwen3-max-preview Part of the Qwen3 series | Preview | Thinking | 81,920 | 32,768 | ||||
Non-thinking | - | 65,536 | ||||||
The models above use tiered pricing based on the number of input tokens in your request.
Model | Input tokens per request | Input cost (per 1M tokens) | Output cost (per 1M tokens) CoT + response |
qwen3-max Batch calls at half price Context Cache discount available | 0 < Tokens ≤ 32K | $0.359 | $1.434 |
32K < Tokens ≤ 128K | $0.574 | $2.294 | |
128K < Tokens ≤ 252K | $1.004 | $4.014 | |
qwen3-max-2026-01-23 | 0 < Tokens ≤ 32K | $0.359 | $1.434 |
32K < Tokens ≤ 128K | $0.574 | $2.294 | |
128K < Tokens ≤ 252K | $1.004 | $4.014 | |
qwen3-max-2025-09-23 | 0 < Tokens ≤ 32K | $0.861 | $3.441 |
32K < Tokens ≤ 128K | $1.434 | $5.735 | |
128K < Tokens ≤ 252K | $2.151 | $8.602 | |
qwen3-max-preview Context Cache discount available | 0 < Tokens ≤ 32K | $0.861 | $3.441 |
32K < Tokens ≤ 128K | $1.434 | $5.735 | |
128K < Tokens ≤ 252K | $2.151 | $8.602 |
China (Hong Kong)
In the China (Hong Kong) deployment mode, endpoints and data storage are both located in China (Hong Kong). Model inference compute resources are limited to China (Hong Kong).
Model | Version | Mode | Context window | Max input | Max CoT | Max output | Input cost | Output cost |
(tokens) | (per 1M tokens) | |||||||
qwen3-max Currently qwen3-max-2026-01-23 Part of the Qwen3 series Supports tool calling | Stable | Thinking | 262,144 | 258,048 | 81,920 | 32,768 | Tiered pricing. See details below. | |
Non-thinking | - | 65,536 | ||||||
qwen3-max-2026-01-23 Thinking mode also known as Qwen3-Max-Thinking Part of the Qwen3 series Supports tool calling | Snapshot | Thinking | 81,920 | 32,768 |
Non-thinking | - | 65,536 | ||||||
The models above use tiered pricing based on the number of input tokens in your request.
Input tokens per request | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
0 < Tokens ≤ 32K | $1.2 | $6 |
32K < Tokens ≤ 128K | $2.4 | $12 |
128K < Tokens ≤ 252K | $3 | $15 |
qwen3-max supports context cache.
EU
In the EU deployment mode, endpoints and data storage are both located in Germany (Frankfurt). Model inference compute resources are limited to the EU.
Model | Version | Mode | Context window | Max input | Max CoT | Max output | Input cost | Output cost |
(tokens) | (per 1M tokens) | |||||||
qwen3-max Currently qwen3-max-2026-01-23 Part of the Qwen3 series Supports tool calling | Stable | Thinking | 262,144 | 258,048 | 81,920 | 32,768 | Tiered pricing. See details below. | |
Non-thinking | - | 65,536 | ||||||
qwen3-max-2026-01-23 Thinking mode also known as Qwen3-Max-Thinking Part of the Qwen3 series Supports tool calling | Snapshot | Thinking | 81,920 | 32,768 |
Non-thinking | - | 65,536 | ||||||
The models above use tiered pricing based on the number of input tokens in your request.
Input tokens per request | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
0 < Tokens ≤ 32K | $1.2 | $6 |
32K < Tokens ≤ 128K | $2.4 | $12 |
128K < Tokens ≤ 252K | $3 | $15 |
qwen3-max supports context cache.
qwen3-max-2026-01-23 thinking mode: Compared with the September 23, 2025 snapshot, this version effectively combines thinking and non-thinking modes, significantly improving overall model performance. In thinking mode, the model integrates three tools—web search, web information extraction, and code interpreter—to achieve higher accuracy on complex tasks by incorporating external tools during reasoning.
qwen3-max, qwen3-max-2026-01-23, and qwen3-max-2025-09-23 natively support search agents. For details, see web search.
Qwen-Plus
Qwen-Plus offers balanced capabilities: inference quality, cost, and speed are between Qwen-Max and Qwen-Flash, making it ideal for medium-complexity tasks. Usage | Thinking | API reference | Try online
Qwen3.5-Plus supports text, image, and video inputs. Its text performance is comparable to Qwen3-Max while offering superior efficiency at a lower cost. Its multimodal capabilities are a significant improvement over the Qwen3-VL series.
International
In the International deployment mode, endpoints and data storage are both located in the Singapore region, and model inference compute resources are dynamically scheduled globally (excluding the Chinese mainland).
Model | Version | Context window | Max input | Max output | Input cost | Output cost | Free quota |
(tokens) | (per 1M tokens) | ||||||
qwen3.5-plus Currently qwen3.5-plus-2026-02-15 Thinking mode enabled by default | Stable | 1,000,000 | Thinking 983,616 Non-thinking 991,808 | 65,536 Maximum chain-of-thought length: 81,920 | Tiered pricing. See details below. | 1,000,000 tokens each Valid for 90 days after activating Model Studio |
qwen3.5-plus-2026-02-15 Thinking mode enabled by default | Snapshot | Thinking 983,616 Non-thinking 991,808 | 65,536 Maximum chain-of-thought length: 81,920 |
qwen-plus Currently qwen-plus-2025-12-01 Part of the Qwen3 series Batch calls at half price | Stable | Thinking 995,904 Non-thinking 997,952 | 32,768 Maximum chain-of-thought length: 81,920 |
qwen-plus-latest Currently qwen-plus-2025-12-01 Part of the Qwen3 series | Latest | Thinking 995,904 Non-thinking 997,952 | |||||
qwen-plus-2025-12-01 Part of the Qwen3 series | Snapshot | Thinking 995,904 Non-thinking 997,952 | |||||
qwen-plus-2025-09-11 Part of the Qwen3 series | |||||||
qwen-plus-2025-07-28 Also known as qwen-plus-0728 Part of the Qwen3 series | |||||||
qwen-plus-2025-07-14 Also known as qwen-plus-0714 Part of the Qwen3 series | 131,072 | Thinking 98,304 Non-thinking 129,024 | 16,384 Max CoT 38,912 | $0.4 | Thinking $4 Non-thinking $1.2 | ||
qwen-plus-2025-04-28 Also known as qwen-plus-0428 Part of the Qwen3 series | |||||||
qwen-plus-2025-01-25 Also known as qwen-plus-0125 | 129,024 | 8,192 | $1.2 | ||||
qwen3.5-plus, qwen3.5-plus-2026-02-15, qwen-plus, qwen-plus-latest, qwen-plus-2025-12-01, qwen-plus-2025-09-11, and qwen-plus-2025-07-28 are subject to tiered billing based on the number of input tokens per request.
Qwen3.5-Plus
Input tokens per request | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
0 < Tokens ≤256K | $0.4 | $2.4 |
256K < Tokens ≤1M | $0.5 | $3 |
Qwen-Plus
Input tokens per request | Mode | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
0 < Tokens ≤256K | Non-thinking | $0.4 | $1.2 |
Thinking | $4 | ||
256K < Tokens ≤1M | Non-thinking | $1.2 | $3.6 |
Thinking | $12 |
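The mode-dependent rule in the Qwen-Plus table above can be sketched as a lookup. This is illustrative only: the tier is chosen by the request's input-token count (assuming K = 1,024), and only the output rate depends on whether thinking mode is active.

```python
def qwen_plus_rates_usd(input_tokens: int, thinking: bool) -> tuple:
    """Return (input, output) price per 1M tokens for one request (sketch,
    using the International Qwen-Plus tiers above)."""
    low_tier = input_tokens <= 256 * 1024  # 0 < tokens <= 256K
    if low_tier:
        return (0.4, 4.0) if thinking else (0.4, 1.2)
    return (1.2, 12.0) if thinking else (1.2, 3.6)
```

Note the asymmetry: the input rate depends only on the tier, while thinking mode roughly triples the output rate within each tier.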
Global
In the Global deployment mode, endpoints and data storage are located in the US (Virginia) or Germany (Frankfurt) region, and model inference compute resources are dynamically scheduled globally.
Model | Version | Mode | Context window | Max input | Max CoT | Max output | Input cost | Output cost CoT + output | Free quota |
(tokens) | (per 1M tokens) | ||||||||
qwen3.5-plus Currently qwen3.5-plus-2026-02-15 Thinking mode enabled by default | Stable | Thinking | 1,000,000 | 983,616 | 81,920 | 65,536 | Tiered pricing. See details below. | None | |
991,808 | - | ||||||||
qwen3.5-plus-2026-02-15 Thinking mode enabled by default | Snapshot | Non-thinking | 983,616 | 81,920 | |||||
991,808 | - | ||||||||
qwen-plus Currently qwen-plus-2025-12-01 Part of the Qwen3 series | Stable | Thinking | 995,904 | 81,920 | 32,768 | ||||
Non-thinking | 997,952 | - | |||||||
qwen-plus-2025-12-01 Part of the Qwen3 series | Snapshot | Thinking | 995,904 | 81,920 | |||||
Non-thinking | 997,952 | - | |||||||
qwen-plus-2025-09-11 Part of the Qwen3 series | Thinking | 995,904 | 81,920 | ||||||
Non-thinking | 997,952 | - | |||||||
qwen-plus-2025-07-28 Also known as qwen-plus-0728 Part of the Qwen3 series | Thinking | 995,904 | 81,920 | ||||||
Non-thinking | 997,952 | - | |||||||
The models above use tiered billing based on the number of input tokens in each request.
Qwen3.5-Plus
Input tokens per request | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
0 < Tokens ≤128K | $0.115 | $0.688 |
128K < Tokens ≤ 256K | $0.287 | $1.72 |
256K < Tokens ≤1M | $0.573 | $3.44 |
Qwen-Plus
Input tokens per request | Mode | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
0 < Tokens ≤128K | Non-thinking | $0.115 | $0.287 |
Thinking | $1.147 | ||
128K < Tokens ≤256K | Non-thinking | $0.345 | $2.868 |
Thinking | $3.441 | ||
256K < Tokens ≤1M | Non-thinking | $0.689 | $6.881 |
Thinking | $9.175 |
US
In the US deployment mode, endpoints and data storage are both located in the US (Virginia) region, and model inference compute resources are restricted to the United States.
Model | Version | Context window | Max input | Max output | Input cost | Output cost | Free quota |
(tokens) | (per 1M tokens) | ||||||
qwen-plus-us Currently qwen-plus-2025-12-01-us Part of the Qwen3 series | Stable | 1,000,000 | Thinking 995,904 Non-thinking 997,952 | 32,768 Maximum chain-of-thought length: 81,920 | Tiered pricing. See details below. | None |
qwen-plus-2025-12-01-us Part of the Qwen3 series | Snapshot | Thinking 995,904 Non-thinking 997,952 | |||||
The models above use tiered billing based on the number of input tokens in each request; qwen-plus-us supports context cache.
Input tokens per request | Input cost (per 1M tokens) | Mode | Output cost (per 1M tokens) |
0 < Tokens ≤256K | $0.4 | Non-thinking | $1.2 |
Thinking | $4 | ||
256K < Tokens ≤1M | $1.2 | Non-thinking | $3.6 |
Thinking | $12 |
Chinese mainland
In the Chinese mainland deployment mode, endpoints and data storage are both located in the Beijing region. Model inference compute resources are limited to the Chinese mainland.
Model | Version | Context window | Max input | Max output | Input cost | Output cost |
(tokens) | (per 1M tokens) | |||||
qwen3.5-plus Currently qwen3.5-plus-2026-02-15 Thinking mode enabled by default Batch calls at half price | Stable | 1,000,000 | Thinking 983,616 Non-thinking 991,808 | 65,536 Max chain-of-thought length: 81,920 | Tiered pricing. See details below. |
qwen3.5-plus-2026-02-15 Thinking mode enabled by default | Snapshot | Thinking 983,616 Non-thinking 991,808 | 65,536 Max chain-of-thought length: 81,920 |
qwen-plus Currently qwen-plus-2025-12-01 Part of the Qwen3 series Batch calls at half price | Stable | Thinking 995,904 Non-thinking 997,952 | 32,768 Max chain-of-thought length: 81,920 | |||
qwen-plus-latest Currently qwen-plus-2025-12-01 Part of the Qwen3 series Batch calls at half price | Latest | Thinking 995,904 Non-thinking 997,952 | ||||
qwen-plus-2025-12-01 Part of the Qwen3 series | Snapshot | Thinking 995,904 Non-thinking 997,952 | ||||
qwen-plus-2025-09-11 Part of the Qwen3 series | ||||||
qwen-plus-2025-07-28 Also known as qwen-plus-0728 Part of the Qwen3 series | ||||||
qwen-plus-2025-07-14 Also known as qwen-plus-0714 Part of the Qwen3 series | 131,072 | Thinking 98,304 Non-thinking 129,024 | 16,384 Max chain-of-thought length: 38,912 | $0.115 | Thinking $1.147 Non-thinking $0.287 | |
qwen-plus-2025-04-28 Also known as qwen-plus-0428 Part of the Qwen3 series | ||||||
qwen3.5-plus, qwen3.5-plus-2026-02-15, qwen-plus, qwen-plus-latest, qwen-plus-2025-12-01, qwen-plus-2025-09-11, and qwen-plus-2025-07-28 use tiered billing based on the number of input tokens per request.
Qwen3.5-Plus
Input tokens per request | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
0 < Tokens ≤ 128 K | $0.115 | $0.688 |
128 K < Tokens ≤ 256 K | $0.287 | $1.72 |
256 K < Tokens ≤ 1 M | $0.573 | $3.44 |
Qwen-Plus
Input tokens per request | Mode | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
0 < Tokens ≤ 128 K | Non-thinking | $0.115 | $0.287 |
Thinking | $1.147 | ||
128 K < Tokens ≤ 256 K | Non-thinking | $0.345 | $2.868 |
Thinking | $3.441 | ||
256 K < Tokens ≤ 1 M | Non-thinking | $0.689 | $6.881 |
Thinking | $9.175 |
These models support thinking mode and non-thinking mode. Switch between the modes using the enable_thinking parameter. For these models, if you enable thinking mode but the model does not output a thought process, you are billed at the non-thinking mode rate.
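The mode switch described above can be sketched as a chat request body for the OpenAI-compatible endpoint. The model name and message are placeholders, and the field placement is an assumption drawn from the parameter name in this note (some SDKs pass it through an "extra body" escape hatch); consult the API reference for the authoritative shape.

```python
import json

# Hypothetical request body: enable_thinking is the switch described above.
# Setting it to False selects non-thinking mode (and the non-thinking rate).
payload = {
    "model": "qwen-plus",
    "messages": [{"role": "user", "content": "Summarize tiered pricing."}],
    "enable_thinking": True,
}
print(json.dumps(payload, indent=2))
```

Remember the billing note above: even with thinking enabled, a response that contains no thought process is billed at the non-thinking rate.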
China (Hong Kong)
In the China (Hong Kong) deployment mode, both the endpoint and data storage are located in China (Hong Kong). Model inference computing resources are also limited to China (Hong Kong).
Model | Version | Context window | Max input | Max output | Input cost | Output cost |
(tokens) | (per 1M tokens) | |||||
qwen-plus Currently qwen-plus-2025-12-01 Part of the Qwen3 series | Stable | 1,000,000 | Thinking 995,904 Non-thinking 997,952 | 32,768 Maximum chain-of-thought: 81,920 | Tiered pricing. See details below. | |
qwen-plus-2025-12-01 Part of the Qwen3 series | Snapshot | |||||
These models use tiered billing based on the number of input tokens per request.
Input tokens per request | Mode | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
0 < Tokens ≤256K | Non-thinking | $0.4 | $1.2 |
Thinking | $4 | ||
256K < Tokens ≤1M | Non-thinking | $1.2 | $3.6 |
Thinking | $12 |
EU
In the EU deployment mode, endpoints and data storage are located in Germany (Frankfurt), and model inference compute resources are limited to the European Union.
Model | Version | Context window | Max input | Max output | Input cost | Output cost |
(tokens) | (per 1M tokens) | |||||
qwen-plus Currently qwen-plus-2025-12-01 Part of the Qwen3 series | Stable | 1,000,000 | Thinking 995,904 Non-thinking 997,952 | 32,768 Max CoT 81,920 | Tiered pricing. See details below. |
qwen-plus-2025-12-01 Part of the Qwen3 series | Snapshot |
The above models use tiered billing based on the number of input tokens for each request.
Input tokens per request | Mode | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
0 < Tokens ≤ 256K | Non-thinking | $0.4 | $1.2 |
Thinking | $4 | ||
256K < Tokens ≤ 1M | Non-thinking | $1.2 | $3.6 |
Thinking | $12 |
Qwen-Flash
Qwen-Flash is the fastest and most cost-effective model in the Qwen series, designed for simple tasks. Its flexible tiered pricing makes billing more economical than Qwen-Turbo. Usage | API reference | Try online | Thinking
International
In the International deployment mode, endpoints and data storage are both located in the Singapore region, and model inference compute resources are dynamically scheduled globally (excluding the Chinese mainland).
Model | Version | Mode | Context window | Max input | Max CoT | Max output | Input cost | Output cost CoT + output | Free quota |
(tokens) | (per 1M tokens) | ||||||||
qwen3.5-flash Currently qwen3.5-flash-2026-02-23 Thinking mode enabled by default | Stable | Thinking | 1,000,000 | 983,616 | 81,920 | 65,536 | $0.1 | $0.4 | 1 million tokens each Valid for 90 days after activating Model Studio |
Non-thinking | 991,808 | - | |||||||
qwen3.5-flash-2026-02-23 Thinking mode enabled by default | Snapshot | Thinking | 983,616 | 81,920 | |||||
Non-thinking | 991,808 | - | |||||||
qwen-flash Currently qwen-flash-2025-07-28 Part of the Qwen3 series Batch calls at half price | Stable | Thinking | 995,904 | 81,920 | 32,768 | Tiered pricing. See details below. |
Non-thinking | 997,952 | - | |||||||
qwen-flash-2025-07-28 Part of the Qwen3 series | Snapshot | Thinking | 995,904 | 81,920 |
Non-thinking | 997,952 | - | |||||||
Tiered pricing for qwen-flash and qwen-flash-2025-07-28
Input tokens per request | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
0 < Tokens ≤256K | $0.05 | $0.4 |
256K < Tokens ≤1M | $0.25 | $2 |
Global
In the Global deployment mode, endpoints and data storage are located in the US (Virginia) or Germany (Frankfurt) region, and model inference compute resources are dynamically scheduled globally.
Model | Version | Mode | Context window | Max input | Max CoT | Max output | Input cost | Output cost CoT + output |
(tokens) | (per 1M tokens) | |||||||
qwen3.5-flash Currently qwen3.5-flash-2026-02-23 Thinking mode enabled by default | Stable | Thinking | 1,000,000 | 983,616 | 81,920 | 65,536 | Tiered pricing. See details below. |
Non-thinking | 991,808 | - | ||||||
qwen3.5-flash-2026-02-23 Thinking mode enabled by default. | Snapshot | Thinking | 983,616 | 81,920 | ||||
Non-thinking | 991,808 | - | ||||||
qwen-flash Currently qwen-flash-2025-07-28 Part of the Qwen3 series | Stable | Thinking | 995,904 | 81,920 | 32,768 |
Non-thinking | 997,952 | - | ||||||
qwen-flash-2025-07-28 Part of the Qwen3 series | Snapshot | Thinking | 995,904 | 81,920 |
Non-thinking | 997,952 | - | ||||||
The models above use tiered billing based on the number of input tokens in each request; qwen-flash supports context cache.
Tiered pricing for qwen3.5-flash and qwen3.5-flash-2026-02-23
Input tokens per request | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
0 < Tokens ≤128K | $0.029 | $0.287 |
128K < Tokens ≤256K | $0.115 | $1.147 |
256K < Tokens ≤1M | $0.172 | $1.72 |
Tiered pricing for qwen-flash and qwen-flash-2025-07-28
Input tokens per request | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
0 < Tokens ≤128K | $0.022 | $0.216 |
128K < Tokens ≤256K | $0.087 | $0.861 |
256K < Tokens ≤1M | $0.173 | $1.721 |
US
In the United States deployment mode, endpoints and data storage are both located in the US (Virginia) region. Model inference compute resources are limited to the United States.
Model | Version | Mode | Context window | Max input | Max CoT | Max output | Input cost | Output cost CoT + output | Free quota |
(tokens) | (per 1M tokens) | ||||||||
qwen-flash-us Always the latest snapshot Part of the Qwen3 series | Stable | Thinking | 1,000,000 | 995,904 | 81,920 | 32,768 | Pricing is tiered. See the note below the table. | None | |
Non-thinking | 997,952 | - | |||||||
qwen-flash-2025-07-28-us Part of the Qwen3 series | Snapshot | Thinking | 995,904 | 81,920 | |||||
Non-thinking | 997,952 | - | |||||||
The models above use tiered pricing based on the number of input tokens in each request.
Input tokens per request | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
0 < Tokens ≤ 256K | $0.05 | $0.4 |
256K < Tokens ≤ 1M | $0.25 | $2 |
Chinese mainland
In the Chinese mainland deployment mode, endpoints and data storage are both located in the Beijing region. Model inference compute resources are limited to the Chinese mainland.
Model | Version | Mode | Context window | Max input | Max CoT | Max output | Input cost | Output cost CoT + output |
(tokens) | (per 1M tokens) | |||||||
qwen3.5-flash Currently qwen3.5-flash-2026-02-23 Thinking mode enabled by default Batch calls at half price | Stable | Thinking | 1,000,000 | 983,616 | 81,920 | 65,536 | Tiered pricing. See details below. | |
Non-thinking | 991,808 | - | ||||||
qwen3.5-flash-2026-02-23 Thinking mode enabled by default | Snapshot | Thinking | 983,616 | 81,920 | ||||
Non-thinking | 991,808 | - | ||||||
qwen-flash Currently qwen-flash-2025-07-28 Part of the Qwen3 series Batch calls at half price | Stable | Thinking | 995,904 | 81,920 | 32,768 | |||
Non-thinking | 997,952 | - | ||||||
qwen-flash-2025-07-28 Part of the Qwen3 series | Snapshot | Thinking | 995,904 | 81,920 | ||||
Non-thinking | 997,952 | - | ||||||
The models above use tiered pricing based on the number of input tokens in each request. The qwen3.5-flash and qwen-flash models support context cache and batch calls.
Tiered pricing for qwen3.5-flash and qwen3.5-flash-2026-02-23
Input tokens per request | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
0 < Tokens ≤ 128K | $0.029 | $0.287 |
128K < Tokens ≤ 256K | $0.115 | $1.147 |
256K < Tokens ≤ 1M | $0.172 | $1.72 |
Tiered pricing for qwen-flash and qwen-flash-2025-07-28
Input tokens per request | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
0 < Tokens ≤ 128K | $0.022 | $0.216 |
128K < Tokens ≤ 256K | $0.087 | $0.861 |
256K < Tokens ≤ 1M | $0.173 | $1.721 |
China (Hong Kong)
In the China (Hong Kong) deployment mode, the endpoint and data storage are located in China (Hong Kong), and model inference compute resources are limited to China (Hong Kong).
Model | Version | Mode | Context window | Max input | Max CoT | Max output | Input cost | Output cost CoT + output |
(tokens) | (per 1M tokens) | |||||||
qwen3.5-flash Currently qwen3.5-flash-2026-02-23. Thinking mode enabled by default. | Stable | Thinking | 1,000,000 | 983,616 | 81,920 | 65,536 | $0.1 | $0.4 |
Non-thinking | 991,808 | - | ||||||
qwen3.5-flash-2026-02-23 Thinking mode enabled by default. | Snapshot | Thinking | 983,616 | 81,920 ||||
Non-thinking | 991,808 | - | ||||||
EU
In the European Union deployment mode, both the endpoint and data storage are located in Germany (Frankfurt). Model inference compute resources are restricted to within the European Union.
Model | Version | Mode | Context window | Max input | Max CoT | Max output | Input cost | Output cost CoT + output |
(tokens) | (per 1M tokens) | |||||||
qwen3.5-flash Currently qwen3.5-flash-2026-02-23 Thinking mode enabled by default | Stable | Thinking | 1,000,000 | 983,616 | 81,920 | 65,536 | $0.1 | $0.4 |
Non-thinking | 991,808 | - | ||||||
qwen3.5-flash-2026-02-23 Thinking mode enabled by default | Snapshot | Thinking | 983,616 | 81,920 | ||||
Non-thinking | 991,808 | - | ||||||
Qwen-Turbo
Qwen-Turbo is no longer updated. We recommend replacing it with Qwen-Flash, which uses a flexible tiered pricing model for fairer billing. Usage instructions | API reference | Try online | Deep thinking
International
In international deployment mode, endpoints and data storage are in the Singapore region. Model inference compute resources are dynamically scheduled worldwide, excluding the Chinese Mainland.
Model | Version | Context window | Max input | Max output | Input cost | Output cost | Free quota |
(tokens) | (per 1M tokens) | ||||||
qwen-turbo Currently qwen-turbo-2025-04-28 Part of the Qwen3 series Batch calls at half price | Stable | Thinking 131,072 Non-thinking 1,000,000 | Thinking 98,304 Non-thinking 1,000,000 | 16,384 Maximum chain-of-thought length: 38,912 | $0.05 | Thinking: $0.5 Non-thinking: $0.2 | 1 million tokens each Valid for 90 days after activating Model Studio |
qwen-turbo-latest Always matches the latest snapshot version Part of the Qwen3 series | Latest | $0.05 | Thinking: $0.5 Non-thinking: $0.2 | ||||
qwen-turbo-2025-04-28 Also known as qwen-turbo-0428 Part of the Qwen3 series | Snapshot | ||||||
qwen-turbo-2024-11-01 Also known as qwen-turbo-1101 | 1,000,000 | 1,000,000 | 8,192 | $0.2 | |||
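Because qwen-turbo bills output at different rates depending on the generation mode ($0.5 vs $0.2 per 1M output tokens at the International rates above), the same request costs more with thinking enabled. A minimal sketch with illustrative names; treating billable output as a single token count is an assumption here:

```python
# Illustrative sketch: qwen-turbo (International) input is billed at one
# rate, while the output rate depends on whether thinking mode is enabled.
# Rates are USD per 1M tokens, from the table above.

INPUT_RATE = 0.05
OUTPUT_RATE = {"thinking": 0.5, "non-thinking": 0.2}

def turbo_cost(input_tokens: int, output_tokens: int,
               mode: str = "non-thinking") -> float:
    """Return the USD cost of one qwen-turbo request in the given mode."""
    return (input_tokens * INPUT_RATE
            + output_tokens * OUTPUT_RATE[mode]) / 1_000_000

# The same 1M-in / 100K-out request in both modes:
thinking_cost = turbo_cost(1_000_000, 100_000, "thinking")
plain_cost = turbo_cost(1_000_000, 100_000, "non-thinking")
```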
Chinese Mainland
In Chinese Mainland deployment mode, endpoints and data storage are in the Beijing region. Model inference compute resources are limited to the Chinese Mainland.
Model | Version | Context window | Max input | Max output | Input cost | Output cost |
(tokens) | (per 1M tokens) | |||||
qwen-turbo Currently qwen-turbo-2025-04-28 Part of the Qwen3 series | Stable | Thinking 131,072 Non-thinking 1,000,000 | Thinking 98,304 Non-thinking 1,000,000 | 16,384 Maximum chain-of-thought length: 38,912 | $0.044 | Thinking $0.431 Non-thinking $0.087 |
qwen-turbo-latest Always matches the latest snapshot version Part of the Qwen3 series | Latest | |||||
qwen-turbo-2025-07-15 Also known as qwen-turbo-0715 Part of the Qwen3 series | Snapshot | |||||
qwen-turbo-2025-04-28 Also known as qwen-turbo-0428 Part of the Qwen3 series | ||||||
QwQ
The QwQ reasoning model is trained on Qwen2.5 and uses reinforcement learning to significantly improve reasoning capability. Its core math and code metrics (AIME 24/25, LiveCodeBench) and general metrics (IFEval, LiveBench) are on par with the full-strength version of DeepSeek-R1. Usage
International
In the international deployment mode, endpoints and data storage are located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide, excluding the Chinese Mainland.
Model | Version | Context window | Max input | Max CoT | Max output | Input cost | Output cost | Free quota |
(tokens) | (per 1M tokens) | |||||||
qwq-plus | Stable | 131,072 | 98,304 | 32,768 | 8,192 | $0.8 | $2.4 | 1 million tokens Valid for 90 days after activating Model Studio |
Chinese Mainland
In the Chinese Mainland deployment mode, endpoints and data storage are located in the Beijing region, and model inference compute resources are limited to the Chinese Mainland.
Model | Version | Context window | Max input | Max CoT | Max output | Input cost | Output cost |
(tokens) | (per 1M tokens) | ||||||
qwq-plus Currently qwq-plus-2025-03-05. Batch calls at half price | Stable | 131,072 | 98,304 | 32,768 | 8,192 | $0.230 | $0.574 |
qwq-plus-latest Always the latest snapshot. | Latest | ||||||
qwq-plus-2025-03-05 Also known as qwq-plus-0305. | Snapshot | ||||||
Qwen-Long
Qwen-Long offers the longest context window in the Qwen series. It provides balanced capabilities at a lower cost, making it ideal for long-text analysis, information extraction, summarization, and classification. Usage | Try online
Only the Chinese Mainland deployment mode is supported. Endpoints and data storage are located in the Beijing region, and model inference compute resources are restricted to the Chinese Mainland.
Model | Version | Context window | Max input | Max output | Input cost | Output cost |
(tokens) | (per 1M tokens) | |||||
qwen-long-latest Always matches the latest snapshot version Batch calls at half price | Stable | 10,000,000 | 10,000,000 | 32,768 | $0.072 | $0.287 |
qwen-long-2025-01-25 Also known as qwen-long-0125 | Snapshot | |||||
Qwen-Omni
The Qwen-Omni model accepts multimodal inputs, including text, images, audio, and video, and generates text or speech responses. It offers multiple expressive, human-like voices and supports speech output in multiple languages and dialects. It is suited to audio-video chat scenarios such as visual recognition, emotion perception, and education and training. Usage | API reference
International
In the International deployment mode, endpoint and data storage are located in the Singapore region, while model inference computing resources are dynamically scheduled globally (excluding Chinese Mainland).
Model | Version | Mode | Context window | Max input | Max CoT | Max output | Input cost | Output cost | Free quota |
(tokens) | |||||||||
qwen3.5-omni-plus Currently qwen3.5-omni-plus-2026-03-15 | Stable | Non-thinking | 262,144 | 196,608 | - | 65,536 | In preview. Model invocation is temporarily free, excluding tool calling fees. | 1 million tokens each (all modalities) Valid for 90 days after activating Model Studio | |
qwen3.5-omni-plus-2026-03-15 | Snapshot | Non-thinking | 262,144 | 196,608 | - | 65,536 | |||
qwen3.5-omni-flash Currently qwen3.5-omni-flash-2026-03-15 | Stable | Non-thinking | 262,144 | 196,608 | - | 65,536 | |||
qwen3.5-omni-flash-2026-03-15 | Snapshot | Non-thinking | 262,144 | 196,608 | - | 65,536 | |||
qwen3-omni-flash Currently qwen3-omni-flash-2025-12-01 | Stable | Thinking | 65,536 | 16,384 | 32,768 | 16,384 | See pricing details below. | ||
Non-thinking | 49,152 | - | |||||||
qwen3-omni-flash-2025-12-01 | Snapshot | Thinking | 65,536 | 16,384 | 32,768 | 16,384 | |||
Non-thinking | 49,152 | - | |||||||
qwen3-omni-flash-2025-09-15 Also known as qwen3-omni-flash-0915 | Snapshot | Thinking | 65,536 | 16,384 | 32,768 | 16,384 | |||
Non-thinking | 49,152 | - | |||||||
Qwen3-Omni-Flash
Chinese Mainland
In the Chinese Mainland deployment mode, endpoint and data storage are located in the Beijing region, and model inference computing resources are limited to Chinese Mainland.
Model | Version | Mode | Context window | Max input | Max CoT | Max output | Input cost | Output cost | Free quota |
(tokens) | |||||||||
qwen3.5-omni-plus Currently qwen3.5-omni-plus-2026-03-15 | Stable | Non-thinking | 262,144 | 196,608 | - | 65,536 | In preview. Model invocation is temporarily free, excluding tool calling fees. | No free quota | |
qwen3.5-omni-plus-2026-03-15 | Snapshot | Non-thinking | 262,144 | 196,608 | - | 65,536 | |||
qwen3.5-omni-flash Currently qwen3.5-omni-flash-2026-03-15 | Stable | Non-thinking | 262,144 | 196,608 | - | 65,536 | |||
qwen3.5-omni-flash-2026-03-15 | Snapshot | Non-thinking | 262,144 | 196,608 | - | 65,536 | |||
qwen3-omni-flash Currently qwen3-omni-flash-2025-12-01 | Stable | Thinking | 65,536 | 16,384 | 32,768 | 16,384 | See pricing details below. | ||
Non-thinking | 49,152 | - | |||||||
qwen3-omni-flash-2025-12-01 | Snapshot | Thinking | 65,536 | 16,384 | 32,768 | 16,384 | |||
Non-thinking | 49,152 | - | |||||||
qwen3-omni-flash-2025-09-15 Also known as qwen3-omni-flash-0915 | Snapshot | Thinking | 65,536 | 16,384 | 32,768 | 16,384 | |||
Non-thinking | 49,152 | - | |||||||
Qwen3-Omni-Flash
Qwen-Omni-Realtime
Compared to Qwen-Omni, Qwen-Omni-Realtime supports streaming audio input and includes built-in VAD (Voice Activity Detection) to automatically detect the start and end of speech. Usage | Client-side events | Server-side events
International
In international deployment mode, endpoints and data storage are both located in the Singapore region. Model inference compute resources are dynamically scheduled globally (excluding the Chinese Mainland).
Model | Version | Context window | Max input | Max output | Free quota |
(tokens) | |||||
qwen3-omni-flash-realtime Currently qwen3-omni-flash-realtime-2025-12-01 | Stable | 65,536 | 49,152 | 16,384 | 1 million tokens each, regardless of modality Valid for 90 days after activating Model Studio |
qwen3-omni-flash-realtime-2025-12-01 | Snapshot | ||||
qwen3-omni-flash-realtime-2025-09-15 | |||||
After your free quota is used up, billing rules for input and output are as follows:
Chinese Mainland
In the Chinese Mainland deployment mode, endpoints and data storage are both located in the Beijing region. Model inference compute resources are limited to the Chinese Mainland.
Model | Version | Context window | Max input | Max output | Free quota |
(tokens) | |||||
qwen3-omni-flash-realtime Currently qwen3-omni-flash-realtime-2025-12-01 | Stable | 65,536 | 49,152 | 16,384 | No free quota |
qwen3-omni-flash-realtime-2025-12-01 | Snapshot | ||||
qwen3-omni-flash-realtime-2025-09-15 | |||||
After your free quota is used up, billing rules for input and output are as follows:
QVQ
QVQ is a visual reasoning model that supports visual input and chain-of-thought output. It delivers stronger performance on math, programming, visual analysis, creative, and general-purpose tasks. Usage | Try online
International
In international deployment mode, endpoints and data storage are both located in the Singapore region. Model inference compute resources are dynamically scheduled globally (excluding the Chinese Mainland).
Model | Version | Context window | Max input | Max CoT | Max output | Input cost | Output cost | Free quota |
(tokens) | (per 1M tokens) | |||||||
qvq-max Currently qvq-max-2025-03-25 | Stable | 131,072 | 106,496 Max per image: 16,384 | 16,384 | 8,192 | $1.2 | $4.8 | 1 million tokens each Valid for 90 days after activating Model Studio |
qvq-max-latest Always matches the latest snapshot version | Latest | |||||||
qvq-max-2025-03-25 Also known as qvq-max-0325 | Snapshot | |||||||
Chinese Mainland
In the Chinese Mainland deployment mode, endpoints and data storage are both located in the Beijing region. Model inference compute resources are limited to the Chinese Mainland.
Model | Version | Context window | Max input | Max CoT | Max output | Input cost | Output cost |
(tokens) | (per 1M tokens) | ||||||
qvq-max Compared to qvq-plus, qvq-max provides stronger visual reasoning and instruction-following capabilities, delivering the best performance on more complex tasks. Currently qvq-max-2025-03-25 | Stable | 131,072 | 106,496 Max per image: 16,384 | 16,384 | 8,192 | $1.147 | $4.588 |
qvq-max-latest Always matches the latest snapshot version | Latest | ||||||
qvq-max-2025-05-15 Also known as qvq-max-0515 | Snapshot | ||||||
qvq-max-2025-03-25 Also known as qvq-max-0325 | |||||||
qvq-plus Currently qvq-plus-2025-05-15 | Stable | $0.287 | $0.717 | | | |
qvq-plus-latest Always matches the latest snapshot version | Latest | ||||||
qvq-plus-2025-05-15 Also known as qvq-plus-0515 | Snapshot | ||||||
Qwen-VL
Qwen-VL is a text generation model with visual (image) understanding. It can perform OCR (Optical Character Recognition) and then summarize and reason over the recognized content, for example extracting attributes from product photos or solving problems from exercise diagrams. Usage | API reference | Try online
Qwen-VL models are billed based on the total number of input and output tokens. For the rules on calculating image tokens, see Image and Video Understanding.
International
In the International deployment mode, endpoints and data storage are located in the Singapore region. Model inference compute resources are dynamically scheduled globally (excluding the Chinese Mainland).
Model | Version | Mode | Context window | Max input | Max CoT | Max output | Input cost | Output cost CoT + output | Free quota |
(tokens) | (per 1M tokens) | ||||||||
qwen3-vl-plus Currently qwen3-vl-plus-2025-12-19. | Stable | Thinking | 262,144 | 258,048 Max per image: 16,384 | 81,920 | 32,768 | Tiered pricing. See details below. | 1 million tokens Valid for 90 days after activating Model Studio | |
Non-thinking | 260,096 Max per image: 16,384 | - | |||||||
qwen3-vl-plus-2025-12-19 | Snapshot | Thinking | 258,048 Max per image: 16,384 | 81,920 | |||||
Non-thinking | 260,096 Max per image: 16,384 | - | |||||||
qwen3-vl-plus-2025-09-23 | Snapshot | Thinking | 258,048 Max per image: 16,384 | 81,920 | |||||
Non-thinking | 260,096 Max per image: 16,384 | - | |||||||
qwen3-vl-flash Currently qwen3-vl-flash-2025-10-15. | Stable | Thinking | 258,048 Max per image: 16,384 | 81,920 | |||||
Non-thinking | 260,096 Max per image: 16,384 | - | |||||||
qwen3-vl-flash-2026-01-22 | Snapshot | Thinking | 258,048 Max per image: 16,384 | 81,920 | |||||
Non-thinking | 260,096 Max per image: 16,384 | - | |||||||
qwen3-vl-flash-2025-10-15 | Snapshot | Thinking | 258,048 Max per image: 16,384 | 81,920 | |||||
Non-thinking | 260,096 Max per image: 16,384 | - | |||||||
The above models use tiered pricing based on the number of input tokens per request. Input and output prices are the same for thinking and non-thinking modes. The qwen3-vl-plus and qwen3-vl-flash models support context cache.
qwen3-vl-plus Series
Input tokens per request | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
0 < Tokens ≤ 32K | $0.2 | $1.6 |
32K < Tokens ≤ 128K | $0.3 | $2.4 |
128K < Tokens ≤ 256K | $0.6 | $4.8 |
qwen3-vl-flash Series
Input tokens per request | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
0 < Tokens ≤ 32K | $0.05 | $0.4 |
32K < Tokens ≤ 128K | $0.075 | $0.6 |
128K < Tokens ≤ 256K | $0.12 | $0.96 |
Global
In the Global deployment mode, endpoint and data storage are located in the US (Virginia) region or Germany (Frankfurt) region, and model inference computing resources are dynamically scheduled globally.
Model | Version | Mode | Context window | Max input | Max CoT | Max output | Input cost | Output cost CoT + output |
(tokens) | (per 1M tokens) | |||||||
qwen3-vl-plus Currently qwen3-vl-plus-2025-12-19. | Stable | Thinking | 262,144 | 258,048 Max per image: 16,384 | 81,920 | 32,768 | Tiered pricing. See details below. | |
Non-thinking | 260,096 Max per image: 16,384 | - | ||||||
qwen3-vl-plus-2025-09-23 | Snapshot | Thinking | 258,048 Max per image: 16,384 | 81,920 | ||||
Non-thinking | 260,096 Max per image: 16,384 | - | ||||||
qwen3-vl-flash Currently qwen3-vl-flash-2025-10-15. | Stable | Thinking | 258,048 Max per image: 16,384 | 81,920 | ||||
Non-thinking | 260,096 Max per image: 16,384 | - | ||||||
qwen3-vl-flash-2025-10-15 | Snapshot | Thinking | 258,048 Max per image: 16,384 | 81,920 | ||||
Non-thinking | 260,096 Max per image: 16,384 | - | ||||||
The above models use tiered pricing based on the number of input tokens per request. Input and output prices are the same for thinking and non-thinking modes. The qwen3-vl-plus and qwen3-vl-flash models support context cache.
qwen3-vl-plus Series
Input tokens per request | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
0 < Tokens ≤ 32K | $0.143 | $1.434 |
32K < Tokens ≤ 128K | $0.215 | $2.15 |
128K < Tokens ≤ 256K | $0.43 | $4.301 |
qwen3-vl-flash Series
Input tokens per request | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
0 < Tokens ≤ 32K | $0.022 | $0.215 |
32K < Tokens ≤ 128K | $0.043 | $0.43 |
128K < Tokens ≤ 256K | $0.086 | $0.859 |
US
In the US deployment mode, endpoints and data storage are both located in the US (Virginia) region. Model inference compute resources are limited to the United States.
Model | Version | Mode | Context window | Max input | Max CoT | Max output | Input cost | Output cost CoT + output |
(tokens) | (per 1M tokens) | |||||||
qwen3-vl-flash-us Currently qwen3-vl-flash-2025-10-15-us. | Stable | Thinking | 262,144 | 258,048 Max per image: 16,384 | 81,920 | 32,768 | Tiered pricing. See details below. | |
Non-thinking | 260,096 Max per image: 16,384 | - | ||||||
qwen3-vl-flash-2026-01-22-us | Snapshot | Thinking | 258,048 Max per image: 16,384 | 81,920 | ||||
Non-thinking | 260,096 Max per image: 16,384 | - | ||||||
qwen3-vl-flash-2025-10-15-us | Snapshot | Thinking | 258,048 Max per image: 16,384 | 81,920 | ||||
Non-thinking | 260,096 Max per image: 16,384 | - | ||||||
The models above use tiered pricing based on the number of input tokens for each request. Input and output prices are the same for thinking mode and non-thinking mode. qwen3-vl-flash-us supports context cache.
Input tokens per request | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
0 < Tokens ≤ 32K | $0.05 | $0.4 |
32K < Tokens ≤ 128K | $0.075 | $0.6 |
128K < Tokens ≤ 256K | $0.12 | $0.96 |
Chinese Mainland
In the Chinese Mainland deployment mode, endpoints and data storage are located in the Beijing region. Model inference compute resources are available only in the Chinese Mainland.
Model | Version | Mode | Context window | Max input | Max CoT | Max output | Input cost | Output cost | Free quota |
(tokens) | (per 1M tokens) | ||||||||
qwen3-vl-plus Matches qwen3-vl-plus-2025-12-19 in capability. Batch calls at half price | Stable | Thinking | 262,144 | 258,048 Max per image: 16,384 | 81,920 | 32,768 | Priced in tiers. See the notes below the table. | No free quota | |
Non-thinking | 260,096 Max per image: 16,384 | - | |||||||
qwen3-vl-plus-2025-12-19 | Snapshot | Thinking | 258,048 Max per image: 16,384 | 81,920 | |||||
Non-thinking | 260,096 Max per image: 16,384 | - | |||||||
qwen3-vl-plus-2025-09-23 | Snapshot | Thinking | 258,048 Max per image: 16,384 | 81,920 | |||||
Non-thinking | 260,096 Max per image: 16,384 | - | |||||||
qwen3-vl-flash Matches qwen3-vl-flash-2025-10-15 in capability. Batch calls at half price | Stable | Thinking | 258,048 Max per image: 16,384 | 81,920 | |||||
Non-thinking | 260,096 Max per image: 16,384 | - | |||||||
qwen3-vl-flash-2026-01-22 | Snapshot | Thinking | 258,048 Max per image: 16,384 | 81,920 | |||||
Non-thinking | 260,096 Max per image: 16,384 | - | |||||||
qwen3-vl-flash-2025-10-15 | Snapshot | Thinking | 258,048 Max per image: 16,384 | 81,920 | |||||
Non-thinking | 260,096 Max per image: 16,384 | - | |||||||
All models above use tiered pricing based on the number of input tokens in each request. Input and output costs are identical for thinking and non-thinking modes. The qwen3-vl-plus and qwen3-vl-flash models support context cache.
qwen3-vl-plus series
Input tokens per request | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
0 < Tokens ≤ 32K | $0.143 | $1.434 |
32K < Tokens ≤ 128K | $0.215 | $2.15 |
128K < Tokens ≤ 256K | $0.43 | $4.301 |
qwen3-vl-flash series
Input tokens per request | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
0 < Tokens ≤ 32K | $0.022 | $0.215 |
32K < Tokens ≤ 128K | $0.043 | $0.43 |
128K < Tokens ≤ 256K | $0.086 | $0.859 |
China (Hong Kong)
In the China (Hong Kong) deployment mode, both the endpoint and data storage are located in China (Hong Kong). Model inference compute resources are limited to China (Hong Kong).
Model | Version | Mode | Context window | Max input | Max CoT | Max output | Input cost | Output cost CoT + output |
(tokens) | (per 1M tokens) | |||||||
qwen3-vl-plus Currently qwen3-vl-plus-2025-12-19 | Stable | Thinking | 262,144 | 258,048 Max per image: 16,384 | 81,920 | 32,768 | Pricing is tiered. See the note below the table. | |
Non-thinking | 260,096 Max per image: 16,384 | - | ||||||
qwen3-vl-plus-2025-12-19 | Snapshot | Thinking | 258,048 Max per image: 16,384 | 81,920 | ||||
Non-thinking | 260,096 Max per image: 16,384 | - | ||||||
These models use tiered billing based on the number of input tokens in each request. Input and output costs are the same for thinking and non-thinking modes.
Input tokens per request | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
0 < Tokens ≤ 32K | $0.20 | $1.60 |
32K < Tokens ≤ 128K | $0.30 | $2.40 |
128K < Tokens ≤ 256K | $0.60 | $4.80 |
EU
In the EU deployment mode, both the endpoint and data storage are located in Germany (Frankfurt). Model inference compute resources are restricted to the European Union.
Model | Version | Mode | Context window | Max input | Max CoT | Max output | Input cost | Output cost CoT + output |
(tokens) | (per 1M tokens) | |||||||
qwen3-vl-plus | Stable | Thinking | 262,144 | 258,048 Max per image: 16,384 | 81,920 | 32,768 | Tiered pricing. See details below. | |
Non-thinking | 260,096 Max per image: 16,384 | - | ||||||
qwen3-vl-flash Currently qwen3-vl-flash-2025-10-15 | Stable | Thinking | 258,048 Max per image: 16,384 | 81,920 | ||||
Non-thinking | 260,096 Max per image: 16,384 | - | ||||||
qwen3-vl-flash-2025-10-15 | Snapshot | Thinking | 258,048 Max per image: 16,384 | 81,920 | ||||
Non-thinking | 260,096 Max per image: 16,384 | - | ||||||
The models above use tiered pricing based on the number of input tokens in each request. Input and output costs are identical for thinking and non-thinking modes.
Input tokens per request | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
0 < Tokens ≤ 32K | $0.2 | $1.6 |
32K < Tokens ≤ 128K | $0.3 | $2.4 |
128K < Tokens ≤ 256K | $0.6 | $4.8 |
The qwen3-vl-flash-2026-01-22 model effectively integrates thinking and non-thinking modes. Compared with the 2025-10-15 snapshot, it significantly improves overall performance and delivers higher-accuracy inference in business scenarios such as general visual recognition, security surveillance, store inspection, routine inspection, and photo-based problem solving.
Qwen-OCR
Qwen-OCR is a model designed specifically for text extraction. Compared with Qwen-VL models, it focuses on extracting text from documents, tables, test questions, and handwritten images. It recognizes multiple languages, including English, French, Japanese, Korean, German, Russian, and Italian. Usage | API reference | Try online
International
In international deployment mode, endpoints and data storage are both located in the Singapore region. Model inference compute resources are dynamically scheduled globally (excluding the Chinese Mainland).
Model | Version | Context window | Max input | Max output | Input cost | Output cost | Free quota |
(tokens) | (per 1M tokens) | ||||||
qwen-vl-ocr Currently qwen-vl-ocr-2025-11-20 | Stable | 38,192 | 30,000 Max per image: 30,000 | 8,192 | $0.07 | $0.16 | 1 million tokens each Valid for 90 days after activating Model Studio |
qwen-vl-ocr-2025-11-20 Also known as qwen-vl-ocr-1120 Based on the Qwen3-VL architecture, significantly improving document parsing and text localization capabilities. | Snapshot | ||||||
Global
In the Global deployment mode, endpoint and data storage are located in the US (Virginia) region or Germany (Frankfurt) region, and model inference computing resources are dynamically scheduled globally.
Model | Version | Context window | Max input | Max output | Input cost | Output cost |
(tokens) | (per 1M tokens) | |||||
qwen-vl-ocr Currently qwen-vl-ocr-2025-11-20 | Stable | 38,192 | 30,000 Max per image: 30,000 | 8,192 | $0.043 | $0.072 |
qwen-vl-ocr-2025-11-20 Also known as qwen-vl-ocr-1120 Based on the Qwen3-VL architecture, this version significantly improves document parsing and text localization capabilities. | Snapshot | |||||
Chinese Mainland
In the Chinese Mainland deployment mode, endpoints and data storage are both located in the Beijing region. Model inference compute resources are limited to the Chinese Mainland.
Model | Version | Context window | Max input | Max output | Input cost | Output cost | Free quota |
(tokens) | (per 1M tokens) | ||||||
qwen-vl-ocr Currently qwen-vl-ocr-2025-11-20 Batch calls at half price | Stable | 38,192 | 30,000 Max per image: 30,000 | 8,192 | $0.043 | $0.072 | No free quota |
qwen-vl-ocr-latest Always the latest snapshot | Latest | ||||||
qwen-vl-ocr-2025-11-20 Also known as qwen-vl-ocr-1120 Based on the Qwen3-VL architecture, significantly improving document parsing and text localization capabilities. | Snapshot | ||||||
qwen-vl-ocr-2025-08-28 Also known as qwen-vl-ocr-0828 | 34,096 | 4,096 | $0.717 | $0.717 | |||
qwen-vl-ocr-2025-04-13 Also known as qwen-vl-ocr-0413 | |||||||
qwen-vl-ocr-2024-10-28 Also known as qwen-vl-ocr-1028 | |||||||
Qwen-Math
The Qwen-Math model is a language model designed specifically for solving mathematical problems. Usage | API reference | Try online
Only the Chinese Mainland deployment mode is supported. Endpoints and data storage are located in the Beijing region, and model inference compute resources are restricted to the Chinese Mainland.
Model | Version | Context window | Max input | Max output | Input cost | Output cost |
(tokens) | (per 1M tokens) | |||||
qwen-math-plus Currently qwen-math-plus-2024-09-19 | Stable | 4,096 | 3,072 | 3,072 | $0.574 | $1.721 |
qwen-math-plus-latest Always matches the latest snapshot version | Latest | |||||
qwen-math-plus-2024-09-19 Also known as qwen-math-plus-0919 | Snapshot | |||||
qwen-math-plus-2024-08-16 Also known as qwen-math-plus-0816 | ||||||
qwen-math-turbo Currently qwen-math-turbo-2024-09-19 | Stable | $0.287 | $0.861 | |||
qwen-math-turbo-latest Always matches the latest snapshot version | Latest | |||||
qwen-math-turbo-2024-09-19 Also known as qwen-math-turbo-0919 | Snapshot | |||||
Qwen-Coder
Qwen coder models: the latest Qwen3-Coder-Plus series consists of code generation models built on Qwen3 with strong coding-agent capabilities. They excel at tool calling and environment interaction, enabling autonomous programming that combines exceptional coding ability with general-purpose skills. Usage | API reference | Try online
International
In international deployment mode, the endpoint and data storage are both located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding the Chinese Mainland).
Model | Version | Context window | Max input | Max output | Input cost | Output cost | Free quota |
(tokens) | (per 1M tokens) | ||||||
qwen3-coder-plus Currently qwen3-coder-plus-2025-09-23 | Stable | 1,000,000 | 997,952 | 65,536 | Tiered pricing. See details below. | 1 million tokens each Valid for 90 days after activating Model Studio | |
qwen3-coder-plus-2025-09-23 | Snapshot | ||||||
qwen3-coder-plus-2025-07-22 | Snapshot | ||||||
qwen3-coder-flash Currently qwen3-coder-flash-2025-07-28 | Stable | ||||||
qwen3-coder-flash-2025-07-28 | Snapshot | ||||||
The models above use tiered billing based on the number of input tokens in the current request.
qwen3-coder-plus series
Pricing for qwen3-coder-plus, qwen3-coder-plus-2025-09-23, and qwen3-coder-plus-2025-07-22 is shown below. qwen3-coder-plus supports context cache. Input text that hits an implicit cache is billed at 20% of the standard rate. Input text that hits an explicit cache is billed at 10% of the standard rate.
Input tokens per request | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
0 < Tokens ≤ 32K | $1 | $5 |
32K < Tokens ≤ 128K | $1.8 | $9 |
128K < Tokens ≤ 256K | $3 | $15 |
256K < Tokens ≤ 1M | $6 | $60 |
qwen3-coder-flash series
Pricing for qwen3-coder-flash and qwen3-coder-flash-2025-07-28 is shown below. qwen3-coder-flash supports context cache. Input text that hits an implicit cache is billed at 20% of the standard rate. Input text that hits an explicit cache is billed at 10% of the standard rate.
Input tokens per request | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
0 < Tokens ≤ 32K | $0.30 | $1.50 |
32K < Tokens ≤ 128K | $0.50 | $2.50 |
128K < Tokens ≤ 256K | $0.80 | $4 |
256K < Tokens ≤ 1M | $1.60 | $9.60 |
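The context-cache discounts described above (input that hits an implicit cache billed at 20% of the standard rate, an explicit cache at 10%) can be sketched as follows. The helper and its parameters are illustrative; the standard rate passed in would be the request's tier rate, e.g. $1 per 1M input tokens for a qwen3-coder-plus request up to 32K input:

```python
# Illustrative sketch of context-cache billing for input tokens
# (qwen3-coder-plus, International). Cached tokens are billed at a
# fraction of the standard tier rate; uncached tokens are billed at
# the full rate, and output tokens are unaffected.

CACHE_FACTOR = {"none": 1.0, "implicit": 0.2, "explicit": 0.1}

def input_cost(input_tokens: int, tier_rate_per_m: float,
               cached_tokens: int = 0, cache: str = "none") -> float:
    """USD cost of a request's input when `cached_tokens` hit the cache."""
    uncached = input_tokens - cached_tokens
    discounted = cached_tokens * CACHE_FACTOR[cache]
    return (uncached + discounted) * tier_rate_per_m / 1_000_000

# 32K-token prompt in the first tier ($1 per 1M), half of it hitting
# the implicit cache:
cost = input_cost(32_000, 1.0, cached_tokens=16_000, cache="implicit")
```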
Global
In the Global deployment mode, endpoint and data storage are located in the US (Virginia) region or Germany (Frankfurt) region, and model inference computing resources are dynamically scheduled globally.
Model | Version | Context window | Max input | Max output | Input cost | Output cost |
(tokens) | (per 1M tokens) | |||||
qwen3-coder-plus Currently qwen3-coder-plus-2025-09-23. | Stable | 1,000,000 | 997,952 | 65,536 | Tiered pricing. See details below. | |
qwen3-coder-plus-2025-09-23 | Snapshot | |||||
qwen3-coder-plus-2025-07-22 | Snapshot | |||||
qwen3-coder-flash Currently qwen3-coder-flash-2025-07-28. | Stable | |||||
qwen3-coder-flash-2025-07-28 | Snapshot | |||||
The models above use tiered billing based on the number of input tokens in the current request.
qwen3-coder-plus series
Pricing for qwen3-coder-plus, qwen3-coder-plus-2025-09-23, and qwen3-coder-plus-2025-07-22 is shown below. qwen3-coder-plus supports context cache. Input text that hits a cache is billed at 20% of the standard rate.
Input tokens per request | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
0 < Tokens ≤ 32K | $0.574 | $2.294 |
32K < Tokens ≤ 128K | $0.861 | $3.441 |
128K < Tokens ≤ 256K | $1.434 | $5.735 |
256K < Tokens ≤ 1M | $2.868 | $28.671 |
qwen3-coder-flash series
Pricing for qwen3-coder-flash and qwen3-coder-flash-2025-07-28 is shown below. qwen3-coder-flash supports context cache. Input text that hits a cache is billed at 20% of the standard rate.
Input tokens per request | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
0 < Tokens ≤ 32K | $0.144 | $0.574 |
32K < Tokens ≤ 128K | $0.216 | $0.861 |
128K < Tokens ≤ 256K | $0.359 | $1.434 |
256K < Tokens ≤ 1M | $0.717 | $3.584 |
Chinese Mainland
In Chinese Mainland deployment mode, the endpoint and data storage are both located in the Beijing region. Model inference compute resources are limited to the Chinese Mainland.
Model | Version | Context window | Max input | Max output | Input cost | Output cost |
(tokens) | (per 1M tokens) | |||||
qwen3-coder-plus Currently qwen3-coder-plus-2025-09-23 | Stable | 1,000,000 | 997,952 | 65,536 | Tiered pricing. See details below. | |
qwen3-coder-plus-2025-09-23 | Snapshot | |||||
qwen3-coder-plus-2025-07-22 | Snapshot | |||||
qwen3-coder-flash Currently qwen3-coder-flash-2025-07-28 | Stable | |||||
qwen3-coder-flash-2025-07-28 | Snapshot | |||||
The models above use tiered billing based on the number of input tokens in the current request.
qwen3-coder-plus series
Pricing for qwen3-coder-plus, qwen3-coder-plus-2025-09-23, and qwen3-coder-plus-2025-07-22 is shown below. qwen3-coder-plus supports context cache. Input text that hits an implicit cache is billed at 20% of the standard rate. Input text that hits an explicit cache is billed at 10% of the standard rate.
Input tokens per request | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
0 < Tokens ≤ 32K | $0.574 | $2.294 |
32K < Tokens ≤ 128K | $0.861 | $3.441 |
128K < Tokens ≤ 256K | $1.434 | $5.735 |
256K < Tokens ≤ 1M | $2.868 | $28.671 |
qwen3-coder-flash series
Pricing for qwen3-coder-flash and qwen3-coder-flash-2025-07-28 is shown below. qwen3-coder-flash supports context cache. Input text that hits an implicit cache is billed at 20% of the standard rate. Input text that hits an explicit cache is billed at 10% of the standard rate.
Input tokens per request | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
0 < Tokens ≤ 32K | $0.144 | $0.574 |
32K < Tokens ≤ 128K | $0.216 | $0.861 |
128K < Tokens ≤ 256K | $0.359 | $1.434 |
256K < Tokens ≤ 1M | $0.717 | $3.584 |
Qwen translation models
Qwen3-MT is a flagship large language model (LLM) for translation and a comprehensive upgrade of Qwen3. It supports translation across 92 languages, including Chinese, English, Japanese, Korean, French, Spanish, German, Thai, Indonesian, Vietnamese, and Arabic. Performance and translation quality are significantly improved, with enhanced stability for terminology customization, format preservation, and domain-specific prompting, resulting in more accurate and natural translations. Usage
International
In the international deployment mode, both the endpoint and data storage are located in the Singapore region. Model inference computing resources are dynamically scheduled worldwide, except in the Chinese Mainland.
Model | Context window | Max input | Max output | Input cost | Output cost | Free quota |
(tokens) | (per 1M tokens) | |||||
qwen-mt-plus Part of Qwen3-MT | 16,384 | 8,192 | 8,192 | $2.46 | $7.37 | 1 million tokens each Valid for 90 days after activating Model Studio |
qwen-mt-flash Part of Qwen3-MT | $0.16 | $0.49 | ||||
qwen-mt-lite Part of Qwen3-MT | $0.12 | $0.36 | ||||
qwen-mt-turbo Part of Qwen3-MT | $0.16 | $0.49 | ||||
Global
In the Global deployment mode, endpoints and data storage are located in the US (Virginia) or Germany (Frankfurt) region, and model inference computing resources are dynamically scheduled globally.
Model | Context window | Max input | Max output | Input cost | Output cost |
(tokens) | (per 1M tokens) | ||||
qwen-mt-plus Part of Qwen3-MT | 16,384 | 8,192 | 8,192 | $0.259 | $0.775 |
qwen-mt-flash Part of Qwen3-MT | $0.101 | $0.280 | |||
qwen-mt-lite Part of Qwen3-MT | $0.086 | $0.229 | |||
Chinese Mainland
In the Chinese Mainland deployment mode, both the endpoint and data storage are in the Beijing region. Computing resources for model inference are limited to the Chinese Mainland.
Model | Context window | Max input | Max output | Input cost | Output cost |
(tokens) | (per 1M tokens) | ||||
qwen-mt-plus Part of Qwen3-MT | 16,384 | 8,192 | 8,192 | $0.259 | $0.775 |
qwen-mt-flash Part of Qwen3-MT | $0.101 | $0.280 | |||
qwen-mt-lite Part of Qwen3-MT | $0.086 | $0.229 | |||
qwen-mt-turbo Part of Qwen3-MT | $0.101 | $0.280 | |||
Qwen data mining model
You can use the Qwen data mining model to extract structured information from documents for data annotation, content moderation, and other tasks. Usage | API reference
Only the Chinese Mainland deployment mode is supported. Endpoints and data storage are located in the Beijing region, and model inference computing resources are restricted to the Chinese Mainland.
Model | Context window | Max input | Max output | Input cost | Output cost | Free quota |
(tokens) | (per 1M tokens) | |||||
qwen-doc-turbo | 262,144 | 253,952 | 32,768 | $0.087 | $0.144 | No free quota |
Qwen-Deep-Research
The Qwen deep research model breaks down complex problems, performs inference and analysis using web search, and generates research reports. Usage | API reference
Only the Chinese Mainland deployment mode is supported. Endpoints and data storage are located in the Beijing region, and model inference computing resources are restricted to the Chinese Mainland.
Model | Context window | Max input | Max output | Input cost | Output cost |
(tokens) | (per 1,000 tokens) | ||||
qwen-deep-research | 1,000,000 | 997,952 | 32,768 | $0.007742 | $0.023367 |
Text generation - Qwen open-source edition
In model names, xxb indicates parameter scale. For example, qwen2-72b-instruct has 72 billion (72B) parameters.
Model Studio lets you call Qwen's open-source models without deploying them locally. The available open-source series are Qwen3 and Qwen2.5.
Qwen3.5
Accepts text, image, and video input. Its text performance is comparable to Qwen3-Max, while being faster and more cost-effective, and its multimodal capabilities significantly outperform the Qwen3-VL series.
Model | Mode | Context window | Max input | Max CoT | Max output | Input cost | Output cost CoT + output | Free quota |
(tokens) | (per 1M tokens) | |||||||
qwen3.5-397b-a17b Default: Thinking | Thinking | 262,144 | 258,048 | 81,920 | 65,536 | Tiered pricing. See details below. | 1 million tokens each Valid for 90 days after activating Model Studio Available in international regions only | |
Non-thinking | 260,096 | - | ||||||
qwen3.5-122b-a10b Default: Thinking | Thinking | 262,144 | 258,048 | 81,920 | 65,536 | |||
Non-thinking | 260,096 | - | ||||||
qwen3.5-27b Default: Thinking | Thinking | 262,144 | 258,048 | 81,920 | 65,536 | |||
Non-thinking | 260,096 | - | ||||||
qwen3.5-35b-a3b Default: Thinking | Thinking | 262,144 | 258,048 | 81,920 | 65,536 | |||
Non-thinking | 260,096 | - | ||||||
qwen3.5-397b-a17b, qwen3.5-122b-a10b, qwen3.5-27b, and qwen3.5-35b-a3b use tiered pricing based on the number of input tokens per request.
International
Model | Input tokens per request | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
qwen3.5-397b-a17b | 0 < tokens ≤ 256K | $0.60 | $3.6 |
qwen3.5-122b-a10b | $0.40 | $3.2 | |
qwen3.5-27b | $0.3 | $2.4 |
qwen3.5-35b-a3b | $0.25 | $2 |
Global
Model | Input tokens per request | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
qwen3.5-397b-a17b | 0 < tokens ≤ 128K | $0.172 | $1.032 |
128K < tokens ≤ 256K | $0.43 | $2.58 | |
qwen3.5-122b-a10b | 0 < tokens ≤ 128K | $0.115 | $0.917 |
128K < tokens ≤ 256K | $0.287 | $2.294 | |
qwen3.5-27b | 0 < tokens ≤ 128K | $0.086 | $0.688 |
128K < tokens ≤ 256K | $0.258 | $2.064 | |
qwen3.5-35b-a3b | 0 < tokens ≤ 128K | $0.057 | $0.459 |
128K < tokens ≤ 256K | $0.229 | $1.835 |
Chinese Mainland
Model | Input tokens per request | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
qwen3.5-397b-a17b | 0 < tokens ≤ 128K | $0.172 | $1.032 |
128K < tokens ≤ 256K | $0.43 | $2.58 | |
qwen3.5-122b-a10b | 0 < tokens ≤ 128K | $0.115 | $0.917 |
128K < tokens ≤ 256K | $0.287 | $2.294 | |
qwen3.5-27b | 0 < tokens ≤ 128K | $0.086 | $0.688 |
128K < tokens ≤ 256K | $0.258 | $2.064 | |
qwen3.5-35b-a3b | 0 < tokens ≤ 128K | $0.057 | $0.459 |
128K < tokens ≤ 256K | $0.229 | $1.835 |
Qwen3
The qwen3-next-80b-a3b-thinking model, released in September 2025, supports only thinking mode. It improves instruction following compared with qwen3-235b-a22b-thinking-2507 and generates more concise summaries.
The qwen3-next-80b-a3b-instruct model, released in September 2025, supports only non-thinking mode. It improves Chinese understanding, logical reasoning, and text generation compared with qwen3-235b-a22b-instruct-2507.
The qwen3-235b-a22b-thinking-2507 and qwen3-30b-a3b-thinking-2507 models, released in July 2025, support only thinking mode. They are upgrades of qwen3-235b-a22b (thinking mode) and qwen3-30b-a3b (thinking mode).
The qwen3-235b-a22b-instruct-2507 and qwen3-30b-a3b-instruct-2507 models, released in July 2025, support only non-thinking mode. They are upgrades of qwen3-235b-a22b (non-thinking mode) and qwen3-30b-a3b (non-thinking mode).
The Qwen3 model, released in April 2025, supports both thinking mode and non-thinking mode. Use the enable_thinking parameter to switch between them. Qwen3 also delivers major capability improvements:
Reasoning capability: Outperforms QwQ and non-reasoning models of the same size on math, coding, and logical reasoning benchmarks, matching top industry performance at this scale.
Human preference alignment: Improves creative writing, role-playing, multi-turn conversation, and instruction following. General capabilities exceed those of same-size models.
Agent capability: Leads the industry in both thinking and non-thinking modes. Enables precise external tool calling.
Multi-language capability: Supports over 100 languages and dialects. Translation, instruction understanding, and commonsense reasoning all improve significantly.
Response format fixes: Resolves response format issues from earlier versions, such as malformed Markdown, mid-response truncation, and incorrect boxed output.
The open-source Qwen3 model, released in April 2025, does not support non-streaming output in thinking mode.
If you enable thinking mode for the open-source Qwen3 model but it does not output the thinking process, billing applies at the non-thinking mode rate.
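The fallback rule above can be sketched as a small rate-selection function (illustrative only; the rates shown are qwen3-30b-a3b's international prices from the table below, and `reasoning_tokens` is a hypothetical bookkeeping field standing in for whether the response contained a thinking process):

```python
# Output rates per 1M tokens for qwen3-30b-a3b, international deployment
# mode: $2.4 in thinking mode, $0.8 in non-thinking mode.
RATES = {"thinking": 2.4, "non_thinking": 0.8}

def output_cost(output_tokens, thinking_enabled, reasoning_tokens):
    """Cost in USD for the output side of one request. Even with thinking
    mode enabled, the non-thinking rate applies when the response contains
    no thinking process (reasoning_tokens == 0). In thinking mode, CoT and
    output tokens are billed together."""
    if thinking_enabled and reasoning_tokens > 0:
        mode = "thinking"
    else:
        mode = "non_thinking"
    return (output_tokens + reasoning_tokens) * RATES[mode] / 1_000_000
```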
Thinking | Non-thinking | Usage
International
In international deployment mode, endpoints and data storage are located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding the Chinese Mainland).
Model | Mode | Context window | Max input | Max CoT | Max output | Input cost | Output cost | Free quota |
(tokens) | (per 1M tokens) | |||||||
qwen3-next-80b-a3b-thinking | Thinking only | 131,072 | 126,976 | 81,920 | 32,768 | $0.15 | $1.2 | 1 million tokens each Valid for 90 days after activating Model Studio |
qwen3-next-80b-a3b-instruct | Thinking not supported | 129,024 | - | |||||
qwen3-235b-a22b-thinking-2507 | Thinking only | 126,976 | 81,920 | $0.23 | $2.3 | |||
qwen3-235b-a22b-instruct-2507 | Thinking not supported | 129,024 | - | $0.92 | ||||
qwen3-30b-a3b-thinking-2507 | Thinking only | 126,976 | 81,920 | $0.2 | $2.4 | |||
qwen3-30b-a3b-instruct-2507 | Thinking not supported | 129,024 | - | $0.8 | ||||
qwen3-235b-a22b This model and the following models were all released in April 2025. | Non-thinking | 129,024 | - | 16,384 | $0.7 | $2.8 | ||
Thinking | 98,304 | 38,912 | $8.4 | |||||
qwen3-32b | Non-thinking | 129,024 | - | $0.16 | $0.64 | |||
Thinking | 98,304 | 38,912 | ||||||
qwen3-30b-a3b | Non-thinking | 129,024 | - | $0.2 | $0.8 | |||
Thinking | 98,304 | 38,912 | $2.4 | |||||
qwen3-14b | Non-thinking | 129,024 | - | 8,192 | $0.35 | $1.4 | ||
Thinking | 98,304 | 38,912 | $4.2 | |||||
qwen3-8b | Non-thinking | 129,024 | - | $0.18 | $0.7 | |||
Thinking | 98,304 | 38,912 | $2.1 | |||||
qwen3-4b | Non-thinking | 129,024 | - | $0.11 | $0.42 | |||
Thinking | 98,304 | 38,912 | $1.26 | |||||
qwen3-1.7b | Non-thinking | 32,768 | 30,720 | - | $0.42 | |||
Thinking | 28,672 | Combined with input must not exceed 30,720 | $1.26 |||||
qwen3-0.6b | Non-thinking | 30,720 | - | $0.42 | ||||
Thinking | 28,672 | Combined with input must not exceed 30,720 | $1.26 |||||
Global
In the Global deployment mode, endpoints and data storage are located in the US (Virginia) or Germany (Frankfurt) region, and model inference computing resources are dynamically scheduled globally.
The qwen3-32b, qwen3-14b, and qwen3-8b models currently support global deployment mode only in the US (Virginia) region.
Model | Mode | Context window | Max input | Max CoT | Max output | Input cost | Output cost | Free quota |
(tokens) | (per 1M tokens) | |||||||
qwen3-next-80b-a3b-thinking | Thinking only | 131,072 | 126,976 | 81,920 | 32,768 | $0.144 | $1.434 | No free quota |
qwen3-next-80b-a3b-instruct | Thinking not supported | 129,024 | - | $0.574 | ||||
qwen3-235b-a22b-thinking-2507 | Thinking only | 126,976 | 81,920 | $0.23 | $2.3 | |||
qwen3-235b-a22b-instruct-2507 | Thinking not supported | 129,024 | - | $0.92 | ||||
qwen3-30b-a3b-thinking-2507 | Thinking only | 126,976 | 81,920 | $0.108 | $1.076 | |||
qwen3-30b-a3b-instruct-2507 | Thinking not supported | 129,024 | - | $0.431 | ||||
qwen3-235b-a22b | Non-thinking | 129,024 | - | 16,384 | $0.287 | $1.147 | ||
Thinking | 98,304 | 38,912 | $2.868 | |||||
qwen3-32b | Non-thinking | 129,024 | - | $0.16 | $0.64 | |||
Thinking | 98,304 | 38,912 | ||||||
qwen3-30b-a3b | Non-thinking | 129,024 | - | $0.108 | $0.431 | |||
Thinking | 98,304 | 38,912 | $1.076 | |||||
qwen3-14b | Non-thinking | 129,024 | - | 8,192 | $0.144 | $0.574 | ||
Thinking | 98,304 | 38,912 | $1.434 | |||||
qwen3-8b | Non-thinking | 129,024 | - | $0.072 | $0.287 | |||
Thinking | 98,304 | 38,912 | $0.717 | |||||
Chinese Mainland
In Chinese Mainland deployment mode, endpoints and data storage are located in the Beijing region. Model inference compute resources are limited to the Chinese Mainland.
Model | Mode | Context window | Max input | Max CoT | Max output | Input cost | Output cost | Free quota |
(tokens) | (per 1M tokens) | |||||||
qwen3-next-80b-a3b-thinking | Thinking only | 131,072 | 126,976 | 81,920 | 32,768 | $0.144 | $1.434 | No free quota |
qwen3-next-80b-a3b-instruct | Thinking not supported | 129,024 | - | $0.574 | ||||
qwen3-235b-a22b-thinking-2507 | Thinking only | 126,976 | 81,920 | $0.287 | $2.868 | |||
qwen3-235b-a22b-instruct-2507 | Thinking not supported | 129,024 | - | $1.147 | ||||
qwen3-30b-a3b-thinking-2507 | Thinking only | 126,976 | 81,920 | $0.108 | $1.076 | |||
qwen3-30b-a3b-instruct-2507 | Thinking not supported | 129,024 | - | $0.431 | ||||
qwen3-235b-a22b | Non-thinking | 129,024 | - | 16,384 | $0.287 | $1.147 | ||
Thinking | 98,304 | 38,912 | $2.868 | |||||
qwen3-32b | Non-thinking | 129,024 | - | $0.287 | $1.147 | |||
Thinking | 98,304 | 38,912 | $2.868 | |||||
qwen3-30b-a3b | Non-thinking | 129,024 | - | $0.108 | $0.431 | |||
Thinking | 98,304 | 38,912 | $1.076 | |||||
qwen3-14b | Non-thinking | 129,024 | - | 8,192 | $0.144 | $0.574 | ||
Thinking | 98,304 | 38,912 | $1.434 | |||||
qwen3-8b | Non-thinking | 129,024 | - | $0.072 | $0.287 | |||
Thinking | 98,304 | 38,912 | $0.717 | |||||
qwen3-4b | Non-thinking | 129,024 | - | $0.044 | $0.173 | |||
Thinking | 98,304 | 38,912 | $0.431 | |||||
qwen3-1.7b | Non-thinking | 32,768 | 30,720 | - | $0.173 | |||
Thinking | 28,672 | Combined with input must not exceed 30,720 | $0.431 | |||||
qwen3-0.6b | Non-thinking | 30,720 | - | $0.173 | ||||
Thinking | 28,672 | Combined with input must not exceed 30,720 | $0.431 | |||||
QwQ open-source
QwQ is a reasoning model trained from Qwen2.5-32B, with reasoning capabilities significantly enhanced through reinforcement learning. Its core metrics (AIME 24/25, LiveCodeBench) and some general metrics (IFEval, LiveBench, and others) match the level of the full DeepSeek-R1 and significantly outperform DeepSeek-R1-Distill-Qwen-32B, which is also based on Qwen2.5-32B. Usage | API reference
Only the Chinese Mainland deployment mode is supported. Endpoints and data storage are located in the Beijing region, and model inference computing resources are restricted to the Chinese Mainland.
Model | Context window | Max input | Max CoT | Max response | Input cost | Output cost |
(tokens) | (per 1M tokens) | |||||
qwq-32b | 131,072 | 98,304 | 32,768 | 8,192 | $0.287 | $0.861 |
QwQ-Preview
qwq-32b-preview is an experimental research model developed by the Qwen team in 2024, focused on enhancing AI reasoning capabilities, especially in math and programming. See QwQ official blog for model limitations. Usage | API reference | Try online
Only the Chinese Mainland deployment mode is supported. Endpoints and data storage are located in the Beijing region, and model inference computing resources are restricted to the Chinese Mainland.
Model | Context window | Max input | Max output | Input cost | Output cost |
(tokens) | (per 1M tokens) | ||||
qwq-32b-preview | 32,768 | 30,720 | 16,384 | $0.287 | $0.861 |
Qwen2.5
QVQ
qvq-72b-preview is an experimental research model developed by the Qwen team, focused on enhancing visual reasoning capabilities, especially in mathematical reasoning. See QVQ official blog for model limitations. Usage | API reference
To have the model output its thinking process before the answer, use the commercial model QVQ.
Only the Chinese Mainland deployment mode is supported. Endpoints and data storage are located in the Beijing region, and model inference computing resources are restricted to the Chinese Mainland.
Model | Context window | Max input | Max output | Input cost | Output cost |
(tokens) | (per 1M tokens) | ||||
qvq-72b-preview | 32,768 | 16,384 Max 16,384 per image | 16,384 | $1.721 | $5.161 |
Qwen-Omni
A multimodal understanding and generation LLM trained from Qwen2.5. It understands text, image, audio, and video input, can stream text and speech responses simultaneously, and understands multimodal content significantly faster. Usage | API reference
International
In international deployment mode, endpoints and data storage are located in the Singapore region. Model inference compute resources are dynamically scheduled globally (excluding the Chinese Mainland).
Model | Context window | Max input | Max output | Free quota |
(tokens) | ||||
qwen2.5-omni-7b | 32,768 | 30,720 | 2,048 | 1 million tokens (modality agnostic) Valid for 90 days after Model Studio activation |
After free quota is exhausted, input and output follow these billing rules:
Chinese mainland
In Chinese Mainland deployment mode, endpoints and data storage are located in the Beijing region. Model inference compute resources are limited to the Chinese Mainland.
Model | Context window | Max input | Max output |
(tokens) | |||
qwen2.5-omni-7b | 32,768 | 30,720 | 2,048 |
Input and output follow these billing rules:
Qwen3-Omni-Captioner
Qwen3-Omni-Captioner is an open-source model based on Qwen3-Omni. Without any prompts, it automatically generates accurate, comprehensive descriptions for complex audio, ambient sounds, music, film sound effects, and more. It detects speaker emotions, musical elements (such as genre and instruments), and sensitive information, making it suitable for audio content analysis, security review, intent recognition, audio editing, and other fields. Usage | API reference
International
Under the international deployment mode, endpoints and data storage are located in the Singapore region. Model inference computing resources are dynamically scheduled globally (excluding the Chinese Mainland).
Model | Context window | Max input | Max output | Input cost | Output cost | Free quota |
(tokens) | (per 1M tokens) | |||||
qwen3-omni-30b-a3b-captioner | 65,536 | 32,768 | 32,768 | $3.81 | $3.06 | 1 million tokens Valid for 90 days after activating Model Studio |
Chinese Mainland
Under the Chinese Mainland deployment mode, endpoints and data storage are located in the Beijing region. Model inference computing resources are limited to the Chinese Mainland.
Model | Context window | Max input | Max output | Input cost | Output cost | Free quota |
(tokens) | (per 1M tokens) | |||||
qwen3-omni-30b-a3b-captioner | 65,536 | 32,768 | 32,768 | $2.265 | $1.821 | No free quota |
Qwen-VL
Alibaba Cloud's open-source Qwen-VL edition. Usage | API reference
Compared to Qwen2.5-VL, Qwen3-VL delivers major improvements in model capabilities:
Agent interaction: Operates computer and mobile interfaces. Detects GUI elements, understands their functions, and calls tools to complete tasks. Achieves top-tier performance on benchmarks such as OSWorld.
Visual coding: Generates code from images or videos. Converts design mockups and website screenshots into HTML, CSS, and JavaScript code.
Spatial intelligence: Supports 2D and 3D localization, and accurately determines object position, viewpoint changes, and occlusion relationships.
Long-video understanding: Understands videos up to 20 minutes in length and pinpoints moments down to the second.
Deep thinking: Performs deep reasoning. Excels at spotting fine details and analyzing cause-and-effect relationships. Achieves top-tier performance on benchmarks such as MathVista and MMMU.
OCR: Supports 33 languages. Delivers stable performance under challenging conditions such as low light, blur, and skew. Significantly improves accuracy for rare characters, ancient scripts, and domain-specific terms.
International
In international deployment mode, the endpoint and data storage are both located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide, excluding the Chinese Mainland.
Model | Mode | Context window | Max input | Max CoT | Max output | Input cost | Output cost CoT + output | Free quota |
(tokens) | (per 1M tokens) | |||||||
qwen3-vl-235b-a22b-thinking | Thinking only | 126,976 | 81,920 | $0.4 | $4 | 1 million tokens each Valid for 90 days after activating Model Studio | ||
qwen3-vl-235b-a22b-instruct | Non-thinking only | 129,024 | - | $1.6 | ||||
qwen3-vl-32b-thinking | Thinking only | 131,072 | 126,976 | 81,920 | 32,768 | $0.16 | $0.64 | |
qwen3-vl-32b-instruct | Non-thinking only | 129,024 | - | |||||
qwen3-vl-30b-a3b-thinking | Thinking only | 126,976 | 81,920 | $0.2 | $2.4 | |||
qwen3-vl-30b-a3b-instruct | Non-thinking only | 129,024 | - | $0.8 | ||||
qwen3-vl-8b-thinking | Thinking only | 126,976 | 81,920 | $0.18 | $2.1 | |||
qwen3-vl-8b-instruct | Non-thinking only | 129,024 | - | $0.7 | ||||
Global
In the Global deployment mode, endpoints and data storage are located in the US (Virginia) or Germany (Frankfurt) region, and model inference computing resources are dynamically scheduled globally.
Model | Mode | Context window | Max input | Max CoT | Max output | Input cost | Output cost CoT + output |
(tokens) | (per 1M tokens) | ||||||
qwen3-vl-235b-a22b-thinking | Thinking only | 126,976 | 81,920 | $0.287 | $2.867 | ||
qwen3-vl-235b-a22b-instruct | Non-thinking only | 129,024 | - | $1.147 | |||
qwen3-vl-32b-thinking | Thinking only | 131,072 | 126,976 | 81,920 | 32,768 | $0.16 | $0.64 |
qwen3-vl-32b-instruct | Non-thinking only | 129,024 | - | ||||
qwen3-vl-30b-a3b-thinking | Thinking only | 126,976 | 81,920 | $0.108 | $1.076 | ||
qwen3-vl-30b-a3b-instruct | Non-thinking only | 129,024 | - | $0.431 | |||
qwen3-vl-8b-thinking | Thinking only | 126,976 | 81,920 | $0.072 | $0.717 | ||
qwen3-vl-8b-instruct | Non-thinking only | 129,024 | - | $0.287 | |||
Chinese Mainland
In Chinese Mainland deployment mode, the endpoint and data storage are both located in the Beijing region. Model inference compute resources are limited to the Chinese Mainland.
Model | Mode | Context window | Max input | Max CoT | Max output | Input cost | Output cost CoT + output | Free quota |
(tokens) | (per 1M tokens) | |||||||
qwen3-vl-235b-a22b-thinking | Thinking only | 131,072 | 126,976 | 81,920 | $0.287 | $2.867 | No free quota | |
qwen3-vl-235b-a22b-instruct | Non-thinking only | 129,024 | - | $1.147 | ||||
qwen3-vl-32b-thinking | Thinking only | 131,072 | 126,976 | 81,920 | 32,768 | $0.287 | $2.868 | |
qwen3-vl-32b-instruct | Non-thinking only | 129,024 | - | $1.147 | ||||
qwen3-vl-30b-a3b-thinking | Thinking only | 126,976 | 81,920 | $0.108 | $1.076 | |||
qwen3-vl-30b-a3b-instruct | Non-thinking only | 129,024 | - | $0.431 | ||||
qwen3-vl-8b-thinking | Thinking only | 126,976 | 81,920 | $0.072 | $0.717 | |||
qwen3-vl-8b-instruct | Non-thinking only | 129,024 | - | $0.287 | ||||
Qwen-Math
Qwen-Math is a language model built on Qwen for solving math problems. Qwen2.5-Math supports Chinese and English and integrates multiple reasoning methods, such as Chain of Thought (CoT), Program of Thought (PoT), and Tool-Integrated Reasoning (TIR). How to use | API reference | Try it online
Only the Chinese Mainland deployment mode is supported. Endpoints and data storage are located in the Beijing region, and model inference computing resources are restricted to the Chinese Mainland.
Model | Context window | Max input | Max output | Input cost | Output cost |
(tokens) | (per 1M tokens) | ||||
qwen2.5-math-72b-instruct | 4,096 | 3,072 | 3,072 | $0.574 | $1.721 |
qwen2.5-math-7b-instruct | $0.144 | $0.287 | |||
qwen2.5-math-1.5b-instruct | Free for a limited time | ||||
Qwen-Coder
An open-source code model from Qwen. The latest Qwen3-Coder series delivers strong coding agent capabilities: it excels at tool calling and environment interaction, supports autonomous programming, and delivers outstanding coding performance while retaining broad general-purpose abilities. How to use | API reference
International
In the international deployment mode, endpoints and data storage are in the Singapore region. Model inference compute resources are scheduled dynamically across the globe, excluding the Chinese Mainland.
Model | Context window | Max input | Max output | Input cost | Output cost | Free quota |
(tokens) | ||||||
qwen3-coder-next | 262,144 | 204,800 | 65,536 | Tiered pricing. See details below. | 1 million tokens each Valid for 90 days after activating Model Studio | |
qwen3-coder-480b-a35b-instruct | ||||||
qwen3-coder-30b-a3b-instruct | ||||||
The models above use tiered billing based on the number of input tokens per request.
Model | Input tokens per request | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
qwen3-coder-next | 0 < Tokens ≤ 32K | $0.3 | $1.5 |
32K < Tokens ≤ 128K | $0.5 | $2.5 | |
128K < Tokens ≤ 256K | $0.8 | $4 | |
qwen3-coder-480b-a35b-instruct | 0 < Tokens ≤ 32K | $1.5 | $7.5 |
32K < Tokens ≤ 128K | $2.7 | $13.5 | |
128K < Tokens ≤ 200K | $4.5 | $22.5 | |
qwen3-coder-30b-a3b-instruct | 0 < Tokens ≤ 32K | $0.45 | $2.25 |
32K < Tokens ≤ 128K | $0.75 | $3.75 | |
128K < Tokens ≤ 200K | $1.2 | $6 |
Global
In the Global deployment mode, endpoints and data storage are located in the US (Virginia) or Germany (Frankfurt) region, and model inference computing resources are dynamically scheduled globally.
Model | Context window | Max input | Max output | Input cost | Output cost |
(tokens) | (per 1M tokens) | ||||
qwen3-coder-480b-a35b-instruct | 262,144 | 204,800 | 65,536 | Tiered pricing. See details below. | |
qwen3-coder-30b-a3b-instruct | |||||
qwen3-coder-480b-a35b-instruct and qwen3-coder-30b-a3b-instruct use tiered billing based on the number of input tokens per request.
Model | Input tokens per request | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
qwen3-coder-480b-a35b-instruct | 0 < Tokens ≤ 32K | $0.861 | $3.441 |
32K < Tokens ≤ 128K | $1.291 | $5.161 | |
128K < Tokens ≤ 200K | $2.151 | $8.602 | |
qwen3-coder-30b-a3b-instruct | 0 < Tokens ≤ 32K | $0.216 | $0.861 |
32K < Tokens ≤ 128K | $0.323 | $1.291 | |
128K < Tokens ≤ 200K | $0.538 | $2.151 |
Chinese Mainland
In the Chinese Mainland deployment mode, endpoints and data storage are in the Beijing region. Model inference compute resources are limited to the Chinese Mainland.
Model | Context window | Max input | Max output | Input cost | Output cost |
(tokens) | (per 1M tokens) | ||||
qwen3-coder-next | 262,144 | 204,800 | 65,536 | Tiered pricing. See details below. | |
qwen3-coder-480b-a35b-instruct | |||||
qwen3-coder-30b-a3b-instruct | |||||
The models above use tiered billing based on the number of input tokens per request.
Model | Input tokens per request | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
qwen3-coder-next | 0 < Tokens ≤ 32K | $0.144 | $0.574 |
32K < Tokens ≤ 128K | $0.216 | $0.861 | |
128K < Tokens ≤ 256K | $0.359 | $1.434 | |
qwen3-coder-480b-a35b-instruct | 0 < Tokens ≤ 32K | $0.861 | $3.441 |
32K < Tokens ≤ 128K | $1.291 | $5.161 | |
128K < Tokens ≤ 200K | $2.151 | $8.602 | |
qwen3-coder-30b-a3b-instruct | 0 < Tokens ≤ 32K | $0.216 | $0.861 |
32K < Tokens ≤ 128K | $0.323 | $1.291 | |
128K < Tokens ≤ 200K | $0.538 | $2.151 |
EU
In the EU deployment mode, endpoints and data storage are in the Germany (Frankfurt) region. Model inference compute resources are limited to the EU.
Model | Context window | Max input | Max output | Input cost | Output cost |
(tokens) | |||||
qwen3-coder-next | 262,144 | 204,800 | 65,536 | Tiered pricing. See details below. | |
The model above uses tiered billing based on the number of input tokens per request.
Input tokens per request | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
0 < Tokens ≤ 32K | $0.3 | $1.5 |
32K < Tokens ≤ 128K | $0.5 | $2.5 |
128K < Tokens ≤ 256K | $0.8 | $4 |
Text generation - Third-party models
DeepSeek
DeepSeek is a series of large language models from DeepSeek AI. API reference | Try it online
International
In international deployment mode, endpoints and data storage are located in the Singapore region. Model inference compute resources are dynamically scheduled globally, excluding the Chinese Mainland.
Model | Context window | Max input | Max CoT | Max response | Input cost | Output cost | Free quota |
(tokens) | (per 1M tokens) | ||||||
deepseek-v3.2 685B parameter size context cache | 131,072 | 98,304 | 32,768 | 65,536 | $0.57 | $1.71 | 1 million tokens Valid for 90 days after you activate Model Studio |
Chinese mainland
In Chinese mainland deployment mode, the endpoints and data storage are in the Beijing region. Model inference compute resources are limited to the Chinese mainland.
Model | Context window | Max input | Max CoT | Max response | Input cost | Output cost |
(tokens) | (per 1M tokens) | |||||
deepseek-v3.2 685B parameter size context cache batch calls | 131,072 | 98,304 | 32,768 | 65,536 | $0.287 | $0.431 |
deepseek-v3.2-exp 685B parameter size | ||||||
deepseek-v3.1 685B parameter size | $0.574 | $1.721 | ||||
deepseek-r1 685B parameter size batch calls | 16,384 | $2.294 | ||||
deepseek-r1-0528 685B parameter size | ||||||
deepseek-v3 671B parameter size batch calls | 131,072 | N/A | $0.287 | $1.147 | ||
deepseek-r1-distill-qwen-1.5b Based on Qwen2.5-Math-1.5B | 32,768 | 32,768 | 16,384 | 16,384 | Free trial for a limited time | |
deepseek-r1-distill-qwen-7b Based on Qwen2.5-Math-7B | $0.072 | $0.144 | ||||
deepseek-r1-distill-qwen-14b Based on Qwen2.5-14B | $0.144 | $0.431 | ||||
deepseek-r1-distill-qwen-32b Based on Qwen2.5-32B | $0.287 | $0.861 | ||||
deepseek-r1-distill-llama-8b Based on Llama-3.1-8B | Free trial for a limited time | |||||
deepseek-r1-distill-llama-70b Based on Llama-3.3-70B | ||||||
Kimi
Kimi-K2 is a large language model from Moonshot AI, with excellent coding and tool-calling capabilities. How to use | Try it online
Only the Chinese Mainland deployment mode is supported. Endpoints and data storage are located in the Beijing region, and model inference computing resources are restricted to the Chinese Mainland.
Model | Mode | Context window | Max input | Max CoT | Max response | Input cost | Output cost |
(tokens) | (per 1M tokens) | ||||||
kimi-k2.5 | Thinking mode | 262,144 | 258,048 | 81,920 | 98,304 | $0.574 | $3.011 |
Non-thinking mode | 262,144 | 260,096 | - | 98,304 | $0.574 | $3.011 | |
kimi-k2-thinking | Thinking mode | 262,144 | 229,376 | 32,768 | 16,384 | $0.574 | $2.294 |
Moonshot-Kimi-K2-Instruct | Non-thinking mode | 131,072 | 131,072 | - | 8,192 | $0.574 | $2.294 |
MiniMax
MiniMax-M2.5 is a large language model from MiniMax that focuses on complex, real-world tasks. Its core strengths are multilingual programming and agent task processing. How to use
Only the Chinese mainland deployment mode is supported. The endpoint and data storage are located in the Beijing region, and model inference compute resources are restricted to the Chinese mainland.
Model | Context window | Max input | Max CoT + response The thinking_budget parameter is not supported | Input cost | Output cost |
(tokens) | (per 1M tokens) | ||||
MiniMax-M2.5 | 196,608 | 196,601 | 32,768 | $0.304 | $1.213 |
GLM
The GLM series models are hybrid reasoning models from Zhipu AI designed for agents. They offer both thinking and non-thinking modes.
Only the Chinese mainland deployment mode is supported. The endpoint and data storage are located in the Beijing region, and model inference compute resources are restricted to the Chinese mainland.
Model | Context window | Max input | Max CoT | Max response | Input cost | Output cost |
(tokens) | (per 1M tokens) | |||||
glm-5 | 202,752 | 202,752 | 32,768 | 16,384 | Tiered billing applies. See the table below. | |
glm-4.7 | 169,984 | |||||
glm-4.6 | ||||||
These models use a tiered billing plan based on the number of input tokens per request.
Model | Input tokens per request | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
glm-5 | 0 < Tokens <= 32K | $0.573 | $2.58 |
32K < Tokens <= 198K | $0.86 | $3.154 | |
glm-4.7 | 0 < Tokens <= 32K | $0.431 | $2.007 |
32K < Tokens <= 166K | $0.574 | $2.294 | |
glm-4.6 | 0 < Tokens <= 32K | $0.431 | $2.007 |
32K < Tokens <= 166K | $0.574 | $2.294 |
The models above are not integrated third-party services. They are deployed on Alibaba Cloud Model Studio servers.
The thinking and non-thinking modes for GLM models have the same price.
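As a sketch of the tiered rule above, the helper below estimates one glm-5 request's cost. It assumes the tier chosen by input tokens per request also sets the output rate (the most direct reading of the table) and treats 32K as 32,000 tokens; the function name is illustrative, not part of any SDK.

```python
def glm5_request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate one glm-5 request's cost in USD under the tiered plan.

    The tier is selected by input tokens per request; the matching
    input and output rates (per 1M tokens) then apply. Illustrative
    helper only; actual charges follow your Model Studio bill.
    """
    if input_tokens <= 32_000:
        in_rate, out_rate = 0.573, 2.58    # 0 < tokens <= 32K tier
    else:
        in_rate, out_rate = 0.86, 3.154    # 32K < tokens <= 198K tier
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000
```

For example, a request with 40,000 input tokens falls in the second tier, so both its input and output are billed at the higher rates.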
Image generation
Qwen text-to-image
The Qwen text-to-image model excels at complex text rendering, especially for Chinese and English text. API reference
International
In international deployment mode, the endpoint and data storage are both located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding the Chinese mainland).
Model | Unit price | Free quota |
qwen-image-2.0-pro Currently has the same capabilities as qwen-image-2.0-pro-2026-03-03 | $0.075/image | Free quota for new users: 100 images each Validity: within 90 days after activating Model Studio |
qwen-image-2.0-pro-2026-03-03 | $0.075/image | |
qwen-image-2.0 Currently has the same capabilities as qwen-image-2.0-2026-03-03 | $0.035/image | |
qwen-image-2.0-2026-03-03 | $0.035/image | |
qwen-image-max Currently has the same capabilities as qwen-image-max-2025-12-30 | $0.075/image | |
qwen-image-max-2025-12-30 | $0.075/image | |
qwen-image-plus Currently has the same capabilities as qwen-image | $0.03/image | |
qwen-image-plus-2026-01-09 | $0.03/image | |
qwen-image | $0.035/image |
Chinese mainland
In Chinese mainland deployment mode, the endpoint and data storage are both located in the Beijing region. Model inference compute resources are limited to the Chinese mainland.
Model | Unit price | Free quota |
qwen-image-2.0-pro Currently has the same capabilities as qwen-image-2.0-pro-2026-03-03 | $0.071676/image | No free quota |
qwen-image-2.0-pro-2026-03-03 | $0.071676/image | |
qwen-image-2.0 Currently has the same capabilities as qwen-image-2.0-2026-03-03 | $0.028671/image | |
qwen-image-2.0-2026-03-03 | $0.028671/image | |
qwen-image-max Currently has the same capabilities as qwen-image-max-2025-12-30 | $0.071677/image | |
qwen-image-max-2025-12-30 | $0.071677/image | |
qwen-image-plus Currently has the same capabilities as qwen-image | $0.028671/image | |
qwen-image-plus-2026-01-09 | $0.028671/image | |
qwen-image | $0.035/image |
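Image models are billed at a flat per-image price, with any remaining free-quota images consumed first. The helper below is an illustrative sketch of that arithmetic, not an official SDK function.

```python
def image_batch_cost(n_images: int, unit_price: float,
                     free_quota_remaining: int = 0) -> float:
    """Cost in USD for a batch of image generations at a flat per-image
    price, spending remaining free-quota images first. Illustrative only."""
    billable = max(0, n_images - free_quota_remaining)
    return billable * unit_price
```

For instance, 120 images on a $0.03/image model with 100 free images remaining leaves 20 billable images, about $0.60.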
Input prompt | Output image |
Healing-style hand-drawn poster featuring three puppies playing with a ball on lush green grass, adorned with decorative elements such as birds and stars. The main title “Come Play Ball!” is prominently displayed at the top in bold, blue cartoon font. Below it, the subtitle “Come [Show Off Your Skills]!” appears in green font. A speech bubble adds playful charm with the text: “Hehe, watch me amaze my little friends next!” At the bottom, supplementary text reads: “We get to play ball with our friends again!” The color palette centers on fresh greens and blues, accented with bright pink and yellow tones to highlight a cheerful, childlike atmosphere. |
|
Qwen image editing
The Qwen image editing model supports precise bilingual (Chinese and English) text editing, color grading, detail enhancement, style transfer, object addition or removal, position changes, action modifications, and other operations to enable complex image-and-text editing. API reference
International
In international deployment mode, the endpoint and data storage are both located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding the Chinese mainland).
Model | Unit price | Free quota |
qwen-image-2.0-pro Currently has the same capabilities as qwen-image-2.0-pro-2026-03-03 | $0.075/image | Free quota for new users: 100 images each Validity: within 90 days after activating Model Studio |
qwen-image-2.0-pro-2026-03-03 | $0.075/image | |
qwen-image-2.0 Currently has the same capabilities as qwen-image-2.0-2026-03-03 | $0.035/image | |
qwen-image-2.0-2026-03-03 | $0.035/image | |
qwen-image-edit-max Currently has the same capabilities as qwen-image-edit-max-2026-01-16 | $0.075/image | |
qwen-image-edit-max-2026-01-16 | $0.075/image | |
qwen-image-edit-plus Currently has the same capabilities as qwen-image-edit-plus-2025-10-30 | $0.03/image | |
qwen-image-edit-plus-2025-12-15 | $0.03/image | |
qwen-image-edit-plus-2025-10-30 | $0.03/image | |
qwen-image-edit | $0.045/image |
Chinese mainland
In Chinese mainland deployment mode, the endpoint and data storage are both located in the Beijing region. Model inference compute resources are limited to the Chinese mainland.
Model | Unit price | Free quota |
qwen-image-2.0-pro Currently has the same capabilities as qwen-image-2.0-pro-2026-03-03 | $0.071676/image | No free quota |
qwen-image-2.0-pro-2026-03-03 | $0.071676/image | |
qwen-image-2.0 Currently has the same capabilities as qwen-image-2.0-2026-03-03 | $0.028671/image | |
qwen-image-2.0-2026-03-03 | $0.028671/image | |
qwen-image-edit-max Currently has the same capabilities as qwen-image-edit-max-2026-01-16 | $0.071677/image | |
qwen-image-edit-max-2026-01-16 | $0.071677/image | |
qwen-image-edit-plus Currently has the same capabilities as qwen-image-edit-plus-2025-10-30 | $0.028671/image | |
qwen-image-edit-plus-2025-12-15 | $0.028671/image | |
qwen-image-edit-plus-2025-10-30 | $0.028671/image | |
qwen-image-edit | $0.043/image |
Original image |
Change the person in the image to a standing pose, bending over to hold the dog's front paws |
Original image |
Replace the words 'HEALTH INSURANCE' on the letter blocks with 'Tomorrow will be better' |
Original image |
Replace the polka-dot shirt with a light blue shirt |
Original image |
Change the background in the image to Antarctica |
Original image |
Generate a cartoon profile picture of the person |
Original image |
Remove hair from the plate |
Qwen image translation
The Qwen image translation model supports translating text in images from 11 languages into Chinese or English. It accurately preserves the original layout and content and offers customizable features such as glossary definitions, sensitive word filtering, and image entity detection. API reference
Only the Chinese mainland deployment mode is supported. The endpoint and data storage are located in the Beijing region, and model inference compute resources are restricted to the Chinese mainland.
Model | Unit price | Free quota |
qwen-mt-image | $0.000431/image | No free quota |
Original image |
Japanese |
Portuguese |
Arabic |
Z-Image
Z-Image is a lightweight Tongyi text-to-image model that quickly generates high-quality images. It supports Chinese and English text rendering, complex semantic understanding, various styles, and multiple resolutions and aspect ratios. API reference
International
In international deployment mode, the endpoint and data storage are both located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding the Chinese mainland).
Model | Unit price | Free quota (Note) Valid for 90 days after activating Model Studio |
z-image-turbo | Prompt extension disabled: … Prompt extension enabled: … | 100 images |
Chinese mainland
In Chinese mainland deployment mode, the endpoint and data storage are both located in the Beijing region. Model inference compute resources are limited to the Chinese mainland.
Model | Unit price | Free quota |
z-image-turbo | Prompt extension disabled: … Prompt extension enabled: … | No free quota |
Input prompt | Output image |
Photo of a stylish young woman with short black hair standing confidently in front of a vibrant cartoon-style mural wall. She wears an all-black outfit: a puffed bomber jacket with a ruffled collar, cargo shorts, fishnet tights, and chunky black Doc Martens, with a gold chain dangling from her waist. The background features four colorful comic-style panels: one reads “GRAND STAGE” and includes sneakers and a Gatorade bottle; another displays green Nike sneakers and a slice of pizza; the third reads “HARAJUKU st” with floating shoes; and the fourth shows a blue mouse riding a skateboard with the text “Takeshita WELCOME.” Dominant bright colors include yellow, teal, orange, pink, and green. Speech bubbles, halftone patterns, and playful characters enhance the urban street-art aesthetic. Daylight evenly illuminates the scene, and the ground beneath her feet is white tiled pavement. Full-body portrait, centered composition, slightly tilted stance, direct eye contact with the camera. High detail, sharp focus, dynamic framing. |
Wan text-to-image
The Wan text-to-image model generates high-quality images from text. API reference | Try online
Global
In the Global deployment mode, endpoint and data storage are located in the US (Virginia) region or Germany (Frankfurt) region, and model inference computing resources are dynamically scheduled globally.
The wan2.6-t2i model supports only Global deployment mode in the US (Virginia) region.
Model | Description | Unit price | Free quota (Note) Valid for 90 days after activating Model Studio |
wan2.6-t2i | Wan 2.6. Supports new synchronous interfaces and lets you freely select dimensions within the constraints of total pixel area and aspect ratio. | $0.028671/image | No free quota |
International
In international deployment mode, the endpoint and data storage are both located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding the Chinese mainland).
Model | Description | Unit price | Free quota (Note) Valid for 90 days after activating Model Studio |
wan2.6-t2i | Wan 2.6. Supports new synchronous interfaces and lets you freely select dimensions within the constraints of total pixel area and aspect ratio. | $0.03/image | 50 images |
wan2.5-t2i-preview | Wan 2.5 preview. Removes single-side length limits and lets you freely select dimensions within the constraints of total pixel area and aspect ratio. | $0.03/image | 50 images |
wan2.2-t2i-plus | Wan 2.2 Professional Edition. Fully upgraded in creativity, stability, and realistic texture. | $0.05/image | 100 images |
wan2.2-t2i-flash | Wan 2.2 Flash Edition. Fully upgraded in creativity, stability, and realistic texture. | $0.025/image | 100 images |
wan2.1-t2i-plus | Wan 2.1 Professional Edition. Supports multiple styles and generates images with rich details. | $0.05/image | 200 images |
wan2.1-t2i-turbo | Wan 2.1 Turbo Edition. Supports multiple styles and offers fast generation speed. | $0.025/image | 200 images |
Chinese mainland
In Chinese mainland deployment mode, the endpoint and data storage are both located in the Beijing region. Model inference compute resources are limited to the Chinese mainland.
Model | Description | Unit price | Free quota (Note) Valid for 90 days after activating Model Studio |
wan2.6-t2i | Wan 2.6. Supports new synchronous interfaces and lets you freely select dimensions within the constraints of total pixel area and aspect ratio. | $0.028671/image | No free quota |
wan2.5-t2i-preview | Wan 2.5 preview. Removes single-side length limits and lets you freely select dimensions within the constraints of total pixel area and aspect ratio. | $0.028671/image | No free quota |
wan2.2-t2i-plus | Wan 2.2 Professional Edition. Fully upgraded in creativity, stability, and realistic texture. | $0.02007/image | No free quota |
wan2.2-t2i-flash | Wan 2.2 Flash Edition. Fully upgraded in creativity, stability, and realistic texture. | $0.028671/image | No free quota |
wanx2.1-t2i-plus | Wan 2.1 Professional Edition. Supports multiple styles and generates images with rich details. | $0.028671/image | No free quota |
wanx2.1-t2i-turbo | Wan 2.1 Turbo Edition. Supports multiple styles and offers fast generation speed. | $0.020070/image | No free quota |
wanx2.0-t2i-turbo | Wan 2.0 Turbo Edition. Excels at textured portraits and creative designs. It is cost-effective. | $0.005735/image | No free quota |
Input prompt | Output image |
A needle-felted Santa Claus holding a gift, standing next to a white cat, with many colorful presents in the background. The entire scene should be cute, warm, and cozy, with some green plants in the background. |
|
Wan image generation and editing 2.6
The Wan image generation model supports image editing and mixed text-and-image output to meet diverse generation and integration needs. API reference
Global
In the Global deployment mode, endpoint and data storage are located in the US (Virginia) region or Germany (Frankfurt) region, and model inference computing resources are dynamically scheduled globally.
The wan2.6-image model supports only Global deployment mode in the US (Virginia) region.
Model | Unit price | Free quota |
wan2.6-image | $0.028671/image | No free quota |
International
In international deployment mode, the endpoint and data storage are both located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding the Chinese mainland).
Model | Unit price | Free quota (Note) Valid for 90 days after activating Model Studio |
wan2.6-image | $0.03/image | 50 images |
Chinese mainland
In Chinese mainland deployment mode, the endpoint and data storage are both located in the Beijing region. Model inference compute resources are limited to the Chinese mainland.
Model | Unit price | Free quota |
wan2.6-image | $0.028671/image | No free quota |
Wan general image editing 2.5
The Wan general image editing 2.5 model supports inputting text, a single image, or multiple images to perform subject-consistent image editing and multi-image fusion creation. API reference
International
In international deployment mode, the endpoint and data storage are both located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding the Chinese mainland).
Model | Unit price | Free quota (Note) Valid for 90 days after activating Model Studio |
wan2.5-i2i-preview | $0.03/image | 50 images |
Chinese mainland
In Chinese mainland deployment mode, the endpoint and data storage are both located in the Beijing region. Model inference compute resources are limited to the Chinese mainland.
Model | Unit price | Free quota |
wan2.5-i2i-preview | $0.028671/image | No free quota |
Model capabilities | Input example | Output image |
Single-image editing |
|
Replace the floral dress with a vintage-style lace gown featuring delicate embroidery on the collar and cuffs. |
Multi-image fusion |
|
Place the alarm clock from image 1 beside the vase on the dining table in image 2. |
Wan general image editing 2.1
The Wan general image editing model supports diverse image editing tasks using simple instructions. Use it for image outpainting, watermark removal, style transfer, image inpainting, and image enhancement. Usage | API reference
Only the Chinese mainland deployment mode is supported. The endpoint and data storage are located in the Beijing region, and model inference compute resources are restricted to the Chinese mainland.
Model | Billing rate | Free quota |
wanx2.1-imageedit | $0.020070 per image | No free quota |
The general image editing model currently supports the following features:
Model feature | Input image | Input prompt | Output image |
Global stylization |
| Transform into a French picture book style. |
|
Local stylization |
| Turn the house into a wooden plank style. |
|
Instruction-based editing |
| Change the girl's hair to red. |
|
Local redraw | Input image
Masked area image (white indicates the masked area)
| A ceramic rabbit holding a ceramic flower. | Output image
|
Text watermark removal |
| Remove text from the image. |
|
Image outpainting |
| A green fairy. |
|
Image super resolution | Blurred image
| Apply super resolution. | Sharp image
|
Image colorization |
| Blue background with yellow leaves. |
|
Sketch-to-image |
| A Nordic minimalist living room. |
|
Reference image |
| A cartoon character cautiously peeks out, gazing at a brilliant blue gem inside the room. |
|
OutfitAnyone
The OutfitAnyone Plus model improves image definition, clothing texture detail, and logo fidelity compared to the Basic Edition. However, it takes longer to generate results. Use it for scenarios where speed is not critical. API reference | Try it online
OutfitAnyone Image Segmentation segments model and clothing images. Use it for pre-processing and post-processing of OutfitAnyone images. API reference
Only the Chinese mainland deployment mode is supported. The endpoint and data storage are located in the Beijing region, and model inference compute resources are restricted to the Chinese mainland.
Model | Description | Sample input | Sample output |
aitryon-plus | OutfitAnyone Plus |
|
|
aitryon-parsing-v1 | OutfitAnyone Image Segmentation |
OutfitAnyone billing rate
Model service | Model | Unit price | Discount | Tier |
OutfitAnyone Plus | aitryon-plus | $0.071677 per image | None | None |
OutfitAnyone Image Segmentation | aitryon-parsing-v1 | $0.000574 per image | None | None |
Video generation – Wan
Text-to-video
The Wan text-to-video model generates videos from a single sentence. Videos feature rich artistic styles and cinematic-quality visuals. API reference | Try it now
Global
In the Global deployment mode, endpoint and data storage are located in the US (Virginia) region or Germany (Frankfurt) region, and model inference computing resources are dynamically scheduled globally.
The wan2.6-t2v model supports only global deployment mode in the US (Virginia) region.
Model | Description | Unit price | Free quota |
wan2.6-t2v | Wan 2.6. Introduces multi-shot narrative capability and supports automatic voiceover and custom audio file input. | 720P: $0.086012/second 1080P: $0.143353/second | No free quota |
International
In international deployment mode, both the access point and data storage are located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding the Chinese mainland).
Model | Description | Unit price | Free quota (Claim) Valid for 90 days after activating Model Studio |
wan2.6-t2v | Wan 2.6. Introduces multi-shot narrative capability and supports automatic voiceover and custom audio file input. | 720P: $0.10/second 1080P: $0.15/second | 50 seconds |
wan2.5-t2v-preview | Wan 2.5 preview. Supports automatic voiceover and custom audio file input. | 480P: $0.05/second 720P: $0.10/second 1080P: $0.15/second | 50 seconds |
wan2.2-t2v-plus | Wan 2.2 Professional Edition. Significantly improved image detail and motion stability. | 480P: $0.02/second 1080P: $0.10/second | 50 seconds |
wan2.1-t2v-turbo | Wan 2.1 Turbo Edition. Fast generation speed and balanced performance. | $0.036/second | 200 seconds |
wan2.1-t2v-plus | Wan 2.1 Professional Edition. Generates rich details and higher-quality visuals. | $0.10/second | 200 seconds |
US
In US deployment mode, both the access point and data storage are located in the US (Virginia) region. Model inference compute resources are limited to the United States.
Model | Description | Unit price | Free quota |
wan2.6-t2v-us | Wan 2.6. Introduces multi-shot narrative capability and supports automatic voiceover and custom audio file input. | 720P: $0.10/second 1080P: $0.15/second | No free quota |
Chinese mainland
In Chinese mainland deployment mode, both the access point and data storage are located in the Beijing region. Model inference compute resources are limited to the Chinese mainland.
Model | Description | Unit price | Free quota |
wan2.6-t2v | Wan 2.6. Introduces multi-shot narrative capability and supports automatic voiceover and custom audio file input. | 720P: $0.086012/second 1080P: $0.143353/second | No free quota |
wan2.5-t2v-preview | Wan 2.5 preview. Supports automatic voiceover and custom audio file input. | 480P: $0.043006/second 720P: $0.086012/second 1080P: $0.143353/second | No free quota |
wan2.2-t2v-plus | Wan 2.2 Professional Edition. Significantly improved image detail and motion stability. | 480P: $0.02007/second 1080P: $0.100347/second | No free quota |
wanx2.1-t2v-turbo | Wan 2.1 Turbo Edition. Fast generation speed and balanced performance. | $0.034405/second | No free quota |
wanx2.1-t2v-plus | Wan 2.1 Professional Edition. Generates rich details and higher-quality visuals. | $0.100347/second | No free quota |
Input prompt | Output video (wan2.6, multi-shot video) |
Shot from a low angle, in a medium close-up, with warm tones, mixed lighting (the practical light from the desk lamp blends with the overcast light from the window), side lighting, and a central composition. In a classic detective office, wooden bookshelves are filled with old case files and ashtrays. A green desk lamp illuminates a case file spread out in the center of the desk. A fox, wearing a dark brown trench coat and a light gray fedora, sits in a leather chair, its fur crimson, its tail resting lightly on the edge, its fingers slowly turning yellowed pages. Outside, a steady drizzle falls beneath a blue sky, streaking the glass with meandering streaks. It slowly raises its head, its ears twitching slightly, its amber eyes gazing directly at the camera, its mouth clearly moving as it speaks in a smooth, cynical voice: 'The case was cold, colder than a fish in winter. But every chicken has its secrets, and I, for one, intended to find them'. |
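Video generation is billed per output second at a resolution-dependent rate. The sketch below uses the international wan2.6-t2v rates from the table above; the dictionary and function names are illustrative, not SDK APIs.

```python
# International per-second rates for wan2.6-t2v, taken from the table above.
WAN26_T2V_RATE = {"720P": 0.10, "1080P": 0.15}

def t2v_cost(resolution: str, seconds: float) -> float:
    """Estimate one wan2.6-t2v generation's cost in international mode.
    Illustrative helper only; actual charges follow your Model Studio bill."""
    return WAN26_T2V_RATE[resolution] * seconds
```

A 5-second 1080P clip, for example, is billed at 5 × $0.15 = $0.75.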
Image-to-video – first frame
The Wan image-to-video model uses your input image as the first frame, then generates a video based on your prompt. Videos feature rich artistic styles and cinematic-quality visuals. API reference | Try it now
Global
In the Global deployment mode, endpoint and data storage are located in the US (Virginia) region or Germany (Frankfurt) region, and model inference computing resources are dynamically scheduled globally.
The wan2.6-i2v model supports only global deployment mode in the US (Virginia) region.
Model | Description | Unit price | Free quota |
wan2.6-i2v | Wan 2.6. Introduces multi-shot narrative capability and supports automatic voiceover and custom audio file input. | 720P: $0.086012/second 1080P: $0.143353/second | No free quota |
International
In international deployment mode, both the access point and data storage are located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding the Chinese mainland).
Model | Description | Unit price | Free quota (Note) Valid for 90 days after activating Model Studio |
wan2.6-i2v-flash | Wan 2.6. Introduces multi-shot narrative capability and supports automatic voiceover and custom audio file input. | Output video with audio: … Output video without audio: … | 50 seconds |
wan2.6-i2v | Wan 2.6. Introduces multi-shot narrative capability and supports automatic voiceover and custom audio file input. | 720P: $0.10/second 1080P: $0.15/second | 50 seconds |
wan2.5-i2v-preview | Wan 2.5 preview. Supports automatic dubbing and custom audio file uploads. | 480P: $0.05/second 720P: $0.10/second 1080P: $0.15/second | 50 seconds |
wan2.2-i2v-flash | Wan 2.2 Flash Edition. Extremely fast generation speed with significant improvements in visual detail and motion stability. | 480P: $0.015/second 720P: $0.036/second | 50 seconds |
wan2.2-i2v-plus | Wan 2.2 Professional Edition. Delivers significant improvements in visual detail and motion stability. | 480P: $0.02/second 1080P: $0.10/second | 50 seconds |
wan2.1-i2v-turbo | Wan 2.1 Turbo Edition. Fast generation speed with balanced performance. | $0.036/second | 200 seconds |
wan2.1-i2v-plus | Wan 2.1 Professional Edition. Generates rich details and produces higher-quality, more textured visuals. | $0.10/second | 200 seconds |
US
In US deployment mode, both the access point and data storage are located in the US (Virginia) region. Model inference compute resources are limited to the United States.
Model | Description | Unit price | Free quota |
wan2.6-i2v-us | Wan 2.6. Introduces multi-shot narrative capability and supports automatic voiceover and custom audio file input. | 720P: $0.10/second 1080P: $0.15/second | No free quota |
Chinese mainland
In Chinese mainland deployment mode, both the access point and data storage are located in the Beijing region. Model inference compute resources are limited to the Chinese mainland.
Model | Description | Unit price | Free quota |
wan2.6-i2v-flash | Wan 2.6. Introduces multi-shot narrative capability and supports automatic voiceover and custom audio file input. | Output video with audio: … Output video without audio: … | No free quota |
wan2.6-i2v | Wan 2.6. Introduces multi-shot narrative capability and supports automatic voiceover and custom audio file input. | 720P: $0.086012/second 1080P: $0.143353/second | No free quota |
wan2.5-i2v-preview | Wan 2.5 preview. Supports automatic dubbing and custom audio file uploads. | 480P: $0.043006/second 720P: $0.086012/second 1080P: $0.143353/second | No free quota |
wan2.2-i2v-plus | Wan 2.2 Professional Edition. Delivers significant improvements in visual detail and motion stability. | 480P: $0.02007/second 1080P: $0.100347/second | No free quota |
wanx2.1-i2v-turbo | Wan 2.1 Turbo Edition. Fast generation speed with balanced performance. | $0.034405/second | No free quota |
wanx2.1-i2v-plus | Wan 2.1 Professional Edition. Generates rich details and produces higher-quality, more textured visuals. | $0.100347/second | No free quota |
Input prompt | Input first-frame image and audio | Output video (wan2.6, multi-shot video) |
A scene of urban fantasy art. A dynamic graffiti-style character. A boy painted with spray paint comes alive from a concrete wall. He raps in English at high speed while striking a classic, energetic rapper pose. The setting is under a railway bridge in an urban area at night. Lighting comes from a single streetlamp, creating a cinematic atmosphere full of high energy and stunning detail. The video's audio consists entirely of his rap, with no other dialogue or noise. |
Input audio: |
Image-to-video – first and last frame
The Wan first-and-last-frame image-to-video model generates smooth, fluid videos using just two images—the first and last frames—plus your prompt. Videos feature rich artistic styles and cinematic-quality visuals. API reference | Try it now
International
In international deployment mode, both the access point and data storage are located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding the Chinese mainland).
Model | Unit price | Free quota (Note) Valid for 90 days after activating Model Studio |
wan2.2-kf2v-flash | 480P: $0.015/second 720P: $0.036/second 1080P: $0.07/second | 50 seconds |
wan2.1-kf2v-plus | $0.10/second | 200 seconds |
Chinese mainland
In Chinese mainland deployment mode, both the access point and data storage are located in the Beijing region. Model inference compute resources are limited to the Chinese mainland.
Model | Unit price | Free quota (Note) |
wan2.2-kf2v-flash | 480P: $0.014335/second 720P: $0.028671/second 1080P: $0.068809/second | No free quota |
wanx2.1-kf2v-plus | $0.100347/second | No free quota |
Example input | Output video | ||
First-frame image | Last-frame image | Prompt | |
|
| Realistic style. A black kitten curiously looks up at the sky. The camera starts level, rises gradually, and ends with a top-down view of the kitten’s curious expression. | |
Reference-to-video
The Wan reference-to-video model lets you generate performance videos using characters and voices from reference videos or images. API reference
Billing rule: Both input and output videos are billed by video duration in seconds. Failed requests are not billed and do not consume your free quota.
Input video duration is capped at 5 seconds. See Wan reference-to-video for details.
Output video duration equals the duration of successfully generated video.
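The billing rule above can be sketched as follows: input and output video are both billed by duration at the model's per-second rate, and input video may not exceed 5 seconds. The helper is illustrative only.

```python
def r2v_cost(input_seconds: float, output_seconds: float,
             rate_per_second: float) -> float:
    """Estimate a reference-to-video request's cost: input and output
    durations are both billed at the same per-second rate, and input
    video duration is capped at 5 seconds. Illustrative helper only."""
    if input_seconds > 5:
        raise ValueError("input video duration is capped at 5 seconds")
    return (input_seconds + output_seconds) * rate_per_second
```

For example, a 5-second 720P reference video plus a 10-second output at wan2.6-r2v's $0.086012/second rate bills 15 seconds in total.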
Global
In the Global deployment mode, endpoint and data storage are located in the US (Virginia) region or Germany (Frankfurt) region, and model inference computing resources are dynamically scheduled globally.
The wan2.6-r2v model supports only global deployment mode in the US (Virginia) region.
Model | Output video type | Input & output price | Free quota (Note) |
wan2.6-r2v | Video with audio | 720P: $0.086012/second 1080P: $0.143353/second | No free quota |
International
In international deployment mode, both the access point and data storage are located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding the Chinese mainland).
Model | Output video type | Input & output price | Free quota (Note) |
wan2.6-r2v-flash | Video with audio | 720P: $0.05/second 1080P: $0.075/second | 50 seconds Valid for 90 days after activating Model Studio |
| Video without audio | 720P: $0.025/second 1080P: $0.0375/second | |
wan2.6-r2v | Video with audio | 720P: $0.10/second 1080P: $0.15/second | 50 seconds Valid for 90 days after activating Model Studio |
Chinese mainland
In Chinese mainland deployment mode, both the access point and data storage are located in the Beijing region. Model inference compute resources are limited to the Chinese mainland.
Model | Output video type | Input & output price | Free quota (Note) |
wan2.6-r2v-flash | Video with audio | 720P: $0.043006/second 1080P: $0.071676/second | No free quota |
| Video without audio | 720P: $0.021503/second 1080P: $0.035838/second | |
wan2.6-r2v | Video with audio | 720P: $0.086012/second 1080P: $0.143353/second | No free quota |
General video editing
The Wan general video editing unified model accepts multimodal inputs—including text, images, and videos—and performs both video generation and general editing tasks. API reference | Try it now
International
In international deployment mode, both the access point and data storage are located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding the Chinese mainland).
Model | Unit price | Free quota (Note) |
wan2.1-vace-plus | $0.1/second | 50 seconds Valid for 90 days after activating Model Studio |
Chinese mainland
In Chinese mainland deployment mode, endpoints and data storage are located in the Beijing region. Model inference compute resources are limited to the Chinese mainland.
Model | Unit price | Free quota (Note) |
wanx2.1-vace-plus | $0.100347/second | No free quota |
The unified video editing model supports these features:
Feature | Input reference image | Input prompt | Output video |
Multi-image reference | Reference image 1 (for entity) Reference image 2 (for background) | In the video, a girl gracefully walks out from the depths of an ancient, misty forest. Her steps are light, and the camera captures her every nimble movement. When the girl stops and looks around at the lush woods, she breaks into a smile of surprise and joy. This moment is captured in the interplay of light and shadow, recording the wonderful encounter between the girl and nature. | Output video |
Video restyling | The video shows a black steampunk-style car driven by a gentleman, adorned with gears and copper pipes. The background is a steam-powered candy factory with retro elements, creating a vintage and playful scene. | ||
Local editing | Input video Input mask image (the white area indicates the editing region) | The video shows a Parisian-style French cafe where a lion in a suit is elegantly sipping coffee. It holds a coffee cup in one hand, drinking with a look of contentment. The cafe is tastefully decorated, with soft tones and warm lighting illuminating the area where the lion is. | The content in the editing region is modified based on the prompt |
Video extension | Input initial video segment (1 second) | A dog wearing sunglasses skateboards on a street, 3D cartoon. | Output extended video (5 seconds) |
Video outpainting | An elegant lady is passionately playing the violin, with a full symphony orchestra behind her. |
Wan – digital human
Generate natural talking, singing, or performing videos from a single portrait image and audio. Call the following models in order. wan2.2-s2v image detection | wan2.2-s2v video generation
Only the Chinese mainland deployment mode is supported. Endpoints and data storage are located in the Beijing region, and model inference compute resources are limited to the Chinese mainland.
Model | Model description | Unit price |
wan2.2-s2v-detect | Checks whether the input image meets requirements (such as clarity, single person, front-facing). | $0.000574/image |
wan2.2-s2v | Generates a dynamic portrait video from a validated image and an audio clip. | 480P: $0.071677/second 720P: $0.129018/second |
Example input | Output video |
Input audio: |
Wan – image-to-action
Offers standard and professional service modes. Transfers the actions and expressions of the subject in a reference video to a portrait image, generating a dynamic action video. API reference
International
In international deployment mode, endpoints and data storage are located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding the Chinese mainland).
Model | Model service | Service description | Unit price | Free quota (View) |
wan2.2-animate-move | Standard mode | Fast generation; suits lightweight needs such as basic animation demos; highly cost-effective. | $0.12/second | 50 seconds total for both modes |
Professional mode | Smooth animation with natural transitions between actions and expressions; results closely resemble real footage. | $0.18/second |
Chinese mainland
In Chinese mainland deployment mode, endpoints and data storage are located in the Beijing region. Model inference compute resources are limited to the Chinese mainland.
Model | Model service | Service description | Unit price | Free quota (View) |
wan2.2-animate-move | Standard mode | Fast generation; suits lightweight needs such as basic animation demos; highly cost-effective. | $0.06/second | No free quota |
Professional mode | Smooth animation with natural transitions between actions and expressions; results closely resemble real footage. | $0.09/second |
Portrait image | Reference video | Output video (standard mode) | Output video (professional mode) |
Wan – video character swap
Offers standard and professional service modes. Replaces the main subject in a reference video with the person from a portrait image, while preserving the original video’s scene, lighting, and hue. API reference
International
In international deployment mode, endpoints and data storage are located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding the Chinese mainland).
Model | Model service | Service description | Unit price | Free quota (View) |
wan2.2-animate-mix | Standard mode | Fast generation; suits lightweight needs such as basic animation demos; highly cost-effective. | $0.18/second | 50 seconds total for both modes |
Professional mode | Smooth animation with natural transitions between actions and expressions; results closely resemble real footage. | $0.26/second |
Chinese mainland
In Chinese mainland deployment mode, endpoints and data storage are located in the Beijing region. Model inference compute resources are limited to the Chinese mainland.
Model | Model service | Service description | Unit price | Free quota (View) |
wan2.2-animate-mix | Standard mode | Fast generation; suits lightweight needs such as basic animation demos; highly cost-effective. | $0.09/second | No free quota |
Professional mode | Smooth animation with natural transitions between actions and expressions; results closely resemble real footage. | $0.13/second |
Portrait image | Reference video | Output video (standard mode) | Output video (professional mode) |
AnimateAnyone
Generate action videos from a portrait image and action templates. Call the following three models in order. AnimateAnyone image detection API details | AnimateAnyone action template generation | AnimateAnyone video generation API details
Only the Chinese mainland deployment mode is supported. Endpoints and data storage are located in the Beijing region, and model inference compute resources are limited to the Chinese mainland.
Model | Description | Unit price |
animate-anyone-detect-gen2 | Checks whether the input image meets requirements | $0.000574/image |
animate-anyone-template-gen2 | Extracts human motion from a motion video and generates an action template | $0.011469/second |
animate-anyone-gen2 | Generates an action video from a portrait image and an action template |
Input: Portrait image | Input: Action video | Output (generated against image background) | Output (generated against video background) |
The examples above were generated by an app that integrates AnimateAnyone.
AnimateAnyone generates only video frames—not audio.
EMO
Generate dynamic portrait videos from a portrait image and a human voice audio file. Call the following models in order. EMO image detection | EMO video generation
Only the Chinese mainland deployment mode is supported. Endpoints and data storage are located in the Beijing region, and model inference compute resources are limited to the Chinese mainland.
Model | Description | Unit price |
emo-detect-v1 | Checks whether the input image meets requirements. No deployment needed. Call directly. | $0.000574/image |
emo-v1 | Generates dynamic portrait videos. No deployment needed. Call directly. |
Input: Portrait image + human voice audio file | Output: Dynamic portrait video |
Portrait:
Human voice audio: See video on the right | Portrait video: Animation style intensity: Active ("style_level": "active") |
LivePortrait
Quickly and efficiently generate dynamic portrait videos from a portrait image and a human voice audio file. Compared to EMO, LivePortrait offers faster generation and lower cost—but slightly lower quality. Call the following two models in order. LivePortrait image detection | LivePortrait video generation
Only the Chinese mainland deployment mode is supported. Endpoints and data storage are located in the Beijing region, and model inference compute resources are limited to the Chinese mainland.
Model | Description | Unit price |
liveportrait-detect | Checks whether the input image meets requirements | $0.000574/image |
liveportrait | Generates dynamic portrait videos | $0.002868/second |
Input: Portrait image + human voice audio file | Output: Dynamic portrait video |
Portrait:
Human voice audio: See video on the right | Portrait video: |
Emoji
Generate dynamic facial videos from a face image and preset facial motion templates. Use cases include emoji creation and video asset generation. Call the following models in order. Emoji image detection | Emoji video generation
Only the Chinese mainland deployment mode is supported. Endpoints and data storage are located in the Beijing region, and model inference compute resources are limited to the Chinese mainland.
Model | Description | Unit price |
emoji-detect-v1 | Checks whether the input image meets requirements | $0.000574/image |
emoji-v1 | Generates matching facial expressions from a portrait image and a specified emoji template | $0.011469/second |
Input: Portrait image | Output: Dynamic portrait video |
| Template sequence for “happy” expression: ("input.driven_id": "mengwa_kaixin") |
VideoRetalk
Generate new videos where the subject’s lip movements match the input audio. Call the following model. API reference
Only the Chinese mainland deployment mode is supported. Endpoints and data storage are located in the Beijing region, and model inference compute resources are limited to the Chinese mainland.
Model | Description | Unit price |
videoretalk | Generates a new video where the subject’s lip movements match the input audio | $0.011469/second |
Video style transfer
Generate videos in different styles based on text input—or apply style transfer to input videos. API reference
Only the Chinese mainland deployment mode is supported. Endpoints and data storage are located in the Beijing region, and model inference compute resources are limited to the Chinese mainland.
Model | Description | Resolution | Unit price |
video-style-transform | Converts input videos into Japanese manga, American comic, or other styles | 720P | $0.071677/second |
| | 540P | $0.028671/second |
Input video | Output video (Japanese manga style) |
Speech synthesis (text-to-speech)
Qwen speech synthesis
Supports mixed-language text input and streaming audio output. Usage | API reference
International
In the international deployment mode, the endpoints and data storage are located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding Chinese Mainland).
Qwen3-TTS-Instruct-Flash
Model | Version | Unit price | Max input characters | Free quota (Note) |
qwen3-tts-instruct-flash Currently, qwen3-tts-instruct-flash-2026-01-26. | Stable | $0.115/10K characters | 600 | 10,000 characters Valid for 90 days after activating Model Studio |
qwen3-tts-instruct-flash-2026-01-26 | Snapshot |
Supported languages: Chinese (Mandarin), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese
Character calculation rules: Billing is based on the number of input characters. The rules are as follows:
One Chinese character (including simplified/traditional Chinese, Japanese Kanji, and Korean Hanja) = 2 characters
Other characters, such as an English letter, a punctuation mark, or a space = 1 character
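The character-counting rule above can be expressed in a few lines of Python. This is an illustrative sketch only: the Unicode ranges used to detect CJK ideographs are an approximation, not the service's exact definition.

```python
def billable_characters(text: str) -> int:
    """Count billable characters for TTS pricing: CJK ideographs
    (including Japanese Kanji and Korean Hanja) count as 2, every other
    character (letters, punctuation, spaces) counts as 1."""
    total = 0
    for ch in text:
        cp = ord(ch)
        # Approximate CJK-ideograph test: Unified Ideographs, Extension A,
        # and Compatibility Ideographs. The service's exact ranges may differ.
        if 0x4E00 <= cp <= 0x9FFF or 0x3400 <= cp <= 0x4DBF or 0xF900 <= cp <= 0xFAFF:
            total += 2
        else:
            total += 1
    return total

print(billable_characters("Hello, 世界!"))  # 8 non-CJK + 2 ideographs x 2 = 12
```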
Qwen3-TTS-VD
Model | Version | Unit price | Max input characters | Free quota (Note) |
qwen3-tts-vd-2026-01-26 | Snapshot | $0.115/10K characters | 600 | 10,000 characters Valid for 90 days after activating Model Studio |
Supported languages: Chinese (Mandarin), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese
Character calculation rules: Billing is based on the number of input characters. The rules are as follows:
One Chinese character (including simplified/traditional Chinese, Japanese Kanji, and Korean Hanja) = 2 characters
Other characters, such as an English letter, a punctuation mark, or a space = 1 character
Qwen3-TTS-VC
Model | Version | Unit price | Max input characters | Free quota (Note) |
qwen3-tts-vc-2026-01-22 | Snapshot | $0.115/10K characters | 600 | 10,000 characters Valid for 90 days after activating Model Studio. |
Supported languages: Chinese (Mandarin), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese
Character calculation rules: Billing is based on the number of input characters. The rules are as follows:
One Chinese character (including simplified/traditional Chinese, Japanese Kanji, and Korean Hanja) = 2 characters
Other characters, such as an English letter, a punctuation mark, or a space = 1 character
Qwen3-TTS-Flash
Model | Version | Unit price | Max input characters | Free quota (Note) |
qwen3-tts-flash Currently, qwen3-tts-flash-2025-11-27. | Stable | $0.10/10K characters | 600 | 10,000 characters Valid for 90 days after activating Model Studio |
qwen3-tts-flash-2025-11-27 | Snapshot | |||
qwen3-tts-flash-2025-09-18 | Snapshot | If you activate Alibaba Cloud Model Studio before 00:00 on November 13, 2025: 2,000 characters If you activate Alibaba Cloud Model Studio after 00:00 on November 13, 2025: 10,000 characters Valid for 90 days after activating Model Studio. |
Supported languages: Chinese (Mandarin, Beijing, Shanghai, Sichuan, Nanjing, Shaanxi, Minnan, Tianjin, Cantonese), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese
Character calculation rules: Billing is based on the number of input characters. The rules are as follows:
One Chinese character (including simplified/traditional Chinese, Japanese Kanji, and Korean Hanja) = 2 characters
Other characters, such as an English letter, a punctuation mark, or a space = 1 character
Chinese Mainland
In the Chinese Mainland deployment mode, the endpoints and data storage are located in the Beijing region. Model inference compute resources are limited to Chinese Mainland.
Qwen3-TTS-Instruct-Flash
Model | Version | Unit price | Max input characters | Free quota (Note) |
qwen3-tts-instruct-flash Currently, qwen3-tts-instruct-flash-2026-01-26. | Stable | $0.115/10K characters | 600 | No free quota is available. |
qwen3-tts-instruct-flash-2026-01-26 | Snapshot |
Supported languages: Chinese (Mandarin), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese
Character calculation rules: Billing is based on the number of input characters. The rules are as follows:
One Chinese character (including simplified/traditional Chinese, Japanese Kanji, and Korean Hanja) = 2 characters
Other characters, such as an English letter, a punctuation mark, or a space = 1 character
Qwen3-TTS-VD
Model | Version | Unit price | Max input characters | Free quota (Note) |
qwen3-tts-vd-2026-01-26 | Snapshot | $0.115/10K characters | 600 | No free quota is available. |
Supported languages: Chinese (Mandarin), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese
Character calculation rules: Billing is based on the number of input characters. The rules are as follows:
One Chinese character (including simplified/traditional Chinese, Japanese Kanji, and Korean Hanja) = 2 characters
Other characters, such as an English letter, a punctuation mark, or a space = 1 character
Qwen3-TTS-VC
Model | Version | Unit price | Max input characters | Free quota (Note) |
qwen3-tts-vc-2026-01-22 | Snapshot | $0.115/10K characters | 600 | No free quota is available. |
Supported languages: Chinese (Mandarin), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese
Character calculation rules: Billing is based on the number of input characters. The rules are as follows:
One Chinese character (including simplified/traditional Chinese, Japanese Kanji, and Korean Hanja) = 2 characters
Other characters, such as an English letter, a punctuation mark, or a space = 1 character
Qwen3-TTS-Flash
Model | Version | Unit price | Max input characters | Free quota (Note) |
qwen3-tts-flash Currently, qwen3-tts-flash-2025-11-27. | Stable | $0.114682/10K characters | 600 | No free quota is available. |
qwen3-tts-flash-2025-11-27 | Snapshot | |||
qwen3-tts-flash-2025-09-18 | Snapshot |
Supported languages: Chinese (Mandarin, Beijing, Shanghai, Sichuan, Nanjing, Shaanxi, Minnan, Tianjin, Cantonese), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese
Character calculation rules: Billing is based on the number of input characters. The rules are as follows:
One Chinese character (including simplified/traditional Chinese, Japanese Kanji, and Korean Hanja) = 2 characters
Other characters, such as an English letter, a punctuation mark, or a space = 1 character
Qwen-TTS
Model | Version | Context window (tokens) | Max input (tokens) | Max output (tokens) | Input cost (per 1,000 tokens) | Output cost (per 1,000 tokens) | Free quota (Note) |
qwen-tts Provides the same capabilities as qwen-tts-2025-04-10. | Stable | 8,192 | 512 | 7,680 | $0.230 | $1.434 | No free quota is available. |
qwen-tts-latest Provides the same capabilities as the latest snapshot. | Latest | ||||||
qwen-tts-2025-05-22 | Snapshot | ||||||
qwen-tts-2025-04-10 | Snapshot ||||||
Audio-to-token conversion rule: Each second of audio corresponds to 50 tokens. Audio shorter than 1 second is calculated as 50 tokens.
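The audio-to-token rule can be sketched as follows. Note one assumption: the rule above does not state how fractional durations beyond the first second are handled, so the ceiling below is a guess.

```python
import math

def audio_tokens(duration_seconds: float) -> int:
    """Each second of audio corresponds to 50 tokens; audio shorter than
    1 second is billed as a full 50 tokens. Rounding fractional durations
    up to the next whole second is an assumption of this sketch."""
    return max(1, math.ceil(duration_seconds)) * 50

print(audio_tokens(0.4))  # 50
print(audio_tokens(3))    # 150
```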
Qwen real-time text-to-speech
Supports streaming text input and streaming audio output. It can automatically adjust the speech rate based on the text content and punctuation. Usage | API reference
Qwen3-TTS-Instruct-Flash-Realtime supports Qwen real-time speech synthesis and can only use the default voice. It does not support cloned or designed voices.
Qwen3-TTS-VD-Realtime supports using voices from Qwen voice design for real-time speech synthesis, but does not support the default voice.
Qwen3-TTS-VC-Realtime supports using voices from Qwen voice cloning for real-time speech synthesis, but does not support the default voice.
Qwen3-TTS-Flash-Realtime and Qwen-TTS-Realtime can only use the default voice. They do not support cloned or designed voices.
International
In the international deployment mode, the endpoints and data storage are located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding Chinese Mainland).
Qwen3-TTS-Instruct-Flash-Realtime
Model | Version | Unit price | Free quota (Note) |
qwen3-tts-instruct-flash-realtime Currently, qwen3-tts-instruct-flash-realtime-2026-01-22. | Stable | $0.143/10K characters | 10,000 characters Valid for 90 days after activating Model Studio. |
qwen3-tts-instruct-flash-realtime-2026-01-22 | Snapshot |
Supported languages: Chinese (Mandarin), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese
Character calculation rules: Billing is based on the number of input characters. The rules are as follows:
One Chinese character (including simplified/traditional Chinese, Japanese Kanji, and Korean Hanja) = 2 characters
Other characters, such as an English letter, a punctuation mark, or a space = 1 character
Qwen3-TTS-VD-Realtime
Model | Version | Unit price | Free quota (Note) |
qwen3-tts-vd-realtime-2026-01-15 | Snapshot | $0.143353/10K characters | 10,000 characters Valid for 90 days after activating Model Studio |
qwen3-tts-vd-realtime-2025-12-16 | Snapshot |
Supported languages: Chinese (Mandarin), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese
Character calculation rules: Billing is based on the number of input characters. The rules are as follows:
One Chinese character (including simplified/traditional Chinese, Japanese Kanji, and Korean Hanja) = 2 characters
Other characters, such as an English letter, a punctuation mark, or a space = 1 character
Qwen3-TTS-VC-Realtime
Model | Version | Unit price | Free quota (Note) |
qwen3-tts-vc-realtime-2026-01-15 | Snapshot | $0.13/10K characters | 10,000 characters Valid for 90 days after activating Model Studio. |
qwen3-tts-vc-realtime-2025-11-27 | Snapshot |
Supported languages: Chinese (Mandarin), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese
Character calculation rules: Billing is based on the number of input characters. The rules are as follows:
One Chinese character (including simplified/traditional Chinese, Japanese Kanji, and Korean Hanja) = 2 characters
Other characters, such as an English letter, a punctuation mark, or a space = 1 character
Qwen3-TTS-Flash-Realtime
Model | Version | Unit price | Free quota (Note) |
qwen3-tts-flash-realtime Currently, qwen3-tts-flash-realtime-2025-11-27. | Stable | $0.13/10K characters | 10,000 characters Valid for 90 days after activating Model Studio |
qwen3-tts-flash-realtime-2025-11-27 | Snapshot | ||
qwen3-tts-flash-realtime-2025-09-18 | Snapshot | If you activate Alibaba Cloud Model Studio before 00:00 on November 13, 2025: 2,000 characters If you activate Alibaba Cloud Model Studio after 00:00 on November 13, 2025: 10,000 characters Valid for 90 days after activating Model Studio |
Supported languages: Chinese (Mandarin, Beijing, Shanghai, Sichuan, Nanjing, Shaanxi, Minnan, Tianjin, Cantonese), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese
Character calculation rules: Billing is based on the number of input characters. The rules are as follows:
One Chinese character (including simplified/traditional Chinese, Japanese Kanji, and Korean Hanja) = 2 characters
Other characters, such as an English letter, a punctuation mark, or a space = 1 character
Chinese Mainland
In the Chinese Mainland deployment mode, the endpoints and data storage are located in the Beijing region. Model inference compute resources are limited to Chinese Mainland.
Qwen3-TTS-Instruct-Flash-Realtime
Model | Version | Unit price | Free quota (Note) |
qwen3-tts-instruct-flash-realtime Currently, qwen3-tts-instruct-flash-realtime-2026-01-22. | Stable | $0.143/10K characters | No free quota |
qwen3-tts-instruct-flash-realtime-2026-01-22 | Snapshot |
Supported languages: Chinese (Mandarin), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese
Character calculation rules: Billing is based on the number of input characters. The rules are as follows:
One Chinese character (including simplified/traditional Chinese, Japanese Kanji, and Korean Hanja) = 2 characters
Other characters, such as an English letter, a punctuation mark, or a space = 1 character
Qwen3-TTS-VD-Realtime
Model | Version | Unit price | Free quota (Note) |
qwen3-tts-vd-realtime-2026-01-15 | Snapshot | $0.143353/10K characters | No free quota |
qwen3-tts-vd-realtime-2025-12-16 | Snapshot |
Supported languages: Chinese (Mandarin), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese
Character calculation rules: Billing is based on the number of input characters. The rules are as follows:
One Chinese character (including simplified/traditional Chinese, Japanese Kanji, and Korean Hanja) = 2 characters
Other characters, such as an English letter, a punctuation mark, or a space = 1 character
Qwen3-TTS-VC-Realtime
Model | Version | Unit price | Free quota (Note) |
qwen3-tts-vc-realtime-2026-01-15 | Snapshot | $0.143353/10K characters | No free quota is available. |
qwen3-tts-vc-realtime-2025-11-27 | Snapshot |
Supported languages: Chinese (Mandarin), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese
Character calculation rules: Billing is based on the number of input characters. The rules are as follows:
One Chinese character (including simplified/traditional Chinese, Japanese Kanji, and Korean Hanja) = 2 characters
Other characters, such as an English letter, a punctuation mark, or a space = 1 character
Qwen3-TTS-Flash-Realtime
Model | Version | Unit price | Free quota (Note) |
qwen3-tts-flash-realtime Currently, qwen3-tts-flash-realtime-2025-11-27. | Stable | $0.143353/10K characters | No free quota is available. |
qwen3-tts-flash-realtime-2025-11-27 | Snapshot | ||
qwen3-tts-flash-realtime-2025-09-18 | Snapshot |
Supported languages: Chinese (Mandarin, Beijing, Shanghai, Sichuan, Nanjing, Shaanxi, Minnan, Tianjin, Cantonese), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese
Character calculation rules: Billing is based on the number of input characters. The rules are as follows:
One Chinese character (including simplified/traditional Chinese, Japanese Kanji, and Korean Hanja) = 2 characters
Other characters, such as an English letter, a punctuation mark, or a space = 1 character
Qwen-TTS-Realtime
Model | Version | Context window (tokens) | Max input (tokens) | Max output (tokens) | Input cost (per 1,000 tokens) | Output cost (per 1,000 tokens) | Supported languages | Free quota (Note) |
qwen-tts-realtime Currently, qwen-tts-realtime-2025-07-15. | Stable | 8,192 | 512 | 7,680 | $0.345 | $1.721 | Chinese, English | No free quota is available. |
qwen-tts-realtime-latest Currently, qwen-tts-realtime-2025-07-15. | Latest | Chinese, English | ||||||
qwen-tts-realtime-2025-07-15 | Snapshot | Chinese, English | ||||||
Audio-to-token conversion rule: Each second of audio corresponds to 50 tokens. Audio shorter than 1 second is calculated as 50 tokens.
Qwen voice cloning
Voice cloning uses a large model for feature extraction, allowing you to clone voices without training. Provide 10 to 20 seconds of audio to generate a highly similar and natural-sounding custom voice. Usage | API reference
International
In the international deployment mode, the endpoints and data storage are located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding Chinese Mainland).
Model | Unit price | Free quota (Note) |
qwen-voice-enrollment | $0.01 per voice | 1,000 voices Valid for 90 days after activating Model Studio. |
Chinese Mainland
In the Chinese Mainland deployment mode, the endpoints and data storage are located in the Beijing region. Model inference compute resources are limited to Chinese Mainland.
Model | Unit price | Free quota (Note) |
qwen-voice-enrollment | $0.01 per voice | No free quota is available. |
Qwen voice design
Voice design generates custom voices from text descriptions. It supports multi-language and multi-dimensional voice feature definitions, making it suitable for applications such as ad dubbing, character creation, and audio content production. Usage | API reference
International
In the international deployment mode, the endpoints and data storage are located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding Chinese Mainland).
Model | Unit price | Free quota (Note) |
qwen-voice-design | $0.2 per voice | 10 voices Valid for 90 days after activating Model Studio. |
Chinese Mainland
In the Chinese Mainland deployment mode, the endpoints and data storage are located in the Beijing region. Model inference compute resources are limited to Chinese Mainland.
Model | Unit price | Free quota (Note) |
qwen-voice-design | $0.20 per voice | No free quota is available. |
CosyVoice speech synthesis
CosyVoice is a next-generation generative speech synthesis model from Alibaba Cloud. It deeply integrates text understanding and speech generation based on a large-scale pre-trained language model and supports real-time streaming text-to-speech synthesis. Usage | API reference
International
In the international deployment mode, the endpoints and data storage are located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding Chinese Mainland).
Model | Unit price | Free quota (Note) |
cosyvoice-v3-plus | $0.26/10K characters | 10,000 characters Valid for 90 days after activating Model Studio. |
cosyvoice-v3-flash | $0.13/10K characters |
Character calculation rules: Chinese characters (including simplified/traditional Chinese, Japanese Kanji, and Korean Hanja) are counted as 2 characters. All other characters (such as letters, numbers, and Japanese/Korean syllabaries) are counted as 1 character. SSML tag content is not billed.
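The CosyVoice counting rule can be sketched in Python. Two assumptions are made: "SSML tag content is not billed" is read as the tag markup itself being excluded, and CJK detection is approximated with the main Unified Ideographs range.

```python
import re

def cosyvoice_billable_characters(text: str) -> int:
    """Approximate CosyVoice billing: SSML tag markup is stripped before
    counting (an assumed reading of the rule above); CJK ideographs count
    as 2 characters, everything else as 1."""
    visible = re.sub(r"<[^>]+>", "", text)  # drop the tag markup
    return sum(2 if 0x4E00 <= ord(c) <= 0x9FFF else 1 for c in visible)

print(cosyvoice_billable_characters("<speak>你好 world</speak>"))  # 2*2 + 6 = 10
```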
Chinese Mainland
In the Chinese Mainland deployment mode, the endpoints and data storage are located in the Beijing region. Model inference compute resources are limited to Chinese Mainland.
Model | Unit price | Free quota (Note) |
cosyvoice-v3.5-plus | $0.22/10K characters | No free quota |
cosyvoice-v3.5-flash | $0.116/10K characters | |
cosyvoice-v3-plus | $0.286706/10K characters | |
cosyvoice-v3-flash | $0.14335/10K characters | |
cosyvoice-v2 | $0.286706/10K characters |
Character calculation rules: Chinese characters (including simplified/traditional Chinese, Japanese Kanji, and Korean Hanja) are counted as 2 characters. All other characters (such as letters, numbers, and Japanese/Korean syllabaries) are counted as 1 character. SSML tag content is not billed.
Speech recognition (speech-to-text) and translation (speech-to-target-language text)
Qwen3-LiveTranslate-Flash
Qwen3-LiveTranslate-Flash is an audio and video translation model based on the Qwen3-Omni architecture. It supports translation between 18 languages, including Chinese, English, Russian, and French. The model can use visual context to improve translation accuracy and outputs both text and speech. Usage | API reference
International
In the international deployment mode, the endpoints and data storage are located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide, excluding Chinese Mainland.
Model | Version | Context window (tokens) | Max input (tokens) | Max output (tokens) | Free quota (Note) |
qwen3-livetranslate-flash Currently, qwen3-livetranslate-flash-2025-12-01. | Stable | 53,248 | 49,152 | 4,096 | 1 million tokens each Valid for 90 days after activating Model Studio |
qwen3-livetranslate-flash-2025-12-01 | Snapshot | ||||
The billing rules for input and output are as follows:
Chinese Mainland
In the Chinese Mainland deployment mode, the endpoints and data storage are located in the Beijing region. Model inference compute resources are limited to Chinese Mainland.
Model | Version | Context window (tokens) | Max input (tokens) | Max output (tokens) | Free quota (Note) |
qwen3-livetranslate-flash Currently, qwen3-livetranslate-flash-2025-12-01. | Stable | 53,248 | 49,152 | 4,096 | No free quota is available. |
qwen3-livetranslate-flash-2025-12-01 | Snapshot | ||||
The billing rules for input and output are as follows:
Qwen3-LiveTranslate-Flash-Realtime
Qwen3-LiveTranslate-Flash-Realtime is a multilingual, real-time audio and video translation model. It can recognize 18 languages and translate them into audio in 10 languages in real time.
Core features:
Multi-language support: Supports 18 languages, such as Chinese, English, French, German, Russian, Japanese, and Korean, and 6 Chinese dialects, including Mandarin, Cantonese, and Sichuanese.
Visual enhancement: Uses visual content to improve translation accuracy. The model analyzes lip movements, actions, and on-screen text to improve translation in noisy environments or for words with multiple meanings.
Low latency: Achieves simultaneous interpretation latency as low as 3 seconds.
High-quality simultaneous interpretation: Addresses cross-language word order issues using semantic unit prediction technology. The real-time translation quality is comparable to offline translation results.
Natural voice: Generates natural-sounding, human-like speech. The model adapts its tone and emotion based on the source speech content.
International
In the international deployment mode, the endpoints and data storage are located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide, excluding Chinese Mainland.
Model | Version | Context window (tokens) | Max input (tokens) | Max output (tokens) | Free quota |
qwen3-livetranslate-flash-realtime Currently, qwen3-livetranslate-flash-realtime-2025-09-22. | Stable | 53,248 | 49,152 | 4,096 | 1 million tokens Valid for 90 days after activating Model Studio. |
qwen3-livetranslate-flash-realtime-2025-09-22 | Snapshot | ||||
After the free quota is used up, the billing rules for input and output are as follows:
Token calculation rules:
Audio: Each second of audio input or output consumes 12.5 tokens.
Image: Each 28×28 pixel input consumes 0.5 tokens.
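The token rules above can be turned into a quick cost estimator. The following is a minimal sketch, not official billing code: the rates (12.5 tokens per second of audio, 0.5 tokens per 28×28 image patch) come from this page, but the rounding of partial image patches to a full patch is an assumption.

```python
import math

def audio_tokens(seconds: float) -> float:
    # Each second of audio input or output consumes 12.5 tokens.
    return seconds * 12.5

def image_tokens(width: int, height: int) -> float:
    # Each 28x28 pixel patch consumes 0.5 tokens.
    # Rounding partial patches up is an assumption, not documented here.
    patches = math.ceil(width / 28) * math.ceil(height / 28)
    return patches * 0.5

# Example: a 60-second clip plus one 448x448 video frame.
total = audio_tokens(60) + image_tokens(448, 448)  # 750.0 + 128.0 = 878.0
```

Multiplying the total by the model's per-token price then gives a rough pre-call estimate.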
Chinese Mainland
In the Chinese Mainland deployment mode, the endpoints and data storage are located in the Beijing region. Model inference compute resources are limited to Chinese Mainland.
Model | Version | Context window (tokens) | Max input (tokens) | Max output (tokens) | Free quota (Note) |
qwen3-livetranslate-flash-realtime Currently, qwen3-livetranslate-flash-realtime-2025-09-22. | Stable | 53,248 | 49,152 | 4,096 | No free quota is available. |
qwen3-livetranslate-flash-realtime-2025-09-22 | Snapshot | ||||
The billing rules for input and output are as follows:
Token calculation rules:
Audio: Each second of audio input or output consumes 12.5 tokens.
Image: Each 28×28 pixel input consumes 0.5 tokens.
Qwen audio file recognition
Based on the Qwen multimodal foundation model, this model supports features such as multi-language recognition, singing recognition, and noise rejection. Usage | API reference
International
In the international deployment mode, the endpoints and data storage are located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide, excluding Chinese Mainland.
Qwen3-ASR-Flash-Filetrans
Model | Version | Unit price | Free quota (Note) |
qwen3-asr-flash-filetrans Currently, qwen3-asr-flash-filetrans-2025-11-17. | Stable | $0.000035/second | 36,000 seconds (10 hours) Valid for 90 days after activating Model Studio. |
qwen3-asr-flash-filetrans-2025-11-17 | Snapshot |
Supported languages: Chinese (Mandarin, Sichuanese, Minnan, Wu, Cantonese), English, Japanese, German, Korean, Russian, French, Portuguese, Arabic, Italian, Spanish, Hindi, Indonesian, Thai, Turkish, Ukrainian, Vietnamese, Czech, Danish, Filipino, Finnish, Icelandic, Malay, Norwegian, Polish, Swedish
Supported sample rates: Any
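Since file transcription is billed per second of audio after the free quota, a simple estimator helps budget a workload. This is a hedged sketch using the international unit price and free quota from the table above; it assumes the free quota is consumed before any paid usage and ignores rounding of fractional seconds.

```python
def asr_cost(seconds: float,
             unit_price: float = 0.000035,   # USD per second (international)
             free_seconds: float = 36_000) -> float:
    # Only the duration beyond the 10-hour free quota is billable.
    billable = max(0.0, seconds - free_seconds)
    return billable * unit_price

# Example: 12 hours of audio leaves 2 billable hours.
cost = asr_cost(12 * 3600)  # 7200 s * $0.000035/s = $0.252
```

For the Chinese Mainland deployment, swap in the $0.000032/second rate and a free quota of zero.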
Qwen3-ASR-Flash
Model | Version | Unit price | Free quota (Note) |
qwen3-asr-flash Currently, qwen3-asr-flash-2025-09-08. | Stable | $0.000035/second | 36,000 seconds (10 hours) Valid for 90 days after activating Model Studio. |
qwen3-asr-flash-2026-02-10 | Snapshot | ||
qwen3-asr-flash-2025-09-08 | Snapshot |
Supported languages: Chinese (Mandarin, Sichuanese, Minnan, Wu, Cantonese), English, Japanese, German, Korean, Russian, French, Portuguese, Arabic, Italian, Spanish, Hindi, Indonesian, Thai, Turkish, Ukrainian, Vietnamese, Czech, Danish, Filipino, Finnish, Icelandic, Malay, Norwegian, Polish, Swedish
Supported sample rates: Any
US
In the US deployment mode, the endpoints and data storage are located in the US (Virginia) region. Model inference compute resources are limited to the US.
Model | Version | Unit price | Free quota (Note) |
qwen3-asr-flash-us Currently, qwen3-asr-flash-2025-09-08-us. | Stable | $0.000035/second | No free quota is available. |
qwen3-asr-flash-2025-09-08-us | Snapshot |
Supported languages: Chinese (Mandarin, Sichuanese, Minnan, Wu, Cantonese), English, Japanese, German, Korean, Russian, French, Portuguese, Arabic, Italian, Spanish, Hindi, Indonesian, Thai, Turkish, Ukrainian, Vietnamese, Czech, Danish, Filipino, Finnish, Icelandic, Malay, Norwegian, Polish, Swedish
Supported sample rates: Any
Chinese Mainland
In the Chinese Mainland deployment mode, the endpoints and data storage are located in the Beijing region. Model inference compute resources are limited to Chinese Mainland.
Qwen3-ASR-Flash-Filetrans
Model | Version | Unit price | Free quota (Note) |
qwen3-asr-flash-filetrans Currently, qwen3-asr-flash-filetrans-2025-11-17. | Stable | $0.000032/second | No free quota is available. |
qwen3-asr-flash-filetrans-2025-11-17 | Snapshot |
Supported languages: Chinese (Mandarin, Sichuanese, Minnan, Wu, Cantonese), English, Japanese, German, Korean, Russian, French, Portuguese, Arabic, Italian, Spanish, Hindi, Indonesian, Thai, Turkish, Ukrainian, Vietnamese, Czech, Danish, Filipino, Finnish, Icelandic, Malay, Norwegian, Polish, Swedish
Supported sample rates: Any
Qwen3-ASR-Flash
Model | Version | Unit price | Free quota (Note) |
qwen3-asr-flash Currently, qwen3-asr-flash-2025-09-08. | Stable | $0.000032/second | No free quota is available. |
qwen3-asr-flash-2026-02-10 | Snapshot | ||
qwen3-asr-flash-2025-09-08 | Snapshot |
Supported languages: Chinese (Mandarin, Sichuanese, Minnan, Wu, Cantonese), English, Japanese, German, Korean, Russian, French, Portuguese, Arabic, Italian, Spanish, Hindi, Indonesian, Thai, Turkish, Ukrainian, Vietnamese, Czech, Danish, Filipino, Finnish, Icelandic, Malay, Norwegian, Polish, Swedish
Supported sample rates: Any
Qwen real-time speech recognition
Qwen Real-Time Speech Recognition is a model with automatic language detection. It supports 11 languages and delivers accurate transcription even in complex audio environments. How to use | API reference
International
In international deployment mode, endpoints and data storage are located in the Singapore region. Model inference compute resources are dynamically scheduled across global regions, excluding Chinese Mainland.
Model | Version | Unit price | Free quota (Note) |
qwen3-asr-flash-realtime Currently, qwen3-asr-flash-realtime-2025-10-27 | Stable | $0.00009/second | 36,000 seconds (10 hours) Valid for 90 days after activating Model Studio. |
qwen3-asr-flash-realtime-2026-02-10 | Snapshot | ||
qwen3-asr-flash-realtime-2025-10-27 | Snapshot |
Languages supported: Chinese (Mandarin, Sichuanese, Minnan, Wu, Cantonese), English, Japanese, German, Korean, Russian, French, Portuguese, Arabic, Italian, Spanish, Hindi, Indonesian, Thai, Turkish, Ukrainian, Vietnamese, Czech, Danish, Filipino, Finnish, Icelandic, Malay, Norwegian, Polish, Swedish
Sample rates supported: 8 kHz, 16 kHz
Chinese Mainland
In Chinese Mainland deployment mode, endpoints and data storage are located in the Beijing region. Model inference compute resources are limited to Chinese Mainland only.
Model | Version | Unit price | Free quota (Note) |
qwen3-asr-flash-realtime Currently, qwen3-asr-flash-realtime-2025-10-27 | Stable | $0.000047/second | No free quota |
qwen3-asr-flash-realtime-2026-02-10 | Snapshot | ||
qwen3-asr-flash-realtime-2025-10-27 | Snapshot |
Languages supported: Chinese (Mandarin, Sichuanese, Minnan, Wu, Cantonese), English, Japanese, German, Korean, Russian, French, Portuguese, Arabic, Italian, Spanish, Hindi, Indonesian, Thai, Turkish, Ukrainian, Vietnamese, Czech, Danish, Filipino, Finnish, Icelandic, Malay, Norwegian, Polish, Swedish
Sample rates supported: 8 kHz, 16 kHz
Paraformer ASR
Paraformer speech recognition offers two versions: recorded file recognition and real-time speech recognition.
Recorded file recognition
Only the Chinese Mainland deployment mode is supported. Endpoint and data storage are located in the Beijing region, and model inference computing resources are restricted to Chinese Mainland.
Model | Unit price | Free quota (Note) |
paraformer-v2 | $0.000012/second | No free quota |
paraformer-8k-v2 |
Languages supported:
paraformer-v2: Chinese (Mandarin, Cantonese, Wu, Minnan, Northeastern, Gansu, Guizhou, Henan, Hubei, Hunan, Ningxia, Shanxi, Shaanxi, Shandong, Sichuan, Tianjin, Jiangxi, Yunnan, Shanghai), English, Japanese, Korean, German, French, Russian
paraformer-8k-v2: Mandarin Chinese
Sample rates supported:
paraformer-v2: Any
paraformer-8k-v2: 8 kHz
Audio formats supported: AAC, AMR, AVI, FLAC, FLV, M4A, MKV, MOV, MP3, MP4, MPEG, OGG, OPUS, WAV, WEBM, WMA, WMV
Real-time speech recognition
Only the Chinese Mainland deployment mode is supported. Endpoint and data storage are located in the Beijing region, and model inference computing resources are restricted to Chinese Mainland.
Model | Unit price | Free quota (Note) |
paraformer-realtime-v2 | $0.000035/second | No free quota |
paraformer-realtime-8k-v2 |
Languages supported:
paraformer-realtime-v2: Chinese (Mandarin, Cantonese, Wu, Minnan, Northeastern, Gansu, Guizhou, Henan, Hubei, Hunan, Ningxia, Shanxi, Shaanxi, Shandong, Sichuan, Tianjin, Jiangxi, Yunnan, Shanghai), English, Japanese, Korean, German, French, Russian
paraformer-realtime-8k-v2: Mandarin Chinese
Sample rates supported:
paraformer-realtime-v2: Any
paraformer-realtime-8k-v2: 8 kHz
Audio formats supported: PCM, WAV, MP3, OPUS, SPEEX, AAC, AMR
Fun-ASR speech recognition
Fun-ASR speech recognition offers two versions: audio file recognition and real-time speech recognition.
Audio file recognition
International
In the international deployment mode, endpoints and data storage are in the Singapore region. Model inference compute resources are dynamically scheduled globally, excluding Chinese Mainland.
Model | Version | Unit price | Free quota (Note) |
fun-asr Currently, fun-asr-2025-11-07 | Stable | $0.000035/second | 36,000 seconds (10 hours) Valid for 90 days |
fun-asr-2025-11-07 Improved far-field VAD over fun-asr-2025-08-25 for higher accuracy | Snapshot | ||
fun-asr-2025-08-25 | |||
fun-asr-mtl Currently, fun-asr-mtl-2025-08-25 | Stable | ||
fun-asr-mtl-2025-08-25 | Snapshot |
Languages supported:
fun-asr and fun-asr-2025-11-07: Mandarin, Cantonese, Wu, Minnan, Hakka, Gan, Xiang, and Jin. Also supports Mandarin accents from Zhongyuan, Southwest, Jilu, Jianghuai, Lanyin, Jiaoliao, Northeast, Beijing, and Hong Kong–Taiwan regions—including Henan, Shaanxi, Hubei, Sichuan, Chongqing, Yunnan, Guizhou, Guangdong, Guangxi, Hebei, Tianjin, Shandong, Anhui, Nanjing, Jiangsu, Hangzhou, Gansu, and Ningxia. Also supports English and Japanese.
fun-asr-2025-08-25: Mandarin and English.
fun-asr-mtl and fun-asr-mtl-2025-08-25: Mandarin, Cantonese, English, Japanese, Korean, Vietnamese, Indonesian, Thai, Malay, Filipino, Arabic, Hindi, Bulgarian, Croatian, Czech, Danish, Dutch, Estonian, Finnish, Greek, Hungarian, Irish, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovak, Slovenian, and Swedish.
Sample rates supported: Any
Audio formats supported: aac, amr, avi, flac, flv, m4a, mkv, mov, mp3, mp4, mpeg, ogg, opus, wav, webm, wma, wmv
Chinese Mainland
In the Chinese Mainland deployment mode, endpoints and data storage are in the Beijing region. Model inference compute resources are limited to Chinese Mainland.
Model | Version | Unit price | Free quota (Note) |
fun-asr Currently, fun-asr-2025-11-07 | Stable | $0.000032/second | No free quota |
fun-asr-2025-11-07 Improved far-field VAD over fun-asr-2025-08-25 for higher accuracy | Snapshot | ||
fun-asr-2025-08-25 | |||
fun-asr-mtl Currently, fun-asr-mtl-2025-08-25 | Stable | ||
fun-asr-mtl-2025-08-25 | Snapshot |
Languages supported:
fun-asr and fun-asr-2025-11-07: Mandarin, Cantonese, Wu, Minnan, Hakka, Gan, Xiang, and Jin. Also supports Mandarin accents from Zhongyuan, Southwest, Jilu, Jianghuai, Lanyin, Jiaoliao, Northeast, Beijing, and Hong Kong–Taiwan regions—including Henan, Shaanxi, Hubei, Sichuan, Chongqing, Yunnan, Guizhou, Guangdong, Guangxi, Hebei, Tianjin, Shandong, Anhui, Nanjing, Jiangsu, Hangzhou, Gansu, and Ningxia. Also supports English and Japanese.
fun-asr-2025-08-25: Mandarin and English.
fun-asr-mtl and fun-asr-mtl-2025-08-25: Mandarin, Cantonese, English, Japanese, Korean, Vietnamese, Indonesian, Thai, Malay, Filipino, Arabic, Hindi, Bulgarian, Croatian, Czech, Danish, Dutch, Estonian, Finnish, Greek, Hungarian, Irish, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovak, Slovenian, and Swedish.
Sample rates supported: Any
Audio formats supported: aac, amr, avi, flac, flv, m4a, mkv, mov, mp3, mp4, mpeg, ogg, opus, wav, webm, wma, wmv
Real-time speech recognition
International
In the international deployment mode, endpoints and data storage are in the Singapore region. Model inference compute resources are dynamically scheduled globally, excluding Chinese Mainland.
Model | Version | Unit price | Free quota (Note) |
fun-asr-realtime Currently, fun-asr-realtime-2025-11-07 | Stable | $0.00009/second | 36,000 seconds (10 hours) Valid for 90 days |
fun-asr-realtime-2025-11-07 | Snapshot |
Languages supported: Mandarin, Cantonese, Wu, Minnan, Hakka, Gan, Xiang, and Jin. Also supports Mandarin accents from Zhongyuan, Southwest, Jilu, Jianghuai, Lanyin, Jiaoliao, Northeast, Beijing, and Hong Kong–Taiwan regions—including Henan, Shaanxi, Hubei, Sichuan, Chongqing, Yunnan, Guizhou, Guangdong, Guangxi, Hebei, Tianjin, Shandong, Anhui, Nanjing, Jiangsu, Hangzhou, Gansu, and Ningxia. Also supports English and Japanese.
Sample rates supported: 16 kHz
Audio formats supported: pcm, wav, mp3, opus, speex, aac, amr
Chinese Mainland
In the Chinese Mainland deployment mode, endpoints and data storage are in the Beijing region. Model inference compute resources are limited to Chinese Mainland.
Model | Version | Unit price | Free quota (Note) |
fun-asr-realtime Currently, fun-asr-realtime-2025-11-07 | Stable | $0.000047/second | No free quota |
fun-asr-realtime-2026-02-28 | Snapshot | ||
fun-asr-realtime-2025-11-07 | Snapshot | ||
fun-asr-realtime-2025-09-15 | |||
fun-asr-flash-8k-realtime Currently, fun-asr-flash-8k-realtime-2026-01-28 | Stable | $0.000032/second | |
fun-asr-flash-8k-realtime-2026-01-28 | Snapshot |
Languages supported:
fun-asr-realtime, fun-asr-realtime-2026-02-28, fun-asr-realtime-2025-11-07: Chinese (Mandarin, Cantonese, Wu, Minnan, Hakka, Gan, Xiang, and Jin. Also supports Mandarin accents from Zhongyuan, Southwest, Jilu, Jianghuai, Lanyin, Jiaoliao, Northeast, Beijing, and Hong Kong–Taiwan regions—including Henan, Shaanxi, Hubei, Sichuan, Chongqing, Yunnan, Guizhou, Guangdong, Guangxi, Hebei, Tianjin, Shandong, Anhui, Nanjing, Jiangsu, Hangzhou, Gansu, and Ningxia), English, and Japanese.
fun-asr-realtime-2025-09-15: Chinese (Mandarin), English
Sample rates supported:
fun-asr-flash-8k-realtime and fun-asr-flash-8k-realtime-2026-01-28: 8 kHz
All other models: 16 kHz
Audio formats supported: pcm, wav, mp3, opus, speex, aac, amr
Text embeddings
Text embedding models convert text into numerical vectors that represent the meaning of the input. These models are suitable for search, clustering, recommendation, and classification tasks. Billing is based on the number of input tokens. API reference
International
In international deployment mode, the endpoint and data storage are both located in the Singapore region. Model inference compute resources are dynamically scheduled globally (excluding the Chinese mainland).
Model | Vector dimensions | Batch size | Max tokens per batch | Supported languages | Price (per 1M input tokens) | Free quota |
text-embedding-v4 Part of the Qwen3-Embedding series | 2,048, 1,536, 1,024 (default), 768, 512, 256, 128, 64 | 10 | 8,192 | Chinese, English, Spanish, French, Portuguese, Indonesian, Japanese, Korean, German, Russian, and over 100 other major languages, plus multiple programming languages | $0.07 | 1 million tokens Valid for 90 days after activating Model Studio |
text-embedding-v3 | 1,024 (default), 768, or 512 | 10 | 8,192 | Chinese, English, Spanish, French, Portuguese, Indonesian, Japanese, Korean, German, Russian, and over 50 other languages | 500,000 tokens Valid for 90 days after activating Model Studio |
Chinese mainland
In Chinese mainland deployment mode, the endpoint and data storage are both located in the Beijing region. Model inference compute resources are limited to the Chinese mainland.
Model | Vector dimensions | Batch size | Max tokens per batch | Supported languages | Price (per 1M input tokens) | Free quota |
text-embedding-v4 Part of the Qwen3-Embedding series Batch calls are half price | 2,048, 1,536, 1,024 (default), 768, 512, 256, 128, 64 | 10 | 8,192 | Chinese, English, Spanish, French, Portuguese, Indonesian, Japanese, Korean, German, Russian, and over 100 other major languages, plus multiple programming languages | $0.072 | No free quota |
Hong Kong (China)
In Hong Kong (China) deployment mode, the endpoint and data storage are both located in Hong Kong (China). Model inference compute resources are limited to Hong Kong (China).
Model | Vector dimensions | Batch size | Max tokens per batch | Supported languages | Price (per 1M input tokens) | Free quota |
text-embedding-v4 Part of the Qwen3-Embedding series | 2,048, 1,536, 1,024 (default), 768, 512, 256, 128, 64 | 10 | 8,192 | Chinese, English, Spanish, French, Portuguese, Indonesian, Japanese, Korean, German, Russian, and over 100 other major languages, plus multiple programming languages | $0.07 | 1 million tokens Valid for 90 days after activating Model Studio |
Batch size refers to the max number of texts you can process in a single API call. For example, text-embedding-v4 has a batch size of 10, meaning one request can include up to 10 texts for vectorization, and each text must not exceed 8,192 tokens. This limit applies to:
String array input: The array can contain up to 10 elements.
File input: A text file can contain up to 10 lines of text.
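The batch-size limit above means a large corpus must be split across multiple API calls. Here is a minimal client-side sketch of that chunking, assuming text-embedding-v4's batch size of 10; the actual embedding call is omitted, since this page only documents the limits.

```python
def batched(texts, batch_size=10):
    # Yield slices no larger than the model's batch size
    # (10 texts for text-embedding-v4), one API call per slice.
    for i in range(0, len(texts), batch_size):
        yield texts[i:i + batch_size]

docs = [f"doc {i}" for i in range(23)]
batches = list(batched(docs))  # 3 batches: sizes 10, 10, 3
```

Each text in a batch is still subject to the separate 8,192-token per-text limit, which this sketch does not check.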
Multimodal embedding
Multimodal embedding models transform text, images, or videos into a vector of floating-point numbers. These models are suitable for video classification, image classification, and image-text retrieval. API reference
International
In the International deployment mode, endpoints and data storage are located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding the Chinese mainland).
Model | Data type | Vector dimensions | Price (per 1 million input tokens) | Free quota (Note) |
tongyi-embedding-vision-plus | float(32) | 1,152 | $0.09 | 1 million tokens Valid for 90 days after activating Model Studio. |
tongyi-embedding-vision-flash | float(32) | 768 | Image/Video: $0.03 Text: $0.09 |
Chinese mainland
In the Chinese mainland deployment mode, endpoints and data storage are located in the Beijing region. Model inference compute resources are restricted to the Chinese mainland.
Model | Data type | Vector dimensions | Price (per 1 million input tokens) |
qwen3-vl-embedding | float(32) | 2,560, 2,048, 1,536, 1,024, 768, 512, 256 | Image/Video: $0.258 Text: $0.1 |
multimodal-embedding-v1 | 1,024 | Free trial |
Ranking
These models are typically used for semantic retrieval. Given a query and a list of candidate documents, they rank the documents from highest to lowest based on semantic relevance to the query. API reference
International
In international deployment mode, the endpoint and data storage are both located in the Singapore region. Model inference compute resources are scheduled dynamically worldwide (excluding the Chinese mainland).
Model | Max number of documents | Max input tokens per line | Max input tokens | Supported languages | Unit price (per 1M input tokens) |
qwen3-rerank | 500 | 4,000 | 30,000 | Chinese, English, Spanish, French, Portuguese, Indonesian, Japanese, Korean, German, Russian, and over 100 other major languages | $0.1 |
Max input tokens per line: Each query or document can contain up to 4,000 tokens. Input exceeding this limit will be truncated.
Max number of documents: A request can include up to 500 documents.
Max input tokens: The total number of tokens across all queries and documents in a single request must not exceed 30,000.
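The three limits above can be checked client-side before sending a request. This is a hedged sketch, not part of the service API: it assumes you already have per-document token counts from your own tokenizer, and it mirrors the documented behavior that over-long lines are truncated (flagged here) while the document-count and total-token limits are hard errors.

```python
MAX_DOCS = 500              # qwen3-rerank: documents per request
MAX_TOKENS_PER_LINE = 4_000 # per query or document; excess is truncated
MAX_TOTAL_TOKENS = 30_000   # across query and all documents

def validate_rerank_request(query_tokens, doc_token_counts):
    # doc_token_counts: pre-computed token count for each candidate document.
    if len(doc_token_counts) > MAX_DOCS:
        raise ValueError("too many documents (max 500)")
    # Lines over 4,000 tokens are truncated by the service, so flag them.
    truncated = [i for i, n in enumerate(doc_token_counts)
                 if n > MAX_TOKENS_PER_LINE]
    total = (min(query_tokens, MAX_TOKENS_PER_LINE)
             + sum(min(n, MAX_TOKENS_PER_LINE) for n in doc_token_counts))
    if total > MAX_TOTAL_TOKENS:
        raise ValueError("request exceeds 30,000 total tokens")
    return truncated  # indices of documents that will be truncated
```

For other rerank models, substitute that model's limits from the corresponding table.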
Chinese mainland
In Chinese mainland deployment mode, the endpoint and data storage are both located in the Beijing region. Model inference compute resources are limited to the Chinese mainland.
Model | Max number of documents | Max input tokens per line | Max input tokens | Supported languages | Unit price (per 1M input tokens) |
qwen3-vl-rerank | 100 | 8,000 | 120,000 | Chinese, English, Japanese, Korean, French, German, and 33 other major languages | Images: $0.258 Text: $0.1 |
gte-rerank-v2 | 500 | 4,000 | 30,000 | Chinese, English, Japanese, Korean, Thai, Spanish, French, Portuguese, German, Indonesian, Arabic, and over 50 other languages | $0.115 |
Max input tokens per line: Each query or document can contain up to 4,000 tokens for gte-rerank-v2 (8,000 for qwen3-vl-rerank). Input exceeding this limit will be truncated.
Max number of documents: A request can include up to 500 documents for gte-rerank-v2 (100 for qwen3-vl-rerank).
Max input tokens: The total number of tokens across all queries and documents in a single request must not exceed 30,000 for gte-rerank-v2 (120,000 for qwen3-vl-rerank).
Domain specific
Intention recognition
The intention recognition model parses user intent quickly and accurately in under 100 milliseconds, then selects the right tool to solve the user's problem. API reference | Usage
Only the Chinese Mainland deployment mode is supported. Endpoint and data storage are located in the Beijing region, and model inference computing resources are restricted to Chinese Mainland.
Model | Context window (tokens) | Max input (tokens) | Max output (tokens) | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
tongyi-intent-detect-v3 | 8,192 | 8,192 | 1,024 | $0.058 | $0.144 |
Role playing
Qwen role-playing models are designed for anthropomorphic dialog scenarios, such as virtual socializing, game NPCs, IP replication, and hardware, toys, or in-vehicle systems. Compared with other Qwen models, they improve persona fidelity, topic progression, and empathetic listening. Usage
International
In the international deployment mode, the endpoint and data storage are both in the Singapore region. Model inference compute resources are dynamically scheduled across global regions, excluding the Chinese mainland.
Model | Context window (tokens) | Max input (tokens) | Max output (tokens) | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
qwen-plus-character | 32,768 | 30,000 | 4,000 | $0.5 | $1.4 |
qwen-plus-character-ja | 8,192 | 7,680 | 512 | $0.5 | $1.4 |
Chinese mainland
In the Chinese mainland deployment mode, the endpoint and data storage are both in the Beijing region. Model inference compute resources are limited to the Chinese mainland.
Model | Context window (tokens) | Max input (tokens) | Max output (tokens) | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
qwen-plus-character | 32,768 | 32,000 | 4,096 | $0.115 | $0.287 |
Retired models
Retired on January 30, 2026
Category | Model | Context window (tokens) | Max input (tokens) | Max output (tokens) | Input cost (per 1M tokens) | Output cost (per 1M tokens) | Alternative |
Qwen Plus | qwen-plus-2024-11-27 | 131,072 | 129,024 | 8,192 | $0.115 | $0.287 | qwen-plus-2025-12-01 |
qwen-plus-2024-11-25 | |||||||
qwen-plus-2024-09-19 | |||||||
qwen-plus-2024-08-06 | 128,000 | $0.574 | $1.721 | ||||
Qwen Turbo | qwen-turbo-2024-09-19 | 131,072 | 129,023 | 8,192 | $0.044 | $0.087 | qwen-flash-2025-07-28 |
Qwen VL | qwen-vl-max-2024-10-30 | 32,768 | 30,720 (max 16,384 per image) | 2,048 | $2.868 | $2.868 | qwen3-vl-plus-2025-12-19 |
qwen-vl-max-2024-08-09 | |||||||
qwen-vl-plus-2024-08-09 | $0.216 | $0.646 | qwen3-vl-flash-2025-10-15 | ||||