Flagship models
International (Singapore)
Flagship models |
Ideal for complex tasks. The most powerful model. |
A balance of performance, speed, and cost. |
Ideal for simple jobs. Fast and low-cost. |
An excellent code model that excels at tool calling and environment interaction. |
Maximum context window (Tokens) | 262,144 | 1,000,000 | 1,000,000 | 1,000,000 |
Minimum input price (Million tokens) | $1.2 | $0.4 | $0.05 | $0.3 |
Minimum output price (Million tokens) | $6 | $1.2 | $0.4 | $1.5 |
Chinese Mainland (Beijing)
Flagship models |
Ideal for complex tasks. The most powerful model. |
A balance of performance, speed, and cost. |
Ideal for simple jobs. Fast and low-cost. |
An excellent code model that excels at tool calling and environment interaction. |
Maximum context window (Tokens) | 262,144 | 1,000,000 | 1,000,000 | 1,000,000 |
Minimum input price (Million tokens) | $0.459 | $0.115 | $0.022 | $0.144 |
Minimum output price (Million tokens) | $1.836 | $0.287 | $0.216 | $0.574 |
Model overview
International (Singapore)
Category | Subcategory | Description |
Text generation | Qwen large language models: Commercial models (Qwen-Max, Qwen-Plus, Qwen-Flash), open-source models (Qwen3, Qwen2.5) | |
Visual understanding model Qwen-VL, visual reasoning model QVQ, omni-modal model Qwen-Omni, and real-time multi-modal model Qwen-Omni-Realtime | ||
Image generation |
| |
| ||
Speech synthesis and recognition | Qwen-TTS and Qwen-TTS-Realtime can be used for text-to-speech in scenarios such as intelligent voice customer service, audiobooks, in-car navigation, and educational tutoring. | |
Qwen-ASR-Realtime, Qwen-ASR, Qwen3-LiveTranslate-Flash-Realtime and Fun-ASR can perform speech-to-text for scenarios such as real-time meeting records, real-time live stream captions, and telephone customer service. | ||
Video generation | Generates high-quality videos with rich styles from a single sentence. | |
| ||
General-purpose video editing: Performs various video editing tasks based on input text, images, and videos. For example, it can generate a new video by extracting motion features from an input video and combining them with a prompt. | ||
Embedding | Converts text into a set of numbers that represent the text. It is suitable for search, clustering, recommendation, and classification tasks. |
Mainland China (Beijing)
Category | Subcategory | Description |
Text generation | ||
The visual understanding model Qwen-VL, the visual reasoning model QVQ, and the omni-modal model Qwen-Omni | ||
Code model, Mathematical model, Translation model, Data mining model, Research model, Intention recognition model, Role-playing model | ||
Image generation |
| |
General-purpose models:
More models: Qwen Image Translation, OutfitAnyone | ||
Speech synthesis and recognition | Qwen-TTS, Qwen-TTS-Realtime, and CosyVoice convert text to speech for scenarios such as voice-based customer service, audiobooks, in-car navigation, and educational tutoring. | |
Qwen-ASR-Realtime, Qwen-ASR, Fun-ASR, and Paraformer convert speech to text for scenarios such as real-time meeting transcription, real-time live stream captioning, and customer service calls. | ||
Video editing and generation | Generates high-quality videos with rich styles from a single sentence. | |
| ||
| ||
Embedding | Converts text into a set of numbers that represent the text. It is used for search, clustering, recommendation, and classification. | |
Converts text, images, and speech into a set of numbers. It is used for audio and video classification, image classification, and image-text retrieval. |
Text generation - Qwen
The following are the Qwen commercial models. Compared to the open-source versions, the commercial models offer the latest capabilities and improvements.
The parameter sizes of the commercial models are not disclosed.
Each model is updated periodically. To use a fixed version, you can select a snapshot version. A snapshot version is typically maintained for one month after the release of the next snapshot version.
We recommend that you use the stable or latest version for more lenient rate limiting conditions.
Qwen-Max
The best-performing model in the Qwen series, suitable for complex, multi-step tasks. Usage | API reference | Try it online
International (Singapore)
Model | Version | Mode | Context window | Max input | Max chain-of-thought | Max output | Input cost | Output cost | Free quota |
(Tokens) | (Million tokens) | ||||||||
qwen3-max Currently same capability as qwen3-max-2025-09-23 Batch calling half price | Stable | Non-thinking only | 262,144 | 258,048 | - | 65,536 | Tiered pricing, see the description below. | 1 million tokens each Valid for 90 days after activation | |
qwen3-max-2025-09-23 | Snapshot | Non-thinking only | |||||||
qwen3-max-preview | Preview | Thinking | 81,920 | 32,768 | |||||
Non-thinking | - | 65,536 | |||||||
Billing for the models listed above is tiered based on the number of input tokens per request.
Input tokens per request | Input price (Million tokens) qwen3-max and qwen3-max-preview support context cache. | Output price (Million tokens) |
0 < Tokens ≤ 32K | $1.2 | $6 |
32K < Tokens ≤ 128K | $2.4 | $12 |
128K < Tokens ≤ 252K | $3 | $15 |
Chinese mainland (Beijing)
Model | Version | Mode | Context window | Max input | Max chain-of-thought | Max output | Input cost | Output cost |
(Tokens) | (Million tokens) | |||||||
qwen3-max Currently same capability as qwen3-max-2025-09-23 Batch calling half price | Stable | Non-thinking only | 262,144 | 258,048 | - | 65,536 | Tiered pricing, see the description below. | |
qwen3-max-2025-09-23 | Snapshot | Non-thinking only | ||||||
qwen3-max-preview | Preview | Thinking | 81,920 | 32,768 | ||||
Non-thinking | - | 65,536 | ||||||
Billing for the models listed above is tiered based on the number of input tokens per request.
Model | Input tokens per request | Input price (Million tokens) | Output price (Million tokens) Chain-of-thought + response |
qwen3-max batch calls are half price Context cache discounts | 0 < Tokens ≤ 32K | $0.459 | $1.836 |
32K < Tokens ≤ 128K | $0.918 | $3.672 | |
128K < Tokens ≤ 252K | $1.377 | $5.508 | |
qwen3-max-2025-09-23 | 0 < Tokens ≤ 32K | $0.861 | $3.441 |
32K < Tokens ≤ 128K | $1.434 | $5.735 | |
128K < Tokens ≤ 252K | $2.151 | $8.602 | |
qwen3-max-preview Context cache discounts | 0 < Tokens ≤ 32K | $0.861 | $3.441 |
32K < Tokens ≤ 128K | $1.434 | $5.735 | |
128K < Tokens ≤ 252K | $2.151 | $8.602 |
The thinking mode of qwen3-max-preview significantly improves overall inference capabilities and excels in agent programming, common sense reasoning, mathematics, science, and general tasks.
Qwen-Plus
A balanced model that offers performance, cost, and speed between those of Qwen-Max and Qwen-Flash. It is suitable for moderately complex tasks. Usage | API reference | Try it online | Deep thinking
International (Singapore)
Model | Version | Context window | Max input | Max output | Input cost | Output cost | Free quota |
(Tokens) | (Million tokens) | ||||||
qwen-plus Currently has the same capabilities as qwen-plus-2025-07-28 Part of the Qwen3 series | Stable | 1,000,000 | Thinking mode 995,904 Non-thinking mode 997,952 The default is 262,144. You can adjust this value using the max_input_tokens parameter. | 32,768 Max chain-of-thought: 81,920 | Tiered pricing applies. For more information, see the notes below the table. | 1 million tokens for input and output each Valid for 90 days after you activate Model Studio. | |
qwen-plus-latest Currently has the same capabilities as qwen-plus-2025-07-28 Part of the Qwen3 series | Latest | Thinking mode 995,904 Non-thinking mode 997,952 | |||||
qwen-plus-2025-09-11 Part of the Qwen3 series. | Snapshot | Thinking mode 995,904 Non-thinking mode 997,952 | |||||
qwen-plus-2025-07-28 also known as qwen-plus-0728 Part of the Qwen3 series | |||||||
qwen-plus-2025-07-14 also known as qwen-plus-0714 Part of the Qwen3 series | 131,072 | Thinking mode 98,304 Non-thinking mode 129,024 | 16,384 Max chain-of-thought: 38,912 | $0.4 | Thinking mode $4 Non-thinking mode $1.2 | ||
qwen-plus-2025-04-28 also known as qwen-plus-0428 Part of the Qwen3 series | |||||||
qwen-plus-2025-01-25 also known as qwen-plus-0125 | 129,024 | 8,192 | $1.2 | ||||
Billing for qwen-plus, qwen-plus-latest, qwen-plus-2025-09-11, and qwen-plus-2025-07-28 is tiered based on the number of input tokens per request.
Input tokens per request | Input price (Million tokens) | Mode | Output price (Million tokens) |
0 < Tokens ≤ 256K | $0.4 | Non-thinking mode | $1.2 |
Thinking mode | $4 | ||
256K < Tokens ≤ 1M | $1.2 | Non-thinking mode | $3.6 |
Thinking mode | $12 |
Chinese mainland (Beijing)
Model | Version | Context window | Max input | Max output | Input cost | Output cost |
(Tokens) | (Million tokens) | |||||
qwen-plus Currently has the same capabilities as qwen-plus-2025-07-28 Part of the Qwen3 series | Stable | 1,000,000 | Thinking mode 995,904 Non-thinking mode 997,952 The default is 131,072. You can adjust this value using the max_input_tokens parameter. | 32,768 Max chain-of-thought: 81,920 | Tiered pricing applies. For more information, see the notes below the table. | |
qwen-plus-latest Currently has the same capabilities as qwen-plus-2025-07-28 Part of the Qwen3 series | Latest | Thinking mode 995,904 Non-thinking mode 997,952 | ||||
qwen-plus-2025-09-11 Part of the Qwen3 series | Snapshot | Thinking mode 995,904 Non-thinking mode 997,952 | ||||
qwen-plus-2025-07-28 also known as qwen-plus-0728 Part of the Qwen3 series | ||||||
qwen-plus-2025-07-14 also known as qwen-plus-0714 Part of the Qwen3 series | 131,072 | Thinking mode 98,304 Non-thinking mode 129,024 | 16,384 Max chain-of-thought: 38,912 | $0.115 | Thinking mode $1.147 Non-thinking mode $0.287 | |
qwen-plus-2025-04-28 also known as qwen-plus-0428 Part of the Qwen3 series | ||||||
Billing for qwen-plus, qwen-plus-latest, qwen-plus-2025-09-11, and qwen-plus-2025-07-28 is tiered based on the number of input tokens per request.
Input tokens per request | Input price (Million tokens) | Mode | Output price (Million tokens) |
0 < Tokens ≤ 128K | $0.115 | Non-thinking mode | $0.287 |
Thinking mode | $1.147 | ||
128K < Tokens ≤ 256K | $0.345 | Non-thinking mode | $2.868 |
Thinking mode | $3.441 | ||
256K < Tokens ≤ 1M | $0.689 | Non-thinking mode | $6.881 |
Thinking mode | $9.175 |
These models support both thinking and non-thinking modes. You can switch between them using the enable_thinking parameter. In addition, the models' capabilities are significantly improved:
Reasoning capabilities: In evaluations for math, code, and logical reasoning, the model significantly outperforms QwQ and other models of similar size without a reasoning mode. It achieves top-tier performance among models of its scale.
Human preference alignment: The model shows significant improvements in creative writing, role assumption, multi-turn conversation, and instruction following. Its general capabilities are significantly better than those of other models of similar size.
Agent capabilities: The model achieves industry-leading performance in both thinking and non-thinking modes and can accurately invoke external tools.
Multilingual capabilities: The model supports more than 100 languages and dialects. Its capabilities in multilingual translation, instruction understanding, and common-sense reasoning are significantly improved.
Response format: This version fixes response format issues from previous versions, such as incorrect Markdown formatting, premature truncation, and incorrect boxed output.
For the models listed above, if you enable thinking mode but no thought process is generated, you are charged based on the pricing for non-thinking mode.
Qwen-Flash
The fastest and most cost-effective model in the Qwen series, ideal for simple jobs. Qwen-Flash features flexible tiered pricing, making it more cost-effective than Qwen-Turbo. Usage | API reference | Try it online | Thinking mode
International (Singapore)
Model | Version | Mode | Context window | Max input | Max chain-of-thought | Max output | Input cost | Output cost Chain-of-thought + Outputs | Free quota |
(Tokens) | (1,000 tokens) | ||||||||
qwen-flash Same capabilities as qwen-flash-2025-07-28 Part of the Qwen3 series. Batch calls are charged at half the standard price. | Stable | Thinking | 1,000,000 | 995,904 | 81,920 | 32,768 | Tiered pricing. See the description below the table. | 1 million tokens each Valid for 90 days after activating Alibaba Cloud Model Studio. | |
Non-thinking | 997,952 | - | |||||||
qwen-flash-2025-07-28 Part of the Qwen3 series. | Snapshot | Thinking | 995,904 | 81,920 | |||||
Non-thinking | 997,952 | - | |||||||
Billing for the models listed above is tiered based on the number of input tokens per request. qwen-flash supports context cache and batch calls.
Input tokens per request | Input price (Million tokens) | Output price (Million tokens) |
0< Tokens ≤256K | $0.05 | $0.4 |
256K< Tokens ≤1M | $0.25 | $2 |
Chinese mainland (Beijing)
Model | Version | Mode | Context window | Max input | Max chain-of-thought | Max output | Input cost | Output cost Chain-of-thought + Outputs |
(Tokens) | (1,000 tokens) | |||||||
qwen-flash Same capabilities as qwen-flash-2025-07-28 Part of the Qwen3 series | Stable | Thinking | 1,000,000 | 995,904 | 81,920 | 32,768 | Tiered pricing. See the description below the table. | |
Non-thinking | 997,952 | - | ||||||
qwen-flash-2025-07-28 Part of the Qwen3 series | Snapshot | Thinking | 995,904 | 81,920 | ||||
Non-thinking | 997,952 | - | ||||||
Billing for the models listed above is tiered based on the number of input tokens per request. qwen-flash supports context cache.
Input tokens per request | Input price (Million tokens) | Output price (Million tokens) |
0< Tokens ≤128K | $0.022 | $0.216 |
128K< Tokens ≤256K | $0.087 | $0.861 |
256K< Tokens ≤1M | $0.173 | $1.721 |
Qwen-Turbo
Qwen-Turbo will no longer be updated. We recommend replacing it with Qwen-Flash. Qwen-Flash uses flexible tiered pricing, which offers a more granular pricing model. Usage | API reference | Try it online | Deep thinking
International (Singapore)
Model | Version | Context window | Max input | Max output | Input cost | Output cost | Free quota |
(Tokens) | (Million tokens) | ||||||
qwen-turbo Currently has the same capabilities as qwen-turbo-2025-04-28 Part of the Qwen3 series | Stable | Thinking mode 131,072 Non-thinking mode 1,000,000 | Thinking mode 98,304 Non-thinking mode 1,000,000 | 16,384 Max chain-of-thought is 38,912 | $0.05 Batch calls are half price | Thinking mode: $0.5 Non-thinking mode: $0.2 Batch calls are half price | 1 million tokens for each Validity: 90 days after you activate Alibaba Cloud Model Studio |
qwen-turbo-latest Always has the same capabilities as the latest snapshot version Part of the Qwen3 series | Latest | $0.05 | Thinking mode: $0.5 Non-thinking mode: $0.2 | ||||
qwen-turbo-2025-04-28 Also known as qwen-turbo-0428 Part of the Qwen3 series | Snapshot | ||||||
qwen-turbo-2024-11-01 Also known as qwen-turbo-1101 | 1,000,000 | 1,000,000 | 8,192 | $0.2 | |||
Mainland China (Beijing)
Model | Version | Context window | Max input | Max output | Input cost | Output cost |
(Tokens) | (Million tokens) | |||||
qwen-turbo Currently has the same capabilities as qwen-turbo-2025-04-28 Part of the Qwen3 series | Stable | Thinking mode 131,072 Non-thinking mode 1,000,000 | Thinking mode 98,304 Non-thinking mode 1,000,000 | 16,384 Max chain-of-thought is 38,912 | $0.044 | Thinking mode $0.431 Non-thinking mode $0.087 |
qwen-turbo-latest Always has the same capabilities as the latest snapshot version Part of the Qwen3 series | Latest | |||||
qwen-turbo-2025-07-15 Also known as qwen-turbo-0715 Part of the Qwen3 series | Snapshot | |||||
qwen-turbo-2025-04-28 Also known as qwen-turbo-0428 Part of the Qwen3 series | ||||||
QwQ
The QwQ reasoning model is trained on the Qwen2.5 model and uses reinforcement learning to significantly improve its reasoning capabilities. The model's core metrics for math and code, such as AIME 24/25 and LiveCodeBench, and some of its general metrics, such as IFEval and LiveBench, are comparable to the full-performance version of DeepSeek-R1. Usage
Singapore
Model | Version | Context window | Max input | Max chain-of-thought | Max response | Input cost | Output cost | Free quota |
(Tokens) | (Million tokens) | |||||||
qwq-plus | Stable | 131,072 | 98,304 | 32,768 | 8,192 | $0.8 | $2.4 | 1 million tokens Validity: Within 90 days after you activate Alibaba Cloud Model Studio. |
Mainland China (Beijing)
Model | Version | Context window | Max input | Max chain-of-thought | Max response | Input cost | Output cost |
(Tokens) | (Million tokens) | ||||||
qwq-plus Same capabilities as qwq-plus-2025-03-05. | Stable | 131,072 | 98,304 | 32,768 | 8,192 | $0.230 | $0.574 |
qwq-plus-latest Always has the same capabilities as the latest snapshot version. | Latest | ||||||
qwq-plus-2025-03-05 Also known as qwq-plus-0305. | Snapshot | ||||||
Qwen-Long
The Qwen-Long model has the longest context window in the Qwen series. It offers balanced performance at a low cost. This model is ideal for tasks such as long-text analysis, information extraction, summarization, classification, and tagging. Usage | Try it online
China (Beijing)
Model | Version | Context window | Max input | Max output | Input cost | Output cost |
(Tokens) | (Million tokens) | |||||
qwen-long-latest Always matches the capabilities of the latest snapshot version. | Stable | 10,000,000 | 10,000,000 | 8,192 | $0.072 | $0.287 |
qwen-long-2025-01-25 Also known as qwen-long-0125. | Snapshot | |||||
Qwen-Omni
The Qwen-Omni model accepts combined inputs from multiple modalities, such as text, images, audio, and video, and generates responses in text or speech format. It provides a variety of expressive, human-like voices and supports audio output in multiple languages and dialects. You can use it in audio and video chat scenarios, such as for visual recognition, sentiment analysis, and education and training. Usage | API reference
Singapore
Model | Version | Mode | Context window | Max input | Max chain-of-thought | Max output | Free quota |
(Tokens) | |||||||
qwen3-omni-flash Currently has the same capabilities as qwen3-omni-flash-2025-09-15 | Stable | Thinking mode | 65,536 | 16,384 | 32,768 | 16,384 | 1 million tokens each (modality-agnostic) Valid for 90 days after you activate Model Studio |
Non-thinking mode | 49,152 | - | |||||
qwen3-omni-flash-2025-09-15 Also known as qwen3-omni-flash-0915 | Snapshot | Thinking mode | 65,536 | 16,384 | 32,768 | 16,384 | |
Non-thinking mode | 49,152 | - | |||||
After you use your free quota, the following billing rules apply to inputs and outputs. The billing is the same for both thinking mode and non-thinking mode. Audio output is not supported in thinking mode.
|
|
Mainland China (Beijing)
Model | Version | Mode | Context window | Max input | Max chain-of-thought | Max output | Free quota |
(Tokens) | |||||||
qwen3-omni-flash Currently has the same capabilities as qwen3-omni-flash-2025-09-15 | Stable | Thinking mode | 65,536 | 16,384 | 32,768 | 16,384 | No free quota |
Non-thinking mode | 49,152 | - | |||||
qwen3-omni-flash-2025-09-15 Also known as qwen3-omni-flash-0915 | Snapshot | Thinking mode | 65,536 | 16,384 | 32,768 | 16,384 | |
Non-thinking mode | 49,152 | - | |||||
After your free quota is used up, inputs and outputs are billed according to the following rules. The billing is the same for both thinking mode and non-thinking mode. Audio output is not supported in thinking mode.
|
|
The Qwen3-Omni-Flash model offers significant improvements over Qwen-Omni-Turbo, which is no longer updated:
It is a hybrid thinking model that supports both thinking and non-thinking modes. You can switch between the modes using the
enable_thinkingparameter. By default, thinking mode is disabled.Audio output is not supported in thinking mode. In non-thinking mode, the audio output from the model has the following features:
It supports 17 voices, an increase from the 4 supported by Qwen-Omni-Turbo.
It supports 10 languages, an increase from the 2 supported by Qwen-Omni-Turbo.
Qwen-Omni-Realtime
Compared to Qwen Omni, these models support audio stream input. They have a built-in Voice Activity Detection (VAD) feature that automatically detects the start and end of user speech. Usage | Client events | Server events
International (Singapore)
Model | Version | Context window | Max input | Max output | Free quota |
(Tokens) | |||||
qwen3-omni-flash-realtime Equivalent to qwen3-omni-flash-realtime-2025-09-15 | Stable | 65,536 | 49,152 | 16,384 | 1 million tokens each, regardless of modality Valid for 90 days after you activate Model Studio |
qwen3-omni-flash-realtime-2025-09-15 | Snapshot | ||||
After the free quota is exhausted, the following billing rules apply to inputs and outputs:
|
|
Mainland China (Beijing)
Model | Version | Context window | Max input | Max output | Free quota |
(Tokens) | |||||
qwen3-omni-flash-realtime Equivalent to qwen3-omni-flash-realtime-2025-09-15 | Stable | 65,536 | 49,152 | 16,384 | No free quota |
qwen3-omni-flash-realtime-2025-09-15 | Snapshot | ||||
The following billing rules apply to inputs and outputs:
|
|
The Qwen3-Omni-Flash-Realtime model is recommended. It offers significantly improved capabilities compared to Qwen-Omni-Turbo-Realtime, which will no longer be updated. For audio output from the model:
It supports 17 voices. Qwen-Omni-Turbo-Realtime supports only 4.
It supports 10 languages. Qwen-Omni-Turbo-Realtime supports only 2.
QVQ
QVQ is a visual reasoning model that supports visual inputs and chain-of-thought outputs. It delivers superior performance in math, programming, visual analysis, creative tasks, and general tasks. Usage | Try it online
International (Singapore)
Model | Version | Context window | Max input | Max chain-of-thought | Max response | Input cost | Output cost | Free quota |
(Tokens) | (Million tokens) | |||||||
qvq-max Equivalent to qvq-max-2025-03-25. | Stable | 131,072 | 106,496 Maximum of 16,384 tokens for a single image. | 16,384 | 8,192 | $1.2 | $4.8 | 1 million input tokens and 1 million output tokens. Valid for 90 days after you activate Alibaba Cloud Model Studio. |
qvq-max-latest Always equivalent to the latest snapshot version. | Latest | |||||||
qvq-max-2025-03-25 Also known as qvq-max-0325. | Snapshot | |||||||
Mainland China (Beijing)
Model | Version | Context window | Max input | Max chain-of-thought | Max response | Input cost | Output cost |
(Tokens) | (Million tokens) | ||||||
qvq-max Offers stronger visual reasoning and instruction-following capabilities than qvq-plus, providing optimal performance for more complex tasks. Has the same capabilities as qvq-max-2025-03-25. | Stable | 131,072 | 106,496 Maximum of 16,384 for a single image. | 16,384 | 8,192 | $1.147 | $4.588 |
qvq-max-latest Always has the same capabilities as the latest snapshot version. | Latest | ||||||
qvq-max-2025-05-15 Also known as qvq-max-0515. | Snapshot | ||||||
qvq-max-2025-03-25 Also known as qvq-max-0325. | |||||||
qvq-plus Has the same capabilities as qvq-plus-2025-05-15. | Stable | $0.287 | $0.717 | ||||
qvq-plus-latest Always has the same capabilities as the latest snapshot version. | Latest | ||||||
qvq-plus-2025-05-15 Also known as qvq-plus-0515. | Snapshot | ||||||
Qwen-VL
Qwen-VL is a text generation model with visual understanding (image) capabilities. It not only performs Optical Character Recognition (OCR) but also provides further summarization and reasoning, such as extracting properties from product photos or solving problems shown in diagrams. Usage | API reference | Try it online
Qwen-VL models are billed based on the total number of input and output tokens. For more information about how image tokens are calculated, see Visual understanding.
International (Singapore)
Model | Version | Mode | Context window | Max input | Max chain-of-thought | Max output | Input cost | Output cost (Chain-of-thought + Output) | Free quota |
(Tokens) | (Million tokens) | ||||||||
qwen3-vl-plus Same capabilities as qwen3-vl-plus-2025-09-23 | Stable | thinking | 262,144 | 258,048 Maximum of 16,384 tokens per image | 81,920 | 32,768 | Tiered pricing. For more information, see the description below the table. | 1 million input tokens and 1 million output tokens Valid for 90 days after you activate Alibaba Cloud Model Studio. | |
non-thinking | 260,096 Maximum of 16,384 tokens per image | - | |||||||
qwen3-vl-plus-2025-09-23 | Snapshot | thinking | 258,048 Maximum of 16,384 tokens per image | 81,920 | |||||
non-thinking | 260,096 Maximum of 16,384 tokens per image | - | |||||||
qwen3-vl-flash Same capabilities as qwen3-vl-flash-2025-10-15 | Stable | thinking | 258,048 Maximum of 16,384 tokens per image | 81,920 | |||||
non-thinking | 260,096 Maximum of 16,384 tokens per image | - | |||||||
qwen3-vl-flash-2025-10-15 | Snapshot | thinking | 258,048 Maximum of 16,384 tokens per image | 81,920 | |||||
non-thinking | 260,096 Maximum of 16,384 tokens per image | - | |||||||
The models listed above use tiered pricing based on the number of input tokens per request. The input and output prices are the same for both thinking and non-thinking modes.
qwen3-vl-plus series
Input tokens per request | Input price (Million tokens) | Output price (Million tokens) |
0 < Tokens ≤ 32K | $0.20 | $1.60 |
32K < Tokens ≤ 128K | $0.30 | $2.40 |
128K < Tokens ≤ 256K | $0.60 | $4.80 |
qwen3-vl-flash series
Input tokens per request | Input price (Million tokens) | Output price (Million tokens) |
0 < Tokens ≤ 32K | $0.05 | $0.40 |
32K < Tokens ≤ 128K | $0.075 | $0.60 |
128K < Tokens ≤ 256K | $0.12 | $0.96 |
Mainland China (Beijing)
Model | Version | Mode | Context window | Max input | Max chain-of-thought | Max output | Input cost | Output cost | Free quota |
(Tokens) | (Million tokens) | ||||||||
qwen3-vl-plus Same capabilities as qwen3-vl-plus-2025-09-23 | Stable | thinking | 262,144 | 258,048 Maximum of 16,384 tokens per image | 81,920 | 32,768 | Tiered pricing. For more information, see the description below the table. | No free quota | |
non-thinking | 260,096 Maximum of 16,384 tokens per image | - | |||||||
qwen3-vl-plus-2025-09-23 | Snapshot | thinking | 258,048 Maximum of 16,384 tokens per image | 81,920 | |||||
non-thinking | 260,096 Maximum of 16,384 tokens per image | - | |||||||
qwen3-vl-flash Same capabilities as qwen3-vl-flash-2025-10-15 | Stable | thinking | 258,048 Maximum of 16,384 tokens per image | 81,920 | |||||
non-thinking | 260,096 Maximum of 16,384 tokens per image | - | |||||||
qwen3-vl-flash-2025-10-15 | Snapshot | thinking | 258,048 Maximum of 16,384 tokens per image | 81,920 | |||||
non-thinking | 260,096 Maximum of 16,384 tokens per image | - | |||||||
The models listed above use tiered pricing based on the number of input tokens per request. The input and output prices are the same for both thinking and non-thinking modes.
qwen3-vl-plus series
Input tokens per request | Input price (Million tokens) | Output price (Million tokens) |
0 < Tokens ≤ 32K | $0.143353 | $1.433525 |
32K < Tokens ≤ 128K | $0.215029 | $2.150288 |
128K < Tokens ≤ 256K | $0.430058 | $4.300576 |
qwen3-vl-flash series
Input tokens per request | Input price (per 1M tokens) | Output price (per 1M tokens) |
0 < Tokens ≤ 32K | $0.022 | $0.215 |
32K < Tokens ≤ 128K | $0.043 | $0.43 |
128K < Tokens ≤ 256K | $0.086 | $0.859 |
Qwen-OCR
The Qwen-OCR model is designed for text extraction. Compared to the Qwen-VL model, it specializes in extracting text from images of documents, tables, exam papers, and handwriting. It can recognize multiple languages, such as English, French, Japanese, Korean, German, Russian, and Italian. Usage | API reference | Try it online
International (Singapore)
Model | Version | Context window | Max input | Max output | Unit price | Free quota |
(tokens) | (Million tokens) | |||||
qwen-vl-ocr | Stable | 34,096 | 30,000 A single graph can support up to 30,000. | 4,096 | $0.72 | 1 million input tokens and 1 million output tokens Validity: The quota is valid for 90 days after you activate Alibaba Cloud Model Studio. |
Mainland China (Beijing)
Model | Version | Context window | Max input | Max output | Input/output unit price |
(Tokens) | (Million tokens) | ||||
qwen-vl-ocr Offers the same capabilities as qwen-vl-ocr-2025-04-13. | Stable | 34,096 | 30,000 Maximum of 30,000 for a single image. | 4,096 | $0.717 |
qwen-vl-ocr-latest Offers the same capabilities as the latest snapshot version. | Latest | ||||
qwen-vl-ocr-2025-04-13 Also known as qwen-vl-ocr-0413. Significantly improves text recognition and includes six built-in OCR tasks and features, such as custom prompts and image rotation correction. | Snapshot | ||||
qwen-vl-ocr-2024-10-28 Also known as qwen-vl-ocr-1028. | Snapshot | ||||
Qwen-Math
Qwen-Math is a language model designed for mathematical problem-solving. Usage | API reference | Try it online
This model is available only in the China (Beijing) region.
Model | Version | Context window | Max input | Max output | Input cost | Output cost |
(Tokens) | (Million tokens) | |||||
qwen-math-plus Same capabilities as qwen-math-plus-2024-09-19. | Stable | 4,096 | 3,072 | 3,072 | $0.574 | $1.721 |
qwen-math-plus-latest Same capabilities as the latest snapshot. | Latest | |||||
qwen-math-plus-2024-09-19 Also known as qwen-math-plus-0919. | Snapshot | |||||
qwen-math-plus-2024-08-16 Also known as qwen-math-plus-0816. | ||||||
qwen-math-turbo Same capabilities as qwen-math-turbo-2024-09-19. | Stable | $0.287 | $0.861 | |||
qwen-math-turbo-latest Same capabilities as the latest snapshot. | Latest | |||||
qwen-math-turbo-2024-09-19 Also known as qwen-math-turbo-0919. | Snapshot | |||||
Qwen-Coder
The latest Qwen3-Coder-Plus series models are Qwen code generation models built on Qwen3. They are powerful coding agents that excel at tool calling and environment interaction. These models can program autonomously and provide excellent coding and general-purpose capabilities. Usage | API reference | Try it online
International (Singapore)
Model | Version | Context window | Max input | Max output | Input cost (Million tokens) | Output cost (Million tokens) | Free quota |
Tokens | Per million tokens | ||||||
qwen3-coder-plus Currently equivalent to qwen3-coder-plus-2025-07-22 | Stable | 1,000,000 | 997,952 | 65,536 | Tiered pricing. See the description below the table. | 1 million input tokens and 1 million output tokens Valid for 90 days after you activate Alibaba Cloud Model Studio | |
qwen3-coder-plus-2025-09-23 | Snapshot | ||||||
qwen3-coder-plus-2025-07-22 | Snapshot | ||||||
qwen3-coder-flash Currently equivalent to qwen3-coder-flash-2025-07-28 | Stable | ||||||
qwen3-coder-flash-2025-07-28 | Snapshot | ||||||
These models use tiered billing based on the number of input tokens per request.
qwen3-coder-plus series
The prices for qwen3-coder-plus, qwen3-coder-plus-2025-09-23, and qwen3-coder-plus-2025-07-22 are as follows. The qwen3-coder-plus model supports context cache. Input text that hits the implicit cache is billed at 20% of the unit price. Input text that hits the explicit cache is billed at 10% of the unit price.
Input tokens per request | Input cost (Million tokens) | Output cost (Million tokens) |
0 < Tokens ≤ 32K | $1 | $5 |
32K < Tokens ≤ 128K | $1.8 | $9 |
128K < Tokens ≤ 256K | $3 | $15 |
256K < Tokens ≤ 1M | $6 | $60 |
qwen3-coder-flash series
The prices for qwen3-coder-flash and qwen3-coder-flash-2025-07-28 are as follows. The qwen3-coder-flash model supports context cache. Input text that hits the implicit cache is billed at 20% of the unit price. Input text that hits the explicit cache is billed at 10% of the unit price.
Input tokens per request | Input cost (Million tokens) | Output cost (Million tokens) |
0 < Tokens ≤ 32K | $0.3 | $1.5 |
32K < Tokens ≤ 128K | $0.5 | $2.5 |
128K < Tokens ≤ 256K | $0.8 | $4 |
256K < Tokens ≤ 1M | $1.6 | $9.6 |
Mainland China (Beijing)
Model | Version | Context window | Max input | Max output | Input cost | Output cost |
(Tokens) | (Million tokens) | |||||
qwen3-coder-plus Provides the same functionality as qwen3-coder-plus-2025-07-22. | Stable | 1,000,000 | 997,952 | 65,536 | Tiered pricing. See the description below the table. | |
qwen3-coder-plus-2025-09-23 | Snapshot | |||||
qwen3-coder-plus-2025-07-22 | Snapshot | |||||
qwen3-coder-flash Currently an alias for qwen3-coder-flash-2025-07-28 | Stable | |||||
qwen3-coder-flash-2025-07-28 | Snapshot | |||||
These models use tiered billing based on the number of input tokens per request.
qwen3-coder-plus series
The prices for qwen3-coder-plus, qwen3-coder-plus-2025-09-23, and qwen3-coder-plus-2025-07-22 are as follows. The qwen3-coder-plus model supports context cache. Input text that hits the implicit cache is billed at 20% of the unit price. Input text that hits the explicit cache is billed at 10% of the unit price.
Input tokens per request | Input cost (Million tokens) | Output cost (Million tokens) |
0 < Tokens ≤ 32K | $0.574 | $2.294 |
32K < Tokens ≤ 128K | $0.861 | $3.441 |
128K < Tokens ≤ 256K | $1.434 | $5.735 |
256K < Tokens ≤ 1M | $2.868 | $28.671 |
qwen3-coder-flash series
The prices for qwen3-coder-flash and qwen3-coder-flash-2025-07-28 are as follows. The qwen3-coder-flash model supports context cache. Input text that hits the implicit cache is billed at 20% of the unit price. Input text that hits the explicit cache is billed at 10% of the unit price.
Input tokens per request | Input cost (Million tokens) | Output cost (Million tokens) |
0 < Tokens ≤ 32K | $0.144 | $0.574 |
32K < Tokens ≤ 128K | $0.216 | $0.861 |
128K < Tokens ≤ 256K | $0.359 | $1.434 |
256K < Tokens ≤ 1M | $0.717 | $3.584 |
Qwen-MT
This flagship large translation model is a comprehensive upgrade to Qwen 3. It supports translation between 92 languages, including Chinese, English, Japanese, Korean, French, Spanish, German, Thai, Indonesian, Vietnamese, and Arabic. The model's performance and translation quality are significantly improved. It provides enhanced support for custom glossaries, format retention, and domain-specific prompts, resulting in more accurate and natural translations. Usage.
International (Singapore)
Model | Context window | Max input | Max output | Input cost | Output cost | Free quota |
(Tokens) | (Million tokens) | |||||
qwen-mt-plus Part of Qwen3-MT | 16,384 | 8,192 | 8,192 | $2.46 | $7.37 | 1 million tokens per model Expires 90 days after activating Alibaba Cloud Model Studio. |
qwen-mt-flash Part of Qwen3-MT | $0.16 | $0.49 | ||||
qwen-mt-turbo Part of Qwen3-MT | $0.16 | $0.49 | ||||
Mainland China (Beijing)
Model | Context window | Max input | Max output | Input cost | Output cost |
(Tokens) | (Million tokens) | ||||
qwen-mt-plus Part of Qwen3-MT | 16,384 | 8,192 | 8,192 | $0.259 | $0.775 |
qwen-mt-flash Part of Qwen3-MT | $0.101 | $0.280 | |||
qwen-mt-turbo Part of Qwen3-MT | $0.101 | $0.280 | |||
Qwen data mining model
The Qwen data mining model extracts structured information from documents for use in domains such as data annotation and content moderation. Usage | API reference
Available only in the China (Beijing) region.
Model | Context window | Max input | Max output | Input cost | Output cost | Free quota |
(Tokens) | (Million tokens) | |||||
qwen-doc-turbo | 131,072 | 129,024 | 8,192 | $0.087 | $0.144 | No free quota |
Qwen deep research model
The Qwen deep research model breaks down complex problems, performs inference and analysis using web search, and generates research reports. Usage | API reference
Available only in the China (Beijing) region.
Model | Context window | Max input | Max output | Input cost | Output cost |
(Tokens) | (Per 1,000 tokens) | ||||
qwen-deep-research | 1,000,000 | 997,952 | 32,768 | $0.007742 | $0.023367 |
Text generation - Qwen open-source versions
In the model names, xxb indicates the parameter size. For example, qwen2-72b-instruct indicates a parameter size of 72 billion (72B).
Alibaba Cloud Model Studio supports invoking the open-source versions of Qwen. You do not need to deploy the models locally. For open-source versions, we recommend using the Qwen3 and Qwen2.5 models.
Qwen3
qwen3-next-80b-a3b-thinking, released in September 2025, supports only thinking mode. Compared to qwen3-235b-a22b-thinking-2507, it offers improved instruction-following capabilities and more concise summaries.
qwen3-next-80b-a3b-instruct, released in September 2025, supports only non-thinking mode. It offers enhanced Chinese comprehension, logical reasoning, and text generation capabilities compared to qwen3-235b-a22b-instruct-2507.
The qwen3-235b-a22b-thinking-2507 and qwen3-30b-a3b-thinking-2507 models, released in July 2025, support only thinking mode. They are upgraded versions of qwen3-235b-a22b (thinking mode) and qwen3-30b-a3b (thinking mode).
The qwen3-235b-a22b-instruct-2507 and qwen3-30b-a3b-instruct-2507 models, released in July 2025, support only non-thinking mode. They are upgraded versions of qwen3-235b-a22b (non-thinking mode) and qwen3-30b-a3b (non-thinking mode).
The Qwen3 models, released in April 2025, support both thinking and non-thinking modes. You can switch between the modes using the enable_thinking parameter. The Qwen3 models also feature significant capability enhancements:
Inference capabilities: In evaluations for math, code, and logical reasoning, the models significantly outperform QwQ and other non-reasoning models of a similar scale. Their performance is top-tier in the industry for models of their scale.
Human preference alignment: The models show major improvements in creative writing, role assumption, multi-turn conversation, and instruction following. Their general capabilities are significantly better than other models of a similar scale.
Agent capabilities: The models deliver industry-leading performance in both thinking and non-thinking modes and can perform precise external tool calling.
Multilingual capabilities: The models support over 100 languages and dialects. They show significant improvements in multilingual translation, instruction comprehension, and common-sense reasoning.
Response format fixes: This update fixes response format issues from previous versions, such as incorrect Markdown, truncated responses, and incorrect boxed output.
The open-source Qwen3 models released in April 2025 do not support non-streaming output in thinking mode.
If an open-source Qwen3 model is in thinking mode but does not output a thought process, it is billed at the non-thinking mode rate.
Thinking mode | Non-thinking mode | Usage
International (Singapore)
Model | Mode | Context window | Max input | Max chain-of-thought | Max response | Input cost | Output cost | Free quota |
(Tokens) | (Million tokens) | |||||||
qwen3-next-80b-a3b-thinking | Thinking only | 131,072 | 126,976 | 81,920 | 32,768 | $0.15 | $1.2 | 1 million tokens Valid for 90 days after you activate Alibaba Cloud Model Studio |
qwen3-next-80b-a3b-instruct | Non-thinking only | 129,024 | - | |||||
qwen3-235b-a22b-thinking-2507 | Thinking only | 126,976 | 81,920 | $0.23 | $2.3 | |||
qwen3-235b-a22b-instruct-2507 | Non-thinking only | 129,024 | - | $0.92 | ||||
qwen3-30b-a3b-thinking-2507 | Thinking only | 126,976 | 81,920 | $0.2 | $2.4 | |||
qwen3-30b-a3b-instruct-2507 | Non-thinking only | 129,024 | - | $0.8 | ||||
qwen3-235b-a22b This model and the following models were released in April 2025. | Non-thinking mode | 129,024 | - | 16,384 | $0.7 | $2.8 | ||
Thinking mode | 98,304 | 38,912 | $8.4 | |||||
qwen3-32b | Non-thinking mode | 129,024 | - | $0.16 | $0.64 | |||
Thinking mode | 98,304 | 38,912 | ||||||
qwen3-30b-a3b | Non-thinking mode | 129,024 | - | $0.2 | $0.8 | |||
Thinking mode | 98,304 | 38,912 | $2.4 | |||||
qwen3-14b | Non-thinking mode | 129,024 | - | 8,192 | $0.35 | $1.4 | ||
Thinking mode | 98,304 | 38,912 | $4.2 | |||||
qwen3-8b | Non-thinking mode | 129,024 | - | $0.18 | $0.7 | |||
Thinking mode | 98,304 | 38,912 | $2.1 | |||||
qwen3-4b | Non-thinking mode | 129,024 | - | $0.11 | $0.42 | |||
Thinking mode | 98,304 | 38,912 | $1.26 | |||||
qwen3-1.7b | Non-thinking mode | 32,768 | 30,720 | - | $0.42 | |||
Thinking mode | 28,672 | The total value cannot exceed 30,720. | $1.26 | |||||
qwen3-0.6b | Non-thinking mode | 30,720 | - | $0.42 | ||||
Thinking mode | 28,672 | The total of the value and the input cannot exceed 30,720. | $1.26 | |||||
Mainland China (Beijing)
Model | Mode | Context window | Max input | Max chain-of-thought | Max response | Input cost | Output cost |
(Tokens) | (Million tokens) | ||||||
qwen3-next-80b-a3b-thinking | Thinking only | 131,072 | 126,976 | 81,920 | 32,768 | $0.144 | $1.434 |
qwen3-next-80b-a3b-instruct | Non-thinking only | 129,024 | - | $0.574 | |||
qwen3-235b-a22b-thinking-2507 | Thinking only | 126,976 | 81,920 | $0.287 | $2.868 | ||
qwen3-235b-a22b-instruct-2507 | Non-thinking only | 129,024 | - | $1.147 | |||
qwen3-30b-a3b-thinking-2507 | Thinking only | 126,976 | 81,920 | $0.108 | $1.076 | ||
qwen3-30b-a3b-instruct-2507 | Non-thinking only | 129,024 | - | $0.431 | |||
qwen3-235b-a22b | Non-thinking | 129,024 | - | 16,384 | $0.287 | $1.147 | |
Thinking | 98,304 | 38,912 | $2.868 | ||||
qwen3-32b | Non-thinking | 129,024 | - | $0.287 | $1.147 | ||
Thinking | 98,304 | 38,912 | $2.868 | ||||
qwen3-30b-a3b | Non-thinking | 129,024 | - | $0.108 | $0.431 | ||
Thinking | 98,304 | 38,912 | $1.076 | ||||
qwen3-14b | Non-thinking | 129,024 | - | 8,192 | $0.144 | $0.574 | |
Thinking | 98,304 | 38,912 | $1.434 | ||||
qwen3-8b | Non-thinking | 129,024 | - | $0.072 | $0.287 | ||
Thinking | 98,304 | 38,912 | $0.717 | ||||
qwen3-4b | Non-thinking | 129,024 | - | $0.044 | $0.173 | ||
Thinking | 98,304 | 38,912 | $0.431 | ||||
qwen3-1.7b | Non-thinking | 32,768 | 30,720 | - | $0.173 | ||
Thinking | 28,672 | The sum of input and chain-of-thought tokens must not exceed 30,720. | $0.431 | ||||
qwen3-0.6b | Non-thinking | 30,720 | - | $0.173 | |||
Thinking | 28,672 | The sum of input and chain-of-thought tokens must not exceed 30,720. | $0.431 | ||||
QwQ-Open-source
QwQ reasoning model trained on Qwen2.5-32B. Reinforcement learning has significantly improved its inference capabilities. Core metrics for math and code (AIME 24/25, LiveCodeBench) and some general metrics (IFEval, LiveBench) are on par with the full-power version of DeepSeek-R1. All metrics significantly exceed those of DeepSeek-R1-Distill-Qwen-32B, which is also based on Qwen2.5-32B. Usage | API reference
This feature is only available in the China (Beijing) region.
Model | Context window | Max input | Max chain-of-thought | Max output | Input price | Output price |
(Tokens) | (Million tokens) | |||||
qwq-32b | 131,072 | 98,304 | 32,768 | 8,192 | $0.287 | $0.861 |
QwQ-Preview
The qwq-32b-preview model is an experimental research model developed by the Qwen team in 2024. It focuses on enhancing AI reasoning capabilities, especially in math and programming. For more information about the limitations of the qwq-32b-preview model, see the QwQ official blog. Usage | API reference | Try it online
This feature is only available in the China (Beijing) region.
Model | Context window | Max input | Max output | Input cost | Output cost |
(Tokens) | (Million tokens) | ||||
qwq-32b-preview | 32,768 | 30,720 | 16,384 | $0.287 | $0.861 |
Qwen2.5
QVQ
The qvq-72b-preview model is an experimental research model developed by the Qwen team. It focuses on enhancing visual reasoning capabilities, especially in mathematical reasoning. For more information about the limitations of the qvq-72b-preview model, see the QVQ official blog. Usage | API reference
To have the model output the thinking process before the final answer, you can use the commercial version of the QVQ model.
This feature is only available in the China (Beijing) region.
Model | Context window | Max input | Max output | Input Cost | Output cost |
Tokens | Per million tokens | ||||
qvq-72b-preview | 32,768 | 16,384 Maximum 16,384 tokens per image | 16,384 | $1.721 | $5.161 |
Qwen-Omni
This is a new multimodal large model for understanding and generation, trained on Qwen2.5. It supports text, image, speech, and video inputs, and can generate text and speech simultaneously in a stream. The speed of multimodal content understanding is significantly improved. Usage | API reference
International (Singapore)
Model | Context window | Max input | Max output | Free quota |
(Tokens) | ||||
qwen2.5-omni-7b | 32,768 | 30,720 | 2,048 | 1 million tokens (regardless of modality) Valid for 90 days after activating Alibaba Cloud Model Studio. |
After the free quota is used up, the following billing rules apply to inputs and outputs:
|
|
Mainland China (Beijing)
Model | Context window | Max input | Max output |
(Tokens) | |||
qwen2.5-omni-7b | 32,768 | 30,720 | 2,048 |
The billing rules for inputs and outputs are as follows:
|
|
Qwen3-Omni-Captioner
Qwen3-Omni-Captioner is an open-source model based on Qwen3-Omni. Without any prompts, it automatically generates accurate and comprehensive descriptions for complex audio, including speech, ambient sounds, music, and sound effects. It can identify speaker emotions, musical elements (such as style and instruments), and sensitive information, making it suitable for applications such as audio content analysis, security audits, intent recognition, and audio editing. Usage | API reference
This model is available only in the Singapore region.
Model | Context window | Max input | Max output | Input cost | Output cost | Free quota |
(Tokens) | (Million tokens) | |||||
qwen3-omni-30b-a3b-captioner | 65,536 | 32,768 | 32,768 | $3.81 | $3.06 | 1 million tokens Validity: 90 days after you activate Alibaba Cloud Model Studio |
Qwen-VL
This is the open-source version of Alibaba Cloud's Qwen-VL. Usage | API reference
The Qwen3-VL model offers significant improvements over Qwen2.5-VL:
Agent interaction: It operates computer and mobile phone interfaces, detects graphical user interface (GUI) elements, understands features, and invokes tools to perform tasks. It achieves top-tier performance in evaluations such as OS World.
Visual encoding: It generates code from images or videos. You can use this feature to create HTML, CSS, and JS code from design drafts or website screenshots.
Spatial intelligence: It supports 2D and 3D positioning and accurately determines object orientation, perspective changes, and occlusion relationships.
Long video understanding: It understands video content up to 20 minutes long and can pinpoint specific moments with second-level accuracy.
Deep thinking: It excels at capturing details and analyzing causality, achieving top-tier performance in evaluations such as MathVista and MMMU.
OCR: It supports 33 languages and performs more stably in scenarios that involve complex lighting, blur, or tilt. It also significantly improves recognition accuracy for rare characters, ancient script, and technical terms.
International (Singapore)
Model | Mode | Context window | Max input | Max chain-of-thought | Max response | Input cost | Output cost CoT + response | Free quota |
(Tokens) | (Million tokens) | |||||||
qwen3-vl-235b-a22b-thinking | Thinking only | 126,976 | 81,920 | $0.4 | $4 | 1 million tokens each Valid for 90 days after Model Studio is activated. | ||
qwen3-vl-235b-a22b-instruct | Non-thinking only | 129,024 | - | $1.6 | ||||
qwen3-vl-32b-thinking | Thinking only | 131,072 | 126,976 | 81,920 | 32,768 | $0.16 | $0.64 | |
qwen3-vl-32b-instruct | Non-thinking only | 129,024 | - | |||||
qwen3-vl-30b-a3b-thinking | Thinking only | 126,976 | 81,920 | $0.2 | $2.4 | |||
qwen3-vl-30b-a3b-instruct | Non-thinking only | 129,024 | - | $0.8 | ||||
qwen3-vl-8b-thinking | Thinking only | 126,976 | 81,920 | $0.18 | $2.1 | |||
qwen3-vl-8b-instruct | Non-thinking only | 129,024 | - | $0.7 | ||||
Mainland China (Beijing)
Model | Mode | Context window | Max input | Max chain-of-thought | Max response | Input cost | Output cost CoT + response | Free quota |
(Tokens) | (Million tokens) | |||||||
qwen3-vl-235b-a22b-thinking | Thinking only | 131,072 | 126,976 | 81,920 | $0.286705 | $2.867051 | No free quota | |
qwen3-vl-235b-a22b-instruct | Non-thinking only | 129,024 | - | $1.146820 | ||||
qwen3-vl-32b-thinking | Thinking only | 131,072 | 126,976 | 81,920 | 32,768 | $0.287 | $2.868 | |
qwen3-vl-32b-instruct | Non-thinking only | 129,024 | - | $1.147 | ||||
qwen3-vl-30b-a3b-thinking | Thinking only | 126,976 | 81,920 | $0.108 | $1.076 | |||
qwen3-vl-30b-a3b-instruct | Non-thinking only | 129,024 | - | $0.431 | ||||
qwen3-vl-8b-thinking | Thinking only | 126,976 | 81,920 | $0.072 | $0.717 | |||
qwen3-vl-8b-instruct | Non-thinking only | 129,024 | - | $0.287 | ||||
Qwen-Math
This is a language model built on the Qwen model that is specialized for solving mathematical problems. Qwen2.5-Math supports Chinese and English and integrates multiple reasoning methods, including Chain of Thought (CoT), Program of Thought (PoT), and Tool-Integrated Reasoning (TIR). Usage | API reference | Try it online
This feature is only available in the China (Beijing) region.
Model | Context window | Max input | Max output | Input cost | Output cost |
(Tokens) | (Million tokens) | ||||
qwen2.5-math-72b-instruct | 4,096 | 3,072 | 3,072 | $0.574 | $1.721 |
qwen2.5-math-7b-instruct | $0.144 | $0.287 | |||
qwen2.5-math-1.5b-instruct | Free for a limited time | ||||
Qwen-Coder
Qwen-Coder is an open source code model from Qwen. The latest Qwen3-Coder series has powerful Coding Agent capabilities. It excels at tool calling, environment interaction, and autonomous programming. The model combines excellent coding skills with general-purpose capabilities. Usage | API reference
International (Singapore)
Model | Context window | Max input | Max output | Input cost | Output cost | Free quota |
Token count | ||||||
qwen3-coder-480b-a35b-instruct | 262,144 | 204,800 | 65,536 | Tiered pricing. See the description below the table. | 1 million input tokens and 1 million output tokens Valid for 90 days after you activate Alibaba Cloud Model Studio. | |
qwen3-coder-30b-a3b-instruct | ||||||
Billing for qwen3-coder-480b-a35b-instruct and qwen3-coder-30b-a3b-instruct is tiered based on the number of input tokens per request.
Model | Input tokens per request | Input cost (Million tokens) | Output cost (Million tokens) |
qwen3-coder-480b-a35b-instruct | 0 < Tokens ≤ 32K | $1.50 | $7.50 |
32K < Tokens ≤ 128K | $2.70 | $13.50 | |
128K < Tokens ≤ 200K | $4.50 | $22.50 | |
qwen3-coder-30b-a3b-instruct | 0 < Tokens ≤ 32K | $0.45 | $2.25 |
32K < Tokens ≤ 128K | $0.75 | $3.75 | |
128K < Tokens ≤ 200K | $1.20 | $6.00 |
Mainland China (Beijing)
Model | Context window | Max input | Max output | Input cost | Output cost |
(Tokens) | (Million tokens) | ||||
qwen3-coder-480b-a35b-instruct | 262,144 | 204,800 | 65,536 | Tiered pricing. See the description below. | |
qwen3-coder-30b-a3b-instruct | |||||
qwen2.5-coder-32b-instruct | 131,072 | 129,024 | 8,192 | $0.287 | $0.861 |
qwen2.5-coder-14b-instruct | |||||
qwen2.5-coder-7b-instruct | $0.144 | $0.287 | |||
qwen2.5-coder-3b-instruct | 32,768 | 30,720 | Limited-time free trial | ||
qwen2.5-coder-1.5b-instruct | |||||
qwen2.5-coder-0.5b-instruct | |||||
Billing for qwen3-coder-480b-a35b-instruct and qwen3-coder-30b-a3b-instruct is tiered based on the number of input tokens per request.
Model | Input tokens per request | Input cost (Million tokens) | Output cost (Million tokens) |
qwen3-coder-480b-a35b-instruct | 0 < Tokens ≤ 32K | $0.861 | $3.441 |
32K < Tokens ≤ 128K | $1.291 | $5.161 | |
128K < Tokens ≤ 200K | $2.151 | $8.602 | |
qwen3-coder-30b-a3b-instruct | 0 < Tokens ≤ 32K | $0.216 | $0.861 |
32K < Tokens ≤ 128K | $0.323 | $1.291 | |
128K < Tokens ≤ 200K | $0.538 | $2.151 |
Text generation - Third-party models
DeepSeek
DeepSeek is a large language model launched by DeepSeek AI. API reference | Try it online
This feature is only available in the China (Beijing) region.
Model | Context window | Max input | Max chain-of-thought | Max response | Input cost | Output cost |
(Tokens) | (Million tokens) | |||||
deepseek-v3.2-exp 685B full-power version | 131,072 | 98,304 | 32,768 | 65,536 | $0.287 | $0.431 |
deepseek-v3.1 685B full-power version | $0.574 | $1.721 | ||||
deepseek-r1 685B full-power version | 16,384 | $2.294 | ||||
deepseek-r1-0528 685B full-power version | ||||||
deepseek-v3 671B full-power version | 131,072 | Not applicable | $0.287 | $1.147 | ||
deepseek-r1-distill-qwen-1.5b Based on Qwen2.5-Math-1.5B | 32,768 | 32,768 | 16,384 | 16,384 | Limited-time free trial | |
deepseek-r1-distill-qwen-7b Based on Qwen2.5-Math-7B | $0.072 | $0.144 | ||||
deepseek-r1-distill-qwen-14b Based on Qwen2.5-14B | $0.144 | $0.431 | ||||
deepseek-r1-distill-qwen-32b Based on Qwen2.5-32B | $0.287 | $0.861 | ||||
deepseek-r1-distill-llama-8b Based on Llama-3.1-8B | Limited-time free trial | |||||
deepseek-r1-distill-llama-70b Based on Llama-3.3-70B | ||||||
Kimi
Kimi-K2 is the first open-source trillion-parameter Mixture of Experts (MoE) model in China, provided by Moonshot AI. It activates 32 billion parameters and has excellent coding and tool-calling capabilities. Usage | Try it online
This feature is only available in the China (Beijing) region.
Model | Context window | Max input | Max chain-of-thought | Max response | Input price | Output price |
(Tokens) | (Million tokens) | |||||
kimi-k2-thinking | 262,144 | 229,376 | 32,768 | 16,384 | $0.574 | $2.294 |
Moonshot-Kimi-K2-Instruct | 131,072 | 131,072 | - | 8,192 | $0.574 | $2.294 |
Image generation
Qwen text-to-image
The Qwen text-to-image model excels at complex text rendering, especially for Chinese and English text.Currently, qwen-image-plus has the same capabilities as qwen-image, but qwen-image-plus has a lower price.API reference
International (Singapore)
Model | Unit price | Free quota |
qwen-image-plus | $0.03/image | Free quota: 100 images for each model Validity period: Within 90 days after you activate Alibaba Cloud Model Studio. |
qwen-image | $0.035/image |
Mainland China (Beijing)
Model | Unit price | Free quota |
qwen-image-plus | $0.028671/image | No free quota |
qwen-image | $0.035/image |
Input prompt | Output image |
Healing-style hand-drawn poster featuring three puppies playing with a ball on lush green grass, adorned with decorative elements such as birds and stars. The main title “Come Play Ball!” is prominently displayed at the top in bold, blue cartoon font. Below it, the subtitle “Come [Show Off Your Skills]!” appears in green font. A speech bubble adds playful charm with the text: “Hehe, watch me amaze my little friends next!” At the bottom, supplementary text reads: “We get to play ball with our friends again!” The color palette centers on fresh greens and blues, accented with bright pink and yellow tones to highlight a cheerful, childlike atmosphere. |
|
Qwen image editing
The Qwen image editing model supports precise text editing in Chinese and English. It also supports operations such as color adjustment, detail enhancement, style transfer, adding or removing objects, and changing positions and actions. These features enable complex editing of images and text. API reference
International (Singapore)
Model | Unit price | Free quota |
qwen-image-edit-plus | $0.03/image | Free quota: 100 images for each model Validity period: Within 90 days after you activate Alibaba Cloud Model Studio. |
qwen-image-edit-plus-2025-10-30 | $0.03/image | |
qwen-image-edit | $0.045/image |
Mainland China (Beijing)
Model | Unit price | Free quota |
qwen-image-edit-plus | $0.028671/image | No free quota |
qwen-image-edit-plus-2025-10-30 | $0.028671/image | |
qwen-image-edit | $0.043/image |
Original image |
Make the person stand up and bend over to hold the front paw of the dog. |
Original image |
Replace the text 'HEALTH INSURANCE' on the letter blocks with '明天会更好'. |
Original image |
Replace the dotted shirt with a light blue shirt. |
Original image |
Change the background to Antarctica. |
Original image |
Generate a cartoon profile picture of the person. |
Original image |
Remove the hair from the dinner plate. |
Qwen image translation
The Qwen image translation model supports translating text from images in 11 languages into Chinese or English. It accurately preserves the original layout and content information and provides custom features such as term definition, sensitive word filtering, and image entity detection. API reference
This feature is only available in the China (Beijing) region.
Model | Unit price | Free quota |
qwen-mt-image | $0.000431/image | No free quota |
Original image |
Japanese |
Portuguese |
Arabic |
Wan text-to-image
The Wan text-to-image model generates exquisite images from text. API reference | Try it online
International (Singapore)
Model | Description | Unit price | Free quota(Note) Validity period: Within 90 days after you activate Alibaba Cloud Model Studio. |
wan2.5-t2i-preview | Wan 2.5 preview. The single-side length limit is removed. You can freely select dimensions within the total pixel area and aspect ratio constraints. | $0.03/image | 50 images |
wan2.2-t2i-plus | Wan 2.2 Professional Edition. Fully upgraded in creativity, stability, and realistic texture. | $0.05/image | 100 images |
wan2.2-t2i-flash | Wan 2.2 Flash Edition. Fully upgraded in creativity, stability, and realistic texture. | $0.025/image | 100 images |
wan2.1-t2i-plus | Wan 2.1 Professional Edition. Supports multiple styles and generates images with rich details. | $0.05/image | 200 images |
wan2.1-t2i-turbo | Wan 2.1 Turbo Edition. Supports multiple styles and offers fast generation speed. | $0.025/image | 200 images |
Mainland China (Beijing)
Model | Description | Unit price | Free quota(Note) Validity period: Within 90 days after you activate Alibaba Cloud Model Studio. |
wan2.5-t2i-preview | Wan 2.5 preview. The single-side length limit is removed. You can freely select dimensions within the total pixel area and aspect ratio constraints. | $0.028671/image | No free quota |
wan2.2-t2i-plus | Wan 2.2 Professional Edition. Fully upgraded in creativity, stability, and realistic texture. | $0.02007/image | No free quota |
wan2.2-t2i-flash | Wan 2.2 Flash Edition. Fully upgraded in creativity, stability, and realistic texture. | $0.028671/image | No free quota |
wanx2.1-t2i-plus | Wan 2.1 Professional Edition. Supports multiple styles and generates images with rich details. | $0.028671/image | No free quota |
wanx2.1-t2i-turbo | Wan 2.1 Turbo Edition. Supports multiple styles and offers fast generation speed. | $0.020070/image | No free quota |
wanx2.0-t2i-turbo | Wan 2.0 Turbo Edition. Excels at textured portraits and creative designs. It is cost-effective. | $0.005735/image | No free quota |
Input prompt | Output image |
A needle-felted Santa Claus holding a gift and a white cat standing next to him against a background of colorful gifts and green plants, creating a cute, warm, and cozy scene. |
|
Wan2.5 general image editing
The Wan2.5 general image editing model supports entity-consistent image editing and multi-image fusion. It accepts text, a single image, or multiple images as input. API reference
International (Singapore)
Model | Unit price | Free quota(Note) Validity period: Within 90 days after you activate Alibaba Cloud Model Studio. |
wan2.5-i2i-preview | $0.03/image | 50 images |
Mainland China (Beijing)
Model | Unit price | Free quota |
wan2.5-i2i-preview | $0.028671/image | No free quota |
Feature | Input example | Output image |
Single-image editing |
Change the floral dress to a vintage-style lace long dress with exquisite embroidery details on the collar and cuffs. |
|
Multi-image fusion |
Place the alarm clock from image 1 next to the vase on the dining table in image 2. |
|
Wan2.1 general image editing
The Wan2.1 general image editing model performs diverse image editing with simple instructions. It is suitable for scenarios such as outpainting, watermark removal, style transfer, image restoration, and image enhancement. Usage | API reference
This feature is only available in the China (Beijing) region.
Model | Unit price | Free quota |
wanx2.1-imageedit | $0.020070 per image | No free quota |
The general image editing model currently supports the following features:
Feature | Input image | Input prompt | Output image |
Global stylization |
| Convert to a French picture book style. |
|
Local stylization |
| Change the house to a wooden plank style. |
|
Instruction-based editing |
| Change the girl's hair to red. |
|
Inpainting | Input image
Masked image (The white area is the mask)
| A ceramic rabbit holding a ceramic flower. | Output image
|
Text watermark removal |
| Remove the text from the image. |
|
Outpainting |
| A green fairy. |
|
Image super-resolution | Blurry image
| Image super-resolution. | Clear image
|
Image colorization |
| Blue background, yellow leaves. |
|
Line art to image |
| A living room in a minimalist Nordic style. |
|
Placeholder Image |
| A cartoon character cautiously peeks out, spying on a brilliant blue gem inside the room. |
|
OutfitAnyone
Compared to the basic version, the OutfitAnyone-Plus model offers improvements in image definition, clothing texture details, and logo restoration. However, it takes longer to generate images and is suitable for scenarios that are not time-sensitive. API reference | Try it online
OutfitAnyone-Image Parsing supports parsing model and clothing images, which can be used for pre-processing and post-processing of OutfitAnyone images. API reference
This feature is only available in the China (Beijing) region.
Model | Description | Sample input | Sample output |
aitryon-plus | OutfitAnyone-Plus |
|
|
aitryon-parsing-v1 | OutfitAnyone-Image Parsing |
OutfitAnyone pricing
Model Service | Model | Unit Price | Discount | Tier |
OutfitAnyone-Plus | aitryon-plus | $0.071677/image | None | None |
OutfitAnyone-Image Parsing | aitryon-parsing-v1 | $0.000574/image | None | None |
Video generation - Wan
Text-to-video
The Wan text-to-video model generates videos from a single sentence. The videos feature rich artistic styles and cinematic quality. API reference | Try it online
International (Singapore)
Model | Description | Unit price | Free quota (Claim) Valid for 90 days after you activate Alibaba Cloud Model Studio |
wan2.5-t2v-preview | Wan 2.5 preview. Supports automatic voiceover and custom audio file input. | 480p: $0.05/second 720p: $0.10/second 1080p: $0.15/second | 50 seconds |
wan2.2-t2v-plus | Wan 2.2 Professional Edition. Significantly improved image detail and motion stability. | 480p: $0.02/second 1080p: $0.10/second | 50 seconds |
wan2.1-t2v-turbo | Wan 2.1 Turbo Edition. Fast generation speed and balanced performance. | $0.036/second | 200 seconds |
wan2.1-t2v-plus | Wan 2.1 Professional Edition. Generates rich details and higher-quality images. | $0.10/second | 200 seconds |
Mainland China (Beijing)
Model | Description | Unit price | Free quota |
wan2.5-t2v-preview | Wan 2.5 preview. Supports automatic voiceover and custom audio file input. | 480p: $0.043006/second 720p: $0.086012/second 1080p: $0.143353/second | No free quota |
wan2.2-t2v-plus | Wan 2.2 Professional Edition. Significantly improved image detail and motion stability. | 480p: $0.02007/second 1080p: $0.100347/second | No free quota |
wanx2.1-t2v-turbo | Faster generation speed and balanced performance. | $0.034405/second | No free quota |
wanx2.1-t2v-plus | Generates richer details and higher-quality images. | $0.100347/second | No free quota |
Input example | Output video (wan2.5) |
Input prompt: Shot from a low angle, in a medium close-up, with warm tones, mixed lighting (the practical light from the desk lamp blends with the overcast light from the window), side lighting, and a central composition. In a classic detective office, wooden bookshelves are filled with old case files and ashtrays. A green desk lamp illuminates a case file spread out in the center of the desk. A fox, wearing a dark brown trench coat and a light gray fedora, sits in a leather chair, its fur crimson, its tail resting lightly on the edge, its fingers slowly turning yellowed pages. Outside, a steady drizzle falls beneath a blue sky, streaking the glass with meandering streaks. It slowly raises its head, its ears twitching slightly, its amber eyes gazing directly at the camera, its mouth clearly moving as it speaks in a smooth, cynical voice: 'The case was cold, colder than a fish in winter. But every chicken has its secrets, and I, for one, intended to find them '. Input audio: |
Image-to-video - based on the first frame
The Wan image-to-video model uses an input image as the first frame of a video. It then generates the rest of the video based on a prompt. The videos feature rich artistic styles and cinematic quality.API reference | Try it online
International (Singapore)
Model | Description | Unit price | Free quota (Note) Validity: Within 90 days after you activate Alibaba Cloud Model Studio |
wan2.5-i2v-preview | Wan 2.5 preview. Supports automatic dubbing and custom audio file uploads. | 480P: $0.05/second 720P: $0.10/second 1080P: $0.15/second | 50 seconds |
wan2.2-i2v-flash | Wan 2.2 Flash Edition. Delivers extremely fast generation speed with significant improvements in visual detail and motion stability. | 480P: $0.015/second 720P: $0.036/second | 50 seconds |
wan2.2-i2v-plus | Wan 2.2 Professional Edition. Delivers significant improvements in visual detail and motion stability. | 480P: $0.02/second 1080P: $0.10/second | 50 seconds |
wan2.1-i2v-turbo | Wan 2.1 Turbo Edition. Fast generation speed with balanced performance. | $0.036/second | 200 seconds |
wan2.1-i2v-plus | Wan 2.1 Professional Edition. Generates rich details and produces higher-quality, more textured visuals. | $0.10/second | 200 seconds |
Mainland China (Beijing)
Model | Description | Unit price | Free quota |
wan2.5-i2v-preview | Wan 2.5 preview. Supports automatic dubbing and custom audio file uploads. | 480P: $0.043006/second 720P: $0.086012/second 1080P: $0.143353/second | No free quota |
wan2.2-i2v-plus | Wan 2.2 Professional Edition. Delivers significant improvements in visual detail and motion stability. | 480P: $0.02007/second 1080P: $0.100347/second | No free quota |
wanx2.1-i2v-turbo | Wan 2.1 Turbo Edition. Fast generation speed with balanced performance. | $0.034405/second | No free quota |
wanx2.1-i2v-plus | Wan 2.1 Professional Edition. Generates rich details and produces higher-quality, more textured visuals. | $0.100347/second | No free quota |
Input first frame image and audio | Output video (wan2.5) |
Input audio: | |
Input prompt: A scene of urban fantasy art. A dynamic graffiti art character. A boy painted with spray paint comes to life from a concrete wall. He sings an English rap song at a very fast pace while striking a classic, energetic rapper pose. The scene is set under an urban railway bridge at night. The lighting comes from a single streetlight, creating a cinematic atmosphere full of high energy and amazing detail. The audio of the video consists entirely of the boy's rap, with no other dialogue or noise. | |
Image-to-video - based on the first and last frames
The Wan first-and-last-frame video model generates a smooth, dynamic video from a prompt. You only need to provide the first and last frame images. The videos feature rich artistic styles and cinematic quality. API reference | Try it online
International (Singapore)
Model | Unit price | Free quota (Note) |
wan2.1-kf2v-plus | $0.10/second | 200 seconds Validity period: Within 90 days after you activate Model Studio |
Mainland China (Beijing)
Model | Unit price | Free quota (Note) |
wanx2.1-kf2v-plus | $0.100347/second | No free quota |
Example input | Output video | ||
First frame | Last frame | Prompt | |
|
| In a realistic style, the camera starts at eye level with a small black cat looking up at the sky with curiosity, then gradually moves upward, ending in a top-down shot focused on the cat's curious eyes. | |
General video editing
The Wan unified video editing model supports multimodal inputs, including text, images, and videos. It can perform video generation and general editing tasks. API reference | Try it online
International (Singapore)
Model | Unit price | Free quota (Note) |
wan2.1-vace-plus | $0.1/s | 50 seconds Validity: Valid for 90 days after Model Studio activation. |
Mainland China (Beijing)
Model | Unit price | Free quota (Note) |
wanx2.1-vace-plus | $0.100347/s | No free quota |
The unified video editing model supports the following features:
Feature | Input reference image | Input prompt | Output video |
Multi-image reference | Reference image 1 (reference entity)
Reference image 2 (reference background)
| In the video, a girl gracefully walks out from a misty, ancient forest. Her steps are light, and the camera captures her every nimble moment. When the girl stops and looks around at the lush woods, a smile of surprise and joy blossoms on her face. This scene, frozen in a moment of interplay between light and shadow, records the girl's wonderful encounter with nature. | Output video |
Video repainting | The video shows a black steampunk-style car driven by a gentleman. The car is decorated with gears and copper pipes. The background features a steam-powered candy factory and retro elements, creating a vintage and playful scene. | ||
Local editing | Input video Input mask image (The white area indicates the editing area)
| The video shows a Parisian-style French cafe where a lion in a suit is elegantly sipping coffee. It holds a coffee cup in one hand, taking a gentle sip with a relaxed expression. The cafe is tastefully decorated, with soft hues and warm lighting illuminating the area where the lion is. | The content in the editing area is modified based on the prompt. |
Video extension | Input first clip (1 second) | A dog wearing sunglasses is skateboarding on the street, 3D cartoon. | Output extended video (5 seconds) |
Video outpainting | An elegant lady is passionately playing the violin, with a full symphony orchestra behind her. |
Wan - digital human
This feature generates natural-looking videos of people speaking, singing, or performing, based on a single character image and an audio file. To use this feature, call the following models in sequence. wan2.2-s2v image detection | wan2.2-s2v video generation
This feature is only available in the China (Beijing) region.
Model | Description | Unit price |
wan2.2-s2v-detect | Checks whether an input image meets requirements, such as definition, a single person, and a frontal view. | $0.000574/image |
wan2.2-s2v | Generates a dynamic video of a person from a valid image and an audio clip. | 480P: $0.071677/second 720P: $0.129018/second |
Sample input | Output video |
Input audio: |
Wan - animate image
Available in standard and professional modes. The model transfers the actions and expressions from a reference video to a character image, generating a video that animates the character from the image. API reference.
International (Singapore)
Model | Service | Description | Unit price | Free quota (View) |
wan2.2-animate-move | Standard mode | Fast generation speed. Meets basic needs such as simple animation demos. Cost-effective. | $0.12/second | The two services share 50 seconds |
Professional mode | High animation smoothness. Natural transitions for actions and expressions. The result is similar to a live-action video. | $0.18/second |
Mainland China (Beijing)
Model | Service | Description | Unit price | Free quota (View) |
wan2.2-animate-move | Standard mode | Fast generation speed. Meets basic needs such as simple animation demos. Cost-effective. | $0.06/second | No free quota |
Professional mode | High animation smoothness. Natural transitions for actions and expressions. The result is similar to a live-action video. | $0.09/second |
Character image | Reference video | Output video (standard) | Output video (professional) |
|
Wan - video face swap
Available in standard and professional modes. The model replaces the main character in a video with a character from an image. It preserves the original video's scene, lighting, and hue. API reference.
International (Singapore)
Model | Service | Description | Unit price | Free quota (View) |
wan2.2-animate-mix | Standard mode | Generates animations quickly. Ideal for basic requirements, such as simple demos. Highly cost-effective. | $0.18/s | The two services share 50 seconds |
Professional mode | Produces highly smooth animations with natural transitions for actions and expressions. The result closely resembles a live-action video. | $0.26/s |
Mainland China (Beijing)
Model | Service | Description | Unit price | Free quota (View) |
wan2.2-animate-mix | Standard mode | Generates animations quickly. Ideal for basic requirements, such as simple demos. Highly cost-effective. | $0.09/s | No free quota |
Professional mode | Produces highly smooth animations with natural transitions for actions and expressions. The result closely resembles a live-action video. | $0.13/s |
Character image | Reference video | Output video (standard) | Output video (professional) |
|
AnimateAnyone
This feature generates character motion videos based on a character image and a motion template. To use this feature, call the following three models in sequence. AnimateAnyone image detection API details | AnimateAnyone motion template generation | AnimateAnyone video generation API details
This feature is only available in the China (Beijing) region.
Model | Description | Unit price |
animate-anyone-detect-gen2 | Detects whether an input image meets the required specifications. | $0.000574/image |
animate-anyone-template-gen2 | Extracts character motion from a video and generates a motion template. | $0.011469/second |
animate-anyone-gen2 | Generates a character motion video based on a character image and a motion template. |
Input: Character image | Input: Motion video | Output (image background) | Output (video background) |
|
The preceding example was generated by the Tongyi App, which integrates AnimateAnyone.
The content generated by the AnimateAnyone model is video only and does not include audio.
EMO
This feature generates dynamic portrait videos based on a portrait image and a human voice audio file. To use this feature, call the following models in sequence. EMO image detection | EMO video generation
This feature is only available in the China (Beijing) region.
Model | Description | Unit price |
emo-detect-v1 | Detects if an input image meets the required specifications. This model can be called directly without deployment. | $0.000574/image |
emo-v1 | Generates a dynamic portrait video. This model can be called directly without deployment. |
|
Input: Portrait image and human voice audio file | Output: Dynamic portrait video |
Portrait:
Human voice audio: See the video on the right. | Character video: Action style intensity: active ("style_level": "active") |
LivePortrait
This is a lightweight model that quickly generates dynamic portrait videos based on a portrait image and a human voice audio file. Compared to the EMO model, it generates videos faster and at a lower cost, but the quality is not as good. To use this feature, call the following two models in sequence. LivePortrait image detection | LivePortrait video generation
This feature is only available in the China (Beijing) region.
Model | Description | Unit price |
liveportrait-detect | Detects whether the input image meets the requirements. | $0.000574/image |
liveportrait | Generates a dynamic portrait video. | $0.002868/second |
Input: Portrait image and voice audio file | Output: Animated portrait video |
Portrait image:
Voice audio: From the video on the right. | Portrait video: |
Emoji
This feature generates dynamic face videos based on a face image and preset facial motion templates. This capability can be used for scenarios such as creating emojis and generating video materials. To use this feature, call the following models in sequence. Emoji image detection | Emoji video generation
This feature is only available in the China (Beijing) region.
Model | Description | Unit price |
emoji-detect-v1 | Detects whether an input image meets the specified requirements. | $0.000574/image |
emoji-v1 | Generates a character emoji based on a portrait image and a specified emoji template. | $0.011469/second |
Input: Portrait image | Output: Dynamic portrait video |
| Template parameter for the "happy" emoji: ("input.driven_id": "mengwa_kaixin") |
VideoRetalk
This feature generates a video where the character's lip movements match the input audio, based on a character video and a human voice audio file. To use this feature, call the following model. API reference
This feature is only available in the China (Beijing) region.
Model | Description | Unit price |
videoretalk | Generates a new video where the character's lip movements are synchronized with the input audio. | $0.011469/second |
Video style transfer
This model supports generating videos in different styles that match the semantic description of user-input text, or restyling a user-input video. API reference
This feature is only available in the China (Beijing) region.
Model | Description | Unit price | |
video-style-transform | Transforms an input video into styles such as Japanese manga or American comics. | 720p | $0.071677/second |
540p | $0.028671/second | ||
Input video | Output video (Japanese manga style) |
Speech synthesis (text-to-speech)
Qwen-TTS
Qwen-TTS is a speech synthesis model from the Qwen series. It supports Chinese, English, and mixed Chinese-English text input, and streams audio output. Usage | API reference
International (Singapore)
Model | Version | Unit price | Max input characters | Supported languages | Free quota (Note) |
qwen3-tts-flash Its capabilities are the same as qwen3-tts-flash-2025-09-18 | Stable | $0.1/10,000 characters | 600 | Chinese (Mandarin, Beijing, Shanghai, Sichuan, Nanjing, Shaanxi, Minnan, Tianjin), Cantonese, English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese | 2,000 characters for each Validity: Within 90 days after you activate Model Studio |
qwen3-tts-flash-2025-09-18 | Snapshot |
Qwen3-TTS is billed based on the number of input characters. The billing rules are as follows:
1 Chinese character = 2 characters
1 English letter, 1 punctuation mark, or 1 space = 1 character
Mainland China (Beijing)
Qwen3-TTS
Model | Version | Unit price | Max input characters | Supported languages |
qwen3-tts-flash Its capabilities are the same as qwen3-tts-flash-2025-09-18 | Stable | $0.114682/10,000 characters | 600 | Chinese (Mandarin, Beijing, Shanghai, Sichuan, Nanjing, Shaanxi, Minnan, Tianjin), Cantonese, English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese |
qwen3-tts-flash-2025-09-18 | Snapshot |
Qwen3-TTS is billed based on the number of input characters. The billing rules are as follows:
1 Chinese character = 2 characters
1 English letter, 1 punctuation mark, or 1 space = 1 character
Qwen-TTS
Model | Version | Context window | Max input | Max output | Input cost | Output cost |
(Tokens) | (Thousand tokens) | |||||
qwen-tts Its capabilities are the same as qwen-tts-2025-04-10 | Stable | 8,192 | 512 | 7,680 | $0.230 | $1.434 |
qwen-tts-latest Its capabilities are always the same as the latest snapshot version | Latest | |||||
qwen-tts-2025-05-22 | Snapshot | |||||
qwen-tts-2025-04-10 | ||||||
Audio is converted to tokens at a rate of 50 tokens per second. Audio shorter than 1 second is counted as 50 tokens.
Qwen-TTS-Realtime
Based on Qwen-TTS, this model supports streaming text input and can adapt its speech rate based on text content and punctuation. It supports Chinese, English, and mixed Chinese-English text input, and streams audio output. Usage
International (Singapore)
Model | Version | Unit price | Supported languages | Free quota (Note) |
qwen3-tts-flash-realtime Current capabilities are equivalent to qwen3-tts-flash-realtime-2025-09-18 | Stable | $0.13 per 10,000 characters | Chinese (Mandarin, Beijing, Shanghai, Sichuan, Nanjing, Shaanxi, Minnan, Tianjin), Cantonese, English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese | 2,000 characters for each Validity: Within 90 days after you activate Model Studio |
qwen3-tts-flash-realtime-2025-09-18 | Snapshot |
Qwen3-TTS is billed based on the number of input characters. The billing rules are as follows:
1 Chinese character = 2 characters
1 English letter, 1 punctuation mark, or 1 space = 1 character
Mainland China (Beijing)
Qwen3-TTS Realtime
Model | Version | Unit price | Supported languages |
qwen3-tts-flash-realtime Current capabilities are equivalent to qwen3-tts-flash-realtime-2025-09-18 | Stable | $0.143353 per 10,000 characters | Chinese (Mandarin, Beijing, Shanghai, Sichuan, Nanjing, Shaanxi, Minnan, Tianjin), Cantonese, English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese |
qwen3-tts-flash-realtime-2025-09-18 | Snapshot |
Qwen3-TTS is billed based on the number of input characters. The billing rules are as follows:
1 Chinese character = 2 characters
1 English letter, 1 punctuation mark, or 1 space = 1 character
Qwen-TTS Realtime
Model | Version | Context window | Max input | Max output | Input cost | Output cost | Supported languages |
(Tokens) | (Thousand tokens) | ||||||
qwen-tts-realtime Current capabilities are equivalent to qwen-tts-realtime-2025-07-15 | Stable | 8,192 | 512 | 7,680 | $0.345 | $1.721 | Chinese, English |
qwen-tts-realtime-latest Current capabilities are equivalent to qwen-tts-realtime-2025-07-15 | Latest | Chinese, English | |||||
qwen-tts-realtime-2025-07-15 | Snapshot | Chinese, English | |||||
Audio-to-token conversion rule: 1 second of audio corresponds to 50 tokens. Audio shorter than 1 second is counted as 50 tokens.
CosyVoice
CosyVoice is a new-generation, large-scale generative speech synthesis model from Qwen Lab. It integrates text understanding and speech generation based on large-scale, pre-trained language models and supports real-time, streaming text-to-speech synthesis. Usage | Try it online | Voice list
This feature is only available in the China (Beijing) region.
Model | Price |
cosyvoice-v2 | $0.286706 per 10,000 characters |
Each Chinese character counts as two characters. Each English letter, punctuation mark, and space counts as one character.
Speech recognition and translation (speech-to-text)
Qwen3-LiveTranslate-Flash-Realtime
qwen3-livetranslate-flash-realtime is a multilingual, real-time audio and video translation model. It recognizes 18 languages and translates them into audio in 10 languages in real time.
Core features:
Multilingual support: Supports 18 languages, such as Chinese, English, French, German, Russian, Japanese, and Korean, and 6 Chinese dialects, such as Mandarin, Cantonese, and Sichuanese.
Visual enhancement: Improves translation accuracy using visual content. The model analyzes lip movements, actions, and on-screen text to improve translation in noisy environments or for words with multiple meanings.
Low latency: Achieves a simultaneous interpretation latency as low as 3 seconds.
Lossless simultaneous interpretation: Uses semantic unit prediction technology to resolve cross-language word order issues. This ensures that the quality of real-time translation is nearly identical to that of offline translation.
Natural voice: Generates human-like speech with a natural voice. The model adapts its tone and emotion based on the source audio content.
This model is available only in the Singapore region.
Model | Version | Context window | Max input | Max output | Free quota |
(Tokens) | |||||
qwen3-livetranslate-flash-realtime Its capabilities are equivalent to qwen3-livetranslate-flash-realtime-2025-09-22. | Stable | 53,248 | 49,152 | 4,096 | 1 million tokens This is valid for 90 days after you activate Model Studio. |
qwen3-livetranslate-flash-realtime-2025-09-22 | Snapshot | ||||
After the free quota is exhausted, inputs and outputs are billed as follows:
|
|
Qwen-ASR
Built on the Qwen multi-modal base model, this model supports features such as multilingual recognition, singing recognition, and noise rejection.Usage
International (Singapore)
Model | Version | Supported languages | Supported sample rates | Unit price | Free quota (Note) |
qwen3-asr-flash Currently an alias for qwen3-asr-flash-2025-09-08 | Stable version | Chinese (including Mandarin, Sichuanese, Minnan, Wu, and Cantonese), English, Japanese, German, Korean, Russian, French, Portuguese, Arabic, Italian, and Spanish | 16 kHz | $0.000035/second | 36,000 seconds (10 hours) Valid for 90 days after you activate Alibaba Cloud Model Studio |
qwen3-asr-flash-2025-09-08 | Snapshot version |
Mainland China (Beijing)
Model | Version | Supported languages | Supported sample rates | Unit price |
qwen3-asr-flash Alias for qwen3-asr-flash-2025-09-08 | Stable version | Chinese (Mandarin, Sichuanese, Minnan, Wu, and Cantonese), English, Japanese, German, Korean, Russian, French, Portuguese, Arabic, Italian, and Spanish | 16 kHz | $0.000032/second |
qwen3-asr-flash-2025-09-08 | Snapshot version |
Qwen-ASR-Realtime
The Qwen real-time speech recognition model features automatic language detection. It can detect 11 languages and accurately transcribes audio in complex environments. Usage | API reference
International (Singapore)
Model | Version | Supported languages | Supported sample rates | Unit price | Free quota (Note) |
qwen3-asr-flash-realtime Currently equivalent to qwen3-asr-flash-realtime-2025-10-27 | Stable | Chinese (Mandarin, Sichuanese, Minnan, Wu), Cantonese, English, Japanese, German, Korean, Russian, French, Portuguese, Arabic, Italian, Spanish | 8 kHz, 16 kHz | $0.000090/second | 36,000 seconds (10 hours) Validity: Within 90 days after you activate Model Studio |
qwen3-asr-flash-realtime-2025-10-27 | Snapshot |
Mainland China (Beijing)
Model | Version | Supported languages | Supported sample rates | Unit price |
qwen3-asr-flash-realtime Currently equivalent to qwen3-asr-flash-realtime-2025-10-27 | Stable | Chinese (Mandarin, Sichuanese, Minnan, Wu), Cantonese, English, Japanese, German, Korean, Russian, French, Portuguese, Arabic, Italian, Spanish | 8 kHz, 16 kHz | $0.000047/second |
qwen3-asr-flash-realtime-2025-10-27 | Snapshot |
Paraformer
Paraformer is a speech recognition model from Tongyi Lab. It offers two versions: audio file recognition and real-time speech recognition.
Audio file recognition
This feature is available only in the Mainland China (Beijing) region.
Model | Supported languages | Supported sample rates | Scenarios | Supported audio formats | Unit price |
paraformer-v2 | Chinese (Mandarin), Chinese dialects (Cantonese, Wu, Minnan, Northeastern, Gansu, Guizhou, Henan, Hubei, Hunan, Ningxia, Shanxi, Shaanxi, Shandong, Sichuan, Tianjin, Jiangxi, Yunnan, Shanghainese), English, Japanese, Korean, German, French, Russian | Any | ApsaraVideo Live | aac, amr, avi, flac, flv, m4a, mkv, mov, mp3, mp4, mpeg, ogg, opus, wav, webm, wma, wmv | $0.000012/second |
paraformer-8k-v2 | Chinese (Mandarin) | 8 kHz | Phone calls |
Real-time speech recognition
This feature is available only in the Mainland China (Beijing) region.
Model | Supported languages | Supported sample rates | Scenarios | Supported audio formats | Unit price |
paraformer-realtime-v2 | Chinese (Mandarin), Chinese dialects (Cantonese, Wu, Minnan, Northeastern, Gansu, Guizhou, Henan, Hubei, Hunan, Ningxia, Shanxi, Shaanxi, Shandong, Sichuan, Tianjin, Jiangxi, Yunnan, Shanghainese), English, Japanese, Korean, German, French, Russian Supports switching between multiple languages. | Any | ApsaraVideo Live, conferences, and more | pcm, wav, mp3, opus, speex, aac, amr | $0.000035/second |
paraformer-realtime-8k-v2 | 8 kHz | Call centers and more |
Fun-ASR
Fun-ASR is a speech recognition model from Tongyi Bailin. It offers two versions: audio file recognition and real-time speech recognition.
Audio file recognition
International (Singapore)
Model | Version | Supported languages | Supported sample rates | Scenarios | Supported audio formats | Unit price | Free quota (Note) |
fun-asr Currently equivalent to fun-asr-2025-08-25 | Stable | Chinese, English | Any | ApsaraVideo Live, phone calls, conference interpretation, and more | aac, amr, avi, flac, flv, m4a, mkv, mov, mp3, mp4, mpeg, ogg, opus, wav, webm, wma, wmv | $0.000035/second | 36,000 seconds (10 hours) Validity: 90 days |
fun-asr-2025-08-25 | Snapshot |
Mainland China (Beijing)
Model | Version | Supported languages | Supported sample rates | Scenarios | Supported audio formats | Unit price |
fun-asr Currently equivalent to fun-asr-2025-08-25 | Stable | Chinese, English | Any | ApsaraVideo Live, phone calls, conference interpretation, and more | aac, amr, avi, flac, flv, m4a, mkv, mov, mp3, mp4, mpeg, ogg, opus, wav, webm, wma, wmv | $0.000032/second |
fun-asr-2025-08-25 | Snapshot | |||||
fun-asr-mtl Currently equivalent to fun-asr-mtl-2025-08-25 | Stable | Chinese, Cantonese, English, Japanese, Thai, Vietnamese, Indonesian | ||||
fun-asr-mtl-2025-08-25 | Snapshot |
Real-time speech recognition
This feature is available only in the Mainland China (Beijing) region.
Model | Version | Supported languages | Supported sample rates | Scenarios | Supported audio formats | Unit price |
fun-asr-realtime Currently equivalent to fun-asr-realtime-2025-09-15 | Stable | Chinese, English | 16 kHz | ApsaraVideo Live, conferences, call centers, and more | pcm, wav, mp3, opus, speex, aac, amr | $0.000047/second |
fun-asr-realtime-2025-09-15 | Snapshot |
Text embedding
Text embedding models convert text into numerical representations for tasks such as search, clustering, recommendation, and classification. Billing for these models is based on the number of input tokens. API reference
International (Singapore)
Model | Embedding dimension | Batch size | Maximum tokens per row | Supported languages | Price (Million input tokens) | Free Quota |
text-embedding-v4 This post is part of the Qwen3-Embedding series. | 2,048, 1,536, 1,024 (default), 768, 512, 256, 128, or 64 | 10 | 8,192 | More than 100 languages, including Chinese, English, Spanish, French, Portuguese, Indonesian, Japanese, Korean, German, and Russian, along with various programming languages | $0.07 | 1,000,000 tokens Valid for 90 days after you activate Model Studio. |
text-embedding-v3 | 1,024 (default), 768, or 512 | 10 | 8,192 | Over 50 languages, such as Chinese, English, Spanish, French, Portuguese, Indonesian, Japanese, Korean, German, and Russian | 500,000 tokens Valid for 90 days after you activate Model Studio. |
Mainland China (Beijing)
Model | Embedding dimension | Batch size | Maximum tokens per row | Supported languages | Price (Million input tokens) | Free quota |
text-embedding-v4 This post is part of the Qwen3-Embedding series. | 2,048, 1,536, 1,024 (default), 768, 512, 256, 128, or 64 | 10 | 8,192 | Chinese, English, Spanish, French, Portuguese, Indonesian, Japanese, Korean, German, Russian, over 100 other major languages, and various programming languages | $0.072 | No free quota |
Multimodal embedding
The multimodal embedding model converts data such as text, images, and videos into a vector of floating-point numbers. This model enables applications such as video classification, image classification, and image-text retrieval. API reference
International (Singapore)
Model | Data format | Embedding dimension | Unit price (Million input tokens) | Free quota (View) |
tongyi-embedding-vision-plus | float(32) | 1,152 | $0.09 | 1,000,000 tokens Valid for 90 days after you activate Model Studio. |
tongyi-embedding-vision-flash | float(32) | 768 | Image/Video: $0.03 Text: $0.09 |
Mainland China (Beijing)
Model | Data type | Embedding dimensions | Unit price (1,000 input tokens) | Free quota (Note) |
multimodal-embedding-v1 | float(32) | 1,024 | Free trial | No token quota limit |
Text rerank
This feature is typically used for semantic retrieval. Given a query, it sorts a list of candidate documents in descending order of their semantic relevance. API reference.
This feature is only available in the China (Beijing) region.
Model | Maximum number of documents | Max input tokens per item | Max input tokens | Supported languages | Price (Million input tokens) |
gte-rerank-v2 | 500 | 4,000 | 30,000 | Over 50 languages, including Chinese, English, Japanese, Korean, Thai, Spanish, French, Portuguese, German, Indonesian, and Arabic | $0.115 |
Max input tokens per item: Each query or document is limited to 4,000 tokens. Input that exceeds this limit is truncated.
Maximum number of documents: Each request is limited to 500 documents.
Max input tokens: The total number of tokens for all queries and documents in a single request is limited to 30,000.
Domain-specific
Intent recognition
The Qwen intent recognition model can quickly and accurately parse user intents in milliseconds and select the appropriate tools to resolve user issues. API reference | Usage
This feature is only available in the China (Beijing) region.
Model | Context window | Max input | Max output | Input cost | Output cost |
(Tokens) | (Million tokens) | ||||
tongyi-intent-detect-v3 | 8,192 | 8,192 | 1,024 | $0.058 | $0.144 |
Role-playing
Qwen's role-playing model is ideal for scenarios that require human-like conversation, such as virtual social interactions, game NPCs, IP character replication, hardware, toys, and in-vehicle systems. This model offers enhanced capabilities in character fidelity, conversation progression, and empathetic listening compared to other Qwen models. Usage
International (Singapore)
Model | Context window | Max input | Max output | Input cost | Output cost |
(Tokens) | (Million tokens) | ||||
qwen-plus-character-ja | 8,192 | 7,680 | 512 | $0.5 | $1.4 |
Mainland China (Beijing)
Model | Context window | Max input | Max output | Input cost | Output cost |
(Tokens) | (Million tokens) | ||||
qwen-plus-character | 32,768 | 32,000 | 4,096 | $0.115 | $0.287 |


























































