Alibaba Cloud Model Studio offers a wide variety of models. This topic describes all supported models in Model Studio.
Flagship models
Flagship models |
Best inference performance |
Balanced performance, speed and cost |
Fast speed and low cost |
Maximum context window (Tokens) | 32,768 | 131,072 | 1,008,192 |
Minimum input price (Million tokens) | $1.6 | $0.4 | $0.05 |
Minimum output price (Million tokens) | $6.4 | $1.2 | $0.2 |
Model overview
Category | Model | Description |
Text generation | ||
Image generation | Generate beautiful images using a single sentence. | |
Video generation | Generates video based on a single sentence, showcasing a wide range of artistic styles and cinematic-quality visuals | |
| ||
Embedding | Converts text into numerical representations, suitable for search, clustering, recommendation, and classification tasks. |
Text generation-Qwen
The commercial models of the Qwen series, boasts the latest capabilities and enhancements over its open source counterpart.
QwQ
QwQ reasoning model, trained based on Qwen2.5, has made significant improvements in reasoning capabilities by reinforcement learning. Its performance against core mathematic and coding metrics (AIME 24/25, LiveCodeBench) and general metrics (IFEval, LiveBench, etc.) have reached the level of DeepSeek-R1. Usage instructions
Name | Version | Context window | Maximum input | Maximum CoT | Maximum response | Input price | Output price | Free quota |
(Tokens) | (Million tokens) | |||||||
qwq-plus | Stable | 131,072 | 98,304 | 32,768 | 8,192 | $0.8 | $2.4 | 1 million tokens Valid for 180 days after activation |
Qwen-Max
Qwen-Max provides the best inference performance among Qwen models, especially for complex and multi-step tasks. Usage instructions | API reference | Try online
Name | Version | Context window | Maximum input | Maximum output | Input price | Output price | Free quota |
(Tokens) | (Million tokens) | ||||||
qwen-max Same performance as qwen-max-2025-01-25 | Stable | 32,768 | 30,720 | 8,192 | $1.6 Batch: Half price | $6.4 Batch: Half price | 1 million tokens each Valid for 180 days after activation |
qwen-max-latest Always same performance as the latest snapshot | Latest | $1.6 | $6.4 | ||||
qwen-max-2025-01-25 Also qwen-max-0125 or Qwen2.5-Max | Snapshot |
Qwen-Plus
Qwen-Plus provides a balanced combination of performance, speed, and cost, ideal for moderately complex tasks. Usage instructions | API reference | Try online | Deep thinking
Name | Version | Context window | Maximum input | Maximum output | Input price | Output price | Free quota |
(Tokens) | (Million tokens) | ||||||
qwen-plus Same performance as qwen-plus-2025-01-25 | Stable | 131,072 | 129,024 | 8,192 | $0.4 Batch: Half price | $1.2 Batch: Half price | 1 million tokens each Valid for 180 days after activation |
qwen-plus-latest Always same performance as the latest snapshot | Latest | 16,384 CoT: 38,912 | $0.4 | Thinking: $8 Non-Thinking: $1.2 | |||
qwen-plus-2025-04-28 Also qwen-plus-0428 Qwen3 series | Snapshot | ||||||
qwen-plus-2025-01-25 Also qwen-plus-0125 | Snapshot | 8,192 | $1.2 |
The latest qwen-plus-2025-04-28 model is capable of responding in both thinking and non-thinking modes, allowing you to switch between the two using the enable_thinking
parameter. In addition to this, the model's capabilities have been significantly enhanced:
Reasoning capability: The model has significantly outperformed QwQ and non-reasoning models of the same size in evaluations of mathematics, coding, and logical reasoning, reaching SOTA performance at its size.
Human preference following: Its abilities in creative writing, role-playing, multi-turn conversation, and instruction following have greatly improved, surpassing general capabilities of models of similar size.
Agent capability: The model achieves industry-leading levels in both thinking and non-thinking modes, enabling precise external tool invocation.
Multilingual capability: The model supports over 100 languages and dialects, with marked improvements in multilingual translation, instruction comprehension, and common sense reasoning abilities.
Response format fixes: Previous issues with response formats in earlier versions, such as anomalous Markdown, mid-text truncation, and incorrect boxed outputs, have been fixed.
Qwen-Turbo
Qwen-Turbo provides fast speed and low cost, suitable for simple tasks. Usage instructions | API reference | Try online | Deep thinking
Name | Version | Context window | Maximum input | Maximum output | Input price | Output price | Free quota |
(Tokens) | (Million tokens) | ||||||
qwen-turbo Same performance as qwen-turbo-2024-11-01 | Stable | 1,008,192 | 1,000,000 | 8,192 | $0.05 Batch: Half price | $0.2 Batch: Half price | 1 million tokens each Valid for 180 days after activation |
qwen-turbo-latest Always same performance as the latest snapshot | Latest | Non-Thinking 1,000,000 Thinking 131,072 | Thinking 129,024 Non-Thinking 1,000,000 | 16,384 CoT: 38,912 | $0.05 | Thinking: $1 Non-Thinking: $0.2 | |
qwen-turbo-2025-04-28 Also qwen-turbo-0428 Qwen3 series | Snapshot | ||||||
qwen-turbo-2024-11-01 Also qwen-turbo-1101 | 1,008,192 | 1,000,000 | 8,192 | $0.2 |
The latest qwen-turbo-2025-04-28 model is capable of responding in both thinking and non-thinking modes, allowing you to switch between the two using the enable_thinking
parameter. In addition to this, the model's capabilities have been significantly enhanced:
Reasoning capability: The model has significantly outperformed QwQ and non-reasoning models of the same size in evaluations of mathematics, coding, and logical reasoning, reaching SOTA performance at its size.
Human preference following: Its abilities in creative writing, role-playing, multi-turn conversation, and instruction following have greatly improved, surpassing general capabilities of models of similar size.
Agent capability: The model achieves industry-leading levels in both thinking and non-thinking modes, enabling precise external tool invocation.
Multilingual capability: The model supports over 100 languages and dialects, with marked improvements in multilingual translation, instruction comprehension, and common sense reasoning abilities.
Response format fixes: Previous issues with response formats in earlier versions, such as anomalous Markdown, mid-text truncation, and incorrect boxed outputs, have been fixed.
QVQ
QVQ is a visual reasoning model that supports visual input and chain-of-thought output. It shows stronger capabilities in mathematics, coding, visual analysis, creativity, and general tasks. Usage instructions
Name | Version | Context window | Maximum input | Maximum CoT | Maximum response | Input price | Output price | Free quota |
(Tokens) | (Million tokens) | |||||||
qvq-max Same performance as qvq-max-2025-03-25 | Stable | 131,072 | 106,496 Up to 16,384 per image | 16,384 | 8,192 | Time-limited free trial After the free quota runs out, you cannot access this model. Please stay tuned for updates. | 1 million tokens each Valid for 180 days after activation | |
qvq-max-latest Always same performance as the latest snapshot | Latest | |||||||
qvq-max-2025-03-25 Also qvq-max-0325 | Snapshot |
Qwen-VL
Qwen-VL is a text generation model that can understand and process images. The model performs OCR operations and provides further functionalities, such as summarizing and reasoning. For example, it can extract product attributes from photos, and solving problems from images. Usage instructions | API reference | Try online
Qwen-VL is billed based on the total number of input and output tokens.
Image token calculation rule: Every 28 × 28 pixels count as 1 token. Each image converts to at least 4 tokens. For more information, see Visual understanding.
Name | Version | Context window | Maximum input | Maximum output | Input price | Output price | Free quota |
(Tokens) | (Million tokens) | ||||||
qwen-vl-max Enhanced capabilities of visual reasoning and instruction following compared with qwen-vl-plus. Best for complex tasks. Same performance as qwen-vl-max-2025-04-08 | Stable | 131,072 | 129,024 Up to 16,384 per image | 8,192 | $0.8 | $3.2 | 1 million tokens each Valid for 180 days after activation |
qwen-vl-max-latest Always same performance as the latest snapshot | Latest | ||||||
qwen-vl-max-2025-04-08 Also qwen-vl-max-0408 Qwen2.5-VL series, with 128,000 context window and enhanced mathematics and reasoning capabilities. | Snapshot | ||||||
qwen-vl-plus Enhanced detail and text recognition capabilities, supporting images with over one million pixel resolution and any aspect ratio. Exceptional performance for various visual tasks. Same performance as qwen-vl-plus-2025-01-25 | Stable | 131,072 | 129,024 Up to 16,384 per image | 8,192 | $0.21 | $0.63 | |
qwen-vl-plus-latest Always same performance as the latest snapshot | Latest | ||||||
qwen-vl-plus-2025-05-07 Also qwen-vl-plus-0507 Significantly enhanced mathematics, inference, and understanding of monitoring video content. | Snapshot | ||||||
qwen-vl-plus-2025-01-25 Also qwen-vl-plus-0125 Qwen2.5-VL series, with 128,000 context window and enhanced mathematics and reasoning capabilities. |
Text generation - Qwen - open source
In the model name, 'xxb' indicates the parameter scale. For example, 'qwen2-72b-instruct' has 72 billion parameters.
Model Studio facilitates the use of open source Qwen models without the need for local deployment. Qwen3 and Qwen2.5 are most recommended among the open source models.
Qwen3
Qwen3 is capable of responding in both thinking and non-thinking modes, allowing you to switch between the two using the enable_thinking
parameter. In addition to this, the model's capabilities have been significantly enhanced:
Reasoning capability: The model has significantly outperformed QwQ and non-reasoning models of the same size in evaluations of mathematics, coding, and logical reasoning, reaching SOTA performance at its size.
Human preference following: Its abilities in creative writing, role-playing, multi-turn conversation, and instruction following have greatly improved, surpassing general capabilities of models of similar size.
Agent capability: The model achieves industry-leading levels in both thinking and non-thinking modes, enabling precise external tool invocation.
Multilingual capability: The model supports over 100 languages and dialects, with marked improvements in multilingual translation, instruction comprehension, and common sense reasoning abilities.
Response format fixes: Previous issues with response formats in earlier versions, such as anomalous Markdown, mid-text truncation, and incorrect boxed outputs, have been fixed.
Open source Qwen3 does not support non-stream output in either thinking or non-thinking mode.
Open source Qwen3 is charged at the non-thinking price if it does not output the thinking process under the thinking mode.
Thinking mode | Non-thinking mode | Usage instructions
Name | Mode | Context window | Maximum input | Maximum CoT | Maximum response | Input price | Output price | Free quota |
(Tokens) | (Million tokens) | |||||||
qwen3-235b-a22b | Non-Thinking | 131,072 | 129,024 | - | 16,384 | $0.7 | $2.8 | 1 million tokens each Valid for 180 days after activation |
Thinking | 38,912 | $8.4 | ||||||
qwen3-32b | Non-Thinking | - | $0.7 | $2.8 | ||||
Thinking | 38,912 | $8.4 | ||||||
qwen3-30b-a3b | Non-Thinking | - | $0.2 | $0.8 | ||||
Thinking | 38,912 | $2.4 | ||||||
qwen3-14b | Non-Thinking | - | 8,192 | $0.35 | $1.4 | |||
Thinking | 38,912 | $4.2 | ||||||
qwen3-8b | Non-Thinking | - | $0.18 | $0.7 | ||||
Thinking | 38,912 | $2.1 | ||||||
qwen3-4b | Non-Thinking | - | $0.11 | $0.42 | ||||
Thinking | 38,912 | $1.26 | ||||||
qwen3-1.7b | Non-Thinking | 32,768 | 30,720 | - | $0.42 | |||
Thinking | 28,672 | 30,720 (CoT+Response) | $1.26 | |||||
qwen3-0.6b | Non-Thinking | 30,720 | - | $0.42 | ||||
Thinking | 28,672 | 30,720 (CoT+Response) | $1.26 |
Qwen2.5
Qwen2.5 is the latest series of the Qwen LLM. For Qwen2.5, we have launched a series of base and instruct models with parameter sizes ranging from 7 billion to 72 billion. Qwen2.5 has made the following improvements over Qwen2:
Qwen2.5 is pre-trained on our latest large-scale dataset containing 18 trillion tokens.
Thanks to our expert models in specific fields, Qwen2.5 has significantly increased knowledge and greatly improved coding and maths capabilities.
Qwen2.5 has shown significant improvements in following instructions, generating long texts (over 8K tokens), understanding structured data (such as tables), and generating structured outputs (especially JSON). It supports more diversified system prompts, enhancing its role-playing and conditional setting as a chatbot.
Qwen2.5 supports over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more.
Usage instructions | API reference | Try online
Name | Context window | Maximum input | Maximum output | Input price | Output price | Free quota |
(Tokens) | (Million tokens) | |||||
qwen2.5-14b-instruct-1m | 1,008,192 | 1,000,000 | 8,192 | $0.805 | $3.22 | 1 million tokens each Valid for 180 days after activation |
qwen2.5-7b-instruct-1m | $0.368 | $1.47 | ||||
qwen2.5-72b-instruct | 131,072 | 129,024 | $1.4 | $5.6 | ||
qwen2.5-32b-instruct | $0.7 | $2.8 | ||||
qwen2.5-14b-instruct | $0.35 | $1.4 | ||||
qwen2.5-7b-instruct | $0.175 | $0.7 |
Qwen2
The open-source Qwen2 models. Usage instructions | API reference | Try online
Name | Context window | Maximum input | Maximum output | Input price | Output price |
(Tokens) | (Million tokens) | ||||
qwen2-72b-instruct Deprecated | 131,072 | 128,000 | 6,144 | Time-limited free trial | |
qwen2-57b-a14b-instruct Deprecated | 65,536 | 63,488 | |||
qwen2-7b-instruct Deprecated | 131,072 | 128,000 |
Qwen1.5
The open-source Qwen1.5 models. Usage instructions | API reference | Try online
Name | Context window | Maximum input | Maximum output | Input price | Output price |
(Tokens) | (Million tokens) | ||||
qwen1.5-110b-chat Deprecated | 8,000 | 6,000 | 2,000 | Time-limited free trial | |
qwen1.5-72b-chat Deprecated | |||||
qwen1.5-32b-chat Deprecated | |||||
qwen1.5-14b-chat Deprecated | |||||
qwen1.5-7b-chat Deprecated |
Qwen-Omni
Qwen-Omni is a omni-modal understanding and generation model trained on Qwen2.5. It can understand text, image, audio, and video swiftly. It can also generate text and voice simultaneously in stream. Usage instructions | API reference
Name | Context window | Maximum input | Maximum output | Free quota |
(Tokens) | ||||
qwen2.5-omni-7b | 32,768 | 30,720 | 2,048 | 1 million tokens (regardless of modality) Valid for 180 days after activation |
After the free quota runs out, you cannot access qwen2.5-omni-7b. Please stay tuned for updates.
Qwen-VL - open source
The open-source version of Qwen-VL. Usage instructions | API reference
Qwen2.5-VL has made the following improvements over Qwen2-VL:
Richer perception of the world: Qwen2.5-VL is good at recognizing common objects such as flowers, birds, fish, and insects, as well as analyzing text, charts, icons, graphics, and layouts within images.
Long video understanding: Qwen2.5-VL can understand videos of up to 10 minutes. It can also pinpoint video segments to capture events.
Visual locating: Qwen2.5-VL can accurately locate objects in images by generating bounding boxes (coordinates for the top-left and bottom-right corners) or points (coordinates for the center of the bounding box). It can provide stable JSON outputs for these coordinates.
Structured output: Qwen2.5-VL supports structured output for data such as invoices, forms, and tables, suitable in finance, business, among other scenarios.
Name | Context window | Maximum input | Maximum output | Input price | Output price | Free quota |
(Tokens) | (Million tokens) | |||||
qwen2.5-vl-72b-instruct | 131,072 | 129,024 Maximum 16384 per image | 8,192 | $2.8 | $8.4 | 100 million tokens each Valid for 180 days after activation |
qwen2.5-vl-32b-instruct | $1.4 | $4.2 | ||||
qwen2.5-vl-7b-instruct | $0.35 | $1.05 | ||||
qwen2.5-vl-3b-instruct | $0.21 | $0.63 |
Image generation - Wan
Text-to-image
Wan Text-to-image can generate beautiful images based on text prompts. API reference | Try online
Name | Description | Unit price | Free quota |
wan2.1-t2i-turbo | Fast generation speed, comprehensive effects, and high cost-effectiveness. | $0.025/image | 200 images each Valid for 180 days after activation |
wan2.1-t2i-plus | Richer details but slower speed. | $0.05/image |
Input prompt | Output image |
A needle-felted Santa Claus holding a gift, with a white cat standing next to him, and many colorful gifts in the background. The entire scenario should be cute, warm, and cozy, with some green plants in the background. |
Video generation - Wan
Text-to-video
Wan Text-to-video can generate a video based on a single sentence, showcasing a wide range of artistic styles and cinematic-quality visuals. API reference | Try online
Name | Description | Unit price | Free quota |
wan2.1-t2v-turbo | Faster generation with balanced performance. | $0.036/second | 200 seconds for each Valid for 180 days after activation |
wan2.1-t2v-plus | Richer details and more textured visuals. | $0.10/second |
Sample input | Output video |
Prompt: A kitten running in the moonlight |
Image-to-video: first frame
Wan Image-to-video takes an input image as the first frame, then generates the subsequent video content based on a prompt. The resulting video features a wide range of artistic styles and cinematic-quality visuals. API reference | Try online
Name | Description | Unit price | Free quota |
wan2.1-i2v-turbo | Faster generation, taking only one-third of the time of the Plus model, offering better cost-effectiveness. | $0.036/second | 200 seconds for each Valid for 180 days after activation |
wan2.1-i2v-plus | Rich details and enhanced texture. | $0.10/second |
Sample input | Output video |
Prompt: A cat running on the grass. Input image: | Output video: Takes the input image as the first frame, then generates the subsequent video content based on a prompt. Model: wanx2.1-i2v-turbo |
Image-to-video: first and last frame
Wan Image-to-video can generate a smooth and fluid dynamic video based on the first and last frame images along with a prompt. The video showcases a wide range of artistic styles and cinematic-quality visuals. API reference | Try online
Name | Unit price | Free quota |
wan2.1-kf2v-plus | $0.10/second | 200 seconds Valid for 180 days after activation |
Sample input | Output video | ||
First frame | Last frame | Prompt | |
Realistic style, a black kitten curiously looking at the sky, the camera gradually rises from eye level, finally looking down at the kitten's curious eyes. |
All-in-one video editing
Wan All-in-one Video Creation and Editing (Wan VACE) supports multiple input modalities including text, image, and video, and can perform various video generation and editing tasks. API reference | Try online
Name | Unit price | Free quota |
wan2.1-vace-plus | $0.1/second | 50 seconds Validity period: 180 days after activation |
Current features:
Multi-image reference
Reference images | Prompt | Output video |
Image 1 (subject) Image 2 (background) | In the video, a girl gracefully emerges from the depths of an ancient, misty forest. Her steps are light, and the camera captures her every fluid moment. When the girl stands still and looks around at the lush trees, her face lights up with a smile that blends surprise and joy. This moment, frozen in the interplay of light and shadow, captures the girl's wonderful encounter with nature. |
Video repainting (generating video based on the motion outline of the input video)
Input video | Prompt | Output video |
The video shows a black steampunk-style car driven by a gentleman, decorated with gears and copper pipes. The background is a steam-powered candy factory with vintage elements, creating a retro and playful scene. |
Text embedding
Converts text into numerical representations, suitable for search, clustering, recommendation, and classification tasks. Billed based on the number of input tokens. API reference
Name | Vector dimensions | Maximum rows | Maximum tokens per row | Supported languages | Unit price (Million input tokens) | Free quota |
text-embedding-v3 | 1,024 (default), 768 or 512 | 10 | 8,192 | Chinese, English, Spanish, French, Portuguese, Indonesian, Japanese, Korean, German, Russian, and more than 50 other languages | $0.07 | 500,000 tokens Valid for 180 days after activation |