
Alibaba Cloud Model Studio: Models

Last Updated: Nov 13, 2025

Flagship models

International (Singapore)

| Flagship models | Qwen-Max | Qwen-Plus | Qwen-Flash | Qwen-Coder |
| --- | --- | --- | --- | --- |
| Description | Ideal for complex tasks. The most powerful model. | A balance of performance, speed, and cost. | Ideal for simple jobs. Fast and low-cost. | An excellent code model that excels at tool calling and environment interaction. |
| Maximum context window (tokens) | 262,144 | 1,000,000 | 1,000,000 | 1,000,000 |
| Minimum input price (per million tokens) | $1.2 | $0.4 | $0.05 | $0.3 |
| Minimum output price (per million tokens) | $6 | $1.2 | $0.4 | $1.5 |

Chinese Mainland (Beijing)

| Flagship models | Qwen-Max | Qwen-Plus | Qwen-Flash | Qwen-Coder |
| --- | --- | --- | --- | --- |
| Description | Ideal for complex tasks. The most powerful model. | A balance of performance, speed, and cost. | Ideal for simple jobs. Fast and low-cost. | An excellent code model that excels at tool calling and environment interaction. |
| Maximum context window (tokens) | 262,144 | 1,000,000 | 1,000,000 | 1,000,000 |
| Minimum input price (per million tokens) | $0.459 | $0.115 | $0.022 | $0.144 |
| Minimum output price (per million tokens) | $1.836 | $0.287 | $0.216 | $0.574 |

Model overview

International (Singapore)

Category

Subcategory

Description

Text generation

General-purpose large language models

Qwen large language models: Commercial models (Qwen-Max, Qwen-Plus, Qwen-Flash), open-source models (Qwen3, Qwen2.5)

Multimodal models

Visual understanding model Qwen-VL, visual reasoning model QVQ, omni-modal model Qwen-Omni, and real-time multi-modal model Qwen-Omni-Realtime

Domain-specific models

Coder model, Translation model, Role-playing model

Image generation

Text-to-image

Image editing

  • Qwen image editing: Supports Chinese and English prompts and lets you perform complex image and text editing operations such as style transfer, text modification, and object editing.

  • Wan image editing: Generates or edits images. This feature is suitable for creating ID photos, e-commerce main images, model photos, and portraits in various styles (such as cartoon, Chinese style, and anime). It can also be used for image matting, generating backgrounds, modifying image elements, and more.

Speech synthesis and recognition

Speech synthesis (text-to-speech)

Qwen-TTS and Qwen-TTS-Realtime can be used for text-to-speech in scenarios such as intelligent voice customer service, audiobooks, in-car navigation, and educational tutoring.

Speech recognition and translation

Qwen-ASR-Realtime, Qwen-ASR, Qwen3-LiveTranslate-Flash-Realtime, and Fun-ASR convert speech to text for scenarios such as real-time meeting transcription, real-time live stream captioning, and telephone customer service.

Video generation

Text-to-video

Generates high-quality videos with rich styles from a single sentence.

Image-to-video

  • First-frame-to-video: Uses an input image as the first frame and generates a video based on a prompt.

  • First-and-last-frame-to-video: Generates a smooth and dynamic video based on the provided first and last frames and a prompt.

  • Multi-image-to-video: Generates a video by referencing the entity or background in one or more input images, combined with a prompt.

General-purpose video editing

General-purpose video editing: Performs various video editing tasks based on input text, images, and videos. For example, it can generate a new video by extracting motion features from an input video and combining them with a prompt.

Embedding

Text embedding

Converts text into a set of numbers that represent the text. It is suitable for search, clustering, recommendation, and classification tasks.

Mainland China (Beijing)

Category

Subcategory

Description

Text generation

General-purpose large language models

Multimodal models

The visual understanding model Qwen-VL, the visual reasoning model QVQ, and the omni-modal model Qwen-Omni

Domain-specific models

Code model, Mathematical model, Translation model, Data mining model, Research model, Intent recognition model, Role-playing model

Image generation

Text-to-image

  • Qwen text-to-image: Excels in complex text rendering, especially for Chinese and English text.

  • Wan text-to-image: Suitable for generating ID photos, e-commerce main images, model photos, and portraits in various styles (such as cartoon, Chinese style, and anime style).

Image editing

General-purpose models:

  • Qwen image editing: Supports Chinese and English prompts for complex image and text editing operations, such as style transfer, text modification, and object editing.

  • Wan image editing: Generates or edits images. You can create ID photos, e-commerce main images, model photos, and portraits in various styles, such as cartoon, Chinese style, and anime. You can also remove backgrounds, generate backgrounds, and change image elements.

More models: Qwen Image Translation, OutfitAnyone

Speech synthesis and recognition

Speech synthesis (text-to-speech)

Qwen-TTS, Qwen-TTS-Realtime, and CosyVoice convert text to speech for scenarios such as voice-based customer service, audiobooks, in-car navigation, and educational tutoring.

Speech recognition and translation

Qwen-ASR-Realtime, Qwen-ASR, Fun-ASR, and Paraformer convert speech to text for scenarios such as real-time meeting transcription, real-time live stream captioning, and customer service calls.

Video editing and generation

Text-to-video

Generates high-quality videos with rich styles from a single sentence.

Image-to-video

  • First-frame-to-video: Generates a video from an initial image and a prompt.

  • First-and-last-frame-to-video: Generates a video with a natural transition based on the first and last frame images and a prompt.

  • Multi-image-to-video: Generates a video from one or more images and a text prompt, based on the entities or backgrounds in the source images.

  • Dance video generation: AnimateAnyone generates dance videos from a character image and an action video.

  • Lip-sync video generation from an image and audio:

    • Wan-digital human generates video from a person's image and audio. It provides a wide and natural range of motion, supports various frame sizes such as full-body, half-body, and portrait, and is suitable for scenarios such as singing and performance.

    • EMO uses a person's image and audio to generate video with highly expressive lip-syncing and facial expressions. It supports portrait and half-body shots and is ideal for close-up scenarios.

    • LivePortrait uses a portrait image and an audio file and is ideal for voice narration scenarios.

  • Emoji video generation: Emoji generates facial emoji videos from facial images and preset dynamic facial templates.

General-purpose video editing

  • General video editing: Performs various video editing tasks based on text prompts, images, and videos. For example, you can generate a new video by extracting motion features from an input video and combining them with a text prompt.

  • Video lip-syncing: VideoRetalk uses a person's video and audio and is ideal for scenarios such as short video production and video translation.

  • Video style transfer: Video Style Repainting transforms videos into various styles, such as Japanese manga and American comics.

Embedding

Text embedding

Converts text into a set of numbers that represent the text. It is used for search, clustering, recommendation, and classification.

Multimodal embedding

Converts text, images, and speech into a set of numbers. It is used for audio and video classification, image classification, and image-text retrieval.
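The embedding rows above say that text is converted into a set of numbers used for search and retrieval. A minimal sketch of how such vectors are typically compared follows. The vectors here are tiny hand-written values for illustration, not output from any Model Studio embedding model, and the helper names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, doc_vecs):
    """Return document names sorted from most to least similar to the query."""
    return sorted(
        doc_vecs,
        key=lambda name: cosine_similarity(query_vec, doc_vecs[name]),
        reverse=True,
    )


# Hypothetical 2-dimensional "embeddings"; real embeddings have many more dimensions.
documents = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
ranking = rank_by_similarity([1.0, 0.1], documents)
```

In a search scenario, the query and each document would be embedded by the same model, and the highest-ranked documents would be returned first.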

Text generation - Qwen

The following are the Qwen commercial models. Compared to the open-source versions, the commercial models offer the latest capabilities and improvements.

The parameter sizes of the commercial models are not disclosed.
Each model is updated periodically. To pin a fixed version, select a snapshot version. A snapshot version is typically maintained for one month after the next snapshot version is released.
We recommend the stable or latest versions, which have more lenient rate limits.

Qwen-Max

The best-performing model in the Qwen series, suitable for complex, multi-step tasks. Usage | API reference | Try it online

International (Singapore)

Model

Version

Mode

Context window

Max input

Max chain-of-thought

Max output

Input cost

Output cost

Free quota

(Note)

(Tokens)

(Million tokens)

qwen3-max

Currently has the same capabilities as qwen3-max-2025-09-23
Batch calls are half price

Stable

Non-thinking only

262,144

258,048

-

65,536

Tiered pricing, see the description below.

1 million tokens each

Valid for 90 days after activation

qwen3-max-2025-09-23

Snapshot

Non-thinking only

qwen3-max-preview

Preview

Thinking

81,920

32,768

Non-thinking

-

65,536

Billing for the models listed above is tiered based on the number of input tokens per request.

| Input tokens per request | Input price (per million tokens) | Output price (per million tokens) |
| --- | --- | --- |
| 0 < Tokens ≤ 32K | $1.2 | $6 |
| 32K < Tokens ≤ 128K | $2.4 | $12 |
| 128K < Tokens ≤ 252K | $3 | $15 |

qwen3-max and qwen3-max-preview support context cache.
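As a rough illustration of the tiered billing above, the sketch below estimates the cost of one request from the International (Singapore) qwen3-max prices. It rests on two assumptions that this page does not state: "K" means 1,024 tokens (which makes 252K equal to the listed max input of 258,048), and the whole request is billed at the rate of the tier its input falls into. The function name is hypothetical, and batch or context cache discounts are ignored.

```python
K = 1024  # assumption: "K" in the tier bounds means 1,024 tokens

# (max input tokens in tier, input USD per million, output USD per million)
QWEN3_MAX_INTL_TIERS = [
    (32 * K, 1.2, 6.0),
    (128 * K, 2.4, 12.0),
    (252 * K, 3.0, 15.0),
]


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request under the tiers above."""
    if input_tokens <= 0:
        raise ValueError("input_tokens must be positive")
    for upper, in_price, out_price in QWEN3_MAX_INTL_TIERS:
        if input_tokens <= upper:
            # Assumption: both prices come from the tier selected by input size.
            return (input_tokens * in_price + output_tokens * out_price) / 1_000_000
    raise ValueError("input exceeds the largest listed tier (252K tokens)")


small_request = estimate_cost(10_000, 1_000)   # falls in the first tier
large_request = estimate_cost(100_000, 2_000)  # falls in the second tier
```

Treat the result as an estimate only; the bill shown in the console is authoritative.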

More models

Model

Version

Context window

Max input

Max output

Input cost

Output cost

Free quota

(Note)

(Tokens)

(Million tokens)

qwen-max

Currently has the same capabilities as qwen-max-2025-01-25

Stable

32,768

30,720

8,192

$1.6

Batch calls are half price.

$6.4

Batch calls are half price.

1 million tokens for input and output each

Valid for 90 days after you activate Model Studio.

qwen-max-latest

Always has the same capabilities as the latest snapshot version

Latest

$1.6

$6.4

qwen-max-2025-01-25

also known as qwen-max-0125 or Qwen2.5-Max

Snapshot

Chinese mainland (Beijing)

Model

Version

Mode

Context window

Max input

Max chain-of-thought

Max output

Input cost

Output cost

(Tokens)

(Million tokens)

qwen3-max

Currently has the same capabilities as qwen3-max-2025-09-23
Batch calls are half price

Stable

Non-thinking only

262,144

258,048

-

65,536

Tiered pricing, see the description below.

qwen3-max-2025-09-23

Snapshot

Non-thinking only

qwen3-max-preview

Preview

Thinking

81,920

32,768

Non-thinking

-

65,536

Billing for the models listed above is tiered based on the number of input tokens per request.

| Model | Input tokens per request | Input price (per million tokens) | Output price (per million tokens, chain-of-thought + response) |
| --- | --- | --- | --- |
| qwen3-max (batch calls are half price; context cache discounts) | 0 < Tokens ≤ 32K | $0.459 | $1.836 |
| qwen3-max | 32K < Tokens ≤ 128K | $0.918 | $3.672 |
| qwen3-max | 128K < Tokens ≤ 252K | $1.377 | $5.508 |
| qwen3-max-2025-09-23 | 0 < Tokens ≤ 32K | $0.861 | $3.441 |
| qwen3-max-2025-09-23 | 32K < Tokens ≤ 128K | $1.434 | $5.735 |
| qwen3-max-2025-09-23 | 128K < Tokens ≤ 252K | $2.151 | $8.602 |
| qwen3-max-preview (context cache discounts) | 0 < Tokens ≤ 32K | $0.861 | $3.441 |
| qwen3-max-preview | 32K < Tokens ≤ 128K | $1.434 | $5.735 |
| qwen3-max-preview | 128K < Tokens ≤ 252K | $2.151 | $8.602 |

More models

Model

Version

Context window

Max input

Max output

Input cost

Output cost

(Tokens)

(Million tokens)

qwen-max

Currently has the same capabilities as qwen-max-2024-09-19

Stable

32,768

30,720

8,192

$0.345

$1.377

qwen-max-latest

Always has the same capabilities as the latest snapshot version

Latest

131,072

129,024

qwen-max-2025-01-25

also known as qwen-max-0125 or Qwen2.5-Max

Snapshot

qwen-max-2024-09-19

also known as qwen-max-0919

32,768

30,720

$2.868

$8.602

qwen-max-2024-04-28

also known as qwen-max-0428

Snapshot

8,000

6,000

2,000

CNY 0.04

CNY 0.12

qwen-max-2024-04-03

also known as qwen-max-0403

The thinking mode of qwen3-max-preview significantly improves overall reasoning capabilities and excels in agentic programming, common-sense reasoning, mathematics, science, and general tasks.

Qwen-Plus

A balanced model that offers performance, cost, and speed between those of Qwen-Max and Qwen-Flash. It is suitable for moderately complex tasks. Usage | API reference | Try it online | Deep thinking

International (Singapore)

Model

Version

Context window

Max input

Max output

Input cost

Output cost

Free quota

(Note)

(Tokens)

(Million tokens)

qwen-plus

Currently has the same capabilities as qwen-plus-2025-07-28
Part of the Qwen3 series

Stable

1,000,000

Thinking mode

995,904

Non-thinking mode

997,952

The default is 262,144. You can adjust this value using the max_input_tokens parameter.

32,768

Max chain-of-thought: 81,920

Tiered pricing applies. For more information, see the notes below the table.

1 million tokens for input and output each

Valid for 90 days after you activate Model Studio.

qwen-plus-latest

Currently has the same capabilities as qwen-plus-2025-07-28
Part of the Qwen3 series

Latest

Thinking mode

995,904

Non-thinking mode

997,952

qwen-plus-2025-09-11

Part of the Qwen3 series.

Snapshot

Thinking mode

995,904

Non-thinking mode

997,952

qwen-plus-2025-07-28

also known as qwen-plus-0728
Part of the Qwen3 series

qwen-plus-2025-07-14

also known as qwen-plus-0714
Part of the Qwen3 series

131,072

Thinking mode

98,304

Non-thinking mode

129,024

16,384

Max chain-of-thought: 38,912

$0.4

Thinking mode

$4

Non-thinking mode

$1.2

qwen-plus-2025-04-28

also known as qwen-plus-0428
Part of the Qwen3 series

qwen-plus-2025-01-25

also known as qwen-plus-0125

129,024

8,192

$1.2

Billing for qwen-plus, qwen-plus-latest, qwen-plus-2025-09-11, and qwen-plus-2025-07-28 is tiered based on the number of input tokens per request.

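| Input tokens per request | Input price (per million tokens) | Mode | Output price (per million tokens) |
| --- | --- | --- | --- |
| 0 < Tokens ≤ 256K | $0.4 | Non-thinking mode | $1.2 |
| 0 < Tokens ≤ 256K | $0.4 | Thinking mode | $4 |
| 256K < Tokens ≤ 1M | $1.2 | Non-thinking mode | $3.6 |
| 256K < Tokens ≤ 1M | $1.2 | Thinking mode | $12 |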

Chinese mainland (Beijing)

Model

Version

Context window

Max input

Max output

Input cost

Output cost

(Tokens)

(Million tokens)

qwen-plus

Currently has the same capabilities as qwen-plus-2025-07-28
Part of the Qwen3 series

Stable

1,000,000

Thinking mode

995,904

Non-thinking mode

997,952

The default is 131,072. You can adjust this value using the max_input_tokens parameter.

32,768

Max chain-of-thought: 81,920

Tiered pricing applies. For more information, see the notes below the table.

qwen-plus-latest

Currently has the same capabilities as qwen-plus-2025-07-28
Part of the Qwen3 series

Latest

Thinking mode

995,904

Non-thinking mode

997,952

qwen-plus-2025-09-11

Part of the Qwen3 series

Snapshot

Thinking mode

995,904

Non-thinking mode

997,952

qwen-plus-2025-07-28

also known as qwen-plus-0728
Part of the Qwen3 series

qwen-plus-2025-07-14

also known as qwen-plus-0714
Part of the Qwen3 series

131,072

Thinking mode

98,304

Non-thinking mode

129,024

16,384

Max chain-of-thought: 38,912

$0.115

Thinking mode

$1.147

Non-thinking mode

$0.287

qwen-plus-2025-04-28

also known as qwen-plus-0428
Part of the Qwen3 series

Billing for qwen-plus, qwen-plus-latest, qwen-plus-2025-09-11, and qwen-plus-2025-07-28 is tiered based on the number of input tokens per request.

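| Input tokens per request | Input price (per million tokens) | Mode | Output price (per million tokens) |
| --- | --- | --- | --- |
| 0 < Tokens ≤ 128K | $0.115 | Non-thinking mode | $0.287 |
| 0 < Tokens ≤ 128K | $0.115 | Thinking mode | $1.147 |
| 128K < Tokens ≤ 256K | $0.345 | Non-thinking mode | $2.868 |
| 128K < Tokens ≤ 256K | $0.345 | Thinking mode | $3.441 |
| 256K < Tokens ≤ 1M | $0.689 | Non-thinking mode | $6.881 |
| 256K < Tokens ≤ 1M | $0.689 | Thinking mode | $9.175 |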

These models support both thinking and non-thinking modes. You can switch between them using the enable_thinking parameter. In addition, the models' capabilities are significantly improved:

  1. Reasoning capabilities: In evaluations for math, code, and logical reasoning, the model significantly outperforms QwQ and other models of similar size without a reasoning mode. It achieves top-tier performance among models of its scale.

  2. Human preference alignment: The model shows significant improvements in creative writing, role-playing, multi-turn conversation, and instruction following. Its general capabilities are significantly better than those of other models of similar size.

  3. Agent capabilities: The model achieves industry-leading performance in both thinking and non-thinking modes and can accurately invoke external tools.

  4. Multilingual capabilities: The model supports more than 100 languages and dialects. Its capabilities in multilingual translation, instruction understanding, and common-sense reasoning are significantly improved.

  5. Response format: This version fixes response format issues from previous versions, such as incorrect Markdown formatting, premature truncation, and incorrect boxed output.

For the models listed above, if you enable thinking mode but no thought process is generated, you are charged based on the pricing for non-thinking mode.
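The billing rule above can be sketched as follows, using the Chinese mainland (Beijing) qwen-plus tiers as sample data. This is an illustration under stated assumptions, not the platform's billing logic: "K" is taken as 1,024 tokens, "1M" as 1,000,000 tokens, chain-of-thought tokens are assumed to be billed as output, and the function names and the `reasoning_tokens` argument are hypothetical.

```python
K = 1024  # assumption: "K" in the tier bounds means 1,024 tokens

# Beijing qwen-plus tiers: (max input tokens in tier, input price,
# non-thinking output price, thinking output price), USD per million tokens.
QWEN_PLUS_BEIJING_TIERS = [
    (128 * K, 0.115, 0.287, 1.147),
    (256 * K, 0.345, 2.868, 3.441),
    (1_000_000, 0.689, 6.881, 9.175),  # assumption: "1M" means 1,000,000
]


def billed_as_thinking(enable_thinking: bool, reasoning_tokens: int) -> bool:
    """Thinking prices apply only if thinking is on and a thought process was generated."""
    return enable_thinking and reasoning_tokens > 0


def estimate_cost(input_tokens: int, response_tokens: int,
                  enable_thinking: bool, reasoning_tokens: int = 0) -> float:
    """Estimate the USD cost of one request (hypothetical helper)."""
    if input_tokens <= 0:
        raise ValueError("input_tokens must be positive")
    thinking = billed_as_thinking(enable_thinking, reasoning_tokens)
    for upper, in_price, out_plain, out_thinking in QWEN_PLUS_BEIJING_TIERS:
        if input_tokens <= upper:
            out_price = out_thinking if thinking else out_plain
            # Assumption: chain-of-thought tokens count toward billed output.
            output_tokens = response_tokens + reasoning_tokens
            return (input_tokens * in_price + output_tokens * out_price) / 1_000_000
    raise ValueError("input exceeds the largest listed tier")


# Thinking enabled, but no thought process produced: non-thinking price applies.
no_thought = estimate_cost(1_000, 500, enable_thinking=True, reasoning_tokens=0)
# Thinking enabled and 500 chain-of-thought tokens produced: thinking price applies.
with_thought = estimate_cost(1_000, 500, enable_thinking=True, reasoning_tokens=500)
```

Keeping the mode decision in its own function makes the "enabled but no thought process" case explicit and easy to test.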

More models

Model

Version

Context window

Max input

Max output

Input cost

Output cost

(Tokens)

(Million tokens)

qwen-plus-2025-01-25

also known as qwen-plus-0125

Snapshot

131,072

129,024

8,192

$0.115

$0.287

qwen-plus-2025-01-12

also known as qwen-plus-0112

qwen-plus-2024-12-20

also known as qwen-plus-1220

qwen-plus-2024-11-27

also known as qwen-plus-1127

qwen-plus-2024-11-25

also known as qwen-plus-1125

qwen-plus-2024-09-19

also known as qwen-plus-0919

qwen-plus-2024-08-06

also known as qwen-plus-0806

128,000

$0.574

$1.721

Qwen-Flash

The fastest and most cost-effective model in the Qwen series, ideal for simple jobs. Qwen-Flash features flexible tiered pricing, making it more cost-effective than Qwen-Turbo. Usage | API reference | Try it online | Thinking mode

International (Singapore)

Model

Version

Mode

Context window

Max input

Max chain-of-thought

Max output

Input cost

Output cost

Chain-of-thought + Outputs

Free quota

(Note)

(Tokens)

(Million tokens)

qwen-flash

Same capabilities as qwen-flash-2025-07-28
Part of the Qwen3 series.
Batch calls are charged at half the standard price.

Stable

Thinking

1,000,000

995,904

81,920

32,768

Tiered pricing. See the description below the table.

1 million tokens each

Valid for 90 days after activating Alibaba Cloud Model Studio.

Non-thinking

997,952

-

qwen-flash-2025-07-28

Part of the Qwen3 series.

Snapshot

Thinking

995,904

81,920

Non-thinking

997,952

-

Billing for the models listed above is tiered based on the number of input tokens per request. qwen-flash supports context cache and batch calls.

| Input tokens per request | Input price (per million tokens) | Output price (per million tokens) |
| --- | --- | --- |
| 0 < Tokens ≤ 256K | $0.05 | $0.4 |
| 256K < Tokens ≤ 1M | $0.25 | $2 |

Chinese mainland (Beijing)

Model

Version

Mode

Context window

Max input

Max chain-of-thought

Max output

Input cost

Output cost

Chain-of-thought + Outputs

(Tokens)

(Million tokens)

qwen-flash

Same capabilities as qwen-flash-2025-07-28
Part of the Qwen3 series

Stable

Thinking

1,000,000

995,904

81,920

32,768

Tiered pricing. See the description below the table.

Non-thinking

997,952

-

qwen-flash-2025-07-28

Part of the Qwen3 series

Snapshot

Thinking

995,904

81,920

Non-thinking

997,952

-

Billing for the models listed above is tiered based on the number of input tokens per request. qwen-flash supports context cache.

| Input tokens per request | Input price (per million tokens) | Output price (per million tokens) |
| --- | --- | --- |
| 0 < Tokens ≤ 128K | $0.022 | $0.216 |
| 128K < Tokens ≤ 256K | $0.087 | $0.861 |
| 256K < Tokens ≤ 1M | $0.173 | $1.721 |

Qwen-Turbo

Qwen-Turbo will no longer be updated. We recommend replacing it with Qwen-Flash. Qwen-Flash uses flexible tiered pricing, which offers a more granular pricing model. Usage | API reference | Try it online | Deep thinking

International (Singapore)

Model

Version

Context window

Max input

Max output

Input cost

Output cost

Free quota

(Note)

(Tokens)

(Million tokens)

qwen-turbo

Currently has the same capabilities as qwen-turbo-2025-04-28
Part of the Qwen3 series

Stable

Thinking mode

131,072

Non-thinking mode

1,000,000

Thinking mode

98,304

Non-thinking mode

1,000,000

16,384

Max chain-of-thought is 38,912

$0.05

Batch calls are half price

Thinking mode: $0.5

Non-thinking mode: $0.2

Batch calls are half price

1 million tokens for each

Validity: 90 days after you activate Alibaba Cloud Model Studio

qwen-turbo-latest

Always has the same capabilities as the latest snapshot version
Part of the Qwen3 series

Latest

$0.05

Thinking mode: $0.5

Non-thinking mode: $0.2

qwen-turbo-2025-04-28

Also known as qwen-turbo-0428
Part of the Qwen3 series

Snapshot

qwen-turbo-2024-11-01

Also known as qwen-turbo-1101

1,000,000

1,000,000

8,192

$0.2

Mainland China (Beijing)

Model

Version

Context window

Max input

Max output

Input cost

Output cost

(Tokens)

(Million tokens)

qwen-turbo

Currently has the same capabilities as qwen-turbo-2025-04-28
Part of the Qwen3 series

Stable

Thinking mode

131,072

Non-thinking mode

1,000,000

Thinking mode

98,304

Non-thinking mode

1,000,000

16,384

Max chain-of-thought is 38,912

$0.044

Thinking mode

$0.431

Non-thinking mode

$0.087

qwen-turbo-latest

Always has the same capabilities as the latest snapshot version
Part of the Qwen3 series

Latest

qwen-turbo-2025-07-15

Also known as qwen-turbo-0715
Part of the Qwen3 series

Snapshot

qwen-turbo-2025-04-28

Also known as qwen-turbo-0428
Part of the Qwen3 series

QwQ

The QwQ reasoning model is built on the Qwen2.5 model and uses reinforcement learning to significantly improve its reasoning capabilities. Its core metrics for math and code, such as AIME 24/25 and LiveCodeBench, and some of its general metrics, such as IFEval and LiveBench, are comparable to those of the full-performance version of DeepSeek-R1. Usage

Singapore

Model

Version

Context window

Max input

Max chain-of-thought

Max response

Input cost

Output cost

Free quota

(Note)

(Tokens)

(Million tokens)

qwq-plus

Stable

131,072

98,304

32,768

8,192

$0.8

$2.4

1 million tokens

Validity: Within 90 days after you activate Alibaba Cloud Model Studio.

Mainland China (Beijing)

Model

Version

Context window

Max input

Max chain-of-thought

Max response

Input cost

Output cost

(Tokens)

(Million tokens)

qwq-plus

Same capabilities as qwq-plus-2025-03-05.

Stable

131,072

98,304

32,768

8,192

$0.230

$0.574

qwq-plus-latest

Always has the same capabilities as the latest snapshot version.

Latest

qwq-plus-2025-03-05

Also known as qwq-plus-0305.

Snapshot

Qwen-Long

The Qwen-Long model has the longest context window in the Qwen series. It offers balanced performance at a low cost. This model is ideal for tasks such as long-text analysis, information extraction, summarization, classification, and tagging. Usage | Try it online

China (Beijing)

Model

Version

Context window

Max input

Max output

Input cost

Output cost

(Tokens)

(Million tokens)

qwen-long-latest

Always matches the capabilities of the latest snapshot version.

Stable

10,000,000

10,000,000

8,192

$0.072

$0.287

qwen-long-2025-01-25

Also known as qwen-long-0125.

Snapshot

Qwen-Omni

The Qwen-Omni model accepts combined inputs from multiple modalities, such as text, images, audio, and video, and generates responses in text or speech format. It provides a variety of expressive, human-like voices and supports audio output in multiple languages and dialects. You can use it in audio and video chat scenarios, such as for visual recognition, sentiment analysis, and education and training. Usage | API reference

Singapore

Model

Version

Mode

Context window

Max input

Max chain-of-thought

Max output

Free quota

(Note)

(Tokens)

qwen3-omni-flash

Currently has the same capabilities as qwen3-omni-flash-2025-09-15

Stable

Thinking mode

65,536

16,384

32,768

16,384

1 million tokens each (modality-agnostic)

Valid for 90 days after you activate Model Studio

Non-thinking mode

49,152

-

qwen3-omni-flash-2025-09-15

Also known as qwen3-omni-flash-0915

Snapshot

Thinking mode

65,536

16,384

32,768

16,384

Non-thinking mode

49,152

-

After you use your free quota, the following billing rules apply to inputs and outputs. The billing is the same for both thinking mode and non-thinking mode. Audio output is not supported in thinking mode.

| Direction | Modality | Unit price (per million tokens) |
| --- | --- | --- |
| Input | Text | $0.43 |
| Input | Audio | $3.81 |
| Input | Image/Video | $0.78 |
| Output | Text | $1.66 (when input contains only text); $3.06 (when input contains images, videos, or audio) |
| Output | Text + Audio | $15.11 (audio); the output text is not billed. This item is not billed in thinking mode. |

More models

Model

Version

Context window

Max input

Max output

Free quota

(Note)

(Tokens)

qwen-omni-turbo

This version has the same capabilities as qwen-omni-turbo-2025-03-26.

Stable

32,768

30,720

2,048

1 million tokens each (modality-agnostic)

Valid for 90 days after activating Model Studio.

qwen-omni-turbo-latest

Always has the same capabilities as the latest snapshot version.

Latest

qwen-omni-turbo-2025-03-26

Also known as qwen-omni-turbo-0326.

Snapshot

After you use your free quota for commercial models, the following billing rules apply to inputs and outputs:

| Direction | Modality | Unit price (per million tokens) |
| --- | --- | --- |
| Input | Text | $0.07 |
| Input | Audio | $4.44 |
| Input | Image/Video | $0.21 |
| Output | Text | $0.27 (when the input contains only text); $0.63 (when the input contains images, videos, or audio) |
| Output | Text + Audio | $8.89 (audio); the text portion of the output is not billed. |

Mainland China (Beijing)

Model

Version

Mode

Context window

Max input

Max chain-of-thought

Max output

Free quota

(Note)

(Tokens)

qwen3-omni-flash

Currently has the same capabilities as qwen3-omni-flash-2025-09-15

Stable

Thinking mode

65,536

16,384

32,768

16,384

No free quota

Non-thinking mode

49,152

-

qwen3-omni-flash-2025-09-15

Also known as qwen3-omni-flash-0915

Snapshot

Thinking mode

65,536

16,384

32,768

16,384

Non-thinking mode

49,152

-

After your free quota is used up, inputs and outputs are billed according to the following rules. The billing is the same for both thinking mode and non-thinking mode. Audio output is not supported in thinking mode.

| Direction | Modality | Unit price (per million tokens) |
| --- | --- | --- |
| Input | Text | $0.258 |
| Input | Audio | $2.265 |
| Input | Image/Video | $0.473 |
| Output | Text | $0.989 (when input contains only text); $1.821 (when input contains images, videos, or audio) |
| Output | Text + audio | $8.974 (audio); the output text is not billed. This item is not billed in thinking mode. |

More models

Model

Version

Context window

Max input

Max output

Free quota

(Note)

(Tokens)

qwen-omni-turbo

This model currently has the same capabilities as qwen-omni-turbo-2025-03-26.

Stable

32,768

30,720

2,048

No free quota

qwen-omni-turbo-latest

Always has the same capabilities as the latest snapshot version.

Latest

qwen-omni-turbo-2025-03-26

Also known as qwen-omni-turbo-0326.

Snapshot

qwen-omni-turbo-2025-01-19

Also known as qwen-omni-turbo-0119.

Inputs and outputs are billed according to the following rules:

| Direction | Modality | Unit price (per million tokens) |
| --- | --- | --- |
| Input | Text | $0.058 |
| Input | Audio | $3.584 |
| Input | Image/Video | $0.216 |
| Output | Text | $0.230 (for text-only input); $0.646 (when the input includes images, audio, or videos) |
| Output | Text + audio | $7.168 (audio); the output text is not billed. |

For example, a request with 1,000 text input tokens, 1,000 image input tokens, 1,000 text output tokens, and 1,000 audio output tokens costs $0.000058 (text input) + $0.000216 (image input) + $0.007168 (audio output) = $0.007442. The 1,000 text output tokens are not billed because the output also contains audio.
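The worked example above can be reproduced with a short calculation. The sketch below uses the Chinese mainland (Beijing) qwen-omni-turbo prices listed above; the function and dictionary names are hypothetical, and the branching reflects one reading of the rules: when the output contains audio, only the audio tokens are billed, and otherwise the text output price depends on whether the input is text-only.

```python
# USD per million tokens, from the Beijing qwen-omni-turbo prices above.
INPUT_PRICES = {"text": 0.058, "audio": 3.584, "image_video": 0.216}
TEXT_OUTPUT_PRICE_TEXT_ONLY_INPUT = 0.230
TEXT_OUTPUT_PRICE_MULTIMODAL_INPUT = 0.646
AUDIO_OUTPUT_PRICE = 7.168


def omni_cost(input_tokens: dict, text_output_tokens: int,
              audio_output_tokens: int) -> float:
    """Estimate the USD cost of one request (hypothetical helper)."""
    cost = sum(INPUT_PRICES[modality] * count
               for modality, count in input_tokens.items())
    if audio_output_tokens > 0:
        # Text + audio output: only the audio tokens are billed.
        cost += AUDIO_OUTPUT_PRICE * audio_output_tokens
    else:
        multimodal_input = any(count > 0 for modality, count in input_tokens.items()
                               if modality != "text")
        price = (TEXT_OUTPUT_PRICE_MULTIMODAL_INPUT if multimodal_input
                 else TEXT_OUTPUT_PRICE_TEXT_ONLY_INPUT)
        cost += price * text_output_tokens
    return cost / 1_000_000


# The request from the example: text and image in, text and audio out.
example_cost = omni_cost({"text": 1_000, "image_video": 1_000}, 1_000, 1_000)
# A text-only request with text-only output, for comparison.
text_only_cost = omni_cost({"text": 1_000}, 1_000, 0)
```

The first call matches the sum in the example, about $0.007442.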

The Qwen3-Omni-Flash model offers significant improvements over Qwen-Omni-Turbo, which is no longer updated:

  • It is a hybrid thinking model that supports both thinking and non-thinking modes. You can switch between the modes using the enable_thinking parameter. By default, thinking mode is disabled.

  • Audio output is not supported in thinking mode. In non-thinking mode, the audio output from the model has the following features:

    • It supports 17 voices, an increase from the 4 supported by Qwen-Omni-Turbo.

    • It supports 10 languages, an increase from the 2 supported by Qwen-Omni-Turbo.

Qwen-Omni-Realtime

Compared to Qwen-Omni, these models support audio stream input. They have a built-in Voice Activity Detection (VAD) feature that automatically detects the start and end of user speech. Usage | Client events | Server events

International (Singapore)

Model

Version

Context window

Max input

Max output

Free quota

(Note)

(Tokens)

qwen3-omni-flash-realtime

Equivalent to qwen3-omni-flash-realtime-2025-09-15

Stable

65,536

49,152

16,384

1 million tokens each, regardless of modality

Valid for 90 days after you activate Model Studio

qwen3-omni-flash-realtime-2025-09-15

Snapshot

After the free quota is exhausted, the following billing rules apply to inputs and outputs:

| Direction | Modality | Unit price (per million tokens) |
| --- | --- | --- |
| Input | Text | $0.52 |
| Input | Audio | $4.57 |
| Input | Image | $0.94 |
| Output | Text | $1.99 (when input contains only text); $3.67 (when input contains images or audio) |
| Output | Text + Audio | $18.13 (audio); text output is not billed. |

More models

Model

Version

Context window

Max input

Max output

Free quota

(Note)

(Tokens)

qwen-omni-turbo-realtime

Equivalent to qwen-omni-turbo-realtime-2025-05-08

Stable

32,768

30,720

2,048

1 million tokens, regardless of modality

Valid for 90 days after you activate Model Studio

qwen-omni-turbo-realtime-latest

Always equivalent to the latest snapshot version

Latest

qwen-omni-turbo-realtime-2025-05-08

Snapshot

After the free quota is exhausted, the following billing rules apply to inputs and outputs:

| Direction | Modality | Unit price (per million tokens) |
| --- | --- | --- |
| Input | Text | $0.270 |
| Input | Audio | $4.440 |
| Input | Image | $0.840 |
| Output | Text | $1.070 (for text-only input); $2.520 (for input that contains images or audio) |
| Output | Text + Audio | $8.890 (for audio output); the text output is not billed. |

Mainland China (Beijing)

Model

Version

Context window

Max input

Max output

Free quota

(Note)

(Tokens)

qwen3-omni-flash-realtime

Equivalent to qwen3-omni-flash-realtime-2025-09-15

Stable

65,536

49,152

16,384

No free quota

qwen3-omni-flash-realtime-2025-09-15

Snapshot

The following billing rules apply to inputs and outputs:

| Direction | Modality | Unit price (per million tokens) |
| --- | --- | --- |
| Input | Text | $0.315 |
| Input | Audio | $2.709 |
| Input | Image | $0.559 |
| Output | Text | $1.19 (when the input contains only text); $2.179 (when the input contains images or audio) |
| Output | Text + audio | $10.766 (audio); text output is not billed. |

More models

Model

Version

Context window

Max input

Max output

Free quota

(Note)

(Tokens)

qwen-omni-turbo-realtime

Equivalent to qwen-omni-turbo-realtime-2025-05-08

Stable

32,768

30,720

2,048

No free quota

qwen-omni-turbo-realtime-latest

Always equivalent to the latest snapshot version

Latest

qwen-omni-turbo-realtime-2025-05-08

Snapshot

The following billing rules apply to inputs and outputs:

| Direction | Modality | Unit price (per million tokens) |
| --- | --- | --- |
| Input | Text | $0.230 |
| Input | Audio | $3.584 |
| Input | Image | $0.861 |
| Output | Text | $0.918 (if the input contains only text); $2.581 (if the input contains images or audio) |
| Output | Text + Audio | $7.168 (audio); text output is not billed. |

The Qwen3-Omni-Flash-Realtime model is recommended. It offers significantly improved capabilities compared to Qwen-Omni-Turbo-Realtime, which will no longer be updated. For audio output from the model:

  • It supports 17 voices. Qwen-Omni-Turbo-Realtime supports only 4.

  • It supports 10 languages. Qwen-Omni-Turbo-Realtime supports only 2.

QVQ

QVQ is a visual reasoning model that supports visual inputs and chain-of-thought outputs. It delivers superior performance in math, programming, visual analysis, creative tasks, and general tasks. Usage | Try it online

International (Singapore)

Model

Version

Context window

Max input

Max chain-of-thought

Max response

Input cost

Output cost

Free quota

(Note)

(Tokens)

(Million tokens)

qvq-max

Equivalent to qvq-max-2025-03-25.

Stable

131,072

106,496

Maximum of 16,384 tokens for a single image.

16,384

8,192

$1.2

$4.8

1 million input tokens and 1 million output tokens.

Valid for 90 days after you activate Alibaba Cloud Model Studio.

qvq-max-latest

Always equivalent to the latest snapshot version.

Latest

qvq-max-2025-03-25

Also known as qvq-max-0325.

Snapshot

Mainland China (Beijing)

Model

Version

Context window

Max input

Max chain-of-thought

Max response

Input cost

Output cost

(Tokens)

(Million tokens)

qvq-max

Offers stronger visual reasoning and instruction-following capabilities than qvq-plus, providing optimal performance for more complex tasks.
Has the same capabilities as qvq-max-2025-03-25.

Stable

131,072

106,496

Maximum of 16,384 for a single image.

16,384

8,192

$1.147

$4.588

qvq-max-latest

Always has the same capabilities as the latest snapshot version.

Latest

qvq-max-2025-05-15

Also known as qvq-max-0515.

Snapshot

qvq-max-2025-03-25

Also known as qvq-max-0325.

qvq-plus

Has the same capabilities as qvq-plus-2025-05-15.

Stable

$0.287

$0.717

qvq-plus-latest

Always has the same capabilities as the latest snapshot version.

Latest

qvq-plus-2025-05-15

Also known as qvq-plus-0515.

Snapshot

Qwen-VL

Qwen-VL is a text generation model with visual understanding (image) capabilities. It not only performs Optical Character Recognition (OCR) but also provides further summarization and reasoning, such as extracting properties from product photos or solving problems shown in diagrams. Usage | API reference | Try it online

Qwen-VL models are billed based on the total number of input and output tokens. For more information about how image tokens are calculated, see Visual understanding.

International (Singapore)

Model

Version

Mode

Context window

Max input

Max chain-of-thought

Max output

Input cost

Output cost

(Chain-of-thought + Output)

Free quota

(Note)

(Tokens)

(Million tokens)

qwen3-vl-plus

Same capabilities as qwen3-vl-plus-2025-09-23

Stable

thinking

262,144

258,048

Maximum of 16,384 tokens per image

81,920

32,768

Tiered pricing. For more information, see the description below the table.

1 million input tokens and 1 million output tokens

Valid for 90 days after you activate Alibaba Cloud Model Studio.

non-thinking

260,096

Maximum of 16,384 tokens per image

-

qwen3-vl-plus-2025-09-23

Snapshot

thinking

258,048

Maximum of 16,384 tokens per image

81,920

non-thinking

260,096

Maximum of 16,384 tokens per image

-

qwen3-vl-flash

Same capabilities as qwen3-vl-flash-2025-10-15

Stable

thinking

258,048

Maximum of 16,384 tokens per image

81,920

non-thinking

260,096

Maximum of 16,384 tokens per image

-

qwen3-vl-flash-2025-10-15

Snapshot

thinking

258,048

Maximum of 16,384 tokens per image

81,920

non-thinking

260,096

Maximum of 16,384 tokens per image

-

The models listed above use tiered pricing based on the number of input tokens per request. The input and output prices are the same for both thinking and non-thinking modes.

qwen3-vl-plus series

Input tokens per request

Input price (Million tokens)

Output price (Million tokens)

0 < Tokens ≤ 32K

$0.20

$1.60

32K < Tokens ≤ 128K

$0.30

$2.40

128K < Tokens ≤ 256K

$0.60

$4.80

qwen3-vl-flash series

Input tokens per request

Input price (Million tokens)

Output price (Million tokens)

0 < Tokens ≤ 32K

$0.05

$0.40

32K < Tokens ≤ 128K

$0.075

$0.60

128K < Tokens ≤ 256K

$0.12

$0.96
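
The tiered pricing above can be sketched as a small lookup. This is an illustrative estimate only, under two stated assumptions: "K" is taken to mean 1,024 tokens (the 262,144-token context window equals 256 × 1,024), and the whole request is billed at the rate of the tier that its input token count falls into. The names `tiered_cost`, `QWEN3_VL_PLUS_TIERS`, and `QWEN3_VL_FLASH_TIERS` are hypothetical.

```python
K = 1024  # assumption: "32K" in the tables means 32 * 1,024 tokens

# (upper bound of input tokens, input price, output price), USD per million tokens,
# copied from the International (Singapore) tables above.
QWEN3_VL_PLUS_TIERS = [
    (32 * K, 0.20, 1.60),
    (128 * K, 0.30, 2.40),
    (256 * K, 0.60, 4.80),
]
QWEN3_VL_FLASH_TIERS = [
    (32 * K, 0.05, 0.40),
    (128 * K, 0.075, 0.60),
    (256 * K, 0.12, 0.96),
]


def tiered_cost(input_tokens, output_tokens, tiers):
    """Estimate the cost in USD of one request under tiered pricing."""
    if input_tokens <= 0:
        raise ValueError("input_tokens must be positive")
    for upper, input_price, output_price in tiers:
        if input_tokens <= upper:
            # The tier is chosen by input tokens and sets both prices.
            return (input_tokens * input_price
                    + output_tokens * output_price) / 1_000_000
    raise ValueError("input_tokens exceeds the largest pricing tier")


# A 100,000-token input falls into the 32K-128K tier.
print(round(tiered_cost(100_000, 2_000, QWEN3_VL_PLUS_TIERS), 6))
```

Because thinking and non-thinking modes share the same prices, the sketch takes no mode argument.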

More models

Qwen-VL-Max

Qwen-VL-Max performs better than Qwen-VL-Plus. The following models belong to the Qwen2.5-VL series.

Model

Version

Context window

Max input

Max output

Input cost

Output cost

Free quota

(Note)

(Tokens)

(Million tokens)

qwen-vl-max

Offers improved visual reasoning and instruction-following capabilities compared to qwen-vl-plus. Provides optimal performance for complex tasks.
Same capabilities as qwen-vl-max-2025-08-13.

Stable

131,072

129,024

Maximum of 16,384 tokens per image.

8,192

$0.8

Half price for batch calls

$3.2

Half price for batch calls

1 million input tokens and 1 million output tokens.

Valid for 90 days after you activate Alibaba Cloud Model Studio.

qwen-vl-max-latest

Always provides the same capabilities as the latest snapshot version.

Latest

$0.8

$3.2

qwen-vl-max-2025-08-13

Also known as qwen-vl-max-0813.
Provides comprehensive improvements in visual understanding, with significantly enhanced math, reasoning, object detection, and multilingual processing capabilities.

Snapshot

qwen-vl-max-2025-04-08

Also known as qwen-vl-max-0408.
A Qwen2.5-VL series model with an expanded 128k context window and significantly enhanced math and reasoning capabilities.

Qwen-VL-Plus

Qwen-VL-Plus offers a balance between performance and cost. The following models belong to the Qwen2.5-VL series.

Model

Version

Context window

Max input

Max output

Input cost

Output cost

Free quota

(Note)

(Tokens)

(Million tokens)

qwen-vl-plus

Same capabilities as qwen-vl-plus-2025-08-15.

Stable

131,072

129,024

Maximum of 16,384 per image.

8,192

$0.21

Half-price for batch calls

$0.63

Half-price for batch calls

1 million input tokens and 1 million output tokens

Valid for 90 days after you activate Alibaba Cloud Model Studio.

qwen-vl-plus-latest

Always has the same capabilities as the latest snapshot version.

Latest

$0.21

$0.63

qwen-vl-plus-2025-08-15

Also known as qwen-vl-plus-0815.
Features significant improvements in object detection, localization, and multilingual processing.

Snapshot

qwen-vl-plus-2025-05-07

Also known as qwen-vl-plus-0507.
Features significantly improved understanding of math, reasoning, and content from surveillance videos.

qwen-vl-plus-2025-01-25

Also known as qwen-vl-plus-0125.
A Qwen2.5-VL series model with an expanded 128k context window and significantly enhanced image and video understanding.

Mainland China (Beijing)

Model

Version

Mode

Context window

Max input

Max chain-of-thought

Max output

Input cost

Output cost

Free quota

(Note)

(Tokens)

(Million tokens)

qwen3-vl-plus

Same capabilities as qwen3-vl-plus-2025-09-23

Stable

thinking

262,144

258,048

Maximum of 16,384 tokens per image

81,920

32,768

Tiered pricing. For more information, see the description below the table.

No free quota

non-thinking

260,096

Maximum of 16,384 tokens per image

-

qwen3-vl-plus-2025-09-23

Snapshot

thinking

258,048

Maximum of 16,384 tokens per image

81,920

non-thinking

260,096

Maximum of 16,384 tokens per image

-

qwen3-vl-flash

Same capabilities as qwen3-vl-flash-2025-10-15

Stable

thinking

258,048

Maximum of 16,384 tokens per image

81,920

non-thinking

260,096

Maximum of 16,384 tokens per image

-

qwen3-vl-flash-2025-10-15

Snapshot

thinking

258,048

Maximum of 16,384 tokens per image

81,920

non-thinking

260,096

Maximum of 16,384 tokens per image

-

The models listed above use tiered pricing based on the number of input tokens per request. The input and output prices are the same for both thinking and non-thinking modes.

qwen3-vl-plus series

Input tokens per request

Input price (Million tokens)

Output price (Million tokens)

0 < Tokens ≤ 32K

$0.143353

$1.433525

32K < Tokens ≤ 128K

$0.215029

$2.150288

128K < Tokens ≤ 256K

$0.430058

$4.300576

qwen3-vl-flash series

Input tokens per request

Input price (Million tokens)

Output price (Million tokens)

0 < Tokens ≤ 32K

$0.022

$0.215

32K < Tokens ≤ 128K

$0.043

$0.43

128K < Tokens ≤ 256K

$0.086

$0.859

More models

Qwen-VL-Max series

Model

Version

Context window

Max input

Max output

Input cost

Output cost

(Tokens)

(Million tokens)

qwen-vl-max

Offers improved visual reasoning and instruction-following capabilities compared to qwen-vl-plus and delivers optimal performance for complex tasks.
Same capabilities as qwen-vl-max-2025-08-13.

Stable

131,072

129,024

Maximum of 16,384 per image

8,192

$0.23

$0.574

qwen-vl-max-latest

Provides the same capabilities as the latest snapshot version.

Latest

qwen-vl-max-2025-08-13

Also known as qwen-vl-max-0813.
Features comprehensive improvements in visual understanding and significantly enhanced capabilities for math, reasoning, object detection, and multilingual processing.

Snapshot

qwen-vl-max-2025-04-08

Also known as qwen-vl-max-0408.
Enhanced math and reasoning capabilities.

$0.431

$1.291

qwen-vl-max-2025-04-02

Also known as qwen-vl-max-0402.
Significantly improves accuracy in solving complex math problems.

qwen-vl-max-2025-01-25

Also known as qwen-vl-max-0125.

A Qwen2.5-VL series model with an expanded 128K context window and significantly enhanced image and video understanding.

qwen-vl-max-2024-12-30

Also known as qwen-vl-max-1230.

32,768

30,720

Maximum of 16,384 per image

2,048

$0.431

$1.291

qwen-vl-max-2024-11-19

Also known as qwen-vl-max-1119.

qwen-vl-max-2024-10-30

Also known as qwen-vl-max-1030.

$2.868

qwen-vl-max-2024-08-09

Also known as qwen-vl-max-0809.

Qwen-VL-Plus series

Model

Version

Context window

Max input

Max output

Input cost

Output cost

(Tokens)

(Million tokens)

qwen-vl-plus

Offers the same capabilities as qwen-vl-plus-2025-08-15.

Stable

131,072

129,024

Maximum of 16,384 tokens per image.

8,192

$0.115

$0.287

qwen-vl-plus-latest

Always has the same capabilities as the latest snapshot version.

Latest

qwen-vl-plus-2025-08-15

Also known as qwen-vl-plus-0815.
Provides significant improvements in object detection, localization, and multilingual processing.

Snapshot

qwen-vl-plus-2025-07-10

Also known as qwen-vl-plus-0710.
Further improves the understanding of content from surveillance videos.

32,768

30,720

Maximum of 16,384 tokens per image.

$0.022

$0.216

qwen-vl-plus-2025-05-07

Also known as qwen-vl-plus-0507.
Significantly improves the understanding of math, reasoning, and content from surveillance videos.

131,072

129,024

Maximum of 16,384 tokens per image.

$0.216

$0.646

qwen-vl-plus-2025-01-25

Also known as qwen-vl-plus-0125.

A Qwen2.5-VL series model with an expanded 128K context window and significantly enhanced image and video understanding.

qwen-vl-plus-2025-01-02

Also known as qwen-vl-plus-0102.

32,768

30,720

Maximum of 16,384 tokens per image.

2,048

qwen-vl-plus-2024-08-09

Also known as qwen-vl-plus-0809.

Qwen-OCR

The Qwen-OCR model is designed for text extraction. Compared to the Qwen-VL model, it specializes in extracting text from images of documents, tables, exam papers, and handwriting. It can recognize multiple languages, such as English, French, Japanese, Korean, German, Russian, and Italian. Usage | API reference | Try it online

International (Singapore)

Model

Version

Context window

Max input

Max output

Unit price

Free quota

(Note)

(tokens)

(Million tokens)

qwen-vl-ocr

Stable

34,096

30,000

Maximum of 30,000 tokens for a single image.

4,096

$0.72

1 million input tokens and 1 million output tokens

Valid for 90 days after you activate Alibaba Cloud Model Studio.

Mainland China (Beijing)

Model

Version

Context window

Max input

Max output

Input/output unit price

(Tokens)

(Million tokens)

qwen-vl-ocr

Offers the same capabilities as qwen-vl-ocr-2025-04-13.

Stable

34,096

30,000

Maximum of 30,000 for a single image.

4,096

$0.717

qwen-vl-ocr-latest

Offers the same capabilities as the latest snapshot version.

Latest

qwen-vl-ocr-2025-04-13

Also known as qwen-vl-ocr-0413.
Significantly improves text recognition and includes six built-in OCR tasks and features, such as custom prompts and image rotation correction.

Snapshot

qwen-vl-ocr-2024-10-28

Also known as qwen-vl-ocr-1028.

Snapshot

Qwen-Math

Qwen-Math is a language model designed for mathematical problem-solving. Usage | API reference | Try it online

Note

This model is available only in the China (Beijing) region.

Model

Version

Context window

Max input

Max output

Input cost

Output cost

(Tokens)

(Million tokens)

qwen-math-plus

Same capabilities as qwen-math-plus-2024-09-19.

Stable

4,096

3,072

3,072

$0.574

$1.721

qwen-math-plus-latest

Same capabilities as the latest snapshot.

Latest

qwen-math-plus-2024-09-19

Also known as qwen-math-plus-0919.

Snapshot

qwen-math-plus-2024-08-16

Also known as qwen-math-plus-0816.

qwen-math-turbo

Same capabilities as qwen-math-turbo-2024-09-19.

Stable

$0.287

$0.861

qwen-math-turbo-latest

Same capabilities as the latest snapshot.

Latest

qwen-math-turbo-2024-09-19

Also known as qwen-math-turbo-0919.

Snapshot

Qwen-Coder

The latest Qwen3-Coder-Plus series models are Qwen code generation models built on Qwen3. They are powerful coding agents that excel at tool calling and environment interaction. These models can program autonomously and provide excellent coding and general-purpose capabilities. Usage | API reference | Try it online

International (Singapore)

Model

Version

Context window

Max input

Max output

Input cost (Million tokens)

Output cost (Million tokens)

Free quota

(Note)

Tokens

Per million tokens

qwen3-coder-plus

Currently equivalent to qwen3-coder-plus-2025-07-22

Stable

1,000,000

997,952

65,536

Tiered pricing. See the description below the table.

1 million input tokens and 1 million output tokens

Valid for 90 days after you activate Alibaba Cloud Model Studio

qwen3-coder-plus-2025-09-23

Snapshot

qwen3-coder-plus-2025-07-22

Snapshot

qwen3-coder-flash

Currently equivalent to qwen3-coder-flash-2025-07-28

Stable

qwen3-coder-flash-2025-07-28

Snapshot

These models use tiered billing based on the number of input tokens per request.

qwen3-coder-plus series

The prices for qwen3-coder-plus, qwen3-coder-plus-2025-09-23, and qwen3-coder-plus-2025-07-22 are as follows. The qwen3-coder-plus model supports context cache. Input text that hits the implicit cache is billed at 20% of the unit price. Input text that hits the explicit cache is billed at 10% of the unit price.

Input tokens per request

Input cost (Million tokens)

Output cost (Million tokens)

0 < Tokens ≤ 32K

$1

$5

32K < Tokens ≤ 128K

$1.8

$9

128K < Tokens ≤ 256K

$3

$15

256K < Tokens ≤ 1M

$6

$60

qwen3-coder-flash series

The prices for qwen3-coder-flash and qwen3-coder-flash-2025-07-28 are as follows. The qwen3-coder-flash model supports context cache. Input text that hits the implicit cache is billed at 20% of the unit price. Input text that hits the explicit cache is billed at 10% of the unit price.

Input tokens per request

Input cost (Million tokens)

Output cost (Million tokens)

0 < Tokens ≤ 32K

$0.3

$1.5

32K < Tokens ≤ 128K

$0.5

$2.5

128K < Tokens ≤ 256K

$0.8

$4

256K < Tokens ≤ 1M

$1.6

$9.6
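
The interaction between tiered pricing and context cache discounts can be sketched as follows for the qwen3-coder-plus series. This is a rough estimate under stated assumptions, not an official formula: "K" is taken as 1,024 tokens, the tier is selected by the total input token count (cached and uncached together), and only cache hits are modeled. The function name `coder_plus_cost` and its parameters are hypothetical.

```python
K = 1024  # assumption: "32K" in the table means 32 * 1,024 tokens

# (upper bound of input tokens, input price, output price), USD per million tokens,
# copied from the International (Singapore) qwen3-coder-plus table above.
QWEN3_CODER_PLUS_TIERS = [
    (32 * K, 1.0, 5.0),
    (128 * K, 1.8, 9.0),
    (256 * K, 3.0, 15.0),
    (1_000_000, 6.0, 60.0),
]
IMPLICIT_HIT_RATE = 0.20  # implicit cache hits are billed at 20% of the input price
EXPLICIT_HIT_RATE = 0.10  # explicit cache hits are billed at 10% of the input price


def coder_plus_cost(input_tokens, output_tokens, implicit_hit=0, explicit_hit=0):
    """Estimate the cost in USD of one request, including cache-hit discounts."""
    if implicit_hit + explicit_hit > input_tokens:
        raise ValueError("cached tokens cannot exceed input tokens")
    for upper, input_price, output_price in QWEN3_CODER_PLUS_TIERS:
        if 0 < input_tokens <= upper:
            break
    else:
        raise ValueError("input_tokens is outside the pricing tiers")

    uncached = input_tokens - implicit_hit - explicit_hit
    # Convert cached tokens into their full-price equivalent.
    billable_input = (uncached
                      + implicit_hit * IMPLICIT_HIT_RATE
                      + explicit_hit * EXPLICIT_HIT_RATE)
    return (billable_input * input_price
            + output_tokens * output_price) / 1_000_000


# 20,000 input tokens, half of them served from the implicit cache.
print(round(coder_plus_cost(20_000, 1_000, implicit_hit=10_000), 6))
```

The same structure applies to the qwen3-coder-flash series if the tier table is swapped for its prices.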

Mainland China (Beijing)

Model

Version

Context window

Max input

Max output

Input cost

Output cost

(Tokens)

(Million tokens)

qwen3-coder-plus

Provides the same functionality as qwen3-coder-plus-2025-07-22.

Stable

1,000,000

997,952

65,536

Tiered pricing. See the description below the table.

qwen3-coder-plus-2025-09-23

Snapshot

qwen3-coder-plus-2025-07-22

Snapshot

qwen3-coder-flash

Currently an alias for qwen3-coder-flash-2025-07-28

Stable

qwen3-coder-flash-2025-07-28

Snapshot

These models use tiered billing based on the number of input tokens per request.

qwen3-coder-plus series

The prices for qwen3-coder-plus, qwen3-coder-plus-2025-09-23, and qwen3-coder-plus-2025-07-22 are as follows. The qwen3-coder-plus model supports context cache. Input text that hits the implicit cache is billed at 20% of the unit price. Input text that hits the explicit cache is billed at 10% of the unit price.

Input tokens per request

Input cost (Million tokens)

Output cost (Million tokens)

0 < Tokens ≤ 32K

$0.574

$2.294

32K < Tokens ≤ 128K

$0.861

$3.441

128K < Tokens ≤ 256K

$1.434

$5.735

256K < Tokens ≤ 1M

$2.868

$28.671

qwen3-coder-flash series

The prices for qwen3-coder-flash and qwen3-coder-flash-2025-07-28 are as follows. The qwen3-coder-flash model supports context cache. Input text that hits the implicit cache is billed at 20% of the unit price. Input text that hits the explicit cache is billed at 10% of the unit price.

Input tokens per request

Input cost (Million tokens)

Output cost (Million tokens)

0 < Tokens ≤ 32K

$0.144

$0.574

32K < Tokens ≤ 128K

$0.216

$0.861

128K < Tokens ≤ 256K

$0.359

$1.434

256K < Tokens ≤ 1M

$0.717

$3.584

More models

Model

Version

Context window

Max input

Max output

Input cost

Output cost

(Tokens)

(Million tokens)

qwen-coder-plus

Same functionality as qwen-coder-plus-2024-11-06

Stable

131,072

129,024

8,192

$0.502

$1.004

qwen-coder-plus-latest

Same functionality as the latest snapshot version of qwen-coder-plus

Latest

qwen-coder-plus-2024-11-06

Also known as qwen-coder-plus-1106

Snapshot

qwen-coder-turbo

Same functionality as qwen-coder-turbo-2024-09-19

Stable

131,072

129,024

8,192

$0.287

$0.861

qwen-coder-turbo-latest

Same functionality as the latest snapshot version of qwen-coder-turbo

Latest

qwen-coder-turbo-2024-09-19

Also known as qwen-coder-turbo-0919

Snapshot

Qwen-MT

Qwen-MT is a flagship translation model that has been comprehensively upgraded based on Qwen3. It supports translation between 92 languages, including Chinese, English, Japanese, Korean, French, Spanish, German, Thai, Indonesian, Vietnamese, and Arabic. Translation quality and overall performance are significantly improved, with enhanced support for custom glossaries, format retention, and domain-specific prompts for more accurate and natural translations. Usage

International (Singapore)

Model

Context window

Max input

Max output

Input cost

Output cost

Free quota

View rules

(Tokens)

(Million tokens)

qwen-mt-plus

Part of Qwen3-MT

16,384

8,192

8,192

$2.46

$7.37

1 million tokens per model

Expires 90 days after activating Alibaba Cloud Model Studio.

qwen-mt-flash

Part of Qwen3-MT

$0.16

$0.49

qwen-mt-turbo

Part of Qwen3-MT

$0.16

$0.49

Mainland China (Beijing)

Model

Context window

Max input

Max output

Input cost

Output cost

(Tokens)

(Million tokens)

qwen-mt-plus

Part of Qwen3-MT

16,384

8,192

8,192

$0.259

$0.775

qwen-mt-flash

Part of Qwen3-MT

$0.101

$0.280

qwen-mt-turbo

Part of Qwen3-MT

$0.101

$0.280

Qwen data mining model

The Qwen data mining model extracts structured information from documents for use in domains such as data annotation and content moderation. Usage | API reference

Note

Available only in the China (Beijing) region.

Model

Context window

Max input

Max output

Input cost

Output cost

Free quota

(Tokens)

(Million tokens)

qwen-doc-turbo

131,072

129,024

8,192

$0.087

$0.144

No free quota

Qwen deep research model

The Qwen deep research model breaks down complex problems, performs inference and analysis using web search, and generates research reports. Usage | API reference

Note

Available only in the China (Beijing) region.

Model

Context window

Max input

Max output

Input cost

Output cost

(Tokens)

(Per 1,000 tokens)

qwen-deep-research

1,000,000

997,952

32,768

$0.007742

$0.023367
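
Unlike most tables on this page, the qwen-deep-research prices above are quoted per 1,000 tokens. The short sketch below converts them to the per-million unit used elsewhere and estimates the cost of a request. The helper names `per_million` and `deep_research_cost` are hypothetical.

```python
# USD per 1,000 tokens, copied from the qwen-deep-research table above.
INPUT_PRICE_PER_1K = 0.007742
OUTPUT_PRICE_PER_1K = 0.023367


def per_million(price_per_1k):
    """Convert a per-1,000-token price to a per-million-token price."""
    return price_per_1k * 1000


def deep_research_cost(input_tokens, output_tokens):
    """Estimate the cost in USD of one request."""
    return (input_tokens * INPUT_PRICE_PER_1K
            + output_tokens * OUTPUT_PRICE_PER_1K) / 1000


print(round(per_million(INPUT_PRICE_PER_1K), 3))
print(round(per_million(OUTPUT_PRICE_PER_1K), 3))
print(round(deep_research_cost(50_000, 10_000), 5))
```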

Text generation - Qwen open-source versions

  • In the model names, xxb indicates the parameter size. For example, qwen2-72b-instruct indicates a parameter size of 72 billion (72B).

  • Alibaba Cloud Model Studio supports invoking the open-source versions of Qwen. You do not need to deploy the models locally. For open-source versions, we recommend using the Qwen3 and Qwen2.5 models.

Qwen3

qwen3-next-80b-a3b-thinking, released in September 2025, supports only thinking mode. Compared to qwen3-235b-a22b-thinking-2507, it offers improved instruction-following capabilities and more concise summaries.

qwen3-next-80b-a3b-instruct, released in September 2025, supports only non-thinking mode. It offers enhanced Chinese comprehension, logical reasoning, and text generation capabilities compared to qwen3-235b-a22b-instruct-2507.

The qwen3-235b-a22b-thinking-2507 and qwen3-30b-a3b-thinking-2507 models, released in July 2025, support only thinking mode. They are upgraded versions of qwen3-235b-a22b (thinking mode) and qwen3-30b-a3b (thinking mode).

The qwen3-235b-a22b-instruct-2507 and qwen3-30b-a3b-instruct-2507 models, released in July 2025, support only non-thinking mode. They are upgraded versions of qwen3-235b-a22b (non-thinking mode) and qwen3-30b-a3b (non-thinking mode).

The Qwen3 models, released in April 2025, support both thinking and non-thinking modes. You can switch between the modes using the enable_thinking parameter. The Qwen3 models also feature significant capability enhancements:

  1. Reasoning capabilities: In evaluations for math, code, and logical reasoning, the models significantly outperform QwQ and non-reasoning models of a similar scale, placing them among the top performers in the industry for their size.

  2. Human preference alignment: The models show major improvements in creative writing, role-playing, multi-turn conversation, and instruction following. Their general capabilities are significantly better than those of other models of a similar scale.

  3. Agent capabilities: The models deliver industry-leading performance in both thinking and non-thinking modes and can perform precise external tool calling.

  4. Multilingual capabilities: The models support over 100 languages and dialects. They show significant improvements in multilingual translation, instruction comprehension, and common-sense reasoning.

    Supported languages

    English

    Simplified Chinese

    Traditional Chinese

    French

    Spanish

    Arabic. Uses the Arabic script and is the official language of many Arab countries.

    Russian. Uses the Cyrillic script and is an official language in Russia and several other countries.

    Portuguese. Uses the Latin script and is the official language of Portugal, Brazil, and other Portuguese-speaking countries.

    German. Uses the Latin script and is an official language in countries such as Germany and Austria.

    Italian. Uses the Latin script and is an official language of Italy, San Marino, and parts of Switzerland.

    Dutch. Uses the Latin script and is an official language of the Netherlands, parts of Belgium (Flanders), and Suriname.

    Danish. Uses the Latin script and is the official language of Denmark.

    Irish. Uses the Latin script and is one of the official languages of Ireland.

    Welsh. Uses the Latin script and is one of the official languages of Wales.

    Finnish. Uses the Latin script and is the official language of Finland.

    Icelandic. Uses the Latin script and is the official language of Iceland.

    Swedish. Uses the Latin script and is the official language of Sweden.

    Norwegian Nynorsk. Uses the Latin script and is one of the two official written standards for the Norwegian language, used alongside Norwegian Bokmål.

    Norwegian Bokmål. Uses the Latin script and is the more widely used of the two official written standards for the Norwegian language.

    Japanese. Uses Japanese characters and is the official language of Japan.

    Korean. Uses Hangul and is the official language of South Korea and North Korea.

    Vietnamese. Uses the Latin script and is the official language of Vietnam.

    Thai. Uses the Thai script and is the official language of Thailand.

    Indonesian. Uses the Latin script and is the official language of Indonesia.

    Malay. Uses the Latin script and is a major language in Malaysia and several other countries.

    Burmese. Uses the Myanmar script and is the official language of Myanmar.

    Tagalog. Uses the Latin script and is one of the major languages of the Philippines.

    Khmer. Uses the Khmer script and is the official language of Cambodia.

    Lao. Uses the Lao script and is the official language of Laos.

    Hindi. Uses the Devanagari script and is one of the official languages of India.

    Bengali. Uses the Bengali script and is the official language of Bangladesh and the Indian state of West Bengal.

    Urdu. Uses the Arabic script, is an official language of Pakistan, and is also spoken in India.

    Nepali. Uses the Devanagari script and is the official language of Nepal.

    Hebrew. Uses the Hebrew script and is the official language of Israel.

    Turkish. Uses the Latin script and is the official language of Türkiye and Northern Cyprus.

    Persian. Uses the Arabic script and is an official language in countries such as Iran and Tajikistan.

    Polish. Uses the Latin script and is the official language of Poland.

    Ukrainian. Uses the Cyrillic script and is the official language of Ukraine.

    Czech. Uses the Latin script and is the official language of the Czech Republic.

    Romanian. Uses the Latin script and is the official language of Romania and Moldova.

    Bulgarian. Uses the Cyrillic script and is the official language of Bulgaria.

    Slovak. Uses the Latin script and is the official language of Slovakia.

    Hungarian. Uses the Latin script and is the official language of Hungary.

    Slovenian. Uses the Latin script and is the official language of Slovenia.

    Latvian. Uses the Latin script and is the official language of Latvia.

    Estonian. Uses the Latin script and is the official language of Estonia.

    Lithuanian. Uses the Latin script and is the official language of Lithuania.

    Belarusian. Uses the Cyrillic script and is one of the official languages of Belarus.

    Greek. Uses the Greek alphabet and is the official language of Greece and Cyprus.

    Croatian. Uses the Latin script and is the official language of Croatia.

    Macedonian. Uses the Cyrillic script and is the official language of North Macedonia.

    Maltese. Uses the Latin script and is an official language of Malta.

    Serbian. Uses the Cyrillic script and is the official language of Serbia.

    Bosnian. Uses the Latin script and is one of the official languages of Bosnia and Herzegovina.

    Georgian. Uses the Georgian script and is the official language of Georgia.

    Armenian. Uses the Armenian alphabet and is the official language of Armenia.

    North Azerbaijani. Uses the Latin script and is the official language of Azerbaijan.

    Kazakh. Uses the Cyrillic script and is an official language of Kazakhstan.

    Northern Uzbek. Uses the Latin script and is the official language of Uzbekistan.

    Tajik. Uses the Cyrillic script and is the official language of Tajikistan.

    Swahili. Uses the Latin script and is a lingua franca or an official language in many East African countries.

    Afrikaans. Uses the Latin script and is primarily spoken in South Africa and Namibia.

    Cantonese. Uses Traditional Chinese characters and is the primary language spoken in Guangdong Province, Hong Kong, and Macao.

    Luxembourgish. Uses the Latin script and is an official language of Luxembourg. It is also spoken in parts of Germany.

    Limburgish. Uses the Latin script and is primarily spoken in the Netherlands, Belgium, and parts of Germany.

    Catalan. Uses the Latin script and is spoken in Catalonia and other parts of Spain.

    Galician. Uses the Latin script and is primarily spoken in the Galicia region of Spain.

    Asturian. Uses the Latin script and is primarily spoken in the Asturias region of Spain.

    Basque. Uses the Latin script. It is an official language of the Basque Autonomous Community in Spain and is primarily spoken throughout the Basque Country of Spain and France.

    Occitan. Uses the Latin script and is primarily spoken in the southern regions of France.

    Venetian. Uses the Latin script and is primarily spoken in the Veneto region of Italy.

    Sardinian. Uses the Latin script and is primarily spoken on the island of Sardinia in Italy.

    Sicilian. Uses the Latin script and is primarily spoken on the island of Sicily in Italy.

    Friulian. Uses the Latin script and is primarily spoken in Friuli-Venezia Giulia, Italy.

    Lombard. Uses the Latin script and is primarily spoken in the Lombardy region of Italy.

    Ligurian. Uses the Latin script and is primarily spoken in the Liguria region of Italy.

    Faroese. Uses the Latin script and is the official language of the Faroe Islands.

    Tosk Albanian. Uses the Latin script and is the southern dialect of Albanian.

    Silesian. Uses the Latin script and is primarily spoken in Poland.

    Bashkir. Uses the Cyrillic script and is primarily spoken in Bashkortostan, Russia.

    Tatar. Uses the Cyrillic script and is primarily spoken in Tatarstan, Russia.

    Mesopotamian Arabic. Uses the Arabic script and is primarily spoken in Iraq.

    Najdi Arabic. Uses the Arabic script and is primarily spoken in the Najd region of Saudi Arabia.

    Egyptian Arabic. Uses the Arabic script and is primarily spoken in Egypt.

    Levantine Arabic. Uses the Arabic script and is primarily spoken in Syria and Lebanon.

    Ta'izzi-Adeni Arabic. Uses the Arabic script and is primarily spoken in Yemen.

    Dari. Uses the Arabic script and is one of the official languages of Afghanistan.

    Tunisian Arabic. Uses the Arabic script and is primarily spoken in Tunisia.

    Moroccan Arabic. Uses the Arabic script and is primarily spoken in Morocco.

    Kabuverdianu. Uses the Latin script and is primarily spoken in Cape Verde.

    Tok Pisin. Uses the Latin script and is a major lingua franca in Papua New Guinea.

    Eastern Yiddish. Uses the Hebrew script and is primarily spoken in Jewish communities.

    Sindhi. Uses the Arabic script and is an official language of the Sindh province in Pakistan.

    Sinhala. Uses the Sinhala script and is one of the official languages of Sri Lanka.

    Telugu. Uses the Telugu script and is an official language of the Indian states of Andhra Pradesh and Telangana.

    Punjabi. Uses the Gurmukhi script, is spoken in the Indian state of Punjab, and is an official language of India.

    Tamil. Uses the Tamil script and is an official language of the Indian state of Tamil Nadu and Sri Lanka.

    Gujarati. Uses the Gujarati script and is the official language of the Indian state of Gujarat.

    Malayalam. Uses the Malayalam script and is an official language of the Indian state of Kerala.

    Marathi. Uses the Devanagari script and is the official language of the Indian state of Maharashtra.

    Kannada. Uses the Kannada script and is the official language of the Indian state of Karnataka.

    Magahi. Uses the Devanagari script and is primarily spoken in the Indian state of Bihar.

    Oriya. Uses the Odia script and is one of the official languages of the Indian state of Odisha.

    Awadhi. Uses the Devanagari script and is primarily spoken in the Indian state of Uttar Pradesh.

    Maithili. Uses the Devanagari script, is spoken in the Indian state of Bihar and the Terai plains of Nepal, and is an official language of India.

    Assamese. Uses the Bengali script and is an official language of the Indian state of Assam.

    Chhattisgarhi. Uses the Devanagari script and is primarily spoken in the Indian state of Chhattisgarh.

    Bhojpuri. Uses the Devanagari script and is spoken in parts of India and Nepal.

    Minangkabau. Uses the Latin script and is primarily spoken on the island of Sumatra in Indonesia.

    Balinese. Uses the Latin script and is primarily spoken on the island of Bali in Indonesia.

    Javanese. Uses the Latin script, although the Javanese script is also traditionally used. It is widely spoken on the island of Java in Indonesia.

    Banjar. Uses the Latin script and is primarily spoken on the island of Kalimantan in Indonesia.

    Sundanese. Uses the Latin script, although the Sundanese script is also traditionally used. It is primarily spoken in the western part of the island of Java in Indonesia.

    Cebuano. Uses the Latin script and is primarily spoken in the Cebu region of the Philippines.

    Pangasinan. Uses the Latin script and is primarily spoken in the Pangasinan province of the Philippines.

    Iloko. Uses the Latin script and is primarily spoken in the Philippines.

    Waray (Philippines). Uses the Latin script and is primarily spoken in the Philippines.

    Haitian Creole. Uses the Latin script and is one of the official languages of Haiti.

    Papiamento. Uses the Latin script and is primarily spoken in Caribbean regions such as Aruba and Curaçao.

  5. Response format fixes: This update fixes response format issues from previous versions, such as incorrect Markdown, truncated responses, and incorrect boxed output.

The open-source Qwen3 models released in April 2025 do not support non-streaming output in thinking mode.
If an open-source Qwen3 model is in thinking mode but does not output a thought process, it is billed at the non-thinking mode rate.

Thinking mode | Non-thinking mode | Usage

International (Singapore)

Model

Mode

Context window

Max input

Max chain-of-thought

Max response

Input cost

Output cost

Free quota

(Note)

(Tokens)

(Million tokens)

qwen3-next-80b-a3b-thinking

Thinking only

131,072

126,976

81,920

32,768

$0.15

$1.2

1 million tokens

Valid for 90 days after you activate Alibaba Cloud Model Studio

qwen3-next-80b-a3b-instruct

Non-thinking only

129,024

-

qwen3-235b-a22b-thinking-2507

Thinking only

126,976

81,920

$0.23

$2.3

qwen3-235b-a22b-instruct-2507

Non-thinking only

129,024

-

$0.92

qwen3-30b-a3b-thinking-2507

Thinking only

126,976

81,920

$0.2

$2.4

qwen3-30b-a3b-instruct-2507

Non-thinking only

129,024

-

$0.8

qwen3-235b-a22b

This model and the following models were released in April 2025.

Non-thinking mode

129,024

-

16,384

$0.7

$2.8

Thinking mode

98,304

38,912

$8.4

qwen3-32b

Non-thinking mode

129,024

-

$0.16

$0.64

Thinking mode

98,304

38,912

qwen3-30b-a3b

Non-thinking mode

129,024

-

$0.2

$0.8

Thinking mode

98,304

38,912

$2.4

qwen3-14b

Non-thinking mode

129,024

-

8,192

$0.35

$1.4

Thinking mode

98,304

38,912

$4.2

qwen3-8b

Non-thinking mode

129,024

-

$0.18

$0.7

Thinking mode

98,304

38,912

$2.1

qwen3-4b

Non-thinking mode

129,024

-

$0.11

$0.42

Thinking mode

98,304

38,912

$1.26

qwen3-1.7b

Non-thinking mode

32,768

30,720

-

$0.42

Thinking mode

28,672

The sum of input and chain-of-thought tokens cannot exceed 30,720.

$1.26

qwen3-0.6b

Non-thinking mode

30,720

-

$0.42

Thinking mode

28,672

The sum of input and chain-of-thought tokens cannot exceed 30,720.

$1.26
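As a rough illustration of the pricing above, the following sketch estimates the cost of one qwen3-235b-a22b request in the International (Singapore) region. The prices are copied from the table. The function name, the `produced_thought` flag, and the assumption that `output_tokens` already includes any chain-of-thought tokens are illustrative and are not part of the Model Studio API.

```python
# Prices in USD per million tokens for qwen3-235b-a22b, copied from the
# International (Singapore) table. The dictionary layout is illustrative.
PRICES = {
    "input": 0.7,
    "output_non_thinking": 2.8,
    "output_thinking": 8.4,
}


def estimate_cost(input_tokens, output_tokens, thinking_mode, produced_thought=True):
    """Estimate the cost of one request in USD.

    Assumption: output_tokens already includes any chain-of-thought tokens.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    # Thinking mode without a thought process is billed at the non-thinking rate.
    if thinking_mode and produced_thought:
        output_rate = PRICES["output_thinking"]
    else:
        output_rate = PRICES["output_non_thinking"]
    return (input_tokens * PRICES["input"] + output_tokens * output_rate) / 1_000_000


if __name__ == "__main__":
    print(round(estimate_cost(10_000, 2_000, thinking_mode=True), 6))
    print(round(estimate_cost(10_000, 2_000, thinking_mode=False), 6))
```

The same function can be reused for other models in the table by swapping the three prices.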

Mainland China (Beijing)

Model

Mode

Context window

Max input

Max chain-of-thought

Max response

Input cost

Output cost

(Tokens)

(Million tokens)

qwen3-next-80b-a3b-thinking

Thinking only

131,072

126,976

81,920

32,768

$0.144

$1.434

qwen3-next-80b-a3b-instruct

Non-thinking only

129,024

-

$0.574

qwen3-235b-a22b-thinking-2507

Thinking only

126,976

81,920

$0.287

$2.868

qwen3-235b-a22b-instruct-2507

Non-thinking only

129,024

-

$1.147

qwen3-30b-a3b-thinking-2507

Thinking only

126,976

81,920

$0.108

$1.076

qwen3-30b-a3b-instruct-2507

Non-thinking only

129,024

-

$0.431

qwen3-235b-a22b

Non-thinking

129,024

-

16,384

$0.287

$1.147

Thinking

98,304

38,912

$2.868

qwen3-32b

Non-thinking

129,024

-

$0.287

$1.147

Thinking

98,304

38,912

$2.868

qwen3-30b-a3b

Non-thinking

129,024

-

$0.108

$0.431

Thinking

98,304

38,912

$1.076

qwen3-14b

Non-thinking

129,024

-

8,192

$0.144

$0.574

Thinking

98,304

38,912

$1.434

qwen3-8b

Non-thinking

129,024

-

$0.072

$0.287

Thinking

98,304

38,912

$0.717

qwen3-4b

Non-thinking

129,024

-

$0.044

$0.173

Thinking

98,304

38,912

$0.431

qwen3-1.7b

Non-thinking

32,768

30,720

-

$0.173

Thinking

28,672

The sum of input and chain-of-thought tokens must not exceed 30,720.

$0.431

qwen3-0.6b

Non-thinking

30,720

-

$0.173

Thinking

28,672

The sum of input and chain-of-thought tokens must not exceed 30,720.

$0.431
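The limits for the smallest models interact: in thinking mode, qwen3-1.7b and qwen3-0.6b accept at most 28,672 input tokens, and input plus chain-of-thought tokens together must not exceed 30,720. A minimal sketch of that check follows. The function name is hypothetical, and token counts are assumed to come from your own tokenizer or from the usage data of a previous response.

```python
# Limits for qwen3-1.7b and qwen3-0.6b in thinking mode, taken from the table.
MAX_INPUT_THINKING = 28_672
MAX_INPUT_PLUS_COT = 30_720


def cot_budget(input_tokens):
    """Return how many chain-of-thought tokens remain for a given input size."""
    if input_tokens < 0:
        raise ValueError("input_tokens must be non-negative")
    if input_tokens > MAX_INPUT_THINKING:
        raise ValueError("input exceeds the thinking-mode maximum of 28,672 tokens")
    # Input and chain-of-thought share one 30,720-token budget.
    return MAX_INPUT_PLUS_COT - input_tokens


if __name__ == "__main__":
    print(cot_budget(10_000))
    print(cot_budget(28_672))
```

With the maximum input of 28,672 tokens, only 2,048 tokens remain for the chain of thought, so shorter prompts leave more room for reasoning.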

QwQ-Open-source

QwQ is a reasoning model trained on Qwen2.5-32B. Reinforcement learning has significantly improved its reasoning capabilities. Its core math and code metrics (AIME 24/25, LiveCodeBench) and some general metrics (IFEval, LiveBench) are on par with the full-power version of DeepSeek-R1, and all metrics significantly exceed those of DeepSeek-R1-Distill-Qwen-32B, which is also based on Qwen2.5-32B. Usage | API reference

Note

This feature is only available in the China (Beijing) region.

Model

Context window

Max input

Max chain-of-thought

Max output

Input price

Output price

(Tokens)

(Million tokens)

qwq-32b

131,072

98,304

32,768

8,192

$0.287

$0.861

QwQ-Preview

The qwq-32b-preview model is an experimental research model developed by the Qwen team in 2024. It focuses on enhancing AI reasoning capabilities, especially in math and programming. For more information about the limitations of the qwq-32b-preview model, see the QwQ official blog. Usage | API reference | Try it online

Note

This feature is only available in the China (Beijing) region.

Model

Context window

Max input

Max output

Input cost

Output cost

(Tokens)

(Million tokens)

qwq-32b-preview

32,768

30,720

16,384

$0.287

$0.861

Qwen2.5

Qwen2.5 is a series of Qwen large language models. For Qwen2.5, we have released a series of base language models and instruction-tuned language models with parameter sizes ranging from 7 billion to 72 billion. Qwen2.5 includes the following improvements over Qwen2:

  • It is pre-trained on our latest large-scale dataset, which contains up to 18 trillion tokens.

  • It has significantly more knowledge and greatly improved coding and math capabilities, thanks to our specialized expert models in these fields.

  • It has significant improvements in following instructions, generating long text (over 8K tokens), understanding structured data (such as tables), and generating structured output (especially JSON). It is more resilient to the diversity of system prompts, which enhances the implementation of role-play and conditional settings for chatbots.

  • It supports over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, and Arabic.

Usage | API reference | Try it online

International (Singapore)

Model

Context window

Max input

Max output

Input price

Output price

Free quota

(Tokens)

(Million tokens)

qwen2.5-14b-instruct-1m

1,008,192

1,000,000

8,192

$0.805

$3.22

1 million input and 1 million output tokens

Valid for 90 days after you activate Alibaba Cloud Model Studio.

qwen2.5-7b-instruct-1m

$0.368

$1.47

qwen2.5-72b-instruct

131,072

129,024

$1.40

$5.60

qwen2.5-32b-instruct

$0.70

$2.80

qwen2.5-14b-instruct

$0.35

$1.40

Qwen2.5-7B-Instruct

$0.175

$0.700

Mainland China (Beijing)

Model

Context window

Max input

Max output

Input cost

Output cost

(Tokens)

(Million tokens)

qwen2.5-14b-instruct-1m

1,000,000

1,000,000

8,192

$0.144

$0.431

qwen2.5-7b-instruct-1m

$0.072

$0.144

qwen2.5-72b-instruct

131,072

129,024

$0.574

$1.721

qwen2.5-32b-instruct

$0.287

$0.861

qwen2.5-14b-instruct

$0.144

$0.431

qwen2.5-7b-instruct

$0.072

$0.144

qwen2.5-3b-instruct

32,768

30,720

$0.044

$0.130

qwen2.5-1.5b-instruct

Free for a limited time

qwen2.5-0.5b-instruct

QVQ

The qvq-72b-preview model is an experimental research model developed by the Qwen team. It focuses on enhancing visual reasoning capabilities, especially in mathematical reasoning. For more information about the limitations of the qvq-72b-preview model, see the QVQ official blog. Usage | API reference

To have the model output the thinking process before the final answer, you can use the commercial version of the QVQ model.
Note

This feature is only available in the China (Beijing) region.

Model

Context window

Max input

Max output

Input cost

Output cost

(Tokens)

(Million tokens)

qvq-72b-preview

32,768

16,384

Maximum 16,384 tokens per image

16,384

$1.721

$5.161

Qwen-Omni

This is a new multimodal large model for understanding and generation, trained on Qwen2.5. It supports text, image, speech, and video inputs, and can generate text and speech simultaneously in a stream. The speed of multimodal content understanding is significantly improved. Usage | API reference

International (Singapore)

Model

Context window

Max input

Max output

Free quota

(Note)

(Tokens)

qwen2.5-omni-7b

32,768

30,720

2,048

1 million tokens (regardless of modality)

Valid for 90 days after activating Alibaba Cloud Model Studio.

After the free quota is used up, the following billing rules apply to inputs and outputs:

Input item

Price (Million tokens)

Text

$0.10

Audio

$6.76

Image/Video

$0.28

Output item

Price (Million tokens)

Text

$0.40 (if the input contains only text)

$0.84 (if the input contains images, audio, or video)

Text and audio

$13.51 (for the audio component)

The text portion of the output is not billed.
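The billing rules above depend on both the input modalities and the output type. The sketch below shows one way to combine them for qwen2.5-omni-7b in the International (Singapore) region after the free quota is used up. The prices come from the tables above. The function and parameter names are hypothetical, and the per-modality token counts are assumed to be known, for example from the usage data of a response.

```python
# USD per million tokens, from the International (Singapore) tables above.
INPUT_PRICE = {"text": 0.10, "audio": 6.76, "image_video": 0.28}
OUTPUT_TEXT_PRICE_TEXT_ONLY_INPUT = 0.40
OUTPUT_TEXT_PRICE_MULTIMODAL_INPUT = 0.84
OUTPUT_AUDIO_PRICE = 13.51


def omni_cost(input_tokens, output_text_tokens, output_audio_tokens=0):
    """Estimate the cost of one request in USD.

    input_tokens maps a modality ("text", "audio", "image_video") to a token count.
    """
    input_cost = sum(INPUT_PRICE[modality] * count
                     for modality, count in input_tokens.items())
    if output_audio_tokens > 0:
        # Text-and-audio output: only the audio component is billed.
        output_cost = OUTPUT_AUDIO_PRICE * output_audio_tokens
    else:
        multimodal_input = any(count > 0 for modality, count in input_tokens.items()
                               if modality != "text")
        rate = (OUTPUT_TEXT_PRICE_MULTIMODAL_INPUT if multimodal_input
                else OUTPUT_TEXT_PRICE_TEXT_ONLY_INPUT)
        output_cost = rate * output_text_tokens
    return (input_cost + output_cost) / 1_000_000


if __name__ == "__main__":
    print(round(omni_cost({"text": 1_000}, 500), 6))
    print(round(omni_cost({"text": 1_000, "audio": 2_000}, 500), 6))
```

Note how adding audio to the input raises the text output rate as well as the input cost.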

Mainland China (Beijing)

Model

Context window

Max input

Max output

(Tokens)

qwen2.5-omni-7b

32,768

30,720

2,048

The billing rules for inputs and outputs are as follows:

Input

Price (Million tokens)

Text

$0.087

Audio

$5.448

Image or video

$0.287

Output

Price (Million tokens)

Text

$0.345 (if the input is text-only)

$0.861 (if the input includes images, audio, or video)

Text and audio

$10.895 (for the audio portion)

The text portion of the output is not billed.

Qwen3-Omni-Captioner

Qwen3-Omni-Captioner is an open-source model based on Qwen3-Omni. Without any prompts, it automatically generates accurate and comprehensive descriptions for complex audio, including speech, ambient sounds, music, and sound effects. It can identify speaker emotions, musical elements (such as style and instruments), and sensitive information, making it suitable for applications such as audio content analysis, security audits, intent recognition, and audio editing. Usage | API reference

Note

This model is available only in the Singapore region.

Model

Context window

Max input

Max output

Input cost

Output cost

Free quota

(Note)

(Tokens)

(Million tokens)

qwen3-omni-30b-a3b-captioner

65,536

32,768

32,768

$3.81

$3.06

1 million tokens

Validity: 90 days after you activate Alibaba Cloud Model Studio

Qwen-VL

This is the open-source version of Alibaba Cloud's Qwen-VL. Usage | API reference

The Qwen3-VL model offers significant improvements over Qwen2.5-VL:

  • Agent interaction: It operates computer and mobile phone interfaces, detects graphical user interface (GUI) elements, understands their functions, and invokes tools to perform tasks. It achieves top-tier performance in evaluations such as OSWorld.

  • Visual coding: It generates code from images or videos. You can use this feature to create HTML, CSS, and JS code from design drafts or website screenshots.

  • Spatial intelligence: It supports 2D and 3D positioning and accurately determines object orientation, perspective changes, and occlusion relationships.

  • Long video understanding: It understands video content up to 20 minutes long and can pinpoint specific moments with second-level accuracy.

  • Deep thinking: It excels at capturing details and analyzing causality, achieving top-tier performance in evaluations such as MathVista and MMMU.

  • OCR: It supports 33 languages and performs more stably in scenarios that involve complex lighting, blur, or tilt. It also significantly improves recognition accuracy for rare characters, ancient script, and technical terms.

    Supported languages

    The model supports the following 33 languages: Chinese, Japanese, Korean, Indonesian, Vietnamese, Thai, English, French, German, Russian, Portuguese, Spanish, Italian, Swedish, Danish, Czech, Norwegian, Dutch, Finnish, Turkish, Polish, Swahili, Romanian, Serbian, Greek, Kazakh, Uzbek, Cebuano, Arabic, Urdu, Persian, Hindi/Devanagari, and Hebrew.

International (Singapore)

Model

Mode

Context window

Max input

Max chain-of-thought

Max response

Input cost

Output cost

CoT + response

Free quota

(Note)

(Tokens)

(Million tokens)

qwen3-vl-235b-a22b-thinking

Thinking only

126,976

81,920

$0.4

$4

1 million tokens each

Valid for 90 days after Model Studio is activated.

qwen3-vl-235b-a22b-instruct

Non-thinking only

129,024

-

$1.6

qwen3-vl-32b-thinking

Thinking only

131,072

126,976

81,920

32,768

$0.16

$0.64

qwen3-vl-32b-instruct

Non-thinking only

129,024

-

qwen3-vl-30b-a3b-thinking

Thinking only

126,976

81,920

$0.2

$2.4

qwen3-vl-30b-a3b-instruct

Non-thinking only

129,024

-

$0.8

qwen3-vl-8b-thinking

Thinking only

126,976

81,920

$0.18

$2.1

qwen3-vl-8b-instruct

Non-thinking only

129,024

-

$0.7

More models

Model

Context window

Max input

Max output

Input cost

Output cost

Free quota

(Note)

(Tokens)

(Million tokens)

qwen2.5-vl-72b-instruct 

131,072

129,024

Maximum 16,384 per image

8,192

$2.8

$8.4

1 million tokens

Valid for 90 days after you activate Alibaba Cloud Model Studio.

qwen2.5-vl-32b-instruct

$1.4

$4.2

qwen2.5-vl-7b-instruct

$0.35

$1.05

qwen2.5-vl-3b-instruct

$0.21

$0.63

Mainland China (Beijing)

Model

Mode

Context window

Max input

Max chain-of-thought

Max response

Input cost

Output cost

CoT + response

Free quota

(Note)

(Tokens)

(Million tokens)

qwen3-vl-235b-a22b-thinking

Thinking only

131,072

126,976

81,920

$0.286705

$2.867051

No free quota

qwen3-vl-235b-a22b-instruct

Non-thinking only

129,024

-

$1.146820

qwen3-vl-32b-thinking

Thinking only

131,072

126,976

81,920

32,768

$0.287

$2.868

qwen3-vl-32b-instruct

Non-thinking only

129,024

-

$1.147

qwen3-vl-30b-a3b-thinking

Thinking only

126,976

81,920

$0.108

$1.076

qwen3-vl-30b-a3b-instruct

Non-thinking only

129,024

-

$0.431

qwen3-vl-8b-thinking

Thinking only

126,976

81,920

$0.072

$0.717

qwen3-vl-8b-instruct

Non-thinking only

129,024

-

$0.287

More models

Model

Context window

Max input

Max output

Input cost

Output cost

Free quota

(Note)

(Tokens)

(Million tokens)

qwen2.5-vl-72b-instruct 

131,072

129,024

Maximum 16,384 per image

8,192

$2.294

$6.881

No free quota

qwen2.5-vl-32b-instruct

$1.147

$3.441

qwen2.5-vl-7b-instruct

$0.287

$0.717

qwen2.5-vl-3b-instruct

$0.173

$0.517

qwen2-vl-72b-instruct

32,768

30,720

Maximum 16,384 per image

2,048

$2.294

$6.881

Qwen-Math

This is a language model built on the Qwen model that is specialized for solving mathematical problems. Qwen2.5-Math supports Chinese and English and integrates multiple reasoning methods, including Chain of Thought (CoT), Program of Thought (PoT), and Tool-Integrated Reasoning (TIR). Usage | API reference | Try it online

Note

This feature is only available in the China (Beijing) region.

Model

Context window

Max input

Max output

Input cost

Output cost

(Tokens)

(Million tokens)

qwen2.5-math-72b-instruct

4,096

3,072

3,072

$0.574

$1.721

qwen2.5-math-7b-instruct

$0.144

$0.287

qwen2.5-math-1.5b-instruct

Free for a limited time

Qwen-Coder

Qwen-Coder is an open-source code model from Qwen. The latest Qwen3-Coder series has powerful coding agent capabilities: it excels at tool calling, environment interaction, and autonomous programming, and combines excellent coding skills with general-purpose capabilities. Usage | API reference

International (Singapore)

Model

Context window

Max input

Max output

Input cost

Output cost

Free quota

(Note)

(Tokens)

qwen3-coder-480b-a35b-instruct

262,144

204,800

65,536

Tiered pricing. See the description below the table.

1 million input tokens and 1 million output tokens

Valid for 90 days after you activate Alibaba Cloud Model Studio.

qwen3-coder-30b-a3b-instruct

Billing for qwen3-coder-480b-a35b-instruct and qwen3-coder-30b-a3b-instruct is tiered based on the number of input tokens per request.

Model

Input tokens per request

Input cost (Million tokens)

Output cost (Million tokens)

qwen3-coder-480b-a35b-instruct

0 < Tokens ≤ 32K

$1.50

$7.50

32K < Tokens ≤ 128K

$2.70

$13.50

128K < Tokens ≤ 200K

$4.50

$22.50

qwen3-coder-30b-a3b-instruct

0 < Tokens ≤ 32K

$0.45

$2.25

32K < Tokens ≤ 128K

$0.75

$3.75

128K < Tokens ≤ 200K

$1.20

$6.00
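To make the tiered billing concrete, here is a minimal sketch for qwen3-coder-480b-a35b-instruct in the International (Singapore) region. It rests on two assumptions that the table does not state explicitly: that 1K means 1,024 tokens (which matches the 204,800-token maximum input for the 200K tier), and that the tier selected by the input size sets the rate for all input and output tokens of that request. The function name is hypothetical.

```python
K = 1_024  # assumption: 1K = 1,024 tokens

# (upper bound of input tokens, input USD per million, output USD per million)
TIERS = [
    (32 * K, 1.50, 7.50),
    (128 * K, 2.70, 13.50),
    (200 * K, 4.50, 22.50),
]


def coder_cost(input_tokens, output_tokens):
    """Estimate the cost of one request in USD under tiered pricing."""
    if input_tokens <= 0 or output_tokens < 0:
        raise ValueError("input_tokens must be positive and output_tokens non-negative")
    for upper_bound, input_price, output_price in TIERS:
        if input_tokens <= upper_bound:
            # The whole request is billed at the rates of the matching tier.
            return (input_tokens * input_price
                    + output_tokens * output_price) / 1_000_000
    raise ValueError("input exceeds the largest tier (200K tokens)")


if __name__ == "__main__":
    for tokens in (10_000, 100_000, 150_000):
        print(tokens, round(coder_cost(tokens, 1_000), 6))
```

Because the rate applies per request, splitting a long context into smaller requests can move each one into a cheaper tier, at the price of losing shared context.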

Mainland China (Beijing)

Model

Context window

Max input

Max output

Input cost

Output cost

(Tokens)

(Million tokens)

qwen3-coder-480b-a35b-instruct

262,144

204,800

65,536

Tiered pricing. See the description below.

qwen3-coder-30b-a3b-instruct

qwen2.5-coder-32b-instruct

131,072

129,024

8,192

$0.287

$0.861

qwen2.5-coder-14b-instruct

qwen2.5-coder-7b-instruct

$0.144

$0.287

qwen2.5-coder-3b-instruct

32,768

30,720

Limited-time free trial

qwen2.5-coder-1.5b-instruct

qwen2.5-coder-0.5b-instruct

Billing for qwen3-coder-480b-a35b-instruct and qwen3-coder-30b-a3b-instruct is tiered based on the number of input tokens per request.

Model

Input tokens per request

Input cost (Million tokens)

Output cost (Million tokens)

qwen3-coder-480b-a35b-instruct

0 < Tokens ≤ 32K

$0.861

$3.441

32K < Tokens ≤ 128K

$1.291

$5.161

128K < Tokens ≤ 200K

$2.151

$8.602

qwen3-coder-30b-a3b-instruct

0 < Tokens ≤ 32K

$0.216

$0.861

32K < Tokens ≤ 128K

$0.323

$1.291

128K < Tokens ≤ 200K

$0.538

$2.151

Text generation - Third-party models

DeepSeek

DeepSeek is a large language model launched by DeepSeek AI. API reference | Try it online

Note

This feature is only available in the China (Beijing) region.

Model

Context window

Max input

Max chain-of-thought

Max response

Input cost

Output cost

(Tokens)

(Million tokens)

deepseek-v3.2-exp

685B full-power version

131,072

98,304

32,768

65,536

$0.287

$0.431

deepseek-v3.1

685B full-power version

$0.574

$1.721

deepseek-r1

685B full-power version

16,384

$2.294

deepseek-r1-0528

685B full-power version

deepseek-v3

671B full-power version

131,072

Not applicable

$0.287

$1.147

deepseek-r1-distill-qwen-1.5b

Based on Qwen2.5-Math-1.5B

32,768

32,768

16,384

16,384

Limited-time free trial

deepseek-r1-distill-qwen-7b

Based on Qwen2.5-Math-7B

$0.072

$0.144

deepseek-r1-distill-qwen-14b

Based on Qwen2.5-14B

$0.144

$0.431

deepseek-r1-distill-qwen-32b

Based on Qwen2.5-32B

$0.287

$0.861

deepseek-r1-distill-llama-8b

Based on Llama-3.1-8B

Limited-time free trial

deepseek-r1-distill-llama-70b

Based on Llama-3.3-70B

Kimi

Kimi-K2 is the first open-source trillion-parameter Mixture of Experts (MoE) model in China, provided by Moonshot AI. It activates 32 billion parameters and has excellent coding and tool-calling capabilities. Usage | Try it online

Note

This feature is only available in the China (Beijing) region.

Model

Context window

Max input

Max chain-of-thought

Max response

Input price

Output price

(Tokens)

(Million tokens)

kimi-k2-thinking

262,144

229,376

32,768

16,384

$0.574

$2.294

Moonshot-Kimi-K2-Instruct

131,072

131,072

-

8,192

$0.574

$2.294

Image generation

Qwen text-to-image

The Qwen text-to-image model excels at complex text rendering, especially for Chinese and English text. Currently, qwen-image-plus has the same capabilities as qwen-image, but qwen-image-plus has a lower price. API reference

International (Singapore)

Model

Unit price

Free quota

qwen-image-plus

$0.03/image

Free quota: 100 images for each model

Validity period: Within 90 days after you activate Alibaba Cloud Model Studio.

qwen-image

$0.035/image

Mainland China (Beijing)

Model

Unit price

Free quota

qwen-image-plus

$0.028671/image

No free quota

qwen-image

$0.035/image

Input prompt

Output image

Healing-style hand-drawn poster featuring three puppies playing with a ball on lush green grass, adorned with decorative elements such as birds and stars. The main title “Come Play Ball!” is prominently displayed at the top in bold, blue cartoon font. Below it, the subtitle “Come [Show Off Your Skills]!” appears in green font. A speech bubble adds playful charm with the text: “Hehe, watch me amaze my little friends next!” At the bottom, supplementary text reads: “We get to play ball with our friends again!” The color palette centers on fresh greens and blues, accented with bright pink and yellow tones to highlight a cheerful, childlike atmosphere.

image

Qwen image editing

The Qwen image editing model supports precise text editing in Chinese and English. It also supports operations such as color adjustment, detail enhancement, style transfer, adding or removing objects, and changing positions and actions. These features enable complex editing of images and text. API reference

International (Singapore)

Model

Unit price

Free quota

qwen-image-edit-plus

$0.03/image

Free quota: 100 images for each model

Validity period: Within 90 days after you activate Alibaba Cloud Model Studio.

qwen-image-edit-plus-2025-10-30

$0.03/image

qwen-image-edit

$0.045/image

Mainland China (Beijing)

Model

Unit price

Free quota

qwen-image-edit-plus

$0.028671/image

No free quota

qwen-image-edit-plus-2025-10-30

$0.028671/image

qwen-image-edit

$0.043/image

image

Original image

image

Make the person stand up and bend over to hold the front paw of the dog.

image

Original image

image

Replace the text 'HEALTH INSURANCE' on the letter blocks with '明天会更好'.

image

Original image

image

Replace the dotted shirt with a light blue shirt.

image

Original image

image

Change the background to Antarctica.

image

Original image

image

Generate a cartoon profile picture of the person.

image

Original image

image

Remove the hair from the dinner plate.

Qwen image translation

The Qwen image translation model supports translating text from images in 11 languages into Chinese or English. It accurately preserves the original layout and content information and provides custom features such as term definition, sensitive word filtering, and image entity detection. API reference

Note

This feature is only available in the China (Beijing) region.

Model

Unit price

Free quota

qwen-mt-image

$0.000431/image

No free quota

en

Original image

ja

Japanese

es

Portuguese

ar

Arabic

Wan text-to-image

The Wan text-to-image model generates exquisite images from text. API reference | Try it online

International (Singapore)

Model

Description

Unit price

Free quota (Note)

Validity period: Within 90 days after you activate Alibaba Cloud Model Studio.

wan2.5-t2i-preview Recommended

Wan 2.5 preview. The single-side length limit is removed. You can freely select dimensions within the total pixel area and aspect ratio constraints.

$0.03/image

50 images

wan2.2-t2i-plus Recommended

Wan 2.2 Professional Edition. Fully upgraded in creativity, stability, and realistic texture.

$0.05/image

100 images

wan2.2-t2i-flash Recommended

Wan 2.2 Flash Edition. Fully upgraded in creativity, stability, and realistic texture.

$0.025/image

100 images

wan2.1-t2i-plus

Wan 2.1 Professional Edition. Supports multiple styles and generates images with rich details.

$0.05/image

200 images

wan2.1-t2i-turbo

Wan 2.1 Turbo Edition. Supports multiple styles and offers fast generation speed.

$0.025/image

200 images

Mainland China (Beijing)

Model

Description

Unit price

Free quota (Note)

Validity period: Within 90 days after you activate Alibaba Cloud Model Studio.

wan2.5-t2i-preview Recommended

Wan 2.5 preview. The single-side length limit is removed. You can freely select dimensions within the total pixel area and aspect ratio constraints.

$0.028671/image

No free quota

wan2.2-t2i-plus Recommended

Wan 2.2 Professional Edition. Fully upgraded in creativity, stability, and realistic texture.

$0.02007/image

No free quota

wan2.2-t2i-flash Recommended

Wan 2.2 Flash Edition. Fully upgraded in creativity, stability, and realistic texture.

$0.028671/image

No free quota

wanx2.1-t2i-plus

Wan 2.1 Professional Edition. Supports multiple styles and generates images with rich details.

$0.028671/image

No free quota

wanx2.1-t2i-turbo

Wan 2.1 Turbo Edition. Supports multiple styles and offers fast generation speed.

$0.020070/image

No free quota

wanx2.0-t2i-turbo

Wan 2.0 Turbo Edition. Excels at textured portraits and creative designs. It is cost-effective.

$0.005735/image

No free quota

Input prompt

Output image

A needle-felted Santa Claus holding a gift and a white cat standing next to him against a background of colorful gifts and green plants, creating a cute, warm, and cozy scene.

image

Wan2.5 general image editing

The Wan2.5 general image editing model supports entity-consistent image editing and multi-image fusion. It accepts text, a single image, or multiple images as input. API reference

International (Singapore)

Model

Unit price

Free quota (Note)

Validity period: Within 90 days after you activate Alibaba Cloud Model Studio.

wan2.5-i2i-preview

$0.03/image

50 images

Mainland China (Beijing)

Model

Unit price

Free quota

wan2.5-i2i-preview

$0.028671/image

No free quota

Feature

Input example

Output image

Single-image editing

image

Change the floral dress to a vintage-style lace long dress with exquisite embroidery details on the collar and cuffs.

image

Multi-image fusion

image 1, image 2

Place the alarm clock from image 1 next to the vase on the dining table in image 2.

image

Wan2.1 general image editing

The Wan2.1 general image editing model performs diverse image editing with simple instructions. It is suitable for scenarios such as outpainting, watermark removal, style transfer, image restoration, and image enhancement. Usage | API reference

Note

This feature is only available in the China (Beijing) region.

Model

Unit price

Free quota

wanx2.1-imageedit

$0.020070 per image

No free quota

The general image editing model currently supports the following features:

Feature

Input image

Input prompt

Output image

Global stylization

image

Convert to a French picture book style.

image

Local stylization

image

Change the house to a wooden plank style.

image

Instruction-based editing

image

Change the girl's hair to red.

image

Inpainting

Input image

image

Masked image (The white area is the mask)

image

A ceramic rabbit holding a ceramic flower.

Output image

image

Text watermark removal

image

Remove the text from the image.

image

Outpainting

image

A green fairy.

image

Image super-resolution

Blurry image

image

Image super-resolution.

Clear image

image

Image colorization

image

Blue background, yellow leaves.

image

Line art to image

image

A living room in a minimalist Nordic style.

image

Placeholder Image

image

A cartoon character cautiously peeks out, spying on a brilliant blue gem inside the room.

image

OutfitAnyone

  • Compared to the basic version, the OutfitAnyone-Plus model offers improvements in image definition, clothing texture details, and logo restoration. However, it takes longer to generate images and is suitable for scenarios that are not time-sensitive. API reference | Try it online

  • OutfitAnyone-Image Parsing supports parsing model and clothing images, which can be used for pre-processing and post-processing of OutfitAnyone images. API reference

Note

This feature is only available in the China (Beijing) region.

Model

Description

Sample input

Sample output

aitryon-plus

OutfitAnyone-Plus

output26

output29

aitryon-parsing-v1

OutfitAnyone-Image Parsing

OutfitAnyone pricing

Model Service

Model

Unit Price

Discount

Tier

OutfitAnyone-Plus

aitryon-plus

$0.071677/image

None

None

OutfitAnyone-Image Parsing

aitryon-parsing-v1

$0.000574/image

None

None

Video generation - Wan

Text-to-video

The Wan text-to-video model generates videos from a single sentence. The videos feature rich artistic styles and cinematic quality. API reference | Try it online

International (Singapore)

Model

Description

Unit price

Free quota (Claim)

Valid for 90 days after you activate Alibaba Cloud Model Studio

wan2.5-t2v-preview Recommended

Wan 2.5 preview. Supports automatic voiceover and custom audio file input.

480p: $0.05/second

720p: $0.10/second

1080p: $0.15/second

50 seconds

wan2.2-t2v-plus Recommended

Wan 2.2 Professional Edition. Significantly improved image detail and motion stability.

480p: $0.02/second

1080p: $0.10/second

50 seconds

wan2.1-t2v-turbo

Wan 2.1 Turbo Edition. Fast generation speed and balanced performance.

$0.036/second

200 seconds

wan2.1-t2v-plus

Wan 2.1 Professional Edition. Generates rich details and higher-quality images.

$0.10/second

200 seconds
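Because text-to-video is priced per second and the rate depends on the output resolution, a quick lookup can help when budgeting. The sketch below covers the two resolution-tiered models in the International (Singapore) table. The function name is hypothetical, and it assumes that billing is based on the duration of the generated video.

```python
# USD per second of generated video, from the International (Singapore) table.
PRICE_PER_SECOND = {
    "wan2.5-t2v-preview": {"480p": 0.05, "720p": 0.10, "1080p": 0.15},
    "wan2.2-t2v-plus": {"480p": 0.02, "1080p": 0.10},
}


def video_cost(model, resolution, seconds):
    """Estimate the cost in USD of generating a video of the given length."""
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    try:
        rate = PRICE_PER_SECOND[model][resolution]
    except KeyError:
        # The table lists no price for this model and resolution combination.
        raise ValueError(f"no listed price for {model} at {resolution}")
    return rate * seconds


if __name__ == "__main__":
    print(round(video_cost("wan2.5-t2v-preview", "720p", 10), 4))
    print(round(video_cost("wan2.2-t2v-plus", "480p", 5), 4))
```

The lookup raises an error for combinations that the table does not list, such as 720p for wan2.2-t2v-plus, rather than guessing a price.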

Mainland China (Beijing)

Model

Description

Unit price

Free quota

wan2.5-t2v-preview Recommended

Wan 2.5 preview. Supports automatic voiceover and custom audio file input.

480p: $0.043006/second

720p: $0.086012/second

1080p: $0.143353/second

No free quota

wan2.2-t2v-plus Recommended

Wan 2.2 Professional Edition. Significantly improved image detail and motion stability.

480p: $0.02007/second

1080p: $0.100347/second

No free quota

wanx2.1-t2v-turbo

Faster generation speed and balanced performance.

$0.034405/second

No free quota

wanx2.1-t2v-plus

Generates richer details and higher-quality images.

$0.100347/second

No free quota

Input example

Output video (wan2.5)

Input prompt: Shot from a low angle, in a medium close-up, with warm tones, mixed lighting (the practical light from the desk lamp blends with the overcast light from the window), side lighting, and a central composition. In a classic detective office, wooden bookshelves are filled with old case files and ashtrays. A green desk lamp illuminates a case file spread out in the center of the desk. A fox, wearing a dark brown trench coat and a light gray fedora, sits in a leather chair, its fur crimson, its tail resting lightly on the edge, its fingers slowly turning yellowed pages. Outside, a steady drizzle falls beneath a blue sky, streaking the glass with meandering streaks. It slowly raises its head, its ears twitching slightly, its amber eyes gazing directly at the camera, its mouth clearly moving as it speaks in a smooth, cynical voice: 'The case was cold, colder than a fish in winter. But every chicken has its secrets, and I, for one, intended to find them '.

Input audio:

Image-to-video - based on the first frame

The Wan image-to-video model uses an input image as the first frame of a video. It then generates the rest of the video based on a prompt. The videos feature rich artistic styles and cinematic quality. API reference | Try it online

International (Singapore)

Model

Description

Unit price

Free quota (Note)

Validity: Within 90 days after you activate Alibaba Cloud Model Studio

wan2.5-i2v-preview Recommended

Wan 2.5 preview. Supports automatic dubbing and custom audio file uploads.

480P: $0.05/second

720P: $0.10/second

1080P: $0.15/second

50 seconds

wan2.2-i2v-flash Recommended

Wan 2.2 Flash Edition. Delivers extremely fast generation speed with significant improvements in visual detail and motion stability.

480P: $0.015/second

720P: $0.036/second

50 seconds

wan2.2-i2v-plus Recommended

Wan 2.2 Professional Edition. Delivers significant improvements in visual detail and motion stability.

480P: $0.02/second

1080P: $0.10/second

50 seconds

wan2.1-i2v-turbo

Wan 2.1 Turbo Edition. Fast generation speed with balanced performance.

$0.036/second

200 seconds

wan2.1-i2v-plus

Wan 2.1 Professional Edition. Generates rich details and produces higher-quality, more textured visuals.

$0.10/second

200 seconds

Mainland China (Beijing)

Model

Description

Unit price

Free quota

wan2.5-i2v-preview Recommended

Wan 2.5 preview. Supports automatic dubbing and custom audio file uploads.

480P: $0.043006/second

720P: $0.086012/second

1080P: $0.143353/second

No free quota

wan2.2-i2v-plus Recommended

Wan 2.2 Professional Edition. Delivers significant improvements in visual detail and motion stability.

480P: $0.02007/second

1080P: $0.100347/second

No free quota

wanx2.1-i2v-turbo

Wan 2.1 Turbo Edition. Fast generation speed with balanced performance.

$0.034405/second

No free quota

wanx2.1-i2v-plus

Wan 2.1 Professional Edition. Generates rich details and produces higher-quality, more textured visuals.

$0.100347/second

No free quota
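Because the image-to-video models are priced per second at a resolution-dependent rate, a job's cost can be estimated as duration times rate. The following is a minimal sketch that uses the International (Singapore) prices listed above. The dictionary layout and the `estimate_cost` helper are illustrative names, not part of any SDK. How partial seconds are rounded for billing is not stated here, so the sketch does not round the duration, and it ignores any free quota.

```python
# Per-second prices in USD, copied from the International (Singapore) table.
# The dictionary layout is an illustrative assumption, not an API structure.
I2V_PRICES = {
    "wan2.5-i2v-preview": {"480P": 0.05, "720P": 0.10, "1080P": 0.15},
    "wan2.2-i2v-flash": {"480P": 0.015, "720P": 0.036},
    "wan2.2-i2v-plus": {"480P": 0.02, "1080P": 0.10},
}


def estimate_cost(model: str, resolution: str, seconds: float) -> float:
    """Return the estimated USD cost of one video, ignoring any free quota."""
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    try:
        rate = I2V_PRICES[model][resolution]
    except KeyError:
        # The table lists no price for this model/resolution pair.
        raise ValueError(f"no listed price for {model} at {resolution}") from None
    return round(rate * seconds, 6)


if __name__ == "__main__":
    # A 5-second 720P clip with wan2.5-i2v-preview.
    print(estimate_cost("wan2.5-i2v-preview", "720P", 5))
```

Pairs that the table does not list, such as wan2.2-i2v-flash at 1080P, raise an error instead of guessing a rate.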

Input first frame image and audio

Output video (wan2.5)


Input audio:

Input prompt: A scene of urban fantasy art. A dynamic graffiti art character. A boy painted with spray paint comes to life from a concrete wall. He sings an English rap song at a very fast pace while striking a classic, energetic rapper pose. The scene is set under an urban railway bridge at night. The lighting comes from a single streetlight, creating a cinematic atmosphere full of high energy and amazing detail. The audio of the video consists entirely of the boy's rap, with no other dialogue or noise.

Image-to-video - based on the first and last frames

The Wan first-and-last-frame video model generates a smooth, dynamic video from a prompt. You only need to provide the first and last frame images. The videos feature rich artistic styles and cinematic quality. API reference | Try it online

International (Singapore)

Model

Unit price

Free quota (Note)

wan2.1-kf2v-plus

$0.10/second

200 seconds

Validity period: Within 90 days after you activate Model Studio

Mainland China (Beijing)

Model

Unit price

Free quota (Note)

wanx2.1-kf2v-plus

$0.100347/second

No free quota

Example input

Output video

First frame

Last frame

Prompt

first_frame

last_frame

In a realistic style, the camera starts at eye level with a small black cat looking up at the sky with curiosity, then gradually moves upward, ending in a top-down shot focused on the cat's curious eyes.

General video editing

The Wan unified video editing model supports multimodal inputs, including text, images, and videos. It can perform video generation and general editing tasks. API reference | Try it online

International (Singapore)

Model

Unit price

Free quota (Note)

wan2.1-vace-plus

$0.10/second

50 seconds

Validity: Valid for 90 days after Model Studio activation.

Mainland China (Beijing)

Model

Unit price

Free quota (Note)

wanx2.1-vace-plus

$0.100347/second

No free quota

The unified video editing model supports the following features:

Feature

Input reference image

Input prompt

Output video

Multi-image reference

Reference image 1 (reference entity)

image

Reference image 2 (reference background)

image

In the video, a girl gracefully walks out from a misty, ancient forest. Her steps are light, and the camera captures her every nimble moment. When the girl stops and looks around at the lush woods, a smile of surprise and joy blossoms on her face. This scene, frozen in a moment of interplay between light and shadow, records the girl's wonderful encounter with nature.

Output video

Video repainting

The video shows a black steampunk-style car driven by a gentleman. The car is decorated with gears and copper pipes. The background features a steam-powered candy factory and retro elements, creating a vintage and playful scene.

Local editing

Input video

Input mask image (The white area indicates the editing area)

mask

The video shows a Parisian-style French cafe where a lion in a suit is elegantly sipping coffee. It holds a coffee cup in one hand, taking a gentle sip with a relaxed expression. The cafe is tastefully decorated, with soft hues and warm lighting illuminating the area where the lion is.

The content in the editing area is modified based on the prompt.

Video extension

Input first clip (1 second)

A dog wearing sunglasses is skateboarding on the street, 3D cartoon.

Output extended video (5 seconds)

Video outpainting

An elegant lady is passionately playing the violin, with a full symphony orchestra behind her.

Wan - digital human

This feature generates natural-looking videos of people speaking, singing, or performing, based on a single character image and an audio file. To use this feature, call the following models in sequence. wan2.2-s2v image detection | wan2.2-s2v video generation

Note

This feature is only available in the China (Beijing) region.

Model

Description

Unit price

wan2.2-s2v-detect

Checks whether an input image meets the requirements, such as image clarity, containing a single person, and showing a frontal view.

$0.000574/image

wan2.2-s2v

Generates a dynamic video of a person from a valid image and an audio clip.

480P: $0.071677/second

720P: $0.129018/second

Sample input

Output video


Input audio:

Wan - animate image

Available in standard and professional modes. The model transfers the actions and expressions from a reference video to a character image, generating a video that animates the character from the image. API reference.

International (Singapore)

Model

Service

Description

Unit price

Free quota (View)

wan2.2-animate-move

Standard mode wan-std

Fast generation speed. Meets basic needs such as simple animation demos. Cost-effective.

$0.12/second

The two services share 50 seconds

Professional mode wan-pro

High animation smoothness. Natural transitions for actions and expressions. The result is similar to a live-action video.

$0.18/second

Mainland China (Beijing)

Model

Service

Description

Unit price

Free quota (View)

wan2.2-animate-move

Standard mode wan-std

Fast generation speed. Meets basic needs such as simple animation demos. Cost-effective.

$0.06/second

No free quota

Professional mode wan-pro

High animation smoothness. Natural transitions for actions and expressions. The result is similar to a live-action video.

$0.09/second

Character image

Reference video

Output video (standard)

Output video (professional)

move_input_image

Wan - video face swap

Available in standard and professional modes. The model replaces the main character in a video with a character from an image. It preserves the original video's scene, lighting, and hue. API reference.

International (Singapore)

Model

Service

Description

Unit price

Free quota (View)

wan2.2-animate-mix

Standard mode wan-std

Generates animations quickly. Ideal for basic requirements, such as simple demos. Highly cost-effective.

$0.18/second

The two services share 50 seconds

Professional mode wan-pro

Produces highly smooth animations with natural transitions for actions and expressions. The result closely resembles a live-action video.

$0.26/second

Mainland China (Beijing)

Model

Service

Description

Unit price

Free quota (View)

wan2.2-animate-mix

Standard mode wan-std

Generates animations quickly. Ideal for basic requirements, such as simple demos. Highly cost-effective.

$0.09/second

No free quota

Professional mode wan-pro

Produces highly smooth animations with natural transitions for actions and expressions. The result closely resembles a live-action video.

$0.13/second

Character image

Reference video

Output video (standard)

Output video (professional)

mix_input_image

AnimateAnyone

This feature generates character motion videos based on a character image and a motion template. To use this feature, call the following three models in sequence. AnimateAnyone image detection API details | AnimateAnyone motion template generation | AnimateAnyone video generation API details

Note

This feature is only available in the China (Beijing) region.

Model

Description

Unit price

animate-anyone-detect-gen2

Detects whether an input image meets the required specifications.

$0.000574/image

animate-anyone-template-gen2

Extracts character motion from a video and generates a motion template.

$0.011469/second

animate-anyone-gen2

Generates a character motion video based on a character image and a motion template.

Input: Character image

Input: Motion video

Output (image background)

Output (video background)


Note
  • The preceding example was generated by the Tongyi App, which integrates AnimateAnyone.

  • The content generated by the AnimateAnyone model is video only and does not include audio.
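The three AnimateAnyone models are called in sequence: detection, motion template generation, then video generation. The sketch below shows only that ordering. The three callables are hypothetical stand-ins for whatever client code invokes animate-anyone-detect-gen2, animate-anyone-template-gen2, and animate-anyone-gen2; their signatures are assumptions and do not reflect the actual API. Stopping when detection fails is also an assumption about sensible usage.

```python
from typing import Callable


def run_animate_pipeline(
    image_url: str,
    motion_video_url: str,
    detect: Callable[[str], bool],
    make_template: Callable[[str], str],
    generate: Callable[[str, str], str],
) -> str:
    """Run the three stages in order and return the generated video reference."""
    # Stage 1 (animate-anyone-detect-gen2): check the character image first,
    # so that no per-second charges are incurred for an unusable image.
    if not detect(image_url):
        raise ValueError("image did not pass detection; later stages skipped")
    # Stage 2 (animate-anyone-template-gen2): extract motion into a template.
    template_id = make_template(motion_video_url)
    # Stage 3 (animate-anyone-gen2): combine the image and the template.
    return generate(image_url, template_id)


if __name__ == "__main__":
    # Stub callables stand in for real API calls.
    video = run_animate_pipeline(
        "character.png",
        "dance.mp4",
        detect=lambda url: True,
        make_template=lambda url: "template-001",
        generate=lambda img, tpl: f"video({img}, {tpl})",
    )
    print(video)
```

The same detect-then-generate ordering applies to the other two-stage features in this section, such as EMO, LivePortrait, and Emoji.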

EMO

This feature generates dynamic portrait videos based on a portrait image and a human voice audio file. To use this feature, call the following models in sequence. EMO image detection | EMO video generation

Note

This feature is only available in the China (Beijing) region.

Model

Description

Unit price

emo-detect-v1

Detects if an input image meets the required specifications. This model can be called directly without deployment.

$0.000574/image

emo-v1

Generates a dynamic portrait video. This model can be called directly without deployment.

  • Generating a 1:1 aspect ratio video: $0.011469/second

  • Generating a 3:4 aspect ratio video: $0.022937/second

Input: Portrait image and human voice audio file

Output: Dynamic portrait video

Portrait:


Human voice audio: See the video on the right.

Character video:

Action style intensity: active ("style_level": "active")

LivePortrait

This is a lightweight model that quickly generates dynamic portrait videos based on a portrait image and a human voice audio file. Compared to the EMO model, it generates videos faster and at a lower cost, but the quality is not as good. To use this feature, call the following two models in sequence. LivePortrait image detection | LivePortrait video generation

Note

This feature is only available in the China (Beijing) region.

Model

Description

Unit price

liveportrait-detect

Detects whether the input image meets the requirements.

$0.000574/image

liveportrait

Generates a dynamic portrait video.

$0.002868/second

Input: Portrait image and voice audio file

Output: Animated portrait video

Portrait image:


Voice audio: From the video on the right.

Portrait video:

Emoji

This feature generates dynamic face videos based on a face image and preset facial motion templates. This capability can be used for scenarios such as creating emojis and generating video materials. To use this feature, call the following models in sequence. Emoji image detection | Emoji video generation

Note

This feature is only available in the China (Beijing) region.

Model

Description

Unit price

emoji-detect-v1

Detects whether an input image meets the specified requirements.

$0.000574/image

emoji-v1

Generates a character emoji based on a portrait image and a specified emoji template.

$0.011469/second

Input: Portrait image

Output: Dynamic portrait video


Template parameter for the "happy" emoji: ("input.driven_id": "mengwa_kaixin")

VideoRetalk

This feature generates a video where the character's lip movements match the input audio, based on a character video and a human voice audio file. To use this feature, call the following model. API reference

Note

This feature is only available in the China (Beijing) region.

Model

Description

Unit price

videoretalk

Generates a new video where the character's lip movements are synchronized with the input audio.

$0.011469/second

Video style transfer

This model supports generating videos in different styles that match the semantic description of user-input text, or restyling a user-input video. API reference

Note

This feature is only available in the China (Beijing) region.

Model

Description

Unit price

video-style-transform

Transforms an input video into styles such as Japanese manga or American comics.

720p: $0.071677/second

540p: $0.028671/second

Input video

Output video (Japanese manga style)

Speech synthesis (text-to-speech)

Qwen-TTS

Qwen-TTS is a speech synthesis model from the Qwen series. It supports Chinese, English, and mixed Chinese-English text input, and streams audio output. Usage | API reference

International (Singapore)

Model

Version

Unit price

Max input characters

Supported languages

Free quota (Note)

qwen3-tts-flash

Its capabilities are the same as qwen3-tts-flash-2025-09-18

Stable

$0.1/10,000 characters

600

Chinese (Mandarin, Beijing, Shanghai, Sichuan, Nanjing, Shaanxi, Minnan, Tianjin), Cantonese, English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese

2,000 characters for each

Validity: Within 90 days after you activate Model Studio

qwen3-tts-flash-2025-09-18

Snapshot

Qwen3-TTS is billed based on the number of input characters. The billing rules are as follows:

  • 1 Chinese character = 2 characters

  • 1 English letter, 1 punctuation mark, or 1 space = 1 character
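The billing rules above can be sketched as a small character counter. This is an illustration, not the service's actual metering code. It assumes that "Chinese character" means a code point in the CJK Unified Ideographs block (U+4E00 to U+9FFF) and that every other character, including digits and letters from other scripts, counts as 1. Neither detail is stated here. The default price is the International (Singapore) rate for qwen3-tts-flash.

```python
def billable_characters(text: str) -> int:
    """Count billable characters: 2 per Chinese character, 1 for anything else."""
    total = 0
    for ch in text:
        # Assumption: "Chinese character" means the CJK Unified Ideographs block.
        if "\u4e00" <= ch <= "\u9fff":
            total += 2
        else:
            total += 1
    return total


def tts_cost(text: str, price_per_10k: float = 0.1) -> float:
    """Estimate the USD cost of synthesizing `text`, ignoring any free quota."""
    return billable_characters(text) * price_per_10k / 10_000


if __name__ == "__main__":
    sample = "你好, world"
    # 2 Chinese characters count as 4; the comma, space, and 5 letters count as 7.
    print(billable_characters(sample), tts_cost(sample))
```

Because Chinese characters count double, the 600-character input limit may be reached with fewer Chinese characters if the limit is measured in the same billable units, which is not confirmed here.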

Mainland China (Beijing)

Qwen3-TTS

Model

Version

Unit price

Max input characters

Supported languages

qwen3-tts-flash

Its capabilities are the same as qwen3-tts-flash-2025-09-18

Stable

$0.114682/10,000 characters

600

Chinese (Mandarin, Beijing, Shanghai, Sichuan, Nanjing, Shaanxi, Minnan, Tianjin), Cantonese, English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese

qwen3-tts-flash-2025-09-18

Snapshot

Qwen3-TTS is billed based on the number of input characters. The billing rules are as follows:

  • 1 Chinese character = 2 characters

  • 1 English letter, 1 punctuation mark, or 1 space = 1 character

Qwen-TTS

Model

Version

Context window

Max input

Max output

Input cost

Output cost

(Tokens)

(Thousand tokens)

qwen-tts

Its capabilities are the same as qwen-tts-2025-04-10

Stable

8,192

512

7,680

$0.230

$1.434

qwen-tts-latest

Its capabilities are always the same as the latest snapshot version

Latest

qwen-tts-2025-05-22

Snapshot

qwen-tts-2025-04-10

Audio is converted to tokens at a rate of 50 tokens per second. Audio shorter than 1 second is counted as 50 tokens.
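The audio-to-token rule can be sketched as follows. The 50 tokens per second rate and the 50-token minimum come from the rule above. Rounding fractional seconds up for clips longer than one second is an assumption, because the rounding behavior is not stated. The function name is illustrative.

```python
import math

TOKENS_PER_SECOND = 50


def audio_tokens(duration_seconds: float) -> int:
    """Convert an audio duration to tokens under the stated rule."""
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    if duration_seconds < 1:
        # Audio shorter than 1 second is counted as 50 tokens.
        return TOKENS_PER_SECOND
    # Assumption: a fractional result is rounded up to a whole token.
    return math.ceil(duration_seconds * TOKENS_PER_SECOND)


if __name__ == "__main__":
    for seconds in (0.3, 2.5, 10):
        print(seconds, audio_tokens(seconds))
```

Under the same rule, the 7,680-token maximum output listed above would correspond to about 153.6 seconds of audio, assuming the conversion applies uniformly to generated audio.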

Qwen-TTS-Realtime

Based on Qwen-TTS, this model supports streaming text input and can adapt its speech rate based on text content and punctuation. It supports Chinese, English, and mixed Chinese-English text input, and streams audio output. Usage

International (Singapore)

Model

Version

Unit price

Supported languages

Free quota (Note)

qwen3-tts-flash-realtime

Current capabilities are equivalent to qwen3-tts-flash-realtime-2025-09-18

Stable

$0.13 per 10,000 characters

Chinese (Mandarin, Beijing, Shanghai, Sichuan, Nanjing, Shaanxi, Minnan, Tianjin), Cantonese, English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese

2,000 characters for each

Validity: Within 90 days after you activate Model Studio

qwen3-tts-flash-realtime-2025-09-18

Snapshot

Qwen3-TTS is billed based on the number of input characters. The billing rules are as follows:

  • 1 Chinese character = 2 characters

  • 1 English letter, 1 punctuation mark, or 1 space = 1 character

Mainland China (Beijing)

Qwen3-TTS Realtime

Model

Version

Unit price

Supported languages

qwen3-tts-flash-realtime

Current capabilities are equivalent to qwen3-tts-flash-realtime-2025-09-18

Stable

$0.143353 per 10,000 characters

Chinese (Mandarin, Beijing, Shanghai, Sichuan, Nanjing, Shaanxi, Minnan, Tianjin), Cantonese, English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese

qwen3-tts-flash-realtime-2025-09-18

Snapshot

Qwen3-TTS is billed based on the number of input characters. The billing rules are as follows:

  • 1 Chinese character = 2 characters

  • 1 English letter, 1 punctuation mark, or 1 space = 1 character

Qwen-TTS Realtime

Model

Version

Context window

Max input

Max output

Input cost

Output cost

Supported languages

(Tokens)

(Thousand tokens)

qwen-tts-realtime

Current capabilities are equivalent to qwen-tts-realtime-2025-07-15

Stable

8,192

512

7,680

$0.345

$1.721

Chinese, English

qwen-tts-realtime-latest

Current capabilities are equivalent to qwen-tts-realtime-2025-07-15

Latest

Chinese, English

qwen-tts-realtime-2025-07-15

Snapshot

Chinese, English

Audio-to-token conversion rule: 1 second of audio corresponds to 50 tokens. Audio shorter than 1 second is counted as 50 tokens.

CosyVoice

CosyVoice is a new-generation, large-scale generative speech synthesis model from Qwen Lab. It integrates text understanding and speech generation based on large-scale, pre-trained language models and supports real-time, streaming text-to-speech synthesis. Usage | Try it online | Voice list

Note

This feature is only available in the China (Beijing) region.

Model

Price

cosyvoice-v2

$0.286706 per 10,000 characters

Each Chinese character counts as two characters. Each English letter, punctuation mark, and space counts as one character.

Speech recognition and translation (speech-to-text)

Qwen3-LiveTranslate-Flash-Realtime

qwen3-livetranslate-flash-realtime is a multilingual, real-time audio and video translation model. It recognizes 18 languages and translates them into audio in 10 languages in real time.

Core features:

  • Multilingual support: Supports 18 languages, such as Chinese, English, French, German, Russian, Japanese, and Korean, and 6 Chinese dialects, such as Mandarin, Cantonese, and Sichuanese.

  • Visual enhancement: Improves translation accuracy using visual content. The model analyzes lip movements, actions, and on-screen text to improve translation in noisy environments or for words with multiple meanings.

  • Low latency: Achieves a simultaneous interpretation latency as low as 3 seconds.

  • Lossless simultaneous interpretation: Uses semantic unit prediction technology to resolve cross-language word order issues. This ensures that the quality of real-time translation is nearly identical to that of offline translation.

  • Natural voice: Generates human-like speech with a natural voice. The model adapts its tone and emotion based on the source audio content.

Usage

Note

This model is available only in the Singapore region.

Model

Version

Context window

Max input

Max output

Free quota (Note)

(Tokens)

qwen3-livetranslate-flash-realtime

Its capabilities are equivalent to qwen3-livetranslate-flash-realtime-2025-09-22.

Stable

53,248

49,152

4,096

1 million tokens

This is valid for 90 days after you activate Model Studio.

qwen3-livetranslate-flash-realtime-2025-09-22

Snapshot

After the free quota is exhausted, inputs and outputs are billed as follows:

Input

Price

Audio

$10

Image

$1.3

Output

Price

Text

$10

Audio

$38

Qwen-ASR

Built on the Qwen multimodal base model, this model supports features such as multilingual recognition, singing recognition, and noise rejection. Usage

International (Singapore)

Model

Version

Supported languages

Supported sample rates

Unit price

Free quota (Note)

qwen3-asr-flash

Currently an alias for qwen3-asr-flash-2025-09-08

Stable version

Chinese (including Mandarin, Sichuanese, Minnan, Wu, and Cantonese), English, Japanese, German, Korean, Russian, French, Portuguese, Arabic, Italian, and Spanish

16 kHz

$0.000035/second

36,000 seconds (10 hours)

Valid for 90 days after you activate Alibaba Cloud Model Studio

qwen3-asr-flash-2025-09-08

Snapshot version
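As a worked example of per-second billing with a free quota, the sketch below estimates charges for qwen3-asr-flash in the International (Singapore) region. It assumes that the free quota is consumed before any billing starts and that the quota is still within its 90-day validity. The function name and parameters are illustrative.

```python
from typing import Tuple

PRICE_PER_SECOND = 0.000035   # USD, International (Singapore)
FREE_QUOTA_SECONDS = 36_000   # 10 hours


def asr_charge(
    audio_seconds: float,
    free_seconds_remaining: float = FREE_QUOTA_SECONDS,
) -> Tuple[float, float]:
    """Return (billable_seconds, cost_in_usd) for a batch of audio."""
    if audio_seconds < 0 or free_seconds_remaining < 0:
        raise ValueError("durations must be non-negative")
    # Assumption: the free quota is used up first, then normal pricing applies.
    billable = max(audio_seconds - free_seconds_remaining, 0)
    return billable, billable * PRICE_PER_SECOND


if __name__ == "__main__":
    # 40,000 seconds of audio on a fresh account: 4,000 seconds are billable.
    print(asr_charge(40_000))
```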

Mainland China (Beijing)

Model

Version

Supported languages

Supported sample rates

Unit price

qwen3-asr-flash

Alias for qwen3-asr-flash-2025-09-08

Stable version

Chinese (Mandarin, Sichuanese, Minnan, Wu, and Cantonese), English, Japanese, German, Korean, Russian, French, Portuguese, Arabic, Italian, and Spanish

16 kHz

$0.000032/second

qwen3-asr-flash-2025-09-08

Snapshot version

Qwen-ASR-Realtime

The Qwen real-time speech recognition model features automatic language detection. It can detect 11 languages and accurately transcribes audio in complex environments. Usage | API reference

International (Singapore)

Model

Version

Supported languages

Supported sample rates

Unit price

Free quota (Note)

qwen3-asr-flash-realtime

Currently equivalent to qwen3-asr-flash-realtime-2025-10-27

Stable

Chinese (Mandarin, Sichuanese, Minnan, Wu), Cantonese, English, Japanese, German, Korean, Russian, French, Portuguese, Arabic, Italian, Spanish

8 kHz, 16 kHz

$0.000090/second

36,000 seconds (10 hours)

Validity: Within 90 days after you activate Model Studio

qwen3-asr-flash-realtime-2025-10-27

Snapshot

Mainland China (Beijing)

Model

Version

Supported languages

Supported sample rates

Unit price

qwen3-asr-flash-realtime

Currently equivalent to qwen3-asr-flash-realtime-2025-10-27

Stable

Chinese (Mandarin, Sichuanese, Minnan, Wu), Cantonese, English, Japanese, German, Korean, Russian, French, Portuguese, Arabic, Italian, Spanish

8 kHz, 16 kHz

$0.000047/second

qwen3-asr-flash-realtime-2025-10-27

Snapshot

Paraformer

Paraformer is a speech recognition model from Tongyi Lab. It offers two versions: audio file recognition and real-time speech recognition.

Audio file recognition

Usage | API reference

Note

This feature is available only in the Mainland China (Beijing) region.

Model

Supported languages

Supported sample rates

Scenarios

Supported audio formats

Unit price

paraformer-v2

Chinese (Mandarin), Chinese dialects (Cantonese, Wu, Minnan, Northeastern, Gansu, Guizhou, Henan, Hubei, Hunan, Ningxia, Shanxi, Shaanxi, Shandong, Sichuan, Tianjin, Jiangxi, Yunnan, Shanghainese), English, Japanese, Korean, German, French, Russian

Any

ApsaraVideo Live

aac, amr, avi, flac, flv, m4a, mkv, mov, mp3, mp4, mpeg, ogg, opus, wav, webm, wma, wmv

$0.000012/second

paraformer-8k-v2

Chinese (Mandarin)

8 kHz

Phone calls

Real-time speech recognition

Usage | API reference

Note

This feature is available only in the Mainland China (Beijing) region.

Model

Supported languages

Supported sample rates

Scenarios

Supported audio formats

Unit price

paraformer-realtime-v2

Chinese (Mandarin), Chinese dialects (Cantonese, Wu, Minnan, Northeastern, Gansu, Guizhou, Henan, Hubei, Hunan, Ningxia, Shanxi, Shaanxi, Shandong, Sichuan, Tianjin, Jiangxi, Yunnan, Shanghainese), English, Japanese, Korean, German, French, Russian

Supports switching between multiple languages.

Any

ApsaraVideo Live, conferences, and more

pcm, wav, mp3, opus, speex, aac, amr

$0.000035/second

paraformer-realtime-8k-v2

8 kHz

Call centers and more

Fun-ASR

Fun-ASR is a speech recognition model from Tongyi Bailin. It offers two versions: audio file recognition and real-time speech recognition.

Audio file recognition

Usage | API reference

International (Singapore)

Model

Version

Supported languages

Supported sample rates

Scenarios

Supported audio formats

Unit price

Free quota (Note)

fun-asr

Currently equivalent to fun-asr-2025-08-25

Stable

Chinese, English

Any

ApsaraVideo Live, phone calls, conference interpretation, and more

aac, amr, avi, flac, flv, m4a, mkv, mov, mp3, mp4, mpeg, ogg, opus, wav, webm, wma, wmv

$0.000035/second

36,000 seconds (10 hours)

Validity: Within 90 days after you activate Model Studio

fun-asr-2025-08-25

Snapshot

Mainland China (Beijing)

Model

Version

Supported languages

Supported sample rates

Scenarios

Supported audio formats

Unit price

fun-asr

Currently equivalent to fun-asr-2025-08-25

Stable

Chinese, English

Any

ApsaraVideo Live, phone calls, conference interpretation, and more

aac, amr, avi, flac, flv, m4a, mkv, mov, mp3, mp4, mpeg, ogg, opus, wav, webm, wma, wmv

$0.000032/second

fun-asr-2025-08-25

Snapshot

fun-asr-mtl

Currently equivalent to fun-asr-mtl-2025-08-25

Stable

Chinese, Cantonese, English, Japanese, Thai, Vietnamese, Indonesian

fun-asr-mtl-2025-08-25

Snapshot

Real-time speech recognition

Usage | API reference

Note

This feature is available only in the Mainland China (Beijing) region.

Model

Version

Supported languages

Supported sample rates

Scenarios

Supported audio formats

Unit price

fun-asr-realtime

Currently equivalent to fun-asr-realtime-2025-09-15

Stable

Chinese, English

16 kHz

ApsaraVideo Live, conferences, call centers, and more

pcm, wav, mp3, opus, speex, aac, amr

$0.000047/second

fun-asr-realtime-2025-09-15

Snapshot

Text embedding

Text embedding models convert text into numerical representations for tasks such as search, clustering, recommendation, and classification. Billing for these models is based on the number of input tokens. API reference

International (Singapore)

Model

Embedding dimension

Batch size

Maximum tokens per row

Supported languages

Price

(Million input tokens)

Free quota (Note)

text-embedding-v4

This model is part of the Qwen3-Embedding series.

2,048, 1,536, 1,024 (default), 768, 512, 256, 128, or 64

10

8,192

More than 100 languages, including Chinese, English, Spanish, French, Portuguese, Indonesian, Japanese, Korean, German, and Russian, along with various programming languages

$0.07

1,000,000 tokens

Valid for 90 days after you activate Model Studio.

text-embedding-v3

1,024 (default), 768, or 512

10

8,192

Over 50 languages, such as Chinese, English, Spanish, French, Portuguese, Indonesian, Japanese, Korean, German, and Russian

500,000 tokens

Valid for 90 days after you activate Model Studio.

Mainland China (Beijing)

Model

Embedding dimension

Batch size

Maximum tokens per row

Supported languages

Price

(Million input tokens)

Free quota (Note)

text-embedding-v4

This model is part of the Qwen3-Embedding series.

2,048, 1,536, 1,024 (default), 768, 512, 256, 128, or 64

10

8,192

More than 100 languages, including Chinese, English, Spanish, French, Portuguese, Indonesian, Japanese, Korean, German, and Russian, along with various programming languages

$0.072

No free quota
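The tables above list a batch size of 10. The sketch below assumes this means at most 10 rows per request and splits a larger list of texts accordingly. It also estimates cost from a token count by using the International (Singapore) price for text-embedding-v4. The helper names are illustrative, and no API call is made. Token counting is left to the caller because the tokenizer is not described here.

```python
from typing import Iterator, List

MAX_BATCH_SIZE = 10              # rows per request, per the table above
PRICE_PER_MILLION_TOKENS = 0.07  # USD, text-embedding-v4, Singapore


def batched(texts: List[str], size: int = MAX_BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive slices of `texts`, each with at most `size` rows."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(texts), size):
        yield texts[start:start + size]


def embedding_cost(input_tokens: int) -> float:
    """Estimate the USD cost for a number of input tokens, ignoring free quota."""
    return input_tokens * PRICE_PER_MILLION_TOKENS / 1_000_000


if __name__ == "__main__":
    docs = [f"document {i}" for i in range(23)]
    # 23 rows split into batches of 10, 10, and 3.
    print([len(batch) for batch in batched(docs)])
    print(embedding_cost(1_000_000))
```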

Multimodal embedding

The multimodal embedding model converts data such as text, images, and videos into a vector of floating-point numbers. This model enables applications such as video classification, image classification, and image-text retrieval. API reference

International (Singapore)

Model

Data format

Embedding dimension

Unit price (Million input tokens)

Free quota (View)

tongyi-embedding-vision-plus

float(32)

1,152

$0.09

1,000,000 tokens

Valid for 90 days after you activate Model Studio.

tongyi-embedding-vision-flash

float(32)

768

Image/Video: $0.03

Text: $0.09

Mainland China (Beijing)

Model

Data type

Embedding dimensions

Unit price (1,000 input tokens)

Free quota (Note)

multimodal-embedding-v1

float(32)

1,024

Free trial

No token quota limit
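Once text, images, or videos are converted to vectors, image-text retrieval reduces to comparing those vectors. The sketch below ranks candidate vectors against a query vector by cosine similarity. Cosine similarity is a common choice and is an assumption here, because this page does not name a recommended metric. The short vectors are made-up stand-ins for real embeddings.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query: Sequence[float], candidates: Dict[str, Sequence[float]]
) -> List[str]:
    """Return candidate names sorted from most to least similar to the query."""
    return sorted(
        candidates,
        key=lambda name: cosine_similarity(query, candidates[name]),
        reverse=True,
    )


if __name__ == "__main__":
    # Toy 2-dimensional vectors; real embeddings have 768 to 1,152 dimensions.
    text_query = [1.0, 0.0]
    images = {"cat.png": [1.0, 1.0], "dog.png": [0.0, 1.0], "kitten.png": [1.0, 0.0]}
    print(rank_by_similarity(text_query, images))
```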

Text rerank

This feature is typically used for semantic retrieval. Given a query, it sorts a list of candidate documents in descending order of their semantic relevance. API reference.

Note

This feature is only available in the China (Beijing) region.

Model

Maximum number of documents

Max input tokens per item

Max input tokens

Supported languages

Price (Million input tokens)

gte-rerank-v2

500

4,000

30,000

Over 50 languages, including Chinese, English, Japanese, Korean, Thai, Spanish, French, Portuguese, German, Indonesian, and Arabic

$0.115

  • Max input tokens per item: Each query or document is limited to 4,000 tokens. Input that exceeds this limit is truncated.

  • Maximum number of documents: Each request is limited to 500 documents.

  • Max input tokens: The total number of tokens for all queries and documents in a single request is limited to 30,000.
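The three limits above can be checked before a request is sent. The sketch below takes token counts that the caller has already computed and reports which limits a request would hit. It assumes that the request-level total is measured after per-item truncation, which is not stated here. The function name is illustrative.

```python
from typing import List

MAX_DOCUMENTS = 500
MAX_TOKENS_PER_ITEM = 4_000
MAX_TOTAL_TOKENS = 30_000


def check_rerank_request(query_tokens: int, doc_tokens: List[int]) -> List[str]:
    """Return warnings for a planned request; an empty list means it fits."""
    warnings: List[str] = []
    if len(doc_tokens) > MAX_DOCUMENTS:
        warnings.append(
            f"{len(doc_tokens)} documents exceed the limit of {MAX_DOCUMENTS}"
        )
    items = [query_tokens, *doc_tokens]
    # Oversized items are truncated rather than rejected, so only warn.
    oversized = sum(1 for n in items if n > MAX_TOKENS_PER_ITEM)
    if oversized:
        warnings.append(
            f"{oversized} item(s) exceed {MAX_TOKENS_PER_ITEM} tokens "
            "and would be truncated"
        )
    # Assumption: the total is counted after per-item truncation.
    total = sum(min(n, MAX_TOKENS_PER_ITEM) for n in items)
    if total > MAX_TOTAL_TOKENS:
        warnings.append(
            f"total of {total} tokens exceeds the limit of {MAX_TOTAL_TOKENS}"
        )
    return warnings


if __name__ == "__main__":
    # Eight 4,000-token documents plus a query exceed the 30,000-token total.
    for message in check_rerank_request(100, [4_000] * 8):
        print(message)
```

Note that the per-item and total limits interact: a request with documents at the 4,000-token cap reaches the 30,000-token total after only seven documents, well below the 500-document limit.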

Domain-specific

Intent recognition

The Qwen intent recognition model can quickly and accurately parse user intents in milliseconds and select the appropriate tools to resolve user issues. API reference | Usage

Note

This feature is only available in the China (Beijing) region.

Model

Context window

Max input

Max output

Input cost

Output cost

(Tokens)

(Million tokens)

tongyi-intent-detect-v3

8,192

8,192

1,024

$0.058

$0.144
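For a token-billed model such as tongyi-intent-detect-v3, the cost of a call is the input and output token counts multiplied by their per-million prices. The sketch below applies the prices and limits from the table above. The function name is illustrative, and rejecting out-of-range counts is a design choice of this sketch, not documented service behavior.

```python
MAX_INPUT_TOKENS = 8_192
MAX_OUTPUT_TOKENS = 1_024
INPUT_PRICE_PER_MILLION = 0.058   # USD
OUTPUT_PRICE_PER_MILLION = 0.144  # USD


def intent_call_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one call from its token counts."""
    if not 0 <= input_tokens <= MAX_INPUT_TOKENS:
        raise ValueError(f"input_tokens must be between 0 and {MAX_INPUT_TOKENS}")
    if not 0 <= output_tokens <= MAX_OUTPUT_TOKENS:
        raise ValueError(f"output_tokens must be between 0 and {MAX_OUTPUT_TOKENS}")
    return (
        input_tokens * INPUT_PRICE_PER_MILLION
        + output_tokens * OUTPUT_PRICE_PER_MILLION
    ) / 1_000_000


if __name__ == "__main__":
    # A call with 2,000 input tokens and 100 output tokens.
    print(intent_call_cost(2_000, 100))
```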

Role-playing

Qwen's role-playing model is ideal for scenarios that require human-like conversation, such as virtual social interactions, game NPCs, IP character replication, hardware, toys, and in-vehicle systems. This model offers enhanced capabilities in character fidelity, conversation progression, and empathetic listening compared to other Qwen models. Usage

International (Singapore)

Model

Context window

Max input

Max output

Input cost

Output cost

(Tokens)

(Million tokens)

qwen-plus-character-ja

8,192

7,680

512

$0.5

$1.4

Mainland China (Beijing)

Model

Context window

Max input

Max output

Input cost

Output cost

(Tokens)

(Million tokens)

qwen-plus-character

32,768

32,000

4,096

$0.115

$0.287

Retired models

Retired on August 20, 2025

Qwen2

This is Alibaba Cloud's open-source Qwen2. Usage | API reference | Try it online

Model

Context window

Max input

Max output

Input price

Output price

Alternative models

(Tokens)

(Million tokens)

qwen2-72b-instruct

131,072

128,000

6,144

Free for a limited time

Qwen3, DeepSeek, Kimi, and others

qwen2-57b-a14b-instruct

65,536

63,488

qwen2-7b-instruct

131,072

128,000

Qwen1.5

This is Alibaba Cloud's open-source Qwen1.5. Usage | API reference | Try it online

Model

Context window

Max input

Max output

Input price

Output price

Alternative models

(Tokens)

(Million tokens)

qwen1.5-110b-chat

8,000

6,000

2,000

Free for a limited time

Qwen3, DeepSeek, Kimi, and others

qwen1.5-72b-chat

qwen1.5-32b-chat

qwen1.5-14b-chat

qwen1.5-7b-chat