
Alibaba Cloud Model Studio: Model list

Last Updated: Jan 28, 2026

Flagship models

Global

In Global deployment mode, the access point and data storage are located in the US (Virginia) region, and inference computing resources are dynamically scheduled globally.

| Model | Description | Max context window (tokens) | Min input cost (per 1M tokens) | Min output cost (per 1M tokens) |
| --- | --- | --- | --- | --- |
| Qwen-Max | Ideal for complex tasks. The most powerful model. | 262,144 | $1.2 | $6 |
| Qwen-Plus | A balance of performance, speed, and cost. | 1,000,000 | $0.4 | $1.2 |
| Qwen-Flash | Ideal for simple jobs. Fast and low-cost. | 1,000,000 | $0.05 | $0.4 |
| Qwen-Coder | An excellent code model that excels at tool calling and environment interaction. | 1,000,000 | $0.3 | $1.5 |

International

In International deployment mode, the access point and data storage are located in the Singapore region, and inference computing resources are dynamically scheduled globally (excluding Mainland China).

| Model | Description | Max context window (tokens) | Min input cost (per 1M tokens) | Min output cost (per 1M tokens) |
| --- | --- | --- | --- | --- |
| Qwen-Max | Ideal for complex tasks. The most powerful model. | 262,144 | $1.2 | $6 |
| Qwen-Plus | A balance of performance, speed, and cost. | 1,000,000 | $0.4 | $1.2 |
| Qwen-Flash | Ideal for simple jobs. Fast and low-cost. | 1,000,000 | $0.05 | $0.4 |
| Qwen-Coder | An excellent code model that excels at tool calling and environment interaction. | 1,000,000 | $0.3 | $1.5 |

US

In US deployment mode, the access point and data storage are located in the US (Virginia) region, and inference computing resources are limited to the US.

| Model | Description | Max context window (tokens) | Min input cost (per 1M tokens) | Min output cost (per 1M tokens) |
| --- | --- | --- | --- | --- |
| Qwen-Plus | A balance of performance, speed, and cost. | 1,000,000 | $0.4 | $1.2 |
| Qwen-Flash | Ideal for simple jobs. Fast and low-cost. | 1,000,000 | $0.05 | $0.4 |

Mainland China

In Mainland China deployment mode, the access point and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

| Model | Description | Max context window (tokens) | Min input cost (per 1M tokens) | Min output cost (per 1M tokens) |
| --- | --- | --- | --- | --- |
| Qwen-Max | Ideal for complex tasks. The most powerful model. | 262,144 | $0.459 | $1.836 |
| Qwen-Plus | A balance of performance, speed, and cost. | 1,000,000 | $0.115 | $0.287 |
| Qwen-Flash | Ideal for simple jobs. Fast and low-cost. | 1,000,000 | $0.022 | $0.216 |
| Qwen-Coder | An excellent code model that excels at tool calling and environment interaction. | 1,000,000 | $0.144 | $0.574 |

Model overview

Global

In Global deployment mode, the access point and data storage are located in the US (Virginia) region, and inference computing resources are dynamically scheduled globally.

Text generation
  • General-purpose large language models: Qwen large language models, including commercial models (Qwen-Max, Qwen-Plus, Qwen-Flash) and open source models (Qwen3).
  • Multimodal models: Visual understanding model Qwen-VL.
  • Domain-specific models: Code model, Translation model.

Image generation
  • Text-to-image.
  • Image editing: Wan image editing supports scenarios such as multi-image fusion, style transfer, object detection, image restoration, and watermark removal. Model series: Wan2.6.

Video generation
  • Text-to-video: Generates high-quality videos with rich styles from a single sentence.
  • Image-to-video: First-frame-to-video uses an input image as the first frame and generates a video based on a prompt.
  • Video-to-video: Reference-to-video generates a video that maintains character consistency using a prompt and the appearance and voice from an input video.

International

In International deployment mode, the access point and data storage are located in the Singapore region, and inference computing resources are dynamically scheduled globally (excluding Mainland China).

Text generation
  • General-purpose large language models: Qwen large language models, including commercial models (Qwen-Max, Qwen-Plus, Qwen-Flash) and open source models (Qwen3, Qwen2.5).
  • Multimodal models: Visual understanding model Qwen-VL, visual reasoning model QVQ, omni-modal model Qwen-Omni, and real-time multimodal model Qwen-Omni-Realtime.
  • Domain-specific models: Code model, Translation model, Role-playing model.

Image generation
  • Text-to-image:
    • Qwen-Image: Excels at handling complex instructions, rendering both Chinese and English text, and generating high-definition, photorealistic images. Different models can be selected based on efficiency and quality requirements.
    • Wan text-to-image.
    • Z-Image (Tongyi text-to-image): A lightweight text-to-image model that quickly generates high-quality images and supports bilingual rendering in Chinese and English, complex semantic understanding, and a variety of styles and themes.
  • Image editing:
    • Qwen-Image-Edit: Supports prompts in both Chinese and English, enabling complex image and text editing operations such as style transfer, text modification, and object editing. It also supports multi-image fusion and adapts to a wide variety of industrial application scenarios.
    • Wan image editing: Supports scenarios such as multi-image fusion, style transfer, object detection, image restoration, and watermark removal. Model series: Wan2.6, Wan2.5.

Speech synthesis and recognition
  • Speech synthesis (text-to-speech): Qwen speech synthesis and Qwen realtime speech synthesis convert text to speech for scenarios such as intelligent voice customer service, audiobooks, in-car navigation, and educational tutoring.
  • Speech recognition and translation: Qwen realtime speech recognition, Qwen audio file recognition, Qwen3-LiveTranslate-Flash-Realtime, and Fun-ASR speech recognition convert speech to text for scenarios such as real-time meeting records, real-time live stream captions, and telephone customer service.

Video generation
  • Text-to-video: Generates high-quality videos with rich styles from a single sentence.
  • Image-to-video:
    • First-frame-to-video: Uses an input image as the first frame and generates a video based on a prompt.
    • First-and-last-frame-to-video: Generates a smooth and dynamic video based on the provided first and last frames and a prompt.
    • Multi-image-to-video: Generates a video by referencing the entity or background in one or more input images, combined with a prompt.
  • Video-to-video: Reference-to-video generates a video that maintains character consistency using a prompt and the appearance and voice from an input video.
  • General video editing: Performs various video editing tasks based on input text, images, and videos. For example, it can generate a new video by extracting motion features from an input video and combining them with a prompt.

Embedding
  • Text embedding: Converts text into a set of numbers that represent the text. It is suitable for search, clustering, recommendation, and classification tasks.
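Embedding vectors like these are usually compared with cosine similarity when ranking search results or clustering. The sketch below shows the scoring step only; the 4-dimensional vectors are made-up stand-ins for real model output, which typically has hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Cosine of the angle between two embedding vectors: dot product
    # divided by the product of the vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings (real models return far more dimensions).
query = [0.10, 0.30, 0.50, 0.10]
doc_close = [0.11, 0.29, 0.52, 0.09]   # embedding of a similar text
doc_far = [0.90, -0.20, 0.00, 0.40]    # embedding of an unrelated text

# The semantically closer document scores higher:
print(cosine_similarity(query, doc_close) > cosine_similarity(query, doc_far))  # True
```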

US

In US deployment mode, the access point and data storage are located in the US (Virginia) region, and inference computing resources are limited to the US.

Text generation
  • General-purpose large language models: Qwen large language models, including commercial models (Qwen-Plus, Qwen-Flash).
  • Multimodal models: Visual understanding model Qwen-VL.

Video generation
  • Text-to-video: Generates high-quality videos with rich styles from a single sentence.
  • Image-to-video: First-frame-to-video uses an input image as the first frame and generates a video based on a prompt.

Speech recognition
  • Speech recognition: Qwen audio file recognition can perform speech-to-text for scenarios such as meeting transcription and live stream captioning.

Mainland China

In Mainland China deployment mode, the access point and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Text generation
  • General-purpose large language models.
  • Multimodal models: Visual understanding model Qwen-VL, visual reasoning model QVQ, and omni-modal model Qwen-Omni.
  • Domain-specific models: Code model, Mathematical model, Translation model, Data mining model, Research model, Intention recognition model, Role-playing model.

Image generation
  • Text-to-image:
    • Qwen-Image: Excels at handling complex instructions, rendering both Chinese and English text, and generating high-definition, photorealistic images. Different models can be selected based on efficiency and quality requirements.
    • Wan text-to-image.
    • Z-Image (Tongyi text-to-image): A lightweight text-to-image model that quickly generates high-quality images and supports bilingual rendering in Chinese and English, complex semantic understanding, and a variety of styles and themes.
  • Image editing, general-purpose models:
    • Qwen-Image-Edit: Supports prompts in both Chinese and English, enabling complex image and text editing operations such as style transfer, text modification, and object editing. It also supports multi-image fusion and adapts to a wide variety of industrial application scenarios.
    • Wan image editing: Supports scenarios such as multi-image fusion, style transfer, object detection, image restoration, and watermark removal. Model series: Wan2.6, Wan2.5, Wan2.1.
  • Image editing, more models: Qwen Image Translation, OutfitAnyone.

Speech synthesis and recognition
  • Speech synthesis (text-to-speech): Qwen speech synthesis, Qwen realtime speech synthesis, and CosyVoice speech synthesis convert text to speech for scenarios such as voice-based customer service, audiobooks, in-car navigation, and educational tutoring.
  • Speech recognition and translation: Qwen realtime speech recognition, Qwen audio file recognition, Fun-ASR speech recognition, and Paraformer speech recognition convert speech to text for scenarios such as real-time meeting transcription, real-time live stream captioning, and customer service calls.

Video editing and generation
  • Text-to-video: Generates high-quality videos with rich styles from a single sentence.
  • Image-to-video:
    • First-frame-to-video: Generates a video from an initial image and a prompt.
    • First-and-last-frame-to-video: Generates a video with a natural transition based on the first and last frame images and a prompt.
    • Multi-image-to-video: Generates a video from one or more images and a text prompt, based on the entities or backgrounds in the source images.
    • Dance video generation: AnimateAnyone generates dance videos from a character image and an action video.
    • Lip-sync video generation (image + audio):
      • Wan digital human generates video from a person's image and audio. It provides a wide and natural range of motion, supports frame sizes such as full-body, half-body, and portrait, and suits scenarios such as singing and performance.
      • EMO uses a person's image and audio to generate video with highly expressive lip-syncing and facial expressions. It supports portrait and half-body shots and is ideal for close-up scenarios.
      • LivePortrait uses a portrait image and an audio file and is ideal for voice narration scenarios.
    • Emoji video generation: Emoji generates facial emoji videos from facial images and preset dynamic facial templates.
  • Video-to-video: Reference-to-video generates a video that maintains character consistency using a prompt and the appearance and voice from an input video.
  • General-purpose video editing:
    • General video editing: Performs various video editing tasks based on text prompts, images, and videos. For example, you can generate a new video by extracting motion features from an input video and combining them with a text prompt.
    • Video lip-syncing: VideoRetalk uses a person's video and audio and is ideal for scenarios such as short video production and video translation.
    • Video style transfer: Video style transform transforms videos into various styles, such as Japanese manga and American comics.

Vector
  • Text embedding: Converts text into a set of numbers that represent the text. It is used for search, clustering, recommendation, and classification.
  • Multimodal embedding: Converts text, images, and speech into a set of numbers. It is used for audio and video classification, image classification, and image-text retrieval.

Text generation - Qwen

The following are the Qwen commercial models. Compared with the open source versions, the commercial models offer the latest capabilities and improvements.

The parameter sizes of the commercial models are not disclosed. Each model is updated periodically. To pin a fixed version, select a snapshot version; a snapshot version is typically maintained for one month after the next snapshot version is released. For higher rate limits, we recommend the stable or latest version.

Qwen-Max

The most powerful model in the Qwen series, ideal for complex, multi-step tasks. Usage | API reference | Try online

Global

In global deployment mode, the endpoint and data storage are both located in the US (Virginia) region. Model inference compute resources are dynamically scheduled worldwide.

All figures below are in tokens. Every model in this table uses tiered pricing (per 1M tokens; see the following table) and has no free quota.

| Model | Version | Mode | Context window | Max input | Max CoT | Max output |
| --- | --- | --- | --- | --- | --- | --- |
| qwen3-max (currently matches the capability of qwen3-max-2025-09-23; context cache discount available) | Stable | Non-thinking only | 262,144 | 258,048 | - | 65,536 |
| qwen3-max-2025-09-23 | Snapshot | Non-thinking only | 262,144 | 258,048 | - | 65,536 |
| qwen3-max-preview (context cache discount available) | Preview | Thinking | 262,144 | 258,048 | 81,920 | 32,768 |
| | | Non-thinking | 262,144 | 258,048 | - | 65,536 |

The models above use tiered pricing based on the number of input tokens in the current request.

| Input tokens per request | Input price (per 1M tokens) | Output price (per 1M tokens, CoT + response) |
| --- | --- | --- |
| 0 < tokens ≤ 32K | $1.2 | $6 |
| 32K < tokens ≤ 128K | $2.4 | $12 |
| 128K < tokens ≤ 252K | $3 | $15 |
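The schedule above can be sketched as follows: the input-token count of a single request selects the tier, and that tier's input and output rates apply to the whole request. The helper name is illustrative, and the code assumes "32K" and similar bounds mean multiples of 1,024 tokens, which matches the 258,048 (252 × 1,024) max-input figure above.

```python
# Tiered pricing for qwen3-max (Global), USD per 1M tokens.
TIERS = [
    # (tier upper bound in input tokens, input $/1M, output $/1M)
    (32 * 1024, 1.2, 6.0),
    (128 * 1024, 2.4, 12.0),
    (252 * 1024, 3.0, 15.0),
]

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one request. The input-token count alone selects
    the tier; that tier's rates then apply to the whole request."""
    for bound, input_price, output_price in TIERS:
        if input_tokens <= bound:
            return (input_tokens * input_price
                    + output_tokens * output_price) / 1_000_000
    raise ValueError("input exceeds the 252K-token tier limit")

# 10,000 input and 2,000 output tokens fall in the first tier:
# 10,000 * $1.2/1M + 2,000 * $6/1M = $0.024
print(request_cost(10_000, 2_000))  # 0.024
```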

International

In international deployment mode, the endpoint and data storage are both located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding Mainland China).

All figures below are in tokens. Every model in this table uses tiered pricing (per 1M tokens; see the following table). Free quota: 1 million tokens for each model, valid for 90 days after activating Model Studio.

| Model | Version | Mode | Context window | Max input | Max CoT | Max output |
| --- | --- | --- | --- | --- | --- | --- |
| qwen3-max (currently matches the capability of qwen3-max-2025-09-23) | Stable | Non-thinking only | 262,144 | 258,048 | - | 65,536 |
| qwen3-max-2026-01-23 (supports calling built-in tools) | Snapshot | Thinking | 262,144 | 258,048 | 81,920 | 65,536 |
| | | Non-thinking | 262,144 | 258,048 | - | 65,536 |
| qwen3-max-2025-09-23 | Snapshot | Non-thinking only | 262,144 | 258,048 | - | 65,536 |
| qwen3-max-preview | Preview | Thinking | 262,144 | 258,048 | 81,920 | 32,768 |
| | | Non-thinking | 262,144 | 258,048 | - | 65,536 |

The models above use tiered pricing based on the number of input tokens in the current request. qwen3-max and qwen3-max-preview support context cache.

| Input tokens per request | Input price (per 1M tokens) | Output price (per 1M tokens, CoT + response) |
| --- | --- | --- |
| 0 < tokens ≤ 32K | $1.2 | $6 |
| 32K < tokens ≤ 128K | $2.4 | $12 |
| 128K < tokens ≤ 252K | $3 | $15 |

More models

Free quota: 1 million tokens for each model, valid for 90 days after activating Model Studio.

| Model | Version | Context window (tokens) | Max input (tokens) | Max output (tokens) | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
| --- | --- | --- | --- | --- | --- | --- |
| qwen-max (currently matches the capability of qwen-max-2025-01-25; batch calls at half price) | Stable | 32,768 | 30,720 | 8,192 | $1.6 | $6.4 |
| qwen-max-latest (always matches the latest snapshot version) | Latest | 32,768 | 30,720 | 8,192 | $1.6 | $6.4 |
| qwen-max-2025-01-25 (also known as qwen-max-0125, Qwen2.5-Max) | Snapshot | 32,768 | 30,720 | 8,192 | $1.6 | $6.4 |

Mainland China

In Mainland China deployment mode, the endpoint and data storage are both located in the Beijing region. Model inference compute resources are limited to Mainland China.

All figures below are in tokens. Every model in this table uses tiered pricing (per 1M tokens; see the following table).

| Model | Version | Mode | Context window | Max input | Max CoT | Max output |
| --- | --- | --- | --- | --- | --- | --- |
| qwen3-max (currently matches the capability of qwen3-max-2025-09-23) | Stable | Non-thinking only | 262,144 | 258,048 | - | 65,536 |
| qwen3-max-2026-01-23 (supports calling built-in tools) | Snapshot | Thinking | 262,144 | 258,048 | 81,920 | 65,536 |
| | | Non-thinking | 262,144 | 258,048 | - | 65,536 |
| qwen3-max-2025-09-23 | Snapshot | Non-thinking only | 262,144 | 258,048 | - | 65,536 |
| qwen3-max-preview | Preview | Thinking | 262,144 | 258,048 | 81,920 | 32,768 |
| | | Non-thinking | 262,144 | 258,048 | - | 65,536 |

The models above use tiered pricing based on the number of input tokens in the current request.

| Model | Input tokens per request | Input price (per 1M tokens) | Output price (per 1M tokens, CoT + response) |
| --- | --- | --- | --- |
| qwen3-max (batch calls at half price; context cache discount available) | 0 < tokens ≤ 32K | $0.359 | $1.434 |
| | 32K < tokens ≤ 128K | $0.574 | $2.294 |
| | 128K < tokens ≤ 252K | $1.004 | $4.014 |
| qwen3-max-2026-01-23 | 0 < tokens ≤ 32K | $0.359 | $1.434 |
| | 32K < tokens ≤ 128K | $0.574 | $2.294 |
| | 128K < tokens ≤ 252K | $1.004 | $4.014 |
| qwen3-max-2025-09-23 | 0 < tokens ≤ 32K | $0.861 | $3.441 |
| | 32K < tokens ≤ 128K | $1.434 | $5.735 |
| | 128K < tokens ≤ 252K | $2.151 | $8.602 |
| qwen3-max-preview (context cache discount available) | 0 < tokens ≤ 32K | $0.861 | $3.441 |
| | 32K < tokens ≤ 128K | $1.434 | $5.735 |
| | 128K < tokens ≤ 252K | $2.151 | $8.602 |

More models

| Model | Version | Context window (tokens) | Max input (tokens) | Max output (tokens) | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
| --- | --- | --- | --- | --- | --- | --- |
| qwen-max (currently matches the capability of qwen-max-2024-09-19; batch calls at half price) | Stable | 32,768 | 30,720 | 8,192 | $0.345 | $1.377 |
| qwen-max-latest (always matches the latest snapshot version; batch calls at half price) | Latest | 131,072 | 129,024 | 8,192 | $0.345 | $1.377 |
| qwen-max-2025-01-25 (also known as qwen-max-0125, Qwen2.5-Max) | Snapshot | 131,072 | 129,024 | 8,192 | $0.345 | $1.377 |
| qwen-max-2024-09-19 (also known as qwen-max-0919) | Snapshot | 32,768 | 30,720 | 8,192 | $2.868 | $8.602 |

The qwen3-max-2026-01-23 model integrates thinking and non-thinking modes more effectively than the September 23, 2025 snapshot, resulting in significantly improved overall performance. In thinking mode, the model can call three built-in tools (web search, webpage information extraction, and a code interpreter) to achieve higher accuracy on complex problems by leveraging external tools during reasoning.

qwen3-max, qwen3-max-2026-01-23, and qwen3-max-2025-09-23 natively support the search agent feature. For details, see Web search.
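As a sketch of how such a request might be assembled: DashScope-specific switches are commonly passed as extra request-body fields alongside the standard chat-completion payload. The `enable_search` field name and the `build_request` helper here are assumptions; confirm the exact field and its placement in the Web search documentation.

```python
# Sketch: request body for qwen3-max with web search turned on.
def build_request(model: str, prompt: str, enable_search: bool = False) -> dict:
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    if enable_search:
        body["enable_search"] = True  # assumed DashScope extension field
    return body

body = build_request("qwen3-max", "Summarize today's top AI news.", enable_search=True)
print(sorted(body))  # ['enable_search', 'messages', 'model']
```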

Qwen-Plus

Offers a balance of inference performance, cost, and speed between Qwen-Max and Qwen-Flash. Ideal for moderately complex tasks. Usage | API reference | Try online | Deep Thinking

Global

In global deployment mode, the endpoint and data storage are both located in the US (Virginia) region. Model inference compute resources are dynamically scheduled worldwide.

All figures below are in tokens. Every model in this table uses tiered pricing (per 1M tokens; see the following table).

| Model | Version | Context window | Max input | Max output | Max CoT |
| --- | --- | --- | --- | --- | --- |
| qwen-plus (currently matches the capability of qwen-plus-2025-12-01; part of the Qwen3 series) | Stable | 1,000,000 | Thinking mode: 995,904; non-thinking mode: 997,952 | 32,768 | 81,920 |
| qwen-plus-2025-12-01 (part of the Qwen3 series) | Snapshot | 1,000,000 | Thinking mode: 995,904; non-thinking mode: 997,952 | 32,768 | 81,920 |
| qwen-plus-2025-09-11 (part of the Qwen3 series) | Snapshot | 1,000,000 | Thinking mode: 995,904; non-thinking mode: 997,952 | 32,768 | 81,920 |
| qwen-plus-2025-07-28 (part of the Qwen3 series) | Snapshot | 1,000,000 | Thinking mode: 995,904; non-thinking mode: 997,952 | 32,768 | 81,920 |

The models above use tiered pricing based on the number of input tokens in the current request. qwen-plus supports context cache.

| Input tokens per request | Input price (per 1M tokens) | Output mode | Output price (per 1M tokens) |
| --- | --- | --- | --- |
| 0 < tokens ≤ 256K | $0.4 | Non-thinking | $1.2 |
| | | Thinking | $4 |
| 256K < tokens ≤ 1M | $1.2 | Non-thinking | $3.6 |
| | | Thinking | $12 |
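Unlike the Qwen-Max schedule, the output rate here also depends on whether thinking mode was used. A minimal sketch of the rule, assuming "256K" means 262,144 tokens and "1M" means the stated 1,000,000-token context window (the function name is illustrative):

```python
# Tiered pricing for qwen-plus (Global), USD per 1M tokens.
TIERS = [
    # (upper bound in input tokens, input $/1M,
    #  non-thinking output $/1M, thinking output $/1M)
    (262_144, 0.4, 1.2, 4.0),
    (1_000_000, 1.2, 3.6, 12.0),
]

def request_cost(input_tokens: int, output_tokens: int,
                 thinking: bool = False) -> float:
    """Cost in USD for one request; the output rate depends on the mode."""
    for bound, input_price, plain_out, thinking_out in TIERS:
        if input_tokens <= bound:
            out_price = thinking_out if thinking else plain_out
            return (input_tokens * input_price
                    + output_tokens * out_price) / 1_000_000
    raise ValueError("input exceeds the 1M-token tier limit")

# The same request costs more per output token with thinking enabled:
print(request_cost(100_000, 5_000, thinking=False))  # 0.046
print(request_cost(100_000, 5_000, thinking=True))   # 0.06
```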

International

In international deployment mode, the endpoint and data storage are both located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding Mainland China).

All figures below are in tokens; costs are per 1M tokens. Free quota: 1 million tokens for each model, valid for 90 days after activating Model Studio.

| Model | Version | Context window | Max input | Max output | Input cost | Output cost |
| --- | --- | --- | --- | --- | --- | --- |
| qwen-plus (currently matches the capability of qwen-plus-2025-12-01; part of the Qwen3 series; batch calls at half price) | Stable | 1,000,000 | Thinking: 995,904; non-thinking: 997,952 | 32,768 (Max CoT: 81,920) | Tiered pricing; see below | Tiered pricing; see below |
| qwen-plus-latest (currently matches the capability of qwen-plus-2025-12-01; part of the Qwen3 series) | Latest | 1,000,000 | Thinking: 995,904; non-thinking: 997,952 | 32,768 (Max CoT: 81,920) | Tiered pricing; see below | Tiered pricing; see below |
| qwen-plus-2025-12-01 (part of the Qwen3 series) | Snapshot | 1,000,000 | Thinking: 995,904; non-thinking: 997,952 | 32,768 (Max CoT: 81,920) | Tiered pricing; see below | Tiered pricing; see below |
| qwen-plus-2025-09-11 (part of the Qwen3 series) | Snapshot | 1,000,000 | Thinking: 995,904; non-thinking: 997,952 | 32,768 (Max CoT: 81,920) | Tiered pricing; see below | Tiered pricing; see below |
| qwen-plus-2025-07-28 (also known as qwen-plus-0728; part of the Qwen3 series) | Snapshot | 1,000,000 | Thinking: 995,904; non-thinking: 997,952 | 32,768 (Max CoT: 81,920) | Tiered pricing; see below | Tiered pricing; see below |
| qwen-plus-2025-07-14 (also known as qwen-plus-0714; part of the Qwen3 series) | Snapshot | 131,072 | Thinking: 98,304; non-thinking: 129,024 | 16,384 (Max CoT: 38,912) | $0.4 | Thinking: $4; non-thinking: $1.2 |
| qwen-plus-2025-04-28 (also known as qwen-plus-0428; part of the Qwen3 series) | Snapshot | 131,072 | Thinking: 98,304; non-thinking: 129,024 | 16,384 (Max CoT: 38,912) | $0.4 | Thinking: $4; non-thinking: $1.2 |
| qwen-plus-2025-01-25 (also known as qwen-plus-0125) | Snapshot | 131,072 | 129,024 | 8,192 | $0.4 | $1.2 |

qwen-plus, qwen-plus-latest, qwen-plus-2025-12-01, qwen-plus-2025-09-11, and qwen-plus-2025-07-28 use tiered pricing based on the number of input tokens in the current request.

| Input tokens per request | Input price (per 1M tokens) | Output mode | Output price (per 1M tokens) |
| --- | --- | --- | --- |
| 0 < tokens ≤ 256K | $0.4 | Non-thinking | $1.2 |
| | | Thinking | $4 |
| 256K < tokens ≤ 1M | $1.2 | Non-thinking | $3.6 |
| | | Thinking | $12 |

US

In US deployment mode, the endpoint and data storage are both located in the US (Virginia) region. Model inference compute resources are limited to the US.

All figures below are in tokens. Every model in this table uses tiered pricing (per 1M tokens; see the following table). Free quota: none.

| Model | Version | Context window | Max input | Max output |
| --- | --- | --- | --- | --- |
| qwen-plus-us (currently matches the capability of qwen-plus-2025-12-01-us; part of the Qwen3 series) | Stable | 1,000,000 | Thinking: 995,904; non-thinking: 997,952 | 32,768 (Max CoT: 81,920) |
| qwen-plus-2025-12-01-us (part of the Qwen3 series) | Snapshot | 1,000,000 | Thinking: 995,904; non-thinking: 997,952 | 32,768 (Max CoT: 81,920) |

The models above use tiered pricing based on the number of input tokens in the current request. qwen-plus-us supports context cache.

| Input tokens per request | Input price (per 1M tokens) | Output mode | Output price (per 1M tokens) |
| --- | --- | --- | --- |
| 0 < tokens ≤ 256K | $0.4 | Non-thinking | $1.2 |
| | | Thinking | $4 |
| 256K < tokens ≤ 1M | $1.2 | Non-thinking | $3.6 |
| | | Thinking | $12 |

Mainland China

In Mainland China deployment mode, the endpoint and data storage are both located in the Beijing region. Model inference compute resources are limited to Mainland China.

All figures below are in tokens; costs are per 1M tokens.

| Model | Version | Context window | Max input | Max output | Input cost | Output cost |
| --- | --- | --- | --- | --- | --- | --- |
| qwen-plus (currently matches the capability of qwen-plus-2025-12-01; part of the Qwen3 series; batch calls at half price) | Stable | 1,000,000 | Thinking: 995,904; non-thinking: 997,952 | 32,768 (Max CoT: 81,920) | Tiered pricing; see below | Tiered pricing; see below |
| qwen-plus-latest (currently matches the capability of qwen-plus-2025-12-01; part of the Qwen3 series; batch calls at half price) | Latest | 1,000,000 | Thinking: 995,904; non-thinking: 997,952 | 32,768 (Max CoT: 81,920) | Tiered pricing; see below | Tiered pricing; see below |
| qwen-plus-2025-12-01 (part of the Qwen3 series) | Snapshot | 1,000,000 | Thinking: 995,904; non-thinking: 997,952 | 32,768 (Max CoT: 81,920) | Tiered pricing; see below | Tiered pricing; see below |
| qwen-plus-2025-09-11 (part of the Qwen3 series) | Snapshot | 1,000,000 | Thinking: 995,904; non-thinking: 997,952 | 32,768 (Max CoT: 81,920) | Tiered pricing; see below | Tiered pricing; see below |
| qwen-plus-2025-07-28 (also known as qwen-plus-0728; part of the Qwen3 series) | Snapshot | 1,000,000 | Thinking: 995,904; non-thinking: 997,952 | 32,768 (Max CoT: 81,920) | Tiered pricing; see below | Tiered pricing; see below |
| qwen-plus-2025-07-14 (also known as qwen-plus-0714; part of the Qwen3 series) | Snapshot | 131,072 | Thinking: 98,304; non-thinking: 129,024 | 16,384 (Max CoT: 38,912) | $0.115 | Thinking: $1.147; non-thinking: $0.287 |
| qwen-plus-2025-04-28 (also known as qwen-plus-0428; part of the Qwen3 series) | Snapshot | 131,072 | Thinking: 98,304; non-thinking: 129,024 | 16,384 (Max CoT: 38,912) | $0.115 | Thinking: $1.147; non-thinking: $0.287 |

qwen-plus, qwen-plus-latest, qwen-plus-2025-12-01, qwen-plus-2025-09-11, and qwen-plus-2025-07-28 use tiered pricing based on the number of input tokens in the current request.

| Input tokens per request | Input price (per 1M tokens) | Output mode | Output price (per 1M tokens) |
| --- | --- | --- | --- |
| 0 < tokens ≤ 128K | $0.115 | Non-thinking | $0.287 |
| | | Thinking | $1.147 |
| 128K < tokens ≤ 256K | $0.345 | Non-thinking | $2.868 |
| | | Thinking | $3.441 |
| 256K < tokens ≤ 1M | $0.689 | Non-thinking | $6.881 |
| | | Thinking | $9.175 |

The models above support both thinking and non-thinking modes; switch between them with the enable_thinking parameter. These models also offer the following significant improvements:

  1. Reasoning ability: Significantly outperforms QwQ and similarly sized non-reasoning models in evaluations for math, code, and logical reasoning, achieving top-tier industry performance for a model of its size.

  2. Human preference alignment: Greatly enhanced capabilities for creative writing, role-playing, multi-turn conversation, and instruction following. Its general abilities significantly surpass those of similarly sized models.

  3. Agent capabilities: Achieves industry-leading performance in both thinking and non-thinking modes and enables precise external tool invocation.

  4. Multilingual support: Supports over 100 languages and dialects and provides notable improvements in multilingual translation, instruction understanding, and commonsense reasoning.

  5. Response formatting: Resolves issues found in previous versions, such as incorrect Markdown formatting, response truncation, and incorrectly formatted boxed output.

For the models above, if thinking mode is enabled but no reasoning process is output, billing applies at the non-thinking mode rate.
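That billing rule can be sketched as a small rate selector. The prices are the Mainland China 0-128K tier for qwen-plus from the table above; the `reasoning_tokens` parameter is an illustrative stand-in for however the response reports its chain-of-thought length.

```python
# Sketch of the billing rule above: a request with thinking mode enabled
# is billed at the non-thinking output rate when the response contains
# no reasoning content.
THINKING_RATE = 1.147       # USD per 1M output tokens (thinking mode)
NON_THINKING_RATE = 0.287   # USD per 1M output tokens (non-thinking mode)

def output_rate(thinking_enabled: bool, reasoning_tokens: int) -> float:
    if thinking_enabled and reasoning_tokens > 0:
        return THINKING_RATE
    return NON_THINKING_RATE

print(output_rate(True, 512))   # 1.147  (thinking mode, CoT produced)
print(output_rate(True, 0))     # 0.287  (thinking enabled, no CoT output)
print(output_rate(False, 0))    # 0.287
```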

More models

| Model | Version | Context window (tokens) | Max input (tokens) | Max output (tokens) | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
| --- | --- | --- | --- | --- | --- | --- |
| qwen-plus-2025-01-25 (also known as qwen-plus-0125) | Snapshot | 131,072 | 129,024 | 8,192 | $0.115 | $0.287 |
| qwen-plus-2025-01-12 (also known as qwen-plus-0112) | Snapshot | 131,072 | 129,024 | 8,192 | $0.115 | $0.287 |
| qwen-plus-2024-12-20 (also known as qwen-plus-1220) | Snapshot | 131,072 | 129,024 | 8,192 | $0.115 | $0.287 |
| qwen-plus-2024-11-27 (also known as qwen-plus-1127) | Snapshot | 131,072 | 129,024 | 8,192 | $0.115 | $0.287 |
| qwen-plus-2024-11-25 (also known as qwen-plus-1125) | Snapshot | 131,072 | 129,024 | 8,192 | $0.115 | $0.287 |
| qwen-plus-2024-09-19 (also known as qwen-plus-0919) | Snapshot | 131,072 | 129,024 | 8,192 | $0.115 | $0.287 |
| qwen-plus-2024-08-06 (also known as qwen-plus-0806) | Snapshot | 128,000 | | 8,192 | $0.574 | $1.721 |

Qwen-Flash

The fastest and lowest-cost model in the Qwen series, ideal for simple tasks. Qwen-Flash uses flexible tiered pricing, which provides more cost-effective billing than Qwen-Turbo. Usage | API reference | Try online | Thinking mode

Global

In global deployment mode, the endpoint and data storage are both located in the US (Virginia) region. Model inference compute resources are dynamically scheduled worldwide.

All figures below are in tokens. Every model in this table uses tiered pricing (per 1M tokens; see the following table), and the output price covers CoT + output.

| Model | Version | Mode | Context window | Max input | Max CoT | Max output |
| --- | --- | --- | --- | --- | --- | --- |
| qwen-flash (currently matches the capability of qwen-flash-2025-07-28; part of the Qwen3 series) | Stable | Thinking | 1,000,000 | 995,904 | 81,920 | 32,768 |
| | | Non-thinking | 1,000,000 | 997,952 | - | 32,768 |
| qwen-flash-2025-07-28 (part of the Qwen3 series) | Snapshot | Thinking | 1,000,000 | 995,904 | 81,920 | 32,768 |
| | | Non-thinking | 1,000,000 | 997,952 | - | 32,768 |

The models above use tiered pricing based on the number of input tokens in the current request. qwen-flash supports context cache.

| Input tokens per request | Input price (per 1M tokens) | Output price (per 1M tokens, CoT + output) |
| --- | --- | --- |
| 0 < tokens ≤ 256K | $0.05 | $0.4 |
| 256K < tokens ≤ 1M | $0.25 | $2 |

International

In international deployment mode, the endpoint and data storage are both located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding Mainland China).

All figures below are in tokens. Every model in this table uses tiered pricing (per 1M tokens; see the following table), and the output price covers CoT + output. Free quota: 1 million tokens for each model, valid for 90 days after activating Model Studio.

| Model | Version | Mode | Context window | Max input | Max CoT | Max output |
| --- | --- | --- | --- | --- | --- | --- |
| qwen-flash (currently matches the capability of qwen-flash-2025-07-28; part of the Qwen3 series; batch calls at half price) | Stable | Thinking | 1,000,000 | 995,904 | 81,920 | 32,768 |
| | | Non-thinking | 1,000,000 | 997,952 | - | 32,768 |
| qwen-flash-2025-07-28 (part of the Qwen3 series) | Snapshot | Thinking | 1,000,000 | 995,904 | 81,920 | 32,768 |
| | | Non-thinking | 1,000,000 | 997,952 | - | 32,768 |

The models above use tiered pricing based on the number of input tokens in the current request. qwen-flash supports context cache and batch calls.

| Input tokens per request | Input price (per 1M tokens) | Output price (per 1M tokens, CoT + output) |
| --- | --- | --- |
| 0 < tokens ≤ 256K | $0.05 | $0.4 |
| 256K < tokens ≤ 1M | $0.25 | $2 |

US

In US deployment mode, the endpoint and data storage are both located in the US (Virginia) region. Model inference compute resources are limited to the US.

All figures below are in tokens. Every model in this table uses tiered pricing (per 1M tokens; see the following table), and the output price covers CoT + output. Free quota: none.

| Model | Version | Mode | Context window | Max input | Max CoT | Max output |
| --- | --- | --- | --- | --- | --- | --- |
| qwen-flash-us (currently matches the capability of qwen-flash-2025-07-28-us; part of the Qwen3 series) | Stable | Thinking | 1,000,000 | 995,904 | 81,920 | 32,768 |
| | | Non-thinking | 1,000,000 | 997,952 | - | 32,768 |
| qwen-flash-2025-07-28-us (part of the Qwen3 series) | Snapshot | Thinking | 1,000,000 | 995,904 | 81,920 | 32,768 |
| | | Non-thinking | 1,000,000 | 997,952 | - | 32,768 |

The models above use tiered pricing based on the number of input tokens in the current request.

| Input tokens per request | Input price (per 1M tokens) | Output price (per 1M tokens, CoT + output) |
| --- | --- | --- |
| 0 < tokens ≤ 256K | $0.05 | $0.4 |
| 256K < tokens ≤ 1M | $0.25 | $2 |

Mainland China

In Mainland China deployment mode, the endpoint and data storage are both located in the Beijing region. Model inference compute resources are limited to Mainland China.

All figures below are in tokens. Every model in this table uses tiered pricing (per 1M tokens; see the following table), and the output price covers CoT + output.

| Model | Version | Mode | Context window | Max input | Max CoT | Max output |
| --- | --- | --- | --- | --- | --- | --- |
| qwen-flash (currently matches the capability of qwen-flash-2025-07-28; part of the Qwen3 series; batch calls at half price) | Stable | Thinking | 1,000,000 | 995,904 | 81,920 | 32,768 |
| | | Non-thinking | 1,000,000 | 997,952 | - | 32,768 |
| qwen-flash-2025-07-28 (part of the Qwen3 series) | Snapshot | Thinking | 1,000,000 | 995,904 | 81,920 | 32,768 |
| | | Non-thinking | 1,000,000 | 997,952 | - | 32,768 |

The models above use tiered pricing based on the number of input tokens in the current request. qwen-flash supports context cache.

| Input tokens per request | Input price (per 1M tokens) | Output price (per 1M tokens, CoT + output) |
| --- | --- | --- |
| 0 < tokens ≤ 128K | $0.022 | $0.216 |
| 128K < tokens ≤ 256K | $0.087 | $0.861 |
| 256K < tokens ≤ 1M | $0.173 | $1.721 |

Qwen-Turbo

Qwen-Turbo will no longer receive updates. We recommend that you replace it with Qwen-Flash, which uses flexible tiered pricing for more cost-effective billing. Usage | API reference | Try online | Deep Thinking

International

In international deployment mode, the endpoint and data storage are both located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding Mainland China).

All figures below are in tokens; costs are per 1M tokens. Free quota: 1 million tokens for each model, valid for 90 days after activating Model Studio.

| Model | Version | Context window | Max input | Max output | Input cost | Output cost |
| --- | --- | --- | --- | --- | --- | --- |
| qwen-turbo (currently matches the capability of qwen-turbo-2025-04-28; part of the Qwen3 series; batch calls at half price) | Stable | Thinking: 131,072; non-thinking: 1,000,000 | Thinking: 98,304; non-thinking: 1,000,000 | 16,384 (Max CoT: 38,912) | $0.05 | Thinking: $0.5; non-thinking: $0.2 |
| qwen-turbo-latest (always matches the latest snapshot version; part of the Qwen3 series) | Latest | Thinking: 131,072; non-thinking: 1,000,000 | Thinking: 98,304; non-thinking: 1,000,000 | 16,384 (Max CoT: 38,912) | $0.05 | Thinking: $0.5; non-thinking: $0.2 |
| qwen-turbo-2025-04-28 (also known as qwen-turbo-0428; part of the Qwen3 series) | Snapshot | Thinking: 131,072; non-thinking: 1,000,000 | Thinking: 98,304; non-thinking: 1,000,000 | 16,384 (Max CoT: 38,912) | $0.05 | Thinking: $0.5; non-thinking: $0.2 |
| qwen-turbo-2024-11-01 (also known as qwen-turbo-1101) | Snapshot | 1,000,000 | 1,000,000 | 8,192 | $0.05 | $0.2 |

Mainland China

In Mainland China deployment mode, the endpoint and data storage are both located in the Beijing region. Model inference compute resources are limited to Mainland China.

All figures below are in tokens; costs are per 1M tokens.

| Model | Version | Context window | Max input | Max output | Input cost | Output cost |
| --- | --- | --- | --- | --- | --- | --- |
| qwen-turbo (currently matches the capability of qwen-turbo-2025-04-28; part of the Qwen3 series) | Stable | Thinking: 131,072; non-thinking: 1,000,000 | Thinking: 98,304; non-thinking: 1,000,000 | 16,384 (Max CoT: 38,912) | $0.044 | Thinking: $0.431; non-thinking: $0.087 |
| qwen-turbo-latest (always matches the latest snapshot version; part of the Qwen3 series) | Latest | Thinking: 131,072; non-thinking: 1,000,000 | Thinking: 98,304; non-thinking: 1,000,000 | 16,384 (Max CoT: 38,912) | $0.044 | Thinking: $0.431; non-thinking: $0.087 |
| qwen-turbo-2025-07-15 (also known as qwen-turbo-0715; part of the Qwen3 series) | Snapshot | Thinking: 131,072; non-thinking: 1,000,000 | Thinking: 98,304; non-thinking: 1,000,000 | 16,384 (Max CoT: 38,912) | $0.044 | Thinking: $0.431; non-thinking: $0.087 |
| qwen-turbo-2025-04-28 (also known as qwen-turbo-0428; part of the Qwen3 series) | Snapshot | Thinking: 131,072; non-thinking: 1,000,000 | Thinking: 98,304; non-thinking: 1,000,000 | 16,384 (Max CoT: 38,912) | $0.044 | Thinking: $0.431; non-thinking: $0.087 |

QwQ

QwQ is a reasoning model trained on the Qwen2.5 base and significantly enhanced through reinforcement learning. It achieves performance comparable to the full-capacity DeepSeek-R1 on core metrics, such as AIME 24/25 and LiveCodeBench, and on certain general benchmarks, such as IFEval and LiveBench. Usage

International

In international deployment mode, the endpoint and data storage are both located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding Mainland China).

Model

Version

Context window

Max input

Max CoT

Max response

Input cost

Output cost

Free quota

(Note)

(tokens)

(per 1M tokens)

qwq-plus

Stable

131,072

98,304

32,768

8,192

$0.8

$2.4

1 million tokens

Valid for 90 days after activating Model Studio

Mainland China

In Mainland China deployment mode, the endpoint and data storage are both located in the Beijing region. Model inference compute resources are limited to Mainland China.

Model

Version

Context window

Max input

Max CoT

Max response

Input cost

Output cost

(tokens)

(per 1M tokens)

qwq-plus

Currently matches the capability of qwq-plus-2025-03-05
Batch calls at half price

Stable

131,072

98,304

32,768

8,192

$0.230

$0.574

qwq-plus-latest

Always matches the latest snapshot version

Latest

qwq-plus-2025-03-05

Also known as qwq-plus-0305

Snapshot

Qwen-Long

This Qwen series model features the longest context window, balanced capabilities, and a low cost. It is ideal for long-text analysis, information extraction, summarization, and classification tasks. Usage | Try online

Note

Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Model

Version

Context window

Max input

Max output

Input cost

Output cost

(tokens)

(per 1M tokens)

qwen-long-latest

Always matches the latest snapshot version
Batch calls at half price

Stable

10,000,000

10,000,000

32,768

$0.072

$0.287

qwen-long-2025-01-25

Also known as qwen-long-0125

Snapshot

Qwen-Omni

Qwen-Omni accepts multimodal inputs, such as text, images, audio, and video, and generates text or speech responses. It offers multiple expressive, human-like voice options and supports multilingual and dialect speech output. This makes it suitable for audiovisual chat scenarios, such as visual recognition, emotion sensing, and education. Usage | API reference

International

In international deployment mode, the endpoint and data storage are both located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding Mainland China).

Model

Version

Mode

Context window

Max input

Max CoT

Max output

Free quota

(Note)

(tokens)

qwen3-omni-flash

Currently matches the capability of qwen3-omni-flash-2025-12-01

Stable

Thinking mode

65,536

16,384

32,768

16,384

1 million tokens each (regardless of modality)

Valid for 90 days after activating Model Studio

Non-thinking mode

49,152

-

qwen3-omni-flash-2025-12-01

Snapshot

Thinking mode

65,536

16,384

32,768

16,384

Non-thinking mode

49,152

-

qwen3-omni-flash-2025-09-15

Also known as qwen3-omni-flash-0915

Snapshot

Thinking mode

65,536

16,384

32,768

16,384

Non-thinking mode

49,152

-

After the free quota is used up, input and output are billed as follows. The pricing is the same for thinking and non-thinking modes. Audio output is not supported in thinking mode.

Input

Unit price (per 1M tokens)

Text

$0.43

Audio

$3.81

Image/Video

$0.78

Output

Unit price (per 1M tokens)

Text

$1.66 (input contains text only)

$3.06 (input contains images/video/audio)

Text + Audio

This item is not billed in thinking mode.

$15.11 (audio)

Text output is not billed.

More models

Model

Version

Context window

Max input

Max output

Free quota

(Note)

(tokens)

qwen-omni-turbo

Matches the capabilities of qwen-omni-turbo-2025-03-26

Stable

32,768

30,720

2,048

1 million tokens (regardless of modality)

Valid for 90 days after you activate Model Studio

qwen-omni-turbo-latest

Points to the latest snapshot version
Same capabilities

Latest

qwen-omni-turbo-2025-03-26

Also known as qwen-omni-turbo-0326

Snapshot

After the free quota for commercial models is used up, the following input and output billing rules apply:

Input

Unit price (per 1M tokens)

Text

$0.07

Audio

$4.44

Image/Video

$0.21

Output

Unit price (per 1M tokens)

Text

$0.27 (if the input contains only text)

$0.63 (if the input includes images, video, or audio)

Text + Audio

$8.89 (for audio)

Text output is not billed.

Mainland China

In Mainland China deployment mode, the endpoint and data storage are both located in the Beijing region. Model inference compute resources are limited to Mainland China.

Model

Version

Mode

Context window

Max input

Max CoT

Max output

Free quota

(Note)

(tokens)

qwen3-omni-flash

Currently has the same capabilities as qwen3-omni-flash-2025-12-01

Stable

Thinking mode

65,536

16,384

32,768

16,384

No free quota

Non-thinking mode

49,152

-

qwen3-omni-flash-2025-12-01

Snapshot

Thinking mode

65,536

16,384

32,768

16,384

Non-thinking mode

49,152

-

qwen3-omni-flash-2025-09-15

Also known as qwen3-omni-flash-0915

Snapshot

Thinking mode

65,536

16,384

32,768

16,384

Non-thinking mode

49,152

-

After the free quota is used up, input and output are billed as follows. The pricing is the same for thinking and non-thinking modes. Audio output is not supported in thinking mode.

Input

Unit price (per 1M tokens)

Text

$0.258

Audio

$2.265

Image/Video

$0.473

Output

Unit price (per 1M tokens)

Text

$0.989 (if the input contains only text)

$1.821 (if the input contains images, video, or audio)

Text + Audio

This item does not incur charges in thinking mode.

$8.974 (audio)

Text output is not billed.

More models

Model

Version

Context window

Max input

Max output

Free quota

(Note)

(tokens)

qwen-omni-turbo

Corresponds to the capabilities of qwen-omni-turbo-2025-03-26.

Stable

32,768

30,720

2,048

No free quota

qwen-omni-turbo-latest

Always uses the latest snapshot version.
Identical capabilities

Latest

qwen-omni-turbo-2025-03-26

Also known as qwen-omni-turbo-0326.

Snapshot

qwen-omni-turbo-2025-01-19

Also known as qwen-omni-turbo-0119.

The input and output billing rules are as follows:

Input

Unit price (per 1M tokens)

Text

$0.058

Audio

$3.584

Image/Video

$0.216

Output

Unit price (per 1M tokens)

Text

$0.230 (if the input contains only text)

$0.646 (if the input contains images, audio, or video)

Text + audio

$7.168 (for the audio output)

The text portion of the output is not billed.

Billing example: If a request includes 1,000 text tokens and 1,000 image tokens as input and generates 1,000 text tokens and 1,000 audio tokens as output, the total cost is: $0.000058 (text input) + $0.000216 (image input) + $0.007168 (audio output) = $0.007442.
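The arithmetic in the billing example above can be sketched as a small script. The rates are taken from the Mainland China qwen-omni-turbo tables in this section; text tokens in a text-plus-audio response are free, so they contribute nothing.

```python
# Worked version of the billing example above, using the Mainland China
# qwen-omni-turbo rates (USD per 1M tokens) from the tables in this section.
RATES = {
    "text_input": 0.058,
    "image_input": 0.216,
    "audio_output": 7.168,  # text tokens in a text+audio response are free
}

def cost_usd(tokens: int, rate_per_million: float) -> float:
    """Convert a token count and a per-1M-token rate into dollars."""
    return tokens * rate_per_million / 1_000_000

total = (
    cost_usd(1_000, RATES["text_input"])
    + cost_usd(1_000, RATES["image_input"])
    + cost_usd(1_000, RATES["audio_output"])
)
print(round(total, 6))  # 0.007442
```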

We recommend using the Qwen3-Omni-Flash model, which offers significant capability improvements over Qwen-Omni-Turbo (which will no longer be updated):

  • It is a hybrid thinking model that supports both thinking and non-thinking modes. You can switch between modes using the enable_thinking parameter. Thinking mode is disabled by default.

  • Audio output is not supported in thinking mode. For audio output in non-thinking mode:

    • qwen3-omni-flash-2025-12-01 supports 49 voice options, while qwen3-omni-flash-2025-09-15 and qwen3-omni-flash support 17 voice options. Qwen-Omni-Turbo supports only 4.

    • Supports 10 languages, compared to Qwen-Omni-Turbo's 2.
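The mode switch mentioned above is passed as a request parameter. A minimal sketch of a request body that disables thinking mode follows; field names other than `enable_thinking` follow the common OpenAI-style chat schema and the transport (endpoint, SDK call) is omitted, so treat this as an illustration rather than a full API reference.

```python
# Sketch of a chat-completion request body toggling the enable_thinking
# parameter described above. Only the payload is built here; the HTTP
# call or SDK invocation is intentionally omitted.
payload = {
    "model": "qwen3-omni-flash",
    "messages": [
        {"role": "user", "content": "Describe this audio clip."},
    ],
    # Thinking mode is disabled by default; set True to enable it.
    # Note: audio output is not supported in thinking mode.
    "enable_thinking": False,
}
print(payload["enable_thinking"])  # False
```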

Qwen-Omni-Realtime

Compared to Qwen-Omni, Qwen-Omni-Realtime supports streaming audio input and includes built-in Voice Activity Detection (VAD) to automatically detect the start and end of user speech. Usage | Client Events | Server Events

International

In international deployment mode, the endpoint and data storage are both located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding Mainland China).

Model

Version

Context window

Max input

Max output

Free quota

(Note)

(tokens)

qwen3-omni-flash-realtime

This model currently maps to qwen3-omni-flash-realtime-2025-12-01.

Stable

65,536

49,152

16,384

1 million tokens (regardless of modality)

Valid for 90 days after Model Studio activation.

qwen3-omni-flash-realtime-2025-12-01

Snapshot

qwen3-omni-flash-realtime-2025-09-15

After the free quota is used up, input and output are billed as follows:

Input

Unit price (per 1M tokens)

Text

$0.52

Audio

$4.57

Image

$0.94

Output

Unit price (per 1M tokens)

Text

$1.99 (for text-only input)

$3.67 (for inputs with images or audio)

Text + Audio

$18.13 (for audio)

Text output is not billed.

More models

Model

Version

Context window

Max input

Max output

Free quota

(Note)

(tokens)

qwen-omni-turbo-realtime

Currently matches the capabilities of qwen-omni-turbo-realtime-2025-05-08.

Stable

32,768

30,720

2,048

1 million tokens each (regardless of modality)

Valid for 90 days after you activate Model Studio.

qwen-omni-turbo-realtime-latest

Always matches the latest snapshot version.

Latest

qwen-omni-turbo-realtime-2025-05-08

Snapshot

After the free quota is used up, input and output are billed as follows:

Input

Unit price (per 1M tokens)

Text

$0.270

Audio

$4.440

Image

$0.840

Output

Unit price (per 1M tokens)

Text

$1.070 (if the input contains only text)

$2.520 (if the input contains images or audio)

Text + Audio

$8.890 (for audio output)

The text portion of the output is not billed.

Mainland China

In Mainland China deployment mode, the endpoint and data storage are both located in the Beijing region. Model inference compute resources are limited to Mainland China.

Model

Version

Context window

Max input

Max output

Free quota

(Note)

(tokens)

qwen3-omni-flash-realtime

Currently has the same capabilities as qwen3-omni-flash-realtime-2025-12-01

Stable

65,536

49,152

16,384

No free quota

qwen3-omni-flash-realtime-2025-12-01

Snapshot

qwen3-omni-flash-realtime-2025-09-15

After the free quota is used up, input and output are billed as follows:

Input

Unit price (per 1M tokens)

Text

$0.315

Audio

$2.709

Image

$0.559

Output

Unit price (per 1M tokens)

Text

$1.19 (if the input contains only text)

$2.179 (if the input contains images or audio)

Text + Audio

$10.766 (audio)

The text part is not billed.

More models

Model

Version

Context window

Max input

Max output

Free quota

(Note)

(tokens)

qwen-omni-turbo-realtime

Currently equivalent to qwen-omni-turbo-realtime-2025-05-08.

Stable

32,768

30,720

2,048

No free quota

qwen-omni-turbo-realtime-latest

Always aligned with the capabilities of the latest snapshot.

Latest

qwen-omni-turbo-realtime-2025-05-08

Snapshot

The input and output billing rules are as follows:

Input

Unit price (per 1M tokens)

Text

$0.230

Audio

$3.584

Image

$0.861

Output

Unit price (per 1M tokens)

Text

$0.918 (for text-only input)

$2.581 (for input with images or audio)

Text + Audio

$7.168 (for audio)

Text output is not billed.

We recommend using the Qwen3-Omni-Flash-Realtime model, which offers significant capability improvements over Qwen-Omni-Turbo-Realtime (which will no longer be updated). For audio output:

  • qwen3-omni-flash-realtime-2025-12-01 supports 49 voice options, while qwen3-omni-flash-realtime-2025-09-15 and qwen3-omni-flash-realtime support 17 voice options. Qwen-Omni-Turbo-Realtime supports only 4.

  • Supports 10 languages, compared to Qwen-Omni-Turbo-Realtime's 2.

QVQ

QVQ is a visual reasoning model that supports visual input and CoT output. It demonstrates stronger capabilities in math, programming, visual analysis, creation, and general tasks. Usage | Try online

International

In international deployment mode, the endpoint and data storage are both located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding Mainland China).

Model

Version

Context window

Max input

Max CoT

Max response

Input cost

Output cost

Free quota

(Note)

(tokens)

(per 1M tokens)

qvq-max

Matches the capabilities of qvq-max-2025-03-25.

Stable

131,072

106,496

Max per image: 16,384

16,384

8,192

$1.2

$4.80

1 million input tokens and 1 million output tokens

Valid for 90 days after you activate Model Studio.

qvq-max-latest

Always matches the latest snapshot version.

Latest

qvq-max-2025-03-25

Also known as qvq-max-0325.

Snapshot

Mainland China

In Mainland China deployment mode, the endpoint and data storage are both located in the Beijing region. Model inference compute resources are limited to Mainland China.

Model

Version

Context window

Max input

Max CoT

Max response

Input cost

Output cost

(tokens)

(per 1M tokens)

qvq-max

Provides stronger visual reasoning and instruction-following capabilities than qvq-plus and delivers optimal performance for more complex tasks.
Currently matches the capabilities of the qvq-max-2025-03-25 snapshot.

Stable

131,072

106,496

Max per image: 16,384

16,384

8,192

$1.147

$4.588

qvq-max-latest

Always points to the latest snapshot version.

Latest

qvq-max-2025-05-15

Also known as qvq-max-0515.

Snapshot

qvq-max-2025-03-25

Also known as qvq-max-0325.

qvq-plus

Currently matches the capabilities of the qvq-plus-2025-05-15 snapshot.

Stable

$0.287

$0.717

qvq-plus-latest

Always points to the latest snapshot version.

Latest

qvq-plus-2025-05-15

Also known as qvq-plus-0515.

Snapshot

Qwen-VL

Qwen-VL is a text generation model with visual (image) understanding capabilities. It performs tasks such as optical character recognition (OCR), summarization, and reasoning. For example, it can extract attributes from product photos or solve problems from images of exercises. How to use | API reference | Try online

Qwen-VL models are billed based on the total number of input and output tokens. For more information about image token calculation rules, see Visual Understanding.

Global

In global deployment mode, the endpoint and data storage are both located in the US (Virginia) region. Model inference compute resources are dynamically scheduled worldwide.

Model

Version

Mode

Context window

Max input

Max CoT

Max output

Input cost

Output cost

CoT and output

(tokens)

(per 1M tokens)

qwen3-vl-plus

Currently has the same capabilities as qwen3-vl-plus-2025-09-23

Stable

Thinking

262,144

258,048

Max per image: 16,384

81,920

32,768

Tiered pricing. See details below.

Non-thinking

260,096

Max per image: 16,384

-

qwen3-vl-plus-2025-09-23

Snapshot

Thinking

258,048

Max per image: 16,384

81,920

Non-thinking

260,096

Max per image: 16,384

-

qwen3-vl-flash

Currently has the same capabilities as qwen3-vl-flash-2025-10-15

Stable

Thinking

258,048

Max per image: 16,384

81,920

Non-thinking

260,096

Max per image: 16,384

-

qwen3-vl-flash-2025-10-15

Snapshot

Thinking

258,048

Max per image: 16,384

81,920

Non-thinking

260,096

Max per image: 16,384

-

The models above use tiered pricing based on the number of input tokens in the current request. The prices for input and output are the same for thinking and non-thinking modes.

qwen3-vl-plus series

Input tokens per request

Input price (per 1M tokens)

Output price (per 1M tokens)

0 < Tokens ≤ 32K

$0.2

$1.6

32K < Tokens ≤ 128K

$0.3

$2.4

128K < Tokens ≤ 256K

$0.6

$4.8

qwen3-vl-flash series

Input tokens per request

Input price (per 1M tokens)

Output price (per 1M tokens)

0 < Tokens ≤ 32K

$0.05

$0.4

32K < Tokens ≤ 128K

$0.075

$0.6

128K < Tokens ≤ 256K

$0.12

$0.96

International

In international deployment mode, the endpoint and data storage are both located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding Mainland China).

Model

Version

Mode

Context window

Max input

Max CoT

Max output

Input cost

Output cost

CoT and output

Free quota

(Note)

(tokens)

(per 1M tokens)

qwen3-vl-plus

Currently equivalent to qwen3-vl-plus-2025-09-23

Stable

Thinking

262,144

258,048

Max per image: 16,384

81,920

32,768

Tiered pricing. See details below.

1 million tokens each

Valid for 90 days after you activate Model Studio

Non-thinking

260,096

Max per image: 16,384

-

qwen3-vl-plus-2025-12-19

Snapshot

Thinking

258,048

Max per image: 16,384

81,920

Non-thinking

260,096

Max per image: 16,384

-

qwen3-vl-plus-2025-09-23

Snapshot

Thinking

258,048

Max per image: 16,384

81,920

Non-thinking

260,096

Max per image: 16,384

-

qwen3-vl-flash

Currently equivalent to qwen3-vl-flash-2025-10-15

Stable

Thinking

258,048

Max per image: 16,384

81,920

Non-thinking

260,096

Max per image: 16,384

-

qwen3-vl-flash-2026-01-22

Snapshot

Thinking

258,048

Max per image: 16,384

81,920

Non-thinking

260,096

Max per image: 16,384

-

qwen3-vl-flash-2025-10-15

Snapshot

Thinking

258,048

Max per image: 16,384

81,920

Non-thinking

260,096

Max per image: 16,384

-

The models above use tiered pricing based on the number of input tokens in the current request. The prices for input and output are the same for thinking and non-thinking modes.

qwen3-vl-plus series

Input tokens per request

Input price (per 1M tokens)

Output price (per 1M tokens)

0 < Tokens ≤ 32K

$0.2

$1.6

32K < Tokens ≤ 128K

$0.3

$2.4

128K < Tokens ≤ 256K

$0.6

$4.8

qwen3-vl-flash series

Input tokens per request

Input price (per 1M tokens)

Output price (per 1M tokens)

0 < Tokens ≤ 32K

$0.05

$0.4

32K < Tokens ≤ 128K

$0.075

$0.6

128K < Tokens ≤ 256K

$0.12

$0.96
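The tier tables above can be turned into a small cost helper. This is a sketch under two assumptions not spelled out in the tables: "K" is taken as 1,024 (matching the token limits elsewhere in this document), and the tier selected by the request's input-token count applies to both the input and output rates of that request. Rates are the International qwen3-vl-flash prices above.

```python
# Tiered-pricing sketch for the qwen3-vl-flash series (International rates
# above, USD per 1M tokens). Assumption: the tier is selected by the
# input-token count of the request, and that tier's input and output
# rates apply to the whole request. K is taken as 1,024.
TIERS = [  # (tier cap in input tokens, input rate, output rate)
    (32_768, 0.05, 0.4),
    (131_072, 0.075, 0.6),
    (262_144, 0.12, 0.96),
]

def request_cost(input_tokens: int, output_tokens: int) -> float:
    for cap, in_rate, out_rate in TIERS:
        if input_tokens <= cap:
            return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000
    raise ValueError("input exceeds the 256K tier")

# 40,000 input tokens fall in the 32K-128K tier:
# (40,000 * $0.075 + 1,000 * $0.6) / 1M
print(round(request_cost(40_000, 1_000), 6))  # 0.0036
```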

More models

Qwen-VL-Max

Qwen-VL-Max outperforms Qwen-VL-Plus. All models below belong to the Qwen2.5-VL series.

Model

Version

Context window

Max input

Max output

Input cost

Output cost

Free quota

(Note)

(tokens)

(per 1M tokens)

qwen-vl-max

Offers enhanced visual reasoning and instruction-following capabilities compared to qwen-vl-plus, delivering optimal performance on more complex tasks.
Matches the capabilities of qwen-vl-max-2025-08-13.

Stable

131,072

129,024

Max per image: 16,384

8,192

$0.8

$3.2

1 million tokens

Valid for 90 days after Model Studio is activated.

qwen-vl-max-latest

Always points to the latest snapshot version.

Latest

$0.8

$3.2

qwen-vl-max-2025-08-13

Also known as qwen-vl-max-0813.
Visual understanding capabilities show comprehensive improvement, with significant enhancements in mathematics, reasoning, object detection, and multilingual processing.

Snapshot

qwen-vl-max-2025-04-08

Also known as qwen-vl-max-0408.
Part of the Qwen2.5-VL series, this model extends the context window to 128K and significantly enhances math and reasoning capabilities.

Qwen-VL-Plus

Qwen-VL-Plus offers balanced performance and cost. All models below belong to the Qwen2.5-VL series.

Model

Version

Context window

Max input

Max output

Input cost

Output cost

Free quota

(Note)

(tokens)

(per 1M tokens)

qwen-vl-plus

Currently matches the qwen-vl-plus-2025-08-15 snapshot.

Stable

131,072

129,024

Max per image: 16,384

8,192

$0.21

$0.63

1 million tokens

Valid for 90 days after you activate Model Studio.

qwen-vl-plus-latest

Always matches the latest snapshot version.

Latest

$0.21

$0.63

qwen-vl-plus-2025-08-15

Also known as qwen-vl-plus-0815
Significantly improved object recognition and localization, and multilingual processing capabilities.

Snapshot

qwen-vl-plus-2025-05-07

Also known as qwen-vl-plus-0507
Significantly improved math, reasoning, and surveillance video content understanding capabilities.

qwen-vl-plus-2025-01-25

Also known as qwen-vl-plus-0125
Part of the Qwen2.5-VL series, this model extends the context window to 128K and significantly enhances image and video understanding capabilities.

US

In US deployment mode, the endpoint and data storage are both located in the US (Virginia) region. Model inference compute resources are limited to the US.

Model

Version

Mode

Context window

Max input

Max CoT

Max output

Input cost

Output cost

CoT + output

(tokens)

(per 1M tokens)

qwen3-vl-flash-us

Currently provides the same capabilities as qwen3-vl-flash-2025-10-15-us.

Stable

Thinking

258,048

Max per image: 16,384

81,920

32,768

Tiered pricing. See details below.

Non-thinking

260,096

Max per image: 16,384

-

qwen3-vl-flash-2025-10-15-us

Snapshot

Thinking

258,048

Max per image: 16,384

81,920

Non-thinking

260,096

Max per image: 16,384

-

The models above use tiered pricing based on the number of input tokens in the current request. The prices for input and output are the same for thinking and non-thinking modes. qwen3-vl-flash-us supports context cache.

Input tokens per request

Input price (per 1M tokens)

Output price (per 1M tokens)

0 < Tokens ≤ 32K

$0.05

$0.4

32K < Tokens ≤ 128K

$0.075

$0.6

128K < Tokens ≤ 256K

$0.12

$0.96

Mainland China

In Mainland China deployment mode, the endpoint and data storage are both located in the Beijing region. Model inference compute resources are limited to Mainland China.

Model

Version

Mode

Context window

Max input

Max CoT

Max output

Input cost

Output cost

Free quota

(Note)

(tokens)

(per 1M tokens)

qwen3-vl-plus

Currently has the same capabilities as qwen3-vl-plus-2025-09-23
Batch calls at half price

Stable

Thinking

262,144

258,048

Max per image: 16,384

81,920

32,768

Tiered pricing. See details below.

No free quota

Non-thinking

260,096

Max per image: 16,384

-

qwen3-vl-plus-2025-12-19

Snapshot

Thinking

258,048

Max per image: 16,384

81,920

Non-thinking

260,096

Max per image: 16,384

-

qwen3-vl-plus-2025-09-23

Snapshot

Thinking

258,048

Max per image: 16,384

81,920

Non-thinking

260,096

Max per image: 16,384

-

qwen3-vl-flash

Currently has the same capabilities as qwen3-vl-flash-2025-10-15
Batch calls at half price

Stable

Thinking

258,048

Max per image: 16,384

81,920

Non-thinking

260,096

Max per image: 16,384

-

qwen3-vl-flash-2026-01-22

Snapshot

Thinking

258,048

Max per image: 16,384

81,920

Non-thinking

260,096

Max per image: 16,384

-

qwen3-vl-flash-2025-10-15

Snapshot

Thinking

258,048

Max per image: 16,384

81,920

Non-thinking

260,096

Max per image: 16,384

-

The models above use tiered pricing based on the number of input tokens in the current request. The prices for input and output are the same for thinking and non-thinking modes.

qwen3-vl-plus series

Input tokens per request

Input price (per 1M tokens)

Output price (per 1M tokens)

0 < Tokens ≤ 32K

$0.143

$1.434

32K < Tokens ≤ 128K

$0.215

$2.15

128K < Tokens ≤ 256K

$0.43

$4.301

qwen3-vl-flash series

Input tokens per request

Input price (per 1M tokens)

Output price (per 1M tokens)

0 < Tokens ≤ 32K

$0.022

$0.215

32K < Tokens ≤ 128K

$0.043

$0.43

128K < Tokens ≤ 256K

$0.086

$0.859

More models

Qwen-VL-Max series
Models updated on or after qwen-vl-max-2025-01-25 belong to the Qwen2.5-VL series.

Model

Version

Context window

Max input

Max output

Input cost

Output cost

(tokens)

(per 1M tokens)

qwen-vl-max

Provides enhanced visual reasoning and instruction-following capabilities compared to qwen-vl-plus, and delivers optimal performance for more complex tasks.
Currently matches the capabilities of qwen-vl-max-2025-08-13.
Batch calls at half price

Stable

131,072

129,024

Max per image: 16,384

8,192

$0.23

$0.574

qwen-vl-max-latest

Always points to the latest snapshot version.
Batch calls at half price

Latest

qwen-vl-max-2025-08-13

Also known as qwen-vl-max-0813.
Features fully upgraded visual understanding metrics and significantly enhanced capabilities for math, reasoning, object recognition, and multilingual processing.

Snapshot

qwen-vl-max-2025-04-08

Also known as qwen-vl-max-0408.
Provides enhanced math and reasoning capabilities.

$0.431

$1.291

qwen-vl-max-2025-04-02

Also known as qwen-vl-max-0402.
Offers significantly improved accuracy in solving complex math problems.

qwen-vl-max-2025-01-25

Also known as qwen-vl-max-0125.

Upgraded to the Qwen2.5-VL series, extends the context to 128,000 tokens, and significantly enhances image and video understanding capabilities.

qwen-vl-max-2024-12-30

Also known as qwen-vl-max-1230.

32,768

30,720

Max per image: 16,384

2,048

$0.431

$1.291

qwen-vl-max-2024-11-19

Also known as qwen-vl-max-1119.

qwen-vl-max-2024-10-30

Also known as qwen-vl-max-1030.

$2.868

qwen-vl-max-2024-08-09

Also known as qwen-vl-max-0809.

Qwen-VL-Plus series
Models updated on or after qwen-vl-plus-2025-01-25 belong to the Qwen2.5-VL series.

Model

Version

Context window

Max input

Max output

Input cost

Output cost

(tokens)

(per 1M tokens)

qwen-vl-plus

Matches the capabilities of the qwen-vl-plus-2025-08-15 snapshot.
Batch calls at half price

Stable

131,072

129,024

Max per image: 16,384

8,192

$0.115

$0.287

qwen-vl-plus-latest

Always matches the latest snapshot version.
Batch calls at half price

Latest

qwen-vl-plus-2025-08-15

Also known as qwen-vl-plus-0815.
Features significantly improved object recognition, localization, and multilingual processing.

Snapshot

qwen-vl-plus-2025-07-10

Also known as qwen-vl-plus-0710.
Further improves the understanding of surveillance video content.

32,768

30,720

Max per image: 16,384

$0.022

$0.216

qwen-vl-plus-2025-05-07

Also known as qwen-vl-plus-0507.
Features significantly improved capabilities for math, reasoning, and understanding surveillance video content.

131,072

129,024

Max per image: 16,384

$0.216

$0.646

qwen-vl-plus-2025-01-25

Also known as qwen-vl-plus-0125.

Upgraded to the Qwen2.5-VL series. This version extends the context window to 128K and significantly enhances image and video understanding capabilities.

qwen-vl-plus-2025-01-02

Also known as qwen-vl-plus-0102.

32,768

30,720

Max per image: 16,384

2,048

qwen-vl-plus-2024-08-09

Also known as qwen-vl-plus-0809.

Qwen-OCR

Qwen-OCR is a model that specializes in text extraction. Compared to Qwen-VL, it focuses more on extracting text from images of items such as documents, tables, exam questions, and handwriting. It can recognize multiple languages, including English, French, Japanese, Korean, German, Russian, and Italian. Usage | API reference | Try online

Global

In global deployment mode, the endpoint and data storage are both located in the US (Virginia) region. Model inference compute resources are dynamically scheduled worldwide.

Model

Version

Context window

Max input

Max output

Input price

Output price

(tokens)

(per 1M tokens)

qwen-vl-ocr

Matches the capabilities of qwen-vl-ocr-2025-11-20.

Stable

34,096

30,000

Max per image: 30,000

4,096

$0.07

$0.16

qwen-vl-ocr-2025-11-20

Also known as qwen-vl-ocr-1120.
Based on the Qwen3-VL architecture, this model significantly improves document parsing and text localization capabilities.

Snapshot

38,192

8,192

International

In international deployment mode, the endpoint and data storage are both located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding Mainland China).

Model

Version

Context window

Max input

Max output

Input price

Output price

Free quota

(Note)

(tokens)

(per 1M tokens)

qwen-vl-ocr

Stable

34,096

30,000

Max per image: 30,000

4,096

$0.72

$0.72

1 million input tokens and 1 million output tokens

Valid for 90 days after you activate Model Studio

qwen-vl-ocr-2025-11-20

Also known as qwen-vl-ocr-1120
Based on the Qwen3-VL architecture, this model provides significantly improved document parsing and text localization.

Snapshot

38,192

8,192

$0.07

$0.16

Mainland China

In Mainland China deployment mode, the endpoint and data storage are both located in the Beijing region. Model inference compute resources are limited to Mainland China.

Model

Version

Context window

Max input

Max output

Input price

Output price

Free quota

(Note)

(tokens)

(per 1M tokens)

qwen-vl-ocr

Matches the capabilities of qwen-vl-ocr-2025-08-28.
Batch calls are available at half price.

Stable

34,096

30,000

Max per image: 30,000

4,096

$0.717

$0.717

No free quota

qwen-vl-ocr-latest

Always matches the capabilities of the latest version.

Latest

38,192

8,192

$0.043

$0.072

qwen-vl-ocr-2025-11-20

Also known as qwen-vl-ocr-1120.
Based on the Qwen3-VL architecture, this model significantly improves document parsing and text localization capabilities.

Snapshot

qwen-vl-ocr-2025-08-28

Also known as qwen-vl-ocr-0828.

34,096

4,096

$0.717

$0.717

qwen-vl-ocr-2025-04-13

Also known as qwen-vl-ocr-0413.

qwen-vl-ocr-2024-10-28

Also known as qwen-vl-ocr-1028.

Qwen-Math

Qwen-Math is a language model that specializes in solving mathematical problems. Usage | API reference | Try online

Note

Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Model

Version

Context window

Max input

Max output

Input cost

Output cost

(tokens)

(per 1M tokens)

qwen-math-plus

Currently equivalent to qwen-math-plus-2024-09-19

Stable

4,096

3,072

3,072

$0.574

$1.721

qwen-math-plus-latest

Always points to the latest snapshot version

Latest

qwen-math-plus-2024-09-19

Also known as qwen-math-plus-0919

Snapshot

qwen-math-plus-2024-08-16

Also known as qwen-math-plus-0816

qwen-math-turbo

Currently equivalent to qwen-math-turbo-2024-09-19

Stable

$0.287

$0.861

qwen-math-turbo-latest

Always points to the latest snapshot version

Latest

qwen-math-turbo-2024-09-19

Also known as qwen-math-turbo-0919

Snapshot

Qwen-Coder

Qwen-Coder is a code generation model. The latest Qwen3-Coder-Plus series is based on Qwen3 and features powerful coding agent capabilities. It excels at tool calling, environment interaction, and autonomous programming, combining excellent coding skills with general-purpose abilities. Usage | API reference | Try online

Global

In global deployment mode, the endpoint and data storage are both located in the US (Virginia) region. Model inference compute resources are dynamically scheduled worldwide.

Model

Version

Context window

Max input

Max output

Input cost

Output cost

(tokens)

(per 1M tokens)

qwen3-coder-plus

Currently maps to qwen3-coder-plus-2025-09-23

Stable

1,000,000

997,952

65,536

Tiered pricing. See details below.

qwen3-coder-plus-2025-09-23

Snapshot

qwen3-coder-plus-2025-07-22

Snapshot

qwen3-coder-flash

Currently maps to qwen3-coder-flash-2025-07-28

Stable

qwen3-coder-flash-2025-07-28

Snapshot

The models above use tiered pricing based on the number of input tokens in the current request.

qwen3-coder-plus series

qwen3-coder-plus, qwen3-coder-plus-2025-09-23, and qwen3-coder-plus-2025-07-22 are priced as follows. qwen3-coder-plus supports context cache. Input text that hits the implicit cache is billed at 20% of the unit price.

Input tokens per request

Input cost (per 1M tokens)

Output cost (per 1M tokens)

0 < Tokens ≤ 32K

$1

$5

32K < Tokens ≤ 128K

$1.8

$9

128K < Tokens ≤ 256K

$3

$15

256K < Tokens ≤ 1M

$6

$60

qwen3-coder-flash series

qwen3-coder-flash and qwen3-coder-flash-2025-07-28 are priced as follows. qwen3-coder-flash supports context cache. Input text that hits the cache is billed at 20% of the unit price.

Input tokens per request

Input cost (per 1M tokens)

Output cost (per 1M tokens)

0 < Tokens ≤ 32K

$0.3

$1.5

32K < Tokens ≤ 128K

$0.5

$2.5

128K < Tokens ≤ 256K

$0.8

$4

256K < Tokens ≤ 1M

$1.6

$9.6
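The tiered rule above (the input-token count of a single request selects the tier, and the whole request is billed at that tier's rates) can be sketched in a few lines. This is a minimal illustration, not an official billing tool: the rates are the qwen3-coder-plus (Global) table, the 20% implicit-cache discount follows the note above, and treating the 32K/128K/256K boundaries as decimal thousands is an assumption.

```python
# Minimal sketch of the tiered pricing above for the qwen3-coder-plus series
# (Global deployment). The number of input tokens in a request selects the
# tier; input and output are then billed at that tier's per-1M rates.
TIERS = [  # (upper bound on input tokens, input $/1M, output $/1M)
    (32_000, 1.0, 5.0),
    (128_000, 1.8, 9.0),
    (256_000, 3.0, 15.0),
    (1_000_000, 6.0, 60.0),
]

def request_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """USD cost of one request; cached_tokens is the implicit-cache hit count."""
    for limit, in_rate, out_rate in TIERS:
        if input_tokens <= limit:
            uncached = input_tokens - cached_tokens
            return (uncached / 1e6 * in_rate
                    + cached_tokens / 1e6 * in_rate * 0.2  # cache hit: 20% of unit price
                    + output_tokens / 1e6 * out_rate)
    raise ValueError("input exceeds the 1M-token context window")

# A 10K-input, 2K-output request falls in the first tier:
print(round(request_cost(10_000, 2_000), 4))  # 0.02
```

The same shape applies to the qwen3-coder-flash schedule; only the rate constants change.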

International

In international deployment mode, the endpoint and data storage are both located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding Mainland China).

Model

Version

Context window

Max input

Max output

Input cost

Output cost

Free quota

(Note)

(tokens)

(per 1M tokens)

qwen3-coder-plus

Currently maps to qwen3-coder-plus-2025-09-23

Stable

1,000,000

997,952

65,536

Tiered pricing. See details below.

1 million input tokens and 1 million output tokens

Valid for 90 days after activating Model Studio

qwen3-coder-plus-2025-09-23

Snapshot

qwen3-coder-plus-2025-07-22

Snapshot

qwen3-coder-flash

Currently maps to qwen3-coder-flash-2025-07-28

Stable

qwen3-coder-flash-2025-07-28

Snapshot

The models above use tiered pricing based on the number of input tokens in the current request.

qwen3-coder-plus series

qwen3-coder-plus, qwen3-coder-plus-2025-09-23, and qwen3-coder-plus-2025-07-22 are priced as follows. qwen3-coder-plus supports context cache. Input text that hits the implicit cache is billed at 20% of the unit price, while input text that hits the explicit cache is billed at 10% of the unit price.

Input tokens per request

Input cost (per 1M tokens)

Output cost (per 1M tokens)

0 < Tokens ≤ 32K

$1

$5

32K < Tokens ≤ 128K

$1.8

$9

128K < Tokens ≤ 256K

$3

$15

256K < Tokens ≤ 1M

$6

$60

qwen3-coder-flash series

qwen3-coder-flash and qwen3-coder-flash-2025-07-28 are priced as follows. qwen3-coder-flash supports context cache. Input text that hits the implicit cache is billed at 20% of the unit price, while input text that hits the explicit cache is billed at 10% of the unit price.

Input tokens per request

Input cost (per 1M tokens)

Output cost (per 1M tokens)

0 < Tokens ≤ 32K

$0.3

$1.5

32K < Tokens ≤ 128K

$0.5

$2.5

128K < Tokens ≤ 256K

$0.8

$4

256K < Tokens ≤ 1M

$1.6

$9.6

Mainland China

In Mainland China deployment mode, the endpoint and data storage are both located in the Beijing region. Model inference compute resources are limited to Mainland China.

Model

Version

Context window

Max input

Max output

Input cost

Output cost

(tokens)

(per 1M tokens)

qwen3-coder-plus

Currently maps to qwen3-coder-plus-2025-09-23

Stable

1,000,000

997,952

65,536

Tiered pricing. See details below.

qwen3-coder-plus-2025-09-23

Snapshot

qwen3-coder-plus-2025-07-22

Snapshot

qwen3-coder-flash

Currently maps to qwen3-coder-flash-2025-07-28

Stable

qwen3-coder-flash-2025-07-28

Snapshot

The models above use tiered pricing based on the number of input tokens in the current request.

qwen3-coder-plus series

qwen3-coder-plus, qwen3-coder-plus-2025-09-23, and qwen3-coder-plus-2025-07-22 are priced as follows. qwen3-coder-plus supports context cache. Input text that hits the implicit cache is billed at 20% of the unit price, while input text that hits the explicit cache is billed at 10% of the unit price.

Input tokens per request

Input cost (per 1M tokens)

Output cost (per 1M tokens)

0 < Tokens ≤ 32K

$0.574

$2.294

32K < Tokens ≤ 128K

$0.861

$3.441

128K < Tokens ≤ 256K

$1.434

$5.735

256K < Tokens ≤ 1M

$2.868

$28.671

qwen3-coder-flash series

qwen3-coder-flash and qwen3-coder-flash-2025-07-28 are priced as follows. qwen3-coder-flash supports context cache. Input text that hits the implicit cache is billed at 20% of the unit price, while input text that hits the explicit cache is billed at 10% of the unit price.

Input tokens per request

Input cost (per 1M tokens)

Output cost (per 1M tokens)

0 < Tokens ≤ 32K

$0.144

$0.574

32K < Tokens ≤ 128K

$0.216

$0.861

128K < Tokens ≤ 256K

$0.359

$1.434

256K < Tokens ≤ 1M

$0.717

$3.584

More models

Model

Version

Context window

Max input

Max output

Input cost

Output cost

(tokens)

(per 1M tokens)

qwen-coder-plus

Currently maps to qwen-coder-plus-2024-11-06

Stable

131,072

129,024

8,192

$0.502

$1.004

qwen-coder-plus-latest

Always points to the latest snapshot version

Latest

qwen-coder-plus-2024-11-06

Also known as qwen-coder-plus-1106

Snapshot

qwen-coder-turbo

Currently maps to qwen-coder-turbo-2024-09-19

Stable

131,072

129,024

8,192

$0.287

$0.861

qwen-coder-turbo-latest

Always points to the latest snapshot version

Latest

qwen-coder-turbo-2024-09-19

Also known as qwen-coder-turbo-0919

Snapshot

Qwen-MT

Qwen-MT is a flagship large language model for translation that is fully upgraded based on Qwen 3. It supports translation between 92 languages, including Chinese, English, Japanese, Korean, French, Spanish, German, Thai, Indonesian, Vietnamese, and Arabic, and features comprehensive upgrades in model performance and translation quality. It offers more stable term customization, format retention, and domain-specific prompting, which makes translations more accurate and natural. Usage

Global

In global deployment mode, the endpoint and data storage are both located in the US (Virginia) region. Model inference compute resources are dynamically scheduled worldwide.

Model

Context window

Max input

Max output

Input cost

Output cost

(tokens)

(per 1M tokens)

qwen-mt-plus

Part of the Qwen3-MT series

16,384

8,192

8,192

$2.46

$7.37

qwen-mt-flash

Part of the Qwen3-MT series

$0.16

$0.49

qwen-mt-lite

Part of the Qwen3-MT series

$0.12

$0.36

International

In international deployment mode, the endpoint and data storage are both located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding Mainland China).

Model

Context window

Max input

Max output

Input cost

Output cost

Free quota

(Note)

(tokens)

(per 1M tokens)

qwen-mt-plus

Part of the Qwen3-MT series

16,384

8,192

8,192

$2.46

$7.37

1 million tokens

Valid for 90 days after you activate Model Studio

qwen-mt-flash

Part of the Qwen3-MT series

$0.16

$0.49

qwen-mt-lite

Part of the Qwen3-MT series

$0.12

$0.36

qwen-mt-turbo

Part of the Qwen3-MT series

$0.16

$0.49

Mainland China

In Mainland China deployment mode, the endpoint and data storage are both located in the Beijing region. Model inference compute resources are limited to Mainland China.

Model

Context window

Max input

Max output

Input cost

Output cost

(tokens)

(per 1M tokens)

qwen-mt-plus

Part of the Qwen3-MT series

16,384

8,192

8,192

$0.259

$0.775

qwen-mt-flash

Part of the Qwen3-MT series

$0.101

$0.280

qwen-mt-lite

Part of the Qwen3-MT series

$0.086

$0.229

qwen-mt-turbo

Part of the Qwen3-MT series

$0.101

$0.280

Qwen data mining model

The Qwen data mining model can extract structured information from documents for use in fields such as data annotation and content moderation. Usage | API reference

Note

Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Model

Context window

Max input

Max output

Input cost

Output cost

Free quota

(tokens)

(per 1M tokens)

qwen-doc-turbo

262,144

253,952

32,768

$0.087

$0.144

No free quota

Qwen deep research model

The Qwen deep research model can break down complex problems, perform reasoning and analysis using web searches, and generate research reports. Usage | API reference

Note

Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Model

Context window

Max input

Max output

Input cost

Output cost

(tokens)

(per 1,000 tokens)

qwen-deep-research

1,000,000

997,952

32,768

$0.007742

$0.023367

Text generation - Qwen - Open source

  • In the model names, xxb indicates the parameter size. For example, qwen2-72b-instruct indicates a parameter size of 72 billion (72B).

  • Model Studio supports invoking the open-source versions of Qwen. You do not need to deploy the models locally. For open-source versions, we recommend using the Qwen3 and Qwen2.5 models.

Qwen3

qwen3-next-80b-a3b-thinking, released in September 2025, supports only thinking mode. It improves instruction-following capabilities and provides more concise summary responses compared to qwen3-235b-a22b-thinking-2507.

qwen3-next-80b-a3b-instruct, released in September 2025, supports only non-thinking mode. It enhances Chinese understanding, logical reasoning, and text generation capabilities compared to qwen3-235b-a22b-instruct-2507.

The qwen3-235b-a22b-thinking-2507 and qwen3-30b-a3b-thinking-2507 models, released in July 2025, support only thinking mode and are upgrades to qwen3-235b-a22b (thinking mode) and qwen3-30b-a3b (thinking mode).

The qwen3-235b-a22b-instruct-2507 and qwen3-30b-a3b-instruct-2507 models, released in July 2025, support only non-thinking mode and are upgrades to qwen3-235b-a22b (non-thinking mode) and qwen3-30b-a3b (non-thinking mode).

The Qwen3 models, released in April 2025, support both thinking and non-thinking modes. You can switch between modes using the enable_thinking parameter. Additionally, Qwen3 models offer the following significant improvements:

  1. Reasoning ability: Significantly outperforms QwQ and similarly sized non-reasoning models in evaluations for math, code, and logical reasoning, achieving top-tier industry performance for a model of its size.

  2. Human preference alignment: Features greatly enhanced capabilities for creative writing, role assumption, multi-turn conversation, and instruction following. Its general abilities significantly surpass those of similarly sized models.

  3. Agent capabilities: Achieves industry-leading performance in both thinking and non-thinking modes and enables precise external tool invocation.

  4. Multilingual support: Supports over 100 languages and dialects and provides notable improvements in multilingual translation, instruction understanding, and commonsense reasoning.

    Supported languages

    English

    Simplified Chinese

    Traditional Chinese

    French

    Spanish

    Arabic is written with the Arabic alphabet and serves as the official language of many Arab countries.

    Written in the Cyrillic alphabet, Russian is the official language of Russia and several other countries.

    Portuguese uses the Latin alphabet and is the official language of Portugal, Brazil, and other countries.

    German is written in the Latin alphabet and is an official language in countries such as Germany and Austria.

    Italian uses the Latin alphabet and is an official language in Italy, San Marino, and parts of Switzerland.

    Dutch uses the Latin alphabet and is an official language in the Netherlands, the Flemish Region of Belgium, and Suriname.

    Danish, the official language of Denmark, uses the Latin alphabet.

    Irish uses the Latin alphabet and is one of the official languages of Ireland.

    Welsh uses the Latin alphabet and is one of the official languages of Wales.

    Finnish, an official language of Finland, is written in the Latin alphabet.

    Icelandic is the official language of Iceland and uses the Latin alphabet.

    Swedish is the official language of Sweden and uses the Latin alphabet.

    Norwegian Nynorsk uses the Latin alphabet. It is one of Norway's two official written standards, along with Bokmål.

    Norwegian Bokmål, a major language of Norway, is written in the Latin alphabet.

    Japanese is the official language of Japan and is written with Japanese characters.

    Korean, written in the Hangul script, is the official language of South Korea and North Korea.

    Vietnamese, the official language of Vietnam, uses the Latin alphabet.

    Thai is the official language of Thailand and uses the Thai alphabet.

    Indonesian is the official language of Indonesia and is written in the Latin alphabet.

    Malay uses the Latin alphabet and is the primary language of Malaysia and surrounding regions.

    Burmese, the official language of Myanmar, is written using the Burmese alphabet.

    Tagalog, one of the major languages of the Philippines, uses the Latin alphabet.

    Khmer, written in the Khmer script, is the official language of Cambodia.

    Lao, written in the Lao script, is the official language of Laos.

    Hindi, one of the official languages of India, uses the Devanagari script.

    Bengali is written in the Bengali alphabet and serves as the official language of Bangladesh and the Indian state of West Bengal.

    Written in the Arabic script, Urdu is one of the official languages of Pakistan and is also spoken in India.

    Nepali, written in the Devanagari script, is the official language of Nepal.

    Hebrew, written in the Hebrew alphabet, is the official language of Israel.

    Turkish is written in the Latin alphabet and is the official language of Türkiye and Northern Cyprus.

    Persian uses the Arabic script and is the official language in countries such as Iran and Tajikistan.

    Polish is the official language of Poland and uses the Latin alphabet.

    Ukrainian, written in the Cyrillic alphabet, is the official language of Ukraine.

    Czech, the official language of the Czech Republic, uses the Latin alphabet.

    Romanian, written in the Latin alphabet, is the official language of Romania and Moldova.

    Bulgarian is the official language of Bulgaria and uses the Cyrillic alphabet.

    Slovak, the official language of Slovakia, is written in the Latin alphabet.

    Hungarian, which uses the Latin alphabet, is the official language of Hungary.

    Slovenian is the official language of Slovenia and uses the Latin alphabet.

    Latvian is the official language of Latvia and is written in the Latin alphabet.

    Estonian, the official language of Estonia, is written in the Latin alphabet.

    Lithuanian is the official language of Lithuania and uses the Latin alphabet.

    Belarusian, written in the Cyrillic alphabet, is one of the official languages of Belarus.

    Greek, written in the Greek alphabet, is the official language of Greece and Cyprus.

    Croatian is the official language of Croatia and uses the Latin alphabet.

    Macedonian is the official language of North Macedonia and is written in the Cyrillic alphabet.

    Maltese is an official language in Malta and is written in the Latin alphabet.

    Serbian, the official language of Serbia, uses the Cyrillic alphabet.

    Bosnian is one of the official languages of Bosnia and Herzegovina and is written in the Latin alphabet.

    Georgian is the official language of Georgia and is written in the Georgian script.

    Armenian is the official language of Armenia and uses the Armenian alphabet.

    North Azerbaijani uses the Latin alphabet and is the official language of Azerbaijan.

    Kazakh, the official language of Kazakhstan, is written in the Cyrillic alphabet.

    Northern Uzbek is written in the Latin alphabet and is the official language of Uzbekistan.

    Tajik, the official language of Tajikistan, is written in the Cyrillic alphabet.

    Swahili uses the Latin alphabet and is a lingua franca or an official language in many East African countries.

    Afrikaans uses the Latin alphabet and is spoken mainly in South Africa and Namibia.

    Cantonese is written in Traditional Chinese characters. It is a primary language in China's Guangdong Province, Hong Kong, and Macau.

    Luxembourgish, which uses the Latin alphabet, is one of the official languages of Luxembourg and is also spoken in parts of Germany.

    Limburgish is written using the Latin alphabet and is spoken mainly in parts of the Netherlands, Belgium, and Germany.

    Catalan uses the Latin alphabet and is spoken in Catalonia and other parts of Spain.

    Galician uses the Latin alphabet and is spoken mainly in the Galicia region of Spain.

    Asturian uses the Latin alphabet and is mainly spoken in the Asturias region of Spain.

    Basque uses the Latin alphabet. It is mainly spoken in the Basque Country of Spain and France. It is also one of the official languages of the Basque Autonomous Community in Spain.

    Occitan uses the Latin alphabet and is primarily spoken in Southern France.

    Venetian is spoken mainly in the Veneto region of Italy and uses the Latin alphabet.

    Sardinian uses the Latin alphabet and is primarily spoken in Sardinia, Italy.

    The Sicilian language is written in the Latin alphabet and is mainly spoken in Sicily, Italy.

    Friulian uses the Latin alphabet and is spoken mainly in Friuli-Venezia Giulia, Italy.

    Lombard is mainly spoken in the Lombardy region of Italy and uses the Latin alphabet.

    Ligurian uses the Latin alphabet and is spoken primarily in the Liguria region of Italy.

    Faroese is written in the Latin alphabet and is spoken primarily in the Faroe Islands, where it is one of the official languages.

    Tosk Albanian, the primary dialect of southern Albania, uses the Latin alphabet.

    Silesian uses the Latin alphabet and is mainly spoken in Poland.

    Bashkir uses the Cyrillic alphabet and is mainly spoken in Bashkortostan, Russia.

    Tatar uses the Cyrillic alphabet and is spoken primarily in Tatarstan, Russia.

    Mesopotamian Arabic is written in the Arabic script and spoken mainly in Iraq.

    Najdi Arabic uses the Arabic script and is spoken primarily in the Najd region of Saudi Arabia.

    Egyptian Arabic is written in the Arabic alphabet and spoken primarily in Egypt.

    Levantine Arabic uses the Arabic script and is mainly spoken in Syria and Lebanon.

    Ta'izzi-Adeni Arabic is a Semitic language spoken primarily in Yemen and the Hadhramaut region of Saudi Arabia and written in the Arabic script.

    Dari uses the Arabic script and is one of the official languages of Afghanistan.

    Tunisian Arabic is written in the Arabic script and is mainly spoken in Tunisia.

    Moroccan Arabic is written in the Arabic script and is primarily used in Morocco.

    Cape Verdean Creole (Kabuverdianu) is mainly spoken in Cape Verde and uses the Latin alphabet.

    Tok Pisin is a primary lingua franca in Papua New Guinea written in the Latin alphabet.

    Eastern Yiddish, written in the Hebrew alphabet, is mainly used in Jewish communities.

    Sindhi is written in the Arabic alphabet and is one of the official languages in Pakistan's Sindh province.

    Sinhala is written in the Sinhala alphabet and is one of the official languages in Sri Lanka.

    Telugu is written in the Telugu script and is one of the official languages in the Indian states of Andhra Pradesh and Telangana.

    Punjabi is written in the Gurmukhi script, spoken in the Indian state of Punjab, and one of the official languages of India.

    Written in the Tamil script, Tamil is one of the official languages in the Indian state of Tamil Nadu and Sri Lanka.

    Gujarati is written in the Gujarati script and is an official language of the Indian state of Gujarat.

    Malayalam is written in the Malayalam script and is one of the official languages of the Indian state of Kerala.

    Marathi, written in the Devanagari script, is one of the official languages of the Indian state of Maharashtra.

    Kannada, written in the Kannada script, is one of the official languages of the Indian state of Karnataka.

    Magahi is written in the Devanagari script and is mainly spoken in the Indian state of Bihar.

Oriya (Odia) is written in the Odia script and is the official language of the Indian state of Odisha.

    Awadhi is written in the Devanagari script and is mainly used in the Indian state of Uttar Pradesh.

    Maithili is written in the Devanagari script. It is one of India's official languages and is spoken in the Bihar state of India and the Terai plains of Nepal.

    Assamese uses the Bengali script and is one of the official languages of the Indian state of Assam.

    Chhattisgarhi is written in the Devanagari script and is mainly spoken in the Indian state of Chhattisgarh.

    Bhojpuri uses the Devanagari script and is spoken in parts of India and Nepal.

    Minangkabau is written in the Latin alphabet and is spoken mainly on the island of Sumatra in Indonesia.

    Balinese is written in the Latin alphabet and is spoken mainly on the island of Bali, Indonesia.

    Javanese is widely spoken on the island of Java in Indonesia and is written using both the Latin alphabet and the Javanese script.

    Banjar is written in the Latin alphabet and is mainly spoken on the island of Kalimantan in Indonesia.

    Sundanese is written in the Latin alphabet, though it traditionally used the Sundanese script. It is mainly spoken in western Java, Indonesia.

    Cebuano uses the Latin alphabet and is mainly spoken in the Cebu region of the Philippines.

    Pangasinan is written in the Latin alphabet and is mainly spoken in the province of Pangasinan in the Philippines.

    Ilocano (Iloko) is mainly spoken in the Philippines and uses the Latin alphabet.

    Waray is a language of the Philippines that uses the Latin alphabet.

    Haitian Creole is an official language of Haiti that uses the Latin alphabet.

    Papiamento uses the Latin alphabet and is primarily spoken in Caribbean regions such as Aruba and Curaçao.

  5. Response formatting: Resolves issues found in previous versions, such as incorrect Markdown formatting, response truncation, and incorrectly formatted boxed output.

Qwen3 open-source models released in April 2025 do not support non-streaming output in thinking mode.
When thinking mode is enabled for Qwen3 open-source models, if no reasoning process is output, billing applies at the non-thinking mode rate.

Thinking mode | Non-thinking mode | Usage
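The mode switch works by passing enable_thinking alongside the standard chat fields. The sketch below only builds the request body and does not call the API; enable_thinking is the documented parameter, while the builder itself is our illustration.

```python
# Sketch of a chat/completions request body for a Qwen3 open-source model.
# enable_thinking toggles thinking mode; the surrounding fields follow the
# OpenAI chat schema. SDKs usually pass enable_thinking via an extra-body
# mechanism because it is not a standard OpenAI field.
import json

def build_request(model: str, prompt: str, thinking: bool) -> dict:
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "enable_thinking": thinking,
    }
    if thinking:
        # The April 2025 Qwen3 open-source models do not support
        # non-streaming output in thinking mode (see the note above).
        body["stream"] = True
    return body

req = build_request("qwen3-30b-a3b", "Why is the sky blue?", thinking=True)
print(json.dumps(req, indent=2))
```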

Global

In global deployment mode, the endpoint and data storage are both located in the US (Virginia) region. Model inference compute resources are dynamically scheduled worldwide.

Model

Mode

Context window

Max input

Max CoT

Max response

Input cost

Output cost

Free quota

(Note)

(tokens)

(per 1M tokens)

qwen3-next-80b-a3b-thinking

Thinking only

131,072

126,976

81,920

32,768

$0.15

$1.2

No free quota

qwen3-next-80b-a3b-instruct

Non-thinking only

129,024

-

qwen3-235b-a22b-thinking-2507

Thinking only

126,976

81,920

$0.23

$2.3

qwen3-235b-a22b-instruct-2507

Non-thinking only

129,024

-

$0.92

qwen3-30b-a3b-thinking-2507

Thinking only

126,976

81,920

$0.2

$2.4

qwen3-30b-a3b-instruct-2507

Non-thinking only

129,024

-

$0.8

qwen3-235b-a22b

Non-thinking

129,024

-

16,384

$0.7

$2.8

Thinking

98,304

38,912

$8.4

qwen3-32b

Non-thinking

129,024

-

$0.16

$0.64

Thinking

98,304

38,912

qwen3-30b-a3b

Non-thinking

129,024

-

$0.2

$0.8

Thinking

98,304

38,912

$2.4

qwen3-14b

Non-thinking

129,024

-

8,192

$0.35

$1.4

Thinking

98,304

38,912

$4.2

qwen3-8b

Non-thinking

129,024

-

$0.18

$0.7

Thinking

98,304

38,912

$2.1

International

In international deployment mode, the endpoint and data storage are both located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding Mainland China).

Model

Mode

Context window

Max input

Max CoT

Max response

Input cost

Output cost

Free quota

(Note)

(tokens)

(per 1M tokens)

qwen3-next-80b-a3b-thinking

Thinking only

131,072

126,976

81,920

32,768

$0.15

$1.2

1 million tokens each

Valid for 90 days after you activate Model Studio

qwen3-next-80b-a3b-instruct

Non-thinking only

129,024

-

qwen3-235b-a22b-thinking-2507

Thinking only

126,976

81,920

$0.23

$2.3

qwen3-235b-a22b-instruct-2507

Non-thinking only

129,024

-

$0.92

qwen3-30b-a3b-thinking-2507

Thinking only

126,976

81,920

$0.2

$2.4

qwen3-30b-a3b-instruct-2507

Non-thinking only

129,024

-

$0.8

qwen3-235b-a22b

This model and the following models were released in April 2025.

Non-thinking

129,024

-

16,384

$0.7

$2.8

Thinking

98,304

38,912

$8.4

qwen3-32b

Non-thinking

129,024

-

$0.16

$0.64

Thinking

98,304

38,912

qwen3-30b-a3b

Non-thinking

129,024

-

$0.2

$0.8

Thinking

98,304

38,912

$2.4

qwen3-14b

Non-thinking

129,024

-

8,192

$0.35

$1.4

Thinking

98,304

38,912

$4.2

qwen3-8b

Non-thinking

129,024

-

$0.18

$0.7

Thinking

98,304

38,912

$2.1

qwen3-4b

Non-thinking

129,024

-

$0.11

$0.42

Thinking

98,304

38,912

$1.26

qwen3-1.7b

Non-thinking

32,768

30,720

-

$0.42

Thinking

28,672

CoT + response: 30,720

$1.26

qwen3-0.6b

Non-thinking

30,720

-

$0.42

Thinking

28,672

CoT + response: 30,720

$1.26

Mainland China

In Mainland China deployment mode, the endpoint and data storage are both located in the Beijing region. Model inference compute resources are limited to Mainland China.

Model

Mode

Context window

Max input

Max CoT

Max response

Input cost

Output cost

Free quota

(Note)

(tokens)

(per 1M tokens)

qwen3-next-80b-a3b-thinking

Thinking only

131,072

126,976

81,920

32,768

$0.144

$1.434

No free quota

qwen3-next-80b-a3b-instruct

Non-thinking only

129,024

-

$0.574

qwen3-235b-a22b-thinking-2507

Thinking only

126,976

81,920

$0.287

$2.868

qwen3-235b-a22b-instruct-2507

Non-thinking only

129,024

-

$1.147

qwen3-30b-a3b-thinking-2507

Thinking only

126,976

81,920

$0.108

$1.076

qwen3-30b-a3b-instruct-2507

Non-thinking only

129,024

-

$0.431

qwen3-235b-a22b

Non-thinking

129,024

-

16,384

$0.287

$1.147

Thinking

98,304

38,912

$2.868

qwen3-32b

Non-thinking

129,024

-

$0.287

$1.147

Thinking

98,304

38,912

$2.868

qwen3-30b-a3b

Non-thinking

129,024

-

$0.108

$0.431

Thinking

98,304

38,912

$1.076

qwen3-14b

Non-thinking

129,024

-

8,192

$0.144

$0.574

Thinking

98,304

38,912

$1.434

qwen3-8b

Non-thinking

129,024

-

$0.072

$0.287

Thinking

98,304

38,912

$0.717

qwen3-4b

Non-thinking

129,024

-

$0.044

$0.173

Thinking

98,304

38,912

$0.431

qwen3-1.7b

Non-thinking

32,768

30,720

-

$0.173

Thinking

28,672

CoT + response: 30,720

$0.431

qwen3-0.6b

Non-thinking

30,720

-

$0.173

Thinking

28,672

CoT + response: 30,720

$0.431

QwQ - Open source

The QwQ reasoning model is trained on Qwen2.5-32B. Reinforcement learning has significantly improved its reasoning capabilities. Core metrics for math and code (AIME 24/25, LiveCodeBench) and some general metrics (IFEval, LiveBench) are comparable to the full-power version of DeepSeek-R1, and all metrics significantly exceed those of DeepSeek-R1-Distill-Qwen-32B, which is also based on Qwen2.5-32B. Usage | API reference

Note

Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Model

Context window

Max input

Max chain-of-thought

Max response

Input cost

Output cost

(tokens)

(per 1M tokens)

qwq-32b

131,072

98,304

32,768

8,192

$0.287

$0.861

QwQ-Preview

The qwq-32b-preview model is an experimental research model developed by the Qwen team in 2024. It focuses on enhancing AI reasoning capabilities, especially in math and programming. For more information about the limitations of the qwq-32b-preview model, see the QwQ official blog. Usage | API reference | Try it online

Note

Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Model

Context window

Max input

Max output

Input cost

Output cost

(tokens)

(per 1M tokens)

qwq-32b-preview

32,768

30,720

16,384

$0.287

$0.861

Qwen2.5

Qwen2.5 is a series of Qwen large language models that includes base and instruction-tuned language models with parameter sizes ranging from 7 billion to 72 billion. Qwen2.5 includes the following improvements over Qwen2:

  • It is pre-trained on our latest large-scale dataset, which contains up to 18 trillion tokens.

  • Pre-training with specialized expert models has significantly increased the model's knowledge and greatly improved its coding and math capabilities.

  • It shows significant improvements in following instructions, generating long text (over 8K tokens), understanding structured data (such as tables), and generating structured output (especially JSON). It is also more resilient to diverse system prompts, which enhances the implementation of role-playing and conditional settings for chatbots.

  • It supports over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, and Arabic.

Usage | API reference | Try it online

International

In international deployment mode, the endpoint and data storage are both located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding Mainland China).

Model

Context window

Max input

Max output

Input cost

Output cost

Free quota

(tokens)

(per 1M tokens)

qwen2.5-14b-instruct-1m

1,008,192

1,000,000

8,192

$0.805

$3.22

1 million tokens each

Valid for 90 days after you activate Model Studio.

qwen2.5-7b-instruct-1m

$0.368

$1.47

qwen2.5-72b-instruct

131,072

129,024

$1.4

$5.6

qwen2.5-32b-instruct

$0.7

$2.8

qwen2.5-14b-instruct

$0.35

$1.4

qwen2.5-7b-instruct

$0.175

$0.7

Mainland China

In Mainland China deployment mode, the endpoint and data storage are both located in the Beijing region. Model inference compute resources are limited to Mainland China.

Model

Context window

Max input

Max output

Input cost

Output cost

(tokens)

(per 1M tokens)

qwen2.5-14b-instruct-1m

1,000,000

1,000,000

8,192

$0.144

$0.431

qwen2.5-7b-instruct-1m

$0.072

$0.144

qwen2.5-72b-instruct

131,072

129,024

$0.574

$1.721

qwen2.5-32b-instruct

$0.287

$0.861

qwen2.5-14b-instruct

$0.144

$0.431

qwen2.5-7b-instruct

$0.072

$0.144

qwen2.5-3b-instruct

32,768

30,720

$0.044

$0.130

qwen2.5-1.5b-instruct

Free for a limited time

qwen2.5-0.5b-instruct

QVQ

The qvq-72b-preview model is an experimental research model developed by the Qwen team. It focuses on enhancing visual reasoning capabilities, especially in mathematical reasoning. For more information about the limitations of the qvq-72b-preview model, see the QVQ official blog. Usage | API reference

To have the model output its thinking process before the final answer, you can use the commercial version of the QVQ model.
Note

Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Model

Context window

Max input

Max output

Input cost

Output cost

(tokens)

(per 1M tokens)

qvq-72b-preview

32,768

16,384

Max 16,384 tokens per image

16,384

$1.721

$5.161

Qwen-Omni

Qwen-Omni is a multimodal model for understanding and generation, trained on Qwen2.5. It supports text, image, speech, and video inputs and can generate text and speech simultaneously in a stream. Its multimodal content understanding speed is significantly improved. Usage | API reference

International

In international deployment mode, the endpoint and data storage are both located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding Mainland China).

Model

Context window

Max input

Max output

Free quota

(Note)

(tokens)

qwen2.5-omni-7b

32,768

30,720

2,048

1 million tokens (regardless of modality)

Valid for 90 days after activating Model Studio.

After the free quota is used up, the following billing rules apply to inputs and outputs:

Input

Price (per 1M tokens)

Text

$0.10

Audio

$6.76

Image/Video

$0.28

Output

Price (per 1M tokens)

Text

$0.40 (if the input contains only text)

$0.84 (if the input contains images, audio, or video)

Text+Audio

$13.51 (for the audio component)

The text portion of the output is not billed.
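As a worked reading of the rules above (International rates for qwen2.5-omni-7b), the sketch below combines the per-modality prices for one request. It is a simplified illustration, not an official billing tool; in particular, applying the $0.84 output rate whenever any non-text input is present follows our reading of the table.

```python
# Sketch of modality-based billing for qwen2.5-omni-7b (International).
RATES_IN = {"text": 0.10, "audio": 6.76, "image_video": 0.28}  # $ per 1M tokens

def omni_cost(inputs: dict, output_text: int, output_audio: int = 0) -> float:
    """inputs maps modality -> token count; returns USD cost of one request."""
    cost = sum(RATES_IN[k] / 1e6 * v for k, v in inputs.items())
    if output_audio:
        # Text+Audio output: only the audio component is billed.
        cost += output_audio / 1e6 * 13.51
    else:
        # Text-only output: rate depends on whether the input was text-only.
        text_rate = 0.40 if set(inputs) <= {"text"} else 0.84
        cost += output_text / 1e6 * text_rate
    return cost

# 1M text tokens in, 1M text tokens out:
print(round(omni_cost({"text": 1_000_000}, 1_000_000), 4))  # 0.5
```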

Mainland China

In Mainland China deployment mode, the access point and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Model

Context window

Max input

Max output

(tokens)

qwen2.5-omni-7b

32,768

30,720

2,048

The billing rules for inputs and outputs are as follows:

Input

Price (per 1M tokens)

Text

$0.087

Audio

$5.448

Image/Video

$0.287

Output

Price (per 1M tokens)

Text

$0.345 (for text-only input)

$0.861 (if the input includes images, audio, or video)

Text+Audio

$10.895 (for the audio portion)

The text portion of the output is not billed.

Qwen3-Omni-Captioner

Qwen3-Omni-Captioner is an open-source model based on Qwen3-Omni. Without any prompts, it automatically generates accurate and comprehensive descriptions for complex audio, such as speech, ambient sounds, music, and sound effects. It can identify speaker emotions, musical elements (such as style and instruments), and sensitive information, making it suitable for applications such as audio content analysis, security audits, intent recognition, and audio editing. Usage | API reference

International

In international deployment mode, the endpoint and data storage are both located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding Mainland China).

Model

Context window

Max input

Max output

Input cost

Output cost

Free quota

(Note)

(tokens)

(per 1M tokens)

qwen3-omni-30b-a3b-captioner

65,536

32,768

32,768

$3.81

$3.06

1 million tokens

Valid for 90 days after you activate Model Studio

Mainland China

In Mainland China deployment mode, the endpoint and data storage are both located in the Beijing region. Model inference compute resources are limited to Mainland China.

Model

Context window

Max input

Max output

Input cost

Output cost

Free quota

(Note)

(tokens)

(per 1M tokens)

qwen3-omni-30b-a3b-captioner

65,536

32,768

32,768

$2.265

$1.821

No free quota

Qwen-VL

Qwen-VL is Alibaba Cloud's open-source vision-language model series. Usage | API reference

Compared to Qwen2.5-VL, the Qwen3-VL model offers significant improvements:

  • Agent interaction: It can operate computer or mobile interfaces, recognize GUI elements, understand functions, and call tools to perform tasks, achieving top-tier performance in evaluations such as OS World.

  • Visual coding: It can generate code from images or videos and can be used to create HTML, CSS, and JS code from design mockups, website screenshots, and more.

  • Spatial perception: Supports 2D and 3D positioning and accurately judges object orientation, perspective changes, and occlusion relationships.

  • Long video understanding: Supports the understanding of video content up to 20 minutes long and provides precise localization down to the second.

  • Deep thinking: It has deep thinking capabilities and excels at capturing details and analyzing cause and effect, achieving top-tier performance in evaluations such as MathVista and MMMU.

  • Text recognition: Language support is expanded to 33 languages. The model provides more stable performance in scenarios with complex lighting, blur, or tilted text. It also offers significantly improved accuracy for rare characters, ancient texts, and professional terminology.

    Supported languages

    The model supports the following 33 languages: Chinese, Japanese, Korean, Indonesian, Vietnamese, Thai, English, French, German, Russian, Portuguese, Spanish, Italian, Swedish, Danish, Czech, Norwegian, Dutch, Finnish, Turkish, Polish, Swahili, Romanian, Serbian, Greek, Kazakh, Uzbek, Cebuano, Arabic, Urdu, Persian, Hindi/Devanagari, and Hebrew.
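
Qwen3-VL models accept images through OpenAI-style multimodal messages, where the user message `content` is a list of image and text parts. Below is a minimal sketch of such a payload; the image URL is a placeholder, and the endpoint and full parameter list should be checked in the API reference:

```python
def build_vl_request(model, image_url, question):
    """Build an OpenAI-style chat payload with one image part and one text part."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": image_url}},
                    {"type": "text", "text": question},
                ],
            }
        ],
    }

payload = build_vl_request(
    "qwen3-vl-8b-instruct",
    "https://example.com/chart.png",  # placeholder image URL
    "Describe this image.",
)
```

The same payload shape works for any of the qwen3-vl models listed below; only the `model` field changes.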

Global

In global deployment mode, the endpoint and data storage are both located in the US (Virginia) region. Model inference compute resources are dynamically scheduled worldwide.

Model

Mode

Context window

Max input

Max CoT

Max response

Input cost

Output cost

CoT + Outputs

(tokens)

(per 1M tokens)

qwen3-vl-235b-a22b-thinking

Thinking only

126,976

81,920

$0.4

$4

qwen3-vl-235b-a22b-instruct

Non-thinking only

129,024

-

$1.6

qwen3-vl-32b-thinking

Thinking only

131,072

126,976

81,920

32,768

$0.16

$0.64

qwen3-vl-32b-instruct

Non-thinking only

129,024

-

qwen3-vl-30b-a3b-thinking

Thinking only

126,976

81,920

$0.2

$2.4

qwen3-vl-30b-a3b-instruct

Non-thinking only

129,024

-

$0.8

qwen3-vl-8b-thinking

Thinking only

126,976

81,920

$0.18

$2.1

qwen3-vl-8b-instruct

Non-thinking only

129,024

-

$0.7

International

In international deployment mode, the endpoint and data storage are both located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding Mainland China).

Model

Mode

Context window

Max input

Max CoT

Max response

Input cost

Output cost

CoT + Outputs

Free quota

(Note)

(tokens)

(per 1M tokens)

qwen3-vl-235b-a22b-thinking

Thinking only

126,976

81,920

$0.4

$4

1 million tokens each

Valid for 90 days after you activate Model Studio.

qwen3-vl-235b-a22b-instruct

Non-thinking only

129,024

-

$1.6

qwen3-vl-32b-thinking

Thinking only

131,072

126,976

81,920

32,768

$0.16

$0.64

qwen3-vl-32b-instruct

Non-thinking only

129,024

-

qwen3-vl-30b-a3b-thinking

Thinking only

126,976

81,920

$0.2

$2.4

qwen3-vl-30b-a3b-instruct

Non-thinking only

129,024

-

$0.8

qwen3-vl-8b-thinking

Thinking only

126,976

81,920

$0.18

$2.1

qwen3-vl-8b-instruct

Non-thinking only

129,024

-

$0.7

More models

Model

Context window

Max input

Max output

Input cost

Output cost

Free quota

(Note)

(tokens)

(per 1M tokens)

qwen2.5-vl-72b-instruct

131,072

129,024

Max per image: 16,384

8,192

$2.8

$8.4

1 million tokens each

Valid for 90 days after activating Model Studio

qwen2.5-vl-32b-instruct

$1.4

$4.2

qwen2.5-vl-7b-instruct

$0.35

$1.05

qwen2.5-vl-3b-instruct

$0.21

$0.63

Mainland China

In Mainland China deployment mode, the endpoint and data storage are both located in the Beijing region. Model inference compute resources are limited to Mainland China.

Model

Mode

Context window

Max input

Max CoT

Max response

Input cost

Output cost

CoT + Outputs

Free quota

(Note)

(tokens)

(per 1M tokens)

qwen3-vl-235b-a22b-thinking

Thinking only

131,072

126,976

81,920

$0.287

$2.867

No free quota

qwen3-vl-235b-a22b-instruct

Non-thinking only

129,024

-

$1.147

qwen3-vl-32b-thinking

Thinking only

131,072

126,976

81,920

32,768

$0.287

$2.868

qwen3-vl-32b-instruct

Non-thinking only

129,024

-

$1.147

qwen3-vl-30b-a3b-thinking

Thinking only

126,976

81,920

$0.108

$1.076

qwen3-vl-30b-a3b-instruct

Non-thinking only

129,024

-

$0.431

qwen3-vl-8b-thinking

Thinking only

126,976

81,920

$0.072

$0.717

qwen3-vl-8b-instruct

Non-thinking only

129,024

-

$0.287

More models

Model

Context window

Max input

Max output

Input cost

Output cost

Free quota

(Note)

(tokens)

(per 1M tokens)

qwen2.5-vl-72b-instruct

131,072

129,024

Max per image: 16,384

8,192

$2.294

$6.881

No free quota

qwen2.5-vl-32b-instruct

$1.147

$3.441

qwen2.5-vl-7b-instruct

$0.287

$0.717

qwen2.5-vl-3b-instruct

$0.173

$0.517

qwen2-vl-72b-instruct

32,768

30,720

Max per image: 16,384

2,048

$2.294

$6.881

Qwen-Math

Qwen-Math is a language model built on Qwen that specializes in solving mathematical problems. Qwen2.5-Math supports Chinese and English and integrates multiple reasoning methods, such as Chain of Thought (CoT), Program of Thought (PoT), and Tool-Integrated Reasoning (TIR). Usage | API reference | Try it online

Note

Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Model

Context window

Max input

Max output

Input cost

Output cost

(tokens)

(per 1M tokens)

qwen2.5-math-72b-instruct

4,096

3,072

3,072

$0.574

$1.721

qwen2.5-math-7b-instruct

$0.144

$0.287

qwen2.5-math-1.5b-instruct

Free for a limited time

Qwen-Coder

Qwen-Coder is an open-source code model from the Qwen series. The latest Qwen3-Coder series has powerful coding agent capabilities. It excels at tool calling, environment interaction, and autonomous programming. The model combines excellent coding skills with general-purpose capabilities. Usage | API reference

Global

In global deployment mode, the endpoint and data storage are both located in the US (Virginia) region. Model inference compute resources are dynamically scheduled worldwide.

Model

Context window

Max input

Max output

Input cost

Output cost

(tokens)

(per 1M tokens)

qwen3-coder-480b-a35b-instruct

262,144

204,800

65,536

Tiered pricing. See details below.

qwen3-coder-30b-a3b-instruct

qwen3-coder-480b-a35b-instruct and qwen3-coder-30b-a3b-instruct use tiered pricing based on the number of input tokens in the current request.

Model

Input tokens per request

Input cost (per 1M tokens)

Output cost (per 1M tokens)

qwen3-coder-480b-a35b-instruct

0 < Tokens ≤ 32K

$1.5

$7.5

32K < Tokens ≤ 128K

$2.7

$13.5

128K < Tokens ≤ 200K

$4.5

$22.5

qwen3-coder-30b-a3b-instruct

0 < Tokens ≤ 32K

$0.45

$2.25

32K < Tokens ≤ 128K

$0.75

$3.75

128K < Tokens ≤ 200K

$1.2

$6
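
Because the tier is selected by the input-token count of the request, the effective price is a step function. A minimal sketch of the computation follows, using the Global rates above; it assumes "32K" means 32,000 tokens (the table does not specify), and the helper is illustrative, not part of any SDK:

```python
# Tiered pricing for qwen3-coder-480b-a35b-instruct (Global).
# The tier is chosen by the input-token count of the request, and that
# tier's rates apply to the whole request. Rates are USD per 1M tokens.
# "32K" etc. are assumed decimal (32,000); the docs do not specify.
TIERS_480B = [  # (tier upper bound on input tokens, input rate, output rate)
    (32_000, 1.5, 7.5),
    (128_000, 2.7, 13.5),
    (200_000, 4.5, 22.5),
]

def coder_cost(input_tokens, output_tokens, tiers=TIERS_480B):
    for limit, in_rate, out_rate in tiers:
        if input_tokens <= limit:
            return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000
    raise ValueError("input exceeds the largest pricing tier")
```

For example, a 10,000-token input with a 1,000-token output falls in the first tier and costs (10,000 × 1.5 + 1,000 × 7.5) / 1M = $0.0225.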

International

In international deployment mode, the endpoint and data storage are both located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding Mainland China).

Model

Context window

Max input

Max output

Input cost

Output cost

Free quota

(Note)

(tokens)

qwen3-coder-480b-a35b-instruct

262,144

204,800

65,536

Tiered pricing. See details below.

1 million tokens each

Valid for 90 days after you activate Model Studio

qwen3-coder-30b-a3b-instruct

qwen3-coder-480b-a35b-instruct and qwen3-coder-30b-a3b-instruct use tiered pricing based on the number of input tokens in the current request.

Model

Input tokens per request

Input cost (per 1M tokens)

Output cost (per 1M tokens)

qwen3-coder-480b-a35b-instruct

0 < Tokens ≤ 32K

$1.5

$7.5

32K < Tokens ≤ 128K

$2.7

$13.5

128K < Tokens ≤ 200K

$4.5

$22.5

qwen3-coder-30b-a3b-instruct

0 < Tokens ≤ 32K

$0.45

$2.25

32K < Tokens ≤ 128K

$0.75

$3.75

128K < Tokens ≤ 200K

$1.2

$6

Mainland China

In Mainland China deployment mode, the endpoint and data storage are both located in the Beijing region. Model inference compute resources are limited to Mainland China.

Model

Context window

Max input

Max output

Input cost

Output cost

(tokens)

(per 1M tokens)

qwen3-coder-480b-a35b-instruct

262,144

204,800

65,536

Tiered pricing. See details below.

qwen3-coder-30b-a3b-instruct

qwen2.5-coder-32b-instruct

131,072

129,024

8,192

$0.287

$0.861

qwen2.5-coder-14b-instruct

qwen2.5-coder-7b-instruct

$0.144

$0.287

qwen2.5-coder-3b-instruct

32,768

30,720

Free for a limited time

qwen2.5-coder-1.5b-instruct

qwen2.5-coder-0.5b-instruct

qwen3-coder-480b-a35b-instruct and qwen3-coder-30b-a3b-instruct use tiered pricing based on the number of input tokens in the current request.

Model

Input tokens per request

Input cost (per 1M tokens)

Output cost (per 1M tokens)

qwen3-coder-480b-a35b-instruct

0 < Tokens ≤ 32K

$0.861

$3.441

32K < Tokens ≤ 128K

$1.291

$5.161

128K < Tokens ≤ 200K

$2.151

$8.602

qwen3-coder-30b-a3b-instruct

0 < Tokens ≤ 32K

$0.216

$0.861

32K < Tokens ≤ 128K

$0.323

$1.291

128K < Tokens ≤ 200K

$0.538

$2.151

Text generation - Third-party

DeepSeek

DeepSeek is a large language model from DeepSeek AI. API reference | Try it online

Note

Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Model

Context window

Max input

Max chain-of-thought

Max response

Input cost

Output cost

(tokens)

(per 1M tokens)

deepseek-v3.2

685B full-power version
Context cache discounts

131,072

98,304

32,768

65,536

$0.287

$0.431

deepseek-v3.2-exp

685B full-power version

deepseek-v3.1

685B full-power version

$0.574

$1.721

deepseek-r1

685B full-power version
Batch half price

16,384

$2.294

deepseek-r1-0528

685B full-power version

deepseek-v3

671B full-power version
Batch half price

131,072

N/A

$0.287

$1.147

deepseek-r1-distill-qwen-1.5b

Based on Qwen2.5-Math-1.5B

32,768

32,768

16,384

16,384

Free for a limited time

deepseek-r1-distill-qwen-7b

Based on Qwen2.5-Math-7B

$0.072

$0.144

deepseek-r1-distill-qwen-14b

Based on Qwen2.5-14B

$0.144

$0.431

deepseek-r1-distill-qwen-32b

Based on Qwen2.5-32B

$0.287

$0.861

deepseek-r1-distill-llama-8b

Based on Llama-3.1-8B

Free for a limited time

deepseek-r1-distill-llama-70b

Based on Llama-3.3-70B

Kimi

Kimi-K2 is a large language model launched by Moonshot AI. It has excellent coding and tool-calling capabilities. Usage | Try it online

Note

Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Model

Context window

Max input

Max CoT

Max response

Input price

Output price

(tokens)

(per 1M tokens)

kimi-k2-thinking

262,144

229,376

32,768

16,384

$0.574

$2.294

Moonshot-Kimi-K2-Instruct

131,072

131,072

-

8,192

$0.574

$2.294

GLM

The GLM series models are hybrid reasoning models from Zhipu AI that are designed for agents and support two modes: thinking and non-thinking.

Note

Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Model

Context window

Max input

Max chain-of-thought

Max response

Input cost

Output cost

(tokens)

(per 1M tokens)

glm-4.7

202,752

169,984

32,768

16,384

Tiered pricing. See details below.

glm-4.6

The above models use tiered pricing based on input tokens per request.

Model

Input tokens per request

Input cost (per 1M tokens)

Output cost (per 1M tokens)

glm-4.7

0 < Tokens ≤ 32K

$0.431

$2.007

32K < Tokens ≤ 166K

$0.574

$2.294

glm-4.6

0 < Tokens ≤ 32K

$0.431

$2.007

32K < Tokens ≤ 166K

$0.574

$2.294

$0.574

$2.294

These models are not integrated third-party services; they are deployed on Model Studio servers.
GLM models have the same prices in both thinking and non-thinking modes.

Image generation

Qwen-Image

The Qwen text-to-image model excels at rendering complex text, especially in Chinese and English. API reference

International

In International deployment mode, the access point and data storage are located in the Singapore region, and inference computing resources are dynamically scheduled globally (excluding Mainland China).

Model

Unit price

Free quota

qwen-image-max

Currently has the same capabilities as qwen-image-max-2025-12-30

$0.075/image

Free quota: 100 images for each model

Valid for 90 days after activating Model Studio

qwen-image-max-2025-12-30

$0.075/image

qwen-image-plus

Currently has the same capabilities as qwen-image

$0.03/image

qwen-image-plus-2026-01-09

$0.03/image

qwen-image

$0.035/image

Mainland China

In Mainland China deployment mode, the access point and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Model

Unit price

Free quota

qwen-image-max

Currently has the same capabilities as qwen-image-max-2025-12-30

$0.071677/image

No free quota

qwen-image-max-2025-12-30

$0.071677/image

qwen-image-plus

Currently has the same capabilities as qwen-image

$0.028671/image

qwen-image-plus-2026-01-09

$0.028671/image

qwen-image

$0.035/image

Input prompt

Output image

Healing-style hand-drawn poster featuring three puppies playing with a ball on lush green grass, adorned with decorative elements such as birds and stars. The main title “Come Play Ball!” is prominently displayed at the top in bold, blue cartoon font. Below it, the subtitle “Come [Show Off Your Skills]!” appears in green font. A speech bubble adds playful charm with the text: “Hehe, watch me amaze my little friends next!” At the bottom, supplementary text reads: “We get to play ball with our friends again!” The color palette centers on fresh greens and blues, accented with bright pink and yellow tones to highlight a cheerful, childlike atmosphere.


Qwen-Image-Edit

The Qwen image editing model supports precise text editing in Chinese and English. It also supports operations such as color adjustment, detail enhancement, style transfer, adding or removing objects, and changing positions and actions. These features enable complex editing of images and text. API reference

International

In International deployment mode, the access point and data storage are located in the Singapore region, and inference computing resources are dynamically scheduled globally (excluding Mainland China).

Model

Unit price

Free quota

qwen-image-edit-max

Currently has the same capabilities as qwen-image-edit-max-2026-01-16

$0.075/image

Free quota: 100 images for each model

Valid for 90 days after activating Model Studio

qwen-image-edit-max-2026-01-16

$0.075/image

qwen-image-edit-plus

Currently has the same capabilities as qwen-image-edit-plus-2025-10-30

$0.03/image

qwen-image-edit-plus-2025-12-15

$0.03/image

qwen-image-edit-plus-2025-10-30

$0.03/image

qwen-image-edit

$0.045/image

Mainland China

In Mainland China deployment mode, the access point and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Model

Unit price

Free quota

qwen-image-edit-max

Currently has the same capabilities as qwen-image-edit-max-2026-01-16

$0.071677/image

No free quota

qwen-image-edit-max-2026-01-16

$0.071677/image

qwen-image-edit-plus

Currently has the same capabilities as qwen-image-edit-plus-2025-10-30

$0.028671/image

qwen-image-edit-plus-2025-12-15

$0.028671/image

qwen-image-edit-plus-2025-10-30

$0.028671/image

qwen-image-edit

$0.043/image

Editing examples (input and output images not shown):

  • Make the person bend over and hold the dog's front paw.

  • Change the text on the letter blocks from 'HEALTH INSURANCE' to 'Tomorrow will be better'.

  • Change the dotted shirt to a light blue shirt.

  • Change the background to Antarctica.

  • Create a cartoon-style profile picture of the person.

  • Remove the hair from the dinner plate.

Qwen-MT-Image

The Qwen image translation model supports translating text from images in 11 languages into Chinese or English. It accurately preserves the original layout and content information and provides custom features such as term definition, sensitive word filtering, and image entity detection. API reference

Note

Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Model

Unit price

Free quota

qwen-mt-image

$0.000431/image

No free quota

Translation examples (images not shown): an original English-language image translated into Japanese, Portuguese, and Arabic.

Tongyi - text-to-image - Z-Image

Tongyi - text-to-image - Z-Image is a lightweight model that quickly generates high-quality images. The model supports Chinese and English text rendering, complex semantic understanding, various styles, and multiple resolutions and aspect ratios. API reference

International

In International deployment mode, the access point and data storage are located in the Singapore region, and inference computing resources are dynamically scheduled globally (excluding Mainland China).

Model

Unit price

Free quota (Note)

Valid for 90 days after activating Model Studio

z-image-turbo

Prompt extension disabled (prompt_extend=false): $0.015/image

Prompt extension enabled (prompt_extend=true): $0.03/image

100 images
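
Since the unit price depends only on whether prompt extension is enabled, estimating a batch cost is a one-liner. Here is a minimal sketch using the International prices above; the helper name is illustrative:

```python
# z-image-turbo per-image price (International), from the table above:
# $0.015/image with prompt extension disabled, $0.03/image with it enabled.
def z_image_price(n_images, prompt_extend=False):
    unit = 0.03 if prompt_extend else 0.015
    return n_images * unit
```

For example, 10 images with prompt extension disabled cost $0.15; with it enabled, $0.30.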

Mainland China

In Mainland China deployment mode, the access point and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Model

Unit price

Free quota

z-image-turbo

Prompt extension disabled (prompt_extend=false): $0.01434/image

Prompt extension enabled (prompt_extend=true): $0.02868/image

No free quota

Input prompt

Output image

Photo of a stylish young woman with short black hair standing confidently in front of a vibrant cartoon-style mural wall. She wears an all-black outfit: a puffed bomber jacket with a ruffled collar, cargo shorts, fishnet tights, and chunky black Doc Martens, with a gold chain dangling from her waist. The background features four colorful comic-style panels: one reads “GRAND STAGE” and includes sneakers and a Gatorade bottle; another displays green Nike sneakers and a slice of pizza; the third reads “HARAJUKU st” with floating shoes; and the fourth shows a blue mouse riding a skateboard with the text “Takeshita WELCOME.” Dominant bright colors include yellow, teal, orange, pink, and green. Speech bubbles, halftone patterns, and playful characters enhance the urban street-art aesthetic. Daylight evenly illuminates the scene, and the ground beneath her feet is white tiled pavement. Full-body portrait, centered composition, slightly tilted stance, direct eye contact with the camera. High detail, sharp focus, dynamic framing.


Wan text-to-image

The Wan text-to-image model generates high-quality images from text. API reference | Try it online

Global

In Global deployment mode, the access point and data storage are located in the US (Virginia) region, and inference computing resources are dynamically scheduled globally.

Model

Description

Unit price

Free quota (Note)

Valid for 90 days after activating Model Studio

wan2.6-t2i Recommended

Wan 2.6. Supports new synchronous interfaces and lets you freely select dimensions within the constraints of total pixel area and aspect ratio.

$0.03/image

No free quota

International

In International deployment mode, the access point and data storage are located in the Singapore region, and inference computing resources are dynamically scheduled globally (excluding Mainland China).

Model

Description

Unit price

Free quota (Note)

Valid for 90 days after activating Model Studio

wan2.6-t2i Recommended

Wan 2.6. Supports new synchronous interfaces and lets you freely select dimensions within the constraints of total pixel area and aspect ratio.

$0.03/image

50 images

wan2.5-t2i-preview Recommended

Wan 2.5 preview. Removes single-side length limits and lets you freely select dimensions within the constraints of total pixel area and aspect ratio.

$0.03/image

50 images

wan2.2-t2i-plus

Wan 2.2 Professional Edition. Fully upgraded in creativity, stability, and realistic texture.

$0.05/image

100 images

wan2.2-t2i-flash

Wan 2.2 Flash Edition. Fully upgraded in creativity, stability, and realistic texture.

$0.025/image

100 images

wan2.1-t2i-plus

Wan 2.1 Professional Edition. Supports multiple styles and generates images with rich details.

$0.05/image

200 images

wan2.1-t2i-turbo

Wan 2.1 Turbo Edition. Supports multiple styles and offers fast generation speed.

$0.025/image

200 images

Mainland China

In Mainland China deployment mode, the access point and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Model

Description

Unit price

Free quota (Note)

Valid for 90 days after activating Model Studio

wan2.6-t2i Recommended

Wan 2.6. Supports new synchronous interfaces and lets you freely select dimensions within the constraints of total pixel area and aspect ratio.

$0.028671/image

No free quota

wan2.5-t2i-preview Recommended

Wan 2.5 preview. Removes single-side length limits and lets you freely select dimensions within the constraints of total pixel area and aspect ratio.

$0.028671/image

No free quota

wan2.2-t2i-plus

Wan 2.2 Professional Edition. Fully upgraded in creativity, stability, and realistic texture.

$0.02007/image

No free quota

wan2.2-t2i-flash

Wan 2.2 Flash Edition. Fully upgraded in creativity, stability, and realistic texture.

$0.028671/image

No free quota

wanx2.1-t2i-plus

Wan 2.1 Professional Edition. Supports multiple styles and generates images with rich details.

$0.028671/image

No free quota

wanx2.1-t2i-turbo

Wan 2.1 Turbo Edition. Supports multiple styles and offers fast generation speed.

$0.020070/image

No free quota

wanx2.0-t2i-turbo

Wan 2.0 Turbo Edition. Excels at textured portraits and creative designs. It is cost-effective.

$0.005735/image

No free quota

Input prompt

Output image

A needle-felted Santa Claus holding a gift and a white cat standing next to him against a background of colorful gifts and green plants, creating a cute, warm, and cozy scene.


Wan2.6 image generation and editing

The Wan2.6 image generation model supports image editing and can generate outputs that contain both text and images to meet various generation and integration requirements. API reference.

Global

In Global deployment mode, the access point and data storage are in the US (Virginia) region, and inference computing resources are dynamically scheduled globally.

Model

Unit price

Free quota

wan2.6-image

$0.03/image

No free quota

International

In International deployment mode, the access point and data storage are in the Singapore region, and inference computing resources are dynamically scheduled globally (excluding Mainland China).

Model

Unit price

Free quota (Note)

Valid for 90 days after activating Model Studio

wan2.6-image

$0.03/image

50 images

Mainland China

In Mainland China deployment mode, the access point and data storage are in the Beijing region, and inference computing resources are limited to Mainland China.

Model

Unit price

Free quota

wan2.6-image

$0.028671/image

No free quota

Wan general image editing 2.5

The Wan2.5 general image editing model supports entity-consistent image editing and multi-image fusion. It accepts text, a single image, or multiple images as input. API reference.

International

In International deployment mode, the access point and data storage are located in the Singapore region, and inference computing resources are dynamically scheduled globally (excluding Mainland China).

Model

Unit price

Free quota (Note)

Valid for 90 days after activating Model Studio

wan2.5-i2i-preview

$0.03/image

50 units

Mainland China

In Mainland China deployment mode, the access point and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Model

Unit price

Free quota

wan2.5-i2i-preview

$0.028671/image

No free quota

Feature

Example instruction (input and output images not shown)

Single-image editing

Change the floral dress to a vintage-style lace long dress with exquisite embroidery details on the collar and cuffs.

Multi-image fusion

Place the alarm clock from Image 1 next to the vase on the dining table in Image 2.

Wan general image editing 2.1

The Wan2.1 general image editing model performs diverse image editing with simple instructions. It is suitable for scenarios such as outpainting, watermark removal, style transfer, image restoration, and image enhancement. Usage | API reference

Note

Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Model

Unit Price

Free Quota

wanx2.1-imageedit

$0.020070 per image

No free quota

The general image editing model currently supports the following features:

Feature

Example prompt (input and output images not shown)

Global stylization

French picture book style.

Local stylization

Change the house to a wooden plank style.

Instruction-based editing

Change the girl's hair to red.

Inpainting (the white area of the mask image marks the region to repaint)

A ceramic rabbit holding a ceramic flower.

Text watermark removal

Remove the text from the image.

Outpainting

A green fairy.

Image super-resolution (turns a blurry image into a clear one)

Image super-resolution.

Image colorization

Blue background, yellow leaves.

Line art to image

A living room in a minimalist Nordic style.

Placeholder image

A cartoon character cautiously peeks out, spying on a brilliant blue gem inside the room.

OutfitAnyone

  • Compared to the basic version, the OutfitAnyone-Plus model offers improvements in image definition, clothing texture details, and logo restoration. However, it takes longer to generate images and is suitable for scenarios that are not time-sensitive. API reference | Try it online

  • OutfitAnyone-Image Parsing supports parsing model and clothing images, which can be used for pre-processing and post-processing of OutfitAnyone images. API reference

Note

Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Model

Description

Sample input

Sample output

aitryon-plus

OutfitAnyone-Plus


aitryon-parsing-v1

OutfitAnyone image parsing

OutfitAnyone pricing

Service

Model

Unit price

Discount

Tier

OutfitAnyone - Plus

aitryon-plus

$0.071677/image

None

None

OutfitAnyone - Image parsing

aitryon-parsing-v1

$0.000574/image

None

None

Video generation - Wan

Text-to-video

The Wan text-to-video model generates videos from a single sentence. The videos feature rich artistic styles and cinematic quality. API reference | Try it online

Global

In Global deployment mode, the access point and data storage are located in the US (Virginia) region, and inference computing resources are dynamically scheduled globally.

Model

Description

Unit price

Free quota

wan2.6-t2v Recommended

Wan 2.6. Introduces a multi-shot narrative feature and supports automatic voiceover and the import of custom audio files.

720P: $0.1/second

1080P: $0.15/second

No free quota

International

In International deployment mode, the access point and data storage are located in the Singapore region, and inference computing resources are dynamically scheduled globally (excluding Mainland China).

Model

Description

Unit price

Free quota (Note)

Valid for 90 days after you activate Alibaba Cloud Model Studio

wan2.6-t2v Recommended

Wan 2.6. Introduces a multi-shot narrative feature and supports automatic voiceover and the import of custom audio files.

720P: $0.10/second

1080P: $0.15/second

50 seconds

wan2.5-t2v-preview Recommended

Wan 2.5 preview. Supports automatic voiceover and custom audio file input.

480p: $0.05/second

720p: $0.10/second

1080p: $0.15/second

50 seconds

wan2.2-t2v-plus

Wan 2.2 Professional Edition. Significantly improved image detail and motion stability.

480p: $0.02/second

1080p: $0.10/second

50 seconds

wan2.1-t2v-turbo

Wan 2.1 Turbo Edition. Fast generation speed and balanced performance.

$0.036/second

200 seconds

wan2.1-t2v-plus

Wan 2.1 Professional Edition. Generates rich details and higher-quality images.

$0.10/second

200 seconds
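
Video pricing is per second and varies by model and resolution. Below is a minimal sketch covering the wan2.6/wan2.5 International rates above; the rate table and helper are illustrative, and the full model list should be taken from the table itself:

```python
# Wan text-to-video cost (International): USD per second of generated video,
# keyed by (model, resolution). Rates copied from the table above.
RATES = {
    ("wan2.6-t2v", "720P"): 0.10,
    ("wan2.6-t2v", "1080P"): 0.15,
    ("wan2.5-t2v-preview", "480P"): 0.05,
    ("wan2.5-t2v-preview", "720P"): 0.10,
    ("wan2.5-t2v-preview", "1080P"): 0.15,
}

def video_cost(model, resolution, seconds):
    """Cost of one generation of the given length at the given resolution."""
    return RATES[(model, resolution.upper())] * seconds
```

For example, a 10-second 1080P clip from wan2.6-t2v costs 10 × $0.15 = $1.50.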

US

In US deployment mode, the access point and data storage are located in the US (Virginia) region, and inference computing resources are limited to the US.

Model

Description

Unit price

Free quota

wan2.6-t2v-us Recommended

Wan 2.6. Introduces a multi-shot narrative feature and supports automatic voiceover and the import of custom audio files.

720P: $0.10/second

1080P: $0.15/second

No free quota

Mainland China

In Mainland China deployment mode, the access point and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Model

Description

Unit price

Free quota

wan2.6-t2v Recommended

Wan 2.6. Introduces a multi-shot narrative feature and supports automatic voiceover and the import of custom audio files.

720P: $0.086012/second

1080P: $0.143353/second

No free quota

wan2.5-t2v-preview Recommended

Wan 2.5 preview. Supports automatic voiceover and custom audio file input.

480P: $0.043006/second

720P: $0.086012/second

1080P: $0.143353/second

No free quota

wan2.2-t2v-plus

Wan 2.2 Professional Edition. Significantly improved image detail and motion stability.

480P: $0.02007/second

1080P: $0.100347/second

No free quota

wanx2.1-t2v-turbo

Wan 2.1 Turbo Edition. Fast generation speed and balanced performance.

$0.034405/second

No free quota

wanx2.1-t2v-plus

Wan 2.1 Professional Edition. Generates richer details and higher-quality images.

$0.100347/second

No free quota

Input prompt

Output video (wan2.6, multi-shot video)

Shot from a low angle, in a medium close-up, with warm tones, mixed lighting (the practical light from the desk lamp blends with the overcast light from the window), side lighting, and a central composition. In a classic detective office, wooden bookshelves are filled with old case files and ashtrays. A green desk lamp illuminates a case file spread out in the center of the desk. A fox, wearing a dark brown trench coat and a light gray fedora, sits in a leather chair, its fur crimson, its tail resting lightly on the edge, its fingers slowly turning yellowed pages. Outside, a steady drizzle falls beneath a blue sky, streaking the glass with meandering streaks. It slowly raises its head, its ears twitching slightly, its amber eyes gazing directly at the camera, its mouth clearly moving as it speaks in a smooth, cynical voice: 'The case was cold, colder than a fish in winter. But every chicken has its secrets, and I, for one, intended to find them.'
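Generation jobs like the one above run asynchronously: you submit a task, receive a task ID, and poll for the finished video. The sketch below builds (but does not send) such a request for wan2.6-t2v against the Singapore (International) access point. The endpoint path, the X-DashScope-Async header, and the body field names follow the DashScope async-task convention but should be treated as assumptions here; confirm them against the API reference linked above.

```python
import json

# Singapore (International) access point; the Global mode uses a US endpoint.
INTL_BASE = "https://dashscope-intl.aliyuncs.com"


def build_t2v_request(api_key: str, prompt: str, model: str = "wan2.6-t2v"):
    """Return (url, headers, body) for an async text-to-video task."""
    url = f"{INTL_BASE}/api/v1/services/aigc/video-generation/video-synthesis"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        "X-DashScope-Async": "enable",  # video jobs run as asynchronous tasks
    }
    body = json.dumps({"model": model, "input": {"prompt": prompt}})
    return url, headers, body
```

The response to this request contains a task ID; the finished video is billed per second of output at the per-resolution rates in the tables above.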

Image-to-video - first frame

The Wan image-to-video model uses an input image as the first frame of a video. It then generates the rest of the video based on a prompt. The videos feature rich artistic styles and cinematic quality. API reference | Try it online

Global

In Global deployment mode, the access point and data storage are located in the US (Virginia) region, and inference computing resources are dynamically scheduled globally.

Model

Description

Unit price

Free quota

wan2.6-i2v Recommended

Wan 2.6. Introduces a multi-shot narrative feature and supports automatic voiceover and the import of custom audio files.

720P: $0.10/second

1080P: $0.15/second

No free quota

International

In International deployment mode, the access point and data storage are located in the Singapore region, and inference computing resources are dynamically scheduled globally (excluding Mainland China).

Model

Description

Unit price

Free quota (Note)

Valid for 90 days after you activate Alibaba Cloud Model Studio

wan2.6-i2v-flash Recommended

Wan 2.6. Introduces a multi-shot narrative feature and supports automatic voiceover and the import of custom audio files.

Outputs video with audio (audio=true):

  • 720P: $0.05/second

  • 1080P: $0.075/second

Outputs video without audio (audio=false):

  • 720P: $0.025/second

  • 1080P: $0.0375/second

50 seconds

wan2.6-i2v Recommended

Wan 2.6. Introduces a multi-shot narrative feature and supports automatic voiceover and the import of custom audio files.

720P: $0.10/second

1080P: $0.15/second

50 seconds

wan2.5-i2v-preview

Wan 2.5 preview. Supports automatic dubbing and custom audio file uploads.

480P: $0.05/second

720P: $0.10/second

1080P: $0.15/second

50 seconds

wan2.2-i2v-flash

Wan 2.2 Flash Edition. Delivers extremely fast generation speed with significant improvements in visual detail and motion stability.

480P: $0.015/second

720P: $0.036/second

50 seconds

wan2.2-i2v-plus

Wan 2.2 Professional Edition. Delivers significant improvements in visual detail and motion stability.

480P: $0.02/second

1080P: $0.10/second

50 seconds

wan2.1-i2v-turbo

Wan 2.1 Turbo Edition. Fast generation speed with balanced performance.

$0.036/second

200 seconds

wan2.1-i2v-plus

Wan 2.1 Professional Edition. Generates rich details and produces higher-quality, more textured visuals.

$0.10/second

200 seconds
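The audio=true/false parameter on wan2.6-i2v-flash halves the per-second price when audio output is disabled. A minimal sketch of that cost arithmetic, using the International rates from the table above (the helper function itself is illustrative, not an official calculator):

```python
# USD per second for wan2.6-i2v-flash, International deployment.
# Keyed by (audio_enabled, resolution); values taken from the table above.
FLASH_RATES = {
    (True, "720P"): 0.05,
    (True, "1080P"): 0.075,
    (False, "720P"): 0.025,    # audio=false is exactly half price
    (False, "1080P"): 0.0375,
}


def i2v_flash_cost(seconds: float, resolution: str = "720P", audio: bool = True) -> float:
    """Estimated cost of one generation; `audio` maps to the audio=true/false parameter."""
    return round(seconds * FLASH_RATES[(audio, resolution)], 6)
```

For example, a 10-second 720P clip with audio costs $0.50, and disabling audio drops it to $0.25.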

US

In US deployment mode, the access point and data storage are located in the US (Virginia) region, and inference computing resources are limited to the US.

Model

Description

Unit price

Free quota

wan2.6-i2v-us Recommended

Wan 2.6. Introduces a multi-shot narrative feature and supports automatic voiceover and the import of custom audio files.

720P: $0.10/second

1080P: $0.15/second

No free quota

Mainland China

In Mainland China deployment mode, the access point and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Model

Description

Unit price

Free quota

wan2.6-i2v-flash Recommended

Wan 2.6. Introduces a multi-shot narrative feature and supports automatic voiceover and the import of custom audio files.

Outputs video with audio (audio=true):

  • 720P: $0.043006/second

  • 1080P: $0.071676/second

Outputs video without audio (audio=false):

  • 720P: $0.021503/second

  • 1080P: $0.035838/second

No free quota

wan2.6-i2v Recommended

Wan 2.6. Introduces a multi-shot narrative feature and supports automatic voiceover and the import of custom audio files.

720P: $0.086012/second

1080P: $0.143353/second

No free quota

wan2.5-i2v-preview

Wan 2.5 preview. Supports automatic dubbing and custom audio file uploads.

480P: $0.043006/second

720P: $0.086012/second

1080P: $0.143353/second

No free quota

wan2.2-i2v-plus

Wan 2.2 Professional Edition. Delivers significant improvements in visual detail and motion stability.

480P: $0.02007/second

1080P: $0.100347/second

No free quota

wanx2.1-i2v-turbo

Wan 2.1 Turbo Edition. Fast generation speed with balanced performance.

$0.034405/second

No free quota

wanx2.1-i2v-plus

Wan 2.1 Professional Edition. Generates rich details and produces higher-quality, more textured visuals.

$0.100347/second

No free quota

Input first frame image and audio

Output video (wan2.6, multi-shot video)

Input audio:

Input prompt: A scene of urban fantasy art. A dynamic graffiti art character. A boy made of spray paint comes to life from a concrete wall. He raps an English song at high speed while striking a classic, energetic rapper pose. The scene is set under an urban railway bridge at night. The lighting comes from a single street lamp, creating a cinematic atmosphere full of high energy and amazing detail. The audio of the video consists entirely of his rap, with no other dialogue or noise.

Image-to-video - first and last frames

The Wan first-and-last-frame video model generates a smooth, dynamic video from a prompt. You only need to provide the first and last frame images. The videos feature rich artistic styles and cinematic quality. API reference | Try it online

International

In International deployment mode, the access point and data storage are located in the Singapore region, and inference computing resources are dynamically scheduled globally (excluding Mainland China).

Model

Unit price

Free quota (Note)

Validity: 90 days after you activate Model Studio

wan2.2-kf2v-flash

480P: $0.015/second

720P: $0.036/second

1080P: $0.07/second

50 seconds

wan2.1-kf2v-plus

$0.10/second

200 seconds

Mainland China

In Mainland China deployment mode, the access point and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Model

Unit price

Free quota (Note)

wan2.2-kf2v-flash

480P: $0.014335/second

720P: $0.028671/second

1080P: $0.068809/second

No free quota

wanx2.1-kf2v-plus

$0.100347/second

No free quota

Example input

Output video

First frame

Last frame

Prompt

In a realistic style, the camera starts at eye level on a small black cat looking up at the sky, then gradually moves upward to a top-down shot that focuses on the cat's curious eyes.

Reference-to-video

The Wan reference-to-video model uses a character's appearance and voice from an input video and a prompt to generate a new video that maintains character consistency. API reference

Billing rule: Both input and output videos are billed by the second. Failed jobs are not billed and do not consume the free quota.

  • The billable duration of the input video does not exceed 5 seconds. For more information, see Billing and rate limits.

  • The billable duration of the output video is the duration in seconds of the successfully generated video.
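The two rules above combine into a simple cost estimate. The sketch below uses the International wan2.6-r2v rates from the table in this section; the function is illustrative, not an official calculator:

```python
INPUT_CAP_SECONDS = 5  # the input video is billed for at most 5 seconds

# USD per second for wan2.6-r2v, International deployment (input and output
# happen to share the same rate at each resolution).
R2V_RATES = {
    "720P": {"input": 0.10, "output": 0.10},
    "1080P": {"input": 0.15, "output": 0.15},
}


def r2v_cost(input_seconds: float, output_seconds: float, resolution: str = "720P") -> float:
    """Estimated cost of one successful job; failed jobs are not billed."""
    rate = R2V_RATES[resolution]
    billable_input = min(input_seconds, INPUT_CAP_SECONDS)
    return round(billable_input * rate["input"] + output_seconds * rate["output"], 6)
```

So an 8-second reference video producing a 10-second 720P output bills 5 input seconds plus 10 output seconds, or $1.50.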

Global

In Global deployment mode, the access point and data storage are located in the US (Virginia) region, and inference computing resources are dynamically scheduled globally.

Model

Input price

Output price

Free quota (Note)

wan2.6-r2v

720P: $0.086012/second

1080P: $0.143353/second

720P: $0.10/second

1080P: $0.15/second

No free quota

International

In International deployment mode, the access point and data storage are located in the Singapore region, and inference computing resources are dynamically scheduled globally (excluding Mainland China).

Model

Input price

Output price

Free quota (Note)

wan2.6-r2v

720P: $0.10/second

1080P: $0.15/second

720P: $0.10/second

1080P: $0.15/second

50 seconds

Validity: 90 days after you activate Model Studio

Mainland China

In Mainland China deployment mode, the access point and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Model

Input price

Output price

Free quota (Note)

wan2.6-r2v

720P: $0.086012/second

1080P: $0.143353/second

720P: $0.086012/second

1080P: $0.143353/second

No free quota

General video editing

The Wan general video editing model supports multimodal inputs, including text, images, and videos. It can perform video generation and general editing tasks. API reference | Try it online

International

In International deployment mode, the access point and data storage are located in the Singapore region, and inference computing resources are dynamically scheduled globally (excluding Mainland China).

Model

Unit price

Free quota (Note)

wan2.1-vace-plus

$0.10/s

50 seconds

Valid for 90 days after you activate Model Studio.

Mainland China

In Mainland China deployment mode, the access point and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Model

Unit price

Free quota (Note)

wanx2.1-vace-plus

$0.100347/s

No free quota

The general video editing model supports the following features:

Feature

Input reference image

Input prompt

Output video

Multi-image reference

Reference image 1 (reference entity)

Reference image 2 (reference background)

In the video, a girl gracefully walks out from a misty, ancient forest. Her steps are light, and the camera captures her every nimble movement. When the girl stops and looks around at the lush woods, a smile of surprise and joy blossoms on her face. This scene, frozen in a moment of interplay between light and shadow, records her wonderful encounter with nature.

Output video

Video repainting

The video shows a black steampunk-style car driven by a gentleman. The car is decorated with gears and copper pipes. The background features a steam-powered candy factory and retro elements, creating a vintage and playful scene.

Local editing

Input video

Input mask image (The white area indicates the editing area)

The video shows a Parisian-style French cafe where a lion in a suit is elegantly sipping coffee. It holds a coffee cup in one hand, taking a gentle sip with a relaxed expression. The cafe is tastefully decorated, with soft hues and warm lighting illuminating the area where the lion is.

The content in the editing area is modified based on the prompt.

Video extension

Input first clip (1 second)

A dog wearing sunglasses is skateboarding on the street, 3D cartoon.

Output extended video (5 seconds)

Video outpainting

An elegant lady is passionately playing the violin, with a full symphony orchestra behind her.

Wan - digital human

This feature generates natural-looking videos of people speaking, singing, or performing, based on a single character image and an audio file. To use this feature, you can call the following models in sequence. wan2.2-s2v image detection | wan2.2-s2v video generation

Note

Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Model

Description

Unit price

wan2.2-s2v-detect

Checks whether an input image meets requirements, such as sufficient clarity, a single person, and a frontal view.

$0.000574/image

wan2.2-s2v

Generates a dynamic video of a person from a valid image and an audio clip.

480P: $0.071677/second

720P: $0.129018/second

Sample input

Output video

Input audio:

Wan - animate image

Available in standard and professional modes. The model transfers the actions and expressions from a reference video to a character image, generating a video that animates the character from the image. API reference.

International

In International deployment mode, the access point and data storage are located in the Singapore region, and inference computing resources are dynamically scheduled globally (excluding Mainland China).

Model

Service

Description

Unit price

Free quota (View)

wan2.2-animate-move

Standard mode (wan-std)

A cost-effective service with fast generation speeds. Suitable for basic needs, such as simple animation demos.

$0.12/second

The combined free quota for both modes is 50 seconds.

Professional mode (wan-pro)

Delivers high animation smoothness and natural transitions for actions and expressions. The output resembles a live-action video.

$0.18/second

Mainland China

In Mainland China deployment mode, the access point and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Model

Service

Description

Unit price

Free quota (View)

wan2.2-animate-move

Standard mode (wan-std)

Fast generation. Ideal for basic needs, such as simple animation demos. Cost-effective.

$0.06/second

No free quota

Professional mode (wan-pro)

Provides high-quality, smooth animation with natural transitions for actions and expressions. The output is similar to a live-action video.

$0.09/second

Character image

Reference video

Standard output video

Professional output video

Wan - video character swap

Available in standard and professional modes. The model replaces the main character in a video with a character from an image. It preserves the original video's scene, lighting, and hue. API reference.

International

In International deployment mode, the access point and data storage are located in the Singapore region, and inference computing resources are dynamically scheduled globally (excluding Mainland China).

Model

Service

Description

Unit price

Free quota (View)

wan2.2-animate-mix

Standard mode (wan-std)

Generates animations quickly. Ideal for basic requirements, such as simple demos. Highly cost-effective.

$0.18/s

The combined free quota for both modes is 50 seconds.

Professional mode (wan-pro)

Produces highly smooth animations with natural transitions for actions and expressions. The result closely resembles a live-action video.

$0.26/s

Mainland China

In Mainland China deployment mode, the access point and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Model

Service

Description

Unit price

Free quota (View)

wan2.2-animate-mix

Standard mode (wan-std)

Generates animations quickly. Ideal for basic requirements, such as simple demos. Highly cost-effective.

$0.09/s

No free quota

Professional mode (wan-pro)

Produces highly smooth animations with natural transitions for actions and expressions. The result closely resembles a live-action video.

$0.13/s

Character image

Reference video

Standard output video

Professional output video

AnimateAnyone

This feature generates character motion videos based on a character image and a motion template. To use this feature, you can call the following three models in sequence. AnimateAnyone image detection | AnimateAnyone motion template generation | AnimateAnyone video generation

Note

Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Model

Description

Unit price

animate-anyone-detect-gen2

Detects whether an input image meets the requirements.

$0.000574/image

animate-anyone-template-gen2

Extracts character motion from a video and generates a motion template.

$0.011469/second

animate-anyone-gen2

Generates a character motion video from a character image and a motion template.

Input: Character image

Input: Motion video

Outputs (generated from the image background)

Outputs (generated from the video background)

Note
  • The preceding example was generated by the Tongyi App, which integrates AnimateAnyone.

  • The content generated by the AnimateAnyone model is video only and does not include audio.

EMO

This feature generates dynamic portrait videos based on a portrait image and a human voice audio file. To use this feature, you can call the following models in sequence. EMO image detection | EMO video generation

Note

Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Model

Description

Unit price

emo-detect-v1

Detects whether an input image meets the required specifications. This model can be called directly without deployment.

$0.000574/image

emo-v1

Generates a dynamic portrait video. This model can be called directly without deployment.

  • 1:1 aspect ratio video: $0.011469/second

  • 3:4 aspect ratio video: $0.022937/second

Input: Portrait image and human voice audio file

Output: Dynamic portrait video

Portrait:

Human voice audio: See the video on the right.

Character video:

Style level: active ("style_level": "active")

LivePortrait

This model quickly and efficiently generates dynamic portrait videos based on a portrait image and a human voice audio file. Compared with the EMO model, it generates videos faster and at a lower cost, but with lower output quality. To use this feature, you can call the following two models in sequence. LivePortrait image detection | LivePortrait video generation

Note

Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Model

Description

Unit price

liveportrait-detect

Detects whether an input image meets the requirements.

$0.000574/image

liveportrait

Generates a dynamic portrait video.

$0.002868/second

Input: Portrait image and voice audio

Output: Animated portrait video

Portrait image:

Voice audio: Sourced from the video on the right.

Portrait video:

Emoji

This feature generates dynamic face videos based on a face image and preset facial motion templates. This capability can be used for scenarios such as creating emojis and generating video materials. To use this feature, you can call the following models in sequence. Emoji image detection | Emoji video generation

Note

Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Model

Description

Unit price

emoji-detect-v1

Detects whether an input image meets specified requirements.

$0.000574/image

emoji-v1

Generates custom emojis based on a portrait image and a specified emoji template.

$0.011469/second

Input: Portrait image

Output: Dynamic portrait video

Parameter for the "Happy" emoji template: ("input.driven_id": "mengwa_kaixin")

VideoRetalk

This feature generates a video where the character's lip movements match the input audio, based on a character video and a human voice audio file. To use this feature, you can call the following model. API reference

Note

Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Model

Description

Unit price

videoretalk

Synchronizes a character's lip movements with input audio to generate a new video.

$0.011469/second

Video style transform

This model generates videos in different styles that match the semantic description of user-input text, or restyles a user-input video. API reference

Note

Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Model

Description

Unit price

video-style-transform

Transforms an input video into styles such as Japanese comic and American comic.

720P: $0.071677/second

540P: $0.028671/second

Input video

Output video (Manga style)

Speech synthesis (text-to-speech)

Qwen speech synthesis

This feature supports multilingual mixed-text input and provides streaming audio output. Usage | API reference

International

In the international deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled globally, excluding Mainland China.

Model

Version

Price

Maximum input characters

Supported languages

Free quota (Note)

qwen3-tts-flash

Same capabilities as qwen3-tts-flash-2025-09-18.

Stable

$0.10 per 10,000 characters

600

Chinese (Mandarin, Beijing, Shanghai, Sichuan, Nanjing, Shaanxi, Minnan, Tianjin, Cantonese), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese

2,000 characters if you activate Model Studio before 00:00 on November 13, 2025.

10,000 characters if you activate Model Studio on or after 00:00 on November 13, 2025.

Valid for 90 days after you activate Model Studio.

qwen3-tts-flash-2025-11-27

Snapshot

10,000 characters

Valid for 90 days after you activate Model Studio.

qwen3-tts-flash-2025-09-18

Snapshot

2,000 characters if you activate Model Studio before 00:00 on November 13, 2025.

10,000 characters if you activate Model Studio on or after 00:00 on November 13, 2025.

Valid for 90 days after you activate Model Studio.

Billing is based on the number of input characters. The calculation rules are as follows:

  • Each Chinese character, including simplified and traditional Chinese, Japanese Kanji, and Korean Hanja, counts as 2 characters.

  • Any other character, such as an English letter, a punctuation mark, or a space, counts as 1 character.
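A minimal sketch of the counting rule above. The Unicode ranges used here to detect Han ideographs (covering simplified and traditional Chinese, Japanese Kanji, and Korean Hanja) are an approximation for illustration, not the service's exact definition; Kana and Hangul fall outside them and therefore count as 1, matching the rule.

```python
def billable_chars(text: str) -> int:
    """Count characters for TTS billing: Han ideographs count as 2, all else as 1."""
    total = 0
    for ch in text:
        cp = ord(ch)
        # CJK Unified Ideographs, Extension A, and Compatibility Ideographs.
        is_ideograph = (
            0x4E00 <= cp <= 0x9FFF
            or 0x3400 <= cp <= 0x4DBF
            or 0xF900 <= cp <= 0xFAFF
        )
        total += 2 if is_ideograph else 1
    return total
```

For example, "你好, world!" bills as 12 characters: 2 + 2 for the two ideographs, plus 8 for the punctuation, space, and Latin letters.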

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are located in the China (Beijing) region, and inference computing resources are restricted to Mainland China.

Qwen3-TTS-Flash

Model

Version

Price

Max input characters

Supported languages

Free quota (Note)

qwen3-tts-flash

Same capabilities as qwen3-tts-flash-2025-09-18.

Stable

$0.114682/10,000 characters

600

Chinese (Mandarin, Beijing, Shanghai, Sichuan, Nanjing, Shaanxi, Minnan, Tianjin, Cantonese), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese

No free quota

qwen3-tts-flash-2025-11-27

Snapshot

qwen3-tts-flash-2025-09-18

Snapshot

Billing is based on the number of input characters. The calculation rules are as follows:

  • Each Chinese character, including simplified and traditional Chinese, Japanese Kanji, and Korean Hanja, counts as 2 characters.

  • Any other character, such as an English letter, a punctuation mark, or a space, counts as 1 character.

Qwen-TTS

Model

Version

Context window (tokens)

Max input (tokens)

Max output (tokens)

Input cost (per 1,000 tokens)

Output cost (per 1,000 tokens)

Free quota (Note)

qwen-tts

This model has the same capabilities as qwen-tts-2025-04-10.

Stable

8,192

512

7,680

$0.230

$1.434

No free quota

qwen-tts-latest

This model always has the same capabilities as the latest snapshot version.

Latest

qwen-tts-2025-05-22

Snapshot

qwen-tts-2025-04-10

Snapshot

Audio is converted to tokens at a rate of 50 tokens per second. Audio clips shorter than 1 second are billed as 50 tokens.
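The conversion above, sketched as arithmetic. How fractional seconds beyond 1 second round is not stated here, so the ceiling used below is an assumption; the $1.434 output rate comes from the qwen-tts row above.

```python
import math

TOKENS_PER_SECOND = 50
MIN_BILLABLE_TOKENS = 50  # clips shorter than 1 second bill as 50 tokens


def audio_tokens(duration_seconds: float) -> int:
    """Convert an audio duration to billable tokens (rounding up is an assumption)."""
    tokens = math.ceil(duration_seconds * TOKENS_PER_SECOND)
    return max(tokens, MIN_BILLABLE_TOKENS)


def qwen_tts_output_cost(duration_seconds: float) -> float:
    """Output cost in USD at qwen-tts's $1.434 per 1,000 tokens."""
    return audio_tokens(duration_seconds) / 1000 * 1.434
```

A 10-second clip is 500 tokens, so the output side of one qwen-tts call costs about $0.717.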

Qwen real-time speech synthesis

This feature supports streaming text input and streaming audio output. It automatically adjusts the speech rate based on the text content and punctuation. Usage | API reference

Qwen3-TTS-VD-Realtime supports real-time speech synthesis with voice design voices but does not support default voices.

Qwen3-TTS-VC-Realtime supports real-time speech synthesis with cloned voices but does not support default voices.

Qwen3-TTS-Flash-Realtime and Qwen-TTS-Realtime support only default voices and do not support cloned or designed voices.

International

In the international deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled globally, excluding Mainland China.

Qwen3-TTS-VD-Realtime

Model

Version

Price

Supported languages

Free quota (Note)

qwen3-tts-vd-realtime-2025-12-16

Snapshot

$0.143353/10,000 characters

Chinese, English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese

10,000 characters

Valid for 90 days after you activate Model Studio.

Billing is based on the number of input characters. The calculation rules are as follows:

  • Each Chinese character, including simplified and traditional Chinese, Japanese Kanji, and Korean Hanja, counts as 2 characters.

  • Any other character, such as an English letter, a punctuation mark, or a space, counts as 1 character.

Qwen3-TTS-VC-Realtime

Model

Version

Price

Supported languages

Free quota (Note)

qwen3-tts-vc-realtime-2025-11-27

Snapshot

$0.13/10,000 characters

Chinese, English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese

10,000 characters

Valid for 90 days after you activate Model Studio.

Billing is based on the number of input characters. The calculation rules are as follows:

  • Each Chinese character, including simplified and traditional Chinese, Japanese Kanji, and Korean Hanja, counts as 2 characters.

  • Any other character, such as an English letter, a punctuation mark, or a space, counts as 1 character.

Qwen3-TTS-Flash-Realtime

Model

Version

Price

Supported languages

Free quota (Note)

qwen3-tts-flash-realtime

This model has the same capabilities as qwen3-tts-flash-realtime-2025-09-18.

Stable

$0.13/10,000 characters

Chinese (Mandarin, Beijing, Shanghai, Sichuan, Nanjing, Shaanxi, Minnan, Tianjin, Cantonese), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese

If you activate Model Studio before 00:00 on November 13, 2025: 2,000 characters

If you activate Model Studio on or after 00:00 on November 13, 2025: 10,000 characters

Valid for 90 days after you activate Model Studio

qwen3-tts-flash-realtime-2025-11-27

Snapshot

10,000 characters

Valid for 90 days after you activate Model Studio

qwen3-tts-flash-realtime-2025-09-18

Snapshot

If you activate Model Studio before 00:00 on November 13, 2025: 2,000 characters

If you activate Model Studio on or after 00:00 on November 13, 2025: 10,000 characters

Valid for 90 days after you activate Model Studio

Billing is based on the number of input characters. The calculation rules are as follows:

  • Each Chinese character, including simplified and traditional Chinese, Japanese Kanji, and Korean Hanja, counts as 2 characters.

  • Any other character, such as an English letter, a punctuation mark, or a space, counts as 1 character.

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are located in the China (Beijing) region, and inference computing resources are restricted to Mainland China.

Qwen3-TTS-VD-Realtime

Model

Version

Price

Supported languages

Free quota (Note)

qwen3-tts-vd-realtime-2025-12-16

Snapshot

$0.143353 per 10,000 characters

Chinese, English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese

No free quota

Billing is based on the number of input characters. The calculation rules are as follows:

  • Each Chinese character, including simplified and traditional Chinese, Japanese Kanji, and Korean Hanja, counts as 2 characters.

  • Any other character, such as an English letter, a punctuation mark, or a space, counts as 1 character.

Qwen3-TTS-VC-Realtime

Model

Version

Price

Supported languages

Free quota (Note)

qwen3-tts-vc-realtime-2025-11-27

Snapshot

$0.143353/10,000 characters

Chinese, English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese

No free quota

Billing is based on the number of input characters. The calculation rules are as follows:

  • Each Chinese character, including simplified and traditional Chinese, Japanese Kanji, and Korean Hanja, counts as 2 characters.

  • Any other character, such as an English letter, a punctuation mark, or a space, counts as 1 character.

Qwen3-TTS-Flash-Realtime

Model

Version

Price

Supported languages

Free quota (Note)

qwen3-tts-flash-realtime

Functionally identical to qwen3-tts-flash-realtime-2025-09-18.

Stable

$0.143353/10,000 characters

Chinese (Mandarin, Beijing, Shanghai, Sichuan, Nanjing, Shaanxi, Minnan, Tianjin, Cantonese), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese

No free quota

qwen3-tts-flash-realtime-2025-11-27

Snapshot

qwen3-tts-flash-realtime-2025-09-18

Snapshot

Billing is based on the number of input characters. The calculation rules are as follows:

  • Each Chinese character, including simplified and traditional Chinese, Japanese Kanji, and Korean Hanja, counts as 2 characters.

  • Any other character, such as an English letter, a punctuation mark, or a space, counts as 1 character.

Qwen-TTS-Realtime

Model

Version

Context window (tokens)

Max input (tokens)

Max output (tokens)

Input cost (per 1,000 tokens)

Output cost (per 1,000 tokens)

Supported languages

Free quota (Note)

qwen-tts-realtime

This model has the same capabilities as qwen-tts-realtime-2025-07-15.

Stable

8,192

512

7,680

$0.345

$1.721

Chinese, English

No free quota

qwen-tts-realtime-latest

This model has the same capabilities as qwen-tts-realtime-2025-07-15.

Latest

Chinese, English

qwen-tts-realtime-2025-07-15

Snapshot

Chinese, English

Audio is converted to tokens at a rate of 50 tokens per second. Audio clips shorter than 1 second are billed as 50 tokens.

Qwen voice cloning

Voice cloning uses model-based feature extraction to clone a voice without any training. You can provide as little as 10 to 20 seconds of audio to generate a highly similar, natural-sounding custom voice. Usage | API reference

International

In the international deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled globally, excluding Mainland China.

Model

Price

Free quota (Note)

qwen-voice-enrollment

$0.01/voice

1,000 voices

Valid for 90 days after you activate Model Studio

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are located in the China (Beijing) region, and inference computing resources are restricted to Mainland China.

| Model | Price | Free quota (Note) |
| --- | --- | --- |
| qwen-voice-enrollment | $0.01/voice | No free quota |

Qwen voice design

Voice design generates custom voices from text descriptions. It supports multilingual and multi-dimensional voice feature definitions. This feature is suitable for various applications, such as ad dubbing, character creation, and audio content creation. Usage | API reference

International

In the international deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled globally, excluding Mainland China.

| Model | Price | Free quota (Note) |
| --- | --- | --- |
| qwen-voice-design | $0.20/voice | 10 voices, valid for 90 days after you activate Model Studio |

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are located in the China (Beijing) region, and inference computing resources are restricted to Mainland China.

| Model | Price | Free quota (Note) |
| --- | --- | --- |
| qwen-voice-design | $0.20/voice | No free quota |

CosyVoice speech synthesis

CosyVoice is a next-generation generative speech synthesis model from Tongyi Lab. Built on large-scale pre-trained language models, CosyVoice deeply integrates text understanding with speech generation and supports real-time, streaming text-to-speech synthesis. Usage | API reference

Note

Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

| Model | Price | Free quota (Note) |
| --- | --- | --- |
| cosyvoice-v3-plus | $0.286706 per 10,000 characters | No free quota |
| cosyvoice-v3-flash | $0.14335 per 10,000 characters | No free quota |
| cosyvoice-v2 | $0.286706 per 10,000 characters | No free quota |

Characters are calculated as follows:

  • Each Chinese character, including simplified and traditional Chinese, Japanese Kanji, and Korean Hanja, counts as 2 characters.

  • Any other character, such as letters, numbers, Japanese Kana, and Korean Hangul, counts as 1 character.

  • Content within SSML tags is not billed.
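A minimal sketch of this counting, including the SSML exclusion. Stripping tags with a regex and using only the basic CJK Unified Ideographs range are simplifications for illustration, not the service's exact logic.

```python
import re

def cosyvoice_billable_chars(text: str) -> int:
    """Count billable characters for CosyVoice synthesis: CJK ideographs
    count as 2, all other characters (letters, digits, Kana, Hangul)
    count as 1, and the SSML tags themselves are not billed."""
    visible = re.sub(r"<[^>]+>", "", text)  # drop tags, keep their content
    return sum(2 if 0x4E00 <= ord(c) <= 0x9FFF else 1 for c in visible)

print(cosyvoice_billable_chars('<speak>你好<break time="1s"/>world</speak>'))  # 2×2 + 5 → 9
```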

Speech recognition (speech-to-text) and translation (speech-to-translation)

Qwen3-LiveTranslate-Flash

Qwen3-LiveTranslate-Flash-Realtime

Qwen3-LiveTranslate-Flash-Realtime is a multilingual model for real-time audio and video translation. It recognizes 18 languages and provides real-time audio translations in 10 languages.

Core features:

  • Multilingual support: Supports 18 languages and 6 Chinese dialects, including Chinese, English, French, German, Russian, Japanese, and Korean. It also supports dialects such as Mandarin, Cantonese, and Sichuanese.

  • Vision enhancement: Uses visual content to improve translation accuracy. The model analyzes visual cues, such as lip movements, actions, and on-screen text, to enhance translation accuracy in noisy environments or when speech is ambiguous.

  • 3-second latency: Achieves a simultaneous interpretation latency of as low as 3 seconds.

  • Lossless simultaneous interpretation: Resolves cross-lingual word order issues using semantic unit prediction technology. The quality of real-time translation is comparable to that of offline translation.

  • Natural voice: Generates speech with a natural, human-like voice. The model automatically adjusts its tone and emotion based on the source audio content.

Usage | API reference

International

In the international deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled globally, excluding Mainland China.

| Model | Version | Context window (tokens) | Max input (tokens) | Max output (tokens) | Free quota (Note) |
| --- | --- | --- | --- | --- | --- |
| qwen3-livetranslate-flash-realtime (same capabilities as qwen3-livetranslate-flash-realtime-2025-09-22) | Stable | 53,248 | 49,152 | 4,096 | 1 million tokens for each version, valid for 90 days after you activate Model Studio |
| qwen3-livetranslate-flash-realtime-2025-09-22 | Snapshot | 53,248 | 49,152 | 4,096 | 1 million tokens for each version, valid for 90 days after you activate Model Studio |

After the free quota is used, input and output are billed as follows:

| Billing item | Price (per million tokens) |
| --- | --- |
| Input: audio | $10 |
| Input: image | $1.3 |
| Output: text | $10 |
| Output: audio | $38 |

Token calculation:

  • Audio: Each second of input or output audio consumes 12.5 tokens.

  • Image: Each 28 × 28 pixel input consumes 0.5 tokens.
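The token rules and prices above combine into a rough cost estimate. The function name and the assumption that fractional token counts need no rounding are illustrative; prices are the international rates listed above.

```python
def estimate_cost(in_audio_s: float, image_patches: int,
                  out_text_tokens: int, out_audio_s: float) -> float:
    """Rough cost estimate (USD) at the international prices above:
    audio input $10/M tokens, image input $1.3/M, text output $10/M,
    audio output $38/M. Audio is 12.5 tokens per second; each
    28 x 28 pixel patch is 0.5 tokens."""
    audio_in_tokens = in_audio_s * 12.5
    image_in_tokens = image_patches * 0.5
    audio_out_tokens = out_audio_s * 12.5
    return (audio_in_tokens * 10 + image_in_tokens * 1.3 +
            out_text_tokens * 10 + audio_out_tokens * 38) / 1_000_000

# 60 s of input audio, 200 text tokens and 60 s of audio out:
print(round(estimate_cost(60, 0, 200, 60), 4))  # → 0.038
```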

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are located in the China (Beijing) region, and inference computing resources are restricted to Mainland China.

| Model | Version | Context window (tokens) | Max input (tokens) | Max output (tokens) | Free quota (Note) |
| --- | --- | --- | --- | --- | --- |
| qwen3-livetranslate-flash-realtime (same capabilities as qwen3-livetranslate-flash-realtime-2025-09-22) | Stable | 53,248 | 49,152 | 4,096 | No free quota |
| qwen3-livetranslate-flash-realtime-2025-09-22 | Snapshot | 53,248 | 49,152 | 4,096 | No free quota |

Billing for input and output is calculated as follows:

| Billing item | Price (per million tokens) |
| --- | --- |
| Input: audio | $9.175 |
| Input: image | $1.147 |
| Output: text | $9.175 |
| Output: audio | $34.405 |

Token calculation:

  • Audio: Each second of input or output audio consumes 12.5 tokens.

  • Image: Each 28 × 28 pixel input consumes 0.5 tokens.

Qwen audio file recognition

Based on the Qwen multimodal foundation model, this feature supports multilingual recognition, singing recognition, and noise rejection. Usage | API reference

International

In the international deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled globally, excluding Mainland China.

Qwen3-ASR-Flash-Filetrans

| Model | Version | Supported languages | Supported sample rates | Price | Free quota (Note) |
| --- | --- | --- | --- | --- | --- |
| qwen3-asr-flash-filetrans (same capabilities as qwen3-asr-flash-filetrans-2025-11-17) | Stable | Chinese (Mandarin, Sichuanese, Minnan, Wu, and Cantonese), English, Japanese, German, Korean, Russian, French, Portuguese, Arabic, Italian, Spanish, Hindi, Indonesian, Thai, Turkish, Ukrainian, Vietnamese, Czech, Danish, Filipino, Finnish, Icelandic, Malay, Norwegian, Polish, and Swedish | Any | $0.000035/second | 36,000 seconds (10 hours), valid for 90 days after you activate Model Studio |
| qwen3-asr-flash-filetrans-2025-11-17 | Snapshot | Same as above | Any | $0.000035/second | Same as above |

Qwen3-ASR-Flash

| Model | Version | Supported languages | Supported sample rates | Price | Free quota (Note) |
| --- | --- | --- | --- | --- | --- |
| qwen3-asr-flash (same capabilities as qwen3-asr-flash-2025-09-08) | Stable | Chinese (Mandarin, Sichuanese, Minnan, Wu, and Cantonese), English, Japanese, German, Korean, Russian, French, Portuguese, Arabic, Italian, Spanish, Hindi, Indonesian, Thai, Turkish, Ukrainian, Vietnamese, Czech, Danish, Filipino, Finnish, Icelandic, Malay, Norwegian, Polish, and Swedish | Any | $0.000035/second | 36,000 seconds (10 hours), valid for 90 days after you activate Model Studio |
| qwen3-asr-flash-2025-09-08 | Snapshot | Same as above | Any | $0.000035/second | Same as above |

US

In the US deployment mode, the endpoint and data storage are located in the US (Virginia) region, and inference computing resources are restricted to the United States.

| Model | Version | Supported languages | Supported sample rates | Price | Free quota (Note) |
| --- | --- | --- | --- | --- | --- |
| qwen3-asr-flash-us (same capabilities as qwen3-asr-flash-2025-09-08-us) | Stable | Chinese (Mandarin, Sichuanese, Minnan, Wu, and Cantonese), English, Japanese, German, Korean, Russian, French, Portuguese, Arabic, Italian, Spanish, Hindi, Indonesian, Thai, Turkish, Ukrainian, Vietnamese, Czech, Danish, Filipino, Finnish, Icelandic, Malay, Norwegian, Polish, and Swedish | Any | $0.000035/second | No free quota |
| qwen3-asr-flash-2025-09-08-us | Snapshot | Same as above | Any | $0.000035/second | No free quota |

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are located in the China (Beijing) region, and inference computing resources are restricted to Mainland China.

Qwen3-ASR-Flash-Filetrans

| Model | Version | Supported languages | Supported sample rates | Price | Free quota (Note) |
| --- | --- | --- | --- | --- | --- |
| qwen3-asr-flash-filetrans (same capabilities as qwen3-asr-flash-filetrans-2025-11-17) | Stable | Chinese (Mandarin, Sichuanese, Minnan, Wu, and Cantonese), English, Japanese, German, Korean, Russian, French, Portuguese, Arabic, Italian, Spanish, Hindi, Indonesian, Thai, Turkish, Ukrainian, Vietnamese, Czech, Danish, Filipino, Finnish, Icelandic, Malay, Norwegian, Polish, and Swedish | Any | $0.000032/second | No free quota |
| qwen3-asr-flash-filetrans-2025-11-17 | Snapshot | Same as above | Any | $0.000032/second | No free quota |

Qwen3-ASR-Flash

| Model | Version | Supported languages | Supported sample rates | Price | Free quota (Note) |
| --- | --- | --- | --- | --- | --- |
| qwen3-asr-flash (same capabilities as qwen3-asr-flash-2025-09-08) | Stable | Chinese (Mandarin, Sichuanese, Minnan, Wu, and Cantonese), English, Japanese, German, Korean, Russian, French, Portuguese, Arabic, Italian, Spanish, Hindi, Indonesian, Thai, Turkish, Ukrainian, Vietnamese, Czech, Danish, Filipino, Finnish, Icelandic, Malay, Norwegian, Polish, and Swedish | Any | $0.000032/second | No free quota |
| qwen3-asr-flash-2025-09-08 | Snapshot | Same as above | Any | $0.000032/second | No free quota |

Qwen real-time speech recognition

The Qwen real-time speech recognition model automatically detects the spoken language across 11 languages and accurately transcribes audio in complex acoustic environments. Usage | API reference

International

In the international deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled globally, excluding Mainland China.

| Model | Version | Supported languages | Supported sample rates | Price | Free quota (Note) |
| --- | --- | --- | --- | --- | --- |
| qwen3-asr-flash-realtime (same capabilities as qwen3-asr-flash-realtime-2025-10-27) | Stable | Chinese (Mandarin, Sichuanese, Minnan, Wu, and Cantonese), English, Japanese, German, Korean, Russian, French, Portuguese, Arabic, Italian, Spanish, Hindi, Indonesian, Thai, Turkish, Ukrainian, Vietnamese, Czech, Danish, Filipino, Finnish, Icelandic, Malay, Norwegian, Polish, and Swedish | 8 kHz, 16 kHz | $0.00009/second | 36,000 seconds (10 hours), valid for 90 days after you activate Model Studio |
| qwen3-asr-flash-realtime-2025-10-27 | Snapshot | Same as above | 8 kHz, 16 kHz | $0.00009/second | Same as above |

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are located in the China (Beijing) region, and inference computing resources are restricted to Mainland China.

| Model | Version | Supported languages | Supported sample rates | Price | Free quota (Note) |
| --- | --- | --- | --- | --- | --- |
| qwen3-asr-flash-realtime (same capabilities as qwen3-asr-flash-realtime-2025-10-27) | Stable | Chinese (Mandarin, Sichuanese, Minnan, Wu, and Cantonese), English, Japanese, German, Korean, Russian, French, Portuguese, Arabic, Italian, Spanish, Hindi, Indonesian, Thai, Turkish, Ukrainian, Vietnamese, Czech, Danish, Filipino, Finnish, Icelandic, Malay, Norwegian, Polish, and Swedish | 8 kHz, 16 kHz | $0.000047/second | No free quota |
| qwen3-asr-flash-realtime-2025-10-27 | Snapshot | Same as above | 8 kHz, 16 kHz | $0.000047/second | No free quota |

Paraformer speech recognition

Paraformer is a speech recognition model from Tongyi Lab. It is available in two versions: audio file recognition and real-time speech recognition.

Audio file recognition

Usage | API reference

Note

Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

| Model | Supported languages | Supported sample rates | Scenarios | Supported audio formats | Price | Free quota (Note) |
| --- | --- | --- | --- | --- | --- | --- |
| paraformer-v2 | Chinese (Mandarin, Cantonese, Wu, Minnan, Northeastern, Gansu, Guizhou, Henan, Hubei, Hunan, Ningxia, Shanxi, Shaanxi, Shandong, Sichuan, Tianjin, Jiangxi, Yunnan, Shanghai), English, Japanese, Korean, German, French, Russian | Any | Live stream | aac, amr, avi, flac, flv, m4a, mkv, mov, mp3, mp4, mpeg, ogg, opus, wav, webm, wma, wmv | $0.000012/second | No free quota |
| paraformer-8k-v2 | Chinese (Mandarin) | 8 kHz | Phone calls | Same as above | $0.000012/second | No free quota |

Real-time speech recognition

Usage | API reference

Note

Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

| Model | Supported languages | Supported sample rates | Scenarios | Supported audio formats | Price | Free quota (Note) |
| --- | --- | --- | --- | --- | --- | --- |
| paraformer-realtime-v2 | Chinese (Mandarin, Cantonese, Wu, Minnan, Northeastern, Gansu, Guizhou, Henan, Hubei, Hunan, Ningxia, Shanxi, Shaanxi, Shandong, Sichuan, Tianjin, Jiangxi, Yunnan, and Shanghai), English, Japanese, Korean, German, French, and Russian. Supports switching between languages. | Any | Live video streaming and conferences | pcm, wav, mp3, opus, speex, aac, amr | $0.000035/second | No free quota |
| paraformer-realtime-8k-v2 | Chinese (Mandarin) | 8 kHz | Call centers and more | Same as above | $0.000035/second | No free quota |

Fun-ASR speech recognition

Fun-ASR is a speech recognition model from the Tongyi Fun series. It is available in two versions: audio file recognition and real-time speech recognition.

Audio file recognition

Usage | API reference

International

In the international deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled globally, excluding Mainland China.

| Model | Version | Supported languages | Supported sample rates | Scenarios | Supported audio formats | Price | Free quota (Note) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| fun-asr (same capabilities as fun-asr-2025-11-07) | Stable | Chinese (Mandarin, Cantonese, Wu, Minnan, Hakka, Gan, Xiang, and Jin), Mandarin accents from regions such as Zhongyuan, Southwest, Ji-Lu, Jianghuai, Lan-Yin, Jiao-Liao, Northeast, Beijing, Hong Kong, and Taiwan (including accents from Henan, Shaanxi, Hubei, Sichuan, Chongqing, Yunnan, Guizhou, Guangdong, Guangxi, Hebei, Tianjin, Shandong, Anhui, Nanjing, Jiangsu, Hangzhou, Gansu, and Ningxia), English, and Japanese | Any | Live stream, phone calls, conference interpretation, and more | aac, amr, avi, flac, flv, m4a, mkv, mov, mp3, mp4, mpeg, ogg, opus, wav, webm, wma, and wmv | $0.000035/second | 36,000 seconds (10 hours), valid for 90 days |
| fun-asr-2025-11-07 (optimized for far-field VAD to improve recognition accuracy compared with fun-asr-2025-08-25) | Snapshot | Same as fun-asr | Any | Same as above | Same as above | $0.000035/second | Same as above |
| fun-asr-2025-08-25 | Snapshot | Chinese (Mandarin), English | Any | Same as above | Same as above | $0.000035/second | Same as above |
| fun-asr-mtl (same capabilities as fun-asr-mtl-2025-08-25) | Stable | Chinese (Mandarin and Cantonese), English, Japanese, Korean, Vietnamese, Indonesian, Thai, Malay, Filipino, Arabic, Hindi, Bulgarian, Croatian, Czech, Danish, Dutch, Estonian, Finnish, Greek, Hungarian, Irish, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovak, Slovenian, and Swedish | Any | Same as above | Same as above | $0.000035/second | Same as above |
| fun-asr-mtl-2025-08-25 | Snapshot | Same as fun-asr-mtl | Any | Same as above | Same as above | $0.000035/second | Same as above |

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are located in the China (Beijing) region, and inference computing resources are restricted to Mainland China.

| Model | Version | Supported languages | Supported sample rates | Scenarios | Supported audio formats | Price | Free quota (Note) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| fun-asr (same capabilities as fun-asr-2025-11-07) | Stable | Chinese (Mandarin, Cantonese, Wu, Minnan, Hakka, Gan, Xiang, and Jin), Mandarin accents from regions such as Zhongyuan, Southwest, Ji-Lu, Jianghuai, Lan-Yin, Jiao-Liao, Northeast, Beijing, Hong Kong, and Taiwan (including accents from Henan, Shaanxi, Hubei, Sichuan, Chongqing, Yunnan, Guizhou, Guangdong, Guangxi, Hebei, Tianjin, Shandong, Anhui, Nanjing, Jiangsu, Hangzhou, Gansu, and Ningxia), English, and Japanese | Any | Live stream, phone calls, conference interpretation, and more | aac, amr, avi, flac, flv, m4a, mkv, mov, mp3, mp4, mpeg, ogg, opus, wav, webm, wma, and wmv | $0.000032/second | No free quota |
| fun-asr-2025-11-07 (optimized for far-field VAD to improve recognition accuracy compared with fun-asr-2025-08-25) | Snapshot | Same as fun-asr | Any | Same as above | Same as above | $0.000032/second | No free quota |
| fun-asr-2025-08-25 | Snapshot | Chinese (Mandarin), English | Any | Same as above | Same as above | $0.000032/second | No free quota |
| fun-asr-mtl (same capabilities as fun-asr-mtl-2025-08-25) | Stable | Chinese (Mandarin and Cantonese), English, Japanese, Korean, Vietnamese, Indonesian, Thai, Malay, Filipino, Arabic, Hindi, Bulgarian, Croatian, Czech, Danish, Dutch, Estonian, Finnish, Greek, Hungarian, Irish, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovak, Slovenian, and Swedish | Any | Same as above | Same as above | $0.000032/second | No free quota |
| fun-asr-mtl-2025-08-25 | Snapshot | Same as fun-asr-mtl | Any | Same as above | Same as above | $0.000032/second | No free quota |

Real-time speech recognition

Usage | API reference

International

In the international deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled globally, excluding Mainland China.

| Model | Version | Supported languages | Supported sample rates | Scenarios | Supported audio formats | Price | Free quota (Note) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| fun-asr-realtime (same capabilities as fun-asr-realtime-2025-11-07) | Stable | Chinese (Mandarin, Cantonese, Wu, Minnan, Hakka, Gan, Xiang, and Jin), English, and Japanese, including Mandarin accents from regions such as Zhongyuan, Southwest, Ji-Lu, Jianghuai, Lan-Yin, Jiao-Liao, Northeast, Beijing, Hong Kong, and Taiwan, and accents from areas such as Henan, Shaanxi, Hubei, Sichuan, Chongqing, Yunnan, Guizhou, Guangdong, Guangxi, Hebei, Tianjin, Shandong, Anhui, Nanjing, Jiangsu, Hangzhou, Gansu, and Ningxia | 16 kHz | Live video streaming, video conferencing, call centers, and more | pcm, wav, mp3, opus, speex, aac, and amr | $0.00009/second | 36,000 seconds (10 hours), valid for 90 days |
| fun-asr-realtime-2025-11-07 | Snapshot | Same as above | 16 kHz | Same as above | Same as above | $0.00009/second | Same as above |

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are located in the China (Beijing) region, and inference computing resources are restricted to Mainland China.

| Model | Version | Supported languages | Supported sample rates | Scenarios | Supported audio formats | Price | Free quota (Note) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| fun-asr-realtime (same capabilities as fun-asr-realtime-2025-11-07) | Stable | Chinese (Mandarin, Cantonese, Wu, Minnan, Hakka, Gan, Xiang, and Jin), English, and Japanese, including Mandarin accents from regions such as Zhongyuan, Southwest, Ji-Lu, Jianghuai, Lan-Yin, Jiao-Liao, Northeast, Beijing, Hong Kong, and Taiwan, and accents from areas such as Henan, Shaanxi, Hubei, Sichuan, Chongqing, Yunnan, Guizhou, Guangdong, Guangxi, Hebei, Tianjin, Shandong, Anhui, Nanjing, Jiangsu, Hangzhou, Gansu, and Ningxia | 16 kHz | Live video streaming, video conferences, call centers, and more | pcm, wav, mp3, opus, speex, aac, and amr | $0.000047/second | No free quota |
| fun-asr-realtime-2025-11-07 (optimized for far-field VAD to improve recognition accuracy compared with fun-asr-realtime-2025-09-15) | Snapshot | Same as above | 16 kHz | Same as above | Same as above | $0.000047/second | No free quota |
| fun-asr-realtime-2025-09-15 | Snapshot | Chinese (Mandarin), English | 16 kHz | Same as above | Same as above | $0.000047/second | No free quota |

Text embedding

Text embedding models convert text into numerical representations for tasks such as search, clustering, recommendation, and classification. Billing for these models is based on the number of input tokens. API reference

International

In the international deployment mode, endpoints and data storage are located in the Singapore region. Inference computing resources are scheduled globally, excluding Mainland China.

| Model | Embedding dimensions | Batch size | Max tokens per batch (Note) | Supported languages | Price (per million input tokens) | Free quota (Note) |
| --- | --- | --- | --- | --- | --- | --- |
| text-embedding-v4 (part of the Qwen3-Embedding series) | 2,048, 1,536, 1,024 (default), 768, 512, 256, 128, or 64 | 10 | 8,192 | More than 100 major languages, such as Chinese, English, Spanish, French, Portuguese, Indonesian, Japanese, Korean, German, and Russian, plus various programming languages | $0.07 | 1 million tokens, valid for 90 days after you activate Model Studio |
| text-embedding-v3 | 1,024 (default), 768, or 512 | 10 | 8,192 | Over 50 languages, such as Chinese, English, Spanish, French, Portuguese, Indonesian, Japanese, Korean, German, and Russian | $0.07 | 500,000 tokens, valid for 90 days after you activate Model Studio |

Mainland China

In the Mainland China deployment mode, endpoints and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.

| Model | Embedding dimensions | Batch size | Max tokens per batch (Note) | Supported languages | Price (per million input tokens) | Free quota (Note) |
| --- | --- | --- | --- | --- | --- | --- |
| text-embedding-v4 (part of the Qwen3-Embedding series; batch calls are half price) | 2,048, 1,536, 1,024 (default), 768, 512, 256, 128, or 64 | 10 | 8,192 | More than 100 major languages, such as Chinese, English, Spanish, French, Portuguese, Indonesian, Japanese, Korean, German, and Russian, plus various programming languages | $0.072 | No free quota |

Note

Batch size is the max number of texts that a single API call can process. For example, the batch size for text-embedding-v4 is 10. This means a single request can vectorize up to 10 texts, and each text cannot exceed 8,192 tokens. This limit applies to:

  • String array input: The array can contain up to 10 elements.

  • File input: The text file can contain up to 10 lines of text.
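The batching constraint above can be handled client-side by splitting inputs before each call. This is a plain-Python sketch; the per-text 8,192-token limit still needs an external tokenizer and is not checked here.

```python
def batch_texts(texts: list[str], batch_size: int = 10) -> list[list[str]]:
    """Split a list of texts into batches that respect the per-call
    limit (batch size 10 for text-embedding-v4). Each text must also
    stay within the 8,192-token limit, which this sketch does not verify."""
    return [texts[i:i + batch_size] for i in range(0, len(texts), batch_size)]

batches = batch_texts([f"doc {i}" for i in range(23)])
print(len(batches))               # → 3
print([len(b) for b in batches])  # → [10, 10, 3]
```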

Multimodal embedding

A multimodal embedding model converts text, images, and videos into a vector of floating-point numbers. The model is suitable for applications such as video classification, image classification, and image-text retrieval. API reference

International

In the international deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are scheduled globally, excluding Mainland China.

| Model | Data type | Embedding dimensions | Price (per million input tokens) | Free quota (Note) |
| --- | --- | --- | --- | --- |
| tongyi-embedding-vision-plus | float(32) | 1,152 | $0.09 | 1 million tokens, valid for 90 days after you activate Model Studio |
| tongyi-embedding-vision-flash | float(32) | 768 | Image/video: $0.03; text: $0.09 | 1 million tokens, valid for 90 days after you activate Model Studio |

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.

| Model | Data type | Embedding dimensions | Price (per 1,000 input tokens) | Free quota (Note) |
| --- | --- | --- | --- | --- |
| multimodal-embedding-v1 | float(32) | 1,024 | Free trial | No token limit |

Text rerank

This feature is typically used for semantic retrieval. Given a query, it sorts a list of candidate documents in descending order of their semantic relevance. API reference

Mainland China

Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

| Model | Max number of documents | Max input tokens per item | Max total input tokens | Supported languages | Price (per million input tokens) |
| --- | --- | --- | --- | --- | --- |
| gte-rerank-v2 | 500 | 4,000 | 30,000 | More than 50 languages, such as Chinese, English, Japanese, Korean, Thai, Spanish, French, Portuguese, German, Indonesian, and Arabic | $0.115 |

  • Max input tokens per item: Each query or document is limited to 4,000 tokens. Input that exceeds this limit is truncated.

  • Max number of documents: Each request is limited to 500 documents.

  • Max input tokens: The total number of tokens for all queries and documents in a single request is limited to 30,000.
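The three limits above can be pre-checked on the client before sending a request. The function name is illustrative, and token counts are assumed to come from an external tokenizer; note the service itself truncates over-long items rather than rejecting them.

```python
MAX_DOCS = 500
MAX_TOKENS_PER_ITEM = 4000
MAX_TOTAL_TOKENS = 30000

def check_rerank_limits(query_tokens: int, doc_token_counts: list[int]) -> list[str]:
    """Return a list of gte-rerank-v2 limit violations for a planned
    request: per-item token cap, document-count cap, and total-token cap."""
    problems = []
    if len(doc_token_counts) > MAX_DOCS:
        problems.append(f"too many documents: {len(doc_token_counts)} > {MAX_DOCS}")
    # item 0 is the query; items 1..n are the documents
    for i, n in enumerate([query_tokens] + doc_token_counts):
        if n > MAX_TOKENS_PER_ITEM:
            problems.append(f"item {i} exceeds {MAX_TOKENS_PER_ITEM} tokens")
    if query_tokens + sum(doc_token_counts) > MAX_TOTAL_TOKENS:
        problems.append(f"total tokens exceed {MAX_TOTAL_TOKENS}")
    return problems

print(check_rerank_limits(100, [3000, 5000]))  # → ['item 2 exceeds 4000 tokens']
```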

Domain specific

Intent recognition

The Qwen intent recognition model can quickly and accurately parse user intents in milliseconds and select the appropriate tools to resolve user issues. API reference | Usage

Note

Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

| Model | Context window (tokens) | Max input (tokens) | Max output (tokens) | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
| --- | --- | --- | --- | --- | --- |
| tongyi-intent-detect-v3 | 8,192 | 8,192 | 1,024 | $0.058 | $0.144 |

Role playing

Qwen's role-playing model is ideal for scenarios that require human-like conversation, such as virtual social interaction, game NPCs, re-creating IP characters, smart hardware and toys, and in-vehicle assistants. Compared to other Qwen models, this model offers enhanced capabilities in character fidelity, conversation progression, and empathetic listening. Usage

International

In the international deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled globally, excluding Mainland China.

| Model | Context window (tokens) | Max input (tokens) | Max output (tokens) | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
| --- | --- | --- | --- | --- | --- |
| qwen-plus-character-ja | 8,192 | 7,680 | 512 | $0.5 | $1.4 |

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.

| Model | Context window (tokens) | Max input (tokens) | Max output (tokens) | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
| --- | --- | --- | --- | --- | --- |
| qwen-plus-character | 32,768 | 32,000 | 4,096 | $0.115 | $0.287 |

Retired models

Retired on August 20, 2025

Qwen2

The open-source Qwen2 model from Alibaba Cloud. Usage | API reference | Try it online

| Model | Context window (tokens) | Max input (tokens) | Max output (tokens) | Input cost | Output cost | Alternative models |
| --- | --- | --- | --- | --- | --- | --- |
| qwen2-72b-instruct | 131,072 | 128,000 | 6,144 | Free for a limited time | Free for a limited time | Qwen3, DeepSeek, Kimi, and others |
| qwen2-57b-a14b-instruct | 65,536 | 63,488 | 6,144 | Free for a limited time | Free for a limited time | Qwen3, DeepSeek, Kimi, and others |
| qwen2-7b-instruct | 131,072 | 128,000 | 6,144 | Free for a limited time | Free for a limited time | Qwen3, DeepSeek, Kimi, and others |

Qwen1.5

The open-source Qwen1.5 model from Alibaba Cloud. Usage | API reference | Try it online

| Model | Context window (tokens) | Max input (tokens) | Max output (tokens) | Input cost | Output cost | Alternative models |
| --- | --- | --- | --- | --- | --- | --- |
| qwen1.5-110b-chat | 8,000 | 6,000 | 2,000 | Free for a limited time | Free for a limited time | Qwen3, DeepSeek, Kimi, and others |
| qwen1.5-72b-chat | 8,000 | 6,000 | 2,000 | Free for a limited time | Free for a limited time | Qwen3, DeepSeek, Kimi, and others |
| qwen1.5-32b-chat | 8,000 | 6,000 | 2,000 | Free for a limited time | Free for a limited time | Qwen3, DeepSeek, Kimi, and others |
| qwen1.5-14b-chat | 8,000 | 6,000 | 2,000 | Free for a limited time | Free for a limited time | Qwen3, DeepSeek, Kimi, and others |
| qwen1.5-7b-chat | 8,000 | 6,000 | 2,000 | Free for a limited time | Free for a limited time | Qwen3, DeepSeek, Kimi, and others |