
Alibaba Cloud Model Studio: Model list

Last Updated: Apr 01, 2026

Flagship models

International

In the International deployment mode, endpoints and data storage are both located in the Singapore region. Model inference computing resources are dynamically scheduled globally (excluding the Chinese mainland).

Qwen3.5-Plus supports text, image, and video inputs. Its text performance is comparable to Qwen3-Max, but faster and more cost-effective. Its multimodal capabilities also significantly outperform the Qwen3-VL series.

Flagship models

| | Qwen3-Max (new) | Qwen3.5-Plus (new) | Qwen3.5-Flash (new) |
| --- | --- | --- | --- |
| | Best for complex tasks, most capable | Balanced performance, speed, and cost | Best for simple tasks, fast and cost-effective |
| Max context window (tokens) | 262,144 | 1,000,000 | 1,000,000 |
| Min input price (per 1M tokens) | $1.2 | $0.4 | $0.1 |
| Min output price (per 1M tokens) | $6 | $2.4 | $0.4 |

Global

In the Global deployment mode, endpoints and data storage are located in the US (Virginia) region or Germany (Frankfurt) region, and model inference computing resources are dynamically scheduled globally.

Qwen3.5-Plus supports text, image, and video inputs. Its text performance is comparable to Qwen3-Max, but faster and more cost-effective. Its multimodal capabilities also significantly outperform the Qwen3-VL series.

Flagship models

| | Qwen3-Max (new) | Qwen3.5-Plus (new) | Qwen3.5-Flash (new) |
| --- | --- | --- | --- |
| | Best for complex tasks, most capable | Balanced performance, speed, and cost | Best for simple tasks, fast and cost-effective |
| Max context window (tokens) | 262,144 | 1,000,000 | 1,000,000 |
| Min input price (per 1M tokens) | $0.359 | $0.115 | $0.029 |
| Min output price (per 1M tokens) | $1.434 | $0.688 | $0.287 |

US

In the US deployment mode, endpoints and data storage are both located in the US (Virginia) region. Model inference computing resources are limited to the US.

Flagship models

| | Qwen-Plus (new) | Qwen-Flash (new) |
| --- | --- | --- |
| | Balanced performance, speed, and cost | Best for simple tasks, fast and cost-effective |
| Max context window (tokens) | 1,000,000 | 1,000,000 |
| Min input price (per 1M tokens) | $0.4 | $0.05 |
| Min output price (per 1M tokens) | $1.2 | $0.4 |

Chinese mainland

In the Chinese mainland deployment mode, endpoints and data storage are both located in the Beijing region. Model inference computing resources are limited to the Chinese mainland.

Qwen3.5-Plus supports text, image, and video inputs. Its text performance is comparable to Qwen3-Max, but faster and more cost-effective. Its multimodal capabilities also significantly outperform the Qwen3-VL series.

Flagship models

| | Qwen3-Max (new) | Qwen3.5-Plus (new) | Qwen3.5-Flash (new) |
| --- | --- | --- | --- |
| | Best for complex tasks, most capable | Balanced performance, speed, and cost | Best for simple tasks, fast and cost-effective |
| Max context window (tokens) | 262,144 | 1,000,000 | 1,000,000 |
| Min input price (per 1M tokens) | $0.359 | $0.115 | $0.029 |
| Min output price (per 1M tokens) | $1.434 | $0.688 | $0.287 |

China (Hong Kong)

In the China (Hong Kong) deployment mode, endpoints and data storage are both located in China (Hong Kong). Model inference computing resources are limited to China (Hong Kong).

Flagship models

| | Qwen3-Max (new) | Qwen-Plus (new) | Qwen3.5-Flash (new) |
| --- | --- | --- | --- |
| | Best for complex tasks, most capable | Balanced performance, speed, and cost | Best for simple tasks, fast and cost-effective |
| Max context window (tokens) | 262,144 | 1,000,000 | 1,000,000 |
| Min input price (per 1M tokens) | $1.2 | $0.4 | $0.1 |
| Min output price (per 1M tokens) | $6 | $1.2 | $0.4 |

EU

In the EU deployment mode, endpoints and data storage are both located in Germany (Frankfurt). Model inference computing resources are limited to the EU.

Flagship models

| | Qwen3-Max (new) | Qwen-Plus (new) | Qwen3.5-Flash (new) |
| --- | --- | --- | --- |
| | Best for complex tasks, most capable | Balanced performance, speed, and cost | Best for simple tasks, fast and cost-effective |
| Max context window (tokens) | 262,144 | 1,000,000 | 1,000,000 |
| Min input price (per 1M tokens) | $1.2 | $0.4 | $0.1 |
| Min output price (per 1M tokens) | $6 | $2.4 | $0.4 |

Model overview

International

In the International deployment mode, endpoints and data storage are both located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding the Chinese mainland).

Category

Subcategory

Description

Text generation

General-purpose large language models

Qwen large language models

Multimodal models

Visual understanding models (Qwen-Plus, Qwen-VL, QVQ), omni-modal model Qwen-Omni, and real-time multimodal model Qwen-Omni-Realtime

Domain-specific models

Coder models, translation models, role-playing models

Image generation

Text-to-image

  • Qwen text-to-image: Excels at handling complex instructions, rendering Chinese and English text, and generating high-definition, realistic images. Supports selecting different models based on efficiency and quality requirements.

  • Wan text-to-image

  • Z-Image: A lightweight text-to-image model that quickly generates high-quality images. It supports bilingual (Chinese and English) rendering, complex semantic understanding, and various styles and themes.

Image editing

  • Qwen image editing: Supports prompts in Chinese and English. It can perform complex image and text editing operations such as style transfer, text modification, and object editing. It also supports multi-image fusion, making it suitable for various industrial scenarios.

  • Wan image editing: Suitable for scenarios such as multi-image fusion, style transfer, object detection, image inpainting, and watermark removal. The model series includes: Wan2.6, Wan2.5.

Speech synthesis and recognition

Speech synthesis (text-to-speech)

Qwen speech synthesis and Qwen real-time speech synthesis can convert text to speech. They are suitable for scenarios such as intelligent voice assistants, audiobooks, in-car navigation, and educational tutoring.

Speech recognition and translation

Qwen real-time speech recognition, Qwen audio file recognition, Qwen3-LiveTranslate-Flash-Realtime, and Fun-ASR speech recognition can convert speech to text. They are suitable for scenarios such as real-time meeting transcription, live streaming captions, and call center services.

Video generation

Text-to-video

Generates videos from a single sentence, offering a wide range of styles and high-quality visuals.

Image-to-video

  • First-frame-to-video: Uses an input image as the first frame and generates a video based on a prompt.

  • First-and-last-frame-to-video: Generates a smooth, dynamic video from a prompt using provided first and last frame images.

  • Multi-image-to-video: Supports one or more input images. It generates a video based on a prompt by referencing the entities or backgrounds in the images.

Video-to-video

Reference-to-video: Generates a performance video based on a prompt by referencing the character's appearance from an input video or image, and can also reference the timbre from the video.

General-purpose video editing

General-purpose video editing: Performs various video editing tasks based on input text prompts, images, and videos. For example, it can extract motion features from an input video and generate a new video based on a prompt.

Embedding

Text embedding

Converts text into a set of numbers that represent the text. It is suitable for search, clustering, recommendation, and classification tasks.
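Text embedding requests can be made through the same endpoint as chat requests. The sketch below is illustrative only: the model name "text-embedding-v3" and the call pattern are assumptions, so check the embedding model page for your deployment mode before using them.

```python
# Minimal sketch: building a text-embedding request.
# ASSUMPTION: the model name "text-embedding-v3" is illustrative; verify the
# exact embedding model name for your deployment mode.

def embedding_kwargs(texts: list[str]) -> dict:
    """Build kwargs for an OpenAI-compatible client.embeddings.create(...) call."""
    return {
        "model": "text-embedding-v3",  # assumed name; confirm for your region
        "input": texts,
    }

# Usage (requires a configured OpenAI-compatible client and API key):
# resp = client.embeddings.create(**embedding_kwargs(["hello world"]))
# vector = resp.data[0].embedding  # list of floats for search/clustering
```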

Global

In the Global deployment mode, endpoints and data storage are located in the US (Virginia) region or Germany (Frankfurt) region, and model inference computing resources are dynamically scheduled globally.

Category

Subcategory

Description

Text generation

General-purpose large language models

Qwen large language models

Multimodal models

Visual understanding model Qwen-VL

Domain-specific models

Coder models, translation models

Image generation

Text-to-image

Image editing

  • Wan image editing: Suitable for scenarios such as multi-image fusion, style transfer, object detection, image inpainting, and watermark removal. Model series: Wan2.6.

Video generation

Text-to-video

Generates videos from a single sentence, offering a wide range of styles and high-quality visuals.

Image-to-video

First-frame-to-video: Uses an input image as the first frame and generates a video based on a prompt.

Video-to-video

Reference-to-video: Generates a performance video based on a prompt by referencing the character's appearance from an input video or image, and can also reference the timbre from the video.

US

In the US deployment mode, endpoints and data storage are both located in the US (Virginia) region. Model inference compute resources are restricted to the United States.

Category

Subcategory

Description

Text generation

General-purpose large language models

Qwen large language model: Commercial (Qwen-Plus, Qwen-Flash)

Multimodal models

Visual understanding model Qwen-VL

Video generation

Text-to-video

Generates videos from a single sentence, offering a wide range of styles and high-quality visuals.

Image-to-video

First-frame-to-video: Uses an input image as the first frame and generates a video based on a prompt.

Speech recognition

Speech recognition

Qwen audio file recognition can convert speech to text. It is suitable for scenarios such as meeting transcription and live streaming captions.

Chinese mainland

In the Chinese mainland deployment mode, endpoints and data storage are both located in the Beijing region. Model inference compute resources are restricted to the Chinese mainland.

Category

Subcategory

Description

Text generation

General-purpose large language models

Multimodal models

Visual understanding models (Qwen-Plus, Qwen-VL, QVQ), omni-modal model Qwen-Omni

Domain-specific models

Coder models, math model, translation models, data mining model, in-depth research model, intention recognition model, role-playing models

Image generation

Text-to-image

  • Qwen text-to-image: Excels at handling complex instructions, rendering Chinese and English text, and generating high-definition, realistic images. Supports selecting different models based on efficiency and quality requirements.

  • Wan text-to-image

  • Z-Image: A lightweight text-to-image model that quickly generates high-quality images. It supports bilingual (Chinese and English) rendering, complex semantic understanding, and various styles and themes.

Image editing

General-purpose models:

  • Qwen image editing: Supports prompts in Chinese and English. It can perform complex image and text editing operations such as style transfer, text modification, and object editing. It also supports multi-image fusion, making it suitable for various industrial scenarios.

  • Wan image editing: Suitable for scenarios such as multi-image fusion, style transfer, object detection, image inpainting, and watermark removal. The model series includes: Wan2.6, Wan2.5, Wan2.1.

More models: Qwen image translation, OutfitAnyone

Speech synthesis and recognition

Speech synthesis (text-to-speech)

Qwen speech synthesis, Qwen real-time speech synthesis, and CosyVoice speech synthesis can convert text to speech. They are suitable for scenarios such as intelligent voice assistants, audiobooks, in-car navigation, and educational tutoring.

Speech recognition and translation

Qwen real-time speech recognition, Qwen audio file recognition, Fun-ASR speech recognition, and Paraformer speech recognition can convert speech to text. They are suitable for scenarios such as real-time meeting transcription, live streaming captions, and call center services.

Video editing and generation

Text-to-video

Generates videos from a single sentence, offering a wide range of styles and high-quality visuals.

Image-to-video

  • First-frame-to-video: Uses an input image as the first frame and generates a complete video based on a prompt.

  • First-and-last-frame-to-video: Generates a video with a natural transition from a prompt using provided first and last frame images.

  • Multi-image-to-video: Supports one or more input images. It generates a video based on a prompt by referencing the entities or backgrounds in the images.

  • Image + action template to generate dance videos: AnimateAnyone generates dance videos from a person's image and an action video.

  • Image + audio to generate lip-sync videos

    • Wan digital human generates videos from a person's image and audio. It features large, natural movements and supports various frame sizes, such as full-body, half-body, and portrait. It is suitable for singing and performance scenarios.

    • EMO generates videos from a person's image and audio. It features strong lip-sync and facial expression capabilities and supports portrait and half-body frames. It is suitable for close-up shots of people.

    • LivePortrait generates videos from a person's image and audio. It is suitable for voice announcement scenarios.

  • Image + expression template to generate emoji videos: Emoji generates facial emoji videos from a face image and a preset dynamic face template.

Video-to-video

Reference-to-video: Generates a performance video based on a prompt by referencing the character's appearance from an input video or image, and can also reference the timbre from the video.

General-purpose video editing

  • General-purpose video editing: Performs various video editing tasks based on input text prompts, images, and videos. For example, it can extract motion features from an input video and generate a new video based on a prompt.

  • Video lip-sync replacement: VideoRetalk generates videos from a person's video and audio. It is suitable for scenarios such as short video production and video translation.

  • Video style transfer: Video Style Re-rendering can convert videos into styles such as Japanese manga and American comics.

Embedding

Text embedding

Converts text into a set of numbers that represent the text. It is used for search, clustering, recommendation, and classification.

Multimodal embedding

Converts text, images, and speech into a set of numbers. It is used for audio and video classification, image classification, and image-text retrieval.

China (Hong Kong)

In the China (Hong Kong) deployment mode, endpoints and data storage are both located in China (Hong Kong). Model inference compute resources are restricted to China (Hong Kong).

Category

Subcategory

Description

Text generation

General-purpose large language models

Qwen large language model: Commercial (Qwen-Plus, Qwen-Flash)

Multimodal models

Visual understanding model Qwen-VL

Embedding

Text embedding

Converts text into a set of numbers that represent the text. It is suitable for search, clustering, recommendation, and classification tasks.

EU

In the EU deployment mode, endpoints and data storage are both located in Germany (Frankfurt). Model inference compute resources are restricted to the European Union.

Category

Subcategory

Description

Text generation

General-purpose large language models

Qwen large language model: Commercial (Qwen-Plus, Qwen-Flash)

Multimodal models

Visual understanding model Qwen-VL

Domain-specific models

Coder models

Text generation – Qwen

This is the commercial version of the Qwen model. Compared with the open-source version, it offers the latest capabilities and improvements.

The parameter count for commercial models is not disclosed.
Models are updated periodically. To use a fixed version, select a snapshot version. Snapshot versions are typically maintained until one month after the next snapshot version is released.
We recommend prioritizing the stable or latest version because their rate limits are more generous.
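A minimal sketch of the naming convention above, assuming the OpenAI-compatible Model Studio endpoint: the undated alias tracks the latest snapshot, while a dated name pins a fixed version. The base_url in the usage note is the international endpoint; adjust it to your deployment mode.

```python
# Minimal sketch: choosing a stable alias vs. a pinned snapshot version.
# ASSUMPTION: uses the OpenAI-compatible endpoint; model names are taken
# from the tables on this page.

def build_request(pin_snapshot: bool) -> dict:
    """Return chat-completion kwargs for the stable alias or a frozen snapshot."""
    # "qwen3-max" tracks the latest snapshot; the dated name stays fixed.
    model = "qwen3-max-2026-01-23" if pin_snapshot else "qwen3-max"
    return {
        "model": model,
        "messages": [{"role": "user", "content": "Hello"}],
    }

# Usage (requires the `openai` package and a DASHSCOPE_API_KEY):
# import os
# from openai import OpenAI
# client = OpenAI(
#     api_key=os.getenv("DASHSCOPE_API_KEY"),
#     base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
# )
# client.chat.completions.create(**build_request(pin_snapshot=True))
```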

Qwen-Max

Qwen-Max is the highest-performing model in the Qwen series and excels at complex, multi-step tasks. Usage | Thinking | API reference | Try online

International

In the International deployment mode, endpoints and data storage are both located in the Singapore region. Model inference compute resources are dynamically scheduled globally (excluding the Chinese mainland).

Model

Version

Mode

Context window

Max input

Max CoT

Max output

Input cost

Output cost

Free quota

Note

(tokens)

(per 1M tokens)

qwen3-max

Currently qwen3-max-2026-01-23
Part of the Qwen3 series
Supports tool calling

Stable

Thinking

262,144

258,048

81,920

32,768

Tiered pricing. See details below.

1 million tokens each

Valid for 90 days after activating Model Studio

Non-thinking

-

65,536

qwen3-max-2026-01-23

Thinking mode also known as Qwen3-Max-Thinking
Part of the Qwen3 series
Supports tool calling

Snapshot

Thinking

81,920

32,768

Non-thinking

-

65,536

qwen3-max-2025-09-23

Part of the Qwen3 series

Snapshot

Non-thinking only

qwen3-max-preview

Part of the Qwen3 series

Preview

Thinking

81,920

32,768

Non-thinking

-

65,536

The models above use tiered pricing based on the number of input tokens in your request. qwen3-max and qwen3-max-preview support context cache.

| Input tokens per request | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
| --- | --- | --- |
| 0 < tokens ≤ 32K | $1.2 | $6 |
| 32K < tokens ≤ 128K | $2.4 | $12 |
| 128K < tokens ≤ 252K | $3 | $15 |
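As a worked example of the tiered pricing above, the sketch below estimates the cost of a single qwen3-max (International) request. It assumes the whole request is billed at the one tier matched by its input token count, and that "32K" and similar labels mean multiples of 1,024 tokens; confirm both against your bill.

```python
# Minimal sketch: estimating a qwen3-max (International) request cost from
# the tier table above.
# ASSUMPTIONS: the entire request is billed at the tier matched by the input
# token count, and tier ceilings are multiples of 1,024 tokens.

TIERS = [  # (tier ceiling in input tokens, $/1M input, $/1M output)
    (32 * 1024, 1.2, 6.0),
    (128 * 1024, 2.4, 12.0),
    (252 * 1024, 3.0, 15.0),
]

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request."""
    for ceiling, in_rate, out_rate in TIERS:
        if input_tokens <= ceiling:
            return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000
    raise ValueError("input exceeds the 252K-token tier ceiling")

# Example: 10,000 input + 2,000 output tokens fall in the first tier.
print(f"${estimate_cost(10_000, 2_000):.4f}")  # → $0.0240
```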

More models

Model

Version

Context window

Max input

Max output

Input cost

Output cost

Free quota

Note

(tokens)

(per 1M tokens)

qwen-max

Currently qwen-max-2025-01-25
Batch calls at half price

Stable

32,768

30,720

8,192

$1.6

$6.4

1 million tokens each

Valid for 90 days after activating Model Studio

qwen-max-latest

Always matches the latest snapshot version

Latest

$1.6

$6.4

qwen-max-2025-01-25

Also known as qwen-max-0125, Qwen2.5-Max

Snapshot

Global

In the Global deployment mode, endpoints and data storage are located in the US (Virginia) region or Germany (Frankfurt) region, and model inference computing resources are dynamically scheduled globally.

Model

Version

Mode

Context window

Max input

Max CoT

Max output

Input cost

Output cost

Free quota

(Note)

(tokens)

(per 1M tokens)

qwen3-max

Currently qwen3-max-2025-09-23
Context Cache discount available

Stable

Non-thinking only

262,144

258,048

-

65,536

Tiered pricing. See details below.

None

qwen3-max-2025-09-23

Snapshot

Non-thinking only

qwen3-max-preview

Context Cache discount available

Preview

Thinking

81,920

32,768

Non-thinking

-

65,536

The models above use tiered pricing based on the number of input tokens in your request.

| Model | Input tokens per request | Input cost (per 1M tokens) | Output cost, CoT + response (per 1M tokens) |
| --- | --- | --- | --- |
| qwen3-max (Context Cache discount available) | 0 < tokens ≤ 32K | $0.359 | $1.434 |
| | 32K < tokens ≤ 128K | $0.574 | $2.294 |
| | 128K < tokens ≤ 252K | $1.004 | $4.014 |
| qwen3-max-2025-09-23 | 0 < tokens ≤ 32K | $0.861 | $3.441 |
| | 32K < tokens ≤ 128K | $1.434 | $5.735 |
| | 128K < tokens ≤ 252K | $2.151 | $8.602 |
| qwen3-max-preview (Context Cache discount available) | 0 < tokens ≤ 32K | $0.861 | $3.441 |
| | 32K < tokens ≤ 128K | $1.434 | $5.735 |
| | 128K < tokens ≤ 252K | $2.151 | $8.602 |

Chinese mainland

In the Chinese mainland deployment mode, endpoints and data storage are both located in the Beijing region. Model inference compute resources are limited to the Chinese mainland.

Model

Version

Mode

Context window

Max input

Max CoT

Max output

Input cost

Output cost

CoT + response

(tokens)

(per 1M tokens)

qwen3-max

Currently qwen3-max-2026-01-23
Part of the Qwen3 series
Supports tool calling

Stable

Thinking

262,144

258,048

81,920

32,768

Tiered pricing. See details below.

Non-thinking

-

65,536

qwen3-max-2026-01-23

Thinking mode also known as Qwen3-Max-Thinking
Part of the Qwen3 series
Supports tool calling

Snapshot

Thinking

81,920

32,768

Non-thinking

-

65,536

qwen3-max-2025-09-23

Part of the Qwen3 series

Snapshot

Non-thinking only

qwen3-max-preview

Part of the Qwen3 series

Preview

Thinking

81,920

32,768

Non-thinking

-

65,536

The models above use tiered pricing based on the number of input tokens in your request.

| Model | Input tokens per request | Input cost (per 1M tokens) | Output cost, CoT + response (per 1M tokens) |
| --- | --- | --- | --- |
| qwen3-max (Batch calls at half price; Context Cache discount available) | 0 < tokens ≤ 32K | $0.359 | $1.434 |
| | 32K < tokens ≤ 128K | $0.574 | $2.294 |
| | 128K < tokens ≤ 252K | $1.004 | $4.014 |
| qwen3-max-2026-01-23 | 0 < tokens ≤ 32K | $0.359 | $1.434 |
| | 32K < tokens ≤ 128K | $0.574 | $2.294 |
| | 128K < tokens ≤ 252K | $1.004 | $4.014 |
| qwen3-max-2025-09-23 | 0 < tokens ≤ 32K | $0.861 | $3.441 |
| | 32K < tokens ≤ 128K | $1.434 | $5.735 |
| | 128K < tokens ≤ 252K | $2.151 | $8.602 |
| qwen3-max-preview (Context Cache discount available) | 0 < tokens ≤ 32K | $0.861 | $3.441 |
| | 32K < tokens ≤ 128K | $1.434 | $5.735 |
| | 128K < tokens ≤ 252K | $2.151 | $8.602 |

More models

Model

Version

Context window

Max input

Max output

Input cost

Output cost

(tokens)

(per 1M tokens)

qwen-max

Currently qwen-max-2024-09-19
Batch calls at half price

Stable

32,768

30,720

8,192

$0.345

$1.377

qwen-max-latest

Always matches the latest snapshot version
Batch calls at half price

Latest

131,072

129,024

qwen-max-2025-01-25

Also known as qwen-max-0125, Qwen2.5-Max

Snapshot

qwen-max-2024-09-19

Also known as qwen-max-0919

32,768

30,720

$2.868

$8.602

China (Hong Kong)

In the China (Hong Kong) deployment mode, endpoints and data storage are both located in China (Hong Kong). Model inference compute resources are limited to China (Hong Kong).

Model

Version

Mode

Context window

Max input

Max CoT

Max output

Input cost

Output cost

(tokens)

(per 1M tokens)

qwen3-max

Currently qwen3-max-2026-01-23
Part of the Qwen3 series
Supports tool calling

Stable

Thinking

262,144

258,048

81,920

32,768

Tiered pricing. See details below.

Non-thinking

-

65,536

qwen3-max-2026-01-23

Thinking mode also known as Qwen3-Max-Thinking
Part of the Qwen3 series
Supports tool calling

Snapshot

Thinking

81,920

32,768

Non-thinking

-

65,536

The models above use tiered pricing based on the number of input tokens in your request. qwen3-max supports context cache.

| Input tokens per request | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
| --- | --- | --- |
| 0 < tokens ≤ 32K | $1.2 | $6 |
| 32K < tokens ≤ 128K | $2.4 | $12 |
| 128K < tokens ≤ 252K | $3 | $15 |

EU

In the EU deployment mode, endpoints and data storage are both located in Germany (Frankfurt). Model inference compute resources are limited to the EU.

Model

Version

Mode

Context window

Max input

Max CoT

Max output

Input cost

Output cost

(tokens)

(per 1M tokens)

qwen3-max

Currently qwen3-max-2026-01-23
Part of the Qwen3 series
Supports tool calling

Stable

Thinking

262,144

258,048

81,920

32,768

Tiered pricing. See details below.

Non-thinking

-

65,536

qwen3-max-2026-01-23

Thinking mode also known as Qwen3-Max-Thinking
Part of the Qwen3 series
Supports tool calling

Snapshot

Thinking

81,920

32,768

Non-thinking

-

65,536

The models above use tiered pricing based on the number of input tokens in your request. qwen3-max supports context cache.

| Input tokens per request | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
| --- | --- | --- |
| 0 < tokens ≤ 32K | $1.2 | $6 |
| 32K < tokens ≤ 128K | $2.4 | $12 |
| 128K < tokens ≤ 252K | $3 | $15 |

qwen3-max-2026-01-23 thinking mode: Compared with the September 23, 2025 snapshot, this version effectively combines thinking and non-thinking modes, significantly improving overall model performance. In thinking mode, the model integrates three tools—web search, web information extraction, and code interpreter—to achieve higher accuracy on complex tasks by incorporating external tools during reasoning.

qwen3-max, qwen3-max-2026-01-23, and qwen3-max-2025-09-23 natively support search agents. For details, see web search.

Qwen-Plus

Qwen-Plus offers balanced capabilities: inference quality, cost, and speed are between Qwen-Max and Qwen-Flash, making it ideal for medium-complexity tasks. Usage | Thinking | API reference | Try online

Qwen3.5-Plus supports text, image, and video inputs. Its text performance is comparable to Qwen3-Max, but faster and more cost-effective. Its multimodal capabilities also significantly outperform the Qwen3-VL series.

International

In the International deployment mode, endpoints and data storage are both located in the Singapore region, and model inference compute resources are dynamically scheduled globally (excluding the Chinese mainland).

Model

Version

Context window

Max input

Max output

Input cost

Output cost

Free quota

(Note)

(tokens)

(per 1M tokens)

qwen3.5-plus

Currently qwen3.5-plus-2026-02-15
Thinking mode enabled by default

Stable

1,000,000

Thinking

983,616

Non-thinking

991,808

65,536

Maximum chain-of-thought length: 81,920

Pricing is tiered. For details, see the note below the table.

1,000,000 tokens each

Valid for 90 days after activating Model Studio

qwen3.5-plus-2026-02-15

Thinking mode enabled by default.

Snapshot

Thinking

983,616

Non-thinking

991,808

65,536

Maximum chain-of-thought length: 81,920

qwen-plus

Currently qwen-plus-2025-12-01.
Part of the Qwen3 series.
Batch calls at half price

Stable

Thinking

995,904

Non-thinking

997,952

32,768

Maximum chain-of-thought: 81,920.

qwen-plus-latest

Currently qwen-plus-2025-12-01
Part of the Qwen3 series

Latest

Thinking

995,904

Non-thinking

997,952

qwen-plus-2025-12-01

Part of the Qwen3 series

Snapshot

Thinking

995,904

Non-thinking

997,952

qwen-plus-2025-09-11

Part of the Qwen3 series

qwen-plus-2025-07-28

Also known as qwen-plus-0728
Part of the Qwen3 series

qwen-plus-2025-07-14

Also known as qwen-plus-0714
Part of the Qwen3 series

131,072

Thinking

98,304

Non-thinking

129,024

16,384

Max CoT 38,912

$0.4

Thinking

$4

Non-thinking

$1.2

qwen-plus-2025-04-28

Also known as qwen-plus-0428
Part of the Qwen3 series

qwen-plus-2025-01-25

Also known as qwen-plus-0125

129,024

8,192

$1.2

qwen3.5-plus, qwen3.5-plus-2026-02-15, qwen-plus, qwen-plus-latest, qwen-plus-2025-12-01, qwen-plus-2025-09-11, and qwen-plus-2025-07-28 are subject to tiered billing based on the number of input tokens per request.

Qwen3.5-Plus

| Input tokens per request | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
| --- | --- | --- |
| 0 < tokens ≤ 256K | $0.4 | $2.4 |
| 256K < tokens ≤ 1M | $0.5 | $3 |

Qwen-Plus

| Input tokens per request | Mode | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
| --- | --- | --- | --- |
| 0 < tokens ≤ 256K | Non-thinking | $0.4 | $1.2 |
| | Thinking | | $4 |
| 256K < tokens ≤ 1M | Non-thinking | $1.2 | $3.6 |
| | Thinking | | $12 |

Global

In the Global deployment mode, endpoints and data storage are located in the US (Virginia) region or Germany (Frankfurt) region, and model inference computing resources are dynamically scheduled globally.

Model

Version

Mode

Context window

Max input

Max CoT

Max output

Input cost

Output cost

CoT + output

Free quota

(Note)

(tokens)

(per 1M tokens)

qwen3.5-plus

Currently qwen3.5-plus-2026-02-15
Thinking mode enabled by default

Stable

Thinking

1,000,000

983,616

81,920

65,536

Tiered pricing. See details below.

None

991,808

-

qwen3.5-plus-2026-02-15

Thinking mode enabled by default

Snapshot

Non-thinking

983,616

81,920

991,808

-

qwen-plus

Currently qwen-plus-2025-12-01
Part of the Qwen3 series

Stable

Thinking

995,904

81,920

32,768

Non-thinking

997,952

-

qwen-plus-2025-12-01

Part of the Qwen3 series

Snapshot

Thinking

995,904

81,920

Non-thinking

997,952

-

qwen-plus-2025-09-11

Part of the Qwen3 series

Thinking

995,904

81,920

Non-thinking

997,952

-

qwen-plus-2025-07-28

Also known as qwen-plus-0728
Part of the Qwen3 series

Thinking

995,904

81,920

Non-thinking

997,952

-

The models above use tiered billing based on the number of input tokens in each request.

Qwen3.5-Plus

| Input tokens per request | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
| --- | --- | --- |
| 0 < tokens ≤ 128K | $0.115 | $0.688 |
| 128K < tokens ≤ 256K | $0.287 | $1.72 |
| 256K < tokens ≤ 1M | $0.573 | $3.44 |

Qwen-Plus

| Input tokens per request | Mode | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
| --- | --- | --- | --- |
| 0 < tokens ≤ 128K | Non-thinking | $0.115 | $0.287 |
| | Thinking | | $1.147 |
| 128K < tokens ≤ 256K | Non-thinking | $0.345 | $2.868 |
| | Thinking | | $3.441 |
| 256K < tokens ≤ 1M | Non-thinking | $0.689 | $6.881 |
| | Thinking | | $9.175 |

US

In the US deployment mode, endpoints and data storage are both located in the US (Virginia) region, and model inference compute resources are restricted to the United States.

Model

Version

Context window

Max input

Max output

Input cost

Output cost

Free quota

(Note)

(tokens)

(per 1M tokens)

qwen-plus-us

Currently qwen-plus-2025-12-01-us.
Part of the Qwen3 series.

Stable

1,000,000

Thinking

995,904

Non-thinking

997,952

32,768

Maximum chain-of-thought length: 81,920

Tiered pricing. See details below.

None

qwen-plus-2025-12-01-us

Part of the Qwen3 series

Snapshot

Thinking

995,904

Non-thinking

997,952

The preceding models are subject to tiered billing based on the number of input tokens in each request, and qwen-plus-us supports context cache.

| Input tokens per request | Input cost (per 1M tokens) | Mode | Output cost (per 1M tokens) |
| --- | --- | --- | --- |
| 0 < tokens ≤ 256K | $0.4 | Non-thinking | $1.2 |
| | | Thinking | $4 |
| 256K < tokens ≤ 1M | $1.2 | Non-thinking | $3.6 |
| | | Thinking | $12 |

Chinese mainland

In the Chinese mainland deployment mode, endpoints and data storage are both located in the Beijing region. Model inference computing resources are limited to the Chinese mainland.

Model

Version

Context window

Max input

Max output

Input cost

Output cost

(tokens)

(per 1M tokens)

qwen3.5-plus

Currently qwen3.5-plus-2026-02-15
Thinking mode enabled by default
Batch calls at half price

Stable

1,000,000

Thinking

983,616

Non-thinking

991,808

65,536

Max chain-of-thought length: 81,920

Tiered pricing. See details below.

qwen3.5-plus-2026-02-15

Thinking mode enabled by default

Snapshot

Thinking

983,616

Non-thinking

991,808

65,536

Max chain-of-thought length: 81,920

qwen-plus

Currently qwen-plus-2025-12-01
Part of the Qwen3 series
Batch calls at half price

Stable

Thinking

995,904

Non-thinking

997,952

32,768

Max chain-of-thought length: 81,920

qwen-plus-latest

Currently qwen-plus-2025-12-01
Part of the Qwen3 series
Batch calls at half price

Latest

Thinking

995,904

Non-thinking

997,952

qwen-plus-2025-12-01

Part of the Qwen3 series

Snapshot

Thinking

995,904

Non-thinking

997,952

qwen-plus-2025-09-11

Part of the Qwen3 series

qwen-plus-2025-07-28

Also known as qwen-plus-0728
Part of the Qwen3 series

qwen-plus-2025-07-14

Also known as qwen-plus-0714
Part of the Qwen3 series

131,072

Thinking

98,304

Non-thinking

129,024

16,384

Max chain-of-thought length: 38,912

$0.115

Thinking

$1.147

Non-thinking

$0.287

qwen-plus-2025-04-28

Also known as qwen-plus-0428
Part of the Qwen3 series

qwen3.5-plus, qwen3.5-plus-2026-02-15, qwen-plus, qwen-plus-latest, qwen-plus-2025-12-01, qwen-plus-2025-09-11, and qwen-plus-2025-07-28 use tiered billing based on the number of input tokens per request.

Qwen3.5-Plus

| Input tokens per request | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
| --- | --- | --- |
| 0 < tokens ≤ 128K | $0.115 | $0.688 |
| 128K < tokens ≤ 256K | $0.287 | $1.72 |
| 256K < tokens ≤ 1M | $0.573 | $3.44 |

Qwen-Plus

| Input tokens per request | Mode | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
| --- | --- | --- | --- |
| 0 < tokens ≤ 128K | Non-thinking | $0.115 | $0.287 |
| | Thinking | | $1.147 |
| 128K < tokens ≤ 256K | Non-thinking | $0.345 | $2.868 |
| | Thinking | | $3.441 |
| 256K < tokens ≤ 1M | Non-thinking | $0.689 | $6.881 |
| | Thinking | | $9.175 |

These models support thinking mode and non-thinking mode. Switch between the modes using the enable_thinking parameter. For these models, if you enable thinking mode but the model does not output a thought process, you are billed at the non-thinking mode rate.
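The mode switch described above can be sketched as follows. This is a minimal sketch assuming the OpenAI-compatible endpoint, where non-standard parameters such as enable_thinking are forwarded through the OpenAI SDK's extra_body argument; verify the parameter name and passthrough mechanism against the Thinking documentation for your SDK.

```python
# Minimal sketch: toggling thinking mode via the enable_thinking parameter.
# ASSUMPTION: with the OpenAI Python SDK, provider-specific parameters are
# passed through extra_body; model name here is illustrative.

def chat_kwargs(prompt: str, thinking: bool) -> dict:
    """Build chat-completion kwargs that switch thinking mode on or off."""
    return {
        "model": "qwen-plus",
        "messages": [{"role": "user", "content": prompt}],
        # Non-OpenAI parameters go in extra_body for the OpenAI SDK.
        "extra_body": {"enable_thinking": thinking},
    }

# Usage (requires a configured OpenAI-compatible client):
# client.chat.completions.create(**chat_kwargs("What is 2+2?", thinking=False))
```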

More models

Model

Version

Context window

Max input

Max output

Input cost

Output cost

(tokens)

(per 1M tokens)

qwen-plus-2025-01-25

Also known as qwen-plus-0125

Snapshot

131,072

129,024

8,192

$0.115

$0.287

qwen-plus-2025-01-12

Also known as qwen-plus-0112

qwen-plus-2024-12-20

Also known as qwen-plus-1220

China (Hong Kong)

In the China (Hong Kong) deployment mode, both the endpoint and data storage are located in China (Hong Kong). Model inference computing resources are also limited to China (Hong Kong).

Model

Version

Context window

Max input

Max output

Input cost

Output cost

(tokens)

(per 1M tokens)

qwen-plus

Currently qwen-plus-2025-12-01
Part of the Qwen3 series

Stable

1,000,000

Thinking

995,904

Non-thinking

997,952

32,768

Maximum chain-of-thought length: 81,920

Tiered pricing. See details below.

qwen-plus-2025-12-01

Part of the Qwen3 series

Snapshot

These models use tiered billing based on the number of input tokens per request.

Input tokens per request

Mode

Input cost (per 1M tokens)

Output cost (per 1M tokens)

0 < Tokens ≤ 256K

Non-thinking

$0.4

$1.2

Thinking

$4

256K < Tokens ≤ 1M

Non-thinking

$1.2

$3.6

Thinking

$12

EU

In the European Union deployment mode, endpoints and data storage are located in Germany (Frankfurt). Model inference compute resources are limited to the European Union.

Model

Version

Context window

Max input

Max output

Input cost

Output cost

(tokens)

(per 1M tokens)

qwen-plus

Currently qwen-plus-2025-12-01
Part of the Qwen3 series

Stable

1,000,000

Thinking

995,904

Non-thinking

997,952

32,768

Maximum chain-of-thought length: 81,920

Tiered pricing. See details below.

qwen-plus-2025-12-01

Part of the Qwen3 series

Snapshot

The above models use tiered billing based on the number of input tokens for each request.

Input tokens per request

Mode

Input cost (per 1M tokens)

Output cost (per 1M tokens)

0 < Tokens ≤ 256K

Non-thinking

$0.4

$1.2

Thinking

$4

256K < Tokens ≤ 1M

Non-thinking

$1.2

$3.6

Thinking

$12

Qwen-Flash

Qwen-Flash is the fastest and most cost-effective model in the Qwen series, designed for simple tasks. Its flexible tiered pricing makes billing more cost-effective than Qwen-Turbo. Usage | API reference | Try online | Thinking

International

In the International deployment mode, endpoints and data storage are both located in the Singapore region, and model inference compute resources are dynamically scheduled globally (excluding the Chinese Mainland).

Model

Version

Mode

Context window

Max input

Max CoT

Max output

Input cost

Output cost

CoT + output

Free quota

(Note)

(tokens)

(per 1M tokens)

qwen3.5-flash

Currently qwen3.5-flash-2026-02-23.
Thinking mode is enabled by default.

Stable

Thinking

1,000,000

983,616

81,920

65,536

$0.1

$0.4

1 million tokens each

Valid for 90 days after activating Model Studio

Non-thinking

991,808

-

qwen3.5-flash-2026-02-23

Thinking mode enabled by default

Snapshot

Thinking

983,616

81,920

Non-thinking

991,808

-

qwen-flash

Currently qwen-flash-2025-07-28.
Part of the Qwen3 series.
Batch calls at half price

Stable

Thinking

995,904

81,920

32,768

Tiered pricing. See details below.

Non-thinking

997,952

-

qwen-flash-2025-07-28

Part of the Qwen3 series

Snapshot

Thinking

995,904

81,920

Non-thinking

997,952

-

qwen-flash and qwen-flash-2025-07-28 tiered pricing

Input tokens per request

Input cost (per 1M tokens)

Output cost (per 1M tokens)

0 < Tokens ≤ 256K

$0.05

$0.4

256K < Tokens ≤ 1M

$0.25

$2
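To make the tier boundaries concrete, here is a small estimator for the qwen-flash International tiers above. It assumes (to be verified against the billing documentation) that every token in a request is billed at the rate of the tier its input-token count falls into, and treats 256K as 256,000 for illustration.

```python
# Illustrative cost estimator for qwen-flash tiered pricing (International).
# Assumption: the whole request is billed at the rate of the tier that its
# input-token count falls into; 256K is treated as 256,000 here.
TIERS = [  # (input-token ceiling, $ per 1M input, $ per 1M output)
    (256_000, 0.05, 0.4),
    (1_000_000, 0.25, 2.0),
]

def request_cost(input_tokens: int, output_tokens: int) -> float:
    for ceiling, in_price, out_price in TIERS:
        if input_tokens <= ceiling:
            return round((input_tokens * in_price + output_tokens * out_price) / 1_000_000, 6)
    raise ValueError("input exceeds the 1M-token limit")
```

For example, a 300K-token input lands in the second tier, so all of its tokens are billed at $0.25 per million input tokens.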

Global

In the Global deployment mode, endpoint and data storage are located in the US (Virginia) region or Germany (Frankfurt) region, and model inference computing resources are dynamically scheduled globally.

Model

Version

Mode

Context window

Max input

Max CoT

Max output

Input cost

Output cost

CoT + output

(tokens)

(per 1M tokens)

qwen3.5-flash

Currently qwen3.5-flash-2026-02-23
Thinking mode is enabled by default.

Stable

Thinking

1,000,000

983,616

81,920

65,536

Tiered pricing. See details below.

Non-thinking

991,808

-

qwen3.5-flash-2026-02-23

Thinking mode enabled by default.

Snapshot

Thinking

983,616

81,920

Non-thinking

991,808

-

qwen-flash

Currently qwen-flash-2025-07-28
Part of the Qwen3 series

Stable

Thinking

995,904

81,920

32,768

Non-thinking

997,952

-

qwen-flash-2025-07-28

Part of the Qwen3 series

Snapshot

Thinking

995,904

81,920

Non-thinking

997,952

-

The above models use tiered billing based on the number of input tokens in each request. The qwen-flash model supports context cache.

qwen3.5-flash, qwen3.5-flash-2026-02-23 tiered pricing

Input tokens per request

Input cost (per 1M tokens)

Output cost (per 1M tokens)

0 < Tokens ≤ 128K

$0.029

$0.287

128K < Tokens ≤ 256K

$0.115

$1.147

256K < Tokens ≤ 1M

$0.172

$1.72

qwen-flash, qwen-flash-2025-07-28 tiered pricing

Input tokens per request

Input cost (per 1M tokens)

Output cost (per 1M tokens)

0 < Tokens ≤ 128K

$0.022

$0.216

128K < Tokens ≤ 256K

$0.087

$0.861

256K < Tokens ≤ 1M

$0.173

$1.721

US

In the United States deployment mode, endpoints and data storage are both located in the US (Virginia) region. Model inference compute resources are limited to the United States.

Model

Version

Mode

Context window

Max input

Max CoT

Max output

Input cost

Output cost

CoT + output

Free quota

(Note)

(tokens)

(per 1M tokens)

qwen-flash-us

Always the latest snapshot
Part of the Qwen3 series

Stable

Thinking

1,000,000

995,904

81,920

32,768

Tiered pricing. See details below.

None

Non-thinking

997,952

-

qwen-flash-2025-07-28-us

Part of the Qwen3 series

Snapshot

Thinking

995,904

81,920

Non-thinking

997,952

-

The models above use tiered pricing based on the number of input tokens in each request.

Input tokens per request

Input cost (per 1M tokens)

Output cost (per 1M tokens)

0 < Tokens ≤ 256K

$0.05

$0.4

256K < Tokens ≤ 1M

$0.25

$2

Chinese Mainland

In the deployment mode for the Chinese Mainland, both the endpoint and data storage are located in the Beijing region. Model inference compute resources are limited to the Chinese Mainland.

Model

Version

Mode

Context window

Max input

Max CoT

Max output

Input cost

Output cost

CoT + output

(tokens)

(per 1M tokens)

qwen3.5-flash

Currently qwen3.5-flash-2026-02-23
Thinking mode enabled by default
Batch calls at half price

Stable

Thinking

1,000,000

983,616

81,920

65,536

Tiered pricing. See details below.

Non-thinking

991,808

-

qwen3.5-flash-2026-02-23

Thinking mode enabled by default

Snapshot

Thinking

983,616

81,920

Non-thinking

991,808

-

qwen-flash

Currently qwen-flash-2025-07-28
Part of the Qwen3 series
Batch calls at half price

Stable

Thinking

995,904

81,920

32,768

Non-thinking

997,952

-

qwen-flash-2025-07-28

Part of the Qwen3 series

Snapshot

Thinking

995,904

81,920

Non-thinking

997,952

-

The models above use tiered pricing based on the number of input tokens in each request. The qwen3.5-flash and qwen-flash models support context cache and batch calls.

Tiered pricing for qwen3.5-flash and qwen3.5-flash-2026-02-23

Input tokens per request

Input cost (per 1M tokens)

Output cost (per 1M tokens)

0 < Tokens ≤ 128K

$0.029

$0.287

128K < Tokens ≤ 256K

$0.115

$1.147

256K < Tokens ≤ 1M

$0.172

$1.72

Tiered pricing for qwen-flash and qwen-flash-2025-07-28

Input tokens per request

Input cost (per 1M tokens)

Output cost (per 1M tokens)

0 < Tokens ≤ 128K

$0.022

$0.216

128K < Tokens ≤ 256K

$0.087

$0.861

256K < Tokens ≤ 1M

$0.173

$1.721

China (Hong Kong)

In the China (Hong Kong) deployment mode, the endpoint and data storage are located in China (Hong Kong), and model inference compute resources are limited to China (Hong Kong).

Model

Version

Mode

Context window

Max input

Max CoT

Max output

Input cost

Output cost

CoT + output

(tokens)

(per 1M tokens)

qwen3.5-flash

Currently qwen3.5-flash-2026-02-23.
Thinking mode is enabled by default.

Stable

Thinking

1,000,000

983,616

81,920

65,536

$0.1

$0.4

Non-thinking

991,808

-

qwen3.5-flash-2026-02-23

Thinking mode is enabled by default.

Snapshot

Thinking

983,616

81,920

Non-thinking

991,808

-

EU

In the European Union deployment mode, both the endpoint and data storage are located in Germany (Frankfurt). Model inference compute resources are restricted to within the European Union.

Model

Version

Mode

Context window

Max input

Max CoT

Max output

Input cost

Output cost

CoT + output

(tokens)

(per 1M tokens)

qwen3.5-flash

Currently qwen3.5-flash-2026-02-23
Thinking mode enabled by default

Stable

Thinking

1,000,000

983,616

81,920

65,536

$0.1

$0.4

Non-thinking

991,808

-

qwen3.5-flash-2026-02-23

Thinking mode enabled by default

Snapshot

Thinking

983,616

81,920

Non-thinking

991,808

-

Qwen-Turbo

Qwen-Turbo is no longer updated. Use Qwen-Flash instead, whose flexible tiered pricing makes billing fairer. Usage | API reference | Try online | Deep thinking

International

In international deployment mode, endpoints and data storage are in the Singapore region. Model inference compute resources are dynamically scheduled worldwide, excluding the Chinese Mainland.

Model

Version

Context window

Max input

Max output

Input cost

Output cost

Free quota

(Note)

(tokens)

(per 1M tokens)

qwen-turbo

Currently qwen-turbo-2025-04-28
Part of the Qwen3 series
Batch calls at half price

Stable

Thinking

131,072

Non-thinking

1,000,000

Thinking

98,304

Non-thinking

1,000,000

16,384

Maximum chain-of-thought length: 38,912

$0.05

Thinking: $0.5

Non-thinking: $0.2

1 million tokens each

Valid for 90 days after activating Model Studio

qwen-turbo-latest

Always matches the latest snapshot version
Part of the Qwen3 series

Latest

$0.05

Thinking: $0.5

Non-thinking: $0.2

qwen-turbo-2025-04-28

Also known as qwen-turbo-0428
Part of the Qwen3 series

Snapshot

qwen-turbo-2024-11-01

Also known as qwen-turbo-1101

1,000,000

1,000,000

8,192

$0.2

Chinese Mainland

In Chinese Mainland deployment mode, endpoints and data storage are in the Beijing region. Model inference compute resources are limited to the Chinese Mainland.

Model

Version

Context window

Max input

Max output

Input cost

Output cost

(tokens)

(per 1M tokens)

qwen-turbo

Currently qwen-turbo-2025-04-28
Part of the Qwen3 series

Stable

Thinking

131,072

Non-thinking

1,000,000

Thinking

98,304

Non-thinking

1,000,000

16,384

Maximum chain-of-thought length: 38,912

$0.044

Thinking

$0.431

Non-thinking

$0.087

qwen-turbo-latest

Always matches the latest snapshot version
Part of the Qwen3 series

Latest

qwen-turbo-2025-07-15

Also known as qwen-turbo-0715
Part of the Qwen3 series

Snapshot

qwen-turbo-2025-04-28

Also known as qwen-turbo-0428
Part of the Qwen3 series

QwQ

QwQ is a reasoning model trained on Qwen2.5 that uses reinforcement learning to significantly improve reasoning capability. Its core math and code metrics (AIME 24/25, LiveCodeBench) and general metrics (IFEval, LiveBench) are on par with the full-strength DeepSeek-R1. Usage
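As a sketch of how the chain of thought is consumed in practice: the call below assumes QwQ is served through the OpenAI-compatible endpoint in streaming mode and that the thought process arrives as `delta.reasoning_content`; both assumptions should be checked against the Usage guide.

```python
# Sketch: calling qwq-plus and separating the chain of thought from the
# final answer. Assumptions to verify in the Usage guide: QwQ is
# stream-only, and thoughts arrive as delta.reasoning_content.
import os

def collect(parts):
    """parts: iterable of (reasoning_fragment, answer_fragment) tuples."""
    thinking = "".join(r for r, _ in parts if r)
    answer = "".join(a for _, a in parts if a)
    return thinking, answer

def main() -> None:
    from openai import OpenAI  # pip install openai
    client = OpenAI(api_key=os.environ["DASHSCOPE_API_KEY"],
                    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1")
    stream = client.chat.completions.create(
        model="qwq-plus",
        messages=[{"role": "user", "content": "What is 17 * 24?"}],
        stream=True,
    )
    parts = [(getattr(c.choices[0].delta, "reasoning_content", None),
              c.choices[0].delta.content) for c in stream]
    thinking, answer = collect(parts)
    print(answer)

# main()  # uncomment and set DASHSCOPE_API_KEY to send a request
```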

International

In the international deployment mode, endpoints and data storage are located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide, excluding the Chinese Mainland.

Model

Version

Context window

Max input

Max CoT

Max output

Input cost

Output cost

Free quota

(Note)

(tokens)

(per 1M tokens)

qwq-plus

Stable

131,072

98,304

32,768

8,192

$0.8

$2.4

1 million tokens

Valid for 90 days after activating Model Studio

Chinese Mainland

In the Chinese Mainland deployment mode, endpoints and data storage are located in the Beijing region, and model inference compute resources are limited to the Chinese Mainland.

Model

Version

Context window

Max input

Max CoT

Max output

Input cost

Output cost

(tokens)

(per 1M tokens)

qwq-plus

Currently qwq-plus-2025-03-05.
Batch calls at half price

Stable

131,072

98,304

32,768

8,192

$0.230

$0.574

qwq-plus-latest

Always the latest snapshot.

Latest

qwq-plus-2025-03-05

Also known as qwq-plus-0305.

Snapshot

Qwen-Long

Qwen-Long is the longest-context model in the Qwen series. It offers balanced capabilities at lower cost, making it ideal for long-text analysis, information extraction, summarization, and classification tasks. Usage | Try online

Note

Only the Chinese Mainland deployment mode is supported. The endpoint and data storage are located in the Beijing region, and model inference computing resources are restricted to the Chinese Mainland.

Model

Version

Context window

Max input

Max output

Input cost

Output cost

(tokens)

(per 1M tokens)

qwen-long-latest

Always matches the latest snapshot version
Batch calls at half price

Stable

10,000,000

10,000,000

32,768

$0.072

$0.287

qwen-long-2025-01-25

Also known as qwen-long-0125

Snapshot
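The long-document workflow can be sketched as follows. The `purpose="file-extract"` value, the `fileid://` message convention, and the Beijing-region base URL are assumptions taken from the Usage guide's general pattern and should be confirmed there; "report.pdf" is a placeholder.

```python
# Sketch of the qwen-long long-document pattern: upload a file, then
# reference it by ID in a system message. purpose="file-extract" and the
# fileid:// convention are assumptions to confirm in the Usage guide.
import os

def file_ref_message(file_id: str) -> dict:
    """System message pointing qwen-long at an uploaded document."""
    return {"role": "system", "content": f"fileid://{file_id}"}

def main() -> None:
    from openai import OpenAI  # pip install openai
    client = OpenAI(api_key=os.environ["DASHSCOPE_API_KEY"],
                    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1")  # Beijing
    with open("report.pdf", "rb") as f:  # placeholder file name
        uploaded = client.files.create(file=f, purpose="file-extract")
    resp = client.chat.completions.create(
        model="qwen-long-latest",
        messages=[file_ref_message(uploaded.id),
                  {"role": "user", "content": "Summarize the key findings."}],
    )
    print(resp.choices[0].message.content)

# main()  # uncomment and set DASHSCOPE_API_KEY to send a request
```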

Qwen-Omni

The Qwen-Omni model accepts multimodal inputs, including text, images, audio, and video, and generates responses in text or speech form. It offers multiple expressive, human-like voices and supports multilingual and dialect speech output. You can use it in audio-video chat scenarios such as visual recognition, emotion perception, and education and training. Usage | API reference
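As a sketch of requesting speech output: the `modalities` and `audio` parameters, the `stream=True` requirement, and the voice name "Cherry" are all assumptions to verify against the API reference.

```python
# Sketch: asking a Qwen-Omni model for a spoken reply. The modalities and
# audio parameters, the stream=True requirement, and the voice name are
# assumptions to verify against the API reference.
import base64
import os

def build_request(text: str) -> dict:
    return {
        "model": "qwen3-omni-flash",
        "messages": [{"role": "user", "content": text}],
        "modalities": ["text", "audio"],                # text plus speech output
        "audio": {"voice": "Cherry", "format": "wav"},  # illustrative voice name
        "stream": True,
    }

def main() -> None:
    from openai import OpenAI  # pip install openai
    client = OpenAI(api_key=os.environ["DASHSCOPE_API_KEY"],
                    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1")
    audio = bytearray()
    for chunk in client.chat.completions.create(**build_request("Say hello.")):
        delta = chunk.choices[0].delta
        part = getattr(delta, "audio", None)  # assumed: base64-encoded fragments
        if part and part.get("data"):
            audio.extend(base64.b64decode(part["data"]))
    with open("reply.wav", "wb") as f:
        f.write(bytes(audio))

# main()  # uncomment and set DASHSCOPE_API_KEY to send a request
```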

International

In the International deployment mode, endpoint and data storage are located in the Singapore region, while model inference computing resources are dynamically scheduled globally (excluding Chinese Mainland).

Model

Version

Mode

Context window

Max input

Max CoT

Max output

Input cost

Output cost

Free quota

Note

(tokens)

qwen3.5-omni-plus

Currently qwen3.5-omni-plus-2026-03-15

Stable

Non-thinking

262,144

196,608

-

65,536

In preview. Model invocation is temporarily free, excluding tool calling fees.

1 million tokens each (all modalities)

Valid for 90 days after activating Model Studio

qwen3.5-omni-plus-2026-03-15

Snapshot

Non-thinking

262,144

196,608

-

65,536

qwen3.5-omni-flash

Currently qwen3.5-omni-flash-2026-03-15

Stable

Non-thinking

262,144

196,608

-

65,536

qwen3.5-omni-flash-2026-03-15

Snapshot

Non-thinking

262,144

196,608

-

65,536

qwen3-omni-flash

Currently qwen3-omni-flash-2025-12-01

Stable

Thinking

65,536

16,384

32,768

16,384

See pricing details below.

Non-thinking

49,152

-

qwen3-omni-flash-2025-12-01

Snapshot

Thinking

65,536

16,384

32,768

16,384

Non-thinking

49,152

-

qwen3-omni-flash-2025-09-15

Also known as qwen3-omni-flash-0915

Snapshot

Thinking

65,536

16,384

32,768

16,384

Non-thinking

49,152

-

Qwen3-Omni-Flash

Input billing item

(per 1M tokens)

Text

$0.43

Audio

$3.81

Image/Video

$0.78

Output billing item

(per 1M tokens)

Text

$1.66 (for text-only input)

$3.06 (when input contains images, video, or audio)

Text + Audio

Not billed in thinking mode.

$15.11 (audio)

Text output is not billed.

More models

Model

Version

Context window

Max input

Max output

Free quota

Note

(tokens)

qwen-omni-turbo

Currently qwen-omni-turbo-2025-03-26

Stable

32,768

30,720

2,048

1 million tokens each (all modalities)

Valid for 90 days after activating Model Studio

qwen-omni-turbo-latest

Always matches the latest snapshot in capabilities

Latest

qwen-omni-turbo-2025-03-26

Also known as qwen-omni-turbo-0326

Snapshot

After the free quota of the commercial model is used up, billing rules for input and output are as follows:

Input billing item

(per 1M tokens)

Text

$0.07

Audio

$4.44

Image/Video

$0.21

Output billing item

(per 1M tokens)

Text

$0.27 (for text-only input)

$0.63 (when input contains images, video, or audio)

Text + Audio

$8.89 (audio)

Text output is not billed.

Chinese Mainland

In the Chinese Mainland deployment mode, endpoint and data storage are located in the Beijing region, and model inference computing resources are limited to Chinese Mainland.

Model

Version

Mode

Context window

Max input

Max CoT

Max output

Input cost

Output cost

Free quota

Note

(tokens)

qwen3.5-omni-plus

Currently qwen3.5-omni-plus-2026-03-15

Stable

Non-thinking

262,144

196,608

-

65,536

In preview. Model invocation is temporarily free, excluding tool calling fees.

No free quota

qwen3.5-omni-plus-2026-03-15

Snapshot

Non-thinking

262,144

196,608

-

65,536

qwen3.5-omni-flash

Currently qwen3.5-omni-flash-2026-03-15

Stable

Non-thinking

262,144

196,608

-

65,536

qwen3.5-omni-flash-2026-03-15

Snapshot

Non-thinking

262,144

196,608

-

65,536

qwen3-omni-flash

Currently qwen3-omni-flash-2025-12-01

Stable

Thinking

65,536

16,384

32,768

16,384

See pricing details below.

Non-thinking

49,152

-

qwen3-omni-flash-2025-12-01

Snapshot

Thinking

65,536

16,384

32,768

16,384

Non-thinking

49,152

-

qwen3-omni-flash-2025-09-15

Also known as qwen3-omni-flash-0915

Snapshot

Thinking

65,536

16,384

32,768

16,384

Non-thinking

49,152

-

Qwen3-Omni-Flash

Input billing item

(per 1M tokens)

Text

$0.258

Audio

$2.265

Image/Video

$0.473

Output billing item

(per 1M tokens)

Text

$0.989 (for text-only input)

$1.821 (when input contains images, video, or audio)

Text + Audio

Not billed in thinking mode.

$8.974 (audio)

Text output is not billed.

More models

Model

Version

Context window

Max input

Max output

Free quota

Note

(tokens)

qwen-omni-turbo

Currently qwen-omni-turbo-2025-03-26

Stable

32,768

30,720

2,048

No free quota

qwen-omni-turbo-latest

Always matches the latest snapshot in capabilities

Latest

qwen-omni-turbo-2025-03-26

Also known as qwen-omni-turbo-0326

Snapshot

qwen-omni-turbo-2025-01-19

Also known as qwen-omni-turbo-0119

Billing rules for input and output are as follows:

Input billing item

(per 1M tokens)

Text

$0.058

Audio

$3.584

Image/Video

$0.216

Output billing item

(per 1M tokens)

Text

$0.230 (for text-only input)

$0.646 (when input contains images, audio, or video)

Text + Audio

$7.168 (audio)

Text output is not billed.

Billing example: a request includes 1,000 tokens of text input and 1,000 tokens of image input, and produces 1,000 tokens of text output and 1,000 tokens of audio output. The total cost is $0.000058 (text input) + $0.000216 (image input) + $0.007168 (audio output) = $0.007442; the text output is not billed.
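The arithmetic in the example can be reproduced from the per-1M-token rates in the tables above (a sanity check only, not billing code):

```python
# Sanity check of the qwen-omni-turbo billing example, using the
# Chinese Mainland per-1M-token rates listed above. The audio output
# covers the text output, which is not billed separately.
RATES = {"text_in": 0.058, "image_in": 0.216, "audio_out": 7.168}  # $ per 1M tokens

def example_cost(tokens_per_item: int = 1_000) -> float:
    return round(sum(tokens_per_item * rate for rate in RATES.values()) / 1_000_000, 6)
```

With 1,000 tokens for each billed item, this gives $0.007442, matching the example.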

Qwen-Omni-Realtime

Compared to Qwen-Omni, Qwen-Omni-Realtime supports streaming audio input and includes built-in VAD (Voice Activity Detection) to automatically detect speech start and end points. Usage | Client-side events | Server-side events

International

In the international deployment mode, endpoints and data storage are both located in the Singapore region. Model inference compute resources are dynamically scheduled globally (excluding the Chinese Mainland).

Model

Version

Context window

Max input

Max output

Free quota

Note

(tokens)

qwen3-omni-flash-realtime

Currently qwen3-omni-flash-realtime-2025-12-01

Stable

65,536

49,152

16,384

1 million tokens each, regardless of modality

Valid for 90 days after activating Model Studio

qwen3-omni-flash-realtime-2025-12-01

Snapshot

qwen3-omni-flash-realtime-2025-09-15

After your free quota is used up, billing rules for input and output are as follows:

Input billing item

(per 1M tokens)

Text

$0.52

Audio

$4.57

Image/Video

$0.94

Output billing item

(per 1M tokens)

Text

$1.99 (when input contains text only)

$3.67 (when input contains images, video, or audio)

Text + Audio

$18.13 (for the audio output)

Text output is not billed.

More models

Model

Version

Context window

Max input

Max output

Free quota

Note

(tokens)

qwen-omni-turbo-realtime

Currently qwen-omni-turbo-realtime-2025-05-08

Stable

32,768

30,720

2,048

1 million tokens each, regardless of modality

Valid for 90 days after activating Model Studio

qwen-omni-turbo-realtime-latest

Always the latest snapshot

Latest

qwen-omni-turbo-realtime-2025-05-08

Snapshot

After your free quota is used up, billing rules for input and output are as follows:

Input billing item

(per 1M tokens)

Text

$0.270

Audio

$4.440

Image/Video

$0.840

Output billing item

(per 1M tokens)

Text

$1.070 (when input contains text only)

$2.520 (when input contains images, video, or audio)

Text + Audio

$8.890 (audio)

Text output is not billed.

Chinese Mainland

In the Chinese Mainland deployment mode, endpoints and data storage are both located in the Beijing region. Model inference compute resources are limited to the Chinese Mainland.

Model

Version

Context window

Max input

Max output

Free quota

Note

(tokens)

qwen3-omni-flash-realtime

Currently qwen3-omni-flash-realtime-2025-12-01

Stable

65,536

49,152

16,384

No free quota

qwen3-omni-flash-realtime-2025-12-01

Snapshot

qwen3-omni-flash-realtime-2025-09-15

Billing rules for input and output are as follows:

Input billing item

(per 1M tokens)

Text input

$0.315

Audio input

$2.709

Image input

$0.559

Output billing item

(per 1M tokens)

Text

$1.19 (for text-only input)

$2.179 (when input contains images, video, or audio)

Text + Audio output

$10.766 (for the audio component)

Text output is not billed.

More models

Model

Version

Context window

Max input

Max output

Free quota

Note

(tokens)

qwen-omni-turbo-realtime

Currently qwen-omni-turbo-realtime-2025-05-08.

Stable

32,768

30,720

2,048

No free quota

qwen-omni-turbo-realtime-latest

Always the latest snapshot

Latest

qwen-omni-turbo-realtime-2025-05-08

Snapshot

Input and output billing rules are as follows:

Input billing item

(per 1M tokens)

Text input

$0.230

Audio input

$3.584

Image input

$0.861

Output billing item

(per 1M tokens)

Text

$0.918 (for text-only input)

$2.581 (for image/audio input)

Text + Audio output

$7.168 (audio)

Text output is not billed.

QVQ

QVQ is a visual reasoning model that supports visual inputs and chain-of-thought outputs. It shows stronger performance on math, programming, visual analysis, creative tasks, and general-purpose tasks. Usage | Try online

International

In the international deployment mode, endpoints and data storage are both located in the Singapore region. Model inference compute resources are dynamically scheduled globally (excluding the Chinese Mainland).

Model

Version

Context window

Max input

Max CoT

Max output

Input cost

Output cost

Free quota

Note

(tokens)

(per 1M tokens)

qvq-max

Currently qvq-max-2025-03-25

Stable

131,072

106,496

Max per image: 16,384

16,384

8,192

$1.2

$4.8

1 million tokens each

Valid for 90 days after activating Model Studio

qvq-max-latest

Always matches the latest snapshot version

Latest

qvq-max-2025-03-25

Also known as qvq-max-0325

Snapshot

Chinese Mainland

In the Chinese Mainland deployment mode, endpoints and data storage are both located in the Beijing region. Model inference compute resources are limited to the Chinese Mainland.

Model

Version

Context window

Max input

Max CoT

Max output

Input cost

Output cost

(tokens)

(per 1M tokens)

qvq-max

Compared to qvq-plus, qvq-max provides stronger visual reasoning and instruction-following capabilities, delivering the best performance on more complex tasks.
Currently qvq-max-2025-03-25

Stable

131,072

106,496

Max per image: 16,384

16,384

8,192

$1.147

$4.588

qvq-max-latest

Always matches the latest snapshot version

Latest

qvq-max-2025-05-15

Also known as qvq-max-0515

Snapshot

qvq-max-2025-03-25

Also known as qvq-max-0325

qvq-plus

Currently qvq-plus-2025-05-15

Stable

$0.287

$0.717

qvq-plus-latest

Always matches the latest snapshot version

Latest

qvq-plus-2025-05-15

Also known as qvq-plus-0515

Snapshot

Qwen-VL

Qwen-VL is a text generation model with visual (image) understanding capabilities. It can perform OCR (Optical Character Recognition) and then summarize and reason over the results. For example, it can extract attributes from product photos and solve problems based on exercise diagrams. Usage | API reference | Try online

Qwen-VL models are billed based on the total number of input and output tokens. For rules on calculating image tokens, see Image and Video Understanding.
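A minimal image-understanding call might look like the following. This is a sketch: the OpenAI-compatible content-part format is assumed, and the image URL is a placeholder.

```python
# Sketch: a Qwen-VL request that sends an image URL plus a question.
# Billing covers both text and image tokens; the URL is a placeholder.
import os

def vision_message(image_url: str, question: str) -> dict:
    return {"role": "user", "content": [
        {"type": "image_url", "image_url": {"url": image_url}},
        {"type": "text", "text": question},
    ]}

def main() -> None:
    from openai import OpenAI  # pip install openai
    client = OpenAI(api_key=os.environ["DASHSCOPE_API_KEY"],
                    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1")
    resp = client.chat.completions.create(
        model="qwen3-vl-flash",
        messages=[vision_message("https://example.com/product.jpg",
                                 "List the product attributes shown in this photo.")],
    )
    print(resp.choices[0].message.content)

# main()  # uncomment and set DASHSCOPE_API_KEY to send a request
```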

International

In the International deployment mode, endpoints and data storage are located in the Singapore region. Model inference compute resources are dynamically scheduled globally (excluding the Chinese Mainland).

Model

Version

Mode

Context window

Max input

Max CoT

Max output

Input cost

Output cost

CoT + output

Free quota

(Note)

(tokens)

(per 1M tokens)

qwen3-vl-plus

Currently qwen3-vl-plus-2025-12-19.

Stable

Thinking

262,144

258,048

Max per image: 16,384

81,920

32,768

Tiered pricing. See details below.

1 million tokens

Valid for 90 days after activating Model Studio

Non-thinking

260,096

Max per image: 16,384

-

qwen3-vl-plus-2025-12-19

Snapshot

Thinking

258,048

Max per image: 16,384

81,920

Non-thinking

260,096

Max per image: 16,384

-

qwen3-vl-plus-2025-09-23

Snapshot

Thinking

258,048

Max per image: 16,384

81,920

Non-thinking

260,096

Max per image: 16,384

-

qwen3-vl-flash

Currently qwen3-vl-flash-2025-10-15.

Stable

Thinking

258,048

Max per image: 16,384

81,920

Non-thinking

260,096

Max per image: 16,384

-

qwen3-vl-flash-2026-01-22

Snapshot

Thinking

258,048

Max per image: 16,384

81,920

Non-thinking

260,096

Max per image: 16,384

-

qwen3-vl-flash-2025-10-15

Snapshot

Thinking

258,048

Max per image: 16,384

81,920

Non-thinking

260,096

Max per image: 16,384

-

The above models use tiered pricing based on the number of input tokens per request. Input and output prices are the same for thinking and non-thinking modes. The qwen3-vl-plus and qwen3-vl-flash models support context cache.

qwen3-vl-plus Series

Input tokens per request

Input cost (per 1M tokens)

Output cost (per 1M tokens)

0 < Tokens ≤ 32K

$0.2

$1.6

32K < Tokens ≤ 128K

$0.30

$2.4

128K < Tokens ≤ 256K

$0.6

$4.8

qwen3-vl-flash Series

Input tokens per request

Input cost (per 1M tokens)

Output cost (per 1M tokens)

0 < Tokens ≤ 32K

$0.05

$0.4

32K < Tokens ≤ 128K

$0.075

$0.6

128K < Tokens ≤ 256K

$0.12

$0.96

More models

Qwen-VL-Max
The following models are in the Qwen2.5-VL series. The qwen-vl-max model supports context cache.

Model

Version

Context window

Max input

Max output

Input cost

Output cost

Free quota

(Note)

(tokens)

(per 1M tokens)

qwen-vl-max

Compared to qwen-vl-plus, it offers stronger visual reasoning and instruction following and delivers the best performance on more complex tasks.
Currently qwen-vl-max-2025-08-13.

Stable

131,072

129,024

Max per image: 16,384

8,192

$0.8

$3.2

1 million tokens each.

Valid for 90 days after activating Model Studio

qwen-vl-max-latest

Always the latest snapshot.

Latest

$0.8

$3.2

qwen-vl-max-2025-08-13

Also known as qwen-vl-max-0813.
Improves all visual understanding metrics. Significantly enhances math, reasoning, object detection, and multilingual processing.

Snapshot

qwen-vl-max-2025-04-08

Also known as qwen-vl-max-0408.
Belongs to the Qwen2.5-VL series; extends the context to 128K and significantly enhances math and reasoning capabilities.

Qwen-VL-Plus
The following models are part of the Qwen2.5-VL series. The qwen-vl-plus model supports context cache.

Model

Version

Context window

Max input

Max output

Input cost

Output cost

Free quota

(Note)

(tokens)

(per 1M tokens)

qwen-vl-plus

Currently qwen-vl-plus-2025-08-15.

Stable

131,072

129,024

Max per image: 16,384

8,192

$0.21

$0.63

1 million tokens

Valid for 90 days after activating Model Studio

qwen-vl-plus-latest

Always the latest snapshot.

Latest

$0.21

$0.63

qwen-vl-plus-2025-08-15

Also known as qwen-vl-plus-0815.
Significantly improves object detection, localization, and multilingual processing.

Snapshot

qwen-vl-plus-2025-05-07

Also known as qwen-vl-plus-0507.
Significantly improves math, reasoning, and surveillance-video understanding.

qwen-vl-plus-2025-01-25

Also known as qwen-vl-plus-0125.
Belongs to the Qwen2.5-VL series; extends the context to 128K and significantly enhances image and video understanding.

Global

In the Global deployment mode, endpoint and data storage are located in the US (Virginia) region or Germany (Frankfurt) region, and model inference computing resources are dynamically scheduled globally.

Model

Version

Mode

Context window

Max input

Max CoT

Max output

Input cost

Output cost

CoT + output

(tokens)

(per 1M tokens)

qwen3-vl-plus

Currently qwen3-vl-plus-2025-12-19.

Stable

Thinking

262,144

258,048

Max per image: 16,384

81,920

32,768

Tiered pricing. See details below.

Non-thinking

260,096

Max per image: 16,384

-

qwen3-vl-plus-2025-09-23

Snapshot

Thinking

258,048

Max per image: 16,384

81,920

Non-thinking

260,096

Max per image: 16,384

-

qwen3-vl-flash

Currently qwen3-vl-flash-2025-10-15.

Stable

Thinking

258,048

Max per image: 16,384

81,920

Non-thinking

260,096

Max per image: 16,384

-

qwen3-vl-flash-2025-10-15

Snapshot

Thinking

258,048

Max per image: 16,384

81,920

Non-thinking

260,096

Max per image: 16,384

-

The above models use tiered pricing based on the number of input tokens per request. Input and output prices are the same for thinking and non-thinking modes. The qwen3-vl-plus and qwen3-vl-flash models support context cache.

qwen3-vl-plus Series

Input tokens per request

Input cost (per 1M tokens)

Output cost (per 1M tokens)

0 < Tokens ≤ 32K

$0.143

$1.434

32K < Tokens ≤ 128K

$0.215

$2.15

128K < Tokens ≤ 256K

$0.43

$4.301

qwen3-vl-flash Series

Input tokens per request

Input cost (per 1M tokens)

Output cost (per 1M tokens)

0 < Tokens ≤ 32K

$0.022

$0.215

32K < Tokens ≤ 128K

$0.043

$0.43

128K < Tokens ≤ 256K

$0.086

$0.859

US

In the United States deployment mode, endpoints and data storage are both located in the US (Virginia) region. Model inference compute resources are limited to the United States.

Model

Version

Mode

Context window

Max input

Max CoT

Max output

Input cost

Output cost

CoT + output

(tokens)

(per 1M tokens)

qwen3-vl-flash-us

Currently qwen3-vl-flash-2025-10-15-us.

Stable

Thinking

262,144

258,048

Max per image: 16,384

81,920

32,768

Tiered pricing. See details below.

Non-thinking

260,096

Max per image: 16,384

-

qwen3-vl-flash-2026-01-22-us

Snapshot

Thinking

258,048

Max per image: 16,384

81,920

Non-thinking

260,096

Max per image: 16,384

-

qwen3-vl-flash-2025-10-15-us

Snapshot

Thinking

258,048

Max per image: 16,384

81,920

Non-thinking

260,096

Max per image: 16,384

-

The models above use tiered pricing based on the number of input tokens for each request. Input and output prices are the same for thinking mode and non-thinking mode. qwen3-vl-flash-us supports context cache.

Input tokens per request

Input cost (per 1M tokens)

Output cost (per 1M tokens)

0 < Tokens ≤ 32K

$0.05

$0.4

32K < Tokens ≤ 128K

$0.075

$0.6

128K < Tokens ≤ 256K

$0.12

$0.96

Chinese Mainland

In the Chinese Mainland deployment mode, endpoints and data storage are located in the Beijing region. Model inference compute resources are available only in the Chinese Mainland.

Model

Version

Mode

Context window

Max input

Max CoT

Max output

Input cost

Output cost

Free quota

(Note)

(tokens)

(per 1M tokens)

qwen3-vl-plus

Currently qwen3-vl-plus-2025-12-19.
Batch calls at half price

Stable

Thinking

262,144

258,048

Max per image: 16,384

81,920

32,768

Tiered pricing. See details below.

No free quota

Non-thinking

260,096

Max per image: 16,384

-

qwen3-vl-plus-2025-12-19

Snapshot

Thinking

258,048

Max per image: 16,384

81,920

Non-thinking

260,096

Max per image: 16,384

-

qwen3-vl-plus-2025-09-23

Snapshot

Thinking

258,048

Max per image: 16,384

81,920

Non-thinking

260,096

Max per image: 16,384

-

qwen3-vl-flash

Currently qwen3-vl-flash-2025-10-15.
Batch calls at half price

Stable

Thinking

258,048

Max per image: 16,384

81,920

Non-thinking

260,096

Max per image: 16,384

-

qwen3-vl-flash-2026-01-22

Snapshot

Thinking

258,048

Max per image: 16,384

81,920

Non-thinking

260,096

Max per image: 16,384

-

qwen3-vl-flash-2025-10-15

Snapshot

Thinking

258,048

Max per image: 16,384

81,920

Non-thinking

260,096

Max per image: 16,384

-

All models above use tiered pricing based on the number of tokens in your request. Input and output costs are identical for thinking and non-thinking modes. The qwen3-vl-plus and qwen3-vl-flash models support context cache.

qwen3-vl-plus series

Input tokens per request

Input cost (per 1M tokens)

Output cost (per 1M tokens)

0 < Tokens ≤ 32K

$0.143

$1.434

32K < Tokens ≤ 128K

$0.215

$2.15

128K < Tokens ≤ 256K

$0.43

$4.301

qwen3-vl-flash series

Input tokens per request

Input cost (per 1M tokens)

Output cost (per 1M tokens)

0 < Tokens ≤ 32K

$0.022

$0.215

32K < Tokens ≤ 128K

$0.043

$0.43

128K < Tokens ≤ 256K

$0.086

$0.859

More models

Qwen-VL-Max series
The qwen-vl-max-2025-01-25 model and later updates belong to the Qwen2.5-VL series. The qwen-vl-max model supports context cache.

Model

Version

Context window

Max input

Max output

Input cost

Output cost

(tokens)

(per 1M tokens)

qwen-vl-max

Offers improved visual reasoning and instruction following over qwen-vl-plus. Delivers the best performance on more complex tasks.
Currently qwen-vl-max-2025-08-13.
Batch calls at half price

Stable

131,072

129,024

Max per image: 16,384

8,192

$0.23

$0.574

qwen-vl-max-latest

Always the latest snapshot.
Batch calls at half price

Latest

qwen-vl-max-2025-08-13

Also known as qwen-vl-max-0813
Improves all visual understanding metrics. Significantly boosts performance in math, reasoning, object detection, and multilingual processing.

Snapshot

qwen-vl-max-2025-04-08

Also known as qwen-vl-max-0408
Enhances math and reasoning capabilities.

$0.431

$1.291

qwen-vl-max-2025-04-02

Also known as qwen-vl-max-0402
Significantly improves accuracy on complex math problems.

qwen-vl-max-2025-01-25

Also known as qwen-vl-max-0125

Upgraded to the Qwen2.5-VL series. Extends the context length to 128K and improves image and video understanding.

qwen-vl-max-2024-12-30

Also known as qwen-vl-max-1230

32,768

30,720

Max per image: 16,384

2,048

$0.431

$1.291

qwen-vl-max-2024-11-19

Also known as qwen-vl-max-1119

Qwen-VL-Plus series
The qwen-vl-plus-2025-01-25 model and later versions belong to the Qwen2.5-VL series. The qwen-vl-plus model supports context cache.

Model

Version

Context window

Max input

Max output

Input cost

Output cost

(tokens)

(per 1M tokens)

qwen-vl-plus

Currently qwen-vl-plus-2025-08-15.
Batch calls at half price

Stable

131,072

129,024

Max per image: 16,384

8,192

$0.115

$0.287

qwen-vl-plus-latest

Always the latest snapshot.
Batch calls at half price

Latest

qwen-vl-plus-2025-08-15

Also known as qwen-vl-plus-0815.
Improved object detection, localization, and multilingual processing.

Snapshot

qwen-vl-plus-2025-07-10

Also known as qwen-vl-plus-0710.
Further improved understanding of surveillance video content.

32,768

30,720

Max per image: 16,384

$0.022

$0.216

qwen-vl-plus-2025-05-07

Also known as qwen-vl-plus-0507.
Significantly improved math, reasoning, and surveillance video understanding.

131,072

129,024

Max per image: 16,384

$0.216

$0.646

qwen-vl-plus-2025-01-25

Also known as qwen-vl-plus-0125.

Upgraded to the Qwen2.5-VL series. Extended context length to 128K. Improved image and video understanding.

qwen-vl-plus-2025-01-02

Also known as qwen-vl-plus-0102.

32,768

30,720

Max per image: 16,384

2,048

Hong Kong (China)

In Hong Kong (China) deployment mode, both the endpoint and data storage are located in Hong Kong (China). Model inference uses compute resources only in Hong Kong (China).

Model

Version

Mode

Context window

Max input

Max CoT

Max output

Input cost

Output cost

CoT + output

(tokens)

(per 1M tokens)

qwen3-vl-plus

Currently qwen3-vl-plus-2025-12-19

Stable

Thinking

262,144

258,048

Max per image: 16,384

81,920

32,768

Tiered pricing. See details below.

Non-thinking

260,096

Max per image: 16,384

-

qwen3-vl-plus-2025-12-19

Snapshot

Thinking

258,048

Max per image: 16,384

81,920

Non-thinking

260,096

Max per image: 16,384

-

These models use tiered billing based on the number of input tokens in each request. Input and output costs are the same for thinking and non-thinking modes.

Input tokens per request

Input cost (per 1M tokens)

Output cost (per 1M tokens)

0 < Tokens ≤ 32K

$0.20

$1.60

32K < Tokens ≤ 128K

$0.30

$2.40

128K < Tokens ≤ 256K

$0.60

$4.80

EU

In the EU deployment mode, both the endpoint and data storage are located in Germany (Frankfurt). Model inference compute resources are restricted to the European Union.

Model

Version

Mode

Context window

Max input

Max CoT

Max output

Input cost

Output cost

CoT + output

(tokens)

(per 1M tokens)

qwen3-vl-plus

Stable

Thinking

262,144

258,048

Max per image: 16,384

81,920

32,768

Tiered pricing. See details below.

Non-thinking

260,096

Max per image: 16,384

-

qwen3-vl-flash

Currently qwen3-vl-flash-2025-10-15

Stable

Thinking

258,048

Max per image: 16,384

81,920

Non-thinking

260,096

Max per image: 16,384

-

qwen3-vl-flash-2025-10-15

Snapshot

Thinking

258,048

Max per image: 16,384

81,920

Non-thinking

260,096

Max per image: 16,384

-

The models above use tiered pricing based on the number of input tokens in each request. Input and output costs are identical for thinking and non-thinking modes.

Input tokens per request

Input cost (per 1M tokens)

Output cost (per 1M tokens)

0 < Tokens ≤ 32K

$0.20

$1.60

32K < Tokens ≤ 128K

$0.30

$2.40

128K < Tokens ≤ 256K

$0.60

$4.80

The qwen3-vl-flash-2026-01-22 model integrates thinking mode and non-thinking mode. Compared with the October 15, 2025 snapshot, it significantly improves overall performance and delivers higher-accuracy inference in business scenarios such as general-purpose visual recognition, security surveillance, store inspection, routine inspection, and photo-based problem solving.

Qwen-OCR

Qwen-OCR is a model designed specifically for text extraction. Compared with Qwen-VL models, it focuses on extracting text from documents, tables, test questions, and handwritten images. It recognizes multiple languages, including English, French, Japanese, Korean, German, Russian, and Italian. Usage | API reference | Try online
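By way of illustration, an image-plus-instruction message for text extraction might be assembled like this. This is a sketch using the OpenAI-compatible content-part format; the image URL and instruction text are placeholders, and no request is sent:

```python
# Sketch: build a user message for qwen-vl-ocr in the OpenAI-compatible
# content-part format. The image URL and instruction text are placeholders.
def ocr_message(image_url: str,
                instruction: str = "Extract all text from this image.") -> dict:
    return {
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": image_url}},
            {"type": "text", "text": instruction},
        ],
    }
```

The resulting dict can be passed as one element of the `messages` array in a chat request; note the per-image input limit of 30,000 tokens from the tables below.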

International

In international deployment mode, endpoints and data storage are both located in the Singapore region. Model inference compute resources are dynamically scheduled globally (excluding the Chinese Mainland).

Model

Version

Context window

Max input

Max output

Input cost

Output cost

Free quota

Note

(tokens)

(per 1M tokens)

qwen-vl-ocr

Currently qwen-vl-ocr-2025-11-20

Stable

38,192

30,000

Max per image: 30,000

8,192

$0.07

$0.16

1 million tokens each

Valid for 90 days after activating Model Studio

qwen-vl-ocr-2025-11-20

Also known as qwen-vl-ocr-1120
Based on the Qwen3-VL architecture, significantly improving document parsing and text localization capabilities.

Snapshot

Global

In the Global deployment mode, endpoint and data storage are located in the US (Virginia) region or Germany (Frankfurt) region, and model inference computing resources are dynamically scheduled globally.

Model

Version

Context window

Max input

Max output

Input cost

Output cost

(tokens)

(per 1M tokens)

qwen-vl-ocr

Currently qwen-vl-ocr-2025-11-20

Stable

38,192

30,000

Max per image: 30,000

8,192

$0.043

$0.072

qwen-vl-ocr-2025-11-20

Also known as qwen-vl-ocr-1120
Based on the Qwen3-VL architecture, this version significantly improves document parsing and text localization capabilities.

Snapshot

Chinese Mainland

In the Chinese Mainland deployment mode, endpoints and data storage are both located in the Beijing region. Model inference compute resources are limited to the Chinese Mainland.

Model

Version

Context window

Max input

Max output

Input cost

Output cost

Free quota

Note

(tokens)

(per 1M tokens)

qwen-vl-ocr

Currently qwen-vl-ocr-2025-11-20
Batch calls at half price

Stable

38,192

30,000

Max per image: 30,000

8,192

$0.043

$0.072

No free quota

qwen-vl-ocr-latest

Always the latest snapshot

Latest

qwen-vl-ocr-2025-11-20

Also known as qwen-vl-ocr-1120
Based on the Qwen3-VL architecture, significantly improving document parsing and text localization capabilities.

Snapshot

qwen-vl-ocr-2025-08-28

Also known as qwen-vl-ocr-0828

34,096

4,096

$0.717

$0.717

qwen-vl-ocr-2025-04-13

Also known as qwen-vl-ocr-0413

qwen-vl-ocr-2024-10-28

Also known as qwen-vl-ocr-1028

Qwen-Math

The Qwen-Math model is a language model designed specifically for solving mathematical problems. Usage | API reference | Try online

Note

Only the Chinese Mainland deployment mode is supported. Endpoints and data storage are located in the Beijing region, and model inference compute resources are restricted to the Chinese Mainland.

Model

Version

Context window

Max input

Max output

Input cost

Output cost

(tokens)

(per 1M tokens)

qwen-math-plus

Currently qwen-math-plus-2024-09-19

Stable

4,096

3,072

3,072

$0.574

$1.721

qwen-math-plus-latest

Always matches the latest snapshot version

Latest

qwen-math-plus-2024-09-19

Also known as qwen-math-plus-0919

Snapshot

qwen-math-plus-2024-08-16

Also known as qwen-math-plus-0816

qwen-math-turbo

Currently qwen-math-turbo-2024-09-19

Stable

$0.287

$0.861

qwen-math-turbo-latest

Always matches the latest snapshot version

Latest

qwen-math-turbo-2024-09-19

Also known as qwen-math-turbo-0919

Snapshot

Qwen-Coder

Qwen coder models: the latest Qwen3-Coder-Plus series consists of code generation models built on Qwen3 with robust coding-agent capabilities. These models excel at tool calling and environment interaction, enabling autonomous programming that combines exceptional coding skill with general-purpose functionality. Usage | API reference | Try online

International

In international deployment mode, the endpoint and data storage are both located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding the Chinese Mainland).

Model

Version

Context window

Max input

Max output

Input cost

Output cost

Free quota

(Note)

(tokens)

(per 1M tokens)

qwen3-coder-plus

Currently qwen3-coder-plus-2025-09-23

Stable

1,000,000

997,952

65,536

Tiered pricing. See details below.

1 million tokens each

Valid for 90 days after activating Model Studio

qwen3-coder-plus-2025-09-23

Snapshot

qwen3-coder-plus-2025-07-22

Snapshot

qwen3-coder-flash

Currently qwen3-coder-flash-2025-07-28

Stable

qwen3-coder-flash-2025-07-28

Snapshot

The models above use tiered billing based on the number of input tokens in the current request.

qwen3-coder-plus series

Pricing for qwen3-coder-plus, qwen3-coder-plus-2025-09-23, and qwen3-coder-plus-2025-07-22 is shown below. qwen3-coder-plus supports context cache. Input text that hits an implicit cache is billed at 20% of the standard rate. Input text that hits an explicit cache is billed at 10% of the standard rate.

Input tokens per request

Input cost (per 1M tokens)

Output cost (per 1M tokens)

0 < Tokens ≤ 32K

$1

$5

32K < Tokens ≤ 128K

$1.8

$9

128K < Tokens ≤ 256K

$3

$15

256K < Tokens ≤ 1M

$6

$60

qwen3-coder-flash series

Pricing for qwen3-coder-flash and qwen3-coder-flash-2025-07-28 is shown below. qwen3-coder-flash supports context cache. Input text that hits an implicit cache is billed at 20% of the standard rate. Input text that hits an explicit cache is billed at 10% of the standard rate.

Input tokens per request

Input cost (per 1M tokens)

Output cost (per 1M tokens)

0 < Tokens ≤ 32K

$0.30

$1.50

32K < Tokens ≤ 128K

$0.50

$2.50

128K < Tokens ≤ 256K

$0.80

$4

256K < Tokens ≤ 1M

$1.60

$9.60

Global

In the Global deployment mode, endpoint and data storage are located in the US (Virginia) region or Germany (Frankfurt) region, and model inference computing resources are dynamically scheduled globally.

Model

Version

Context window

Max input

Max output

Input cost

Output cost

(tokens)

(per 1M tokens)

qwen3-coder-plus

Currently qwen3-coder-plus-2025-09-23.

Stable

1,000,000

997,952

65,536

Tiered pricing. See details below.

qwen3-coder-plus-2025-09-23

Snapshot

qwen3-coder-plus-2025-07-22

Snapshot

qwen3-coder-flash

Currently qwen3-coder-flash-2025-07-28.

Stable

qwen3-coder-flash-2025-07-28

Snapshot

The models above use tiered billing based on the number of input tokens in the current request.

qwen3-coder-plus series

Pricing for qwen3-coder-plus, qwen3-coder-plus-2025-09-23, and qwen3-coder-plus-2025-07-22 is shown below. qwen3-coder-plus supports context cache. Input text that hits a cache is billed at 20% of the standard rate.

Input tokens per request

Input cost (per 1M tokens)

Output cost (per 1M tokens)

0 < Tokens ≤ 32K

$0.574

$2.294

32K < Tokens ≤ 128K

$0.861

$3.441

128K < Tokens ≤ 256K

$1.434

$5.735

256K < Tokens ≤ 1M

$2.868

$28.671

qwen3-coder-flash series

Pricing for qwen3-coder-flash and qwen3-coder-flash-2025-07-28 is shown below. qwen3-coder-flash supports context cache. Input text that hits a cache is billed at 20% of the standard rate.

Input tokens per request

Input cost (per 1M tokens)

Output cost (per 1M tokens)

0 < Tokens ≤ 32K

$0.144

$0.574

32K < Tokens ≤ 128K

$0.216

$0.861

128K < Tokens ≤ 256K

$0.359

$1.434

256K < Tokens ≤ 1M

$0.717

$3.584

Chinese Mainland

In Chinese Mainland deployment mode, the endpoint and data storage are both located in the Beijing region. Model inference compute resources are limited to the Chinese Mainland.

Model

Version

Context window

Max input

Max output

Input cost

Output cost

(tokens)

(per 1M tokens)

qwen3-coder-plus

Currently qwen3-coder-plus-2025-09-23

Stable

1,000,000

997,952

65,536

Tiered pricing. See details below.

qwen3-coder-plus-2025-09-23

Snapshot

qwen3-coder-plus-2025-07-22

Snapshot

qwen3-coder-flash

Currently qwen3-coder-flash-2025-07-28

Stable

qwen3-coder-flash-2025-07-28

Snapshot

The models above use tiered billing based on the number of input tokens in the current request.

qwen3-coder-plus series

Pricing for qwen3-coder-plus, qwen3-coder-plus-2025-09-23, and qwen3-coder-plus-2025-07-22 is shown below. qwen3-coder-plus supports context cache. Input text that hits an implicit cache is billed at 20% of the standard rate. Input text that hits an explicit cache is billed at 10% of the standard rate.

Input tokens per request

Input cost (per 1M tokens)

Output cost (per 1M tokens)

0 < Tokens ≤ 32K

$0.574

$2.294

32K < Tokens ≤ 128K

$0.861

$3.441

128K < Tokens ≤ 256K

$1.434

$5.735

256K < Tokens ≤ 1M

$2.868

$28.671

qwen3-coder-flash series

Pricing for qwen3-coder-flash and qwen3-coder-flash-2025-07-28 is shown below. qwen3-coder-flash supports context cache. Input text that hits an implicit cache is billed at 20% of the standard rate. Input text that hits an explicit cache is billed at 10% of the standard rate.

Input tokens per request

Input cost (per 1M tokens)

Output cost (per 1M tokens)

0 < Tokens ≤ 32K

$0.144

$0.574

32K < Tokens ≤ 128K

$0.216

$0.861

128K < Tokens ≤ 256K

$0.359

$1.434

256K < Tokens ≤ 1M

$0.717

$3.584

More models

Model

Version

Context window

Max input

Max output

Input cost

Output cost

(tokens)

(per 1M tokens)

qwen-coder-plus

Currently qwen-coder-plus-2024-11-06.

Stable

131,072

129,024

8,192

$0.502

$1.004

qwen-coder-plus-latest

Always the latest qwen-coder-plus snapshot.

Latest

qwen-coder-plus-2024-11-06

Also known as qwen-coder-plus-1106.

Snapshot

qwen-coder-turbo

Currently qwen-coder-turbo-2024-09-19.

Stable

131,072

129,024

8,192

$0.287

$0.861

qwen-coder-turbo-latest

Always the latest qwen-coder-turbo snapshot.

Latest

qwen-coder-turbo-2024-09-19

Also known as qwen-coder-turbo-0919.

Snapshot

Qwen translation models

Qwen3-MT is a flagship large language model (LLM) for translation and a comprehensive upgrade of Qwen3. It supports translation among 92 languages, including Chinese, English, Japanese, Korean, French, Spanish, German, Thai, Indonesian, Vietnamese, and Arabic. Performance and translation quality are significantly improved, with more stable terminology customization, format preservation, and domain-specific prompting, resulting in more accurate and natural translations. Usage

International

In the international deployment mode, both the endpoint and data storage are located in the Singapore region. Model inference computing resources are dynamically scheduled worldwide, except in the Chinese Mainland.

Model

Context window

Max input

Max output

Input cost

Output cost

Free quota

Rules

(tokens)

(per 1M tokens)

qwen-mt-plus

Part of Qwen3-MT

16,384

8,192

8,192

$2.46

$7.37

1 million tokens each

Valid for 90 days after activating Model Studio

qwen-mt-flash

Part of Qwen3-MT

$0.16

$0.49

qwen-mt-lite

Part of Qwen3-MT

$0.12

$0.36

qwen-mt-turbo

Part of Qwen3-MT

$0.16

$0.49

Global

In the Global deployment mode, endpoint and data storage are located in the US (Virginia) region or Germany (Frankfurt) region, and model inference computing resources are dynamically scheduled globally.

Model

Context window

Max input

Max output

Input cost

Output cost

(tokens)

(per 1M tokens)

qwen-mt-plus

Part of Qwen3-MT

16,384

8,192

8,192

$0.259

$0.775

qwen-mt-flash

Part of Qwen3-MT

$0.101

$0.280

qwen-mt-lite

Part of Qwen3-MT

$0.086

$0.229

Chinese Mainland

In the Chinese Mainland deployment mode, both the endpoint and data storage are in the Beijing region. Computing resources for model inference are limited to the Chinese Mainland.

Model

Context window

Max input

Max output

Input cost

Output cost

(tokens)

(per 1M tokens)

qwen-mt-plus

Part of Qwen3-MT

16,384

8,192

8,192

$0.259

$0.775

qwen-mt-flash

Part of Qwen3-MT

$0.101

$0.280

qwen-mt-lite

Part of Qwen3-MT

$0.086

$0.229

qwen-mt-turbo

Part of Qwen3-MT

$0.101

$0.280

Qwen data mining

You can use the Qwen data mining model to extract structured information from documents for data annotation, content moderation, and other tasks. Usage | API reference

Note

Only the Chinese Mainland deployment mode is supported. Endpoints and data storage are located in the Beijing region, and model inference compute resources are restricted to the Chinese Mainland.

Model

Context window

Max input

Max output

Input cost

Output cost

Free quota

(tokens)

(per 1M tokens)

qwen-doc-turbo

262,144

253,952

32,768

$0.087

$0.144

No free quota

Qwen-Deep-Research

The Qwen deep research model breaks down complex problems, performs inference and analysis using web search, and generates research reports. Usage | API reference

Note

Only the Chinese Mainland deployment mode is supported. Endpoints and data storage are located in the Beijing region, and model inference compute resources are restricted to the Chinese Mainland.

Model

Context window

Max input

Max output

Input cost

Output cost

(tokens)

(per 1,000 tokens)

qwen-deep-research

1,000,000

997,952

32,768

$0.007742

$0.023367

Text generation - Qwen open-source edition

  • In model names, xxb indicates parameter scale. For example, qwen2-72b-instruct has 72 billion (72B) parameters.

  • Model Studio supports calling Qwen's open-source models without local deployment. Available open-source series include Qwen3 and Qwen2.5.

Qwen3.5

Accepts text, image, and video input. Performs on par with Qwen3 Max for plain text tasks—faster and more cost-effective. Offers significant improvements in multimodal capabilities compared to the Qwen3 VL series.

Model

Mode

Context window

Max input

Max CoT

Max output

Input cost

Output cost

CoT + output

Free quota

Note

(tokens)

(per 1M tokens)

qwen3.5-397b-a17b

Default: Thinking

Thinking

262,144

258,048

81,920

65,536

Tiered pricing. See details below.

1 million tokens each

Valid for 90 days after activating Model Studio

Available in international regions only

Non-thinking

260,096

-

qwen3.5-122b-a10b

Default: Thinking

Thinking

262,144

258,048

81,920

65,536

Non-thinking

260,096

-

qwen3.5-27b

Default: Thinking

Thinking

262,144

258,048

81,920

65,536

Non-thinking

260,096

-

qwen3.5-35b-a3b

Default: Thinking

Thinking

262,144

258,048

81,920

65,536

Non-thinking

260,096

-

qwen3.5-397b-a17b, qwen3.5-122b-a10b, qwen3.5-27b, and qwen3.5-35b-a3b use tiered pricing based on the number of input tokens per request.

International

Model

Input tokens per request

Input cost (per 1M tokens)

Output cost (per 1M tokens)

qwen3.5-397b-a17b

0 < tokens ≤ 256K

$0.60

$3.6

qwen3.5-122b-a10b

$0.40

$3.2

qwen3.5-27b

$0.3

$2.40

qwen3.5-35b-a3b

$0.25

$2

Global

Model

Input tokens per request

Input cost (per 1M tokens)

Output cost (per 1M tokens)

qwen3.5-397b-a17b

0 < tokens ≤ 128K

$0.172

$1.032

128K < tokens ≤ 256K

$0.43

$2.58

qwen3.5-122b-a10b

0 < tokens ≤ 128K

$0.115

$0.917

128K < tokens ≤ 256K

$0.287

$2.294

qwen3.5-27b

0 < tokens ≤ 128K

$0.086

$0.688

128K < tokens ≤ 256K

$0.258

$2.064

qwen3.5-35b-a3b

0 < tokens ≤ 128K

$0.057

$0.459

128K < tokens ≤ 256K

$0.229

$1.835

Chinese Mainland

Model

Input tokens per request

Input cost (per 1M tokens)

Output cost (per 1M tokens)

qwen3.5-397b-a17b

0 < tokens ≤ 128K

$0.172

$1.032

128K < tokens ≤ 256K

$0.43

$2.58

qwen3.5-122b-a10b

0 < tokens ≤ 128K

$0.115

$0.917

128K < tokens ≤ 256K

$0.287

$2.294

qwen3.5-27b

0 < tokens ≤ 128K

$0.086

$0.688

128K < tokens ≤ 256K

$0.258

$2.064

qwen3.5-35b-a3b

0 < tokens ≤ 128K

$0.057

$0.459

128K < tokens ≤ 256K

$0.229

$1.835

Qwen3

The qwen3-next-80b-a3b-thinking model, released in September 2025, supports only thinking mode. It improves instruction following compared with qwen3-235b-a22b-thinking-2507 and generates more concise summaries.

The qwen3-next-80b-a3b-instruct model, released in September 2025, supports only non-thinking mode. It improves Chinese understanding, logical reasoning, and text generation compared with qwen3-235b-a22b-instruct-2507.

The qwen3-235b-a22b-thinking-2507 and qwen3-30b-a3b-thinking-2507 models, released in July 2025, support only thinking mode. They are upgrades of qwen3-235b-a22b (thinking mode) and qwen3-30b-a3b (thinking mode).

The qwen3-235b-a22b-instruct-2507 and qwen3-30b-a3b-instruct-2507 models, released in July 2025, support only non-thinking mode. They are upgrades of qwen3-235b-a22b (non-thinking mode) and qwen3-30b-a3b (non-thinking mode).

The Qwen3 model, released in April 2025, supports both thinking mode and non-thinking mode. Use the enable_thinking parameter to switch between them. Qwen3 also delivers major capability improvements:

  1. Reasoning capability: Outperforms QwQ and same-size non-reasoning models on math, coding, and logical reasoning benchmarks. Matches top industry performance at this scale.

  2. Human preference alignment: Improves creative writing, role-playing, multi-turn conversation, and instruction following. General capabilities exceed those of same-size models.

  3. Agent capability: Leads the industry in both thinking and non-thinking modes. Enables precise external tool calling.

  4. Multi-language capability: Supports over 100 languages and dialects. Translation, instruction understanding, and commonsense reasoning all improve significantly.

    Languages supported

    English

    Simplified Chinese

    Traditional Chinese

    French

    Spanish

    Arabic (uses Arabic script). Official language in many Arab countries.

    Russian (uses Cyrillic script). Official language in Russia and other countries.

    Portuguese (uses Latin script). Official language in Portugal, Brazil, and other Portuguese-speaking countries.

    German (uses Latin script). Official language in Germany, Austria, and other areas.

    Italian (uses Latin script). Official language in Italy, San Marino, and parts of Switzerland.

    Dutch (uses Latin script). Official language in the Netherlands, parts of Belgium (Flemish region), and Suriname.

    Danish (uses Latin script). Official language in Denmark.

    Irish (uses Latin script). One of the official languages of Ireland.

    Welsh (uses Latin script). An official language in Wales.

    Finnish (uses Latin script). Official language in Finland.

    Icelandic (uses Latin script). Official language in Iceland.

    Swedish (uses Latin script). Official language in Sweden.

    Norwegian Nynorsk (uses Latin script). An official written standard in Norway, used alongside Norwegian Bokmål.

    Norwegian Bokmål (uses Latin script). The primary official written standard used in Norway.

    Japanese (uses Japanese script). Official language in Japan.

    Korean (uses Hangul). Official language in South Korea and North Korea.

    Vietnamese (uses Latin script). Official language in Vietnam.

    Thai (uses Thai script). Official language in Thailand.

    Indonesian (uses Latin script). Official language in Indonesia.

    Malay (uses Latin script). The official language of Malaysia and a major language in surrounding areas.

    Burmese (uses Burmese script). Official language in Myanmar.

    Tagalog (uses Latin script). A major language in the Philippines.

    Khmer (uses Khmer script). Official language in Cambodia.

    Lao (uses Lao script). Official language in Laos.

    Hindi (uses Devanagari script). An official language in India.

    Bengali (uses Bengali script). Official language in Bangladesh and West Bengal, India.

    Urdu (uses Arabic script). An official language in Pakistan and is also spoken in India.

    Nepali (uses Devanagari script). Official language in Nepal.

    Hebrew (uses Hebrew script). Official language in Israel.

    Turkish (uses Latin script). Official language in Türkiye and Northern Cyprus.

    Persian (uses Arabic script). Official language in Iran, Tajikistan, and other areas.

    Polish (uses Latin script). Official language in Poland.

    Ukrainian (uses Cyrillic script). Official language in Ukraine.

    Czech (uses Latin script). Official language in the Czech Republic.

    Romanian (uses Latin script). Official language in Romania and Moldova.

    Bulgarian (uses Cyrillic script). Official language in Bulgaria.

    Slovak (uses Latin script). Official language in Slovakia.

    Hungarian (uses Latin script). Official language in Hungary.

    Slovenian (uses Latin script). Official language in Slovenia.

    Latvian (uses Latin script). Official language in Latvia.

    Estonian (uses Latin script). Official language in Estonia.

    Lithuanian (uses Latin script). Official language in Lithuania.

    Belarusian (uses Cyrillic script). An official language in Belarus.

    Greek (uses Greek script). Official language in Greece and Cyprus.

    Croatian (uses Latin script). Official language in Croatia.

    Macedonian (uses Cyrillic script). Official language in North Macedonia.

    Maltese (uses Latin script). Official language in Malta.

    Serbian (uses Cyrillic script). Official language in Serbia.

    Bosnian (uses Latin script). An official language in Bosnia and Herzegovina.

    Georgian (uses Georgian script). Official language in Georgia.

    Armenian (uses Armenian script). Official language in Armenia.

    Azerbaijani (uses Latin script). Official language in Azerbaijan.

    Kazakh (uses Cyrillic script). Official language in Kazakhstan.

    Uzbek (uses Latin script). Official language in Uzbekistan.

    Tajik (uses Cyrillic script). Official language in Tajikistan.

    Swahili (uses Latin script). A lingua franca and official language in many East African countries.

    Afrikaans (uses Latin script). An official language in South Africa, also spoken in Namibia.

    Cantonese (uses Traditional Chinese characters). The main language spoken in Guangdong Province, Hong Kong, and Macau.

    Luxembourgish (uses Latin script). An official language of Luxembourg, also spoken in parts of Germany.

    Limburgish (uses Latin script). Spoken mainly in the Netherlands, Belgium, and parts of Germany.

    Catalan (uses Latin script). An official language in Catalonia, also spoken in other parts of Spain.

    Galician (uses Latin script). An official language in Galicia, Spain.

    Asturian (uses Latin script). Spoken mainly in Asturias, Spain.

    Basque (uses Latin script). Spoken in the Basque regions of Spain and France. It is an official language in the Basque Autonomous Community of Spain.

    Occitan (uses Latin script). Spoken mainly in southern France.

    Venetian (uses Latin script). Spoken mainly in the Veneto region of Italy.

    Sardinian (uses Latin script). Spoken mainly in Sardinia, Italy.

    Sicilian (uses Latin script). Spoken mainly in Sicily, Italy.

    Friulian (uses Latin script). Spoken mainly in Friuli-Venezia Giulia, Italy.

    Lombard (uses Latin script). Spoken mainly in Lombardy, Italy.

    Ligurian (uses Latin script). Spoken mainly in Liguria, Italy.

    Faroese (uses Latin script). An official language of the Faroe Islands.

    Tosk Albanian (uses Latin script). Spoken mainly in southern Albania.

    Silesian (uses Latin script). Spoken mainly in Poland.

    Bashkir (uses Cyrillic script). Spoken mainly in Bashkortostan, Russia.

    Tatar (uses Cyrillic script). Spoken mainly in Tatarstan, Russia.

    Mesopotamian Arabic (uses Arabic script). Spoken mainly in Iraq.

    Najdi Arabic (uses Arabic script). Spoken mainly in the Najd region of Saudi Arabia.

    Egyptian Arabic (uses Arabic script). Spoken mainly in Egypt.

    Levantine Arabic (uses Arabic script). Spoken mainly in Syria and Lebanon.

    Ta'izzi-Adeni Arabic (uses Arabic script). Spoken mainly in Yemen and the Hadhramaut region of Saudi Arabia.

    Dari (uses Arabic script). An official language in Afghanistan.

    Tunisian Arabic (uses Arabic script). Spoken mainly in Tunisia.

    Moroccan Arabic (uses Arabic script). Spoken mainly in Morocco.

    Kabuverdianu (uses Latin script). Spoken mainly in Cape Verde.

    Tok Pisin (uses Latin script). A major lingua franca in Papua New Guinea.

    Eastern Yiddish (uses Hebrew script). Spoken mainly in Jewish communities.

    Sindhi (uses Arabic script). An official language in Sindh Province, Pakistan.

    Sinhala (uses Sinhala script). An official language in Sri Lanka.

    Telugu (uses Telugu script). An official language in Andhra Pradesh and Telangana, India.

    Punjabi (uses Gurmukhi script). Spoken in Punjab, India. It is an official language of India.

    Tamil (uses Tamil script). An official language in Tamil Nadu, India, and Sri Lanka.

    Gujarati (uses Gujarati script). An official language in Gujarat, India.

    Malayalam (uses Malayalam script). An official language in Kerala, India.

    Marathi (uses Devanagari script). An official language in Maharashtra, India.

    Kannada (uses Kannada script). An official language in Karnataka, India.

    Magahi (uses Devanagari script). Spoken mainly in Bihar, India.

    Oriya (uses Odia script). An official language in Odisha, India.

    Awadhi (uses Devanagari script). Spoken mainly in Uttar Pradesh, India.

    Maithili (uses Devanagari script). Spoken in Bihar, India, and the Terai region of Nepal. It is an official language of India.

    Assamese (uses Bengali script). An official language in Assam, India.

    Chhattisgarhi (uses Devanagari script). Spoken mainly in Chhattisgarh, India.

    Bhojpuri (uses Devanagari script). Spoken in parts of India and Nepal.

    Minangkabau (uses Latin script). Spoken mainly in Sumatra, Indonesia.

    Balinese (uses Latin script). Spoken mainly in Bali, Indonesia.

    Javanese (uses Latin script; also traditionally uses Javanese script). Widely spoken in Java, Indonesia.

    Banjar (uses Latin script). Spoken mainly in Kalimantan, Indonesia.

    Sundanese (uses Latin script; traditionally uses Sundanese script). Spoken mainly in western Java, Indonesia.

    Cebuano (uses Latin script). Spoken mainly in the Cebu region of the Philippines.

    Pangasinan (uses Latin script). Spoken mainly in Pangasinan Province, Philippines.

    Iloko (uses Latin script). Spoken mainly in the Philippines.

    Waray (Philippines) (uses Latin script). Spoken mainly in the Philippines.

    Haitian Creole (uses Latin script). An official language in Haiti.

    Papiamento (uses Latin script). Spoken mainly in Caribbean areas such as Aruba and Curaçao.

  5. Response format fixes: Resolves response format issues from earlier versions, such as malformed Markdown, mid-response truncation, and incorrect boxed output.

The open-source Qwen3 model, released in April 2025, does not support non-streaming output in thinking mode.
If you enable thinking mode for the open-source Qwen3 model but it does not output the thinking process, billing applies at the non-thinking mode rate.
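Because non-streaming output is unavailable in this case, requests to these models must set streaming. A minimal sketch via an OpenAI-compatible client; the `enable_thinking` flag, `reasoning_content` field, base URL, and environment-variable name are assumptions about Model Studio's compatible-mode interface, not guaranteed API details:

```python
import os

def build_request(prompt: str) -> dict:
    """Parameters for calling an open-source Qwen3 model in thinking mode.

    stream=True is required: the April 2025 open-source Qwen3 models do not
    support non-streaming output when thinking mode is enabled.
    """
    return {
        "model": "qwen3-30b-a3b",
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
        # Assumption: enable_thinking is the extension flag that toggles
        # thinking mode on the OpenAI-compatible endpoint.
        "extra_body": {"enable_thinking": True},
    }

if __name__ == "__main__":
    from openai import OpenAI  # assumption: OpenAI SDK pointed at Model Studio

    client = OpenAI(
        api_key=os.getenv("DASHSCOPE_API_KEY"),
        base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
    )
    stream = client.chat.completions.create(**build_request("Why is the sky blue?"))
    for chunk in stream:
        delta = chunk.choices[0].delta
        # Assumption: thinking tokens arrive as reasoning_content; the final
        # answer arrives as content.
        if getattr(delta, "reasoning_content", None):
            print(delta.reasoning_content, end="")
        elif delta.content:
            print(delta.content, end="")
```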

Thinking | Non-thinking | Usage

International

In international deployment mode, endpoints and data storage are located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding the Chinese Mainland).

Model

Mode

Context window

Max input

Max CoT

Max output

Input cost

Output cost

Free quota

Note

(tokens)

(per 1M tokens)

qwen3-next-80b-a3b-thinking

Thinking only

131,072

126,976

81,920

32,768

$0.15

$1.2

1 million tokens each

Valid for 90 days after activating Model Studio

qwen3-next-80b-a3b-instruct

Thinking not supported

129,024

-

qwen3-235b-a22b-thinking-2507

Thinking only

126,976

81,920

$0.23

$2.3

qwen3-235b-a22b-instruct-2507

Thinking not supported

129,024

-

$0.92

qwen3-30b-a3b-thinking-2507

Thinking only

126,976

81,920

$0.2

$2.4

qwen3-30b-a3b-instruct-2507

Thinking not supported

129,024

-

$0.8

qwen3-235b-a22b

This model and the following models were all released in April 2025.

Non-thinking

129,024

-

16,384

$0.7

$2.8

Thinking

98,304

38,912

$8.4

qwen3-32b

Non-thinking

129,024

-

$0.16

$0.64

Thinking

98,304

38,912

qwen3-30b-a3b

Non-thinking

129,024

-

$0.2

$0.8

Thinking

98,304

38,912

$2.4

qwen3-14b

Non-thinking

129,024

-

8,192

$0.35

$1.4

Thinking

98,304

38,912

$4.2

qwen3-8b

Non-thinking

129,024

-

$0.18

$0.7

Thinking

98,304

38,912

$2.1

qwen3-4b

Non-thinking

129,024

-

$0.11

$0.42

Thinking

98,304

38,912

$1.26

qwen3-1.7b

Non-thinking

32,768

30,720

-

$0.42

Thinking

28,672

Combined with the input, the total must not exceed 30,720 tokens.

$1.26

qwen3-0.6b

Non-thinking

30,720

-

$0.42

Thinking

28,672

Combined with the input, the total must not exceed 30,720 tokens.

$1.26

Global

In the Global deployment mode, endpoints and data storage are located in the US (Virginia) or Germany (Frankfurt) region, and model inference compute resources are dynamically scheduled globally.

The qwen3-32b, qwen3-14b, and qwen3-8b models currently support global deployment mode only in the US (Virginia) region.

Model

Mode

Context window

Max input

Max CoT

Max output

Input cost

Output cost

Free quota

(Note)

(tokens)

(per 1M tokens)

qwen3-next-80b-a3b-thinking

Thinking only

131,072

126,976

81,920

32,768

$0.144

$1.434

No free quota

qwen3-next-80b-a3b-instruct

Thinking not supported

129,024

-

$0.574

qwen3-235b-a22b-thinking-2507

Thinking only

126,976

81,920

$0.23

$2.3

qwen3-235b-a22b-instruct-2507

Thinking not supported

129,024

-

$0.92

qwen3-30b-a3b-thinking-2507

Thinking only

126,976

81,920

$0.108

$1.076

qwen3-30b-a3b-instruct-2507

Thinking not supported

129,024

-

$0.431

qwen3-235b-a22b

Non-thinking

129,024

-

16,384

$0.287

$1.147

Thinking

98,304

38,912

$2.868

qwen3-32b

Non-thinking

129,024

-

$0.16

$0.64

Thinking

98,304

38,912

qwen3-30b-a3b

Non-thinking

129,024

-

$0.108

$0.431

Thinking

98,304

38,912

$1.076

qwen3-14b

Non-thinking

129,024

-

8,192

$0.144

$0.574

Thinking

98,304

38,912

$1.434

qwen3-8b

Non-thinking

129,024

-

$0.072

$0.287

Thinking

98,304

38,912

$0.717

Chinese Mainland

In Chinese Mainland deployment mode, endpoints and data storage are located in the Beijing region. Model inference compute resources are limited to the Chinese Mainland.

Model

Mode

Context window

Max input

Max CoT

Max output

Input cost

Output cost

Free quota

(Note)

(tokens)

(per 1M tokens)

qwen3-next-80b-a3b-thinking

Thinking only

131,072

126,976

81,920

32,768

$0.144

$1.434

No free quota

qwen3-next-80b-a3b-instruct

Thinking not supported

129,024

-

$0.574

qwen3-235b-a22b-thinking-2507

Thinking only

126,976

81,920

$0.287

$2.868

qwen3-235b-a22b-instruct-2507

Thinking not supported

129,024

-

$1.147

qwen3-30b-a3b-thinking-2507

Thinking only

126,976

81,920

$0.108

$1.076

qwen3-30b-a3b-instruct-2507

Thinking not supported

129,024

-

$0.431

qwen3-235b-a22b

Non-thinking

129,024

-

16,384

$0.287

$1.147

Thinking

98,304

38,912

$2.868

qwen3-32b

Non-thinking

129,024

-

$0.287

$1.147

Thinking

98,304

38,912

$2.868

qwen3-30b-a3b

Non-thinking

129,024

-

$0.108

$0.431

Thinking

98,304

38,912

$1.076

qwen3-14b

Non-thinking

129,024

-

8,192

$0.144

$0.574

Thinking

98,304

38,912

$1.434

qwen3-8b

Non-thinking

129,024

-

$0.072

$0.287

Thinking

98,304

38,912

$0.717

qwen3-4b

Non-thinking

129,024

-

$0.044

$0.173

Thinking

98,304

38,912

$0.431

qwen3-1.7b

Non-thinking

32,768

30,720

-

$0.173

Thinking

28,672

Combined with the input, the total must not exceed 30,720 tokens.

$0.431

qwen3-0.6b

Non-thinking

30,720

-

$0.173

Thinking

28,672

Combined with the input, the total must not exceed 30,720 tokens.

$0.431

QwQ open-source

QwQ is a reasoning model trained from Qwen2.5-32B, with reasoning capabilities significantly enhanced through reinforcement learning. Its core metrics (AIME 24/25, LiveCodeBench) and some general metrics (IFEval, LiveBench, and others) match those of the full DeepSeek-R1 and significantly outperform DeepSeek-R1-Distill-Qwen-32B, which is also based on Qwen2.5-32B. Usage | API reference

Note

Only the Chinese Mainland deployment mode is supported. Endpoints and data storage are located in the Beijing region, and model inference compute resources are restricted to the Chinese Mainland.

| Model | Context window (tokens) | Max input (tokens) | Max CoT (tokens) | Max response (tokens) | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
| --- | --- | --- | --- | --- | --- | --- |
| qwq-32b | 131,072 | 98,304 | 32,768 | 8,192 | $0.287 | $0.861 |

QwQ-Preview

qwq-32b-preview is an experimental research model developed by the Qwen team in 2024, focused on enhancing AI reasoning capabilities, especially in math and programming. See the QwQ official blog for model limitations. Usage | API reference | Try online

Note

Only the Chinese Mainland deployment mode is supported. Endpoints and data storage are located in the Beijing region, and model inference compute resources are restricted to the Chinese Mainland.

| Model | Context window (tokens) | Max input (tokens) | Max output (tokens) | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
| --- | --- | --- | --- | --- | --- |
| qwq-32b-preview | 32,768 | 30,720 | 16,384 | $0.287 | $0.861 |

Qwen2.5

Qwen2.5 is part of the Qwen Large Language Model (LLM) series. We released a series of base and instruction-tuned language models with parameter scales ranging from 0.5 billion to 72 billion. Qwen2.5 improves upon Qwen2 in the following ways:

  • Pre-trained on our latest large-scale dataset containing up to 18 trillion tokens.

  • Significantly expanded knowledge and greatly enhanced coding and math capabilities due to our domain-expert models.

  • Major improvements in instruction following, long-text generation (over 8K tokens), structured data understanding (such as tables), and structured output generation (especially JSON). More resilient to diverse system prompts, enhancing chatbot role-playing and conditional setup.

  • Supports over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, and Arabic.

Usage | API reference | Try online

International

In international deployment mode, endpoints and data storage are located in the Singapore region. Model inference compute resources are dynamically scheduled globally (excluding the Chinese mainland).

Model

Context window

Max input

Max output

Input cost

Output cost

Free quota

(tokens)

(per 1M tokens)

qwen2.5-14b-instruct-1m

1,008,192

1,000,000

8,192

$0.805

$3.22

1 million tokens each

Valid for 90 days after Model Studio activation

qwen2.5-7b-instruct-1m

$0.368

$1.47

qwen2.5-72b-instruct

131,072

129,024

$1.4

$5.6

qwen2.5-32b-instruct

$0.7

$2.8

qwen2.5-14b-instruct

$0.35

$1.4

qwen2.5-7b-instruct

$0.175

$0.7

Chinese mainland

In Chinese mainland deployment mode, endpoints and data storage are located in Beijing region. Model inference compute resources are limited to the Chinese mainland.

Model

Context window

Max input

Max output

Input cost

Output cost

(tokens)

(per 1M tokens)

qwen2.5-14b-instruct-1m

1,000,000

1,000,000

8,192

$0.144

$0.431

qwen2.5-7b-instruct-1m

$0.072

$0.144

qwen2.5-72b-instruct

131,072

129,024

$0.574

$1.721

qwen2.5-32b-instruct

$0.287

$0.861

qwen2.5-14b-instruct

$0.144

$0.431

qwen2.5-7b-instruct

$0.072

$0.144

qwen2.5-3b-instruct

32,768

30,720

$0.044

$0.130

qwen2.5-1.5b-instruct

Free for limited time

qwen2.5-0.5b-instruct

QVQ

qvq-72b-preview is an experimental research model developed by the Qwen team, focused on enhancing visual reasoning capabilities, especially in mathematical reasoning. See the QVQ official blog for model limitations. Usage | API reference

To have the model output its thinking process before the answer, use the commercial model QVQ.
Note

Only the Chinese Mainland deployment mode is supported. Endpoints and data storage are located in the Beijing region, and model inference compute resources are restricted to the Chinese Mainland.

| Model | Context window (tokens) | Max input (tokens) | Max output (tokens) | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
| --- | --- | --- | --- | --- | --- |
| qvq-72b-preview | 32,768 | 16,384 (max 16,384 per image) | 16,384 | $1.721 | $5.161 |

Qwen-Omni

A new multimodal understanding and generation LLM trained from Qwen2.5, supporting text, image, audio, and video input understanding. Capable of simultaneous streaming generation of text and speech, with significantly improved multimodal content understanding speed. Usage | API reference

International

In international deployment mode, endpoints and data storage are located in the Singapore region. Model inference compute resources are dynamically scheduled globally (excluding the Chinese mainland).

| Model | Context window (tokens) | Max input (tokens) | Max output (tokens) | Free quota |
| --- | --- | --- | --- | --- |
| qwen2.5-omni-7b | 32,768 | 30,720 | 2,048 | 1 million tokens (modality agnostic), valid for 90 days after Model Studio activation |

After the free quota is exhausted, input and output are billed as follows:

| Input billing item | Unit price (per 1M tokens) |
| --- | --- |
| Text | $0.10 |
| Audio | $6.76 |
| Image/video | $0.28 |

| Output billing item | Unit price (per 1M tokens) |
| --- | --- |
| Text | $0.40 (input contains text only); $0.84 (input contains images/audio/video) |
| Text + audio | $13.51 for audio; text output is not billed |
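The international billing rules above combine per-modality input prices with input-dependent output prices, so a small helper makes cost estimation concrete. This is a sketch using only the prices in the table; the function name and modality keys are illustrative, and it assumes the table's rule that text accompanying audio output is free:

```python
# Unit prices (USD per 1M tokens) for qwen2.5-omni-7b in international mode,
# copied from the billing table above.
INPUT_PRICE = {"text": 0.10, "audio": 6.76, "image_video": 0.28}

def omni_cost(input_tokens: dict, output_text: int = 0, output_audio: int = 0) -> float:
    """Rough USD cost of one request after the free quota is exhausted.

    input_tokens maps a modality ("text", "audio", "image_video") to a token
    count. Per the table: when audio output is produced, only the audio tokens
    are billed ($13.51/M) and the accompanying text is free; pure text output
    costs $0.40/M for text-only input and $0.84/M otherwise.
    """
    cost = sum(INPUT_PRICE[m] * n for m, n in input_tokens.items()) / 1e6
    if output_audio:
        cost += 13.51 * output_audio / 1e6
    else:
        multimodal = any(m != "text" and n > 0 for m, n in input_tokens.items())
        cost += (0.84 if multimodal else 0.40) * output_text / 1e6
    return cost
```

For example, 1M text input tokens plus 1M text output tokens comes to $0.10 + $0.40 = $0.50.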

Chinese mainland

In Chinese mainland deployment mode, endpoints and data storage are located in Beijing region. Model inference compute resources are limited to the Chinese mainland.

| Model | Context window (tokens) | Max input (tokens) | Max output (tokens) |
| --- | --- | --- | --- |
| qwen2.5-omni-7b | 32,768 | 30,720 | 2,048 |

Input and output follow these billing rules:

| Input billing item | Unit price (per 1M tokens) |
| --- | --- |
| Text | $0.087 |
| Audio | $5.448 |
| Images/video | $0.287 |

| Output billing item | Unit price (per 1M tokens) |
| --- | --- |
| Text | $0.345 (input contains text only); $0.861 (input contains images/audio/video) |
| Text + audio | $10.895 for audio; text output is not billed |

Qwen3-Omni-Captioner

Qwen3-Omni-Captioner is an open-source model based on Qwen3-Omni. Without any prompts, it automatically generates accurate, comprehensive descriptions for complex audio, ambient sounds, music, film sound effects, and more. It detects speaker emotions, musical elements (such as genre and instruments), and sensitive information, making it suitable for audio content analysis, security review, intent recognition, audio editing, and other fields. Usage | API reference

International

In international deployment mode, endpoints and data storage are located in the Singapore region. Model inference compute resources are dynamically scheduled globally (excluding the Chinese Mainland).

| Model | Context window (tokens) | Max input (tokens) | Max output (tokens) | Input cost (per 1M tokens) | Output cost (per 1M tokens) | Free quota |
| --- | --- | --- | --- | --- | --- | --- |
| qwen3-omni-30b-a3b-captioner | 65,536 | 32,768 | 32,768 | $3.81 | $3.06 | 1 million tokens, valid for 90 days after activating Model Studio |

Chinese Mainland

In Chinese Mainland deployment mode, endpoints and data storage are located in the Beijing region. Model inference compute resources are limited to the Chinese Mainland.

| Model | Context window (tokens) | Max input (tokens) | Max output (tokens) | Input cost (per 1M tokens) | Output cost (per 1M tokens) | Free quota |
| --- | --- | --- | --- | --- | --- | --- |
| qwen3-omni-30b-a3b-captioner | 65,536 | 32,768 | 32,768 | $2.265 | $1.821 | No free quota |

Qwen-VL

The open-source edition of Alibaba Cloud's Qwen-VL. Usage | API reference

Compared to Qwen2.5-VL, Qwen3-VL delivers major improvements in model capabilities:

  • Agent interaction: Operates computer or mobile interfaces. Detects GUI elements, understands functions, and calls tools to complete tasks. Achieves top-tier performance on benchmarks such as OS World.

  • Visual coding: Generates code from images or videos. Converts design mockups and website screenshots into HTML, CSS, and JavaScript code.

  • Spatial intelligence: Supports 2D and 3D localization, and accurately determines object position, viewpoint changes, and occlusion relationships.

  • Long-video understanding: Understands videos up to 20 minutes in length and pinpoints moments down to the second.

  • Deep thinking: Performs deep reasoning. Excels at spotting fine details and analyzing cause-and-effect relationships. Achieves top-tier performance on benchmarks such as MathVista and MMMU.

  • OCR: Supports 33 languages. Delivers stable performance under challenging conditions such as low light, blur, and skew. Significantly improves accuracy for rare characters, ancient scripts, and domain-specific terms.

    Languages supported

    Qwen-VL supports 33 languages: Chinese, Japanese, Korean, Indonesian, Vietnamese, Thai, English, French, German, Russian, Portuguese, Spanish, Italian, Swedish, Danish, Czech, Norwegian, Dutch, Finnish, Turkish, Polish, Swahili, Romanian, Serbian, Greek, Kazakh, Uzbek, Cebuano, Arabic, Urdu, Persian, Hindi/Devanagari, and Hebrew.

International

In international deployment mode, the endpoint and data storage are both located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide, excluding the Chinese Mainland.

Model

Mode

Context window

Max input

Max CoT

Max output

Input cost

Output cost

CoT + output

Free quota

(Note)

(tokens)

(per 1M tokens)

qwen3-vl-235b-a22b-thinking

Thinking only

126,976

81,920

$0.4

$4

1 million tokens each

Valid for 90 days after activating Model Studio

qwen3-vl-235b-a22b-instruct

Non-thinking only

129,024

-

$1.6

qwen3-vl-32b-thinking

Thinking only

131,072

126,976

81,920

32,768

$0.16

$0.64

qwen3-vl-32b-instruct

Non-thinking only

129,024

-

qwen3-vl-30b-a3b-thinking

Thinking only

126,976

81,920

$0.2

$2.4

qwen3-vl-30b-a3b-instruct

Non-thinking only

129,024

-

$0.8

qwen3-vl-8b-thinking

Thinking only

126,976

81,920

$0.18

$2.1

qwen3-vl-8b-instruct

Non-thinking only

129,024

-

$0.7

More models

Model

Context window

Max input

Max output

Input cost

Output cost

Free quota

(Note)

(tokens)

(per 1M tokens)

qwen2.5-vl-72b-instruct

131,072

129,024

Max per image: 16,384

8,192

$2.8

$8.4

1 million tokens each

Valid for 90 days after activating Model Studio

qwen2.5-vl-32b-instruct

$1.4

$4.2

qwen2.5-vl-7b-instruct

$0.35

$1.05

qwen2.5-vl-3b-instruct

$0.21

$0.63

Global

In the Global deployment mode, endpoints and data storage are located in the US (Virginia) or Germany (Frankfurt) region, and model inference compute resources are dynamically scheduled globally.

Model

Mode

Context window

Max input

Max CoT

Max output

Input cost

Output cost

CoT + output

(tokens)

(per 1M tokens)

qwen3-vl-235b-a22b-thinking

Thinking only

126,976

81,920

$0.287

$2.867

qwen3-vl-235b-a22b-instruct

Non-thinking only

129,024

-

$1.147

qwen3-vl-32b-thinking

Thinking only

131,072

126,976

81,920

32,768

$0.16

$0.64

qwen3-vl-32b-instruct

Non-thinking only

129,024

-

qwen3-vl-30b-a3b-thinking

Thinking only

126,976

81,920

$0.108

$1.076

qwen3-vl-30b-a3b-instruct

Non-thinking only

129,024

-

$0.431

qwen3-vl-8b-thinking

Thinking only

126,976

81,920

$0.072

$0.717

qwen3-vl-8b-instruct

Non-thinking only

129,024

-

$0.287

Chinese Mainland

In Chinese Mainland deployment mode, the endpoint and data storage are both located in the Beijing region. Model inference compute resources are limited to the Chinese Mainland.

Model

Mode

Context window

Max input

Max CoT

Max output

Input cost

Output cost

CoT + output

Free quota

(Note)

(tokens)

(per 1M tokens)

qwen3-vl-235b-a22b-thinking

Thinking only

131,072

126,976

81,920

$0.287

$2.867

No free quota

qwen3-vl-235b-a22b-instruct

Non-thinking only

129,024

-

$1.147

qwen3-vl-32b-thinking

Thinking only

131,072

126,976

81,920

32,768

$0.287

$2.868

qwen3-vl-32b-instruct

Non-thinking only

129,024

-

$1.147

qwen3-vl-30b-a3b-thinking

Thinking only

126,976

81,920

$0.108

$1.076

qwen3-vl-30b-a3b-instruct

Non-thinking only

129,024

-

$0.431

qwen3-vl-8b-thinking

Thinking only

126,976

81,920

$0.072

$0.717

qwen3-vl-8b-instruct

Non-thinking only

129,024

-

$0.287

More models

Model

Context window

Max input

Max output

Input cost

Output cost

Free quota

(Note)

(tokens)

(per 1M tokens)

qwen2.5-vl-72b-instruct

131,072

129,024

Max per image: 16,384

8,192

$2.294

$6.881

No free quota

qwen2.5-vl-32b-instruct

$1.147

$3.441

qwen2.5-vl-7b-instruct

$0.287

$0.717

qwen2.5-vl-3b-instruct

$0.173

$0.517

qwen2-vl-72b-instruct

32,768

30,720

Max per image: 16,384

2,048

$2.294

$6.881

Qwen-Math

Qwen-Math is a language model built on Qwen for solving math problems. Qwen2.5-Math supports Chinese and English and integrates multiple reasoning methods, such as Chain of Thought (CoT), Program of Thought (PoT), and Tool-Integrated Reasoning (TIR). How to use | API reference | Try it online

Note

Only the Chinese Mainland deployment mode is supported. Endpoints and data storage are located in the Beijing region, and model inference compute resources are restricted to the Chinese Mainland.

| Model | Context window (tokens) | Max input (tokens) | Max output (tokens) | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
| --- | --- | --- | --- | --- | --- |
| qwen2.5-math-72b-instruct | 4,096 | 3,072 | 3,072 | $0.574 | $1.721 |
| qwen2.5-math-7b-instruct | 4,096 | 3,072 | 3,072 | $0.144 | $0.287 |
| qwen2.5-math-1.5b-instruct | 4,096 | 3,072 | 3,072 | Free for a limited time | Free for a limited time |

Qwen-Coder

An open-source code model from Qwen. The latest Qwen3-Coder series delivers strong coding-agent capabilities: it excels at tool calling and environment interaction, supports autonomous programming, and delivers outstanding coding performance while maintaining broad general-purpose abilities. How to use | API reference

International

In the international deployment mode, endpoints and data storage are in the Singapore region. Model inference compute resources are scheduled dynamically across the globe, excluding the Chinese Mainland.

Model

Context window

Max input

Max output

Input cost

Output cost

Free quota

(Note)

(tokens)

qwen3-coder-next

262,144

204,800

65,536

Tiered pricing. See details below.

1 million tokens each

Valid for 90 days after activating Model Studio

qwen3-coder-480b-a35b-instruct

qwen3-coder-30b-a3b-instruct

The models above use tiered billing based on the number of input tokens per request.

| Model | Input tokens per request | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
| --- | --- | --- | --- |
| qwen3-coder-next | 0 < Tokens ≤ 32K | $0.3 | $1.5 |
| qwen3-coder-next | 32K < Tokens ≤ 128K | $0.5 | $2.5 |
| qwen3-coder-next | 128K < Tokens ≤ 256K | $0.8 | $4 |
| qwen3-coder-480b-a35b-instruct | 0 < Tokens ≤ 32K | $1.5 | $7.5 |
| qwen3-coder-480b-a35b-instruct | 32K < Tokens ≤ 128K | $2.7 | $13.5 |
| qwen3-coder-480b-a35b-instruct | 128K < Tokens ≤ 200K | $4.5 | $22.5 |
| qwen3-coder-30b-a3b-instruct | 0 < Tokens ≤ 32K | $0.45 | $2.25 |
| qwen3-coder-30b-a3b-instruct | 32K < Tokens ≤ 128K | $0.75 | $3.75 |
| qwen3-coder-30b-a3b-instruct | 128K < Tokens ≤ 200K | $1.2 | $6 |
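Tiered billing like this is easy to misread, so a worked sketch helps. This calculator uses the international prices for qwen3-coder-480b-a35b-instruct; it assumes the tier chosen by the request's input size applies to all of the request's tokens (flat, not progressive) and reads "32K" as 32 × 1024 tokens, neither of which the tables spell out:

```python
# International-mode tier prices (USD per 1M tokens) for
# qwen3-coder-480b-a35b-instruct, copied from the table above.
K = 1024  # assumption: "32K" means 32 * 1024 tokens
TIERS = [
    (32 * K, 1.5, 7.5),    # 0 < tokens <= 32K
    (128 * K, 2.7, 13.5),  # 32K < tokens <= 128K
    (200 * K, 4.5, 22.5),  # 128K < tokens <= 200K
]

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request, assuming the tier selected by the input
    size applies to every token in the request (flat-rate tiers)."""
    for limit, in_price, out_price in TIERS:
        if input_tokens <= limit:
            return (input_tokens * in_price + output_tokens * out_price) / 1e6
    raise ValueError("input exceeds the 200K maximum tier")
```

For example, a request with 10,000 input tokens and 1,000 output tokens lands in the first tier and costs (10,000 × $1.5 + 1,000 × $7.5) / 1M = $0.0225.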

Global

In the Global deployment mode, endpoints and data storage are located in the US (Virginia) or Germany (Frankfurt) region, and model inference compute resources are dynamically scheduled globally.

Model

Context window

Max input

Max output

Input cost

Output cost

(tokens)

(per 1M tokens)

qwen3-coder-480b-a35b-instruct

262,144

204,800

65,536

Tiered pricing. See details below.

qwen3-coder-30b-a3b-instruct

qwen3-coder-480b-a35b-instruct and qwen3-coder-30b-a3b-instruct use tiered billing based on the number of input tokens per request.

| Model | Input tokens per request | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
| --- | --- | --- | --- |
| qwen3-coder-480b-a35b-instruct | 0 < Tokens ≤ 32K | $0.861 | $3.441 |
| qwen3-coder-480b-a35b-instruct | 32K < Tokens ≤ 128K | $1.291 | $5.161 |
| qwen3-coder-480b-a35b-instruct | 128K < Tokens ≤ 200K | $2.151 | $8.602 |
| qwen3-coder-30b-a3b-instruct | 0 < Tokens ≤ 32K | $0.216 | $0.861 |
| qwen3-coder-30b-a3b-instruct | 32K < Tokens ≤ 128K | $0.323 | $1.291 |
| qwen3-coder-30b-a3b-instruct | 128K < Tokens ≤ 200K | $0.538 | $2.151 |

Chinese Mainland

In the Chinese Mainland deployment mode, endpoints and data storage are in the Beijing region. Model inference compute resources are limited to the Chinese Mainland.

Model

Context window

Max input

Max output

Input cost

Output cost

(tokens)

(per 1M tokens)

qwen3-coder-next

262,144

204,800

65,536

Tiered pricing. See details below.

qwen3-coder-480b-a35b-instruct

qwen3-coder-30b-a3b-instruct

The models above use tiered billing based on the number of input tokens per request.

| Model | Input tokens per request | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
| --- | --- | --- | --- |
| qwen3-coder-next | 0 < Tokens ≤ 32K | $0.144 | $0.574 |
| qwen3-coder-next | 32K < Tokens ≤ 128K | $0.216 | $0.861 |
| qwen3-coder-next | 128K < Tokens ≤ 256K | $0.359 | $1.434 |
| qwen3-coder-480b-a35b-instruct | 0 < Tokens ≤ 32K | $0.861 | $3.441 |
| qwen3-coder-480b-a35b-instruct | 32K < Tokens ≤ 128K | $1.291 | $5.161 |
| qwen3-coder-480b-a35b-instruct | 128K < Tokens ≤ 200K | $2.151 | $8.602 |
| qwen3-coder-30b-a3b-instruct | 0 < Tokens ≤ 32K | $0.216 | $0.861 |
| qwen3-coder-30b-a3b-instruct | 32K < Tokens ≤ 128K | $0.323 | $1.291 |
| qwen3-coder-30b-a3b-instruct | 128K < Tokens ≤ 200K | $0.538 | $2.151 |

More models

Model

Context window

Max input

Max output

Input cost

Output cost

(tokens)

(per 1M tokens)

qwen2.5-coder-32b-instruct

131,072

129,024

8,192

$0.287

$0.861

qwen2.5-coder-14b-instruct

qwen2.5-coder-7b-instruct

$0.144

$0.287

qwen2.5-coder-3b-instruct

32,768

30,720

Limited-time free trial

qwen2.5-coder-1.5b-instruct

qwen2.5-coder-0.5b-instruct

EU

In the EU deployment mode, endpoints and data storage are in the Germany (Frankfurt) region. Model inference compute resources are limited to the EU.

Model

Context window

Max input

Max output

Input cost

Output cost

(tokens)

qwen3-coder-next

262,144

204,800

65,536

Tiered pricing. See details below.

The model above uses tiered billing based on the number of input tokens per request.

| Input tokens per request | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
| --- | --- | --- |
| 0 < Tokens ≤ 32K | $0.3 | $1.5 |
| 32K < Tokens ≤ 128K | $0.5 | $2.5 |
| 128K < Tokens ≤ 256K | $0.8 | $4 |

Text generation - Third-party models

DeepSeek

DeepSeek is a large language model from DeepSeek AI. API reference | Try it online

International

In International deployment mode, the endpoints and data storage are in the Singapore region. Model inference compute resources are dynamically scheduled worldwide, excluding the Chinese mainland.

| Model | Context window (tokens) | Max input (tokens) | Max CoT (tokens) | Max response (tokens) | Input cost (per 1M tokens) | Output cost (per 1M tokens) | Free quota |
| --- | --- | --- | --- | --- | --- | --- | --- |
| deepseek-v3.2 (685B parameters; context cache) | 131,072 | 98,304 | 32,768 | 65,536 | $0.57 | $1.71 | 1 million tokens, valid for 90 days after you activate Model Studio |

Chinese mainland

In Chinese mainland deployment mode, the endpoints and data storage are in the Beijing region. Model inference compute resources are limited to the Chinese mainland.

Model

Context window

Max input

Max CoT

Max response

Input cost

Output cost

(tokens)

(per 1M tokens)

deepseek-v3.2

685B parameter size
context cache
batch calls

131,072

98,304

32,768

65,536

$0.287

$0.431

deepseek-v3.2-exp

685B parameter size

deepseek-v3.1

685B parameter size

$0.574

$1.721

deepseek-r1

685B parameter size
batch calls

16,384

$2.294

deepseek-r1-0528

685B parameter size

deepseek-v3

671B parameter size
batch calls

131,072

N/A

$0.287

$1.147

deepseek-r1-distill-qwen-1.5b

Based on Qwen2.5-Math-1.5B

32,768

32,768

16,384

16,384

Free trial for a limited time

deepseek-r1-distill-qwen-7b

Based on Qwen2.5-Math-7B

$0.072

$0.144

deepseek-r1-distill-qwen-14b

Based on Qwen2.5-14B

$0.144

$0.431

deepseek-r1-distill-qwen-32b

Based on Qwen2.5-32B

$0.287

$0.861

deepseek-r1-distill-llama-8b

Based on Llama-3.1-8B

Free trial for a limited time

deepseek-r1-distill-llama-70b

Based on Llama-3.3-70B

Kimi

Kimi-K2 is a large language model from Moonshot AI with excellent coding and tool-calling capabilities. How to use | Try it online

Note

Only the Chinese Mainland deployment mode is supported. Endpoints and data storage are located in the Beijing region, and model inference compute resources are restricted to the Chinese Mainland.

| Model | Mode | Context window (tokens) | Max input (tokens) | Max CoT (tokens) | Max response (tokens) | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| kimi-k2.5 | Thinking | 262,144 | 258,048 | 81,920 | 98,304 | $0.574 | $3.011 |
| kimi-k2.5 | Non-thinking | 262,144 | 260,096 | - | 98,304 | $0.574 | $3.011 |
| kimi-k2-thinking | Thinking | 262,144 | 229,376 | 32,768 | 16,384 | $0.574 | $2.294 |
| Moonshot-Kimi-K2-Instruct | Non-thinking | 131,072 | 131,072 | - | 8,192 | $0.574 | $2.294 |

MiniMax

MiniMax is a large language model from MiniMax that focuses on complex, real-world tasks. Its core strengths include multilingual programming and agent task processing. How to use

Note

Only the Chinese Mainland deployment mode is supported. Endpoints and data storage are located in the Beijing region, and model inference compute resources are restricted to the Chinese Mainland.

| Model | Context window (tokens) | Max input (tokens) | Max CoT + response (tokens) | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
| --- | --- | --- | --- | --- | --- |
| MiniMax-M2.5 | 196,608 | 196,601 | 32,768 | $0.304 | $1.213 |

The thinking_budget parameter is not supported.

GLM

The GLM series models are hybrid reasoning models from Zhipu AI designed for agents. They offer both thinking and non-thinking modes. GLM

Note

Only the Chinese Mainland deployment mode is supported. Endpoints and data storage are located in the Beijing region, and model inference compute resources are restricted to the Chinese Mainland.

Model

Context window

Max input

Max CoT

Max response

Input cost

Output cost

(tokens)

(per 1M tokens)

glm-5

202,752

202,752

32,768

16,384

Tiered billing applies. See the table below.

glm-4.7

169,984

glm-4.6

These models use a tiered billing plan based on the number of input tokens per request.

| Model | Input tokens per request | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
| --- | --- | --- | --- |
| glm-5 | 0 < Tokens ≤ 32K | $0.573 | $2.58 |
| glm-5 | 32K < Tokens ≤ 198K | $0.86 | $3.154 |
| glm-4.7 | 0 < Tokens ≤ 32K | $0.431 | $2.007 |
| glm-4.7 | 32K < Tokens ≤ 166K | $0.574 | $2.294 |
| glm-4.6 | 0 < Tokens ≤ 32K | $0.431 | $2.007 |
| glm-4.6 | 32K < Tokens ≤ 166K | $0.574 | $2.294 |

The models above are not integrated third-party services. They are deployed on Alibaba Cloud Model Studio servers.
The thinking and non-thinking modes for GLM models have the same price.
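Since both modes cost the same, the mode choice is purely a quality/latency tradeoff per request. A minimal sketch of toggling it when calling through an OpenAI-compatible client; the `enable_thinking` flag in `extra_body` is an assumption about how Model Studio exposes the switch, not a confirmed parameter name:

```python
def glm_request(prompt: str, thinking: bool) -> dict:
    """Chat-completion parameters for glm-4.6 with thinking on or off.

    Both modes are billed at the same rate, so the flag only trades
    answer depth against latency.
    """
    return {
        "model": "glm-4.6",
        "messages": [{"role": "user", "content": prompt}],
        # Assumption: enable_thinking is the extension flag accepted via
        # extra_body on the OpenAI-compatible endpoint.
        "extra_body": {"enable_thinking": thinking},
    }
```

Pass the resulting dict to `client.chat.completions.create(**glm_request(...))` with a client pointed at the Beijing endpoint.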

Image generation

Qwen text-to-image

The Qwen text-to-image model excels at complex text rendering, especially for Chinese and English text. API reference

International

In international deployment mode, the endpoint and data storage are both located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding the Chinese mainland).

Model

Unit price

Free quota

qwen-image-2.0-pro

Currently has the same capabilities as qwen-image-2.0-pro-2026-03-03

$0.075/image

Free quota for new users: 100 images each

Validity: within 90 days after activating Model Studio

qwen-image-2.0-pro-2026-03-03

$0.075/image

qwen-image-2.0

Currently has the same capabilities as qwen-image-2.0-2026-03-03

$0.035/image

qwen-image-2.0-2026-03-03

$0.035/image

qwen-image-max

Currently has the same capabilities as qwen-image-max-2025-12-30

$0.075/image

qwen-image-max-2025-12-30

$0.075/image

qwen-image-plus

Currently has the same capabilities as qwen-image

$0.03/image

qwen-image-plus-2026-01-09

$0.03/image

qwen-image

$0.035/image

Chinese mainland

In Chinese mainland deployment mode, the endpoint and data storage are both located in the Beijing region. Model inference compute resources are limited to the Chinese mainland.

Model

Unit price

Free quota

qwen-image-2.0-pro

Currently has the same capabilities as qwen-image-2.0-pro-2026-03-03

$0.071676/image

No free quota

qwen-image-2.0-pro-2026-03-03

$0.071676/image

qwen-image-2.0

Currently has the same capabilities as qwen-image-2.0-2026-03-03

$0.028671/image

qwen-image-2.0-2026-03-03

$0.028671/image

qwen-image-max

Currently has the same capabilities as qwen-image-max-2025-12-30

$0.071677/image

qwen-image-max-2025-12-30

$0.071677/image

qwen-image-plus

Currently has the same capabilities as qwen-image

$0.028671/image

qwen-image-plus-2026-01-09

$0.028671/image

qwen-image

$0.035/image

Input prompt example:

Healing-style hand-drawn poster featuring three puppies playing with a ball on lush green grass, adorned with decorative elements such as birds and stars. The main title “Come Play Ball!” is prominently displayed at the top in bold, blue cartoon font. Below it, the subtitle “Come [Show Off Your Skills]!” appears in green font. A speech bubble adds playful charm with the text: “Hehe, watch me amaze my little friends next!” At the bottom, supplementary text reads: “We get to play ball with our friends again!” The color palette centers on fresh greens and blues, accented with bright pink and yellow tones to highlight a cheerful, childlike atmosphere.

Qwen image editing

The Qwen image editing model supports precise bilingual (Chinese and English) text editing, color grading, detail enhancement, style transfer, object addition or removal, position changes, action modifications, and other operations to enable complex image-and-text editing. API reference

International

In international deployment mode, the endpoint and data storage are both located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding the Chinese mainland).

Model

Unit price

Free quota

qwen-image-2.0-pro

Currently has the same capabilities as qwen-image-2.0-pro-2026-03-03

$0.075/image

Free quota for new users: 100 images each

Validity: within 90 days after activating Model Studio

qwen-image-2.0-pro-2026-03-03

$0.075/image

qwen-image-2.0

Currently has the same capabilities as qwen-image-2.0-2026-03-03

$0.035/image

qwen-image-2.0-2026-03-03

$0.035/image

qwen-image-edit-max

Currently has the same capabilities as qwen-image-edit-max-2026-01-16

$0.075/image

qwen-image-edit-max-2026-01-16

$0.075/image

qwen-image-edit-plus

Currently has the same capabilities as qwen-image-edit-plus-2025-10-30

$0.03/image

qwen-image-edit-plus-2025-12-15

$0.03/image

qwen-image-edit-plus-2025-10-30

$0.03/image

qwen-image-edit

$0.045/image

Chinese mainland

In Chinese mainland deployment mode, the endpoint and data storage are both located in the Beijing region. Model inference compute resources are limited to the Chinese mainland.

Model

Unit price

Free quota

qwen-image-2.0-pro

Currently has the same capabilities as qwen-image-2.0-pro-2026-03-03

$0.071676/image

No free quota

qwen-image-2.0-pro-2026-03-03

$0.071676/image

qwen-image-2.0

Currently has the same capabilities as qwen-image-2.0-2026-03-03

$0.028671/image

qwen-image-2.0-2026-03-03

$0.028671/image

qwen-image-edit-max

Currently has the same capabilities as qwen-image-edit-max-2026-01-16

$0.071677/image

qwen-image-edit-max-2026-01-16

$0.071677/image

qwen-image-edit-plus

Currently has the same capabilities as qwen-image-edit-plus-2025-10-30

$0.028671/image

qwen-image-edit-plus-2025-12-15

$0.028671/image

qwen-image-edit-plus-2025-10-30

$0.028671/image

qwen-image-edit

$0.043/image

Example edits (each pair shows the original image, the edit instruction, and the edited result):

  • Change the person in the image to a standing pose, bending over to hold the dog's front paws.

  • Replace the words 'HEALTH INSURANCE' on the letter blocks with 'Tomorrow will be better'.

  • Replace the polka-dot shirt with a light blue shirt.

  • Change the background in the image to Antarctica.

  • Generate a cartoon profile picture of the person.

  • Remove hair from the plate.

Qwen image translation

The Qwen image translation model supports translating text in images from 11 languages into Chinese or English. It accurately preserves the original layout and content and offers customizable features such as glossary definitions, sensitive word filtering, and image entity detection. API reference

Note

Only the Chinese mainland deployment mode is supported. The endpoint and data storage are located in the Beijing region, and model inference computing resources are restricted to the Chinese mainland.

Model

Unit price

Free quota

qwen-mt-image

$0.000431/image

No free quota

Translation examples show an original English image alongside translated versions in Japanese, Portuguese, and Arabic.

Z-Image

Z-Image is a lightweight Tongyi text-to-image model that quickly generates high-quality images. The model supports Chinese and English text rendering, complex semantic understanding, various styles, and multiple resolutions and aspect ratios. API reference

International

In international deployment mode, the endpoint and data storage are both located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding the Chinese mainland).

Model

Unit price

Free quota (Note)

Valid for 90 days after activating Model Studio

z-image-turbo

Prompt extension disabled (prompt_extend=false): $0.015/image

Prompt extension enabled (prompt_extend=true): $0.03/image

100 images

Chinese mainland

In Chinese mainland deployment mode, the endpoint and data storage are both located in the Beijing region. Model inference compute resources are limited to the Chinese mainland.

Model

Unit price

Free quota

z-image-turbo

Prompt extension disabled (prompt_extend=false): $0.01434/image

Prompt extension enabled (prompt_extend=true): $0.02868/image

No free quota
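As the tables above show, enabling prompt extension for z-image-turbo doubles the per-image price in both deployment modes. A minimal sketch of the arithmetic, using the prices listed above (the dictionary and helper name are illustrative, not part of any SDK):

```python
# Per-image prices (USD) for z-image-turbo, copied from the tables above.
# Keys: (deployment_mode, prompt_extend) -> USD per image.
Z_IMAGE_TURBO_PRICES = {
    ("international", False): 0.015,
    ("international", True): 0.03,
    ("cn", False): 0.01434,
    ("cn", True): 0.02868,
}

def z_image_cost(images: int, mode: str = "international",
                 prompt_extend: bool = False) -> float:
    """Estimated charge in USD for a batch of generated images."""
    return round(images * Z_IMAGE_TURBO_PRICES[(mode, prompt_extend)], 6)

# 100 images with prompt extension, international deployment mode:
print(z_image_cost(100, "international", prompt_extend=True))
```

Whether `prompt_extend` is worth the doubled price depends on how polished your own prompts already are.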

Input prompt

Output image

Photo of a stylish young woman with short black hair standing confidently in front of a vibrant cartoon-style mural wall. She wears an all-black outfit: a puffed bomber jacket with a ruffled collar, cargo shorts, fishnet tights, and chunky black Doc Martens, with a gold chain dangling from her waist. The background features four colorful comic-style panels: one reads “GRAND STAGE” and includes sneakers and a Gatorade bottle; another displays green Nike sneakers and a slice of pizza; the third reads “HARAJUKU st” with floating shoes; and the fourth shows a blue mouse riding a skateboard with the text “Takeshita WELCOME.” Dominant bright colors include yellow, teal, orange, pink, and green. Speech bubbles, halftone patterns, and playful characters enhance the urban street-art aesthetic. Daylight evenly illuminates the scene, and the ground beneath her feet is white tiled pavement. Full-body portrait, centered composition, slightly tilted stance, direct eye contact with the camera. High detail, sharp focus, dynamic framing.


Wan text-to-image

The Wan text-to-image model generates high-quality images from text. API reference | Try online

Global

In the Global deployment mode, endpoint and data storage are located in the US (Virginia) region or Germany (Frankfurt) region, and model inference computing resources are dynamically scheduled globally.

wan2.6-t2i currently supports the Global deployment mode only in the US (Virginia) region.

Model

Description

Unit price

Free quota (Note)

Valid for 90 days after activating Model Studio

wan2.6-t2i Recommended

Wan 2.6. Supports new synchronous interfaces and lets you freely select dimensions within the constraints of total pixel area and aspect ratio.

$0.028671/image

No free quota

International

In international deployment mode, the endpoint and data storage are both located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding the Chinese mainland).

Model

Description

Unit price

Free quota (Note)

Valid for 90 days after activating Model Studio

wan2.6-t2i Recommended

Wan 2.6. Supports new synchronous interfaces and lets you freely select dimensions within the constraints of total pixel area and aspect ratio.

$0.03/image

50 images

wan2.5-t2i-preview Recommended

Wan 2.5 preview. Removes single-side length limits and lets you freely select dimensions within the constraints of total pixel area and aspect ratio.

$0.03/image

50 images

wan2.2-t2i-plus

Wan 2.2 Professional Edition. Fully upgraded in creativity, stability, and realistic texture.

$0.05/image

100 images

wan2.2-t2i-flash

Wan 2.2 Flash Edition. Fully upgraded in creativity, stability, and realistic texture.

$0.025/image

100 images

wan2.1-t2i-plus

Wan 2.1 Professional Edition. Supports multiple styles and generates images with rich details.

$0.05/image

200 images

wan2.1-t2i-turbo

Wan 2.1 Turbo Edition. Supports multiple styles and offers fast generation speed.

$0.025/image

200 images

Chinese mainland

In Chinese mainland deployment mode, the endpoint and data storage are both located in the Beijing region. Model inference compute resources are limited to the Chinese mainland.

Model

Description

Unit price

Free quota (Note)

Valid for 90 days after activating Model Studio

wan2.6-t2i Recommended

Wan 2.6. Supports new synchronous interfaces and lets you freely select dimensions within the constraints of total pixel area and aspect ratio.

$0.028671/image

No free quota

wan2.5-t2i-preview Recommended

Wan 2.5 preview. Removes single-side length limits and lets you freely select dimensions within the constraints of total pixel area and aspect ratio.

$0.028671/image

No free quota

wan2.2-t2i-plus

Wan 2.2 Professional Edition. Fully upgraded in creativity, stability, and realistic texture.

$0.02007/image

No free quota

wan2.2-t2i-flash

Wan 2.2 Flash Edition. Fully upgraded in creativity, stability, and realistic texture.

$0.028671/image

No free quota

wanx2.1-t2i-plus

Wan 2.1 Professional Edition. Supports multiple styles and generates images with rich details.

$0.028671/image

No free quota

wanx2.1-t2i-turbo

Wan 2.1 Turbo Edition. Supports multiple styles and offers fast generation speed.

$0.020070/image

No free quota

wanx2.0-t2i-turbo

Wan 2.0 Turbo Edition. Excels at textured portraits and creative designs. It is cost-effective.

$0.005735/image

No free quota

Input prompt

Output image

A needle-felted Santa Claus holding a gift, standing next to a white cat, with many colorful presents in the background. The entire scene should be cute, warm, and cozy, with some green plants in the background.


Wan image generation and editing 2.6

The Wan image generation model supports image editing and mixed text-and-image output to meet diverse generation and integration needs. API reference

Global

In the Global deployment mode, endpoint and data storage are located in the US (Virginia) region or Germany (Frankfurt) region, and model inference computing resources are dynamically scheduled globally.

wan2.6-image currently supports the Global deployment mode only in the US (Virginia) region.

Model

Unit price

Free quota

wan2.6-image

$0.028671/image

No free quota

International

In international deployment mode, the endpoint and data storage are both located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding the Chinese mainland).

Model

Unit price

Free quota (Note)

Valid for 90 days after activating Model Studio

wan2.6-image

$0.03/image

50 images

Chinese mainland

In Chinese mainland deployment mode, the endpoint and data storage are both located in the Beijing region. Model inference compute resources are limited to the Chinese mainland.

Model

Unit price

Free quota

wan2.6-image

$0.028671/image

No free quota

Wan general image editing 2.5

The Wan general image editing 2.5 model supports inputting text, a single image, or multiple images to perform subject-consistent image editing and multi-image fusion creation. API reference

International

In international deployment mode, the endpoint and data storage are both located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding the Chinese mainland).

Model

Unit price

Free quota (Note)

Valid for 90 days after activating Model Studio

wan2.5-i2i-preview

$0.03/image

50 units

Chinese mainland

In Chinese mainland deployment mode, the endpoint and data storage are both located in the Beijing region. Model inference compute resources are limited to the Chinese mainland.

Model

Unit price

Free quota

wan2.5-i2i-preview

$0.028671/image

No free quota

Model capabilities

Each example shows the input image(s), the edit instruction, and the output image:

  • Single-image editing: Replace the floral dress with a vintage-style lace gown featuring delicate embroidery on the collar and cuffs.

  • Multi-image fusion: Place the alarm clock from image 1 beside the vase on the dining table in image 2.

Wan general image editing 2.1

The Wan general image editing model supports diverse image editing tasks using simple instructions. Use it for image outpainting, watermark removal, style transfer, image inpainting, and image enhancement. Usage | API reference

Note

Only the Chinese mainland deployment mode is supported. The endpoint and data storage are located in the Beijing region, and model inference computing resources are restricted to the Chinese mainland.

Model

Billing rate

Free quota

wanx2.1-imageedit

$0.020070 per image

No free quota

The general image editing model currently supports the following features:

Each example shows the model feature, the input image, the prompt, and the output image:

  • Global stylization: Transform into a French picture book style.

  • Local stylization: Turn the house into a wooden plank style.

  • Instruction-based editing: Change the girl's hair to red.

  • Local redraw (requires a masked area image; white indicates the masked area): A ceramic rabbit holding a ceramic flower.

  • Text watermark removal: Remove text from the image.

  • Image outpainting: A green fairy.

  • Image super resolution (blurred image in, sharp image out): Apply super resolution.

  • Image colorization: Blue background with yellow leaves.

  • Sketch-to-image: A Nordic minimalist living room.

  • Reference image: A cartoon character cautiously peeks out, gazing at a brilliant blue gem inside the room.

OutfitAnyone

  • The OutfitAnyone Plus model improves image definition, clothing texture detail, and logo fidelity compared to the Basic Edition. However, it takes longer to generate results. Use it for scenarios where speed is not critical. API reference | Try it online

  • OutfitAnyone Image Segmentation parses model images and clothing images. Use it for pre-processing and post-processing of OutfitAnyone images. API reference

Note

Only the Chinese mainland deployment mode is supported. The endpoint and data storage are located in the Beijing region, and model inference computing resources are restricted to the Chinese mainland.

Model

Description

Sample input

Sample output

aitryon-plus

OutfitAnyone Plus


aitryon-parsing-v1

OutfitAnyone Image Segmentation

OutfitAnyone billing rate

Model service

Model

Unit price

Discount

Tier

OutfitAnyone Plus

aitryon-plus

$0.071677 per image

None

None

OutfitAnyone Image Segmentation

aitryon-parsing-v1

$0.000574 per image

None

None

Video generation – Wan

Text-to-video

The Wan text-to-video model generates videos from a single sentence. Videos feature rich artistic styles and cinematic-quality visuals. API reference | Try it now

Global

In the Global deployment mode, endpoint and data storage are located in the US (Virginia) region or Germany (Frankfurt) region, and model inference computing resources are dynamically scheduled globally.

The wan2.6-t2v model supports the Global deployment mode only in the US (Virginia) region.

Model

Description

Unit price

Free quota

wan2.6-t2v Recommended

Wan 2.6. Introduces multi-shot narrative capability and supports automatic voiceover and custom audio file input.

720P: $0.086012/second

1080P: $0.143353/second

No free quota

International

In international deployment mode, both the access point and data storage are located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding the Chinese mainland).

Model

Description

Unit price

Free quota (Claim)

Valid for 90 days after activating Model Studio

wan2.6-t2v Recommended

Wan 2.6. Introduces multi-shot narrative capability and supports automatic voiceover and custom audio file input.

720P: $0.10/second

1080P: $0.15/second

50 seconds

wan2.5-t2v-preview Recommended

Wan 2.5 preview. Supports automatic voiceover and custom audio file input.

480P: $0.05/second

720P: $0.10/second

1080P: $0.15/second

50 seconds

wan2.2-t2v-plus

Wan 2.2 Professional Edition. Significantly improved image detail and motion stability.

480P: $0.02/second

1080P: $0.10/second

50 seconds

wan2.1-t2v-turbo

Wan 2.1 Turbo Edition. Fast generation speed and balanced performance.

$0.036/second

200 seconds

wan2.1-t2v-plus

Wan 2.1 Professional Edition. Generates rich details and higher-quality visuals.

$0.10/second

200 seconds
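Text-to-video generation is billed per output second, at a rate that depends on the model and resolution. A small cost-estimation sketch using the international prices in the table above (the lookup table and function name are illustrative only, not part of any SDK):

```python
# International per-second prices (USD) for Wan text-to-video,
# copied from the table above. Keys: (model, resolution).
T2V_PRICES = {
    ("wan2.6-t2v", "720P"): 0.10,
    ("wan2.6-t2v", "1080P"): 0.15,
    ("wan2.5-t2v-preview", "480P"): 0.05,
    ("wan2.5-t2v-preview", "720P"): 0.10,
    ("wan2.5-t2v-preview", "1080P"): 0.15,
    ("wan2.2-t2v-plus", "480P"): 0.02,
    ("wan2.2-t2v-plus", "1080P"): 0.10,
}

def t2v_cost(model: str, resolution: str, seconds: float) -> float:
    """Estimated charge in USD for one generated video of the given length."""
    return round(T2V_PRICES[(model, resolution)] * seconds, 6)

# A 10-second 1080P clip on wan2.6-t2v:
print(t2v_cost("wan2.6-t2v", "1080P", 10))
```

wan2.1-t2v-turbo and wan2.1-t2v-plus use a single per-second rate regardless of resolution, so they are omitted from the resolution-keyed table.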

US

In US deployment mode, both the access point and data storage are located in the US (Virginia) region. Model inference compute resources are limited to the United States.

Model

Description

Unit price

Free quota

wan2.6-t2v-us Recommended

Wan 2.6. Introduces multi-shot narrative capability and supports automatic voiceover and custom audio file input.

720P: $0.10/second

1080P: $0.15/second

No free quota

Chinese mainland

In Chinese mainland deployment mode, both the access point and data storage are located in the Beijing region. Model inference compute resources are limited to the Chinese mainland.

Model

Description

Unit price

Free quota

wan2.6-t2v Recommended

Wan 2.6. Introduces multi-shot narrative capability and supports automatic voiceover and custom audio file input.

720P: $0.086012/second

1080P: $0.143353/second

No free quota

wan2.5-t2v-preview Recommended

Wan 2.5 preview. Supports automatic voiceover and custom audio file input.

480P: $0.043006/second

720P: $0.086012/second

1080P: $0.143353/second

No free quota

wan2.2-t2v-plus

Wan 2.2 Professional Edition. Significantly improved image detail and motion stability.

480P: $0.02007/second

1080P: $0.100347/second

No free quota

wanx2.1-t2v-turbo

Faster generation speed and balanced performance.

$0.034405/second

No free quota

wanx2.1-t2v-plus

Generates richer details and higher-quality visuals.

$0.100347/second

No free quota

Input prompt

Output video (wan2.6, multi-shot video)

Shot from a low angle, in a medium close-up, with warm tones, mixed lighting (the practical light from the desk lamp blends with the overcast light from the window), side lighting, and a central composition. In a classic detective office, wooden bookshelves are filled with old case files and ashtrays. A green desk lamp illuminates a case file spread out in the center of the desk. A fox, wearing a dark brown trench coat and a light gray fedora, sits in a leather chair, its fur crimson, its tail resting lightly on the edge, its fingers slowly turning yellowed pages. Outside, a steady drizzle falls beneath a blue sky, streaking the glass with meandering streaks. It slowly raises its head, its ears twitching slightly, its amber eyes gazing directly at the camera, its mouth clearly moving as it speaks in a smooth, cynical voice: 'The case was cold, colder than a fish in winter. But every chicken has its secrets, and I, for one, intended to find them'.

Image-to-video – first frame

The Wan image-to-video model uses your input image as the first frame, then generates a video based on your prompt. Videos feature rich artistic styles and cinematic-quality visuals. API reference | Try it now

Global

In the Global deployment mode, endpoint and data storage are located in the US (Virginia) region or Germany (Frankfurt) region, and model inference computing resources are dynamically scheduled globally.

The wan2.6-i2v model supports the Global deployment mode only in the US (Virginia) region.

Model

Description

Unit price

Free quota

wan2.6-i2v Recommended

Wan 2.6. Introduces multi-shot narrative capability and supports automatic voiceover and custom audio file input.

720P: $0.086012/second

1080P: $0.143353/second

No free quota

International

In international deployment mode, both the access point and data storage are located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding the Chinese mainland).

Model

Description

Unit price

Free quota (Note)

Valid for 90 days after activating Model Studio

wan2.6-i2v-flash Recommended

Wan 2.6. Introduces multi-shot narrative capability and supports automatic voiceover and custom audio file input.

Output video with audio (audio=true):

  • 720P: $0.05/second

  • 1080P: $0.075/second

Output video without audio (audio=false):

  • 720P: $0.025/second

  • 1080P: $0.0375/second

50 seconds

wan2.6-i2v Recommended

Wan 2.6. Introduces multi-shot narrative capability and supports automatic voiceover and custom audio file input.

720P: $0.10/second

1080P: $0.15/second

50 seconds

wan2.5-i2v-preview Recommended

Wan 2.5 preview. Supports automatic dubbing and custom audio file uploads.

480P: $0.05/second

720P: $0.10/second

1080P: $0.15/second

50 seconds

wan2.2-i2v-flash

Wan 2.2 Flash Edition. Extremely fast generation speed with significant improvements in visual detail and motion stability.

480P: $0.015/second

720P: $0.036/second

50 seconds

wan2.2-i2v-plus

Wan 2.2 Professional Edition. Delivers significant improvements in visual detail and motion stability.

480P: $0.02/second

1080P: $0.10/second

50 seconds

wan2.1-i2v-turbo

Wan 2.1 Turbo Edition. Fast generation speed with balanced performance.

$0.036/second

200 seconds

wan2.1-i2v-plus

Wan 2.1 Professional Edition. Generates rich details and produces higher-quality, more textured visuals.

$0.10/second

200 seconds
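For wan2.6-i2v-flash, requesting a silent video (audio=false) halves the per-second price at both resolutions. A sketch of the trade-off using the international prices above (the dictionary and function name are illustrative, not part of any SDK):

```python
# International per-second prices (USD) for wan2.6-i2v-flash,
# copied from the table above. Keys: (resolution, audio_enabled).
I2V_FLASH_PRICES = {
    ("720P", True): 0.05,     # audio=true
    ("720P", False): 0.025,   # audio=false
    ("1080P", True): 0.075,
    ("1080P", False): 0.0375,
}

def i2v_flash_cost(seconds: float, resolution: str = "720P",
                   audio: bool = True) -> float:
    """Estimated charge in USD for one generated video."""
    return round(I2V_FLASH_PRICES[(resolution, audio)] * seconds, 6)

# A 20-second 1080P video, with and without an audio track:
print(i2v_flash_cost(20, "1080P", audio=True))
print(i2v_flash_cost(20, "1080P", audio=False))
```

If you plan to replace the soundtrack in post-production anyway, setting audio=false is the cheaper call.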

US

In US deployment mode, both the access point and data storage are located in the US (Virginia) region. Model inference compute resources are limited to the United States.

Model

Description

Unit price

Free quota

wan2.6-i2v-us Recommended

Wan 2.6. Introduces multi-shot narrative capability and supports automatic voiceover and custom audio file input.

720P: $0.10/second

1080P: $0.15/second

No free quota

Chinese mainland

In Chinese mainland deployment mode, both the access point and data storage are located in the Beijing region. Model inference compute resources are limited to the Chinese mainland.

Model

Description

Unit price

Free quota

wan2.6-i2v-flash Recommended

Wan 2.6. Introduces multi-shot narrative capability and supports automatic voiceover and custom audio file input.

Output video with audio (audio=true):

  • 720P: $0.043006/second

  • 1080P: $0.071676/second

Output video without audio (audio=false):

  • 720P: $0.021503/second

  • 1080P: $0.035838/second

No free quota

wan2.6-i2v Recommended

Wan 2.6. Introduces multi-shot narrative capability and supports automatic voiceover and custom audio file input.

720P: $0.086012/second

1080P: $0.143353/second

No free quota

wan2.5-i2v-preview

Wan 2.5 preview. Supports automatic dubbing and custom audio file uploads.

480P: $0.043006/second

720P: $0.086012/second

1080P: $0.143353/second

No free quota

wan2.2-i2v-plus

Wan 2.2 Professional Edition. Delivers significant improvements in visual detail and motion stability.

480P: $0.02007/second

1080P: $0.100347/second

No free quota

wanx2.1-i2v-turbo

Wan 2.1 Turbo Edition. Fast generation speed with balanced performance.

$0.034405/second

No free quota

wanx2.1-i2v-plus

Wan 2.1 Professional Edition. Generates rich details and produces higher-quality, more textured visuals.

$0.100347/second

No free quota

Input prompt

Input first-frame image and audio

Output video (wan2.6, multi-shot video)

A scene of urban fantasy art. A dynamic graffiti-style character. A boy painted with spray paint comes alive from a concrete wall. He raps in English at high speed while striking a classic, energetic rapper pose. The setting is under a railway bridge in an urban area at night. Lighting comes from a single streetlamp, creating a cinematic atmosphere full of high energy and stunning detail. The video's audio consists entirely of his rap, with no other dialogue or noise.


Image-to-video – first and last frame

The Wan first-and-last-frame image-to-video model generates smooth, fluid videos using just two images—the first and last frames—plus your prompt. Videos feature rich artistic styles and cinematic-quality visuals. API reference | Try it now

International

In international deployment mode, both the access point and data storage are located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding the Chinese mainland).

Model

Unit price

Free quota (Note)

Valid for 90 days after activating Model Studio

wan2.2-kf2v-flash

480P: $0.015/second

720P: $0.036/second

1080P: $0.07/second

50 seconds

wan2.1-kf2v-plus

$0.10/second

200 seconds

Chinese mainland

In Chinese mainland deployment mode, both the access point and data storage are located in the Beijing region. Model inference compute resources are limited to the Chinese mainland.

Model

Unit price

Free quota (Note)

wan2.2-kf2v-flash

480P: $0.014335/second

720P: $0.028671/second

1080P: $0.068809/second

No free quota

wanx2.1-kf2v-plus

$0.100347/second

No free quota

Example input

Output video

First-frame image

Last-frame image

Prompt


Realistic style. A black kitten curiously looks up at the sky. The camera starts level, rises gradually, and ends with a top-down view of the kitten’s curious expression.

Reference-to-video

The Wan reference-to-video model lets you generate performance videos using characters and voices from reference videos or images. API reference

Billing rule: Both input and output videos are billed by video duration in seconds. Failed requests are not billed and do not consume your free quota.

  • Input video duration is capped at 5 seconds. See Wan reference-to-video for details.

  • Output video duration equals the duration of the successfully generated video.
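Under this rule, the billed duration is the capped input duration plus the output duration, charged at the model's listed per-second rate. A sketch of the arithmetic (the helper name is ours; it assumes input and output seconds share the same listed rate, per the "Input & output price" column):

```python
INPUT_CAP_SECONDS = 5  # input reference video duration is capped at 5 seconds

def r2v_billed_cost(input_seconds: float, output_seconds: float,
                    rate_per_second: float) -> float:
    """Estimated charge in USD: input and output duration are both billed.

    Failed requests are not billed and do not consume the free quota,
    so this models successful requests only.
    """
    billed_seconds = min(input_seconds, INPUT_CAP_SECONDS) + output_seconds
    return round(billed_seconds * rate_per_second, 6)

# wan2.6-r2v-flash, 720P with audio ($0.05/second, international):
# an 8-second reference clip is billed as only 5 seconds of input.
print(r2v_billed_cost(8, 10, 0.05))
```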

Global

In the Global deployment mode, endpoint and data storage are located in the US (Virginia) region or Germany (Frankfurt) region, and model inference computing resources are dynamically scheduled globally.

The wan2.6-r2v model supports the Global deployment mode only in the US (Virginia) region.

Model

Output video type

Input & output price

Free quota (Note)

wan2.6-r2v

Video with audio

720P: $0.086012/second

1080P: $0.143353/second

No free quota

International

In international deployment mode, both the access point and data storage are located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding the Chinese mainland).

Model

Output video type

Input & output price

Free quota (Note)

wan2.6-r2v-flash Recommended

Video with audio

audio=true

720P: $0.05/second

1080P: $0.075/second

50 seconds

Valid for 90 days after activating Model Studio

Video without audio

audio=false

720P: $0.025/second

1080P: $0.0375/second

wan2.6-r2v

Video with audio

720P: $0.10/second

1080P: $0.15/second

50 seconds

Valid for 90 days after activating Model Studio

Chinese mainland

In Chinese mainland deployment mode, both the access point and data storage are located in the Beijing region. Model inference compute resources are limited to the Chinese mainland.

Model

Output video type

Input & output price

Free quota (Note)

wan2.6-r2v-flash Recommended

Video with audio

audio=true

720P: $0.043006/second

1080P: $0.071676/second

No free quota

Video without audio

audio=false

720P: $0.021503/second

1080P: $0.035838/second

wan2.6-r2v

Video with audio

720P: $0.086012/second

1080P: $0.143353/second

No free quota

General video editing

The Wan general video editing unified model accepts multimodal inputs—including text, images, and videos—and performs both video generation and general editing tasks. API reference | Try it now

International

In international deployment mode, both the access point and data storage are located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding the Chinese mainland).

Model

Unit price

Free quota (Note)

wan2.1-vace-plus

$0.10/second

50 seconds

Valid for 90 days after activating Model Studio

Chinese mainland

In Chinese mainland deployment mode, both the access point and data storage are located in the Beijing region. Model inference compute resources are limited to the Chinese mainland.

Model

Unit price

Free quota (Note)

wanx2.1-vace-plus

$0.100347/second

No free quota

The unified video editing model supports these features (each example shows the input reference material, the prompt, and the output video):

  • Multi-image reference (reference image 1 provides the entity, reference image 2 the background): In the video, a girl gracefully walks out from the depths of an ancient, misty forest. Her steps are light, and the camera captures her every nimble movement. When the girl stops and looks around at the lush woods, she breaks into a smile of surprise and joy. This moment is captured in the interplay of light and shadow, recording the wonderful encounter between the girl and nature.

  • Video restyling: The video shows a black steampunk-style car driven by a gentleman, adorned with gears and copper pipes. The background is a steam-powered candy factory with retro elements, creating a vintage and playful scene.

  • Local editing (requires an input mask image; the white area marks the editing region, whose content is modified based on the prompt): The video shows a Parisian-style French cafe where a lion in a suit is elegantly sipping coffee. It holds a coffee cup in one hand, drinking with a look of contentment. The cafe is tastefully decorated, with soft tones and warm lighting illuminating the area where the lion is.

  • Video extension (extends an initial 1-second video segment into a 5-second video): A dog wearing sunglasses skateboards on a street, 3D cartoon.

  • Video outpainting: An elegant lady is passionately playing the violin, with a full symphony orchestra behind her.

Wan – digital human

Generate natural talking, singing, or performing videos from a single portrait image and an audio clip. Call the following models in order: wan2.2-s2v image detection | wan2.2-s2v video generation

Note

Only the Chinese mainland deployment mode is supported. The endpoint and data storage are located in the Beijing region, and model inference computing resources are restricted to the Chinese mainland.

Model

Model description

Unit price

wan2.2-s2v-detect

Checks whether the input image meets requirements (such as clarity, single person, front-facing).

$0.000574/image

wan2.2-s2v

Generates a dynamic portrait video from a validated image and an audio clip.

480P: $0.071677/second

720P: $0.129018/second
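Because the two models are billed separately, the total cost of one digital-human video is a flat per-image detection fee plus a per-second generation fee. A sketch using the prices in the table above (the helper names are ours):

```python
DETECT_PRICE = 0.000574  # wan2.2-s2v-detect, USD per checked image
S2V_PRICES = {           # wan2.2-s2v, USD per second of generated video
    "480P": 0.071677,
    "720P": 0.129018,
}

def s2v_total_cost(audio_seconds: float, resolution: str = "480P") -> float:
    """One detection call on the portrait image, then video generation
    driven by an audio clip of the given length."""
    return round(DETECT_PRICE + S2V_PRICES[resolution] * audio_seconds, 6)

# A 30-second 720P talking-portrait video:
print(s2v_total_cost(30, "720P"))
```

The detection fee is negligible next to the generation fee, but running wan2.2-s2v-detect first avoids paying for a generation run on an image that would fail validation.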

Example input

Output video


Wan – image-to-action

Offers standard and professional service modes. Uses a portrait image and reference video to transfer the video subject’s actions and expressions to the portrait image, generating a dynamic action video. API reference

International

In international deployment mode, both the access point and data storage are located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding the Chinese mainland).

Model

Model service

Service description

Billing unit price

Free quota (View)

wan2.2-animate-move

Standard mode wan-std

Fast generation speed. Meets light needs such as basic animation demos. High cost-effectiveness.

$0.12/second

50 seconds total for both modes

Professional mode wan-pro

High animation smoothness. Natural transitions between actions and expressions. Results closely resemble real filming.

$0.18/second

Chinese mainland

In Chinese mainland deployment mode, both the access point and data storage are located in the Beijing region. Model inference compute resources are limited to the Chinese mainland.

Model

Model service

Service description

Billing unit price

Free quota (View)

wan2.2-animate-move

Standard mode wan-std

Fast generation speed. Meets light needs such as basic animation demos. High cost-effectiveness.

$0.06/second

No free quota

Professional mode wan-pro

High animation smoothness. Natural transitions between actions and expressions. Results closely resemble real filming.

$0.09/second

Portrait image

Reference video

Output video (standard mode)

Output video (professional mode)


Wan – video character swap

Offers standard and professional service modes. Uses a portrait image and reference video to replace the main subject in the video with the portrait image, while preserving the original video’s scene, lighting, and hue. API reference

International

In international deployment mode, both the access point and data storage are located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding the Chinese mainland).

Model

Model service

Service description

Billing unit price

Free quota (View)

wan2.2-animate-mix

Standard mode wan-std

Fast generation speed. Meets light needs such as basic animation demos. High cost-effectiveness.

$0.18/second

50 seconds total for both services

Professional mode wan-pro

High animation smoothness. Natural transitions between actions and expressions. Results closely resemble real filming.

$0.26/second

Chinese mainland

In Chinese mainland deployment mode, both the access point and data storage are located in the Beijing region. Model inference compute resources are limited to the Chinese mainland.

Model

Model service

Service description

Billing unit price

Free quota (View)

wan2.2-animate-mix

Standard mode wan-std

Fast generation speed. Suitable for lightweight needs such as basic animation demos. Highly cost-effective.

$0.09/second

No free quota

Professional mode wan-pro

High animation smoothness. Natural transitions between actions and expressions. Results closely resemble real filming.

$0.13/second

Portrait image

Reference video

Output video (standard mode)

Output video (professional mode)

AnimateAnyone

Generate action videos from a portrait image and action templates. Call the following three models in order. AnimateAnyone image detection API details | AnimateAnyone action template generation | AnimateAnyone video generation API details

Note

Only the Chinese Mainland deployment mode is supported. The endpoints and data storage are located in the Beijing region, and model inference compute resources are limited to the Chinese Mainland.

Model

Description

Unit price

animate-anyone-detect-gen2

Checks whether the input image meets requirements

$0.000574/image

animate-anyone-template-gen2

Extracts human motion from a motion video and generates an action template

$0.011469/second

animate-anyone-gen2

Generates an action video from a portrait image and an action template

Input: Portrait image

Input: Action video

Output (generated against image background)

Output (generated against video background)

Note
  • The examples above were generated by an app that integrates AnimateAnyone.

  • AnimateAnyone generates only video frames, not audio.

EMO

Generate dynamic portrait videos from a portrait image and a human voice audio file. Call the following models in order. EMO image detection | EMO video generation

Note

Only the Chinese Mainland deployment mode is supported. The endpoints and data storage are located in the Beijing region, and model inference compute resources are limited to the Chinese Mainland.

Model

Description

Unit price

emo-detect-v1

Checks whether the input image meets requirements. No deployment needed. Call directly.

$0.000574/image

emo-v1

Generates dynamic portrait videos. No deployment needed. Call directly.

  • 1:1 aspect ratio video: $0.011469/second

  • 3:4 aspect ratio video: $0.022937/second
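
Because EMO is billed per output second at a rate that depends on the aspect ratio, total cost is a simple multiplication. The helper below is a hypothetical sketch for budgeting (the dictionary and function names are illustrative, not part of any API):

```python
# Hypothetical cost estimator based on the EMO per-second rates listed above.
EMO_PRICE_PER_SECOND = {"1:1": 0.011469, "3:4": 0.022937}  # USD per second

def emo_cost(duration_seconds: float, aspect_ratio: str) -> float:
    """Estimated USD cost of generating an EMO video of the given length."""
    return duration_seconds * EMO_PRICE_PER_SECOND[aspect_ratio]

print(round(emo_cost(10, "3:4"), 6))  # a 10-second 3:4 video: 0.22937
```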

Input: Portrait image + human voice audio file

Output: Dynamic portrait video

Portrait:

上春山

Human voice audio: See video on the right

Portrait video:

Animation style intensity: Active ("style_level": "active")

LivePortrait

Quickly and efficiently generate dynamic portrait videos from a portrait image and a human voice audio file. Compared to EMO, LivePortrait offers faster generation at lower cost, with slightly lower quality. Call the following two models in order. LivePortrait image detection | LivePortrait video generation

Note

Only the Chinese Mainland deployment mode is supported. The endpoints and data storage are located in the Beijing region, and model inference compute resources are limited to the Chinese Mainland.

Model

Description

Unit price

liveportrait-detect

Checks whether the input image meets requirements

$0.000574/image

liveportrait

Generates dynamic portrait videos

$0.002868/second

Input: Portrait image + human voice audio file

Output: Dynamic portrait video

Portrait:

Emoji Boy

Human voice audio: See video on the right

Portrait video:

Emoji

Generate dynamic facial videos from a face image and preset facial motion templates. Use cases include emoji creation and video asset generation. Call the following models in order. Emoji image detection | Emoji video generation

Note

Only the Chinese Mainland deployment mode is supported. The endpoints and data storage are located in the Beijing region, and model inference compute resources are limited to the Chinese Mainland.

Model

Description

Unit price

emoji-detect-v1

Checks whether the input image meets requirements

$0.000574/image

emoji-v1

Generates matching facial expressions from a portrait image and a specified emoji template

$0.011469/second

Input: Portrait image

Output: Dynamic portrait video

Template sequence for “happy” expression: ("input.driven_id": "mengwa_kaixin")

VideoRetalk

Generate new videos where the subject’s lip movements match the input audio. Call the following model. API reference

Note

Only the Chinese Mainland deployment mode is supported. The endpoints and data storage are located in the Beijing region, and model inference compute resources are limited to the Chinese Mainland.

Model

Description

Unit price

videoretalk

Generates a new video where the subject’s lip movements match the input audio

$0.011469/second

Video style transfer

Generate videos in different styles based on text input—or apply style transfer to input videos. API reference

Note

Only the Chinese Mainland deployment mode is supported. The endpoints and data storage are located in the Beijing region, and model inference compute resources are limited to the Chinese Mainland.

Model

Description

Unit price

video-style-transform

Converts input videos into Japanese manga, American comic, or other styles

720P

$0.071677/second

540P

$0.028671/second

Input video

Output video (Japanese manga style)

Speech synthesis (text-to-speech)

Qwen speech synthesis

Supports mixed-language text input and streaming audio output. Usage | API reference

International

In the international deployment mode, the endpoints and data storage are located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding Chinese Mainland).

Qwen3-TTS-Instruct-Flash

Model

Version

Unit price

Max input characters

Free quota (Note)

qwen3-tts-instruct-flash

Currently, qwen3-tts-instruct-flash-2026-01-26.

Stable

$0.115 per 10,000 characters

600

10,000 characters

Valid for 90 days after activating Model Studio

qwen3-tts-instruct-flash-2026-01-26

Snapshot

  • Supported languages: Chinese (Mandarin), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese

  • Character calculation rules: Billing is based on the number of input characters. The rules are as follows:

    • One Chinese character (including simplified/traditional Chinese, Japanese Kanji, and Korean Hanja) = 2 characters

    • Other characters, such as an English letter, a punctuation mark, or a space = 1 character
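
In code, this billing rule amounts to weighting each input character. A minimal sketch, assuming the CJK Unified Ideographs Unicode blocks (plus Extension A and the supplementary extensions) cover the double-weighted characters:

```python
def billable_characters(text: str) -> int:
    # Characters in the CJK Unified Ideographs blocks (covers simplified/
    # traditional Chinese, Japanese Kanji, and Korean Hanja) count as 2;
    # every other character (letters, punctuation, spaces) counts as 1.
    def is_cjk(ch: str) -> bool:
        cp = ord(ch)
        return (0x4E00 <= cp <= 0x9FFF        # CJK Unified Ideographs
                or 0x3400 <= cp <= 0x4DBF     # Extension A
                or 0x20000 <= cp <= 0x2EBEF)  # Extensions B-F
    return sum(2 if is_cjk(ch) else 1 for ch in text)

print(billable_characters("Hello, 世界!"))  # 8 single-weight chars + 2 CJK chars * 2 = 12
```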

Qwen3-TTS-VD

Model

Version

Unit price

Max input characters

Free quota (Note)

qwen3-tts-vd-2026-01-26

Snapshot

$0.115 per 10,000 characters

600

10,000 characters

Valid for 90 days after activating Model Studio

  • Supported languages: Chinese (Mandarin), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese

  • Character calculation rules: Billing is based on the number of input characters. The rules are as follows:

    • One Chinese character (including simplified/traditional Chinese, Japanese Kanji, and Korean Hanja) = 2 characters

    • Other characters, such as an English letter, a punctuation mark, or a space = 1 character

Qwen3-TTS-VC

Model

Version

Unit price

Max input characters

Free quota (Note)

qwen3-tts-vc-2026-01-22

Snapshot

$0.115 per 10,000 characters

600

10,000 characters

Valid for 90 days after activating Model Studio.

  • Supported languages: Chinese (Mandarin), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese

  • Character calculation rules: Billing is based on the number of input characters. The rules are as follows:

    • One Chinese character (including simplified/traditional Chinese, Japanese Kanji, and Korean Hanja) = 2 characters

    • Other characters, such as an English letter, a punctuation mark, or a space = 1 character

Qwen3-TTS-Flash

Model

Version

Unit price

Max input characters

Free quota (Note)

qwen3-tts-flash

Currently, qwen3-tts-flash-2025-11-27.

Stable

$0.10 per 10,000 characters

600

10,000 characters

Valid for 90 days after activating Model Studio

qwen3-tts-flash-2025-11-27

Snapshot

qwen3-tts-flash-2025-09-18

Snapshot

If you activate Alibaba Cloud Model Studio before 00:00 on November 13, 2025: 2,000 characters

If you activate Alibaba Cloud Model Studio after 00:00 on November 13, 2025: 10,000 characters

Valid for 90 days after activating Model Studio.

  • Supported languages: Chinese (Mandarin, Beijing, Shanghai, Sichuan, Nanjing, Shaanxi, Minnan, Tianjin, Cantonese), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese

  • Character calculation rules: Billing is based on the number of input characters. The rules are as follows:

    • One Chinese character (including simplified/traditional Chinese, Japanese Kanji, and Korean Hanja) = 2 characters

    • Other characters, such as an English letter, a punctuation mark, or a space = 1 character

Chinese Mainland

In the Chinese Mainland deployment mode, the endpoints and data storage are located in the Beijing region. Model inference compute resources are limited to Chinese Mainland.

Qwen3-TTS-Instruct-Flash

Model

Version

Unit price

Max input characters

Free quota (Note)

qwen3-tts-instruct-flash

Currently, qwen3-tts-instruct-flash-2026-01-26.

Stable

$0.115 per 10,000 characters

600

No free quota is available.

qwen3-tts-instruct-flash-2026-01-26

Snapshot

  • Supported languages: Chinese (Mandarin), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese

  • Character calculation rules: Billing is based on the number of input characters. The rules are as follows:

    • One Chinese character (including simplified/traditional Chinese, Japanese Kanji, and Korean Hanja) = 2 characters

    • Other characters, such as an English letter, a punctuation mark, or a space = 1 character

Qwen3-TTS-VD

Model

Version

Unit price

Max input characters

Free quota (Note)

qwen3-tts-vd-2026-01-26

Snapshot

$0.115 per 10,000 characters

600

No free quota is available.

  • Supported languages: Chinese (Mandarin), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese

  • Character calculation rules: Billing is based on the number of input characters. The rules are as follows:

    • One Chinese character (including simplified/traditional Chinese, Japanese Kanji, and Korean Hanja) = 2 characters

    • Other characters, such as an English letter, a punctuation mark, or a space = 1 character

Qwen3-TTS-VC

Model

Version

Unit price

Max input characters

Free quota (Note)

qwen3-tts-vc-2026-01-22

Snapshot

$0.115 per 10,000 characters

600

No free quota is available.

  • Supported languages: Chinese (Mandarin), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese

  • Character calculation rules: Billing is based on the number of input characters. The rules are as follows:

    • One Chinese character (including simplified/traditional Chinese, Japanese Kanji, and Korean Hanja) = 2 characters

    • Other characters, such as an English letter, a punctuation mark, or a space = 1 character

Qwen3-TTS-Flash

Model

Version

Unit price

Max input characters

Free quota (Note)

qwen3-tts-flash

Currently, qwen3-tts-flash-2025-11-27.

Stable

$0.114682 per 10,000 characters

600

No free quota is available.

qwen3-tts-flash-2025-11-27

Snapshot

qwen3-tts-flash-2025-09-18

Snapshot

  • Supported languages: Chinese (Mandarin, Beijing, Shanghai, Sichuan, Nanjing, Shaanxi, Minnan, Tianjin, Cantonese), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese

  • Character calculation rules: Billing is based on the number of input characters. The rules are as follows:

    • One Chinese character (including simplified/traditional Chinese, Japanese Kanji, and Korean Hanja) = 2 characters

    • Other characters, such as an English letter, a punctuation mark, or a space = 1 character

Qwen-TTS

Model

Version

Context window

Max input

Max output

Input cost

Output cost

Free quota (Note)

(tokens)

(Per 1,000 tokens)

qwen-tts

Provides the same capabilities as qwen-tts-2025-04-10.

Stable

8,192

512

7,680

$0.230

$1.434

No free quota is available.

qwen-tts-latest

Provides the same capabilities as the latest snapshot.

Latest

qwen-tts-2025-05-22

Snapshot

qwen-tts-2025-04-10

Audio-to-token conversion rule: Each second of audio corresponds to 50 tokens. Audio shorter than 1 second is calculated as 50 tokens.
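
Assuming partial seconds round up (the rule above only states that sub-second audio counts as 50 tokens), the conversion can be sketched as:

```python
import math

def audio_tokens(duration_seconds: float) -> int:
    # Each second of audio corresponds to 50 tokens; audio shorter than
    # one second is still counted as 50 tokens. Rounding partial seconds
    # up is an assumption, not stated explicitly in the billing rule.
    return max(1, math.ceil(duration_seconds)) * 50

print(audio_tokens(0.4))  # 50
print(audio_tokens(3.0))  # 150
```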

Qwen real-time text-to-speech

Supports streaming text input and streaming audio output. It can automatically adjust the speech rate based on the text content and punctuation. Usage | API reference

Qwen3-TTS-Instruct-Flash-Realtime supports only the default voice. It does not support cloned or designed voices.

Qwen3-TTS-VD-Realtime uses voices created with Qwen voice design. It does not support the default voice.

Qwen3-TTS-VC-Realtime uses voices created with Qwen voice cloning. It does not support the default voice.

Qwen3-TTS-Flash-Realtime and Qwen-TTS-Realtime support only the default voice. They do not support cloned or designed voices.

International

In the international deployment mode, the endpoints and data storage are located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding Chinese Mainland).

Qwen3-TTS-Instruct-Flash-Realtime

Model

Version

Unit price

Free quota (Note)

qwen3-tts-instruct-flash-realtime

Currently, qwen3-tts-instruct-flash-realtime-2026-01-22.

Stable

$0.143 per 10,000 characters

10,000 characters

Valid for 90 days after activating Model Studio.

qwen3-tts-instruct-flash-realtime-2026-01-22

Snapshot

  • Supported languages: Chinese (Mandarin), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese

  • Character calculation rules: Billing is based on the number of input characters. The rules are as follows:

    • One Chinese character (including simplified/traditional Chinese, Japanese Kanji, and Korean Hanja) = 2 characters

    • Other characters, such as an English letter, a punctuation mark, or a space = 1 character

Qwen3-TTS-VD-Realtime

Model

Version

Unit price

Free quota (Note)

qwen3-tts-vd-realtime-2026-01-15

Snapshot

$0.143353 per 10,000 characters

10,000 characters

Valid for 90 days after activating Model Studio

qwen3-tts-vd-realtime-2025-12-16

Snapshot

  • Supported languages: Chinese (Mandarin), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese

  • Character calculation rules: Billing is based on the number of input characters. The rules are as follows:

    • One Chinese character (including simplified/traditional Chinese, Japanese Kanji, and Korean Hanja) = 2 characters

    • Other characters, such as an English letter, a punctuation mark, or a space = 1 character

Qwen3-TTS-VC-Realtime

Model

Version

Unit price

Free quota (Note)

qwen3-tts-vc-realtime-2026-01-15

Snapshot

$0.13 per 10,000 characters

10,000 characters

Valid for 90 days after activating Model Studio.

qwen3-tts-vc-realtime-2025-11-27

Snapshot

  • Supported languages: Chinese (Mandarin), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese

  • Character calculation rules: Billing is based on the number of input characters. The rules are as follows:

    • One Chinese character (including simplified/traditional Chinese, Japanese Kanji, and Korean Hanja) = 2 characters

    • Other characters, such as an English letter, a punctuation mark, or a space = 1 character

Qwen3-TTS-Flash-Realtime

Model

Version

Unit price

Free quota (Note)

qwen3-tts-flash-realtime

Currently, qwen3-tts-flash-realtime-2025-11-27.

Stable

$0.13 per 10,000 characters

10,000 characters

Valid for 90 days after activating Model Studio

qwen3-tts-flash-realtime-2025-11-27

Snapshot

qwen3-tts-flash-realtime-2025-09-18

Snapshot

If you activate Alibaba Cloud Model Studio before 00:00 on November 13, 2025: 2,000 characters

If you activate Alibaba Cloud Model Studio after 00:00 on November 13, 2025: 10,000 characters

Valid for 90 days after activating Model Studio

  • Supported languages: Chinese (Mandarin, Beijing, Shanghai, Sichuan, Nanjing, Shaanxi, Minnan, Tianjin, Cantonese), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese

  • Character calculation rules: Billing is based on the number of input characters. The rules are as follows:

    • One Chinese character (including simplified/traditional Chinese, Japanese Kanji, and Korean Hanja) = 2 characters

    • Other characters, such as an English letter, a punctuation mark, or a space = 1 character

Chinese Mainland

In the Chinese Mainland deployment mode, the endpoints and data storage are located in the Beijing region. Model inference compute resources are limited to Chinese Mainland.

Qwen3-TTS-Instruct-Flash-Realtime

Model

Version

Unit price

Free quota (Note)

qwen3-tts-instruct-flash-realtime

Currently, qwen3-tts-instruct-flash-realtime-2026-01-22.

Stable

$0.143 per 10,000 characters

No free quota

qwen3-tts-instruct-flash-realtime-2026-01-22

Snapshot

  • Supported languages: Chinese (Mandarin), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese

  • Character calculation rules: Billing is based on the number of input characters. The rules are as follows:

    • One Chinese character (including simplified/traditional Chinese, Japanese Kanji, and Korean Hanja) = 2 characters

    • Other characters, such as an English letter, a punctuation mark, or a space = 1 character

Qwen3-TTS-VD-Realtime

Model

Version

Unit price

Free quota (Note)

qwen3-tts-vd-realtime-2026-01-15

Snapshot

$0.143353 per 10,000 characters

No free quota

qwen3-tts-vd-realtime-2025-12-16

Snapshot

  • Supported languages: Chinese (Mandarin), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese

  • Character calculation rules: Billing is based on the number of input characters. The rules are as follows:

    • One Chinese character (including simplified/traditional Chinese, Japanese Kanji, and Korean Hanja) = 2 characters

    • Other characters, such as an English letter, a punctuation mark, or a space = 1 character

Qwen3-TTS-VC-Realtime

Model

Version

Unit price

Free quota (Note)

qwen3-tts-vc-realtime-2026-01-15

Snapshot

$0.143353 per 10,000 characters

No free quota is available.

qwen3-tts-vc-realtime-2025-11-27

Snapshot

  • Supported languages: Chinese (Mandarin), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese

  • Character calculation rules: Billing is based on the number of input characters. The rules are as follows:

    • One Chinese character (including simplified/traditional Chinese, Japanese Kanji, and Korean Hanja) = 2 characters

    • Other characters, such as an English letter, a punctuation mark, or a space = 1 character

Qwen3-TTS-Flash-Realtime

Model

Version

Unit price

Free quota (Note)

qwen3-tts-flash-realtime

Currently, qwen3-tts-flash-realtime-2025-11-27.

Stable

$0.143353 per 10,000 characters

No free quota is available.

qwen3-tts-flash-realtime-2025-11-27

Snapshot

qwen3-tts-flash-realtime-2025-09-18

Snapshot

  • Supported languages: Chinese (Mandarin, Beijing, Shanghai, Sichuan, Nanjing, Shaanxi, Minnan, Tianjin, Cantonese), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese

  • Character calculation rules: Billing is based on the number of input characters. The rules are as follows:

    • One Chinese character (including simplified/traditional Chinese, Japanese Kanji, and Korean Hanja) = 2 characters

    • Other characters, such as an English letter, a punctuation mark, or a space = 1 character

Qwen-TTS-Realtime

Model

Version

Context window

Max input

Max output

Input cost

Output cost

Supported languages

Free quota (Note)

(tokens)

(Per 1,000 tokens)

qwen-tts-realtime

Currently, qwen-tts-realtime-2025-07-15.

Stable

8,192

512

7,680

$0.345

$1.721

Chinese, English

No free quota is available.

qwen-tts-realtime-latest

Currently, qwen-tts-realtime-2025-07-15.

Latest

Chinese, English

qwen-tts-realtime-2025-07-15

Snapshot

Chinese, English

Audio-to-token conversion rule: Each second of audio corresponds to 50 tokens. Audio shorter than 1 second is calculated as 50 tokens.

Qwen voice cloning

Voice cloning uses a large model for feature extraction, allowing you to clone voices without training. Provide 10 to 20 seconds of audio to generate a highly similar and natural-sounding custom voice. Usage | API reference

International

In the international deployment mode, the endpoints and data storage are located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding Chinese Mainland).

Model

Unit price

Free quota (Note)

qwen-voice-enrollment

$0.01 per voice

1,000 voices

Valid for 90 days after activating Model Studio.

Chinese Mainland

In the Chinese Mainland deployment mode, the endpoints and data storage are located in the Beijing region. Model inference compute resources are limited to Chinese Mainland.

Model

Unit price

Free quota (Note)

qwen-voice-enrollment

$0.01 per voice

No free quota is available.

Qwen voice design

Voice design generates custom voices from text descriptions. It supports multi-language and multi-dimensional voice feature definitions, making it suitable for applications such as ad dubbing, character creation, and audio content production. Usage | API reference

International

In the international deployment mode, the endpoints and data storage are located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding Chinese Mainland).

Model

Unit price

Free quota (Note)

qwen-voice-design

$0.2 per voice

10 voices

Valid for 90 days after activating Model Studio.

Chinese Mainland

In the Chinese Mainland deployment mode, the endpoints and data storage are located in the Beijing region. Model inference compute resources are limited to Chinese Mainland.

Model

Unit price

Free quota (Note)

qwen-voice-design

$0.20 per voice

No free quota is available.

CosyVoice speech synthesis

CosyVoice is a next-generation generative speech synthesis model from Alibaba Cloud. It deeply integrates text understanding and speech generation based on a large-scale pre-trained language model and supports real-time streaming text-to-speech synthesis. Usage | API reference

International

In the international deployment mode, the endpoints and data storage are located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding Chinese Mainland).

Model

Unit price

Free quota (Note)

cosyvoice-v3-plus

$0.26 per 10,000 characters

10,000 characters

Valid for 90 days after activating Model Studio.

cosyvoice-v3-flash

$0.13 per 10,000 characters

Character calculation rules: Chinese characters (including simplified/traditional Chinese, Japanese Kanji, and Korean Hanja) are counted as 2 characters. All other characters (such as letters, numbers, and Japanese/Korean syllabaries) are counted as 1 character. SSML tag content is not billed.
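
Reading "SSML tag content is not billed" as referring to the markup itself, the CosyVoice variant of the counting rule can be sketched as follows. The tag-stripping regex is a simplification, and the function name is illustrative:

```python
import re

def cosyvoice_billable_characters(text: str) -> int:
    # SSML markup is not billed, so strip XML-style tags before counting.
    # (A naive regex; attribute values containing '>' would need a parser.)
    stripped = re.sub(r"<[^>]+>", "", text)

    def weight(ch: str) -> int:
        cp = ord(ch)
        # CJK ideographs (simplified/traditional Chinese, Japanese Kanji,
        # Korean Hanja) count as 2; letters, digits, kana, Hangul, etc. as 1.
        return 2 if (0x4E00 <= cp <= 0x9FFF or 0x3400 <= cp <= 0x4DBF) else 1

    return sum(weight(ch) for ch in stripped)

print(cosyvoice_billable_characters("<speak>你好, world</speak>"))  # 2 CJK * 2 + 7 others = 11
```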

Chinese Mainland

In the Chinese Mainland deployment mode, the endpoints and data storage are located in the Beijing region. Model inference compute resources are limited to Chinese Mainland.

Model

Unit price

Free quota (Note)

cosyvoice-v3.5-plus

$0.22 per 10,000 characters

No free quota

cosyvoice-v3.5-flash

$0.116 per 10,000 characters

cosyvoice-v3-plus

$0.286706 per 10,000 characters

cosyvoice-v3-flash

$0.14335 per 10,000 characters

cosyvoice-v2

$0.286706 per 10,000 characters

Character calculation rules: Chinese characters (including simplified/traditional Chinese, Japanese Kanji, and Korean Hanja) are counted as 2 characters. All other characters (such as letters, numbers, and Japanese/Korean syllabaries) are counted as 1 character. SSML tag content is not billed.

Speech recognition (speech-to-text) and translation (speech-to-target-language text)

Qwen3-LiveTranslate-Flash

Qwen3-LiveTranslate-Flash is an audio and video translation model based on the Qwen3-Omni architecture. It supports translation between 18 languages, including Chinese, English, Russian, and French. The model can use visual context to improve translation accuracy and outputs both text and speech. Usage | API reference

International

In the international deployment mode, the endpoints and data storage are located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide, excluding Chinese Mainland.

Model

Version

Context window

Max input

Max output

Free quota (Note)

(tokens)

qwen3-livetranslate-flash

Currently, qwen3-livetranslate-flash-2025-12-01.

Stable

53,248

49,152

4,096

1 million tokens each

Valid for 90 days after activating Model Studio

qwen3-livetranslate-flash-2025-12-01

Snapshot

The billing rules for input and output are as follows:

Input

Unit price (per 1M tokens)

Audio

$1.577

Video

The audio portion is billed separately.

$0.631

Output

Unit price (per 1M tokens)

Audio

$6.308

Text

$1.577

Chinese Mainland

In the Chinese Mainland deployment mode, the endpoints and data storage are located in the Beijing region. Model inference compute resources are limited to Chinese Mainland.

Model

Version

Context window

Max input

Max output

Free quota (Note)

(tokens)

qwen3-livetranslate-flash

Currently, qwen3-livetranslate-flash-2025-12-01.

Stable

53,248

49,152

4,096

No free quota is available.

qwen3-livetranslate-flash-2025-12-01

Snapshot

The billing rules for input and output are as follows:

Input

Unit price (per 1M tokens)

Audio

$1.434

Video

The audio portion is billed separately.

$0.573

Output

Unit price (per 1M tokens)

Audio

$5.734

Text

$1.434

Qwen3-LiveTranslate-Flash-Realtime

Qwen3-LiveTranslate-Flash-Realtime is a multilingual, real-time audio and video translation model. It can recognize 18 languages and translate them into audio in 10 languages in real time.

Core features:

  • Multi-language support: Supports 18 languages, such as Chinese, English, French, German, Russian, Japanese, and Korean, and 6 Chinese dialects, including Mandarin, Cantonese, and Sichuanese.

  • Visual enhancement: Uses visual content to improve translation accuracy. The model analyzes lip movements, actions, and on-screen text to improve translation in noisy environments or for words with multiple meanings.

  • Low latency: Achieves simultaneous interpretation latency as low as 3 seconds.

  • High-quality simultaneous interpretation: Addresses cross-language word order issues using semantic unit prediction technology. The real-time translation quality is comparable to offline translation results.

  • Natural voice: Generates natural-sounding, human-like speech. The model adapts its tone and emotion based on the source speech content.

Usage | API reference

International

In the international deployment mode, the endpoints and data storage are located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide, excluding Chinese Mainland.

Model

Version

Context window

Max input

Max output

Free quota

(Note)

(tokens)

qwen3-livetranslate-flash-realtime

Currently, qwen3-livetranslate-flash-realtime-2025-09-22.

Stable

53,248

49,152

4,096

1 million tokens

Valid for 90 days after activating Model Studio.

qwen3-livetranslate-flash-realtime-2025-09-22

Snapshot

After the free quota is used up, the billing rules for input and output are as follows:

Input

Unit price (per 1M tokens)

Audio

$10

Image

$1.3

Output

Unit price (per 1M tokens)

Text

$10

Audio

$38

Token calculation rules:

  • Audio: Each second of audio input or output consumes 12.5 tokens.

  • Image: Each 28×28 pixel input consumes 0.5 tokens.
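
These rules make session costs straightforward to estimate. The sketch below assumes a video frame is tiled into 28×28-pixel patches with each dimension rounded up; that tiling detail, and the helper's name, are assumptions for illustration:

```python
import math

AUDIO_TOKENS_PER_SECOND = 12.5
TOKENS_PER_PATCH = 0.5  # one 28x28-pixel patch

def session_tokens(audio_seconds: float, frames: int, width: int, height: int) -> float:
    """Rough token estimate for a realtime session's audio plus image input."""
    audio = audio_seconds * AUDIO_TOKENS_PER_SECOND
    # Assume each frame is split into 28x28 patches, rounding dimensions up.
    patches = math.ceil(width / 28) * math.ceil(height / 28)
    return audio + frames * patches * TOKENS_PER_PATCH

# 60 s of audio plus one 640x480 frame: 750 + 23 * 18 * 0.5 = 957 tokens
print(session_tokens(60, 1, 640, 480))  # 957.0
```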

Chinese Mainland

In the Chinese Mainland deployment mode, the endpoints and data storage are located in the Beijing region. Model inference compute resources are limited to Chinese Mainland.

Model

Version

Context window

Max input

Max output

Free quota (Note)

(tokens)

qwen3-livetranslate-flash-realtime

Currently, qwen3-livetranslate-flash-realtime-2025-09-22.

Stable

53,248

49,152

4,096

No free quota is available.

qwen3-livetranslate-flash-realtime-2025-09-22

Snapshot

The billing rules for input and output are as follows:

Input

Unit price (per 1M tokens)

Audio

$9.175

Image

$1.147

Output

Unit price (per 1M tokens)

Text

$9.175

Audio

$34.405

Token calculation rules:

  • Audio: Each second of audio input or output consumes 12.5 tokens.

  • Image: Each 28×28 pixel input consumes 0.5 tokens.

Qwen audio file recognition

Based on the Qwen multimodal foundation model, this model supports features such as multi-language recognition, singing recognition, and noise rejection. Usage | API reference

International

In the international deployment mode, the endpoints and data storage are located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide, excluding Chinese Mainland.

Qwen3-ASR-Flash-Filetrans

Model

Version

Unit price

Free quota (Note)

qwen3-asr-flash-filetrans

Currently, qwen3-asr-flash-filetrans-2025-11-17.

Stable

$0.000035/second

36,000 seconds (10 hours)

Valid for 90 days after activating Model Studio.

qwen3-asr-flash-filetrans-2025-11-17

Snapshot

  • Supported languages: Chinese (Mandarin, Sichuanese, Minnan, Wu, Cantonese), English, Japanese, German, Korean, Russian, French, Portuguese, Arabic, Italian, Spanish, Hindi, Indonesian, Thai, Turkish, Ukrainian, Vietnamese, Czech, Danish, Filipino, Finnish, Icelandic, Malay, Norwegian, Polish, Swedish

  • Supported sample rates: Any

Qwen3-ASR-Flash

Model

Version

Unit price

Free quota (Note)

qwen3-asr-flash

Currently, qwen3-asr-flash-2025-09-08.

Stable

$0.000035/second

36,000 seconds (10 hours)

Valid for 90 days after activating Model Studio.

qwen3-asr-flash-2026-02-10

Snapshot

qwen3-asr-flash-2025-09-08

Snapshot

  • Supported languages: Chinese (Mandarin, Sichuanese, Minnan, Wu, Cantonese), English, Japanese, German, Korean, Russian, French, Portuguese, Arabic, Italian, Spanish, Hindi, Indonesian, Thai, Turkish, Ukrainian, Vietnamese, Czech, Danish, Filipino, Finnish, Icelandic, Malay, Norwegian, Polish, Swedish

  • Supported sample rates: Any

US

In the US deployment mode, the endpoints and data storage are located in the US (Virginia) region. Model inference compute resources are limited to the US.

Model

Version

Unit price

Free quota (Note)

qwen3-asr-flash-us

Currently, qwen3-asr-flash-2025-09-08-us.

Stable

$0.000035/second

No free quota is available.

qwen3-asr-flash-2025-09-08-us

Snapshot

  • Supported languages: Chinese (Mandarin, Sichuanese, Minnan, Wu, Cantonese), English, Japanese, German, Korean, Russian, French, Portuguese, Arabic, Italian, Spanish, Hindi, Indonesian, Thai, Turkish, Ukrainian, Vietnamese, Czech, Danish, Filipino, Finnish, Icelandic, Malay, Norwegian, Polish, Swedish

  • Supported sample rates: Any

Chinese Mainland

In the Chinese Mainland deployment mode, the endpoints and data storage are located in the Beijing region. Model inference compute resources are limited to Chinese Mainland.

Qwen3-ASR-Flash-Filetrans

Model

Version

Unit price

Free quota (Note)

qwen3-asr-flash-filetrans

Currently, qwen3-asr-flash-filetrans-2025-11-17.

Stable

$0.000032/second

No free quota is available.

qwen3-asr-flash-filetrans-2025-11-17

Snapshot

  • Supported languages: Chinese (Mandarin, Sichuanese, Minnan, Wu, Cantonese), English, Japanese, German, Korean, Russian, French, Portuguese, Arabic, Italian, Spanish, Hindi, Indonesian, Thai, Turkish, Ukrainian, Vietnamese, Czech, Danish, Filipino, Finnish, Icelandic, Malay, Norwegian, Polish, Swedish

  • Supported sample rates: Any

Qwen3-ASR-Flash

Model

Version

Unit price

Free quota (Note)

qwen3-asr-flash

Currently, qwen3-asr-flash-2025-09-08.

Stable

$0.000032/second

No free quota is available.

qwen3-asr-flash-2026-02-10

Snapshot

qwen3-asr-flash-2025-09-08

Snapshot

  • Supported languages: Chinese (Mandarin, Sichuanese, Minnan, Wu, Cantonese), English, Japanese, German, Korean, Russian, French, Portuguese, Arabic, Italian, Spanish, Hindi, Indonesian, Thai, Turkish, Ukrainian, Vietnamese, Czech, Danish, Filipino, Finnish, Icelandic, Malay, Norwegian, Polish, Swedish

  • Supported sample rates: Any

Qwen real-time speech recognition

Qwen real-time speech recognition automatically detects the spoken language and delivers accurate transcription even in complex audio environments. How to use | API reference

International

In international deployment mode, endpoints and data storage are located in the Singapore region. Model inference compute resources are dynamically scheduled across global regions, excluding Chinese Mainland.

Model

Version

Unit price

Free quota (Note)

qwen3-asr-flash-realtime

Currently, qwen3-asr-flash-realtime-2025-10-27

Stable

$0.00009/second

36,000 seconds (10 hours)

Valid for 90 days after activating Model Studio.

qwen3-asr-flash-realtime-2026-02-10

Snapshot

qwen3-asr-flash-realtime-2025-10-27

Snapshot

  • Languages supported: Chinese (Mandarin, Sichuanese, Minnan, Wu, Cantonese), English, Japanese, German, Korean, Russian, French, Portuguese, Arabic, Italian, Spanish, Hindi, Indonesian, Thai, Turkish, Ukrainian, Vietnamese, Czech, Danish, Filipino, Finnish, Icelandic, Malay, Norwegian, Polish, Swedish

  • Sample rates supported: 8 kHz, 16 kHz

Chinese Mainland

In Chinese Mainland deployment mode, endpoints and data storage are located in the Beijing region. Model inference compute resources are limited to Chinese Mainland only.

Model

Version

Unit price

Free quota (Note)

qwen3-asr-flash-realtime

Currently, qwen3-asr-flash-realtime-2025-10-27

Stable

$0.000047/second

No free quota

qwen3-asr-flash-realtime-2026-02-10

Snapshot

qwen3-asr-flash-realtime-2025-10-27

Snapshot

  • Languages supported: Chinese (Mandarin, Sichuanese, Minnan, Wu, Cantonese), English, Japanese, German, Korean, Russian, French, Portuguese, Arabic, Italian, Spanish, Hindi, Indonesian, Thai, Turkish, Ukrainian, Vietnamese, Czech, Danish, Filipino, Finnish, Icelandic, Malay, Norwegian, Polish, Swedish

  • Sample rates supported: 8 kHz, 16 kHz

Paraformer ASR

Paraformer speech recognition offers two versions: recorded file recognition and real-time speech recognition.

Recorded file recognition

Usage | API reference

Note

Only the Chinese Mainland deployment mode is supported. Endpoint and data storage are located in the Beijing region, and model inference computing resources are restricted to Chinese Mainland.

Model

Unit price

Free quota (Note)

paraformer-v2

$0.000012/second

No free quota

paraformer-8k-v2

  • Languages supported:

    • paraformer-v2: Chinese (Mandarin, Cantonese, Wu, Minnan, Northeastern, Gansu, Guizhou, Henan, Hubei, Hunan, Ningxia, Shanxi, Shaanxi, Shandong, Sichuan, Tianjin, Jiangxi, Yunnan, Shanghai), English, Japanese, Korean, German, French, Russian

    • paraformer-8k-v2: Mandarin Chinese

  • Sample rates supported:

    • paraformer-v2: Any

    • paraformer-8k-v2: 8 kHz

  • Audio formats supported: AAC, AMR, AVI, FLAC, FLV, M4A, MKV, MOV, MP3, MP4, MPEG, OGG, OPUS, WAV, WEBM, WMA, WMV

Real-time speech recognition

Usage | API reference

Note

Only the Chinese Mainland deployment mode is supported. Endpoint and data storage are located in the Beijing region, and model inference computing resources are restricted to Chinese Mainland.

Model

Unit price

Free quota (Note)

paraformer-realtime-v2

$0.000035/second

No free quota

paraformer-realtime-8k-v2

  • Languages supported:

    • paraformer-realtime-v2: Chinese (Mandarin, Cantonese, Wu, Minnan, Northeastern, Gansu, Guizhou, Henan, Hubei, Hunan, Ningxia, Shanxi, Shaanxi, Shandong, Sichuan, Tianjin, Jiangxi, Yunnan, Shanghai), English, Japanese, Korean, German, French, Russian

    • paraformer-realtime-8k-v2: Mandarin Chinese

  • Sample rates supported:

    • paraformer-realtime-v2: Any

    • paraformer-realtime-8k-v2: 8 kHz

  • Audio formats supported: PCM, WAV, MP3, OPUS, SPEEX, AAC, AMR

Fun-ASR speech recognition

Fun-ASR speech recognition offers two versions: audio file recognition and real-time speech recognition.

Audio file recognition

Usage | API reference

International

In the international deployment mode, endpoints and data storage are in the Singapore region. Model inference compute resources are dynamically scheduled globally, excluding Chinese Mainland.

Model

Version

Unit price

Free quota (Note)

fun-asr

Currently, fun-asr-2025-11-07

Stable

$0.000035/second

36,000 seconds (10 hours)

Valid for 90 days

fun-asr-2025-11-07

Improved far-field VAD over fun-asr-2025-08-25 for higher accuracy

Snapshot

fun-asr-2025-08-25

fun-asr-mtl

Currently, fun-asr-mtl-2025-08-25

Stable

fun-asr-mtl-2025-08-25

Snapshot

  • Languages supported:

    • fun-asr and fun-asr-2025-11-07: Mandarin, Cantonese, Wu, Minnan, Hakka, Gan, Xiang, and Jin. Also supports Mandarin accents from Zhongyuan, Southwest, Jilu, Jianghuai, Lanyin, Jiaoliao, Northeast, Beijing, and Hong Kong–Taiwan regions—including Henan, Shaanxi, Hubei, Sichuan, Chongqing, Yunnan, Guizhou, Guangdong, Guangxi, Hebei, Tianjin, Shandong, Anhui, Nanjing, Jiangsu, Hangzhou, Gansu, and Ningxia. Also supports English and Japanese.

    • fun-asr-2025-08-25: Mandarin and English.

    • fun-asr-mtl and fun-asr-mtl-2025-08-25: Mandarin, Cantonese, English, Japanese, Korean, Vietnamese, Indonesian, Thai, Malay, Filipino, Arabic, Hindi, Bulgarian, Croatian, Czech, Danish, Dutch, Estonian, Finnish, Greek, Hungarian, Irish, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovak, Slovenian, and Swedish.

  • Sample rates supported: Any

  • Audio formats supported: aac, amr, avi, flac, flv, m4a, mkv, mov, mp3, mp4, mpeg, ogg, opus, wav, webm, wma, wmv

Chinese Mainland

In the Chinese Mainland deployment mode, endpoints and data storage are in the Beijing region. Model inference compute resources are limited to Chinese Mainland.

Model

Version

Unit price

Free quota (Note)

fun-asr

Currently, fun-asr-2025-11-07

Stable

$0.000032/second

No free quota

fun-asr-2025-11-07

Improved far-field VAD over fun-asr-2025-08-25 for higher accuracy

Snapshot

fun-asr-2025-08-25

fun-asr-mtl

Currently, fun-asr-mtl-2025-08-25

Stable

fun-asr-mtl-2025-08-25

Snapshot

  • Languages supported:

    • fun-asr and fun-asr-2025-11-07: Mandarin, Cantonese, Wu, Minnan, Hakka, Gan, Xiang, and Jin. Also supports Mandarin accents from Zhongyuan, Southwest, Jilu, Jianghuai, Lanyin, Jiaoliao, Northeast, Beijing, and Hong Kong–Taiwan regions—including Henan, Shaanxi, Hubei, Sichuan, Chongqing, Yunnan, Guizhou, Guangdong, Guangxi, Hebei, Tianjin, Shandong, Anhui, Nanjing, Jiangsu, Hangzhou, Gansu, and Ningxia. Also supports English and Japanese.

    • fun-asr-2025-08-25: Mandarin and English.

    • fun-asr-mtl and fun-asr-mtl-2025-08-25: Mandarin, Cantonese, English, Japanese, Korean, Vietnamese, Indonesian, Thai, Malay, Filipino, Arabic, Hindi, Bulgarian, Croatian, Czech, Danish, Dutch, Estonian, Finnish, Greek, Hungarian, Irish, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovak, Slovenian, and Swedish.

  • Sample rates supported: Any

  • Audio formats supported: aac, amr, avi, flac, flv, m4a, mkv, mov, mp3, mp4, mpeg, ogg, opus, wav, webm, wma, wmv

Real-time speech recognition

Usage | API reference

International

In the international deployment mode, endpoints and data storage are in the Singapore region. Model inference compute resources are dynamically scheduled globally, excluding Chinese Mainland.

Model

Version

Unit price

Free quota (Note)

fun-asr-realtime

Currently, fun-asr-realtime-2025-11-07

Stable

$0.00009/second

36,000 seconds (10 hours)

Valid for 90 days

fun-asr-realtime-2025-11-07

Snapshot

  • Languages supported: Mandarin, Cantonese, Wu, Minnan, Hakka, Gan, Xiang, and Jin. Also supports Mandarin accents from Zhongyuan, Southwest, Jilu, Jianghuai, Lanyin, Jiaoliao, Northeast, Beijing, and Hong Kong–Taiwan regions—including Henan, Shaanxi, Hubei, Sichuan, Chongqing, Yunnan, Guizhou, Guangdong, Guangxi, Hebei, Tianjin, Shandong, Anhui, Nanjing, Jiangsu, Hangzhou, Gansu, and Ningxia. Also supports English and Japanese.

  • Sample rates supported: 16 kHz

  • Audio formats supported: pcm, wav, mp3, opus, speex, aac, amr

Chinese Mainland

In the Chinese Mainland deployment mode, endpoints and data storage are in the Beijing region. Model inference compute resources are limited to Chinese Mainland.

Model

Version

Unit price

Free quota (Note)

fun-asr-realtime

Currently, fun-asr-realtime-2025-11-07

Stable

$0.000047/second

No free quota

fun-asr-realtime-2026-02-28

Snapshot

fun-asr-realtime-2025-11-07

Snapshot

fun-asr-realtime-2025-09-15

fun-asr-flash-8k-realtime

Currently, fun-asr-flash-8k-realtime-2026-01-28

Stable

$0.000032/second

fun-asr-flash-8k-realtime-2026-01-28

Snapshot

  • Languages supported:

    • fun-asr-realtime, fun-asr-realtime-2026-02-28, fun-asr-realtime-2025-11-07: Chinese (Mandarin, Cantonese, Wu, Minnan, Hakka, Gan, Xiang, and Jin. Also supports Mandarin accents from Zhongyuan, Southwest, Jilu, Jianghuai, Lanyin, Jiaoliao, Northeast, Beijing, and Hong Kong–Taiwan regions—including Henan, Shaanxi, Hubei, Sichuan, Chongqing, Yunnan, Guizhou, Guangdong, Guangxi, Hebei, Tianjin, Shandong, Anhui, Nanjing, Jiangsu, Hangzhou, Gansu, and Ningxia), English, and Japanese.

    • fun-asr-realtime-2025-09-15: Chinese (Mandarin), English

  • Sample rates supported:

    • fun-asr-flash-8k-realtime and fun-asr-flash-8k-realtime-2026-01-28: 8 kHz

    • All other models: 16 kHz

  • Audio formats supported: pcm, wav, mp3, opus, speex, aac, amr

Text embeddings

Text embedding models convert text into numerical vectors that represent the meaning of the input. These models are suitable for search, clustering, recommendation, and classification tasks. Billing is based on the number of input tokens. API reference

International

In international deployment mode, the endpoint and data storage are both located in the Singapore region. Model inference compute resources are dynamically scheduled globally (excluding the Chinese mainland).

Model

Vector dimensions

Batch size

Max tokens per batch

Supported languages

Price

(per 1M input tokens)

Free quota

(Note)

text-embedding-v4

Part of the Qwen3-Embedding series

2,048, 1,536, 1,024 (default), 768, 512, 256, 128, 64

10

8,192

Chinese, English, Spanish, French, Portuguese, Indonesian, Japanese, Korean, German, Russian, and over 100 other major languages, plus multiple programming languages

$0.07

1 million tokens

Valid for 90 days after activating Model Studio

text-embedding-v3

1,024 (default), 768, or 512

10

8,192

Chinese, English, Spanish, French, Portuguese, Indonesian, Japanese, Korean, German, Russian, and over 50 other languages

500,000 tokens

Valid for 90 days after activating Model Studio

Chinese mainland

In Chinese mainland deployment mode, the endpoint and data storage are both located in the Beijing region. Model inference compute resources are limited to the Chinese mainland.

Model

Vector dimensions

Batch size

Max tokens per batch

Supported languages

Price

(per 1M input tokens)

Free quota

(Note)

text-embedding-v4

Part of the Qwen3-Embedding series
Batch calls are half price

2,048, 1,536, 1,024 (default), 768, 512, 256, 128, 64

10

8,192

Chinese, English, Spanish, French, Portuguese, Indonesian, Japanese, Korean, German, Russian, and over 100 other major languages, plus multiple programming languages

$0.072

No free quota

Hong Kong (China)

In Hong Kong (China) deployment mode, the endpoint and data storage are both located in Hong Kong (China). Model inference compute resources are limited to Hong Kong (China).

Model

Vector dimensions

Batch size

Max tokens per batch

Supported languages

Price

(per 1M input tokens)

Free quota

(Note)

text-embedding-v4

Part of the Qwen3-Embedding series

2,048, 1,536, 1,024 (default), 768, 512, 256, 128, 64

10

8,192

Chinese, English, Spanish, French, Portuguese, Indonesian, Japanese, Korean, German, Russian, and over 100 other major languages, plus multiple programming languages

$0.07

1 million tokens

Valid for 90 days after activating Model Studio

Note

Batch size refers to the max number of texts you can process in a single API call. For example, text-embedding-v4 has a batch size of 10, meaning one request can include up to 10 texts for vectorization, and each text must not exceed 8,192 tokens. This limit applies to:

  • String array input: The array can contain up to 10 elements.

  • File input: A text file can contain up to 10 lines of text.
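
The batching rule above can be applied client-side before calling the API. A minimal sketch, assuming the caller holds plain strings (the helper name is illustrative):

```python
# The batch-size limit means a string-array request to text-embedding-v4 can
# carry at most 10 texts; each text must also stay within 8,192 tokens.

BATCH_SIZE = 10  # max texts per request for text-embedding-v4

def to_batches(texts: list[str], batch_size: int = BATCH_SIZE) -> list[list[str]]:
    """Split a list of texts into batches no larger than the API limit."""
    return [texts[i:i + batch_size] for i in range(0, len(texts), batch_size)]

corpus = [f"document {n}" for n in range(23)]
batches = to_batches(corpus)
print(len(batches), [len(b) for b in batches])  # 3 [10, 10, 3]
```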

Multimodal embedding

Multimodal embedding models transform text, images, or videos into a vector of floating-point numbers. These models are suitable for video classification, image classification, and image-text retrieval. API reference

International

In the International deployment mode, endpoints and data storage are located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding the Chinese mainland).

Model

Data type

Vector dimensions

Price (per 1 million input tokens)

Free quota(Note)

tongyi-embedding-vision-plus

float(32)

1,152

$0.09

1 million tokens

Valid for 90 days after activating Model Studio.

tongyi-embedding-vision-flash

float(32)

768

Image/Video: $0.03

Text: $0.09

Chinese mainland

In the Chinese mainland deployment mode, endpoints and data storage are located in the Beijing region. Model inference compute resources are restricted to the Chinese mainland.

Model

Data type

Vector dimensions

Price (per 1 million input tokens)

qwen3-vl-embedding

float(32)

2,560, 2,048, 1,536, 1,024, 768, 512, 256

Image/Video: $0.258

Text: $0.1

multimodal-embedding-v1

1,024

Free trial

Ranking

These models are typically used for semantic retrieval. Given a query and a list of candidate documents, they rank the documents from highest to lowest based on semantic relevance to the query. API reference

International

In international deployment mode, the endpoint and data storage are both located in the Singapore region. Model inference compute resources are scheduled dynamically worldwide (excluding the Chinese mainland).

Model

Max number of documents

Max input tokens per line

Max input tokens

Supported languages

Unit price (per 1M input tokens)

qwen3-rerank

500

4,000

30,000

Chinese, English, Spanish, French, Portuguese, Indonesian, Japanese, Korean, German, Russian, and over 100 other major languages

$0.1

  • Max input tokens per line: Each query or document can contain up to 4,000 tokens. Input exceeding this limit will be truncated.

  • Max number of documents: A request can include up to 500 documents.

  • Max input tokens: The total number of tokens across all queries and documents in a single request must not exceed 30,000.
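
These limits can be checked client-side before sending a qwen3-rerank request. A hedged sketch (the function is illustrative; real token counts would come from a tokenizer):

```python
# Pre-flight check against the limits above: at most 500 documents,
# 4,000 tokens per query/document line, 30,000 tokens per request in total.

MAX_DOCS = 500
MAX_TOKENS_PER_LINE = 4000
MAX_TOTAL_TOKENS = 30000

def validate_rerank_request(query_tokens: int, doc_token_counts: list[int]) -> list[str]:
    """Return a list of limit violations; an empty list means the request fits."""
    problems = []
    if len(doc_token_counts) > MAX_DOCS:
        problems.append(f"too many documents: {len(doc_token_counts)} > {MAX_DOCS}")
    for i, n in enumerate([query_tokens, *doc_token_counts]):
        if n > MAX_TOKENS_PER_LINE:
            problems.append(f"line {i} exceeds {MAX_TOKENS_PER_LINE} tokens (will be truncated)")
    if query_tokens + sum(doc_token_counts) > MAX_TOTAL_TOKENS:
        problems.append(f"total tokens exceed {MAX_TOTAL_TOKENS}")
    return problems

print(validate_rerank_request(120, [300] * 50))  # []
```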

Chinese mainland

In Chinese mainland deployment mode, the endpoint and data storage are both located in the Beijing region. Model inference compute resources are limited to the Chinese mainland.

Model

Max number of documents

Max input tokens per line

Max input tokens

Supported languages

Unit price (per 1M input tokens)

qwen3-vl-rerank

100

8,000

120,000

Chinese, English, Japanese, Korean, French, German, and 33 other major languages

Images: $0.258

Text: $0.1

gte-rerank-v2

500

4,000

30,000

Chinese, English, Japanese, Korean, Thai, Spanish, French, Portuguese, German, Indonesian, Arabic, and over 50 other languages

$0.115

  • Max input tokens per line: Each query or document can contain up to 4,000 tokens. Input exceeding this limit will be truncated.

  • Max number of documents: A request can include up to 500 documents.

  • Max input tokens: The total number of tokens across all queries and documents in a single request must not exceed 30,000.

Domain specific

Intention recognition

The intention recognition model parses user intent accurately in under 100 milliseconds and selects the right tool to solve user problems. API reference | Usage

Note

Only the Chinese Mainland deployment mode is supported. Endpoint and data storage are located in the Beijing region, and model inference computing resources are restricted to Chinese Mainland.

Model

Context window

Max input

Max output

Input cost

Output cost

(tokens)

(per 1M tokens)

tongyi-intent-detect-v3

8,192

8,192

1,024

$0.058

$0.144
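
Given the per-1M-token rates above, the cost of a tongyi-intent-detect-v3 call can be estimated from its token usage. A minimal sketch (the helper name is illustrative):

```python
# Token-based cost estimate at the listed per-1M-token rates for
# tongyi-intent-detect-v3: $0.058 input, $0.144 output.

INPUT_PRICE_PER_M = 0.058
OUTPUT_PRICE_PER_M = 0.144

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated charge in USD for one call."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# 2,000 input tokens and 100 output tokens:
print(f"${estimate_cost(2000, 100):.6f}")
```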

Role playing

Qwen role-playing models are designed for anthropomorphic dialog scenarios, such as virtual socializing, game NPCs, IP replication, and hardware, toys, or in-vehicle systems. Compared with other Qwen models, they improve persona fidelity, topic progression, and empathetic listening. Usage

International

In the international deployment mode, the endpoint and data storage are both in the Singapore region. Model inference compute resources are dynamically scheduled across global regions, excluding the Chinese mainland.

Model

Context window

Max input

Max output

Input cost

Output cost

(tokens)

(per 1M tokens)

qwen-plus-character

32,768

30,000

4,000

$0.5

$1.4

qwen-plus-character-ja

8,192

7,680

512

$0.5

$1.4

Chinese mainland

In the Chinese mainland deployment mode, the endpoint and data storage are both in the Beijing region. Model inference compute resources are limited to the Chinese mainland.

Model

Context window

Max input

Max output

Input cost

Output cost

(tokens)

(per 1M tokens)

qwen-plus-character

32,768

32,000

4,096

$0.115

$0.287

Retired models

Retired on January 30, 2026

Category

Model

Context window

Max input

Max output

Input cost (per 1M tokens)

Output cost (per 1M tokens)

Alternative

(tokens)

Qwen Plus

qwen-plus-2024-11-27

131,072

129,024

8,192

$0.115

$0.287

qwen-plus-2025-12-01

qwen-plus-2024-11-25

qwen-plus-2024-09-19

qwen-plus-2024-08-06

128,000

$0.574

$1.721

Qwen Turbo

qwen-turbo-2024-09-19

131,072

129,023

8,192

$0.044

$0.087

qwen-flash-2025-07-28

Qwen VL

qwen-vl-max-2024-10-30

32,768

30,720

Up to 16,384 per image

2,048

$2.868

$2.868

qwen3-vl-plus-2025-12-19

qwen-vl-max-2024-08-09

qwen-vl-plus-2024-08-09

$0.216

$0.646

qwen3-vl-flash-2025-10-15

Retired on August 20, 2025

Qwen2

Alibaba Cloud's Qwen2 open-source version. Usage | API reference | Try online

Model

Context window

Max input

Max output

Input cost

Output cost

Alternative

(tokens)

(per 1M tokens)

qwen2-72b-instruct

131,072

128,000

6,144

Free for a limited time

Qwen3, DeepSeek, Kimi, etc.

qwen2-57b-a14b-instruct

65,536

63,488

qwen2-7b-instruct

131,072

128,000

Qwen1.5

Alibaba Cloud's Qwen1.5 open-source version. Usage | API reference | Try online

Model

Context window

Max input

Max output

Input cost

Output cost

Alternative

(tokens)

(per 1M tokens)

qwen1.5-110b-chat

8,000

6,000

2,000

Free for a limited time

Qwen3, DeepSeek, Kimi, etc.

qwen1.5-72b-chat

qwen1.5-32b-chat

qwen1.5-14b-chat

qwen1.5-7b-chat