Models - Alibaba Cloud Model Studio - Alibaba Cloud Documentation Center

Text generation - Qwen

The following are the Qwen commercial models. Compared to the open-source versions, the commercial models offer the latest capabilities and improvements.

The parameter sizes of the commercial models are not disclosed.

Each model is updated periodically. To use a fixed version, you can select a snapshot version. A snapshot version is typically maintained for one month after the release of the next snapshot version.

We recommend that you use the stable or latest version for more lenient rate limiting conditions.

Qwen-Max

The most powerful model in the Qwen series, ideal for complex, multi-step tasks. Usage | Thinking | API reference | Try online

International

In international deployment mode, the endpoint and data storage are both located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding Mainland China).

Model	Version	Mode	Context window	Max input	Max CoT	Max output	Input cost	Output cost CoT + output	Free quota (Note)
			(tokens)				(per 1M tokens)
qwen3-max Currently qwen3-max-2026-01-23 Part of Qwen3 series Supports calling built-in tools	Stable	Thinking	262,144	258,048	81,920	32,768	Tiered pricing. See details below.		1 million tokens each Valid for 90 days after activating Model Studio
		Non-thinking			-	65,536
qwen3-max-2026-01-23 Thinking mode aka Qwen3-Max-Thinking Part of Qwen3 series Supports calling built-in tools	Snapshot	Thinking			81,920	32,768
		Non-thinking			-	65,536
qwen3-max-2025-09-23 Part of Qwen3 series	Snapshot	Non-thinking only
qwen3-max-preview Part of Qwen3 series	Preview	Thinking			81,920	32,768
		Non-thinking			-	65,536

The models above use tiered pricing based on the number of input tokens in the current request.

Input tokens per request	Input cost (per 1M tokens) qwen3-max and qwen3-max-preview support context cache.	Output cost (per 1M tokens)
0<Token≤32K	$1.2	$6
32K<Token≤128K	$2.4	$12
128K<Token≤252K	$3	$15

More models

Model	Version	Context window	Max input	Max output	Input cost	Output cost	Free quota (Note)
		(tokens)			(per 1M tokens)
qwen-max Currently qwen-max-2025-01-25 Batch calls at half price	Stable	32,768	30,720	8,192	$1.6	$6.4	1 million tokens each Valid for 90 days after activating Model Studio
qwen-max-latest Always the latest snapshot	Latest				$1.6	$6.4
qwen-max-2025-01-25 Also known as qwen-max-0125, Qwen2.5-Max	Snapshot

Global

In global deployment mode, the endpoint and data storage are both located in the US (Virginia) region. Model inference compute resources are dynamically scheduled worldwide.

Model	Version	Mode	Context window	Max input	Max CoT	Max output	Input cost	Output cost	Free quota (Note)
			(tokens)				(per 1K tokens)
qwen3-max Currently qwen3-max-2025-09-23 context cache discount available	Stable	Non-thinking only	262,144	258,048	-	65,536	Tiered pricing. See details below.		None
qwen3-max-2025-09-23	Snapshot	Non-thinking only
qwen3-max-preview Context cache discount available	Preview	Thinking			81,920	32,768
		Non-thinking			-	65,536

The models above use tiered pricing based on the number of input tokens in the current request.

Input tokens per request	Input cost (per 1M tokens)	Output cost (per 1M tokens) CoT + response
0<Token≤32K	$1.2	$6
32K<Token≤128K	$2.4	$12
128K<Token≤252K	$3	$15

Mainland China

In Mainland China deployment mode, the endpoint and data storage are both located in the Beijing region. Model inference compute resources are limited to Mainland China.

Model	Version	Mode	Context window	Max input	Max CoT	Max output	Input cost	Output cost
			(tokens)				(per 1M tokens)
qwen3-max Currently qwen3-max-2026-01-23 Part of Qwen3 series Supports calling built-in tools	Stable	Thinking	262,144	258,048	81,920	32,768	Tiered pricing. See details below.
		Non-thinking			-	65,536
qwen3-max-2026-01-23 Thinking mode aka Qwen3-Max-Thinking Part of Qwen3 series Supports calling built-in tools	Snapshot	Thinking			81,920	32,768
		Non-thinking			-	65,536
qwen3-max-2025-09-23 Part of Qwen3 series	Snapshot	Non-thinking only
qwen3-max-preview Part of Qwen3 series	Preview	Thinking			81,920	32,768
		Non-thinking			-	65,536

The models above use tiered pricing based on the number of input tokens in the current request.

Model	Input tokens per request	Input cost (per 1M tokens)	Output cost (per 1M tokens) CoT + response
qwen3-max Batch calls at half price context cache discount available	0<Token≤32K	$0.359	$1.434
	32K<Token≤128K	$0.574	$2.294
	128K<Token≤252K	$1.004	$4.014
qwen3-max-2026-01-23	0<Token≤32K	$0.359	$1.434
	32K<Token≤128K	$0.574	$2.294
	128K<Token≤252K	$1.004	$4.014
qwen3-max-2025-09-23	0<Token≤32K	$0.861	$3.441
	32K<Token≤128K	$1.434	$5.735
	128K<Token≤252K	$2.151	$8.602
qwen3-max-preview context cache discount available	0<Token≤32K	$0.861	$3.441
	32K<Token≤128K	$1.434	$5.735
	128K<Token≤252K	$2.151	$8.602

More models

Model	Version	Context window	Max input	Max output	Input cost	Output cost
		(tokens)			(per 1M tokens)
qwen-max Currently qwen-max-2024-09-19 Batch calls at half price	Stable	32,768	30,720	8,192	$0.345	$1.377
qwen-max-latest Always the latest snapshot. Batch calls at half price	Latest	131,072	129,024
qwen-max-2025-01-25 Also known as qwen-max-0125, Qwen2.5-Max	Snapshot
qwen-max-2024-09-19 Also known as qwen-max-0919		32,768	30,720		$2.868	$8.602

qwen3-max-2026-01-23 thinking mode: Compared to the snapshot from September 23, 2025, it effectively integrates thinking and non-thinking modes, significantly improving overall model performance. In thinking mode, the model integrates three tools—web search, web extractor, and code interpreter—to achieve higher accuracy on complex problems by leveraging external tools during reasoning.

qwen3-max, qwen3-max-2026-01-23, and qwen3-max-2025-09-23 natively support the search agent feature, see Web Search.

Qwen-Plus

A balanced model with inference performance, cost, and speed between Qwen-Max and Qwen-Flash, ideal for moderately complex tasks. Usage | Thinking | API reference | Try online

International

In international deployment mode, the endpoint and data storage are both located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding Mainland China).

Model	Version	Context window	Max input	Max output	Input cost	Output cost	Free quota (Note)
		(tokens)			(per 1M tokens)
qwen-plus Currently qwen-plus-2025-12-01 Part of Qwen3 series Batch calls at half price	Stable	1,000,000	Thinking 995,904 Non-thinking mode 997,952	32,768 Max CoT: 81,920	Tiered pricing. See details below.		1 million tokens each Valid for 90 days after activating Model Studio
qwen-plus-latest Currently qwen-plus-2025-12-01 Part of Qwen3 series	Latest		Thinking 995,904 Non-thinking mode 997,952
qwen-plus-2025-12-01 Part of Qwen3 series	Snapshot		Thinking 995,904 Non-thinking mode 997,952

qwen-plus-2025-09-11 Part of Qwen3 series
qwen-plus-2025-07-28 Also known as qwen-plus-0728 Part of Qwen3 series
qwen-plus-2025-07-14 Also known as qwen-plus-0714 Part of Qwen3 series		131,072	Thinking 98,304 Non-thinking mode 129,024	16,384 Max CoT: 38,912	$0.4	Thinking $4 Non-thinking mode $1.2
qwen-plus-2025-04-28 Also known as qwen-plus-0428 Part of Qwen3 series
qwen-plus-2025-01-25 Also known as qwen-plus-0125			129,024	8,192		$1.2

qwen-plus, qwen-plus-latest, qwen-plus-2025-12-01, qwen-plus-2025-09-11, and qwen-plus-2025-07-28 use tiered pricing based on the number of input tokens in the current request.

Input tokens per request	Input cost (per 1M tokens)	Mode	Output cost (per 1M tokens)
0<Token≤256K	$0.4	Non-thinking mode	$1.2
0<Token≤256K	$0.4	Thinking	$4
256K<Token≤1M	$1.2	Non-thinking mode	$3.6
256K<Token≤1M	$1.2	Thinking	$12

Global

In global deployment mode, the endpoint and data storage are both located in the US (Virginia) region. Model inference compute resources are dynamically scheduled worldwide.

Model	Version	Context window	Max input	Max output	Input cost	Output cost
		(tokens)			(per 1M tokens)
qwen-plus Currently qwen-plus-2025-12-01 Part of Qwen3 series	Stable	1,000,000	Thinking 995,904 Non-thinking mode 997,952	32,768 Max CoT: 81,920	Tiered pricing. See the description below the table.
qwen-plus-2025-12-01 Part of Qwen3 series	Snapshot		Thinking 995,904 Non-thinking mode 997,952

qwen-plus-2025-09-11 Part of the Qwen3 series
qwen-plus-2025-07-28 Part of the Qwen3 series

The models above use tiered pricing based on the number of input tokens in the current request. qwen-plus supports context cache.

Input tokens per request	Input cost (per 1M tokens)	Mode	Output cost (per 1M tokens)
0<Token≤256 KB	$0.4	Non-thinking mode	$1.2
0<Token≤256 KB	$0.4	Thinking	$4
256K<Token≤1M	$1.2	Non-thinking mode	$3.6
256K<Token≤1M	$1.2	Thinking	$12

US

In US deployment mode, the endpoint and data storage are both located in the US (Virginia) region. Model inference compute resources are limited to the US.

Model	Version	Context window	Max input	Max output	Input cost	Output cost	Free quota (Note)
		(tokens)			(per 1M tokens)
qwen-plus-us Currently qwen-plus-2025-12-01-us Part of the Qwen3 series	Stable	1,000,000	Thinking 995,904 Non-thinking mode 997,952	32,768 Max CoT: 81,920	Tiered pricing. See details below.		None
qwen-plus-2025-12-01-us Part of the Qwen3 series	Snapshot		Thinking 995,904 Non-thinking mode 997,952

The models above use tiered pricing based on the number of input tokens in the current request. qwen-plus-us supports context cache.

Input tokens per request	Input cost (per 1M tokens)	Mode	Output cost (per 1M tokens)
0<Token≤256K	$0.4	Non-thinking mode	$1.2
0<Token≤256K	$0.4	Thinking	$4
256K<Token≤1M	$1.2	Non-thinking mode	$3.6
256K<Token≤1M	$1.2	Thinking	$12

Mainland China

In Mainland China deployment mode, the endpoint and data storage are both located in the Beijing region, and model inference compute resources are limited to Mainland China.

Model	Version	Context window	Max input	Max output	Input cost	Output cost
		(tokens)			(per 1M tokens)
qwen-plus Currently qwen-plus-2025-12-01 Part of the Qwen3 series Batch calls at half price	Stable	1,000,000	Thinking 995,904 Non-thinking mode 997,952	32,768 Max CoT: 81,920	Tiered pricing. See details below.
qwen-plus-latest Currently qwen-plus-2025-12-01 Part of the Qwen3 series Batch calls at half price	Latest		Thinking 995,904 Non-thinking mode 997,952
qwen-plus-2025-12-01 Part of the Qwen3 series	Snapshot		Thinking 995,904 Non-thinking mode 997,952

qwen-plus-2025-09-11 Part of the Qwen3 series
qwen-plus-2025-07-28 Also known as qwen-plus-0728 Part of the Qwen3 series
qwen-plus-2025-07-14 Also known as qwen-plus-0714 Part of the Qwen3 series		131,072	Thinking 98,304 Non-thinking mode 129,024	16,384 Max CoT: 38,912	$0.115	Thinking $1.147 Non-thinking mode $0.287
qwen-plus-2025-04-28 Also known as qwen-plus-0428 Part of the Qwen3 series

qwen-plus, qwen-plus-latest, qwen-plus-2025-12-01, qwen-plus-2025-09-11, and qwen-plus-2025-07-28 use tiered pricing based on the number of input tokens in the current request.

Input tokens per request	Input cost (per 1M tokens)	Mode	Output cost (per 1M tokens)
0<Token≤128K	$0.115	Non-thinking mode	$0.287
0<Token≤128K	$0.115	Thinking	$1.147
128K<Token≤256K	$0.345	Non-thinking mode	$2.868
128K<Token≤256K	$0.345	Thinking	$3.441
256K<Token≤1M	$0.689	Non-thinking mode	$6.881
256K<Token≤1M	$0.689	Thinking	$9.175

The models above support both thinking and non-thinking modes. You can switch between modes using the enable_thinking parameter. Additionally, these models offer the following significant improvements:

Reasoning ability: Significantly outperforms QwQ and similarly sized non-reasoning models in evaluations for math, code, and logical reasoning, achieving top-tier industry performance for a model of its size.
Human preference alignment: Features greatly enhanced capabilities for creative writing, role assumption, multi-turn conversation, and instruction following. Its general abilities significantly surpass those of similarly sized models.
Agent capabilities: Achieves industry-leading performance in both thinking and non-thinking modes and enables precise external tool invocation.
Multilingual support: Supports over 100 languages and dialects, and provides notable improvements in multilingual translation, instruction understanding, and commonsense reasoning.
Response formatting: Resolves issues found in previouss, such as incorrect Markdown formatting, response truncation, and incorrectly formatted boxed output.

For the models above, if thinking mode is enabled but no reasoning process is output, billing applies at the non-thinking mode rate.

More models

Model	Version	Context window	Max input	Max output	Input cost	Output cost
		(tokens)			(per 1M tokens)
qwen-plus-2025-01-25 Also known as qwen-plus-0125	Snapshot	131,072	129,024	8,192	$0.115	$0.287
qwen-plus-2025-01-12 Also known as qwen-plus-0112
qwen-plus-2024-12-20 Also known as qwen-plus-1220

Qwen-Flash

The fastest and lowest-cost model in the Qwen series, ideal for simple jobs. Qwen-Flash uses flexible tiered pricing, which provides more cost-effective billing than Qwen-Turbo. Usage | API reference | Try online | Thinking

International

In international deployment mode, the endpoint and data storage are both located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding Mainland China).

Model	Version	Mode	Context window	Max input	Max CoT	Max output	Input cost	Output cost CoT + output	Free quota (Note)
			(tokens)				(per 1K tokens)
qwen-flash Currently qwen-flash-2025-07-28 Part of the Qwen3 series Batch calls at half price	Stable	Thinking	1,000,000	995,904	81,920	32,768	Tiered pricing. See details below.		1 million tokens each Valid for 90 days after activating Model Studio
		Non-thinking		997,952	-
qwen-flash-2025-07-28 Part of the Qwen3 series	Snapshot	Thinking		995,904	81,920
		Non-thinking		997,952	-

The models above use tiered pricing based on the number of input tokens in the current request. qwen-flash supports cache and batch calls.

Input tokens per request	Input cost (per 1M tokens)	Output cost (per 1M tokens)
0 < Tokens ≤ 256K	$0.05	$0.4
256K < Tokens ≤ 1M	$0.25	$2

Global

In global deployment mode, the endpoint and data storage are both located in the US (Virginia) region. Model inference compute resources are dynamically scheduled worldwide.

Model	Version	Mode	Context window	Max input	Max CoT	Max output	Input cost	Output cost CoT + output
			(tokens)				(per 1K tokens)
qwen-flash Currently qwen-flash-2025-07-28 Part of the Qwen3 series	Stable	Thinking	1,000,000	995,904	81,920	32,768	Tiered pricing. See details below.
		Non-thinking		997,952	-
qwen-flash-2025-07-28 Part of the Qwen3 series	Snapshot	Thinking		995,904	81,920
		Non-thinking		997,952	-

The models above use tiered pricing based on the number of input tokens in the current request. qwen-flash supports context cache.

Input tokens per request	Input cost (per 1M tokens)	Output cost (per 1M tokens)
0 < Tokens ≤ 256K	$0.05	$0.4
256K < Tokens ≤ 1M	$0.25	$2

US

In US deployment mode, the endpoint and data storage are both located in the US (Virginia) region. Model inference compute resources are limited to the US.

Model	Version	Mode	Context window	Max input	Max CoT	Max output	Input cost	Output cost CoT + output	Free quota (Note)
			(tokens)				(per 1K tokens)
qwen-flash-us Currently qwen-flash-2025-07-28-us Part of the Qwen3 series	Stable	Thinking	1,000,000	995,904	81,920	32,768	Tiered pricing. See details below.		None
		Non-thinking		997,952	-
qwen-flash-2025-07-28-us Part of the Qwen3 series	Snapshot	Thinking		995,904	81,920
		Non-thinking		997,952	-

The models above use tiered pricing based on the number of input tokens in the current request.

Input tokens per request	Input cost (per 1M tokens)	Output cost (per 1M tokens)
0 < Tokens ≤ 256K	$0.05	$0.4
256K < Tokens ≤ 1M	$0.25	$2

Mainland China

In Mainland China deployment mode, the endpoint and data storage are both located in the Beijing region. Model inference compute resources are limited to Mainland China.

Model	Version	Mode	Context window	Max input	Max CoT	Max output	Input cost	Output cost CoT + output
			(tokens)				(per 1K tokens)
qwen-flash Currently qwen-flash-2025-07-28 Part of the Qwen3 series Batch calls at half price	Stable	Thinking	1,000,000	995,904	81,920	32,768	Tiered pricing. See details below.
		Non-thinking		997,952	-
qwen-flash-2025-07-28 Part of the Qwen3 series	Snapshot	Thinking		995,904	81,920
		Non-thinking		997,952	-

The models above use tiered pricing based on the number of input tokens in the current request. qwen-flash supports context cache.

Input tokens per request	Input cost (per 1M tokens)	Output cost (per 1M tokens)
0 < Tokens ≤ 128K	$0.022	$0.216
128K < Tokens ≤ 256K	$0.087	$0.861
256K < Tokens ≤ 1M	$0.173	$1.721

Qwen-Turbo

Qwen-Turbo will no longer receive updates. Replace it with Qwen-Flash. Qwen-Flash uses flexible tiered pricing for more cost-effective billing. Usage | API reference | Try online｜Thinking

International

In international deployment mode, the endpoint and data storage are both located in the Singapore region. Model inference computing resources are dynamically scheduled worldwide (excluding Mainland China).

Model	Version	Context window	Max input	Max output	Input cost	Output cost	Free quota (Note)
		(tokens)			(per 1M tokens)
qwen-turbo Currently qwen-turbo-2025-04-28 Part of the Qwen3 series Batch calls at half price	Stable	Thinking 131,072 Non-thinking mode 1,000,000	Thinking 98,304 Non-thinking mode 1,000,000	16,384 Max CoT: 38,912	$0.05	Thinking mode: $0.5 Non-thinking mode: $0.2	1 million tokens each Valid for 90 days after activating Model Studio
qwen-turbo-latest Always the latest snapshot Part of the Qwen3 series	Latest				$0.05	Thinking mode: $0.5 Non-thinking mode: $0.2
qwen-turbo-2025-04-28 Also known as qwen-turbo-0428 Part of the Qwen3 series	Snapshot
qwen-turbo-2024-11-01 Also known as qwen-turbo-1101		1,000,000	1,000,000	8,192		$0.2

Mainland China

In Mainland China deployment mode, the endpoint and data storage are both located in the Beijing region. Model inference computing resources are limited to Mainland China.

Model	Version	Context window	Max input	Max output	Input cost	Output cost
		(tokens)			(per 1M tokens)
qwen-turbo Currently qwen-turbo-2025-04-28 Part of the Qwen3 series	Stable	Thinking 131,072 Non-thinking mode 1,000,000	Thinking 98,304 Non-thinking mode 1,000,000	16,384 Max CoT: 38,912	$0.044	Thinking $0.431 Non-thinking mode $0.087
qwen-turbo-latest Always the latest snapshot Part of the Qwen3 series	Latest
qwen-turbo-2025-07-15 Also known as qwen-turbo-0715 Part of the Qwen3 series	Snapshot
qwen-turbo-2025-04-28 Also known as qwen-turbo-0428 Part of the Qwen3 series

QwQ

QwQ is a reasoning model trained on the Qwen2.5 base and significantly enhanced through reinforcement learning. It achieves performance comparable to the full-capacity DeepSeek-R1 on core metrics, such as AIME 24/25 and LiveCodeBench, and on certain general benchmarks, such as IFEval and LiveBench. Usage

International

In international deployment mode, the endpoint and data storage are both located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding Mainland China).

Model

Version

Context window

Max input

Max CoT

Max response

Input cost

Output cost

Free quota

(Note)

(tokens)

(per 1M tokens)

qwq-plus

Stable

131,072

98,304

32,768

8,192

$0.8

$2.4

1 million tokens

Valid for 90 days after activating Model Studio

Mainland China

In Mainland China deployment mode, the endpoint and data storage are both located in the Beijing region. Model inference compute resources are limited to Mainland China.

Model	Version	Context window	Max input	Max CoT	Max response	Input cost	Output cost
		(tokens)				(per 1M tokens)
qwq-plus Currently qwq-plus-2025-03-05 Batch calls at half price	Stable	131,072	98,304	32,768	8,192	$0.230	$0.574
qwq-plus-latest Always the latest snapshot	Latest
qwq-plus-2025-03-05 Also known as qwq-plus-0305	Snapshot

Qwen-Long

This Qwen series model features the longest context window, balanced capabilities, and a low cost. It is ideal for long-text analysis, information extraction, summarization, and classification tasks. Usage | Try online

Note

Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Model	Version	Context window	Max input	Max output	Input cost	Output cost
		(tokens)			(per 1M tokens)
qwen-long-latest Always the latest snapshot Batch calls at half price	Stable	10,000,000	10,000,000	32,768	$0.072	$0.287
qwen-long-2025-01-25 Also known as qwen-long-0125	Snapshot

Qwen-Omni

Qwen-Omni accepts multimodal inputs, such as text, images, audio, and video, and generates text or speech responses. It offers multiple expressive, human-like voice options and supports multilingual and dialect speech output. This makes it suitable for audiovisual chat scenarios, such as visual recognition, emotion sensing, and education. Usage｜API reference

International

In international deployment mode, the endpoint and data storage are both located in the Singapore region. Model inference computing resources are dynamically scheduled worldwide (excluding Mainland China).

Model	Version	Mode	Context window	Max input	Max CoT	Max output	Free quota (Note)
			(tokens)
qwen3-omni-flash This model has the same capabilities as qwen3-omni-flash-2025-12-01.	Stable	Thinking	65,536	16,384	32,768	16,384	1 million tokens (regardless of modality) Valid for 90 days after activating Model Studio
		Non-thinking		49,152	-
qwen3-omni-flash-2025-12-01	Snapshot	Thinking	65,536	16,384	32,768	16,384
		Non-thinking		49,152	-
qwen3-omni-flash-2025-09-15 Also known as qwen3-omni-flash-0915	Snapshot	Thinking	65,536	16,384	32,768	16,384
		Non-thinking		49,152	-

After the free quota is used up, input and output are billed as follows. The pricing is the same for thinking and non-thinking modes. Audio output is not supported in thinking mode.

Input	Unit price (per 1M tokens)
Text	$0.43
Audio	$3.81
Image/Video	$0.78

Output

Unit price (per 1M tokens)

Text

$1.66 (when input contains text only)

$3.06 (when input contains images, video, or audio)

Text + Audio

This item is not billed in thinking mode.

$15.11 (audio)

Text output is not billed.

More models

Model	Version	Context window (tokens)	Max input (tokens)	Max output (tokens)	Free quota (Note)
		(tokens)
qwen-omni-turbo Has the same capabilities as the qwen-omni-turbo-2025-03-26 snapshot.	Stable	32,768	30,720	2,048	1 million tokens (regardless of modality) Valid for 90 days after activating Model Studio
qwen-omni-turbo-latest Always uses the latest snapshot. Identical capabilities	Latest
qwen-omni-turbo-2025-03-26 Also known as qwen-omni-turbo-0326.	Snapshot

After the free quota for commercial models is used up, the following input and output billing rules apply:

Enter the billing item.	Unit price (per 1M tokens)
Text	$0.07
Audio	$4.44
Image/Video	$0.21

Output

Unit price (per 1M tokens)

Text

$0.27 (when input contains text only)

$0.63 (when input contains images, video, or audio)

Text + Audio

$8.89 (audio)

Text output is not billed.

Mainland China

In Mainland China deployment mode, the endpoint and data storage are both located in the Beijing region. Model inference computing resources are limited to Mainland China.

Model	Version	Mode	Context window	Max input	Max CoT	Max output	Free quota (Note)
			(tokens)
qwen3-omni-flash Currently qwen3-omni-flash-2025-12-01	Stable	Thinking	65,536	16,384	32,768	16,384	No free quota
		Non-thinking		49,152	-
qwen3-omni-flash-2025-12-01	Snapshot	Thinking	65,536	16,384	32,768	16,384
		Non-thinking		49,152	-
qwen3-omni-flash-2025-09-15 Also known as qwen3-omni-flash-0915	Snapshot	Thinking	65,536	16,384	32,768	16,384
		Non-thinking		49,152	-

After the free quota is used up, input and output are billed as follows. The pricing is the same for thinking and non-thinking modes. Audio output is not supported in thinking mode.

Input Billing Item	Unit price (per 1M tokens)
Text	$0.258
Audio	$2.265
Image/Video	$0.473

Billing Output

Unit price (per 1M tokens)

Text

$0.989 (when input contains text only)

$1.821 (when input contains images, video, or audio)

Text + Audio

This item is not billed in thinking mode.

$8.974 (audio)

Text output is not billed.

More models

Model	Version	Context window	Max input	Max output	free quota Note
		(tokens)
qwen-omni-turbo Provides the same capabilities as qwen-omni-turbo-2025-03-26.	Stable	32,768	30,720	2,048	No free quota
qwen-omni-turbo-latest Always uses the latest snapshot. Same capabilities	Latest
qwen-omni-turbo-2025-03-26 Also known as qwen-omni-turbo-0326.	Snapshot
qwen-omni-turbo-2025-01-19 Also known as qwen-omni-turbo-0119.

The input and output billing rules are as follows:

Billing Item	Unit price (per 1M tokens)
Text	$0.058
Audio	$3.584
Image/Video	$0.216

Output

Unit price (per 1M tokens)

Text

$0.230 (when input contains text only)

$0.646 (when input contains images, audio, or video)

Text + Audio

$7.168 (audio)

Text output is not billed.

Billing example: A request with 1,000 text tokens and 1,000 image tokens as input, generating 1,000 text tokens and 1,000 audio tokens as output, costs: $0.000058 (text input) + $0.000216 (image input) + $0.007168 (audio output).

Use the Qwen3-Omni-Flash model for its significant capability improvements over Qwen-Omni-Turbo, which is no longer updated:

It is a hybrid thinking model that supports both thinking and non-thinking modes. Switch between modes using the enable_thinking parameter. By default, thinking mode is disabled.
Audio output is not supported in thinking mode. For audio output in non-thinking mode:
- qwen3-omni-flash-2025-12-01 supports up to 49 voice options, qwen3-omni-flash-2025-09-15 and qwen3-omni-flash support up to 17 voice options, and Qwen-Omni-Turbo supports only 4.
- Supports up to 10 languages, while Qwen-Omni-Turbo supports only 2.

Qwen-Omni-Realtime

Compared to Qwen-Omni, Qwen-Omni-Realtime supports streaming audio input and includes built-in Voice Activity Detection (VAD) to automatically detect the start and end of user speech. Usage｜Client events｜Server events

International

In international deployment mode, the endpoint and data storage are both located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding Mainland China).

Model	Version	Context window	Max input	Max output	Free quota (Note)
		(tokens)
qwen3-omni-flash-realtime Currently qwen3-omni-flash-realtime-2025-12-01.	Stable	65,536	49,152	16,384	1 million tokens (regardless of modality) Valid for 90 days after activating Model Studio
qwen3-omni-flash-realtime-2025-12-01	Snapshot
qwen3-omni-flash-realtime-2025-09-15

After the free quota is used up, input and output are billed as follows:

Input	Unit price (per 1M tokens)
Text	$0.52
Audio	$4.57
Image	$0.94

Output

Unit price (per 1M tokens)

Text

$1.99 (for text-only input)

$3.67 (for inputs with images or audio)

Text + Audio

$18.13 (for audio)

Text output is not billed.

More models

Model	Version	Context window	Max input	Max output	Free quota (Note)
		(tokens)
qwen-omni-turbo-realtime Currently qwen-omni-turbo-realtime-2025-05-08	Stable	32,768	30,720	2,048	1 million tokens each (regardless of modality) Valid for 90 days after activating Model Studio
qwen-omni-turbo-realtime-latest Always the latest snapshot.	Latest
qwen-omni-turbo-realtime-2025-05-08	Snapshot

After the free quota is used up, input and output are billed as follows:

Input Billing Item	Unit price (per 1M tokens)
Text	$0.270
Audio	$4.440
Image	$0.840

Output

Unit price (per 1M tokens)

Text

$1.070 (when the input contains only text)

$2.52 (if the input contains images or audio)

Text + Audio

$8.890 (for audio)

The text portion of the output is not subject to billing.

Mainland China

In Mainland China deployment mode, the endpoint and data storage are both located in the Beijing region. Model inference compute resources are limited to Mainland China.

Model	Version	Context window	Max input	Max output	Free quota (Note)
		(tokens)
qwen3-omni-flash-realtime This model currently has the same capabilities as qwen3-omni-flash-realtime-2025-12-01.	Stable	65,536	49,152	16,384	No free quota
qwen3-omni-flash-realtime-2025-12-01	Snapshot
qwen3-omni-flash-realtime-2025-09-15

After the free quota is used up, input and output are billed as follows:

Enter a billing item	Unit price (per 1M tokens)
Text	$0.315
Audio	$2.709
Image	$0.559

Output

Unit price (per 1M tokens)

text

$1.19 (if the input contains only text)

$2.179 (if the input contains images or audio)

Text + Audio

$10.766 (for audio)

The text part is not billed.

More models

Model	Version	Context window	Max input	Max output	Free quota (Note)
		(tokens)
qwen-omni-turbo-realtime This model currently has the same capabilities as the qwen-omni-turbo-2025-05-08 snapshot.	Stable	32,768	30,720	2,048	No free quota
qwen-omni-turbo-realtime-latest It always provides the same capabilities as the latest snapshot.	Latest
qwen-omni-turbo-realtime-2025-05-08	Snapshot

The input and output billing rules are as follows:

Billing Item Input	Unit price (per 1M tokens)
Text	$0.230
Audio	$3.584
Image	$0.861

Output

Unit price (per 1M tokens)

text

$0.918 (for text-only input)

$2.581 (for input with images or audio)

Text + Audio

$7.168 (for audio)

Text output is not billed.

Use the Qwen3-Omni-Flash-Realtime model instead of Qwen-Omni-Turbo-Realtime, which will no longer be updated. Qwen3-Omni-Flash-Realtime offers significant capability improvements. For audio output:

qwen3-omni-flash-realtime-2025-12-01 supports 49 voices. qwen3-omni-flash-realtime-2025-09-15 and qwen3-omni-realtime-flash support 17 voices. Qwen-Omni-Turbo-Realtime supports only 4.
Supports 10 languages, compared to Qwen-Omni-Turbo-Realtime's 2.

QVQ

QVQ is a visual reasoning model that supports visual input and CoT output. It demonstrates stronger capabilities in math, programming, visual analysis, creation, and general tasks. Usage | Try online

International

In international deployment mode, the endpoint and data storage are both located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding Mainland China).

Model	Version	Context window	Max input	Max CoT	Max response	Input cost	Output cost	Free quota (Note)
		(tokens)				(per 1M tokens)
qvq-max Currently qvq-max-2025-03-25.	Stable	131,072	106,496 Max per image: 16,384	16,384	8,192	$1.2	$4.8	1 million input tokens each Valid for 90 days after activating Model Studio
qvq-max-latest Always the latest snapshot.	Latest
qvq-max-2025-03-25 Also known as qvq-max-0325.	Snapshot

Mainland China

In Mainland China deployment mode, the endpoint and data storage are both located in the Beijing region. Model inference compute resources are limited to Mainland China.

Model	Version	Context window	Max input	Max CoT	Max response	Input cost	Output cost
		(tokens)				(per 1M tokens)
qvq-max Offers stronger visual reasoning and instruction-following capabilities than qvq-plus and delivers optimal performance for more complex tasks. Currently qvq-max-2025-03-25	Stable	131,072	106,496 Max per image: 16,384	16,384	8,192	$1.147	$4.588
qvq-max-latest Always the latest snapshot.	Latest
qvq-max-2025-05-15 Also known as qvq-max-0515.	Snapshot
qvq-max-2025-03-25 Also known as qvq-max-0325.
qvq-plus Currently qvq-plus-2025-05-15	Stable					$0.287	$0.717
qvq-plus-latest Always the latest snapshot.	Latest
qvq-plus-2025-05-15 Also known as qvq-plus-0515.	Snapshot

Qwen-VL

Qwen-VL is a text generation model with visual (image) understanding capabilities. It performs OCR, and can further summarize and reason. For example, it extracts attributes from product photos or solves problems based on exercise diagrams. Usage | API reference | Try online

Qwen-VL models are billed based on the total number of input and output tokens. For more information about image token calculation rules, see Visual Understanding.

International

In international deployment mode, the access point and data storage are both located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding Mainland China).

Model	Version	Mode	Context window	Max input	Max CoT	Max output	Input cost	Output cost CoT and output	Free quota (Note)
			(tokens)				(per 1M tokens)
qwen3-vl-plus Currently qwen3-vl-plus-2025-12-19	Stable	Thinking	262,144	258,048 Max per image: 16,384	81,920	32,768	Tiered pricing. See details below.		1 million input tokens and 1 million output tokens Valid for 90 days after activating Model Studio
		Non-thinking		260,096 Max per image: 16,384	-
qwen3-vl-plus-2025-12-19	Snapshot	Thinking		258,048 Max per image: 16,384	81,920
		Non-thinking		260,096 Max per image: 16,384	-
qwen3-vl-plus-2025-09-23	Snapshot	Thinking		258,048 Max per image: 16,384	81,920
		Non-thinking		260,096 Max per image: 16,384	-
qwen3-vl-flash Currently qwen3-vl-flash-2025-10-15	Stable	Thinking		258,048 Max per image: 16,384	81,920
		Non-thinking		260,096 Max per image: 16,384	-
qwen3-vl-flash-2026-01-22	Snapshot	Thinking		258,048 Max per image: 16,384	81,920
		Non-thinking		260,096 Max per image: 16,384	-
qwen3-vl-flash-2025-10-15	Snapshot	Thinking		258,048 Max per image: 16,384	81,920
		Non-thinking		260,096 Max per image: 16,384	-

The models above use tiered pricing based on the number of input tokens in the current request. The input and output prices are the same for thinking and non-thinking modes. In addition, qwen3-vl-plus and qwen3-vl-flash models support context cache.

qwen3-vl-plus series

Input tokens per request	Input cost (per 1M tokens)	Output cost (per 1M tokens)
0 < Tokens ≤ 32K	$0.2	$1.6
32K < Tokens ≤ 128K	$0.3	$2.4
128K < Tokens ≤ 256K	$0.6	$4.8

qwen3-vl-flash series

Input tokens per request	Input cost (per 1M tokens)	Output cost (per 1M tokens)
0 < Tokens ≤ 32,000	$0.05	$0.40
32,000 < Tokens ≤ 128,000	$0.075	$0.6
128,000 < Tokens ≤ 256,000	$0.12	$0.96

More models

Qwen-VL-Max

All models below belong to the Qwen2.5-VL series, and the qwen-vl-max model supports context cache.

Model	Version	Context window	Max input	Max output	Input cost	Output cost	Free quota (Note)
		(tokens)			(per 1M tokens)
qwen-vl-max Provides enhanced visual reasoning and instruction-following capabilities compared to qwen-vl-plus, delivering optimal performance for more complex tasks. Currently qwen-vl-max-2025-08-13.	Stable	131,072	129,024 Max per image: 16,384	8,192	$0.8	$3.2	1 million tokens for input and 1 million tokens for output Valid for 90 days after activating Model Studio
qwen-vl-max-latest Always the latest snapshot.	Latest				$0.8	$3.2
qwen-vl-max-2025-08-13 Also known as qwen-vl-max-0813. Visual understanding metrics have been fully upgraded, providing significantly enhanced capabilities in mathematics, reasoning, object recognition, and multilingual processing.	Snapshot
qwen-vl-max-2025-04-08 Also known as qwen-vl-max-0408. As part of the Qwen2.5-VL series, this model extends the context window to 128,000 tokens and significantly enhances mathematics and reasoning capabilities.

Qwen-VL-Plus

All models below belong to the Qwen2.5-VL series, and the qwen-vl-plus model supports context cache.

Model	Version	Context window	Max input	Max output	Input cost	Output cost	Free quota （Note）
		(tokens)			(per 1M tokens)
qwen-vl-plus Currently qwen-vl-plus-2025-08-15.	Stable	131,072	129,024 Max per image: 16,384	8,192	$0.21	$0.63	1 million tokens each Valid for 90 days after activating Model Studio
qwen-vl-plus-latest Always the latest snapshot.	Latest				$0.21	$0.63
qwen-vl-plus-2025-08-15 Also known as qwen-vl-plus-0815. Features significantly improved object recognition and localization, and multilingual processing capabilities.	Snapshot
qwen-vl-plus-2025-05-07 Also known as qwen-vl-plus-0507. Features significantly improved math, reasoning, and surveillance video content understanding capabilities.
qwen-vl-plus-2025-01-25 Also known as qwen-vl-plus-0125. Part of the Qwen2.5-VL series, this model extends the context window to 128K tokens and significantly enhances image and video understanding capabilities.

Global

In global deployment mode, the access point and data storage are both located in the US (Virginia) region. Model inference compute resources are dynamically scheduled worldwide.

Model	Version	Mode	Context window	Max input	Max CoT	Max output	Input cost	Output cost CoT and output
			(tokens)				(per 1M tokens)
qwen3-vl-plus Currently qwen3-vl-plus-2025-12-19.	Stable	Thinking	262,144	258,048 Max per image: 16,384.	81,920	32,768	Tiered pricing. See details below.
		Non-thinking		260,096 Max per image: 16,384.	-
qwen3-vl-plus-2025-09-23	Snapshot	Thinking		258,048 Max per image: 16,384.	81,920
		Non-thinking		260,096 Max per image: 16,384.	-
qwen3-vl-flash Currently qwen3-vl-flash-2025-10-15.	Stable	Thinking		258,048 Max per image: 16,384.	81,920
		Non-thinking		260,096 Max per image: 16,384.	-
qwen3-vl-flash-2025-10-15	Snapshot	Thinking		258,048 Max per image: 16,384.	81,920
		Non-thinking		260,096 Max per image: 16,384.	-

The models above use tiered pricing based on the number of input tokens in the current request. The input and output prices are the same for thinking and non-thinking modes. In addition, qwen3-vl-plus and qwen3-vl-flash models support context cache.

qwen3-vl-plus series

Input tokens per request	Input cost (per 1M tokens)	Output cost (per 1M tokens)
0 < Tokens ≤ 32,000	$0.20	$1.6
32,000 < Tokens ≤ 128,000	$0.30	$2.40
128,000 < Tokens ≤ 256,000	$0.60	$4.80

qwen3-vl-flash series

Input tokens per request	Input cost (per 1M tokens)	Output cost (per 1M tokens)
0 < Tokens ≤ 32K	$0.05	$0.4
32K < Tokens ≤ 128K	$0.075	$0.6
128K < Tokens ≤ 256K	$0.12	$0.96

US

In US deployment mode, the access point and data storage are both located in the US (Virginia) region. Model inference compute resources are limited to the US.

Model	Version	Mode	Context window	Max input	Longest CoT	Max output	Input cost	Output cost For CoT and final output
			(tokens)				(per 1M tokens)
qwen3-vl-flash-us Offers the same capabilities as qwen3-vl-flash-2025-10-15-us.	Stable	Thinking		258,048 Max per image: 16,384	81,920	32,768	Tiered pricing. See details below.
		Non-thinking		260,096 Max per image: 16,384	-
qwen3-vl-flash-2025-10-15-us	Snapshot	Thinking		258,048 Max per image: 16,384	81,920
		Non-thinking		260,096 Max per image: 16,384	-

The models above use tiered pricing based on the number of input tokens in the current request. The input and output prices are the same for thinking and non-thinking modes. In addition, qwen3-vl-flash-us model supports context cache.

Input tokens per request	Input cost (per 1M tokens)	Output cost (per 1M tokens)
0 < Tokens ≤ 32,000	$0.05	$0.4
32,000 < Tokens ≤ 128,000	$0.075	$0.6
128,000 < Tokens ≤ 256,000	$0.12	$0.96

Mainland China

In Mainland China deployment mode, the access point and data storage are both located in the Beijing region. Model inference compute resources are limited to Mainland China.

Model	Version	Mode	Context window (tokens)	Max input (tokens)	Max CoT	Max output (tokens)	Input cost	Output cost	Free quota (Note)
			Token count				per 1 M tokens
qwen3-vl-plus Currently qwen3-vl-plus-2025-12-19 Batch calls at half price	Stable	Thinking	262,144	258,048 Max per image: 16,384	81,920	32,768	Tiered pricing. See details below.		No free quota
		Non-thinking		260,096 Max per image: 16,384	-
qwen3-vl-plus-2025-12-19	Snapshot	Thinking		258,048 Max per image: 16,384	81,920
		Non-thinking		260,096 Max per image: 16,384	-
qwen3-vl-plus-2025-09-23	Snapshot	Thinking		258,048 Max per image: 16,384	81,920
		Non-thinking		260,096 Max per image: 16,384	-
qwen3-vl-flash Currently qwen3-vl-flash-2025-10-15 Batch calls at half price	Stable	Thinking		258,048 Max per image: 16,384	81,920
		Non-thinking		260,096 Max per image: 16,384	-
qwen3-vl-flash-2026-01-22	Snapshot	Thinking		258,048 Max per image: 16,384	81,920
		Non-thinking		260,096 Max per image: 16,384	-
qwen3-vl-flash-2025-10-15	Snapshot	Thinking		258,048 Max per image: 16,384	81,920
		Non-thinking		260,096 Max per image: 16,384	-

The models above use tiered pricing based on the number of input tokens in the current request. The input and output prices are the same for thinking and non-thinking modes. In addition, qwen3-vl-plus and qwen3-vl-flash models support context cache.

qwen3-vl-plus series

Input tokens per request	Input cost (per 1M tokens)	Output cost (per 1M tokens)
0 < Tokens ≤ 32K	$0.143	$1.434
32K < Tokens ≤ 128K	$0.215	$2.15
128K < Tokens ≤ 256K	$0.43	$4.301

qwen3-vl-flash series

Input tokens per request	Input cost (per 1M tokens)	Output cost (per 1M tokens)
0 < Tokens ≤ 32,000	$0.022	$0.215
32,000 < Tokens ≤ 128,000	$0.043	$0.43
128,000 < Tokens ≤ 256,000	$0.086	$0.859

More models

Qwen-VL-Max series

Models updated on or after qwen-vl-max-2025-01-25 belong to the Qwen2.5-VL series, and the qwen-vl-max model supports context cache.

Model	Version	Context window	Max input	Max output	Input cost	Output cost
		(tokens)			(per 1M tokens)
qwen-vl-max Offers enhanced visual reasoning and instruction-following capabilities compared with qwen-vl-plus, delivering optimal performance for more complex tasks. Currently qwen-vl-max-2025-08-13. Batch calls are available at half price.	Stable	131,072	129,024 Max per image: 16,384	8,192	$0.23	$0.574
qwen-vl-max-latest Always the latest snapshot Batch calls are available at half price.	Latest
qwen-vl-max-2025-08-13 Also known as qwen-vl-max-0813. Features fully upgraded visual understanding metrics, with significantly enhanced capabilities in mathematics, reasoning, object recognition, and multilingual processing.	Snapshot
qwen-vl-max-2025-04-08 Also known as qwen-vl-max-0408. Provides enhanced mathematics and reasoning capabilities.					$0.431	$1.291
qwen-vl-max-2025-04-02 Also known as qwen-vl-max-0402. Delivers significantly improved accuracy when solving complex mathematics problems.
qwen-vl-max-2025-01-25 Also known as qwen-vl-max-0125. Upgraded to the Qwen2.5-VL series, it extends the context window to 128K tokens and significantly enhances image and video understanding capabilities.
qwen-vl-max-2024-12-30 Also known as qwen-vl-max-1230.		32,768	30,720 Max per image: 16,384	2,048	$0.431	$1.291
qwen-vl-max-2024-11-19 Also known as qwen-vl-max-1119.

Qwen-VL-Plus series

qwen-vl-plus-2025-01-25 belong to the Qwen2.5-VL series, and the qwen-vl-plus model supports context cache.

Model	Version	Context window	Max input	Max output	Input cost	Output cost
		(tokens)			(per 1M tokens)
qwen-vl-plus Currently qwen-vl-plus-2025-08-15. Batch calls at half price.	Stable	131,072	129,024 Max per image: 16,384	8,192	$0.115	$0.287
qwen-vl-plus-latest Always the latest snapshot. Batch calls at half price.	Latest
qwen-vl-plus-2025-08-15 Also known as qwen-vl-plus-0815. Significantly improved object recognition and localization, and multilingual processing capabilities.	Snapshot
qwen-vl-plus-2025-07-10 Also known as qwen-vl-plus-0710. Further improves the understanding of surveillance video content.		32,768	30,720 Max per image: 16,384		$0.022	$0.216
qwen-vl-plus-2025-05-07 Also known as qwen-vl-plus-0507. Significantly improved math, reasoning, and surveillance video content understanding capabilities.		131,072	129,024 Max per image: 16,384		$0.216	$0.646
qwen-vl-plus-2025-01-25 Also known as qwen-vl-plus-0125. Upgraded to the Qwen2.5-VL series, it extends the context to 128K tokens and significantly enhances image and video understanding capabilities.
qwen-vl-plus-2025-01-02 Also known as qwen-vl-plus-0102.		32,768	30,720 Max per image: 16,384	2,048

The qwen3-vl-flash-2026-01-22 model effectively integrates thinking and non-thinking modes. Compared to the snapshot of October 15, 2025, it significantly improves the model's overall performance. It achieves higher inference accuracy in business scenarios such as general visual recognition, security, store inspection, patrol inspection, and photo-based problem solving.

Qwen-OCR

Qwen-OCR is a model that specializes in text extraction. Compared to Qwen-VL, it focuses more on extracting text from images of items such as documents, tables, exam questions, and handwriting. It can recognize multiple languages, including English, French, Japanese, Korean, German, Russian, and Italian. Usage | API reference｜Try online

International

In international deployment mode, the endpoint and data storage are both located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding Mainland China).

Model	Version	Context window	Max input	Max output	Input price	Output price	Free quota (Note)
		(tokens)			(per 1M tokens)
qwen-vl-ocr Equivalent to qwen-vl-ocr-2025-11-20.	Stable	38,192	30,000 Max per image: 30,000	8,192	$0.07	$0.16	1 million input tokens and 1 million output tokens Valid for 90 days after activating Model Studio
qwen-vl-ocr-2025-11-20 Also known as qwen-vl-ocr-1120. Based on the Qwen3-VL architecture, this model significantly improves document parsing and text localization.	Snapshot

Global

In global deployment mode, the endpoint and data storage are both located in the US (Virginia) region. Model inference compute resources are dynamically scheduled worldwide.

Model	Version	Context window	Max input	Max output	Input price	Output price
		(tokens)			(per 1M tokens)
qwen-vl-ocr Equivalent to qwen-vl-ocr-2025-11-20.	Stable	38,192	30,000 Max per image: 30,000	8,192	$0.07	$0.16
qwen-vl-ocr-2025-11-20 Also known as qwen-vl-ocr-1120. Based on the Qwen3-VL architecture, this model significantly improves document parsing and text localization.	Snapshot

Mainland China

In Mainland China deployment mode, the endpoint and data storage are both located in the Beijing region. Model inference compute resources are limited to Mainland China.

Model	Version	Context window	Max input	Max output	Input price	Output price	Free quota (Note)
		(tokens)			(per 1M tokens)
qwen-vl-ocr Currently qwen-vl-ocr-2025-11-20. Batch calls are available at half price.	Stable	38,192	30,000 Max per image: 30,000	8,192	$0.043	$0.072	No free quota
qwen-vl-ocr-latest Always the latest	Latest
qwen-vl-ocr-2025-11-20 Also known as qwen-vl-ocr-1120. Based on the Qwen3-VL architecture, this model significantly improves document parsing and text localization.	Snapshot
qwen-vl-ocr-2025-08-28 Also known as qwen-vl-ocr-0828.		34,096		4,096	$0.717	$0.717
qwen-vl-ocr-2025-04-13 Also known as qwen-vl-ocr-0413.
qwen-vl-ocr-2024-10-28 Also known as qwen-vl-ocr-1028.

Qwen-Math

Qwen-Math is a language model that specializes in solving mathematical problems. Usage | API reference | Try online

Note

Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Model	Version	Context window	Max input	Max output	Input cost	Output cost
		(tokens)			(per 1M tokens)
qwen-math-plus This model currently has the same capabilities as qwen-math-plus-2024-09-19.	Stable	4,096	3,072	3,072	$0.574	$1.721
qwen-math-plus-latest Always the latest snapshot	Latest
qwen-math-plus-2024-09-19 Also known as qwen-math-plus-0919	Snapshot
qwen-math-plus-2024-08-16 Also known as qwen-math-plus-0816
qwen-math-turbo Currently qwen-math-turbo-2024-09-19.	Stable				$0.287	$0.861
qwen-math-turbo-latest Always the latest snapshot	Latest
qwen-math-turbo-2024-09-19 Also known as qwen-math-turbo-0919	Snapshot

Qwen-Coder

Qwen-Coder is a code generation model. The latest Qwen3-Coder-Plus series builds on Qwen3 and delivers advanced coding agent capabilities. It excels at tool calling, environment interaction, and autonomous programming—combining strong coding proficiency with general-purpose intelligence. Usage | API reference | Try online

International

In international deployment mode, the endpoint and data storage are both located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide, excluding Mainland China.

Model	Version	Context window	Max input	Max output	Input cost	Output cost	Free quota (Note)
		(tokens)			(per 1M tokens)
qwen3-coder-plus Currently qwen3-coder-plus-2025-09-23	Stable	1,000,000	997,952	65,536	Pricing is tiered. See the notes below the table.		1 million tokens each Validity period: 90 days after you activate Alibaba Cloud Model Studio
qwen3-coder-plus-2025-09-23	Snapshot
qwen3-coder-plus-2025-07-22	Snapshot
qwen3-coder-flash Currently qwen3-coder-flash-2025-07-28	Stable
qwen3-coder-flash-2025-07-28	Snapshot

The models above use tiered pricing based on the number of input tokens in the current request.

qwen3-coder-plus series

qwen3-coder-plus, qwen3-coder-plus-2025-09-23, and qwen3-coder-plus-2025-07-22 are priced as follows. qwen3-coder-plus supports context cache. Input text that hits the implicit cache is billed at 20% of the unit price, while input text that hits the explicit cache is billed at 10% of the unit price.

Input tokens per request	Input cost (per 1M tokens)	Output cost (per 1M tokens)
0<Token≤32K	$1	$5
32,000 < Tokens ≤ 128,000	$1.80	$9
128,000 < Tokens ≤ 256,000	$3	$15
256,000 < Tokens ≤ 1,000,000	$6	$60

qwen3-coder-flash series

qwen3-coder-flash and qwen3-coder-flash-2025-07-28 are priced as follows. qwen3-coder-flash supports context cache. Input text that hits the implicit cache is billed at 20% of the unit price, while input text that hits the explicit cache is billed at 10% of the unit price.

Input tokens per request	Input cost (per 1M tokens)	Output cost (per 1M tokens)
Up to 32,000	$0.30	$1.50
32,000 < Tokens ≤ 128,000	$0.50	$2.50
128,000 < Tokens ≤ 256,000	$0.80	$4.00
256,000 < Tokens ≤ 1,000,000	$1.6	$9.60

Global

In global deployment mode, the endpoint and data storage are both located in the US (Virginia) region. Model inference compute resources are dynamically scheduled worldwide.

Model	Version	Context window	Max input	Max output	Input cost	Output cost
		(tokens)			(per 1M tokens)
qwen3-coder-plus Currently qwen3-coder-plus-2025-09-23	Stable	1,000,000	997,952	65,536	Pricing is tiered. See the note below the table.
qwen3-coder-plus-2025-09-23	Snapshot
qwen3-coder-plus-2025-07-22	Snapshot
qwen3-coder-flash Currently qwen3-coder-flash-2025-07-28	Stable
qwen3-coder-flash-2025-07-28	Snapshot

The models above use tiered pricing based on the number of input tokens in the current request.

qwen3-coder-plus series

qwen3-coder-plus, qwen3-coder-plus-2025-09-23, and qwen3-coder-plus-2025-07-22 are priced as follows. qwen3-coder-plus supports context cache. Input text that hits the implicit cache is billed at 20% of the unit price.

Input tokens per request	Input cost (per 1M tokens)	Output cost (per 1M tokens)
0<Token≤32K	$1	$5
32,000 < Tokens ≤ 128,000	$1.80	$9
128,000 < Tokens ≤ 256,000	$3	$15
256,000 < Tokens ≤ 1,000,000	$6	$60

qwen3-coder-flash series

qwen3-coder-flash and qwen3-coder-flash-2025-07-28 are priced as follows. qwen3-coder-flash supports context cache. Input text that hits the cache is billed at 20% of the unit price.

Input tokens per request	Input cost (per 1M tokens)	Output cost (per 1M tokens)
0 < Token ≤ 32K	$0.3	$1.5
32K < Tokens ≤ 128K	$0.5	$2.5
128K < Tokens ≤ 256K	$0.8	$4
256K < Tokens ≤ 1M	$1.6	$9.6

Mainland China

In Mainland China deployment mode, the endpoint and data storage are both located in the Beijing region. Model inference compute resources are limited to Mainland China.

Model	Version	Context window	Max input	Max output	Input cost	Output cost
		(tokens)			(per 1M tokens)
qwen3-coder-plus Currently qwen3-coder-plus-2025-09-23	Stable	1,000,000	997,952	65,536	Tiered pricing. See details below.
qwen3-coder-plus-2025-09-23	Snapshot
qwen3-coder-plus-2025-07-22	Snapshot
qwen3-coder-flash Currently qwen3-coder-flash-2025-07-28	Stable
qwen3-coder-flash-2025-07-28	Snapshot

The models above use tiered pricing based on the number of input tokens in the current request.

qwen3-coder-plus series

qwen3-coder-plus, qwen3-coder-plus-2025-09-23, and qwen3-coder-plus-2025-07-22 are priced as follows. qwen3-coder-plus supports context cache. Input text that hits the implicit cache is billed at 20% of the unit price, while input text that hits the explicit cache is billed at 10% of the unit price.

Input tokens per request	Input cost (per 1M tokens)	Output cost (per 1M tokens)
0<Token≤32K	$0.574	$2.294
32K < Tokens ≤ 128K	$0.861	$3.441
128K < Tokens ≤ 256K	$1.434	$5.735
256K < Tokens ≤ 1M	$2.868	$28.671

qwen3-coder-flash series

qwen3-coder-flash and qwen3-coder-flash-2025-07-28 are priced as follows. qwen3-coder-flash supports context cache. Input text that hits the implicit cache is billed at 20% of the unit price, while input text that hits the explicit cache is billed at 10% of the unit price.

Input tokens per request	Input cost (per 1M tokens)	Output cost (per 1M tokens)
0 < Token ≤ 32K	$0.144	$0.574
32 K < Tokens ≤ 128 K	$0.216	$0.861
128 K < Tokens ≤ 256 K	$0.359	$1.434
256 K < Tokens ≤ 1 M	$0.717	$3.584

More models

Model	Version	Context window	Max input	Max output	Input cost	Output cost
		(tokens)			(per 1M tokens)
qwen-coder-plus Same as qwen-coder-plus-2024-11-06	Stable	131,072	129,024	8,192	$0.502	$1.004
qwen-coder-plus-latest Same as the latest snapshot of qwen-coder-plus	Latest
qwen-coder-plus-2024-11-06 Also known as qwen-coder-plus-1106	Snapshot
qwen-coder-turbo Same as qwen-coder-turbo-2024-09-19	Stable	131,072	129,024	8,192	$0.287	$0.861
qwen-coder-turbo-latest Same as the latest snapshot of qwen-coder-turbo	Latest
qwen-coder-turbo-2024-09-19 Also known as qwen-coder-turbo-0919	Snapshot

Qwen-MT

Qwen-MT is a flagship Large Language Model (LLM) for translation, fully upgraded from Qwen 3. It supports translation between 92 languages, such as Chinese, English, Japanese, Korean, French, Spanish, German, Thai, Indonesian, Vietnamese, and Arabic. It features comprehensive upgrades in model performance and translation quality. The model offers more stable glossary customization, format retention, and domain-specific prompting, making translations more accurate and natural. Usage

International

In international deployment mode, the endpoint and data storage are both located in the Singapore region. Model inference computing resources are dynamically scheduled worldwide (excluding Mainland China).

Model	Context window	Max input	Max output	Input cost	Output cost	Free quota Rule description
	(tokens)			(per 1M tokens)
qwen-mt-plus Part of Qwen3-MT	16,384	8,192	8,192	$2.46	$7.37	1 million tokens Valid for 90 days after activating Model Studio
qwen-mt-flash Part of Qwen3-MT				$0.16	$0.49
qwen-mt-lite Part of Qwen3-MT				$0.12	$0.36
qwen-mt-turbo Part of Qwen3-MT				$0.16	$0.49

Global

In global deployment mode, the endpoint and data storage are both located in the US (Virginia) region. Model inference computing resources are dynamically scheduled worldwide.

Model	Context window	Max input	Max output	Input cost	Output cost
	(tokens)			(per 1M tokens)
qwen-mt-plus Part of Qwen3-MT	16,384	8,192	8,192	$2.46	$7.37
qwen-mt-flash Part of Qwen3-MT				$0.16	$0.49
qwen-mt-lite Part of Qwen3-MT				$0.12	$0.36

Mainland China

In Mainland China deployment mode, the endpoint and data storage are both located in the Beijing region. Model inference computing resources are limited to Mainland China.

Model	Context window	Max input	Max output	Input cost	Output cost
Model	(tokens)			(per 1M tokens)
qwen-mt-plus Belongs toQwen3-MT	16,384	8,192	8,192	$0.259	$0.775
qwen-mt-flash Belongs toQwen3-MT				$0.101	$0.280
qwen-mt-lite Belongs toQwen3-MT				$0.086	$0.229
qwen-mt-turbo Belongs toQwen3-MT				$0.101	$0.280

Qwen data mining model

The Qwen data mining model extracts structured information from documents for use in data annotation, content moderation, and other applications. Usage | API reference

Note

Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Model	Context window	Max input	Max output	Input cost	Output cost	Free quota
Model	(tokens)			(per 1M tokens)		Free quota
qwen-doc-turbo	262,144	253,952	32,768	$0.087	$0.144	No free quota

Qwen deep research model

The Qwen deep research model can break down complex problems, perform reasoning and analysis using web searches, and generate research reports.Usage | API reference

Note

Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Model	Context window	Max input	Max output	Input cost	Output cost
Model	(tokens)			(per 1K tokens)
qwen-deep-research	1,000,000	997,952	32,768	$0.007742	$0.023367

Text generation - Qwen - Open source

In the model names, xxb indicates the parameter size. For example, qwen2-72b-instruct indicates a parameter size of 72 billion (72B).
Model Studio supports invoking the open-source versions of Qwen. You do not need to deploy the models locally. For open-source versions, we recommend using the Qwen3 and Qwen2.5 models.

Qwen3

The qwen3-next-80b-a3b-thinking model, released in September 2025, supports only thinking mode. It improves instruction-following capabilities and delivers more concise summary responses than qwen3-235b-a22b-thinking-2507.

The qwen3-next-80b-a3b-instruct model, released in September 2025, supports only non-thinking mode. It enhances Chinese understanding, logical reasoning, and text generation capabilities compared to qwen3-235b-a22b-instruct-2507.

The qwen3-235b-a22b-thinking-2507 and qwen3-30b-a3b-thinking-2507 models, released in July 2025, support only thinking mode and are upgrades of qwen3-235b-a22b (thinking mode) and qwen3-30b-a3b (thinking mode), respectively.

The qwen3-235b-a22b-instruct-2507 and qwen3-30b-a3b-instruct-2507 models, released in July 2025, support only non-thinking mode and are upgrades of qwen3-235b-a22b (non-thinking mode) and qwen3-30b-a3b (non-thinking mode), respectively.

The Qwen3 models, released in April 2025, support both thinking and non-thinking modes. You can switch between modes using the enable_thinking parameter. Additionally, Qwen3 models deliver significant improvements in the following areas:

Reasoning ability: Significantly outperforms QwQ and similarly sized non-reasoning models on evaluations for math, code, and logical reasoning, achieving top-tier industry performance for a model of its size.
Human preference alignment: Features greatly enhanced capabilities for creative writing, role assumption, multi-turn conversation, and instruction following. Its general abilities significantly surpass those of similarly sized models.
Agent capabilities: Achieves industry-leading performance in both thinking and non-thinking modes and enables precise external tool invocation.

Multilingual support: Supports over 100 languages and dialects and provides notable improvements in multilingual translation, instruction understanding, and commonsense reasoning.

Supported languages

English

Simplified Chinese

Traditional Chinese

French

Spanish

Arabic is written in the Arabic script and is the official language of many Arab countries.

Russian is written in the Cyrillic script and is the official language of Russia and several other countries.

Portuguese uses the Latin alphabet and is the official language of Portugal, Brazil, and other Portuguese-speaking countries.

German is written in the Latin alphabet and is an official language in Germany, Austria, and other regions.

Italian uses the Latin alphabet and is an official language in Italy, San Marino, and parts of Switzerland.

Dutch uses the Latin alphabet and is an official language in the Netherlands, the Flemish Region of Belgium, and Suriname.

Danish, which uses the Latin alphabet, is the official language of Denmark.

Irish uses the Latin alphabet and is one of the official languages of Ireland.

Welsh uses the Latin alphabet and is one of the official languages of Wales.

Finnish, which uses the Latin alphabet, is an official language of Finland.

Icelandic, which uses the Latin alphabet, is the official language of Iceland.

Swedish, which uses the Latin alphabet, is the official language of Sweden.

Norwegian Nynorsk, which uses the Latin alphabet, is one of Norway's two official written standards, alongside Bokmål.

Norwegian Bokmål, which uses the Latin alphabet, is a major written standard for Norway.

Japanese is the official language of Japan and uses Japanese characters.

Korean is written in the Hangul script and is the official language of South Korea and North Korea.

Vietnamese, which uses the Latin alphabet, is the official language of Vietnam.

Thai, which uses the Thai alphabet, is the official language of Thailand.

Indonesian, which uses the Latin alphabet, is the official language of Indonesia.

Malay uses the Latin alphabet and is the primary language of Malaysia and surrounding regions.

Burmese, which uses the Burmese alphabet, is the official language of Myanmar.

Tagalog is one of the major languages of the Philippines and uses the Latin alphabet.

Khmer is written in the Khmer script and is the official language of Cambodia.

Lao is written in the Lao script and is the official language of Laos.

Hindi is one of the official languages of India and uses the Devanagari script.

Bengali is written in the Bengali script and is the official language of Bangladesh and the Indian state of West Bengal.

Urdu is written in the Arabic script and is one of the official languages of Pakistan. It is also spoken in India.

Nepali is written in the Devanagari script and is the official language of Nepal.

Hebrew is written in the Hebrew script and is the official language of Israel.

Turkish is written in the Latin alphabet and is the official language of Türkiye and Northern Cyprus.

Persian uses the Arabic script and is the official language in countries such as Iran and Tajikistan.

Polish, which uses the Latin alphabet, is the official language of Poland.

Ukrainian is written in the Cyrillic script and is the official language of Ukraine.

Czech, which uses the Latin alphabet, is the official language of the Czech Republic.

Romanian is written in the Latin alphabet and is the official language of Romania and Moldova.

Bulgarian, which uses the Cyrillic script, is the official language of Bulgaria.

Slovak, which uses the Latin alphabet, is the official language of Slovakia.

Hungarian uses the Latin alphabet and is the official language of Hungary.

Slovenian, which uses the Latin alphabet, is the official language of Slovenia.

Latvian, which uses the Latin alphabet, is the official language of Latvia.

Estonian, which uses the Latin alphabet, is the official language of Estonia.

Lithuanian, which uses the Latin alphabet, is the official language of Lithuania.

Belarusian is written in the Cyrillic script and is one of the official languages of Belarus.

Greek is written in the Greek alphabet and is the official language of Greece and Cyprus.

Croatian, which uses the Latin alphabet, is the official language of Croatia.

Macedonian is the official language of North Macedonia and is written in the Cyrillic script.

Maltese, which uses the Latin alphabet, is an official language in Malta.

Serbian, which uses the Cyrillic script, is the official language of Serbia.

Bosnian is one of the official languages of Bosnia and Herzegovina and uses the Latin alphabet.

Georgian is the official language of Georgia and is written in the Georgian script.

Armenian, which uses the Armenian alphabet, is the official language of Armenia.

Northern Azerbaijani uses the Latin alphabet and is the official language of Azerbaijan.

Kazakh, which uses the Cyrillic script, is the official language of Kazakhstan.

Northern Uzbek is written in the Latin alphabet and is the official language of Uzbekistan.

Tajik, which uses the Cyrillic script, is the official language of Tajikistan.

Swahili uses the Latin alphabet and is a lingua franca or official language in many East African countries.

Afrikaans uses the Latin alphabet and is spoken mainly in South Africa and Namibia.

Cantonese is written in Traditional Chinese characters and is a primary language in China's Guangdong Province, Hong Kong, and Macau.

Luxembourgish uses the Latin alphabet and is one of the official languages of Luxembourg. It is also spoken in parts of Germany.

Limburgish is written in the Latin alphabet and is spoken mainly in parts of the Netherlands, Belgium, and Germany.

Catalan uses the Latin alphabet and is spoken in Catalonia and other parts of Spain.

Galician uses the Latin alphabet and is spoken mainly in the Galicia region of Spain.

Asturian uses the Latin alphabet and is spoken mainly in the Asturias region of Spain.

Basque, which uses the Latin alphabet, is spoken mainly in the Basque Country of Spain and France. It is one of the official languages of the Basque Autonomous Community in Spain.

Occitan uses the Latin alphabet and is spoken mainly in southern France.

Venetian is spoken mainly in the Veneto region of Italy and uses the Latin alphabet.

Sardinian uses the Latin alphabet and is spoken mainly in Sardinia, Italy.

Sicilian is written in the Latin alphabet and is spoken mainly in Sicily, Italy.

Friulian uses the Latin alphabet and is spoken mainly in Friuli-Venezia Giulia, Italy.

Lombard is spoken mainly in the Lombardy region of Italy and uses the Latin alphabet.

Ligurian uses the Latin alphabet and is spoken mainly in the Liguria region of Italy.

Faroese is written in the Latin alphabet and is spoken mainly in the Faroe Islands, where it is one of the official languages.

Tosk Albanian, which uses the Latin alphabet, is the primary southern dialect of Albanian.

Silesian uses the Latin alphabet and is spoken mainly in Poland.

Bashkir uses the Cyrillic script and is spoken mainly in Bashkortostan, Russia.

Tatar uses the Cyrillic script and is spoken mainly in Tatarstan, Russia.

Mesopotamian Arabic is written in the Arabic script and is spoken mainly in Iraq.

Najdi Arabic uses the Arabic script and is spoken mainly in the Najd region of Saudi Arabia.

Egyptian Arabic is written in the Arabic script and is spoken mainly in Egypt.

Levantine Arabic uses the Arabic script and is spoken mainly in Syria and Lebanon.

Ta'izzi-Adeni Arabic, a Semitic language written in the Arabic script, is spoken mainly in Yemen and the Hadhramaut region of Saudi Arabia.

Dari uses the Arabic script and is one of the official languages of Afghanistan.

Tunisian Arabic is written in the Arabic script and is spoken mainly in Tunisia.

Moroccan Arabic is written in the Arabic script and is spoken mainly in Morocco.

Kabuverdianu is spoken mainly in Cape Verde and uses the Latin alphabet.

Tok Pisin is a primary lingua franca in Papua New Guinea and uses the Latin alphabet.

Eastern Yiddish is written in the Hebrew script and is used mainly in Jewish communities.

Sindhi is written in the Arabic script and is one of the official languages of Pakistan's Sindh province.

Sinhala is written in the Sinhala script and is one of the official languages of Sri Lanka.

Telugu is written in the Telugu script and is one of the official languages of the Indian states of Andhra Pradesh and Telangana.

Punjabi is written in the Gurmukhi script, spoken in the Indian state of Punjab, and is one of the official languages of India.

Tamil is written in the Tamil script and is one of the official languages of the Indian state of Tamil Nadu and Sri Lanka.

Gujarati is written in the Gujarati script and is an official language of the Indian state of Gujarat.

Malayalam is written in the Malayalam script and is one of the official languages of the Indian state of Kerala.

Marathi is written in the Devanagari script and is one of the official languages of the Indian state of Maharashtra.

Kannada is written in the Kannada script and is one of the official languages of the Indian state of Karnataka.

Magahi is written in the Devanagari script and is spoken mainly in the Indian state of Bihar.

Oriya uses the Urdu script and is one of the official languages of the Indian state of Odisha.

Awadhi is written in the Devanagari script and is spoken mainly in the Indian state of Uttar Pradesh.

Maithili is written in the Devanagari script. It is one of India's official languages and is spoken in the Indian state of Bihar and the Terai plains of Nepal.

Assamese uses the Bengali script and is one of the official languages of the Indian state of Assam.

Chhattisgarhi is written in the Devanagari script and is spoken mainly in the Indian state of Chhattisgarh.

Bhojpuri uses the Devanagari script and is spoken in parts of India and Nepal.

Minangkabau is written in the Latin alphabet and is spoken mainly on the island of Sumatra in Indonesia.

Balinese is written in the Latin alphabet and is spoken mainly on the island of Bali, Indonesia.

Javanese is widely spoken on the island of Java in Indonesia and is written using both the Latin alphabet and the Javanese script.

Banjar is written in the Latin alphabet and is spoken mainly on the island of Kalimantan in Indonesia.

Sundanese, spoken mainly in western Java, Indonesia, is written in the Latin alphabet but traditionally used the Sundanese script.

Cebuano uses the Latin alphabet and is spoken mainly in the Cebu region of the Philippines.

Pangasinan is written in the Latin alphabet and is spoken mainly in the Pangasinan province of the Philippines.

Iloko is spoken mainly in the Philippines and uses the Latin alphabet.

Waray is a language of the Philippines that uses the Latin alphabet.

Haitian Creole, which uses the Latin alphabet, is an official language of Haiti.

Papiamento uses the Latin alphabet and is spoken mainly in Caribbean regions such as Aruba and Curaçao.

Response formatting: Fixes issues found in previouss, such as incorrect Markdown rendering, response truncation, and incorrectly formatted boxed output.

Qwen3 open-source models released in April 2025 do not support non-streaming output in thinking mode.

If you enable thinking mode for Qwen3 open-source models and no reasoning process appears in the output, billing applies at the non-thinking mode rate.

Thinking | Non-thinking mode | Usage

International

In international deployment mode, the endpoint and data storage are both located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding Mainland China).

Model	Mode	Context window	Max input	Max CoT	Max output	Input cost	Output cost	Free quota (Note)
Model	Mode	(tokens)				(per 1M tokens)		Free quota (Note)
qwen3-next-80b-a3b-thinking	Thinking only	131,072	126,976	81,920	32,768	$0.15	$1.2	1 million tokens each Valid for 90 days after activating Model Studio
qwen3-next-80b-a3b-instruct	Non-thinking		129,024	-		$0.15	$1.2
qwen3-235b-a22b-thinking-2507	Thinking only		126,976	81,920		$0.23	$2.3
qwen3-235b-a22b-instruct-2507	Non-thinking		129,024	-		$0.23	$0.92
qwen3-30b-a3b-thinking-2507	Thinking only		126,976	81,920		$0.2	$2.4
qwen3-30b-a3b-instruct-2507	Non-thinking		129,024	-		$0.2	$0.8
qwen3-235b-a22b This model and the following models were released in April 2025.	Non-thinking		129,024	-	16,384	$0.7	$2.8
	Thinking		98,304	38,912		$0.7	$8.4
qwen3-32b	Non-thinking		129,024	-		$0.16	$0.64
qwen3-32b	Thinking		98,304	38,912		$0.16	$0.64
qwen3-30b-a3b	Non-thinking		129,024	-		$0.2	$0.8
qwen3-30b-a3b	Thinking		98,304	38,912		$0.2	$2.4
qwen3-14b	Non-thinking		129,024	-	8,192	$0.35	$1.4
qwen3-14b	Thinking		98,304	38,912		$0.35	$4.2
qwen3-8b	Non-thinking		129,024	-		$0.18	$0.7
qwen3-8b	Thinking		98,304	38,912		$0.18	$2.1
qwen3-4b	Non-thinking		129,024	-		$0.11	$0.42
qwen3-4b	Thinking		98,304	38,912			$1.26
qwen3-1.7b	Non-thinking	32,768	30,720	-			$0.42
qwen3-1.7b	Thinking		28,672	The sum of the values must not exceed 30,720.			$1.26
qwen3-0.6b	Non-thinking		30,720	-			$0.42
qwen3-0.6b	Thinking		28,672	The sum of the inputs cannot exceed 30,720.			$1.26

Global

In global deployment mode, the endpoint and data storage are both located in the US (Virginia) region. Model inference compute resources are dynamically scheduled worldwide.

Model	Mode	Context window	Max input	Max CoT	Max output	Input cost	Output cost	Free quota (Note)
Model	Mode	(tokens)				(per 1M Tokens)		Free quota (Note)
qwen3-next-80b-a3b-thinking	Thinking only	131,072	126,976	81,920	32,768	$0.15	$1.2	No free quota
qwen3-next-80b-a3b-instruct	Non-thinking		129,024	-		$0.15	$1.2
qwen3-235b-a22b-thinking-2507	Thinking only		126,976	81,920		$0.23	$2.3
qwen3-235b-a22b-instruct-2507	Non-thinking		129,024	-		$0.23	$0.92
qwen3-30b-a3b-thinking-2507	Thinking only		126,976	81,920		$0.2	$2.4
qwen3-30b-a3b-instruct-2507	Non-thinking		129,024	-		$0.2	$0.8
qwen3-235b-a22b	Non-thinking		129,024	-	16,384	$0.7	$2.8
qwen3-235b-a22b	Thinking		98,304	38,912		$0.7	$8.4
qwen3-32b	Non-thinking		129,024	-		$0.16	$0.64
qwen3-32b	Thinking		98,304	38,912		$0.16	$0.64
qwen3-30b-a3b	Non-thinking		129,024	-		$0.2	$0.8
qwen3-30b-a3b	Thinking		98,304	38,912		$0.2	$2.4
qwen3-14b	Non-thinking		129,024	-	8,192	$0.35	$1.4
qwen3-14b	Thinking		98,304	38,912		$0.35	$4.2
qwen3-8b	Non-thinking		129,024	-		$0.18	$0.7
qwen3-8b	Thinking		98,304	38,912		$0.18	$2.1

Mainland China

In Mainland China deployment mode, the endpoint and data storage are both located in the Beijing region. Model inference compute resources are limited to Mainland China.

Model	Mode	Context window	Max input	Max CoT	Max output	Input cost	Output cost	Free quota (Note)
Model	Mode	(tokens)				(per 1M tokens)		Free quota (Note)
qwen3-next-80b-a3b-thinking	Thinking only	131,072	126,976	81,920	32,768	$0.144	$1.434	No free quota
qwen3-next-80b-a3b-instruct	Thinking mode is unavailable.		129,024	-		$0.144	$0.574
qwen3-235b-a22b-thinking-2507	Thinking only		126,976	81,920		$0.287	$2.868
qwen3-235b-a22b-instruct-2507	Non-thinking		129,024	-		$0.287	$1.147
qwen3-30b-a3b-thinking-2507	Thinking only		126,976	81,920		$0.108	$1.076
qwen3-30b-a3b-instruct-2507	Non-thinking		129,024	-		$0.108	$0.431
qwen3-235b-a22b	Non-thinking		129,024	-	16,384	$0.287	$1.147
qwen3-235b-a22b	Thinking		98,304	38,912		$0.287	$2.868
qwen3-32b	Non-thinking		129,024	-		$0.287	$1.147
qwen3-32b	Thinking		98,304	38,912		$0.287	$2.868
qwen3-30b-a3b	Non-thinking		129,024	-		$0.108	$0.431
qwen3-30b-a3b	Thinking		98,304	38,912		$0.108	$1.076
qwen3-14b	Non-thinking		129,024	-	8,192	$0.144	$0.574
qwen3-14b	Thinking		98,304	38,912		$0.144	$1.434
qwen3-8b	Non-thinking		129,024	-		$0.072	$0.287
qwen3-8b	Thinking		98,304	38,912		$0.072	$0.717
qwen3-4b	Non-thinking		129,024	-		$0.044	$0.173
qwen3-4b	Thinking		98,304	38,912			$0.431
qwen3-1.7b	Non-thinking	32,768	30,720	-			$0.173
qwen3-1.7b	Thinking		28,672	The sum of the input values must not exceed 30,720.			$0.431
qwen3-0.6b	Non-thinking		30,720	-			$0.173
qwen3-0.6b	Thinking		28,672	The sum of the input must not exceed 30,720.			$0.431

QwQ - Open source

The QwQ reasoning model is trained on Qwen2.5-32B. Reinforcement learning has significantly improved its inference capabilities. Core metrics for math and code (AIME 24/25, LiveCodeBench) and some general metrics (IFEval, LiveBench) are comparable to the full-power version of DeepSeek-R1. All metrics significantly exceed those of DeepSeek-R1-Distill-Qwen-32B, which is also based on Qwen2.5-32B. Usage | API reference

Note

Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Model	Context window	Max input	Max chain-of-thought	Max response	Input cost	Output cost
Model	(tokens)				(per 1M tokens)
qwq-32b	131,072	98,304	32,768	8,192	$0.287	$0.861

QwQ-Preview

The qwq-32b-preview model is an experimental research model developed by the Qwen team in 2024. It focuses on enhancing AI reasoning capabilities, especially in math and programming. For more information about the limitations of the qwq-32b-preview model, see the QwQ official blog. Usage | API reference | Try it online

Note

Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Model	Context window	Max input	Max output	Input cost	Output cost
Model	(tokens)			(per 1M tokens)
qwq-32b-preview	32,768	30,720	16,384	$0.287	$0.861

Qwen2.5

Qwen2.5 is a series of Qwen large language models that includes base and instruction-tuned language models with parameter sizes ranging from 7 billion to 72 billion. Qwen2.5 includes the following improvements over Qwen2:

It is pre-trained on our latest large-scale dataset, which contains up to 18 trillion tokens.
Pre-training with specialized expert models has significantly increased the model's knowledge and greatly improved its coding and math capabilities.
It shows significant improvements in following instructions, generating long text (over 8K tokens), understanding structured data (such as tables), and generating structured output (especially JSON). It is also more resilient to diverse system prompts, which enhances the implementation of role-playing and conditional settings for chatbots.
It supports over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, and Arabic.

Usage | API reference | Try it online

International

In International deployment mode, the access point and data storage are located in the Singapore region, and inference computing resources are dynamically scheduled globally (excluding Mainland China).

Model	Context window	Max input	Max output	Input cost	Output cost	Free quota
	(tokens)			(per 1M tokens)
qwen2.5-14b-instruct-1m	1,008,192	1,000,000	8,192	$0.805	$3.22	1 million tokens each Valid for 90 days after you activate Model Studio.
qwen2.5-7b-instruct-1m				$0.368	$1.47
qwen2.5-72b-instruct	131,072	129,024		$1.4	$5.6
qwen2.5-32b-instruct				$0.7	$2.8
qwen2.5-14b-instruct				$0.35	$1.4
qwen2.5-7b-instruct				$0.175	$0.7

Mainland China

In Mainland China deployment mode, the access point and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Model	Context window	Max input	Max output	Input cost	Output cost
Model	(tokens)			(per 1M tokens)
qwen2.5-14b-instruct-1m	1,000,000	1,000,000	8,192	$0.144	$0.431
qwen2.5-7b-instruct-1m	1,000,000	1,000,000		$0.072	$0.144
qwen2.5-72b-instruct	131,072	129,024		$0.574	$1.721
qwen2.5-32b-instruct				$0.287	$0.861
qwen2.5-14b-instruct				$0.144	$0.431
qwen2.5-7b-instruct				$0.072	$0.144
qwen2.5-3b-instruct	32,768	30,720		$0.044	$0.130
qwen2.5-1.5b-instruct				Free for a limited time
qwen2.5-0.5b-instruct				Free for a limited time

QVQ

The qvq-72b-preview model is an experimental research model developed by the Qwen team. It focuses on enhancing visual reasoning capabilities, especially in mathematical reasoning. For more information about the limitations of the qvq-72b-preview model, see the QVQ official blog.Usage | API reference

To have the model output its thinking process before the final answer, you can use the commercial version of the QVQ model.

Note

Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Model

Context window

Max input

Max output

Input cost

Output cost

(tokens)

(per 1M tokens)

qvq-72b-preview

32,768

16,384

Max 16,384 tokens per image

16,384

$1.721

$5.161

Qwen-Omni

This is a new multimodal large model for understanding and generation, trained on Qwen2.5. It supports text, image, speech, and video inputs, and can generate text and speech simultaneously in a stream. Its multimodal content understanding speed is significantly improved.Usage | API reference

International

In International deployment mode, the access point and data storage are located in the Singapore region, and inference computing resources are dynamically scheduled globally (excluding Mainland China).

Model

Context window

Max input

Max output

Free quota

(Note)

(tokens)

qwen2.5-omni-7b

32,768

30,720

2,048

1 million tokens (regardless of modality)

Valid for 90 days after activating Model Studio.

After the free quota is used up, the following billing rules apply to inputs and outputs:

Input	Price (per 1M tokens)
Text	$0.10
Audio	$6.76
Image/Video	$0.28

Output

Price (per 1M tokens)

Text

$0.40 (if the input contains only text)

$0.84 (if the input contains images, audio, or video)

Text+Audio

$13.51 (for the audio component)

The text portion of the output is not billed.

Mainland China

In Mainland China deployment mode, the access point and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Model	Context window	Max input	Max output
Model	(tokens)
qwen2.5-omni-7b	32,768	30,720	2,048

The billing rules for inputs and outputs are as follows:

Input	Price (per 1M tokens)
Text	$0.087
Audio	$5.448
Image/Video	$0.287

Output

Price (per 1M tokens)

Text

$0.345 (for text-only input)

$0.861 (if the input includes images, audio, or video)

Text+Audio

$10.895 (for the audio portion)

The text portion of the output is not billed.

Qwen3-Omni-Captioner

Qwen3-Omni-Captioner is an open-source model based on Qwen3-Omni. Without any prompts, it automatically generates accurate and comprehensive descriptions for complex audio, such as speech, ambient sounds, music, and sound effects. It can identify speaker emotions, musical elements (such as style and instruments), and sensitive information, making it suitable for applications such as audio content analysis, security audits, intent recognition, and audio editing. Usage | API reference

International

In international deployment mode, the endpoint and data storage are both located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide, excluding Mainland China.

Model

Context window

Max input

Max output

Input cost

Output cost

Free quota

(Note)

(tokens)

(per 1M tokens)

qwen3-omni-30b-a3b-captioner

65,536

32,768

$3.81

$3.06

1 million tokens

Valid for 90 days after activating Model Studio

Mainland China

In Mainland China deployment mode, the endpoint and data storage are both located in the Beijing region. Model inference compute resources are limited to Mainland China.

Model

Context window

Max input

Max output

Input cost

Output cost

Free quota

(Note)

(tokens)

(per 1M tokens)

qwen3-omni-30b-a3b-captioner

65,536

32,768

$2.265

$1.821

No free quota.

Qwen-VL

This is the open-source version of Alibaba Cloud's Qwen-VL. Usage | API reference

Compared to Qwen2.5-VL, Qwen3-VL delivers significant improvements:

Agent interaction: It can operate computer or mobile interfaces, recognize GUI elements, understand their functions, and call tools to perform tasks, achieving top-tier performance in evaluations such as OS World.
Visual coding: It generates code from images or videos and supports creating HTML, CSS, and JavaScript code from design mockups, website screenshots, and similar inputs.
Spatial intelligence: It supports 2D and 3D positioning and accurately judges object orientation, perspective changes, and occlusion relationships.
Long video understanding: It supports understanding video content up to 20 minutes long and provides precise localization down to the second.
Deep thinking: It has deep thinking capabilities and excels at capturing fine details and analyzing cause-and-effect relationships, achieving top-tier performance in evaluations such as MathVista and MMMU.
OCR: Language support is expanded to 33 languages. The model delivers more stable performance in scenarios with complex lighting, blur, or tilted text. It also provides significantly improved accuracy for rare characters, ancient texts, and professional terminology.
Supported languages
The model supports the following 33 languages: Chinese, Japanese, Korean, Indonesian, Vietnamese, Thai, English, French, German, Russian, Portuguese, Spanish, Italian, Swedish, Danish, Czech, Norwegian, Dutch, Finnish, Turkish, Polish, Swahili, Romanian, Serbian, Greek, Kazakh, Uzbek, Cebuano, Arabic, Urdu, Persian, Hindi/Devanagari, and Hebrew.

International

In international deployment mode, the endpoint and data storage are both located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide, excluding Mainland China.

Model	Mode	Context window	Max input	Max CoT	Max output	Input cost	Output cost CoT + output	Free quota (Note)
Model	Mode	(tokens)				(per 1M tokens)		Free quota (Note)
qwen3-vl-235b-a22b-thinking	Thinking only		126,976	81,920		$0.4	$4	1 million tokens each Valid for 90 days after activating Model Studio
qwen3-vl-235b-a22b-instruct	Non-thinking		129,024	-		$0.4	$1.6
qwen3-vl-32b-thinking	Thinking only	131,072	126,976	81,920	32,768	$0.16	$0.64
qwen3-vl-32b-instruct	Non-thinking only		129,024	-		$0.16	$0.64
qwen3-vl-30b-a3b-thinking	Thinking only		126,976	81,920		$0.2	$2.4
qwen3-vl-30b-a3b-instruct	Non-thinking		129,024	-		$0.2	$0.8
qwen3-vl-8b-thinking	Thinking		126,976	81,920		$0.18	$2.1
qwen3-vl-8b-instruct	Non-thinking		129,024	-		$0.18	$0.7

More models

Model	Context window	Max input	Max output	Input cost	Output cost	Free quota (Note)
	(tokens)			(per 1M tokens)
qwen2.5-vl-72b-instruct	131,072	129,024 Max per image: 16,384	8,192	$2.8	$8.4	1 million tokens each Valid for 90 days after Model Studio activation
qwen2.5-vl-32b-instruct				$1.4	$4.2
qwen2.5-vl-7b-instruct				$0.35	$1.05
qwen2.5-vl-3b-instruct				$0.21	$0.63

Global

In global deployment mode, the endpoint and data storage are both located in the US (Virginia) region. Model inference compute resources are dynamically scheduled worldwide.

Model	Mode	Context window	Max input	Max CoT	Max output	Input cost	Output cost CoT + output
Model	Mode	(tokens)				(per 1M tokens)
qwen3-vl-235b-a22b-thinking	Thinking only		126,976	81,920		$0.4	$4
qwen3-vl-235b-a22b-instruct	Non-thinking only		129,024	-		$0.4	$1.6
qwen3-vl-32b-thinking	Thinking only	131,072	126,976	81,920	32,768	$0.16	$0.64
qwen3-vl-32b-instruct	Non-thinking only		129,024	-		$0.16	$0.64
qwen3-vl-30b-a3b-thinking	Thinking only		126,976	81,920		$0.2	$2.4
qwen3-vl-30b-a3b-instruct	Non-thinking only		129,024	-		$0.2	$0.8
qwen3-vl-8b-thinking	Thinking only		126,976	81,920		$0.18	$2.1
qwen3-vl-8b-instruct	Non-thinking only		129,024	-		$0.18	$0.7

Mainland China

In Mainland China deployment mode, the endpoint and data storage are both located in the Beijing region. Model inference compute resources are limited to Mainland China.

Model	Mode	Context window	Max input	Max CoT	Max output	Input cost	Output cost CoT + output	Free quota (Note)
Model	Mode	(tokens)				(per 1M tokens)		Free quota (Note)
qwen3-vl-235b-a22b-thinking	Thinking only	131,072	126,976	81,920		$0.287	$2.867	No free quota
qwen3-vl-235b-a22b-instruct	Non-thinking only	131,072	129,024	-		$0.287	$1.147
qwen3-vl-32b-thinking	Thinking only	131,072	126,976	81,920	32,768	$0.287	$2.868
qwen3-vl-32b-instruct	Non-thinking only		129,024	-		$0.287	$1.147
qwen3-vl-30b-a3b-thinking	Thinking only		126,976	81,920		$0.108	$1.076
qwen3-vl-30b-a3b-instruct	Non-thinking only		129,024	-		$0.108	$0.431
qwen3-vl-8b-thinking	Thinking only		126,976	81,920		$0.072	$0.717
qwen3-vl-8b-instruct	Non-thinking only		129,024	-		$0.072	$0.287

More models

Model	Context window	Max input	Max output	Input cost	Output cost	Free quota (Note)
Model	(tokens)			(per 1M tokens)		Free quota (Note)
qwen2.5-vl-72b-instruct	131,072	129,024 Max per image: 16,384	8,192	$2.294	$6.881	No free quota
qwen2.5-vl-32b-instruct				$1.147	$3.441
qwen2.5-vl-7b-instruct				$0.287	$0.717
qwen2.5-vl-3b-instruct				$0.173	$0.517
qwen2-vl-72b-instruct	32,768	30,720 Max per image: 16,384	2,048	$2.294	$6.881

Qwen-Math

This is a language model built on the Qwen model that is specialized for solving mathematical problems. Qwen2.5-Math supports Chinese and English and integrates multiple reasoning methods, such as Chain of Thought (CoT), Program of Thought (PoT), and Tool-Integrated Reasoning (TIR). Usage | API reference | Try it online

Note

Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Model	Context	Max input	Max output	Input cost	Output cost
	(tokens)			(per 1M tokens)
qwen2.5-math-72b-instruct	4,096	3,072	3,072	$0.574	$1.721
qwen2.5-math-7b-instruct				$0.144	$0.287
qwen2.5-math-1.5b-instruct				Free for a limited time

Qwen-Coder

Qwen-Coder is an open-source code model from the Qwen series. The latest Qwen3-Coder series has powerful coding agent capabilities. It excels at tool calling, environment interaction, and autonomous programming. The model combines excellent coding skills with general-purpose capabilities. Usage | API reference

International

In international deployment mode, the endpoint and data storage are both located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide, excluding Mainland China.

Model	Context window	Max input	Max output	Input cost	Output cost	Free quota (Note)
	(Number of tokens)
qwen3-coder-480b-a35b-instruct	1 million tokens each Valid for 90 days after activating Model Studio	Tiered pricing. See the note below the table.		65,536	204,800	262,144
qwen3-coder-30b-a3b-instruct

The above models use tiered pricing based on the number of input tokens in the current request.

Model	Input tokens per request	Input cost (per 1M tokens)	Output cost (per 1M tokens)
qwen3-coder-480b-a35b-instruct	0 < tokens ≤ 32K	$1.5	$7.5
	32K < tokens ≤ 128K	$2.7	$13.5
	128K < tokens ≤ 200K	$4.5	$22.5
qwen3-coder-30b-a3b-instruct	0 < tokens ≤ 32K	$0.45	$2.25
	32K < tokens ≤ 128K	$0.75	$3.75
	128K < tokens ≤ 200K	$1.2	$6

Global

In global deployment mode, the endpoint and data storage are both located in the US (Virginia) region. Model inference compute resources are dynamically scheduled worldwide.

Model	Context window	Max input	Max output	Input cost	Output cost
	(tokens)			(per 1M tokens)
qwen3-coder-480b-a35b-instruct	262,144	204,800	65,536	Pricing is tiered. See the note below the table.
qwen3-coder-30b-a3b-instruct

qwen3-coder-480b-a35b-instruct and qwen3-coder-30b-a3b-instruct use tiered pricing based on the number of input tokens in the current request.

Model	Input tokens per request	Input cost (per 1M tokens)	Output cost (per 1M tokens)
qwen3-coder-480b-a35b-instruct	0 < Tokens ≤ 32K	$1.50	$7.50
	32K < Tokens ≤ 128K	$2.70	$13.50
	128K < Tokens ≤ 200K	$4.50	$22.50
qwen3-coder-30b-a3b-instruct	0 < Tokens ≤ 32K	$0.45	$2.25
	32K < Tokens ≤ 128K	$0.75	$3.75
	128K < Tokens ≤ 200K	$1.2	$6

Mainland China

In Mainland China deployment mode, the endpoint and data storage are both located in the Beijing region. Model inference compute resources are limited to Mainland China.

Model	Context window	Max input	Max output	Input cost	Output cost
	(tokens)			(per 1M tokens)
qwen3-coder-480b-a35b-instruct	Tiered pricing. See the description below the table.		65,536	204,800	262,144
qwen3-coder-30b-a3b-instruct

The above models use tiered pricing based on the number of input tokens in the current request.

Model	Input tokens per request	Cost per 1M input tokens	Cost per 1M output tokens
qwen3-coder-480b-a35b-instruct	0 < Tokens ≤ 32K	$0.861	$3.441
	32K < Tokens ≤ 128K	$1.291	$5.161
	128K < Tokens ≤ 200K	$2.151	$8.602
qwen3-coder-30b-a3b-instruct	0 < Tokens ≤ 32K	$0.216	$0.861
	32K < Tokens ≤ 128K	$0.323	$1.291
	128K < Tokens ≤ 200K	$0.538	$2.151

More models

Model	Context window	Max input	Max output	Input cost	Output cost
	(tokens)			(per 1M tokens)
qwen2.5-coder-32b-instruct	131,072	129,024	8,192	$0.287	$0.861
qwen2.5-coder-14b-instruct
qwen2.5-coder-7b-instruct				$0.144	$0.287
qwen2.5-coder-3b-instruct	32,768	30,720		Limited-time free trial
qwen2.5-coder-1.5b-instruct
qwen2.5-coder-0.5b-instruct

Text generation - Third-party

DeepSeek

DeepSeek is a large language model from DeepSeek AI. API reference | Try it online

Note

Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Model	Context window	Max input	Max chain-of-thought	Max response	Input cost	Output cost
	(tokens)				(per 1M tokens)
deepseek-v3.2 685B full-power version Context cache discounts	131,072	98,304	32,768	65,536	$0.287	$0.431
deepseek-v3.2-exp 685B full-power version
deepseek-v3.1 685B full-power version					$0.574	$1.721
deepseek-r1 685B full-power version Batch half price				16,384		$2.294
deepseek-r1-0528 685B full-power version
deepseek-v3 671B full-power version Batch half price		131,072	N/A		$0.287	$1.147
deepseek-r1-distill-qwen-1.5b Based on Qwen2.5-Math-1.5B	32,768	32,768	16,384	16,384	Free trial for a limited time
deepseek-r1-distill-qwen-7b Based on Qwen2.5-Math-7B					$0.072	$0.144
deepseek-r1-distill-qwen-14b Based on Qwen2.5-14B					$0.144	$0.431
deepseek-r1-distill-qwen-32b Based on Qwen2.5-32B					$0.287	$0.861
deepseek-r1-distill-llama-8b Based on Llama-3.1-8B					Free trial for a limited time
deepseek-r1-distill-llama-70b Based on Llama-3.3-70B

Kimi

Kimi-K2 is a large language model launched by Moonshot AI. It has excellent coding and tool-calling capabilities. Usage | Try it online

Note

Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Model	Mode	Context window	Max input	Max CoT	Max response	Input cost	Output cost
Model	Mode	(tokens)				(per 1M tokens)
kimi-k2.5	Thinking mode	262,144	258,048	32,768	32,768	$0.574	$3.011
kimi-k2.5	Non-thinking mode	262,144	260,096	-	32,768	$0.574	$3.011
kimi-k2-thinking	Thinking mode	262,144	229,376	32,768	16,384	$0.574	$2.294
Moonshot-Kimi-K2-Instruct	Non-thinking mode	131,072	131,072	-	8,192	$0.574	$2.294

GLM

The GLM series models are hybrid reasoning models from Zhipu AI that are designed for agents and support two modes: thinking and non-thinking. GLM

Note

Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Model	Context window	Max input	Max chain-of-thought	Max response	Input cost	Output cost
	(tokens)				(per 1M tokens)
glm-4.7	202,752	169,984	32,768	16,384	Tiered pricing, see the table below.
glm-4.6

The above models use tiered pricing based on input tokens per request.

Model	Input tokens per request	Input cost (per 1M tokens)	Output cost (per 1M tokens)
glm-4.7	0<Token<=32K	$0.431	$2.007
glm-4.7	32K<Token<=166K	$0.574	$2.294
glm-4.6	0<Token<=32K	$0.431	$2.007
glm-4.6	32K<Token<=166K	$0.574	$2.294

The models are not integrated third-party services, but deployed on Model Studio servers.

GLM models have the same prices under both thinking and non-thinking modes.

Image generation

Qwen-Image

The Qwen text-to-image model excels at rendering complex text, especially in Chinese and English. API reference

International

In International deployment mode, the access point and data storage are located in the Singapore region, and inference computing resources are dynamically scheduled globally (excluding Mainland China).

Model	Unit price	Free quota
Model	Unit price	Free quota
qwen-image-max Currently has the same capabilities as qwen-image-max-2025-12-30	$0.075/image	Free quota: 100 images for each model Valid for 90 days after activating Model Studio
qwen-image-max-2025-12-30	$0.075/image
qwen-image-plus Currently has the same capabilities as qwen-image	$0.03/image
qwen-image-plus-2026-01-09	$0.03/image
qwen-image	$0.035/image

Mainland China

In Mainland China deployment mode, the access point and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Model	Unit price	Free quota
Model	Unit price	Free quota
qwen-image-max Currently has the same capabilities as qwen-image-max-2025-12-30	$0.071677/image	No free quota
qwen-image-max-2025-12-30	$0.071677/image
qwen-image-plus Currently has the same capabilities as qwen-image	$0.028671/image
qwen-image-plus-2026-01-09	$0.028671/image
qwen-image	$0.035/image

Input prompt

Output image

Healing-style hand-drawn poster featuring three puppies playing with a ball on lush green grass, adorned with decorative elements such as birds and stars. The main title “Come Play Ball!” is prominently displayed at the top in bold, blue cartoon font. Below it, the subtitle “Come [Show Off Your Skills]!” appears in green font. A speech bubble adds playful charm with the text: “Hehe, watch me amaze my little friends next!” At the bottom, supplementary text reads: “We get to play ball with our friends again!” The color palette centers on fresh greens and blues, accented with bright pink and yellow tones to highlight a cheerful, childlike atmosphere.

Qwen-Image-Edit

The Qwen image editing model supports precise text editing in Chinese and English. It also supports operations such as color adjustment, detail enhancement, style transfer, adding or removing objects, and changing positions and actions. These features enable complex editing of images and text. API reference

International

In International deployment mode, the access point and data storage are located in the Singapore region, and inference computing resources are dynamically scheduled globally (excluding Mainland China).

Model	Unit price	Free quota
Model	Unit price	Free quota
qwen-image-edit-max Currently has the same capabilities as qwen-image-edit-max-2026-01-16	$0.075/image	Free quota: 100 images for each model Valid for 90 days after activating Model Studio
qwen-image-edit-max-2026-01-16	$0.075/image
qwen-image-edit-plus Currently has the same capabilities as qwen-image-edit-plus-2025-10-30	$0.03/image
qwen-image-edit-plus-2025-12-15	$0.03/image
qwen-image-edit-plus-2025-10-30	$0.03/image
qwen-image-edit	$0.045/image

Mainland China

In Mainland China deployment mode, the access point and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Model	Unit price	Free quota
Model	Unit price	Free quota
qwen-image-edit-max Currently has the same capabilities as qwen-image-edit-max-2026-01-16	$0.071677/image	No free quota
qwen-image-edit-max-2026-01-16	$0.071677/image
qwen-image-edit-plus Currently has the same capabilities as qwen-image-edit-plus-2025-10-30	$0.028671/image
qwen-image-edit-plus-2025-12-15	$0.028671/image
qwen-image-edit-plus-2025-10-30	$0.028671/image
qwen-image-edit	$0.043/image

dog_and_girl (1)

Original image

狗修改图

Make the person bend over and hold the dog's front paw.

Original image

Change the text on the letter blocks from 'HEALTH INSURANCE' to 'Tomorrow will be better'.

Original image

5out

Change the dotted shirt to a light blue shirt.

Original image

6out

Change the background to Antarctica.

Original image

7out

Create a cartoon-style profile picture of the person.

Original image

Remove the hair from the dinner plate.

Qwen-MT-Image

The Qwen image translation model supports translating text from images in 11 languages into Chinese or English. It accurately preserves the original layout and content information and provides custom features such as term definition, sensitive word filtering, and image entity detection. API reference

Note

Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Model	Unit price	Free quota

qwen-mt-image	$0.000431/image	No free quota

Original image

Japanese

Portuguese

Arabic

Tongyi - text-to-Image - Z-Image

Tongyi - text-to-image - Z-Image is a lightweight model that quickly generates high-quality images. The model supports Chinese and English text rendering, complex semantic understanding, various styles, and multiple resolutions and aspect ratios. API reference

International

In International deployment mode, the access point and data storage are located in the Singapore region, and inference computing resources are dynamically scheduled globally (excluding Mainland China).

Model

Unit price

Free quota (Note)

_{Valid for 90 days after activating Model Studio}

z-image-turbo

Prompt extension disabled (prompt_extend=false): $0.015/image

Prompt extension enabled (prompt_extend=true): $0.03/image

100 images

Mainland China

In Mainland China deployment mode, the access point and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Model

Unit price

Free quota

z-image-turbo

Prompt extension disabled (prompt_extend=false): $0.01434/image

Prompt extension enabled (prompt_extend=true): $0.02868/image

No free quota

Input prompt

Output image

Photo of a stylish young woman with short black hair standing confidently in front of a vibrant cartoon-style mural wall. She wears an all-black outfit: a puffed bomber jacket with a ruffled collar, cargo shorts, fishnet tights, and chunky black Doc Martens, with a gold chain dangling from her waist. The background features four colorful comic-style panels: one reads “GRAND STAGE” and includes sneakers and a Gatorade bottle; another displays green Nike sneakers and a slice of pizza; the third reads “HARAJUKU st” with floating shoes; and the fourth shows a blue mouse riding a skateboard with the text “Takeshita WELCOME.” Dominant bright colors include yellow, teal, orange, pink, and green. Speech bubbles, halftone patterns, and playful characters enhance the urban street-art aesthetic. Daylight evenly illuminates the scene, and the ground beneath her feet is white tiled pavement. Full-body portrait, centered composition, slightly tilted stance, direct eye contact with the camera. High detail, sharp focus, dynamic framing.

b16c8008-83c1-4c80-ae22-786a2299bec3-1-转换自-png

Wan text-to-image

The Wan text-to-image model generates high-quality images from text. API reference | Try it online

Global

In Global deployment mode, the access point and data storage are located in the US (Virginia) region, and inference computing resources are dynamically scheduled globally.

Model

Description

Unit price

Free quota (Note)

_{Valid for 90 days after activating Model Studio}

wan2.6-t2i Recommended

Wan 2.6. Supports new synchronous interfaces and lets you freely select dimensions within the constraints of total pixel area and aspect ratio.

$0.03/image

No free quota

International

In International deployment mode, the access point and data storage are located in the Singapore region, and inference computing resources are dynamically scheduled globally (excluding Mainland China).

Model	Description	Unit price	Free quota (Note) _{Valid for 90 days after activating Model Studio}
wan2.6-t2i `Recommended`	Wan 2.6. Supports new synchronous interfaces and lets you freely select dimensions within the constraints of total pixel area and aspect ratio.	$0.03/image	50 images
wan2.5-t2i-preview `Recommended`	Wan 2.5 preview. Removes single-side length limits and lets you freely select dimensions within the constraints of total pixel area and aspect ratio.	$0.03/image	50 images
wan2.2-t2i-plus	Wan 2.2 Professional Edition. Fully upgraded in creativity, stability, and realistic texture.	$0.05/image	100 images
wan2.2-t2i-flash	Wan 2.2 Flash Edition. Fully upgraded in creativity, stability, and realistic texture.	$0.025/image	100 images
wan2.1-t2i-plus	Wan 2.1 Professional Edition. Supports multiple styles and generates images with rich details.	$0.05/image	200 images
wan2.1-t2i-turbo	Wan 2.1 Turbo Edition. Supports multiple styles and offers fast generation speed.	$0.025/image	200 images

Mainland China

In Mainland China deployment mode, the access point and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Model	Description	Unit price	Free quota (Note) _{Valid for 90 days after activating Model Studio}
wan2.6-t2i `Recommended`	Wan 2.6. Supports new synchronous interfaces and lets you freely select dimensions within the constraints of total pixel area and aspect ratio.	$0.028671/image	No free quota
wan2.5-t2i-preview `Recommended`	Wan 2.5 preview. Removes single-side length limits and lets you freely select dimensions within the constraints of total pixel area and aspect ratio.	$0.028671/image	No free quota
wan2.2-t2i-plus	Wan 2.2 Professional Edition. Fully upgraded in creativity, stability, and realistic texture.	$0.02007/image	No free quota
wan2.2-t2i-flash	Wan 2.2 Flash Edition. Fully upgraded in creativity, stability, and realistic texture.	$0.028671/image	No free quota
wanx2.1-t2i-plus	Wan 2.1 Professional Edition. Supports multiple styles and generates images with rich details.	$0.028671/image	No free quota
wanx2.1-t2i-turbo	Wan 2.1 Turbo Edition. Supports multiple styles and offers fast generation speed.	$0.020070/image	No free quota
wanx2.0-t2i-turbo	Wan 2.0 Turbo Edition. Excels at textured portraits and creative designs. It is cost-effective.	$0.005735/image	No free quota

Input prompt	Output image
A needle-felted Santa Claus holding a gift and a white cat standing next to him against a background of colorful gifts and green plants, creating a cute, warm, and cozy scene.

Wan2.6 image generation and editing

The Wan2.6 image generation model supports image editing and can generate outputs that contain both text and images to meet various generation and integration requirements. API reference.

Global

In Global deployment mode, the access point and data storage are in the US (Virginia) region, and inference computing resources are dynamically scheduled globally.

Model	Unit price	Free quota
wan2.6-image	$0.03/image	No free quota

International

In International deployment mode, the access point and data storage are in the Singapore region, and inference computing resources are dynamically scheduled globally (excluding Mainland China).

Model

Unit price

Free quota (Note)

_{Valid for 90 days after activating Model Studio}

wan2.6-image

$0.03/image

50 images

Mainland China

In Mainland China deployment mode, the access point and data storage are in the Beijing region, and inference computing resources are limited to Mainland China.

Model	Unit price	Free quota
wan2.6-image	$0.028671/image	No free quota

Wan general image editing 2.5

The Wan2.5 general image editing model supports entity-consistent image editing and multi-image fusion. It accepts text, a single image, or multiple images as input. API reference.

International

In International deployment mode, the access point and data storage are located in the Singapore region, and inference computing resources are dynamically scheduled globally (excluding Mainland China).

Model

Unit price

Free quota (Note)

_{Valid for 90 days after activating Model Studio}

wan2.5-i2i-preview

$0.03/image

50 units

Mainland China

In Mainland China deployment mode, the access point and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Model	Unit price	Free quota
wan2.5-i2i-preview	$0.028671/image	No free quota

Feature

Input example

Output image

Single-image editing

damotest2023_Portrait_photography_outdoors_fashionable_beauty_409ae3c1-19e8-4515-8e50-b3c9072e1282_2-转换自-png

a26b226d-f044-4e95-a41c-d1c0d301c30b-转换自-png

Change the floral dress to a vintage-style lace long dress with exquisite embroidery details on the collar and cuffs.

Multi-image fusion

p1028883

Place the alarm clock from Image 1 next to the vase on the dining table in Image 2.

Wan general image editing 2.1

The Wan2.1 general image editing model performs diverse image editing with simple instructions. It is suitable for scenarios such as outpainting, watermark removal, style transfer, image restoration, and image enhancement. Usage | API reference

Note

Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Model	Unit Price	Free Quota
wanx2.1-imageedit	$0.020070 per image	No free quota

The general image editing model currently supports the following features:

Model Features	Input image	Input prompt	Output image
Global stylization		French picture book style.
Local stylization		Change the house to a wooden plank style.
Instruction-based editing		Change the girl's hair to red.
Inpainting	Input image Masked image (The white area is the mask)	A ceramic rabbit holding a ceramic flower.	Output image
Text watermark removal		Remove the text from the image.
Outpainting		A green fairy.
Image super-resolution	Blurry image	Image super-resolution.	Clear image
Image colorization		Blue background, yellow leaves.
Line art to image		A living room in a minimalist Nordic style.
Placeholder Image		A cartoon character cautiously peeks out, spying on a brilliant blue gem inside the room.

OutfitAnyone

Compared to the basic version, the OutfitAnyone-Plus model offers improvements in image definition, clothing texture details, and logo restoration. However, it takes longer to generate images and is suitable for scenarios that are not time-sensitive. API reference | Try it online
OutfitAnyone-Image Parsing supports parsing model and clothing images, which can be used for pre-processing and post-processing of OutfitAnyone images. API reference

Note

Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Model	Description	Sample input	Sample output
aitryon-plus	OutfitAnyone-Plus
aitryon-parsing-v1	OutfitAnyone image parsing

OutfitAnyone pricing

Service	Model	Unit price	Discount	Tier
OutfitAnyone - Plus	aitryon-plus	$0.071677/image	None	None
OutfitAnyone - Image parsing	aitryon-parsing-v1	$0.000574/image	None	None

Video generation - Wan

Text-to-video

The Wan text-to-video model generates videos from a single sentence. The videos feature rich artistic styles and cinematic quality. API reference | Try it online

Global

In Global deployment mode, the access point and data storage are located in the US (Virginia) region, and inference computing resources are dynamically scheduled globally.

Model

Description

Unit price

Free quota

wan2.6-t2v Recommended

Wan 2.6. Introduces multi-shot narrative capability and supports automatic voiceover and custom audio file input.

720P: $0.1/second

1080P: $0.15/second

No free quota

International

In International deployment mode, the access point and data storage are located in the Singapore region, and inference computing resources are dynamically scheduled globally (excluding Mainland China).

Model	Description	Unit price	Free quota (Claim) _{Valid for 90 days after activating Model Studio}
wan2.6-t2v `Recommended`	Wan 2.6. Introduces multi-shot narrative capability and supports automatic voiceover and custom audio file input.	720P: $0.10/second 1080P: $0.15/second	50 seconds
wan2.5-t2v-preview `Recommended`	Wan 2.5 preview. Supports automatic voiceover and custom audio file input.	480P: $0.05/second 720P: $0.10/second 1080P: $0.15/second	50 seconds
wan2.2-t2v-plus	Wan 2.2 Professional Edition. Significantly improved image detail and motion stability.	480P: $0.02/second 1080P: $0.10/second	50 seconds
wan2.1-t2v-turbo	Wan 2.1 Turbo Edition. Fast generation speed and balanced performance.	$0.036/second	200 seconds
wan2.1-t2v-plus	Wan 2.1 Professional Edition. Generates rich details and higher-quality visuals.	$0.10/second	200 seconds

US

In US deployment mode, the access point and data storage are located in the US (Virginia) region, and inference computing resources are limited to the US.

Model

Description

Unit price

Free quota

wan2.6-t2v-us Recommended

Wan 2.6. Introduces multi-shot narrative capability and supports automatic voiceover and custom audio file input.

720P: $0.1/second

1080P: $0.15/second

No free quota

Mainland China

In Mainland China deployment mode, the access point and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Model	Description	Unit price	Free quota
wan2.6-t2v`Recommended`	Wan 2.6. Introduces multi-shot narrative capability and supports automatic voiceover and custom audio file input.	720P: $0.086012/second 1080p: 0.143353 per second	No free quota
wan2.5-t2v-preview`Recommended`	Wan 2.5 preview. Supports automatic voiceover and custom audio file input.	480P: $0.043006/second 720P: $0.086012/second 1080P: $0.143353/second	No free quota
wan2.2-t2v-plus	Wan 2.2 Professional Edition. Significantly improved image detail and motion stability.	480P: $0.02007/second 1080P: $0.100347/second	No free quota
wanx2.1-t2v-turbo	Faster generation speed and balanced performance.	$0.034405/second	No free quota
wanx2.1-t2v-plus	Generates richer details and higher-quality visuals.	$0.100347/second	No free quota

Input prompt

Output video (wan2.6, multi-shot video)

Shot from a low angle, in a medium close-up, with warm tones, mixed lighting (the practical light from the desk lamp blends with the overcast light from the window), side lighting, and a central composition. In a classic detective office, wooden bookshelves are filled with old case files and ashtrays. A green desk lamp illuminates a case file spread out in the center of the desk. A fox, wearing a dark brown trench coat and a light gray fedora, sits in a leather chair, its fur crimson, its tail resting lightly on the edge, its fingers slowly turning yellowed pages. Outside, a steady drizzle falls beneath a blue sky, streaking the glass with meandering streaks. It slowly raises its head, its ears twitching slightly, its amber eyes gazing directly at the camera, its mouth clearly moving as it speaks in a smooth, cynical voice: 'The case was cold, colder than a fish in winter. But every chicken has its secrets, and I, for one, intended to find them '.

Image-to-video - first frame

The Wan image-to-video model uses an input image as the first frame of a video. It then generates the rest of the video based on a prompt. The videos feature rich artistic styles and cinematic quality. API reference | Try it online

Global

In Global deployment mode, the access point and data storage are located in the US (Virginia) region, and inference computing resources are dynamically scheduled globally.

Model

Description

Unit price

Free quota

wan2.6-i2v Recommended

Wan 2.6. Introduces multi-shot narrative capability and supports automatic voiceover and custom audio file input.

720P: $0.1/second

1080P: $0.15/second

No free quota

International

In International deployment mode, the access point and data storage are located in the Singapore region, and inference computing resources are dynamically scheduled globally (excluding Mainland China).

Model	Description	Unit price	Free quota (Note) _{Valid for 90 days after activating Model Studio}
wan2.6-i2v-flash `Recommended`	Wan 2.6. Introduces multi-shot narrative capability and supports automatic voiceover and custom audio file input.	Output video with audio `audio=true`: 720P: $0.05/second 1080P: $0.075/second Output video without audio `audio=false`: 720P: $0.025/second 1080P: $0.0375/second	50 seconds
wan2.6-i2v `Recommended`	Wan 2.6. Introduces multi-shot narrative capability and supports automatic voiceover and custom audio file input.	720P: $0.10/second 1080P: $0.15/second	50 seconds
wan2.5-i2v-preview `Recommended`	Wan 2.5 preview. Supports automatic dubbing and custom audio file uploads.	480P: $0.05/second 720P: $0.10/second 1080P: $0.15/second	50 seconds
wan2.2-i2v-flash	Wan 2.2 Flash Edition. Extremely fast generation speed with significant improvements in visual detail and motion stability.	480P: $0.015/second 720P: $0.036/second	50 seconds
wan2.2-i2v-plus	Wan 2.2 Professional Edition. Delivers significant improvements in visual detail and motion stability.	480P: $0.02/second 1080P: $0.10/second	50 seconds
wan2.1-i2v-turbo	Wan 2.1 Turbo Edition. Fast generation speed with balanced performance.	$0.036/second	200 seconds
wan2.1-i2v-plus	Wan 2.1 Professional Edition. Generates rich details and produces higher-quality, more textured visuals.	$0.10/second	200 seconds

US

In US deployment mode, the access point and data storage are located in the US (Virginia) region, and inference computing resources are limited to the US.

Model

Description

Unit price

Free quota

wan2.6-i2v-us Recommended

Wan 2.6. Introduces multi-shot narrative capability and supports automatic voiceover and custom audio file input.

720P: $0.1/second

1080P: $0.15/second

No free quota

Mainland China

In Mainland China deployment mode, the access point and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Model	Description	Unit price	Free quota
wan2.6-i2v-flash `Recommended`	Wan 2.6. Introduces multi-shot narrative capability and supports automatic voiceover and custom audio file input.	Output video with audio `audio=true`: 720P: $0.043006/second 1080P: $0.071676/second Output video without audio `audio=false`: 720P: $0.021503/second 1080P: $0.035838/second	No free quota
wan2.6-i2v `Recommended`	Wan 2.6. Introduces multi-shot narrative capability and supports automatic voiceover and custom audio file input.	720P: $0.086012/second 1080P: $0.143353/second	No free quota
wan2.5-i2v-preview	Wan 2.5 preview. Supports automatic dubbing and custom audio file uploads.	480P: $0.043006/second 720P: $0.086012/second 1080P: $0.143353/second	No free quota
wan2.2-i2v-plus	Wan 2.2 Professional Edition. Delivers significant improvements in visual detail and motion stability.	480P: $0.02007/second 1080P: $0.100347/second	No free quota
wanx2.1-i2v-turbo	Wan 2.1 Turbo Edition. Fast generation speed with balanced performance.	$0.034405/second	No free quota
wanx2.1-i2v-plus	Wan 2.1 Professional Edition. Generates rich details and produces higher-quality, more textured visuals.	$0.100347/second	No free quota

Input first frame image and audio

Output video (wan2.6, multi-shot video)

rap-转换自-png

Input audio:

Input prompt: A scene of urban fantasy art. A dynamic graffiti art character. A boy made of spray paint comes to life from a concrete wall. He raps an English song at high speed while striking a classic, energetic rapper pose. The scene is set under an urban railway bridge at night. The lighting comes from a single street lamp, creating a cinematic atmosphere full of high energy and amazing detail. The audio of the video consists entirely of his rap, with no other dialogue or noise.

Image-to-video - first and last frames

The Wan first-and-last-frame video model generates a smooth, dynamic video from a prompt. You only need to provide the first and last frame images. The videos feature rich artistic styles and cinematic quality. API reference | Try it online

International

In International deployment mode, the access point and data storage are located in the Singapore region, and inference computing resources are dynamically scheduled globally (excluding Mainland China).

Model

Unit price

Free quota (Note)

_{Valid for 90 days after activating Model Studio}

wan2.2-kf2v-flash

480P: $0.015/second

720P: $0.036/second

1080P: $0.07/second

50 seconds

wan2.1-kf2v-plus

$0.10/second

200 seconds

Mainland China

In Mainland China deployment mode, the access point and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Model

Unit price

Free quota (Note)

wan2.2-kf2v-flash

480P: $0.014335/second

720P: $0.028671/second

1080P: $0.068809/second

No free quota

wanx2.1-kf2v-plus

$0.100347/second

No free quota

Example input			Output video
First frame	Last frame	Prompt	Output video
		In a realistic style, the camera starts at eye level on a small black cat looking up at the sky, then gradually moves upward to a top-down shot that focuses on the cat's curious eyes.

Reference-to-video

The Wan reference-to-video model uses a character's appearance and voice from an input video and a prompt to generate a new video that maintains character consistency. API reference

Billing rule: Both input and output videos are billed by the second. Failed jobs are not billed and do not consume the free quota.

The billable duration of the input video does not exceed 5 seconds. For more information, see Wan - reference-to-video.
The billable duration of the output video is the duration in seconds of the successfully generated video.

Global

In Global deployment mode, the access point and data storage are located in the US (Virginia) region, and inference computing resources are dynamically scheduled globally.

Model

Output video type

Input & output price

Free quota (Note)

wan2.6-r2v

Video with audio

720P: $0.1/second

1080P: $0.15/second

No free quota

International

In International deployment mode, the access point and data storage are located in the Singapore region, and inference computing resources are dynamically scheduled globally (excluding Mainland China).

Model	Output video type	Input & output price	Free quota (Note)
wan2.6-r2v-flash `Recommended`	Video with audio `audio=true`	720P: $0.05/second 1080P: $0.075/second	50 seconds Valid for 90 days after activating Model Studio
wan2.6-r2v-flash `Recommended`	Video without audio `audio=false`	720P: $0.025/second 1080P: $0.0375/second	50 seconds Valid for 90 days after activating Model Studio
wan2.6-r2v	Video with audio	720P: $0.10/second 1080P: $0.15/second	50 seconds Valid for 90 days after activating Model Studio

Mainland China

In Mainland China deployment mode, the access point and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Model	Output video type	Input & output price	Free quota (Note)
wan2.6-r2v-flash `Recommended`	Video with audio `audio=true`	720P: $0.043006/second 1080P: $0.071676/second	No free quota
wan2.6-r2v-flash `Recommended`	Video without audio `audio=false`	720P: $0.021503/second 1080P: $0.035838/second	No free quota
wan2.6-r2v	Video with audio	720P: $0.086012/second 1080P: $0.143353/second	No free quota

General video editing

The Wan general video editing model supports multimodal inputs, including text, images, and videos. It can perform video generation and general editing tasks. API reference | Try it online

International

In International deployment mode, the access point and data storage are located in the Singapore region, and inference computing resources are dynamically scheduled globally (excluding Mainland China).

Model

Unit price

Free quota (Note)

wan2.1-vace-plus

$0.1/second

50 seconds

Valid for 90 days after activating Model Studio

Mainland China

In Mainland China deployment mode, the access point and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Model	Unit price	Free quota (Note)
wanx2.1-vace-plus	$0.100347/second	No free quota

The general video editing model supports the following features:

Feature	Input reference image	Input prompt	Output video
Multi-image reference	Reference image 1 (for entity) Reference image 2 (for background)	In the video, a girl gracefully walks out from the depths of an ancient, misty forest. Her steps are light, and the camera captures her every nimble movement. When the girl stops and looks around at the lush woods, she breaks into a smile of surprise and joy. This moment is captured in the interplay of light and shadow, recording the wonderful encounter between the girl and nature.	Output video
Video restyling		The video shows a black steampunk-style car driven by a gentleman, adorned with gears and copper pipes. The background is a steam-powered candy factory with retro elements, creating a vintage and playful scene.
Local editing	Input video Input mask image (The white area indicates the editing region)	The video shows a Parisian-style French cafe where a lion in a suit is elegantly sipping coffee. It holds a coffee cup in one hand, drinking with a look of contentment. The cafe is tastefully decorated, with soft tones and warm lighting illuminating the area where the lion is.	The content in the editing region is modified based on the prompt
Video extension	Input initial video segment (1 second)	A dog wearing sunglasses skateboards on a street, 3D cartoon.	Output extended video (5 seconds)
Video outpainting		An elegant lady is passionately playing the violin, with a full symphony orchestra behind her.

Wan - digital human

This feature generates natural-looking videos of people speaking, singing, or performing, based on a single character image and an audio file. To use this feature, you can call the following models in sequence. wan2.2-s2v image detection | wan2.2-s2v video generation

Note

Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Model	Description	Unit price

wan2.2-s2v-detect	Checks if an input image meets requirements, such as sufficient definition, a single person, and a frontal view.	$0.000574/image
wan2.2-s2v	Generates a dynamic video of a person from a valid image and an audio clip.	480p: $0.071677/second 720p: $0.129018/second

Sample input

Output video

p1001125-转换自-jpeg

Input audio:

Wan - animate image

Available in standard and professional modes. The model transfers the actions and expressions from a reference video to a character image, generating a video that animates the character from the image. API reference.

International

In International deployment mode, the access point and data storage are located in the Singapore region, and inference computing resources are dynamically scheduled globally (excluding Mainland China).

Model	Service	Description	Unit price	Free quota (View)
wan2.2-animate-move	Standard mode `wan-std`	A cost-effective service with fast generation speeds. Suitable for basic needs, such as simple animation demos.	$0.12/second	The total time for both patterns is 50 seconds.
wan2.2-animate-move	Professional mode `wan-pro`	Delivers high animation smoothness and natural transitions for actions and expressions. The output resembles a live-action video.	$0.18/second	The total time for both patterns is 50 seconds.

Mainland China

In Mainland China deployment mode, the access point and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Model	Service	Description	Unit price	Free quota (View)
wan2.2-animate-move	Standard mode `wan-std`	Fast generation. Ideal for basic needs, such as simple animation demos. Cost-effective.	$0.06/second	No free quota
wan2.2-animate-move	Professional mode `wan-pro`	Provides high-quality, smooth animation with natural transitions for actions and expressions. The output is similar to a live-action video.	$0.09/second	No free quota

Character image	Reference video	Standard video	Output Video (Professional Mode)

Wan - video character swap

Available in standard and professional modes. The model replaces the main character in a video with a character from an image. It preserves the original video's scene, lighting, and hue. API reference.

International

In International deployment mode, the access point and data storage are located in the Singapore region, and inference computing resources are dynamically scheduled globally (excluding Mainland China).

Model	Service	Description	Unit price	Free quota (View)
wan2.2-animate-mix	Standard mode `wan-std`	Generates animations quickly. Ideal for basic requirements, such as simple demos. Highly cost-effective.	$0.18/s	The combined duration of both services is 50 seconds.
wan2.2-animate-mix	Professional mode `wan-pro`	Produces highly smooth animations with natural transitions for actions and expressions. The result closely resembles a live-action video.	$0.26/s	The combined duration of both services is 50 seconds.

Mainland China

In Mainland China deployment mode, the access point and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Model	Service	Description	Unit price	Free quota (View)
wan2.2-animate-mix	Standard mode `wan-std`	Generates animations quickly. Ideal for basic requirements, such as simple demos. Highly cost-effective.	$0.09/s	No free quota
wan2.2-animate-mix	Professional mode `wan-pro`	Produces highly smooth animations with natural transitions for actions and expressions. The result closely resembles a live-action video.	$0.13/s	No free quota

Character image	Reference video	Standard output video	Professional output video

AnimateAnyone

This feature generates character motion videos based on a character image and a motion template. To use this feature, you can call the following three models in sequence. AnimateAnyone image detection API details | AnimateAnyone motion template generation | AnimateAnyone video generation API details

Note

Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Model	Description	Unit price
animate-anyone-detect-gen2	Detects whether an input image meets the requirements.	$0.000574/image
animate-anyone-template-gen2	Extracts character motion from a video and generates a motion template.	$0.011469/second
animate-anyone-gen2	Generates a character action video from a character image and an action template.	$0.011469/second

Input: Character image	Input: Motion video	Outputs (generated from the image background)	Outputs Generated by Video Background

Note

The preceding example was generated by the Tongyi App, which integrates AnimateAnyone.
The content generated by the AnimateAnyone model is video only and does not include audio.

EMO

This feature generates dynamic portrait videos based on a portrait image and a human voice audio file. To use this feature, you can call the following models in sequence. EMO image detection | EMO video generation

Note

Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Model	Description	Unit price
emo-detect-v1	Detects whether an input image meets the required specifications. This model can be called directly without deployment.	$0.000574/image
emo-v1	Generates a dynamic portrait video. This model can be called directly without deployment.	1:1 aspect ratio video: $0.011469/second 3:4 aspect ratio video: $0.022937/second

Input: Portrait image and human voice audio file

Output: Dynamic portrait video

Portrait:

上春山

Human voice audio: See the video on the right.

Character video:

Style level: active ("style_level": "active")

LivePortrait

This model quickly and efficiently generates dynamic portrait videos based on a portrait image and a human voice audio file. Compared to the EMO model, it generates videos faster and at a lower cost, but the quality is not as good. To use this feature, you can call the following two models in sequence. LivePortrait image detection | LivePortrait video generation

Note

Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Model	Description	Unit price
liveportrait-detect	Detects whether an input image meets the requirements.	$0.000574/image
liveportrait	Generates a dynamic portrait video.	$0.002868/second

Input: Portrait image and voice audio

Output: Animated portrait video

Portrait image:

Emoji男孩

Voice audio: Sourced from the video on the right.

Portrait video:

Emoji

This feature generates dynamic face videos based on a face image and preset facial motion templates. This capability can be used for scenarios such as creating emojis and generating video materials. To use this feature, you can call the following models in sequence. Emoji image detection | Emoji video generation

Note

Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Model	Description	Unit price
emoji-detect-v1	Detects whether an input image meets specified requirements.	$0.000574/image
emoji-v1	Generates custom emojis based on a portrait image and a specified emoji template.	$0.011469/second

Input: Portrait image	Output: Dynamic portrait video
	Parameter for the "Happy" emoji template: ("input.driven_id": "mengwa_kaixin")

VideoRetalk

This feature generates a video where the character's lip movements match the input audio, based on a character video and a human voice audio file. To use this feature, you can call the following model. API reference

Note

Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Model	Description	Unit price
videoretalk	Synchronizes a character's lip movements with input audio to generate a new video.	$0.011469/second

Video style transform

This model generates videos in different styles that match the semantic description of user-input text, or restyles a user-input video. API reference

Note

Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Model	Description	Unit price

video-style-transform	Transforms an input video into styles such as Japanese comic and American comic.	720P	$0.071677/second
		540P	$0.028671/second

Input video	Output video (Manga style)

Speech synthesis (text-to-speech)

Qwen speech synthesis

Supports mixed-language text input and streaming audio output. Usage | API reference

International

In the international deployment mode, the endpoints and data storage are located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding Mainland China).

Qwen3-TTS-Instruct-Flash

Model

Version

Unit price

Max input characters

Free quota (Note)

qwen3-tts-instruct-flash

Currently, qwen3-tts-instruct-flash-2026-01-26.

Stable

$0.115/10,000 characters

600

10,000 characters

Valid for 90 days after activating Model Studio

qwen3-tts-instruct-flash-2026-01-26

Snapshot

Supported languages: Chinese (Mandarin), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese
Character calculation rules: Billing is based on the number of input characters. The rules are as follows:
- One Chinese character (including simplified/traditional Chinese, Japanese Kanji, and Korean Hanja) = 2 characters
- Other characters, such as an English letter, a punctuation mark, or a space = 1 character

Qwen3-TTS-VD

Model

Version

Unit price

Max input characters

Free quota (Note)

qwen3-tts-vd-2026-01-26

Snapshot

$0.115 per 10,000 characters

600

10,000 characters

Valid for 90 days after activating Model Studio

Supported languages: Chinese (Mandarin), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese
Character calculation rules: Billing is based on the number of input characters. The rules are as follows:
- One Chinese character (including simplified/traditional Chinese, Japanese Kanji, and Korean Hanja) = 2 characters
- Other characters, such as an English letter, a punctuation mark, or a space = 1 character

Qwen3-TTS-VC

Model

Version

Unit price

Max input characters

Free quota (Note)

qwen3-tts-vc-2026-01-22

Snapshot

$0.115/10,000 characters

600

10,000 characters

Valid for 90 days after activating Model Studio.

Supported languages: Chinese (Mandarin), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese
Character calculation rules: Billing is based on the number of input characters. The rules are as follows:
- One Chinese character (including simplified/traditional Chinese, Japanese Kanji, and Korean Hanja) = 2 characters
- Other characters, such as an English letter, a punctuation mark, or a space = 1 character

Qwen3-TTS-Flash

Model	Version	Unit price	Max input characters	Free quota (Note)
qwen3-tts-flash Currently, qwen3-tts-flash-2025-11-27.	Stable	$0.10 per 10,000 characters	600	10,000 characters Valid for 90 days after activating Model Studio
qwen3-tts-flash-2025-11-27	Snapshot
qwen3-tts-flash-2025-09-18	Snapshot			If you activate Alibaba Cloud Model Studio before 00:00 on November 13, 2025: 2,000 characters If you activate Alibaba Cloud Model Studio after 00:00 on November 13, 2025: 10,000 characters Valid for 90 days after activating Model Studio.

Supported languages: Chinese (Mandarin, Beijing, Shanghai, Sichuan, Nanjing, Shaanxi, Minnan, Tianjin, Cantonese), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese
Character calculation rules: Billing is based on the number of input characters. The rules are as follows:
- One Chinese character (including simplified/traditional Chinese, Japanese Kanji, and Korean Hanja) = 2 characters
- Other characters, such as an English letter, a punctuation mark, or a space = 1 character

Mainland China

In the Mainland China deployment mode, the endpoints and data storage are located in the Beijing region. Model inference compute resources are limited to Mainland China.

Qwen3-TTS-Instruct-Flash

Model

Version

Unit price

Max input characters

Free quota (Note)

qwen3-tts-instruct-flash

Currently, qwen3-tts-instruct-flash-2026-01-26.

Stable

$0.115/10,000 characters

600

No free quota is available.

qwen3-tts-instruct-flash-2026-01-26

Snapshot

Supported languages: Chinese (Mandarin), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese
Character calculation rules: Billing is based on the number of input characters. The rules are as follows:
- One Chinese character (including simplified/traditional Chinese, Japanese Kanji, and Korean Hanja) = 2 characters
- Other characters, such as an English letter, a punctuation mark, or a space = 1 character

Qwen3-TTS-VD

Model	Version	Unit price	Max input characters	Free quota (Note)
qwen3-tts-vd-2026-01-26	Snapshot	$0.115/10,000 characters	600	No free quota is available.

Supported languages: Chinese (Mandarin), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese
Character calculation rules: Billing is based on the number of input characters. The rules are as follows:
- One Chinese character (including simplified/traditional Chinese, Japanese Kanji, and Korean Hanja) = 2 characters
- Other characters, such as an English letter, a punctuation mark, or a space = 1 character

Qwen3-TTS-VC

Model	Version	Unit price	Max input characters	Free quota (Note)
qwen3-tts-vc-2026-01-22	Snapshot	$0.115/10,000 characters	600	No free quota is available.

Supported languages: Chinese (Mandarin), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese
Character calculation rules: Billing is based on the number of input characters. The rules are as follows:
- One Chinese character (including simplified/traditional Chinese, Japanese Kanji, and Korean Hanja) = 2 characters
- Other characters, such as an English letter, a punctuation mark, or a space = 1 character

Qwen3-TTS-Flash

Model	Version	Unit price	Max input characters	Free quota (Note)
qwen3-tts-flash Currently, qwen3-tts-flash-2025-11-27.	Stable	$0.114682 per 10,000 characters	600	No free quota is available.
qwen3-tts-flash-2025-11-27	Snapshot
qwen3-tts-flash-2025-09-18	Snapshot

Supported languages: Chinese (Mandarin, Beijing, Shanghai, Sichuan, Nanjing, Shaanxi, Minnan, Tianjin, Cantonese), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese
Character calculation rules: Billing is based on the number of input characters. The rules are as follows:
- One Chinese character (including simplified/traditional Chinese, Japanese Kanji, and Korean Hanja) = 2 characters
- Other characters, such as an English letter, a punctuation mark, or a space = 1 character

Qwen-TTS

Model	Version	Context window	Max input	Max output	Input cost	Output cost	Free quota (Note)
		(tokens)			(Per 1,000 tokens)
qwen-tts Provides the same capabilities as qwen-tts-2025-04-10.	Stable	8,192	512	7,680	$0.230	$1.434	No free quota is available.
qwen-tts-latest Provides the same capabilities as the latest snapshot version.	Latest
qwen-tts-2025-05-22	Snapshot
qwen-tts-2025-04-10

Audio-to-token conversion rule: Each second of audio corresponds to 50 tokens. Audio shorter than 1 second is calculated as 50 tokens.

Qwen real-time speech synthesis

Supports streaming text input and streaming audio output. It can automatically adjust the speech rate based on the text content and punctuation. Usage | API reference

Qwen3-TTS-Instruct-Flash-Realtime supports Qwen real-time speech synthesis and can only use the default voice. It does not support cloned or designed voices.

Qwen3-TTS-VD-Realtime supports using voices from Voice Design (Qwen) for real-time speech synthesis, but does not support the default voice.

Qwen3-TTS-VC-Realtime supports using voices from Voice Cloning (Qwen) for real-time speech synthesis, but does not support the default voice.

Qwen3-TTS-Flash-Realtime and Qwen-TTS-Realtime can only use the default voice. They do not support cloned or designed voices.

International

In the international deployment mode, the endpoints and data storage are located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding Mainland China).

Qwen3-TTS-Instruct-Flash-Realtime

Model

Version

Unit price

Free quota (Note)

qwen3-tts-instruct-flash-realtime

Currently, qwen3-tts-instruct-flash-realtime-2026-01-22.

Stable

$0.143/10,000 characters

10,000 characters

Valid for 90 days after activating Model Studio.

qwen3-tts-instruct-flash-realtime-2026-01-22

Snapshot

Supported languages: Chinese (Mandarin), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese
Character calculation rules: Billing is based on the number of input characters. The rules are as follows:
- One Chinese character (including simplified/traditional Chinese, Japanese Kanji, and Korean Hanja) = 2 characters
- Other characters, such as an English letter, a punctuation mark, or a space = 1 character

Qwen3-TTS-VD-Realtime

Model

Version

Unit price

Free quota (Note)

qwen3-tts-vd-realtime-2026-01-15

Snapshot

$0.143353 per 10,000 characters

10,000 characters

Valid for 90 days after activating Model Studio

qwen3-tts-vd-realtime-2025-12-16

Snapshot

Supported languages: Chinese (Mandarin), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese
Character calculation rules: Billing is based on the number of input characters. The rules are as follows:
- One Chinese character (including simplified/traditional Chinese, Japanese Kanji, and Korean Hanja) = 2 characters
- Other characters, such as an English letter, a punctuation mark, or a space = 1 character

Qwen3-TTS-VC-Realtime

Model

Version

Unit price

Free quota(Note)

qwen3-tts-vc-realtime-2026-01-15

Snapshot

$0.13/10,000 characters

10,000 characters

Valid for 90 days after activating Model Studio.

qwen3-tts-vc-realtime-2025-11-27

Snapshot

Supported languages: Chinese (Mandarin), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese
Character calculation rules: Billing is based on the number of input characters. The rules are as follows:
- One Chinese character (including simplified/traditional Chinese, Japanese Kanji, and Korean Hanja) = 2 characters
- Other characters, such as an English letter, a punctuation mark, or a space = 1 character

Qwen3-TTS-Flash-Realtime

Model	Version	Unit price	Free quota (Note)
qwen3-tts-flash-realtime Currently, qwen3-tts-flash-realtime-2025-11-27.	Stable	$0.13 per 10,000 characters	10,000 characters Valid for 90 days after activating Model Studio
qwen3-tts-flash-realtime-2025-11-27	Snapshot
qwen3-tts-flash-realtime-2025-09-18	Snapshot		If you activate Alibaba Cloud Model Studio before 00:00 on November 13, 2025: 2,000 characters If you activate Alibaba Cloud Model Studio after 00:00 on November 13, 2025: 10,000 characters Valid for 90 days after activating Model Studio

Supported languages: Chinese (Mandarin, Beijing, Shanghai, Sichuan, Nanjing, Shaanxi, Minnan, Tianjin, Cantonese), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese
Character calculation rules: Billing is based on the number of input characters. The rules are as follows:
- One Chinese character (including simplified/traditional Chinese, Japanese Kanji, and Korean Hanja) = 2 characters
- Other characters, such as an English letter, a punctuation mark, or a space = 1 character

Mainland China

In the Mainland China deployment mode, the endpoints and data storage are located in the Beijing region. Model inference compute resources are limited to Mainland China.

Qwen3-TTS-Instruct-Flash-Realtime

Model

Version

Unit price

Free quota (Note)

qwen3-tts-instruct-flash-realtime

Current capabilities match qwen3-tts-instruct-flash-realtime-2026-01-22.

Stable version

$0.143 per 10,000 characters

No free quota

qwen3-tts-instruct-flash-realtime-2026-01-22

Snapshot version

Supported languages: Chinese (Mandarin), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese
Character calculation rules: Billing is based on the number of input characters. The rules are as follows:
- One Chinese character (including simplified/traditional Chinese, Japanese Kanji, and Korean Hanja) = 2 characters
- Other characters, such as an English letter, a punctuation mark, or a space = 1 character

Qwen3-TTS-VD-Realtime

Model	Version	Unit price	Free quota (Note)
qwen3-tts-vd-realtime-2026-01-15	Snapshot	$0.143353 per 10,000 characters	No free quota
qwen3-tts-vd-realtime-2025-12-16	Snapshot	$0.143353 per 10,000 characters	No free quota

Supported languages: Chinese (Mandarin), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese
Character calculation rules: Billing is based on the number of input characters. The rules are as follows:
- One Chinese character (including simplified/traditional Chinese, Japanese Kanji, and Korean Hanja) = 2 characters
- Other characters, such as an English letter, a punctuation mark, or a space = 1 character

Qwen3-TTS-VC-Realtime

Model	Version	Unit price	Free quota (Note)
qwen3-tts-vc-realtime-2026-01-15	Snapshot	$0.143353 per 10,000 characters	No free quota is available.
qwen3-tts-vc-realtime-2025-11-27	Snapshot	$0.143353 per 10,000 characters	No free quota is available.

Supported languages: Chinese (Mandarin), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese
Character calculation rules: Billing is based on the number of input characters. The rules are as follows:
- One Chinese character (including simplified/traditional Chinese, Japanese Kanji, and Korean Hanja) = 2 characters
- Other characters, such as an English letter, a punctuation mark, or a space = 1 character

Qwen3-TTS-Flash-Realtime

Model	Version	Unit price	Free quota (Note)
qwen3-tts-flash-realtime Currently, qwen3-tts-flash-realtime-2025-11-27.	Stable	$0.143353 per 10,000 characters	No free quota is available.
qwen3-tts-flash-realtime-2025-11-27	Snapshot
qwen3-tts-flash-realtime-2025-09-18	Snapshot

Supported languages: Chinese (Mandarin, Beijing, Shanghai, Sichuan, Nanjing, Shaanxi, Minnan, Tianjin, Cantonese), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese
Character calculation rules: Billing is based on the number of input characters. The rules are as follows:
- One Chinese character (including simplified/traditional Chinese, Japanese Kanji, and Korean Hanja) = 2 characters
- Other characters, such as an English letter, a punctuation mark, or a space = 1 character

Qwen-TTS-Realtime

Model	Version	Context window	Max input	Max output	Input cost	Output cost	Supported languages	Free quota (Note)
		(tokens)			(Per 1,000 tokens)
qwen-tts-realtime Currently, qwen-tts-realtime-2025-07-15.	Stable	8,192	512	7,680	$0.345	$1.721	Chinese, English	No free quota is available.
qwen-tts-realtime-latest Currently, qwen-tts-realtime-2025-07-15.	Latest						Chinese, English
qwen-tts-realtime-2025-07-15	Snapshot						Chinese, English

Audio-to-token conversion rule: Each second of audio corresponds to 50 tokens. Audio shorter than 1 second is calculated as 50 tokens.

Qwen voice cloning

Voice cloning uses a large model for feature extraction, allowing you to clone voices without training. Provide 10 to 20 seconds of audio to generate a highly similar and natural-sounding custom voice. Usage | API reference

International

In the international deployment mode, the endpoints and data storage are located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding Mainland China).

Model

Unit price

Free quota (Note)

qwen-voice-enrollment

$0.01 per voice

1,000 voices

Valid for 90 days after activating Model Studio.

Mainland China

In the Mainland China deployment mode, the endpoints and data storage are located in the Beijing region. Model inference compute resources are limited to Mainland China.

Model	Unit price	Free quota (Note)
qwen-voice-enrollment	$0.01 per sound	No free quota is available.

Qwen voice design

Voice design generates custom voices from text descriptions. It supports multi-language and multi-dimensional voice feature definitions, making it suitable for applications such as ad dubbing, character creation, and audio content production. Usage | API reference

International

In the international deployment mode, the endpoints and data storage are located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding Mainland China).

Model

Unit price

Free quota (Note)

qwen-voice-design

$0.2 per voice

10 voices

Valid for 90 days after activating Model Studio.

Mainland China

In the Mainland China deployment mode, the endpoints and data storage are located in the Beijing region. Model inference compute resources are limited to Mainland China.

Model	Unit price	Free quota (Note)
qwen-voice-design	$0.20 per voice	No free quota is available.

CosyVoice speech synthesis

CosyVoice is a next-generation generative speech synthesis large language model (LLM) from Alibaba Cloud. It deeply integrates text understanding and speech generation based on a large-scale pre-trained language model and supports real-time streaming text-to-speech synthesis. Usage | API reference

International

In the international deployment mode, the endpoints and data storage are located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide (excluding Mainland China).

Model

Unit price

Free quota (Note)

cosyvoice-v3-plus

$0.26/10,000 characters

10,000 characters

Valid for 90 days after activating Model Studio.

cosyvoice-v3-flash

$0.13/10,000 characters

Character calculation rules: Chinese characters (including simplified/traditional Chinese, Japanese Kanji, and Korean Hanja) are counted as 2 characters. All other characters (such as letters, numbers, and Japanese/Korean syllabaries) are counted as 1 character. SSML tag content is not billed.

Mainland China

In the Mainland China deployment mode, the endpoints and data storage are located in the Beijing region. Model inference compute resources are limited to Mainland China.

Model	Unit price	Free quota (Note)
cosyvoice-v3-plus	$0.286706/10,000 characters	No free quota
cosyvoice-v3-flash	$0.14335/10,000 characters
cosyvoice-v2	$0.286706/10,000 characters

Character calculation rules: Chinese characters (including simplified/traditional Chinese, Japanese Kanji, and Korean Hanja) are counted as 2 characters. All other characters (such as letters, numbers, and Japanese/Korean syllabaries) are counted as 1 character. SSML tag content is not billed.

Speech recognition (speech-to-text) and translation (speech-to-translation)

Qwen3-LiveTranslate-Flash

Qwen3-LiveTranslate-Flash is an audio and video translation model based on the Qwen3-Omni architecture. It supports translation between 18 languages, including Chinese, English, Russian, and French. The model can use visual context to improve translation accuracy and outputs both text and speech. Usage | API reference

International

In the international deployment mode, the endpoints and data storage are located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide, excluding Mainland China.

Model	Version	Context window	Max input	Max output	Free quota (Note)
		(tokens)
qwen3-livetranslate-flash Currently, qwen3-livetranslate-flash-2025-12-01.	Stable	53,248	49,152	4,096	1 million tokens each Valid for 90 days after activating Model Studio
qwen3-livetranslate-flash-2025-12-01	Snapshot

The billing rules for input and output are as follows:

Input

Unit price (per 1M tokens)

Audio

$1.577

Video

The audio portion is billed separately.

$0.631

Output	Unit price (per 1M tokens)
Audio	$6.308
Text	$1.577

Mainland China

In the Mainland China deployment mode, the endpoints and data storage are located in the Beijing region. Model inference compute resources are limited to Mainland China.

Model	Version	Context window	Max input	Max output	Free quota (Note)
		(tokens)
qwen3-livetranslate-flash Currently, qwen3-livetranslate-flash-2025-12-01.	Stable	53,248	49,152	4,096	No free quota is available.
qwen3-livetranslate-flash-2025-12-01	Snapshot

The billing rules for input and output are as follows:

Input

Unit price (per 1M tokens)

Audio

$1.434

Video

The audio portion is billed separately.

$0.573

Output	Unit price (per 1M tokens)
Audio	$5.734
Text	$1.434

Qwen3-LiveTranslate-Flash-Realtime

Qwen3-LiveTranslate-Flash-Realtime is a multilingual, real-time audio and video translation model. It can recognize 18 languages and translate them into audio in 10 languages in real time.

Core features:

Multi-language support: Supports 18 languages, such as Chinese, English, French, German, Russian, Japanese, and Korean, and 6 Chinese dialects, including Mandarin, Cantonese, and Sichuanese.
Visual enhancement: Uses visual content to improve translation accuracy. The model analyzes lip movements, actions, and on-screen text to improve translation in noisy environments or for words with multiple meanings.
Low latency: Achieves simultaneous interpretation latency as low as 3 seconds.
High-quality simultaneous interpretation: Addresses cross-language word order issues using semantic unit prediction technology. The real-time translation quality is comparable to offline translation results.
Natural voice: Generates natural-sounding, human-like speech. The model adapts its tone and emotion based on the source speech content.

Usage | API reference

International

In the international deployment mode, the endpoints and data storage are located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide, excluding Mainland China.

Model	Version	Context window	Max input	Max output	Free quota (Note)
		(tokens)
qwen3-livetranslate-flash-realtime Currently, qwen3-livetranslate-flash-realtime-2025-09-22.	Stable	53,248	49,152	4,096	1 million tokens Valid for 90 days after activating Model Studio.
qwen3-livetranslate-flash-realtime-2025-09-22	Snapshot

After the free quota is used up, the billing rules for input and output are as follows:

Input	Unit price (per 1M tokens)
Audio	$10
Image	$1.3

Output	Unit price (per 1M tokens)
Text	$10
Audio	$38

Token calculation rules:

Audio: Each second of audio input or output consumes 12.5 tokens.
Image: Each 28×28 pixel input consumes 0.5 tokens.

Mainland China

In the Mainland China deployment mode, the endpoints and data storage are located in the Beijing region. Model inference compute resources are limited to Mainland China.

Model	Version	Context window	Max input	Max output	Free quota (Note)
		(tokens)
qwen3-livetranslate-flash-realtime Currently, qwen3-livetranslate-flash-realtime-2025-09-22.	Stable	53,248	49,152	4,096	No free quota is available.
qwen3-livetranslate-flash-realtime-2025-09-22	Snapshot

The billing rules for input and output are as follows:

Input	Unit price (per 1M tokens)
Audio	$9.175
Image	$1.147

Output	Unit price (per 1M tokens)
Text	$9.175
Audio	$34.405

Token calculation rules:

Audio: Each second of audio input or output consumes 12.5 tokens.
Image: Each 28×28 pixel input consumes 0.5 tokens.

Qwen audio file recognition

Based on the Qwen multimodal foundation model, this model supports features such as multi-language recognition, singing recognition, and noise rejection. Usage | API reference

International

In the international deployment mode, the endpoints and data storage are located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide, excluding Mainland China.

Qwen3-ASR-Flash-Filetrans

Model

Version

Unit price

Free quota (Note)

qwen3-asr-flash-filetrans

Currently, qwen3-asr-flash-filetrans-2025-11-17.

Stable

$0.000035/second

36,000 seconds (10 hours)

Valid for 90 days after activating Model Studio.

qwen3-asr-flash-filetrans-2025-11-17

Snapshot

Supported languages: Chinese (Mandarin, Sichuanese, Minnan, Wu, Cantonese), English, Japanese, German, Korean, Russian, French, Portuguese, Arabic, Italian, Spanish, Hindi, Indonesian, Thai, Turkish, Ukrainian, Vietnamese, Czech, Danish, Filipino, Finnish, Icelandic, Malay, Norwegian, Polish, Swedish
Supported sample rates: Any

Qwen3-ASR-Flash

Model

Version

Unit price

Free quota (Note)

qwen3-asr-flash

Its capabilities match those of qwen3-asr-flash-2025-09-08.

Stable

$0.000035 per second

36,000 seconds (10 hours)

Valid for 90 days after activating Model Studio.

qwen3-asr-flash-2025-09-08

Snapshot

Supported languages: Chinese (Mandarin, Sichuanese, Minnan, Wu, Cantonese), English, Japanese, German, Korean, Russian, French, Portuguese, Arabic, Italian, Spanish, Hindi, Indonesian, Thai, Turkish, Ukrainian, Vietnamese, Czech, Danish, Filipino, Finnish, Icelandic, Malay, Norwegian, Polish, Swedish
Supported sample rates: Any

US

In the US deployment mode, the endpoints and data storage are located in the US (Virginia) region. Model inference compute resources are limited to the US.

Model

Version

Unit price

Free quota (Note)

qwen3-asr-flash-us

Currently, qwen3-asr-flash-2025-09-08-us.

Stable

$0.000035/second

No free quota is available.

qwen3-asr-flash-2025-09-08-us

Snapshot

Supported languages: Chinese (Mandarin, Sichuanese, Minnan, Wu, Cantonese), English, Japanese, German, Korean, Russian, French, Portuguese, Arabic, Italian, Spanish, Hindi, Indonesian, Thai, Turkish, Ukrainian, Vietnamese, Czech, Danish, Filipino, Finnish, Icelandic, Malay, Norwegian, Polish, Swedish
Supported sample rates: Any

Mainland China

In the Mainland China deployment mode, the endpoints and data storage are located in the Beijing region. Model inference compute resources are limited to Mainland China.

Qwen3-ASR-Flash-Filetrans

Model

Version

Unit price

Free quota (Note)

qwen3-asr-flash-filetrans

It offers the same capabilities as qwen3-asr-flash-filetrans-2025-11-17.

Stable

$0.000032/second

No free quota is available.

qwen3-asr-flash-filetrans-2025-11-17

Snapshot

Supported languages: Chinese (Mandarin, Sichuanese, Minnan, Wu, Cantonese), English, Japanese, German, Korean, Russian, French, Portuguese, Arabic, Italian, Spanish, Hindi, Indonesian, Thai, Turkish, Ukrainian, Vietnamese, Czech, Danish, Filipino, Finnish, Icelandic, Malay, Norwegian, Polish, Swedish
Supported sample rates: Any

Qwen3-ASR-Flash

Model

Version

Unit price

Free quota (Note)

qwen3-asr-flash

Currently, qwen3-asr-flash-2025-09-08.

Stable

$0.000032/second

No free quota is available.

qwen3-asr-flash-2025-09-08

Snapshot

Supported languages: Chinese (Mandarin, Sichuanese, Minnan, Wu, Cantonese), English, Japanese, German, Korean, Russian, French, Portuguese, Arabic, Italian, Spanish, Hindi, Indonesian, Thai, Turkish, Ukrainian, Vietnamese, Czech, Danish, Filipino, Finnish, Icelandic, Malay, Norwegian, Polish, Swedish
Supported sample rates: Any

Qwen real-time speech recognition

Qwen Real-Time Speech Recognition is a Large Language Model (LLM) with automatic language detection. It supports 11 languages and delivers accurate transcription even in complex audio environments. How to use | API reference

International

In international deployment mode, endpoints and data storage are located in the Singapore region. Model inference compute resources are dynamically scheduled across global regions, excluding Mainland China.

Model

Version

Unit price

Free quota (Note)

qwen3-asr-flash-realtime

Currently, qwen3-asr-flash-realtime-2025-10-27

Stable

$0.00009/second

36,000 seconds (10 hours)

Valid for 90 days after activating Model Studio.

qwen3-asr-flash-realtime-2025-10-27

Snapshot

Languages supported: Chinese (Mandarin, Sichuanese, Minnan, Wu, Cantonese), English, Japanese, German, Korean, Russian, French, Portuguese, Arabic, Italian, Spanish, Hindi, Indonesian, Thai, Türkçe, Ukrainian, Vietnamese, Czech, Danish, Filipino, Finnish, Icelandic, Malay, Norwegian, Polish, Swedish
Sample rates supported: 8 kHz, 16 kHz

Mainland China

In Mainland China deployment mode, endpoints and data storage are located in the Beijing region. Model inference compute resources are limited to Mainland China only.

Model

Version

Unit price

Free quota (Note)

qwen3-asr-flash-realtime

Currently, qwen3-asr-flash-realtime-2025-10-27

Stable

$0.000047/second

No free quota

qwen3-asr-flash-realtime-2025-10-27

Snapshot

Languages supported: Chinese (Mandarin, Sichuanese, Minnan, Wu, Cantonese), English, Japanese, German, Korean, Russian, French, Portuguese, Arabic, Italian, Spanish, Hindi, Indonesian, Thai, Türkçe, Ukrainian, Vietnamese, Czech, Danish, Filipino, Finnish, Icelandic, Malay, Norwegian, Polish, Swedish
Sample rates supported: 8 kHz, 16 kHz

Paraformer speech recognition

Paraformer speech recognition offers two versions: recorded file recognition and real-time speech recognition.

Recorded File Recognition

Usage | API Reference

Note

Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Model	Unit price	Free quota (Note)
paraformer-v2	$0.000012/second	No free quota
paraformer-8k-v2	$0.000012/second	No free quota

Languages supported:
- paraformer-v2: Chinese (Mandarin, Cantonese, Wu, Minnan, Northeastern, Gansu, Guizhou, Henan, Hubei, Hunan, Ningxia, Shanxi, Shaanxi, Shandong, Sichuan, Tianjin, Jiangxi, Yunnan, Shanghai), English, Japanese, Korean, German, French, Russian
- paraformer-8k-v2: Mandarin Chinese
Sample rates supported:
- paraformer-v2: Any
- paraformer-8k-v2: 8 kHz
Audio formats supported: AAC, AMR, AVI, FLAC, FLV, M4A, MKV, MOV, MP3, MP4, MPEG, OGG, OPUS, WAV, WEBM, WMA, WMV

Real-Time Speech Recognition

Usage | API Reference

Note

Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Model	Unit price	Free quota (Note)
paraformer-realtime-v2	$0.000035/second	No free quota
paraformer-realtime-8k-v2	$0.000035/second	No free quota

Languages supported:
- paraformer-realtime-v2: Chinese (Mandarin, Cantonese, Wu, Minnan, Northeastern, Gansu, Guizhou, Henan, Hubei, Hunan, Ningxia, Shanxi, Shaanxi, Shandong, Sichuan, Tianjin, Jiangxi, Yunnan, Shanghai), English, Japanese, Korean, German, French, Russian
- paraformer-realtime-8k-v2: Mandarin Chinese
Sample rates supported:
- paraformer-realtime-v2: Any
- paraformer-realtime-8k-v2: 8 kHz
Audio formats supported: PCM, WAV, MP3, OPUS, SPEEX, AAC, AMR

Fun-ASR speech recognition

Fun-ASR speech recognition offers two versions: audio file recognition and real-time speech recognition.

Audio File Recognition

Usage instructions | API reference

International

In the international deployment mode, endpoints and data storage are in the Singapore region. Model inference compute resources are dynamically scheduled globally, excluding Mainland China.

Model	Version	Unit price	Free quota (Note)
fun-asr Currently, fun-asr-2025-11-07	Stable	$0.000035/second	36,000 seconds (10 hours) Valid for 90 days
fun-asr-2025-11-07 Improved far-field VAD over fun-asr-2025-08-25 for higher accuracy	Snapshot
fun-asr-2025-08-25	Snapshot
fun-asr-mtl Currently, fun-asr-mtl-2025-08-25	Stable
fun-asr-mtl-2025-08-25	Snapshot

Languages supported:
- fun-asr and fun-asr-2025-11-07: Mandarin, Cantonese, Wu, Minnan, Hakka, Gan, Xiang, and Jin. Also supports Mandarin accents from Zhongyuan, Southwest, Jilu, Jianghuai, Lanyin, Jiaoliao, Northeast, Beijing, and Hong Kong–Taiwan regions—including Henan, Shaanxi, Hubei, Sichuan, Chongqing, Yunnan, Guizhou, Guangdong, Guangxi, Hebei, Tianjin, Shandong, Anhui, Nanjing, Jiangsu, Hangzhou, Gansu, and Ningxia. Also supports English and Japanese.
- fun-asr-2025-08-25: Mandarin and English.
- fun-asr-mtl and fun-asr-mtl-2025-08-25: Mandarin, Cantonese, English, Japanese, Korean, Vietnamese, Indonesian, Thai, Malay, Filipino, Arabic, Hindi, Bulgarian, Croatian, Czech, Danish, Dutch, Estonian, Finnish, Greek, Hungarian, Irish, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovak, Slovenian, and Swedish.
Sample rates supported: Any
Audio formats supported: aac, amr, avi, flac, flv, m4a, mkv, mov, mp3, mp4, mpeg, ogg, opus, wav, webm, wma, wmv

Mainland China

In the Mainland China deployment mode, endpoints and data storage are in the Beijing region. Model inference compute resources are limited to Mainland China.

Model	Version	Unit price	Free quota (Note)
fun-asr Currently, fun-asr-2025-11-07	Stable	$0.000032 / second	No free quota
fun-asr-2025-11-07 Improved far-field VAD over fun-asr-2025-08-25 for higher accuracy	Snapshot
fun-asr-2025-08-25	Snapshot
fun-asr-mtl Currently, fun-asr-mtl-2025-08-25	Stable
fun-asr-mtl-2025-08-25	Snapshot

Languages supported:
- fun-asr and fun-asr-2025-11-07: Mandarin, Cantonese, Wu, Minnan, Hakka, Gan, Xiang, and Jin. Also supports Mandarin accents from Zhongyuan, Southwest, Jilu, Jianghuai, Lanyin, Jiaoliao, Northeast, Beijing, and Hong Kong–Taiwan regions—including Henan, Shaanxi, Hubei, Sichuan, Chongqing, Yunnan, Guizhou, Guangdong, Guangxi, Hebei, Tianjin, Shandong, Anhui, Nanjing, Jiangsu, Hangzhou, Gansu, and Ningxia. Also supports English and Japanese.
- fun-asr-2025-08-25: Mandarin and English.
- fun-asr-mtl and fun-asr-mtl-2025-08-25: Mandarin, Cantonese, English, Japanese, Korean, Vietnamese, Indonesian, Thai, Malay, Filipino, Arabic, Hindi, Bulgarian, Croatian, Czech, Danish, Dutch, Estonian, Finnish, Greek, Hungarian, Irish, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovak, Slovenian, and Swedish.
Sample rates supported: Any
Audio formats supported: aac, amr, avi, flac, flv, m4a, mkv, mov, mp3, mp4, mpeg, ogg, opus, wav, webm, wma, wmv

Real-Time Speech Recognition

Usage instructions | API reference

International

In the international deployment mode, endpoints and data storage are in the Singapore region. Model inference compute resources are dynamically scheduled globally, excluding Mainland China.

Model

Version

Unit price

Free quota (Note)

fun-asr-realtime

Currently, fun-asr-realtime-2025-11-07

Stable

$0.00009/second

36,000 seconds (10 hours)

Valid for 90 days

fun-asr-realtime-2025-11-07

Snapshot

Languages supported: Mandarin, Cantonese, Wu, Minnan, Hakka, Gan, Xiang, and Jin. Also supports Mandarin accents from Zhongyuan, Southwest, Jilu, Jianghuai, Lanyin, Jiaoliao, Northeast, Beijing, and Hong Kong–Taiwan regions—including Henan, Shaanxi, Hubei, Sichuan, Chongqing, Yunnan, Guizhou, Guangdong, Guangxi, Hebei, Tianjin, Shandong, Anhui, Nanjing, Jiangsu, Hangzhou, Gansu, and Ningxia. Also supports English and Japanese.
Sample rates supported: 16 kHz
Audio formats supported: pcm, wav, mp3, opus, speex, aac, amr

Mainland China

In the Mainland China deployment mode, endpoints and data storage are in the Beijing region. Model inference compute resources are limited to Mainland China.

Model	Version	Unit price	Free quota (Note)
fun-asr-realtime Currently, fun-asr-realtime-2025-11-07	Stable	$0.000047/second	No free quota
fun-asr-realtime-2025-11-07 Improved far-field VAD compared to fun-asr-realtime-2025-09-15 for higher accuracy.	Snapshot
fun-asr-realtime-2025-09-15	Snapshot

Languages supported:
- fun-asr-realtime and fun-asr-realtime-2025-11-07: Mandarin, Cantonese, Wu, Minnan, Hakka, Gan, Xiang, and Jin. Also supports Mandarin accents from Zhongyuan, Southwest, Jilu, Jianghuai, Lanyin, Jiaoliao, Northeast, Beijing, and Hong Kong–Taiwan regions—including Henan, Shaanxi, Hubei, Sichuan, Chongqing, Yunnan, Guizhou, Guangdong, Guangxi, Hebei, Tianjin, Shandong, Anhui, Nanjing, Jiangsu, Hangzhou, Gansu, and Ningxia. Also supports English and Japanese.
- fun-asr-realtime-2025-09-15: Mandarin, Cantonese, English, Japanese, Korean, Vietnamese, Indonesian, and Thai.
Sample rates supported: 16 kHz
Audio formats supported: pcm, wav, mp3, opus, speex, aac, amr

Text embedding

Text embedding models convert text into numerical representations for tasks such as search, clustering, recommendation, and classification. Billing for these models is based on the number of input tokens. API reference

International

In the international deployment mode, endpoints and data storage are located in the Singapore region. Inference computing resources are scheduled globally, excluding Mainland China.

Model

Embedding dimensions

Batch size

Max tokens per batch (Note)

Supported languages

Price

(1M input tokens)

Free quota

(Note)

text-embedding-v4

Part of the Qwen3-Embedding series

2,048, 1,536, 1,024 (default), 768, 512, 256, 128, or 64

10

8,192

More than 100 major languages, such as Chinese, English, Spanish, French, Portuguese, Indonesian, Japanese, Korean, German, and Russian, and various programming languages

$0.07

1 million tokens

Valid for 90 days after you activate Model Studio.

text-embedding-v3

1,024 (default), 768, or 512

10

8,192

Over 50 languages, such as Chinese, English, Spanish, French, Portuguese, Indonesian, Japanese, Korean, German, and Russian

500,000 tokens

Valid for 90 days after you activate Model Studio.

Mainland China

In the Mainland China deployment mode, endpoints and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.

Model

Embedding dimensions

Batch size

Max tokens per batch (Note)

Supported languages

Price

(1M input tokens)

Free quota

(Note)

text-embedding-v4

Part of the Qwen3-Embedding series

Batch half price

2,048, 1,536, 1,024 (default), 768, 512, 256, 128, or 64

10

8,192

More than 100 major languages, such as Chinese, English, Spanish, French, Portuguese, Indonesian, Japanese, Korean, German, and Russian, and various programming languages

$0.072

No free quota

Note

Batch size is the max number of texts that a single API call can process. For example, the batch size for text-embedding-v4 is 10. This means a single request can vectorize up to 10 texts, and each text cannot exceed 8,192 tokens. This limit applies to:

String array input: The array can contain up to 10 elements.
File input: The text file can contain up to 10 lines of text.

Multimodal embedding

A multimodal embedding model converts text, images, and videos into a vector of floating-point numbers. The model is suitable for applications such as video classification, image classification, and image-text retrieval. API reference

International

In the international deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are scheduled globally, excluding Mainland China.

Model

Data type

Embedding dimensions

Unit price (1M input tokens)

Free quota (Note)

tongyi-embedding-vision-plus

float(32)

1,152

$0.09

1 million tokens

Valid for 90 days after you activate Model Studio.

tongyi-embedding-vision-flash

float(32)

768

Image/Video: $0.03

Text: $0.09

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.

Model	Data type	Embedding dimensions	Price (1,000 input tokens)
multimodal-embedding-v1	float(32)	1,024	Free trial

Text rerank

This feature is typically used for semantic retrieval. Given a query, it sorts a list of candidate documents in descending order of their semantic relevance. API reference

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.

Note

Only available in Mainland China (Beijing) region.

Model	Max number of documents	Max input tokens per item	Max input tokens	Supported languages	Price (1M input tokens)
gte-rerank-v2	500	4,000	30,000	More than 50 languages, such as Chinese, English, Japanese, Korean, Thai, Spanish, French, Portuguese, German, Indonesian, and Arabic	$0.115

Max input tokens per item: Each query or document is limited to 4,000 tokens. Input that exceeds this limit is truncated.
Max number of documents: Each request is limited to 500 documents.
Max input tokens: The total number of tokens for all queries and documents in a single request is limited to 30,000.

Domain specific

Intent recognition

The Qwen intent recognition model can quickly and accurately parse user intents in milliseconds and select the appropriate tools to resolve user issues. API reference | Usage

Note

Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Model	Context window	Max input	Max output	Input cost	Output cost
Model	(tokens)			(per 1M tokens)
tongyi-intent-detect-v3	8,192	8,192	1,024	$0.058	$0.144

Role playing

Qwen's role-playing model is ideal for scenarios that require human-like conversation, such as virtual social interactions, NPCs in games, replicating IP characters, hardware, toys, and in-vehicle systems. Compared to other Qwen models, this model offers enhanced capabilities in character fidelity, conversation progression, and empathetic listening. Usage

International

In the international deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled globally, excluding Mainland China.

Model	Context window	Max input	Max output	Input cost	Output cost
Model	(tokens)			(per 1M tokens)
qwen-plus-character	32,768	30,000	4,000	$0.5	$1.4
qwen-flash-character	8,192	8,000	4,096	$0.05	$0.4
qwen-plus-character-ja	8,192	7,680	512	$0.5	$1.4

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.

Model	Context window	Max input	Max output	Input cost	Output cost
Model	(tokens)			(per 1M tokens)
qwen-plus-character	32,768	32,000	4,096	$0.115	$0.287

Retired models

Retired on January 30, 2026

Category	Model	Context window	Max input	Max output	Input cost (per 1M tokens)	Output cost (per 1M tokens)	Alternative model
		(tokens)
Qwen-Plus	qwen-plus-2024-11-27	131,072	129,024	8,192	$0.115	$0.287	qwen-plus-2025-12-01
	qwen-plus-2024-11-25
	qwen-plus-2024-09-19
	qwen-plus-2024-08-06		128,000		$0.574	$1.721
Qwen-Turbo	qwen-turbo-2024-09-19	131,072	129,023	8,192	$0.044	$0.087	qwen-flash-2025-07-28
Qwen-VL	qwen-vl-max-2024-10-30	32,768	30,720 Max 16384 per image	2,048	$2.868	$2.868	qwen3-vl-plus-2025-12-19
	qwen-vl-max-2024-08-09
	qwen-vl-plus-2024-08-09				$0.216	$0.646	qwen3-vl-flash-2025-10-15

Retired on August 20, 2025

Qwen2

The open-source Qwen2 model from Alibaba Cloud. Usage | API reference | Try it online

Model	Context window	Max input	Max output	Input cost	Output cost	Alternative model
	(tokens)			(per 1M tokens)
qwen2-72b-instruct	131,072	128,000	6,144	Free for a limited time		Qwen3, DeepSeek, Kimi, and others
qwen2-57b-a14b-instruct	65,536	63,488
qwen2-7b-instruct	131,072	128,000

Qwen1.5

The open-source Qwen1.5 model from Alibaba Cloud. Usage | API reference | Try it online

Model	Context window	Max input	Max output	Input cost	Output cost	Alternative model
	(tokens)			(per 1M tokens)
qwen1.5-110b-chat	8,000	6,000	2,000	Free for a limited time		Qwen3, DeepSeek, Kimi, and others
qwen1.5-72b-chat
qwen1.5-32b-chat
qwen1.5-14b-chat
qwen1.5-7b-chat

Category	Subcategory	Description
Text generation	General-purpose large language models	Qwen large language models: Commercial models (Qwen-Max, Qwen-Plus, Qwen-Flash), open source models (Qwen3)
	Multimodal models	Visual understanding model Qwen-VL
	Domain-specific models	Code model, Translation model
Image generation	Text-to-image	Wan text-to-image: Basic text-to-image: Generates beautiful images from a single sentence. Mixed text and image outputs: Generates a text description followed by a corresponding image, resulting in a seamless text-and-image output.
Image generation	Image editing	Wan image editing: Supports scenarios such as multi-image fusion, style transfer, object detection, image restoration, and watermark removal. Model series: Wan2.6.
Video generation	Text-to-video	Generates high-quality videos with rich styles from a single sentence.
	Image-to-video	First-frame-to-video: Uses an input image as the first frame and generates a video based on a prompt.
	Video-to-video	Reference-to-video: Generates a video that maintains character consistency using a prompt and the appearance and voice from an input video.

Category	Subcategory	Description
Text generation	General-purpose large language models	Qwen large language models: Commercial models (Qwen-Max, Qwen-Plus, Qwen-Flash), open source models (Qwen3, Qwen2.5)
	Multimodal models	Visual understanding model Qwen-VL, visual reasoning model QVQ, omni-modal model Qwen-Omni, and real-time multi-modal model Qwen-Omni-Realtime
	Domain-specific models	Code model, Translation model, Role-playing model
Image generation	Text-to-image	Qwen-Image: Excels in handling complex instructions, rendering both Chinese and English text, and generating high-definition, photorealistic images. It supports the selection of different models based on efficiency and quality requirements. Wan text-to-image: Basic text-to-image: Generates beautiful images from a single sentence. Mixed text and image outputs: Generates a text description followed by a corresponding image, resulting in a seamless text-and-image output. Tongyi - text-to-Image - Z-Image: A lightweight text-to-image model that quickly generates high-quality images and supports bilingual rendering in Chinese and English, complex semantic understanding, and a variety of styles and themes.
Image generation	Image editing	Qwen-Image-Edit: Supports prompts in both Chinese and English, enabling complex image and text editing operations such as style transfer, text modification, and object editing. It also supports multi-image fusion and is adaptable to a wide variety of industrial application scenarios. Wan image editing: Supports scenarios such as multi-image fusion, style transfer, object detection, image restoration, and watermark removal. Model series include the following: Wan2.6, Wan2.5.
Speech synthesis and recognition	Speech synthesis (text-to-speech)	Qwen speech synthesis and Qwen realtime speech synthesis can be used for text-to-speech in scenarios such as intelligent voice customer service, audiobooks, in-car navigation, and educational tutoring.
Speech synthesis and recognition	Speech recognition and translation	Qwen realtime speech recognition, Qwen audio file recognition, Qwen3-LiveTranslate-Flash-Realtime, and Fun-ASR speech recognition can perform speech-to-text for scenarios such as real-time meeting records, real-time live stream captions, and telephone customer service.
Video generation	Text-to-video	Generates high-quality videos with rich styles from a single sentence.
	Image-to-video	First-frame-to-video: Uses an input image as the first frame and generates a video based on a prompt. First-and-last-frame-to-video: Generates a smooth and dynamic video based on the provided first and last frames and a prompt. Multi-image-to-video: Generates a video by referencing the entity or background in one or more input images, combined with a prompt.
	Video-to-video	Reference-to-video: Generates a video that maintains character consistency using a prompt and the appearance and voice from an input video.
	General video editing	General video editing: Performs various video editing tasks based on input text, images, and videos. For example, it can generate a new video by extracting motion features from an input video and combining them with a prompt.
Embedding	Text embedding	Converts text into a set of numbers that represent the text. It is suitable for search, clustering, recommendation, and classification tasks.

Category	Model	Description
Text generation	General-purpose large language models	Qwen large language models: Commercial models (Qwen-Max, Qwen-Plus, Qwen-Flash), open source models (Qwen3, Qwen2.5) Third-party models: DeepSeek, Kimi
	Multimodal models	Visual understanding model Qwen-VL, visual reasoning model QVQ, and omni-modal model Qwen-Omni
	Domain-specific models	Code model, Mathematical model, Translation model, Data mining model, Research model, Intention recognition model, Role-playing model
Image generation	Text-to-image	Qwen-Image: Excels in handling complex instructions, rendering both Chinese and English text, and generating high-definition, photorealistic images. It supports the selection of different models based on efficiency and quality requirements. Wan text-to-image: Basic text-to-image: Generates beautiful images from a single sentence. Mixed text and image outputs: Generates a text description followed by a corresponding image, resulting in a seamless text-and-image output. Tongyi - text-to-Image - Z-Image: A lightweight text-to-image model that quickly generates high-quality images and supports bilingual rendering in Chinese and English, complex semantic understanding, and a variety of styles and themes.
Image generation	Image editing	General-purpose models: Qwen-Image-Edit: Supports prompts in both Chinese and English, enabling complex image and text editing operations such as style transfer, text modification, and object editing. It also supports multi-image fusion and is adaptable to a wide variety of industrial application scenarios. Wan image editing: Supports scenarios such as multi-image fusion, style transfer, object detection, image restoration, and watermark removal. Model series include the following: Wan2.6, Wan2.5, Wan2.1. More models: Qwen Image Translation, OutfitAnyone
Speech synthesis and recognition	Speech synthesis (text-to-speech)	Qwen speech synthesis, Qwen realtime speech synthesis, and CosyVoice speech synthesis convert text to speech for scenarios such as voice-based customer service, audiobooks, in-car navigation, and educational tutoring.
Speech synthesis and recognition	Speech recognition and translation	Qwen realtime speech recognition, Qwen audio file recognition, Fun-ASR speech recognition, and Paraformer speech recognition convert speech to text for scenarios such as real-time meeting transcription, real-time live stream captioning, and customer service calls.
Video editing and generation	Text-to-video	Generates high-quality videos with rich styles from a single sentence.
	Image-to-video	First-frame-to-video: Generates a video from an initial image and a prompt. First-and-last-frame-to-video: Generates a video with a natural transition based on the first and last frame images and a prompt. Multi-image-to-video: Generates a video from one or more images and a text prompt, based on the entities or backgrounds in the source images. Dance video generation: AnimateAnyone generates dance videos from a character image and an action video. Image + audio to generate lip-sync videos Wan - digital human generates video from a person's image and audio. It provides a wide and natural range of motion, supports various frame sizes such as full-body, half-body, and portrait, and is suitable for scenarios such as singing and performance. EMO uses a person's image and audio to generate video with highly expressive lip-syncing and facial expressions. It supports portrait and half-body shots and is ideal for close-up scenarios. LivePortrait uses a portrait image and an audio file and is ideal for voice narration scenarios. Emoji video generation: Emoji generates facial emoji videos from facial images and preset dynamic facial templates.
	Video-to-video	Reference-to-video: Generates a video that maintains character consistency using a prompt and the appearance and voice from an input video.
	General-purpose video editing	General video editing: Performs various video editing tasks based on text prompts, images, and videos. For example, you can generate a new video by extracting motion features from an input video and combining them with a text prompt. Video lip-syncing: VideoRetalk uses a person's video and audio and is ideal for scenarios such as short video production and video translation. Video style transfer: Video style transform transforms videos into various styles, such as Japanese manga and American comics.
Vector	Text embedding	Converts text into a set of numbers that represent the text. It is used for search, clustering, recommendation, and classification.
Vector	Multimodal embedding	Converts text, images, and speech into a set of numbers. It is used for audio and video classification, image classification, and image-text retrieval.

Flagship models	Qwen-Max Ideal for complex tasks. The most powerful model.	Qwen-Plus A balance of performance, speed, and cost.	Qwen-Flash Ideal for simple jobs. Fast and low-cost.	Qwen-Coder An excellent code model that excels at tool calling and environment interaction.
Max context window ^(tokens)	262,144	1,000,000	1,000,000	1,000,000
Min input cost ^{(per 1M tokens)}	$1.2	$0.4	$0.05	$0.3
Min output cost ^{(per 1M tokens)}	$6	$1.2	$0.4	$1.5

Flagship models

Global

International

US

Mainland China

Model overview

Global

International

US

Mainland China

Text generation - Qwen

Qwen-Max

International

Global

Mainland China

Qwen-Plus

International

Global

US

Mainland China

Qwen-Flash

International

Global

US

Mainland China

Qwen-Turbo

International

Mainland China

QwQ

International

Mainland China

Qwen-Long

Qwen-Omni

International

Mainland China

Qwen-Omni-Realtime

International

Mainland China

QVQ

International

Mainland China

Qwen-VL

International

qwen3-vl-plus series

qwen3-vl-flash series

Qwen-VL-Max

Qwen-VL-Plus

Global

qwen3-vl-plus series

qwen3-vl-flash series

US

Mainland China

qwen3-vl-plus series

qwen3-vl-flash series

Qwen-VL-Max series

Qwen-VL-Plus series

Qwen-OCR

International

Global

Mainland China

Qwen-Math

Qwen-Coder

International

qwen3-coder-plus series

qwen3-coder-flash series

Global

qwen3-coder-plus series

qwen3-coder-flash series

Mainland China

qwen3-coder-plus series

qwen3-coder-flash series

Qwen-MT

International

Global

Mainland China

Qwen data mining model

Qwen deep research model

Text generation - Qwen - Open source

Qwen3

International