Alibaba Cloud Model Studio: Model invocation pricing

Last Updated: Jan 08, 2026

Text generation - Qwen

Qwen-Max

Billing is based on the number of input and output tokens.

If the model supports batch calls, the batch price for input and output tokens is 50% of the real-time price. If the model supports context cache, only input tokens are eligible for a discount. These two discounts cannot be applied at the same time.
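The billing rules above can be illustrated with a minimal Python sketch. The function name `request_cost` and its parameters are hypothetical. The context cache discount rate is not stated in this section, so `cache_ratio` is a placeholder the caller must supply, and raising an error when both discounts are requested is only one way to model the rule that they cannot be combined.

```python
def request_cost(input_tokens, output_tokens, input_price, output_price,
                 batch=False, cached_input_tokens=0, cache_ratio=None):
    """Estimate the USD cost of one request.

    Prices are USD per million tokens, as quoted in the pricing tables.
    cache_ratio is a hypothetical multiplier for cached input tokens;
    the source does not state the actual discount rate.
    """
    if batch and cached_input_tokens:
        # Modeling choice: the two discounts cannot be applied together.
        raise ValueError("batch and context cache discounts cannot be combined")

    million = 1_000_000
    if batch:
        # Batch calls: input and output are both billed at 50% of real-time.
        return 0.5 * (input_tokens * input_price
                      + output_tokens * output_price) / million

    # Context cache: only input tokens are eligible for a discount.
    input_cost = (input_tokens - cached_input_tokens) * input_price
    if cached_input_tokens:
        if cache_ratio is None:
            raise ValueError("cache_ratio is required when tokens are cached")
        input_cost += cached_input_tokens * input_price * cache_ratio
    return (input_cost + output_tokens * output_price) / million


# Example with first-tier qwen3-max prices ($1.2 input, $6 output).
realtime = request_cost(20_000, 1_000, 1.2, 6)
batched = request_cost(20_000, 1_000, 1.2, 6, batch=True)
print(f"real-time: ${realtime:.4f}, batch: ${batched:.4f}")
```

Output tokens are never discounted in the cache branch, which mirrors the statement that only input tokens are eligible.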

Global

In the Global deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are dynamically scheduled worldwide.

Note

Models in the Global deployment mode do not have a free quota.

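| Model | Mode | Input tokens per request | Input price (per million tokens) | Output price (per million tokens, CoT + response) |
| --- | --- | --- | --- | --- |
| qwen3-max (context cache discount applicable) | Non-thinking only | 0<Token≤32K | $1.2 | $6 |
| qwen3-max (context cache discount applicable) | Non-thinking only | 32K<Token≤128K | $2.4 | $12 |
| qwen3-max (context cache discount applicable) | Non-thinking only | 128K<Token≤252K | $3 | $15 |
| qwen3-max-2025-09-23 | Non-thinking only | 0<Token≤32K | $1.2 | $6 |
| qwen3-max-2025-09-23 | Non-thinking only | 32K<Token≤128K | $2.4 | $12 |
| qwen3-max-2025-09-23 | Non-thinking only | 128K<Token≤252K | $3 | $15 |
| qwen3-max-preview (context cache discount applicable) | Thinking and non-thinking | 0<Token≤32K | $1.2 | $6 |
| qwen3-max-preview (context cache discount applicable) | Thinking and non-thinking | 32K<Token≤128K | $2.4 | $12 |
| qwen3-max-preview (context cache discount applicable) | Thinking and non-thinking | 128K<Token≤252K | $3 | $15 |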
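The tiered pricing in the table above can be read as a lookup on the input size of each request. The sketch below is one interpretation, not the billing system's actual logic: it assumes the tier is chosen by the request's input token count and that the tier's prices then apply to all input and output tokens of that request. The names `QWEN3_MAX_GLOBAL_TIERS` and `tiered_cost` are hypothetical, and the source does not say whether "K" means 1,000 or 1,024.

```python
K = 1024  # assumption: the source does not define "K"

# (upper bound on input tokens, input USD/M tokens, output USD/M tokens)
# Values copied from the qwen3-max rows for the Global deployment mode.
QWEN3_MAX_GLOBAL_TIERS = [
    (32 * K, 1.2, 6.0),
    (128 * K, 2.4, 12.0),
    (252 * K, 3.0, 15.0),
]


def tiered_cost(input_tokens, output_tokens, tiers=QWEN3_MAX_GLOBAL_TIERS):
    """Return the estimated USD cost of one request under tiered pricing."""
    if input_tokens <= 0:
        raise ValueError("input_tokens must be positive")
    for upper_bound, input_price, output_price in tiers:
        # Tiers are listed in ascending order; the first match wins.
        if input_tokens <= upper_bound:
            return (input_tokens * input_price
                    + output_tokens * output_price) / 1_000_000
    raise ValueError("input exceeds the largest listed tier")


print(f"${tiered_cost(10_000, 1_000):.4f}")   # first tier
print(f"${tiered_cost(50_000, 1_000):.4f}")   # second tier
```

Under this reading, a request that crosses a tier boundary is billed entirely at the higher tier's prices, so trimming a prompt just below a boundary can change the cost noticeably.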

International

In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.

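| Model | Mode | Input tokens per request | Input price (per million tokens) | Output price (per million tokens, CoT + response) |
| --- | --- | --- | --- | --- |
| qwen3-max (batch calling half price; context cache discount applicable) | Non-thinking only | 0<Token≤32K | $1.2 | $6 |
| qwen3-max (batch calling half price; context cache discount applicable) | Non-thinking only | 32K<Token≤128K | $2.4 | $12 |
| qwen3-max (batch calling half price; context cache discount applicable) | Non-thinking only | 128K<Token≤252K | $3 | $15 |
| qwen3-max-2025-09-23 | Non-thinking only | 0<Token≤32K | $1.2 | $6 |
| qwen3-max-2025-09-23 | Non-thinking only | 32K<Token≤128K | $2.4 | $12 |
| qwen3-max-2025-09-23 | Non-thinking only | 128K<Token≤252K | $3 | $15 |
| qwen3-max-preview (context cache discount applicable) | Thinking and non-thinking | 0<Token≤32K | $1.2 | $6 |
| qwen3-max-preview (context cache discount applicable) | Thinking and non-thinking | 32K<Token≤128K | $2.4 | $12 |
| qwen3-max-preview (context cache discount applicable) | Thinking and non-thinking | 128K<Token≤252K | $3 | $15 |

Free quota (Note): 1 million tokens each. Validity: 90 days after activating Model Studio.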
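To make the free quota terms concrete, here is a small sketch of how billable tokens might be derived from a 1-million-token allowance that is valid for 90 days after activation. It rests on assumptions the source does not confirm: that "each" means separate allowances for input and output tokens, so the function is meant to be called once per direction, and that the 90th day is still covered. The name `billable_tokens` is hypothetical.

```python
from datetime import date, timedelta

FREE_TOKENS = 1_000_000          # free quota quoted in the table
VALIDITY = timedelta(days=90)    # validity after activating Model Studio


def billable_tokens(used_before, new_tokens, activated_on, today):
    """Return how many of `new_tokens` fall outside the free quota.

    Call once for input tokens and once for output tokens if the
    allowance is tracked per direction (an assumption).
    """
    if today > activated_on + VALIDITY:
        # Quota expired: every token is billable.
        return new_tokens
    remaining = max(FREE_TOKENS - used_before, 0)
    return max(new_tokens - remaining, 0)


activated = date(2026, 1, 1)
print(billable_tokens(900_000, 250_000, activated, date(2026, 2, 1)))
print(billable_tokens(900_000, 250_000, activated, date(2026, 4, 2)))
```

The first call is inside the validity window, so only the part above the remaining allowance is billable. The second is after expiry, so the whole request is billable.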

More models

| Model | Mode | Input tokens per request | Input price (per million tokens) | Output price (per million tokens) |
| --- | --- | --- | --- | --- |
| qwen-max (batch calling half price) | Non-thinking only | No tiered pricing | $1.6 | $6.4 |
| qwen-max-latest | Non-thinking only | No tiered pricing | $1.6 | $6.4 |
| qwen-max-2025-01-25 | Non-thinking only | No tiered pricing | $1.6 | $6.4 |

Free quota (Note): 1 million tokens each. Validity: 90 days after activating Model Studio.

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.

Note

Models in Mainland China deployment mode do not have a free quota.

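| Model | Mode | Input tokens per request | Input price (per million tokens) | Output price (per million tokens, CoT + response) |
| --- | --- | --- | --- | --- |
| qwen3-max (batch calling half price; context cache discount applicable) | Non-thinking only | 0<Token≤32K | $0.459 | $1.836 |
| qwen3-max (batch calling half price; context cache discount applicable) | Non-thinking only | 32K<Token≤128K | $0.918 | $3.672 |
| qwen3-max (batch calling half price; context cache discount applicable) | Non-thinking only | 128K<Token≤252K | $1.377 | $5.508 |
| qwen3-max-2025-09-23 | Non-thinking only | 0<Token≤32K | $0.861 | $3.441 |
| qwen3-max-2025-09-23 | Non-thinking only | 32K<Token≤128K | $1.434 | $5.735 |
| qwen3-max-2025-09-23 | Non-thinking only | 128K<Token≤252K | $2.151 | $8.602 |
| qwen3-max-preview (context cache discount applicable) | Thinking and non-thinking | 0<Token≤32K | $0.861 | $3.441 |
| qwen3-max-preview (context cache discount applicable) | Thinking and non-thinking | 32K<Token≤128K | $1.434 | $5.735 |
| qwen3-max-preview (context cache discount applicable) | Thinking and non-thinking | 128K<Token≤252K | $2.151 | $8.602 |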

More models

| Model | Mode | Input tokens per request | Input price (per million tokens) | Output price (per million tokens) |
| --- | --- | --- | --- | --- |
| qwen-max | Non-thinking only | No tiered pricing | $0.345 | $1.377 |
| qwen-max-latest | Non-thinking only | No tiered pricing | $0.345 | $1.377 |
| qwen-max-2025-01-25 | Non-thinking only | No tiered pricing | $0.345 | $1.377 |
| qwen-max-2024-09-19 | Non-thinking only | No tiered pricing | $2.868 | $8.602 |

Qwen-Plus

Billing is based on the number of input and output tokens.

Global

In the Global deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are dynamically scheduled worldwide.

Note

Models in the Global deployment mode do not have a free quota.

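| Model | Input tokens per request | Input price (per million tokens) | Output price, non-thinking mode (per million tokens) | Output price, thinking mode (per million tokens, CoT + response) |
| --- | --- | --- | --- | --- |
| qwen-plus | 0<Token≤256K | $0.4 | $1.2 | $4 |
| qwen-plus | 256K<Token≤1M | $1.2 | $3.6 | $12 |
| qwen-plus-2025-12-01 | 0<Token≤256K | $0.4 | $1.2 | $4 |
| qwen-plus-2025-12-01 | 256K<Token≤1M | $1.2 | $3.6 | $12 |
| qwen-plus-2025-09-11 | 0<Token≤256K | $0.4 | $1.2 | $4 |
| qwen-plus-2025-09-11 | 256K<Token≤1M | $1.2 | $3.6 | $12 |
| qwen-plus-2025-07-28 | 0<Token≤256K | $0.4 | $1.2 | $4 |
| qwen-plus-2025-07-28 | 256K<Token≤1M | $1.2 | $3.6 | $12 |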
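For Qwen-Plus, the output price depends on both the input tier and whether the request runs in thinking mode. The sketch below combines the two under stated assumptions: the tier is selected by the request's input token count, chain-of-thought tokens are billed as output at the thinking-mode price, and "K" and "M" are binary multiples (the source does not define them). `Tier`, `QWEN_PLUS_GLOBAL`, and `qwen_plus_cost` are hypothetical names.

```python
from dataclasses import dataclass

K = 1024  # assumption: the source does not define "K" or "M"


@dataclass(frozen=True)
class Tier:
    max_input_tokens: int
    input_price: float            # USD per million input tokens
    output_price: float           # non-thinking mode, USD per million
    thinking_output_price: float  # thinking mode (CoT + response)


# qwen-plus prices for the Global deployment mode, copied from the table.
QWEN_PLUS_GLOBAL = (
    Tier(256 * K, 0.4, 1.2, 4.0),
    Tier(1024 * K, 1.2, 3.6, 12.0),
)


def qwen_plus_cost(input_tokens, response_tokens, cot_tokens=0,
                   thinking=False, tiers=QWEN_PLUS_GLOBAL):
    """Estimate the USD cost of one qwen-plus request."""
    if input_tokens <= 0:
        raise ValueError("input_tokens must be positive")
    tier = next((t for t in tiers if input_tokens <= t.max_input_tokens), None)
    if tier is None:
        raise ValueError("input exceeds the largest listed tier")
    if thinking:
        # CoT and response tokens are both billed at the thinking price.
        output_cost = (cot_tokens + response_tokens) * tier.thinking_output_price
    else:
        output_cost = response_tokens * tier.output_price
    return (input_tokens * tier.input_price + output_cost) / 1_000_000


print(f"${qwen_plus_cost(100_000, 2_000):.4f}")
print(f"${qwen_plus_cost(100_000, 2_000, cot_tokens=3_000, thinking=True):.4f}")
```

With these figures, enabling thinking mode raises the per-token output price and also adds the chain-of-thought tokens to the billed output.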

International

In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.

| Model | Input tokens per request | Input price (per million tokens) | Output price, non-thinking mode (per million tokens) | Output price, thinking mode (per million tokens, CoT + response) |
| --- | --- | --- | --- | --- |
| qwen-plus | 0<Token≤256K | $0.4 | $1.2 | $4 |
| qwen-plus | 256K<Token≤1M | $1.2 | $3.6 | $12 |
| qwen-plus-latest | 0<Token≤256K | $0.4 | $1.2 | $4 |
| qwen-plus-latest | 256K<Token≤1M | $1.2 | $3.6 | $12 |
| qwen-plus-2025-12-01 | 0<Token≤256K | $0.4 | $1.2 | $4 |
| qwen-plus-2025-12-01 | 256K<Token≤1M | $1.2 | $3.6 | $12 |
| qwen-plus-2025-09-11 | 0<Token≤256K | $0.4 | $1.2 | $4 |
| qwen-plus-2025-09-11 | 256K<Token≤1M | $1.2 | $3.6 | $12 |
| qwen-plus-2025-07-28 | 0<Token≤256K | $0.4 | $1.2 | $4 |
| qwen-plus-2025-07-28 | 256K<Token≤1M | $1.2 | $3.6 | $12 |
| qwen-plus-2025-07-14 | No tiered pricing | $0.4 | $1.2 | $4 |
| qwen-plus-2025-04-28 | No tiered pricing | $0.4 | $1.2 | $4 |
| qwen-plus-2025-01-25 | No tiered pricing | $0.4 | $1.2 | - |

Free quota (Note): 1 million tokens each. Validity: 90 days after activating Model Studio.

US

In the US deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are limited to the United States.

Note

Models in the US deployment mode do not have a free quota.
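The four deployment modes described on this page differ in where the endpoint and data storage live and where inference may run. A compact way to keep that straight is a lookup table such as the sketch below. The dictionary layout, field names, and helper function are illustrative only, and the `has_free_quota` flag reflects the notes and tables on this page rather than any API.

```python
# Deployment modes as described in this document.
DEPLOYMENT_MODES = {
    "Global": {
        "endpoint_region": "US (Virginia)",
        "compute_scope": "worldwide",
        "has_free_quota": False,
    },
    "International": {
        "endpoint_region": "Singapore",
        "compute_scope": "worldwide, excluding Mainland China",
        # Free quotas are listed per table for this mode.
        "has_free_quota": True,
    },
    "US": {
        "endpoint_region": "US (Virginia)",
        "compute_scope": "United States only",
        "has_free_quota": False,
    },
    "Mainland China": {
        "endpoint_region": "Beijing",
        "compute_scope": "Mainland China only",
        "has_free_quota": False,
    },
}


def modes_by_endpoint(region):
    """Return the deployment modes whose endpoint is in `region`."""
    return [name for name, info in DEPLOYMENT_MODES.items()
            if info["endpoint_region"] == region]


print(modes_by_endpoint("US (Virginia)"))
```

Global and US share the same endpoint region, so the distinguishing factor between them is the compute scope.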

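| Model | Input tokens per request | Input price (per million tokens) | Output price, non-thinking mode (per million tokens) | Output price, thinking mode (per million tokens, CoT + response) |
| --- | --- | --- | --- | --- |
| qwen-plus-us | 0<Token≤256K | $0.4 | $1.2 | $4 |
| qwen-plus-us | 256K<Token≤1M | $1.2 | $3.6 | $12 |
| qwen-plus-2025-12-01-us | 0<Token≤256K | $0.4 | $1.2 | $4 |
| qwen-plus-2025-12-01-us | 256K<Token≤1M | $1.2 | $3.6 | $12 |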

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.

Note

Models in Mainland China deployment mode do not have a free quota.

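| Model | Input tokens per request | Input price (per million tokens) | Output price, non-thinking mode (per million tokens) | Output price, thinking mode (per million tokens, CoT + response) |
| --- | --- | --- | --- | --- |
| qwen-plus | 0<Token≤128K | $0.115 | $0.287 | $1.147 |
| qwen-plus | 128K<Token≤256K | $0.345 | $2.868 | $3.441 |
| qwen-plus | 256K<Token≤1M | $0.689 | $6.881 | $9.175 |
| qwen-plus-latest | 0<Token≤128K | $0.115 | $0.287 | $1.147 |
| qwen-plus-latest | 128K<Token≤256K | $0.345 | $2.868 | $3.441 |
| qwen-plus-latest | 256K<Token≤1M | $0.689 | $6.881 | $9.175 |
| qwen-plus-2025-12-01 | 0<Token≤128K | $0.115 | $0.287 | $1.147 |
| qwen-plus-2025-12-01 | 128K<Token≤256K | $0.345 | $2.868 | $3.441 |
| qwen-plus-2025-12-01 | 256K<Token≤1M | $0.689 | $6.881 | $9.175 |
| qwen-plus-2025-09-11 | 0<Token≤128K | $0.115 | $0.287 | $1.147 |
| qwen-plus-2025-09-11 | 128K<Token≤256K | $0.345 | $2.868 | $3.441 |
| qwen-plus-2025-09-11 | 256K<Token≤1M | $0.689 | $6.881 | $9.175 |
| qwen-plus-2025-07-28 | 0<Token≤128K | $0.115 | $0.287 | $1.147 |
| qwen-plus-2025-07-28 | 128K<Token≤256K | $0.345 | $2.868 | $3.441 |
| qwen-plus-2025-07-28 | 256K<Token≤1M | $0.689 | $6.881 | $9.175 |
| qwen-plus-2025-07-14 | No tiered pricing | $0.115 | $0.287 | $1.147 |
| qwen-plus-2025-04-28 | No tiered pricing | $0.115 | $0.287 | $1.147 |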

More models

| Model | Input tokens per request | Input price (per million tokens) | Output price (per million tokens) |
| --- | --- | --- | --- |
| qwen-plus-2025-01-25 | No tiered pricing | $0.115 | $0.287 |
| qwen-plus-2025-01-12 | No tiered pricing | $0.115 | $0.287 |
| qwen-plus-2024-12-20 | No tiered pricing | $0.115 | $0.287 |
| qwen-plus-2024-11-27 | No tiered pricing | $0.115 | $0.287 |
| qwen-plus-2024-11-25 | No tiered pricing | $0.115 | $0.287 |
| qwen-plus-2024-09-19 | No tiered pricing | $0.115 | $0.287 |
| qwen-plus-2024-08-06 | No tiered pricing | $0.574 | $1.721 |

Qwen-Flash

Billing is based on the number of input and output tokens.

If the model supports batch calls, the batch price for input and output tokens is 50% of the real-time price. If the model supports context cache, only input tokens are eligible for a discount. These two discounts cannot be applied at the same time.

Global

In the Global deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are dynamically scheduled worldwide.

Note

Models in the Global deployment mode do not have a free quota.

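| Model | Input tokens per request | Input price (per million tokens) | Output price (per million tokens) |
| --- | --- | --- | --- |
| qwen-flash (context cache discount applicable) | 0<Token≤256K | $0.05 | $0.4 |
| qwen-flash (context cache discount applicable) | 256K<Token≤1M | $0.25 | $2 |
| qwen-flash-2025-07-28 | 0<Token≤256K | $0.05 | $0.4 |
| qwen-flash-2025-07-28 | 256K<Token≤1M | $0.25 | $2 |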

International

In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.

Model

Input tokens per request

Input price (Million tokens)

Output price (Million tokens)

Free quota (Note)

qwen-flash

Batch calling half price
Context cache discount applicable

0<Token≤256K

$0.05

$0.4

1 million tokens each
Validity: 90 days after activating Model Studio

256K<Token≤1M

$0.25

$2

qwen-flash-2025-07-28

0<Token≤256K

$0.05

$0.4

256K<Token≤1M

$0.25

$2

US

In the US deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are limited to the United States.

Note

Models in the US deployment mode do not have a free quota.

Model

Input tokens per request

Input price (Million tokens)

Output price (Million tokens)

qwen-flash

0<Token≤256K

$0.05

$0.4

256K<Token≤1M

$0.25

$2

qwen-flash-2025-07-28

0<Token≤256K

$0.05

$0.4

256K<Token≤1M

$0.25

$2

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.

Note

Models in Mainland China deployment mode do not have a free quota.

Model

Input tokens per request

Input price (Million tokens)

Output price (Million tokens)

qwen-flash

Context cache discount applicable

0<Token≤128K

$0.022

$0.216

128K<Token≤256K

$0.087

$0.861

256K<Token≤1M

$0.173

$1.721

qwen-flash-2025-07-28

0<Token≤128K

$0.022

$0.216

128K<Token≤256K

$0.087

$0.861

256K<Token≤1M

$0.173

$1.721

Qwen-Turbo

Note

Qwen-Turbo will no longer be updated. We recommend Qwen-Flash instead.

Billing is based on the number of input and output tokens.

If the model supports batch calls, the batch price for input and output tokens is 50% of the real-time price.

International

In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.

| Model | Input price (per million tokens) | Output price, non-thinking mode (per million tokens) | Output price, thinking mode (per million tokens, CoT + response) |
| --- | --- | --- | --- |
| qwen-turbo (batch calling half price) | $0.05 | $0.2 | $0.5 |
| qwen-turbo-latest | $0.05 | $0.2 | $0.5 |
| qwen-turbo-2025-04-28 | $0.05 | $0.2 | $0.5 |

Free quota (Note): 1 million tokens each. Validity: 90 days after activating Model Studio.

More models

Model

Input price (Million tokens)

Output price (Million tokens)

Free quota (Note)

qwen-turbo-2024-11-01

$0.05

$0.2

1 million tokens each
Validity: 90 days after activating Model Studio

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.

Note

Models in Mainland China deployment mode do not have a free quota.

Model

Input price (Million tokens)

Output price (Million tokens)

Non-thinking mode

Thinking mode (CoT + response)

qwen-turbo

$0.044

$0.087

$0.431

qwen-turbo-latest

$0.044

$0.087

$0.431

qwen-turbo-2025-07-15

$0.044

$0.087

$0.431

qwen-turbo-2025-04-28

$0.044

$0.087

$0.431

QwQ

Billing is based on the number of input and output tokens.

International

In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.

Model

Input price (Million tokens)

Output price (Million tokens)

Free quota (Note)

qwq-plus

$0.8

$2.4

1 million tokens
Validity: 90 days after activating Model Studio

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.

Note

Models in Mainland China deployment mode do not have a free quota.

Model

Input price (Million tokens)

Output price (Million tokens)

qwq-plus

$0.230

$0.574

qwq-plus-latest

$0.230

$0.574

qwq-plus-2025-03-05

$0.230

$0.574

Qwen-Long

Note

Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Billing is based on the number of input and output tokens.

Model

Input price (Million tokens)

Output price (Million tokens)

Free quota (Note)

qwen-long-latest

$0.072

$0.287

No free quota

qwen-long-2025-01-25

$0.072

$0.287

Qwen-Omni

Billing rule: You are charged based on the number of input and output tokens. For token calculation rules for different modalities, see Billing and rate limiting.

International

In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.

All prices are per million tokens.

| Model | Mode | Input: Text | Input: Audio | Input: Image/Video | Output: Text (plain text input) | Output: Text (multimodal input) | Output: Text+Audio (only audio is billed) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| qwen3-omni-flash | Thinking and non-thinking | $0.43 | $3.81 | $0.78 | $1.66 | $3.06 | $15.11 |
| qwen3-omni-flash-2025-12-01 | Thinking and non-thinking | $0.43 | $3.81 | $0.78 | $1.66 | $3.06 | $15.11 |
| qwen3-omni-flash-2025-09-15 | Thinking and non-thinking | $0.43 | $3.81 | $0.78 | $1.66 | $3.06 | $15.11 |

Free quota (Note): 1 million tokens each (regardless of modality). Validity: 90 days after activating Model Studio.
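The Qwen-Omni price columns depend on both the input modality and the output type, which is easier to follow in code. The sketch below uses the International qwen3-omni-flash prices and reflects one reading of the column headers: text output is billed at a higher rate when the request contains any non-text input, and when the output is text plus audio only the audio tokens are billed. `OMNI_PRICES` and `omni_cost` are hypothetical names, and token counting per modality is covered elsewhere (see Billing and rate limiting).

```python
# USD per million tokens, qwen3-omni-flash, International deployment mode.
OMNI_PRICES = {
    "input": {"text": 0.43, "audio": 3.81, "image_video": 0.78},
    "output_text_plain_input": 1.66,
    "output_text_multimodal_input": 3.06,
    "output_audio": 15.11,
}


def omni_cost(input_tokens, output_text_tokens=0, output_audio_tokens=0,
              prices=OMNI_PRICES):
    """Estimate the USD cost of one request.

    input_tokens maps a modality ("text", "audio", "image_video")
    to its token count.
    """
    cost = sum(prices["input"][modality] * count
               for modality, count in input_tokens.items())
    multimodal = any(count > 0 for modality, count in input_tokens.items()
                     if modality != "text")
    if output_audio_tokens:
        # Text+Audio output: only the audio tokens are billed.
        cost += output_audio_tokens * prices["output_audio"]
    elif multimodal:
        cost += output_text_tokens * prices["output_text_multimodal_input"]
    else:
        cost += output_text_tokens * prices["output_text_plain_input"]
    return cost / 1_000_000


print(f"${omni_cost({'text': 1_000}, output_text_tokens=500):.5f}")
print(f"${omni_cost({'text': 1_000, 'audio': 2_000}, output_text_tokens=500):.5f}")
```

Passing a different `prices` dictionary lets the same function cover other Omni models or deployment modes that use the same column layout.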

More models

Model

Input price (Million tokens)

Output price (Million tokens)

Free quota (Note)

Input: Text

Input: Audio

Input: Image/Video

Output: Text

Plain text input

Output: Text

Multimodal input

Output: Text+Audio

Only audio is billed

qwen-omni-turbo

$0.07

$4.44

$0.21

$0.27

$0.63

$8.89

1 million tokens each (regardless of modality)

Validity: 90 days after activating Model Studio

qwen-omni-turbo-latest

$0.07

$4.44

$0.21

$0.27

$0.63

$8.89

qwen-omni-turbo-2025-03-26

$0.07

$4.44

$0.21

$0.27

$0.63

$8.89

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.

Note

Models in Mainland China deployment mode do not have a free quota.

Model

Mode

Input price (Million tokens)

Output price (Million tokens)

Input: Text

Input: Audio

Audio part is billed separately

Input: Image/Video

Output: Text

Plain text input

Output: Text

Multimodal input

Output: Text+Audio

Only audio is billed

qwen3-omni-flash

Thinking and non-thinking

$0.258

$2.265

$0.473

$0.989

$1.821

$8.974

qwen3-omni-flash-2025-12-01

Thinking and non-thinking

$0.258

$2.265

$0.473

$0.989

$1.821

$8.974

qwen3-omni-flash-2025-09-15

Thinking and non-thinking

$0.258

$2.265

$0.473

$0.989

$1.821

$8.974

More models

Model

Input price (Million tokens)

Output price (Million tokens)

Input: Text

Input: Audio

Audio part is billed separately

Input: Image/Video

Output: Text

Plain text input

Output: Text

Multimodal input

Output: Text+Audio

Only audio is billed

qwen-omni-turbo

$0.058

$3.584

$0.216

$0.230

$0.646

$7.168

qwen-omni-turbo-latest

$0.058

$3.584

$0.216

$0.230

$0.646

$7.168

qwen-omni-turbo-2025-03-26

$0.058

$3.584

$0.216

$0.230

$0.646

$7.168

qwen-omni-turbo-2025-01-19

$0.058

$3.584

$0.216

$0.230

$0.646

$7.168

Qwen-Omni-Realtime

Billing rule: You are charged based on the number of input and output tokens. For token calculation rules for different modalities, see Billing and rate limiting.

International

In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.

Model

Input price (Million tokens)

Output price (Million tokens)

Free quota (Note)

Input: Text

Input: Audio

Audio part is billed separately

Input: Image

Output: Text

Plain text input

Output: Text

Multimodal input

Output: Text+Audio

Only audio is billed

qwen3-omni-flash-realtime

$0.52

$4.57

$0.94

$1.99

$3.67

$18.13

1 million tokens each (regardless of modality)

Validity: 90 days after activating Model Studio

qwen3-omni-flash-realtime-2025-12-01

$0.52

$4.57

$0.94

$1.99

$3.67

$18.13

qwen3-omni-flash-realtime-2025-09-15

$0.52

$4.57

$0.94

$1.99

$3.67

$18.13

qwen-omni-turbo-realtime

$0.270

$4.440

$0.840

$1.070

$2.520

$8.890

qwen-omni-turbo-realtime-latest

$0.270

$4.440

$0.840

$1.070

$2.520

$8.890

qwen-omni-turbo-realtime-2025-05-08

$0.270

$4.440

$0.840

$1.070

$2.520

$8.890

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.

Note

Models in Mainland China deployment mode do not have a free quota.

Model

Input price (Million tokens)

Output price (Million tokens)

Input: Text

Input: Audio

Audio part is billed separately

Input: Image

Output: Text

Plain text input

Output: Text

Multimodal input

Output: Text+Audio

Only audio is billed

qwen3-omni-flash-realtime

$0.315

$2.709

$0.559

$1.19

$2.179

$10.766

qwen3-omni-flash-realtime-2025-12-01

$0.315

$2.709

$0.559

$1.19

$2.179

$10.766

qwen3-omni-flash-realtime-2025-09-15

$0.315

$2.709

$0.559

$1.19

$2.179

$10.766

qwen-omni-turbo-realtime

$0.230

$3.584

$0.861

$0.918

$2.581

$7.168

qwen-omni-turbo-realtime-latest

$0.230

$3.584

$0.861

$0.918

$2.581

$7.168

qwen-omni-turbo-realtime-2025-05-08

$0.230

$3.584

$0.861

$0.918

$2.581

$7.168

QVQ

Billing rule: You are charged based on the number of input and output tokens. For token calculation rules for different modalities, see Billing and rate limiting.

International

In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.

Model

Input price (Million tokens)

Output price (Million tokens)

Free quota (Note)

qvq-max

$1.2

$4.8

1 million tokens each
Validity: 90 days after activating Model Studio

qvq-max-latest

$1.2

$4.8

qvq-max-2025-03-25

$1.2

$4.8

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.

Note

Models in Mainland China deployment mode do not have a free quota.

Model

Input price (Million tokens)

Output price (Million tokens)

qvq-max

$1.147

$4.588

qvq-max-latest

$1.147

$4.588

qvq-max-2025-05-15

$1.147

$4.588

qvq-max-2025-03-25

$1.147

$4.588

qvq-plus

$0.287

$0.717

qvq-plus-latest

$0.287

$0.717

qvq-plus-2025-05-15

$0.287

$0.717

Qwen-VL

Billing is based on the number of input and output tokens.

Global

In the Global deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are dynamically scheduled worldwide.

Note

Models in the Global deployment mode do not have a free quota.

Model

Mode

Input tokens per request

Input price (Million tokens)

Output price (Million tokens)

CoT + response

qwen3-vl-plus

Thinking and non-thinking

0<Token≤32K

$0.2

$1.6

32K<Token≤128K

$0.3

$2.4

128K<Token≤256K

$0.6

$4.8

qwen3-vl-plus-2025-09-23

Thinking and non-thinking

0<Token≤32K

$0.2

$1.6

32K<Token≤128K

$0.3

$2.4

128K<Token≤256K

$0.6

$4.8

qwen3-vl-flash

Thinking and non-thinking

0<Token≤32K

$0.05

$0.4

32K<Token≤128K

$0.075

$0.6

128K<Token≤256K

$0.12

$0.96

qwen3-vl-flash-2025-10-15

Thinking and non-thinking

0<Token≤32K

$0.05

$0.4

32K<Token≤128K

$0.075

$0.6

128K<Token≤256K

$0.12

$0.96

International

In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.

Model

Mode

Input tokens per request

Input price (Million tokens)

Output price (Million tokens)

CoT + response

Free quota (Note)

qwen3-vl-plus

Thinking and non-thinking

0<Token≤32K

$0.2

$1.6

1 million tokens each
Validity: 90 days after activating Model Studio

32K<Token≤128K

$0.3

$2.4

128K<Token≤256K

$0.6

$4.8

qwen3-vl-plus-2025-12-19

Thinking and non-thinking

0<Token≤32K

$0.2

$1.6

32K<Token≤128K

$0.3

$2.4

128K<Token≤256K

$0.6

$4.8

qwen3-vl-plus-2025-09-23

Thinking and non-thinking

0<Token≤32K

$0.2

$1.6

32K<Token≤128K

$0.3

$2.4

128K<Token≤256K

$0.6

$4.8

qwen3-vl-flash

Thinking and non-thinking

0<Token≤32K

$0.05

$0.4

32K<Token≤128K

$0.075

$0.6

128K<Token≤256K

$0.12

$0.96

qwen3-vl-flash-2025-10-15

Thinking and non-thinking

0<Token≤32K

$0.05

$0.4

32K<Token≤128K

$0.075

$0.6

128K<Token≤256K

$0.12

$0.96

More models

Model

Input tokens per request

Input price (Million tokens)

Output price (Million tokens)

Free quota (Note)

qwen-vl-max

No tiered pricing

$0.8

$3.2

1 million tokens each

Validity: 90 days after activating Model Studio

qwen-vl-max-latest

No tiered pricing

$0.8

$3.2

qwen-vl-max-2025-08-13

No tiered pricing

$0.8

$3.2

qwen-vl-max-2025-04-08

No tiered pricing

$0.8

$3.2

qwen-vl-plus

No tiered pricing

$0.21

$0.63

qwen-vl-plus-latest

No tiered pricing

$0.21

$0.63

qwen-vl-plus-2025-08-15

No tiered pricing

$0.21

$0.63

qwen-vl-plus-2025-05-07

No tiered pricing

$0.21

$0.63

qwen-vl-plus-2025-01-25

No tiered pricing

$0.21

$0.63

US

In the US deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are limited to the United States.

Note

Models in the US deployment mode do not have a free quota.

Model

Mode

Input tokens per request

Input price (Million tokens)

Output price (Million tokens)

CoT + response

qwen3-vl-flash-us

Thinking and non-thinking

0<Token≤32K

$0.05

$0.4

32K<Token≤128K

$0.075

$0.6

128K<Token≤256K

$0.12

$0.96

qwen3-vl-flash-2025-10-15-us

Thinking and non-thinking

0<Token≤32K

$0.05

$0.4

32K<Token≤128K

$0.075

$0.6

128K<Token≤256K

$0.12

$0.96

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.

Note

Models in Mainland China deployment mode do not have a free quota.

Model

Mode

Input tokens per request

Input price (Million tokens)

Output price (Million tokens)

CoT + response

qwen3-vl-plus

Thinking and non-thinking

0<Token≤32K

$0.143

$1.434

32K<Token≤128K

$0.215

$2.15

128K<Token≤256K

$0.43

$4.301

qwen3-vl-plus-2025-12-19

Thinking and non-thinking

0<Token≤32K

$0.143

$1.434

32K<Token≤128K

$0.215

$2.15

128K<Token≤256K

$0.43

$4.301

qwen3-vl-plus-2025-09-23

Thinking and non-thinking

0<Token≤32K

$0.143

$1.434

32K<Token≤128K

$0.215

$2.15

128K<Token≤256K

$0.43

$4.301

qwen3-vl-flash

Thinking and non-thinking

0<Token≤32K

$0.022

$0.215

32K<Token≤128K

$0.043

$0.43

128K<Token≤256K

$0.086

$0.859

qwen3-vl-flash-2025-10-15

Thinking and non-thinking

0<Token≤32K

$0.022

$0.215

32K<Token≤128K

$0.043

$0.43

128K<Token≤256K

$0.086

$0.859

More models

Model

Input tokens per request

Input price (Million tokens)

Output price (Million tokens)

qwen-vl-max

No tiered pricing

$0.23

$0.574

qwen-vl-max-latest

No tiered pricing

$0.23

$0.574

qwen-vl-max-2025-08-13

No tiered pricing

$0.23

$0.574

qwen-vl-max-2025-04-08

No tiered pricing

$0.431

$1.291

qwen-vl-max-2025-04-02

No tiered pricing

$0.431

$1.291

qwen-vl-max-2025-01-25

No tiered pricing

$0.431

$1.291

qwen-vl-max-2024-12-30

No tiered pricing

$0.431

$1.291

qwen-vl-max-2024-11-19

No tiered pricing

$0.431

$1.291

qwen-vl-max-2024-10-30

No tiered pricing

$2.868

$2.868

qwen-vl-max-2024-08-09

No tiered pricing

$2.868

$2.868

qwen-vl-plus

No tiered pricing

$0.115

$0.287

qwen-vl-plus-latest

No tiered pricing

$0.115

$0.287

qwen-vl-plus-2025-08-15

No tiered pricing

$0.115

$0.287

qwen-vl-plus-2025-07-10

No tiered pricing

$0.022

$0.216

qwen-vl-plus-2025-05-07

No tiered pricing

$0.216

$0.646

qwen-vl-plus-2025-01-25

No tiered pricing

$0.216

$0.646

qwen-vl-plus-2025-01-02

No tiered pricing

$0.216

$0.646

qwen-vl-plus-2024-08-09

No tiered pricing

$0.216

$0.646

Qwen-OCR

Billing is based on the number of input and output tokens.

Global

In the Global deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are dynamically scheduled worldwide.

Note

Models in the Global deployment mode do not have a free quota.

Model

Input price (Million tokens)

Output price (Million tokens)

qwen-vl-ocr

$0.07

$0.16

qwen-vl-ocr-2025-11-20

$0.07

$0.16

International

In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.

Model

Input price (Million tokens)

Output price (Million tokens)

Free quota (Note)

qwen-vl-ocr

$0.72

$0.72

1 million tokens each
Validity: 90 days after activating Model Studio

qwen-vl-ocr-2025-11-20

$0.07

$0.16

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.

Note

Models in Mainland China deployment mode do not have a free quota.

| Model | Input price (per million tokens) | Output price (per million tokens) |
| --- | --- | --- |
| qwen-vl-ocr | $0.717 | $0.717 |
| qwen-vl-ocr-latest, qwen-vl-ocr-2025-11-20 | $0.043 | $0.072 |
| qwen-vl-ocr-2025-04-13, qwen-vl-ocr-2024-10-28 | $0.717 | $0.717 |

Qwen-Math

Note

Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Billing is based on the number of input and output tokens.

Model

Input price (Million tokens)

Output price (Million tokens)

Free quota (Note)

qwen-math-plus

$0.574

$1.721

No free quota

qwen-math-plus-latest

$0.574

$1.721

qwen-math-plus-2024-09-19

$0.574

$1.721

qwen-math-plus-2024-08-16

$0.574

$1.721

qwen-math-turbo

$0.287

$0.861

qwen-math-turbo-latest

$0.287

$0.861

qwen-math-turbo-2024-09-19

$0.287

$0.861

Qwen-Coder

Billing is based on the number of input and output tokens.

If the model supports context cache, only input tokens are eligible for a discount.

Global

In the Global deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are dynamically scheduled worldwide.

Note

Models in the Global deployment mode do not have a free quota.

Model

Input tokens per request

Input price (Million tokens)

Output price (Million tokens)

qwen3-coder-plus

Context cache discount applicable

0<Token≤32K

$1

$5

32K<Token≤128K

$1.8

$9

128K<Token≤256K

$3

$15

256K<Token≤1M

$6

$60

qwen3-coder-plus-2025-09-23

0<Token≤32K

$1

$5

32K<Token≤128K

$1.8

$9

128K<Token≤256K

$3

$15

256K<Token≤1M

$6

$60

qwen3-coder-plus-2025-07-22

0<Token≤32K

$1

$5

32K<Token≤128K

$1.8

$9

128K<Token≤256K

$3

$15

256K<Token≤1M

$6

$60

qwen3-coder-flash

Context cache discount applicable

0<Token≤32K

$0.3

$1.5

32K<Token≤128K

$0.5

$2.5

128K<Token≤256K

$0.8

$4

256K<Token≤1M

$1.6

$9.6

qwen3-coder-flash-2025-07-28

0<Token≤32K

$0.3

$1.5

32K<Token≤128K

$0.5

$2.5

128K<Token≤256K

$0.8

$4

256K<Token≤1M

$1.6

$9.6

International

In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.

Model

Input tokens per request

Input price (Million tokens)

Output price (Million tokens)

Free quota (Note)

qwen3-coder-plus

Context cache discount applicable

0<Token≤32K

$1

$5

1 million tokens each
Validity: 90 days after activating Model Studio

32K<Token≤128K

$1.8

$9

128K<Token≤256K

$3

$15

256K<Token≤1M

$6

$60

qwen3-coder-plus-2025-09-23

0<Token≤32K

$1

$5

32K<Token≤128K

$1.8

$9

128K<Token≤256K

$3

$15

256K<Token≤1M

$6

$60

qwen3-coder-plus-2025-07-22

0<Token≤32K

$1

$5

32K<Token≤128K

$1.8

$9

128K<Token≤256K

$3

$15

256K<Token≤1M

$6

$60

qwen3-coder-flash

0<Token≤32K

$0.3

$1.5

32K<Token≤128K

$0.5

$2.5

128K<Token≤256K

$0.8

$4

256K<Token≤1M

$1.6

$9.6

qwen3-coder-flash-2025-07-28

0<Token≤32K

$0.3

$1.5

32K<Token≤128K

$0.5

$2.5

128K<Token≤256K

$0.8

$4

256K<Token≤1M

$1.6

$9.6

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.

Note

Models in Mainland China deployment mode do not have a free quota.

qwen3-coder series

Model

Input tokens per request

Input price (Million tokens)

Output price (Million tokens)

qwen3-coder-plus

Context cache discount applicable

0<Token≤32K

$0.574

$2.294

32K<Token≤128K

$0.861

$3.441

128K<Token≤256K

$1.434

$5.735

256K<Token≤1M

$2.868

$28.671

qwen3-coder-plus-2025-09-23

0<Token≤32K

$0.574

$2.294

32K<Token≤128K

$0.861

$3.441

128K<Token≤256K

$1.434

$5.735

256K<Token≤1M

$2.868

$28.671

qwen3-coder-plus-2025-07-22

0<Token≤32K

$0.574

$2.294

32K<Token≤128K

$0.861

$3.441

128K<Token≤256K

$1.434

$5.735

256K<Token≤1M

$2.868

$28.671

qwen3-coder-flash

0<Token≤32K

$0.144

$0.574

32K<Token≤128K

$0.216

$0.861

128K<Token≤256K

$0.359

$1.434

256K<Token≤1M

$0.717

$3.584

qwen3-coder-flash-2025-07-28

0<Token≤32K

$0.144

$0.574

32K<Token≤128K

$0.216

$0.861

128K<Token≤256K

$0.359

$1.434

256K<Token≤1M

$0.717

$3.584

Earlier qwen-coder series

Model

Input tokens per request

Input price (Million tokens)

Output price (Million tokens)

qwen-coder-plus

No tiered pricing

$0.502

$1.004

qwen-coder-plus-latest

No tiered pricing

$0.502

$1.004

qwen-coder-plus-2024-11-06

No tiered pricing

$0.502

$1.004

qwen-coder-turbo

No tiered pricing

$0.287

$0.861

qwen-coder-turbo-latest

No tiered pricing

$0.287

$0.861

qwen-coder-turbo-2024-09-19

No tiered pricing

$0.287

$0.861

Qwen-MT

Billing is based on the number of input and output tokens.

Global

In the Global deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are dynamically scheduled worldwide.

Note

Models in the Global deployment mode do not have a free quota.

Model

Input price (Million tokens)

Output price (Million tokens)

qwen-mt-plus

$2.46

$7.37

qwen-mt-flash

$0.16

$0.49

qwen-mt-lite

$0.12

$0.36

International

In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.

Model

Input price (Million tokens)

Output price (Million tokens)

Free quota (Note)

qwen-mt-plus

$2.46

$7.37

1 million tokens each
Validity: 90 days after activating Model Studio

qwen-mt-flash

$0.16

$0.49

qwen-mt-lite

$0.12

$0.36

qwen-mt-turbo

$0.16

$0.49

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.

Note

Models in the Mainland China deployment mode do not have a free quota.

Model

Input price (Million tokens)

Output price (Million tokens)

qwen-mt-plus

$0.259

$0.775

qwen-mt-flash

$0.101

$0.280

qwen-mt-lite

$0.086

$0.229

qwen-mt-turbo

$0.101

$0.280

Qwen data mining

Note

Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Billing is based on the number of input and output tokens.

Model

Input price (Million tokens)

Output price (Million tokens)

Free quota (Note)

qwen-doc-turbo

$0.087

$0.144

No free quota

Qwen deep research

Note

Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Billing is based on the number of input and output tokens.

Model

Input price (Million tokens)

Output price (Million tokens)

Free quota (Note)

qwen-deep-research

$7.742

$23.367

No free quota

Text generation - Qwen - Open source

Qwen3

Billing is based on the number of input and output tokens.

Global

In the Global deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are dynamically scheduled worldwide.

Note

Models in the Global deployment mode do not have a free quota.

Model

Mode

Input price (Million tokens)

Output price (Million tokens)

Non-thinking mode

Thinking mode (CoT + response)

qwen3-next-80b-a3b-thinking

Thinking only

$0.15

-

$1.2

qwen3-next-80b-a3b-instruct

Non-thinking only

$0.15

$1.2

-

qwen3-235b-a22b-thinking-2507

Thinking only

$0.23

-

$2.3

qwen3-235b-a22b-instruct-2507

Non-thinking only

$0.23

$0.92

-

qwen3-30b-a3b-thinking-2507

Thinking only

$0.2

-

$2.4

qwen3-30b-a3b-instruct-2507

Non-thinking only

$0.2

$0.8

-

qwen3-235b-a22b

Thinking and non-thinking

$0.7

$2.8

$8.4

qwen3-32b

Thinking and non-thinking

$0.16

$0.64

$0.64

qwen3-30b-a3b

Thinking and non-thinking

$0.2

$0.8

$2.4

qwen3-14b

Thinking and non-thinking

$0.35

$1.4

$4.2

qwen3-8b

Thinking and non-thinking

$0.18

$0.7

$2.1
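The Global table above lists two output prices for models that support both modes. The sketch below shows one way to select the output price by mode. It covers only a subset of the models, and the function name `qwen3_fee` and the `thinking` flag are hypothetical names chosen for this example. In thinking mode, `output_tokens` is assumed to count both the chain of thought and the response, following the "CoT + response" column header.

```python
from decimal import Decimal

# (input, non-thinking output, thinking output) in USD per million tokens,
# copied from the Global table above. None stands for a "-" cell.
QWEN3_GLOBAL = {
    "qwen3-next-80b-a3b-thinking": (Decimal("0.15"), None, Decimal("1.2")),
    "qwen3-next-80b-a3b-instruct": (Decimal("0.15"), Decimal("1.2"), None),
    "qwen3-235b-a22b": (Decimal("0.7"), Decimal("2.8"), Decimal("8.4")),
    "qwen3-8b": (Decimal("0.18"), Decimal("0.7"), Decimal("2.1")),
}

MILLION = Decimal(1_000_000)


def qwen3_fee(model: str, input_tokens: int, output_tokens: int,
              thinking: bool) -> Decimal:
    """Estimate the fee in USD for one request (hypothetical helper)."""
    input_price, plain_price, thinking_price = QWEN3_GLOBAL[model]
    output_price = thinking_price if thinking else plain_price
    if output_price is None:
        # The table shows "-" for modes the model does not support.
        mode = "thinking" if thinking else "non-thinking"
        raise ValueError(f"{model} has no {mode} price")
    return (input_tokens * input_price + output_tokens * output_price) / MILLION


if __name__ == "__main__":
    print(qwen3_fee("qwen3-8b", 10_000, 2_000, thinking=False))
    print(qwen3_fee("qwen3-8b", 10_000, 2_000, thinking=True))
```

With these prices, the same qwen3-8b request (10,000 input tokens, 2,000 output tokens) comes to about $0.0032 in non-thinking mode and $0.006 in thinking mode.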

International

In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.

Model

Mode

Input price (Million tokens)

Output price (Million tokens)

Free quota (Note)

Non-thinking mode

Thinking mode

qwen3-next-80b-a3b-thinking

Thinking only

$0.15

-

$1.2

1 million tokens each
Validity: 90 days after activating Model Studio

qwen3-next-80b-a3b-instruct

Non-thinking only

$0.15

$1.2

-

qwen3-235b-a22b-thinking-2507

Thinking only

$0.23

-

$2.3

qwen3-235b-a22b-instruct-2507

Non-thinking only

$0.23

$0.92

-

qwen3-30b-a3b-thinking-2507

Thinking only

$0.2

-

$2.4

qwen3-30b-a3b-instruct-2507

Non-thinking only

$0.2

$0.8

-

qwen3-235b-a22b

Thinking and non-thinking

$0.7

$2.8

$8.4

qwen3-32b

Thinking and non-thinking

$0.16

$0.64

$0.64

qwen3-30b-a3b

Thinking and non-thinking

$0.2

$0.8

$2.4

qwen3-14b

Thinking and non-thinking

$0.35

$1.4

$4.2

qwen3-8b

Thinking and non-thinking

$0.18

$0.7

$2.1

qwen3-4b

Thinking and non-thinking

$0.11

$0.42

$1.26

qwen3-1.7b

Thinking and non-thinking

$0.11

$0.42

$1.26

qwen3-0.6b

Thinking and non-thinking

$0.11

$0.42

$1.26

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.

Note

Models in the Mainland China deployment mode do not have a free quota.

Model

Mode

Input price (Million tokens)

Output price (Million tokens)

Non-thinking mode

Thinking mode (CoT + response)

qwen3-next-80b-a3b-thinking

Thinking only

$0.144

-

$1.434

qwen3-next-80b-a3b-instruct

Non-thinking only

$0.144

$0.574

-

qwen3-235b-a22b-thinking-2507

Thinking only

$0.287

-

$2.868

qwen3-235b-a22b-instruct-2507

Non-thinking only

$0.287

$1.147

-

qwen3-30b-a3b-thinking-2507

Thinking only

$0.108

-

$1.076

qwen3-30b-a3b-instruct-2507

Non-thinking only

$0.108

$0.431

-

qwen3-235b-a22b

Thinking and non-thinking

$0.287

$1.147

$2.868

qwen3-32b

Thinking and non-thinking

$0.287

$1.147

$2.868

qwen3-30b-a3b

Thinking and non-thinking

$0.108

$0.431

$1.076

qwen3-14b

Thinking and non-thinking

$0.144

$0.574

$1.434

qwen3-8b

Thinking and non-thinking

$0.072

$0.287

$0.717

qwen3-4b

Thinking and non-thinking

$0.044

$0.173

$0.431

qwen3-1.7b

Thinking and non-thinking

$0.044

$0.173

$0.431

qwen3-0.6b

Thinking and non-thinking

$0.044

$0.173

$0.431

QwQ - Open source

Billing is based on the number of input and output tokens.

Model

Input price (Million tokens)

Output price (Million tokens)

Free quota (Note)

qwq-32b

$0.287

$0.861

No free quota

QwQ-Preview

Note

Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Billing is based on the number of input and output tokens.

Model

Input price (Million tokens)

Output price (Million tokens)

Free quota (Note)

qwq-32b-preview

$0.287

$0.861

No free quota

Qwen2.5

Billing is based on the number of input and output tokens.

International

In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.

Model

Input price (Million tokens)

Output price (Million tokens)

Free quota (Note)

qwen2.5-14b-instruct-1m

$0.805

$3.22

1 million tokens each
Validity: 90 days after activating Model Studio

qwen2.5-7b-instruct-1m

$0.368

$1.47

qwen2.5-72b-instruct

$1.4

$5.6

qwen2.5-32b-instruct

$0.7

$2.8

qwen2.5-14b-instruct

$0.35

$1.4

qwen2.5-7b-instruct

$0.175

$0.7

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.

Note

Models in the Mainland China deployment mode do not have a free quota.

Model

Input price (Million tokens)

Output price (Million tokens)

qwen2.5-14b-instruct-1m

$0.144

$0.431

qwen2.5-7b-instruct-1m

$0.072

$0.144

qwen2.5-72b-instruct

$0.574

$1.721

qwen2.5-32b-instruct

$0.287

$0.861

qwen2.5-14b-instruct

$0.144

$0.431

qwen2.5-7b-instruct

$0.072

$0.144

qwen2.5-3b-instruct

$0.044

$0.130

qwen2.5-1.5b-instruct

Limited time free

qwen2.5-0.5b-instruct

QVQ

Note

Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Billing is based on the number of input and output tokens.

Model

Input price (Million tokens)

Output price (Million tokens)

Free quota (Note)

qvq-72b-preview

$1.721

$5.161

No free quota

Qwen-Omni

Billing rule: You are charged based on the number of input and output tokens. For token calculation rules for different modalities, see Billing and rate limiting.

International

In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.

Model

Input price (Million tokens)

Output price (Million tokens)

Free quota (Note)

Input: Text

Input: Audio

Input: Image/Video

Output: Text

Plain text input

Output: Text

Multimodal input

Output: Text+Audio

Only audio is billed

qwen2.5-omni-7b

$0.10

$6.76

$0.28

$0.40

$0.84

$13.51

1 million tokens (regardless of modality)

Validity: 90 days after activating Model Studio

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.

Note

Models in the Mainland China deployment mode do not have a free quota.

Model

Input price (Million tokens)

Output price (Million tokens)

Input: Text

Input: Audio

Input: Image/Video

Output: Text

Plain text input

Output: Text

Multimodal input

Output: Text+Audio

Only audio is billed

qwen2.5-omni-7b

$0.087

$5.448

$0.287

$0.345

$0.861

$10.895

Qwen3-Omni-Captioner

Billing is based on the number of input and output tokens.

International

In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.

Model

Input price (Million tokens)

Output price (Million tokens)

Free quota (Note)

qwen3-omni-30b-a3b-captioner

$3.81

$3.06

1 million tokens
Validity: 90 days after activating Model Studio

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.

Note

Models in the Mainland China deployment mode do not have a free quota.

Model

Input price (Million tokens)

Output price (Million tokens)

qwen3-omni-30b-a3b-captioner

$2.265

$1.821

Qwen-VL

Billing is based on the number of input and output tokens.

Global

In the Global deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are dynamically scheduled worldwide.

Note

Models in the Global deployment mode do not have a free quota.

Model

Mode

Input price (Million tokens)

Output price (Million tokens)

CoT + response

qwen3-vl-235b-a22b-thinking

Thinking only

$0.287

$2.867

qwen3-vl-235b-a22b-instruct

Non-thinking only

$0.287

$1.147

qwen3-vl-32b-thinking

Thinking only

$0.287

$2.867

qwen3-vl-32b-instruct

Non-thinking only

$0.287

$1.147

qwen3-vl-30b-a3b-thinking

Thinking only

$0.108

$1.075

qwen3-vl-30b-a3b-instruct

Non-thinking only

$0.108

$0.43

qwen3-vl-8b-thinking

Thinking only

$0.072

$0.717

qwen3-vl-8b-instruct

Non-thinking only

$0.072

$0.287

International

In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.

Model

Mode

Input price (Million tokens)

Output price (Million tokens)

CoT + response

Free quota (Note)

qwen3-vl-235b-a22b-thinking

Thinking only

$0.4

$4

1 million tokens each
Validity: 90 days after activating Model Studio

qwen3-vl-235b-a22b-instruct

Non-thinking only

$0.4

$1.6

qwen3-vl-32b-thinking

Thinking only

$0.16

$0.64

qwen3-vl-32b-instruct

Non-thinking only

$0.16

$0.64

qwen3-vl-30b-a3b-thinking

Thinking only

$0.2

$2.4

qwen3-vl-30b-a3b-instruct

Non-thinking only

$0.2

$0.8

qwen3-vl-8b-thinking

Thinking only

$0.18

$2.1

qwen3-vl-8b-instruct

Non-thinking only

$0.18

$0.7

More models

Model

Input price (Million tokens)

Output price (Million tokens)

Free quota (Note)

qwen2.5-vl-72b-instruct

$2.8

$8.4

1 million tokens each
Validity: 90 days after activating Model Studio

qwen2.5-vl-32b-instruct

$1.4

$4.2

qwen2.5-vl-7b-instruct

$0.35

$1.05

qwen2.5-vl-3b-instruct

$0.21

$0.63

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.

Note

Models in the Mainland China deployment mode do not have a free quota.

Model

Mode

Input price (Million tokens)

Output price (Million tokens)

CoT + response

qwen3-vl-235b-a22b-thinking

Thinking only

$0.287

$2.868

qwen3-vl-235b-a22b-instruct

Non-thinking only

$0.287

$1.147

qwen3-vl-32b-thinking

Thinking only

$0.287

$2.868

qwen3-vl-32b-instruct

Non-thinking only

$0.287

$1.147

qwen3-vl-30b-a3b-thinking

Thinking only

$0.108

$1.076

qwen3-vl-30b-a3b-instruct

Non-thinking only

$0.108

$0.431

qwen3-vl-8b-thinking

Thinking only

$0.072

$0.717

qwen3-vl-8b-instruct

Non-thinking only

$0.072

$0.287

More models

Model

Input price (Million tokens)

Output price (Million tokens)

qwen2.5-vl-72b-instruct

$2.294

$6.881

qwen2.5-vl-32b-instruct

$1.147

$3.441

qwen2.5-vl-7b-instruct

$0.287

$0.717

qwen2.5-vl-3b-instruct

$0.173

$0.517

qwen2-vl-72b-instruct

$2.294

$6.881

qwen2-vl-7b-instruct

Limited time free

qwen2-vl-2b-instruct

Qwen-Math

Note

Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Billing is based on the number of input and output tokens.

Model

Input price (Million tokens)

Output price (Million tokens)

Free quota (Note)

qwen2.5-math-72b-instruct

$0.574

$1.721

No free quota

qwen2.5-math-7b-instruct

$0.144

$0.287

qwen2.5-math-1.5b-instruct

Limited time free

Qwen-Coder

Billing is based on the number of input and output tokens.

Global

In the Global deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are dynamically scheduled worldwide.

Note

Models in the Global deployment mode do not have a free quota.

Model

Input tokens per request

Input price (Million tokens)

Output price (Million tokens)

qwen3-coder-480b-a35b-instruct

0<Token≤32K

$1.5

$7.5

32K<Token≤128K

$2.7

$13.5

128K<Token≤200K

$4.5

$22.5

qwen3-coder-30b-a3b-instruct

0<Token≤32K

$0.45

$2.25

32K<Token≤128K

$0.75

$3.75

128K<Token≤200K

$1.2

$6
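Tiered pricing can be sketched as a lookup on the number of input tokens in a request. The example below uses the Global qwen3-coder-480b-a35b-instruct tiers from the table above and rests on two assumptions that the source does not state: that 1K means 1,000 tokens, and that the tier selected by the input token count sets both the input and output price for the whole request. The name `tiered_fee` is hypothetical.

```python
from decimal import Decimal

K = 1_000  # Assumption: the source does not say whether 1K is 1,000 or 1,024.
MILLION = Decimal(1_000_000)

# (upper bound on input tokens per request, input price, output price),
# prices in USD per million tokens, copied from the Global table above.
TIERS = [
    (32 * K, Decimal("1.5"), Decimal("7.5")),
    (128 * K, Decimal("2.7"), Decimal("13.5")),
    (200 * K, Decimal("4.5"), Decimal("22.5")),
]


def tiered_fee(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the fee in USD for one request (hypothetical helper)."""
    if input_tokens <= 0:
        raise ValueError("input_tokens must be positive")
    for upper_bound, input_price, output_price in TIERS:
        # Tiers are listed in ascending order, so the first match applies.
        if input_tokens <= upper_bound:
            return (input_tokens * input_price
                    + output_tokens * output_price) / MILLION
    raise ValueError("input_tokens exceeds the largest listed tier")


if __name__ == "__main__":
    print(tiered_fee(32_000, 1_000))  # falls in the 0<Token≤32K tier
    print(tiered_fee(32_001, 1_000))  # falls in the 32K<Token≤128K tier
```

Under these assumptions, a single extra input token at a tier boundary moves the whole request to the higher price, which is why the two calls above differ by more than the cost of one token.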

International

In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.

Model

Input tokens per request

Input price (Million tokens)

Output price (Million tokens)

Free quota (Note)

qwen3-coder-480b-a35b-instruct

0<Token≤32K

$1.5

$7.5

1 million tokens each
Validity: 90 days after activating Model Studio

32K<Token≤128K

$2.7

$13.5

128K<Token≤200K

$4.5

$22.5

qwen3-coder-30b-a3b-instruct

0<Token≤32K

$0.45

$2.25

32K<Token≤128K

$0.75

$3.75

128K<Token≤200K

$1.2

$6

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.

Note

Models in the Mainland China deployment mode do not have a free quota.

Model

Input tokens per request

Input price (Million tokens)

Output price (Million tokens)

qwen3-coder-480b-a35b-instruct

0<Token≤32K

$0.861

$3.441

32K<Token≤128K

$1.291

$5.161

128K<Token≤200K

$2.151

$8.602

qwen3-coder-30b-a3b-instruct

0<Token≤32K

$0.216

$0.861

32K<Token≤128K

$0.323

$1.291

128K<Token≤200K

$0.538

$2.151

qwen2.5-coder-32b-instruct

No tiered pricing

$0.287

$0.861

qwen2.5-coder-14b-instruct

No tiered pricing

$0.287

$0.861

qwen2.5-coder-7b-instruct

No tiered pricing

$0.144

$0.287

qwen2.5-coder-3b-instruct

No tiered pricing

Limited time free

qwen2.5-coder-1.5b-instruct

No tiered pricing

qwen2.5-coder-0.5b-instruct

No tiered pricing

Text generation - Third party

DeepSeek

Note

Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Billing is based on the number of input and output tokens.

Model

Input price (Million tokens)

Output price (Million tokens)

Free quota (Note)

deepseek-v3.2

$0.287

$0.431

No free quota

deepseek-v3.2-exp

$0.287

$0.431

deepseek-v3.1

$0.574

$1.721

deepseek-r1

$0.574

$2.294

deepseek-r1-0528

$0.574

$2.294

deepseek-v3

$0.287

$1.147

deepseek-r1-distill-qwen-1.5b

Limited time free

deepseek-r1-distill-qwen-7b

$0.072

$0.144

No free quota

deepseek-r1-distill-qwen-14b

$0.144

$0.431

deepseek-r1-distill-qwen-32b

$0.287

$0.861

deepseek-r1-distill-llama-8b

Limited time free

deepseek-r1-distill-llama-70b

Limited time free

Kimi

Note

Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Billing is based on the number of input and output tokens.

Model

Input price (Million tokens)

Output price (Million tokens)

Free quota (Note)

kimi-k2-thinking

$0.574

$2.294

No free quota

Moonshot-Kimi-K2-Instruct

$0.574

$2.294

Image generation

Inputs are not billed. Billing is based on the number of successfully generated images in the output.

Billing formula: Fee = Unit price per image × Number of successfully generated images.

Billing details:

  • The fee is not affected by the resolution or aspect ratio of the output images.

  • Failed requests do not incur fees or consume the free quota.

Billing example: Partial image generation failure

Assume that the unit price is $0.10/image. If you call the API to generate 4 images, but only 3 image URLs are successfully returned and 1 image fails:

  • Billed quantity: 3

  • Fee calculation: $0.10 × 3 = $0.30.
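The billing example above can be expressed as a short calculation. In this sketch, a failed image is represented by `None` in a list of returned URLs. That representation and the name `image_fee` are assumptions made for illustration and do not reflect the actual API response format.

```python
from decimal import Decimal
from typing import Optional


def image_fee(unit_price: Decimal, results: list[Optional[str]]) -> Decimal:
    """Fee = unit price per image x number of successfully generated images."""
    # Only entries that carry a URL count as successfully generated images.
    billed_quantity = sum(1 for url in results if url is not None)
    return unit_price * billed_quantity


if __name__ == "__main__":
    # Four images requested, three URLs returned, one failure.
    results = ["https://example.com/1.png", "https://example.com/2.png",
               "https://example.com/3.png", None]
    print(image_fee(Decimal("0.10"), results))
```

The failed image contributes nothing to the total, so the fee matches the $0.30 worked out in the example.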

Qwen-Image

Only output is billed. For rules, see Image generation.

International

In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.

Model

Output price

Free quota (Note)

qwen-image-max

$0.075/image

100 images each
Validity: 90 days after activating Model Studio

qwen-image-max-2025-12-30

$0.075/image

qwen-image-plus

$0.03/image

qwen-image

$0.035/image

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.

Note

Models in the Mainland China deployment mode do not have a free quota.

Model

Output price

qwen-image-max

$0.071677/image

qwen-image-max-2025-12-30

$0.071677/image

qwen-image-plus

$0.028671/image

qwen-image

$0.035/image

Qwen-Image-Edit

Only output is billed. For rules, see Image generation.

International

In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.

Model

Output price

Free quota (Note)

qwen-image-edit-plus

$0.03/image

100 images each
Validity: 90 days after activating Model Studio

qwen-image-edit-plus-2025-12-15

$0.03/image

qwen-image-edit-plus-2025-10-30

$0.03/image

qwen-image-edit

$0.045/image

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.

Note

Models in the Mainland China deployment mode do not have a free quota.

Model

Output price

qwen-image-edit-plus

$0.028671/image

qwen-image-edit-plus-2025-12-15

$0.028671/image

qwen-image-edit-plus-2025-10-30

$0.028671/image

qwen-image-edit

$0.043/image

Qwen-MT-Image

Note

Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Only output is billed. For rules, see Image generation.

Model

Output price

Free quota (Note)

qwen-mt-image

$0.000431/image

No free quota

Tongyi - text-to-image - Z-Image

Only output is billed. For rules, see Image generation.

International

In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.

Model

Output price

Free quota (Note)

z-image-turbo

Prompt rewriting disabled (prompt_extend=false): $0.015/image

Prompt rewriting enabled (prompt_extend=true): $0.03/image

100 images

Validity: 90 days after activating Model Studio
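Because the z-image-turbo price depends on the `prompt_extend` setting, a cost estimate needs to take that setting into account. The following is a minimal sketch using the International prices above. The function name `z_image_fee` is hypothetical, and the free quota is ignored.

```python
from decimal import Decimal

# International z-image-turbo prices in USD per image, from the table above.
PRICE_PROMPT_EXTEND_OFF = Decimal("0.015")
PRICE_PROMPT_EXTEND_ON = Decimal("0.03")


def z_image_fee(successful_images: int, prompt_extend: bool) -> Decimal:
    """Estimate the fee in USD for successfully generated images."""
    if successful_images < 0:
        raise ValueError("successful_images must not be negative")
    unit_price = PRICE_PROMPT_EXTEND_ON if prompt_extend else PRICE_PROMPT_EXTEND_OFF
    return unit_price * successful_images


if __name__ == "__main__":
    print(z_image_fee(10, prompt_extend=False))
    print(z_image_fee(10, prompt_extend=True))
```

For ten images, this gives $0.15 with prompt rewriting disabled and $0.30 with it enabled.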

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.

Note

Models in the Mainland China deployment mode do not have a free quota.

Model

Output price

z-image-turbo

Prompt rewriting disabled (prompt_extend=false): $0.01434/image

Prompt rewriting enabled (prompt_extend=true): $0.02868/image

Wan text-to-image

Only output is billed. For rules, see Image generation.

Global

In the Global deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are dynamically scheduled worldwide.

Note

Models in the Global deployment mode do not have a free quota.

Model

Output price

wan2.6-t2i

$0.03/image

International

In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.

Model

Output price

Free quota (Note)

wan2.6-t2i

$0.03/image

50 images

wan2.5-t2i-preview

$0.03/image

50 images

wan2.2-t2i-plus

$0.05/image

100 images

wan2.2-t2i-flash

$0.025/image

100 images

wan2.1-t2i-plus

$0.05/image

200 images

wan2.1-t2i-turbo

$0.025/image

200 images

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.

Note

Models in the Mainland China deployment mode do not have a free quota.

Model

Output price

wan2.6-t2i

$0.028671/image

wan2.5-t2i-preview

$0.028671/image

wan2.2-t2i-plus

$0.020070/image

wan2.2-t2i-flash

$0.028671/image

wanx2.1-t2i-plus

$0.028671/image

wanx2.1-t2i-turbo

$0.020070/image

wanx2.0-t2i-turbo

$0.005735/image

Wan image generation and editing

Only output is billed. For rules, see Image generation.

Global

Note

Models in the Global deployment mode do not have a free quota.

Model

Output price

wan2.6-image

$0.03/image

International

In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.

Model

Output price

Free quota (Note)

wan2.6-image

$0.03/image

50 images
Validity: 90 days after activating Model Studio

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.

Note

Models in the Mainland China deployment mode do not have a free quota.

Model

Output price

wan2.6-image

$0.028671/image

Wan general image editing

Only output is billed. For rules, see Image generation.

International

In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.

Service

Model

Output price

Free quota (Note)

General image editing 2.5

wan2.5-i2i-preview

$0.03/image

50 images
Validity: 90 days after activating Model Studio

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.

Note

Models in the Mainland China deployment mode do not have a free quota.

Service

Model

Output price

General image editing 2.5

wan2.5-i2i-preview

$0.028671/image

General image editing 2.1

wanx2.1-imageedit

$0.020070/image

OutfitAnyone

Note

Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

  • aitryon-plus: Input is not billed; only output is billed. For billing rules, see Image generation.

  • aitryon-parsing-v1: Output is not billed; only input is billed, based on the number of input images. Failed requests are not billed.

Service

Model

Price

Free quota (Note)

OutfitAnyone - Plus

aitryon-plus

$0.071677/image

No free quota

OutfitAnyone - Image parsing

aitryon-parsing-v1

$0.000574/image

Video generation

Inputs are not billed. Billing is based on the duration in seconds of successfully generated videos in the output.

Billing formula: Fee = Unit price per second × Duration of successfully generated video (in seconds).

Billing details:

  • For some models, the price is based on the output video resolution. The prices for different resolutions, such as 480P, 720P, and 1080P, vary.

  • For some models, the price is based on the output video mode. The prices for different video modes, such as Standard Edition and Professional Edition, vary.

  • For some models, the price is based on the output video aspect ratio. The prices for different video aspect ratios, such as 1:1 and 3:4, vary.

  • Some models use uniform pricing, regardless of resolution, mode, or aspect ratio.

  • Failed requests do not incur fees or consume the free quota.

Wan - text-to-video

Only output is billed. For rules, see Video generation.

Global

In the Global deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are dynamically scheduled worldwide.

Note

Models in the Global deployment mode do not have a free quota.

Model

Output video resolution

Output price

wan2.6-t2v

720P

$0.1/second

1080P

$0.15/second
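Combining the video billing formula with the Global wan2.6-t2v prices above gives a simple per-request estimate. The sketch below is illustrative only. The name `video_fee` and the `succeeded` flag are hypothetical and do not correspond to fields in the actual API.

```python
from decimal import Decimal

# USD per second for wan2.6-t2v, copied from the Global table above.
WAN26_T2V_GLOBAL = {
    "720P": Decimal("0.1"),
    "1080P": Decimal("0.15"),
}


def video_fee(resolution: str, duration_seconds: int,
              succeeded: bool = True) -> Decimal:
    """Fee = unit price per second x duration of the generated video."""
    if not succeeded:
        # Failed requests do not incur fees.
        return Decimal("0")
    return WAN26_T2V_GLOBAL[resolution] * duration_seconds


if __name__ == "__main__":
    print(video_fee("1080P", 5))
    print(video_fee("720P", 10))
```

A 5-second 1080P video comes to $0.75, and a 10-second 720P video comes to $1.00.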

International

In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.

Model

Output video resolution

Output price

Free quota (Note)

Validity: 90 days after activating Model Studio

wan2.6-t2v

720P

$0.10/second

50 seconds

1080P

$0.15/second

wan2.5-t2v-preview

480P

$0.05/second

50 seconds

720P

$0.10/second

1080P

$0.15/second

wan2.2-t2v-plus

480P

$0.02/second

50 seconds

1080P

$0.10/second

wan2.1-t2v-turbo

480P

$0.036/second

200 seconds

720P

$0.036/second

wan2.1-t2v-plus

720P

$0.10/second

200 seconds

US

In the US deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are limited to the United States.

Note

Models in the US deployment mode do not have a free quota.

Model

Output video resolution

Output price

wan2.6-t2v-us

720P

$0.1/second

1080P

$0.15/second

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.

Note

Models in the Mainland China deployment mode do not have a free quota.

Model

Output video resolution

Output price

wan2.6-t2v

720P

$0.086012/second

1080P

$0.143353/second

wan2.5-t2v-preview

480P

$0.043006/second

720P

$0.086012/second

1080P

$0.143353/second

wan2.2-t2v-plus

480P

$0.02007/second

1080P

$0.100347/second

wanx2.1-t2v-turbo

480P

$0.034405/second

720P

$0.034405/second

wanx2.1-t2v-plus

720P

$0.100347/second

Wan - image-to-video - first frame

Only output is billed. For rules, see Video generation.

Global

In the Global deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are dynamically scheduled worldwide.

Note

Models in the Global deployment mode do not have a free quota.

Model

Output video resolution

Output price

wan2.6-i2v

720P

$0.1/second

1080P

$0.15/second

International

In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.

Model

Output video resolution

Output price

Free quota (Note)

Validity: 90 days after activating Model Studio

wan2.6-i2v

720P

$0.10/second

50 seconds

1080P

$0.15/second

wan2.5-i2v-preview

480P

$0.05/second

50 seconds

720P

$0.10/second

1080P

$0.15/second

wan2.2-i2v-flash

480P

$0.015/second

50 seconds

720P

$0.036/second

wan2.2-i2v-plus

480P

$0.02/second

50 seconds

1080P

$0.10/second

wan2.1-t2v-turbo

480P

$0.036/second

200 seconds

720P

$0.036/second

wan2.1-t2v-plus

720P

$0.10/second

200 seconds

US

In the US deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are limited to the United States.

Note

Models in the US deployment mode do not have a free quota.

Model

Output video resolution

Output price

wan2.6-i2v-us

720P

$0.1/second

1080P

$0.15/second

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.

Note

Models in the Mainland China deployment mode do not have a free quota.

Model

Output video resolution

Output price

wan2.6-i2v

720P

$0.086012/second

1080P

$0.143353/second

wan2.5-i2v-preview

480P

$0.043006/second

720P

$0.086012/second

1080P

$0.143353/second

wan2.2-i2v-plus

480P

$0.02007/second

1080P

$0.100347/second

wanx2.1-t2v-turbo

480P

$0.034405/second

720P

$0.034405/second

wanx2.1-t2v-plus

720P

$0.100347/second

Wan - image-to-video - first and last frames

Only output is billed. For rules, see Video generation.

International

In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.

Model

Output video resolution

Output price

Free quota (Note)

Validity: 90 days after activating Model Studio

wan2.2-kf2v-flash

480P

$0.015/second

50 seconds

720P

$0.036/second

1080P

$0.07/second

wan2.1-kf2v-plus

720P

$0.10/second

200 seconds

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.

Note

Models in the Mainland China deployment mode do not have a free quota.

Model

Output video resolution

Output price

wan2.2-kf2v-flash

480P

$0.014335/second

720P

$0.028671/second

1080P

$0.068809/second

wanx2.1-kf2v-plus

720P

$0.100347/second

Wan - reference-to-video

Billing rule: Both the input video and the output video are billed by duration in seconds. Failed generations do not incur fees or consume the free quota.

  • The input video is billed for no more than 5 seconds. For specific rules, see Billing and rate limiting.

  • The output video is billed based on seconds of successfully generated video.

Global

In the Global deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are dynamically scheduled worldwide.

Note

Models in the Global deployment mode do not have a free quota.

Model

Video resolution

Input price

Output price

wan2.6-r2v

720P

$0.1/second

$0.1/second

1080P

$0.15/second

$0.15/second
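For reference-to-video, both sides of the request contribute to the fee. The sketch below applies the Global wan2.6-r2v prices above and treats the 5-second input limit as a simple cap. That treatment is an assumption, since the detailed rules are in Billing and rate limiting and are not reproduced here. It also assumes that input and output are priced from the same resolution row. The name `r2v_fee` is hypothetical.

```python
from decimal import Decimal

INPUT_BILLING_CAP_SECONDS = 5  # "billed for no more than 5 seconds"

# (input price, output price) in USD per second for wan2.6-r2v,
# copied from the Global table above.
WAN26_R2V_GLOBAL = {
    "720P": (Decimal("0.1"), Decimal("0.1")),
    "1080P": (Decimal("0.15"), Decimal("0.15")),
}


def r2v_fee(resolution: str, input_seconds: int, output_seconds: int) -> Decimal:
    """Estimate the fee in USD for one successful generation."""
    input_price, output_price = WAN26_R2V_GLOBAL[resolution]
    # Assumption: the cap is applied as a plain minimum on the input duration.
    billed_input_seconds = min(input_seconds, INPUT_BILLING_CAP_SECONDS)
    return billed_input_seconds * input_price + output_seconds * output_price


if __name__ == "__main__":
    # An 8-second reference video is billed as 5 seconds of input.
    print(r2v_fee("720P", 8, 10))
```

Under these assumptions, an 8-second reference video with a 10-second 720P output is billed as 5 input seconds plus 10 output seconds, or $1.50.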

International

In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.

Model

Video resolution

Input price

Output price

Free quota (Note)

Validity: 90 days after activating Model Studio

wan2.6-r2v

720P

$0.10/second

$0.10/second

50 seconds

1080P

$0.15/second

$0.15/second

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.

Note

Models in the Mainland China deployment mode do not have a free quota.

Model

Video resolution

Input price

Output price

wan2.6-r2v

720P

$0.086012/second

$0.086012/second

1080P

$0.143353/second

$0.143353/second
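
The reference-to-video billing rule above can be sketched as a small estimator. This is an illustrative sketch, not an official calculator: the function name `estimate_r2v_cost` and the deployment keys are hypothetical, and the 5-second input limit is modeled as a simple cap (the detailed rules are in Billing and rate limiting). Free quota is not modeled.

```python
from decimal import Decimal

# Per-second prices for wan2.6-r2v, copied from the tables above.
# Input and output prices are the same at each resolution.
R2V_PRICE_PER_SECOND = {
    ("global", "720P"): Decimal("0.1"),
    ("global", "1080P"): Decimal("0.15"),
    ("international", "720P"): Decimal("0.10"),
    ("international", "1080P"): Decimal("0.15"),
    ("mainland_china", "720P"): Decimal("0.086012"),
    ("mainland_china", "1080P"): Decimal("0.143353"),
}

# Assumption: "billed for no more than 5 seconds" is treated as a plain cap.
MAX_BILLED_INPUT_SECONDS = Decimal("5")


def estimate_r2v_cost(deployment, resolution, input_seconds, output_seconds, succeeded=True):
    """Estimate the charge in USD for one request (hypothetical helper)."""
    if not succeeded:
        # Failed generations are not charged.
        return Decimal("0")
    price = R2V_PRICE_PER_SECOND[(deployment, resolution)]
    billed_input = min(Decimal(str(input_seconds)), MAX_BILLED_INPUT_SECONDS)
    return price * (billed_input + Decimal(str(output_seconds)))


# An 8-second input video is billed as 5 seconds, plus 10 seconds of output.
print(estimate_r2v_cost("international", "720P", input_seconds=8, output_seconds=10))
```

Decimal arithmetic is used here so that the six-decimal Mainland China prices are not distorted by binary floating-point rounding.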

Wan - general video editing

Only output is billed. For rules, see Video generation.

International

In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.

Model

Output video resolution

Output price

Free quota (Note)

wan2.1-vace-plus

720P

$0.10/second

50 seconds

Validity: 90 days after activating Model Studio

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.

Note

Models in the Mainland China deployment mode do not have a free quota.

Model

Output video resolution

Output price

wanx2.1-vace-plus

720P

$0.100347/second

Wan - digital human

Note

Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

  • wan2.2-s2v-detect: Input is charged; output is not charged. You are charged based on the number of detected images. Each input image is charged once, regardless of whether the detection is successful.

  • wan2.2-s2v: Input is not charged; output is charged. You are charged based on the duration of the successfully generated video in seconds. For billing rules, see Video generation.

Service

Model

Price

Free quota (Note)

Image detection

wan2.2-s2v-detect

Input image: $0.000574/image

No free quota

Video generation

wan2.2-s2v

Output video:

  • 480P: $0.071677/second

  • 720P: $0.129018/second
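
A digital human job combines the two charges described above: a per-image detection fee and a per-second fee for the generated video. The sketch below adds them together. The helper name `estimate_digital_human_cost` is hypothetical, and the prices are copied from the table above.

```python
from decimal import Decimal

# wan2.2-s2v-detect: charged per input image, whether or not detection succeeds.
DETECT_PRICE_PER_IMAGE = Decimal("0.000574")

# wan2.2-s2v: charged per second of successfully generated video.
S2V_PRICE_PER_SECOND = {
    "480P": Decimal("0.071677"),
    "720P": Decimal("0.129018"),
}


def estimate_digital_human_cost(images_detected, resolution, video_seconds):
    """Estimate the combined charge in USD (hypothetical helper)."""
    detection_cost = DETECT_PRICE_PER_IMAGE * images_detected
    video_cost = S2V_PRICE_PER_SECOND[resolution] * Decimal(str(video_seconds))
    return detection_cost + video_cost


# Two detection calls followed by one 10-second 480P video.
print(estimate_digital_human_cost(images_detected=2, resolution="480P", video_seconds=10))
```

The same two-part structure applies to the other detect-then-generate model pairs on this page, with their own prices substituted.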

Wan - image-to-action

Only output is billed. For rules, see Video generation.

International

In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.

Model

Output video mode

Output price

Free quota (Note)

wan2.2-animate-move

Standard mode Wan - std

$0.12/second

50 seconds

Validity: 90 days after activating Model Studio

Professional mode Wan - pro

$0.18/second

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.

Note

Models in the Mainland China deployment mode do not have a free quota.

Model

Output video mode

Output price

wan2.2-animate-move

Standard mode Wan - std

$0.06/second

Professional mode Wan - pro

$0.09/second

Wan - video character swap

Only output is billed. For rules, see Video generation.

International

In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.

Model

Output video mode

Output price

Free quota (Note)

wan2.2-animate-mix

Standard mode Wan - std

$0.18/second

50 seconds

Validity: 90 days after activating Model Studio

Professional mode Wan - pro

$0.26/second

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.

Note

Models in the Mainland China deployment mode do not have a free quota.

Model

Output video mode

Output price

wan2.2-animate-mix

Standard mode Wan - std

$0.09/second

Professional mode Wan - pro

$0.13/second

AnimateAnyone

Note

Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

  • animate-anyone-detect-gen2: Input is charged; output is not charged. You are charged based on the number of detected images. Each input image is charged once, regardless of whether the detection is successful.

  • animate-anyone-template-gen2: Input is not charged; output is charged. You are charged based on the duration of the successfully generated video in seconds. For billing rules, see Video generation.

  • animate-anyone-gen2: Input is not charged; output is charged. You are charged based on the duration of the successfully generated video in seconds. For billing rules, see Video generation.

Service

Model

Price

Free quota (Note)

Image detection

animate-anyone-detect-gen2

Input image: $0.000574/image

No free quota

Action template generation

animate-anyone-template-gen2

Output video: $0.011469/second

Video generation

animate-anyone-gen2

Output video: $0.011469/second

EMO

Note

Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

  • emo-detect-v1: Input is charged; output is not charged. You are charged based on the number of detected images. Each input image is charged once, regardless of whether the detection is successful.

  • emo-v1: Input is not charged; output is charged. You are charged based on the duration of the successfully generated video in seconds. For billing rules, see Video generation.

Service

Model

Price

Free quota (Note)

Image detection

emo-detect-v1

Input image: $0.000574/image

No free quota

Video generation

emo-v1

Output video:

  • 1:1 aspect ratio: $0.011469/second

  • 3:4 aspect ratio: $0.022937/second

LivePortrait

Note

Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

  • liveportrait-detect: Input is charged; output is not charged. You are charged based on the number of detected images. Each input image is charged once, regardless of whether the detection is successful.

  • liveportrait: Input is not charged; output is charged. You are charged based on the duration of the successfully generated video in seconds. For billing rules, see Video generation.

Service

Model

Price

Free quota (Note)

Image detection

liveportrait-detect

Input image: $0.000574/image

No free quota

Video generation

liveportrait

Output video: $0.002868/second

Emoji

Note

Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

  • emoji-detect-v1: Input is charged; output is not charged. You are charged based on the number of detected images. Each input image is charged once, regardless of whether the detection is successful.

  • emoji-v1: Input is not charged; output is charged. You are charged based on the duration of the successfully generated video in seconds. For billing rules, see Video generation.

Service

Model

Price

Free quota (Note)

Image detection

emoji-detect-v1

Input image: $0.000574/image

No free quota

Video generation

emoji-v1

Output video: $0.011469/second

VideoRetalk

Note

Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Only output is billed. For rules, see Video generation.

Model

Output price

Free quota (Note)

videoretalk

$0.011469/second

No free quota

Video style transform

Note

Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Only output is billed. For rules, see Video generation.

Model

Output video resolution

Output price

Free quota (Note)

video-style-transform

540P

$0.028671/second

No free quota

720P

$0.071677/second

Speech synthesis (text-to-speech)

Qwen-TTS

International

In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.

qwen3-tts series

Billing rule: You are charged based on the number of input text characters. You are not charged for the output.

Model

Input price

Free quota (Note)

qwen3-tts-flash

$0.1/10,000 characters

If Model Studio was activated before 00:00 on November 13, 2025: 2,000 characters

If Model Studio was activated after 00:00 on November 13, 2025: 10,000 characters

Validity: 90 days after activating Model Studio

qwen3-tts-flash-2025-11-27

$0.1/10,000 characters

10,000 characters

Validity: 90 days after activating Model Studio

qwen3-tts-flash-2025-09-18

$0.1/10,000 characters

If Model Studio was activated before 00:00 on November 13, 2025: 2,000 characters

If Model Studio was activated after 00:00 on November 13, 2025: 10,000 characters

Validity: 90 days after activating Model Studio
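
Because the qwen3-tts series is priced per 10,000 input characters, a request's cost follows directly from its character count. The sketch below is illustrative only: `estimate_tts_cost` is a hypothetical helper, the caller is assumed to supply the billable character count (the counting rules are not given in this section), and the free quota is assumed to be deducted before any charge applies.

```python
from decimal import Decimal


def estimate_tts_cost(billable_chars, price_per_10k, free_chars_remaining=0):
    """Estimate the charge in USD for synthesizing text (hypothetical helper).

    billable_chars: number of billable input characters (supplied by the caller).
    price_per_10k: price per 10,000 characters as a string, e.g. "0.1".
    free_chars_remaining: unused free quota; assumed to be consumed first.
    """
    chargeable = max(billable_chars - free_chars_remaining, 0)
    return Decimal(price_per_10k) * chargeable / Decimal(10000)


# qwen3-tts-flash (International) with the full 10,000-character free quota unused.
print(estimate_tts_cost(25000, "0.1", free_chars_remaining=10000))
```

Passing the price as a string keeps the value exact when it is converted to `Decimal`.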

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.

Note

Models in the Mainland China deployment mode do not have a free quota.

qwen3-tts series

Billing rule: You are charged based on the number of input text characters. You are not charged for the output.

Model

Input price (10,000 characters)

Output price (10,000 characters)

qwen3-tts-flash

$0.114682

Not charged

qwen3-tts-flash-2025-11-27

$0.114682

Not charged

qwen3-tts-flash-2025-09-18

$0.114682

Not charged

qwen-tts series

Billing rule: You are charged based on the number of input and output tokens.

Model

Input price (Million tokens)

Output price (Million tokens)

qwen-tts-flash

$0.23

$1.434

qwen-tts-latest

$0.23

$1.434

qwen-tts-2025-05-22

$0.23

$1.434

qwen-tts-2025-04-10

$0.23

$1.434

Qwen-TTS-Realtime

International

In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.

qwen3-tts-vd realtime series

Billing rule: You are charged based on the number of input text characters. You are not charged for the output.

Model

Input price

Free quota (Note)

qwen3-tts-vd-realtime-2025-12-16

$0.143353/10,000 characters

10,000 characters

Validity: 90 days after activating Model Studio

qwen3-tts-vc realtime series

Billing rule: You are charged based on the number of input text characters. You are not charged for the output.

Model

Input price

Free quota (Note)

qwen3-tts-vc-realtime-2025-11-27

$0.13/10,000 characters

10,000 characters

Validity: 90 days after activating Model Studio

qwen3-tts realtime series

Billing rule: You are charged based on the number of input text characters. You are not charged for the output.

Model

Input price

Free quota (Note)

qwen3-tts-flash-realtime

$0.13/10,000 characters

If Model Studio was activated before 00:00 on November 13, 2025: 2,000 characters

If Model Studio was activated after 00:00 on November 13, 2025: 10,000 characters

Validity: 90 days after activating Model Studio

qwen3-tts-flash-realtime-2025-11-27

$0.13/10,000 characters

10,000 characters

Validity: 90 days after activating Model Studio

qwen3-tts-flash-realtime-2025-09-18

$0.13/10,000 characters

If Model Studio was activated before 00:00 on November 13, 2025: 2,000 characters

If Model Studio was activated after 00:00 on November 13, 2025: 10,000 characters

Validity: 90 days after activating Model Studio

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.

Note

Models in the Mainland China deployment mode do not have a free quota.

qwen3-tts-vd realtime series

Billing rule: You are charged based on the number of input text characters. You are not charged for the output.

Model

Input price (10,000 characters)

Output price

qwen3-tts-vd-realtime-2025-12-16

$0.143353

Not charged

qwen3-tts-vc realtime series

Billing rule: You are charged based on the number of input text characters. You are not charged for the output.

Model

Input price (10,000 characters)

Output price

qwen3-tts-vc-realtime-2025-11-27

$0.143353

Not charged

qwen3-tts realtime series

Billing rule: You are charged based on the number of input text characters. You are not charged for the output.

Model

Input price (10,000 characters)

Output price

qwen3-tts-flash-realtime

$0.143353

Not charged

qwen3-tts-flash-realtime-2025-11-27

$0.143353

Not charged

qwen3-tts-flash-realtime-2025-09-18

$0.143353

Not charged

qwen-tts realtime series

Billing rule: You are charged based on the number of input and output tokens.

Model

Input price (Million tokens)

Output price (Million tokens)

qwen-tts-realtime

$0.345

$1.721

qwen-tts-realtime-latest

$0.345

$1.721

qwen-tts-realtime-2025-07-15

$0.345

$1.721

Qwen-TTS voice cloning

Billing rule: You are charged based on the number of new voices created.

International

In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.

Model

Price (per voice)

Free quota (Note)

qwen-voice-enrollment

$0.01

1000 voices/account

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.

Note

Models in the Mainland China deployment mode do not have a free quota.

Model

Price (per voice)

qwen-voice-enrollment

$0.01

Qwen-TTS voice design

Billing rule: You are charged based on the number of new voices created.

International

In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.

Model

Price (per voice)

Free quota (Note)

qwen-voice-design

$0.2

10 voices/account

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.

Note

Models in the Mainland China deployment mode do not have a free quota.

Model

Price (per voice)

qwen-voice-design

$0.2

CosyVoice

Note

Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Billing rule: You are charged based on the number of input text characters. You are not charged for the output.

Model

Input price

Free quota (Note)

cosyvoice-v3-plus

$0.286706/10,000 characters

No free quota

cosyvoice-v3-flash

$0.14335/10,000 characters

cosyvoice-v2

$0.286706/10,000 characters

Speech recognition (speech-to-text) and translation (speech-to-translation)

Qwen3-LiveTranslate-Flash-Realtime

Billing rule: You are charged based on the number of input and output tokens. For token calculation rules for different modalities, see Billing.

International

In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.

Model

Input price (Million tokens)

Output price (Million tokens)

Free quota (Note)

Input: Audio

Input: Image

Output: Text

Output: Audio

qwen3-livetranslate-flash-realtime

$10

$1.3

$10

$38

1 million tokens each
Validity: 90 days after activating Model Studio

qwen3-livetranslate-flash-realtime-2025-09-22

$10

$1.3

$10

$38

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.

Note

Models in the Mainland China deployment mode do not have a free quota.

Model

Input price (Million tokens)

Output price (Million tokens)

Input: Audio

Input: Image

Output: Text

Output: Audio

qwen3-livetranslate-flash-realtime

$9.175

$1.147

$9.175

$34.405

qwen3-livetranslate-flash-realtime-2025-09-22

$9.175

$1.147

$9.175

$34.405
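
Qwen3-LiveTranslate-Flash-Realtime prices each modality separately, so a session's cost is the sum of four per-million-token terms. The sketch below shows that sum under stated assumptions: the helper name and the dictionary keys (`input_audio`, `input_image`, `output_text`, `output_audio`) are hypothetical labels, the token counts are assumed to be already known (see Billing for how tokens are calculated for each modality), and free quota is not modeled.

```python
from decimal import Decimal

# USD per million tokens, copied from the tables above.
LIVETRANSLATE_PRICES = {
    "international": {
        "input_audio": Decimal("10"),
        "input_image": Decimal("1.3"),
        "output_text": Decimal("10"),
        "output_audio": Decimal("38"),
    },
    "mainland_china": {
        "input_audio": Decimal("9.175"),
        "input_image": Decimal("1.147"),
        "output_text": Decimal("9.175"),
        "output_audio": Decimal("34.405"),
    },
}

MILLION = Decimal(1_000_000)


def estimate_livetranslate_cost(deployment, token_counts):
    """Sum the per-modality charges in USD (hypothetical helper)."""
    prices = LIVETRANSLATE_PRICES[deployment]
    return sum(
        (prices[kind] * count / MILLION for kind, count in token_counts.items()),
        Decimal("0"),
    )


usage = {
    "input_audio": 100_000,
    "input_image": 20_000,
    "output_text": 5_000,
    "output_audio": 50_000,
}
print(estimate_livetranslate_cost("international", usage))
```

In this example the audio output dominates the total, which is typical whenever output audio tokens are a large share of usage, since that modality has the highest unit price.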

Qwen-ASR

Billing rule: You are charged based on the duration of the input audio in seconds. You are not charged for the output.

International

In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.

Model

Input price

Free quota (Note)

qwen3-asr-flash-filetrans

$0.000035/second

36,000 seconds (10 hours)
Validity: 90 days after activating Model Studio

qwen3-asr-flash-filetrans-2025-11-17

qwen3-asr-flash

qwen3-asr-flash-2025-09-08

US

In the US deployment mode, the endpoint and data storage are located in the US (Virginia) region. Inference computing resources are limited to the United States.

Note

Models in the US deployment mode do not have a free quota.

Model

Input price

qwen3-asr-flash-us

$0.000035/second

qwen3-asr-flash-2025-09-08-us

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.

Note

Models in the Mainland China deployment mode do not have a free quota.

Model

Input price

qwen3-asr-flash-filetrans

$0.000032/second

qwen3-asr-flash-filetrans-2025-11-17

qwen3-asr-flash

qwen3-asr-flash-2025-09-08
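
Per-second prices are easier to compare once converted to an hourly rate or applied to a real workload. The sketch below does both for Qwen-ASR. It is illustrative only: the helper names are hypothetical, the single price shown in each table is assumed to apply to every model listed under it, and the free quota is assumed to be consumed before any charge applies.

```python
from decimal import Decimal

# USD per second of input audio, copied from the tables above.
ASR_PRICE_PER_SECOND = {
    "international": Decimal("0.000035"),
    "us": Decimal("0.000035"),
    "mainland_china": Decimal("0.000032"),
}

SECONDS_PER_HOUR = 3600


def asr_price_per_hour(deployment):
    """Convert the per-second price to a per-hour price (hypothetical helper)."""
    return ASR_PRICE_PER_SECOND[deployment] * SECONDS_PER_HOUR


def estimate_asr_cost(deployment, audio_seconds, free_seconds_remaining=0):
    """Estimate the charge in USD for transcribing audio (hypothetical helper)."""
    chargeable = max(audio_seconds - free_seconds_remaining, 0)
    return ASR_PRICE_PER_SECOND[deployment] * chargeable


print(asr_price_per_hour("international"))
# 50,000 seconds of audio with the full 36,000-second free quota unused.
print(estimate_asr_cost("international", 50_000, free_seconds_remaining=36_000))
```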

Qwen-ASR-Realtime

Billing rule: You are charged based on the duration of the input audio in seconds. You are not charged for the output.

International

In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.

Model

Input price

Free quota (Note)

qwen3-asr-flash-realtime

$0.000090/second

36,000 seconds (10 hours)
Validity: 90 days after activating Model Studio

qwen3-asr-flash-realtime-2025-10-27

$0.000090/second

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.

Note

Models in the Mainland China deployment mode do not have a free quota.

Model

Input price

qwen3-asr-flash-realtime

$0.000047/second

qwen3-asr-flash-realtime-2025-10-27

Fun-ASR

Audio file recognition

Billing rule: You are charged based on the duration of the input audio in seconds. You are not charged for the output.

International

In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.

Model

Input price

Free quota (Note)

fun-asr

$0.000035/second

36,000 seconds (10 hours)
Valid for 90 days

fun-asr-2025-11-07

fun-asr-2025-08-25

fun-asr-mtl

fun-asr-mtl-2025-08-25

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.

Note

Models in the Mainland China deployment mode do not have a free quota.

Model

Input price

fun-asr

$0.000032/second

fun-asr-2025-11-07

fun-asr-2025-08-25

fun-asr-mtl

fun-asr-mtl-2025-08-25

Real-time speech recognition

Billing rule: You are charged based on the duration of the input audio in seconds. You are not charged for the output.

International

In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.

Model

Input price

Free quota (Note)

fun-asr-realtime

$0.00009/second

36,000 seconds (10 hours)

Valid for 90 days

fun-asr-realtime-2025-11-07

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.

Note

Models in the Mainland China deployment mode do not have a free quota.

Model

Input price

fun-asr-realtime

$0.000047/second

fun-asr-realtime-2025-11-07

fun-asr-realtime-2025-09-15

Paraformer

Audio file recognition

Note

Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Billing rule: You are charged based on the duration of the input audio in seconds. You are not charged for the output.

Model

Input price

paraformer-v2

$0.000012/second

paraformer-8k-v2

Real-time speech recognition

Note

Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Billing rule: You are charged based on the duration of the input audio in seconds. You are not charged for the output.

Model

Input price

Free quota (Note)

paraformer-realtime-v2

$0.000035/second

No free quota

paraformer-realtime-8k-v2

Text embedding

Billing rule: You are charged based on the number of input tokens. You are not charged for the output.

International

In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.

Model

Input price (Million tokens)

Free quota (Note)

text-embedding-v4

$0.07

1 million tokens
Validity: 90 days after activating Model Studio

text-embedding-v3

$0.07

500,000 tokens
Validity: 90 days after activating Model Studio

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.

Note

Models in the Mainland China deployment mode do not have a free quota.

Model

Input price (Million tokens)

text-embedding-v4

$0.072

Multimodal embedding

Billing rule: You are charged based on the number of input tokens. You are not charged for the output.

International

In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.

Model

Input price (Million input tokens)

Free quota (Note)

tongyi-embedding-vision-plus

$0.09

1 million tokens

Validity: 90 days after activating Model Studio

tongyi-embedding-vision-flash

Image/Video: $0.03

Text: $0.09

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.

Model

Input price (Million tokens)

Free quota (Note)

multimodal-embedding-v1

Free trial

No token quota

Text rerank

Billing rule: You are charged based on the number of input tokens. You are not charged for the output.

International

In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.

Model

Input price (Million tokens)

Free quota (Note)

qwen3-rerank

$0.1

1 million tokens

Validity: 90 days after activating Model Studio

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.

Note

Models in the Mainland China deployment mode do not have a free quota.

Model

Input price (Million tokens)

gte-rerank-v2

$0.115

Domain specific

Intent recognition

Note

Only the Mainland China deployment mode is supported. In this mode, the endpoint and data storage are located in the Beijing region, and inference computing resources are limited to Mainland China.

Model

Input price (Million tokens)

Output price (Million tokens)

Free quota (Note)

tongyi-intent-detect-v3

$0.058

$0.144

No free quota

Role playing

Billing is based on the number of input and output tokens.

International

In the International deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding Mainland China.

Model

Input price (Million tokens)

Output price (Million tokens)

Free quota (Note)

qwen-plus-character-ja

$0.5

$1.4

No free quota

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to Mainland China.

Model

Input price (Million tokens)

Output price (Million tokens)

Free quota (Note)

qwen-plus-character

$0.115

$0.287

No free quota