All Products
Search
Document Center

Alibaba Cloud Model Studio:Visual understanding models

Last Updated:Apr 23, 2026

Choose the right model for your use case, such as image analysis, video understanding, or OCR.

Image and video understanding

Start with qwen3.6-plus, the flagship Qwen model. It supports 1M context window, up to 2-hour videos, function calling, and built-in tools. Once your application is stable, you can switch to qwen3.6-flash to reduce costs. It offers near-flagship performance with the same context length and feature set.

Image resolution

Most models support up to 16 million pixels per image. Higher resolutions use more tokens. Token count per image: h x w / (32 x 32) + 2.

Video support

  • Up to 2 hours / 2 GB: qwen3.6-plus, qwen3.6-flash, qwen3.5-plus, qwen3.5-flash

  • Up to 1 hour / 2 GB: qwen3-vl-plus, qwen3-vl-flash

  • Up to 1 hour / 2 GB: qwen3.5-omni-plus, qwen3.5-omni-flash (also supports audio input)

Function calling and built-in tools

Allows the model to perform actions based on image or video content.

  • Function calling: Supported by the Qwen3.6, Qwen3.5, and Qwen3-VL series.

  • Built-in tools (web search, code execution, no setup required): Available for qwen3.6-plus, qwen3.6-flash, qwen3.5-plus, and qwen3.5-flash.

Structured output

Get valid JSON output from visual inputs, such as extracting product details from a photo.

Supported by the Qwen3.6, Qwen3.5, and Qwen3-VL series in non-thinking mode.

OCR and document extraction

qwen-vl-ocr is optimized for text extraction from documents, tables, exam papers, and handwritten content. For general text extraction from images, use qwen3.6-plus or qwen3.6-flash.

Recommended models

Model

Context

Max pixels/image

Max video duration

Max video size

Max images

Max videos

Function calling

Built-in tools

Structured output

qwen3.6-plus

1M

16M

2 hours

2 GB

256

64

Supported

Supported

Supported

qwen3.6-flash

1M

16M

2 hours

2 GB

256

64

Supported

Supported

Supported

qwen3.5-omni-plus

64k

--

1 hour

2 GB

2,048

512

Supported

--

Supported

All models

Qwen3.6

Model ID

Input

Output

Context

Max output

Max images

Max videos

Function calling

Built-in tools

Structured output

qwen3.6-plus

Text, images, video

Text

1M

64k

256

64

Supported

Supported

Supported

qwen3.6-plus-2026-04-02

Text, images, video

Text

1M

64k

256

64

Supported

Supported

Supported

qwen3.6-flash

Text, images, video

Text

1M

64k

256

64

Supported

Supported

Supported

qwen3.6-flash-2026-04-16

Text, images, video

Text

1M

64k

256

64

Supported

Supported

Supported

qwen3.6-35b-a3b

Text, images, video

Text

256k

64k

256

64

Supported

Supported

Supported

Qwen3.5

Model ID

Input

Output

Context

Max output

Max images

Max videos

Function calling

Built-in tools

Structured output

qwen3.5-plus

Text, images, video

Text

1M

64k

256

64

Supported

Supported

Supported

qwen3.5-plus-2026-02-15

Text, images, video

Text

1M

64k

256

64

Supported

Supported

Supported

qwen3.5-flash

Text, images, video

Text

1M

64k

256

64

Supported

Supported

Supported

qwen3.5-flash-2026-02-23

Text, images, video

Text

1M

64k

256

64

Supported

Supported

Supported

qwen3.5-397b-a17b

Text, images, video

Text

32k

8k

256

64

Supported

Supported

Supported

qwen3.5-122b-a10b

Text, images, video

Text

32k

8k

256

64

Supported

Supported

Supported

qwen3.5-27b

Text, images, video

Text

32k

8k

256

64

Supported

Supported

Supported

qwen3.5-35b-a3b

Text, images, video

Text

32k

8k

256

64

Supported

Supported

Supported

Legacy and other models

These models are no longer recommended. For new projects, use the Qwen3.6 or Qwen3.5 series. For full model specifications, visit the Models page.

China (Beijing) | Singapore | U.S. | China (Hong Kong) | Germany (Frankfurt)

View list of legacy and other models

Qwen3-VL

  • qwen3-vl-plus

  • qwen3-vl-plus-2026-01-25

  • qwen3-vl-flash

  • qwen3-vl-flash-2026-01-25

Qwen2.5-VL

  • qwen2.5-vl-72b-instruct

  • qwen2.5-vl-32b-instruct

  • qwen2.5-vl-7b-instruct

  • qwen2.5-vl-3b-instruct

Qwen-Omni

  • qwen3-omni-flash

  • qwen3-omni-flash-2025-10-22

  • qwen-omni-turbo and its snapshot versions

Qwen-OCR

  • qwen-vl-ocr

  • qwen-vl-ocr-latest

  • qwen-vl-ocr-2025-07-14

QVQ

  • qvq-max

  • qvq-max-2025-08-28

  • qvq-plus

  • qvq-plus-2025-08-27

Legacy Qwen-VL

  • qwen-vl-max and its snapshots

  • qwen-vl-plus and its snapshots