Visual understanding models - Alibaba Cloud Model Studio

Choose the right model for your use case, such as image analysis, video understanding, or OCR.

Image and video understanding

Start with qwen3.6-plus, the flagship Qwen model. It supports 1M context window, up to 2-hour videos, function calling, and built-in tools. Once your application is stable, you can switch to qwen3.6-flash to reduce costs. It offers near-flagship performance with the same context length and feature set.

Image resolution

Most models support up to 16 million pixels per image. Higher resolutions use more tokens. Token count per image: h x w / (32 x 32) + 2.

Video support

Up to 2 hours / 2 GB: qwen3.6-plus, qwen3.6-flash, qwen3.5-plus, qwen3.5-flash
Up to 1 hour / 2 GB: qwen3-vl-plus, qwen3-vl-flash
Up to 1 hour / 2 GB: qwen3.5-omni-plus, qwen3.5-omni-flash (also supports audio input)

Function calling and built-in tools

Allows the model to perform actions based on image or video content.

Function calling: Supported by the Qwen3.6, Qwen3.5, and Qwen3-VL series.
Built-in tools (web search, code execution, no setup required): Available for qwen3.6-plus, qwen3.6-flash, qwen3.5-plus, and qwen3.5-flash.

Structured output

Get valid JSON output from visual inputs, such as extracting product details from a photo.

Supported by the Qwen3.6, Qwen3.5, and Qwen3-VL series in non-thinking mode.

OCR and document extraction

qwen-vl-ocr is optimized for text extraction from documents, tables, exam papers, and handwritten content. For general text extraction from images, use qwen3.6-plus or qwen3.6-flash.

Recommended models

Model	Context	Max pixels/image	Max video duration	Max video size	Max images	Max videos	Function calling	Built-in tools	Structured output
`qwen3.6-plus`	1M	16M	2 hours	2 GB	256	64	Supported	Supported	Supported
`qwen3.6-flash`	1M	16M	2 hours	2 GB	256	64	Supported	Supported	Supported
`qwen3.5-omni-plus`	64k	--	1 hour	2 GB	2,048	512	Supported	--	Supported

All models

Qwen3.6

Model ID	Input	Output	Context	Max output	Max images	Max videos	Function calling	Built-in tools	Structured output
`qwen3.6-plus`	Text, images, video	Text	1M	64k	256	64	Supported	Supported	Supported
`qwen3.6-plus-2026-04-02`	Text, images, video	Text	1M	64k	256	64	Supported	Supported	Supported
`qwen3.6-flash`	Text, images, video	Text	1M	64k	256	64	Supported	Supported	Supported
`qwen3.6-flash-2026-04-16`	Text, images, video	Text	1M	64k	256	64	Supported	Supported	Supported
`qwen3.6-35b-a3b`	Text, images, video	Text	256k	64k	256	64	Supported	Supported	Supported

Qwen3.5

Model ID	Input	Output	Context	Max output	Max images	Max videos	Function calling	Built-in tools	Structured output
`qwen3.5-plus`	Text, images, video	Text	1M	64k	256	64	Supported	Supported	Supported
`qwen3.5-plus-2026-02-15`	Text, images, video	Text	1M	64k	256	64	Supported	Supported	Supported
`qwen3.5-flash`	Text, images, video	Text	1M	64k	256	64	Supported	Supported	Supported
`qwen3.5-flash-2026-02-23`	Text, images, video	Text	1M	64k	256	64	Supported	Supported	Supported
`qwen3.5-397b-a17b`	Text, images, video	Text	32k	8k	256	64	Supported	Supported	Supported
`qwen3.5-122b-a10b`	Text, images, video	Text	32k	8k	256	64	Supported	Supported	Supported
`qwen3.5-27b`	Text, images, video	Text	32k	8k	256	64	Supported	Supported	Supported
`qwen3.5-35b-a3b`	Text, images, video	Text	32k	8k	256	64	Supported	Supported	Supported

Legacy and other models

These models are no longer recommended. For new projects, use the Qwen3.6 or Qwen3.5 series. For full model specifications, visit the Models page.

China (Beijing) | Singapore | U.S. | China (Hong Kong) | Germany (Frankfurt)

View list of legacy and other models

Qwen3-VL

qwen3-vl-plus
qwen3-vl-plus-2026-01-25
qwen3-vl-flash
qwen3-vl-flash-2026-01-25

Qwen2.5-VL

qwen2.5-vl-72b-instruct
qwen2.5-vl-32b-instruct
qwen2.5-vl-7b-instruct
qwen2.5-vl-3b-instruct

Qwen-Omni

qwen3-omni-flash
qwen3-omni-flash-2025-10-22
qwen-omni-turbo and its snapshot versions

Qwen-OCR

qwen-vl-ocr
qwen-vl-ocr-latest
qwen-vl-ocr-2025-07-14

QVQ

qvq-max
qvq-max-2025-08-28
qvq-plus
qvq-plus-2025-08-27

Legacy Qwen-VL

qwen-vl-max and its snapshots
qwen-vl-plus and its snapshots