newly listed models, new models, model release notes - Alibaba Cloud Model Studio

Global

In the global deployment mode, endpoints and data storage are located in the US (Virginia) region, and model inference compute resources are dynamically scheduled worldwide.

Type	Time	Specification	Description
Text-to-image	2026-01-04	wan2.6-t2i	A new sync interface is added. It supports selecting custom dimensions within the constraints of total pixel area and aspect ratio. Wan - text-to-image V2
Image generation and editing	2026-01-04	wan2.6-image	Supports image editing and mixed image-text output. Wan - Image generation and editing 2.6
Image-to-video based on the first frame	2026-01-04	wan2.6-i2v	Adds the multi-shot narrative feature, which supports audio using automatic dubbing or custom audio files. Wan - image-to-video - first frame
Reference-to-video	2026-01-04	wan2.6-r2v	Generates a multi-shot video based on a character's appearance and voice from a reference video. It also supports automatic dubbing. Wan - reference-to-video
Text-to-video	2026-01-04	wan2.6-t2v	Adds the multi-shot narrative feature, which supports audio using automatic dubbing or custom audio files. Wan - text-to-video
Visual understanding	2026-01-04	qwen3-vl-flash, qwen3-vl-flash-2025-10-15	A small-scale visual understanding model from the Qwen3 series. It effectively combines thinking and non-thinking modes. Compared to the open-source Qwen3-VL-30B-A3B, it delivers better performance and a faster response time. Image and video understanding
Visual understanding	2026-01-04	qwen3-vl-8b-thinking, qwen3-vl-8b-instruct	An 8B dense open-source model from the Qwen3-VL series, available in both thinking and non-thinking versions. It uses less GPU memory and performs multimodal understanding and inference. It supports ultra-long contexts such as long videos and documents, 2D/3D visual positioning, and comprehensive spatial intelligence and object recognition.Image and video understanding
Visual understanding	2026-01-04	qwen3-vl-32b-thinking, qwen3-vl-32b-instruct	A 32B dense model from the Qwen3-VL series. Its overall performance is second only to the Qwen3-VL-235B model. It excels in document recognition and understanding, spatial intelligence, object recognition, visual 2D detection, and spatial reasoning. This makes it suitable for complex perception tasks in general scenarios.Image and video understanding
Visual understanding	2026-01-04	qwen3-vl-plus, qwen3-vl-plus-2025-09-23, qwen3-vl-235b-a22b-thinking, qwen3-vl-235b-a22b-instruct	A visual understanding model from the Qwen3 series. It effectively combines thinking and non-thinking modes, and its visual agent capabilities are among the best in the world. This version features comprehensive upgrades in visual encoding, spatial intelligence, and multimodal thinking. It also significantly improves visual perception and recognition capabilities.Image and video understanding
Reasoning model	2026-01-04	qwen3-next-80b-a3b-thinking, qwen3-next-80b-a3b-instruct	A new generation of open-source models based on Qwen3. The thinking model has improved instruction-following capabilities and provides more concise summary responses compared to qwen3-235b-a22b-thinking-2507. Deep Thinking. The instruct model has enhanced Chinese understanding, logical reasoning, and text generation capabilities compared to qwen3-235b-a22b-instruct-2507. Text generation overview.
Reasoning model	2026-01-04	qwen3-max, qwen3-max-2025-09-23	Compared to the qwen3-max-preview version, this model has been specifically upgraded for agent programming and tool calling. This official release achieves state-of-the-art (SOTA) performance in its domain and supports more complex agent requirements.Qwen-Max
Reasoning model	2026-01-04	qwen3-max-preview	The Qwen-Max model is a preview version based on Qwen3. Compared to the Qwen 2.5 series, its general-purpose capabilities are greatly improved, delivering significantly enhanced performance in Chinese and English text comprehension, complex instruction following, subjective open-ended tasks, multilingual processing, and tool calling. The model also produces fewer knowledge hallucinations.Qwen-Max
Code model	2026-01-04	qwen3-coder-flash, qwen3-coder-flash-2025-07-28	The fastest and most cost-effective model in the Qwen-Coder series. Coding capabilities (Qwen-Coder).
Code model	2026-01-04	qwen3-coder-plus, qwen3-coder-plus-2025-07-22, qwen3-coder-30b-a3b-instruct, qwen3-coder-480b-a35b-instruct	A code generation model based on Qwen3. It has powerful Coding Agent capabilities and excels at tool calling and environment interaction. It combines excellent code capabilities with general-purpose abilities. Coding capabilities (Qwen-Coder)
Reasoning model	2026-01-04	qwen3-30b-a3b-thinking-2507, qwen3-30b-a3b-instruct-2507	An upgraded version of qwen3-30b-a3b. The thinking model has improved logical, general, knowledge-enhanced, and creative capabilities. Deep Thinking. The instruct model has improved creative capabilities and model security. Text generation overview.
Reasoning model	2026-01-04	qwen3-235b-a22b-thinking-2507, qwen3-235b-a22b-instruct-2507	An upgraded version of qwen3-235b-a22b. The thinking model has greatly improved logical, general, knowledge-enhanced, and creative capabilities, making it suitable for high-difficulty, strong-inference scenarios. Deep Thinking. The instruct model has improved creative capabilities and model security. Text generation overview.
Reasoning model	2026-01-04	qwen3-30b-a3b, qwen3-32b, qwen3-14b, qwen3-8b	Qwen3 models support both thinking and non-thinking modes. Switch between the two modes using the `enable_thinking` parameter. In addition, the capabilities of Qwen3 models have been greatly enhanced: Inference capability: In evaluations for math, code, and logical reasoning, the models significantly outperform QwQ and other non-reasoning models of similar size, reaching top-tier industry levels. Human preference alignment: Capabilities for creative writing, role-playing, multi-turn conversation, and instruction following are greatly improved. The general capabilities significantly exceed those of models of a similar size. Agent capability: The models achieve industry-leading performance in both thinking and non-thinking modes. They can accurately invoke external tools. Multilingual capability: The models support over 100 languages and dialects. Capabilities for multilingual translation, instruction understanding, and common-sense reasoning are significantly improved. Response format fixes: This version fixes response format issues from previous versions, such as abnormal Markdown, mid-sentence truncation, and incorrect boxed output. For more information about thinking mode, see Deep Thinking. For more information about non-thinking mode, see Text generation overview.
Text extraction	2026-01-04	qwen-vl-ocr-2025-11-20	This snapshot of the Qwen text extraction model is based on the Qwen3-VL architecture and significantly improves document parsing and text localization. Text Extraction
Text extraction	2026-01-04	qwen-vl-ocr	qwen-vl-ocr is a model specialized for OCR. It significantly improves text extraction from images such as tables and exam papers. Text Extraction.
Reasoning model	2026-01-04	qwen-plus-2025-12-01, qwen-plus-2025-09-11	A model from the Qwen3 series. Compared to qwen-plus-2025-07-28, it has improved instruction-following capabilities and provides more concise summaries in thinking mode. Deep Thinking. In non-thinking mode, it has enhanced Chinese language comprehension and logical reasoning capabilities. Text generation overview.
Reasoning model	2026-01-04	qwen-plus-2025-07-28	A model from the Qwen3 series. Compared to the previous version, it increases the context length to 1,000,000. For more information about thinking mode, see Deep Thinking. For more information about non-thinking mode, see Text generation overview.
Reasoning model	2026-01-04	qwen-plus	Offers a balance of capabilities. Its inference performance, cost, and speed are between those of Qwen-Max and Qwen-Flash, making it suitable for moderately complex tasks. Model List
Multilingual translation	2026-01-04	qwen-mt-lite	A basic text translation Large Language Model (LLM) from Qwen. It supports translation between 31 languages. It offers a faster response and lower cost than qwen-mt-flash, making it suitable for latency-sensitive scenarios.Translation capabilities (Qwen-MT)
Multilingual translation	2026-01-04	qwen-mt-plus, qwen-mt-flash	The Qwen-MT model is a large language model for machine translation, optimized from the Qwen model. It excels at translation between Chinese and English, between Chinese and minor languages, and between English and minor languages. It supports 26 minor languages, such as Japanese, Korean, French, Spanish, German, Portuguese (Brazil), Thai, Indonesian, Vietnamese, and Arabic. In addition to multilingual translation, it offers features such as terminology intervention, domain prompting, and translation memory to improve translation quality in complex scenarios. Translation capabilities (Qwen-MT).
Text-to-text	2026-01-04	qwen-flash, qwen-flash-2025-07-28	The fastest and most cost-effective model in the Qwen series, suitable for simple jobs. Qwen-Flash

International

In the international deployment mode, endpoints and data storage are located in the Singapore region, and model inference compute resources are dynamically scheduled worldwide, excluding Mainland China.

Type	Time	Specification	Description
Speech synthesis	2026-02-10	cosyvoice-v3-plus, cosyvoice-v3-flash	CosyVoice speech synthesis adds v3 models that support speech synthesis with system voices and cloned voices.Real-time speech synthesis - CosyVoice
Speech synthesis	2026-02-10	qwen3-tts-instruct-flash, qwen3-tts-instruct-flash-2026-01-26	Qwen speech synthesis introduces an instruction-control model that supports precise control of synthesis output through natural language instructions. Speech synthesis - Qwen
Speech synthesis	2026-02-10	qwen3-tts-vd-2026-01-26	Qwen speech synthesis introduces a voice design model that supports creating customized voices through text descriptions. Speech synthesis - Qwen
Speech synthesis	2026-02-10	qwen3-tts-vc-2026-01-22	Qwen speech synthesis introduces a voice cloning model that supports quickly cloning voices based on real audio samples. Speech synthesis - Qwen
Speech synthesis	2026-02-04	qwen3-tts-instruct-flash-realtime, qwen3-tts-instruct-flash-realtime-2026-01-22	Qwen real-time speech synthesis introduces an instruction-control model that supports precise control of speech synthesis output through natural language instructions. Real-time Text-to-Speech - Qwen
Reference-to-video	2026-02-02	wan2.6-r2v-flash	Generate multi-shot videos with automatic dubbing, using a character's appearance from a reference video or image. Wan - reference-to-video
Visual understanding	2026-01-28	qwen3-vl-flash-2026-01-22	The new Qwen-VL snapshot integrates thinking and non-thinking modes. Compared to the snapshot released on October 15, 2025, this model provides significantly improved overall performance and higher-accuracy inference in scenarios such as general visual recognition, security, store inspection, and photo-based problem-solving. Image and video understanding
Speech recognition	2026-01-28	qwen3-asr-flash-filetrans, qwen3-asr-flash-filetrans-2025-11-17	The Qwen3-ASR-Flash-Filetrans series now supports word-level timestamps. By setting the new `enable_words` parameter, you can get millisecond-level word and character alignment information and achieve more semantically accurate, fine-grained sentence segmentation. Audio file recognition - Qwen
Reasoning model	2026-01-27	qwen3-max-2026-01-23	Compared to the snapshot version from September 23, 2025, this model effectively merges thinking and non-thinking modes, significantly enhancing its overall performance. In thinking mode, the model is integrated with three tools: web search, web extractor, and code interpreter. By leveraging external tools during its thought process, it achieves higher accuracy on complex problems. OpenAI-compatible - Responses
Visual understanding	2026-01-23	qwen3-vl-flash-2026-01-22	A new snapshot of Qwen-VL. Compared to the snapshot from October 15, 2025, it effectively merges thinking and non-thinking modes, significantly enhancing the model's overall performance. It achieves higher inference accuracy in business scenarios such as general visual recognition, security monitoring, store/patrol inspections, and photo-based problem solving. Image and video understanding
Image-to-video	2026-01-18	wan2.6-i2v-flash	Supports the generation of videos with and without sound, which are billed independently based on their respective billing rules. It also has multi-shot narrative and audio processing capabilities. Wan - image-to-video - first frame
Image editing	2026-01-18	qwen-image-edit-max, qwen-image-edit-max-2026-01-16	The Qwen Image Edit Max series provides more stable and versatile editing capabilities, enhances industrial design and geometric inference, and improves character consistency and editing precision.Image Editing - Qwen
Speech synthesis	2026-01-16	qwen3-tts-vc-realtime-2026-01-15	Qwen real-time speech synthesis has released a new snapshot model that further optimizes the Voice cloning (Qwen) effect. The synthesized voice is more natural and closer to the original compared to qwen3-tts-vc-realtime-2025-11-27. Real-time Text-to-Speech - Qwen
Text-to-image	2026-01-12	qwen-image-plus-2026-01-09	A new snapshot model for Qwen text-to-image generation, which is a distilled and accelerated version of qwen-image-max that rapidly generates high-quality images. Qwen-text-to-image
Image-to-video	2026-01-08	wan2.2-kf2v-flash	The model can generate a seamless and smooth dynamic video from the input first and last frame images based on a prompt. Wan - image-to-video - first and last frames
Speech recognition	2026-01-06	qwen3-asr-flash, qwen3-asr-flash-2025-09-08	Qwen3-ASR-Flash supports an OpenAI compatible mode. Recognize audio files using Qwen
Text-to-image	2025-12-31	qwen-image-max, qwen-image-max-2025-12-30	The Qwen text-to-image model Max series offers enhanced realism and naturalness compared to the Plus series. It effectively reduces AI-generated artifacts and excels in aspects such as human figure textures, texture details, and text rendering. Qwen text-to-image
Image editing	2025-12-23	qwen-image-edit-plus-2025-12-15	The latest snapshot for Qwen-Image-Editing enhances character consistency, industrial design capabilities, and geometric inference compared to the previous version. It also optimizes the alignment of spatial layout, texture, and style between the edited and original images, resulting in more precise edits. Image Editing - Qwen
Text-to-image	2025-12-22	z-image-turbo	A lightweight text-to-image model that quickly generates high-quality images. It supports bilingual rendering in Chinese and English, complex semantic understanding, multiple styles and themes, and flexibly adapts to various resolutions and aspect ratios. Text-to-image Z-Image
Visual understanding	2025-12-19	qwen3-vl-plus-2025-12-19	The new Qwen-VL snapshot model features improved instruction-following capabilities and lower latency. Image and video understanding
Speech recognition	2025-12-19	qwen3-asr-flash-filetrans, qwen3-asr-flash-filetrans-2025-11-17, qwen3-asr-flash, qwen3-asr-flash-2025-09-08	Added support for speech recognition in 9 more languages, including Czech and Danish. Audio file recognition - Qwen
Speech recognition	2025-12-17	qwen3-asr-flash-realtime, qwen3-asr-flash-realtime-2025-10-27	Speech recognition now supports nine additional languages, including Czech and Danish. Real-time Speech Recognition - Qwen
Speech recognition	2025-12-17	qwen3-asr-flash, qwen3-asr-flash-2025-09-08	Supports audio with any sample rate and sound channel. Audio file recognition - Qwen
Speech recognition	2025-12-17	fun-asr-mtl, fun-asr-mtl-2025-08-25	Support for speech recognition in 31 languages, including Chinese, English, Japanese, and Korean. This feature is ideal for Southeast Asia scenarios. Audio file recognition - Fun-ASR/Paraformer
Voice design	2025-12-16	qwen-voice-design	Qwen has released a voice design model for generating customized voices from text descriptions. Use this model with the qwen3-tts-vd-realtime-2025-12-16 model to generate speech in 10 languages. Voice design (Qwen)
Speech synthesis	2025-12-16	qwen3-tts-vd-realtime-2025-12-16 (snapshot)	The new snapshot model for Qwen real-time speech synthesis uses voices from Voice design (Qwen) for low-latency, high-stability real-time synthesis. It supports multi-language output, automatically adjusts the tone based on the text, and optimizes synthesis for complex text. Real-time Text-to-Speech - Qwen
Text-to-image	2025-12-16	wan2.6-t2i	A new sync interface is added. It supports selecting custom dimensions within the constraints of total pixel area and aspect ratio. Wan - text-to-image V2
Image generation and editing	2025-12-16	wan2.6-image	Supports image editing and mixed image-text output. Wan2.6 - image generation and editing
Image-to-video based on the first frame	2025-12-16	wan2.6-i2v	Adds the multi-shot narrative feature, which supports audio using automatic dubbing or custom audio files. Wan - image-to-video - first frame
Reference-to-video	2025-12-16	wan2.6-r2v	Generates a multi-shot video based on a character's appearance and voice from a reference video. It supports automatic dubbing. Wan - reference-to-video
Text-to-video	2025-12-16	wan2.6-t2v	Adds the multi-shot narrative feature, which supports audio using automatic dubbing or custom audio files. Wan - text-to-video
Speech recognition	2025-12-12	fun-asr, fun-asr-2025-11-07	Feature updates for Fun-ASR audio file recognition: Added support for singing recognition to transcribe entire songs. Audio file recognition - Fun-ASR/Paraformer
Omni-modal	2025-12-04	qwen3-omni-flash-2025-12-01	The latest Qwen Omni snapshot model increases the number of supported timbres to 49 and features a significant upgrade to its instruction-following capabilities, enabling it to efficiently understand text, images, audio, and video. Non-real-time (Qwen-Omni)
Real-time multimodal	2025-12-04	qwen3-omni-flash-realtime-2025-12-01	The latest snapshot model for the Qwen Omni real-time version offers low-latency multimodal interaction. The number of supported timbres is increased to 49, and the model's instruction-following ability and interactive experience are significantly upgraded. Real-time (Qwen-Omni-Realtime)
Speech translation	2025-12-04	qwen3-livetranslate-flash, qwen3-livetranslate-flash-2025-12-01	Qwen3-LiveTranslate-Flash is an audio and video translation model that translates between 18 languages, such as Chinese, English, Russian, and French. It leverages visual context to improve translation accuracy and provides both text and speech output. Audio and Video File Translation – Qwen
Multilingual translation	2025-12-02	qwen-mt-lite	A basic text translation Large Language Model (LLM) from Qwen. It supports translation between 31 languages. It offers a faster response and lower cost than qwen-mt-flash, making it suitable for latency-sensitive scenarios. Machine translation (Qwen-MT)
Voice cloning	2025-11-27	qwen-voice-enrollment	Qwen released a voice cloning model. It generates a highly similar voice from just over 5 seconds of audio. When used with the qwen3-tts-vc-realtime-2025-11-27 model, it can create a high-fidelity clone of a person's voice and output it in real time across 11 languages. Voice cloning (Qwen)
Speech synthesis	2025-11-27	qwen3-tts-vc-realtime-2025-11-27 (snapshot)	Qwen real-time speech synthesis released a new snapshot model. It uses voices generated by Voice cloning (Qwen) for low-latency, high-stability real-time synthesis. The model supports multilingual output. It can automatically adjust the tone based on the text and optimize the synthesis performance for complex text. Real-time Text-to-Speech - Qwen
Speech synthesis	2025-11-27	qwen3-tts-flash-realtime-2025-11-27 (snapshot)	Qwen real-time speech synthesis released a new snapshot model. It features low latency and high stability. The model offers a richer selection of voices, and each voice supports multilingual output. It automatically adjusts the tone based on the text and enhances synthesis performance for complex text. Real-time Text-to-Speech - Qwen
Speech synthesis	2025-11-27	qwen3-tts-flash-2025-11-27 (snapshot)	Qwen Speech Synthesis published a new snapshot model. It offers a richer selection of voices. Each voice supports multilingual output. The model automatically adjusts the tone based on the text and provides optimized synthesis for complex text. Qwen text-to-speech (TTS)
Text extraction	2025-11-21	qwen-vl-ocr-2025-11-20 (snapshot)	This snapshot of the Qwen text extraction model is based on the Qwen3-VL architecture and significantly improves document parsing and text localization. Text Extraction
Speech recognition	2025-11-20	qwen3-asr-flash-filetrans, qwen3-asr-flash-filetrans-2025-11-17 (snapshot)	Qwen audio file recognition released a new model. It is designed for asynchronous transcription of audio files and supports recordings up to 12 hours long. Audio file recognition - Qwen
Speech recognition	2025-11-19	fun-asr-2025-11-07 (snapshot)	Fun-ASR audio file recognition released a new snapshot model. This model optimizes far-field voice activity detection (VAD) to improve recognition accuracy and stability. In addition to Chinese and English, the model now supports multiple Chinese dialects and Japanese. Audio file recognition - Fun-ASR/Paraformer
Multilingual translation	2025-11-11	qwen-mt-flash	Compared to qwen-mt-turbo, this model supports streaming incremental output and offers improved overall performance. Translation capabilities (Qwen-MT)
Image-to-video	2025-11-10	wan2.2-animate-move	This model transfers the actions and expressions of a character from a template video to a single static image to generate a video of the character in motion. Wan - image to animation
Image-to-video	2025-11-10	wan2.2-animate-mix	This model replaces the main character in a reference video with a character from an image. It preserves the original video's scene, lighting, and tone for seamless character replacement. Wan - video character swap
Reasoning model	2025-11-03	qwen3-max-preview	The thinking mode of the qwen3-max-preview model features significantly improved overall inference capabilities. It performs especially well in agent programming, common-sense reasoning, and tasks related to math, science, and general purposes. Deep Thinking
Image editing	2025-10-31	qwen-image-edit-plus, qwen-image-edit-plus-2025-10-30	Built on qwen-image-edit, this model has optimized inference performance and system stability. It significantly reduces the response time for image generation and editing and supports returning multiple images in a single request. Image Editing - Qwen
Real-time speech recognition	2025-10-27	qwen3-asr-flash-realtime, qwen3-asr-flash-realtime-2025-10-27	The Qwen real-time speech recognition large language model (LLM) features automatic language detection. It can detect 11 language types and provides accurate transcription in complex audio environments. Real-time Speech Recognition - Qwen
Visual understanding	2025-10-21	qwen3-vl-32b-thinking, qwen3-vl-32b-instruct	A 32B dense model from the Qwen3-VL series. Its overall performance is second only to the Qwen3-VL-235B model. It excels in document recognition and understanding, spatial intelligence and object recognition, and visual 2D detection/spatial reasoning. This makes it suitable for complex perception tasks in general scenarios. Image and video understanding
Visual understanding	2025-10-16	qwen3-vl-flash, qwen3-vl-flash-2025-10-15	A small-scale visual understanding model from the Qwen3 series. It effectively combines thinking and non-thinking modes. Compared to the open-source Qwen3-VL-30B-A3B, it delivers better performance and a faster response time. Image and video understanding
Visual understanding	2025-10-14	qwen3-vl-8b-thinking, qwen3-vl-8b-instruct	An 8B dense model from the Qwen3-VL series. It uses less GPU memory and can perform multimodal understanding and inference. It supports ultra-long contexts such as long videos and documents, and visual 2D/3D positioning. It also has comprehensive spatial intelligence and object recognition capabilities. Image and video understanding
Visual understanding	2025-10-03	qwen3-vl-30b-a3b-thinking, qwen3-vl-30b-a3b-instruct	Based on the new generation of open-source Qwen3-VL model, this model has a fast response time. It features stronger multimodal understanding and inference, visual agent capabilities, and support for ultra-long contexts such as long videos and long documents. It also has comprehensively upgraded spatial intelligence and object recognition capabilities, making it capable of handling complex real-world tasks. Image and video understanding
Visual understanding	2025-09-23	qwen3-vl-plus, qwen3-vl-plus-2025-09-23, qwen3-vl-235b-a22b-thinking, qwen3-vl-235b-a22b-instruct	A visual understanding model from the Qwen3 series. It effectively combines thinking and non-thinking modes, and its visual agent capabilities are among the best in the world. This version features comprehensive upgrades in visual encoding, spatial intelligence, and multimodal thinking. Its visual perception and recognition capabilities are also significantly improved. Image and video understanding
Text-to-image	2025-09-23	qwen-image-plus	This model excels at rendering complex text, especially for Chinese and English. It creates complex layouts that mix images and text. It is also more cost-effective than qwen-image. Qwen text-to-image
Code model	2025-09-23	qwen3-coder-plus-2025-09-23	Compared to the previous version (snapshot from July 22), this model has improved performance on downstream tasks and greater robustness in tool calling. It also features enhanced code security. Coding capabilities (Qwen-Coder)
Reasoning model	2025-09-11	qwen-plus-2025-09-11	This model is part of the Qwen3 series. Compared to qwen-plus-2025-07-28, it offers improved instruction-following and generates more concise summaries in thinking mode. Deep Thinking. In non-thinking mode, it provides enhanced Chinese comprehension and logical reasoning. Text generation overview.
Reasoning model	2025-09-11	qwen3-next-80b-a3b-thinking, qwen3-next-80b-a3b-instruct	This is a new generation of open-source models based on Qwen3. The thinking model has improved instruction-following capabilities and provides more concise summary responses compared to qwen3-235b-a22b-thinking-2507. Deep Thinking. The instruct model has enhanced Chinese understanding, logical reasoning, and text generation capabilities compared to qwen3-235b-a22b-instruct-2507. Text generation overview.
Text-to-text	2025-09-05	qwen3-max-preview	Qwen-Max is a preview model based on Qwen3. It offers a significant improvement in general capabilities over the Qwen 2.5 series. The model shows greatly enhanced performance in understanding Chinese and English text, following complex instructions, and handling subjective open-ended tasks. Its multilingual and tool calling capabilities are also stronger. The model is less prone to knowledge-based hallucinations. Qwen-Max
Image editing	2025-08-19	qwen-image-edit	The Qwen image editing model supports precise text editing in both Chinese and English, rendering intent, detail enhancement, style transfer, adding or removing objects, and changing positions and actions. It can perform complex image and text editing. Image Editing - Qwen
Visual understanding	2025-08-18	qwen-vl-plus-2025-08-15	A visual understanding model. It offers significant improvements in object detection, localization, and multilingual processing. Image and video understanding
Text-to-image	2025-08-14	qwen-image	The Qwen-Image model excels at complex text rendering, especially for Chinese and English text, and can create complex mixed layouts of images and text. Qwen text-to-image
Visual understanding	2025-08-13	qwen-vl-max-2025-08-13	A visual understanding model. It has comprehensively improved visual understanding metrics and significantly enhanced capabilities in math, reasoning, object recognition, and multilingual processing. Image and video understanding
Code model	2025-08-05	qwen3-coder-flash, qwen3-coder-flash-2025-07-28	The fastest and most cost-effective model in the Qwen-Coder series. Coding capabilities (Qwen-Coder).
Reasoning model	2025-08-05	qwen-flash, qwen-flash-2025-07-28	The fastest and most cost-effective model in the Qwen series, suitable for simple jobs. Models
reasoning model	2025-07-30	qwen-plus-2025-07-28	A model from the Qwen3 series. Compared to the previous version, it increases the context length to 1,000,000. For more information about thinking mode, see Deep Thinking. For more information about non-thinking mode, see Text generation overview.
Reasoning model	2025-07-30	qwen3-30b-a3b-thinking-2507 qwen3-30b-a3b-instruct-2507	An upgraded version of qwen3-30b-a3b. The thinking model has improved logical, general, knowledge-enhanced, and creative capabilities. Deep Thinking. The instruct model has improved creative capabilities and model security. Text generation overview.
Image-to-video	2025-07-28	wan2.2-i2v-plus	Compared to the 2.1 model, the new version has significantly improved image detail and motion stability. The generation speed is increased by up to 50%. Wan - image-to-video - first frame
Text-to-video	2025-07-28	wan2.2-t2v-plus	Compared to the 2.1 model, this new version has significantly improved image detail and motion stability. Generation speed is 50% faster. Text-to-video
Text-to-image	2025-07-28	wan2.2-t2i-flash, wan2.2-t2i-plus	Compared to the 2.1 model, this new version is comprehensively upgraded in creativity, stability, and photorealism. The generation speed is increased by 50%. Text-to-image.
Reasoning model	2025-07-24	qwen3-235b-a22b-thinking-2507, qwen3-235b-a22b-instruct-2507	An upgraded version of qwen3-235b-a22b. The thinking model has major improvements in logic, general capabilities, knowledge enhancement, and creativity. It is suitable for difficult scenarios that require strong inference. Deep Thinking. The instruct model has improved creative capabilities and model security. Text generation overview.
Code model	2025-07-23	qwen3-coder, qwen3-coder-plus-2025-07-22	A code generation model based on Qwen3. It has powerful Coding Agent capabilities and excels at tool calling and environment interaction. It combines excellent code capabilities with general-purpose abilities. Coding capabilities (Qwen-Coder)
Visual understanding	2025-06-04	qwen-vl-plus-2025-05-07	A visual understanding model. The model has significantly improved capabilities in math, reasoning, and understanding content from monitoring videos. Image and video understanding.
Text-to-image	2025-05-22	wan2.1-t2i-turbo, wan2.1-t2i-plus	Generates an image from a single sentence. The model generates images of any resolution and aspect ratio, up to 2 million pixels. It is available in two versions: turbo and plus. Text-to-image
Visual understanding	2025-05-16	qwen-vl-max-2025-04-08	A visual understanding model. It has improved math and reasoning abilities. The response style is adjusted to align with human preferences, and the detail and format clarity of model responses are significantly improved. Image and video understanding
Visual understanding	2025-05-16	qwen-vl-plus-2025-01-25	A visual understanding model. It belongs to the Qwen2.5-VL series. Compared to the previous version, this model expands the context to 128K and significantly enhances image and video understanding.
Video editing	2025-05-19	wan2.1-vace-plus	A general-purpose video editing model. The model supports multimodal input, combining images, videos, and text prompts. It performs various tasks such as image-to-video (generating a video based on the entity or background of a reference image) and video repainting (extracting motion features from an input video to generate a video). Wan - general video editing
Reasoning model	2025-04-28	Qwen3 commercial models qwen-plus-2025-04-28, qwen-turbo-2025-04-28 Qwen3 open-source models qwen3-235b-a22b, qwen3-30b-a3b, qwen3-32b, qwen3-14b, qwen3-8b, qwen3-4b, qwen3-1.7b, qwen3-0.6b	Qwen3 models support both thinking and non-thinking modes. Switch between the two modes using the `enable_thinking` parameter. In addition, the capabilities of Qwen3 models have been greatly enhanced: Inference capability: In evaluations for math, code, and logical reasoning, the models significantly outperform QwQ and other non-reasoning models of similar size, reaching top-tier industry levels. Human preference alignment: Capabilities for creative writing, role-playing, multi-turn conversation, and instruction following are greatly improved. The general capabilities significantly exceed those of models of a similar size. Agent capability: The models achieve industry-leading performance in both reasoning and non-reasoning modes. They can accurately call external tools. Multilingual capability: The models support over 100 languages and dialects. Capabilities for multilingual translation, instruction understanding, and common-sense reasoning are significantly improved. Response format fixes: This version fixes response format issues from previous versions, such as abnormal Markdown, mid-sentence truncation, and incorrect boxed output. For more information about thinking mode, see Deep Thinking. For more information about non-thinking mode, see Text generation overview.
Text-to-video	2025-04-21	wan2.1-t2v-turbo, wan2.1-t2v-plus	Generates a video from a single sentence. It has powerful instruction-following capabilities, supports large and complex movements, and can replicate real-world physics. The generated videos feature rich artistic styles and cinematic-quality visuals. Wan - text-to-video.
Image-to-video	2025-04-21	wan2.1-kf2v-plus, wan2.1-i2v-turbo, wan2.1-i2v-plus,	Based on the input first and last frame images, the model can generate a smooth and fluid dynamic video according to the prompt. Wan - image-to-video - first and last frames Uses an input image as the first frame of the video and then generates the video based on a prompt. Wan - image-to-video - first frame.
Visual reasoning	2025-03-28	qvq-max, qvq-max-latest, qvq-max-2025-03-25	A visual reasoning model. It supports visual input and chain-of-thought output, demonstrating stronger capabilities in math, programming, visual analysis, creation, and general tasks. Visual reasoning
Omni-modal	2025-03-26	qwen2.5-omni-7b	A new multimodal Large Language Model (LLM) from Qwen for understanding and generation. It supports text, image, speech, and video input, and outputs text and audio. It provides two natural conversational voices. Non-real-time (Qwen-Omni).
Visual understanding	2025-03-24	qwen2.5-vl-32b-instruct	A visual understanding model. Its ability to solve math problems is close to the level of Qwen2.5VL-72B. The response style has been significantly adjusted to align with human preferences. The model now gives much more detailed and clearly formatted answers, especially for objective questions in math, logical reasoning, and knowledge Q&A. Image and video understanding
Reasoning model	2025-03-06	qwq-plus	The QwQ reasoning model, trained on the Qwen2.5 model, greatly improves its reasoning ability through reinforcement learning. The model's core metrics for math and code (AIME 24/25, LiveCodeBench) and some general metrics (IFEval, LiveBench, etc.) reach the level of the full-performance DeepSeek-R1. Deep Thinking
Visual understanding	2025-01-27	qwen2.5-vl-3b-instruct qwen2.5-vl-7b-instruct qwen2.5-vl-72b-instruct	Compared to the Qwen2-VL Large Language Model (LLM), this model has the following improvements: It has significantly improved capabilities in instruction following, mathematical calculations, code generation, and structured output (JSON output). It supports unified parsing of visual content, such as text, charts, and layouts in images. The model can also accurately locate visual elements and supports both detection frames and coordinate points. It can understand long video files (up to 10 minutes), provide event localization with second-level precision, and recognize the sequence and speed of events. Image and video understanding
Text-to-text	2025-01-27	qwen-max-2025-01-25 qwen2.5-14b-instruct-1m qwen2.5-7b-instruct-1m	The qwen-max-2025-01-25 model (also known as Qwen2.5-Max): The best-performing model in the Qwen series. It has significant improvements in code writing and understanding, logical reasoning, and multilingual capabilities. The response style is greatly adjusted to align with human preferences. The detail and format clarity of model responses are significantly improved, with targeted enhancements in content creation, JSON format adherence, and role-playing. Text generation overview The qwen2.5-14b-instruct-1m and qwen2.5-7b-instruct-1m models: Compared to the qwen2.5-14b-instruct and qwen2.5-7b-instruct models, the context length is increased to 1,000,000. Text generation overview
Text-to-text	2025-01-17	qwen-plus-2025-01-12	Compared to the qwen-plus-2024-12-20 model, this version has improved overall capabilities in both Chinese and English. It shows significant improvements in Chinese and English common sense and reading comprehension. The ability to switch naturally between different languages, dialects, and styles is also significantly improved, as is its Chinese instruction following capability. qwen-plus-2025-01-12.
Multilingual translation	2024-12-25	qwen-mt-plus qwen-mt-turbo	The Qwen-MT model is a machine translation large language model optimized based on the Qwen model. It excels at translation between Chinese and English, Chinese and minor languages, and English and minor languages. The minor languages include 26 languages such as Japanese, Korean, French, Spanish, German, Portuguese (Brazil), Thai, Indonesian, Vietnamese, and Arabic. In addition to multilingual translation, the model supports features such as terminology intervention, domain prompting, and translation memory to improve translation quality in complex scenarios. Machine translation (Qwen-MT).
Visual understanding	2024-12-18	qwen2-vl-72b-instruct	Achieved state-of-the-art results in multiple visual understanding benchmarks, significantly enhancing the processing capabilities for multimodal tasks. Image and video understanding.

US

In the US deployment mode, endpoints and data storage are located in the US (Virginia) region, and model inference compute resources are restricted to the United States.

Type	Time	Specification	Description
Image-to-video based on the first frame	2026-01-04	wan2.6-i2v-us	Adds the multi-shot narrative feature, which supports audio using automatic dubbing or custom audio files. Wan - image-to-video - first frame
Text-to-video	2026-01-04	wan2.6-t2v-us	Adds the multi-shot narrative feature and provides audio support through automatic dubbing or custom audio files. Wan - text-to-video
Speech recognition	2026-01-04	qwen3-asr-flash-us, qwen3-asr-flash-2025-09-08-us	Supports audio with any sample rate and sound channel. Audio file recognition - Qwen
Visual understanding	2026-01-04	qwen3-vl-flash-us, qwen3-vl-flash-2025-10-15-us	A small-scale visual understanding model from the Qwen3 series. It effectively combines thinking and non-thinking modes. Compared to the open-source Qwen3-VL-30B-A3B, it delivers better performance and a faster response time. Image and video understanding
Reasoning model	2026-01-04	qwen-plus-2025-12-01-us	A model from the Qwen3 series. Compared to qwen-plus-2025-07-28, it has improved instruction-following capabilities and provides more concise summary responses in thinking mode. Deep Thinking. In non-thinking mode, its Chinese understanding and logical reasoning abilities are enhanced. Text generation overview
Reasoning model	2026-01-04	qwen-plus-us	Offers a balance of capabilities. Its inference performance, cost, and speed are between those of Qwen-Max and Qwen-Flash, making it suitable for moderately complex tasks. Model List
Text-to-text	2026-01-04	qwen-flash-us, qwen-flash-2025-07-28-us	The fastest and most cost-effective model in the Qwen series, suitable for simple jobs. Qwen-Flash

Mainland China

In the Mainland China deployment mode, endpoints and data storage are located in the Beijing region, and model inference compute resources are restricted to Mainland China.

Type	Time	Specification	Description
Speech synthesis	2026-02-10	qwen3-tts-instruct-flash, qwen3-tts-instruct-flash-2026-01-26	Qwen speech synthesis introduces an instruction-control model that supports precise control of synthesis output through natural language instructions. Speech synthesis - Qwen
Speech synthesis	2026-02-10	qwen3-tts-vd-2026-01-26	Qwen speech synthesis introduces a voice design model that supports creating customized voices through text descriptions. Speech synthesis - Qwen
Speech synthesis	2026-02-10	qwen3-tts-vc-2026-01-22	Qwen speech synthesis introduces a voice cloning model that supports quickly cloning voices based on real audio samples. Speech synthesis - Qwen
Speech synthesis	2026-02-04	qwen3-tts-instruct-flash-realtime, qwen3-tts-instruct-flash-realtime-2026-01-22	Qwen real-time speech synthesis introduces an instruction-control model that supports precise control of speech synthesis output through natural language instructions. Real-time Text-to-Speech - Qwen
Reference-to-video	2026-02-02	wan2.6-r2v-flash	Generate multi-shot videos with automatic dubbing, using a character's appearance from a reference video or image. Wan - reference-to-video
Text-to-text & image understanding	2026-01-30	kimi-k2.5	This visual understanding model from Moonshot AI achieving open-source SoTA performance in Agent, code, visual understanding, and a range of general intelligent tasks. It also supports image, video, and text input, thinking and non-thinking modes, and dialogue and Agent tasks. Kimi
Speech recognition	2026-01-28	qwen3-asr-flash-filetrans, qwen3-asr-flash-filetrans-2025-11-17	The Qwen3-ASR-Flash-Filetrans series now supports word-level timestamps. By setting the new `enable_words` parameter, you can get millisecond-level word and character alignment information and achieve more semantically accurate, fine-grained sentence segmentation. Audio file recognition - Qwen
Reasoning model	2026-01-27	qwen3-max-2026-01-23	Compared to the snapshot version from September 23, 2025, this model effectively merges thinking and non-thinking modes, significantly enhancing its overall performance. In thinking mode, the model is integrated with three tools: web search, web extractor, and code interpreter. By leveraging external tools during its thought process, it achieves higher accuracy on complex problems. OpenAI-compatible - Responses
Visual understanding	2026-01-23	qwen3-vl-flash-2026-01-22	A new snapshot of Qwen-VL. Compared to the snapshot from October 15, 2025, it effectively merges thinking and non-thinking modes, significantly enhancing the model's overall performance. It achieves higher inference accuracy in business scenarios such as general visual recognition, security monitoring, store/patrol inspections, and photo-based problem solving. Image and video understanding
Image-to-video	2026-01-17	wan2.6-i2v-flash	Supports the generation of videos with sound and silent videos, which are billed independently according to their respective billing rules. It also provides multi-shot narrative and audio processing capabilities. Wan - image-to-video - first frame
Image editing	2026-01-17	qwen-image-edit-max, qwen-image-edit-max-2026-01-16	The Qwen Image Editing Model Max series offers more stable and richer editing capabilities, enhanced industrial design and geometric inference capabilities, and improved role consistency and editing precision. Image Editing - Qwen
Speech synthesis	2026-01-16	qwen3-tts-vc-realtime-2026-01-15	A new snapshot model is added for Qwen real-time speech synthesis. The performance of voice cloning is further optimized. The new model sounds more natural and is closer to the original voice than qwen3-tts-vc-realtime-2025-11-27. Real-time Text-to-Speech - Qwen
Text-to-image	2026-01-12	qwen-image-plus-2026-01-09	This new snapshot model for Qwen text-to-image is a distilled and accelerated version of qwen-image-max that rapidly generates high-quality images. Qwen text-to-image
Reasoning model	2026-01-12	deepseek-v3.2	The deepseek-v3.2 model supports implicit and explicit caching to improve response speed and reduce usage costs without affecting response quality. Context cache
Image-to-video	2026-01-08	wan2.2-kf2v-flash	Based on a prompt and the input start and end frames, the model can generate a smooth, dynamic video. Wan - image-to-video - first and last frames
Speech recognition	2026-01-06	qwen3-asr-flash, qwen3-asr-flash-2025-09-08	Qwen3-ASR-Flash supports OpenAI compatible mode. Audio file recognition - Qwen
Speech synthesis	2026-01-05	cosyvoice-v3-flash	Speech Synthesis CosyVoice adds 24 new voices (Voice list): Dialects: Long Jiayi, Long Laotie Overseas marketing: loongkyong, loongtomoka Poetry reading: Long Fei Voice assistant: Long Xiaochun, Long Xiaoxia, YUMI Social companionship: Long Cheng, Long Ze, Long Zhe, Long Yan, Long Xing, Long Tian, Long Wan, Long Qiang, Long Feifei, Long Hao Audiobook: Long Sanshu, Long Yuan, Long Yue, Long Xiu, Long Nan News report: Long Shu
Text-to-image	2025-12-31	qwen-image-max, qwen-image-max-2025-12-30	The Max series of the Qwen image generation model enhances image realism and naturalness compared to the Plus series, effectively reduces AI-generated artifacts, and excels in areas such as character textures, texture details, and text rendering. Qwen text-to-image
Image editing	2025-12-23	qwen-image-edit-plus-2025-12-15	The latest snapshot model for Qwen-Image-Editing enhances character consistency, industrial design capabilities, and geometric inference compared to the previous version. It also optimizes the alignment of spatial layout, texture, and style between the edited and original images, resulting in more precise edits. Image Editing - Qwen
Text-to-image	2025-12-19	z-image-turbo	A lightweight text-to-image model that quickly generates high-quality images. It supports bilingual rendering in Chinese and English, complex semantic understanding, multiple styles and themes, and flexible adaptation to various resolutions and aspect ratios. Text-to-image Z-Image
Visual understanding	2025-12-19	qwen3-vl-plus-2025-12-19	The new Qwen-VL snapshot model features improved instruction-following capabilities and lower latency. Image and video understanding
Speech recognition	2025-12-19	qwen3-asr-flash-filetrans, qwen3-asr-flash-filetrans-2025-11-17, qwen3-asr-flash, qwen3-asr-flash-2025-09-08	Speech recognition now supports 9 more languages, including Czech and Danish. Audio file recognition - Qwen
Speech recognition	2025-12-17	qwen3-asr-flash-realtime, qwen3-asr-flash-realtime-2025-10-27	Speech recognition is now supported in nine additional languages, including Czech and Danish. Real-time Speech Recognition - Qwen
Speech recognition	2025-12-17	qwen3-asr-flash, qwen3-asr-flash-2025-09-08	Supports audio with any sample rate and sound channel. Audio file recognition - Qwen
Speech recognition	2025-12-17	fun-asr-mtl, fun-asr-mtl-2025-08-25	Supports speech recognition (ASR) for 31 languages, including Chinese, English, Japanese, and Korean. Ideal for scenarios in Southeast Asia. Audio file recognition - Fun-ASR/Paraformer
Speech synthesis	2025-12-16	qwen3-tts-vd-realtime-2025-12-16 (snapshot)	The new snapshot model for Qwen's real-time speech synthesis uses Voice design (Qwen)-generated voices for low-latency, high-stability real-time synthesis. It supports multi-language output, automatically adjusts the tone based on the text, and optimizes synthesis performance for complex text. Real-time Text-to-Speech - Qwen
Text-to-image	2025-12-16	wan2.6-t2i	A new sync interface is added. It supports selecting custom dimensions within the constraints of total pixel area and aspect ratio. Wan - text-to-image V2
Image generation and editing	2025-12-16	wan2.6-image	Supports image editing and mixed image-text output. Wan2.6 - image generation and editing
Image-to-video based on the first frame	2025-12-16	wan2.6-i2v	Adds the multi-shot narrative feature and audio support, allowing you to use automatic dubbing or upload a custom audio file. Wan - image-to-video - first frame
Reference-to-video	2025-12-16	wan2.6-r2v	Generates a multi-shot video based on a character's appearance and voice from a reference video. It also supports automatic dubbing. Wan - reference-to-video
Text-to-video	2025-12-16	wan2.6-t2v	Adds the multi-shot narrative feature, which supports audio using automatic dubbing or custom audio files. Text-to-Video
Speech recognition	2025-12-12	fun-asr, fun-asr-2025-11-07	Feature updates for Fun-ASR audio file recognition: Added support for singing recognition to transcribe entire songs. Audio file recognition - Fun-ASR/Paraformer
Speech synthesis	2025-12-11	cosyvoice-v3-flash, cosyvoice-v3-plus	The cosyvoice-v3-flash model now includes five new system voices: longanrou_v3, longyingjing_v3, longyingling_v3, longanling_v3, and longhan_v3. All these voices support the timestamp and SSML features. Voice list The voice cloning feature for the cosyvoice-v3-flash and cosyvoice-v3-plus models has been enhanced. The feature now supports timestamps and SSML, and provides improved prosody. To try the enhancement, create a new voice. CosyVoice voice cloning API
Omni-modal	2025-12-04	qwen3-omni-flash-2025-12-01	The latest Qwen Omni snapshot model increases the number of supported timbres to 49 and features a significant upgrade to its instruction-following capabilities, enabling it to efficiently understand text, images, audio, and video.Non-real-time (Qwen-Omni)
Real-time multimodal	2025-12-04	qwen3-omni-flash-realtime-2025-12-01	The latest snapshot model for the Qwen Omni real-time version offers low-latency multimodal interaction. The number of supported timbres is increased to 49, and the model's instruction-following ability and interactive experience are significantly upgraded. Real-time (Qwen-Omni-Realtime)
Speech translation	2025-12-04	qwen3-livetranslate-flash, qwen3-livetranslate-flash-2025-12-01	Qwen3-LiveTranslate-Flash is an audio and video translation model that translates between 18 languages, such as Chinese, English, Russian, and French. It leverages visual context to improve translation accuracy and provides both text and speech output. Audio and Video File Translation – Qwen
Reasoning model	2025-12-04	deepseek-v3.2	DeepSeek-V3.2 is the official model that introduces DeepSeek Sparse Attention, a sparse attention mechanism. It is also the first DeepSeek model to integrate thinking with tool usage, and it supports tool calling in both thinking and non-thinking modes. DeepSeek - Model Studio
Multilingual translation	2025-12-02	qwen-mt-lite	The Qwen basic text translation Large Language Model (LLM) supports translation between 31 languages. Compared to qwen-mt-flash, this model provides faster responses at a lower cost, making it suitable for latency-sensitive scenarios. Translation capabilities (Qwen-MT).
Voice cloning	2025-11-27	qwen-voice-enrollment	Qwen released a voice cloning model. It can quickly generate a highly similar voice from an audio clip of 5 seconds or more. When combined with the qwen3-tts-vc-realtime-2025-11-27 model, it can clone a person's voice with high fidelity and output it in real time across 11 languages. Voice cloning (Qwen)
Speech synthesis	2025-11-27	qwen3-tts-vc-realtime-2025-11-27 (snapshot)	Qwen real-time speech synthesis has released a new snapshot model that provides low-latency, high-stability real-time synthesis using voices generated by Voice cloning (Qwen). The model also supports multilingual output, automatically adjusts the tone based on the text, and optimizes synthesis for complex text. Real-time Text-to-Speech - Qwen
Speech synthesis	2025-11-27	qwen3-tts-flash-realtime-2025-11-27 (snapshot)	Qwen real-time speech synthesis released a new snapshot model. It offers low latency and high stability. It provides more voice options, and a single voice can support multilingual output. The model automatically adjusts tone based on the text and improves synthesis for complex text. Real-time Text-to-Speech - Qwen
Speech synthesis	2025-11-27	qwen3-tts-flash-2025-11-27 (snapshot)	Qwen Speech Synthesis released a new snapshot model. It offers more voice options, and a single voice supports multilingual output. The model automatically adapts its tone to the text and provides optimized synthesis for complex text. Speech synthesis - Qwen
Text extraction	2025-11-21	qwen-vl-ocr-2025-11-20 (snapshot)	This snapshot of the Qwen text extraction model is based on the Qwen3-VL architecture and significantly improves document parsing and text localization.Text Extraction
Speech recognition	2025-11-20	qwen3-asr-flash-filetrans, qwen3-asr-flash-filetrans-2025-11-17 (snapshot)	Qwen Audio File Recognition released a new model. This model is designed for the asynchronous transcription of audio files and supports recordings up to 12 hours long. Audio File Recognition - Qwen
Speech synthesis	2025-11-19	cosyvoice-v3-flash	This version improves pronunciation accuracy and voice similarity compared to previous versions. It also supports more languages, including German, Spanish, French, Italian, and Russian. Real-time speech synthesis - CosyVoice
Reasoning model	2025-11-11	kimi-k2-thinking	This is a thinking model from Moonshot AI. It has general agent and reasoning capabilities, excels at deep reasoning, and can solve complex problems through multi-step tool calling. Kimi
Multilingual translation	2025-11-10	qwen-mt-flash	Compared to qwen-mt-turbo, this model supports streaming incremental output and has improved overall performance.Translation capabilities (Qwen-MT)
Image-to-video	2025-11-04	wan2.2-animate-mix	Replaces the main character in a reference video with the character from an input image, while preserving the original video's scene, lighting, and tone for a seamless character replacement. Wan - Video character replacement
Reasoning model	2025-11-03	qwen3-max-preview	The thinking mode of the qwen3-max-preview model provides significant improvements in overall reasoning capabilities, especially delivering superior performance in agent programming, common-sense reasoning, math, science, and general tasks. Deep Thinking
Image-to-video	2025-11-03	wan2.2-animate-move	Transfers the actions and expressions of a character from a template video to a single static character image to generate a video of the character in motion. Wan - Image-to-action
Image editing	2025-10-31	qwen-image-edit-plus, qwen-image-edit-plus-2025-10-30	Based on qwen-image-edit, this model offers optimized inference performance and system stability. It greatly reduces the response time for image generation and editing, and supports returning multiple images in a single request. Image Editing - Qwen
Real-time speech recognition	2025-10-27	qwen3-asr-flash-realtime, qwen3-asr-flash-realtime-2025-10-27	The Qwen real-time speech recognition large model features automatic language detection. It can recognize 11 language types and transcribe audio accurately in complex environments. Real-time Speech Recognition - Qwen
Visual understanding	2025-10-21	qwen3-vl-32b-thinking, qwen3-vl-32b-instruct	A 32B dense model from the Qwen3-VL series. Its overall performance is second only to the Qwen3-VL-235B model. It excels in document recognition and understanding, spatial intelligence, object recognition, 2D visual detection, and spatial reasoning. It is suitable for complex perception tasks in general scenarios. Image and video understanding
Visual understanding	2025-10-16	qwen3-vl-flash, qwen3-vl-flash-2025-10-15	A small-sized visual understanding model from the Qwen3 series. It effectively combines thinking and non-thinking modes. Compared to the open-source Qwen3-VL-30B-A3B, it performs better and responds faster. Image and video understanding
Visual understanding	2025-10-14	qwen3-vl-8b-thinking, qwen3-vl-8b-instruct	An 8B dense open-source model from the Qwen3-VL series, available in both thinking and non-thinking versions. It uses less GPU memory and can perform multimodal understanding and reasoning. It supports ultra-long contexts such as long videos and long documents, 2D/3D visual positioning, and comprehensive spatial intelligence and object recognition. Image and video understanding
Visual understanding	2025-10-03	qwen3-vl-30b-a3b-thinking, qwen3-vl-30b-a3b-instruct	Based on the new-generation, open-source Qwen3-VL model, it is available in both thinking and non-thinking versions. It has a fast response speed and stronger capabilities for multimodal understanding, reasoning, and visual agent tasks. It also supports ultra-long contexts such as long videos and documents. Its spatial intelligence and object recognition capabilities are fully upgraded to handle complex real-world tasks. Image and video understanding
Reasoning model	2025-09-30	deepseek-v3.2-exp	A hybrid inference architecture model that supports both thinking and non-thinking modes. It introduces a sparse attention mechanism to improve training and inference efficiency for long texts. It is priced lower than deepseek-v3.1. DeepSeek - Model Studio.
Text-to-image	2025-09-23	qwen-image-plus	This model excels at rendering complex text, especially Chinese and English. It can create complex mixed-media layouts of images and text. It is more cost-effective than qwen-image. Text-to-image (Qwen-Image).
Visual understanding	2025-09-23	qwen3-vl-plus, qwen3-vl-plus-2025-09-23, qwen3-vl-235b-a22b-thinking, qwen3-vl-235b-a22b-instruct	The Qwen3 series of visual understanding models effectively combines thinking and non-thinking modes, and their visual agent capabilities are world-class. This version features comprehensive upgrades in visual encoding, spatial intelligence, and multimodal thinking. Their visual perception and recognition capabilities are significantly improved. Image and video understanding
Code model	2025-09-23	qwen3-coder-plus-2025-09-23	Compared to the previous version (July 22 snapshot), this version has improved robustness for downstream tasks and tool calling, and enhanced code security. Coding capabilities (Qwen-Coder)
Reasoning model	2025-09-11	qwen-plus-2025-09-11	This model is part of the Qwen3 series. Compared to qwen-plus-2025-07-28, it follows instructions better and provides more concise summaries in thinking mode. Deep Thinking. In non-thinking mode, it has enhanced Chinese language understanding and logical reasoning. Text generation overview
Reasoning model	2025-09-11	qwen3-next-80b-a3b-thinking, qwen3-next-80b-a3b-instruct	A new generation of open-source models based on Qwen3. The thinking model has improved instruction-following capabilities and provides more concise summaries compared to qwen3-235b-a22b-thinking-2507. Deep Thinking. The instruct model has enhanced Chinese understanding, logical reasoning, and text generation capabilities compared to qwen3-235b-a22b-instruct-2507. Text generation overview.
Text-to-text	2025-09-05	qwen3-max-preview	The Qwen-Max model (preview), based on Qwen3, offers significant improvements in general capabilities compared to the Qwen 2.5 series. It has significantly enhanced abilities in Chinese and English text understanding, complex instruction following, subjective open-ended tasks, multilingual tasks, and tool calling. The model also has fewer knowledge hallucinations. Qwen-Max
Text, image, video, voice, etc.	2025-08-05	Models	This is the first release in the Beijing region.