International (Singapore)
Type | Time | Model specifications | Description |
--- | --- | --- | --- |
Speech recognition | 2025-12-17 | qwen3-asr-flash-realtime, qwen3-asr-flash-realtime-2025-10-27 | Adds support for speech recognition in 9 languages, including Czech and Danish. Real-time speech recognition - Qwen |
Speech recognition | 2025-12-17 | qwen3-asr-flash, qwen3-asr-flash-2025-09-08 | Supports audio with any sample rate and number of sound channels. Audio file recognition - Qwen |
Speech recognition | 2025-12-17 | fun-asr-mtl, fun-asr-mtl-2025-08-25 | Supports speech recognition for 31 languages, including Chinese, English, Japanese, and Korean. It is especially suitable for Southeast Asian markets. Audio file recognition - Fun-ASR/Paraformer/SenseVoice |
Voice design | 2025-12-16 | qwen-voice-design | Qwen released a voice design model that generates customized voices from text descriptions. Use this model with qwen3-tts-vd-realtime-2025-12-16 to generate speech in 10 languages. Voice design |
Speech synthesis | 2025-12-16 | qwen3-tts-vd-realtime-2025-12-16 (snapshot) | Qwen real-time speech synthesis released a new snapshot model. It uses voices generated by Voice design for low-latency, high-stability real-time synthesis. It supports multi-language output, automatically adjusts tone based on text, and optimizes synthesis performance for complex text. Real-time speech synthesis - Qwen |
Speech recognition | 2025-12-12 | fun-asr, fun-asr-2025-11-07 | Fun-ASR audio file recognition feature updates. Audio file recognition - Fun-ASR/Paraformer/SenseVoice |
Omni-modal | 2025-12-04 | qwen3-omni-flash-2025-12-01 | The latest Qwen Omni snapshot model increases supported timbres to 49 and significantly upgrades instruction-following capabilities, enabling efficient understanding of text, images, audio, and video. Omni-modal |
Real-time multimodal | 2025-12-04 | qwen3-omni-flash-realtime-2025-12-01 | The latest snapshot model for Qwen Omni real-time version provides low-latency multimodal interaction. Supported timbres increase to 49, and both instruction-following ability and interactive experience are significantly upgraded. Real-time multimodal |
Speech translation | 2025-12-04 | qwen3-livetranslate-flash, qwen3-livetranslate-flash-2025-12-01 | Qwen3-LiveTranslate-Flash is an audio and video translation model that translates between 18 languages (including Chinese, English, Russian, and French). It leverages visual context to improve translation accuracy and outputs both text and speech. Audio and video translation - Qwen |
Multilingual translation | 2025-12-02 | qwen-mt-lite | A basic text translation model from Qwen. It supports translation between 31 languages. It offers faster response time and lower cost than qwen-mt-flash, making it suitable for latency-sensitive scenarios. Translation capabilities (Qwen-MT) |
Voice cloning | 2025-11-27 | qwen-voice-enrollment | Qwen released a voice cloning model. It generates a highly similar voice from just over 5 seconds of audio. When used with the qwen3-tts-vc-realtime-2025-11-27 model, it can create a high-fidelity clone of a person's voice and output it in real time across 11 languages. Voice cloning |
Speech synthesis | 2025-11-27 | qwen3-tts-vc-realtime-2025-11-27 (snapshot) | Qwen real-time speech synthesis released a new snapshot model. It uses voices generated by Voice cloning for low-latency, high-stability real-time synthesis. It supports multi-language output, automatically adjusts tone based on text, and optimizes synthesis performance for complex text. Real-time speech synthesis - Qwen |
Speech synthesis | 2025-11-27 | qwen3-tts-flash-realtime-2025-11-27 (snapshot) | Qwen real-time speech synthesis released a new snapshot model. It features low latency and high stability. The model offers a richer selection of voices, and each voice supports multi-language output. It automatically adjusts tone based on text and enhances synthesis performance for complex text. Real-time speech synthesis - Qwen |
Speech synthesis | 2025-11-27 | qwen3-tts-flash-2025-11-27 (snapshot) | Qwen speech synthesis released a new snapshot model. It offers more voice options. Each voice supports multi-language output. The model adaptively adjusts tone based on text and has optimized synthesis capabilities for complex text. Speech synthesis - Qwen |
Text extraction | 2025-11-21 | qwen-vl-ocr-2025-11-20 (snapshot) | This snapshot of the Qwen text extraction model is based on the Qwen3-VL architecture and significantly improves document parsing and text localization. Text extraction |
Speech recognition | 2025-11-20 | qwen3-asr-flash-filetrans, qwen3-asr-flash-filetrans-2025-11-17 (snapshot) | Qwen audio file recognition released a new model. It is designed for asynchronous transcription of audio files and supports recordings up to 12 hours long. Audio file recognition - Qwen |
Speech recognition | 2025-11-19 | fun-asr-2025-11-07 (snapshot) | Fun-ASR audio file recognition released a new snapshot model. It optimizes far-field voice activity detection (VAD) to improve recognition accuracy and stability, and adds support for multiple Chinese dialects and Japanese beyond the original Chinese and English. Audio file recognition - Fun-ASR/Paraformer/SenseVoice |
Multilingual translation | 2025-11-11 | qwen-mt-flash | Compared to qwen-mt-turbo, this model supports streaming incremental output and offers improved overall performance. Translation capabilities (Qwen-MT) |
Image-to-video | 2025-11-10 | wan2.2-animate-move | Transfers actions and expressions from a template video to a single static character image to generate a character motion video. Wan - Image-to-motion |
Image-to-video | 2025-11-10 | wan2.2-animate-mix | Replaces the main character in a reference video with the character from an input image while preserving the original video's scene, lighting, and tone for seamless character replacement. Wan - Video character replacement |
Reasoning model | 2025-11-03 | qwen3-max-preview | The qwen3-max-preview model adds a thinking mode that significantly improves overall reasoning capabilities, especially in agent programming, common-sense reasoning, and tasks in math, science, and general domains. Deep thinking |
Image editing | 2025-10-31 | qwen-image-edit-plus, qwen-image-edit-plus-2025-10-30 | Built on qwen-image-edit, this model has optimized inference performance and system stability. It significantly reduces the response time for image generation and editing and supports returning multiple images in a single request. Image editing - Qwen |
Real-time speech recognition | 2025-10-27 | qwen3-asr-flash-realtime, qwen3-asr-flash-realtime-2025-10-27 | The Qwen real-time speech recognition model features automatic language detection. It can identify 11 languages and provides accurate transcription in complex audio environments. Real-time speech recognition - Qwen |
Visual understanding | 2025-10-21 | qwen3-vl-32b-thinking, qwen3-vl-32b-instruct | A 32B dense model from the Qwen3-VL series. Its overall performance is second only to the Qwen3-VL-235B model. It excels in document recognition and understanding, spatial intelligence and object recognition, and visual 2D detection/spatial reasoning. This makes it suitable for complex perception tasks in general scenarios. Visual understanding |
Visual understanding | 2025-10-16 | qwen3-vl-flash, qwen3-vl-flash-2025-10-15 | A small-scale visual understanding model from the Qwen3 series. It effectively combines thinking and non-thinking modes. Compared to the open-source Qwen3-VL-30B-A3B, it delivers better performance and a faster response time. Visual understanding |
Visual understanding | 2025-10-14 | qwen3-vl-8b-thinking, qwen3-vl-8b-instruct | An 8B dense model from the Qwen3-VL series. It uses less GPU memory and can perform multimodal understanding and reasoning. It supports ultra-long contexts such as long videos and documents, and 2D/3D visual positioning. It also has comprehensive spatial intelligence and object recognition capabilities. Visual understanding |
Visual understanding | 2025-10-03 | qwen3-vl-30b-a3b-thinking, qwen3-vl-30b-a3b-instruct | Based on the new generation of open-source Qwen3-VL models, this model has a fast response time. It features stronger multimodal understanding and reasoning, visual agent capabilities, and support for ultra-long contexts such as long videos and documents. It also has comprehensively upgraded spatial intelligence and object recognition capabilities, making it suitable for complex real-world tasks. Visual understanding |
Visual understanding | 2025-09-23 | qwen3-vl-plus, qwen3-vl-plus-2025-09-23, qwen3-vl-235b-a22b-thinking, qwen3-vl-235b-a22b-instruct | A visual understanding model from the Qwen3 series. It effectively combines thinking and non-thinking modes, and its visual agent capabilities are among the best in the world. This version features comprehensive upgrades in visual encoding, spatial intelligence, and multimodal thinking. Its visual perception and recognition capabilities are also significantly improved. Visual understanding |
Text-to-image | 2025-09-23 | qwen-image-plus | This model excels at complex text rendering, especially for Chinese and English text. It can create complex mixed layouts of images and text and is more cost-effective than qwen-image. Text-to-image (Qwen-Image) |
Code model | 2025-09-23 | qwen3-coder-plus-2025-09-23 | Compared to the previous version (snapshot from July 22), this model has improved robustness in downstream tasks and tool calling, along with enhanced code security. Code capabilities (Qwen-Coder) |
Reasoning model | 2025-09-11 | qwen-plus-2025-09-11 | This model is part of the Qwen3 series. Compared to qwen-plus-2025-07-28, it has improved instruction-following capabilities and provides more concise summaries in thinking mode. Deep thinking. In non-thinking mode, its Chinese understanding and logical reasoning abilities are enhanced. Overview of text generation models. |
Reasoning model | 2025-09-11 | qwen3-next-80b-a3b-thinking, qwen3-next-80b-a3b-instruct | A new generation of open-source models based on Qwen3. The thinking model has improved instruction-following capabilities and provides more concise summaries compared to qwen3-235b-a22b-thinking-2507. Deep thinking. The instruct model has enhanced Chinese understanding, logical reasoning, and text generation capabilities compared to qwen3-235b-a22b-instruct-2507. Overview of text generation models. |
Text-to-text | 2025-09-05 | qwen3-max-preview | The Qwen-Max model (preview version) based on Qwen3. It offers a significant improvement in overall general capabilities compared to the Qwen 2.5 series. It has notably enhanced abilities in Chinese and English text understanding, complex instruction following, subjective open-ended tasks, multilingual tasks, and tool calling. The model also has fewer knowledge hallucinations. Qwen-Max |
Image editing | 2025-08-19 | qwen-image-edit | The Qwen image editing model supports precise bilingual Chinese-English text editing, rendering intent, detail enhancement, style transfer, adding or removing objects, and changing positions and actions, enabling complex image-text editing. Image editing - Qwen |
Visual understanding | 2025-08-18 | qwen-vl-plus-2025-08-15 | A visual understanding model. It has significantly improved capabilities in object recognition and localization, and multilingual processing. Visual understanding |
Text-to-image | 2025-08-14 | qwen-image | The Qwen-Image model excels at complex text rendering, especially for Chinese and English text. It can create complex mixed layouts of images and text. Text-to-image (Qwen-Image) |
Visual understanding | 2025-08-13 | qwen-vl-max-2025-08-13 | A visual understanding model. Visual understanding metrics are greatly improved, with significantly enhanced capabilities in math, reasoning, object recognition, and multilingual processing. Visual understanding |
Code model | 2025-08-05 | qwen3-coder-flash, qwen3-coder-flash-2025-07-28 | The fastest and most cost-effective model in the Qwen-Coder series. Code capabilities (Qwen-Coder). |
Reasoning model | 2025-08-05 | qwen-flash, qwen-flash-2025-07-28 | The fastest and most cost-effective model in the Qwen series, suitable for simple tasks. Model List |
Reasoning model | 2025-07-30 | qwen-plus-2025-07-28 | This model belongs to the Qwen3 series. Compared to the previous version, it increases the context length to 1,000,000 tokens. For more information about thinking mode, see Deep thinking. For more information about non-thinking mode, see Overview of text generation models. |
Reasoning model | 2025-07-30 | qwen3-30b-a3b-thinking-2507, qwen3-30b-a3b-instruct-2507 | An upgraded version of qwen3-30b-a3b. The thinking model has improved logical, general, knowledge-enhanced, and creative capabilities. Deep thinking. The instruct model has improved creative capabilities and model security. Overview of text generation models. |
Image-to-video | 2025-07-28 | wan2.2-i2v-plus | Compared to the 2.1 model, the new version significantly improves image detail and motion stability, with a 50% increase in generation speed. First-frame-to-video |
Text-to-video | 2025-07-28 | wan2.2-t2v-plus | Compared to the 2.1 model, the new version significantly improves image detail and motion stability, with a 50% increase in generation speed. Text-to-video |
Text-to-image | 2025-07-28 | wan2.2-t2i-flash, wan2.2-t2i-plus | Compared to the 2.1 model, the new version is comprehensively upgraded in creativity, stability, and photorealism, with a 50% increase in generation speed. Text-to-image |
Reasoning model | 2025-07-24 | qwen3-235b-a22b-thinking-2507, qwen3-235b-a22b-instruct-2507 | An upgraded version of qwen3-235b-a22b. The thinking model has greatly improved logical, general, knowledge-enhanced, and creative capabilities, making it suitable for high-difficulty, strong-reasoning scenarios. Deep thinking. The instruct model has improved creative capabilities and model security. Overview of text generation models. |
Code model | 2025-07-23 | qwen3-coder, qwen3-coder-plus-2025-07-22 | A code generation model based on Qwen3. It has powerful Coding Agent capabilities and excels at tool calling and environment interaction. It combines excellent code capabilities with general-purpose abilities. Code capabilities (Qwen-Coder) |
Visual understanding | 2025-06-04 | qwen-vl-plus-2025-05-07 | A visual understanding model. The model has significantly improved capabilities in math, reasoning, and understanding content from monitoring videos. Visual understanding |
Text-to-image | 2025-05-22 | wan2.1-t2i-turbo, wan2.1-t2i-plus | Generates an image from a single sentence. The model supports generating images of any resolution and aspect ratio, up to 2 million pixels. It is available in a turbo version and a professional edition (plus). Text-to-image |
Visual understanding | 2025-05-16 | qwen-vl-max-2025-04-08 | A visual understanding model. Math and reasoning abilities are improved, the response style is adjusted to align with human preferences, and the detail and format clarity of model responses are significantly improved. Visual understanding |
Visual understanding | 2025-05-16 | qwen-vl-plus-2025-01-25 | A visual understanding model. It belongs to the Qwen2.5-VL series. Compared to the previous version, it extends the context to 128k, significantly enhancing image and video understanding capabilities. |
Video editing | 2025-05-19 | wan2.1-vace-plus | A general-purpose video editing model. The model has multimodal input capabilities, combining images, videos, and text prompts. It can perform various tasks such as image-to-video (generating a video based on the subject or background of a reference image) and video repainting (extracting motion features from an input video to generate a new video). General-purpose video editing |
Reasoning model | 2025-04-28 | Qwen3 commercial models: qwen-plus-2025-04-28, qwen-turbo-2025-04-28. Qwen3 open-source models: qwen3-235b-a22b, qwen3-30b-a3b, qwen3-32b, qwen3-14b, qwen3-8b, qwen3-4b, qwen3-1.7b, qwen3-0.6b | Qwen3 models support both thinking and non-thinking modes, switched by a request parameter. For more information about thinking mode, see Deep thinking. For more information about non-thinking mode, see Overview of text generation models. |
Text-to-video | 2025-04-21 | wan2.1-t2v-turbo, wan2.1-t2v-plus | |
Image-to-video | 2025-04-21 | wan2.1-kf2v-plus, wan2.1-i2v-turbo, wan2.1-i2v-plus | |
Visual reasoning | 2025-03-28 | qvq-max, qvq-max-latest, qvq-max-2025-03-25 | A visual reasoning model. It supports visual input and chain-of-thought output, demonstrating stronger capabilities in math, programming, visual analysis, creation, and general tasks. Visual reasoning |
Omni-modal | 2025-03-26 | qwen2.5-omni-7b | A new omni-modal understanding and generation model from Qwen. It supports text, image, speech, and video input, and outputs text and audio. It provides 2 natural conversational voices. For usage instructions, see Omni-modal. |
Visual understanding | 2025-03-24 | qwen2.5-vl-32b-instruct | A visual understanding model. Its ability to solve math problems approaches the level of Qwen2.5-VL-72B. The response style is greatly adjusted to align with human preferences, especially for objective questions such as math, logical reasoning, and knowledge Q&A. The detail and format clarity of model responses are significantly improved. Visual understanding |
Reasoning model | 2025-03-06 | qwq-plus | A QwQ reasoning model trained on the Qwen2.5 model. It greatly improves the model's reasoning ability through reinforcement learning. The model's core metrics for math and code (AIME 24/25, LiveCodeBench) and some general metrics (IFEval, LiveBench, etc.) reach the level of the full-performance DeepSeek-R1. Deep thinking |
Visual understanding | 2025-01-27 | qwen2.5-vl-3b-instruct, qwen2.5-vl-7b-instruct, qwen2.5-vl-72b-instruct | |
Text-to-text | 2025-01-27 | qwen-max-2025-01-25, qwen2.5-14b-instruct-1m, qwen2.5-7b-instruct-1m | |
Text-to-text | 2025-01-17 | qwen-plus-2025-01-12 | |
Multilingual translation | 2024-12-25 | qwen-mt-plus, qwen-mt-turbo | |
Visual understanding | 2024-12-18 | qwen2-vl-72b-instruct | |
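Several of the Qwen3 entries above support both thinking and non-thinking modes on a per-request basis. The sketch below shows how that toggle is commonly expressed in an OpenAI-compatible request body; the `enable_thinking` field name follows DashScope's documented convention for Qwen3 and should be verified against the current API reference.

```python
# Hedged sketch: build a chat-completions request body with the Qwen3
# thinking-mode switch set. No network call is made here; this only
# illustrates the shape of the request. The "enable_thinking" field name
# is an assumption based on DashScope's documented Qwen3 convention.

def build_chat_request(prompt: str, thinking: bool) -> dict:
    """Return a chat-completions request body with the thinking-mode switch."""
    return {
        "model": "qwen-plus-2025-04-28",
        "messages": [{"role": "user", "content": prompt}],
        # True: the model emits a reasoning trace before answering (thinking mode).
        # False: the model answers directly (non-thinking mode).
        "enable_thinking": thinking,
    }

body = build_chat_request("Explain quicksort briefly.", thinking=True)
print(body["enable_thinking"])  # True
```

The same request shape works for both modes; only the boolean changes between calls, so an application can choose per request whether to pay the latency cost of a reasoning trace.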
China (Beijing)
Type | Time | Model specifications | Description |
--- | --- | --- | --- |
Speech recognition | 2025-12-17 | qwen3-asr-flash-realtime, qwen3-asr-flash-realtime-2025-10-27 | Adds support for speech recognition in 9 languages, including Czech and Danish. Real-time speech recognition - Qwen |
Speech recognition | 2025-12-17 | qwen3-asr-flash, qwen3-asr-flash-2025-09-08 | Supports audio with any sample rate and number of sound channels. Audio file recognition - Qwen |
Speech recognition | 2025-12-17 | fun-asr-mtl, fun-asr-mtl-2025-08-25 | Supports speech recognition for 31 languages, including Chinese, English, Japanese, and Korean. It is especially suitable for Southeast Asian markets. Audio file recognition - Fun-ASR/Paraformer/SenseVoice |
Voice design | 2025-12-16 | qwen-voice-design | Qwen released a voice design model that generates customized voices from text descriptions. Use this model with qwen3-tts-vd-realtime-2025-12-16 to generate speech in 10 languages. Voice design |
Speech synthesis | 2025-12-16 | qwen3-tts-vd-realtime-2025-12-16 (snapshot) | Qwen real-time speech synthesis released a new snapshot model. It uses voices generated by Voice design for low-latency, high-stability real-time synthesis. It supports multi-language output, automatically adjusts tone based on text, and optimizes synthesis performance for complex text. Real-time speech synthesis - Qwen |
Speech recognition | 2025-12-12 | fun-asr, fun-asr-2025-11-07 | Fun-ASR audio file recognition feature updates. Audio file recognition - Fun-ASR/Paraformer/SenseVoice |
Speech synthesis | 2025-12-11 | cosyvoice-v3-flash, cosyvoice-v3-plus | |
Omni-modal | 2025-12-04 | qwen3-omni-flash-2025-12-01 | The latest Qwen Omni snapshot model increases supported timbres to 49 and significantly upgrades instruction-following capabilities, enabling efficient understanding of text, images, audio, and video. Omni-modal |
Real-time multimodal | 2025-12-04 | qwen3-omni-flash-realtime-2025-12-01 | The latest snapshot model for Qwen Omni real-time version provides low-latency multimodal interaction. Supported timbres increase to 49, and both instruction-following ability and interactive experience are significantly upgraded. Real-time multimodal |
Speech translation | 2025-12-04 | qwen3-livetranslate-flash, qwen3-livetranslate-flash-2025-12-01 | Qwen3-LiveTranslate-Flash is an audio and video translation model that translates between 18 languages (including Chinese, English, Russian, and French). It leverages visual context to improve translation accuracy and outputs both text and speech. Audio and video translation - Qwen |
Reasoning model | 2025-12-04 | deepseek-v3.2 | DeepSeek-V3.2 is the official release that introduces DeepSeek Sparse Attention, a sparse attention mechanism. It is also DeepSeek's first model to integrate thinking with tool use, supporting tool calling in both thinking and non-thinking modes. |
Multilingual translation | 2025-12-02 | qwen-mt-lite | A basic text translation model from Qwen. It supports translation between 31 languages. It offers faster response time and lower cost than qwen-mt-flash, making it suitable for latency-sensitive scenarios. Translation capabilities (Qwen-MT) |
Voice cloning | 2025-11-27 | qwen-voice-enrollment | Qwen released a voice cloning model. It generates a highly similar voice from just over 5 seconds of audio. When used with the qwen3-tts-vc-realtime-2025-11-27 model, it can create a high-fidelity clone of a person's voice and output it in real time across 11 languages. Voice cloning |
Speech synthesis | 2025-11-27 | qwen3-tts-vc-realtime-2025-11-27 (snapshot) | Qwen real-time speech synthesis released a new snapshot model. It uses voices generated by Voice cloning for low-latency, high-stability real-time synthesis. It supports multi-language output, automatically adjusts tone based on text, and optimizes synthesis performance for complex text. Real-time speech synthesis - Qwen |
Speech synthesis | 2025-11-27 | qwen3-tts-flash-realtime-2025-11-27 (snapshot) | Qwen real-time speech synthesis released a new snapshot model. It features low latency and high stability. The model offers a richer selection of voices, and each voice supports multi-language output. It automatically adjusts tone based on text and enhances synthesis performance for complex text. Real-time speech synthesis - Qwen |
Speech synthesis | 2025-11-27 | qwen3-tts-flash-2025-11-27 (snapshot) | Qwen speech synthesis released a new snapshot model. It offers more voice options. Each voice supports multi-language output. The model adaptively adjusts tone based on text and has optimized synthesis capabilities for complex text. Speech synthesis - Qwen |
Text extraction | 2025-11-21 | qwen-vl-ocr-2025-11-20 (snapshot) | This snapshot of the Qwen text extraction model is based on the Qwen3-VL architecture and significantly improves document parsing and text localization. Text extraction |
Speech recognition | 2025-11-20 | qwen3-asr-flash-filetrans, qwen3-asr-flash-filetrans-2025-11-17 (snapshot) | Qwen audio file recognition released a new model. It is designed for asynchronous transcription of audio files and supports recordings up to 12 hours long. Audio file recognition - Qwen |
Speech synthesis | 2025-11-19 | cosyvoice-v3-flash | Compared to previous versions, this model improves pronunciation accuracy and voice similarity, and adds support for more languages (German, Spanish, French, Italian, Russian). Real-time speech synthesis - CosyVoice/Sambert |
Reasoning model | 2025-11-11 | kimi-k2-thinking | A thinking model from Moonshot AI. It has general agent and reasoning capabilities, excels at deep reasoning, and solves complex problems through multi-step tool calling. Kimi |
Multilingual translation | 2025-11-10 | qwen-mt-flash | Compared to qwen-mt-turbo, this model supports streaming incremental output and offers improved overall performance. Translation capabilities (Qwen-MT) |
Image-to-video | 2025-11-04 | wan2.2-animate-mix | Replaces the main character in a reference video with the character from an input image while preserving the original video's scene, lighting, and tone for seamless character replacement. Wan - Video character replacement |
Reasoning model | 2025-11-03 | qwen3-max-preview | The qwen3-max-preview model adds a thinking mode that significantly improves overall reasoning capabilities, especially in agent programming, common-sense reasoning, and tasks in math, science, and general domains. Deep thinking |
Image-to-video | 2025-11-03 | wan2.2-animate-move | Transfers actions and expressions from a template video to a single static character image to generate a character motion video. Wan - Image-to-motion |
Image editing | 2025-10-31 | qwen-image-edit-plus, qwen-image-edit-plus-2025-10-30 | Built on qwen-image-edit, this model has optimized inference performance and system stability. It significantly reduces the response time for image generation and editing and supports returning multiple images in a single request. Image editing - Qwen |
Real-time speech recognition | 2025-10-27 | qwen3-asr-flash-realtime, qwen3-asr-flash-realtime-2025-10-27 | The Qwen real-time speech recognition model features automatic language detection. It can identify 11 languages and provides accurate transcription in complex audio environments. Real-time speech recognition - Qwen |
Visual understanding | 2025-10-21 | qwen3-vl-32b-thinking, qwen3-vl-32b-instruct | A 32B dense model from the Qwen3-VL series. Its overall performance is second only to the Qwen3-VL-235B model. It excels in document recognition and understanding, spatial intelligence and object recognition, and visual 2D detection/spatial reasoning. This makes it suitable for complex perception tasks in general scenarios. Visual understanding |
Visual understanding | 2025-10-16 | qwen3-vl-flash, qwen3-vl-flash-2025-10-15 | A small-scale visual understanding model from the Qwen3 series. It effectively combines thinking and non-thinking modes. Compared to the open-source Qwen3-VL-30B-A3B, it delivers better performance and a faster response time. Visual understanding |
Visual understanding | 2025-10-14 | qwen3-vl-8b-thinking, qwen3-vl-8b-instruct | An 8B dense open-source model from the Qwen3-VL series, available in both thinking and non-thinking versions. It uses less GPU memory and can perform multimodal understanding and reasoning. It supports ultra-long contexts such as long videos and documents, 2D/3D visual positioning, and comprehensive spatial intelligence and object recognition. Visual understanding |
Visual understanding | 2025-10-03 | qwen3-vl-30b-a3b-thinking, qwen3-vl-30b-a3b-instruct | Based on the new generation of open-source Qwen3-VL models, available in both thinking and non-thinking versions. It has a fast response speed and stronger capabilities for multimodal understanding, reasoning, visual agents, and ultra-long context support such as long videos and documents. Its spatial intelligence and object recognition capabilities are fully upgraded to handle complex real-world tasks. Visual understanding |
Reasoning model | 2025-09-30 | deepseek-v3.2-exp | A hybrid reasoning architecture model that supports both thinking and non-thinking modes. It introduces a sparse attention mechanism to improve training and inference efficiency for long texts, and is priced lower than deepseek-v3.1. For details, see DeepSeek. |
Text-to-image | 2025-09-23 | qwen-image-plus | This model excels at complex text rendering, especially for Chinese and English text. It can create complex mixed layouts of images and text and is more cost-effective than qwen-image. Text-to-image (Qwen-Image) |
Visual understanding | 2025-09-23 | qwen3-vl-plus, qwen3-vl-plus-2025-09-23, qwen3-vl-235b-a22b-thinking, qwen3-vl-235b-a22b-instruct | A visual understanding model from the Qwen3 series. It effectively combines thinking and non-thinking modes, and its visual agent capabilities are world-class. This version features comprehensive upgrades in visual encoding, spatial intelligence, and multimodal thinking. Its visual perception and recognition capabilities are significantly improved. Visual understanding |
Code model | 2025-09-23 | qwen3-coder-plus-2025-09-23 | Compared to the previous version (snapshot from July 22), it offers improved robustness in downstream tasks and tool calling, along with enhanced code security. Code capabilities (Qwen-Coder) |
Reasoning model | 2025-09-11 | qwen-plus-2025-09-11 | This model is part of the Qwen3 series. Compared to qwen-plus-2025-07-28, it has improved instruction-following capabilities and provides more concise summaries in thinking mode. Deep thinking. In non-thinking mode, its Chinese understanding and logical reasoning abilities are enhanced. Overview of text generation models. |
Reasoning model | 2025-09-11 | qwen3-next-80b-a3b-thinking, qwen3-next-80b-a3b-instruct | A new generation of open-source models based on Qwen3. The thinking model has improved instruction-following capabilities and provides more concise summaries compared to qwen3-235b-a22b-thinking-2507. Deep thinking. The instruct model has enhanced Chinese understanding, logical reasoning, and text generation capabilities compared to qwen3-235b-a22b-instruct-2507. Overview of text generation models. |
Text-to-text | 2025-09-05 | qwen3-max-preview | The Qwen-Max model (preview version) based on Qwen3. It offers a significant improvement in overall general capabilities compared to the Qwen 2.5 series. It has notably enhanced abilities in Chinese and English text understanding, complex instruction following, subjective open-ended tasks, multilingual tasks, and tool calling. The model also has fewer knowledge hallucinations. Qwen-Max |
Text, image, video, audio, and more | 2025-08-05 | | First release in the Beijing region. |
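The deepseek-v3.2 entry above notes that tool calling works in both thinking and non-thinking modes. The sketch below shows a tool-calling request using the standard OpenAI-compatible `tools` schema; the `get_weather` tool and its parameters are purely illustrative and not part of any real API.

```python
# Hedged sketch: build a tool-calling request body for deepseek-v3.2 using
# the standard OpenAI-compatible "tools" schema. No network call is made;
# the get_weather tool below is a hypothetical example, not a real API.

def build_tool_request(prompt: str) -> dict:
    """Return a chat-completions request body that declares one callable tool."""
    weather_tool = {
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool for illustration
            "description": "Look up the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
    return {
        "model": "deepseek-v3.2",
        "messages": [{"role": "user", "content": prompt}],
        # The model may answer directly or return a tool_calls entry asking
        # the client to run get_weather and send back the result.
        "tools": [weather_tool],
    }

body = build_tool_request("What's the weather in Beijing?")
print(len(body["tools"]))  # 1
```

Because the tool declaration lives in the request body rather than in any mode-specific field, the same payload can be sent whether the model runs with or without thinking enabled.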