
Qwen3-Omni is a next-generation native multimodal large model capable of seamlessly processing multiple input modalities—including text, images, audio, and video—and generating both text and natural-sounding speech outputs simultaneously via real-time streaming responses. This version introduces multiple enhancements to improve model performance and efficiency.
Qwen3-Omni-Flash-2025-12-01 is a comprehensively upgraded iteration built upon Qwen3-Omni.
Key highlights of this upgraded version include:
On objective benchmarks, Qwen3-Omni-Flash-2025-12-01 achieves substantial improvements across all modalities compared to Qwen3-Omni-Flash:
🧠 Stronger Text Understanding & Generation:
Major gains in logical reasoning (ZebraLogic +5.6), code generation (LiveCodeBench-v6 +9.3, MultiPL-E +2.7), and holistic writing quality (WritingBench +2.2), enabling more reliable execution of complex, multi-step instructions.
👂 More Accurate Speech Understanding:
Significantly lower word error rate on Fleurs-zh, along with a +3.2 improvement on VoiceBench, reflecting enhanced comprehension of spoken language in real-world dialogue scenarios.
🎙️ More Natural Speech Synthesis:
Higher-quality, human-like voice generation across multiple languages—especially in Chinese and multilingual contexts—with improved prosody, pacing, and pausing that closely mirror natural human speech.
👁️ Deeper Image Understanding:
Breakthrough performance on visual reasoning tasks, including +4.7 on MMMU, +4.8 on MMMU-Pro, and +2.2 on MathVision_full, demonstrating a stronger ability to “see,” interpret, and reason about complex visual content—from diagrams to mathematical figures.
🎬 More Coherent Video Understanding:
Steady improvement in video semantic comprehension (MLVU +1.6), further strengthened by tighter audio-visual synchronization, laying a solid foundation for seamless real-time video conversations.
With this upgrade, Qwen3-Omni-Flash-2025-12-01 truly embodies the vision of “Hear You. See You. Follow Smarter.”—delivering an AI interaction experience that is more natural, precise, and vivid than ever before.
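As a rough illustration of the multimodal, streaming interaction described above, a request in the widely used OpenAI-compatible chat format might be assembled as follows. This is a minimal sketch only: the model identifier `qwen3-omni-flash`, the content-part field names, and the `modalities` parameter are assumptions for illustration, not confirmed by this post.

```python
# Sketch: build a multimodal, streaming chat request payload combining
# text, image, and audio inputs in a single user message.
# Model name and exact field layout are assumptions.

def build_omni_request(text, image_url=None, audio_url=None):
    """Assemble one user message with text plus optional image/audio parts."""
    parts = [{"type": "text", "text": text}]
    if image_url:
        parts.append({"type": "image_url", "image_url": {"url": image_url}})
    if audio_url:
        parts.append({"type": "input_audio",
                      "input_audio": {"data": audio_url, "format": "wav"}})
    return {
        "model": "qwen3-omni-flash",        # assumed model identifier
        "messages": [{"role": "user", "content": parts}],
        "modalities": ["text", "audio"],    # request both text and speech output
        "stream": True,                     # real-time streaming responses
    }

req = build_omni_request(
    "What is happening in this clip?",
    image_url="https://example.com/frame.jpg",
)
print(len(req["messages"][0]["content"]))  # → 2 (text part + image part)
```

The payload would then be sent to an OpenAI-compatible chat-completions endpoint and the streamed chunks consumed incrementally; the actual endpoint URL and SDK call are deployment-specific and omitted here.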

We are eager to hear your feedback and see the innovative applications you create with Qwen3-Omni. In the near future, we will further advance the model along multiple axes, including multi-speaker ASR, video OCR, and audio–video proactive learning, with enhanced support for agent-based workflows and function calling.
If you find our model helpful in your research, we’d appreciate a citation!
BibTeX
@misc{qwen3_omni_20251201,
  author  = {{Qwen Team, Alibaba}},
  title   = {{Qwen3-Omni-Flash-2025-12-01: Hear You. See You. Follow Smarter!}},
  year    = {2025},
  url     = {https://qwen.ai/blog?id=qwen3-omni-20251201},
  urldate = {2025-12-09}
}