
Qwen3-Omni is a next-generation native multimodal large model that seamlessly processes multiple input modalities—text, images, audio, and video—and generates both text and natural-sounding speech via real-time streaming responses.
Qwen3-Omni-Flash-2025-12-01 is a comprehensively upgraded iteration of Qwen3-Omni, introducing multiple enhancements to model performance and efficiency.
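To make the multimodal input described above concrete, here is a minimal sketch of how a mixed text + image + audio request for such a model might be assembled. This follows the common OpenAI-compatible chat-payload convention; the model identifier `qwen3-omni-flash` and the exact content-part schema are assumptions for illustration, not confirmed by this post. No network call is made—the function only builds the request body.

```python
def build_omni_request(prompt: str, image_url: str, audio_url: str) -> dict:
    """Sketch: assemble an OpenAI-compatible chat payload that mixes
    text, image, and audio inputs and requests streamed text + speech out.
    Model name and part schema are illustrative assumptions."""
    return {
        "model": "qwen3-omni-flash",      # assumed model identifier
        "stream": True,                   # real-time streaming responses
        "modalities": ["text", "audio"],  # request both text and speech output
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                    {"type": "input_audio",
                     "input_audio": {"data": audio_url, "format": "wav"}},
                ],
            }
        ],
    }

payload = build_omni_request(
    "Describe what you see and hear.",
    "https://example.com/scene.png",
    "https://example.com/clip.wav",
)
print(len(payload["messages"][0]["content"]))  # → 3
```

The payload would then be sent to the provider's chat-completions endpoint with an HTTP client or SDK of your choice.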
Key highlights of this upgraded version include:
On objective benchmarks, Qwen3-Omni-Flash-2025-12-01 achieves substantial improvements across all modalities compared to Qwen3-Omni-Flash:
🧠 Stronger Text Understanding & Generation:
Major gains in logical reasoning (ZebraLogic +5.6), code generation (LiveCodeBench-v6 +9.3, MultiPL-E +2.7), and holistic writing quality (WritingBench +2.2), enabling more reliable execution of complex, multi-step instructions.
👂 More Accurate Speech Understanding:
Significantly lower word error rate on Fleurs-zh, along with a +3.2 improvement on VoiceBench, reflecting enhanced comprehension of spoken language in real-world dialogue scenarios.
🎙️ More Natural Speech Synthesis:
Higher-quality, human-like voice generation across multiple languages—especially in Chinese and multilingual contexts—with improved prosody, pacing, and pausing that closely mirror natural human speech.
👁️ Deeper Image Understanding:
Breakthrough performance on visual reasoning tasks, including +4.7 on MMMU, +4.8 on MMMU-Pro, and +2.2 on MathVision_full, demonstrating a stronger ability to “see,” interpret, and reason about complex visual content—from diagrams to mathematical figures.
🎬 More Coherent Video Understanding:
Steady improvement in video semantic comprehension (MLVU +1.6), further strengthened by tighter audio-visual synchronization, laying a solid foundation for seamless real-time video conversations.
With this upgrade, Qwen3-Omni-Flash-2025-12-01 truly embodies the vision of “Hear You. See You. Follow Smarter.”—delivering an AI interaction experience that is more natural, precise, and vivid than ever before.

We are eager to hear your feedback and see the innovative applications you create with Qwen3-Omni. In the near future, we will further advance the model along multiple axes, including multi-speaker ASR, video OCR, audio-video proactive learning, and enhanced support for agent-based workflows and function calling.
If you find our model helpful in your research, we’d appreciate a citation!
BibTeX
@misc{qwen3_omni_20251201,
  author  = {{Qwen Team, Alibaba}},
  title   = {{Qwen3-Omni-Flash-2025-12-01: Hear You. See You. Follow Smarter!}},
  year    = {2025},
  url     = {https://qwen.ai/blog?id=qwen3-omni-20251201},
  urldate = {2025-12-09}
}