Alibaba Cloud has introduced QwQ-32B, a compact reasoning model with only 32 billion parameters, delivering performance comparable to other larger cutting edge models.
Built on Qwen2.5-32B, Alibaba Cloud’s latest large language model with the exact parameter count, QwQ-32B excels across a variety of benchmarks, including AIME 24 (mathematical reasoning), Live CodeBench (coding proficiency), LiveBench (test set contamination and objective evaluation), IFEval (instruction-following ability), and BFCL (tool and function-calling capabilities).
The results below highlight QwQ-32B’s performance in comparison to other leading models, including DeepSeek-R1-Distilled-Qwen-32B, DeepSeek-R1-Distilled-Llama-70B, o1-mini, and the original DeepSeek-R1.
The exceptional performance of QwQ-32B highlights the power of Reinforcement Learning (RL), the core technique behind the model, when applied to a robust foundation model like Qwen2.5-32B, which is pre-trained on extensive world knowledge. By leveraging continuous RL scaling, QwQ-32B demonstrates significant improvements in mathematical reasoning and coding proficiency.
Additionally, the model was trained using rewards from a general reward model and rule-based verifiers, enhancing its general capabilities. These include better instruction-following, alignment with human preferences, and improved agent performance.
The research team has also integrated agent-related capabilities into QwQ-32B, enabling it to think critically, utilize tools effectively, and adapt its reasoning based on environmental feedback. The team is also exploring further integration of agents with RL to enable long-horizon reasoning, aiming to unlock even greater intelligence through inference-time scaling.
QwQ-32B is now available as an open-source model on Hugging Face and Model Scope under the Apache 2.0 license, allowing free downloads. It is also accessible via Qwen Chat. Thanks to its significantly reduced deployment costs, the model can be efficiently deployed on consumer-grade hardware.
For more details about QwQ-32B, visit the official blog post: QwQ-32B: Embracing the Power of Reinforcement Learning.
This article was originally published on Alizila written by Crystal Liu.
Q&A: Alibaba Cloud’s Ouyang Xin on AI's role for Cloud Security
1,115 posts | 342 followers
FollowAlibaba Cloud Community - December 2, 2024
Alibaba Cloud Community - January 2, 2025
Alibaba Cloud Community - February 27, 2025
Ced - February 17, 2025
Fuji - February 25, 2025
Alibaba Cloud Community - September 19, 2024
1,115 posts | 342 followers
FollowTop-performance foundation models from Alibaba Cloud
Learn MoreA one-stop generative AI platform to build intelligent applications that understand your business, based on Qwen model series such as Qwen-Max and other popular models
Learn MoreAccelerate AI-driven business and AI model training and inference with Alibaba Cloud GPU technology
Learn MoreConduct large-scale data warehousing with MaxCompute
Learn MoreMore Posts by Alibaba Cloud Community