
The Alibaba Qwen team has received the prestigious “NeurIPS 2025 Best Paper Award” at the Conference on Neural Information Processing Systems (NeurIPS), one of the world’s premier conferences in machine learning and artificial intelligence. The award recognizes the team’s pioneering research on attention mechanisms in large language models (LLMs).
The winning paper, titled “Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free”, is the first in the industry to systematically examine how attention gating affects the performance and training of large models.

Gating, a mechanism that controls the flow of information through the network, is one of the most widely used techniques in LLM architectures. Functioning like “intelligent noise-canceling headphones” for a model, it helps filter out irrelevant information and boosts overall effectiveness.
To rigorously evaluate the role of gating, the Qwen team conducted an extensive study, comparing more than 30 variants of 15B Mixture-of-Experts (MoE) models and 1.7B dense models trained on a 3.5-trillion-token dataset. The results show that a simple architectural modification, adding a head-specific sigmoid gate after scaled dot-product attention (SDPA), consistently improves model performance. The change enhances training stability, allows for larger learning rates, and improves scaling properties.
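To make the idea concrete, here is a minimal PyTorch sketch of the modification described above: an elementwise sigmoid gate, computed per head from the layer input, applied to the SDPA output before the output projection. The module structure and projection names (GatedAttention, gate_proj, and so on) are illustrative assumptions for this sketch, not the authors’ released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedAttention(nn.Module):
    """Multi-head attention with a head-specific sigmoid gate applied
    after scaled dot-product attention (SDPA).

    A minimal sketch of the idea described in the paper; deriving the
    gate from the same hidden states that produce the queries is an
    assumption made here for illustration.
    """
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        # Head-specific gate: one gate value per head dimension,
        # computed from the layer input.
        self.gate_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        b, t, _ = x.shape
        # Split into heads: (batch, n_heads, seq_len, d_head)
        q = self.q_proj(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        k = self.k_proj(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        v = self.v_proj(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        # Standard causal SDPA.
        attn = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        # Sigmoid gate applied elementwise to each head's SDPA output.
        # The sigmoid adds non-linearity and lets the model sparsely
        # suppress individual heads or channels per token.
        gate = torch.sigmoid(
            self.gate_proj(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        )
        attn = attn * gate
        # Merge heads and project out.
        attn = attn.transpose(1, 2).reshape(b, t, self.n_heads * self.d_head)
        return self.out_proj(attn)

# Quick shape check
x = torch.randn(2, 16, 512)
print(GatedAttention(d_model=512, n_heads=8)(x).shape)  # torch.Size([2, 16, 512])
```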
These findings have already been incorporated into the Qwen3-Next model released in September 2025, which replaces standard attention with a combination of Gated DeltaNet and Gated Attention. The design improves in-context learning capabilities while increasing computational efficiency.
To support further research and community adoption, the Qwen team has already released the related code and models on GitHub and Hugging Face.
“The main recommendation of the paper is easily implemented, and given the extensive evidence provided in the paper for this modification to LLM architecture, we expect this idea to be widely adopted,” the NeurIPS Selection Committee commented.
“This paper represents a substantial amount of work that is possible only with access to industrial-scale computing resources, and the authors’ sharing of the results of their work, which will advance the community’s understanding of attention in large language models, is highly commendable, especially in an environment where there has been a move away from open sharing of scientific results around LLMs,” the committee added.
This article, written by Claire Mo, was originally published on Alizila.