Entering the Physical AI Era: Introducing the Qwen-Robot Suite

This article introduces Alibaba's Qwen-Robot Suite, a new set of foundational models for physical AI and robotics.
  • The debut of Alibaba’s first Qwen-based foundational robotics models marks a key milestone as the company extends its foundational model architecture from the digital realm into physical AI.
  • Qwen-Robot Suite models demonstrated industry-leading performance across dozens of robotics benchmarks.

For years, large multimodal models have excelled at understanding digital data, such as text, speech, and images. However, translating this general intelligence into precise physical actions across diverse environments has remained a major bottleneck. Traditional robots often struggle in unfamiliar settings or with new instructions because they cannot dynamically map language commands to physical movements.

To address this challenge, Alibaba has officially launched its first suite of Qwen-based foundational robotics models: Qwen-Robot Suite. The suite comprises three core models:

  • Qwen-RobotManip: a generalizable Vision-Language-Action (VLA) model
  • Qwen-RobotNav: a scalable Vision-Language-Navigation (VLN) model
  • Qwen-RobotWorld: a video world model designed for embodied intelligence

By addressing distinct facets of physical interaction—from mobility and manipulation to world dynamics—the Qwen-Robot Suite enables real-world robots, whether industrial arms, delivery bots, or robotic dogs, to dynamically perceive, reason, and act in real time. Crucially, the suite’s strong generalization capabilities allow these models to handle unseen tasks and instructions adaptively. They can operate smoothly in unfamiliar environments and interact with novel objects while strictly adhering to physical laws and following natural language commands.

This debut represents a pivotal milestone as Alibaba extends its Qwen architecture from the digital realm into physical AI. As a frontrunner in the agentic AI era, the tech giant is shifting its focus from simple chatbots to autonomous agents built to manage complex real-life tasks in the digital world and, increasingly, in physical space.

The three models have demonstrated industry-leading performance across dozens of authoritative robotics benchmarks, including RoboChallenge, a large-scale benchmark for embodied intelligence using real robots. The Qwen-Robot Suite has already entered pilot testing with selected Alibaba Cloud enterprise customers in the robotics sector.

Qwen-RobotManip (codenamed Lira and Atlas) tops RoboChallenge, a large-scale benchmark of embodied intelligence that uses real robots

  • Qwen-RobotManip: Built on the Qwen3.5-4B VL model, this VLA model was trained on over 38,000 hours of purely open-source data, curated from robotics repositories, human manipulation videos, and synthesized human-to-robot datasets. It delivers a three-fold improvement over the prior state-of-the-art (SOTA) in cross-embodiment transfer, making it possible to deploy the model across diverse robot hardware with minimal retraining.
  • Qwen-RobotNav: Powered by Qwen3-VL and trained on a curated corpus of 15.6 million samples spanning trajectory planning and vision-language reasoning, this VLN model serves as both a scalable navigation engine and a unified interface for agentic navigation systems. That makes it a natural building block for agentic systems handling long-horizon tasks such as embodied question answering (EQA), a task in which a robot answers a question about a physical space, such as “Where did I leave my badge?”
  • Qwen-RobotWorld: This video world model predicts physically grounded future visual trajectories based on current observations. Trained on 8.6 million video-text pairs comprising over 200 million frames across more than 20 embodiment types and 500 action categories, the model can generate synthetic video training data for robots and allow systems to simulate future trajectories before execution. This capability is well suited to robotic manipulation, embodied planning, and complex indoor navigation.
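
To make the idea of "simulating future trajectories before execution" concrete, here is a minimal Python sketch. It is not Qwen-RobotWorld's interface: the real model predicts video frames, whereas this toy replaces it with a hypothetical function, `toy_world_model`, that moves a point on a 2-D plane. All names below are assumptions made for illustration. The sketch only shows the general pattern of rolling candidate plans through a predictive model and picking the one whose predicted outcome is closest to the goal.

```python
from typing import Callable, List, Tuple

State = Tuple[float, float]   # toy 2-D position (stand-in for a visual observation)
Action = Tuple[float, float]  # toy displacement (stand-in for a robot action)
WorldModel = Callable[[State, Action], State]


def toy_world_model(state: State, action: Action) -> State:
    """Hypothetical stand-in for a learned world model: predicts the next state."""
    return (state[0] + action[0], state[1] + action[1])


def rollout(model: WorldModel, start: State, plan: List[Action]) -> List[State]:
    """Simulate a whole plan inside the model without touching the real robot."""
    trajectory = [start]
    state = start
    for action in plan:
        state = model(state, action)
        trajectory.append(state)
    return trajectory


def pick_plan(model: WorldModel, start: State, goal: State,
              candidates: List[List[Action]]) -> List[Action]:
    """Return the candidate whose predicted final state is closest to the goal."""
    def final_distance(plan: List[Action]) -> float:
        end = rollout(model, start, plan)[-1]
        return ((end[0] - goal[0]) ** 2 + (end[1] - goal[1]) ** 2) ** 0.5

    return min(candidates, key=final_distance)


candidates = [
    [(1.0, 0.0), (1.0, 0.0)],  # two steps right
    [(0.0, 1.0), (0.0, 1.0)],  # two steps up
    [(1.0, 0.0), (0.0, 1.0)],  # one right, one up
]
best = pick_plan(toy_world_model, (0.0, 0.0), (1.0, 1.0), candidates)
```

In a real system the scoring step would compare predicted frames against a task objective rather than a Euclidean distance, but the control flow (predict first, execute only the chosen plan) stays the same.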

The Qwen-Robot Suite demonstrates industry-leading performance across robot evaluation benchmarks

The Qwen-Robot Suite opens a path for general AI models to become practical agents in physical space. General-purpose Qwen models can compose directly with the robotics models, using them as specialized tools to bridge the gap between general intelligence and physical action. For instance, given an open-ended request like “check whether a green umbrella was left at Cotti Coffee,” an agentic system can use a general-purpose Qwen model as the upper-level strategic planner and Qwen-RobotNav as the tool for real-time execution, autonomously navigating the venue and returning an evidence-grounded answer.
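
The planner-plus-tool composition described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the suite's API: `plan`, `stub_nav_tool`, `Observation`, and `answer_request` are all hypothetical names, the planner is a fixed list of sub-goals rather than a language model, and the navigation tool is a lookup table rather than a robot. What the sketch does show is the division of labor: the planner breaks the request into sub-goals, the tool executes each one and reports what it saw, and the answer cites the observation that supports it.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Observation:
    """What the navigation tool reports after one call (hypothetical shape)."""
    location: str
    objects_seen: List[str] = field(default_factory=list)


# A navigation tool is any callable mapping a natural-language sub-goal to an Observation.
NavTool = Callable[[str], Observation]


def stub_nav_tool(sub_goal: str) -> Observation:
    """Stand-in for a navigation model; a real tool would drive a robot."""
    fake_venue: Dict[str, List[str]] = {
        "entrance": ["doormat"],
        "counter": ["cup", "green umbrella"],
    }
    for place, objects in fake_venue.items():
        if place in sub_goal:
            return Observation(place, objects)
    return Observation("unknown")


def plan(request: str) -> List[str]:
    """Stand-in for the upper-level planner; here the sub-goals are fixed."""
    return [
        "go to the entrance and look around",
        "go to the counter and look around",
    ]


def answer_request(request: str, target: str, nav: NavTool) -> str:
    """Run each sub-goal through the tool and answer with the supporting evidence."""
    for sub_goal in plan(request):
        obs = nav(sub_goal)
        if target in obs.objects_seen:
            return f"Yes: '{target}' seen at the {obs.location}."
    return f"No: '{target}' not seen at any visited location."


result = answer_request(
    "check whether a green umbrella was left at the cafe",
    "green umbrella",
    stub_nav_tool,
)
```

Keeping the tool behind a plain callable type means the stub can be swapped for a real navigation backend without changing the planning loop.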

Looking ahead, Alibaba plans to integrate the Qwen-Robot Suite into a wider ecosystem of physical agents, empowering them with highly autonomous perception, spatial decision-making, and long-horizon execution in dynamic real-world environments.


This article was written by Crystal Liu and originally published on Alizila.
