Alibaba Cloud has introduced Wanx 2.1, the latest iteration of its multimodal large model Tongyi Wanxiang (Wanx), which first debuted in July 2023. Designed to generate high-quality images and videos from text input, Wanx 2.1 represents a significant leap forward in AI-driven visual content creation.
The new model excels at generating realistic visuals by accurately handling complex movements, enhancing pixel quality, adhering to physical rules, and following instructions more precisely. This precision in following instructions has propelled Wanx 2.1 to the top of the VBench leaderboard, a comprehensive benchmark suite for video generative models. According to VBench, Wanx 2.1 has an overall score of 84.7% and leads in key dimensions such as dynamic degree, spatial relationships, and multi-object interactions.
To maximize visual generation quality, the research team behind Wanx 2.1 has made significant technical progress on several fronts. First, by leveraging a proprietary VAE (Variational Autoencoder) and DiT (Diffusion Transformer) framework, Wanx 2.1 strengthens temporal and spatial relationships and thereby achieves higher visual realism in scenes that involve complicated motion and physical rules.
By employing a full space-time attention mechanism, the model can also mimic the complex dynamics of the real world with remarkable accuracy.
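For readers unfamiliar with the term, "full space-time attention" generally means that every latent token in a clip attends to every other token across both frames and spatial positions. The NumPy sketch below illustrates that general idea only. It is not Wanx 2.1's implementation, which has not been described in this article; the function name `full_spacetime_attention`, the single-head setup, and the projection matrices `w_q`, `w_k`, and `w_v` are all assumptions made for illustration.

```python
import numpy as np

def full_spacetime_attention(video_tokens, w_q, w_k, w_v):
    """Single-head attention over all space-time tokens (illustrative only).

    video_tokens: array of shape (T, H, W, D), e.g. latent patches of a clip.
    w_q, w_k, w_v: hypothetical projection matrices of shape (D, D_out).
    """
    t, h, w, d = video_tokens.shape
    # Flatten time and space into one sequence so that every token can
    # attend to every other token, regardless of frame or position.
    x = video_tokens.reshape(t * h * w, d)

    q, k, v = x @ w_q, x @ w_k, x @ w_v

    # Scaled dot-product scores between all pairs of space-time tokens.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    # Numerically stable softmax over the key dimension.
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)

    out = weights @ v
    return out.reshape(t, h, w, -1), weights


# Usage: a tiny clip of 2 frames with 3x4 latent patches of width 8.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(2, 3, 4, 8))
w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
out, weights = full_spacetime_attention(tokens, w_q, w_k, w_v)
print(out.shape, weights.shape)
```

The attention matrix here has one row and one column per space-time token, which is what distinguishes this approach from factorized designs that attend over space and time in separate steps. The cost grows quadratically with the number of tokens in the clip.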
Innovative approaches have also been adopted to accelerate the model’s training process using ultra-long context. This ensures seamless integration of text instructions into video generation, enabling faster and more intuitive content creation.
Additionally, Wanx 2.1 has achieved a groundbreaking milestone by becoming the first video generation model to support text effects in both Chinese and English, meeting the diverse creative needs of industries such as advertising design and short video production.
Text Prompt:「平拍一位女性花样滑冰运动员在冰场上进行表演的全景。她穿着紫色的滑冰服,脚踩白色的滑冰鞋,正在进行一个旋转动作。她的手臂张开,身体向后倾斜,展现了她的技巧和优雅」。English translation: “A panoramic shot of a female figure skater performing on an ice rink. She is wearing a purple skating outfit and white skates, executing a spinning move. Her arms are outstretched, and her body leans backward, showcasing her skill and grace.”
As a result of such innovative approaches, Wanx 2.1 demonstrates its ability to generate videos with large-scale bodily movements and complex rotations. Even in challenging scenarios such as figure skating, swimming, and diving, the model maintains body coordination and adheres to realistic motion trajectories, setting a new benchmark for video generation.
Wanx 2.1 is currently available for free on its official Chinese website. Individual developers and corporate users can explore its potential through Alibaba Cloud’s generative AI platform, Model Studio. This empowers users to create high-quality visual content tailored to their unique needs, further bridging the gap between AI technology and creative industries.
This article was originally published on Alizila and written by Crystal Liu.