Alibaba Cloud Model Studio: Video generation and editing models

Last Updated: Apr 23, 2026

Choose the right model for text-to-video, image-to-video, and video editing.

Text-to-video

To generate videos with synchronized audio from text prompts, use wan2.7-t2v. It supports multi-shot narratives, 1080P resolution, and clips up to 15 seconds long.

Audio requirements

If your project requires synchronized narration, sound effects, or background music, use wan2.7-t2v. wan2.6-t2v also supports audio and offers better SDK compatibility, and wan2.5-t2v-preview is another audio-capable option. For silent video, use a more cost-effective model such as wan2.2-t2v-plus.

Resolution and duration

For 1080P resolution and up to 15 seconds, use wan2.7-t2v. wan2.6-t2v offers the same specifications and is a good secondary option. For 480P to 720P resolution and a 5-second duration, use wan2.2-t2v-plus or the lower-cost wan2.1-t2v-turbo.
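The text-to-video guidance above can be read as a small decision rule. The following is a minimal sketch of that rule, not an official selection algorithm: the function name `pick_t2v_model` and its parameters are hypothetical, and only the model IDs come from this page.

```python
def pick_t2v_model(needs_audio: bool, needs_1080p: bool,
                   duration_s: int, low_cost: bool = False) -> str:
    """Return a model ID by following the text-to-video guidance above.

    Hypothetical helper: the rule is a reading of the prose, not an API.
    """
    if not 2 <= duration_s <= 15:
        raise ValueError("this guidance only covers clips of 2 to 15 seconds")
    # Audio, 1080P, or any length other than 5 seconds points to wan2.7-t2v.
    if needs_audio or needs_1080p or duration_s != 5:
        return "wan2.7-t2v"
    # Silent 5-second clips at lower resolution: the cheaper models.
    return "wan2.1-t2v-turbo" if low_cost else "wan2.2-t2v-plus"


print(pick_t2v_model(needs_audio=True, needs_1080p=True, duration_s=12))
print(pick_t2v_model(needs_audio=False, needs_1080p=False, duration_s=5,
                     low_cost=True))
```

The sketch always returns the first-choice model; where SDK compatibility matters more, wan2.6-t2v is the secondary option named above.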

Image-to-video

To create dynamic videos from static images, use wan2.7-i2v. wan2.6-i2v offers similar quality and better SDK compatibility. For lower cost, use wan2.6-i2v-flash.

Build long, coherent videos

Use a first/last frame model, such as wan2.7-i2v or wan2.2-kf2v-flash, to stitch multiple clips together. Setting the last frame of one clip as the first frame of the next creates seamless transitions, ideal for narratives, product demos, or tutorials.
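The chaining idea can be planned before any generation call is made. The sketch below pairs consecutive keyframes so that each clip ends on the frame the next clip starts from. It only builds the plan; the function name `plan_clip_chain` and the `first_frame` / `last_frame` keys are illustrative labels, not parameters of any SDK.

```python
def plan_clip_chain(keyframes: list[str]) -> list[dict]:
    """Pair consecutive keyframes into (first frame, last frame) clip specs.

    Hypothetical planning helper; it does not call any video model.
    """
    if len(keyframes) < 2:
        raise ValueError("need at least two keyframes to define one clip")
    return [
        {"first_frame": start, "last_frame": end}
        for start, end in zip(keyframes, keyframes[1:])
    ]


clips = plan_clip_chain(["intro.png", "unboxing.png", "closeup.png", "outro.png"])
for index, clip in enumerate(clips, start=1):
    print(index, clip["first_frame"], "->", clip["last_frame"])
```

Four keyframes yield three clips, and the shared frame at each boundary is what keeps the transition seamless once the clips are concatenated.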

Single image to video

Use wan2.7-i2v, which supports audio, 1080P resolution, and durations from 2 to 15 seconds. wan2.6-i2v has the same specifications and offers better SDK compatibility. If speed and low cost are priorities, use wan2.6-i2v-flash.

Video reference

To maintain character consistency across scenes, use wan2.7-r2v. wan2.6-r2v also supports multiple characters and audio synchronization, and offers better SDK compatibility. For a faster, lower-cost option, use wan2.6-r2v-flash.

Video editing

To edit existing videos using text instructions for tasks like style transfer or element replacement, use wan2.7-videoedit. For video inpainting, outpainting, or local editing, use wan2.1-vace-plus.

Character animation

Motion-driven character animation

To transfer motion from a reference video to a character in a static image, use wan2.2-animate-move. The background remains unchanged. The pro mode (wan-pro) produces results closer to live-action footage, while the standard mode (wan-std) is faster and more cost-effective.

Character replacement in video

To replace a character in a video with one from a source image, use wan2.2-animate-mix. It also supports pro and standard modes.
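The two character animation choices (which model, which mode) can be captured in one small helper. This is a sketch under stated assumptions: `pick_animation`, the task labels `"move"` and `"mix"`, and the returned dictionary keys are hypothetical; only the model IDs and the mode names wan-pro and wan-std come from this page.

```python
def pick_animation(task: str, realism_first: bool) -> dict:
    """Choose a character animation model and mode.

    task: "move" to transfer motion onto a static character,
          "mix" to replace a character in an existing video.
    The returned keys are illustrative, not real request fields.
    """
    models = {"move": "wan2.2-animate-move", "mix": "wan2.2-animate-mix"}
    if task not in models:
        raise ValueError("task must be 'move' or 'mix'")
    # Pro mode is closer to live-action footage; standard is faster and cheaper.
    mode = "wan-pro" if realism_first else "wan-std"
    return {"model": models[task], "mode": mode}


print(pick_animation("move", realism_first=True))
print(pick_animation("mix", realism_first=False))
```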

Recommended models

| Model | Use case | Audio | Resolution | Duration |
| --- | --- | --- | --- | --- |
| wan2.7-t2v | Highest quality text-to-video | Supported | 720P, 1080P | 2–15s |
| wan2.6-t2v | Text-to-video, good SDK compatibility | Supported | 720P, 1080P | 2–15s |
| wan2.7-i2v | Highest quality image-to-video | Supported | 720P, 1080P | 2–15s |
| wan2.6-i2v-flash | Cost-effective image-to-video | Supported | 720P, 1080P | 2–15s |
| wan2.7-i2v | Stitching clips with first/last frames | -- | 720P, 1080P | 2–15s |
| wan2.7-r2v | Cross-scene character consistency | Supported | 720P, 1080P | 2–10s |
| wan2.6-r2v-flash | Cost-effective character consistency | Optional | 720P, 1080P | 2–10s |
| wan2.7-videoedit | Instruction-based editing, style transfer | -- | 720P, 1080P | 2–10s |
| wan2.2-animate-move | Transfer motion to a static character | -- | 720P | 2–30s |
| wan2.2-animate-mix | Replace a character in a video | -- | 720P | 2–30s |
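The resolution and duration limits in the table above lend themselves to a pre-flight check before a job is submitted. The sketch below copies those limits into a dictionary and reports any mismatch. `LIMITS` and `check_request` are hypothetical names, and the check reflects only the values listed on this page, not server-side validation.

```python
# model ID -> (allowed resolutions, min seconds, max seconds), from the table above
LIMITS = {
    "wan2.7-t2v": ({"720P", "1080P"}, 2, 15),
    "wan2.6-t2v": ({"720P", "1080P"}, 2, 15),
    "wan2.7-i2v": ({"720P", "1080P"}, 2, 15),
    "wan2.6-i2v-flash": ({"720P", "1080P"}, 2, 15),
    "wan2.7-r2v": ({"720P", "1080P"}, 2, 10),
    "wan2.6-r2v-flash": ({"720P", "1080P"}, 2, 10),
    "wan2.7-videoedit": ({"720P", "1080P"}, 2, 10),
    "wan2.2-animate-move": ({"720P"}, 2, 30),
    "wan2.2-animate-mix": ({"720P"}, 2, 30),
}


def check_request(model: str, resolution: str, duration_s: int) -> list[str]:
    """Return a list of problems; an empty list means the request fits the table."""
    if model not in LIMITS:
        return [f"{model} is not in the recommended-models table"]
    resolutions, min_s, max_s = LIMITS[model]
    problems = []
    if resolution not in resolutions:
        problems.append(f"{model} does not list {resolution}")
    if not min_s <= duration_s <= max_s:
        problems.append(f"{model} lists {min_s}-{max_s}s, got {duration_s}s")
    return problems


print(check_request("wan2.7-r2v", "1080P", 12))
print(check_request("wan2.7-t2v", "1080P", 12))
```

Returning a list rather than raising on the first mismatch lets a caller show every problem with a request at once.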

All models

Wan 2.7

The following models are available in international and Chinese mainland deployment scopes.

| Model ID | Type | Features | Output specifications |
| --- | --- | --- | --- |
| wan2.7-t2v | text-to-video | Audio sync, multi-shot narrative | 720P, 1080P. 2–15s. 30 fps, MP4 |
| wan2.7-i2v | image-to-video | First frame, first/last frame, video continuation, audio-driven | 720P, 1080P. 2–15s. 30 fps, MP4 |
| wan2.7-r2v | video reference | Multi-character, ImageN/VideoN reference format | 720P, 1080P. 2–10s. 30 fps, MP4 |
| wan2.7-videoedit | video editing | Instruction-based editing, style transfer | 720P, 1080P. 2–10s. 30 fps, MP4 |

Wan 2.6

The following models are available in international and Chinese mainland deployment scopes.

| Model ID | Type | Features | Output specifications |
| --- | --- | --- | --- |
| wan2.6-t2v | text-to-video | Audio sync, multi-shot narrative | 720P, 1080P. 2–15s. 30 fps, MP4 |
| wan2.6-i2v | image-to-video | Audio sync, multi-shot narrative | 720P, 1080P. 2–15s. 30 fps, MP4 |
| wan2.6-i2v-flash | image-to-video | Audio, multi-shot, fast generation | 720P, 1080P. 2–15s. 30 fps, MP4 |
| wan2.6-r2v | video reference | Audio sync, multi-character, narrative | 720P, 1080P. 2–10s. 30 fps, MP4 |
| wan2.6-r2v-flash | video reference | Multi-character, fast generation | 720P, 1080P. 2–10s. 30 fps, MP4 |
| wan2.6-t2v-us | text-to-video | Audio sync, multi-shot narrative; for US deployment scope | 720P, 1080P. 2–15s. 30 fps, MP4 |
| wan2.6-i2v-us | image-to-video | Audio sync, multi-shot narrative; for US deployment scope | 720P, 1080P. 2–15s. 30 fps, MP4 |

Wan 2.5

The following models are available in international and Chinese mainland deployment scopes.

| Model ID | Type | Features | Output specifications |
| --- | --- | --- | --- |
| wan2.5-t2v-preview | text-to-video | Audio sync | 480P, 720P, 1080P. 5s, 10s. 30 fps, MP4 |
| wan2.5-i2v-preview | image-to-video | Audio sync | 480P, 720P, 1080P. 5s, 10s. 30 fps, MP4 |

Wan 2.2

The following models are available in international and Chinese mainland deployment scopes.

| Model ID | Type | Features | Output specifications |
| --- | --- | --- | --- |
| wan2.2-t2v-plus | text-to-video | No audio | 480P, 1080P. 5s. 30 fps, MP4 |
| wan2.2-i2v-plus | image-to-video | No audio | 480P, 1080P. 5s. 30 fps, MP4 |
| wan2.2-i2v-flash | image-to-video | No audio, 50% faster than 2.1 | 480P, 720P, 1080P. 5s. 30 fps, MP4 |
| wan2.2-kf2v-flash | first/last frame | No audio | 480P, 720P, 1080P. 5s. 30 fps, MP4 |
| wan2.2-animate-move | character animation | wan-std / wan-pro modes | 720P. 2–30s. 15/25 fps, MP4 |
| wan2.2-animate-mix | character replacement | wan-std / wan-pro modes | 720P. 2–30s. 15/25 fps, MP4 |

Wan 2.1 (Wan 2.7 is recommended)

The following models are available in international and Chinese mainland deployment scopes.

| Model ID | Type | Features | Output specifications |
| --- | --- | --- | --- |
| wan2.1-t2v-plus | text-to-video | No audio | 720P. 5s. 30 fps, MP4 |
| wan2.1-t2v-turbo | text-to-video | No audio | 480P, 720P. 5s. 30 fps, MP4 |
| wan2.1-i2v-plus | image-to-video | No audio | 720P. 5s. 30 fps, MP4 |
| wan2.1-i2v-turbo | image-to-video | No audio | 480P, 720P. 3–5s. 30 fps, MP4 |
| wan2.1-kf2v-plus | first/last frame | No audio | 720P. 5s. 30 fps, MP4 |
| wan2.1-vace-plus | video editing | No audio | 720P. Max 5s. 30 fps, MP4 |
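The "Output specifications" entries in the tables above follow a loose pattern (resolutions, then durations, then frame rate and container), so they can be turned into structured data for filtering or comparison. The parser below is a minimal sketch that assumes the strings look like the ones on this page; `parse_spec` and the keys of the returned dictionary are hypothetical.

```python
import re


def parse_spec(spec: str) -> dict:
    """Parse a string such as '720P, 1080P. 2–15s. 30 fps, MP4'."""
    resolutions = re.findall(r"\d+P", spec)

    # Durations appear as a range ("2–15s"), fixed values ("5s, 10s"), or "Max 5s".
    span = re.search(r"(\d+)\s*[–-]\s*(\d+)s", spec)
    if span:
        min_s, max_s, choices_s = int(span.group(1)), int(span.group(2)), None
    else:
        fixed = [int(v) for v in re.findall(r"(\d+)s\b", spec)]
        if not fixed:
            raise ValueError(f"no duration found in {spec!r}")
        max_s = max(fixed)
        is_upper_bound_only = "Max" in spec
        min_s = None if is_upper_bound_only else min(fixed)
        choices_s = None if is_upper_bound_only else fixed

    fps_match = re.search(r"([\d/]+)\s*fps", spec)
    fps = [int(v) for v in fps_match.group(1).split("/")] if fps_match else []

    # The container format is the last word of the entry.
    container = re.search(r"(\w+)\s*$", spec).group(1)
    return {"resolutions": resolutions, "min_s": min_s, "max_s": max_s,
            "choices_s": choices_s, "fps": fps, "container": container}


print(parse_spec("720P, 1080P. 2–15s. 30 fps, MP4"))
print(parse_spec("480P, 720P, 1080P. 5s, 10s. 30 fps, MP4"))
```

For an entry such as "Max 5s" the sketch leaves the minimum as `None` rather than guessing a lower bound that the table does not state.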