Generate lip-sync videos from one image and one audio clip. Supports portrait, half-body, or full-body frames with no composition restrictions.
This document applies only to the China (Beijing) region. An API key from the China (Beijing) region is required to use the model.
Model overview
Sample results
[Sample results table: sample input image, input audio, and output video]
Models and pricing
| Model | Description | Unit price | RPS limit for task submission API | Concurrent tasks |
| --- | --- | --- | --- | --- |
| wan2.2-s2v-detect | Validates image quality, single person, and frontal view. | $0.000574/image | 5 | No limit for sync APIs |
| wan2.2-s2v | Generates a video from a validated image and audio clip. | 480p: $0.071677/second; 720p: $0.129018/second | 5 | 1 |

Rate limits are shared by Alibaba Cloud accounts and RAM users.
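As a rough sketch of how these prices combine, the snippet below estimates the cost of one detection call plus one generated video. It assumes billing is strictly linear in output duration, with no rounding or minimum charge, which this document does not state. The names `estimate_cost`, `seconds`, and `detections` are illustrative only.

```python
from decimal import Decimal

# Unit prices in USD, copied from the pricing table.
DETECT_PRICE_PER_IMAGE = Decimal("0.000574")
PRICE_PER_SECOND = {
    "480p": Decimal("0.071677"),
    "720p": Decimal("0.129018"),
}


def estimate_cost(resolution: str, seconds: float, detections: int = 1) -> Decimal:
    """Estimate the USD cost of `detections` image checks plus one video.

    Assumption: the video is billed linearly per second of output.
    """
    if resolution not in PRICE_PER_SECOND:
        raise ValueError(f"unsupported resolution: {resolution!r}")
    video_cost = PRICE_PER_SECOND[resolution] * Decimal(str(seconds))
    detect_cost = DETECT_PRICE_PER_IMAGE * detections
    return video_cost + detect_cost
```

Under that assumption, `estimate_cost("480p", 10)` comes to about $0.717 and `estimate_cost("720p", 10)` to about $1.291.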
Video generation workflow:
- Validate the image with the wan2.2-s2v-detect API.
- If the image is compliant, submit a video generation task with the wan2.2-s2v API (image URL + audio URL), then poll for the result.
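The control flow of this workflow can be sketched as follows. The three API calls are passed in as plain callables (`detect`, `submit`, `poll`), which are hypothetical stand-ins rather than functions from any SDK. The polling interval and attempt limit are arbitrary illustrative defaults, not documented values.

```python
import time


def generate_video(image_url, audio_url, detect, submit, poll,
                   interval=5.0, max_attempts=60, sleep=time.sleep):
    """Run detect -> submit -> poll and return the finished result.

    detect(image_url) -> bool          True if the image is compliant
    submit(image_url, audio_url) -> id task ID of the generation task
    poll(task_id) -> result or None    None while the task is still running
    """
    # Step 1: never submit a generation task for a non-compliant image.
    if not detect(image_url):
        raise ValueError("image failed detection; no task was submitted")

    # Step 2: submit the task, then poll until a result is available.
    task_id = submit(image_url, audio_url)
    for _ in range(max_attempts):
        result = poll(task_id)
        if result is not None:
            return result
        sleep(interval)
    raise TimeoutError(f"task {task_id} did not finish in time")
```

Injecting the callables keeps the sketch independent of any particular HTTP client and makes it easy to exercise with fakes.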
Getting started
Prerequisites
Before you call the API, activate Model Studio and get an API key. Then, set the API key as an environment variable.
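As a minimal sketch, application code can read the key from the environment at startup so that it never appears in source files. The variable name `DASHSCOPE_API_KEY` matches the one used in the curl samples in this document. The helper name `load_api_key` is hypothetical.

```python
import os


def load_api_key(env=os.environ) -> str:
    """Return the API key from the environment, failing early if it is missing."""
    key = env.get("DASHSCOPE_API_KEY", "").strip()
    if not key:
        raise RuntimeError("DASHSCOPE_API_KEY is not set")
    return key
```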
Sample code
The sample image has already passed detection, so the code below goes straight to video generation.
HTTP workflow: create task → retrieve result.
Step 1: Create a task to get a task ID
Returns a task_id for querying results.
curl 'https://dashscope.aliyuncs.com/api/v1/services/aigc/image2video/video-synthesis/' \
--header 'X-DashScope-Async: enable' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "model": "wan2.2-s2v",
    "input": {
        "image_url": "https://img.alicdn.com/imgextra/i3/O1CN011FObkp1T7Ttowoq4F_!!6000000002335-0-tps-1440-1797.jpg",
        "audio_url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250825/iaqpio/input_audio.MP3"
    },
    "parameters": {
        "style": "speech"
    }
}'
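The same request can be built from Python with only the standard library. This is a sketch, not an official client: the endpoint, headers, and body mirror the curl command above. It assumes, without confirmation from this document, that the response nests the ID under `output.task_id`. The helper names are hypothetical.

```python
import json
import urllib.request

SUBMIT_URL = ("https://dashscope.aliyuncs.com/api/v1/services/"
              "aigc/image2video/video-synthesis/")


def build_submit_request(api_key, image_url, audio_url, style="speech"):
    """Build the POST request that creates a wan2.2-s2v task."""
    payload = {
        "model": "wan2.2-s2v",
        "input": {"image_url": image_url, "audio_url": audio_url},
        "parameters": {"style": style},
    }
    return urllib.request.Request(
        SUBMIT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "X-DashScope-Async": "enable",  # asynchronous task submission
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_task_id(response_body):
    """Pull the task ID out of the JSON response body.

    Assumption: the ID is located at output.task_id.
    """
    return json.loads(response_body)["output"]["task_id"]
```

To send the request, pass it to `urllib.request.urlopen` and give the bytes returned by `read()` to `extract_task_id`.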
Step 2: Query the result by task ID
Replace 86ecf553-d340-4e21-xxxxxxxxx with the actual task ID.
API keys are region-specific. See API key documentation for details.
For models in the Beijing region, the base URL is https://dashscope.aliyuncs.com/api/v1/tasks/86ecf553-d340-4e21-xxxxxxxxx, which is the URL used in the command below.
curl -X GET https://dashscope.aliyuncs.com/api/v1/tasks/86ecf553-d340-4e21-xxxxxxxxx \
--header "Authorization: Bearer $DASHSCOPE_API_KEY"
The task ID expires after 24 hours. Queries for an expired task ID return the status UNKNOWN.
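A caller usually needs to turn each query response into a decision: wait, stop, or give up. The sketch below builds the query request for the Beijing endpoint and maps the task status to an action. Only the UNKNOWN status is stated in this document. The other status names (PENDING, RUNNING, SUCCEEDED) and the `output.task_status` location are assumptions, and the helper names are hypothetical.

```python
import json
import urllib.request

TASKS_URL = "https://dashscope.aliyuncs.com/api/v1/tasks/"


def build_query_request(api_key, task_id):
    """Build the GET request that retrieves a task by ID."""
    return urllib.request.Request(
        TASKS_URL + task_id,
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def next_action(response_body):
    """Map a task query response to 'done', 'wait', 'expired', or 'failed'.

    Assumption: the status is located at output.task_status.
    """
    status = json.loads(response_body)["output"]["task_status"]
    if status == "SUCCEEDED":
        return "done"
    if status in ("PENDING", "RUNNING"):
        return "wait"      # task still queued or in progress; query again later
    if status == "UNKNOWN":
        return "expired"   # e.g. the task ID is older than 24 hours
    return "failed"        # any other status is treated as a failure
```

Treating every unrecognized status as a failure is a conservative choice: it stops the polling loop instead of waiting indefinitely on a state the code does not understand.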
Model comparison
Model selection: Use wan2.2-s2v for full-body or large half-body frames. For cost-effective portraits, use EMO.
| Feature comparison | Digital Human wan2.2-s2v | EMO |
| --- | --- | --- |
| Model description | More natural movements with wider frame support (especially full-body and cartoon characters). | Better for close-ups and portraits with natural lip-sync and expressions. |
| Applicable frames | Full-body, half-body, portrait | Portrait, half-body (recommended) |
| Invocation method | Two-step: detection API for compliance only (simpler integration). | Two-step: detection API returns coordinates required by generation API. |
| Style control | Scenario-driven (speaking, singing, performing) | Style-driven (moderate, calm, lively) |
| Output specifications | By resolution (480p, 720p) | By aspect ratio (1:1, 3:4) |
| Model call price | | |
Next steps
API documentation:
