Use the VideoRetalk API to generate videos with synchronized lip movements by replacing a person's speech with a provided audio track.
This document applies only to the China (Beijing) region; you must use an API key created in that region.
HTTP
VideoRetalk supports only HTTP calls and uses an asynchronous, two-request process, which reduces wait times and prevents timeouts:
- Submit a task: Submit a request to create a video generation task. The API returns a task ID.
- Query task status and retrieve results: Use the returned task ID to query the task status and retrieve the generated video.
Prerequisites
You have created an API key and set the API key as an environment variable.
Input limitations
- Video requirements:
  - File: MP4, AVI, or MOV. Max 300 MB. Duration: 2-120 seconds.
  - Properties: Frame rate: 15-60 fps. Encoding: H.264 or H.265 required. Side length: 640-2,048 pixels.
  - Content: Close-up, front-facing person. Avoid extreme angles or very small faces. If the video contains no face, see FAQ.
- Audio requirements:
  - File: WAV, MP3, or AAC. Max 30 MB. Duration: 2-120 seconds. If audio and video durations differ, see FAQ.
  - Content: Clear, loud human voice. Remove ambient noise and background music.
- Character reference image requirements:
  - File: JPEG, JPG, PNG, BMP, or WebP. Max 10 MB. Aspect ratio: 2 or less; longest side: 4,096 pixels or less.
  - Content: Clear, frontal face view. The person must appear in the video. You can use a video screenshot.
- File URL requirements:
  - Files must be accessible via HTTP links (local paths are not supported). Use the platform's temporary storage space to upload local files and create links.
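The limits above can be checked client-side before uploading, which avoids a round trip that would only fail with an `InvalidFile.*` error. The following is a minimal sketch (the function name and metadata arguments are illustrative, not part of the API); it assumes you have already probed the file's size, duration, frame rate, and resolution with a tool of your choice.

```python
# Hypothetical pre-flight check of the documented video limits.
# All thresholds come from the "Input limitations" section above.

VIDEO_EXTS = {".mp4", ".avi", ".mov"}

def check_video(path: str, size_mb: float, duration_s: float, fps: float,
                width: int, height: int) -> list[str]:
    """Return a list of violations of the documented video limits (empty = OK)."""
    errors = []
    ext = path[path.rfind("."):].lower()
    if ext not in VIDEO_EXTS:
        errors.append(f"unsupported video format: {ext}")
    if size_mb > 300:
        errors.append("video larger than 300 MB")
    if not 2 <= duration_s <= 120:
        errors.append("video duration outside 2-120 s")
    if not 15 <= fps <= 60:
        errors.append("frame rate outside 15-60 fps")
    for side in (width, height):
        if not 640 <= side <= 2048:
            errors.append(f"side length {side} outside 640-2,048 px")
            break
    return errors
```

Analogous checks apply to the audio (30 MB, 2-120 s) and reference image (10 MB, aspect ratio ≤ 2, longest side ≤ 4,096 px) limits.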
Submit a task
POST https://dashscope.aliyuncs.com/api/v1/services/aigc/image2video/video-synthesis/
Request parameters
| Field | Type | Location | Required | Description | Example |
| --- | --- | --- | --- | --- | --- |
| Content-Type | String | Header | Yes | Set to application/json. | application/json |
| Authorization | String | Header | Yes | API key (format: Bearer YOUR_KEY). | Bearer d1**2a |
| X-DashScope-Async | String | Header | Yes | Set to enable to use asynchronous processing. | enable |
| model | String | Body | Yes | Model to call. | videoretalk |
| input.video_url | String | Body | Yes | URL of the video file you uploaded. See Input limitations for file requirements. | http://aaa/bbb.mp4 |
| input.audio_url | String | Body | Yes | URL of the audio file you uploaded. See Input limitations for file requirements. | http://aaa/bbb.wav |
| input.ref_image_url | String | Body | No | URL of the reference face image. Use this to specify which face to sync when multiple faces are present. If omitted, the system uses the largest face in the first frame. See Input limitations for file requirements. | http://aaa/bbb.jpg |
| parameters.video_extension | Boolean | Body | No | Specifies whether to extend the video when the audio is longer. Default: false. | false |
| parameters.query_face_threshold | Integer | Body | No | Confidence threshold for face matching when a reference image is provided. Range: 120-200 (smaller = looser matching, larger = stricter matching). Default: 170. Ignored if input.ref_image_url is empty. | 170 |
Response parameters
| Field | Type | Description | Example |
| --- | --- | --- | --- |
| output.task_id | String | Submitted task ID. Use this to query task status and retrieve results. | a8532587-fa8c-4ef8-82be-0c46b17950d1 |
| output.task_status | String | Task status after submission. | "PENDING" |
| request_id | String | Request ID. | 7574ee8f-38a3-4b1e-9280-11c33ab46e51 |
Sample request
curl --location 'https://dashscope.aliyuncs.com/api/v1/services/aigc/image2video/video-synthesis/' \
--header 'X-DashScope-Async: enable' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
"model": "videoretalk",
"input": {
"video_url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250717/pvegot/input_video_01.mp4",
"audio_url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250717/aumwir/stella2-%E6%9C%89%E5%A3%B0%E4%B9%A67.wav",
"ref_image_url": ""
},
"parameters": {
"video_extension": false
}
}'
Sample response
{
"output": {
"task_id": "a8532587-fa8c-4ef8-82be-0c46b17950d1",
"task_status": "PENDING"
},
"request_id": "7574ee8f-38a3-4b1e-9280-11c33ab46e51"
}
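The curl call above can also be issued from Python. The sketch below uses only the standard library and mirrors the sample request; the payload-building step is separated from the network call so it can be inspected before sending. The helper names are illustrative, not part of the API.

```python
# Sketch of the task-submission call using only the standard library.
import json
import os
import urllib.request

ENDPOINT = "https://dashscope.aliyuncs.com/api/v1/services/aigc/image2video/video-synthesis/"

def build_submit_request(video_url: str, audio_url: str,
                         ref_image_url: str = "", video_extension: bool = False):
    """Assemble the headers and JSON body for the submit call."""
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {os.environ.get('DASHSCOPE_API_KEY', '')}",
        "X-DashScope-Async": "enable",  # required: only async calls are supported
    }
    body = {
        "model": "videoretalk",
        "input": {
            "video_url": video_url,
            "audio_url": audio_url,
            "ref_image_url": ref_image_url,
        },
        "parameters": {"video_extension": video_extension},
    }
    return headers, body

def submit_task(video_url: str, audio_url: str) -> str:
    """POST the task and return the task_id from the response."""
    headers, body = build_submit_request(video_url, audio_url)
    req = urllib.request.Request(ENDPOINT, data=json.dumps(body).encode(),
                                 headers=headers, method="POST")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["output"]["task_id"]
```

Keep the returned task_id: it is the only handle for retrieving the generated video.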
Query task status and retrieve results
GET https://dashscope.aliyuncs.com/api/v1/tasks/{task_id}
Request parameters
| Field | Type | Location | Required | Description | Example |
| --- | --- | --- | --- | --- | --- |
| Authorization | String | Header | Yes | API key (format: Bearer YOUR_KEY). | Bearer d1**2a |
| task_id | String | URL path | Yes | Task ID to query (returned by the task submission API). | a8532587-fa8c-4ef8-82be-0c46b17950d1 |
Response parameters
| Field | Type | Description | Example |
| --- | --- | --- | --- |
| output.task_id | String | Queried task ID. | a8532587-fa8c-4ef8-82be-0c46b17950d1 |
| output.task_status | String | Queried task status. Values: PENDING (queued), RUNNING (processing), SUCCEEDED, FAILED, CANCELED, and UNKNOWN. | "SUCCEEDED" |
| output.video_url | String | Generated video URL. Valid for 24 hours after task completion. | https://xxx/1.mp4 |
| usage.video_duration | Float | Generated video duration, in seconds. | 10.23 |
| usage.video_ratio | String | Generated video aspect ratio type. Value: standard (output matches the original by default). | "standard" |
| usage.size | String | Generated video resolution (matches the input). | "1080*1920" |
| usage.fps | Integer | Generated video frame rate (matches the input). | 25 |
| request_id | String | Request ID. | 7574ee8f-38a3-4b1e-9280-11c33ab46e51 |
Sample request
curl -X GET 'https://dashscope.aliyuncs.com/api/v1/tasks/<YOUR_TASK_ID>' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY"
Sample response
{
"request_id": "87b9dce5-7f36-4305-a347-xxxxxx",
"output": {
"task_id": "3afd65eb-9604-48ea-8a91-xxxxxx",
"task_status": "SUCCEEDED",
"submit_time": "2025-09-11 20:15:29.887",
"scheduled_time": "2025-09-11 20:15:36.741",
"end_time": "2025-09-11 20:16:40.577",
"video_url": "http://dashscope-result-sh.oss-cn-shanghai.aliyuncs.com/xxx.mp4?Expires=xxx"
},
"usage": {
"video_duration": 7.2,
"size": "1080*1920",
"video_ratio": "standard",
"fps": 25
}
}
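In practice the query endpoint is called in a loop until the task reaches a terminal status. The sketch below injects the status fetcher as a function so the loop can be exercised without network access; in real use the fetcher would GET https://dashscope.aliyuncs.com/api/v1/tasks/{task_id} with the Authorization header. The terminal-status set and the helper names are assumptions based on the statuses listed above.

```python
# Sketch of a polling loop for the query-task endpoint.
import time

# Statuses after which polling should stop (assumed terminal set).
TERMINAL = {"SUCCEEDED", "FAILED", "CANCELED", "UNKNOWN"}

def wait_for_task(task_id, fetch_status, interval_s=5.0, max_polls=120):
    """Poll until the task leaves PENDING/RUNNING; return the last response."""
    for _ in range(max_polls):
        result = fetch_status(task_id)
        if result["output"]["task_status"] in TERMINAL:
            return result
        time.sleep(interval_s)
    raise TimeoutError(f"task {task_id} still running after {max_polls} polls")
```

Download the video promptly on SUCCEEDED: the video_url expires 24 hours after task completion.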
Sample error response
{
"request_id": "7574ee8f-38a3-4b1e-9280-11c33ab46e51",
"output": {
"task_id": "a8532587-fa8c-4ef8-82be-0c46b17950d1",
"task_status": "FAILED",
"code": "xxx",
"message": "xxxxxx"
}
}
Error codes
See Error messages for general status codes.
Model-specific error codes:
| HTTP return code | Error code | Error message | Description |
| --- | --- | --- | --- |
| 400 | InvalidParameter | Field required: xxx | A request parameter is missing or incorrect. |
| 400 | InvalidURL.ConnectionRefused | Connection to ${url} refused, please provide avaiable URL | The file download was rejected. Provide an accessible URL. |
| 400 | InvalidURL.Timeout | Download ${url} timeout, please check network connection. | The file download timed out (timeout: 60 seconds). |
| 400 | InvalidFile.Size | Invalid file size. The video/audio/image file size must be less than **MB. | The file must be smaller than the stated size limit. |
| 400 | InvalidFile.Format | Invalid file format,the request file format is one of the following types: MP4, AVI, MOV, MP3, WAV, AAC, JPEG, JPG, PNG, BMP, and WEBP. | Unsupported file format. Supported formats: video (MP4/AVI/MOV), audio (MP3/WAV/AAC), image (JPEG/JPG/PNG/BMP/WebP). |
| 400 | InvalidFile.Resolution | Invalid video resolution. The height or width of video must be 640 ~ 2048. | The video side length must be 640-2,048 pixels. |
| 400 | InvalidFile.FPS | Invalid video FPS. The video FPS must be 15 ~ 60. | The video frame rate must be 15-60 fps. |
| 400 | InvalidFile.Duration | Invalid file duration. The video/audio file duration must be 2s ~ 120s. | The video/audio duration must be 2-120 seconds. |
| 400 | InvalidFile.ImageSize | The size of image is beyond limit. | The image exceeds the size limit. Aspect ratio must be 2 or less; longest side must be 4,096 pixels or less. |
| 400 | InvalidFile.Openerror | Invalid file, cannot open file as video/audio/image. | The file cannot be opened as video/audio/image. |
| 400 | InvalidFile.Content | The input image has no human body or multi human bodies. Please upload other image with single person. | The reference image contains no person or multiple people. Upload an image with a single person. |
| 400 | InvalidFile.FaceNotMatch | There are no matched face in the video with the provided reference image. | The reference face does not match any face in the video. |
FAQ
- How do I handle input video and audio with different durations?
  By default, the longer file is truncated to match the shorter one. To loop the video to match the audio length instead, set video_extension to true (the video is extended by playing in reverse, then forward again).
- How does the API handle silent segments in the input audio?
  The model generates closed-mouth frames for silent audio segments.
- What happens when a video frame contains no face, but the corresponding audio has speech?
  The original frame is preserved and the audio continues. Lip-sync applies only to frames with detectable faces.
- How do I select a specific person for lip-syncing in a video with multiple people?
  The API syncs only one person. Use input.ref_image_url to specify the target face. If omitted, the largest face in the first frame is synced.
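The duration rules in the first FAQ item can be summarized as a tiny helper (a sketch; the function name is illustrative, and it models only the audio-longer-than-video case for video_extension):

```python
def output_duration(video_s: float, audio_s: float,
                    video_extension: bool = False) -> float:
    """Expected output length under the documented duration rules."""
    if video_extension and audio_s > video_s:
        return audio_s  # video loops (reverse, then forward) to match the audio
    return min(video_s, audio_s)  # longer input truncated to the shorter
```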