Alibaba Cloud Model Studio: VideoRetalk API reference

Last Updated: Mar 15, 2026

Use the VideoRetalk API to generate videos with synchronized lip movements by replacing a person's speech with a provided audio track.

Important

This document applies only to the China (Beijing) region. To use the model, use an API key from the China (Beijing) region.

HTTP

VideoRetalk supports only HTTP calls and uses an asynchronous process: submit a task, then query for results (two separate requests). This reduces wait times and prevents timeouts.

Prerequisites

You have created an API key and set it as an environment variable. The sample requests in this document read it from DASHSCOPE_API_KEY.

Input limitations

  • Video requirements:

    • File: MP4, AVI, or MOV. Max 300 MB. Duration: 2-120 seconds.

    • Properties: Frame rate: 15-60 fps. Encoding: H.264 or H.265 required. Side length: 640-2,048 pixels.

    • Content: Close-up, front-facing person. Avoid extreme angles or very small faces. If the video contains no face, see FAQ.

  • Audio requirements:

    • File: WAV, MP3, or AAC. Max 30 MB. Duration: 2-120 seconds. If audio and video durations differ, see FAQ.

    • Content: Clear, loud human voice. Remove ambient noise and background music.

  • Character reference image requirements:

    • File: JPEG, JPG, PNG, BMP, or WebP. Max 10 MB. Aspect ratio: 2 or less; longest side: 4,096 pixels or less.

    • Content: Clear, frontal face view. Person must appear in the video. You can use a video screenshot.

  • File URL requirements:

    • Files must be accessible via HTTP links (local paths not supported). Use the platform's temporary storage space to upload local files and create links.
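The limits above can be checked locally before uploading, which avoids a round trip that ends in an InvalidFile error. The following is a minimal sketch, not part of the API: the function names are hypothetical, the caller is assumed to have already read the file metadata with a tool of their choice, "MB" is assumed to mean 2^20 bytes, both video sides are assumed to need to fall in the 640-2,048 range, and the aspect ratio is assumed to be longest side divided by shortest side. The H.264/H.265 encoding requirement is not checked.

```python
VIDEO_FORMATS = {"mp4", "avi", "mov"}
AUDIO_FORMATS = {"wav", "mp3", "aac"}
IMAGE_FORMATS = {"jpeg", "jpg", "png", "bmp", "webp"}
MB = 1024 * 1024  # assumption: the document does not say whether MB is 10^6 or 2^20 bytes


def check_video(fmt, size_bytes, duration_s, fps, width, height):
    """Return a list of problems; an empty list means the metadata passes."""
    problems = []
    if fmt.lower() not in VIDEO_FORMATS:
        problems.append("format must be MP4, AVI, or MOV")
    if size_bytes > 300 * MB:
        problems.append("file must not exceed 300 MB")
    if not 2 <= duration_s <= 120:
        problems.append("duration must be 2-120 seconds")
    if not 15 <= fps <= 60:
        problems.append("frame rate must be 15-60 fps")
    # Assumption: the side-length limit applies to both width and height.
    if not all(640 <= side <= 2048 for side in (width, height)):
        problems.append("each side must be 640-2,048 pixels")
    return problems


def check_audio(fmt, size_bytes, duration_s):
    """Same contract as check_video, for the audio track."""
    problems = []
    if fmt.lower() not in AUDIO_FORMATS:
        problems.append("format must be WAV, MP3, or AAC")
    if size_bytes > 30 * MB:
        problems.append("file must not exceed 30 MB")
    if not 2 <= duration_s <= 120:
        problems.append("duration must be 2-120 seconds")
    return problems


def check_image(fmt, size_bytes, width, height):
    """Same contract as check_video, for the optional reference image."""
    problems = []
    if fmt.lower() not in IMAGE_FORMATS:
        problems.append("format must be JPEG, JPG, PNG, BMP, or WebP")
    if size_bytes > 10 * MB:
        problems.append("file must not exceed 10 MB")
    # Assumption: aspect ratio means longest side / shortest side.
    if max(width, height) > 2 * min(width, height):
        problems.append("aspect ratio must be 2 or less")
    if max(width, height) > 4096:
        problems.append("longest side must be 4,096 pixels or less")
    return problems
```

For example, check_video("mp4", 50 * MB, 10.0, 25, 1080, 1920) returns an empty list, while a 3000x1000 reference image is flagged for its aspect ratio.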

Submit a task

POST https://dashscope.aliyuncs.com/api/v1/services/aigc/image2video/video-synthesis/

Request parameters

| Field | Type | Location | Required | Description | Example |
|---|---|---|---|---|---|
| Content-Type | String | Header | Yes | application/json | application/json |
| Authorization | String | Header | Yes | API key (format: Bearer YOUR_KEY) | Bearer d1**2a |
| X-DashScope-Async | String | Header | Yes | Set to enable for asynchronous task creation. | enable |
| model | String | Body | Yes | Model to call. | videoretalk |
| input.video_url | String | Body | Yes | URL of the video file you uploaded. See Input limitations for file requirements. | http://aaa/bbb.mp4 |
| input.audio_url | String | Body | Yes | URL of the audio file you uploaded. See Input limitations for file requirements. | http://aaa/bbb.wav |
| input.ref_image_url | String | Body | No | URL of the reference face image. Use this to specify which face to sync when multiple faces are present. If omitted, the system uses the largest face in the first frame. See Input limitations for file requirements. | http://aaa/bbb.jpg |
| parameters.video_extension | Boolean | Body | No | Specifies whether to extend the video when the audio is longer. Default: false. true: extends the video to match the audio length by looping in a "reverse-play, forward-play" pattern. false: does not extend the video; the generated video matches the original duration, and the audio is truncated. | false |
| parameters.query_face_threshold | Integer | Body | No | Specifies the confidence level for face matching when a reference image is provided. Range: 120-200 (smaller = looser matching, larger = stricter matching). Default: 170. Ignored if input.ref_image_url is empty. | 170 |

Response parameters

| Field | Type | Description | Example |
|---|---|---|---|
| output.task_id | String | Submitted task ID. Use this to query task status and retrieve results. | a8532587-fa8c-4ef8-82be-0c46b17950d1 |
| output.task_status | String | Task status after submission. | "PENDING" |
| request_id | String | Request ID. | 7574ee8f-38a3-4b1e-9280-11c33ab46e51 |

Sample request

curl --location 'https://dashscope.aliyuncs.com/api/v1/services/aigc/image2video/video-synthesis/' \
--header 'X-DashScope-Async: enable' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "model": "videoretalk",
    "input": {
        "video_url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250717/pvegot/input_video_01.mp4",
        "audio_url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250717/aumwir/stella2-%E6%9C%89%E5%A3%B0%E4%B9%A67.wav",
        "ref_image_url": ""
    },
    "parameters": {
        "video_extension": false
    }
}'

Sample response

{
    "output": {
        "task_id": "a8532587-fa8c-4ef8-82be-0c46b17950d1",
        "task_status": "PENDING"
    },
    "request_id": "7574ee8f-38a3-4b1e-9280-11c33ab46e51"
}
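The same submission can be issued from Python. The sketch below uses only the standard library and mirrors the headers and body fields listed under Request parameters; the function names build_submit_request and submit are hypothetical, and omitting ref_image_url from the body when no image is given (rather than sending an empty string as the curl sample does) is an assumption based on the field being optional.

```python
import json
import urllib.request

SUBMIT_URL = (
    "https://dashscope.aliyuncs.com/api/v1/services/aigc/image2video/video-synthesis/"
)


def build_submit_request(api_key, video_url, audio_url, ref_image_url=None,
                         video_extension=False, query_face_threshold=None):
    """Build (but do not send) the POST request that creates a task."""
    input_fields = {"video_url": video_url, "audio_url": audio_url}
    parameters = {"video_extension": video_extension}
    if ref_image_url:
        input_fields["ref_image_url"] = ref_image_url
        # The threshold is ignored without a reference image, so only send it here.
        if query_face_threshold is not None:
            if not 120 <= query_face_threshold <= 200:
                raise ValueError("query_face_threshold must be in the range 120-200")
            parameters["query_face_threshold"] = query_face_threshold
    body = {"model": "videoretalk", "input": input_fields, "parameters": parameters}
    return urllib.request.Request(
        SUBMIT_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
            "X-DashScope-Async": "enable",
        },
        method="POST",
    )


def submit(request):
    """Send the request and return output.task_id from the JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)["output"]["task_id"]
```

A typical call would read the key from the environment, for example submit(build_submit_request(os.environ["DASHSCOPE_API_KEY"], video_url, audio_url)) after importing os. Splitting construction from sending keeps the request inspectable without network access.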

Query task status and retrieve results

GET https://dashscope.aliyuncs.com/api/v1/tasks/{task_id}

Request parameters

| Field | Type | Location | Required | Description | Example |
|---|---|---|---|---|---|
| Authorization | String | Header | Yes | API key (format: Bearer YOUR_KEY) | Bearer d1**2a |
| task_id | String | URL path | Yes | Task ID to query (returned by the task submission API). | a8532587-fa8c-4ef8-82be-0c46b17950d1 |

Response parameters

| Field | Type | Description | Example |
|---|---|---|---|
| output.task_id | String | Queried task ID. | a8532587-fa8c-4ef8-82be-0c46b17950d1 |
| output.task_status | String | Queried task status. Task statuses: PENDING, PRE-PROCESSING, RUNNING, POST-PROCESSING, SUCCEEDED, FAILED, UNKNOWN (the task does not exist or its status is unknown). | "SUCCEEDED" |
| output.video_url | String | Generated video URL. Valid for 24 hours after task completion. | https://xxx/1.mp4 |
| usage.video_duration | Float | Generated video duration (seconds). | "video_duration": 10.23 |
| usage.video_ratio | String | Generated video aspect ratio type. Value: standard (output matches original by default). | "video_ratio": "standard" |
| usage.size | String | Generated video resolution (matches input). | "size": "1080*1920" |
| usage.fps | Integer | Generated video frame rate (matches input). | "fps": 25 |
| request_id | String | Request ID. | 7574ee8f-38a3-4b1e-9280-11c33ab46e51 |

Sample request

curl -X GET 'https://dashscope.aliyuncs.com/api/v1/tasks/<YOUR_TASK_ID>' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY"

Sample response

{
    "request_id": "87b9dce5-7f36-4305-a347-xxxxxx",
    "output": {
        "task_id": "3afd65eb-9604-48ea-8a91-xxxxxx",
        "task_status": "SUCCEEDED",
        "submit_time": "2025-09-11 20:15:29.887",
        "scheduled_time": "2025-09-11 20:15:36.741",
        "end_time": "2025-09-11 20:16:40.577",
        "video_url": "http://dashscope-result-sh.oss-cn-shanghai.aliyuncs.com/xxx.mp4?Expires=xxx"
    },
    "usage": {
        "video_duration": 7.2,
        "size": "1080*1920",
        "video_ratio": "standard",
        "fps": 25
    }
}

Sample error response

{
    "request_id": "7574ee8f-38a3-4b1e-9280-11c33ab46e51",
    "output": {
        "task_id": "a8532587-fa8c-4ef8-82be-0c46b17950d1",
        "task_status": "FAILED",
        "code": "xxx",
        "message": "xxxxxx"
    }
}
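Because the task is asynchronous, a client usually polls the query endpoint until the status leaves the in-progress states. The following is a minimal polling sketch under stated assumptions: the names fetch_task, wait_for_video, and TaskFailed are hypothetical, the 5-second interval and 120-poll cap are arbitrary choices rather than documented recommendations, and any status other than the four in-progress values or SUCCEEDED is treated as terminal.

```python
import json
import time
import urllib.request

TASK_URL = "https://dashscope.aliyuncs.com/api/v1/tasks/{task_id}"
IN_PROGRESS = {"PENDING", "PRE-PROCESSING", "RUNNING", "POST-PROCESSING"}


class TaskFailed(Exception):
    """Raised when the task ends in any state other than SUCCEEDED."""


def fetch_task(task_id, api_key):
    """GET the task record and return the decoded JSON body."""
    request = urllib.request.Request(
        TASK_URL.format(task_id=task_id),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_video(task_id, api_key, fetch=fetch_task, interval_s=5.0, max_polls=120):
    """Poll until the task finishes; return output.video_url on success."""
    for _ in range(max_polls):
        output = fetch(task_id, api_key)["output"]
        status = output["task_status"]
        if status == "SUCCEEDED":
            return output["video_url"]
        if status not in IN_PROGRESS:
            # FAILED carries code and message; UNKNOWN may carry neither.
            raise TaskFailed(f"{status}: {output.get('code')} {output.get('message')}")
        time.sleep(interval_s)
    raise TimeoutError(f"task {task_id} still running after {max_polls} polls")
```

The fetch parameter exists so the loop can be exercised with canned responses. Since the returned video URL expires 24 hours after the task completes, download the file promptly rather than storing the link.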

Error codes

See Error messages for general status codes.

Model-specific error codes:

| HTTP return code | Error code | Error message | Description |
|---|---|---|---|
| 400 | InvalidParameter | Field required: xxx | Missing or incorrect request parameter. |
| 400 | InvalidURL.ConnectionRefused | Connection to ${url} refused, please provide avaiable URL | Download rejected. Provide an available URL. |
| 400 | InvalidURL.Timeout | Download ${url} timeout, please check network connection. | Download timed out (timeout: 60s). |
| 400 | InvalidFile.Size | Invalid file size. The video/audio/image file size must be less than **MB. | File must be smaller than ** MB. |
| 400 | InvalidFile.Format | Invalid file format,the request file format is one of the following types: MP4, AVI, MOV, MP3, WAV, AAC, JPEG, JPG, PNG, BMP, and WEBP. | Invalid file format. Supported: Video (MP4/AVI/MOV), Audio (MP3/WAV/AAC), Image (JPG/JPEG/PNG/BMP/WebP). |
| 400 | InvalidFile.Resolution | Invalid video resolution. The height or width of video must be 640 ~ 2048. | Video side length must be 640-2048 pixels. |
| 400 | InvalidFile.FPS | Invalid video FPS. The video FPS must be 15 ~ 60. | Video frame rate must be 15-60 fps. |
| 400 | InvalidFile.Duration | Invalid file duration. The video/audio file duration must be 2s ~ 120s. | Video/audio duration must be 2-120 seconds. |
| 400 | InvalidFile.ImageSize | The size of image is beyond limit. | Image size exceeds limit. Aspect ratio must be 2 or less; longest side must be 4,096 pixels or less. |
| 400 | InvalidFile.Openerror | Invalid file, cannot open file as video/audio/image. | Cannot open file. |
| 400 | InvalidFile.Content | The input image has no human body or multi human bodies. Please upload other image with single person. | Image contains no person or multiple people. |
| 400 | InvalidFile.FaceNotMatch | There are no matched face in the video with the provided reference image. | Reference face does not match any video face. |
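The model-specific codes share prefixes (InvalidURL., InvalidFile.), which a client can use to decide what to fix before resubmitting. The grouping below is an illustrative assumption, not something the API defines, and describe_error is a hypothetical helper name.

```python
def describe_error(code):
    """Map a model-specific error code to a coarse hint about what to fix."""
    if code == "InvalidParameter":
        return "request: add or correct the parameter named in the message"
    if code.startswith("InvalidURL."):
        return "url: make sure the file link is publicly reachable and downloads within 60s"
    if code == "InvalidFile.FaceNotMatch":
        return "reference image: use a face that appears in the video"
    if code in ("InvalidFile.ImageSize", "InvalidFile.Content"):
        return "reference image: check its dimensions and that it shows exactly one person"
    if code.startswith("InvalidFile."):
        return "file: check format, size, resolution, frame rate, and duration"
    return "other: see the general error messages"
```

For example, describe_error("InvalidURL.Timeout") returns the hint that begins with "url:", and an unrecognized code falls through to the general case.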

FAQ

  1. How do I handle input video and audio with different durations?

    By default, the longer file is truncated to match the shorter one. To extend the video when the audio is longer, set parameters.video_extension to true. The video is then looped in a "reverse-play, forward-play" pattern until it matches the audio length.

  2. How does the API handle silent segments in the input audio?

    The model generates closed-mouth frames for silent audio segments.

  3. What happens when a video frame contains no face, but the corresponding audio has speech?

    The original frame is preserved and the audio continues to play. Lip-sync is applied only to frames with a detectable face.

  4. How do I select a specific person for lip-syncing in a video with multiple people?

    The API lip-syncs only one person per video. Use input.ref_image_url to specify the target face. If it is omitted, the largest face in the first frame is synced.
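The duration rule from the first question can be written as a small function, which is handy for estimating the output length before submitting. This is a sketch of the documented behavior; the function name is hypothetical, and the case where video_extension is true but the video is already longer than the audio is assumed to fall back to truncation, since the parameter is described only for audio that is longer.

```python
def expected_output_duration(video_s, audio_s, video_extension=False):
    """Estimate the generated video's duration in seconds."""
    if video_extension and audio_s > video_s:
        # The video is looped (reverse-play, forward-play) up to the audio length.
        return audio_s
    # Default: the longer input is truncated to match the shorter one.
    return min(video_s, audio_s)
```

With a 10-second video and 15 seconds of audio, the default yields 10 seconds and video_extension=True yields 15 seconds.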