Audio-Driven Lip Sync for Realistic Video - Alibaba Cloud Model Studio

VideoRetalk takes an input video of a character and an audio file, and generates a new video in which the character's lip movements are synchronized with the audio.

Important

This document applies only to the China (Beijing) region. To use the model, you must use an API key from the China (Beijing) region.

Model overview

Input example: a character video and a voice audio file. Output example: the resulting lip-synced video.

Model: videoretalk
Unit price: $0.011469 per second of generated video (pay-as-you-go)
RPS limit for task submission:
Number of concurrent tasks: one task runs at a time; other tasks are queued
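Because billing is pay-as-you-go by generated video duration, a rough cost estimate is just the unit price listed above ($0.011469 per second) multiplied by the length of the output. The sketch below assumes billing is exactly price times seconds with no minimum charge or rounding rule; the helper name `estimate_cost_usd` is hypothetical, and actual invoices may round differently.

```python
from decimal import Decimal

# Unit price listed for videoretalk: USD per second of generated video.
PRICE_PER_SECOND_USD = Decimal("0.011469")


def estimate_cost_usd(generated_seconds: float) -> Decimal:
    """Hypothetical helper: estimate the charge for one generated video.

    Assumes cost = unit price x generated duration, with no minimum charge
    and no billing-side rounding.
    """
    if generated_seconds < 0:
        raise ValueError("duration must be non-negative")
    # Decimal avoids binary floating-point error in currency arithmetic.
    return PRICE_PER_SECOND_USD * Decimal(str(generated_seconds))


print(estimate_cost_usd(60))  # 0.688140 (one minute of generated video)
```

Round the result only when displaying it, so that summing many task estimates does not accumulate rounding error.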

To increase the RPS limit, email modelstudio@service.aliyun.com with your Alibaba Cloud account ID, the model name, and the required RPS.
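If you submit many tasks in a batch, it can help to throttle submissions on the client so you stay under the RPS limit instead of relying on rejected requests. The sketch below is a hypothetical sliding-window limiter: `max_rps` is whatever limit applies to your account (it is a parameter, not a value taken from this document), and the clock and sleep functions are injectable so the logic can be tested without waiting.

```python
import time
from collections import deque
from typing import Callable, Deque


class SubmissionThrottle:
    """Hypothetical client-side limiter: at most `max_rps` submissions
    in any rolling one-second window."""

    def __init__(self, max_rps: int,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        if max_rps < 1:
            raise ValueError("max_rps must be >= 1")
        self.max_rps = max_rps
        self._clock = clock
        self._sleep = sleep
        self._sent: Deque[float] = deque()  # timestamps of recent submissions

    def acquire(self) -> None:
        """Block until one more submission fits in the one-second window."""
        while True:
            now = self._clock()
            # Forget submissions that are at least one second old.
            while self._sent and now - self._sent[0] >= 1.0:
                self._sent.popleft()
            if len(self._sent) < self.max_rps:
                self._sent.append(now)
                return
            # Sleep until the oldest submission leaves the window.
            self._sleep(1.0 - (now - self._sent[0]))
```

Call `throttle.acquire()` immediately before each task-submission request. Note that this only limits the submission rate: because one task runs at a time, submitted tasks still wait in the queue.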

VideoRetalk can be used only through API calls, which are billed pay-as-you-go; it cannot be tested in the Model Studio console.
To generate a lip-synced video, call VideoRetalk with a character video in which the face is clear and front-facing, together with an audio file. For more information, see VideoRetalk video generation.
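A minimal sketch of an API call using only the Python standard library, submitting a task and then polling it until it finishes. Everything specific to the service here is an assumption based on the general DashScope asynchronous-task pattern, not something this document states: the two endpoint URLs, the `X-DashScope-Async` header, the `input.video_url` / `input.audio_url` field names, the `output.task_id` / `output.task_status` response fields, and the `DASHSCOPE_API_KEY` environment variable. Verify each one against the VideoRetalk video generation reference before relying on it.

```python
import json
import os
import time
import urllib.request

# ASSUMPTION: endpoints and field names follow the generic DashScope async-task
# pattern; confirm them in the VideoRetalk API reference.
SUBMIT_URL = "https://dashscope.aliyuncs.com/api/v1/services/aigc/image2video/video-synthesis/"
TASK_URL = "https://dashscope.aliyuncs.com/api/v1/tasks/{task_id}"


def build_submit_request(video_url: str, audio_url: str,
                         api_key: str) -> urllib.request.Request:
    """Build (but do not send) the task-submission request."""
    body = {
        "model": "videoretalk",
        "input": {"video_url": video_url, "audio_url": audio_url},  # assumed names
    }
    return urllib.request.Request(
        SUBMIT_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # China (Beijing) API key
            "Content-Type": "application/json",
            "X-DashScope-Async": "enable",  # assumed: request an async task
        },
        method="POST",
    )


def submit_and_wait(video_url: str, audio_url: str,
                    poll_seconds: float = 15.0, timeout: float = 1800.0) -> dict:
    """Submit one task, then poll until its status leaves PENDING/RUNNING."""
    api_key = os.environ["DASHSCOPE_API_KEY"]  # assumed variable name
    with urllib.request.urlopen(build_submit_request(video_url, audio_url, api_key)) as resp:
        task_id = json.load(resp)["output"]["task_id"]
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        req = urllib.request.Request(TASK_URL.format(task_id=task_id),
                                     headers={"Authorization": f"Bearer {api_key}"})
        with urllib.request.urlopen(req) as resp:
            output = json.load(resp)["output"]
        # Queued tasks may stay PENDING while another task is running.
        if output.get("task_status") not in ("PENDING", "RUNNING"):
            return output  # finished, failed, or otherwise terminal
        time.sleep(poll_seconds)
    raise TimeoutError(f"task {task_id} did not finish within {timeout} s")
```

Polling with a generous timeout is a deliberate choice: because only one task runs at a time and others are queued, a task can wait a long time before it starts.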