
Intelligent Media Services:SubmitAvatarVideoJob

Last Updated: Dec 23, 2025

Submits a task to render a video of an avatar speaking the content of the specified text or a human voice audio file.

Operation description

  • The input supports only text or a human voice audio file in MP3 or WAV format.
  • The output supports MP4 and WebM formats. For the MP4 format, the task produces two videos: one with the avatar on a green screen background and a separate alpha mask video. This is ideal for post-production. For the WebM format, the task produces a single video with a transparent alpha channel, suitable for direct web front-end display. Rendering WebM is slower due to encoding complexity.
  • The final output includes sentence-level timestamps, which are useful for subsequent video editing.

Debugging

You can call this operation directly in OpenAPI Explorer, which calculates the signature for you. After a successful call, OpenAPI Explorer can automatically generate SDK code samples.

Authorization information

The following table shows the authorization information corresponding to the API. The authorization information can be used in the Action policy element to grant a RAM user or RAM role the permissions to call this API operation. Description:

  • Operation: the value that you can use in the Action element to specify the operation on a resource.
  • Access level: the access level of each operation. The levels are read, write, and list.
  • Resource type: the type of the resource on which you can authorize the RAM user or the RAM role to perform the operation. Take note of the following items:
    • Mandatory resource types are marked with an asterisk (*) prefix.
    • If the permissions cannot be granted at the resource level, All Resources is used in the Resource type column of the operation.
  • Condition Key: the condition key that is defined by the cloud service.
  • Associated operation: other operations that the RAM user or the RAM role must have permissions to perform to complete the operation. To complete the operation, the RAM user or the RAM role must have the permissions to perform the associated operations.
Operation | Access level | Resource type | Condition key | Associated operation
ice:SubmitAvatarVideoJob | create | *All Resources (*) | none | none
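The table above corresponds to a RAM policy that allows this one operation on all resources. The following Python sketch builds such a policy document with the standard json module. The function name build_submit_avatar_policy is hypothetical, and the policy grants nothing else (for example, it does not include the permission to query task results).

```python
import json


def build_submit_avatar_policy() -> str:
    """Return a RAM policy (JSON string) allowing ice:SubmitAvatarVideoJob.

    Hypothetical helper: it mirrors the authorization table, where the
    operation can only be authorized on all resources ("*").
    """
    policy = {
        "Version": "1",
        "Statement": [
            {
                "Effect": "Allow",
                # Value from the Operation column of the table.
                "Action": ["ice:SubmitAvatarVideoJob"],
                # Resource-level authorization is not supported.
                "Resource": ["*"],
            }
        ],
    }
    return json.dumps(policy, indent=2)


if __name__ == "__main__":
    print(build_submit_avatar_policy())
```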

Request parameters

Parameter | Type | Required | Description | Example
InputConfig | string | No | The input configurations of the avatar video rendering task. You can specify text, the Object Storage Service (OSS) URL of an audio file, or the ID of a media asset. The audio file must be in MP3 or WAV format. Note: The text must be at least five characters long. | {"Text": "To be, or not to be, that is the question."}
EditingConfig | string | No | The avatar configurations, including the avatar ID, voice, and speech rate. | {"AvatarId":"yunqiao"}
OutputConfig | string | No | The output configurations, including the destination URL of the rendered video. | {"MediaURL":"https://your-bucket.oss-cn-shanghai.aliyuncs.com/xxx.mp4"}
Title | string | No | The task name. Maximum length: 128 bytes. | test
Description | string | No | The task description. Maximum length: 128 bytes. | test
UserData | string | No | A user-defined JSON string for passing custom business information, such as environment details or task metadata. | {"user":"data","env":"prod"}
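Because InputConfig, EditingConfig, OutputConfig, and UserData are string parameters, their JSON objects must be serialized before the request is sent. The following minimal Python sketch assembles the request parameters and enforces the 128-byte limit on Title and Description. The function name build_request_params is hypothetical, it assumes the byte limit applies to the UTF-8 encoding, and actually sending the request (for example, through an Alibaba Cloud SDK) is not shown.

```python
import json
from typing import Optional

MAX_TEXT_BYTES = 128  # Documented maximum length of Title and Description.


def build_request_params(input_config: dict, editing_config: dict,
                         output_config: dict, title: str = "",
                         description: str = "",
                         user_data: Optional[dict] = None) -> dict:
    """Assemble SubmitAvatarVideoJob parameters (hypothetical helper)."""
    # Assumption: the byte limit is measured on the UTF-8 encoding.
    for name, value in (("Title", title), ("Description", description)):
        if len(value.encode("utf-8")) > MAX_TEXT_BYTES:
            raise ValueError(f"{name} is longer than {MAX_TEXT_BYTES} bytes")

    # The *Config parameters are strings, so serialize each dict to JSON.
    params = {
        "InputConfig": json.dumps(input_config, ensure_ascii=False),
        "EditingConfig": json.dumps(editing_config, ensure_ascii=False),
        "OutputConfig": json.dumps(output_config, ensure_ascii=False),
    }
    # Optional parameters are included only when they have a value.
    if title:
        params["Title"] = title
    if description:
        params["Description"] = description
    if user_data is not None:
        params["UserData"] = json.dumps(user_data, ensure_ascii=False)
    return params


params = build_request_params(
    {"Text": "To be, or not to be, that is the question."},
    {"AvatarId": "yunqiao"},
    {"MediaURL": "https://your-bucket.oss-cn-shanghai.aliyuncs.com/xxx.mp4"},
    title="test",
    user_data={"user": "data", "env": "prod"},
)
print(params)
```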

Examples of InputConfig

Set InputConfig to the OSS URL of an audio file:

{
  "InputFile": "https://your-bucket.oss-cn-shanghai.aliyuncs.com/xxx.mp3"
}

Set InputConfig to text:

{
  "Text": "To be, or not to be, that is the question."
}

Set InputConfig to the ID of a media asset:

{
  "MediaId": "4aef0c80cc0071edbf92f6e7c44b6302"
}
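InputConfig takes one of the three forms above. The following Python sketch is a hypothetical helper that serializes one input source and applies the documented checks locally: text must be at least five characters long, and audio must be MP3 or WAV. It assumes that exactly one source is specified per task and judges the audio format by the file extension only; the service still performs its own validation.

```python
import json
from typing import Optional


def build_input_config(text: Optional[str] = None,
                       input_file: Optional[str] = None,
                       media_id: Optional[str] = None) -> str:
    """Return InputConfig as a JSON string (hypothetical helper)."""
    provided = {key: value for key, value in
                (("Text", text), ("InputFile", input_file), ("MediaId", media_id))
                if value is not None}
    # Assumption: one input source per task, as in the examples above.
    if len(provided) != 1:
        raise ValueError("Specify exactly one of text, input_file, or media_id")
    if text is not None and len(text) < 5:
        raise ValueError("Text must be at least five characters long")
    if input_file is not None:
        # Drop any query string (e.g. a signed URL) before checking the extension.
        path = input_file.split("?", 1)[0].lower()
        if not path.endswith((".mp3", ".wav")):
            raise ValueError("The audio file must be in MP3 or WAV format")
    return json.dumps(provided, ensure_ascii=False)


print(build_input_config(input_file="https://your-bucket.oss-cn-shanghai.aliyuncs.com/xxx.mp3"))
```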

Example of EditingConfig

AvatarId: required. The ID of the avatar. For valid values, see Official avatar examples.
Voice: optional. The voice of the avatar. This parameter takes effect only when InputConfig is set to text. For valid values, see Intelligent voice samples.
CustomizedVoice: the custom voice. Specify this parameter to use a cloned human voice as the avatar's voice.
LoopMotion: specifies whether the first and last frames of the video are identical. If you set this parameter to true, movements and expressions transition smoothly when the video is played in a loop, without jumps between frames. Valid values: true and false. Default value: false.
SpeechRate: the speech rate. This parameter takes effect only when InputConfig is set to text. Valid values: -500 to 500. Default value: 0.
PitchRate: the pitch. This parameter takes effect only when InputConfig is set to text. Valid values: -500 to 500. Default value: 0.
Volume: the volume. This parameter takes effect only when InputConfig is set to text. Valid values: 0 to 100. Default value: 50.
BackgroundUrl: optional. The URL of the background image. If you do not specify this parameter, a black background is used. Only JPG and PNG images are supported. The image must have a 1080p resolution and match the orientation (portrait or landscape) of the avatar.

{
  "AvatarId": "xinxin-marketing_standing",
  "BackgroundUrl": "https://your-bucket.oss-cn-shanghai.aliyuncs.com/xxx.jpg",
  "Voice": "zhichu",
  "LoopMotion": true,
  "SpeechRate": 100,
  "PitchRate": 10,
  "Volume": 10
}
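The following Python sketch builds EditingConfig and checks the documented value ranges before the task is submitted. The helper name build_editing_config and its input_is_text flag are hypothetical. Because Voice, SpeechRate, PitchRate, and Volume take effect only when InputConfig is text, the sketch omits them with a warning for audio or media asset input. It checks the background image by file extension only, not by resolution or orientation.

```python
import json
import warnings
from typing import Optional

# Documented inclusive ranges.
RANGES = {"SpeechRate": (-500, 500), "PitchRate": (-500, 500), "Volume": (0, 100)}
# Options that take effect only when InputConfig is text.
TEXT_ONLY = {"Voice", "SpeechRate", "PitchRate", "Volume"}
ALLOWED = TEXT_ONLY | {"CustomizedVoice"}


def build_editing_config(avatar_id: str, input_is_text: bool,
                         background_url: Optional[str] = None,
                         loop_motion: bool = False, **options) -> str:
    """Return EditingConfig as a JSON string (hypothetical helper)."""
    config = {"AvatarId": avatar_id, "LoopMotion": loop_motion}
    if background_url is not None:
        # Only the extension is checked; resolution and orientation are not.
        path = background_url.split("?", 1)[0].lower()
        if not path.endswith((".jpg", ".jpeg", ".png")):
            raise ValueError("BackgroundUrl must point to a JPG or PNG image")
        config["BackgroundUrl"] = background_url
    for key, value in options.items():
        if key not in ALLOWED:
            raise TypeError(f"Unsupported EditingConfig option: {key}")
        if key in TEXT_ONLY and not input_is_text:
            warnings.warn(f"{key} takes effect only with text input; omitted")
            continue
        if key in RANGES:
            low, high = RANGES[key]
            if not low <= value <= high:
                raise ValueError(f"{key} must be between {low} and {high}")
        config[key] = value
    return json.dumps(config, ensure_ascii=False)


# Reproduces the example above for a text input.
print(build_editing_config(
    "xinxin-marketing_standing", input_is_text=True,
    background_url="https://your-bucket.oss-cn-shanghai.aliyuncs.com/xxx.jpg",
    loop_motion=True, Voice="zhichu", SpeechRate=100, PitchRate=10, Volume=10,
))
```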

Example of OutputConfig

{
  "MediaURL": "https://your-bucket.oss-cn-shanghai.aliyuncs.com/xxx.mp4"
}

Response parameters

Parameter | Type | Description | Example
(root) | object | Schema of Response |
RequestId | string | The ID of the request. | ******11-DB8D-4A9A-875B-275798******
JobId | string | The task ID. | ****20b48fb04483915d4f2cd8ac****
MediaId | string | The media asset ID of the output file. | ******70dcc471edaf00e6f6f4******
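The JobId in the response identifies the task in later queries, and MediaId identifies the output media asset. A minimal Python sketch for reading these fields from a raw JSON response body, using a hypothetical SubmitResult dataclass, might look like this:

```python
import json
from dataclasses import dataclass


@dataclass
class SubmitResult:
    """Fields returned by SubmitAvatarVideoJob (hypothetical container)."""
    request_id: str
    job_id: str    # Identifies the task when querying it with GetSmartHandleJob.
    media_id: str  # Media asset ID of the output file.


def parse_submit_response(body: str) -> SubmitResult:
    data = json.loads(body)
    # strip() defensively removes any surrounding whitespace from the IDs.
    return SubmitResult(
        request_id=data["RequestId"].strip(),
        job_id=data["JobId"].strip(),
        media_id=data["MediaId"].strip(),
    )


result = parse_submit_response(
    '{"RequestId": "******11-DB8D-4A9A-875B-275798******",'
    ' "JobId": "****20b48fb04483915d4f2cd8ac****",'
    ' "MediaId": "******70dcc471edaf00e6f6f4******"}'
)
print(result.job_id)
```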

To query the state and results of the task, call the GetSmartHandleJob operation with the returned JobId. Sample success response:

{
	"RequestId": "2014D1A8-4143-164F-94B4-32B8F39B706D",
	"JobId": "d9367da8c7184ec7a3f24de530ac5b9a",
	"State": "Finished",
	"SmartJobInfo": {
		"Title": "default_title_2023-03-28T13:37:47Z",
		"EditingConfig": "null",
		"JobType": "AvatarVideo",
		"CreateTime": "2023-03-28T13:37:47Z",
		"ModifiedTime": "2023-03-28T13:37:47Z",
		"UserId": 1833202230108227,
		"outputConfig": {
			"mediaUrl": "https://oushu-test-shanghai.oss-cn-shanghai.aliyuncs.com/avatar/222.mp4"
		}
	},
	"JobResult": {
		"MediaId": "4aef0c80cc0071edbf92f6e7c44b6302",
		"AiResult": "{\"outputVideoUrl\":\"https://oushu-test-shanghai.oss-cn-shanghai.aliyuncs.com/avatar/222.mp4\",\"subtitleClips\":\"[{\\\"from\\\":0.0,\\\"to\\\":4.692,\\\"content\\\":\\\"Do you not see the Yellow River flowing down from the sky and rushing into the sea, never to return? \\\"},{\\\"from\\\":4.692,\\\"to\\\":9.061,\\\"content\\\":\\\"Do you not see, before the mirror's gleaming light, one's black hair turns to snow white from morning to night? \\\"}]\"}"
	}
}
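Rendering is asynchronous, so a client typically polls GetSmartHandleJob until the task finishes. The following Python sketch assumes you supply get_job, a hypothetical callable that calls GetSmartHandleJob for a JobId (for example, through an SDK) and returns the response as a dict. Only the Finished state appears in the sample above, so the sketch treats every other State value as still in progress and gives up at a timeout; adapt it once you know which failure states your tasks can report.

```python
import time
from typing import Callable


def wait_for_avatar_job(job_id: str,
                        get_job: Callable[[str], dict],
                        interval_s: float = 5.0,
                        timeout_s: float = 600.0) -> dict:
    """Poll until State is "Finished" and return that response.

    get_job is a hypothetical, caller-supplied wrapper around
    GetSmartHandleJob that returns the parsed response as a dict.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        response = get_job(job_id)
        if response.get("State") == "Finished":
            return response
        # Any other state is treated as "not done yet" (assumption).
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"Job {job_id} did not finish; last State: {response.get('State')}")
        time.sleep(interval_s)
```

A shorter interval returns results sooner but issues more query requests; the defaults here are arbitrary.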

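AiResult is a JSON string that contains the following fields:
outputVideoUrl: the URL of the output video. In this example, the output video is in MP4 format.
subtitleClips: the sentence-level subtitle timestamps, returned as a JSON-encoded string. Each clip contains from (start time), to (end time), and content (the text of the sentence).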

{
	"outputVideoUrl": "https://oushu-test-shanghai.oss-cn-shanghai.aliyuncs.com/avatar/222.mp4",
	"subtitleClips": "[{\"from\":0.0,\"to\":4.692,\"content\":\"Do you not see the Yellow River flowing down from the sky and rushing into the sea, never to return? \"},{\"from\":4.692,\"to\":9.061,\"content\":\"Do you not see, before the mirror's gleaming light, one's black hair turns to snow white from morning to night? \"}]"
}
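AiResult is encoded twice: JobResult.AiResult is a JSON string, and its subtitleClips field is itself a JSON string. The following Python sketch decodes both levels with json.loads. The function name parse_ai_result is hypothetical, and the sample input reuses the values from the response above.

```python
import json
from typing import List, Tuple


def parse_ai_result(ai_result: str) -> Tuple[str, List[dict]]:
    """Return (outputVideoUrl, subtitle clips) from JobResult.AiResult."""
    outer = json.loads(ai_result)                         # first decode: AiResult
    clips = json.loads(outer.get("subtitleClips", "[]"))  # second decode: subtitleClips
    return outer["outputVideoUrl"], clips


# Rebuild the sample AiResult string from the response above.
sample_ai_result = json.dumps({
    "outputVideoUrl": "https://oushu-test-shanghai.oss-cn-shanghai.aliyuncs.com/avatar/222.mp4",
    "subtitleClips": json.dumps([
        {"from": 0.0, "to": 4.692,
         "content": "Do you not see the Yellow River flowing down from the sky and rushing into the sea, never to return? "},
        {"from": 4.692, "to": 9.061,
         "content": "Do you not see, before the mirror's gleaming light, one's black hair turns to snow white from morning to night? "},
    ]),
})

video_url, clips = parse_ai_result(sample_ai_result)
for clip in clips:
    # Each clip gives the start (from), end (to), and sentence text.
    print(f'{clip["from"]:>6.3f} -> {clip["to"]:>6.3f}  {clip["content"].strip()}')
```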

Examples

Sample success responses

JSON format

{
  "RequestId": "******11-DB8D-4A9A-875B-275798******",
  "JobId": "****20b48fb04483915d4f2cd8ac****",
  "MediaId": "******70dcc471edaf00e6f6f4******"
}

Error codes

For a list of error codes, see Service error codes.

Change history

Change time | Summary of changes | Operation
No change history.