This topic describes the production parameters, advanced configurations, and software development kit (SDK) call examples for script-based automatic video generation.
Script-to-Video and Smart Text and Image to Video use the same Submit Job API. To learn how to distinguish between the two using parameters, see Parameter differences.
Note: In this API, the region in the Object Storage Service (OSS) URL of all media assets must be the same as the region in the OpenAPI endpoint.
The regions that support script-based automatic video generation are China (Shanghai), China (Beijing), China (Hangzhou), China (Shenzhen), US (West), and Singapore.
In practice, replace parameters such as [your-bucket], [your-region-id], [your-file-name], [your-file-path], and media asset IDs (for example, "****9d46c8b4548681030f6e****") in the examples with your actual values.
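The region rule in the note above can be checked before a job is submitted. The following is a minimal sketch, not part of the SDK: the helper names `oss_region` and `mismatched_urls` are hypothetical, and the regular expression assumes the public OSS host pattern `[your-bucket].oss-[your-region-id].aliyuncs.com` used in this topic. Media asset IDs are skipped because they carry no region.

```python
import re
from urllib.parse import urlparse

# Assumed host pattern, taken from the URL examples in this topic:
# [your-bucket].oss-[your-region-id].aliyuncs.com
_OSS_HOST = re.compile(r"\.oss-([a-z0-9-]+)\.aliyuncs\.com$")


def oss_region(url):
    """Return the region ID embedded in an OSS URL, or None if the value is not an OSS URL."""
    host = urlparse(url).hostname or ""
    match = _OSS_HOST.search(host)
    return match.group(1) if match else None


def mismatched_urls(materials, endpoint_region):
    """Return the OSS URLs whose region differs from the region of the OpenAPI endpoint."""
    return [m for m in materials if oss_region(m) not in (None, endpoint_region)]


materials = [
    "http://example.oss-cn-shanghai.aliyuncs.com/example/example.mp4",
    "http://example.oss-cn-beijing.aliyuncs.com/example/example.png",
    "****9d46c8b4548681030f6e****",  # media asset ID, ignored by the check
]
print(mismatched_urls(materials, "cn-shanghai"))
```

Here `endpoint_region` is passed in explicitly rather than parsed from the endpoint, so the sketch makes no assumption about the endpoint host name.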
To better understand this topic, we recommend that you first learn about Script-to-Video in Smart Video Creation.
Script-based automatic video generation has two processing modes: global narration mode and grouped narration mode.
Global narration mode: Randomly combines multiple complete narration scripts with script nodes to achieve batch video mixing and editing.
Grouped narration mode: Splits a complete narration script into multiple paragraphs and pairs them with different script nodes to achieve better results.
The mode is determined by the following parameter rules:
If SpeechTextArray is not empty, the mode is global narration mode.
If SpeechTextArray is empty and MediaGroupArray contains at least one MediaGroup.Duration or MediaGroup.SpeechTextArray that is not empty, the mode is grouped narration mode.
If SpeechTextArray is empty and all MediaGroup.Duration and MediaGroup.SpeechTextArray in MediaGroupArray are empty, the mode is global narration mode.
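The three rules above can be expressed as a short function. This is a sketch of the documented decision logic only, not the service implementation. The function name `narration_mode` is hypothetical, and it assumes that "empty" means a missing key, `None`, or an empty list.

```python
def narration_mode(input_config):
    """Return "global" or "grouped" according to the parameter rules above."""
    # Rule 1: a non-empty top-level SpeechTextArray selects global narration mode.
    if input_config.get("SpeechTextArray"):
        return "global"
    # Rule 2: any group with a Duration or a non-empty SpeechTextArray
    # selects grouped narration mode.
    for group in input_config.get("MediaGroupArray") or []:
        if group.get("Duration") is not None or group.get("SpeechTextArray"):
            return "grouped"
    # Rule 3: nothing is set at either level, so the mode falls back to global.
    return "global"


print(narration_mode({"SpeechTextArray": ["Narration content 1"]}))
print(narration_mode({"MediaGroupArray": [{"GroupName": "start", "Duration": 5}]}))
```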
Usage notes
To intelligently mix multiple video, audio, and image materials and produce videos in batches with a single click, see the API reference for SubmitBatchMediaProducingJob - Batch intelligent one-click video production. For details about key API parameters, see InputConfig parameter details, EditingConfig parameter details, and OutputConfig parameter details.
For more information about batch smart one-click media production jobs, see GetBatchMediaProducingJob - Get information about batch smart one-click media production jobs.
InputConfig parameters
You can configure InputConfig to specify parameters for basic materials such as video assets, narration, background music, and stickers.
Parameter | Type | Description | Example | Required | Supported modes |
MediaGroupArray | List<MediaGroup> | Scripted materials for automatic video generation. You can set group names and material lists. Group name: Up to 50 characters. Emojis are not supported. Material list: Media asset ID or OSS URL of the material. A maximum of 40 groups. Each group can contain a maximum of 200 materials. | For more information, see Global narration mode - Parameter example and Grouped narration mode - Parameter example | Yes |
|
TitleArray | List<String> | An array of titles. A random title is selected for each video production. A maximum of 50 titles. Each title can be up to 50 characters long. | ["Title 1","Title 2"] | No |
|
SubHeadingArray | List<SubHeading> | Subtitle settings. | [{"Level":1,"TitleArray":["Level-1 subtitle 1","Level-1 subtitle 2"]},{"Level":3,"TitleArray":["Level-3 subtitle"]}] | No |
|
SpeechTextArray | List<String> |
| ["Narration content 1","Narration content 2"] | No |
|
StickerArray | List<Sticker> |
| [{"MediaId":"****9d46c8b4548681030f6e****","X":10,"Y":100,"Width":300,"Height":300,"Opacity":0.6}] | No |
|
BackgroundMusicArray | List<String> |
| ["****b4549d46c88681030f6e****","****549d46c88b4681030f6e****"] | No |
|
BackgroundImageArray | List<String> |
| ["****b4549d46c88681030f6e****","****549d46c88b4681030f6e****"] | No |
|
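The size limits listed in the table above can be checked on the client before a request is sent. The following is a minimal sketch with a hypothetical helper name, `check_input_limits`. It covers only the limits stated in this topic (40 groups, 200 materials per group, 50-character group names, 50 titles of up to 50 characters) and does not check for emojis. It assumes that one Python string element counts as one character, which may differ from how the service counts.

```python
def check_input_limits(input_config):
    """Return a list of human-readable violations of the documented InputConfig limits."""
    errors = []
    groups = input_config.get("MediaGroupArray") or []
    if not groups:
        errors.append("MediaGroupArray is required")
    if len(groups) > 40:
        errors.append(f"too many groups: {len(groups)} > 40")
    for group in groups:
        name = group.get("GroupName", "")
        if not name:
            errors.append("GroupName is required")
        elif len(name) > 50:
            errors.append(f"group name longer than 50 characters: {name[:20]}...")
        if len(group.get("MediaArray") or []) > 200:
            errors.append(f"group {name!r} has more than 200 materials")
    titles = input_config.get("TitleArray") or []
    if len(titles) > 50:
        errors.append(f"too many titles: {len(titles)} > 50")
    for title in titles:
        if len(title) > 50:
            errors.append(f"title longer than 50 characters: {title[:20]}...")
    return errors


config = {
    "MediaGroupArray": [{"GroupName": "Group1", "MediaArray": ["****b4549d46c88681030f6e****"]}],
    "TitleArray": ["Title 1", "Title 2"],
}
print(check_input_limits(config))
```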
MediaGroup parameters
The differences in MediaGroup parameter configurations between global narration mode and grouped narration mode are indicated in the "Supported modes" column of the table.
Parameter | Type | Description | Example | Required | Supported modes |
GroupName | String | Group name. Up to 50 characters. Emojis are not supported. | Group1 | Yes |
|
MediaArray | List<String> |
| ****b4549d46c88681030f6e**** | Yes |
|
SpeechTextArray | List<String> |
| ["Narration content 1","Narration content 2"] | No |
|
Duration | Float | The duration of the current group in seconds. This parameter is valid only when SpeechTextArray is empty. | 10 | No. Default: 5. |
|
SplitMode | String |
| NoSplit | No. Default: AverageSplit. |
|
Volume | Float |
| 0.5 | No |
|
DurationAutoAdapt | Boolean | Specifies whether to enable automatic duration adaptation for the group. If enabled and there is no narration, the duration of the group is automatically adjusted to ensure video clips play at their original speed. | true | No. Default: false. |
|
Global narration mode - Parameter example
{
"MediaGroupArray": [
{
"GroupName": "UseMediaId",
"MediaArray": [
"****9d46c886b45481030f6e****",
"****c886810b4549d4630f6e****"
],
"SplitMode": "NoSplit"
},
{
"GroupName": "UseOssUrl",
"MediaArray": [
"http://[your-bucket].oss-[your-region-id].aliyuncs.com/[your-file-path]/[your-file-name].mp4",
"http://[your-bucket].oss-[your-region-id].aliyuncs.com/[your-file-path]/[your-file-name].png"
]
}
],
"TitleArray": [
"Hema Fresh in Huilongguan is now open",
"Hema Fresh is now open"
],
"SubHeadingArray": [
{
"Level": 1,
"TitleArray": ["Subtitle 1", "Subtitle 2"]
},
{
"Level": 3,
"TitleArray": ["Level-3 subtitle"]
}
],
"SpeechTextArray": [
"A new Hema Fresh store just opened in a nearby mall. Today is the grand opening, and I rushed over to join the fun. The store isn't very large, but the mall is crowded. Snacks and drinks are quite cheap, and the checkout lines are very long. Come and check it out!",
"A new Hema Fresh store just opened in a nearby mall. Today is the grand opening, so I came to join the excitement.",
"<speak>The battle <phoneme alphabet=\"py\" ph=\"zheng4 hao3\">is fierce</phoneme>. Today, our protagonist, table tennis legend Ma Long, is charging towards the pinnacle of glory. In the quarterfinals against the formidable Shunsuke Togami, Ma Long showed no fear, giving his all in every rally. His precise shots and calm judgment gave him the upper hand in this match. In the end, Ma Long successfully defeated his opponent and advanced to the semifinals.</speak>"
],
"StickerArray": [
{
"MediaId": "****9d46c8b4548681030f6e****",
"X": 10,
"Y": 100,
"Width": 300,
"Height": 300,
"Opacity": 0.6
},
{
"MediaURL": "http://[your-bucket].oss-[your-region-id].aliyuncs.com/[your-file-path]/[your-file-name].png",
"X": 10,
"Y": 100,
"Width": 300,
"Height": 300
}
],
"BackgroundMusicArray": [
"****b4549d46c88681030f6e****",
"****549d46c88b4681030f6e****",
"http://[your-bucket].oss-[your-region-id].aliyuncs.com/[your-file-path]/[your-file-name].mp3"
],
"BackgroundImageArray": [
"****6c886b4549d481030f6e****",
"****9d46c8548b4681030f6e****",
"http://[your-bucket].oss-[your-region-id].aliyuncs.com/[your-file-path]/[your-file-name].png"
]
}
Grouped narration mode - Parameter example
{
"MediaGroupArray": [{
"GroupName": "start",
"MediaArray": ["https://[your-bucket].oss-[your-region-id].aliyuncs.com/[your-file-path]/[your-file-name].jpeg", "https://[your-bucket].oss-[your-region-id].aliyuncs.com/[your-file-path]/[your-file-name].mp4"],
"Duration": 5,
"SplitMode": "NoSplit",
"Volume": 1
},
{
"GroupName": "group1",
"MediaArray": ["https://[your-bucket].oss-[your-region-id].aliyuncs.com/[your-file-path]/[your-file-name].png", "https://[your-bucket].oss-[your-region-id].aliyuncs.com/[your-file-path]/[your-file-name].mp4"],
"SpeechTextArray": ["A new Hema Fresh store just opened in a nearby mall. Today is the grand opening.", "Today is the grand opening of this Hema Fresh store.", "<speak>The battle <phoneme alphabet=\"py\" ph=\"zheng4 hao3\">is fierce</phoneme>. Today, our protagonist, table tennis legend Ma Long, is charging towards the pinnacle of glory. In the quarterfinals against the formidable Shunsuke Togami, Ma Long showed no fear, giving his all in every rally. His precise shots and calm judgment gave him the upper hand in this match. In the end, Ma Long successfully defeated his opponent and advanced to the semifinals.</speak>"]
},
{
"GroupName": "group2",
"MediaArray": ["https://[your-bucket].oss-[your-region-id].aliyuncs.com/0-test-batch-editing-materials/normal%20video.mp4", "https://[your-bucket].oss-[your-region-id].aliyuncs.com/[your-file-path]/[your-file-name].jpeg"],
"SpeechTextArray": ["The store isn't very large, but the mall is crowded. Snacks and drinks are quite cheap, and the checkout lines are very long.", "The scene is very lively, with crowds of people and a wide variety of goods."]
},
{
"GroupName": "group3",
"MediaArray": ["https://[your-bucket].oss-[your-region-id].aliyuncs.com/0-test-batch-editing-materials/young_sunset_walk.mp4"],
"SpeechTextArray": ["Come and take a look!", "Hurry and come take a look!"]
},
{
"GroupName": "end",
"MediaArray": ["https://[your-bucket].oss-[your-region-id].aliyuncs.com/[your-file-path]/[your-file-name].jpg", "https://[your-bucket].oss-[your-region-id].aliyuncs.com/[your-file-path]/[your-file-name].mp4"],
"Duration": 5
}
],
"TitleArray": [
"Hema Fresh in Huilongguan is now open",
"Hema Fresh is now open"
],
"StickerArray": [
{
"MediaId": "****9d46c8b4548681030f6e****",
"X": 10,
"Y": 100,
"Width": 300,
"Height": 300,
"Opacity": 0.6
},
{
"MediaURL": "http://[your-bucket].oss-[your-region-id].aliyuncs.com/[your-file-path]/[your-file-name].png",
"X": 10,
"Y": 100,
"Width": 300,
"Height": 300
}
],
"SubHeadingArray": [
{
"Level": 1,
"TitleArray": ["Level-1 subtitle 1", "Level-1 subtitle 2"]
},
{
"Level": 3,
"TitleArray": ["Level-3 subtitle"]
}
],
"BackgroundMusicArray": [
"****b4549d46c88681030f6e****",
"****549d46c88b4681030f6e****",
"http://[your-bucket].oss-[your-region-id].aliyuncs.com/[your-file-path]/[your-file-name].mp3"
],
"BackgroundImageArray": [
"****6c886b4549d481030f6e****",
"****9d46c8548b4681030f6e****",
"http://[your-bucket].oss-[your-region-id].aliyuncs.com/[your-file-path]/[your-file-name].png"
]
}
EditingConfig parameters
You can configure EditingConfig to specify the volume, position, and other composition parameters for the clips. For parameter examples, see EditingConfig parameter examples.
Except for the following parameters, all other parameters support both global narration mode and grouped narration mode:
ProcessConfig.AlignmentMode takes effect only in global narration mode.
SpeechConfig.SpecialWordsConfig takes effect only in grouped narration mode.
Parameter | Type | Description | Example | Required |
MediaConfig | JSON | Configurations for input video materials. | {"Volume":"1","MediaMetaDataArray":[{"Media":"****6c886b4549d481030f6e****","GroupName":"GroupA","TimeRangeList":[{"In":"0","Out":"1"},{"In":"2","Out":"3"}]}]} | No |
TitleConfig | JSON | Configurations for titles. You can configure caption parameters. | {"Alignment":"TopCenter","AdaptMode":"AutoWrap","Font":"Alibaba PuHuiTi 2.0 95 ExtraBold","SizeRequestType":"Nominal","Y":0.1} | No |
SubHeadingConfig | JSON | Configurations for multi-level subtitles. You can configure caption parameters. JSON field description:
| {"1":{"Y":0.3,"FontSize":40},"3":{"Y":0.5,"FontSize":30}} | No |
SpeechConfig | JSON | Configurations for narration scripts. | For more information, see EditingConfig parameter example | No |
BackgroundMusicConfig | JSON | Configurations for background music. | {"Volume":0.2} | No |
BackgroundImageConfig | JSON | Configurations for background images. This field does not take effect if a background image is already configured in InputConfig. | {"SubType":"Blur","Radius":0.5} | No |
ProcessConfig | JSON | Configurations for mixing and editing processing. | For more information, see EditingConfig parameter example | No |
FECanvas | JSON | Canvas configuration for frontend page preview. | {"Width": 1080,"Height": 1920} | No |
ProduceConfig | JSON | Configuration for standard video editing and production. For more information about the fields, see EditingProduceConfig. | {"AutoRegisterInputVodMedia":true,"OutputWebmTransparentChannel":true,"CoverConfig":{"StartTime":3.3},"AudioChannelCopy":"left","PipelineId":"****d54a97cff4108b555b01166d4****","MaxBitrate":5000,"KeepOriginMaxBitrate":false,"KeepOriginVideoMaxFps":false} | No |
ProcessConfig parameters
Parameter | Type | Description | Example | Required |
SingleShotDuration | Float | When a long video material is edited, it is automatically split. This parameter specifies the duration of a single shot after splitting, in seconds. | 5 | No. Default: 3. |
AllowVfxEffect | Boolean | Specifies whether to add special effects. | true | No. Default: false. |
VfxEffectProbability | Float | The probability of applying a special effect to each video clip. Value range: 0.0 to 1.0. Up to two decimal places are supported. | 0.6 | No. Default: 0.5. |
VfxFirstClipEffectList | List<String> |
| ["slightshow","starfieldshinee"] | No |
VfxNotFirstClipEffectList | List<String> |
| ["zoomslight","zoom"] | No |
AllowTransition | Boolean | Specifies whether to add transition effects. | true | No. Default: false. |
TransitionDuration | Float | The duration of the transition in seconds. If the transition duration is greater than (clip duration - 1), the transition effect for that clip will not be applied. | 0.5 | No. Default: 0.5 seconds. |
TransitionList | List<String> | A list of custom transition effects. When AllowTransition is set to true, a random transition effect from this list is selected for composition. For more information about the available transition effects, see the Transition Effect Library. If this parameter is empty, a random effect is selected from the following transition effects: "linearblur", "colordistance", "crosshatch", "dreamyzoom", or "doomscreentransition_up". | ["directional", "linearblur"] | No |
UseUniformTransition | Boolean | Specifies whether to use the same transition effect throughout a single produced video. | true | No. Default: true. |
AllowFilter | Boolean | Specifies whether to add custom filters. | false | No. Default: false. |
FilterList | List<String> | A list of custom filter effects. When AllowFilter is set to true, a random filter from this list is selected for composition. For more information about the available filter effects, see Filter Effect Examples. If this parameter is empty, no filter effect is added. | ["m1", "m2"] | No |
AlignmentMode | String | The alignment mode for the video and narration script. This parameter takes effect only in global narration mode. Valid values:
| AutoSpeed | No. Default: AutoSpeed. |
ImageDuration | Float | The duration of image materials in seconds. | 2 | No. Default: 2. |
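Two rules in the table above are numeric and easy to get wrong: the condition under which a transition is skipped, and the accepted range of VfxEffectProbability. The following sketch restates them as code. The function names are hypothetical, and the two-decimal check is only one possible reading of "up to two decimal places".

```python
def transition_applies(clip_duration, transition_duration=0.5):
    """A transition is skipped when its duration is greater than (clip duration - 1)."""
    return transition_duration <= clip_duration - 1


def valid_vfx_probability(value):
    """VfxEffectProbability must lie in 0.0..1.0 with at most two decimal places."""
    return 0.0 <= value <= 1.0 and round(value, 2) == value


# With the default TransitionDuration of 0.5 seconds, clips shorter than
# 1.5 seconds do not get a transition.
for duration in (1.2, 1.5, 3.0):
    print(duration, transition_applies(duration))
```

For example, with the default SingleShotDuration of 3 seconds, a TransitionDuration above 2 seconds would never be applied to a split shot.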
EditingConfig parameter example
{
"MediaConfig": {
"Volume": 0 // Mute the video materials by default
},
"TitleConfig": {
"Alignment": "TopCenter",
"AdaptMode": "AutoWrap",
"Font": "Alibaba PuHuiTi 2.0 95 ExtraBold",
"SizeRequestType": "Nominal",
"Y": 0.1, // Default Y-coordinate of the title when the output video is in portrait mode
"Y": 0.05, // Default Y-coordinate of the title when the output video is in landscape mode
"Y": 0.08 // Default Y-coordinate of the title when the output video is in square mode
},
"SubHeadingConfig": {
"1": {
"Y": 0.3,
"FontSize": 40
},
"3": {
"Y": 0.5,
"FontSize": 30
}
},
"SpeechConfig": {
"Volume": 1, // Use the original volume for the narration audio by default
"SpeechRate": 0,
"Voice": null,
"Style": null,
"CustomizedVoice": null, // The voice ID for voice cloning. If this field is specified, Voice and Style become invalid.
"AsrConfig": {
"Alignment": "TopCenter",
"AdaptMode": "AutoWrap",
"Font": "Alibaba PuHuiTi 2.0 65 Medium",
"SizeRequestType": "Nominal",
"Spacing": -1,
"Y": 0.8, // Default Y-coordinate of the captions when the output video is in portrait mode
"Y": 0.9, // Default Y-coordinate of the captions when the output video is in landscape mode
"Y": 0.85 // Default Y-coordinate of the captions when the output video is in square mode
},
"SpecialWordsConfig": [{
"Type": "Highlight",
"Style": {
"FontName": "KaiTi",
"FontSize": 80,
"FontColor": "20AEE9",
"OutlineColour": "2D20E9",
"Outline": 3,
"FontFace": {
"Bold": true,
"Underline": true
}
},
"WordsList": [
"ApsaraVideo",
"Intelligent Media Services",
"Smart video creation"
]
},
{
"Type": "Highlight",
"Style": {
"FontFace": {
"Italic": true
}
},
"WordsList": [
"product",
"take a look"
]
},
{
"Type": "Forbidden",
"WordsList": [
"pilipala",
"bilibala"
],
"SoundReplaceMode": "None"
}
]},
"BackgroundMusicConfig": {
"Volume": 0.2, // Use 20% of the original volume for the background music by default
"Style": null
},
"ProcessConfig": {
"SingleShotDuration": 3, // Duration of a shot after splitting
"AllowVfxEffect": false, // Specifies whether to add special effects
"AllowTransition": false, // Specifies whether to add transition effects
"AlignmentMode": "AutoSpeed" // This field is supported only in global narration mode
}
}
TemplateConfig parameters
TemplateConfig is a common parameter used to set the template for the One-Click Video Creation feature. For detailed parameter descriptions and usage examples, see TemplateConfig parameters.
OutputConfig parameters
You can configure OutputConfig to specify production parameters such as the output address, naming rules, width and height, and the number of videos to produce.
The OutputConfig parameter configurations are the same for both global narration mode and grouped narration mode.
Parameter | Type | Description | Example | Required |
MediaURL | String | The output video address. It must contain the {index} placeholder. | Rule: http://[your-bucket].oss-[your-region-id].aliyuncs.com/[your-file-path]/[your-file-name]_{index}.mp4 Example: http://example.oss-cn-shanghai.aliyuncs.com/example/example_{index}.mp4 | Required when GeneratePreviewOnly is false and the output video is stored in OSS. |
StorageLocation | String | The storage address for the media asset file to be output to ApsaraVideo VOD (VOD). | Rule: [your-vod-bucket].oss-[your-region-id].aliyuncs.com Example: outin-****6c886b4549d481030f6e****.oss-cn-shanghai.aliyuncs.com | Required when GeneratePreviewOnly is false and the output video is stored in VOD. |
FileName | String | The name of the output file. It must contain the {index} placeholder. | Rule: [your-file-name]_{index}.mp4 Example: example_{index}.mp4 | Required when GeneratePreviewOnly is false and the output video is stored in VOD. |
GeneratePreviewOnly | Boolean |
| false | No. Default: false. |
Count | Integer | The number of videos to output. The maximum is 100. | 10 | No. Default: 1. |
MaxDuration | Float | The maximum duration of a single output video, in seconds. | 20 | No. Default: 15. |
FixedDuration | Float | The fixed duration of a single output video, in seconds. If a fixed duration is set, the video duration will align with this parameter. | 20 | No. Default: 15. |
Width | Integer | The width of the output video in pixels. | 1080 | Yes |
Height | Integer | The height of the output video in pixels. | 1920 | Yes |
Video | JSON | Configurations for the output video stream, such as Crf and Codec. | {"Crf": 27} | No |
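The conditional requirements in the table above can be checked on the client. The following is a minimal sketch with a hypothetical helper name, `check_output_config`. It encodes only what the table states: Width and Height are required, Count is at most 100, MediaURL and FileName must contain the {index} placeholder, and an OSS or VOD destination is needed unless GeneratePreviewOnly is true.

```python
def check_output_config(cfg):
    """Return a list of violations of the documented OutputConfig rules."""
    errors = []
    for key in ("Width", "Height"):
        if key not in cfg:
            errors.append(f"{key} is required")
    if cfg.get("Count", 1) > 100:
        errors.append("Count must not exceed 100")
    if not cfg.get("GeneratePreviewOnly", False):
        to_oss = "MediaURL" in cfg
        to_vod = "StorageLocation" in cfg and "FileName" in cfg
        if not (to_oss or to_vod):
            errors.append("set MediaURL (OSS) or StorageLocation and FileName (VOD)")
    for key in ("MediaURL", "FileName"):
        if key in cfg and "{index}" not in cfg[key]:
            errors.append(f"{key} must contain the {{index}} placeholder")
    return errors


output_config = {
    "MediaURL": "http://example.oss-cn-shanghai.aliyuncs.com/example/example_{index}.mp4",
    "Count": 20,
    "MaxDuration": 15,
    "Width": 1080,
    "Height": 1920,
    "Video": {"Crf": 27},
    "GeneratePreviewOnly": False,
}
print(check_output_config(output_config))
```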
Parameter example
{
"MediaURL": "http://[your-bucket].oss-[your-region-id].aliyuncs.com/[your-file-path]/[your-file-name]_{index}.mp4",
"Count": 20,
"MaxDuration": 15,
"Width": 1080,
"Height": 1920,
"Video": {"Crf": 27},
"GeneratePreviewOnly":false
}
Application examples
Example 1: Configure an opening and ending in grouped narration mode
Scenarios
This example applies to the scenario where you want to add a consistent intro and outro with a unified voiceover to a video. You can set the MediaGroup.SplitMode of the intro and outro groups to NoSplit. In this case, the system does not split the media clips in the intro and outro groups. Instead, it plays a randomly selected media clip from each group in its entirety to add a fixed intro and outro.
Example parameters
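As an illustration of this scenario, the following sketch builds an InputConfig in which the opening and ending groups use NoSplit and the middle group carries the narration. It is derived from the grouped narration mode parameter example earlier in this topic and is not an official sample. The helper names `fixed_group` and `narrated_group` are hypothetical, and the URLs are placeholders that must be replaced.

```python
import json

# Placeholder prefix; replace the bracketed parts with your actual values.
OSS_PREFIX = "https://[your-bucket].oss-[your-region-id].aliyuncs.com/[your-file-path]"


def fixed_group(name, media, duration=5):
    """Opening or ending group: NoSplit keeps the selected material unsplit."""
    return {
        "GroupName": name,
        "MediaArray": media,
        "Duration": duration,
        "SplitMode": "NoSplit",
    }


def narrated_group(name, media, speech_texts):
    """Middle group: materials paired with candidate narration paragraphs."""
    return {"GroupName": name, "MediaArray": media, "SpeechTextArray": speech_texts}


input_config = {
    "MediaGroupArray": [
        fixed_group("start", [f"{OSS_PREFIX}/[your-file-name].mp4"]),
        narrated_group(
            "group1",
            [f"{OSS_PREFIX}/[your-file-name].mp4", f"{OSS_PREFIX}/[your-file-name].png"],
            ["A new Hema Fresh store just opened in a nearby mall. Today is the grand opening."],
        ),
        fixed_group("end", [f"{OSS_PREFIX}/[your-file-name].mp4"]),
    ]
}
print(json.dumps(input_config, indent=2))
```

The top-level SpeechTextArray is left out on purpose: per the mode rules in this topic, setting it would switch the job to global narration mode.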
Example 2: Create a face collection video using script-based automatic video generation
If you are interested in the face collection scenario, see Best practices for creating face collection videos.
SDK call example
Prerequisites
You have installed the IMS server-side SDK. For more information, see Preparations.
Code example
The following example uses the global narration mode.
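The sketch below shows only how the request parameters for a global narration job might be assembled. It does not call the SDK. It assumes that SubmitBatchMediaProducingJob accepts InputConfig, EditingConfig, and OutputConfig as JSON-encoded strings, so confirm this against the API reference before use. The helper name `build_request_params` is hypothetical.

```python
import json


def build_request_params(input_config, editing_config, output_config):
    """Serialize the three configuration objects into JSON strings.

    Assumption: the API expects each configuration as a JSON string parameter.
    """
    return {
        "InputConfig": json.dumps(input_config, ensure_ascii=False),
        "EditingConfig": json.dumps(editing_config, ensure_ascii=False),
        "OutputConfig": json.dumps(output_config, ensure_ascii=False),
    }


params = build_request_params(
    input_config={
        "MediaGroupArray": [
            {"GroupName": "UseMediaId", "MediaArray": ["****9d46c886b45481030f6e****"]}
        ],
        # A non-empty top-level SpeechTextArray selects global narration mode.
        "SpeechTextArray": ["Narration content 1", "Narration content 2"],
        "TitleArray": ["Title 1", "Title 2"],
    },
    editing_config={
        "MediaConfig": {"Volume": 0},
        "ProcessConfig": {"AlignmentMode": "AutoSpeed"},
    },
    output_config={
        "MediaURL": "http://[your-bucket].oss-[your-region-id].aliyuncs.com/[your-file-path]/[your-file-name]_{index}.mp4",
        "Count": 20,
        "Width": 1080,
        "Height": 1920,
    },
)
for name, value in params.items():
    print(name, value)
```

The resulting strings would then be set on the request object of the IMS server-side SDK for your language.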
Details of API request parameters
Advanced configurations
For more information, see Batch one-click video remixing logic and advanced configuration.
FAQ
For frequently asked questions about Script-to-Video, see Script-to-Video FAQ.
How can I fix scene transitions in the final video that are too abrupt or too frequent?
How can I fix scene transitions that are too fast or slow and configure scene duration?
How can I calculate the display duration of image assets in the final video?
How can I ensure video clips play completely in the final video?
How can I alternate playback between original video segments and narration?
References
For more information about Script-to-Video, see SubmitBatchMediaProducingJob - Batch Intelligent One-Click Video Creation.
To retrieve a Script-to-Video job, see GetBatchMediaProducingJob - Retrieve batch Script-to-Video job information.
To create a face collection video using Script-to-Video, see Face Collection Video Creation Tutorial.
For advanced configurations, see Batch one-click montage logic and advanced configuration.