This topic describes the production parameters, advanced configurations, and SDK examples for the Script-to-Video feature.
Both Script-to-Video and Image-Text Matching use the SubmitBatchMediaProducingJob API to submit a task. To differentiate between them based on parameters, see Parameter differences.
In this API, the region in the OSS URL of every media asset must match the region of the OpenAPI service endpoint.
Supported regions: China (Shanghai), China (Beijing), China (Hangzhou), China (Shenzhen), US (Silicon Valley), and Singapore.
In practice, replace all placeholders in the examples, such as [your-bucket], [your-region-id], [your-file-name], [your-file-path], and media asset IDs ("****9d46c8b4548681030f6e****"), with your actual values.
To better understand this document, first read the Batch video production guide to familiarize yourself with the concepts and workflow of Script-to-Video.
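The region rule above can be checked before a job is submitted. The following is a minimal sketch, not part of the IMS SDK: the helper names `oss_region` and `mismatched_assets` are hypothetical, and the URL pattern is assumed from the example URLs in this topic (`http(s)://<bucket>.oss-<region-id>.aliyuncs.com/...`).

```python
import re
from typing import List, Optional

# Assumed URL shape, taken from the example URLs in this topic:
#   http(s)://<bucket>.oss-<region-id>.aliyuncs.com/<path>
_OSS_URL = re.compile(r"^https?://[^/.]+\.oss-([a-z0-9-]+)\.aliyuncs\.com/")


def oss_region(url: str) -> Optional[str]:
    """Return the region ID embedded in an OSS URL, or None if the text is not such a URL."""
    match = _OSS_URL.match(url)
    return match.group(1) if match else None


def mismatched_assets(assets: List[str], endpoint_region: str) -> List[str]:
    """Return the OSS URLs whose region differs from the endpoint region.

    Entries that are not OSS URLs (for example, media asset IDs) are skipped.
    """
    result = []
    for asset in assets:
        region = oss_region(asset)
        if region is not None and region != endpoint_region:
            result.append(asset)
    return result
```

Running `mismatched_assets` over every MediaArray, sticker, background music, and background image entry gives a list of assets to move or re-upload before submission.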
Script-to-Video supports two production modes: Global Scripts and Segmented Scripts.
Global Scripts: Randomly combines multiple complete voiceover scripts with video assets to generate a large number of videos with a similar style.
Segmented Scripts: Breaks a voiceover script into multiple segments and matches each segment to a specific group of assets.
The mode is determined by the following parameter logic:
If SpeechTextArray is not empty, it is considered Global Scripts mode.
If SpeechTextArray is empty, and at least one MediaGroup.Duration or MediaGroup.SpeechTextArray in the MediaGroupArray is not empty, it is considered Segmented Scripts mode.
If SpeechTextArray is empty, and all MediaGroup.Duration and MediaGroup.SpeechTextArray values in the MediaGroupArray are empty, it is considered Global Scripts mode.
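The three rules above can be expressed as a small function. This is a sketch for reasoning about your own InputConfig, not the service's implementation. The function name `detect_mode` is hypothetical, and treating `None`, an empty string, and an empty list as "empty" is an assumption.

```python
def _is_set(value) -> bool:
    """Assumption: None, an empty string, and an empty list all count as 'empty'."""
    return value is not None and value != "" and value != []


def detect_mode(input_config: dict) -> str:
    """Return "Global" or "Segmented" following the parameter logic described above."""
    # Rule 1: a non-empty top-level SpeechTextArray always means Global Scripts mode.
    if _is_set(input_config.get("SpeechTextArray")):
        return "Global"
    # Rule 2: any group with Duration or SpeechTextArray set means Segmented Scripts mode.
    for group in input_config.get("MediaGroupArray", []):
        if _is_set(group.get("Duration")) or _is_set(group.get("SpeechTextArray")):
            return "Segmented"
    # Rule 3: nothing set anywhere falls back to Global Scripts mode.
    return "Global"
```

Note that a top-level SpeechTextArray takes precedence: group-level Duration or SpeechTextArray values do not switch the job to Segmented Scripts mode when the top-level array is non-empty.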
Usage notes
To submit a batch video production job that intelligently mixes multiple video, audio, and image assets, see SubmitBatchMediaProducingJob. Key API parameters are detailed in the InputConfig, EditingConfig, and OutputConfig sections below.
To get detailed information about a batch video production job, see GetBatchMediaProducingJob.
InputConfig
Configure the InputConfig to specify parameters for basic assets such as video clips, voiceovers, background music, and stickers.
Parameter | Type | Description | Example | Required | Supported modes |
MediaGroupArray | List<MediaGroup> | The source assets, organized into groups. Group name: up to 50 characters; emojis are not supported. Material list: the media asset ID or OSS URL of each material. A maximum of 40 groups is supported, each containing up to 200 materials. | | Yes | |
TitleArray | List<String> | An array of titles. One title is randomly selected for each production. Max 50 titles, each up to 50 characters long. | ["Title 1","Title 2"] | No | |
SubHeadingArray | List<SubHeading> | Multi-level subheading settings. | [{"Level":1,"TitleArray":["Level 1 subtitle 1","Level 1 subtitle 2"]},{"Level":3,"TitleArray":["Level 3 subtitle"]}] | No | |
SpeechTextArray | List<String> | | ["Voiceover content 1","Voiceover content 2"] | No | |
StickerArray | List<Sticker> | | [{"MediaId":"****9d46c8b4548681030f6e****","X":10,"Y":100,"Width":300,"Height":300,"Opacity":0.6}] | No | |
BackgroundMusicArray | List<String> | | ["****b4549d46c88681030f6e****","****549d46c88b4681030f6e****"] | No | |
BackgroundImageArray | List<String> | | ["****b4549d46c88681030f6e****","****549d46c88b4681030f6e****"] | No | |
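The limits stated in the table (40 groups, 200 materials per group, 50-character group names, 50 titles of up to 50 characters each) can be checked locally before submission. The sketch below is illustrative only. The function name `check_input_limits` is hypothetical, character counts use Python's `len`, which may differ from how the service counts characters, and the emoji restriction is not checked.

```python
from typing import List


def check_input_limits(input_config: dict) -> List[str]:
    """Return human-readable problems found in an InputConfig dict (empty list if none)."""
    problems = []
    groups = input_config.get("MediaGroupArray", [])
    if not groups:
        problems.append("MediaGroupArray is required")
    if len(groups) > 40:
        problems.append("MediaGroupArray has more than 40 groups")
    for group in groups:
        name = group.get("GroupName", "")
        if len(name) > 50:
            problems.append("GroupName longer than 50 characters: " + name[:20] + "...")
        if len(group.get("MediaArray", [])) > 200:
            problems.append("Group " + name[:20] + " has more than 200 materials")
    titles = input_config.get("TitleArray", [])
    if len(titles) > 50:
        problems.append("TitleArray has more than 50 titles")
    for title in titles:
        if len(title) > 50:
            problems.append("Title longer than 50 characters: " + title[:20] + "...")
    return problems
```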
MediaGroup
The differences in MediaGroup parameter configurations between Global Scripts mode and Segmented Scripts mode are indicated in the Supported modes column.
Parameter | Type | Description | Example | Required | Supported modes |
GroupName | String | The name of the group. Max 50 characters, no emojis. | Group1 | Yes | |
MediaArray | List<String> | | ****b4549d46c88681030f6e**** | Yes | |
SpeechTextArray | List<String> | | ["Voiceover content 1","Voiceover content 2"] | No | |
Duration | Float | The duration for the current group, in seconds. Use only when | 10 | No. Default: 5. | |
SplitMode | String | | NoSplit | No. Default: AverageSplit. | |
Volume | Float | | 0.5 | No | |
DurationAutoAdapt | Boolean | Whether to enable duration auto-adaptation for this group. If enabled and no voiceover is present, the group's duration is adjusted so that video clips play at their original speed. | true | No. Default: false. | |
Example: Global Scripts mode
{
"MediaGroupArray": [
{
"GroupName": "UseMediaId",
"MediaArray": [
"****9d46c886b45481030f6e****",
"****c886810b4549d4630f6e****"
],
"SplitMode": "NoSplit"
},
{
"GroupName": "UseOssUrl",
"MediaArray": [
"http://[your-bucket].oss-[your-region-id].aliyuncs.com/[your-file-path]/[your-file-name].mp4",
"http://[your-bucket].oss-[your-region-id].aliyuncs.com/[your-file-path]/[your-file-name].png"
]
}
],
"TitleArray": [
"Freshippo opens a new location in Huilongguan",
"A new Freshippo store opens"
],
"SubHeadingArray": [
{
"Level": 1,
"TitleArray": ["Subtitle 1", "Subtitle 2"]
},
{
"Level": 3,
"TitleArray": ["Level 3 subtitle"]
}
],
"SpeechTextArray": [
"A new Freshippo store just opened in the nearby mall. It's the grand opening today, so I rushed over to check it out. The store isn't huge, but it's packed with people. Snacks and drinks are pretty cheap, and the checkout lines are super long. Come and see for yourself!",
"A new Freshippo store just opened in the nearby mall. It's the grand opening today, so I rushed over to check it out.",
"<speak>Today, our hero, table tennis legend <phoneme alphabet=\"ipa\" ph=\"mɑː lʊŋ\">Ma Long</phoneme>, is striving for the pinnacle of glory.</speak>"
],
"StickerArray": [
{
"MediaId": "****9d46c8b4548681030f6e****",
"X": 10,
"Y": 100,
"Width": 300,
"Height": 300,
"Opacity": 0.6
},
{
"MediaURL": "http://[your-bucket].oss-[your-region-id].aliyuncs.com/[your-file-path]/[your-file-name].png",
"X": 10,
"Y": 100,
"Width": 300,
"Height": 300
}
],
"BackgroundMusicArray": [
"****b4549d46c88681030f6e****",
"****549d46c88b4681030f6e****",
"http://[your-bucket].oss-[your-region-id].aliyuncs.com/[your-file-path]/[your-file-name].mp3"
],
"BackgroundImageArray": [
"****6c886b4549d481030f6e****",
"****9d46c8548b4681030f6e****",
"http://[your-bucket].oss-[your-region-id].aliyuncs.com/[your-file-path]/[your-file-name].png"
]
}
Example: Segmented Scripts mode
{
"MediaGroupArray": [{
"GroupName": "start",
"MediaArray": ["https://[your-bucket].oss-[your-region-id].aliyuncs.com/[your-file-path]/[your-file-name].jpeg", "https://[your-bucket].oss-[your-region-id].aliyuncs.com/[your-file-path]/[your-file-name].mp4"],
"Duration": 5,
"SplitMode": "NoSplit",
"Volume": 1
},
{
"GroupName": "group1",
"MediaArray": ["https://[your-bucket].oss-[your-region-id].aliyuncs.com/[your-file-path]/[your-file-name].png", "https://[your-bucket].oss-[your-region-id].aliyuncs.com/[your-file-path]/[your-file-name].mp4"],
"SpeechTextArray": ["A new Freshippo store just opened in the nearby mall.", "It's the grand opening today.", "<speak>Today, our hero, table tennis legend <phoneme alphabet=\"ipa\" ph=\"mɑː lʊŋ\">Ma Long</phoneme>, is striving for the pinnacle of glory.</speak>"]
},
{
"GroupName": "group2",
"MediaArray": ["https://[your-bucket].oss-[your-region-id].aliyuncs.com/[your-file-path]/[your-file-name].mp4", "https://[your-bucket].oss-[your-region-id].aliyuncs.com/[your-file-path]/[your-file-name].jpeg"],
"SpeechTextArray": ["The store isn't huge, but it's packed with people. Snacks and drinks are pretty cheap, and the checkout lines are super long.", "The scene is very lively, with crowds of people and a wide variety of goods."]
},
{
"GroupName": "group3",
"MediaArray": ["https://[your-bucket].oss-[your-region-id].aliyuncs.com/[your-file-path]/[your-file-name].mp4"],
"SpeechTextArray": ["Come and see for yourself!", "Hurry and come take a look!"]
},
{
"GroupName": "end",
"MediaArray": ["https://[your-bucket].oss-[your-region-id].aliyuncs.com/[your-file-path]/[your-file-name].jpg", "https://[your-bucket].oss-[your-region-id].aliyuncs.com/[your-file-path]/[your-file-name].mp4"],
"Duration": 5
}
],
"TitleArray": [
"Freshippo opens a new location in Huilongguan",
"A new Freshippo store opens"
],
"StickerArray": [
{
"MediaId": "****9d46c8b4548681030f6e****",
"X": 10,
"Y": 100,
"Width": 300,
"Height": 300,
"Opacity": 0.6
},
{
"MediaURL": "http://[your-bucket].oss-[your-region-id].aliyuncs.com/[your-file-path]/[your-file-name].png",
"X": 10,
"Y": 100,
"Width": 300,
"Height": 300
}
],
"SubHeadingArray": [
{
"Level": 1,
"TitleArray": ["Level 1 subtitle 1", "Level 1 subtitle 2"]
},
{
"Level": 3,
"TitleArray": ["Level 3 subtitle"]
}
],
"BackgroundMusicArray": [
"****b4549d46c88681030f6e****",
"****549d46c88b4681030f6e****",
"http://[your-bucket].oss-[your-region-id].aliyuncs.com/[your-file-path]/[your-file-name].mp3"
],
"BackgroundImageArray": [
"****6c886b4549d481030f6e****",
"****9d46c8548b4681030f6e****",
"http://[your-bucket].oss-[your-region-id].aliyuncs.com/[your-file-path]/[your-file-name].png"
]
}
EditingConfig
Configure EditingConfig to specify parameters for volume, positioning, and other production settings.
Except for the following parameters, all other parameters support both Global Scripts mode and Segmented Scripts mode:
ProcessConfig.AlignmentMode takes effect only in Global Scripts mode.
SpeechConfig.SpecialWordsConfig takes effect only in Segmented Scripts mode.
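Because these two parameters only take effect in one mode each, it can help to flag them when they are set for the other mode. The sketch below is a hypothetical helper (`ineffective_settings` is not an SDK function). The `mode` argument is assumed to be the string "Global" or "Segmented", determined by your own logic.

```python
from typing import List


def ineffective_settings(editing_config: dict, mode: str) -> List[str]:
    """List mode-specific EditingConfig settings that have no effect in the given mode."""
    notes = []
    process = editing_config.get("ProcessConfig") or {}
    speech = editing_config.get("SpeechConfig") or {}
    if mode == "Segmented" and "AlignmentMode" in process:
        notes.append("ProcessConfig.AlignmentMode takes effect only in Global Scripts mode")
    if mode == "Global" and speech.get("SpecialWordsConfig"):
        notes.append("SpeechConfig.SpecialWordsConfig takes effect only in Segmented Scripts mode")
    return notes
```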
Parameter | Type | Description | Example | Required |
MediaConfig | JSON | Configuration for input video assets. | {"Volume":"1","MediaMetaDataArray":[{"Media":"****6c886b4549d481030f6e****","GroupName":"GroupA","TimeRangeList":[{"In":"0","Out":"1"},{"In":"2","Out":"3"}]}]} | No |
TitleConfig | JSON | Configuration for titles. | {"Alignment":"TopCenter","AdaptMode":"AutoWrap","Font":"Alibaba PuHuiTi 2.0 95 ExtraBold","SizeRequestType":"Nominal","Y":0.1} | No |
SubHeadingConfig | JSON | Configuration for multi-level subtitles. JSON fields: | {"1":{"Y":0.3,"FontSize":40},"3":{"Y":0.5,"FontSize":30}} | No |
SpeechConfig | JSON | Configuration for the voiceover. | | No |
BackgroundMusicConfig | JSON | Configuration for background music. | {"Volume":0.2} | No |
| JSON | Configuration for the background image. This field has no effect if a background image is already configured in InputConfig. | {"SubType":"Blur","Radius":0.5} | No |
ProcessConfig | JSON | Configuration for the mixing and editing process. | | No |
| JSON | Canvas configuration for front-end preview. | {"Width": 1080,"Height": 1920} | No |
ProduceConfig | JSON | Standard editing and production configuration. For fields, see EditingProduceConfig. | {"AutoRegisterInputVodMedia":true,"OutputWebmTransparentChannel":true,"CoverConfig":{"StartTime":3.3},"AudioChannelCopy":"left","PipelineId":"****d54a97cff4108b555b01166d4****","MaxBitrate":5000,"KeepOriginMaxBitrate":false,"KeepOriginVideoMaxFps":false} | No |
ProcessConfig
Parameter | Type | Description | Example | Required |
SingleShotDuration | Float | When editing long video assets, they are automatically segmented. This parameter sets the duration of each segmented shot, in seconds. | 5 | No. Default: 3. |
AllowVfxEffect | Boolean | Whether to allow adding special effects. | true | No. Default: false. |
VfxEffectProbability | Float | The probability that an effect will be applied to each video clip. Range: 0.0 to 1.0. Supports 2 decimal places. | 0.6 | No. Default: 0.5. |
VfxFirstClipEffectList | List<String> | | ["slightshow","starfieldshinee"] | No |
VfxNotFirstClipEffectList | List<String> | | ["zoomslight","zoom"] | No |
AllowTransition | Boolean | Whether to allow adding transition effects. | true | No. Default: false. |
TransitionDuration | Float | Duration of transitions in seconds. If | 0.5 | No. Default: 0.5. |
TransitionList | List<String> | A list of custom transitions. If | ["directional", "linearblur"] | No |
UseUniformTransition | Boolean | Whether to use the same transition throughout a single video. | true | No. Default: true. |
AllowFilter | Boolean | Whether to allow adding custom filters | false | No. Default: false. |
FilterList | List<String> | A list of custom filters. If | ["m1", "m2"] | No |
AlignmentMode | String | The alignment mode for video and voiceover. Effective only in Global Scripts mode. Valid values: | AutoSpeed | No. Default: AutoSpeed. |
ImageDuration | Float | The duration for static image assets, in seconds. | 2 | No. Default: 2. |
Parameter example
{
"MediaConfig": {
"Volume": 0 // Input video assets are muted by default
},
"TitleConfig": {
"Alignment": "TopCenter",
"AdaptMode": "AutoWrap",
"Font": "Alibaba PuHuiTi 2.0 95 ExtraBold",
"SizeRequestType": "Nominal",
"Y": 0.1 // Y-coordinate for portrait video; use 0.05 for landscape video or 0.08 for square video
},
"SubHeadingConfig": {
"1": {
"Y": 0.3,
"FontSize": 40
},
"3": {
"Y": 0.5,
"FontSize": 30
}
},
"SpeechConfig": {
"Volume": 1, // Voiceover uses original volume by default
"SpeechRate": 0,
"Voice": null,
"Style": null,
"CustomizedVoice": null, // Voice ID. If set, Voice and Style are ignored.
"AsrConfig": {
"Alignment": "TopCenter",
"AdaptMode": "AutoWrap",
"Font": "Alibaba PuHuiTi 2.0 65 Medium",
"SizeRequestType": "Nominal",
"Spacing": -1,
"Y": 0.8 // Subtitle Y-coordinate for portrait video; use 0.9 for landscape video or 0.85 for square video
},
"SpecialWordsConfig": [{
"Type": "Highlight",
"Style": {
"FontName": "KaiTi",
"FontSize": 80,
"FontColor": "20AEE9",
"OutlineColour": "2D20E9",
"Outline": 3,
"FontFace": {
"Bold": true,
"Underline": true
}
},
"WordsList": [
"ApsaraVideo",
"Intelligent Media Services",
"Batch video creation"
]
},
{
"Type": "Highlight",
"Style": {
"FontFace": {
"Italic": true
}
},
"WordsList": [
"product",
"take a look"
]
},
{
"Type": "Forbidden",
"WordsList": [
"pilipala",
"bilibala"
],
"SoundReplaceMode": "None"
}
]},
"BackgroundMusicConfig": {
"Volume": 0.2, // Background music at 20% volume by default
"Style": null
},
"ProcessConfig": {
"SingleShotDuration": 3, // Duration of a shot after splitting
"AllowVfxEffect": false, // Specifies whether to add special effects
"AllowTransition": false, // Specifies whether to add transition effects
"AlignmentMode": "AutoSpeed" // This field is supported only in Global Scripts mode
}
}
TemplateConfig
TemplateConfig contains common parameters for batch video production. For detailed parameters and examples, see TemplateConfig.
OutputConfig parameters
Configure OutputConfig to specify the output destination, naming conventions, resolution, and number of videos to produce.
The parameters are the same for both generation modes.
Parameter | Type | Description | Example | Required |
MediaURL | String | The output video URL, which must include the | Rule: http://[your-bucket].oss-[your-region-id].aliyuncs.com/[your-file-path]/[your-file-name]_{index}.mp4 Example: http://example.oss-cn-shanghai.aliyuncs.com/example/example_{index}.mp4 | Required if GeneratePreviewOnly is false and output is to OSS. |
StorageLocation | String | The storage location for media assets output to ApsaraVideo VOD. | Rule: [your-vod-bucket].oss-[your-region-id].aliyuncs.com Example: outin-****6c886b4549d481030f6e****.oss-cn-shanghai.aliyuncs.com | Required if GeneratePreviewOnly is false and output is to VOD. |
FileName | String | The output file name, which must include the | Rule: [your-file-name]_{index}.mp4 Example: example_{index}.mp4 | Required if GeneratePreviewOnly is false and output is to VOD. |
GeneratePreviewOnly | Boolean | | false | No. Default: false. |
Count | Integer | The number of videos to output. The maximum is 100. | 10 | No. Default: 1. |
MaxDuration | Float | The maximum duration for each output video, in seconds. | 20 | No. Default: 15. |
FixedDuration | Float | The fixed duration for each output video. If set, the video duration will be adjusted to match this value. | 20 | No. Default: 15. |
Width | Integer | The width of the output video in pixels. | 1080 | Yes |
Height | Integer | The height of the output video in pixels. | 1920 | Yes |
Video | JSON | Configuration for the output video stream, such as CRF and codec. | {"Crf": 27} | No |
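The requirements in this table can be checked locally with a short helper. This is a sketch under stated assumptions, not service-side validation. The function name `check_output_config` is hypothetical, and the check for an `{index}` placeholder in MediaURL and FileName is inferred from the Rule and Example columns above.

```python
from typing import List


def check_output_config(cfg: dict) -> List[str]:
    """Return problems found in an OutputConfig dict (empty list if none)."""
    problems = []
    for key in ("Width", "Height"):
        if key not in cfg:
            problems.append(f"{key} is required")
    if cfg.get("Count", 1) > 100:
        problems.append("Count exceeds the maximum of 100")
    # Output location is only needed when real videos (not just previews) are produced.
    if not cfg.get("GeneratePreviewOnly", False):
        to_oss = "MediaURL" in cfg
        to_vod = "StorageLocation" in cfg and "FileName" in cfg
        if not (to_oss or to_vod):
            problems.append("set MediaURL (OSS) or StorageLocation and FileName (VOD)")
        for key in ("MediaURL", "FileName"):
            # Assumption: the {index} placeholder shown in the examples is mandatory.
            if key in cfg and "{index}" not in cfg[key]:
                problems.append(f"{key} has no {{index}} placeholder")
    return problems
```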
Parameter example
{
"MediaURL": "http://[your-bucket].oss-[your-region-id].aliyuncs.com/[your-file-path]/[your-file-name]_{index}.mp4",
"Count": 20,
"MaxDuration": 15,
"Width": 1080,
"Height": 1920,
"Video": {"Crf": 27},
"GeneratePreviewOnly":false
}
Application
Example 1: Configure an intro and outro with Segmented Scripts mode
Use case
This example shows how to add a consistent intro and outro to your videos. By setting MediaGroup.SplitMode to NoSplit for the first and last groups, the system will play a randomly selected asset from those groups in its entirety.
Sample code
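The following is a minimal sketch of how such an InputConfig could be assembled, not the original sample. The helper name `with_intro_outro` and the group names "start" and "end" are illustrative choices. The 5-second Duration mirrors the Segmented Scripts example earlier in this topic, and all URLs and media asset IDs are placeholders.

```python
import json


def with_intro_outro(intro_assets, body_groups, outro_assets, duration=5):
    """Wrap the body groups with a first and a last group that use SplitMode NoSplit."""

    def fixed_group(name, assets):
        # NoSplit keeps the randomly selected asset whole instead of cutting it into shots.
        return {
            "GroupName": name,
            "MediaArray": list(assets),
            "Duration": duration,
            "SplitMode": "NoSplit",
        }

    return {
        "MediaGroupArray": [
            fixed_group("start", intro_assets),
            *body_groups,
            fixed_group("end", outro_assets),
        ]
    }


placeholder_url = (
    "https://[your-bucket].oss-[your-region-id].aliyuncs.com/"
    "[your-file-path]/[your-file-name].mp4"
)
input_config = with_intro_outro(
    intro_assets=[placeholder_url],
    body_groups=[
        {
            "GroupName": "group1",
            "MediaArray": ["****9d46c886b45481030f6e****"],
            "SpeechTextArray": ["A new Freshippo store just opened in the nearby mall."],
        }
    ],
    outro_assets=[placeholder_url],
)
print(json.dumps(input_config, indent=2))
```

Because the top-level SpeechTextArray is absent and the groups set Duration or SpeechTextArray, this configuration falls under Segmented Scripts mode according to the parameter logic described earlier.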
Example 2: Create a face montage video
SDK example
Prerequisites
You have installed the IMS server SDK. For more information, see Preparations.
Code example
This example uses the Global Scripts mode.
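As a hedged illustration of the request side, the sketch below builds the three configuration objects for Global Scripts mode and serializes them. It assumes that InputConfig, EditingConfig, and OutputConfig are each passed as a JSON string. The helper `to_request_params` is hypothetical, and the actual SDK client call is omitted because its class and method names are not shown in this topic.

```python
import json

input_config = {
    "MediaGroupArray": [
        {
            "GroupName": "Group1",
            "MediaArray": ["****9d46c886b45481030f6e****", "****c886810b4549d4630f6e****"],
        }
    ],
    "TitleArray": ["A new Freshippo store opens"],
    # A non-empty top-level SpeechTextArray selects Global Scripts mode.
    "SpeechTextArray": [
        "A new Freshippo store just opened in the nearby mall.",
        '<speak>Table tennis legend <phoneme alphabet="ipa" ph="mɑː lʊŋ">Ma Long</phoneme>.</speak>',
    ],
}
editing_config = {
    "MediaConfig": {"Volume": 0},
    "SpeechConfig": {"Volume": 1},
    "BackgroundMusicConfig": {"Volume": 0.2},
    "ProcessConfig": {"AlignmentMode": "AutoSpeed"},
}
output_config = {
    "MediaURL": "http://[your-bucket].oss-[your-region-id].aliyuncs.com/[your-file-path]/[your-file-name]_{index}.mp4",
    "Count": 10,
    "Width": 1080,
    "Height": 1920,
}


def to_request_params(input_config: dict, editing_config: dict, output_config: dict) -> dict:
    """Serialize each config to a JSON string (assumed parameter format)."""
    return {
        # json.dumps escapes the double quotes inside SSML attributes automatically.
        "InputConfig": json.dumps(input_config, ensure_ascii=False),
        "EditingConfig": json.dumps(editing_config, ensure_ascii=False),
        "OutputConfig": json.dumps(output_config, ensure_ascii=False),
    }


params = to_request_params(input_config, editing_config, output_config)
```

Building the configurations as dictionaries and serializing them at the end avoids hand-escaping quotes in SSML voiceover text.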
API input parameters
Advanced configurations
For advanced settings, see Editing logic and advanced configurations.
FAQ
For frequently asked questions about Script-to-Video, see FAQ.
How can I fix issues with jarring or frequent scene changes?
How can I control the pacing of scene changes and configure shot durations?
How is the display duration of an image calculated in the final video?
How can I ensure a video clip plays in its entirety in the final video?
How can I alternate between video clips with original audio and clips with voice-over narration?
References
SubmitBatchMediaProducingJob: submits a batch video production job.
GetBatchMediaProducingJob: retrieves details of a batch video production job.