This document introduces the synthesis parameters, advanced configurations, and SDK call examples for the smart image-text matching - movie highlights scene.
Script-based automatic video production and smart image-text matching video production share the same task submission API. For information about how to distinguish between these two methods through parameters, see Parameter difference description.
Note: In this interface, the region in all media asset OSS URLs must be consistent with the region in the OpenAPI endpoint.
Supported regions: China (Shanghai), China (Beijing), China (Hangzhou), China (Shenzhen), US (West), Singapore.
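The region rule in the note above can be checked before a job is submitted. The following is a minimal sketch, assuming OSS URLs follow the `[your-bucket].oss-[your-region-id].aliyuncs.com` pattern used in this document. The helper names `oss_region` and `mismatched_urls` are hypothetical, and internal endpoints (hosts ending in `-internal`) are not handled.

```python
import re
from urllib.parse import urlparse

# Matches hosts shaped like "<bucket>.oss-<region-id>.aliyuncs.com",
# the URL pattern used throughout this document (an assumption of this sketch).
_OSS_HOST = re.compile(r"^[^.]+\.oss-([a-z0-9-]+)\.aliyuncs\.com$")


def oss_region(url):
    """Return the region ID embedded in an OSS URL, or None if the host does not match."""
    host = urlparse(url).hostname or ""
    match = _OSS_HOST.match(host)
    return match.group(1) if match else None


def mismatched_urls(entries, endpoint_region):
    """List the OSS URLs whose region differs from the endpoint region.

    Entries that are not OSS URLs (for example media asset IDs) are skipped.
    """
    return [e for e in entries if oss_region(e) not in (None, endpoint_region)]
```

For example, calling `mismatched_urls(media_array, "cn-shanghai")` on a MediaArray returns only the URLs stored in another region, so an empty list means the rule is satisfied.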
In actual use, replace all parameter examples in the document such as [your-bucket], [your-region-id], [your-file-name], [your-file-path], and media asset IDs (for example, "****9d46c8b4548681030f6e****") with your actual values.
To follow this document more easily, we recommend that you first review the concepts and workflow of [Smart image-text matching video production - Movie highlights scene] in the Smart one-click video production operation guide.
Smart image-text matching - Movie highlights scene includes two video production modes. This document explains both modes in detail:
Broadcast mode
Storyboard script
Usage instructions
For the interface description of intelligently mixing and editing multiple videos, audio, and image materials to batch produce videos, see SubmitBatchMediaProducingJob - Batch smart one-click video production. For key API parameters, see InputConfig parameter description, EditingConfig parameter description, and OutputConfig parameter description below.
To obtain detailed information about batch smart one-click video production jobs, see GetBatchMediaProducingJob - Get batch smart one-click video production task information.
InputConfig parameter description
You can configure InputConfig to specify parameter configurations for basic materials such as video materials, voiceover, background music, and stickers.
Parameter | Type | Description | Example value | Required | Supported modes |
MediaArray | List<String> | Smart image-text matching mode. Supports passing a list of media asset IDs or material OSS URLs. The total video duration is limited to a maximum of two hours. | ["****b4549d46c88681030f6e****","****549d46c88b4681030f6e****"] | Yes | All |
TitleArray | List<String> | Title array, one is randomly selected for each synthesis. Maximum of 50 titles, each title not exceeding 50 characters. | ["Title 1","Title 2"] | No | All |
SubHeadingArray | List<SubHeading> | Subtitle settings | [{"Level":1,"TitleArray":["Level 1 subtitle 1","Level 1 subtitle 2"]},{"Level":3,"TitleArray":["Level 3 subtitle"]}] | No | All |
SpeechTextArray | List<String> | | ["Voiceover content 1","Voiceover content 2"] | No | |
SceneInfo | | Scene information, used for scene-related parameters. | See Parameter example: Broadcast mode, Parameter example: Storyboard script | Yes | |
StickerArray | List<Sticker> | Sticker array, one is randomly selected for each synthesis. Maximum of 50 stickers. | [{"MediaId":"****9d46c8b4548681030f6e****","X":10,"Y":100,"Width":300,"Height":300,"Opacity":0.6}] | No | All |
BackgroundMusicArray | List<String> | Background music array, one is randomly selected for each synthesis. Maximum of 50 entries, supports media asset ID or OSS URL. | ["****b4549d46c88681030f6e****","****549d46c88b4681030f6e****"] | No | All |
BackgroundImageArray | List<String> | Background image array, one is randomly selected for each synthesis. Maximum of 50 entries, supports media asset ID or OSS URL. | ["****b4549d46c88681030f6e****","****549d46c88b4681030f6e****"] | No | All |
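The limits in the table above can be checked on the client before submission. The following is a minimal sketch, not part of the API: `check_input_config` is a hypothetical helper, and it counts title length with Python's `len`, which may differ from how the service counts characters.

```python
import json

MAX_ENTRIES = 50       # limit stated for titles, stickers, background music, and images
MAX_TITLE_CHARS = 50   # per-title character limit


def check_input_config(cfg):
    """Return a list of problems found in an InputConfig dict (empty if none)."""
    problems = []
    for required in ("MediaArray", "SceneInfo"):
        if not cfg.get(required):
            problems.append(f"{required} is required")
    for key in ("TitleArray", "StickerArray", "BackgroundMusicArray", "BackgroundImageArray"):
        if len(cfg.get(key, [])) > MAX_ENTRIES:
            problems.append(f"{key} has more than {MAX_ENTRIES} entries")
    for title in cfg.get("TitleArray", []):
        if len(title) > MAX_TITLE_CHARS:
            problems.append(f"a title exceeds {MAX_TITLE_CHARS} characters")
    return problems


input_config = {
    "MediaArray": ["****b4549d46c88681030f6e****"],
    "SceneInfo": {"Scene": "MovieHighlights"},
    "TitleArray": ["Title 1", "Title 2"],
    "SpeechTextArray": ["Voiceover content 1"],
}
assert check_input_config(input_config) == []

# Serialize once validation passes; ensure_ascii=False keeps non-ASCII titles readable.
payload = json.dumps(input_config, ensure_ascii=False)
```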
SceneInfo parameter description
Parameter | Type | Description | Required | Supported modes |
Scene | String | Matching scene type. For movie highlights scene, pass the static value "MovieHighlights". | Yes | |
ShotInfo | | Set storyboard script | Yes | |
FaceInfo | Set face information | No | All |
ShotInfo parameter description
This parameter is only applicable to storyboard script mode. If you are using broadcast mode, you do not need to set this parameter.
Parameter | Type | Description | Required |
ShotScripts | List<ShotScript> | Storyboard script array | Yes |
ShotScript parameter description
This parameter is only applicable to storyboard script mode. If you are using broadcast mode, you do not need to set this parameter.
In movie highlights scene - storyboard script mode, there are two storyboard modes: "Text description mode" and "Manual parsing mode". When setting parameters, you need to choose one of these two modes.
Parameter | Type | Description | Example value | Required | Storyboard mode |
ScriptText | String | Script text for a single shot, used to describe the content of the shot. | He has been developing a new magic potion recently. | No | Text description mode |
SpeechText | String | | The old wizard Danny is tinkering with strange instruments. He has been developing a new magic potion recently. | No | All |
Duration | Float | Duration of the shot, only effective when there is no voiceover. If voiceover exists, the shot duration is automatically calculated based on the voiceover duration. | 5 | No | |
Descriptions | List<String> | Detailed descriptions for a single shot. | ["London streets under thick fog, with flowing traffic and people moving quickly"] | No | Manual parsing mode |
Characters | List<String> | Character (face) names in a single shot. Note: Character names must match the ImageInfo.Name in FaceInfo.ImageInfoList. | ["Daniel"] | No | |
Settings | List<String> | Scene descriptions for a single shot. | ["On London streets","Under a street lamp"] | No | |
Volume | Float | | 0.5 | No | All |
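The rules for a single ShotScript can be sketched as a few small checks. This is an illustration under stated assumptions: the helper names are hypothetical, and Characters and Settings are treated as manual parsing fields because that is how the storyboard script example in this document uses them.

```python
# Fields that, in this sketch, mark a shot as using manual parsing mode (assumption).
MANUAL_KEYS = ("Descriptions", "Characters", "Settings")


def shot_mode(shot):
    """Classify one ShotScript dict as 'text' or 'manual', rejecting mixed or empty shots."""
    has_text = bool(shot.get("ScriptText"))
    has_manual = any(shot.get(key) for key in MANUAL_KEYS)
    if has_text and has_manual:
        raise ValueError("use ScriptText or the manual parsing fields, not both")
    if not has_text and not has_manual:
        raise ValueError("shot has neither ScriptText nor manual parsing fields")
    return "text" if has_text else "manual"


def unknown_characters(shot, face_names):
    """Return Characters entries with no matching ImageInfo.Name in FaceInfo.ImageInfoList."""
    return [name for name in shot.get("Characters", []) if name not in face_names]


def effective_duration(shot):
    """Return Duration only when the shot has no voiceover; otherwise None,
    because the shot length then follows the voiceover duration."""
    return None if shot.get("SpeechText") else shot.get("Duration")
```

Running `unknown_characters` against the set of names in FaceInfo.ImageInfoList catches mismatched character names before the job is submitted.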
FaceInfo parameter description
Parameter | Type | Description | Required |
ImageInfoList | List<ImageInfo> | Character (face) photo list, with a list length limit of 200. | No |
ImageInfo parameter description
Parameter | Type | Description | Example value | Required |
Name | String | Character (face) name | Daniel | Yes |
ImageURL | String | Storage address of the character (face) photo. Must be a publicly accessible URL. The image should contain only one person, with the face clearly visible and not occluded or cropped. | http://[your-cdn-domain]/[your-file-path]/face1.png | Either ImageURL or ImageId is required |
ImageId | String | Image media asset ID | ****9d46c886b45481030f6e**** | Either ImageURL or ImageId is required |
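The FaceInfo constraints can be checked in the same way. A minimal sketch, with `check_face_info` as a hypothetical helper; it reads "one of the two is required" as "at least one of ImageURL and ImageId must be set", which is an assumption.

```python
MAX_FACES = 200  # list length limit stated for ImageInfoList


def check_face_info(face_info):
    """Return a list of problems found in a FaceInfo dict (empty if none)."""
    problems = []
    images = face_info.get("ImageInfoList", [])
    if len(images) > MAX_FACES:
        problems.append(f"ImageInfoList has more than {MAX_FACES} entries")
    for index, image in enumerate(images):
        if not image.get("Name"):
            problems.append(f"entry {index}: Name is required")
        if not (image.get("ImageURL") or image.get("ImageId")):
            problems.append(f"entry {index}: set ImageURL or ImageId")
    return problems
```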
Parameter example: Broadcast mode
{
"MediaArray": [
"****9d46c886b45481030f6e****",
"****c886810b4549d4630f6e****",
"http://[your-bucket].oss-[your-region-id].aliyuncs.com/test1.mp4",
"http://[your-bucket].oss-[your-region-id].aliyuncs.com/test2.png"
],
"SceneInfo": {
"Scene": "MovieHighlights", //MovieHighlights movie matching
"FaceInfo": {
"ImageInfoList": [
{
"Name": "Character A",
"ImageURL": "https://bkimg.cdn.bcebos.com/pic/3853ad1bdd9f70558718bf38?x-bce-process=image/format,f_auto/watermark,image_d2F0ZXIvYmFpa2UyNzI,g_7,xp_5,yp_5,P_20/resize,m_lfit,limit_1,h_1080"
},
{
"Name": "Character B",
"ImageURL": "https://bkimg.cdn.bcebos.com/pic/622762d0f703918ffbedc1125b3d269759eec42e?x-bce-process=image/format,f_auto/watermark,image_d2F0ZXIvYmFpa2UyNzI,g_7,xp_5,yp_5,P_20/resize,m_lfit,limit_1,h_1080"
},
{
"Name": "Character C",
"ImageId": "****b681034549d46c880f6e****"
}
]
}
},
"TitleArray": [
"Huilongguan Hema Fresh Opening",
"Hema Fresh Opening"
],
"SubHeadingArray": [
{
"Level": 1,
"TitleArray": ["Subtitle 1", "Subtitle 2"]
},
{
"Level": 3,
"TitleArray": ["Level 3 subtitle"]
}
],
"SpeechTextArray": [
"A new Hema Fresh store has opened in a nearby mall. Today is the first day of opening, so hurry up and join the excitement. This Hema store is not large, but there are many people in the mall. Snacks and beverages are relatively cheap, and people are queuing in long lines. Come and see for yourself!",
"A new Hema Fresh store has opened in a nearby mall. Today is the first day of opening, so hurry up and join the excitement"
],
"Sticker": {
"MediaId": "****b681034549d46c880f6e****",
"X": 10,
"Y": 100,
"Width": 300,
"Height": 300
},
"StickerArray": [
{
"MediaId": "****9d46c8b4548681030f6e****",
"X": 10,
"Y": 100,
"Width": 300,
"Height": 300,
"Opacity": 0.6
},
{
"MediaURL": "http://[your-bucket].oss-[your-region-id].aliyuncs.com/test3.png",
"X": 10,
"Y": 100,
"Width": 300,
"Height": 300
}
],
"BackgroundMusicArray": [
"****b4549d46c88681030f6e****",
"****549d46c88b4681030f6e****",
"http://[your-bucket].oss-[your-region-id].aliyuncs.com/test4.mp3"
],
"BackgroundImageArray": [
"****6c886b4549d481030f6e****",
"****9d46c8548b4681030f6e****",
"http://[your-bucket].oss-[your-region-id].aliyuncs.com/test1.png"
]
}
Parameter example: Storyboard script
{
"MediaArray": [
"****9d46c886b45481030f6e****",
"****c886810b4549d4630f6e****",
"http://[your-bucket].oss-[your-region-id].aliyuncs.com/test1.mp4",
"http://[your-bucket].oss-[your-region-id].aliyuncs.com/test2.png"
],
"SceneInfo": {
"Scene": "MovieHighlights", // MovieHighlights movie matching
"ShotInfo": {
"ShotScripts": [
{
// For each shot, choose either manual parsing mode or text description mode
// Text description mode, start
"ScriptText": "This is the script text for the first scene",
// Text description mode, end
"SpeechText": "This is the voiceover script for the first scene"
},
{
// For each shot, choose either manual parsing mode or text description mode
// Manual parsing mode, start
"Descriptions": ["Character C enters the room", "Camera slowly pulls back to the view outside the window"],
"Characters": ["Character C"],
"Settings": ["Modern office", "Skyscraper"],
// Manual parsing mode end
"SpeechText": "This is the voiceover script for the second scene"
},
{
// Text description mode start
"ScriptText": "This is the script text for the third scene. When there is no voiceover script, the shot supports configuring Duration to control the shot duration.",
// Text description mode, end
"Duration": 8.0, // Supported when there is no voiceover script
"Volume": 1.0 //Set the original sound volume of video material
}
]
},
"FaceInfo": {
"ImageInfoList": [
{
"Name": "Character A",
"ImageURL": "https://bkimg.cdn.bcebos.com/pic/3853ad1bdd9f70558718bf38?x-bce-process=image/format,f_auto/watermark,image_d2F0ZXIvYmFpa2UyNzI,g_7,xp_5,yp_5,P_20/resize,m_lfit,limit_1,h_1080"
},
{
"Name": "Character B",
"ImageURL": "https://bkimg.cdn.bcebos.com/pic/622762d0f703918ffbedc1125b3d269759eec42e?x-bce-process=image/format,f_auto/watermark,image_d2F0ZXIvYmFpa2UyNzI,g_7,xp_5,yp_5,P_20/resize,m_lfit,limit_1,h_1080"
},
{
"Name": "Character C",
"ImageId": "****b681034549d46c880f6e****"
}
]
}
},
"TitleArray": [
"Huilongguan Hema Fresh Opening",
"Hema Fresh Opening"
],
"StickerArray": [
{
"MediaId": "****9d46c8b4548681030f6e****",
"X": 10,
"Y": 100,
"Width": 300,
"Height": 300
},
{
"MediaURL": "http://[your-bucket].oss-[your-region-id].aliyuncs.com/test3.png",
"X": 10,
"Y": 100,
"Width": 300,
"Height": 300,
"Opacity": 0.6
}
],
"BackgroundMusicArray": [
"****b4549d46c88681030f6e****",
"****549d46c88b4681030f6e****",
"http://[your-bucket].oss-[your-region-id].aliyuncs.com/test4.mp3"
],
"BackgroundImageArray": [
"****6c886b4549d481030f6e****",
"****9d46c8548b4681030f6e****",
"http://[your-bucket].oss-[your-region-id].aliyuncs.com/test1.png"
]
}
EditingConfig parameter description
You can configure EditingConfig to specify parameters such as volume, position, and other synthesis parameters for the output video materials. If you have no special requirements, we recommend using the default configuration. This field can be left empty.
The parameter descriptions for "Broadcast mode" and "Storyboard script" in the movie highlights scene are the same.
Parameter | Type | Description | Example value | Required |
MediaConfig | JSON | Input video material related configuration. | | No |
TitleConfig | JSON | Title-related configuration. Supports configuring caption parameters. For field details, see: Banner text. | | No |
SubHeadingConfig | JSON | Multi-level subtitle related configuration. Supports setting caption parameters. | | No |
SpeechConfig | JSON | Voiceover script related configuration. | | No |
BackgroundMusicConfig | JSON | Background music related configuration. | {"Volume":0.2} | No |
BackgroundImageConfig | JSON | Background image related configuration. | {"SubType":"Blur","Radius":0.5} | No |
ProcessConfig | | Mix editing processing configuration. | | No |
FECanvas | JSON | Canvas configuration for frontend page preview. | {"Width": 1080,"Height": 1920} | No |
ProduceConfig | JSON | Standard editing synthesis configuration. For field details, see: EditingProduceConfig | {"AutoRegisterInputVodMedia":true,"OutputWebmTransparentChannel":true,"CoverConfig":{"StartTime":3.3},"AudioChannelCopy":"left","PipelineId":"***d54a97cff4108b555b01166d4b***","MaxBitrate":5000,"KeepOriginMaxBitrate":false,"KeepOriginVideoMaxFps":false} | No |
ProcessConfig parameter description
Parameter | Type | Description | Example value | Required |
AllowVfxEffect | Boolean | Whether to allow adding special effects. | true | No, default is false |
VfxEffectProbability | Float | Probability of applying effects to each video clip. Value range: 0.0 - 1.0, supports 2 decimal places. | 0.6 | No, default is 0.5 |
AllowTransition | Boolean | Whether to allow adding transition effects. | true | No, default is false |
TransitionDuration | Float | Transition duration in seconds. If the transition duration > clip duration - 1, the transition effect on that clip will not take effect. | 0.5 | No, default is 0.5 seconds |
TransitionList | List<String> | Custom transition effect list. When AllowTransition=true, a transition effect is randomly selected from the list for synthesis. For the available range of transition effects, see Transition effect library. If this parameter is null, effects will be randomly selected from the following: "linearblur", "colordistance", "crosshatch", "dreamyzoom", "doomscreentransition_up" | ["directional", "linearblur"] | No |
UseUniformTransition | Boolean | Whether to use consistent transition effects in a single output video. | true | No, default is true |
AllowDuplicateMatch | Boolean | Indicates whether matched clips can be reused. | false | No, default is false |
EnableClipDetection | Boolean | Whether to perform shot detection on materials. Only configurable for movie highlights. Supports automatic shot division and recognition of shot types (opening/ending, advertisements, black screens, etc. will not be included in the output video). | true | No, default is true |
EnableTemporalOpt | Boolean | Whether to perform temporal optimization on matching results. Only configurable for movie highlights. When the input shot information or commentary script basically maintains the same order as the source content, it is recommended to set this to true. | false | No, default is false |
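Two of the numeric rules in the table above are easy to misread, so the following sketch spells them out. The function names are hypothetical; the logic only restates the table: a transition is skipped when TransitionDuration > clip duration - 1, and VfxEffectProbability must lie in 0.0 - 1.0 with at most 2 decimal places.

```python
def transition_applies(clip_duration, transition_duration=0.5):
    """Return True if the transition takes effect on a clip (durations in seconds).

    Per the table, the transition is dropped when
    transition_duration > clip_duration - 1.
    """
    return not (transition_duration > clip_duration - 1)


def valid_vfx_probability(probability):
    """Return True if the value is within 0.0-1.0 and has at most 2 decimal places."""
    return 0.0 <= probability <= 1.0 and round(probability, 2) == probability
```

With the default TransitionDuration of 0.5 seconds, this means clips shorter than 1.5 seconds receive no transition.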
EditingConfig parameter example
All parameters in EditingConfig are optional. The following is the default configuration.
{
"MediaConfig": {
"Volume": 0 // Default video material is muted
},
"TitleConfig": {
"Alignment": "TopCenter",
"AdaptMode": "AutoWrap",
"Font": "Alibaba PuHuiTi 2.0 95 ExtraBold",
"SizeRequestType": "Nominal",
"Y": 0.1, // Y-coordinate value of the title when the output is in portrait mode
"Y": 0.05, // Y-coordinate value of the title when the output is in landscape mode
"Y": 0.08 // Y-coordinate value of the title when the output is in square mode
},
"SpeechConfig": {
"Volume": 1, // Default voiceover audio uses original volume
"SpeechRate": 0,
"Voice": null,
"Style": null,
"CustomizedVoice": null, // Voice clone voiceId. If this field is filled, Voice and Style will be ineffective.
"AsrConfig": {
"Alignment": "TopCenter",
"AdaptMode": "AutoWrap",
"Font": "Alibaba PuHuiTi 2.0 65 Medium",
"SizeRequestType": "Nominal",
"Spacing": -1,
"Y": 0.8, // Y-coordinate value of the caption when the output is in portrait mode
"Y": 0.9, // Y-coordinate value of the caption when the output is in landscape mode
"Y": 0.85 // Y-coordinate value of the caption when the output is in square mode
}
},
"SubHeadingConfig": {
"1": {
"Y": 0.3,
"FontSize": 40
},
"3": {
"Y": 0.5,
"FontSize": 30
}
},
"BackgroundMusicConfig": {
"Volume": 0.2, // Background music defaults to 20% volume,
"Style": null
},
"ProcessConfig": {
"AllowVfxEffect": false, // Whether to add special effects
"AllowTransition": false, // Whether to add transition effects
"AllowDuplicateMatch": false, // In image-text matching mode, whether matched clips can be reused
"EnableClipDetection": true, // Whether to perform shot detection
"EnableTemporalOpt": true // Whether to perform temporal optimization
}
}
TemplateConfig parameter description
TemplateConfig is a common parameter for one-click video production, used to set one-click video production templates. For detailed parameter descriptions and usage examples, see TemplateConfig parameter description.
OutputConfig parameter description
You can configure OutputConfig to specify parameters such as output address, naming rules, output width and height, number of output videos, and other synthesis parameters.
Parameter | Type | Description | Example value | Required |
MediaURL | String | Output video address, must contain the placeholder {index}. | Rule: http://[your-bucket].oss-[your-region-id].aliyuncs.com/[your-file-path]/[your-file-name]_{index}.mp4 Example: http://example.oss-[your-region-id].aliyuncs.com/example/example_{index}.mp4 | Required when GeneratePreviewOnly=true and the output video is to OSS |
StorageLocation | String | Specifies the storage location of media files output to VOD. | Rule: [your-vod-bucket].oss-[your-region-id].aliyuncs.com Example: outin-****6c886b4549d481030f6e****.oss-[your-region-id].aliyuncs.com | Required when GeneratePreviewOnly=true and the output video is to VOD |
FileName | String | Output file name, must contain the placeholder {index}. | Rule: [your-file-name]_{index}.mp4 Example: example_{index}.mp4 | Required when GeneratePreviewOnly=true and the output video is to VOD |
GeneratePreviewOnly | Boolean | | false | No, default is false |
Count | Integer | Number of output videos | 1 | No, default is 1 |
Width | Integer | Output width, in px | 1080 | Yes |
Height | Integer | Output height, in px | 1920 | Yes |
Video | JSONObject | Output video stream related configuration, such as Crf, Codec, etc. | {"Crf": 27} | No |
Parameter example
{
"MediaURL": "http://[your-bucket].oss-[your-region-id].aliyuncs.com/[your-file-path]/[your-file-name]_{index}.mp4",
"Count": 1,
"Width": 1080,
"Height": 1920,
"Video": {"Crf": 27},
"GeneratePreviewOnly":false
}
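An OutputConfig for OSS output like the one above can be assembled with a guard for the {index} placeholder. This is a minimal sketch with a hypothetical helper name; it covers only the OSS case (MediaURL) and the fields shown in the example.

```python
import json


def build_output_config(media_url, width, height, count=1):
    """Build an OutputConfig dict for OSS output, enforcing the {index} placeholder."""
    if "{index}" not in media_url:
        raise ValueError("MediaURL must contain the {index} placeholder")
    return {"MediaURL": media_url, "Count": count, "Width": width, "Height": height}


output_config = build_output_config(
    "http://[your-bucket].oss-[your-region-id].aliyuncs.com/[your-file-path]/[your-file-name]_{index}.mp4",
    width=1080,
    height=1920,
)
output_json = json.dumps(output_config)
```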
SDK call examples
Prerequisites
You have installed the IMS server SDK. For more information, see Preparations.
Code example
Using broadcast mode as an example
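The following is a minimal sketch of preparing the request parameters for broadcast mode. It does not call the SDK. It assumes that InputConfig, EditingConfig, and OutputConfig are each passed to SubmitBatchMediaProducingJob as a JSON string; verify this against the API reference. The helper name `to_request_params` is hypothetical.

```python
import json


def to_request_params(input_config, output_config, editing_config=None):
    """Serialize the config dicts into JSON-string parameters (assumed request shape)."""
    params = {
        "InputConfig": json.dumps(input_config, ensure_ascii=False),
        "OutputConfig": json.dumps(output_config, ensure_ascii=False),
    }
    # EditingConfig is optional; omit it to use the default configuration.
    if editing_config:
        params["EditingConfig"] = json.dumps(editing_config, ensure_ascii=False)
    return params


params = to_request_params(
    input_config={
        "MediaArray": ["****9d46c886b45481030f6e****"],
        "SceneInfo": {"Scene": "MovieHighlights"},
        "SpeechTextArray": ["Voiceover content 1"],
    },
    output_config={
        "MediaURL": "http://[your-bucket].oss-[your-region-id].aliyuncs.com/[your-file-path]/[your-file-name]_{index}.mp4",
        "Count": 1,
        "Width": 1080,
        "Height": 1920,
    },
)
```

The resulting strings would then be set on the request object of the IMS server SDK installed in the prerequisites.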
API call parameter details
Advanced configuration
For advanced configuration details, see Batch one-click video production mixing logic and advanced configuration.
FAQ
For common questions about script-based automatic video production, see Movie highlights FAQ:
What is the difference between [Broadcast mode] and [Storyboard script]?
How to correctly set face information?