This topic describes the parameters for producing videos in intelligent image-text matching mode for common scenarios. This topic also provides advanced configurations and examples of SDK calls.
Before you read this topic, we recommend that you read Use the intelligent and quick video production feature to learn the terms and procedure of producing videos in intelligent image-text matching mode for common scenarios.
Intelligent image-text matching mode for common scenarios supports two video production modes. This topic describes the parameters in the following video production modes:
Global broadcast mode
Storyboard script mode
The following regions are supported: China (Shanghai), China (Beijing), China (Hangzhou), and China (Shenzhen).
Usage notes
For information about how to produce videos by combining multiple video, audio, and image materials in an intelligent and quick manner, see SubmitBatchMediaProducingJob. For information about key parameters for the SubmitBatchMediaProducingJob operation, see the "InputConfig parameters", "EditingConfig parameters", and "OutputConfig parameters" sections of this topic.
For information about how to query the details of an intelligent and quick batch video production job, see GetBatchMediaProducingJob.
InputConfig parameters
You can configure InputConfig parameters to specify basic materials, such as video clips, voice-over scripts, background music, and stickers.
Parameter | Type | Description | Required | Supported mode |
MediaArray | List<String> | Specify editing materials by uploading media assets. You can specify the IDs or Object Storage Service (OSS) URLs of the materials that you want to use. The total length of the video materials can be up to 2 hours. | You must configure at least one of the MediaArray and MediaSearchInput parameters. | All modes |
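MediaSearchInput | MediaSearchInput | Intelligently search for matching materials by specifying a search library and theme description texts. For more information, see the "MediaSearchInput parameters" section of this topic. | You must configure at least one of the MediaArray and MediaSearchInput parameters. | All modes |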
TitleArray | List<String> | An array of titles. A title is randomly selected each time the system produces a video. You can specify at most 50 titles. Each title can be up to 50 characters in length. | No | All modes |
SpeechTextArray | List<String> | An array of voice-over scripts. A voice-over script is randomly selected each time the system produces a video. You can specify at most 50 voice-over scripts. Each voice-over script can be up to 1,000 characters in length. | No | Global broadcast mode |
SceneInfo | SceneInfo | The scenario-related settings. For more information, see the "SceneInfo parameters" section of this topic. | Yes | Storyboard script mode |
StickerArray | List<Sticker> | An array of stickers. A sticker is randomly selected each time the system produces a video. You can specify at most 50 stickers. | No | All modes |
BackgroundMusicArray | List<String> | An array of background music materials. A background music material is randomly selected each time the system produces a video. You can specify at most 50 background music materials by specifying their IDs or OSS URLs. | No | All modes |
BackgroundImageArray | List<String> | An array of background images. A background image material is randomly selected each time the system produces a video. You can specify at most 50 background images by specifying their IDs or OSS URLs. | No | All modes |
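The limits in the preceding table can be checked on the client before a job is submitted. The following Python snippet is a minimal sketch, not part of any SDK: the function name check_input_config is hypothetical, and it assumes that one Python character counts as one character for the documented length limits.

```python
MAX_ITEMS = 50           # documented cap for each optional array
MAX_TITLE_CHARS = 50     # documented cap per title
MAX_SPEECH_CHARS = 1000  # documented cap per voice-over script


def check_input_config(cfg):
    """Return a list of problems found in an InputConfig dict (hypothetical helper)."""
    problems = []
    # At least one material source is required.
    if not cfg.get("MediaArray") and not cfg.get("MediaSearchInput"):
        problems.append("Specify MediaArray or MediaSearchInput.")
    for key in ("TitleArray", "SpeechTextArray", "StickerArray",
                "BackgroundMusicArray", "BackgroundImageArray"):
        if len(cfg.get(key, [])) > MAX_ITEMS:
            problems.append(f"{key} has more than {MAX_ITEMS} items.")
    # Assumption: len() matches how the service counts characters.
    for title in cfg.get("TitleArray", []):
        if len(title) > MAX_TITLE_CHARS:
            problems.append("A title exceeds 50 characters.")
    for text in cfg.get("SpeechTextArray", []):
        if len(text) > MAX_SPEECH_CHARS:
            problems.append("A voice-over script exceeds 1,000 characters.")
    return problems
```

An empty list means that none of the checked limits is violated. The service may enforce additional rules that this sketch does not cover.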
MediaSearchInput parameters
Parameter | Type | Description | Required |
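LibSearchCondition | LibSearchCondition | The search conditions of the search library. For more information, see the "LibSearchCondition parameters" section of this topic. | Required |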
LibSearchCondition parameters
Parameter | Type | Description | Required |
SearchLibs | List<String> | A list of search libraries, such as ims-default-search-lib. | Required |
SearchText | String | The theme description text, which describes the theme of the matching materials. The text can be up to 20 characters in length. Examples: "Alibaba Cloud Assistant is learning live commerce" and "Ocean, coral reef, seal, dolphin, and marine environment". | Required |
Sticker parameters
Parameter | Type | Description | Required |
MediaId | String | The ID of an image, such as a sticker, a logo, or a watermark. | You must specify at least one of the parameters. If you specify both the MediaId and MediaURL parameters, the MediaId parameter takes precedence. |
MediaURL | String | The URL of the image. You must specify an OSS URL. | |
X | Float | For more information, see the description of the X parameter in the "VideoTrackClip" section of the Timeline configurations topic. | No |
Y | Float | For more information, see the description of the Y parameter in the "VideoTrackClip" section of the Timeline configurations topic. | No |
Width | Float | For more information, see the description of the Width parameter in the "VideoTrackClip" section of the Timeline configurations topic. | No |
Height | Float | For more information, see the description of the Height parameter in the "VideoTrackClip" section of the Timeline configurations topic. | No |
DynamicFrames | Integer | The number of frames of an animated image. | No. This parameter is required only when an animated sticker is specified. |
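The precedence rule for MediaId and MediaURL can be illustrated with a short Python sketch. The function name sticker_source is hypothetical, and the snippet only mirrors the rule in the preceding table on the client side. It does not call the service.

```python
def sticker_source(sticker):
    """Return the material reference that takes effect for a Sticker dict.

    Hypothetical helper: MediaId takes precedence over MediaURL, and at
    least one of the two must be present.
    """
    media_id = sticker.get("MediaId")
    media_url = sticker.get("MediaURL")
    if media_id:
        return media_id
    if media_url:
        return media_url
    raise ValueError("A sticker needs MediaId or MediaURL.")
```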
SceneInfo parameters
This parameter is valid only in storyboard script mode. You do not need to specify this parameter in global broadcast mode.
Parameter | Type | Description | Required |
Scene | String | The type of the matching scenario. For common scenarios, set the value to General. | Yes |
ShotInfo | ShotInfo | The storyboard script. For more information, see the "ShotInfo parameters" section of this topic. | Yes |
ShotInfo parameters
This parameter is valid only in storyboard script mode. You do not need to specify this parameter in global broadcast mode.
Parameter | Type | Description | Required |
ShotScripts | List<ShotScript> | An array of storyboard scripts. | Yes |
ShotScript parameters
This parameter is valid only in storyboard script mode. You do not need to specify this parameter in global broadcast mode.
Parameter | Type | Description | Required |
ScriptText | String | The script that describes a storyboard. Example: The old wizard Danny is working on some strange instruments, trying to develop a new magic potion. | No |
SpeechText | String | The voice-over script for a storyboard. The voice-over script can be up to 100 characters in length. | No |
Duration | Float | The length of a storyboard. This field takes effect only if no voice-over script is specified for the storyboard. If a voice-over script is specified, the length of the storyboard is the same as the length of the voice-over. | No |
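The nesting of SceneInfo, ShotInfo, and ShotScripts can be assembled programmatically. The following Python snippet is a minimal sketch under stated assumptions: the function name build_scene_info is hypothetical, the 100-character check uses Python's len(), and Duration is written only when no voice-over script is given because it would not take effect otherwise.

```python
def build_scene_info(shots):
    """Build a SceneInfo dict from (script_text, speech_text, duration) tuples.

    Hypothetical helper. Pass None for a field that you want to omit.
    """
    shot_scripts = []
    for script_text, speech_text, duration in shots:
        shot = {}
        if script_text:
            shot["ScriptText"] = script_text
        if speech_text:
            if len(speech_text) > 100:
                raise ValueError("SpeechText exceeds 100 characters.")
            shot["SpeechText"] = speech_text
        elif duration is not None:
            # Duration takes effect only without a voice-over script.
            shot["Duration"] = float(duration)
        shot_scripts.append(shot)
    return {"Scene": "General", "ShotInfo": {"ShotScripts": shot_scripts}}
```

For example, build_scene_info([("A wizard mixes a potion", None, 5)]) produces one storyboard that carries only ScriptText and Duration.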
Sample code in global broadcast mode
{
// Choose between the MediaArray and MediaSearchInput parameters.
"MediaArray": [
"****9d46c886b45481030f6e****",
"****c886810b4549d4630f6e****",
"http://test-bucket.oss-cn-shanghai.aliyuncs.com/test1.mp4",
"http://test-bucket.oss-cn-shanghai.aliyuncs.com/test2.png"
],
// Choose between the MediaArray and MediaSearchInput parameters.
"MediaSearchInput": {
"LibSearchCondition": {
"SearchLibs": [
"ims-default-search-lib",
"test-20"
],
"SearchText": "Alibaba Cloud Assistant is learning live commerce"
}
},
"TitleArray": [
"Freshippo Opens a Store in Huilongguan",
"Freshippo Opens a Store"
],
"SpeechTextArray": [
"Freshippo opens a store near the shopping mall. Today is the first day of opening. Come and check it out. The store is not large but the prices of snacks and drinks are low, which attract many customers waiting in lines.",
"Freshippo opens a store near the shopping mall. Today is the first day of opening. Come and check it out."
],
"Sticker": {
"MediaId": "****b681034549d46c880f6e****",
"X": 10,
"Y": 100,
"Width": 300,
"Height": 300
},
"StickerArray": [
{
"MediaId": "****9d46c8b4548681030f6e****",
"X": 10,
"Y": 100,
"Width": 300,
"Height": 300
},
{
"MediaURL": "http://test-bucket.oss-cn-shanghai.aliyuncs.com/test3.png",
"X": 10,
"Y": 100,
"Width": 300,
"Height": 300
}
],
"BackgroundMusicArray": [
"****b4549d46c88681030f6e****",
"****549d46c88b4681030f6e****",
"http://test-bucket.oss-cn-shanghai.aliyuncs.com/test4.mp3"
],
"BackgroundImageArray": [
"****6c886b4549d481030f6e****",
"****9d46c8548b4681030f6e****",
"http://test-bucket.oss-cn-shanghai.aliyuncs.com/test1.png"
]
}
Sample code in storyboard script mode
{
// Choose between the MediaArray and MediaSearchInput parameters.
"MediaArray": ["MediaId1", "MediaId2"],
// Choose between the MediaArray and MediaSearchInput parameters.
"MediaSearchInput": {
"LibSearchCondition": {
"SearchLibs": [
"ims-default-search-lib",
"test-20"
],
"SearchText": "Alibaba Cloud Assistant is learning live commerce"
}
},
"SceneInfo": {
"Scene": "General", // General matching mode.
"ShotInfo": {
"ShotScripts": [
{
"ScriptText": "The script for the first storyboard",
"SpeechText": "The voice-over script for the first storyboard",
"Duration": 5.0 // The value of this field takes effect only if no voice-over script is specified.
},
{
"ScriptText": "The script for the second storyboard",
"SpeechText": "The voice-over script for the second storyboard",
"Duration": 8.0 // The value of this field takes effect only if no voice-over script is specified.
}
]
}
},
"TitleArray": [
"Freshippo Opens a Store in Huilongguan",
"Freshippo Opens a Store"
],
"StickerArray": [
{
"MediaId": "****9d46c8b4548681030f6e****",
"X": 10,
"Y": 100,
"Width": 300,
"Height": 300
},
{
"MediaURL": "http://test-bucket.oss-cn-shanghai.aliyuncs.com/test3.png",
"X": 10,
"Y": 100,
"Width": 300,
"Height": 300
}
],
"BackgroundMusicArray": [
"****b4549d46c88681030f6e****",
"****549d46c88b4681030f6e****",
"http://test-bucket.oss-cn-shanghai.aliyuncs.com/test4.mp3"
],
"BackgroundImageArray": [
"****6c886b4549d481030f6e****",
"****9d46c8548b4681030f6e****",
"http://test-bucket.oss-cn-shanghai.aliyuncs.com/test1.png"
]
}
EditingConfig parameters
You can configure EditingConfig parameters to specify the volume, location, and other production settings of output videos. If you have no special requirements for a parameter, we recommend that you leave the parameter empty so that the default value is used.
In common scenarios, the EditingConfig parameters in global broadcast mode are the same as those in storyboard script mode.
Parameter | Type | Description | Required |
MediaConfig | JSON | The configurations of the input video materials. Supported fields include Volume. | No |
TitleConfig | JSON | The configurations of titles. You can configure subtitle parameters. For more information, see the "Banner text" section of the Effect configurations topic. | No |
SpeechConfig | JSON | The configurations of voice-over scripts. Supported fields include Volume, SpeechRate, Voice, Style, and AsrConfig. Note: When the speech tempo is calculated, a coefficient of 0.002 is used if the speed is less than 1x, and a coefficient of 0.001 is used if the speed is greater than 1x. The result is rounded to the nearest integer. | No |
BackgroundMusicConfig | JSON | The configurations of the background music. Supported fields include Volume and Style. | No |
BackgroundImageConfig | JSON | The configurations of background images. This parameter does not take effect if background images are specified in the InputConfig parameter. | No |
ProcessConfig | JSON | The video editing settings. Supported fields include SingleShotDuration, EnableClipSplit, AllowVfxEffect, AllowTransition, and AllowDuplicateMatch. | No |
ProduceConfig | JSON | The configurations of video editing and production. For more information, see the "EditingProduceConfig" section of the Editing and production parameters topic. | No |
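The SpeechConfig note in the preceding table states only the two coefficients and the rounding rule. The following Python snippet is a rough sketch of how a playback speed might be converted to a SpeechRate value. The formula (1 - 1/speed) / coefficient is an assumption that is not stated in this topic, and the function name speech_rate_for_speed is hypothetical. Verify the result against the SpeechRate field description before you rely on it.

```python
def speech_rate_for_speed(speed):
    """Convert a playback speed such as 0.8 or 2.0 to a SpeechRate integer.

    Assumption (not stated in this topic): rate = (1 - 1/speed) / coefficient.
    The coefficients and the rounding rule come from the SpeechConfig note.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    # 0.002 below 1x, 0.001 above 1x; at exactly 1x the result is 0 either way.
    coefficient = 0.002 if speed < 1 else 0.001
    return round((1 - 1 / speed) / coefficient)
```

Under this assumption, a speed of 1x maps to the default SpeechRate of 0 that is shown in the EditingConfig sample code.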
EditingConfig sample code
All fields of the EditingConfig parameter are optional. The following sample code shows the default configurations:
{
"MediaConfig": {
"Volume": 0 // By default, video materials are muted.
},
"TitleConfig": {
"Alignment": "TopCenter",
"AdaptMode": "AutoWrap",
"Font": "Alibaba PuHuiTi 2.0 95 ExtraBold",
"SizeRequestType": "Nominal",
"Y": 0.1 // The Y coordinate of the title. Default value: 0.1 in portrait mode, 0.05 in landscape mode, and 0.08 in square mode.
},
"SpeechConfig": {
"Volume": 1, // By default, the original volume setting of the voice-over is used.
"SpeechRate": 0,
"Voice": null,
"Style": null,
"AsrConfig": {
"Alignment": "TopCenter",
"AdaptMode": "AutoWrap",
"Font": "Alibaba PuHuiTi 2.0 65 Medium",
"SizeRequestType": "Nominal",
"Spacing": -1,
"Y": 0.8 // The Y coordinate of the subtitle. Default value: 0.8 in portrait mode, 0.9 in landscape mode, and 0.85 in square mode.
}
},
"BackgroundMusicConfig": {
"Volume": 0.2, // By default, the volume of the background music is set to 20%.
"Style": null
},
"ProcessConfig": {
"SingleShotDuration": 3, // The length of each clip after segmentation. Choose between the SingleShotDuration and EnableClipSplit parameters.
"EnableClipSplit": false, // Specifies whether to perform AI-powered video segmentation. If you set this parameter to true, the SingleShotDuration parameter does not take effect.
"AllowVfxEffect": false, // Specifies whether special effects can be used.
"AllowTransition": false, // Specifies whether transitions can be used.
"AllowDuplicateMatch": false // Specifies whether a clip can be repeated in intelligent image-text matching mode.
}
}
OutputConfig parameters
You can configure OutputConfig parameters to specify the URL, naming rules, width, height, and number of output videos.
In common scenarios, the OutputConfig parameters in global broadcast mode are the same as those in storyboard script mode.
Parameter | Type | Required | Description |
MediaURL | String | This parameter is required when you store the output video in OSS. | The URL of the output video. The URL must contain a placeholder. Example: http://xxx.oss-cn-shanghai.aliyuncs.com/xxx_{index}.mp4 |
StorageLocation | String | This parameter is required when you store the output video in ApsaraVideo VOD (VOD). | The storage location of the output videos in VOD. Example: outin-xxxxxx.oss-cn-shanghai.aliyuncs.com |
FileName | String | This parameter is required when you store the output video in VOD. | The name of the output video. The name must contain a placeholder. Example: xxx_{index}.mp4 |
GeneratePreviewOnly | Boolean | No. Default value: false. | If you set the GeneratePreviewOnly parameter to true, the current job generates a timeline only for preview and no video is produced. In this case, you do not need to specify the URL of the output video. After the quick video production job is complete, you can call the GetBatchMediaProducingJob operation to query the result of the job. The returned task list contains the ID of the edit project (projectId). You can call the GetEditingProject operation to obtain the timeline for preview. |
Count | Integer | No. Default value: 1. | The number of videos to be produced. |
MaxDuration | Float | No | The maximum length of each video. If a voice-over script is specified, the length of the text-to-speech (TTS) audio prevails and this parameter does not take effect. If no voice-over script is specified, this parameter takes effect. Default value: 15 seconds. |
FixedDuration | Float | No | The fixed length of each video. If you specify a fixed length, all produced videos use the fixed length. |
Width | Integer | Yes | The width of the output video in pixels. |
Height | Integer | Yes | The height of the output video in pixels. |
Video | JSONObject | No | The settings related to the output video streams, such as Crf and Codec. |
Sample code
{
"MediaURL": "http://xxx.oss-cn-shanghai.aliyuncs.com/xxx_{index}.mp4",
"Count": 1,
"MaxDuration": 15,
"Width": 1080,
"Height": 1920,
"Video": {"Crf": 27},
"GeneratePreviewOnly":false
}
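The storage rules in the OutputConfig table can be captured in a small builder. The following Python snippet is a minimal sketch: the function name build_output_config and its arguments are hypothetical, and it assumes that the placeholder is the literal text {index} shown in the examples.

```python
def build_output_config(width, height, count=1, media_url=None,
                        storage_location=None, file_name=None,
                        preview_only=False):
    """Build an OutputConfig dict for OSS or VOD output (hypothetical helper)."""
    cfg = {"Width": int(width), "Height": int(height), "Count": int(count),
           "GeneratePreviewOnly": bool(preview_only)}
    if preview_only:
        # A preview-only job produces no video, so no output URL is needed.
        return cfg
    if media_url is not None:
        # OSS output: the URL must contain the placeholder.
        if "{index}" not in media_url:
            raise ValueError("MediaURL must contain the {index} placeholder.")
        cfg["MediaURL"] = media_url
    elif storage_location and file_name:
        # VOD output: the file name must contain the placeholder.
        if "{index}" not in file_name:
            raise ValueError("FileName must contain the {index} placeholder.")
        cfg["StorageLocation"] = storage_location
        cfg["FileName"] = file_name
    else:
        raise ValueError("Specify MediaURL, or StorageLocation and FileName.")
    return cfg
```

The result can be serialized with json.dumps before it is passed to the operation.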
Examples of SDK calls
Prerequisites
The Intelligent Media Services (IMS) server-side SDK is installed. For more information, see Preparations.
Sample code
The following sample code provides examples of the global broadcast mode.
API request parameters
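The following Python snippet is a minimal sketch of how the three configurations for a job in global broadcast mode might be assembled. It uses only the standard library and does not call the SDK. It assumes that the SubmitBatchMediaProducingJob operation accepts each configuration as a JSON string. The variable name request_params is hypothetical, and the media ID and URL are the placeholders used elsewhere in this topic.

```python
import json

# InputConfig: materials, titles, and voice-over scripts.
input_config = {
    "MediaArray": ["****9d46c886b45481030f6e****"],
    "TitleArray": ["Freshippo Opens a Store"],
    "SpeechTextArray": ["Freshippo opens a store near the shopping mall."],
}

# EditingConfig: all fields are optional; override only what you need.
editing_config = {"BackgroundMusicConfig": {"Volume": 0.2}}

# OutputConfig: one portrait video stored in OSS.
output_config = {
    "MediaURL": "http://xxx.oss-cn-shanghai.aliyuncs.com/xxx_{index}.mp4",
    "Count": 1,
    "Width": 1080,
    "Height": 1920,
}

# Assumption: each configuration is passed to the operation as a JSON string.
request_params = {
    "InputConfig": json.dumps(input_config, ensure_ascii=False),
    "EditingConfig": json.dumps(editing_config, ensure_ascii=False),
    "OutputConfig": json.dumps(output_config, ensure_ascii=False),
}
```

Unlike the annotated samples in this topic, the serialized strings contain no comments, so they are valid JSON.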
Sample output videos
Portrait mode | Landscape mode |
Editing logic and advanced configurations
Editing logic
Global broadcast mode:
If the video material is selected by specifying a search library and theme description texts, the system uses the theme description texts as the search condition to search for video clips in the search library. The matching video clips are used as input video materials.
If the input video is a long video, the video is segmented into clips. During video editing, the system selects from the clips and merges them into a new video. The default length of each clip after segmentation is 3 seconds. You can specify the SingleShotDuration parameter to configure the length of the clips. For more information, see the "EditingConfig parameters" section of this topic.
If no voice-over script is specified, the system selects random clips and merges them into a video of about 15 seconds.
If a voice-over script is specified, the system aligns the voice-over script with the clips in intelligent image-text matching mode and merges the clips into multiple videos at a time.
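The fixed-length segmentation described above can be pictured with a short Python sketch. This is a conceptual illustration only, not the algorithm that the service uses. The function name segment_boundaries is hypothetical, and the handling of the final, shorter clip is an assumption.

```python
def segment_boundaries(total_seconds, single_shot_duration=3.0):
    """Split a video of total_seconds into (start, end) clips.

    Conceptual sketch: each clip lasts single_shot_duration seconds.
    Assumption: the last clip keeps whatever remainder is left.
    """
    if single_shot_duration <= 0:
        raise ValueError("single_shot_duration must be positive")
    total = float(total_seconds)
    clips = []
    start = 0.0
    while start < total:
        end = min(start + single_shot_duration, total)
        clips.append((start, end))
        start = end
    return clips
```

With the default clip length of 3 seconds, a 10-second video yields three full clips and a 1-second remainder.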
Storyboard script mode:
If the video material is selected by specifying a search library and theme description texts, the system uses the theme description texts as the search condition to search for video clips in the search library. The matching video clips are used as input video materials.
In storyboard script mode, you do not need to specify the SpeechTextArray or SpeechText field. You can use the SceneInfo, ShotInfo, and ShotScripts fields to specify the content, length, and voice-over in each storyboard of the output videos.
In a single storyboard, clips are cut and matched based on the script. If no script is specified but a voice-over script is specified, the voice-over script is used to match clips.
The length of a storyboard is aligned with the length of the corresponding voice-over or the custom length.
Advanced configurations
For more information about advanced configurations, see Editing logic and advanced configurations for batch video production.