How to configure parameters for script-based automatic video generation - Intelligent Media Services

This topic describes the production parameters, advanced configurations, and software development kit (SDK) call examples for script-based automatic video generation.

Important

Script-to-Video and Smart Text and Image to Video use the same Submit Job API. To learn how to distinguish between the two using parameters, see Parameter differences.
Note: In this API, the region in the Object Storage Service (OSS) URL of all media assets must be the same as the region in the OpenAPI endpoint.
The regions that support script-based automatic video generation are China (Shanghai), China (Beijing), China (Hangzhou), China (Shenzhen), US (West), and Singapore.
In practice, replace parameters such as [your-bucket], [your-region-id], [your-file-name], [your-file-path], and media asset IDs (for example, "****9d46c8b4548681030f6e****") in the examples with your actual values.
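
The region rule above can be checked on the client before a job is submitted. The following is a minimal sketch, assuming OSS URLs follow the `http(s)://[your-bucket].oss-[your-region-id].aliyuncs.com/...` shape used in the examples of this topic. The class and method names are hypothetical and are not part of the IMS SDK.

```java
import java.net.URI;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OssRegionCheck {
    // Assumed host shape, taken from the examples in this topic:
    // <bucket>.oss-<region-id>.aliyuncs.com
    private static final Pattern OSS_HOST =
        Pattern.compile("^[^.]+\\.oss-([a-z0-9-]+)\\.aliyuncs\\.com$");

    /** Returns the region ID embedded in an OSS URL, or null if the host does not match. */
    static String regionOf(String ossUrl) {
        String host = URI.create(ossUrl).getHost();
        if (host == null) {
            return null;
        }
        Matcher m = OSS_HOST.matcher(host);
        return m.matches() ? m.group(1) : null;
    }

    /** True if the region in the OSS URL equals the region used in the OpenAPI endpoint. */
    static boolean sameRegion(String ossUrl, String endpointRegionId) {
        return endpointRegionId.equals(regionOf(ossUrl));
    }

    public static void main(String[] args) {
        String url = "http://example.oss-cn-shanghai.aliyuncs.com/example/example.mp4";
        System.out.println(sameRegion(url, "cn-shanghai"));
        System.out.println(sameRegion(url, "cn-beijing"));
    }
}
```

A check like this only covers OSS URLs. Media asset IDs carry no region in their text, so they cannot be validated this way.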

Note

To better understand this topic, we recommend that you first learn about Script-to-Video in Smart Video Creation.
Script-based automatic video generation has two processing modes: global narration mode and grouped narration mode.
- Global narration mode: Randomly combines multiple complete narration scripts with script nodes to achieve batch video mixing and editing.
- Grouped narration mode: Splits a complete narration script into multiple paragraphs and pairs them with different script nodes to achieve better results.
- The mode is determined by the following parameter rules:
  - If SpeechTextArray is not empty, the mode is global narration mode.
  - If SpeechTextArray is empty and MediaGroupArray contains at least one MediaGroup.Duration or MediaGroup.SpeechTextArray that is not empty, the mode is grouped narration mode.
  - If SpeechTextArray is empty and all MediaGroup.Duration and MediaGroup.SpeechTextArray in MediaGroupArray are empty, the mode is global narration mode.
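
The three rules above can be expressed as a small decision function. This is an illustrative sketch only: the `Group` holder and the `detect` method are hypothetical names, and "empty" is assumed to mean either not set (null) or an empty list.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class NarrationModeRule {
    enum Mode { GLOBAL, GROUPED }

    /** Hypothetical holder for the two MediaGroup fields that affect the mode. */
    static final class Group {
        final Double duration;              // MediaGroup.Duration, null if not set
        final List<String> speechTextArray; // MediaGroup.SpeechTextArray, null or empty if not set

        Group(Double duration, List<String> speechTextArray) {
            this.duration = duration;
            this.speechTextArray = speechTextArray;
        }
    }

    static boolean isEmpty(List<String> list) {
        return list == null || list.isEmpty();
    }

    /** Applies the rules in order: top-level narration first, then per-group settings. */
    static Mode detect(List<String> speechTextArray, List<Group> groups) {
        if (!isEmpty(speechTextArray)) {
            return Mode.GLOBAL;
        }
        for (Group g : groups) {
            if (g.duration != null || !isEmpty(g.speechTextArray)) {
                return Mode.GROUPED;
            }
        }
        return Mode.GLOBAL;
    }

    public static void main(String[] args) {
        List<Group> plainGroups = Arrays.asList(new Group(null, null));
        List<Group> timedGroups = Arrays.asList(new Group(5.0, null));
        System.out.println(detect(Arrays.asList("Narration content 1"), plainGroups));
        System.out.println(detect(Collections.<String>emptyList(), timedGroups));
        System.out.println(detect(null, plainGroups));
    }
}
```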

Usage notes

To intelligently mix multiple video, audio, and image materials and produce videos in batches with a single click, see the API reference for SubmitBatchMediaProducingJob - Batch intelligent one-click video production. For details about key API parameters, see InputConfig parameter details, EditingConfig parameter details, and OutputConfig parameter details.
For more information about batch smart one-click media production jobs, see GetBatchMediaProducingJob - Get information about batch smart one-click media production jobs.

InputConfig parameters

Note

You can configure InputConfig to specify parameters for basic materials such as video assets, narration, background music, and stickers.

Parameter	Type	Description	Example	Required	Supported modes
MediaGroupArray	List<MediaGroup>	Scripted materials for automatic video generation, organized into groups. Each group has a name (up to 50 characters, emojis not supported) and a material list (media asset IDs or OSS URLs). A maximum of 40 groups is supported, and each group can contain up to 200 materials.	For more information, see Global narration mode - Parameter example and Grouped narration mode - Parameter example	Yes	Global narration Grouped narration
TitleArray	List<String>	An array of titles. A random title is selected for each video production. A maximum of 50 titles. Each title can be up to 50 characters long.	["Title 1","Title 2"]	No	Global narration Grouped narration
SubHeadingArray	List<SubHeading>	Subtitle settings.	[{"Level":1,"TitleArray":["Level-1 subtitle 1","Level-1 subtitle 2"]},{"Level":3,"TitleArray":["Level-3 subtitle"]}]	No	Global narration Grouped narration
SpeechTextArray	List<String>	An array of narration scripts. A random script is selected for each video production. A maximum of 50 scripts. Each script can be up to 1,000 characters long. You can use SSML markup language to control speech synthesis. The default spoken language is Chinese (zh). To set other languages, see the SpeechLanguage parameter. Important Currently, only <break>, <s>, <sub>, <w>, <phoneme>, and <say-as> are supported. For CosyVoice-related voices, only <break>, <s>, and <sub> are supported.	["Narration content 1","Narration content 2"]	No	Global narration
StickerArray	List<Sticker>	An array of stickers. A random sticker is selected for each video production. A maximum of 50 stickers are supported. Random selection rule: For example, if you provide 10 stickers and set the number of videos to produce to 20, a random number from 1 to 10 is generated, such as 3. Then, the stickers are selected in the order of 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6, 7, and so on. For more information about supported material formats, see Image formats.	[{"MediaId":"****9d46c8b4548681030f6e****","X":10,"Y":100,"Width":300,"Height":300,"Opacity":0.6}]	No	Global narration Grouped narration
BackgroundMusicArray	List<String>	An array of background music tracks. A random track is selected for each video production. A maximum of 50 tracks are supported. You can use media asset IDs or OSS URLs. Random selection rule: For example, if you provide 10 background music tracks and set the number of videos to produce to 20, a random number from 1 to 10 is generated, such as 3. Then, the tracks are selected in the order of 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6, 7, and so on. For supported media formats, see Audio formats.	["****b4549d46c88681030f6e****","****549d46c88b4681030f6e****"]	No	Global narration Grouped narration
BackgroundImageArray	List<String>	An array of background images. A random image is selected for each video production. A maximum of 50 images are supported. You can use media asset IDs or OSS URLs. Random selection rule: For example, if you provide 10 background images and set the number of videos to produce to 20, a random number from 1 to 10 is generated, such as 3. Then, the images are selected in the order of 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6, 7, and so on. For a list of supported formats, see Image formats.	["****b4549d46c88681030f6e****","****549d46c88b4681030f6e****"]	No	Global narration Grouped narration
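
The random selection rule described for stickers, background music, and background images amounts to picking a random starting position and then cycling through the list. The sketch below reproduces the order given in the table for 10 items and a starting position of 3. It only illustrates the documented rule and is not the service's actual implementation; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

class RotatingSelection {
    /**
     * Returns the 1-based item position used for each of `count` videos,
     * starting at `start` and wrapping around after `itemCount`.
     */
    static List<Integer> order(int itemCount, int count, int start) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            result.add((start - 1 + i) % itemCount + 1);
        }
        return result;
    }

    public static void main(String[] args) {
        // Fixed start of 3, as in the example from the table.
        System.out.println(order(10, 15, 3));

        // A random start from 1 to 10, as the rule describes.
        int randomStart = ThreadLocalRandom.current().nextInt(1, 11);
        System.out.println(order(10, 20, randomStart));
    }
}
```

With a start of 3, the first call yields 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6, 7, which matches the order in the table.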

MediaGroup parameters

Note

The differences in MediaGroup parameter configurations between global narration mode and grouped narration mode are indicated in the "Supported modes" column of the table.

Parameter	Type	Description	Example	Required	Supported modes
GroupName	String	Group name. Up to 50 characters. Emojis are not supported.	Group1	Yes	Global narration Grouped narration
MediaArray	List<String>	A list of materials. You can use media IDs or URLs. A maximum of 200 materials are supported. For supported formats, see Video formats.	["****b4549d46c88681030f6e****"]	Yes	Global narration Grouped narration
SpeechTextArray	List<String>	An array of narration scripts. A random script is selected for each video production. A maximum of 50 scripts. Each script can be up to 1,000 characters long. Supports using SSML markup language to control speech synthesis. Important Currently, only <break>, <s>, <sub>, <w>, <phoneme>, and <say-as> are supported. For CosyVoice-related voices, only <break>, <s>, and <sub> are supported.	["Narration content 1","Narration content 2"]	No	Grouped narration
Duration	Float	The duration of the current group in seconds. This parameter is valid only when SpeechTextArray is empty.	10	No. Default: 5.	Grouped narration
SplitMode	String	The mode for splitting video materials in the group. For more information about the processing logic and usage of this parameter, see: How to ensure that video clips are played completely in the final video? How to fix abrupt and overly frequent scene transitions in the final video? Valid values: NoSplit: Does not split the video. AverageSplit: Automatically splits the video based on the SingleShotDuration parameter.	NoSplit	No. Default: AverageSplit.	Global narration Grouped narration
Volume	Float	The volume of the input video. If you set the volume here, the volume of the videos in the current group will match this setting, and the EditingConfig.MediaConfig.Volume parameter will no longer apply to this group. Value range: [0, 10.0]. Up to two decimal places are supported.	0.5	No	Grouped narration
DurationAutoAdapt	Boolean	Specifies whether to enable automatic duration adaptation for the group. If enabled and there is no narration, the duration of the group is automatically adjusted to ensure video clips play at their original speed.	true	No. Default: false.	Grouped narration
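
The limits in the table above can be checked locally before a request is built. The following is a rough sketch under two assumptions that the table does not confirm: the 50-character limit is counted in Unicode code points, and the emoji restriction is not checked at all. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class MediaGroupLimits {
    /** Returns a list of problems found in one MediaGroup; an empty list means no problem was found. */
    static List<String> check(String groupName, List<String> mediaArray, Double volume) {
        List<String> problems = new ArrayList<>();
        if (groupName == null || groupName.isEmpty()) {
            problems.add("GroupName is required");
        } else if (groupName.codePointCount(0, groupName.length()) > 50) {
            problems.add("GroupName exceeds 50 characters");
        }
        if (mediaArray == null || mediaArray.isEmpty()) {
            problems.add("MediaArray is required");
        } else if (mediaArray.size() > 200) {
            problems.add("MediaArray exceeds 200 materials");
        }
        // Volume is optional; when set, it must be within [0, 10.0].
        if (volume != null && (volume < 0 || volume > 10.0)) {
            problems.add("Volume must be within [0, 10.0]");
        }
        return problems;
    }

    public static void main(String[] args) {
        System.out.println(check("Group1", Arrays.asList("****b4549d46c88681030f6e****"), 0.5));
        System.out.println(check("Group2", Collections.nCopies(201, "placeholder-media-id"), 11.0));
    }
}
```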

Global narration mode - Parameter example

{
  "MediaGroupArray": [
    {
      "GroupName": "UseMediaId",
      "MediaArray": [
        "****9d46c886b45481030f6e****",
        "****c886810b4549d4630f6e****"
      ],
      "SplitMode": "NoSplit"
    },
    {
      "GroupName": "UseOssUrl",
      "MediaArray": [
        "http://[your-bucket].oss-[your-region-id].aliyuncs.com/[your-file-path]/[your-file-name].mp4",
        "http://[your-bucket].oss-[your-region-id].aliyuncs.com/[your-file-path]/[your-file-name].png"
      ]
    }
  ],
  "TitleArray": [
    "Hema Fresh in Huilongguan is now open",
    "Hema Fresh is now open"
  ],
  "SubHeadingArray": [
    {
      "Level": 1,
      "TitleArray": ["Subtitle 1", "Subtitle 2"]
    },
    {
      "Level": 3,
      "TitleArray": ["Level-3 subtitle"]
    }
  ],
  "SpeechTextArray": [
    "A new Hema Fresh store just opened in a nearby mall. Today is the grand opening, and I rushed over to join the fun. The store isn't very large, but the mall is crowded. Snacks and drinks are quite cheap, and the checkout lines are very long. Come and check it out!",
    "A new Hema Fresh store just opened in a nearby mall. Today is the grand opening, so I came to join the excitement.",
    "<speak>The battle <phoneme alphabet=\"py\" ph=\"zheng4 hao3\">is fierce</phoneme>. Today, our protagonist, table tennis legend Ma Long, is charging towards the pinnacle of glory. In the quarterfinals against the formidable Shunsuke Togami, Ma Long showed no fear, giving his all in every rally. His precise shots and calm judgment gave him the upper hand in this match. In the end, Ma Long successfully defeated his opponent and advanced to the semifinals.</speak>"
  ],
  "StickerArray": [
    {
      "MediaId": "****9d46c8b4548681030f6e****",
      "X": 10,
      "Y": 100,
      "Width": 300,
      "Height": 300,
      "Opacity": 0.6
    },
    {
      "MediaURL": "http://[your-bucket].oss-[your-region-id].aliyuncs.com/[your-file-path]/[your-file-name].png",
      "X": 10,
      "Y": 100,
      "Width": 300,
      "Height": 300
    }
  ],
  "BackgroundMusicArray": [
    "****b4549d46c88681030f6e****",
    "****549d46c88b4681030f6e****",
    "http://[your-bucket].oss-[your-region-id].aliyuncs.com/[your-file-path]/[your-file-name].mp3"
  ],
  "BackgroundImageArray": [
    "****6c886b4549d481030f6e****",
    "****9d46c8548b4681030f6e****",
    "http://[your-bucket].oss-[your-region-id].aliyuncs.com/[your-file-path]/[your-file-name].png"
  ]
}

Grouped narration mode - Parameter example

{
  "MediaGroupArray": [{
    "GroupName": "start",
    "MediaArray": ["https://[your-bucket].oss-[your-region-id].aliyuncs.com/[your-file-path]/[your-file-name].jpeg", "https://[your-bucket].oss-[your-region-id].aliyuncs.com/[your-file-path]/[your-file-name].mp4"],
    "Duration": 5,
    "SplitMode": "NoSplit",
    "Volume": 1
  },
    {
      "GroupName": "group1",
      "MediaArray": ["https://[your-bucket].oss-[your-region-id].aliyuncs.com/[your-file-path]/[your-file-name].png", "https://[your-bucket].oss-[your-region-id].aliyuncs.com/[your-file-path]/[your-file-name].mp4"],
      "SpeechTextArray": ["A new Hema Fresh store just opened in a nearby mall. Today is the grand opening.", "Today is the grand opening of this Hema Fresh store.", "<speak>The battle <phoneme alphabet=\"py\" ph=\"zheng4 hao3\">is fierce</phoneme>. Today, our protagonist, table tennis legend Ma Long, is charging towards the pinnacle of glory. In the quarterfinals against the formidable Shunsuke Togami, Ma Long showed no fear, giving his all in every rally. His precise shots and calm judgment gave him the upper hand in this match. In the end, Ma Long successfully defeated his opponent and advanced to the semifinals.</speak>"]
    },
    {
      "GroupName": "group2",
      "MediaArray": ["https://[your-bucket].oss-[your-region-id].aliyuncs.com/0-test-batch-editing-materials/normal%20video.mp4", "https://[your-bucket].oss-[your-region-id].aliyuncs.com/[your-file-path]/[your-file-name].jpeg"],
      "SpeechTextArray": ["The store isn't very large, but the mall is crowded. Snacks and drinks are quite cheap, and the checkout lines are very long.", "The scene is very lively, with crowds of people and a wide variety of goods."]
    },
    {
      "GroupName": "group3",
      "MediaArray": ["https://[your-bucket].oss-[your-region-id].aliyuncs.com/0-test-batch-editing-materials/young_sunset_walk.mp4"],
      "SpeechTextArray": ["Come and take a look!", "Hurry and come take a look!"]
    },
    {
      "GroupName": "end",
      "MediaArray": ["https://[your-bucket].oss-[your-region-id].aliyuncs.com/[your-file-path]/[your-file-name].jpg", "https://[your-bucket].oss-[your-region-id].aliyuncs.com/[your-file-path]/[your-file-name].mp4"],
      "Duration": 5
    }
  ],
  "TitleArray": [
    "Hema Fresh in Huilongguan is now open",
    "Hema Fresh is now open"
  ],
  "StickerArray": [
    {
      "MediaId": "****9d46c8b4548681030f6e****",
      "X": 10,
      "Y": 100,
      "Width": 300,
      "Height": 300,
      "Opacity": 0.6
    },
    {
      "MediaURL": "http://[your-bucket].oss-[your-region-id].aliyuncs.com/[your-file-path]/[your-file-name].png",
      "X": 10,
      "Y": 100,
      "Width": 300,
      "Height": 300
    }
  ],
  "SubHeadingArray": [
    {
      "Level": 1,
      "TitleArray": ["Level-1 subtitle 1", "Level-1 subtitle 2"]
    },
    {
      "Level": 3,
      "TitleArray": ["Level-3 subtitle"]
    }
  ],
  "BackgroundMusicArray": [
    "****b4549d46c88681030f6e****",
    "****549d46c88b4681030f6e****",
    "http://[your-bucket].oss-[your-region-id].aliyuncs.com/[your-file-path]/[your-file-name].mp3"
  ],
  "BackgroundImageArray": [
    "****6c886b4549d481030f6e****",
    "****9d46c8548b4681030f6e****",
    "http://[your-bucket].oss-[your-region-id].aliyuncs.com/[your-file-path]/[your-file-name].png"
  ]
}

EditingConfig parameters

You can configure EditingConfig to specify the volume, position, and other composition parameters for the clips. For parameter examples, see EditingConfig parameter examples.

Note

Except for the following parameters, all other parameters support both global narration mode and grouped narration mode:

ProcessConfig.AlignmentMode takes effect only in global narration mode.
SpeechConfig.SpecialWordsConfig takes effect only in grouped narration mode.

Parameter	Type	Description	Example	Required
MediaConfig	JSON	Configurations for input video materials.	{"Volume":"1","MediaMetaDataArray":[{"Media":"****6c886b4549d481030f6e****","GroupName":"GroupA","TimeRangeList":[{"In":"0","Out":"1"},{"In":"2","Out":"3"}]}]}	No
TitleConfig	JSON	Configurations for titles. You can configure caption parameters.	{"Alignment":"TopCenter","AdaptMode":"AutoWrap","Font":"Alibaba PuHuiTi 2.0 95 ExtraBold","SizeRequestType":"Nominal","Y":0.1}	No
SubHeadingConfig	JSON	Configurations for multi-level subtitles. Each key is a subtitle level (Level), and each value holds the caption (banner text) parameters for that level.	{"1":{"Y":0.3,"FontSize":40},"3":{"Y":0.5,"FontSize":30}}	No
SpeechConfig	JSON	Configurations for narration scripts.	For more information, see EditingConfig parameter examples	No
BackgroundMusicConfig	JSON	Configurations for background music.	{"Volume":0.2}	No
BackgroundImageConfig	JSON	Configurations for background images. This field does not take effect if a background image is already configured in InputConfig.	{"SubType":"Blur","Radius":0.5}	No
ProcessConfig	JSON	Configurations for mixing and editing processing.	For more information, see EditingConfig parameter examples	No
FECanvas	JSON	Canvas configuration for frontend page preview.	{"Width": 1080,"Height": 1920}	No
ProduceConfig	JSON	Configuration for standard video editing and production. For more information about the fields, see EditingProduceConfig.	{"AutoRegisterInputVodMedia":true,"OutputWebmTransparentChannel":true,"CoverConfig":{"StartTime":3.3},"AudioChannelCopy":"left","PipelineId":"****d54a97cff4108b555b01166d4****","MaxBitrate":5000,"KeepOriginMaxBitrate":false,"KeepOriginVideoMaxFps":false}	No

ProcessConfig parameters

Parameter	Type	Description	Example	Required
SingleShotDuration	Float	When a long video material is edited, it is automatically split. This parameter specifies the duration of a single shot after splitting, in seconds.	5	No. Default: 3.
AllowVfxEffect	Boolean	Specifies whether to add special effects.	true	No. Default: false.
VfxEffectProbability	Float	The probability of applying a special effect to each video clip. Value range: 0.0 to 1.0. Up to two decimal places are supported.	0.6	No. Default: 0.5.
VfxFirstClipEffectList	List<String>	If VfxFirstClipEffectList is not empty, the effect for the first clip of the video is selected from this list. If VfxFirstClipEffectList is empty, the effect for the first clip is randomly selected from the following: "slightshow", "starfieldshinee", "starfieldshinee2", "starsparkle", "colorfulripples", and "starfield". For examples of effects, see effect examples.	["slightshow","starfieldshinee"]	No
VfxNotFirstClipEffectList	List<String>	If VfxNotFirstClipEffectList is not empty, the effects for clips other than the first one are selected from this list. If VfxNotFirstClipEffectList is empty, the effects for clips other than the first one are selected from the following: "zoomslight", "zoom", "zoominout", and "slightshake". For more information, see Effect examples.	["zoomslight","zoom"]	No
AllowTransition	Boolean	Specifies whether to add transition effects.	true	No. Default: false.
TransitionDuration	Float	The duration of the transition in seconds. If the transition duration is greater than (clip duration - 1), the transition effect for that clip will not be applied.	0.5	No. Default: 0.5 seconds.
TransitionList	List<String>	A list of custom transition effects. When AllowTransition is set to true, a random transition effect from this list is selected for composition. For more information about the available transition effects, see the Transition Effect Library. If this parameter is empty, a random effect is selected from the following transition effects: "linearblur", "colordistance", "crosshatch", "dreamyzoom", or "doomscreentransition_up".	["directional", "linearblur"]	No
UseUniformTransition	Boolean	Specifies whether to use the same transition effect throughout a single produced video.	true	No. Default: true.
AllowFilter	Boolean	Specifies whether to add custom filters.	false	No. Default: false.
FilterList	List<String>	A list of custom filter effects. If `AllowFilter` is set to `true`, a filter is randomly selected from this list for composition. For the available filter effects, see Filter Effect Examples. If this parameter is empty, no filter effect is added.	["m1", "m2"]	No
AlignmentMode	String	The alignment mode for the video and narration script. This parameter takes effect only in global narration mode. Valid values: "AutoSpeed": The duration of the video track is scaled to match the audio track. "Cut": The duration of the video track is truncated to match the audio track.	AutoSpeed	No. Default: AutoSpeed.
ImageDuration	Float	The duration of image materials in seconds.	2	No. Default: 2.
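
The TransitionDuration rule in the table can be read as a simple per-clip check: a transition is applied only if it is no longer than the clip duration minus one second. The sketch below illustrates that reading. The class and method names are hypothetical, and how the service handles rounding at the boundary is not specified here.

```java
class TransitionRule {
    /**
     * Returns true if a transition of the given duration fits the clip.
     * Per the table, the transition is skipped when
     * transitionDuration > clipDuration - 1 (both in seconds).
     */
    static boolean transitionApplies(double clipDuration, double transitionDuration) {
        return transitionDuration <= clipDuration - 1;
    }

    public static void main(String[] args) {
        // A 3-second shot (the default SingleShotDuration) with the default 0.5-second transition.
        System.out.println(transitionApplies(3.0, 0.5));
        // A 1.2-second clip leaves only 0.2 seconds, so the 0.5-second transition is skipped.
        System.out.println(transitionApplies(1.2, 0.5));
    }
}
```

In practice, this means that very short clips, for example those produced by a small SingleShotDuration, may show no transition even when AllowTransition is true.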

EditingConfig parameter example

{
  "MediaConfig": {
    "Volume": 0 // Mute the video materials by default
  },
  "TitleConfig": {
    "Alignment": "TopCenter",
    "AdaptMode": "AutoWrap",
    "Font": "Alibaba PuHuiTi 2.0 95 ExtraBold",
    "SizeRequestType": "Nominal",
    "Y": 0.1 // Default Y-coordinate of the title for portrait output. The defaults for landscape and square output are 0.05 and 0.08.
  },
  "SubHeadingConfig": {
    "1": {
      "Y": 0.3,
      "FontSize": 40
    },
    "3": {
      "Y": 0.5,
      "FontSize": 30
    }
  },
  "SpeechConfig": {
    "Volume": 1,  // Use the original volume for the narration audio by default
    "SpeechRate": 0,
    "Voice": null,
    "Style": null,
    "CustomizedVoice": null, // The voice ID for voice cloning. If this field is specified, Voice and Style are ignored.
    "AsrConfig": {
      "Alignment": "TopCenter",
      "AdaptMode": "AutoWrap",
      "Font": "Alibaba PuHuiTi 2.0 65 Medium",
      "SizeRequestType": "Nominal",
      "Spacing": -1,
      "Y": 0.8 // Default Y-coordinate of the captions for portrait output. The defaults for landscape and square output are 0.9 and 0.85.
    },
    "SpecialWordsConfig": [{
      "Type": "Highlight",
      "Style": {
        "FontName": "KaiTi",
        "FontSize": 80,
        "FontColor": "20AEE9",
        "OutlineColour": "2D20E9",
        "Outline": 3,
        "FontFace": {
          "Bold": true,
          "Underline": true
        }
      },
      "WordsList": [
        "ApsaraVideo",
        "Intelligent Media Services",
        "Smart video creation"
      ]
    },
    {
      "Type": "Highlight",
      "Style": {
        "FontFace": {
          "Italic": true
        }
      },
      "WordsList": [
        "product",
        "take a look"
      ]
    },
    {
      "Type": "Forbidden",
      "WordsList": [
        "pilipala",
        "bilibala"
      ],
      "SoundReplaceMode": "None"
    }
  ]},
  "BackgroundMusicConfig": {
    "Volume": 0.2,   // Use 20% of the original volume for the background music by default
    "Style": null
  },
  "ProcessConfig": {
    "SingleShotDuration": 3,      // Duration of a single shot after splitting, in seconds
    "AllowVfxEffect": false,      // Specifies whether to add special effects
    "AllowTransition": false,     // Specifies whether to add transition effects
    "AlignmentMode": "AutoSpeed"  // Takes effect only in global narration mode
  }
}
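
The comments in the example above give different default Y-coordinates for titles and captions depending on the orientation of the output video. As a rough illustration, the mapping can be sketched as follows. The assumption that orientation is derived by comparing the output Width and Height is not stated in this topic, and the class and method names are hypothetical.

```java
class DefaultCaptionY {
    /** Default Y-coordinate of the title (TitleConfig.Y) for the given output size. */
    static double titleY(int width, int height) {
        if (height > width) {
            return 0.1;   // portrait
        }
        if (width > height) {
            return 0.05;  // landscape
        }
        return 0.08;      // square
    }

    /** Default Y-coordinate of the captions (SpeechConfig.AsrConfig.Y) for the given output size. */
    static double captionY(int width, int height) {
        if (height > width) {
            return 0.8;   // portrait
        }
        if (width > height) {
            return 0.9;   // landscape
        }
        return 0.85;      // square
    }

    public static void main(String[] args) {
        System.out.println(titleY(1080, 1920) + " " + captionY(1080, 1920));
        System.out.println(titleY(1920, 1080) + " " + captionY(1920, 1080));
        System.out.println(titleY(1080, 1080) + " " + captionY(1080, 1080));
    }
}
```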

TemplateConfig parameters

TemplateConfig is a common parameter used to set the template for the One-Click Video Creation feature. For detailed parameter descriptions and usage examples, see TemplateConfig parameters.

OutputConfig parameters

You can configure OutputConfig to specify production parameters such as the output address, naming rules, width and height, and the number of videos to produce.

Note

The OutputConfig parameter configurations are the same for both global narration mode and grouped narration mode.

Parameter	Type	Description	Example	Required
MediaURL	String	The output video address. It must contain the {index} placeholder.	Rule: http://[your-bucket].oss-[your-region-id].aliyuncs.com/[your-file-path]/[your-file-name]_{index}.mp4 Example: http://example.oss-cn-shanghai.aliyuncs.com/example/example_{index}.mp4	Required when GeneratePreviewOnly is false and the output video is stored in OSS.
StorageLocation	String	The storage address for the media asset file to be output to ApsaraVideo VOD (VOD).	Rule: [your-vod-bucket].oss-[your-region-id].aliyuncs.com Example: outin-****6c886b4549d481030f6e****.oss-cn-shanghai.aliyuncs.com	Required when GeneratePreviewOnly is false and the output video is stored in VOD.
FileName	String	The name of the output file. It must contain the {index} placeholder.	Rule: [your-file-name]_{index}.mp4 Example: example_{index}.mp4	Required when GeneratePreviewOnly is false and the output video is stored in VOD.
GeneratePreviewOnly	Boolean	If GeneratePreviewOnly is set to true, the job only generates a timeline for preview and does not produce the video, so you do not need to specify an output video address. After the one-click video creation job is completed, call GetBatchMediaProducingJob to query the job result. The returned subtask list contains the video editing project ID (projectId), which you can pass to GetEditingProject to obtain the preview timeline.	false	No. Default: false.
Count	Integer	The number of videos to output. The maximum is 100.	10	No. Default: 1.
MaxDuration	Float	The maximum duration of a single output video, in seconds. If the narration text parameter is specified, the duration is based on the text-to-speech (TTS) duration of the narration, and this parameter is invalid. You do not need to set this parameter in grouped narration mode. You can set either FixedDuration or MaxDuration. For more information about the processing logic and usage of this parameter, see: What is the processing logic for the global narration mode? How to troubleshoot scene transition speed and configure shot duration?	20	No. Default: 15.
FixedDuration	Float	The fixed duration of a single output video, in seconds. If a fixed duration is set, the video duration will align with this parameter. This parameter is not supported in grouped narration mode. In global narration mode, this parameter is supported when SpeechTextArray is empty. You can set either FixedDuration or MaxDuration. For more information, see Video duration rules.	20	No. Default: 15.
Width	Integer	The width of the output video in pixels.	1080	Yes
Height	Integer	The height of the output video in pixels.	1920	Yes
Video	JSON	Configurations for the output video stream, such as Crf and Codec.	{"Crf": 27}	No

Parameter example

{
  "MediaURL": "http://[your-bucket].oss-[your-region-id].aliyuncs.com/[your-file-path]/[your-file-name]_{index}.mp4",
  "Count": 20,
  "MaxDuration": 15,
  "Width": 1080,
  "Height": 1920,
  "Video": {"Crf": 27},
  "GeneratePreviewOnly": false
}
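
Several OutputConfig rules from the table can be verified before a job is submitted. The following is a minimal sketch for the case where output is stored in OSS. It assumes that Count must be at least 1, which the table implies through its default but does not state. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class OutputConfigCheck {
    /**
     * Checks the OutputConfig rules for output stored in OSS.
     * Pass null for a duration that is not set.
     */
    static List<String> check(String mediaUrl, int count, Double maxDuration,
                              Double fixedDuration, boolean generatePreviewOnly) {
        List<String> problems = new ArrayList<>();
        // MediaURL is only needed when the job actually produces videos.
        if (!generatePreviewOnly) {
            if (mediaUrl == null || mediaUrl.isEmpty()) {
                problems.add("MediaURL is required when GeneratePreviewOnly is false");
            } else if (!mediaUrl.contains("{index}")) {
                problems.add("MediaURL must contain the {index} placeholder");
            }
        }
        if (count < 1 || count > 100) {
            problems.add("Count must be between 1 and 100");
        }
        if (maxDuration != null && fixedDuration != null) {
            problems.add("Set either FixedDuration or MaxDuration, not both");
        }
        return problems;
    }

    public static void main(String[] args) {
        String url = "http://example.oss-cn-shanghai.aliyuncs.com/example/example_{index}.mp4";
        System.out.println(check(url, 20, 15.0, null, false));
        System.out.println(check("http://example.oss-cn-shanghai.aliyuncs.com/example/example.mp4",
            101, 15.0, 20.0, false));
    }
}
```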

Application examples

Example 1: Configure an opening and ending in grouped narration mode

Scenarios

This example applies to the scenario where you want to add a consistent intro and outro with a unified voiceover to a video. You can set the MediaGroup.SplitMode of the intro and outro groups to NoSplit. In this case, the system does not split the media clips in the intro and outro groups. Instead, it plays a randomly selected media clip from each group in its entirety to add a fixed intro and outro.

Example parameters

InputConfig parameter example

{
    "mediaGroupArray": [
        {
            "duration": 4,
            "splitMode": "NoSplit",
            "groupName": "opening",
            "mediaArray": [
                "****e44009ee71f0b62bf6f7d44b****"
            ]
        },
        {
            "groupName": "group1",
            "mediaArray": [
                "****e44009eef1f0b62bf6f7d44b****"
            ],
            "speechTextArray": [
                "Wondering where to go for the holiday?",
                "Still hesitant about your holiday plans?"
            ]
        },
        {
            "groupName": "group2",
            "mediaArray": [
                "****e44009eeferfb62bf6f7d44b****",
                "****e440094fghf0b62bf6f7d44b****",
                "****e44009ee74fgh62bf6f7d44b****"
            ],
            "speechTextArray": [
                "Lugu Lake in Yunnan invites you to an appointment with nature. The azure lake is like a mirror, reflecting the unique customs of the Mosuo Kingdom of Women, as picturesque as a painting. Row a boat in the heart of the lake and feel the peaceful years in a swaying dugout canoe. Look up at XX Mountain and its **** mysterious legends. What are you waiting for?",
                "Why not consider a natural feast at Lugu Lake in Yunnan? The azure, mirror-like lake reflects the unique folk customs of the Mosuo Kingdom of Women, picturesque and fascinating. You can leisurely row a boat in the heart of the lake, experiencing the tranquil years in a swaying dugout canoe. You can also look up at the sacred XX Mountain and listen to the ancient and mysterious legends that have been passed down through thousands of years. Come to Lugu Lake."
            ]
        },
        {
            "groupName": "group3",
            "mediaArray": [
                "****e44009ee7ft5662bf6f7d44b****"
            ],
            "speechTextArray": [
                "Come to Lugu Lake and share this quiet and charming landscape!",
                "Share the endless poetry brought by this quiet and charming landscape!"
            ]
        },
        {
            "duration": 4,
            "splitMode": "NoSplit",
            "groupName": "ending",
            "mediaArray": [
                "****e44009ee5fgfg62bf6f7d44b****"
            ]
        }
    ]
}

EditingConfig parameter example

{
    "MediaConfig": {
        "MediaMetaDataArray": [
            {
                "Media": "****e44009eedttg62bf6f7d44b****",
                "GroupName": "opening",
                "TimeRangeList": [
                    {
                        "In": 1.5,
                        "Out": 5.5
                    }
                ]
            },
            {
                "Media": "****e44009ee7dfrf62bf6f7d44b****",
                "GroupName": "ending",
                "TimeRangeList": [
                    {
                        "In": 1.5,
                        "Out": 5.5
                    }
                ]
            }
        ]
    }
}

OutputConfig parameter example

{
    "count": 10,
    "height": 1920,
    "mediaURL": "http://[your-bucket].oss-[your-region-id].aliyuncs.com/[your-file-path]/[your-file-name]_{index}.mp4",
    "width": 1080,
    "widthHeightRatio": 0.5625
}

Example 2: Create a face montage video using script-based automatic video generation

If you are interested in the face collection scenario, see Best practices for creating face collection videos.

SDK call example

Prerequisites

You have installed the IMS server-side SDK. For more information, see Preparations.

Code example

The following example uses the global narration mode.

package com.example;

import java.util.*;

import com.alibaba.fastjson.JSONArray;
import com.alibaba.fastjson.JSONObject;

import com.aliyun.ice20201109.Client;
import com.aliyun.ice20201109.models.*;
import com.aliyun.teaopenapi.models.Config;


/**
 *  You need to add the following Maven dependencies:
 *   <dependency>
 *      <groupId>com.aliyun</groupId>
 *      <artifactId>ice20201109</artifactId>
 *      <version>2.3.0</version>
 *  </dependency>
 *  <dependency>
 *      <groupId>com.alibaba</groupId>
 *      <artifactId>fastjson</artifactId>
 *      <version>1.2.9</version>
 *  </dependency>
 */
public class ScriptBatchEditingService {

    static final String regionId = "[your-region-id]"; // Use a region that supports script-based automatic video generation, for example cn-shanghai. See the supported regions listed at the beginning of this topic.
    static final String bucket = "[your-bucket]";
    private Client iceClient;

    public static void main(String[] args) throws Exception {
        ScriptBatchEditingService scriptBatchEditingService = new ScriptBatchEditingService();
        scriptBatchEditingService.initClient();
        scriptBatchEditingService.runExample();
    }

    public void initClient() throws Exception {
        // An Alibaba Cloud account AccessKey has full access to all APIs. We recommend that you use a RAM user for API calls and routine O&M.
        // This example shows how to store the AccessKey ID and AccessKey secret in environment variables. For more information about how to configure them, see https://www.alibabacloud.com/help/en/sdk/developer-reference/v2-manage-access-credentials
        com.aliyun.credentials.Client credentialClient = new com.aliyun.credentials.Client();

        Config config = new Config();
        config.setCredential(credentialClient);

        // To hard-code the AccessKey ID and AccessKey secret, use the following code. However, we strongly recommend that you do not hard-code them in your project code. Otherwise, the AccessKey pair may be leaked, which compromises the security of all your resources.
        // config.accessKeyId = <The AccessKey ID created in Step 2>;
        // config.accessKeySecret = <The AccessKey secret created in Step 2>;
        config.endpoint = "ice." + regionId + ".aliyuncs.com";
        config.regionId = regionId;
        iceClient = new Client(config);
    }

    public void runExample() throws Exception {

        // Video materials
        JSONObject mediaGroup1 = new JSONObject();
        mediaGroup1.put("GroupName", "start");
        mediaGroup1.put("MediaArray", Arrays.asList(
            "http://ice-document-materials.oss-cn-shanghai.aliyuncs.com/test_media/lgh/lgh-start-1.mp4"
        ));

        JSONObject mediaGroup2 = new JSONObject();
        mediaGroup2.put("GroupName", "middle");
        mediaGroup2.put("MediaArray", Arrays.asList(
            "http://ice-document-materials.oss-cn-shanghai.aliyuncs.com/test_media/lgh/lgh-m-1.mp4",
            "http://ice-document-materials.oss-cn-shanghai.aliyuncs.com/test_media/lgh/lgh-m-2.mp4",
            "http://ice-document-materials.oss-cn-shanghai.aliyuncs.com/test_media/lgh/lgh-m-3.mp4"
        ));

        JSONObject mediaGroup3 = new JSONObject();
        mediaGroup3.put("GroupName", "end");
        mediaGroup3.put("MediaArray", Arrays.asList(
            "http://ice-document-materials.oss-cn-shanghai.aliyuncs.com/test_media/lgh/lgh-end-1.mp4"
        ));

        JSONArray mediaGroupArray = new JSONArray();
        mediaGroupArray.add(mediaGroup1);
        mediaGroupArray.add(mediaGroup2);
        mediaGroupArray.add(mediaGroup3);

        // Narration scripts
        List<String> speechTextArray = Arrays.asList(
            "Wondering where to go for the holiday? Lugu Lake in Yunnan invites you to an appointment with nature. The azure lake is like a mirror, reflecting the unique customs of the Mosuo Kingdom of Women, as picturesque as a painting. Row a boat in the heart of the lake and feel the peaceful years in a swaying dugout canoe. Look up at XX Mountain and its **** mysterious legends. What are you waiting for? Come to Lugu Lake and share this quiet and charming landscape!",
            "Still hesitant about your holiday plans? Why not consider a natural feast at Lugu Lake in Yunnan? The azure, mirror-like lake reflects the unique folk customs of the Mosuo Kingdom of Women, picturesque and fascinating. You can leisurely row a boat in the heart of the lake, experiencing the tranquil years in a swaying dugout canoe. You can also look up at the sacred XX Mountain and listen to the ancient and mysterious legends that have been passed down through thousands of years. Come to Lugu Lake, and share the endless poetry brought by this quiet and charming landscape!"
        );

        // Video titles
        List<String> titleArray = Arrays.asList(
            "Lugu Lake: Mosuo customs in a beautiful landscape",
            "Exploring the mysterious Lugu Lake",
            "Immersive experience of Lugu Lake"
        );

        JSONObject inputConfig = new JSONObject();
        inputConfig.put("MediaGroupArray", mediaGroupArray);
        inputConfig.put("SpeechTextArray", speechTextArray);
        inputConfig.put("TitleArray", titleArray);

        // Number of videos to produce
        int produceCount = 4;

        // Width and height of the output video. To generate a portrait video instead, uncomment these two lines and remove the landscape values below.
        //int outputWidth = 1080;
        //int outputHeight = 1920;

        // Width and height of the output video. A landscape video is generated.
        int outputWidth = 1920;
        int outputHeight = 1080;

        // The OSS address of the output video. It must contain the {index} placeholder.
        String mediaUrl = "http://" + bucket + ".oss-" + regionId + ".aliyuncs.com/script/output_{index}_w.mp4";

        JSONObject outputConfig = new JSONObject();
        outputConfig.put("MediaURL", mediaUrl);
        outputConfig.put("Count", produceCount);
        outputConfig.put("Width", outputWidth);
        outputConfig.put("Height", outputHeight);

        // Submit the smart video creation task
        SubmitBatchMediaProducingJobRequest request = new SubmitBatchMediaProducingJobRequest();
        request.setInputConfig(inputConfig.toJSONString());
        request.setOutputConfig(outputConfig.toJSONString());

        SubmitBatchMediaProducingJobResponse response = iceClient.submitBatchMediaProducingJob(request);
        String jobId = response.getBody().getJobId();
        System.out.println("Start script batch job, batchJobId: " + jobId);

        // Poll the task status until the task is complete. With maxTry = 3000 and a 3-second interval, polling stops after about 2.5 hours.
        System.out.println("Waiting for the job to finish...");
        int maxTry = 3000;
        int i = 0;
        while (i < maxTry) {
            Thread.sleep(3000);
            i++;
            GetBatchMediaProducingJobRequest getRequest = new GetBatchMediaProducingJobRequest();
            getRequest.setJobId(jobId);
            GetBatchMediaProducingJobResponse getResponse = iceClient.getBatchMediaProducingJob(getRequest);
            String status = getResponse.getBody().getEditingBatchJob().getStatus();
            System.out.println("BatchJobId: " + jobId + ", status:" + status);

            if ("Failed".equals(status)) {
                System.out.println("Batch job failed. JobInfo: " + JSONObject.toJSONString(getResponse.getBody().getEditingBatchJob()));
                throw new Exception("Produce failed. BatchJobId: " + jobId);
            }

            if ("Finished".equals(status)) {
                System.out.println("Batch job finished. JobInfo: " + JSONObject.toJSONString(getResponse.getBody().getEditingBatchJob()));
                break;
            }
        }
    }
}
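
The polling loop in the example exits silently if the job has not reached a terminal state after maxTry attempts. The following is a minimal sketch of one way to factor the loop into a helper that reports a timeout explicitly. JobPoller, waitForTerminalStatus, and the "Timeout" and "Interrupted" return values are hypothetical names introduced for illustration and are not part of the IMS SDK. The only job statuses assumed are "Finished" and "Failed", as used in the example above. The status source is passed in as a Supplier so that the sketch runs without calling the service.

```java
import java.time.Duration;
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Supplier;

/** Hypothetical helper for illustration. It is not part of the IMS SDK. */
class JobPoller {

    /**
     * Calls statusSupplier up to maxTries times, sleeping for interval between calls.
     * Returns "Finished" or "Failed" as soon as one of them is observed.
     * Returns "Timeout" or "Interrupted" (local sentinels, not service statuses) otherwise.
     */
    static String waitForTerminalStatus(Supplier<String> statusSupplier, int maxTries, Duration interval) {
        for (int attempt = 0; attempt < maxTries; attempt++) {
            String status = statusSupplier.get();
            if ("Finished".equals(status) || "Failed".equals(status)) {
                return status;
            }
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop polling.
                Thread.currentThread().interrupt();
                return "Interrupted";
            }
        }
        return "Timeout";
    }

    public static void main(String[] args) {
        // Stub status source. "NotDone" is a placeholder for any non-terminal status.
        Iterator<String> fakeStatuses = Arrays.asList("NotDone", "NotDone", "Finished").iterator();
        String result = waitForTerminalStatus(fakeStatuses::next, 10, Duration.ofMillis(10));
        System.out.println("Final status: " + result);
    }
}
```

In the example above, the supplier could wrap the iceClient.getBatchMediaProducingJob call and return getEditingBatchJob().getStatus(). Because that SDK call declares a checked exception, the wrapper would need to catch it and rethrow it as an unchecked exception.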

Details of API request parameters

InputConfig

{
  "MediaGroupArray": [{
    "GroupName": "start",
    "MediaArray": [
      "http://ice-document-materials.oss-cn-shanghai.aliyuncs.com/test_media/lgh/lgh-start-1.mp4"
    ]
  },
    {
      "GroupName": "middle",
      "MediaArray": [
        "http://ice-document-materials.oss-cn-shanghai.aliyuncs.com/test_media/lgh/lgh-m-1.mp4",
        "http://ice-document-materials.oss-cn-shanghai.aliyuncs.com/test_media/lgh/lgh-m-2.mp4",
        "http://ice-document-materials.oss-cn-shanghai.aliyuncs.com/test_media/lgh/lgh-m-3.mp4"
      ]
    },
    {
      "GroupName": "end",
      "MediaArray": [
        "http://ice-document-materials.oss-cn-shanghai.aliyuncs.com/test_media/lgh/lgh-end-1.mp4"
      ]
    }
  ],
  "SpeechTextArray": [
    "Wondering where to go for the holiday? Lugu Lake in Yunnan invites you to an appointment with nature. The azure lake is like a mirror, reflecting the unique customs of the Mosuo Kingdom of Women, as picturesque as a painting. Row a boat in the heart of the lake and feel the peaceful years in a swaying dugout canoe. Look up at XX Mountain and its **** mysterious legends. What are you waiting for? Come to Lugu Lake and share this quiet and charming landscape!",
    "Still hesitant about your holiday plans? Why not consider a natural feast at Lugu Lake in Yunnan? The azure, mirror-like lake reflects the unique folk customs of the Mosuo Kingdom of Women, picturesque and fascinating. You can leisurely row a boat in the heart of the lake, experiencing the tranquil years in a swaying dugout canoe. You can also look up at the sacred XX Mountain and listen to the ancient and mysterious legends that have been passed down through thousands of years. Come to Lugu Lake, and share the endless poetry brought by this quiet and charming landscape!"
  ],
  "TitleArray": [
    "Lugu Lake: Mosuo customs in a beautiful landscape",
    "Exploring the mysterious Lugu Lake",
    "Immersive experience of Lugu Lake"
  ]
}
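
The InputConfig above sets a non-empty SpeechTextArray, so the job runs in global narration mode. The mode rules listed at the beginning of this topic can be sketched as a local check that you run before you submit a job. This is an illustration only: NarrationModeCheck and its methods are hypothetical names, and treating a missing key, a null value, or an empty list as "empty" is an assumption about how the service interprets these fields.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical local check for illustration. It is not part of the IMS SDK. */
class NarrationModeCheck {

    enum Mode { GLOBAL, GROUPED }

    /** Assumption: null, a missing key, an empty string, and an empty collection all count as empty. */
    static boolean isEmpty(Object value) {
        if (value == null) {
            return true;
        }
        if (value instanceof Collection) {
            return ((Collection<?>) value).isEmpty();
        }
        if (value instanceof String) {
            return ((String) value).isEmpty();
        }
        return false;
    }

    static Mode detect(List<String> speechTextArray, List<Map<String, Object>> mediaGroupArray) {
        // Rule 1: a non-empty top-level SpeechTextArray means global narration mode.
        if (!isEmpty(speechTextArray)) {
            return Mode.GLOBAL;
        }
        // Rule 2: any group with a Duration or SpeechTextArray means grouped narration mode.
        if (mediaGroupArray != null) {
            for (Map<String, Object> group : mediaGroupArray) {
                if (!isEmpty(group.get("Duration")) || !isEmpty(group.get("SpeechTextArray"))) {
                    return Mode.GROUPED;
                }
            }
        }
        // Rule 3: everything is empty, so the mode is global narration mode.
        return Mode.GLOBAL;
    }

    public static void main(String[] args) {
        Map<String, Object> start = new HashMap<>();
        start.put("GroupName", "start");
        List<Map<String, Object>> groups = new ArrayList<>();
        groups.add(start);

        // Mirrors the InputConfig above: top-level narration scripts are present.
        System.out.println(detect(Arrays.asList("Narration script"), groups));

        // Moving the narration into a group switches the result to grouped narration mode.
        start.put("SpeechTextArray", Arrays.asList("Opening paragraph"));
        System.out.println(detect(Collections.<String>emptyList(), groups));
    }
}
```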

OutputConfig

{
  "Count": 4,
  "Height": 1080,
  "Width": 1920,
  "MediaURL": "http://[your-bucket].oss-[your-region-id].aliyuncs.com/[your-file-path]/[your-file-name]_{index}_w.mp4"
}
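
Two constraints apply to MediaURL: it must contain the {index} placeholder, and the region in the OSS URL must be the same as the region in the OpenAPI endpoint. The following sketch shows one way to check both locally before you submit a job. OutputUrlCheck and validate are hypothetical names, and the regular expression assumes the public URL shape used in this topic (http or https, bucket.oss-region.aliyuncs.com). Other endpoint forms, such as internal or custom domain endpoints, are not handled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical local check for illustration. It is not part of the IMS SDK. */
class OutputUrlCheck {

    // Assumed URL shape: http(s)://<bucket>.oss-<region-id>.aliyuncs.com/<object-key>
    private static final Pattern OSS_URL =
            Pattern.compile("^https?://([^./]+)\\.oss-([a-z0-9-]+)\\.aliyuncs\\.com/(.+)$");

    /** Returns null if the URL passes both checks. Otherwise, returns a short problem description. */
    static String validate(String mediaUrl, String regionId) {
        Matcher matcher = OSS_URL.matcher(mediaUrl);
        if (!matcher.matches()) {
            return "unexpected URL shape: " + mediaUrl;
        }
        String urlRegion = matcher.group(2);
        if (!urlRegion.equals(regionId)) {
            return "region mismatch: URL uses " + urlRegion + ", endpoint uses " + regionId;
        }
        if (!matcher.group(3).contains("{index}")) {
            return "missing {index} placeholder in object key";
        }
        return null;
    }

    public static void main(String[] args) {
        // "my-bucket" is a placeholder bucket name.
        String url = "http://my-bucket.oss-cn-shanghai.aliyuncs.com/script/output_{index}_w.mp4";
        String problem = validate(url, "cn-shanghai");
        System.out.println(problem == null ? "MediaURL looks valid" : problem);
    }
}
```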

Advanced configurations

For more information, see Batch one-click video remixing logic and advanced configuration.

FAQ

For frequently asked questions about Script-to-Video, see Script-to-Video FAQ.

References

To install the IMS server-side SDK, see Preparations.
To submit a Script-to-Video job, see SubmitBatchMediaProducingJob - Batch Intelligent One-Click Video Creation.
To retrieve a Script-to-Video job, see GetBatchMediaProducingJob - Retrieve batch Script-to-Video job information.
To create a face collection video using Script-to-Video, see Face Collection Video Creation Tutorial.
For advanced configurations, see Batch one-click video remixing logic and advanced configuration.