Intelligent Media Services: Script-based automatic video generation

Last Updated:Sep 16, 2025

This topic describes the production parameters, advanced configurations, and software development kit (SDK) call examples for script-based automatic video generation.

Important
  • Script-to-Video and Smart Text and Image to Video use the same Submit Job API. To learn how to distinguish between the two using parameters, see Parameter differences.

  • The region in the Object Storage Service (OSS) URL of every media asset must match the region of the OpenAPI endpoint that you call.

  • The regions that support script-based automatic video generation are China (Shanghai), China (Beijing), China (Hangzhou), China (Shenzhen), US (West), and Singapore.

  • In practice, replace parameters such as [your-bucket], [your-region-id], [your-file-name], [your-file-path], and media asset IDs (for example, "****9d46c8b4548681030f6e****") in the examples with your actual values.

Note
  • To better understand this topic, we recommend that you first learn about Script-to-Video in Smart Video Creation.

  • Script-based automatic video generation has two processing modes: global narration mode and grouped narration mode.

    • Global narration mode: Randomly combines multiple complete narration scripts with script nodes to achieve batch video mixing and editing.

    • Grouped narration mode: Splits a complete narration script into multiple paragraphs and pairs them with different script nodes to achieve better results.

    • The following section describes how to distinguish between global narration mode and grouped narration mode using parameters:
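As a rough illustration, the "Supported modes" columns in the parameter tables of this topic suggest one way to tell the two modes apart: InputConfig.SpeechTextArray is listed only for global narration, and MediaGroup.SpeechTextArray only for grouped narration. The Java sketch below encodes that reading. It is an inference from those tables, not the service's documented decision logic, and the class, enum, and method names are hypothetical.

```java
import java.util.List;

/** Illustrative guess at the narration mode, inferred from the "Supported modes" columns. */
final class NarrationModeGuess {

    enum Mode { GLOBAL, GROUPED, UNDETERMINED }

    /**
     * @param inputSpeechTextArray   the SpeechTextArray set directly on InputConfig (may be null)
     * @param groupSpeechTextArrays  the SpeechTextArray of each MediaGroup (entries may be null)
     */
    static Mode guess(List<String> inputSpeechTextArray, List<List<String>> groupSpeechTextArrays) {
        boolean topLevel = inputSpeechTextArray != null && !inputSpeechTextArray.isEmpty();
        boolean perGroup = false;
        for (List<String> scripts : groupSpeechTextArrays) {
            if (scripts != null && !scripts.isEmpty()) {
                perGroup = true;
            }
        }
        if (topLevel && !perGroup) {
            return Mode.GLOBAL;
        }
        if (perGroup && !topLevel) {
            return Mode.GROUPED;
        }
        // Both or neither are set: this topic does not state how the service resolves the case.
        return Mode.UNDETERMINED;
    }
}
```

The UNDETERMINED result is deliberate: when scripts appear at both levels or at neither, the sketch makes no claim about what the service does.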

Usage notes

InputConfig parameters

Note

You can configure InputConfig to specify parameters for basic materials such as video assets, narration, background music, and stickers.

Parameter

Type

Description

Example

Required

Supported modes

MediaGroupArray

List<MediaGroup>

Scripted materials for automatic video generation. You can set group names and material lists.

Group name: Up to 50 characters. Emojis are not supported.

Material list: Media asset ID or OSS URL of the material.

A maximum of 40 groups. Each group can contain a maximum of 200 materials.

For more information, see Global narration mode - Parameter example and Grouped narration mode - Parameter example.

Yes

  • Global narration

  • Grouped narration

TitleArray

List<String>

An array of titles. A random title is selected for each video production.

A maximum of 50 titles. Each title can be up to 50 characters long.

["Title 1","Title 2"]

No

  • Global narration

  • Grouped narration

SubHeadingArray

List<SubHeading>

Subtitle settings.

[{"Level":1,"TitleArray":["Level-1 subtitle 1","Level-1 subtitle 2"]},{"Level":3,"TitleArray":["Level-3 subtitle"]}]

No

  • Global narration

  • Grouped narration

SpeechTextArray

List<String>

  • An array of narration scripts. A random script is selected for each video production.

  • A maximum of 50 scripts. Each script can be up to 1,000 characters long.

  • You can use Speech Synthesis Markup Language (SSML) to control speech synthesis.

  • The default spoken language is Chinese (zh). To set other languages, see the SpeechLanguage parameter.

    Important

    Currently, only <break>, <s>, <sub>, <w>, <phoneme>, and <say-as> are supported. For CosyVoice-related voices, only <break>, <s>, and <sub> are supported.

["Narration content 1","Narration content 2"]

No

  • Global narration

StickerArray

List<Sticker>

  • An array of stickers. A random sticker is selected for each video production. A maximum of 50 stickers are supported.

  • Random selection rule: For example, if you provide 10 stickers and set the number of videos to produce to 20, a random number from 1 to 10 is generated, such as 3. Then, the stickers are selected in the order of 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6, 7, and so on.

  • For more information about supported material formats, see Image formats.

[{"MediaId":"****9d46c8b4548681030f6e****","X":10,"Y":100,"Width":300,"Height":300,"Opacity":0.6}]

No

  • Global narration

  • Grouped narration

BackgroundMusicArray

List<String>

  • An array of background music. A random track is selected for each video production. A maximum of 50 tracks are supported. You can use media asset IDs or OSS URLs.

  • Random selection rule: For example, if you provide 10 background music tracks and set the number of videos to produce to 20, a random number from 1 to 10 is generated, such as 3. Then, the tracks are selected in the order of 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6, 7, and so on.

  • For supported media formats, see Audio formats.

["****b4549d46c88681030f6e****","****549d46c88b4681030f6e****"]

No

  • Global narration

  • Grouped narration

BackgroundImageArray

List<String>

  • An array of background images. A random image is selected for each video production. A maximum of 50 images are supported. You can use media asset IDs or OSS URLs.

  • Random selection rule: For example, if you provide 10 background images and set the number of videos to produce to 20, a random number from 1 to 10 is generated, such as 3. Then, the images are selected in the order of 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6, 7, and so on.

  • For a list of supported formats, see Image formats.

["****b4549d46c88681030f6e****","****549d46c88b4681030f6e****"]

No

  • Global narration

  • Grouped narration
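The random selection rule described for StickerArray, BackgroundMusicArray, and BackgroundImageArray amounts to a rotation that starts at a random position and wraps around. The following Java sketch reproduces the order given in the descriptions (10 items, 20 videos, starting value 3). It only illustrates the rule as documented; the class and method names are hypothetical, and the service's actual implementation is not shown in this topic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

/** Illustrative only: mirrors the documented rotation rule, not the service's code. */
final class RotatingSelection {

    /**
     * Returns the 1-based item position used for each produced video.
     *
     * @param itemCount  number of stickers, tracks, or images provided
     * @param videoCount number of videos to produce
     * @param start      the randomly drawn first position, in the range 1..itemCount
     */
    static List<Integer> order(int itemCount, int videoCount, int start) {
        if (itemCount < 1 || start < 1 || start > itemCount) {
            throw new IllegalArgumentException("start must be within 1..itemCount");
        }
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < videoCount; i++) {
            // Walk forward from start and wrap back to 1 after the last item.
            positions.add((start - 1 + i) % itemCount + 1);
        }
        return positions;
    }

    public static void main(String[] args) {
        int itemCount = 10;
        // Draw the starting position at random, as the rule describes (1..10 here).
        int start = ThreadLocalRandom.current().nextInt(1, itemCount + 1);
        System.out.println(order(itemCount, 20, start));
    }
}
```

With a starting value of 3, order(10, 20, 3) begins 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, which matches the sequence in the parameter descriptions.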

MediaGroup parameters

Note

The differences in MediaGroup parameter configurations between global narration mode and grouped narration mode are indicated in the "Supported modes" column of the table.

Parameter

Type

Description

Example

Required

Supported modes

GroupName

String

Group name.

Up to 50 characters. Emojis are not supported.

Group1

Yes

  • Global narration

  • Grouped narration

MediaArray

List<String>

  • A list of materials. You can use media IDs or URLs. A maximum of 200 materials are supported.

  • For supported formats, see Video formats.

****b4549d46c88681030f6e****

Yes

  • Global narration

  • Grouped narration

SpeechTextArray

List<String>

  • An array of narration scripts. A random script is selected for each video production.

  • A maximum of 50 scripts. Each script can be up to 1,000 characters long.

  • You can use Speech Synthesis Markup Language (SSML) to control speech synthesis.

    Important

    Currently, only <break>, <s>, <sub>, <w>, <phoneme>, and <say-as> are supported. For CosyVoice-related voices, only <break>, <s>, and <sub> are supported.

["Narration content 1","Narration content 2"]

No

  • Grouped narration

Duration

Float

The duration of the current group in seconds. This parameter is valid only when SpeechTextArray is empty.

10

No. Default: 5.

  • Grouped narration

SplitMode

String

NoSplit

No. Default: AverageSplit.

  • Global narration

  • Grouped narration

Volume

Float

  • The volume of the input video. If you set the volume here, the volume of the videos in the current group will match this setting, and the EditingConfig.MediaConfig.Volume parameter will no longer apply to this group.

  • Value range: [0, 10.0]. Up to two decimal places are supported.

0.5

No

  • Grouped narration

DurationAutoAdapt

Boolean

Specifies whether to enable automatic duration adaptation for the group. If enabled and there is no narration, the duration of the group is automatically adjusted to ensure video clips play at their original speed.

true

No. Default: false.

  • Grouped narration
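The limits stated for MediaGroupArray (at most 40 groups, group names of up to 50 characters, at most 200 materials per group) can be checked on the client before a job is submitted. The following Java sketch shows one possible pre-check. The Group holder class and the validate method are hypothetical helpers, not part of the IMS SDK. Counting characters as Unicode code points is an assumption, and the emoji restriction is not checked.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative client-side pre-check of the documented MediaGroupArray limits. */
final class MediaGroupLimits {

    static final int MAX_GROUPS = 40;
    static final int MAX_GROUP_NAME_LENGTH = 50;
    static final int MAX_MEDIA_PER_GROUP = 200;

    /** Hypothetical holder for the two required MediaGroup fields. */
    static final class Group {
        final String groupName;
        final List<String> mediaArray;

        Group(String groupName, List<String> mediaArray) {
            this.groupName = groupName;
            this.mediaArray = mediaArray;
        }
    }

    /** Returns a list of human-readable problems; an empty list means no limit is violated. */
    static List<String> validate(List<Group> groups) {
        List<String> problems = new ArrayList<>();
        if (groups.isEmpty()) {
            problems.add("MediaGroupArray is required");
        }
        if (groups.size() > MAX_GROUPS) {
            problems.add("More than " + MAX_GROUPS + " groups");
        }
        for (Group group : groups) {
            String name = group.groupName;
            if (name == null || name.isEmpty()) {
                problems.add("GroupName is required");
            } else if (name.codePointCount(0, name.length()) > MAX_GROUP_NAME_LENGTH) {
                // Assumption: "characters" are counted as Unicode code points.
                problems.add("GroupName longer than " + MAX_GROUP_NAME_LENGTH + " characters: " + name);
            }
            if (group.mediaArray == null || group.mediaArray.isEmpty()) {
                problems.add("MediaArray is required for group " + name);
            } else if (group.mediaArray.size() > MAX_MEDIA_PER_GROUP) {
                problems.add("More than " + MAX_MEDIA_PER_GROUP + " materials in group " + name);
            }
        }
        return problems;
    }
}
```

Returning a list of problems instead of throwing on the first one lets a caller report every violation in a single pass.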

Global narration mode - Parameter example

{
  "MediaGroupArray": [
    {
      "GroupName": "UseMediaId",
      "MediaArray": [
        "****9d46c886b45481030f6e****",
        "****c886810b4549d4630f6e****"
      ],
      "SplitMode": "NoSplit"
    },
    {
      "GroupName": "UseOssUrl",
      "MediaArray": [
        "http://[your-bucket].oss-[your-region-id].aliyuncs.com/[your-file-path]/[your-file-name].mp4",
        "http://[your-bucket].oss-[your-region-id].aliyuncs.com/[your-file-path]/[your-file-name].png"
      ]
    }
  ],
  "TitleArray": [
    "Hema Fresh in Huilongguan is now open",
    "Hema Fresh is now open"
  ],
  "SubHeadingArray": [
    {
      "Level": 1,
      "TitleArray": ["Subtitle 1", "Subtitle 2"]
    },
    {
      "Level": 3,
      "TitleArray": ["Level-3 subtitle"]
    }
  ],
  "SpeechTextArray": [
    "A new Hema Fresh store just opened in a nearby mall. Today is the grand opening, and I rushed over to join the fun. The store isn't very large, but the mall is crowded. Snacks and drinks are quite cheap, and the checkout lines are very long. Come and check it out!",
    "A new Hema Fresh store just opened in a nearby mall. Today is the grand opening, so I came to join the excitement.",
    "<speak>The battle <phoneme alphabet=\"py\" ph=\"zheng4 hao3\">is fierce</phoneme>. Today, our protagonist, table tennis legend Ma Long, is charging towards the pinnacle of glory. In the quarterfinals against the formidable Shunsuke Togami, Ma Long showed no fear, giving his all in every rally. His precise shots and calm judgment gave him the upper hand in this match. In the end, Ma Long successfully defeated his opponent and advanced to the semifinals.</speak>"
  ],
  "StickerArray": [
    {
      "MediaId": "****9d46c8b4548681030f6e****",
      "X": 10,
      "Y": 100,
      "Width": 300,
      "Height": 300,
      "Opacity": 0.6
    },
    {
      "MediaURL": "http://[your-bucket].oss-[your-region-id].aliyuncs.com/[your-file-path]/[your-file-name].png",
      "X": 10,
      "Y": 100,
      "Width": 300,
      "Height": 300
    }
  ],
  "BackgroundMusicArray": [
    "****b4549d46c88681030f6e****",
    "****549d46c88b4681030f6e****",
    "http://[your-bucket].oss-[your-region-id].aliyuncs.com/[your-file-path]/[your-file-name].mp3"
  ],
  "BackgroundImageArray": [
    "****6c886b4549d481030f6e****",
    "****9d46c8548b4681030f6e****",
    "http://[your-bucket].oss-[your-region-id].aliyuncs.com/[your-file-path]/[your-file-name].png"
  ]
}

Grouped narration mode - Parameter example

{
  "MediaGroupArray": [{
    "GroupName": "start",
    "MediaArray": ["https://[your-bucket].oss-[your-region-id].aliyuncs.com/[your-file-path]/[your-file-name].jpeg", "https://[your-bucket].oss-[your-region-id].aliyuncs.com/[your-file-path]/[your-file-name].mp4"],
    "Duration": 5,
    "SplitMode": "NoSplit",
    "Volume": 1
  },
    {
      "GroupName": "group1",
      "MediaArray": ["https://[your-bucket].oss-[your-region-id].aliyuncs.com/[your-file-path]/[your-file-name].png", "https://[your-bucket].oss-[your-region-id].aliyuncs.com/[your-file-path]/[your-file-name].mp4"],
      "SpeechTextArray": ["A new Hema Fresh store just opened in a nearby mall. Today is the grand opening.", "Today is the grand opening of this Hema Fresh store.", "<speak>The battle <phoneme alphabet=\"py\" ph=\"zheng4 hao3\">is fierce</phoneme>. Today, our protagonist, table tennis legend Ma Long, is charging towards the pinnacle of glory. In the quarterfinals against the formidable Shunsuke Togami, Ma Long showed no fear, giving his all in every rally. His precise shots and calm judgment gave him the upper hand in this match. In the end, Ma Long successfully defeated his opponent and advanced to the semifinals.</speak>"]
    },
    {
      "GroupName": "group2",
      "MediaArray": ["https://[your-bucket].oss-[your-region-id].aliyuncs.com/0-test-batch-editing-materials/normal%20video.mp4", "https://[your-bucket].oss-[your-region-id].aliyuncs.com/[your-file-path]/[your-file-name].jpeg"],
      "SpeechTextArray": ["The store isn't very large, but the mall is crowded. Snacks and drinks are quite cheap, and the checkout lines are very long.", "The scene is very lively, with crowds of people and a wide variety of goods."]
    },
    {
      "GroupName": "group3",
      "MediaArray": ["https://[your-bucket].oss-[your-region-id].aliyuncs.com/0-test-batch-editing-materials/young_sunset_walk.mp4"],
      "SpeechTextArray": ["Come and take a look!", "Hurry and come take a look!"]
    },
    {
      "GroupName": "end",
      "MediaArray": ["https://[your-bucket].oss-[your-region-id].aliyuncs.com/[your-file-path]/[your-file-name].jpg", "https://[your-bucket].oss-[your-region-id].aliyuncs.com/[your-file-path]/[your-file-name].mp4"],
      "Duration": 5
    }
  ],
  "TitleArray": [
    "Hema Fresh in Huilongguan is now open",
    "Hema Fresh is now open"
  ],
  "StickerArray": [
    {
      "MediaId": "****9d46c8b4548681030f6e****",
      "X": 10,
      "Y": 100,
      "Width": 300,
      "Height": 300,
      "Opacity": 0.6
    },
    {
      "MediaURL": "http://[your-bucket].oss-[your-region-id].aliyuncs.com/[your-file-path]/[your-file-name].png",
      "X": 10,
      "Y": 100,
      "Width": 300,
      "Height": 300
    }
  ],
  "SubHeadingArray": [
    {
      "Level": 1,
      "TitleArray": ["Level-1 subtitle 1", "Level-1 subtitle 2"]
    },
    {
      "Level": 3,
      "TitleArray": ["Level-3 subtitle"]
    }
  ],
  "BackgroundMusicArray": [
    "****b4549d46c88681030f6e****",
    "****549d46c88b4681030f6e****",
    "http://[your-bucket].oss-[your-region-id].aliyuncs.com/[your-file-path]/[your-file-name].mp3"
  ],
  "BackgroundImageArray": [
    "****6c886b4549d481030f6e****",
    "****9d46c8548b4681030f6e****",
    "http://[your-bucket].oss-[your-region-id].aliyuncs.com/[your-file-path]/[your-file-name].png"
  ]
}
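Both parameter examples include a narration script written in SSML. Because only a subset of SSML tags is supported, and an even smaller subset for CosyVoice-related voices, it can help to scan a script for other tags before submitting it. The following Java sketch does this with a regular expression. The class and method names are hypothetical. Treating <speak> as an always-permitted root element is an assumption based on the examples, which wrap the script in <speak> although it is not in the supported-tag list.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative scan for SSML tags outside the documented supported set. */
final class SsmlTagCheck {

    // Tag sets as listed in the SpeechTextArray descriptions.
    static final Set<String> GENERAL_TAGS =
            new LinkedHashSet<>(Arrays.asList("break", "s", "sub", "w", "phoneme", "say-as"));
    static final Set<String> COSYVOICE_TAGS =
            new LinkedHashSet<>(Arrays.asList("break", "s", "sub"));

    // Captures the tag name of opening, closing, and self-closing tags.
    private static final Pattern TAG = Pattern.compile("</?([A-Za-z][A-Za-z0-9-]*)");

    /** Returns the tag names found in the script that are not in the applicable supported set. */
    static Set<String> unsupportedTags(String script, boolean cosyVoice) {
        Set<String> allowed = cosyVoice ? COSYVOICE_TAGS : GENERAL_TAGS;
        Set<String> unsupported = new LinkedHashSet<>();
        Matcher matcher = TAG.matcher(script);
        while (matcher.find()) {
            String name = matcher.group(1).toLowerCase(Locale.ROOT);
            // Assumption: the <speak> root element is always accepted.
            if (!name.equals("speak") && !allowed.contains(name)) {
                unsupported.add(name);
            }
        }
        return unsupported;
    }
}
```

For the <phoneme> script used in the examples, the scan reports nothing for general voices and reports phoneme when cosyVoice is true.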

EditingConfig parameters

You can configure EditingConfig to specify the volume, position, and other composition parameters for the clips. For parameter examples, see EditingConfig parameter examples.

Note

Except for the following parameters, all other parameters support both global narration mode and grouped narration mode:

Parameter

Type

Description

Example

Required

MediaConfig

JSON

Configurations for input video materials.

{"Volume":"1","MediaMetaDataArray":[{"Media":"****6c886b4549d481030f6e****","GroupName":"GroupA","TimeRangeList":[{"In":"0","Out":"1"},{"In":"2","Out":"3"}]}]}

No

TitleConfig

JSON

Configurations for titles. You can configure caption parameters.

{"Alignment":"TopCenter","AdaptMode":"AutoWrap","Font":"Alibaba PuHuiTi 2.0 95 ExtraBold","SizeRequestType":"Nominal","Y":0.1}

No

SubHeadingConfig

JSON

Configurations for multi-level subtitles. You can configure caption parameters.

JSON field description:

{"1":{"Y":0.3,"FontSize":40},"3":{"Y":0.5,"FontSize":30}}

No

SpeechConfig

JSON

Configurations for narration scripts.

For more information, see EditingConfig parameter examples

No

BackgroundMusicConfig

JSON

Configurations for background music.

{"Volume":0.2}

No

BackgroundImageConfig

JSON

Configurations for background images. This field does not take effect if a background image is already configured in InputConfig.

{"SubType":"Blur","Radius":0.5}

No

ProcessConfig

JSON

Configurations for mixing and editing processing.

For more information, see EditingConfig parameter examples

No

FECanvas

JSON

Canvas configuration for frontend page preview.

{"Width": 1080,"Height": 1920}

No

ProduceConfig

JSON

Configuration for standard video editing and production. For more information about the fields, see EditingProduceConfig.

{"AutoRegisterInputVodMedia":true,"OutputWebmTransparentChannel":true,"CoverConfig":{"StartTime":3.3},"AudioChannelCopy":"left","PipelineId":"****d54a97cff4108b555b01166d4****","MaxBitrate":5000,"KeepOriginMaxBitrate":false,"KeepOriginVideoMaxFps":false}

No

ProcessConfig parameters

Parameter

Type

Description

Example

Required

SingleShotDuration

Float

When a long video material is edited, it is automatically split. This parameter specifies the duration of a single shot after splitting, in seconds.

5

No. Default: 3.

AllowVfxEffect

Boolean

Specifies whether to add special effects.

true

No. Default: false.

VfxEffectProbability

Float

The probability of applying a special effect to each video clip. Value range: 0.0 to 1.0. Up to two decimal places are supported.

0.6

No. Default: 0.5.

VfxFirstClipEffectList

List<String>

  • If VfxFirstClipEffectList is not empty, the effect for the first clip of the video is selected from this list.

  • If VfxFirstClipEffectList is empty, the effect for the first clip is randomly selected from the following: "slightshow", "starfieldshinee", "starfieldshinee2", "starsparkle", "colorfulripples", and "starfield".

  • For examples of effects, see Effect examples.

["slightshow","starfieldshinee"]

No

VfxNotFirstClipEffectList

List<String>

  • If VfxNotFirstClipEffectList is not empty, the effects for clips other than the first one are selected from this list.

  • If VfxNotFirstClipEffectList is empty, the effects for clips other than the first one are selected from the following: "zoomslight", "zoom", "zoominout", and "slightshake".

  • For more information, see Effect examples.

["zoomslight","zoom"]

No

AllowTransition

Boolean

Specifies whether to add transition effects.

true

No. Default: false.

TransitionDuration

Float

The duration of the transition in seconds. If the transition duration is greater than (clip duration - 1), the transition effect for that clip will not be applied.

0.5

No. Default: 0.5 seconds.

TransitionList

List<String>

A list of custom transition effects. When AllowTransition is set to true, a random transition effect from this list is selected for composition. For more information about the available transition effects, see the Transition Effect Library. If this parameter is empty, a random effect is selected from the following transition effects: "linearblur", "colordistance", "crosshatch", "dreamyzoom", or "doomscreentransition_up".

["directional", "linearblur"]

No

UseUniformTransition

Boolean

Specifies whether to use the same transition effect throughout a single produced video.

true

No. Default: true.

AllowFilter

Boolean

Specifies whether to add custom filters.

false

No. Default: false.

FilterList

List<String>

A list of custom filter effects. If AllowFilter is set to true, a filter is randomly selected from this list for composition. For the available filter effects, see Filter Effect Examples. If this parameter is empty, no filter effect is added.

["m1", "m2"]

No

AlignmentMode

String

The alignment mode for the video and narration script. This parameter takes effect only in global narration mode. Valid values:

  • "AutoSpeed": The duration of the video track is scaled to match the audio track.

  • "Cut": The duration of the video track is truncated to match the audio track.

AutoSpeed

No. Default: AutoSpeed.

ImageDuration

Float

The duration of image materials in seconds.

2

No. Default: 2.
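Two of the transition rules in the table above are easy to misread: the fallback to the default transitions when TransitionList is empty, and the condition under which a transition is skipped for a clip. The following Java sketch expresses both. The class and method names are hypothetical, and the code restates the documented rules without claiming to match the service's internal implementation.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative restatement of the TransitionList and TransitionDuration rules. */
final class TransitionSketch {

    // Default transitions listed in the TransitionList description.
    static final List<String> DEFAULT_TRANSITIONS = Arrays.asList(
            "linearblur", "colordistance", "crosshatch", "dreamyzoom", "doomscreentransition_up");

    /** Returns the pool a transition is drawn from: the custom list, or the defaults if it is empty. */
    static List<String> candidates(List<String> transitionList) {
        if (transitionList == null || transitionList.isEmpty()) {
            return DEFAULT_TRANSITIONS;
        }
        return transitionList;
    }

    /**
     * A transition is skipped when its duration is greater than (clip duration - 1),
     * so it applies only when transitionSeconds <= clipSeconds - 1.
     */
    static boolean applies(double clipSeconds, double transitionSeconds) {
        return transitionSeconds <= clipSeconds - 1.0;
    }
}
```

For example, with the default TransitionDuration of 0.5 seconds, a 3-second clip keeps its transition, while a 1.2-second clip does not because 0.5 is greater than 0.2.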

EditingConfig parameter example

{
  "MediaConfig": {
    "Volume": 0 // Mute the video materials by default
  },
  "TitleConfig": {
    "Alignment": "TopCenter",
    "AdaptMode": "AutoWrap",
    "Font": "Alibaba PuHuiTi 2.0 95 ExtraBold",
    "SizeRequestType": "Nominal",
    "Y": 0.1, // Default Y-coordinate of the title when the output video is in portrait mode
    "Y": 0.05, // Default Y-coordinate of the title when the output video is in landscape mode
    "Y": 0.08 // Default Y-coordinate of the title when the output video is in square mode
  },
   "SubHeadingConfig": {
    "1": {
      "Y": 0.3,
      "FontSize": 40
    },
    "3": {
      "Y": 0.5,
      "FontSize": 30
    }
  },
  "SpeechConfig": {
    "Volume": 1,  // Use the original volume for the narration audio by default
    "SpeechRate": 0,
    "Voice": null,
    "Style": null,
    "CustomizedVoice": null, // The voice ID for voice cloning. If this field is specified, Voice and Style become invalid.
    "AsrConfig": {
      "Alignment": "TopCenter",
      "AdaptMode": "AutoWrap",
      "Font": "Alibaba PuHuiTi 2.0 65 Medium",
      "SizeRequestType": "Nominal",
      "Spacing": -1,
      "Y": 0.8, // Default Y-coordinate of the captions when the output video is in portrait mode
      "Y": 0.9, // Default Y-coordinate of the captions when the output video is in landscape mode
      "Y": 0.85 // Default Y-coordinate of the captions when the output video is in square mode
    },
    "SpecialWordsConfig": [{
      "Type": "Highlight",
      "Style": {
        "FontName": "KaiTi",
        "FontSize": 80,
        "FontColor": "20AEE9",
        "OutlineColour": "2D20E9",
        "Outline": 3,
        "FontFace": {
          "Bold": true,
          "Underline": true
        }
      },
      "WordsList": [
        "ApsaraVideo",
        "Intelligent Media Services",
        "Smart video creation"
      ]
    },
    {
      "Type": "Highlight",
      "Style": {
        "FontFace": {
          "Italic": true
        }
      },
      "WordsList": [
        "product",
        "take a look"
      ]
    },
    {
      "Type": "Forbidden",
      "WordsList": [
        "pilipala",
        "bilibala"
      ],
      "SoundReplaceMode": "None"
    }
  ]},
  "BackgroundMusicConfig": {
    "Volume": 0.2,   // Use 20% of the original volume for the background music by default
    "Style": null
  },
  "ProcessConfig": {
    "SingleShotDuration": 3,      // Duration of a shot after splitting
    "AllowVfxEffect": false,	  // Specifies whether to add special effects
    "AllowTransition": false,	  // Specifies whether to add transition effects
    "AlignmentMode": "AutoSpeed"  // This field is supported only in global narration mode
  }
}

TemplateConfig parameters

TemplateConfig is a common parameter used to set the template for the One-Click Video Creation feature. For detailed parameter descriptions and usage examples, see TemplateConfig parameters.

OutputConfig parameters

You can configure OutputConfig to specify production parameters such as the output address, naming rules, width and height, and the number of videos to produce.

Note

The OutputConfig parameter configurations are the same for both global narration mode and grouped narration mode.

Parameter

Type

Description

Example

Required

MediaURL

String

The output video address. It must contain the {index} placeholder.

Rule: http://[your-bucket].oss-[your-region-id].aliyuncs.com/[your-file-path]/[your-file-name]_{index}.mp4

Example: http://example.oss-cn-shanghai.aliyuncs.com/example/example_{index}.mp4

Required when GeneratePreviewOnly is false and the output video is stored in OSS.

StorageLocation

String

The storage address for the media asset file to be output to ApsaraVideo VOD (VOD).

Rule: [your-vod-bucket].oss-[your-region-id].aliyuncs.com

Example: outin-****6c886b4549d481030f6e****.oss-cn-shanghai.aliyuncs.com

Required when GeneratePreviewOnly is false and the output video is stored in VOD.

FileName

String

The name of the output file. It must contain the {index} placeholder.

Rule: [your-file-name]_{index}.mp4

Example: example_{index}.mp4

Required when GeneratePreviewOnly is false and the output video is stored in VOD.

GeneratePreviewOnly

Boolean

  • If GeneratePreviewOnly is set to true, the current task only generates a timeline for preview and does not perform actual production. You do not need to specify the output video address.

  • After the one-click video creation job is completed, call GetBatchMediaProducingJob to query the job result. The returned subtask list contains the video editing project ID (projectId). You can then call GetEditingProject to obtain the preview timeline.

false

No. Default: false.

Count

Integer

The number of videos to output. The maximum is 100.

10

No. Default: 1.

MaxDuration

Float

The maximum duration of a single output video, in seconds.

20

No. Default: 15.

FixedDuration

Float

The fixed duration of a single output video, in seconds. If a fixed duration is set, the video duration will align with this parameter.

  • This parameter is not supported in grouped narration mode.

  • In global narration mode, this parameter is supported when SpeechTextArray is empty.

  • You can set either FixedDuration or MaxDuration.

  • For more information, see Video duration rules.

20

No. Default: 15.

Width

Integer

The width of the output video in pixels.

1080

Yes

Height

Integer

The height of the output video in pixels.

1920

Yes

Video

JSON

Configurations for the output video stream, such as Crf and Codec.

{"Crf": 27}

No
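The restrictions on FixedDuration in the table above combine three conditions: it cannot be used in grouped narration mode, it requires an empty SpeechTextArray in global narration mode, and it cannot be set together with MaxDuration. The following Java sketch collects these conditions into one client-side check. The class, method, and parameter names are hypothetical, and the check covers only the rules stated in this topic.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative client-side check of the documented FixedDuration restrictions. */
final class OutputDurationRules {

    /**
     * @param groupedNarration    true if the job uses grouped narration mode
     * @param hasGlobalSpeechText true if InputConfig.SpeechTextArray is not empty
     * @param maxDuration         MaxDuration in seconds, or null if not set
     * @param fixedDuration       FixedDuration in seconds, or null if not set
     * @return a list of violated rules; empty if the combination is allowed
     */
    static List<String> check(boolean groupedNarration, boolean hasGlobalSpeechText,
                              Double maxDuration, Double fixedDuration) {
        List<String> problems = new ArrayList<>();
        if (fixedDuration == null) {
            // Without FixedDuration, none of the restrictions apply.
            return problems;
        }
        if (maxDuration != null) {
            problems.add("Set either FixedDuration or MaxDuration, not both");
        }
        if (groupedNarration) {
            problems.add("FixedDuration is not supported in grouped narration mode");
        } else if (hasGlobalSpeechText) {
            problems.add("In global narration mode, FixedDuration requires an empty SpeechTextArray");
        }
        return problems;
    }
}
```

A call such as check(false, false, null, 20.0) returns an empty list, while check(true, false, null, 20.0) reports the grouped narration restriction.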

Parameter example

{
  "MediaURL": "http://[your-bucket].oss-[your-region-id].aliyuncs.com/[your-file-path]/[your-file-name]_{index}.mp4",
  "Count": 20,
  "MaxDuration": 15,
  "Width": 1080,
  "Height": 1920,
  "Video": {"Crf": 27},
  "GeneratePreviewOnly": false
}
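To see how the {index} placeholder in MediaURL relates to Count, the following Java sketch expands a template into one address per output video and rejects a template without the placeholder or a Count above 100. The class and method names are hypothetical. This topic does not state the number at which the service starts the index, so the sketch takes the first index as a parameter instead of assuming one.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative expansion of a MediaURL template; not the service's naming logic. */
final class OutputUrlTemplate {

    static final String PLACEHOLDER = "{index}";
    static final int MAX_COUNT = 100;

    /**
     * @param mediaUrlTemplate output address that contains the {index} placeholder
     * @param count            number of videos to output (1..100)
     * @param firstIndex       assumed first index value; the real starting value is not documented here
     */
    static List<String> expand(String mediaUrlTemplate, int count, int firstIndex) {
        if (!mediaUrlTemplate.contains(PLACEHOLDER)) {
            throw new IllegalArgumentException("MediaURL must contain " + PLACEHOLDER);
        }
        if (count < 1 || count > MAX_COUNT) {
            throw new IllegalArgumentException("Count must be between 1 and " + MAX_COUNT);
        }
        List<String> urls = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            // Substitute the running index for the placeholder.
            urls.add(mediaUrlTemplate.replace(PLACEHOLDER, String.valueOf(firstIndex + i)));
        }
        return urls;
    }
}
```

The same check applies to FileName when the output is stored in VOD, because FileName must also contain the {index} placeholder.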

Application examples

Example 1: Configure an opening and ending in grouped narration mode

Scenarios

This example applies to the scenario where you want to add a consistent intro and outro with a unified voiceover to a video. You can set the MediaGroup.SplitMode of the intro and outro groups to NoSplit. In this case, the system does not split the media clips in the intro and outro groups. Instead, it plays a randomly selected media clip from each group in its entirety to add a fixed intro and outro.

Example parameters

Click to view the InputConfig parameter example

{
    "mediaGroupArray": [
        {
            "duration": 4,
            "splitMode": "NoSplit",
            "groupName": "opening",
            "mediaArray": [
                "****e44009ee71f0b62bf6f7d44b****"
            ]
        },
        {
            "groupName": "group1",
            "mediaArray": [
                "****e44009eef1f0b62bf6f7d44b****"
            ],
            "speechTextArray": [
                "Wondering where to go for the holiday?",
                "Still hesitant about your holiday plans?"
            ]
        },
        {
            "groupName": "group2",
            "mediaArray": [
                "****e44009eeferfb62bf6f7d44b****",
                "****e440094fghf0b62bf6f7d44b****",
                "****e44009ee74fgh62bf6f7d44b****"
            ],
            "speechTextArray": [
                "Lugu Lake in Yunnan invites you to an appointment with nature. The azure lake is like a mirror, reflecting the unique customs of the Mosuo Kingdom of Women, as picturesque as a painting. Row a boat in the heart of the lake and feel the peaceful years in a swaying dugout canoe. Look up at XX Mountain and its **** mysterious legends. What are you waiting for?",
                "Why not consider a natural feast at Lugu Lake in Yunnan? The azure, mirror-like lake reflects the unique folk customs of the Mosuo Kingdom of Women, picturesque and fascinating. You can leisurely row a boat in the heart of the lake, experiencing the tranquil years in a swaying dugout canoe. You can also look up at the sacred XX Mountain and listen to the ancient and mysterious legends that have been passed down through thousands of years. Come to Lugu Lake."
            ]
        },
        {
            "groupName": "group3",
            "mediaArray": [
                "****e44009ee7ft5662bf6f7d44b****"
            ],
            "speechTextArray": [
                "Come to Lugu Lake and share this quiet and charming landscape!",
                "Share the endless poetry brought by this quiet and charming landscape!"
            ]
        },
        {
            "duration": 4,
            "splitMode": "NoSplit",
            "groupName": "ending",
            "mediaArray": [
                "****e44009ee5fgfg62bf6f7d44b****"
            ]
        }
    ]
}

Click to view the EditingConfig parameter example

{
    "MediaConfig": {
        "MediaMetaDataArray": [
            {
                "Media": "****e44009eedttg62bf6f7d44b****",
                "GroupName": "opening",
                "TimeRangeList": [
                    {
                        "In": 1.5,
                        "Out": 5.5
                    }
                ]
            },
            {
                "Media": "****e44009ee7dfrf62bf6f7d44b****",
                "GroupName": "ending",
                "TimeRangeList": [
                    {
                        "In": 1.5,
                        "Out": 5.5
                    }
                ]
            }
        ]
    }
}

Click to view the OutputConfig parameter example

{
    "count": 10,
    "height": 1920,
    "mediaURL": "http://[your-bucket].oss-[your-region-id].aliyuncs.com/[your-file-path]/[your-file-name]_{index}.mp4",
    "width": 1080,
    "widthHeightRatio": 0.5625
}

Example 2: Create a face montage video using script-based automatic video generation

If you are interested in the face montage scenario, see Best practices for creating face collection videos.

SDK call example

Prerequisites

You have installed the IMS server-side SDK. For more information, see Preparations.

Code example

The following example uses the global narration mode.

Click to view the code example

package com.example;

import java.util.*;

import com.alibaba.fastjson.JSONArray;
import com.alibaba.fastjson.JSONObject;

import com.aliyun.ice20201109.Client;
import com.aliyun.ice20201109.models.*;
import com.aliyun.teaopenapi.models.Config;


/**
 *  You need to add the following Maven dependencies:
 *   <dependency>
 *      <groupId>com.aliyun</groupId>
 *      <artifactId>ice20201109</artifactId>
 *      <version>2.3.0</version>
 *  </dependency>
 *  <dependency>
 *      <groupId>com.alibaba</groupId>
 *      <artifactId>fastjson</artifactId>
 *      <version>1.2.9</version>
 *  </dependency>
 */
public class ScriptBatchEditingService {

    static final String regionId = "[your-region-id]"; // Script-based automatic video generation is supported in China (Shanghai), China (Beijing), China (Hangzhou), China (Shenzhen), US (West), and Singapore.
    static final String bucket = "[your-bucket]";
    private Client iceClient;

    public static void main(String[] args) throws Exception {
        ScriptBatchEditingService scriptBatchEditingService = new ScriptBatchEditingService();
        scriptBatchEditingService.initClient();
        scriptBatchEditingService.runExample();
    }

    public void initClient() throws Exception {
        // An Alibaba Cloud account AccessKey has full access to all APIs. We recommend that you use a RAM user for API calls and routine O&M.
        // This example shows how to store the AccessKey ID and AccessKey secret in environment variables. For more information about how to configure them, see https://www.alibabacloud.com/help/en/sdk/developer-reference/v2-manage-access-credentials
        com.aliyun.credentials.Client credentialClient = new com.aliyun.credentials.Client();

        Config config = new Config();
        config.setCredential(credentialClient);

        // To hard-code the AccessKey ID and AccessKey secret, use the following code. However, we strongly recommend that you do not hard-code them in your project code. Otherwise, the AccessKey pair may be leaked, which compromises the security of all your resources.
        // config.accessKeyId = <The AccessKey ID created in Step 2>;
        // config.accessKeySecret = <The AccessKey secret created in Step 2>;
        config.endpoint = "ice." + regionId + ".aliyuncs.com";
        config.regionId = regionId;
        iceClient = new Client(config);
    }

    public void runExample() throws Exception {

        // Video materials
        JSONObject mediaGroup1 = new JSONObject();
        mediaGroup1.put("GroupName", "start");
        mediaGroup1.put("MediaArray", Arrays.asList(
            "http://ice-document-materials.oss-cn-shanghai.aliyuncs.com/test_media/lgh/lgh-start-1.mp4"
        ));

        JSONObject mediaGroup2 = new JSONObject();
        mediaGroup2.put("GroupName", "middle");
        mediaGroup2.put("MediaArray", Arrays.asList(
            "http://ice-document-materials.oss-cn-shanghai.aliyuncs.com/test_media/lgh/lgh-m-1.mp4",
            "http://ice-document-materials.oss-cn-shanghai.aliyuncs.com/test_media/lgh/lgh-m-2.mp4",
            "http://ice-document-materials.oss-cn-shanghai.aliyuncs.com/test_media/lgh/lgh-m-3.mp4"
        ));

        JSONObject mediaGroup3 = new JSONObject();
        mediaGroup3.put("GroupName", "end");
        mediaGroup3.put("MediaArray", Arrays.asList(
            "http://ice-document-materials.oss-cn-shanghai.aliyuncs.com/test_media/lgh/lgh-end-1.mp4"
        ));

        JSONArray mediaGroupArray = new JSONArray();
        mediaGroupArray.add(mediaGroup1);
        mediaGroupArray.add(mediaGroup2);
        mediaGroupArray.add(mediaGroup3);

        // Narration scripts
        List<String> speechTextArray = Arrays.asList(
            "Wondering where to go for the holiday? Lugu Lake in Yunnan invites you to an appointment with nature. The azure lake is like a mirror, reflecting the unique customs of the Mosuo Kingdom of Women, as picturesque as a painting. Row a boat in the heart of the lake and feel the peaceful years in a swaying dugout canoe. Look up at XX Mountain and its **** mysterious legends. What are you waiting for? Come to Lugu Lake and share this quiet and charming landscape!",
            "Still hesitant about your holiday plans? Why not consider a natural feast at Lugu Lake in Yunnan? The azure, mirror-like lake reflects the unique folk customs of the Mosuo Kingdom of Women, picturesque and fascinating. You can leisurely row a boat in the heart of the lake, experiencing the tranquil years in a swaying dugout canoe. You can also look up at the sacred XX Mountain and listen to the ancient and mysterious legends that have been passed down through thousands of years. Come to Lugu Lake, and share the endless poetry brought by this quiet and charming landscape!"
        );

        // Video titles
        List<String> titleArray = Arrays.asList(
            "Lugu Lake: Mosuo customs in a beautiful landscape",
            "Exploring the mysterious Lugu Lake",
            "Immersive experience of Lugu Lake"
        );

        JSONObject inputConfig = new JSONObject();
        inputConfig.put("MediaGroupArray", mediaGroupArray);
        inputConfig.put("SpeechTextArray", speechTextArray);
        inputConfig.put("TitleArray", titleArray);

        // Number of videos to produce
        int produceCount = 4;

        // Width and height of the output video. Uncomment these two lines (and comment out the landscape values below) to generate a portrait video.
        //int outputWidth = 1080;
        //int outputHeight = 1920;

        // Width and height of the output video. A landscape video is generated.
        int outputWidth = 1920;
        int outputHeight = 1080;

        // The OSS address of the output video. It must contain the {index} placeholder.
        String mediaUrl = "http://" + bucket + ".oss-" + regionId + ".aliyuncs.com/script/output_{index}_w.mp4";

        JSONObject outputConfig = new JSONObject();
        outputConfig.put("MediaURL", mediaUrl);
        outputConfig.put("Count", produceCount);
        outputConfig.put("Width", outputWidth);
        outputConfig.put("Height", outputHeight);

        // Submit the smart video creation task
        SubmitBatchMediaProducingJobRequest request = new SubmitBatchMediaProducingJobRequest();
        request.setInputConfig(inputConfig.toJSONString());
        request.setOutputConfig(outputConfig.toJSONString());

        SubmitBatchMediaProducingJobResponse response = iceClient.submitBatchMediaProducingJob(request);
        String jobId = response.getBody().getJobId();
        System.out.println("Start script batch job, batchJobId: " + jobId);

        // Poll the task status every 3 seconds until the task finishes, fails, or maxTry is reached
        System.out.println("Waiting for the job to finish...");
        int maxTry = 3000;
        int i = 0;
        boolean finished = false;
        while (i < maxTry) {
            Thread.sleep(3000);
            i++;
            GetBatchMediaProducingJobRequest getRequest = new GetBatchMediaProducingJobRequest();
            getRequest.setJobId(jobId);
            GetBatchMediaProducingJobResponse getResponse = iceClient.getBatchMediaProducingJob(getRequest);
            String status = getResponse.getBody().getEditingBatchJob().getStatus();
            System.out.println("BatchJobId: " + jobId + ", status: " + status);

            if ("Failed".equals(status)) {
                System.out.println("Batch job failed. JobInfo: " + JSONObject.toJSONString(getResponse.getBody().getEditingBatchJob()));
                throw new Exception("Produce failed. BatchJobId: " + jobId);
            }

            if ("Finished".equals(status)) {
                System.out.println("Batch job finished. JobInfo: " + JSONObject.toJSONString(getResponse.getBody().getEditingBatchJob()));
                finished = true;
                break;
            }
        }

        // Without this check, the method would return silently if the job never reached a terminal status
        if (!finished) {
            throw new Exception("Timed out waiting for the job. BatchJobId: " + jobId);
        }
    }
}
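
The polling step in the sample can also be factored into a small reusable helper. The following is a minimal sketch, not part of the SDK: the class name JobPoller and the method waitForTerminalStatus are hypothetical, and the status lookup is passed in as a Supplier so that the sketch runs without the ICE client. "Finished" and "Failed" are the terminal statuses checked in the sample above. "Processing" is only a placeholder for any non-terminal status.

```java
import java.util.function.Supplier;

public class JobPoller {

    /**
     * Calls statusSupplier until it returns "Finished" or "Failed".
     * Throws IllegalStateException if no terminal status is seen within maxTries calls.
     */
    public static String waitForTerminalStatus(Supplier<String> statusSupplier, int maxTries, long intervalMillis) {
        for (int attempt = 0; attempt < maxTries; attempt++) {
            String status = statusSupplier.get();
            if ("Finished".equals(status) || "Failed".equals(status)) {
                return status;
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop waiting
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting for the job", e);
            }
        }
        throw new IllegalStateException("Job did not reach a terminal status after " + maxTries + " attempts");
    }

    public static void main(String[] args) {
        // Stand-in for a call to iceClient.getBatchMediaProducingJob(...):
        // returns a placeholder non-terminal status twice, then "Finished".
        int[] calls = {0};
        String result = waitForTerminalStatus(() -> ++calls[0] < 3 ? "Processing" : "Finished", 10, 10L);
        System.out.println(result);
    }
}
```

In real use, the Supplier would wrap the GetBatchMediaProducingJob call from the sample and return getEditingBatchJob().getStatus(). The caller decides how to handle a returned "Failed" status.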

Details of API request parameters

InputConfig

{
  "MediaGroupArray": [{
    "GroupName": "start",
    "MediaArray": [
      "http://ice-document-materials.oss-cn-shanghai.aliyuncs.com/test_media/lgh/lgh-start-1.mp4"
    ]
  },
    {
      "GroupName": "middle",
      "MediaArray": [
        "http://ice-document-materials.oss-cn-shanghai.aliyuncs.com/test_media/lgh/lgh-m-1.mp4",
        "http://ice-document-materials.oss-cn-shanghai.aliyuncs.com/test_media/lgh/lgh-m-2.mp4",
        "http://ice-document-materials.oss-cn-shanghai.aliyuncs.com/test_media/lgh/lgh-m-3.mp4"
      ]
    },
    {
      "GroupName": "end",
      "MediaArray": [
        "http://ice-document-materials.oss-cn-shanghai.aliyuncs.com/test_media/lgh/lgh-end-1.mp4"
      ]
    }
  ],
  "SpeechTextArray": [
    "Wondering where to go for the holiday? Lugu Lake in Yunnan invites you to an appointment with nature. The azure lake is like a mirror, reflecting the unique customs of the Mosuo Kingdom of Women, as picturesque as a painting. Row a boat in the heart of the lake and feel the peaceful years in a swaying dugout canoe. Look up at XX Mountain and its **** mysterious legends. What are you waiting for? Come to Lugu Lake and share this quiet and charming landscape!",
    "Still hesitant about your holiday plans? Why not consider a natural feast at Lugu Lake in Yunnan? The azure, mirror-like lake reflects the unique folk customs of the Mosuo Kingdom of Women, picturesque and fascinating. You can leisurely row a boat in the heart of the lake, experiencing the tranquil years in a swaying dugout canoe. You can also look up at the sacred XX Mountain and listen to the ancient and mysterious legends that have been passed down through thousands of years. Come to Lugu Lake, and share the endless poetry brought by this quiet and charming landscape!"
  ],
  "TitleArray": [
    "Lugu Lake: Mosuo customs in a beautiful landscape",
    "Exploring the mysterious Lugu Lake",
    "Immersive experience of Lugu Lake"
  ]
}
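
As stated at the beginning of this topic, the region in the OSS URL of every media asset must match the region of the OpenAPI endpoint. The following is a minimal sketch of a client-side check that you could run before you submit a job. The class OssRegionCheck and its methods are hypothetical helpers, not part of the SDK. The sketch assumes public OSS hostnames of the form [your-bucket].oss-[your-region-id].aliyuncs.com. Other hostname forms, such as internal endpoints or custom domains, are not handled.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OssRegionCheck {

    // Matches hosts such as "my-bucket.oss-cn-shanghai.aliyuncs.com" and captures "cn-shanghai"
    private static final Pattern OSS_HOST =
            Pattern.compile("^[^.]+\\.oss-([a-z0-9-]+)\\.aliyuncs\\.com$");

    /** Returns the region ID embedded in an OSS URL, or null if the host does not match the assumed form. */
    public static String regionOf(String ossUrl) {
        String host = URI.create(ossUrl).getHost();
        if (host == null) {
            return null;
        }
        Matcher matcher = OSS_HOST.matcher(host);
        return matcher.matches() ? matcher.group(1) : null;
    }

    /** Returns true only if every URL resolves to the given region ID. */
    public static boolean allInRegion(List<String> urls, String regionId) {
        for (String url : urls) {
            if (!regionId.equals(regionOf(url))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> mediaArray = Arrays.asList(
            "http://ice-document-materials.oss-cn-shanghai.aliyuncs.com/test_media/lgh/lgh-m-1.mp4",
            "http://ice-document-materials.oss-cn-shanghai.aliyuncs.com/test_media/lgh/lgh-m-2.mp4"
        );
        System.out.println(allInRegion(mediaArray, "cn-shanghai"));
        System.out.println(allInRegion(mediaArray, "cn-beijing"));
    }
}
```

With the sample URLs above, the first check passes for cn-shanghai and the second fails for cn-beijing. Media that is referenced by media asset ID rather than by URL would need a different check.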

OutputConfig

{
  "Count": 4,
  "Height": 1080,
  "Width": 1920,
  "MediaURL": "http://[your-bucket].oss-[your-region-id].aliyuncs.com/[your-file-path]/[your-file-name]_{index}_w.mp4"
}
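
Because MediaURL must contain the {index} placeholder, it can help to validate the value before you submit the job. The following is a minimal sketch. The class OutputUrlBuilder and the method buildMediaUrl are hypothetical helpers, not part of the SDK. The URL layout follows the Java sample above. The sketch leaves {index} in place and makes no assumption about how the service numbers the output files.

```java
public class OutputUrlBuilder {

    private static final String INDEX_PLACEHOLDER = "{index}";

    /**
     * Builds an output MediaURL from a bucket, a region ID, and an object key template.
     * Throws IllegalArgumentException if the template lacks the {index} placeholder.
     */
    public static String buildMediaUrl(String bucket, String regionId, String objectKeyTemplate) {
        if (!objectKeyTemplate.contains(INDEX_PLACEHOLDER)) {
            throw new IllegalArgumentException("Object key must contain " + INDEX_PLACEHOLDER + ": " + objectKeyTemplate);
        }
        return "http://" + bucket + ".oss-" + regionId + ".aliyuncs.com/" + objectKeyTemplate;
    }

    public static void main(String[] args) {
        // "my-bucket" is a placeholder bucket name
        System.out.println(buildMediaUrl("my-bucket", "cn-shanghai", "script/output_{index}_w.mp4"));
    }
}
```

The returned string can be passed to outputConfig.put("MediaURL", ...) as in the sample. Use the same region ID as the one in the OpenAPI endpoint.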

Advanced configurations

For more information, see Batch one-click video remixing logic and advanced configuration.

FAQ

For frequently asked questions about Script-to-Video, see Script-to-Video FAQ.

References