Intelligent Media Services: Script-to-Video FAQ

Last Updated: Dec 09, 2025

This topic addresses common questions about the Script-to-Video feature.

Processing logic

How does the Global Scripts mode work?

  • If the input videos are long clips, they are first split into shorter segments. During composition, these resulting segments are assembled to create the final video. The duration of each segment after splitting is determined by the SingleShotDuration parameter.

  • If global voice-over scripts are provided, the system avoids repeating a script as long as enough scripts are available. For example, if you provide 3 scripts to generate 5 videos, some scripts are reused, but each of the 3 scripts is used at least once.

  • The system follows the group order, randomly selecting one segment from each group and concatenating them. If a global voice-over is provided and the total video duration is shorter than the voice-over duration, the system randomly selects additional video segments, preferring groups other than the start and end groups, until the video duration matches the voice-over duration.

  • The rules for the final video's duration are as follows:

    • If a global voice-over script is provided, the final duration will equal the duration of the voice-over.

    • If no global voice-over script is provided:

      • If FixedDuration is set, the final duration equals the value of FixedDuration.

      • If MaxDuration is set, the final duration equals SingleShotDuration multiplied by the number of groups (the length of SpeechTextArray), but it does not exceed the value of MaxDuration.
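
The duration rules above can be expressed as a small decision function. This is only an illustrative sketch, not the service's implementation: the function name and arguments are hypothetical, checking FixedDuration before MaxDuration when both are set is an assumption, and the case where neither is set is left undefined because this topic does not describe it.

```python
from typing import Optional


def final_duration(
    voice_over_duration: Optional[float],
    single_shot_duration: float,
    group_count: int,
    fixed_duration: Optional[float] = None,
    max_duration: Optional[float] = None,
) -> Optional[float]:
    """Return the expected output duration in seconds, following the rules above."""
    # A global voice-over script always decides the final duration.
    if voice_over_duration is not None:
        return voice_over_duration
    # Without a voice-over, FixedDuration is used as-is
    # (checking it before MaxDuration is an assumption).
    if fixed_duration is not None:
        return fixed_duration
    # With MaxDuration, the duration is SingleShotDuration x group count,
    # capped at MaxDuration.
    if max_duration is not None:
        return min(single_shot_duration * group_count, max_duration)
    # This topic does not say what happens when neither parameter is set.
    return None


if __name__ == "__main__":
    # 4 groups of 3-second shots, capped at 10 seconds.
    print(final_duration(None, 3, 4, max_duration=10))
```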

How does the Segmented Scripts mode work?

  • Similarly, long videos are first split into segments. However, in this mode, the voice-over scripts are configured within each MediaGroup using the SpeechTextArray field. Do not set InputConfig.SpeechText or a global SpeechTextArray, as doing so will cause an error.

  • If MediaGroup.SpeechTextArray is empty, it means this media group has no voice-over. The duration of this group is controlled by MediaGroup.Duration, which defaults to 5 seconds.

  • To control the audio volume of assets in a specific MediaGroup, you can set MediaGroup.Volume. This parameter has a higher priority than EditingConfig.MediaConfig.Volume.

  • Each group can accept multiple voice-over scripts. If all groups (excluding those with no voice-over) have the same number of voice-over scripts, the scripts are combined in order (for example, all groups use their Nth script). If the number of scripts differs between groups, one script is selected at random from each group to form the complete voice-over script.
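
The script-combination rule in the last bullet can be sketched as follows. This is a hypothetical illustration of the described behavior, not the service's code: the function name, the video_index argument, and the wrap-around with the modulo operator when more videos than scripts are requested are all assumptions.

```python
import random
from typing import Dict, List, Optional


def pick_scripts(
    groups: List[Dict],
    video_index: int,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Choose one voice-over script per group for a single output video."""
    rng = rng or random.Random()
    # Groups with an empty or missing SpeechTextArray have no voice-over.
    voiced = [g for g in groups if g.get("SpeechTextArray")]
    counts = {len(g["SpeechTextArray"]) for g in voiced}
    if len(counts) == 1:
        # Same number of scripts everywhere: every group uses its Nth script.
        n = video_index % counts.pop()  # wrap-around is an assumption
        return [g["SpeechTextArray"][n] for g in voiced]
    # Different counts: pick one script at random from each group.
    return [rng.choice(g["SpeechTextArray"]) for g in voiced]


if __name__ == "__main__":
    groups = [
        {"GroupName": "opening", "SpeechTextArray": ["a1", "a2"]},
        {"GroupName": "middle", "Duration": 5},  # no voice-over
        {"GroupName": "ending", "SpeechTextArray": ["c1", "c2"]},
    ]
    print(pick_scripts(groups, video_index=1))
```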

Result optimization

How can I fix issues with jarring or frequent scene changes?

You can use the shot analysis feature of Smart Tagging to automatically split your media assets into meaningful shots. This can solve problems with jarring transitions and overly fast pacing. The following steps outline the process for a single video asset; repeat these steps for multiple assets.

  1. For a specific video asset, call SubmitSmarttagJob. For this scenario, you must set TemplateId to the fixed value "S00000103-000003".

  2. Query the shot analysis results by calling QuerySmarttagJob. Find the data parameter in the response where type=ClipSplit. The content will be a JSON array similar to the example below. StartTime and EndTime indicate the start and end times of each shot, in seconds.

    [
      {
        "EndTime": 5.4,
        "ClipType": "opening",
        "StartTime": 0.0
      },
      {
        "EndTime": 9.16,
        "ClipType": "opening",
        "StartTime": 5.4
      },
      {
        "EndTime": 12.88,
        "ClipType": "opening",
        "StartTime": 9.16
      },
      {
        "EndTime": 16.0,
        "ClipType": "opening",
        "StartTime": 12.88
      }
    ]
    
  3. Map the shot information from Step 2 to the corresponding media asset in your Script-to-Video job by passing it to the EditingConfig.MediaConfig.MediaMetaDataArray.TimeRangeList parameter:

    {
      "MediaConfig": {
        "MediaMetaDataArray": [
          {
            "Media": "https://******.****.****/public-template/video/movie_apsara_4.mp4",
            "GroupName": "opening",
            "TimeRangeList": [
              {
                "In": 0,
                "Out": 5.4
              },
              {
                "In": 5.4,
                "Out": 9.16
              },
              {
                "In": 9.16,
                "Out": 12.88
              },
              {
                "In": 12.88,
                "Out": 16.0
              }
            ]
          }
        ]
      }
    }

  4. Set the MediaGroup.SplitMode parameter of the corresponding MediaGroup to NoSplit. This prevents the system from re-splitting the pre-segmented clips, ensuring shot integrity and smoother transitions.
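
The mapping from the ClipSplit result in Step 2 to the TimeRangeList in Step 3 is a plain field rename, which could be scripted roughly as shown below. This sketch only transforms data that has already been retrieved; it makes no API calls. The function name and the example.com URL are hypothetical, and sorting the shots by StartTime is an added precaution rather than a documented requirement.

```python
import json
from typing import Dict, List


def shots_to_media_metadata(media_url: str, group_name: str, shots: List[Dict]) -> Dict:
    """Convert ClipSplit shots into one MediaMetaDataArray entry."""
    # Sort defensively so the ranges are in playback order (an assumption).
    ordered = sorted(shots, key=lambda shot: shot["StartTime"])
    return {
        "Media": media_url,
        "GroupName": group_name,
        # StartTime/EndTime (seconds) map directly to In/Out.
        "TimeRangeList": [
            {"In": shot["StartTime"], "Out": shot["EndTime"]} for shot in ordered
        ],
    }


if __name__ == "__main__":
    # Shot data copied from the Step 2 example.
    shots = [
        {"EndTime": 5.4, "ClipType": "opening", "StartTime": 0.0},
        {"EndTime": 9.16, "ClipType": "opening", "StartTime": 5.4},
        {"EndTime": 12.88, "ClipType": "opening", "StartTime": 9.16},
        {"EndTime": 16.0, "ClipType": "opening", "StartTime": 12.88},
    ]
    entry = shots_to_media_metadata("https://example.com/video.mp4", "opening", shots)
    media_config = {"MediaConfig": {"MediaMetaDataArray": [entry]}}
    print(json.dumps(media_config, indent=2))
```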

How can I control the pacing of scene changes and configure shot durations?

  • Use ImageDuration to control the display duration of images in the output video.

  • Set the SingleShotDuration parameter to control the duration of segments after video assets are split.

How is the display duration of an image calculated in the final video?

If a group contains both image and video assets, an image may be selected. The display duration for the image is determined by the following priority:

How can I ensure a video clip plays in its entirety in the final video?

  • To ensure a video clip within a group plays completely, set the group's MediaGroup.SplitMode to NoSplit and set MediaGroup.Duration to the exact duration of the source video clip. For example, to play a 20-second clip in its entirety, set MediaGroup.SplitMode to NoSplit and MediaGroup.Duration to 20.

  • Note: If the duration of the selected asset does not match the specified MediaGroup.Duration, the asset's playback speed will be adjusted to fit. For example, if MediaGroup.Duration is set to 10 seconds but a 20-second video is selected, the video will be played at 2x speed.
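
The speed adjustment in the note can be estimated with a simple ratio. This is a sketch inferred from the single example given (a 20-second clip in a 10-second group plays at 2x); the function name is hypothetical, and the service may apply limits or rounding that this topic does not describe.

```python
def playback_speed(asset_duration: float, group_duration: float) -> float:
    """Estimate the speed factor needed to fit an unsplit asset into a group."""
    if asset_duration <= 0 or group_duration <= 0:
        raise ValueError("durations must be positive")
    # A factor above 1 means the clip is sped up; 1 means normal speed.
    return asset_duration / group_duration


if __name__ == "__main__":
    print(playback_speed(20, 10))  # the example from the note
    print(playback_speed(20, 20))  # Duration equals the clip length
```

Setting MediaGroup.Duration to the clip's exact length keeps the factor at 1, which is why the first bullet recommends matching the two values.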

How can I alternate between video clips with original audio and clips with voice-over narration?

  • To play some clips with their original audio while others are muted and accompanied by a voice-over, use MediaGroup.Volume to control the volume of assets within a group. To use the original audio, set the value to 1.

  • Assume that you have three groups: MediaGroup1, MediaGroup2, and MediaGroup3. If you want MediaGroup1 and MediaGroup3 to have voice-overs, while MediaGroup2 plays a video with its original audio, you can configure your MediaGroupArray as follows:

    [
      {
        "GroupName": "MediaGroup1",
        "MediaArray": [
          "https://ice-*****-test.oss-cn-*****.aliyuncs.com/0-test-batch-editing-materials/160134%2B9859695-2032aa5c-2803-47cd-bf65-8a40d66598db.png",
          "https://ice-auto-test.oss-cn-shanghai.aliyuncs.com/0-test-batch-editing-materials/cloud.mp4"
        ],
        "SpeechTextArray": [
          "Grand opening! A Freshippo store opens today at the nearby mall.",
          "A Freshippo store opens today at the nearby mall",
          "<speak>The battle is <phoneme alphabet=\"py\" ph=\"zheng4 hao3\">fierce</phoneme>. The table tennis legend Ma Long, is charging towards the pinnacle of glory. In the quarter-finals against Togami Shunsuke, Ma Long shows no fear, giving his all in every rally. His precise ball placement and calm judgment give him the upper hand in this match. In the end, Ma Long defeats his opponent and advances to the semi-finals.</speak>"
        ]
      },
      {
        "GroupName": "MediaGroup2",
        "MediaArray": [
          "https://ice-*****-test.oss-cn-*****.aliyuncs.com/0-test-batch-editing-materials/normal%20video.mp4",
          "https://ice-auto-test.oss-cn-shanghai.aliyuncs.com/0-test-batch-editing-materials/3.jpeg"
        ],
        "Duration": 5,
        "SplitMode": "NoSplit",
        "Volume": 1
      },
      {
        "GroupName": "MediaGroup3",
        "MediaArray": ["https://ice-*****-test.oss-cn-*****.aliyuncs.com/0-test-batch-editing-materials/young_sunset_walk.mp4"],
        "SpeechTextArray": ["Come and take a look", "Come and take a look now"]
      }
    ]
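
If you build this array programmatically, two small helpers can keep the voice-over groups and the original-audio groups distinct. The following is a minimal sketch: the helper names and the example.com URLs are hypothetical, and only fields that appear in the example above are used.

```python
import json
from typing import Dict, List


def voice_over_group(name: str, media: List[str], scripts: List[str]) -> Dict:
    """A group whose clips are accompanied by voice-over scripts."""
    return {"GroupName": name, "MediaArray": media, "SpeechTextArray": scripts}


def original_audio_group(name: str, media: List[str], duration: float) -> Dict:
    """A group with no voice-over that plays its asset with the original audio."""
    return {
        "GroupName": name,
        "MediaArray": media,
        "Duration": duration,
        "SplitMode": "NoSplit",
        "Volume": 1,  # 1 keeps the original audio, as described above
    }


media_group_array = [
    voice_over_group("MediaGroup1", ["https://example.com/a.mp4"], ["Script A"]),
    original_audio_group("MediaGroup2", ["https://example.com/b.mp4"], 5),
    voice_over_group("MediaGroup3", ["https://example.com/c.mp4"], ["Script C"]),
]

if __name__ == "__main__":
    print(json.dumps(media_group_array, indent=2))
```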