This topic addresses common questions about the Script-to-Video feature.
Processing logic
How does the Global Scripts mode work?
If the input videos are long clips, they are first split into shorter segments. During composition, these resulting segments are assembled to create the final video. The duration of each segment after splitting is determined by the SingleShotDuration parameter.
If global voice-over scripts are provided, the system avoids reusing a script as long as enough scripts are available. For example, if you provide 3 scripts to generate 5 videos, some scripts are reused, but each of the 3 scripts is used at least once.
The system follows the group order, randomly selecting one segment from each group and concatenating them. If a global voice-over is provided and the total video duration is less than the voice-over duration, the system adds randomly selected segments, preferring groups other than the first and last, until the video duration matches the voice-over duration.
The rules for the final video's duration are as follows:
If a global voice-over script is provided, the final duration will equal the duration of the voice-over.
If no global voice-over script is provided:
If FixedDuration is set, the final duration equals the value of FixedDuration.
If MaxDuration is set, the final duration equals SingleShotDuration multiplied by the number of groups (the length of SpeechTextArray), but it does not exceed the value of MaxDuration.
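The duration rules above can be sketched as a small Python function. This is only an illustration of the documented rules, not the service's implementation: the function name and argument names are hypothetical, and checking FixedDuration before MaxDuration is an assumption based on the order in which the rules are listed.

```python
from typing import Optional


def final_duration(
    voice_over_duration: Optional[float] = None,
    fixed_duration: Optional[float] = None,
    max_duration: Optional[float] = None,
    single_shot_duration: Optional[float] = None,
    group_count: int = 0,
) -> Optional[float]:
    """Estimate the output duration in seconds from the rules listed above."""
    # A global voice-over always determines the final duration.
    if voice_over_duration is not None:
        return voice_over_duration
    # Without a voice-over, FixedDuration is used if it is set.
    if fixed_duration is not None:
        return fixed_duration
    # Otherwise: SingleShotDuration * number of groups, capped by MaxDuration.
    if max_duration is not None and single_shot_duration is not None:
        return min(single_shot_duration * group_count, max_duration)
    # The rules above do not cover this case, so no estimate is returned.
    return None


if __name__ == "__main__":
    print(final_duration(max_duration=15, single_shot_duration=3, group_count=4))
    print(final_duration(max_duration=15, single_shot_duration=3, group_count=6))
```

With a 3-second SingleShotDuration and a MaxDuration of 15, four groups give 12 seconds, while six groups are capped at 15 seconds.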
How does the Segmented Scripts mode work?
Similarly, long videos are first split into segments. However, in this mode, the voice-over scripts are configured within each MediaGroup using the SpeechTextArray field. Do not set InputConfig.SpeechText or a global SpeechTextArray, as doing so will cause an error.
If MediaGroup.SpeechTextArray is empty, the media group has no voice-over. The duration of this group is controlled by MediaGroup.Duration, which defaults to 5 seconds.
To control the audio volume of assets in a specific MediaGroup, set MediaGroup.Volume. This parameter takes priority over EditingConfig.MediaConfig.Volume.
Each group can accept multiple voice-over scripts. If all groups (excluding those with no voice-over) have the same number of voice-over scripts, the scripts are combined in order (for example, all groups use their Nth script). If the number of scripts differs between groups, one script is selected at random from each group to form the complete voice-over script.
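The script-combination rule can be illustrated with a short Python sketch. The function name is hypothetical, and mapping the output video index to N with a modulo is an assumption made for the example; the text above does not state how N is chosen.

```python
import random
from typing import List, Optional


def combine_scripts(
    groups: List[List[str]],
    video_index: int,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Pick one voice-over script per group for a single output video."""
    rng = rng or random.Random()
    # Groups with an empty SpeechTextArray have no voice-over and are skipped.
    voiced = [scripts for scripts in groups if scripts]
    if not voiced:
        return []
    counts = {len(scripts) for scripts in voiced}
    if len(counts) == 1:
        # Same number of scripts in every group: all groups use their Nth script.
        # Assumption: N is derived from the index of the output video.
        n = video_index % len(voiced[0])
        return [scripts[n] for scripts in voiced]
    # Different counts: choose one script at random from each group.
    return [rng.choice(scripts) for scripts in voiced]


if __name__ == "__main__":
    groups = [["Intro A", "Intro B"], [], ["Outro A", "Outro B"]]
    print(combine_scripts(groups, video_index=1))
```

Here the second group has no voice-over, so only the first and third groups contribute, and both use their second script.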
Result optimization
How can I fix issues with jarring or frequent scene changes?
You can use the shot analysis feature of Smart Tagging to automatically split your media assets into meaningful shots. This can solve problems with jarring transitions and overly fast pacing. The following steps outline the process for a single video asset; repeat these steps for multiple assets.
1. For a specific video asset, call SubmitSmarttagJob. For this scenario, you must set TemplateId to the fixed value "S00000103-000003".
2. Query the shot analysis results by calling QuerySmarttagJob. Find the data parameter in the response where type=ClipSplit. The content is a JSON array similar to the example below. StartTime and EndTime indicate the start and end times of each shot, in seconds.
[
  { "EndTime": 5.4, "ClipType": "opening", "StartTime": 0.0 },
  { "EndTime": 9.16, "ClipType": "opening", "StartTime": 5.4 },
  { "EndTime": 12.88, "ClipType": "opening", "StartTime": 9.16 },
  { "EndTime": 16.0, "ClipType": "opening", "StartTime": 12.88 }
]
3. Map the shot information from Step 2 to the corresponding media asset in your Script-to-Video job by passing it to the EditingConfig.MediaConfig.MediaMetaDataArray.TimeRangeList parameter:
{
  "MediaConfig": {
    "MediaMetaDataArray": [
      {
        "Media": "https://******.****.****/public-template/video/movie_apsara_4.mp4",
        "GroupName": "opening",
        "TimeRangeList": [
          { "In": 0, "Out": 5.4 },
          { "In": 5.4, "Out": 9.16 },
          { "In": 9.16, "Out": 12.88 },
          { "In": 12.88, "Out": 16.0 }
        ]
      }
    ]
  }
}
4. Set the MediaGroup.SplitMode parameter of the corresponding MediaGroup to NoSplit. This prevents the system from re-splitting the pre-segmented clips, ensuring shot integrity and smoother transitions.
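The mapping from shot analysis results to TimeRangeList can be done with a few lines of Python. The sketch below only reshapes the JSON shown above; it does not call any API. The function names and the example.com URL are hypothetical placeholders.

```python
import json
from typing import Dict, List


def clip_split_to_time_ranges(data: str) -> List[Dict[str, float]]:
    """Convert the ClipSplit JSON array into TimeRangeList entries."""
    shots = json.loads(data)
    # Sort by start time so the ranges follow the order of the source video.
    shots.sort(key=lambda shot: shot["StartTime"])
    return [{"In": shot["StartTime"], "Out": shot["EndTime"]} for shot in shots]


def build_media_metadata(media_url: str, group_name: str, data: str) -> Dict:
    """Build one MediaMetaDataArray entry for a single video asset."""
    return {
        "Media": media_url,
        "GroupName": group_name,
        "TimeRangeList": clip_split_to_time_ranges(data),
    }


if __name__ == "__main__":
    # Shortened copy of the ClipSplit result from the example above.
    data = (
        '[{"EndTime": 5.4, "ClipType": "opening", "StartTime": 0.0},'
        ' {"EndTime": 9.16, "ClipType": "opening", "StartTime": 5.4}]'
    )
    # Placeholder URL; replace it with the address of your own asset.
    entry = build_media_metadata("https://example.com/video.mp4", "opening", data)
    print(json.dumps({"MediaConfig": {"MediaMetaDataArray": [entry]}}, indent=2))
```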
How can I control the pacing of scene changes and configure shot durations?
Use ImageDuration to control the display duration of images in the output video.
Set the SingleShotDuration parameter to control the duration of segments after video assets are split.
How is the display duration of an image calculated in the final video?
If a group contains both image and video assets, an image may be selected. The display duration for the image is determined by the following priority:
If ImageDuration is set, its value will be used.
If there is a voice-over, the image duration = Total voice-over duration / Number of items in the MediaGroupArray.
If FixedDuration is set but no voice-over is provided, the image duration = FixedDuration / Number of items in the MediaGroupArray.
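The priority order above can be expressed as a small Python function. The function and argument names are hypothetical, and returning None when no rule applies is an assumption, since the list above does not cover that case.

```python
from typing import Optional


def image_display_duration(
    group_count: int,
    image_duration: Optional[float] = None,
    voice_over_duration: Optional[float] = None,
    fixed_duration: Optional[float] = None,
) -> Optional[float]:
    """Estimate how long an image is shown, in seconds."""
    if group_count <= 0:
        raise ValueError("group_count must be a positive integer")
    # Priority 1: an explicit ImageDuration is used as-is.
    if image_duration is not None:
        return image_duration
    # Priority 2: split the voice-over evenly across the MediaGroupArray items.
    if voice_over_duration is not None:
        return voice_over_duration / group_count
    # Priority 3: split FixedDuration evenly across the MediaGroupArray items.
    if fixed_duration is not None:
        return fixed_duration / group_count
    # None of the documented rules applies.
    return None


if __name__ == "__main__":
    print(image_display_duration(4, voice_over_duration=20))
```

For example, a 20-second voice-over with 4 items in the MediaGroupArray gives each image 5 seconds unless ImageDuration is set.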
How can I ensure a video clip plays in its entirety in the final video?
To ensure a video clip within a group plays completely, set the group's MediaGroup.SplitMode to NoSplit and set MediaGroup.Duration to the exact duration of the source video clip. For example, to play a 20-second clip in its entirety, set MediaGroup.SplitMode to NoSplit and MediaGroup.Duration to 20.
Note: If the duration of the selected asset is longer than the specified MediaGroup.Duration, the asset's playback speed is adjusted to fit. For example, if MediaGroup.Duration is set to 10 seconds but a 20-second video is selected, the video is played at 2x speed.
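The speed adjustment described in the note can be estimated as the ratio of the asset duration to MediaGroup.Duration. This formula is inferred from the single example in the note, so treat it as an approximation; the function name is hypothetical.

```python
def playback_speed(asset_duration: float, group_duration: float) -> float:
    """Return the speed factor needed to fit an asset into the group duration."""
    if asset_duration <= 0 or group_duration <= 0:
        raise ValueError("durations must be positive")
    # A 20-second asset squeezed into 10 seconds must play twice as fast.
    return asset_duration / group_duration


if __name__ == "__main__":
    print(playback_speed(20, 10))
```

A result of 1.0 means the clip plays at its normal speed, which is the case when MediaGroup.Duration equals the clip's own duration.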
How can I alternate between video clips with original audio and clips with voice-over narration?
To play some clips with their original audio while others are muted and accompanied by a voice-over, use MediaGroup.Volume to control the volume of assets within a group. To use the original audio, set the value to 1.
Assume that you have three groups: MediaGroup1, MediaGroup2, and MediaGroup3. If you want MediaGroup1 and MediaGroup3 to have voice-overs, while MediaGroup2 plays a video with its original audio, you can configure your MediaGroupArray as follows:
[{
"GroupName": "MediaGroup1",
"MediaArray": ["https://ice-*****-test.oss-cn-*****.aliyuncs.com/0-test-batch-editing-materials/160134%2B9859695-2032aa5c-2803-47cd-bf65-8a40d66598db.png", "https://ice-auto-test.oss-cn-shanghai.aliyuncs.com/0-test-batch-editing-materials/cloud.mp4"],
"SpeechTextArray": ["Grand opening! A Freshippo store opens today at the nearby mall.", "A Freshippo store opens today at the nearby mall", "<speak>The battle is <phoneme alphabet=\"py\" ph=\"zheng4 hao3\">fierce</phoneme>. The table tennis legend Ma Long, is charging towards the pinnacle of glory. In the quarter-finals against Togami Shunsuke, Ma Long shows no fear, giving his all in every rally. His precise ball placement and calm judgment give him the upper hand in this match. In the end, Ma Long defeats his opponent and advances to the semi-finals.</speak>"]
},
{
"GroupName": "MediaGroup2",
"MediaArray": ["https://ice-*****-test.oss-cn-*****.aliyuncs.com/0-test-batch-editing-materials/normal%20video.mp4", "https://ice-auto-test.oss-cn-shanghai.aliyuncs.com/0-test-batch-editing-materials/3.jpeg"],
"Duration": 5,
"SplitMode": "NoSplit",
"Volume": 1
},
{
"GroupName": "MediaGroup3",
"MediaArray": ["https://ice-*****-test.oss-cn-*****.aliyuncs.com/0-test-batch-editing-materials/young_sunset_walk.mp4"],
"SpeechTextArray": ["Come and take a look", "Come and take a look now"]
}]
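If you build the MediaGroupArray programmatically, the two kinds of groups can be produced by small helper functions. The following Python sketch is only one possible approach: the helper names and the example.com URLs are hypothetical, and the fields mirror those used in the configuration above.

```python
import json
from typing import Dict, List


def voice_over_group(name: str, media: List[str], scripts: List[str]) -> Dict:
    """Group narrated by one of the given voice-over scripts."""
    return {"GroupName": name, "MediaArray": media, "SpeechTextArray": scripts}


def original_audio_group(name: str, media: List[str], duration: float) -> Dict:
    """Group without voice-over that keeps the original audio (Volume = 1)."""
    return {
        "GroupName": name,
        "MediaArray": media,
        "Duration": duration,
        "SplitMode": "NoSplit",
        "Volume": 1,
    }


if __name__ == "__main__":
    # Placeholder URLs; replace them with the addresses of your own assets.
    media_group_array = [
        voice_over_group("MediaGroup1", ["https://example.com/a.mp4"], ["Grand opening!"]),
        original_audio_group("MediaGroup2", ["https://example.com/b.mp4"], 5),
        voice_over_group("MediaGroup3", ["https://example.com/c.mp4"], ["Come and take a look"]),
    ]
    print(json.dumps(media_group_array, indent=2))
```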