Intelligent Media Services: Video translation parameters

Last Updated: May 30, 2025

This topic outlines the parameters for submitting video translation tasks via SubmitVideoTranslationJob and for retrieving task results via GetSmartHandleJob.

Feature availability

  • Subtitle translation: China (Shanghai), China (Beijing), China (Shenzhen), China (Hangzhou), Singapore, US (Silicon Valley)

  • Speech translation: China (Shanghai), China (Beijing), China (Shenzhen), China (Hangzhou), Singapore, US (Silicon Valley)

  • Lip-sync translation: China (Shanghai), Singapore

SubmitVideoTranslationJob

Important

To distinguish between subtitle, speech, and lip-sync translation:

Parameter

Type

Required

Description

InputConfig

String

Yes

The input configurations.

Enter a JSON string.

OutputConfig

String

Yes

The output configurations.

Enter a JSON string.

EditingConfig

String

Yes

The translation configurations.

Enter a JSON string.

Title

String

No

The task name.

Description

String

No

The task description.

UserData

String

No

The custom information.

Enter a JSON string. The maximum length is 512 characters.

You can configure a callback URL in this parameter.
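Because InputConfig, OutputConfig, EditingConfig, and UserData are all passed as JSON strings, a client usually builds them as nested objects and serializes them just before the call. The following is a minimal Python sketch of that step. The helper name build_submit_params and the placeholder values are assumptions made for illustration; they are not part of the API or of any SDK.

```python
import json

MAX_USER_DATA_CHARS = 512  # length limit stated for UserData


def build_submit_params(input_config, output_config, editing_config,
                        title=None, description=None, user_data=None):
    """Serialize nested configs into the string parameters of the request.

    Hypothetical helper: it only prepares the parameter dict and does not
    call the service.
    """
    params = {
        "InputConfig": json.dumps(input_config),
        "OutputConfig": json.dumps(output_config),
        "EditingConfig": json.dumps(editing_config),
    }
    if title is not None:
        params["Title"] = title
    if description is not None:
        params["Description"] = description
    if user_data is not None:
        encoded = json.dumps(user_data)
        if len(encoded) > MAX_USER_DATA_CHARS:
            raise ValueError("UserData exceeds 512 characters")
        params["UserData"] = encoded
    return params


# Placeholder values, for illustration only.
params = build_submit_params(
    {"Type": "Video", "Video": "example-media-id"},
    {"MediaURL": "https://example-bucket.oss-cn-shanghai.aliyuncs.com/out.mp4"},
    {"SourceLanguage": "zh", "TargetLanguage": "en"},
    title="demo",
)
```

The resulting dict can then be handed to whichever client or SDK call you use to invoke SubmitVideoTranslationJob.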

InputConfig

Parameter

Type

Required

Description

Type

String

Yes

The input type. Valid values:

  • Video

  • Audio

  • Subtitle

Note

Video

String

No

The ID or URL of the video file.

This parameter is required when Type is set to Video.

Valid values:

  • A media asset ID

  • An OSS URL in the current Alibaba Cloud account

  • A URL that can be accessed over the Internet

Supported formats:

  • MP4

  • WebM

  • MOV

  • M3U8

Audio

String

No

The ID or URL of the audio file.

This parameter is required when Type is set to Audio.

Valid values:

  • A media asset ID

  • An OSS URL in the current Alibaba Cloud account

  • A URL that can be accessed over the Internet

Supported formats:

  • MP3

  • WAV

Subtitle

String

No

The ID or URL of the subtitle file.

This parameter is required when Type is set to Subtitle.

Valid values:

  • A media asset ID

  • An OSS URL in the current Alibaba Cloud account

  • A URL that can be accessed over the Internet

Supported format: SRT
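The conditional requirements above (the field that matches Type must be present) can be checked on the client before submission. The sketch below is one possible pre-flight check in Python. The function name check_input_config is hypothetical, and the extension check assumes that the format of a URL input can be inferred from its file extension, which this topic does not state.

```python
SUPPORTED_EXTENSIONS = {
    "Video": {"mp4", "webm", "mov", "m3u8"},
    "Audio": {"mp3", "wav"},
    "Subtitle": {"srt"},
}


def check_input_config(cfg):
    """Return a list of problems found in an InputConfig dict (empty if none)."""
    input_type = cfg.get("Type")
    if input_type not in SUPPORTED_EXTENSIONS:
        return [f"Type must be one of {sorted(SUPPORTED_EXTENSIONS)}"]
    problems = []
    # The field named after Type is required, for example Video when Type=Video.
    if not cfg.get(input_type):
        problems.append(f"{input_type} is required when Type is {input_type}")
    # Assumption: only URLs carry an extension; media asset IDs are skipped.
    for field, extensions in SUPPORTED_EXTENSIONS.items():
        value = cfg.get(field, "")
        if value.startswith(("http://", "https://")):
            path = value.split("?", 1)[0]
            ext = path.rsplit(".", 1)[-1].lower()
            if ext not in extensions:
                problems.append(f"{field} URL has unsupported extension: {ext}")
    return problems
```

An empty list means that the configuration passed these local checks; the service may still reject it for other reasons.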

OutputConfig

Parameter

Type

Required

Description

OutputTarget

String

No

The output type. Default value: OSS.

Valid values:

  • OSS: saves the output to OSS.

  • VOD: saves the output to ApsaraVideo VOD.

MediaURL

String

Yes

Enter an OSS URL in the current Alibaba Cloud account.

The path must include the extension.

Supported extensions:

  • If InputConfig.Type=Video, mp4 is supported.

  • If InputConfig.Type=Audio, wav is supported.

  • If InputConfig.Type=Subtitle, srt is supported.

StorageLocation

String

No

This parameter is required when OutputTarget is set to VOD.

Do not include the protocol prefixes, such as http:// or https://.

Example: outin-*****c7d2a3811eb83da00163e0*****.oss-cn-shanghai.aliyuncs.com

FileName

String

No

This parameter is required when OutputTarget is set to VOD.

The file name must include the extension.

Supported extensions: mp4, wav, and srt.

Width

Integer

No

The width of the output video. Unit: pixels.

By default, the specification of the source video is used.

Height

Integer

No

The height of the output video. Unit: pixels.

By default, the specification of the source video is used.

Video

JSONObject

No

The configurations of the output video, such as CRF settings and codec.

Example: {"Crf": 27}
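The OutputConfig rules above can also be verified locally. The following Python sketch follows the table literally, so it treats MediaURL as required for both output targets; whether the service enforces this for VOD is not confirmed here. The function name check_output_config is an assumption for illustration.

```python
EXTENSION_BY_INPUT_TYPE = {"Video": "mp4", "Audio": "wav", "Subtitle": "srt"}
VOD_FILE_EXTENSIONS = {"mp4", "wav", "srt"}


def check_output_config(cfg, input_type):
    """Return a list of problems found in an OutputConfig dict (empty if none)."""
    problems = []
    expected = EXTENSION_BY_INPUT_TYPE[input_type]
    media_url = cfg.get("MediaURL", "")
    if not media_url.lower().endswith("." + expected):
        problems.append(f"MediaURL must end with .{expected}")
    target = cfg.get("OutputTarget", "OSS")  # OSS is the documented default
    if target == "VOD":
        location = cfg.get("StorageLocation", "")
        if not location:
            problems.append("StorageLocation is required for VOD")
        elif location.startswith(("http://", "https://")):
            problems.append("StorageLocation must not include a protocol prefix")
        _, dot, ext = cfg.get("FileName", "").rpartition(".")
        if not dot or ext.lower() not in VOD_FILE_EXTENSIONS:
            problems.append("FileName must end with .mp4, .wav, or .srt")
    elif target != "OSS":
        problems.append("OutputTarget must be OSS or VOD")
    return problems
```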

EditingConfig

Parameter

Type

Required

Description

SourceLanguage

String

Yes

The code of the source language, such as zh.

For valid values, see SourceLanguage.

TargetLanguage

String

Yes

The code of the target language, such as en. You can specify multiple languages and separate them with commas (,). Example: en,ja,id.

For valid values, see TargetLanguage.

DetextArea

String

No

The subtitle erasure configurations. If it is not specified, source subtitles are retained.

  • Valid values:

    • Auto: automatically identifies the subtitle area for erasure.

    • [[x, y, width, height], ...]: specifies the areas where subtitles are erased. The value is a two-layer (nested) array, so you can specify multiple areas. Example: [[0, 0.64, 1, 0.13]].

      • x: the ratio of the horizontal distance from the upper-left corner of the subtitle frame to the upper-left corner of the video. Valid values: [0,1]

      • y: the ratio of the vertical distance from the upper-left corner of the subtitle frame to the upper-left corner of the video. Valid values: [0,1]

      • width: the ratio of the width of the subtitle frame to the width of the video. Valid values: [0,1]

      • height: the ratio of the height of the subtitle frame to the height of the video. Valid values: [0,1]

SupportEditing

Boolean

No

Specifies whether to enable post-editing to correct the translation.

Valid values:

  • true

  • false

Default value: false.

BilingualSubtitle

Boolean

No

Specifies whether to generate bilingual subtitles in the output video. Bilingual subtitles are supported by subtitle, speech, and lip-sync translation.

  • Valid values:

    • true

    • false

Default value: false

Note

To generate an output video without visible subtitles, set SubtitleTrackClip.FontSize to 0.

SubtitleTranslate

String

No

The subtitle translation configurations.

This parameter is required to perform subtitle translation.

Enter a JSON string.

NeedSpeechTranslate

Boolean

No

Specifies whether to perform speech translation.

Valid values:

  • true

  • false

Default value: false

SpeechTranslate

String

No

The speech translation configurations.

If NeedSpeechTranslate is set to true or this parameter is specified, speech translation is performed.

Enter a JSON string.

NeedFaceTranslate

Boolean

No

Specifies whether to perform lip-sync translation.

Valid values:

  • true

  • false

Default value: false

FaceTranslate

String

No

The lip-sync translation configurations.

If NeedFaceTranslate is set to true or this parameter is specified, lip-sync translation is performed.

Enter a JSON string.

TextSource

String

No

Specifies the source of subtitles.

Valid values:

  • ASR: recognizes subtitles through Automatic Speech Recognition (ASR).

  • OCR: recognizes subtitles through Optical Character Recognition (OCR). You can set the recognition area through OcrArea. When the translation type is subtitle translation, the default value is OCR.

  • OCR_ASR: The system prioritizes OCR. If OCR fails, ASR is used. You can set the recognition area through OcrArea. When the translation type is speech or lip-sync translation, the default value is OCR_ASR.

  • ALL: The system prioritizes ASR. If ASR fails, OCR is used. This option is available only for subtitle translation.

If you want to use an external SRT file, specify InputConfig.Subtitle.

If both InputConfig.Subtitle and TextSource are specified, the former takes precedence.

CustomParameter

String

No

The custom parameters that control the video translation effects. For valid values, see CustomParameter.
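The TextSource defaults and precedence rules in the table above can be summarized in a small helper. This is a sketch under stated assumptions: the mode names ("subtitle", "speech", "lip-sync") and the return label "EXTERNAL_SRT" are invented for this example and are not values accepted by the API.

```python
DEFAULT_TEXT_SOURCE = {
    "subtitle": "OCR",       # default for subtitle translation
    "speech": "OCR_ASR",     # default for speech translation
    "lip-sync": "OCR_ASR",   # default for lip-sync translation
}
VALID_TEXT_SOURCES = {"ASR", "OCR", "OCR_ASR", "ALL"}


def effective_text_source(mode, text_source=None, external_srt=None):
    """Return the subtitle source that applies to a task.

    mode and the "EXTERNAL_SRT" label are names local to this sketch.
    """
    if mode not in DEFAULT_TEXT_SOURCE:
        raise ValueError(f"unknown mode: {mode}")
    # InputConfig.Subtitle takes precedence over TextSource.
    if external_srt:
        return "EXTERNAL_SRT"
    if text_source is None:
        return DEFAULT_TEXT_SOURCE[mode]
    if text_source not in VALID_TEXT_SOURCES:
        raise ValueError(f"invalid TextSource: {text_source}")
    # ALL is available only for subtitle translation.
    if text_source == "ALL" and mode != "subtitle":
        raise ValueError("TextSource=ALL is available only for subtitle translation")
    return text_source
```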

Valid values of SourceLanguage

Subtitle translation

Speech translation

Lip-sync translation

TextSource=OCR or OCR_ASR

TextSource=ASR

External SRT source

  • zh: Chinese

  • en: English

  • zh: Chinese

  • en: English

  • fr: French

  • tr: Turkish

  • zh: Chinese

  • en: English

  • ja: Japanese

  • ko: Korean

  • yue: Cantonese

  • de: German

  • fr: French

  • es: Spanish

  • ar: Arabic

  • it: Italian

  • az: Azerbaijani

  • be: Belarusian

  • bg: Bulgarian

  • bs: Bosnian

  • bn: Bengali

  • cs: Czech

  • cy: Welsh

  • da: Danish

  • et: Estonian

  • fa: Persian

  • hi: Hindi

  • hbs: Croatian

  • hu: Hungarian

  • id: Indonesian

  • is: Icelandic

  • lt: Lithuanian

  • lv: Latvian

  • mi: Maori

  • mn: Mongolian

  • mr: Marathi

  • ms: Malay

  • mt: Maltese

  • ne: Nepali

  • nl: Dutch

  • no: Norwegian

  • pl: Polish

  • pt: Portuguese

  • ro: Romanian

  • ru: Russian

  • sk: Slovak

  • sl: Slovenian

  • sq: Albanian

  • zh: Chinese

  • en: English

  • ja: Japanese

  • ko: Korean

  • yue: Cantonese

  • de: German

  • fr: French

  • es: Spanish

  • ar: Arabic

  • it: Italian

  • az: Azerbaijani

  • be: Belarusian

  • bg: Bulgarian

  • bs: Bosnian

  • bn: Bengali

  • cs: Czech

  • cy: Welsh

  • da: Danish

  • et: Estonian

  • fa: Persian

  • hi: Hindi

  • hbs: Croatian

  • hu: Hungarian

  • id: Indonesian

  • is: Icelandic

  • lt: Lithuanian

  • lv: Latvian

  • mi: Maori

  • mn: Mongolian

  • mr: Marathi

  • ms: Malay

  • mt: Maltese

  • ne: Nepali

  • nl: Dutch

  • no: Norwegian

  • pl: Polish

  • pt: Portuguese

  • ro: Romanian

  • ru: Russian

  • sk: Slovak

  • sl: Slovenian

  • sq: Albanian

  • zh: Chinese

  • en: English

  • ja: Japanese

  • ko: Korean

  • yue: Cantonese

  • de: German

  • fr: French

  • es: Spanish

  • ar: Arabic

  • it: Italian

  • az: Azerbaijani

  • be: Belarusian

  • bg: Bulgarian

  • bs: Bosnian

  • bn: Bengali

  • cs: Czech

  • cy: Welsh

  • da: Danish

  • et: Estonian

  • fa: Persian

  • hi: Hindi

  • hbs: Croatian

  • hu: Hungarian

  • id: Indonesian

  • is: Icelandic

  • lt: Lithuanian

  • lv: Latvian

  • mi: Maori

  • mn: Mongolian

  • mr: Marathi

  • ms: Malay

  • mt: Maltese

  • ne: Nepali

  • nl: Dutch

  • no: Norwegian

  • pl: Polish

  • pt: Portuguese

  • ro: Romanian

  • ru: Russian

  • sk: Slovak

  • sl: Slovenian

  • sq: Albanian

Valid values of TargetLanguage

Subtitle translation

Speech translation

Lip-sync translation

  • zh: Chinese

  • zh-tw: Traditional Chinese

  • en: English

  • ja: Japanese

  • id: Indonesian

  • es: Spanish

  • pt: Portuguese

  • ar: Arabic

  • fr: French

  • tr: Turkish

  • yue: Cantonese

  • de: German

  • ko: Korean

  • ru: Russian

  • th: Thai

  • vi: Vietnamese

  • ms: Malay

Chinese dialects:

  • sichuan: Sichuan dialect

  • dongbei: Northeastern dialect

  • henan: Henan dialect

  • shanghai: Shanghai dialect

  • tianjin: Tianjin dialect

  • beijing: Beijing dialect

  • chongqing: Chongqing dialect

  • hunan: Hunan dialect

  • taiwan: Hakka Chinese

  • shanxi: Shanxi dialect

  • shaanxi: Shaanxi dialect

  • zh: Chinese

  • zh-tw: Traditional Chinese

  • en: English

  • ja: Japanese

  • ko: Korean

  • yue: Cantonese

  • de: German

  • fr: French

  • es: Spanish

  • ar: Arabic

  • tr: Turkish

  • ru: Russian

  • pt: Portuguese

  • vi: Vietnamese

  • ms: Malay

  • th: Thai

  • id: Indonesian

Chinese dialects:

  • sichuan: Sichuan dialect

  • tianjin: Tianjin dialect

  • en: English

SubtitleTranslate

Parameter

Type

Required

Description

OcrArea

String

No

The OCR area. If this parameter is not specified, the subtitle area is automatically identified.

Valid values:

  • Auto: automatically identifies the subtitle area.

  • [x, y, width, height]: specifies the subtitle area in a single-layer array. You can specify only one area.

    • x: the ratio of the horizontal distance from the upper-left corner of the subtitle frame to the upper-left corner of the video. Valid values: [0,1]

    • y: the ratio of the vertical distance from the upper-left corner of the subtitle frame to the upper-left corner of the video. Valid values: [0,1]

    • width: the ratio of the width of the subtitle frame to the width of the video. Valid values: [0,1]

    • height: the ratio of the height of the subtitle frame to the height of the video. Valid values: [0,1]

SubtitleConfig

String

No

The output subtitle effects. The parameters are consistent with SubtitleTrackClip in Timeline.

Enter a JSON string.

To generate an output video without visible subtitles, set SubtitleTrackClip.FontSize to 0.
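OcrArea and DetextArea both express regions as ratios of the video dimensions, measured from the upper-left corner. If you know the subtitle box in pixels, a conversion such as the following Python sketch may help. The function name is hypothetical, and the check that the box stays inside the frame is an assumption; this topic only states that each value must be in [0,1].

```python
def pixel_box_to_ratios(x, y, width, height, video_width, video_height, digits=3):
    """Convert a pixel box (upper-left origin) to [x, y, width, height] ratios."""
    if video_width <= 0 or video_height <= 0:
        raise ValueError("video dimensions must be positive")
    box = [x / video_width, y / video_height,
           width / video_width, height / video_height]
    # Each ratio must be in [0, 1]; keeping the box inside the frame is
    # an extra sanity check added by this sketch.
    if any(v < 0 or v > 1 for v in box) or box[0] + box[2] > 1 or box[1] + box[3] > 1:
        raise ValueError("box must lie inside the frame")
    return [round(v, digits) for v in box]


# A 1920x162 strip starting 691 px from the top of a 1920x1080 video.
ocr_area = pixel_box_to_ratios(0, 691, 1920, 162, 1920, 1080)
# OcrArea takes a single area; DetextArea takes a nested list of areas.
detext_area = [ocr_area]
```

Here ocr_area evaluates to [0.0, 0.64, 1.0, 0.15], which matches the shape of the areas used in the sample code in this topic.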

SpeechTranslate

Parameter

Type

Required

Description

OcrArea

String

No

The OCR area. If this parameter is not specified, the subtitle area is automatically identified.

Valid values:

  • Auto: automatically identifies the subtitle area.

  • [x, y, width, height]: specifies the subtitle area in a single-layer array. You can specify only one area.

    • x: the ratio of the horizontal distance from the upper-left corner of the subtitle frame to the upper-left corner of the video. Valid values: [0,1]

    • y: the ratio of the vertical distance from the upper-left corner of the subtitle frame to the upper-left corner of the video. Valid values: [0,1]

    • width: the ratio of the width of the subtitle frame to the width of the video. Valid values: [0,1]

    • height: the ratio of the height of the subtitle frame to the height of the video. Valid values: [0,1]

CustomSrtType

String

No

The input subtitle type. This parameter is required if InputConfig.Subtitle is specified.

Valid values:

  • SourceSrt: monolingual subtitles in the source language

  • TargetSrt: monolingual subtitles in the target language

  • BilingualSrtSrcFirst: bilingual subtitles with the source line first and the target line below it

  • BilingualSrtTgtFirst: bilingual subtitles with the target line first and the source line below it

Examples for Chinese-to-English translation:

  • SourceSrt

    1
    00:00:01,000 --> 00:00:05,000
    你好,世界!
  • TargetSrt

    1
    00:00:01,000 --> 00:00:05,000
    Hello,World!
  • BilingualSrtSrcFirst

    1
    00:00:01,000 --> 00:00:05,000
    你好,世界!
    Hello,World!
  • BilingualSrtTgtFirst

    1
    00:00:01,000 --> 00:00:05,000
    Hello,World!
    你好,世界!

SubtitleTimeForce

Boolean

No

Specifies whether to translate only audio within the subtitle range. Default value: false.

This field is valid only when TextSource is set to OCR or OCR_ASR.

SubtitleConfig

String

No

The output subtitle effects. The parameters are consistent with SubtitleTrackClip in Timeline.

Enter a JSON string.

To generate an output video without visible subtitles, set SubtitleTrackClip.FontSize to 0.

OriginalJobId

String

No

The ID of the initial task.

To modify the translation result, specify the modified subtitle file in InputConfig.Subtitle, and specify the initial task ID in this field to regenerate the output video.

  • The initial task ID is the returned value of SpeechTranslationJobId (when SupportEditing is set to true).

  • The initial translation result can be retrieved from the returned SpeechTranslatedSubtitleURL (when SupportEditing is set to true). Modify the SRT file and pass its URL to InputConfig.Subtitle.
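The four CustomSrtType values described above differ only in which text lines of a subtitle entry hold the source and the target language. The following Python sketch illustrates that mapping for the text lines of one entry. It assumes that a bilingual entry has exactly two text lines, as in the examples above; the function name split_cue_text is hypothetical.

```python
def split_cue_text(text_lines, custom_srt_type):
    """Return (source_text, target_text) for the text lines of one SRT entry."""
    if custom_srt_type == "SourceSrt":
        return "\n".join(text_lines), None
    if custom_srt_type == "TargetSrt":
        return None, "\n".join(text_lines)
    if custom_srt_type in ("BilingualSrtSrcFirst", "BilingualSrtTgtFirst"):
        # Assumption of this sketch: one source line and one target line.
        if len(text_lines) != 2:
            raise ValueError("expected exactly two text lines in a bilingual entry")
        first, second = text_lines
        if custom_srt_type == "BilingualSrtSrcFirst":
            return first, second
        return second, first
    raise ValueError(f"unknown CustomSrtType: {custom_srt_type}")
```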

FaceTranslate

Parameter

Type

Required

Description

OcrArea

String

No

The OCR area. If this parameter is not specified, the subtitle area is automatically identified.

Valid values:

  • Auto: automatically identifies the subtitle area.

  • [x, y, width, height]: specifies the subtitle area in a single-layer array. You can specify only one area.

    • x: the ratio of the horizontal distance from the upper-left corner of the subtitle frame to the upper-left corner of the video. Valid values: [0,1]

    • y: the ratio of the vertical distance from the upper-left corner of the subtitle frame to the upper-left corner of the video. Valid values: [0,1]

    • width: the ratio of the width of the subtitle frame to the width of the video. Valid values: [0,1]

    • height: the ratio of the height of the subtitle frame to the height of the video. Valid values: [0,1]

CustomSrtType

String

No

The input subtitle type. This parameter is required if InputConfig.Subtitle is specified.

Valid values:

  • SourceSrt: monolingual subtitles in the source language

  • TargetSrt: monolingual subtitles in the target language

  • BilingualSrtSrcFirst: bilingual subtitles with the source line first and the target line below it

  • BilingualSrtTgtFirst: bilingual subtitles with the target line first and the source line below it

SubtitleTimeForce

Boolean

No

Specifies whether to translate only audio within the subtitle range. Default value: false.

This field is valid only when TextSource is set to OCR or OCR_ASR.

SubtitleConfig

String

No

The output subtitle effects. The parameters are consistent with SubtitleTrackClip in Timeline.

Enter a JSON string.

To generate an output video without visible subtitles, set SubtitleTrackClip.FontSize to 0.

SpeechDurationThres

Float

No

The lip-sync translation threshold, in seconds. Lip sync is not performed for speech segments shorter than the threshold. Default value: 1. The value cannot exceed the total duration of the source video.

FacialClarity

Float

No

The face clarity. Valid values: 0 to 1. Default value: 1, indicating the highest clarity. We recommend setting a smaller value for low-definition source videos.
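The two numeric FaceTranslate parameters have simple range constraints that can be checked before submission. The following Python sketch is illustrative only: the function name and the video_duration_s argument are assumptions, and rejecting negative thresholds is an extra check that this topic does not mention.

```python
def build_face_translate(video_duration_s, speech_duration_thres=1.0,
                         facial_clarity=1.0):
    """Return a FaceTranslate fragment after basic range checks."""
    if not 0 <= facial_clarity <= 1:
        raise ValueError("FacialClarity must be between 0 and 1")
    # Assumption: a negative threshold is treated as invalid.
    if speech_duration_thres < 0:
        raise ValueError("SpeechDurationThres must not be negative")
    if speech_duration_thres > video_duration_s:
        raise ValueError("SpeechDurationThres cannot exceed the video duration")
    return {
        "SpeechDurationThres": speech_duration_thres,
        "FacialClarity": facial_clarity,
    }


# A lower clarity value, as recommended for low-definition sources.
face_translate = build_face_translate(42.0, facial_clarity=0.6)
```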

CustomParameter

Valid value

Description

--add -dl

Specify this value when the input audio contains multiple languages. For example, in a Korean-to-French translation task, the input may include English segments. You can use this parameter to translate the English segments into French.

However, for Chinese-to-English tasks, when the input is a mix of Chinese and English, we recommend not adding this value, because the English input will be directly used in the output.

Sample code

Subtitle translation

Subtitle file

Important

If the source file is an SRT file, you are charged based on text translation.

{
  "InputConfig": {
    "Type": "Subtitle",
    "Subtitle": "https://******.oss-cn-shanghai.aliyuncs.com/ice-generated/4e1021a0720f71eeb755f6f7d6496302/snapshots/sprite/test.srt"
  },
  "EditingConfig": {
    "SourceLanguage": "zh",
    "TargetLanguage": "en"
  },
  "Title": "1735798516693.srt",
  "OutputConfig": {
    "MediaURL": "https://****.oss-cn-shanghai.aliyuncs.com/ice-generated/4e1021a0720f71eeb755f6f7d6496302/snapshots/sprite/new.srt"
  }
}

Video file

{
  "InputConfig": {
    "Type": "Video",
    "Video": "1628ae20c36******f6f7c77a6302"
  },
  "EditingConfig": {
    "SourceLanguage": "zh",
    "TargetLanguage": "en",
    "DetextArea": [
      [0, 0.64, 1, 0.13]],
    "BilingualSubtitle": false,
    "SubtitleTranslate": {
      "OcrArea": [0, 0.64, 1, 0.15],
      "SubtitleConfig": {
        "Type": "Text",
        "FontSize": 95,
        "FontColorOpacity": 1,
        "Color": "#ffffff",
        "X": 0.5,
        "Y": 0.686,
        "Angle": 0,
        "Spacing": 0,
        "TextWidth": 0.9,
        "Font": "Alibaba PuHuiTi",
        "FontColor": "#ffffff",
        "FontFace": {
          "Bold": false,
          "Italic": false,
          "Underline": false
        },
        "SizeRequestType": "RealDim",
        "SubtitleEffects": [],
        "LineSpacing": 0,
        "BorderStyle": 1,
        "Outline": 0,
        "Alignment": "Center"
      }
    },
    "SupportEditing": true,
    "NeedSpeechTranslate": false
  },
  "Title": "have a test",
  "OutputConfig": {
    "MediaURL": "https://*****.oss-cn-shanghai.aliyuncs.com/ice-generated/test.mp4"
  }
}

Video and subtitle files

{
  "InputConfig": {
    "Type": "Video",
    "Video": "4e92fa60c995*****6f7c77a6302",
    "Subtitle": "https://*****.oss-cn-shanghai.aliyuncs.com/ice-generated/4e1021a072****5f6f7d6496302/snapshots/sprite/test.srt"
  },
  "EditingConfig": {
    "SourceLanguage": "zh",
    "TargetLanguage": "en",
    "DetextArea": [[0, 0.64, 1, 0.14]],
    "BilingualSubtitle": false,
    "SubtitleTranslate": {
      "OcrArea": "Auto",
      "SubtitleConfig": {
        "Type": "Text",
        "FontSize": 95,
        "FontColorOpacity": 1,
        "Color": "#ffffff",
        "X": 0.5,
        "Y": 0.686,
        "Angle": 0,
        "Spacing": 0,
        "TextWidth": 0.9,
        "Font": "Alibaba PuHuiTi",
        "FontColor": "#ffffff",
        "FontFace": {
          "Bold": false,
          "Italic": false,
          "Underline": false
        },
        "SizeRequestType": "RealDim",
        "SubtitleEffects": [],
        "LineSpacing": 0,
        "BorderStyle": 1,
        "Outline": 0,
        "Alignment": "Center"
      }
    },
    "SupportEditing": true,
    "NeedSpeechTranslate": false
  },
  "Title": "1735898570421.mp4",
  "OutputConfig": {
    "MediaURL": "https://****.oss-cn-shanghai.aliyuncs.com/ice-generated/******/snapshots/sprite/1735898570421.mp4"
  }
}

Speech translation

Video file

{
  "InputConfig": {
    "Type": "Video",
    "Video": "1628ae20c36******8f6f7c77a6302"
  },
  "EditingConfig": {
    "SourceLanguage": "zh",
    "TargetLanguage": "en",
    "DetextArea": [[0, 0.64, 1, 0.15]],
    "SupportEditing": true,
    "BilingualSubtitle": false,
    "NeedSpeechTranslate": true,
    "SpeechTranslate": {
      "SubtitleTimeForce": false,
      "SubtitleConfig": {
        "Type": "Text",
        "FontSize": 95,
        "FontColorOpacity": 1,
        "Color": "#ffffff",
        "X": 0.5,
        "Y": 0.686,
        "Angle": 0,
        "Spacing": 0,
        "TextWidth": 0.9,
        "Font": "Alibaba PuHuiTi",
        "FontColor": "#ffffff",
        "FontFace": {
          "Bold": false,
          "Italic": false,
          "Underline": false
        },
        "SizeRequestType": "RealDim",
        "SubtitleEffects": [],
        "LineSpacing": 0,
        "BorderStyle": 1,
        "Outline": 0,
        "Alignment": "Center"
      }
    }
  },
  "Title": "have a test",
  "OutputConfig": {
    "MediaURL": "https://******.oss-cn-shanghai.aliyuncs.com/ice-generated/4e1021a0720f71eeb755f6f7d6496302/snapshots/sprite/1735798757385.mp4"
  }
}

Video and subtitle files

{
  "InputConfig": {
    "Type": "Video",
    "Video": "738d94a0ce87******af6f7c6696302",
    "Subtitle": "https://********.oss-cn-test.aliyuncs.com/test.srt"
  },
  "EditingConfig": {
    "SourceLanguage": "zh",
    "TargetLanguage": "en",
    "DetextArea": "Auto",
    "SupportEditing": true,
    "BilingualSubtitle": false,
    "NeedSpeechTranslate": true,
    "SpeechTranslate": {
      "SubtitleTimeForce": false,
      "SubtitleConfig": {
        "Type": "Text",
        "FontSize": 95,
        "FontColorOpacity": 1,
        "Color": "#ffffff",
        "X": 0.5,
        "Y": 0.686,
        "Angle": 0,
        "Spacing": 0,
        "TextWidth": 0.9,
        "Font": "Alibaba PuHuiTi",
        "FontColor": "#ffffff",
        "FontFace": {
          "Bold": false,
          "Italic": false,
          "Underline": false
        },
        "SizeRequestType": "RealDim",
        "SubtitleEffects": [],
        "LineSpacing": 0,
        "BorderStyle": 1,
        "Outline": 0,
        "Alignment": "Center"
      },
      "OcrArea": "Auto",
      "CustomSrtType": "SourceSrt"
    }
  },
  "Title": "1736485935837.mp4",
  "OutputConfig": {
    "MediaURL": "https://*****.oss-cn-***.aliyuncs.com/test.mp4"
  }
}

Audio file

{
  "InputConfig": {
    "Type": "Audio",
    "Audio": "2f552010c8d******e7f7f4586303"
  },
  "EditingConfig": {
    "SourceLanguage": "zh",
    "TargetLanguage": "en",
    "NeedSpeechTranslate": true
  },
  "Title": "have a test",
  "OutputConfig": {
    "MediaURL": "https://******.oss-cn-shanghai.aliyuncs.com/ice-generated/4e1021a0720f***f6f7d6496302/snapshots/sprite/test.wav"
  }
}

Lip-sync translation

{
  "InputConfig": {
    "Type": "Video",
    "Video": "1628ae20c36******8f6f7c77a6302"
  },
  "EditingConfig": {
    "SourceLanguage": "zh",
    "TargetLanguage": "en",
    "DetextArea": [[0, 0.64, 1, 0.15]],
    "SupportEditing": true,
    "BilingualSubtitle": false,
    "NeedFaceTranslate": true,
    "FaceTranslate": {
      "SubtitleConfig": {
        "Type": "Text",
        "FontSize": 95,
        "FontColorOpacity": 1,
        "Color": "#ffffff",
        "X": 0.5,
        "Y": 0.686,
        "Angle": 0,
        "Spacing": 0,
        "TextWidth": 0.9,
        "Font": "Alibaba PuHuiTi",
        "FontColor": "#ffffff",
        "FontFace": {
          "Bold": false,
          "Italic": false,
          "Underline": false
        },
        "SizeRequestType": "RealDim",
        "SubtitleEffects": [],
        "LineSpacing": 0,
        "BorderStyle": 1,
        "Outline": 0,
        "Alignment": "Center"
      },
      "SpeechDurationThres": 1,
      "FacialClarity": 1,
      "SubtitleTimeForce": false
    }
  },
  "Title": "have a test",
  "OutputConfig": {
    "MediaURL": "https://******.oss-cn-shanghai.aliyuncs.com/ice-generated/4e1021a0720f71eeb755f6f7d6496302/snapshots/sprite/1735798757385.mp4"
  }
}

GetSmartHandleJob

AiResult response parameters

Single target language

Category

Parameter

Type

Description

Common

EditingProjectId

String

The ID of the post-editing project.

MediaURL

String

The URL to the output media asset.

MediaId

String

The ID of the output media asset.

DetextVideoURL

String

The URL to the video file in which the subtitles are erased.

DetextVideoMediaId

String

The ID of the video file in which the subtitles are erased.

Subtitle translation

OriginalSubtitleMediaId

String

The ID of the source SRT file.

OriginalSubtitleURL

String

The URL to the source SRT file.

TranslatedSubtitleMediaId

String

The ID of the translated SRT file.

TranslatedSubtitleURL

String

The URL to the translated SRT file.

TranslatedText

String

The translated text.

TranslatedTextArray

String

The array of translated text (multiple subtitle inputs).

Speech translation

SpeechTranslatedSubtitleMediaId

String

The ID of the SRT file for subtitle display in the output video.

SpeechTranslatedSubtitleURL

String

The URL to the SRT file for subtitle display in the output video.

SpeechTranslatedSubtitleURLSigned

String

The signed URL to the SRT file for subtitle display in the output video.

SpeechTranslatedSubtitleMediaIdForFix

String

The ID of the SRT file for translation correction.

SpeechTranslatedSubtitleURLForFix

String

The URL to the SRT file for translation correction.

SpeechBilingualSubtitleMediaId

String

The ID of the bilingual SRT file.

SpeechBilingualSubtitleURL

String

The URL to the bilingual SRT file.

SpeechTranslationJobId

String

The ID of the speech translation task for post-editing.

TranslatedAudioMediaId

String

The ID of the translated audio file.

TranslatedAudioMediaURL

String

The URL to the translated audio file.

Lip-sync translation

FaceTranslationMediaId

String

The ID of the lip-synced video file.

Sample response

{
  "SpeechTranslatedSubtitleURL": "http://[your-bucket].oss-[your-region-id].aliyuncs.com/test1.srt",
  "MediaId": "****9cb015bb71f0b96ff7f6d449****",
  "TranslatedAudioMediaId": "****df80e79d71efa44ae7f6c449****",
  "SpeechTranslatedSubtitleURLForFix": "http://[your-bucket].oss-[your-region-id].aliyuncs.com/test2.srt",
  "EditingProjectId": "****febec2bf4377b923cf5b1742****",
  "DetextVideoURL": "http://[your-bucket].oss-[your-region-id].aliyuncs.com/test3.mp4",
  "FaceTranslationMediaId": "****138015bb71f0b96ff7f6d449****",
  "SpeechTranslationJobId": "****0c827a664b4f88b33d8d46e3****"
}

Multiple target languages

Parameter

Type

Description

DetextVideoURL

String

The URL to the subtitle-removed video.

DetextVideoMediaId

String

The ID of the subtitle-removed video.

VideoTranslationAiResultMap

JSON

  • The translation outputs in different target languages.

  • JSON fields:

    • key: the target language code, such as en.

    • value: the translation in the target language.

VideoTranslationAiResultMap

Category

Parameter

Type

Description

Common

MediaURL

String

The URL to the output media asset in a target language.

MediaId

String

The ID of the output media asset in a target language.

Subtitle translation

OriginalSubtitleMediaId

String

The ID of the source SRT file.

OriginalSubtitleURL

String

The URL to the source SRT file.

TranslatedSubtitleMediaId

String

The ID of the translated SRT file.

TranslatedSubtitleURL

String

The URL to the translated SRT file.

TranslatedText

String

The translated text.

TranslatedTextArray

String

The array of translated text (multiple subtitle entries).

Speech translation

SpeechTranslatedSubtitleMediaId

String

The ID of the SRT file for subtitle display in the output video.

SpeechTranslatedSubtitleURL

String

The URL to the SRT file for subtitle display in the output video.

SpeechTranslatedSubtitleURLSigned

String

The signed URL to the SRT file for subtitle display in the output video.

SpeechTranslatedSubtitleMediaIdForFix

String

The ID of the SRT file for translation correction.

SpeechTranslatedSubtitleURLForFix

String

The signed URL to the SRT file for translation correction.

SpeechBilingualSubtitleMediaId

String

The ID of the bilingual output.

SpeechBilingualSubtitleURL

String

The URL to the bilingual output.

SpeechTranslationJobId

String

The ID of the speech translation task for post-editing.

TranslatedAudioMediaId

String

The ID of the translated audio file.

TranslatedAudioMediaURL

String

The URL to the translated audio file.

Lip-sync translation

FaceTranslationMediaId

String

The ID of the lip-synced video file.

Sample response

{
  "VideoTranslationAiResultMap": {
    "en": {
      "SpeechTranslatedSubtitleURL": "http://[your-bucket].oss-[your-region-id].aliyuncs.com/test1.srt",
      "MediaId": "****4d00087171f0bf81f6f7d44b****",
      "TranslatedAudioMediaId": "****df80e79d71efa44ae7f6c449****",
      "SpeechTranslatedSubtitleURLForFix": "http://[your-bucket].oss-[your-region-id].aliyuncs.com/test2.srt",
      "EditingProjectId": "****983759c045c387146888713d****",
      "SpeechTranslationJobId": "****33cc65164f13aace55a0f0d3****"
    },
    "es": {
      "SpeechTranslatedSubtitleURL": "http://[your-bucket].oss-[your-region-id].aliyuncs.com/test3.srt",
      "MediaId": "****cec0087171f09033f7f6d449****",
      "TranslatedAudioMediaId": "****a7f0085671f0bf81f6f7d44b****",
      "SpeechTranslatedSubtitleURLForFix": "http://[your-bucket].oss-[your-region-id].aliyuncs.com/test4.srt",
      "EditingProjectId": "****d16f4a874c4898d254d4b68e****",
      "SpeechTranslationJobId": "****f705193c404e9bec19859a11****"
    }
  },
  "DetextVideoURL": "http://[your-bucket].oss-[your-region-id].aliyuncs.com/test5.mp4"
}
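The single-language and multi-language responses have different shapes, so client code may want to normalize them before reading fields such as MediaId. The following Python sketch shows one way to do that. It assumes that the result may arrive either as a parsed object or as a JSON string; the function name results_by_language and the sample IDs are hypothetical.

```python
import json


def results_by_language(ai_result, single_language=None):
    """Return {language_code: result_dict} for either response shape."""
    if isinstance(ai_result, str):
        ai_result = json.loads(ai_result)
    result_map = ai_result.get("VideoTranslationAiResultMap")
    if result_map is not None:
        # Multiple target languages: keys are language codes such as "en".
        return dict(result_map)
    if single_language is None:
        raise ValueError("single_language is needed for a single-language result")
    return {single_language: ai_result}


# Placeholder response with made-up IDs, for illustration only.
multi = {
    "VideoTranslationAiResultMap": {
        "en": {"MediaId": "id-en"},
        "es": {"MediaId": "id-es"},
    },
    "DetextVideoURL": "http://example-bucket.example.com/test5.mp4",
}
media_ids = {lang: r.get("MediaId") for lang, r in results_by_language(multi).items()}
```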

Scenario examples

Speech translation with post-editing

You can manually correct the generated SRT file and resubmit it for revision.

Important

Lip-sync translation currently supports only corrections to the voice output. Post-editing of the lip movements is not available.

Procedure

Submit a speech translation task. If post-editing is required, set SupportEditing to true.

Example:

{
    "InputConfig": {
        "Type": "Video",
        "Video": "*****a0052ff71efbfd4e7e6c66*****"
    },
    "OutputConfig": {
        "MediaURL": "https://your-bucket.oss-cn-shanghai.aliyuncs.com/video.mp4"
    },
    "EditingConfig": {
        "SourceLanguage": "zh",
        "TargetLanguage": "en",
        "SupportEditing": true,
        "NeedSpeechTranslate": true
    }
}

The following response is returned:

{
    "MediaId": "*****d306b6d71efbf98f6f7f55*****",
    "TranslatedAudioMediaId": "*****d306b6d71efbf98f6f7f5*****",
    "SpeechTranslatedSubtitleURL": "http://your-bucket.oss-cn-shanghai.aliyuncs.com/video_subtitle_asr_en.srt",
    "SpeechTranslatedSubtitleURLSigned": "http://your-bucket.oss-cn-shanghai.aliyuncs.com/***.srt",
    "SpeechTranslatedSubtitleURLForFix": "http://your-bucket.oss-cn-shanghai.aliyuncs.com/***.srt",
    "SpeechTranslationJobId": "*****74f329d4c03b63e7f7dac8*****"
}

Where:

  • MediaId: the ID of the output media asset.

  • TranslatedAudioMediaId: the ID of the translated audio file.

  • SpeechTranslatedSubtitleURL: the URL to the translated SRT file for subtitle display in the output video.

  • SpeechTranslatedSubtitleURLSigned: the signed URL to the translated SRT file for subtitle display in the output video.

  • SpeechTranslatedSubtitleURLForFix: the signed URL to the translated SRT file for translation correction.

  • SpeechTranslationJobId: the ID of the speech translation task. Enter the returned value for OriginalJobId.

For post-editing, download the SRT file from SpeechTranslatedSubtitleURLForFix and modify the content. Then, resubmit the speech translation task. The system will regenerate the output based on the modified subtitle content.

Note

You can add marks to the SRT file downloaded from SpeechTranslatedSubtitleURLForFix to control how each line is handled during post-editing. For details, see Post-editing marks.

When you resubmit the task, required parameters include the source video, modified SRT file, ID of the initial task, and subtitle style settings.

Example:

{
  "InputConfig": {
    "Type": "Video",
    "Video": "*****a0052ff71efbfd4e7e6c66*****",
    "Subtitle": "https://your-bucket.oss-cn-shanghai.aliyuncs.com/new_subtitle.srt"
  },
  "OutputConfig": {
    "MediaURL": "https://your-bucket.oss-cn-shanghai.aliyuncs.com/video.mp4"
  },
  "EditingConfig": {
    "SourceLanguage": "zh",
    "TargetLanguage": "en",
    "SupportEditing": true,
    "NeedSpeechTranslate": true,
    "SpeechTranslate": {
      "OriginalJobId": "*****b5d5d604916bb898b3066*****",
      "SubtitleConfig": {
        "Type": "Text",
        "FontSize": 95,
        "FontColorOpacity": 1,
        "Color": "#ffffff",
        "X": 0.5,
        "Y": 0.686,
        "Angle": 0,
        "Spacing": 0,
        "TextWidth": 0.9,
        "Font": "Alibaba PuHuiTi",
        "FontColor": "#ffffff",
        "FontFace": {
          "Bold": false,
          "Italic": false,
          "Underline": false
        },
        "SizeRequestType": "RealDim",
        "SubtitleEffects": [],
        "LineSpacing": 0,
        "BorderStyle": 1,
        "Outline": 0,
        "Alignment": "Center"
      }
    }
  }
}
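The resubmission request can be derived from the first request and its result. The following Python sketch copies the original request, points InputConfig.Subtitle at the edited SRT file, and sets OriginalJobId from the returned SpeechTranslationJobId. The function name and all placeholder values are assumptions made for illustration.

```python
import copy


def build_post_edit_request(original_request, first_result, edited_srt_url,
                            subtitle_config=None):
    """Return a new request that regenerates the output from an edited SRT."""
    request = copy.deepcopy(original_request)  # leave the original untouched
    request["InputConfig"]["Subtitle"] = edited_srt_url
    speech = request["EditingConfig"].setdefault("SpeechTranslate", {})
    # The initial task ID is the SpeechTranslationJobId of the first result.
    speech["OriginalJobId"] = first_result["SpeechTranslationJobId"]
    if subtitle_config is not None:
        speech["SubtitleConfig"] = subtitle_config
    return request


# Placeholder values, for illustration only.
first_request = {
    "InputConfig": {"Type": "Video", "Video": "example-media-id"},
    "OutputConfig": {"MediaURL": "https://example-bucket.example.com/video.mp4"},
    "EditingConfig": {
        "SourceLanguage": "zh",
        "TargetLanguage": "en",
        "SupportEditing": True,
        "NeedSpeechTranslate": True,
    },
}
first_result = {"SpeechTranslationJobId": "example-job-id"}
second_request = build_post_edit_request(
    first_request, first_result,
    "https://example-bucket.example.com/new_subtitle.srt",
    subtitle_config={"Type": "Text", "FontSize": 95},
)
```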

Post-editing marks

Important

Each subtitle line must contain text, a post-editing mark, or both. If both the text and the mark are empty (a line that contains only spaces or tab characters counts as empty), post-editing may fail. To display an empty line, use a mark to specify how the corresponding audio is handled. For example, to stop a line of text from being displayed without changing the audio, add the <--onlymodifytext> mark to the line.

Mark

Command

Description

<--copy>

Use original audio

Replace the audio within the timestamp range with the original audio from the source video.

<--fsttran>

Use initial translation result

Replace the audio within the timestamp range with the audio corresponding to the initial translation result.

<--mute>

Mute the speaker

Mute the speaker’s voice within the timestamp range.

<--mute_all>

Mute all

Mute all audio within the timestamp range.

<--onlymodifytext>

Modify subtitle text only

Modify displayed subtitle text without changing the corresponding audio.
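Before you upload an edited SRT file, it can be useful to check each subtitle line against the marks in the table above. The following Python sketch separates a leading mark from the text and flags lines in which both are empty. It assumes that a mark appears at the very start of a line; the function names are hypothetical.

```python
import re

MARKS = ("copy", "fsttran", "mute", "mute_all", "onlymodifytext")
MARK_RE = re.compile(r"^<--(%s)>" % "|".join(MARKS))


def split_mark(line):
    """Return (mark, text) for one subtitle line; mark is None if absent."""
    match = MARK_RE.match(line)
    if match is None:
        return None, line
    return match.group(1), line[match.end():]


def is_valid_line(line):
    """A line is valid unless both its mark and its text are empty."""
    mark, text = split_mark(line)
    return mark is not None or text.strip() != ""
```

For example, split_mark("<--copy>Hello everyone,") returns ("copy", "Hello everyone,"), and is_valid_line returns False for a line that contains only whitespace.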

SRT file example:

1
00:00:01,000 --> 00:00:03,000
大家好,
<--copy>Hello everyone,

2
00:00:03,500 --> 00:00:06,000
欢迎来到今天的讲座。
<--fsttran>welcome to today's lecture.

3
00:00:07,000 --> 00:00:10,000
我们将讨论未来的技术趋势。
<--mute>We'll discuss the future technology trends.

4
00:00:11,000 --> 00:00:14,000
请大家准备好提问。
<--mute_all>Please get ready to ask questions.

5
00:00:15,000 --> 00:00:18,000
现在让我们开始吧。
<--onlymodifytext>Now, let's get started.

Reference

Billing

For billing information, see Billing of video translation.