ApsaraVideo Media Processing: Parameters for intelligent production API operations

Last Updated: Mar 28, 2024

This topic describes the JobParams and Output request parameters of the SubmitIProductionJob operation, and the Job response parameter of the QueryIProductionJob operation.

CaptionExtraction

Parameter

Type

Description

Output

STRING

If the JobParams parameter is configured to separate Chinese and English, {resultType} placeholders are supported in the output file path to specify whether the output caption file is in Chinese or English. zh indicates Chinese and en indicates English.
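
As a rough illustration of how the {resultType} placeholder might expand, the following Python sketch lists the caption paths to expect for a given Output template. The function name and the sample template are hypothetical, and the literal string substitution is an assumption about how the service resolves the placeholder.

```python
def expected_caption_paths(output_template: str, sep: bool) -> list[str]:
    """Return the caption file paths implied by an Output template.

    Assumption: the service replaces "{resultType}" literally with
    "zh" (Chinese) and "en" (English) when sep is enabled.
    """
    placeholder = "{resultType}"
    if sep and placeholder in output_template:
        return [output_template.replace(placeholder, lang) for lang in ("zh", "en")]
    # Without separation, a single caption file is written to the path as given.
    return [output_template]


# Hypothetical template; any path containing the placeholder works the same way.
paths = expected_caption_paths("captions/demo-{resultType}.srt", sep=True)
print(paths)
```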

Parameter description of JobParams

Parameter

Type

Required

Description

fps

INT

No

The sampling frame rate. This parameter is optional. The value is an integer. Valid values: [2,10]. Default value: 5.

roi

LIST

No

The region of interest.

  • If you specify a region of interest, only the text within the region is extracted, and the text outside the region is ignored. By default, if you do not specify this parameter, the text within the bottom quarter of the video image is extracted.

  • Set the value in the following format: [[top, bottom], [left, right]].

  • Default value: N/A.

sep

BOOLEAN

No

Specifies whether to generate separate Chinese and English SRT files. This parameter is optional. Default value: False.

formatter

STRING

No

The format string of the SRT caption. Example: "{\an8}". This parameter is optional. Default value: N/A.
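
The table above can be turned into a small validation helper. The sketch below is illustrative only: the function name is hypothetical, it assumes JobParams is passed as a JSON-encoded string, and the roi numbers are placeholders because this topic does not state the unit of the roi coordinates.

```python
import json


def build_caption_job_params(fps=5, roi=None, sep=False, formatter=None):
    """Serialize CaptionExtraction JobParams to a JSON string (assumed encoding)."""
    # fps must be an integer in [2, 10]; bool is rejected by the range check.
    if not isinstance(fps, int) or not 2 <= fps <= 10:
        raise ValueError("fps must be an integer in [2, 10]")
    params = {"fps": fps, "sep": sep}
    if roi is not None:
        # Expected shape: [[top, bottom], [left, right]].
        if len(roi) != 2 or any(len(pair) != 2 for pair in roi):
            raise ValueError("roi must look like [[top, bottom], [left, right]]")
        params["roi"] = roi
    if formatter is not None:
        params["formatter"] = formatter
    return json.dumps(params)


# Placeholder roi values; check the unit your account expects before use.
job_params = build_caption_job_params(fps=8, roi=[[0.75, 1.0], [0.0, 1.0]], sep=True)
print(job_params)
```

Optional parameters that are left unset are omitted from the payload, so the service defaults listed in the table apply.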

Callback format

JSON format

{
  "Code":"Success",
  "Details":[],
  "FunctionName":"CaptionExtraction",
  "JobId":"39f8e0bc005e4f309379701645f4****",
  "Message":"success",
  "State":"Success"
}

Parameter description of Job

Parameter

Type

Description

Result

STRING

The detailed information about the job result. Example of success result information: {"Code":"Success","Message":"Successful.","Data":"{\"result\":[{\"file\":\"captionextraction/b48d02b58e9b6a0d1c13271bcf9aa6d7-161121379****.srt\"}]}"}.
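
Note that the Data field in the example is itself a JSON-encoded string, so it has to be decoded a second time. A minimal Python sketch of that two-step parse follows; the function name is hypothetical and the input is the sample string shown above.

```python
import json

# Sample Result string copied from the success example above.
result = (
    r'{"Code":"Success","Message":"Successful.",'
    r'"Data":"{\"result\":[{\"file\":'
    r'\"captionextraction/b48d02b58e9b6a0d1c13271bcf9aa6d7-161121379****.srt\"}]}"}'
)


def result_files(result_text: str) -> list[str]:
    """Return the output file paths listed in a Result string."""
    outer = json.loads(result_text)
    if outer.get("Code") != "Success":
        raise RuntimeError(outer.get("Message", "job failed"))
    inner = json.loads(outer["Data"])  # Data is a JSON document inside a string
    return [item["file"] for item in inner["result"]]


print(result_files(result))
```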

VideoGreenScreenMatting

Parameter description of JobParams

Parameter

Type

Required

Description

bgimage

STRING

No

The background image for replacement. Example: http://example-image-****.example-location.aliyuncs.com/example/example.jpg.

  • If you specify this parameter, an MP4 video whose background image is replaced is returned.

  • If you do not specify this parameter, a WebM video with alpha channels is returned.

Callback format

JSON format

{
  "Code":"Success",
  "Details":[],"FunctionName":"VideoGreenScreenMatting",
  "JobId":"39f8e0bc005e4f309379701645f4****",
  "Message":"success",
  "State":"Success",
  "Type":"IProduction"
}
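
A callback receiver might check a message of this shape as sketched below. The function name and the error handling are hypothetical; the sketch assumes only the fields shown in the sample and treats any Code or State other than Success as a failure.

```python
import json


def handle_callback(body: str) -> str:
    """Return the JobId of a successful callback, or raise if the job failed."""
    message = json.loads(body)
    if message.get("Code") != "Success" or message.get("State") != "Success":
        raise RuntimeError(
            f"job {message.get('JobId')} failed: {message.get('Message')}"
        )
    return message["JobId"]


# Sample callback body copied from the example above.
sample = (
    '{"Code":"Success","Details":[],"FunctionName":"VideoGreenScreenMatting",'
    '"JobId":"39f8e0bc005e4f309379701645f4****","Message":"success",'
    '"State":"Success","Type":"IProduction"}'
)
print(handle_callback(sample))
```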

Parameter description of Job

Parameter

Type

Description

Result

STRING

The detailed information about the job result. Example of success result information: {"Code":"Success","Message":"Successful.","Data":"{\"result\":[{\"file\":\"videogreenscreenmatting/16e6bc5ca802e12429d082010164dba3-160275535****_matting.mp4\"}]}"}.

MusicSegmentDetect

Parameter description of JobParams

Parameter

Type

Required

Description

None

None

None

None

Callback format

JSON format

{
  "Code":"Success",
  "Details":[],
  "FunctionName":"MusicSegmentDetect",
  "JobId":"39f8e0bc005e4f309379701645f4****",
  "Message":"success",
  "State":"Success",
  "Type":"IProduction"
}

Parameter description of Job

Parameter

Type

Description

Result

STRING

The detailed information about the job result. Example of success result information: {"Code":"Success","Data":"{\"result\":[{\"start\":39.32,\"end\":63.85,\"title\":\"Chorus\"},{\"start\":86.69,\"end\":114.45,\"title\":\"Chorus\"},{\"start\":135.75,\"end\":160.27,\"title\":\"Chorus\"}]}","Message":"Successful."}.
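
To show how the detected segments could be consumed, the following sketch decodes the sample Result above and computes the length of each segment. This topic does not state the unit of start and end, so treating them as seconds is an assumption.

```python
import json

# Sample Result string copied from the success example above.
result = (
    r'{"Code":"Success","Data":"{\"result\":['
    r'{\"start\":39.32,\"end\":63.85,\"title\":\"Chorus\"},'
    r'{\"start\":86.69,\"end\":114.45,\"title\":\"Chorus\"},'
    r'{\"start\":135.75,\"end\":160.27,\"title\":\"Chorus\"}]}",'
    r'"Message":"Successful."}'
)

# Data is a JSON-encoded string, so it is decoded separately.
segments = json.loads(json.loads(result)["Data"])["result"]

# Length of each segment, rounded to hide floating-point noise.
durations = [round(seg["end"] - seg["start"], 2) for seg in segments]

for seg, duration in zip(segments, durations):
    print(f'{seg["title"]}: {seg["start"]} to {seg["end"]} (length {duration})')
```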

VideoDetext

Parameter description of JobParams

Parameter

Type

Required

Description

Text

LIST

No

The location of a caption box that you want to remove. A maximum of two caption boxes are supported. Example: [[bx1, by1, bw1, bh1], [bx2, by2, bw2, bh2]].

Note

The location of a caption box must be specified by bx, by, bw, and bh at the same time.

  • bx: The ratio of the normalized x-coordinate of the upper-left corner of the caption box to the video width. Example: 0.1.

  • by: The ratio of the normalized y-coordinate of the upper-left corner of the caption box to the video height. Example: 0.0.

  • bw: The ratio of the normalized width of the caption box to the video width. Example: 0.3.

  • bh: The ratio of the normalized height of the caption box to the video height. Example: 0.2.

LimitRegion

LIST

No

The area in which you want to remove captions. The system detects the captions within the specified area and removes the detected captions. This parameter has a lower priority than the Text parameter that directly specifies the location of caption boxes to be removed. Example: [[0, 0.6, 1, 0.4]]. In this example, the system detects the captions within the bottom 40% of the video image and removes the detected captions.
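
If caption boxes are measured in pixels, they need to be converted to the normalized [bx, by, bw, bh] form first. The sketch below shows one way to do that; both function names are hypothetical, and passing JobParams as a JSON-encoded string is an assumption.

```python
import json


def normalize_box(x, y, w, h, video_w, video_h):
    """Convert a pixel box (upper-left x, y, width, height) to [bx, by, bw, bh]."""
    if x < 0 or y < 0 or w <= 0 or h <= 0 or x + w > video_w or y + h > video_h:
        raise ValueError("box must lie inside the video frame")
    # Ratios relative to the video width and height, rounded for readability.
    return [round(v, 4) for v in (x / video_w, y / video_h, w / video_w, h / video_h)]


def build_detext_params(boxes=None, limit_region=None):
    """Serialize VideoDetext JobParams (assumed to be a JSON string)."""
    params = {}
    if boxes:
        if len(boxes) > 2:
            raise ValueError("at most two caption boxes are supported")
        params["Text"] = boxes
    if limit_region:
        params["LimitRegion"] = limit_region
    return json.dumps(params)


# A 576x216 box at the top of a 1920x1080 frame, 192 px from the left edge.
box = normalize_box(192, 0, 576, 216, 1920, 1080)
print(box)
print(build_detext_params(boxes=[box], limit_region=[[0, 0.6, 1, 0.4]]))
```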

Callback format

JSON format

{
  "Code":"Success",
  "Details":[], 
  "FunctionName":"VideoDetext",
  "JobId":"39f8e0bc005e4f309379701645f4****",
  "Message":"success",
  "State":"Success",
  "Type":"IProduction"
}

Parameter description of Job

Parameter

Type

Description

Result

STRING

The detailed information about the job result. Example of success result information: {"Details":[],"Message":"success","Code":"Success"}.

VideoH2V

Parameter description of JobParams

Parameter

Type

Required

Description

None

None

None

None

Callback format

JSON format

{
  "Code":"Success",
  "Details":[],
  "FunctionName":"VideoH2V",
  "JobId":"39f8e0bc005e4f309379701645f4****",
  "Message":"success",
  "State":"Success",
  "Type":"IProduction"
}

Parameter description of Job

Parameter

Type

Description

Result

STRING

The detailed information about the job result. Example of success result information: {"Details":[],"Message":"success","Code":"Success"}.

VideoDelogo

Parameter description of JobParams

Parameter

Type

Required

Description

Logo

STRING

No

The position of a logo that you want to remove. Set the value in the format of [xmin, ymin, width, height]. You can remove up to two logos at a time. Example: [[0, 0, 0.3, 0.3], [0.7, 0, 0.3, 0.3]].

Callback format

JSON format

{
  "Code":"Success",
  "Details":[],
  "FunctionName":"VideoDelogo",
  "JobId":"39f8e0bc005e4f309379701645f4****",
  "Message":"success",
  "State":"Success",
  "Type":"IProduction"
}

Parameter description of Job

Parameter

Type

Description

Result

STRING

The detailed information about the job result. Example of success result information: {"Details":[],"Message":"success","Code":"Success"}.

Cover

Parameter description of JobParams

Parameter

Type

Required

Description

Model

STRING

No

The smart thumbnail model. A still thumbnail is generated if this parameter is left empty, and an animated thumbnail is generated if this parameter is set to gif.

Callback format

JSON format

{
  "Code":"Success",
  "Details":[],
  "FunctionName":"Cover",
  "JobId":"39f8e0bc005e4f309379701645f4****",
  "Message":"success",
  "State":"Success",
  "Type":"IProduction"
}

Parameter description of Job

Parameter

Type

Description

Result

STRING

The detailed information about the job result. Example of success result information: {"Message":"success","Data":"[{\"Score\":8.270855992569906,\"Time\":\"28278.25\",\"Url\":\"cover/test-00001.jpg\"},{\"Score\":7.474117489692728,\"Time\":\"25942.583333333332\",\"Url\":\"cover/test-00002.jpg\"}]","Code":"Success"}. In this example, Score indicates the confidence of the thumbnail result, Time indicates the timestamp of the thumbnail frame, and Url indicates the URL of the thumbnail.
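
As an illustration, the sketch below decodes the sample object that carries the Data field and picks the candidate with the highest Score. Choosing by highest Score is one possible policy rather than something this topic prescribes, and Time is read as a number without assuming its unit.

```python
import json

# Sample Result string copied from the success example above.
result = (
    r'{"Message":"success","Data":"['
    r'{\"Score\":8.270855992569906,\"Time\":\"28278.25\",\"Url\":\"cover/test-00001.jpg\"},'
    r'{\"Score\":7.474117489692728,\"Time\":\"25942.583333333332\",\"Url\":\"cover/test-00002.jpg\"}'
    r']","Code":"Success"}'
)

# For Cover jobs, Data decodes to a JSON array rather than an object.
candidates = json.loads(json.loads(result)["Data"])

# Pick the thumbnail with the highest confidence.
best = max(candidates, key=lambda c: c["Score"])
print(best["Url"], float(best["Time"]))
```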

VideoClip

Parameter description of JobParams

Parameter

Type

Required

Description

None

None

None

None

Callback format

JSON format

{
  "Code":"Success",
  "Details":[],
  "FunctionName":"VideoClip",
  "JobId":"39f8e0bc005e4f309379701645f4****",
  "Message":"success",
  "State":"Success",
  "Type":"IProduction"
}

Parameter description of Job

Parameter

Type

Description

Result

STRING

The detailed information about the job result. Example of success result information:

{"Code":"Success","Message":"Successful.","Data":"{\"result\":[{\"file\":\"videoclip/16e6bc5ca802e12429d082010164****-1602755353502-origin.mp4\"}]}"}

ImageH2V

Parameter description of JobParams

Parameter

Type

Required

Description

None

None

None

None

Callback format

JSON format

{
  "Code":"Success",
  "Details":[],
  "FunctionName":"ImageH2V",
  "JobId":"39f8e0bc005e4f309379701645f4****",
  "Message":"success",
  "State":"Success",
  "Type":"IProduction"
}

Parameter description of Job

Parameter

Type

Description

Result

STRING

The detailed information about the job result. Example of success result information: {"Details":[],"Message":"success","Code":"Success"}.

ImageDelogo

Parameter description of JobParams

Parameter

Type

Required

Description

None

None

None

None

Callback format

JSON format

{
  "Code":"Success",
  "Details":[],
  "FunctionName":"ImageDelogo",
  "JobId":"39f8e0bc005e4f309379701645f4****",
  "Message":"success",
  "State":"Success",
  "Type":"IProduction"
}

Parameter description of Job

Parameter

Type

Description

Result

STRING

The detailed information about the job result. Example of success result information: {"Details":[],"Message":"success","Code":"Success"}.

AudioBeatDetection

Parameter description of JobParams

Parameter

Type

Required

Description

None

None

None

None

Callback format

JSON format

{
  "Code":"Success",
  "Details":[],
  "FunctionName":"AudioBeatDetection",
  "JobId":"39f8e0bc005e4f309379701645f4****",
  "Message":"success",
  "State":"Success",
  "Type":"IProduction"
}

Parameter description of Job

Parameter

Type

Description

Result

STRING

The detailed information about the job result. Example of success result information: {"Code":"Success","Data":"{\"result\":[{\"file\":\"detectresult/normalvideo-161225931****.txt\"}]}","Message":"Successful."}.

AudioMixing

Parameter description of JobParams

Parameter

Type

Required

Description

inputs

STRING

No

The list of URLs of the audio track files to be mixed. You can specify only one URL. Example: {"file":"http://example-bucket-****.oss-cn-shanghai.aliyuncs.com/2.mp4"}.

Callback format

JSON format

{
  "Code":"Success",
  "FunctionName":"AudioMixing",
  "JobId":"158688059d8443a68b78a65e55b3****",
  "Message":"Successful.",
  "State":"Success",
  "Type":"IProduction",
  "UserData":"test"
}

Parameter description of Job

Parameter

Type

Description

Result

STRING

The detailed information about the job result. Example of success result information: {"Message":"Successful.","Data":"{\"result\":[{\"file\":\"audiomix/alibaba-161283935****-origin.mp4\"}]}","Code":"Success"}.

ImageCartoonize

Parameter description of Output

Parameter

Type

Description

Output

STRING

{resultType} placeholders are supported in the path to distinguish whether the result file is a cartoonized image or the original image. result indicates a cartoonized image, and origin indicates the original image.

Callback format

JSON format

{
 "Code":"Success",
 "Details":[],
 "FunctionName":"ImageCartoonize",
 "JobId":"39f8e0bc005e4f309379701645f4744c",
 "Message":"success",
 "State":"Success",
 "Type":"IProduction"
}

Parameter description of Job

Parameter

Type

Description

Result

STRING

The detailed information about the job result. Example of success result information: {"Code":"Success","Data":"{\"result\":[{\"file\":\"iproduction/test-result.jpg\"},{\"file\":\"iproduction/test-origin.jpg\"}]}","Message":"Successful."}.

AudioQualityAssessment

Parameter description of Job

Parameter

Type

Description

Result

STRING

The detailed information about the job result. Example of success result information:

{
  "Code" : "Success",
  "Data" : "{
    \"result\":[{
        \"Discontinuity\":\"Good\",
        \"Loudness\":\"Excellent\",
        \"Worst MOS(0-5)\":\"0.38\",
        \"Discontinuity(0-5)\":\"3.52\",
        \"Speech Ratio\":\"48.55\",
        \"Loudness(0-5)\":\"4.91\",
        \"Worst Discontinuity(0-5)\":\"0.88\",
        \"Worst Coloration(0-5)\":\"0.42\",
        \"Channel\":\"1\",
        \"Coloration(0-5)\":\"0.99\",
        \"Bad Mute Ratio(%)\":\"0.0\",
        \"Time\":\"2022-12-02 16:14:06\",
        \"Noisiness(0-5)\":\"3.28\",
        \"MOS\":\"Poor\",
        \"Worst Noisiness(0-5)\":\"0.91\",
        \"Double Talk Ratio(%)\":\"19.23\",
        \"Input\":\"/home/admin/algo/quality****/example.wav\",
        \"Total Duration\":\"42.78\",
        \"Noisiness\":\"Good\",
        \"Tag\":\"Valid\",
        \"MOS(0-5)\":\"1.01\",
        \"Loudness(-90dB-0dB)\":\"-0.59\",
        \"Coloration\":\"Bad\",
        \"Saturated Ratio(%)\":\"37.55\"
    },
    {
        \"Discontinuity\":\"Fair\",
        \"Loudness\":\"Excellent\",
        \"Worst MOS(0-5)\":\"0.65\",
        \"Discontinuity(0-5)\":\"2.45\",
        \"Speech Ratio\":\"41.68\",
        \"Loudness(0-5)\":\"4.52\",
        \"Worst Discontinuity(0-5)\":\"0.66\",
        \"Worst Coloration(0-5)\":\"0.72\",
        \"Channel\":\"2\",
        \"Coloration(0-5)\":\"2.34\",
        \"Bad Mute Ratio(%)\":\"0.0\",
        \"Time\":\"2022-12-02 16:14:06\",
        \"Noisiness(0-5)\":\"2.53\",
        \"MOS\":\"Poor\",
        \"Worst Noisiness(0-5)\":\"0.67\",
        \"Double Talk Ratio(%)\":\"25.93\",
        \"Input\":\"/home/admin/algo/quality****/example.wav\",
        \"Total Duration\":\"42.78\",
        \"Noisiness\":\"Fair\",
        \"Tag\":\"Valid\",
        \"MOS(0-5)\":\"1.69\",
        \"Loudness(-90dB-0dB)\":\"-4.82\",
        \"Coloration\":\"Fair\",
        \"Saturated Ratio(%)\":\"0.0\"
    }]
  }",
  "Message" : "Successful."
}

Parameters

Parameter

Description

Time

The timestamp generated when the input file was scored.

Input

The name of the input file.

Total Duration

The duration of the input file. Unit: seconds.

Speech Ratio

The ratio of the duration of the audio data to the duration of the input file. Valid values: [0,100]. Unit: percentage.

Tag

The tag for the input file, which is used to indicate the validity of the detection. Valid values:

  • Valid: The detection is valid, which indicates that subsequent key metrics and the mean opinion score (MOS) are valid.

  • File too Short: The duration of the input file is less than 2 seconds.

  • Mute: The input file does not contain audio data.

  • Voice too Short: The duration of the audio data is less than 2 seconds.

Note

  • The preceding four tags are mutually exclusive.

  • If the tag for an input file is one of the last three tags, the MOS, Discontinuity, Coloration, and Noisiness parameters are meaningless for the file and the parameter values are 0.

MOS(0-5)

The MOS of the input file, which describes the quality of the audio data. Valid values: [0,5].

MOS

The description of the MOS. Valid values:

  • [4,5]: The quality of the audio data is excellent.

  • [3,4): The quality of the audio data is good.

  • [2,3): The quality of the audio data is fair.

  • [1,2): The quality of the audio data is poor.

  • [0,1): The quality of the audio data is bad.

Discontinuity(0-5)

The continuity score of the audio data. The continuity score decreases due to the following reasons: the stuttering issue of audio data capture, echo issue due to multi-channel audio, and packet loss issue due to poor network connectivity. Valid values: [0,5].

Discontinuity

The description of the continuity score. Valid values:

  • [4,5]: The continuity of the audio data is excellent.

  • [3,4): The continuity of the audio data is good.

  • [2,3): The continuity of the audio data is fair.

  • [1,2): The continuity of the audio data is poor.

  • [0,1): The continuity of the audio data is bad.

Coloration(0-5)

The intelligibility score of the audio data. The intelligibility score decreases due to the following reasons: large reverberation, low bitrate, encoding error, and ambiguous pronunciation. Valid values: [0,5].

Coloration

The description of the intelligibility score. Valid values:

  • [4,5]: The intelligibility of the audio data is excellent.

  • [3,4): The intelligibility of the audio data is good.

  • [2,3): The intelligibility of the audio data is fair.

  • [1,2): The intelligibility of the audio data is poor.

  • [0,1): The intelligibility of the audio data is bad.

Noisiness(0-5)

The noise score of the audio data. Valid values: [0,5].

Note

The noise in the audio data includes environmental noise, such as the noise from fans and streets, background noise from low-quality devices, and residual noise caused by the incomplete echo processing of the noise pickup equipment. A higher score indicates less noise. If noise is not eliminated well during audio data processing, the noise score decreases.

Noisiness

The description of the noise score. Valid values:

  • [4,5]: The noiselessness of the audio data is excellent.

  • [3,4): The noiselessness of the audio data is good.

  • [2,3): The noiselessness of the audio data is fair.

  • [1,2): The noiselessness of the audio data is poor.

  • [0,1): The noiselessness of the audio data is bad.

Loudness(0-5)

The loudness score of the human voice. If the human voice is clear and strong, the loudness score is high; if the human voice is hard to hear, the loudness score tends to be 0. Valid values: [0,5].

Loudness

The description of the loudness score. Valid values:

  • [4,5]: The loudness of the human voice is excellent.

  • [3,4): The loudness of the human voice is good.

  • [2,3): The loudness of the human voice is fair.

  • [1,2): The loudness of the human voice is poor.

  • [0,1): The loudness of the human voice is bad.

Loudness(-90dB-0dB)

The average volume of the human voice. Valid values: [-90,0]. Unit: decibel.

  • This parameter describes the volume of the human voice in decibels. In most cases, if the parameter value is less than -24, the human voice sounds low.

  • Default value: -90.0. This value indicates that no explicit human voice is detected.

Double Talk Ratio(%)

The ratio of the duration of two-channel audio data to the duration of the audio data. Valid values: [0,100]. Unit: percentage.

Note

Two-channel audio data indicates that sounds simultaneously exist in two channels, such as the scenario in which the device leaks residual echo. This scenario may result in a low continuity score. Therefore, this parameter helps determine the possible factors of a low continuity score.

Bad Mute Ratio(%)

The percentage of abnormal mute frames. All abnormal mute frames of the audio data that does not include two-channel audio data are counted, excluding mute frames caused by cutting two-channel audio data. Valid values: [0,100]. Unit: percentage.

Saturated Ratio(%)

The percentage of the sonic boom segment to the voice segment. This parameter helps determine whether the excessive volume results in a large-scale sonic boom. Valid values: [0,100]. Unit: percentage.

Worst MOS(0-5)

The lowest MOS during the scoring process. Valid values: [0,5].

Worst Discontinuity(0-5)

The lowest continuity score during the scoring process. Valid values: [0,5].

Worst Noisiness(0-5)

The lowest noise score during the scoring process. Valid values: [0,5].

Worst Coloration(0-5)

The lowest intelligibility score during the scoring process. Valid values: [0,5].
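
The Tag check and the score ratings described above could be applied as in the following sketch. The function names are hypothetical, the sample is an abbreviated stand-in built in code (the second channel is invented to show the Tag filter), and a score of exactly 4 is assumed to be rated Excellent.

```python
import json


def rating(score: float) -> str:
    """Map a 0-5 score to its textual rating."""
    if not 0 <= score <= 5:
        raise ValueError("score must be in [0, 5]")
    if score >= 4:  # assumption: exactly 4 counts as Excellent
        return "Excellent"
    # [0,1) Bad, [1,2) Poor, [2,3) Fair, [3,4) Good
    return ["Bad", "Poor", "Fair", "Good"][int(score)]


def valid_channels(result_text: str) -> list[dict]:
    """Return the per-channel entries whose Tag is Valid."""
    data = json.loads(json.loads(result_text)["Data"])
    return [entry for entry in data["result"] if entry["Tag"] == "Valid"]


# Abbreviated stand-in for a real Result string.
# Channel 1 uses values from the sample above; channel 2 is hypothetical.
sample = json.dumps({
    "Code": "Success",
    "Data": json.dumps({"result": [
        {"Channel": "1", "Tag": "Valid", "MOS(0-5)": "1.01", "MOS": "Poor"},
        {"Channel": "2", "Tag": "Mute", "MOS(0-5)": "0", "MOS": "Bad"},
    ]}),
    "Message": "Successful.",
})

for entry in valid_channels(sample):
    score = float(entry["MOS(0-5)"])  # scores arrive as strings
    print(entry["Channel"], score, rating(score))
```

Entries with any tag other than Valid are skipped because their scores are reported as 0 and carry no meaning.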

FaceBeauty

Parameter description of JobParams

Parameter

Type

Required

Description

beauty_params

STRING

No

The parameters of the FaceBeauty operation. Example: "whiten=20,smooth=50,face_thin=50".

Callback format

JSON format

{
  "Code":"Success",
  "Details":[],
  "FunctionName":"FaceBeauty",
  "JobId":"39f8e0bc005e4f309379701645f4****",
  "Message":"success",
  "State":"Success",
  "Type":"IProduction"
}

Parameters

Parameter

Type

Description

skin_beauty_enable

INT

Specifies whether to enable skin polishing.

  • Valid values: [0,1].

  • 0: disables skin polishing.

  • 1: enables skin polishing.

  • Default value: 1.

shape_beauty_enable

INT

Specifies whether to enable face shaping.

  • Valid values: [0,1].

  • 0: disables face shaping.

  • 1: enables face shaping.

  • Default value: 1.

whiten

INT

The degree of skin whitening. The greater the value, the whiter the skin looks.

  • Valid values: [0,100].

  • Default value: 20.

smooth

INT

The degree of skin smoothing. The greater the value, the smoother the skin looks.

  • Valid values: [0,100].

  • Default value: 20.

detail

INT

The degree of skin granularity. The greater the value, the more fine-grained the skin is, and the more skin details exist.

  • Valid values: [0,100].

  • Default value: 20.

skin_model

INT

Specifies whether to enable the skin model feature. If you enable this feature, skin whitening is valid only for sections that are detected as skin.

  • Valid values: [0,1].

  • 0: disables the skin model feature.

  • 1: enables the skin model feature.

  • Default value: 1.

cheek_thin

FLOAT

The degree of frontal bone thinning.

  • Valid values: [0,100].

  • Default value: 0.

face_cut

FLOAT

The degree of cheekbone narrowing.

  • Valid values: [0,100].

  • Default value: 0.

face_thin

FLOAT

The degree of face thinning.

  • Valid values: [0,100].

  • Default value: 0.

face_length

FLOAT

The degree of face length adjustment (two-way).

  • Valid values: [-100,100].

  • Default value: 0.

chin_length

FLOAT

The degree of chin length adjustment (two-way).

  • Valid values: [-100,100].

  • Default value: 0.

chin_thin

FLOAT

The degree of chin thinning.

  • Valid values: [0,100].

  • Default value: 0.

eye_size

FLOAT

The degree of eye widening.

  • Valid values: [0,100].

  • Default value: 0.

eye_corner1

FLOAT

The degree of vertical canthus adjustment (two-way).

  • Valid values: [-100,100].

  • Default value: 0.

eye_distance

FLOAT

The degree of eye distance adjustment (two-way).

  • Valid values: [-100,100].

  • Default value: 0.

nose_thin

FLOAT

The degree of nose slimming (two-way).

  • Valid values: [-100,100].

  • Default value: 0.

nose_wing

FLOAT

The degree of nasal alar slimming (two-way).

  • Valid values: [-100,100].

  • Default value: 0.

nose_length

FLOAT

The degree of nose length adjustment (two-way).

  • Valid values: [-100,100].

  • Default value: 0.

mouth_size

FLOAT

The degree of mouth size adjustment (two-way).

  • Valid values: [-100,100].

  • Default value: 0.

mouth_position

FLOAT

The degree of philtrum length adjustment (two-way).

  • Valid values: [-100,100].

  • Default value: 0.

lip_thickness

FLOAT

The degree of lip thickness adjustment (two-way).

  • Valid values: [-100,100].

  • Default value: 0.

hair_line

FLOAT

The degree of hairline adjustment (two-way).

  • Valid values: [-100,100].

  • Default value: 0.

smile

FLOAT

The degree of smiling adjustment.

  • Valid values: [0,100].

  • Default value: 0.

detect_mode

FLOAT

The facial detection mode.

  • Valid values: [0,1].

  • 0: video mode.

  • 1: image mode.

  • Default value: 1.

Note

In video mode, multiple frames are used to trace faces to ensure more stable results.

detect_level

FLOAT

The resolution of the face detector. Smaller faces may not be detected at low resolution.

  • Valid values: [0,2].

  • 0: the lowest resolution at the fastest detection speed.

  • 1: the medium resolution at medium detection speed.

  • 2: the highest resolution at the slowest detection speed.

  • Default value: 1.

threshold

FLOAT

The threshold of confidence of a facial detection.

  • Valid values: [0,1].

  • Default value: 0.8.

detect_interval

FLOAT

The number of frames between two consecutive facial detections in video mode.

  • Valid values: [1,65535].

  • Default value: 5.

max_face_num

FLOAT

The maximum number of faces that can be detected.

  • Valid values: [0,32].

  • Default value: 32.

min_face

FLOAT

The smallest width of a face.

  • Valid values: [10,1024].

  • Default value: 40.
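
The beauty_params string can be assembled from a dictionary with a range check for each option. The sketch below is illustrative: the function name is hypothetical, and the RANGES table covers only a few of the options listed above.

```python
# Valid ranges for a few of the options listed above (not the full table).
RANGES = {
    "whiten": (0, 100),
    "smooth": (0, 100),
    "face_thin": (0, 100),
    "face_length": (-100, 100),
    "threshold": (0, 1),
}


def build_beauty_params(options: dict) -> str:
    """Join options as comma-separated name=value pairs."""
    parts = []
    for name, value in options.items():
        if name not in RANGES:
            raise KeyError(f"option not covered by this sketch: {name}")
        low, high = RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name} must be in [{low}, {high}]")
        parts.append(f"{name}={value}")
    return ",".join(parts)


print(build_beauty_params({"whiten": 20, "smooth": 50, "face_thin": 50}))
```

Options that are left out of the dictionary are not sent, so the default values in the table apply.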

Parameter description of Job

Parameter

Type

Description

Result

STRING

The detailed information about the job result. Example of success result information: {"Code":"Success","Data":"{\"result\":[{\"file\":\"result.mp4\"}]}","Message":"Successful."}.

SpeechDenoise

The input audio file must be in the WAV format with a sampling rate of 16,000 Hz or 48,000 Hz.

The format and sampling rate of the output audio file are the same as those of the input audio file.
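
Before a SpeechDenoise job is submitted, the input requirement can be checked locally. The following sketch uses the wave module from the Python standard library, which reads PCM WAV files; the function name is hypothetical, and files that the module cannot parse are reported as unsupported.

```python
import wave

# Sampling rates accepted for SpeechDenoise input, in Hz.
SUPPORTED_RATES = {16000, 48000}


def is_supported_wav(path: str) -> bool:
    """Return True if the file is a readable WAV with a supported sampling rate."""
    try:
        with wave.open(path, "rb") as wav:
            return wav.getframerate() in SUPPORTED_RATES
    except (wave.Error, EOFError):
        # Not a WAV file, or the file is empty or truncated.
        return False
```

For example, is_supported_wav("input.wav") returns False for a 44,100 Hz recording, which would need to be resampled first.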