
Intelligent Media Services:SubmitIProductionJob

Last Updated: Apr 01, 2026

Use the SubmitIProductionJob API to submit an intelligent production job.


RAM authorization

The table below describes the authorization required to call this API. You can define it in a Resource Access Management (RAM) policy. The table's columns are detailed below:

  • Action: The value to use in the Action element of a RAM policy statement to grant permission to perform this operation.

  • API: The API that you can call to perform the action.

  • Access level: The predefined level of access granted for each API. Valid values: create, list, get, update, and delete.

  • Resource type: The type of the resource that supports authorization to perform the action. It indicates if the action supports resource-level permission. The specified resource must be compatible with the action. Otherwise, the policy will be ineffective.

    • For APIs with resource-level permissions, required resource types are marked with an asterisk (*). Specify the corresponding Alibaba Cloud Resource Name (ARN) in the Resource element of the policy.

    • For APIs without resource-level permissions, it is shown as All Resources. Use an asterisk (*) in the Resource element of the policy.

  • Condition key: The condition keys defined by the service. The key allows for granular control, applying to either actions alone or actions associated with specific resources. In addition to service-specific condition keys, Alibaba Cloud provides a set of common condition keys applicable across all RAM-supported services.

  • Dependent action: The dependent actions required to run the action. To complete the action, the RAM user or the RAM role must have the permissions to perform all dependent actions.

Action: ice:SubmitIProductionJob
Access level: create
Resource type: All Resources (specify * in the Resource element)
Condition key: None
Dependent action: None

Request parameters

Name (string, optional)

The job name, up to 100 characters long. Example: Test task

FunctionName (string, required)

The algorithm to use. Valid values:

  • Cover: Smart cover generation.
  • VideoClip: Video summarization.
  • VideoDelogo: Video logo removal.
  • VideoDetext: Video subtitle removal.
  • CaptionExtraction: Subtitle extraction.
  • HybridCaptionExtraction: Multimodal subtitle extraction.
  • VideoGreenScreenMatting: Green screen matting for videos.
  • FaceBeauty: Video face beautification.
  • VideoH2V: Horizontal-to-vertical video conversion.
  • MusicSegmentDetect: Music chorus detection.
  • AudioBeatDetection: Beat detection.
  • AudioQualityAssessment: Audio quality assessment.
  • SpeechDenoise: Speech denoising.
  • AudioMixing: Audio mixing.
  • MusicDemix: Music source separation.

Example: Cover

Input (object, required)

The input media asset, specified by an OSS URL or a media asset ID. Input file requirements vary by algorithm. For details, see the subsequent sections of this topic.

Input.Type (string, required)

The input type. Valid values:

  • OSS: The input is an OSS URL.
  • Media: The input is a media asset ID.

Example: OSS

Input.Media (string, required)

The OSS URL or media asset ID of the input media asset. Use one of the following OSS URL formats:

  1. oss://bucket/object
  2. http(s)://bucket.oss-[regionId].aliyuncs.com/object

In these formats, bucket is the name of an OSS bucket in the same region as your project, and object is the file path.

Example: oss://bucket/object
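The two URL forms are mechanically related. The sketch below illustrates rewriting the oss:// form into the HTTPS form; the bucket name, object path, and region ID (cn-shanghai) are assumed placeholders, not values from this API:

```python
# Sketch: rewrite the oss://bucket/object form into the HTTPS form.
# The region ID ("cn-shanghai") and names below are illustrative placeholders.
def oss_to_https(oss_url: str, region_id: str) -> str:
    bucket, _, obj = oss_url.removeprefix("oss://").partition("/")
    return f"https://{bucket}.oss-{region_id}.aliyuncs.com/{obj}"

print(oss_to_https("oss://example-bucket/videos/input.mp4", "cn-shanghai"))
# https://example-bucket.oss-cn-shanghai.aliyuncs.com/videos/input.mp4
```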

Output (object, required)

The output media asset, specified by an OSS URL or a media asset ID. The output files vary by algorithm. For details, see the subsequent sections of this topic.

Output.Type (string, required)

The output type. Valid values:

  • OSS: The output is an OSS URL.
  • Media: The output is a media asset ID.

Example: OSS

Output.Biz (string, optional)

The service that owns the output media asset. Valid values: IMS and VOD. Example: IMS

Output.Media (string, required)

The output media asset. This value is an OSS URL if Type is OSS, or a media asset ID if Type is Media.

Use one of the following OSS URL formats:

  1. oss://bucket/object
  2. http(s)://bucket.oss-[RegionId].aliyuncs.com/object

In these formats, bucket is the name of an OSS bucket in the same region as your project, and object is the file path.

Media asset ID:

  • You can specify an existing media asset ID. In this case, you do not need to specify Biz because the service is inherited from the source asset.
  • You can create a new media asset by leaving this parameter empty. The Biz parameter then determines whether the asset is written to IMS or VOD. If Biz is not specified, the service is inherited from the source asset or defaults to IMS.

Note: The OSS path supports placeholders. Example: oss://example-****/iproduction/{source}-{timestamp}-{sequenceId}.png. The following placeholders are supported:

  • {source}: The input file name.
  • {timestamp}: The Unix timestamp.
  • {sequenceId}: The sequence number for the output.
  • {resultType}: The output file type, determined by the server.

Placeholders are optional. However, for algorithms that produce multiple output files, such as Cover, we recommend including the {sequenceId} placeholder so that each output file has a unique path.

Example: oss://bucket/object
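The IMS server performs the placeholder substitution itself; the sketch below only mimics the naming scheme to show how {sequenceId} gives each of several output files a unique path (the bucket and file names are assumed placeholders):

```python
import time

# Illustration only: the IMS server expands these placeholders itself.
# This sketch mimics the naming scheme to show how {sequenceId} keeps the
# outputs of multi-file algorithms (e.g. Cover's three images) from colliding.
def expand_output_path(template: str, source: str, sequence_id: int) -> str:
    return (template
            .replace("{source}", source)
            .replace("{timestamp}", str(int(time.time())))
            .replace("{sequenceId}", str(sequence_id)))

template = "oss://example-bucket/iproduction/{source}-{sequenceId}.png"
for i in range(3):
    print(expand_output_path(template, "input", i))
# oss://example-bucket/iproduction/input-0.png
# oss://example-bucket/iproduction/input-1.png
# oss://example-bucket/iproduction/input-2.png
```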

Output.OutputUrl (string, optional)

The OSS URL of the output file when Type is Media. The bucket must be registered with IMS or VOD. Example: http(s)://bucket.oss-[RegionId].aliyuncs.com/object

TemplateId (string, optional)

The template ID. Example: ****20b48fb04483915d4f2cd8ac****

JobParams (string, optional)

The algorithm job parameters, provided as a JSON string. Required parameters vary by algorithm. For details, see the subsequent sections of this topic. Example: {"Model":"gif"}
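Note that JobParams is a JSON string, not a nested object. A minimal sketch of building it for the Cover algorithm:

```python
import json

# JobParams is a JSON string, not a nested object. For the Cover algorithm,
# setting Model to "gif" requests animated covers instead of static PNGs.
job_params = json.dumps({"Model": "gif"})
print(job_params)  # {"Model": "gif"}
```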

ScheduleConfig (object, optional)

The job scheduling configuration.

ScheduleConfig.PipelineId (string, optional)

The pipeline ID. Example: 5246b8d12a62433ab77845074039c3dc

ScheduleConfig.Priority (integer, optional)

The job priority. Valid values: 1 to 10. A smaller value indicates a higher priority. Example: 6

UserData (string, optional)

Custom user data. This data is returned in the response without modification. The value can be up to 256 characters in length. Example: {"test":1}

ModelId (string, optional)

The algorithm model ID. If this parameter is empty, the system uses the default model for the algorithm. You can usually leave this parameter empty. Non-default models are available for the following algorithms:

  • VideoDetext
    • ModelId = algo-video-detext-new: A subtitle removal model that provides better results but is slower and more expensive than the default model.
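Putting the parameters together, a smart-cover request might carry the parameter map below. This is a sketch only: the bucket, object paths, and job name are assumed placeholders, and the actual call through an IMS SDK or a signed HTTP request is omitted:

```python
import json

# Sketch of a full parameter map for a smart-cover job. The bucket, object
# paths, and job name are illustrative; the actual call through an IMS SDK
# or a signed HTTP request is omitted.
request = {
    "Name": "cover-demo",
    "FunctionName": "Cover",
    "Input": {"Type": "OSS", "Media": "oss://example-bucket/videos/input.mp4"},
    # {sequenceId} keeps the three default cover images on distinct paths.
    "Output": {"Type": "OSS",
               "Media": "oss://example-bucket/covers/{source}-{sequenceId}.gif"},
    "JobParams": json.dumps({"Model": "gif"}),  # animated (GIF) covers
    "ScheduleConfig": {"Priority": 6},
    "UserData": json.dumps({"test": 1}),
}
print(json.dumps(request, indent=2))
```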

Input and output fields

Cover

Input: a video file. Output: multiple images (three by default). Use placeholders in the output path to give each image a unique name. The output format is PNG for static images or GIF for animated images, depending on the settings in JobParams.

VideoDelogo

Input: a video file. Output: a video in MP4 format with the logo removed.

VideoDetext

Input: a video file. Output: a video in MP4 format with captions removed.

CaptionExtraction

Input: a video file. Output: a caption file in SRT format.

HybridCaptionExtraction

Input: a video file. Output: a caption file in SRT format.

VideoGreenScreenMatting

Input: a video file. Output: a video with the green screen background removed. The format is MP4 or WebM, depending on the settings in JobParams.

FaceBeauty

Input: a video file. Output: a beautified video in MP4 format.

VideoH2V

Input: a video file. Output: a video in MP4 format converted from a horizontal to a vertical aspect ratio.

MusicSegmentDetect

Input: an audio file. Output: a JSON file containing the chorus detection results.

AudioBeatDetection

Input: an audio file. Output: a JSON file containing the beat detection results.

AudioQualityAssessment

Input: an audio file. No output file is generated. The audio quality assessment results are returned directly in the response of the QueryIProductionJob operation.

SpeechDenoise

Input: an audio file. Output: a noise-reduced audio file in WAV format.

AudioMixing

Input: an audio file. Output: a mixed audio file in WAV format. For details on how to specify additional audio files for mixing, see the JobParams parameters below.

MusicDemix

Input: an audio file (typically a song). Output: two audio files resulting from source separation. You must include the {resultType} placeholder in the output path to distinguish between the vocals and the accompaniment.

JobParams JSON fields

Cover

  • Model: string. The model for the smart cover. If this parameter is left empty, a static image is generated. If set to gif, an animated image is generated.

VideoDelogo

  • LogoModel: string. The type of station logo to remove. Valid values are tv (for television station logos) and internet (for online media logos). You can specify multiple values, separated by commas.

  • Boxes: string. The bounding boxes for the target logos. The coordinates are normalized values relative to the top-left corner of the video, in the format [xmin, ymin, width, height]. Supports up to two bounding boxes. Example: "[[0, 0, 0.3, 0.3], [0.7, 0, 0.3, 0.3]]".
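Because the bounding boxes are normalized, pixel-space rectangles must be divided by the video dimensions. A sketch, assuming a 1920x1080 video and illustrative logo positions:

```python
import json

# Sketch: convert pixel-space logo rectangles into the normalized
# [xmin, ymin, width, height] form expected by Boxes. The video size and
# rectangles below are illustrative values.
def normalize_box(x, y, w, h, video_w, video_h):
    return [round(x / video_w, 3), round(y / video_h, 3),
            round(w / video_w, 3), round(h / video_h, 3)]

video_w, video_h = 1920, 1080
boxes = [normalize_box(0, 0, 576, 324, video_w, video_h),     # top-left logo
         normalize_box(1344, 0, 576, 324, video_w, video_h)]  # top-right logo
print(json.dumps(boxes))  # [[0.0, 0.0, 0.3, 0.3], [0.7, 0.0, 0.3, 0.3]]
```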

VideoDetext

  • LimitRegion: list. Specifies the region(s) for caption detection. The coordinates are normalized values relative to the top-left corner, specified as [xmin, ymin, width, height]. You can specify multiple detection regions. Example: [[0, 0, 0.3, 0.3], [0.7, 0, 0.3, 0.3]]. Note: If this parameter is not set, the default detection region is the bottom 30% of the video.

  • Time: list. The time range for caption removal, specified in seconds as [start_time, end_time]. For example, [5, 20] removes captions between the 5-second and 20-second marks of the video.
    • The Time parameter can be a one-dimensional array, such as [5, 20], to specify a single time range.

    • The Time parameter can also be a two-dimensional array, such as [[5, 20], [25, 43], [51, 80]], to specify multiple time ranges (supported only when ModelId is set to algo-video-detext-new).
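A small sketch of accepting either Time form on the client side and normalizing it to a list of ranges (remember that multiple ranges require ModelId to be algo-video-detext-new):

```python
# Sketch: accept either Time form and normalize it to a list of ranges.
def normalize_time(time_param):
    # [5, 20] -> [[5, 20]]; [[5, 20], [25, 43]] passes through unchanged.
    if time_param and not isinstance(time_param[0], list):
        return [time_param]
    return time_param

print(normalize_time([5, 20]))              # [[5, 20]]
print(normalize_time([[5, 20], [25, 43]]))  # [[5, 20], [25, 43]]
```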

CaptionExtraction

  • fps: integer (Optional). The sampling frame rate. Range: [2, 10]. Default: 5.

  • roi: list. The region of interest (ROI) for caption extraction. Only captions within this region are extracted. The format is [[top, bottom], [left, right]], using normalized values. For example, [[0.5, 1], [0, 1]] specifies the bottom half of the video. If this parameter is not provided, the default region is the bottom 1/4 of the video.

  • lang: string. The recognition language. Valid values: ch (Chinese), en (English), and ch_ml (Chinese-English mixed). Default: ch.

  • track: string. If set to main, only the main caption track is extracted. If this parameter is not set, the system extracts all captions that appear in the specified region by default.
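The roi convention here is row range first, then column range. A sketch that builds the region for the bottom fraction of the frame:

```python
# Sketch: build the CaptionExtraction roi, [[top, bottom], [left, right]],
# for the bottom fraction of the frame.
def bottom_region(fraction: float):
    return [[round(1 - fraction, 3), 1], [0, 1]]

print(bottom_region(0.5))   # [[0.5, 1], [0, 1]]  (bottom half, as in the example)
print(bottom_region(0.25))  # [[0.75, 1], [0, 1]] (the default region)
```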

HybridCaptionExtraction

  • fps: integer (Optional). The sampling frame rate. Range: [2, 10]. Default: 5.

  • roi: list. The bounding box of the target caption, in the format [bx, by, bw, bh]. If this parameter is not provided, the default region is the bottom 1/4 of the video.
    • bx: The normalized x-coordinate of the top-left corner of the bounding box, relative to the video width. Example: 0.1.

    • by: The normalized y-coordinate of the top-left corner of the bounding box, relative to the video height. Example: 0.0.

    • bw: The normalized width of the bounding box, relative to the video width. Example: 0.3.

    • bh: The normalized height of the bounding box, relative to the video height. Example: 0.2.

  • lang: string. The recognition language. Valid values: zh (Chinese) and en (English). Default: zh.

  • track: string. If set to main, only the main caption track is extracted. If this parameter is not set, the system extracts all captions that appear in the specified region by default.
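Note that HybridCaptionExtraction uses a different ROI convention from CaptionExtraction. A sketch converting a [[top, bottom], [left, right]] region into the [bx, by, bw, bh] bounding-box form:

```python
# Sketch: convert a CaptionExtraction-style [[top, bottom], [left, right]]
# region into the HybridCaptionExtraction [bx, by, bw, bh] bounding box.
def region_to_bbox(region):
    (top, bottom), (left, right) = region
    return [left, top, round(right - left, 3), round(bottom - top, 3)]

print(region_to_bbox([[0.5, 1], [0, 1]]))  # [0, 0.5, 1, 0.5] (bottom half)
```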

VideoGreenScreenMatting

  • bgimage: string. The background image to replace the green screen. Example: http://example-image-****.example-location.aliyuncs.com/example/example.jpg. If you do not set this parameter, the service outputs a WebM video with an alpha channel.

FaceBeauty

  • beauty_params: string. The beautification parameters. Example: "whiten=20,smooth=50,face_thin=50". For more information, see Parameter field descriptions.

VideoH2V

None

MusicSegmentDetect

None

AudioBeatDetection

None

AudioQualityAssessment

None

SpeechDenoise

Input audio requirements: The audio file must be in WAV format with a sample rate of 16 kHz or 48 kHz.

AudioMixing

  • inputs: list. The audio tracks to mix; each element is an object that specifies a file URL. Currently, only one audio track is supported. Example element: {"file":"http://example-bucket-****.oss-cn-shanghai.aliyuncs.com/2.mp4"}

MusicDemix

None

Response elements

The response is an object with the following fields:

RequestId (string)

The ID of the request. Example: C1849434-FC47-5DC1-92B6-F7EAAFE3851E

JobId (string)

The job ID. Example: ****20b48fb04483915d4f2cd8ac****

Examples

Success response

JSON format

{
  "RequestId": "C1849434-FC47-5DC1-92B6-F7EAAFE3851E",
  "JobId": "****20b48fb04483915d4f2cd8ac****"
}
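A minimal sketch of extracting JobId from the response body, for example to poll QueryIProductionJob later; the body string is copied from the sample above:

```python
import json

# Extract JobId from the response body, e.g. to poll QueryIProductionJob later.
body = '{"RequestId": "C1849434-FC47-5DC1-92B6-F7EAAFE3851E", "JobId": "****20b48fb04483915d4f2cd8ac****"}'
job_id = json.loads(body)["JobId"]
print(job_id)  # ****20b48fb04483915d4f2cd8ac****
```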

Error codes

See Error Codes for a complete list.

Release notes

See Release Notes for a complete list.