
ApsaraVideo Media Processing: Terms

Last Updated: Feb 04, 2024

This topic describes the terms related to ApsaraVideo Media Processing (MPS).

MPS-specific terms

Job

A job is an abstraction of a processing task in MPS.

  • Job type: Jobs can be classified into multiple types, such as media information analysis jobs, transcoding jobs, transcoding query jobs, snapshot jobs, smart production jobs, smart tagging jobs, content moderation jobs, media fingerprinting jobs, and jobs for adding copyright watermarks.

  • Job submission:

    • Submission method: You can submit a job by using the MPS console, calling API operations, or using SDKs. You can also configure workflows to submit jobs when the workflows are triggered.

    • Required parameters: To submit a job, you must configure the input, output, and processing parameters. For example, to submit a transcoding job, you must configure core parameters that specify the input file, output path, transcoding template or transcoding method, MPS queue, and priority.

  • Invocation mode and result query: The time required to run a job varies by job type. Some jobs complete immediately after they are submitted. Most jobs must download, analyze, and process files, which takes time. Therefore, jobs are processed in synchronous or asynchronous mode.

    Synchronous mode

      • Applicable jobs: single-snapshot jobs and media information analysis jobs.

      • Submission method: You can use SDKs or API operations to submit all synchronous jobs.

      • Result query: The result of a synchronous job is returned immediately after you submit the job. The result contains the path of the snapshot file stored in an Object Storage Service (OSS) bucket, or the detailed media information. You can also query the job result by calling API operations for periodic polling. However, you cannot receive job result notifications through the notification feature of Message Service (MNS).

      A timeout error may occur if the input file is large. Configure a retry mechanism based on your business requirements.

      Figure: Process of jobs in synchronous mode

    Asynchronous mode

      • Applicable jobs: all types of MPS jobs, including transcoding, snapshot, media information analysis, and video AI jobs. Transcoding jobs include regular transcoding, Narrowband HD™ 1.0 transcoding, Narrowband HD™ 2.0 transcoding, and audio and video enhancement jobs. Video AI jobs include smart production, smart tagging, content moderation, and media fingerprinting jobs.

      • Submission method:

        • You can use the MPS console to submit some asynchronous jobs.

        • You can use SDKs or API operations to submit all asynchronous jobs.

        • You can configure workflows to submit some asynchronous jobs when the workflows are triggered.

      • Result query: After you submit an asynchronous job, the response indicates only whether the submission succeeded. To obtain the result of the job, call API operations for periodic polling or configure MNS notifications.

        • Periodic polling: Each job is identified by a unique ID, which is returned immediately after the job is submitted. You can find the job ID in the data returned by the API operation or in the MPS console. Record the ID and specify it when you call API operations to poll the job result periodically.

        • MNS notifications: You can configure the notification feature for an MPS queue or workflow to obtain job results. Each notification contains the job ID, user data, and detailed results.

      Figure: Process of jobs in asynchronous mode
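Periodic polling of an asynchronous job can be sketched as a loop that queries the job by ID until it reaches a terminal state. This is a minimal sketch, not MPS SDK code: `query_state` is a hypothetical callable that you would implement around the query API operation for your job type, and the terminal state names `"Success"` and `"Fail"` are placeholders, not the actual status values that MPS returns.

```python
import time
from typing import Callable, FrozenSet


def poll_job(
    query_state: Callable[[str], str],
    job_id: str,
    terminal_states: FrozenSet[str] = frozenset({"Success", "Fail"}),  # placeholder names
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll a job by ID until it reaches a terminal state, then return that state.

    query_state is a hypothetical wrapper around the query API operation
    for your job type; it takes a job ID and returns the job's state string.
    """
    waited = 0.0
    while True:
        state = query_state(job_id)
        if state in terminal_states:
            return state
        if waited >= timeout_s:
            raise TimeoutError(f"job {job_id} is still {state!r} after {waited:.0f} s")
        sleep(interval_s)  # wait before the next poll
        waited += interval_s


# Usage with a stubbed query function that simulates a job finishing.
states = iter(["Submitted", "Running", "Success"])
print(poll_job(lambda _id: next(states), "job-123", interval_s=0.01))
```

Passing `sleep` in as a parameter keeps the loop testable without real waiting. If you configure MNS notifications instead, no polling loop is needed.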

MPS queue

An MPS queue is a queue for processing asynchronous jobs. After you submit asynchronous jobs, they are queued and run based on their priorities and the order in which they were submitted. MPS queue types differ in the maximum number of queues that you can create, the features they support, and the concurrency they allow. For more information, see Overview.
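The queuing rule described above (priority first, then submission order) can be modeled with a heap. This toy sketch assumes that a larger priority number means a higher priority; it does not reflect how MPS queues are implemented internally, and `JobQueue` is a hypothetical name.

```python
import heapq
import itertools


class JobQueue:
    """Toy model of a job queue: higher priority first, then first submitted.

    Assumption: a larger priority number means a higher priority.
    """

    def __init__(self) -> None:
        self._heap = []  # entries: (-priority, sequence number, job_id)
        self._seq = itertools.count()

    def submit(self, job_id: str, priority: int) -> None:
        # heapq is a min-heap, so the priority is negated to pop the largest first;
        # the sequence number breaks ties in submission order.
        heapq.heappush(self._heap, (-priority, next(self._seq), job_id))

    def next_job(self) -> str:
        _, _, job_id = heapq.heappop(self._heap)
        return job_id

    def __len__(self) -> int:
        return len(self._heap)


q = JobQueue()
for job_id, priority in [("a", 5), ("b", 8), ("c", 5), ("d", 8)]:
    q.submit(job_id, priority)
print([q.next_job() for _ in range(len(q))])  # ['b', 'd', 'a', 'c']
```

The monotonically increasing sequence number guarantees that jobs with equal priority leave the queue in the order they were submitted.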

Workflow

A workflow is a preset process in which one or more jobs run. After you upload a file to the OSS bucket specified as the input bucket of a workflow, the workflow is triggered, and the file is processed as specified.

Transcoding template

A transcoding template is a collection of processing parameters. You can use a transcoding template to simplify the operations when you create a transcoding job or use a workflow. Each transcoding template is identified by a unique ID. Transcoding templates can be classified into the following types: custom templates, customized templates, and preset templates. For more information, see Overview.

  • Custom template: a transcoding template that is created in the MPS console or by using the API. No parameters are customized at the backend.

  • Customized template: a transcoding template for which customized parameters are configured at the backend based on your personalized requirements. If you use a customized template, the parameters that you configure do not take effect. You cannot view or modify the customized parameters.

  • Preset template: a transcoding template that MPS predefines for common resolutions and network conditions. Preset templates include static preset templates and intelligent preset templates. Static preset templates support regular transcoding, audio transcoding, container format conversion, Narrowband HD™ 1.0 transcoding, and Narrowband HD™ 2.0 transcoding jobs. For more information, go to the MPS console or see Preset template details.

Watermark template

A watermark template specifies a set of parameters that are used to add a watermark to a video, such as the position, offset, and size of the watermark. Each watermark template is identified by a unique ID. To add a watermark to an output video, you can specify a watermark template or directly pass in the corresponding watermark parameters.
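To illustrate how position and offset parameters combine, the following sketch computes the top-left pixel coordinates of a watermark. The corner names (`TopLeft`, `TopRight`, `BottomLeft`, `BottomRight`) and the convention that the offset is measured inward from the chosen corner are assumptions for illustration; check the watermark parameter reference for the exact semantics that MPS uses.

```python
from typing import Tuple


def watermark_origin(
    video_w: int, video_h: int, mark_w: int, mark_h: int,
    refer_pos: str, dx: int, dy: int,
) -> Tuple[int, int]:
    """Return the (x, y) pixel position of the watermark's top-left corner.

    Assumption: refer_pos names the video corner that the offset (dx, dy)
    is measured from, and offsets point inward from that corner.
    """
    if refer_pos == "TopLeft":
        return dx, dy
    if refer_pos == "TopRight":
        return video_w - mark_w - dx, dy
    if refer_pos == "BottomLeft":
        return dx, video_h - mark_h - dy
    if refer_pos == "BottomRight":
        return video_w - mark_w - dx, video_h - mark_h - dy
    raise ValueError(f"unknown refer_pos: {refer_pos!r}")


# A 200 x 100 watermark, 10 px from the top-right corner of a 1,280 x 720 video.
print(watermark_origin(1280, 720, 200, 100, "TopRight", 10, 10))  # (1070, 10)
```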

Template analysis job

Input files differ in attributes such as resolution and bitrate, so not every preset template suits a given input file. Therefore, before you use a preset template, you must call the SubmitAnalysisJob operation to submit a template analysis job. The template analysis job returns a list of preset templates that can be used for the specified input file. You can call the QueryAnalysisJobList operation to query this list.

Audio and video-specific terms

Transcoding

Transcoding is the process of converting a coded audio or video stream into another audio or video stream to suit different network bandwidths, terminal processing capabilities, and user needs. Transcoding consists of decoding followed by encoding. The streams before and after transcoding may use the same or different video encoding formats. Commonly used encoding formats include H.264, H.265, and AV1.

Container format conversion

Container format conversion is the process of converting an audio or video file from one container format to another. For example, an AVI file can be converted to an MP4 file. The compressed video and audio streams are obtained from the file in the source container format and then packaged into a file in the destination container format. No encoding or decoding is involved in this process. Compared with transcoding, container format conversion has the following benefits:

  • Fast processing: Decoding and encoding are complex and account for most of the transcoding time. Container format conversion skips both steps, which reduces processing time.

  • Lossless audio or video quality: Because no decoding or encoding is involved, container format conversion does not re-compress the audio or video streams. The output file has almost the same resolution and bitrate as the input file. However, if you convert an MP4 file into an M3U8 playlist and TS segment files, the total output size increases because of the packaging overhead required by these formats' specifications.
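For a local comparison of container format conversion and transcoding, the following sketch builds FFmpeg command lines. FFmpeg is used here only as a familiar illustration, not as a description of how MPS works internally. `-c copy` copies the compressed streams without decoding or re-encoding (this works only if the source codecs are supported by the target container), whereas `-c:v libx264` forces the video to be decoded and re-encoded with H.264. The helper function names are hypothetical.

```python
from typing import List


def remux_command(src: str, dst: str) -> List[str]:
    # Container format conversion: copy compressed streams, no decode/encode.
    return ["ffmpeg", "-i", src, "-c", "copy", dst]


def hls_remux_command(src: str, playlist: str) -> List[str]:
    # Repackage into an M3U8 playlist plus TS segments, still without re-encoding.
    return ["ffmpeg", "-i", src, "-c", "copy", "-f", "hls", playlist]


def transcode_command(src: str, dst: str, video_codec: str = "libx264") -> List[str]:
    # Transcoding: the video stream is decoded and re-encoded; audio is copied as-is.
    return ["ffmpeg", "-i", src, "-c:v", video_codec, "-c:a", "copy", dst]


print(" ".join(remux_command("input.avi", "output.mp4")))
# ffmpeg -i input.avi -c copy output.mp4
```

On a machine that has FFmpeg installed, you could run any of these with `subprocess.run(cmd, check=True)`.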

Resolution

Resolution describes the amount of detail in a video, expressed as the number of pixels in each dimension. For example, a 1,280 × 720 video is 1,280 pixels wide and 720 pixels high. Resolution determines how realistic and clear the video appears: a higher-resolution video contains more pixels and shows clearer images.

Resolution is a key factor in choosing the bitrate. In general, higher resolutions require higher bitrates, and each resolution has a recommended bitrate range. If the bitrate for a given resolution is below the recommended range, the image quality is poor. If the bitrate is above the recommended range, the video occupies more storage space and consumes more traffic during loading, but the image quality does not improve significantly.
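The relationship between resolution and recommended bitrate can be expressed as a simple check. This sketch does not encode any MPS recommendation: the recommended range for a resolution is an input that you must take from your own reference values, and the range in the usage example is a placeholder. The function names are illustrative.

```python
def pixel_count(width: int, height: int) -> int:
    """Total pixels per frame, for example 1,280 x 720 = 921,600."""
    return width * height


def assess_bitrate(bitrate_kbps: float, low_kbps: float, high_kbps: float) -> str:
    """Classify a bitrate against the recommended [low, high] range for one resolution.

    The range must come from your own reference values; none is built in here.
    """
    if bitrate_kbps < low_kbps:
        return "below range: image quality is likely poor"
    if bitrate_kbps > high_kbps:
        return "above range: more storage and traffic, little visible gain"
    return "within range"


print(pixel_count(1280, 720))  # 921600
# Placeholder range, not an MPS recommendation.
print(assess_bitrate(800, low_kbps=1000, high_kbps=2000))
```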

Bitrate

Bitrate is the amount of data that a video file uses per unit of time, and it is the most important factor for controlling image quality in video encoding. Bitrate is measured in bits per second (bit/s) and is often expressed in Kbit/s or Mbit/s. For videos that have the same resolution, a higher bitrate means a lower compression ratio and higher image quality. A higher bitrate carries more sampled data per unit of time with higher accuracy, so the quality and definition of the processed video stay close to those of the original file. However, a higher-bitrate file also demands more decoding capability from the playback device.

The higher the bitrate, the larger the file. You can calculate the file size by using the following formula: File size = Duration × Bitrate / 8, where dividing by 8 converts bits to bytes. For example, a 60-minute 720p online video that has a bitrate of 1 Mbit/s has a size of 3,600 s × 1 Mbit/s / 8 = 450 MB.
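The file size formula can be checked with a few lines of Python. The helper name `file_size_bytes` is illustrative, and the result ignores container overhead and any streams not counted in the bitrate.

```python
def file_size_bytes(duration_s: float, bitrate_bps: float) -> float:
    """File size = duration x bitrate / 8 (the division converts bits to bytes)."""
    return duration_s * bitrate_bps / 8


# 60-minute video at 1 Mbit/s, reported in decimal megabytes.
size_mb = file_size_bytes(3600, 1_000_000) / 1_000_000
print(size_mb)  # 450.0
```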

Frame rate

Frame rate measures how many frames a video displays per unit of time, that is, how many times per second the image is refreshed. Frame rate is measured in frames per second (FPS) or hertz (Hz).

The higher the frame rate, the smoother and more lifelike the video appears. In most cases, 25 to 30 FPS is sufficient, and 60 FPS delivers an immersive and realistic playback experience. Above 75 FPS, further increases bring little noticeable improvement. If the frame rate exceeds the refresh rate of your monitor, the monitor cannot display the extra frames, and the processing capacity of your graphics card is wasted. At the same resolution, higher frame rates require more processing capacity from the graphics card.
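The following sketch expresses two points from this section: the time available for each frame at a given frame rate, and the cap that a monitor's refresh rate places on the number of distinct frames actually shown. The function names are illustrative.

```python
def frame_interval_ms(fps: float) -> float:
    """Time between consecutive frames, in milliseconds."""
    return 1000.0 / fps


def displayed_fps(video_fps: float, refresh_hz: float) -> float:
    # A monitor cannot show more distinct frames per second than its refresh rate,
    # so frames beyond that rate are never displayed.
    return min(video_fps, refresh_hz)


print(frame_interval_ms(25))   # 40.0
print(displayed_fps(120, 60))  # 60
```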

GOP

A Group of Pictures (GOP) is a group of consecutive frames in an MPEG-encoded video or video stream. A GOP starts with an I-frame and ends at the frame before the next I-frame. A GOP contains the following frame types:

  • Intra coded picture (I-frame): the keyframe. An I-frame contains all information that is required to produce the picture for that frame. The I-frame is independently decoded and can be regarded as a static picture. The first frame in the video sequence is always an I-frame, and each GOP starts with an I-frame.

  • Predictive coded picture (P-frame): A P-frame is encoded with reference to the preceding I-frame or P-frame and contains motion-compensated difference information relative to that frame. During decoding, the difference stored in the current P-frame is added to the previously cached image to generate the final image. P-frames occupy fewer data bits than I-frames. However, P-frames are sensitive to transmission errors because of their complex dependencies on the preceding I-frames and P-frames.

  • Bidirectionally predictive coded picture (B-frame): A B-frame contains motion-compensated difference information relative to the previous and subsequent frames. During decoding, the data of the current B-frame is superimposed with both the previously cached image and the decoded subsequent image to generate the final image. B-frames provide a high compression ratio and require high decoding performance.
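To make the frame types concrete, the following sketch returns the display-order frame types of one GOP for a given GOP length and number of consecutive B-frames. It is a simplified model under stated assumptions: it uses a fixed pattern and forces the last frame to be a P-frame so that no frame references the next GOP. Real encoders may choose frame types adaptively.

```python
def gop_pattern(gop_size: int, b_frames: int) -> str:
    """Return display-order frame types for one GOP, for example 'IBBPBBP...'.

    Simplified model: the GOP starts with an I-frame, every (b_frames + 1)-th
    frame after it is a P-frame, the rest are B-frames, and the last frame is
    forced to P so that it does not depend on a frame in the next GOP.
    """
    if gop_size < 1 or b_frames < 0:
        raise ValueError("gop_size must be >= 1 and b_frames must be >= 0")
    frames = ["I"]
    for i in range(1, gop_size):
        frames.append("P" if i % (b_frames + 1) == 0 else "B")
    if gop_size > 1:
        frames[-1] = "P"
    return "".join(frames)


print(gop_pattern(12, 2))  # IBBPBBPBBPBP
```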

The GOP value indicates the keyframe interval, which is the distance between two Instantaneous Decoding Refresh (IDR) frames, or the maximum number of frames in a GOP. At least one keyframe is required for each second of video. More keyframes improve video quality but increase bandwidth consumption and network loads. The GOP value is expressed in frames; to convert it to a time interval, divide the GOP value by the frame rate. For example, if the GOP value is 250 frames (the default value in ApsaraVideo VOD) and the frame rate is 25 FPS, the keyframe interval is 250/25 = 10 seconds.
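The keyframe interval calculation can be written as a worked equation, where $N_{\text{GOP}}$ is the GOP value in frames and $f$ is the frame rate in FPS (symbols introduced here only for notation):

```latex
T_{\text{keyframe}} = \frac{N_{\text{GOP}}}{f} = \frac{250\ \text{frames}}{25\ \text{FPS}} = 10\ \text{s}
```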

Keep the GOP value within an appropriate range to balance video quality, file size (which determines bandwidth consumption), and seeking performance (how quickly the player responds to drag and fast-forward operations).

  • When the GOP value increases, the file size is reduced. However, if the GOP value is too large, the last frames of a GOP are distorted, and the video quality is reduced.

  • The GOP value is also a key factor in determining the speed of response to seeking in a video. During seeking, the player locates the closest keyframe before a specific position. A larger GOP value indicates a longer distance between the specified position and the closest keyframe, which results in more predictive frames that need to be decoded. In this case, the loading time is extended and the seeking operation requires a long period of time to complete.

  • Encoding P-frames and B-frames is more complex than encoding I-frames. A large GOP value produces more P-frames and B-frames, which lowers encoding efficiency.

  • However, if the GOP value is too small, the bitrate of the video must be increased to ensure that the image quality is not reduced. This process increases bandwidth consumption.
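The seeking trade-off can be quantified with a short sketch. It assumes a fixed GOP with keyframes at frames 0, N, 2N, and so on, and counts the frames that a player decodes from the closest preceding keyframe up to the target frame. It ignores B-frame reordering and decoder caching, so treat it as an approximation. The function name is illustrative.

```python
from typing import Tuple


def seek_cost(target_frame: int, gop_frames: int) -> Tuple[int, int]:
    """Return (keyframe_index, frames_to_decode) for a seek to target_frame.

    Assumes a fixed GOP: keyframes at 0, gop_frames, 2 * gop_frames, ...
    The player starts at the closest preceding keyframe and decodes forward
    up to and including the target frame.
    """
    if target_frame < 0 or gop_frames < 1:
        raise ValueError("target_frame must be >= 0 and gop_frames must be >= 1")
    keyframe = (target_frame // gop_frames) * gop_frames
    return keyframe, target_frame - keyframe + 1


print(seek_cost(249, 250))  # (0, 250)
print(seek_cost(249, 50))   # (200, 50)
```

With a 250-frame GOP, seeking to the last frame of the first GOP requires decoding the entire GOP, whereas a 50-frame GOP caps the work at 50 frames, at the cost of more keyframes and a larger file.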

Encoding profile

An encoding profile defines a set of capabilities that focus on a specific class of applications.

H.264 provides the following encoding profiles:

  • Baseline profile: This profile provides basic image quality and is applicable to mobile devices. The baseline profile uses I-frames and P-frames, and supports only progressive videos and context-adaptive variable-length coding (CAVLC).

  • Main profile: This profile provides mainstream image quality and is applicable to standard-definition devices, such as MP4 players that have relatively low decoding capabilities, portable video players, PSPs, and iPods. The main profile uses I-frames, P-frames, and B-frames, and supports progressive and interlaced videos. The main profile also supports CAVLC and context-adaptive binary arithmetic coding (CABAC).

  • High profile: This profile provides high image quality and is applicable to high-definition devices that have large screens, such as broadcast applications, Blu-ray Disc storage applications, and high-definition television applications. The high profile supports all features of the main profile, plus 8 × 8 intra prediction and custom quantization. Its extended variants also support lossless video coding and more YUV formats.
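The differences among the three H.264 profiles can be summarized as feature sets, and the lowest profile that covers a set of required features can then be selected. This sketch lists only the features named in this topic, using illustrative labels; real profile selection also depends on levels and on what target devices support.

```python
from typing import Dict, FrozenSet, Set

# Illustrative feature labels for the capabilities described in this topic.
H264_PROFILE_FEATURES: Dict[str, FrozenSet[str]] = {
    "baseline": frozenset({"I-frames", "P-frames", "progressive", "CAVLC"}),
    "main": frozenset({"I-frames", "P-frames", "B-frames", "progressive",
                       "interlaced", "CAVLC", "CABAC"}),
}
# The high profile includes all main-profile features plus its own additions.
H264_PROFILE_FEATURES["high"] = H264_PROFILE_FEATURES["main"] | {
    "8x8-prediction", "custom-quantization"}


def minimal_profile(required: Set[str]) -> str:
    """Return the lowest listed H.264 profile whose features cover `required`."""
    for name in ("baseline", "main", "high"):
        if set(required) <= H264_PROFILE_FEATURES[name]:
            return name
    raise ValueError(f"no listed profile supports: {sorted(required)}")


print(minimal_profile({"B-frames", "CABAC"}))  # main
```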

Advanced Audio Coding (AAC) provides the following encoding profiles:

  • aac_low: Low Complexity AAC (AAC-LC)

  • aac_he: High Efficiency AAC (HE-AAC)

  • aac_he_v2: High Efficiency AAC version 2 (HE-AACv2)

  • aac_ld: Low Delay AAC (LD-AAC)

  • aac_eld: Enhanced Low Delay AAC (ELD-AAC)

Bitrate control method

Bitrate control methods are the methods that are used to control the bitrate of a coded stream. The following bitrate control methods are commonly used:

  • Constant bitrate (CBR): This method generates a file whose bitrate is fixed throughout the coded stream. CBR-compressed files are larger than VBR- and ABR-compressed files but offer little improvement in quality.

  • Variable bitrate (VBR): This method generates a file that has a variable bitrate. During encoding, the bitrate is determined by the complexity of the input: more complex content is encoded at a higher bitrate, and simpler content at a lower bitrate. This method is generally used with the Two-Pass encoding method. VBR is suitable for storage because it makes more efficient use of limited storage space. However, with VBR you cannot predict the size or bitrate fluctuation of the output file.

  • Average bitrate (ABR): This method is an average bitrate mode with interpolation parameters added. It was introduced by LAME, an MP3 encoder, to address both the size-to-quality mismatch of CBR-compressed files and the unpredictable file sizes of VBR. Within a given file size, ABR divides a stream into segments of 50 frames (30 frames correspond to about one second), encodes less complex segments at relatively low bitrates, and encodes more complex segments at higher bitrates. ABR is a modified version of VBR and can be regarded as a compromise between VBR and CBR: the average bitrate stays constant and reaches the specified value over a given time range, although the peak bitrate of some segments may exceed the specified bitrate. By default, Alibaba Cloud uses ABR.

  • Video Buffering Verifier (VBV): This method ensures that the bitrate does not exceed a specified value. It relies on two parameters: maxrate, which specifies the maximum output bitrate, and bufsize, which specifies the buffer size. VBV can be used with the Two-Pass or Constant Rate Factor (CRF) encoding method. CRF combined with VBV is called Capped CRF.

    • bufsize: the size of the video buffer. Configure this parameter based on the expected bitrate fluctuation. In most cases, you can set bufsize to twice the value of maxrate. If the client has a small cache, set bufsize to the value of maxrate. To limit the bitrate more strictly, set bufsize to half of maxrate or less.

  • CRF: This method controls the output bitrate by using a quality control factor. Video quality is quantified into levels from 0 to 51, where 0 indicates lossless image quality and 51 indicates the worst image quality. CRF ensures the overall quality of the output video, and the bitrate varies with the complexity of the input content. If you are not sure which CRF level to use, we recommend a level between 23 and 29, adjusted to the complexity of the content. Increasing the CRF level by 6 roughly halves the bitrate, and decreasing it by 6 roughly doubles the bitrate. To achieve the same definition, you can use a higher CRF level for computer-generated videos than for live-action videos. CRF can provide better video quality, but you cannot predict the size or bitrate fluctuation of the output file.

  • Capped CRF: With CRF alone, the bitrate of the output file is not capped. You can use CRF together with VBV to limit the bitrate range and prevent bitrate spikes.

  • One-Pass: The encoding speed of this method is faster than that of Two-Pass. By default, Alibaba Cloud uses One-Pass.

  • Two-Pass: The encoder runs twice to assign the bitrate accurately and produce an output file that is smaller and of higher quality. In the first pass, the encoder analyzes the video and generates log files. In the second pass, the encoder encodes the video based on the analysis results to obtain the best encoding quality. Two-Pass takes more time than One-Pass. Therefore, Two-Pass is not suitable for scenarios that require fast transcoding, such as live streaming and real-time communication. If the input video is already highly compressed, we recommend that you do not use Two-Pass. Otherwise, blocking artifacts may occur.
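Two rules of thumb from the list above (choosing bufsize relative to maxrate, and the effect of a CRF change of 6 on bitrate) can be written as small helpers. The mode names are hypothetical labels, and the CRF estimate is only an approximation because the actual bitrate depends on content.

```python
def suggest_bufsize(maxrate_kbps: float, mode: str = "default") -> float:
    """Pick a VBV bufsize from maxrate using the rules of thumb in this topic.

    Hypothetical mode labels:
      "default"            -> 2 x maxrate
      "small_client_cache" -> 1 x maxrate
      "strict_cap"         -> 0.5 x maxrate (an upper bound; smaller is stricter)
    """
    factors = {"default": 2.0, "small_client_cache": 1.0, "strict_cap": 0.5}
    if mode not in factors:
        raise ValueError(f"unknown mode: {mode!r}")
    return maxrate_kbps * factors[mode]


def estimate_crf_bitrate(ref_bitrate_kbps: float, ref_crf: int, new_crf: int) -> float:
    """Approximate bitrate after a CRF change: +6 halves it, -6 doubles it."""
    if not 0 <= new_crf <= 51:
        raise ValueError("CRF must be in the range [0, 51]")
    return ref_bitrate_kbps * 2 ** ((ref_crf - new_crf) / 6)


print(suggest_bufsize(2000))                 # 4000.0
print(estimate_crf_bitrate(2000, 23, 29))    # 1000.0
```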

Common terms

Region

A region is a location in which you activate an Alibaba Cloud service. You can select different Alibaba Cloud regions and use Alibaba Cloud services closer to your business for lower access latency and better user experience.

OSS

OSS is short for Object Storage Service provided by Alibaba Cloud. MPS transcodes media files stored in OSS. The output files are also stored in OSS. For more information, see Terms.

Bucket

A bucket is a container for objects stored in OSS. Every object in OSS is contained in a bucket. You can configure multiple settings for a bucket, such as the region, access permissions, and storage class. You can create different types of buckets to store different data as required. For more information, see the "Bucket" section of the Terms topic.

Object

Objects are the smallest data unit in OSS. Files uploaded to OSS are called objects. Unlike typical file systems, objects in OSS are stored in a flat structure instead of a hierarchical structure. An object is composed of a key, metadata, and the data stored in the object. Each object in a bucket is uniquely identified by the key. Object metadata is a group of key-value pairs that define the properties of an object, such as the size of the object and the time when the object is last modified. You can also specify custom user metadata for objects in OSS. For more information, see the "Object" section of the Terms topic.
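Because OSS stores objects in a flat structure, "folders" are only a naming convention within object keys. The following local sketch shows how a directory-style view can be derived from flat keys by using a prefix and a delimiter. It mirrors the idea behind the prefix and delimiter parameters of the OSS ListObjects operation, but it is not an OSS API call, and `list_with_delimiter` is a hypothetical name.

```python
from typing import Iterable, List, Tuple


def list_with_delimiter(
    keys: Iterable[str], prefix: str = "", delimiter: str = "/"
) -> Tuple[List[str], List[str]]:
    """Emulate a directory-style listing over a flat set of object keys.

    Returns (objects, common_prefixes): keys that sit directly under `prefix`,
    and the "subfolder" prefixes formed by cutting the remaining keys at the
    next delimiter.
    """
    objects, prefixes = [], set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        cut = rest.find(delimiter)
        if cut == -1:
            objects.append(key)
        else:
            prefixes.add(prefix + rest[: cut + len(delimiter)])
    return objects, sorted(prefixes)


keys = ["video/a.mp4", "video/hls/a.m3u8", "video/hls/a0.ts", "cover.jpg"]
print(list_with_delimiter(keys, prefix="video/"))  # (['video/a.mp4'], ['video/hls/'])
```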

AccessKey pair

An AccessKey pair is the credential that OSS uses to authenticate a requester. An AccessKey pair consists of an AccessKey ID and an AccessKey secret. OSS uses symmetric encryption based on the AccessKey pair to verify the identity of a requester. The AccessKey ID identifies the user. The AccessKey secret is used to encrypt signature strings, and OSS uses it to verify those signature strings. To ensure the security of your data, keep your AccessKey secret confidential.
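The following sketch illustrates symmetric, HMAC-based request signing in the style of the legacy OSS V1 signature, which signs a string with HMAC-SHA1 and Base64-encodes the result. The string-to-sign and secret below are placeholders; the exact fields and their order are defined in the OSS signature documentation, and in practice the official SDKs compute signatures for you. The helper names are hypothetical.

```python
import base64
import hashlib
import hmac


def sign(access_key_secret: str, string_to_sign: str) -> str:
    """Sign a string with HMAC-SHA1 and return the Base64-encoded digest."""
    digest = hmac.new(access_key_secret.encode("utf-8"),
                      string_to_sign.encode("utf-8"),
                      hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


def verify(access_key_secret: str, string_to_sign: str, signature: str) -> bool:
    # The server recomputes the signature with its own copy of the secret
    # and compares the two values in constant time.
    return hmac.compare_digest(sign(access_key_secret, string_to_sign), signature)


# Placeholder string-to-sign; the real OSS format is defined by the OSS documentation.
string_to_sign = "GET\n\n\nplaceholder-date\n/example-bucket/example-object"
signature = sign("example-secret", string_to_sign)
print(verify("example-secret", string_to_sign, signature))  # True
```

Only the AccessKey ID and the signature travel with the request; the secret stays with the requester and the service, which is why it must be kept confidential.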

OSS supports the following types of AccessKey pairs:

  • AccessKey pairs applied for by the bucket owner.

  • AccessKey pairs granted by the bucket owner by using Resource Access Management (RAM).

  • AccessKey pairs granted by the bucket owner by using Security Token Service (STS).

For more information, see Obtain an AccessKey pair.