
ApsaraVideo Media Processing: Terms

Last Updated: Aug 11, 2023

This topic introduces the terms related to ApsaraVideo Media Processing (MPS).

MPS-specific terms


job

A job is a task in MPS. Jobs can be classified into types such as media information analysis jobs, transcoding jobs, transcoding query jobs, snapshot jobs, smart tagging jobs, and media fingerprinting jobs. You must configure parameters to submit a job. For example, if you create a transcoding job, you must specify an MPS queue, a template, and an input file before you submit the job.

MPS queue

An MPS queue is a queue for jobs. After jobs are added to an MPS queue, they are scheduled for processing by MPS. If an MPS queue contains a large number of jobs, the jobs are queued up. An MPS queue can be enabled or disabled. If an MPS queue is disabled, MPS does not schedule jobs in this MPS queue until the MPS queue is enabled again. However, ongoing jobs in this MPS queue are not affected. Different types of MPS queues support different concurrencies.
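
The enable and disable behavior described above can be sketched as a toy model. This is only an illustration, not the MPS scheduler: the JobQueue class and its methods are hypothetical, and concurrency is simplified to a limit on how many jobs run at once.

```python
from collections import deque


class JobQueue:
    """Toy model of a job queue that can be enabled or disabled (hypothetical)."""

    def __init__(self, concurrency):
        self.concurrency = concurrency  # simplified: max jobs running at once
        self.enabled = True
        self.pending = deque()
        self.running = []

    def submit(self, job_id):
        self.pending.append(job_id)

    def schedule(self):
        """Start pending jobs while the queue is enabled and slots are free."""
        while self.enabled and self.pending and len(self.running) < self.concurrency:
            self.running.append(self.pending.popleft())

    def finish(self, job_id):
        self.running.remove(job_id)


queue = JobQueue(concurrency=2)
for job_id in ["job-1", "job-2", "job-3"]:
    queue.submit(job_id)

queue.schedule()         # job-1 and job-2 start, job-3 waits in the queue
queue.enabled = False    # disabling the queue does not touch running jobs
queue.finish("job-1")
queue.schedule()         # nothing new starts while the queue is disabled
running_while_disabled = list(queue.running)

queue.enabled = True
queue.schedule()         # job-3 is scheduled once the queue is enabled again
print(running_while_disabled, queue.running)
```

In this sketch, job-2 keeps running while the queue is disabled, and job-3 starts only after the queue is enabled again.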


concurrency

Concurrency specifies the number of threads that can run concurrently within a period of time.


workflow

A workflow is a preset process in which one or more jobs are run. After you upload a file to the Object Storage Service (OSS) bucket specified as the input bucket of a workflow, the workflow is triggered and run, and the file is processed as specified.

transcoding template

A transcoding template is a collection of processing parameters. You can use a transcoding template to simplify the operations when you create a transcoding job or use a workflow. Each transcoding template is identified by a unique ID. Transcoding templates can be classified into the following types: custom templates, customized templates, and preset templates.

  • Custom template: a transcoding template that is created in the MPS console or by using the API. No parameters are customized at the backend.

  • Customized template: a transcoding template for which customized parameters are configured at the backend based on your personalized requirements. If you use a customized template, the parameters that you configure do not take effect. You cannot view or modify the customized parameters.

  • Preset template: a transcoding template that is predefined in MPS based on the resolutions and network. Preset templates include static preset templates and intelligent preset templates. Static preset templates support regular transcoding, audio transcoding, container format conversion, Narrowband HD™ 1.0, and Narrowband HD™ 2.0. For more information, go to the MPS console or see Preset template details.

watermark template

A watermark template specifies a set of parameters that are used to add a watermark to a video, such as the position, offset, and size of the watermark. Each watermark template is identified by a unique ID. To add a watermark to an output video, you can specify a watermark template or directly pass in the corresponding watermark parameters.

template analysis job

Due to the differences between input files, such as differences in resolution and bitrate, not all preset templates are suitable for an input file. Therefore, before you use a preset template, you must call the SubmitAnalysisJob operation to submit a template analysis job. The template analysis job returns a list of preset templates that can be used for a specific input file. You can call the QueryAnalysisJobList operation to query this list.

Audio and video-specific terms


transcoding

Transcoding refers to the process of converting a coded audio or video stream to another audio or video stream based on different network bandwidths, processing capabilities of terminals, and user needs. Transcoding is a process of decoding and encoding. Streams before and after transcoding may use the same or different video encoding formats. The following encoding formats are commonly used: H.264, H.265, and AV1.

container format conversion

Container format conversion refers to the process of converting an audio or video file from one container format to another. For example, you can convert an AVI file to an MP4 file. The compressed video and audio streams are extracted from the file in the source container format and then packaged into a file in the destination container format. No encoding or decoding is involved in this process. Compared with transcoding, container format conversion has the following features:

  • Fast processing. Decoding and encoding audio and video files are complex and occupy most of the transcoding time. Container format conversion does not require encoding or decoding. This reduces the processing time.

  • Lossless audio or video quality. Container format conversion does not compress audio or video files because encoding and decoding are not involved. The output file after container format conversion is almost the same as the input file in terms of resolution and bitrate. However, if you convert an MP4 file to an M3U8 file and TS files, the output file size increases due to protocol specifications.


resolution

Resolution describes the amount of detail in a video. It indicates the number of pixels contained in each dimension. For example, a 1,280 × 720 video has a width of 1,280 pixels and a height of 720 pixels. The resolution determines how realistic and clear the video appears. A video that has a higher resolution contains more pixels and has clearer images.

The resolution is a key factor that determines the bitrate. Videos that have different resolutions use different bitrates. In general, higher resolutions require higher bitrates. Each resolution corresponds to a recommended range of bitrates. If you specify a bitrate that is lower than the lower limit of the recommended range for the resolution, the image quality is poor. If you specify a bitrate that is higher than the upper limit of the recommended range, the video occupies more storage space and requires more traffic to load, but the image quality is not significantly improved.


bitrate

Bitrate refers to the amount of data that a video file uses per unit of time. Bitrate is the most important item for image quality control in video encoding. The bitrate is measured in bits per second (bit/s), and is often expressed in Kbit/s or Mbit/s. For videos that have the same resolution, a higher bitrate indicates a smaller compression ratio and higher image quality. A high bitrate indicates a high sampling rate per unit of time and a high data stream accuracy. Therefore, the quality and definition of the processed video file are close to those of the original file. However, a high-bitrate file requires strong decoding capabilities from the playback device.

The higher the bitrate, the larger the file. You can calculate the file size based on the following formula: File size = Time × Bitrate/8. For example, if a 60-minute 720p online video file has a bitrate of 1 Mbit/s, the size of the file is calculated based on the following formula: 3,600 seconds × 1 Mbit/s/8 = 450 MB.
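
The file size formula above can be checked with a few lines of Python. This is a minimal sketch: the function name and parameter names are illustrative, and the result ignores container overhead and audio tracks.

```python
def file_size_mb(duration_seconds, bitrate_mbps):
    """File size in MB: Time x Bitrate / 8 (8 bits per byte)."""
    return duration_seconds * bitrate_mbps / 8


# A 60-minute video at 1 Mbit/s, as in the example above.
size = file_size_mb(duration_seconds=60 * 60, bitrate_mbps=1)
print(size)  # 450.0
```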

frame rate

The frame rate measures the number of frames that are displayed per unit of time in a video, or the number of frames that are refreshed per second in an image. The unit of frame rate is frames per second (FPS) or Hz.

The higher the frame rate, the smoother and more lifelike the video appears. In most cases, 25 to 30 FPS is sufficient. 60 FPS can deliver an immersive and realistic playback experience. If you increase the frame rate to more than 75 FPS, the improvement of playback experience is less significant. If you specify a frame rate higher than the refresh rate of your monitor, the monitor cannot properly display the frames and the processing potential of your graphics card is wasted. Higher frame rates at the same resolution require greater processing capabilities from the graphics card.


GOP

A Group of Pictures (GOP) refers to a group of continuous images in an MPEG-encoded video or video stream. It starts with an I-frame and ends immediately before the next I-frame. A GOP contains the following image types:

  • Intra coded picture (I-frame): the keyframe. An I-frame contains all information that is required to produce the picture for that frame. The I-frame is independently decoded and can be regarded as a static picture. The first frame in the video sequence is always an I-frame, and each GOP starts with an I-frame.

  • Predictive coded picture (P-frame): A P-frame is encoded with reference to a previous I-frame or P-frame. It contains motion-compensated difference information relative to that reference frame. During decoding, the difference defined by the current P-frame is superimposed on the previously cached image to generate the final image. P-frames occupy fewer data bits than I-frames. However, P-frames are sensitive to transmission errors because of their dependencies on the previous I-frame or P-frame.

  • Bidirectionally predictive coded picture (B-frame): A B-frame contains motion-compensated difference information relative to the previous and subsequent frames. During decoding, the data of the current B-frame is superimposed with both the previously cached image and the decoded subsequent image to generate the final image. B-frames provide a high compression ratio and require high decoding performance.
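
The way a decoder superimposes P-frame differences on a cached image can be illustrated with a deliberately simplified sketch. It treats a frame as a flat list of pixel values and a P-frame as a list of per-pixel changes. Real codecs use motion-compensated blocks and also handle B-frames, so this is an assumption-laden illustration and not a decoder.

```python
def decode_gop(i_frame, p_frame_deltas):
    """Rebuild every frame of a GOP from one I-frame and per-frame deltas."""
    frames = [list(i_frame)]   # the I-frame decodes on its own
    cached = list(i_frame)     # the previously decoded image
    for delta in p_frame_deltas:
        # Superimpose the difference on the cached image.
        cached = [pixel + change for pixel, change in zip(cached, delta)]
        frames.append(cached)
    return frames


i_frame = [10, 10, 10, 10]
deltas = [[0, 1, 0, 0], [0, 0, 2, 0]]   # two P-frames with small changes
frames = decode_gop(i_frame, deltas)
print(frames)  # [[10, 10, 10, 10], [10, 11, 10, 10], [10, 11, 12, 10]]
```

Because each P-frame depends on the image before it, an error in one frame carries over to every later frame until the next I-frame.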

The GOP value indicates the interval between keyframes, which is the distance between two Instantaneous Decoding Refresh (IDR) frames or the maximum number of frames in a frame group. The GOP value is measured in frames. At least one keyframe is required for each second of video. More keyframes improve video quality but increase bandwidth consumption and network loads. The time interval between keyframes is calculated by dividing the GOP value by the frame rate. For example, the default GOP value of ApsaraVideo VOD is 250 frames and the frame rate is 25 FPS. The time interval is calculated based on the following formula: 250/25 = 10 seconds.
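
The interval calculation above, written as a small Python helper. The function name is illustrative.

```python
def keyframe_interval_seconds(gop_frames, frame_rate_fps):
    """Time between keyframes: GOP value (in frames) divided by the frame rate."""
    return gop_frames / frame_rate_fps


interval = keyframe_interval_seconds(gop_frames=250, frame_rate_fps=25)
print(interval)  # 10.0
```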

The GOP value must be within an appropriate range to balance video quality, file size (which determines bandwidth consumption), and seek performance (how quickly the player responds to drag and fast-forward operations).

  • When the GOP value increases, the file size is reduced. However, if the GOP value is too large, the last frames of a GOP are distorted, and the video quality is reduced.

  • The GOP value is also a key factor in determining the speed of response to seeking in a video. During seeking, the player locates the closest keyframe before a specific position. A larger GOP value indicates a longer distance between the specified position and the closest keyframe, which results in more predictive frames that need to be decoded. In this case, the loading time is extended and the seeking operation requires a long period of time to complete.

  • Encoding P-frames and B-frames is more complex compared with encoding I-frames. A large GOP value results in many P-frames and B-frames. This decreases the encoding efficiency.

  • However, if the GOP value is too small, the bitrate of the video must be increased to ensure that the image quality is not reduced. This process increases bandwidth consumption.
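
The effect of the GOP value on seeking can be estimated with a rough sketch. It assumes a fixed GOP length with a keyframe at every multiple of the GOP value, and it ignores B-frame reordering. The function name is hypothetical.

```python
def frames_to_decode_for_seek(target_frame, gop_frames):
    """Return the closest keyframe at or before the target and the decode count."""
    keyframe = (target_frame // gop_frames) * gop_frames
    # Decode from the keyframe up to and including the target frame.
    return keyframe, target_frame - keyframe + 1


large_gop = frames_to_decode_for_seek(target_frame=499, gop_frames=250)
small_gop = frames_to_decode_for_seek(target_frame=499, gop_frames=50)
print(large_gop)  # (250, 250)
print(small_gop)  # (450, 50)
```

In the worst case, a seek with a GOP value of 250 decodes five times as many frames as the same seek with a GOP value of 50.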

encoding profile

An encoding profile defines a set of capabilities that focus on a specific class of applications.

H.264 provides the following encoding profiles:

  • Baseline profile: This profile provides basic image quality and is applicable to mobile devices. The baseline profile uses I-frames and P-frames, and supports only progressive videos and context-adaptive variable-length coding (CAVLC).

  • Main profile: This profile provides mainstream image quality and is applicable to standard-definition devices, such as MP4 players that have relatively low decoding capabilities, portable video players, PSPs, and iPods. The main profile uses I-frames, P-frames, and B-frames, and supports progressive and interlaced videos. The main profile also supports CAVLC and context-adaptive binary arithmetic coding (CABAC).

  • High profile: This profile provides high image quality and is applicable to high-definition devices that have big screens, such as broadcast and disc storage applications for Blu-ray Discs and high-definition television applications. In addition to the features of the main profile, the high profile supports 8 × 8 intra-prediction, custom quantization, lossless video coding, and more YUV formats.

Advanced Audio Coding (AAC) provides the following encoding profiles:

  • aac_low: Low Complexity AAC (LC)

  • aac_he: High Efficiency AAC (HE-AAC)

  • aac_he_v2: High Efficiency AAC version 2 (HE-AACv2)

  • aac_ld: Low Delay AAC (LD-AAC)

  • aac_eld: Enhanced Low Delay AAC (ELD-AAC)

bitrate control method

Bitrate control methods refer to the methods that are used to control the bitrate of a coded stream. The following bitrate control methods are commonly used:

  • Constant bitrate (CBR): This method is used to generate a file that has a fixed bitrate. If you use this method, the bitrate is fixed throughout the coded stream. CBR-compressed files are larger than VBR- and ABR-compressed files, but the quality is not significantly better.

  • Variable bitrate (VBR): This method is used to generate a file that has a variable bitrate. This method determines the bitrate of the output file based on the complexity of the input file during encoding. For a more complex input file, a file that has a higher bitrate is generated. For a simpler file, a file that has a lower bitrate is generated. This method is generally used with the Two-Pass encoding method. VBR is applicable to storage and allows you to use the limited storage space in a more reasonable manner. However, you cannot predict the size and bitrate fluctuation of the output file by using VBR.

  • Average bitrate (ABR): This method is an average bitrate mode with interpolation parameters added. LAME introduced this method to address the size and quality mismatch of CBR-compressed files and the unpredictable file sizes of VBR. Within a specific file range, ABR divides a stream into parts of 50 frames each (30 frames equal about one second). ABR uses relatively low bitrates to code the less complex parts and high bitrates to code the more complex parts. ABR can be regarded as a compromise between VBR and CBR, or as a modified version of VBR: the average bitrate reaches the specified value within a specific time range, but the peak bitrate in some parts may exceed the specified bitrate. ABR ensures that the average output bitrate stays within an appropriate range and codes videos within this range based on their complexity. By default, Alibaba Cloud uses ABR.

  • Video Buffering Verifier (VBV): This method can ensure that the bitrate is lower than a specific value. This method is used based on the maxrate and bufsize parameters. The maxrate parameter specifies the maximum output bitrate, and the bufsize parameter specifies the buffer size. VBV can be used with the Two-Pass or Constant Rate Factor (CRF) encoding method. The CRF encoding method can also be replaced by Capped CRF.

  • bufsize: the size of the video buffer. You can configure this parameter based on the expected bitrate fluctuation. In most cases, you can configure the bufsize parameter to twice the maxrate parameter. If the cache size of the client is small, set the bufsize parameter to the value of the maxrate parameter. If you want to limit the bitrate, set the bufsize parameter to half of the value of the maxrate parameter or less.

  • CRF: This method controls the output bitrate by using a quality control factor. The video quality is quantified into levels from 0 to 51. 0 specifies lossless image quality and 51 specifies the worst image quality that can be generated. CRF keeps the overall quality of the output video stable, while the bitrate varies based on the complexity of the input content. If you do not know which CRF level is suitable, we recommend that you use a level in the range of [23,29] and then adjust the level based on the complexity of the content. If you increase the CRF level by 6, the bitrate is approximately halved. If you decrease the CRF level by 6, the bitrate is approximately doubled. To generate files with the same definition, you can set a larger CRF level for a computer-generated video than for a live-action video. CRF can provide better video quality, but you cannot predict the size and bitrate fluctuation of the output file.

  • Capped CRF: The bitrate of an output file that is encoded by using CRF is not fixed. You can use CRF together with VBV to limit the range of the bitrate and prevent bitrate spikes.

  • One-Pass: The encoding speed of this method is faster than that of Two-Pass. By default, Alibaba Cloud uses One-Pass.

  • Two-Pass: The encoder is run twice to accurately assign the bitrate and obtain an output file that has a smaller size and higher quality. In the first pass, the encoder analyzes the video and generates log files. In the second pass, the encoder performs encoding based on the analysis results to obtain the best encoding quality. Two-Pass consumes more time than One-Pass. Therefore, Two-Pass cannot be used in scenarios that require high transcoding timeliness, such as live streaming and real-time communication. If the compression ratio of the input video is high, we recommend that you do not use Two-Pass. Otherwise, blocking artifacts may occur.
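
Two of the rules of thumb in the list above, the bufsize guidance and the effect of changing the CRF level by 6, can be written as small helpers. This is a sketch under stated assumptions: the function names and the scenario labels are hypothetical, and the CRF relationship is an approximation that depends on the content.

```python
def suggest_bufsize(maxrate_kbps, scenario="default"):
    """Pick a bufsize from maxrate using the guidance in the list above."""
    factors = {
        "default": 2.0,             # most cases: twice the maxrate
        "small_client_cache": 1.0,  # small client cache: equal to maxrate
        "strict_limit": 0.5,        # tight bitrate limit: half of maxrate or less
    }
    return maxrate_kbps * factors[scenario]


def estimate_bitrate(reference_kbps, reference_crf, new_crf):
    """Approximate bitrate after a CRF change: +6 halves it, -6 doubles it."""
    return reference_kbps * 2 ** ((reference_crf - new_crf) / 6)


print(suggest_bufsize(3000))             # 6000.0
print(estimate_bitrate(2000, 23, 29))    # 1000.0
print(estimate_bitrate(2000, 23, 17))    # 4000.0
```

The reference bitrate in estimate_bitrate must come from an actual encode of the same content, because CRF alone does not determine the bitrate.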

Common terms


region

A region is a location in which you activate an Alibaba Cloud service. You can select different Alibaba Cloud regions and use Alibaba Cloud services closer to your business for lower access latency and better user experience.


OSS

OSS is short for Object Storage Service, a storage service provided by Alibaba Cloud. MPS transcodes media files stored in OSS. The output files are also stored in OSS. For more information, see Terms in OSS documentation.


bucket

A bucket is a container for objects that are stored in OSS. Every object in OSS is contained in a bucket. You can configure multiple settings for a bucket, such as the region, access permissions, and storage formats. You can create different types of buckets to store different data as required. For more information, see the Bucket section of the "Terms" topic in OSS documentation.


object

Objects are the smallest data units in OSS. Files uploaded to OSS are called objects. Unlike typical file systems, objects in OSS are stored in a flat structure instead of a hierarchical structure. An object is composed of a key, metadata, and the data stored in the object. Each object in a bucket is uniquely identified by the key. Object metadata is a group of key-value pairs that define the properties of an object, such as the size of the object and the time when the object was last modified. You can also specify custom user metadata for objects in OSS. For more information, see the Object section of the "Terms" topic in OSS documentation.
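
The flat key, metadata, and data model described above can be sketched with a plain dictionary. This is not the OSS SDK: StoredObject, put_object, and list_keys are hypothetical names, and folder-like paths are just key prefixes in this model.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    key: str                                       # unique within the bucket
    data: bytes                                    # the object content
    metadata: dict = field(default_factory=dict)   # key-value properties


bucket = {}  # flat structure: one mapping from key to object, no directories


def put_object(key, data, **user_metadata):
    """Store an object with system metadata (size) plus custom user metadata."""
    metadata = {"size": len(data), **user_metadata}
    bucket[key] = StoredObject(key, data, metadata)


def list_keys(prefix=""):
    """A folder-like listing is only a filter on the key prefix."""
    return sorted(key for key in bucket if key.startswith(prefix))


put_object("videos/input/a.mp4", b"\x00" * 4, author="demo")
put_object("videos/output/a.m3u8", b"#EXTM3U")
put_object("readme.txt", b"hi")

print(list_keys("videos/"))  # ['videos/input/a.mp4', 'videos/output/a.m3u8']
print(bucket["videos/input/a.mp4"].metadata)  # {'size': 4, 'author': 'demo'}
```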

AccessKey pair

An AccessKey pair is the credential that is used by OSS to authenticate a requester. An AccessKey pair consists of an AccessKey ID and an AccessKey secret. OSS uses symmetric encryption based on an AccessKey pair to verify the identity of a requester. The AccessKey ID is used to identify a user. The AccessKey secret is used by the requester to encrypt signature strings and by OSS to verify those signature strings. To ensure the security of your data, keep your AccessKey secret confidential.
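
The idea of signing with a shared secret can be sketched with a keyed hash. The example assumes HMAC-SHA1 with Base64 encoding, and the string to sign is a placeholder rather than the exact format that OSS requires. The credentials are made-up values. See the OSS signature documentation for the real rules.

```python
import base64
import hashlib
import hmac


def sign(access_key_secret, string_to_sign):
    """Sign a string with the shared secret (HMAC-SHA1 is an assumption here)."""
    digest = hmac.new(
        access_key_secret.encode("utf-8"),
        string_to_sign.encode("utf-8"),
        hashlib.sha1,
    ).digest()
    return base64.b64encode(digest).decode("ascii")


def verify(access_key_secret, string_to_sign, signature):
    """The server recomputes the signature with the same secret and compares."""
    expected = sign(access_key_secret, string_to_sign)
    return hmac.compare_digest(expected, signature)


# Placeholder request description and a made-up secret.
request = "GET\n/example-bucket/videos/input/a.mp4"
signature = sign("example-secret", request)

print(verify("example-secret", request, signature))  # True
print(verify("wrong-secret", request, signature))    # False
```

Only the AccessKey ID and the signature travel with the request. The AccessKey secret itself is never sent.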

OSS supports the following types of AccessKey pairs:

  • AccessKey pairs applied for by the bucket owner.

  • AccessKey pairs granted by the bucket owner by using Resource Access Management (RAM).

  • AccessKey pairs granted by the bucket owner by using Security Token Service (STS).

For more information, see Obtain an AccessKey pair.