ApsaraVideo Media Processing (MPS) converts an audio or video file into one or more output files that suit different network bandwidths, terminal processing capabilities, and user needs. MPS also performs multimodal analysis on the content, text, speech, and scenes of media files and offers features such as automated review, content recognition, and smart editing.

You can perform the following operations by using MPS:
  • Adapt to different terminal devices: You can convert media formats to support playback on multiple types of terminal devices, such as PCs, TVs, and mobile devices.
  • Adapt to different network conditions: You can produce video files in different definitions so that the bitrate matches the available network bandwidth. Transcoded video files can be in standard definition, high definition, or ultra-high definition.
  • Add watermarks: You can add enterprise logos, TV station logos, or user nicknames as watermarks to a video to highlight the brand and copyright information and increase product recognition.
  • Take snapshots: You can take snapshots of a video at specific time points. You can use a snapshot as the video thumbnail or multiple snapshots to generate a sprite.
  • Edit videos: You can edit, crop, and merge original videos to generate new ones.
  • Perform video enhancement: You can remove the blurs or mosaics from poor-quality videos to generate restored versions in higher definitions.
  • Reduce storage and traffic costs: You can adjust the video bitrate and increase the compression rate to reduce the file size without compromising the video quality. This reduces stuttering during playback and saves storage and traffic costs.
  • Generate media fingerprints: You can extract fingerprint features such as images and audio from a video to generate a media fingerprint. The media fingerprint can be used to find and remove duplicate videos, trace the source of video clips, filter videos that infringe copyright, and identify user-generated content (UGC).
  • Perform automated reviews: MPS intelligently detects pornographic content, terrorism content, ads, and undesirable content in the speech, text, and scenes of a video. This feature helps you reduce the labor cost of manual review and lower the risks posed by non-compliant content.
  • Improve the conversion rate (CVR) of videos: MPS learns from image aesthetics and a large amount of user behavior data, selects the optimal keyframe from a video, and then generates images, animated stickers, or short videos as the video thumbnail.


Container format

Configuration item Description
Input format
  • Container format: 3GP, AVI, FLV, MP4, M3U8, MPG, ASF, WMV, MKV, MOV, TS, WebM, or MXF
  • Video encoding format: H.264/AVC, H.263, H.263+, H.265, MPEG-1, MPEG-2, MPEG-4, MJPEG, VP8, VP9, Quicktime, RealVideo, or Windows Media Video
  • Audio encoding format: AAC, AC-3, ADPCM, AMR, DSD, MP1, MP2, MP3, PCM, RealAudio, or Windows Media Audio
Output format
  • Container formats:
    • Video: FLV, MP4, HLS (M3U8+TS), or MPEG-DASH (MPD+fMP4)
    • Audio: MP3, MP4, OGG, FLAC, or M4A
    • Image: GIF or WEBP
  • Video encoding format: H.264/AVC or H.265/HEVC
  • Audio encoding format: MP3, AAC, VORBIS, or FLAC
Audio extraction Extract only audio from a video by disabling the video image part.
Video extraction Extract only video from a video file by disabling the audio part.
Container format conversion Convert the container format of a video to another without changing the encoding format. You can convert the container format of an audio or video file to MP4, M3U8, or FLV.
Video to animated sticker Capture a highlight clip from a video and convert the clip to an animated sticker in GIF or WEBP format.
Video encoding parameters
Parameter Description
Codec The video codec format.
  • Valid values: H.264, H.265, GIF, and WEBP.
  • Default value: H.264.
Bitrate The bitrate of the transcoded video.
  • Valid values: [10,50000].
  • Unit: Kbit/s.
Fps The frame rate of the transcoded video.
  • The default value is the frame rate of the input video file. If the frame rate of the input video file exceeds 60, the value is 60.
  • Valid values: (0,60].
  • Unit: frames per second.
Width* Height The resolution of the transcoded video.
  • Width:
    • Default value: the original width of the video.
    • Valid values: [128,4096].
    • Unit: pixel.
  • Height:
    • Default value: the original height of the video.
    • Valid values: [128,4096].
    • Unit: pixel.
Scale Auto scaling. You can enable proportional scaling by height or width.
Gop The group of pictures (GOP) size. The GOP size indicates the maximum interval of keyframes or the maximum number of frames in a frame group.
  • If you want to specify the maximum interval, the value must contain a unit. The unit is seconds in this case. Default value: 10s.
  • If you want to specify the maximum number of frames, the value does not contain a unit. Valid values: 1 to 100000.
Profile The codec profile. This parameter is only valid when the codec format is H.264. You can set this parameter to Baseline, Main, or High.
PixFmt The pixel format for video color encoding.
  • Standard pixel formats such as yuv420p and yuvj420p are supported.
  • The default pixel format can be yuv420p or the original color format.
Rotate The rotation angle of the video, in the clockwise direction.
  • Valid values: [0,360].
  • Default value: 0.
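
The ranges listed for the video encoding parameters can be checked before a transcoding job is submitted. The following is a minimal sketch of such a check, not part of any MPS SDK: `parse_gop` and `check_video_params` are hypothetical helper names, the parameters are passed as a plain dictionary for illustration, and treating a Gop interval as a whole number of seconds is an assumption made for this example.

```python
import re

VIDEO_CODECS = {"H.264", "H.265", "GIF", "WEBP"}
H264_PROFILES = {"Baseline", "Main", "High"}


def parse_gop(value: str) -> tuple[str, int]:
    """Return ("seconds", n) for values such as "10s", or ("frames", n) for "250"."""
    match = re.fullmatch(r"(\d+)(s?)", value.strip())
    if not match:
        raise ValueError(f"invalid Gop value: {value!r}")
    number, unit = int(match.group(1)), match.group(2)
    if unit == "s":
        # A value with a unit is a maximum keyframe interval in seconds.
        return ("seconds", number)
    if not 1 <= number <= 100000:
        raise ValueError("Gop frame count must be in [1,100000]")
    return ("frames", number)


def check_video_params(params: dict) -> list[str]:
    """Collect every problem found instead of stopping at the first one."""
    problems = []
    codec = params.get("Codec", "H.264")  # documented default codec
    if codec not in VIDEO_CODECS:
        problems.append(f"Codec {codec!r} is not supported")
    if "Profile" in params:
        if codec != "H.264":
            problems.append("Profile is only valid for H.264")
        elif params["Profile"] not in H264_PROFILES:
            problems.append("Profile must be Baseline, Main, or High")
    if "Bitrate" in params and not 10 <= params["Bitrate"] <= 50000:
        problems.append("Bitrate must be in [10,50000] Kbit/s")
    if "Fps" in params and not 0 < params["Fps"] <= 60:
        problems.append("Fps must be in (0,60]")
    for side in ("Width", "Height"):
        if side in params and not 128 <= params[side] <= 4096:
            problems.append(f"{side} must be in [128,4096] pixels")
    if "Rotate" in params and not 0 <= params["Rotate"] <= 360:
        problems.append("Rotate must be in [0,360]")
    if "Gop" in params:
        try:
            parse_gop(str(params["Gop"]))
        except ValueError as exc:
            problems.append(str(exc))
    return problems
```

An empty list means that all supplied values fall inside the documented ranges. Parameters that are omitted are not checked because MPS applies their default values.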
Video processing parameters
Parameter Description
ScanMode The scan mode. Valid values: interlaced and progressive.
Rate Control Modes The bitrate control method. The following bitrate control methods are supported: VBR, CBR, and CRF.
Crop Specifies whether to crop the video image. MPS can automatically detect and remove the black borders of a video image. You can also set cropping-related parameters as needed.
Pad Specifies whether to add black borders to the video image.
Audio encoding parameters
Parameter Description
Codec The audio codec format.
  • Valid values: AAC, MP3, VORBIS, and FLAC.
  • Default value: AAC.
Samplerate The sampling rate.
  • Default value: 44100.
  • Valid values: 22050, 32000, 44100, 48000, and 96000.
  • Unit: Hz.
  • If the container format for the video files is FLV and the encoding format for the audio files is MP3, this parameter cannot be set to 32000, 48000, or 96000.
  • If the encoding format for the audio files is MP3, this parameter cannot be set to 96000.
Bitrate The audio bitrate.
  • Default value: 128.
  • Valid values: [8,1000].
  • Unit: Kbit/s.
Channels The number of sound channels.
  • Default value: 2.
  • If the Codec parameter is set to MP3, this parameter can be set only to 1 or 2.
  • If the Codec parameter is set to AAC, this parameter can be set only to 1, 2, 4, 5, 6, or 8.
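
Several audio parameters constrain each other: the allowed sampling rates depend on the codec and the container, and the allowed channel counts depend on the codec. The following sketch encodes the rules from the table above in one place. `check_audio_params` is a hypothetical helper, not an MPS API, and its keyword arguments are illustrative names.

```python
AUDIO_CODECS = {"AAC", "MP3", "VORBIS", "FLAC"}
SAMPLE_RATES = {22050, 32000, 44100, 48000, 96000}


def check_audio_params(codec="AAC", samplerate=44100, bitrate=128,
                       channels=2, container=None) -> list[str]:
    """Return a list of rule violations; defaults mirror the documented defaults."""
    problems = []
    if codec not in AUDIO_CODECS:
        problems.append(f"Codec {codec!r} is not supported")
    if samplerate not in SAMPLE_RATES:
        problems.append(f"Samplerate {samplerate} Hz is not supported")
    if not 8 <= bitrate <= 1000:
        problems.append("Bitrate must be in [8,1000] Kbit/s")
    if codec == "MP3":
        # MP3 never allows 96000 Hz, regardless of the container.
        if samplerate == 96000:
            problems.append("MP3 does not support 96000 Hz")
        # MP3 inside FLV additionally rules out 32000 Hz and 48000 Hz.
        if container == "FLV" and samplerate in (32000, 48000):
            problems.append("MP3 in FLV does not support 32000 Hz or 48000 Hz")
        if channels not in (1, 2):
            problems.append("MP3 supports only 1 or 2 channels")
    if codec == "AAC" and channels not in (1, 2, 4, 5, 6, 8):
        problems.append("AAC supports only 1, 2, 4, 5, 6, or 8 channels")
    return problems
```

For example, `check_audio_params(codec="MP3", samplerate=48000, container="FLV")` reports one violation, whereas the same codec and sampling rate in an MP4 container pass the check.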
Transcoding control
Feature Description
HLS MasterPlayList This feature allows you to package one or more video streams at different bitrates, subtitles in different languages, and audio tracks into a Master Playlist file.
Conditional transcoding If the bitrate or resolution specified in the transcoding template is higher than that of the input video, you can select one of the following options:
  • Do not transcode the video.
  • Transcode the video to the template specifications, except that the bitrate or resolution of the transcoded video is the same as that of the input video.
Workflow MPS automatically executes the workflow in the cloud after an audio or video file is uploaded.
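
The two options of conditional transcoding can be sketched as a small decision function. This is an illustration of the described behavior, not the MPS implementation: `VideoSpec`, `resolve_target`, and the `skip_if_higher` flag are hypothetical names, and comparing resolutions by pixel count is an assumption made for this example.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class VideoSpec:
    bitrate: int  # Kbit/s
    width: int    # pixels
    height: int   # pixels


def resolve_target(source: VideoSpec, template: VideoSpec,
                   skip_if_higher: bool) -> Optional[VideoSpec]:
    """Return the spec to transcode to, or None if the job should be skipped."""
    exceeds_bitrate = template.bitrate > source.bitrate
    # Assumption: "higher resolution" means more pixels than the input.
    exceeds_resolution = (template.width * template.height
                          > source.width * source.height)
    if not (exceeds_bitrate or exceeds_resolution):
        return template  # the template does not exceed the input
    if skip_if_higher:
        return None  # option 1: do not transcode the video
    # Option 2: keep the template, but cap the exceeded values at the input's.
    target = template
    if exceeds_bitrate:
        target = replace(target, bitrate=source.bitrate)
    if exceeds_resolution:
        target = replace(target, width=source.width, height=source.height)
    return target
```

With a 640x360 input at 800 Kbit/s and a 1280x720 template at 1500 Kbit/s, the function returns `None` for the first option and the input's own bitrate and resolution for the second.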

Transcoding templates

Preset templates

MPS provides a series of preset templates so that transcoded videos can adapt to a range of network bandwidths.

  • Intelligent preset templates

    Intelligent preset templates automatically adjust transcoding parameters based on the input video file so that the output video files can meet requirements. Whether an intelligent preset template is applicable to an input video file depends on the resolution, bitrate, and other properties of the input video file. Therefore, you must submit an analysis task to obtain a list of intelligent preset templates that are available to the input video file. MPS tries to balance the need to reduce the bitrate of the media file and the need to reduce quality loss in the transcoding process. If you use an intelligent preset template, quality is prioritized.

  • Static preset templates

    Analysis tasks are not required before you use this type of template. You can use static preset templates to transcode a video file, transcode an audio file to the MP3 format, or convert the container format of a media file. Media files generated by using this type of template can adapt to common playback devices and network bandwidths. Static preset templates control the output bitrate first.

  • Preset Narrowband HD™ templates

    Analysis tasks are not required before you use this type of template. You can use this type of template to generate videos in the FLV, MP4, or M3U8 format. Preset Narrowband HD™ templates, including preset Narrowband HD™ 1.0 templates, are exclusively provided by MPS. Compared with other transcoding templates, a preset Narrowband HD™ template can generate an output video at a lower bitrate without compromising the video quality, which helps you reduce costs.

Custom templates

A custom template contains a set of transcoding parameters, such as the audio, video, and container parameters. You can set the transcoding parameters based on your needs to create a regular template. Alternatively, you can submit a ticket to have the created template configured as a Narrowband HD™ 1.0 template.
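
A custom template can be thought of as three groups of parameters: container, video, and audio. The following sketch assembles such a template as a plain dictionary and serializes it to JSON. The parameter names inside the video and audio groups come from the tables in this topic. The `build_template` helper, the top-level keys, and the `Format` key are assumptions made for illustration, so check the API reference for the exact request format before you rely on them.

```python
import json


def build_template(name: str, container_format: str,
                   video: dict, audio: dict) -> dict:
    """Group container, video, and audio settings into one template object."""
    return {
        "Name": name,
        "Container": {"Format": container_format},  # assumed key name
        "Video": dict(video),
        "Audio": dict(audio),
    }


# Hypothetical 720p template; all values lie inside the documented ranges.
template = build_template(
    "mp4-720p-example",
    "mp4",
    video={"Codec": "H.264", "Bitrate": 1500, "Fps": 25,
           "Width": 1280, "Height": 720, "Gop": "10s", "Profile": "High"},
    audio={"Codec": "AAC", "Samplerate": 44100, "Bitrate": 128, "Channels": 2},
)

print(json.dumps(template, indent=2))
```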

Video editing

Feature Description
Video editing This feature allows you to cut a clip of a specified duration from a video, starting at a specified time point.
Video merging This feature allows you to merge up to 20 videos into one.
Blurring This feature allows you to blur the specified area of a video.
Video head and tail
  • This feature allows you to add dynamic logos at the beginning of a video and specify the content for the video tail.
  • This feature helps increase product recognition and highlight copyright information.


Watermarks

Feature Description
Static watermarks
  • You can add up to 20 watermarks to an output video.
  • Supported file formats: PNG, text, MOV, and APNG.
Dynamic watermarks You can specify the time period during which watermarks are displayed.


Snapshots

Feature Description
Video snapshots
  • This feature allows you to capture images at specific time points for a video that is stored in Object Storage Service (OSS). The captured images are in JPG format.
  • You can take a single snapshot, multiple snapshots at different time points, or snapshots at even intervals.
Sprite and WebVTT-based thumbnail A sprite is generated by taking a series of snapshots. This feature allows you to obtain information about multiple snapshots in one request. This way, the number of requests is reduced, and the client performance is improved.
Smart thumbnail MPS learns from image aesthetics, recognizes the content of a video, selects the optimal keyframe from the video, and then generates an image as the video thumbnail.
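
Snapshots taken at even intervals and the sprite built from them can be tied together with a WebVTT thumbnail track. The following sketch computes the snapshot time points, the pixel offset of each snapshot inside a sprite that is filled row by row, and a WebVTT file that maps each time range to its tile by using the common `#xywh=` fragment convention. All function names and the `sprite.jpg` file name are hypothetical, and the layout that MPS produces may differ.

```python
def snapshot_times(duration: float, interval: float) -> list[float]:
    """Time points in seconds for snapshots taken every `interval` seconds."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    times = []
    index = 0
    # Multiply instead of accumulating to avoid floating-point drift.
    while index * interval < duration:
        times.append(index * interval)
        index += 1
    return times


def sprite_cells(count: int, columns: int,
                 tile_w: int, tile_h: int) -> list[tuple[int, int]]:
    """(x, y) pixel offset of each snapshot in a sprite laid out row by row."""
    return [((i % columns) * tile_w, (i // columns) * tile_h)
            for i in range(count)]


def vtt_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp (hh:mm:ss.mmm)."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def thumbnail_vtt(duration: float, interval: float, columns: int,
                  tile_w: int, tile_h: int,
                  sprite_url: str = "sprite.jpg") -> str:
    """Build a WebVTT track in which each cue points at one sprite tile."""
    times = snapshot_times(duration, interval)
    cells = sprite_cells(len(times), columns, tile_w, tile_h)
    lines = ["WEBVTT", ""]
    for start, (x, y) in zip(times, cells):
        end = min(start + interval, duration)
        lines.append(f"{vtt_timestamp(start)} --> {vtt_timestamp(end)}")
        lines.append(f"{sprite_url}#xywh={x},{y},{tile_w},{tile_h}")
        lines.append("")
    return "\n".join(lines)
```

A player that loads the sprite once can then show the preview for any position on the progress bar without sending further requests, which is the reduction in requests that the sprite feature aims for.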

Narrowband HD™

Feature Description
Narrowband HD™ 1.0 Narrowband HD™ 1.0 is a media processing feature that is developed based on the transcoding technologies of Alibaba Cloud. Narrowband HD™ 1.0 intelligently analyzes the scenes, actions, content, and textures in a video. This helps reduce the bitrate of output videos and bandwidth costs without compromising the video quality.

High-speed transcoding

For long videos of more than 30 minutes, MPS can speed up the transcoding process by concurrently transcoding video clips. The transcoding speed can be increased by 5 times.
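
The general idea of transcoding clips concurrently can be sketched as follows. This is not how MPS is implemented, and the clip length and worker count are arbitrary example values. `split_into_clips`, `transcode_clip`, and `transcode_concurrently` are hypothetical names, and `transcode_clip` is a placeholder that only returns a status string instead of running an encoder.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_clips(duration: float,
                     clip_length: float) -> list[tuple[float, float]]:
    """Split [0, duration) into consecutive (start, end) ranges in seconds."""
    if clip_length <= 0:
        raise ValueError("clip_length must be positive")
    clips = []
    index = 0
    while index * clip_length < duration:
        start = index * clip_length
        clips.append((start, min(start + clip_length, duration)))
        index += 1
    return clips


def transcode_clip(clip: tuple[float, float]) -> str:
    """Placeholder for the work done on one clip; a real job would encode it."""
    start, end = clip
    return f"clip {start:.0f}-{end:.0f}s done"


def transcode_concurrently(duration: float, clip_length: float,
                           workers: int = 4) -> list[str]:
    """Process all clips in parallel and return the results in clip order."""
    clips = split_into_clips(duration, clip_length)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in input order, so the clips can be
        # joined back together in sequence afterwards.
        return list(pool.map(transcode_clip, clips))
```

For a 40-minute video (2,400 seconds) and 10-minute clips, `split_into_clips(2400, 600)` yields four ranges that can be processed at the same time.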

Video AI

Feature Description
Media fingerprint A media fingerprint, usually a binary string, uniquely identifies a video. Fingerprints are unique: different videos have different media fingerprints. Fingerprints are also stable: the media fingerprint of an audio or video file remains the same if the file is converted to another format, cropped, merged, compressed, or rotated.
Automated review This feature is implemented based on a large amount of labeled data and deep learning algorithms. MPS analyzes the content, thumbnail, title, and comments of videos and accurately recognizes prohibited content in terms of speech, text, and visual elements. Prohibited content includes pornographic, terrorism, and politically sensitive content, ads, and content in video blacklists. This feature can be used in multiple scenarios, such as short video platforms, live streaming platforms, and media auditing.
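
Because a media fingerprint stays the same across format conversion and similar operations, duplicate videos can be found by grouping files that share a fingerprint. The following sketch assumes that the fingerprints have already been generated and are available as binary strings. It does not show how MPS computes or compares fingerprints, and `find_duplicates` and the file names are hypothetical.

```python
from collections import defaultdict


def find_duplicates(fingerprints: dict[str, str]) -> list[list[str]]:
    """Group video IDs that share the same fingerprint.

    `fingerprints` maps a video ID to its fingerprint (a binary string).
    Only groups with more than one video are returned.
    """
    groups = defaultdict(list)
    for video_id, fingerprint in fingerprints.items():
        groups[fingerprint].append(video_id)
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]


# Hypothetical library: the FLV file is a re-encoded copy of a.mp4.
library = {
    "a.mp4": "1011",
    "a_720p.flv": "1011",
    "b.mp4": "0110",
}
duplicates = find_duplicates(library)
```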

Other features

Feature Description
Media information You can obtain the encoding and content information of audio and video files that are stored in OSS.
Custom duration of M3U8 output media segments
  • You can customize the duration of a media segment in the M3U8 format. The duration can range from one second to 60 seconds.
  • This feature allows you to adapt the media segment length to the network bandwidth of the playback clients, so that the loading time of the first frame can be reduced.
External subtitle You can import an external subtitle file and specify the encoding format of the subtitle file for a transcoding job.
Message notification
  • Message Service (MNS) is supported.
  • You can set MNS-related parameters for an MPS queue. When a transcoding job in the MPS queue returns a message, MNS delivers the message to you.
Player SDKs
  • ApsaraVideo Player SDK for Web supports playback in Flash, HTML5, and automatic rotation mode.
  • ApsaraVideo Player SDKs are provided for iOS and Android devices.
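
The effect of the custom M3U8 segment duration described above can be estimated with a short calculation. The following sketch splits a video into segments of the requested length and renders a minimal HLS media playlist. It is an approximation: real encoders cut segments at keyframes, so actual durations can differ. `segment_plan`, `media_playlist`, and the `seg` file prefix are hypothetical names.

```python
import math


def segment_plan(video_duration: float, segment_duration: float) -> list[float]:
    """Durations of the segments for a video cut into fixed-length pieces."""
    if not 1 <= segment_duration <= 60:
        raise ValueError("segment duration must be between 1 and 60 seconds")
    full, remainder = divmod(video_duration, segment_duration)
    durations = [float(segment_duration)] * int(full)
    if remainder > 0:
        durations.append(float(remainder))  # shorter final segment
    return durations


def media_playlist(durations: list[float], prefix: str = "seg") -> str:
    """Render a minimal HLS media playlist for the given segment durations."""
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",
        # The target duration must be an integer that covers the longest segment.
        f"#EXT-X-TARGETDURATION:{math.ceil(max(durations))}",
    ]
    for index, duration in enumerate(durations):
        lines.append(f"#EXTINF:{duration:.3f},")
        lines.append(f"{prefix}{index}.ts")
    lines.append("#EXT-X-ENDLIST")
    return "\n".join(lines)
```

A shorter segment duration means that the player downloads less data before it can render the first frame, at the cost of more segment requests during playback.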