Introduction to the transcoding, encryption, and AI features of MPS - ApsaraVideo Media Processing

ApsaraVideo Media Processing (MPS) converts an audio or video file into one or more output files to suit different network bandwidths, terminal processing capabilities, and user needs. MPS also performs multimodal analysis on the content, text, speech, and scenes of media files and offers features such as automated review, content recognition, and smart editing.

Audio and video transcoding

The audio and video transcoding feature allows you to convert the definition, encoding format, or container format of audio and video streams to adapt to different network bandwidths and playback devices. MPS supports mainstream encoding and container formats, and allows you to perform simple edit operations and add watermarks and captions during transcoding. The following table describes the specifications of the audio and video transcoding feature. To use specifications that are unavailable in the MPS console or API operations, contact technical support with the help of sales staff.

Important

To use the feature described in the following table, you must submit a transcoding job. The regular transcoding fee is charged based on the specifications and length of the output video. For more information, see Audio and video transcoding fees.

Item	Parameter	Description
Input file	Container format	Video: 3GP, AVI, FLV, MP4, M3U8, MPG, ASF, WMV, MKV, MOV, TS, WebM, MXF, and VOB. Audio: AAC, FLAC, M4A, MP3, MP4, and OGG. Caption: ASS, SSA, SRT, and VTT.
	Video encoding format	Apple ProRes, AVS+, AVS, AVS2, H.263, H.263+, H.264/AVC, H.265/HEVC, H.266/VVC, MJPEG, MPEG-1, MPEG-2, MPEG-4, QuickTime, RealVideo, VP8, VP9, and WMV.
	Audio encoding format	AAC, AC3, ADPCM, AMR, DSD, EAC3, MP1, MP2, MP3, PCM, RealAudio, Vorbis, and WMA.
	File size	The maximum size is 100 GB.
	Chroma	Examples: 4:2:2 and 4:2:0.
Output file	Container format	Video: HLS, DASH, CMAF, 3GP, AVI, FLV, F4V, fMP4, MKV, MOV, MP4, MPEG, TS, MXF, and WebM. Animated image: GIF and WEBP. Audio: AAC, M4A, MP2, MP3, MP4, OGG, FLAC, and WAV. Note: A container format must be used along with specific encoding formats. For more information, see Supported formats. To use specifications that are unavailable in the MPS console or API operations, contact technical support with the help of sales staff. If you convert only the container format of audio or video streams, the encoding format remains unchanged. In this case, MP4, HLS, and FLV are supported as the output formats.
	Encoding format	Video: H.263, H.264/AVC, H.265/HEVC, H.266/VVC, VP8, VP9, AV1, AVC-Intra, AVS2, MPEG-1, MPEG-2, MPEG-2 422, MPEG-4, and Apple ProRes. Animated image: GIF and WEBP. Audio: AAC, AC3, EAC3, MP2, MP3, FLAC, Opus, Vorbis, WMA, and pcm_s16le.
	Encoding profile	The encoding profiles baseline, main, and high are supported for the H.264 video codec. The encoding profiles aac_low, aac_he, aac_he_v2, aac_ld, and aac_eld are supported for the AAC audio codec.
	Resolution	If the video codec is H.264, the output resolution ranges from 128 pixels to 4,096 pixels. If the video codec is H.265, H.266, or AV1, the output resolution ranges from 128 pixels to 8,192 pixels.
	Bitrate	If the video codec is H.264, the output bitrate ranges from 10 Kbit/s to 50,000 Kbit/s. If the video codec is H.265, H.266, or AV1, the output bitrate ranges from 10 Kbit/s to 200,000 Kbit/s.
	Frame rate	The maximum output frame rate is 60 frames per second (FPS).
	Sampling bit depth	If the video codec is H.264, the bit depth is 8 bits. If the video codec is H.265, the bit depth can be up to 12 bits.
	Pixel format	Examples: yuv420p, yuvj420p, yuv422p, yuvj422p, yuv444p, and yuvj444p.
	Bitrate control	Variable bitrate (VBR), constant bitrate (CBR), average bitrate (ABR), and constant rate factor (CRF).
	Scan mode	The following scan modes are supported: same as the input video, automatic deinterlacing, interlaced scan, and progressive scan.
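The output limits in the table above can be checked on the client side before a transcoding job is submitted. The following Python sketch is illustrative only: the function name `check_output_settings` and the dictionary layout are hypothetical and are not part of the MPS API, and it assumes that the resolution range applies to both the width and the height.

```python
# Limits copied from the specification table above.
# Assumption: the resolution range applies to both the width and the height.
LIMITS = {
    "H.264": {"resolution": (128, 4096), "bitrate_kbps": (10, 50_000)},
    "H.265": {"resolution": (128, 8192), "bitrate_kbps": (10, 200_000)},
    "H.266": {"resolution": (128, 8192), "bitrate_kbps": (10, 200_000)},
    "AV1": {"resolution": (128, 8192), "bitrate_kbps": (10, 200_000)},
}
MAX_FPS = 60  # maximum output frame rate from the table


def check_output_settings(codec, width, height, bitrate_kbps, fps):
    """Return a list of problems; an empty list means the settings fit the table."""
    if codec not in LIMITS:
        return [f"no limits listed for codec {codec!r}"]
    problems = []
    lo, hi = LIMITS[codec]["resolution"]
    for name, value in (("width", width), ("height", height)):
        if not lo <= value <= hi:
            problems.append(f"{name} {value} is outside {lo}-{hi} pixels")
    lo, hi = LIMITS[codec]["bitrate_kbps"]
    if not lo <= bitrate_kbps <= hi:
        problems.append(f"bitrate {bitrate_kbps} is outside {lo}-{hi} Kbit/s")
    if fps > MAX_FPS:
        problems.append(f"frame rate {fps} exceeds {MAX_FPS} FPS")
    return problems
```

For example, `check_output_settings("H.264", 7680, 4320, 4000, 30)` reports two problems because both sides exceed 4,096 pixels, whereas the same size with `"H.265"` passes.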

Narrowband HD™

Narrowband HD™ is a media processing feature based on the transcoding technologies supported by Alibaba Cloud. This feature allows you to improve video compression efficiency and reduce file sizes without compromising the image quality. This way, you can reduce video stuttering during playback and save storage and traffic costs.

Important

To use the Narrowband HD™ feature described in the following table, you must select an appropriate transcoding template when you submit a transcoding job. The Narrowband HD™ transcoding fee is charged based on the specifications and length of the output video.

Feature	Description
Narrowband HD™ 1.0	MPS intelligently analyzes details such as scenes, actions, content, and textures in a video. This reduces the bitrate by 20% to 40% without changing the image quality, or improves the definition of videos under the same network bandwidth conditions. Supported codecs are H.264 and H.265. Other configuration items are the same as those of audio and video transcoding.
Narrowband HD™ 2.0	MPS improves the upper limit of the encoder and integrates the definition restoration and enhancement features. This reduces the bitrate by 40% to 60% without changing the image quality, or improves the definition of videos under lower network bandwidth conditions. Supported codecs are H.264 and H.265. Other configuration items are the same as those of audio and video transcoding.
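To put the percentages in the table into concrete numbers, the following Python sketch computes the bitrate range that the quoted reductions would imply for a given source bitrate. It is plain arithmetic on the figures above. The function name is hypothetical, and actual savings depend on the content of each video.

```python
# Bitrate reduction ranges quoted in the table above, in percent.
REDUCTION_PERCENT = {"1.0": (20, 40), "2.0": (40, 60)}


def estimated_bitrate_range(original_kbps, version):
    """Return (lowest, highest) expected output bitrate in Kbit/s."""
    smallest_cut, largest_cut = REDUCTION_PERCENT[version]
    lowest = original_kbps * (100 - largest_cut) / 100
    highest = original_kbps * (100 - smallest_cut) / 100
    return (lowest, highest)
```

For a 4,000 Kbit/s source, the sketch gives 2,400 to 3,200 Kbit/s for version 1.0 and 1,600 to 2,400 Kbit/s for version 2.0.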

Audio enhancement

ApsaraVideo Audio Lab provides full-scenario audio enhancement and repair solutions by combining signal processing and deep learning technologies.

Important

To use the audio enhancement features described in the following table, you must select an appropriate transcoding template when you submit a transcoding job. The audio enhancement fee is charged based on the specifications and length of the output audio. The video transcoding fee is charged based on the billing rules of the feature that you use. To configure a transcoding template for audio enhancement, search for and join the DingTalk group (ID 32171220) to contact Alibaba Cloud technical support.

Feature	Description
Sound enhancement	MPS supports sound enhancement for mono audio streams, binaural audio streams, and audio streams that use the 5.1 or 7.1 surround sound format. When you use earphones or speakers to play music, a speech, or a video, MPS provides a high-quality, natural, clear, and customizable sound effect.
Volume normalization	MPS intelligently normalizes the volume of videos. This way, you can resolve the issue of unstable volume due to the volume differences of content sources in scenarios where the short videos or music are in continuous playback.

High-speed transcoding

MPS supports the high-speed transcoding feature to split a video into multiple segments and then transcode them in parallel. This increases the transcoding speed by 5 to 30 times and significantly reduces the processing duration. This feature is suitable for important content that requires high timeliness, such as news and events.

Important

To use this feature, you must enable an MPS queue for high-speed transcoding and submit a transcoding job to this MPS queue. The high-speed transcoding fee is charged based on the specifications, length, and transcoding speed of the output video. In addition, you are charged for the audio and video transcoding or audio-visual enhancement feature.

Feature	Description
Speed boost	The transcoding speed can be boosted by 5 to 30 times depending on the properties of the input video, such as the format, resolution, and bitrate of the video. You can specify the expected speed boost for an MPS queue for high-speed transcoding, such as 5 times, 10 times, 20 times, or 30 times.
Scenarios	We recommend that you use high-speed transcoding for videos that are longer than 30 minutes, or videos that require high frame rates, ultra high definition, and audio-visual enhancement. For more information, see the Limits on high-speed transcoding section of the "Limits" topic.
Policy	Splitting is not supported for all videos. If you submit a video that is not supported by high-speed transcoding to an MPS queue for high-speed transcoding, the video is transcoded in the regular way by default.
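The speed boost and the fallback policy in the table above can be combined into a rough estimate of processing time. The following Python sketch is a simplification: the function name and the `splittable` flag are hypothetical, and the real speedup depends on the format, resolution, and bitrate of the input video.

```python
def estimated_duration_seconds(regular_seconds, speed_boost, splittable=True):
    """Estimate processing time for a job in a high-speed transcoding queue.

    regular_seconds: time the job would take with regular transcoding.
    speed_boost: expected boost configured for the queue (5 to 30).
    splittable: False if the video cannot be split, in which case the job
                is transcoded in the regular way.
    """
    if not 5 <= speed_boost <= 30:
        raise ValueError("speed boost is expected to be between 5 and 30")
    if not splittable:
        return regular_seconds
    return regular_seconds / speed_boost
```

Under these assumptions, a job that takes one hour with regular transcoding would take about six minutes at a 10x boost, and still one hour if the video cannot be split.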

More features

Media information

MPS can obtain information about audio and video files that are stored in Object Storage Service (OSS), including the resolution, bitrate, frame rate, codec, and format of the files.

Important

You must call the SubmitMediaInfoJob operation to use this feature. You are charged based on the number of API requests. For more information, see the Pricing for API calls section of the "Audio and video transcoding fees" topic.

Video editing

You can perform simple edit operations on videos. For example, you can extract audio or videos, merge videos, clip videos, and mix audio.

Important

To use the video editing features described in the following table, you must configure relevant parameters when you submit a transcoding job. The transcoding fee is charged based on the specifications and length of the output video.

Feature	Description	Parameter of an API operation	MPS console
Audio extraction	This feature allows you to extract the audio stream from a video by disabling the video stream.	Remove	Supported
Video extraction	This feature allows you to extract the video stream from a video by disabling the audio stream.	Remove	Supported
Black bar removal	This feature allows you to detect whether black bars exist in a video. If black bars exist, the system automatically removes the black bars.	Crop	Not supported
Video cropping	This feature allows you to resize the video image, adjust the position of the resized image, and remove the gaps between the original image and the resized image.	Crop	Not supported
Black bar addition	This feature allows you to resize the video image, adjust the position of the resized image, and fill the gaps between the original image and the resized image by using black bars.	Pad	Not supported
Auto-rotate screen	This feature allows you to convert the resolution of a video based on the long and short sides instead of the width and height of the video. If the input videos include videos in landscape mode and portrait mode, we recommend that you enable this feature.	LongShortMode	Supported
Video rotation	This feature allows you to set the rotation angle of a video.	Rotate	Supported
Video merging	This feature allows you to merge up to 100 videos into one. You can set the start point in time and length of each video to be merged.	MergeList or MergeConfigUrl	Not supported
Video clipping	Video cutting: You can cut a video from a specific point in time to retain a specific length of the video. Video trimming: You can trim a part of a specific length from the tail of a video.	Clip	Supported
Video head and tail	This feature allows you to add dynamic logos at the beginning of a video and specify the content for the video tail. This helps increase product recognition and highlight copyright information.	OpeningList and TailSlateList	Video tail addition is supported.
Blurring	This feature allows you to blur the specified area of a video.	DeWatermark	Not supported
Audio mixing	This feature allows you to merge two audio tracks into one. You can use this feature to add background music.	Amix	Not supported
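The auto-rotate screen feature in the table above is easiest to understand with a small example. The following Python sketch shows one way to map long-side and short-side targets onto a width and height while keeping the orientation of the input. The function is hypothetical and only illustrates the idea. It assumes that a square input is treated like a landscape input.

```python
def target_size(input_width, input_height, long_side, short_side):
    """Return (width, height) of the output, preserving the input orientation."""
    if input_width >= input_height:
        # Landscape (or square, by assumption): the long side is the width.
        return (long_side, short_side)
    # Portrait: the long side is the height.
    return (short_side, long_side)
```

With targets of 1280 and 720, a 1920 x 1080 landscape input maps to 1280 x 720 and a 1080 x 1920 portrait input maps to 720 x 1280, so portrait videos are not squeezed into a landscape frame.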

Video snapshot

You can use the video snapshot feature to take snapshots of a specific size at a specific point in time of a video. The snapshots are used for video thumbnails, sprites, and progress bar thumbnails.

Important

To use the video snapshot features described in the following table, you must submit a snapshot job. The snapshot fee is charged based on the number of snapshots.

Feature	Description	Parameter of an API operation	MPS console
Static snapshot	This feature allows you to take snapshots of specific sizes at specific points in time of a video in the JPG format. The following snapshot modes are provided: Single: A snapshot is taken at a specific point in time of a video. This type of snapshot can be taken synchronously or asynchronously. Multiple: Snapshots are taken at a specific interval from a specific point in time until a specific number of snapshots are taken or the video ends. You can specify the interval in units of seconds. This type of snapshot is taken asynchronously. Average: A specific number of snapshots are taken from a specific point in time of a video to the end of the video at a regular interval. This type of snapshot is taken asynchronously. Time point (in beta testing): Snapshots are taken at the specified points in time of a video. This type of snapshot is taken asynchronously.	SnapshotConfig	Supported
Sprite snapshot	This feature allows you to create a sprite by merging the snapshots that are taken into a single image based on specific rules. The sprites are in the JPG format. This type of snapshot is taken asynchronously. A client can obtain multiple images with a single request for the sprite, which greatly reduces the number of image requests and improves client performance.	TileOut and TileOutputFile	Not supported
WebVTT snapshot	This feature allows you to generate VTT files for the snapshots that are taken or the sprites that are created. VTT files contain the time when snapshots are taken, paths of snapshots, and coordinates of snapshots in sprites. When a client requests an image, the image is displayed after the corresponding VTT file is obtained and parsed. This feature can be used to display thumbnails on the progress bar.	SubOut	Supported
Keyframe snapshot	This feature allows you to capture only keyframes. If the frame at the specified point in time is not a keyframe, the adjacent keyframe is captured.	FrameType	Supported
Black screen detection	This feature allows you to detect whether the first snapshot is a black screen. To use this feature, set the Time parameter to 0, which specifies that the snapshots are taken from the start of a video. You can define a black screen by specifying the portion of black pixels in an image and the color value of black pixels. If the black screen detection feature is enabled, the system checks the frames in the first 5 seconds of a video. If a non-black frame exists, the non-black frame is captured. Otherwise, the job fails if the job is a single-snapshot job, or the first black frame is captured if the job is a multi-snapshot job.	BlackLevel and PixelBlackThreshold	Supported
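The Multiple and Average snapshot modes in the table above differ in how the capture times are derived. The following Python sketch computes the times for both modes. The function names are hypothetical, and the exact placement of frames in Average mode is an assumption, because the table states only that the snapshots are taken at a regular interval.

```python
def multiple_mode(start, interval, count, duration):
    """Times (in seconds) for Multiple mode.

    Snapshots are taken every `interval` seconds from `start` until
    `count` snapshots are taken or the video ends.
    """
    times = []
    t = start
    while len(times) < count and t <= duration:
        times.append(t)
        t += interval
    return times


def average_mode(start, count, duration):
    """Times (in seconds) for Average mode.

    Assumption: the spacing is (duration - start) / count and the first
    snapshot is taken at `start`; the service may place frames differently.
    """
    if count <= 0:
        raise ValueError("count must be positive")
    step = (duration - start) / count
    return [start + i * step for i in range(count)]
```

For a 35-second video, `multiple_mode(0, 10, 5, 35)` stops after four snapshots because the video ends before the fifth. For a 60-second video, `average_mode(0, 4, 60)` spaces four snapshots 15 seconds apart.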

Video watermarking

This feature allows you to add visible watermarks, such as enterprise logos and TV station logos, to a video to highlight brands, protect copyrights, and raise product popularity. You can also add blind watermarks to a video for copyright tracing. For more information, see the Digital watermarking section of this topic.

Important

To use the video watermarking features described in the following table, you must submit a transcoding job and specify the watermark materials and watermark template. The watermark template is optional. The transcoding fee is charged based on the specifications and length of the output video. The watermarking fee is charged based on the number of watermarks.

Feature	Description	Parameter of an API operation	MPS console
Image watermark	You can add up to four watermarks to a video. You can set the start time, duration, position, and size of each watermark. Images in the PNG format and animated images in the PNG, MOV, and GIF formats are supported. You can use watermark templates to simplify the process.	WaterMarks	Supported
Text watermark	You can add up to four watermarks to a video. You can set the start time, duration, position, and size of each watermark. You can set the text content and font effects such as the font size, font type, color, transparency, and border. For more information about supported fonts and colors, see Fonts and Color. You cannot use watermark templates to add text watermarks.	WaterMarks	Not supported

Caption

This feature allows you to add captions to videos for better comprehension and appreciation.

Important

To use the caption feature described in the following table, you must submit a transcoding job or create a transcoding workflow and trigger the workflow. The transcoding fee is charged based on the specifications and length of the output video.

Feature	Description	Parameter of an API operation	MPS console
Caption packaging	You can integrate caption files and audio and video streams into a master playlist in the M3U8 or MPD format by using a packaging workflow. You can add up to four captions to a master playlist. This allows you to switch between captions of different versions. An HTTP Live Streaming (HLS) packaging workflow supports captions in the VTT format. A Dynamic Adaptive Streaming over HTTP (DASH) packaging workflow supports captions in the VTT, STL, and TTML formats.	HLS packaging: ExtXMedia DASH packaging: InputConfig	Supported
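As an illustration of what caption packaging produces for HLS, the following Python sketch builds a master playlist that references caption playlists through `EXT-X-MEDIA` tags, following the general HLS playlist syntax. It is not the exact output of MPS. The function name, the `subs` group ID, and the file names in the usage note are hypothetical.

```python
MAX_CAPTIONS = 4  # limit stated in the table above


def build_master_playlist(variants, captions):
    """Build an HLS master playlist as a string.

    variants: list of (bandwidth_bps, "WIDTHxHEIGHT", uri) tuples.
    captions: list of (name, language, uri) tuples.
    """
    if len(captions) > MAX_CAPTIONS:
        raise ValueError("a master playlist can carry at most four captions")
    lines = ["#EXTM3U"]
    for name, language, uri in captions:
        lines.append(
            f'#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="{name}",'
            f'LANGUAGE="{language}",URI="{uri}"'
        )
    for bandwidth, resolution, uri in variants:
        attrs = f"BANDWIDTH={bandwidth},RESOLUTION={resolution}"
        if captions:
            # Link each video stream to the caption group declared above.
            attrs += ',SUBTITLES="subs"'
        lines.append(f"#EXT-X-STREAM-INF:{attrs}")
        lines.append(uri)
    return "\n".join(lines) + "\n"
```

Calling `build_master_playlist([(800000, "640x360", "low.m3u8")], [("English", "en", "en.m3u8")])` returns a playlist in which the player can offer the English caption track alongside the video stream.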

Video packaging

Packaging indicates the process in which a master playlist is generated for multiple video streams at different bitrates, multiple caption streams, and multiple audio streams. The packaging feature allows you to perform the following operations during streaming media playback:

Adaptive streaming: supports automatic bitrate adjustment to ensure smooth live streaming.
Ad placement: supports video ad placement between segments.
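Adaptive streaming relies on the player choosing among the bitrates listed in the master playlist. The following Python sketch shows the simplest possible selection rule on the client side. It is a hypothetical illustration, and real players use more elaborate heuristics such as buffer level and throughput history.

```python
def pick_variant(variant_bitrates_kbps, measured_kbps):
    """Return the highest bitrate that fits the measured bandwidth.

    If no variant fits, fall back to the lowest bitrate so that playback
    can still start.
    """
    affordable = [b for b in variant_bitrates_kbps if b <= measured_kbps]
    return max(affordable) if affordable else min(variant_bitrates_kbps)
```

With variants at 800, 2,000, and 5,000 Kbit/s, a measured bandwidth of 2,500 Kbit/s selects the 2,000 Kbit/s stream, and a bandwidth of 300 Kbit/s falls back to the 800 Kbit/s stream.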

Important

To use the video packaging features described in the following table, you must create a transcoding workflow and trigger the workflow. The transcoding fee is charged based on the specifications and length of the output video.

Feature	Description	Parameter of an API operation	MPS console
HLS packaging	HLS that supports secondary indexes is used for video packaging. HLS supports index files in the M3U8 format and video files in the TS format.	For more information, see How do I perform HLS package?	Supported
CMAF packaging	Common Media Application Format (CMAF) that supports the output format of HLS or DASH is used for packaging.	N/A	Not supported
Custom segment length	You can specify a maximum of 10 points in time at which you want to segment a video, and the length of segments. The segment length ranges from 1 second to 60 seconds. This feature allows you to adapt the media segment length to the network bandwidth of playback clients. This way, the loading time of the first frame is reduced.	Segment	Not supported

Video encryption

Important

To use the video encryption features described in the following table, you must create a transcoding workflow and trigger the workflow. The transcoding fee is charged based on the specifications and length of the output video.

Feature	Description	Parameter of an API operation	MPS console
HLS encryption	This feature allows you to encrypt a video based on the HLS AES-128 protocol by using a self-managed or Key Management Service (KMS) key. You can decrypt and play the video on a player that supports HLS streams. This ensures video security on mobile devices, and offers high-level security and excellent terminal compatibility.	N/A	Supported
Alibaba Cloud proprietary cryptography	This feature allows you to encrypt a video based on the Alibaba Cloud proprietary cryptography protocol and convert the video to an encrypted HLS format. Only KMS keys are supported. You must use ApsaraVideo Player to decrypt and play the video. Otherwise, you cannot play or transmit the video even if you download it to an on-premises device. This ensures video security on mobile devices and Flash players. This feature can be used in scenarios such as online education and subscription-based viewing, which require high-level security.	N/A	Supported
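For HLS encryption, a player learns how to decrypt the segments from an `EXT-X-KEY` tag in the playlist. The following Python sketch generates a 16-byte key and builds such a tag according to the general HLS syntax. It does not encrypt any media, because MPS performs the encryption during transcoding. The function names and the key URI in the usage note are hypothetical.

```python
import secrets


def new_aes128_key():
    """Generate a random key; AES-128 uses a 16-byte (128-bit) key."""
    return secrets.token_bytes(16)


def ext_x_key_tag(key_uri, iv):
    """Build the playlist tag that tells a player where to fetch the key.

    key_uri: URL from which an authorized player obtains the key.
    iv: 16-byte initialization vector.
    """
    if len(iv) != 16:
        raise ValueError("the IV must be 16 bytes")
    return f'#EXT-X-KEY:METHOD=AES-128,URI="{key_uri}",IV=0x{iv.hex()}'
```

For example, `ext_x_key_tag("https://example.com/key", secrets.token_bytes(16))` returns a tag that can be placed before the encrypted segments in a media playlist. Access control on the key URI is what keeps unauthorized players from decrypting the video.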

Video AI

Automated review

This feature allows you to review the content in a media file, such as the title, overview, thumbnail, video, and audio. This way, you can efficiently detect prohibited content in a video. This feature can be used in multiple scenarios, such as short video review, live streaming review, and media review.

Important

To use the automated review feature described in the following table, you must submit a media review job. The automated review fee is charged based on the length of the processed video.

Feature	Content	Description
Content moderation	Pornography detection	Detects pornographic and sexually suggestive content from dimensions such as voice, text, and vision.
	Terroristic content detection	Detects terroristic content from more than 10 dimensions, such as weapons, bloody scenes, specific costumes, smoke and light scenes, special symbols, crowds, and parades.
	Ad violation detection	Detects different forms of ads, such as advertising text, watermarks, QR codes, illegal ads, and mini program codes.
	Logo detection	Detects logos in a video or image, such as TV station logos, trademarks, and watermarks. This helps protect copyrights.
	Undesirable content detection	Detects undesirable scenes in a video or image, such as picture-in-picture (PiP), smoking, live broadcasting while driving, and meaningless images.
	Audio anti-spam	Detects illegal content in audio, such as pornography, terrorism, and abuse. Speech recognition is supported for Chinese and English.

Media fingerprinting

The media fingerprinting feature is implemented based on video recognition technologies developed by Alibaba Cloud. This feature uses a fingerprint to uniquely mark a media file, and allows you to extract and compare the fingerprints among media files. This helps detect duplicate videos and trace the source of video clips.

Important

To use the media fingerprinting features described in the following table, you must submit a media fingerprinting job. The media fingerprinting fee is charged based on the length of the processed video or audio.

Feature	Description
Video fingerprinting	You can use this feature to extract the fingerprints of videos, import and analyze video fingerprints in the fingerprint library, and search for similar videos.
Audio fingerprinting	You can use this feature to extract the fingerprints of audio, import and analyze audio fingerprints in the fingerprint library, and search for similar audio.
Image fingerprinting	You can use this feature to extract the fingerprints of images, import and analyze image fingerprints in the fingerprint library, and search for similar images.
Text fingerprinting	You can use this feature to extract the fingerprints of text, import and analyze text fingerprints in the fingerprint library, and search for similar text.

Service management

Feature	Description	Parameter of an API operation	MPS console
Media management	You can upload, manage, and publish media files.	N/A	N/A
Workflow orchestration	MPS automatically runs a workflow in the cloud after an audio or video file is uploaded.	N/A	Supported
Transcoding template	A transcoding template is a collection of transcoding parameters. You can use a transcoding template to simplify the operations when you create a transcoding job or use a workflow. Transcoding templates can be classified into the following types: custom templates, customized templates, and preset templates.	TemplateId	Supported
Watermark template	A watermark template specifies the settings of multiple parameters, such as the parameters that determine the position and size of watermarks. You can use a watermark template to simplify the watermarking process.	WaterMarkTemplateId	Supported
Transcoding priority	You can specify the priority of transcoding jobs in an MPS queue. A maximum of 10 priority levels can be specified.	Priority	Not supported
Conditional transcoding	If the video bitrate, video resolution, or audio bitrate of an input video is less than the specified output settings, the video is transcoded in original quality or no transcoding is performed.	Examples: IsCheckReso and IsCheckResoFail	Supported
MPS queue	MPS jobs such as transcoding and asynchronous snapshot jobs are asynchronously processed. You must add the jobs to an MPS queue for scheduling and execution. You can create multiple MPS queues and specify the priority of jobs in an MPS queue. A maximum of 10 priority levels are supported.	Priority	Not supported
Message notification	MPS jobs such as transcoding and asynchronous snapshot jobs are asynchronously processed. You can integrate Message Service (MNS) to associate an MNS topic or queue with an MPS queue or workflow. If a job in the MPS queue is complete or the workflow starts or stops, MPS sends a notification to the specified contact.	NotifyConfig	Supported
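The conditional transcoding row in the table above describes a decision that can be sketched in a few lines. The following Python function is a hypothetical illustration of that decision for a single setting such as the video bitrate or the resolution. The `skip_when_lower` flag is invented for this example and does not correspond to a specific MPS parameter.

```python
def plan_output(input_value, requested_value, skip_when_lower=False):
    """Decide what to do when the input is lower than the requested output.

    Returns ("transcode", value_to_use) or ("skip", None).
    """
    if input_value >= requested_value:
        # The input is at least as high as the target: transcode as requested.
        return ("transcode", requested_value)
    if skip_when_lower:
        # No transcoding is performed.
        return ("skip", None)
    # Transcode in original quality instead of upscaling.
    return ("transcode", input_value)
```

For a 480p input and a 720p target, the sketch keeps 480p by default and skips the job when `skip_when_lower` is set, which avoids paying for output that cannot look better than the source.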
