ApsaraVideo Media Processing: Functions and features

Last Updated: Mar 06, 2024

ApsaraVideo Media Processing (MPS) allows you to convert an audio or video file to one or more files to adapt to different network bandwidths, terminal processing capabilities, and user needs. MPS also performs multimodal analysis on the content, text, speech, and scenes of media files and offers various features, such as automated review, content recognition, and smart editing.

Audio and video transcoding

The audio and video transcoding feature allows you to convert the definition, encoding format, or container format of audio and video streams to adapt to different network bandwidths and playback devices. MPS supports mainstream encoding and container formats, and allows you to perform simple edit operations and add watermarks and captions during transcoding. The following table describes the specifications of the audio and video transcoding feature. To use specifications that are unavailable in the MPS console or API operations, contact technical support through your sales representative.

Important

To use the feature described in the following table, you must submit a transcoding job. The regular transcoding fee is charged based on the specifications and length of the output video. For more information, see Audio and video transcoding fees.

Item

Parameter

Description

Input file

Container format

  • Video: 3GP, AVI, FLV, MP4, M3U8, MPG, ASF, WMV, MKV, MOV, TS, WebM, MXF, and VOB.

  • Audio: AAC, FLAC, M4A, MP3, MP4, and OGG.

  • Caption: ASS, SSA, SRT, and VTT.

Video encoding format

Apple ProRes, AVS+, AVS, AVS2, H.263, H.263+, H.264/AVC, H.265/HEVC, H.266/VVC, MJPEG, MPEG-1, MPEG-2, MPEG-4, QuickTime, RealVideo, VP8, VP9, and WMV.

Audio encoding format

AAC, AC3, ADPCM, AMR, DSD, EAC3, MP1, MP2, MP3, PCM, RealAudio, Vorbis, and WMA.

File size

The maximum size is 100 GB.

Chroma

Examples: 4:2:2 and 4:2:0.

Output file

Container format

Note
  • A container format must be used along with specific encoding formats. For more information, see Supported formats. To use specifications that are unavailable in the MPS console or API operations, contact technical support through your sales representative.

  • If you only convert the container format of audio or video streams (container format conversion), the encoding format remains unchanged. MP4, HLS, and FLV are supported as the output formats for this type of conversion.

  • Video: HLS, DASH, CMAF, 3GP, AVI, FLV, F4V, fMP4, MKV, MOV, MP4, MPEG, TS, MXF, and WebM.

  • Animated image: GIF and WEBP.

  • Audio: AAC, M4A, MP2, MP3, MP4, OGG, FLAC, and WAV.

Encoding format

  • Video: H.263, H.264/AVC, H.265/HEVC, H.266/VVC, VP8, VP9, AV1, AVC-Intra, AVS2, MPEG-1, MPEG-2, MPEG-2 422, MPEG-4, and Apple ProRes.

  • Animated image: GIF and WEBP.

  • Audio: AAC, AC3, EAC3, MP2, MP3, FLAC, Opus, Vorbis, WMA, and pcm_s16le.

Encoding profile

  • The encoding profiles baseline, main, and high are supported for the H.264 video codec.

  • The encoding profiles aac_low, aac_he, aac_he_v2, aac_ld, and aac_eld are supported for the AAC audio codec.

Resolution

  • If the video codec is H.264, the width and height of the output video each range from 128 pixels to 4,096 pixels.

  • If the video codec is H.265, H.266, or AV1, the width and height of the output video each range from 128 pixels to 8,192 pixels.

Bitrate

  • If the video codec is H.264, the output bitrate ranges from 10 Kbit/s to 50,000 Kbit/s.

  • If the video codec is H.265, H.266, or AV1, the output bitrate ranges from 10 Kbit/s to 200,000 Kbit/s.

Frame rate

The maximum output frame rate is 60 frames per second (FPS).

Sampling bit depth

  • If the video codec is H.264, the bit depth is 8 bits.

  • If the video codec is H.265, the bit depth can be up to 12 bits.

Pixel format

Examples: yuv420p, yuvj420p, yuv422p, yuvj422p, yuv444p, and yuvj444p.

Bitrate control

Variable bitrate (VBR), constant bitrate (CBR), average bitrate (ABR), and constant rate factor (CRF).

Scan mode

The following scan modes are supported: same as the input video, automatic deinterlacing, interlaced scan, and progressive scan.
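The output limits above can be checked before a job is submitted. The following is a minimal sketch, not part of any MPS SDK: the class, dictionary, and function names are hypothetical, and it assumes that the resolution range applies to the width and the height separately.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodecLimits:
    max_side_px: int       # largest allowed output width or height, in pixels
    max_bitrate_kbps: int  # largest allowed output bitrate, in Kbit/s


# Values copied from the output specifications above.
MIN_SIDE_PX = 128
MIN_BITRATE_KBPS = 10
MAX_FPS = 60
LIMITS = {
    "H.264": CodecLimits(4096, 50_000),
    "H.265": CodecLimits(8192, 200_000),
    "H.266": CodecLimits(8192, 200_000),
    "AV1": CodecLimits(8192, 200_000),
}


def check_output_settings(codec, width, height, bitrate_kbps, fps):
    """Return a list of problems; an empty list means the settings fit the limits."""
    limits = LIMITS.get(codec)
    if limits is None:
        return [f"no limits recorded for codec {codec!r}"]
    problems = []
    # Assumption: the resolution range applies to width and height separately.
    for name, value in (("width", width), ("height", height)):
        if not MIN_SIDE_PX <= value <= limits.max_side_px:
            problems.append(
                f"{name} {value} px is outside {MIN_SIDE_PX}-{limits.max_side_px} px"
            )
    if not MIN_BITRATE_KBPS <= bitrate_kbps <= limits.max_bitrate_kbps:
        problems.append(
            f"bitrate {bitrate_kbps} Kbit/s is outside "
            f"{MIN_BITRATE_KBPS}-{limits.max_bitrate_kbps} Kbit/s"
        )
    if not 0 < fps <= MAX_FPS:
        problems.append(f"frame rate {fps} FPS exceeds the {MAX_FPS} FPS maximum")
    return problems


# An 8K output is rejected for H.264 but accepted for H.265.
print(check_output_settings("H.264", 7680, 4320, 60_000, 30))
print(check_output_settings("H.265", 7680, 4320, 60_000, 30))
```

A check of this kind only mirrors the documented ranges. The service remains the authority on whether a particular combination of container format and encoding format is accepted.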

Narrowband HD™

Narrowband HD™ is a media processing feature based on the transcoding technologies supported by Alibaba Cloud. This feature allows you to improve video compression efficiency and reduce file sizes without compromising the image quality. This way, you can reduce video stuttering during playback and save storage and traffic costs.

Important

To use the Narrowband HD™ feature described in the following table, you must select an appropriate transcoding template when you submit a transcoding job. The Narrowband HD™ transcoding fee is charged based on the specifications and length of the output video.

Feature

Description

Narrowband HD™ 1.0

MPS intelligently analyzes details such as scenes, actions, content, and textures in a video. This reduces the bitrate by 20% to 40% without changing the image quality, or improves the definition of videos under the same network bandwidth conditions. Supported codecs are H.264 and H.265. Other configuration items are the same as those of audio and video transcoding.
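To get a feel for what a 20% to 40% bitrate reduction means in practice, the arithmetic can be sketched as follows. The function names are hypothetical, the 4,000 Kbit/s one-hour stream is an illustrative input, and the actual reduction depends on the content of each video.

```python
def narrowband_hd_bitrate_range(original_kbps):
    """Return (lowest, highest) bitrate after a 40% and a 20% reduction."""
    lowest = original_kbps * (100 - 40) / 100
    highest = original_kbps * (100 - 20) / 100
    return lowest, highest


def stream_size_mb(bitrate_kbps, duration_s):
    """Approximate size in megabytes (1 MB = 1,000,000 bytes) of a constant-bitrate stream."""
    return bitrate_kbps * duration_s / 8 / 1000


low, high = narrowband_hd_bitrate_range(4000)
print(f"Bitrate after reduction: {low:g} to {high:g} Kbit/s")
print(f"One hour before: {stream_size_mb(4000, 3600):g} MB")
print(f"One hour after: {stream_size_mb(low, 3600):g} to {stream_size_mb(high, 3600):g} MB")
```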

Audio enhancement

ApsaraVideo Audio Lab provides full-scenario audio enhancement and repair solutions by combining signal processing and deep learning technologies.

Important

To use the audio enhancement features described in the following table, you must select an appropriate transcoding template when you submit a transcoding job. The audio enhancement fee is charged based on the specifications and length of the output audio. The video transcoding fee is charged based on the billing rules of the feature that you use. To configure a transcoding template for audio enhancement, search for and join the DingTalk group (ID 32171220) to contact Alibaba Cloud technical support.

Feature

Description

Sound enhancement

MPS supports sound enhancement for mono audio streams, binaural audio streams, and audio streams that use the 5.1 or 7.1 surround sound format. When you use earphones or speakers to play music, a speech, or a video, MPS provides a high-quality, natural, clear, and customizable sound effect.

Volume normalization

MPS intelligently normalizes the volume of videos. This resolves the issue of volume jumping between items from different content sources when short videos or music tracks are played back to back.

High-speed transcoding

MPS supports the high-speed transcoding feature to split a video into multiple segments and then transcode them in parallel. This increases the transcoding speed by 5 to 30 times and significantly reduces the processing duration. This feature is suitable for important content that requires high timeliness, such as news and events.

Important

To use this feature, you must enable an MPS queue for high-speed transcoding and submit a transcoding job to this MPS queue. The high-speed transcoding fee is charged based on the specifications, length, and transcoding speed of the output video. In addition, you are charged for the audio and video transcoding or audio-visual enhancement feature.

Item

Description

Speed boost

The transcoding speed can be boosted by 5 to 30 times depending on the properties of the input video, such as the format, resolution, and bitrate of the video. You can specify the expected speed boost for an MPS queue for high-speed transcoding, such as 5 times, 10 times, 20 times, or 30 times.

Scenarios

We recommend that you use high-speed transcoding for videos that are longer than 30 minutes, or videos that require high frame rates, ultra high definition, and audio-visual enhancement. For more information, see the Limits on high-speed transcoding section of the "Limits" topic.

Policy

Splitting is not supported for all videos. If you submit a video that is not supported by high-speed transcoding to an MPS queue for high-speed transcoding, the video is transcoded in the regular way by default.
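The split-and-parallelize idea behind high-speed transcoding can be illustrated with a small sketch. MPS performs the splitting and scheduling itself, so nothing below reflects its implementation: the segment length, the worker count, and the transcode_segment stand-in are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def plan_segments(duration_s, segment_s):
    """Split [0, duration_s) into consecutive (start, end) ranges of at most segment_s seconds."""
    if duration_s <= 0 or segment_s <= 0:
        raise ValueError("duration_s and segment_s must be positive")
    segments = []
    start = 0
    while start < duration_s:
        end = min(start + segment_s, duration_s)
        segments.append((start, end))
        start = end
    return segments


def transcode_segment(segment):
    """Hypothetical stand-in for real work: returns a label for the finished segment."""
    start, end = segment
    return f"segment[{start:g}-{end:g}]"


def transcode_in_parallel(duration_s, segment_s, workers=4):
    """Process all segments concurrently and return the results in playback order."""
    segments = plan_segments(duration_s, segment_s)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so the parts can be joined as returned.
        return list(pool.map(transcode_segment, segments))


print(transcode_in_parallel(100, 30))
```

The wall-clock gain from this pattern is bounded by the number of segments and workers, which is consistent with the speed boost depending on the properties of the input video.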

More features

Media information

MPS can obtain information about audio and video files that are stored in Object Storage Service (OSS), including the resolution, bitrate, frame rate, codec, and format of the files.

Important

You must call the SubmitMediaInfoJob operation to use this feature. You are charged based on the number of API requests. For more information, see the Pricing for API calls section of the "Audio and video transcoding fees" topic.
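The following is a minimal sketch of how the request parameters for this operation might be assembled. The layout of the Input field (Bucket, Location, Object) and the URL-encoding of the object key are assumptions that should be checked against the SubmitMediaInfoJob API reference. Request signing and sending are left to an SDK and are not shown. The function name and the bucket, region, and object values are placeholders.

```python
import json
from urllib.parse import quote


def build_media_info_request(bucket, location, object_key):
    """Assemble request parameters for SubmitMediaInfoJob (signing and sending are omitted)."""
    input_file = {
        "Bucket": bucket,      # OSS bucket that stores the media file
        "Location": location,  # OSS region of the bucket
        # Assumption: the object key is URL-encoded before it is placed in the Input JSON.
        "Object": quote(object_key, safe=""),
    }
    return {"Action": "SubmitMediaInfoJob", "Input": json.dumps(input_file)}


params = build_media_info_request("example-bucket", "oss-cn-hangzhou", "videos/my clip.mp4")
print(params["Input"])
```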

Video editing

You can perform simple edit operations on videos. For example, you can extract audio or videos, merge videos, clip videos, and mix audio.

Important

To use the video editing features described in the following table, you must configure relevant parameters when you submit a transcoding job. The transcoding fee is charged based on the specifications and length of the output video.

Feature

Description

Parameter of an API operation

MPS console

Audio extraction

This feature allows you to extract the audio stream from a video by disabling the video stream.

Remove

Supported

Video extraction

This feature allows you to extract the video stream from a video by disabling the audio stream.

Remove

Supported

Black bar removal

This feature allows you to detect whether black bars exist in a video. If black bars exist, the system automatically removes the black bars.

Crop

Not supported

Video cropping

This feature allows you to resize the video image, adjust the position of the resized image, and remove the gaps between the original image and the resized image.

Crop

Not supported

Black bar addition

This feature allows you to resize the video image, adjust the position of the resized image, and fill the gaps between the original image and the resized image by using black bars.

Pad

Not supported

Auto-rotate screen

This feature allows you to convert the resolution of a video based on the long and short sides instead of the width and height of the video. If the input videos include videos in landscape mode and portrait mode, we recommend that you enable this feature.

LongShortMode

Supported

Video rotation

This feature allows you to set the rotation angle of a video.

Rotate

Supported

Video merging

This feature allows you to merge up to 100 videos into one. You can set the start point in time and length of each video to be merged.

MergeList or MergeConfigUrl

Not supported

Video clipping

  • Video cutting: You can cut a video from a specific point in time to retain a specific length of the video.

  • Video trimming: You can trim a part of a specific length from the tail of a video.

Clip

Supported

Video head and tail

This feature allows you to add dynamic logos at the beginning of a video and specify the content for the video tail. This helps increase product recognition and highlight copyright information.

OpeningList and TailSlateList

Video tail addition is supported.

Blurring

This feature allows you to blur the specified area of a video.

DeWatermark

Not supported

Audio mixing

This feature allows you to merge two audio tracks into one. You can use this feature to add background music.

Amix

Not supported
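Two of the resizing behaviors in the table above, auto-rotate screen (LongShortMode) and black bar addition (Pad), come down to simple arithmetic. The sketch below shows that arithmetic under stated assumptions: the function names are hypothetical, the black bars are assumed to be centered, and sizes are rounded down to whole pixels. MPS may round or position the image differently.

```python
def output_size_long_short(input_width, input_height, long_side, short_side):
    """Apply long_side to whichever input dimension is longer, so portrait stays portrait."""
    if input_width >= input_height:
        return long_side, short_side
    return short_side, long_side


def letterbox(src_w, src_h, dst_w, dst_h):
    """Fit the source inside the target while keeping its aspect ratio.

    Returns (scaled_w, scaled_h, pad_x, pad_y), where pad_x and pad_y are the
    black bar sizes on each side, assuming the image is centered.
    """
    if src_w * dst_h >= dst_w * src_h:
        # The source is relatively wider than the target: fill the width, pad top and bottom.
        scaled_w, scaled_h = dst_w, src_h * dst_w // src_w
    else:
        # The source is relatively taller than the target: fill the height, pad left and right.
        scaled_w, scaled_h = src_w * dst_h // src_h, dst_h
    return scaled_w, scaled_h, (dst_w - scaled_w) // 2, (dst_h - scaled_h) // 2


# A landscape input and a portrait input with the same long and short side targets.
print(output_size_long_short(1920, 1080, 1280, 720))
print(output_size_long_short(1080, 1920, 1280, 720))
# A 1920x800 image placed on a 1920x1080 canvas.
print(letterbox(1920, 800, 1920, 1080))
```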

Video snapshot

You can use the video snapshot feature to take snapshots of a specific size at a specific point in time of a video. The snapshots are used for video thumbnails, sprites, and progress bar thumbnails.

Important

To use the video snapshot features described in the following table, you must submit a snapshot job. The snapshot fee is charged based on the number of snapshots.

Feature

Description

Parameter of an API operation

MPS console

Static snapshot

This feature allows you to take snapshots of specific sizes at specific points in time of a video in the JPG format. The following snapshot modes are provided:

  • Single: A snapshot is taken at a specific point in time of a video. This type of snapshot can be taken synchronously or asynchronously.

  • Multiple: Snapshots are taken at a specific interval from a specific point in time until a specific number of snapshots are taken or the video ends. You can specify the interval in units of seconds. This type of snapshot is taken asynchronously.

  • Average: A specific number of snapshots are taken from a specific point in time of a video to the end of the video at a regular interval. This type of snapshot is taken asynchronously.

  • Time point (in beta testing): Snapshots are taken at the specified points in time of a video. This type of snapshot is taken asynchronously.

SnapshotConfig

Supported

Sprite snapshot

This feature allows you to create a sprite by merging the snapshots that are taken into a single image based on specific rules. The sprites are in the JPG format. This type of snapshot is taken asynchronously. A client can obtain multiple thumbnails with a single request for one sprite, which greatly reduces the number of requests for images and improves client performance.

TileOut and TileOutputFile

Not supported

WebVTT snapshot

This feature allows you to generate VTT files for the snapshots that are taken or the sprites that are created. VTT files contain the time when snapshots are taken, paths of snapshots, and coordinates of snapshots in sprites. When a client requests an image, the image is displayed after the corresponding VTT file is obtained and parsed. This feature can be used to display thumbnails on the progress bar.

SubOut

Supported

Keyframe snapshot

This feature allows you to capture only keyframes. If the frame at the specified point in time is not a keyframe, the adjacent keyframe is captured.

FrameType

Supported

Black screen detection

This feature allows you to detect whether the first snapshot is a black screen. To use this feature, set the Time parameter to 0, which specifies that the snapshots are taken from the start of a video. You can define a black screen by specifying the portion of black pixels in an image and the color value of black pixels. If the black screen detection feature is enabled, the system checks the frames in the first 5 seconds of a video. If a non-black frame exists, the non-black frame is captured. Otherwise, the job fails if the job is a single-snapshot job, or the first black frame is captured if the job is a multi-snapshot job.

BlackLevel and PixelBlackThreshold

Supported
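The Multiple and Average modes of static snapshots differ only in how the capture times are derived. The following sketch computes those times under stated assumptions: the function names are hypothetical, and the handling of a capture time that falls exactly on the end of the video is a guess, because the table does not specify it.

```python
def multiple_mode_times(start_s, interval_s, count, duration_s):
    """Capture every interval_s seconds from start_s until count is reached or the video ends."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    times = []
    t = start_s
    # Assumption: a capture time equal to the video duration is treated as past the end.
    while len(times) < count and t < duration_s:
        times.append(t)
        t += interval_s
    return times


def average_mode_times(start_s, count, duration_s):
    """Spread count captures at a regular interval between start_s and the end of the video."""
    if count <= 0:
        raise ValueError("count must be positive")
    interval = (duration_s - start_s) / count
    return [start_s + i * interval for i in range(count)]


# Five snapshots are requested, but the 35-second video ends after the fourth.
print(multiple_mode_times(0, 10, 5, 35))
# Four snapshots spread evenly over a 60-second video.
print(average_mode_times(0, 4, 60))
```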

Video watermarking

This feature allows you to add visible watermarks, such as enterprise logos and TV station logos, to a video to highlight brands, protect copyrights, and raise product popularity.

Important

To use the video watermarking features described in the following table, you must submit a transcoding job and specify the watermark materials and watermark template. The watermark template is optional. The transcoding fee is charged based on the specifications and length of the output video. The watermarking fee is charged based on the number of watermarks.

Feature

Description

Parameter of an API operation

MPS console

Image watermark

  • You can add up to four watermarks to a video. You can set the start time, duration, position, and size of each watermark.

  • Images in the PNG format and animated images in the PNG, MOV, and GIF formats are supported.

  • You can use watermark templates to simplify the process.

WaterMarks

Supported

Text watermark

  • You can add up to four watermarks to a video. You can set the start time, duration, position, and size of each watermark.

  • You can set the text content and font effects such as the font size, font type, color, transparency, and border. For more information about supported fonts and colors, see Fonts and Color.

  • You cannot use watermark templates to add text watermarks.

WaterMarks

Not supported

Caption

This feature allows you to add captions to videos for better comprehension and appreciation.

Important

To use the caption feature described in the following table, you must submit a transcoding job or create a transcoding workflow and trigger the workflow. The transcoding fee is charged based on the specifications and length of the output video.

Feature

Description

Parameter of an API operation

MPS console

Caption packaging

You can integrate caption files and audio and video streams into a master playlist in the M3U8 or MPD format by using a packaging workflow. You can add up to four captions to a master playlist. This allows you to switch between captions of different versions. An HTTP Live Streaming (HLS) packaging workflow supports captions in the VTT format. A Dynamic Adaptive Streaming over HTTP (DASH) packaging workflow supports captions in the VTT, STL, and TTML formats.

  • HLS packaging: ExtXMedia

  • DASH packaging: InputConfig

Supported
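For HLS, a master playlist that offers several caption tracks has roughly the following shape. This sketch builds one with the standard EXT-X-MEDIA and EXT-X-STREAM-INF tags from the HLS specification. It does not show the MPS ExtXMedia parameter itself, and the function name, group ID, and URIs are placeholders.

```python
MAX_CAPTIONS = 4  # limit stated in the table above


def build_master_playlist(variants, captions):
    """Build an HLS master playlist.

    variants: list of (bandwidth_bps, uri) video streams.
    captions: list of (name, language, uri) caption playlists.
    """
    if len(captions) > MAX_CAPTIONS:
        raise ValueError(f"at most {MAX_CAPTIONS} captions are supported")
    lines = ["#EXTM3U"]
    for index, (name, language, uri) in enumerate(captions):
        # The first caption track is marked as the default selection.
        default = "YES" if index == 0 else "NO"
        lines.append(
            f'#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="{name}",'
            f'LANGUAGE="{language}",DEFAULT={default},AUTOSELECT=YES,URI="{uri}"'
        )
    for bandwidth, uri in variants:
        attrs = f"BANDWIDTH={bandwidth}"
        if captions:
            # Link every video stream to the caption group declared above.
            attrs += ',SUBTITLES="subs"'
        lines.append(f"#EXT-X-STREAM-INF:{attrs}")
        lines.append(uri)
    return "\n".join(lines) + "\n"


print(build_master_playlist(
    [(800_000, "video_low.m3u8"), (2_400_000, "video_high.m3u8")],
    [("English", "en", "captions_en.m3u8"), ("Chinese", "zh", "captions_zh.m3u8")],
))
```

A player that reads this playlist can switch between the listed caption tracks without reloading the video stream.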

Video packaging

Packaging indicates the process in which a master playlist is generated for multiple video streams at different bitrates, multiple caption streams, and multiple audio streams. The packaging feature allows you to perform the following operations during streaming media playback:

  • Adaptive streaming: supports automatic bitrate adjustment to ensure smooth playback.

  • Ad placement: supports video ad placement between segments.

Important

To use the video packaging features described in the following table, you must create a transcoding workflow and trigger the workflow. The transcoding fee is charged based on the specifications and length of the output video.

Feature

Description

Parameter of an API operation

MPS console

HLS packaging

HLS that supports secondary indexes is used for video packaging. HLS supports index files in the M3U8 format and video files in the TS format.

For more information, see Perform HLS packaging.

Supported

CMAF packaging

Common Media Application Format (CMAF) that supports the output format of HLS or DASH is used for packaging.

N/A

Not supported

Custom segment length

You can specify a maximum of 10 points in time at which you want to segment a video, and the length of segments. The segment length ranges from 1 second to 60 seconds. This feature allows you to adapt the media segment length to the network bandwidth of playback clients. This way, the loading time of the first frame is reduced.

Segment

Not supported

Video encryption

Important

To use the video encryption features described in the following table, you must create a transcoding workflow and trigger the workflow. The transcoding fee is charged based on the specifications and length of the output video.

Feature

Description

Parameter of an API operation

MPS console

HLS encryption

This feature allows you to encrypt a video based on the HLS AES-128 protocol by using a self-managed or Key Management Service (KMS) key. You can decrypt and play the video on a player that supports HLS streams. This ensures video security on mobile devices, and offers high-level security and excellent terminal compatibility.

N/A

Supported

Alibaba Cloud proprietary cryptography

This feature allows you to encrypt a video based on the Alibaba Cloud proprietary cryptography protocol and convert the video to an encrypted HLS format. Only KMS keys are supported. You must use ApsaraVideo Player to decrypt and play the video. Otherwise, you cannot play or transmit the video even if you download it to an on-premises device. This ensures video security on mobile devices and Flash players. This feature can be used in scenarios such as online education and subscription-based viewing, which require high-level security.

N/A

Supported
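With HLS AES-128 encryption, a player finds the decryption key through an EXT-X-KEY tag in the playlist. MPS generates the encrypted output itself, so the sketch below only illustrates the two pieces involved: a 16-byte key and the standard tag that points to it. The function names and the key URI are hypothetical, and the encryption of the media segments is not shown.

```python
import secrets


def new_aes128_key():
    """Generate a random 16-byte key, which is the key size that AES-128 requires."""
    return secrets.token_bytes(16)


def ext_x_key_tag(key_uri, iv=None):
    """Build the playlist tag that tells a player where to fetch the decryption key."""
    tag = f'#EXT-X-KEY:METHOD=AES-128,URI="{key_uri}"'
    if iv is not None:
        if len(iv) != 16:
            raise ValueError("the initialization vector must be 16 bytes")
        # The IV is written as a hexadecimal number with a 0x prefix.
        tag += f",IV=0x{iv.hex().upper()}"
    return tag


key = new_aes128_key()
print(len(key))
print(ext_x_key_tag("https://example.com/keys/video-1", iv=bytes(16)))
```

In a self-managed setup, the server behind the key URI decides which viewers receive the key, which is what makes the protection effective.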

Video AI

Automated review

This feature allows you to review the content in a media file, such as the title, overview, thumbnail, video, and audio. This way, you can efficiently detect prohibited content in a video. This feature can be used in multiple scenarios, such as short video review, live streaming review, and media review.

Important

To use the automated review feature described in the following table, you must submit a media review job. The automated review fee is charged based on the length of the processed video.

Feature

Content

Description

Content moderation

Pornography detection

Detects pornographic and sexually suggestive content across dimensions such as voice, text, and vision.

Terrorist content detection

Detects terrorist content across more than 10 dimensions, such as weapons, bloody scenes, specific costumes, smoke and light scenes, special symbols, crowds, and parades.

Ad violation detection

Detects different forms of ads, such as advertising text, watermarks, QR codes, illegal ads, and mini program codes.

Logo detection

Detects logos in a video or image, such as TV station logos, trademarks, and watermarks. This helps protect copyrights.

Undesirable content detection

Detects undesirable scenes in a video or image, such as picture-in-picture (PiP), smoking, live broadcasting while driving, and meaningless images.

Audio anti-spam

Detects illegal content in audio, such as pornography, terrorism, and abuse. Speech recognition is supported for Chinese and English.

Media fingerprinting

The media fingerprinting feature is implemented based on video recognition technologies developed by Alibaba Cloud. This feature uses a fingerprint to uniquely mark a media file, and allows you to extract and compare the fingerprints among media files. This helps detect duplicate videos and trace the source of video clips.

Important

To use the media fingerprinting features described in the following table, you must submit a media fingerprinting job. The media fingerprinting fee is charged based on the length of the processed video or audio.

Feature

Description

Video fingerprinting

You can use this feature to extract the fingerprints of videos, import and analyze video fingerprints in the fingerprint library, and search for similar videos.

Audio fingerprinting

You can use this feature to extract the fingerprints of audio, import and analyze audio fingerprints in the fingerprint library, and search for similar audio.

Image fingerprinting

You can use this feature to extract the fingerprints of images, import and analyze image fingerprints in the fingerprint library, and search for similar images.

Text fingerprinting

You can use this feature to extract the fingerprints of text, import and analyze text fingerprints in the fingerprint library, and search for similar text.
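The extract-and-compare workflow can be illustrated with a toy example. The algorithm that MPS uses is proprietary and is not described here, so the sketch substitutes a simple average hash with a Hamming distance comparison. It is a stand-in for the general idea only, and every name and threshold in it is hypothetical.

```python
def average_hash(gray_pixels):
    """Fingerprint a small grayscale image (rows of 0-255 values) as a tuple of bits.

    Each bit records whether a pixel is brighter than the image mean.
    """
    flat = [p for row in gray_pixels for p in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if p > mean else 0 for p in flat)


def hamming_distance(a, b):
    """Count the positions at which two equal-length fingerprints differ."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    return sum(x != y for x, y in zip(a, b))


def is_similar(a, b, max_distance=1):
    """Treat two fingerprints as a match if they differ in at most max_distance bits."""
    return hamming_distance(a, b) <= max_distance


original = average_hash([[10, 200], [220, 30]])
brighter = average_hash([[30, 220], [240, 50]])   # same picture, uniformly brighter
different = average_hash([[200, 10], [30, 220]])  # unrelated picture

print(is_similar(original, brighter))
print(is_similar(original, different))
```

Because the fingerprint depends on relative brightness rather than exact pixel values, a uniformly brightened copy still matches, which mirrors how fingerprinting can detect duplicates after re-encoding.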

Service management

Feature

Description

Parameter of an API operation

MPS console

Media management

You can upload, manage, and publish media files.

N/A

N/A

Workflow orchestration

MPS automatically runs a workflow in the cloud after an audio or video file is uploaded.

N/A

Supported

Transcoding template

A transcoding template is a collection of transcoding parameters. You can use a transcoding template to simplify the operations when you create a transcoding job or use a workflow. Transcoding templates can be classified into the following types: custom templates, customized templates, and preset templates.

TemplateId

Supported

Watermark template

A watermark template specifies the settings of multiple parameters, such as the parameters that determine the position and size of watermarks. You can use a watermark template to simplify the watermarking process.

WaterMarkTemplateId

Supported

Transcoding priority

You can specify the priority of transcoding jobs in an MPS queue. A maximum of 10 priority levels can be specified.

Priority

Not supported

Conditional transcoding

If the video bitrate, video resolution, or audio bitrate of an input video is less than the specified output settings, the video is transcoded in original quality or no transcoding is performed.

Examples: IsCheckReso and IsCheckResoFail

Supported

MPS queue

MPS jobs such as transcoding and asynchronous snapshot jobs are asynchronously processed. You must add the jobs to an MPS queue for scheduling and execution. You can create multiple MPS queues and specify the priority of jobs in an MPS queue. A maximum of 10 priority levels are supported.

Priority

Not supported

Message notification

MPS jobs such as transcoding and asynchronous snapshot jobs are asynchronously processed. You can integrate Message Service (MNS) to associate an MNS topic or queue with an MPS queue or workflow. If a job in the MPS queue is complete or the workflow starts or stops, MPS sends a notification to the specified contact.

NotifyConfig

Supported
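The conditional transcoding rule in the table above can be expressed as a small decision function. This is a sketch of the described behavior only: the function and argument names are hypothetical, and the way the two flags relate to parameters such as IsCheckReso and IsCheckResoFail is an assumption.

```python
def resolve_output_setting(input_value, target_value, check=True, fail_if_lower=False):
    """Decide which value to transcode with for one setting (resolution or bitrate).

    Returns the value to use, or None to signal that no transcoding is performed.
    """
    if not check or input_value >= target_value:
        # No check requested, or the input is at least as good as the target.
        return target_value
    if fail_if_lower:
        # The input is below the target and the job is configured not to transcode.
        return None
    # The input is below the target: keep the original quality instead of upscaling.
    return input_value


print(resolve_output_setting(2160, 1080))                      # downscale to the target
print(resolve_output_setting(720, 1080))                       # keep the original 720
print(resolve_output_setting(720, 1080, fail_if_lower=True))   # no transcoding
```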