This topic describes the billable items of Intelligent Media Management (IMM) and includes important notes.
Pricing of billable items on Alibaba Cloud International Website
Intelligent Media Management (IMM) has billable items in the following categories: smart imaging, metadata management, media processing, document processing, and file processing.
Starting from 11:00 on July 28, 2025 (UTC+8), Intelligent Media Management (IMM) will charge for some features that are currently free and adjust the prices of some existing billable items. For more information, see IMM Billing Adjustment Announcement.
Smart imaging
The following table describes the pricing of the smart imaging billable items.
Billable item | Description | Related API operations | Related x-oss-process operations | Price before 11:00 on July 28, 2025 (USD) | Price after 11:00 on July 28, 2025 (USD) | Unit |
ImageDetect | Face detection | | | 0.028 | 0.028 | Per 1,000 calls |
| Body detection | DetectImageBodies (Body detection in images) | image/bodies | Free for a limited time | | |
| Vehicle detection | DetectImageCars | image/cars | Free for a limited time | | |
ImageLabel | Image tagging | DetectImageLabels (Image tag detection) | image/labels | 0.142 | 0.142 | Per 1,000 calls |
ImageFace | Face image | CreateFacesSearchingTask (Create a task to search for images with similar faces) | | Free for a limited time | 0.028 | Per 1,000 calls |
ImageFaceClustering | Face clustering | | | 7.0754717 | 7.0754717 | Per 1,000 calls |
GenerateStory | Story generation | CreateStory (Create a story) | | 7.0754717 | 7.0754717 | Per 1,000 calls |
ImageMosaic | Image mosaic | AddImageMosaic (Add a mosaic to an image) | | Free for a limited time | 0.0074 | Per 1,000 calls |
ImageCropping | Smart cropping suggestions for images | DetectImageCropping (Detect visually appealing crop boxes in an image) | image/crop,g_auto | 0.1415094 | 0.1415094 | Per 1,000 calls |
ImageQRCodes | QR code detection in images | DetectImageCodes (QR code detection in images) | image/codes | 0.1132075 | 0.1132075 | Per 1,000 calls |
ImageSplicing | Image splicing | CreateImageSplicingTask (Create an image splicing task) | | Free for a limited time | | Per 1,000 calls |
ImageToPDF | Image to PDF conversion | CreateImageToPDFTask (Create an image-to-PDF conversion task) | | Free for a limited time | 0.0074 | Per 1,000 images |
ImageScoring | Image quality scoring | DetectImageScore (Image quality scoring) | image/scoring | 0.0424528 | 0.0424528 | Per 1,000 calls |
LocationDateClustering | Spatiotemporal clustering | CreateLocationDateClusteringTask (Create a spatiotemporal clustering task) | | Free for a limited time | Free for a limited time | Per 1,000 calls |
SimilarImageClustering | Image clustering | CreateSimilarImageClusteringTask (Create a similar image clustering task) | | Free for a limited time | Free for a limited time | Per 1,000 calls |
Blindwatermark | Blind watermark for images | | | 0.0990566 | 0.0990566 | Per 1,000 calls |
ReverseGeocoding | Reverse geocoding | DetectMediaMeta (Get media file metadata). Note: Charged when the media file contains geographic location information. | | 0.1415094 | 0.1415094 | Per 1,000 calls |
ImageTexts | Image text recognition (OCR) | DetectImageTexts (Image text recognition) | | 7.0754717 | 7.0754717 | Per 1,000 calls |
Metadata management
The following table describes the pricing of the metadata management billable items.
Billable item | Description | Related API operations | Related x-oss-process operations | Price before 11:00 on July 28, 2025 (USD) | Price after 11:00 on July 28, 2025 (USD) | Unit |
StandardQueryL0 | Basic query | | task/get | 0.014 | 0.001 | Per 1,000 calls |
StandardQueryL1 | Standard query | | | 0.0283 | 0.002 | Per 1,000 calls |
StandardQueryL2 | Advanced query | | | 0.708 | 0.074 | Per 1,000 calls |
MediaMeta | Get media information | | | 0.1415094 | 0.1415094 | Per 1,000 calls |
SemanticAnalyze | Semantic analysis | SemanticQuery (Natural language query) | | Free for a limited time | 0.52 | Per 1,000 calls |
Media processing
The following table describes the pricing of the media processing billable items.
Billable item | Description | Related API operations | Related x-oss-process operations | Price before 11:00 on July 28, 2025 (USD) | Price after 11:00 on July 28, 2025 (USD) | Unit |
AudioCompress | Audio transcoding | CreateMediaConvertTask (Create a media transcoding task) | | 0.0000141509 | 0.0000141509 | Per second of audio |
VideoCompressCopy | Container format conversion | CreateMediaConvertTask (Create a media transcoding task) | | 0.0001415094 | 0.0001415094 | Per second of video |
VideoCompress264LD | H.264 transcoding - LD (see Notes on video transcoding) | CreateMediaConvertTask (Create a media transcoding task) | | 0.0000509434 | 0.0000509434 | Per second of video |
VideoCompress264SD | H.264 transcoding - SD (see Notes on video transcoding) | CreateMediaConvertTask (Create a media transcoding task) | | 0.0000707547 | 0.0000707547 | Per second of video |
VideoCompress264HD | H.264 transcoding - HD (see Notes on video transcoding) | CreateMediaConvertTask (Create a media transcoding task) | | 0.0001273585 | 0.0001273585 | Per second of video |
VideoCompress2642K | H.264 transcoding - 2K (see Notes on video transcoding) | CreateMediaConvertTask (Create a media transcoding task) | | 0.0002830189 | 0.0002830189 | Per second of video |
VideoCompress2644K | H.264 transcoding - 4K (see Notes on video transcoding) | CreateMediaConvertTask (Create a media transcoding task) | | 0.0006367925 | 0.0006367925 | Per second of video |
VideoCompress265LD | H.265 transcoding - LD (see Notes on video transcoding) | CreateMediaConvertTask (Create a media transcoding task) | | 0.0002122642 | 0.0002122642 | Per second of video |
VideoCompress265SD | H.265 transcoding - SD (see Notes on video transcoding) | CreateMediaConvertTask (Create a media transcoding task) | | 0.0003537736 | 0.0003537736 | Per second of video |
VideoCompress265HD | H.265 transcoding - HD (see Notes on video transcoding) | CreateMediaConvertTask (Create a media transcoding task) | | 0.0007075472 | 0.0007075472 | Per second of video |
VideoCompress2652K | H.265 transcoding - 2K (see Notes on video transcoding) | CreateMediaConvertTask (Create a media transcoding task) | | 0.0011320755 | 0.0011320755 | Per second of video |
VideoCompress2654K | H.265 transcoding - 4K (see Notes on video transcoding) | CreateMediaConvertTask (Create a media transcoding task) | | 0.0022641509 | 0.0022641509 | Per second of video |
MediaAnimation | Video to animated image conversion | CreateMediaConvertTask (Create a media transcoding task) | video/animation | Free for a limited time | | Per 1,000 frames |
ExtractSubtitleText | Video text caption extraction | CreateMediaConvertTask (Create a media transcoding task) | | Free for a limited time | 0.223 | Per 1,000 streams |
ExtractSubtitleImage | Video image caption extraction | CreateMediaConvertTask (Create a media transcoding task) | | Free for a limited time | 0.015 | Per 1,000 frames |
VideoFraming | Video snapshot | CreateMediaConvertTask (Create a media transcoding task) | | 0.142 | 0.015 | Per 1,000 frames |
VideoClassification | Video tag detection | CreateVideoLabelClassificationTask (Create a video label classification task) | | 7.0754717 | 7.0754717 | Per 1,000 calls |
LiveTranscoding | Transcoding during playback (see Notes on billing for transcoding during playback) | GenerateVideoPlaylist (Generate a playlist for transcoding during playback) | | 0.0000141509 | 0.0000141509 | Per CU |
Document processing
The following table describes the pricing of the document processing billable items.
For projects created before December 1, 2023, online preview and online editing are billed based on the number of times a document is opened. For projects created on or after December 1, 2023, these features are billed based on the number of API operation calls.
Billable item | Description | Related API operations | Related x-oss-process operations | Price before 11:00 on July 28, 2025 (USD) | Price after 11:00 on July 28, 2025 (USD) | Unit |
DocumentConvert | Document conversion | CreateOfficeConversionTask (Create a document conversion task) | | 11.3207547 | 11.3207547 | Per 1,000 calls |
| Document content extraction | ExtractDocumentText (Extract document text) | | | | |
DocumentWebofficeEdit | Online editing (Weboffice) (see Notes on billing for document preview and editing) | | doc/edit | 2.8301887 | 2.8301887 | Per 1,000 calls |
DocumentWebofficePreview | Online preview (Weboffice) (see Notes on billing for document preview and editing) | | doc/preview | 1.4150943 | 1.4150943 | Per 1,000 calls |
DocumentWebofficeCachePreview | Cached preview (Weboffice) | | | 0.9905660 | 0.9905660 | Per 1,000 calls. Important: This refers to the number of API operation calls. |
File processing
The following table describes the pricing of the file processing billable items.
Billable item | Description | Related API operations | Related x-oss-process operations | Price before 11:00 on July 28, 2025 (USD) | Price after 11:00 on July 28, 2025 (USD) | Unit |
PointCloudCompress | Point cloud compression | CreateCompressPointCloudTask (Create a point cloud compression task) | pointcloud/compress | Free for a limited time | 0.03 | Per 1,000 calls |
FileProcess | File packaging and download | CreateFileCompressionTask (Create a file compression task) | | Free for a limited time | 0.00074 | GB |
| Compressed package decompression | CreateFileUncompressionTask (Create a decompression task) | | Free for a limited time | | |
FilePreview | Compressed package preview | CreateArchiveFileInspectionTask (Create a compressed package preview and parsing task) | | Free for a limited time | 0.0074 | TB |
Notes on API operations that involve multiple billable items
The SemanticQuery API operation incurs fees for two billable items: StandardQueryL2 and SemanticAnalyze.
The CreateFacesSearchingTask API operation incurs fees for two billable items: ImageDetect and ImageFace.
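As a rough illustration of the two rules above, the following Python sketch sums the per-1,000-call prices of the billable items that each API operation triggers. The prices are the values listed for after 11:00 on July 28, 2025 in the tables in this topic. The dictionary names and helper functions are hypothetical and are not part of any IMM SDK, and the sketch assumes that charges scale linearly with the number of calls.

```python
from decimal import Decimal

# Prices after the adjustment (USD per 1,000 calls), copied from the tables in this topic.
PRICE_PER_1000 = {
    "StandardQueryL2": Decimal("0.074"),
    "SemanticAnalyze": Decimal("0.52"),
    "ImageDetect": Decimal("0.028"),
    "ImageFace": Decimal("0.028"),
}

# Billable items incurred by a single call to each API operation, per the notes above.
ITEMS_PER_OPERATION = {
    "SemanticQuery": ["StandardQueryL2", "SemanticAnalyze"],
    "CreateFacesSearchingTask": ["ImageDetect", "ImageFace"],
}


def cost_per_1000_calls(operation: str) -> Decimal:
    """Sum the prices of every billable item that the operation triggers."""
    return sum((PRICE_PER_1000[item] for item in ITEMS_PER_OPERATION[operation]), Decimal("0"))


def estimate_cost(operation: str, calls: int) -> Decimal:
    """Estimate the fee in USD, assuming linear scaling with the call count."""
    return cost_per_1000_calls(operation) * Decimal(calls) / Decimal(1000)


print(cost_per_1000_calls("SemanticQuery"))             # 0.594
print(cost_per_1000_calls("CreateFacesSearchingTask"))  # 0.056
```

Decimal arithmetic is used so that small unit prices are not distorted by binary floating-point rounding.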
Notes on video transcoding
H.264 transcoding: The output video uses the H.264 encoder.
H.265 transcoding: The output video uses the H.265 encoder.
LD: The resolution of the transcoded video is less than or equal to 640 × 480.
SD: The resolution of the transcoded video is less than or equal to 1280 × 720.
HD: The resolution of the transcoded video is less than or equal to 1920 × 1080.
2K: The resolution of the transcoded video is less than or equal to 2560 × 1440.
4K: The resolution of the transcoded video is less than or equal to 3840 × 2160.
Video transcoding is billed per second of video. The transcoding length is rounded up to the nearest second. Durations less than 1 second are billed as 1 second.
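The rounding rule above can be sketched in Python as follows. The unit prices are copied from the media processing table in this topic. The function names are hypothetical, and the sketch assumes that the input duration is positive and that you already know which billable item applies to your output resolution and codec.

```python
import math
from decimal import Decimal

# USD per second of video, copied from the pricing table (subset for illustration).
PRICE_PER_SECOND = {
    "VideoCompress264SD": Decimal("0.0000707547"),
    "VideoCompress264HD": Decimal("0.0001273585"),
    "VideoCompress265HD": Decimal("0.0007075472"),
}


def billable_seconds(duration_seconds: float) -> int:
    """Round the duration up to whole seconds; under 1 second bills as 1 second."""
    return max(1, math.ceil(duration_seconds))


def transcoding_cost(billable_item: str, duration_seconds: float) -> Decimal:
    """Estimated fee in USD for one transcoded video of the given duration."""
    return PRICE_PER_SECOND[billable_item] * billable_seconds(duration_seconds)


print(billable_seconds(0.4))    # 1
print(billable_seconds(90.2))   # 91
print(transcoding_cost("VideoCompress264HD", 90.2))  # 0.0115896235
```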
Notes on billing for document preview and editing
For projects created before December 1, 2023, online editing and online preview are billed based on the number of times a document is opened, not the number of API operation calls.
Projects created on or after December 1, 2023 are billed based on the number of API operation calls. To switch to the new billing method, you must create a new project.
In the billing mode based on the number of API operation calls, a single API call can be used by only one user. If the call is reused, only the last user can access the document. The access permissions of other users are revoked.
If the Permission.Readonly parameter in the GenerateWebofficeToken API operation is set to true, you are charged for document preview. If this parameter is set to false, you are charged for online editing.
The billing for RefreshWebofficeToken depends on the parameters used in the original GenerateWebofficeToken API call. If the Permission.Readonly parameter was set to true, you are charged for document preview. Otherwise, you are charged for online editing.
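The two rules above amount to a simple lookup, sketched below in Python. The mapping of "document preview" to DocumentWebofficePreview and of "online editing" to DocumentWebofficeEdit is an assumption based on the descriptions in the document processing table. The function names are hypothetical, and cached preview is not modeled.

```python
def generate_token_billable_item(readonly: bool) -> str:
    """Billable item for a GenerateWebofficeToken call, keyed on Permission.Readonly."""
    return "DocumentWebofficePreview" if readonly else "DocumentWebofficeEdit"


def refresh_token_billable_item(original_readonly: bool) -> str:
    """RefreshWebofficeToken follows the Permission.Readonly value of the original call."""
    return generate_token_billable_item(original_readonly)


print(generate_token_billable_item(True))    # DocumentWebofficePreview
print(refresh_token_billable_item(False))    # DocumentWebofficeEdit
```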
Notes on billing for transcoding during playback
Billing is based on the following components:
When you generate a playlist, you can set the InitialTranscode parameter to control the duration of the initial transcoding. This incurs LiveTranscoding fees.
When you play a video, if you play a TS file that has not been transcoded, a new transcoding task is triggered. This incurs LiveTranscoding fees.
During transcoding, fees are incurred for reading the source video file from OSS and writing the transcoded file to OSS. Fees are also incurred for reading the video file from OSS for playback. For more information about OSS-related fees, see OSS billable items.
Formula for calculating LiveTranscoding compute units (CUs):
Video
The value of the `eff` coefficient depends on the codec of the output video: 0.3 for h264 and 1.8 for h265.
The formula is as follows:
Ceiling(eff * Ceiling(Height/240) * Ceiling(Width/240) * Ceiling(FrameRate/30) + 1) * Ceiling(VideoStreamDuration)
Audio
The `eff` parameter value is 0.3.
The formula is as follows:
Ceiling(eff * Ceiling(AudioStreamDuration))
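The video and audio formulas can be expressed as a minimal Python sketch. The function and constant names are hypothetical. `Fraction` is used for `eff` only to keep 0.3 and 1.8 exact before rounding up, and durations are assumed to be in seconds.

```python
import math
from fractions import Fraction

# CU coefficients (eff) quoted in the text; Fraction keeps 0.3 and 1.8 exact.
VIDEO_EFF = {"h264": Fraction(3, 10), "h265": Fraction(18, 10)}
AUDIO_EFF = Fraction(3, 10)


def video_cus(codec, width, height, frame_rate, duration_seconds):
    """CUs for one video stream, following the video formula above."""
    cus_per_second = math.ceil(
        VIDEO_EFF[codec]
        * math.ceil(height / 240)
        * math.ceil(width / 240)
        * math.ceil(frame_rate / 30)
        + 1
    )
    return cus_per_second * math.ceil(duration_seconds)


def audio_cus(duration_seconds):
    """CUs for one audio stream, following the audio formula above."""
    return math.ceil(AUDIO_EFF * math.ceil(duration_seconds))


# 800 x 600, 30 fps, H.264, 30 seconds of video plus one audio stream.
total = video_cus("h264", 800, 600, 30, 30) + audio_cus(30)
print(total)  # 159
```

The printed value agrees with the 159 CUs reported for the 30-second pre-transcoding case in Example 2.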
Billing rules: Real-time processing of multiple video or audio streams is performed based on the settings of TargetVideo.Stream or TargetAudio.Stream. Each audio and video stream is billed separately. The following examples describe how real-time transcoding fees are calculated.
Example 1 (Only a playlist is generated. No transcoding-during-playback fees are incurred if the video is not played.):
A user calls GenerateVideoPlaylist. The output video is 38 minutes long. The resolution is 800 × 600, the frame rate is 30, and the video encoding format is H.264. The initial transcoding duration is 0 seconds. The default value is used for TranscodeAhead. The video is not played.
Example 2 (Only a playlist is generated. Pre-transcoding is configured. Only pre-transcoding fees are incurred if the video is not played.):
A user calls GenerateVideoPlaylist. The output video is 38 minutes long. The resolution is 800 × 600, the frame rate is 30, and the video encoding format is H.264. The initial transcoding duration is 30 seconds. The default value is used for TranscodeAhead. The video is not played.
Fees incurred:
LiveTranscoding (The number of CUs is calculated using the following formula):
Ceiling(0.3 * Ceiling(800/240) * Ceiling(600/240) * Ceiling(30/30) + 1) * Ceiling(30) + Ceiling(0.3 * Ceiling(30)) = 159 (CUs)
Example 3 (After a playlist is generated, a part of the video is played. Transcoding-during-playback fees are incurred only for the played part of the video.):
A user calls GenerateVideoPlaylist. The output video is 38 minutes long. The resolution is 800 × 600, the frame rate is 30, and the video encoding format is H.264. The initial transcoding duration is 0 seconds. The default value is used for TranscodeAhead. The user plays the video using the M3U8 file, starts playing from the beginning to the 5th minute (the video is transcoded 2 minutes ahead by default), and then jumps to the 15th minute and plays to the end of the video.
Fees incurred:
LiveTranscoding (The number of CUs is calculated using the following formula):
Ceiling(0.3 * Ceiling(800/240) * Ceiling(600/240) * Ceiling(30/30) + 1) * (Ceiling((5+2)*60) + Ceiling((38-15)*60)) + Ceiling(0.3 * Ceiling((5+2)*60)) + Ceiling(0.3 * Ceiling((38-15)*60)) = 9540 (CUs)
Example 4 (If multiple users play the video, transcoding-during-playback fees are incurred only once for the parts that are played repeatedly.):
A user calls GenerateVideoPlaylist. The output video is 38 minutes long. The resolution is 800 × 600, the frame rate is 30, and the video encoding format is H.264. The initial transcoding duration is 0 seconds. The default value is used for TranscodeAhead.
User A plays the video using the M3U8 file, starts playing from the beginning to the 5th minute, and then stops playing.
User B plays the video using the M3U8 file, starts playing from the 15th minute to the end.
User C plays the video using the M3U8 file from the beginning to the end.
Fees incurred:
LiveTranscoding (The number of CUs is calculated using the following formula):
Ceiling(0.3 * Ceiling(800/240) * Ceiling(600/240) * Ceiling(30/30) + 1) * Ceiling(38*60) + Ceiling(0.3 * Ceiling(38*60)) = 12084 (CUs)
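The "billed only once" behavior in Examples 3 and 4 can be modeled by merging the played time ranges before applying the CU formulas. The following Python sketch is an illustration under stated assumptions, not the service's actual accounting: the function names are hypothetical, ranges are given in seconds and already include the 2 minutes of transcode-ahead, and the video rate of 5 CUs per second is the value of Ceiling(0.3 * 4 * 3 * 1 + 1) for the 800 × 600, 30 fps, H.264 output used in the examples.

```python
import math
from fractions import Fraction


def merge_ranges(ranges):
    """Merge overlapping (start, end) ranges so repeated playback counts once."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def live_transcoding_cus(ranges, video_cus_per_second=5):
    """Sum video and audio CUs over each merged range."""
    audio_eff = Fraction(3, 10)
    total = 0
    for start, end in merge_ranges(ranges):
        seconds = math.ceil(end - start)
        total += video_cus_per_second * seconds + math.ceil(audio_eff * seconds)
    return total


# Example 3: one user plays 0 to 5 min (plus 2 min ahead), then 15 min to the end.
example_3 = [(0, 7 * 60), (15 * 60, 38 * 60)]
# Example 4: users A, B, and C together cover the whole 38-minute video.
example_4 = [(0, 7 * 60), (15 * 60, 38 * 60), (0, 38 * 60)]

print(live_transcoding_cus(example_3))  # 9540
print(live_transcoding_cus(example_4))  # 12084
```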
Terminology:
Width: The width of the output video resolution.
Height: The height of the output video resolution.
FrameRate: The video frame rate.
VideoStreamDuration: The length of the video stream.
AudioStreamDuration: The length of the audio stream.
eff: The CU coefficient.
Ceiling(x) function: Returns the smallest integer that is greater than or equal to x.
Mapping between operators and billable items
You incur fees when you create metadata indexes by attaching an OSS bucket or by calling the IndexFileMeta or BatchIndexFileMeta operation. Executing the operators described in Mappings between workflow templates and operators incurs data processing fees, index storage fees, and OSS request fees. OSS request fees are charged by OSS. For more information, see OSS Request fees. The following table shows the mappings between operators and billable items.
Operator | Billable item | Billed by |
OSSMeta operator | OSS | |
MIME operator | No charge | N/A |
FaceDetection operator | ImageFace (see the note below this table) | IMM |
LabelClassification operator (image) | ImageClassification (see the note below this table) | IMM |
LabelClassification operator (video) | VideoClassification | IMM |
ImageScoring operator | ImageScoring (see the note below this table) | IMM |
ReGEO operator | ReverseGeocoding | IMM |
MediaMeta operator | MediaMeta | IMM |
EXIF operator | OSS | |
ExtractDocumentText operator | DocumentConvert | IMM |
ExtractImageEmbeddings operator | Free for a limited time | IMM |
To process image files in various formats, IMM uses the image processing capabilities of Object Storage Service (OSS) to perform one or more operations, such as format conversion and image scaling. These operations incur fees that are charged by OSS. For more information about these fees, see Data processing fees.
External request fees
Accessing Object Storage Service (OSS) through Intelligent Media Management (IMM) incurs OSS request fees. For more information, see Request fees.