This topic describes the billing items for Intelligent Media Management (IMM) and related details.
Billing Items for Alibaba Cloud International Website
Intelligent Media Management (IMM) includes the following billing items: image intelligence, metadata management, media processing, document processing, and file processing.
Starting at 11:00 UTC+8 on July 28, 2025, IMM will begin charging for some previously free features and will adjust prices for some existing billing items. For more information, refer to the IMM Pricing Adjustment Notice.
Image Intelligence
For detailed pricing of image intelligence billing items, refer to the following table.
Billing Item | Billing Item Description | Related API Operations | Related x-oss-process Operations | Price (USD) | Unit |
ImageDetect | Face detection | | | 0.028 | per 1,000 calls |
ImageDetect | Human body detection | DetectImageBodies (human body detection) | image/bodies | 0.028 | per 1,000 calls |
ImageDetect | Vehicle detection | DetectImageCars | image/cars | 0.028 | per 1,000 calls |
ImageLabel | Image tagging | DetectImageLabels (image label detection) | image/labels | 0.142 | per 1,000 calls |
ImageFace | Face images | CreateFacesSearchingTask (create a similar-face search task) | | 0.028 | per 1,000 calls |
ImageFaceClustering | Face clustering | | | 7.0754717 | per 1,000 calls |
GenerateStory | Story generation | CreateStory (create a story) | | 7.0754717 | per 1,000 calls |
ImageMosaic | Image mosaic | AddImageMosaic (add an image mosaic) | | 0.0074 | per 1,000 calls |
ImageCropping | Smart image cropping suggestions | DetectImageCropping (detect visually optimal cropping regions) | image/crop,g_auto | 0.1415094 | per 1,000 calls |
ImageQRCodes | QR code detection in images | DetectImageCodes (QR code detection) | image/codes | 0.1132075 | per 1,000 calls |
ImageSplicing | Image stitching | CreateImageSplicingTask (create an image stitching task) | | | per 1,000 calls |
ImageToPDF | Image-to-PDF conversion | CreateImageToPDFTask (create an image-to-PDF conversion task) | | 0.0074 | per 1,000 images |
ImageScoring | Image quality scoring | DetectImageScore (image quality scoring) | image/scoring | 0.0424528 | per 1,000 calls |
LocationDateClustering | Spatiotemporal clustering | CreateLocationDateClusteringTask (create a spatiotemporal clustering task) | | Free for a limited time | per 1,000 calls |
SimilarImageClustering | Image clustering | CreateSimilarImageClusteringTask (create a similar-image clustering task) | | Free for a limited time | per 1,000 calls |
Blindwatermark | Blind watermarking for images | | | 0.0990566 | per 1,000 calls |
ReverseGeocoding | Reverse geocoding | DetectMediaMeta (retrieve media file metadata). Note: Charged only when the media file contains geographic location information. | | 0.1415094 | per 1,000 calls |
ImageTexts | Optical character recognition (OCR) for images | DetectImageTexts (OCR for images) | | 7.0754717 | per 1,000 calls |
Metadata Management
For detailed pricing of metadata management billing items, refer to the following table.
Billing Item | Billing Item Description | Related API Operations | Related x-oss-process Operations | Price (USD) | Unit |
StandardQueryL0 | Basic queries | | task/get | 0.001 | per 1,000 calls |
StandardQueryL1 | Standard queries | | | 0.002 | per 1,000 calls |
StandardQueryL2 | Advanced queries | | | 0.074 | per 1,000 calls |
MediaMeta | Retrieve media information | | | 0.1415094 | per 1,000 calls |
SemanticAnalyze | Semantic analysis | SemanticQuery (natural language query) | | 0.52 | per 1,000 calls |
Media Processing
For detailed pricing of media processing billing items, refer to the following table.
Billing Item | Billing Item Description | Related API Operations | Related x-oss-process Operations | Price (USD) | Unit |
AudioCompress | Audio transcoding | CreateMediaConvertTask (create a media transcoding task) | | 0.0000141509 | per second of audio |
VideoCompressCopy | Container format conversion | CreateMediaConvertTask (create a media transcoding task) | | 0.00001433525 | per second of video |
VideoCompress264LD | H.264 LD transcoding (Note*) | CreateMediaConvertTask (create a media transcoding task) | | 0.0000509434 | per second of video |
VideoCompress264SD | H.264 SD transcoding (Note*) | CreateMediaConvertTask (create a media transcoding task) | | 0.0000707547 | per second of video |
VideoCompress264HD | H.264 HD transcoding (Note*) | CreateMediaConvertTask (create a media transcoding task) | | 0.0001273585 | per second of video |
VideoCompress2642K | H.264 2K transcoding (Note*) | CreateMediaConvertTask (create a media transcoding task) | | 0.0002830189 | per second of video |
VideoCompress2644K | H.264 4K transcoding (Note*) | CreateMediaConvertTask (create a media transcoding task) | | 0.0006367925 | per second of video |
VideoCompress265LD | H.265 LD transcoding (Note*) | CreateMediaConvertTask (create a media transcoding task) | | 0.0002122642 | per second of video |
VideoCompress265SD | H.265 SD transcoding (Note*) | CreateMediaConvertTask (create a media transcoding task) | | 0.0003537736 | per second of video |
VideoCompress265HD | H.265 HD transcoding (Note*) | CreateMediaConvertTask (create a media transcoding task) | | 0.0007075472 | per second of video |
VideoCompress2652K | H.265 2K transcoding (Note*) | CreateMediaConvertTask (create a media transcoding task) | | 0.0011320755 | per second of video |
VideoCompress2654K | H.265 4K transcoding (Note*) | CreateMediaConvertTask (create a media transcoding task) | | 0.0022641509 | per second of video |
MediaAnimation | Video animation | CreateMediaConvertTask (create a media transcoding task) | video/animation | | per 1,000 frames |
ExtractSubtitleText | Text subtitle extraction from videos | CreateMediaConvertTask (create a media transcoding task) | | 0.223 | Qianlu |
ExtractSubtitleImage | Image subtitle extraction from videos | CreateMediaConvertTask (create a media transcoding task) | | 0.015 | per 1,000 frames |
VideoFraming | Video snapshot | CreateMediaConvertTask (create a media transcoding task) | | 0.015 | per 1,000 frames |
VideoClassification | Video label detection | CreateVideoLabelClassificationTask (create a video label detection task) | | 7.0754717 | per 1,000 calls |
LiveTranscoding | Real-time transcoding (Note*) | GenerateVideoPlaylist (generate a real-time transcoding playlist) | | 0.0000141509 | per CountUnit |
Document Processing
For detailed pricing of document processing billing items, refer to the following table.
Projects created before December 1, 2023 for online preview and online editing are billed based on the number of times documents are opened. Projects created on or after December 1, 2023 are billed based on the number of API calls.
Billing Item | Billing Item Description | Related API Operations | Related x-oss-process Operations | Price (USD) | Unit |
DocumentConvert | Document conversion | CreateOfficeConversionTask (create a document conversion task) | | 11.3207547 | per 1,000 calls |
DocumentConvert | Document content extraction | ExtractDocumentText (extract document text) | | 11.3207547 | per 1,000 calls |
DocumentWebofficeEdit | Online editing (Weboffice) (Note*) | | doc/edit | 2.8301887 | per 1,000 calls |
DocumentWebofficePreview | Online preview (Weboffice) (Note*) | | doc/preview | 1.4150943 | per 1,000 calls |
DocumentWebofficeCachePreview | Cached preview (Weboffice) | | | 0.9905660 | per 1,000 calls. Important: This refers to the number of API calls. |
File Processing
For detailed pricing of file processing billing items, refer to the following table.
Billing Item | Billing Item Description | Related API Operations | Related x-oss-process Operations | Price (USD) | Unit |
PointCloudCompress | Point cloud compression | CreateCompressPointCloudTask (create a point cloud compression task) | pointcloud/compress | 0.03 | per 1,000 calls |
FileProcess | File packaging and download | CreateFileCompressionTask (create a file compression task) | | 0.00074 | GB |
FileProcess | Archive decompression | CreateFileUncompressionTask (create a decompression task) | | 0.00074 | GB |
FilePreview | Archive preview | CreateArchiveFileInspectionTask (create an archive preview and parsing task) | | 0.0074 | TB |
API operations with multiple billing items
The SemanticQuery operation incurs charges for both StandardQueryL2 and SemanticAnalyze.
The CreateFacesSearchingTask operation incurs charges for both ImageDetect and ImageFace.
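As a rough illustration of how these two rules combine with the per-1,000-call prices listed in the tables, the following Python sketch adds up the billing items for one operation. It assumes that charges scale linearly with the number of calls (this topic does not state how partial thousands are rounded). The names `PRICE_PER_1000`, `BILLING_ITEMS`, and `estimate_cost` are hypothetical helpers, not part of any IMM SDK.

```python
from decimal import Decimal

# Prices in USD per 1,000 calls, copied from the pricing tables in this topic.
PRICE_PER_1000 = {
    "StandardQueryL2": Decimal("0.074"),
    "SemanticAnalyze": Decimal("0.52"),
    "ImageDetect": Decimal("0.028"),
    "ImageFace": Decimal("0.028"),
}

# Operations that incur more than one billing item.
BILLING_ITEMS = {
    "SemanticQuery": ("StandardQueryL2", "SemanticAnalyze"),
    "CreateFacesSearchingTask": ("ImageDetect", "ImageFace"),
}


def estimate_cost(operation: str, calls: int) -> Decimal:
    """Estimate the USD cost of `calls` invocations of `operation`.

    Assumption: the cost is prorated linearly per call.
    """
    per_1000 = sum(
        (PRICE_PER_1000[item] for item in BILLING_ITEMS[operation]),
        Decimal(0),
    )
    return per_1000 * calls / 1000
```

Under these assumptions, 10,000 SemanticQuery calls come to 5.94 USD (0.074 + 0.52 per 1,000 calls), and 50,000 CreateFacesSearchingTask calls come to 2.8 USD.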
Video transcoding details
H.264 transcoding: Output video uses the H.264 encoder.
H.265 transcoding: Output video uses the H.265 encoder.
LD: Output resolution is less than or equal to 640 × 480.
SD: Output resolution is less than or equal to 1280 × 720.
HD: Output resolution is less than or equal to 1920 × 1080.
2K: Output resolution is less than or equal to 2560 × 1440.
4K: Output resolution is less than or equal to 3840 × 2160.
Video transcoding is billed per second of output video. Any fraction of a second is rounded up and billed as a full second.
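The tier rules and per-second rounding above can be sketched in Python as follows. This is a minimal illustration, not the billing system's actual logic: it assumes that "less than or equal to W × H" means both the width and the height fit within the bounds, and it does not handle portrait orientation. The function names `resolution_tier` and `transcode_cost` are hypothetical. The prices are copied from the media processing table.

```python
from decimal import Decimal
from math import ceil

# (tier, max width, max height), smallest first, from the list above.
TIERS = [
    ("LD", 640, 480),
    ("SD", 1280, 720),
    ("HD", 1920, 1080),
    ("2K", 2560, 1440),
    ("4K", 3840, 2160),
]

# USD per second of output video, from the media processing table.
PRICE_PER_SECOND = {
    ("h264", "LD"): Decimal("0.0000509434"),
    ("h264", "SD"): Decimal("0.0000707547"),
    ("h264", "HD"): Decimal("0.0001273585"),
    ("h264", "2K"): Decimal("0.0002830189"),
    ("h264", "4K"): Decimal("0.0006367925"),
    ("h265", "LD"): Decimal("0.0002122642"),
    ("h265", "SD"): Decimal("0.0003537736"),
    ("h265", "HD"): Decimal("0.0007075472"),
    ("h265", "2K"): Decimal("0.0011320755"),
    ("h265", "4K"): Decimal("0.0022641509"),
}


def resolution_tier(width: int, height: int) -> str:
    """Return the smallest tier whose bounds contain the output resolution.

    Assumption: both dimensions must fit within the tier's bounds.
    """
    for name, max_width, max_height in TIERS:
        if width <= max_width and height <= max_height:
            return name
    raise ValueError("resolution exceeds the 4K tier")


def transcode_cost(codec: str, width: int, height: int, duration_seconds: float) -> Decimal:
    """Estimate the USD cost, rounding the duration up to a full second."""
    billed_seconds = ceil(duration_seconds)
    return PRICE_PER_SECOND[(codec, resolution_tier(width, height))] * billed_seconds
```

For example, a 90.2-second H.264 output at 1280 × 720 falls into the SD tier and is billed for 91 seconds, which comes to 0.0064386777 USD under these assumptions.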
Document preview and editing billing details
For projects created before December 1, 2023, online editing and online preview are billed based on the number of times documents are opened, not the number of API calls.
Projects created on or after December 1, 2023 are billed based on the number of API calls. To switch to the new billing model, create a new project.
In the API call billing model, each API call can be used by only one user. If an API call is reused, only the last user retains access, and access permissions for other users are revoked.
If the Permission.Readonly parameter in the GenerateWebofficeToken operation is set to true, you are charged for document preview. If it is set to false, you are charged for online editing.
The RefreshWebofficeToken operation is billed based on the Permission.Readonly setting used when generating the original token. If Permission.Readonly was set to true, you are charged for document preview. Otherwise, you are charged for online editing.
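The two token rules above amount to a simple lookup, sketched below in Python. Mapping "document preview" to DocumentWebofficePreview and "online editing" to DocumentWebofficeEdit is an inference from the document processing table, and the function names are hypothetical.

```python
def weboffice_billing_item(readonly: bool) -> str:
    """Billing item for a GenerateWebofficeToken call.

    Assumption: preview maps to DocumentWebofficePreview and
    editing maps to DocumentWebofficeEdit.
    """
    return "DocumentWebofficePreview" if readonly else "DocumentWebofficeEdit"


def refresh_billing_item(original_readonly: bool) -> str:
    """Billing item for a RefreshWebofficeToken call.

    The refresh follows the Permission.Readonly value that was used
    when the original token was generated.
    """
    return weboffice_billing_item(original_readonly)
```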
Real-time transcoding billing details
Billing components:
When generating a playlist, you can control initial transcoding duration using the InitialTranscode parameter. This incurs LiveTranscoding charges.
During playback, if a .ts file has not been transcoded, a new transcoding job starts. This incurs LiveTranscoding charges.
Reading source video files from OSS and writing transcoded files back to OSS incurs fees. Reading video files from OSS for playback also incurs fees. For details about OSS-related fees, refer to the OSS billing items.
LiveTranscoding count formula:
Video
Codec `eff` values: h264 = 0.3, h265 = 1.8
Formula:
Ceiling(eff * Ceiling(Height/240) * Ceiling(Width/240) * Ceiling(FrameRate/30) + 1) * Ceiling(VideoStreamDuration)
Audio
`eff` value: 0.3
Formula:
Ceiling(eff * Ceiling(AudioStreamDuration))
Billing rule: Each audio or video stream specified in TargetVideo.Stream or TargetAudio.Stream is processed separately and billed individually. The following examples illustrate real-time transcoding costs.
Example 1 (playlist generated only; no playback, so no LiveTranscoding charges):
A user calls GenerateVideoPlaylist. The output video length is 38 minutes, with a resolution of 800 × 600, a frame rate of 30, and an h264 video codec. The initial transcoding duration is 0 seconds, and TranscodeAhead uses the default value. No video is played.
Example 2 (playlist generated only; pre-transcoding configured; only pre-transcoding incurs LiveTranscoding charges):
A user calls GenerateVideoPlaylist. The output video length is 38 minutes, with a resolution of 800 × 600, a frame rate of 30, and an h264 video codec. The initial transcoding duration is 30 seconds, and TranscodeAhead uses the default value. No video is played.
Charges:
LiveTranscoding (CU count formula):
Ceiling(0.3 * Ceiling(600/240) * Ceiling(800/240) * Ceiling(30/30) + 1) * Ceiling(30) + Ceiling(0.3 * Ceiling(30)) = 159 (CountUnit)
Example 3 (playlist generated and partially played; only played segments incur LiveTranscoding charges):
A user calls GenerateVideoPlaylist. The output video length is 38 minutes, with a resolution of 800 × 600, a frame rate of 30, and an h264 video codec. The initial transcoding duration is 0 seconds, and TranscodeAhead uses the default value. The user plays the m3u8 playlist from the start, stops at minute 5 (default forward transcoding is 2 minutes), then jumps to minute 15 and plays to the end.
Charges:
LiveTranscoding (CU count formula):
Ceiling(0.3 * Ceiling(600/240) * Ceiling(800/240) * Ceiling(30/30) + 1) * (Ceiling((5+2)*60) + Ceiling((38-15)*60)) + Ceiling(0.3 * Ceiling((5+2)*60)) + Ceiling(0.3 * Ceiling((38-15)*60)) = 9540 (CountUnit)
Example 4 (multiple users play the same video; repeated playback incurs LiveTranscoding charges only once):
A user calls GenerateVideoPlaylist. The output video length is 38 minutes, with a resolution of 800 × 600, a frame rate of 30, and an h264 video codec. The initial transcoding duration is 0 seconds, and TranscodeAhead uses the default value.
User A plays the m3u8 playlist from the start to minute 5, then exits.
User B plays the m3u8 playlist from minute 15 to the end.
User C plays the m3u8 playlist from start to finish.
Charges:
LiveTranscoding (CU count formula):
Ceiling(0.3 * Ceiling(600/240) * Ceiling(800/240) * Ceiling(30/30) + 1) * Ceiling(38*60) + Ceiling(0.3 * Ceiling(38*60)) = 12084 (CountUnit)
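Examples 3 and 4 both depend on how many distinct seconds of the video end up being transcoded. One way to reason about this is to merge the played ranges, as in the Python sketch below. It assumes that each playback range is extended by the 2-minute forward transcoding mentioned in Example 3, that ranges are capped at the video length, and that .ts segment boundaries can be ignored. The function name `billable_seconds` is hypothetical.

```python
def billable_seconds(played, video_minutes, ahead_minutes=2):
    """Return the number of distinct seconds transcoded.

    `played` is a list of (start_minute, stop_minute) playback ranges.
    Assumption: each range is extended by `ahead_minutes` of forward
    transcoding, capped at the end of the video, and overlapping
    ranges are billed only once.
    """
    spans = sorted(
        (start, min(stop + ahead_minutes, video_minutes))
        for start, stop in played
    )
    total_minutes = 0
    current_start = current_end = None
    for start, end in spans:
        if current_end is None or start > current_end:
            # A gap: close the previous merged range and start a new one.
            if current_end is not None:
                total_minutes += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlap: extend the current merged range.
            current_end = max(current_end, end)
    if current_end is not None:
        total_minutes += current_end - current_start
    return total_minutes * 60
```

With the ranges from Example 3, `billable_seconds([(0, 5), (15, 38)], 38)` gives 1800 seconds. With the three users from Example 4, `billable_seconds([(0, 5), (15, 38), (0, 38)], 38)` gives 2280 seconds, because the full playback by User C covers the other two ranges.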
Term definitions:
Width: The width of the output video resolution.
Height: The height of the output video resolution.
FrameRate: The video frame rate.
VideoStreamDuration: The duration of the video stream, in seconds.
AudioStreamDuration: The duration of the audio stream, in seconds.
`eff`: The CU coefficient.
Ceiling(x) function: Returns the smallest integer greater than or equal to x.
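The formulas and term definitions above can be checked with a short Python sketch that reproduces the CountUnit totals from Examples 2 to 4. The function names are hypothetical. `Fraction` is used only to keep the arithmetic exact, and each transcoded range is assumed to carry one video stream and one audio stream, as in the examples.

```python
from fractions import Fraction
from math import ceil

# CU coefficients from the formula section.
VIDEO_EFF = {"h264": Fraction(3, 10), "h265": Fraction(18, 10)}
AUDIO_EFF = Fraction(3, 10)


def video_cu(codec, width, height, frame_rate, duration_s):
    """CountUnits for one video stream."""
    per_second = ceil(
        VIDEO_EFF[codec]
        * ceil(Fraction(height) / 240)
        * ceil(Fraction(width) / 240)
        * ceil(Fraction(frame_rate) / 30)
        + 1
    )
    return per_second * ceil(duration_s)


def audio_cu(duration_s):
    """CountUnits for one audio stream."""
    return ceil(AUDIO_EFF * ceil(duration_s))


def range_cu(codec, width, height, frame_rate, duration_s):
    """CountUnits for one transcoded range with one video and one audio stream."""
    return video_cu(codec, width, height, frame_rate, duration_s) + audio_cu(duration_s)


# Example 2: 30 seconds of initial transcoding.
example_2 = range_cu("h264", 800, 600, 30, 30)
# Example 3: two ranges, (5+2) minutes and (38-15) minutes.
example_3 = range_cu("h264", 800, 600, 30, (5 + 2) * 60) + range_cu("h264", 800, 600, 30, (38 - 15) * 60)
# Example 4: the whole 38-minute video, billed once.
example_4 = range_cu("h264", 800, 600, 30, 38 * 60)
```

For the 800 × 600, 30 fps, h264 stream in the examples, the per-second video factor is Ceiling(0.3 × 3 × 4 × 1 + 1) = 5, so the three totals are 159, 9540, and 12084 CountUnits.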
Operator and billing item mapping
When you create metadata indexes by binding an OSS bucket or by calling IndexFileMeta or BatchIndexFileMeta, running the operators listed in the workflow template and operator mapping incurs data processing fees, index storage fees, and OSS request fees. OSS request fees are charged by OSS. For more information, refer to OSS request fees. The following table maps operators to billing items:
Operator | Billing Item | Billed by |
OSSMeta operator | OSS | |
MIME operator | No charge | N/A |
FaceDetection operator | ImageFace (Note*) | IMM |
LabelClassification operator (image) | ImageClassification (Note*) | IMM |
LabelClassification operator (video) | VideoClassification | IMM |
ImageScoring operator | ImageScoring (Note*) | IMM |
ReGEO operator | ReverseGeocoding | IMM |
MediaMeta operator | MediaMeta | IMM |
EXIF operator | OSS | |
ExtractDocumentText operator | DocumentConvert | IMM |
ExtractImageEmbeddings operator | Free for a limited time | IMM |
To process various image formats, IMM uses the image processing feature of Object Storage Service (OSS) to perform one or more operations, such as format conversion or image scaling. These operations incur fees charged by OSS. For more information about these fees, refer to Data processing fees.
External request fees
Accessing OSS through Intelligent Media Management (IMM) incurs OSS request fees. For more information, refer to Request fees.