Platform for AI: Multimedia analysis

Last Updated: Dec 10, 2025

Multimedia analysis provides algorithm-based services to analyze multimedia content. These services include foundation model services and advanced model services, which offer out-of-the-box algorithm capabilities. This topic describes the billing details and usage instructions for multimedia analysis.

Background information

Multimedia analysis supports the following algorithm services:

  • Foundation model services: These services provide out-of-the-box algorithm capabilities for images. They include model services such as multi-label image tagging, image quality assessment, facial attribute analysis (such as attractiveness, face shape, hairstyle, and hair color), age analysis, figure modification (slimming or plus-size), and watermark removal.

  • Advanced model services: These services provide out-of-the-box algorithm capabilities for videos. They include model services such as video classification and tagging, video quality assessment, dynamic classification and tagging for posts with images and videos (used for tagging multimodal content such as dynamic posts and threads), and AI-generated image tagging. The tags improve the training of AI image generation models.

Billing details

Multimedia analysis supports two billing methods: pay-as-you-go and subscription resource plans. For more information, see Billing details for multimedia analysis.

Usage guide

Activate multimedia analysis and purchase a resource plan

First-time users must activate the service in the Multimedia Analysis section, which is under Solutions on the Platform for AI (PAI) page. The procedure is as follows.

  1. Log on to the PAI console.

  2. Follow the instructions in the figure to activate the Multimedia Analysis service.

  3. The pay-as-you-go billing method is used by default. You are billed based on the number of calls.

You can also purchase a resource plan with a one-time payment for a lower price.

  1. On the Basic Model Service tab of the Multimedia Analysis page, click Purchase Resource Plan.

  2. On the Subscription Model Service page, configure the Quantity, Scenarios, API Calls, and Duration parameters, and then click Buy Now.

  3. To use multimedia analysis services, set the Scenarios parameter to Multimedia Analysis-Basic Model Service or Multimedia Analysis-Advanced Model Service. Configure the other parameters based on your business requirements.
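When choosing a value for the API Calls parameter, it can help to estimate how many calls of each type a workload consumes. The following is a minimal sketch, not an official calculator. It applies the per-call consumption listed in the capabilities matrix in this topic: one call per request, counted against either the foundation or the advanced model service, and N foundation calls per custom model request. The function name, the service labels, and the sample workload are hypothetical and are not SDK identifiers.

```python
# Which model service type each service consumes, following the
# capabilities matrix in this topic. The keys are illustrative labels only.
PLAN_TYPE = {
    "image_quality_assessment": "foundation",
    "facial_attribute_analysis": "foundation",
    "age_analysis": "foundation",
    "multi_label_image_tagging": "foundation",
    "figure_modification": "foundation",
    "watermark_removal": "foundation",
    "ai_generated_image_tagging": "foundation",
    "post_classification_and_tagging": "advanced",
    "video_quality_assessment": "advanced",
    "video_classification_and_tagging": "advanced",
}


def estimate_consumption(requests, custom_requests=0, custom_n=1):
    """Return the foundation and advanced calls that a workload consumes.

    requests: mapping of service label -> number of requests.
    custom_requests: number of requests sent to a custom model service.
    custom_n: foundation calls consumed per custom request (the "N" in the
        matrix; the real value depends on the custom model).
    """
    totals = {"foundation": 0, "advanced": 0}
    for service, count in requests.items():
        if count < 0:
            raise ValueError(f"negative request count for {service}")
        # Unknown labels raise KeyError; each request consumes exactly 1 call.
        totals[PLAN_TYPE[service]] += count
    totals["foundation"] += custom_requests * custom_n
    return totals


if __name__ == "__main__":
    # Hypothetical monthly workload.
    workload = {
        "image_quality_assessment": 1000,
        "video_quality_assessment": 200,
    }
    print(estimate_consumption(workload, custom_requests=100, custom_n=3))
```

The two totals can then be compared with the API Calls options offered for the Multimedia Analysis-Basic Model Service and Multimedia Analysis-Advanced Model Service scenarios.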

Python SDK instructions

After activating the multimedia analysis service, you can use the Python software development kit (SDK) to call various algorithm services. For more information, see Multimedia analysis: Python SDK instructions.

Java SDK instructions

After activating the multimedia analysis service, you can use the Java SDK to call the API operations of the algorithm services. For usage details, see the Java SDK repository on GitHub. The Java SDK parameters are almost identical to the Python SDK parameters. For parameter details, see Multimedia analysis: Python SDK instructions.

Multimedia analysis capabilities matrix

Specification: Foundation model service

Model service name: Image quality assessment

Consumption per service call: 1 foundation model service call

Description: Provides image quality assessment and returns a floating-point score from 0 to 100.

Example: "iqa_result":66.88

Model service name: Facial attribute analysis

Consumption per service call: 1 foundation model service call

Description:

  • Provides output for facial attributes, including face shape, hair color, hairstyle, and attractiveness.

  • Differentiates multiple faces based on the coordinates of the facial regions. If no face is detected, an empty array is returned.

Example:

  • Face shape: Triangle, Round, Heart, Square, Oval, Diamond, Long.

  • Female hairstyle:

    • Bangs type: Center-parted, Braided, Side-swept, No bangs, Wispy, Blunt.

    • Curl type: Cloud curls, Large waves, Small waves, Airy curls, Permed curls, Frizzy curls, Egg-roll curls.

    • Hairstyle: Curly, Updo, Straight, Ponytail, Braided.

    • Hair length: Medium, Short, Long.

  • Male hairstyle: Parted, Buzz cut, Crew cut, Flat top, Butch cut, Textured crop, Layered, Slicked back.

  • Hair color: Black, Coffee, Ash gray, Chestnut, Brown, Gradient, Burgundy, Gold, Yellow, Other.

  • Attractiveness: 0 to 5.

Model service name: Age analysis

Consumption per service call: 1 foundation model service call

Description:

  • Detects the age range of the main face in an image.

  • If there are multiple faces in the image, only the result for the largest face is returned. If no face is detected, an error is returned.

Example: Age ranges include the following: '0-2', '3-9', '10-19', '20-29', '30-39', '40-49', '50-59', '60-69', and '70+'.

Model service name: Multi-label image tagging

Consumption per service call: 1 foundation model service call

Description: Provides multi-label image tagging. It can output the top K tags with the highest probabilities and their corresponding high-dimensional features.

Example: Examples of frequent tags: girl, selfie, boy, daily life, screenshot, food, car, cuisine, game, cartoon, animal, Korean fashion.

Model service name: Figure modification

Consumption per service call: 1 foundation model service call

Description: Provides a figure modification feature. You can upload a portrait and adjust the figure by changing the degree parameter. This includes making the figure slimmer or larger. A degree > 0 indicates slimming.

Example: The API operation returns the Base64 encoding of the modified image.

Model service name: Watermark removal

Consumption per service call: 1 foundation model service call

Description: Removes watermarks from an image.

Example: The API operation returns the Base64 encoding of the image after watermark removal.

Model service name: AI-generated image tagging

Consumption per service call: 1 foundation model service call

Description: Provides multi-label image tagging capabilities for training AI image generation models, such as Stable Diffusion. Better tags improve the quality of the generated images.

Example:

  • Supported tagging models: WD14, BLIP, GIT, RAM.

  • Example caption result:

    "sensitive, 1girl, solo, long hair, looking at viewer, smile, black hair, brown eyes, scarf, lips, realistic".

Model service name: Custom model service

Consumption per service call: N foundation model service calls. The value of N varies based on the complexity of the custom model.

Description: Provides custom model services for images and videos.

Example: Depends on the specific type of custom model.

Specification: Advanced model service

Model service name: Dynamic classification and tagging for posts with images and videos

Consumption per service call: 1 advanced model service call

Description: Provides classification and tagging for dynamic posts or threads that contain multimodal content. Supports classification and tagging using text and image or text and video combinations. Also supports returning high-dimensional feature embeddings.

Example:

  • Examples of frequent classes: life, movies and TV shows, sports, travel, games, food, fitness.

  • Examples of frequent tags: sports, food, dance, fitness, cooking, travel, selfie.

  • Example embedding:

    0.915,0.882,0.943,0.978,1.027,1.181,1.066,1.029,0.866,0.716,0.628,1.203,0.689,0.533,0.734,1.038,0.98,0.613,0.96,0.88,0.586,0.702,1.515,0.697,0.987,0.699,1.179,4.274,0.757,0.89,0.805,0.901.

Model service name: Video quality assessment

Consumption per service call: 1 advanced model service call

Description: Provides short video quality assessment and returns a floating-point quality score from 0 to 100.

Example: "video_score":20.57

Model service name: Video classification and tagging

Consumption per service call: 1 advanced model service call

Description: Provides short video classification and tagging. Returns the video class and the top K tags with the highest probabilities. Also supports outputting high-dimensional video features.

Example:

  • Examples of frequent classes: life, knowledge, music, technology, games.

  • Examples of frequent tags: with captions, girl, social news, slimming and shaping, skits, movie clips, natural scenery.
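The example values above suggest a few common post-processing steps. The sketch below is an illustration under stated assumptions, not part of the SDK. It parses an age range label such as '20-29' or '70+' into numeric bounds, and decodes a Base64 image string (as returned by figure modification or watermark removal) into raw bytes. The helper names are hypothetical, and the sketch assumes that the Base64 string has already been extracted from the response. Confirm the actual response structure in Multimedia analysis: Python SDK instructions.

```python
import base64


def parse_age_range(label):
    """Convert an age range label to (low, high) bounds.

    '20-29' becomes (20, 29); the open-ended '70+' becomes (70, None).
    """
    label = label.strip()
    if label.endswith("+"):
        return int(label[:-1]), None
    low, high = label.split("-")
    return int(low), int(high)


def decode_image(b64_text):
    """Decode a Base64-encoded image string into raw bytes."""
    return base64.b64decode(b64_text)


if __name__ == "__main__":
    print(parse_age_range("20-29"))  # (20, 29)
    print(parse_age_range("70+"))    # (70, None)

    # Stand-in payload; a real response would carry encoded image data.
    sample = base64.b64encode(b"fake image bytes").decode("ascii")
    image_bytes = decode_image(sample)
    print(len(image_bytes))
```

The decoded bytes can be written to a file opened in binary mode once the image format of the response is known.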

Testing and service

For further testing or technical support, submit a ticket.