Intelligent Media Management: Video label detection

Last Updated: Mar 01, 2026

Video label detection analyzes video content and returns structured labels that describe scenes, events, and objects. Use these labels to classify, search, and recommend videos in your applications.

Use cases

  • Video classification: Automatically categorize videos into topics such as news, entertainment, games, technology, food, sports, travel, animation, dance, music, film and television, and automobiles.

  • Video retrieval: Build a searchable video library by indexing labels. For example, retrieve all videos that contain outdoor scenes or specific objects.

  • Personalized recommendation: Match content labels extracted from videos with user preference labels to deliver targeted video recommendations.
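As an illustration of the retrieval use case, the following minimal sketch builds an inverted index from label names to video IDs. The `library` data, the video IDs, and the 0.8 confidence cutoff are hypothetical; only the `LabelName` and `LabelConfidence` field names come from the API response described later on this page.

```python
from collections import defaultdict

def build_label_index(video_labels, min_confidence=0.8):
    """Map each label name to the set of video IDs that carry it."""
    index = defaultdict(set)
    for video_id, labels in video_labels.items():
        for label in labels:
            # Skip low-confidence labels so they do not pollute search results.
            if label["LabelConfidence"] >= min_confidence:
                index[label["LabelName"]].add(video_id)
    return index

# Hypothetical library: video IDs mapped to labels shaped like the API output.
library = {
    "trip.mp4": [
        {"LabelName": "Natural landscape", "LabelConfidence": 0.897},
        {"LabelName": "Moon", "LabelConfidence": 0.859},
    ],
    "review.mp4": [
        {"LabelName": "Car", "LabelConfidence": 0.93},
        {"LabelName": "Natural landscape", "LabelConfidence": 0.41},
    ],
}

index = build_label_index(library)
print(sorted(index["Natural landscape"]))  # ['trip.mp4']
```

The same index could serve recommendation: intersecting a user's preferred label names with the index keys yields candidate videos.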

How it works

  1. Upload a video to an OSS bucket.

  2. Call CreateVideoLabelClassificationTask to create an asynchronous label detection task.

  3. After the task completes, call GetVideoLabelClassificationResult to retrieve the detected labels.
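The create-then-retrieve flow above can be sketched as a polling loop. This is a sketch under stated assumptions, not SDK code: `get_result` is a hypothetical caller-supplied wrapper around GetVideoLabelClassificationResult that returns the response as a dict, and the "Running" and "Failed" status values are assumptions (only "Succeeded" appears in the sample response on this page).

```python
import time

def wait_for_labels(get_result, task_id, interval=5.0, timeout=600.0, sleep=time.sleep):
    """Poll get_result(task_id) until the task succeeds, then return its labels."""
    deadline = time.monotonic() + timeout
    while True:
        result = get_result(task_id)
        status = result.get("Status")
        if status == "Succeeded":
            return result.get("Labels", [])
        if status == "Failed":  # assumed failure status; check the API reference
            raise RuntimeError(f"task {task_id} failed: {result.get('Message')}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still {status!r} after {timeout}s")
        sleep(interval)

# Stand-in for a real API wrapper: reports "Running" twice, then succeeds.
_responses = iter([
    {"Status": "Running"},
    {"Status": "Running"},
    {"Status": "Succeeded", "Labels": [{"LabelName": "Moon", "LabelLevel": 3}]},
])

labels = wait_for_labels(
    lambda task_id: next(_responses),
    "VideoLabelClassification-example",  # hypothetical task ID
    sleep=lambda seconds: None,          # skip real waiting in this demo
)
print(labels)
```

Passing the API call in as a function keeps the loop independent of any particular SDK version and makes it easy to test with canned responses.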

Important

Task information is retained for only seven days after the task starts. After this period, the results are no longer available through the API.
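To plan around the retention window, a client can compute when results expire from the task's StartTime. The sketch below assumes that StartTime is a UTC timestamp with fractional seconds, in the form shown in the sample response on this page; the helper name `results_expire_at` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=7)  # retention period stated above

def results_expire_at(start_time: str) -> datetime:
    """Return the moment after which task results are no longer retrievable."""
    # Assumes a UTC timestamp with fractional seconds, e.g. 2022-08-22T05:01:17.572Z.
    started = datetime.strptime(start_time, "%Y-%m-%dT%H:%M:%S.%fZ")
    return started.replace(tzinfo=timezone.utc) + RETENTION

expires = results_expire_at("2022-08-22T05:01:17.572Z")
print(expires.isoformat())  # 2022-08-29T05:01:17.572000+00:00
```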

Prerequisites

Before you begin, make sure that you have:

Note

You can also create and manage projects programmatically. Call CreateProject to create a project, or ListProjects to query existing projects in a region.

Track task status

In addition to calling GetVideoLabelClassificationResult, you can use the following methods to track task progress:

Method                      Description
--------------------------  -----------
API polling                 Call GetTask or ListTasks to query task details.
Simple Message Queue (SMQ)  Subscribe to task notifications in the same region as your IMM project. For more information, see Asynchronous message examples and Receive and delete the message.
ApsaraMQ for RocketMQ 4.0   Create a RocketMQ instance, topic, and group in the same region to receive task notifications. For more information, see Asynchronous message examples and Send and subscribe to normal messages.
EventBridge                 Receive task completion events through EventBridge. For more information, see IMM events.

Response structure

A successful task returns a Labels array containing hierarchical labels organized in up to three levels. Each label includes a confidence score and a centricity score.

Label fields

Field            Type     Description
---------------  -------  -----------
LabelName        String   The name of the detected label, such as "Natural landscape" or "Car".
LabelConfidence  Float    The probability that the label is correct. Range: 0 to 1. Higher values indicate greater confidence.
CentricScore     Float    How central or prominent the labeled content is in the video. Range: 0 to 1. Higher values mean the content is more prominent.
LabelLevel       Integer  The position in the label hierarchy. 1 = top-level category, 2 = mid-level subcategory, 3 = leaf-level detail.
ParentLabelName  String   The name of the parent label. Empty for top-level labels (level 1).
Language         String   The language of the label name, such as zh-Hans.

Label hierarchy

Labels are organized in a three-level hierarchy. Each child label references its parent through ParentLabelName:

Level 1 (category)      Level 2 (subcategory)     Level 3 (detail)
---------------------   ----------------------    -----------------
Tourism & geography  ->  Natural landscape     ->  Moon, Sky
Others               ->  Color                 ->  Blue, Green, Black, White
                     ->  Astronomical object
Daily necessities    ->  Text
                     ->  Letter
Virtual scene        ->  Web page
                     ->  Website
Artwork              ->  Illustration
Other scenes         ->  Mobile phone screenshot
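Because labels arrive as a flat array, a client has to follow ParentLabelName to recover the hierarchy. The following minimal sketch walks those references upward. It assumes label names are unique within one response, and the helper name `label_path` is hypothetical. A parent that is named but absent from the array (possible in an abbreviated response) still appears in the path.

```python
def label_path(labels, name):
    """Return the chain of label names from the highest known ancestor down to name."""
    parent_of = {l["LabelName"]: l.get("ParentLabelName", "") for l in labels}
    path = [name]
    # Walk upward until the parent is empty or unknown; stop on a cycle.
    while parent_of.get(path[0]) and parent_of[path[0]] not in path:
        path.insert(0, parent_of[path[0]])
    return path

# Subset of the fields returned for each label.
labels = [
    {"LabelName": "Color", "LabelLevel": 2, "ParentLabelName": "Others"},
    {"LabelName": "Others", "LabelLevel": 1, "ParentLabelName": ""},
    {"LabelName": "Blue", "LabelLevel": 3, "ParentLabelName": "Color"},
    {"LabelName": "Natural landscape", "LabelLevel": 2, "ParentLabelName": "Tourism & geography"},
    {"LabelName": "Moon", "LabelLevel": 3, "ParentLabelName": "Natural landscape"},
]

print(" -> ".join(label_path(labels, "Blue")))  # Others -> Color -> Blue
print(" -> ".join(label_path(labels, "Moon")))  # Tourism & geography -> Natural landscape -> Moon
```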

Sample response

{
    "ProjectName": "test-project",
    "RequestId": "D65E8038-C584-0809-9BF0-****",
    "StartTime": "2022-08-22T05:01:17.572Z",
    "EndTime": "2022-08-22T05:01:20.49Z",
    "TaskType": "VideoLabelClassification",
    "TaskId": "VideoLabelClassification-1b77de73-ff9f-4c39-b254-****",
    "Status": "Succeeded",
    "Labels": [
        {
            "Language": "zh-Hans",
            "LabelName": "Color",
            "LabelConfidence": 0.999,
            "CentricScore": 0.77,
            "LabelLevel": 2,
            "ParentLabelName": "Others"
        },
        {
            "Language": "zh-Hans",
            "LabelName": "Others",
            "LabelConfidence": 0.999,
            "CentricScore": 0.77,
            "LabelLevel": 1,
            "ParentLabelName": ""
        },
        {
            "Language": "zh-Hans",
            "LabelName": "Blue",
            "LabelConfidence": 1,
            "CentricScore": 0.716,
            "LabelLevel": 3,
            "ParentLabelName": "Color"
        },
        {
            "Language": "zh-Hans",
            "LabelName": "Natural landscape",
            "LabelConfidence": 0.897,
            "CentricScore": 0.801,
            "LabelLevel": 2,
            "ParentLabelName": "Tourism & geography"
        },
        {
            "Language": "zh-Hans",
            "LabelName": "Moon",
            "LabelConfidence": 0.859,
            "CentricScore": 0.756,
            "LabelLevel": 3,
            "ParentLabelName": "Natural landscape"
        }
    ]
}

Note

This sample is abbreviated. A full response typically contains more labels. The response may also include UserData if set in the request, and Code and Message fields in case of errors.
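One way to consume such a response is to keep only labels that are both confident and prominent, then rank them by CentricScore. The thresholds below are arbitrary illustrations, not recommended values, and `prominent_labels` is a hypothetical helper. The data repeats the label fields from the sample response.

```python
def prominent_labels(response, min_confidence=0.85, min_centric=0.75, level=None):
    """Return labels that pass both thresholds, most prominent first."""
    kept = [
        label for label in response.get("Labels", [])
        if label["LabelConfidence"] >= min_confidence
        and label["CentricScore"] >= min_centric
        and (level is None or label["LabelLevel"] == level)
    ]
    return sorted(kept, key=lambda label: label["CentricScore"], reverse=True)

response = {
    "Status": "Succeeded",
    "Labels": [
        {"LabelName": "Color", "LabelConfidence": 0.999, "CentricScore": 0.77, "LabelLevel": 2},
        {"LabelName": "Others", "LabelConfidence": 0.999, "CentricScore": 0.77, "LabelLevel": 1},
        {"LabelName": "Blue", "LabelConfidence": 1, "CentricScore": 0.716, "LabelLevel": 3},
        {"LabelName": "Natural landscape", "LabelConfidence": 0.897, "CentricScore": 0.801, "LabelLevel": 2},
        {"LabelName": "Moon", "LabelConfidence": 0.859, "CentricScore": 0.756, "LabelLevel": 3},
    ],
}

names = [label["LabelName"] for label in prominent_labels(response, level=2)]
print(names)  # ['Natural landscape', 'Color']
```

Here "Blue" is dropped despite a confidence of 1 because its CentricScore of 0.716 falls below the cutoff, which shows why the two scores are worth checking separately.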

Supported video formats

Video label detection supports the following 22 formats: AVI, MPEG, MPG, DAT, DIVX, XVID, RM, RMVB, MOV, QT, ASF, WMV, VOB, 3GP, MP4, FLV, AVS, MKV, TS, OGM, NSV, and SWF.
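A client can screen files before creating a task. The sketch below checks a file extension against the list above. It assumes that the extension reflects the actual container format, which the service may or may not rely on; the helper name and the example URI are hypothetical.

```python
from pathlib import PurePosixPath

# The 22 formats listed above.
SUPPORTED_FORMATS = {
    "AVI", "MPEG", "MPG", "DAT", "DIVX", "XVID", "RM", "RMVB", "MOV", "QT", "ASF",
    "WMV", "VOB", "3GP", "MP4", "FLV", "AVS", "MKV", "TS", "OGM", "NSV", "SWF",
}

def has_supported_extension(uri: str) -> bool:
    """Return True if the final extension of uri is in the supported list."""
    extension = PurePosixPath(uri).suffix.lstrip(".").upper()
    return extension in SUPPORTED_FORMATS

print(has_supported_extension("oss://examplebucket/videos/trip.MP4"))  # True
print(has_supported_extension("oss://examplebucket/videos/notes.txt"))  # False
```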

FAQ

Can I specify which labels to detect?

No. Video label detection uses a predefined label taxonomy. You cannot specify, include, or exclude individual labels.

What categories do labels fall into?

Labels are grouped into three broad categories:

  • Scenes: Natural landscapes (such as forests, beaches, and snow-capped mountains), living spaces (such as homes and restaurants), and disaster scenes.

  • Events: Talent shows, office activities, performances, and production processes.

  • Objects: Tableware, electronic products such as mobile phones and computers, furniture, and vehicles.