Video label detection analyzes video content and returns structured labels that describe scenes, events, and objects. Use these labels to classify, search, and recommend videos in your applications.
Use cases
Video classification: Automatically categorize videos into topics such as news, entertainment, gaming, technology, food, sports, travel, animation, dance, music, film and television, and automobiles.
Video retrieval: Build a searchable video library by indexing labels. For example, retrieve all videos that contain outdoor scenes or specific objects.
Personalized recommendation: Match content labels extracted from videos with user preference labels to deliver targeted video recommendations.
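The retrieval and recommendation use cases above can be sketched with plain Python data structures. This is a minimal illustration, not part of the IMM API: the video IDs, the label values, and the helper names `build_label_index` and `preference_score` are hypothetical, and only the `LabelName` and `LabelConfidence` keys come from the response format described in this topic.

```python
from collections import defaultdict


def build_label_index(video_labels):
    """Map each label name to the set of video IDs that carry it."""
    index = defaultdict(set)
    for video_id, labels in video_labels.items():
        for label in labels:
            index[label["LabelName"]].add(video_id)
    return index


def preference_score(labels, preferences):
    """Sum LabelConfidence times the user's weight for each matching label."""
    return sum(
        label["LabelConfidence"] * preferences.get(label["LabelName"], 0.0)
        for label in labels
    )


# Hypothetical library: IDs and label values are invented for illustration.
library = {
    "video-a": [{"LabelName": "Natural landscape", "LabelConfidence": 0.9}],
    "video-b": [{"LabelName": "Car", "LabelConfidence": 0.8}],
}

# Retrieval: look up every video that carries a given label.
index = build_label_index(library)
car_videos = sorted(index["Car"])

# Recommendation: rank videos by how well their labels match user preferences.
prefs = {"Natural landscape": 1.0}
ranked = sorted(
    library,
    key=lambda video_id: preference_score(library[video_id], prefs),
    reverse=True,
)
```

Weighting by confidence is one possible design choice; a production system would likely also consider label level and centricity.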
How it works
Upload a video to an OSS bucket.
Call CreateVideoLabelClassificationTask to create an asynchronous label detection task.
After the task completes, call GetVideoLabelClassificationResult to retrieve the detected labels.
Task information is retained for only seven days after the task starts. After this period, the results are no longer available through the API.
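Because results expire, it can help to record when a task's results stop being retrievable. The sketch below assumes that the retention window is exactly seven days from the task's StartTime and that timestamps use the UTC format with fractional seconds shown in the sample response; the helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the window is exactly seven days from StartTime.
RETENTION = timedelta(days=7)


def parse_utc(timestamp):
    """Parse a timestamp such as 2022-08-22T05:01:17.572Z as UTC.

    Assumes fractional seconds are always present, as in the sample response.
    """
    parsed = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


def result_expiry(start_time):
    """Return the moment after which results are assumed to be unavailable."""
    return parse_utc(start_time) + RETENTION


def is_result_available(start_time, now=None):
    """Return True while the retention window is still open."""
    now = now or datetime.now(timezone.utc)
    return now < result_expiry(start_time)
```

If you need the labels beyond this window, store them in your own database as soon as the task completes.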
Prerequisites
Before you begin, make sure that you have:
An AccessKey pair. For more information, see Create an AccessKey pair.
An OSS bucket with the video uploaded. For more information, see Upload objects.
IMM activated. For more information, see Activate IMM.
An IMM project created in the target region. For more information, see Create a project.
You can also create and manage projects programmatically. Call CreateProject to create a project, or ListProjects to query existing projects in a region.
Track task status
In addition to calling GetVideoLabelClassificationResult, you can use the following methods to track task progress:
Method | Description |
API polling | |
Simple Message Queue (SMQ) | Subscribe to task notifications in the same region as your IMM project. For more information, see Asynchronous message examples and Receive and delete the message. |
ApsaraMQ for RocketMQ 4.0 | Create a RocketMQ instance, topic, and group in the same region to receive task notifications. For more information, see Asynchronous message examples and Send and subscribe to normal messages. |
EventBridge | Receive task completion events through EventBridge. For more information, see IMM events. |
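For the polling approach, a loop with exponential backoff avoids calling the API too often. The sketch below is independent of any SDK: `fetch_result` is a hypothetical callable that you supply, which calls GetVideoLabelClassificationResult and returns the response as a dictionary. Only the "Succeeded" status appears in this topic; treating "Failed" as a terminal status is an assumption.

```python
import time


def wait_for_task(
    fetch_result,
    interval=2.0,
    max_interval=30.0,
    timeout=600.0,
    terminal_statuses=("Succeeded", "Failed"),  # "Failed" is an assumed value
    sleep=time.sleep,
):
    """Poll fetch_result() until Status is terminal or the timeout elapses.

    fetch_result is a caller-supplied function that returns the response
    of GetVideoLabelClassificationResult as a dict.
    """
    waited = 0.0
    while True:
        result = fetch_result()
        if result.get("Status") in terminal_statuses:
            return result
        if waited >= timeout:
            raise TimeoutError(f"task not finished after {waited:.0f} seconds")
        sleep(interval)
        waited += interval
        # Double the delay after each attempt, up to max_interval.
        interval = min(interval * 2, max_interval)
```

The `sleep` parameter exists so that the loop can be tested without real delays. For long-running or high-volume workloads, the notification-based methods in the table are usually a better fit than polling.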
Response structure
A successful task returns a Labels array containing hierarchical labels organized in up to three levels. Each label includes a confidence score and a centricity score.
Label fields
Field | Type | Description |
LabelName | String | The name of the detected label, such as "Natural landscape" or "Car". |
LabelConfidence | Float | The probability that the label is correct. Range: 0 to 1. Higher values indicate greater confidence. |
CentricScore | Float | How central or prominent the labeled content is in the video. Range: 0 to 1. Higher values mean the content is more prominent. |
LabelLevel | Integer | The position in the label hierarchy, from 1 (category) to 3 (detail). |
ParentLabelName | String | The name of the parent label. Empty for top-level labels (level 1). |
Language | String | The language of the label name, such as zh-Hans. |
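The two scores can be combined to keep only labels that are both reliable and prominent. The sketch below uses the field names from the sample response in this topic; the `Label` class, the helper names, and the threshold values 0.8 and 0.5 are illustrative assumptions, not recommended settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Label:
    name: str
    confidence: float
    centric_score: float
    level: int
    parent: str
    language: str


def parse_label(raw):
    """Convert one entry of the Labels array into a Label object."""
    return Label(
        name=raw["LabelName"],
        confidence=float(raw["LabelConfidence"]),
        centric_score=float(raw["CentricScore"]),
        level=int(raw["LabelLevel"]),
        parent=raw.get("ParentLabelName", ""),
        language=raw.get("Language", ""),
    )


def prominent_labels(raw_labels, min_confidence=0.8, min_centric=0.5):
    """Keep labels above both thresholds, most prominent first."""
    labels = [parse_label(raw) for raw in raw_labels]
    kept = [
        label
        for label in labels
        if label.confidence >= min_confidence
        and label.centric_score >= min_centric
    ]
    return sorted(kept, key=lambda label: label.centric_score, reverse=True)
```

Tune the thresholds against your own videos: a high confidence threshold reduces false labels, while a high centricity threshold drops background content.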
Label hierarchy
Labels are organized in a three-level hierarchy. Each child label references its parent through ParentLabelName:
Level 1 (category) Level 2 (subcategory) Level 3 (detail)
--------------------- ---------------------- -----------------
Tourism & geography -> Natural landscape -> Moon, Sky
Others -> Color -> Blue, Green, Black, White
-> Astronomical object
Daily necessities -> Text
-> Letter
Virtual scene -> Web page
-> Website
Artwork -> Illustration
Other scenes -> Mobile phone screenshot
Sample response
{
"ProjectName": "test-project",
"RequestId": "D65E8038-C584-0809-9BF0-****",
"StartTime": "2022-08-22T05:01:17.572Z",
"EndTime": "2022-08-22T05:01:20.49Z",
"TaskType": "VideoLabelClassification",
"TaskId": "VideoLabelClassification-1b77de73-ff9f-4c39-b254-****",
"Status": "Succeeded",
"Labels": [
{
"Language": "zh-Hans",
"LabelName": "Color",
"LabelConfidence": 0.999,
"CentricScore": 0.77,
"LabelLevel": 2,
"ParentLabelName": "Others"
},
{
"Language": "zh-Hans",
"LabelName": "Others",
"LabelConfidence": 0.999,
"CentricScore": 0.77,
"LabelLevel": 1,
"ParentLabelName": ""
},
{
"Language": "zh-Hans",
"LabelName": "Blue",
"LabelConfidence": 1,
"CentricScore": 0.716,
"LabelLevel": 3,
"ParentLabelName": "Color"
},
{
"Language": "zh-Hans",
"LabelName": "Natural landscape",
"LabelConfidence": 0.897,
"CentricScore": 0.801,
"LabelLevel": 2,
"ParentLabelName": "Tourism & geography"
},
{
"Language": "zh-Hans",
"LabelName": "Moon",
"LabelConfidence": 0.859,
"CentricScore": 0.756,
"LabelLevel": 3,
"ParentLabelName": "Natural landscape"
}
]
}
This sample is abbreviated. A full response typically contains more labels. The response may also include UserData if set in the request, and Code and Message fields in case of errors.
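Because each label carries only its parent's name, the full category path has to be rebuilt on the client side. The following sketch follows ParentLabelName upward for each label in the sample response. It assumes that label names are unique within a single response, which this topic does not state; the `build_paths` helper is hypothetical.

```python
def build_paths(labels):
    """Return a mapping from each label name to its full hierarchy path."""
    parent_of = {label["LabelName"]: label["ParentLabelName"] for label in labels}

    def path(name):
        parts = []
        seen = set()  # guards against an unexpected parent cycle
        while name and name not in seen:
            seen.add(name)
            parts.append(name)
            # A parent missing from the response is treated as a root.
            name = parent_of.get(name, "")
        return " > ".join(reversed(parts))

    return {name: path(name) for name in parent_of}


# Labels from the sample response, reduced to the fields used here.
sample_labels = [
    {"LabelName": "Color", "ParentLabelName": "Others"},
    {"LabelName": "Others", "ParentLabelName": ""},
    {"LabelName": "Blue", "ParentLabelName": "Color"},
    {"LabelName": "Natural landscape", "ParentLabelName": "Tourism & geography"},
    {"LabelName": "Moon", "ParentLabelName": "Natural landscape"},
]

paths = build_paths(sample_labels)
print(paths["Blue"])  # Others > Color > Blue
print(paths["Moon"])  # Tourism & geography > Natural landscape > Moon
```

The path for "Moon" still resolves even though "Tourism & geography" does not appear as its own entry in the abbreviated sample, because a parent name is kept in the path before the lookup ends.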
Supported video formats
Video label detection supports the following 22 formats: AVI, MPEG, MPG, DAT, DIVX, XVID, RM, RMVB, MOV, QT, ASF, WMV, VOB, 3GP, MP4, FLV, AVS, MKV, TS, OGM, NSV, and SWF.
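A quick client-side check can reject obviously unsupported files before a task is created. The sketch below assumes that each format corresponds to a lowercase file name extension of the same name, which is a simplification: the service may inspect the container rather than the extension, so treat this as a pre-filter only. The helper name is hypothetical.

```python
from pathlib import PurePosixPath

# Assumption: each supported format maps to an extension of the same name.
SUPPORTED_EXTENSIONS = frozenset(
    "avi mpeg mpg dat divx xvid rm rmvb mov qt asf wmv vob 3gp mp4 "
    "flv avs mkv ts ogm nsv swf".split()
)


def has_supported_extension(object_key):
    """Return True if the OSS object key ends in a supported extension."""
    suffix = PurePosixPath(object_key).suffix.lower().lstrip(".")
    return suffix in SUPPORTED_EXTENSIONS
```

For example, `has_supported_extension("videos/clip.MP4")` returns True, and `has_supported_extension("notes.txt")` returns False.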
FAQ
Can I specify which labels to detect?
No. Video label detection uses a predefined label taxonomy. You cannot specify, include, or exclude individual labels.
What categories do labels fall into?
Labels are grouped into three broad categories:
Scenes: Natural landscapes such as forests, beaches, and snow-capped mountains; living spaces such as homes and restaurants; and disaster scenes.
Events: Talent shows, office activities, performances, and production processes.
Objects: Tableware, electronic products such as mobile phones and computers, furniture, and vehicles.