You can use the video classification component to train a video classification model for inference based on unprocessed video data. This topic describes how to configure the video classification component and provides an example on how to use the component.

Prerequisites

OSS is activated, and Machine Learning Studio is authorized to access OSS. For more information, see Activate OSS and Grant PAI the permissions to access OSS.

Limits

This component is available only in Machine Learning Designer.

Introduction

The video classification component provides mainstream 3D convolutional neural network (CNN) and transformer models that you can use to run video classification training jobs. The supported X3D models include X3D-XS, X3D-M, and X3D-L, and the supported transformer models include swin-t, swin-s, swin-b, and swin-t-bert. The swin-t-bert model supports dual-modal input that consists of video and text data.

You can find the video classification component in the Offline Training subfolder under the Video Algorithm folder of the component library.

Configure the video classification component

  • Input ports
    Input port (from left to right) | Data type | Recommended upstream component | Required
    Training data | OSS | Read File Data | No. If no input port is used to pass the training data to the video classification component, you must go to the Fields Setting tab of the component and set the oss path to train file parameter. For more information, see the Component parameters table in this topic.
    Evaluation data | OSS | Read File Data | No. If no input port is used to pass the evaluation data to the video classification component, you must set the oss path to evaluation file parameter on the Fields Setting tab of the component. For more information, see the Component parameters table in this topic.
  • Component parameters
    Fields Setting tab
    • oss path to save checkpoint (required): The Object Storage Service (OSS) path in which the trained model is stored. Example: oss://pai-online-shanghai.oss-cn-shanghai-internal.aliyuncs.com/test/test_video_cls. Default value: None.
    • oss path to data (optional): The OSS directory in which the video files are stored. If you specify a directory, the path of each video file is the directory followed by the video file name in the labeling file. For example, if the OSS directory is oss://pai-vision-data-hz/EasyMM/DataSet/kinetics400/ and the video file name in the labeling file is video/1.mp4, the video file path is oss://pai-vision-data-hz/EasyMM/DataSet/kinetics400/video/1.mp4. Default value: None.
    • oss path to train file (optional): The OSS path in which the training data is stored. This parameter is required if no input port is used to pass the training data to the video classification component. Example: oss://pai-vision-data-hz/EasyMM/DataSet/kinetics400/train_pai.txt. If you use both an input port and this parameter to pass the training data, the training data that is passed by the input port takes precedence. If the labeling file does not contain text, separate the video file name and the label in each row with a space character, in the format Video file name Label. If the labeling file contains text, separate the video file name, text, and label in each row with \t, in the format Video file name\tText\tLabel. For a sketch of both formats, see the example after the tables. Default value: None.
    • oss path to evaluation file (optional): The OSS path in which the evaluation data is stored. This parameter is required if no input port is used to pass the evaluation data to the video classification component. Example: oss://pai-vision-data-hz/EasyMM/DataSet/kinetics400/train_pai.txt. If you use both an input port and this parameter to pass the evaluation data, the evaluation data that is passed by the input port takes precedence. Default value: None.
    • oss path to pretrained model (optional): The OSS path in which a pre-trained model is stored. We recommend that you use a pre-trained model to improve model precision. Default value: None.
    Parameters Setting tab
    • video classification network (required): The network that is used by the model. Valid values: x3d_xs, x3d_l, x3d_m, swin_t, swin_s, swin_b, and swin_t_bert. Default value: x3d_xs.
    • whether to use multilabel (optional): Specifies whether to use multiple labels. This parameter is available only if you set the video classification network parameter to swin_t_bert. Default value: false.
    • numclasses (required): The number of categories. Default value: None.
    • learning rate (required): The initial learning rate. For x3d models, we recommend a learning rate of 0.1. For swin models, we recommend a learning rate of 0.0001. Default value: 0.1.
    • number of train epochs (required): The number of training epochs. For x3d models, we recommend a value of 300. For swin models, we recommend a value of 30. Default value: 10.
    • warmup epoch (required): The number of warmup epochs. During warmup, the learning rate starts from a small value and gradually increases until it reaches the value of the learning rate parameter after the specified number of warmup epochs. This prevents the model gradient from exploding early in training. For example, if you set the warmup epoch parameter to 35, the learning rate gradually increases to the value of the learning rate parameter over the first 35 epochs. For a sketch of this schedule, see the example after the tables. Default value: 35.
    • train batch size (required): The training batch size, which specifies the number of data samples that are used in a single model iteration. Default value: 32.
    • model save interval (optional): The epoch interval at which a checkpoint is saved. A value of 1 indicates that a checkpoint is saved each time an epoch is complete. Default value: 1.
    Tuning tab
    • use fp 16 (required): Specifies whether to enable FP16 to reduce memory usage during model training. Default value: None.
    • single worker or distributed on dlc (optional): The mode in which the component is run. Valid values: single_dlc (single worker on Deep Learning Containers (DLC)) and distribute_dlc (distributed workers on DLC). Default value: single_dlc.
    • gpu machine type (optional): The specification of the GPU-accelerated node that you want to use. Default value: 8vCPU+60GB Mem+1xp100-ecs.gn5-c8g1.2xlarge.
  • Output ports
    Output port (from left to right) | Data type | Downstream component
    output model | An OSS path. The value is the same as the value that you specified for the oss path to save checkpoint parameter on the Fields Setting tab. The output model in the .pth format is stored in this OSS path. | video prediction
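
The following sketch illustrates the labeling file formats and the path concatenation that are described for the oss path to data and oss path to train file parameters. The label and text values are hypothetical and are used only for illustration; the line formats and the example paths come from the parameter descriptions above.

    # Labeling file entries (hypothetical label and text values).
    # Without text: "<video file name> <label>", separated by a space character.
    # With text (swin_t_bert): "<video file name>\t<text>\t<label>", separated by \t.
    plain_line = "video/1.mp4 3"
    text_line = "video/1.mp4\ta person dribbling a basketball\t3"

    # The oss path to data directory is prepended to the file name from the labeling file.
    data_root = "oss://pai-vision-data-hz/EasyMM/DataSet/kinetics400/"
    video_name, label = plain_line.rsplit(" ", 1)
    print(data_root + video_name)  # oss://pai-vision-data-hz/EasyMM/DataSet/kinetics400/video/1.mp4
    print(label)                   # 3
    print(text_line.split("\t"))   # ['video/1.mp4', 'a person dribbling a basketball', '3']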
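The warmup epoch behavior can be pictured with a minimal sketch. The schedule below assumes a linear ramp from a small starting learning rate to the configured learning rate, which is one common way to implement warmup; the exact schedule that the component uses internally is not documented here, so treat the numbers as illustrative only.

    # Illustrative linear warmup: ramp from start_lr to target_lr over warmup_epochs,
    # then keep the configured learning rate. start_lr is a hypothetical value.
    def warmup_lr(epoch, target_lr=0.1, warmup_epochs=35, start_lr=1e-4):
        if epoch < warmup_epochs:
            return start_lr + (target_lr - start_lr) * epoch / warmup_epochs
        return target_lr

    for epoch in (0, 10, 35, 100):
        print(epoch, round(warmup_lr(epoch), 5))  # reaches the configured 0.1 at epoch 35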

Compute engine

The video classification component supports only the DLC engine.

Examples

The following figure shows a sample pipeline in which the video classification component is used.

Perform the following steps to configure the component:
  1. Use two Read File Data components as the upstream components of the video classification component to read the video data files that serve as the training data and evaluation data. To do this, set the OSS Data Path parameter of each Read File Data component to the OSS path of the corresponding data file.
    The following figure shows the format of a video labeling file. Each row in the file specifies a video file path and a category label, separated by a space character (see the sketch after these steps).
  2. Configure the training data and evaluation data as the input of the video classification component and set other parameters. For more information, see Configure the video classification component.
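
As a quick sanity check for step 1, the following sketch reads a space-separated labeling file of the format shown above. The file name train_pai.txt refers to a hypothetical local copy of the labeling file before it is uploaded to OSS.

    # Each row of the labeling file: "<video file path> <category label>", separated by a space.
    with open("train_pai.txt") as f:  # hypothetical local copy of the labeling file
        for line in f:
            video_path, label = line.rstrip("\n").rsplit(" ", 1)
            print(video_path, int(label))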