You can use the video classification component to train a video classification model on raw video data and then use the trained model for inference. This topic describes how to configure the video classification component and provides an example on how to use the component.
Prerequisites
OSS is activated, and Machine Learning Studio is authorized to access OSS. For more information, see Activate OSS and Grant PAI the permissions to access OSS.
Limits
This component is available only in Machine Learning Designer.
Introduction
The video classification component provides mainstream 3D convolutional neural network (CNN) and transformer models that you can use to run video classification training jobs. The supported X3D models are X3D-XS, X3D-M, and X3D-L. The supported transformer models are swin_t, swin_s, swin_b, and swin_t_bert. The swin_t_bert model supports dual-modal input that consists of video and text data.
You can find the video classification component in the Offline Training subfolder under the Video Algorithm folder of the component library.
Configure the video classification component
- Input ports

| Input port (from left to right) | Data type | Recommended upstream component | Required | Description |
| --- | --- | --- | --- | --- |
| Training data | OSS | Read File Data | No | If no input port is used to pass the training data to the video classification component, you must set the oss path to train file parameter on the Fields Setting tab of the component. For more information, see the Component parameters table in this topic. |
| Evaluation data | OSS | Read File Data | No | If no input port is used to pass the evaluation data to the video classification component, you must set the oss path to evaluation file parameter on the Fields Setting tab of the component. For more information, see the Component parameters table in this topic. |

- Component parameters
| Tab | Parameter | Required | Description | Default value |
| --- | --- | --- | --- | --- |
| Fields Setting | oss path to save checkpoint | Yes | The Object Storage Service (OSS) path in which the trained model is stored. Example: `oss://pai-online-shanghai.oss-cn-shanghai-internal.aliyuncs.com/test/test_video_cls`. | None |
| Fields Setting | oss path to data | No | The OSS directory in which the video files are stored. If a directory is specified, the path of each video file is the concatenation of this directory and the video file name in the labeling file. For example, if the OSS directory is `oss://pai-vision-data-hz/EasyMM/DataSet/kinetics400/` and the video file name in the labeling file is `video/1.mp4`, the video file path is `oss://pai-vision-data-hz/EasyMM/DataSet/kinetics400/video/1.mp4`. | None |
| Fields Setting | oss path to train file | No | The OSS path in which the training data is stored. This parameter is required if no input port is used to pass the training data to the component. Example: `oss://pai-vision-data-hz/EasyMM/DataSet/kinetics400/train_pai.txt`. If you use both an input port and this parameter to pass the training data, the training data that is passed by using the input port takes precedence. If the labeling file does not contain text, separate the video file name and the label in each row with a space character, in the format `Video file name Label`. If the labeling file contains text, separate the video file name, the text, and the label in each row with `\t`, in the format `Video file name\tText\tLabel`. A sample labeling file appears after this table. | None |
| Fields Setting | oss path to evaluation file | No | The OSS path in which the evaluation data is stored. This parameter is required if no input port is used to pass the evaluation data to the component. Example: `oss://pai-vision-data-hz/EasyMM/DataSet/kinetics400/train_pai.txt`. If you use both an input port and this parameter to pass the evaluation data, the evaluation data that is passed by using the input port takes precedence. | None |
| Fields Setting | oss path to pretrained model | No | The OSS path in which a pre-trained model is stored. We recommend that you use a pre-trained model to improve model precision. | None |
| Parameters Setting | video classification network | Yes | The network that is used by the model. Valid values: x3d_xs, x3d_l, x3d_m, swin_t, swin_s, swin_b, and swin_t_bert. | x3d_xs |
| Parameters Setting | whether to use multilabel | No | Specifies whether to use multiple labels. This parameter is available only if you set the video classification network parameter to swin_t_bert. | false |
| Parameters Setting | numclasses | Yes | The number of categories. | None |
| Parameters Setting | learning rate | Yes | The initial learning rate. We recommend a value of 0.1 for the x3d models and 0.0001 for the swin models. | 0.1 |
| Parameters Setting | number of train epochs | Yes | The number of training epochs. We recommend a value of 300 for the x3d models and 30 for the swin models. | 10 |
| Parameters Setting | warmup epoch | Yes | The number of warmup epochs. During warmup, training starts from a small learning rate that gradually increases and reaches the value of the learning rate parameter only after the specified number of warmup epochs. This prevents gradients from exploding early in training. For example, if you set this parameter to 35, the learning rate gradually increases to the value of the learning rate parameter over the first 35 epochs. A sketch of a typical warmup schedule appears after this table. | 35 |
| Parameters Setting | train batch size | Yes | The size of a training batch, which is the number of data samples that are used in a single training iteration. | 32 |
| Parameters Setting | model save interval | No | The epoch interval at which a checkpoint is saved. A value of 1 indicates that a checkpoint is saved each time an epoch is complete. | 1 |
| Tuning | use fp 16 | Yes | Specifies whether to enable FP16 to reduce memory usage during model training. | None |
| Tuning | single worker or distributed on dlc | No | The mode in which the component is run. Valid values: single_dlc (a single worker on Deep Learning Containers (DLC)) and distribute_dlc (distributed workers on DLC). | single_dlc |
| Tuning | gpu machine type | No | The specification of the GPU-accelerated node that you want to use. | 8vCPU+60GB Mem+1xp100-ecs.gn5-c8g1.2xlarge |
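For reference, the following sketches show what the two labeling file formats described in the table might look like. The video file names, text, and label IDs are illustrative only. A labeling file without text separates the video file name and the label with a space character:

```
video/1.mp4 0
video/2.mp4 3
```

A labeling file that contains text separates the video file name, the text, and the label with tab characters:

```
video/1.mp4	a man dribbling a basketball	0
video/2.mp4	a person slicing vegetables	3
```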
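To make the warmup behavior concrete, the following minimal Python sketch shows a common linear warmup schedule. It illustrates the concept only and is not the component's internal implementation; the function name and the exact schedule shape are assumptions.

```python
def warmup_lr(epoch: int, base_lr: float, warmup_epochs: int) -> float:
    """Linear warmup: ramp the learning rate from a small value up to
    base_lr over the first warmup_epochs epochs, then hold it there."""
    if epoch < warmup_epochs:
        return base_lr * (epoch + 1) / warmup_epochs
    return base_lr

# With learning rate = 0.1 and warmup epoch = 35, the rate ramps up across
# the first 35 epochs (indices 0-34) and then holds at 0.1.
for epoch in (0, 17, 34, 35):
    print(epoch, round(warmup_lr(epoch, base_lr=0.1, warmup_epochs=35), 5))
```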
- Output ports

| Output port (from left to right) | Data type | Downstream component |
| --- | --- | --- |
| output model | An OSS path, which is the same as the value that you specified for the oss path to save checkpoint parameter on the Fields Setting tab. The trained model in the .pth format is stored in this path. A sketch that shows how to inspect the checkpoint appears after this table. | video prediction |
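After training completes, you can copy the .pth file from the checkpoint path and inspect it locally. The following is a minimal sketch that assumes a standard PyTorch checkpoint; the local file name is hypothetical, and the exact checkpoint layout depends on the training job.

```python
import torch

# Hypothetical local copy of the exported .pth checkpoint, for example
# downloaded from the "oss path to save checkpoint" directory with an OSS
# client; the file name is illustrative.
checkpoint = torch.load("model_final.pth", map_location="cpu")

# A PyTorch checkpoint is typically a dict. Listing its keys shows whether
# it stores raw weights, optimizer state, an epoch counter, and so on.
if isinstance(checkpoint, dict):
    print(list(checkpoint.keys()))
else:
    print(type(checkpoint))
```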
Compute engine
The video classification component supports only the DLC engine.
Examples

- Use two Read File Data components as the upstream components of the video classification component to read the video data files that serve as the training data and the evaluation data. To do this, set the OSS Data Path parameter of each Read File Data component to the OSS path of the corresponding video data file. The following figure shows the format of a video labeling file: each row contains a video file path and a category label, separated by a space character. A format-check sketch appears after this list.
- Configure the training data and evaluation data as the input of the video classification component and set other parameters. For more information, see Configure the video classification component.
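Before you upload a labeling file, it can help to verify that every row matches the expected format. The following minimal Python sketch checks the space-separated format shown in the figure. It is a convenience check, not part of the component, and the file name is taken from the example path in this topic.

```python
# Hypothetical pre-upload check for a space-separated labeling file.
# Each row is expected to contain a video file path and an integer
# category label separated by a single space.
with open("train_pai.txt", encoding="utf-8") as f:
    for line_no, line in enumerate(f, start=1):
        parts = line.rstrip("\n").split(" ")
        if len(parts) != 2 or not parts[1].isdigit():
            print(f"line {line_no} is malformed: {line.rstrip()}")
```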