You can use the video classification component to train a video classification model for inference based on unprocessed video data. This topic describes how to configure the video classification component and provides an example on how to use the component.

Prerequisites

OSS is activated, and Machine Learning Studio is authorized to access OSS. For more information, see Activate OSS and Grant PAI the permissions to access OSS.

Limits

This component is available only in Machine Learning Designer.

Introduction

The video classification component provides mainstream 3D convolutional neural network (CNN) and transformer models that you can use to run video classification training jobs. The supported X3D models include X3D-XS, X3D-M, and X3D-L, and the supported transformer models include swin-t, swin-s, swin-b, and swin-t-bert. The swin-t-bert model supports dual-modal input that consists of video and text data.

You can find the video classification component in the Offline Training subfolder under the Video Algorithm folder of the component library.

Configure the video classification component

  • Input ports
    Input port (from left to right) | Data type | Recommended upstream component | Required
    Training data | OSS | Read File Data | No. If no input port is used to pass the training data to the video classification component, you must go to the Fields Setting tab of the component and set the oss path to train file parameter. For more information, see the Component parameters table in this topic.
    Evaluation data | OSS | Read File Data | No. If no input port is used to pass the evaluation data to the video classification component, you must set the oss path to evaluation file parameter on the Fields Setting tab of the component. For more information, see the Component parameters table in this topic.
  • Component parameters
    Fields Setting tab
    • oss path to save checkpoint (required): The Object Storage Service (OSS) path in which the trained model is stored. Example: oss://pai-online-shanghai.oss-cn-shanghai-internal.aliyuncs.com/test/test_video_cls. Default value: None.
    • oss path to data (optional): The OSS directory in which the video files are stored. If you specify a directory, the path of each video file is the directory followed by the video file name in the labeling file. For example, if the OSS directory is oss://pai-vision-data-hz/EasyMM/DataSet/kinetics400/ and the video file name in the labeling file is video/1.mp4, the video file path is oss://pai-vision-data-hz/EasyMM/DataSet/kinetics400/video/1.mp4. Default value: None.
    • oss path to train file (optional): The OSS path in which the training data is stored. This parameter is required if no input port is used to pass the training data to the video classification component. Example: oss://pai-vision-data-hz/EasyMM/DataSet/kinetics400/train_pai.txt. If you use both an input port and this parameter to pass the training data, the training data that is passed by the input port takes precedence. If the labeling file does not contain text, separate the video file name and the label in each row with a space character, in the format Video file name Label. If the labeling file contains text, separate the video file name, text, and label in each row with \t, in the format Video file name\tText\tLabel. For a sketch of both formats, see the example after the tables. Default value: None.
    • oss path to evaluation file (optional): The OSS path in which the evaluation data is stored. This parameter is required if no input port is used to pass the evaluation data to the video classification component. Example: oss://pai-vision-data-hz/EasyMM/DataSet/kinetics400/train_pai.txt. If you use both an input port and this parameter to pass the evaluation data, the evaluation data that is passed by the input port takes precedence. Default value: None.
    • oss path to pretrained model (optional): The OSS path in which a pre-trained model is stored. We recommend that you use a pre-trained model to improve model precision. Default value: None.
    Parameters Setting tab
    • video classification network (required): The network that is used by the model. Valid values: x3d_xs, x3d_l, x3d_m, swin_t, swin_s, swin_b, and swin_t_bert. Default value: x3d_xs.
    • whether to use multilabel (optional): Specifies whether to use multiple labels. This parameter is available only if you set the video classification network parameter to swin_t_bert. Default value: false.
    • numclasses (required): The number of categories. Default value: None.
    • learning rate (required): The initial learning rate. For x3d models, we recommend a learning rate of 0.1. For swin models, we recommend a learning rate of 0.0001. Default value: 0.1.
    • number of train epochs (required): The number of training epochs. For x3d models, we recommend a value of 300. For swin models, we recommend a value of 30. Default value: 10.
    • warmup epoch (required): The number of warmup epochs. During warmup, the learning rate starts from a small value and gradually increases until it reaches the value of the learning rate parameter after the specified number of warmup epochs. This prevents the model gradient from exploding early in training. For example, if you set the warmup epoch parameter to 35, the learning rate gradually increases to the value of the learning rate parameter over the first 35 epochs. For a sketch of this schedule, see the example after the tables. Default value: 35.
    • train batch size (required): The training batch size, which specifies the number of data samples that are used in a single model iteration. Default value: 32.
    • model save interval (optional): The epoch interval at which a checkpoint is saved. A value of 1 indicates that a checkpoint is saved each time an epoch is complete. Default value: 1.
    Tuning tab
    • use fp 16 (required): Specifies whether to enable FP16 to reduce memory usage during model training. Default value: None.
    • single worker or distributed on dlc (optional): The mode in which the component is run. Valid values: single_dlc (single worker on Deep Learning Containers (DLC)) and distribute_dlc (distributed workers on DLC). Default value: single_dlc.
    • gpu machine type (optional): The specification of the GPU-accelerated node that you want to use. Default value: 8vCPU+60GB Mem+1xp100-ecs.gn5-c8g1.2xlarge.
  • Output ports
    Output port (from left to right) | Data type | Downstream component
    output model | An OSS path. The value is the same as the value that you specified for the oss path to save checkpoint parameter on the Fields Setting tab. The output model in the .pth format is stored in this OSS path. | video prediction
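
The following sketch illustrates the labeling file formats and the path concatenation that are described for the oss path to data and oss path to train file parameters. The label and text values are hypothetical and are used only for illustration; the line formats and the example paths come from the parameter descriptions above.

    # Labeling file entries (hypothetical label and text values).
    # Without text: "<video file name> <label>", separated by a space character.
    # With text (swin_t_bert): "<video file name>\t<text>\t<label>", separated by \t.
    plain_line = "video/1.mp4 3"
    text_line = "video/1.mp4\ta person dribbling a basketball\t3"

    # The oss path to data directory is prepended to the file name from the labeling file.
    data_root = "oss://pai-vision-data-hz/EasyMM/DataSet/kinetics400/"
    video_name, label = plain_line.rsplit(" ", 1)
    print(data_root + video_name)  # oss://pai-vision-data-hz/EasyMM/DataSet/kinetics400/video/1.mp4
    print(label)                   # 3
    print(text_line.split("\t"))   # ['video/1.mp4', 'a person dribbling a basketball', '3']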
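The warmup epoch behavior can be pictured with a minimal sketch. The schedule below assumes a linear ramp from a small starting learning rate to the configured learning rate, which is one common way to implement warmup; the exact schedule that the component uses internally is not documented here, so treat the numbers as illustrative only.

    # Illustrative linear warmup: ramp from start_lr to target_lr over warmup_epochs,
    # then keep the configured learning rate. start_lr is a hypothetical value.
    def warmup_lr(epoch, target_lr=0.1, warmup_epochs=35, start_lr=1e-4):
        if epoch < warmup_epochs:
            return start_lr + (target_lr - start_lr) * epoch / warmup_epochs
        return target_lr

    for epoch in (0, 10, 35, 100):
        print(epoch, round(warmup_lr(epoch), 5))  # reaches the configured 0.1 at epoch 35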

Compute engine

The video classification component supports only the DLC engine.

Examples

The following figure shows a sample pipeline in which the video classification component is used.

Perform the following steps to configure the component:
  1. Use two Read File Data components as the upstream components of the video classification component to read the video data files that serve as the training data and evaluation data. To do this, set the OSS Data Path parameter of each Read File Data component to the OSS path of the corresponding data file.
    The following figure shows the format of a video labeling file. Each row in the file specifies a video file path and a category label, separated by a space character (see the sketch after these steps).
  2. Configure the training data and evaluation data as the input of the video classification component and set other parameters. For more information, see Configure the video classification component.
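
As a quick sanity check for step 1, the following sketch reads a space-separated labeling file of the format shown above. The file name train_pai.txt refers to a hypothetical local copy of the labeling file before it is uploaded to OSS.

    # Each row of the labeling file: "<video file path> <category label>", separated by a space.
    with open("train_pai.txt") as f:  # hypothetical local copy of the labeling file
        for line in f:
            video_path, label = line.rstrip("\n").rsplit(" ", 1)
            print(video_path, int(label))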