
Platform for AI: Video classification

Last Updated: Feb 28, 2024

You can use the video classification component to train a video classification model on raw video data and then use the model for inference. This topic describes how to configure the video classification component and provides an example on how to use the component in Platform for AI (PAI).

Prerequisites

OSS is activated, and Machine Learning Studio is authorized to access OSS. For more information, see Activate OSS and Grant the permissions that are required to use Machine Learning Designer.

Limits

  • The video classification component is available only in Machine Learning Designer of PAI.

  • You can use the video classification component with the computing resources of Deep Learning Containers (DLC).

Overview

The video classification component provides mainstream 3D convolutional neural network (CNN) and transformer models that you can use to run video classification training jobs. The supported X3D models are X3D-XS, X3D-M, and X3D-L, and the supported transformer models are swin-t, swin-s, swin-b, and swin-t-bert. The swin-t-bert model supports dual-modal input based on video and text data.

You can find the video classification component in the Offline Training subfolder of the Video Algorithm folder in the component library.

Configure the component in the PAI console

  • Input ports (from left to right)

    train data
      • Data type: OSS
      • Recommended upstream component: Read File Data
      • Required: No. If you do not use this input port to pass the training data to the video classification component, configure the oss path to train file parameter on the Fields Setting tab of the component. For more information, see the Component parameters table in this topic.

    eval data
      • Data type: OSS
      • Recommended upstream component: Read File Data
      • Required: No. If you do not use this input port to pass the evaluation data to the video classification component, configure the oss path to evaluation file parameter on the Fields Setting tab of the component. For more information, see the Component parameters table in this topic.

  • Component parameters

    Fields Setting tab

    • oss path to save checkpoint (required)
      The Object Storage Service (OSS) path in which the trained model is stored. Example: oss://pai-online-shanghai.oss-cn-shanghai-internal.aliyuncs.com/test/test_video_cls.
      Default value: N/A

    • oss path to data (optional)
      The OSS directory in which the video files are stored. If this parameter is specified, the full path of a video file consists of this directory followed by the video file name in the labeling file. For example, if the OSS directory is oss://pai-vision-data-hz/EasyMM/DataSet/kinetics400/ and the labeling file references video/1.mp4, the full video file path is oss://pai-vision-data-hz/EasyMM/DataSet/kinetics400/video/1.mp4.
      Default value: N/A

    • oss path to train file (optional)
      The OSS path of the training data. This parameter is required if you do not use the input port to pass the training data to the video classification component. Example: oss://pai-vision-data-hz/EasyMM/DataSet/kinetics400/train_pai.txt. If you pass the training data through both the input port and this parameter, the data from the input port is used.
      If the labeling file does not contain text, separate the video file name and the label in each row with a space: Video file name Label. If the labeling file contains text, separate the video file name, text, and label with tab characters (\t): Video file name\tText\tLabel. (A sketch that writes both formats appears at the end of the Examples section.)
      Default value: N/A

    • oss path to evaluation file (optional)
      The OSS path of the evaluation data. This parameter is required if you do not use the input port to pass the evaluation data to the video classification component. Example: oss://pai-vision-data-hz/EasyMM/DataSet/kinetics400/train_pai.txt. If you pass the evaluation data through both the input port and this parameter, the data from the input port is used.
      Default value: N/A

    • oss path to pretrained model (optional)
      The OSS path in which a pre-trained model is stored. We recommend that you use a pre-trained model to improve the model precision.
      Default value: N/A

    Parameters Setting tab

    • video classification network (required)
      The network that is used by the model. Valid values: x3d_xs, x3d_m, x3d_l, swin_t, swin_s, swin_b, and swin_t_bert.
      Default value: x3d_xs

    • whether to use multilabel (optional)
      Specifies whether to use multiple labels. This parameter takes effect only if you select the swin_t_bert network.
      Default value: false

    • numclasses (required)
      The number of categories.
      Default value: N/A

    • learning rate (required)
      The initial learning rate. We recommend 0.1 for x3d models and 0.0001 for swin models.
      Default value: 0.1

    • number of train epochs (required)
      The number of training epochs. We recommend 300 for x3d models and 30 for swin models.
      Default value: 10

    • warmup epoch (required)
      The number of warmup epochs. During warmup, training starts from a small learning rate that gradually increases until it reaches the value of the learning rate parameter; this helps prevent the model gradient from exploding early in training. For example, if you set this parameter to 35, the learning rate gradually increases to the configured value over the first 35 epochs. (A sketch that illustrates this ramp follows the Output ports list.)
      Default value: 35

    • batch size (required)
      The size of a training batch, that is, the number of data samples used in a single model iteration.
      Default value: 32

    • model save interval (optional)
      The epoch interval at which a checkpoint is saved. A value of 1 indicates that a checkpoint is saved each time an epoch is completed.
      Default value: 1

    Tuning tab

    • use fp 16 (required)
      Specifies whether to enable FP16 (half precision) to reduce memory usage during model training.
      Default value: N/A

    • single worker or distributed on dlc (optional)
      The mode in which the component is run. Valid values:
      • single_dlc: a single worker on Deep Learning Containers (DLC)
      • distribute_dlc: distributed workers on DLC
      Default value: single_dlc

    • gpu machine type (optional)
      The specification of the GPU-accelerated node that you want to use.
      Default value: 8vCPU+60GB Mem+1xp100-ecs.gn5-c8g1.2xlarge

  • Output ports (from left to right)

    output model
      • Data type: The OSS path of the output model. The value is the same as the value of the oss path to save checkpoint parameter on the Fields Setting tab. The trained model is stored in this path in the .pth format.
      • Downstream component: video prediction
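
The learning rate, number of train epochs, and warmup epoch parameters work together: during warmup, the learning rate ramps up from a small value to the configured learning rate. The following Python sketch summarizes the recommended values from the parameter table for a swin model and illustrates the ramp with a simple linear warmup schedule. The dictionary keys mirror the documented parameter names; the numclasses value and the linear schedule are assumptions for illustration, not the component's exact implementation.

```python
# A minimal sketch, not the component's implementation: it restates the
# recommended settings from the parameter table and illustrates how a
# warmup schedule reaches the configured learning rate.

# Recommended values for a swin model (for x3d models: learning rate 0.1,
# number of train epochs 300). numclasses 400 is a hypothetical value
# matching a Kinetics-400-style dataset.
swin_settings = {
    "video classification network": "swin_t",
    "numclasses": 400,
    "learning rate": 0.0001,
    "number of train epochs": 30,
    "warmup epoch": 35,
    "batch size": 32,
    "model save interval": 1,
}

def warmup_lr(epoch: int, base_lr: float, warmup_epochs: int) -> float:
    """Linear warmup (an assumption for illustration): the learning rate
    grows from base_lr / warmup_epochs up to base_lr, then stays constant."""
    if epoch < warmup_epochs:
        return base_lr * (epoch + 1) / warmup_epochs
    return base_lr

# With warmup epoch = 35, the configured learning rate is reached only
# after 35 warmup epochs.
for epoch in (0, 17, 34, 35):
    print(epoch, warmup_lr(epoch, swin_settings["learning rate"],
                           swin_settings["warmup epoch"]))
```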

Examples

The following figure shows a sample pipeline in which the video classification component is used. In this example, configure the components shown in the preceding figure by performing the following steps:

  1. Use two Read File Data components as the upstream components of the video classification component to read the video data files that serve as the training data and evaluation data. To do this, set the OSS Data Path parameter of each Read File Data component to the OSS path of the corresponding video data file.

    The following figure shows the format of a video labeling file. Each row in the file contains a video file path and a category label, separated by a space character. (A sketch that writes labeling files in both supported formats appears after this list.)

  2. Specify the training data and evaluation data as the input of the video classification component and configure the other parameters. For more information, see the Configure the component in the PAI console section of this topic.
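
To make the labeling file layout and path resolution concrete, the following Python sketch writes a labeling file in both supported formats and shows how the full video path is resolved when the oss path to data parameter is set. All file names, text snippets, and labels are hypothetical examples.

```python
# A minimal sketch of the labeling file formats described in this topic.
# All file names, texts, and labels are hypothetical.

# Format without text: "<video file name> <label>", separated by a space.
with open("train_pai.txt", "w") as f:
    for name, label in [("video/1.mp4", 0), ("video/2.mp4", 3)]:
        f.write(f"{name} {label}\n")

# Format with text (dual-modal input for swin_t_bert):
# "<video file name>\t<text>\t<label>", separated by tab characters.
with open("train_pai_text.txt", "w") as f:
    for name, text, label in [("video/1.mp4", "a person riding a bike", 0)]:
        f.write(f"{name}\t{text}\t{label}\n")

# If oss path to data is set, the full path of a video file is the
# directory followed by the file name from the labeling file:
data_dir = "oss://pai-vision-data-hz/EasyMM/DataSet/kinetics400/"
print(data_dir + "video/1.mp4")
# -> oss://pai-vision-data-hz/EasyMM/DataSet/kinetics400/video/1.mp4
```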
