Machine Learning Platform for AI (PAI) provides a video classification algorithm. You can use the algorithm to train video classification models on tens of millions of videos. This topic describes how to generate a video classification model based on short video data.

You can use Elastic Algorithm Service (EAS) of PAI to deploy the trained video classification models as RESTful API operations. These operations can be called by using the MaxCompute console or ODPS SQL nodes of DataWorks. For more information, see Calling method.

You can download the test data, training models, and configuration files that are used in this test. For more information, see Download resources for video classification.

Data description

The video classification algorithm supports original video data in common formats such as .avi and .mp4. In this topic, the eyemakeup and lipsmakeup video files are used to train a video classification model. For more information, see Download resources for video classification.

Data format conversion

The data format conversion module converts original video files to TFRecord files, which help accelerate model training. You can run the following command to convert original video files to TFRecord files:
pai -name easy_vision_ext
-project algo_public
-Dbuckets='oss://{bucket_name}.{oss_host}/{path}/'
-Darn='acs:ram::*******:role/aliyunodpspaidefaultrole'
-DossHost='{oss_host}'
-Dcmd=convert
-Dconvert_config='oss://{bucket_name}.{oss_host}/{path}/{config_file}'
-Dlabel_file='oss://{bucket_name}.{oss_host}/{path}/{label_file}'
-Doutput_tfrecord='oss://{bucket_name}.{oss_host}/{path}/'
Example:
pai -name easy_vision_ext
-project algo_public
-Dbuckets='oss://demo-yuze.oss-cn-beijing-internal.aliyuncs.com/vip/'
-Darn='acs:ram::******:role/aliyunodpspaidefaultrole'
-DossHost='oss-cn-beijing-internal.aliyuncs.com'
-Dcmd=convert
-Dconvert_config='oss://demo-yuze.oss-cn-beijing-internal.aliyuncs.com/vip/ucf101_qince.config'
-Dlabel_file='oss://demo-yuze.oss-cn-beijing-internal.aliyuncs.com/vip/vip.csv'
-Doutput_tfrecord='oss://demo-yuze.oss-cn-beijing-internal.aliyuncs.com/vip/'
where:
  • Dbuckets: the root directory of the Object Storage Service (OSS) bucket that stores the data.
  • Darn: the Alibaba Cloud Resource Name (ARN) of the RAM role that has the permissions to access OSS resources. For more information about how to obtain the ARN, see the "I/O parameters" section of the Task parameters of PAI-TensorFlow topic.
  • DossHost: the endpoint of the OSS bucket.
  • Dconvert_config: the OSS path of the configuration file that specifies the categories of the original video files. The Data description section of this topic provides the download link for this configuration file. The following example shows the content of the configuration file:
    class_map {
      label_name: "ApplyEyeMakeup"
    }
    class_map {
      label_name: "ApplyLipstick"
    }
    model_type: VIDEO_CLASSIFICATION
    converter_class: "QinceConverter"
    write_thread_num: 8
    part_record_num: 64
    test_ratio: 0.0
  • Dlabel_file: the OSS path of the label file. The label file records the OSS path and category of each video file that is used for training. You must upload the video files to OSS and reference their paths in the label file, as shown in the following sample. A sketch that generates such a label file follows this parameter list.
    # Data ID, URL, and tagging result
    1,"{""tfspath"": ""oss://demo-yuze/data/eye/public_v_ApplyEyeMakeup_g01_c01.avi""}","{""option"": ""ApplyEyeMakeup""}"
    2,"{""tfspath"": ""oss://demo-yuze/data/eye/public_v_ApplyEyeMakeup_g02_c03.avi""}","{""option"": ""ApplyEyeMakeup""}"
    3,"{""tfspath"": ""oss://demo-yuze/data/eye/public_v_ApplyEyeMakeup_g02_c04.avi""}","{""option"": ""ApplyEyeMakeup""}"
    4,"{""tfspath"": ""oss://demo-yuze/data/eye/public_v_ApplyEyeMakeup_g03_c01.avi""}","{""option"": ""ApplyEyeMakeup""}"
    5,"{""tfspath"": ""oss://demo-yuze/data/eye/public_v_ApplyEyeMakeup_g04_c01.avi""}","{""option"": ""ApplyEyeMakeup""}"
    6,"{""tfspath"": ""oss://demo-yuze/data/lips/public_v_ApplyLipstick_g04_c02.avi""}","{""option"": ""ApplyEyeMakeup""}"
    7,"{""tfspath"": ""oss://demo-yuze/data/lips/public_v_ApplyLipstick_g05_c01.avi""}","{""option"": ""ApplyLipstick""}"
    8,"{""tfspath"": ""oss://demo-yuze/data/lips/public_v_ApplyLipstick_g07_c04.avi""}","{""option"": ""ApplyLipstick""}"
    9,"{""tfspath"": ""oss://demo-yuze/data/lips/public_v_ApplyLipstick_g01_c02.avi""}","{""option"": ""ApplyLipstick""}"
                            
    Replace the paths in the label file with the actual OSS paths of your video files.
  • Doutput_tfrecord: the output path of the TFRecord files.
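If you build the label file yourself instead of using the downloaded vip.csv, the following Python sketch shows one way to generate a file in the same three-column layout as the sample above: a data ID, a JSON object with the video's OSS path, and a JSON object with its category. The bucket name, directory layout, and file names in the sketch are placeholders, not values provided by this topic.

# Minimal sketch that writes a label file in the layout shown above.
# All OSS paths and category names below are placeholders; replace them
# with the actual locations of your uploaded videos.
import csv
import json

# Hypothetical mapping from category name to the OSS paths of its videos.
videos = {
    "ApplyEyeMakeup": [
        "oss://your-bucket/data/eye/public_v_ApplyEyeMakeup_g01_c01.avi",
        "oss://your-bucket/data/eye/public_v_ApplyEyeMakeup_g02_c03.avi",
    ],
    "ApplyLipstick": [
        "oss://your-bucket/data/lips/public_v_ApplyLipstick_g04_c02.avi",
    ],
}

with open("vip.csv", "w", newline="") as f:
    writer = csv.writer(f)  # default quoting doubles the inner quotes, as in the sample
    data_id = 1
    for label, paths in videos.items():
        for path in paths:
            writer.writerow([
                data_id,
                json.dumps({"tfspath": path}),
                json.dumps({"option": label}),
            ])
            data_id += 1

After you generate the file, upload it to OSS and pass its path to the Dlabel_file parameter.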

Train a video classification model

You can run the following command to train a video classification model based on the converted data:
pai -name ev_video_classification_ext
-project algo_public
-Dbackbone='resnet_3d_50'
-Dnum_epochs=50
-Ddecay_epochs=5
-Dsave_checkpoints_epochs=1
-Dmodel_dir='oss://{bucket_name}.{oss_host}/{output_model_path}/'
-Duse_pretrained_model=true
-Dpretrained_model='oss://{bucket_name}.{oss_host}/{model_path}/resnet_3d_50_model.ckpt'
-Dtrain_data='oss://{bucket_name}.{oss_host}/{path}/data_train_0_0.tfrecord'
-Dtest_data='oss://{bucket_name}.{oss_host}/{path}/data_train_0_0.tfrecord'
-Dlabel_map_path='oss://{bucket_name}.{oss_host}/{path}/data_label_map.pbtxt'
-Dnum_test_example=10
-Dtrain_batch_size=2
-Dtest_batch_size=2
-Dbuckets='oss://{bucket_name}.{oss_host}/{path}/'
-Darn='acs:ram::*********:role/aliyunodpspaidefaultrole'
-DossHost='{oss_host}'
-Dinitial_learning_rate=0.0001
-Dstaircase=false
-DgpuRequired=100
-Dnum_classes=2
Example:
pai -name ev_video_classification_ext
-project algo_public
-Dbackbone='resnet_3d_50'
-Dnum_epochs=50
-Ddecay_epochs=5
-Dsave_checkpoints_epochs=1
-Dmodel_dir='oss://demo-yuze.oss-cn-beijing-internal.aliyuncs.com/model/'
-Duse_pretrained_model=true
-Dpretrained_model='oss://demo-yuze.oss-cn-beijing-internal.aliyuncs.com/model/resnet_3d_50_model.ckpt'
-Dtrain_data='oss://demo-yuze.oss-cn-beijing-internal.aliyuncs.com/vip/data_train_0_0.tfrecord'
-Dtest_data='oss://demo-yuze.oss-cn-beijing-internal.aliyuncs.com/vip/data_train_0_0.tfrecord'
-Dlabel_map_path='oss://demo-yuze.oss-cn-beijing-internal.aliyuncs.com/vip/data_label_map.pbtxt'
-Dnum_test_example=10
-Dtrain_batch_size=2
-Dtest_batch_size=2
-Dbuckets='oss://demo-yuze.oss-cn-beijing-internal.aliyuncs.com/vip/'
-Darn='acs:ram::********:role/aliyunodpspaidefaultrole'
-DossHost='oss-cn-beijing-internal.aliyuncs.com'
-Dinitial_learning_rate=0.0001
-Dstaircase=false
-DgpuRequired=100
-Dnum_classes=2
where:
  • Dbackbone: the type of the backbone network.
  • Dmodel_dir: the directory of the output model.
  • Dpretrained_model: the OSS path to which you uploaded the pretrained model.
  • Dtrain_data: the TFRecord file that is generated by converting the training data.
  • Dtest_data: the TFRecord file that is generated by converting the test data.
  • Dlabel_map_path: the label map file in the .pbtxt format that is generated during data format conversion.
  • Dnum_test_example: the number of samples that are used in the test. The sketch after this list shows one way to check how many samples a TFRecord file contains.
  • Dtrain_batch_size: the number of samples that are trained in a batch.
  • Dtest_batch_size: the number of samples that are evaluated in a batch.
  • Dbuckets: the OSS root directory that stores the data.
  • Darn: the ARN of the RAM role that has the permissions to access OSS resources. For more information about how to obtain the ARN, see the "I/O parameters" section of the Task parameters of PAI-TensorFlow topic.
  • Dnum_classes: the number of categories.
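Before you submit the training command, you may want to confirm that the value of Dnum_test_example does not exceed the number of samples in the converted data. The following Python sketch counts the records in a TFRecord file. It assumes that TensorFlow 2.x is installed and that you have downloaded the file from the OSS path in Dtrain_data to your local machine; the file name below is a placeholder.

# Sanity check, not part of the PAI command itself: count the serialized
# examples in a converted TFRecord file.
import tensorflow as tf

record_file = "data_train_0_0.tfrecord"  # assumed local copy of the converted data

# tf.data.TFRecordDataset iterates over the serialized examples in the file.
num_examples = sum(1 for _ in tf.data.TFRecordDataset(record_file))
print(f"{record_file} contains {num_examples} examples")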

The final video classification model is generated in the TensorFlow SavedModel format and is stored in the directory specified by Dmodel_dir. You can use EAS to deploy the final video classification model as RESTful API operations. For more information, see Deploy models.
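If you want to inspect the exported model before you deploy it with EAS, the following Python sketch loads the SavedModel and prints its serving signatures. It assumes TensorFlow 2.x and a local copy of the SavedModel directory downloaded from the path specified by Dmodel_dir; the directory name below is a placeholder.

# Minimal sketch for inspecting the exported SavedModel locally.
import tensorflow as tf

saved_model_dir = "./export/final"  # assumed local copy of the exported SavedModel

model = tf.saved_model.load(saved_model_dir)
# Print the available serving signatures and their input tensors.
for name, fn in model.signatures.items():
    print(name, fn.structured_input_signature)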