The object detection component trains object detection models. You can use the trained models for inference, for example in business scenarios where you need to detect high-risk entities in images. This topic describes the parameters of the object detection component and provides examples on how to use it.

Prerequisites

OSS is activated, and Machine Learning Studio is authorized to access OSS. For more information, see Activate OSS and Grant PAI the permissions to access OSS.

Limits

This component is available only in Machine Learning Designer.

Introduction

Machine Learning Platform for AI (PAI) supports the following mainstream algorithms for training object detection models:
  • SSD
  • Faster-RCNN
  • R-FCN
  • YOLOV5
  • YOLOX
  • YOLOX_EDGE

You can find the object detection component in the Offline Training subfolder of the Video Algorithm folder in the component library. The component can be used for model training based on up to hundreds of millions of images.

Component configuration in the PAI console

  • Input ports
    The input ports are listed from left to right.
    • TFRecords for Training
      • Data type: Object Storage Service (OSS) path
      • Recommended upstream component: data to tfrecord
      • Required: No. If you do not use this input port to specify the training data, you must set the oss path to training tfrecord parameter on the Fields Setting tab.
    • TFRecords for Evaluation
      • Data type: OSS path
      • Recommended upstream component: data to tfrecord
      • Required: No. If you do not use this input port to specify the evaluation data, you must set the oss path to evaluation tfrecord parameter on the Fields Setting tab.
    • yolov5 class list file
      • Data type: OSS path
      • Recommended upstream component: Read File Data
      • Required: No. A category file is required only if you set the model type parameter to YOLOV5. If you do not use this input port to specify the category file, you can set the oss path of class list file parameter on the Fields Setting tab.

  • Component parameters
    The parameters are grouped by the tab on which they appear.

    Fields Setting tab:
    • model type
      • Required: Yes
      • Description: The type of the model to be trained. Valid values:
        • SSD
        • FasterRCNN
        • RFCN
        • YOLOV5: the latest algorithm supported by PAI. This algorithm has the advantages of high accuracy and high training speed.
      • Default value: FasterRCNN
    • oss dir to save model
      • Required: Yes
      • Description: The OSS path in which you want to store the trained model. Example: oss://pai-online-shanghai.oss-cn-shanghai-internal.aliyuncs.com/test/ckpt/.
      • Default value: N/A
    • oss path to training tfrecord
      • Required: No
      • Description: The OSS path in which the training data in the TFRecord format is stored. This parameter is required if no input port is used to specify the training data for the component. The parameter value can contain wildcards. Example: oss://a/train*.tfrecord. If you use both an input port and this parameter to specify the training data for the component, the training data that is specified by using the input port is preferentially used.
      • Default value: N/A
    • oss path to evaluation tfrecord
      • Required: No
      • Description: The OSS path in which the evaluation data in the TFRecord format is stored. This parameter is required if no input port is used to specify the evaluation data for the component. The parameter value can contain wildcards. Example: oss://a/test*.tfrecord. If you use both an input port and this parameter to specify the evaluation data for the component, the evaluation data that is specified by using the input port is preferentially used.
      • Default value: N/A
    • use pretrained model
      • Required: No
      • Description: Specifies whether to use a pre-trained model. We recommend that you use a pre-trained model to improve the training accuracy.
      • Default value: Yes
    • oss path to pretrained model
      • Required: No
      • Description: The OSS path in which your pre-trained model is stored. If you have a pre-trained model, set this parameter to the OSS path of your pre-trained model. If you do not set this parameter, the corresponding default pre-trained model provided by PAI is used.
      • Default value: N/A
    • oss path of class list file
      • Required: No
      • Description: The OSS path of the category file. A category file is required only if the model type parameter is set to YOLOV5. The value of this parameter is automatically generated by PAI. A category file specifies the categories that require training. Each line in the category file indicates a category, as shown in the sample after this entry.
      • Default value: No
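      For example, a hypothetical class list file for a traffic dataset might contain the following lines, one category name per line (the category names are for illustration only):

        person
        bicycle
        car
        traffic_light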
    • YOLOV5 Data Source Type
      • Required: No
      • Description: The data source type that you want to use for the YOLOV5 model. This parameter is required only if the model type parameter is set to YOLOV5. Valid values:
        • DetSourceCOCOYOLOV5
        • DetSourcePAI
        • DetSourceVOC
      • Default value: DetSourcePAI
    • yolo with oss label cache
      • Required: No
      • Description: Specifies whether to enable the cache feature to read OSS objects at a higher speed. This parameter is required only if the model type parameter is set to YOLOV5.
      • Default value: No

    Parameters Setting tab:
    • backbone
      • Required: No
      • Description: The name of the backbone network used by the model.
        • Valid values if you set the model type parameter to SSD:
          • resnet_v1_50
          • vgg16_reduce_fc
          • mobilenet_v1
        • Valid values if you set the model type parameter to FasterRCNN or RFCN:
          • resnet_v1_50
          • resnet_v1_101
        • You do not need to set this parameter if you set the model type parameter to YOLOV5.
      • Default value: resnet_v1_50
    • YOLOV5 model size
      • Required: No
      • Description: The size of the YOLOV5 model that you want to train. This parameter is required if you set the model type parameter to YOLOV5. Valid values:
        • yolov5s
        • yolov5m
        • yolov5l
        • yolov5x
      • Default value: yolov5m
    • num classes
      • Required: Yes
      • Description: The number of categories in the dataset. If you do not set this parameter, PAI automatically analyzes the dataset to obtain the number of categories.
      • Default value: N/A
    • anchor scales
      • Required: Yes
      • Description: The size of an anchor box, measured on the input image after the image is resized. We recommend that you take the size of the resized input image into consideration when you set this parameter. The value of this parameter varies based on the value of the model type parameter. For an illustration of how the sizes and ratios combine, see the sketch after the anchor ratio parameter.
        • If you set the model type parameter to SSD, you do not need to set this parameter. By default, the size of an anchor box is 0.1, 0.2, 0.37, 0.54, 0.71, 0.88, 0.96, or 1.0 times the size of the input image.
        • If you do not use Feature Pyramid Network (FPN) and set the model type parameter to FasterRCNN or RFCN, you can specify multiple sizes for anchor boxes. Default value: [128 256 512].
        • If you use FPN, set this parameter to the size of an anchor box in the layer that has the highest resolution. The total number of layers is five, and the size of an anchor box in a layer is twice that in the previous layer. For example, if the size of an anchor box in the first layer is 32, the sizes of the anchor boxes in the next four layers are 64, 128, 256, and 512. By default, if FPN is used, the size of an anchor box in the first layer is 32 for an SSD or a Faster R-CNN model. FPN is not supported for an R-FCN model.
      • Default value: N/A
    • anchor ratio
      • Required: No
      • Description: The ratio of the width to the height of each anchor box. Separate multiple ratios with spaces.
      • Default value: 0.5 1 2
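      The following Python sketch illustrates how anchor sizes grow across the five FPN layers and how the anchor ratio values shape each box. The area-preserving convention used here is a common one and is an assumption; the exact convention used by the component may differ.

        import math

        base_scale = 32        # anchor size configured for the highest-resolution FPN layer
        ratios = [0.5, 1, 2]   # width-to-height ratios, the default "0.5 1 2"

        for layer in range(5):
            scale = base_scale * 2 ** layer   # 32, 64, 128, 256, 512
            # Keep the anchor area at scale * scale and stretch it by each ratio.
            boxes = [(round(scale * math.sqrt(r)), round(scale / math.sqrt(r))) for r in ratios]
            print(f"layer {layer}: scale={scale}, (width, height) per ratio: {boxes}")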
    • the min size of image after resize
      • Required: No
      • Description: The one or more lengths for the shorter sides of images after the images are resized. This parameter is required if you set the model type parameter to FasterRCNN or RFCN. If you specify multiple lengths, the last one is used to evaluate the model and the others are used to train the model. This way, multi-scale training is implemented. If you specify only one length, it is used for both training and evaluation.
      • Default value: 600
    • the max size of image after resize
      • Required: No
      • Description: The one or more lengths for the longer sides of images after the images are resized. This parameter is required if you set the model type parameter to FasterRCNN or RFCN. If you specify multiple lengths, the last one is used to evaluate the model and the others are used to train the model. This way, multi-scale training is implemented. If you specify only one length, it is used for both training and evaluation.
      • Default value: 1024
    • optimizer
      • Required: Yes
      • Description: The optimizer that you want to use for model training. Valid values:
        • momentum: stochastic gradient descent (SGD) with momentum
        • adam
      • Default value: momentum
    • input image size of yolov5
      • Required: No
      • Description: The size to which input images are resized during preprocessing. Unit: pixels. The value must be an integer that specifies both the height and the width of the resized images.
      • Default value: 640
    • learning rate policy
      • Required: Yes
      • Description: The policy that is used to adjust the learning rate. Valid values:
        • exponential_decay: The learning rate is subject to exponential decay. For more information, see tf.compat.v1.train.exponential_decay and the sketch after this entry.
        • polynomial_decay: The learning rate is subject to polynomial decay. For more information, see tf.compat.v1.train.polynomial_decay. If you set this parameter to polynomial_decay, the num_steps parameter is automatically set to the total number of training iterations, and the final learning rate is automatically set to one-thousandth of the initial learning rate.
        • manual_step: The learning rate is set to the values that you specify for specific epochs. We recommend that you select this option if you have advanced business requirements. If you set this parameter to manual_step, you must set the decay_epochs parameter to specify the epochs for which you want to adjust the learning rate, and the manual step learning rates parameter to specify the learning rate for each of those epochs.
      • Default value: exponential_decay
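      The following minimal Python sketch reproduces the formula behind tf.compat.v1.train.exponential_decay, using the default values in this table (initial learning rate 0.01, decay_factor 0.95, staircase enabled). It is intended only to illustrate how the policy behaves.

        def exponential_decay(initial_lr, global_step, decay_steps, decay_rate, staircase=True):
            exponent = global_step / decay_steps
            if staircase:
                exponent = int(exponent)   # decay at discrete intervals
            return initial_lr * decay_rate ** exponent

        # After 2.5 decay periods with staircase enabled, the rate has decayed twice.
        print(exponential_decay(0.01, global_step=2500, decay_steps=1000, decay_rate=0.95))
        # 0.009025 (= 0.01 * 0.95 ** 2)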
    • initial learning rate
      • Required: Yes
      • Description: The initial learning rate.
      • Default value: 0.01
    • decay_epochs
      • Required: No
      • Description: The epoch interval at which you want to adjust the learning rate. This parameter is required if you set the learning rate policy parameter to exponential_decay or manual_step.
        • If you set the learning rate policy parameter to exponential_decay, the decay_epochs parameter is equivalent to the decay_steps parameter in tf.train.exponential_decay. For more information, see tf.compat.v1.train.exponential_decay. The system automatically converts the value of the decay_epochs parameter to the value of the decay_steps parameter based on the total number of training data entries, as illustrated in the sketch after this entry. In general, you can set the decay_epochs parameter to half of the total number of epochs. For example, you can set this parameter to 10 if the total number of epochs is 20.
        • If you set the learning rate policy parameter to manual_step, the decay_epochs parameter specifies the epochs for which you want to adjust the learning rate. For example, a value of 16 18 indicates that you want to adjust the learning rate for the 16th and 18th epochs. Typically, if the total number of epochs is N, you can set the decay_epochs parameter to 8/10 × N and 9/10 × N.
      • Default value: 10
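      The following sketch illustrates the conversion from decay_epochs to decay_steps. The dataset size and batch size below are illustrative values, and the exact conversion is performed by the system.

        import math

        num_train_examples = 20000   # illustrative total number of training data entries
        train_batch_size = 32        # value of the train batch size parameter
        decay_epochs = 10            # value of the decay_epochs parameter

        steps_per_epoch = math.ceil(num_train_examples / train_batch_size)
        decay_steps = decay_epochs * steps_per_epoch   # value used by exponential_decay
        print(steps_per_epoch, decay_steps)            # 625 6250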
    • decay_factor
      • Required: No
      • Description: The decay rate. This parameter is required if you set the learning rate policy parameter to exponential_decay. This parameter is equivalent to the decay_factor parameter in tf.train.exponential_decay. For more information, see tf.compat.v1.train.exponential_decay.
      • Default value: 0.95
    • staircase
      • Required: No
      • Description: The mode in which the learning rate decays. This parameter is required only if you set the learning rate policy parameter to exponential_decay. If you select this check box, the learning rate decays at discrete intervals. Otherwise, the learning rate decays continuously. This parameter is equivalent to the staircase parameter of tf.train.exponential_decay. For more information, see tf.compat.v1.train.exponential_decay.
      • Default value: Yes
    • manual step learning rates
      • Required: No
      • Description: The learning rates for the specified epochs. This parameter is required only if you set the learning rate policy parameter to manual_step. Specify one learning rate for each epoch that you specify in the decay_epochs parameter. For example, if you set the decay_epochs parameter to 20 40, you must specify two learning rates, such as 0.001 0.0001. This indicates that the learning rate is adjusted to 0.001 at the 20th epoch and to 0.0001 at the 40th epoch. We recommend that you adjust the learning rate to one-tenth, one-hundredth, one-thousandth, and so on, of the initial learning rate in sequence for the specified epochs. The sketch after this entry illustrates this schedule.
      • Default value: N/A
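      The following sketch shows the manual_step schedule that results from the example values above (decay_epochs set to 20 40 and manual step learning rates set to 0.001 0.0001). It illustrates the behavior only and is not the component implementation.

        def manual_step_lr(epoch, initial_lr=0.01, decay_epochs=(20, 40), rates=(0.001, 0.0001)):
            # Start at the initial learning rate and switch to the configured
            # rate at each epoch that is listed in decay_epochs.
            lr = initial_lr
            for boundary, rate in zip(decay_epochs, rates):
                if epoch >= boundary:
                    lr = rate
            return lr

        print(manual_step_lr(10), manual_step_lr(25), manual_step_lr(45))
        # 0.01 0.001 0.0001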
    • train batch size
      • Required: Yes
      • Description: The size of a training batch, which is the number of data entries that are used in a single training iteration.
      • Default value: N/A
    • eval batch size
      • Required: Yes
      • Description: The size of an evaluation batch, which is the number of data entries that are used in a single evaluation iteration.
      • Default value: 1
    • num epochs
      • Required: Yes
      • Description: The total number of training epochs. If you use the YOLOX or YOLOX_EDGE algorithm together with the warm-up strategy, the total number of epochs is the sum of the warm-up epochs and the epochs that are run after the learning rate becomes stable.
      • Default value: 20
    • number of evaluation images
      • Required: No
      • Description: The number of data entries that are evaluated during the training.
      • Default value: Yes
    • number of image for visualization
      • Required: No
      • Description: The number of data entries that can be visualized during the evaluation.
      • Default value: 5
    • save checkpoint epoch
      • Required: No
      • Description: The frequency at which checkpoints are saved. A value of 1 indicates that a checkpoint is saved each time all training data is iterated once.
      • Default value: 1

    Tuning tab:
    • io thread num for training
      • Required: No
      • Description: The number of threads that are used to read the training data.
      • Default value: 4
    • single worker or distributed on MaxCompute or DLC
      • Required: No
      • Description: The computing engine that is used to run the component. You can select a computing engine based on your business requirements. The following computing engines are supported:
        • MaxCompute: Use the MaxCompute instance that is associated with your AI workspace as the computing engine. For information about how to add a computing engine, see Create a workspace. For information about the billing rules, see Billing of Machine Learning Designer.
        • DLC: Use the DLC instance that is associated with your AI workspace as the computing engine. For information about how to add a computing engine, see Create a workspace. For information about the billing rules, see Billing of DLC.
        Valid values:
        • single_on_max_compute: If you select this value, you must set the use gpu parameter. A value of 100 for the use gpu parameter indicates one GPU. If you do not need GPUs, set the use gpu parameter to 0.
        • distribute_on_max_compute: If you select this value, you must set the following parameters:
          • number of worker: the number of concurrent workers.
          • cpu core number: the number of CPU cores for each worker. A value of 100 indicates one CPU core.
          • memory: the memory size of each worker. Unit: MB.
        • single_on_dlc: If you select this value, you must set the gpu machine type parameter, which specifies the GPU specifications.
        • distribute_on_dlc: If you select this value, you must set the following parameters:
          • number of worker: the number of concurrent workers.
          • cpu machine type: the CPU specifications to be used.
          • gpu machine type: the GPU specifications to be used.
      • Default value: distribute_on_dlc
  • Output port
    • output model
      • Data type: OSS path. The path is the value that you specify for the oss dir to save model parameter on the Fields Setting tab. The output model in the SavedModel format is stored in this path.
      • Downstream component: image prediction

Compute engine

The object detection component can run on MaxCompute or Deep Learning Containers (DLC) computing resources of PAI. To specify the compute engine of the component, go to the Tuning tab and set the single worker or distributed on MaxCompute or DLC parameter.

Examples

To build an SSD, Faster R-CNN, or R-FCN model, you can use the object detection component to create a pipeline as shown in the following figure (Example 1). In this example, the components are configured by performing the following steps:
  1. Label images by using iTAG provided by PAI. For more information, see Process labeling jobs.
  2. Configure the Read File Data-1 component to read the labeling result file xxx.manifest. To do so, set the OSS Data Path parameter of the Read File Data-1 component to the OSS path of the labeling result file, such as oss://pai-online-shanghai.oss-cn-shanghai.aliyuncs.com/ev_demo/xxx.manifest.
  3. Configure the data to tfrecord-1 component to divide the labeling result file into two TFRecord files that contain a training dataset and an evaluation dataset. For more information, see data to tfrecord.
  4. Specify the training dataset and evaluation dataset that are used as the input of the object detection-1 component and configure other parameters. For more information, see the "Component configuration in the PAI console" section in this topic.
  5. Configure the image prediction-1 component to perform batch inference. For more information, see image prediction.
To build a YOLOV5 model, you can use the object detection component to create a pipeline as shown in the following figure (Example 2). In this example, the components are configured by performing the following steps:
  1. Label images in the training dataset and evaluation dataset by using iTAG provided by PAI. For more information, see Process labeling jobs.
  2. Configure the Read File Data-1 component to read xxx.manifest, which is the labeling result file for the training dataset. To do so, you must set the OSS Data Path parameter of the Read File Data-1 component to the OSS path of the labeling result file, such as oss://pai-online-shanghai.oss-cn-shanghai.aliyuncs.com/ev_demo/xxx.manifest.
  3. Configure the Read File Data-2 component to read xxx.manifest, which is the labeling result file for the evaluation dataset. To do so, you must set the OSS Data Path parameter of the Read File Data-2 component to the OSS path of the labeling result file, such as oss://pai-online-shanghai.oss-cn-shanghai.aliyuncs.com/ev_demo/xxx.manifest.
  4. Specify the category file. To do so, you must set the OSS Data Path parameter of the Read File Data-3 component to the OSS path of the category file. Example: oss://pai-online-shanghai.oss-cn-shanghai.aliyuncs.com/ev_demo/image moderation-detection label_class.txt. You can build a category file based on the sample category file provided by PAI. For more information, see Sample category file for object detection.
  5. Specify the training dataset, evaluation dataset, and category file that are used as the input of the object detection-1 component and set other parameters. For more information, see the "Component configuration in the PAI console" section in this topic.
  6. Configure the image prediction-1 component to perform batch inference. For more information, see image prediction.