EasyVision of Platform for AI (PAI) allows you to train image detection models and use the trained models to make predictions. This topic describes how to use PAI commands to train an image detection model.
EasyVision simplifies the training configuration. You can use the -Dparam_config parameter to set common parameters. This way, you do not need to know the rules or logic of the configuration files of EasyVision. If you need advanced parameters to train an image detection model, you can use the -Dconfig parameter to pass the configuration file to EasyVision. The following table describes the models that are supported for image detection training.
Model | Backbone | Whether FPN is supported |
FasterRCNN | | Yes |
RFCN | | No |
SSD | | Yes |
SSD | vgg16_reduce_fc | No |
SSD | mobilenet_v1 | No |
Image detection training
Image detection training with a single GPU
pai -name easy_vision_ext
  -Dbuckets='oss://{bucket_name}.{oss_host}/{path}'
  -Darn='acs:ram::*********:role/aliyunodpspaidefaultrole'
  -DgpuRequired=100
  -Dcmd train
  -Dparam_config '
    --model_type SSD
    --backbone resnet_v1_50
    --num_classes 20
    --model_dir oss://YOUR_BUCKET_NAME/test/ssd_fpn_resnet50
    --train_data oss://pai-vision-data-sh/data/voc0712_tfrecord/voc0712_part_*.tfrecord
    --test_data oss://pai-vision-data-sh/data/voc0712_tfrecord/VOC2007_test.tfrecord
    --num_test_example 2
    --train_batch_size 32
    --test_batch_size 1
    --image_sizes 300
    --lr_type exponential_decay
    --initial_learning_rate 0.001
    --decay_epochs 20
    --staircase true
  '
Image detection training with multiple GPUs
pai -name easy_vision_ext
  -Dbuckets='oss://{bucket_name}.{oss_host}/{path}'
  -Darn='acs:ram::*********:role/aliyunodpspaidefaultrole'
  -Dcmd train
  -Dcluster='{
    \"ps\": { \"count\" : 1, \"cpu\" : 600 },
    \"worker\" : { \"count\" : 3, \"cpu\" : 800, \"gpu\" : 100 }
  }'
  -Dparam_config '
    --model_type SSD
    --backbone resnet_v1_50
    --num_classes 20
    --model_dir oss://YOUR_BUCKET_NAME/test/ssd_fpn_resnet50
    --train_data oss://pai-vision-data-sh/data/voc0712_tfrecord/voc0712_part_*.tfrecord
    --test_data oss://pai-vision-data-sh/data/voc0712_tfrecord/VOC2007_test.tfrecord
    --num_test_example 2
    --train_batch_size 32
    --test_batch_size 1
    --image_sizes 300
    --lr_type exponential_decay
    --initial_learning_rate 0.001
    --decay_epochs 20
    --staircase true
  '
Parameters
Parameter | Required | Description | Value format or example value | Default value |
buckets | Yes | The endpoint of the Object Storage Service (OSS) bucket. | oss://{bucket_name}.{oss_host}/{path} | N/A |
arn | Yes | The Alibaba Cloud Resource Name (ARN) of the RAM role that has the permissions to access OSS resources. For more information about how to obtain the ARN, see the "I/O parameters" section of the Parameters of PAI-TensorFlow tasks topic. | acs:ram::*:role/aliyunodpspaidefaultrole | N/A |
cluster | No | The configuration of parameters that are used for distributed training. | JSON string | "" |
gpuRequired | No | The GPU resources that each worker uses. The value 100 indicates one GPU per worker, which is the default. The value 200 indicates two GPUs per worker. | 100 | 100 |
cmd | Yes | The type of the EasyVision task. Set this parameter to train when you train a model. | train | N/A |
param_config | Yes | The configuration of parameters that are used for model training. The format of the param_config parameter is the same as that of the ArgumentParser() object in Python. For more information, see param_config. | STRING | N/A |
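The cluster parameter takes a JSON string. The following Python snippet is a minimal sketch of one way to generate that string instead of writing it by hand. The helper name build_cluster_arg and its argument names are hypothetical. The keys (ps, worker, count, cpu, gpu) and the sample values are copied from the multi-GPU command in this topic. Whether the double quotation marks must be escaped depends on the client from which you submit the command, so the sketch only prints the unescaped form.

```python
import json


def build_cluster_arg(ps_count, worker_count, ps_cpu=600, worker_cpu=800, worker_gpu=100):
    """Return a JSON string shaped like the -Dcluster value in the multi-GPU example.

    Hypothetical helper: the key names and default values mirror the sample
    command in this topic and are not an exhaustive list of supported keys.
    """
    cluster = {
        "ps": {"count": ps_count, "cpu": ps_cpu},
        "worker": {"count": worker_count, "cpu": worker_cpu, "gpu": worker_gpu},
    }
    return json.dumps(cluster)


cluster_json = build_cluster_arg(ps_count=1, worker_count=3)
# Escaping of the inner double quotes (\") is client-dependent and is not applied here.
print("-Dcluster='%s'" % cluster_json)
```

Generating the string with json.dumps avoids unbalanced braces or missing commas, which are easy to introduce when the value is edited inline in a long command.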
param_config
The param_config parameter contains several parameters that are used for model training. Its value is written as a sequence of command-line style arguments (--name value), in the same format that the ArgumentParser() object in Python accepts. The following example shows the configuration of the param_config parameter:
-Dparam_config = '
--backbone resnet_v1_50
--num_classes 200
--model_dir oss://your/bucket/exp_dir
'
Do not enclose the values of string parameters in the param_config parameter in double quotation marks (") or single quotation marks (').
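To make the format concrete, the following Python sketch parses a param_config-style string with argparse. It is an illustration only: how EasyVision tokenizes the string internally is not described in this topic, so the whitespace split and the function name parse_param_config are assumptions, and only four of the parameters from the table are declared.

```python
import argparse


def parse_param_config(text):
    """Parse a param_config-style string (hypothetical helper, not EasyVision code)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--backbone", type=str)
    parser.add_argument("--num_classes", type=int)
    parser.add_argument("--model_dir", type=str)
    # List-valued parameters are written as space-separated values.
    parser.add_argument("--anchor_scales", type=float, nargs="+")
    # Assumption: arguments are separated by whitespace, including line breaks.
    return parser.parse_args(text.split())


args = parse_param_config("""
    --backbone resnet_v1_50
    --num_classes 200
    --model_dir oss://your/bucket/exp_dir
    --anchor_scales 128 256 512
""")
print(args.backbone, args.num_classes, args.model_dir, args.anchor_scales)
```

With a plain whitespace split, any quotation marks would become part of the value itself, which is consistent with the rule that string values are written without quotation marks.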
Parameter | Required | Description | Value format or example value | Default value |
model_type | Yes | The type of the model to train. Valid values:
| STRING | N/A |
backbone | Yes | The name of the backbone network that is used by the model. Valid values:
| STRING | N/A |
weight_decay | No | The value of L2 regularization. | FLOAT | 1e-4 |
use_fpn | No | Specifies whether to use Feature Pyramid Network (FPN). | BOOL | false |
num_classes | No | The number of categories, excluding background categories. | 21 | N/A |
anchor_scales | No | The size of the anchor box. The size is measured on the input image after the image is resized. If you use an SSD model, you do not need to specify this parameter. If you use FPN, set this parameter to the size of the anchor box in the layer that has the highest resolution. The total number of layers is five. The size of the anchor box in a layer is twice that in the previous layer. For example, if the size of the anchor box in the first layer is 32, the sizes of the anchor boxes in the next four layers are 64, 128, 256, and 512. If you use the FasterRCNN model or the RFCN model without FPN support, you can specify multiple anchor sizes as needed. For example, you can set the anchor_scales parameter to 128 256 512. | FLOAT list. Example value: 32 (single scale) or 128 256 512 (multiple scales). | |
anchor_ratios | No | The ratios of the width to the height of the anchor boxes. | FLOAT list | 0.5 1 2 |
image_sizes | No | The size of the images after they are resized. This parameter takes effect only when an SSD model is used. The value of this parameter is a list that contains two numbers, which indicate the height and width. | FLOAT list | 300 300 |
image_min_sizes | No | The length of the shorter side of images after they are resized. This parameter is used for FasterRCNN and RFCN. If you specify multiple lengths, the last one is used to evaluate the model, and one of the others is randomly selected for each training image. This enables multi-scale training. If you specify only one length, this length is used for both training and evaluation. | FLOAT list | 600 |
image_max_sizes | No | The length of the longer side of images after they are resized. This parameter is used for FasterRCNN and RFCN. If you specify multiple lengths, the last one is used to evaluate the model, and one of the others is randomly selected for each training image. This enables multi-scale training. If you specify only one length, this length is used for both training and evaluation. | FLOAT list | 1024 |
optimizer | No | The type of the optimizer. Valid values:
| STRING | momentum |
lr_type | No | The policy that is used to adjust the learning rate. Valid values:
| STRING | exponential_decay |
initial_learning_rate | No | The initial learning rate. | FLOAT | 0.01 |
decay_epochs | No | If you set the lr_type parameter to exponential_decay, the decay_epochs parameter is equivalent to the decay_steps parameter of tf.train.exponential_decay. In this case, the decay_epochs parameter specifies the epoch interval at which the learning rate is adjusted. The system automatically converts the value of the decay_epochs parameter to the value of the decay_steps parameter based on the total number of training data entries. Typically, you can set the decay_epochs parameter to half of the total number of epochs. For example, you can set this parameter to 10 if the total number of epochs is 20. If you set the lr_type parameter to manual_step, the decay_epochs parameter specifies the epochs at which the learning rate is adjusted. For example, the value 16 18 indicates that the learning rate is adjusted at the 16th and 18th epochs. Typically, if the total number of epochs is N, you can set the two values of the decay_epochs parameter to 8/10 × N and 9/10 × N. | INTEGER list. Example value: 20 or 20 40 60. | 20 |
decay_factor | No | The decay rate. This parameter is equivalent to the decay_rate parameter of tf.train.exponential_decay. | FLOAT | 0.95 |
staircase | No | Specifies whether the learning rate decays in discrete steps, once every decay_epochs epochs, instead of continuously. This parameter is equivalent to the staircase parameter of tf.train.exponential_decay. | BOOL | true |
power | No | The power of the polynomial. This parameter is equivalent to the power parameter of tf.train.polynomial_decay. | FLOAT | 0.9 |
learning_rates | No | The learning rates that you want to set at the specified epochs. This parameter is required when you set the lr_type parameter to manual_step. The number of learning rates must match the number of epochs in the decay_epochs parameter. For example, if the decay_epochs parameter is set to 20 40, you must specify two learning rates in the learning_rates parameter, such as 0.001 0.0001. This indicates that the learning rate is adjusted to 0.001 at the 20th epoch and to 0.0001 at the 40th epoch. We recommend that you adjust the learning rates to one tenth, one hundredth, and one thousandth of the initial learning rate in sequence. | FLOAT list | N/A |
lr_warmup | No | Specifies whether to warm up the learning rate. | BOOL | false |
lr_warm_up_epochs | No | The number of epochs for which you want to warm up the learning rate. | FLOAT | 1 |
train_data | Yes | The OSS path of the data that is used to train the model. | oss://path/to/train_*.tfrecord | N/A |
test_data | Yes | The OSS path of the data that is used for evaluation during the training. | oss://path/to/test_*.tfrecord | N/A |
train_batch_size | Yes | The number of data entries in each training batch. | INT. Example value: 32. | N/A |
test_batch_size | Yes | The number of data entries in each evaluation batch. | INT. Example value: 32. | N/A |
train_num_readers | No | The number of concurrent threads that are used to read the training data. | INT | 4 |
model_dir | Yes | The OSS path in which the model is saved. | oss://path/to/model | N/A |
pretrained_model | No | The OSS path of the pretrained model. If this parameter is specified, the model is fine-tuned based on the pretrained model. | oss://pai-vision-data-sh/pretrained_models/inception_v4.ckpt | "" |
use_pretrained_model | No | Specifies whether to use a pretrained model. | BOOL | true |
num_epochs | Yes | The number of training epochs. The value 1 indicates that all training data is iterated once. | INT. Example value: 40. | N/A |
num_test_example | No | The number of data entries that are evaluated during the training. The value -1 indicates that all data specified by the test_data parameter is evaluated. | INT. Example value: 2000. | -1 |
num_visualizations | No | The number of data entries that can be visualized during the evaluation. | INT | 10 |
save_checkpoint_epochs | No | The epoch interval at which a checkpoint is saved. The value 1 indicates that a checkpoint is saved each time an epoch is complete. | INT | 1 |
save_summary_epochs | No | The epoch interval at which a summary is saved. The value 0.01 indicates that a summary is saved each time 1% of the training data is iterated. | FLOAT | 0.01 |
num_train_images | No | The total number of data entries that are used for the training. This parameter is required if you use custom TFRecord files to train the model. | INT | 0 |
label_map_path | No | The category mapping file. This parameter is required if you use custom TFRecord files to train the model. | STRING | "" |
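Several rules in the preceding table are arithmetic and can be checked before you submit a job. The following Python sketch illustrates three of them: the anchor sizes that FPN derives from a single anchor_scales value, the exponential decay formula documented for tf.train.exponential_decay, and the manual_step schedule described for the decay_epochs and learning_rates parameters. All function names are hypothetical. The conversion from epochs to steps assumes that one step consumes one batch on a single worker and that a schedule entry takes effect from the listed epoch onward. This topic does not state the exact rule that EasyVision applies, so treat the results as estimates.

```python
import math


def fpn_anchor_scales(base_scale, num_levels=5):
    """Anchor size per FPN layer: each layer doubles the size of the previous one."""
    return [base_scale * 2 ** level for level in range(num_levels)]


def epochs_to_steps(decay_epochs, num_train_images, train_batch_size):
    """Estimate decay_steps from decay_epochs.

    Assumption: one step processes one batch, so an epoch takes
    ceil(num_train_images / train_batch_size) steps.
    """
    steps_per_epoch = math.ceil(num_train_images / train_batch_size)
    return decay_epochs * steps_per_epoch


def exponential_decay(initial_lr, step, decay_steps, decay_factor=0.95, staircase=True):
    """initial_lr * decay_factor ** (step / decay_steps), floored if staircase is set."""
    exponent = step / decay_steps
    if staircase:
        exponent = math.floor(exponent)
    return initial_lr * decay_factor ** exponent


def manual_step_lr(epoch, initial_lr, decay_epochs, learning_rates):
    """Learning rate at a given epoch for lr_type manual_step.

    Assumption: learning_rates[i] applies from epoch decay_epochs[i] onward.
    """
    if len(decay_epochs) != len(learning_rates):
        raise ValueError("decay_epochs and learning_rates must have the same length")
    lr = initial_lr
    for boundary, new_lr in zip(decay_epochs, learning_rates):
        if epoch >= boundary:
            lr = new_lr
    return lr


print(fpn_anchor_scales(32))
print(epochs_to_steps(20, num_train_images=1000, train_batch_size=32))
print(manual_step_lr(25, 0.01, [20, 40], [0.001, 0.0001]))
```

The length check in manual_step_lr mirrors the requirement that the learning_rates parameter contains one value for each epoch listed in the decay_epochs parameter.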