This topic describes how to use the data conversion features of Machine Learning Platform for AI (PAI) to convert images to TFRecord files, which you can then use to train models with the training components provided by PAI. If you label data by using the smart labeling feature of PAI, PAI generates a labeled dataset, and you can call the data conversion component to convert that dataset to a TFRecord file. If you label data on another platform, you must first run PAI commands to convert the labeled data to a labeled dataset that PAI supports, and then convert that dataset to a TFRecord file.

Convert a labeled dataset for single-label or multi-label image classification

Run the following PAI command to convert a labeled dataset for single-label or multi-label image classification to a TFRecord file.
pai -name easy_vision_ext
      -Dbuckets='oss://{bucket_name}.{oss_host}/{path}/'
      -Darn='acs:ram::*******:role/aliyunodpspaidefaultrole'
      -DossHost='{oss_host}'
      -Dcmd  convert
      -Dlabel_file 'oss://{bucket_name}/path/to/your/{label_file}'
      -Dconvert_param_config '
        --class_list_file oss://{bucket_name}/path/to/your/{class_list_file}
        --max_image_size 600
        --write_parallel_num 8
        --num_samples_per_tfrecord 128
        --test_ratio 0.1
        --model_type CLASSIFICATION
      '
      -Doutput_tfrecord 'oss://{bucket_name}/path/to/output/data_prefix'
      -Dcluster='{
              \"worker\" : {
                \"count\" : 1,
                \"cpu\" : 800
              }
            }'
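
The class_list_file referenced in the command above is described as a list of category names, where an entry may alternatively be written as Category name:Name of the mapping category. The following is a minimal sketch of how such a file could be prepared and read back locally before it is uploaded to OSS. The helper names write_class_list and parse_class_list are hypothetical, and the layout (one entry per line, UTF-8 text, a colon between a name and its mapped name) is an assumption rather than documented behavior.

```python
import tempfile
from pathlib import Path


def write_class_list(path, classes):
    """Write one entry per line; a (name, mapped) tuple becomes 'name:mapped'."""
    lines = []
    for item in classes:
        if isinstance(item, tuple):
            name, mapped = item
            lines.append(f"{name}:{mapped}")
        else:
            lines.append(item)
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")


def parse_class_list(path):
    """Return {label name: training category}; plain names map to themselves."""
    mapping = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        # Assumed separator: the first colon splits name and mapped name.
        name, sep, mapped = line.partition(":")
        mapping[name.strip()] = mapped.strip() if sep else name.strip()
    return mapping


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "class_list.txt"
        write_class_list(path, ["cat", "dog", ("puppy", "dog")])
        print(parse_class_list(path))
```

Reading the file back this way makes it easy to spot duplicate or misspelled category names before the conversion job runs.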

Convert a labeled dataset for text detection or recognition

Run the following PAI command to convert a labeled dataset for text detection or recognition to a TFRecord file.
pai -name easy_vision_ext
      -Dbuckets='oss://{bucket_name}.{oss_host}/{path}/'
      -Darn='acs:ram::*******:role/aliyunodpspaidefaultrole'
      -DossHost='{oss_host}'
      -Dcmd  convert
      -Dlabel_file 'oss://{bucket_name}/path/to/your/{label_file}'
      -Dconvert_param_config '
        --model_type TEXT_END2END
        --default_class text
        --max_image_size 2000
        --char_replace_map_path oss://{bucket_name}/path/to/your_char_replace_map
        --default_char_dict_path oss://{bucket_name}/path/to/your_char_dict
        --test_ratio 0.1
        --write_parallel_num 8
        --num_samples_per_tfrecord 64
      '
      -Doutput_tfrecord 'oss://pai-vision-data-hz/test/convert/recipt_text_end2end/data'
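
The command above points at two auxiliary files: a character replacement map (a CSV file with original and replaced columns) and a character dictionary (one character per row, with an ID equal to the row number minus 1). The following sketch shows one way to read both locally, for example to sanity-check them before uploading. All function names are hypothetical. The sketch assumes that the CSV file has a header row, that both files are UTF-8 text, and that replacements are applied as plain substring substitutions; none of this is confirmed by the parameter descriptions.

```python
import csv
import io


def load_char_dict(text):
    """Map each character to its ID: 1-based row number minus 1."""
    return {line: idx for idx, line in enumerate(text.splitlines()) if line}


def load_replace_map(text):
    """Read the 'original' and 'replaced' columns from CSV text (header assumed)."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["original"]: row["replaced"] for row in reader}


def normalize(label, replace_map):
    """Apply every replacement as a plain substring substitution (assumption)."""
    for original, replaced in replace_map.items():
        label = label.replace(original, replaced)
    return label


def encode(label, char_dict):
    """Convert a label to a list of character IDs; unknown characters raise KeyError."""
    return [char_dict[ch] for ch in label]


if __name__ == "__main__":
    char_dict = load_char_dict("a\nb\nc\n")
    replace_map = load_replace_map("original,replaced\nA,a\nB,b\n")
    cleaned = normalize("cAB", replace_map)
    print(cleaned, encode(cleaned, char_dict))
```

A KeyError from encode indicates a character that is neither in the dictionary nor covered by the replacement map, which is a quick way to find labels that need attention.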

PAI command parameters

| Parameter | Required | Description | Format | Default value |
| --- | --- | --- | --- | --- |
| cmd | Yes | The operation that you want to perform. Set the value to convert. | STRING | convert |
| buckets | No | The name of the Object Storage Service (OSS) bucket. Add a forward slash (/) at the end of the bucket name. If you specify multiple buckets, separate the bucket names with commas (,). | "oss://bucket_name/?role_arn=xxx&host=yyy" or "oss://bucket_1/?role_arn=xxx&host=yyy,oss://bucket_2/" | N/A |
| label_file | Yes | The OSS path of the labeled dataset. For more information, see Overview. | oss://your_bucket/xxx.csv | N/A |
| convert_param_config | No | The information about the conversion task. For more information, see the table of convert_param_config parameters below. You can also replace convert_param_config with convert_config. | --parama valuea --paramb valueb | "" |
| output_tfrecord | No | The OSS path of the TFRecord file. | oss://your_dir/prefix | "" |
| cluster | No | The information about the workers that are used to perform conversion in a distributed manner. | JSON string | "{\"worker\":{\"count\":3, \"cpu\": 800, \"gpu\":0, \"memory\": 20000}}" |
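
Because the cluster value is a JSON string whose double quotes are backslash-escaped in the examples in this topic, generating it can be less error-prone than typing it by hand. The following is a minimal sketch. The helper name cluster_arg is hypothetical, and the sketch assumes that only the worker fields shown in the default value (count, cpu, gpu, and memory) are needed.

```python
import json


def cluster_arg(count=3, cpu=800, gpu=0, memory=20000, escape=True):
    """Build the value for -Dcluster from the worker settings."""
    config = {"worker": {"count": count, "cpu": cpu, "gpu": gpu, "memory": memory}}
    text = json.dumps(config)
    # The examples in this topic write \" instead of " inside the value,
    # so the quotes are escaped the same way unless escape=False.
    return text.replace('"', '\\"') if escape else text


if __name__ == "__main__":
    print("-Dcluster='" + cluster_arg(count=1) + "'")
```

Passing escape=False returns plain JSON, which is convenient for validating the settings with json.loads before the command is assembled.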
The following table describes the parameters under convert_param_config.
| Parameter | Required | Description | Format | Default value |
| --- | --- | --- | --- | --- |
| model_type | Yes | The type of models to which the converted data applies. Valid values: CLASSIFICATION (single-label or multi-label image classification); DETECTION (object detection); SEGMENTATION (image semantic segmentation); INSTANCE_SEGMENTATION (instance segmentation); TEXT_END2END (end-to-end optical character recognition (OCR)); TEXT_RECOGNITION (single-line text recognition); TEXT_DETECTION (text detection); VIDEO_CLASSIFICATION (video classification); SELF_DEFINED (custom conversion). Note: If model_type is set to TEXT_END2END or TEXT_RECOGNITION, the char_replace_map_path and default_char_dict_path parameters take effect. If model_type is set to VIDEO_CLASSIFICATION, the decode_type, sample_fps, reshape_size, decode_batch_size, and decode_keep_size parameters take effect. | STRING | N/A |
| class_list_file | No | The OSS path of the category file. The file contains a list of category names. A category name may alternatively be presented in the following format: Category name:Name of the mapping category. | oss://path/to/your/classlist | "" |
| test_ratio | No | The proportion of the dataset that is set aside as test data. Valid values: 0 to 1. If the value is set to 0, the entire dataset is used for training. If the value is set to 0.1, 10% of the data is used for verification. | 0.1 | 0.1 |
| max_image_size | No | The maximum length, in pixels, of the longest edge of an image. If this parameter is set and an image exceeds the limit, the image is resized before it is saved to the TFRecord file. This reduces storage space and accelerates data reading. | INT | N/A |
| max_test_image_size | No | The maximum length, in pixels, of the longest edge of an image that is used for testing. This parameter works like max_image_size but applies to test data. By default, its value is the same as that of max_image_size. | INT | ${max_image_size} |
| default_class | No | The name of the default category. A category name that is not included in the class list is mapped to the default category. | STRING | None |
| error_class | No | The name of the invalid category. Objects or bounding boxes that belong to this category are not used for training. | STRING | N/A |
| ignore_class | No | The name of the ignored category. This parameter is used only for detection models. Bounding boxes that belong to this category are not used for training. | STRING | N/A |
| converter_class | No | The name of the conversion class. | STRING | QinceConverter |
| seperator | No | The separator that is passed to the split() method to break the content of the label file into substrings. | STRING | N/A |
| image_format | No | The encoding format of the images in the TFRecord file. | STRING | jpg |
| read_parallel_num | No | The number of concurrent reads. | INT | 10 |
| write_parallel_num | No | The number of concurrent writes to the TFRecord file. | INT | 1 |
| num_samples_per_tfrecord | No | The number of images saved in each TFRecord file. | INT | 256 |
| user_defined_converter_path | No | The HTTP or OSS path of the user-defined converter code. Example: http://path/to/your/converter.py. | STRING | N/A |
| user_defined_generator_path | No | The HTTP or OSS path of the user-defined generator code. Example: http://path/to/your/generator.py. | STRING | N/A |
| generator_class | No | The name of the user-defined generator class. | STRING | N/A |
| char_replace_map_path | No | The OSS path of the CSV file for character map replacement. The file contains two columns: original, which lists the original strings, and replaced, which lists the strings that replace the original strings. | STRING | N/A |
| default_char_dict_path | No | The OSS path of the file for mapping characters to IDs. The file contains rows of characters. One row represents one character. The ID of a character equals its row number minus 1. | STRING | N/A |
| decode_type | No | The video decoding format. Valid values: 1 (intra only); 2 (keyframe only); 3 (without bidir); 4 (decode all). | INT | 4 |
| sample_fps | No | The number of frames that are sampled per second. | FLOAT | 5 |
| reshape_size | No | The size of the output frame, in pixels. | INT | 224 |
| decode_batch_size | No | The number of images contained in each batch for decoding. | INT | 10 |
| decode_keep_size | No | The number of overlapped frames in different batches. | INT | 0 |
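
To get a feel for what max_image_size, test_ratio, and num_samples_per_tfrecord imply for a given dataset, the arithmetic can be sketched as follows. This is an illustration only, and the function names are hypothetical. It assumes that resizing preserves the aspect ratio, that the test share is rounded to the nearest whole sample, and that training and test samples are written to separate files. The conversion component may round or group samples differently, and write_parallel_num may also affect the number of files.

```python
import math


def resized_shape(width, height, max_image_size):
    """Scale so the longest edge is at most max_image_size (aspect ratio kept)."""
    longest = max(width, height)
    if longest <= max_image_size:
        return width, height  # small images are left unchanged
    scale = max_image_size / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


def split_counts(num_samples, test_ratio):
    """Return (train, test) counts; test_ratio=0 keeps everything for training."""
    if not 0 <= test_ratio <= 1:
        raise ValueError("test_ratio must be between 0 and 1")
    num_test = round(num_samples * test_ratio)
    return num_samples - num_test, num_test


def file_count(num_samples, num_samples_per_tfrecord):
    """Number of TFRecord files for num_samples (assumed: ceiling division)."""
    return math.ceil(num_samples / num_samples_per_tfrecord)


if __name__ == "__main__":
    print(resized_shape(1200, 800, 600))
    train, test = split_counts(1000, 0.1)
    print(train, test, file_count(train, 128), file_count(test, 128))
```

Under these assumptions, 1,000 images with test_ratio set to 0.1 and num_samples_per_tfrecord set to 128 would yield 900 training samples in 8 files and 100 test samples in 1 file.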