This topic describes how to use the data conversion features of Platform for AI (PAI) to convert images into TFRecord files, which you can then use to train models with the training components that PAI provides. If you use iTAG of PAI to label data, the system generates a labeled dataset that you can pass to the data conversion component to produce a TFRecord file. If you use another platform to label data, you must first run PAI commands to convert the labeled data into a labeled dataset format that PAI supports, and then convert that labeled dataset into a TFRecord file.
You can run PAI commands by using the SQL Script component, the MaxCompute client, or an ODPS SQL node of DataWorks. For more information, see MaxCompute client (odpscmd) or Develop a MaxCompute SQL task.
Convert a labeled dataset for single-label or multi-label image classification
Run the following PAI command to convert a labeled dataset into a TFRecord file. The labeled dataset is applicable to single-label or multi-label image classification.
pai -name easy_vision_ext
-Dbuckets='oss://{bucket_name}.{oss_host}/{path}/'
-Darn='acs:ram::*******:role/aliyunodpspaidefaultrole'
-DossHost='{oss_host}'
-Dcmd convert
-Dlabel_file 'oss://{bucket_name}/path/to/your/{label_file}'
-Dconvert_param_config ' --class_list_file oss://{bucket_name}/path/to/your/{class_list_file} --max_image_size 600 --write_parallel_num 8 --num_samples_per_tfrecord 128 --test_ratio 0.1 --model_type CLASSIFICATION'
-Doutput_tfrecord 'oss://{bucket_name}/path/to/output/data_prefix'
-Dcluster='{\"worker\" : {\"count\" : 1,\"cpu\" : 800}}'
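The value of convert_param_config is a single string of space-separated "--option value" pairs, which is easy to mistype by hand. The following Python sketch shows one way to assemble that string from a dictionary. The helper name build_convert_param_config is hypothetical and not part of PAI, and the whitespace check is an assumption based on the space-separated format. The option names and values are taken from the command above.

```python
# Hypothetical helper for illustration only; it is not part of PAI.
def build_convert_param_config(options):
    """Join option/value pairs into a '--option value' string."""
    if "model_type" not in options:
        # model_type is the only option marked as required in this topic.
        raise ValueError("model_type is required")
    parts = []
    for name, value in options.items():
        text = str(value)
        # Assumption: values must not contain whitespace, because the
        # string is split on spaces.
        if any(ch.isspace() for ch in text):
            raise ValueError(f"value of {name} contains whitespace")
        parts.append(f"--{name} {text}")
    return " ".join(parts)


# Options copied from the classification command above.
classification_options = {
    "class_list_file": "oss://{bucket_name}/path/to/your/{class_list_file}",
    "max_image_size": 600,
    "write_parallel_num": 8,
    "num_samples_per_tfrecord": 128,
    "test_ratio": 0.1,
    "model_type": "CLASSIFICATION",
}

config = build_convert_param_config(classification_options)
print(config)
```

You can paste the printed string between the single quotation marks of the convert_param_config parameter.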
Convert a labeled dataset for text detection or recognition
Run the following PAI command to convert a labeled dataset into a TFRecord file. The labeled dataset is applicable to text detection or recognition.
pai -name easy_vision_ext
-Dbuckets='oss://{bucket_name}.{oss_host}/{path}/'
-Darn='acs:ram::*******:role/aliyunodpspaidefaultrole'
-DossHost='{oss_host}'
-Dcmd convert
-Dlabel_file 'oss://{bucket_name}/path/to/your/{label_file}'
-Dconvert_param_config '--model_type TEXT_END2END --default_class text --max_image_size 2000 --char_replace_map_path oss://{bucket_name}/path/to/your_char_replace_map --default_char_dict_path oss://{bucket_name}/path/to/your_char_dict --test_ratio 0.1 --write_parallel_num 8 --num_samples_per_tfrecord 64'
-Doutput_tfrecord 'oss://{bucket_name}/test/convert/recipt_text_end2end/data'
PAI command parameters
Parameter | Required | Description | Format | Default value |
buckets | No | The path of the Object Storage Service (OSS) bucket. The path must end with a forward slash (/). If you specify multiple buckets, separate the paths with commas (,). | Single bucket: "oss://bucket_name/?role_arn=xxx&host=yyy". Multiple buckets: "oss://bucket_1/?role_arn=xxx&host=yyy,oss://bucket_2/" | Empty |
cmd | Yes | The operation that you want to perform. Set the value to convert. | STRING | convert |
label_file | Yes | The OSS path of the labeled dataset. For more information, see Overview of labeled dataset format. | oss://your_bucket/xxx.csv | N/A |
convert_param_config | No | The information about the conversion task. For more information, see the following table. You can also replace convert_param_config with convert_config. | --parama valuea --paramb valueb | "" |
output_tfrecord | No | The OSS path of the TFRecord file. | oss://your_dir/prefix | "" |
cluster | No | The information about the worker nodes that are used to perform conversion in a distributed manner. | JSON string | "{\"worker\":{\"count\":3, \"cpu\": 800, \"gpu\":0, \"memory\": 20000}}" |
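The cluster parameter takes a JSON string whose double quotation marks are escaped with backslashes in the examples in this topic. The following Python sketch generates such a value instead of typing the escapes by hand. The function names build_cluster_value and escape_for_command are hypothetical. The field names (count, cpu, gpu, memory) and the sample values come from the default value in the table above.

```python
import json


# Hypothetical helpers for illustration only; they are not part of PAI.
def build_cluster_value(count, cpu, gpu=0, memory=None):
    """Serialize the worker settings as a JSON string."""
    worker = {"count": count, "cpu": cpu, "gpu": gpu}
    if memory is not None:
        worker["memory"] = memory
    return json.dumps({"worker": worker})


def escape_for_command(json_text):
    """Prefix each double quotation mark with a backslash."""
    return json_text.replace('"', '\\"')


# Values copied from the default value of the cluster parameter.
cluster = build_cluster_value(count=3, cpu=800, gpu=0, memory=20000)
escaped = escape_for_command(cluster)
print(escaped)
```

Building the value with json.dumps guarantees that the string is valid JSON before the escapes are added, which makes a malformed cluster value less likely.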
The following table describes the options in the convert_param_config field.
Option | Required | Description | Format | Default value |
model_type | Yes | The type of model to which the converted data is applicable. The values used in this topic include CLASSIFICATION, TEXT_END2END, TEXT_RECOGNITION, and VIDEO_CLASSIFICATION. Note: If model_type is set to TEXT_END2END or TEXT_RECOGNITION, the char_replace_map_path and default_char_dict_path options take effect. If model_type is set to VIDEO_CLASSIFICATION, the decode_type, sample_fps, reshape_size, decode_batch_size, and decode_keep_size options take effect. | STRING | N/A |
class_list_file | No | The OSS path of the category file. The file contains a list of category names. The category names may be presented in the following format: | oss://path/to/your/classlit | "" |
test_ratio | No | The proportion of the dataset that is set aside as test data. If the value is set to 0, all data is used for training. If the value is set to 0.1, 10% of the data is used for validation. | FLOAT | 0.1 |
max_image_size | No | The maximum length, in pixels, of the longer side of an image. If you configure this option and an image exceeds the limit, the image is resized before it is saved to the TFRecord file. This reduces storage space and accelerates data reading. | INT | N/A |
max_test_image_size | No | The maximum length, in pixels, of the longer side of an image that is used for testing. This option applies to test data. By default, the value is the same as the value of max_image_size. | INT | ${max_image_size} |
default_class | No | The name of the default category. A category whose name is not included in the file specified by class_list_file is treated as the default category. | STRING | None |
error_class | No | The name of the invalid category. Objects or bounding boxes that belong to this category are not used for training. | STRING | N/A |
ignore_class | No | The name of the category that is used only for model detection. Bounding boxes that belong to this category are not used for training. | STRING | N/A |
converter_class | No | The name of the conversion class. Valid values: | STRING | pai labeling format(old version) |
separator | No | The separator that is used in the split() method to divide the label file into substrings. | STRING | N/A |
image_format | No | The encoding format of the images in the TFRecord file. The following image encoding formats are commonly used: | STRING | jpg |
read_parallel_num | No | The number of concurrent reads. | INT | 10 |
write_parallel_num | No | The number of concurrent writes to the TFRecord file. | INT | 1 |
num_samples_per_tfrecord | No | The number of images saved in each TFRecord file. | INT | 256 |
user_defined_converter_path | No | The HTTP or OSS path of the custom converter code. Example: http://path/to/your/converter.py. | STRING | N/A |
user_defined_generator_path | No | The HTTP or OSS path of the custom generator code. Example: http://path/to/your/generator.py. | STRING | N/A |
generator_class | No | The name of the custom generator class. | STRING | N/A |
char_replace_map_path | No | The OSS path of the CSV file for character map replacement. The file contains two columns named original and replaced. | STRING | N/A |
default_char_dict_path | No | The OSS path of the file that is used to map characters to IDs. In the file, each character occupies a line. The ID of a character is equal to the line number minus 1. | STRING | N/A |
decode_type | No | The format used for decoding video. Valid values: | INT | 4 |
sample_fps | No | The number of frames that are extracted for sampling per second. | FLOAT | 5 |
reshape_size | No | The size of the output frames. Unit: pixels. | INT | 224 |
decode_batch_size | No | The number of frames that are decoded at the same time. | INT | 10 |
decode_keep_size | No | The number of overlapped frames in different batches. | INT | 0 |
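The file formats behind default_char_dict_path and char_replace_map_path in the table above can be illustrated with a short Python sketch. It reads both formats from in-memory strings rather than from OSS. The function names are hypothetical. The sketch assumes that the CSV file starts with a header row that contains the column names original and replaced, and that replacement is applied one character at a time. Neither assumption is confirmed by this topic.

```python
import csv
import io


# Hypothetical helpers for illustration only; they are not part of PAI.
def load_char_dict(text):
    """Map each character to its ID: the 1-based line number minus 1."""
    char_to_id = {}
    for line_number, line in enumerate(text.splitlines(), start=1):
        char_to_id[line] = line_number - 1
    return char_to_id


def load_char_replace_map(text):
    """Read a CSV with the columns 'original' and 'replaced'.

    Assumption: the first row is a header row with those column names.
    """
    reader = csv.DictReader(io.StringIO(text))
    return {row["original"]: row["replaced"] for row in reader}


def apply_replace_map(label, replace_map):
    """Assumption: replacement is applied character by character."""
    return "".join(replace_map.get(ch, ch) for ch in label)


# Sample file contents made up for this example.
char_dict = load_char_dict("a\nb\nc\n")
replace_map = load_char_replace_map("original,replaced\nA,a\nB,b\n")

print(char_dict)
print(apply_replace_map("AbC", replace_map))
```

In this sample, the character on the first line of the dictionary file receives the ID 0, which matches the rule that the ID equals the line number minus 1.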