Platform for AI: Convert images into TFRecord files

Last Updated: Apr 11, 2024

This topic describes how to use the data conversion feature of Platform for AI (PAI) to convert images into TFRecord files. You can then use the TFRecord files to train models with the training components provided by PAI. If you use iTAG of PAI to label data, the system generates a labeled dataset, and you can call the data conversion component to convert that dataset into TFRecord files. If you use another platform to label data, you must first run PAI commands to convert the labeled data into a labeled dataset format that PAI supports, and then convert the labeled dataset into TFRecord files.

Note

You can run PAI commands by using the SQL Script component, the MaxCompute client, or an ODPS SQL node of DataWorks. For more information, see MaxCompute client (odpscmd) or Develop a MaxCompute SQL task.

Convert a labeled dataset for single-label or multi-label image classification

Run the following PAI command to convert a labeled dataset into a TFRecord file. The labeled dataset is applicable to single-label or multi-label image classification.

pai -name easy_vision_ext
      -Dbuckets='oss://{bucket_name}.{oss_host}/{path}/'
      -Darn='acs:ram::*******:role/aliyunodpspaidefaultrole'
      -DossHost='{oss_host}'
      -Dcmd=convert
      -Dlabel_file='oss://{bucket_name}/path/to/your/{label_file}'
      -Dconvert_param_config='--class_list_file oss://{bucket_name}/path/to/your/{class_list_file} --max_image_size 600 --write_parallel_num 8 --num_samples_per_tfrecord 128 --test_ratio 0.1 --model_type CLASSIFICATION'
      -Doutput_tfrecord='oss://{bucket_name}/path/to/output/data_prefix'
      -Dcluster='{\"worker\" : {\"count\" : 1,\"cpu\" : 800}}'
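
The value of convert_param_config is a single string of "--option value" pairs, which is easy to mistype by hand. The following Python sketch shows one way to assemble that string from a dictionary. The helper name build_convert_param_config is hypothetical and is not part of PAI; the option names and values are the ones used in the preceding command.

```python
def build_convert_param_config(options):
    """Join option names and values into a '--name value --name value' string.

    Hypothetical helper for illustration only; PAI does not provide it.
    """
    parts = []
    for name, value in options.items():
        parts.append(f"--{name} {value}")
    return " ".join(parts)


# Options copied from the classification command above.
options = {
    "class_list_file": "oss://{bucket_name}/path/to/your/{class_list_file}",
    "max_image_size": 600,
    "write_parallel_num": 8,
    "num_samples_per_tfrecord": 128,
    "test_ratio": 0.1,
    "model_type": "CLASSIFICATION",
}

config = build_convert_param_config(options)
print(config)
```

The resulting string can be pasted between the single quotes of -Dconvert_param_config. Keeping the options in a dictionary makes it simpler to reuse the same settings across several conversion jobs.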

Convert a labeled dataset for text detection or recognition

Run the following PAI command to convert a labeled dataset into a TFRecord file. The labeled dataset is applicable to text detection or recognition.

pai -name easy_vision_ext
      -Dbuckets='oss://{bucket_name}.{oss_host}/{path}/'
      -Darn='acs:ram::*******:role/aliyunodpspaidefaultrole'
      -DossHost='{oss_host}'
      -Dcmd=convert
      -Dlabel_file='oss://{bucket_name}/path/to/your/{label_file}'
      -Dconvert_param_config='--model_type TEXT_END2END --default_class text --max_image_size 2000 --char_replace_map_path oss://{bucket_name}/path/to/your_char_replace_map --default_char_dict_path oss://{bucket_name}/path/to/your_char_dict --test_ratio 0.1 --write_parallel_num 8 --num_samples_per_tfrecord 64'
      -Doutput_tfrecord='oss://{bucket_name}/test/convert/recipt_text_end2end/data'
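
The preceding command references two auxiliary files, a character replacement map and a character dictionary, whose formats are described in the option table later in this topic. The following Python sketch illustrates how files in those formats could be interpreted: the replacement map is a CSV file with the columns original and replaced, and the dictionary assigns each character an ID equal to its line number minus 1. The function names, the sample file contents, and the order in which replacements are applied are assumptions made for illustration; the converter's actual implementation is not documented here.

```python
import csv
import io


def load_char_dict(lines):
    """Map each character to its ID: 1-based line number minus 1."""
    return {line.rstrip("\n"): index for index, line in enumerate(lines)}


def load_replace_map(csv_file):
    """Read a CSV file that has the columns 'original' and 'replaced'."""
    reader = csv.DictReader(csv_file)
    return {row["original"]: row["replaced"] for row in reader}


def apply_replace_map(text, replace_map):
    """Replace every original string with its replacement (assumed order)."""
    for original, replaced in replace_map.items():
        text = text.replace(original, replaced)
    return text


# In-memory stand-ins for the two OSS files (sample contents are made up).
char_dict_file = io.StringIO("0\n1\n2\n")
replace_map_file = io.StringIO("original,replaced\nO,0\nl,1\n")

char_dict = load_char_dict(char_dict_file)
replace_map = load_replace_map(replace_map_file)

normalized = apply_replace_map("lO2", replace_map)
ids = [char_dict[ch] for ch in normalized]
print(normalized, ids)
```

In this sketch, look-alike characters are normalized first so that every character in the label text has an entry in the dictionary before it is mapped to an ID.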

PAI command parameters

| Parameter | Required | Description | Format | Default value |
| --- | --- | --- | --- | --- |
| buckets | No | The name of the Object Storage Service (OSS) bucket. The name must end with a forward slash (/). If you specify multiple buckets, separate the bucket names with commas (,). | "oss://bucket_name/?role_arn=xxx&host=yyy" or "oss://bucket_1/?role_arn=xxx&host=yyy,oss://bucket_2/" | Empty |
| cmd | Yes | The operation that you want to perform. Set the value to convert. | STRING | convert |
| label_file | Yes | The OSS path of the labeled dataset. For more information, see Overview of labeled dataset format. | oss://your_bucket/xxx.csv | N/A |
| convert_param_config | No | The information about the conversion task. For more information, see the following table. You can also replace convert_param_config with convert_config. | --parama valuea --paramb valueb | "" |
| output_tfrecord | No | The OSS path of the TFRecord file. | oss://your_dir/prefix | "" |
| cluster | No | The information about the worker nodes that are used to perform conversion in a distributed manner. | JSON string | "{\"worker\":{\"count\":3, \"cpu\": 800, \"gpu\":0, \"memory\": 20000}}" |

The following table describes the options in the convert_param_config field.

| Option | Required | Description | Format | Default value |
| --- | --- | --- | --- | --- |
| model_type | Yes | The type of model that the converted data is intended for. Valid values: CLASSIFICATION (single-label or multi-label image classification); DETECTION (object detection); SEGMENTATION (semantic image segmentation); INSTANCE_SEGMENTATION (instance segmentation); TEXT_END2END (end-to-end optical character recognition (OCR)); TEXT_RECOGNITION (single-line text recognition); TEXT_DETECTION (text detection); VIDEO_CLASSIFICATION (video classification); SELF_DEFINED (custom conversion). Note: If model_type is set to TEXT_END2END or TEXT_RECOGNITION, the char_replace_map_path and default_char_dict_path options take effect. If model_type is set to VIDEO_CLASSIFICATION, the decode_type, sample_fps, reshape_size, decode_batch_size, and decode_keep_size options take effect. | STRING | N/A |
| class_list_file | No | The OSS path of the category file. The file contains a list of category names. An entry can also be written in the following format: Category name: Name of the mapping category. | oss://path/to/your/classlit | "" |
| test_ratio | No | The proportion of the labeled data that is set aside as test data. If the value is 0, all data is used for training. If the value is 0.1, 10% of the data is used for validation. | FLOAT | 0.1 |
| max_image_size | No | The maximum length, in pixels, of the longer side of an image. If you configure this option and an image exceeds the limit, the image is resized before it is saved to the TFRecord file, which reduces storage space and accelerates data reading. | INT | N/A |
| max_test_image_size | No | The maximum length, in pixels, of the longer side of an image that is used for testing. This option applies to test data only. If you do not configure it, the value of max_image_size is used. | INT | ${max_image_size} |
| default_class | No | The name of the default category. A category whose name is not included in the class list is treated as the default category. | STRING | None |
| error_class | No | The name of the invalid category. Objects or bounding boxes that belong to this category are not used for training. | STRING | N/A |
| ignore_class | No | The name of the ignored category. This option applies only to detection models. Bounding boxes that belong to this category are not used for training. | STRING | N/A |
| converter_class | No | The name of the conversion class. Valid values: pai itag labeling format (the format of labeled files generated by iTAG); pai labeling format(old version) (the previous format of labeled files generated by iTAG); qince labeling format (the format of labeled files generated by the qince platform); ssl labeling format (the file format defined by PAI for self-supervised image learning). | STRING | pai labeling format(old version) |
| separator | No | The separator that is passed to the split() method to divide the content of the label file into substrings. | STRING | N/A |
| image_format | No | The encoding format of the images in the TFRecord file. Commonly used formats: jpg, png, bmp. | STRING | jpg |
| read_parallel_num | No | The number of concurrent reads. | INT | 10 |
| write_parallel_num | No | The number of concurrent writes to the TFRecord file. | INT | 1 |
| num_samples_per_tfrecord | No | The number of images saved in each TFRecord file. | INT | 256 |
| user_defined_converter_path | No | The HTTP or OSS path of the custom converter code. Example: http://path/to/your/converter.py. | STRING | N/A |
| user_defined_generator_path | No | The HTTP or OSS path of the custom generator code. Example: http://path/to/your/generator.py. | STRING | N/A |
| generator_class | No | The name of the custom generator class. | STRING | N/A |
| char_replace_map_path | No | The OSS path of the CSV file that defines character replacements. The file contains two columns named original and replaced. The original column lists the original strings. The replaced column lists the strings that replace them. | STRING | N/A |
| default_char_dict_path | No | The OSS path of the file that maps characters to IDs. In the file, each character occupies one line. The ID of a character is equal to its line number minus 1. | STRING | N/A |
| decode_type | No | The mode used for decoding video. Valid values: 1 (intra only); 2 (keyframe only); 3 (without bidir); 4 (decode all). | INT | 4 |
| sample_fps | No | The number of frames that are sampled per second. | FLOAT | 5 |
| reshape_size | No | The size of the output frames. Unit: pixels. | INT | 224 |
| decode_batch_size | No | The number of frames that are decoded at the same time. | INT | 10 |
| decode_keep_size | No | The number of frames that overlap between consecutive batches. | INT | 0 |
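
To get a feel for how test_ratio and num_samples_per_tfrecord interact, the following Python sketch estimates how many images land in each subset and how many TFRecord files are needed to hold them. It is arithmetic only: the function name plan_conversion is hypothetical, and the rounding rule and the assumption that training and test data are written to separate files are not confirmed by this topic.

```python
import math


def plan_conversion(num_images, test_ratio=0.1, num_samples_per_tfrecord=256):
    """Estimate subset sizes and TFRecord file counts (illustrative only).

    Assumes the test subset is num_images * test_ratio, rounded to the
    nearest integer, and that each subset is written to its own files.
    """
    num_test = int(round(num_images * test_ratio))
    num_train = num_images - num_test
    return {
        "train_images": num_train,
        "test_images": num_test,
        "train_files": math.ceil(num_train / num_samples_per_tfrecord),
        "test_files": math.ceil(num_test / num_samples_per_tfrecord),
    }


# 1,000 labeled images with the settings from the classification example.
plan = plan_conversion(1000, test_ratio=0.1, num_samples_per_tfrecord=128)
print(plan)
```

A smaller num_samples_per_tfrecord value produces more, smaller files, which can pair well with a higher write_parallel_num value when several files are written concurrently.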