
Platform for AI: End-to-end text recognition

Last Updated: Apr 03, 2024

EasyVision of Machine Learning Platform for AI allows you to train models for end-to-end text recognition and use them for prediction. You can also use EasyVision to perform distributed training and prediction across multiple servers. This topic describes how to use EasyVision to perform offline prediction in end-to-end text recognition based on existing trained models.

Data format

For more information, see Input data formats.

Offline prediction in end-to-end text recognition

You can run the following Machine Learning Platform for AI (PAI) command to start offline prediction in end-to-end text recognition based on existing files. You can call the PAI command by using the SQL Script component, the MaxCompute client, or an ODPS SQL node in DataWorks. For more information, see MaxCompute client (odpscmd) or Develop a MaxCompute SQL task.

pai -name ev_predict_ext
    -Dmodel_path='OSS path of your model'
    -Dmodel_type='text_spotter'
    -Dinput_oss_file='oss://path/to/your/filelist.txt'
    -Doutput_oss_file='oss://path/to/your/result.txt'
    -Dimage_type='url'
    -Dnum_worker=2
    -DcpuRequired=800
    -DgpuRequired=100
    -Dbuckets='Your OSS directory'
    -Darn='Alibaba Cloud Resource Name (ARN) of the role that you are assuming'
    -DossHost='Your OSS domain'

For more information, see Parameters.
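The file specified by input_oss_file lists the images to process. As a sketch only, assuming image_type='url' and one image URL per line (see Input data formats for the authoritative format):

http://example.com/images/001.jpg
http://example.com/images/002.jpg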

Output

Prediction results are written to an output file. Each line in the file consists of an Object Storage Service (OSS) path and a JSON string, separated by a comma. The OSS path indicates the path of the original image, and the JSON string contains the prediction result. For example, the output file contains the following information:

oss://path/to/your/image1.jpg,  JSON string
oss://path/to/your/image2.jpg,  JSON string
oss://path/to/your/image3.jpg,  JSON string

The JSON string is in the following format:

{
  "detection_keypoints": [[[243.57516479492188, 198.84210205078125], [243.91038513183594, 247.62425231933594], [385.5513916015625, 246.61660766601562], [385.2197570800781, 197.79345703125]], [[292.2718200683594, 114.44700622558594], [292.2237243652344, 164.684814453125], [571.1962890625, 164.931640625], [571.2444458007812, 114.67433166503906]]],
  "detection_boxes": [[243.5308074951172, 197.69570922851562, 385.59625244140625, 247.7247772216797], [292.1929931640625, 114.28043365478516, 571.2748413085938, 165.09771728515625]],
  "detection_scores": [0.9942291975021362, 0.9940272569656372],
  "detection_classes": [1, 1],
  "detection_class_names": ["text", "text"],
  "detection_texts_ids": [[1, 2, 2008, 12], [1, 2, 2008, 12]],
  "detection_texts": ["This is an example.", "This is an example."],
  "detection_texts_scores": [0.88, 0.88]
}
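Because the JSON string itself contains commas, split each output line on the first comma only to separate the image path from the prediction result. The following is a minimal Python sketch, assuming the output file has been downloaded from OSS to a local file named result.txt (the file name and the download step are illustrative, not part of the PAI command):

import json

# Parse the prediction output file: each line is
# "<OSS path of the image>,  <JSON string with the prediction result>".
with open("result.txt", encoding="utf-8") as f:
    for line in f:
        line = line.strip()
        if not line:
            continue
        # Split on the first comma only; the JSON string contains commas.
        image_path, json_str = line.split(",", 1)
        result = json.loads(json_str)
        print(image_path, result["detection_texts"])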

The following table describes the parameters in the JSON string.

| Parameter | Description | Shape | Data type |
| --- | --- | --- | --- |
| detection_boxes | The coordinates of the detected text area, in the order [top, left, bottom, right]. | [num_detections, 4] | FLOAT |
| detection_scores | The confidence score of each detected text area. | [num_detections] | FLOAT |
| detection_classes | The ID of the category to which each text area belongs. | [num_detections] | INT |
| detection_class_names | The name of the category to which each text area belongs. | [num_detections] | STRING |
| detection_keypoints | The (y, x) coordinates of the four vertices of each detected text area. | [num_detections, 4, 2] | FLOAT |
| detection_texts_ids | The IDs of the characters in each recognized single-line text. | [num_detections, max_text_length] | INT |
| detection_texts | The recognized single-line text for each detection. | [num_detections] | STRING |
| detection_texts_scores | The confidence score of each recognized single-line text. | [num_detections] | FLOAT |
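As the shapes indicate, these fields are parallel arrays indexed by detection: the i-th elements of detection_boxes, detection_texts, and detection_texts_scores all describe the same text area. The following is a minimal Python sketch that pairs each recognized line with its bounding box, assuming a result dictionary parsed as shown above; the min_score threshold of 0.5 is illustrative, not a value prescribed by EasyVision:

def extract_text_lines(result, min_score=0.5):
    """Pair each recognized text line with its bounding box, keeping only
    detections whose recognition score meets the threshold."""
    lines = []
    for box, text, score in zip(result["detection_boxes"],
                                result["detection_texts"],
                                result["detection_texts_scores"]):
        if score >= min_score:
            top, left, bottom, right = box
            lines.append({"text": text,
                          "box": (top, left, bottom, right),
                          "score": score})
    return lines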