
Platform For AI: End-to-end text recognition

Last Updated: Mar 11, 2026

Run batch predictions with a trained end-to-end text recognition model to detect text regions in images and recognize the text they contain.

Data format

For more information, see Input data formats.

Run predictions

Run the following PAI command by using the SQL Script component, the MaxCompute client, or an ODPS SQL node in DataWorks. For more information, see Connect using local client (odpscmd) or Develop an ODPS SQL task.

pai -name ev_predict_ext
             -Dmodel_path='Path to model'
             -Dmodel_type='text_spotter'
             -Dinput_oss_file='oss://path/to/filelist.txt'
             -Doutput_oss_file='oss://path/to/result.txt'
             -Dimage_type='url'
             -Dnum_worker=2
             -DcpuRequired=800
             -DgpuRequired=100
             -Dbuckets='OSS buckets'
             -Darn='RoleArn'
             -DossHost='OSS domain name'

See Parameters for parameter descriptions.
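If you submit this job repeatedly with different inputs, it can help to generate the command text programmatically. The following Python sketch uses a hypothetical helper, `build_pai_command`, which is not part of PAI or any SDK. It only assembles a string in the `-D<key>=<value>` form shown above, quoting string values and leaving numeric values bare. It does not submit the job; you would still run the resulting text through one of the tools listed above. The values are the same placeholders as in the example command.

```python
def build_pai_command(name: str, params: dict) -> str:
    """Assemble a PAI command string such as `pai -name ev_predict_ext -D...`.

    Hypothetical helper: string values are wrapped in single quotes and numeric
    values are written bare, mirroring the documented example. It only builds
    text; it does not submit anything.
    """
    lines = [f"pai -name {name}"]
    for key, value in params.items():
        if isinstance(value, str):
            # No escaping is attempted, so reject values that would break the quoting.
            if "'" in value:
                raise ValueError(f"value for {key!r} contains a single quote")
            lines.append(f"    -D{key}='{value}'")
        else:
            lines.append(f"    -D{key}={value}")
    return "\n".join(lines)


# Placeholder values copied from the example command above.
params = {
    "model_path": "Path to model",
    "model_type": "text_spotter",
    "input_oss_file": "oss://path/to/filelist.txt",
    "output_oss_file": "oss://path/to/result.txt",
    "image_type": "url",
    "num_worker": 2,
    "cpuRequired": 800,
    "gpuRequired": 100,
    "buckets": "OSS buckets",
    "arn": "RoleArn",
    "ossHost": "OSS domain name",
}

if __name__ == "__main__":
    print(build_pai_command("ev_predict_ext", params))
```

Keeping the parameters in a dictionary makes it easy to change only the input and output files between runs while the resource settings stay fixed.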

Output format

Each line of the output file contains an image path followed by the prediction result for that image as a JSON string.

oss://path/to/your/image1.jpg,  JSON result string
oss://path/to/your/image2.jpg,  JSON result string
oss://path/to/your/image3.jpg,  JSON result string
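To post-process the results, each line can be split into the image path and the JSON string. The following Python sketch rests on two assumptions: the result file has already been downloaded from OSS to a local path, and image paths contain no commas, so splitting on the first comma is safe even though the JSON itself contains commas. `parse_result_line` and `read_results` are illustrative names, not part of PAI.

```python
import json
from typing import Iterator, Tuple


def parse_result_line(line: str) -> Tuple[str, dict]:
    """Split one output line into (image_path, result_dict).

    Splits on the first comma only; assumes the image path has no commas.
    """
    path, payload = line.split(",", 1)
    # json.loads tolerates the leading spaces and the trailing newline.
    return path.strip(), json.loads(payload)


def read_results(local_path: str) -> Iterator[Tuple[str, dict]]:
    """Yield (image_path, result_dict) for every non-empty line of a local copy of the result file."""
    with open(local_path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield parse_result_line(line)


if __name__ == "__main__":
    sample = 'oss://path/to/your/image1.jpg,  {"detection_texts": ["This is an example"]}'
    image_path, result = parse_result_line(sample)
    print(image_path)                 # oss://path/to/your/image1.jpg
    print(result["detection_texts"])  # ['This is an example']
```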

Each JSON result has the following structure:

{
  "detection_keypoints": [[[243.57516479492188, 198.84210205078125], [243.91038513183594, 247.62425231933594], [385.5513916015625, 246.61660766601562], [385.2197570800781, 197.79345703125]], [[292.2718200683594, 114.44700622558594], [292.2237243652344, 164.684814453125], [571.1962890625, 164.931640625], [571.2444458007812, 114.67433166503906]]],
  "detection_boxes": [[243.5308074951172, 197.69570922851562, 385.59625244140625, 247.7247772216797], [292.1929931640625, 114.28043365478516, 571.2748413085938, 165.09771728515625]],
  "detection_scores": [0.9942291975021362, 0.9940272569656372],
  "detection_classes": [1, 1],
  "detection_class_names": ["text", "text"],
  "detection_texts_ids": [[1, 2, 2008, 12], [1, 2, 2008, 12]],
  "detection_texts": ["This is an example", "This is an example"],
  "detection_texts_scores": [0.88, 0.88]
}
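As the example suggests, the per-detection fields are parallel lists: index i of `detection_texts`, `detection_scores`, `detection_texts_scores`, and `detection_boxes` describes the same text region. The following Python sketch pairs these lists and drops low-confidence entries. `extract_texts` and `TextDetection` are hypothetical names, and the default thresholds of 0.5 are arbitrary illustration values, not recommendations from PAI.

```python
from typing import List, NamedTuple


class TextDetection(NamedTuple):
    text: str
    det_score: float  # detection_scores[i]
    rec_score: float  # detection_texts_scores[i]
    box: List[float]  # detection_boxes[i]


def extract_texts(result: dict, min_det: float = 0.5, min_rec: float = 0.5) -> List[TextDetection]:
    """Pair each recognized string with its scores and box; keep entries that pass both thresholds."""
    keys = ("detection_texts", "detection_scores", "detection_texts_scores", "detection_boxes")
    columns = [result[k] for k in keys]
    # Every per-detection list should hold num_detections entries.
    if len({len(c) for c in columns}) != 1:
        raise ValueError("per-detection lists have different lengths")
    return [
        TextDetection(text, det, rec, box)
        for text, det, rec, box in zip(*columns)
        if det >= min_det and rec >= min_rec
    ]


if __name__ == "__main__":
    # Abbreviated copy of the example result above (coordinates rounded).
    example = {
        "detection_boxes": [[243.53, 197.70, 385.60, 247.72], [292.19, 114.28, 571.27, 165.10]],
        "detection_scores": [0.9942, 0.9940],
        "detection_texts": ["This is an example", "This is an example"],
        "detection_texts_scores": [0.88, 0.88],
    }
    print(len(extract_texts(example)))               # 2
    print(len(extract_texts(example, min_rec=0.9)))  # 0, because 0.88 < 0.9
```

Filtering on both scores separates two failure modes: a region that is probably not text (low detection score) and a region whose transcription is uncertain (low recognition score).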

Output parameters:

| Parameter | Description | Shape | Data type |
| --- | --- | --- | --- |
| detection_boxes | Bounding box coordinates in [top, left, bottom, right] format | [num_detections, 4] | FLOAT |
| detection_scores | Detection confidence score | [num_detections] | FLOAT |
| detection_classes | Detection category ID | [num_detections] | INT |
| detection_class_names | Detection category name | [num_detections] | STRING |
| detection_keypoints | Four corner points in (y, x) coordinate format | [num_detections, 4, 2] | FLOAT |
| detection_texts_ids | Character ID array for recognized text | [num_detections, max_text_length] | INT |
| detection_texts | Recognized text content | [num_detections] | STRING |
| detection_texts_scores | Recognition confidence score | [num_detections] | FLOAT |
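Many drawing and geometry libraries expect x before y. Going by the conventions stated in the output parameter table ([top, left, bottom, right] for `detection_boxes` and (y, x) for `detection_keypoints`), the following Python sketch converts both to x-first forms. The helper names are hypothetical. If you find that your outputs use a different coordinate order, adjust the unpacking accordingly.

```python
from typing import List, Sequence, Tuple


def box_to_xywh(box: Sequence[float]) -> Tuple[float, float, float, float]:
    """Convert a [top, left, bottom, right] box to (x, y, width, height)."""
    top, left, bottom, right = box
    return left, top, right - left, bottom - top


def keypoints_to_xy(points: Sequence[Sequence[float]]) -> List[Tuple[float, float]]:
    """Swap each (y, x) corner point to (x, y)."""
    return [(x, y) for y, x in points]


if __name__ == "__main__":
    print(box_to_xywh([10.0, 20.0, 30.0, 60.0]))  # (20.0, 10.0, 40.0, 20.0)
    print(keypoints_to_xy([[10.0, 20.0], [10.0, 60.0], [30.0, 60.0], [30.0, 20.0]]))
    # [(20.0, 10.0), (60.0, 10.0), (60.0, 30.0), (20.0, 30.0)]
```

The keypoints are usually the better choice for drawing, because four corner points can describe a rotated or skewed text region that an axis-aligned box cannot.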