This topic describes how to use Optical Character Recognition (OCR) in AutoLearning, from creating datasets and labeling images to creating tasks and training and deploying models.

Prerequisites

  • AutoLearning is authorized to access Object Storage Service (OSS). For more information, see OSS authorization.
  • An instance is created. For more information, see Create an instance.
  • Images that are used for model training are uploaded to OSS. We recommend that you use the graphical management tool ossbrowser to upload images in bulk. For more information, see Quick start.

Background information

  • Dataset requirements
    • Image quality: Images must be undamaged, and their resolution must be higher than 30 pixels. AutoLearning supports images in JPG and JPEG formats.
    • Data balance: To reach data balance among image categories, we recommend that each category contain more than 50 images.
    • Generalization: Select images that are taken in real scenes from different angles.
  • Dataset specifications
    |-- your_image_dir /
        | -- a.jpg
        | -- a.xml
        | -- b.png
        | -- b.xml
        | -- c.png
    The images stored in OSS for model training must follow the preceding directory structure. your_image_dir is the folder that stores the images for model training. The labeling result of each image is stored in an XML file that has the same name as the image and uses the PASCAL (Pattern Analysis, Statistical Modelling, and Computational Learning) Visual Object Classes (VOC) format.
    The following example describes the XML format.
    <?xml version="1.0" encoding="utf-8"?>
            <name>phone number</name>
    The preceding example labels two strings: phone number and 18600000000.
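
Before you upload a folder, it can help to check which images still lack a label file. The following Python sketch is only an illustration of the directory layout above, not part of AutoLearning: the function name find_unlabeled_images and the set of accepted extensions are assumptions taken from the example tree.

```python
from pathlib import Path

# Assumption: these extensions mirror the example tree above (.jpg, .png) plus .jpeg.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def find_unlabeled_images(image_dir):
    """Return sorted names of images in image_dir that have no same-named .xml file."""
    root = Path(image_dir)
    unlabeled = []
    for path in sorted(root.iterdir()):
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS:
            # A labeled image such as a.jpg is expected to sit next to a.xml.
            if not path.with_suffix(".xml").exists():
                unlabeled.append(path.name)
    return unlabeled
```

For the example tree, this check would report c.png, the only image without a matching XML file. Such images can be labeled later on the AutoLearning platform.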

Test data: OCR demo data.

Step 1: Create a dataset

  1. Visit the Computer Vision Model Training page.
    1. Log on to the Machine Learning Platform for AI console.
    2. In the left-side navigation pane, choose AutoLearning > Computer Vision Model Training.
  2. On the Computer Vision Model Training page, click Open in the Operation column.
  3. On the Data Preparation wizard page, click New Dataset.
  4. On the New Dataset page, set the following parameters.
    Parameter | Description
    Dataset Name | The dataset name must be 1 to 30 characters in length and can contain underscores (_) and hyphens (-). It must start with a letter or digit.
    Description | Enter the description of the dataset, which helps distinguish different datasets.
    Storage type | Only the default storage type OSS is supported.
    OSS path | Specify the OSS path where the images for model training are stored.
  5. Click Submit.
    AutoLearning automatically creates indexes on images and labeling data. However, AutoLearning does not save the indexed images. Authorization is required only when AutoLearning needs to retrieve your images stored in OSS to train models. You can view the information of datasets in the Data list section.
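
The naming rule for datasets (1 to 30 characters, underscores and hyphens allowed, first character a letter or digit) can be checked locally before you fill in the form. The following Python sketch is one possible reading of that rule: the helper name is_valid_name is hypothetical, and the restriction to ASCII letters is an assumption, because the rule does not say whether other alphabets are accepted.

```python
import re

# Assumption: "letter" means an ASCII letter. The first character must be a
# letter or digit; up to 29 more letters, digits, underscores, or hyphens follow.
NAME_PATTERN = re.compile(r"[A-Za-z0-9][A-Za-z0-9_-]{0,29}")


def is_valid_name(name):
    """Return True if name satisfies the 1-to-30-character naming rule."""
    return NAME_PATTERN.fullmatch(name) is not None
```

The console remains the authority on which names it accepts; this check only catches obvious violations such as a leading underscore or a name longer than 30 characters.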

Step 2: Label images

If your dataset contains unlabeled images, you can label them on the AutoLearning platform.

  1. On the Data Preparation wizard page, navigate to the Dataset list section. Then, click Labeling in the Operation column.
  2. On the Labeling page, label all images and click Submit.
  3. Click Preview to view the labeling results.

Step 3: Create a task

  1. On the Data Preparation wizard page, click Training tasks.
  2. On the Training tasks wizard page, click New task.
  3. On the New task page, configure the following parameters.
    Section | Parameter | Description
    Basic information | Task name | The task name must be 1 to 30 characters in length and can contain underscores (_) and hyphens (-). It must start with a letter or digit.
    Basic information | Description | The description of the task, which helps distinguish different tasks.
    Dataset | Select dataset | Select an image dataset for model training.
    Algorithm and training | Select algorithm | OCR (High Performance): provides fast prediction while balancing inference performance between cloud servers and clients.
    Algorithm and training | Resource configuration | Specify the Number of GPUs and GPU type for the training task.
  4. Click Start training.

Step 4: View training details

  1. On the Training tasks wizard page, click Training details in the Operation column.
  2. On the Training details page, you can perform the following operations.
    • View training progress: On the Training progress tab, view the training progress and information in the Basic information section.
    • Terminate task: On the Training process tab, click Terminate task.
    • View node information:
      1. On the Training process tab, click a node.
      2. On the Node Information page that appears, view the status of the node and the information in the Basic information and Step information sections.
    • View training logs:
      1. On the Training process tab, click a node.
      2. On the Node Information page, click the log tab.

Step 5: Generate a mini program to test the model

  1. On the Training details page, click Model and deploy.
  2. On the Model and deploy wizard page, scan the QR code by using the Alipay app.
    The values of the following model metrics are calculated based on a validation set. A validation set is a portion of the training set. By default, 10% of the training data is extracted and used as a validation set.
    • loss: the value of the loss function, which measures the difference between the ground truth and the predicted value. A lower loss indicates a more accurate OCR model.
    • model_size: the size of the model after optimizations such as training, quantization, and encoding.
  3. Use the mini program to scan objects to test how the model recognizes characters in real time.
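
To picture the default 10% validation hold-out mentioned in step 2, the following Python sketch splits a list of samples into training and validation parts. It is a generic illustration only: how AutoLearning actually selects validation samples is not described here, and the function name split_validation, the shuffling, and the seed parameter are assumptions.

```python
import random


def split_validation(items, fraction=0.1, seed=0):
    """Return (training, validation) lists; validation holds about `fraction` of items."""
    shuffled = list(items)
    # A fixed seed makes the illustrative split reproducible.
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, round(len(shuffled) * fraction)) if shuffled else 0
    return shuffled[n_val:], shuffled[:n_val]
```

With 100 labeled images, this split keeps 90 for training and holds out 10, and metrics such as loss would then be computed on the held-out part.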

Step 6: Deploy the model

  1. On the Model and deploy wizard page, click Go to PAI-EAS deployment.
  2. On the Resources And Models page that appears, select a Resources Type and click Next.
  3. On the Deployment details and confirmation page, enter a name in the Custom Model Name field.
  4. In the Number Of Instances and Quota fields, click Up arrow or Down arrow to adjust the number of resources.
  5. Click Deploy.
    Visit the Elastic Algorithm Service page. When the status of the model changes to Running in the State column, the model is deployed.
  6. Call the model.
    Make an API call
    • HTTP method: POST.
    • Request URL: After the model is deployed on the server, a public endpoint is automatically generated. To view the Access address and Token, perform the following steps:
      1. On the Elastic Algorithm Service page, click Invoke Intro in the Service Method column.
      2. On the Invoke Intro page, click the Public Network Invoke tab to view the Access address and Token.
    • Request body
      {
        "dataArray": [
          {
            "name": "image",
            "type": "stream",
            "body": "Base64-encoded data"
          }
        ]
      }
      Parameter | Required | Type | Description
      name | No | string | N/A
      type | No | string | The type of the data. The default type is stream and cannot be changed.
      body | Yes | string | The image data. The data is encoded by using Base64. Images in JPG, PNG, and BMP formats are supported.
    • Response parameters
      Parameter | Type | Description
      success | bool | Specifies whether the call is successful.
      result | object | The returned result.
      output | array | The recognition result that is returned in an array.
      label | string | The recognition category.
      conf | number | The confidence level.
      pos | array | The relative coordinates (x,y) of a detection frame are stored in the order of upper left, upper right, lower right, and lower left.
      meta | object | The image information.
      height | number | The height of the image.
      width | number | The width of the image.
    • Error codes
      Error code | Error message | Description
      1001 | INPUT_FORMAT_ERROR | The error message returned because the input format is invalid. For example, a required parameter is missing. Check whether the input format is valid.
      1002 | IMAGE_DECODE_ERROR | The error message returned because image decoding failed. The image is not in JPG, PNG, or another supported format. Check the image format.
      2001 | UNKNOWN_ERROR | The error message returned because an internal server error has occurred.
      2002 | GET_INSTANCE_ERROR | The error message returned because the system failed to obtain the instance. This error may occur due to insufficient resources. Increase resources such as the number of CPU cores and memory size.
      2003 | MODEL_FORWARD_ERROR | The error message returned because a model inference failure occurred. This error is caused by an internal server error.
    • Sample requests
      curl http://******** -H 'Authorization:****==' -d '{"dataArray": [{"body": "****", "type": "stream", "name": "image"}]}'
      Replace the URL, token, and Base64 encoding information in this example with the actual values.
    • Sample responses
    • Sample error responses
      If a request error occurs, the response contains the following parameters:
      • errorCode: the error code.
      • errorMsg: the error message.
      For example, an error response with these parameters is returned when dataArray is missing from the request body.
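
The request body, the Authorization header from the sample request, and the errorCode and errorMsg fields can be combined into a small Python client. This is a minimal sketch under stated assumptions, not an official SDK: the function names are hypothetical, the url and token values are the Access address and Token from the Invoke Intro page, and the sketch assumes that error responses arrive as a JSON body.

```python
import base64
import json
import urllib.request


def build_request_body(image_bytes):
    """Wrap raw image bytes in the dataArray structure shown in the sample request."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    payload = {"dataArray": [{"name": "image", "type": "stream", "body": encoded}]}
    return json.dumps(payload).encode("utf-8")


def parse_response(raw):
    """Return the `result` object, or raise RuntimeError with errorCode and errorMsg."""
    data = json.loads(raw)
    if "errorCode" in data or not data.get("success", False):
        raise RuntimeError(f"{data.get('errorCode')}: {data.get('errorMsg')}")
    return data.get("result")


def call_ocr_service(url, token, image_path):
    """POST one image to the deployed model and return the parsed result."""
    with open(image_path, "rb") as f:
        body = build_request_body(f.read())
    request = urllib.request.Request(
        url, data=body, method="POST", headers={"Authorization": token}
    )
    # Assumption: errors are reported in a JSON body. If the service answers with
    # an HTTP error status instead, urlopen raises urllib.error.HTTPError.
    with urllib.request.urlopen(request) as response:
        return parse_response(response.read())
```

A call such as call_ocr_service(url, token, "a.jpg") would send the same payload as the curl sample. The sketch returns the result object unchanged, because the exact nesting of output, label, conf, and pos inside it is not shown in this topic.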