
The Way to Train a Form Recognition Model on the Frontend Quickly

This article uses Pipcook 1.0 to train a form recognition model and uses this model to improve the efficiency of form development.

By Tianke from F(x) Team


Pain Point

When restoring web pages from design drafts on the frontend, you may run into a familiar pain point: the designer draws a form in the design draft, you find a similar form in Ant Design or Fusion, and then copy and modify its code. This is inefficient and tedious.


Can it be faster? Can the form code be generated by screenshots? The answer is Yes.


You can train a target detection model whose input is a screenshot and whose output is the types and coordinates of all form items in it. Then, by taking a screenshot of the form in the design draft, you can detect all of its form items. After combining them with the labels produced by text recognition, the form code can be generated. For example, I have implemented this function of generating form code from screenshots.


In this figure, the red boxes are the form items detected by the target detection model, and the green boxes are the text recognized by the text recognition API. After some calculation, the form protocol or code can be generated.
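The "calculation" that pairs each detected form item with its text label is not spelled out in this article, but a minimal sketch is a nearest-neighbor match between box centers. The function below is only an illustration under that assumption; all coordinates are made up, not real model output, and boxes use the `[xmin, ymin, xmax, ymax]` format:

```javascript
// Sketch: pair each detected form item with the nearest OCR text label.
// Boxes use the [xmin, ymin, xmax, ymax] format; all coordinates below
// are made-up illustration data, not real model output.
function centerOf([xmin, ymin, xmax, ymax]) {
  return { x: (xmin + xmax) / 2, y: (ymin + ymax) / 2 };
}

// For every form-item box, return the index of the label box whose
// center is closest (a simple nearest-neighbor match).
function matchLabels(itemBoxes, labelBoxes) {
  return itemBoxes.map((item) => {
    const c = centerOf(item);
    let best = -1;
    let bestDist = Infinity;
    labelBoxes.forEach((label, i) => {
      const lc = centerOf(label);
      const dist = Math.hypot(c.x - lc.x, c.y - lc.y);
      if (dist < bestDist) {
        bestDist = dist;
        best = i;
      }
    });
    return best;
  });
}

const items = [[100, 30, 200, 70], [100, 130, 200, 170]];  // detected form items
const labels = [[10, 30, 80, 70], [10, 130, 80, 170]];     // recognized text boxes
console.log(matchLabels(items, labels)); // → [ 0, 1 ]
```

A production version would also need to handle labels that sit inside the item box or belong to no item, but nearest-center matching is enough to convey the idea.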

Text recognition is a general-purpose capability, so I will not cover it here. How, then, is form item detection implemented? The overall steps are:

  • Sample: Collect thousands of form pictures and label the form items
  • Training: Feed the samples to the machine for learning
  • Prediction: After training, send a new form picture to the model, and it predicts the labels

The following part describes each step in detail.


Here, the form recognition samples are universal target detection samples. For the labeling method, please see the previous section. A dataset of form recognition samples is provided for convenience:



Next, I will show you how to use Pipcook to run the sample pages to generate a large number of samples and train a target detection model.

An Introduction to Pipcook

Pipcook is a machine learning application framework developed by the D2C Team of Tao Technology for frontend developers. We hope Pipcook will become a platform for frontend engineers to learn and practice machine learning and promote frontend intelligence. Pipcook is an open-source framework. You are welcome to build it together with us.


Environment Setup

Make sure that your Node.js version is 12 or later. Then, install the Pipcook CLI (and cnpm, for acceleration in China):

# Install cnpm for acceleration
npm i @pipcook/pipcook-cli cnpm -g --registry=https://registry.npm.taobao.org

Next, initialize Pipcook and start the daemon:

pipcook init --tuna -c cnpm
pipcook daemon start


Configuration File

Form recognition is a target detection task, so you can create a new configuration file named form.json. Don't worry: you do not need to modify most of the parameters in this file.


{
  "plugins": {
    "dataCollect": {
      "package": "@pipcook/plugins-object-detection-pascalvoc-data-collect",
      "params": {
        "url": "http://ai-sample.oss-cn-hangzhou.aliyuncs.com/pipcook/datasets/mid/mid_base.zip"
      }
    },
    "dataAccess": {
      "package": "@pipcook/plugins-coco-data-access"
    },
    "modelDefine": {
      "package": "@pipcook/plugins-detectron-fasterrcnn-model-define"
    },
    "modelTrain": {
      "package": "@pipcook/plugins-detectron-model-train",
      "params": {
        "steps": 20000
      }
    },
    "modelEvaluate": {
      "package": "@pipcook/plugins-detectron-model-evaluate"
    }
  }
}

You only need to set one parameter, in dataCollect.params:

  • url: the address of your samples

You can also run this configuration file directly to train a form detection model.


Training

The target detection model requires a large amount of computing, so you may need a GPU machine. Otherwise, the training could take several weeks.

pipcook run form.json --tuna

The training time may be a bit long, so go to lunch or write some business code.

After the training is completed, a model is generated and stored in the output directory.


Prediction

The generated output directory is a brand-new npm package. First, install its dependencies:

cd output
# BOA_TUNA=1 uses a mirror for acceleration in China
BOA_TUNA=1 npm install

After the installation, go back to the root directory, download a test image, and name it test.jpg.

cd ..

curl https://img.alicdn.com/tfs/TB1bWO6b7Y2gK0jSZFgXXc5OFXa-1570-522.jpg --output test.jpg

Finally, we can begin to predict:

const predict = require('./output');
(async () => {
  const v1 = await predict('./test.jpg');
  console.log(v1);
  // {
  //   boxes: [
  //     [83, 31, 146, 71],   // xmin, ymin, xmax, ymax
  //     [210, 48, 256, 78],
  //     [403, 30, 653, 72],
  //     [717, 41, 966, 83]
  //   ],
  //   classes: [
  //     0, 1, 2, 2           // class index
  //   ],
  //   scores: [
  //     0.95, 0.93, 0.96, 0.99  // confidence scores
  //   ]
  // }
})();
Note: The result consists of three parts:

  • Boxes: This property is an array, and each element is another array containing four elements: xmin, ymin, xmax, and ymax.
  • Scores: This property is an array, and each element is the confidence coefficient of the corresponding prediction result.
  • Classes: This property is an array, and each element is the corresponding predicted category.
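To turn this raw result into usable form items, a typical post-processing step filters out low-confidence boxes and maps each class index to a readable label. The sketch below assumes a hypothetical class-name list and a 0.9 score threshold; substitute the label set your dataset was actually annotated with:

```javascript
// Post-process a raw prediction into a list of typed form items.
// CLASS_NAMES is hypothetical -- use the label set from your own dataset.
const CLASS_NAMES = ['input', 'select', 'button'];

function toFormItems({ boxes, classes, scores }, threshold = 0.9) {
  return boxes
    .map((box, i) => ({
      type: CLASS_NAMES[classes[i]], // map class index to a readable label
      box,                           // [xmin, ymin, xmax, ymax]
      score: scores[i],
    }))
    .filter((item) => item.score >= threshold); // drop low-confidence boxes
}

// Using the example prediction from above:
const items = toFormItems({
  boxes: [[83, 31, 146, 71], [210, 48, 256, 78], [403, 30, 653, 72], [717, 41, 966, 83]],
  classes: [0, 1, 2, 2],
  scores: [0.95, 0.93, 0.96, 0.99],
});
console.log(items.map((i) => i.type)); // → [ 'input', 'select', 'button', 'button' ]
```

The resulting list, together with the matched text labels, is what would feed into form-protocol or code generation.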

Visualized boxes, scores, and classes:

