Intelligently Generate Frontend Code from Design Files: Form and Table Recognition

By Tianke

As one of the four major technical directions of the Alibaba Frontend Committee, the frontend intelligent project created tremendous value during the 2019 Double 11 Shopping Festival: it automatically generated 79.34% of the code for the new modules of Taobao and Tmall. Along the way, the R&D team ran into many difficulties and put a lot of thought into solving them. The series "Intelligently Generate Frontend Code from Design Files" discusses the technologies and ideas behind the project.

Overview

In the frontend intelligent field, especially in mid-end and backend intelligence, form and table recognition plays a significant role. The mid-end and backend development work mostly consists of form and table development. Therefore, productivity can be improved significantly if the code for forms and tables can be generated intelligently from design files within seconds. This article describes how to generate code for forms and tables.

D2C Recognition Capabilities: Technical Architecture

Technologies such as component recognition, layout algorithms, and material attribute recognition are used in form/table recognition.

The position of the material recognition layer in the overall architecture is shown in the following figure.

Fig: Technical architecture of D2C recognition capabilities

Demo: Generating Code for Forms and Tables

To generate code for general frontend components such as forms and tables, take a screenshot, paste it, and click the Recognize icon. If the field text is not in English (for example, if it is in Chinese), it is translated into English. The resulting field names are presented in lowerCamelCase.
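The lowerCamelCase step can be illustrated with a minimal sketch. It assumes the label has already been translated into English; the translation service itself is not shown, and the function name to_lower_camel_case is hypothetical rather than part of the project.

```python
import re


def to_lower_camel_case(text: str) -> str:
    """Turn an English label into a lowerCamelCase field name (hypothetical helper)."""
    # Split on every run of characters that is not an ASCII letter or digit.
    words = [w for w in re.split(r"[^A-Za-z0-9]+", text) if w]
    if not words:
        return ""
    head, *tail = words
    # First word is lowercased; each following word is capitalized.
    return head.lower() + "".join(w[:1].upper() + w[1:].lower() for w in tail)


if __name__ == "__main__":
    print(to_lower_camel_case("Date of birth:"))
```

Note that this sketch lowercases everything after the first letter of each word, so an acronym such as "ID" becomes "Id".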

Fig: Form recognition demonstration

Fig: Table recognition demonstration

Form and Table Recognition Technology: Mechanisms

Form and table recognition involves the following steps:

1) Detect all types of components and their coordinates using object detection technology.

2) Recognize all text fields and their coordinates using text recognition technology and translate them into English with auto-translation technology.

3) Extract various attributes such as the layout, label, type, and fields of the form or table from the component information obtained in Step 1 and the text information recognized in Step 2 using a code converter.

Object Detection + Text Recognition

For object detection, we train a Faster R-CNN Inception V2 model and use it for prediction. For more information, see Intelligently Generate Frontend Code from Design Files: Basic Component Recognition. For text recognition, a general-purpose text recognition technology detects the text content and its coordinates. This article does not describe the details.

In the following figure, the red boxes mark the objects to be detected, and the green boxes mark the text to be recognized.

Code Converter

This article focuses on extracting various attributes from a form or table using a code converter.

Converting Absolute Coordinates into Rows and Columns

First, to make the information easier to process, the absolute coordinates of the detected objects and the recognized text are converted into rows and columns. The row-and-column data structure is a two-dimensional array in which the first dimension is rows and the second dimension is the columns within each row. The algorithm consists of the following steps:

1) Vertically sort all recognized boxes that have absolute coordinates.

2) Traverse the list of vertically sorted boxes, place boxes of the same row in an array, and horizontally sort them into a row.

3) Sequentially place all rows in an array to generate a two-dimensional array of rows and columns.
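
The steps above can be sketched as follows. This is an illustration under stated assumptions, not the project's actual implementation: the Box type is hypothetical, and the rule for deciding that two boxes share a row (the vertical center of a box falls within the vertical extent of the first box of the current row) is an assumption, since the article does not specify it.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    """Hypothetical recognized box: a component type or a piece of text."""
    label: str
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


def to_rows(boxes: List[Box]) -> List[List[Box]]:
    """Group boxes into rows (first dimension) of columns (second dimension)."""
    rows: List[List[Box]] = []
    # Sort vertically by the vertical center of each box.
    for box in sorted(boxes, key=lambda b: b.y + b.h / 2):
        center = box.y + box.h / 2
        if rows:
            anchor = rows[-1][0]
            # Assumed same-row rule: the center lies inside the anchor's height.
            if anchor.y <= center <= anchor.y + anchor.h:
                rows[-1].append(box)
                continue
        rows.append([box])
    # Sort each row horizontally, left to right.
    for row in rows:
        row.sort(key=lambda b: b.x)
    return rows


if __name__ == "__main__":
    demo = [Box("B", 120, 12, 80, 20), Box("A", 10, 10, 80, 20), Box("C", 10, 60, 80, 20)]
    print([[b.label for b in row] for row in to_rows(demo)])
```

A tolerance based on box height, as used here, is one simple choice. Real screenshots with uneven component heights may need a more forgiving rule.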

Determining the Layout of a Form and a Table

The layout of the form or table can be determined based on the preceding row and column information.

A form has a two-dimensional layout consistent with the row and column information obtained through component recognition. Therefore, the row and column information obtained through component recognition can be directly mapped to the form protocol in a nested loop. A table is one-dimensional, and the table's structure can be determined as long as the table header is determined. Therefore, we can directly use the first row of the two-dimensional array of rows and columns obtained through text recognition as table headers.
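
As a rough sketch of this mapping, the functions below turn row and column information into a layout description. The dictionary shapes ("rows", "cols", "columns", "title") are hypothetical stand-ins, because the article does not describe the actual form protocol.

```python
from typing import Dict, List


def build_form_layout(rows: List[List[str]]) -> Dict:
    """Map a 2D array of form field names onto a nested layout (hypothetical schema)."""
    layout: Dict = {"type": "form", "rows": []}
    for row in rows:  # outer loop: one entry per form row
        cols = []
        for field_name in row:  # inner loop: one entry per column in that row
            cols.append({"field": field_name})
        layout["rows"].append({"cols": cols})
    return layout


def build_table_layout(text_rows: List[List[str]]) -> Dict:
    """Use the first row of recognized text as the table headers."""
    headers = text_rows[0] if text_rows else []
    return {"type": "table", "columns": [{"title": title} for title in headers]}


if __name__ == "__main__":
    print(build_form_layout([["userName", "email"], ["address"]]))
    print(build_table_layout([["ID", "Name", "Status"], ["1", "Ann", "Paid"]]))
```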

Calculating Values of Form and Table Fields

After we determine the layout of a form or table, we can calculate the field values and types.

A form field's value is obtained by translating its label into English and presenting it in lowerCamelCase. To extract the label, we rely on the convention that a label sits either to the left of or above its form field. The algorithm therefore first looks for a label to the left of the form field and, if that fails, looks for one above it.
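
The left-then-top lookup might look like the sketch below. The Box type, the overlap test, and the "nearest candidate wins" rule are assumptions made for illustration; the article only states the order in which the two positions are tried.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Box:
    """Hypothetical box for a form field or a recognized text snippet."""
    text: str
    x: float
    y: float
    w: float
    h: float


def overlaps(a_start: float, a_len: float, b_start: float, b_len: float) -> bool:
    """True if two 1D intervals overlap."""
    return a_start < b_start + b_len and b_start < a_start + a_len


def find_label(field: Box, texts: List[Box]) -> Optional[Box]:
    """Look for a label to the left of the field first, then above it."""
    left = [t for t in texts
            if t.x + t.w <= field.x and overlaps(t.y, t.h, field.y, field.h)]
    if left:
        return max(left, key=lambda t: t.x + t.w)  # closest text on the left
    above = [t for t in texts
             if t.y + t.h <= field.y and overlaps(t.x, t.w, field.x, field.w)]
    if above:
        return max(above, key=lambda t: t.y + t.h)  # closest text above
    return None


if __name__ == "__main__":
    field = Box("", 100, 10, 200, 30)
    print(find_label(field, [Box("Name", 20, 15, 60, 20)]))
```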

In a table, the fields are the table headers. Their values are obtained by taking the first row of the two-dimensional array of rows and columns produced by text recognition. To double-check that the first row really is the table header, you can apply length-based filtering, for example by filtering out rows that have fewer than three fields.
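
A minimal sketch of this length-based check follows. The threshold of three comes from the text; the function name and the choice to return None when no row qualifies are assumptions.

```python
from typing import List, Optional

MIN_HEADER_FIELDS = 3  # rows with fewer fields are not treated as headers


def pick_table_header(text_rows: List[List[str]]) -> Optional[List[str]]:
    """Drop rows that are too short, then take the first remaining row as the header."""
    candidates = [row for row in text_rows if len(row) >= MIN_HEADER_FIELDS]
    return candidates[0] if candidates else None


if __name__ == "__main__":
    rows = [["Orders"], ["ID", "Name", "Status"], ["1", "Ann", "Paid"]]
    print(pick_table_header(rows))
```

In this example, a one-cell title line above the table is skipped, so the header is the first row with at least three fields.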

Determining Types of Form and Table Fields

A form may have various field types, such as input fields, checkboxes, and radio buttons. A table may have plaintext fields and links. Therefore, we must determine the types of form and table fields.

The types of form fields come directly from object detection, because one of the object-detection tasks is to classify the type of each form field.

The types of table fields also come from object detection. All fields in a table column have the same type, so only the type of the first field in each column needs to be obtained.
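
For tables, the "first field in each column" rule can be sketched as below. The input format (one list of detected type names per body row), the output schema, and the fallback to "text" when no detection exists are all assumptions for illustration.

```python
from typing import Dict, List


def build_columns(headers: List[str], body_types: List[List[str]]) -> List[Dict[str, str]]:
    """Pair each header with the detected type of the first cell in its column."""
    first_row = body_types[0] if body_types else []
    columns = []
    for index, header in enumerate(headers):
        # Assumed fallback: treat a column with no detection as plain text.
        cell_type = first_row[index] if index < len(first_row) else "text"
        columns.append({"field": header, "type": cell_type})
    return columns


if __name__ == "__main__":
    print(build_columns(["orderId", "detail"], [["text", "link"], ["text", "link"]]))
```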

Calculating Other Attribute Information of Forms and Tables

Automatically extracting the above information significantly reduces the workload. Use the following methods to extract more attribute information:

Recursive recognition: After an input field is recognized, it can be extracted and passed to another model for recognition - for example, to verify whether it is disabled or required.
Extract text information at other locations: For example, there may be placeholders or default values.
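
Recursive recognition can be pictured as cropping the detected region and handing it to a second model. In the sketch below, the image is a plain nested list of pixel values and the classifiers are arbitrary callables; both are placeholders, since the article does not say which models or image formats are used.

```python
from typing import Callable, Dict, List, Tuple

Image = List[List[int]]            # rows of grayscale pixel values (placeholder format)
BoxXYWH = Tuple[int, int, int, int]  # x, y, width, height


def crop(image: Image, box: BoxXYWH) -> Image:
    """Cut the region of a detected component out of the full image."""
    x, y, w, h = box
    return [row[x:x + w] for row in image[y:y + h]]


def recognize_attributes(
    image: Image,
    box: BoxXYWH,
    classifiers: Dict[str, Callable[[Image], bool]],
) -> Dict[str, bool]:
    """Pass the cropped component to each secondary classifier (hypothetical models)."""
    patch = crop(image, box)
    return {name: classify(patch) for name, classify in classifiers.items()}


if __name__ == "__main__":
    image = [[c + 10 * r for c in range(4)] for r in range(3)]
    # Stand-in for a real "disabled" model: a simple brightness threshold.
    result = recognize_attributes(
        image, (1, 1, 2, 2), {"disabled": lambda p: sum(map(sum, p)) > 50}
    )
    print(result)
```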

Prospects

At this year's Apsara Conference, Daniel Zhang said that big data and computing power are the fuel and engine of the digital economy. In the frontend industry, componentization is taking shape, and the massive number of existing components can serve as big data. Computing power in the industry is also constantly improving. We are going to witness how artificial intelligence technologies reshape frontend development.
