Intelligently Generate Frontend Code from Design Files: Basic Component Recognition

By Boben

Background

In the Design2Code (D2C) project, we usually extract metadata from design files using developer plugins of design tools such as Sketch and Photoshop. These plugins enable us to quickly extract native elements such as images, text, and shapes from design files. We then use these elements to build the frontend pages we need. However, many basic development components (such as forms, tables, and switches) are not supported by design tools. Although tools such as Sketch let us design the corresponding UI components flexibly, the DSL descriptions of those components in Sketch are usually inaccurate. Therefore, we need to obtain accurate component descriptions in other ways. We believe that recognizing basic components through deep learning is one of the best ways to do so.

Design layers in Sketch

D2C Recognition Capabilities: Technical Architecture

This article describes the basic component recognition capability in the material recognition layer of the frontend intelligent D2C project. This capability recognizes predefined basic components in images to help downstream technical systems optimize recognized layers' descriptions. For example, it helps a downstream technical system optimize layers' nested structure to generate a standard component tree and optimize layer description semantics.

Technical architecture of D2C recognition capabilities

Overall Solution

Similar to other algorithm engineering solutions, a D2C solution involves the entire process from sample collection to sample evaluation, model training, model evaluation, and model prediction.

Algorithm engineering pipeline

Sample Collection and Evaluation

A good sample set is essential for a successful model. The sample quality determines the upper limit of model quality. We can optimize the model to get close to this upper limit. We can collect samples from various sources.

The basic component recognition model described in this article rarely needs to recognize the context of components. Therefore, we generate samples by using code. We use UI class libraries commonly used in the industry to ensure sample quality during programming.

Before generating samples, we need to classify the components to be recognized. We must comply with the following principles when generating samples:

Generate as many sample types as possible, with the same number of samples for each type. Component properties and styles may affect the model's generalization capability. Therefore, we must randomize the general style settings of the component library, such as the background color, width, height, and corner radius, within a reasonable range in order to generalize the component styles.
Consider designs such as text-on-image and image matting in specific scenarios, for example, when layers overlap.
Pad the blank edge areas of line-box components (for example, the input field) with several pixels to prevent the model from learning edge features.
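The randomization principle above can be sketched as follows. This is a minimal illustration, not the project's actual generator: the component types, the style ranges, and the names `COMPONENT_TYPES`, `STYLE_RANGES`, and `make_sample_specs` are all hypothetical. It only produces style specifications; rendering them with a UI library and taking screenshots is left out.

```python
import random

# Hypothetical component types and style ranges, chosen only for illustration.
COMPONENT_TYPES = ["button", "input", "switch", "table"]
STYLE_RANGES = {
    "width": (80, 320),        # pixels
    "height": (24, 64),        # pixels
    "border_radius": (0, 12),  # pixels
    "padding": (2, 8),         # blank margin around line-box components
}


def random_color(rng):
    """Return a random background color as a hex string such as '#1a2b3c'."""
    return "#{:06x}".format(rng.randrange(0x1000000))


def make_sample_specs(samples_per_type, seed=0):
    """Build the same number of randomized style specs for every component type."""
    rng = random.Random(seed)
    specs = []
    for component in COMPONENT_TYPES:
        for _ in range(samples_per_type):
            spec = {"type": component, "background": random_color(rng)}
            for name, (low, high) in STYLE_RANGES.items():
                spec[name] = rng.randint(low, high)
            specs.append(spec)
    rng.shuffle(specs)  # avoid feeding the model one type at a time
    return specs


specs = make_sample_specs(samples_per_type=5)
print(len(specs), specs[0])
```

Fixing the seed keeps a generated sample set reproducible, which makes it easier to compare models trained on the same data.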

The following figure shows a simple sample:

A sample for basic component recognition

After generating samples, we need to evaluate their quality. We can perform operations such as data verification and category statistics calculation to assess the overall sample quality:

Check the tagged data for errors. For example, calculate a variance based on the tagged areas and background colors.
Calculate the statistics about the distribution of data across all UI types and check whether each type's data volume is the same.
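The category statistics check could look roughly like the sketch below. It assumes the labels are available as a flat list of class names; the function name `class_balance` and the 10% tolerance around the median count are illustrative choices, not values from the project.

```python
from collections import Counter
from statistics import median


def class_balance(labels, tolerance=0.1):
    """Count samples per class and flag classes far from the median count.

    `tolerance` is the allowed relative deviation from the median.
    `labels` must not be empty.
    """
    counts = Counter(labels)
    mid = median(counts.values())
    unbalanced = {
        name: count
        for name, count in counts.items()
        if abs(count - mid) > tolerance * mid
    }
    return counts, unbalanced


labels = ["button"] * 100 + ["input"] * 98 + ["switch"] * 60
counts, unbalanced = class_balance(labels)
print(counts)
print(unbalanced)  # only "switch" deviates by more than 10% from the median
```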

Model Selection

Based on a recent report that compares object detection models on the PASCAL VOC datasets, we selected YOLO, a one-stage detector, as the base for transfer learning in our basic component recognition model. YOLO ranks first among the models listed in the report, and we use it to run quick tests.

Data analysis results of object detection models

The You Only Look Once (YOLO) algorithm involves the following three steps:

1) Resize images to 416×416 (v3)

2) Learn the sample set's classification features by training a convolutional network

3) Perform non-max suppression during prediction to filter boxes
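To make step 3 concrete, here is a minimal sketch of greedy non-max suppression in plain Python. It assumes boxes are `(x1, y1, x2, y2)` tuples and uses an illustrative IoU threshold of 0.5; real YOLO implementations usually run this per class and on tensors rather than Python lists.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: visit boxes by descending score, keep a box only if it
    does not overlap an already kept box by more than the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(10, 10, 110, 60), (12, 12, 112, 62), (200, 200, 260, 230)]
scores = [0.9, 0.8, 0.7]
print(non_max_suppression(boxes, scores))  # [0, 2]
```

The second box overlaps the first almost entirely, so it is suppressed, while the third box does not overlap anything and is kept.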

YOLO uses a single CNN model for end-to-end object detection. Compared with two-stage object detectors (such as R-CNN), YOLO is faster in training and prediction because it is a single-stage detector. It splits each input image into an S×S grid and makes each cell of the grid responsible for detecting objects whose central point falls in that cell. Each cell predicts the object's bounding box and the bounding box's confidence score. The confidence score indicates both how likely the bounding box is to contain an object and how accurate the bounding box is. YOLO integrates the prediction results of all cells to obtain the final prediction results. For more information about YOLO, visit the YOLO website.
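The grid assignment described above can be illustrated with a few lines of Python. The function name `responsible_cell` is hypothetical, and the grid size of 13 is only an example value (it is the coarsest grid YOLOv3 uses for a 416×416 input).

```python
def responsible_cell(box, image_size, grid_size):
    """Return (row, col) of the grid cell that contains the box center.

    box: (x1, y1, x2, y2) in pixels; image_size: (width, height).
    """
    x1, y1, x2, y2 = box
    width, height = image_size
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    # Clamp so a center lying exactly on the right or bottom edge stays in range.
    col = min(int(cx / width * grid_size), grid_size - 1)
    row = min(int(cy / height * grid_size), grid_size - 1)
    return row, col


# A component whose center is at (150, 70) in a 416x416 image.
print(responsible_cell((100, 50, 200, 90), (416, 416), 13))  # (2, 4)
```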

During object detection for basic web components, we want the convolutional network to learn the features of each component type in the corresponding cell. The network can then differentiate between different types of components. Therefore, we must select component samples with distinct features to ensure that the convolutional network learns these differences.

Model Evaluation

When evaluating an object detection model, we use the mean average precision (mAP) to measure the accuracy of the model, as is common for models trained on the Common Objects in Context (COCO) dataset. We take the model's predictions for part of the test set and compare them with the ground truth of that data to calculate the average precision (AP) for each category.

Then we can chart the AP of each category at an Intersection over Union (IoU) threshold of 0.5. As the chart shows, the detection precision for small objects is low because it is significantly affected by some text elements. In the future, we can improve the detection of small targets when pre-processing the samples.

Model-evaluation result based on mAP
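As a rough illustration of how AP and mAP are computed, the sketch below assumes each detection has already been matched against the ground truth at an IoU threshold of 0.5 and is given as a `(score, is_true_positive)` pair. It uses all-point interpolation of the precision-recall curve, which is one common convention; the article does not say which variant the project used.

```python
def average_precision(detections, num_ground_truth):
    """AP for one category from (score, is_true_positive) pairs."""
    if num_ground_truth == 0:
        return 0.0
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    precisions, recalls = [], []
    tp = fp = 0
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_ground_truth)
    # Interpolate: precision at each rank becomes the best precision
    # achievable at that recall or any higher recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Area under the interpolated precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(ap_per_category):
    """mAP is the mean of the per-category AP values."""
    return sum(ap_per_category.values()) / len(ap_per_category)


# Hypothetical detections for two categories.
ap = {
    "button": average_precision([(0.9, True), (0.8, False), (0.7, True)], 2),
    "switch": average_precision([(0.95, True), (0.6, True)], 2),
}
print(ap, mean_average_precision(ap))
```

In this example the "button" category scores an AP of about 0.83 because a false positive is ranked above one of the true positives, while "switch" reaches 1.0.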

Model Prediction

We can improve prediction results by optimizing the training data, specifically by resizing the images to a fixed size during pre-processing. Because we use YOLO for transfer learning, each input image is resized to 416×416 for training. A model trained with resized images predicts boxes with a much higher IoU than a model trained without resizing (a 10% improvement on the same test set).
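When images are resized, the labeled boxes have to be transformed in the same way. The article does not state whether images are stretched or padded, so the sketch below assumes a letterbox resize (scale to fit, then pad to a square), which preserves the aspect ratio of components. The function names are hypothetical.

```python
def letterbox_params(src_w, src_h, target=416):
    """Return (scale, pad_x, pad_y) for fitting an image into target x target."""
    scale = min(target / src_w, target / src_h)
    new_w, new_h = round(src_w * scale), round(src_h * scale)
    pad_x, pad_y = (target - new_w) // 2, (target - new_h) // 2
    return scale, pad_x, pad_y


def map_box(box, scale, pad_x, pad_y):
    """Map an (x1, y1, x2, y2) box from source pixels to resized-image pixels."""
    x1, y1, x2, y2 = box
    return (x1 * scale + pad_x, y1 * scale + pad_y,
            x2 * scale + pad_x, y2 * scale + pad_y)


# An 832x416 screenshot is halved to 416x208 and padded by 104 pixels
# at the top and bottom.
scale, pad_x, pad_y = letterbox_params(832, 416)
print(scale, pad_x, pad_y)
print(map_box((100, 100, 300, 200), scale, pad_x, pad_y))
```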

We can use a model trained with resized images to predict the results directly based on the input images and get rough bounding boxes. However, if we want the bounding boxes of UI components to be precise, we must refine the bounding boxes in each image using OpenCV. The following figure shows the comparison of bounding boxes before and after refining.

Optimization of the model prediction result
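The article does not describe the OpenCV steps used for refinement. As an illustration of the general idea only, the sketch below shrinks a rough box to the tightest rectangle around non-background pixels. It works on a plain 2D list of grayscale values to stay dependency-free; with OpenCV the same idea would typically be applied to an image array. The function name, the known background value, and the tolerance are all assumptions.

```python
def refine_box(gray, box, background, tolerance=10):
    """Shrink `box` to the tightest rectangle around non-background pixels.

    gray: 2D list of grayscale values, indexed as gray[y][x].
    box: (x1, y1, x2, y2) with x2 and y2 exclusive.
    """
    x1, y1, x2, y2 = box

    def is_foreground(x, y):
        return abs(gray[y][x] - background) > tolerance

    rows = [y for y in range(y1, y2)
            if any(is_foreground(x, y) for x in range(x1, x2))]
    cols = [x for x in range(x1, x2)
            if any(is_foreground(x, y) for y in range(y1, y2))]
    if not rows or not cols:
        return box  # nothing but background: leave the box unchanged
    return (cols[0], rows[0], cols[-1] + 1, rows[-1] + 1)


# A 10x10 white image with a dark component at x in [2, 7), y in [3, 6).
gray = [[255] * 10 for _ in range(10)]
for y in range(3, 6):
    for x in range(2, 7):
        gray[y][x] = 0

print(refine_box(gray, (0, 1, 9, 8), background=255))  # (2, 3, 7, 6)
```

This sketch can only shrink a box. A predicted box that cuts off part of a component would first need to be expanded by a small margin before refinement.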

Summary

Currently, the D2C project's basic component recognition capability supports more than 20 types of UI components. In the future, we will invest more R&D effort in classifying refined samples and in measuring and expressing basic components' properties. We will also standardize the management of model data samples for unified output. You will then be able to recognize some basic components and modify specific expressions based on our open-source sample sets.
