Intelligently Generate Frontend Code from Design Files: Field Binding

This article is part of a series on intelligently generating frontend code from design files.

By Xiaodi

As one of the four major technical directions of Alibaba's Frontend Committee, the frontend intelligent project created tremendous value during the 2019 Double 11 Shopping Festival: it automatically generated 79.34% of the code for new modules on Taobao and Tmall. During this period, the R&D team faced many difficulties and gave a lot of thought to how to solve them. In the series "Intelligently Generate Frontend Code from Design Files," we discuss the technologies and ideas behind the frontend intelligent project.

Overview

imgcook is an ingenious chef that specializes in cooking with various kinds of images, such as Sketch files, Photoshop Documents (PSD), and static images. With a single click, imgcook generates maintainable frontend code from different types of visual design files, including view code, field binding code, component code, and part of the business logic code. Intelligent field binding is one of imgcook's services: it accurately identifies bindable data fields in visual design files for vertical domains such as marketing. This service dramatically improves module development efficiency and increases the accuracy of design-to-code conversion.

Position in the Architecture Diagram of imgcook

As shown in the figure, the intelligent field binding layer is divided into the following parts: data type rules, static text recognition, image field binding, and text field binding.

[Figure: Hierarchical architecture of Design to Code (D2C) technology]

Pre-research

The intelligent field binding layer relies on the semantic layer, which uses empirical data to label what each node is. The field binding layer then maps that label to a business domain field. To improve accuracy, high-confidence rules are used as binding conditions. Our analysis of the existing problems, together with optimization suggestions, is as follows:

  • The semantic layer and the field binding layer are associated too tightly.

    • Problem: The semantic layer and the field binding layer are too tightly associated, resulting in poor flexibility.
    • Optimization: Separate the semantic layer's judgment process from that of the field binding layer and remove the concept of confidence.
  • The semantic layer uses hard rules.

    • Problem: The semantic layer uses hard rules and focuses on judging whether these rules are met.
    • Optimization: Replace hard rules with classification algorithms, and benchmark qualitatively against the node standards established by the W3C.
  • The semantic layer does not make full use of machine learning algorithms.

    • Problem: Machine learning algorithms are only used for entity recognition, syntax analysis, and translation.
    • Optimization: Use deep models for image classification and traditional machine learning algorithms for text classification.
  • Business domain fields frequently change.

    • Problem: The mapped fields vary based on different business domains.
    • Optimization: Support per-domain configurations so that field mappings can be bound intelligently.
  • The number of hard rules needs to be increased.

    • Problem: The number of existing hard rules is not enough.
    • Optimization: Create new rules based on design files to expand the rule layer.

Technical Solution

Field binding uses Natural Language Processing (NLP)-based text recognition and image classification to recognize the content of the virtual DOM (VDOM) and determine which data model fields it maps to, implementing intelligent field binding. The following figure shows the core flowchart of field binding.

[Figure: Core flowchart of field binding]

The core of field binding consists of the NLP-based text recognition and image classification models, which are described in detail below.
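To make the flow concrete, here is a minimal Python sketch of the dispatch step, assuming a simplified VDOM in which each leaf node is either text or an image. The names VNode, bind_fields, and field_map are hypothetical, and the two classifiers are passed in as plain functions standing in for the NLP and image models. The per-business field_map reflects the configurable mapping suggested in the pre-research.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class VNode:
    """Hypothetical, simplified VDOM node."""
    node_type: str                                   # "text", "image", or a container type
    content: str = ""                                # text content or image URL
    children: List["VNode"] = field(default_factory=list)
    binding: Optional[str] = None                    # business field bound to this node


Classifier = Callable[[str], str]  # maps node content to a predicted category


def bind_fields(root: VNode, classify_text: Classifier, classify_image: Classifier,
                field_map: Dict[str, str]) -> VNode:
    """Classify every text/image node and bind it to the configured business field."""
    stack = [root]
    while stack:
        node = stack.pop()
        stack.extend(node.children)
        if node.node_type == "text":
            category = classify_text(node.content)
        elif node.node_type == "image":
            category = classify_image(node.content)
        else:
            continue  # containers are traversed but not bound
        # Categories with no configured field (e.g. static text) stay unbound.
        node.binding = field_map.get(category)
    return root
```

Because the category-to-field mapping lives in configuration, a new business domain needs a different field_map rather than new binding code.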

NLP-based Text Recognition

Research

Classification and Analysis of All Dynamic Texts in Taobao's Design Files

We will use the following examples to illustrate the relationship between common fields in the business domain and texts in design files.

Product Title (itemTitle)

  • Design file text: Upto16characters
  • Real intent text: NikeAF1JESTERXX

[Figure: Product title design file]

Store Name (shopName)

  • Design file text: Upto16characters
  • Real intent text: ZARA Clothing

[Figure: Store name design file]

Store Description (shopDesc)

  • Design file text: Upto16characters
  • Real intent text: Up to 50% off

[Figure: Store description design file]

Technology Selection

Naive Bayes

One of the problems we face with NLP-based recognition in field binding is a shortage of samples. In particular, we rely on tenants uploading their own samples to train models for their specific business, and tenants often do not have much data. In this case, the Naive Bayes classifier is a good fit. It rests on classical probability theory: the posterior probability is the prior probability multiplied by an adjustment factor, and because the model has few parameters to estimate, it does not require a large amount of data. Naive Bayes is also robust to small datasets and noise. Bayes' theorem is expressed in the following formula:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

It is much easier to understand if we express the formula in the following form:

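$$P(\text{category} \mid \text{feature}) = \frac{P(\text{feature} \mid \text{category})\,P(\text{category})}{P(\text{feature})}$$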

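We only need to calculate P(category|feature) for each category and choose the category with the largest value. Because P(feature) is the same for every category, it can be ignored when comparing categories.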
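As an illustration, the following Python sketch implements a multinomial Naive Bayes text classifier from scratch; it is not the machine learning platform's implementation. It adds two standard details that the formula leaves implicit: the "naive" assumption that tokens are independent given the category, so P(feature|category) becomes a product over tokens, and Laplace smoothing (alpha) so that an unseen token does not zero out a category. Scores are compared in log space to avoid floating-point underflow. The class name and the sample categories are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes over pre-segmented tokens, with Laplace smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha
        self.class_counts: Counter = Counter()
        self.token_counts: Dict[str, Counter] = defaultdict(Counter)
        self.vocab: set = set()

    def fit(self, samples: List[Tuple[List[str], str]]) -> "NaiveBayesTextClassifier":
        for tokens, category in samples:
            self.class_counts[category] += 1
            self.token_counts[category].update(tokens)
            self.vocab.update(tokens)
        return self

    def log_score(self, tokens: List[str], category: str) -> float:
        # log P(category) + sum of log P(token | category); P(feature) is dropped
        # because it is identical for all categories.
        total = sum(self.class_counts.values())
        score = math.log(self.class_counts[category] / total)
        counts = self.token_counts[category]
        denom = sum(counts.values()) + self.alpha * len(self.vocab)
        for tok in tokens:
            score += math.log((counts[tok] + self.alpha) / denom)
        return score

    def predict(self, tokens: List[str]) -> str:
        return max(self.class_counts, key=lambda c: self.log_score(tokens, c))
```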

Word Segmentation

Before classifying a sample, we need to extract its features, that is, segment the sample text into words. On the AI machine learning platform, we use Alibaba Word Segmenter (AliWS) by default. AliWS is a lexical analysis system widely used across Alibaba Group's product lines. It provides ambiguity resolution in segmentation, multi-granularity segmentation, named entity recognition (NER), part-of-speech (POS) tagging, and semantic tagging. You can maintain your own dictionaries to handle or correct segmentation errors. In our project, NER is applied to simple entities, phone numbers, times, and dates.
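AliWS is internal, so the sketch below only illustrates the idea of replacing recognized entities with type placeholders before classification, so that the model learns "contains a phone number" rather than memorizing specific digits. The regular expressions (including the assumption that phone numbers are 11-digit mobile numbers starting with 1) are hypothetical stand-ins for NER, and the tokenizer only splits on non-word characters; it does not segment Chinese text the way AliWS does.

```python
import re
from typing import List

# Hypothetical stand-ins for NER on simple entities. Order matters:
# dates, times, and phone numbers are replaced before generic numbers.
ENTITY_PATTERNS = [
    ("<DATE>", re.compile(r"\d{4}[-/.]\d{1,2}[-/.]\d{1,2}")),
    ("<TIME>", re.compile(r"\d{1,2}:\d{2}(?::\d{2})?")),
    ("<PHONE>", re.compile(r"\b1\d{10}\b")),  # assumes 11-digit mobile numbers
    ("<NUM>", re.compile(r"\d+(?:\.\d+)?%?")),
]


def tokenize_with_entities(text: str) -> List[str]:
    """Lowercase, replace entities with type placeholders, then split into tokens."""
    text = text.lower()  # lowercase first so the uppercase placeholders survive
    for placeholder, pattern in ENTITY_PATTERNS:
        text = pattern.sub(f" {placeholder} ", text)
    return re.findall(r"<[A-Z]+>|\w+", text)
```

For example, "Up to 50% off" and "Up to 30% off" produce the same token sequence, which helps a small training set generalize.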

Model Construction

We use the machine learning platform for rapid model construction. The machine learning platform encapsulates the word segmentation algorithm of AliWS and the multi-class classification of Naive Bayes. The following figure shows the process of model construction.

[Figure: Process of training a text NLP model]

As the figure shows, we first run an SQL script to pull training samples from the database and perform word segmentation on them. The system then splits the samples proportionally into a training set and a test set, trains a Naive Bayes classifier on the training set, and uses the test set for prediction and evaluation. Finally, the results are uploaded to Object Storage Service (OSS) using the odpscmd command.
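The split-and-evaluate steps can be sketched as follows. Reading "proportionally" as a stratified split is our assumption: the sketch keeps each category's share the same in the training and test sets, which matters when some categories have only a handful of samples. The function names are hypothetical, and predict can be any callable that maps a token list to a category.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

Sample = Tuple[List[str], str]  # (segmented tokens, category)


def stratified_split(samples: Sequence[Sample], test_ratio: float = 0.2,
                     seed: int = 42) -> Tuple[List[Sample], List[Sample]]:
    """Split each category separately so both sets keep the same category proportions."""
    rng = random.Random(seed)
    by_label: Dict[str, List[Sample]] = defaultdict(list)
    for sample in samples:
        by_label[sample[1]].append(sample)
    train: List[Sample] = []
    test: List[Sample] = []
    for label in sorted(by_label):  # sorted for reproducibility
        group = by_label[label]
        rng.shuffle(group)
        n_test = int(round(len(group) * test_ratio))
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


def evaluate(predict: Callable[[List[str]], str], test: Sequence[Sample]) -> Dict[str, float]:
    """Overall accuracy plus per-category recall on the test set."""
    if not test:
        raise ValueError("test set is empty")
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for tokens, label in test:
        totals[label] += 1
        hits[label] += int(predict(tokens) == label)
    report = {f"recall/{c}": hits[c] / totals[c] for c in totals}
    report["accuracy"] = sum(hits.values()) / len(test)
    return report
```

Per-category recall is worth tracking alongside overall accuracy, because a rare field such as a store description can be misclassified every time while accuracy still looks high.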

Image Classification Model

Research

We find the following eight common types of images in various business scenarios: label, icon, avatar, store logo, scenario image, white background image, atmosphere image, and Gaussian blurred image.

  • Label: The label is a small rectangular image with a solid color background and short, white text on it.
  • Icon: Most icons are round. Usually, they are abstract symbols.
  • Avatar: The avatar is usually round with a face in the center.
  • Store logo: The logo is usually used to highlight a concept and is designed with text or abstract pictures.
  • Scenario image: The most common type of product image. It contains multiple objects and elements and shows how the product or the model looks in a real environment.
  • White background image (purePicture): It features a single object with a pure white background. It highlights the object.
  • Atmosphere image (pureBackground): It is a background image with a noticeable color palette and shapes.
  • Gaussian blurred image (blurBackground): It is a blurred background image using the Gaussian filter.

Previously, we used rules to recognize images based on information such as image size and position. However, this approach can be inaccurate and inflexible. For example, icon sizes vary across business scenarios, and information such as image position is highly uncertain. Meanwhile, our analysis shows that the categories above are sufficient to distinguish the images we encounter. Therefore, we decided to use a deep learning model for image classification.

Technology Selection

CNN

For image classification, our first choice is the Convolutional Neural Network (CNN), the most popular algorithm for image processing. The idea behind CNNs is based on the principles of human vision. Compared with traditional fully connected neural networks, CNNs analyze images with convolution kernels whose weights are shared across the whole image, which greatly reduces the number of parameters to be trained. In addition, CNNs have a clear advantage over traditional machine learning models in feature extraction.

So, how does a CNN work? Before looking at the principles of CNNs, let's take a look at the principles of human vision.

Human Vision Principles

  • Many research results in deep learning are inseparable from the study of cognitive principles, especially visual principles.
  • In 1981, the Nobel Prize in Physiology or Medicine was awarded to David Hubel, Torsten Wiesel, and Roger Sperry. Hubel and Wiesel were recognized for "their discoveries concerning information processing in the visual system." They discovered that the visual cortex is hierarchical.

The following figure shows human vision principles.

[Figure: Human vision principles]

[Figure: Example of facial recognition in the human brain (image sourced from the internet)]

The human visual system also recognizes different objects hierarchically.

[Figure: The human visual system recognizes different objects layer by layer (image sourced from the internet)]

We can see that the features at the bottom layer are similar across objects, mostly edges. At higher layers, more object-specific features are extracted, such as wheels, eyes, and torsos. At the top layer, these advanced features combine to form the corresponding images, which is how human beings accurately distinguish different objects.

This raises a question: can we simulate this property of the human brain by constructing a multi-layer neural network? Can we recognize primary image features at lower layers, combine several primary features into more advanced features at higher layers, and finally classify at the top layer? The answer is yes, and this insight inspired many deep learning algorithms, including CNNs.

Basic Principles of CNN

A CNN consists of an input layer, hidden layers, and an output layer. The hidden layers include convolution layers, pooling layers, and fully connected layers. The convolution layers extract features from the input data. The pooling layers downsample the feature maps, which reduces their dimensions and the number of parameters in later layers and helps avoid overfitting. The fully connected layers work like a traditional neural network and produce the final output, such as class scores.

[Figure: Basic principles of CNN]
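The following NumPy sketch walks through one pass of these layers on a toy grayscale image: a 3x3 vertical-edge kernel (an illustrative hand-picked choice, not a learned one) is convolved with the image, passed through ReLU, max-pooled, flattened, and fed into a fully connected layer with softmax. Like most deep learning libraries, conv2d actually computes cross-correlation, which is what CNNs call convolution.

```python
import numpy as np


def conv2d(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """'Valid' 2-D cross-correlation with stride 1 (what CNN libraries call convolution)."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(x, 0.0)


def max_pool(fmap: np.ndarray, size: int = 2) -> np.ndarray:
    """Non-overlapping max pooling; rows/cols that don't fill a window are dropped."""
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    return fmap[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))


def dense_softmax(features: np.ndarray, weights: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Fully connected layer followed by a numerically stable softmax."""
    logits = features @ weights + bias
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()


# Toy image: dark left half, bright right half, i.e. one vertical edge.
image = np.zeros((6, 6))
image[:, 3:] = 1.0
edge_kernel = np.array([[-1.0, 0.0, 1.0]] * 3)  # responds to a left-to-right brightness increase

fmap = relu(conv2d(image, edge_kernel))  # 4x4 feature map; each row is [0, 3, 3, 0]
pooled = max_pool(fmap)                  # 2x2 after pooling
probs = dense_softmax(pooled.ravel(), np.ones((4, 2)) * [0.5, -0.5], np.zeros(2))
```

In a real CNN, the kernel values and dense weights are learned by back-propagation, and each convolution layer applies many kernels in parallel.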

Transfer Learning

Our images come mostly from the internal network, and both the data and the computing resources available to us are limited, so we need a training method suited to these constraints. We therefore train the model through transfer learning, a technique that retrains a model that has already been trained on another task. Models such as VGG, ResNet, and MobileNet, trained on datasets such as ImageNet at great computational cost, are already capable of extracting image features. Building on these achievements, we only need to adjust the model to fit our data, which greatly reduces the training cost.

ResNet

In our project, we consider using ResNet for transfer learning. ResNet's fundamental motivation is to address the degradation problem: accuracy degrades as model depth increases, because deeper networks become harder to optimize. ResNet adds shortcut connections that let the input of a block skip ahead and be added to the block's output. During back-propagation, gradients can flow directly through these shortcuts, which alleviates the vanishing gradient problem and makes very deep networks trainable.
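The effect of shortcuts on gradient flow can be illustrated with a deliberately simplified, linearized toy model in NumPy; this is our own illustration, not part of ResNet itself. For a stack of linear layers, the Jacobian of the output with respect to the input is the product of the per-layer matrices: the product of W_l for a plain stack, and the product of (I + W_l) when each layer has an identity shortcut. With small weights, the plain product collapses toward zero, while the residual product stays close to the identity. residual_block shows the nonlinear form y = ReLU(F(x) + x); the depth, width, and weight scale are arbitrary assumptions.

```python
import numpy as np


def residual_block(x: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """y = ReLU(F(x) + x) with a two-layer residual branch F(x) = ReLU(x @ w1) @ w2."""
    branch = np.maximum(x @ w1, 0.0) @ w2
    return np.maximum(branch + x, 0.0)  # the identity shortcut adds the input back


def jacobian_norms(depth: int = 30, dim: int = 16, scale: float = 0.02, seed: int = 0):
    """Spectral norm of d(output)/d(input) for a linearized plain vs. residual stack."""
    rng = np.random.default_rng(seed)
    plain = np.eye(dim)
    residual = np.eye(dim)
    for _ in range(depth):
        w = scale * rng.standard_normal((dim, dim)) / np.sqrt(dim)
        plain = plain @ w                        # x_{l+1} = x_l W_l
        residual = residual @ (np.eye(dim) + w)  # x_{l+1} = x_l + x_l W_l
    return np.linalg.norm(plain, 2), np.linalg.norm(residual, 2)


plain_norm, residual_norm = jacobian_norms()
```

A side effect visible in residual_block: if the branch weights are zero, the block reduces to the identity for non-negative inputs, so adding a block cannot make the network's best achievable function worse. This is one way to see why residual networks avoid degradation.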

TensorFlow and Machine Learning Platform

The machine learning platform provides end-to-end services for traditional machine learning and deep learning, covering data processing, model training, service deployment, and prediction. At the underlying layer, it supports the TensorFlow framework, CPU/GPU hybrid scheduling, and efficient resource reuse. We use the platform's computing power and GPU resources for training and deploy the inference model to EAS, the platform's online prediction service.

Model Construction

Data Preprocessing

  • Data cleansing: We crawl about 1,000 images of each category from the promotion table in MaxCompute. Because merchants upload these images, many of them contain invalid, missing, or even incorrect data. For example, when processing these images, we found many white background images mixed up with product images. We need to clean up this data first.
  • Manual sampling: Among the common categories, atmosphere images are difficult to crawl as samples. However, they have distinctive features, so we create samples based on these features: we generated about 1,000 samples using node-canvas. In addition, Gaussian blurred images are usually blurred product images, so we use OpenCV to apply a Gaussian filter to crawled product images to obtain these samples.
  • Data augmentation: Because our scenario is unique, we cannot adopt some traditional data augmentation methods, such as Gaussian blur (one of our image categories is Gaussian blurred images). Instead, we created several simple augmentation methods, such as displacement and slight rotation.
  • TFRecord transformation: TFRecord is the data storage format designed and recommended by TensorFlow. Each TFRecord file contains multiple tf.train.Example records, and each Example corresponds to one sample (x, y). tf.train.Example is a Protocol Buffers (Protobuf) message. Protobuf is the serialization format developed by Google and plays a role similar to JSON in JavaScript or pickle in Python, but it is smaller, faster, and more efficient, and it is used throughout the TensorFlow source code. In our code, we create a tf.train.Example for each image in the dataset.

Each Example contains three features that are used later in training: image/encoded is the byte stream of the image, label is the classification category, and image/format is the image type, which slim.tfexample_decoder.Image uses when parsing the TFRecords.
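The article's own TFRecord code is not reproduced here. The following is a sketch of how such Examples can be built and written with the public tf.train.Example and tf.io.TFRecordWriter APIs, assuming TensorFlow 2.x. The feature keys match the three described above; the helper names are hypothetical.

```python
from typing import Iterable, Tuple

import tensorflow as tf


def _bytes_feature(value: bytes) -> tf.train.Feature:
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))


def _int64_feature(value: int) -> tf.train.Feature:
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))


def image_to_example(encoded: bytes, image_format: bytes, label: int) -> tf.train.Example:
    """Build one Example holding the three features used in training."""
    return tf.train.Example(features=tf.train.Features(feature={
        "image/encoded": _bytes_feature(encoded),      # raw bytes of the PNG/JPEG file
        "image/format": _bytes_feature(image_format),  # e.g. b"png" or b"jpeg"
        "label": _int64_feature(label),                # category index
    }))


def write_tfrecord(path: str, samples: Iterable[Tuple[bytes, bytes, int]]) -> None:
    """Serialize each (encoded, format, label) sample into one TFRecord file."""
    with tf.io.TFRecordWriter(path) as writer:
        for encoded, image_format, label in samples:
            writer.write(image_to_example(encoded, image_format, label).SerializeToString())
```

A training pipeline can read these records back with tf.data.TFRecordDataset and decode each one with tf.io.parse_single_example.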
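The Gaussian blur sample synthesis and the displacement and slight-rotation augmentation from the preprocessing list can be sketched in NumPy. This is a dependency-light stand-in for the OpenCV call the team used (cv2.GaussianBlur), not their code. The function names, the white fill value for uncovered pixels, and the nearest-neighbor rotation are assumptions chosen for simplicity; shift assumes the offsets are smaller than the image size.

```python
from typing import Optional

import numpy as np


def gaussian_kernel_1d(sigma: float, radius: Optional[int] = None) -> np.ndarray:
    """Normalized 1-D Gaussian kernel covering about +/-3 sigma."""
    radius = max(1, int(3 * sigma)) if radius is None else radius
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    return k / k.sum()


def gaussian_blur(img: np.ndarray, sigma: float = 2.0) -> np.ndarray:
    """Separable Gaussian blur of an HxW or HxWxC image (stand-in for cv2.GaussianBlur)."""
    k = gaussian_kernel_1d(sigma)
    r = len(k) // 2
    out = img.astype(float)
    for axis in (0, 1):  # blur along rows, then along columns
        pad = [(0, 0)] * out.ndim
        pad[axis] = (r, r)
        padded = np.pad(out, pad, mode="reflect")
        n = out.shape[axis]
        out = sum(k[i] * np.take(padded, np.arange(i, i + n), axis=axis) for i in range(len(k)))
    return out


def shift(img: np.ndarray, dy: int, dx: int, fill: int = 255) -> np.ndarray:
    """Displace the image by (dy, dx) pixels, filling uncovered pixels with white."""
    h, w = img.shape[:2]
    out = np.full_like(img, fill)
    src_y, dst_y = (slice(0, h - dy), slice(dy, h)) if dy >= 0 else (slice(-dy, h), slice(0, h + dy))
    src_x, dst_x = (slice(0, w - dx), slice(dx, w)) if dx >= 0 else (slice(-dx, w), slice(0, w + dx))
    out[dst_y, dst_x] = img[src_y, src_x]
    return out


def rotate(img: np.ndarray, degrees: float, fill: int = 255) -> np.ndarray:
    """Rotate about the center using nearest-neighbor inverse mapping."""
    h, w = img.shape[:2]
    cy, cx = (h - 1) / 2, (w - 1) / 2
    t = np.deg2rad(degrees)
    yy, xx = np.mgrid[0:h, 0:w]
    # For each output pixel, find the source pixel it comes from.
    src_y = np.rint(np.cos(t) * (yy - cy) + np.sin(t) * (xx - cx) + cy).astype(int)
    src_x = np.rint(-np.sin(t) * (yy - cy) + np.cos(t) * (xx - cx) + cx).astype(int)
    valid = (src_y >= 0) & (src_y < h) & (src_x >= 0) & (src_x < w)
    out = np.full_like(img, fill)
    out[valid] = img[src_y[valid], src_x[valid]]
    return out
```

Keeping the offsets and angles small (a few pixels, a few degrees) preserves the visual cues that separate categories such as labels and icons.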

Model Construction

Establish a Transfer Training Model

TF-Slim is a lightweight high-level TensorFlow API for defining, training, and evaluating models. TF-Slim provides many well-known pre-trained CNN models. You can download the model parameters from GitHub and use the network definitions in tensorflow.contrib.slim.nets to load the model structure. We define a predict function that specifies the parts of the data flow graph that the data passes through during training and prediction. Note that the pre-trained model provides only the convolution layers. To solve our classification problem, we flatten the output of the convolution layers and add a fully connected layer and the softmax function for prediction.
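In TensorFlow 2, tf.contrib and TF-Slim are no longer available. For readers on TF2, a comparable structure can be sketched with tf.keras: a pre-trained MobileNetV2 backbone with its classification top removed and frozen, followed by a Flatten layer and a softmax Dense layer for the eight image categories. This is a swapped-in Keras equivalent, not the article's TF-Slim code, and the article loads its MobileNet weights through TF-Slim. The input size, rescaling, and optimizer are assumptions.

```python
from typing import Optional

import tensorflow as tf

NUM_CLASSES = 8  # label, icon, avatar, store logo, scenario, white background, atmosphere, Gaussian blur


def build_transfer_model(num_classes: int = NUM_CLASSES, input_size: int = 224,
                         weights: Optional[str] = "imagenet") -> tf.keras.Model:
    """Frozen MobileNetV2 feature extractor plus a new Flatten/Dense(softmax) head."""
    base = tf.keras.applications.MobileNetV2(
        input_shape=(input_size, input_size, 3), include_top=False, weights=weights)
    base.trainable = False  # keep the pre-trained convolution weights fixed

    inputs = tf.keras.Input(shape=(input_size, input_size, 3))
    # Map pixel values from [0, 255] to [-1, 1], the range MobileNetV2 expects.
    x = tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1.0)(inputs)
    x = base(x, training=False)       # convolution feature maps only
    x = tf.keras.layers.Flatten()(x)  # flatten the convolution output
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)

    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    return model
```

Training then reduces to model.fit on the decoded images and integer labels. Passing weights=None builds the same architecture without downloading the ImageNet weights, which is handy for quick shape checks.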

Train the Model

We use the assign_from_checkpoint_fn function provided by TF-Slim to load the downloaded MobileNet pre-trained parameters. We then train with the data flow graph defined above and generate checkpoints and related logs during training.

Use the Model for Prediction

During training, we periodically save checkpoints by using TensorFlow's tf.train.Saver. TensorFlow generates the following four types of files:

  • .meta file: saves the graph structure of the model
  • .data file (for example, model.ckpt.data-00000-of-00001): saves the values of model variables, such as weights and biases
  • .index file: maps each tensor name to the location of its value in the .data file
  • checkpoint file: records the paths of the most recent checkpoints

As we can see, saving the model produces a lot of information, and much of it is not needed for prediction. Therefore, we optimize the exported model to achieve high-performance prediction.

First, we freeze the saved model. Freezing a TensorFlow model saves the computational graph and the model weights into a single file. It keeps only the parts of the graph required for prediction and removes the training-related parts. To do this, we convert all the variables in the computational graph to constants.
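A minimal sketch of the save-then-freeze step follows, written with the tf.compat.v1 graph APIs so that it mirrors the TF1-era workflow in the article. It builds a toy softmax graph (a hypothetical stand-in for the real classifier), saves a checkpoint with Saver, which produces the four kinds of files listed earlier, and then calls tf.compat.v1.graph_util.convert_variables_to_constants to replace variables with constants and prune everything not needed to compute the output node. The graph, node, and file names are assumptions. In TensorFlow 2 this API is deprecated, and SavedModel is the usual export format.

```python
import os

import tensorflow as tf


def save_and_freeze(ckpt_dir: str, output_node: str = "probs"):
    """Build a toy classifier graph, save a checkpoint, and return a frozen GraphDef."""
    graph = tf.Graph()
    with graph.as_default():
        x = tf.compat.v1.placeholder(tf.float32, [None, 4], name="input")
        w = tf.compat.v1.get_variable("w", [4, 8])
        b = tf.compat.v1.get_variable("b", [8], initializer=tf.compat.v1.zeros_initializer())
        tf.nn.softmax(tf.matmul(x, w) + b, name=output_node)
        saver = tf.compat.v1.train.Saver()
        with tf.compat.v1.Session(graph=graph) as sess:
            sess.run(tf.compat.v1.global_variables_initializer())
            # Writes model.ckpt.meta, model.ckpt.index, model.ckpt.data-00000-of-00001, and checkpoint.
            saver.save(sess, os.path.join(ckpt_dir, "model.ckpt"))
            # Replace variables with constants and keep only what output_node needs.
            frozen = tf.compat.v1.graph_util.convert_variables_to_constants(
                sess, graph.as_graph_def(), [output_node])
    # A single self-contained file for inference.
    tf.io.write_graph(frozen, ckpt_dir, "frozen_model.pb", as_text=False)
    return frozen
```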

Solution

Different business scenarios may use different data models, and the fields that can be intelligently identified and bound also differ. With this in mind, we developed an online model training service that allows us to add categories and configure samples online. The service is driven by configuration.

Flowchart

[Figure: Product flowchart]

The following figure shows the result of automatic field binding.

[Figure: Field binding effect]

Follow-up Planning

  • After analyzing the static texts in different scenarios, we concluded that the diversity of static texts across designs and modules affects field binding accuracy and leads to unexpected recognition results. Moving forward, we will sort out the static texts of different business scenarios and filter them out before recognition. We will also improve the general configuration and recognition capabilities.
  • Optimize NLP and image classification recognition models. Recall and analyze inaccurate recognition results. Optimize the recognition module and process based on the analysis to improve the binding accuracy.
  • Roll out the implementation of standardized fields.
Alibaba F(x) Team
