By Jinzhong and Minchao from F(x) Team of Taobao Technology Department, Alibaba Group
Layout restoration is a core part of the entire Design-to-Code (D2C) process. In this stage, imgcook converts design layers into a reasonable layout structure through a set of layout algorithms. The result is a more development-friendly tree with hierarchical relationships, such as the Document Object Model (DOM) structure of a web page or the Extensible Markup Language (XML) description used in native development.
Before the layout stage, the plug-in structures the entire visual draft and generates a flat JSON that records the absolute position, size, and style of each element. This flat JSON is enough to restore the visual draft accurately. In daily development, however, the relationships between components are more complex than a flat list of absolutely positioned elements. They include containment (one element enclosing another), repeated use of the same module, and different logical states of the same module, as shown in the figures below. The layout algorithm therefore needs to be upgraded to support these complex layouts.
The Loop in "Double 11 Recommendation"
Multiple states in "Double 11 Rebates"
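To make the contrast between a flat list of absolutely positioned layers and a hierarchical structure concrete, here is a minimal sketch that nests layers by geometric containment. The `Layer` fields and the `build_tree` helper are illustrative assumptions, not imgcook's actual schema or algorithm.

```python
from dataclasses import dataclass, field


@dataclass
class Layer:
    # Hypothetical flat record: absolute position and size of one design layer.
    name: str
    x: int
    y: int
    width: int
    height: int
    children: list = field(default_factory=list)


def area(layer):
    return layer.width * layer.height


def contains(outer, inner):
    """True if `outer` is strictly larger than `inner` and fully encloses it."""
    return (
        area(outer) > area(inner)
        and outer.x <= inner.x
        and outer.y <= inner.y
        and outer.x + outer.width >= inner.x + inner.width
        and outer.y + outer.height >= inner.y + inner.height
    )


def build_tree(layers):
    """Attach each layer to the smallest layer that encloses it; return the roots."""
    roots = []
    for layer in layers:
        parents = [p for p in layers if contains(p, layer)]
        if parents:
            min(parents, key=area).children.append(layer)
        else:
            roots.append(layer)
    return roots


flat = [
    Layer("page", 0, 0, 750, 1000),
    Layer("card", 20, 20, 300, 400),
    Layer("title", 30, 30, 200, 40),
    Layer("footer", 0, 900, 750, 100),
]
tree = build_tree(flat)
```

Containment alone only recovers parent-child nesting. Repeated modules and multiple states need the dedicated identification steps described in the later chapters.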
The following UI architecture diagram shows that a page is divided into six layers from top to bottom. Designs only present information at the element level. To make the page structure generated by imgcook more consistent with how front-end developers write code, the layout algorithm needs to combine elements into components, blocks, or modules.
UI Architecture Diagram
The following solutions can generate a structured schema.
Based on this framework, imgcook is committed to promoting the development of intelligent layout and making the generated code more in line with developer expectations and more friendly to developers.
The following chapters focus on page splitting, loops, and multi-status. As imgcook evolves, more intelligent, higher-level capabilities gradually replace the earlier process of manually written rules and manual intervention, which makes the whole restoration procedure more scalable and general. However, since the intelligent models are not 100% accurate in extreme cases, the lower-level capabilities are retained to keep the main procedure robust.
Different levels of Intelligent Capabilities of Layout Restoration
When imgcook is used to restore the entire page, the page will be split into several modules for maintenance.
Users can manually split a page into modules in the design by following the design module protocol, which gives them direct control over page splitting. To do so, add the protocol #module:Name# to the corresponding modules, as shown in the following figure.
The Result (Notice the component tree on the left where modules are identified)
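As a rough illustration of how a #module:Name# label could be read, the sketch below extracts the module name from a layer name. It assumes the label is written into the layer name in the design software; `parse_module_label` is a hypothetical helper, not part of imgcook's API.

```python
import re

# Matches the protocol label "#module:Name#" anywhere in a layer name.
MODULE_LABEL = re.compile(r"#module:([^#]+)#")


def parse_module_label(layer_name):
    """Return the module name declared in a layer name, or None if there is no label."""
    match = MODULE_LABEL.search(layer_name)
    return match.group(1).strip() if match else None


labeled = parse_module_label("#module:Banner# Group 12")
unlabeled = parse_module_label("Rectangle 3")
```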
In order to reduce manual intervention and improve the capability of automatic identification and segmentation, rules are used to match adjacent elements and determine whether these elements belong to the same submodule. The algorithm queries all adjacent rows and merges large block structures. The result is shown in the following figure.
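The merging of adjacent rows can be pictured with a small sketch. It assumes each row is reduced to a `(top, bottom)` pair and that rows closer than a hypothetical `max_gap` threshold belong to the same block; the real matching rules in imgcook are not described in this article and are likely richer.

```python
def split_into_blocks(rows, max_gap=16):
    """Group rows, given as (top, bottom) pairs, into blocks of vertically adjacent rows.

    `max_gap` is an assumed threshold in pixels, not a value taken from imgcook.
    """
    blocks = []
    block_bottom = None
    for top, bottom in sorted(rows):
        if blocks and top - block_bottom <= max_gap:
            # Close enough to the current block: merge the row into it.
            blocks[-1].append((top, bottom))
            block_bottom = max(block_bottom, bottom)
        else:
            # Large gap: start a new block.
            blocks.append([(top, bottom)])
            block_bottom = bottom
    return blocks


blocks = split_into_blocks([(0, 100), (110, 200), (300, 400), (405, 450)])
```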
Rule-based page splitting lacks generalization: it can only group elements into the same module based on the edges of texts and images. In practice, the results are often unsatisfactory. Enhanced page splitting was therefore developed, which splits modules intelligently through computer vision (CV). CV compares designs at the pixel level to improve the generalization of identification.
However, as shown in the preceding figure, although CV-based edge detection is more powerful than rule-based page splitting in identification and generalization capabilities, there are still some drawbacks of CV-based edge detection.
Loop layout is a layout mode commonly used in interface design. For example, the (card) list, navigation tab, and carousel widget all use the loop structure. In the code writing process, using loops properly can make the code structure more reasonable and greatly improve the efficiency. For example, in the case shown in the following figure, a complete list component can be obtained by simply implementing a sub-component and looping the sub-component vertically.
The following three phases are involved to generate a loop layout:
In the identification phase, loops are extracted from the schema by using algorithms and models. In the annotation phase, the elements in loops are screened and labeled with the serial number and the unique loop expression. In the generation phase, the logical base binds all the screened loops to loop variables and generates loop layouts in batches.
This section focuses on the identification phase, the first of the three phases.
Loop identification runs in the last part of the entire restoration process, as shown in the following figure. By the time the layout algorithm reaches its final stage, the nesting relationships between elements have stabilized, and the schema can be used for loop identification and calculation.
As a basic labeling capability, imgcook provides node labeling that can force some elements to be identified as loops. Prefixing consecutive loop elements with the #loop# label in the design software is enough for them to be identified as loop sub-elements during restoration, as shown in the following figure. A "5-loop node" can be obtained by generating code based on the following labels.
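A minimal sketch of how consecutive #loop# labels could be collected into loops is shown below. It assumes the label is a prefix of the layer name and that sibling layers arrive in document order; `find_labeled_loops` is a hypothetical helper.

```python
from itertools import groupby

LOOP_PREFIX = "#loop#"


def find_labeled_loops(layer_names):
    """Return runs of two or more consecutive sibling layers prefixed with #loop#."""
    loops = []
    runs = groupby(layer_names, key=lambda name: name.startswith(LOOP_PREFIX))
    for is_loop, run in runs:
        items = list(run)
        if is_loop and len(items) > 1:
            # Strip the label so only the original layer names remain.
            loops.append([name[len(LOOP_PREFIX):].strip() for name in items])
    return loops


loops = find_labeled_loops(
    ["header", "#loop# item", "#loop# item copy", "#loop# item copy 2", "footer"]
)
```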
Before identifying a loop, it is necessary to know why the loop layout exists. The loop layouts of the front end are largely related to corresponding abstract data structures on the server. In the e-commerce industry, especially in Mobile Taobao, most products are shown in lists or feed streams. The corresponding abstract data structure is ArrayList. Therefore, the front-end styles corresponding to similar data structures in the same module are also displayed in a loop.
Given this background, the loop layout generally has the following characteristics.
Take the common product cards in the marketing domain as an example, as shown below. Each product card has a similar layout with a square main image, a title in a large font, descriptive text, and a clear call to action.
Based on these characteristics, the first version of the loop identification algorithm was launched. First, the algorithm traverses all parent elements and performs preliminary screening on all child nodes of each parent element. The screening yields a set of parent elements that may contain loop structures. Next, the algorithm scans every sub-element of each candidate parent element and calculates the differences between them. If the sub-elements under a parent element have almost the same layout and style, the parent element is most likely a loop layout. Finally, the algorithm annotates all loop elements and submits them to the business logical base for unified code generation.
Both Level 1 and Level 2 are manually defined, rule-based node traversal algorithms. Rule-based methods cannot handle cases that do not conform to the rules. For this reason, D2C has turned to the booming AI field and uses the feature extraction capability of deep learning to find layout features in massive data. In this way, D2C implements end-to-end layout identification and reduces or even eliminates manually defined rules.
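The traversal described above can be sketched roughly as follows. The node shape (`type`, `width`, `height`, `children`), the size tolerance, and the notion of similarity are simplifying assumptions for illustration; the actual difference calculation in imgcook is not given in this article.

```python
def signature(node):
    """Coarse layout signature: size plus the types of the direct children."""
    child_types = tuple(child["type"] for child in node.get("children", []))
    return node["width"], node["height"], child_types


def similar(a, b, tolerance=2):
    """Two nodes are similar if their sizes nearly match and their child types are equal."""
    width_a, height_a, types_a = signature(a)
    width_b, height_b, types_b = signature(b)
    return (
        abs(width_a - width_b) <= tolerance
        and abs(height_a - height_b) <= tolerance
        and types_a == types_b
    )


def find_loop_parents(node, min_items=2):
    """Return every node whose children all look like repetitions of the first child."""
    found = []
    children = node.get("children", [])
    if len(children) >= min_items and all(similar(children[0], c) for c in children[1:]):
        found.append(node)
    for child in children:
        found.extend(find_loop_parents(child, min_items))
    return found


def card():
    return {
        "name": "card", "type": "div", "width": 340, "height": 200,
        "children": [
            {"name": "image", "type": "image", "width": 100, "height": 100},
            {"name": "title", "type": "text", "width": 200, "height": 40},
        ],
    }


page = {
    "name": "page", "type": "div", "width": 750, "height": 720,
    "children": [
        {"name": "header", "type": "text", "width": 750, "height": 80},
        {"name": "list", "type": "div", "width": 750, "height": 640,
         "children": [card(), card(), card()]},
    ],
}
loop_parents = [node["name"] for node in find_loop_parents(page)]
```

A production version would also compare styles and relative positions, which is where manually tuned thresholds become hard to maintain.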
After investigation, the Generative Adversarial Network (GAN), a data generation model, stood out. The intelligent layout identification algorithm introduces GAN from the CV field, in the hope that GAN can find layout features and convert the styles of all elements in the same layout. The working principles of GAN and the practical experience of applying it to intelligent layout identification are introduced in detail below.
In 2014, Ian Goodfellow and his colleagues proposed a heavyweight deep learning model called GAN. As an excellent image generation algorithm, GAN has become one of the most promising methods for unsupervised learning. GAN has a generator G (the generative model) and a discriminator D (the discriminative model). The former generates synthetic data that complies with the distribution of real data, while the latter determines whether data is real or synthetic. They compete with each other to enhance their own capabilities. Because deep convolutional neural networks (CNNs) can approximate a very wide range of functions, they are often used to build the generator and the discriminator. The framework of GAN is as follows.
In the preceding figure, the random noise z is the input of the generator G. G outputs synthetic data G(z) and the discriminator D determines whether the input data is real. If the input data of D is the real data x, D outputs D(x). If the input data is G(z), D outputs D(G(z)). The mathematical modeling of the loss function of GAN is as follows.
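For reference, the standard minimax objective of the original GAN, written with the symbols used above (real data x, noise z, generator G, discriminator D), is usually given as:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]
```

Here p_data is the distribution of real data and p_z is the noise distribution. D tries to maximize V by pushing D(x) toward 1 and D(G(z)) toward 0, while G tries to minimize V by pushing D(G(z)) toward 1.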
The formula above shows that GAN training strengthens both the discrimination capability of D and the capability of G to generate data that looks as real as possible. During training, D and G keep competing with each other to improve their capabilities until they reach a dynamic equilibrium: G generates realistic synthetic data, and D can no longer distinguish synthetic data from real data. In other words, D judges any input to be real with a probability of about 50%.
Conditional GAN (CGAN) is an evolved version of GAN that generates synthetic data conforming to the distribution of real data according to input conditions. It adds supervisory information to the original GAN. Specifically, the traditional GAN learns a mapping G: z -> y from a random vector z (that is, noise) to an image y. CGAN instead learns a mapping conditioned on an image, namely G: (y, z) -> s, in which y is the condition image and s is the synthetic image generated by the generator.
Dataset creation: For each sample, the elements that belong to the same layout group are replaced with a large white area, which produces the style-converted target image. A generated piece of training data is shown as follows.
Model training: The model is then trained with the pix2pix algorithm. The loss changes of the generator and the discriminator during training are shown below. Both losses decrease over the course of training, and the training is completed.
Effect presentation: Select some images randomly from the test sets for testing. The results prove that the model is effective.
As shown in the preceding figures, each image is replaced by a white area, indicating that the layout grouping is successful.
Once the loop is identified by the layout algorithm, the loop information is labeled under the smart.repeat field. The information describes the specific elements in the loop, the serial number of each element within the loop, and so on.
In the logical base, imgcook maps an array to the component in the form of Array.map. In the following example, a product list is generated by looping over the data.
In addition to loop layout, multi-status is also a very important part of front-end coding. An element may have different presentation and action states in different conditions. For example, in the following item card, the "Buy" button in the lower right corner has three different states depending on whether the item is available: Temporarily Unavailable, Appoint Now, and Order Now. The three states have a similar appearance, location, and layout, but some differences still exist.
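The idea of mapping an array onto a single sub-component can be sketched as below. The sketch is written in Python, where a generator expression plays the role of Array.map; the item fields and the HTML output are illustrative assumptions rather than code generated by imgcook.

```python
from html import escape


def render_item(item):
    """Render one product card; this is the single sub-component that gets repeated."""
    return (
        '<div class="item">'
        f'<img src="{escape(item["image"])}"/>'
        f'<span>{escape(item["title"])}</span>'
        "</div>"
    )


def render_list(items):
    """Map every array entry to the sub-component, like items.map(renderItem) in JavaScript."""
    return '<div class="list">' + "".join(render_item(item) for item in items) + "</div>"


products = [
    {"image": "a.png", "title": "Item A"},
    {"image": "b.png", "title": "Item B"},
]
markup = render_list(products)
```

Because the list is driven by data, adding a product only requires another array entry, not another copy of the card markup.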
This front-end pattern is called multi-status, and imgcook is gradually strengthening its capability to identify multi-status.
As of 2020, imgcook supports some Level 1 and Level 2 capabilities of multi-status identification. These capabilities include:
In addition to algorithm identification, manual annotation is provided so that multi-status can be generated by manual intervention if the algorithm fails to identify the target accurately. In the imgcook menu, click Generate Element Multi-status or press Ctrl + Shift + M to bind multiple elements as multiple statuses of one element. The layout algorithm then recognizes the element as a multi-status element.
The multi-status identification algorithm adopts logic similar to that of loop identification. It runs in the last layer of the procedure, where it extracts the different statuses of the same element, modifies the styles, and restores and binds the statuses in a unified manner.
The following figure shows the algorithm visualization.
Thus, the multiple statuses of an element can be extracted from the visuals and merged in a unified way. The front end can control which status is presented simply by specifying the parameters of the corresponding status, which improves R&D efficiency.
At present, the multi-status identification is developing towards Level 3, which uses models to identify the possible multi-status in the design.
Imgcook extracts similar elements in the design by using YOLO and analyzes the semantic correlation between these elements to determine whether they are multi-status elements. The main process is as follows.
After being identified by the layout algorithm, multi-status elements are labeled with multi-status information under the smart.layerProtocol.multiStatus field. The information describes which elements are in a multi-status cluster and which status each element corresponds to.
In the logical base, imgcook maps the presentation condition of each status to abstract logical data through the condition field. After binding the condition field, users can preview the style and logic of modules in different statuses by switching between different data. The example is shown in the following figure.
While the layout algorithm is continuously optimized, a system is needed to measure how much each optimization improves the entire restoration procedure. For example, how much do algorithm and model optimizations really help the generated code? Do they make sense, and do they really make R&D more efficient? To answer these questions, two evaluation methods have been released: the UI restoration evaluation and the layout maintainability evaluation.
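A minimal sketch of condition-based status selection is shown below, using the three "Buy" button states mentioned earlier. The `itemStatus` field, its values, and the shape of each status entry are hypothetical; the article does not specify the actual contents of the multi-status data.

```python
# Hypothetical multi-status cluster for the "Buy" button described earlier.
BUY_BUTTON_STATUSES = [
    {"condition": ("itemStatus", "unavailable"), "text": "Temporarily Unavailable"},
    {"condition": ("itemStatus", "presale"), "text": "Appoint Now"},
    {"condition": ("itemStatus", "on_sale"), "text": "Order Now"},
]


def resolve_status(statuses, data):
    """Return the first status whose condition field matches the bound data, else None."""
    for status in statuses:
        field_name, expected = status["condition"]
        if data.get(field_name) == expected:
            return status
    return None


# Switching the bound data switches the status that is presented.
presale = resolve_status(BUY_BUTTON_STATUSES, {"itemStatus": "presale"})
on_sale = resolve_status(BUY_BUTTON_STATUSES, {"itemStatus": "on_sale"})
```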
By using CV, the UI restoration evaluation compares the original image of the design with the rendered view of the layout restored schema after domain-specific language (DSL) coding. At the same time, the restoration evaluation calculates the restoration effect based on the visual similarity and the complexity of the DOM structure.
At present, the average score across 62,807 UI layout restorations is 92.1%. The average accuracy is 92.45% for Sketch designs and 88.21% for PSD designs. The main reason for the lower scores is the inability to automatically judge the width and height of non-box elements. For example, the proper width and height of WordArt cannot be determined, so a wrong box size is generated and needs to be adjusted manually.
However, the UI restoration evaluation only measures the consistency between the rendered UI and the visuals. It cannot ensure the rationality of the code structure. For this reason, another way to evaluate the maintainability of layout structures is needed.
The layout maintainability evaluation compares the schema generated by layout restoration with the schema saved after the user modifies it in the editor. By calculating the amount of user changes, the evaluation judges the restoration effect and availability.
Schema changes include changes to nodes, positions, styles, and attributes. Style and attribute changes are general capabilities and have little impact on the layout, whereas node and position changes reflect the core capabilities of the layout algorithm. Therefore, node and position changes are weighted more heavily than style and attribute changes when availability is calculated. The overall availability is then obtained by combining the weighted proportions of the changed parts.
The formula to calculate the maintainability of layout restoration is defined as follows.
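The weighting idea can be illustrated with a small sketch. The weights below are hypothetical and the calculation is a simplified stand-in, not the formula used by imgcook; it only shows how node and position changes can count more than style and attribute changes.

```python
# Hypothetical weights: node and position changes count more than style and attribute changes.
WEIGHTS = {"node": 0.4, "position": 0.4, "style": 0.1, "attribute": 0.1}


def availability(changed, total, weights=WEIGHTS):
    """Return 1 minus the weighted share of changed items, normalized over the types present.

    `changed` and `total` map each change type to a count of schema entries.
    """
    used = [kind for kind in weights if total.get(kind)]
    if not used:
        return 1.0
    weight_sum = sum(weights[kind] for kind in used)
    change_ratio = sum(
        weights[kind] * changed.get(kind, 0) / total[kind] for kind in used
    ) / weight_sum
    return 1.0 - change_ratio


totals = {"node": 10, "position": 10, "style": 10, "attribute": 10}
score = availability({"node": 1, "style": 5}, totals)
```

With these assumed weights, changing half of the styles lowers the score about as much as changing one node in ten, which reflects the priority given to structural changes.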
From the formula above, the change ratio of each subtree and the overall change ratio can be calculated. This availability evaluation method can more clearly reflect the issues that users encounter when using imgcook, such as changes in the node name, style, and DOM structure.
(Panel for Viewing Layout Changes)
In addition to field binding, node attributes, and style changes, a large number of loop nodes are deleted before saving. However, these nodes are not deleted manually. They will be automatically deleted in the subsequent business logic generation phase after identifying loops in the layout restoration phase.
In the 2020 Double 11 promotion venue, modules with loops accounted for 67.31% of the new modules, and modules whose loops were identified in the layout restoration phase accounted for 43%. In the business logic generation phase, redundant loop nodes were automatically deleted based on the layout identification result, and loop code was automatically generated.