By Miaojing and Boben
Frontend intelligence is one of the four major technical directions of the Frontend Committee of Alibaba. People may wonder what frontend development has to do with AI, how AI can be applied to frontend development, and whether it will heavily impact the whole industry.
Taking automatic code generation from design documents as its theme, this article examines these questions through background analysis, competitive product analysis, and problem resolution.
Machine learning is trending in the industry, and there is a broad consensus that AI is the future. Kai-Fu Lee also pointed out in "AI future" that artificial intelligence will replace nearly 50% of human work within 15 years, especially simple and repetitive tasks. Moreover, white-collar work will be easier to replace than blue-collar work: replacing blue-collar workers may need breakthroughs in robotics and related technologies in both software and hardware, whereas replacing white-collar workers needs technological breakthroughs in software only. Will our frontend "white-collar" work be replaced? When, and how much of it?
Looking back to 2010, software affected almost all industries, making the whole software industry prosperous in recent years. But by 2019, AI had begun to affect the software industry itself. In the DBA field, for instance, Question-to-SQL can automatically generate SQL statements from questions asked about a given domain. Meanwhile, TabNine, a machine-learning-powered source code analysis tool, assists in code generation. Moreover, an intelligent designer, "Luban", was launched in the design industry. What about frontend development?
This brings us to a familiar question: how to generate code automatically from a design document (Design2Code, referred to as D2C). The Frontend Committee of Alibaba focuses on intelligence as a direction, and the goal at the current stage is to improve web development efficiency. We will try to put an end to simple and repetitive work, enabling web developers to focus on more challenging work.
In 2017, Pix2Code, a paper about image-to-code generation, attracted the industry's attention. It describes generating source code directly from a design image with deep learning. Subsequently, similar ideas regularly emerged in the community. For instance, Microsoft AI Lab launched Sketch2Code in 2018, an open source tool for converting sketches into code. At the end of the same year, Yotako drew people's attention as a platform for converting design drafts into code. Since then, machine learning has firmly caught the attention of frontend developers.
Based on the analysis of competitive products, we drew the following lessons:
1) Currently, the object detection capability of deep learning in images is suitable for reusable material identification (module identification, basic component identification, and business component identification) with larger granularity.
2) The complete end-to-end model that generates code directly from images is highly complex, and the generated code is unreliable. We need several sub-networks to work together in order to achieve higher quality.
3) When the model cannot provide the expected accuracy, hard rules agreed in the design document can be used to intervene. On the one hand, this manual intervention helps users get the desired results. On the other hand, these manual rule protocols are also high-quality samples, which can be used as training data to improve the model's recognition accuracy.
The goal of generating code from the design document is to enable web developers to improve work efficiency and eliminate repetitive work. The general daily workflow is as follows for regular frontend developers, especially client-side developers.
The general workload of web development mainly focuses on view code, logical code, and frontend/backend integration. Next, we break down the goals and analyze them one by one.
In view code development, HTML and CSS code is generally written based on a design document. How can efficiency be improved here? When facing the repetitive work of UI view development, it is natural to think about packaging and reusing materials, such as components and modules. Various UI libraries emerged from this approach. There are even higher-level encapsulations in the form of platforms for building websites visually. However, reused materials cannot cover all scenarios, and many business scenarios need personalized views. Coming back to the problem itself, is it possible to generate reliable HTML and CSS code directly?
To sum up, we are facing the following problems:
We are building an expert rule system for the layout algorithm. At the current stage, this part is better suited to a rule system than to a model. For users, the layout algorithm needs to produce usable results close to 100% of the time. In addition, most of the problems involved here are combinations of numerous attributes and values, and rules are currently more controllable for such problems.
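To make the idea of a layout rule concrete, here is a minimal sketch of one such rule. It is an illustration only, not the rule system described in this article: the `Box` type, the `group_into_rows` function, and the "vertical overlap means same row" rule are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    """An absolutely positioned layer exported from a design document (hypothetical shape)."""
    name: str
    x: int
    y: int
    width: int
    height: int


def group_into_rows(boxes: List[Box]) -> List[List[Box]]:
    """Illustrative rule: layers whose vertical ranges overlap belong to the same row."""
    rows: List[List[Box]] = []
    row_bottom = 0
    # Visit layers from top to bottom so each row is closed before the next starts.
    for box in sorted(boxes, key=lambda b: (b.y, b.x)):
        if rows and box.y < row_bottom:
            rows[-1].append(box)
            row_bottom = max(row_bottom, box.y + box.height)
        else:
            rows.append([box])
            row_bottom = box.y + box.height
    # Within each row, order layers left to right.
    return [sorted(row, key=lambda b: b.x) for row in rows]


layers = [
    Box("price", x=120, y=10, width=80, height=20),
    Box("thumbnail", x=0, y=0, width=100, height=100),
    Box("title", x=0, y=110, width=200, height=24),
]
rows = group_into_rows(layers)
print([[b.name for b in row] for row in rows])
```

A rule like this is deterministic and easy to debug, which is the controllability argument made above; a real layout engine would need many more rules (columns, nesting, spacing) on top of it.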
However, when a problem is hard to solve with rules, we can use models instead. For instance, there are cases where we need to recognize groups and loops. In addition, web developers often build the UI with existing UI libraries, so it is important to recognize basic components in the design documents. For these problems, we use Pipcook to build an object detection pipeline to train our models. Moreover, contextual semantic recognition across elements is required, which is a key problem that deep learning addresses. For example, to recognize what an image in the design draft means or why certain text was used in a certain place, we need image classification and text classification models, which are also built with Pipcook based on tfjs-node.
Usually, web development also includes logic code, such as data binding, dynamic effects, and business logic. The part that can be improved is the reuse of dynamic effects and business logic code, which can be abstracted into basic components.
The ideal plan is to learn from historical data, as has been done in artistic fields such as poetry, painting, and music. New logic code could then be generated directly from the PRD's input. But can the generated code run directly without errors?
At present, although AI is developing rapidly, the problems it can solve are still limited. It is necessary to frame a problem as one of the problem types that AI solves well. Reinforcement learning is good at strategy optimization, and deep learning is better at computer vision tasks such as classification and object detection.
For business logic code, the first thing that comes to mind is to use an LSTM (long short-term memory) network, an NLP technique, to obtain the semantics of function blocks. The intelligent code suggestions in VS Code and TabNine use this strategy.
In addition, we found that intelligence can also help identify the location (timing) of logical points in the view and guess the logical semantics based on the view.
Let's summarize the advantages of intelligence at this stage:
Therefore, the problems that can currently be solved in business logic generation are relatively limited. This is especially true when new business logic points appear with new logic orchestration, since the references for them exist only in the PRD or in the mind of the PD. The current strategies for the business logic generation scheme are therefore as follows:
From the above analysis, we have described the strategies for intelligently generating HTML + CSS + part of the JS + part of the data. This is the primary process of D2C (Design2Code). The product we developed from this idea is imgcook. In recent years, third-party plugins for popular design tools (Sketch, PS, XD, etc.) have matured, and deep learning has developed rapidly, even outperforming human recognition capabilities. This is the vital background for D2C's birth and continuous evolution.
Object detection 2014–2019 paper
Based on the general analysis of the frontend intelligent development mentioned above, we have made an overview and architecture of the existing D2C intelligent technology system, which is mainly divided into the following three parts:
Summary layering of frontend intelligent D2C capabilities
Across the whole project, we use a single Data Protocol Specification (D2C Schema) to connect the different parts of the architecture shown above. This ensures that recognition results can be mapped to specific corresponding fields, and that code can be correctly generated by, for example, the code generation engine in the expression layer.
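As a rough illustration of how a shared schema can sit between recognition and code generation, the sketch below renders a small tree of nodes into HTML. The field names (`component`, `props`, `text`, `children`) and the `render` function are hypothetical and chosen only for this example; they are not the actual D2C Schema.

```python
from html import escape

# Hypothetical schema tree: the field names are assumptions, not the real D2C Schema.
schema = {
    "component": "div",
    "props": {"className": "card"},
    "children": [
        {"component": "img", "props": {"src": "cover.png"}, "children": []},
        {
            "component": "span",
            "props": {"className": "title"},
            "text": "Hello",
            "children": [],
        },
    ],
}

# Elements that have no closing tag in HTML.
VOID_TAGS = {"img", "br", "input"}


def render(node: dict) -> str:
    """Recursively turn one schema node into an HTML string."""
    parts = []
    for key, value in node.get("props", {}).items():
        name = "class" if key == "className" else key
        parts.append(' {}="{}"'.format(name, escape(str(value))))
    attrs = "".join(parts)
    tag = node["component"]
    if tag in VOID_TAGS:
        return "<{}{}>".format(tag, attrs)
    inner = escape(node.get("text", ""))
    inner += "".join(render(child) for child in node.get("children", []))
    return "<{}{}>{}</{}>".format(tag, attrs, inner, tag)


html_output = render(schema)
print(html_output)
```

Because every stage reads and writes the same tree, a recognizer only has to fill in a field and the generator does not need to know which recognizer produced it.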
In the entire D2C project, the core is the recognition capability part. The specific decomposition of this layer is as follows. The subsequent series of articles will focus on these subdivided layers.
Technology layering of D2C identification ability
Of course, incomplete recognition and low recognition accuracy have always been a major topic of D2C, and it is also our core technical point. We try to analyze the factors that cause this problem from these perspectives:
1) Problem definition is inaccurate: An inaccurate problem definition is the primary cause of inaccurate model recognition. Many people think that samples and models are the main factors, but the problem may already lie in how the problem was defined in the first place. We need to judge whether our model is suitable for the problem, and if so, how to define the rules clearly.
2) Lack of high-quality dataset: The intelligent recognition capability of each layer depends on different datasets. How many frontend development scenarios can our samples cover? How is the data quality of each scenario? Are the data standards uniform? Is the feature engineering processing unified? Does the sample have ambiguity? How is interconnectivity? These are the problems we are facing now.
3) Low model recall and misjudgment: We often pile up many different kinds of samples from different scenarios as training data, hoping to solve all recognition problems with one model. However, this often leads to a low recall rate for some of the model's classes, and ambiguous classes are also misjudged.
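A per-class recall breakdown is one simple way to see the effect described in the last point. The sketch below is an illustration with made-up labels and predictions (`button`, `icon`, and `banner` are hypothetical class names): overall accuracy looks acceptable while one class is mostly missed.

```python
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Recall per class: correctly predicted samples divided by all samples of that class."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


# Made-up ground truth and predictions, for illustration only.
y_true = ["button", "button", "icon", "icon", "icon", "banner"]
y_pred = ["button", "button", "icon", "button", "button", "banner"]

recall = per_class_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(recall, accuracy)
```

Here two of the three `icon` samples are misjudged as `button`, so `icon` recall is one third even though four of six predictions are correct overall.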
At present, the computer vision models in deep learning are more suitable for solving classification and object detection problems. To decide whether a deep model should be used for a recognition problem, we first ask whether we can judge and understand the problem ourselves, whether this kind of problem is ambiguous, and so on. If we cannot judge it accurately ourselves, the recognition problem may not be appropriate for a deep model.
If the problem is judged suitable for deep learning classification, you then need to define all the classes, which must be rigorous, mutually exclusive, and completely enumerable. For example, when assigning semantic labels to images, what are the common class names for common images? For example, the analysis process is as follows:
There are many such problems in D2C projects. The problem definition itself needs to be very accurate and well grounded, which is relatively challenging because there is no precedent for reference. You can only start from known experience and fix things after user testing reveals problems. This is a pain point that requires continuous iteration and improvement.
To improve sample quality, we need to establish standard specifications for these datasets, build multi-dimensional datasets for different scenarios, and process and provide the collected data in a uniform way. We expect to establish a standardized data system.
We are using Pipcook's standard data format. We provide a unified sample evaluation tool for different problems (classification and object detection) to evaluate each dataset's quality. For some specific models, more effective feature engineering (normalization, edge amplification, etc.) can be adopted. In the future, we also expect samples of similar problems to be shared and compared across different models, in order to evaluate the accuracy and efficiency of those models.
Data sample engineering system
To address low model recall and misjudgment, we try to narrow down the scenarios. Samples from different scenarios often share similar features, or have key features that affect local feature points, which leads to misjudgment and a low recall rate. We expect to improve model accuracy by training models on converged scenarios. We have converged on the following three scenarios: the wireless client-side marketing scenario, the mini-app scenario, and the PC scenario. Each of these scenarios has its own characteristics, and designing a separate recognition model for each one can efficiently improve recognition accuracy within that scenario.
Since a deep model is used, a more practical problem is that the model cannot recognize data whose features were not learned from the training samples, and its accuracy cannot fully satisfy users. Besides the samples, what can we do?
Throughout the D2C process, we also follow a methodology for the recognition models: design a set of protocols or rules that can cover the cases where deep learning gives wrong results. This ensures that users can still meet their needs when model recognition is inaccurate. The priority is: Manual convention > rule policy > machine learning > deep learning. For example, you need to identify a loop in the design draft:
Among them, parsing the conventions manually agreed in the design document has the highest priority. This ensures that subsequent processes are not blocked or disturbed by recognition errors.
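The priority order above can be sketched as a chain of recognizers that is tried from the most trusted source to the least trusted one. Everything in this sketch is an assumption made for illustration: the `#loop#` layer-name marker, the "identically sized children" rule, and the `ml_score` and `dl_score` fields stand in for whatever conventions, rules, and models a real system would use.

```python
from typing import Callable, List, Optional, Tuple

# A recognizer returns True/False when it has an opinion, or None to defer.
Recognizer = Callable[[dict], Optional[bool]]


def manual_convention(layer: dict) -> Optional[bool]:
    # Hypothetical convention: the designer prefixes the layer name with "#loop#".
    return True if layer.get("name", "").startswith("#loop#") else None


def rule_policy(layer: dict) -> Optional[bool]:
    # Hypothetical rule: two or more identically sized children form a loop.
    children = layer.get("children", [])
    if len(children) < 2:
        return None
    sizes = {(child["width"], child["height"]) for child in children}
    return True if len(sizes) == 1 else None


def machine_learning(layer: dict) -> Optional[bool]:
    # Stub: reads a precomputed score instead of running a real model.
    score = layer.get("ml_score")
    return None if score is None else score >= 0.5


def deep_learning(layer: dict) -> Optional[bool]:
    # Stub: reads a precomputed score instead of running a real model.
    score = layer.get("dl_score")
    return None if score is None else score >= 0.5


PIPELINE: List[Tuple[str, Recognizer]] = [
    ("manual convention", manual_convention),
    ("rule policy", rule_policy),
    ("machine learning", machine_learning),
    ("deep learning", deep_learning),
]


def is_loop(layer: dict) -> Tuple[bool, str]:
    """Return the first verdict found, together with the source that produced it."""
    for source, recognize in PIPELINE:
        verdict = recognize(layer)
        if verdict is not None:
            return verdict, source
    return False, "default"


print(is_loop({"name": "#loop# items", "dl_score": 0.1}))
```

In the example call, the manual marker wins even though the stubbed deep learning score disagrees, which is the behavior the priority order is meant to guarantee.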
After nearly two years of optimization, D2C was used for the first time across the full closed-loop development of marketing modules. This includes module creation, view code generation, logic code generation, writing supplementary logic code, and debugging.
In the Double 11 scenario, D2C covered the new modules of Tmall and Taobao across various scenarios, with 31 modules supported. About 79.34% of the code was generated by D2C, including automatically generated view code and some logic code. 98% of simple modules were generated automatically. The main reasons for manual changes to the code were new business logic, animations, field binding recognition errors, and loop recognition errors. These issues still need to be gradually improved.
D2C code generation user changes
As of 09 Nov 2019, the data is as follows:
Currently, the services available are as follows:
In the future, we hope that the frontend co-construction project will let us draw on collective strength to make frontend intelligent technology solutions widely accessible and to accumulate more competitive samples and models, providing services with higher accuracy and availability. We hope to reduce simple, repetitive work and help web developers focus on more challenging work.