Practices of Intelligent Frontend in Alibaba Cloud's Big Data R&D Platforms

By Jifeng and Qinqi from DataWorks Team

While AI is changing every aspect of our lives, it is also reshaping the research and development environment of every technical role, and Alibaba has been at the forefront of this shift. Take Alibaba's frontend intelligence team as an example: it has developed Design to Code (D2C) solutions such as Imgcook, the frontend algorithm engineering system Pipcook, Code to Code (C2C) solutions, intelligent UI, and other capabilities. This article focuses on some practices of C2C solutions in Alibaba Cloud's Apsara Big Data Platform. We hope these real-world applications give you a deeper understanding of frontend intelligence.

Background

As one of Alibaba Cloud's key products, Apsara Big Data Platform embodies the best practices Alibaba has accumulated over about ten years of big data development. It serves tens of thousands of data and algorithm engineers every day, underpins 99% of Alibaba's data business development tasks, and supports the construction of big data systems in a wide range of fields, such as smart cities, digital government, power supply, finance, new retail, intelligent manufacturing, and smart agriculture. The following figure concisely shows the platform's development history, product architecture, and frontend pages.

Challenges

The preceding figures show some features of the frontend pages of the Apsara platform:

Programming-Oriented: Many scenarios on the Apsara platform involve the Web IDE and the code editor, which support the programming work of more than 70% of users every day.
Visualized Interaction: The platform visually presents a large amount of data and the orchestration of tasks.

Apart from stability, the most important goal of an R&D platform is to improve user efficiency. Therefore, the top priority for the intelligent frontends of our products is to boost work efficiency.

Solutions

Given these business challenges, our frontend intelligence practice focuses on two areas:

Upgrading the intelligence level of product components
Developing unified algorithm engineering capabilities to ensure the continuous update and rapid deployment of algorithms

We have drawn up an overall intelligence development plan to address these issues:

Now, let's take a closer look at the intelligent editor, intelligent visualization, and algorithm engineering, respectively.

Intelligent Editor

The editor is one of the core tools in big data R&D, and we want it to help developers explore data quickly. So we have used intelligent techniques such as machine learning to equip the editor with additional core capabilities, such as intelligent code recommendation and code diagnosis.

Intelligent Code Recommendation

This feature enables the editor to suggest context-based options while you write code. Once you select a recommended option, the editor automatically completes the rest of the code you were about to type, which significantly improves coding efficiency. Using intelligent algorithms, we trained a code recommendation model on the coding habits of most users. Combined with the syntax rules of the language, it recommends the syntactically correct code that best fits the context.

The algorithms used for code recommendation are generally language models. Popular modeling algorithms include n-gram, LSTM, GPT, and CodeGPT (a GPT model pre-trained on programming languages). Since users have different code styles and coding habits, a general-purpose recommendation algorithm may not be the best solution. Inspired by Taobao's personalized search results, we created a lightweight client-side recommendation model that trains and recommends code on the user's own client, so that suggestions fit personal habits.
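To make the n-gram idea concrete, here is a minimal sketch of a personalized next-token recommender in Python. It assumes a trigram model with simple backoff, trained only on one user's own SQL history; the class and method names (`PersonalNgramRecommender`, `train`, `recommend`) are hypothetical, and this is not the DataWorks team's actual model, which the article does not describe in detail.

```python
import re
from collections import Counter, defaultdict

# Identifiers/keywords, integers, or any single non-space character (e.g. "," "=" "'").
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|\d+|\S")


def tokenize(code: str) -> list[str]:
    """Lowercase the code and split it into simple lexical tokens."""
    return TOKEN_RE.findall(code.lower())


class PersonalNgramRecommender:
    """Hypothetical n-gram next-token model trained on a single user's code history."""

    def __init__(self, n: int = 3) -> None:
        self.n = n
        # Maps a context (the 0..n-1 preceding tokens) to counts of the token that followed it.
        self.counts: defaultdict[tuple[str, ...], Counter] = defaultdict(Counter)

    def train(self, snippets: list[str]) -> None:
        for snippet in snippets:
            tokens = tokenize(snippet)
            for i, token in enumerate(tokens):
                # Record the token under every shorter context so recommend() can back off.
                for k in range(min(self.n, i + 1)):
                    context = tuple(tokens[i - k:i])
                    self.counts[context][token] += 1

    def recommend(self, prefix: str, top_n: int = 3) -> list[str]:
        tokens = tokenize(prefix)
        # Try the longest context first, then back off to shorter ones (down to the empty context).
        for k in range(min(self.n - 1, len(tokens)), -1, -1):
            context = tuple(tokens[len(tokens) - k:])
            if context in self.counts:
                return [token for token, _ in self.counts[context].most_common(top_n)]
        return []


history = [
    "select id, name from users where dt = '20240101'",
    "select id from orders where dt = '20240101'",
    "select count(*) from orders where dt = '20240102'",
]
model = PersonalNgramRecommender()
model.train(history)
print(model.recommend("select"))          # ['id', 'count']
print(model.recommend("select id from"))  # ['orders']
```

Because the counts come only from the current user's history, the ranking reflects that user's habits. A production system would additionally filter candidates against the language grammar, as described in the paragraph above, and would likely use a stronger model than a trigram.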

Code Diagnosis

Code defects, whatever their cause, have always been a headache for developers. Finding them early saves a great deal of labor and resources, which is why we introduced code diagnosis. Using engine capabilities, syntax rules, and code review records, we collected a large number of defect types and the corresponding defective code. After the intelligent algorithms are trained on this data, the editor can detect code defects automatically. For the model, a typical choice is a supervised learning model such as a support vector machine (SVM).
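The sketch below illustrates the supervised approach with a linear SVM trained by Pegasos-style stochastic sub-gradient descent, in plain Python with no ML library. The three SQL defect features (`SELECT *`, a missing `WHERE` clause, a `JOIN` without `ON`), the tiny labeled dataset, and all helper names are hypothetical; a real diagnosis model would learn from the much larger defect corpus described in the paragraph above.

```python
import random
import re


def extract_features(sql: str) -> list[float]:
    """Hypothetical defect features for one SQL snippet, plus a constant bias feature."""
    tokens = re.findall(r"\w+|\*", sql.lower())
    select_star = 1.0 if re.search(r"\bselect\s+\*", sql, re.IGNORECASE) else 0.0
    missing_where = 1.0 if "where" not in tokens else 0.0
    join_without_on = 1.0 if "join" in tokens and "on" not in tokens else 0.0
    return [select_star, missing_where, join_without_on, 1.0]


def dot(w: list[float], x: list[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear_svm(samples, labels, lam=0.01, steps=20000, seed=0):
    """Pegasos: stochastic sub-gradient descent on the L2-regularized hinge loss.

    labels must be +1 (defective) or -1 (clean).
    """
    rng = random.Random(seed)
    w = [0.0] * len(samples[0])
    for t in range(1, steps + 1):
        i = rng.randrange(len(samples))
        x, y = samples[i], labels[i]
        eta = 1.0 / (lam * t)
        inside_margin = y * dot(w, x) < 1.0  # hinge loss is active for this sample
        w = [(1.0 - eta * lam) * wi for wi in w]  # shrink step from the L2 regularizer
        if inside_margin:
            w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def predict(w: list[float], sql: str) -> int:
    return 1 if dot(w, extract_features(sql)) > 0 else -1


# Hypothetical labeled examples: +1 = defective, -1 = clean.
train_sql = [
    ("select * from orders", 1),
    ("select id from orders where dt = '20240101'", -1),
    ("select a.id from a join b", 1),
    ("select a.id from a join b where a.dt = '20240101'", 1),
    ("select a.id from a join b on a.id = b.id where a.dt = '20240101'", -1),
    ("select * from users where dt = '20240101'", 1),
    ("select name from users", 1),
]
weights = train_linear_svm([extract_features(s) for s, _ in train_sql], [y for _, y in train_sql])
print(predict(weights, "select * from logs"))                         # 1 (defect)
print(predict(weights, "select id from logs where dt = '20240102'"))  # -1 (clean)
```

A linear kernel keeps the sketch short; with richer features (for example, token n-grams of the defective code), the same training loop applies unchanged.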

Intelligent Visualization

Data lies at the very core of a big data platform, and data visualization is the best tool to bring out the characteristics and value of the data. It helps users discover the patterns of data quickly and visually, especially in data analysis scenarios.

Data Profiling

Data profiling is the process of examining the data available from an existing information source (e.g. a database or a file) and collecting statistics or informative summaries about that data. The purpose of these statistics may be to find out whether existing data can be used for other purposes easily. (Source: Wikipedia)

Data profiling obtains statistics and summary information about data. Its core tasks are to analyze data types and features automatically and to select charts automatically. We use the Analyzer and Statistics modules of DataWizard to analyze the types and basic features of fields. For chart selection, we recommend the following classic decision diagram, which recommends charts based on four purposes: comparison, distribution, composition, and relationship. We have encapsulated the entire profiling process into components, so you can use them to build your own profiling products.
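As a rough illustration of these two core tasks, the following Python sketch infers a field's type, computes basic statistics, and maps field types plus an analysis purpose (comparison, distribution, composition, or relationship) to a chart. It does not use DataWizard's API; the type heuristics, the `%Y-%m-%d` date format, and the chart rules are simplified assumptions loosely modeled on the classic chart-selection diagram.

```python
from datetime import datetime
from statistics import mean


def _is_number(value: str) -> bool:
    try:
        float(value)
    except ValueError:
        return False
    return True


def _is_date(value: str, fmt: str = "%Y-%m-%d") -> bool:
    try:
        datetime.strptime(value, fmt)
    except ValueError:
        return False
    return True


def infer_field_type(values: list[str]) -> str:
    """Classify a column as 'temporal', 'numeric', or 'categorical' (simplified heuristics)."""
    present = [v for v in values if v != ""]
    if present and all(_is_date(v) for v in present):
        return "temporal"
    if present and all(_is_number(v) for v in present):
        return "numeric"
    return "categorical"


def profile_column(values: list[str]) -> dict:
    """Collect basic statistics for one column; empty strings count as missing."""
    present = [v for v in values if v != ""]
    summary = {
        "type": infer_field_type(values),
        "count": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
    }
    if summary["type"] == "numeric":
        numbers = [float(v) for v in present]
        summary.update(min=min(numbers), max=max(numbers), mean=mean(numbers))
    return summary


def recommend_chart(field_types: list[str], purpose: str) -> str:
    """Map field types plus an analysis purpose to a chart type (hypothetical rule table)."""
    kinds = sorted(field_types)
    if purpose == "relationship" and kinds == ["numeric", "numeric"]:
        return "scatter"
    if purpose == "distribution" and kinds == ["numeric"]:
        return "histogram"
    if purpose == "composition" and kinds == ["categorical", "numeric"]:
        return "pie"
    if purpose == "comparison" and kinds == ["numeric", "temporal"]:
        return "line"
    if purpose == "comparison" and kinds == ["categorical", "numeric"]:
        return "bar"
    return "table"  # fall back to a plain table when no rule matches


table = {
    "dt": ["2024-01-01", "2024-01-02", "2024-01-03"],
    "region": ["east", "west", "east"],
    "gmv": ["120.5", "98", ""],
}
profiles = {name: profile_column(col) for name, col in table.items()}
print(profiles["gmv"])
# {'type': 'numeric', 'count': 3, 'missing': 1, 'distinct': 2, 'min': 98.0, 'max': 120.5, 'mean': 109.25}
print(recommend_chart([profiles["dt"]["type"], profiles["gmv"]["type"]], "comparison"))  # line
```

Keeping type inference and chart selection as separate steps mirrors the split between field analysis and chart recommendation; a learned data-to-chart model could later replace `recommend_chart` without touching the profiling code.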

In addition, we are training intelligent models based on data-chart mapping to identify data features and recommend charts intelligently.

Algorithm Engineering

Using Alibaba Cloud Machine Learning Platform for AI (PAI), we have built a general pipeline for training, evaluating, and deploying the intelligent models mentioned above.

Model Training – With the interactive modeling capability of PAI Data Science Workshop (DSW), we can quickly load data, preprocess it, and split it into training and test sets in Notebook mode. Users can then complete training with TensorFlow's mature model implementations.

Model Evaluation – Evaluation shows how well a model actually works. Because a model is meant to solve real production problems, its evaluation scheme must match the definition of those problems to reflect its true effect. Accuracy and recall are usually evaluated, and custom metrics are added based on the real-world problem. For code recommendation, for example, we measure time consumption and recommended code length in addition to Top-N accuracy. Only a comprehensive set of metrics reveals the real effect of a model.

Model Deployment – PAI Elastic Algorithm Service (EAS) lets you deploy models online: you can easily upload and deploy models and then call them through APIs.
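The evaluation idea for code recommendation can be sketched as a small harness that reports Top-1 and Top-N accuracy, average latency, and the average length of the top suggestion, plus precision and recall for a binary classifier such as a defect detector. The function names, the stub recommender, and the test cases are hypothetical; a real evaluation would run the deployed model against a held-out test set.

```python
import time
from statistics import mean
from typing import Callable


def top_n_accuracy(predictions: list[list[str]], targets: list[str], n: int) -> float:
    """Share of cases whose expected token appears among the first n suggestions."""
    hits = sum(1 for preds, target in zip(predictions, targets) if target in preds[:n])
    return hits / len(targets)


def precision_recall(predicted: list[int], actual: list[int], positive: int = 1) -> tuple[float, float]:
    """Precision and recall for a binary classifier such as a defect detector."""
    tp = sum(1 for p, a in zip(predicted, actual) if p == positive and a == positive)
    fp = sum(1 for p, a in zip(predicted, actual) if p == positive and a != positive)
    fn = sum(1 for p, a in zip(predicted, actual) if p != positive and a == positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def evaluate_recommender(
    recommend: Callable[[str], list[str]], cases: list[tuple[str, str]], n: int = 3
) -> dict:
    """Run recommend(prefix) on (prefix, expected_next_token) cases and report several metrics."""
    predictions, latencies_ms = [], []
    for prefix, _ in cases:
        start = time.perf_counter()
        predictions.append(recommend(prefix))
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    targets = [expected for _, expected in cases]
    top1_lengths = [len(preds[0]) for preds in predictions if preds]
    return {
        "top1_accuracy": top_n_accuracy(predictions, targets, 1),
        f"top{n}_accuracy": top_n_accuracy(predictions, targets, n),
        "avg_latency_ms": mean(latencies_ms),
        "avg_top1_length": mean(top1_lengths) if top1_lengths else 0.0,
    }


# Hypothetical stub recommender: suggestions keyed by the last token of the prefix.
SUGGESTIONS = {"select": ["id", "count", "*"], "from": ["orders", "users"], "where": ["dt"]}


def stub_recommend(prefix: str) -> list[str]:
    return SUGGESTIONS.get(prefix.split()[-1], [])


cases = [("select", "count"), ("select id from", "users"), ("select id from orders where", "status")]
report = evaluate_recommender(stub_recommend, cases)
print(report["top1_accuracy"], report["top3_accuracy"])  # 0.0 0.6666666666666666
```

The gap between Top-1 and Top-3 accuracy in this toy run shows why a single accuracy figure can mislead: the stub often has the right answer, just not in first place, and latency and suggestion length add further dimensions that accuracy alone does not capture.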

Outlook

Building on intelligent technology, we will keep pace with industry advances and integrate useful algorithms into our products to provide users with more powerful services.
We will also explore more intelligent scenarios and use machine learning to solve more problems.

Summary

Machine learning opens another door to problem-solving. We have found that many business challenges can be tackled with machine learning methods. Furthermore, I believe that knowing how to leverage machine learning will be a useful skill for everyone in the future. Like data analysis, machine learning will become more accessible. For example, Pipcook is a machine learning framework designed to be easy for frontend developers to use.

We hope more tools like this will become available. Alibaba Group is currently advancing frontend intelligence in PRD to Code (P2C), D2C, and C2C businesses to solve frontend business problems through intelligent means. Are you interested in this challenge? Join us!

Alibaba F(x) Team

You may also like

Comments

Alibaba F(x) Team

Related Products

AIRec

Artificial Intelligence Service for Conversational Chatbots Solution

Big Data Consulting for Data Technology Solution

Big Data Consulting Services for Retail Solution