By Yuanyan (Serverless & Engineering Performance Director, Alibaba CBU Technology Department)
First of all, I would like to introduce the business scenario. 1688 is an online shopping platform affiliated with the Domestic Core Business Unit (CBU) of Alibaba Group. It is Alibaba's earliest business and has a history of more than ten years. We are mainly responsible for 1688.com on the PC side and the Alibaba mobile app. 1688 is currently the largest B2B e-commerce trading platform in China. It provides small and medium-sized enterprises with e-commerce trading channels such as retail, wholesale, distribution, and custom processing.
I work on the Wireless Server Technical Team of 1688. The team provides business support for the mobile side and is responsible for building the various scenarios inside the 1688 app, such as home page recommendations and commodity details. It is a typical e-commerce business scenario.
E-Commerce Business Scenario
FaaS Efficiency Improvement Practices in 1688 Complex Business Scenarios
1688's exploration of Function as a Service (FaaS) technology dates back to around 2015. At that time, the biggest business goal of the whole Alibaba Group was ALL IN WIRELESS. As the mobile Internet emerged, the PC-side business (whether Taobao or 1688) had to be moved to mobile, with an app built to capture mobile traffic.
Against this business background, 1688's solution was to set up a wireless server layer. It called the existing PC-side service interfaces through the microservice system and then performed lightweight service logic composition and UI-layer mapping for the mobile frontend. Finally, service interfaces with the same capabilities could be provided quickly to the app through the mobile gateway.
Features on mobile Internet terminals iterate quickly, and this model has problems. Building, developing, deploying, and debugging in a traditional microservice system takes a long time, while frontend-facing business changes are frequent. This mismatch between technical capabilities and business demands pushed us to find better solutions.
The adoption of FaaS within CBU has gone through two major stages. In the first stage, in 2015, the department developed a dynamic loading system based on the JVM. It supports fast release, fast launch, and hot deployment, achieving the effect of FaaS. In the second stage, starting from last year, we worked with the Alibaba Cloud FC Team to replace the foundation of the entire FaaS capability with Alibaba Cloud Function Compute (FC) to obtain better Auto Scaling and container isolation capabilities.
As mentioned earlier, the rapid iteration of the entire wireless service requires us to release and change server interfaces quickly. Around 2015, the mainstream idea in Java engineering was to make full use of the dynamic loading feature of the JVM: compile external code into class bytecode in real time through some mechanism, without restarting the JVM, and then dynamically load it into the running JVM instance, thus achieving hot loading.
On this basis, we built MBOX, a dynamic service loading system based on the JVM. MBOX contains a generic lightweight service container. It can receive a piece of code from outside (a Java class or a simple Groovy script, for example) and compile it in real time into class bytecode. The container then applies certain security hardening to the generated bytecode (such as eliminating infinite loops). Finally, it loads the bytecode as a class into the JVM running online through a custom class loader, creates object instances, injects middleware agents, and provides services to the outside world.
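The receive, compile, check, load, and serve flow described above can be illustrated with a small sketch. MBOX itself is JVM-based and its code is not shown in this article, so the following is only a Python analogue of the idea: the `FunctionContainer` class, the `handle` entry-point convention, and the loop check are all assumptions made for illustration, not MBOX's actual design.

```python
import ast


class FunctionContainer:
    """Toy service container: named handlers can be replaced while the process runs."""

    def __init__(self):
        self._handlers = {}

    @staticmethod
    def _check(tree):
        # Crude stand-in for "security hardening": reject `while <truthy constant>` loops.
        for node in ast.walk(tree):
            if (isinstance(node, ast.While)
                    and isinstance(node.test, ast.Constant)
                    and node.test.value):
                raise ValueError("unbounded loop rejected")

    def publish(self, name, source):
        tree = ast.parse(source)                      # parse the submitted source text
        self._check(tree)                             # inspect it before loading
        code = compile(tree, filename=f"<{name}>", mode="exec")
        namespace = {}
        exec(code, namespace)                         # load into the running process
        self._handlers[name] = namespace["handle"]    # swap in the new version, no restart

    def invoke(self, name, request):
        return self._handlers[name](request)


container = FunctionContainer()
container.publish("greet", "def handle(req):\n    return 'hello ' + req['user']")
v1 = container.invoke("greet", {"user": "buyer"})

# Publishing again under the same name replaces the handler in place.
container.publish("greet", "def handle(req):\n    return 'hi ' + req['user']")
v2 = container.invoke("greet", {"user": "buyer"})
```

Note that every handler in this sketch shares one process, with no CPU or memory boundary between them, which is the same limitation the article goes on to describe for MBOX.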
Based on MBOX, we have realized online coding, online preview, and publishing within seconds. From the current perspective, it is a typical FaaS service platform with the following features:
For roughly five years, the MBOX system has carried more than 100,000 QPS of business calls across 1688. At its peak, it hosted more than 1,500 online functions, saving a lot of human resources, contributing greatly to the expansion phase of the wireless business, and opening a door for Serverless technology exploration.
There are many advantages to this system, but there are also shortcomings and risks.
The first is isolation. MBOX is based on the JVM, which provides no effective resource isolation mechanism (for CPU and memory, for example), so multiple services loaded in the same business container affect each other. For example, if code written by one person on business cluster A leaks memory today, the performance of the entire cluster may degrade, and all the services on it will be affected. This is a serious security risk.
The second is that the development model is too lightweight: only a code fragment can be written, in a script-based way. Development is fast and enjoyable, but there is no engineering structure and no framework, and design patterns cannot be used. As a result, the application scenarios are limited, and the quality of the code is poor.
The third is resource management, a recurring problem in MBOX maintenance. The load level of the entire cluster often soars, and there is no way to determine which service is occupying resources. Moreover, the system cannot perform Auto Scaling well and can only rely on manual expansion. The cost of platform maintenance is high in the later stages.
These problems with MBOX became prominent by 2019. At that time, the industry saw a wave of Serverless and cloud-native technologies under the influence of Kubernetes. We immediately started the corresponding technical research and, at the end of 2020, began a joint effort with the Alibaba Cloud Function Compute Team, hoping to build a truly cloud-native FaaS platform.
Building on the automated container O&M capabilities of Kubernetes, Alibaba Cloud Function Compute (FC) has built an open, customizable FaaS infrastructure with high elasticity and strong isolation. It has become the unified solution for FaaS capabilities within the entire Alibaba Group.
In addition to the highly mature and powerful elastic automatic O&M capabilities at the bottom layer, FC provides a highly open Runtime design. Any language or team can customize its own runtime framework, meeting the needs of front-line developers to the greatest extent.
To address the long-standing problem of cross-language middleware calls, the FC Team and the Middleware Team, drawing on Microsoft's open-source Dapr technology, have built a set of standardized Sidecar capabilities. It covers common middleware, such as RPC, cache, message queues, and the configuration center. While smoothing out differences between languages, it slims down the user's runtime container and improves the cold start and scaling speed of functions.
Finally, with the support of FC's strong underlying technology, we jointly built a common Runtime framework and supporting R&D and O&M facilities for Java developers in the group, replacing the original JVM-based MBOX system and completing the technical replacement of the FaaS capability.
CBU's Serverless evolution, from the earliest microservices model to the self-developed JVM FaaS system and then to the current Function Compute (FC), has been a search for the technical solution that best fits our business scenarios. It was also among the first steps in the industry toward adopting FaaS on a large scale. As one of the first departments to put the Serverless concept into practice at scale, our FaaS system has a business penetration rate of more than 80% in the department and has been in use for more than five years.
With the capabilities and implementation of FaaS understood, the most important question remains: how can you put FaaS capabilities to work in business systems?
Based on past practical experience, adopting FaaS in real business is not as smooth as most people think, and it is bound to run into problems. There are three key ideas:
Through practical exploration, we have summarized two contrasting models for combining existing complex business with FaaS capabilities: the BFF model and the extension point model.
The BFF model is a mainstream practice in FaaS. We can abstract the logic in traditional Serverful apps. Generally, the logic in a business scenario can be divided into two layers according to how often it changes. One part of the code is lightweight and has no complex dependencies, yet most product requirements are concentrated in it. This is called the change layer.
The other part may include the business application framework, second-party middleware dependencies, and some core business logic. It changes little, but transforming it is risky, and the benefits may not be satisfying. This is called the stability layer.
If your business application can be split along these lines, the BFF model is a good fit. You can extract the change layer and put it into Serverless functions, achieving an effect similar to a BFF (Backend for Frontend) layer. Frontend consumers reach the stability-layer APIs of the Serverful app through the function. This creates a business buffer that allows faster release and delivery with less O&M. A considerable part of the legacy code burden remains, but 80% of the energy can be concentrated on improving efficiency.
This model is suitable for frontend-facing business scenarios. For example, the controller layer in a traditional MVC application architecture is a good candidate for FaaS replacement.
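The split between the two layers can be sketched as follows. This is a minimal illustration, not 1688's code: `get_offer`, `get_price`, `detail_page_handler`, and all field names and values are hypothetical, and in a real deployment the stability-layer calls would be remote service invocations rather than local functions.

```python
# Stability layer: stand-ins for existing service APIs in the serverful app.
# (Hypothetical names and data; in practice these would be RPC calls.)
def get_offer(offer_id):
    return {"id": offer_id, "title": "Steel bolt M8", "unit": "piece"}


def get_price(offer_id, quantity):
    # Tiered pricing lives in the core service and rarely changes.
    unit_price = 0.50 if quantity < 100 else 0.40
    return {"unit_price": unit_price, "currency": "CNY"}


# Change layer: the BFF function. It only composes stable APIs and maps the
# result to what the page needs, so product tweaks are confined to this function.
def detail_page_handler(event):
    offer = get_offer(event["offer_id"])
    price = get_price(event["offer_id"], event["quantity"])
    total = round(price["unit_price"] * event["quantity"], 2)
    return {
        "headline": offer["title"],
        "price_text": f"{price['unit_price']:.2f} {price['currency']} / {offer['unit']}",
        "total_text": f"{total:.2f} {price['currency']}",
    }


view = detail_page_handler({"offer_id": "o-1", "quantity": 200})
```

A change such as rewording `price_text` or adding a field to the view touches only the function, which can be released independently of the serverful app.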
The second model is the extension point model, which is suitable for middle- and back-office scenarios or mid-end systems in the business. For example, our commodity center uses this model.
Middle- and back-office applications are complicated. They contain a great deal of business logic and have a long history, so they are not suited to drastic transformation. However, we can abstract the complex business logic layer to some extent, design key extension points for the future, and provide FaaS adaptation solutions for those extension points.
Subsequent incremental business logic can then be provided using FaaS capabilities. The existing business logic can remain unchanged: with only a slight adaptation of the code structure, it too can become a standard extension point implementation.
Another advantage of the extension point model is that it opens up an originally closed architecture. Once this model is adopted, even for mid-end applications, any business party can implement the extension it needs by providing a customized FaaS function, as long as the docking specification of the extension point has been defined. In the 1688 commodity mid-end system, this extension point model is what opens up and allows customization of the commodity price calculation logic.
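As a rough sketch of the extension point idea, the core system can keep its existing logic as the default implementation and look up a business-specific override at a well-defined point. Everything here is assumed for illustration: the `ExtensionPoint` class, the business identifiers, and the discount rule are hypothetical, and a local callable stands in for what would be a remote call to a FaaS function in a real system. Prices are in integer cents to keep the arithmetic exact.

```python
class ExtensionPoint:
    """A named hook with a default implementation and per-business overrides."""

    def __init__(self, name, default):
        self.name = name
        self._default = default
        self._impls = {}

    def register(self, business, func):
        # In a real system this would record a FaaS function reference
        # that satisfies the extension point's docking specification.
        self._impls[business] = func

    def invoke(self, business, *args):
        # Fall back to the existing core logic when no override is registered.
        return self._impls.get(business, self._default)(*args)


def default_price(base_price, quantity):
    """Existing core logic, kept unchanged as the default implementation."""
    return base_price * quantity


price_point = ExtensionPoint("price.calculate", default_price)


def wholesale_price(base_price, quantity):
    """Override supplied by one business party: 10% off for 100 units or more."""
    total = base_price * quantity
    return total * 9 // 10 if quantity >= 100 else total


price_point.register("wholesale", wholesale_price)


def order_total(business, base_price, quantity):
    # The core application calls the extension point and never the override directly.
    return price_point.invoke(business, base_price, quantity)
```

With this shape, `order_total("retail", 200, 100)` uses the default logic, while `order_total("wholesale", 200, 100)` picks up the registered override without any change to the core code path.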
When we first defined the programming interface of functions in the MBOX system, we used scripted programming. The user's programming granularity was a single piece of code, such as one Java class. This method is lightweight, but it results in a large number of functions. (Implementing even a slightly complex business may require many scripts. Since there is no engineering structure, the code quality is low, and design patterns are hard to apply.)
Therefore, when formulating the programming interface of functions based on FC, we set a rule based on this experience. The running granularity of user functions should be a Micro App instead of a Single Function. The granularity at the programming interface should be a Code Project instead of a Single Script.
Under this principle, a function instance is closer to a micro-application from the developer's point of view, and the simplest engineering structure is retained as a whole. A single function point can still be implemented at low cost, while developing complex logic and introducing second- and third-party libraries no longer causes serious function sprawl and fragmentation.
Even with Micro App-based function granularity, the number of functions used in business is still several orders of magnitude higher than the number of traditional micro applications. To solve this problem, we designed a four-level function classification of [business domain]-[function group]-[function]-[interface] and embedded plug-ins in the function engineering template. When a function is completed, the plug-in collects and reports this grouping and classification information to build a function service marketplace for internal R&D personnel. Everyone can see the current function classification and the ownership information of each interface API.
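One way to picture the four-level classification is as a record reported per interface and a nested catalog built from those records. The sketch below is an assumption about shape only: the `InterfaceRecord` fields, the `owner` attribute, and the sample names are hypothetical and do not come from the actual plug-in.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class InterfaceRecord:
    """One reported interface, classified on four levels plus an owner."""
    domain: str      # business domain
    group: str       # function group
    function: str    # function (a micro app)
    interface: str   # interface exposed by the function
    owner: str       # ownership information (hypothetical field)

    @property
    def path(self):
        return f"{self.domain}-{self.group}-{self.function}-{self.interface}"


def build_catalog(records):
    """Nest records as domain -> group -> function -> interface -> owner."""
    catalog = defaultdict(lambda: defaultdict(lambda: defaultdict(dict)))
    for r in records:
        catalog[r.domain][r.group][r.function][r.interface] = r.owner
    return catalog


records = [
    InterfaceRecord("offer", "detail", "detail_bff", "getDetail", "team-a"),
    InterfaceRecord("offer", "detail", "detail_bff", "getSkuPanel", "team-a"),
    InterfaceRecord("offer", "price", "wholesale_price", "calculate", "team-b"),
]
catalog = build_catalog(records)
```

Because one function can expose several interfaces, the catalog stays browsable by domain and group even as the number of functions grows.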
"Only focus on business" is the core concept that Serverless has advertised to developers since its inception. However, if you switch the underlying infrastructure (such as operation and maintenance) to Serverless's infrastructure and introduce FaaS and other related technical capabilities, it is far from the real "Only focus on business" for R&D personnel.
When Serverless was first adopted, R&D personnel seemed to write code much more efficiently. However, from the perspective of the overall business team's delivery of business requirements, R&D efficiency did not improve qualitatively. Most requirement development processes were still lengthy, and the communication and collaboration costs of moving a requirement forward were still high. This suggests that the bottleneck of R&D performance may lie not in R&D itself but outside the code.
Is Serverless's ability to improve R&D efficiency a false proposition? Certainly not. First, Serverless and FaaS do reduce O&M and coding costs and improve efficiency. Second, the emergence of Serverless technology has lowered the technical threshold of server-side development, enabling R&D personnel who are not server-side specialists to develop simple business logic. It makes full-stack development possible.
Consider the efficiency of overall requirement delivery: if one R&D engineer could independently complete all the development work for a requirement without communicating with others, efficiency would be maximized. Serverless makes it possible to adopt and popularize this new R&D mode, which may be the real meaning of "Only focus on business."
In a word, Serverless is not a silver bullet for improving efficiency. No technology is a silver bullet. When we expect to improve R&D efficiency, we should look at it from a global perspective instead of focusing on the R&D stage.
Finally, let's take a practical business scenario as an example to introduce how 1688 combines FaaS capabilities for R&D and efficiency improvement in a complex business scenario.
The 1688 commodity details business scenario is introduced below:
The commodity details page is the final page on which commodities are displayed to buyers, and it carries a large amount of commodity information. The 1688 commodity details page differs from those of ordinary consumer-facing (B2C) e-commerce platforms because B2B trading has many channels, such as spot wholesale, distribution, and custom processing, and the trading mode, price, and inventory logic of each channel are different. In addition, B2B e-commerce presents goods very differently across industries, such as consumer goods and industrial products. Customization for different channels and industries, overlaid with various e-commerce marketing activities, makes the business complexity of 1688's commodity details page high.
The original technical architecture involves multiple teams:
In this model, five or six teams work at the same time to deliver a requirement, and the communication cost is high. In addition, the server side adopts a relatively heavy micro-application model, which makes R&D and O&M inefficient.
First, we introduced FaaS capability in BFF mode at the frontend business logic layer, the part of the business core that changes most and fastest, and gathered all UI-related logic upstream into FaaS functions. On the one hand, this improves the efficiency of developing and deploying the corresponding logic on the server side. On the other hand, the frontend and client-side components only need to handle the simplest display logic, which smooths out technical differences between terminals as much as possible and enables some cross-terminal capabilities.
The main problem on the commodity backend is that the cost of integrating various customized business logic is high. Therefore, we apply the FaaS extension point model to the commodity information logic, abstracting the core logic (such as price and inventory) into standard extension points and connecting them to function gateways. This allows any business party to write a function based on a template to customize its business logic, thus opening up the closed architecture.
After the technical architecture transformation described above, the entire requirement development model and process changed. In the old model, customizing the commodity details for a business required involvement all the way from the business backend to the client, and many steps had to be changed at a high cost.
In the new model, everything from customizing the core business logic to implementing the business logic on the frontend presentation side can be achieved by writing simple FaaS functions. One person can complete the backend business logic changes. (If the frontend and client-side components can realize low code + cross-end development, the full-stack development of business requirements can be realized!)
Finally, let's look at how the overall business scenario improved after the transformation. With the FaaS R&D model, the delivery time for requirements related to commodity details fell by 80%, the release frequency rose by more than 300%, and requirement throughput also improved. Crucially, the investment in R&D personnel was reduced by 50%: the entire backend now needs only half a regular employee plus one outsourced employee instead of the original two full-time employees. The number of teams and people involved in requirement development has decreased considerably, and the entire R&D delivery process has become concise and clear.
To close, here is a brief summary and outlook on Serverless technology from the perspective of a business team.
Several key conclusions based on our experience in business scenarios:
As for the follow-up development of Serverless and FaaS, I would like to express some personal views. You are welcome to discuss them rationally.