By Ma Ruila, Founder of Wolai.com
Our daily work is almost inseparable from cloud documents. Documents are no longer used only for simple record keeping; they now support office collaboration, information organization, knowledge sharing, and more. Among the many online document applications in China, Wolai stands out for its novel features, fast iteration, smooth remote collaboration, efficient information organization, and block-based information integration.
People enjoy the unique functions and comfortable user experience of Wolai and want to know the technical architecture behind it. This afternoon, we invite Ma Ruila (wolai.com Founder) to discuss the Serverless architecture behind Wolai.
When we started Wolai, we hoped to build the architecture entirely on Serverless. During the technology selection stage, we researched several Serverless products in China and abroad in detail. We found that Alibaba Cloud Function Compute (FC) has outstanding advantages in support and overall solutions, and it matches our needs well. Therefore, we chose the Serverless architecture and Alibaba Cloud Function Compute (FC).
As an office collaboration application, Wolai lets multiple users edit the same document online at the same time. To achieve this, it is important to have a stable Web service interface and a distributed database that supports scaling, highly concurrent writes, and read/write splitting. When we found that Serverless products work well with distributed databases, we tentatively settled on the main architecture of Wolai.
Next, we began to verify the feasibility of using FC on Alibaba Cloud. Through verification, we found that FC can not only help us solve the problems above but also save labor costs and cloud resource costs.
Next, I want to discuss why I insisted on Serverless architecture at the initiation stage of my company. I used to work for a payment company, so I'll use a typical payment company as an example.
Suppose you set up a payment company; it needs more than 200 systems to support its business. If most of these systems are based on Java, the machines behind the payment business are all clusters. For every release, these clusters need to be divided into groups, and then the groups need to be taken offline and brought back online one by one, which requires huge labor costs.
In addition to cluster grouping, developers need to watch for bottlenecks in various systems (such as the cache and log systems). Once a problem occurs, a lot of energy must be spent on the entire operation and maintenance system. As the company grows, business volume eventually increases, and you will find that costs have increased significantly as well. For example, you need to deploy new machines and spend a lot of time scaling computing capacity (and this only scales out; there is no way to scale in, right?). So, traffic scaling will be the first problem you encounter, and it is inevitable.
Next, you will encounter the problem of traffic peaks and troughs. The concurrency of payment requests may be particularly high during the daytime or during promotions and flash sales, and low in the evening. If a large amount of traffic comes in at the same time, you need to allocate computing resources quickly, which requires a lot of operation and maintenance work.
It is difficult for a newly established company to put a lot of effort into the operation and maintenance of servers. At this time, you can choose Serverless to help solve the problems above.
You can safely hand over the work of server operation and maintenance to Serverless. All you need to focus on is your business logic.
In the past, when we built Web services, the focus of our development work was optimizing the Web server in all kinds of ways. All of it was meant to answer one question: can server performance improve after it scales up? If it can't, how do we optimize it?
Developers spend a lot of time on Nginx performance tuning, load balancing, reverse proxying, and other complicated optimization work. Using Function Compute is equivalent to removing the work of tuning the Web server, which saves labor costs and improves efficiency. With Function Compute used across the entire service, developers can put their energy into the business code without worrying about how to keep the service running stably.
Since the service launched on June 15, 2020, we have never encountered service downtime or offline maintenance (problems that were common before we used Function Compute). In the past, we spent a long time looking for the cause of a problem. We might need to upgrade the Web service, add a few machines, or set up reverse proxying and load balancing. Even after all this work, bringing the service back online still required a maintenance window, so it was difficult to keep the service online. Function Compute keeps the service continuously available, which allows us to release incrementally, steadily, and continuously.
At my former company, we did a version release every two weeks. Each release had a detailed release list, which involved a lot of conditions and dependencies. The scripts that O&M had to run were complicated, and any small error could lead to a minor or even a major accident.
When we put the entire architecture on Serverless and split all functions, the probability of accidents dropped. Even if there is a problem, I can solve it by rolling back quickly. Currently, our R&D team is used to releasing at least one version every day, containing everything fixed that day. Compared with traditional software companies, our iteration speed is much faster on the Serverless architecture.
Collaborative editing is the top priority for collaborative office products (such as Wolai), and it places high demands on algorithms. We have also solved this problem by using Function Compute.
Wolai's cloud note-taking is built on the concept of a block, which shrinks the smallest unit of information that users can access from a file to an information block. An information block can hold text paragraphs, tables, lists, embedded images, videos, and other information. Blocks can be easily edited, moved, and presented in real time to form a page. From here on, I will refer to an information block simply as a block.
Figure: separate blocks are shown on a red background.
After each user operation, our frontend uses a snapshot-like saving mechanism. If the user operates quickly, multiple operations within a specific time slice may be combined into one transaction and sent back to FC, where we record them. When a second user is working at the same time and also operates on the same block, their operations go through the same process.
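The time-slice batching described above can be sketched roughly as follows. This is a minimal illustration, not Wolai's actual frontend code: the `Operation` fields, the function name `batch_into_transactions`, and the 500 ms slice length are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Operation:
    # Hypothetical shape of one edit; the real payload format is not given in the text.
    user_id: str
    block_id: str
    timestamp_ms: int
    payload: str


def batch_into_transactions(ops: List[Operation], slice_ms: int = 500) -> List[List[Operation]]:
    """Group operations into transactions, one transaction per time slice.

    A new slice starts with the first operation that arrives at least
    `slice_ms` after the start of the current slice.
    """
    transactions: List[List[Operation]] = []
    current: List[Operation] = []
    slice_start: Optional[int] = None
    for op in sorted(ops, key=lambda o: o.timestamp_ms):
        if slice_start is None or op.timestamp_ms - slice_start >= slice_ms:
            if current:
                transactions.append(current)
            current = []
            slice_start = op.timestamp_ms
        current.append(op)
    if current:
        transactions.append(current)
    return transactions
```

Each inner list would then be sent to the backend as a single request, so a burst of fast keystrokes costs one function invocation instead of many.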
In the function, we calculate the order in which these operations actually affect the block and determine the final result. Then, FC issues a queue request: whenever a block (or the page it belongs to) has a change event, the event is pushed to Redis. Once a block or a page has been updated within five minutes, we invoke another function to generate snapshots of the entire page and the entire block. In this way, we combined function calls and queue calls into an automated system.
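As a rough illustration of ordering concurrent operations on a block and then emitting a change event, consider the sketch below. The text does not say which ordering rule is used, so the rule here (sort by timestamp, break ties by user ID, last write wins) is only an assumed stand-in, and an in-memory `deque` takes the place of the Redis queue.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Tuple

# (user_id, block_id, timestamp_ms, new_text): a hypothetical operation format.
Op = Tuple[str, str, int, str]


def apply_operations(blocks: Dict[str, str], ops: List[Op], change_queue: Deque[str]) -> None:
    """Apply operations block by block in a deterministic order, then queue change events."""
    by_block: Dict[str, List[Op]] = defaultdict(list)
    for op in ops:
        by_block[op[1]].append(op)

    for block_id, block_ops in by_block.items():
        # Assumed rule: earlier timestamp first, user ID as tie-breaker.
        for _user_id, _block_id, _ts, text in sorted(block_ops, key=lambda o: (o[2], o[0])):
            blocks[block_id] = text  # last write wins in this simplified model
        # One change event per touched block; a real system would push this to Redis.
        change_queue.append(block_id)
```

Whatever rule is chosen, the key property is that every replica applying the same set of operations arrives at the same block content.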
Once a user edits a page or block, we generate snapshots at a fixed interval. Currently, we save a snapshot every minute for a single block (the minimum unit, such as a single piece of text or a picture). That means if the user edits something within a one-minute interval, we turn it into a snapshot and store it on OSS. If a block is updated frequently, OSS will hold many one-minute snapshots for that block. Currently, we have more than 1 billion files on OSS. For page-level editing, Wolai saves a snapshot every five minutes, so the frequency is lower. We have built this snapshot saving system by combining FC and queues.
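The interval logic can be sketched as follows, assuming edits are bucketed into fixed windows and each window containing at least one edit yields exactly one snapshot. The one-minute and five-minute values come from the text; the function name and the bucketing rule are assumptions for illustration only.

```python
from typing import Iterable, List

BLOCK_INTERVAL_S = 60   # one snapshot per minute for a single block
PAGE_INTERVAL_S = 300   # one snapshot per five minutes for a page


def snapshot_windows(edit_times_s: Iterable[int], interval_s: int) -> List[int]:
    """Return the start time of every window that contains at least one edit.

    Windows with no edits produce no snapshot, so an idle block costs nothing.
    """
    return sorted({(t // interval_s) * interval_s for t in edit_times_s})


edits = [5, 30, 59, 61, 200, 310]  # seconds since some reference point
block_snapshots = snapshot_windows(edits, BLOCK_INTERVAL_S)  # [0, 60, 180, 300]
page_snapshots = snapshot_windows(edits, PAGE_INTERVAL_S)    # [0, 300]
```

The same six edits produce four block-level snapshots but only two page-level ones, which is why page snapshots accumulate more slowly.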
Wolai Serverless Architecture
After studying the user behavior of Wolai, we found that our users usually open Wolai documents at work every morning and keep using them throughout the day until they get off work.
Our users do not need to open and close the application quickly like applet users. On the contrary, they do not have particularly high requirements for the initial loading speed, so our focus is not on server-side rendering. Studying user habits helps us pay more attention to whether each step of a user's operation gets a quick response. There are two points involved:
Using Function Compute allows the frontend engineers at Wolai to take charge of the development process from front to back. Our research and development iteration speed is fast.
To achieve fast iteration and save time and effort, we separate each small function point and deploy many services on Function Compute. Each service has multiple functions, and we decouple them by splitting them manually. The advantage is that when a release contains only optimizations or bug fixes for one function, we only need to release that function.
Therefore, we can accumulate changes quickly and release every day. Most functions are decoupled and do not affect each other. We try to separate all functions and turn them into independent business logic. This ensures the speed of R&D iteration.
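One way to picture "release only the function that changed" is a small helper that maps changed source files to the functions that own them. This is a hypothetical sketch: the directory layout, the service and function names, and the helper itself are assumptions, not part of Wolai's or Function Compute's tooling.

```python
from typing import Dict, Iterable, List


def functions_to_release(changed_files: Iterable[str], function_dirs: Dict[str, str]) -> List[str]:
    """Return the functions whose source directory contains a changed file."""
    changed = list(changed_files)
    return sorted(
        name
        for name, prefix in function_dirs.items()
        if any(path.startswith(prefix) for path in changed)
    )


# Hypothetical layout: each function lives in its own directory.
FUNCTION_DIRS = {
    "doc-service/save-block": "functions/save_block/",
    "doc-service/snapshot": "functions/snapshot/",
    "user-service/login": "functions/login/",
}

to_release = functions_to_release(
    ["functions/snapshot/handler.py", "README.md"], FUNCTION_DIRS
)
```

Here only `doc-service/snapshot` would be redeployed; the other functions keep running the version they already have.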
Currently, there are ten R&D engineers on our team. Eight of them are frontend engineers. The effectiveness of the whole team has improved significantly.
During technology selection, we made a rough calculation of the cost of using Function Compute. Compared with the traditional architecture, Function Compute can save more than half of the computing cost, and the labor investment can be reduced by half or more.
If we chose the traditional architecture, we would need at least two O&M engineers for our application (based on the complexity of the current system). Now, frontend engineers can develop and maintain the system from beginning to end. With labor cost calculated at 30,000 yuan per engineer per month, including office space and hardware, a small company can save at least 700,000 to 800,000 yuan a year on O&M. The cost of computing resources also drops: if the traditional architecture costs 1 million yuan a year, Function Compute can reduce that to 500,000 yuan or less.
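The arithmetic behind these figures can be checked with a few lines. The sketch assumes the 30,000 yuan monthly figure applies to each of the two O&M engineers and that the savings are counted per year; both readings are inferred from the totals quoted, not stated outright.

```python
# Figures quoted in the text (yuan).
OM_ENGINEERS = 2                          # O&M engineers a traditional setup would need
MONTHLY_COST_PER_ENGINEER = 30_000        # assumed to be per engineer, incl. space and hardware
TRADITIONAL_COMPUTE_PER_YEAR = 1_000_000
FC_COMPUTE_PER_YEAR = 500_000             # upper bound after moving to Function Compute

labor_savings = OM_ENGINEERS * MONTHLY_COST_PER_ENGINEER * 12
compute_savings = TRADITIONAL_COMPUTE_PER_YEAR - FC_COMPUTE_PER_YEAR
total_savings = labor_savings + compute_savings

print(f"Labor savings per year:   {labor_savings:,} yuan")    # 720,000
print(f"Compute savings per year: {compute_savings:,} yuan")  # 500,000
print(f"Total savings per year:   {total_savings:,} yuan")    # 1,220,000
```

Under these assumptions the labor figure of 720,000 yuan falls inside the 700,000 to 800,000 range given above.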
With Function Compute, it took Wolai only a short time to go from technology selection to project completion. I am grateful to Alibaba Cloud for making such a product. I believe that Serverless technology at the computing level (especially at the resource scheduling level) will let more enterprises pay less attention to resources and focus more on how to provide better services to customers.