Serverless Splitting Practice for the Monolithic Applications of Idle Fish

This article discusses Serverless splitting practices and the benefits of automated splitting tools.

By Jian Chao, from Idle Fish Technology

Background

In 2018, we proposed and put into practice a cloud-edge unified R&D solution based on Flutter + Dart FaaS. It lowers the R&D threshold of the service-side business assembly layer by using the Serverless capabilities of being light (focused on business), fast (one function per interface, quick development, and quick deployment), and NoOps (O&M handled by the platform). Client-side developers also get the opportunity to participate in service-side business development, which eases client-server collaboration bottlenecks and improves the iteration efficiency of emerging businesses. The traditional application architecture of Idle Fish has a similar business assembly layer called idleapi.

Because the vertical business boundaries of applications and the hierarchical design of the architecture are unclear, almost all businesses are iterated on idleapi. New businesses keep accumulating, old businesses keep iterating, and expired businesses are not cleaned up in time, so the application keeps growing. As of the Double 11 Global Shopping Festival in 2020, idleapi provided more than 1,200 gateway interfaces. More than 500 of them had no business traffic (the business had been disabled), but their code was still running and had not been cleaned up. In total, idleapi contained more than 700,000 lines of code, over 2,000 business switches, and hundreds of business modules. With so many businesses, so much code, and so many development teams coupled in one application, a series of isolation problems followed.

Online Stability

Hundreds of business modules run in one application process and interfere with each other, which easily leads to isolation problems. For example, if one business module misbehaves (exhausting memory or thread pool resources), the other business modules deployed on the same machine have no resources left and start denying service, and the core businesses on that machine fail with them. Such incidents occur every year.

Low R&D Efficiency

Dozens of developers develop and maintain hundreds of business modules, and each release includes more than ten branches. Every additional business branch adds a risk of code conflicts. The larger the gap between a branch's baseline version and the baselines of the other branches, the more conflicts must be resolved and the longer it takes. According to statistics, one pre-release of idleapi takes about 30 minutes, of which 20 minutes are spent waiting for developers to resolve conflicts, so development efficiency is low.

Conflict Caused by Vertical Businesses

Idle Fish reorganized its teams by business domain so that each team develops its own businesses and watches its own business indicators, but the application structure did not keep up. Although each business group can be autonomous and cohesive and can communicate effectively within itself, all businesses are still coupled in one application, so a lot of energy goes into cross-group collaboration.

Governance - Splitting

Conway's Law observes that the structures of large systems tend to disintegrate during development, qualitatively more so than with small systems, and that organizations that design systems are constrained to produce designs that are copies of the communication structures of these organizations. In other words, large systems tend to be decomposed and reorganized during development until there is some homomorphism between the system architecture and the personnel structure. To solve the problems with idleapi, we decided to split it. Several issues must be considered before splitting:

  1. What is the splitting product? Is it a traditional monolithic application divided by business domain or a FaaS function based on business interfaces?
  2. During the splitting process, is the business code all rewritten or reused? How do we handle redundant business code?
  3. How do we migrate the configurations, monitoring, and alerts of each business?
  4. How do we verify quickly?
  5. How do we achieve a smooth grayscale release? How do we roll back? How do we deal with new requirements during business migration?
  6. Are there any measures to prevent the application/FaaS inflation problem from recurring after the application is launched?

The problems above are the key points of the splitting process and determine whether the splitting scheme can be implemented successfully. Next, let's analyze them one by one.

Traditional Applications vs. FaaS Functions

The first problem to be solved by the splitting: What is the target splitting product? There are roughly two ideas:

  1. Split into small traditional applications according to the business domain with independent development, deployment, and maintenance
  2. Split into corresponding FaaS functions based on the gateway interface

Based on the exploration and comparison over the past few years, we believe that the idea of FaaS functions is very suitable for solving the problems encountered by idleapi.

Debugging Period

First of all, with a traditional application, multiple interfaces are developed in parallel in one application during the debugging period. When code from different branches is released together, there is a risk of merge conflicts, and a pre-release deployment takes about 30 minutes.

For FaaS, one gateway interface corresponds to one FaaS function. Each FaaS function has an independent Git repository and deployment environment. FaaS functions are independent of each other and physically isolated. Developers can safely modify their code and baseline versions and can also initiate remote debugging at any time without hindering the debugging of other developers. Moreover, each FaaS function focuses on only one service gateway interface. The amount of code and internal services that FaaS functions depend on are much smaller than traditional applications. Therefore, pre-release deployment takes only three minutes at a time, which is nearly ten times faster than traditional applications.

Operating Period

During the operating period, each FaaS function runs on a different cluster. This natural physical isolation prevents FaaS functions from causing isolation faults. If a FaaS function runs out of the thread pool or disk resources, it does not affect the functions deployed on other clusters (except for associated businesses).
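The isolation argument can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not the Idle Fish platform: `Pool`, `serve`, and the module names `leaky` and `core` are hypothetical, and a "pool" stands in for any bounded resource such as a thread pool.

```python
class Pool:
    """A bounded resource (for example, a thread pool) with a fixed number of slots."""

    def __init__(self, slots: int) -> None:
        self.free = slots

    def acquire(self, n: int = 1) -> bool:
        """Take n slots if available; return False when the pool is exhausted."""
        if n > self.free:
            return False
        self.free -= n
        return True


def serve(pools, demand):
    """Apply each (module, slots) request in order and record whether it was served."""
    return {module: pools[module].acquire(n) for module, n in demand}


# A faulty module grabs every slot before the core module gets its turn.
demand = [("leaky", 8), ("core", 1)]

# Monolith: both modules share one pool inside the same process.
shared = Pool(8)
monolith = serve({"leaky": shared, "core": shared}, demand)

# FaaS: each function has its own pool on its own cluster.
faas = serve({"leaky": Pool(8), "core": Pool(8)}, demand)
```

In the shared case the `core` request is refused because `leaky` has exhausted the pool. In the isolated case both requests are served.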

Coding Period

Although FaaS functions have advantages in the debugging, operating, and O&M periods, traditional monolithic applications have advantages in the coding period. For example:

  • Code Reusability: The code of multiple services lives in one repository. Upper-layer services can call the underlying tool classes and manager classes directly, so code reuse is simple. In FaaS mode, different gateway interfaces live in different repositories. To reuse code, we must either copy it or extract shared code into internal packages or domain services, which creates code maintenance problems.
  • Software Version Upgrade: When Pandora or an internal package must be upgraded, a traditional application only needs to upgrade the version of the dependency and be released again. In FaaS mode, if business developers must modify and release each function one by one, the repeated work is hundreds of times that of a traditional application, which hurts development efficiency substantially. We are trying to solve this problem through platform-based tools or layering measures.
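As a rough illustration of what a platform-based upgrade tool could do, the sketch below plans a dependency bump across many functions at once. Everything here is an assumption for illustration: the manifest layout, the function names, the lowercase package key `pandora`, and the version strings are all hypothetical.

```python
def plan_upgrade(manifests, package, target):
    """Return updated manifests and the names of functions that must be re-released."""
    updated, to_release = {}, []
    for name, deps in manifests.items():
        new_deps = dict(deps)  # copy so the input manifests stay untouched
        if package in deps and deps[package] != target:
            new_deps[package] = target
            to_release.append(name)
        updated[name] = new_deps
    return updated, sorted(to_release)


# Hypothetical manifests: function name -> {dependency: version}.
manifests = {
    "item-detail": {"pandora": "1.0", "json": "2.1"},
    "search": {"pandora": "1.1"},
    "health": {"json": "2.1"},
}

updated, to_release = plan_upgrade(manifests, "pandora", "1.1")
```

Only functions that actually pin an older version end up in `to_release`, so the platform could rebuild and release them in one batch instead of asking each business developer to do it by hand.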

Splitting Tool

After the splitting solution is determined, idleapi will be split into hundreds of FaaS functions based on gateway interfaces from a giant monolithic application. It is unrealistic to re-implement so many businesses, so the best way is to reuse the business code in the monolithic application.

After analyzing the code, we found that the code of the businesses in idleapi references each other, forming an intricate, giant mesh structure. One business interface is tied to the code of five (or even ten) other business interfaces and involves nearly 1,000 source files, about a quarter of all idleapi source files, so simply following references does little to slim down the business code. Besides the business gateway entry, there are also various implicit function entries. For example, JSON serialization automatically calls the setter methods and bean initialization functions of a class. This poses a great challenge to splitting business code manually.

To this end, we designed and implemented a code splitting tool. It analyzes the classes, methods, and properties that a business entry function depends on within the interwoven code and excludes those that are never called. The tool reduces the number of source files that a single business entry depends on to about 100, 70% of which are interface data types. Combined with the FaaS business framework that we designed and implemented, developers who migrate a business can split the business code, create the FaaS function, and deploy it to the pre-release environment with one click. The whole process takes less than half an hour. For business switch configurations, we also provide a migration tool that migrates online or pre-release configurations to the new functions in batches with one click, which eliminates the manual copying and approval that migration by hand requires.
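The core idea of such a tool can be sketched as a reachability search over a dependency graph. This is a minimal sketch and not the actual Idle Fish tool: the graph format, the function names `reachable` and `split`, and all symbol names are hypothetical, and a real tool would build the graph from parsed source code.

```python
from collections import deque


def reachable(deps, entries):
    """Breadth-first search: every symbol transitively referenced from the entries."""
    seen = set(entries)
    queue = deque(entries)
    while queue:
        symbol = queue.popleft()
        for callee in deps.get(symbol, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


def split(deps, gateway_entry, implicit_entries):
    """Keep only what one gateway entry needs, plus implicit entries of kept classes."""
    keep = reachable(deps, [gateway_entry])
    # Implicit entries (for example, setters invoked during JSON handling) have no
    # explicit caller, so add them whenever their owning class is kept, and repeat
    # until nothing new appears.
    while True:
        extra = [m for cls, methods in implicit_entries.items() if cls in keep
                 for m in methods if m not in keep]
        if not extra:
            return keep
        keep |= reachable(deps, extra)


# Hypothetical dependency graph: symbol -> symbols it references.
deps = {
    "ItemApi.detail": ["ItemManager.load", "ItemDTO"],
    "ItemManager.load": ["ItemDTO"],
    "ItemDTO.setPrice": ["PriceUtil.parse"],
    "OrderApi.create": ["OrderManager.save"],
}
implicit = {"ItemDTO": ["ItemDTO.setPrice"]}

kept = split(deps, "ItemApi.detail", implicit)
```

Here the order-related symbols are excluded because nothing reachable from `ItemApi.detail` references them, while `ItemDTO.setPrice` and its helper are kept even though no business code calls the setter explicitly.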

Automated Regression Testing

Testing is the last barrier that ensures the quality of the split business code. To reduce the extra workload that application splitting brings to business personnel and developers, we collaborated with the FaaS platform and the automated regression testing platform and adapted recording, playback, and other regression testing functions to the SideCar and Pod architectures of the FaaS platform. After a FaaS function is released, developers only need to record online traffic on the traditional application and then import that traffic into the FaaS function under test to run automated regression tests.

Developers can complete the regression testing of the business by themselves by combining it with the automated testing platform. This reduces the risk of business migration and the pressure on testing personnel and improves migration efficiency.
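The record-and-replay idea can be sketched as follows. This is a simplified illustration, not the testing platform's API: `replay`, the request and response shapes, and the ignored `traceId` field are all assumptions.

```python
def replay(recorded, function_under_test, ignore=("traceId",)):
    """Replay recorded requests and return the ones whose responses differ."""

    def normalize(response):
        # Drop fields that legitimately differ between runs.
        return {k: v for k, v in response.items() if k not in ignore}

    mismatches = []
    for request, expected in recorded:
        actual = function_under_test(request)
        if normalize(actual) != normalize(expected):
            mismatches.append(request)
    return mismatches


# Traffic recorded on the monolith: (request, response) pairs.
recorded = [
    ({"itemId": 1}, {"price": 10, "traceId": "a"}),
    ({"itemId": 2}, {"price": 20, "traceId": "b"}),
]


def migrated_function(request):
    """Stand-in for the split FaaS function."""
    return {"price": request["itemId"] * 10, "traceId": "x"}


def broken_function(request):
    """Stand-in for a split that lost some business logic."""
    return {"price": 0, "traceId": "y"}


ok = replay(recorded, migrated_function)
bad = replay(recorded, broken_function)
```

An empty result means the split function answered every recorded request the same way the monolith did. Any returned request points at behavior that changed during the split.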

O&M

For the O&M of FaaS businesses, we try our best to preserve developers' O&M habits. A split FaaS function keeps the log names, log format, and code it had in the monolithic application, and developers can still log in to the remote machine. We also adapted the business-specific logs to the white-screen (GUI-based) log function of the FaaS platform, so developers can search the logs of all machines from the control platform, which is much more efficient than logging in to machines and checking them one by one. The log-based monitoring and alerting system only needs the monitored business log path updated to complete the monitoring migration.
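Because log names and formats are unchanged, migrating a log-based alert can come down to re-pointing its path. The sketch below assumes a hypothetical rule format and hypothetical directory layouts. The real monitoring system's configuration is not described in this article.

```python
def migrate_rules(rules, old_prefix, new_prefix):
    """Return copies of alert rules with the monitored log path re-pointed."""
    migrated = []
    for rule in rules:
        path = rule["log_path"]
        if path.startswith(old_prefix):
            # Keep the file name and everything after the prefix unchanged.
            path = new_prefix + path[len(old_prefix):]
        migrated.append({**rule, "log_path": path})
    return migrated


# Hypothetical alert rules defined against the monolith's log directory.
rules = [
    {"name": "item-error-rate", "log_path": "/home/admin/idleapi/logs/item.log"},
    {"name": "system-load", "log_path": "/var/log/system.log"},
]

migrated = migrate_rules(rules, "/home/admin/idleapi/", "/home/admin/item-detail-fn/")
```

Rules that do not monitor the monolith's logs pass through untouched, and the keywords and thresholds of each rule stay as they were.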

Architecture Evolution

There are two ways to solve the problem of business code reuse after the application is split into fine-grained FaaS functions:

The first solution is governance before splitting. First, transform and refactor the monolithic application and extract the code reused by each business into a public internal package or the service layer of the business domain. Then, split the monolithic application into multiple FaaS functions.

There are two problems with the solution above:

  1. Zombie code accounts for nearly half of the total, so refactoring it would be wasted work.
  2. If we refactor the original application, new business iterations and the A/B rollout of the refactored code are mixed together in development and grayscale release, which is complex and risky.

The second solution is splitting before governance. First, split the monolithic application by business and temporarily ignore the problem of code reuse. After the split, developers refactor for reuse according to real business needs during subsequent development, encapsulating the code that businesses share into an internal package or extracting it into a domain service.

Compared to the first solution, sorting out reusability issues among clearly isolated function code bases will be less complex and risky. Therefore, we chose the second solution.
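Once the functions live in separate repositories, finding reuse candidates becomes a simple comparison. The sketch below groups identical files across function repositories by content hash. It is an illustration under assumptions, since the repository names, file paths, and in-memory repository format are hypothetical.

```python
import hashlib
from collections import defaultdict


def shared_candidates(repos, min_repos=2):
    """Map each duplicated file (by content) to the repos that carry a copy of it."""
    owners = defaultdict(set)
    names = {}
    for repo, files in repos.items():
        for path, content in files.items():
            digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
            owners[digest].add(repo)
            names.setdefault(digest, path)  # remember the first path seen
    return {names[d]: sorted(r) for d, r in owners.items() if len(r) >= min_repos}


# Hypothetical function repositories: repo -> {path: file content}.
repos = {
    "item-detail": {
        "util/PriceUtil.java": "class PriceUtil {}",
        "ItemApi.java": "class ItemApi {}",
    },
    "search": {
        "util/PriceUtil.java": "class PriceUtil {}",
        "SearchApi.java": "class SearchApi {}",
    },
}

candidates = shared_candidates(repos)
```

Files that appear unchanged in several functions are natural candidates for an internal package, while files that have already diverged can be left to each business.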

Benefits

Over 30 gateway interfaces have been split from the monolithic application and handed over to business teams for development and maintenance, which verifies that the solution is feasible for splitting and governing monolithic applications. Later, we will make the splitting solution available so that developers can split and migrate their businesses by themselves. After the split, each business retains its original development and O&M habits.

At the same time, one service gateway interface corresponds to one function, so each FaaS function focuses on only one service gateway interface. This solves the problem of traditional applications expanding continuously as services keep innovating. This focus also keeps the code of a function (mostly data code) to less than 3% of the traditional application's, and a business release (Java) takes only five minutes.

Summary

In general, with the help of the automated splitting tools, developers can split a business interface and deploy it to pre-release with one click within half an hour. The process requires no manual intervention, and the split function preserves the original development and maintenance habits, so the migration cost is low enough for developers to accept. Moreover, because one interface corresponds to one function, each function is free from interference by other services during development, so testability and deployment speed are high. During the operating period, each function runs on different physical machines. This natural physical isolation improves stability and reduces the O&M costs of the business.

Outlook

The FaaS function platform is still developing rapidly and can still be improved.

The Cost of Machines

  • High Cost of Function Machines with Low Traffic: Under the high requirements of safe production, even functions with low traffic require two machines per data center, which is a serious waste. The platform is considering various measures to improve machine utilization, such as reducing machine specifications and overselling.
  • Elasticity: When the upstream and downstream procedures of the service are relatively long, single-point elasticity cannot solve all problems. This requires overall consideration and resolution.
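The machine-cost point above can be made concrete with a little arithmetic. The sketch below is illustrative only. The two-machines-per-data-center minimum comes from the text, but the traffic figures, per-machine capacity, and number of data centers are made-up inputs.

```python
import math


def machines_needed(peak_qps, qps_per_machine, data_centers, min_per_dc=2):
    """Machines required when traffic is spread evenly and each data center has a floor."""
    by_load = math.ceil(peak_qps / qps_per_machine / data_centers)
    return data_centers * max(min_per_dc, by_load)


def utilization(peak_qps, qps_per_machine, machines):
    """Fraction of provisioned capacity that peak traffic actually uses."""
    return peak_qps / (machines * qps_per_machine)


# A low-traffic function: the safety floor, not the load, decides the machine count.
low = machines_needed(peak_qps=10, qps_per_machine=500, data_centers=2)
low_util = utilization(10, 500, low)

# A busy function: the load decides the machine count.
busy = machines_needed(peak_qps=4000, qps_per_machine=500, data_centers=2)
```

With these inputs the low-traffic function still occupies four machines while using a fraction of a percent of their capacity, which is the waste that smaller machine specifications and overselling aim to reduce.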