What Is the Experience of One Million Lines of Front-end Code


In recent years, Alibaba's data middle-end products have grown rapidly. Quick BI, the core product, has been the only BI product from China selected for the Gartner Magic Quadrant for two consecutive years. The source code in Quick BI's single code repository has exceeded 1 million lines, and many people and modules are involved in its development. The principles described below are what have kept the product in a state of rapid development.
First, some key data:
•Code: 820,000 lines of TypeScript and 180,000 lines of styles (Sass + Less + CSS), counted with cloc and excluding automatically generated code.
•Collaboration: 12,111 code reviews and 53,026 commits.

Many people ask: with so much code, why not split the codebase? Why not adopt micro-frontends or a serverless framework as soon as possible? Aren't you worried that it will become unmaintainable and that startup will slow to a crawl?
The reality is that this amount of code was expected from day one. Startup time did slow down, from a few seconds at first to 5 to 10 minutes later, and was then optimized back to about 5 seconds recently. Throughout the process, the team has felt the advantages of a Monorepo (single code repository).
This practice is meant to illustrate:
•A big codebase can be a good thing. An extremely "simple" architecture makes it easier to support a complex and flexible business.
•Achieving a simple architecture requires clearer internal specifications, closer collaboration, and more efficient execution.
•Problems that can be solved through engineering should not be left to specifications, and problems that can be solved through specifications should not be left to individual discretion.
On a sunny afternoon in 2019, the day before the May Day holiday, we pooled our collective wisdom and voted on a repository name that everyone was happy with. The project started at the same time, taking the opportunity of merging the Quick BI and FBI bases. Later the base code became the official codebase, and the upper-layer business code was absorbed into it as well.
commit 769bf68c1740631b39dca6931a19a5e1692be48d
Date: Tue Apr 30 17:48:52 2019 +0800

A New Era of BI Begins
Why Monorepo?

Before work started, the team had a long discussion about Monorepo versus Polyrepo.
I used to like Polyrepo very much. I created a separate repository and a separate npm package for each component. For example, before 2019 there were 43 editor components for forms alone:

I thought this would achieve perfect decoupling and maximum reuse.
But in fact:
1.Every overall upgrade of dependencies such as Babel or React was so painful that we ended up building our own scaffolding. We were forced into reinventing wheels: little actual work got done, but our script-writing skills skyrocketed.
2.Every time you debug a component, you have to npm link it. Once components were nested across levels, you might need three layers of npm link. Anyone who has done this knows how bad the experience is.
3.Versions are difficult to align. Before each release of the main repository, aligning versions between components was a test of eyesight, and a small slip could trigger an online failure.
4.And the supposed advantage of being easy for others to reuse? In the end, supporting our own business was hard enough, so how could we dare let others reuse the components?
In the end we merged all these components into one repository. In fact, companies such as Google, Facebook, and Microsoft strongly favor Monorepo internally.
But we are not Monorepo fundamentalists, and there is no need to force unrelated product code together. Within a single direct-report team, a single product can use a Monorepo, which greatly reduces the cost of collaboration. At the beginning, though, there were still many questions within the team.
A few core questions about Monorepo
1. Will a single repository be very large?
With 1 million lines of code, what is your first guess: 1 GB? 10 GB? More?
First, the formula:
Repository size = size of the source code + size of .git + size of resource files (audio, video, images, and other files)
Let's calculate the size of the source code first.
It is generally recommended to keep each line under 120 characters. Taking 100 characters per line, at roughly one byte per character, 1 million lines come to:
100 * 1,000,000 = 100,000,000 B
That converts to 100 MB!
So how big is the source code in our repository actually?
Only 85 MB! That is an average of 85 characters per line.
2. Now calculate the size of .git:
.git records the commit history, branches, and tags of all the code. Will it be huge?
In fact, Git does a lot of optimization under the hood: 1. All branches and tags are references; 2. Changes are stored incrementally; 3. Objects are compressed with zlib when stored. (Repeated boilerplate code is stored only once, and consistently formatted code compresses extremely well.)
In our experience, a .git directory recording 10,000 commits adds only 1 to 3 times the size of the source code.
3. Size of resource files
Git does a lot of optimization for source code, but not for resource files such as video and audio. We recently used BFG to shrink another product's repository from 22 GB to 200 MB, a 99% reduction! The commit history and branches are preserved after the cleanup (although, because BFG rewrites Git commit records, some commit ids change).
The repository had reached 22 GB because it stored videos, published build artifacts, and sourcemap files, none of which belong in a source code repository.
To sum up, a repository with one million lines of code is generally between 200 MB and 400 MB. So what is the estimated size for 10 million lines of code?
Multiplying by ten gives 2 GB to 4 GB. That is nothing compared with the several gigabytes in node_modules, and it is easy to manage. As another case, the Linux kernel has 28 million lines in a Monorepo, with thousands of people working together. Linus Torvalds originally developed Git to manage the Linux source code.
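The estimate above can be written as a small calculation. This is only a sketch of the article's formula: the function name, the field names, and the assumption of one byte per character are illustrative, and the history factor of 1 to 3 is the team's rule of thumb quoted earlier, not a property of Git.

```typescript
// Rough repository size estimate:
// total = source code + .git history + resource files.
// All inputs are illustrative assumptions, not measurements.
interface RepoSizeInput {
  lines: number;            // lines of source code
  avgCharsPerLine: number;  // average bytes per line (one byte per character assumed)
  gitHistoryFactor: number; // .git size as a multiple of the source size (1 to 3 per the text)
  resourceBytes: number;    // images, audio, video and other binary files
}

const MB = 1000 * 1000; // decimal megabytes, matching the arithmetic in the text

function estimateRepoSizeMB(input: RepoSizeInput): number {
  const sourceBytes = input.lines * input.avgCharsPerLine;
  const gitBytes = sourceBytes * input.gitHistoryFactor;
  return (sourceBytes + gitBytes + input.resourceBytes) / MB;
}

// One million lines at 100 characters per line, no binary resources.
const lowEstimate = estimateRepoSizeMB({
  lines: 1000000, avgCharsPerLine: 100, gitHistoryFactor: 1, resourceBytes: 0,
});
const highEstimate = estimateRepoSizeMB({
  lines: 1000000, avgCharsPerLine: 100, gitHistoryFactor: 3, resourceBytes: 0,
});
console.log(lowEstimate, highEstimate); // 200 400
```

With these inputs the result reproduces the 200 MB to 400 MB range; setting `resourceBytes` to a large value shows how quickly committed videos or build artifacts dominate the total.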
2. Is startup very slow? 5 minutes or 10 minutes?
I have heard some teams say that with hundreds of thousands of lines of code and a startup time of 10+ minutes, a typical "monolith" project is already hard to maintain, so they rush to split it into packages or switch to micro-frontends. A team may have only 3 people yet end up with 5 separate projects, which are very troublesome to coordinate.
We did 3 things:
1.Split the build into multiple entries by page, so that only one entry needs to be started at a time
2.Sorted out the dependencies between sub-packages and pushed lazy loading and Tree-Shaking as far as possible
3.Switched from Webpack to Vite
After the switch from Webpack to Vite in particular, the project's cold start time dropped from 2-5 minutes to within 5 seconds. Hot compile time dropped from 5 seconds to within 1 second, and on Apple M1 computers it is generally within 500 ms.
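The first step, starting only one page entry at a time, might look like the following sketch. The entry names, file paths, and the `selectEntries` helper are hypothetical; the article does not show the team's actual configuration. The returned map has the same name-to-path shape that Webpack's `entry` option accepts.

```typescript
// Hypothetical page entries; names and paths are illustrative only.
const allEntries: Record<string, string> = {
  dashboard: "src/pages/dashboard/index.tsx",
  report: "src/pages/report/index.tsx",
  dataset: "src/pages/dataset/index.tsx",
};

// Return only the entries requested for this dev session.
// An empty request falls back to every entry (for example, a full build).
function selectEntries(
  entries: Record<string, string>,
  requested: string[]
): Record<string, string> {
  if (requested.length === 0) {
    return entries;
  }
  const selected: Record<string, string> = {};
  for (const name of requested) {
    const known = Object.prototype.hasOwnProperty.call(entries, name);
    const path = known ? entries[name] : undefined;
    if (path === undefined) {
      throw new Error(`Unknown entry "${name}"`);
    }
    selected[name] = path;
  }
  return selected;
}

// Start the dev server with a single page instead of the whole product.
const devEntries = selectEntries(allEntries, ["dashboard"]);
console.log(Object.keys(devEntries)); // [ 'dashboard' ]
```

Failing loudly on an unknown entry name is a deliberate choice here: a typo should stop the dev server rather than silently start nothing.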
3. What about code reuse? Do you have to import everything when reusing code from a Monorepo?
Traditional software engineering pursues DRY, but more DRY is not always better.
Every line of code written carries a price: the cost of maintenance. To reduce code, we create reusable modules. But reuse has a problem: it becomes a hindrance when you want to change the code later.
For a long-lived, iteratively developed product like Quick BI, most requirements are extensions of existing features, so writing code that is easy to maintain matters most. The team therefore discourages clever "magic" tricks. Rather than simply pursuing a high reuse rate, it pursues code that is easy to modify, and it encourages coding styles that make modules easy to delete when they are retired in the future.
For scenarios where reuse really is needed, we have split out packages. Inside the Monorepo we have split out multiple packages (a screenshot follows later). For example, if another product needs BI building capability, it can reuse @alife/bi-designer and rely on Tree-Shaking to minimize the dependencies it pulls in.
Current development experience
1. Cold start takes 5 seconds and hot compilation takes 1 second. It used to take 5-10 minutes.
2. A problem that can be solved by changing one line of code really is solved by changing one line and publishing once, instead of changing 10+ projects and publishing N times in dependency order.
3. Newcomers can set up their environment in 10 minutes and start developing
a. By comparison, with the previous repository-per-component setup, granting access package by package took a long time
4. Version mismatch problems are avoided
a. For 2C products there is no need for multiple versions and multiple trunk branches, yet keeping versions aligned across multiple npm packages is still not easy
b. For 2B products, multiple environments and multiple versions make things far more complex. A Monorepo unifies the versions of internal dependencies through branches
5. Engineering upgrades only need to be done once. We currently use Pri, a Monorepo solution built on Lerna.
Maintaining this experience is not easy, and many problems still have to be solved during development.
The real problem
Putting the code together is not enough. The complex issues behind it are collaboration, technical solutions, and stability (how do you prevent one person's commit from bringing down the entire product?).
1. Package dependency management
Internally we split the code into multiple sub-packages. Each sub-package is a subdirectory and can be published to npm separately, as shown in the figure below:

The core principles of internal package management are:
•Dependencies run one way, from left to right: only packages on the right may reference packages on the left. This avoids circular dependencies.
•A specification alone is not enough, so we developed a plug-in that checks this automatically. If a package on the left depends on one on the right, an error is reported directly.
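The one-way rule can be checked mechanically. The sketch below is not the team's plug-in, which the article does not show; the package names, the `ImportEdge` shape, and `findLayerViolations` are hypothetical, and the import edges are assumed to have been collected by some other tool.

```typescript
// Packages ordered from left (lowest layer) to right (highest layer).
// These names are hypothetical; the real list is in the article's figure.
const layers = ["utils", "components", "charts", "designer", "app"];

interface ImportEdge {
  from: string; // package that contains the import statement
  to: string;   // package being imported
}

// A package may only import packages to its left (a lower index).
// Imports within the same package are allowed.
function findLayerViolations(order: string[], edges: ImportEdge[]): ImportEdge[] {
  return edges.filter((edge) => {
    const fromIndex = order.indexOf(edge.from);
    const toIndex = order.indexOf(edge.to);
    if (fromIndex === -1 || toIndex === -1) {
      return true; // unknown packages are reported rather than silently allowed
    }
    return toIndex > fromIndex;
  });
}

const violations = findLayerViolations(layers, [
  { from: "designer", to: "charts" }, // allowed: right depends on left
  { from: "app", to: "utils" },       // allowed
  { from: "utils", to: "charts" },    // violation: left depends on right
]);
console.log(violations.length); // 1
```

Because every allowed edge points strictly leftward across packages, a set of edges that passes this check cannot contain a cycle between packages.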
We are more cautious about introducing open source npm packages. Most npm packages are maintained for no more than x years, and even a once-standard utility library like Moment.js has entered maintenance mode. Probably 20% of npm packages are unmaintained. If your online users later hit a problem in such a package, you have to dig into its source code yourself and are left in a passive position. Our principle is therefore that introducing an open source npm package requires an offline review approved by three people.

2. Code Review Culture

Mutual code review helps newcomers grow quickly, and it is also a way to build a team's technical culture.
Over the past few years we have pushed for 100% CR across the team, but that is not enough. Executed mechanically, CR easily becomes a mere formality, so it needs to be done differently for different scenarios.
A Monorepo carries a risk: if something goes wrong, it may go wrong for the whole product.
At present, our code review falls into 3 scenarios:
1.Online MR code review [1 to 1]
2.Thematic code review [3-5 people]
3.Collective code review before a major version release [everyone]
12,111 code reviews have taught us a lot, mainly:
•Review promptly and encourage small MRs. There is no need to wait until the whole feature is finished.
•Code is written for people to read. Encourage code that reads like plain speech, not like classical Chinese.
•Establish best practices (directory structure, naming conventions, data flow conventions). There may be 10 ways to develop a feature, but the team needs to choose 1 and promote it.
•Showing off with clever tricks is discouraged, for the sake of future maintainability. If something can be done with simple technology, do not use "advanced" but obscure technology.
•Emphasize a sense of cleanliness in development and pursue a culture of elegant code (whether naming is easy to understand, whether comments are complete, whether there are performance risks, and so on).
3. Engineering construction
In this process, we must first thank the front-end DEF engineering team of the Tao Department for their support. With this much code, we keep pushing the limits, and DEF has kept upgrading to support us.
Beyond writing specifications down in documents, a good specification is one that automated tools can check.
Checkers: ESLint, TS type checking, and Prettier
Syntax checkers are an important means of enforcing specifications. ESLint can run incrementally, so after optimization the git pre-commit hook is still very fast. TS type checking, however, is relatively slow because it cannot simply be limited to the changed files, so it needs to run in CI/CD.
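One way to keep a pre-commit lint pass incremental is to lint only the files that changed. The sketch below shows just the file selection step. The helper name, the list of extensions, and the excluded directories are assumptions, and how the hook obtains the staged file list is outside the sketch.

```typescript
// Pick the files an incremental lint pass should cover.
// `changedFiles` is assumed to be the list of staged paths handed over by
// the pre-commit hook.
function selectLintTargets(changedFiles: string[]): string[] {
  const lintable = /\.(ts|tsx|js|jsx)$/;                 // source files worth linting
  const generated = /(^|\/)(dist|build|node_modules)\//; // build output and dependencies
  return changedFiles.filter(
    (file) => lintable.test(file) && !generated.test(file)
  );
}

const targets = selectLintTargets([
  "src/charts/bar.ts",
  "README.md",
  "dist/bundle.js",
  "src/designer/panel.tsx",
  "src/designer/panel.scss",
]);
console.log(targets); // [ 'src/charts/bar.ts', 'src/designer/panel.tsx' ]
```

The same selection does not help type checking much, since a change in one file can break types in files that did not change, which is consistent with running the full type check in CI/CD instead.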
Webpack vs Vite
We use Webpack for publishing and Vite for development.
The development environment uses Vite for quick debugging, and the production environment still uses Webpack for packaging.
The risk is that the development and production build outputs are inconsistent, which has to be guarded against with regression testing before going online.
4. Performance optimization
For data products, performance challenges come both from the larger resource bundles that a Monorepo produces and from the rendering computation that large data volumes require.
Performance optimization can be divided into 3 parts:
•Resource loading: Fine-grained Tree-Shaking is difficult. Webpack's own Tree-Shaking is limited and does not tree-shake class methods, so the code sometimes has to be modified. Lazy-loaded modules are loaded on demand, especially large components such as charts and SQL editors. Preload interface data sensibly so that the network does not sit idle.
•View rendering: Minimize the number of component renders, optimize table components with virtual scrolling, and preload and render during idle time.
•Data fetching: Use local caching of resources. On mobile, a PWA caches resource files such as JS, as well as data, locally.
In addition, performance profiling tools are used to locate bottlenecks. We plan to build a code performance gate that issues a reminder, before code is committed, if the bundle size is found to have increased.
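The planned bundle size reminder could be built around a comparison like the one below. This is a sketch under assumptions: the article describes the gate only as a plan, so the data shapes, the `checkBundleGrowth` name, and the 5% threshold are all hypothetical.

```typescript
// Bundle name mapped to size in bytes.
interface BundleSizes {
  [bundleName: string]: number;
}

interface SizeWarning {
  bundle: string;
  before: number;
  after: number;
  growthPercent: number;
}

// Compare current bundle sizes with a stored baseline and report bundles
// that grew by more than `thresholdPercent`. Bundles without a baseline
// (newly added ones) are skipped rather than flagged.
function checkBundleGrowth(
  baseline: BundleSizes,
  current: BundleSizes,
  thresholdPercent: number
): SizeWarning[] {
  const warnings: SizeWarning[] = [];
  for (const bundle of Object.keys(current)) {
    const before = baseline[bundle];
    const after = current[bundle];
    if (before === undefined || after === undefined || before === 0) {
      continue;
    }
    const growthPercent = ((after - before) * 100) / before;
    if (growthPercent > thresholdPercent) {
      warnings.push({ bundle, before, after, growthPercent });
    }
  }
  return warnings;
}

const sizeWarnings = checkBundleGrowth(
  { main: 1000000, charts: 500000 },
  { main: 1020000, charts: 600000, editor: 300000 },
  5 // warn when a bundle grows by more than 5%
);
console.log(sizeWarnings.map((w) => w.bundle)); // [ 'charts' ]
```

A relative threshold is used so that small bundles and large bundles are judged on the same scale; an absolute byte limit would be an equally reasonable choice.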
5. Data-driven architecture optimization
Working on data middle-end products, I am convinced of the business value of data. But for development itself, data is rarely used in depth.
So in S1 we focused on digitizing the development experience. We collected everyone's development environment and startup time data for analysis [and no other data, to avoid unhealthy internal competition]. This revealed many interesting things. For example, one colleague's hot compilation took 3-5 minutes. He assumed it was just as slow for everyone else, and it was seriously hurting his development efficiency. Once the abnormal data showed up in the report, we helped him fix it within ten minutes.
Another example: to keep online build outputs consistent, we pushed the team to unify its Node.js version. This used to rely on DingTalk reminders, and even after many reminders there was no way to know the effect. With the report, it is clear at a glance.
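Both examples come down to simple checks over collected reports. The sketch below assumes a hypothetical `DevReport` shape with only the two kinds of data the text mentions (environment and timing). The median-based outlier rule and the expected Node.js major version are illustrative choices, not the team's actual report logic.

```typescript
// One report per developer machine; the shape is an assumption.
interface DevReport {
  developer: string;    // an anonymised id is assumed
  nodeVersion: string;  // for example "16.14.0" or "v16.14.0"
  hotCompileMs: number; // hot compile time in milliseconds
}

// Flag environments whose hot compile time is far above the team median.
function findSlowEnvironments(reports: DevReport[], factor: number): DevReport[] {
  const sorted = reports.map((r) => r.hotCompileMs).sort((a, b) => a - b);
  const median = sorted[Math.floor(sorted.length / 2)];
  if (median === undefined) {
    return []; // no reports collected yet
  }
  return reports.filter((r) => r.hotCompileMs > median * factor);
}

// List environments whose Node.js major version differs from the expected one.
function findNodeMismatches(reports: DevReport[], expectedMajor: number): DevReport[] {
  return reports.filter(
    (r) => parseInt(r.nodeVersion.replace(/^v/, ""), 10) !== expectedMajor
  );
}

const reports: DevReport[] = [
  { developer: "dev-a", nodeVersion: "16.14.0", hotCompileMs: 800 },
  { developer: "dev-b", nodeVersion: "v16.13.1", hotCompileMs: 900 },
  { developer: "dev-c", nodeVersion: "14.19.0", hotCompileMs: 1000 },
  { developer: "dev-d", nodeVersion: "16.14.0", hotCompileMs: 240000 },
];

console.log(findSlowEnvironments(reports, 10).map((r) => r.developer)); // [ 'dev-d' ]
console.log(findNodeMismatches(reports, 16).map((r) => r.developer));   // [ 'dev-c' ]
```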

Deeper experience

The most efficient way is to do it right the first time
Every line of code carries a cost. In the long run, the most efficient way is to do it right the first time.
Schwarzman said, "It's just as difficult to do big things as it is to do small things. Both take your time and energy." That being the case, you might as well write the code properly in one go. A "TODO" left in the code may stay TO DO forever. Objectively, doing it right the first time is harder. For one thing, everyone has a different standard of "good", which reflects individual technical ability, how much they care about the experience, and their understanding of the business.
Organizational culture and technology complement each other
Technical architecture and organizational structure are closely related. Choosing a technical architecture that suits the organization matters more.

If an organization is decentralized, using a Monorepo carries a significant coordination cost. But if the organization is cohesive, a Monorepo can greatly improve efficiency.

Engineering and architectural foundations are a team matter, and they are difficult for an individual to push forward.
In the short term you can rely on campaign-style pushes and copying. In the long run you need to form a culture in order to keep iterating.
Problems caused by the high cost of organizational communication should be solved through the organization, because technology has little power to solve them. What technology can do is make full use of tools so that changes happen quickly.
Simplicity does not precede complexity, but follows it

With a simple architecture, someone will always find a way to make it complicated. After falling into enough pitfalls, you make up your mind to rebuild. If you succeed, you return to simplicity. If you fail, you are displaced by a new, simpler model. Falling into pitfalls is valuable in itself, otherwise newcomers will not be able to resist falling into them again. Becoming complex is easy, but staying simple requires vision and restraint. Without having gone through the process yourself, other people's antidote may be poison to you.

An architecture cannot stay fixed forever. At first our charts simply used D3 and ECharts directly. Later, heavy customization made them complicated and hard to maintain. We then developed our own bi-charts on top of G2, and the architecture became simple again. The development experience before and after may be similar, but the technology behind it has completely changed.
Summary and Outlook
A million lines of code is nothing to fear. It is a normal milestone, and the codebase can still be as agile as one with tens of thousands of lines.
Quick BI has now reached one million lines and is on its way toward the goal of becoming a world-class BI product. The content above is mostly about engineering, and the purpose of good engineering is to let developers focus more on the business. There are many business challenges not covered here: data analysis inherently deals with massive data, so performance optimization is a long-term practice; gaining insight into rich and varied data has led to deep experience in visualization and complex forms, where visualization is not only a technology but also the business itself; and displaying on mobile phones, flat-screen TVs, and other devices brings cross-device adaptation challenges. In the future, we also hope to turn data analysis into an engine that can be quickly integrated into office and business processes.

The current development model is not perfect. Technical debt inevitably arises as the product iterates, and the essence of architecture optimization is to preserve maintainability and reduce that debt. The team is currently considering the introduction of Redux-Toolkit, which would be a major upgrade to data fetching and data flow, and we will share our progress.
