Seven Suggestions for Efficient and High-Quality Code Review

By Songhua

Introduction

What is the essence of Code Review (CR)? Is it for error checking? Or for KPI? This article shares the views of Alibaba's Senior Technical Experts. CR is a long-term practice and orientation of organizational culture. Through CR, a benign interactive technical atmosphere was formed, spreading and sharing knowledge and improving code quality. This article also gives seven practical suggestions to improve the efficiency and quality of CR.

There are many articles about CR. CR is a standardized practice of many organizations, but many teams have the following questions when they try CR:

Has the "politically correct" CR achieved the desired results?
Faced with a lot of code, where should I start? Which ones should I review? Which ones should I not review?
Has anyone carefully reviewed my code? How do I make it easier for others to review my code?

These questions are common and have been raised in different contexts for years. To answer these questions, you only need to focus on two aspects:

Understand the core goals of CR and establish correct expectations about CR
Learn why CR may be ineffective and adopt targeted measures to improve its effectiveness

Why Do We Need CR?

Many people think CR is used for error checking and hope to use the number of defects in the code to determine the effectiveness of CR. This underestimates the value of CR. The essential role of CR is not problem discovery. If we aim to discover problems, we have more and better ways than CR. The role of CR primarily involves the culture and sociology of the organization and is a long-term practice.

CR Ensures the Compliance with Coding Standards

Coders' Perspective: Benign Social Pressure

You are coding all out, and your deadline is looming. In addition, your organization requires unit testing of code: All new code must have complete automated unit testing. However, under pressure, you may want to lower the coding expectations a little. You may want to write the unit test for the new code later or write a slightly less useful test to meet the coverage requirements of the tool, for example, tests without assertions.

However, when you know that your code will be sent out for review by your colleagues, does this idea seem less enticing? This kind of pressure is benign and can give you immediate results, preventing you from choosing the "speculative" behavior of short-term gains and long-term losses. If there were no CR, you might be tempted to "do whatever you want," but you would still have to pay for this kind of trickery.

Maintainers' Perspective: Code Readability Guarantee

There are many ways to fulfill the same software requirements. Interested readers can search for "Hello World's N Ways of Writing" by themselves. Although all roads lead to Rome, the costs of roads are different. From variable naming to design structure, if you adopt a less common method, you may be causing trouble for later code maintainers. This maintenance activity may occur after one month, after 1 year, or longer. At that point, as the code author, you are no longer on the team, and no one can understand why the software was designed this way.

CR forces this feedback cycle ahead of time. Immediately after the code is written, there are one or more readers that are code reviewers. Therefore, this code underwent readability verification immediately after it was written. Ideally, if the organization already has coding specifications and design specifications, it can also ensure that the code follows these specifications. If it is found in the CR process that this code does not follow the specifications, it can also provide an advantage. It points to another key value of CR: knowledge dissemination.

CR Brings Knowledge Dissemination and Design Consensus

You may be the leader of a team, worrying about how to improve the programming skills of your team members. You want them to read books, so you bring in introductory books, such as Clean Code. You also introduce the classic Design Mode and recommend Domain-Driven Design. You may periodically organize team members to share their business knowledge in the hope that the team members can understand the business logic of products.

All of these efforts are very good, but it is also possible that you will be hit. A month later, it seems that the team members have established a concept for the naming convention, but there is no consensus on how to perform specific naming. Many of the team members already know some design patterns. However, some use excessive patterns and make the code bloated, while others only know singleton. In addition, the team members are contentious about what an entity object is and what a value object is, and no one can clearly say what aggregation is and which scenarios it applies to.

What you really lack is a scene. It is difficult to form a consensus if abstract concepts are not implemented in practice. Some people may know the concept of "precedent" in the common law legal system, which is a very good way to form a consensus at the legal level. CR is forming a precedent: which type of design is reasonable and which type of design is unreasonable? By analyzing specific problems, the team will gradually form a design consensus. In the process, new team members that are not familiar with the consensus can also slowly integrate.

Verify Logical Correctness

CR can also be used to verify logical correctness.

It is the designer's responsibility to ensure the code logic is correct.

To prevent CR from being abused and raising expectations too high, we must first declare one thing: It is the designer's responsibility to ensure the logic of the code is correct. If a null pointer error occurs, how can we determine whether the coding is bad, the CR is bad, or the test is bad? First, the code author must have created this problem. Is it fair to blame this problem on the code reviewer? Perhaps, the code reviewer is responsible for discovering such problems, but what if the code is inherently buggy? What if the code reviewer has reviewed 1,000 lines of code at a time but cannot review the entire code to find this problem? As a code reviewer, one can find a lot of reasons why such a null pointer error is missed.

Other Methods to Find Logical Errors

You have many other methods to find errors, and their cost is often not high:

Writing automated unit tests
Using static code checking tools

Regardless of whether there is CR activity, the preceding two points are actions that a professional developer and development organization should strongly advocate.

The Value of Finding Errors

Under the premise that the preceding two points are recognized, CR should be used to find errors. It essentially establishes a redundancy mechanism, through multiple people working on the same piece of code, to discover any negligence and cognitive errors that may exist in the code, which are often difficult to find for a single developer.

Efficient and High-Quality CR

Factors That Hinder the Effectiveness of CR

CR itself is not difficult, but if you consider the following factors, it may be more complicated:

You may not know anything about the design context of the code to be reviewed.
You may be very busy.
You suddenly received thousands of lines of code that need to be reviewed.

Practical Suggestions

Suggestion 1: Small Batches - Less Code for Each Review

Studies show that successful CR activities must be small-scale. For example, the paper "Modern Code Review: A Case at Google" states that in the CR activities of Google, only one file was modified for 35% of the CR activities, less than ten files were modified for 90% of the CR activities, and only one line of code was modified for 10% of the CR activities.

There are two main benefits of reviewing small amounts of code at a time: It is very clear where the changes are made, and the problem will be clear at a glance. If you push more than 1,000 lines of code to someone at a time, it is extremely unlikely to obtain a valuable review.

A review should be performed for a code of a meaningful and complete feature. If the code is not to fix defects, developers are required to develop features iteratively.

Suggestion 2: Multiple Batches - Frequent Reviews

Small batches will inevitably lead to multiple batches. In the 2013 Microsoft paper "iExpectations, Outcomes, and Challenges of Modern Code Review," and the aforementioned paper from Google, the practice of frequent reviews was mentioned. Among them, Google's weekly median number of code changes per developer is three, and the weekly median number of reviews per reviewer is four.

Suggestion 3: Find the Right Person - the Right Reviewer

Who is suitable for reviewing your code? It is unwise to choose someone that has nothing to do with the code being reviewed. The best potential candidates are listed below:

If your organization has an owner mechanism, the owner is the right person.
Colleagues that work in the same context as you
Colleagues that have recently modified the same code
More experienced programmers that you can work with to get professional feedback

There are already some tools that can recommend reviewers depending on the context. This facilitates the selection of appropriate reviewers.

Suggestion 4: Quick Response

When the granularity of each review is not large and frequent reviews are needed, quick response is possible, which is also an inevitable requirement. Google's median number of review hours per reviewer is four hours. This metric can be a good reference.

Suggestion 5: Use Modern Tools

A high-quality review that provides quick response depends on modern tools. Modern review tools can automatically integrate into workflows, highlight changes, and automatically summarize changes. There are already many modern tools available, and readers that have not yet selected the tools can search by themselves.

Suggestion 6: Pair Programming

When it comes to "small batch, multi-batch, and fast feedback," if you have experience in pair programming, you will immediately realize that pair programming is very similar to CR.

Pair programming means two people that are co-programming have the same context, so there are no problems with context switching, no problems with lack of time, and no need to use additional tools to provide feedback to each other anytime or anywhere. In my opinion, pair programming is the best CR.

Suggestion 7: Online Review and Offline Review

Online reviews should be normalized behavior. When it comes to the "knowledge dissemination" value of CR, offline reviews are a useful supplement. Experienced teams will organize offline reviews periodically or irregularly, so they can obtain a wider range of knowledge dissemination than online reviews. This can also lead to heated discussions and debates, forming a consensus of higher quality.

Digital Metrics

The full digitization of R&D activities has brought some valuable data insights. CR activities can be continuously promoted through the observation of metrics if the tools can support it. We summarized the following suggested metrics and data:

From the code authors' perspective:

Number of code lines per review (main metric)
Frequency of review (reference)

From the reviewers' perspective:

Response time
Number of comments and rejection rate

From the perspective of the reviewer group, you can also find the community relationship between the code author and reviewer, which is also valuable information to understand the spread of knowledge.

We recommend following the instructions below:

Avoid using the digital metrics of CR for assessment, even if the motivation is good. The essence of CR is cultural construction. We strongly recommend using CR metrics only as guidelines for improvement, not for performance-related evaluations, whether the CR metrics are the aforementioned ones or the data related to the review quality or defects found in reviews.

Best Practices

What is the code quality of your organization? How about the technical atmosphere of your organization? Are you implementing CR?

If you have not implemented CR yet, are you planning to try it out?
If it is already implemented, does the existing practice achieve the true goal of CR?

I hope the practical suggestions I've shared in this article are helpful. If you have any suggestions about CR, please share them with us.

Community