Learning about Defect Detection in Code Intelligence

This article discusses how code defect detection can be used to check code for bugs.

By Qinqi

This article is Part 2 of a series that discusses interesting tasks in the field of code intelligence. Each entry in the series covers several key aspects of a task, such as a brief introduction, its history, and its current state, in the hope of giving readers a deeper understanding of code intelligence.

Part 1: Learning about Intelligent Code Completion

The topic of this article is code defect detection, which determines whether a piece of code contains bugs. This sounds a bit profound, but in essence it draws conclusions from existing experience and historical data. Doesn't that sound a bit like fortune telling? Note: In this article, defect detection refers only to checking whether bugs exist. Defect location and defect repair are not covered.

1. Defect Detection

The defect detection we imagine might look like this:


…or something like this:


However, defect detection can only tell whether there is a bug in a piece of code. It cannot tell where the bug is or what it is. Therefore, defect detection is not particularly useful during development or testing. Imagine if I told you there was a bug in a piece of code without telling you what it was. Wouldn't that be annoying? If I am not sure whether there is a bug, that is fine as long as we can locate the problem, but if we cannot, it is even more annoying. Then another question arises: "Do you think there is a bug or not?" And the only possible answer is: "It's not about what you think; it's about what I think."

Let's get back to the topic. Defect detection is not aimed at development scenarios. It can be useful during code review (CR): if you know that the current file has a high probability of containing problems, the reviewers know to pay special attention to it. Although its practical effectiveness is only average, it is still an interesting small task that has long existed in the field of code intelligence.
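In a code review setting, the output of a file-level model might simply be used to rank files for reviewers. The sketch below assumes each file already has a defect probability produced by some model; the function name `files_needing_attention`, the example scores, and the 0.7 threshold are illustrative assumptions, not part of any specific tool.

```python
from typing import Dict, List, Tuple

def files_needing_attention(
    risk_scores: Dict[str, float], threshold: float = 0.7
) -> List[Tuple[str, float]]:
    """Return (file, score) pairs whose score is at or above threshold, riskiest first."""
    flagged = [(path, score) for path, score in risk_scores.items() if score >= threshold]
    # Sort by descending score so reviewers look at the most suspicious files first.
    flagged.sort(key=lambda item: item[1], reverse=True)
    return flagged

if __name__ == "__main__":
    # Made-up scores standing in for the output of a hypothetical file-level defect model.
    scores = {"app/payment.py": 0.91, "app/utils.py": 0.12, "app/order.py": 0.74}
    for path, score in files_needing_attention(scores):
        print(f"review carefully: {path} (p={score:.2f})")
```

This only tells reviewers where to look harder; consistent with the limitation described above, the score says nothing about what the bug is.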

2. The History of Defect Detection

The history of defect detection is as old as the history of defining defects. As soon as defects were defined, people began thinking about how to use tools to detect them or to avoid some of them altogether. For example, in compiled languages, some static errors are caught during compilation and cause the build to fail. Code analysis tools and static scanning tools can also find some defects in advance. These all belong to defect identification and location, which we will discuss later.

3. The Philosophy of Defect Detection

A common approach to defect detection is to extract useful features from historical code, train a prediction model on these features, and then use the model to make predictions on new code. The training unit can be a code snippet, a file, or a single commit. Compared with file-level prediction, snippet-level and commit-level prediction is relatively more effective and makes defects easier to locate. Files, by contrast, are larger, which makes it harder to locate a defect within them.
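The train-then-predict loop described above can be sketched with a plain logistic-regression classifier. This is a minimal illustration, not the method of any particular paper or tool: the choice of logistic regression, the two-feature toy dataset, and the function names are all assumptions. Each row stands for one training unit (a snippet, file, or commit), and its label is 1 if that unit was later found to be defective.

```python
import math
from typing import List, Sequence

def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_defect_probability(w: Sequence[float], x: Sequence[float]) -> float:
    """Probability that a unit with feature vector x is defective (bias stored in w[-1])."""
    # zip stops at len(x), so the bias weight w[-1] is added separately.
    z = sum(wj * xj for wj, xj in zip(w, x)) + w[-1]
    return sigmoid(z)

def train_logistic_regression(
    X: Sequence[Sequence[float]], y: Sequence[int], lr: float = 0.5, epochs: int = 5000
) -> List[float]:
    """Fit weights (bias stored last) by batch gradient descent on the mean log loss."""
    n = len(X[0])
    w = [0.0] * (n + 1)
    for _ in range(epochs):
        grad = [0.0] * (n + 1)
        for xi, yi in zip(X, y):
            err = predict_defect_probability(w, xi) - yi  # derivative of log loss w.r.t. z
            for j in range(n):
                grad[j] += err * xi[j]
            grad[n] += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

# Toy history with made-up numbers: each row is one training unit described by two
# hypothetical normalized features, e.g. [lines changed, prior modifications]; label 1 = defective.
X_hist = [[0.1, 0.1], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]
y_hist = [0, 0, 1, 1]

if __name__ == "__main__":
    weights = train_logistic_regression(X_hist, y_hist)
    print(predict_defect_probability(weights, [0.95, 0.9]))  # above 0.5 for this toy data
    print(predict_defect_probability(weights, [0.05, 0.1]))  # below 0.5 for this toy data
```

Note that the model outputs only a probability per unit, which mirrors the point made earlier: it says whether a unit is likely defective, not where the defect is.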

Common features include the following:

  • Features based on change metadata, such as the developer, submission time, change log, and number of lines changed in a file
  • Features based on the content of the change, such as code complexity metrics, word frequencies of the changed code, the log, and the file name, or the difference in the number of nodes of each type in the abstract syntax tree (AST) of the file before and after the change
  • Features based on the evolution of the software process, which quantify changes using the project's modification history, such as the number of times related files have been modified, the number of developers who have modified those files, and other information
  • Additional feature dimensions extracted by integrating with the software project management system, such as CR information and defect information
  • Features of the defective code itself, which are generally the source code or its corresponding AST
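To illustrate the AST-based content feature in the second item, the sketch below counts the AST node types in a Python file before and after a change and reports the per-type differences. It uses only Python's standard `ast` module; the function names are hypothetical, and representing the feature as a dictionary of non-zero deltas is an assumption made for readability.

```python
import ast
from collections import Counter
from typing import Dict

def ast_node_type_counts(source: str) -> Counter:
    """Count AST nodes of each type (e.g. 'If', 'Call') in Python source code."""
    return Counter(type(node).__name__ for node in ast.walk(ast.parse(source)))

def ast_change_features(before: str, after: str) -> Dict[str, int]:
    """Per-node-type count difference (after minus before); unchanged types omitted."""
    old, new = ast_node_type_counts(before), ast_node_type_counts(after)
    diff: Dict[str, int] = {}
    for node_type in sorted(old.keys() | new.keys()):
        delta = new[node_type] - old[node_type]  # Counter returns 0 for missing keys
        if delta != 0:
            diff[node_type] = delta
    return diff

if __name__ == "__main__":
    before = "def f(x):\n    return x + 1\n"
    after = "def f(x):\n    if x is None:\n        return 0\n    return x + 1\n"
    # Adding a guard clause shows up as new If/Compare/Return/Constant nodes.
    print(ast_change_features(before, after))
```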

The first three are mostly quantitative indicators that are less directly related to defects, but they are easier to obtain and aggregate. The last two are more closely related to defects but harder to obtain.
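The process-based metrics are easy to aggregate because they need nothing more than the modification history. The sketch below assumes a hypothetical `Commit` record holding the author and the number of lines changed per file; in practice such records would be extracted from version control (for example, from `git log --numstat` output). The feature names are illustrative.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set

@dataclass
class Commit:
    """Hypothetical minimal commit record: author plus lines changed per file."""
    author: str
    lines_changed: Dict[str, int] = field(default_factory=dict)

def process_features(history: Iterable[Commit]) -> Dict[str, Dict[str, int]]:
    """Aggregate simple per-file process metrics from a commit history."""
    modifications: Dict[str, int] = defaultdict(int)
    developers: Dict[str, Set[str]] = defaultdict(set)
    churn: Dict[str, int] = defaultdict(int)
    for commit in history:
        for path, lines in commit.lines_changed.items():
            modifications[path] += 1             # how often the file was modified
            developers[path].add(commit.author)  # who modified it
            churn[path] += lines                 # total lines changed in the file
    return {
        path: {
            "num_modifications": modifications[path],
            "num_developers": len(developers[path]),
            "total_lines_changed": churn[path],
        }
        for path in modifications
    }

if __name__ == "__main__":
    history = [
        Commit("alice", {"a.py": 10, "b.py": 3}),
        Commit("bob", {"a.py": 5}),
        Commit("alice", {"a.py": 2}),
    ]
    print(process_features(history)["a.py"])
    # {'num_modifications': 3, 'num_developers': 2, 'total_lines_changed': 17}
```

Vectors like these could then serve as input rows for a classifier such as the logistic-regression sketch of the training step.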

4. The Future of Defect Detection

As mentioned earlier, defect detection on its own offers little value; it needs to be used together with defect location and defect repair. In recent years, thanks to the rapid development of deep neural networks, there have been many papers and studies on defect detection and repair based on deep models, and some related products have been released, such as Microsoft's DeepDebug, which claims to repair Python defects automatically. It has not yet been used in practice, so its effectiveness is still unknown. However, we can guess that, like code completion, it would most likely be applied at the pull request stage, and that there is still a long way to go before practical application. Still, we must firmly believe that technology keeps developing. Who knows? Maybe one day we will have a robot that fixes bugs automatically.



Alibaba F(x) Team
