Practice | Code Problem Fixing Based on Abstract Syntax Tree (AST)

This article introduces how to automatically solve specific problems in front-end code governance through AST technology, particularly targeting probl...

By Mutao

Background

While addressing Block and Major issues in the CPO project's code, we needed tools to make code governance more efficient. For precise code adjustments, an AST is the natural choice: it lets us make targeted changes to the code and thereby resolve the corresponding issues.

Note: The sample code in this article is based on JavaScript.

Introduction to AST

What is AST?

An Abstract Syntax Tree (AST) is an abstract representation of the syntactic structure of source code. It represents the syntactic structure of a programming language in a tree format, with each node in the tree representing a structure from the source code.

What Can AST Do?

Front-end developers rarely deal with ASTs directly when writing everyday JavaScript, but many engineering tools are built on them: Babel for code transformation, ESLint for validation, TypeScript for type checking, and editors for syntax highlighting. What these tools actually operate on is JavaScript's abstract syntax tree.

Usually, we go through three stages in the actual use of AST:

Parse: Convert the source code to an AST.

Transform: Modify the AST through various plugins.

Generate: Use code generation tools to convert the modified AST back into code.

Thus, by transforming code snippets into ASTs, performing specified structural processing, and then converting the modified AST back into code, we achieve the effect of modifying the code.

AST Structure

You can view the abstract syntax tree structure of your code by using AST Explorer. We recommend selecting the @babel/parser parser, which keeps the output consistent with the examples in this article.

Let's look at a simple example:

const name = '小明'

After conversion, the output AST tree structure is as follows:

{
    "type": "Program",
    "start": 0,
    "end": 17,
    "loc": {
      "start": {
        "line": 1,
        "column": 0,
        "index": 0
      },
      "end": {
        "line": 1,
        "column": 17,
        "index": 17
      }
    },
    "sourceType": "module",
    "interpreter": null,
    "body": [
      {
        "type": "VariableDeclaration",
        "start": 0,
        "end": 17,
        "loc": {
          "start": {
            "line": 1,
            "column": 0,
            "index": 0
          },
          "end": {
            "line": 1,
            "column": 17,
            "index": 17
          }
        },
        "declarations": [
          {
            "type": "VariableDeclarator",
            "start": 6,
            "end": 17,
            "loc": {
              "start": {
                "line": 1,
                "column": 6,
                "index": 6
              },
              "end": {
                "line": 1,
                "column": 17,
                "index": 17
              }
            },
            "id": {
              "type": "Identifier",
              "start": 6,
              "end": 10,
              "loc": {
                "start": {
                  "line": 1,
                  "column": 6,
                  "index": 6
                },
                "end": {
                  "line": 1,
                  "column": 10,
                  "index": 10
                },
                "identifierName": "name"
              },
              "name": "name"
            },
            "init": {
              "type": "StringLiteral",
              "start": 13,
              "end": 17,
              "loc": {
                "start": {
                  "line": 1,
                  "column": 13,
                  "index": 13
                },
                "end": {
                  "line": 1,
                  "column": 17,
                  "index": 17
                }
              },
              "extra": {
                "rawValue": "小明",
                "raw": "'小明'"
              },
              "value": "小明"
            }
          }
        ],
        "kind": "const"
      }
    ],
    "directives": []
  }

The AST is one large tree of JSON-like objects. Each node carries key information such as its type and its position in the source (start, end, and loc).
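To make the tree shape easier to see, the following sketch prints only the type of each node, indented by depth. It works on a hand-written, trimmed copy of the AST above (position fields omitted); the outline function is a hypothetical helper written for this illustration, not part of any Babel package.

```javascript
// A hand-written, trimmed copy of the AST above (position fields omitted).
const ast = {
  type: 'Program',
  body: [
    {
      type: 'VariableDeclaration',
      kind: 'const',
      declarations: [
        {
          type: 'VariableDeclarator',
          id: { type: 'Identifier', name: 'name' },
          init: { type: 'StringLiteral', value: '小明' },
        },
      ],
    },
  ],
};

// Collect the `type` of every node, depth-first, indenting by depth.
// A "node" is assumed to be any object with a string `type` property.
function outline(node, depth = 0, lines = []) {
  lines.push('  '.repeat(depth) + node.type);
  for (const value of Object.values(node)) {
    const children = Array.isArray(value) ? value : [value];
    for (const child of children) {
      if (child && typeof child === 'object' && typeof child.type === 'string') {
        outline(child, depth + 1, lines);
      }
    }
  }
  return lines;
}

console.log(outline(ast).join('\n'));
```

The printed outline nests Program, VariableDeclaration, VariableDeclarator, and then the Identifier and StringLiteral nodes side by side, which is the same hierarchy as in the full JSON.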

AST Generation Process

The abstract syntax tree of JavaScript code is produced by a JavaScript parser, and the parsing process is divided into two stages: lexical analysis and syntax analysis.

Lexical Analysis

Lexical analysis, also known as tokenization, is the process of converting a sequence of characters into a sequence of tokens. Tokens can be thought of as words in natural language. They are the smallest units with practical meaning in syntax analysis, also called syntactic units.

The syntactic units in JavaScript code mainly include the following types:

Keywords: such as var, let, and const.

Identifiers: continuous characters not enclosed in quotes, which could be variables, keywords like if and else, or built-in constants like true and false.

Operators: +, -, *, /.

Numbers: literals in hexadecimal, decimal, octal, or scientific notation.

Strings: quoted text. To the computer, the content of a string is just data to be computed with or displayed.

Whitespace: continuous spaces, line breaks, and indents.

Comments: line comments or block comments. Each comment is treated as a single syntactic unit that cannot be split further.

Others: braces, parentheses, semicolons, and colons.

Tokenization example:

// JavaScript source code
const name = '小明'

// Result after tokenization
[
  {
    value: 'const',
    type:'identifier'
  },
  {
    value:' ',
    type:'whitespace'
  },
  {
    value: 'name',
    type:'identifier'
  },
  {
    value:' ',
    type:'whitespace'
  },
  {
    value: '=',
    type:'operator'
  },
  {
    value:' ',
    type:'whitespace'
  },
  {
    value: '小明',
    type:'string'
  },
]

As shown, the tokenizer breaks down the code snippet into a series of token sequences based on syntactic units, completing the first step towards transforming it into an AST. While this sounds simple, writing a tokenizer actually involves considering numerous scenarios and handling various cases according to the language features.
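As a rough illustration of that first step, here is a minimal tokenizer sketch built on sticky regular expressions. It is an assumption-laden toy, not how @babel/parser tokenizes: the pattern table covers only a few syntactic units, and, like the listing above, it tags const as a plain identifier and leaves keyword recognition to a later step.

```javascript
// Token patterns, tried in order at the current position (sticky regexes).
// This covers only what the example needs; real tokenizers handle far more.
const TOKEN_PATTERNS = [
  ['whitespace', /\s+/y],
  ['string', /'[^']*'|"[^"]*"/y],
  ['number', /\d+(?:\.\d+)?/y],
  ['identifier', /[A-Za-z_$][\w$]*/y],
  ['operator', /[=+\-*\/]/y],
  ['punctuator', /[{}()\[\];:,]/y],
];

function tokenize(code) {
  const tokens = [];
  let pos = 0;
  while (pos < code.length) {
    let matched = false;
    for (const [type, pattern] of TOKEN_PATTERNS) {
      pattern.lastIndex = pos; // A sticky regex matches only at lastIndex.
      const match = pattern.exec(code);
      if (match) {
        // Strings are stored without their surrounding quotes, as in the listing above.
        const value = type === 'string' ? match[0].slice(1, -1) : match[0];
        tokens.push({ value, type });
        pos = pattern.lastIndex;
        matched = true;
        break;
      }
    }
    if (!matched) {
      throw new SyntaxError(`Unexpected character at ${pos}: ${code[pos]}`);
    }
  }
  return tokens;
}

console.log(tokenize("const name = '小明'"));
```

For the sample input this yields the same seven tokens as the listing. Anything the table does not cover, such as template literals, comments, or multi-character operators, raises a SyntaxError, which hints at how many cases a real tokenizer has to handle.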

Syntax Analysis

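Building on lexical analysis, syntax analysis combines the sequence of tokens into a syntax tree and outputs the AST.

After processing by a syntax parser, the tokens become an abstract syntax tree that captures their logical relationships. For the example, the key parts are a VariableDeclaration of kind const that contains one VariableDeclarator, whose id is the Identifier name and whose init is the StringLiteral '小明'.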
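To make the step from tokens to tree concrete, here is a sketch of a parser that understands exactly one statement shape, a declaration of the form kind identifier = string. The node and field names follow the Babel AST shown earlier, but the parseVariableDeclaration function and its expect helper are hypothetical and far simpler than a real parser.

```javascript
// Tokens for `const name = '小明'`, with whitespace already dropped.
const tokens = [
  { value: 'const', type: 'identifier' },
  { value: 'name', type: 'identifier' },
  { value: '=', type: 'operator' },
  { value: '小明', type: 'string' },
];

const DECLARATION_KINDS = new Set(['var', 'let', 'const']);

// Parse `<kind> <identifier> = <string>` into a VariableDeclaration node.
// Positions (start, end, loc) are omitted in this sketch.
function parseVariableDeclaration(tokenList) {
  let pos = 0;

  // Consume the next token, checking its type and optionally its value.
  function expect(type, check = () => true) {
    const token = tokenList[pos];
    if (!token || token.type !== type || !check(token.value)) {
      throw new SyntaxError(`Unexpected token at index ${pos}`);
    }
    pos += 1;
    return token;
  }

  const kind = expect('identifier', (v) => DECLARATION_KINDS.has(v)).value;
  const id = { type: 'Identifier', name: expect('identifier').value };
  expect('operator', (v) => v === '=');
  const init = { type: 'StringLiteral', value: expect('string').value };
  if (pos !== tokenList.length) {
    throw new SyntaxError('Unexpected trailing tokens');
  }

  return {
    type: 'VariableDeclaration',
    kind,
    declarations: [{ type: 'VariableDeclarator', id, init }],
  };
}

console.log(JSON.stringify(parseVariableDeclaration(tokens), null, 2));
```

The point of the sketch is the change of shape: a flat token list goes in, and a nested structure that records which token plays which role (declaration kind, variable name, initial value) comes out.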

AST Modification and Code Generation

Since the AST is a tree of JSON-like objects, modifying it only requires traversing the tree and changing the relevant properties of its nodes. Code is then generated from the modified AST.
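The idea can be sketched without any library: change a property on a hand-written AST, then print the tree back as code. Both renameIdentifier and generateCode are hypothetical helpers for this illustration. The rename ignores scoping, and the printer supports only the node types used here, so this stands in for the concept rather than for @babel/traverse or @babel/generator.

```javascript
// Hand-written AST for `const name = '小明'` (positions omitted).
const declaration = {
  type: 'VariableDeclaration',
  kind: 'const',
  declarations: [
    {
      type: 'VariableDeclarator',
      id: { type: 'Identifier', name: 'name' },
      init: { type: 'StringLiteral', value: '小明' },
    },
  ],
};

// Transform: rename every Identifier called `from` to `to`, in place.
// Scopes are ignored, which is only acceptable for a toy example.
function renameIdentifier(node, from, to) {
  if (!node || typeof node !== 'object') return;
  if (node.type === 'Identifier' && node.name === from) node.name = to;
  for (const value of Object.values(node)) {
    if (Array.isArray(value)) {
      value.forEach((child) => renameIdentifier(child, from, to));
    } else {
      renameIdentifier(value, from, to);
    }
  }
}

// Generate: a toy printer for the four node types that appear above.
function generateCode(node) {
  switch (node.type) {
    case 'VariableDeclaration':
      return `${node.kind} ${node.declarations.map(generateCode).join(', ')};`;
    case 'VariableDeclarator':
      return node.init
        ? `${generateCode(node.id)} = ${generateCode(node.init)}`
        : generateCode(node.id);
    case 'Identifier':
      return node.name;
    case 'StringLiteral':
      return JSON.stringify(node.value);
    default:
      throw new Error(`Unsupported node type: ${node.type}`);
  }
}

renameIdentifier(declaration, 'name', 'userName');
console.log(generateCode(declaration)); // const userName = "小明";
```

Note that the printer chooses its own formatting (double quotes, a trailing semicolon). Real generators face the same question, which is why the original layout of a file is not automatically preserved after a round trip through the AST.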

AST Application - Code ESLint Problem Fixing

Problem Resolution Plan

Aone scans report Web front-end code quality problems and count them by ESLint rule.

Therefore, our objective is clear: resolve these issues at a low cost.

Let's take the problem identified by the scan based on the @ali/no-unused-vars rule as an example.

  • Rule explanation: reports variables, function parameters, and other names that are declared but never used.
  • Project situation: over 1,200 instances of this type of issue.
  • Solution: delete corresponding variable names.
  • Consideration: Given the number of such issues, manual modification is impractical; thus, we plan to use a tool.
  • Question: Why develop our own logic instead of using ESLint's automatic fixes?

    • Firstly, in VSCode, ESLint's fix rules for such issues require manual triggering and depend on the ESLint plugin.
    • Secondly, rules like @ali/no-unused-vars do not provide an auto-fix option.
  • Technical solutions:

    • Execute npx eslint --format json to output the validation results in JSON format, and keep only the issues reported by this rule.
    • Use @babel/parser to transform the code from the respective files into an AST.
    • Traverse the AST nodes and match them with ESLint's output results. If a match is found, perform node deletion operations.
    • Convert the adjusted AST to code and replace the original file content.

Example

Let's start by addressing a relatively simple problem scenario:

// An ordinary variable definition statement
const name = '小明'

Assume that name is an unused variable. ESLint validation then reports it as a violation of the @ali/no-unused-vars rule.

Following the technical solution outlined earlier:

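1️⃣ Run npx eslint --format json to get ESLint's validation results. Each reported message carries the location of the problematic code (line, column, endLine, and endColumn), which lets us identify the affected variable name. In the code below, line, startColumn, and endColumn refer to these values. Note that ESLint columns are 1-based, whereas the loc.column values in Babel's AST are 0-based.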
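A sketch of this filtering step might look as follows. The field names filePath, messages, ruleId, line, column, endLine, and endColumn are those of ESLint's JSON formatter. Everything else is an assumption made for illustration: the sample data, the file path, the message text, the collectIssues function, and the readSource callback. The sketch also assumes the rule reports the location of the variable's identifier, as ESLint's core no-unused-vars rule does.

```javascript
// Illustrative ESLint JSON output for a file containing `const name = '小明'`.
// The file path and message text are made up for this sketch.
const eslintResults = [
  {
    filePath: '/project/src/example.js',
    messages: [
      {
        ruleId: '@ali/no-unused-vars',
        message: "'name' is assigned a value but never used.",
        line: 1,
        column: 7,
        endLine: 1,
        endColumn: 11,
      },
      { ruleId: 'semi', message: 'Missing semicolon.', line: 1, column: 18 },
    ],
  },
];

// Flatten the results into one record per issue reported by `ruleId`.
// ESLint columns are 1-based while Babel's `loc.column` is 0-based,
// so both columns are shifted by one to make them comparable with the AST.
function collectIssues(results, ruleId, readSource) {
  const issues = [];
  for (const { filePath, messages } of results) {
    const sourceLines = readSource(filePath).split('\n');
    for (const msg of messages) {
      // Keep only single-line reports from the rule we care about.
      if (msg.ruleId !== ruleId || msg.endLine !== msg.line) continue;
      const startColumn = msg.column - 1;
      const endColumn = msg.endColumn - 1;
      issues.push({
        filePath,
        line: msg.line,
        startColumn,
        endColumn,
        // Recover the variable name from the source instead of parsing the message.
        text: sourceLines[msg.line - 1].slice(startColumn, endColumn),
      });
    }
  }
  return issues;
}

// In a real script, `readSource` would be (p) => fs.readFileSync(p, 'utf8').
const issues = collectIssues(eslintResults, '@ali/no-unused-vars', () => "const name = '小明'");
console.log(issues);
```

Slicing the name out of the source keeps the script independent of the exact wording of the rule's message, and the 0-based startColumn and endColumn can be compared directly with id.loc.start.column and id.loc.end.column in the AST.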

2️⃣ Retrieve the entire file content and pass it to @babel/parser for AST parsing, obtaining the syntax tree.

import * as babelParser from '@babel/parser';

const EXAMPLE_CODE = 'const name = "小明"'

// Parse the source code
function babelParse (code) {
  const ast = babelParser.parse(code, {
    sourceType: 'module',
    plugins: ['jsx', 'typescript'],
  });
  return ast
}

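const astResult = babelParse(EXAMPLE_CODE)
// parse() returns a File node; the Program node shown below is astResult.program.
console.log(JSON.stringify(astResult.program, null, 2))
/** Output (abridged):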
{
  "type": "Program",
  "start": 0,
  "end": 17,
  "loc": {
    "start": {
      "line": 1,
      "column": 0,
      "index": 0
    },
    "end": {
      "line": 1,
      "column": 17,
      "index": 17
    }
  },
  "sourceType": "module",
  "interpreter": null,
  "body": [
    {
      "type": "VariableDeclaration",
      "start": 0,
      "end": 17,
      "loc": {
        "start": {
          "line": 1,
          "column": 0,
          "index": 0
        },
        "end": {
          "line": 1,
          "column": 17,
          "index": 17
        }
      },
      "declarations": [
        {
          "type": "VariableDeclarator",
          "start": 6,
          "end": 17,
          "loc": {
            "start": {
              "line": 1,
              "column": 6,
              "index": 6
            },
            "end": {
              "line": 1,
              "column": 17,
              "index": 17
            }
          },
          "id": {
            "type": "Identifier",
            "start": 6,
            "end": 10,
            "loc": {
              "start": {
                "line": 1,
                "column": 6,
                "index": 6
              },
              "end": {
                "line": 1,
                "column": 10,
                "index": 10
              },
              "identifierName": "name"
            },
            "name": "name"
          },
          "init": {
            "type": "StringLiteral",
            "start": 13,
            "end": 17,
            "loc": {
              "start": {
                "line": 1,
                "column": 13,
                "index": 13
              },
              "end": {
                "line": 1,
                "column": 17,
                "index": 17
              }
            },
            "value": "小明"
          }
        }
      ],
      "kind": "const"
    }
  ]
}
*/

3️⃣ Traverse the syntax tree by using @babel/traverse.

traverse walks the AST produced by parse. Its second argument is a visitor object: each key names a node type, and the corresponding method is called for every node of that type. Here, each VariableDeclaration node is handed to the handleVariableType method, which modifies it.

import traverse from '@babel/traverse';

traverse(astResult, {
  VariableDeclaration(path) { // This indicates that the node whose type is VariableDeclaration is processed.
    // The node can be processed here.
    handleVariableType(path)
  }
})
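To show what the visitor object is doing, here is a simplified stand-in for the traversal, written without dependencies. It is not Babel's implementation: miniTraverse and its path object are hypothetical, the path offers only node, parent, and remove(), and the root node cannot be removed. Babel's real NodePath has a much larger API.

```javascript
const isNode = (v) => Boolean(v) && typeof v === 'object' && typeof v.type === 'string';

// Walk plain-object nodes depth-first and call visitor[node.type](path)
// whenever the visitor defines a method for that node type.
function miniTraverse(root, visitor) {
  function visit(node, parent, key) {
    const path = {
      node,
      parent,
      removed: false,
      // Detach this node from its parent (list entry or single-child property).
      remove() {
        const slot = parent[key];
        if (Array.isArray(slot)) slot.splice(slot.indexOf(node), 1);
        else parent[key] = null;
        this.removed = true;
      },
    };
    const handler = visitor[node.type];
    if (typeof handler === 'function') handler(path);
    if (path.removed) return; // Do not descend into a removed subtree.
    for (const [childKey, value] of Object.entries(node)) {
      // Copy arrays so that removals during the walk do not skip siblings.
      const children = Array.isArray(value) ? [...value] : [value];
      for (const child of children) {
        if (isNode(child)) visit(child, node, childKey);
      }
    }
  }
  visit(root, null, null);
}

// Usage: remove every VariableDeclaration from a tiny hand-written program.
const program = {
  type: 'Program',
  body: [
    { type: 'VariableDeclaration', kind: 'const', declarations: [] },
    { type: 'ExpressionStatement', expression: { type: 'Identifier', name: 'x' } },
  ],
};
const visitedKinds = [];
miniTraverse(program, {
  VariableDeclaration(path) {
    visitedKinds.push(path.node.kind);
    path.remove();
  },
});
console.log(visitedKinds, program.body.length); // [ 'const' ] 1
```

The visitor never has to write the walking logic itself. It only declares which node types it cares about, which is the same division of labor that the traverse call above relies on.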

Let's take a closer look at the AST structure of the code. The declarations array holds the variables defined by the current node. Each element is a node that defines one variable, and its id property carries the variable name. We first compare id.name with the variable name obtained from ESLint's output. If they match, we then compare locations: every node has a loc attribute giving its position in the source, which tells us whether the node really is the reported variable.

Once a match is successful, we can delete the current variable by using node.declarations.splice(index, 1).

Finally, if node.declarations.length === 0, meaning there are no declared variables left, we remove the entire statement by using path.remove().

According to the above logic, add the processing code:

// Assume the variable to be deleted is named 'text', located at line 'line', starting column 'startColumn', and ending column 'endColumn'.
// Babel columns are 0-based and ESLint columns are 1-based, so 'startColumn' and 'endColumn' are assumed to be already converted to 0-based.
function handleVariableType(path) {
  const { node } = path
  // Iterate backwards so that splice() does not shift elements that are still to be visited.
  for (let index = node.declarations.length - 1; index >= 0; index--) {
    const decl = node.declarations[index]
    if (decl.id.name === text) {
      if (decl.loc.start.line === line && decl.loc.end.line === line && decl.id.loc.start.column === startColumn && decl.id.loc.end.column === endColumn) {
        node.declarations.splice(index, 1);
      }
    }
  }
  // If the declaration list is empty, remove the entire declaration statement.
  if (node.declarations.length === 0) {
    path.remove();
  }
}

By applying this processing logic, we can effectively remove the entire corresponding variable declaration statement.

Special Cases

So can all unused variable declaration statements be deleted?

Refer to the following example:

const timer = setTimeout(() => {
  console.log('a');
}, 1000);

The variable timer is not used, but simply deleting the entire statement would be wrong. The right side of the assignment is a call to setTimeout, which schedules logic to run later. Deleting the statement would remove that call and could affect business logic, so we need to exclude such cases.

Let's take a look at the key parts of the AST corresponding to this code snippet.

Within the VariableDeclarator node, the init node represents the expression on the right side of the assignment. Here its type is CallExpression, meaning the value comes from a function call (in the previous example, the type was StringLiteral). Therefore, we add a check to our handleVariableType method: if the init node is of this type, we do not delete the declaration and instead leave it for manual confirmation.

// Assume the variable to be deleted is named 'text', located at line 'line', starting column 'startColumn', and ending column 'endColumn'.
// Babel columns are 0-based and ESLint columns are 1-based, so 'startColumn' and 'endColumn' are assumed to be already converted to 0-based.
function handleVariableType(path) {
  const { node } = path
  // Iterate backwards so that splice() does not shift elements that are still to be visited.
  for (let index = node.declarations.length - 1; index >= 0; index--) {
    const decl = node.declarations[index]
    if (decl.id.name === text) {
      if (decl.loc.start.line === line && decl.loc.end.line === line && decl.id.loc.start.column === startColumn && decl.id.loc.end.column === endColumn) {
        // Additional check: keep declarations initialized by a function call; users must decide.
        if (decl.init?.type !== 'CallExpression') {
          node.declarations.splice(index, 1);
        }
      }
    }
  }
  // If the declaration list is empty, remove the entire declaration statement.
  if (node.declarations.length === 0) {
    path.remove();
  }
}
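The CallExpression check excludes one kind of initializer, but other expressions, such as new Foo() or await x, can also run code. One more conservative variation, sketched here as an assumption rather than as the approach used in the project, is to invert the check and delete a declarator only when its initializer is on a short allowlist. The canRemoveDeclarator helper and the contents of the list are hypothetical; the node type names are those used in Babel's AST.

```javascript
// Initializer types assumed here to be free of side effects.
// The list is deliberately short; anything not listed is left for manual review.
const SAFE_INIT_TYPES = new Set([
  'StringLiteral',
  'NumericLiteral',
  'BooleanLiteral',
  'NullLiteral',
  'ArrowFunctionExpression',
  'FunctionExpression',
]);

// Returns true when removing the declarator is assumed not to drop runtime behavior.
function canRemoveDeclarator(decl) {
  // `let a;` has no initializer, so nothing runs when it is removed.
  if (!decl.init) return true;
  return SAFE_INIT_TYPES.has(decl.init.type);
}

// A string literal is safe to drop; a call such as setTimeout(...) is not.
console.log(canRemoveDeclarator({ init: { type: 'StringLiteral', value: '小明' } })); // true
console.log(canRemoveDeclarator({ init: { type: 'CallExpression' } })); // false
```

The trade-off is coverage: an allowlist leaves more cases for manual confirmation, but it fails safe when an initializer type was not anticipated.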

After modifying the AST, use @babel/generator to convert it back into a code snippet and replace the source code.

import generate from '@babel/generator';

// Convert the modified AST back to a code string
const finalCode = generate(astResult, { retainLines: true }).code;

By using AST in this way, the problem of unused variables in a simple scenario is addressed.

Additional Information

Of course, the scenario described above is just one example. The @ali/no-unused-vars rule alone covers many different situations, which need to be summarized and categorized before they can be fixed.

Below are some code examples of various scenarios. As governance projects increase, there may be even more unconsidered cases that will need to be handled individually with appropriate logic.

// Variables
// Delete the entire declaration statement
const xxx =
// Delete specified destructured variables
const { xxx } =
const { xxx: abc } =
const { xxx = [] } =
const [a ,b] =

// Functions
// Delete the entire function
function a() {}
// Delete the input parameter n
function a(m, n) {
  console.log(m)
}
// Delete the destructured parameter n
function a({m,n}) {
  console.log(m)
}

// ❗️ The following examples should not be deleted. Check: the variable is initialized with the result of a function or method call
const a = setTimeout(() => {
  console.log('a');
}, 1000);

const b = arr.map((item) => {
  console.log(item);
});
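As one example of handling these scenarios individually, the sketch below removes a single unused binding from an object destructuring pattern. It assumes Babel's node shapes (ObjectPattern, ObjectProperty, AssignmentPattern, RestElement). The helpers boundName and removeDestructuredBinding are hypothetical, and the sample declarator is hand-written.

```javascript
// Returns the local variable name bound by one ObjectPattern property, or null.
// Handles `{ xxx }`, `{ xxx: abc }` (binds abc), and `{ xxx = [] }`.
function boundName(property) {
  if (property.type !== 'ObjectProperty') return null;
  const target =
    property.value.type === 'AssignmentPattern' ? property.value.left : property.value;
  return target.type === 'Identifier' ? target.name : null;
}

// Remove the property that binds `name` from a `const { ... } = init` declarator.
// Returns true if a property was removed. Patterns containing a rest element
// are skipped, because removing a key would change what `...rest` collects.
function removeDestructuredBinding(declarator, name) {
  const pattern = declarator.id;
  if (pattern.type !== 'ObjectPattern') return false;
  if (pattern.properties.some((p) => p.type === 'RestElement')) return false;
  const index = pattern.properties.findIndex((p) => boundName(p) === name);
  if (index === -1) return false;
  pattern.properties.splice(index, 1);
  return true;
}

// Hand-written declarator for `const { xxx: abc, m } = props`.
const declarator = {
  type: 'VariableDeclarator',
  id: {
    type: 'ObjectPattern',
    properties: [
      {
        type: 'ObjectProperty',
        key: { type: 'Identifier', name: 'xxx' },
        value: { type: 'Identifier', name: 'abc' },
      },
      {
        type: 'ObjectProperty',
        key: { type: 'Identifier', name: 'm' },
        value: { type: 'Identifier', name: 'm' },
      },
    ],
  },
  init: { type: 'Identifier', name: 'props' },
};

console.log(removeDestructuredBinding(declarator, 'abc')); // true
console.log(declarator.id.properties.length); // 1
```

If the pattern ends up empty, whether the whole statement can go still depends on the initializer, so the same side-effect check as for plain variables applies before deleting it.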

Conclusion

As you gradually become proficient with ASTs, you will deepen your understanding of the underlying language mechanics and discover more innovative ways to work with code. For instance, you could define new syntactic sugar and convert between multiple languages, which are all very exciting possibilities.


Disclaimer: The views expressed herein are for reference only and don't necessarily represent the official views of Alibaba Cloud.
