By Mutao
While working through the Block and Major issues in the CPO project's code, we needed tooling to make code governance more efficient. Precise code changes naturally call for ASTs, which let us adjust the code programmatically and resolve the corresponding issues.
Note: The sample code in this article is based on JavaScript.
An Abstract Syntax Tree (AST) is an abstract representation of the syntactic structure of source code. It represents the syntactic structure of a programming language in a tree format, with each node in the tree representing a structure from the source code.
While front-end developers may not often deal directly with ASTs when writing JavaScript day to day, many engineering tools depend on them, such as Babel for code transformation, ESLint for validation, TypeScript for type checking, and editors for syntax highlighting. What these tools actually operate on is the JavaScript abstract syntax tree.
Working with an AST usually involves three stages:
• Parse: Convert the source code to an AST.
• Transform: Modify the AST through various plugins.
• Generate: Use code generation tools to convert the modified AST back into code.
Thus, by transforming code snippets into ASTs, performing specified structural processing, and then converting the modified AST back into code, we achieve the effect of modifying the code.
You can view the abstract syntax tree of your code with AST Explorer. We recommend selecting the @babel/parser parser, which keeps the output consistent with the examples in this article.
Let's look at a simple example:
const name = '小明'
After conversion, the output AST tree structure is as follows:
{
"type": "Program",
"start": 0,
"end": 17,
"loc": {
"start": {
"line": 1,
"column": 0,
"index": 0
},
"end": {
"line": 1,
"column": 17,
"index": 17
}
},
"sourceType": "module",
"interpreter": null,
"body": [
{
"type": "VariableDeclaration",
"start": 0,
"end": 17,
"loc": {
"start": {
"line": 1,
"column": 0,
"index": 0
},
"end": {
"line": 1,
"column": 17,
"index": 17
}
},
"declarations": [
{
"type": "VariableDeclarator",
"start": 6,
"end": 17,
"loc": {
"start": {
"line": 1,
"column": 6,
"index": 6
},
"end": {
"line": 1,
"column": 17,
"index": 17
}
},
"id": {
"type": "Identifier",
"start": 6,
"end": 10,
"loc": {
"start": {
"line": 1,
"column": 6,
"index": 6
},
"end": {
"line": 1,
"column": 10,
"index": 10
},
"identifierName": "name"
},
"name": "name"
},
"init": {
"type": "StringLiteral",
"start": 13,
"end": 17,
"loc": {
"start": {
"line": 1,
"column": 13,
"index": 13
},
"end": {
"line": 1,
"column": 17,
"index": 17
}
},
"extra": {
"rawValue": "小明",
"raw": "'小明'"
},
"value": "小明"
}
}
],
"kind": "const"
}
],
"directives": []
}
The data structure of the AST is a large JSON object. Each node contains key information such as the type.
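As a small illustration of how these fields fit together, the sketch below lists each node's type next to the source text that its start and end offsets cover. The AST is a hand-trimmed copy of the output above (loc and extra removed), and listNodes is a hypothetical helper written for this example, not part of any Babel package.

```javascript
const source = "const name = '小明'";

// Hand-trimmed copy of the AST shown above (loc and extra omitted).
const ast = {
  type: 'Program', start: 0, end: 17,
  body: [{
    type: 'VariableDeclaration', start: 0, end: 17, kind: 'const',
    declarations: [{
      type: 'VariableDeclarator', start: 6, end: 17,
      id: { type: 'Identifier', start: 6, end: 10, name: 'name' },
      init: { type: 'StringLiteral', start: 13, end: 17, value: '小明' },
    }],
  }],
};

// Hypothetical helper: visit every object that has a string `type`
// and record the slice of source code that the node spans.
function listNodes(node, code, out = []) {
  if (Array.isArray(node)) {
    node.forEach((child) => listNodes(child, code, out));
  } else if (node && typeof node === 'object') {
    if (typeof node.type === 'string') {
      out.push({ type: node.type, text: code.slice(node.start, node.end) });
    }
    Object.values(node).forEach((child) => listNodes(child, code, out));
  }
  return out;
}

const nodes = listNodes(ast, source);
console.log(nodes);
```

The start and end offsets are what make it possible to map a node back to the exact characters it came from, which the location matching later in this article relies on.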
A JavaScript AST is produced by a JavaScript parser, and the parsing process is divided into two stages: lexical analysis and syntax analysis.
Lexical analysis, also known as tokenization, is the process of converting a sequence of characters into a sequence of tokens. Tokens can be thought of as words in natural language. They are the smallest units with practical meaning in syntax analysis, also called syntactic units.
The syntactic units in JavaScript code mainly include the following types:
• Keywords: such as var, let, and const.
• Identifiers: consecutive characters not enclosed in quotes, which may be variable names, keywords such as if and else, or built-in constants such as true and false.
• Operators: +, -, *, /.
• Numbers: hexadecimal, decimal, octal, and scientific-notation literals.
• Strings: quoted text; to the computer, its content only takes part in calculation or display.
• Whitespace: consecutive spaces, line breaks, and indents.
• Comments: line comments or block comments, each treated as a single syntactic unit that cannot be split further.
• Others: braces, parentheses, semicolons, and colons.
Tokenization example:
// JavaScript source code
const name = '小明'
// Result after tokenization
[
{
value: 'const',
type:'identifier'
},
{
value:' ',
type:'whitespace'
},
{
value: 'name',
type:'identifier'
},
{
value:' ',
type:'whitespace'
},
{
value: '=',
type:'operator'
},
{
value:' ',
type:'whitespace'
},
{
value: '小明',
type:'string'
},
]
As shown, the tokenizer breaks the code snippet into a sequence of tokens based on syntactic units, completing the first step towards an AST. While this sounds simple, writing a real tokenizer means considering numerous scenarios and handling many special cases dictated by the language's features.
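To make the idea concrete, here is a minimal tokenizer sketch that reproduces the result above for this one statement. It is an illustration under strong assumptions: the rule table TOKEN_RULES and the function tokenize are hypothetical names, only a handful of token types are covered, and real tokenizers (such as the one inside @babel/parser) handle far more cases.

```javascript
// Each rule pairs a token type with a pattern anchored at the start of the input.
// Like the example above, keywords such as `const` are classified as identifiers.
const TOKEN_RULES = [
  ['whitespace', /^\s+/],
  ['string', /^(?:'[^']*'|"[^"]*")/],
  ['number', /^\d+(?:\.\d+)?/],
  ['identifier', /^[A-Za-z_$][\w$]*/],
  ['operator', /^[=+\-*/]/],
  ['punctuation', /^[{}()[\];:,]/],
];

function tokenize(source) {
  const tokens = [];
  let rest = source;
  while (rest.length > 0) {
    const rule = TOKEN_RULES.find(([, pattern]) => pattern.test(rest));
    if (!rule) {
      throw new SyntaxError(`Unexpected character: ${rest[0]}`);
    }
    const [type, pattern] = rule;
    const raw = rest.match(pattern)[0];
    // For strings, store the content without the surrounding quotes.
    tokens.push({ value: type === 'string' ? raw.slice(1, -1) : raw, type });
    rest = rest.slice(raw.length);
  }
  return tokens;
}

const tokens = tokenize("const name = '小明'");
console.log(tokens);
```

The order of the rules matters: the string rule has to run before the identifier and operator rules so that quoted content is never split into smaller tokens.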
Syntax analysis builds on lexical analysis: it combines the token sequence into a syntax tree and finally outputs the AST.
Below is the abstract syntax tree, which includes logical relationships, obtained after processing through a syntax parser. (Only key parts are shown.)
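For illustration only, the following sketch shows how a token sequence could be assembled into such a tree for the single statement used in this article. The function parseDeclaration is hypothetical and recognizes nothing but `const|let|var <identifier> = <string>`; a real parser covers the entire grammar.

```javascript
// Tokens in the same shape as the tokenization example above.
const tokens = [
  { value: 'const', type: 'identifier' },
  { value: ' ', type: 'whitespace' },
  { value: 'name', type: 'identifier' },
  { value: ' ', type: 'whitespace' },
  { value: '=', type: 'operator' },
  { value: ' ', type: 'whitespace' },
  { value: '小明', type: 'string' },
];

// Hypothetical mini parser for: <kind> <identifier> = <string>
function parseDeclaration(tokenList) {
  // Whitespace carries no structure, so drop it first.
  const [kind, id, eq, init] = tokenList.filter((t) => t.type !== 'whitespace');
  if (!kind || !['const', 'let', 'var'].includes(kind.value)) {
    throw new SyntaxError('Expected const, let, or var');
  }
  if (!id || id.type !== 'identifier') {
    throw new SyntaxError('Expected a variable name');
  }
  if (!eq || eq.value !== '=') {
    throw new SyntaxError('Expected "="');
  }
  if (!init || init.type !== 'string') {
    throw new SyntaxError('Expected a string literal');
  }
  return {
    type: 'VariableDeclaration',
    kind: kind.value,
    declarations: [{
      type: 'VariableDeclarator',
      id: { type: 'Identifier', name: id.value },
      init: { type: 'StringLiteral', value: init.value },
    }],
  };
}

const declaration = parseDeclaration(tokens);
console.log(JSON.stringify(declaration, null, 2));
```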
Since an AST is a JSON tree, modifying it only requires traversing the tree and changing the relevant properties of its nodes. Code is then generated from the modified AST.
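A dependency-free sketch of that traverse, modify, and generate loop might look like the following. Both walk and generate are hypothetical stand-ins written for this example (they are not the @babel/traverse or @babel/generator APIs), and generate only understands the few node types that appear here.

```javascript
// Hand-written AST for: const name = '小明'
const ast = {
  type: 'Program',
  body: [{
    type: 'VariableDeclaration',
    kind: 'const',
    declarations: [{
      type: 'VariableDeclarator',
      id: { type: 'Identifier', name: 'name' },
      init: { type: 'StringLiteral', value: '小明' },
    }],
  }],
};

// Hypothetical mini traversal: call visitors[node.type] for every node in the tree.
function walk(node, visitors) {
  if (Array.isArray(node)) {
    node.forEach((child) => walk(child, visitors));
    return;
  }
  if (!node || typeof node !== 'object') return;
  if (typeof node.type === 'string' && visitors[node.type]) {
    visitors[node.type](node);
  }
  Object.values(node).forEach((child) => walk(child, visitors));
}

// Hypothetical mini generator that only knows the node types used above.
function generate(node) {
  switch (node.type) {
    case 'Program':
      return node.body.map(generate).join('\n');
    case 'VariableDeclaration':
      return `${node.kind} ${node.declarations.map(generate).join(', ')};`;
    case 'VariableDeclarator':
      return node.init ? `${generate(node.id)} = ${generate(node.init)}` : generate(node.id);
    case 'Identifier':
      return node.name;
    case 'StringLiteral':
      return JSON.stringify(node.value);
    default:
      throw new Error(`Unsupported node type: ${node.type}`);
  }
}

// Transform: rename the variable by editing the Identifier node in place.
walk(ast, {
  Identifier(node) {
    if (node.name === 'name') node.name = 'userName';
  },
});

const output = generate(ast);
console.log(output); // const userName = "小明";
```

Note that the generated code uses double quotes because this toy generator prints strings with JSON.stringify; the original quoting style is not stored in this trimmed AST.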
The front-end code quality issues reported by Aone scans are detected and counted according to ESLint rules.
Therefore, our objective is clear: resolve these issues at a low cost.
Let's take the problem identified by the scan based on the @ali/no-unused-vars rule as an example.
Question: Why develop our own logic instead of using ESLint's automatic fixes?
Technical solutions:
Let's start by addressing a relatively simple problem scenario:
// An ordinary variable definition statement
const name = '小明'
Assume that name is an unused variable. During ESLint validation, you would receive the following prompt:
According to the technical solution we made earlier:
1️⃣ Run npx eslint --format json to get ESLint's validation results. The output includes the problematic code's line, startColumn, and endColumn, allowing us to identify the affected variable name.
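As a rough sketch of this step, the snippet below turns ESLint's JSON output into the name and position of each reported variable. The sample report and the helpers collectUnusedVars and readSource are made up for illustration. ESLint's JSON formatter names the position fields line, column, endLine, and endColumn, and its columns are 1-based, so the sketch converts them to the 0-based columns that Babel uses. It also assumes the rule reports the position of the variable's identifier.

```javascript
// Rule ID taken from the article; the sample report below is made up.
const RULE_ID = '@ali/no-unused-vars';

// Shape follows ESLint's JSON formatter: one entry per file, each with a messages array.
const sampleEslintJson = JSON.stringify([
  {
    filePath: '/project/src/demo.js',
    messages: [
      { ruleId: RULE_ID, line: 1, column: 7, endLine: 1, endColumn: 11,
        message: "'name' is assigned a value but never used." },
      { ruleId: 'semi', line: 1, column: 18, endLine: 1, endColumn: 18,
        message: 'Missing semicolon.' },
    ],
  },
]);

// Hypothetical stand-in for reading the reported file from disk.
const readSource = () => "const name = '小明'";

function collectUnusedVars(eslintJson, readFile) {
  const targets = [];
  for (const result of JSON.parse(eslintJson)) {
    const lines = readFile(result.filePath).split('\n');
    for (const msg of result.messages) {
      // Keep only single-line reports from the rule we are fixing.
      if (msg.ruleId !== RULE_ID || msg.endLine !== msg.line) continue;
      // ESLint columns are 1-based; Babel's loc.column is 0-based.
      const startColumn = msg.column - 1;
      const endColumn = msg.endColumn - 1;
      targets.push({
        filePath: result.filePath,
        text: lines[msg.line - 1].slice(startColumn, endColumn),
        line: msg.line,
        startColumn,
        endColumn,
      });
    }
  }
  return targets;
}

const targets = collectUnusedVars(sampleEslintJson, readSource);
console.log(targets);
```

Each entry's text, line, startColumn, and endColumn can then be compared against the AST nodes of the same file.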
2️⃣ Read the entire file content and pass it to @babel/parser for parsing to obtain the syntax tree.
import * as babelParser from '@babel/parser';
const EXAMPLE_CODE = 'const name = "小明"'
// Parse the source code
function babelParse (code) {
const ast = babelParser.parse(code, {
sourceType: 'module',
plugins: ['jsx', 'typescript'],
});
return ast
}
const astResult = babelParse(EXAMPLE_CODE)
console.log(astResult)
/**
{
"type": "Program",
"start": 0,
"end": 17,
"loc": {
"start": {
"line": 1,
"column": 0,
"index": 0
},
"end": {
"line": 1,
"column": 17,
"index": 17
}
},
"sourceType": "module",
"interpreter": null,
"body": [
{
"type": "VariableDeclaration",
"start": 0,
"end": 17,
"loc": {
"start": {
"line": 1,
"column": 0,
"index": 0
},
"end": {
"line": 1,
"column": 17,
"index": 17
}
},
"declarations": [
{
"type": "VariableDeclarator",
"start": 6,
"end": 17,
"loc": {
"start": {
"line": 1,
"column": 6,
"index": 6
},
"end": {
"line": 1,
"column": 17,
"index": 17
}
},
"id": {
"type": "Identifier",
"start": 6,
"end": 10,
"loc": {
"start": {
"line": 1,
"column": 6,
"index": 6
},
"end": {
"line": 1,
"column": 10,
"index": 10
},
"identifierName": "name"
},
"name": "name"
},
"init": {
"type": "StringLiteral",
"start": 13,
"end": 17,
"loc": {
"start": {
"line": 1,
"column": 13,
"index": 13
},
"end": {
"line": 1,
"column": 17,
"index": 17
}
},
"value": "小明"
}
}
],
"kind": "const"
}
]
}
*/
3️⃣ Traverse the syntax tree by using @babel/traverse.
traverse walks the AST produced by parse. Its second argument is a visitor object whose keys are node types: traverse calls the matching method for every node of that type, and here each VariableDeclaration node is handed to the handleVariableType method for modification.
import traverse from '@babel/traverse';
traverse(astResult, {
VariableDeclaration(path) { // Called for every node whose type is VariableDeclaration.
// Process the node here.
handleVariableType(path)
}
})
Let's take a closer look at the AST structure of the code. The declarations array holds the variables defined by the current node. Each element is a node that defines one variable, and its id property carries the variable name. We first compare id.name with the variable name obtained from ESLint's output. If they match, we then compare the location: every node has a loc attribute describing its position in the source, which tells us whether the node really is the reported variable. Keep in mind that ESLint reports 1-based columns while Babel's loc.column is 0-based, so the ESLint columns need to be converted before comparing.
Once a match is found, we can delete the variable with node.declarations.splice(index, 1).
Finally, if node.declarations.length === 0, meaning no declared variables are left, we remove the entire statement with path.remove().
According to the above logic, add the processing code:
// Assume the variable to be deleted is named 'text', located at line 'line', starting column 'startColumn', and ending column 'endColumn'.
// These four values come from the ESLint result and are defined in the enclosing scope.
function handleVariableType(path) {
const { node } = path;
// Find the declarator whose name and position match the ESLint report.
// findIndex avoids mutating the array while iterating over it.
const index = node.declarations.findIndex((decl) =>
decl.id.name === text &&
decl.loc.start.line === line &&
decl.loc.end.line === line &&
decl.id.loc.start.column === startColumn &&
decl.id.loc.end.column === endColumn);
if (index !== -1) {
node.declarations.splice(index, 1);
}
// If the declaration list is empty, remove the entire declaration statement.
if (node.declarations.length === 0) {
path.remove();
}
}
With this logic, we can remove the matching variable, and the whole declaration statement once nothing else is declared in it.
So can all unused variable declaration statements be deleted?
Refer to the following example:
const timer = setTimeout(() => {
console.log('a');
}, 1000);
The variable timer is not used, but it would be wrong to simply delete the entire statement. The right side of the assignment is a call to setTimeout, whose callback contains logic that will run later. Deleting this statement could affect business logic, so we need to exclude such cases.
Let's take a look at the AST corresponding to this code snippet. (Only key parts are shown.)
In the AST, the init node of the VariableDeclarator node represents the expression on the right side of the assignment. Here its type is CallExpression, meaning the value comes from a function call (compare this with the previous example, where the type is StringLiteral). Therefore, we add a check to the handleVariableType method: if the init node is of this type, we do not delete the variable and instead leave it for manual confirmation.
// Assume the variable to be deleted is named 'text', located at line 'line', starting column 'startColumn', and ending column 'endColumn'.
// These four values come from the ESLint result and are defined in the enclosing scope.
function handleVariableType(path) {
const { node } = path;
// Find the declarator whose name and position match the ESLint report.
const index = node.declarations.findIndex((decl) =>
decl.id.name === text &&
decl.loc.start.line === line &&
decl.loc.end.line === line &&
decl.id.loc.start.column === startColumn &&
decl.id.loc.end.column === endColumn);
// Additional check: never delete a variable initialized by a function call,
// because the call may have side effects; users must decide.
if (index !== -1 && node.declarations[index].init?.type !== 'CallExpression') {
node.declarations.splice(index, 1);
}
// If the declaration list is empty, remove the entire declaration statement.
if (node.declarations.length === 0) {
path.remove();
}
}
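Checking only for CallExpression still lets through other initializers that can run code, such as new expressions, await, or tagged templates. One possible alternative, sketched here as an assumption rather than the logic used in the project, is an allow-list: delete a declaration only when its initializer is of a kind known to be harmless. The helper name isSafeToDelete and the exact set of allowed types are hypothetical choices; the type names themselves are standard Babel node types.

```javascript
// Initializer kinds treated as side-effect free in this sketch.
// Creating a function expression does not run its body, so those are allowed too.
const HARMLESS_TYPES = new Set([
  'StringLiteral',
  'NumericLiteral',
  'BooleanLiteral',
  'NullLiteral',
  'Identifier',
  'ArrowFunctionExpression',
  'FunctionExpression',
]);

// Hypothetical helper: returns true only when deleting the initializer
// cannot change what the program does.
function isSafeToDelete(init) {
  if (init == null) return true; // e.g. `let a;` has no initializer
  if (HARMLESS_TYPES.has(init.type)) return true;
  if (init.type === 'ArrayExpression') {
    // Holes in an array literal are represented as null elements.
    return init.elements.every((el) => el === null || isSafeToDelete(el));
  }
  // Everything else (CallExpression, NewExpression, AwaitExpression, ...)
  // is left for manual review.
  return false;
}

console.log(isSafeToDelete({ type: 'StringLiteral', value: '小明' })); // true
console.log(isSafeToDelete({ type: 'CallExpression' })); // false
```

An allow-list errs on the side of keeping code: unknown or newly added node types are reported for manual review instead of being deleted silently.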
After modifying the AST, use @babel/generator to convert it back into a code snippet and replace the source code.
import generate from '@babel/generator';
// Convert the modified AST back to a code string
const finalCode = generate(astResult, { retainLines: true }).code;
By using AST in this way, the problem of unused variables in a simple scenario is addressed.
Of course, the scenario described above is just one example. The @ali/no-unused-vars rule alone covers many different situations, which need to be summarized and categorized before they can be handled.
Below are some code examples of various scenarios. As more projects come under governance, more unanticipated cases are likely to appear, and each will need its own handling logic.
// Variables
// Delete the entire expression line
const xxx =
// Delete specified destructured variables
const { xxx } =
const { xxx: abc } =
const { xxx = [] } =
const [a ,b] =
// Functions
// Delete the entire function declaration
function a() {}
// Delete the input parameter n
function a(m, n) {
console.log(m)
}
// Destructure and delete parameter n
function a({m,n}) {
console.log(m)
}
// ❗️ Do not delete the following examples. Check: the variable is initialized with the result of a function or method call
const a = setTimeout(() => {
console.log('a');
}, 1000);
const b = arr.map((item) => {
console.log(item);
});
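For the destructuring cases listed above, one possible approach is sketched below. It works on plain objects shaped like Babel's ObjectPattern and ArrayPattern nodes; the helpers localName and removeFromPattern are hypothetical, and the decision to skip object patterns that contain a rest element is an assumption made for safety, because removing a sibling property there would change what the rest variable receives.

```javascript
// Resolve the local variable name bound by a pattern entry:
//   { xxx }        -> value is Identifier xxx
//   { xxx: abc }   -> value is Identifier abc
//   { xxx = [] }   -> value is AssignmentPattern with left Identifier xxx
function localName(value) {
  if (value.type === 'Identifier') return value.name;
  if (value.type === 'AssignmentPattern' && value.left.type === 'Identifier') {
    return value.left.name;
  }
  return null; // nested patterns and rest elements are not handled here
}

// Hypothetical helper: remove `name` from a destructuring pattern.
// Returns true when the pattern was changed.
function removeFromPattern(pattern, name) {
  if (pattern.type === 'ObjectPattern') {
    if (pattern.properties.some((p) => p.type === 'RestElement')) return false;
    const before = pattern.properties.length;
    pattern.properties = pattern.properties.filter((p) => localName(p.value) !== name);
    return pattern.properties.length < before;
  }
  if (pattern.type === 'ArrayPattern') {
    const index = pattern.elements.findIndex((el) => el !== null && localName(el) === name);
    if (index === -1) return false;
    // Positions matter in array patterns, so leave a hole instead of shifting elements.
    pattern.elements[index] = null;
    while (pattern.elements.length > 0 && pattern.elements[pattern.elements.length - 1] === null) {
      pattern.elements.pop(); // trailing holes are unnecessary
    }
    return true;
  }
  return false;
}

// const { xxx: abc, keep } = ...   with `abc` unused
const objectPattern = {
  type: 'ObjectPattern',
  properties: [
    { type: 'ObjectProperty', key: { type: 'Identifier', name: 'xxx' }, value: { type: 'Identifier', name: 'abc' } },
    { type: 'ObjectProperty', key: { type: 'Identifier', name: 'keep' }, value: { type: 'Identifier', name: 'keep' } },
  ],
};
const objectChanged = removeFromPattern(objectPattern, 'abc');

// const [a, b] = ...   with `a` unused
const arrayPattern = {
  type: 'ArrayPattern',
  elements: [{ type: 'Identifier', name: 'a' }, { type: 'Identifier', name: 'b' }],
};
const arrayChanged = removeFromPattern(arrayPattern, 'a');
console.log(objectChanged, arrayChanged);
```

If the pattern ends up empty, the caller still has to decide whether the whole declaration can go, which again depends on whether the right-hand side may have side effects.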
As you gradually become proficient with ASTs, you will deepen your understanding of the underlying language mechanics and discover more innovative ways to work with code. For instance, you could define new syntactic sugar and convert between multiple languages, which are all very exciting possibilities.
Disclaimer: The views expressed herein are for reference only and don't necessarily represent the official views of Alibaba Cloud.