An AST Operating JavaScript

This article discusses parsing highly fault-tolerant code, modifying and generating an AST, and regenerating code.

By Xulun from F(x) Team

In earlier articles, we learned how to write rules for ESLint and stylelint. When writing rules, you run into many detailed problems: the code is parsed incorrectly, or the node attributes do not carry enough information for the analysis. We need more tools to simplify rule development, such as a parser with higher fault tolerance or a tool that exposes richer attributes.

ESLint works primarily at the level of the abstract syntax tree (AST).

To allow parsers to be swapped, ESLint needs a common standard for the AST format. The standard used by ESLint is called the ESTree specification. The three members of the steering committee of the ESTree specification happen to be from ESLint, Acorn, and Babel.

The basic node format of ESTree is defined in its ES5 specification. Starting with ES6, each ECMAScript version adds its own extension to the specification. For example, the ES2016 extension adds support for the "**" operator.
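To make the ESTree shape concrete, here is a hand-written sketch of the node an ESTree-compatible parser would produce for the expression 2 ** 8, together with a tiny evaluator. The position fields (start, end) are left out, and evaluateBinary is a hypothetical helper written only for this illustration; it is not part of any parser.

```javascript
// Hand-written ESTree-style node for the expression `2 ** 8`.
// Position fields such as start/end are omitted for brevity.
const powerNode = {
    type: "BinaryExpression",
    operator: "**",
    left: { type: "Literal", value: 2, raw: "2" },
    right: { type: "Literal", value: 8, raw: "8" }
};

// Hypothetical helper: evaluates a few numeric operators on Literal operands.
function evaluateBinary(node) {
    if (node.type === "Literal") {
        return node.value;
    }
    if (node.type !== "BinaryExpression") {
        throw new Error(`Unsupported node type: ${node.type}`);
    }
    const left = evaluateBinary(node.left);
    const right = evaluateBinary(node.right);
    switch (node.operator) {
        case "+": return left + right;
        case "-": return left - right;
        case "*": return left * right;
        case "**": return left ** right;
        default: throw new Error(`Unsupported operator: ${node.operator}`);
    }
}

console.log(evaluateBinary(powerNode)); // 256
```

Because every node carries a type field, tools can dispatch on it without knowing which parser produced the tree.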

Acorn supports a plugin mechanism, and espree, the parser used by ESLint, is built on top of Acorn. The Babel parser is also heavily based on Acorn.

Acorn Parser

The default usage of Acorn is very simple. Pass it a string of code, and it returns the AST:

let acorn = require("acorn");

console.log(acorn.parse("for(let i=0;i<10;i+=1){console.log(i);}", {ecmaVersion: 2020}));

The output is listed below:

Node {
  type: 'Program',
  start: 0,
  end: 39,
  body: [
    Node {
      type: 'ForStatement',
      start: 0,
      end: 39,
      init: [Node],
      test: [Node],
      update: [Node],
      body: [Node]
    }
  ],
  sourceType: 'script'
}

Traversing Syntax Trees

After parsing, we can traverse the syntax tree with the acorn-walk package.

acorn-walk offers traversal functions at several granularities. For example, we can use the simple function to visit every literal value:

const acorn = require("acorn")
const walk = require("acorn-walk")

const code = 'for(let i=0;i<10;i+=1){console.log(i);}';

walk.simple(acorn.parse(code, {ecmaVersion:2020}), {
    Literal(node) {
        console.log(`Found a literal: ${node.value}`);
    }
});

The output is listed below:

Found a literal: 0
Found a literal: 10
Found a literal: 1

The full function, which calls the callback on every node, is used more often:

const acorn = require("acorn")
const walk = require("acorn-walk")

const code = 'for(let i=0;i<10;i+=1){console.log(i);}';
const ast1 = acorn.parse(code, {ecmaVersion:2020});

walk.full(ast1, function(node){
    console.log(node.type);
});

The output is listed below:

Identifier
Literal
VariableDeclarator
VariableDeclaration
Identifier
Literal
BinaryExpression
Identifier
Literal
AssignmentExpression
Identifier
MemberExpression
Identifier
CallExpression
ExpressionStatement
BlockStatement
ForStatement
Program

Child nodes are reported before their parents, so the root of the tree, Program, comes last.
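The bottom-up order in the output above can be reproduced with a few lines of plain JavaScript. The following is a minimal sketch of a post-order walker over ESTree-shaped objects. It is an illustration only: walkPostOrder and the sample tree are hypothetical, and acorn-walk itself knows the child fields of each node type rather than scanning every property.

```javascript
// Minimal post-order walker over ESTree-shaped plain objects.
// Any property holding an object (or array of objects) with a string
// `type` field is treated as a child node.
function walkPostOrder(node, visit) {
    for (const key of Object.keys(node)) {
        const child = node[key];
        if (Array.isArray(child)) {
            for (const item of child) {
                if (item && typeof item.type === "string") {
                    walkPostOrder(item, visit);
                }
            }
        } else if (child && typeof child.type === "string") {
            walkPostOrder(child, visit);
        }
    }
    // Visit the node itself only after all of its children.
    visit(node);
}

// Hand-written tree for the statement `i < 10;`
const sampleProgram = {
    type: "Program",
    body: [{
        type: "ExpressionStatement",
        expression: {
            type: "BinaryExpression",
            operator: "<",
            left: { type: "Identifier", name: "i" },
            right: { type: "Literal", value: 10 }
        }
    }]
};

const visitedTypes = [];
walkPostOrder(sampleProgram, (node) => visitedTypes.push(node.type));
console.log(visitedTypes.join(" "));
// Identifier Literal BinaryExpression ExpressionStatement Program
```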

Acorn-Loose with High Fault Tolerance

Acorn works well on valid code, but it stops at the first syntax error, so its fault tolerance could be better.

Let's look at an example of an error:

let acorn = require("acorn");

console.log(acorn.parse("let a = 1 );", {ecmaVersion: 2020}));

Acorn cannot parse the code and throws an error:

SyntaxError: Unexpected token (1:10)
    at Parser.pp$4.raise (acorn/node_modules/acorn/dist/acorn.js:3434:15)
    at Parser.pp$9.unexpected (acorn/node_modules/acorn/dist/acorn.js:749:10)
    at Parser.pp$9.semicolon (acorn/node_modules/acorn/dist/acorn.js:726:68)
    at Parser.pp$8.parseVarStatement (acorn/node_modules/acorn/dist/acorn.js:1157:10)
    at Parser.pp$8.parseStatement (acorn/node_modules/acorn/dist/acorn.js:904:19)
    at Parser.pp$8.parseTopLevel (acorn/node_modules/acorn/dist/acorn.js:806:23)
    at Parser.parse (acorn/node_modules/acorn/dist/acorn.js:579:17)
    at Function.parse (acorn/node_modules/acorn/dist/acorn.js:629:37)
    at Object.parse (acorn/node_modules/acorn/dist/acorn.js:5546:19)
    at Object.<anonymous> (acorn/normal.js:3:19) {
  pos: 10,
  loc: Position { line: 1, column: 10 },
  raisedAt: 11
}

Next, we replace Acorn with acorn-loose, which has higher fault tolerance:

let acornLoose = require("acorn-loose");

console.log(acornLoose.parse("let a = 1 );", { ecmaVersion: 2020 }));

acorn-loose recovers from the stray closing parenthesis and parses the trailing semicolon as an empty statement:

Node {
  type: 'Program',
  start: 0,
  end: 12,
  body: [
    Node {
      type: 'VariableDeclaration',
      start: 0,
      end: 9,
      kind: 'let',
      declarations: [Array]
    },
    Node { type: 'EmptyStatement', start: 11, end: 12 }
  ],
  sourceType: 'script'
}
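One way to combine the two parsers is to try the strict parser first and fall back to the tolerant one only when parsing fails. The sketch below shows that pattern. parseWithFallback is a hypothetical helper, and the two parser functions are passed in as arguments so the helper itself has no dependencies.

```javascript
// Hypothetical helper: try the strict parser first and fall back to the
// tolerant parser only when the strict one throws a SyntaxError.
function parseWithFallback(code, strictParse, looseParse) {
    try {
        return { ast: strictParse(code), recovered: false };
    } catch (err) {
        if (!(err instanceof SyntaxError)) {
            throw err; // not a parse failure, so do not hide it
        }
        return { ast: looseParse(code), recovered: true, error: err.message };
    }
}

// Usage sketch (assumes acorn and acorn-loose are installed):
// const acorn = require("acorn");
// const acornLoose = require("acorn-loose");
// const options = { ecmaVersion: 2020 };
// const result = parseWithFallback(
//     "let a = 1 );",
//     (code) => acorn.parse(code, options),
//     (code) => acornLoose.parse(code, options)
// );
// console.log(result.recovered);
```

Keeping the recovered flag lets a rule decide whether to trust positions in a tree that was produced by error recovery.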

Espree Parser

Since espree is built on Acorn, its basic usage is similar:

const espree = require("espree");

const code = "for(let i=0;i<10;i+=1){console.log(i);}";

const ast = espree.parse(code,{ ecmaVersion: 2020 });

console.log(ast);

The generated format is ESTree, which is the same as Acorn:

Node {
  type: 'Program',
  start: 0,
  end: 39,
  body: [
    Node {
      type: 'ForStatement',
      start: 0,
      end: 39,
      init: [Node],
      test: [Node],
      update: [Node],
      body: [Node]
    }
  ],
  sourceType: 'script'
}

If the AST is not enough, we can also look directly at the result of tokenization:

const tokens = espree.tokenize(code,{ ecmaVersion: 2020 });
console.log(tokens);

The results are listed below:

[
  Token { type: 'Keyword', value: 'for', start: 0, end: 3 },
  Token { type: 'Punctuator', value: '(', start: 3, end: 4 },
  Token { type: 'Keyword', value: 'let', start: 4, end: 7 },
  Token { type: 'Identifier', value: 'i', start: 8, end: 9 },
  Token { type: 'Punctuator', value: '=', start: 9, end: 10 },
  Token { type: 'Numeric', value: '0', start: 10, end: 11 },
  Token { type: 'Punctuator', value: ';', start: 11, end: 12 },
  Token { type: 'Identifier', value: 'i', start: 12, end: 13 },
  Token { type: 'Punctuator', value: '<', start: 13, end: 14 },
  Token { type: 'Numeric', value: '10', start: 14, end: 16 },
  Token { type: 'Punctuator', value: ';', start: 16, end: 17 },
  Token { type: 'Identifier', value: 'i', start: 17, end: 18 },
  Token { type: 'Punctuator', value: '+=', start: 18, end: 20 },
  Token { type: 'Numeric', value: '1', start: 20, end: 21 },
  Token { type: 'Punctuator', value: ')', start: 21, end: 22 },
  Token { type: 'Punctuator', value: '{', start: 22, end: 23 },
  Token { type: 'Identifier', value: 'console', start: 23, end: 30 },
  Token { type: 'Punctuator', value: '.', start: 30, end: 31 },
  Token { type: 'Identifier', value: 'log', start: 31, end: 34 },
  Token { type: 'Punctuator', value: '(', start: 34, end: 35 },
  Token { type: 'Identifier', value: 'i', start: 35, end: 36 },
  Token { type: 'Punctuator', value: ')', start: 36, end: 37 },
  Token { type: 'Punctuator', value: ';', start: 37, end: 38 },
  Token { type: 'Punctuator', value: '}', start: 38, end: 39 }
]

As the results show, lexical analysis produces a flat list of tokens, while syntactic analysis produces a tree of statements and expressions.

Why does ESLint wrap Acorn in espree? The earliest versions of ESLint depended on esprima. Esprima and Acorn are not fully compatible, and ESLint needs more information to analyze code.

The Basic Operation of Babel

The last tool is Babel. It works well in both directions, parsing code into an AST and generating code from an AST, although it is not ESLint's default parser.

Babel Parser

Babel can also be configured to return the AST:

const code2 = 'function greet(input) {return input ?? "Hello world";}';

const babel = require("@babel/core");
const result = babel.transformSync(code2, { ast: true });

console.log(result.ast);

The output result is listed below:

Node {
  type: 'File',
  start: 0,
  end: 54,
  loc: SourceLocation {
    start: Position { line: 1, column: 0 },
    end: Position { line: 1, column: 54 },
    filename: undefined,
    identifierName: undefined
  },
  errors: [],
  program: Node {
    type: 'Program',
    start: 0,
    end: 54,
    loc: SourceLocation {
      start: [Position],
      end: [Position],
      filename: undefined,
      identifierName: undefined
    },
    sourceType: 'module',
    interpreter: null,
    body: [ [Node] ],
    directives: [],
    leadingComments: undefined,
    innerComments: undefined,
    trailingComments: undefined
  },
  comments: [],
  leadingComments: undefined,
  innerComments: undefined,
  trailingComments: undefined
}

If we only need the AST, we can call the babel.parseSync method:

const result2 = babel.parseSync(code2);
console.log(result2);

We can also use the parser package:

const babelParser = require('@babel/parser');
console.log(babelParser.parse(code2, {}));

Babel Traverser

Acorn has a special traverser package, and Babel is not to be outdone. Babel provides the @babel/traverse package to help traverse an AST.

Let's look at an example that prints the type of every node path:

const code4 = 'let a = 2 ** 8;'
const ast4 = babelParser.parse(code4, {})
const traverse2 = require("@babel/traverse");
traverse2.default(ast4, {
    enter(path) {
        console.log(path.type);
    }
});

The output is listed below. Nodes are visited top-down, starting from Program:

Program
VariableDeclaration
VariableDeclarator
Identifier
BinaryExpression
NumericLiteral
NumericLiteral

Type Judgment

While traversing, we need many helper functions to check node types. Babel provides a large utility library for this, named @babel/types.

For example, to check whether an AST node is an identifier, we can call the isIdentifier function. Let's look at an example:

const code6 = 'if (a==2) {a+=1};';
const t = require('@babel/types');
const ast6 = babelParser.parse(code6, {})
traverse2.default(ast6, {
    enter(path) {
        if (t.isIdentifier(path.node)) {
            console.log(path.node);
        }
    }
});

The output is listed below:

Node {
  type: 'Identifier',
  start: 4,
  end: 5,
  loc: SourceLocation {
    start: Position { line: 1, column: 4 },
    end: Position { line: 1, column: 5 },
    filename: undefined,
    identifierName: 'a'
  },
  name: 'a'
}
Node {
  type: 'Identifier',
  start: 11,
  end: 12,
  loc: SourceLocation {
    start: Position { line: 1, column: 11 },
    end: Position { line: 1, column: 12 },
    filename: undefined,
    identifierName: 'a'
  },
  name: 'a'
}

Now, if we want to check whether an expression uses the "==" operator, we can write:

const code8 = 'if (a==2) {a+=1};';
const ast8 = babelParser.parse(code8, {})
traverse2.default(ast8, {
    enter(path) {
        if (t.isBinaryExpression(path.node)) {
            if(path.node.operator==="=="){
                console.log(path.node);
            }
        }
    }
});

The isBinaryExpression function also accepts a second argument with properties to match, so we can put the operator condition there:

traverse2.default(ast8, {
    enter(path) {
        if (t.isBinaryExpression(path.node,{operator:"=="})) {
            console.log(path.node);
        }
    }
});
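The idea behind the second argument is a type check followed by a shallow comparison of the listed properties. The sketch below illustrates that idea on plain objects. matchesNode is a hypothetical function written for this article, not Babel's implementation.

```javascript
// Illustration only: check the node type, then compare each expected
// property with strict equality (a shallow match).
function matchesNode(node, type, expectedProps = {}) {
    if (!node || node.type !== type) {
        return false;
    }
    return Object.keys(expectedProps).every((key) => node[key] === expectedProps[key]);
}

// Hand-written node for `a == 2`
const looseEquality = {
    type: "BinaryExpression",
    operator: "==",
    left: { type: "Identifier", name: "a" },
    right: { type: "NumericLiteral", value: 2 }
};

console.log(matchesNode(looseEquality, "BinaryExpression"));                      // true
console.log(matchesNode(looseEquality, "BinaryExpression", { operator: "==" }));  // true
console.log(matchesNode(looseEquality, "BinaryExpression", { operator: "===" })); // false
console.log(matchesNode(looseEquality.left, "BinaryExpression"));                 // false
```

Because the comparison is shallow, it suits primitive properties such as operator or name; nested nodes still need their own checks.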

The Construction of AST Nodes

Type checks alone are not enough. The more important role of the @babel/types library is that it can be used to build AST nodes.

For example, if we want to generate a binary expression, we can use the binaryExpression function:

let node7 = t.binaryExpression("==",t.identifier("a"),t.numericLiteral(0));
console.log(node7);

Note: You cannot pass plain values for the identifier or the numeric literal. Each operand must be built with the constructor function of its own node type.

The running result is listed below:

{
  type: 'BinaryExpression',
  operator: '==',
  left: { type: 'Identifier', name: 'a' },
  right: { type: 'NumericLiteral', value: 0 }
}

If we want to change the operator "==" to "===", we can change it directly:

node7.operator="===";
console.log(node7);

The output result is listed below:

{
  type: 'BinaryExpression',
  operator: '===',
  left: { type: 'Identifier', name: 'a' },
  right: { type: 'NumericLiteral', value: 0 }
}

Putting the pieces together, the following code replaces every "==" operator with "===":

const code8 = 'if (a==2) {a+=1};';
const ast8 = babelParser.parse(code8, {})
traverse2.default(ast8, {
    enter(path) {
        if (t.isBinaryExpression(path.node,{operator:"=="})) {
            path.node.operator = "===";
        }
    }
});

Generating Code by AST

The final step is to generate code from the AST. Babel provides the @babel/generator package for this:

const generate = require("@babel/generator");
let c2 = generate.default(ast8,{});
console.log(c2.code);

The generated code is listed below:

if (a === 2) {
  a += 1;
}

;

Code Template

Building every piece of generated code by hand from AST constructor calls quickly becomes tedious. For such cases, we can use a code template.

Let's look at an example:

const babelTemplate = require("@babel/template");
const requireTemplate = babelTemplate.default(`
  const IMPORT_NAME = require(SOURCE);
`);

const ast9 = requireTemplate({
    IMPORT_NAME: t.identifier("babelTemplate"),
    SOURCE: t.stringLiteral("@babel/template")
});

console.log(ast9);

Note: The result produced by a code template is an AST. The placeholders are replaced with AST nodes, such as identifiers and string literals, rather than by substituting text into the template string.

The output result is listed below:

{
  type: 'VariableDeclaration',
  kind: 'const',
  declarations: [
    {
      type: 'VariableDeclarator',
      id: [Object],
      init: [Object],
      loc: undefined
    }
  ],
  loc: undefined
}

To turn it back into source code, we still need to call the generator:

console.log(generate.default(ast9).code);

The output is listed below:

const babelTemplate = require("@babel/template");

Note: A code template produces an abstract syntax tree, not a concrete syntax tree. For example, if we write comments in the code template, they do not appear in the generated code:

const forTemplate = babelTemplate.default(`
    for(let i=0;i<END;i+=1){
        console.log(i); // output loop variable
    }
`);
const ast10 = forTemplate({
    END: t.numericLiteral(10)
});
console.log(generate.default(ast10).code);

The generated code is listed below:

for (let i = 0; i < 10; i += 1) {
  console.log(i);
}
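The point that a template substitutes AST nodes rather than text can be illustrated without Babel. The sketch below replaces placeholder identifiers in a plain AST object with replacement nodes. fillPlaceholders and the sample tree are hypothetical and only mirror the idea; @babel/template does considerably more, such as parsing the template and validating the result.

```javascript
// Illustration only: return a copy of the tree in which every Identifier
// whose name is a key of `replacements` is swapped for the given node.
function fillPlaceholders(node, replacements) {
    if (Array.isArray(node)) {
        return node.map((item) => fillPlaceholders(item, replacements));
    }
    if (node === null || typeof node !== "object") {
        return node; // strings, numbers, booleans are copied as-is
    }
    if (node.type === "Identifier" &&
        Object.prototype.hasOwnProperty.call(replacements, node.name)) {
        return replacements[node.name];
    }
    const copy = {};
    for (const key of Object.keys(node)) {
        copy[key] = fillPlaceholders(node[key], replacements);
    }
    return copy;
}

// Hand-written tree for the template condition `i < END`
const conditionTemplate = {
    type: "BinaryExpression",
    operator: "<",
    left: { type: "Identifier", name: "i" },
    right: { type: "Identifier", name: "END" }
};

const filledCondition = fillPlaceholders(conditionTemplate, {
    END: { type: "NumericLiteral", value: 10 }
});

console.log(filledCondition.right);        // { type: 'NumericLiteral', value: 10 }
console.log(conditionTemplate.right.name); // END (the template itself is unchanged)
```

Since the substitution happens on nodes, a replacement can never break the syntax of the surrounding code the way a careless string replacement could.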

Advanced Operations of Babel

Babel Transcoder

Now that we have Babel, it is a bit wasteful to only use it as a parser. We can use Babel as a transcoder in our code:

const code2 = 'function greet(input) {return input ?? "Hello world";}';
const babel = require("@babel/core");
let result = babel.transformSync(code2, {
    targets: ">0.5%",
    presets: ["@babel/preset-env"]});

console.log(result.code);

Remember to install @babel/core and @babel/preset-env.

The result is listed below:

"use strict";

function greet(input) {
  return input !== null && input !== void 0 ? input : "Hello world";
}

Let's take another example of ES6 Class conversion:

const code3 = `
//Test Class Function
class Test {
    constructor() {
      this.x = 2;
    }
  }`;

const babel = require("@babel/core");
let result = babel.transformSync(code3, {
    presets: ["@babel/preset-env"]
});

console.log(result.code);

Only presets: ["@babel/preset-env"] needs to be specified; every other option uses its default value.

The generated code is listed below:

"use strict";

function _classCallCheck(instance, Constructor) { if (!(instance instanceof Constructor)) { throw new TypeError("Cannot call a class as a function"); } }

//Test Class Function
var Test = function Test() {
  _classCallCheck(this, Test);

  this.x = 2;
};

In an ESLint rule, when no ready-made fix exists for the offending source code, we can use Babel to transcode it and use the result as the autofix.

The Replacement of AST Nodes

Earlier, we only modified the operator of a binary expression in place, which is rarely enough in practice. More often, we need to replace a whole expression. For this, we can use the replaceWith function to replace an old AST node with a new one.

Let's take replacing "==" with "===" as an example again. This time, we build a new binaryExpression to replace the original one. The left and right operands of the expression are unchanged:

const babel = require("@babel/core");
const babelParser = require('@babel/parser');
const t = require('@babel/types');
const traverse = require("@babel/traverse");
const generate = require("@babel/generator");

const code8 = 'if (a==2) {a+=1}; if (a!=0) {a=0}';
const ast8 = babelParser.parse(code8, {})
traverse.default(ast8, {
    enter(path) {
        if (t.isBinaryExpression(path.node, {operator: "=="})) {
            path.replaceWith(t.binaryExpression("===", path.node.left, path.node.right));
        }else if(t.isBinaryExpression(path.node, {operator: "!="})){
            path.replaceWith(t.binaryExpression("!==", path.node.left, path.node.right));
        }
    }
});

let c2 = generate.default(ast8, {});
console.log(c2.code);

The output result is listed below:

if (a === 2) {
  a += 1;
}

;

if (a !== 0) {
  a = 0;
}

The Deletion of AST Nodes

When we review code, we often find leftovers, such as console.log statements that have not been deleted. We can write an AST processing tool to delete them by calling the remove method on the path of the current node.

A console.log call is a CallExpression, and the function being called is stored in the callee property of the CallExpression:

let code11 = "let a = 1; console.log(a);"
const ast11 = babelParser.parse(code11, {})
traverse.default(ast11, {
    enter(path) {
        if (t.isCallExpression(path.node) && t.isMemberExpression(path.node.callee)) {
            if (path.node.callee.object.name === "console" && path.node.callee.property.name === "log") {
                path.remove();
            }
        }
    }
});
const c11 = generate.default(ast11, {});
console.log(c11.code);

The output is listed below:

let a = 1;

We can go further and delete every call on the console object, regardless of which method is called:

let code12 = "let a = 1; console.log(a); console.info('Hello,World!')";
const ast12 = babelParser.parse(code12, {})
traverse.default(ast12, {
    enter(path) {
        if (t.isCallExpression(path.node) && t.isMemberExpression(path.node.callee)) {
            if (path.node.callee.object.name === "console") {
                path.remove();
            }
        }
    }
});
const c12 = generate.default(ast12, {});
console.log(c12.code);
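The two checks above can be folded into one reusable predicate. The sketch below works on plain node objects, so it can be called with path.node inside a visitor. isConsoleCall is a hypothetical helper, and it assumes the Babel node shape, in which a MemberExpression has object, property, and computed fields.

```javascript
// Hypothetical predicate: does this node look like `console.<method>(...)`?
// Pass a method name to restrict the match, or omit it to match any method.
function isConsoleCall(node, methodName) {
    if (!node || node.type !== "CallExpression") {
        return false;
    }
    const callee = node.callee;
    if (!callee || callee.type !== "MemberExpression") {
        return false;
    }
    if (callee.object.type !== "Identifier" || callee.object.name !== "console") {
        return false;
    }
    if (methodName === undefined) {
        return true;
    }
    // Only the non-computed form console.log is handled, not console["log"].
    return !callee.computed &&
        callee.property.type === "Identifier" &&
        callee.property.name === methodName;
}

// Hand-written node for `console.log()`
const logCall = {
    type: "CallExpression",
    callee: {
        type: "MemberExpression",
        computed: false,
        object: { type: "Identifier", name: "console" },
        property: { type: "Identifier", name: "log" }
    },
    arguments: []
};

console.log(isConsoleCall(logCall));         // true
console.log(isConsoleCall(logCall, "log"));  // true
console.log(isConsoleCall(logCall, "info")); // false
```

Checking the node types explicitly avoids reading name from nodes that do not have one, such as a nested member expression used as the callee object.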

Scope

Babel also supports scope analysis.

For example, we can use scope.hasBinding to check whether a local variable is bound in this scope, or we can use scope.hasGlobal to check whether a global variable is defined.

If the variable is bound in this scope, we can look up the binding with the getBinding function and read the initial value from its declaration node.

Let's look at an example:

let code13 = `
g = 1;
function test(){
    let a = 0;
    for(let i = 0;i<10;i++){
        a+=i;
    }
}
`;
const ast13 = babelParser.parse(code13, {})
traverse.default(ast13, {
    enter(path) {
        console.log(path.type);
        const is_a = path.scope.hasBinding('a');
        console.log(is_a);
        if(is_a){
            console.log(path.scope.getBinding('a').path.node.init.value);
        }
        console.log(path.scope.hasGlobal('g'));
    }
});

The output is listed below:

Program
false
true
ExpressionStatement
false
true
AssignmentExpression
false
true
Identifier
false
true
NumericLiteral
false
true
FunctionDeclaration
true
0
true
Identifier
true
0
true
BlockStatement
true
0
true
...

We can see that once the traversal reaches the FunctionDeclaration, whose scope is the function itself, the variable a declared inside the function is bound, and we can read its initial value of 0.
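The behavior in the output above follows from how a scope chain works: a name is looked up in the current scope first and then in each enclosing scope. The following sketch models that with plain objects. SimpleScope is a hypothetical class for illustration and is not how Babel represents scopes internally.

```javascript
// Illustration only: a scope is a set of names plus a link to its parent scope.
class SimpleScope {
    constructor(parent = null) {
        this.parent = parent;
        this.bindings = new Map();
    }

    declare(name, initialValue) {
        this.bindings.set(name, { initialValue });
    }

    // Look in this scope first, then walk outwards through the parents.
    getBinding(name) {
        for (let scope = this; scope !== null; scope = scope.parent) {
            if (scope.bindings.has(name)) {
                return scope.bindings.get(name);
            }
        }
        return undefined;
    }

    hasBinding(name) {
        return this.getBinding(name) !== undefined;
    }
}

const programScope = new SimpleScope();
const functionScope = new SimpleScope(programScope); // function test() { let a = 0; ... }
functionScope.declare("a", 0);
const loopScope = new SimpleScope(functionScope);    // for (let i = 0; ...) { ... }
loopScope.declare("i", 0);

console.log(programScope.hasBinding("a"));           // false
console.log(loopScope.hasBinding("a"));              // true
console.log(loopScope.getBinding("a").initialValue); // 0
```

This is why nodes outside the function report false for a, while every node inside it, including those in the nested loop, reports true.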

Using Babel to Highlight and Mark Error Code

In addition to routine operations such as analyzing and modifying ASTs, generating code, and transcoding, Babel provides the @babel/code-frame package, which marks a region of code to make error messages more readable.

Let's look at an example:

const codeFrame = require("@babel/code-frame");
const rawLines2 = 'let a = isNaN(b);';
const result2 = codeFrame.codeFrameColumns(rawLines2, {
    start: {line: 1, column: 9},
    end: {line: 1, column: 14},
}, {highlightCode: true});

console.log(result2);

Let's look at the results:

[Image: terminal output of the highlighted code frame]

The code is highlighted and the error location is marked in red, which makes the message much friendlier to read.

Let's look at a multi-line example. We only need to provide the start and end positions, and @babel/code-frame takes care of the rest:

const rawLines3 = ["class CodeAnalyzer {", "  constructor()", "};"].join("\n");
const result3 = codeFrame.codeFrameColumns(rawLines3, {
    start: {line: 2, column: 3},
    end: {line: 2, column: 16},
}, {highlightCode: true});

console.log(result3);

The output result is listed below:

[Image: terminal output of the code frame for the multi-line example]
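To show what marking a range means, here is a dependency-free sketch that underlines a column range on one line of source. underlineRange is a hypothetical helper. Treating lines and columns as 1-based with an exclusive end column is an assumption of this sketch, and it does no coloring or context lines as @babel/code-frame does.

```javascript
// Illustration only: print one source line with a caret underline.
// Lines and columns are 1-based; the end column is exclusive (an assumption).
function underlineRange(source, line, startColumn, endColumn) {
    const lines = source.split("\n");
    const text = lines[line - 1];
    const gutter = `${line} | `;
    // Shift the carets past the gutter and up to the start column.
    const padding = " ".repeat(gutter.length + startColumn - 1);
    const carets = "^".repeat(Math.max(1, endColumn - startColumn));
    return `${gutter}${text}\n${padding}${carets}`;
}

console.log(underlineRange("let a = isNaN(b);", 1, 9, 14));
// 1 | let a = isNaN(b);
//             ^^^^^
```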

A Complete Example of Replacing isNaN with Number.isNaN

The pieces above may feel a little scattered, so let's put them together in a complete example. The following script reads a JavaScript file name from the command-line arguments, finds every isNaN call in that file, and replaces it with a Number.isNaN call that keeps the original argument:

const babel = require("@babel/core");
const babelParser = require('@babel/parser');
const t = require('@babel/types');
const traverse = require("@babel/traverse");
const generate = require("@babel/generator");
const babelTemplate = require("@babel/template");
const fs = require("fs");

let args = process.argv;

if (args.length !== 3) {
    console.error('Please input a js file name:');
} else {
    let filename = args[2];
    let all_code = fs.readFileSync(filename, { encoding: 'utf8' });
    fix(all_code);
}

function fix(code) {
    const isNaNTemplate = babelTemplate.default(`Number.isNaN(ARG);`);
    const ast0 = babel.transformSync(code, { ast: true })?.ast;
    traverse.default(ast0, {
        enter(path) {
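            if (t.isCallExpression(path.node) && path.node.callee.name === 'isNaN') {
                // isNaN takes one argument; the placeholder needs a single node, not the arguments array
                const arg1 = path.node.arguments[0];
                const node2 = isNaNTemplate({
                    ARG: arg1,
                });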
                path.replaceWith(node2);
            }
        }
    });

    const c2 = generate.default(ast0, {});
    console.log(c2.code);
}

Summary

In this article, we learned tools for parsing code with high fault tolerance, modifying and building ASTs, and regenerating code from them. Compared with processing source code directly through text replacement, operating on the AST is both safer and more convenient.
