A Method for Optimizing JS Programs

This article discusses methods to optimize JavaScript programs, covering the relationship between digital circuits and programs as well as Node Sea.

By Zhenzi from F(x) Team

Before introducing the method for optimizing JavaScript code, I want to share my understanding of programming languages. The story begins with a mechanical mouse called Theseus and its inventor, Claude Shannon.

In the biography A Mind at Play: How Claude Shannon Invented the Information Age, authors Jimmy Soni and Rob Goodman introduce readers to one of Shannon's creations, Theseus. Using only simple, old-fashioned electronic components (such as relays and ROM), Theseus explored a complex maze and memorized the successful path. On its second run, it walked out of the maze along the correct path without a single mistake. Most people thought it was nothing more than a trick, a gadget without any value.

However, in the eyes of a few perceptive people, the insight contained in Theseus is almost comparable to the work of Newton and Einstein. Shannon's introduction of Boolean algebra into electronic circuit design inspired the invention of digital circuits and, in later generations, computers.

The Relationship between Digital Circuits and Programs

Figure 4-44: The Intelligent Dimmable and Color-Changing Circuit I Made in 2011.

Figure 4-44 shows a digital circuit inspired by Shannon. The old-fashioned electronic components in Theseus have become an integrated circuit, shown in the right red box. Two capacitors and a crystal oscillator in the left red box form an oscillating circuit that provides the integrated circuit with a clock. A digital-to-analog conversion circuit on the integrated circuit converts the results of the Boolean algebra circuit, represented as high and low levels, into an analog output signal. As a result, the dimmable, color-changing LED module receives different power outputs (which set its color) and different frequency outputs (which set its brightness). In this way, the circuit in Figure 4-44 implements a complete program function: the color and brightness of the LED change continuously as the clock ticks.

To write the complete program for the circuit in Figure 4-44 directly, you have to consult the integrated circuit manual and write the corresponding instructions and data against the pin definitions and register addresses it lists, as shown in Figure 4-45.

Figure 4-45: The Integrated Circuit Manual Information Used in My Circuit (Excerpted from the Official Website of ATMEL)

As this example shows, the biggest advantage of digital circuits, following Shannon's insight, is that the circuits are abstracted, so program logic and digital circuits are unified at this abstract level. At this level, the control in program logic can be compared to the control in logic circuits, and the input and output of programs can be compared to the storage (ROM and RAM) in digital circuits. A clock circuit provides the yardstick for signal transmission in digital circuits, and the central processing unit controls the transmission and circulation of signals according to it. This transmission and circulation turn point-like control into a control flow and point-like storage into a data flow. Therefore, the most important elements of a program are its control flow and its data flow.

To avoid checking the manual every time and using pins to burn instructions and data into ROM by hand, our predecessors invented a programming (burning) system. The system uses assembly language or C to define and describe the control flow and data flow, and then uses a compiler to translate that description into the corresponding register addresses, instructions, and data shown in Figure 4-45. Finally, the burner converts these into digital signals and transmits them through the pins to the integrated circuit, which completes the burning of the program. This system frees us from the manual (we still need to check it sometimes, but far less often). We can describe programs directly in a programming language and debug and test them in a simulator (similar to mock data on the frontend), which makes programming digital circuits simple.

Figure 4-46: Programming Digital Circuits (Excerpted from the Official Website of ATMEL)

It is much easier to control digital circuits in the way shown in Figure 4-46. You can buy an Arduino development board on Taobao and then use the following code to explore what a program is, what a digital circuit is, and what the essence of computing is:

// Color table, one 0xRRGGBB value per entry. You can continue to add items.
unsigned long colorT[] = {
  0xff3300, 0xff3800, 0xff4500, 0xff4700, 0xff5200, 0xff5300, 0xff5d00, 0xff5d00, 0xff6600, 0xff6500,
  0xff6f00, 0xff6d00, 0xff7600, 0xff7300, 0xff7c00, 0xff7900, 0xff8200, 0xff7e00, 0xff8700, 0xff8300
};
// These pins are where the integrated circuit's output signals in the manual connect to the LED module.
int R_Pin = 11;
int G_Pin = 10;
int B_Pin = 9;
int red = 0, green = 0, blue = 0;
int i = 0;
// Number of entries in the table (sizeof alone would give the size in bytes).
int l = sizeof(colorT) / sizeof(colorT[0]);
void setup(){
  pinMode(12, OUTPUT);
  pinMode(R_Pin, OUTPUT);
  pinMode(G_Pin, OUTPUT);
  pinMode(B_Pin, OUTPUT);
  digitalWrite(12, LOW);
}
void setColor(int redValue, int greenValue, int blueValue){
  analogWrite(R_Pin, redValue);
  analogWrite(G_Pin, greenValue);
  analogWrite(B_Pin, blueValue);
}
void loop(){
  red = (colorT[i] >> 16) & 0xff;
  green = (colorT[i] >> 8) & 0xff;
  blue = (colorT[i] >> 0) & 0xff;
  setColor(red, green, blue);
  i++;
  if(i >= l){
    i = 0;
  }
  delay(200); // Control the clock signal
}

Next, let's look at how to observe the control flow and data flow of JavaScript. As mentioned earlier, the original code text (a string) must first be processed by a parser, most commonly to generate an abstract syntax tree (AST). The parsing for D2C Schema and DesignToken, in contrast, aims to output complete and correct HTML documents with inline CSS.

Abstract Syntax Tree (AST)

The reason for converting JavaScript code text into an AST is that a compiler cannot directly manipulate program text made of strings. Only after the program text “1+2” is turned into something like new BinaryExpression(ADD, new Number(1), new Number(2)) can the compiler work with it. Does this look familiar? It resembles programming the ATMEL integrated circuit: the operation is ADD, and the data are new Number(1) and new Number(2). The process of turning program text into an AST can therefore be compared to decoding. This process is called parsing, and the code that performs it is called a parser. With this in mind, the program text we so often discuss, also called code, becomes easy to work with. We recommend a tool named esprima to turn the code text into an AST. You can then traverse the AST nodes and make some modifications. (Could such modifications also optimize performance?) Finally, the modified AST can be converted back into code text with escodegen.

// Generate an AST
const esprima = require('esprima');
const estraverse = require('estraverse');
const escodegen = require('escodegen');

const jsCode = 'if (a == b) { run(); }'; // Sample input; replace with your own code text
const ast = esprima.parseScript(jsCode);

// Traverse and modify the AST: turn loose equality into strict equality
function toEqual(node){
  if(node.type === 'BinaryExpression' && node.operator === '=='){
    node.operator = '===';
  }
}
function walkIn(ast){
  estraverse.traverse(ast, {
    enter: (node) => {
      toEqual(node);
    }
  });
}
walkIn(ast);

// Generate code from the modified AST
const code = escodegen.generate(ast);
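
To make the shape of an AST concrete without installing any package, here is a minimal sketch that builds the tree for “1+2” by hand and evaluates it. The node layout follows the ESTree style that esprima produces (type, operator, left, right). The evaluate helper and its small operator table are assumptions made for illustration and are not part of any library.

```javascript
// Hand-written AST for the expression "1+2", in the ESTree style.
const ast = {
  type: 'BinaryExpression',
  operator: '+',
  left: { type: 'Literal', value: 1 },
  right: { type: 'Literal', value: 2 },
};

// Hypothetical helper: walk the tree recursively and compute its value.
function evaluate(node) {
  switch (node.type) {
    case 'Literal':
      return node.value;
    case 'BinaryExpression': {
      const left = evaluate(node.left);
      const right = evaluate(node.right);
      if (node.operator === '+') return left + right;
      if (node.operator === '-') return left - right;
      if (node.operator === '*') return left * right;
      throw new Error('Unsupported operator: ' + node.operator);
    }
    default:
      throw new Error('Unsupported node type: ' + node.type);
  }
}

console.log(evaluate(ast)); // 3
```

The operation (the operator field) and the data (the Literal nodes) are separate fields, which is what lets a program inspect and rewrite them.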

Let's take a piece of real code and practice using the skills above:

acc = 0;
i = 0;
len = loadArrayLength(arr);
loop {
  if (i >= len)
    break;

  acc += load(arr, i);
  i += 1;
}

After using the parser provided by esprima to convert this code into an AST, I used a tool named Graphviz: the AST, in JSON format, is converted to a .gv file in digraph format, which is then rendered as Figure 4-47.

Figure 4-47: Visualizing an AST

Data Flow Graph (DFG)

Figure 4-47 is a tree, so it can be traversed easily, and we could generate the corresponding machine code as we visit the AST nodes. The problem with this approach is that information about variables is scarce and scattered across different tree nodes. To safely hoist the length lookup out of the loop, we need to know that the length of the array does not change between loop iterations. Humans can see this easily by looking at the source code, but a compiler has to do a lot of work to extract such facts directly from the AST. As with many other compiler problems, this is usually solved by lifting the data into a more appropriate abstraction layer, the intermediate representation (IR). In this particular case, the IR of choice is called a data flow graph (DFG). Instead of talking about syntactic entities (such as for loops and expressions), we talk about the data itself (reads and variable values) and how it changes through the program.
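
As a rough illustration of the optimization in question, the sketch below shows the same loop before and after the length lookup is moved out of the loop by hand. The function names sumNaive and sumHoisted are made up for this example. A real compiler performs this rewrite on its IR rather than on source text, and only after proving that the array length cannot change inside the loop.

```javascript
// Before: the length is looked up on every iteration.
function sumNaive(arr) {
  let acc = 0;
  for (let i = 0; i < arr.length; i++) {
    acc += arr[i];
  }
  return acc;
}

// After: the length is read once. This is only safe because nothing
// in the loop body modifies arr.
function sumHoisted(arr) {
  let acc = 0;
  const len = arr.length;
  for (let i = 0; i < len; i++) {
    acc += arr[i];
  }
  return acc;
}

console.log(sumNaive([1, 2, 3]), sumHoisted([1, 2, 3])); // 6 6
```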

In our example, the data we are interested in is the value of the variable arr. We want to be able to easily observe all of its uses to verify that there are no out-of-bounds accesses and no other changes that would modify the length of the array, which is the premise of our optimization. This is achieved by introducing a def-use (definition and use) relationship between data values. Specifically, each value is declared once (a node in Figure 4-48), and each use of it to create a new value is recorded (an edge in Figure 4-48). Connecting the values together in this way forms a DFG, as shown in Figure 4-48.

Figure 4-48: Data Flow Graph

Note the array node in the red box in Figure 4-48. The solid arrows leaving it indicate uses of this value. By iterating over these edges, the compiler can deduce that the array value is used in the following three places:

  • loadArrayLength
  • checkIndex
  • load
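
The idea of iterating over use edges can be sketched in a few lines. In this minimal example, each value is a node that lists the values it consumes, and buildUses inverts those links into a map from each value to its users. The node ids (arr, i, len, chk, elem) and the helper name are assumptions chosen for illustration and do not come from any particular compiler.

```javascript
// Each value is defined once; `inputs` lists the values it uses.
const nodes = [
  { id: 'arr', op: 'array', inputs: [] },
  { id: 'i', op: 'index', inputs: [] },
  { id: 'len', op: 'loadArrayLength', inputs: ['arr'] },
  { id: 'chk', op: 'checkIndex', inputs: ['arr', 'i'] },
  { id: 'elem', op: 'load', inputs: ['arr', 'i'] },
];

// Hypothetical helper: invert the input links into def-use edges.
function buildUses(nodes) {
  const uses = new Map(nodes.map((node) => [node.id, []]));
  for (const node of nodes) {
    for (const input of node.inputs) {
      uses.get(input).push(node.op);
    }
  }
  return uses;
}

const uses = buildUses(nodes);
console.log(uses.get('arr')); // [ 'loadArrayLength', 'checkIndex', 'load' ]
```

With the uses collected in one place, checking that nothing modifies the array becomes a scan over a short list instead of a search through the whole tree.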

If the value of the array node (its storage and length) is accessed in a destructive way, the graph is constructed by explicitly cloning the array node. As a result, whenever we see the array node and observe its uses, we can be sure that its value does not change. This may sound complicated, but it is easy to implement: the DFG follows the rule of static single assignment (SSA). In short, to convert a program to SSA form, the compiler renames every assignment to a variable, along with the subsequent uses of that variable, so that each variable is assigned only once.

For example, before SSA:

var a = 1;
console.log(a);
a = 2;
console.log(a);

After SSA:

var a0 = 1;
console.log(a0);
var a1 = 2;
console.log(a1);

After SSA, we can be sure that whenever we talk about a0, we are talking about a single assignment to it.
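
The renaming itself can be sketched for straight-line code like the example above. The statement format (kind, name, value) and the toSSA function are assumptions invented for this illustration. The sketch also ignores branches and loops, where values from different paths merge and SSA needs phi nodes.

```javascript
// Straight-line program: each entry either assigns a variable or reads one.
const program = [
  { kind: 'assign', name: 'a', value: 1 },
  { kind: 'use', name: 'a' },
  { kind: 'assign', name: 'a', value: 2 },
  { kind: 'use', name: 'a' },
];

// Hypothetical renamer: give every assignment a fresh version number and
// point every use at the most recent version.
function toSSA(statements) {
  const version = new Map(); // variable name -> latest version number
  return statements.map((stmt) => {
    if (stmt.kind === 'assign') {
      const next = version.has(stmt.name) ? version.get(stmt.name) + 1 : 0;
      version.set(stmt.name, next);
      return { ...stmt, name: stmt.name + next };
    }
    if (!version.has(stmt.name)) {
      throw new Error('Use before assignment: ' + stmt.name);
    }
    return { ...stmt, name: stmt.name + version.get(stmt.name) };
  });
}

const ssa = toSSA(program);
console.log(ssa.map((stmt) => stmt.name)); // [ 'a0', 'a0', 'a1', 'a1' ]
```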

Control Flow Graph (CFG)

Using data flow analysis to extract information from programs allows us to make safe assumptions about how to optimize them. This data flow representation is useful in many situations. There is only one problem: in the representation chain (from source code to machine code), a DFG is less suitable for generating machine code than the AST. A program is essentially a sequential list of instructions that the CPU executes one by one, but the DFG does not convey this ordering. Typically, this problem is solved by grouping graph nodes into blocks. This representation is called a control flow graph (CFG).

b0 {
  i0 = literal 0
  i1 = literal 0

  i3 = array
  i4 = jump ^b0
}
b0 -> b1

b1 {
  i5 = ssa:phi ^b1 i0, i12
  i6 = ssa:phi ^i5, i1, i14

  i7 = loadArrayLength i3
  i8 = cmp "<", i6, i7
  i9 = if ^i6, i8
}
b1 -> b2, b3
b2 {
  i10 = checkIndex ^b2, i3, i6
  i11 = load ^i10, i3, i6
  i12 = add i5, i11
  i13 = literal 1
  i14 = add i6, i13
  i15 = jump ^b2
}
b2 -> b1

b3 {
  i16 = exit ^b3
}

The listing above can be drawn as the CFG shown in Figure 4-49.

Figure 4-49: Control Flow Graph

As you can see, the code before the loop is in block b0, the loop header and loop test are in b1, the loop body is in b2, and the exit node is in b3. It is easy to translate this form into machine code: replace each iXX with a CPU register name (like the register addresses found in the ATMEL manual mentioned earlier) and generate machine code for each instruction, line by line.

CFGs carry both data flow relationships and ordering, which allows us to use them for data flow analysis and for machine code generation. However, optimizing a CFG by manipulating its blocks and their contents can quickly become complex and error-prone. Instead, Clifford Click and Keith D. Cooper proposed an approach known as the sea of nodes (called Node Sea in this article) to eliminate the troubles caused by CFGs and complex DFGs.
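
The block structure of the listing can also be written down as plain data and walked. In the sketch below, the successors table copies the edges from the listing (b0 -> b1, b1 -> b2 and b3, b2 -> b1), while the walk function is a hypothetical helper written for this example. It visits blocks depth-first from the entry and reports any edge that jumps back to a block still being visited, which is how the loop shows up.

```javascript
// Successor edges taken from the listing above.
const successors = {
  b0: ['b1'],
  b1: ['b2', 'b3'],
  b2: ['b1'],
  b3: [],
};

// Hypothetical helper: depth-first walk that records visiting order
// and back edges (edges to a block that is still on the DFS stack).
function walk(successors, entry) {
  const order = [];
  const backEdges = [];
  const visited = new Set();
  const onStack = new Set();

  function visit(block) {
    visited.add(block);
    onStack.add(block);
    order.push(block);
    for (const next of successors[block]) {
      if (onStack.has(next)) {
        backEdges.push(block + ' -> ' + next);
      } else if (!visited.has(next)) {
        visit(next);
      }
    }
    onStack.delete(block);
  }

  visit(entry);
  return { order, backEdges };
}

const result = walk(successors, 'b0');
console.log(result.order);     // [ 'b0', 'b1', 'b2', 'b3' ]
console.log(result.backEdges); // [ 'b2 -> b1' ]
```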

Node Sea

Do you remember the fancy DFG with the dashed lines? Those dashed lines are what make the graph a node sea graph. Instead of grouping nodes into blocks and ordering them, we declare the control dependencies as dashed edges in the graph. If we remove all the solid edges and regroup things slightly, we get the node sea graph shown in Figure 4-50.

Figure 4-50: Node Sea

The node sea in Figure 4-50 is a very powerful way to view code. It has all the information of an ordinary DFG, and it can be changed easily for optimization without constantly deleting and replacing nodes in blocks. Node sea graphs are usually modified by graph reduction: we queue all the nodes in the graph and call our reduction function on each node in the queue. Everything the function touches (changes or replaces) is put into a queue and passed to the function again later. If you have many optimizations (such as merging or reducing network requests, JSBridge calls, or local storage API calls), you can stack them together and apply them all to each node in the queue. If they depend on each other's final state, you can instead apply them one after another.
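
The queue-based reduction loop can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of any real compiler: the graph layout (a Map of nodes with op, value, and inputs), the foldConstants reduction, and the reduce driver are all invented for this example. The reduction folds an add of two literals into a single literal, and the driver re-queues the users of every node that changes.

```javascript
// Tiny graph: n3 = n1 + n2 and n5 = n3 + n4. n5 is listed first on purpose,
// so it cannot be folded until n3 has been reduced.
const graph = new Map([
  ['n5', { op: 'add', inputs: ['n3', 'n4'] }],
  ['n3', { op: 'add', inputs: ['n1', 'n2'] }],
  ['n1', { op: 'literal', value: 1 }],
  ['n2', { op: 'literal', value: 2 }],
  ['n4', { op: 'literal', value: 10 }],
]);

// Hypothetical reduction: fold an add of two literals into one literal.
// Returns true when the node was changed.
function foldConstants(graph, id) {
  const node = graph.get(id);
  if (node.op !== 'add') return false;
  const [a, b] = node.inputs.map((input) => graph.get(input));
  if (a.op !== 'literal' || b.op !== 'literal') return false;
  graph.set(id, { op: 'literal', value: a.value + b.value });
  return true;
}

// Hypothetical driver: queue every node, and re-queue the users of any
// node that a reduction changed.
function reduce(graph, reduction) {
  const queue = [...graph.keys()];
  while (queue.length > 0) {
    const id = queue.shift();
    if (!reduction(graph, id)) continue;
    for (const [userId, user] of graph) {
      if (user.inputs && user.inputs.includes(id)) queue.push(userId);
    }
  }
}

reduce(graph, foldConstants);
console.log(graph.get('n5')); // { op: 'literal', value: 13 }
```

Stacking several optimizations then amounts to calling several reduction functions on each queued node instead of one.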
