By Wang Yi, Researcher at Ant Group
Not long ago, Xu Shiwei's Go+ project gained popularity on Hacker News, as shown by the report at this link. I became interested in the project and started contributing to it. Recently, Xu Shiwei and the community held a livestream and asked me to explain why I follow Go+. I finalized this article based on the on-screen comments during the livestream and on suggestions from my friends Hong Mingsheng (TensorFlow runtime owner) and Wang Yu (Shen Diaomo).
I have been working on distributed deep learning systems for 13 years. In 2016, Xu Wei asked me to take over his original PaddlePaddle project. My experience of using Python in industrial systems since then has taught me a lot about Python's limitations. So far, Go+ is the most reliable way I have found to make up for them.
I hope that Go+ can fully catch up with Python while avoiding its shortcomings. I also hope that a NumPy-like project (let's call it numgo+ for now) will be available to support tensor operations and meet data science requirements, and that a PyTorch-like basic deep learning library (let's call it GoTorch for now) will be built on top of numgo+. Moreover, if possible, Go+ could become a frontend language for deep learning compilers.
I work at Ant Group and am responsible for SQLFlow, an open-source compiler that translates SQL programs, written with syntax extensions that support AI, into Python programs. My coworkers said that if the Go+ ecosystem matures, they would be happy to have SQLFlow generate Go+ programs.
You may think that I am talking nonsense, because you do not see why a language as popular as Python would need a complement.
Python has flexible syntax and integrates convenient features of many other languages. Like C++, Python allows operator overloading, and the NumPy authors overload arithmetic operators for tensor operations. Like Lisp, Python has an eval function that recursively invokes the Python interpreter to interpret and execute Python expressions, so Python programs can generate and run code themselves.
This flexibility lets programmers play to their strengths, so Python is especially suitable for exploratory work. For example, postgraduate students use Python for scientific research, and data scientists use it to replace various expensive commercial systems. Python has also grown rapidly in the deep learning field.
Python's advantages also bring limitations. I have personally experienced the following pain points.
Flexible syntax means that the same program can be written in many ways. In modern software engineering, people work in teams rather than alone. When many ways of writing the same thing are possible, code reviews easily turn into arguments, because there is no objective standard for choosing among them. Many other languages, such as Java, share this problem. To address it, the community defined design patterns: before writing a program, programmers first check whether an applicable design pattern already exists. Java programmers must therefore learn design patterns in addition to Java syntax. The same is true for C++. For C++, another solution is the code style defined by Google, which specifies which syntax may and may not be used. According to Rob Pike's explanation, restricting the allowed syntax was the original design intention of Go. Python is so flexible that its code style cannot be defined in the same way as for C++. PEP8 mostly specifies formatting requirements and hardly limits syntax choices. Python cannot rely on patterns either, because there would be too many of them.
To stay flexible, Python uses dynamic data types. To determine what a Python function returns, we must read its code carefully. Python does have a syntax extension (type hints) that lets programmers specify input and output data types. However, not many people use it because, after all, they chose Python for its flexibility. If that flexibility were restricted, they would turn to a statically typed language instead. The flexibility also makes long Python functions difficult to understand. Yet many Python programmers use the language to play to their own strengths, do not care whether others can understand the functions they write, and are reluctant to split long functions into smaller ones.
That said, elegantly written Python code does exist, such as Google Tangent. It is a small project with only two authors and a clear code structure. Most functions contain no more than 10 lines of code, and there are about as many lines of comments as lines of code, so the project is easy to understand. However, this is contrary to the impression that many people have of Python code. When I was responsible for the PaddlePaddle project, I not only studied and summarized Python patterns but also configured CI to call various tools that check the source code. However, these tools are not intelligent enough to comment the code automatically or to split long function definitions.
Python has rich syntax and great flexibility, so its interpreter is complex to write and its performance is difficult to optimize. By contrast, Go has simple syntax, is more expressive than C, and has fewer keywords than C. This simplicity makes Go programs easier to optimize. Within several years of Go's release, the optimizations performed by the Go compiler quickly approached those that GCC performs for C++ programs. Like Python, C++ has rich syntax, which makes it difficult to develop compiler optimizations for it.
Some people tried to replace the Python interpreter with a Python compiler, so that programs could be optimized before they are executed. However, Python syntax is even more flexible than C++ syntax, so it is almost impossible to write a compiler that fully supports standard Python. As a result, those attempts were soon abandoned. The common solution today is for the interpreter to optimize at runtime through just-in-time (JIT) compilation. This is easier than ahead-of-time compilation because runtime information is available.
In the AI field, training deep learning models consumes a lot of computing resources. TensorFlow graph mode addresses this as follows: the Python program written by the programmer does not perform training when it is executed. Instead, it exports the training process as a data structure called a computational graph, which is then submitted to the TensorFlow runtime for execution. As long as the TensorFlow runtime executes the graph efficiently, the efficiency of the Python interpreter matters little.
TensorFlow graph mode is well-intentioned but also superfluous. Source programs, intermediate representations (IRs) at different layers, and binary code have long been the established ways to describe a computing process. The computational graph that the TensorFlow project introduced in its early years duplicates them and is not professionally designed. It is difficult to express if-else, loops, and function definitions and calls in the graph, not to mention advanced control flow structures such as closures, coroutines, and threads. This non-professional compiler design by AI engineers made Chris Lattner, the author of LLVM, feel embarrassed. He therefore tried to replace Python with Swift as the frontend language through Swift for TensorFlow, and to replace the TensorFlow computational graph with Multi-Level Intermediate Representation (MLIR). For more information, visit https://www.tensorflow.org/mlir/dialects
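The separation between describing a computation and executing it can be illustrated with a minimal Go sketch. The `Node` type and the `Const`, `Add`, `Mul`, and `Run` helpers are hypothetical names invented for this illustration. They are not part of TensorFlow, whose real graphs and runtime are far richer.

```go
package main

import "fmt"

// Node is one vertex of a tiny, hypothetical computational graph.
type Node struct {
	Op     string  // "const", "add", or "mul"
	Value  float64 // used only when Op == "const"
	Inputs []*Node
}

// The builders below only construct the data structure; they compute nothing.
func Const(v float64) *Node { return &Node{Op: "const", Value: v} }
func Add(a, b *Node) *Node  { return &Node{Op: "add", Inputs: []*Node{a, b}} }
func Mul(a, b *Node) *Node  { return &Node{Op: "mul", Inputs: []*Node{a, b}} }

// Run plays the role of the runtime: it walks the graph and computes the result.
func Run(n *Node) float64 {
	switch n.Op {
	case "const":
		return n.Value
	case "add":
		return Run(n.Inputs[0]) + Run(n.Inputs[1])
	case "mul":
		return Run(n.Inputs[0]) * Run(n.Inputs[1])
	}
	panic("unknown op: " + n.Op)
}

func main() {
	// Describing 2*3 + 4 performs no arithmetic yet.
	y := Add(Mul(Const(2), Const(3)), Const(4))
	// The arithmetic happens only when the graph is handed to the runtime.
	fmt.Println(Run(y))
}
```

In this arrangement the frontend program only needs to be fast enough to build the graph, which is why the speed of the frontend interpreter matters little once a separate runtime does the heavy computation.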
When I was responsible for the PaddlePaddle project, my coworker Chen Xi and I built a self-driving boat to verify the capabilities of Paddle Fluid. We wrote an imitation learning method with Fluid so that the boat could learn driving skills from human drivers. For more information, see the blog series at https://zhuanlan.zhihu.com/p/38395601. In this attempt, bringing a MacBook Pro aboard to run Python programs would have consumed too much power, and embedded devices are not suitable for running training programs written in Python. If we had uploaded the data to a server for training after the boat stopped, the boat would have learned from human drivers too slowly.
Therefore, another coworker, Yang Yang, wrote Paddle Tape, which implements PyTorch-style automatic differentiation (autodiff) in C++. Combined with the many basic computing units (operators) of Paddle Fluid that are written in C++, Paddle Tape forms a complete C++ deep learning system that does not depend on Python.
In early 2019, my friend Hong Mingsheng was responsible for the Swift for TensorFlow project at Google. This was another attempt to remove Python from the AI infrastructure. At that time, he asked me to share the story of Paddle Tape and the self-driving boat with Chris Lattner's team. To view the revised slides, visit this link.
I am responsible for ElasticDL at Ant Group, an open-source distributed deep learning training system. While implementing it, I tried TensorFlow graph mode, TensorFlow eager execution mode, PyTorch, and Swift for TensorFlow. I was also inspired by the design concepts of Swift for TensorFlow and by the strategy that made the Python ecosystem prosper.
These attempts taught me that we must select a language whose syntax is clear, simple, stable, and easy to learn. The users of the language must also have an exploratory spirit. Go+ and its user base, which comes from the Go community, meet these requirements well.
Before Go+ emerged, some people attempted to use Go for data science and implemented tensor operation libraries such as gonum. However, these libraries are not as easy to use as NumPy is in Python programs. This is because data types must be specified for Go constants but not for Python constants. For comparison examples, you can visit https://github.com/qiniu/goplus/issues/307
When you use Go to define a constant of the ndarray type, the code is similar to the following:
    x := numgo.NdArray(
        [][]float64{
            {1.0, 2.0, 3.0},
            {1.0, 2.0, 3.0}})
However, when you use Python for the same purpose, the code is similar to this:
    x = numpy.array(
        [[1.0, 2.0, 3.0],
         [1.0, 2.0, 3.0]])
Because Go+ infers data types automatically, the code becomes the following, which is almost the same as that in Python.
    x := numgo.NdArray(
        [[1.0, 2.0, 3.0],
         [1.0, 2.0, 3.0]])
Furthermore, Xu Shiwei added a comment explaining that Go+ is ready to support the tensor definition syntax of MATLAB. With this feature, the program becomes even simpler.
    x := numgo.NdArray(
        [1.0, 2.0, 3.0;
         1.0, 2.0, 3.0])
Today, many similar and convenient syntax improvements have been made in Go+. These syntax extensions can greatly simplify data science programming. For more information, visit https://github.com/qiniu/goplus/tree/master/tutorial
The Go+ compiler translates Go+ programs written with these syntax extensions into Go programs. These programs can then be compiled together with other libraries written in Go, which allows them to reuse code from the Go ecosystem.
The ability to reuse the Go ecosystem is a strength of Go+. Over the years of Go's development, the community has accumulated many basic scientific computing technologies, for example, Go data types that encapsulate tensors, along with efficient Go implementations of computations on these types. This is partly because Go programs can easily call C and C++ code, including proven basic libraries in the scientific computing field, such as LAPACK, and the CUDA interface library for NVIDIA GPUs. These C and C++ basic libraries are also the foundation of the Python data science ecosystem. Therefore, this article is titled "Go+ Can Efficiently Overcome Python Shortcomings."
The preceding sections have mentioned deep learning several times. Deep learning is another field where Python is widely used, and it is naturally related to data science. For example, the tensor data structure of PyTorch and TensorFlow is the same as the ndarray of NumPy. In the deep learning field, compilers are the latest mainstream research direction.
The majority of developers in the Go community are back-end system developers. During the livestream, some viewers commented that they were not AI engineers and did not follow AI. If they really think so, that is an irresponsible attitude toward their jobs.
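As a rough illustration of the Go side of this translation, the following sketch shows the kind of plain Go type that a nested literal could be passed to. Everything here is an assumption made for illustration: numgo+ is only a proposed project, so `NdArray`, `NewNdArray`, and `At` are hypothetical names, and the code is not actual Go+ compiler output.

```go
package main

import (
	"errors"
	"fmt"
)

// NdArray is a hypothetical 2-D tensor type used only for illustration.
type NdArray struct {
	Shape [2]int    // number of rows and columns
	Data  []float64 // elements in row-major order
}

// NewNdArray copies a rectangular [][]float64 into row-major storage.
func NewNdArray(rows [][]float64) (*NdArray, error) {
	if len(rows) == 0 {
		return &NdArray{}, nil
	}
	cols := len(rows[0])
	data := make([]float64, 0, len(rows)*cols)
	for _, r := range rows {
		if len(r) != cols {
			return nil, errors.New("ndarray: rows have different lengths")
		}
		data = append(data, r...)
	}
	return &NdArray{Shape: [2]int{len(rows), cols}, Data: data}, nil
}

// At returns the element at row i, column j.
func (a *NdArray) At(i, j int) float64 { return a.Data[i*a.Shape[1]+j] }

func main() {
	// In plain Go, the element type of the literal must be spelled out.
	x, err := NewNdArray([][]float64{
		{1.0, 2.0, 3.0},
		{1.0, 2.0, 3.0},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(x.Shape, x.At(1, 2))
}
```

Because the result is ordinary Go, a type like this could be passed directly to existing Go numeric libraries, which is the reuse that the paragraph above describes.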
The boundary between a back-end system and an AI system is increasingly blurred. A back-end system is for Internet services. However, the entire Internet economy is built by replacing people with sleepless servers to serve the public, and AI is the basis of this logic.
In addition, the boundary will disappear in the near future. This is because online learning, reinforcement learning, imitation learning, and federated learning will replace supervised learning and become the mainstream technologies for the intelligent Internet, including traditional search, advertising, and recommendation, as well as the emerging fields of autonomous driving and intelligent finance. By then, an AI system will no longer be divided into training and prediction. Accordingly, the division of labor in which AI engineers handle training and back-end engineers handle prediction will no longer exist.
In the AI field, an important reason why deep learning has surpassed traditional machine learning is as follows: each traditional machine learning model (a description of the knowledge structure) is trained by one or more algorithms of its own, whereas almost all deep learning models are trained by stochastic gradient descent (SGD) or its variants. As a result, infrastructure engineers can develop training systems once, and model researchers can reuse them. This significantly relieves the engineering burden of scientific research and improves model R&D efficiency.
The core problem of a deep learning system is autodiff, which follows from the mathematical characteristics of the SGD algorithm. SGD alternates between a forward pass and a backward pass to extract model parameters from training data. A model plus its parameters constitutes knowledge. When model researchers define a model, they describe the forward pass at the same time. However, it is difficult to describe the backward pass by hand. Therefore, a program is needed to derive the backward pass automatically from the forward pass. This automatic derivation is called autodiff.
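To make the alternation of forward and backward passes concrete, here is a minimal Go sketch that fits the one-parameter model y = w * x. The names `Sample`, `Forward`, `Backward`, and `Train` are hypothetical, the backward pass is written by hand (exactly the step that autodiff is meant to automate), and the samples are visited in a fixed order rather than randomly to keep the example deterministic.

```go
package main

import "fmt"

// Sample is one training example for the model y = w * x.
type Sample struct{ X, Y float64 }

// Forward computes the prediction and the squared-error loss.
func Forward(w float64, s Sample) (pred, loss float64) {
	pred = w * s.X
	diff := pred - s.Y
	return pred, diff * diff
}

// Backward is the hand-written gradient of the loss with respect to w.
func Backward(pred float64, s Sample) float64 {
	return 2 * (pred - s.Y) * s.X
}

// Train alternates forward and backward passes, one sample at a time.
func Train(data []Sample, lr float64, epochs int) float64 {
	w := 0.0
	for e := 0; e < epochs; e++ {
		for _, s := range data {
			pred, _ := Forward(w, s)
			w -= lr * Backward(pred, s) // gradient descent step
		}
	}
	return w
}

func main() {
	data := []Sample{{1, 3}, {2, 6}, {3, 9}} // generated from y = 3x
	fmt.Printf("learned w = %.4f\n", Train(data, 0.05, 100))
}
```

With this data the learned parameter should approach 3. For a model with millions of parameters and nested control flow, writing `Backward` by hand is no longer practical, which is why autodiff is the core problem.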
Currently, two autodiff policies are available. The first is to derive the backward pass at runtime, which is also known as the dynamic net or tape-based approach. No matter how complex the forward pass is (it may include if-else, loops, function definitions and calls, and even coroutines and multithreading), the basic idea is to record the basic operations (operators) on a tape in the order in which they are executed. The backward pass then walks back through the records on the tape and calls the derivative operator (gradient operator) of each operation in turn. PyTorch, TensorFlow eager execution, and Paddle Tape use this policy. This policy has little to do with compilers but is related to JIT compilation.
The other policy is to derive the backward pass before the program runs, which requires a dedicated autodiff compiler. TensorFlow graph mode, Caffe/Caffe2, Paddle Fluid, Google Tangent, Julia, and Swift for TensorFlow use this policy. A compiler typically translates a source program written in a source language into a target program written in a target language. However, TensorFlow graph mode, Caffe/Caffe2, and Paddle Fluid do not introduce a source language. Instead, they ask users to call a Python library to describe the forward pass. Google Tangent, Julia, and Swift for TensorFlow ask users to define functions in Python, Julia, and Swift, respectively, to describe the forward pass, and then translate the forward-pass functions into backward-pass functions.
Strictly speaking, Julia's authors have implemented various autodiff solutions, including ones that work at runtime, during compilation, or both. When Mingsheng helped me revise this article, he reminded me to add this: For a different vision, where the same language is used to both implement kernels and construct and execute programs or graphs based on the kernels, see this blog. Here, a kernel refers to the implementation of a basic operation unit (operator) of deep learning.
Both autodiff policies, during compilation and at runtime, are suitable for Go+, and neither prevents Go+ from reusing existing technologies. Just as the data science field reuses basic libraries such as LAPACK, the deep learning field must reuse basic operators and gradient operators.
At runtime, implementing autodiff with a tape is relatively simple. Yang Yang developed Paddle Tape in a week. However, autodiff during compilation is complex. More than 20 Paddle Fluid developers spent several months implementing autodiff for if-else, loops, and function definitions and calls, building on the work of Yuan Yu from the TensorFlow team, which is described in https://arxiv.org/pdf/1805.01772.pdf
These attempts remind us of the importance of reusing the core technologies of the community. For example, we can replace the computational graph with MLIR to describe complex control flows. A computational graph can never describe goroutines and select. We can use Tensor Virtual Machine (TVM) as the compiler back end and use deep learning technologies to optimize deep learning programs. The outputs of all these technologies are calls to basic operators. From this perspective, the operators accumulated in the existing deep learning ecosystem are similar to built-in functions. Hong Mingsheng also stressed this repeatedly when revising this article.
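The idea of deriving the backward computation before anything runs can be illustrated with symbolic differentiation over a tiny expression tree. This Go sketch is a deliberate simplification: the `Expr` interface and its node types are hypothetical, and real autodiff compilers such as the ones listed above transform whole programs with control flow rather than single-variable expressions.

```go
package main

import "fmt"

// Expr is a node of a tiny expression language in one variable x.
type Expr interface {
	Eval(x float64) float64
	Derive() Expr // builds the expression for d/dx without evaluating anything
}

type X struct{}
type Const struct{ V float64 }
type Add struct{ L, R Expr }
type Mul struct{ L, R Expr }

func (X) Eval(x float64) float64     { return x }
func (c Const) Eval(float64) float64 { return c.V }
func (a Add) Eval(x float64) float64 { return a.L.Eval(x) + a.R.Eval(x) }
func (m Mul) Eval(x float64) float64 { return m.L.Eval(x) * m.R.Eval(x) }

func (X) Derive() Expr     { return Const{1} }
func (Const) Derive() Expr { return Const{0} }
func (a Add) Derive() Expr { return Add{a.L.Derive(), a.R.Derive()} }

// Product rule: (l*r)' = l'*r + l*r'.
func (m Mul) Derive() Expr {
	return Add{Mul{m.L.Derive(), m.R}, Mul{m.L, m.R.Derive()}}
}

func main() {
	// Forward function: f(x) = x*x + 3*x.
	f := Add{Mul{X{}, X{}}, Mul{Const{3}, X{}}}
	// The derivative expression is built once, before any input is seen.
	df := f.Derive()
	fmt.Println(f.Eval(2), df.Eval(2))
}
```

Unlike the tape-based approach, `Derive` here never looks at a concrete input. The price is that every construct of the source language needs its own derivation rule, which hints at why supporting if-else, loops, and function calls takes so much more effort under this policy.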
We hope that in the near future, Go+ can be used as a deep learning front-end language like Python, Julia, and Swift to reuse the underlying IRs, compiler back end, and basic operators.
In my opinion, the core strategy for the future work of the Go+ project is to keep the simple syntax of Go and add flexibility in moderation, without becoming as flexible as Python or C++.
In addition, developing exploratory projects such as numgo+ and GoTorch in cooperation with the community, and thereby enriching the technical ecosystem, is a strategic direction. Going further, Go+ can serve as a frontend language for deep learning compilers and reuse the underlying deep learning computing technologies that the community has accumulated over the years.
Finally, I would like to thank the following people for helping revise this article: Xu Shiwei, Go+ core contributors Chai Shushan and Chen Dongpo, an excellent contributor to the Go community Asta Xie, and a core contributor to the ONNX community Zhang Ke.
You are welcome to participate in discussions for any content of this article.
Wang Yi is a researcher at Ant Group and the owner of the open-source SQLFlow and ElasticDL projects. He has been writing code since he was 10 years old. He once used a hand-soldered circuit board to expand a "China educational computer" and turn an old-fashioned Weili twin-tub washing machine into an automatic one. He developed his first computer game in Apple BASIC and 6502 assembly language. In high school, he taught himself all the courses of a regular undergraduate computer science program, took the National Computer Rank Examination (NCRE), and obtained the "programmer", "senior programmer", and "system analyst" certifications. He has worked on artificial intelligence (AI) infrastructure for 13 years, served at several leading global Internet companies, and started businesses in Silicon Valley in the U.S. and in Beijing, China.