Assistant Engineer

A probe into the concurrency mechanism of Go language

Posted time: May 8, 2017 10:51 AM
Compared with Java and other languages, a great advantage of the Go language is how easily it lets you write concurrent programs. Go has a built-in goroutine mechanism with which you can quickly develop concurrent programs that make better use of multi-core processor resources. This article introduces the application of goroutines and how their scheduling is implemented.
I. Concurrency support in the Go language
Program using goroutine
Create a goroutine using the go keyword: place go before a function call, and that function will be executed in the same address space but as a separate concurrent unit of execution. This unit is called a goroutine in the Go language.
The usage of goroutine is as follows:
// Create a new goroutine that runs an existing function:
// place the go keyword before the call
go GetThingDone(param1, param2)

// Create an anonymous function and execute it in a new goroutine
// (parameter types must be declared)
go func(param1, param2 int) {
    // ...
}(val1, val2)

// Note: a bare code block cannot follow go; to run a block of code
// in a new goroutine, wrap it in an anonymous function
go func() {
    // do something...
}()
Because goroutines can run in parallel on a multi-core CPU, executing a code block in multiple goroutines gives us parallel execution.
How can we obtain the results of this parallel execution and learn the program's progress? This is done with the help of channels.
Control concurrency using channels
Channels are used to synchronize concurrent execution functions and provide them with a value transmission mechanism.
A channel is declared with the type of the elements it carries, an optional capacity (buffer size), and, optionally, a transmission direction specified by the "<-" operator.
You can use the built-in make function to allocate a channel:
i := make(chan int)       // by default the capacity is 0
s := make(chan string, 3) // non-zero capacity

r := make(<-chan bool)          // can only read from
w := make(chan<- []os.FileInfo) // can only write to

Configure runtime.GOMAXPROCS
By configuring runtime.GOMAXPROCS, you can explicitly set how many cores are used to execute concurrent tasks.

GOMAXPROCS can be set according to the workload, but setting it higher than the number of CPU cores generally yields no additional parallelism.
Configuring parallel execution suits CPU-intensive scenarios with a relatively high degree of parallelism. In I/O-intensive scenarios, using multiple cores adds the performance cost of CPU context switching.
Having understood the concurrency mechanism of the Go language, next we can look at the specific implementation of the goroutine mechanism.
II. Distinguishing parallelism from concurrency
Process, thread and processor
In modern operating systems, threads are the basic unit for processor scheduling and distribution, and processes are the basic unit of the resource ownership.
Each process has its own private virtual address space, code, data, and various other system resources. A thread is an execution unit within a process.
Each process has at least one main thread, which you do not need to create yourself; the system creates it automatically.
You can create additional threads in an application as needed, and multiple threads run concurrently within the same process.
Parallelism and concurrency
Parallelism and concurrency are two different concepts. Understanding them is very important for understanding multithreaded models.
When describing the concurrency or parallelism of a program, it should be specified whether the description is from the process or the thread perspective.
• Concurrency: within a time period, multiple threads or processes are in progress, but at any single point in time only one of them is executing. Multiple threads or processes compete for time slices and execute alternately
• Parallelism: within a time period, multiple threads or processes are in progress, and at a single point in time more than one of them is actually executing
A non-concurrent program has only one vertical line of control logic; at any time the program can only be at one position in that logic, that is, it executes sequentially. If a program is processed simultaneously by multiple CPU pipelines, we say the program is running in parallel.
Parallelism requires hardware support. A single-core processor can only support concurrency; only multi-core processors support parallel execution.
• Concurrency is a necessary condition for parallelism. If a program is not concurrent, that is, it has only one logical execution order, we cannot make it run in parallel.
• Concurrency is not a sufficient condition for parallelism. If a concurrent program is processed by only one CPU (through time sharing), it is not parallel.
For example, the simplest program that outputs "Hello World" is non-concurrent. If multiple threads are added, each printing a "Hello World", the program becomes concurrent. If only one CPU is allocated to the program during execution, the concurrent program is still not parallel. It becomes parallel only when run on an operating system with a multi-core processor.
III. Several different multi-threaded models
User-level thread and kernel-level thread
Thread implementation can be divided into two categories: user-level thread (ULT) and kernel-level thread (KLT). The user-level thread is supported by the user code, and the kernel-level thread is supported by the operating system kernel.
Multi-threaded model
Multi-threaded models refer to the different connection ways of user-level threads and kernel-level threads.
(1) M-to-one model (M:1)
Multiple user-level threads are mapped to a single kernel-level thread, and thread management is done in user space.
In this model, user-level threads are invisible to the operating system (that is, they are transparent to it).
The advantage of this model is that the thread context switching takes place in the user space, avoiding mode switching, which has a positive effect on performance.
Disadvantages: all the threads are based on one kernel scheduling entity, the kernel-level thread, which means that only one processor can be used. This is unacceptable in a multi-processor environment. In essence, user-level threads solve only the concurrency problem, not the parallelism problem.
If a thread enters kernel mode because of an I/O operation and the kernel-level thread blocks waiting for I/O data, all threads become blocked. User space can also use non-blocking I/O, but the performance and complexity problems remain.
(2) One-to-one model (1:1)
Every user-level thread is mapped to a kernel-level thread.
Each thread is scheduled independently by the kernel scheduler, so the blocking of one thread does not affect the others.
Advantages: with the hardware support of multi-core processors, this kernel-space thread model supports true parallelism. If one thread blocks, another can continue executing, so the concurrency capability is strong.
Disadvantages: a kernel-level thread is created for each user-level thread, so the thread creation overhead is high, which can affect application performance.
(3) M-to-N model (M:N)
The ratio of user-level threads to kernel-level threads is M:N, and this model combines the advantages of the previous two.
This model requires cooperation between the kernel-level thread scheduler and the user-space thread scheduler. Essentially, multiple user-level threads are multiplexed onto multiple kernel-level threads, so that most thread context switches happen in user space, while the use of multiple kernel-level threads takes full advantage of processor resources.
IV. Scheduling implementation of the goroutine mechanism
The goroutine mechanism implements an M:N threading model. Goroutines are an implementation of coroutines, and the built-in Go scheduler allows each core of a multi-core CPU to execute goroutines.
The key to understanding the principles of the goroutine mechanism is to understand the implementation of the Go language scheduler.
How does the scheduler work
There are four important structures in the Go language to support the entire scheduler implementation, namely M, G, P, and Sched.
The first three definitions are in runtime.h and Sched is defined in proc.c.
• The Sched structure is a scheduler. It maintains some queues that store M and G and some status information of the scheduler.
• The M structure is a machine. It is a system thread, managed by the operating system, on which goroutines run. M is a very large structure that maintains a lot of information, such as the small-object memory cache (mcache), the currently executing goroutine, and the random number generator.
• The P structure is a processor. Its main purpose is to execute goroutines. It maintains a goroutine queue, the runqueue. The processor is the key element in moving the scheduler from an N:1 to an M:N model.
• G is the core structure of the goroutine implementation. It contains the stack, the instruction pointer, and other information important for goroutine scheduling, such as the channel the goroutine is blocked on.
The number of processors is the GOMAXPROCS value, set through the environment variable at startup or by calling runtime.GOMAXPROCS at runtime. A fixed number of processors means that at most GOMAXPROCS threads are running Go code at any time.
Refer to this widely-circulated blog post: http://morsmachine.dk/go-scheduler
In its diagrams, a machine (M), a processor (P) and a goroutine (G) are represented by a triangle, a rectangle and a circle respectively.
In the single-core processor scenario, all goroutines run in the same system thread M, and each M maintains a processor. At any point in time there is only one goroutine in a processor; the others wait in the runqueue. After a goroutine uses up its time slice, it releases the context and goes back to the runqueue.
In the multiple-core processor scenario, each M system thread will hold a processor in order to run goroutines.
Under normal circumstances, the scheduler follows the procedure above; however, a thread may become blocked. The following sections describe how the goroutine mechanism handles thread blocking.
Thread blocking
When a running goroutine blocks, for example during a system call, a new system thread (M1) is created. The current M thread gives up its processor, which is handed over to the new thread to run.
The runqueue is exhausted
When the runqueue of a processor is empty and there is no goroutine left to schedule, it steals half of the goroutines from another processor's runqueue.
V. More thoughts on concurrency implementation
There is much more worth exploring in the concurrency mechanism of the Go language, such as the differing implementations in Go and Scala, and the comparison between Golang's CSP model and the Actor model.
Understanding these implementations of concurrency mechanisms can help us to better develop concurrent programs and optimize performance.
To learn more about the three multi-threaded models, you can look at the implementation of the Java language.
We know that Java hides the differences between underlying operating systems behind the JVM, and different operating systems may adopt different threading models: Linux and Windows typically use the one-to-one model, while some versions of Solaris and Unix use the many-to-many model. The JVM specification does not mandate a particular threading model; 1:1 (kernel threads), N:1 (user-mode threads), and M:N (hybrid) are all permissible. The threading model in Java is therefore JVM-specific; for example, the Oracle/Sun HotSpot VM uses the 1:1 model by default.