Those hidden corners in Golang

Do we really use recover()?

During a system alarm investigation, we found that a component had panicked and did not resume operation. The log showed "fatal error: concurrent map writes", and the component could only be restarted manually. Yet the source code showed that the function where the panic occurred already had a deferred recover() call. The logic, in abstracted form, is as follows:

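package main

import (
	"fmt"
)

func concurrentMapWrite() {
	// This deferred recover() does NOT catch "fatal error: concurrent map writes".
	// It also only covers panics raised in this goroutine, not in the ones started below.
	defer func() {
		if err := recover(); err != nil {
			fmt.Printf("Panic occurred due to %+v, Recovered in f\n", err)
		}
	}()
	m := map[int]int{}
	idx := 0
	for {
		// Unsynchronized writes from many goroutines trigger the fatal error.
		go func() {
			m[idx] = 1
		}()
		idx++
	}
}

func main() {
	concurrentMapWrite()
}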

My preliminary judgment was that recover() had not captured "fatal error: concurrent map writes". To verify this guess, I carried out the following investigation.

Use recover() in defer

When a Golang program does not run as expected, it usually reports this to the user through "errors" or "exceptions" (panics). An error is returned when the code logic goes wrong in a way the programmer anticipated, and it does not disrupt the running program. An exception is usually caused by an unexpected fault that leaves the program unable to continue. If it is not handled, the program exits abnormally, which is very dangerous.

In order to improve the robustness of the program, we need to rely on Golang's recover and defer mechanisms to ensure that the program can continue to run after an exception occurs, avoid accidental exit of the program, and ensure the stable operation of the business.

The code in a defer statement runs just before the enclosing function returns. The recover function restores a goroutine from a panicking state, and it takes effect only when called directly inside a deferred function.

Its usage is as follows:

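package main

import "fmt"

func div(x, y int) int {
	return x / y // panics with a runtime error when y == 0
}

func f() {
	// recover() must be called directly inside the deferred function.
	defer func() {
		if err := recover(); err != nil {
			fmt.Printf("Panic occurred due to %+v, Recovered in f\n", err)
		}
	}()
	fmt.Println(div(1, 0))
}

func main() {
	f()
}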
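To make the calling convention concrete, here is a minimal sketch that turns a recovered panic into an ordinary error. The helper name safeDiv is an assumption for illustration and is not part of the original code. The points it shows are that recover() is called directly in the deferred function, and that a named return value lets the deferred function report the failure to the caller.

```go
package main

import "fmt"

// safeDiv is a hypothetical helper: it divides x by y and converts a
// runtime panic (such as division by zero) into a returned error.
func safeDiv(x, y int) (result int, err error) {
	defer func() {
		// recover() returns nil when the goroutine is not panicking.
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	return x / y, nil
}

func main() {
	fmt.Println(safeDiv(6, 3)) // 2 <nil>
	fmt.Println(safeDiv(1, 0)) // 0 recovered: runtime error: integer divide by zero
}
```

Note that this only works for panics raised in the same goroutine as the deferred function.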

The defer plus recover combination above is roughly equivalent to try/catch in Java, and it keeps the program from being interrupted by an exception. We know that try/catch can catch almost any exception, as long as the catch clause names a sufficiently general base class such as Exception (or Throwable). Then why is Golang not like this?

Unrecoverable panic

Unlike try/catch, recover in Golang cannot catch every exception:

• When an exception is raised through panic() (runtime.gopanic inside the runtime), it can be caught by recover;
• When an exception is raised through runtime.throw() or runtime.fatal(), it cannot be caught by recover.

The "concurrent map writes" exception encountered in the scenario above is raised through runtime.fatal(). The relevant source code is in runtime/map.go.

The map uses a flag bit in h.flags to check whether a concurrent write is in progress. If one is detected, the fatal function is called. The error is then reported as "fatal error" and the program is forced to exit.

The comments on the fatal function show that it is equivalent to throw, except that it is used for user-level errors, while system-level (runtime) errors are raised by runtime.throw. The fatal function also calls fatalthrow, whose comment states clearly: "fatalthrow implements an unrecoverable runtime throw". Exceptions raised this way are therefore unrecoverable.

"Concurrent map writes" is treated as unrecoverable because, by the time Golang detects the data race, the internal structure of the map may already be corrupted. Continuing to run could produce unexpected results, so the program is forced to terminate. Some other types of unrecoverable errors are listed below:

• Out of memory

• Concurrent map writes

• Stack memory exhaustion

• Attempting to launch a nil function as a goroutine

• All goroutines are asleep - deadlock

• Thread limit exhaustion
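Since recover cannot help with "concurrent map writes", the practical remedy is to prevent the concurrent writes in the first place. The following is a minimal sketch, assuming a mutex-guarded wrapper is acceptable for the workload. The names safeMap, set, size, and fill are hypothetical and chosen for illustration only.

```go
package main

import (
	"fmt"
	"sync"
)

// safeMap is a hypothetical wrapper that serializes access to a map.
type safeMap struct {
	mu sync.Mutex
	m  map[int]int
}

func newSafeMap() *safeMap {
	return &safeMap{m: make(map[int]int)}
}

// set writes one entry while holding the lock.
func (s *safeMap) set(k, v int) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.m[k] = v
}

// size returns the number of entries while holding the lock.
func (s *safeMap) size() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return len(s.m)
}

// fill writes n distinct keys from n goroutines and waits for all of them.
func fill(n int) *safeMap {
	s := newSafeMap()
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(k int) { // pass the key as an argument to avoid sharing i
			defer wg.Done()
			s.set(k, 1)
		}(i)
	}
	wg.Wait()
	return s
}

func main() {
	fmt.Println(fill(1000).size()) // 1000
}
```

The standard library's sync.Map is another option when the access pattern suits it.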

What are the pitfalls of slice expansion?

During development, a slice is often passed to a function as a parameter and modified inside it, with the expectation that the modification is reflected in the actual parameter (the caller's slice). In practice, testing shows that some scenarios meet this expectation and others do not.

If the function appends to the slice and the resulting length does not exceed the original capacity, changes to existing elements are visible through the actual parameter slice, but the appended elements are not, because the caller's length is unchanged. If the function appends to the slice and the resulting length exceeds the original capacity, neither the changes made after the append nor the appended elements are visible through the actual parameter slice.
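The two cases can be reproduced with a short sketch. The function name appendAndModify and the variable names are assumptions for illustration; the same function body is applied to a slice with spare capacity and to a slice that is already full.

```go
package main

import "fmt"

// appendAndModify appends one element and then overwrites s[0].
// Both operations act on the function's own copy of the slice header.
func appendAndModify(s []int) {
	s = append(s, 99)
	s[0] = 100
}

func main() {
	// Case 1: spare capacity, so append reuses the shared underlying array.
	roomy := make([]int, 3, 10)
	appendAndModify(roomy)
	fmt.Println(roomy, len(roomy)) // [100 0 0] 3
	fmt.Println(roomy[:4])         // [100 0 0 99]

	// Case 2: no spare capacity, so append moves to a new underlying array.
	full := make([]int, 3, 3)
	appendAndModify(full)
	fmt.Println(full, len(full)) // [0 0 0] 3
}
```

In the first case the appended value is physically present in the shared array, but the caller only sees it after re-slicing, because the caller's length is still 3.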

Mechanism of slice expansion

Updating existing elements of a slice parameter inside a function affects the actual parameter

A slice holds a pointer to a segment of its underlying array, together with a length and a capacity. Its structure is as follows:

type slice struct {
	array unsafe.Pointer // pointer to the underlying array
	len   int            // length of the slice
	cap   int            // capacity of the slice
}

The official Golang documentation states that function parameters are passed in only one way: by value, which copies the actual parameters into the function at call time. When a slice is passed to a function, its array pointer, len, and cap are copied. The slice inside the function and the actual parameter slice therefore share the same underlying array, and changes to existing elements made inside the function are visible through the actual parameter slice.

Slice capacity expansion strategy

A slice can grow dynamically by appending elements with the append function. Appended elements are stored in the slice's existing storage as long as they fit, but the upper limit of that storage is the slice capacity. When the number of elements after appending exceeds the capacity, the slice must move to a larger underlying array. We analyze Golang's (1.19.2+) capacity expansion strategy through the growslice function in go/src/runtime/slice.go.

When the required number of elements after appending exceeds the slice capacity, the new capacity is chosen as follows:

• threshold = 256

• If the required number of elements exceeds twice the original capacity, the new capacity is the required number of elements; otherwise continue with the next step

• If the original capacity is less than threshold, the new capacity is twice the original capacity; otherwise continue with the next step

• Starting from the original capacity, repeatedly add (current capacity + 3 * threshold) / 4 until the capacity is no less than the required number of elements

Once the strategy has produced the new capacity, memory is allocated based on this value, and the data in the original array together with the appended data is copied into the new memory. At this point the dynamic expansion of the slice is complete.
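The strategy above can be written as a small function. This is a sketch of the listed rules only, not the runtime's code: the name nextCap is hypothetical, integer overflow is ignored, and the runtime additionally rounds the allocation up to a memory size class, so the value reported by cap() on a real slice may be larger than what this function returns.

```go
package main

import "fmt"

// nextCap is a hypothetical helper that mirrors the listed growth rules.
// oldCap is the current capacity; needed is the element count after append.
func nextCap(oldCap, needed int) int {
	const threshold = 256
	if needed > 2*oldCap {
		return needed // asking for more than double: take exactly what is needed
	}
	if oldCap < threshold {
		return 2 * oldCap // small slices double
	}
	newCap := oldCap
	for newCap < needed {
		// Large slices grow by roughly 1.25x plus a constant term.
		newCap += (newCap + 3*threshold) / 4
	}
	return newCap
}

func main() {
	fmt.Println(nextCap(4, 5))     // 8
	fmt.Println(nextCap(4, 10))    // 10
	fmt.Println(nextCap(256, 257)) // 512
	fmt.Println(nextCap(512, 513)) // 832
}
```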

It follows that when a function appends to a slice parameter and the number of elements after appending exceeds the original capacity, the underlying array is moved to another memory area. From then on, updates that the function makes to existing elements no longer affect the actual parameter slice.

The scenario here is a gRPC communication flow abstracted from my own development work, and it is a fairly general one. The client passes a context with a timeout to the server. The server must finish processing the request and return the response to the client within the timeout. If the timeout is exceeded, the connection is closed and the client receives no response.

However, in practice we found that even when the context on the server side had timed out, the response was still occasionally delivered to the client, which caused unexpected behavior in one of our functions. The interaction logic at that time is described in simplified form below.

gRPC timeout transfer process

In Golang gRPC communication, the timeout is propagated between the two ends of the connection, carried in HTTP/2 request frames. Before sending a request, the gRPC client packs the information into different frames: the Data frame carries the request payload, and the Headers frame carries metadata that travels with the request, such as the path. The timeout is stored in the Headers frame.

After receiving the request, the gRPC server reads the grpc-timeout field from the header and creates a new context instance based on that timeout.
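As an illustration of this step, the sketch below parses a grpc-timeout header value and derives a context from it. It assumes the wire format from the gRPC over HTTP/2 specification, an integer followed by one unit character (H, M, S, m, u, n). The function parseGRPCTimeout is a hypothetical helper written for this article, not the grpc-go implementation.

```go
package main

import (
	"context"
	"fmt"
	"math"
	"strconv"
	"time"
)

// parseGRPCTimeout is a hypothetical helper that converts a grpc-timeout
// header value such as "100m" (100 milliseconds) into a time.Duration.
func parseGRPCTimeout(v string) (time.Duration, error) {
	if len(v) < 2 {
		return 0, fmt.Errorf("grpc-timeout too short: %q", v)
	}
	units := map[byte]time.Duration{
		'H': time.Hour, 'M': time.Minute, 'S': time.Second,
		'm': time.Millisecond, 'u': time.Microsecond, 'n': time.Nanosecond,
	}
	unit, ok := units[v[len(v)-1]]
	if !ok {
		return 0, fmt.Errorf("grpc-timeout has unknown unit: %q", v)
	}
	n, err := strconv.ParseUint(v[:len(v)-1], 10, 64)
	if err != nil {
		return 0, fmt.Errorf("grpc-timeout has invalid value: %q", v)
	}
	// Clamp instead of overflowing for very large values.
	if n > uint64(math.MaxInt64)/uint64(unit) {
		return time.Duration(math.MaxInt64), nil
	}
	return time.Duration(n) * unit, nil
}

func main() {
	d, err := parseGRPCTimeout("100m")
	if err != nil {
		fmt.Println(err)
		return
	}
	// The server side derives a new context carrying the same time budget.
	ctx, cancel := context.WithTimeout(context.Background(), d)
	defer cancel()
	_, hasDeadline := ctx.Deadline()
	fmt.Println(d, hasDeadline) // 100ms true
}
```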

On the gRPC client side, ctx.Done() is checked repeatedly to determine whether the context has timed out. If it has, the connection is closed. However, there are race windows around the timeout. For example, the client-side context may have timed out before the next check has started, while the server returns its response at the same moment. In that case, although the client-side context has timed out, the response from the server is still received and processed. The more common case is select { case <-ctx.Done(): ...; case <-response: ... }: when both cases are ready, Go picks one of them at random, so there is a 50% probability that the context timeout is not detected. For details, please refer to the issue I raised in grpc-go earlier.
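The random choice in select can be observed with a small simulation that uses only the standard library. It is a sketch under stated assumptions: a cancelled context stands in for an expired deadline, a buffered channel stands in for a response that has already arrived, and the function name race is hypothetical.

```go
package main

import (
	"context"
	"fmt"
)

// race runs the select many times with both cases ready and counts
// how often each branch is taken.
func race(rounds int) (timeouts, responses int) {
	for i := 0; i < rounds; i++ {
		ctx, cancel := context.WithCancel(context.Background())
		cancel() // stands in for a deadline that has already expired

		resp := make(chan string, 1)
		resp <- "ok" // stands in for a response that has already arrived

		select {
		case <-ctx.Done():
			timeouts++
		case <-resp:
			// The timeout is missed here unless ctx.Err() is checked again.
			responses++
		}
	}
	return timeouts, responses
}

func main() {
	timeouts, responses := race(10000)
	// The two counts vary from run to run but are typically close to each other.
	fmt.Println("timeout branch:", timeouts, "response branch:", responses)
}
```

One way to narrow the window is to check ctx.Err() again inside the response branch before using the response.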

Ensure the gRPC response carries a timeout error

In the error scenario I experienced earlier, the server-side context timed out and the server still returned a response to the client. The client was expected to time out as well and close the connection, but in fact it successfully received the response from the server. Because of a flaw in the processing logic, that response did not contain a timeout error, so the client resent the request after receiving the response. After the resend, the context timeout was detected and the connection was finally closed, resulting in an error.

Therefore, the application must ensure that when the server-side context times out, the error returned in the response is DeadlineExceeded (codes.DeadlineExceeded in grpc-go), so that the client can also perceive the timeout and avoid unnecessary logic.
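A minimal sketch of this check, using only the standard library, is shown below. The names handle and runDemo are hypothetical, and the work function stands in for the real request processing. In an actual gRPC handler the returned error would typically be mapped to a status with codes.DeadlineExceeded; that mapping is left out here so the example stays self-contained.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// handle runs the work and then re-checks the context before replying,
// so that a late result is never returned without a timeout error.
func handle(ctx context.Context, work func() string) (string, error) {
	resp := work()
	if err := ctx.Err(); err != nil {
		// err is context.DeadlineExceeded when the deadline has passed.
		return "", err
	}
	return resp, nil
}

// runDemo calls handle with the given time budget and simulated work time.
func runDemo(timeout, workTime time.Duration) (string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	return handle(ctx, func() string {
		time.Sleep(workTime) // simulated request processing
		return "done"
	})
}

func main() {
	fmt.Println(runDemo(200*time.Millisecond, time.Millisecond)) // done <nil>
	// Work outlasts the budget: empty response plus "context deadline exceeded".
	fmt.Println(runDemo(time.Millisecond, 50*time.Millisecond))
}
```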
