Microservices Engine: Why do a large number of connections in the TIME_WAIT state exist on my client after I use MSE instances?

Last Updated: Aug 03, 2023

Problem description

When Microservices Engine (MSE) instances are used, a large number of connections in the TIME_WAIT state exist on the client. This issue does not occur when self-managed Nacos instances are used.

Possible causes

The client establishes a large number of concurrent short-lived connections to the Server Load Balancer (SLB) instance and actively closes them. The side that closes a TCP connection first is the side whose socket enters the TIME_WAIT state.

If a large number of connections in the TIME_WAIT state exist only after you start to use MSE instances and the access mode remains unchanged, your client may have been relying on fast recycling or reuse of TIME_WAIT sockets. In this case, the issue may be related to the following TCP kernel parameters:

  • net.ipv4.tcp_tw_recycle = 1

  • net.ipv4.tcp_tw_reuse = 1

  • net.ipv4.tcp_timestamps = 1

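Packet captures show that this issue occurs because the SLB instance strips the TCP timestamp option from packets. Fast recycling and reuse of TIME_WAIT sockets both depend on TCP timestamps, so neither takes effect once the timestamps are removed.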
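
To see whether a client is configured this way, the three parameters can be read from /proc on a Linux host. The following is a minimal sketch, not an official MSE tool: the function names `read_ipv4_param` and `depends_on_timestamps` are made up for illustration, and the sketch assumes that the standard /proc/sys/net/ipv4 layout is available.

```python
from pathlib import Path

# Parameters named in the text above. tcp_tw_recycle was removed in
# Linux 4.12, so it is reported as None on newer kernels.
PARAMS = ("tcp_tw_recycle", "tcp_tw_reuse", "tcp_timestamps")


def read_ipv4_param(name, root="/proc/sys/net/ipv4"):
    """Return a net.ipv4 parameter as an int, or None if it cannot be read."""
    try:
        return int(Path(root, name).read_text().split()[0])
    except (OSError, ValueError, IndexError):
        return None


def depends_on_timestamps(values):
    """True if fast recycle or reuse is enabled together with timestamps."""
    fast_path = values.get("tcp_tw_recycle") == 1 or values.get("tcp_tw_reuse") == 1
    return fast_path and values.get("tcp_timestamps") == 1


if __name__ == "__main__":
    values = {name: read_ipv4_param(name) for name in PARAMS}
    for name, value in values.items():
        print(f"net.ipv4.{name} = {value}")
    if depends_on_timestamps(values):
        print("Fast recycle/reuse is enabled and depends on TCP timestamps.")
```

If the last message is printed, the client matches the configuration described above, and connections that lose their timestamps stay in the TIME_WAIT state for the full timeout.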

Solutions

For clients that do not have TCP timestamp information, replace short-lived connections with persistent TCP connections.
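
As an illustration of what a persistent connection means in practice, the sketch below sends several HTTP requests over one TCP connection instead of opening a new connection for each request. It uses Python's standard http.client module as a stand-in and is not the Nacos client. The function name `fetch_many` and the host, port, and paths in the usage comment are placeholders, and the sketch assumes that the server also keeps the connection open between requests.

```python
import http.client


def fetch_many(host, port, paths):
    """Send one GET request per path over a single TCP connection."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    statuses = []
    try:
        for path in paths:
            conn.request("GET", path)
            response = conn.getresponse()
            response.read()  # drain the body so the connection can be reused
            statuses.append(response.status)
    finally:
        conn.close()  # one close, so at most one TIME_WAIT socket on the client
    return statuses


# Hypothetical usage; the address and paths below are placeholders:
# fetch_many("example.internal", 8848, ["/path/one", "/path/two"])
```

With short-lived connections, each request would leave its own socket in the TIME_WAIT state after the client closes it. With one reused connection, only the final close does.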

Note

When you troubleshoot this issue, also check the following parameters, which set TIME_WAIT-related thresholds:

  • net.ipv4.ip_local_port_range: specifies the range of local ports available for outbound connections, which caps the number of source ports.

  • net.ipv4.tcp_max_tw_buckets: specifies the maximum number of sockets that the system keeps in the TIME_WAIT state.

  • max open files: specifies the maximum number of file descriptors that a process can open.

  • If the number of connections in the TIME_WAIT state is significantly smaller than the thresholds specified by the preceding parameters, you can ignore this issue.

  • If the number of connections in the TIME_WAIT state is close to the thresholds specified by the preceding parameters, use one of the following solutions:

    • Increase the threshold values on the client machine so that the number of connections in the TIME_WAIT state is smaller than half of each threshold value.

    • If you use Nacos Client 1.x, upgrade to Nacos Client 2.x, which supports persistent connections.
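
The comparison described in this note can be sketched as a small script for a Linux client. This is an illustrative sketch under stated assumptions, not a supported tool: the helper names (`count_time_wait`, `port_range_size`, `usage_report`, `over_half`) are made up for this example, the socket tables are read from /proc/net/tcp and /proc/net/tcp6, and state code 06 in those tables denotes TIME_WAIT.

```python
from pathlib import Path


def count_time_wait(proc_net_tcp_text):
    """Count TIME_WAIT sockets (state code 06) in /proc/net/tcp-style text."""
    count = 0
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) > 3 and fields[3] == "06":
            count += 1
    return count


def port_range_size(text):
    """Number of ports in an ip_local_port_range value such as '32768 60999'."""
    low, high = map(int, text.split())
    return high - low + 1


def usage_report(time_wait, limits):
    """Map each threshold name to the fraction of it used by TIME_WAIT sockets."""
    return {name: time_wait / limit for name, limit in limits.items() if limit > 0}


def over_half(report):
    """Names of thresholds where TIME_WAIT sockets use at least half the limit."""
    return sorted(name for name, fraction in report.items() if fraction >= 0.5)


if __name__ == "__main__":
    import resource  # Unix only

    time_wait = 0
    for table in ("/proc/net/tcp", "/proc/net/tcp6"):
        try:
            time_wait += count_time_wait(Path(table).read_text())
        except OSError:
            pass  # table not available on this host
    # Soft limit of the current process; other processes may differ.
    limits = {"max open files": resource.getrlimit(resource.RLIMIT_NOFILE)[0]}
    ipv4 = Path("/proc/sys/net/ipv4")
    try:
        limits["ip_local_port_range"] = port_range_size(
            (ipv4 / "ip_local_port_range").read_text())
        limits["tcp_max_tw_buckets"] = int((ipv4 / "tcp_max_tw_buckets").read_text())
    except (OSError, ValueError):
        pass  # sysctl files missing or in an unexpected format
    report = usage_report(time_wait, limits)
    print(f"TIME_WAIT sockets: {time_wait}")
    print("At or above half of threshold:", over_half(report) or "none")
```

The script flags a threshold when the TIME_WAIT count reaches half of it, which mirrors the guidance above to keep the count below half of each value.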