This topic describes common Linux kernel network parameters and provides solutions for related issues.
Self-service troubleshooting tool
The Alibaba Cloud self-service troubleshooting tool helps you quickly check your kernel parameter configuration and provides a detailed diagnostic report.
Click to open the self-service troubleshooting page and select the target region.
The diagnostic report may show an anomaly such as "Inbound rules for common ports are not configured in the security group." In this example, inbound traffic for the ICMP protocol on port -1 is not allowed, which prevents the instance from responding to ping requests. To fix this, modify the security group rules to allow traffic on the corresponding port.
If the self-service troubleshooting tool cannot identify your issue, proceed with the following steps for manual troubleshooting.
View and modify kernel parameters
Usage notes
Before you modify kernel parameters, note the following:
- Modify kernel parameters only based on your specific requirements and with supporting data. Avoid making arbitrary adjustments.
- Understand the purpose of each parameter. Note that kernel parameters can vary depending on the environment and version. For more information, see Common Linux kernel parameters.
- Back up important data on your ECS instance. For more information, see Create a snapshot.
Modify parameters
You can use both /proc/sys/ and /etc/sysctl.conf to modify kernel parameters while an instance is running. The differences are as follows:
- The /proc/sys/ directory is a virtual file system that provides access to kernel parameters. The net subdirectory contains all enabled network kernel parameters for the current system. You can modify these parameters at runtime, but the changes do not persist after the instance restarts. This method is typically used to test changes temporarily.
- The /etc/sysctl.conf file is a configuration file. You can modify the /etc/sysctl.conf file to change the default values of kernel parameters. The changes persist after the instance is restarted.
The files in the /proc/sys/ directory correspond to the parameter names in the /etc/sysctl.conf file. For example, the parameter net.ipv4.tcp_tw_recycle corresponds to the file /proc/sys/net/ipv4/tcp_tw_recycle, and the file's content is the parameter's value.
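The name-to-path mapping described above can be sketched as a small shell helper. This is only an illustration: the function name sysctl_to_path is hypothetical, and the plain dot-to-slash substitution assumes that no component of the parameter name (for example, an interface name) itself contains a dot.

```shell
# Hypothetical helper: convert a sysctl parameter name to its /proc/sys path.
# Assumes no name component (such as an interface name) contains a literal dot.
sysctl_to_path() {
    printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# Prints the file that backs net.ipv4.tcp_tw_recycle.
sysctl_to_path net.ipv4.tcp_tw_recycle
```

The resulting path can then be passed to cat, provided the parameter exists on the running kernel.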
The tcp_tw_recycle configuration, which includes the net.ipv4.tcp_tw_recycle setting in sysctl.conf, was removed from Linux starting from kernel version 4.12. You can use the net.ipv4.tcp_tw_recycle parameter only if your system is running a kernel version earlier than 4.12.
Using /proc/sys/
- Log on to the Linux ECS instance.
  For more information, see Overview of connection methods for ECS instances.
- Use the cat command to view the content of the corresponding file.
  For example, run the following command to view the value of net.ipv4.tcp_tw_recycle:
      cat /proc/sys/net/ipv4/tcp_tw_recycle
- Use the echo command to modify the kernel parameter.
  For example, run the following command to change the value of net.ipv4.tcp_tw_recycle to 0:
      echo "0" > /proc/sys/net/ipv4/tcp_tw_recycle
Using /etc/sysctl.conf
- Log on to the Linux ECS instance.
  For more information, see Overview of connection methods for ECS instances.
- Run the following command to view all parameters that are currently in effect:
      sysctl -a
  A partial sample output is shown below:
      net.ipv4.tcp_app_win = 31
      net.ipv4.tcp_adv_win_scale = 2
      net.ipv4.tcp_tw_reuse = 0
      net.ipv4.tcp_frto = 2
      net.ipv4.tcp_frto_response = 0
      net.ipv4.tcp_low_latency = 0
      net.ipv4.tcp_no_metrics_save = 0
      net.ipv4.tcp_moderate_rcvbuf = 1
      net.ipv4.tcp_tso_win_divisor = 3
      net.ipv4.tcp_congestion_control = cubic
      net.ipv4.tcp_abc = 0
      net.ipv4.tcp_mtu_probing = 0
      net.ipv4.tcp_base_mss = 512
      net.ipv4.tcp_workaround_signed_windows = 0
      net.ipv4.tcp_challenge_ack_limit = 1000
      net.ipv4.tcp_limit_output_bytes = 262144
      net.ipv4.tcp_dma_copybreak = 4096
      net.ipv4.tcp_slow_start_after_idle = 1
      net.ipv4.cipso_cache_enable = 1
      net.ipv4.cipso_cache_bucket_size = 10
      net.ipv4.cipso_rbm_optfmt = 0
      net.ipv4.cipso_rbm_strictvalid = 1
- Modify the kernel parameters.
  - To make a temporary change:
        /sbin/sysctl -w kernel.parameter="[value]"
    Note: Replace kernel.parameter with the name of the kernel parameter and [value] with the desired value. For example, run the sysctl -w net.ipv4.tcp_tw_recycle="0" command to change the value of the net.ipv4.tcp_tw_recycle kernel parameter to 0.
  - To make a permanent change:
    - Run the following command to open the /etc/sysctl.conf configuration file:
          vim /etc/sysctl.conf
    - Press the i key to enter edit mode.
    - Modify the kernel parameters as needed.
      The following example shows the required format:
          net.ipv6.conf.all.disable_ipv6 = 1
          net.ipv6.conf.default.disable_ipv6 = 1
          net.ipv6.conf.lo.disable_ipv6 = 1
    - Press the Esc key, enter :wq, and press Enter to save the file and exit.
    - Run the following command to apply the changes:
          /sbin/sysctl -p
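For repeatable changes, the manual edit of /etc/sysctl.conf can be scripted. The following is a minimal sketch, not an official tool: set_sysctl_conf is a hypothetical helper that updates an existing "key = value" line or appends one if the key is absent. It takes the file path as an argument and assumes that the file already exists and that values are simple numbers or words.

```shell
# Hypothetical helper: set "key = value" in a sysctl.conf-style file.
# Usage: set_sysctl_conf KEY VALUE FILE
# An existing line for KEY is rewritten; otherwise a new line is appended.
# Commented-out lines (starting with #) are left untouched.
set_sysctl_conf() {
    key=$1
    value=$2
    file=$3
    tmp=$(mktemp) || return 1
    if awk -v k="$key" -v v="$value" '
        {
            line = $0
            split(line, parts, "=")
            name = parts[1]
            gsub(/[ \t]/, "", name)      # strip whitespace around the key
            if (name == k) {
                if (!done) { print k " = " v; done = 1 }
                next                      # drop duplicate definitions
            }
            print line
        }
        END { if (!done) print k " = " v }
    ' "$file" > "$tmp"; then
        cat "$tmp" > "$file"              # keep the original file permissions
        status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    return "$status"
}

# Example (run as root on the instance, then apply with /sbin/sysctl -p):
# set_sysctl_conf net.ipv6.conf.all.disable_ipv6 1 /etc/sysctl.conf
```

Writing to a temporary file first means the target file is changed only after awk has finished successfully.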
Common network parameter issues
- Why does the "Time wait bucket table overflow" error message appear in the /var/log/messages log?
- Why are there many TCP connections in the FIN_WAIT2 state on a Linux ECS instance?
- Why are there many TCP connections in the CLOSE_WAIT state on a Linux ECS instance?
- Why can't a client access a server-side ECS or ApsaraDB RDS instance after NAT is configured?
Remote connection failure: "nf_conntrack: table full, dropping packet"
Symptoms
You cannot remotely connect to an ECS instance. Pinging the target instance results in packet loss or failure. The following error message frequently appears in the /var/log/messages system log:
Feb 6 16:05:07 i-*** kernel: nf_conntrack: table full, dropping packet.
Feb 6 16:05:07 i-*** kernel: nf_conntrack: table full, dropping packet.
Feb 6 16:05:07 i-*** kernel: nf_conntrack: table full, dropping packet.
Feb 6 16:05:07 i-*** kernel: nf_conntrack: table full, dropping packet.
Cause
nf_conntrack (named ip_conntrack in older kernels) is a Linux kernel module that tracks connection entries for NAT. The module uses a hash table to record established TCP connection entries. When this hash table becomes full, packets for new connections are dropped, and the kernel logs the nf_conntrack: table full, dropping packet error.
The Linux system allocates a memory space to maintain each TCP connection. The size of this space is determined by the nf_conntrack_buckets and nf_conntrack_max parameters. The default value of the latter is four times the value of the former. Therefore, we recommend increasing the value of the nf_conntrack_max parameter.
Maintaining system connections consumes a large amount of memory. We recommend that you increase the value of the nf_conntrack_max parameter only when the system is idle and has sufficient memory.
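Before you raise nf_conntrack_max, it can help to check how full the table actually is. The sketch below assumes that the nf_conntrack module is loaded, so that the nf_conntrack_count and nf_conntrack_max files exist under /proc/sys/net/netfilter/. The function name conntrack_usage_pct is hypothetical.

```shell
# Hypothetical helper: print conntrack table usage as an integer percentage.
# Arguments: current entry count, configured maximum.
conntrack_usage_pct() {
    count=$1
    max=$2
    if [ "$max" -le 0 ]; then
        echo "max must be positive" >&2
        return 1
    fi
    echo $(( count * 100 / max ))
}

# On a live instance (assumes the nf_conntrack module is loaded):
# conntrack_usage_pct "$(cat /proc/sys/net/netfilter/nf_conntrack_count)" \
#                     "$(cat /proc/sys/net/netfilter/nf_conntrack_max)"
```

A value that stays close to 100 suggests that the table limit, rather than some other factor, is the cause of the dropped packets.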
Solution
- Connect to the instance using VNC.
  For more information, see Log on to a Linux instance using password authentication.
- Modify the value of the nf_conntrack_max parameter.
  - Run the following command to open the /etc/sysctl.conf file:
        vi /etc/sysctl.conf
  - Press the i key to enter edit mode.
  - Modify the value of the nf_conntrack_max parameter.
    For example, change the maximum number of hash table entries to 655350:
        net.netfilter.nf_conntrack_max = 655350
  - Press the Esc key, enter :wq, and press Enter to save the file and exit.
- Modify the value of the timeout parameter nf_conntrack_tcp_timeout_established in the /etc/sysctl.conf file.
  For example, change the timeout to 1200 seconds. The default timeout is 432,000 seconds (5 days).
      net.netfilter.nf_conntrack_tcp_timeout_established = 1200
- Run the following command to apply the changes:
      sysctl -p
The "Time wait bucket table overflow" error
Symptoms
The "kernel: TCP: time wait bucket table overflow" error message frequently appears in the /var/log/messages log of a Linux ECS instance.
Feb 18 12:28:38 i-*** kernel: TCP: time wait bucket table overflow
Feb 18 12:28:44 i-*** kernel: printk: 227 messages suppressed.
Feb 18 12:28:44 i-*** kernel: TCP: time wait bucket table overflow
Feb 18 12:28:52 i-*** kernel: printk: 121 messages suppressed.
Feb 18 12:28:52 i-*** kernel: TCP: time wait bucket table overflow
Feb 18 12:28:53 i-*** kernel: printk: 351 messages suppressed.
Feb 18 12:28:53 i-*** kernel: TCP: time wait bucket table overflow
Feb 18 12:28:59 i-*** kernel: printk: 319 messages suppressed.
Cause
The net.ipv4.tcp_max_tw_buckets parameter controls the number of connections in the TIME_WAIT state that the kernel can manage. When the total number of connections that are in the TIME_WAIT state and connections that are about to transition to the TIME_WAIT state exceeds the value of the net.ipv4.tcp_max_tw_buckets parameter, the "kernel: TCP: time wait bucket table overflow" error message appears in /var/log/messages. The kernel then closes the excess TCP connections.
Solution
You can increase the value of the net.ipv4.tcp_max_tw_buckets parameter as needed. In addition, we recommend that you optimize TCP connections at the application level. The following steps describe how to modify the value of the net.ipv4.tcp_max_tw_buckets parameter.
- Connect to the instance using VNC.
  For more information, see Log on to a Linux instance using password authentication.
- Run the following command to check the number of TCP connections:
      netstat -antp | awk 'NR>2 {print $6}' | sort | uniq -c
  The following output indicates that 6,300 connections are in the TIME_WAIT state:
      6300 TIME_WAIT
      40 LISTEN
      20 ESTABLISHED
      20 CONNECTED
- Run the following command to view the value of the net.ipv4.tcp_max_tw_buckets parameter:
      cat /etc/sysctl.conf | grep net.ipv4.tcp_max_tw_buckets
  The output indicates that the value of the net.ipv4.tcp_max_tw_buckets parameter is 20000.
      net.ipv4.tcp_max_tw_buckets = 20000
- Modify the value of the net.ipv4.tcp_max_tw_buckets parameter.
  - Run the following command to open the /etc/sysctl.conf file:
        vi /etc/sysctl.conf
  - Press the i key to enter edit mode.
  - Modify the value of the net.ipv4.tcp_max_tw_buckets parameter.
    For example, change the value of the net.ipv4.tcp_max_tw_buckets parameter to 65535:
        net.ipv4.tcp_max_tw_buckets = 65535
  - Press the Esc key, enter :wq, and press Enter to save the file and exit.
- Run the following command to apply the changes:
      sysctl -p
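To watch TIME_WAIT connections over time instead of reading the full state summary, the counting step can be narrowed to a single number. This is a minimal sketch: count_time_wait is a hypothetical helper that reads netstat -ant style output on standard input and, like the command in the steps above, assumes two header lines and the state in the sixth column.

```shell
# Hypothetical helper: count TIME_WAIT connections in netstat -ant style input.
# Skips the two header lines and reads the state from column 6.
count_time_wait() {
    awk 'NR > 2 && $6 == "TIME_WAIT" { n++ } END { print n + 0 }'
}

# On a live instance:
# netstat -ant | count_time_wait
```

Comparing this number with the value of net.ipv4.tcp_max_tw_buckets shows how close the instance is to the limit.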
High number of connections in FIN_WAIT2 state
Symptoms
A large number of TCP connections on the Linux ECS instance are in the FIN_WAIT2 state.
Cause
This issue can occur for the following reasons:
- In an HTTP service, the server might proactively close a connection for a specific reason, such as a KEEPALIVE timeout. After the client acknowledges the server's FIN, the connection on the server enters the FIN_WAIT2 state.
- The TCP/IP protocol stack supports half-closed connections. Unlike the TIME_WAIT state, the FIN_WAIT2 state has no timeout in the TCP specification. If the client does not close its end of the connection, the connection can remain in the FIN_WAIT2 state until the system restarts. On Linux, net.ipv4.tcp_fin_timeout limits how long an orphaned connection (one that the application has already closed) stays in this state. An increasing number of FIN_WAIT2 connections can cause the kernel to crash.
Solution
Decrease the net.ipv4.tcp_fin_timeout value to close TCP connections in the FIN_WAIT2 state more quickly.
- Connect to the instance using VNC.
  For more information, see Log on to a Linux instance using password authentication.
- Modify the value of the net.ipv4.tcp_fin_timeout parameter.
  - Run the following command to open the /etc/sysctl.conf file:
        vi /etc/sysctl.conf
  - Press the i key to enter edit mode.
  - Modify the value of the net.ipv4.tcp_fin_timeout parameter.
    For example, change the value of the net.ipv4.tcp_fin_timeout parameter to 10:
        net.ipv4.tcp_fin_timeout = 10
  - Press the Esc key, enter :wq, and press Enter to save the file and exit.
- Run the following command to apply the changes:
      sysctl -p
High number of connections in CLOSE_WAIT state
Symptoms
A large number of TCP connections on the Linux ECS instance are in the CLOSE_WAIT state.
Cause
This issue can occur when the number of connections in the CLOSE_WAIT state exceeds the normal range.
TCP uses a four-way handshake to terminate a connection. Either end of a TCP connection can initiate the closing request. If the remote peer initiates the close, but the local application does not close its end of the socket, the connection enters the CLOSE_WAIT state. Although this is a half-closed state, the connection is no longer usable for communication and should be terminated promptly.
Solution
We recommend that you investigate your application logic to ensure that it correctly handles connections that have been closed by the remote peer. The application should promptly close its socket and perform checks.
- Connect to the ECS instance.
  For more information, see Overview of connection methods for ECS instances.
- Check for and close TCP connections that are in the CLOSE_WAIT state within your application.
  The read and write functions in most programming languages can detect a connection in the CLOSE_WAIT state. The following examples show how to close a connection in Java and C:
  - Java
    - Use the read() method to check for the end of the stream. When the method returns -1, it indicates that the peer has closed its end.
    - Call the close() method to close the connection.
  - C
    Check the return value of the read() system call.
    - If the return value is 0, the peer has closed the connection. You can now close the socket.
    - If the return value is less than 0, check the errno. If the error is not EAGAIN or EWOULDBLOCK, an error has occurred, and you should close the socket.
Access failure after NAT configuration
Symptoms
After NAT is configured on the client side, the client cannot access server-side ECS or ApsaraDB RDS instances. This includes ECS instances in a VPC that are configured with SNAT.
Cause
This issue can occur if both the net.ipv4.tcp_tw_recycle and net.ipv4.tcp_timestamps parameters are set to 1 on the server.
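When the server's kernel parameters net.ipv4.tcp_tw_recycle and net.ipv4.tcp_timestamps are both enabled (set to 1), the server checks the timestamp of each incoming TCP packet. If the timestamp of a new packet is not greater than the last recorded timestamp from that endpoint, the server drops the packet. Clients behind a NAT device share one source IP address but keep independent timestamp clocks, so the server treats them as a single endpoint and drops packets from clients whose timestamps are lower.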
Solution
Choose the solution based on the server-side cloud product.
- If the remote server is an ECS instance, set both the net.ipv4.tcp_tw_recycle and net.ipv4.tcp_timestamps parameters to 0 on the ECS instance.
- If the remote server is an ApsaraDB RDS instance, you cannot directly modify its kernel parameters. Instead, you must set both the net.ipv4.tcp_tw_recycle and net.ipv4.tcp_timestamps parameters to 0 on the client machine.

- Connect to the instance using VNC.
  For more information, see Log on to a Linux instance using password authentication.
- Change the values of the net.ipv4.tcp_tw_recycle and net.ipv4.tcp_timestamps parameters to 0.
  - Run the following command to open the /etc/sysctl.conf file:
        vi /etc/sysctl.conf
  - Press the i key to enter edit mode.
  - Change the values of the net.ipv4.tcp_tw_recycle and net.ipv4.tcp_timestamps parameters to 0.
        net.ipv4.tcp_tw_recycle=0
        net.ipv4.tcp_timestamps=0
  - Press the Esc key, enter :wq, and press Enter to save the file and exit.
- Run the following command to apply the changes:
      sysctl -p
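To confirm whether a Linux server has the combination described in the Cause section, both files under /proc/sys/ can be read. The following is a minimal sketch: tw_recycle_nat_risk is a hypothetical helper, and its optional root-directory argument exists only so that the check can be exercised against a test directory. On kernel 4.12 and later the tcp_tw_recycle file does not exist, which the sketch treats as "ok".

```shell
# Hypothetical helper: report whether tcp_tw_recycle and tcp_timestamps are both 1.
# Optional argument: root directory to read from (defaults to /proc/sys).
# Prints "at-risk" if both files exist and contain 1, otherwise "ok".
tw_recycle_nat_risk() {
    root=${1:-/proc/sys}
    recycle_file=$root/net/ipv4/tcp_tw_recycle
    stamps_file=$root/net/ipv4/tcp_timestamps
    if [ -r "$recycle_file" ] && [ -r "$stamps_file" ] \
        && [ "$(cat "$recycle_file")" = "1" ] \
        && [ "$(cat "$stamps_file")" = "1" ]; then
        echo "at-risk"
    else
        echo "ok"
    fi
}

# On a live instance:
# tw_recycle_nat_risk
```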
Common Linux kernel parameters
| Parameter | Description |
| net.core.rmem_default | The default size of the socket receive buffer, in bytes. |
| net.core.rmem_max | The maximum size of the socket receive buffer, in bytes. |
| net.core.wmem_default | The default size of the socket send buffer, in bytes. |
| net.core.wmem_max | The maximum size of the socket send buffer, in bytes. |
| net.core.netdev_max_backlog | Specifies the maximum number of packets that can be queued on the network interface's input queue. This queue holds packets when the network interface receives packets faster than the kernel can process them. |
| net.core.somaxconn | A global parameter that defines the maximum length of the listen queue for each port. This parameter is related to |
| net.core.optmem_max | Specifies the maximum ancillary buffer size allowed per socket. |
| net.ipv4.tcp_mem | Determines how the TCP stack manages memory usage. Each value is in units of memory pages (typically 4 KB). |
| net.ipv4.tcp_rmem | Defines the memory reserved for TCP receive buffers. |
| net.ipv4.tcp_wmem | Defines the memory reserved for TCP send buffers. |
| net.ipv4.tcp_keepalive_time | The interval in seconds between TCP keepalive probes sent to verify that a connection is still active. |
| net.ipv4.tcp_keepalive_intvl | The interval in seconds between retries if a keepalive probe is not acknowledged. |
| net.ipv4.tcp_keepalive_probes | The maximum number of keepalive probes to send before a TCP connection is considered dead. |
| net.ipv4.tcp_sack | Enables Selective Acknowledgment (SACK). A value of 1 enables it. This feature improves performance by allowing the receiver to acknowledge out-of-order packets, so the sender retransmits only the missing segments. This option is recommended for wide area network (WAN) communication but increases CPU usage. |
| net.ipv4.tcp_timestamps | Enables TCP timestamps, which add 12 bytes to the TCP header. Timestamps allow for more accurate Round-Trip Time (RTT) calculations than the retransmission timeout mechanism (see RFC 1323). This option should be enabled for better performance. |
| net.ipv4.tcp_window_scaling | Enables window scaling as defined in RFC 1323. Set this to 1 to support TCP windows larger than 64 KB, up to a maximum of 1 GB. This option takes effect only if both ends of the TCP connection enable it. |
| net.ipv4.tcp_syncookies | This parameter specifies whether to enable TCP SYN cookies ( |
| net.ipv4.tcp_tw_reuse | Allows reusing sockets in the TIME-WAIT state for new TCP connections. |
| net.ipv4.tcp_tw_recycle | Enables fast recycling of TIME-WAIT sockets. |
| net.ipv4.tcp_fin_timeout | The time in seconds that a connection remains in the FIN-WAIT-2 state on the local end after closing a socket. The remote end might disconnect, never close the connection, or terminate unexpectedly. |
| net.ipv4.ip_local_port_range | Specifies the range of local port numbers that the TCP/UDP protocols can use. |
| net.ipv4.tcp_max_syn_backlog | Determines the maximum number of queued connection requests in the A connection is in the |
| net.ipv4.tcp_westwood | Enables the Westwood+ congestion control algorithm on the sender side. It optimizes bandwidth utilization by maintaining an estimate of the available throughput. This option is recommended for WAN communication. |
| net.ipv4.tcp_bic | Enables Binary Increase Congestion (BIC) control for fast, long-distance networks. This allows for better utilization of gigabit-speed links. This option is recommended for WAN communication. |
| net.ipv4.tcp_max_tw_buckets | Sets the maximum number of sockets in the TIME_WAIT state. If this limit is exceeded, they are immediately closed. The default value depends on the instance's memory, with a maximum of 262,144. |
| net.ipv4.tcp_synack_retries | Specifies the number of times to retransmit a SYN+ACK packet for a connection in the |
| net.ipv4.tcp_abort_on_overflow | If set to 1, the system sends a reset (RST) packet to terminate connections when the application cannot process a high volume of incoming requests in a short time. We recommend optimizing application performance rather than simply resetting connections. The default value is 0. |
| net.ipv4.route.max_size | The maximum size of the kernel routing cache. |
| net.ipv4.ip_forward | Enables packet forwarding between interfaces. |
| net.ipv4.ip_default_ttl | The default Time To Live (TTL) value for outgoing packets. |
| net.netfilter.nf_conntrack_tcp_timeout_established | The timeout in seconds for established TCP connections that have no activity. |
| net.netfilter.nf_conntrack_max | The maximum number of entries in the connection tracking table. |