Alibaba Cloud Linux: System configuration optimization

Last Updated: Nov 24, 2025

This topic describes the optimized configurations and common system configuration parameters for Alibaba Cloud Linux 3. You can adjust kernel parameters to suit specific business scenarios.

Operating system limits

Alibaba Cloud Linux 3

Important

Before you modify kernel parameters, note the following:

  • Adjust parameters only as needed and with supporting data. Do not adjust kernel parameters arbitrarily.

  • Understand the function of each parameter. Kernel parameters can differ between environment types and versions.
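
Before you change a parameter, it helps to record its current value. The following is a minimal sketch, assuming the standard /proc/sys layout in which a dotted sysctl name maps to a file path. The helper names sysctl_path and read_sysctl are hypothetical and are not part of any Alibaba Cloud tool.

```python
from pathlib import Path


def sysctl_path(name: str, root: str = "/proc/sys") -> Path:
    """Map a dotted sysctl name to its file under /proc/sys."""
    # Assumption: no component of the name contains a dot of its own
    # (a VLAN interface such as eth0.100 would need special handling).
    return Path(root).joinpath(*name.split("."))


def read_sysctl(name: str, root: str = "/proc/sys"):
    """Return the current value as a string, or None if it cannot be read."""
    try:
        return sysctl_path(name, root).read_text().strip()
    except OSError:
        # The parameter does not exist on this kernel or is not readable.
        return None


# Record the current values before changing anything.
for key in ("net.ipv4.tcp_syn_retries", "net.core.somaxconn"):
    print(key, "=", read_sysctl(key))
```

A parameter that prints None is not available on the running kernel, which is one way to notice that a parameter differs between environments or versions.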

Optimized configurations for Alibaba Cloud Linux 3

The following system configuration parameters are optimized in Alibaba Cloud Linux 3.

Performance improvement

Configuration item

Value

Description

net.ipv4.tcp_timeout_init

1000

The initial TCP retransmission timeout.

The minimum value is 2 HZ.

Important

This is a custom feature developed for Alibaba Cloud Linux 3. Long-term maintenance is not guaranteed. This system configuration is deprecated in Alibaba Cloud Linux 4 and later.

net.ipv4.tcp_synack_timeout_init

1000

The initial timeout for SYN-ACK messages.

The minimum value is 2 HZ.

After the first retransmission, the timeout period doubles.

Important

This is a custom feature developed for Alibaba Cloud Linux 3. Long-term maintenance is not guaranteed. This system configuration is deprecated in Alibaba Cloud Linux 4 and later.

net.ipv4.tcp_synack_timeout_max

120000

The maximum SYN-ACK timeout.

The minimum value is 2 HZ.

When a SYN-ACK message is retransmitted, the retransmission timeout (RTO) starts at the value of tcp_synack_timeout_init and doubles with each attempt until it reaches the maximum value defined by tcp_synack_timeout_max.

Important

This is a custom feature developed for Alibaba Cloud Linux 3. Long-term maintenance is not guaranteed. This system configuration is deprecated in Alibaba Cloud Linux 4 and later.

net.ipv4.tcp_ato_min

40

The ACK message timeout.

This sysctl parameter lets you flexibly control the ACK timeout period.

Valid values: 4 ms to 200 ms.

Important

This is a custom feature developed for Alibaba Cloud Linux 3. Long-term maintenance is not guaranteed. This system configuration is deprecated in Alibaba Cloud Linux 4 and later.

net.ipv4.tcp_init_cwnd

10

The initial size of the TCP congestion window.

Important

This is a custom feature developed for Alibaba Cloud Linux 3. Long-term maintenance is not guaranteed. This system configuration is deprecated in Alibaba Cloud Linux 4 and later.

net.ipv4.tcp_synack_retries

2

The number of times the server retries sending a SYN-ACK message if it does not receive the final ACK message.

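With an initial SYN-ACK timeout of 1 second, the two retransmissions are sent about 1 second and 3 seconds after the first SYN-ACK message, and the final timeout occurs after about 7 seconds.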

net.ipv4.tcp_slow_start_after_idle

0

Specifies whether to re-initiate a slow start after a TCP connection becomes idle. Valid values:

  • 1: Yes.

  • 0: No.

/sys/kernel/mm/transparent_hugepage/hugetext_enabled

0

Controls the code huge pages (Hugetext) feature. Valid values:

  • 0: Disables code huge pages.

  • 1: Enables huge pages for binaries and dynamic libraries only.

  • 2: Enables executable anonymous huge pages only.

  • 3: Enables both types of huge pages mentioned above.

You can enable Hugetext to reduce iTLB misses and improve performance for workloads with large code segments, such as databases and large applications.

Important

This is a custom feature developed for Alibaba Cloud Linux 3. Long-term maintenance is not guaranteed.
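
The doubling behavior described above for net.ipv4.tcp_synack_timeout_init and net.ipv4.tcp_synack_timeout_max can be illustrated with a minimal sketch. It is a simplified model of the description in this topic, not the kernel implementation, and the function name synack_timeouts is hypothetical. Values are assumed to be in milliseconds.

```python
from typing import List


def synack_timeouts(init_ms: int, max_ms: int, retransmissions: int) -> List[int]:
    """Return the wait before each SYN-ACK retransmission, in milliseconds.

    Simplified model: the timeout starts at init_ms, doubles after every
    retransmission, and never exceeds max_ms.
    """
    timeouts = []
    timeout = init_ms
    for _ in range(retransmissions):
        timeouts.append(min(timeout, max_ms))
        timeout *= 2
    return timeouts


# Values from the table: init = 1000, max = 120000.
print(synack_timeouts(1000, 120000, 8))
```

With the values from the table, the eighth wait would be 128,000 ms without the cap, so the model limits it to 120,000 ms.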

Resource utilization improvement

Configuration item

Value

Description

net.ipv4.tcp_syn_retries

4

The number of times the client retries sending a SYN message if it does not receive a SYN-ACK message.

With an initial retransmission timeout of 1 second, four retransmissions take about 15 seconds, and the final timeout occurs after about 31 seconds.

net.ipv4.tcp_retries2

8

Affects the total retransmission timeout for an active TCP connection that has not received an ACK message.

With an initial RTO of 200 ms, eight retransmissions take about 51 seconds, and the final timeout occurs after about 102 seconds.

net.ipv4.tcp_tw_timeout

60

The timeout period of a TCP socket in the TIME_WAIT state.

Valid values: 1 second to 600 seconds.

For more information, see Modify the TCP TIME-WAIT timeout period.

Important

This is a custom feature developed for Alibaba Cloud Linux 3. Long-term maintenance is not guaranteed. This system configuration is deprecated in Alibaba Cloud Linux 4 and later.

net.ipv4.tcp_max_tw_buckets

5000

The maximum number of TCP connections allowed in the TIME_WAIT state.

Connections in the TIME_WAIT state occupy the port range that a client uses to establish connections to a server. The maximum number of ports that can be used to connect to the same server ip:port is determined by the net.ipv4.ip_local_port_range parameter. When many connections in the TIME_WAIT state occupy the client ports, the client fails to allocate ports when calling the connect() function. Ultimately, the client fails to establish a connection. For more information, see Why do many "TCP: time wait bucket table overflow" errors occur on a Linux ECS instance?.
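
The timing figures given above for net.ipv4.tcp_syn_retries and net.ipv4.tcp_retries2 follow from exponential backoff. The following is a minimal sketch of that arithmetic, assuming the RTO doubles after every retransmission and is capped at 120 seconds. The cap and the function name backoff_timeline are assumptions made for this illustration, and the real kernel logic is more involved.

```python
def backoff_timeline(initial_rto_ms: int, retries: int, rto_max_ms: int = 120_000):
    """Return (time of last retransmission, time of final timeout) in ms.

    Simplified model: each retransmission is sent when the current RTO
    expires, and the RTO then doubles, up to rto_max_ms (an assumption).
    """
    elapsed = 0
    rto = initial_rto_ms
    for _ in range(retries):
        elapsed += rto                      # this retransmission fires now
        rto = min(rto * 2, rto_max_ms)      # back off for the next one
    last_retransmission = elapsed
    final_timeout = elapsed + rto           # wait one more RTO, then give up
    return last_retransmission, final_timeout


# tcp_syn_retries = 4 with an initial timeout of 1 second
print(backoff_timeline(1000, 4))   # (15000, 31000)
# tcp_retries2 = 8 with an initial RTO of 200 ms
print(backoff_timeline(200, 8))    # (51000, 102200)
```

The two results correspond to the "about 15 seconds / about 31 seconds" and "about 51 seconds / about 102 seconds" figures in the table.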

Network security

Configuration item

Value

Description

net.ipv4.conf.all.rp_filter

0

Controls reverse path filtering for all current network interface cards (NICs). Valid values:

  • 0: Disables reverse path filtering.

  • 1: Enables strict reverse path filtering. The NIC verifies each incoming packet. If the reverse path of the packet does not match the receiving interface, the packet is discarded.

  • 2: Enables loose reverse path filtering. The NIC checks if the source address of each incoming packet is reachable. If the reverse path is reachable through any interface, the check passes. Otherwise, the packet is discarded.

Warning

If this parameter is set to 1, packet loss may occur. In a system with multiple NICs, packets are discarded if their inbound and outbound NICs are different. Therefore, do not enable this setting in a multi-NIC system.

net.ipv4.conf.default.rp_filter

0

Controls reverse path filtering for new NICs. Valid values:

  • 0: Disables reverse path filtering.

  • 1: Enables strict reverse path filtering. The NIC verifies each incoming packet. If the reverse path of the packet does not match the receiving interface, the packet is discarded.

  • 2: Enables loose reverse path filtering. The NIC checks if the source address of each incoming packet is reachable. If the reverse path is reachable through any interface, the check passes. Otherwise, the packet is discarded.

Warning

If this parameter is set to 1, packet loss may occur. In a system with multiple NICs, packets are discarded if their inbound and outbound NICs are different. Therefore, do not enable this setting in a multi-NIC system.

net.ipv4.conf.default.arp_announce

2

Controls the selection of the source IP address in ARP requests for new NICs. Valid values:

  • 0: Default behavior. Uses any local address on any interface as the source IP address.

  • 1: Strict mode. Tries to use a source IP address from the same subnet as the destination IP address. If no such address is available on the outgoing interface, the primary address of the interface is used.

  • 2: Strictest mode. The kernel must use an IP address of the outgoing interface as the source IP address. The kernel first tries to select an address by using the procedure described in strict mode (arp_announce=1). If no suitable address is found, the kernel uses the IP address of the interface that sends the ARP request. If that address is unavailable, no ARP request is sent.

net.ipv4.conf.all.arp_announce

2

Controls the selection of the source IP address in ARP requests for all current NICs. Valid values:

  • 0: Default behavior. Uses any local address on any interface as the source IP address. In this mode, the kernel may select any available local address as the source IP address of an ARP request.

  • 1: Strict mode. Tries to use a source IP address from the same subnet as the destination IP address. If no such address is available on the outgoing interface, the primary address of the interface is used.

  • 2: Strictest mode. The kernel must use an IP address of the outgoing interface as the source IP address. The kernel first tries to select an address by using the procedure described in strict mode (arp_announce=1). If no suitable address is found, the kernel uses the IP address of the interface that sends the ARP request. If that address is unavailable, no ARP request is sent.

net.ipv4.tcp_syncookies

1

Controls SYN flood attack protection. Valid values:

  • 0: Disables TCP kernel SYN flood protection.

  • 1: Enables TCP kernel protection to prevent SYN flood attacks.

  • 2: Unconditionally enables TCP kernel protection to prevent SYN flood attacks. This is used for testing scenarios.
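
The three rp_filter modes described above can be summarized with a small decision sketch. This is a simplified model of the descriptions in this topic and not kernel code. The function name rp_filter_accepts and the reverse_ifaces argument (the interfaces through which the packet's source address is reachable, best route first) are hypothetical.

```python
from typing import Sequence


def rp_filter_accepts(mode: int, in_iface: str, reverse_ifaces: Sequence[str]) -> bool:
    """Return True if an incoming packet passes reverse path filtering.

    mode           -- 0 (disabled), 1 (strict), or 2 (loose)
    in_iface       -- interface on which the packet arrived
    reverse_ifaces -- interfaces that can reach the packet's source address,
                      with the best route first (hypothetical input)
    """
    if mode == 0:
        return True                      # no check at all
    if mode == 1:
        # Strict: the best route back must use the receiving interface.
        return bool(reverse_ifaces) and reverse_ifaces[0] == in_iface
    if mode == 2:
        # Loose: the source only has to be reachable through some interface.
        return bool(reverse_ifaces)
    raise ValueError("rp_filter mode must be 0, 1, or 2")


# Multi-NIC case from the warning: the packet arrives on eth0,
# but the route back to its source goes out through eth1.
print(rp_filter_accepts(1, "eth0", ["eth1"]))   # False: dropped in strict mode
print(rp_filter_accepts(2, "eth0", ["eth1"]))   # True: accepted in loose mode
```

The first call shows why strict mode can cause packet loss on a multi-NIC system: the asymmetric path fails the strict check even though the source is reachable.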

Other common system configurations for Alibaba Cloud Linux 3

Performance improvement

Configuration item

Default value

Description

net.ipv4.ip_local_port_range

32768 60999

The range of local port numbers.

This is the range of local ports that TCP and UDP can use when a client establishes a connection. When most of the ports in this range are occupied, the kernel's linear search for a free port may cause high CPU utilization.

net.ipv4.tcp_rmem

4096 131072 6291456

The size of the recvbuf for a single TCP socket. Unit: bytes.

The initial value is independent of the instance type. The first value is the minimum size, the second is the default size, and the third is the maximum size. Increase these values based on the memory usage of the instance.

net.ipv4.tcp_wmem

4096 16384 4194304

The size of the sendbuf for a single TCP socket. Unit: bytes.

The initial value is independent of the instance type. The first value is the minimum size, the second is the default size, and the third is the maximum size. Increase these values based on the memory usage of the instance.

net.core.netdev_max_backlog

1000

The maximum length of the per-CPU socket buffer (skb) queue.

These are cache queues primarily used for receive packet steering (RPS) or intra-host communication, such as loopback or veth.

net.core.somaxconn

4096

The maximum length of the listen backlog queue for a single socket.

For applications such as NGINX that handle many short-lived connections, we recommend that you increase the value.

net.core.rmem_max

212992

The maximum recvbuf value allowed for a single socket.

This parameter is mainly used when the kernel needs to handle many connections over a single UDP socket.

In TCP, this option is used only when you call setsockopt() to configure SO_RCVBUF. The recvbuf value that you specify cannot exceed this value. If you do not call setsockopt(), the tcp recvbuf parameter is limited only by the net.ipv4.tcp_rmem parameter.

net.core.wmem_max

212992

The maximum sendbuf value allowed for a single socket.

This parameter is mainly used when the kernel needs to handle many connections over a single UDP socket.

In TCP, this option is used only when you call the setsockopt() function to configure SO_SNDBUF. The sendbuf value that you specify cannot exceed this value. If you do not call the setsockopt() function, the tcp sendbuf parameter is limited only by the net.ipv4.tcp_wmem parameter.

/sys/block/<device>/queue/nomerges

0

Specifies whether the merge attribute is disabled for the device. Valid values:

  • 0: Enables any type of merge.

  • 1: Disables complex merges and allows only simple one-shot merges.

  • 2: Disables all types of merge.

When I/O read and write addresses are contiguous, the kernel I/O protocol stack merges multiple I/O operations into a single large I/O using the merge feature and then sends it to the hardware for processing, greatly improving the hardware's I/O performance.

If I/O read and write addresses are random, the chance of merging I/O operations is low. However, checking whether a merge can be performed consumes CPU cycles and affects performance. You can disable the merge feature of the device to improve performance.

/sys/block/<device>/queue/read_ahead_kb

4096

The read-ahead size of the device file system.

The kernel sets the default value to 128 KB. The tuned service increases the value to 4,096 KB.

If most of the I/O load consists of random I/O, lower this value (for example, to 128 KB) to improve business performance.

/sys/block/<device>/queue/rq_affinity

1

Specifies which CPU processes the I/O completion interrupt. Valid values:

  • 0: The I/O completion interrupt is processed on the CPU that triggered the interrupt.

  • 1: The I/O completion interrupt is sent to the group that contains the CPU that submitted the I/O request. A group generally refers to a single socket, and the CPUs on that same socket share a cache. With this setting, the I/O completion interrupt is actually sent to the first CPU of the group. The advantage of this setting is that it is cache-friendly and processing is faster. The disadvantage is that the first CPU in the group experiences high pressure when I/O operations are frequently performed.

  • 2: The I/O completion interrupt is sent to the CPU that submitted the I/O request. The advantage of this setting is that the loads of CPUs are relatively balanced. The disadvantage is that the efficiency is lower than the preceding setting.

Adjust the sending method based on the I/O pressure of the system.

/sys/block/<device>/queue/scheduler

mq-deadline (single queue) or

none (multiple queues)

The I/O device scheduler.

Alibaba Cloud Linux 3 supports the following schedulers: mq-deadline, kyber, bfq, and none.

By default, the blk-mq device selects mq-deadline if a single queue exists and none if multiple queues exist.

In most cases, the default settings are used. If you have special requirements, such as low read latency, you can switch to the kyber scheduler and specify a corresponding latency value.

/sys/kernel/mm/pagecache_limit/enabled

0

Specifies whether to enable the Page Cache limit feature in the Linux kernel. Valid values:

  • 0: Disables the Page Cache Limit feature.

  • 1: Enables the Page Cache Limit feature.

Important

This is a custom feature developed for Alibaba Cloud Linux 3. Long-term maintenance is not guaranteed.

/sys/fs/cgroup/memory/memory.pagecache_limit.enable

0

Specifies whether to enable the Page Cache Limit feature for memcg. Valid values:

  • 0: Disables the Page Cache Limit feature for the current memcg.

  • 1: Enables the Page Cache Limit feature for the current memcg.

/sys/fs/cgroup/memory/memory.pagecache_limit.size

0

Specifies the limit on Page Cache usage for the current memcg in bytes.

Valid values: 0 to the memory.limit_in_bytes value of the current memcg. You can specify the memory.limit_in_bytes value.

  • 0: Disables the Page Cache Limit feature for the current memcg, regardless of the global and memcg switch settings.

  • A non-zero value: specifies the upper limit of Page Cache usage for the current memcg tree.
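
The interaction between setsockopt() and net.core.rmem_max described above can be observed with a short sketch. It requests a receive buffer size on a UDP socket and reads back what the kernel actually granted. The function name request_rcvbuf is hypothetical. On Linux, socket(7) documents that the kernel doubles the requested value for bookkeeping overhead, so the value that is read back is normally not equal to the request.

```python
import socket


def request_rcvbuf(nbytes: int) -> int:
    """Ask for a receive buffer of nbytes and return the size the kernel reports."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # The request is capped by net.core.rmem_max for unprivileged processes.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, nbytes)
        # On Linux, the reported value includes the kernel's doubling.
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
```

For example, comparing request_rcvbuf(64 * 1024) with request_rcvbuf(64 * 1024 * 1024) on a system that uses the default net.core.rmem_max value is one way to see that very large requests are silently capped rather than rejected.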

Network security

Configuration item

Default value

Description

net.ipv4.conf.all.arp_ignore

0

Controls whether the system responds to external ARP requests received by all current NICs. Valid values:

  • 0: Responds to ARP requests for any local IP address, including addresses on loopback interfaces, regardless of the receiving NIC.

  • 1: Responds only if the target IP address is configured on the receiving NIC.

  • 2: Responds only if the target IP address is configured on the receiving NIC, and the source IP address of the request is in the same subnet as the NIC.

For example, when eth0 receives an ARP request for the IP address of eth1, it sends an ARP Reply if the value is 0. If the value is 1 or 2, it does not reply because the destination IP address in the request does not belong to the receiving network interface.

net.ipv4.conf.default.arp_ignore

0

Controls whether the system responds to external ARP requests received by new NICs. Valid values:

  • 0: Responds to ARP requests for any local IP address, including addresses on loopback interfaces, regardless of the receiving NIC.

  • 1: Responds only if the target IP address is configured on the receiving NIC.

  • 2: Responds only if the target IP address is configured on the receiving NIC, and the source IP address of the request is in the same subnet as the NIC.

For example, when eth0 receives an ARP request for the IP address of eth1, it sends an ARP reply if the value is 0. If the value is 1 or 2, it does not reply because the destination IP address in the request does not belong to the receiving network interface.

net.ipv4.ip_forward

0

Specifies whether to enable the IPv4 forwarding feature. Valid values:

  • 0: Disables IP forwarding.

  • 1: Enables IP forwarding.
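
The arp_ignore modes and the eth0/eth1 example above can be expressed as a small decision sketch. This is a simplified model of the three documented modes only, not kernel code. The function name arp_should_reply, the interfaces dictionary, and the sample addresses are hypothetical.

```python
import ipaddress
from typing import Dict, List


def arp_should_reply(mode: int, target_ip: str, sender_ip: str,
                     recv_iface: str, interfaces: Dict[str, List[str]]) -> bool:
    """Decide whether to answer an ARP request under arp_ignore 0, 1, or 2.

    interfaces maps an interface name to its addresses in CIDR form,
    for example {"eth0": ["192.168.1.10/24"]} (hypothetical input).
    """
    target = ipaddress.ip_address(target_ip)
    sender = ipaddress.ip_address(sender_ip)

    def addresses(name: str):
        return [ipaddress.ip_interface(a) for a in interfaces[name]]

    if mode == 0:
        # Any local address qualifies, whichever NIC received the request.
        return any(a.ip == target for name in interfaces for a in addresses(name))

    for addr in addresses(recv_iface):
        if addr.ip == target:
            if mode == 1:
                return True                    # target is on the receiving NIC
            if mode == 2:
                return sender in addr.network  # sender must share the subnet
    return False


nics = {"eth0": ["192.168.1.10/24"], "eth1": ["10.0.0.5/24"]}
# eth0 receives a request for the address of eth1.
for mode in (0, 1, 2):
    print(mode, arp_should_reply(mode, "10.0.0.5", "192.168.1.20", "eth0", nics))
```

The loop reproduces the example in the table: a reply is sent only when the value is 0.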

Resource utilization

Configuration item

Default value

Description

net.ipv4.tcp_fin_timeout

60

The amount of time that a TCP connection remains in the FIN-WAIT-2 state on the side that actively closes the connection. Unit: seconds.

During this time, the socket waits for the peer to close the connection or to send data. Reduce this value to speed up the system's closing of TCP connections in the FIN-WAIT-2 state.

In actual business scenarios, you can run the netstat -ant | grep FIN_WAIT2 | wc -l command to view the number of TCP connections in the FIN-WAIT-2 state. In most cases, we recommend that you use the default setting of 60 seconds. If many TCP connections are in the FIN-WAIT-2 state, you can reduce the value to speed up the termination of the TCP connections. For more information, see Why does a Linux ECS instance have many TCP connections in the FIN_WAIT2 state?

net.ipv4.tcp_tw_reuse

2

Specifies whether new TCP connections can reuse sockets in the TIME-WAIT state. Valid values:

  • 0: Disabled.

  • 1: Enabled globally.

  • 2: Enabled only for loopback traffic.

net.ipv4.tcp_keepalive_time

7200

The time that a connection must remain idle before TCP starts to send keepalive messages when the keepalive feature is enabled. Unit: seconds.

This is used to confirm whether a TCP connection is valid.
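
As an alternative to the netstat command mentioned above for net.ipv4.tcp_fin_timeout, the number of connections in the FIN-WAIT-2 state can be counted from /proc/net/tcp. The following is a minimal sketch, assuming the usual layout of that file, in which the fourth column holds the connection state as a hexadecimal code and 05 stands for FIN_WAIT2. The function names are hypothetical. Only IPv4 sockets are covered, because IPv6 sockets are listed separately in /proc/net/tcp6.

```python
from typing import Iterable, Optional

FIN_WAIT2 = "05"  # state code used in /proc/net/tcp


def count_tcp_state(lines: Iterable[str], state_hex: str = FIN_WAIT2) -> int:
    """Count the entries whose state column equals state_hex."""
    count = 0
    for line in list(lines)[1:]:          # the first line is the column header
        fields = line.split()
        if len(fields) > 3 and fields[3] == state_hex:
            count += 1
    return count


def count_from_proc(path: str = "/proc/net/tcp") -> Optional[int]:
    """Read the file and count FIN_WAIT2 entries, or return None if unreadable."""
    try:
        with open(path) as handle:
            return count_tcp_state(handle.readlines())
    except OSError:
        return None


print("FIN_WAIT2 connections (IPv4):", count_from_proc())
```

If the count stays high over time, that is the situation in which the table suggests reducing the value of net.ipv4.tcp_fin_timeout.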

System limits

Configuration item

Default value

Description

fs.aio-max-nr

65536

The maximum number of concurrent Linux asynchronous I/O (AIO) requests.

The value of this parameter depends on how heavily the system uses Linux aio. For example, database and search scenarios require a large aio-max-nr value.

aio-max-nr works with aio-nr. aio-nr is obtained by accumulating the values of the first parameter of the io_setup(unsigned nr_events, aio_context_t *ctx_idp) system call. If aio-nr + nr_events > aio-max-nr, the io_setup() call returns a -EAGAIN error. Therefore, you should set an appropriate value for aio-max-nr in your business environment by observing the value of aio-nr.

fs.file-max

Set based on the reserved memory size during system initialization.

The maximum number of file handles that the system allows.

Up to 10% of the reserved memory can be used by file handles during system initialization. The default value of this parameter must be greater than or equal to the NR_FILE value of 8,192.

Use the default value if you have no special requirements.

fs.nr_open

1048576

The maximum number of open file handles that a process is allowed.

The limit for an application is determined by its RLIMIT_NOFILE resource limit. In most cases, you can run the ulimit -n command to set this limit, but the value cannot exceed the fs.nr_open value.
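
The relationship between aio-nr and aio-max-nr described above for fs.aio-max-nr can be written as a small check. This is a minimal sketch of the documented condition only. The function names are hypothetical, and the input values would normally be read from /proc/sys/fs/aio-nr and /proc/sys/fs/aio-max-nr.

```python
def io_setup_would_fail(aio_nr: int, nr_events: int, aio_max_nr: int) -> bool:
    """Return True if io_setup(nr_events, ...) would exceed the system-wide limit.

    Mirrors the condition in this topic: aio-nr + nr_events > aio-max-nr.
    """
    return aio_nr + nr_events > aio_max_nr


def aio_headroom(aio_nr: int, aio_max_nr: int) -> int:
    """Return how many more events can still be reserved system-wide."""
    return max(aio_max_nr - aio_nr, 0)


# Example with the default fs.aio-max-nr value of 65536.
print(io_setup_would_fail(65000, 1024, 65536))  # True: 66024 > 65536
print(aio_headroom(60000, 65536))               # 5536
```

Tracking the headroom over time is one way to decide whether the aio-max-nr value suits a given workload.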

Monitoring

Configuration item

Default value

Description

net.netfilter.nf_conntrack_max

262144

The maximum number of connections supported by the hash table in the nf_conntrack module.

The default value is calculated using the following formula: net.netfilter.nf_conntrack_max = 4 × net.netfilter.nf_conntrack_buckets.

For more information, see What do I do if applications on an ECS instance occasionally experience packet loss and the kernel log (dmesg) contains the "kernel: nf_conntrack: table full, dropping packet" error message?.

net.netfilter.nf_conntrack_tcp_timeout_time_wait

120

The timeout period of a TCP connection in the TIME_WAIT state in the nf_conntrack module. Unit: seconds.

net.netfilter.nf_conntrack_tcp_timeout_established

432000

The period of inactivity after which the nf_conntrack module drops the tracking entry of an established TCP connection, so that iptables no longer treats the connection as established. Unit: seconds.

fs.inotify.max_queued_events

16384

The maximum queue length of pending events of the inotify mechanism.

inotify is an infrastructure provided by the kernel to monitor both files and directories. When a directory is monitored, inotify can monitor the directory itself and the file system events of files within that directory.

Use the default value if you have no special requirements.

fs.inotify.max_user_instances

128

The maximum number of inotify instances that can be created.

This parameter is used to prevent excessive consumption of system resources such as memory due to the creation of excess inotify instances.

Use the default value if you have no special requirements.

fs.inotify.max_user_watches

8192

The maximum number of watches that can be added.

A watch is a term in inotify that describes an object that you need to monitor. A watch usually contains two components. One is a path name that points to a file or directory to be monitored. The other is a combination of events to monitor for that file or directory, such as file access events. You can use a mask to describe the combination of events.

/sys/block/<device>/queue/hang_threshold

5000

The threshold for detecting I/O operations that do not return for a long time during system operation. Unit: ms.

You can modify this based on specific business scenarios. For more information, see Detect I/O hangs in the file system and block layer.

Important

This is a custom feature developed for Alibaba Cloud Linux 3. Long-term maintenance is not guaranteed.
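
The formula given above for net.netfilter.nf_conntrack_max can be checked with a short sketch. The function names are hypothetical. The usage ratio assumes that the current number of tracked connections is available, for example from net.netfilter.nf_conntrack_count, which is a standard netfilter sysctl that this topic does not list.

```python
def default_conntrack_max(buckets: int) -> int:
    """Default from this topic: nf_conntrack_max = 4 x nf_conntrack_buckets."""
    return 4 * buckets


def conntrack_usage(count: int, maximum: int) -> float:
    """Return the fraction of the connection tracking table that is in use."""
    return count / maximum


# The default of 262144 corresponds to 65536 hash buckets.
print(default_conntrack_max(65536))      # 262144
print(conntrack_usage(131072, 262144))   # 0.5
```

A usage ratio that approaches 1 indicates a risk of the "table full, dropping packet" condition mentioned in the table.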