Influence of the BBR TCP congestion control algorithm on network performance in Aliyun Linux 2

Last Updated: Apr 29, 2020

Problem Description

In Aliyun Linux 2 instances that meet the following conditions, the kernel TCP congestion control algorithm is set to BBR by default. When CPU usage and the packet forwarding rate (PPS) are both high, BBR may degrade network performance. For example, the performance of a Redis database may decrease.

  • Image: aliyun_2_1903_64_20G_alibase_20190619.vhd and earlier image versions.
  • Kernel: kernel-4.19.48-14.al7 and earlier kernel versions.
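
To check whether an instance is affected, you can query the running kernel version and the congestion control algorithm currently in effect (standard Linux commands, not specific to Aliyun Linux 2):

    uname -r                                  # affected if the version is 4.19.48-14.al7 or earlier
    sysctl net.ipv4.tcp_congestion_control    # prints "bbr" if the affected default is active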

Cause of problem

The Aliyun Linux 2 kernel currently supports three TCP congestion control algorithms: Reno, BBR, and cubic. Their performance differs across network scenarios. The BBR algorithm estimates the bandwidth (BW) and round-trip time (RTT) of the current connection to adjust the congestion window. BBR relies on the TCP pacing feature, which is implemented in one of two ways:

  • If the network interface controller (NIC) uses the tc-fq qdisc, the per-flow pacing implemented in tc-fq is reused directly.
  • If the NIC does not use the tc-fq qdisc, TCP falls back to its internal pacing implementation.
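
To determine which of the two pacing paths applies on an instance, you can inspect the qdisc attached to the NIC (eth0 is an example device name; substitute your own):

    tc qdisc show dev eth0

If the output lists fq, the tc-fq pacing is reused; with any other qdisc (for example mq or pfifo_fast), TCP uses its internal pacing.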

 

TCP's internal pacing relies on the Linux high-resolution timer (hrtimer), which consumes additional CPU resources. When CPU usage and network PPS are both high, the BBR algorithm has a more pronounced impact on network performance; when the CPU is idle and PPS is low, the impact is small.

Solution

Alibaba Cloud reminds you that:

  • Before you perform operations that carry risks to an instance or its data, check the disaster recovery and fault tolerance capabilities of the instance to ensure data security.
  • Before you modify the configuration or data of an instance (including but not limited to ECS and RDS instances), we recommend that you create snapshots or enable RDS log backup.
  • If you have granted permissions on the Alibaba Cloud platform or submitted security information such as logon accounts and passwords, we recommend that you modify the information as soon as possible.

Temporary solution

See the following TCP congestion control algorithm recommendations to select the solution that meets your service needs.

  • If the applications in the ECS instance provide services only over the internal network, we recommend that you run the following commands to change the TCP congestion control algorithm to cubic, which performs well in internal-network environments with high bandwidth and low latency (verification commands follow this list):
    sysctl -w net.ipv4.tcp_congestion_control=cubic
    sh -c "echo 'net.ipv4.tcp_congestion_control=cubic' >> /etc/sysctl.d/50-aliyun.conf"
  • If the applications in the ECS instance provide services over the Internet, we recommend that you continue to use the BBR algorithm, but change the scheduling policy of the network interface controller to tc-fq (see the verification commands after this list). For more information, see Fair Queue traffic policing. Run the following command:
    tc qdisc add dev [$Dev] root fq
    Note: [$Dev] indicates the name of the network interface to be adjusted.
  • We recommend that you do not combine the BBR algorithm with a scheduling policy other than tc-fq, because TCP internal pacing consumes additional CPU resources.
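
After you apply either option, you can verify the change with the following commands (eth0 is an example device name; note that a qdisc set with tc does not persist across reboots and must be reapplied after a restart):

    sysctl net.ipv4.tcp_congestion_control    # prints the active congestion control algorithm
    tc qdisc show dev eth0                    # lists fq if the tc-fq scheduling policy is in place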

Fixed solution

Upgrade the kernel of the ECS instance to kernel-4.19.57-15.al7 or a later version.
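
A minimal sketch of the upgrade, assuming the instance can reach the default Aliyun Linux 2 YUM repositories (verify the kernel version after the reboot):

    yum update kernel -y    # install the latest available kernel package
    reboot                  # boot into the new kernel
    uname -r                # expected to print 4.19.57-15.al7 or later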

More information

Aliyun Linux 2 allows different connections to use different congestion control algorithms, controlled per network namespace (net namespace). If an ECS instance runs multiple containers that belong to different network namespaces, and some containers provide only external services while others provide only internal services, you can set a different congestion control algorithm for each of these containers.
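
For example, the following sketch sets BBR inside one network namespace while the default namespace keeps its own algorithm (ns1 is an example namespace name created only for illustration):

    ip netns add ns1                                                  # example namespace
    ip netns exec ns1 sysctl -w net.ipv4.tcp_congestion_control=bbr  # applies only inside ns1
    sysctl net.ipv4.tcp_congestion_control                           # the default namespace is unaffected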


Applicable to

  • Elastic Compute Service