Alibaba Cloud Linux: What do I do if the TCP congestion control algorithm BBR affects network performance in Alibaba Cloud Linux 2?

Last Updated: Dec 08, 2023

This topic describes why Bottleneck Bandwidth and Round-trip propagation time (BBR), a TCP congestion control algorithm, can degrade the network performance of an Elastic Compute Service (ECS) instance that runs Alibaba Cloud Linux 2, and how to resolve the issue.

Problem description

On an instance that meets the following conditions, BBR is the default TCP congestion control algorithm. When the CPU utilization or packet forwarding rate of such an instance is high, BBR degrades the network performance of the instance. For example, the performance of Redis databases that run on the instance decreases.

  • Image version: aliyun_2_1903_64_20G_alibase_20190619.vhd or earlier

  • Kernel version: kernel-4.19.48-14.al7 or earlier
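The kernel condition above can be checked on a running instance with a short script. The following is a minimal sketch, not an official diagnostic tool. It assumes that `uname -r` prints a release string such as `4.19.48-14.al7.x86_64` and that GNU `sort -V` is available. The function name `kernel_is_affected` is hypothetical, and the script does not check the image version.

```shell
#!/bin/sh
# Sketch: report the active congestion control algorithm and whether the
# running kernel falls in the affected range named in this topic.
# Assumption: `uname -r` looks like "4.19.48-14.al7.x86_64".

LAST_AFFECTED="4.19.48-14"   # from "kernel-4.19.48-14.al7 or earlier"

# Succeeds (exit status 0) if kernel release $1 is LAST_AFFECTED or older.
kernel_is_affected() {
    # Strip the ".al7.<arch>" suffix so that only "version-release" remains.
    ver="${1%%.al7*}"
    # sort -V orders version strings; the last line is the newer of the two.
    newest=$(printf '%s\n%s\n' "$ver" "$LAST_AFFECTED" | sort -V | tail -n 1)
    [ "$newest" = "$LAST_AFFECTED" ]
}

# Print the algorithm in use, or "unknown" if sysctl cannot read the key.
algo=$(sysctl -n net.ipv4.tcp_congestion_control 2>/dev/null || echo unknown)
echo "congestion control: $algo"

if kernel_is_affected "$(uname -r)"; then
    echo "kernel $(uname -r): within the affected range"
else
    echo "kernel $(uname -r): newer than the affected range"
fi
```

If the script reports `bbr` and a kernel within the affected range, the instance matches the kernel condition described above.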

Cause

The Alibaba Cloud Linux 2 kernel supports only the following TCP congestion control algorithms: Reno, BBR, and CUBIC. How well each algorithm performs depends on the network scenario. The BBR algorithm estimates the throughput (bandwidth) and latency (round-trip time) of the current connection to adjust the congestion window. BBR relies on the TCP pacing feature, which is implemented in one of the following ways:

  • If the network interfaces of the instance use the Fair Queue queueing discipline (the fq qdisc, referred to as tc-fq in this topic) to schedule network traffic, the flow pacing of tc-fq is used.

  • If the network interfaces of the instance do not use tc-fq to schedule network traffic, TCP internal pacing is used.

TCP internal pacing is implemented by using the Linux high-resolution timer (hrtimer), and each hrtimer that is used consumes CPU resources. As a result, when the CPU utilization or packet forwarding rate of an instance is high, BBR significantly degrades the network performance of the instance. When the CPU utilization or packet forwarding rate is low, the effect of BBR on network performance is negligible.
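To see which of the two pacing cases applies to a network interface, you can inspect the queueing discipline attached to it. The following is a minimal sketch under stated assumptions: the interface name `eth0` is only a placeholder default, the helper name `pacing_source` is hypothetical, and the check relies on `tc qdisc show` printing one line per qdisc in the form `qdisc <kind> <handle>: ...`.

```shell
#!/bin/sh
# Sketch: classify which pacing implementation an interface would use.
# Assumption: DEV defaults to eth0; override it with DEV=<name>.

DEV="${DEV:-eth0}"

# Reads `tc qdisc show` output on stdin and prints "tc-fq" if an fq qdisc
# is attached, otherwise "tcp-internal". The trailing space in the pattern
# keeps other kinds such as fq_codel from matching.
pacing_source() {
    if grep -q '^qdisc fq '; then
        echo "tc-fq"
    else
        echo "tcp-internal"
    fi
}

# Query the interface; report an error if tc is missing or DEV does not exist.
if out=$(tc qdisc show dev "$DEV" 2>/dev/null); then
    printf '%s\n' "$out" | pacing_source
else
    echo "could not query the qdisc of $DEV" >&2
fi
```

A result of `tcp-internal` on an instance that uses BBR corresponds to the hrtimer-based case described above.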

Solutions

To resolve the issue, use one of the following solutions based on your business requirements:

  • Replace BBR with another TCP congestion control algorithm.

    If applications that run on the instance provide services only over the internal network, run the following commands to replace BBR with CUBIC. CUBIC is suitable for internal network environments in which bandwidth is high and latency is low.

    sysctl -w net.ipv4.tcp_congestion_control=cubic
    sh -c "echo 'net.ipv4.tcp_congestion_control=cubic' >> /etc/sysctl.d/50-aliyun.conf"

  • Change the scheduling policy of the network interface.

    If applications that run on the instance provide services over the Internet, we recommend that you continue to use BBR and change the scheduling policy of the network interface that the applications use to tc-fq. Sample command:

    tc qdisc add dev <$Dev> root fq

    Note: <$Dev> specifies the name of the network interface.

  • Whenever you use BBR, set the scheduling policy of the network interface to tc-fq to reduce CPU usage.

  • To prevent the issue, upgrade the kernel version of the instance to kernel-4.19.57-15.al7 or later.
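The command in the first solution that writes to /etc/sysctl.d/50-aliyun.conf appends a new line every time it is run. If you script that step, an idempotent variant may be preferable. The following is a minimal sketch, not part of the official procedure. The helper name `persist_sysctl` is hypothetical, and it only recognizes existing assignments written without spaces around `=`.

```shell
#!/bin/sh
# Sketch: persist a sysctl setting so that the file holds exactly one
# "key=value" line for the key, no matter how often the step is repeated.

# Usage: persist_sysctl <key> <value> <file>
persist_sysctl() {
    key=$1; value=$2; file=$3
    tmp=$(mktemp) || return 1
    # Copy every line except earlier "key=" assignments (-F: literal match).
    if [ -f "$file" ]; then
        grep -v -F "${key}=" "$file" > "$tmp"
    fi
    # Append the new assignment, then write the result back in place.
    printf '%s=%s\n' "$key" "$value" >> "$tmp"
    cat "$tmp" > "$file" && rm -f "$tmp"
}

# Demonstration on a scratch file rather than on /etc/sysctl.d.
demo=$(mktemp)
persist_sysctl net.ipv4.tcp_congestion_control bbr "$demo"
persist_sysctl net.ipv4.tcp_congestion_control cubic "$demo"
cat "$demo"
rm -f "$demo"
```

On an instance, the third argument would be /etc/sysctl.d/50-aliyun.conf and the script would need to run as root. The `sysctl -w` command from the first solution is still required to apply the value immediately.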