Alibaba Cloud Linux: Configure the blk-iocost weight-based throttling feature

Last Updated: Mar 14, 2024

The blk-iocost weight-based throttling feature is an Alibaba Cloud Linux improvement of the weight-based disk throttling feature of the cgroup I/O subsystem (blkcg). blk-iocost is an I/O controller that is used to allocate bandwidth to I/O operations on block devices based on the priorities of applications or processes. blk-iocost can also control the usage of the block device I/O bandwidth by specific applications or processes based on specified weight values. blk-iocost helps you better control and manage disk I/O resources.

Note

cgroup v1 and cgroup v2 are two versions of the resource management feature in the Linux kernel. In the Alibaba Cloud Linux kernel, the blk-iocost feature supports cgroup v1 and v2 interfaces. In most cases, only one version is activated and used in a system. To check whether the system uses the cgroup v1 interface or the cgroup v2 interface, run the stat -fc %T /sys/fs/cgroup command.

  • If tmpfs is returned, the cgroup v1 interface is used.

  • If cgroup2fs is returned, the cgroup v2 interface is used.
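
The check above can be scripted. The following is a minimal sketch; the function name cgroup_version_from_fstype is hypothetical and is not part of any Alibaba Cloud Linux tooling.

```shell
# Map the filesystem type reported for /sys/fs/cgroup to a cgroup version.
# The function name is hypothetical and used only for this illustration.
cgroup_version_from_fstype() {
  case "$1" in
    tmpfs)     echo "v1" ;;
    cgroup2fs) echo "v2" ;;
    *)         echo "unknown" ;;
  esac
}

# Query the mounted filesystem type; fall back to "unknown" if stat fails.
fstype=$(stat -fc %T /sys/fs/cgroup 2>/dev/null || echo "unknown")
echo "cgroup interface: $(cgroup_version_from_fstype "$fstype")"
```

The result tells you which set of interface file names (blkio.* or io.*) applies in the rest of this topic.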

Usage notes

  • cost.qos

    This interface is used to enable or disable the blk-iocost feature and to configure the I/O quality of service (QoS) parameters, which throttle I/O based on latency thresholds. The interface file is readable and writable and exists only in the root group of the blkcg. The full name of the interface file varies based on the cgroup version:

    • cgroup v1: blkio.cost.qos

    • cgroup v2: io.cost.qos

    Interface configuration:

    Each line in the configuration file starts with the major (MAJ) and minor (MIN) numbers of a disk in the MAJ:MIN format, followed by the following configurations. To query the MAJ and MIN numbers of a disk, run the lsblk | grep <disk name> command.

    • enable: specifies whether to enable the blk-iocost feature. Default value: 0.

      • 0: disables the blk-iocost feature.

      • 1: enables the blk-iocost feature.

    • ctrl: the control mode. Valid values: auto and user.

      • auto: The system automatically detects the disk category and uses built-in parameters.

        Important

        If you set ctrl to auto and the category of the disk attached to an Elastic Compute Service (ECS) instance is SSD, such as standard SSD, enhanced SSD (ESSD), or Non-Volatile Memory Express (NVMe) SSD, you must set the rotational attribute of the SSD to 0. This allows blk-iocost to evaluate I/O costs more accurately and tune scheduling policies to improve the I/O performance of SSDs. Sample command:

        sudo sh -c 'echo 0 > /sys/block/<DISK_NAME>/queue/rotational' # Replace <DISK_NAME> with the actual disk name.
      • user: Configure the following control parameters.

        • rpct: the read latency percentile. Valid values: 0 to 100.

        • rlat: the read latency threshold. Unit: microseconds.

        • wpct: the write latency percentile. Valid values: 0 to 100.

        • wlat: the write latency threshold. Unit: microseconds.

        • min: the minimum scaling percentage. Valid values: 1 to 10000.

        • max: the maximum scaling percentage. Valid values: 1 to 10000.

  • cost.model

    This interface is used to configure the cost model. The interface is a read/write interface whose file exists only in the root group of the blkcg. The full name of the interface file varies based on the cgroup version:

    • cgroup v1: blkio.cost.model

    • cgroup v2: io.cost.model

    Interface configuration:

    Each line in the configuration file starts with the major (MAJ) and minor (MIN) numbers of a disk in the MAJ:MIN format, followed by the following configurations. To query the MAJ and MIN numbers of the disk, run the lsblk | grep <disk name> command.

    • ctrl: the control mode. Valid values: auto and user.

      • auto: The system automatically optimizes the I/O scheduling policy based on the current workload.

        Important

        If you set ctrl to auto and the category of the disk attached to an Elastic Compute Service (ECS) instance is SSD, such as standard SSD, enhanced SSD (ESSD), or Non-Volatile Memory Express (NVMe) SSD, you must set the rotational attribute of the SSD to 0. This allows blk-iocost to evaluate I/O costs more accurately and tune scheduling policies to improve the I/O performance of SSDs. Sample command:

        sudo sh -c 'echo 0 > /sys/block/<DISK_NAME>/queue/rotational' # Replace <DISK_NAME> with the actual disk name.
      • user: Configure model parameters.

    • model: the model parameter. Valid value: linear. If you set the model parameter to linear, you must specify the following modeling parameters:

      • [r|w]bps: the maximum sequential I/O throughput.

      • [r|w]seqiops: the sequential input/output operations per second (IOPS).

      • [r|w]randiops: the random IOPS.

        Note

        You can use the tools/cgroup/iocost_coef_gen.py script in the kernel source code to generate the preceding parameters and then write the parameters to the cost.model interface file to configure the cost model.

  • weight (Alibaba Cloud Linux 3) or cost.weight (Alibaba Cloud Linux 2)

    This interface is used to change the default weight of a subgroup (default value: 100) or to set the weight of the subgroup on a specific disk. Valid values: 1 to 10000. The interface file is readable and writable and exists only in subgroups of the blkcg.

    Alibaba Cloud Linux 3

    The full name of the interface file varies based on the cgroup version:

    • cgroup v1: blkio.cost.weight

    • cgroup v2: io.weight

    Alibaba Cloud Linux 2

    • cgroup v1: blkio.cost.weight

    • cgroup v2: io.cost.weight

    Interface configuration:

    • <weight>: the default weight of blkcg.

    • MAJ:MIN <weight>: the weight of the blkcg on the disk specified by MAJ:MIN.
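
The interfaces above all identify a disk by its MAJ:MIN number, and ctrl=auto depends on the rotational attribute. As an alternative to lsblk, both values can be read from sysfs. The following is a minimal sketch; the helper names and the SYSFS_ROOT variable are hypothetical and exist only so that the helpers can be tried against a fake directory tree.

```shell
# Hypothetical helpers for illustration. SYSFS_ROOT defaults to /sys and can be
# overridden to point at a test directory.
disk_majmin() {
  # /sys/block/<disk>/dev holds the device number in MAJ:MIN form.
  cat "${SYSFS_ROOT:-/sys}/block/$1/dev"
}

disk_rotational() {
  # queue/rotational is 1 for rotational media and 0 for non-rotational media.
  cat "${SYSFS_ROOT:-/sys}/block/$1/queue/rotational"
}

# Example usage (vdd is a placeholder disk name):
#   disk_majmin vdd
#   disk_rotational vdd
```

If disk_rotational prints 1 for an SSD, set the attribute to 0 before you use ctrl=auto, as described in the Important notes above.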

Limits

Only Alibaba Cloud Linux images that contain the following kernel versions support the blk-iocost feature:

  • Alibaba Cloud Linux 2: 4.19.81-17 or later

  • Alibaba Cloud Linux 3: All versions
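
On Alibaba Cloud Linux 2, the kernel requirement can be checked before you continue. The following is a minimal sketch that assumes sort -V (version sort, provided by GNU coreutils) is available; the function name version_at_least is hypothetical.

```shell
# Return success if version $1 is at least version $2.
# Relies on "sort -V" (version sort), which GNU coreutils provides.
version_at_least() {
  lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
  [ "$lowest" = "$2" ]
}

# Compare the running kernel with the minimum Alibaba Cloud Linux 2 version.
if version_at_least "$(uname -r)" "4.19.81-17"; then
  echo "kernel $(uname -r) meets the 4.19.81-17 minimum"
else
  echo "kernel $(uname -r) is older than 4.19.81-17"
fi
```

The comparison is only meaningful on Alibaba Cloud Linux 2. All Alibaba Cloud Linux 3 versions support the feature.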

Procedure

Step 1: Use cost.qos to enable the blk-iocost feature

Example scenario: Use the cost.qos interface to enable the blk-iocost feature for the 254:48 disk. If the 95th-percentile latency of read or write requests exceeds 5 milliseconds (rpct and rlat for reads, wpct and wlat for writes), which means that more than 5% of the requests take longer than 5 milliseconds, the disk is considered saturated. The kernel then adjusts the rate at which requests are sent to the disk within the range of 50% to 150% of the original rate (min and max). Run the following commands for the cgroup v1 and cgroup v2 interfaces:

Command for cgroup v1

sudo sh -c 'echo "254:48 enable=1 ctrl=user rpct=95.00 rlat=5000 wpct=95.00 wlat=5000 min=50.00 max=150.00" > /sys/fs/cgroup/blkio/blkio.cost.qos'

Command for cgroup v2

sudo sh -c 'echo "254:48 enable=1 ctrl=user rpct=95.00 rlat=5000 wpct=95.00 wlat=5000 min=50.00 max=150.00" > /sys/fs/cgroup/io.cost.qos'
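
A typo in the cost.qos line is easy to miss, so it can help to assemble the line with a small helper that checks the value ranges listed in the Usage notes section. The following is a minimal sketch; in_range and build_qos_line are hypothetical helper names, and the script only prints the line (dry run) instead of writing to the interface file.

```shell
# Succeed if $1 is a non-negative number within [$2, $3].
# awk is used so that decimal values such as 95.00 are handled.
in_range() {
  awk -v v="$1" -v lo="$2" -v hi="$3" \
    'BEGIN { exit !(v ~ /^[0-9]+(\.[0-9]+)?$/ && v + 0 >= lo && v + 0 <= hi) }'
}

# Print a cost.qos line for ctrl=user after checking the documented ranges.
# Arguments: MAJ:MIN rpct rlat wpct wlat min max
build_qos_line() {
  in_range "$2" 0 100   || { echo "rpct out of range: $2" >&2; return 1; }
  in_range "$4" 0 100   || { echo "wpct out of range: $4" >&2; return 1; }
  in_range "$6" 1 10000 || { echo "min out of range: $6" >&2; return 1; }
  in_range "$7" 1 10000 || { echo "max out of range: $7" >&2; return 1; }
  echo "$1 enable=1 ctrl=user rpct=$2 rlat=$3 wpct=$4 wlat=$5 min=$6 max=$7"
}

# Dry run: print the line instead of writing it to the interface file.
build_qos_line 254:48 95.00 5000 95.00 5000 50.00 150.00
```

To apply the result, pipe the output to sudo tee with the interface file that matches your cgroup version, for example /sys/fs/cgroup/io.cost.qos for cgroup v2.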

Step 2: Use cost.model to configure a cost model

Example scenario: Use the cost.model interface to set model to linear and specify modeling parameters to configure a model on the 254:48 disk. Run the following commands for the cgroup v1 and cgroup v2 interfaces:

Command for cgroup v1

sudo sh -c 'echo "254:48 ctrl=user model=linear rbps=2706339840 rseqiops=89698 rrandiops=110036 wbps=1063126016 wseqiops=135560 wrandiops=130734" > /sys/fs/cgroup/blkio/blkio.cost.model'

Command for cgroup v2

sudo sh -c 'echo "254:48 ctrl=user model=linear rbps=2706339840 rseqiops=89698 rrandiops=110036 wbps=1063126016 wseqiops=135560 wrandiops=130734" > /sys/fs/cgroup/io.cost.model'
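
After you write the model, you may want to read the interface file back and confirm individual parameters. The following is a minimal sketch that extracts one key from a line in the MAJ:MIN key=value format. The helper name get_param is hypothetical, and the sample line is copied from the command above rather than read from a live system.

```shell
# Print the value of key $2 from a "MAJ:MIN key=value ..." line given as $1.
# Hypothetical helper for illustration; returns 1 if the key is not present.
get_param() {
  for field in $1; do
    case "$field" in
      "$2"=*) echo "${field#*=}"; return 0 ;;
    esac
  done
  return 1
}

# Sample line taken from the Step 2 command. On a live system, you could read it
# with: grep '^254:48 ' /sys/fs/cgroup/io.cost.model
line="254:48 ctrl=user model=linear rbps=2706339840 rseqiops=89698 rrandiops=110036 wbps=1063126016 wseqiops=135560 wrandiops=130734"
echo "model=$(get_param "$line" model) rbps=$(get_param "$line" rbps)"
```

The same helper works for lines read from the cost.qos interface file because that file uses the same key=value layout.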

Step 3: Modify the weight

Example scenario: After you configure cost.qos in Step 1: Use cost.qos to enable the blk-iocost feature and configure cost.model in Step 2: Use cost.model to configure a cost model, the blk-iocost feature is enabled. You can then create a control group named blkcg1 (cgroup v1) or cg1 (cgroup v2), change the default weight of the control group to 50, and set the weight of the control group on the 254:48 disk to 50. For cgroup v1, use the blkio.cost.weight interface. For cgroup v2, use the io.cost.weight interface on Alibaba Cloud Linux 2 or the io.weight interface on Alibaba Cloud Linux 3. Run the following commands for the cgroup v1 and cgroup v2 interfaces:

Commands for cgroup v1

sudo mkdir /sys/fs/cgroup/blkio/blkcg1 # Create the control group named blkcg1.
sudo sh -c 'echo "50" > /sys/fs/cgroup/blkio/blkcg1/blkio.cost.weight' # Change the default weight to 50.
sudo sh -c 'echo "254:48 50" > /sys/fs/cgroup/blkio/blkcg1/blkio.cost.weight'    # Set the weight for the disk to 50.

Commands for cgroup v2

  • Alibaba Cloud Linux 2

    sudo mkdir /sys/fs/cgroup/cg1    # Create the control group named cg1.
    sudo sh -c 'echo "50" > /sys/fs/cgroup/cg1/io.cost.weight'    # Change the default weight to 50.
    sudo sh -c 'echo "254:48 50" > /sys/fs/cgroup/cg1/io.cost.weight'    # Set the weight for the disk to 50.
  • Alibaba Cloud Linux 3

    sudo mkdir /sys/fs/cgroup/cg1    # Create the control group named cg1.
    sudo sh -c 'echo "50" > /sys/fs/cgroup/cg1/io.weight'    # Change the default weight to 50.
    sudo sh -c 'echo "254:48 50" > /sys/fs/cgroup/cg1/io.weight'    # Set the weight for the disk to 50.
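
A weight is relative: it matters only in proportion to the weights of the other groups that compete for the same disk. As a rough illustration, assuming that all listed groups are siblings under the same parent and are all actively issuing I/O, the share of a group is its weight divided by the sum of all weights. The helper name weight_share is hypothetical.

```shell
# Print the percentage share of the first weight among all listed weights.
# Usage: weight_share <own weight> [<sibling weight> ...]
weight_share() {
  own=$1
  total=0
  for w in "$@"; do
    total=$((total + w))
  done
  awk -v own="$own" -v total="$total" 'BEGIN { printf "%.2f\n", own * 100 / total }'
}

weight_share 50 500    # prints 9.09: a group with weight 50 next to one with weight 500
weight_share 500 50    # prints 90.91: the group with weight 500
```

These values match the hweight% column of the iocost_monitor.py sample output in the Common monitoring tools section, where blkcg1 has a weight of 50 and blkcg2 has a weight of 500.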

Common monitoring tools

To tune blk-iocost, you need to monitor and evaluate the I/O performance of your system. You can use the following tools or interfaces to monitor the I/O resource usage and then optimize the resource usage.

  • iocost monitor script

    The tools/cgroup/iocost_monitor.py script in the kernel source code uses the drgn debugger to obtain kernel parameters and provide I/O performance monitoring data. Perform the following steps to use the script:

    1. Install the drgn debugger. Sample command:

      sudo pip3 install drgn

      For information about the drgn debugger, see drgn.

    2. (Optional) Download iocost_monitor.py.

      If you did not download the complete Linux kernel source code, clone or download the iocost_monitor.py script from the public repository of the Linux kernel. Sample command:

      wget https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/tools/cgroup/iocost_monitor.py
    3. Run the iocost_monitor.py script. In the following example, the vdd disk is monitored. Sample command:

      sudo python3 ./iocost_monitor.py vdd

      The following command output is returned:

      vdd RUN  per=500.0ms cur_per=3930.839:v14620.321 busy= +1 vrate=6136.22% params=hdd
                                active    weight      hweight% inflt% dbt  delay usages%
      blkcg1                       *    50/   50   9.09/  9.09   0.00   0  0*000 009:009:009
      blkcg2                       *   500/  500  90.91/ 90.91   0.00   0  0*000 089:091:092
  • blkio.cost.stat interface file of cgroup v1

    The Alibaba Cloud Linux kernel provides the blk-iocost interface file (blkio.cost.stat) of the cgroup v1 interface. This interface file records the QoS data of each controlled device. Run the following command to view the interface file:

    cat /sys/fs/cgroup/blkio/blkcg1/blkio.cost.stat

    The following command output is returned:

    254:48 is_active=1 active=50 inuse=50 hweight_active=5957 hweight_inuse=5957 vrate=159571
  • ftrace tool

    The Alibaba Cloud Linux kernel provides ftrace trace events for the blk-iocost feature. These events trace the decisions of the scheduler and the processing of I/O requests in detail, which supports in-depth performance analysis. Perform the following steps to use the ftrace tool:

    1. Run the following command to set the enable attribute to 1 to enable the ftrace tool:

      sudo sh -c 'echo 1 > /sys/kernel/debug/tracing/events/iocost/enable'
    2. Run the following command to view the output information:

      sudo cat /sys/kernel/debug/tracing/trace_pipe

      The following command output is returned:

          dd-1593  [008] d...   688.565349: iocost_iocg_activate: [vdd:/blkcg1] now=689065289:57986587662878 vrate=137438 period=22->22 vtime=0->57986365150756 weight=50/50 hweight=65536/65536
          dd-1593  [008] d.s.   688.575374: iocost_ioc_vrate_adj: [vdd] vrate=137438->137438 busy=0 missed_ppm=0:0 rq_wait_pct=0 lagging=1 shortages=0 surpluses=1
      <idle>-0     [008] d.s.   688.608369: iocost_ioc_vrate_adj: [vdd] vrate=137438->137438 busy=0 missed_ppm=0:0 rq_wait_pct=0 lagging=1 shortages=0 surpluses=1
          dd-1594  [006] d...   688.620002: iocost_iocg_activate: [vdd:/blkcg2] now=689119946:57994099611644 vrate=137438 period=22->26 vtime=0->57993412421644 weight=250/250 hweight=65536/65536
      <idle>-0     [008] d.s.   688.631367: iocost_ioc_vrate_adj: [vdd] vrate=137438->137438 busy=0 missed_ppm=0:0 rq_wait_pct=0 lagging=1 shortages=0 surpluses=1
      <idle>-0     [008] d.s.   688.642368: iocost_ioc_vrate_adj: [vdd] vrate=137438->137438 busy=0 missed_ppm=0:0 rq_wait_pct=0 lagging=1 shortages=0 surpluses=1
      <idle>-0     [008] d.s.   688.653366: iocost_ioc_vrate_adj: [vdd] vrate=137438->137438 busy=0 missed_ppm=0:0 rq_wait_pct=0 lagging=1 shortages=0 surpluses=1
      <idle>-0     [008] d.s.   688.664366: iocost_ioc_vrate_adj: [vdd] vrate=137438->137438 busy=0 missed_ppm=0:0 rq_wait_pct=0 lagging=1 shortages=0 surpluses=1
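
The raw hweight values in the blkio.cost.stat and ftrace outputs above can be converted to percentages. The following is a minimal sketch that assumes 65536 represents 100%. This scale is inferred from the trace output, where a group that is the only active group reports hweight=65536/65536, and is not stated explicitly in this topic. The helper name hweight_pct is hypothetical, and the sample line is copied from the blkio.cost.stat output above.

```shell
# Convert a raw hweight value to a percentage, assuming 65536 represents 100%.
hweight_pct() {
  awk -v h="$1" 'BEGIN { printf "%.2f\n", h * 100 / 65536 }'
}

# Sample line copied from the blkio.cost.stat output in this topic.
stat_line="254:48 is_active=1 active=50 inuse=50 hweight_active=5957 hweight_inuse=5957 vrate=159571"

# Split the line into one field per line, then keep the hweight_active value.
raw=$(echo "$stat_line" | tr ' ' '\n' | sed -n 's/^hweight_active=//p')
echo "hweight_active: $(hweight_pct "$raw")%"
```

Under this assumption, the sample value 5957 corresponds to about 9.09%, which agrees with the hweight% that iocost_monitor.py reports for blkcg1.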