Aliyun Linux 2 provides blk-iocost, a weight-based throttling feature built on an I/O cost model, in kernel version 4.19.81-17.al7.x86_64 and later. blk-iocost improves on the weight-based disk throttling of the kernel I/O subsystem (blkcg). In Aliyun Linux 2, blk-iocost is available through both the cgroup v1 and cgroup v2 interfaces. This topic describes the interfaces that implement throttling.

Interface description

Interface Description Configuration item
cost.qos

A read/write interface whose file exists only in the root group of blkcg. The full name of the interface file is blkio.cost.qos in cgroup v1 and io.cost.qos in cgroup v2. This interface enables the blk-iocost feature and sets the latency-based quality of service (QoS) parameters that are used to limit the I/O rate.

After blk-iocost is enabled, the kernel calculates the proportion of requests whose latency exceeds the read or write latency threshold (rlat|wlat). When the latency at the rpct|wpct percentile exceeds rlat|wlat, that is, when more than (100 - rpct|wpct)% of requests miss the threshold, the kernel considers the device saturated and reduces the number of requests sent to the disk. By default, rlat|wlat is set to 0, which disables this latency check.

Each line of configuration in the interface file starts with the major and minor numbers of the device in the MAJ:MIN format, followed by other configuration items. The configuration items are described as follows:
  • enable: specifies whether to enable the blk-iocost controller. The default value 0 indicates that blk-iocost is disabled. A value of 1 indicates that blk-iocost is enabled.
  • ctrl: the control mode. Valid values: auto and user. When the control mode is set to auto, the kernel automatically detects the device type and uses built-in parameters. When the control mode is set to user, you must specify the following QoS control parameters.
  • rpct: read latency percentile. Valid values: 0 to 100.
  • rlat: read latency threshold. Unit: microseconds.
  • wpct: write latency percentile. Valid values: 0 to 100.
  • wlat: write latency threshold. Unit: microseconds.
  • min: minimum scaling percentage. Valid values: 1 to 10000.
  • max: maximum scaling percentage. Valid values: 1 to 10000.

cost.model

A read/write interface whose file exists only in the root group of blkcg. The full name of the interface file is blkio.cost.model in cgroup v1 and io.cost.model in cgroup v2. This interface is used to set the cost model.

Each line of configuration in the interface file starts with the major and minor numbers of the device in the MAJ:MIN format, followed by other configuration items. The configuration items are described as follows:
  • ctrl: the control mode, which specifies whether the model parameters are provided by the user. Valid values: auto and user.
  • model: the model parameter. Valid value: linear. You must define the following modeling parameters when the value of the model parameter is linear:
    • [r|w]bps: the maximum sequential I/O throughput.
    • [r|w]seqiops: the sequential input/output operations per second (IOPS).
    • [r|w]randiops: the random IOPS.
    Note: You can use the tools/cgroup/iocost_coef_gen.py script in the kernel source code to generate the preceding parameters, and then write them to the cost.model interface file to configure the cost model.

cost.weight

A read/write interface whose file exists only in sub-groups of blkcg. The full name of the interface file is blkio.cost.weight in cgroup v1 and io.cost.weight in cgroup v2. This interface is used to set the weight of a sub-group. Default value: 100. Valid values: 1 to 10000. You can use the interface to change the default weight of a sub-group or to set a weight for a specific device.
  • If you write <weight> to the interface file, the default weight of the blkcg is changed to <weight>.
  • If you write MAJ:MIN <weight> to the interface file, the weight of the blkcg on the device MAJ:MIN is changed to <weight>.
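
The two accepted forms of a cost.weight line can be checked before they are written. The following is a minimal sketch, not part of the kernel or of Aliyun Linux 2: the function name valid_weight_line is hypothetical, and the check only mirrors the format and the 1 to 10000 range described above.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with the kernel): check a line before it is
# written to a cost.weight file. Accepted forms, as described in this topic:
#   "<weight>"           changes the default weight
#   "MAJ:MIN <weight>"   changes the weight on one device
valid_weight_line() {
    line=$1
    case $line in
        *" "*) dev=${line%% *}; weight=${line#* } ;;
        *)     dev=""; weight=$line ;;
    esac
    # If a device is given, it must look like MAJ:MIN (digits:digits).
    if [ -n "$dev" ]; then
        printf '%s\n' "$dev" | grep -Eq '^[0-9]+:[0-9]+$' || return 1
    fi
    # The weight must be a plain integer of at most five digits ...
    printf '%s\n' "$weight" | grep -Eq '^[0-9]{1,5}$' || return 1
    # ... that falls in the documented range of 1 to 10000.
    [ "$weight" -ge 1 ] && [ "$weight" -le 10000 ]
}
```

For example, valid_weight_line "254:48 50" succeeds, and valid_weight_line "10001" fails because the weight is out of range.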

Precautions

The blk-iocost feature can configure the QoS parameters automatically (ctrl=auto). To use this mode on Alibaba Cloud ultra disks, standard SSDs, enhanced SSDs, or local NVMe SSDs, you must first set the rotational attribute of the disk to 0:
echo 0 > /sys/block/[$DISK_NAME]/queue/rotational    # Replace [$DISK_NAME] with the actual disk name.
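
Before you change the attribute, it can help to read its current value together with the MAJ:MIN number that the blk-iocost interface files expect. The following is a minimal sketch under stated assumptions: the function name disk_info and its optional second argument (an alternative sysfs root, used only for testing) are hypothetical. The sketch relies on the standard sysfs files /sys/block/<disk>/dev and /sys/block/<disk>/queue/rotational.

```shell
#!/bin/sh
# Hypothetical helper: print "MAJ:MIN rotational=N" for a block device.
# Usage: disk_info NAME [SYSFS_ROOT]
#   NAME        disk name such as vdd
#   SYSFS_ROOT  assumed to be /sys unless overridden (for example, in tests)
disk_info() {
    disk=$1
    sysfs=${2:-/sys}
    dev_file=$sysfs/block/$disk/dev
    rot_file=$sysfs/block/$disk/queue/rotational
    # Fail if the disk does not exist or the files cannot be read.
    [ -r "$dev_file" ] && [ -r "$rot_file" ] || return 1
    printf '%s rotational=%s\n' "$(cat "$dev_file")" "$(cat "$rot_file")"
}
```

If disk_info vdd reports rotational=1 for one of the disk types listed above, run the echo command shown earlier before you use ctrl=auto.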

Example 1

Use the cost.qos interface to enable the blk-iocost feature for the 254:48 device. If the 95th percentile of read or write latency exceeds 5 milliseconds (rlat|wlat=5000), that is, if more than 5% of requests take longer than 5 milliseconds, the disk is considered saturated. The kernel then adjusts the rate at which requests are sent to the disk to between 50% and 150% of the original rate. The commands for the cgroup v1 and cgroup v2 interfaces are as follows:

  • The command for cgroup v1:
    echo "254:48 enable=1 ctrl=user rpct=95.00 rlat=5000 wpct=95.00 wlat=5000 min=50.00 max=150.00" > /sys/fs/cgroup/blkio/blkio.cost.qos
  • The command for cgroup v2:
    echo "254:48 enable=1 ctrl=user rpct=95.00 rlat=5000 wpct=95.00 wlat=5000 min=50.00 max=150.00" > /sys/fs/cgroup/io.cost.qos
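
The configuration line in the preceding commands can also be assembled from individual values, which makes range mistakes easier to catch. The following is a minimal sketch: the function name build_qos_line is hypothetical, the range checks mirror the valid values listed in the interface description, and the min <= max check is an assumed sanity check rather than documented behavior.

```shell
#!/bin/sh
# Hypothetical helper: build a cost.qos line with ctrl=user.
# Usage: build_qos_line DEV RPCT RLAT WPCT WLAT MIN MAX
#   DEV is MAJ:MIN, RLAT and WLAT are in microseconds.
build_qos_line() {
    [ "$#" -eq 7 ] || return 1
    # awk performs the floating-point range checks and the formatting.
    LC_ALL=C awk -v dev="$1" -v rpct="$2" -v rlat="$3" -v wpct="$4" \
        -v wlat="$5" -v min="$6" -v max="$7" 'BEGIN {
        # Percentiles: 0 to 100.
        if (rpct < 0 || rpct > 100 || wpct < 0 || wpct > 100) exit 1
        # Scaling percentages: 1 to 10000; min <= max is an assumed sanity check.
        if (min < 1 || max > 10000 || min > max) exit 1
        # Latency thresholds cannot be negative.
        if (rlat < 0 || wlat < 0) exit 1
        printf "%s enable=1 ctrl=user rpct=%.2f rlat=%d wpct=%.2f wlat=%d min=%.2f max=%.2f\n",
               dev, rpct, rlat, wpct, wlat, min, max
    }'
}
```

Running build_qos_line 254:48 95 5000 95 5000 50 150 prints the same line that is written in Example 1. You can redirect the output to the cost.qos interface file as the root user.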

Example 2

Use the cost.model interface to configure a model on the device 254:48 based on the linear modeling parameters. The commands for the cgroup v1 and cgroup v2 interfaces are as follows:

  • The command for cgroup v1:
    echo "254:48 ctrl=user model=linear rbps=2706339840 rseqiops=89698 rrandiops=110036 wbps=1063126016 wseqiops=135560 wrandiops=130734" > /sys/fs/cgroup/blkio/blkio.cost.model
  • The command for cgroup v2:
    echo "254:48 ctrl=user model=linear rbps=2706339840 rseqiops=89698 rrandiops=110036 wbps=1063126016 wseqiops=135560 wrandiops=130734" > /sys/fs/cgroup/io.cost.model
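
The examples in this topic differ between cgroup v1 and cgroup v2 only in the path of the interface file. The following is a minimal sketch of a helper that picks the path: the function name iocost_path is hypothetical. It assumes that cgroup v1 mounts the blkio hierarchy at /sys/fs/cgroup/blkio, as in the examples, and it detects cgroup v2 by the cgroup.controllers file in the root of the unified hierarchy. Hybrid setups are not handled.

```shell
#!/bin/sh
# Hypothetical helper: print the path of a blk-iocost interface file.
# Usage: iocost_path NAME [GROUP] [CGROUP_ROOT]
#   NAME         qos, model, weight, or stat
#   GROUP        optional sub-group such as blkcg1 (empty for the root group)
#   CGROUP_ROOT  assumed to be /sys/fs/cgroup unless overridden
iocost_path() {
    name=$1
    group=${2:-}
    root=${3:-/sys/fs/cgroup}
    if [ -f "$root/cgroup.controllers" ]; then
        base=$root          # cgroup v2: files are named io.cost.*
        prefix=io
    else
        base=$root/blkio    # cgroup v1: files are named blkio.cost.*
        prefix=blkio
    fi
    if [ -n "$group" ]; then
        base=$base/$group
    fi
    printf '%s/%s.cost.%s\n' "$base" "$prefix" "$name"
}
```

On a cgroup v1 system, iocost_path model prints /sys/fs/cgroup/blkio/blkio.cost.model. On a cgroup v2 system, it prints /sys/fs/cgroup/io.cost.model.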

Example 3

Use the cost.weight interface to change the default weight of blkcg1 to 50 and then set the weight of blkcg1 on the device 254:48 to 50. The commands for the cgroup v1 and cgroup v2 interfaces are as follows:

  • The command for cgroup v1:
    echo "50" > /sys/fs/cgroup/blkio/blkcg1/blkio.cost.weight    # Change the default weight to 50.
    echo "254:48 50" > /sys/fs/cgroup/blkio/blkcg1/blkio.cost.weight    # Set the weight of blkcg1 on the device to 50.
  • The command for cgroup v2:
    echo "50" > /sys/fs/cgroup/cg1/io.cost.weight    # Change the default weight to 50.
    echo "254:48 50" > /sys/fs/cgroup/cg1/io.cost.weight    # Set the weight of cg1 on the device to 50.
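
To estimate what a weight means in practice, you can compute the share that each group would receive. The following is a minimal sketch, assuming that all listed groups are active siblings on the same device and that weights act as proportional shares. The function name weight_shares is hypothetical. The result for the weights 50 and 500 matches the hweight% column in the sample iocost_monitor.py output later in this topic.

```shell
#!/bin/sh
# Hypothetical helper: print each weight and its percentage of the total.
# Usage: weight_shares W1 W2 ...
# Assumption: weights are proportional shares among active sibling groups.
weight_shares() {
    for w in "$@"; do
        printf '%s\n' "$w"
    done | LC_ALL=C awk '
        { weights[NR] = $1; total += $1 }
        END {
            # Fail when no weights are given or the total is not positive.
            if (total <= 0) exit 1
            for (i = 1; i <= NR; i++)
                printf "%d %.2f%%\n", weights[i], 100 * weights[i] / total
        }'
}

# weight_shares 50 500 prints:
#   50 9.09%
#   500 90.91%
```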

Common monitoring tools

  • iocost monitor script

    The tools/cgroup/iocost_monitor.py script in the kernel source code uses the drgn debugger to read kernel data directly and then reports I/O performance monitoring data. For more information, see the drgn documentation. The script is used as follows:

    Run the following command to monitor the I/O performance data of the vdd disk.
    ./iocost_monitor.py vdd
    A sample response is as follows:
    vdd RUN  per=500.0ms cur_per=3930.839:v14620.321 busy= +1 vrate=6136.22% params=hdd
                              active    weight      hweight% inflt% dbt  delay usages%
    blkcg1                       *    50/   50   9.09/  9.09   0.00   0  0*000 009:009:009
    blkcg2                       *   500/  500  90.91/ 90.91   0.00   0  0*000 089:091:092
  • The blkio.cost.stat interface under cgroup v1
    The Aliyun Linux 2 kernel provides a blk-iocost statistics interface under cgroup v1. The interface file records the QoS data of each controlled device. Run the following command to view the interface file:
    cat /sys/fs/cgroup/blkio/blkcg1/blkio.cost.stat
    A sample response is as follows:
    254:48 is_active=1 active=50 inuse=50 hweight_active=5957 hweight_inuse=5957 vrate=159571
  • ftrace monitoring tool

    The Aliyun Linux 2 kernel provides blk-iocost trace events that you can use with ftrace for kernel-side analysis. The ftrace monitoring tool is used as follows:

    1. Set the enable property to 1 to enable the ftrace tool.
      echo 1 > /sys/kernel/debug/tracing/events/iocost/enable
    2. View the output information.
      cat /sys/kernel/debug/tracing/trace_pipe
      A sample response is as follows:
          dd-1593  [008] d...   688.565349: iocost_iocg_activate: [vdd:/blkcg1] now=689065289:57986587662878 vrate=137438 period=22->22 vtime=0->57986365150756 weight=50/50 hweight=65536/65536
          dd-1593  [008] d.s.   688.575374: iocost_ioc_vrate_adj: [vdd] vrate=137438->137438 busy=0 missed_ppm=0:0 rq_wait_pct=0 lagging=1 shortages=0 surpluses=1
      <idle>-0     [008] d.s.   688.608369: iocost_ioc_vrate_adj: [vdd] vrate=137438->137438 busy=0 missed_ppm=0:0 rq_wait_pct=0 lagging=1 shortages=0 surpluses=1
          dd-1594  [006] d...   688.620002: iocost_iocg_activate: [vdd:/blkcg2] now=689119946:57994099611644 vrate=137438 period=22->26 vtime=0->57993412421644 weight=250/250 hweight=65536/65536
      <idle>-0     [008] d.s.   688.631367: iocost_ioc_vrate_adj: [vdd] vrate=137438->137438 busy=0 missed_ppm=0:0 rq_wait_pct=0 lagging=1 shortages=0 surpluses=1
      <idle>-0     [008] d.s.   688.642368: iocost_ioc_vrate_adj: [vdd] vrate=137438->137438 busy=0 missed_ppm=0:0 rq_wait_pct=0 lagging=1 shortages=0 surpluses=1
      <idle>-0     [008] d.s.   688.653366: iocost_ioc_vrate_adj: [vdd] vrate=137438->137438 busy=0 missed_ppm=0:0 rq_wait_pct=0 lagging=1 shortages=0 surpluses=1
      <idle>-0     [008] d.s.   688.664366: iocost_ioc_vrate_adj: [vdd] vrate=137438->137438 busy=0 missed_ppm=0:0 rq_wait_pct=0 lagging=1 shortages=0 surpluses=1
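
The trace output above mixes several event types. To follow only the vrate adjustments, you can filter the stream. The following is a minimal sketch: the function name vrate_events is hypothetical, and the field layout is taken from the sample response shown above, not from a documented format.

```shell
#!/bin/sh
# Hypothetical helper: read ftrace lines on standard input and print one line
# for each iocost_ioc_vrate_adj event: TIMESTAMP DEVICE OLD_VRATE NEW_VRATE BUSY
# The parsing assumes the layout of the sample output in this topic.
vrate_events() {
    LC_ALL=C awk '
    /iocost_ioc_vrate_adj:/ {
        ts = ""; dev = ""; old = ""; cur = ""; busy = ""
        for (i = 1; i <= NF; i++) {
            if ($i == "iocost_ioc_vrate_adj:") {
                ts = $(i - 1); sub(/:$/, "", ts)      # timestamp precedes the event name
                dev = $(i + 1)                        # device follows it, as [vdd]
                sub(/^\[/, "", dev); sub(/\]$/, "", dev)
            } else if ($i ~ /^vrate=/) {
                split(substr($i, 7), v, "->")         # vrate=OLD->NEW
                old = v[1]; cur = v[2]
            } else if ($i ~ /^busy=/) {
                busy = substr($i, 6)
            }
        }
        print ts, dev, old, cur, busy
    }'
}

# Usage (requires root): cat /sys/kernel/debug/tracing/trace_pipe | vrate_events
```

For the second line of the sample response, the helper prints 688.575374 vdd 137438 137438 0.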