Alibaba Cloud Linux: Enhance the monitoring of block I/O throttling

Last Updated: Oct 13, 2023

Alibaba Cloud Linux 2 (kernel version 4.19.81-17.al7 and later) and Alibaba Cloud Linux 3 provide interfaces for better monitoring of block I/O throttling. This topic describes the interfaces and provides examples of how to use them.

Background information

Linux block I/O throttling, which limits bytes per second (BPS) or I/O operations per second (IOPS), is required in many scenarios, especially when cgroup writeback is enabled. Alibaba Cloud Linux provides interfaces that enhance the monitoring of block I/O throttling and make throttling-related operations easier to perform.

Interfaces

| Interface | Description |
| --- | --- |
| blkio.throttle.io_service_time | The total amount of time between request dispatch and request completion for I/O operations. Unit: nanoseconds. |
| blkio.throttle.io_wait_time | The total amount of time that I/O operations wait in scheduler queues. Unit: nanoseconds. |
| blkio.throttle.io_completed | The number of completed I/O operations. It is used to calculate the average latency of the block I/O throttling layer. |
| blkio.throttle.total_io_queued | The number of I/O operations that were throttled. By sampling this counter periodically, you can calculate the number of I/O operations throttled in each interval and analyze whether I/O latency is related to throttling. |
| blkio.throttle.total_bytes_queued | The number of I/O bytes that were throttled. Unit: bytes. |

The preceding interface files are located in /sys/fs/cgroup/blkio/<cgroup>/, where <cgroup> is the name of the control group.
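As a minimal sketch of how these files might be read, the helpers below print one counter for a given device and operation and compute how many I/O operations were throttled during an interval. The function names `throttle_stat` and `queued_delta` are illustrative and not part of the kernel interface. The `<major:minor> <Operation> <value>` line layout is an assumption inferred from the commands in the Examples section.

```shell
#!/bin/bash
# Hypothetical helper: print the value of one throttle counter.
# Usage: throttle_stat <cgroup_dir> <interface> <major:minor> <Read|Write>
# Assumes each line of the file looks like "<major:minor> <Operation> <value>".
throttle_stat() {
    local dir=$1 iface=$2 dev=$3 op=$4
    awk -v dev="$dev" -v op="$op" \
        '$1 == dev && $2 == op { print $3 }' "$dir/$iface"
}

# Hypothetical helper: number of I/O operations throttled during an interval,
# computed as the difference between two samples of
# blkio.throttle.total_io_queued.
# Usage: queued_delta <cgroup_dir> <major:minor> <Read|Write> <seconds>
queued_delta() {
    local dir=$1 dev=$2 op=$3 interval=$4 before after
    before=$(throttle_stat "$dir" blkio.throttle.total_io_queued "$dev" "$op")
    sleep "$interval"
    after=$(throttle_stat "$dir" blkio.throttle.total_io_queued "$dev" "$op")
    # Treat a missing entry as 0 so the arithmetic stays valid.
    echo $(( ${after:-0} - ${before:-0} ))
}
```

For example, `queued_delta /sys/fs/cgroup/blkio/blkcg1 254:48 Write 5` would print the number of write operations throttled for device 254:48 over 5 seconds. A result that stays at 0 while latency rises suggests that the latency is not caused by throttling.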

Examples

You can obtain the average I/O latency of a disk by using the preceding interfaces. In this example, the write counters of the vdd disk are sampled twice, 5 seconds apart, to calculate the average I/O write latency of the disk during that interval. In the parameter names, <N> is 1 for the sample taken at time T1 and 2 for the sample taken at time T2. The following table describes the relevant parameters.

| Parameter | Description |
| --- | --- |
| write_wait_time<N> | The duration of throttling at the block I/O throttling layer. |
| write_service_time<N> | The total amount of time between request dispatch and request completion for I/O operations. |
| write_completed<N> | The number of completed I/O operations. |
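The commands in the following steps filter on `254:48`, the major:minor device number used for the vdd disk in this example. On another system the number may differ. A minimal sketch for looking it up, assuming the standard sysfs layout in which /sys/block/<disk>/dev holds the device number (the function name `dev_number` and its optional sysfs-root argument are illustrative):

```shell
# Hypothetical helper: print the major:minor number of a block device.
# The second argument (sysfs root) defaults to /sys and exists only to
# make the function easy to try against a test directory.
dev_number() {
    local disk=$1 sysroot=${2:-/sys}
    cat "$sysroot/block/$disk/dev"
}

# Example: dev_number vdd
```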

  1. Obtain the monitoring data at time T1.

    # Sample the Write counters of device 254:48 in the blkcg1 cgroup. Shell assignments take no spaces around "=".
    write_wait_time1=$(grep -w "254:48 Write" /sys/fs/cgroup/blkio/blkcg1/blkio.throttle.io_wait_time | awk '{print $3}')
    write_service_time1=$(grep -w "254:48 Write" /sys/fs/cgroup/blkio/blkcg1/blkio.throttle.io_service_time | awk '{print $3}')
    write_completed1=$(grep -w "254:48 Write" /sys/fs/cgroup/blkio/blkcg1/blkio.throttle.io_completed | awk '{print $3}')
  2. Wait 5 seconds and obtain the monitoring data at time T2 (T1 + 5s).

    # Take the second sample of the same counters after the 5-second interval.
    sleep 5
    write_wait_time2=$(grep -w "254:48 Write" /sys/fs/cgroup/blkio/blkcg1/blkio.throttle.io_wait_time | awk '{print $3}')
    write_service_time2=$(grep -w "254:48 Write" /sys/fs/cgroup/blkio/blkcg1/blkio.throttle.io_service_time | awk '{print $3}')
    write_completed2=$(grep -w "254:48 Write" /sys/fs/cgroup/blkio/blkcg1/blkio.throttle.io_completed | awk '{print $3}')
  3. Calculate the average I/O latency during the 5 seconds based on the following formula, where the total I/O duration is the sum of the wait time and the service time:

    Average I/O latency = (Total I/O duration at time T2 - Total I/O duration at time T1)/(Number of completed I/O operations at time T2 - Number of completed I/O operations at time T1)

    # The result is in nanoseconds. bc uses integer division by default and fails if no write completed in the interval.
    avg_delay=$(echo "(($write_wait_time2 + $write_service_time2) - ($write_wait_time1 + $write_service_time1)) / ($write_completed2 - $write_completed1)" | bc)
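The three steps above can be combined into one script. The following is a sketch under stated assumptions rather than a definitive implementation: the function names `avg_latency`, `sample`, and `avg_write_latency` are illustrative, the device is assumed to have a `Write` entry in each of the three files, and shell integer arithmetic replaces `bc`. It also returns 0 when no write completed in the interval, which avoids a division by zero.

```shell
#!/bin/bash
# Pure calculation: average latency (ns) from two samples.
# Usage: avg_latency <wait1> <service1> <completed1> <wait2> <service2> <completed2>
avg_latency() {
    local ios=$(( $6 - $3 ))
    if [ "$ios" -le 0 ]; then
        echo 0    # nothing completed in the interval; avoid dividing by zero
        return
    fi
    echo $(( ( ($4 + $5) - ($1 + $2) ) / ios ))
}

# Take one sample: prints "<wait> <service> <completed>" for "<dev> Write".
# Usage: sample <cgroup_dir> <major:minor>
sample() {
    local dir=$1 dev=$2 f
    for f in io_wait_time io_service_time io_completed; do
        awk -v dev="$dev" '$1 == dev && $2 == "Write" { print $3 }' \
            "$dir/blkio.throttle.$f"
    done | tr '\n' ' '
}

# Usage: avg_write_latency <cgroup_dir> <major:minor> <interval_seconds>
avg_write_latency() {
    local first second
    first=$(sample "$1" "$2")
    sleep "$3"
    second=$(sample "$1" "$2")
    # Word splitting is intentional: each sample expands to three numbers.
    avg_latency $first $second
}
```

For the example in this topic, the call would be `avg_write_latency /sys/fs/cgroup/blkio/blkcg1 254:48 5`. Keeping the arithmetic in `avg_latency` separate from the sampling makes the formula easy to check with fixed numbers.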