Alibaba Cloud Linux 2 provides the cgroup writeback feature for the cgroup v1 kernel interface (kernel 4.19.36-12.al7 and later). Once enabled, blkio.throttle.write_bps_device and blkio.throttle.write_iops_device can throttle buffered I/O—something cgroup v1 does not support by default.
Background
Control groups (cgroups) are a Linux kernel feature for resource allocation, available in two versions: cgroup v1 and cgroup v2. For details, see the What are Control Groups section of the Red Hat Resource Management Guide.
The cgroup writeback feature enables buffered I/O throttling for cgroup v1 by coordinating the memory subsystem (memcg) with the I/O subsystem (blkcg).
Prerequisites
Before you begin, ensure that you have:
- Alibaba Cloud Linux 2 with kernel 4.19.36-12.al7 or later
- The blkio and memory cgroup subsystems mounted
- sudo privileges
Run uname -r to confirm your kernel version.
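The version comparison can also be scripted. The following is a minimal sketch rather than part of the official procedure: kernel_at_least is a hypothetical helper name, and it assumes that the release string begins with major.minor.patch-build, as in 4.19.36-12.al7.

```shell
# Hypothetical helper: succeed if release $2 is at least release $1.
# Assumes both strings start with "major.minor.patch-build".
# Exit status: 0 = new enough, 1 = too old, 2 = unrecognized format.
kernel_at_least() {
    awk -v min="$1" -v cur="$2" 'BEGIN {
        if (min !~ /^[0-9]+\.[0-9]+\.[0-9]+-[0-9]+/) exit 2
        if (cur !~ /^[0-9]+\.[0-9]+\.[0-9]+-[0-9]+/) exit 2
        # Split on "." and "-"; the first four pieces are the numeric fields.
        split(min, m, /[.-]/)
        split(cur, c, /[.-]/)
        for (i = 1; i <= 4; i++) {
            if (c[i] + 0 > m[i] + 0) exit 0
            if (c[i] + 0 < m[i] + 0) exit 1
        }
        exit 0
    }'
}

if kernel_at_least 4.19.36-12.al7 "$(uname -r)"; then
    echo "kernel version is 4.19.36-12.al7 or later"
else
    echo "kernel is older than 4.19.36-12.al7 or uses an unrecognized version format"
fi
```

The check compares version numbers only. It does not confirm that the running kernel is an Alibaba Cloud Linux 2 kernel.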
Limitations
The cgroup writeback feature requires each memcg to map to exactly one blkcg. The following table shows which mappings are allowed:
| Mapping type | Allowed | Example |
|---|---|---|
| One-to-one | Yes | memcg1 maps to blkcg1; memcg2 maps to blkcg2 |
| Many-to-one | Yes | memcg1 and memcg2 both map to blkcg2 |
| One-to-many | No | memcg1 maps to both blkcg1 and blkcg2 |
| Many-to-many | No | memcg1 maps to blkcg1; memcg2 also maps to both blkcg1 and blkcg2 |
Before throttling a process, write its process ID to the cgroup.procs interface of the target blkcg to lock the memcg-to-blkcg mapping. See Verify memcg-to-blkcg mappings to confirm the mapping.
If a process moves between blkcgs at runtime, its memcg remaps to the root blkcg, which has no throttle threshold. This disables throttling for that memcg. To prevent this, avoid moving processes between blkcgs.
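The mapping rule can be expressed as a small check. The sketch below is illustrative only: check_mappings is a hypothetical helper, and the input format (one "memcg blkcg" name pair per line) is an assumption made for this example. It reports every memcg that maps to more than one blkcg, which is the unsupported case in the table above.

```shell
# Hypothetical helper: read "memcg blkcg" pairs on stdin, one pair per line.
# Prints each memcg that maps to more than one blkcg.
# Exit status 0 means every memcg maps to exactly one blkcg.
check_mappings() {
    awk '
        NF >= 2 {
            if ($1 in seen) {
                # A second, different blkcg for the same memcg is not allowed.
                if (seen[$1] != $2) { bad[$1] = 1 }
            } else {
                seen[$1] = $2
            }
        }
        END {
            n = 0
            for (m in bad) { print m; n++ }
            exit (n > 0)
        }
    '
}

# memcg1 appears with two different blkcgs, so it is reported.
printf 'memcg1 blkcg1\nmemcg2 blkcg2\nmemcg1 blkcg2\n' | check_mappings ||
    echo "unsupported mapping found"
```

Many-to-one input, such as memcg1 and memcg2 both paired with blkcg2, passes the check because each memcg still has a single blkcg.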
Enable the cgroup writeback feature
Add the cgwb_v1 kernel boot parameter.
    sudo grubby --update-kernel="/boot/vmlinuz-$(uname -r)" --args="cgwb_v1"
Reboot to apply the change.
    sudo reboot
After the system restarts, verify that cgwb_v1 appears in the kernel command line.
    cat /proc/cmdline | grep cgwb_v1
If the output includes cgwb_v1, the feature is active, and blkio.throttle.write_bps_device and blkio.throttle.write_iops_device can now throttle buffered I/O.
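A plain grep also matches longer words that merely contain cgwb_v1. For use in scripts, a stricter check could look like the sketch below. The helper name has_boot_param is hypothetical, and the function only inspects the text it is given.

```shell
# Hypothetical helper: succeed if the kernel command line text in $2
# contains parameter $1 as a whole word (either "name" or "name=value").
has_boot_param() {
    printf '%s\n' "$2" | awk -v p="$1" '
        {
            for (i = 1; i <= NF; i++) {
                if ($i == p || index($i, p "=") == 1) { found = 1 }
            }
        }
        END { exit !found }
    '
}

if has_boot_param cgwb_v1 "$(cat /proc/cmdline 2>/dev/null)"; then
    echo "cgwb_v1 is present on the kernel command line"
else
    echo "cgwb_v1 is not present; add it with grubby and reboot"
fi
```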
Associate memcg with blkcg in Kubernetes
In Kubernetes, processes may move between cgroups, which can disable throttling. To prevent this, configure systemd to mount the memory and blkio controllers in the same hierarchy.
Edit /etc/systemd/system.conf.
    sudo vim /etc/systemd/system.conf
Set JoinControllers to join the memory and blkio subsystems.
    JoinControllers=cpu,cpuacct net_cls,net_prio memory,blkio
Press Esc, then type :wq to save and exit.
Rebuild the initramfs image so the new systemd configuration takes effect.
    sudo dracut /boot/initramfs-$(uname -r).img $(uname -r) --force
Reboot.
    sudo reboot
Confirm that blkio and memory are mounted in the same hierarchy.
    ls /sys/fs/cgroup
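Another way to confirm the result is to read /proc/self/cgroup, where each cgroup v1 line has the form hierarchy-ID:controller-list:path. Controllers that share a hierarchy appear in the same comma-separated list. The sketch below is a minimal example under that assumption; controllers_joined is a hypothetical helper name.

```shell
# Hypothetical helper: read /proc/<pid>/cgroup text on stdin and succeed if
# controllers $1 and $2 appear in the same controller list (same hierarchy).
controllers_joined() {
    awk -F: -v a="$1" -v b="$2" '
        {
            n = split($2, ctl, ",")
            has_a = 0
            has_b = 0
            for (i = 1; i <= n; i++) {
                if (ctl[i] == a) { has_a = 1 }
                if (ctl[i] == b) { has_b = 1 }
            }
            if (has_a && has_b) { found = 1 }
        }
        END { exit !found }
    '
}

if controllers_joined memory blkio < /proc/self/cgroup; then
    echo "memory and blkio share one hierarchy"
else
    echo "memory and blkio are in separate hierarchies"
fi
```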
Test that the feature works
This procedure simulates two I/O-generating processes to confirm throttling is active.
Set blkio.throttle.write_bps_device to at least 1 MB (1,048,576 bytes). The dd command flushes data every 1 MB of output; a lower threshold causes I/O hangs.
Create blkcg and memcg directories, then assign the current shell process to both.
    sudo mkdir /sys/fs/cgroup/blkio/blkcg1
    sudo mkdir /sys/fs/cgroup/memory/memcg1
    sudo bash -c "echo $$ > /sys/fs/cgroup/blkio/blkcg1/cgroup.procs"  # $$ is the current process ID
    sudo bash -c "echo $$ > /sys/fs/cgroup/memory/memcg1/cgroup.procs"
Set a 10 MB/s write bandwidth limit on the target device (replace 254:48 with your device's major:minor number).
    sudo bash -c "echo 254:48 10485760 > /sys/fs/cgroup/blkio/blkcg1/blkio.throttle.write_bps_device"
Generate buffered I/O using dd without the oflag=sync option.
    sudo dd if=/dev/zero of=/mnt/vdd/testfile bs=4k count=10000
Monitor disk write throughput with iostat. Check the wMB/s column: a value of approximately 10 MB/s confirms that throttling is working.
    iostat -xdm 1 vdd
The dd command runs quickly and the terminal may scroll too fast to read. Use iostat to observe the actual write rate.
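The value written to blkio.throttle.write_bps_device is a byte count per second, so 10 MB/s becomes 10 × 1,048,576 = 10,485,760. The sketch below shows one way to build that line. The helper name throttle_rule is hypothetical, and the function only prints the rule; it does not write to any cgroup file.

```shell
# Hypothetical helper: print a "major:minor bytes_per_second" rule.
# $1: device as major:minor (lsblk -dno MAJ:MIN <device> can report it)
# $2: limit in MB/s as a whole number, at least 1 (see the note above)
throttle_rule() {
    case "$2" in
        ''|*[!0-9]*)
            echo "limit must be a whole number of MB/s" >&2
            return 1
            ;;
    esac
    if [ "$2" -lt 1 ]; then
        echo "limit must be at least 1 MB/s" >&2
        return 1
    fi
    # 1 MB = 1024 * 1024 bytes.
    printf '%s %s\n' "$1" "$(( $2 * 1024 * 1024 ))"
}

throttle_rule 254:48 10   # prints: 254:48 10485760
```

The printed line matches the rule used in the procedure above and can be written to the blkcg's blkio.throttle.write_bps_device file in the same way.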
Verify memcg-to-blkcg mappings
Use either of the following methods to confirm that memcg-to-blkcg mappings are one-to-one or many-to-one.
Method 1: bdi_wb_link
    sudo cat /sys/kernel/debug/bdi/bdi_wb_link
The following output shows a one-to-one mapping (memcg inode 35, blkcg inode 48):
    memory <---> blkio
    memcg1: 35 <---> blkcg1: 48
Method 2: ftrace
Enable the insert_memcg_blkcg_link event.
    sudo bash -c "echo 1 > /sys/kernel/debug/tracing/events/writeback/insert_memcg_blkcg_link/enable"
Read the trace output.
    sudo cat /sys/kernel/debug/tracing/trace_pipe
Look for a line containing memcg_ino=35 blkcg_ino=48, which confirms the mapping. For example:
    <...>-1537 [006] .... 99.511327: insert_memcg_blkcg_link: memcg_ino=35 blkcg_ino=48 old_blkcg_ino=0
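When the trace contains many events, the inode pairs can be pulled out with a short filter. The sketch below assumes that trace lines follow the format of the example above; extract_links is a hypothetical helper name.

```shell
# Hypothetical helper: read ftrace output on stdin and print one
# "memcg_ino blkcg_ino" pair for each insert_memcg_blkcg_link event.
extract_links() {
    sed -n 's/.*insert_memcg_blkcg_link: memcg_ino=\([0-9][0-9]*\) blkcg_ino=\([0-9][0-9]*\).*/\1 \2/p'
}

# Example using the trace line shown above; sort -u drops repeated pairs.
printf '%s\n' '<...>-1537 [006] .... 99.511327: insert_memcg_blkcg_link: memcg_ino=35 blkcg_ino=48 old_blkcg_ino=0' |
    extract_links | sort -u   # prints: 35 48
```

If the same memcg inode appears with two different blkcg inodes in the output, a process moved between blkcgs and the mapping is no longer one-to-one or many-to-one.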