Control groups (cgroups) are a Linux kernel feature that restricts, accounts for, and isolates the physical resources (such as CPU, memory, and I/O) of process groups. A parent process can use cgroups to manage the resource consumption of its child process groups.
This page lists the interface differences between cgroup v1 and cgroup v2. Each table maps v1 interfaces to their v2 equivalents, or marks them as N/A when no equivalent exists in v2.
How cgroup v2 differs from cgroup v1
Before reviewing the interface tables, understanding two core architectural changes in cgroup v2 helps explain why some interfaces were removed, renamed, or restructured.
No processes in inner nodes. In cgroup v2, processes can only be attached to leaf cgroups: a non-root cgroup that enables controllers for its child cgroups cannot directly contain processes. This constraint removes the ambiguity that arose in v1 when processes attached to a parent cgroup competed for resources with that parent's child cgroups.
Unified hierarchy. cgroup v1 allowed each subsystem to be mounted on a separate hierarchy. cgroup v2 uses a single unified hierarchy, which is why per-subsystem mount options and named hierarchies are gone.
These two rules explain why several v1 interfaces have no direct v2 equivalent: tasks (its closest counterpart is cgroup.threads), cgroup.clone_children, cgroup.sane_behavior, and the entire cpuacct subsystem (its accounting functionality is folded into the cpu subsystem in v2).
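Because the unified hierarchy changes which files exist, scripts often need to check which layout they are running on before touching any interface. The following is a minimal sketch, assuming that a cgroup v2 mount always exposes cgroup.controllers at its root; the function name is_cgroup_v2 is hypothetical and not part of any tool.

```shell
# Hypothetical helper: succeed (exit status 0) if the given directory looks
# like a cgroup v2 mount point, fail otherwise.
# Assumption: cgroup.controllers exists only in the v2 unified hierarchy.
is_cgroup_v2() {
    mount_point="${1:-/sys/fs/cgroup}"
    [ -f "$mount_point/cgroup.controllers" ]
}

# Example usage (reads only, no root privileges needed):
#   if is_cgroup_v2; then echo "unified hierarchy"; else echo "not cgroup v2"; fi
```

On systems that mount v1 and v2 side by side, the v2 tree may live at a different path, which is why the mount point is a parameter.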
Naming conventions in v2. cgroup v2 uses consistent naming patterns across subsystems:
- min/max: hard guarantees and limits
- low/high: soft guarantees and limits
- weight: proportional resource distribution (range: 1–10000, default: 100)
For example, memory.min is a hard memory guarantee, memory.low is a soft guarantee, and memory.high is a soft limit — all follow this pattern.
General interface differences
cgroup v1 interfaces
| Interface name | Purpose | In-house interface | Corresponding cgroup v2 interface |
|---|---|---|---|
| cgroup.procs | Moves a process into the cgroup by writing its PID to this file. | No | cgroup.procs |
| cgroup.clone_children | When set to 1, a child cgroup inherits the cpuset configuration of its parent. Applies only to the cpuset subsystem. | No | N/A |
| cgroup.sane_behavior | An interface for an experimental v2 feature, kept for backward compatibility after the official release of v2. | No | N/A |
| notify_on_release | When set to 1, the system runs the release_agent script when the cgroup becomes empty. The release_agent file exists only in the root cgroup. | No | cgroup.events, which provides similar functionality |
| release_agent | When notify_on_release is set to 1, the system runs the release_agent script when the cgroup becomes empty. The release_agent file exists only in the root cgroup. | No | cgroup.events, which provides similar functionality |
| tasks | Moves a thread into the cgroup by writing its thread ID (TID) to this file. | No | cgroup.threads |
| pool_size | Controls the cgroup cache pool size. In high-concurrency scenarios, this accelerates cgroup creation and binding. Depends on cgroup_rename. Not available in cgroup v2. | Yes | N/A |
cgroup v2 interfaces
| Interface name | Purpose | In-house interface | Corresponding cgroup v1 interface |
|---|---|---|---|
| cgroup.procs | Moves a process into the cgroup by writing its PID to this file. | No | cgroup.procs |
| cgroup.type | Write threaded to enable thread-granularity control. Supported only for the cpu, pids, and perf_event subsystems. | No | N/A |
| cgroup.threads | Moves a thread into the cgroup by writing its TID to this file. Requires cgroup.type to be set to threaded. | No | tasks |
| cgroup.controllers | Lists the subsystems enabled for the current cgroup. | No | N/A |
| cgroup.subtree_control | Controls which subsystems are enabled for child cgroups. The enabled subsystems must be a subset of those listed in cgroup.controllers. | No | N/A |
| cgroup.events | Records whether the cgroup is managing processes and whether it is frozen. Monitor status changes using fsnotify. Does not exist in the root cgroup. | No | notify_on_release and release_agent (similar functionality) |
| cgroup.max.descendants | Controls the maximum number of descendant cgroups. | No | N/A |
| cgroup.max.depth | Controls the maximum depth of descendant cgroups. | No | N/A |
| cgroup.stat | Shows the number of descendant cgroups and the number of descendant cgroups in a dying state (being destroyed). | No | N/A |
| cgroup.freeze | Freezes or unfreezes all processes in the cgroup. Does not exist in the root cgroup. | No | freezer.state in the freezer subsystem |
| cpu.stat | Shows CPU usage statistics. | No | N/A |
| io.pressure | Shows Pressure Stall Information (PSI). Supports poll. See psi.rst and Enable the PSI feature for cgroup v1. | No | io.pressure, memory.pressure, and cpu.pressure under the cpuacct subsystem (after enabling PSI for cgroup v1) |
| memory.pressure | Shows Pressure Stall Information (PSI). Supports poll. See psi.rst and Enable the PSI feature for cgroup v1. | No | io.pressure, memory.pressure, and cpu.pressure under the cpuacct subsystem (after enabling PSI for cgroup v1) |
| cpu.pressure | Shows Pressure Stall Information (PSI). Supports poll. See psi.rst and Enable the PSI feature for cgroup v1. | No | io.pressure, memory.pressure, and cpu.pressure under the cpuacct subsystem (after enabling PSI for cgroup v1) |
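Where v1 relied on notify_on_release and release_agent, a v2 consumer typically reads cgroup.events and reacts to the populated key. The sketch below only shows how to read that key from a file; it assumes the standard "key value" line layout of cgroup.events, and the function name cgroup_populated is hypothetical.

```shell
# Hypothetical helper: print the value of the "populated" key (0 or 1)
# from a cgroup.events file whose lines have the form "<key> <value>".
cgroup_populated() {
    events_file="$1"
    awk '$1 == "populated" { print $2 }' "$events_file"
}

# Example usage against a real cgroup (path is an illustrative placeholder):
#   cgroup_populated /sys/fs/cgroup/<cgroup>/cgroup.events
```

A long-running monitor would combine this with a file-change notification on cgroup.events instead of polling, as the table above notes.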
Subsystem interface differences
CPU
In cgroup v2, the cpuacct subsystem no longer exists. Its functionality (CPU accounting and extended statistics) is now part of the cpu subsystem.
cgroup v1 interfaces
| Interface name | Purpose | In-house interface | Corresponding cgroup v2 interface |
|---|---|---|---|
| cpu.shares | Controls the weight used to allocate CPU time proportionally. Default value: 1024. | No | cpu.weight, cpu.weight.nice (different units) |
| cpu.idle | Sets the cgroup scheduling policy to idle. An idle group receives the minimum CPU share and is no longer guaranteed a minimum runtime, making it more likely to yield the CPU to non-idle processes. When cpu.idle is 1, cpu.shares becomes read-only and changes to 3. | No | cpu.idle |
| cpu.priority | Sets a fine-grained preemptive priority. Preemption is determined at clock interrupts or wake-ups and adjusted based on the priority difference, making it easier for high-priority tasks to preempt low-priority ones. | Yes | cpu.priority |
| cpu.cfs_quota_us | The maximum CPU runtime for tasks in a cgroup within a period defined by cpu.cfs_period_us (CFS bandwidth control). | No | cpu.max |
| cpu.cfs_period_us | The maximum CPU runtime for tasks in a cgroup within a period defined by cpu.cfs_period_us (CFS bandwidth control). | No | cpu.max |
| cpu.cfs_burst_us | The burst time a process can run within a cpu.cfs_period_us period. See Enable the CPU burst feature for cgroup v1. | No | cpu.max.burst |
| cpu.cfs_init_buffer_us | The burst time a process can run at startup. | Yes | cpu.max.init_buffer |
| cpu.stat | Shows statistics related to CFS bandwidth control, such as the number of periods elapsed and the number of throttling events. | No | cpu.stat |
| cpu.rt_runtime_us | Real-time (RT) task bandwidth control. Within a period defined by cpu.rt_period_us, processes in the group can run for a maximum of cpu.rt_runtime_us. | No | N/A |
| cpu.rt_period_us | Real-time (RT) task bandwidth control. Within a period defined by cpu.rt_period_us, processes in the group can run for a maximum of cpu.rt_runtime_us. | No | N/A |
| cpu.bvt_warp_ns | Controls the group identity attribute to differentiate online and offline processes, providing better CPU quality of service (QoS) for online processes. See Group identity feature. | Yes | cpu.bvt_warp_ns |
| cpu.identity | Controls the group identity attribute to differentiate online and offline processes, providing better CPU quality of service (QoS) for online processes. See Group identity feature. | Yes | cpu.identity |
| cpu.ht_stable | Controls whether to generate noise on the simultaneous multithreading (SMT) sibling to stabilize SMT computing power. | Yes | N/A |
| cpu.ht_ratio | Controls whether extra quota is calculated when an SMT sibling is idle, used to stabilize SMT computing power. | Yes | cpu.ht_ratio |
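Because cpu.shares and cpu.weight use different units, a migration script has to rescale values rather than copy them. The sketch below uses one linear mapping that some container runtimes have applied, from the v1 range 2–262144 onto the v2 range 1–10000. This formula is an assumption brought in for illustration; it is not defined by this page, and the function name shares_to_weight is hypothetical.

```shell
# Hypothetical helper: convert a cgroup v1 cpu.shares value to a cgroup v2
# cpu.weight value using a linear mapping (an assumption, not a kernel rule):
#   weight = 1 + ((shares - 2) * 9999) / 262142
shares_to_weight() {
    shares="$1"
    # Clamp to the v1 range before scaling.
    if [ "$shares" -lt 2 ]; then shares=2; fi
    if [ "$shares" -gt 262144 ]; then shares=262144; fi
    # Integer arithmetic; the result always falls within 1..10000.
    echo $(( 1 + ( (shares - 2) * 9999 ) / 262142 ))
}

shares_to_weight 1024   # prints 39
```

Note that this mapping sends the v1 default (1024) to 39, not to the v2 default (100), so relative weights are preserved only when every cgroup in a group of siblings is converted the same way.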
cgroup v2 interfaces
| Interface name | Purpose | In-house interface | Corresponding cgroup v1 interface |
|---|---|---|---|
| cpu.weight | Controls the weight used to allocate CPU time proportionally. Default value: 100. | No | cpu.shares (different units) |
| cpu.weight.nice | Controls the weight used to allocate CPU time proportionally, expressed as a nice value. Default value: 0. | No | cpu.shares (different units) |
| cpu.idle | Sets the cgroup scheduling policy to idle. An idle group receives the minimum CPU share and yields the CPU to non-idle processes. When cpu.idle is 1, cpu.weight and cpu.weight.nice become read-only and are set to the minimum weight (0.3). Due to rounding, reading cpu.weight returns 0. | No | cpu.idle |
| cpu.priority | Sets a fine-grained preemptive priority. Preemption is determined at clock interrupts or wake-ups and scaled based on the priority difference. | Yes | cpu.priority |
| cpu.max | CFS bandwidth control. Contains two values: quota and period. Within the period, processes in the group can run for a maximum of quota time. | No | cpu.cfs_quota_us, cpu.cfs_period_us |
| cpu.max.burst | The burst time a process can run within the period defined by cpu.max. | No | cpu.cfs_burst_us |
| cpu.max.init_buffer | The burst time a process can run at startup. | Yes | cpu.cfs_init_buffer_us |
| cpu.bvt_warp_ns | Controls the group identity attribute to differentiate online and offline processes, providing better CPU QoS for online processes. | Yes | cpu.bvt_warp_ns |
| cpu.identity | Controls the group identity attribute to differentiate online and offline processes, providing better CPU QoS for online processes. | Yes | cpu.identity |
| cpu.sched_cfs_statistics | Provides CFS-related statistics, such as run time and time spent waiting for sibling or non-sibling cgroups. Requires kernel.sched_schedstats to be enabled. | Yes | cpuacct.sched_cfs_statistics |
| cpu.wait_latency | The latency distribution of processes waiting in the run queue. Requires kernel.sched_schedstats and /proc/cpusli/sched_lat_enabled to be enabled. | Yes | cpuacct.wait_latency |
| cpu.cgroup_wait_latency | The latency distribution of process groups waiting in the run queue. Tracks the group sched_entity, while cpu.wait_latency tracks the task sched_entity. Requires kernel.sched_schedstats and /proc/cpusli/sched_lat_enabled to be enabled. | Yes | cpuacct.cgroup_wait_latency |
| cpu.block_latency | The latency distribution of processes blocked for non-I/O reasons. Requires kernel.sched_schedstats and /proc/cpusli/sched_lat_enabled to be enabled. | Yes | cpuacct.block_latency |
| cpu.ioblock_latency | The latency distribution of processes blocked for I/O reasons. Requires kernel.sched_schedstats and /proc/cpusli/sched_lat_enabled to be enabled. | Yes | cpuacct.ioblock_latency |
| cpu.ht_ratio | Controls whether extra quota is calculated when an SMT sibling is idle, used to stabilize SMT computing power. Takes effect only when core scheduling is enabled. | Yes | cpu.ht_ratio |
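Since cpu.max holds the quota and the period in a single file, the two v1 values have to be merged into one line. The following is a minimal sketch, assuming the v2 convention that the keyword max stands for "no quota" where v1 uses a negative cpu.cfs_quota_us; the function name cpu_max_from_v1 is hypothetical.

```shell
# Hypothetical helper: build the "<quota> <period>" line for cpu.max from the
# v1 values of cpu.cfs_quota_us and cpu.cfs_period_us.
# Assumption: a negative v1 quota (typically -1) means unlimited, which v2
# expresses with the keyword "max".
cpu_max_from_v1() {
    quota_us="$1"
    period_us="$2"
    if [ "$quota_us" -lt 0 ]; then
        echo "max $period_us"
    else
        echo "$quota_us $period_us"
    fi
}

cpu_max_from_v1 50000 100000   # prints "50000 100000" (half a CPU)
cpu_max_from_v1 -1 100000      # prints "max 100000" (no limit)
```

The resulting line could then be written to the cpu.max file of the target cgroup, for example with echo "$(cpu_max_from_v1 50000 100000)" > /sys/fs/cgroup/<cgroup>/cpu.max.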
cpuset
cgroup v1 interfaces
| Interface name | Purpose | In-house interface | Corresponding cgroup v2 interface |
|---|---|---|---|
| cpuset.cpus | Controls the CPUs on which tasks can run. Tasks cannot be attached to a cgroup when this interface is empty. | No | cpuset.cpus |
| cpuset.mems | Controls the non-uniform memory access (NUMA) nodes that can be allocated to tasks in a cgroup. Tasks cannot be attached to a cgroup when this interface is empty. | No | cpuset.mems |
| cpuset.effective_cpus | Queries the effective CPUs on which tasks are running. Affected by CPU hotplug events. | No | cpuset.cpus.effective |
| cpuset.effective_mems | Queries the effective NUMA nodes allocated to running tasks. Affected by memory node hotplug events. | No | cpuset.mems.effective |
| cpuset.cpu_exclusive | Controls which CPUs are used exclusively by this cgroup and cannot be used by other cpusets at the same level. | No | cpuset.cpus.partition (similar functionality) |
| cpuset.mem_exclusive | Controls which NUMA nodes are used exclusively by this cgroup and cannot be used by other cpusets at the same level. | No | N/A |
| cpuset.mem_hardwall | When set to 1, tasks can only allocate memory from the memory nodes attached to the cpuset. | No | N/A |
| cpuset.sched_load_balance | Controls whether CPUs are load-balanced within the cpuset. Enabled by default. | No | N/A |
| cpuset.sched_relax_domain_level | Controls the search range for CPUs when the scheduler migrates tasks to balance load. Default value: -1. Values: -1 (default system policy), 0 (no search), 1 (hyperthreads in same core), 2 (cores in same package), 3 (CPUs on same node), 4 (CPUs on same chunk), 5 (entire system). | No | N/A |
| cpuset.memory_migrate | When set to a non-zero value, if a task is allocated a memory page in a cpuset and then migrated to another cpuset, the memory page migrates to the new cpuset as well. | No | N/A |
| cpuset.memory_pressure | Calculates the memory paging pressure of the current cpuset. | No | N/A |
| cpuset.memory_spread_page | When set to 1, the kernel distributes the page cache evenly across the cpuset's memory nodes. | No | N/A |
| cpuset.memory_spread_slab | When set to 1, the kernel distributes slab caches evenly across the cpuset's memory nodes. | No | N/A |
| cpuset.memory_pressure_enabled | When set to 1, enables memory pressure statistics collection for the cpuset. | No | N/A |
cgroup v2 interfaces
| Interface name | Purpose | In-house interface | Corresponding cgroup v1 interface |
|---|---|---|---|
| cpuset.cpus | Controls the CPUs on which tasks can run. When empty, the CPUs of the parent cpuset are used. | No | cpuset.cpus |
| cpuset.mems | Controls the NUMA nodes that can be allocated to tasks in a cgroup. When empty, the NUMA nodes of the parent cpuset are used. | No | cpuset.mems |
| cpuset.cpus.effective | Queries the effective CPUs on which tasks are running. Affected by CPU hotplug events. | No | cpuset.effective_cpus |
| cpuset.mems.effective | Queries the effective NUMA nodes allocated to running tasks. Affected by memory node hotplug events. | No | cpuset.effective_mems |
| cpuset.cpus.partition | Controls whether the CPUs of a cpuset are used exclusively. Write root to enable exclusive use. | No | cpuset.cpu_exclusive (similar functionality) |
| .__DEBUG__.cpuset.cpus.subpartitions | Queries which CPUs are used exclusively when root is written to cpuset.cpus.partition. Available only when the cgroup_debug feature is enabled in the kernel cmdline. | No | N/A |
blkio
cgroup v1 interfaces
| Interface name | Purpose | In-house interface | Corresponding cgroup v2 interface |
|---|---|---|---|
| blkio.throttle.read_bps_device | Specifies the maximum read throughput (bytes/second) for a device. Format: echo "<major>:<minor> <bps>" > /sys/fs/cgroup/blkio/<cgroup>/blkio.throttle.read_bps_device | No | io.max |
| blkio.throttle.write_bps_device | Specifies the maximum write throughput (bytes/second) for a device. Format: echo "<major>:<minor> <bps>" > /sys/fs/cgroup/blkio/<cgroup>/blkio.throttle.write_bps_device | No | io.max |
| blkio.throttle.read_iops_device | Specifies the maximum read IOPS for a device. Format: echo "<major>:<minor> <iops>" > /sys/fs/cgroup/blkio/<cgroup>/blkio.throttle.read_iops_device | No | io.max |
| blkio.throttle.write_iops_device | Specifies the maximum write IOPS for a device. Format: echo "<major>:<minor> <iops>" > /sys/fs/cgroup/blkio/<cgroup>/blkio.throttle.write_iops_device | No | io.max |
| blkio.throttle.io_service_bytes | Queries bandwidth statistics (read, write, sync, async, discard, and total) across all devices. Unit: bytes. | No | io.stat |
| blkio.throttle.io_service_bytes_recursive | Recursive version of blkio.throttle.io_service_bytes. Includes statistics from descendant cgroups. | No | N/A |
| blkio.throttle.io_serviced | Queries IOPS statistics (read, write, sync, async, discard, and total) across all devices. | No | io.stat |
| blkio.throttle.io_serviced_recursive | Recursive version of blkio.throttle.io_serviced. Includes statistics from descendant cgroups. | No | N/A |
| blkio.throttle.io_service_time | Queries the duration between request dispatch and completion for I/O operations (used to calculate average I/O latency). See Enhance the monitoring of block I/O throttling. | Yes | io.extstat |
| blkio.throttle.io_wait_time | Queries how long I/O operations wait in scheduler queues (used to calculate average I/O latency). See Enhance the monitoring of block I/O throttling. | Yes | io.extstat |
| blkio.throttle.io_completed | Queries the number of completed I/O operations (used to calculate average I/O latency). See Enhance the monitoring of block I/O throttling. | Yes | io.extstat |
| blkio.throttle.total_bytes_queued | Queries the number of throttled I/O bytes, used to determine whether I/O latency is throttling-related. See Enhance the monitoring of block I/O throttling. | Yes | io.extstat |
| blkio.throttle.total_io_queued | Queries the number of throttled I/O operations, used to determine whether I/O latency is throttling-related. See Enhance the monitoring of block I/O throttling. | Yes | io.extstat |
| blkio.cost.model | Specifies the blk-iocost cost model. The control mode (ctrl) can be auto or user. Exists only in the root cgroup. Format: echo "<major>:<minor> ctrl=user model=linear rbps=<rbps> rseqiops=<rseqiops> rrandiops=<rrandiops> wbps=<wbps> wseqiops=<wseqiops> wrandiops=<wrandiops>" > /sys/fs/cgroup/blkio/blkio.cost.model. See Configure the blk-iocost weight-based throttling feature. | Yes | io.cost.model |
| blkio.cost.qos | Controls the blk-iocost feature and configures a QoS policy to detect disk congestion. Exists only in the root cgroup. Format: echo "<major>:<minor> enable=1 ctrl=user rpct= rlat=5000 wpct=95.00 wlat=5000 min=50.00 max=150.00" > /sys/fs/cgroup/blkio/blkio.cost.qos. See Configure the blk-iocost weight-based throttling feature. | Yes | io.cost.qos |
| blkio.cost.weight | Specifies the cgroup weight for blk-iocost. Exists only in non-root cgroups. Modes: weight (same weight for all devices) or major:minor weight (per-device weight). See Configure the blk-iocost weight-based throttling feature. | Yes | io.cost.weight |
| blkio.cost.stat | Queries blk-iocost statistics. Exists only in non-root cgroups. | Yes | N/A |
cgroup v2 interfaces
| Interface name | Purpose | In-house interface | Corresponding cgroup v1 interface |
|---|---|---|---|
| io.max | Throttling interface. Specifies read and write limits in bytes/second and IOPS. Format: echo "<major>:<minor> rbps=<bps> wbps=<bps> riops=<iops> wiops=<iops>" > /sys/fs/cgroup/<cgroup>/io.max | No | blkio.throttle.read_bps_device, blkio.throttle.read_iops_device, blkio.throttle.write_bps_device, blkio.throttle.write_iops_device |
| io.stat | Queries I/O statistics per device, including cumulative bytes and I/O operation counts for reads, writes, and discards. | No | blkio.throttle.io_service_bytes, blkio.throttle.io_serviced |
| io.extstat | Queries extended I/O statistics, including wait time, service time, number of completed I/O operations, and throttling rates. | No | blkio.throttle.io_service_time, blkio.throttle.io_wait_time, blkio.throttle.io_completed, blkio.throttle.total_bytes_queued, blkio.throttle.total_io_queued |
| io.cost.model | Specifies the blk-iocost cost model. The control mode (ctrl) can be auto or user. Exists only in the root cgroup. Format: echo "<major>:<minor> ctrl=user model=linear rbps=<rbps> rseqiops=<rseqiops> rrandiops=<rrandiops> wbps=<wbps> wseqiops=<wseqiops> wrandiops=<wrandiops>" > /sys/fs/cgroup/io.cost.model. See Configure the blk-iocost weight-based throttling feature. | No | blkio.cost.model |
| io.cost.qos | Controls the blk-iocost feature and configures a QoS policy to detect disk congestion. Exists only in the root cgroup. Format: echo "<major>:<minor> enable=1 ctrl=user rpct= rlat=5000 wpct=95.00 wlat=5000 min=50.00 max=150.00" > /sys/fs/cgroup/io.cost.qos. See Configure the blk-iocost weight-based throttling feature. | No | blkio.cost.qos |
| io.cost.weight | Specifies the cgroup weight for blk-iocost. Exists only in non-root cgroups. Modes: weight (same weight for all devices) or major:minor weight (per-device weight). See Configure the blk-iocost weight-based throttling feature. | No | blkio.cost.weight |
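The four per-device blkio.throttle.* files collapse into a single io.max line per device in v2. The sketch below builds that line from four v1 values. It assumes that a v1 value of 0 means "no limit" and maps it to the v2 keyword max; the function names limit_or_max and io_max_line are hypothetical.

```shell
# Hypothetical helper: translate a v1 throttle value into its io.max form.
# Assumption: 0 means "no limit" in v1, which v2 writes as "max".
limit_or_max() {
    if [ "$1" -eq 0 ]; then echo "max"; else echo "$1"; fi
}

# Hypothetical helper: build one io.max line for a device.
# Arguments: <major>:<minor> read_bps write_bps read_iops write_iops
io_max_line() {
    device="$1"
    rbps=$(limit_or_max "$2")
    wbps=$(limit_or_max "$3")
    riops=$(limit_or_max "$4")
    wiops=$(limit_or_max "$5")
    echo "$device rbps=$rbps wbps=$wbps riops=$riops wiops=$wiops"
}

io_max_line 8:16 1048576 0 100 0
# prints "8:16 rbps=1048576 wbps=max riops=100 wiops=max"
```

The device number 8:16 is only a placeholder; the generated line is what would be written to io.max using the echo format shown in the table above.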
memory
cgroup v1 interfaces
| Interface name | Purpose | In-house interface | Corresponding cgroup v2 interface |
|---|---|---|---|
| memory.usage_in_bytes | Queries the current memory usage. | No | N/A |
| memory.max_usage_in_bytes | Queries the peak memory usage. | No | N/A |
| memory.limit_in_bytes | Sets a hard upper limit on memory usage. | No | memory.max |
| memory.soft_limit_in_bytes | Sets a soft limit on memory usage. | No | N/A |
| memory.failcnt | Queries the number of times memory usage reached the upper limit. | No | N/A |
| memory.mglru_batch_size | Specifies the batch size for proactive memory reclamation based on the Multi-Generational Least Recently Used (MGLRU) framework. The kernel attempts to release CPUs between reclamation batches. | Yes | N/A |
| memory.mglru_reclaim_kbytes | Specifies the amount of memory to reclaim proactively based on the MGLRU framework. | Yes | N/A |
| memory.wmark_ratio | Controls the memcg backend asynchronous reclaim feature by setting the memory watermark that triggers asynchronous reclamation. Unit: percentage of the memcg memory limit. Valid values: 0–100. When 0 (default), asynchronous reclaim is disabled. See Memcg backend asynchronous reclaim. | Yes | memory.wmark_ratio |
| memory.wmark_high | Read-only. When memory usage exceeds this value, backend asynchronous reclamation starts. Calculated as: memory.limit_in_bytes × memory.wmark_ratio / 100. Not stored in the memcg root directory. See Memcg backend asynchronous reclaim. | Yes | memory.wmark_high |
| memory.wmark_low | Read-only. When memory usage falls below this value, backend asynchronous reclamation ends. Calculated as: memory.wmark_high - memory.limit_in_bytes × memory.wmark_scale_factor / 10000. Not stored in the memcg root directory. See Memcg backend asynchronous reclaim. | Yes | memory.wmark_low |
| memory.wmark_scale_factor | Specifies the interval between memory.wmark_high and memory.wmark_low. Unit: 0.01% of the memcg memory limit. Valid values: 1–1000. Default: 50 (0.50% of the limit), inherited from the parent group. Not stored in the memcg root directory. See Memcg backend asynchronous reclaim. | Yes | memory.wmark_scale_factor |
| memory.wmark_min_adj | Adjusts the global minimum watermark for this memcg. Valid values: -25–50. Default: 0, inherited from the parent cgroup. Negative values adjust within the [0, WMARK_MIN] range; positive values adjust within the [WMARK_MIN, WMARK_LOW] range. When the adjusted minimum watermark is exceeded, throttling occurs; throttling time is linearly proportional to excess memory usage (1–1000 ms). See Memcg global minimum watermark rating. | Yes | memory.wmark_min_adj |
| memory.force_empty | Forcefully reclaims memory pages from the cgroup. | No | N/A |
| memory.use_hierarchy | Controls whether hierarchical memory accounting is enabled. | Yes | N/A |
| memory.swappiness | Sets the swappiness parameter for vmscan, which controls how aggressively the kernel uses the swap partition. | No | N/A |
| memory.priority | Sets the memcg out-of-memory (OOM) priority. Valid values: 0–12. A higher value indicates higher priority. Default: 0. Not inherited by descendant cgroups. Priority sorting applies only to sibling cgroups within the same parent; sibling memcgs with the same priority are sorted by memory usage, with the largest consumer triggering an OOM error first. See Memcg OOM priority policy. | Yes | memory.priority |
| memory.move_charge_at_immigrate | Controls whether a task's memory charges move with it when migrated between cgroups. | No | N/A |
| memory.oom_control | Controls whether the OOM killer terminates tasks when an OOM error occurs, and generates OOM status notifications. | No | N/A |
| memory.oom.group | Enables the OOM group feature, which terminates all tasks in a memcg when an OOM error occurs. | Yes | memory.oom.group |
| memory.pressure_level | Configures memory pressure notifications. | No | N/A |
| memory.kmem.limit_in_bytes | Sets a hard limit on kernel memory usage. | No | N/A |
| memory.kmem.usage_in_bytes | Queries current kernel memory usage. | No | N/A |
| memory.kmem.failcnt | Queries the number of times kernel memory usage reached the upper limit. | No | N/A |
| memory.kmem.max_usage_in_bytes | Queries the peak kernel memory usage. | No | N/A |
| memory.kmem.slabinfo | Queries kernel slab memory usage. | No | N/A |
| memory.kmem.tcp.limit_in_bytes | Sets a hard limit on kernel TCP memory usage. | No | N/A |
| memory.kmem.tcp.usage_in_bytes | Queries current kernel TCP memory usage. | No | N/A |
| memory.kmem.tcp.failcnt | Queries the number of times kernel TCP memory usage reached the upper limit. | No | N/A |
| memory.kmem.tcp.max_usage_in_bytes | Queries the peak kernel TCP memory usage. | No | N/A |
| memory.memsw.usage_in_bytes | Queries combined memory and swap usage. | No | N/A |
| memory.memsw.max_usage_in_bytes | Queries the peak combined memory and swap usage. | No | N/A |
| memory.memsw.limit_in_bytes | Sets an upper limit on the total memory and swap usage of tasks in the cgroup. | No | memory.swap.max (limits swap only) |
| memory.memsw.failcnt | Queries the number of times the combined memory and swap usage reached the upper limit. | No | N/A |
| memory.swap.high | Sets an upper limit on swap usage in a cgroup. | Yes | memory.swap.high |
| memory.swap.events | Queries events that occurred when swap usage reached the upper limit. | Yes | memory.swap.events |
| memory.min | Sets a minimum memory amount that the cgroup must retain (hard guarantee). See Memcg QoS feature of the cgroup v1 interface. | Yes | memory.min |
| memory.low | Sets a minimum memory amount that the cgroup should retain (soft guarantee). See Memcg QoS feature of the cgroup v1 interface. | Yes | memory.low |
| memory.high | Sets a throttle limit on memory usage. See Memcg QoS feature of the cgroup v1 interface. | Yes | memory.high |
| memory.allow_duptext | Controls the code duptext feature for tasks in a specific memcg, when the global code duptext feature is enabled via /sys/kernel/mm/duptext/enabled. Valid values: 0 (disabled, default) or 1 (enabled). See Code duptext feature. | Yes | memory.allow_duptext |
| memory.allow_duptext_refresh | Controls whether the code duptext feature starts immediately when a binary file is generated or downloaded. Uses an asynchronous task mode to refresh tasks when code duptext does not take effect in PageDirty or PageWriteback scenarios. | Yes | memory.allow_duptext_refresh |
| memory.duptext_nodes | Limits the memory nodes used for duptext memory allocation. | Yes | memory.duptext_nodes |
| memory.allow_text_unevictable | Controls whether the memcg code snippet is locked (unevictable). | Yes | memory.allow_text_unevictable |
| memory.text_unevictable_percent | Specifies the ratio of locked memcg code snippet memory to total memcg code memory. | Yes | memory.text_unevictable_percent |
| memory.thp_reclaim | Controls the Transparent Huge Pages (THP) reclaim feature. Valid values: reclaim (enabled), swap (reserved for future use), disable (disabled, default). See THP reclaim. | Yes | memory.thp_reclaim |
| memory.thp_reclaim_stat | Queries THP reclaim status. Reports per-NUMA-node counts for queue_length (THPs in the reclaim queue), split_hugepage (THPs split by reclaim), and reclaim_subpage (zero subpages reclaimed). See THP reclaim. | Yes | memory.thp_reclaim_stat |
| memory.thp_reclaim_ctrl | Configures how THP reclaim is triggered. Parameters: threshold (maximum zero subpages in a THP before reclaim is triggered; default: 16) and reclaim (triggers reclaim). See THP reclaim. | Yes | memory.thp_reclaim_ctrl |
| memory.thp_control | Controls the memcg THP feature. Prevents use of anonymous, shmem, and file THPs, reducing THP contention and memory waste for offline memcgs. | Yes | memory.thp_control |
| memory.reclaim_caches | Controls whether the kernel proactively reclaims cache in memcgs. Example: echo 100M > memory.reclaim_caches. | Yes | memory.reclaim_caches |
| memory.pgtable_bind | Controls whether page table memory is forcefully allocated on the current node. | Yes | memory.pgtable_bind |
| memory.pgtable_misplaced | Queries statistics about page table memory allocated across NUMA nodes. | Yes | memory.pgtable_misplaced |
| memory.oom_offline | Marks a memcg as belonging to an offline task in the Quick OOM feature. | Yes | memory.oom_offline |
| memory.async_fork | Controls the Async-fork feature (formerly fast convergent merging, or FCM) for memcgs. | Yes | memory.async_fork |
| memory.direct_compact_latency | Specifies the latency threshold for direct memory compaction in the memsli feature. | Yes | memory.direct_compact_latency |
| memory.direct_reclaim_global_latency | Specifies the latency threshold for direct global memory reclamation in the memsli feature. | Yes | memory.direct_reclaim_global_latency |
| memory.direct_reclaim_memcg_latency | Specifies the latency threshold for direct memcg memory reclamation in the memsli feature. | Yes | memory.direct_reclaim_memcg_latency |
| memory.direct_swapin_latency | Specifies the latency threshold for direct memory swap-in in the memsli feature. | Yes | memory.direct_swapin_latency |
| memory.direct_swapout_global_latency | Specifies the latency threshold for direct global memory swap-out in the memsli feature. | Yes | memory.direct_swapout_global_latency |
| memory.direct_swapout_memcg_latency | Specifies the latency threshold for direct memcg memory swap-out in the memsli feature. | Yes | memory.direct_swapout_memcg_latency |
| memory.exstat | Queries extended memory statistics for in-house features: wmark_min_throttled_ms (throttling time since the adjusted minimum watermark was exceeded), wmark_reclaim_work_ms (duration of kernel memory reclamation attempts), unevictable_text_size_kb (size of locked code snippets), and pagecache_limit_reclaimed_kb (page cache limit reclaimed). See Memcg Exstat feature. | Yes (self-developed enhancement) | memory.exstat |
| memory.idle_page_stats | Queries kidled memory usage statistics for a memcg and its cgroup hierarchy. | Yes | memory.idle_page_stats |
| memory.idle_page_stats.local | Queries kidled memory usage statistics for a memcg only (non-hierarchical). | Yes | memory.idle_page_stats.local |
| memory.numa_stat | Queries NUMA statistics for anonymous, file, and locked memory. | No | memory.numa_stat |
| memory.pagecache_limit.enable | Controls the Page Cache Limit feature. See Page Cache Limit feature. | Yes | memory.pagecache_limit.enable |
| memory.pagecache_limit.size | Specifies the page cache size limit. See Page Cache Limit feature. | Yes | memory.pagecache_limit.size |
| memory.pagecache_limit.sync | Specifies the Page Cache Limit mode: synchronous or asynchronous. See Page Cache Limit feature. | Yes | memory.pagecache_limit.sync |
| memory.reap_background | Controls whether zombie memcg reapers reclaim memcg memory asynchronously in the background. | Yes | memory.reap_background |
| memory.stat | Queries memory statistics. | No | memory.stat |
| memory.use_priority_oom | Controls the memcg OOM priority policy feature. See Memcg OOM priority policy. | Yes | memory.use_priority_oom |
| memory.use_priority_swap | Controls whether memory is swapped based on cgroup priorities. See Memcg OOM priority policy. | Yes | memory.use_priority_swap |
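The watermark formulas given for memory.wmark_high and memory.wmark_low can be checked with a short calculation. The sketch below applies them with integer arithmetic; truncating the fractional part is an assumption made here, not a documented kernel behavior, and the function names wmark_high and wmark_low are hypothetical.

```shell
# Hypothetical helpers that apply the formulas from the table above.
# Assumption: results are truncated to whole bytes.

# wmark_high = limit * wmark_ratio / 100
wmark_high() {
    limit_bytes="$1"
    ratio="$2"
    echo $(( limit_bytes * ratio / 100 ))
}

# wmark_low = wmark_high - limit * wmark_scale_factor / 10000
wmark_low() {
    limit_bytes="$1"
    ratio="$2"
    scale_factor="$3"
    high=$(wmark_high "$limit_bytes" "$ratio")
    echo $(( high - limit_bytes * scale_factor / 10000 ))
}

# Example: 1 GiB limit, wmark_ratio 80, default wmark_scale_factor 50.
wmark_high 1073741824 80      # prints 858993459
wmark_low 1073741824 80 50    # prints 853624750
```

With these inputs, asynchronous reclaim would start at about 819 MiB of usage and stop about 5 MiB below that, which shows how wmark_scale_factor sets the width of the reclaim window.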
cgroup v2 interfaces
| Interface name | Purpose | In-house interface | Corresponding cgroup v1 interface |
|---|---|---|---|
| memory.current | Queries the current memory usage. | No | N/A |
| memory.min | Sets a minimum memory amount that the cgroup must retain (hard guarantee). See Memcg QoS feature of the cgroup v1 interface. | No | memory.min |
| memory.low | Sets a minimum memory amount that the cgroup should retain (soft guarantee). See Memcg QoS feature of the cgroup v1 interface. | No | memory.low |
| memory.high | Sets an upper throttle limit on memory usage. See Memcg QoS feature of the cgroup v1 interface. | No | memory.high |
| memory.max | Sets a hard limit on memory usage. | No | memory.limit_in_bytes |
| memory.swap.current | Queries current swap memory in use. | No | N/A |
| memory.swap.high | Sets an upper limit on swap usage in a cgroup. | No | N/A |
| memory.swap.max | Sets a hard limit on swap memory. | No | memory.memsw.limit_in_bytes |
| memory.swap.events | Queries events that occurred when swap usage reached the upper limit. | No | N/A |
| memory.oom.group | Controls the OOM group feature, which terminates all tasks in a memcg when an OOM error occurs. | No | memory.oom.group |
| memory.wmark_ratio | Controls the memcg backend asynchronous reclaim feature. Unit: percentage of the memcg memory limit. Valid values: 0–100. Default: 0 (disabled). See Memcg backend asynchronous reclaim. | Yes | memory.wmark_ratio |
| memory.wmark_high | Read-only. When memory usage exceeds this value, backend asynchronous reclamation starts. Calculated as: memory.limit_in_bytes × memory.wmark_ratio / 100. Not stored in the memcg root directory. See Memcg backend asynchronous reclaim. | Yes | memory.wmark_high |
| memory.wmark_low | Read-only. When memory usage falls below this value, backend asynchronous reclamation ends. Calculated as: memory.wmark_high - memory.limit_in_bytes × memory.wmark_scale_factor / 10000. Not stored in the memcg root directory. See Memcg backend asynchronous reclaim. | Yes | memory.wmark_low |
| memory.wmark_scale_factor | Specifies the interval between memory.wmark_high and memory.wmark_low. Unit: 0.01% of the memcg memory limit. Valid values: 1–1000. Default: 50 (0.50% of the limit), inherited from the parent group. Not stored in the memcg root directory. See Memcg backend asynchronous reclaim. | Yes | memory.wmark_scale_factor |
| memory.wmark_min_adj | Adjusts the global minimum watermark for this memcg. Valid values: -25–50. Default: 0, inherited from the parent cgroup. See Memcg global minimum watermark rating. | Yes | memory.wmark_min_adj |
| memory.priority | Sets the memcg OOM priority. Valid values: 0–12. A higher value indicates higher priority. Default: 0. Not inherited by descendant cgroups. See Memcg OOM priority policy. | Yes | memory.priority |
| memory.use_priority_oom | Controls the memcg OOM priority policy feature. See Memcg OOM priority policy. | Yes | memory.use_priority_oom |
| memory.use_priority_swap | Controls whether memory is swapped based on cgroup priorities. See Memcg OOM priority policy. | Yes | memory.use_priority_swap |
| memory.direct_reclaim_global_latency | Specifies the latency threshold for direct global memory reclamation in the memsli feature. | Yes | memory.direct_reclaim_global_latency |
| memory.direct_reclaim_memcg_latency | Specifies the latency threshold for direct memcg memory reclamation in the memsli feature. | Yes | memory.direct_reclaim_memcg_latency |
| memory.direct_compact_latency | Specifies the latency threshold for direct memory compaction in the memsli feature. | Yes | memory.direct_compact_latency |
| memory.direct_swapout_global_latency | Specifies the latency threshold for direct global memory swap-out in the memsli feature. | Yes | memory.direct_swapout_global_latency |
| memory.direct_swapout_memcg_latency | Specifies the latency threshold for direct memcg memory swap-out in the memsli feature. | Yes | memory.direct_swapout_memcg_latency |
| memory.direct_swapin_latency | Specifies the latency threshold for direct memory swap-in in the memsli feature. | Yes | memory.direct_swapin_latency |
| memory.exstat | Queries extended memory statistics for in-house features: wmark_min_throttled_ms, wmark_reclaim_work_ms, unevictable_text_size_kb, and pagecache_limit_reclaimed_kb. See Memcg Exstat. | Yes | memory.exstat |
| memory.pagecache_limit.enable | Controls the Page Cache Limit feature. See Page Cache Limit feature. | Yes | memory.pagecache_limit.enable |
| memory.pagecache_limit.size | Specifies the page cache size limit. See Page Cache Limit feature. | Yes | memory.pagecache_limit.size |
| memory.pagecache_limit.sync | Specifies the Page Cache Limit mode: synchronous or asynchronous. See Page Cache Limit feature. | Yes | memory.pagecache_limit.sync |
| memory.idle_page_stats | Queries kidled memory usage statistics for individual memcgs at each hierarchy level. | Yes | memory.idle_page_stats |
| memory.idle_page_stats.local | Queries kidled memory usage statistics for individual memcgs (non-hierarchical). | Yes | memory.idle_page_stats.local |
| memory.numa_stat | Queries NUMA statistics for anonymous, file, and locked memory. | Yes | memory.numa_stat |
| memory.reap_background | Controls whether zombie memcg reapers reclaim memcg memory asynchronously in the background. | Yes | memory.reap_background |
| memory.stat | Queries memory statistics. | No | memory.stat |
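
The formulas given for memory.wmark_high and memory.wmark_low in the table above can be checked with a small calculation. The following Python sketch is illustrative only: the function name `wmark_thresholds` is hypothetical, and the use of integer division is an assumption, since the table does not say how the kernel rounds these values.

```python
from typing import Optional, Tuple


def wmark_thresholds(
    limit_bytes: int, wmark_ratio: int, wmark_scale_factor: int = 50
) -> Optional[Tuple[int, int]]:
    """Return (wmark_high, wmark_low) in bytes, or None when reclaim is disabled.

    Formulas as stated in the table:
      wmark_high = limit * wmark_ratio / 100
      wmark_low  = wmark_high - limit * wmark_scale_factor / 10000
    Integer division is an assumption of this sketch, not documented behavior.
    """
    if not 0 <= wmark_ratio <= 100:
        raise ValueError("wmark_ratio must be in the range 0-100")
    if not 1 <= wmark_scale_factor <= 1000:
        raise ValueError("wmark_scale_factor must be in the range 1-1000")
    if wmark_ratio == 0:
        # A ratio of 0 means backend asynchronous reclaim is disabled.
        return None
    high = limit_bytes * wmark_ratio // 100
    low = high - limit_bytes * wmark_scale_factor // 10000
    return high, low


if __name__ == "__main__":
    # A limit of 1,000,000,000 bytes with ratio 80 and the default scale factor 50.
    print(wmark_thresholds(1_000_000_000, 80))
```

With a limit of 1,000,000,000 bytes, a ratio of 80, and the default scale factor of 50, the gap between the two watermarks is 0.50% of the limit, that is, 5,000,000 bytes.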
cpuacct
The cpuacct subsystem exists only in cgroup v1. In cgroup v2, CPU accounting is handled by the cpu subsystem.
| Interface name | Purpose | In-house interface | Corresponding cgroup v2 interface |
|---|---|---|---|
| cpuacct.usage | Queries the total CPU time used. Unit: nanoseconds. | No | cpu.stat (similar data) |
| cpuacct.usage_user | Queries the CPU time used in user mode. Unit: nanoseconds. | No | |
| cpuacct.usage_sys | Queries the CPU time used in kernel mode. Unit: nanoseconds. | No | |
| cpuacct.usage_percpu | Queries the CPU time used per CPU. Unit: nanoseconds. | No | |
| cpuacct.usage_percpu_user | Queries the per-CPU time used in user mode. Unit: nanoseconds. | No | |
| cpuacct.usage_percpu_sys | Queries the per-CPU time used in kernel mode. Unit: nanoseconds. | No | |
| cpuacct.usage_all | Queries a summary of cpuacct.usage_percpu_user and cpuacct.usage_percpu_sys. Unit: nanoseconds. | No | |
| cpuacct.stat | Queries CPU time used in user mode and kernel mode. Unit: ticks. | No | |
| cpuacct.proc_stat | Queries container-level data including CPU time, average load, and running task count. | Yes | |
| cpuacct.enable_sli | Controls whether container-level load average counting is enabled. | Yes | N/A |
| cpuacct.sched_cfs_statistics | Queries CFS statistics, including cgroup runtime and wait time relative to sibling and non-sibling cgroups. | Yes | cpu.sched_cfs_statistics |
| cpuacct.wait_latency | Queries the latency of tasks waiting in the run queue. | Yes | cpu.wait_latency |
| cpuacct.cgroup_wait_latency | Queries the latency of cgroups waiting in the run queue. Tracks the group sched_entity, while cpuacct.wait_latency tracks the task sched_entity. | Yes | cpu.cgroup_wait_latency |
| cpuacct.block_latency | Queries the latency of tasks blocked for non-I/O reasons. | Yes | cpu.block_latency |
| cpuacct.ioblock_latency | Queries the latency of tasks blocked due to I/O operations. | Yes | cpu.ioblock_latency |
| io.pressure, memory.pressure, cpu.pressure | Queries PSI for I/O, memory, and CPU. Supports polling. See psi.rst and Enable the PSI feature for cgroup v1. | No | N/A |
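
Because the table maps cpuacct.usage to cpu.stat only as "similar data", a reader migrating a monitoring script may want to see how the units line up. The sketch below parses cpu.stat text and converts the upstream kernel fields `usage_usec`, `user_usec`, and `system_usec` (microseconds) to nanoseconds. The function names are hypothetical, and pairing `user_usec` and `system_usec` with cpuacct.usage_user and cpuacct.usage_sys is an assumption of this example, not something the table states.

```python
def parse_cpu_stat(text: str) -> dict:
    """Parse the flat 'key value' lines of a cgroup v2 cpu.stat file."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        # Keep only simple 'name integer' pairs; skip anything else.
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def v1_style_usage_ns(stats: dict) -> dict:
    """Convert cpu.stat microsecond counters to nanoseconds, keyed by v1 names.

    The key pairing is an assumption made for illustration.
    """
    return {
        "cpuacct.usage": stats["usage_usec"] * 1000,
        "cpuacct.usage_user": stats["user_usec"] * 1000,
        "cpuacct.usage_sys": stats["system_usec"] * 1000,
    }


if __name__ == "__main__":
    sample = "usage_usec 1500\nuser_usec 1000\nsystem_usec 500\n"
    print(v1_style_usage_ns(parse_cpu_stat(sample)))
```

On a real system the input would come from the cpu.stat file of the target cgroup. Per-CPU breakdowns such as cpuacct.usage_percpu are not covered by this sketch.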
freezer
The freezer subsystem exists only in cgroup v1. In cgroup v2, the equivalent functionality is provided by cgroup.freeze in the general interface.
| Interface name | Purpose | In-house interface | Corresponding cgroup v2 interface |
|---|---|---|---|
| freezer.state | Controls the freeze state. Valid values: FROZEN and THAWED. | No | cgroup.freeze |
| freezer.self_freezing | Queries whether a cgroup is frozen due to its own frozen state. | No | N/A |
| freezer.parent_freezing | Queries whether a cgroup is frozen because an ancestor cgroup is frozen. | No | N/A |
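
The mapping from freezer.state to cgroup.freeze also changes the value format. In the upstream kernel, cgroup.freeze accepts `1` to freeze and `0` to thaw, whereas freezer.state uses the strings FROZEN and THAWED. The following sketch shows one way a script might translate between the two. The helper names and the directory argument are hypothetical.

```python
from pathlib import Path

# v1 freezer.state values mapped to v2 cgroup.freeze values.
FREEZER_TO_V2 = {"FROZEN": "1", "THAWED": "0"}


def freezer_state_to_cgroup_freeze(state: str) -> str:
    """Translate a v1 freezer.state value into the v2 cgroup.freeze value."""
    key = state.strip().upper()
    if key not in FREEZER_TO_V2:
        raise ValueError(f"unsupported freezer.state value: {state!r}")
    return FREEZER_TO_V2[key]


def write_cgroup_freeze(cgroup_dir: str, v1_state: str) -> None:
    """Write the translated value to cgroup.freeze inside cgroup_dir.

    cgroup_dir is a hypothetical path to a cgroup v2 directory; writing
    there normally requires sufficient privileges.
    """
    value = freezer_state_to_cgroup_freeze(v1_state)
    Path(cgroup_dir, "cgroup.freeze").write_text(value)
```

The translation is kept separate from the file write so that it can be tested without touching a real cgroup hierarchy.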
ioasids
The ioasids subsystem interfaces are the same in cgroup v1 and cgroup v2.
| Interface name | Purpose | In-house interface |
|---|---|---|
| ioasids.current | Queries the number of ioasids allocated to the current cgroup. | Yes |
| ioasids.events | Queries the number of events that occurred because the upper limit of allocable ioasids was exceeded. | Yes |
| ioasids.max | Queries the total number of ioasids that can be allocated to the current cgroup. | Yes |
net_cls and net_prio
The net_cls and net_prio interfaces are removed in cgroup v2. Use eBPF to filter and shape network traffic instead.
| Interface name | Purpose | In-house interface | Corresponding cgroup v2 interface |
|---|---|---|---|
| net_cls.classid | Specifies the class identifier that tags network packets from the current cgroup. Works with qdisc or iptables. | No | N/A |
| net_prio.prioidx | Queries the index value of the current cgroup in the internal data structure. Read-only; used internally by the kernel. | No | N/A |
| net_prio.ifpriomap | Specifies the network priority for each network interface controller (NIC). | No | N/A |
perf_event
The perf_event subsystem provides no interfaces in either cgroup v1 or cgroup v2. It is enabled by default in cgroup v2 and provides the same functionality as in cgroup v1.
pids
The pids subsystem interfaces are the same in cgroup v1 and cgroup v2.
| Interface name | Purpose | In-house interface |
|---|---|---|
| pids.max | Specifies the maximum number of tasks in a cgroup. | No |
| pids.current | Queries the current number of tasks in a cgroup. | No |
| pids.events | Queries the number of events where a fork operation failed because the maximum task count was reached. Supports fsnotify for filesystem notifications. | No |
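
As a rough illustration of how the three pids files fit together, the sketch below computes the remaining task headroom from pids.current and pids.max and reads the failure counter from pids.events. It assumes the upstream kernel formats: pids.max holds either an integer or the literal string `max`, and pids.events holds lines of the form `max <count>`. The function names are hypothetical.

```python
import math


def parse_pids_max(text: str) -> float:
    """Return the task limit; the literal 'max' means no limit (infinity)."""
    value = text.strip()
    return math.inf if value == "max" else int(value)


def parse_pids_events(text: str) -> dict:
    """Parse 'key count' lines, for example 'max 3', into a dictionary."""
    events = {}
    for line in text.splitlines():
        key, _, count = line.strip().partition(" ")
        if count:
            events[key] = int(count)
    return events


def remaining_tasks(current_text: str, max_text: str) -> float:
    """Return how many more tasks can be created before pids.max is reached."""
    return parse_pids_max(max_text) - int(current_text.strip())
```

A monitoring script could read the three files from a cgroup directory, pass their contents to these helpers, and raise an alert when the headroom is small or the `max` event counter increases.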
rdma
The rdma subsystem interfaces are the same in cgroup v1 and cgroup v2.
| Interface name | Purpose | In-house interface |
|---|---|---|
| rdma.max | Specifies the upper limit on Remote Direct Memory Access (RDMA) adapter resource usage. | No |
| rdma.current | Queries the current RDMA adapter resource usage. | No |
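
Both rdma.max and rdma.current use a per-device, key-value line format in the upstream kernel, for example `mlx4_0 hca_handle=2 hca_object=2000`, where a value of `max` means no limit. The sketch below parses that format. The function name is hypothetical, and representing `max` as `None` is a choice made for this example.

```python
def parse_rdma(text: str) -> dict:
    """Parse rdma.max or rdma.current content into {device: {resource: value}}.

    A value of 'max' (no limit) is represented as None in this sketch.
    """
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        device, limits = fields[0], {}
        for item in fields[1:]:
            key, _, value = item.partition("=")
            limits[key] = None if value == "max" else int(value)
        result[device] = limits
    return result


if __name__ == "__main__":
    sample = "mlx4_0 hca_handle=2 hca_object=2000\nocrdma1 hca_handle=3 hca_object=max\n"
    print(parse_rdma(sample))
```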