Alibaba Cloud Linux 2 that uses the kernel version kernel-4.19.91-24.al7 or later and Alibaba Cloud Linux 3 that uses the kernel version kernel-5.10.46-7.al8 or later support the group identity feature. You can use the group identity feature to configure different identities for CPU control groups (cgroups) to define the priorities of process tasks in the cgroups.
Prerequisites
- Alibaba Cloud Linux 2 that uses the kernel version kernel-4.19.91-26, kernel-4.19.91-26.1, kernel-4.19.91-26.2, or kernel-4.19.91-26.3 does not support the group identity feature because the feature has been disabled in the kernel. You can run the uname -r command to view the kernel version of Alibaba Cloud Linux 2.
- Alibaba Cloud Linux 3 that uses the kernel version kernel-5.10.112-11.al8, kernel-5.10.112-11.1.al8, kernel-5.10.112-11.2.al8, kernel-5.10.134-12.al8, kernel-5.10.134-12.1.al8, or kernel-5.10.134-12.2.al8 does not support the group identity feature because the feature has been disabled in the kernel. You can run the uname -r command to view the kernel version of Alibaba Cloud Linux 3.
- For Alibaba Cloud Linux 2 that uses a kernel version in the range of kernel-4.19.91-25.1.al7 to kernel-4.19.91-25.5.al7, if you use the group identity feature, downtime occurs. You must upgrade the kernel version to kernel-4.19.91-25.6.al7 or later. For more information, see the "FAQ" section of this topic.
- If your Alibaba Cloud Linux 3 uses the kernel version kernel-5.10.134-12.2.al8 and the x86_64 architecture, run the following commands to use the group identity feature:
  yum makecache
  yum install scheduler-group-identity.x86_64
- For Alibaba Cloud Linux 2 that uses the kernel version kernel-4.19.91-26.4 or later and Alibaba Cloud Linux 3 that uses the kernel version kernel-5.10.134-13.al8 or later, the /proc/sys/kernel/sched_group_identity_enabled interface is added for you to enable the group identity feature. Before you can use the group identity feature, you must run the echo 1 > /proc/sys/kernel/sched_group_identity_enabled command to enable the feature.
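
The first two prerequisites can be checked against the output of uname -r. The following is a minimal Python sketch, not an official tool. It assumes that the release string looks like 4.19.91-26.1.al7.x86_64 (an optional kernel- prefix, then the version, then optional .al7/.al8 and architecture suffixes). The names DISABLED_RELEASES, normalize_release, and feature_disabled_in_kernel are hypothetical.

```python
import re

# Kernel versions listed above in which the feature is disabled,
# written without the "kernel-" prefix and the ".al7"/".al8" suffix.
DISABLED_RELEASES = {
    "4.19.91-26", "4.19.91-26.1", "4.19.91-26.2", "4.19.91-26.3",
    "5.10.112-11", "5.10.112-11.1", "5.10.112-11.2",
    "5.10.134-12", "5.10.134-12.1", "5.10.134-12.2",
}


def normalize_release(release):
    """Strip the prefix and suffixes that are assumed to surround the version."""
    release = release.strip()
    if release.startswith("kernel-"):
        release = release[len("kernel-"):]
    release = re.sub(r"\.(x86_64|aarch64)$", "", release)  # assumed arch suffixes
    release = re.sub(r"\.al[78]$", "", release)             # distribution suffix
    return release


def feature_disabled_in_kernel(release):
    """Return True if the release is one of the versions listed as disabled."""
    return normalize_release(release) in DISABLED_RELEASES
```

For example, feature_disabled_in_kernel("4.19.91-26.1.al7.x86_64") returns True. The sketch covers only the "disabled in the kernel" lists. It does not model the remaining prerequisites, such as the scheduler-group-identity package or the sched_group_identity_enabled switch.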
Background information
- High-priority tasks have the minimum wakeup latency.
- Low-priority tasks do not affect the performance of high-priority tasks. Specifically:
  - The wakeup of low-priority tasks does not affect the performance of high-priority tasks.
  - Low-priority tasks do not affect the performance of high-priority tasks by sharing hardware units.
How the group identity feature works
The group identity feature allows you to configure identities for CPU cgroups to define the priorities of tasks in the cgroups. The group identity feature relies on a dual red-black tree architecture. A low-priority red-black tree is added based on the red-black tree of the Completely Fair Scheduler (CFS) scheduling queue to store low-priority tasks.
Identity | Description |
---|---|
ID_HIGHCLASS | Identifies a high-priority task. A high-priority task has more opportunities to preempt resources than a normal- or low-priority task. When the CFS schedules high-priority tasks, the following situations may occur: |
ID_NORMAL | Identifies a normal-priority task. A normal-priority task has more opportunities to preempt resources than a low-priority task. When the CFS schedules normal-priority tasks, the following situations may occur: |
ID_UNDERCLASS | Identifies a low-priority task. When the CFS schedules low-priority tasks, the following situation may occur: If an |
- For tasks in cgroups of the same level, identity priorities take effect.
- For tasks in cgroups of different levels, the identity priority of the higher-level task does not take effect, and that of the lower-level task takes effect.
- For tasks with the same identity priority, the tasks compete for resources in compliance with CFS policies. Note that tasks identified by ID_UNDERCLASS or ID_NORMAL are not guaranteed to reach the minimum runtime.
Identity | Description |
---|---|
ID_SMT_EXPELLER | Identifies the SMT expeller. When an SMT expeller runs on an SMT CPU, it evicts the tasks that are identified by ID_UNDERCLASS from the peer CPU. |
ID_IDLE_SEEKER | Specifies that when a task wakes up, the task attempts to find idle CPUs within the limits of scheduler policies. |
ID_IDLE_SAVER | Used with the sched_idle_saver_wmark kernel parameter. You can use sched_idle_saver_wmark to set a water mark for CPU idle time. When a task identified by ID_IDLE_SAVER wakes up, the task attempts to find an idle CPU whose idle time exceeds the specified water mark. |
Interfaces
- Interfaces used to configure identities
  The group identity feature provides two interfaces for you to configure task identities: /sys/fs/cgroup/cpu/$cg/cpu.identity and /sys/fs/cgroup/cpu/$cg/cpu.bvt_warp_ns. The $cg variable specifies the child cgroup directory node where a task is located. Before you use the interfaces, take note of the following items:
  - The cpu.bvt_warp_ns interface is a quick configuration interface. The value written to this interface is converted to an identity value.
  - Both the cpu.identity and cpu.bvt_warp_ns interfaces can be used to change the identities of cgroups.
  - An identity value that is written by using the cpu.identity interface overwrites the last identity value written by using the cpu.bvt_warp_ns interface. The value of the cpu.bvt_warp_ns interface remains unchanged.
  - An identity value that is written by using the cpu.bvt_warp_ns interface overwrites the last identity value written by using the cpu.identity interface. The value of the cpu.identity interface remains unchanged.
  - You can use one of the interfaces to configure task identities. We recommend that you do not configure both of the interfaces.
  - If you are unfamiliar with operations related to the operating system kernel, we recommend that you do not use the cpu.identity interface.
  Interface | Description |
  ---|---|
  cpu.identity | The default value is 0, which indicates the ID_NORMAL identity. The interface is a 5-bit field. Valid values of each bit: 0 and 1. 0 indicates not to assume the identity. 1 indicates to assume the identity. Description of each bit: If the interface value is left empty, it indicates the ID_NORMAL identity. Bit 0: indicates the ID_UNDERCLASS identity. Bit 1: indicates the ID_HIGHCLASS identity. Bit 2: indicates the ID_SMT_EXPELLER identity. Bit 3: indicates the ID_IDLE_SAVER identity. Bit 4: indicates the ID_IDLE_SEEKER identity. For example, if you want to set the identity of a cgroup to ID_HIGHCLASS and ID_IDLE_SEEKER, set bit 1 and bit 4 to 1 and the other bits to 0 to obtain a binary value of 10010, which is converted to a decimal value of 18. Then, run the echo 18 > /sys/fs/cgroup/cpu/$cg/cpu.identity command to write 18 to cpu.identity. |
  cpu.bvt_warp_ns | The default value is 0, which indicates the ID_NORMAL identity. Valid values: 2: indicates the ID_SMT_EXPELLER, ID_IDLE_SEEKER, and ID_HIGHCLASS identities. The corresponding value in cpu.identity is 22. 1: indicates the ID_HIGHCLASS and ID_IDLE_SEEKER identities. The corresponding value in cpu.identity is 18. 0: indicates the ID_NORMAL identity. The corresponding value in cpu.identity is 0. -1: indicates the ID_UNDERCLASS and ID_IDLE_SAVER identities. The corresponding value in cpu.identity is 9. -2: indicates the ID_UNDERCLASS and ID_IDLE_SAVER identities. The corresponding value in cpu.identity is 9. |
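
The bit arithmetic behind cpu.identity can be reproduced with a few lines of Python. The following is a minimal sketch that uses only the bit positions and the cpu.bvt_warp_ns mapping stated above. The names IDENTITY_BITS, BVT_WARP_NS_TO_IDENTITY, encode_identity, and decode_identity are hypothetical helpers, not part of any kernel or Alibaba Cloud tooling.

```python
# Bit position of each identity in the 5-bit cpu.identity field.
IDENTITY_BITS = {
    "ID_UNDERCLASS": 0,
    "ID_HIGHCLASS": 1,
    "ID_SMT_EXPELLER": 2,
    "ID_IDLE_SAVER": 3,
    "ID_IDLE_SEEKER": 4,
}

# cpu.bvt_warp_ns value -> corresponding cpu.identity value.
BVT_WARP_NS_TO_IDENTITY = {2: 22, 1: 18, 0: 0, -1: 9, -2: 9}


def encode_identity(*names):
    """Combine identity names into the decimal value to write to cpu.identity."""
    value = 0
    for name in names:
        if name == "ID_NORMAL":  # ID_NORMAL means that no bit is set
            continue
        value |= 1 << IDENTITY_BITS[name]
    return value


def decode_identity(value):
    """List the identity names that a cpu.identity value represents."""
    if value == 0:
        return ["ID_NORMAL"]
    return [name for name, bit in IDENTITY_BITS.items() if value & (1 << bit)]
```

For example, encode_identity("ID_HIGHCLASS", "ID_IDLE_SEEKER") yields 18, which matches the value written in the echo 18 example, and decode_identity(9) lists ID_UNDERCLASS and ID_IDLE_SAVER.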
- Interfaces used to enable or disable scheduling features
  You can run the following command to view the default settings of kernel scheduling features by using the sched_features interface:
  cat /sys/kernel/debug/sched_features
  The following table describes the scheduling features.
  Scheduling feature | Description | Default value |
  ---|---|---|
  ID_IDLE_AVG | This feature is used with the ID_IDLE_SAVER identity to count the runtime of ID_UNDERCLASS tasks towards idle time. This prevents resource waste by ensuring that no CPUs remain idle when only ID_UNDERCLASS tasks are running. | ID_IDLE_AVG: indicates that the feature is enabled. |
  ID_RESCUE_EXPELLEE | This feature is used in load balancing scenarios. If tasks cannot find available CPU resources, CPUs that are evicting ID_UNDERCLASS tasks are used for balancing loads. This feature helps ID_UNDERCLASS tasks get out of the evicted state as soon as possible. | ID_RESCUE_EXPELLEE: indicates that the feature is enabled. |
  ID_EXPELLEE_NEVER_HOT | After this feature is enabled, when a task that is being evicted decides whether to migrate to another CPU, hot cache will not be a reason for migration requests to be denied. This feature helps ID_UNDERCLASS tasks get out of the evicted state as soon as possible. | NO_ID_EXPELLEE_NEVER_HOT: indicates that the feature is disabled. |
  ID_LOOSE_EXPEL | When this feature is enabled, CPUs do not update their eviction states every time they select tasks but have the states automatically updated at the time specified by the sched_expel_update_interval kernel parameter. The configuration of this feature affects only state updates when CPUs select tasks, not the updates of IPI interrupts. | NO_ID_LOOSE_EXPEL: indicates that the feature is disabled. |
  ID_LAST_HIGHCLASS_STAY | When this feature is enabled, the last ID_HIGHCLASS task that runs on a CPU cannot be migrated to another CPU. | ID_LAST_HIGHCLASS_STAY: indicates that the feature is enabled. |
  ID_EXPELLER_SHARE_CORE | When this feature is enabled, ID_SMT_EXPELLER tasks can preferentially run on physical cores on which ID_SMT_EXPELLER tasks are already running. When this feature is disabled, ID_SMT_EXPELLER tasks are distributed across physical cores so that they do not interfere with each other. | ID_EXPELLER_SHARE_CORE: indicates that the feature is enabled. |
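
The sched_features output is a whitespace-separated list of feature names in which a NO_ prefix marks a disabled feature, as the default values above show. The following is a minimal Python sketch that turns such output into a dictionary. The function name parse_sched_features is hypothetical, and SAMPLE_OUTPUT is an illustrative string assembled from the defaults in the table, not captured output.

```python
def parse_sched_features(text):
    """Map each scheduling feature name to True (enabled) or False (disabled)."""
    features = {}
    for token in text.split():
        if token.startswith("NO_"):
            features[token[len("NO_"):]] = False  # "NO_X" means X is disabled
        else:
            features[token] = True
    return features


# Illustrative input only: built from the default values listed in the table.
SAMPLE_OUTPUT = (
    "ID_IDLE_AVG ID_RESCUE_EXPELLEE NO_ID_EXPELLEE_NEVER_HOT "
    "NO_ID_LOOSE_EXPEL ID_LAST_HIGHCLASS_STAY ID_EXPELLER_SHARE_CORE"
)

if __name__ == "__main__":
    for name, enabled in parse_sched_features(SAMPLE_OUTPUT).items():
        print(name, "enabled" if enabled else "disabled")
```

On a real instance, the input would be the content of /sys/kernel/debug/sched_features, which typically also contains features unrelated to group identity and usually requires root privileges to read.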
- Interfaces used by sysctl to configure kernel parameters
  Some capabilities of the group identity feature depend on the values of kernel parameters. The following table describes the parameters.
  Kernel parameter | Description | Unit | Default value |
  ---|---|---|---|
  /proc/sys/kernel/sched_expel_update_interval | The interval at which the eviction state is automatically updated when a CPU selects tasks. This parameter is valid only when the ID_LOOSE_EXPEL feature is enabled. | ms | 10 |
  /proc/sys/kernel/sched_expel_idle_balance_delay | The minimum idle balance interval when a CPU is evicting tasks. A value of -1 indicates that idle balance is not allowed. If only ID_UNDERCLASS tasks exist on a CPU and the tasks are being evicted, the CPU is idle. Idle balance is performed on this CPU to improve load-balancing effects. However, this may adversely affect ID_UNDERCLASS tasks. You can set the sched_expel_idle_balance_delay parameter to alleviate this issue. | ms | -1 |
  /proc/sys/kernel/sched_idle_saver_wmark | The watermark for CPU idle time. When an ID_IDLE_SAVER task wakes up, the task attempts to find an idle CPU whose idle time exceeds the specified watermark. | ns | 0 |
  /proc/sys/kernel/sched_group_identity_enabled | For the kernel version kernel-4.19.91-26.4 or later, /proc/sys/kernel/sched_group_identity_enabled is added for you to enable the group identity feature. Before you can use the group identity feature, you must run the echo 1 > /proc/sys/kernel/sched_group_identity_enabled command to enable the feature. After the group identity feature is enabled, if the cpu.bvt_warp_ns or cpu.identity value of the cgroup is not zero, data cannot be written to the /proc/sys/kernel/sched_group_identity_enabled interface. Note: Suppose that you use the kernel version 4.19.91-26.4.al7, 4.19.91-26.5.al7, or 4.19.91-26.6.al7, the sched_group_identity_enabled value is 1, and the cpu.bvt_warp_ns value of the cgroup is not zero. When you read the /proc/sys/kernel/sched_group_identity_enabled settings, errors are returned. This is a read issue that does not affect the normal use of the interface. The issue is fixed in kernel version 4.19.91-27.al7 and later. | N/A | 0 |
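
Because sched_group_identity_enabled cannot be written while a cgroup still has a non-zero cpu.identity or cpu.bvt_warp_ns value, it can help to list those cgroups first. The following is a minimal Python sketch under the assumption that the cgroup v1 CPU controller is mounted at /sys/fs/cgroup/cpu, as in the paths above. The function name find_cgroups_with_identity is hypothetical.

```python
from pathlib import Path


def find_cgroups_with_identity(cpu_root="/sys/fs/cgroup/cpu"):
    """Return cgroup directories whose cpu.identity or cpu.bvt_warp_ns is non-zero."""
    found = set()
    for filename in ("cpu.identity", "cpu.bvt_warp_ns"):
        for path in Path(cpu_root).rglob(filename):
            try:
                # An empty file is treated as 0 (ID_NORMAL).
                value = int(path.read_text().strip() or 0)
            except (OSError, ValueError):
                continue  # skip unreadable or unexpected content
            if value != 0:
                found.add(path.parent)
    return sorted(found)
```

If the function returns an empty list for the whole hierarchy, no cgroup carries a non-default identity. Whether the write to sched_group_identity_enabled then succeeds still depends on the kernel, so treat the result as a hint and not as a guarantee.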
Information output
cat /proc/sched_debug
The following table describes the output parameters.
Parameter | Description |
---|---|
nr_high_running | The number of ID_HIGHCLASS tasks that are running on the current CPU. |
nr_under_running | The number of ID_UNDERCLASS tasks that are running on the current CPU. |
nr_expel_immune | The number of non-ID_UNDERCLASS tasks that are running on the current CPU. |
smt_expeller | Indicates whether ID_SMT_EXPELLER tasks are running on the current CPU. A value of 1 indicates that ID_SMT_EXPELLER tasks are running on the current CPU. A value of 0 indicates that no ID_SMT_EXPELLER tasks are running on the current CPU. |
on_expel | Indicates whether ID_SMT_EXPELLER tasks are running on the peer SMT CPU. A value of 1 indicates that ID_SMT_EXPELLER tasks are running on the peer SMT CPU. A value of 0 indicates that no ID_SMT_EXPELLER tasks are running on the peer SMT CPU. |
high_exec_sum | The cumulative runtime of ID_HIGHCLASS tasks on the current CPU. |
under_exec_sum | The cumulative runtime of ID_UNDERCLASS tasks on the current CPU. |
h_nr_expel_immune | The number of non-ID_UNDERCLASS tasks that are running on cfs_rq. |
expel_start | The difference between the minimum vruntimes of the two red-black trees when the CPU starts to evict tasks. |
expel_spread | The cumulative difference between the minimum vruntimes of the two red-black trees caused by CPU eviction states. |
min_under_vruntime | The minimum vruntime of the low-priority red-black tree. |
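
The parameters in the table can be filtered out of the much longer /proc/sched_debug output. The following is a rough Python sketch. It assumes that each parameter appears on its own line in the form .name : value, which is how other sched_debug fields are commonly printed. That line format is an assumption and is not confirmed by this topic. The names GROUP_IDENTITY_FIELDS and extract_group_identity_fields are hypothetical.

```python
import re

# Parameter names taken from the table above.
GROUP_IDENTITY_FIELDS = {
    "nr_high_running", "nr_under_running", "nr_expel_immune",
    "smt_expeller", "on_expel", "high_exec_sum", "under_exec_sum",
    "h_nr_expel_immune", "expel_start", "expel_spread", "min_under_vruntime",
}

# Assumed line format: optional indentation, ".name", ":", value.
_FIELD_LINE = re.compile(r"^\s*\.(\w+)\s*:\s*(\S+)\s*$")


def extract_group_identity_fields(text):
    """Return (name, value) pairs for group identity fields, in order of appearance."""
    pairs = []
    for line in text.splitlines():
        match = _FIELD_LINE.match(line)
        if match and match.group(1) in GROUP_IDENTITY_FIELDS:
            pairs.append((match.group(1), match.group(2)))
    return pairs
```

Values are returned as strings because the fields mix counters and runtimes. The output repeats per CPU and per cfs_rq, so a field name can appear many times in the result.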
FAQ
How do I upgrade a kernel version in the range of kernel-4.19.91-25.1.al7 to kernel-4.19.91-25.5.al7 to kernel-4.19.91-25.6.al7 or later?
- Log on to the instance.
For more information, see Connect to a Linux instance by using a password or key.
- Run the following command to query the kernel version:
uname -r
- Run the following command to upgrade the kernel version:
yum update kernel
- Run the following command to restart the instance to make the new kernel version take effect:
reboot
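
Before you upgrade, you can check whether the output of uname -r falls in the affected range. The following is a minimal Python sketch that assumes a release string such as 4.19.91-25.3.al7.x86_64, with an optional kernel- prefix. The function name needs_upgrade is hypothetical.

```python
import re

# Matches 4.19.91-25.N.al7 with an optional "kernel-" prefix; N is the minor revision.
_AFFECTED = re.compile(r"^(?:kernel-)?4\.19\.91-25\.(\d+)\.al7")


def needs_upgrade(release):
    """Return True for kernel-4.19.91-25.1.al7 through kernel-4.19.91-25.5.al7."""
    match = _AFFECTED.match(release.strip())
    return bool(match) and 1 <= int(match.group(1)) <= 5
```

For example, needs_upgrade("4.19.91-25.3.al7.x86_64") returns True, and needs_upgrade("kernel-4.19.91-25.6.al7") returns False.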