Alibaba Cloud Linux 2 with kernel version 4.19.91-24.al7 and later and Alibaba Cloud Linux 3 with kernel version 5.10.46-7.al8 and later support the group identity feature. You can use the group identity feature to configure different identities for CPU control groups (cgroups) to define the priorities of processes (tasks) in the cgroups.
Background information
When you deploy latency-sensitive tasks and computing tasks on the same instance, the Linux kernel scheduler must provide more scheduling opportunities to high-priority tasks to minimize their scheduling latency and limit the impact of low-priority tasks on kernel scheduling. For this scenario, Alibaba Cloud Linux provides the group identity feature and adds interfaces that you can use to configure scheduling priorities for CPU cgroups. Tasks that have different priorities have the following characteristics:
High-priority tasks have minimal wakeup latency.
Low-priority tasks do not affect the performance of high-priority tasks.
Waking up low-priority tasks does not affect the performance of high-priority tasks.
Low-priority tasks do not share hardware units and do not cause negative impacts on the performance of high-priority tasks.
Prerequisites
In Alibaba Cloud Linux 2 with kernel version 4.19.91-26, 4.19.91-26.1, 4.19.91-26.2, or 4.19.91-26.3, the group identity feature is disabled in the kernel. You can run the uname -r command to query the kernel version of Alibaba Cloud Linux 2.
In Alibaba Cloud Linux 3 with kernel version 5.10.112-11.al8, 5.10.112-11.1.al8, 5.10.112-11.2.al8, 5.10.134-12.al8, 5.10.134-12.1.al8, or 5.10.134-12.2.al8, the group identity feature is disabled in the kernel.
If you use the group identity feature on Alibaba Cloud Linux 2 with a kernel version within the range of 4.19.91-25.1.al7 to 4.19.91-25.5.al7, downtime occurs. In this case, upgrade the kernel version to 4.19.91-25.6.al7 or later. For more information, see the Upgrade the kernel section of the "Change the kernel version" topic.
If Alibaba Cloud Linux 3 with kernel version 5.10.134-12.2.al8 uses the x86_64 architecture, run the following commands to enable the group identity feature:
yum makecache
sudo yum install scheduler-group-identity.x86_64 -y
In Alibaba Cloud Linux 2 with kernel version 4.19.91-26.4 or later and Alibaba Cloud Linux 3 with kernel version 5.10.134-13.al8 or later, the /proc/sys/kernel/sched_group_identity_enabled interface is added to allow you to enable the group identity feature. Before you can use the group identity feature, you must run the following command to enable the feature:
sudo sh -c 'echo 1 > /proc/sys/kernel/sched_group_identity_enabled'
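The enablement step can be wrapped in a small helper that first checks whether the switch exists. The following is a minimal sketch, assuming a POSIX shell. The function name gi_enable and its optional path argument are hypothetical and are not part of Alibaba Cloud Linux; the path argument exists only so that the logic can be tried against a scratch file.

```shell
# Hypothetical helper: turn on group identity through the kernel switch.
# $1 optionally overrides the switch path (an assumption, for dry runs).
gi_enable() {
    ctl="${1:-/proc/sys/kernel/sched_group_identity_enabled}"

    # Kernels older than the versions listed above do not expose the switch.
    if [ ! -e "$ctl" ]; then
        echo "switch not found: $ctl" >&2
        return 1
    fi

    # Nothing to do if the feature is already on.
    if [ "$(cat "$ctl")" = "1" ]; then
        echo "group identity already enabled"
        return 0
    fi

    # Writing to the real interface requires root privileges.
    echo 1 > "$ctl" && echo "group identity enabled"
}
```

On a real system, call gi_enable without arguments from a root shell, after confirming the kernel version with uname -r.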
How the group identity feature works
The group identity feature allows you to configure identities for CPU cgroups to define the priorities of tasks in the cgroups. The group identity feature relies on a dual red-black tree architecture. A low-priority red-black tree based on the red-black tree of the Completely Fair Scheduler (CFS) scheduling queue is added to store low-priority tasks.
When the kernel schedules the tasks for which identities are configured, the kernel processes the tasks based on the priorities of the tasks. The following table describes the identities in descending order of priority.
Identity | Description |
ID_HIGHCLASS | Identifies a high-priority task. A high-priority task has more opportunities to preempt resources than a normal- or low-priority task. When the CFS schedules high-priority tasks, the following scenarios may occur: |
ID_NORMAL | Identifies a normal-priority task. A normal-priority task has more opportunities to preempt resources than a low-priority task. When the CFS schedules normal-priority tasks, the following scenarios may occur: |
ID_UNDERCLASS | Identifies a low-priority task. When the CFS schedules low-priority tasks, the following scenarios may occur: If an |
The preceding identities apply based on the resource management policies of the CPU cgroups.
For tasks in CPU cgroups of the same level, the identity priorities take effect.
For tasks in CPU cgroups of different levels, the identity priorities do not take effect on tasks in the parent cgroups but take effect on tasks in the child cgroups.
For tasks that have the same identity priority, the tasks compete for resources according to the CFS policies. Take note that the runtime of tasks identified by the ID_UNDERCLASS or ID_NORMAL identity may not reach the minimum value.
Other identities
Identity | Description |
ID_SMT_EXPELLER | Identifies an SMT expeller task. When an SMT expeller task runs on an SMT CPU, the task evicts the tasks that are identified by the |
ID_IDLE_SEEKER | Specifies that when a task wakes up, the task attempts to find idle CPUs within the limits of the scheduler policies. |
ID_IDLE_SAVER | Used together with the |
Interfaces
Interfaces used to configure identities
The group identity feature provides the following interfaces to allow you to configure task identities:
/sys/fs/cgroup/cpu/$cg/cpu.identity and /sys/fs/cgroup/cpu/$cg/cpu.bvt_warp_ns. The $cg variable specifies the child cgroup directory node on which a task runs. Before you use the interfaces, take note of the following items:
The cpu.bvt_warp_ns interface is a quick configuration interface. The written value of the interface is converted into identity values.
You can use the cpu.identity and cpu.bvt_warp_ns interfaces to change the identities of cgroups.
The identity value that is written by using the cpu.identity interface overwrites the identity value that is previously written by using the cpu.bvt_warp_ns interface, but the value of the cpu.bvt_warp_ns interface remains unchanged.
The identity value that is written by using the cpu.bvt_warp_ns interface overwrites the identity value that is previously written by using the cpu.identity interface, but the value of the cpu.identity interface remains unchanged.
You can use one of the interfaces to configure task identities. We recommend that you do not use the interfaces at the same time.
If you are unfamiliar with the operations related to the operating system kernel, we recommend that you do not use the cpu.identity interface.
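Because mixing the two interfaces is discouraged, a script can stick to cpu.bvt_warp_ns and validate the value before writing it. The following is a minimal sketch, assuming a POSIX shell and that the cgroup directory already exists. The helper name set_bvt_warp is hypothetical; the accepted range of -2 to 2 is taken from the interface table in this topic.

```shell
# Hypothetical helper: set a cgroup's identity through cpu.bvt_warp_ns.
# $1 = cgroup directory, for example /sys/fs/cgroup/cpu/$cg
# $2 = quick-configuration value, one of -2, -1, 0, 1, 2
set_bvt_warp() {
    cg_dir=$1
    warp=$2

    # Reject anything outside the documented value set.
    case "$warp" in
        -2|-1|0|1|2) ;;
        *) echo "invalid cpu.bvt_warp_ns value: $warp" >&2; return 1 ;;
    esac

    # The cgroup must be created beforehand (for example with mkdir).
    if [ ! -d "$cg_dir" ]; then
        echo "no such cgroup directory: $cg_dir" >&2
        return 1
    fi

    # printf avoids any option parsing of negative values.
    printf '%s\n' "$warp" > "$cg_dir/cpu.bvt_warp_ns"
}
```

For example, from a root shell, set_bvt_warp /sys/fs/cgroup/cpu/$cg 1 marks the cgroup as high priority, and the same call with 0 returns it to the default identity.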
The following table describes the interfaces.
Interface | Description |
cpu.identity | The default value is 0, which specifies the ID_NORMAL identity. The interface is a 5-bit field. Valid values of each bit: 0 and 1. A value of 0 specifies that the identity is not assumed. A value of 1 specifies that the identity is assumed. If the interface is left empty, the ID_NORMAL identity is used. Description of each bit: bit 0 specifies the ID_UNDERCLASS identity; bit 1 specifies the ID_HIGHCLASS identity; bit 2 specifies the ID_SMT_EXPELLER identity; bit 3 specifies the ID_IDLE_SAVER identity; bit 4 specifies the ID_IDLE_SEEKER identity. For example, if you want to set the identity of a cgroup to ID_HIGHCLASS and ID_IDLE_SEEKER, set bit 1 and bit 4 to 1 and the other bits to 0 to obtain a binary value of 10010, which is converted into a decimal value of 18. Then, run the echo 18 > /sys/fs/cgroup/cpu/$cg/cpu.identity command to write 18 to the cpu.identity interface. |
cpu.bvt_warp_ns | The default value is 0, which specifies the ID_NORMAL identity. Valid values: 2 specifies the ID_SMT_EXPELLER, ID_IDLE_SEEKER, and ID_HIGHCLASS identities (corresponding cpu.identity value: 22); 1 specifies the ID_HIGHCLASS and ID_IDLE_SEEKER identities (corresponding cpu.identity value: 18); 0 specifies the ID_NORMAL identity (corresponding cpu.identity value: 0); -1 specifies the ID_UNDERCLASS and ID_IDLE_SAVER identities (corresponding cpu.identity value: 9); -2 specifies the ID_UNDERCLASS and ID_IDLE_SAVER identities (corresponding cpu.identity value: 9). |
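The bit arithmetic behind cpu.identity can be checked with a few lines of shell. The following is a minimal sketch, assuming a POSIX shell; the function name identity_value is hypothetical. It ORs together the bit for each identity name given on the command line, using the bit positions listed in the table, and prints the decimal value to write.

```shell
# Hypothetical helper: compute the decimal cpu.identity value
# from one or more identity names.
identity_value() {
    value=0
    for name in "$@"; do
        case "$name" in
            ID_NORMAL)       bit=-1 ;;  # default identity, sets no bit
            ID_UNDERCLASS)   bit=0 ;;
            ID_HIGHCLASS)    bit=1 ;;
            ID_SMT_EXPELLER) bit=2 ;;
            ID_IDLE_SAVER)   bit=3 ;;
            ID_IDLE_SEEKER)  bit=4 ;;
            *) echo "unknown identity: $name" >&2; return 1 ;;
        esac
        if [ "$bit" -ge 0 ]; then
            value=$(( value | (1 << bit) ))
        fi
    done
    echo "$value"
}
```

For example, identity_value ID_HIGHCLASS ID_IDLE_SEEKER prints 18, which matches the value that cpu.bvt_warp_ns set to 1 corresponds to, and identity_value ID_UNDERCLASS ID_IDLE_SAVER prints 9.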
Note: By default, Alibaba Cloud Linux supports the cgroup v1 interfaces. Alibaba Cloud Linux 3 with kernel version 5.10.134-13 and later in the 5.10 kernel series also supports the following cgroup v2 interfaces for the group identity feature: /sys/fs/cgroup/$cg/cpu.identity and /sys/fs/cgroup/$cg/cpu.bvt_warp_ns. The $cg variable specifies the child cgroup directory node on which a task runs.
Interfaces used to enable or disable kernel scheduling features
You can run the following command to view the default settings of kernel scheduling features by using the sched_features interface:
sudo cat /sys/kernel/debug/sched_features
The following table describes the kernel scheduling features.
Kernel scheduling feature | Description | Default value |
ID_IDLE_AVG | This feature is used together with the ID_IDLE_SAVER identity to count the runtime of ID_UNDERCLASS tasks towards the idle time. This ensures that no CPUs remain idle when only ID_UNDERCLASS tasks are running, and prevents resource waste. | ID_IDLE_AVG: indicates that the feature is enabled. |
ID_RESCUE_EXPELLEE | This feature is used in load balancing scenarios. If tasks cannot find available CPU resources, CPUs that are evicting ID_UNDERCLASS tasks are used to balance loads. This feature helps move ID_UNDERCLASS tasks out of the evicted state at the earliest opportunity. | ID_RESCUE_EXPELLEE: indicates that the feature is enabled. |
ID_EXPELLEE_NEVER_HOT | After this feature is enabled, if a request is initiated to migrate a task that is being evicted to another CPU, hot cache does not cause the migration request to be denied. This feature helps move ID_UNDERCLASS tasks out of the evicted state at the earliest opportunity. | NO_ID_EXPELLEE_NEVER_HOT: indicates that the feature is disabled. |
ID_LOOSE_EXPEL | After this feature is enabled, CPUs do not update the eviction status every time the CPUs select tasks but automatically update the status based on the time specified by the sched_expel_update_interval kernel parameter. The configuration of this feature affects only status updates when CPUs select tasks. The updates for inter-processor interrupts (IPIs) are not affected. | NO_ID_LOOSE_EXPEL: indicates that the feature is disabled. |
ID_LAST_HIGHCLASS_STAY | After this feature is enabled, the last ID_HIGHCLASS task that runs on a CPU cannot be migrated to another CPU. | ID_LAST_HIGHCLASS_STAY: indicates that the feature is enabled. |
ID_EXPELLER_SHARE_CORE | If this feature is enabled, ID_SMT_EXPELLER tasks can preferentially run on physical cores on which ID_SMT_EXPELLER tasks are already running. If this feature is disabled, ID_SMT_EXPELLER tasks are distributed across physical cores. This way, the ID_SMT_EXPELLER tasks do not interfere with each other. | ID_EXPELLER_SHARE_CORE: indicates that the feature is enabled. |
ID_ABSOLUTE_EXPEL | In Alibaba Cloud Linux 3, this feature is introduced in kernel version 5.10.134-16.3 and is usable in kernel version 5.10.134-16.3 and later in the 5.10 kernel series. After this feature is enabled, ID_UNDERCLASS tasks are absolutely suppressed and cannot be scheduled if ID_NORMAL or ID_HIGHCLASS tasks exist in the run queues. In worst-case scenarios, ID_UNDERCLASS tasks starve. In hybrid deployment scenarios, assess the loads of tasks that have different identities before you enable the feature. | NO_ID_ABSOLUTE_EXPEL: indicates that the feature is disabled. |
ID_LOAD_BALANCE | In Alibaba Cloud Linux 3, this feature is introduced in kernel version 5.10.134-16.3 and is usable in kernel version 5.10.134-16.3 and later in the 5.10 kernel series. After this feature is enabled, the scheduler considers the CPUs on which only ID_UNDERCLASS tasks run to be idle and attempts to migrate ID_HIGHCLASS tasks to the idle CPUs when the scheduler balances loads. During the migration, the scheduler tries to distribute the ID_HIGHCLASS tasks among the CPUs. This prevents CPU resource contention and Hyper-Threading (HT) interference between the ID_HIGHCLASS tasks and ensures that each ID_HIGHCLASS task obtains sufficient CPU resources. | NO_ID_LOAD_BALANCE: indicates that the feature is disabled. |
Interfaces used by sysctl to configure kernel parameters
Specific capabilities of the group identity feature depend on the values of kernel parameters. The following table describes the kernel parameters.
Kernel parameter | Description | Unit | Default value |
/proc/sys/kernel/sched_expel_update_interval | The interval at which the eviction status is automatically updated when a CPU selects tasks. This kernel parameter takes effect only if the ID_LOOSE_EXPEL feature is enabled. | ms | 10 |
/proc/sys/kernel/sched_expel_idle_balance_delay | The minimum idle balance interval when a CPU is evicting tasks. A value of -1 specifies that idle balance is not allowed. If only ID_UNDERCLASS tasks exist on a CPU and the tasks are being evicted, the CPU is idle. Idle balance is performed on the CPU to improve load-balancing effects. However, this may adversely affect ID_UNDERCLASS tasks. You can specify the sched_expel_idle_balance_delay parameter to alleviate this issue. | ms | -1 |
/proc/sys/kernel/sched_idle_saver_wmark | The watermark for CPU idle time. When an ID_IDLE_SAVER task wakes up, the task attempts to find an idle CPU whose idle time exceeds the specified watermark. | ns | 0 |
/proc/sys/kernel/sched_group_identity_enabled | Starting from kernel version 4.19.91-26.4, the /proc/sys/kernel/sched_group_identity_enabled interface is added to allow you to enable the group identity feature. Before you can use the group identity feature, you must run the echo 1 > /proc/sys/kernel/sched_group_identity_enabled command to enable the feature. After the group identity feature is enabled, data cannot be written to the /proc/sys/kernel/sched_group_identity_enabled interface if the value of the cpu.bvt_warp_ns or cpu.identity interface of the cgroup is not zero. Note: If your kernel version is 4.19.91-26.4.al7, 4.19.91-26.5.al7, or 4.19.91-26.6.al7, the sched_group_identity_enabled interface is set to 1, and the value of the cpu.bvt_warp_ns interface of the cgroup is not zero, errors occur when you read the /proc/sys/kernel/sched_group_identity_enabled settings. This is a read bug that does not affect the normal usage of the interface. This bug is fixed in kernel version 4.19.91-27.al7 and later. | N/A | 0 |
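To review the four kernel parameters in one pass, the files under /proc/sys/kernel can be read directly. The following is a minimal sketch, assuming a POSIX shell; the function name gi_show_params and its optional directory argument are hypothetical, and the argument exists only so that the loop can be tried against a scratch directory. A parameter that the running kernel does not expose is reported as not available instead of causing an error.

```shell
# Hypothetical helper: print each group identity kernel parameter
# together with its current value.
# $1 optionally overrides the directory (an assumption, for dry runs).
gi_show_params() {
    dir="${1:-/proc/sys/kernel}"
    for p in sched_expel_update_interval \
             sched_expel_idle_balance_delay \
             sched_idle_saver_wmark \
             sched_group_identity_enabled; do
        if [ -r "$dir/$p" ]; then
            printf '%s = %s\n' "$p" "$(cat "$dir/$p")"
        else
            printf '%s = (not available)\n' "$p"
        fi
    done
}
```

Comparing the printed values with the defaults in the table shows quickly whether a parameter was changed on the instance.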
Information output
When you use the group identity feature, you can run the following command to view various parameters:
cat /proc/sched_debug
The following table describes the output parameters.
Parameter | Description |
| The number of |
| The number of |
| The number of non- |
| Indicates whether |
| Indicates whether |
| The cumulative runtime of |
| The cumulative runtime of |
| The number of non- |
| The difference between the minimum vruntimes of the two red-black trees when the CPU starts to evict tasks. |
| The cumulative difference between the minimum vruntimes of the two red-black trees caused by the CPU eviction status. |
| The minimum vruntime of the low-priority red-black tree. |