Alibaba Cloud Linux 2 supports the group identity feature in kernel version kernel-4.19.91-24.al7 and later. You can use the group identity feature to assign different identities to CPU control groups (cgroups) so that the tasks in the cgroups are scheduled with different priorities.

Background information

Notice: Elastic Compute Service (ECS) instances that run Alibaba Cloud Linux 2 with kernel version kernel-4.19.91-25.1.al7 can go down when the group identity feature is used. Upgrade the kernel to kernel-4.19.91-25.6.al7 or later. For more information, see the FAQ section of this topic.
When latency-sensitive tasks and computing tasks are deployed on the same instance, the Linux kernel scheduler must provide more scheduling opportunities to high-priority tasks to minimize scheduling latency and the impacts of low-priority tasks on kernel scheduling. For this scenario, Alibaba Cloud Linux 2 provides the group identity feature and adds interfaces used to configure scheduling priorities to CPU cgroups. Tasks with different priorities have the following characteristics:
  • High-priority tasks have the minimum wakeup latency.
  • Low-priority tasks do not affect the performance of high-priority tasks.
    • The wakeup of low-priority tasks does not affect the performance of high-priority tasks.
    • Low-priority tasks do not share hardware resources with high-priority tasks through simultaneous multithreading (SMT), and therefore do not degrade the performance of high-priority tasks.

How the group identity feature works

The group identity feature allows you to configure identities for CPU cgroups to prioritize tasks in the cgroups. The feature relies on a dual red-black tree architecture: in addition to the red-black tree of the Completely Fair Scheduler (CFS) scheduling queue, a second red-black tree is added to store low-priority tasks.

When the kernel schedules the tasks that have identities, the kernel processes the tasks based on their priorities. The following table describes the identities in descending order of priority.
Identity Description
ID_HIGHCLASS Identifies a high-priority task. A high-priority task has more opportunities to preempt resources than a low-priority task.
When the CFS schedules high-priority tasks, the following situations may occur:
  • If a high-priority task wakes up while a low-priority task is running, the high-priority task can unconditionally preempt resources from the low-priority task.
  • If a high-priority task wakes up while a normal-priority task is running and the virtual runtime (vruntime) of the high-priority task is less than that of the normal-priority task, the high-priority task can ignore the original scheduling policy and preempt resources. The original scheduling policy specifies that a task cannot preempt resources when its runtime on a CPU is less than the minimum runtime.
  • When tasks queue up to run, if a low- or normal-priority task is running, a high-priority task whose vruntime is less than that of the running task can ignore the original scheduling policy and preempt resources. The original scheduling policy specifies that a task cannot preempt resources when its runtime on a CPU is less than the minimum runtime.
ID_NORMAL Identifies a normal-priority task. A normal-priority task has more opportunities to preempt resources than a low-priority task.
When the CFS schedules normal-priority tasks, the following situations may occur:
  • If a normal-priority task wakes up while a low-priority task is running, the normal-priority task can unconditionally preempt resources from the low-priority task.
  • When tasks queue up to run, if a low-priority task is running, a normal-priority task whose vruntime is less than that of the running task can ignore the original scheduling policy and preempt resources. The original scheduling policy specifies that a task cannot preempt resources when its runtime on a CPU is less than the minimum runtime.
ID_UNDERCLASS Identifies a low-priority task.
When the CFS schedules low-priority tasks, the following situation may occur:
  • If an ID_SMT_EXPELLER task is running on the SMT peer CPU, low-priority tasks cannot be scheduled on the current CPU and are removed from the queue of tasks to be run.

The preceding identities apply based on the resource management policies of CPU cgroups.
  • For tasks in cgroups of the same level, identity priorities take effect.
  • For tasks in parent cgroups, identity priorities do not take effect. For tasks in child cgroups, identity priorities take effect.
  • Tasks whose identities have the same priority compete for resources in compliance with CFS policies. Note that the runtime of tasks identified by ID_UNDERCLASS or ID_NORMAL may not reach the minimum runtime.
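The wakeup rules described above can be summarized as a small decision function. The following shell sketch is a simplified model for reasoning about the rules, not kernel code. The function name wakeup_preempt and its three arguments are hypothetical, and the sketch covers only the wakeup cases, not the case in which tasks queue up to run.

```shell
# wakeup_preempt WAKING RUNNING VRUNTIME_LESS   (hypothetical helper)
#   WAKING, RUNNING: priority of the waking and running task (high | normal | low)
#   VRUNTIME_LESS:   "yes" if the vruntime of the waking task is less than
#                    that of the running task, otherwise "no"
wakeup_preempt() {
    case "$1:$2:$3" in
        # High- and normal-priority tasks always preempt low-priority tasks.
        high:low:*|normal:low:*) echo "preempt unconditionally" ;;
        # A high-priority task with a smaller vruntime preempts a
        # normal-priority task even before the minimum runtime has elapsed.
        high:normal:yes)         echo "preempt, ignoring the minimum runtime" ;;
        # Every other combination is left to the original CFS policy.
        *)                       echo "follow the default CFS policy" ;;
    esac
}
```

For example, wakeup_preempt high normal yes reports that the waking task preempts the running task and ignores the minimum runtime.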
Other identities
Identity Description
ID_SMT_EXPELLER Identifies the SMT expeller. While an SMT expeller task is running, the tasks that are identified by ID_UNDERCLASS are evicted from the SMT peer CPU.
ID_IDLE_SEEKER Specifies that when a task wakes up, the task attempts to find idle CPUs within the limits of scheduler policies.
ID_IDLE_SAVER Used with the sched_idle_saver_wmark kernel parameter. You can use sched_idle_saver_wmark to set a water mark for CPU idle time. When a task identified by ID_IDLE_SAVER wakes up, the task attempts to find only an idle CPU whose idle time exceeds the specified water mark.

Interfaces

  • Interfaces used to configure identities
    The group identity feature provides two interfaces for you to configure task identities: /sys/fs/cgroup/cpu/$cg/cpu.identity and /sys/fs/cgroup/cpu/$cg/cpu.bvt_warp_ns. The $cg variable indicates the child cgroup directory node where a task is located. Before you use the interfaces, take note of the following items:
    • The cpu.bvt_warp_ns interface is a quick configuration interface. A value written to this interface is converted to the corresponding identity value.
    • Both the cpu.identity and cpu.bvt_warp_ns interfaces can be used to change the identities of cgroups.
    • Writing to the cpu.identity interface overrides the value last written to the cpu.bvt_warp_ns interface. The override is not reflected when you read cpu.bvt_warp_ns.
    • Writing to the cpu.bvt_warp_ns interface overrides the value last written to the cpu.identity interface. The override is not reflected when you read cpu.identity.
    • Use only one of the interfaces to configure task identities. We recommend that you do not configure both of the interfaces.
    • If you are unfamiliar with operations related to the operating system kernel, we recommend that you do not use the cpu.identity interface.
    The following table describes the interfaces.
    Interface Description
    cpu.identity The default value is 0, which indicates the ID_NORMAL identity.
    The interface is a 5-bit field. Meaning of each bit:
    • If no bit is set (the value is 0), it indicates the ID_NORMAL identity.
    • Bit 0: indicates the ID_UNDERCLASS identity.
    • Bit 1: indicates the ID_HIGHCLASS identity.
    • Bit 2: indicates the ID_SMT_EXPELLER identity.
    • Bit 3: indicates the ID_IDLE_SAVER identity.
    • Bit 4: indicates the ID_IDLE_SEEKER identity.

    For example, if you want to set the identity of a cgroup to ID_HIGHCLASS and ID_IDLE_SEEKER, set bit 1 and bit 4 to 1 and the other bits to 0 to obtain a binary value of 10010, which is converted to a decimal value of 18. Then, run the echo 18 > /sys/fs/cgroup/cpu/$cg/cpu.identity command to write 18 to cpu.identity.

    cpu.bvt_warp_ns The default value is 0, which indicates the ID_NORMAL identity. Valid values:
    • 2: indicates the ID_SMT_EXPELLER, ID_IDLE_SEEKER, and ID_HIGHCLASS identities. The value of the corresponding identity is 22.
    • 1: indicates the ID_HIGHCLASS and ID_IDLE_SEEKER identities. The value of the corresponding identity is 18.
    • 0: indicates the ID_NORMAL identity. The value of the corresponding identity is 0.
    • -1: indicates the ID_UNDERCLASS and ID_IDLE_SAVER identities. The value of the corresponding identity is 9.
    • -2: indicates the ID_UNDERCLASS and ID_IDLE_SAVER identities. The value of the corresponding identity is 9.
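To make the bit arithmetic concrete, the following shell sketch computes cpu.identity values from the bit positions listed above and reproduces the cpu.bvt_warp_ns mapping. The function names (identity_from_flags, identity_from_bvt) and the lowercase flag names are hypothetical helpers for illustration, not part of the kernel interface.

```shell
# identity_from_flags FLAG... : OR the named identity bits into a decimal value.
# Bit positions follow the cpu.identity description; flag names are illustrative.
identity_from_flags() {
    value=0
    for flag in "$@"; do
        case "$flag" in
            underclass)   bit=0 ;;
            highclass)    bit=1 ;;
            smt_expeller) bit=2 ;;
            idle_saver)   bit=3 ;;
            idle_seeker)  bit=4 ;;
            *) echo "unknown identity flag: $flag" >&2; return 1 ;;
        esac
        value=$(( value | (1 << bit) ))
    done
    echo "$value"   # no flags at all yields 0, which is ID_NORMAL
}

# identity_from_bvt N : identity value that corresponds to a cpu.bvt_warp_ns value.
identity_from_bvt() {
    case "$1" in
        2)     identity_from_flags highclass smt_expeller idle_seeker ;;
        1)     identity_from_flags highclass idle_seeker ;;
        0)     identity_from_flags ;;
        -1|-2) identity_from_flags underclass idle_saver ;;
        *) echo "unsupported cpu.bvt_warp_ns value: $1" >&2; return 1 ;;
    esac
}
```

For example, identity_from_flags highclass idle_seeker prints 18, which matches the value written to cpu.identity in the example above.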
  • Interfaces used to enable or disable scheduling features
    You can run the following command to view the default settings of kernel scheduling features by using the sched_features interface:
    cat /sys/kernel/debug/sched_features
    The following table describes the scheduling features.
    Scheduling feature | Description | Default value
    ID_IDLE_AVG | Used with the ID_IDLE_SAVER identity to count the runtime of ID_UNDERCLASS tasks towards idle time. This ensures that no CPUs remain idle when only ID_UNDERCLASS tasks are running and prevents resources from being wasted. | ID_IDLE_AVG (the feature is enabled)
    ID_RESCUE_EXPELLEE | Used in load balancing scenarios. If tasks cannot find CPU resources available for use, CPUs that are evicting ID_UNDERCLASS tasks are load-balanced. This feature helps ID_UNDERCLASS tasks get out of the evicted state as soon as possible. | ID_RESCUE_EXPELLEE (the feature is enabled)
    ID_EXPELLEE_NEVER_HOT | After this feature is enabled, when a task that is being evicted decides to migrate to another CPU, a hot cache does not cause the migration request to be denied. This feature helps ID_UNDERCLASS tasks get out of the evicted state as soon as possible. | NO_ID_EXPELLEE_NEVER_HOT (the feature is disabled)
    ID_LOOSE_EXPEL | After this feature is enabled, CPUs do not update their eviction states every time they select tasks. Instead, the states are automatically updated at the interval specified by the sched_expel_update_interval kernel parameter. This feature affects only the state updates that occur when CPUs select tasks, not the updates triggered by inter-processor interrupts (IPIs). | NO_ID_LOOSE_EXPEL (the feature is disabled)
    ID_LAST_HIGHCLASS_STAY | After this feature is enabled, the last ID_HIGHCLASS task that runs on a CPU cannot be migrated to another CPU. | ID_LAST_HIGHCLASS_STAY (the feature is enabled)
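The sched_features file lists every feature as a single token: the bare name when the feature is enabled, or the name with a NO_ prefix when it is disabled. The following shell sketch reads that list and reports the state of one feature. The function name feature_state is hypothetical, and the sample list in the usage note is made up for illustration.

```shell
# feature_state NAME [FEATURES] : print "enabled", "disabled", or "unknown".
# FEATURES defaults to the content of /sys/kernel/debug/sched_features,
# which usually requires root privileges and a mounted debugfs.
feature_state() {
    name=$1
    features=${2-$(cat /sys/kernel/debug/sched_features 2>/dev/null)}
    for token in $features; do
        case "$token" in
            "$name")    echo enabled;  return 0 ;;
            "NO_$name") echo disabled; return 0 ;;
        esac
    done
    echo unknown
}

# On upstream kernels, writing a token to the file changes the state.
# This sketch assumes the group identity features follow the same convention:
#   echo ID_LOOSE_EXPEL    > /sys/kernel/debug/sched_features   # enable
#   echo NO_ID_LOOSE_EXPEL > /sys/kernel/debug/sched_features   # disable
```

For example, feature_state ID_LOOSE_EXPEL "ID_IDLE_AVG NO_ID_LOOSE_EXPEL" prints disabled.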
  • Interfaces used by sysctl to configure kernel parameters
    Some capabilities of the group identity feature depend on the values of kernel parameters. The following table describes the parameters.
    Kernel parameter | Description | Unit | Default value
    /proc/sys/kernel/sched_expel_update_interval | The interval at which the eviction state is automatically updated when a CPU selects tasks. This parameter is valid only when the ID_LOOSE_EXPEL feature is enabled. | ms | 10
    /proc/sys/kernel/sched_expel_idle_balance_delay | The minimum idle balance interval when a CPU is evicting tasks. A value of -1 indicates that idle balance is not allowed. If only ID_UNDERCLASS tasks exist on a CPU and the tasks are being evicted, the CPU is idle. Idle balance is performed on this CPU to improve load-balancing effects. However, this may negatively affect ID_UNDERCLASS tasks. You can set the sched_expel_idle_balance_delay parameter to alleviate this issue. | ms | -1
    /proc/sys/kernel/sched_idle_saver_wmark | The water mark for CPU idle time. When an ID_IDLE_SAVER task wakes up, the task attempts to find an idle CPU whose idle time exceeds the specified water mark. | ns | 0
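The following shell sketch shows one way to inspect these parameters. The helper names sysctl_key and show_group_identity_params are hypothetical. The sketch relies only on the standard mapping between a /proc/sys path and the dotted key that the sysctl command accepts.

```shell
# sysctl_key PATH : convert a /proc/sys path to the dotted sysctl key.
sysctl_key() {
    printf '%s\n' "${1#/proc/sys/}" | tr '/' '.'
}

# show_group_identity_params : print the current value of each parameter,
# or a note if the running kernel does not expose it.
show_group_identity_params() {
    for path in \
        /proc/sys/kernel/sched_expel_update_interval \
        /proc/sys/kernel/sched_expel_idle_balance_delay \
        /proc/sys/kernel/sched_idle_saver_wmark
    do
        if [ -r "$path" ]; then
            printf '%s = %s\n' "$(sysctl_key "$path")" "$(cat "$path")"
        else
            printf '%s is not available on this kernel\n' "$(sysctl_key "$path")"
        fi
    done
}
```

To change a parameter, pass the key and a value to sysctl -w as the root user, for example sysctl -w kernel.sched_expel_update_interval=10, which writes the documented default.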

Information output

When you use the group identity feature, you can run the following command to view various parameters:
cat /proc/sched_debug
The following table describes the output parameters.
Parameter Description
nr_high_running The number of ID_HIGHCLASS tasks that are running on the current CPU.
nr_under_running The number of ID_UNDERCLASS tasks that are running on the current CPU.
nr_expel_immune The number of non-ID_UNDERCLASS tasks that are running on the current CPU.
smt_expeller Indicates whether ID_SMT_EXPELLER tasks are running on the current CPU. A value of 1 indicates that ID_SMT_EXPELLER tasks are running on the current CPU. A value of 0 indicates that no ID_SMT_EXPELLER tasks are running on the current CPU.
on_expel Indicates whether ID_SMT_EXPELLER tasks are running on the SMT peer CPU. A value of 1 indicates that ID_SMT_EXPELLER tasks are running on the peer CPU. A value of 0 indicates that no ID_SMT_EXPELLER tasks are running on the peer CPU.
high_exec_sum The cumulative runtime of ID_HIGHCLASS tasks on the current CPU.
under_exec_sum The cumulative runtime of ID_UNDERCLASS tasks on the current CPU.
h_nr_expel_immune The number of non-ID_UNDERCLASS tasks that are running on cfs_rq.
expel_start The difference between the minimum vruntimes of the two red-black trees when the CPU starts to evict tasks.
expel_spread The cumulative difference between the minimum vruntimes of the two red-black trees caused by CPU eviction states.
min_under_vruntime The minimum vruntime of the low-priority red-black tree.
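The following shell sketch extracts one of these parameters per CPU. It assumes that the group identity fields are printed in the same layout that upstream /proc/sched_debug uses for other fields, that is, a cpu#N header line followed by lines of the form ".name : value". Verify this against the actual output on your instance. The function name sched_debug_field is hypothetical.

```shell
# sched_debug_field NAME : read sched_debug text on stdin and print
# "<cpu> <value>" for every ".NAME : value" line that follows a cpu#N header.
sched_debug_field() {
    awk -v name=".$1" '
        /^cpu#[0-9]+/ {
            cpu = $1                # for example "cpu#0,"
            sub(/^cpu#/, "", cpu)   # drop the "cpu#" prefix
            sub(/,.*/, "", cpu)     # drop the trailing comma
        }
        $1 == name && $2 == ":" { print cpu, $3 }
    '
}
```

A typical call is sched_debug_field nr_under_running < /proc/sched_debug, which prints one line for each matching entry.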

FAQ

How do I upgrade the kernel version from kernel-4.19.91-25.1.al7 to kernel-4.19.91-25.6.al7 or later?

Solution:
  1. Log on to the instance.

    For more information, see Connect to a Linux instance by using a password or key.

  2. Run the following command to query the kernel version:
    uname -r
  3. Run the following command to upgrade the kernel version:
    yum update kernel
  4. Run the following command to restart the instance to make the new kernel version take effect:
    reboot
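Before and after the upgrade, you can classify the output of uname -r against the versions named in this topic. The following shell sketch is one way to do that. The function names are hypothetical, the sketch assumes that the release string has the form 4.19.91-25.1.al7.x86_64, and it relies on the -V (version sort) option of sort from GNU coreutils.

```shell
# kernel_base RELEASE : strip ".al7" and everything after it,
# for example "4.19.91-25.1.al7.x86_64" becomes "4.19.91-25.1".
kernel_base() {
    printf '%s\n' "${1%%.al7*}"
}

# version_ge A B : succeed if version A is greater than or equal to version B.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# kernel_status RELEASE : classify a kernel release string.
kernel_status() {
    base=$(kernel_base "$1")
    if [ "$base" = "4.19.91-25.1" ]; then
        echo "affected: upgrade to kernel-4.19.91-25.6.al7 or later"
    elif version_ge "$base" "4.19.91-24"; then
        echo "supported"
    else
        echo "unsupported: group identity requires kernel-4.19.91-24.al7 or later"
    fi
}
```

Run kernel_status "$(uname -r)" before the upgrade and again after the reboot to confirm that the instance no longer runs the affected kernel.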