Starting with kernel version 5.10.84-10 for the x86 architecture and 5.10.134-16 for the Arm architecture, Alibaba Cloud Linux 3 supports Kernel Electric-Fence (KFENCE). This topic describes KFENCE and how to use it.
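As a rough illustration of the version requirement above, the following sketch checks a kernel release string against the minimum release for each architecture. The function names are hypothetical. The sketch assumes that `uname -m` reports `x86_64` or `aarch64` and that the release string begins with four numeric fields such as `5.10.134-16`.

```shell
# Return 0 if release $1 is at least release $2.
# Only the first four numeric fields (for example 5.10.134-16) are compared.
version_ge() {
    awk -v have="$1" -v want="$2" 'BEGIN {
        split(have, h, /[.-]/)
        split(want, w, /[.-]/)
        for (i = 1; i <= 4; i++) {
            if (h[i] + 0 > w[i] + 0) exit 0
            if (h[i] + 0 < w[i] + 0) exit 1
        }
        exit 0
    }'
}

# Minimum release per architecture, taken from the text above.
min_kfence_release() {
    case "$1" in
        x86_64)  echo "5.10.84-10" ;;
        aarch64) echo "5.10.134-16" ;;
        *)       return 1 ;;
    esac
}

# $1: kernel release, $2: machine architecture
kfence_supported() {
    min_release=$(min_kfence_release "$2") || return 1
    version_ge "$1" "$min_release"
}

if kfence_supported "$(uname -r)" "$(uname -m)"; then
    echo "KFENCE is supported on this kernel"
else
    echo "KFENCE is not supported on this kernel"
fi
```

The comparison is done field by field in awk so that a release such as 5.10.134 is correctly treated as newer than 5.10.84, which a plain string comparison would get wrong.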
Usage notes
KFENCE is a built-in Linux kernel tool that can be enabled in an online environment. It detects memory pollution issues in the kernel and kernel modules. KFENCE was introduced in version 5.12 of the upstream Linux kernel. KFENCE detects accesses to freed or unallocated memory by inserting special fences close to memory boundaries. If memory pollution occurs, KFENCE detects the issue and prints an error message that contains the details of the issue. For more information about KFENCE, see KFENCE documentation and OpenAnolis.
Alibaba Cloud enhances the KFENCE feature in Alibaba Cloud Linux 3. You can flexibly and dynamically enable or disable KFENCE and use it to fully detect memory pollution issues, which facilitates online detection and offline debugging.
If you develop the kernel or kernel modules, you can use KFENCE to check whether memory pollution occurs in the code that you are developing. If you are a regular user and encounter a kernel crash, you can use KFENCE to collect detailed information for Alibaba Cloud or third-party driver developers.
Enable KFENCE
The KFENCE feature is used in the following business scenarios:
Online detection scenario
Scenario 1: Use KFENCE to detect whether memory pollution occurs
KFENCE in this scenario occupies 2 MiB of memory and does not affect performance.
Run the following command to enable KFENCE by adding a parameter to the boot command line:
sudo grubby --update-kernel=/boot/vmlinuz-$(uname -r) --args="kfence.sample_interval=100"
In this scenario, the configuration automatically takes effect the next time the system restarts.
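After the restart, you may want to confirm that the parameter reached the kernel. The following is a minimal sketch with a hypothetical helper, `cmdline_param`, that looks up a parameter in the boot command line. It assumes that the command line is available at /proc/cmdline as space-separated `name=value` pairs.

```shell
# Print the value of boot parameter $1 from the command-line file $2
# (default: /proc/cmdline). Return 1 if the parameter is absent.
# If the parameter appears more than once, the last value is printed.
cmdline_param() {
    tr ' ' '\n' < "${2:-/proc/cmdline}" | awk -F= -v key="$1" '
        $1 == key { val = substr($0, length(key) + 2); found = 1 }
        END { if (found) print val; exit !found }'
}

# Usage after the restart:
#   cmdline_param kfence.sample_interval
#   cat /sys/module/kfence/parameters/sample_interval
```

If both commands in the usage comment print 100, the boot parameter was applied.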
Scenario 2: Use KFENCE to detect memory pollution issues
In this scenario, a large amount of memory at the GiB level is consumed. Proceed with caution when you use small-memory machines.
Create a memory allocation script and add the following content. In the following example, the script name is kfence.sh and the slab type to be monitored is kmalloc-64.
#!/bin/bash
# usage: ./kfence.sh kmalloc-64

SLAB_PREFIX=/sys/kernel/slab
MODULE_PREFIX=/sys/module/kfence/parameters

if [ $# -eq 0 ]; then
    echo "err: please input slabs"
    exit 1
fi

#check whether slab exists
for i in $@; do
    slab_path=$SLAB_PREFIX/$i
    if [ ! -d $slab_path ]; then
        echo "err: slab $i not exist!"
        exit 1
    fi
done

#calculate num_objects
sumobj=0
for i in $@; do
    objects=($(cat $SLAB_PREFIX/$i/objects))
    maxobj=1
    for ((j=1; j<${#objects[@]}; j++)); do
        nodeobj=$(echo ${objects[$j]} | awk -F= '{print $2}')
        [ $maxobj -lt $nodeobj ] && maxobj=$nodeobj
    done
    ((sumobj += maxobj))
done
echo "recommend num_objects per node: $sumobj"

#check kfence stats
if [ $(cat $MODULE_PREFIX/sample_interval) -ne 0 ]; then
    echo "kfence is running, disable it and wait..."
    echo 0 > $MODULE_PREFIX/sample_interval
    sleep 1
fi

#disable all slabs catching
for file in $SLAB_PREFIX/*
do
    echo 0 > $file/kfence_enable
done

#disable order0 page catching
echo 0 > $MODULE_PREFIX/order0_page

#enable setting slabs catching
for i in $@; do
    echo 1 > $SLAB_PREFIX/$i/kfence_enable
done

#setting num_objects and node mode
echo $sumobj > $MODULE_PREFIX/num_objects
echo node > $MODULE_PREFIX/pool_mode

#start kfence
echo -1 > $MODULE_PREFIX/sample_interval
if [ $? -ne 0 ]; then
    echo "err: kfence enable fail!"
    exit 1
fi
echo "kfence enabled!"
The script reads the number of active objects in each specified slab, estimates an appropriate KFENCE pool size from that number, and then enables KFENCE in full mode to monitor memory allocations from the specified slabs.
Note: Slabs are commonly used in memory management to optimize memory allocation and release operations. This improves system performance and efficiency. KFENCE can monitor slabs and order-0 pages. For more information, see the "Terms" section in this topic.
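The pool-size estimate in kfence.sh can be looked at in isolation. The sketch below uses a hypothetical helper, `max_node_objects`, that repeats the per-slab calculation on a single line of text. It assumes that the slab's objects file has the format `<total> N0=<count> N1=<count> ...`, which is what the parsing in the script implies.

```shell
# Print the largest per-node object count found in $1, with a floor of 1.
# $1: one line in the assumed format "<total> N0=<count> N1=<count> ..."
max_node_objects() {
    printf '%s\n' "$1" | awk '{
        max = 1
        # Field 1 is the total; the per-node counts start at field 2.
        for (i = 2; i <= NF; i++) {
            split($i, kv, "=")
            if (kv[2] + 0 > max) max = kv[2] + 0
        }
        print max
    }'
}

# Usage on a live system:
#   max_node_objects "$(cat /sys/kernel/slab/kmalloc-64/objects)"
```

kfence.sh adds up this per-slab maximum over all slabs passed on the command line and uses the sum as num_objects for each node.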
Run the following command to execute the script to start the probe:
sudo bash ./kfence.sh kmalloc-64
Offline debugging scenario
Enable KFENCE by specifying parameters for the x86 architecture
Run the following commands to enable KFENCE:
sudo grubby --update-kernel=/boot/vmlinuz-$(uname -r) --args="kfence.num_objects=1000000"
sudo grubby --update-kernel=/boot/vmlinuz-$(uname -r) --args="kfence.sample_interval=-1"
sudo grubby --update-kernel=/boot/vmlinuz-$(uname -r) --args="kfence.fault=panic"
num_objects: the size of the KFENCE pool. The amount of memory occupied by the KFENCE pool is calculated by using the following formula: (num_objects + 1) × 8 KiB. We recommend that you set num_objects so that the pool occupies about 10% of the maximum available memory. For example, if num_objects is set to 1000000, the amount of occupied memory is (1000000 + 1) × 8 KiB, or approximately 7.63 GiB, which is rounded up to 8 GiB.
sample_interval: the interval at which memory is monitored. Valid values:
0: The KFENCE feature is disabled and does not monitor memory.
Positive number: the sampling interval in milliseconds. For example, a value of 100 indicates that KFENCE samples a memory allocation every 100 milliseconds.
Negative number: the full mode. KFENCE monitors all memory that meets a specified condition, for example, a specified slab type.
fault: This parameter is introduced in kernel version 5.10.134-16. Default value: report. When the fault parameter is set to panic, the kernel panics on the instance on which an issue is detected, which preserves the core dump file that is generated when the issue occurs.
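The pool-size formula can be worked through with shell arithmetic. The following sketch uses hypothetical helper names and assumes that the 10% recommendation refers to the share of memory that the pool occupies.

```shell
# Pool size in KiB for a given num_objects: (num_objects + 1) x 8 KiB.
kfence_pool_kib() {
    echo $(( ($1 + 1) * 8 ))
}

# Pool size rounded up to whole GiB (1 GiB = 1048576 KiB).
kfence_pool_gib() {
    echo $(( ( $(kfence_pool_kib "$1") + 1048575 ) / 1048576 ))
}

# Largest num_objects whose pool fits in a memory budget of $1 KiB.
kfence_objects_for_kib() {
    echo $(( $1 / 8 - 1 ))
}

kfence_pool_kib 1000000   # prints 8000008
kfence_pool_gib 1000000   # prints 8

# Usage with a budget of 10% of total memory (assumed interpretation):
#   budget=$(awk '/^MemTotal:/ { print int($2 / 10) }' /proc/meminfo)
#   kfence_objects_for_kib "$budget"
```

Because the pool grows linearly with num_objects, it is worth running this calculation before you enable the full mode on an instance with limited memory.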
Restart the operating system for the configurations to take effect.
For more information, see Restart instances.
Use a script to enable KFENCE for the x86 or Arm architecture
After you run a script to enable KFENCE, KFENCE cannot detect the memory pollution issues that may occur during kernel startup.
If you want to change the value of the num_objects or sample_interval parameter after you enable KFENCE, you must first disable KFENCE.
Run the following commands to enable KFENCE:
sudo sh -c 'echo 1000000 > /sys/module/kfence/parameters/num_objects'
sudo sh -c 'echo -1 > /sys/module/kfence/parameters/sample_interval'
sudo sh -c 'echo panic > /sys/module/kfence/parameters/fault'
num_objects: the size of the KFENCE pool. The amount of memory occupied by the KFENCE pool is calculated by using the following formula: (num_objects + 1) × 8 KiB. We recommend that you set num_objects so that the pool occupies about 10% of the maximum available memory. For example, if num_objects is set to 1000000, the amount of occupied memory is (1000000 + 1) × 8 KiB, or approximately 7.63 GiB, which is rounded up to 8 GiB.
sample_interval: the interval at which memory is monitored. Valid values:
0: The KFENCE feature is disabled and does not monitor memory.
Positive number: the sampling interval in milliseconds. For example, a value of 100 indicates that KFENCE samples a memory allocation every 100 milliseconds.
Negative number: the full mode. KFENCE monitors all memory that meets a specified condition, for example, a specified slab type.
fault: This parameter is introduced in kernel version 5.10.134-16. Default value: report. When the fault parameter is set to panic, the kernel panics on the instance on which an issue is detected, which preserves the core dump file that is generated when the issue occurs.
Note: If your kernel version is earlier than 5.10.134-16, an error message is reported when you run the preceding command that sets the fault parameter, because the parameter does not exist in earlier versions. The error message does not affect KFENCE. You can ignore the error message.
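Because num_objects and sample_interval can be changed only while KFENCE is disabled, the order of the writes matters. The following sketch wraps that order in a hypothetical helper, `kfence_reconfigure`. The optional third argument is an assumption added for illustration so that the helper can be tried against an ordinary directory before it is pointed at /sys.

```shell
# Hypothetical helper: apply new KFENCE settings in a safe order.
#   $1: num_objects
#   $2: sample_interval
#   $3: parameters directory (default: /sys/module/kfence/parameters)
kfence_reconfigure() {
    kfence_params="${3:-/sys/module/kfence/parameters}"

    # Disable KFENCE first if it is currently running.
    if [ "$(cat "$kfence_params/sample_interval")" -ne 0 ]; then
        printf '0\n' > "$kfence_params/sample_interval" || return 1
        sleep 1   # same short wait that kfence.sh uses
    fi

    # Resize the pool, then start KFENCE with the new interval.
    printf '%s\n' "$1" > "$kfence_params/num_objects" || return 1
    printf '%s\n' "$2" > "$kfence_params/sample_interval" || return 1
}

# Usage (as root): kfence_reconfigure 1000000 -1
```

The helper stops at the first failed write, so a rejected num_objects value does not leave KFENCE running with the new interval and the old pool size.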
View results
After KFENCE detects memory pollution issues, you can view the number of issues and detailed error messages.
In the example shown in the following figure, the output of the sudo cat /sys/kernel/debug/kfence/stats command indicates that the total bugs count increases.
The system also prints information to dmesg. To view KFENCE error log information, run the dmesg | grep -i kfence command. In the example shown in the following figure, one error message is returned.
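To watch the counter from a script, the total bugs line can be extracted from the stats file. The following sketch uses a hypothetical helper, `kfence_total_bugs`, and assumes that the file consists of `name: value` lines, as in the upstream KFENCE implementation. The exact set of counters in Alibaba Cloud Linux 3 may differ.

```shell
# Print the "total bugs" counter from a KFENCE stats file.
# $1: path to the stats file (default: /sys/kernel/debug/kfence/stats)
# Return 1 if the file has no "total bugs" line.
kfence_total_bugs() {
    awk -F': *' '
        $1 == "total bugs" { print $2; found = 1 }
        END { exit !found }' "${1:-/sys/kernel/debug/kfence/stats}"
}

# Usage (as root): kfence_total_bugs
```

Comparing the value before and after a test run shows whether KFENCE reported any new issues during that run.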
Disable KFENCE
Run the following command to disable KFENCE:
sudo bash -c 'echo 0 > /sys/module/kfence/parameters/sample_interval'
When the KFENCE feature is disabled, KFENCE no longer detects memory pollution issues. After all monitored memory in the pool is released, KFENCE returns the memory to the kernel buddy system at a granularity of 1 GiB.
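To confirm the current state, you can read sample_interval back and interpret it according to the valid values described in this topic. The following is a small sketch. The helper name `kfence_mode` is hypothetical.

```shell
# Describe the KFENCE state that a sample_interval value stands for.
# $1: an integer value of sample_interval
kfence_mode() {
    if [ "$1" -eq 0 ]; then
        echo "disabled"
    elif [ "$1" -gt 0 ]; then
        echo "sampling every $1 ms"
    else
        echo "full mode"
    fi
}

# Usage:
#   kfence_mode "$(cat /sys/module/kfence/parameters/sample_interval)"
```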
If you enabled KFENCE by adding parameters to the boot command line, you can run the following command to remove the related parameter. Then, KFENCE is not automatically enabled the next time the system restarts.
sudo grubby --update-kernel=/boot/vmlinuz-$(uname -r) --remove-args="kfence.sample_interval"
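If you also added kfence.num_objects and kfence.fault, as in the offline debugging scenario, you may want to check that no KFENCE parameters remain in the boot entry. The following sketch uses a hypothetical filter, `leftover_kfence_args`. It assumes that `grubby --info` prints the kernel arguments on a line of the form `args="..."`.

```shell
# Read "grubby --info" output on stdin and print any remaining kfence.*
# boot arguments, one per line. Return non-zero if none remain.
leftover_kfence_args() {
    sed -n 's/^args="\(.*\)"$/\1/p' | tr ' ' '\n' | grep '^kfence\.'
}

# Usage:
#   sudo grubby --info=/boot/vmlinuz-$(uname -r) | leftover_kfence_args
# Several parameters can be removed in one call:
#   sudo grubby --update-kernel=/boot/vmlinuz-$(uname -r) \
#       --remove-args="kfence.num_objects kfence.sample_interval kfence.fault"
```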
FAQ
Terms
The following table describes the basic terms of the KFENCE feature.
Term | Description |
memory pollution | An issue in which memory areas are incorrectly modified or corrupted while a program is running, which causes the program to behave abnormally or crash. Memory pollution can be caused by programming errors, software vulnerabilities, malware, or hardware failures. |
slab | Slabs are an efficient memory allocation mechanism in the Linux kernel. The kernel uses slabs to pre-allocate a specific number of memory objects in a memory cache pool for quick memory allocation and release. Slabs can be used to avoid frequent memory allocation and release operations and improve the efficiency of memory allocation. |
order-0 page | Order-0 pages are a memory allocation mechanism in the Linux kernel, where memory is divided into fixed-size page frames, typically 4 KiB. An order-0 page is a 4-KiB page frame that is the basic unit for memory allocation. When an application or the kernel needs to allocate small blocks of memory, memory is allocated by order-0 pages. |