High slab_unreclaimable memory on a Linux ECS instance may indicate a kernel slab memory leak. This guide helps you identify the root cause and resolve the issue.
Symptoms
Run cat /proc/meminfo | grep "SUnreclaim" on a Linux instance. If the SUnreclaim value is large (for example, SUnreclaim: 6069340 kB), the instance has high slab_unreclaimable memory. If slab_unreclaimable exceeds 10% of total memory, a slab memory leak is likely.
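The 10% check above can be scripted. The following is a minimal sketch, not part of the original procedure: the function name sunreclaim_pct is hypothetical, and it assumes the standard MemTotal and SUnreclaim fields of /proc/meminfo, both reported in kB.

```shell
# Print SUnreclaim as a percentage of MemTotal.
# $1: path to a meminfo-format file (defaults to /proc/meminfo).
# sunreclaim_pct is a hypothetical helper name used only for this sketch.
sunreclaim_pct() {
    awk '
        $1 == "MemTotal:"   { total = $2 }
        $1 == "SUnreclaim:" { unreclaim = $2 }
        END {
            if (total > 0)
                printf "%.1f\n", unreclaim * 100 / total
            else
                exit 1   # MemTotal missing or zero: nothing to report
        }
    ' "${1:-/proc/meminfo}"
}
```

Calling sunreclaim_pct with no argument reads /proc/meminfo. A printed value above 10 matches the threshold described in this section.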
Cause
The kernel's slab allocator caches same-sized memory objects to reduce fragmentation. The slab_unreclaimable portion holds memory the kernel cannot free because it contains active objects such as dentry and inode caches. If these caches grow excessively, high slab_unreclaimable usage can trigger the OOM Killer.
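Because each slab cache holds objects of a single size, the footprint of a cache can be estimated from its object count and object size. The sketch below illustrates this under stated assumptions: it expects the slabinfo version 2.1 column layout (name, active_objs, num_objs, objsize, ...), the function name slab_usage_kb is hypothetical, and the estimate ignores per-slab overhead.

```shell
# List slab caches by approximate memory footprint in kB, largest first.
# $1: path to a slabinfo-format file (defaults to /proc/slabinfo,
#     which usually requires root to read).
# slab_usage_kb is a hypothetical helper name used only for this sketch.
slab_usage_kb() {
    awk '
        # Skip the version line and the column header.
        /^slabinfo/ || /^#/ { next }
        # Field 3 is num_objs and field 4 is objsize (bytes).
        { printf "%d %s\n", $3 * $4 / 1024, $1 }
    ' "${1:-/proc/slabinfo}" | sort -rn
}
```

For example, sudo sh -c '. ./slab_usage.sh; slab_usage_kb | head' would print the ten largest caches, assuming the function is saved in a file named slab_usage.sh (an illustrative name).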
Solution
-
Connect to the Linux instance.
For more information, see Choose a connection method.
-
Identify the unreclaimable slab with the most objects or the highest memory usage:
-
Find the slab with the most objects or the highest memory usage.
slabtop -s -a
Record the slab name (the NAME column) with the highest OBJ/SLAB value.
-
Check whether the slab memory is unreclaimable.
Replace <slab NAME> with the slab name that has the highest OBJ/SLAB value from the previous step.
cat /sys/kernel/slab/<slab NAME>/reclaim_account
For example, check whether kmalloc-192 is reclaimable:
cat /sys/kernel/slab/kmalloc-192/reclaim_account
Output 0 means unreclaimable; output 1 means reclaimable.
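To run the reclaim_account check against every cache at once, a loop such as the following can help. This is a sketch under stated assumptions: it relies on the /sys/kernel/slab layout used in this step, and the function name list_unreclaimable_slabs is hypothetical.

```shell
# Print the names of slab caches whose reclaim_account is 0 (unreclaimable).
# $1: sysfs slab directory (defaults to /sys/kernel/slab).
# list_unreclaimable_slabs is a hypothetical helper name used only for this sketch.
list_unreclaimable_slabs() {
    base="${1:-/sys/kernel/slab}"
    for f in "$base"/*/reclaim_account; do
        # Skip the unexpanded glob when no cache directories exist.
        [ -r "$f" ] || continue
        if [ "$(cat "$f")" = "0" ]; then
            # The cache name is the directory that contains the file.
            basename "$(dirname "$f")"
        fi
    done
}
```

Comparing this list with the slabtop output narrows the search to caches that are both large and unreclaimable.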
-
Identify the root cause of the slab memory leak.
Use the crash tool for static analysis or the perf tool for dynamic analysis. The following examples use the kmalloc-192 slab.
Method 1: Use crash to perform static analysis
-
Install the crash tool.
sudo yum install crash -y
-
Install the kernel-debuginfo tool.
-
Alibaba Cloud Linux 3
sudo yum install -y kernel-debuginfo-<kernel version> --enablerepo=alinux3-plus-debug
Note: Replace <kernel version> with your actual kernel version. Run uname -r to check.
-
Alibaba Cloud Linux 2
sudo yum install kernel-debuginfo -y
-
Start the crash tool.
sudo crash
-
View memory statistics for kmalloc-192 in crash:
kmem -S kmalloc-192
To limit the output, view only the last few rows. For example, to view the last 10 rows:
kmem -S kmalloc-192 | tail -n 10
Sample command output:
SLAB              MEMORY            NODE  TOTAL  ALLOCATED  FREE
ffffea004c94e780  ffff88132539e000     0     42         29    13
ffffea004cbef900  ffff88132fbe4000     0     42         40     2
ffffea000a0e6280  ffff88028398a000     0     42         40     2
ffffea004bfa8000  ffff8812fea00000     0     42         41     1
ffffea006842b380  ffff881a10ace000     0     42         41     1
ffffea0009e7dc80  ffff880279f72000     0     42         34     8
ffffea004e67ae80  ffff881399eba000     0     42         40     2
ffffea00b18d6f80  ffff882c635be000     0     42         42     0
The output shows that the slab at ffff88028398a000 has few FREE objects and many ALLOCATED objects.
-
View memory data for ffff88028398a000:
rd ffff88028398a000 512 -S
If the output is large, display it in pages.
If the put_cred_rcu function repeats multiple times in the output, search for put_cred_rcu in the Linux kernel source code:
void __put_cred(struct cred *cred)
{
    call_rcu(&cred->rcu, put_cred_rcu);
}
put_cred_rcu asynchronously releases the cred struct. The repeated presence of put_cred_rcu at the end of the cred struct indicates a slab memory leak in the kernel.
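Spotting a repeated function name in a long memory dump is easier with a frequency count. The sketch below assumes the rd output has already been saved to a text file (rd_output.txt is a hypothetical name) and that symbols appear as identifier-like words, optionally followed by a +offset suffix. The function name count_symbols is also hypothetical, and the exact dump format may differ between crash versions.

```shell
# Count how often each symbol-like token appears in a saved memory dump.
# $1: text file containing the dump (hypothetical name: rd_output.txt).
# A token is treated as a symbol if it is identifier-shaped and not pure hex.
count_symbols() {
    tr -s ' \t' '\n' < "$1" |
        sed 's/+.*$//' |                        # drop any +offset suffix
        grep -E '^[A-Za-z_][A-Za-z0-9_.]*$' |   # keep identifier-shaped tokens
        grep -Ev '^[0-9a-fA-F]+$' |             # discard raw hex words
        sort | uniq -c | sort -rn               # most frequent first
}
```

A name such as put_cred_rcu near the top of the list, with a count close to the number of allocated objects in the slab, would be consistent with the pattern described above.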
Method 2: Use perf to perform dynamic analysis
-
Install the perf tool.
sudo yum install perf -y
-
Use perf to record kmalloc-192 allocation events and kfree events for 200 seconds so that unreleased memory can be identified:
sudo perf record -a -e kmem:kmalloc --filter 'bytes_alloc == 192' -e kmem:kfree --filter ' ptr != 0' sleep 200
-
Save the captured data to a file.
In this example, the file is named testperf.txt:
sudo perf script > testperf.txt
-
View testperf.txt:
cat testperf.txt
Identify allocations that have no matching free event, then trace the responsible function in the kernel source code.
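Matching allocations to frees by hand in testperf.txt is tedious, so the pairing can be sketched with awk. The following assumes each event line contains the event name (kmem:kmalloc or kmem:kfree) plus call_site=... and ptr=... fields, which is the usual perf script layout for these tracepoints. The function name unfreed_by_call_site is hypothetical.

```shell
# Summarize kmalloc events that have no later kfree for the same pointer.
# $1: text file produced by perf script (testperf.txt in this guide).
# Prints "<count> <call_site>" per allocation site, largest count first.
unfreed_by_call_site() {
    awk '
        # Return the value of the first key=value field for key, or "".
        function field(key,    i) {
            for (i = 1; i <= NF; i++)
                if (index($i, key "=") == 1)
                    return substr($i, length(key) + 2)
            return ""
        }
        /kmem:kmalloc/ {
            p = field("ptr")
            if (p != "") live[p] = field("call_site")   # allocation is outstanding
        }
        /kmem:kfree/ {
            p = field("ptr")
            if (p != "") delete live[p]                 # allocation was released
        }
        END {
            for (p in live) count[live[p]]++
            for (s in count) print count[s], s
        }
    ' "$1" | sort -rn
}
```

Running unfreed_by_call_site testperf.txt lists the call sites with the most outstanding allocations. Objects allocated shortly before the recording ended may simply not have been freed yet, so treat a high count as a lead to investigate rather than proof of a leak.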
-
After identifying the function call path or kernel data structure causing the leak, work with kernel developers or professional O&M personnel to pinpoint the source and resolve the issue.
Possible solutions:
-
Upgrade the kernel or apply a patch.
-
Adjust kernel parameters.
-
Restart affected services or modules.
-
Optimize applications or drivers.
-
Restart the system.
References
Slab memory leaks reduce available memory, cause fragmentation, and can trigger the OOM Killer or performance degradation.