The memory diagnostics feature helps you identify common memory issues in Container Service for Kubernetes (ACK) clusters, including memory leaks, memory fragmentation, and out-of-memory (OOM) errors. Diagnostic results are displayed in charts and tables. The feature also shows the container cache and shared memory occupied by the files in each folder, which gives you insight into overall memory usage and simplifies O&M. This topic describes the memory diagnostics feature.

Memory diagnostics consists of three parts: memory overview, memory analysis, and OOM analysis. You can view the memory usage of both nodes and pods.

Important: When you use the diagnostics feature, ACK runs a data collection program on each node in the cluster and collects diagnostic results. The collected information includes the system version, the status of the loads, Docker, and kubelet, and key error information in system logs. ACK does not collect business information or sensitive data.

Memory overview

The memory overview feature displays diagnostic items related to memory risks. The following table describes the diagnostic items.

| Diagnostic item | Description |
| --- | --- |
| Leaked Memory | Checks for system kernel memory leaks in the Slab, Vmalloc, and buddy system (allocpage). |
| Memory Usage | Displays the utilization of system memory. |
| Memcg | Evaluates whether the unreleased memory cgroups compromise system performance and cause statistical errors. |
| Memory Fragmentation | Checks for memory fragmentation, which compromises system performance. |
| THPZeroPage | Evaluates the ratio of THP waste. |

Information about system memory usage, including kernel memory, application memory in user mode, and free memory, is displayed in charts.
  • Kernel memory (kernel): the total amount of memory used by the operating system kernel.
  • Application memory (app): the total amount of memory used by programs in user mode.
  • Free memory (free): the amount of free system memory.

Terms

| Term | Description |
| --- | --- |
| memory leaks | A memory leak occurs when memory that is dynamically allocated to a program is not released after use, which causes system memory utilization to keep increasing. Memory leaks can compromise the performance of programs or even cause a system crash. |
| memory utilization | The following formula is used to calculate memory utilization: Memory utilization = (Total memory - Free memory) × 100 / Total memory. File caches are counted as free memory and therefore do not affect memory utilization. |
| unreleased Memcg | Memory cgroups that are not released due to system exceptions. These memory cgroups may compromise system performance. |
| memory fragmentation | After the system has been running for a long period of time, free contiguous memory blocks can become too small to fulfill requests for contiguous memory. The failed requests delay memory allocation and cause business jitter. |
| ratio of THP waste | Transparent Huge Pages (THPs) are huge pages whose size is 2 MiB or 1 GiB in the kernel. The size of a subpage is 4 KiB. When THPs are enabled, the kernel dynamically allocates THPs to reduce Translation Lookaside Buffer (TLB) misses and improve application performance. However, THPs may cause memory bloat. The kernel allocates 2 MiB blocks of memory as THPs, each of which is equivalent to 512 subpages. This causes memory waste and results in memory overcommitment. Memory bloat may lead to OOM errors. For example, when an application that requests only 8 KiB of memory (2 subpages) is assigned a 2 MiB THP, the remaining 510 subpages are zero subpages, which waste resident set size (RSS) and may cause an OOM error. Ratio of THP waste = Number of zero THPs × 100% / Total number of THPs. |
| buddy system | The buddy system is an algorithm used by the Linux kernel to manage memory pages. It divides memory pages into 11 groups. In most cases, a memory page is 4 KB in size. The number of pages in each memory block is a power of two, so the block sizes are 4 KB, 8 KB, 16 KB, 32 KB, ..., 4 MB. |
| Slab | A memory allocator that allocates small pieces of memory based on the buddy system of Linux. |
| Vmalloc | A memory allocator that uses nonlinear mapping based on the buddy system of Linux. |
| filecache | When Linux reads or writes a file, it caches the file content in memory. This way, programs can directly read or write the content in memory, which is much faster than reading or writing the file on disk. |
| anonymous memory | Anonymous memory is dynamically allocated to the heap and stack of a process through new, malloc, or mmap. Anonymous memory is not backed by a file system. |
| shared memory | A memory block shared by two or more processes for communication. |
| tmpfs | A temporary, memory-based file system of Linux. The file system caches the content that it reads or writes in memory. |
| hugetlb | The amount of memory consumed by huge pages in a file system. |
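
The formulas in the preceding table can be checked with a few lines of Python. This is a minimal sketch, not the diagnostic tool's implementation: the function names are hypothetical, and the inputs are assumed to have been collected already (for example, from /proc/meminfo).

```python
# Illustrative helpers for the formulas in the Terms table.
# All function names are hypothetical; they are not part of ACK.

def memory_utilization(total_kib: int, free_kib: int) -> float:
    """Memory utilization in percent: (total - free) * 100 / total."""
    if total_kib <= 0:
        raise ValueError("total_kib must be positive")
    return (total_kib - free_kib) * 100 / total_kib


def thp_waste_ratio(zero_thps: int, total_thps: int) -> float:
    """Ratio of THP waste in percent: zero THPs * 100 / total THPs."""
    if total_thps == 0:
        return 0.0  # no THPs allocated, so nothing is wasted
    return zero_thps * 100 / total_thps


def subpages_per_thp(thp_kib: int = 2048, subpage_kib: int = 4) -> int:
    """Number of 4 KiB subpages in one 2 MiB THP."""
    return thp_kib // subpage_kib


def buddy_block_sizes_kib(groups: int = 11, page_kib: int = 4) -> list:
    """Block size of each buddy group: page size times 2**order."""
    return [page_kib * (1 << order) for order in range(groups)]


if __name__ == "__main__":
    print(memory_utilization(16_000_000, 4_000_000))
    print(thp_waste_ratio(25, 100))
    print(subpages_per_thp())
    print(buddy_block_sizes_kib())
```

With 11 groups and a 4 KB page, the largest block is 4 × 2^10 = 4,096 KB, which matches the 4 MB upper bound given for the buddy system.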

Kernel memory

In most cases, abnormal memory usage by Sunreclaim or the buddy system indicates a kernel memory leak. Pay close attention to their memory usage in kernel mode.

| Metric | Description |
| --- | --- |
| Sreclaimable | Slab memory that can be reclaimed. |
| Sunreclaim | Slab memory that cannot be reclaimed. |
| PageTables | Memory occupied by kernel page tables. |
| Vmalloc | Memory allocated by calling the Vmalloc function. |
| KernelStack | Total memory occupied by the kernel stacks of processes. |
| AllocPages | Memory allocated from the buddy system by calling functions such as alloc_pages. This memory is not reported in any node file, so excessive use shows up as a memory black hole. |
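
Most of the kernel metrics above correspond to fields in /proc/meminfo on a Linux node. The following sketch shows one way to extract them. The mapping of metric names to the SReclaimable, SUnreclaim, PageTables, VmallocUsed, and KernelStack fields is an assumption about how the values could be obtained, not a description of how ACK collects them, and the helper names and sample values are made up for illustration. AllocPages has no /proc/meminfo field and is therefore omitted.

```python
# Assumed mapping from the metrics in the table to /proc/meminfo fields.
KERNEL_FIELDS = ("SReclaimable", "SUnreclaim", "PageTables",
                 "VmallocUsed", "KernelStack")

# Made-up sample in /proc/meminfo format (values in kB).
SAMPLE_MEMINFO = """\
MemTotal:       16000000 kB
MemFree:         4000000 kB
SReclaimable:     300000 kB
SUnreclaim:       150000 kB
KernelStack:       12000 kB
PageTables:        40000 kB
VmallocUsed:       25000 kB
"""


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field name: integer value}."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def kernel_memory_kib(meminfo: dict) -> dict:
    """Pick the kernel-side fields; missing fields are reported as 0."""
    return {field: meminfo.get(field, 0) for field in KERNEL_FIELDS}


if __name__ == "__main__":
    # On a real node, replace SAMPLE_MEMINFO with open("/proc/meminfo").read().
    for field, kib in kernel_memory_kib(parse_meminfo(SAMPLE_MEMINFO)).items():
        print(f"{field:<14}{kib:>10} kB")
```

Sampling these values periodically and watching for a steadily growing SUnreclaim is one simple way to spot a suspected Slab leak.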

Application memory

You need to pay close attention to anonymous memory, shared memory, and file caches when you view the memory usage of applications in user mode.

| Metric | Description |
| --- | --- |
| filecache | File caches that can be reclaimed by performing drop caches. |
| anon | The anonymous memory occupied by the heap and stack of a program. If a large amount of anonymous memory is occupied, check for memory leaks in the process and check whether THPs are enabled. |
| mlock | Memory locked by the system. |
| huge | Memory occupied by huge pages. |
| buffer | The memory occupied by the metadata of the block device and file system. |
| shmem | Shared memory (tmpfs). Shared memory leaks occur if a tmpfs file is not deleted after the process is terminated, or if a tmpfs file is deleted while it is still open. |
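
As a rough illustration, the application metrics above can be approximated from already-parsed /proc/meminfo values. This sketch rests on assumptions that are not confirmed by the source: the field mapping is a guess at how such a breakdown could be derived, and filecache is estimated as Cached minus Shmem because the Cached field in /proc/meminfo also counts tmpfs and shared memory pages. The function name is hypothetical.

```python
def app_memory_kib(meminfo: dict) -> dict:
    """Approximate the user-mode memory breakdown from /proc/meminfo values.

    `meminfo` maps field names to integers (kB, except HugePages_Total,
    which is a page count). Missing fields are treated as 0.
    """
    cached = meminfo.get("Cached", 0)
    shmem = meminfo.get("Shmem", 0)
    # HugePages_Total is a number of pages; Hugepagesize is the size in kB.
    huge = meminfo.get("HugePages_Total", 0) * meminfo.get("Hugepagesize", 0)
    return {
        # Assumption: subtract Shmem because Cached includes tmpfs pages.
        "filecache": max(cached - shmem, 0),
        "anon": meminfo.get("AnonPages", 0),
        "mlock": meminfo.get("Mlocked", 0),
        "huge": huge,
        "buffer": meminfo.get("Buffers", 0),
        "shmem": shmem,
    }


if __name__ == "__main__":
    # Made-up values for illustration only.
    sample = {
        "Cached": 5_000_000, "Shmem": 200_000, "AnonPages": 3_000_000,
        "Mlocked": 0, "Buffers": 100_000,
        "HugePages_Total": 8, "Hugepagesize": 2048,
    }
    for metric, kib in app_memory_kib(sample).items():
        print(f"{metric:<10}{kib:>10} kB")
```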

Memory analysis

Memory analysis consists of process memory analysis and pod memory analysis.

Process memory

Memory usage information is displayed by process, including anonymous memory, file caches, and shared memory.
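
On recent Linux kernels, a comparable per-process breakdown is exposed in /proc/<pid>/status through the RssAnon, RssFile, and RssShmem fields. The sketch below parses that format. Whether the diagnostics feature reads these fields is an assumption, and the function name and sample values are illustrative only.

```python
# Assumed mapping from /proc/<pid>/status fields to the categories above.
STATUS_FIELDS = {"RssAnon": "anon", "RssFile": "file", "RssShmem": "shmem"}

# Made-up excerpt in /proc/<pid>/status format (values in kB).
SAMPLE_STATUS = """\
Name:\tdemo
VmRSS:\t  120000 kB
RssAnon:\t   80000 kB
RssFile:\t   30000 kB
RssShmem:\t   10000 kB
"""


def process_memory_kib(status_text: str) -> dict:
    """Return resident anonymous, file-backed, and shared memory in kB."""
    result = {label: 0 for label in STATUS_FIELDS.values()}
    for line in status_text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if name in STATUS_FIELDS and parts and parts[0].isdigit():
            result[STATUS_FIELDS[name]] = int(parts[0])
    return result


if __name__ == "__main__":
    # On a real node, read f"/proc/{pid}/status" instead of the sample.
    print(process_memory_kib(SAMPLE_STATUS))
```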

Pod memory

The pod memory analysis feature allows you to view the files that occupy the file caches and shared memory of containers and pods, the ratio of active caches, and the ratio of inactive caches.

| Diagnostic item | Description |
| --- | --- |
| Pod | The name of the pod. |
| Container | The name of the container. |
| File | The full path of the file, which includes the file name. |
| Cache | The file cache (filecache) occupied by the file. |
| Container Cache | The container cache occupied by the file. Different processes in a container may manage the same file. |
| Active Cache | The file cache that is in use. |
| Inactive Cache | The file cache that is not in use. |
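
The ratios of active and inactive caches can be illustrated at the container level with the active_file and inactive_file counters in a memory cgroup's memory.stat file. This is a sketch under assumptions: the per-file figures shown by the diagnostics feature are not available from memory.stat, and the function name and sample values are hypothetical.

```python
# Made-up excerpt in cgroup memory.stat format (values in bytes).
SAMPLE_MEMORY_STAT = """\
anon 104857600
file 4194304
shmem 0
active_file 3145728
inactive_file 1048576
"""


def cache_ratios(memory_stat_text: str) -> tuple:
    """Return (active %, inactive %) of the file cache in a cgroup."""
    stats = {}
    for line in memory_stat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    active = stats.get("active_file", 0)
    inactive = stats.get("inactive_file", 0)
    total = active + inactive
    if total == 0:
        return 0.0, 0.0  # the cgroup holds no file cache
    return active * 100 / total, inactive * 100 / total


if __name__ == "__main__":
    active_pct, inactive_pct = cache_ratios(SAMPLE_MEMORY_STAT)
    print(f"active: {active_pct:.1f}%  inactive: {inactive_pct:.1f}%")
```

A high inactive ratio suggests that much of the cache is not in use and is therefore a likely candidate for reclaim under memory pressure.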

OOM analysis

The OOM analysis feature can quickly diagnose OOM errors and display the following diagnostic items.

| Diagnostic item | Description |
| --- | --- |
| OS OOM Count | The total number of OOM errors that have occurred since the host started up, counted up to the time when the diagnostic is performed. |
| Available Memory | The amount of free system memory. |
| Low Watermark | The specified low memory watermark. When the available memory drops below the low watermark, an asynchronous memory reclaim operation is triggered. |
| Container | The name of the pod, ID of the container, or name of the cgroup. |
| limit | The memory limit of the container. |
| usage | The amount of memory used by the container. |
| OOM Count | The total number of OOM errors that have occurred in the container. |
| OOM Type | The type of OOM error, which can be Host or cgroup. |
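
Some of these items can be approximated by hand from kernel counters. The sketch below assumes a cgroup v2 node, where a container's memory.events file exposes an oom_kill counter, and it models the low watermark check as a plain comparison. Both the data source and the helper names are assumptions for illustration and do not describe how ACK computes these values.

```python
# Made-up excerpt in cgroup v2 memory.events format.
SAMPLE_MEMORY_EVENTS = """\
low 0
high 0
max 12
oom 3
oom_kill 2
"""


def parse_counters(text: str) -> dict:
    """Parse 'name value' lines (memory.events, /proc/vmstat) into a dict."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def container_oom_count(memory_events_text: str) -> int:
    """Number of processes in the cgroup killed by the OOM killer."""
    return parse_counters(memory_events_text).get("oom_kill", 0)


def reclaim_triggered(free_pages: int, low_watermark_pages: int) -> bool:
    """True when free memory has dropped below the low watermark."""
    return free_pages < low_watermark_pages


if __name__ == "__main__":
    print("container OOM count:", container_oom_count(SAMPLE_MEMORY_EVENTS))
    print("reclaim triggered:", reclaim_triggered(1000, 2000))
```

The same parser also works on /proc/vmstat, which on recent kernels includes a host-wide oom_kill counter that can serve as a rough counterpart to OS OOM Count.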