Container Service for Kubernetes: Memory diagnostics

Last Updated: Mar 26, 2026

When a Kubernetes node or pod runs out of memory, symptoms are often intermittent and hard to trace: applications crash unexpectedly, pods get evicted, or cluster performance degrades without a clear cause. The memory diagnostics feature of Container Intelligence Service helps you identify the root cause of common memory issues in ACK clusters, including memory leaks, memory fragmentation, and out-of-memory (OOM) errors. Diagnostic results appear as charts and tables so you can assess system memory health at a glance.

Memory diagnostics covers three areas: memory overview, memory analysis, and OOM analysis. You can inspect memory usage at both the node and pod level.

Note

Diagnostic items may vary based on your cluster configuration. The items shown on the diagnostics page reflect your actual cluster state.

Important

When you run the diagnostics feature, ACK runs a data collection program on each node to gather diagnostic results. Collected data includes the system version, load status, Docker and kubelet status, and key error messages in system logs. ACK does not collect business information or sensitive data.

Diagnostic workflow

Use the three diagnostic areas in sequence to narrow down a memory issue:

  1. Memory overview — Check for high-level memory risks: leaked memory, memory fragmentation, unreleased Memcg entries, and THP waste. Use the charts to confirm whether abnormal usage is in kernel memory or application memory.

  2. Memory analysis — Drill down to process-level and pod-level memory usage to identify which process or container is consuming excessive anonymous memory, page cache, or shared memory.

  3. OOM analysis — Review OOM event counts and types to determine whether OOM errors are occurring at the node (Host) or container (cgroup) level, and which containers have hit their memory limits.

Memory overview

Memory overview surfaces diagnostic items related to memory risks. The following table describes each item.

| Diagnostic item | Description |
| --- | --- |
| Leaked Memory | Checks for system kernel memory leaks in the Slab, Vmalloc, and buddy system (allocpage). |
| Memory Usage | Displays system memory utilization. |
| Memcg | Evaluates whether unreleased memory cgroups (Memcg) are degrading system performance or causing statistical errors. |
| Memory Fragmentation | Checks for memory fragmentation that degrades system performance. |
| THPZeroPage | Evaluates the ratio of Transparent Huge Page (THP) waste. |

System memory usage is also displayed in charts, broken down into three categories:

  • Kernel memory (kernel): total memory used by the operating system kernel.

  • Application memory (app): total memory used by programs in user mode.

  • Free memory (free): total free system memory.

Key concepts

The following terms are used throughout memory diagnostics.

| Term | Description |
| --- | --- |
| Memory leak | A memory leak occurs when memory dynamically allocated to a program is never released, causing system memory utilization to grow continuously. Unresolved memory leaks degrade program performance and can cause system crashes. |
| Memory utilization | Memory utilization = (Total memory - Free memory) x 100 / Total memory. Page cache counts as free memory and does not affect memory utilization, because the kernel can reclaim and reuse it at any time. |
| Unreleased Memcg | A memory cgroup that was not released due to a system exception. Unreleased Memcg entries can degrade system performance. |
| Memory fragmentation | After a system runs for an extended period, free contiguous memory blocks become too small to satisfy large contiguous allocation requests. This delays memory allocation and causes application jitter. |
| Ratio of THP waste | Ratio of THP waste = Number of zero THPs x 100% / Total number of THPs. See THP details below. |
| Buddy system | The Linux kernel algorithm for managing memory pages. It divides memory pages into 11 groups and manages blocks in powers of two: 4 KB, 8 KB, 16 KB, 32 KB ... 4 MB. Most memory pages are 4 KB. |
| Slab | A memory allocator that allocates small pieces of memory on top of the buddy system. |
| Vmalloc | A memory allocator that uses nonlinear mapping on top of the buddy system. |
| Page cache (filecache) | When Linux reads or writes a file, it caches the file content in memory for faster subsequent access. |
| Anonymous memory | Memory dynamically allocated to a process's heap and stack through new, malloc, or mmap. Not backed by a file system. |
| Shared memory | A memory block shared by two or more processes for inter-process communication. |
| tmpfs | A Linux temporary file system backed by memory. Content read or written to tmpfs is cached in memory. |
| hugetlb | Memory consumed by huge pages in a file system. |
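
The memory utilization formula and the buddy system block sizes defined above can be checked with a few lines of Python. This is a minimal sketch, not the implementation used by the diagnostics feature: the function names, parameters, and sample values are hypothetical, and adding page cache to free memory simply follows the definition given in the table.

```python
def memory_utilization(total_kb, free_kb, page_cache_kb=0):
    """Return utilization in percent: (total - free) x 100 / total.

    Page cache is counted as free memory, following the definition above.
    """
    if total_kb <= 0:
        raise ValueError("total_kb must be positive")
    effective_free_kb = free_kb + page_cache_kb
    return (total_kb - effective_free_kb) * 100 / total_kb


def buddy_block_sizes_kb(page_kb=4, groups=11):
    """Return the block size of each buddy-system group in KB (page size x 2^order)."""
    return [page_kb * 2 ** order for order in range(groups)]


if __name__ == "__main__":
    # Hypothetical node: 16 GB total, 2 GB unused, 6 GB held as page cache.
    print(memory_utilization(16 * 1024**2, 2 * 1024**2, 6 * 1024**2))
    # Eleven groups, from a single 4 KB page up to a 4 MB block.
    print(buddy_block_sizes_kb())
```

With these sample values, half of the node's memory counts as used even though only 2 GB is strictly unused, because the 6 GB of page cache is reclaimable.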

THP details

Transparent Huge Pages (THP) are huge pages sized 2 MiB or 1 GiB in the kernel. Each subpage is 4 KiB, so one 2-MiB THP equals 512 subpages.

When THP is enabled, the kernel dynamically allocates THPs to reduce Translation Lookaside Buffer (TLB) misses and improve application performance. However, THP can cause memory bloat and memory overcommitment: when an application requests only 8 KiB (2 subpages), the kernel allocates a full 2-MiB THP, leaving 510 zero subpages that waste resident set size (RSS) and can trigger OOM errors.
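
The arithmetic in this section can be reproduced as follows. The sketch assumes a 2-MiB THP with 4-KiB subpages, as described above, and uses the ratio of THP waste formula from the key concepts table. The function names and sample counts are hypothetical.

```python
import math

SUBPAGE_KIB = 4       # size of one subpage
THP_KIB = 2 * 1024    # size of one 2-MiB THP


def zero_subpages(requested_kib):
    """Return how many subpages of a single 2-MiB THP are left untouched (zero)."""
    if not 0 < requested_kib <= THP_KIB:
        raise ValueError("the request must fit in one THP")
    used = math.ceil(requested_kib / SUBPAGE_KIB)
    return THP_KIB // SUBPAGE_KIB - used


def thp_waste_ratio(zero_thps, total_thps):
    """Return the ratio of THP waste in percent: zero THPs x 100 / total THPs."""
    if total_thps <= 0:
        raise ValueError("total_thps must be positive")
    return zero_thps * 100 / total_thps


if __name__ == "__main__":
    print(zero_subpages(8))          # an 8 KiB request touches 2 of 512 subpages
    print(thp_waste_ratio(25, 200))  # hypothetical counts of zero and total THPs
```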

Kernel memory metrics

In most cases, memory leaks are indicated by abnormal usage in SUnreclaim or the buddy system. Monitor these metrics closely.

| Metric | Description |
| --- | --- |
| SReclaimable | Memory that the Slab can reclaim. |
| SUnreclaim | Memory that the Slab cannot reclaim. Abnormal growth here is a strong indicator of a kernel memory leak. |
| PageTables | Memory occupied by kernel page tables. |
| Vmalloc | Memory allocated by the vmalloc function. |
| KernelStack | Memory used by the kernel stacks of all tasks. |
| AllocPages | Memory allocated from the buddy system by functions such as alloc_pages. This memory cannot be retrieved through any node file, so excessive use creates a memory black hole. |
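
To watch these metrics outside the console, one option is to sample /proc/meminfo on the node and compare the values over time. The sketch below is an illustration only and does not describe how ACK collects its data. It assumes that the metrics above correspond to the /proc/meminfo fields SReclaimable, SUnreclaim, PageTables, VmallocUsed, and KernelStack. AllocPages is left out because, as noted above, it is not exposed through a file. The sample text and helper names are hypothetical.

```python
# Hypothetical excerpts in /proc/meminfo format, taken at two points in time.
SAMPLE_BEFORE = """\
MemTotal:       16266172 kB
SReclaimable:     412340 kB
SUnreclaim:       201884 kB
KernelStack:       18432 kB
PageTables:        51200 kB
VmallocUsed:       40960 kB
"""
SAMPLE_AFTER = SAMPLE_BEFORE.replace("201884", "903212")

# Assumed mapping of the kernel memory metrics to /proc/meminfo field names.
KERNEL_FIELDS = ("SReclaimable", "SUnreclaim", "PageTables", "VmallocUsed", "KernelStack")


def parse_meminfo(text):
    """Parse 'Name:   value kB' lines into a dict of integers."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def kernel_metrics(text):
    """Keep only the kernel memory fields, defaulting to 0 when a field is missing."""
    info = parse_meminfo(text)
    return {name: info.get(name, 0) for name in KERNEL_FIELDS}


def growth_kb(before, after, field="SUnreclaim"):
    """Return how much a field grew between two samples, in kB."""
    return kernel_metrics(after)[field] - kernel_metrics(before)[field]


if __name__ == "__main__":
    print(kernel_metrics(SAMPLE_BEFORE))
    print(growth_kb(SAMPLE_BEFORE, SAMPLE_AFTER))
```

On a real node, the text would come from reading /proc/meminfo. Steady growth of SUnreclaim across samples is the pattern to look for.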

Application memory metrics

When analyzing user-mode memory usage, focus on anonymous memory, shared memory, and page cache.

| Metric | Description |
| --- | --- |
| filecache | Page cache, which can be reclaimed by dropping caches (for example, through /proc/sys/vm/drop_caches). |
| anon | Anonymous memory used by a program's heap and stack. High anon usage can indicate a process memory leak or the effect of THP being enabled. |
| mlock | Memory locked by the system. |
| huge | Memory used by huge pages. |
| buffer | Memory used by block device and file system metadata. |
| shmem | Shared memory (tmpfs). Memory leaks occur if a tmpfs file is not deleted after the process exits, or if a file is deleted while it is still open. |
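
The second shmem leak pattern, a file that is deleted while a process still holds it open, can be spotted on Linux because the symbolic links under /proc/<pid>/fd end in " (deleted)" for such files. The following sketch is one possible check, not part of the diagnostics feature. It assumes that tmpfs is mounted at /dev/shm, and the function names and the prefix parameter are hypothetical.

```python
import os


def deleted_open_files(fd_targets, prefix="/dev/shm/"):
    """Return paths under the prefix that are deleted but still held open."""
    suffix = " (deleted)"
    return [
        target[: -len(suffix)]
        for target in fd_targets
        if target.startswith(prefix) and target.endswith(suffix)
    ]


def read_fd_targets(pid):
    """Read the link target of every open file descriptor of a process (Linux only)."""
    fd_dir = f"/proc/{pid}/fd"
    targets = []
    for name in os.listdir(fd_dir):
        try:
            targets.append(os.readlink(os.path.join(fd_dir, name)))
        except OSError:
            continue  # the descriptor was closed while scanning
    return targets


if __name__ == "__main__":
    # Hypothetical link targets, in the form read_fd_targets() would return them.
    sample = [
        "/dev/shm/cache.bin (deleted)",
        "/dev/shm/live.bin",
        "/var/log/app.log (deleted)",
        "socket:[12345]",
    ]
    print(deleted_open_files(sample))
```

The memory behind such a file is released only when the last process closes it, so the owning process is the place to look next.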

Memory analysis

Memory analysis is split into two views: process memory and pod memory.

Process memory

The process memory view shows memory usage per process, including anonymous memory, page cache, and shared memory.

Pod memory

The pod memory view shows which files are occupying page cache and shared memory in each container and pod, along with active and inactive cache ratios.

| Diagnostic item | Description |
| --- | --- |
| Pod | The name of the pod. |
| Container | The name of the container. |
| File | The full path of the file, including the file name. |
| Cache | The page cache (filecache) occupied by the file. |
| Container Cache | The container-level cache occupied by the file. Multiple processes in the same container may reference the same file. |
| Active Cache | Page cache that is currently in use. |
| Inactive Cache | Page cache that is not in use and is eligible for reclaim. |
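
When reading this view, it can help to rank files by how much of their cache is inactive, since that portion is eligible for reclaim. The sketch below works on rows copied from the table by hand. The class, its field names, and the sample values are hypothetical, and treating a file's cache as the sum of its active and inactive cache is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass
class FileCacheRow:
    pod: str
    container: str
    file: str
    active_kb: int
    inactive_kb: int

    @property
    def cache_kb(self):
        # Assumption: total cache = active cache + inactive cache.
        return self.active_kb + self.inactive_kb

    @property
    def inactive_ratio(self):
        """Return the share of this file's cache that is eligible for reclaim."""
        total = self.cache_kb
        return self.inactive_kb / total if total else 0.0


def top_reclaimable(rows, limit=3):
    """Return the rows holding the most inactive cache, largest first."""
    return sorted(rows, key=lambda row: row.inactive_kb, reverse=True)[:limit]


if __name__ == "__main__":
    rows = [
        FileCacheRow("web-0", "nginx", "/var/log/nginx/access.log", 1000, 3000),
        FileCacheRow("web-0", "nginx", "/usr/share/nginx/html/index.html", 200, 0),
        FileCacheRow("db-0", "mysql", "/var/lib/mysql/ibdata1", 8000, 500),
    ]
    for row in top_reclaimable(rows):
        print(row.pod, row.file, row.cache_kb, round(row.inactive_ratio, 2))
```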

OOM analysis

OOM analysis diagnoses out-of-memory errors and shows the following diagnostic items.

| Diagnostic item | Description |
| --- | --- |
| OS OOM Count | Total number of OOM errors from host startup to the time of diagnosis. |
| Available Memory | Current free system memory. |
| Low Watermark | The low memory threshold. When available memory drops below this value, the kernel triggers an asynchronous memory reclaim operation to free up memory. |
| Container | The name of the pod, ID of the container, or name of the cgroup. |
| limit | The memory limit configured for the container. |
| usage | Current memory used by the container. |
| OOM Count | Total number of OOM errors that have occurred in the container. |
| OOM Type | The type of OOM error: Host or cgroup. |
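
The limit, usage, and Low Watermark items can be combined into two quick checks: whether the node is already below its low watermark, and which containers are close to their memory limit. This is a minimal sketch over values copied from the diagnostic results. The row layout, the function names, and the 0.9 threshold are hypothetical choices, not values defined by ACK.

```python
def below_low_watermark(available_kb, low_watermark_kb):
    """Return True when available memory is under the low watermark."""
    return available_kb < low_watermark_kb


def containers_near_limit(rows, threshold=0.9):
    """Return (container, usage/limit) pairs at or above the threshold, highest first.

    Each row is a dict with the keys 'container', 'limit', and 'usage',
    where limit and usage share the same unit.
    """
    flagged = []
    for row in rows:
        limit = row["limit"]
        if limit <= 0:
            continue  # skip rows without a usable limit
        ratio = row["usage"] / limit
        if ratio >= threshold:
            flagged.append((row["container"], round(ratio, 2)))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical rows, with limit and usage in MiB.
    rows = [
        {"container": "api", "limit": 512, "usage": 500},
        {"container": "db", "limit": 1024, "usage": 300},
        {"container": "job", "limit": 0, "usage": 50},
    ]
    print(below_low_watermark(available_kb=150_000, low_watermark_kb=200_000))
    print(containers_near_limit(rows))
```

A container that appears in this list and also shows a nonzero OOM Count with OOM Type cgroup has most likely hit its own limit, whereas Host OOM errors point to pressure on the node as a whole.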