Kubernetes Eviction Policies for Handling Low RAM and Disk Space Situations - Part 1

In this tutorial, we'll show you how you can deal with low RAM and disk space situations with Kubernetes eviction policies.

By Alwyn Botha, Alibaba Cloud Community Blog author.

For this tutorial to work as described, you need to run it on a dedicated Kubernetes node. If other users are running workloads on the node, they will affect the RAM-sizing eviction logic carefully planned out herein.

You define thresholds for low RAM and low disk space, and Kubernetes eviction policies act when those thresholds are reached: Kubernetes evicts Pods from the node to relieve the low RAM or low disk space condition.

The kubelet has configuration settings for defining resource thresholds. There are settings for disk space and RAM, but this tutorial focuses exclusively on RAM.

Disk space eviction policies work the same way as RAM eviction policies. Once you understand RAM eviction, you will easily be able to apply your knowledge to disk space eviction.

Minikube Extra Config for Kubelet

You pass eviction thresholds to the kubelet using --extra-config when you start minikube. The general format is:

--extra-config=<component>.<key>="<value>"

For example:

--extra-config=kubelet.feature-gates="ExperimentalCriticalPodAnnotation=true"

The ExperimentalCriticalPodAnnotation feature gate specifies that critical Pods must not be evicted.

You must use this setting on a single-node Kubernetes cluster. Without it, your critical Pods will be evicted and you will end up with a broken Kubernetes node.

( On a multi-node cluster it is acceptable to evict critical Pods - redundant copies on other nodes will automatically take over the workload. )
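To see which Pods count as critical, you can inspect their annotations. A quick check - assuming, as was typical for Kubernetes of this era, that the critical static Pods in kube-system carry the scheduler.alpha.kubernetes.io/critical-pod annotation, and that the Pod is named kube-apiserver-minikube as on this minikube:

# Print the annotations of one of the critical static Pods; the
# critical-pod annotation should appear among them.
kubectl -n kube-system get pod kube-apiserver-minikube \
  -o jsonpath='{.metadata.annotations}'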

eviction-hard="memory.available<600Mi"

Defines that when less than 600Mi RAM is available, Pods must be evicted HARD ... immediately.

eviction-pressure-transition-period="30s"

From https://kubernetes.io/docs/tasks/administer-cluster/out-of-resource/#oscillation-of-node-conditions

eviction-pressure-transition-period is the duration for which the kubelet has to wait before transitioning out of an eviction pressure condition: MemoryPressure.

We will see this in action below.

eviction-soft="memory.available<800Mi"

Defines that when less than 800Mi RAM is available, Pods may POTENTIALLY be evicted SOFT, after the grace period eviction-soft-grace-period has elapsed.

eviction-soft-grace-period="memory.available=2m"

memory.available may stay below the eviction-soft threshold for this grace period - here 2 minutes - before any eviction happens.

If available RAM drops below the SOFT threshold for less than these 2 minutes, no eviction is done.
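A worked timeline may make the soft threshold clearer ( the values are assumed purely for illustration ):

# Assume eviction-soft="memory.available<800Mi" and a 2 minute grace period.
# 10:00:00  memory.available = 750Mi  -> soft threshold crossed, grace timer starts
# 10:01:30  memory.available = 850Mi  -> back above the threshold, nothing was evicted
# 10:03:00  memory.available = 750Mi  -> below the threshold again, timer restarts
# 10:05:00  still below 800Mi after 2 minutes -> soft evictions begin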

Prerequisites

I am running this on a minikube VirtualBox virtual machine with 2200 MB of RAM. You must adjust these thresholds if your minikube node has a different amount of RAM.

You will ONLY get identical eviction results on a minikube node of identical size with nothing else running. ( This tutorial carefully calculated how many Pods - and of which sizes - will cause MemoryPressure conditions on an exactly 2200 MB minikube node. )

You must follow this tutorial on a dedicated Kubernetes node. If other people are running Pods on it, their workloads will upset the RAM-sizing eviction logic carefully planned out herein.

( If you run this on a server with a different amount of RAM, you must adjust the kernel boot parameter: mem=MEMORY_LIMIT )

From the bootparam man page:

Linux uses this BIOS call at boot to determine how much memory is installed.

You can use this boot arg to tell Linux how much memory you have. The value is in decimal or hexadecimal (prefix 0x), and the suffixes 'k' (times 1024) or 'M' (times 1048576) can be used.

Tips from This Tutorial

  • First time through, do not follow all the links: they will break your train of thought, and you could spend a day reading it all. The facts behind those links are not needed on a first pass, but they will add to your understanding of the topics once you have done some practical exercises.
  • The eviction manager logs contain a great deal of information. First time through, do not try to decipher those logs on your server. I made a considerable effort here to make those logs VERY easy to read. Just have a quick peek and confirm that your logs contain information similar to what is provided in this tutorial.
  • You need to run the date command repeatedly if you want to reconcile your logs with the creation and eviction of your Pods.
  • You may hit your eviction thresholds earlier or later than I did. Read both parts of this tutorial in full before you attempt to follow it step by step. This way, if your experience differs, you will have some understanding of what is happening.

First Pod Eviction Exercise

minikube start \
  --extra-config=kubelet.eviction-hard="memory.available<600Mi" \
  --extra-config=kubelet.feature-gates="ExperimentalCriticalPodAnnotation=true" \
  --extra-config=kubelet.eviction-pressure-transition-period="30s" \
  --extra-config=kubelet.eviction-soft="memory.available<800Mi" \
  --extra-config=kubelet.eviction-soft-grace-period="memory.available=2m"
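Once minikube is up, you can optionally confirm that the kubelet received these flags. One way to check - assuming, as is usual for this minikube setup, that the extra-config values end up on the kubelet command line inside the VM and that ps and grep behave as expected there:

# Show the kubelet command line inside the minikube VM and keep
# only the eviction-related flags, one per line.
minikube ssh "ps aux | grep kubelet" | tr ' ' '\n' | grep -E 'eviction|feature-gates'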

Kubernetes developers provide this script, which calculates available RAM exactly the way the kubelet does for its eviction decisions:

https://kubernetes.io/docs/tasks/administer-cluster/out-of-resource/memory-available.sh

You need to have this script on your node. We use it extensively in this tutorial. We are always only interested in the last line: memory.available_in_mb

source ./memory-available.sh | tail -n1

memory.available_in_mb 1261 
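In essence, the script reproduces the kubelet's formula: memory.available = capacity - working set, where the working set is the root cgroup's memory usage minus its inactive file cache ( reclaimable page cache ). A condensed sketch of what the published script does, assuming the cgroup v1 paths it was written for - note that dividing by 1024 twice yields MiB, so the reported "_in_mb" value compares directly with thresholds written in Mi:

#!/usr/bin/env bash
# memory.available = capacity - working_set
memory_capacity_in_kb=$(grep MemTotal /proc/meminfo | awk '{print $2}')
memory_capacity_in_bytes=$((memory_capacity_in_kb * 1024))
memory_usage_in_bytes=$(cat /sys/fs/cgroup/memory/memory.usage_in_bytes)
memory_total_inactive_file=$(grep total_inactive_file /sys/fs/cgroup/memory/memory.stat | awk '{print $2}')

# The working set excludes inactive file cache, which the kernel can reclaim.
memory_working_set=$((memory_usage_in_bytes - memory_total_inactive_file))
[ "$memory_working_set" -lt 0 ] && memory_working_set=0

memory_available_in_bytes=$((memory_capacity_in_bytes - memory_working_set))
echo "memory.available_in_mb $((memory_available_in_bytes / 1024 / 1024))"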

You need to wait at least 1 minute ( after minikube start ) for all startup processes to complete before you start running eviction tests. ( Available RAM keeps decreasing during this first minute of startup turmoil. )
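While you wait, a simple loop keeps both the current time and the available RAM in view ( this also covers the earlier tip about running date repeatedly ):

# Print the time and the kubelet-style available RAM every 15 seconds.
while true; do
  date
  source ./memory-available.sh | tail -n1
  sleep 15
done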

Create spec for our first Pod:

nano myrampod2.yaml

apiVersion: v1
kind: Pod
metadata:
  name: myram2
spec:
  containers:
  - name: myram-container-1
    image: mytutorials/centos:bench
    imagePullPolicy: IfNotPresent
    
    command: ['sh', '-c', 'stress --vm 1 --vm-bytes 50M --vm-hang 3000 -t 3600']
    
    resources:
      limits:
        memory: "600Mi"
      requests:
        memory: "10Mi"
    
  restartPolicy: Never
  terminationGracePeriodSeconds: 0  

We use the image mytutorials/centos:bench, which I created and uploaded to Docker Hub.

It contains a simple CentOS 7 base operating system. It also includes stress, a benchmarking and stress-testing application.

command: ['sh', '-c', 'stress --vm 1 --vm-bytes 50M --vm-hang 3000 -t 3600']

We run the stress benchmark utility:

  • --vm 1 ... spawn 1 memory allocation worker ( a process, not an actual virtual machine )
  • --vm-bytes 50M ... each worker allocates 50 MB of RAM
  • --vm-hang 3000 ... the worker sleeps 3000 seconds while holding its allocation; otherwise it re-allocates every second ( eating ALL CPU time )
  • -t 3600 ... time out after 3600 seconds
Create the Pod:

kubectl create -f myrampod2.yaml
pod/myram2 created
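You can optionally cross-check what the Pod really consumes. If the metrics-server addon is running and has had a minute to collect data ( it is present on this minikube, as the eviction logs later show ), this reports actual usage, which will be a little above the 50 MB that stress allocates:

# Actual memory usage of the stress Pod, as reported by metrics-server.
kubectl top pod myram2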

Check the kubelet-calculated memory.available_in_mb:

memory.available_in_mb 1197

Check if node has MemoryPressure condition:

kubectl describe node minikube | grep MemoryPressure
  MemoryPressure   False   Fri, 01 Feb 2019 08:02:50 +0200   KubeletHasSufficientMemory   kubelet has sufficient memory available

No, false ... kubelet has sufficient memory available

( Remember ... eviction-soft="memory.available<800Mi" )

Create another Pod that will use around 60 MB in total:

nano myrampod3.yaml

apiVersion: v1
kind: Pod
metadata:
  name: myram3
spec:
  containers:
  - name: myram-container-1
    image: mytutorials/centos:bench
    imagePullPolicy: IfNotPresent
    
    command: ['sh', '-c', 'stress --vm 1 --vm-bytes 50M --vm-hang 3000 -t 3600']
    
    resources:
      limits:
        memory: "600Mi"
      requests:
        memory: "10Mi"
    
  restartPolicy: Never
  terminationGracePeriodSeconds: 0
  

Create:

kubectl create -f myrampod3.yaml
pod/myram3 created

Check the kubelet-calculated memory.available_in_mb:

memory.available_in_mb 1139

Check if the node has the MemoryPressure condition:

kubectl describe node minikube | grep MemoryPressure
  MemoryPressure   False   Fri, 01 Feb 2019 08:03:40 +0200   KubeletHasSufficientMemory   kubelet has sufficient memory available

No, false ... kubelet has sufficient memory available

Create another Pod that will use around 60 MB in total:

nano myrampod4.yaml

apiVersion: v1
kind: Pod
metadata:
  name: myram4
spec:
  containers:
  - name: myram-container-1
    image: mytutorials/centos:bench
    imagePullPolicy: IfNotPresent
    
    command: ['sh', '-c', 'stress --vm 1 --vm-bytes 50M --vm-hang 3000 -t 3600']
    
    resources:
      limits:
        memory: "600Mi"
      requests:
        memory: "10Mi"
    
  restartPolicy: Never
  terminationGracePeriodSeconds: 0
Create:

kubectl create -f myrampod4.yaml
pod/myram4 created

Check the kubelet-calculated memory.available_in_mb:

memory.available_in_mb 1085

Check if the node has the MemoryPressure condition:

kubectl describe node minikube | grep MemoryPressure
  MemoryPressure   False   Fri, 01 Feb 2019 08:04:50 +0200   KubeletHasSufficientMemory   kubelet has sufficient memory available

Create another Pod that will use around 60 MB in total:

nano myrampod5.yaml

apiVersion: v1
kind: Pod
metadata:
  name: myram5
spec:
  containers:
  - name: myram-container-1
    image: mytutorials/centos:bench
    imagePullPolicy: IfNotPresent
    
    command: ['sh', '-c', 'stress --vm 1 --vm-bytes 50M --vm-hang 3000 -t 3600']
    
    resources:
      limits:
        memory: "600Mi"
      requests:
        memory: "10Mi"
    
  restartPolicy: Never
  terminationGracePeriodSeconds: 0
Create:

kubectl create -f myrampod5.yaml
pod/myram5 created

Check if the node has the MemoryPressure condition:

kubectl describe node minikube | grep MemoryPressure
  MemoryPressure   True    Fri, 01 Feb 2019 08:05:20 +0200   KubeletHasInsufficientMemory   kubelet has insufficient memory available

MemoryPressure True ... why is that?

Check the kubelet-calculated memory.available_in_mb:

memory.available_in_mb 635

Some other process started using around 500 MB of RAM. ( I could not see via top which one. ) This always happens at this exact stage - hence the careful adding of Pods one by one.

kubectl get pods

NAME     READY   STATUS    RESTARTS   AGE
myram2   1/1     Running   0          3m32s
myram3   1/1     Running   0          2m22s
myram4   1/1     Running   0          84s
myram5   1/1     Running   0          39s

Right now we have 4 running Pods. This will change within seconds: the available RAM is far below the soft threshold and right at the hard threshold. HARD means immediate Pod eviction. ( Two kubectl commands for watching the evictions live follow the list below. )

  • eviction-hard="memory.available<600Mi"
  • eviction-soft="memory.available<800Mi"
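To watch the evictions unfold live, two ordinary kubectl commands in separate terminals are enough:

# Watch Pod status changes as the eviction manager acts.
kubectl get pods --all-namespaces -w

# Show recent events, newest last; eviction events mention "Evicted".
kubectl get events --sort-by=.lastTimestamp | grep -i evict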

I used the minikube logs command to capture the kubelet eviction manager logs. These logs have been heavily edited, since each line contained too much information.

minikube logs # kubelet eviction manager logs

06:06:09 attempting to reclaim memory
06:06:09 must evict pod(s) to reclaim memory

06:06:09 pods ranked for eviction: 
 kube-apiserver-minikube_kube-system,
 kube-controller-manager-minikube_kube-system,
 etcd-minikube_kube-system,
 kube-scheduler-minikube_kube-system,
 myram2_default,
 myram5_default,
 myram3_default,
 myram4_default,
 metrics-server-6486d4db88-cf7tf_kube-system,

From https://kubernetes.io/docs/tasks/administer-cluster/out-of-resource/#evicting-end-user-pods

The kubelet ranks Pods for eviction first by whether their usage of the starved resource exceeds their requests, then by Pod priority, and finally by how much their usage of the starved resource exceeds their requests.

The 4 critical static Pods exceed their RAM requests by more than our 4 tiny 50 MB Pods do, which is why they appear at the top of the ranking.

If we had not specified ExperimentalCriticalPodAnnotation above, these critical Pods would be evicted - resulting in a broken node.

minikube logs # kubelet eviction manager logs

06:06:09 cannot evict a critical static pod kube-apiserver-minikube_kube-system
06:06:09 cannot evict a critical static pod kube-controller-manager-minikube_kube-system
06:06:09 cannot evict a critical static pod etcd-minikube_kube-system
06:06:09 cannot evict a critical static pod kube-scheduler-minikube_kube-system

06:06:09 pod myram2_default is evicted successfully
06:06:09 pods myram2_default evicted, waiting for pod to be cleaned up
06:06:12 pods myram2_default successfully cleaned up

Fortunately, the critical Pods do not get evicted. Pod myram2_default got evicted successfully. ( Our 4 Pods all exceed their memory request of "10Mi" by roughly the same amount, since each uses about 50 MB. So they are all ranked similarly here. The kubelet eviction manager does not display eviction rank values in the logs. )
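Ranking is driven by how actual usage compares with each Pod's memory request, and the request and limit settings also determine the QoS class. You can inspect both on any of the surviving Pods, for example:

# The memory request we set in the spec ( 10Mi ).
kubectl get pod myram4 -o jsonpath='{.spec.containers[0].resources.requests.memory}'; echo

# QoS class: requests below limits, so these Pods are Burstable.
kubectl get pod myram4 -o jsonpath='{.status.qosClass}'; echo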

Three seconds later the second cycle of evictions continues. ( It would have been helpful if the eviction manager logged memory.available_in_mb during an eviction cycle. As it is, we have to surmise that one eviction was not enough to bring available RAM back above the eviction threshold. )

minikube logs # kubelet eviction manager logs

06:06:12 attempting to reclaim memory
06:06:12 must evict pod(s) to reclaim memory

06:06:12 pods ranked for eviction: 
 kube-apiserver-minikube_kube-system,
 kube-controller-manager-minikube_kube-system,
 etcd-minikube_kube-system,
 kube-scheduler-minikube_kube-system,
 myram5_default,
 myram3_default,
 myram4_default,
 
06:06:12 cannot evict a critical static pod kube-apiserver-minikube_kube-system
06:06:12 cannot evict a critical static pod kube-controller-manager-minikube_kube-system
06:06:12 cannot evict a critical static pod etcd-minikube_kube-system
06:06:12 cannot evict a critical static pod kube-scheduler-minikube_kube-system

06:06:12 pod myram5_default is evicted successfully
06:06:12 pods myram5_default evicted, waiting for pod to be cleaned up
06:06:14 pods myram5_default successfully cleaned up

The next Pod on the eviction list, myram5, gets evicted.

This is the status of our running Pods at this point.

kubectl get pods

NAME     READY   STATUS    RESTARTS   AGE
myram2   0/1     Evicted   0          4m15s
myram3   1/1     Running   0          3m5s
myram4   1/1     Running   0          2m7s
myram5   0/1     Evicted   0          82s

Check the kubelet-calculated memory.available_in_mb:

memory.available_in_mb 696

About 100 MB above the hard threshold - immediate evictions are no longer needed.

Still about 100 MB below the soft threshold - Pods will be evicted after the eviction-soft-grace-period of 2 minutes.

Status of Pods after a few minutes:

kubectl get pods

NAME     READY   STATUS    RESTARTS   AGE
myram2   0/1     Evicted   0          5m52s
myram3   0/1     Evicted   0          4m42s
myram4   0/1     Evicted   0          3m44s
myram5   0/1     Evicted   0          2m59s

Pods myram3_default and myram4_default got evicted, as the logs show:

minikube logs # kubelet eviction manager logs

06:07:14 pods ranked for eviction: 
 kube-apiserver-minikube_kube-system,
 etcd-minikube_kube-system,
 kube-controller-manager-minikube_kube-system,
 kube-scheduler-minikube_kube-system,
 myram3_default,
 myram4_default,
 metrics-server-6486d4db88-cf7tf_kube-system,
  
06:07:14 cannot evict a critical static pod kube-apiserver-minikube_kube-system
06:07:14 cannot evict a critical static pod etcd-minikube_kube-system
06:07:14 cannot evict a critical static pod kube-controller-manager-minikube_kube-system
06:07:14 cannot evict a critical static pod kube-scheduler-minikube_kube-system

06:07:15 pod myram3_default is evicted successfully
06:07:15 pods myram3_default evicted, waiting for pod to be cleaned up
06:07:18 pods myram3_default successfully cleaned up

06:07:18 attempting to reclaim memory
06:07:18 must evict pod(s) to reclaim memory

06:07:18 pods ranked for eviction: 
 kube-apiserver-minikube_kube-system,
 etcd-minikube_kube-system,
 kube-controller-manager-minikube_kube-system,
 kube-scheduler-minikube_kube-system,
 myram4_default,
 metrics-server-6486d4db88-cf7tf_kube-system,

06:07:18 cannot evict a critical static pod kube-apiserver-minikube_kube-system
06:07:18 cannot evict a critical static pod etcd-minikube_kube-system
06:07:18 cannot evict a critical static pod kube-controller-manager-minikube_kube-system
06:07:18 cannot evict a critical static pod kube-scheduler-minikube_kube-system

06:07:18 pod myram4_default is evicted successfully
06:07:18 pods myram4_default evicted, waiting for pod to be cleaned up
06:07:20 pods myram4_default successfully cleaned up

06:07:20 attempting to reclaim memory
06:07:20 must evict pod(s) to reclaim memory

Check the kubelet-calculated memory.available_in_mb:

memory.available_in_mb 677

The last 2 evictions were unable to raise available RAM above the soft threshold: more evictions are needed.

You have now seen 4 times that:

cannot evict a critical static pod kube-apiserver-minikube_kube-system

Therefore those lines will be hidden from all log output for the rest of this tutorial.

Those 4 critical Pods will also be removed from the "pods ranked for eviction" lists for the rest of the tutorial.
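If you want to read your own logs with the same noise removed, a filter along these lines helps ( assuming, as is typical, that the kubelet tags these lines with eviction_manager ):

# Keep only eviction manager lines and drop the repetitive
# "cannot evict a critical static pod" messages.
minikube logs | grep eviction_manager | grep -v "critical static pod"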

During the next few minutes other Pods that are not critical, but still belong to the Kubernetes system, are evicted.

metrics-server and kubernetes-dashboard get evicted.

minikube logs # kubelet eviction manager logs

06:07:20 pods ranked for eviction: 
 metrics-server-6486d4db88-cf7tf_kube-system,
 kubernetes-dashboard-5bff5f8fb8-k6ff7_kube-system,
 kube-proxy-11111_kube-system, kube-addon-manager-minikube_kube-system,
 coredns-576cbf47c7-pf6gf_kube-system, coredns-576cbf47c7-bz4hm_kube-system

06:07:20 pod metrics-server-6486d4db88-cf7tf_kube-system is evicted successfully
06:07:20 pods metrics-server-6486d4db88-cf7tf_kube-system evicted, waiting for pod to be cleaned up
06:07:22 pods metrics-server-6486d4db88-cf7tf_kube-system successfully cleaned up

06:07:22 attempting to reclaim memory
06:07:22 must evict pod(s) to reclaim memory

- - - - - used to separate eviction cycles ( for easier reading )

06:07:22 pods ranked for eviction: 
 kubernetes-dashboard-5bff5f8fb8-k6ff7_kube-system,
 kube-proxy-11111_kube-system,
 kube-addon-manager-minikube_kube-system,
 coredns-576cbf47c7-pf6gf_kube-system,
 coredns-576cbf47c7-bz4hm_kube-system

06:07:22 pod kubernetes-dashboard-5bff5f8fb8-k6ff7_kube-system is evicted successfully
06:07:22 pods kubernetes-dashboard-5bff5f8fb8-k6ff7_kube-system evicted, waiting for pod to be cleaned up
06:07:24 pods kubernetes-dashboard-5bff5f8fb8-k6ff7_kube-system successfully cleaned up

06:07:24 attempting to reclaim memory
06:07:24 must evict pod(s) to reclaim memory

The kube-proxy Pod gets evicted.

However, 30 seconds later a replacement Pod is started. This replacement now ranks first for eviction.

It gets evicted, but 30 seconds later another replacement exists. This bad cycle continues ( probably forever ).

I renamed kube-proxy-12fy7_kube-system to the neater kube-proxy-11111_kube-system ( and its successors to 22222 and 33333 ) so you can more easily see the problem.
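The replacements appear because kube-proxy is managed by a DaemonSet, which recreates an evicted Pod on the node within seconds. You can watch the churn directly ( the label is assumed to be the usual k8s-app=kube-proxy; adjust if yours differs ):

# Each evicted kube-proxy Pod is quickly replaced by its DaemonSet.
kubectl get pods -n kube-system -l k8s-app=kube-proxy -w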

You can recognize minikube logs output by now, so it is no longer marked as such.

06:07:24 pods ranked for eviction:
 kube-proxy-11111_kube-system,
 kube-addon-manager-minikube_kube-system,
 coredns-576cbf47c7-pf6gf_kube-system,
 coredns-576cbf47c7-bz4hm_kube-system

06:07:25 pod kube-proxy-11111_kube-system is evicted successfully
06:07:25 pods kube-proxy-11111_kube-system evicted, waiting for pod to be cleaned up
06:07:55 timed out waiting for pods kube-proxy-11111_kube-system to be cleaned up

06:07:55 attempting to reclaim memory
06:07:55 must evict pod(s) to reclaim memory

- - - - - 

06:07:55 pods ranked for eviction: 
 kube-proxy-22222_kube-system,
 kube-addon-manager-minikube_kube-system,
 coredns-576cbf47c7-pf6gf_kube-system,
 coredns-576cbf47c7-bz4hm_kube-system

06:07:55 pod kube-proxy-22222_kube-system is evicted successfully
06:07:55 pods kube-proxy-22222_kube-system evicted, waiting for pod to be cleaned up
06:08:25 timed out waiting for pods kube-proxy-22222_kube-system to be cleaned up

06:08:25 attempting to reclaim memory
06:08:25 must evict pod(s) to reclaim memory

- - - - - 

06:08:25 pods ranked for eviction: 
 kube-proxy-33333_kube-system,
 kube-addon-manager-minikube_kube-system,
 coredns-576cbf47c7-pf6gf_kube-system,
 coredns-576cbf47c7-bz4hm_kube-system

06:08:25 pod kube-proxy-vbl8q_kube-system is evicted successfully
06:08:25 pods kube-proxy-vbl8q_kube-system evicted, waiting for pod to be cleaned up
06:08:55 timed out waiting for pods kube-proxy-vbl8q_kube-system to be cleaned up

06:08:55 attempting to reclaim memory
06:08:55 must evict pod(s) to reclaim memory

- - - - - 

06:08:55 pods ranked for eviction: 
 kube-proxy-33333_kube-system,
 kube-addon-manager-minikube_kube-system,
 coredns-576cbf47c7-pf6gf_kube-system,
 coredns-576cbf47c7-bz4hm_kube-system

06:08:55 pod kube-proxy-33333_kube-system is evicted successfully
06:08:55 pods kube-proxy-33333_kube-system evicted, waiting for pod to be cleaned up

MemoryPressure True forever.

kubectl describe node minikube | grep MemoryPressure

  MemoryPressure   True    Fri, 01 Feb 2019 08:10:10 +0200   KubeletHasInsufficientMemory   kubelet has insufficient memory available

Check the kubelet-calculated memory.available_in_mb:

memory.available_in_mb 673

The problem is that a few minutes ago an unidentified process started using around 500 MB of additional RAM.

That RAM never gets released, so available RAM never climbs back above the soft threshold.

Let's try deleting all evicted Pods:


kubectl delete -f myrampod2.yaml
pod "myram2" deleted

kubectl delete -f myrampod3.yaml
pod "myram3" deleted

kubectl delete -f myrampod4.yaml
pod "myram4" deleted

kubectl delete -f myrampod5.yaml
pod "myram5" deleted

Only about 20 MB of extra RAM became available:

memory.available_in_mb 694

Still 100 MB below soft threshold.

kubectl describe node minikube | grep MemoryPressure
  MemoryPressure   True    Fri, 01 Feb 2019 08:12:21 +0200   KubeletHasInsufficientMemory   kubelet has insufficient memory available

Still under MemoryPressure.

It is now impossible to schedule new Pods on this node.

Let's attempt creating myram2 again:

kubectl create -f myrampod2.yaml

Get Pods:

kubectl get pods

NAME     READY   STATUS    RESTARTS   AGE
myram2   0/1     Pending   0          3s

See the last few lines of the describe command output:

kubectl describe pod/myram2

Events:
  Type     Reason            Age                From               Message
  ----     ------            ----               ----               -------
  Warning  FailedScheduling  34s (x2 over 34s)  default-scheduler  0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.
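The taint mentioned in the event comes from the MemoryPressure condition. You can see it on the node; on this Kubernetes version it should be the node.kubernetes.io/memory-pressure:NoSchedule taint, though the exact name is worth verifying on your cluster:

# Show the taints currently applied to the minikube node.
kubectl describe node minikube | grep -A2 -i taints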

Lesson learned: do not set RAM eviction thresholds higher than the RAM your node can realistically keep available - otherwise the node stays under MemoryPressure and nothing new can be scheduled.

memory.available_in_mb 700

eviction-soft="memory.available<800Mi"

Stop minikube

minikube stop
