The resource-controller component uses CustomResourceDefinition (CRD) resources
to dynamically control the resource usage of pods. resource-controller allows you
to modify the resource limits of a pod without restarting the pod. For example, you
can raise or lower the upper limits of the CPU and memory resources that a pod can use,
so that the containers in the pod continue to handle their workloads with the new amount
of resources. This topic describes how to dynamically modify the upper limits of resources
for a pod after you deploy the resource-controller component.
Background information
In the following scenarios, we recommend that you use resource-controller to dynamically
modify the upper limit of resources for a pod:
- The pod is running. However, the specified CPU limit is low, which limits the speed
of processes in the pod.
- The CPU load of the pod is high because you did not specify a resource limit when
you created the pod. You want to limit the CPU usage of the pod without affecting
other applications.
- The memory usage of the pod keeps increasing and is approaching the specified upper
limit. You want to raise the memory limit before the Out of Memory (OOM) killer is
triggered, without restarting the pod.
In the preceding scenarios, the Kubernetes design allows you to change the resource
limits of a pod only by modifying its PodSpec, which causes the pod to be recreated.
If an online application runs in the pod, users may fail to access the application
and network traffic may spike after the pod is recreated. If an offline task runs in
the pod, the computing results of the previous hours may be lost after the pod is
recreated.
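As a quick illustration of this default behavior, the following commands show how resource limits are normally changed for a workload. The Deployment name nginx-demo and its app label are hypothetical examples, not part of this topic; changing the limits triggers a rolling update that replaces the pods.
# Changing limits through the PodSpec triggers pod recreation.
kubectl set resources deployment/nginx-demo --limits=cpu=2,memory=2Gi
# Watch the old pod terminate and a new pod start.
kubectl get pods -l app=nginx-demo -w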
Dynamically modify the CPU and memory limit for a pod
- Deploy a simulated task in a pod. The task is a stress testing program that attempts
to use 2 vCPUs and 256 MB of memory.
Use the following template to deploy the task and set the CPU limit of the pod to
1:
apiVersion: v1
kind: Pod
metadata:
  name: pod-demo
spec:
  containers:
  - name: pod-demo
    image: polinux/stress
    resources:
      requests:
        memory: "50Mi"
      limits:
        memory: "1000Mi"
        cpu: 1
    command: ["stress"]
    args: ["--vm", "1", "--vm-bytes", "256M", "-c", "2", "--vm-hang", "1"]

Because the CPU limit is set to 1, the pod can use only one vCPU even though the stress program starts two CPU workers. You can verify this as shown below.
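A minimal way to apply the template and check the CPU usage of the pod. The file name pod-demo.yaml is a hypothetical example, and the second command assumes that the polinux/stress image provides the BusyBox top applet; with the 1 vCPU limit, the two stress workers are expected to share about one vCPU in total.
kubectl apply -f pod-demo.yaml
# Print a single snapshot of process CPU usage inside the container.
kubectl exec pod-demo -- top -b -n 1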
- Submit the following CRD template to dynamically modify the upper limit of CPU and
memory usage:
apiVersion: resources.alibabacloud.com/v1alpha1
kind: Cgroups
metadata:
  name: cgroups-sample
spec:
  pod:
    name: pod-demo
    namespace: default
    containers:
    - name: pod-demo
      cpu: 2000m
      memory: 5000Mi

After the Cgroups object is applied, the CPU usage of the pod increases from about one
vCPU to about two vCPUs. You can verify the new limits as shown below.
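One way to confirm the change is to read the container's cgroup files. This sketch assumes cgroup v1 on the node and that the cgroup file system is visible inside the container; neither is guaranteed by the component.
# cpu.cfs_quota_us divided by cpu.cfs_period_us gives the effective CPU limit.
kubectl exec pod-demo -- cat /sys/fs/cgroup/cpu/cpu.cfs_quota_us /sys/fs/cgroup/cpu/cpu.cfs_period_us
# The memory limit of the container in bytes.
kubectl exec pod-demo -- cat /sys/fs/cgroup/memory/memory.limit_in_bytes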
- Run the following command to query the status of the pod:
kubectl describe pod pod-demo
If the following output is returned, the pod is running as expected and is not restarted:
Events:
  Type    Reason            Age  From                                   Message
  ----    ------            ---- ----                                   -------
  Normal  Scheduled         13m  default-scheduler                      Successfully assigned default/pod-demo to cn-zhangjiakou.192.168.3.238
  Normal  Pulling           13m  kubelet, cn-zhangjiakou.192.168.3.238  Pulling image "polinux/stress"
  Normal  Pulled            13m  kubelet, cn-zhangjiakou.192.168.3.238  Successfully pulled image "polinux/stress"
  Normal  SuccessfulChange  60s  cgroups-controller                     Change pod pod-demo cpu to 2000
  Normal  SuccessfulChange  60s  cgroups-controller                     Change pod pod-demo memory to 524288000000
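As an additional check that is not part of the original procedure, you can also confirm that the container was not restarted by the change:
# The restart count should remain 0 after the limits are modified.
kubectl get pod pod-demo -o jsonpath='{.status.containerStatuses[0].restartCount}'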
Bind a pod to one or more CPUs
- Use the following template to create a pod that runs a stress testing program. The
CPU limit of the pod is set to 0.5 vCPUs.
apiVersion: v1
kind: Pod
metadata:
  name: pod-demo
  annotations:
    cpuset-scheduler: 'true' # Add this annotation to enable topology-aware CPU scheduling.
spec:
  containers:
  - name: pod-demo
    image: polinux/stress
    resources:
      requests:
        memory: "50Mi"
      limits:
        memory: "1000Mi"
        cpu: 0.5
    command: ["stress"]
    args: ["--vm", "1", "--vm-bytes", "556M", "-c", "2", "--vm-hang", "1"]
- Check the usage of each CPU on the node cn-beijing.192.168.8.241. The following
result indicates that the CPUs show different usage and the usage dynamically changes:
top - 22:17:34 up 4 days, 10:29, 1 user, load average: 0.33, 0.88, 0.95
Tasks: 179 total, 3 running, 176 sleeping, 0 stopped, 0 zombie
%Cpu0 : 13.1 us, 0.7 sy, 0.0 ni, 85.9 id, 0.3 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu1 : 7.3 us, 7.7 sy, 0.0 ni, 84.7 id, 0.3 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu2 : 12.4 us, 0.7 sy, 0.0 ni, 86.6 id, 0.3 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu3 : 18.3 us, 0.7 sy, 0.0 ni, 80.7 id, 0.3 wa, 0.0 hi, 0.0 si, 0.0 st
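As a reminder of how the per-CPU figures can be obtained (an illustration only, not part of the component), log on to the node and run one of the following:
# Run top and press 1 to toggle the per-CPU view.
top
# Alternatively, print one snapshot of the per-CPU counters.
grep "^cpu" /proc/stat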
- Submit the following CRD template to bind the pod to CPU 2 and CPU 3:
apiVersion: resources.alibabacloud.com/v1alpha1
kind: Cgroups
metadata:
  name: cgroups-sample-cpusetpod
spec:
  pod:
    name: pod-demo
    namespace: default
    containers:
    - name: pod-demo
      cpuset-cpus: 2-3
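A quick way to check which CPUs the container is allowed to run on, assuming cgroup v1 on the node and that the cgroup file system is visible inside the container:
# Expected to show 2-3 after the Cgroups object is applied.
kubectl exec pod-demo -- cat /sys/fs/cgroup/cpuset/cpuset.cpus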
- Check the usage of each CPU on the node. The following result shows that the sum of
the usage of CPU 2 and CPU 3 stays around 50% and the usage of each CPU stays around
25%. This indicates that the pod is bound to CPU 2 and CPU 3 as expected and the pod
is not restarted.
top - 22:11:02 up 4 days, 10:22, 1 user, load average: 0.04, 0.36, 0.84
Tasks: 177 total, 3 running, 174 sleeping, 0 stopped, 0 zombie
%Cpu0 : 2.7 us, 0.7 sy, 0.0 ni, 96.3 id, 0.3 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu1 : 3.3 us, 1.0 sy, 0.0 ni, 95.3 id, 0.3 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu2 : 27.2 us, 0.7 sy, 0.0 ni, 71.8 id, 0.3 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu3 : 21.4 us, 5.7 sy, 0.0 ni, 72.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
Bind a Deployment to one or more CPUs
- Use the following template to create a Deployment that runs a stress testing program.
The Deployment provisions two pods, each of which uses 0.5 vCPUs.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: go-demo
  labels:
    app: go-demo
spec:
  replicas: 2
  selector:
    matchLabels:
      app: go-demo
  template:
    metadata:
      annotations:
        cpuset-scheduler: "true" # Add this annotation to enable topology-aware CPU scheduling.
      labels:
        app: go-demo
    spec:
      nodeName: cn-beijing.192.168.8.240 # Schedule the pods to the same node.
      containers:
      - name: go-demo
        image: polinux/stress
        command: ["stress"]
        args: ["--vm", "1", "--vm-bytes", "556M", "-c", "1", "--vm-hang", "1"]
        imagePullPolicy: Always
        resources:
          requests:
            cpu: 0.5
          limits:
            cpu: 0.5 # Specify the value of resources.limit.cpu.
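To identify the two pods that the Deployment creates on node cn-beijing.192.168.8.240, you can list them by the app=go-demo label defined in the template; this is a convenience step only:
kubectl get pods -l app=go-demo -o wide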
- Check the usage of each CPU on the node cn-beijing.192.168.8.240. The following
result indicates that the CPUs show different usage and the usage dynamically changes:
top - 11:39:01 up 23:50, 2 users, load average: 1.76, 1.91, 1.39
Tasks: 189 total, 4 running, 185 sleeping, 0 stopped, 0 zombie
%Cpu0 : 30.4 us, 5.4 sy, 0.0 ni, 64.2 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu1 : 29.4 us, 4.7 sy, 0.0 ni, 65.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu2 : 7.0 us, 8.7 sy, 0.0 ni, 84.2 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu3 : 50.3 us, 1.3 sy, 0.0 ni, 48.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
- Submit the following CRD template to bind the pods that are provisioned by the Deployment
to CPU 2 and CPU 3:
apiVersion: resources.alibabacloud.com/v1alpha1
kind: Cgroups
metadata:
  name: cgroups-cpuset-sample
spec:
  deployment:
    name: go-demo
    namespace: default
    containers:
    - name: go-demo
      cpuset-cpus: 2,3 # Bind the pods to CPU 2 and CPU 3.
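If you want to see which CPU each pod was assigned, you can read the cpuset of each pod's container. This sketch assumes cgroup v1 on the node and that the cgroup file system is visible inside the containers:
# List the pods of the Deployment and print the CPUs each container may run on.
for p in $(kubectl get pods -l app=go-demo -o name); do
  echo "$p: $(kubectl exec "$p" -- cat /sys/fs/cgroup/cpuset/cpuset.cpus)"
done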
- Check the usage of each CPU on the node. The following result shows that the sum of
the usage of CPU 2 and CPU 3 stays around 50%. This indicates that the two pods provisioned
by the Deployment are separately bound to CPU 2 and CPU 3 as expected, and the pods
are not restarted.
top - 11:30:56 up 23:42, 2 users, load average: 2.01, 1.95, 1.12
Tasks: 180 total, 4 running, 176 sleeping, 0 stopped, 0 zombie
%Cpu0 : 4.4 us, 2.4 sy, 0.0 ni, 93.2 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu1 : 4.4 us, 2.3 sy, 0.0 ni, 92.6 id, 0.3 wa, 0.0 hi, 0.3 si, 0.0 st
%Cpu2 : 52.7 us, 8.0 sy, 0.0 ni, 39.0 id, 0.3 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu3 : 50.7 us, 10.7 sy, 0.0 ni, 38.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
Control the IOPS of a disk
Note To control the IOPS of a disk, you must first create a worker node that uses the Alibaba
Cloud Linux 2 operating system.
- Create a Fio container. The container uses Fio to perform write stress tests on a
disk.
Use the following template to create a pod that is mounted with a disk volume. The
pod is deployed on the node cn-beijing.192.168.0.182. The device name of the mounted
disk is /dev/vdb1 and the disk is mounted to the /mnt path.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fio
  labels:
    app: fio
spec:
  selector:
    matchLabels:
      app: fio
  template:
    metadata:
      labels:
        app: fio
    spec:
      nodeName: cn-beijing.192.168.0.182 # The pod is deployed on the node cn-beijing.192.168.0.182.
      containers:
      - name: fio
        image: registry.cn-beijing.aliyuncs.com/shuangkun/fio:v1
        command: ["sh", "-c"]
        # Use Fio to perform write stress tests on the disk.
        args: ["fio -filename=/data/test -direct=1 -iodepth 1 -thread -rw=write -ioengine=psync -bs=16k -size=2G -numjobs=10 -runtime=12000 -group_reporting -name=mytest"]
        volumeMounts:
        - name: pvc
          mountPath: /data # The disk volume is mounted to the /data path.
      volumes:
      - name: pvc
        hostPath:
          path: /mnt
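The Cgroups object in the next step references the pod by its generated name (fio-6b6c469fdf-44h7v in this example). You can look up the name of the pod in your cluster as follows; the generated suffix will differ:
kubectl get pods -l app=fio -o wide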
- Limit the throughput of the pod by deploying the CRD that is used to control the I/O
of the disk.
Use the following template to set the write BPS of the /dev/vdb1 disk to 1048576,
2097152, and 3145728 bytes per second in sequence, and check the disk performance at
each value.
apiVersion: resources.alibabacloud.com/v1alpha1
kind: Cgroups
metadata:
  name: cgroups-sample-fio
spec:
  pod:
    name: fio-6b6c469fdf-44h7v
    namespace: default
    containers:
    - name: fio
      blkio:
        device_write_bps: [{device: "/dev/vdb1", value: "1048576"}]
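One way to confirm that the limit took effect is to read the blkio throttle file of the container. This sketch assumes cgroup v1 (the default on Alibaba Cloud Linux 2) and that the cgroup file system is visible inside the container:
# Lists the throttled device as major:minor followed by the byte limit (1048576 here).
kubectl exec deploy/fio -- cat /sys/fs/cgroup/blkio/blkio.throttle.write_bps_device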
- Check the monitoring data of the disk on the node cn-beijing.192.168.0.182.
The monitoring data shows that the write BPS of the pod matches each configured value
and that the pod is not restarted during the modifications.
Topology-aware CPU scheduling
You can use resource-controller together with the ACK scheduler to automate CPU selection
and CPU binding on machines with multi-core processors, such as Intel, AMD, and ARM CPUs.
Topology-aware CPU scheduling manages vCPUs and Hyper-Threading to avoid issues such
as frequent switching of L1 and L2 caches between vCPUs, off-chip transmission across
Non-Uniform Memory Access (NUMA) nodes, and frequent refreshing of the L3 cache. This
maximizes CPU utilization for CPU-intensive, multi-threaded applications.
For more information about topology-aware CPU scheduling, see the presentation by
the ACK team at KubeCon 2020: Practice of Fine-grained Cgroups Resources Scheduling in Kubernetes.