The resource-controller component uses Custom Resource Definitions (CRDs) to dynamically control the resource usage of pods. resource-controller allows you to modify the resource limits of a pod without restarting the pod. For example, you can modify the upper limits of CPU and memory resources that a pod can use, so that the containers in the pod can continue to handle workloads with the specified amount of resources. This topic describes how to dynamically modify the upper limits of resources for a pod after you deploy the resource-controller component.

Prerequisites

The resource-controller component is deployed in your cluster.

Background information

In the following scenarios, we recommend that you use resource-controller to dynamically modify the upper limit of resources for a pod:
  • The pod is running, but the specified CPU limit is too low and throttles the processes in the pod.
  • The CPU load of the pod is high because no resource limits were specified when the pod was created. You want to limit the CPU usage of the pod without affecting other applications.
  • The memory usage of the pod keeps increasing and is approaching the specified upper limit. You want to raise the memory limit before the Out of Memory (OOM) killer is triggered, without restarting the pod.

In the preceding scenarios, the design of Kubernetes allows you to change the resource limits of a pod only by modifying the PodSpec, which causes the pod to be recreated. If an online application runs in the pod, users may fail to access the application and traffic may spike after the pod is recreated. If an offline task runs in the pod, the computing results generated over the previous hours may be lost after the pod is recreated.
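
For example, on clusters that do not support in-place pod resizing, directly patching the resource limits of a running pod is rejected by the API server, and changing the limits in a Deployment template triggers a rolling update that recreates the pods. The following command is only a sketch of such an attempt; pod-demo is the sample pod that is deployed later in this topic:

    # Without in-place pod resizing, the API server rejects this patch because pod resource limits are immutable.
    kubectl patch pod pod-demo -p '{"spec":{"containers":[{"name":"pod-demo","resources":{"limits":{"cpu":"2"}}}]}}'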

Dynamically modify the CPU and memory limit for a pod

  1. Deploy a task in a pod for simulation. The task is a stress testing program that uses 2 vCPUs and 256 MB of memory.

    Use the following template to deploy the simulation task and set the CPU limit to 1:

    apiVersion: v1
    kind: Pod
    metadata:
      name: pod-demo
    spec:
      containers:
      - name: pod-demo
        image: polinux/stress
        resources:
          requests:
            memory: "50Mi"
          limits:
            memory: "1000Mi"
            cpu: 1
        command: ["stress"]
        args: ["--vm", "1", "--vm-bytes", "256M", "-c", "2", "--vm-hang", "1"]
    The monitoring data of the pod shows that the pod can use only one vCPU.
  2. Submit the following CRD template to dynamically modify the upper limit of CPU and memory usage:
    apiVersion: resources.alibabacloud.com/v1alpha1
    kind: Cgroups
    metadata:
      name: cgroups-sample
    spec:
      pod:
        name: pod-demo
        namespace: default
        containers:
        - name: pod-demo
          cpu: 2000m
          memory: 5000Mi
    The monitoring data shows that the CPU usage of the pod increases from one vCPU to two vCPUs.
  3. Run the following command to query the status of the pod:
    kubectl describe pod pod-demo

    If the following output is returned, the pod is running as expected and is not restarted:

    Events:
      Type    Reason            Age   From                                   Message
      ----    ------            ----  ----                                   -------
      Normal  Scheduled         13m   default-scheduler                      Successfully assigned default/pod-demo to cn-zhangjiakou.192.168.3.238
      Normal  Pulling           13m   kubelet, cn-zhangjiakou.192.168.3.238  Pulling image "polinux/stress"
      Normal  Pulled            13m   kubelet, cn-zhangjiakou.192.168.3.238  Successfully pulled image "polinux/stress"
      Normal  SuccessfulChange  60s   cgroups-controller                     Change pod pod-demo cpu to 2000
      Normal  SuccessfulChange  60s   cgroups-controller                     Change pod pod-demo memory to 524288000000
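
If you want to reproduce the preceding steps end to end, the following commands are a minimal sketch. The file names pod-demo.yaml and cgroups-sample.yaml are assumptions for the two templates above, and kubectl top requires that a metrics component such as metrics-server is installed in the cluster.

    kubectl apply -f pod-demo.yaml          # Deploy the stress testing pod.
    kubectl top pod pod-demo                # CPU usage is capped at about 1000m before the change.
    kubectl apply -f cgroups-sample.yaml    # Submit the Cgroups object to raise the limits.
    kubectl top pod pod-demo                # After reconciliation, CPU usage can reach about 2000m.
    kubectl describe pod pod-demo           # Check the SuccessfulChange events and that the restart count is unchanged.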

Bind a pod to one or more CPUs

  1. Use the following template to create a pod that runs a stress testing program. The program uses 0.5 vCPUs.
    apiVersion: v1
    kind: Pod
    metadata:
      name: pod-demo
      annotations:
        cpuset-scheduler: 'true' # Add this annotation to enable topology-aware CPU scheduling. 
    spec:
      containers:
      - name: pod-demo
        image: polinux/stress
        resources:
          requests:
            memory: "50Mi"
          limits:
            memory: "1000Mi"
            cpu: 0.5
        command: ["stress"]
        args: ["--vm", "1", "--vm-bytes", "556M", "-c", "2", "--vm-hang", "1"]
  2. Check the usage of each CPU on the node cn-beijing.192.168.8.241. The following output shows that usage varies across the CPUs and changes dynamically:
    top - 22:17:34 up 4 days, 10:29,  1 user,  load average: 0.33, 0.88, 0.95
    Tasks: 179 total,   3 running, 176 sleeping,   0 stopped,   0 zombie
    %Cpu0  : 13.1 us,  0.7 sy,  0.0 ni, 85.9 id,  0.3 wa,  0.0 hi,  0.0 si,  0.0 st
    %Cpu1  :  7.3 us,  7.7 sy,  0.0 ni, 84.7 id,  0.3 wa,  0.0 hi,  0.0 si,  0.0 st
    %Cpu2  : 12.4 us,  0.7 sy,  0.0 ni, 86.6 id,  0.3 wa,  0.0 hi,  0.0 si,  0.0 st
    %Cpu3  : 18.3 us,  0.7 sy,  0.0 ni, 80.7 id,  0.3 wa,  0.0 hi,  0.0 si,  0.0 st
  3. Submit the following CRD template to bind the pod to CPU 2 and CPU 3:
    apiVersion: resources.alibabacloud.com/v1alpha1
    kind: Cgroups
    metadata:
      name: cgroups-sample-cpusetpod
    spec:
      pod:
        name: pod-demo
        namespace: default
        containers:
        - name: pod-demo
          cpuset-cpus: 2-3
  4. Check the usage of each CPU on the node. The following result shows that the sum of the usage of CPU 2 and CPU 3 stays around 50% and the usage of each CPU stays around 25%. This indicates that the pod is bound to CPU 2 and CPU 3 as expected and the pod is not restarted.
    top - 22:11:02 up 4 days, 10:22,  1 user,  load average: 0.04, 0.36, 0.84
    Tasks: 177 total,   3 running, 174 sleeping,   0 stopped,   0 zombie
    %Cpu0  :  2.7 us,  0.7 sy,  0.0 ni, 96.3 id,  0.3 wa,  0.0 hi,  0.0 si,  0.0 st
    %Cpu1  :  3.3 us,  1.0 sy,  0.0 ni, 95.3 id,  0.3 wa,  0.0 hi,  0.0 si,  0.0 st
    %Cpu2  : 27.2 us,  0.7 sy,  0.0 ni, 71.8 id,  0.3 wa,  0.0 hi,  0.0 si,  0.0 st
    %Cpu3  : 21.4 us,  5.7 sy,  0.0 ni, 72.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
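
Besides checking per-CPU usage on the node, you can verify the binding from inside the container by reading the allowed CPU list of the main process. This is a minimal check, assuming that the Cgroups object has been reconciled and that the image provides grep (the BusyBox-based polinux/stress image does):

    kubectl exec pod-demo -- grep Cpus_allowed_list /proc/1/status
    # After the cpuset takes effect, the allowed CPU list is expected to be 2-3.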

Bind a Deployment to one or more CPUs

  1. Use the following template to create a Deployment that runs a stress testing program. The Deployment provisions two pods, each of which uses 0.5 vCPUs.
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: go-demo
      labels:
        app: go-demo
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: go-demo
      template:
        metadata:
          annotations:
            cpuset-scheduler: "true" # Add this annotation to enable topology-aware CPU scheduling. 
          labels:
            app: go-demo
        spec:
          nodeName: cn-beijing.192.168.8.240 # Schedule the pods to the same node. 
          containers:
          - name: go-demo
            image: polinux/stress
            command: ["stress"]
            args: ["--vm", "1", "--vm-bytes", "556M", "-c", "1", "--vm-hang", "1"]
            imagePullPolicy: Always
            resources:
              requests:
                cpu: 0.5
              limits:
                cpu: 0.5  # Specify the value of resources.limit.cpu. 
  2. Check the usage of each CPU on the node cn-beijing.192.168.8.240. The following output shows that usage varies across the CPUs and changes dynamically:
    top - 11:39:01 up 23:50,  2 users,  load average: 1.76, 1.91, 1.39
    Tasks: 189 total,   4 running, 185 sleeping,   0 stopped,   0 zombie
    %Cpu0  : 30.4 us,  5.4 sy,  0.0 ni, 64.2 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
    %Cpu1  : 29.4 us,  4.7 sy,  0.0 ni, 65.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
    %Cpu2  :  7.0 us,  8.7 sy,  0.0 ni, 84.2 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
    %Cpu3  : 50.3 us,  1.3 sy,  0.0 ni, 48.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
  3. Submit the following CRD template to bind the pods that are provisioned by the Deployment to CPU 2 and CPU 3:
    apiVersion: resources.alibabacloud.com/v1alpha1
    kind: Cgroups
    metadata:
      name: cgroups-cpuset-sample
    spec:
      deployment:
        name: go-demo
        namespace: default
        containers:
        - name: go-demo
          cpuset-cpus: 2,3 # Bind the pods to CPU 2 and CPU 3. 
  4. Check the usage of each CPU on the node. The following result shows that the sum of the usage of CPU 2 and CPU 3 stays around 50%. This indicates that the two pods provisioned by the Deployment are separately bound to CPU 2 and CPU 3 as expected, and the pods are not restarted.
    top - 11:30:56 up 23:42,  2 users,  load average: 2.01, 1.95, 1.12
    Tasks: 180 total,   4 running, 176 sleeping,   0 stopped,   0 zombie
    %Cpu0  :  4.4 us,  2.4 sy,  0.0 ni, 93.2 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
    %Cpu1  :  4.4 us,  2.3 sy,  0.0 ni, 92.6 id,  0.3 wa,  0.0 hi,  0.3 si,  0.0 st
    %Cpu2  : 52.7 us,  8.0 sy,  0.0 ni, 39.0 id,  0.3 wa,  0.0 hi,  0.0 si,  0.0 st
    %Cpu3  : 50.7 us, 10.7 sy,  0.0 ni, 38.6 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
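
To verify the binding for both replicas, you can read the allowed CPU list of each pod. The following loop is only a sketch and assumes a Bash-compatible shell:

    for p in $(kubectl get pods -l app=go-demo -o name); do
      echo "$p"
      kubectl exec "$p" -- grep Cpus_allowed_list /proc/1/status
    done
    # The allowed CPU list of each pod is expected to fall within CPUs 2 and 3.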

Control the IOPS of a disk

Note: To control the IOPS of a disk, you must use a worker node that runs the Alibaba Cloud Linux 2 operating system.
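
You can check the operating system of the node by running the following command; the OS-IMAGE column of the output shows the operating system that the node runs:

    kubectl get node cn-beijing.192.168.0.182 -o wide
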
  1. Create a Fio container. The container uses Fio to perform write stress tests on a disk.
    Use the following template to create a pod that mounts a disk volume. The pod is deployed on the node cn-beijing.192.168.0.182. The device name of the mounted disk is /dev/vdb1 and the disk is mounted to the /mnt path on the node.
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: fio
      labels:
        app: fio
    spec:
      selector:
        matchLabels:
          app: fio
      template:
        metadata:
          labels:
            app: fio
        spec:
          nodeName: cn-beijing.192.168.0.182   # The pod is deployed on the node cn-beijing.192.168.0.182. 
          containers:
          - name: fio
            image: registry.cn-beijing.aliyuncs.com/shuangkun/fio:v1
            command: ["sh", "-c"]
            # Use Fio to perform write stress tests on the disk. 
            args: ["fio -filename=/data/test -direct=1 -iodepth 1 -thread -rw=write -ioengine=psync -bs=16k -size=2G -numjobs=10 -runtime=12000 -group_reporting -name=mytest"]
            volumeMounts:
              - name: pvc
                mountPath: /data    # The disk volume is mounted to the /data path. 
          volumes:
            - name: pvc
              hostPath:
                path: /mnt
  2. Limit the throughput of the pod by deploying the Cgroups CRD that controls the disk I/O of the pod.

    Use the following template to set the write BPS (bytes per second) of the /dev/vdb1 disk to 1048576, 2097152, and 3145728 in sequence, and check the disk performance at each value.

    apiVersion: resources.alibabacloud.com/v1alpha1
    kind: Cgroups
    metadata:
      name: cgroups-sample-fio
    spec:
      pod:
        name: fio-6b6c469fdf-44h7v   # The name of the pod that is created by the fio Deployment. Replace it with the actual pod name in your cluster.
        namespace: default
        containers:
        - name: fio
          blkio:
            device_write_bps: [{device: "/dev/vdb1", value: "1048576"}]
  3. Check the monitoring data of the disk on the node cn-beijing.192.168.0.182.
    The monitoring data shows that the write throughput (BPS) of the pod matches each specified limit and that the pod is not restarted during the modifications.
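
To test the next values in the sequence (2097152 and 3145728), you can change the value field in the Cgroups manifest and reapply it. The file name cgroups-sample-fio.yaml is an assumption for the preceding template:

    # Set device_write_bps to the next value, for example "2097152", and then reapply the manifest.
    kubectl apply -f cgroups-sample-fio.yaml
    # The write bandwidth reported by fio and by the disk monitoring data is expected to follow the new limit.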

Topology-aware CPU scheduling

You can use resource-controller together with ACK schedulers to facilitate CPU binding and automate CPU selection on physical machines with multi-core Intel, AMD, or ARM CPUs.

Topology-aware CPU scheduling manages vCPUs and Hyper-Threading properly to avoid the following issues: frequent switchovers between the L1 and L2 caches, off-chip transmission across Non-Uniform Memory Access (NUMA) nodes, and frequent refreshing of the L3 cache. This maximizes CPU utilization for CPU-intensive, multi-threaded applications. For more information about topology-aware CPU scheduling, see the presentation by the ACK team at KubeCon 2020: Practice of Fine-grained Cgroups Resources Scheduling in Kubernetes.