Container Service for Kubernetes (ACK) provides the topology-aware CPU scheduling feature based on the new Kubernetes scheduling framework. This feature allows you to configure automatic CPU core binding policies and improves the performance of CPU-sensitive workloads. This topic describes how to enable topology-aware CPU scheduling and how to use CPU policies to configure automatic CPU core binding.

Prerequisites

  • A professional Kubernetes cluster is created. For more information, see Create a professional managed Kubernetes cluster.
    Notice Topology-aware CPU scheduling is available only for professional Kubernetes clusters. To enable topology-aware CPU scheduling for dedicated Kubernetes clusters, submit a ticket to apply to be added to the whitelist.
  • resource-controller is deployed in the cluster before you enable topology-aware CPU scheduling. For more information, see Manage system components.
  • The following table describes the versions that are required for the installed system components.
    Component Version
    Kubernetes 1.16 and later
    Helm 3.0 and later
    Docker 19.03.5
    Operating system CentOS 7.6, CentOS 7.7, Ubuntu 16.04, Ubuntu 18.04, or Alibaba Cloud Linux 2

Background information

Multiple pods can run on a node in a Kubernetes cluster, and some of the pods may belong to CPU-intensive workloads. In this case, pods compete for CPU resources. When the competition becomes intense, the CPU cores that are allocated to each pod may frequently change. This situation intensifies when Non-Uniform Memory Access (NUMA) nodes are used. These changes degrade the performance of the workloads. The Kubernetes CPU manager provides a CPU scheduling solution to fix this issue within a node. However, the Kubernetes CPU manager cannot find an optimal allocation of CPU cores within a cluster. In addition, the CPU manager works only on Guaranteed pods, in which every container is configured with CPU requests and limits that are set to the same value, and does not apply to other types of pods.
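For reference, the following is a minimal sketch of a Guaranteed pod. The pod name and image are placeholders that are not used elsewhere in this topic. A pod is assigned the Guaranteed QoS class only when every container specifies CPU and memory requests that equal its limits:

  apiVersion: v1
  kind: Pod
  metadata:
    name: guaranteed-example # Placeholder name for illustration only.
  spec:
    containers:
    - name: app
      image: nginx:latest
      resources:
        requests:
          cpu: 2
          memory: 4Gi
        limits:
          cpu: 2      # The requests equal the limits for every resource of every container,
          memory: 4Gi # so the pod is assigned the Guaranteed QoS class.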

Topology-aware CPU scheduling applies to the following scenarios:
  • The workload is compute-intensive.
  • The application is CPU-sensitive.
  • The workload runs on multi-core Elastic Compute Service (ECS) bare metal instances with Intel CPUs or AMD CPUs.
    To test the effect of topology-aware CPU scheduling, stress tests are performed on two NGINX applications that each request 4 CPU cores and 8 GB of memory. The applications are deployed on an ECS bare metal instance with 104 Intel CPU cores and on an ECS bare metal instance with 256 AMD CPU cores. The results show that the application performance is improved by 22% to 43% when topology-aware CPU scheduling is enabled. The following table shows the details.
    Performance metric Intel AMD
    QPS Improved by 22.9% Improved by 43.6%
    AVG RT Reduced by 26.3% Reduced by 42.5%

When you enable topology-aware CPU scheduling for an application, you can set cpu-policy to static-burst in the pod annotations to configure an automatic CPU core binding policy. This CPU policy is suitable for compute-intensive workloads. It can effectively reduce CPU core contention among processes and memory access across NUMA nodes. This maximizes the utilization of fragmented CPU resources and optimizes resource allocation for compute-intensive workloads without the need to modify hardware or VM resources, which further improves CPU utilization.
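For example, the relevant pod annotations look as follows. The complete templates are provided later in this topic:

  annotations:
    cpuset-scheduler: "true"   # Enable topology-aware CPU scheduling.
    cpu-policy: "static-burst" # Enable the automatic CPU core binding policy.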

Usage notes

  • resource-controller is deployed in the cluster before you enable topology-aware CPU scheduling.
  • When you enable topology-aware CPU scheduling, make sure that the CPU policy of the node is set to none (cpu-policy=none), which is the default kubelet CPU manager policy.
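    A rough way to check the policy on a node is shown below. The kubelet configuration file path is an assumption and may differ in your environment:
    # Check whether the kubelet command line specifies a CPU manager policy. No output means that the default policy (none) is used.
    ps -ef | grep kubelet | grep cpu-manager-policy
    # If the kubelet reads a configuration file, check the cpuManagerPolicy field. The path below is an assumption.
    grep cpuManagerPolicy /var/lib/kubelet/config.yaml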

Enable topology-aware CPU scheduling

To enable topology-aware CPU scheduling, you must set the annotations and containers parameters when you configure pods. Set the parameters in the following ways:
  • annotations: Set cpuset-scheduler to true to enable topology-aware CPU scheduling.
  • containers: Set resources.limits.cpu to an integer.
  1. Create a file named cal-pi.yaml by using the following template. You can use this file to create a pod with topology-aware CPU scheduling enabled.
    apiVersion: v1
    kind: Pod
    metadata:
      name: cal-pi
      annotations: 
        cpuset-scheduler: 'true' # Add this annotation to enable topology-aware CPU scheduling. 
        #cpu-policy: 'static-burst' # Add this annotation to configure automatic CPU core binding policies and improve the usage of fragmented CPU resources. 
    spec:
      restartPolicy: Never
      containers:
      - image: registry.cn-zhangjiakou.aliyuncs.com/xianlu/java-pi
        name: cal-pi
        resources:
          requests:
            cpu: 4
          limits:
            cpu: 4  # Specify the value of resources.limits.cpu. 
        env:
        - name: limit
          value: "20000"
        - name: threadNum
          value: "3000"
    Notice To configure an automatic CPU core binding policy, set cpu-policy to static-burst in the annotations section. To add this configuration, delete the # before cpu-policy.
  2. Create a file named go-demo.yaml by using the following template. You can use this file to create a Deployment with topology-aware CPU scheduling enabled.
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: go-demo
    spec:
      replicas: 4
      selector:
        matchLabels:
          app: go-demo
      template:
        metadata:
          annotations:
            cpuset-scheduler: "true" # Add this annotation to enable topology-aware CPU scheduling. 
            #cpu-policy: 'static-burst' # Add this annotation to configure automatic CPU core binding policies and improve the usage of fragmented CPU resources. 
          labels:
            app: go-demo
        spec:
          containers:
          - name: go-demo
            image: registry.cn-hangzhou.aliyuncs.com/haoshuwei24/go-demo:1k
            imagePullPolicy: Always
            ports:
            - containerPort: 8080
            resources:
              requests:
                cpu: 1
              limits: 
                cpu: 4  # Specify the value of resources.limits.cpu. 
    Notice
    • To configure an automatic CPU core binding policy, set cpu-policy to static-burst in the annotations section. To add this configuration, delete the # before cpu-policy.
    • Add the pod annotations in the template.metadata section.
  3. Run the following command to create the pod and Deployment:
    kubectl create -f cal-pi.yaml -f go-demo.yaml
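    You can then check the status of the pod and of the Deployment pods, for example:
    kubectl get pod cal-pi
    kubectl get pods -l app=go-demo -o wide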

Test the application performance

In this example, the following conditions apply:
  • The Kubernetes version of the professional Kubernetes cluster is 1.20.
  • The processor model used by the nodes in this test is Intel Xeon Platinum 8269CY (Cascade Lake) or AMD EPYC Rome 7H12.
  • Two cluster nodes are used in the test. One is used to perform stress tests. The other runs the workload and serves as the tested machine.
  1. Run the following command to add a label to the tested machine. The label value must match the nodeSelector that is specified in the NGINX Deployment. In this example, intel7 is used:
     kubectl label node 192.168.XX.XX policy=intel7
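     You can confirm that the label is added, for example:
     kubectl get node 192.168.XX.XX --show-labels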
  2. Deploy the NGINX service on the tested machine.
    1. Use the following YAML templates to create resources for the NGINX service:
      • service.yaml
        apiVersion: v1
        kind: Service
        metadata:
          name: nginx-service-nodeport
        spec:
          selector:
              app: nginx
          ports:
            - name: http
              port: 8000
              protocol: TCP
              targetPort: 80
              nodePort: 32257
          type: NodePort
      • configmap.yaml
        apiVersion: v1
        kind: ConfigMap
        metadata:
          name: nginx-configmap
        data:
          nginx_conf: |-
            user  nginx;
            worker_processes  4;
            error_log  /var/log/nginx/error.log warn;
            pid        /var/run/nginx.pid;
            events {
                worker_connections  65535;
            }
            http {
                include       /etc/nginx/mime.types;
                default_type  application/octet-stream;
                log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                              '$status $body_bytes_sent "$http_referer" '
                              '"$http_user_agent" "$http_x_forwarded_for"';
                access_log  /var/log/nginx/access.log  main;
                sendfile        on;
                #tcp_nopush     on;
                keepalive_timeout  65;
                #gzip  on;
                include /etc/nginx/conf.d/*.conf;
            }
      • nginx.yaml
        apiVersion: apps/v1
        kind: Deployment
        metadata:
          name: nginx-deployment
          labels:
            app: nginx
        spec:
          replicas: 2
          selector:
            matchLabels:
              app: nginx
          template:
            metadata:
              annotations:
                #cpuset-scheduler: "true"  # The annotation is commented out, so topology-aware CPU scheduling is disabled. This is the default behavior. 
              labels:
                app: nginx
            spec:
              nodeSelector:
                policy: intel7
              containers:
              - name: nginx
                image: nginx:latest
                ports:
                - containerPort: 80
                resources:
                  requests:
                    cpu: 4
                    memory: 8Gi
                  limits:
                    cpu: 4
                    memory: 8Gi
                volumeMounts:
                   - mountPath: /etc/nginx/nginx.conf
                     name: nginx
                     subPath: nginx.conf
              volumes:
                - name: nginx
                  configMap:
                    name: nginx-configmap
                    items:
                      - key: nginx_conf
                        path: nginx.conf
    2. Run the following command to create the resources that are provisioned for the NGINX service:
      kubectl create -f service.yaml -f configmap.yaml -f nginx.yaml
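      You can then check that the NGINX pods and the Service are ready before you start the stress tests, for example:
      kubectl get pods -l app=nginx -o wide
      kubectl get service nginx-service-nodeport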
  3. Log on to the node that is used to perform stress tests and run the following commands to download and install wrk.
    Note For more information about how to log on to a node of a Kubernetes cluster, see Connect to a Linux instance by using password authentication or Connect to a Windows instance by using password authentication.
    wget https://caishu-oss.oss-cn-beijing.aliyuncs.com/wrk?versionId=CAEQEBiBgMCGk565xxciIDdiNzg4NWIzMzZhZTQ1OTlhYzZhZjFhNmQ2MDNkMzA2 -O wrk
    chmod 755 wrk
    mv wrk /usr/local/bin
  4. Run the following command to perform stress tests and record the test data:
    taskset -c 32-45 wrk --timeout 2s -t 20 -c 100 -d 60s --latency http://<IP address of the tested machine>:32257

    Expected output:

    20 threads and 100 connections
      Thread Stats   Avg      Stdev     Max   +/- Stdev
        Latency   600.58us    3.07ms 117.51ms   99.74%
        Req/Sec    10.67k     2.38k   22.33k    67.79%
      Latency Distribution
         50%  462.00us
         75%  680.00us
         90%  738.00us
         99%    0.90ms
      12762127 requests in 1.00m, 10.10GB read
    Requests/sec: 212350.15
    Transfer/sec:    172.13MB
  5. Run the following command to delete the NGINX Deployment:
    kubectl delete deployment nginx-deployment

    Expected output:

    deployment "nginx" deleted
  6. Use the following YAML template to deploy an NGINX Deployment with topology-aware CPU scheduling enabled:
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: nginx-deployment
      labels:
        app: nginx
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: nginx
      template:
        metadata:
          annotations:
            cpuset-scheduler: "true"
          labels:
            app: nginx
        spec:
          nodeSelector:
            policy: intel7
          containers:
          - name: nginx
            image: nginx:latest
            ports:
            - containerPort: 80
            resources:
              requests:
                cpu: 4
                memory: 8Gi
              limits:
                cpu: 4
                memory: 8Gi
            volumeMounts:
               - mountPath: /etc/nginx/nginx.conf
                 name: nginx
                 subPath: nginx.conf
          volumes:
            - name: nginx
              configMap:
                name: nginx-configmap
                items:
                  - key: nginx_conf
                    path: nginx.conf
  7. Run the following command to perform stress tests and record the test data for comparison:
    taskset -c 32-45 wrk --timeout 2s -t 20 -c 100 -d 60s --latency http://<IP address of the tested machine>:32257

    Expected output:

    20 threads and 100 connections
      Thread Stats   Avg         Stdev     Max       +/- Stdev
        Latency     345.79us    1.02ms    82.21ms   99.93%
        Req/Sec     15.33k      2.53k     25.84k    71.53%
      Latency Distribution
         50%  327.00us
         75%  444.00us
         90%  479.00us
         99%  571.00us
      18337573 requests in 1.00m, 14.52GB read
    Requests/sec: 305119.06
    Transfer/sec:    247.34MB

    A comparison of the two tests shows that the QPS increases from about 212,350 to about 305,119 requests per second, which indicates that the performance of the NGINX service is improved by approximately 43% after topology-aware CPU scheduling is enabled.

Verify that the automatic CPU core binding policies improve performance

In this example, a CPU policy is configured for a workload that runs on a node with 64 CPU cores. After you configure automatic CPU core binding policies for an application with topology-aware CPU scheduling enabled, the CPU usage can be further improved by 7% to 8%.

  1. After the pod or Deployment is started, run the following command to query the pod:
    kubectl get pods | grep cal-pi

    Expected output:

    NAME                 READY     STATUS    RESTARTS   AGE
    cal-pi-d****         1/1       Running   1          9h
  2. Run the following command to query the log of the cal-pi-d**** application:
    kubectl logs cal-pi-d****

    Expected output:

    computing Pi with 3000 Threads...computed the first 20000 digets of pi in 620892 ms! 
    the first digets are: 3.14159264
    writing to pi.txt...
    finished!
  3. Enable topology-aware CPU scheduling for the pod or Deployment and configure an automatic CPU core binding policy as described in Enable topology-aware CPU scheduling. When you modify the configuration, delete the # before cpu-policy, as shown in the following example.
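    For example, after you delete the #, the annotations section of cal-pi.yaml looks as follows:
      annotations:
        cpuset-scheduler: 'true'
        cpu-policy: 'static-burst'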
  4. After the pod or Deployment is started, run the following command to query the pod:
    kubectl get pods | grep cal-pi

    Expected output:

    NAME                 READY     STATUS    RESTARTS   AGE
    cal-pi-e****         1/1       Running   1          9h
  5. Run the following command to query the log of the cal-pi-e**** application:
    kubectl logs cal-pi-e****

    Expected output:

    computing Pi with 3000 Threads...computed the first 20000 digets of pi in 571221 ms!
    the first digets are: 3.14159264
    writing to pi.txt...
    finished!

    Compare the log data with the log data in Step 2. The computation time decreases from 620892 ms to 571221 ms, which indicates that the performance of the pod is improved by approximately 8% after the CPU core binding policy is configured.