
Container Service for Kubernetes: Enable the CPU Burst performance optimization policy

Last Updated: Mar 26, 2026

CPU limits prevent containers from consuming more than their allocated share of compute time. But the Linux kernel enforces these limits across 100-millisecond scheduling cycles—meaning a container can exhaust its quota in one cycle even when its average usage over the past second looks fine. The result is CPU throttling: threads are suspended until the next cycle, which adds latency that doesn't appear in second-level metrics.

CPU Burst addresses this by letting containers accumulate unused CPU quota and spend it during short-lived demand spikes. ack-koordinator monitors throttling events and dynamically adjusts cgroup quota parameters—without changing the CPU limit in the pod spec. This reduces tail latency for latency-sensitive workloads without requiring you to raise CPU limits across the board.

Note

To get the most out of this topic, familiarize yourself with the CFS Scheduler and node CPU management policies.

How it works

CPU is a time-sharing resource. The kernel uses the Completely Fair Scheduler (CFS) to allocate CPU time in fixed scheduling cycles. The cycle length is controlled by cpu.cfs_period_us (typically 100 ms). The CPU time a container can use per cycle is controlled by cpu.cfs_quota_us. A container with a CPU limit of 4 gets 400 ms of CPU time per 100-ms cycle.
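The quota arithmetic above can be written out as a quick sanity check (illustrative only; the actual values live in the cgroup files `cpu.cfs_period_us` and `cpu.cfs_quota_us`):

```python
# Illustrative CFS quota arithmetic, not an ack-koordinator API.
CFS_PERIOD_US = 100_000  # cpu.cfs_period_us: a 100-ms scheduling cycle

def cfs_quota_us(cpu_limit: float) -> int:
    """CPU time (microseconds) a container may use per scheduling cycle."""
    return int(cpu_limit * CFS_PERIOD_US)

print(cfs_quota_us(4))    # a CPU limit of 4 -> 400000 us (400 ms) per cycle
print(cfs_quota_us(0.5))  # a limit of 500m  -> 50000 us (50 ms) per cycle
```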

The problem is that CPU usage is bursty at the millisecond level, even when second-level averages look low. The chart below illustrates this: the purple line shows per-second CPU usage staying well below 4 cores, while the green line shows millisecond-level usage spiking above 4 cores in some periods. When those spikes exhaust the CFS quota, the kernel throttles the container.

[Figure] CPU usage comparison: per-second vs. millisecond-level

Throttling forces threads to wait for the next scheduling cycle, which increases response time (RT) and causes long-tail latency. The table below shows what this looks like in practice for a web service container with a CPU limit of 2 on a 4-core node.

Without CPU Burst, even when overall CPU usage over the last second is low, throttling forces Thread 2 to wait for the next scheduling cycle before it can finish processing req 2. This increases request RT and is a common cause of long-tail latency.

After enabling CPU Burst, the container accumulates unused CPU time and spends it during bursts, which boosts performance and lowers latency.

When demand spikes beyond what accumulated quota can cover—such as a sudden traffic surge—ack-koordinator resolves the CPU bottleneck within seconds while keeping total node load within safe limits.
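The accumulate-and-spend behavior can be sketched as a toy simulation. All numbers here are hypothetical; the real mechanism is the kernel's burst credit (`cpu.cfs_burst_us`), not userspace bookkeeping:

```python
# Toy simulation of CPU Burst: unused quota accumulates (up to a cap)
# and can be spent during demand spikes. Numbers are illustrative.
QUOTA_US = 200_000      # per-cycle quota for a CPU limit of 2
BURST_CAP_US = 400_000  # maximum accumulated burst credit

def throttled_cycles(demands_us):
    credit = 0
    throttled = 0
    for demand in demands_us:
        available = QUOTA_US + credit
        if demand > available:
            throttled += 1                  # demand exceeds quota plus credit
            credit = 0
        else:
            # unused time this cycle adds to the credit, up to the cap
            credit = min(BURST_CAP_US, available - demand)
    return throttled

# Two light cycles build up credit, so the spike in cycle 3 is covered.
print(throttled_cycles([100_000, 100_000, 350_000]))  # -> 0
# The same spike with no prior idle time is throttled.
print(throttled_cycles([350_000]))                    # -> 1
```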

Note

ack-koordinator adjusts only the CFS quota parameter (cpu.cfs_quota_us) in the node-level cgroup. It does not change the CPU limit field in the pod spec.

Use cases

Throttling despite low average utilization. CPU usage stays below the CPU limit most of the time, but short bursts within scheduling cycles still trigger throttling. CPU Burst lets the container spend accumulated quota during those bursts, reducing throttling and improving service quality.

Slow startup due to CPU-intensive initialization. The container needs high CPU during startup and loading, then settles at a lower steady state. CPU Burst lets the container use extra CPU time during startup without requiring a permanently high CPU limit.

Billing

ack-koordinator is free to install and use. Extra charges may apply in these two cases:

  • Worker node resources: ack-koordinator is an unmanaged component. After installation, it runs on your worker nodes and consumes their resources. Configure resource requests for each module during installation.

  • Prometheus metrics: ack-koordinator exposes resource profiling and scheduling metrics in Prometheus format. If you enable the Enable Prometheus monitoring metrics for ACK-Koordinator option and use Alibaba Cloud Prometheus, these metrics count as custom metrics and incur charges. Review the Prometheus instance pricing before enabling, and use usage queries to monitor consumption.

Prerequisites

Before you begin, ensure that ack-koordinator is installed in your cluster.

Note

Use Alibaba Cloud Linux as the node operating system. Its kernel-level CPU Burst support provides finer-grained quota control. See Do I need to use Alibaba Cloud Linux to enable the CPU Burst policy?.

Enable CPU Burst

CPU Burst can be enabled at three scopes. Configuration at a narrower scope takes precedence: pod annotations override namespace-level ConfigMaps, which override the cluster-level ConfigMap.

Scope            Method                         Priority
Single pod       Pod annotation                 Highest
Namespace        ack-slo-pod-config ConfigMap   Middle
Entire cluster   ack-slo-config ConfigMap       Lowest
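The precedence rule can be sketched as a small helper (hypothetical code, not part of ack-koordinator; it also ignores the namespace enable/disable lists for simplicity):

```python
# Illustrative precedence: pod annotation > namespace ConfigMap > cluster ConfigMap.
def effective_policy(pod_annotation=None, namespace_policy=None, cluster_policy=None):
    # The first scope that defines a policy wins, from narrowest to widest.
    for policy in (pod_annotation, namespace_policy, cluster_policy):
        if policy is not None:
            return policy
    return "none"  # CPU Burst is disabled when nothing is configured

# A pod-level "none" overrides a cluster-level "auto".
print(effective_policy(pod_annotation="none", cluster_policy="auto"))  # -> none
print(effective_policy(namespace_policy="auto"))                       # -> auto
print(effective_policy())                                              # -> none
```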

Enable for a specific pod

Add the annotation under metadata in the pod YAML. For a Deployment or other workload, set the annotation in template.metadata instead.

annotations:
  # Enable CPU Burst for this pod.
  koordinator.sh/cpuBurst: '{"policy": "auto"}'
  # To disable CPU Burst for this pod instead, set the policy to none:
  # koordinator.sh/cpuBurst: '{"policy": "none"}'

Enable for an entire cluster

  1. Create configmap.yaml with the following content.

    apiVersion: v1
    data:
      cpu-burst-config: '{"clusterStrategy": {"policy": "auto"}}'
      #cpu-burst-config: '{"clusterStrategy": {"policy": "cpuBurstOnly"}}'
      #cpu-burst-config: '{"clusterStrategy": {"policy": "none"}}'
    kind: ConfigMap
    metadata:
      name: ack-slo-config
      namespace: kube-system
  2. Apply the ConfigMap. Use PATCH if ack-slo-config already exists in the kube-system namespace to avoid overwriting other settings.

    • If the ConfigMap exists:

      kubectl patch cm -n kube-system ack-slo-config --patch "$(cat configmap.yaml)"

    • If it does not exist:

      kubectl apply -f configmap.yaml

Enable for a specific namespace

  1. Create configmap.yaml with the following content.

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: ack-slo-pod-config
      namespace: koordinator-system # Create this namespace manually before first use.
    data:
      # Enable or disable CPU Burst for selected namespaces.
      cpu-burst: |
        {
          "enabledNamespaces": ["allowed-ns"],
          "disabledNamespaces": ["blocked-ns"]
        }
      # Enables CPU Burst for all pods in the allowed-ns namespace. Policy is auto.
      # Disables CPU Burst for all pods in the blocked-ns namespace. Policy is none.
  2. Apply the ConfigMap. Use PATCH if ack-slo-pod-config already exists in the koordinator-system namespace to avoid overwriting other settings.

    • If the ConfigMap exists:

      kubectl patch cm -n koordinator-system ack-slo-pod-config --patch "$(cat configmap.yaml)"

    • If it does not exist:

      kubectl apply -f configmap.yaml

Verify with a load test

This example uses an Apache HTTP Server pod to demonstrate that CPU Burst reduces access latency.

  1. Create apache-demo.yaml with the following content. The annotation enables CPU Burst for this pod.

    apiVersion: v1
    kind: Pod
    metadata:
      name: apache-demo
      annotations:
        koordinator.sh/cpuBurst: '{"policy": "auto"}'   # Enable CPU Burst.
    spec:
      containers:
      - command:
        - httpd
        - -D
        - FOREGROUND
        image: registry.cn-zhangjiakou.aliyuncs.com/acs/apache-2-4-51-for-slo-test:v0.1
        imagePullPolicy: Always
        name: apache
        resources:
          limits:
            cpu: "4"
            memory: 10Gi
          requests:
            cpu: "4"
            memory: 10Gi
      nodeName: $nodeName # Replace with the actual node name.
      hostNetwork: False
      restartPolicy: Never
      schedulerName: default-scheduler
  2. Deploy the Apache HTTP Server.

    kubectl apply -f apache-demo.yaml
  3. Run a load test using wrk2. Replace $target_ip_address with the IP address of the Apache pod.

    # Download and extract the open-source wrk2 tool. See https://github.com/giltene/wrk2.
    # The Apache image has Gzip compression enabled to simulate server-side request processing.
    # Adjust QPS pressure by changing the -R parameter.
    ./wrk -H "Accept-Encoding: deflate, gzip" -t 2 -c 12 -d 120 --latency --timeout 2s -R 24 http://$target_ip_address:8010/static/file.1m.test

Results

The tables below compare p99 response time, CPU throttling ratio, and average pod CPU utilization—with and without CPU Burst—on both Alibaba Cloud Linux and CentOS.

  • All disabled: CPU Burst policy set to none.

  • All enabled: CPU Burst policy set to auto.

Important

The values below are theoretical. Actual results depend on your environment.

Alibaba Cloud Linux           All disabled   All enabled
apache RT-p99                 107.37 ms      67.18 ms (-37.4%)
CPU Throttled Ratio           33.3%          0%
Average Pod CPU utilization   31.8%          32.6%

CentOS                        All disabled   All enabled
apache RT-p99                 111.69 ms      71.30 ms (-36.2%)
CPU Throttled Ratio           33%            0%
Average Pod CPU utilization   32.5%          33.8%

Enabling CPU Burst reduces p99 response time by roughly 37% and brings the CPU throttling ratio to 0%, with negligible change in average CPU utilization.

Advanced configuration

Configure advanced parameters in either a pod annotation or a ConfigMap. Pod annotations take precedence, followed by namespace-level ConfigMaps, then the cluster-level ConfigMap.

# Example ConfigMap ack-slo-config.
data:
  cpu-burst-config: |
    {
      "clusterStrategy": {
        "policy": "auto",
        "cpuBurstPercent": 1000,
        "cfsQuotaBurstPercent": 300,
        "sharePoolThresholdPercent": 50,
        "cfsQuotaBurstPeriodSeconds": -1
      }
    }

# Example pod annotation.
  koordinator.sh/cpuBurst: '{"policy": "auto", "cpuBurstPercent": 1000, "cfsQuotaBurstPercent": 300, "cfsQuotaBurstPeriodSeconds": -1}'
Note

For each parameter below, Annotation and ConfigMap indicate whether the parameter can be configured via pod annotation or via ConfigMap.

policy (string). Annotation: supported. ConfigMap: supported.
  • none (default): Disables CPU Burst. All related parameters reset to their initial values.
  • cpuBurstOnly: Enables only the Alibaba Cloud Linux kernel-level CPU Burst elasticity.
  • cfsQuotaBurstOnly: Enables only CFS quota elasticity. Works with all kernel versions.
  • auto: Automatically enables both elasticities: the Alibaba Cloud Linux kernel-level feature and CFS quota elasticity.

cpuBurstPercent (int). Default: 1000. Unit: percent. Annotation: supported. ConfigMap: supported.
  For Alibaba Cloud Linux kernel-level CPU Burst elasticity, sets how far CPU Burst can amplify beyond the CPU limit. Maps to the cgroup parameter cpu.cfs_burst_us. For details, see Enable CPU Burst using the cgroup v1 interface (https://www.alibabacloud.com/help/en/document_detail/306980.html#task-2108025). For example, with the default setting, CPU Limit = 1 sets cpu.cfs_quota_us to 100,000, and cpu.cfs_burst_us becomes 1,000,000 (a 10x increase).

cfsQuotaBurstPercent (int). Default: 300 (3x). Unit: percent. Annotation: supported. ConfigMap: supported.
  When CFS quota elasticity is enabled, sets the maximum allowed increase of the cgroup parameter cpu.cfs_quota_us.

cfsQuotaBurstPeriodSeconds (int). Default: -1 (unlimited). Unit: seconds. Annotation: supported. ConfigMap: supported.
  When CFS quota elasticity is enabled, sets how long a pod may consume CPU at the increased quota (cfsQuotaBurstPercent). After this period, the pod's cpu.cfs_quota_us resets to its original value. Other pods are unaffected.

sharePoolThresholdPercent (int). Default: 50. Unit: percent. Annotation: not supported. ConfigMap: supported.
  When CFS quota elasticity is enabled, sets the safe CPU usage threshold for the node. If node usage exceeds this threshold, all pods with an increased cpu.cfs_quota_us reset to their original values.
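The mapping from these percentages to cgroup values can be written out as illustrative arithmetic (a sketch of the defaults described above, not ack-koordinator code):

```python
# Illustrative mapping from the advanced parameters to cgroup values.
CFS_PERIOD_US = 100_000  # cpu.cfs_period_us: 100-ms scheduling cycle

def cgroup_values(cpu_limit, cpu_burst_percent=1000, cfs_quota_burst_percent=300):
    quota_us = int(cpu_limit * CFS_PERIOD_US)
    return {
        "cpu.cfs_quota_us": quota_us,
        # kernel-level burst ceiling (Alibaba Cloud Linux only)
        "cpu.cfs_burst_us": quota_us * cpu_burst_percent // 100,
        # highest value the CFS quota may be raised to
        "max_cfs_quota_us": quota_us * cfs_quota_burst_percent // 100,
    }

# With the defaults, CPU Limit = 1 gives quota 100000, burst 1000000 (10x),
# and a maximum raised quota of 300000 (3x).
print(cgroup_values(1))
```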

Note
  • When policy is set to cfsQuotaBurstOnly or auto, cpu.cfs_quota_us changes dynamically based on throttling events.

  • During stress testing, monitor pod CPU usage or set policy to cpuBurstOnly or none to disable automatic CFS quota adjustment. This keeps elasticity stable in production.

FAQ

I used CPU Burst with the older ack-slo-manager protocol. Does it still work after upgrading to ack-koordinator?

Yes. The older annotation key was alibabacloud.com/cpuBurst. ack-koordinator fully supports this legacy protocol, so upgrades are seamless.

Note

ack-koordinator's compatibility period for the earlier protocol ended on July 30, 2023. Migrate any configuration that still uses the earlier protocol to the latest version.

ack-koordinator is compatible with the following protocol versions.

ack-koordinator version   alibabacloud.com protocol   koordinator.sh protocol
≥0.2.0                    Supported                   Not supported
≥0.8.0                    Supported                   Supported

Why does CPU throttling still occur after enabling CPU Burst?

The most common cause is a configuration syntax error that prevents the policy from taking effect. Check your annotation or ConfigMap against the examples in Advanced configuration.

If the configuration is valid, throttling can still occur when CPU demand hits the cfsQuotaBurstPercent ceiling. Adjust your CPU request and limit values to better match actual workload needs.

Two other factors to be aware of:

  • cpu.cfs_quota_us is updated only after ack-koordinator detects a throttling event, so there is a small delay. cpu.cfs_burst_us, by contrast, is applied immediately and responds faster. For best results, use Alibaba Cloud Linux.

  • When node-wide CPU utilization exceeds sharePoolThresholdPercent, ack-koordinator resets cpu.cfs_quota_us for all pods to prevent any single pod from causing interference. Set sharePoolThresholdPercent to a value that reflects your actual node load patterns.
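The node-level safety check described in the second bullet can be sketched as follows (hypothetical pseudologic; the real reconciliation loop runs inside ack-koordinator):

```python
# Illustrative node-level safety check: when node CPU usage exceeds the
# sharePoolThresholdPercent threshold, burst-raised quotas are reset.
def reconcile(node_cpu_usage_percent, pod_quotas, original_quotas,
              threshold_percent=50):
    if node_cpu_usage_percent > threshold_percent:
        return dict(original_quotas)  # reset every raised cpu.cfs_quota_us
    return pod_quotas                 # below the threshold: keep bursts in place

raised   = {"pod-a": 300_000, "pod-b": 150_000}  # pod-a's quota was raised 3x
original = {"pod-a": 100_000, "pod-b": 150_000}
print(reconcile(70, raised, original))  # over the threshold -> quotas reset
print(reconcile(30, raised, original))  # under the threshold -> bursts kept
```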

Do I need to use Alibaba Cloud Linux to enable the CPU Burst policy?

CPU Burst works on all Alibaba Cloud Linux and CentOS kernels. Alibaba Cloud Linux is recommended: its kernel-level cpu.cfs_burst_us support lets ack-koordinator provide finer-grained CPU elasticity than CFS quota adjustment alone. For details, see Enable CPU Burst using the cgroup v1 interface.

After enabling CPU Burst, why does my application report different thread counts?

ack-koordinator dynamically adjusts cpu.cfs_quota_us—the container's CPU time quota for each scheduling cycle—in response to load changes. Many applications, including those using Java's Runtime.getRuntime().availableProcessors(), read cpu.cfs_quota_us to calculate available CPU cores. When the quota changes, the reported core count changes too, causing thread pool sizes and other quota-dependent parameters to fluctuate.

Fix this by making your application read from the fixed limits.cpu value in the pod spec instead.

  1. Inject limits.cpu as an environment variable using resourceFieldRef.

    env:
      - name: CPU_LIMIT
        valueFrom:
          resourceFieldRef:
            resource: limits.cpu
  2. Update your application startup logic to read CPU_LIMIT when calculating and setting thread pool size. This keeps thread counts stable regardless of how ack-koordinator adjusts the CFS quota.
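In application code, the stable value can then be read from the environment instead of the cgroup quota. A minimal sketch, assuming the CPU_LIMIT variable injected above (resourceFieldRef rounds limits.cpu up to a whole number of cores, so the value parses as an integer):

```python
import os

def stable_cpu_count(default=1):
    """Read the pod's fixed CPU limit from the CPU_LIMIT environment variable."""
    try:
        return int(os.environ["CPU_LIMIT"])
    except (KeyError, ValueError):
        return default  # fall back when the variable is missing or malformed

# Size the thread pool from the fixed limit, not from the dynamic CFS quota.
os.environ["CPU_LIMIT"] = "4"
print(stable_cpu_count())  # -> 4
```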