ack-koordinator can dynamically overcommit resources. It monitors the loads of a node in real time and then schedules resources that are allocated to pods but are not in use. This topic describes how to use the dynamic resource overcommitment feature.
Prerequisites
- Only Container Service for Kubernetes (ACK) Pro clusters support the dynamic resource overcommitment feature. For more information, see Create an ACK Pro cluster.
- ack-koordinator (formerly known as ack-slo-manager) is installed. For more information, see ack-koordinator.
Background information
In Kubernetes, the kubelet manages the resources that are used by the pods on a node based on the quality of service (QoS) classes of the pods. For example, the kubelet controls the out of memory (OOM) priorities. The QoS class of a pod can be Guaranteed, Burstable, or BestEffort. The QoS classes of pods depend on the requests and limits of CPU and memory resources that are configured for the pods.
- BestEffort pods do not have resource requests or limits. As a result, even if a node is overloaded, the system can still schedule BestEffort pods to the node.
- Resources cannot be fairly scheduled among BestEffort pods because the pods lack the requests and limits that specify how many resources each pod can use.

You can use the Service Level Objective (SLO) capability of ACK to control the resources that are used by BestEffort pods. The SLO capability classifies the resources of a node into three categories: Usage, Buffered, and Reclaimed. Usage refers to the actual resource usage. Buffered refers to resources that are reserved as headroom above the actual usage. Reclaimed refers to allocated but unused resources that can be reclaimed.
Reclaimed resources are resources that can be dynamically overcommitted. ack-koordinator monitors the loads of a node and synchronizes resource statistics to the node metadata as extended resources in real time. To allow BestEffort pods to use reclaimed resources, configure requests and limits of reclaimed resources for the BestEffort pods. In addition, you can configure settings that are related to reclaimed resources in the node configuration. This ensures that resources are fairly scheduled among BestEffort pods.
To differentiate reclaimed resources from regular resources, ack-koordinator assigns the Batch priority to reclaimed resources, including batch-cpu and batch-memory.

Limits
Component | Required version |
---|---|
Kubernetes | ≥ 1.18 |
ack-koordinator | ≥ 0.8.0 |
Helm | ≥ 3.0 |
Procedure
- Run the following command to query the total amount of Batch resources on a node. Make sure that the relevant parameters are configured before you query the total amount of reclaimed resources. For more information, see the description in Step 3.

```shell
# Replace $nodeName with the name of the node that you want to query.
kubectl get node $nodeName -o yaml
```

Expected output:

```yaml
# Node status.
status:
  allocatable:
    # Unit: millicores. In the following example, 50 cores can be allocated.
    kubernetes.io/batch-cpu: 50000
    # Unit: bytes. In the following example, 50 GB of memory can be allocated.
    kubernetes.io/batch-memory: 53687091200
```
- Create a pod and apply for reclaimed resources. Add a label to the pod to specify the QoS class of the pod and specify the Batch resource request and Batch resource limit. This way, the pod can use reclaimed resources.

```yaml
# Pod metadata.
metadata:
  labels:
    # Required. Set the QoS class of the pod to BestEffort.
    koordinator.sh/qosClass: "BE"
spec:
  containers:
  - resources:
      requests:
        # Unit: millicores. In the following example, the CPU request is set to one core.
        kubernetes.io/batch-cpu: "1k"
        # Unit: bytes. In the following example, the memory request is set to 1 GB.
        kubernetes.io/batch-memory: "1Gi"
      limits:
        kubernetes.io/batch-cpu: "1k"
        kubernetes.io/batch-memory: "1Gi"
```

When you apply for Batch resources, take note of the following items:
- If you provision a pod by using a Deployment or other types of workloads, you need only to modify the YAML template based on the format in the preceding code block. A pod cannot apply for reclaimed resources and regular resources at the same time.
- The amount of reclaimed resources on a node is calculated based on the loads of the node in real time. If the kubelet fails to synchronize the most recent statistics about reclaimed resources to the node metadata, the kubelet may reject the request for reclaimed resources. If the request is rejected, you can delete the pod that sends the request.
- You must set the amount of extended resources to an integer in Kubernetes clusters. The unit of batch-cpu resources is millicores.
- Manage resources that are dynamically overcommitted. The amount of Batch resources on a node is calculated based on the actual resource utilization. You can use the following formula to calculate the amount of Batch CPU resources and the amount of Batch memory resources:
nodeBatchAllocatable = nodeAllocatable * thresholdPercent - podUsage(non-BE) - systemUsage
The following section describes the factors in the formula:
- nodeAllocatable: the amount of allocatable resources on the node.
- thresholdPercent: the resource threshold. Unit: %.
- podUsage(non-BE): the resource usage of pods whose QoS classes are Burstable or Guaranteed.
- systemUsage: the usage of system resources on the node.
ack-koordinator can also calculate the amount of reclaimed memory resources based on the resource requests of pods by using the following formula. For more information, see the memoryCalculatePolicy parameter in the following section. In the formula, podRequest(non-BE) refers to the resource requests of pods whose QoS classes are Burstable or Guaranteed.

nodeBatchAllocatable = nodeAllocatable * thresholdPercent - podRequest(non-BE) - systemUsage
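The Batch CPU calculation can be sketched with illustrative numbers. All values below are assumptions for the sake of the example, not data from a real node:

```shell
# A 100-core node (100000 millicores), a 65% CPU reclaim threshold,
# 30 cores used by Guaranteed/Burstable pods, and 5 cores used by
# system daemons.
NODE_ALLOCATABLE=100000      # millicores
THRESHOLD_PERCENT=65
NON_BE_USAGE=30000           # millicores
SYSTEM_USAGE=5000            # millicores

# nodeBatchAllocatable = nodeAllocatable * thresholdPercent - podUsage(non-BE) - systemUsage
NODE_BATCH_ALLOCATABLE=$((NODE_ALLOCATABLE * THRESHOLD_PERCENT / 100 - NON_BE_USAGE - SYSTEM_USAGE))

# 30000 millicores (30 cores), reported as kubernetes.io/batch-cpu on the node.
echo "$NODE_BATCH_ALLOCATABLE"
```

Because the non-BE usage and system usage change over time, ack-koordinator recomputes this value periodically and updates the node metadata.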
The thresholdPercent factor is configurable. The following code block shows how to manage resources by modifying a ConfigMap:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: ack-slo-config
  namespace: kube-system
data:
  colocation-config: |
    {
      "enable": true,
      "metricAggregateDurationSeconds": 60,
      "cpuReclaimThresholdPercent": 60,
      "memoryReclaimThresholdPercent": 70,
      "memoryCalculatePolicy": "usage"
    }
```
Parameter | Data type | Description |
---|---|---|
enable | Boolean | Specifies whether to dynamically update the statistics about Batch resources. If you disable this feature, the amount of reclaimed resources is reset to 0. Default value: false. |
metricAggregateDurationSeconds | Int | The minimum interval at which the statistics about Batch resources are updated. Unit: seconds. Default value: 60. We recommend that you use the default setting. |
cpuReclaimThresholdPercent | Int | The reclaim threshold of batch-cpu resources. Unit: %. Default value: 65. |
memoryReclaimThresholdPercent | Int | The reclaim threshold of batch-memory resources. Unit: %. Default value: 65. |
memoryCalculatePolicy | String | The policy for calculating the amount of batch-memory resources. Valid values: "usage": The amount of batch-memory resources is calculated based on the actual memory usage of pods whose QoS classes are Burstable or Guaranteed. In this case, the batch-memory resources include resources that are not allocated and resources that are allocated but are not in use. This is the default value. "request": The amount of batch-memory resources is calculated based on the memory requests of pods whose QoS classes are Burstable or Guaranteed. In this case, the batch-memory resources include only resources that are not allocated. |
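The difference between the two memoryCalculatePolicy values can be sketched with illustrative numbers. All values below are assumptions for the sake of the example:

```shell
# A node with 100 GiB of allocatable memory, a 65% memory reclaim threshold,
# non-BE pods that request 40 GiB but actually use only 25 GiB, and 5 GiB of
# system usage.
NODE_ALLOCATABLE_GIB=100
THRESHOLD_PERCENT=65
NON_BE_USAGE_GIB=25
NON_BE_REQUEST_GIB=40
SYSTEM_USAGE_GIB=5

# "usage" policy: subtract the actual usage of non-BE pods.
USAGE_POLICY_GIB=$((NODE_ALLOCATABLE_GIB * THRESHOLD_PERCENT / 100 - NON_BE_USAGE_GIB - SYSTEM_USAGE_GIB))
# "request" policy: subtract the requests of non-BE pods.
REQUEST_POLICY_GIB=$((NODE_ALLOCATABLE_GIB * THRESHOLD_PERCENT / 100 - NON_BE_REQUEST_GIB - SYSTEM_USAGE_GIB))

echo "usage policy:   $USAGE_POLICY_GIB GiB"
echo "request policy: $REQUEST_POLICY_GIB GiB"
```

The "usage" policy yields more batch-memory (35 GiB versus 20 GiB in this example) because it also reclaims resources that are requested but not in use, at the cost of a higher risk of memory pressure if the non-BE pods later consume their full requests.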
Note: ack-koordinator provides features that are used to limit the resource usage of BestEffort pods and evict BestEffort pods. You can use these features to eliminate the negative impact of BestEffort pods on your business. For more information, see Elastic resource limit, Memory QoS for containers, and Resource isolation based on the L3 cache and MBA.
- Check whether the ack-slo-config ConfigMap exists in the kube-system namespace.
  - If the ack-slo-config ConfigMap exists, we recommend that you run the kubectl patch command to update the ConfigMap. This avoids changing other settings in the ConfigMap.

    ```shell
    kubectl patch cm -n kube-system ack-slo-config --patch "$(cat configmap.yaml)"
    ```

  - If the ack-slo-config ConfigMap does not exist, run the following command to create a ConfigMap named ack-slo-config:

    ```shell
    kubectl apply -f configmap.yaml
    ```
- Optional. View the usage of Batch resources in Prometheus.
If this is the first time you use Prometheus dashboards, reset the dashboards and install the Dynamic Resource Overcommitment dashboard. For more information about how to reset Prometheus dashboards, see Reset dashboards.
To view details about the usage of Batch resources on the Prometheus Monitoring page of the ACK console, perform the following steps:
- Log on to the ACK console.
- In the left-side navigation pane of the ACK console, click Clusters.
- On the Clusters page, find the cluster that you want to manage and click its name or click Details in the Actions column.
- In the left-side navigation pane of the cluster details page, choose Operations > Prometheus Monitoring.
- On the Prometheus Monitoring page, click the Dynamic Resource Overcommitment tab.
On the Dynamic Resource Overcommitment tab, you can view details about the Batch resources. The details include the total amount of Batch resources provided by each node, the total amount of Batch resources provided by the cluster, the amount of Batch resources requested by the containers on each node, and the total amount of Batch resources requested by the containers in the cluster. For more information, see Enable ARMS Prometheus.
```
# The amount of allocatable batch-cpu resources on the node.
koordlet_node_resource_allocatable{resource="kubernetes.io/batch-cpu",node="$node"}
# The amount of batch-cpu resources that are allocated on the node.
koordlet_container_resource_requests{resource="kubernetes.io/batch-cpu",node="$node"}
# The amount of allocatable batch-memory resources on the node.
kube_node_status_allocatable{resource="kubernetes.io/batch-memory",node="$node"}
# The amount of batch-memory resources that are allocated on the node.
koordlet_container_resource_requests{resource="kubernetes.io/batch-memory",node="$node"}
```
Examples
- Run the following command to query the total amount of reclaimed resources on the node. Make sure that the relevant parameters are configured before you query the total amount of reclaimed resources. For more information, see the description in Step 3.
```shell
kubectl get node $nodeName -o yaml
```
Expected output:
```yaml
# The node metadata.
status:
  allocatable:
    # Unit: millicores. In the following example, 50 cores can be allocated.
    kubernetes.io/batch-cpu: 50000
    # Unit: bytes. In the following example, 50 GB of memory can be allocated.
    kubernetes.io/batch-memory: 53687091200
```
- Create a YAML file named be-pod-demo.yaml based on the following content:
```yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    koordinator.sh/qosClass: "BE"
  name: be-demo
spec:
  containers:
  - command:
    - "sleep"
    - "100h"
    image: polinux/stress
    imagePullPolicy: Always
    name: be-demo
    resources:
      limits:
        kubernetes.io/batch-cpu: "50k"
        kubernetes.io/batch-memory: "10Gi"
      requests:
        kubernetes.io/batch-cpu: "50k"
        kubernetes.io/batch-memory: "10Gi"
  schedulerName: default-scheduler
```
- Run the following command to deploy be-pod-demo:
```shell
kubectl apply -f be-pod-demo.yaml
```
- Check whether the resource limits of the BestEffort pod take effect in the cgroup of the node.
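As a sketch, the expected cgroup values can be derived from the pod's Batch requests. The cgroup file paths mentioned below are assumptions for a cgroup v1 node; the exact paths depend on the kubelet's cgroup driver and the pod UID:

```shell
# batch-cpu "50k" equals 50000 millicores. With the default 100000-us CFS
# period, the expected cpu.cfs_quota_us value is:
CPU_MILLICORES=50000
CFS_PERIOD_US=100000
CPU_QUOTA_US=$((CPU_MILLICORES * CFS_PERIOD_US / 1000))
echo "expected cpu.cfs_quota_us: $CPU_QUOTA_US"

# batch-memory "10Gi" equals 10 GiB. The expected memory.limit_in_bytes value is:
MEM_LIMIT_BYTES=$((10 * 1024 * 1024 * 1024))
echo "expected memory.limit_in_bytes: $MEM_LIMIT_BYTES"

# Compare the values above with the files in the pod's cgroup directory on the
# node, for example (hypothetical cgroup v1 paths; replace <pod-uid>):
#   cat /sys/fs/cgroup/cpu/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod<pod-uid>.slice/cpu.cfs_quota_us
#   cat /sys/fs/cgroup/memory/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod<pod-uid>.slice/memory.limit_in_bytes
```

If the computed values match the files on the node, the Batch limits of the BestEffort pod have taken effect.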
FAQ
Is the resource overcommitment feature that is enabled based on the earlier version of the ack-slo-manager protocol supported after I upgrade from ack-slo-manager to ack-koordinator?
The earlier version of the ack-slo-manager protocol includes the following components:
- The alibabacloud.com/qosClass pod annotation.
- The alibabacloud.com/reclaimed field that is used to specify the resource requests and limits of pods.
ack-koordinator is compatible with the earlier version of the ack-slo-manager protocol. The ACK Pro scheduler can calculate the amount of requested resources and the amount of available resources for both the earlier protocol version and the new protocol version. You can seamlessly upgrade from ack-slo-manager to ack-koordinator.
The following table describes the compatibility between the ACK Pro scheduler, ack-koordinator, and different types of protocols.
ACK scheduler version | ack-koordinator (ack-slo-manager) | alibabacloud.com protocol | koordinator.sh protocol |
---|---|---|---|
≥1.18 and < 1.22.15-ack-2.0 | ≥ 0.3.0 | Supported | Not supported |
≥ 1.22.15-ack-2.0 | ≥ 0.8.0 | Supported | Supported |