The scheduler in Container Service for Kubernetes (ACK) Pro clusters supports the load-aware pod scheduling feature. This feature can monitor the loads on nodes and schedule pods to nodes with lower loads to implement load balancing. This prevents application or node crashes caused by an overloaded node. This topic describes how to use load-aware pod scheduling.
Prerequisites
- ack-koordinator 1.1.1-ack.1 or later is installed. For more information, see ack-koordinator(ack-slo-manager).
- Helm 3.0 or later is used. For more information about how to update the Helm version, see [Component Updates] Update Helm V2 to V3 and How do I manually update Helm?.
- The following table describes the Kube Scheduler versions that are required for different Kubernetes versions.

Kubernetes version | Kube Scheduler version |
---|---|
1.24 | ≥ 1.24.6-ack-4.0 |
1.22 | ≥ 1.22.15-ack-4.0 |
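Before you proceed, you can verify that the prerequisites are met. The following sketch assumes that ack-koordinator is managed as a Helm release in the kube-system namespace; depending on how the component was installed, you may need to check its version on the Add-ons page of the console instead.

```shell
# Confirm that Helm 3.0 or later is installed.
helm version --short

# Hypothetical check: list the ack-koordinator release to confirm that its
# chart version is 1.1.1-ack.1 or later. Assumes a Helm-managed installation
# in the kube-system namespace.
helm list -n kube-system | grep ack-koordinator
```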
Limits
Only ACK Pro clusters support load-aware pod scheduling. For more information about how to create an ACK Pro cluster, see Create an ACK Pro cluster.
Introduction to load-aware pod scheduling
The load-aware pod scheduling feature of the Kube Scheduler component provided by ACK is designed based on the Kubernetes scheduling framework. The Kubernetes scheduler schedules pods to nodes based on resource allocation. Kube Scheduler schedules pods to nodes based on the loads on nodes. After load-aware pod scheduling is enabled, the system reviews the historical statistics of loads on nodes. Then, the system schedules pods to nodes with lower loads to implement load balancing. This prevents application or node crashes caused by an overloaded node.
The following figure compares the Kubernetes scheduler and Kube Scheduler. Requested indicates the resources that are requested by pods on the node and Usage indicates the resources that are in use by pods on the node. Only resources in use are accounted for when the system calculates the loads on the node. In the scenario in the figure, Kube Scheduler schedules new pods to Node B because Node B has lower loads.
As time passes and the cluster environment, traffic, or requests to workloads change, the load distribution among nodes may become imbalanced. To prevent this issue, ack-koordinator provides the load-aware hotspot descheduling feature. You can use load-aware scheduling and hotspot descheduling in combination to achieve optimal load balancing among nodes. For more information about load-aware hotspot descheduling, see Work with load-aware hotspot descheduling.
How load-aware scheduling is implemented
Load-aware scheduling is implemented by using Kube Scheduler and ack-koordinator. ack-koordinator is responsible for collecting and reporting metrics about node resource utilization. Kube Scheduler is responsible for calculating the scores of nodes based on resource utilization and sorting nodes based on node scores. Kube Scheduler preferentially schedules new pods to nodes with lower loads. For more information about the architecture of ack-koordinator, see ack-koordinator architecture.
Scheduling policies
Policy | Description |
---|---|
Node sorting | The load-aware scheduling plug-in calculates node scores based on CPU utilization and memory utilization. The scheduler uses weighted scoring and preferentially schedules pods to nodes with higher scores. You can customize the CPU weight and memory weight. For more information, see Kube Scheduler parameters. The node score is calculated based on the following formula: [(1 - CPU utilization) × CPU weight + (1 - Memory utilization) × Memory weight] / (CPU weight + Memory weight). CPU utilization and memory utilization are measured in percentages. |
Resource utilization calculation | You can configure how node resource utilization is aggregated, such as the average value or a percentile of the statistics. By default, the average resource utilization within the last 5 minutes is used. For more information, see Kube Scheduler parameters. |
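As an illustration of the scoring formula above, the following sketch computes the score for a node with 20% CPU utilization and 40% memory utilization under the default weights (cpu=1, memory=1). The utilization numbers are hypothetical.

```shell
# Score = [(1 - CPU util) x CPU weight + (1 - Mem util) x Mem weight]
#         / (CPU weight + Mem weight)
# With 20% CPU utilization, 40% memory utilization, and default weights:
awk -v c=0.20 -v m=0.40 -v cw=1 -v mw=1 \
  'BEGIN { printf "%.2f\n", ((1 - c) * cw + (1 - m) * mw) / (cw + mw) }'
# Prints 0.70. A node with heavier loads yields a lower score, so the
# scheduler prefers this node.
```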
Step 1: Enable load-aware scheduling
- Log on to the ACK console and click Clusters in the left-side navigation pane.
- On the Clusters page, click the name of a cluster. In the left-side navigation pane, choose Operations > Add-ons.
- On the Add-ons page, find Kube Scheduler and click Configuration in the Kube Scheduler card.
- In the Kube Scheduler Parameters dialog box, select Enable Load-aware Scheduling, set the parameters in the following table, and then click OK. For more information about other parameters, see Customize the scheduler parameters.
Parameter | Data type | Description | Value | Example |
---|---|---|---|---|
loadAwareResourceWeight | The value of this parameter consists of the resourceName and resourceWeight fields. | The weights of the resources. | Valid values of resourceName: cpu and memory. Valid values of resourceWeight: 1 to 100. Default value: cpu=1, memory=1. | resourceName: cpu, resourceWeight: 1 |
loadAwareAggregatedUsageAggragationType | The value is an enumerated value. | The type of aggregation that is used to calculate node utilization statistics. | Valid values: avg (the average value), p50 (the 50th percentile), and p90, p95, and p99 (the 90th, 95th, and 99th percentiles). Default value: avg. | p90 |

In the left-side navigation pane of the cluster details page, click Cluster Information. On the Basic Information tab, if the cluster changes to the Running state, load-aware scheduling is enabled.
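For reference, the two parameters above might be expressed as follows. This is a hypothetical sketch of the values, not necessarily the exact format that the console dialog expects.

```yaml
# Hypothetical illustration of the load-aware scheduling parameters:
# weight CPU and memory equally, and aggregate utilization at the
# 90th percentile instead of the default average.
loadAwareResourceWeight:
  - resourceName: cpu
    resourceWeight: 1
  - resourceName: memory
    resourceWeight: 1
loadAwareAggregatedUsageAggragationType: p90
```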
Step 2: Verify load-aware scheduling
In the following example, a cluster that contains three nodes is used. Each node has 4 vCores and 16 GiB of memory.
- Create a file named stress-demo.yaml and copy the following content to the file:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: stress-demo
  namespace: default
  labels:
    app: stress-demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: stress-demo
  template:
    metadata:
      name: stress-demo
      labels:
        app: stress-demo
    spec:
      containers:
        - args:
            - '--vm'
            - '2'
            - '--vm-bytes'
            - '1600M'
            - '-c'
            - '2'
            - '--vm-hang'
            - '2'
          command:
            - stress
          image: polinux/stress
          imagePullPolicy: Always
          name: stress
          resources:
            limits:
              cpu: '2'
              memory: 4Gi
            requests:
              cpu: '2'
              memory: 4Gi
      restartPolicy: Always
```
- Run the following command to create the pod. The stress pod increases the loads on the node to which it is scheduled.

```shell
kubectl create -f stress-demo.yaml
```

Expected output:

```
deployment.apps/stress-demo created
```
- Run the following command to check whether the pod is in the Running state:

```shell
kubectl get pod -o wide
```

Expected output:

```
NAME                           READY   STATUS    RESTARTS   AGE   IP             NODE                      NOMINATED NODE   READINESS GATES
stress-demo-7fdd89cc6b-g****   1/1     Running   0          82s   10.XX.XX.112   cn-beijing.10.XX.XX.112   <none>           <none>
```

The stress-demo-7fdd89cc6b-g**** pod is scheduled to the cn-beijing.10.XX.XX.112 node. Wait 3 minutes. Make sure that the pod is initialized and the loads on the node are increased.
- Run the following command to query the loads on each node:
```shell
kubectl top node
```

Expected output:

```
NAME                      CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
cn-beijing.10.XX.XX.110   92m          2%     1158Mi          9%
cn-beijing.10.XX.XX.111   77m          1%     1162Mi          9%
cn-beijing.10.XX.XX.112   2105m        53%    3594Mi          28%
```

The cn-beijing.10.XX.XX.111 node has the lowest loads and the cn-beijing.10.XX.XX.112 node has the highest loads among all nodes. This indicates that the loads among nodes are imbalanced.
- Create a file named nginx-with-loadaware.yaml and copy the following content to the file:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-with-loadaware
  namespace: default
  labels:
    app: nginx
spec:
  replicas: 6
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      name: nginx
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx
          resources:
            limits:
              cpu: 500m
            requests:
              cpu: 500m
```
- Run the following command to create the pods:

```shell
kubectl create -f nginx-with-loadaware.yaml
```

Expected output:

```
deployment.apps/nginx-with-loadaware created
```
- Run the following command to query information about the pods:

```shell
kubectl get pods -o wide | grep nginx
```

Expected output:

```
nginx-with-loadaware-5646666d56-2****   1/1   Running   0   18s   10.XX.XX.118   cn-beijing.10.XX.XX.110   <none>   <none>
nginx-with-loadaware-5646666d56-7****   1/1   Running   0   18s   10.XX.XX.115   cn-beijing.10.XX.XX.110   <none>   <none>
nginx-with-loadaware-5646666d56-k****   1/1   Running   0   18s   10.XX.XX.119   cn-beijing.10.XX.XX.110   <none>   <none>
nginx-with-loadaware-5646666d56-q****   1/1   Running   0   18s   10.XX.XX.113   cn-beijing.10.XX.XX.111   <none>   <none>
nginx-with-loadaware-5646666d56-s****   1/1   Running   0   18s   10.XX.XX.120   cn-beijing.10.XX.XX.111   <none>   <none>
nginx-with-loadaware-5646666d56-z****   1/1   Running   0   18s   10.XX.XX.116   cn-beijing.10.XX.XX.111   <none>   <none>
```

The preceding output indicates that after load-aware pod scheduling is enabled, the cluster monitors the loads on nodes and schedules the new pods to nodes other than the heavily loaded cn-beijing.10.XX.XX.112 node.
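To summarize where the test pods landed, the node column of the kubectl output can be tallied. The pipeline below assumes the Deployment from the example above (label app=nginx) and the standard `kubectl get pods -o wide` column order, in which the node name is the seventh field.

```shell
# Count how many nginx-with-loadaware pods run on each node.
kubectl get pods -l app=nginx -o wide --no-headers \
  | awk '{ print $7 }' | sort | uniq -c
```

In the example above, this would report three pods on cn-beijing.10.XX.XX.110, three on cn-beijing.10.XX.XX.111, and none on the heavily loaded cn-beijing.10.XX.XX.112.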
What to do next
Modify load-aware scheduling configurations
- Log on to the ACK console and click Clusters in the left-side navigation pane.
- On the Clusters page, click the name of a cluster. In the left-side navigation pane, choose Operations > Add-ons.
- On the Add-ons page, find Kube Scheduler and click Configuration in the Kube Scheduler card.
- In the Kube Scheduler Parameters dialog box, modify the parameters for load-aware scheduling and click OK. In the left-side navigation pane of the cluster details page, click Cluster Information. On the Basic Information tab, if the cluster changes to the Running state, the load-aware scheduling configurations are modified.
Disable load-aware scheduling
In the Kube Scheduler Parameters dialog box, clear Enable Load-aware Scheduling and click OK.
In the left-side navigation pane of the cluster details page, click Cluster Information. On the Basic Information tab, if the cluster changes to the Running state, load-aware scheduling is disabled.
FAQ
Is the load-aware scheduling feature that is enabled based on an earlier version of the scheduler protocol supported after I update the scheduler version?
To use the load-aware scheduling feature with an earlier version of the scheduler protocol, add the alibabacloud.com/loadAwareScheduleEnabled: "true" annotation to the pod configurations.
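For clusters on the earlier protocol version, the annotation is added to the pod metadata. A minimal sketch follows; the pod name and image are placeholders, only the annotation key and value come from the protocol described above.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo-pod                # hypothetical name
  annotations:
    # Enables load-aware scheduling under the earlier protocol version.
    alibabacloud.com/loadAwareScheduleEnabled: "true"
spec:
  containers:
    - name: app                 # placeholder container
      image: nginx
```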
The following table describes the compatibility between different protocol versions and component versions.
Kubernetes 1.24:

Kube Scheduler version | ack-koordinator (FKA ack-slo-manager) version | Pod annotation protocol | Whether it can be enabled or disabled in the console |
---|---|---|---|
≥ 1.24.6-ack-4.0 | ≥ 1.1.1-ack.1 | Yes | Yes |
≥ 1.24.6-ack-3.1 and < 1.24.6-ack-4.0 | ≥ 0.8.0 | Yes | No |

Kubernetes 1.22:

Kube Scheduler version | ack-koordinator (FKA ack-slo-manager) version | Pod annotation protocol | Whether it can be enabled or disabled in the console |
---|---|---|---|
≥ 1.22.15-ack-4.0 | ≥ 1.1.1-ack.1 | Yes | Yes |
≥ 1.22.15-ack-2.0 and < 1.22.15-ack-4.0 | ≥ 0.8.0 | Yes | No |
| ≥ 0.3.0 and < 0.8.0 | Yes | No |