The scheduler in Container Service for Kubernetes (ACK) Pro clusters supports the load-aware pod scheduling feature. This feature can monitor the loads on nodes and schedule pods to nodes with lower loads to implement load balancing. This prevents application or node crashes caused by an overloaded node. This topic describes how to use load-aware pod scheduling.

Limits

Only ACK Pro clusters support load-aware pod scheduling. For more information about how to create an ACK Pro cluster, see Create an ACK Pro cluster.

Introduction to load-aware pod scheduling

The load-aware pod scheduling feature of the Kube Scheduler component provided by ACK is designed based on the Kubernetes scheduling framework. The native Kubernetes scheduler schedules pods to nodes based only on the resources that pods request (resource allocation), whereas Kube Scheduler also takes the actual loads on nodes into account. After load-aware pod scheduling is enabled, the system reviews the historical load statistics of nodes and schedules pods to nodes with lower loads to implement load balancing. This prevents application or node crashes caused by an overloaded node.

The following figure compares the Kubernetes scheduler and Kube Scheduler. Requested indicates the resources that are requested by pods on the node, and Usage indicates the resources that are actually in use by pods on the node. Only the resources in use are taken into account when the system calculates the loads on a node. In the scenario shown in the figure, Kube Scheduler schedules new pods to Node B because Node B has lower loads.

(Figure: Comparison of the Kubernetes scheduler and Kube Scheduler)

As time passes and the cluster environment, traffic, or requests to workloads change, the load distribution among nodes may become imbalanced. To prevent this issue, ack-koordinator provides the load-aware hotspot descheduling feature. You can use load-aware scheduling and hotspot descheduling in combination to achieve optimal load balancing among nodes. For more information about load-aware hotspot descheduling, see Work with load-aware hotspot descheduling.

How load-aware scheduling is implemented

Load-aware scheduling is implemented by using Kube Scheduler and ack-koordinator. ack-koordinator is responsible for collecting and reporting metrics about node resource utilization. Kube Scheduler is responsible for calculating the scores of nodes based on resource utilization and sorting nodes based on node scores. Kube Scheduler preferentially schedules new pods to nodes with lower loads. For more information about the architecture of ack-koordinator, see ack-koordinator architecture.

Scheduling policies

Policy: Node sorting
Description: The load-aware scheduling plug-in calculates node scores based on CPU utilization and memory utilization. The scheduler uses weighted scoring and preferentially schedules pods to the nodes with higher scores. You can customize the CPU weight and the memory weight. For more information, see Kube Scheduler parameters.

The node score is calculated based on the following formula: [(1 - CPU utilization) × CPU weight + (1 - Memory utilization) × Memory weight] / (CPU weight + Memory weight). CPU utilization and memory utilization are expressed as percentages, so a node with lower loads receives a higher score.

Policy: Resource utilization calculation
Description: You can configure how the resource utilization statistics are aggregated: the average or a percentile (such as p90) of the collected data. By default, the average resource utilization within the last 5 minutes is used. For more information, see Kube Scheduler parameters.
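To make the node-sorting formula concrete, here is a short sketch with hypothetical utilization values and the default weights (cpu=1, memory=1):

```python
# Node score per the node-sorting formula:
# [(1 - cpu_util) * cpu_weight + (1 - mem_util) * mem_weight] / (cpu_weight + mem_weight)
# Utilization values are fractions between 0 and 1.
def node_score(cpu_util, mem_util, cpu_weight=1, mem_weight=1):
    return ((1 - cpu_util) * cpu_weight + (1 - mem_util) * mem_weight) / (
        cpu_weight + mem_weight
    )

# Hypothetical nodes: Node A is busier than Node B.
score_a = node_score(0.60, 0.40)  # 0.5
score_b = node_score(0.20, 0.30)  # 0.75
# The scheduler prefers the node with the higher score, so Node B wins.
```

Raising cpu_weight makes CPU headroom dominate the score: with cpu_weight=4, idle CPU capacity counts four times as heavily as idle memory.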

Step 1: Enable load-aware scheduling

Important You must first install ack-koordinator 1.1.1-ack.1 or later. Otherwise, load-aware scheduling cannot be enabled.
  1. Log on to the ACK console and click Clusters in the left-side navigation pane.
  2. On the Clusters page, click the name of a cluster and choose Operations > Add-ons in the left-side navigation pane.
  3. On the Add-ons page, find Kube Scheduler and click Configuration in the Kube Scheduler card.
  4. In the Kube Scheduler Parameters dialog box, select Enable Load-aware Scheduling, set the parameters in the following table, and then click OK.
    For more information about other parameters, see Customize the scheduler parameters.
    Parameter: loadAwareResourceWeight
    Data type: A value that consists of the resourceName and resourceWeight fields.
    Description: The weights of the resources that are used to calculate the node score.
    Valid values:
    • resourceName: cpu and memory
    • resourceWeight: 1 to 100
    Default value: cpu=1, memory=1
    Example:
    • resourceName: cpu
    • resourceWeight: 1

    Parameter: loadAwareAggregatedUsageAggragationType
    Data type: An enumerated value.
    Description: The type of aggregation that is applied to the utilization statistics. Valid values:
    • avg: uses the average of the statistics.
    • p50: uses the 50th percentile (median) of the statistics.
    • p90, p95, and p99: use the 90th, 95th, and 99th percentiles of the statistics.
    Default value: avg
    Example: p90
    In the left-side navigation pane of the cluster details page, click Cluster Information. On the Basic Information tab, when the cluster returns to the Running state, load-aware scheduling is enabled.
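The aggregation types in the preceding table can be illustrated with a short sketch. The sample utilization values are made up, and the nearest-rank percentile used here is one common definition that may differ from the component's exact implementation:

```python
# Aggregating a series of CPU-utilization samples (in percent) the way the
# avg / p50 / p90 / p95 / p99 options describe; nearest-rank percentile.
def aggregate(samples, agg_type):
    data = sorted(samples)
    if agg_type == "avg":
        return sum(data) / len(data)
    pct = int(agg_type[1:])                # "p90" -> 90
    rank = -(-pct * len(data) // 100) - 1  # ceil(pct * n / 100) - 1
    return data[rank]

cpu_samples = [10, 12, 15, 20, 22, 25, 30, 35, 80, 95]
avg = aggregate(cpu_samples, "avg")  # 34.4
p50 = aggregate(cpu_samples, "p50")  # 22
p90 = aggregate(cpu_samples, "p90")  # 80
```

A high percentile such as p90 is sensitive to short load spikes that the average smooths over, so it yields a more conservative view of node load.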

Step 2: Verify load-aware scheduling

In the following example, a cluster that contains three nodes is used. Each node has 4 vCores and 16 GiB of memory.

  1. Create a stress-demo.yaml file and copy the following content to the file:
    Example:
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: stress-demo
      namespace: default
      labels:
        app: stress-demo
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: stress-demo
      template:
        metadata:
          name: stress-demo
          labels:
            app: stress-demo
        spec:
          containers:
            - args:
                - '--vm'
                - '2'
                - '--vm-bytes'
                - '1600M'
                - '-c'
                - '2'
                - '--vm-hang'
                - '2'
              command:
                - stress
              image: polinux/stress
              imagePullPolicy: Always
              name: stress
              resources:
                limits:
                  cpu: '2'
                  memory: 4Gi
                requests:
                  cpu: '2'
                  memory: 4Gi
          restartPolicy: Always
  2. Run the following command to deploy stress-demo. The stress pod increases the loads on the node to which it is scheduled.
    kubectl create -f stress-demo.yaml
    Expected output:
    deployment.apps/stress-demo created
  3. Run the following command to check whether the pod is in the Running state:
    kubectl get pod -o wide
    Expected output:
    NAME                           READY   STATUS    RESTARTS   AGE   IP           NODE                    NOMINATED NODE   READINESS GATES
    stress-demo-7fdd89cc6b-g****   1/1     Running   0          82s   10.XX.XX.112   cn-beijing.10.XX.XX.112   <none>           <none>
    The stress-demo-7fdd89cc6b-g**** pod is scheduled to the cn-beijing.10.XX.XX.112 node.
    Wait 3 minutes. Make sure that the pod is initialized and the loads on the node are increased.
  4. Run the following command to query the loads on each node:
    kubectl top node
    Expected output:
    NAME                    CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
    cn-beijing.10.XX.XX.110   92m          2%     1158Mi          9%
    cn-beijing.10.XX.XX.111   77m          1%     1162Mi          9%
    cn-beijing.10.XX.XX.112   2105m        53%    3594Mi          28%
    The cn-beijing.10.XX.XX.111 node has the lowest loads among the nodes, and the cn-beijing.10.XX.XX.112 node has the highest loads. This indicates that the loads among the nodes are imbalanced.
  5. Create a file named nginx-with-loadaware.yaml and copy the following content to the file:
    Example
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: nginx-with-loadaware
      namespace: default
      labels:
        app: nginx
    spec:
      replicas: 6
      selector:
        matchLabels:
          app: nginx
      template:
        metadata:
          name: nginx
          labels:
            app: nginx
        spec:
          containers:
          - name: nginx
            image: nginx
            resources:
              limits:
                cpu: 500m
              requests:
                cpu: 500m
  6. Run the following command to create the Deployment:
    kubectl create -f nginx-with-loadaware.yaml
    Expected output:
    deployment.apps/nginx-with-loadaware created
  7. Run the following command to query information about the pod:
    kubectl get pods -o wide | grep nginx
    Expected output:
    nginx-with-loadaware-5646666d56-2****   1/1     Running   0          18s   10.XX.XX.118   cn-beijing.10.XX.XX.110   <none>           <none>
    nginx-with-loadaware-5646666d56-7****   1/1     Running   0          18s   10.XX.XX.115   cn-beijing.10.XX.XX.110   <none>           <none>
    nginx-with-loadaware-5646666d56-k****   1/1     Running   0          18s   10.XX.XX.119   cn-beijing.10.XX.XX.110   <none>           <none>
    nginx-with-loadaware-5646666d56-q****   1/1     Running   0          18s   10.XX.XX.113   cn-beijing.10.XX.XX.111   <none>           <none>
    nginx-with-loadaware-5646666d56-s****   1/1     Running   0          18s   10.XX.XX.120   cn-beijing.10.XX.XX.111   <none>           <none>
    nginx-with-loadaware-5646666d56-z****   1/1     Running   0          18s   10.XX.XX.116   cn-beijing.10.XX.XX.111   <none>           <none>
    The preceding output indicates that after load-aware pod scheduling is enabled for the cluster, the cluster can monitor the loads on nodes and use a scheduling policy to schedule pods to nodes other than the cn-beijing.10.XX.XX.112 node.
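As a quick sanity check, you can tally the NODE column of the output above. The following is a small illustrative sketch over the sample output; in a real cluster you would parse the live output of kubectl get pods -o wide instead:

```python
from collections import Counter

# Sample lines from the expected `kubectl get pods -o wide` output above;
# the seventh whitespace-separated column is the node name.
output = """\
nginx-with-loadaware-5646666d56-2****  1/1  Running  0  18s  10.XX.XX.118  cn-beijing.10.XX.XX.110  <none>  <none>
nginx-with-loadaware-5646666d56-7****  1/1  Running  0  18s  10.XX.XX.115  cn-beijing.10.XX.XX.110  <none>  <none>
nginx-with-loadaware-5646666d56-k****  1/1  Running  0  18s  10.XX.XX.119  cn-beijing.10.XX.XX.110  <none>  <none>
nginx-with-loadaware-5646666d56-q****  1/1  Running  0  18s  10.XX.XX.113  cn-beijing.10.XX.XX.111  <none>  <none>
nginx-with-loadaware-5646666d56-s****  1/1  Running  0  18s  10.XX.XX.120  cn-beijing.10.XX.XX.111  <none>  <none>
nginx-with-loadaware-5646666d56-z****  1/1  Running  0  18s  10.XX.XX.116  cn-beijing.10.XX.XX.111  <none>  <none>
"""

# Count pods per node; the hotspot node should receive none of the new pods.
pods_per_node = Counter(line.split()[6] for line in output.splitlines())
assert "cn-beijing.10.XX.XX.112" not in pods_per_node
```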

What to do next

Modify load-aware scheduling configurations

  1. Log on to the ACK console and click Clusters in the left-side navigation pane.
  2. On the Clusters page, click the name of a cluster and choose Operations > Add-ons in the left-side navigation pane.
  3. On the Add-ons page, find Kube Scheduler and click Configuration in the Kube Scheduler card.
  4. In the Kube Scheduler Parameters dialog box, modify the parameters for load-aware scheduling and click OK.
    In the left-side navigation pane of the cluster details page, click Cluster Information. On the Basic Information tab, when the cluster returns to the Running state, the load-aware scheduling configurations are modified.

Disable load-aware scheduling

In the Kube Scheduler Parameters dialog box, clear Enable Load-aware Scheduling and click OK.

In the left-side navigation pane of the cluster details page, click Cluster Information. On the Basic Information tab, when the cluster returns to the Running state, load-aware scheduling is disabled.

FAQ

Is load-aware scheduling that was enabled based on an earlier version of the scheduler protocol still supported after I update the scheduler?

In earlier versions of the scheduler protocol, load-aware scheduling is enabled by adding the alibabacloud.com/loadAwareScheduleEnabled: "true" annotation to the pod configurations.
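For reference, the following is a minimal pod template fragment with this annotation, sketched after the nginx Deployment in this topic (the indentation assumes the fragment sits inside a Deployment spec):

```yaml
# Fragment of a Deployment spec: the annotation goes on the pod template
# metadata, not on the Deployment metadata.
template:
  metadata:
    annotations:
      alibabacloud.com/loadAwareScheduleEnabled: "true"
    labels:
      app: nginx
```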

Kube Scheduler is compatible with earlier versions of the Kube Scheduler protocol, so you can seamlessly update Kube Scheduler to later versions. After you update Kube Scheduler, we recommend that you perform Step 1: Enable load-aware scheduling to enable load balancing for the cluster. This way, you do not need to modify pod configurations to balance the loads among the nodes in the cluster.
Important In clusters that run Kubernetes 1.22, Kube Scheduler is compatible with earlier versions of the Kube Scheduler protocol. In clusters that run Kubernetes 1.24, Kube Scheduler remains compatible with earlier versions of the protocol only until August 30, 2023. We recommend that you update the Kubernetes version of your cluster and use the latest method to configure load-aware scheduling. For more information about how to update the Kubernetes version of a cluster, see Update the Kubernetes version of an ACK cluster.

The following table describes the compatibility between different protocol versions and component versions.

Kubernetes 1.24

Kube Scheduler version | ack-koordinator (FKA ack-slo-manager) version | Pod annotation protocol supported | Can be enabled or disabled in the console
≥ 1.24.6-ack-4.0 | ≥ 1.1.1-ack.1 | Yes | Yes
≥ 1.24.6-ack-3.1 and < 1.24.6-ack-4.0 | ≥ 0.8.0 | Yes | No

Kubernetes 1.22 and earlier

Kube Scheduler version | ack-koordinator (FKA ack-slo-manager) version | Pod annotation protocol supported | Can be enabled or disabled in the console
≥ 1.22.15-ack-4.0 | ≥ 1.1.1-ack.1 | Yes | Yes
≥ 1.22.15-ack-2.0 and < 1.22.15-ack-4.0 | ≥ 0.8.0 | Yes | No
≥ 1.20.4-ack-4.0 and ≤ 1.20.4-ack-8.0, or v1.18-ack-4.0 | ≥ 0.3.0 and < 0.8.0 | Yes | No