Container Service for Kubernetes: Work with capacity scheduling

Last Updated: Apr 25, 2025

The static resource allocation mechanism of the native Kubernetes ResourceQuota can lead to low resource utilization in clusters. To resolve this issue, Container Service for Kubernetes (ACK) provides the capacity scheduling feature based on the scheduling framework extension mechanism. This feature uses elastic quota groups to enable resource sharing while still guaranteeing each user's resource quota, which effectively improves cluster resource utilization.

Prerequisites

An ACK managed Pro cluster that runs Kubernetes 1.20 or later is created. For more information, see Create an ACK managed cluster.

Key features

In a cluster shared by multiple users, administrators typically allocate a fixed amount of resources to each user to guarantee availability. The traditional approach uses the native Kubernetes ResourceQuota for static allocation. However, because users consume resources at different times and in different patterns, some users may run short of resources while others leave their quotas idle. This lowers the overall resource utilization of the cluster.

To resolve this issue, ACK provides the capacity scheduling feature, which is implemented on the scheduler side based on the scheduling framework extension mechanism. The feature uses resource sharing to improve overall utilization while still guaranteeing each user's resource quota. Capacity scheduling provides the following features.

  • Support for resource quotas at multiple levels: You can configure a hierarchy of elastic quotas that matches your business structure, such as a company organization chart. A leaf node of an elastic quota group can be associated with multiple namespaces, but each namespace can belong to only one leaf node.

  • Support for resource borrowing and reclaiming between different elastic quotas.

    • Min: Defines the guaranteed amount of resources. So that these guarantees can be met even when cluster resources run short, the sum of the Min values of all users must not exceed the total resources of the cluster.

    • Max: Defines the maximum amount of resources that can be used.

      Workloads can borrow idle quota from other users, but the total amount of resources used after borrowing still cannot exceed the Max value. A user's unused Min quota can be borrowed by others and is reclaimed when the original user needs it. (A minimal sketch of these semantics follows this list.)

  • Support for multiple resource types: In addition to CPU and memory, you can configure extended resources such as GPUs and any other resource type supported by Kubernetes.

  • Support for attaching quotas to nodes: Use a ResourceFlavor to select nodes, and associate the ResourceFlavor with a quota in the ElasticQuotaTree. After the association, pods in the elastic quota can be scheduled only to the nodes selected by the ResourceFlavor.
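
As a minimal sketch of the Min and Max semantics above, the following leaf quota fragment guarantees 10 vCPUs and allows borrowing up to 20. The quota name root.team-a and the namespace team-a are illustrative and are not part of the example cluster used below:

- name: root.team-a          # hypothetical leaf quota
  namespaces:
    - team-a                 # hypothetical namespace bound to this leaf
  min:
    cpu: 10                  # guaranteed; reclaimed from borrowers when team-a needs it
  max:
    cpu: 20                  # hard ceiling, even when the rest of the cluster is idle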

Configuration example of capacity scheduling

In this example, the cluster contains an ecs.sn2.13xlarge node with 56 vCPUs and 224 GiB of memory.

  1. Create the following namespaces.

    kubectl create ns namespace1
    kubectl create ns namespace2
    kubectl create ns namespace3
    kubectl create ns namespace4
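
    To confirm that the four namespaces exist, you can optionally list them by name:

    kubectl get ns namespace1 namespace2 namespace3 namespace4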
  2. Create the corresponding elastic quota group according to the following YAML file:

    apiVersion: scheduling.sigs.k8s.io/v1beta1
    kind: ElasticQuotaTree
    metadata:
      name: elasticquotatree
      namespace: kube-system # The elastic quota group takes effect only if it is created in the kube-system namespace.
    spec:
      root:
        name: root # Configure the resource quota of the root. The maximum amount of resources for the root must equal the minimum amount of resources for the root.
        max:
          cpu: 40
          memory: 40Gi
          nvidia.com/gpu: 4
        min:
          cpu: 40
          memory: 40Gi
          nvidia.com/gpu: 4
        children: # Configure resource quotas for level-2 leaves of the root.
          - name: root.a
            max:
              cpu: 40
              memory: 40Gi
              nvidia.com/gpu: 4
            min:
              cpu: 20
              memory: 20Gi
              nvidia.com/gpu: 2
            children: # Configure resource quotas for level-3 leaves of the root.
              - name: root.a.1
                namespaces: # Configure the namespace.
                  - namespace1
                max:
                  cpu: 20
                  memory: 20Gi
                  nvidia.com/gpu: 2
                min:
                  cpu: 10
                  memory: 10Gi
                  nvidia.com/gpu: 1
              - name: root.a.2
                namespaces: # Configure the namespace.
                  - namespace2
                max:
                  cpu: 20
                  memory: 40Gi
                  nvidia.com/gpu: 2
                min:
                  cpu: 10
                  memory: 10Gi
                  nvidia.com/gpu: 1
          - name: root.b
            max:
              cpu: 40
              memory: 40Gi
              nvidia.com/gpu: 4
            min:
              cpu: 20
              memory: 20Gi
              nvidia.com/gpu: 2
            children: # Configure resource quotas for level-3 leaves of the root.
              - name: root.b.1
                namespaces: # Configure the namespace.
                  - namespace3
                max:
                  cpu: 20
                  memory: 20Gi
                  nvidia.com/gpu: 2
                min:
                  cpu: 10
                  memory: 10Gi
                  nvidia.com/gpu: 1
              - name: root.b.2
                namespaces: # Configure the namespace.
                  - namespace4
                max:
                  cpu: 20
                  memory: 20Gi
                  nvidia.com/gpu: 2
                min:
                  cpu: 10
                  memory: 10Gi
                  nvidia.com/gpu: 1
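
    Save the manifest to a file and apply it. The file name elasticquotatree.yaml below is arbitrary:

    kubectl apply -f elasticquotatree.yaml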

    In the preceding YAML file, the namespaces field specifies the namespaces associated with each leaf quota, and the children field defines the child elastic quotas. The configuration must meet the following requirements:

    • In the same elastic quota, Min ≤ Max.

    • The sum of the Min values of the child elastic quotas must be less than or equal to the Min value of the parent quota.

    • For the root node, Min must equal Max, and the value must be less than or equal to the total resources of the cluster.

    • Each namespace belongs to only one leaf. A leaf can contain multiple namespaces.

  3. Check whether the elastic quota group is created successfully.

    kubectl get ElasticQuotaTree -n kube-system

    Expected output:

    NAME               AGE
    elasticquotatree   68s
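
    To inspect the full quota hierarchy that was created, you can also print the resource as YAML:

    kubectl get ElasticQuotaTree elasticquotatree -n kube-system -o yaml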

Borrow idle resources

  1. Deploy a service in namespace1 according to the following YAML file. The Deployment has 5 replicas, and each pod requests 5 vCPUs.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: nginx1
      namespace: namespace1
      labels:
        app: nginx1
    spec:
      replicas: 5
      selector:
        matchLabels:
          app: nginx1
      template:
        metadata:
          name: nginx1
          labels:
            app: nginx1
        spec:
          containers:
          - name: nginx1
            image: nginx
            resources:
              limits:
                cpu: 5
              requests:
                cpu: 5
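
    Save the manifest and apply it (the file name nginx1.yaml is arbitrary); the same pattern applies to the nginx2, nginx3, and nginx4 Deployments later in this topic:

    kubectl apply -f nginx1.yaml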
  2. Check the deployment status of pods in the cluster.

    kubectl get pods -n namespace1

    Expected output:

    NAME                      READY   STATUS    RESTARTS   AGE
    nginx1-744b889544-52dbg   1/1     Running   0          70s
    nginx1-744b889544-6l4s9   1/1     Running   0          70s
    nginx1-744b889544-cgzlr   1/1     Running   0          70s
    nginx1-744b889544-w2gr7   1/1     Running   0          70s
    nginx1-744b889544-zr5xz   0/1     Pending   0          70s
    • Because the cluster still has idle resources (root.max.cpu=40), pods in namespace1 can continue to borrow idle quota after their total CPU requests exceed the min.cpu=10 configured for root.a.1. The maximum amount of CPU they can request is 20, the max.cpu=20 configured for root.a.1.

    • When the total CPU requests of the pods exceed 20 (max.cpu=20), additional pods remain in the Pending state. Therefore, of the 5 requested pods, 4 are Running and 1 is Pending.
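
    To see why the fifth pod cannot be scheduled, you can describe it. The Events section typically records the scheduling failure; the exact message depends on the scheduler version:

    kubectl describe pod nginx1-744b889544-zr5xz -n namespace1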

  3. Deploy a service in namespace2 according to the following YAML file. The Deployment has 5 replicas, and each pod requests 5 vCPUs.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: nginx2
      namespace: namespace2
      labels:
        app: nginx2
    spec:
      replicas: 5
      selector:
        matchLabels:
          app: nginx2
      template:
        metadata:
          name: nginx2
          labels:
            app: nginx2
        spec:
          containers:
          - name: nginx2
            image: nginx
            resources:
              limits:
                cpu: 5
              requests:
                cpu: 5
  4. Check the deployment status of pods in the cluster.

    kubectl get pods -n namespace1

    Expected output:

    NAME                      READY   STATUS    RESTARTS   AGE
    nginx1-744b889544-52dbg   1/1     Running   0          111s
    nginx1-744b889544-6l4s9   1/1     Running   0          111s
    nginx1-744b889544-cgzlr   1/1     Running   0          111s
    nginx1-744b889544-w2gr7   1/1     Running   0          111s
    nginx1-744b889544-zr5xz   0/1     Pending   0          111s
    kubectl get pods -n namespace2

    Expected output:

    NAME                      READY   STATUS    RESTARTS   AGE
    nginx2-556f95449f-4gl8s   1/1     Running   0          111s
    nginx2-556f95449f-crwk4   1/1     Running   0          111s
    nginx2-556f95449f-gg6q2   0/1     Pending   0          111s
    nginx2-556f95449f-pnz5k   1/1     Running   0          111s
    nginx2-556f95449f-vjpmq   1/1     Running   0          111s
    • Similar to nginx1: because the cluster still has idle resources (root.max.cpu=40), pods in namespace2 can continue to borrow idle quota after their total CPU requests exceed the min.cpu=10 configured for root.a.2. The maximum amount of CPU they can request is 20, the max.cpu=20 configured for root.a.2.

    • When the total CPU requests of the pods exceed 20 (max.cpu=20), additional pods remain in the Pending state. Therefore, of the 5 requested pods, 4 are Running and 1 is Pending.

    • At this point, the pods in namespace1 and namespace2 together occupy 40 CPUs, which equals the max.cpu=40 configured for the root.
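
    A quick way to tally the CPU requests in a namespace is to print each pod's request (shown here for namespace1; the same command works for namespace2). This sketch assumes single-container pods, as in this example; Pending pods are also listed but do not count toward the used quota:

    kubectl get pods -n namespace1 -o jsonpath='{range .items[*]}{.metadata.name}: {.spec.containers[0].resources.requests.cpu}{"\n"}{end}'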

Return borrowed resources

  1. Deploy a service in namespace3 according to the following YAML file. The Deployment has 5 replicas, and each pod requests 5 vCPUs.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: nginx3
      namespace: namespace3
      labels:
        app: nginx3
    spec:
      replicas: 5
      selector:
        matchLabels:
          app: nginx3
      template:
        metadata:
          name: nginx3
          labels:
            app: nginx3
        spec:
          containers:
          - name: nginx3
            image: nginx
            resources:
              limits:
                cpu: 5
              requests:
                cpu: 5
  2. Run the following command to check the deployment status of pods in the cluster:

    kubectl get pods -n namespace1

    Expected output:

    NAME                      READY   STATUS    RESTARTS   AGE
    nginx1-744b889544-52dbg   1/1     Running   0          6m17s
    nginx1-744b889544-cgzlr   1/1     Running   0          6m17s
    nginx1-744b889544-nknns   0/1     Pending   0          3m45s
    nginx1-744b889544-w2gr7   1/1     Running   0          6m17s
    nginx1-744b889544-zr5xz   0/1     Pending   0          6m17s
    kubectl get pods -n namespace2

    Expected output:

    NAME                      READY   STATUS    RESTARTS   AGE
    nginx2-556f95449f-crwk4   1/1     Running   0          4m22s
    nginx2-556f95449f-ft42z   1/1     Running   0          4m22s
    nginx2-556f95449f-gg6q2   0/1     Pending   0          4m22s
    nginx2-556f95449f-hfr2g   1/1     Running   0          3m29s
    nginx2-556f95449f-pvgrl   0/1     Pending   0          3m29s
    kubectl get pods -n namespace3

    Expected output:

    NAME                     READY   STATUS    RESTARTS   AGE
    nginx3-578877666-msd7f   1/1     Running   0          4m
    nginx3-578877666-nfdwv   0/1     Pending   0          4m10s
    nginx3-578877666-psszr   0/1     Pending   0          4m11s
    nginx3-578877666-xfsss   1/1     Running   0          4m22s
    nginx3-578877666-xpl2p   0/1     Pending   0          4m10s

    The elastic quota root.b.1, to which nginx3 belongs, has min set to 10. To guarantee this min, the scheduler reclaims the resources that pods under root.a previously borrowed from root.b, so that nginx3 obtains at least 10 CPUs (min.cpu=10) to run.

    The scheduler weighs factors such as the priority, availability, and creation time of the workloads under root.a when selecting the pods that return the borrowed resources (10 vCPUs). Therefore, after nginx3 obtains its 10 CPUs (min.cpu=10), 2 of its pods are Running and the other 3 remain Pending.
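
    To observe the reclamation, you can review recent events in the namespaces that lent resources. Depending on the scheduler version, the evicted pods usually record a preemption-related event:

    kubectl get events -n namespace1 --sort-by=.lastTimestamp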

  3. Deploy a service named nginx4 in namespace4 according to the following YAML file. The Deployment has 5 replicas, and each pod requests 5 vCPUs.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: nginx4
      namespace: namespace4
      labels:
        app: nginx4
    spec:
      replicas: 5
      selector:
        matchLabels:
          app: nginx4
      template:
        metadata:
          name: nginx4
          labels:
            app: nginx4
        spec:
          containers:
          - name: nginx4
            image: nginx
            resources:
              limits:
                cpu: 5
              requests:
                cpu: 5
  4. Run the following command to check the deployment status of pods in the cluster:

    kubectl get pods -n namespace1

    Expected output:

    NAME                      READY   STATUS    RESTARTS   AGE
    nginx1-744b889544-cgzlr   1/1     Running   0          8m20s
    nginx1-744b889544-cwx8l   0/1     Pending   0          55s
    nginx1-744b889544-gjkx2   0/1     Pending   0          55s
    nginx1-744b889544-nknns   0/1     Pending   0          5m48s
    nginx1-744b889544-zr5xz   1/1     Running   0          8m20s
    kubectl get pods -n namespace2

    Expected output:

    NAME                      READY   STATUS    RESTARTS   AGE
    nginx2-556f95449f-cglpv   0/1     Pending   0          3m45s
    nginx2-556f95449f-crwk4   1/1     Running   0          9m31s
    nginx2-556f95449f-gg6q2   1/1     Running   0          9m31s
    nginx2-556f95449f-pvgrl   0/1     Pending   0          8m38s
    nginx2-556f95449f-zv8wn   0/1     Pending   0          3m45s
    kubectl get pods -n namespace3

    Expected output:

    NAME                     READY   STATUS    RESTARTS   AGE
    nginx3-578877666-msd7f   1/1     Running   0          8m46s
    nginx3-578877666-nfdwv   0/1     Pending   0          8m56s
    nginx3-578877666-psszr   0/1     Pending   0          8m57s
    nginx3-578877666-xfsss   1/1     Running   0          9m8s
    nginx3-578877666-xpl2p   0/1     Pending   0          8m56s
    kubectl get pods -n namespace4

    Expected output:

    NAME                      READY   STATUS    RESTARTS   AGE
    nginx4-754b767f45-g9954   1/1     Running   0          4m32s
    nginx4-754b767f45-j4v7v   0/1     Pending   0          4m32s
    nginx4-754b767f45-jk2t7   0/1     Pending   0          4m32s
    nginx4-754b767f45-nhzpf   0/1     Pending   0          4m32s
    nginx4-754b767f45-tv5jj   1/1     Running   0          4m32s

    The elastic quota root.b.2, to which nginx4 belongs, has min set to 10. To guarantee this min, the scheduler reclaims the resources that pods under root.a previously borrowed from root.b, so that nginx4 obtains at least 10 CPUs (min.cpu=10) to run.

    The scheduler weighs factors such as the priority, availability, and creation time of the workloads under root.a when selecting the pods that return the borrowed resources (10 vCPUs). Therefore, after nginx4 obtains its 10 CPUs (min.cpu=10), 2 of its pods are Running and the other 3 remain Pending.

    At this point, every elastic quota in the cluster is running on the guaranteed resources defined by its min.
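
    To confirm that each namespace has settled at its guaranteed min (two Running 5-vCPU pods each, 40 vCPUs in total), you can list the Running pods across all four namespaces:

    kubectl get pods --all-namespaces --field-selector=status.phase=Running -l 'app in (nginx1,nginx2,nginx3,nginx4)'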

ResourceFlavor configuration example

Prerequisites

  • The ResourceFlavor CRD is installed by referring to ResourceFlavorCRD. The ACK scheduler does not install this CRD by default.

    Only the nodeLabels field of the ResourceFlavor resource takes effect.

  • The scheduler version is later than 6.9.0. For the component release notes, see kube-scheduler. To upgrade the component, see Components.

ResourceFlavor is a Kubernetes CustomResourceDefinition (CRD) that binds elastic quotas to nodes by defining node labels (nodeLabels). When a ResourceFlavor is associated with an elastic quota, pods under that quota are not only limited by the total quota resources but can also be scheduled only to nodes that match the nodeLabels.

ResourceFlavor example

An example of a ResourceFlavor is as follows.

apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: "spot"
spec:
  nodeLabels:
    instance-type: spot
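
For the ResourceFlavor to match any nodes, the selected nodes must carry the corresponding label. If your nodes are not labeled yet, you can add the label manually (node-name-1 below is a placeholder for an actual node name):

kubectl label node node-name-1 instance-type=spot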

Example of associating an elastic quota

To associate an elastic quota with a ResourceFlavor, declare the association in the ElasticQuotaTree by using the attributes field. The following code shows an example.

apiVersion: scheduling.sigs.k8s.io/v1beta1
kind: ElasticQuotaTree
metadata:
  name: elasticquotatree
  namespace: kube-system
spec:
  root:
    children:
    - attributes:
        resourceflavors: spot
      max:
        cpu: 99
        memory: 40Gi
        nvidia.com/gpu: 10
      min:
        cpu: 99
        memory: 40Gi
        nvidia.com/gpu: 10
      name: child
      namespaces:
      - default
    max:
      cpu: 999900
      memory: 400000Gi
      nvidia.com/gpu: 100000
    min:
      cpu: 999900
      memory: 400000Gi
      nvidia.com/gpu: 100000
    name: root

After the ElasticQuotaTree is submitted, pods that belong to the quota named child are scheduled only to nodes with the instance-type: spot label.
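
To verify the placement, you can run a test pod in the default namespace (which belongs to the quota named child) and check the NODE column of the output. The pod name nginx-flavor-test is arbitrary:

kubectl run nginx-flavor-test --image=nginx -n default
kubectl get pods -n default -o wide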

References

  • For more information about the release notes of kube-scheduler, see kube-scheduler.

  • kube-scheduler supports gang scheduling, which requires that a group of correlated pods either all be scheduled at the same time or none at all. Gang scheduling is suitable for big data processing scenarios such as Spark and Hadoop. For more information, see Work with gang scheduling.