
Container Service for Kubernetes:Work with capacity scheduling

Last Updated:Mar 26, 2026

Capacity scheduling lets you define hierarchical resource quotas with guaranteed minimums and flexible maximums, so idle resources can be shared across teams while each team still gets its guaranteed allocation when needed.

The native Kubernetes ResourceQuota enforces fixed resource caps, which often leaves resources idle when some teams use less than their quota. ACK implements capacity scheduling through the scheduling framework extension mechanism, replacing this static model with elastic quota groups: resources are shared when idle and reclaimed automatically when the original owner needs them. This improves overall cluster utilization without compromising resource guarantees.

Prerequisites

Before you begin, ensure that you have:

Key concepts

ElasticQuotaTree is a CustomResourceDefinition (CRD) that defines a hierarchy of elastic quota groups. Each node in the tree represents a quota boundary. Leaf nodes map to one or more namespaces. Pods in those namespaces are scheduled within the quota limits defined at their leaf node.

The two core fields in each quota node are:

  • min: Guaranteed resources. The scheduler always ensures this amount is available to the quota node, reclaiming borrowed resources from other nodes if necessary.

  • max: Maximum resources the quota node can use, including idle resources borrowed from other nodes.

Resource borrowing and reclaiming work as follows:

  • A pod can be scheduled if the resources it requests, when added to the quota node's current usage, stay within max.

  • If the quota node's current usage exceeds min, the excess is borrowed from idle capacity elsewhere in the tree.

  • When another quota node needs its min resources back, the scheduler selects pods from the borrowing node to evict. The scheduler comprehensively considers factors such as job priority, availability, and creation time when choosing which pods to evict.
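The admission and borrowing rules above reduce to simple arithmetic. The following Python sketch is purely illustrative (the function names are hypothetical, and this is not the scheduler's actual implementation):

```python
# Illustrative sketch of the elastic-quota rules described above.
# Not the real scheduler code; all names here are hypothetical.

def can_schedule(request: int, used: int, max_quota: int) -> bool:
    """A pod fits if the quota node's current usage plus the new request stays within max."""
    return used + request <= max_quota

def borrowed(used: int, min_quota: int) -> int:
    """Usage beyond min is borrowed from idle capacity elsewhere in the tree."""
    return max(0, used - min_quota)

# Quota node with min=10 and max=20 vCPUs, currently using 15:
print(can_schedule(5, used=15, max_quota=20))   # fits: 15 + 5 <= 20
print(can_schedule(6, used=15, max_quota=20))   # rejected: 21 > 20
print(borrowed(15, min_quota=10))               # 5 vCPUs are borrowed
```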

Features

  • Hierarchical quotas: Configure elastic quotas at multiple levels (for example, matching your organization structure). Each leaf node can map to multiple namespaces, but each namespace belongs to only one leaf node.

  • Resource borrowing and reclaiming: Idle min resources can be borrowed by other quota nodes. Borrowed resources are reclaimed automatically when the original owner needs them.

  • Extended resource support: In addition to CPU and memory, capacity scheduling supports GPU (nvidia.com/gpu) and any other Kubernetes-supported resource types.

  • Node affinity via ResourceFlavor: Attach a ResourceFlavor to a quota node to restrict pods in that quota to specific nodes. See ResourceFlavor configuration example.

Configure capacity scheduling

This example uses a cluster with ecs.sn2.13xlarge nodes (56 vCPUs, 224 GiB of memory each).

Step 1: Create namespaces

kubectl create ns namespace1
kubectl create ns namespace2
kubectl create ns namespace3
kubectl create ns namespace4

Step 2: Create an ElasticQuotaTree

Create the ElasticQuotaTree in the kube-system namespace. The following example defines a two-level hierarchy with four leaf quota nodes.

Note: The ElasticQuotaTree takes effect only when it is created in the kube-system namespace.

apiVersion: scheduling.sigs.k8s.io/v1beta1
kind: ElasticQuotaTree
metadata:
  name: elasticquotatree
  namespace: kube-system
spec:
  root:
    name: root
    max:
      cpu: 40
      memory: 40Gi
      nvidia.com/gpu: 4
    min:
      cpu: 40
      memory: 40Gi
      nvidia.com/gpu: 4
    children:
      - name: root.a
        max:
          cpu: 40
          memory: 40Gi
          nvidia.com/gpu: 4
        min:
          cpu: 20
          memory: 20Gi
          nvidia.com/gpu: 2
        children:
          - name: root.a.1
            namespaces:
              - namespace1
            max:
              cpu: 20
              memory: 20Gi
              nvidia.com/gpu: 2
            min:
              cpu: 10
              memory: 10Gi
              nvidia.com/gpu: 1
          - name: root.a.2
            namespaces:
              - namespace2
            max:
              cpu: 20
              memory: 40Gi
              nvidia.com/gpu: 2
            min:
              cpu: 10
              memory: 10Gi
              nvidia.com/gpu: 1
      - name: root.b
        max:
          cpu: 40
          memory: 40Gi
          nvidia.com/gpu: 4
        min:
          cpu: 20
          memory: 20Gi
          nvidia.com/gpu: 2
        children:
          - name: root.b.1
            namespaces:
              - namespace3
            max:
              cpu: 20
              memory: 20Gi
              nvidia.com/gpu: 2
            min:
              cpu: 10
              memory: 10Gi
              nvidia.com/gpu: 1
          - name: root.b.2
            namespaces:
              - namespace4
            max:
              cpu: 20
              memory: 20Gi
              nvidia.com/gpu: 2
            min:
              cpu: 10
              memory: 10Gi
              nvidia.com/gpu: 1

Important

The ElasticQuotaTree must satisfy all of the following constraints:

  • Within each quota node: min ≤ max

  • For each parent node: sum of children's min values ≤ parent's min value

  • For the root node: min = max ≤ total cluster resources

  • Each namespace belongs to exactly one leaf node; a leaf node can contain multiple namespaces
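These constraints can be checked programmatically before applying the resource. The following Python sketch is a hypothetical validator, not part of ACK; quotas are simplified to a single integer resource (for example, CPU) per node:

```python
# Illustrative validator for the ElasticQuotaTree constraints listed above.
# Hypothetical helper, not part of ACK; quotas are simplified to one
# integer resource per node, and the cluster-capacity check is omitted.

def validate(node: dict, is_root: bool = True) -> None:
    """Raise ValueError if the quota subtree violates a constraint."""
    if node["min"] > node["max"]:
        raise ValueError(f'{node["name"]}: min must not exceed max')
    children = node.get("children", [])
    if children and sum(c["min"] for c in children) > node["min"]:
        raise ValueError(f'{node["name"]}: children min sum exceeds parent min')
    if is_root and node["min"] != node["max"]:
        raise ValueError("root: min must equal max")
    for child in children:
        validate(child, is_root=False)

# CPU values from the example ElasticQuotaTree above:
tree = {
    "name": "root", "min": 40, "max": 40,
    "children": [
        {"name": "root.a", "min": 20, "max": 40,
         "children": [{"name": "root.a.1", "min": 10, "max": 20},
                      {"name": "root.a.2", "min": 10, "max": 20}]},
        {"name": "root.b", "min": 20, "max": 40,
         "children": [{"name": "root.b.1", "min": 10, "max": 20},
                      {"name": "root.b.2", "min": 10, "max": 20}]},
    ],
}
validate(tree)  # passes: the example tree satisfies all constraints
```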

Step 3: Verify the ElasticQuotaTree

kubectl get ElasticQuotaTree -n kube-system

Expected output:

NAME               AGE
elasticquotatree   68s

Observe resource borrowing and reclaiming

The following scenarios walk through how the scheduler handles borrowing and reclaiming as workloads are deployed across the four namespaces.

Borrow idle resources

  1. Deploy a workload in namespace1. This Deployment requests 5 replicas, each using 5 vCPUs (25 vCPUs total).

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: nginx1
      namespace: namespace1
      labels:
        app: nginx1
    spec:
      replicas: 5
      selector:
        matchLabels:
          app: nginx1
      template:
        metadata:
          name: nginx1
          labels:
            app: nginx1
        spec:
          containers:
          - name: nginx1
            image: nginx
            resources:
              limits:
                cpu: 5
              requests:
                cpu: 5
  2. Check pod status in namespace1.

    kubectl get pods -n namespace1

    Expected output:

    NAME                      READY   STATUS    RESTARTS   AGE
    nginx1-744b889544-52dbg   1/1     Running   0          70s
    nginx1-744b889544-6l4s9   1/1     Running   0          70s
    nginx1-744b889544-cgzlr   1/1     Running   0          70s
    nginx1-744b889544-w2gr7   1/1     Running   0          70s
    nginx1-744b889544-zr5xz   0/1     Pending   0          70s

    root.a.1 (namespace1) has min=10 CPU and max=20 CPU. The 5 pods request 25 vCPUs total, which exceeds max=20. The first 4 pods (20 vCPUs) are scheduled — 10 vCPUs from the guaranteed min and 10 vCPUs borrowed from idle capacity in the cluster. The 5th pod stays Pending because the total request exceeds max.

  3. Deploy a workload in namespace2. This Deployment also requests 5 replicas, each using 5 vCPUs.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: nginx2
      namespace: namespace2
      labels:
        app: nginx2
    spec:
      replicas: 5
      selector:
        matchLabels:
          app: nginx2
      template:
        metadata:
          name: nginx2
          labels:
            app: nginx2
        spec:
          containers:
          - name: nginx2
            image: nginx
            resources:
              limits:
                cpu: 5
              requests:
                cpu: 5
  4. Check pod status in both namespaces.

    kubectl get pods -n namespace1

    Expected output:

    NAME                      READY   STATUS    RESTARTS   AGE
    nginx1-744b889544-52dbg   1/1     Running   0          111s
    nginx1-744b889544-6l4s9   1/1     Running   0          111s
    nginx1-744b889544-cgzlr   1/1     Running   0          111s
    nginx1-744b889544-w2gr7   1/1     Running   0          111s
    nginx1-744b889544-zr5xz   0/1     Pending   0          111s

    kubectl get pods -n namespace2

    Expected output:

    NAME                      READY   STATUS    RESTARTS   AGE
    nginx2-556f95449f-4gl8s   1/1     Running   0          111s
    nginx2-556f95449f-crwk4   1/1     Running   0          111s
    nginx2-556f95449f-gg6q2   0/1     Pending   0          111s
    nginx2-556f95449f-pnz5k   1/1     Running   0          111s
    nginx2-556f95449f-vjpmq   1/1     Running   0          111s

    The same borrowing logic applies to namespace2. root.a.2 has min=10 and max=20, so 4 pods run and 1 stays Pending. At this point, namespace1 and namespace2 together consume all 40 vCPUs allocated to root (root.max.cpu=40).
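The quota arithmetic in this walkthrough can be reproduced directly. A minimal Python sketch (illustrative only, not scheduler code):

```python
# Reproduce the quota arithmetic of the walkthrough above:
# each leaf node has min=10 and max=20 CPU, and each pod requests 5 CPU.

def pods_admitted(replicas: int, per_pod: int, max_quota: int) -> int:
    """Pods are admitted one by one until the next pod would exceed max."""
    return min(replicas, max_quota // per_pod)

admitted = pods_admitted(replicas=5, per_pod=5, max_quota=20)
print(admitted)          # 4 pods run, 1 stays Pending
print(2 * admitted * 5)  # namespace1 + namespace2 together consume 40 vCPUs
```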

Return borrowed resources

  1. Deploy a workload in namespace3. This Deployment requests 5 replicas, each using 5 vCPUs.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: nginx3
      namespace: namespace3
      labels:
        app: nginx3
    spec:
      replicas: 5
      selector:
        matchLabels:
          app: nginx3
      template:
        metadata:
          name: nginx3
          labels:
            app: nginx3
        spec:
          containers:
          - name: nginx3
            image: nginx
            resources:
              limits:
                cpu: 5
              requests:
                cpu: 5
  2. Check pod status across all three namespaces.

    kubectl get pods -n namespace1

    Expected output:

    NAME                      READY   STATUS    RESTARTS   AGE
    nginx1-744b889544-52dbg   1/1     Running   0          6m17s
    nginx1-744b889544-cgzlr   1/1     Running   0          6m17s
    nginx1-744b889544-nknns   0/1     Pending   0          3m45s
    nginx1-744b889544-w2gr7   1/1     Running   0          6m17s
    nginx1-744b889544-zr5xz   0/1     Pending   0          6m17s

    kubectl get pods -n namespace2

    Expected output:

    NAME                      READY   STATUS    RESTARTS   AGE
    nginx2-556f95449f-crwk4   1/1     Running   0          4m22s
    nginx2-556f95449f-ft42z   1/1     Running   0          4m22s
    nginx2-556f95449f-gg6q2   0/1     Pending   0          4m22s
    nginx2-556f95449f-hfr2g   1/1     Running   0          3m29s
    nginx2-556f95449f-pvgrl   0/1     Pending   0          3m29s

    kubectl get pods -n namespace3

    Expected output:

    NAME                     READY   STATUS    RESTARTS   AGE
    nginx3-578877666-msd7f   1/1     Running   0          4m
    nginx3-578877666-nfdwv   0/1     Pending   0          4m10s
    nginx3-578877666-psszr   0/1     Pending   0          4m11s
    nginx3-578877666-xfsss   1/1     Running   0          4m22s
    nginx3-578877666-xpl2p   0/1     Pending   0          4m10s

    root.b.1 (namespace3) has a guaranteed min=10 CPU. To provide this guarantee, the scheduler reclaims 10 vCPUs that root.a had borrowed from root.b. It comprehensively considers factors such as the priority, availability, and creation time of jobs under root.a when selecting which pods to evict to free the 10 vCPUs. As a result, nginx3 gets its 10-vCPU minimum: 2 pods run, and 3 stay Pending.

  3. Deploy a workload in namespace4. This Deployment requests 5 replicas, each using 5 vCPUs.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: nginx4
      namespace: namespace4
      labels:
        app: nginx4
    spec:
      replicas: 5
      selector:
        matchLabels:
          app: nginx4
      template:
        metadata:
          name: nginx4
          labels:
            app: nginx4
        spec:
          containers:
          - name: nginx4
            image: nginx
            resources:
              limits:
                cpu: 5
              requests:
                cpu: 5
  4. Check pod status across all four namespaces.

    kubectl get pods -n namespace1

    Expected output:

    NAME                      READY   STATUS    RESTARTS   AGE
    nginx1-744b889544-cgzlr   1/1     Running   0          8m20s
    nginx1-744b889544-cwx8l   0/1     Pending   0          55s
    nginx1-744b889544-gjkx2   0/1     Pending   0          55s
    nginx1-744b889544-nknns   0/1     Pending   0          5m48s
    nginx1-744b889544-zr5xz   1/1     Running   0          8m20s

    kubectl get pods -n namespace2

    Expected output:

    NAME                      READY   STATUS    RESTARTS   AGE
    nginx2-556f95449f-cglpv   0/1     Pending   0          3m45s
    nginx2-556f95449f-crwk4   1/1     Running   0          9m31s
    nginx2-556f95449f-gg6q2   1/1     Running   0          9m31s
    nginx2-556f95449f-pvgrl   0/1     Pending   0          8m38s
    nginx2-556f95449f-zv8wn   0/1     Pending   0          3m45s

    kubectl get pods -n namespace3

    Expected output:

    NAME                     READY   STATUS    RESTARTS   AGE
    nginx3-578877666-msd7f   1/1     Running   0          8m46s
    nginx3-578877666-nfdwv   0/1     Pending   0          8m56s
    nginx3-578877666-psszr   0/1     Pending   0          8m57s
    nginx3-578877666-xfsss   1/1     Running   0          9m8s
    nginx3-578877666-xpl2p   0/1     Pending   0          8m56s

    kubectl get pods -n namespace4

    Expected output:

    NAME                      READY   STATUS    RESTARTS   AGE
    nginx4-754b767f45-g9954   1/1     Running   0          4m32s
    nginx4-754b767f45-j4v7v   0/1     Pending   0          4m32s
    nginx4-754b767f45-jk2t7   0/1     Pending   0          4m32s
    nginx4-754b767f45-nhzpf   0/1     Pending   0          4m32s
    nginx4-754b767f45-tv5jj   1/1     Running   0          4m32s

    The same reclaim logic applies for root.b.2 (namespace4): the scheduler reclaims 10 vCPUs borrowed by root.a, and nginx4 gets its 10-vCPU minimum — 2 pods run, 3 stay Pending. At this point, all four quota nodes are running on their guaranteed min resources, with no idle capacity remaining in the cluster.
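One way to picture the reclaim step is a victim-selection loop over the borrowing node's pods. The ranking below (lowest priority first, then newest first) is an illustrative simplification of the factors the scheduler weighs, not its actual algorithm:

```python
# Illustrative sketch of reclaiming borrowed resources: evict the
# lowest-priority, most recently created pods first until enough CPU
# is freed. A simplification, not the scheduler's real algorithm.

from dataclasses import dataclass

@dataclass
class Pod:
    name: str
    cpu: int
    priority: int
    created: int  # creation timestamp (larger = newer)

def select_victims(pods: list, cpu_to_reclaim: int) -> list:
    freed, victims = 0, []
    # Lowest priority first; among equal priorities, newest first.
    for pod in sorted(pods, key=lambda p: (p.priority, -p.created)):
        if freed >= cpu_to_reclaim:
            break
        victims.append(pod.name)
        freed += pod.cpu
    return victims

pods = [
    Pod("nginx1-a", cpu=5, priority=0, created=100),
    Pod("nginx1-b", cpu=5, priority=0, created=200),
    Pod("nginx2-a", cpu=5, priority=10, created=150),
]
print(select_victims(pods, cpu_to_reclaim=10))  # evicts the two priority-0 pods
```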

ResourceFlavor configuration example

ResourceFlavor is a Kubernetes CRD from the Kueue project. It binds an elastic quota node to specific nodes by matching node labels, so pods in that quota are only scheduled to the matching nodes.

Prerequisites

Before you begin, ensure that you have:

Note: Only the nodeLabels field of the ResourceFlavor resource takes effect.

Create a ResourceFlavor

The following example creates a ResourceFlavor named spot that targets nodes with the label instance-type: spot.

apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: "spot"
spec:
  nodeLabels:
    instance-type: spot

Associate a ResourceFlavor with an elastic quota

To bind a ResourceFlavor to a quota node, declare it in the ElasticQuotaTree using the attributes.resourceflavors field.

apiVersion: scheduling.sigs.k8s.io/v1beta1
kind: ElasticQuotaTree
metadata:
  name: elasticquotatree
  namespace: kube-system
spec:
  root:
    name: root
    max:
      cpu: 999900
      memory: 400000Gi
      nvidia.com/gpu: 100000
    min:
      cpu: 999900
      memory: 400000Gi
      nvidia.com/gpu: 100000
    children:
    - name: child
      namespaces:
      - default
      attributes:
        resourceflavors: spot
      max:
        cpu: 99
        memory: 40Gi
        nvidia.com/gpu: 10
      min:
        cpu: 99
        memory: 40Gi
        nvidia.com/gpu: 10

After applying this configuration, pods in the child quota node (namespace default) are only scheduled to nodes with the instance-type: spot label.
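The effect of nodeLabels is a subset match, in the same spirit as a nodeSelector: a node qualifies for the flavor only if its labels include every key-value pair in nodeLabels. An illustrative sketch of that rule:

```python
# Illustrative sketch of nodeLabels matching: a node qualifies for the
# flavor only if its labels contain every key/value pair in nodeLabels.

def matches(flavor_labels: dict, node_labels: dict) -> bool:
    return all(node_labels.get(k) == v for k, v in flavor_labels.items())

flavor = {"instance-type": "spot"}
print(matches(flavor, {"instance-type": "spot", "zone": "a"}))  # True
print(matches(flavor, {"instance-type": "on-demand"}))          # False
```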

What's next

  • For kube-scheduler release notes, see kube-scheduler.

  • kube-scheduler also supports gang scheduling, which requires all pods in an associated group to be scheduled at the same time — if any pod cannot be scheduled, none are. This is suited for big data workloads such as Apache Spark and Apache Hadoop. See Work with gang scheduling.