
Container Service for Kubernetes:Enable dynamic resource overcommitment

Last Updated:Mar 26, 2026

Online workloads typically reserve CPU and memory based on peak estimates, but actual usage is often much lower. This leaves a large pool of allocated-but-idle resources that standard BestEffort pods can share — but without scheduling guarantees or fairness controls. Dynamic resource overcommitment solves both problems: the ack-koordinator component monitors node load in real time, calculates reclaimable capacity, and exposes it as Batch extended resources (kubernetes.io/batch-cpu and kubernetes.io/batch-memory) that BestEffort pods can explicitly request.

To get the most out of this feature, read Pod Quality of Service Classes and Assign Memory Resources to Containers and Pods in the Kubernetes documentation.

How it works

ack-koordinator tracks per-node load continuously and publishes reclaimable capacity as extended resources on each node. BestEffort pods declare explicit requests and limits against these Batch resources, so the ACK scheduler can make informed placement decisions and enforce resource limits through the node's cgroup hierarchy.

The following diagram illustrates why standard resource overcommitment falls short:

(Diagram: limitations of standard resource overcommitment)

Without dynamic overcommitment, the scheduler has no visibility into real node load, so it may place BestEffort pods on already-overloaded nodes. There is also no way to express different resource amounts per pod, so resources cannot be distributed fairly among BestEffort pods.

ack-koordinator introduces three terms to describe reclaimed resource capacity:

Term | Description
Reclaimed | Resources that can be dynamically overcommitted at this moment
Buffered | Reserved resources held back from reclamation
Usage | Actual resource consumption

(Diagram: relationship between Reclaimed, Buffered, and Usage)

QoS classes and Batch resources

Kubernetes assigns each pod a quality of service (QoS) class based on its resource configuration. Batch resources are designed specifically for the BestEffort class:

QoS class | Resource configuration | Use case
Guaranteed | requests == limits for all containers | Latency-sensitive production services
Burstable | requests < limits for at least one container | General online workloads
BestEffort | No requests or limits; use Batch resources instead | Batch jobs and offline tasks

To use dynamic resource overcommitment, set koordinator.sh/qosClass: "BE" on the pod and replace standard resource fields with kubernetes.io/batch-cpu and kubernetes.io/batch-memory.

Billing

No fee is charged to install or use the ack-koordinator component. Note the following:

  • ack-koordinator is a non-managed component. After installation, it occupies worker node resources. Specify per-module resource requests at install time.

  • ack-koordinator can expose Prometheus metrics for features such as resource profiling and fine-grained scheduling. If you enable Prometheus metrics for ack-koordinator and use Managed Service for Prometheus, those metrics count as custom metrics and are billed accordingly. Before enabling, review the Billing topic for Managed Service for Prometheus and read Query the amount of observable data and bills to understand how costs are calculated.

Prerequisites

Before you begin, ensure that you have:

  • Created an ACK Pro cluster.

  • Installed the ack-koordinator component (formerly ack-slo-manager) in the cluster.

Enable dynamic resource overcommitment

Enable and configure the feature by creating or updating a ConfigMap in the kube-system namespace.

Step 1: Create the ConfigMap

Create a file named configmap.yaml with the following content:

apiVersion: v1
kind: ConfigMap
metadata:
  name: ack-slo-config
  namespace: kube-system
data:
  # colocation-config controls dynamic Batch resource calculation and updates.
  # Related features: dynamic resource overcommitment, load-aware scheduling.
  # The value must be valid JSON, so keep comments outside the JSON body.
  # - enable (required): enables Batch resource updates; false resets reclaimed resources to 0.
  # - metricAggregateDurationSeconds: how often (in seconds) node metrics are aggregated; use the default.
  # - cpuReclaimThresholdPercent / memoryReclaimThresholdPercent: reclaim thresholds as a percentage
  #   of allocatable capacity; the default is 65 for both.
  # - memoryCalculatePolicy: how batch-memory capacity is calculated, "usage" (default) or "request".
  colocation-config: |
    {
      "enable": true,
      "metricAggregateDurationSeconds": 60,
      "cpuReclaimThresholdPercent": 60,
      "memoryReclaimThresholdPercent": 70,
      "memoryCalculatePolicy": "usage"
    }
The cpuReclaimThresholdPercent and memoryReclaimThresholdPercent values in this example (60 and 70) are sample values. The actual defaults are 65 for both parameters.

The following table describes each parameter in detail:

Parameter | Type | Default | Description
enable | Boolean | false | Enables dynamic Batch resource updates. Setting this to false resets reclaimable resources to 0.
metricAggregateDurationSeconds | Int | 60 | How often (in seconds) the system aggregates node metrics to recalculate Batch resource capacity. Use the default value.
cpuReclaimThresholdPercent | Int | 65 | Reclaim threshold for batch-cpu resources, as a percentage of allocatable CPU. See Calculate Batch resource capacity.
memoryReclaimThresholdPercent | Int | 65 | Reclaim threshold for batch-memory resources, as a percentage of allocatable memory. See Calculate Batch resource capacity.
memoryCalculatePolicy | String | "usage" | How batch-memory capacity is calculated. "usage": includes unallocated resources and allocated-but-idle resources (based on actual usage of Guaranteed and Burstable pods). "request": includes only unallocated resources (based on memory requests of Guaranteed and Burstable pods).

Calculate Batch resource capacity

ack-koordinator applies the following formula to calculate the amount of Batch resources available on each node.

Usage-based calculation (default, memoryCalculatePolicy: "usage"):

nodeBatchAllocatable = nodeAllocatable × thresholdPercent − podUsage(non-BE) − systemUsage

Request-based calculation (memoryCalculatePolicy: "request", applies to batch-memory only):

nodeBatchAllocatable = nodeAllocatable × thresholdPercent − podRequest(non-BE) − systemUsage

Where:

Variable | Description
nodeAllocatable | Total allocatable CPU or memory on the node
thresholdPercent | The configured reclaim threshold percentage
podUsage(non-BE) | Actual resource usage of Guaranteed and Burstable pods
podRequest(non-BE) | Sum of resource requests of Guaranteed and Burstable pods
systemUsage | System-level resource consumption on the node
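The formulas above can be checked with a short calculation. The following Python sketch uses illustrative numbers only (100 allocatable cores, the default 65% threshold, 30 cores of non-BE usage, and 5 cores of system usage); it is a sanity check, not part of ack-koordinator:

```python
def batch_allocatable(node_allocatable, threshold_percent, non_be, system_usage):
    """nodeBatchAllocatable = nodeAllocatable x thresholdPercent - non-BE - systemUsage.

    `non_be` is podUsage(non-BE) under the usage-based policy, or
    podRequest(non-BE) under the request-based policy. All amounts
    share one unit (millicores for CPU, bytes for memory).
    """
    return node_allocatable * threshold_percent / 100 - non_be - system_usage

# 100 allocatable cores = 100000 millicores, default 65% threshold,
# 30 cores used by Guaranteed/Burstable pods, 5 cores of system usage:
batch_cpu = batch_allocatable(100_000, 65, 30_000, 5_000)
print(batch_cpu)  # 30000.0 millicores, i.e. 30 cores of kubernetes.io/batch-cpu
```

Raising the threshold reclaims more capacity for Batch pods; lowering it leaves a larger safety buffer for online workloads.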

Step 2: Apply the ConfigMap

Check whether the ack-slo-config ConfigMap already exists in the kube-system namespace:

  • If it exists, use kubectl patch to merge your changes without overwriting other settings:

    kubectl patch cm -n kube-system ack-slo-config --patch "$(cat configmap.yaml)"
  • If it does not exist, create it:

    kubectl apply -f configmap.yaml

Apply for Batch resources

After enabling dynamic resource overcommitment, configure pods to request Batch resources.

Important
  • A pod cannot request both Batch resources and standard resources at the same time.

  • For Deployments or other workloads, set the label on template.metadata, not on the workload object itself.

  • ack-koordinator dynamically adjusts available Batch capacity based on real-time node load. In rare cases, kubelet may lag in reporting node status, causing pods to fail scheduling due to insufficient resources. If this happens, delete and recreate the affected pods.

  • Batch resource amounts must be integers. batch-cpu uses the millicore unit (1 core = 1000 millicores).
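The unit rules above can be illustrated with a minimal quantity parser. This Python sketch handles only the suffixes used in this topic ("k" and "Gi") and is not the full Kubernetes quantity grammar:

```python
# Illustrative parser for the Batch resource quantities used in this topic.
# Real Kubernetes quantities support many more suffixes (m, Ki, Mi, Ti, ...).
SUFFIXES = {"k": 10**3, "Gi": 2**30}

def parse_quantity(q: str) -> int:
    for suffix, factor in SUFFIXES.items():
        if q.endswith(suffix):
            return int(q[: -len(suffix)]) * factor
    return int(q)

# batch-cpu is in millicores: "1k" = 1000 millicores = 1 core.
print(parse_quantity("1k"))   # 1000
# batch-memory is in bytes: "1Gi" = 1073741824 bytes.
print(parse_quantity("1Gi"))  # 1073741824
# The node status example below reports 53687091200 bytes, i.e. 50 GiB.
print(53687091200 // 2**30)   # 50
```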

Step 1: Check available Batch resources on the node

# Replace $nodeName with the actual node name.
kubectl get node $nodeName -o yaml

Look for the status.allocatable section in the output:

status:
  allocatable:
    # Unit: millicore. The following example shows 50 cores available.
    kubernetes.io/batch-cpu: 50000
    # Unit: bytes. The following example shows 50 GiB available.
    kubernetes.io/batch-memory: 53687091200

Step 2: Configure the pod to use Batch resources

Add the koordinator.sh/qosClass: "BE" label to the pod metadata and set kubernetes.io/batch-cpu and kubernetes.io/batch-memory in the container's resources field:

metadata:
  labels:
    # Required: sets the pod's QoS class to BestEffort.
    koordinator.sh/qosClass: "BE"
spec:
  containers:
  - resources:
      requests:
        # Unit: millicore. "1k" = 1000 millicores = 1 core.
        kubernetes.io/batch-cpu: "1k"
        # Unit: bytes. Quantity suffixes such as Gi are supported ("1Gi" = 1 GiB).
        kubernetes.io/batch-memory: "1Gi"
      limits:
        kubernetes.io/batch-cpu: "1k"
        kubernetes.io/batch-memory: "1Gi"
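Because a pod must not mix Batch and standard resources, it can help to validate specs before deployment. The following Python sketch checks a container spec modeled as a plain dict; the helper name and dict layout are illustrative, not an ACK API:

```python
# Batch resources and standard resources must not appear in the same container.
BATCH = {"kubernetes.io/batch-cpu", "kubernetes.io/batch-memory"}
STANDARD = {"cpu", "memory"}

def mixes_resources(container: dict) -> bool:
    """Return True if the container requests both Batch and standard resources."""
    names = set()
    for section in ("requests", "limits"):
        names |= set(container.get("resources", {}).get(section, {}))
    return bool(names & BATCH) and bool(names & STANDARD)

be_container = {"resources": {"requests": {"kubernetes.io/batch-cpu": "1k"}}}
bad_container = {"resources": {"requests": {"kubernetes.io/batch-cpu": "1k", "cpu": "1"}}}
print(mixes_resources(be_container))   # False
print(mixes_resources(bad_container))  # True
```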

Example

This example deploys a BestEffort test pod that uses Batch resources and verifies that the resource limits are enforced in the node's cgroup.

  1. Check available Batch resources on the node:

    kubectl get node $nodeName -o yaml

    Expected output:

    status:
      allocatable:
        kubernetes.io/batch-cpu: 50000
        kubernetes.io/batch-memory: 53687091200
  2. Create a file named be-pod-demo.yaml:

    apiVersion: v1
    kind: Pod
    metadata:
      labels:
        koordinator.sh/qosClass: "BE"
      name: be-demo
    spec:
      containers:
      - command:
        - "sleep"
        - "100h"
        image: registry-cn-beijing.ack.aliyuncs.com/acs/stress:v1.0.4
        imagePullPolicy: Always
        name: be-demo
        resources:
          limits:
            kubernetes.io/batch-cpu: "50k"
            kubernetes.io/batch-memory: "10Gi"
          requests:
            kubernetes.io/batch-cpu: "50k"
            kubernetes.io/batch-memory: "10Gi"
      schedulerName: default-scheduler
  3. Deploy the pod:

    kubectl apply -f be-pod-demo.yaml
  4. Verify that the resource limits are reflected in the node's cgroup. Check the CPU limit:

    cat /sys/fs/cgroup/cpu,cpuacct/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod4b6e96c8_042d_471c_b6ef_b7e0686a****.slice/cri-containerd-11111c202adfefdd63d7d002ccde8907d08291e706671438c4ccedfecba5****.scope/cpu.cfs_quota_us

    Expected output (50 cores):

    5000000

    Check the memory limit:

    cat /sys/fs/cgroup/memory/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod4b6e96c8_042d_471c_b6ef_b7e0686a****.slice/cri-containerd-11111c202adfefdd63d7d002ccde8907d08291e706671438c4ccedfecba5****.scope/memory.limit_in_bytes

    Expected output (10 GiB):

    10737418240
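The cgroup values read in step 4 follow arithmetically from the pod's Batch limits, assuming the default CFS period of 100000 microseconds. A quick Python check:

```python
# cpu.cfs_quota_us = cores * cfs_period_us; the CFS period defaults to 100000 us.
CFS_PERIOD_US = 100_000

cpu_cores = 50    # "50k" batch-cpu = 50000 millicores = 50 cores
memory_gib = 10   # "10Gi" batch-memory

print(cpu_cores * CFS_PERIOD_US)  # 5000000, matching cpu.cfs_quota_us
print(memory_gib * 2**30)         # 10737418240, matching memory.limit_in_bytes
```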

Monitor Batch resource usage

ACK clusters integrate with Managed Service for Prometheus. To view Batch resource usage:

  1. Log on to the ACK console. In the left-side navigation pane, click Clusters.

  2. On the Clusters page, click the target cluster name. In the left-side pane, choose Operations > Prometheus Monitoring.

  3. Click the Others tab, then click the k8s-reclaimed-resource tab. This dashboard shows the cluster's colocation benefits and Batch resource capacity at the cluster, node, and pod levels. For more information, see Enable the colocation monitoring feature.

If you have built a custom Prometheus dashboard, use the following metrics to query Batch resource data:

# Allocatable batch-cpu on the node
koordlet_node_resource_allocatable{resource="kubernetes.io/batch-cpu",node="$node"}
# batch-cpu already allocated on the node
koordlet_container_resource_requests{resource="kubernetes.io/batch-cpu",node="$node"}
# Allocatable batch-memory on the node
koordlet_node_resource_allocatable{resource="kubernetes.io/batch-memory",node="$node"}
# batch-memory already allocated on the node
koordlet_container_resource_requests{resource="kubernetes.io/batch-memory",node="$node"}

FAQ

After upgrading from ack-slo-manager to ack-koordinator, does the old overcommitment configuration still work?

Yes. ack-koordinator is backward compatible with the earlier ack-slo-manager protocol. The ACK Pro cluster scheduler can calculate requested and available resources using both the old and new protocol formats simultaneously, so you can upgrade without reconfiguring existing workloads.

The earlier protocol uses:

  • The alibabacloud.com/qosClass pod label

  • The alibabacloud.com/reclaimed field for resource requests and limits

ack-koordinator supports the earlier protocol in versions released no later than July 30, 2023. Migrate existing workloads to the koordinator.sh protocol when convenient.

The following table shows compatibility across component versions:

Scheduler version | ack-koordinator | alibabacloud.com protocol | koordinator.sh protocol
≥1.18 and <1.22.15-ack-2.0 | ≥0.3.0 | Supported | Not supported
≥1.22.15-ack-2.0 | ≥0.8.0 | Supported | Supported

Why does memory usage spike right after the pod starts?

Symptom: Memory usage jumps immediately after a container starts, exceeding the expected kubernetes.io/batch-memory limit.

Cause: When a container is created, ack-koordinator sets the cgroup memory limit based on kubernetes.io/batch-memory. Some applications read the cgroup limit at startup to determine how much memory to allocate internally. If the application reads the cgroup before ack-koordinator has written the limit, it may allocate more memory than intended. The operating system does not immediately reclaim that memory, so usage stays elevated until it naturally drops below the configured limit.

Check: Run the following command inside the container to confirm the memory limit is set correctly:

# Unit: bytes
cat /sys/fs/cgroup/memory/memory.limit_in_bytes
# Expected output example
1048576000

Fix: Configure the application's memory limit in its startup script before the main process begins. This ensures the limit is in place before the application reads the cgroup.
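As a hedged sketch of this fix, the snippet below reads the cgroup v1 memory limit with a fallback default; the sizing policy (half the limit) and the default value are illustrative only. On cgroup v2 hosts the file is /sys/fs/cgroup/memory.max instead:

```python
def read_memory_limit(path="/sys/fs/cgroup/memory/memory.limit_in_bytes",
                      default=2**30):
    """Return the cgroup memory limit in bytes, or `default` if unreadable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return default

# Called in the startup path, before the main process sizes its caches,
# so the application never allocates against a stale or missing limit.
cache_bytes = read_memory_limit() // 2
```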

Why does a BestEffort pod stay in Pending state?

Symptom: A pod configured with Batch resources remains in Pending state and cannot be scheduled.

Check: Run kubectl describe pod <pod-name> and look for scheduling failure events.

Common causes and fixes:

Cause | Fix
Insufficient Batch resources on all nodes | Run kubectl get node <node> -o yaml and check status.allocatable for batch-cpu and batch-memory. Reduce pod requests or wait for resources to be reclaimed.
kubelet has not yet synchronized node status | Delete and recreate the pod. ack-koordinator dynamically adjusts Batch capacity, and kubelet may lag in reporting the updated allocatable resources.
Pod is requesting both Batch and standard resources | A pod cannot request Batch resources and standard resources at the same time. Remove one set of resource fields.

What's next

ack-koordinator provides additional controls to protect online workloads from interference caused by BestEffort pods. See the following topics: