
Container Service for Kubernetes:AI workload scheduling

Last Updated: Mar 25, 2026

ACK provides specialized scheduling capabilities for AI training, batch inference, heterogeneous GPU and FPGA workloads, and large-scale batch jobs. Use the tables below to identify the right feature for your scenario.

Elastic scheduling

Mix ECS instances, Elastic Container Instances (ECI), and preemptible instances in a single application, then define priority-based policies that control which resource type is used first during scale-out and which is released first during scale-in.

| Feature | Scenario | References |
| --- | --- | --- |
| Elastic scheduling | Reduce costs by prioritizing cheaper resources during scale-out (for example, exhaust ECS instances before falling back to ECI) and releasing them first during scale-in. Supports subscription, pay-as-you-go, and preemptible instances. | Use Elastic Container Instance-based scheduling and Configure priority-based resource scheduling |
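As a hedged sketch of what such a priority policy can look like: the manifest below assumes a `ResourcePolicy` custom resource in the `scheduling.alibabacloud.com` API group with `units` tried in order; the exact group, version, and field names may differ by ACK version, so verify them against the Configure priority-based resource scheduling topic before use.

```yaml
# Sketch of a priority-based scheduling policy (field names are assumptions).
# Scale-out consumes units top to bottom; scale-in releases in reverse order.
apiVersion: scheduling.alibabacloud.com/v1alpha1
kind: ResourcePolicy
metadata:
  name: cost-first
  namespace: default
spec:
  selector:
    app: web              # applies to pods carrying this label
  strategy: prefer        # try units in the listed order
  units:
  - resource: ecs         # use existing ECS nodes first
  - resource: eci         # then fall back to Elastic Container Instances
```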

Task scheduling

ACK provides gang scheduling, Capacity Scheduling, and Kube Queue for batch processing and AI workloads.

| Feature | Scenario | References |
| --- | --- | --- |
| Gang scheduling | Distributed training or batch jobs that require all tasks to start simultaneously. Without gang scheduling, partially started jobs hold cluster resources and can deadlock the cluster, leaving all jobs stuck in Pending. Gang scheduling starts all correlated processes at the same time, preventing the process group from blocking. | Work with gang scheduling |
| Capacity Scheduling | Multi-team clusters where different teams use resources at different times. Standard Kubernetes resource quotas allocate fixed amounts per namespace, leaving resources idle when a team's quota goes unused. Capacity Scheduling, modeled on the YARN capacity scheduler and built on the Kubernetes scheduling framework, lets teams share idle resources across quota boundaries. | Use Capacity Scheduling |
| Kube Queue (ack-kube-queue) | Large clusters running AI, machine learning, and batch workloads submitted by multiple users. Pod-level scheduling degrades when job counts are high, and jobs from different users can interfere during scheduling. ack-kube-queue manages job queues with customizable policies and an integrated quota system to maximize resource utilization. | Use ack-kube-queue to manage job queues |
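To make gang scheduling concrete, here is a hedged sketch of how a pod joins a gang via labels. The `pod-group.scheduling.sigs.k8s.io/*` label keys follow the upstream scheduler-plugins convention and are assumptions here; confirm the exact keys ACK expects in the Work with gang scheduling topic.

```yaml
# Sketch: one worker of a 4-task distributed training job (label keys assumed).
# The scheduler will not start any pod in the gang until all 4 can be placed.
apiVersion: v1
kind: Pod
metadata:
  name: trainer-0
  labels:
    pod-group.scheduling.sigs.k8s.io/name: distributed-training  # gang name
    pod-group.scheduling.sigs.k8s.io/min-available: "4"          # gang size
spec:
  containers:
  - name: trainer
    image: registry.example.com/training:latest   # placeholder image
    resources:
      limits:
        nvidia.com/gpu: 1   # one whole GPU per worker
```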

Scheduling of heterogeneous resources

ACK provides cGPU, topology-aware CPU scheduling, and topology-aware GPU scheduling features to schedule heterogeneous resources. For the node labels that control GPU scheduling, see Labels used by ACK to control GPUs.

GPU sharing with cGPU

cGPU lets multiple pods share a single GPU while isolating each pod's GPU memory. ACK Pro clusters support the following GPU policies based on your workload type:

| Policy | Use when | Description |
| --- | --- | --- |
| One-pod-one-GPU sharing and memory isolation | Model inference | A single pod uses one GPU, with memory isolation enforced between pods on the same GPU. |
| One-pod-multi-GPU sharing and memory isolation | Developing code for distributed model training | A single pod spans multiple GPUs, with memory isolation on each GPU. |
| Binpack or spread allocation | Improving GPU utilization or ensuring high availability | GPUs are allocated with the binpack algorithm (packing pods densely to raise utilization) or the spread algorithm (distributing pods across GPUs to improve availability). |

See cGPU Professional Edition for setup instructions.
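As a hedged sketch of the one-pod-one-GPU sharing pattern: with cGPU installed, a pod typically requests a slice of GPU memory through an extended resource instead of a whole GPU. The `aliyun.com/gpu-mem` resource name and its unit below are assumptions; verify both in the cGPU Professional Edition topic for your cluster.

```yaml
# Sketch: an inference pod requesting a memory-isolated GPU slice
# (resource name and unit are assumptions -- check the cGPU docs).
apiVersion: v1
kind: Pod
metadata:
  name: inference
spec:
  containers:
  - name: model-server
    image: registry.example.com/inference:latest   # placeholder image
    resources:
      limits:
        aliyun.com/gpu-mem: 3   # GPU memory slice, isolated from co-located pods
```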

Topology-aware CPU scheduling and topology-aware GPU scheduling

For performance-sensitive workloads, the scheduler selects an optimal placement based on the hardware topology of the node: GPU-to-GPU communication paths (NVLink and PCIe Switches) and the non-uniform memory access (NUMA) topology of CPUs.

| Feature | References |
| --- | --- |
| Topology-aware CPU scheduling | Topology-aware CPU scheduling |
| Topology-aware GPU scheduling | Overview |
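A hedged sketch of opting a pod into topology-aware CPU scheduling: such features are commonly enabled per pod via an annotation, and the `cpuset-scheduler` key below is an assumption; the Topology-aware CPU scheduling topic gives the exact annotation name and prerequisites for your ACK version.

```yaml
# Sketch: a latency-sensitive pod requesting NUMA-aware CPU placement
# (annotation key is an assumption -- verify in the referenced topic).
apiVersion: v1
kind: Pod
metadata:
  name: latency-sensitive
  annotations:
    cpuset-scheduler: "true"   # assumed opt-in for topology-aware CPU pinning
spec:
  containers:
  - name: app
    image: registry.example.com/app:latest   # placeholder image
    resources:
      limits:
        cpu: "4"       # integer CPU limit so exclusive cores can be assigned
        memory: 8Gi
```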

FPGA scheduling

Schedule workloads that require FPGA resources to FPGA-accelerated nodes using labels, and manage all FPGA resources in the cluster in a unified manner.

| Feature | References |
| --- | --- |
| FPGA scheduling | Use labels to schedule pods to FPGA-accelerated nodes |
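The label-based placement described above can be sketched with a standard Kubernetes `nodeSelector`. The label key below is hypothetical; substitute the label that your FPGA-accelerated node pool actually carries, as described in the referenced topic.

```yaml
# Sketch: pinning a pod to FPGA-accelerated nodes via nodeSelector
# (the label key is hypothetical -- use your node pool's real label).
apiVersion: v1
kind: Pod
metadata:
  name: fpga-job
spec:
  nodeSelector:
    example.com/accelerator: fpga   # hypothetical label on FPGA nodes
  containers:
  - name: worker
    image: registry.example.com/fpga-app:latest   # placeholder image
```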

Task queue scheduling

ACK lets you customize task queue scheduling for AI workloads, machine learning workloads, and batch jobs.