
Container Service for Kubernetes:AI workload scheduling

Last Updated: Mar 25, 2026

ACK provides specialized scheduling capabilities for AI training, batch inference, heterogeneous GPU and FPGA workloads, and large-scale batch jobs. Use the tables below to identify the right feature for your scenario.

Elastic scheduling

Mix ECS instances, Elastic Container Instances (ECI), and preemptible instances in a single application, then define priority-based policies that control which resource type is used first during scale-out and which is released first during scale-in.

| Feature | Scenario | References |
| --- | --- | --- |
| Elastic scheduling | Reduce costs by prioritizing cheaper resources during scale-out (for example, exhaust ECS instances before falling back to ECI) and releasing them first during scale-in. Supports subscription, pay-as-you-go, and preemptible instances. | Use Elastic Container Instance-based scheduling and Configure priority-based resource scheduling |
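As a hedged sketch of what such a priority policy can look like: the manifest below assumes a `ResourcePolicy` custom resource in the `scheduling.alibabacloud.com` API group with `units` tried in order; the exact group, version, and field names may differ by ACK version, so verify them against the Configure priority-based resource scheduling topic before use.

```yaml
# Sketch of a priority-based scheduling policy (field names are assumptions).
# Scale-out consumes units top to bottom; scale-in releases in reverse order.
apiVersion: scheduling.alibabacloud.com/v1alpha1
kind: ResourcePolicy
metadata:
  name: cost-first
  namespace: default
spec:
  selector:
    app: web              # applies to pods carrying this label
  strategy: prefer        # try units in the listed order
  units:
  - resource: ecs         # use existing ECS nodes first
  - resource: eci         # then fall back to Elastic Container Instances
```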

Task scheduling

ACK provides gang scheduling, Capacity Scheduling, and Kube Queue for batch processing and AI workloads.

| Feature | Scenario | References |
| --- | --- | --- |
| Gang scheduling | Distributed training or batch jobs that require all tasks to start simultaneously. Without gang scheduling, partially started jobs hold cluster resources and can deadlock the cluster, leaving all jobs stuck in Pending. Gang scheduling starts all correlated processes at the same time, preventing the process group from blocking. | Work with gang scheduling |
| Capacity Scheduling | Multi-team clusters where different teams use resources at different times. Standard Kubernetes resource quotas allocate fixed amounts per namespace, leaving resources idle when a team's quota goes unused. Capacity Scheduling, modeled on the YARN capacity scheduler and built on the Kubernetes scheduling framework, lets teams share idle resources across quota boundaries. | Use Capacity Scheduling |
| Kube Queue (ack-kube-queue) | Large clusters running AI, machine learning, and batch workloads submitted by multiple users. Pod-level scheduling degrades when job counts are high, and jobs from different users can interfere during scheduling. ack-kube-queue manages job queues with customizable policies and an integrated quota system to maximize resource utilization. | Use ack-kube-queue to manage job queues |
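To make gang scheduling concrete, here is a hedged sketch of how a pod joins a gang via labels. The `pod-group.scheduling.sigs.k8s.io/*` label keys follow the upstream scheduler-plugins convention and are assumptions here; confirm the exact keys ACK expects in the Work with gang scheduling topic.

```yaml
# Sketch: one worker of a 4-task distributed training job (label keys assumed).
# The scheduler will not start any pod in the gang until all 4 can be placed.
apiVersion: v1
kind: Pod
metadata:
  name: trainer-0
  labels:
    pod-group.scheduling.sigs.k8s.io/name: distributed-training  # gang name
    pod-group.scheduling.sigs.k8s.io/min-available: "4"          # gang size
spec:
  containers:
  - name: trainer
    image: registry.example.com/training:latest   # placeholder image
    resources:
      limits:
        nvidia.com/gpu: 1   # one whole GPU per worker
```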

Scheduling of heterogeneous resources

ACK provides cGPU, topology-aware CPU scheduling, and topology-aware GPU scheduling features to schedule heterogeneous resources. For the node labels that control GPU scheduling, see Labels used by ACK to control GPUs.

GPU sharing with cGPU

cGPU lets multiple pods share a single GPU while isolating each pod's GPU memory. ACK Pro clusters support the following GPU policies based on your workload type:

| Policy | Use when | Description |
| --- | --- | --- |
| One-pod-one-GPU sharing and memory isolation | Model inference | A single pod uses one GPU, with memory isolation enforced between pods on the same GPU. |
| One-pod-multi-GPU sharing and memory isolation | Developing code for distributed model training | A single pod spans multiple GPUs, with memory isolation on each GPU. |
| Binpack or spread allocation | Improving GPU utilization or ensuring high availability | GPUs are allocated with the binpack algorithm (packing pods densely to raise utilization) or the spread algorithm (distributing pods across GPUs to improve availability). |

See cGPU Professional Edition for setup instructions.
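As a hedged sketch of the one-pod-one-GPU sharing pattern: with cGPU installed, a pod typically requests a slice of GPU memory through an extended resource instead of a whole GPU. The `aliyun.com/gpu-mem` resource name and its unit below are assumptions; verify both in the cGPU Professional Edition topic for your cluster.

```yaml
# Sketch: an inference pod requesting a memory-isolated GPU slice
# (resource name and unit are assumptions -- check the cGPU docs).
apiVersion: v1
kind: Pod
metadata:
  name: inference
spec:
  containers:
  - name: model-server
    image: registry.example.com/inference:latest   # placeholder image
    resources:
      limits:
        aliyun.com/gpu-mem: 3   # GPU memory slice, isolated from co-located pods
```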

Topology-aware CPU scheduling and topology-aware GPU scheduling

For performance-sensitive workloads, the scheduler selects an optimal placement based on the hardware topology of the node: GPU-to-GPU communication paths (NVLink and PCIe Switches) and the non-uniform memory access (NUMA) topology of CPUs.

| Feature | References |
| --- | --- |
| Topology-aware CPU scheduling | Topology-aware CPU scheduling |
| Topology-aware GPU scheduling | Overview |
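A hedged sketch of opting a pod into topology-aware CPU scheduling: such features are commonly enabled per pod via an annotation, and the `cpuset-scheduler` key below is an assumption; the Topology-aware CPU scheduling topic gives the exact annotation name and prerequisites for your ACK version.

```yaml
# Sketch: a latency-sensitive pod requesting NUMA-aware CPU placement
# (annotation key is an assumption -- verify in the referenced topic).
apiVersion: v1
kind: Pod
metadata:
  name: latency-sensitive
  annotations:
    cpuset-scheduler: "true"   # assumed opt-in for topology-aware CPU pinning
spec:
  containers:
  - name: app
    image: registry.example.com/app:latest   # placeholder image
    resources:
      limits:
        cpu: "4"       # integer CPU limit so exclusive cores can be assigned
        memory: 8Gi
```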

FPGA scheduling

Schedule workloads that require FPGA resources to FPGA-accelerated nodes using labels, and manage all FPGA resources in the cluster in a unified manner.

| Feature | References |
| --- | --- |
| FPGA scheduling | Use labels to schedule pods to FPGA-accelerated nodes |
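The label-based placement described above can be sketched with a standard Kubernetes `nodeSelector`. The label key below is hypothetical; substitute the label that your FPGA-accelerated node pool actually carries, as described in the referenced topic.

```yaml
# Sketch: pinning a pod to FPGA-accelerated nodes via nodeSelector
# (the label key is hypothetical -- use your node pool's real label).
apiVersion: v1
kind: Pod
metadata:
  name: fpga-job
spec:
  nodeSelector:
    example.com/accelerator: fpga   # hypothetical label on FPGA nodes
  containers:
  - name: worker
    image: registry.example.com/fpga-app:latest   # placeholder image
```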

Task queue scheduling

ACK lets you customize task queue scheduling for AI workloads, machine learning workloads, and batch jobs.