Container Service for Kubernetes: Overview of ACK clusters for heterogeneous computing

Last Updated: Jul 08, 2025

Alibaba Cloud Container Service for Kubernetes (ACK) supports unified scheduling and operational management of various heterogeneous computing resources, which improves resource utilization in heterogeneous computing clusters. This topic describes the heterogeneous computing clusters supported by ACK.

Introduction to ACK clusters for heterogeneous computing

ACK supports unified scheduling and operational management of heterogeneous resources, such as GPUs, application-specific integrated circuits (ASICs), and remote direct memory access (RDMA) devices, which improves cluster resource utilization. The following sections describe the heterogeneous resources and the features that ACK supports for each.

GPU

ACK allows you to create clusters that contain the NVIDIA T4, P100, V100, and A100 GPUs.

  • ACK supports resource requests for individual GPUs (see the dedicated GPU request sketch after this list).

  • ACK supports auto scaling based on GPU metrics (see the autoscaling sketch after this list).

  • ACK supports GPU sharing and computing power isolation. GPU sharing, developed by Alibaba Cloud, allows multiple model inference applications to run on the same GPU at the same time, which significantly reduces costs. The cGPU solution provided by Alibaba Cloud isolates GPU memory and computing power without requiring any changes to application containers, which improves application stability. The following list describes the supported GPU allocation policies (see the shared GPU sketch after this list).

    • GPU sharing on a one-pod-one-GPU basis: This policy is commonly used in model inference scenarios.

    • GPU sharing on a one-pod-multi-GPU basis: This policy is commonly used in distributed training scenarios.

    • Binpack allocation policy: With the binpack policy, the system preferentially schedules multiple pods onto the same GPU before it uses the next GPU. This policy is suitable for scenarios where high GPU utilization must be guaranteed.

    • Spread allocation policy: With the spread policy, the system attempts to spread pods across different GPUs so that each pod uses its own GPU. This policy is suitable for scenarios where the high availability of GPUs must be guaranteed.

  • ACK supports topology-aware GPU scheduling: This feature retrieves the topology of heterogeneous resources on nodes, including NVLink connections, Peripheral Component Interconnect Express (PCIe) switches, QuickPath Interconnect (QPI), and RDMA NICs, and enables the scheduler to make scheduling decisions based on this topology information. This optimizes scheduling and achieves optimal performance.

  • ACK supports GPU resource monitoring: This feature collects the metrics of nodes and applications, detects and sends alerts on device (software and hardware) exceptions, and can be used to monitor dedicated GPUs and shared GPUs.
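
    As a minimal illustration of a dedicated GPU request, the following pod sketch asks for one whole GPU through the standard NVIDIA device plugin resource name nvidia.com/gpu. The pod name and image are placeholders, not values from this documentation.

    ```yaml
    # Minimal sketch: request one dedicated GPU for a pod.
    # The pod name and image are placeholders; nvidia.com/gpu is the resource
    # name exposed by the NVIDIA device plugin on GPU-accelerated nodes.
    apiVersion: v1
    kind: Pod
    metadata:
      name: gpu-exclusive-example
    spec:
      containers:
      - name: main
        image: registry.example.com/inference:latest   # placeholder image
        resources:
          limits:
            nvidia.com/gpu: 1   # one whole GPU is allocated exclusively to this pod
    ```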
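
    Auto scaling based on GPU metrics is typically wired up with a HorizontalPodAutoscaler that reads a GPU metric published by a metrics adapter. The following sketch is illustrative only: it assumes a custom metrics adapter exposes the DCGM GPU utilization metric (named DCGM_FI_DEV_GPU_UTIL here) per pod, and the Deployment name, metric name, and target value are assumptions rather than values defined in this topic.

    ```yaml
    # Illustrative sketch: scale a GPU inference Deployment on GPU utilization.
    # Assumes a custom metrics adapter exposes the per-pod DCGM metric
    # DCGM_FI_DEV_GPU_UTIL; the workload name and threshold are placeholders.
    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: gpu-inference-hpa
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: gpu-inference          # placeholder Deployment name
      minReplicas: 1
      maxReplicas: 10
      metrics:
      - type: Pods
        pods:
          metric:
            name: DCGM_FI_DEV_GPU_UTIL   # assumed per-pod GPU utilization metric
          target:
            type: AverageValue
            averageValue: "70"           # scale out when average utilization exceeds ~70%
    ```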
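
    With shared GPU scheduling, a pod requests a slice of GPU memory instead of a whole card. The sketch below assumes that the shared GPU scheduling (cGPU) components are installed in the cluster and that they expose the extended resource aliyun.com/gpu-mem measured in GiB; verify the exact resource names against the shared GPU documentation for your cluster version.

    ```yaml
    # Hedged sketch: request a 4 GiB slice of GPU memory from a shared GPU.
    # Assumes the shared GPU scheduling (cGPU) components expose the extended
    # resource aliyun.com/gpu-mem; names may differ by component version.
    apiVersion: v1
    kind: Pod
    metadata:
      name: gpu-share-example
    spec:
      containers:
      - name: main
        image: registry.example.com/inference:latest   # placeholder image
        resources:
          limits:
            aliyun.com/gpu-mem: 4   # 4 GiB of GPU memory; the GPU is shared with other pods
    ```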

ASIC

ACK allows you to create clusters that contain NETINT ASIC devices and supports resource requests for individual NETINT ASIC cards.

eRDMA

ACK allows you to create clusters that contain eRDMA devices.

  • You can use Arena to submit distributed deep learning jobs that use eRDMA devices.

  • You can create training jobs that require high bandwidth, such as distributed deep learning jobs.

GPU instance types supported by ACK

ACK supports multiple GPU-accelerated compute-optimized instance families. If you want to add GPU nodes to an ACK cluster, you need to select from the Elastic Compute Service (ECS) instance families listed below.

Note

ACK does not support selecting vGPU-accelerated instances as cluster nodes in the console. For more information, see Does ACK support vGPU-accelerated instances?

ASIC instance types supported by ACK

If you want to add ASIC nodes to an ACK cluster, you can select the instance type ecs.video-trans.26xhevc.

eRDMA instance types supported by ACK

ACK supports multiple eRDMA compute-optimized instance families. You can select from the ECS instance families listed below. For more information, see Enable eRDMA on an enterprise-level instance and Enable eRDMA on a GPU-accelerated instance.