Container Service for Kubernetes: Overview

Last Updated: Nov 20, 2023

Container Service for Kubernetes (ACK) allows you to centrally schedule, manage, and maintain heterogeneous computing resources. This improves the utilization of resources such as GPUs in ACK clusters for heterogeneous computing. This topic describes the features that ACK provides to manage heterogeneous resources in these clusters.

Background information

With the emergence of 5G, AI, high-performance computing (HPC), and edge computing services, the demand for computing power keeps increasing. General-purpose computing based on CPUs can no longer meet this demand, whereas heterogeneous computing based on domain-specific architectures (DSAs) can. Heterogeneous computing devices, such as GPUs and field-programmable gate arrays (FPGAs), are widely used in these services.

However, enterprises find it difficult to manage a large number of heterogeneous resources. Alibaba Cloud provides an all-in-one solution for the management of heterogeneous resources. You can use this solution to schedule and manage heterogeneous resources in a unified manner.

Introduction to ACK clusters for heterogeneous computing

ACK allows you to centrally schedule, manage, and maintain heterogeneous resources, such as GPUs, FPGAs, and application-specific integrated circuits (ASICs), in ACK clusters. This improves resource utilization in ACK clusters for heterogeneous computing. The following table describes the features that ACK provides for each type of heterogeneous resource.

Heterogeneous resource

Feature description

GPU

ACK allows you to create clusters that contain NVIDIA T4, P100, and V100 GPUs. For more information, see Create an ACK cluster with GPU-accelerated nodes and Create an ACK dedicated cluster with GPU-accelerated nodes.

  • ACK supports resource requests for individual GPUs in a cluster.

  • ACK supports automatic scaling of GPU-accelerated nodes in a cluster. For more information, see Enable auto scaling based on GPU metrics.

  • ACK supports GPU sharing, GPU scheduling, and computing power isolation. The GPU sharing and scheduling capability provided by Alibaba Cloud allows multiple model inference applications to share one GPU, which reduces costs. The cGPU solution provided by Alibaba Cloud isolates the GPU memory and computing power that are allocated to different applications, without the need to modify application configurations. This improves application stability. For more information, see cGPU overview and Use cGPU to allocate computing power. cGPU supports the following GPU allocation policies:

    • GPU sharing and memory isolation on a one-pod-one-GPU basis: This policy is commonly used in model inference scenarios.

    • GPU sharing and memory isolation on a one-pod-multi-GPU basis: This policy is commonly used when you develop code for distributed model training.

    • GPU allocation by using the binpack or spread algorithm: If you use the binpack algorithm, the system preferentially shares one GPU with multiple pods. This algorithm is suitable for scenarios where high GPU utilization is required. If you use the spread algorithm, the system attempts to allocate one GPU to each pod. This algorithm is suitable for scenarios where the high availability of GPUs is required.

  • ACK supports topology-aware GPU scheduling. This feature retrieves the topology of heterogeneous resources from nodes and enables the scheduler to make scheduling decisions based on node topology information, such as NVLink, Peripheral Component Interconnect Express (PCIe) switches, QuickPath Interconnect (QPI), and remote direct memory access (RDMA) NICs. This lets the scheduler choose placements that deliver better performance. For more information, see Overview.

  • ACK supports GPU resource monitoring. This feature collects the metrics of nodes and applications, detects hardware and software exceptions on devices and sends alerts, and can monitor both dedicated GPUs and shared GPUs. For more information, see Monitor GPU errors and Use Prometheus Service to monitor the GPU resources of a Kubernetes cluster.

FPGA

ACK allows you to create clusters that contain FPGA devices. For more information, see Create an ACK cluster with FPGA-accelerated nodes.

ASIC

ACK allows you to create clusters that contain NETINT ASIC devices and supports resource requests for individual NETINT ASIC cards in a cluster. For more information, see Create an ASIC-accelerated cluster.
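
The difference between the binpack and spread algorithms described in the GPU row can be illustrated with a small simulation. The following Python sketch is a toy model only and is not the ACK scheduler: the function names pick_gpu and place_pods, the use of GPU memory in GiB as the only resource, and the tie-breaking rule (lowest GPU index first) are all assumptions made for illustration.

```python
from typing import List, Optional


def pick_gpu(free_mem: List[int], request: int, policy: str) -> Optional[int]:
    """Return the index of the GPU chosen for one pod, or None if none fits.

    free_mem -- free memory (GiB) left on each GPU of a single node
    request  -- GPU memory (GiB) requested by the pod
    policy   -- "binpack" or "spread" (hypothetical policy names for this sketch)
    """
    candidates = [i for i, free in enumerate(free_mem) if free >= request]
    if not candidates:
        return None
    if policy == "binpack":
        # Prefer the GPU with the least free memory that still fits,
        # so pods are packed onto as few GPUs as possible.
        return min(candidates, key=lambda i: free_mem[i])
    if policy == "spread":
        # Prefer the GPU with the most free memory,
        # so pods are distributed across different GPUs.
        return max(candidates, key=lambda i: free_mem[i])
    raise ValueError(f"unknown policy: {policy}")


def place_pods(gpu_mem: List[int], requests: List[int],
               policy: str) -> List[Optional[int]]:
    """Place pods one by one and return the GPU index chosen for each pod."""
    free_mem = list(gpu_mem)  # copy so the caller's list is not modified
    placement: List[Optional[int]] = []
    for request in requests:
        gpu = pick_gpu(free_mem, request, policy)
        if gpu is not None:
            free_mem[gpu] -= request
        placement.append(gpu)
    return placement


if __name__ == "__main__":
    gpus = [16, 16, 16, 16]  # four GPUs with 16 GiB each (illustrative values)
    pods = [4, 4, 4, 4]      # four pods that each request 4 GiB
    print("binpack:", place_pods(gpus, pods, "binpack"))
    print("spread: ", place_pods(gpus, pods, "spread"))
```

In this model, binpack places all four pods on the first GPU and leaves the other three GPUs free for larger requests, whereas spread places each pod on a different GPU so that a fault on one GPU affects only one pod.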