Add GPU-accelerated nodes to a cluster - Container Service for Kubernetes

NVIDIA GPUs are used to accelerate workloads such as scientific computing and graphics rendering. Container Service for Kubernetes (ACK) supports unified scheduling and operations management for various models of compute-optimized GPU resources, which helps improve the utilization of GPU resources in a cluster. This topic describes how to add GPU-accelerated nodes to a cluster by creating a node pool, and how to view the GPUs that are attached to those nodes.

Prerequisites

You have created an ACK Pro cluster or an ACK dedicated cluster. New ACK dedicated clusters can no longer be created.

Create a GPU-accelerated node pool

Log on to the ACK console. In the left navigation pane, click Clusters.
On the Clusters page, find the cluster to manage and click its name. In the left navigation pane, choose Nodes > Node Pools.
Click Create Node Pool, set Instance Type to Elastic GPU Service, and set Expected Nodes to the required number of nodes. For more information about other parameters, see Create and manage a node pool.
For more information about GPU-accelerated instance types, see GPU-accelerated ECS instance types supported by ACK.
Note
- If no GPU-accelerated instance type is available, change the specified vSwitches and try again.
- If your node operating system is Ubuntu 22.04 or Red Hat Enterprise Linux (RHEL) 9.3 64-bit, the ack-nvidia-device-plugin component sets the NVIDIA_VISIBLE_DEVICES=all environment variable for pods by default. After the systemctl daemon-reload or systemctl daemon-reexec command is run on such a node, containers on the node may fail to access GPU devices, which causes the NVIDIA Device Plugin to stop working. For more information, see What do I do if the "Failed to initialize NVML: Unknown Error" error occurs when I run a GPU container?.
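
The failure described in the note above can be detected from the output of nvidia-smi. The following Python sketch is one possible check, not an official ACK tool. It assumes that kubectl is configured for the cluster and that the container image includes nvidia-smi. The function names and the default namespace are hypothetical.

```python
import subprocess

# Error text taken from the FAQ title referenced in the note above.
NVML_ERROR = "Failed to initialize NVML: Unknown Error"


def gpu_access_lost(nvidia_smi_output: str) -> bool:
    """Return True if the nvidia-smi output contains the NVML error."""
    return NVML_ERROR in nvidia_smi_output


def check_pod(pod: str, namespace: str = "default") -> bool:
    """Run nvidia-smi in a pod and report whether GPU access is lost.

    Assumption: kubectl can reach the cluster and the first container
    of the pod has nvidia-smi on its PATH.
    """
    result = subprocess.run(
        ["kubectl", "exec", "-n", namespace, pod, "--", "nvidia-smi"],
        capture_output=True,
        text=True,
        check=False,  # nvidia-smi exits non-zero when NVML fails
    )
    return gpu_access_lost(result.stdout + result.stderr)
```

For example, check_pod("my-gpu-pod") returns True if the pod named my-gpu-pod (a placeholder) reports the NVML error.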

View GPUs that are attached to GPU-accelerated nodes

After you create a node pool, you can view GPUs that are attached to GPU-accelerated nodes.

Log on to the ACK console. In the left navigation pane, click Clusters.
On the Clusters page, find the cluster to manage and click its name. In the left navigation pane, choose Nodes > Nodes.
Find the target node and click Details in the Actions column to view the GPUs that are attached to it.
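
As an alternative to the console, the GPU count of each node can be read from the Kubernetes node objects. The following Python sketch assumes that GPUs are exposed through the nvidia.com/gpu extended resource, which is the resource name that the NVIDIA device plugin registers. Clusters that use shared GPU scheduling may report a different resource name. The node names in the sample data and the function names are hypothetical.

```python
import json
import subprocess

# Assumption: GPUs are advertised under this extended resource name.
GPU_RESOURCE = "nvidia.com/gpu"


def gpu_counts(node_list: dict) -> dict:
    """Map node name to GPU count for nodes that report at least one GPU."""
    counts = {}
    for node in node_list.get("items", []):
        name = node["metadata"]["name"]
        capacity = node.get("status", {}).get("capacity", {})
        gpus = int(capacity.get(GPU_RESOURCE, "0"))
        if gpus > 0:
            counts[name] = gpus
    return counts


def load_nodes_from_kubectl() -> dict:
    """Fetch all node objects as JSON. Requires a configured kubectl."""
    out = subprocess.run(
        ["kubectl", "get", "nodes", "-o", "json"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return json.loads(out)


# Hypothetical sample that mimics the shape of `kubectl get nodes -o json`.
SAMPLE = {
    "items": [
        {"metadata": {"name": "gpu-node-1"},
         "status": {"capacity": {"cpu": "8", "nvidia.com/gpu": "2"}}},
        {"metadata": {"name": "cpu-node-1"},
         "status": {"capacity": {"cpu": "4"}}},
    ]
}

print(gpu_counts(SAMPLE))  # prints {'gpu-node-1': 2}
```

Against a live cluster, call gpu_counts(load_nodes_from_kubectl()) instead of passing the sample data.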