NVIDIA GPUs are widely used to accelerate scientific computing and graphics rendering. Container Service for Kubernetes (ACK) provides unified scheduling and O&M for GPU-accelerated compute resources of various models, which helps improve GPU utilization in a cluster. This topic describes how to add GPU-accelerated nodes to a cluster.
Prerequisites
An ACK Pro cluster or an ACK dedicated cluster is created. New ACK dedicated clusters can no longer be created.
Create a GPU-accelerated node pool
Log on to the ACK console. In the navigation pane on the left, click Clusters.
On the Clusters page, find the cluster to manage and click its name. In the left-side navigation pane, choose .
Click Create Node Pool, set Instance Type to Elastic GPU Service, and set Desired Number of Nodes to the number of nodes you need. For information about the other parameters, see Create and manage a node pool.
For more information about GPU-accelerated instance types, see GPU-accelerated ECS instance types supported by ACK.
Note: If no GPU-accelerated instance type is available, change the specified vSwitches and try again.
If your node operating system is Ubuntu 22.04 or Red Hat Enterprise Linux (RHEL) 9.3 64-bit, the NVIDIA Device Plugin component sets the environment variable NVIDIA_VISIBLE_DEVICES=all for pods by default. After the node runs the systemctl daemon-reload or systemctl daemon-reexec command, the GPU devices may become inaccessible and the NVIDIA Device Plugin may stop working properly. For more information, see How do I resolve the "Failed to initialize NVML: Unknown Error" error when running GPU containers?
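To check whether a running GPU pod is affected by this issue, one option is to run nvidia-smi inside the container and look for the "Failed to initialize NVML: Unknown Error" message. The following Python sketch assumes that kubectl is installed and configured for the cluster and that the container image includes nvidia-smi. The helper names nvml_error_detected and probe_pod are hypothetical and are not part of ACK or the NVIDIA Device Plugin.

```python
import subprocess

# Error string that nvidia-smi prints when the container has lost access to the GPU devices.
NVML_ERROR = "Failed to initialize NVML: Unknown Error"


def nvml_error_detected(output: str) -> bool:
    """Return True if the captured nvidia-smi output contains the NVML initialization error."""
    return NVML_ERROR in output


def probe_pod(pod: str, namespace: str = "default") -> bool:
    """Run nvidia-smi in the pod's default container through kubectl exec.

    Returns True if the NVML error appears in the command output.
    """
    proc = subprocess.run(
        ["kubectl", "exec", "-n", namespace, pod, "--", "nvidia-smi"],
        capture_output=True,
        text=True,
    )
    # The message may be written to stdout or stderr, so check both streams.
    return nvml_error_detected(proc.stdout + proc.stderr)
```

For example, probe_pod("my-gpu-pod", "default") returns True when the pod shows the symptom described above; my-gpu-pod is a placeholder name.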
View GPUs that are attached to GPU-accelerated nodes
After you create a node pool, you can view GPUs that are attached to GPU-accelerated nodes.
Log on to the ACK console. In the navigation pane on the left, click Clusters.
On the Clusters page, find the cluster to manage and click its name. In the navigation pane on the left, choose .
Find the target node and click Details in the Actions column to view the GPUs that are attached to the node.
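You can also check GPUs from the command line. The NVIDIA Device Plugin advertises GPUs on each node as the nvidia.com/gpu extended resource, which appears in the node's allocatable resources. The following Python sketch assumes that kubectl is configured for the cluster. The helper names load_nodes and gpu_allocatable are hypothetical. Nodes without the nvidia.com/gpu resource are skipped, so nodes that use GPU sharing and advertise different resource names may not be counted.

```python
import json
import subprocess

# Extended resource name that the NVIDIA Device Plugin registers for whole GPUs.
GPU_RESOURCE = "nvidia.com/gpu"


def load_nodes() -> dict:
    """Return the parsed output of `kubectl get nodes -o json`."""
    proc = subprocess.run(
        ["kubectl", "get", "nodes", "-o", "json"],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(proc.stdout)


def gpu_allocatable(nodes: dict) -> dict:
    """Map each node name to its allocatable GPU count, skipping nodes with no GPUs."""
    result = {}
    for item in nodes.get("items", []):
        name = item["metadata"]["name"]
        allocatable = item.get("status", {}).get("allocatable", {})
        # Kubernetes reports resource quantities as strings, for example "2".
        count = int(allocatable.get(GPU_RESOURCE, "0"))
        if count > 0:
            result[name] = count
    return result
```

Call gpu_allocatable(load_nodes()) to get a mapping from node name to the number of GPUs that pods can request on that node.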