Container Service for Kubernetes (ACK) Edge clusters facilitate the management of on-premises GPU resources within edge node pools. This topic describes how to add a GPU-accelerated node to an edge node pool in an ACK Edge cluster.
Prerequisites
You have created an ACK Edge cluster.
A GPU driver is installed in the cluster before the node is added. For more information about driver versions, see NVIDIA driver versions supported by ACK.
Limits
Make sure that your cluster has a sufficient node quota. To add more nodes, submit a request to increase the quota. For more information about the quota limits of ACK Edge clusters, see Quota and limits.
When you add a GPU-accelerated node, access to some endpoints is required. You must configure a security group on the node side to remove any restrictions and allow this access. For more information, see Configuration of domain name and IP routing network segment for edge node access.
Procedure
Kubernetes 1.26 or later
When you add a GPU-accelerated node that is equipped with an NVIDIA GPU to an ACK Edge cluster that runs Kubernetes 1.26 or later, you do not need to configure the gpuVersion parameter. The system automatically detects the GPU model and installs the relevant components.
The steps to add a GPU-accelerated node are similar to the steps to add an edge node. For more information, see Add an edge node.
ACK Edge clusters that run Kubernetes 1.26 or later support all series of production-grade GPUs provided by NVIDIA, including Tesla, Hopper, Ada Lovelace, and L.
Kubernetes versions earlier than 1.26
When you add a GPU-accelerated node to an ACK Edge cluster that runs a Kubernetes version earlier than 1.26, the GPU model must meet the requirements in the following table. If you want to use a GPU model that does not meet these requirements, submit a ticket.
OS architecture | GPU model | Kubernetes version |
AMD64/x86_64 | Nvidia_Tesla_T4 | ≥1.16.9-aliyunedge.1 |
AMD64/x86_64 | Nvidia_Tesla_P4 | ≥1.16.9-aliyunedge.1 |
AMD64/x86_64 | Nvidia_Tesla_P100 | ≥1.16.9-aliyunedge.1 |
AMD64/x86_64 | Nvidia_Tesla_V100 | ≥1.18.8-aliyunedge.1 |
AMD64/x86_64 | Nvidia_Tesla_A10 | ≥1.20.11-aliyunedge.1 |
AMD64/x86_64 | Nvidia_L40 | ≥1.26.3-aliyun.1 |
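The version rules above can be sketched as a small pre-check to run before generating the node access script. This is an illustrative sketch, not part of ACK: the function names are hypothetical, the dictionary simply restates the table, and the comparison assumes that only the major.minor.patch part of the version string matters (the vendor suffix such as -aliyunedge.1 is ignored).

```python
import re

# Minimum Kubernetes versions restated from the table above, keyed by GPU model.
MIN_VERSION_BY_GPU = {
    "Nvidia_Tesla_T4": "1.16.9-aliyunedge.1",
    "Nvidia_Tesla_P4": "1.16.9-aliyunedge.1",
    "Nvidia_Tesla_P100": "1.16.9-aliyunedge.1",
    "Nvidia_Tesla_V100": "1.18.8-aliyunedge.1",
    "Nvidia_Tesla_A10": "1.20.11-aliyunedge.1",
    "Nvidia_L40": "1.26.3-aliyun.1",
}


def parse_version(version: str) -> tuple[int, ...]:
    """Extract (major, minor, patch) from a string such as '1.20.11-aliyunedge.1'.

    Assumption: the vendor suffix after the hyphen is ignored when comparing.
    """
    match = re.match(r"v?(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized Kubernetes version: {version!r}")
    return tuple(int(part) for part in match.groups())


def needs_gpu_version(cluster_version: str) -> bool:
    """Return True for clusters earlier than 1.26, which need gpuVersion set."""
    return parse_version(cluster_version)[:2] < (1, 26)


def is_supported(gpu_model: str, cluster_version: str) -> bool:
    """Check a GPU model against the table for clusters earlier than 1.26."""
    minimum = MIN_VERSION_BY_GPU.get(gpu_model)
    if minimum is None:
        # Model not listed in the table: the document says to submit a ticket.
        return False
    return parse_version(cluster_version) >= parse_version(minimum)
```

For example, needs_gpu_version("1.24.6-aliyunedge.1") returns True, so the gpuVersion parameter must be set, and is_supported("Nvidia_Tesla_A10", "1.18.8-aliyunedge.1") returns False because the A10 row requires 1.20.11 or later.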
Log on to the ACK console. In the left-side navigation pane, click Clusters.
On the Clusters page, find the cluster to manage and click its name. In the left-side navigation pane, choose Nodes > Node Pools.
On the Node Pools page, find the target node pool and choose More > Add Existing Node for that node pool.
On the Add Existing ECS Instance page, select Manual as Method and then select an existing instance.
Click Next Step to go to the Specify Instance Information step. You can set the parameters that are used to add the node. For more information about the parameters, see Parameter list.
Note: You must configure the gpuVersion parameter in the script to connect the node to the cloud. For more information about the supported GPU models, see Limits. After you configure the parameters, the connection tool automatically installs nvidia-containerd-runtime. For more information, see nvidia-containerd-runtime.
After you set the parameters, click Next Step. In the Complete step, click Copy to copy the script to the edge node that you want to add. Then, execute the script on the node.
If the following result is returned, the node is added to the cluster.
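As an additional check after the script finishes, the sketch below lists the nodes that report allocatable GPUs. It is a minimal example under stated assumptions: kubectl is installed and configured for the cluster, and GPUs are advertised under the nvidia.com/gpu resource name, which is the name used by the NVIDIA device plugin and may differ in your setup. The function names are hypothetical.

```python
import json
import subprocess


def gpu_nodes(node_list: dict) -> dict[str, int]:
    """Map node name to allocatable GPU count for nodes that report GPUs.

    Assumption: GPUs are exposed under the 'nvidia.com/gpu' resource name.
    """
    result = {}
    for node in node_list.get("items", []):
        allocatable = node.get("status", {}).get("allocatable", {})
        count = int(allocatable.get("nvidia.com/gpu", "0"))
        if count > 0:
            result[node["metadata"]["name"]] = count
    return result


def fetch_nodes() -> dict:
    """Run 'kubectl get nodes -o json' and return the parsed output."""
    completed = subprocess.run(
        ["kubectl", "get", "nodes", "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(completed.stdout)


# Usage (requires access to the cluster):
#   for name, count in gpu_nodes(fetch_nodes()).items():
#       print(f"{name}: {count} GPU(s) allocatable")
```

The parsing is kept separate from the kubectl call so that it can be exercised against saved output without cluster access. If the newly added node does not appear in the result, see the troubleshooting topic in References.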
References
If you have any problems when you add edge nodes, see Diagnose edge node problems.
For more information about how to remove an edge node, see Remove edge nodes.
ACK Edge clusters support edge node autonomy. Edge node autonomy ensures that applications on an edge node can still run as expected when the edge node is disconnected from the cloud. For more information, see Configure edge node autonomy.