- How do I update the kernel of a GPU node?
- How do I fix a container startup exception on a GPU node?
- Troubleshoot issues in GPU monitoring
- How do I fix the error that the number of available GPUs is less than the actual number of GPUs?
- How do I fix errors that occur on GPU nodes when kubelet or Docker is restarted?
- Fix the issue that the IDs of GPUs are changed after a GPU-accelerated ECS instance is restarted or replaced
- Does ACK support vGPU-accelerated instances?
Does ACK support vGPU-accelerated instances?
A vGPU-accelerated instance works properly only after you purchase an NVIDIA GRID license and set up a GRID license server. Alibaba Cloud does not provide GRID license servers, so the vGPU-accelerated instances in a newly created Container Service for Kubernetes (ACK) cluster cannot be used directly. For this reason, the ACK console no longer allows you to select vGPU-accelerated instances when you create a cluster.
This restriction applies to the vGPU-accelerated Elastic Compute Service (ECS) instance types whose names are prefixed with ecs.vgn5i, ecs.vgn6i, ecs.vgn7i, or ecs.sgn7i. If your workloads depend heavily on vGPU-accelerated instances, you can purchase NVIDIA GRID licenses and set up GRID license servers on your own.
- GRID license servers are required to renew the NVIDIA driver licenses of vGPU-accelerated instances.
- You must purchase vGPU-accelerated ECS instances and familiarize yourself with the NVIDIA documentation about how to set up GRID license servers. For more information, see the NVIDIA official website.
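To check whether an instance type falls under the restricted families listed above, a prefix match on the type name is enough. The following is a minimal sketch: the four prefixes come from this topic, while the helper name `is_vgpu_instance_type` and the sample instance type names are assumptions used only for illustration.

```python
# Prefixes of the vGPU-accelerated ECS instance families named in this topic.
VGPU_PREFIXES = ("ecs.vgn5i", "ecs.vgn6i", "ecs.vgn7i", "ecs.sgn7i")


def is_vgpu_instance_type(instance_type: str) -> bool:
    """Return True if the instance type name starts with a vGPU family prefix.

    Hypothetical helper, not part of any Alibaba Cloud SDK.
    """
    # str.startswith accepts a tuple and matches any of its entries.
    return instance_type.strip().lower().startswith(VGPU_PREFIXES)


if __name__ == "__main__":
    # Sample names are illustrative; substitute the types you plan to use.
    for name in ("ecs.vgn6i-m4.xlarge", "ecs.gn6i-c4g1.xlarge"):
        print(name, is_vgpu_instance_type(name))
```

A check like this can be run over a planned node list before you create a node pool, so that vGPU-accelerated instances are routed through the custom image procedure instead of the standard cluster creation flow.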
After you have set up a GRID license server, perform the following steps to add a vGPU-accelerated instance to your ACK cluster.
Add a vGPU-accelerated instance to your ACK cluster
1. Submit a ticket to request permission to use custom images.
2. Create a custom image that is based on CentOS 7.X or Alibaba Cloud Linux 2. The custom image must have the NVIDIA GRID driver installed and an NVIDIA license configured. For more information, see Create a custom image from an instance and Install a GRID driver on a Linux vGPU-accelerated instance.
3. Create a node pool. For more information, see Manage node pools.
4. Add a vGPU-accelerated instance to the node pool that you created in Step 3. For more information, see Add existing ECS instances to an ACK cluster.
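After the instance joins the cluster, you may want to confirm that the GRID driver in the custom image actually obtained a license. The sketch below assumes that `nvidia-smi -q` on the node prints a `License Status` field, as NVIDIA's vGPU software documentation describes. The function names are hypothetical, and the exact wording of the status value can differ between driver versions, so treat the matching rule as an assumption to adjust.

```python
import re
import subprocess
from typing import List


def parse_license_status(smi_output: str) -> List[str]:
    """Extract every 'License Status' value from `nvidia-smi -q` text."""
    pattern = re.compile(r"^\s*License Status\s*:\s*(.+?)\s*$", re.MULTILINE)
    return pattern.findall(smi_output)


def is_licensed(smi_output: str) -> bool:
    """Return True only if at least one status is found and all start with 'Licensed'.

    Assumption: a licensed GPU reports a value beginning with 'Licensed'.
    """
    statuses = parse_license_status(smi_output)
    return bool(statuses) and all(s.lower().startswith("licensed") for s in statuses)


if __name__ == "__main__":
    try:
        # Run on the vGPU-accelerated node itself; requires the NVIDIA driver.
        result = subprocess.run(
            ["nvidia-smi", "-q"], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError) as err:
        print(f"Could not run nvidia-smi: {err}")
    else:
        print("Licensed" if is_licensed(result.stdout) else "Not licensed or status not found")
```

If the check reports that the node is not licensed, review the license configuration in the custom image before you schedule GPU workloads to the node.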
What to do next: Renew the NVIDIA driver license of a vGPU-accelerated instance in an ACK cluster
For the renewal procedure, see Renew the NVIDIA driver license of a vGPU-accelerated instance in an ACK cluster.