Container Service for Kubernetes (ACK) supports GPU sharing. To enable GPU sharing,
you must install cGPU on a node. This topic describes how to upgrade the cGPU version
on a node by using a CLI and the ACK console.
Upgrade the cGPU version on a node by using a CLI
The cgpu-installer component runs as a DaemonSet, which is used to install cGPU on
nodes. To upgrade cGPU, you must change the image version of cgpu-installer to the
version to which you want to upgrade.
The following cGPU image versions are supported:
Note: The node on which cGPU is deployed is restarted during the upgrade. Make sure
that no workloads are running on the node before you upgrade cGPU.
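One way to check for remaining workloads is to list the pods scheduled on the node and, if needed, drain it. The following is a minimal sketch rather than part of the documented procedure: the function names are hypothetical, the node name is passed in by the caller, and the --delete-emptydir-data flag assumes kubectl 1.20 or later.

```shell
#!/usr/bin/env bash

# List every pod scheduled on the given node so that remaining
# workloads can be reviewed before the upgrade.
list_pods_on_node() {
  local node="$1"
  kubectl get pods --all-namespaces -o wide \
    --field-selector "spec.nodeName=${node}"
}

# Mark the node unschedulable and evict its pods. DaemonSet pods
# (such as cgpu-installer) are skipped because of --ignore-daemonsets.
drain_node() {
  local node="$1"
  kubectl drain "${node}" --ignore-daemonsets --delete-emptydir-data
}
```

For example, `list_pods_on_node cn-beijing.192.168.XXX.XX1` shows what is still running on that node, and `drain_node` with the same argument evicts it.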
- Run the following command to modify the image version of cgpu-installer.
kubectl edit ds cgpu-installer -n kube-system
In this example, the image version is changed to V0.8.10.
- Uninstall the earlier version of cGPU.
- Log on to the node. For more information about how to log on to a node, see Connect to a Linux instance by using password authentication.
- Run the following command to stop Docker:
- Run the following command to uninstall cGPU:
bash /usr/local/cgpu-installer/uninstall.sh
Note: If /usr/local/cgpu-installer/uninstall.sh does not exist, run the following command to download the uninstall script, and then run the uninstall command again.
wget http://aliacs-k8s-cn-beijing.oss-cn-beijing.aliyuncs.com/gpushare/cgpu-uninstall.sh -O /usr/local/cgpu-installer/uninstall.sh
- Restart the node. For more information about how to restart a node, see Reboot the instance.
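The image change in the first step can also be scripted instead of made in an editor. The sketch below is an illustration under stated assumptions: the DaemonSet is assumed to have a single container, the tag is assumed to be whatever follows the colon in the last path component of the image reference, and the function names are hypothetical. Digest references (`@sha256:...`) are not handled.

```shell
#!/usr/bin/env bash

# Print IMAGE with its tag replaced by NEW_TAG. Only a colon in the last
# path component counts as a tag separator, so a registry port such as
# registry.example.com:5000 is left untouched.
retag_image() {
  local image="$1" new_tag="$2"
  local name="${image##*/}"        # last path component, for example name:tag
  local prefix="${image%"$name"}"  # everything before it, including the final /
  printf '%s%s:%s\n' "$prefix" "${name%%:*}" "$new_tag"
}

# Read the current container name and image from the DaemonSet, then set
# the same image with the new tag. Assumes a single container (index 0).
upgrade_cgpu_installer() {
  local new_tag="$1" name image
  name="$(kubectl get ds cgpu-installer -n kube-system \
    -o jsonpath='{.spec.template.spec.containers[0].name}')" || return 1
  image="$(kubectl get ds cgpu-installer -n kube-system \
    -o jsonpath='{.spec.template.spec.containers[0].image}')" || return 1
  kubectl set image ds/cgpu-installer -n kube-system \
    "${name}=$(retag_image "$image" "$new_tag")"
}
```

The tag string passed to `upgrade_cgpu_installer` must match the tag format that the image repository actually uses, which this topic does not specify.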
Verify the result
After the node is restarted, log on to the node and run the following command to query
the cGPU version:
cat /proc/cgpu_km/version
Expected output:
0.8.10
The output indicates that the cGPU version is upgraded to V0.8.10.
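The version check can be wrapped in a small function so that it returns a usable exit code in scripts. This is a minimal sketch that assumes the version file contains only the version string, as in the expected output above; the function name and the optional file argument are hypothetical.

```shell
#!/usr/bin/env bash

# Compare the cGPU version reported by the kernel module with EXPECTED.
# Returns 0 on a match, 1 on a mismatch, and 2 if the file cannot be read.
check_cgpu_version() {
  local expected="$1" file="${2:-/proc/cgpu_km/version}" actual
  if [ ! -r "$file" ]; then
    echo "cannot read $file" >&2
    return 2
  fi
  # Strip the trailing newline and any other whitespace before comparing.
  actual="$(tr -d '[:space:]' < "$file")"
  if [ "$actual" = "$expected" ]; then
    echo "cGPU version is $actual"
  else
    echo "cGPU version is $actual, expected $expected" >&2
    return 1
  fi
}
```

On the node, `check_cgpu_version 0.8.10` reads /proc/cgpu_km/version by default.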
Upgrade the cGPU version on a node by using the ACK console
The cgpu-installer component runs as a DaemonSet, which is used to install cGPU on
nodes. To upgrade cGPU, you must change the image version of cgpu-installer to the
version to which you want to upgrade.
The following cGPU image versions are supported:
- Run the following command to modify the image version of cgpu-installer.
kubectl edit ds cgpu-installer -n kube-system
In this example, the image version is changed to V0.8.10.
- Remove the node whose cGPU is to be upgraded from the cluster.
- Log on to the ACK console.
- In the left-side navigation pane of the ACK console, click Clusters.
- On the Clusters page, find the cluster that you want to manage and click the name of the cluster
or click Details in the Actions column. The details page of the cluster appears.
- In the left-side navigation pane of the details page, go to the Nodes page.
- On the Nodes page, select the node that you want to remove and click Batch Remove.
- In the Remove Node dialog box, select Drain the Node.
- Click OK.
- Create a node pool.
Create a node pool and add the node that you removed to the node pool.
For more information, see Manage a node pool.
- In the left-side navigation pane of the details page, go to the Node Pools page.
- On the Node Pools page, click Create Node Pool.
- In the Create Node Pool dialog box, configure the parameters.
For more information about the parameters, see Create a dedicated Kubernetes cluster. The following table describes some of the parameters:
Parameter | Description | Recommended value
Quantity | Specify the initial number of nodes in the node pool. | In this example, set Quantity to 0.
Node Label | You can add labels to nodes in the node pool. | If ack-ai-installer is installed in the cluster, set Key to ack.node.gpu.schedule and Value to cgpu. If ack-cgpu is installed in the cluster, set Key to cgpu and Value to true.
- Click Confirm Order to create the node pool.
- Add the node to the node pool.
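After the node is added, the node label from the parameter list can be checked from the CLI. The sketch below maps the installed component to the label described above and lists the matching nodes; the function names are hypothetical and the snippet is an illustration, not part of the console procedure.

```shell
#!/usr/bin/env bash

# Print the node label (key=value) that corresponds to the installed
# component, as described in the Node Label parameter.
cgpu_node_label() {
  case "$1" in
    ack-ai-installer) echo "ack.node.gpu.schedule=cgpu" ;;
    ack-cgpu)         echo "cgpu=true" ;;
    *) echo "unknown component: $1" >&2; return 1 ;;
  esac
}

# List the nodes that carry the label for the given component.
list_cgpu_nodes() {
  local selector
  selector="$(cgpu_node_label "$1")" || return 1
  kubectl get nodes -l "$selector"
}
```

For example, `list_cgpu_nodes ack-ai-installer` lists the nodes labeled ack.node.gpu.schedule=cgpu.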
Verify the result
After the node is added to the node pool, verify that the cGPU version is upgraded.
- Run the following command to query the pods that run cgpu-installer on the node:
kubectl get po -l name=cgpu-installer -n kube-system -o wide
Expected output:
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
cgpu-installer-kkmp6 1/1 Running 0 4d2h 192.168.XXX.XX1 cn-beijing.192.168.XXX.XX1 <none> <none>
cgpu-installer-**2 1/1 Running 0 4d2h 192.168.XXX.XX2 cn-beijing.192.168.XXX.XX2 <none> <none>
cgpu-installer-**3 1/1 Running 0 4d2h 192.168.XXX.XX3 cn-beijing.192.168.XXX.XX3 <none> <none>
- Run the following command to log on to the pod named cgpu-installer-kkmp6:
kubectl exec -ti cgpu-installer-kkmp6 -n kube-system -- bash
- Run the following command to query the current cGPU version:
nsenter -t 1 -i -p -n -u -m -- cat /proc/cgpu_km/version
Expected output:
0.8.10
The output indicates that the cGPU version is V0.8.10.
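When several nodes run cgpu-installer, the two commands above can be combined into a loop that checks every pod without an interactive shell. This is a hedged sketch: the function name is hypothetical, and it assumes that nsenter is available in the cgpu-installer container, as the preceding step implies.

```shell
#!/usr/bin/env bash

# Print the cGPU version reported through each cgpu-installer pod and
# return non-zero if any of them differs from EXPECTED.
check_all_cgpu_versions() {
  local expected="$1" pod actual rc=0
  # "-o name" prints entries such as pod/cgpu-installer-kkmp6.
  for pod in $(kubectl get po -l name=cgpu-installer -n kube-system -o name); do
    actual="$(kubectl exec "${pod#pod/}" -n kube-system -- \
      nsenter -t 1 -i -p -n -u -m -- cat /proc/cgpu_km/version | tr -d '[:space:]')"
    if [ "$actual" = "$expected" ]; then
      echo "${pod#pod/}: $actual"
    else
      echo "${pod#pod/}: ${actual:-unknown} (expected $expected)"
      rc=1
    fi
  done
  return "$rc"
}
```

Running `check_all_cgpu_versions 0.8.10` prints one line per pod and exits with a non-zero status if a node still reports a different version.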