If cGPU Basic Edition is installed in a dedicated Kubernetes cluster, cGPU stops working as expected after you migrate the cluster workloads to a professional Kubernetes cluster, because professional Kubernetes clusters support only cGPU Professional Edition. After the migration is complete, you must upgrade cGPU Basic Edition to cGPU Professional Edition in the professional Kubernetes cluster. This topic describes how to perform the upgrade.

Prerequisites

Workloads have been migrated from a dedicated Kubernetes cluster to a professional Kubernetes cluster, and cGPU Basic Edition was installed in the dedicated Kubernetes cluster before the migration. For more information, see Hot migration from dedicated Kubernetes clusters to professional managed Kubernetes clusters.

Procedure

  1. Log on to the ACK console.
  2. In the left-side navigation pane of the ACK console, click Clusters.
  3. On the Clusters page, find the cluster that you want to manage and click the name of the cluster or click Details in the Actions column. The details page of the cluster appears.
  4. In the left-side navigation pane of the details page, choose Workloads > Jobs.
  5. On the Jobs page, click Create from YAML in the upper-right corner.
  6. On the Create page, set Sample Template to Custom. Copy the following YAML template to the Template section.
    This template is used to create a Job that uninstalls cGPU Basic Edition and modifies the labels of GPU-accelerated nodes.
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: gpushare-migration
      namespace: kube-system
    spec:
      backoffLimit: 0
      template:
        spec:
          serviceAccount: admin
          containers:
          - name: gpushare-migration
            # Replace cn-beijing in the following image address with the ID of the region where the cluster is deployed.
            image: registry-vpc.cn-beijing.aliyuncs.com/acs/gpushare-migration:v0.1.0
            env:
              - name: CHANGE_LABELS_INFO
                value: "cgpu=true::ack.node.gpu.schedule=cgpu,gpushare=true::ack.node.gpu.schedule=share"
          restartPolicy: Never
  7. Click Create. Click the Job name gpushare-migration to view the deployment progress.

    On the details page of the gpushare-migration Job, click the Pods tab. If the pod is in the Completed state, the Job has succeeded.

  8. Install cGPU Professional Edition. For more information, see Step 1: Install ack-ai-installer.
  9. Install a GPU memory inspection tool in the cluster. For more information, see Step 4: Install and use the GPU scheduling inspection tool.
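
The CHANGE_LABELS_INFO value in the Job template in step 6 can be hard to read at a glance. The following minimal sketch splits it into label pairs, assuming the value is a comma-separated list of rules in the form old-label::new-label. This format is inferred from the template and is not confirmed by this topic. The function name parse_label_changes is hypothetical and is not part of the gpushare-migration image.

```python
# Value copied from the Job template in step 6.
CHANGE_LABELS_INFO = (
    "cgpu=true::ack.node.gpu.schedule=cgpu,"
    "gpushare=true::ack.node.gpu.schedule=share"
)


def parse_label_changes(value):
    """Split the value into (old_label, new_label) pairs.

    Assumption: rules are separated by commas, and each rule uses "::"
    to separate the old node label from the label that replaces it.
    """
    changes = []
    for rule in value.split(","):
        old_label, new_label = rule.strip().split("::")
        changes.append((old_label, new_label))
    return changes


# Print one line per rule so the intended relabeling is easy to review.
for old_label, new_label in parse_label_changes(CHANGE_LABELS_INFO):
    print(f"{old_label} -> {new_label}")
```

Under this reading, nodes labeled cgpu=true or gpushare=true are relabeled with the ack.node.gpu.schedule key. If kubectl access is configured for the cluster, running kubectl get nodes -L ack.node.gpu.schedule after the Job completes shows the value of that label for each node.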

What to do next

For more information about how to test the GPU sharing, GPU scheduling, and GPU memory isolation features of cGPU Professional Edition, see Enable GPU sharing.