GPUs provide higher parallel computing power than CPUs for workloads on Windows nodes and can accelerate operations by orders of magnitude. This reduces costs and improves throughput. Windows containers support GPU acceleration for Direct eXtension (DirectX) and all the frameworks that are built on top of DirectX. This topic describes how to install the DirectX device plug-in on Windows nodes and how to enable GPU acceleration for DirectX.

Prerequisites

Background information

DirectX is a type of API. DirectX can enable Windows-based games and multimedia programs to achieve higher execution efficiency, enhance 3D graphics and sound effects, and provide designers with a common hardware driver standard. This reduces the complexity of installing and setting up hardware. DirectX can allow GPUs to perform more general-purpose computing. It also reduces overload and encourages developers to better use GPUs as parallel processors.

Step 1: Install the DirectX device plug-in on Windows nodes

Deploy the DirectX device plug-in as a DaemonSet on Windows nodes.

  1. Create a file named directx-device-plugin-windows.yaml and copy the following code to the file:
    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      labels:
        k8s-app: directx-device-plugin-windows
      name: directx-device-plugin-windows
      namespace: kube-system
    spec:
      revisionHistoryLimit: 10
      selector:
        matchLabels:
          k8s-app: directx-device-plugin-windows
      template:
        metadata:
          annotations:
            scheduler.alpha.kubernetes.io/critical-pod: ""
          labels:
            k8s-app: directx-device-plugin-windows
        spec:
          tolerations:
            - operator: Exists
          # since 1.18, we can specify "hostNetwork: true" for Windows workloads, so we can deploy an application without NetworkReady.
          hostNetwork: true
          affinity:
            nodeAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                nodeSelectorTerms:
                  - matchExpressions:
                      - key: type
                        operator: NotIn
                        values:
                          - virtual-kubelet
                      - key: beta.kubernetes.io/os
                        operator: In
                        values:
                          - windows
                      - key: windows.alibabacloud.com/deployment-topology
                        operator: In
                        values:
                          - "2.0"
                      - key: windows.alibabacloud.com/directx-supported
                        operator: In
                        values:
                          - "true"
                  - matchExpressions:
                      - key: type
                        operator: NotIn
                        values:
                          - virtual-kubelet
                      - key: kubernetes.io/os
                        operator: In
                        values:
                          - windows
                      - key: windows.alibabacloud.com/deployment-topology
                        operator: In
                        values:
                          - "2.0"
                      - key: windows.alibabacloud.com/directx-supported
                        operator: In
                        values:
                          - "true"
          containers:
            - name: directx
              command:
                - pwsh.exe
                - -NoLogo
                - -NonInteractive
                - -File
                - entrypoint.ps1
              # Replace <cn-hangzhou> in the following image address with the ID of the region where the cluster is deployed. 
              image: registry-vpc.cn-hangzhou.aliyuncs.com/acs/directx-device-plugin-windows:v1.0.0
              imagePullPolicy: IfNotPresent
              volumeMounts:
                - name: host-binary
                  mountPath: c:/host/opt/bin
                - name: wins-pipe
                  mountPath: \\.\pipe\rancher_wins
          volumes:
            - name: host-binary
              hostPath:
                path: c:/opt/bin
                type: DirectoryOrCreate
            - name: wins-pipe
              hostPath:
                path: \\.\pipe\rancher_wins
  2. Run the following command to install the DirectX device plug-in:
    kubectl create -f directx-device-plugin-windows.yaml

Step 2: Deploy a Windows workload that has GPU acceleration enabled for DirectX

The DirectX device plug-in can automatically add the class/<interface class GUID> device for Windows containers. This enables Windows containers to access DirectX services on the Elastic Compute Service (ECS) host. For more information, see Devices in containers on Windows. Add the following resources parameter for the Windows workload that requires GPU acceleration and redeploy the workload.
spec:
  ...
  template:
    ...
    spec:
      ...
      containers:
        - name: gpu-user
          ...
+         resources:
+           limits:
+             windows.alibabacloud.com/directx: "1"
+           requests:
+             windows.alibabacloud.com/directx: "1"
Notice The preceding configuration does not allocate all GPU resources on the ECS host to the containers, or prevent other applications from accessing the GPUs on the ECS host. The GPU resources are dynamically scheduled between the ECS host and containers. This means that you can run multiple Windows containers on the ECS host and each container can use DirectX hardware acceleration.
For more information about GPU acceleration in Windows containers, see GPU acceleration in Windows containers.

Check whether GPU acceleration is enabled for the Windows workload

Use the following sample application to check whether the DirectX device plug-in is successfully deployed on Windows nodes.

  1. Create a file named gpu-job-windows.yaml and copy the following code to the file:
    apiVersion: batch/v1
    kind: Job
    metadata:
      labels:
        k8s-app: gpu-job-windows
      name: gpu-job-windows
      namespace: default
    spec:
      parallelism: 1
      completions: 1
      backoffLimit: 3
      manualSelector: true
      selector:
        matchLabels:
          k8s-app: gpu-job-windows
      template:
        metadata:
          labels:
            k8s-app: gpu-job-windows
        spec:
          restartPolicy: Never
          affinity:
            nodeAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                nodeSelectorTerms:
                  - matchExpressions:
                      - key: type
                        operator: NotIn
                        values:
                          - virtual-kubelet
                      - key: beta.kubernetes.io/os
                        operator: In
                        values:
                          - windows
                  - matchExpressions:
                      - key: type
                        operator: NotIn
                        values:
                          - virtual-kubelet
                      - key: kubernetes.io/os
                        operator: In
                        values:
                          - windows
          tolerations:
            - key: os
              value: windows
          containers:
            - name: gpu
              # Replace <cn-hangzhou> in the following image address with the ID of the region where the cluster is deployed. 
              image: registry-vpc.cn-hangzhou.aliyuncs.com/acs/sample-gpu-windows:v1.0.0
              imagePullPolicy: IfNotPresent
              resources:
                limits:
                  windows.alibabacloud.com/directx: "1"
                requests:
                  windows.alibabacloud.com/directx: "1"
    Note
    • Image registry-vpc.{region}.aliyuncs.com/acs/sample-gpu-windows is a sample image for GPU acceleration in Windows containers provided by ACK. This image is built on top of microsoft-windows. For more information, see mcr.microsoft.com/windows.
    • The image file is 15.3 GB in size and may require a long time to download when you use it to deploy applications.
    • In this example, WinMLRunner is used to generate simulated input data. After GPU acceleration is enabled for the gpu-job-windows Job, 100 evaluations are performed based on the Tiny YOLOv2 model to output the final performance data.
  2. Run the following command to create the sample application:
    kubectl create -f gpu-job-windows.yaml
  3. Run the following command to query the log of the gpu-job-windows application:
    kubectl logs -f gpu-job-windows

    Expected output:

    INFO: Executing model of "tinyyolov2-7" 100 times within GPU driver ...
    
    Created LearningModelDevice with GPU: NVIDIA GRID T4-8Q
    Loading model (path = c:\data\tinyyolov2-7\model.onnx)...
    =================================================================
    Name: Example Model
    Author: OnnxMLTools
    Version: 0
    Domain: onnxconverter-common
    Description: The Tiny YOLO network from the paper 'YOLO9000: Better, Faster, Stronger' (2016), arXiv:1612.08242
    Path: c:\data\tinyyolov2-7\model.onnx
    Support FP16: false
    
    Input Feature Info:
    Name: image
    Feature Kind: Image (Height: 416, Width:  416)
    
    Output Feature Info:
    Name: grid
    Feature Kind: Float

    The output shows that GPU acceleration is enabled for the gpu-job-windows application.