This topic describes how to disable cGPU when you use a YAML file to create a container that uses a shared graphics processing unit (GPU). cGPU is used to isolate GPU resources that are allocated to different containers when these containers share the same physical GPU.
Prerequisites
Procedure
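The original step-by-step procedure is not reproduced here, but the following is a minimal sketch of the kind of workload YAML file this topic refers to. It assumes that shared GPU memory is requested through the aliyun.com/gpu-mem extended resource and that cGPU isolation is disabled for the container by setting a CGPU_DISABLE environment variable to "true"; both names are assumptions, so confirm them against the cGPU component documentation for your cluster. The StatefulSet and container names are inferred from the binpack-0 pod used in the Result section, and the image is a placeholder.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: binpack
spec:
  replicas: 1
  serviceName: binpack
  selector:
    matchLabels:
      app: binpack
  template:
    metadata:
      labels:
        app: binpack
    spec:
      containers:
      - name: binpack
        # Placeholder image: any GPU-enabled TensorFlow benchmark image.
        image: <your-tensorflow-gpu-image>
        env:
        # Assumption: this environment variable disables cGPU memory isolation
        # for this container only.
        - name: CGPU_DISABLE
          value: "true"
        resources:
          limits:
            # Request 3 GiB of shared GPU memory, which matches the 3 GiB
            # figure mentioned in the Result section.
            aliyun.com/gpu-mem: 3

Apply the file with kubectl apply -f and wait until the binpack-0 pod is in the Running state before you perform the checks in the Result section.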
Result
You can use one of the following methods to check whether cGPU is disabled for the container:
- Method 1: Run the following command to print the last line of the application log:
kubectl logs binpack-0 --tail=1
2020-08-25 08:14:54.927965: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 15024 MB memory) -> physical GPU (device: 0, name: Tesla V100-SXM2-16GB, pci bus id: 0000:00:07.0, compute capability: 7.0)
The log shows that 15,024 MiB of GPU memory is available to the containerized application. This indicates that cGPU is disabled for the container. If cGPU were enabled, the application could use only 3 GiB of GPU memory.
- Method 2: Run the following command to log on to the container and view the amount of GPU memory that is allocated to the container:
kubectl exec binpack-0 nvidia-smi
Tue Aug 25 08:23:33 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.01    Driver Version: 418.87.01    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  Off  | 00000000:00:07.0 Off |                    0 |
| N/A   33C    P0    55W / 300W |  15453MiB / 16130MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
The output shows that the total GPU memory of the host is 16,130 MiB and that 15,453 MiB is allocated to the container. This indicates that cGPU is disabled for the container. If cGPU were enabled, only 3 GiB of GPU memory would be allocated to the container.
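As an optional additional check, not part of the two methods above, you can confirm that the environment variable from the Procedure sketch is present in the pod spec. The command uses only standard kubectl output formatting; the CGPU_DISABLE name is the same assumption as in that sketch:
kubectl get pod binpack-0 -o jsonpath='{.spec.containers[0].env}'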