When you run the docker run --gpus all [Image name] command to start a container on a GPU-accelerated instance where Docker is installed but the NVIDIA Container Toolkit is missing, the following error may be reported: docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]]. This topic describes how to resolve the issue.
Problem description
When you run the docker run --gpus all [Image name] command to start a container on a GPU-accelerated instance where Docker is installed, the following error is reported:
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]]
Cause
The NVIDIA Container Toolkit is a collection of libraries and utilities that enable Docker to access GPU resources. The error occurs because Docker is installed on the GPU-accelerated instance but the NVIDIA Container Toolkit is not. As a result, Docker cannot find a device driver that provides the gpu capability requested by the --gpus option.
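If docker run is called from a script, the script can recognize this specific failure by matching the daemon message. The following is a minimal sketch, not an official tool. The function name is_missing_gpu_driver_error is hypothetical, and the match relies only on the message text quoted in this topic.

```shell
# Hypothetical helper: returns 0 if the given text contains the
# "could not select device driver" message for the gpu capability.
is_missing_gpu_driver_error() {
    case "$1" in
        *'could not select device driver'*'capabilities: [[gpu]]'*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example that uses the message quoted in this topic.
msg='docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]]'
if is_missing_gpu_driver_error "$msg"; then
    echo "The NVIDIA Container Toolkit is probably missing."
fi
```

In practice, you would pass the captured standard error of the docker run command to the function instead of a fixed string.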
Solution
Run the following command to check whether an NVIDIA GPU driver is installed on the GPU-accelerated instance:
Note: GPU-accelerated instances do not come pre-configured with built-in drivers. You must install the corresponding drivers on GPU-accelerated instances. If no NVIDIA GPU driver is installed on a GPU-accelerated instance, Docker on the instance cannot access the GPU device.
nvidia-smi
If the command output contains a driver version, an NVIDIA GPU driver is installed on the instance. Otherwise, you must install an appropriate driver on the instance. For more information, see Install the NVIDIA Tesla driver or Install the NVIDIA GRID driver.
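To print only the driver version instead of the full status table, the nvidia-smi query options can be used. The following is a sketch under the assumption that nvidia-smi is on the PATH when a driver is installed. The function name check_nvidia_driver is hypothetical.

```shell
# Hypothetical helper: prints the driver version of each GPU, or an
# explanatory message if nvidia-smi cannot be found.
check_nvidia_driver() {
    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi not found: no NVIDIA GPU driver appears to be installed"
        return 1
    fi
    # One line per GPU, without the CSV header row.
    nvidia-smi --query-gpu=driver_version --format=csv,noheader
}

check_nvidia_driver || echo "Install an NVIDIA GPU driver before you continue."
```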
Run the following command to check whether Docker is installed on the GPU-accelerated instance:
sudo docker -v
If the command output contains a Docker version, Docker is installed on the instance. Otherwise, you must install Docker on the instance. For more information, see Install Docker.
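The docker -v command reports only the version of the Docker client. To also confirm that the Docker daemon is running and reachable, you can query the server version. The following is a minimal sketch. The function name check_docker_daemon is hypothetical, and you may need to run it with sudo if your user cannot access the Docker socket.

```shell
# Hypothetical helper: prints the Docker server (daemon) version.
# Returns a non-zero status if docker is missing or the daemon is unreachable.
check_docker_daemon() {
    if ! command -v docker >/dev/null 2>&1; then
        echo "docker not found: Docker does not appear to be installed"
        return 1
    fi
    # This call fails if the daemon is not running or cannot be reached.
    docker version --format '{{.Server.Version}}'
}

check_docker_daemon || echo "Install or start Docker before you continue."
```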
Run the following commands to install the NVIDIA Container Toolkit:
The following commands apply to CentOS, Alibaba Cloud Linux, and Ubuntu. For the commands for other operating systems, see Installing the NVIDIA Container Toolkit.
CentOS or Alibaba Cloud Linux
# Configure the source.
curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo | \
  sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
# Install the NVIDIA Container Toolkit.
sudo yum install -y nvidia-container-toolkit
# Restart Docker.
sudo systemctl restart docker
Ubuntu
# Configure the source.
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
# Install the NVIDIA Container Toolkit.
sudo apt-get install -y nvidia-container-toolkit
# Restart Docker.
sudo systemctl restart docker
Run the following command to check whether the NVIDIA Container Toolkit is installed:
CentOS or Alibaba Cloud Linux
sudo rpm -qa | grep nvidia-container-toolkit
Ubuntu
sudo dpkg -l | grep nvidia-container-toolkit
If the command output contains an NVIDIA Container Toolkit version, the NVIDIA Container Toolkit is installed.
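If the same check must work on both RPM-based and Debian-based systems, a script can choose the package manager automatically. The following is a sketch rather than a definitive implementation. The function name toolkit_version is hypothetical, and the sketch assumes that the package is named nvidia-container-toolkit, as in the installation commands above.

```shell
# Hypothetical helper: prints the installed version of the
# nvidia-container-toolkit package, using rpm or dpkg-query as available.
toolkit_version() {
    pkg=nvidia-container-toolkit
    if command -v rpm >/dev/null 2>&1 && rpm -q "$pkg" >/dev/null 2>&1; then
        # CentOS or Alibaba Cloud Linux
        rpm -q --queryformat '%{VERSION}\n' "$pkg"
    elif command -v dpkg-query >/dev/null 2>&1 &&
         dpkg-query -W -f='${Status}' "$pkg" 2>/dev/null | grep -q 'install ok installed'; then
        # Ubuntu
        dpkg-query -W -f='${Version}\n' "$pkg"
    else
        echo "$pkg is not installed" >&2
        return 1
    fi
}

toolkit_version || echo "Install the NVIDIA Container Toolkit and check again."
```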
Run the docker run --gpus all [Image name] command again. If the command runs without errors, the issue is resolved.
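As an additional check, you can run nvidia-smi inside the container to confirm that the GPU is visible from within it. The following is a minimal sketch. The function name verify_gpu_container is hypothetical, and the sketch assumes that nvidia-smi is available inside the container, which the NVIDIA Container Toolkit normally provides when the --gpus option is used.

```shell
# Hypothetical helper: starts a temporary container from the given image
# with all GPUs attached and runs nvidia-smi inside it.
verify_gpu_container() {
    image=${1:-}
    if [ -z "$image" ]; then
        echo "usage: verify_gpu_container IMAGE" >&2
        return 2
    fi
    # --rm removes the container after nvidia-smi exits.
    docker run --rm --gpus all "$image" nvidia-smi
}

# Usage (replace the placeholder with the name of your own image):
#   verify_gpu_container "[Image name]"
```

If the function prints the same GPU table that nvidia-smi shows on the host, Docker can access the GPU.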