When you start a container image on an Elastic GPU Service instance by running the docker run --gpus all [Image Name] command, you may receive the docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]] error. This error occurs if the NVIDIA Container Toolkit is not installed. This topic describes how to resolve this issue.
Problem description
When you run the docker run --gpus all [Image Name] command to start a Docker container on an Elastic GPU Service instance, the following error message appears:
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]]
Cause
The NVIDIA Container Toolkit allows Docker containers to access GPU resources. If the NVIDIA Container Toolkit is not installed, Docker cannot find a device driver that provides the gpu capability and returns this error.
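As a rough illustration of how a provisioning script might recognize this failure, the sketch below matches the error text from the problem description against captured Docker output. The function name is_gpu_driver_select_error is hypothetical and is not part of Docker or the NVIDIA Container Toolkit.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit status 0) when the given text contains
# the "could not select device driver" error described in this topic.
is_gpu_driver_select_error() {
    printf '%s\n' "$1" |
        grep -qF 'could not select device driver "" with capabilities: [[gpu]]'
}

# Example with a sample message copied from the problem description.
sample='docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].'
if is_gpu_driver_select_error "$sample"; then
    echo "The NVIDIA Container Toolkit is probably not installed."
fi
```

In practice, the text passed to the function would be the captured standard error output of the docker run command.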
Solution
Verify that the NVIDIA GPU driver is installed on the GPU-accelerated instance.
Note: GPU-accelerated instances do not have pre-installed drivers. You must install the required driver. If the NVIDIA GPU driver is not installed, Docker cannot access the GPU device.
nvidia-smi
If the output shows the driver version, the driver is installed. If not, see Install a Tesla driver or Install a GRID driver.

Verify that Docker is installed on the GPU-accelerated instance.
sudo docker -v
If the output shows the Docker version, Docker is installed. If not, see Install and use Docker and Docker Compose.
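The two checks above can be sketched as a single pre-flight helper. This is a minimal illustration, assuming a POSIX shell. The function name require_cmd is hypothetical. Finding a command on PATH only shows that the binary exists, so running nvidia-smi and docker -v as described above is still the definitive check.

```shell
#!/bin/sh
# Hypothetical helper: report whether a command is available on PATH.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: missing"
        return 1
    fi
}

# Check the two prerequisites from the steps above.
# "|| true" keeps the script going so that both results are printed.
require_cmd nvidia-smi || true
require_cmd docker || true
```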

Install the NVIDIA Container Toolkit.
The commands in this step are for CentOS, Alibaba Cloud Linux, and Ubuntu. For installation commands on other operating systems, see Installing the NVIDIA Container Toolkit.
Alibaba Cloud Linux 3/CentOS 8
# Configure the repository.
curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo | \
  sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
# Install the toolkit.
sudo yum install -y nvidia-container-toolkit
# Restart the Docker service.
sudo systemctl restart docker
Ubuntu
# Configure the repository.
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
# Install the toolkit.
sudo apt-get install -y nvidia-container-toolkit
# Restart the Docker service.
sudo systemctl restart docker
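To decide which of the two command sets applies on a given instance, a script could read the ID field of /etc/os-release. The sketch below is one way to do that. The function name toolkit_pkg_manager is hypothetical, and the ID values it matches (centos, alinux, ubuntu) are assumptions that you should confirm against /etc/os-release on your own instance.

```shell
#!/bin/sh
# Hypothetical helper: map an os-release ID to the package manager used in
# the installation commands above. The ID values are assumptions.
toolkit_pkg_manager() {
    case "$1" in
        centos|alinux) echo "yum" ;;
        ubuntu)        echo "apt-get" ;;
        *)             echo "unsupported"; return 1 ;;
    esac
}

# Read ID inside a command substitution so that the variables defined in
# /etc/os-release do not leak into the current shell.
if [ -r /etc/os-release ]; then
    os_id=$(. /etc/os-release && echo "$ID")
    toolkit_pkg_manager "$os_id" || true
fi
```

If the helper prints unsupported, see Installing the NVIDIA Container Toolkit for the commands for that operating system.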
Verify that the NVIDIA Container Toolkit is installed.
CentOS/Alibaba Cloud Linux
sudo rpm -qa | grep nvidia-container-toolkit
Ubuntu
sudo dpkg -l | grep nvidia-container-toolkit
If the output shows the NVIDIA Container Toolkit version, the toolkit is installed correctly.

Run the docker run --gpus all [Image Name] command to verify that the issue is resolved.
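For a concrete check, one common approach is to run nvidia-smi inside a temporary container. The sketch below wraps that in a small function. The function name gpu_smoke_test is hypothetical, and the default image ubuntu is an assumption; pass your own image name as the first argument.

```shell
#!/bin/sh
# Hypothetical helper: start a temporary container with all GPUs attached
# and run nvidia-smi inside it. The default image "ubuntu" is an assumption.
gpu_smoke_test() {
    image="${1:-ubuntu}"
    # --rm removes the container after nvidia-smi exits.
    docker run --rm --gpus all "$image" nvidia-smi
}
```

Call gpu_smoke_test as a user who is allowed to access the Docker daemon. If the container prints the GPU and driver information instead of the could not select device driver error, the issue is resolved.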