
Elastic GPU Service: GPU unavailable due to an nvidia-fabricmanager version mismatch

Last Updated: Jun 21, 2026

On GPU-accelerated instances such as ebmgn7 and ebmgn7e that run Ubuntu, if you installed the nvidia-fabricmanager service from an installation package, the apt-daily service may automatically update the package. The updated nvidia-fabricmanager version then no longer matches the installed Tesla driver version, so the nvidia-fabricmanager service fails to start and the GPU becomes unavailable. This topic describes how to resolve and prevent this issue.

Problem description

After nvidia-fabricmanager is installed from an installation package, the GPU may stop working as expected. When you check the service status, the output shows that nvidia-fabricmanager failed to start, with an error similar to the following:

root@xxx:~# systemctl status nvidia-fabricmanager
× nvidia-fabricmanager.service - NVIDIA fabric manager service
     Loaded: loaded (/lib/systemd/system/nvidia-fabricmanager.service; enabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Mon 2024-09-09 18:05:58 CST; 22s ago
    Process: 36178 ExecStart=/usr/bin/nv-fabricmanager -c /usr/share/nvidia/nvswitch/fabricmanager.cfg (code=exited, status=1/FAILURE)
        CPU: 66ms
Sep 09 18:05:58 iZ2xxx systemd[1]: Starting NVIDIA fabric manager service...
Sep 09 18:05:58 iZ2xxx nv-fabricmanager[36180]: fabric manager NVIDIA GPU driver interface version 550.90.07 don't match with driver version 550.54.15. Please up
Sep 09 18:05:58 iZ2xxx nv-fabricmanager[36180]: fabric manager NVIDIA GPU driver interface version 550.90.07 don't match with driver version 550.54.15. Please up
Sep 09 18:05:58 iZ2xxx systemd[1]: nvidia-fabricmanager.service: Control process exited, code=exited, status=1/FAILURE
Sep 09 18:05:58 iZ2xxx systemd[1]: nvidia-fabricmanager.service: Failed with result 'exit-code'.
Sep 09 18:05:58 iZ2xxx systemd[1]: Failed to start NVIDIA fabric manager service.
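Because systemctl truncates long journal lines (note the cut-off "Please up" above), it can be easier to read the full messages with journalctl and pull out the two version numbers. The following is a minimal sketch that assumes the log wording shown above; the function name fm_mismatch_versions is hypothetical and is not part of any NVIDIA tool.

```shell
# Hypothetical helper: print "<fabricmanager-interface-version> <driver-version>"
# from the first version-mismatch message read on standard input.
# Assumes the nv-fabricmanager log wording shown in the status output above.
fm_mismatch_versions() {
    sed -n 's/.*driver interface version \([0-9.]*[0-9]\) don.t match with driver version \([0-9.]*[0-9]\).*/\1 \2/p' |
        head -n 1
}

# Example usage on the instance:
#   journalctl -u nvidia-fabricmanager --no-pager | fm_mismatch_versions
```

For the log above, the function prints the fabric manager interface version followed by the loaded driver version, which tells you which side to change.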

Cause

If nvidia-fabricmanager is installed from an installation package on a GPU-accelerated compute-optimized instance that runs Ubuntu, the apt-daily service can automatically update nvidia-fabricmanager to a newer version while the Tesla driver stays at its installed version. At startup, nvidia-fabricmanager checks that its driver interface version matches the driver version; in the log above, interface version 550.90.07 does not match driver version 550.54.15. Because the versions differ, nvidia-fabricmanager exits and the GPU does not work as expected.
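To check whether apt already has a newer nvidia-fabricmanager version queued, which an automatic run could install, you can filter the output of apt list --upgradable. This is a sketch that assumes the usual "name/suite version arch [upgradable from: old-version]" line format; fm_pending_upgrades is a hypothetical name.

```shell
# Hypothetical helper: read `apt list --upgradable` output on standard input
# and print "<package> <candidate-version> <installed-version>" for every
# nvidia-fabricmanager package that apt would upgrade.
fm_pending_upgrades() {
    awk '/^nvidia-fabricmanager/ && /upgradable from:/ {
        pkg = $1
        sub(/\/.*/, "", pkg)      # drop the "/suite" part of "name/suite"
        from = $NF
        sub(/\]$/, "", from)      # drop the closing bracket
        print pkg, $2, from
    }'
}

# Example usage on the instance:
#   apt list --upgradable 2>/dev/null | fm_pending_upgrades
# The schedule of the automatic apt runs can be listed with:
#   systemctl list-timers 'apt-daily*'
```

An empty result means apt currently has no newer nvidia-fabricmanager package to install.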

Solution

The GPU works as expected only when the nvidia-fabricmanager version matches the Tesla driver version. To resolve a version mismatch, or to prevent one, perform the following steps:

  1. Check the nvidia-fabricmanager version and the Tesla driver version.

    • Run the following command to check the nvidia-fabricmanager version:

      sudo dpkg --list | grep nvidia-fabricmanager

      In the following example output, the nvidia-fabricmanager version is 550.90.07 (the -1 suffix is the Debian package revision), and nvidia-fabricmanager-550 is the package name.

      ii  nvidia-fabricmanager-550                 550.90.07-1                              amd64        Fabric Manager for NVSwitch based systems.
    • Run the following command to check the Tesla driver version:

      nvidia-smi

      In the following example output, the Driver Version field shows that the Tesla driver version is 550.90.07.

      NVIDIA-SMI 550.90.07              Driver Version: 550.90.07      CUDA Version: 12.4
      GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC
      Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M.
                                              |                        |               MIG M.
      =======================================================================================
        0  NVIDIA A10                      On |   00000000:00:07.0 Off |                    0
        0%   35C    P8              9W / 150W |        1MiB / 23028MiB |       0%     Default
                                              |                        |                  N/A
      Processes:
        GPU   GI   CI        PID   Type   Process name            GPU Memory
              ID   ID                                             Usage
        No running processes found
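    The same two values can be read in a script-friendly form. The following sketch assumes the package name nvidia-fabricmanager-550 from this example; upstream_version and driver_versions are hypothetical helper names. The dpkg version carries a Debian revision (-1), which has to be stripped before it is compared with the driver version.

    ```shell
    # Hypothetical helper: strip an optional epoch ("N:") and the Debian
    # revision ("-N") from a dpkg version string, e.g. 550.90.07-1 -> 550.90.07.
    upstream_version() {
        v=${1#*:}
        printf '%s\n' "${v%-*}"
    }

    # Hypothetical helper: read nvidia-smi CSV output (one line per GPU) on
    # standard input and print each distinct driver version once.
    driver_versions() {
        tr -d ' \r' | sort -u
    }

    # Example usage on the instance:
    #   upstream_version "$(dpkg-query -W -f='${Version}' nvidia-fabricmanager-550)"
    #   nvidia-smi --query-gpu=driver_version --format=csv,noheader | driver_versions
    ```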
  2. Compare the nvidia-fabricmanager version with the Tesla driver version. Ignore the package revision suffix, such as -1 in 550.90.07-1.

    • If the two versions match, proceed to the next step.

    • If the two versions do not match, perform one of the following operations:

      • Upgrade the Tesla driver to the same version as nvidia-fabricmanager. For more information, see Upgrade an NVIDIA Tesla driver.

      • Uninstall nvidia-fabricmanager, and then reinstall the version that matches the Tesla driver. After the reinstallation, proceed to the next step.

        Note

        For information about how to uninstall nvidia-fabricmanager, see Step 1: Uninstall nvidia-fabricmanager.
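    The comparison can also be scripted. The following is a minimal sketch; fm_matches_driver is a hypothetical function, and the package name nvidia-fabricmanager-550 is taken from this example. The apt-cache madison line in the comments only lists versions and does not install anything.

    ```shell
    # Hypothetical helper: compare a fabric manager package version in dpkg
    # format (e.g. 550.90.07-1) with a driver version from nvidia-smi
    # (e.g. 550.90.07). Prints the result; returns 0 on a match, 1 otherwise.
    fm_matches_driver() {
        fm=${1#*:}        # drop an optional epoch
        fm=${fm%-*}       # drop the Debian revision
        if [ "$fm" = "$2" ]; then
            echo "match: nvidia-fabricmanager $fm, driver $2"
            return 0
        fi
        echo "mismatch: nvidia-fabricmanager $fm, driver $2" >&2
        return 1
    }

    # Example usage on the instance:
    #   fm_matches_driver "$(dpkg-query -W -f='${Version}' nvidia-fabricmanager-550)" \
    #       "$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
    # To list the nvidia-fabricmanager versions that the configured repositories
    # offer, for example one that matches the driver, run:
    #   apt-cache madison nvidia-fabricmanager-550
    ```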

  3. Run the following command to prevent nvidia-fabricmanager from being automatically updated:

    In this example, the package name is nvidia-fabricmanager-550. Replace it with the package name that dpkg --list returned in step 1.

    sudo apt-mark hold nvidia-fabricmanager-550

    The following output indicates that the package is on hold and will not be updated automatically:

    nvidia-fabricmanager-550 set on hold.
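    Instead of typing the package name, you can look it up and hold whichever nvidia-fabricmanager package is installed. This sketch assumes the dpkg-query output format requested in the comment below; fm_installed_packages is a hypothetical name.

    ```shell
    # Hypothetical helper: read lines of the form "<package> <status words>"
    # (as printed by the dpkg-query command below) on standard input and print
    # each nvidia-fabricmanager package whose status ends in "installed".
    fm_installed_packages() {
        awk '$1 ~ /^nvidia-fabricmanager/ && $NF == "installed" { print $1 }'
    }

    # Example usage on the instance (holds every installed fabric manager package):
    #   dpkg-query -W -f='${Package} ${Status}\n' 'nvidia-fabricmanager*' |
    #       fm_installed_packages | xargs -r sudo apt-mark hold
    ```

    Filtering on the status skips packages that were removed but still have configuration files, so only the package that is actually in use is held.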
  4. Run the following command to verify that nvidia-fabricmanager is on hold:

    sudo apt-mark showhold

    If nvidia-fabricmanager-550 appears in the output, the package is on hold and will not be updated automatically. On this instance, cloud-init is also listed as held.
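A check like the one in step 4 can be added to a provisioning or monitoring script. The following is a sketch; fm_check_held is a hypothetical function that only inspects apt-mark showhold output and does not change any package state.

```shell
# Hypothetical helper: read `apt-mark showhold` output on standard input and
# report whether the given package is on hold. Returns 0 if held, 1 if not.
fm_check_held() {
    if grep -qxF -- "$1"; then
        echo "$1 is on hold"
        return 0
    fi
    echo "$1 is NOT on hold; run: sudo apt-mark hold $1" >&2
    return 1
}

# Example usage on the instance:
#   apt-mark showhold | fm_check_held nvidia-fabricmanager-550
```

The exact-line match (-x with -F) avoids false positives from package names that merely contain the searched name.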