Elastic GPU Service: What do I do if Persistence Mode that I enabled does not take effect and the ECC status or the MIG feature fails to be configured after a GPU-accelerated instance is restarted?

Last Updated: Dec 11, 2024

This topic describes how to resolve the following issues on a GPU-accelerated compute-optimized instance on which a later version of the Tesla driver, such as 535 or later, is installed: Persistence Mode (Persistence-M) that you enabled by running the nvidia-smi -pm 1 command does not take effect after the instance is restarted, and the error correction code (ECC) status or the multi-instance GPU (MIG) feature fails to be configured.

Problem description

You install Tesla driver version 535 or later on a GPU-accelerated compute-optimized Linux instance and then run the nvidia-smi -pm 1 command to enable Persistence Mode. The following issues may then occur:

  • After you restart the GPU-accelerated compute-optimized instance, Persistence Mode is in the Off state, which indicates that Persistence Mode is disabled.

  • The ECC status fails to be configured.

  • The MIG feature fails to be configured.
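The three symptoms above can be checked in one pass after the restart. The following is a minimal sketch, not an official procedure: the function names gpu_state_report and persistence_disabled are hypothetical helpers introduced here for illustration, and the sketch assumes that nvidia-smi is on the PATH and reports the persistence_mode, ecc.mode.current, and mig.mode.current query fields.

```shell
#!/usr/bin/env bash
# Sketch: report Persistence Mode, ECC mode, and MIG mode for every GPU.
# The helper names below are hypothetical; only nvidia-smi itself is real.

# Print one CSV line per GPU: index, persistence mode, ECC mode, MIG mode.
gpu_state_report() {
    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi not found; is the Tesla driver installed?" >&2
        return 1
    fi
    nvidia-smi \
        --query-gpu=index,persistence_mode,ecc.mode.current,mig.mode.current \
        --format=csv,noheader
}

# Read the CSV report on stdin; succeed (exit 0) if any GPU shows
# Persistence Mode as "Disabled", otherwise fail (exit 1).
persistence_disabled() {
    awk -F', *' '$2 == "Disabled" { found = 1 } END { exit !found }'
}

# Usage (run manually on the instance after the restart):
#   gpu_state_report
#   gpu_state_report | persistence_disabled && echo "Persistence Mode is off"
```

The parsing helper reads from standard input so that it can be exercised with sample text on a machine that has no GPU.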

Cause

The Tesla driver version is not compatible with the instance. As a result, the preceding issues may occur after you run the nvidia-smi -pm 1 command to enable Persistence Mode and then restart the GPU-accelerated compute-optimized instance.

Solution

If the dmesg output contains the following message, enable Persistence Mode by using the NVIDIA Persistence Daemon instead of the nvidia-smi -pm 1 command. For more information, see the Enable Persistence Mode by using the NVIDIA Persistence Daemon step in the "Step 2: Install the Tesla driver" section of the "Manually install the Tesla driver on a GPU-accelerated compute-optimized Linux instance" topic.

NVRM: Persistence mode is deprecated and will be removed in a future release. Please use nvidia-persistenced instead.
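
The check and the switch to the daemon can be sketched as follows. This is an illustrative sketch under stated assumptions, and the referenced topic remains the authoritative procedure. The function names are hypothetical. The sketch also assumes that the driver installation provides a systemd unit named nvidia-persistenced, which may not hold for every distribution or installation method.

```shell
#!/usr/bin/env bash
# Sketch: detect the deprecation notice and enable the NVIDIA Persistence Daemon.
# The message text is copied from the dmesg line quoted above.
DEPRECATION_MSG='Persistence mode is deprecated'

# Read kernel log text on stdin; succeed if the deprecation notice is present.
has_deprecation_notice() {
    grep -qF "$DEPRECATION_MSG"
}

# Assumption: a systemd unit named nvidia-persistenced exists on the instance.
# "enable --now" starts the daemon immediately and at every subsequent boot.
enable_persistence_daemon() {
    sudo systemctl enable --now nvidia-persistenced
}

# Usage (run manually on the instance):
#   if sudo dmesg | has_deprecation_notice; then
#       enable_persistence_daemon
#   fi
```

After the daemon is enabled, restart the instance and run nvidia-smi to confirm that Persistence-M is shown as On.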