This topic describes common issues with GPU-accelerated instances and provides solutions.
What are the driver and CUDA versions for Function Compute GPU-accelerated instances?
What do I do if I encounter a CUFFT_INTERNAL_ERROR during execution?
How do I resolve a CUDA GPG error that occurs when I build an image?
Should the model be packaged in the image or separated from it?
How do I perform a model warm-up, and are there any best practices?
My function has high and fluctuating end-to-end latency. How do I handle this?
What are the driver and CUDA versions for Function Compute GPU-accelerated instances?
The component versions for GPU-accelerated instances are divided into two parts:
Driver version: This includes the kernel mode driver nvidia.ko and the CUDA user mode driver libcuda.so. The driver for Function Compute GPU-accelerated instances is provided by NVIDIA and deployed by the Function Compute platform. The driver version for GPU-accelerated instances may change because of feature iterations, new card models, bug fixes, or driver lifecycle expiration. Avoid adding driver-specific content to your container image. For more information, see What do I do if the NVIDIA driver cannot be found?
CUDA Toolkit version: This includes CUDA Runtime, cuDNN, and cuFFT. You determine the CUDA Toolkit version when you build the container image.
The GPU driver and CUDA Toolkit are released by NVIDIA. They have a specific version correspondence. For more information, see the CUDA Toolkit Release Notes for the relevant version.
Function Compute currently uses driver version 580.95.05. The corresponding CUDA user mode driver version is 13.0. For the best compatibility, use a CUDA Toolkit version that is 11.8 or later, but not later than the CUDA user mode driver version provided by the platform.
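To confirm which versions are in effect, you can run a quick check inside a GPU-accelerated instance. The following is a minimal sketch that assumes PyTorch is installed in your image: torch.version.cuda reports the CUDA Toolkit (runtime) version built into the image, and the device query confirms that the platform-provided driver is visible to your application.
# Sketch: check the CUDA Toolkit version in the image and the visibility of the platform driver.
# Assumes PyTorch is installed in the container image.
import torch

print("CUDA Toolkit (runtime) built into the image:", torch.version.cuda)
print("Platform driver visible:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU device:", torch.cuda.get_device_name(0))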
What do I do if I encounter a CUFFT_INTERNAL_ERROR during execution?
The cuFFT library in CUDA 11.7 has a known forward compatibility issue that may cause this error on newer card models. To resolve this issue, upgrade to CUDA 11.8 or later. For more information about GPU card models, see Instance types and specifications.
For example, in PyTorch, you can use the following code snippet to verify the upgrade. If no error is reported, the upgrade is successful.
import torch
out = torch.fft.rfft(torch.randn(1000).cuda())
How do I resolve a CUDA GPG error that occurs when I build an image?
A GPG error is reported during the image build. The error message is as follows.
W: GPG error: https://developer.download.nvidia.cn/compute/cuda/repos/ubuntu2004/x86_64 InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A4B469963BF863CC
E: The repository 'https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64 InRelease' is not signed.
Add the following script after the RUN rm command in your Dockerfile, and then rebuild the image.
RUN apt-key adv --keyserver keyserver.ubuntu.com --recv-keys A4B469963BF863CC
Why is my GPU-accelerated instance type displayed as g1?
Setting the instance type to g1 is the same as setting it to fc.gpu.tesla.1. For more information, see Specifications.
Why does my instance fail to start?
An instance may fail to start for the following reasons:
Startup timeout
Error code: "FunctionNotStarted"
Error message: "Function instance health check failed on port XXX in 120 seconds"
Solution: Check the application startup logic for time-consuming tasks, such as downloading models from the public network or loading large models that exceed 10 GB. Start the web server first, and then load the models.
The maximum number of instances for the function or region is reached
Error code: "ResourceThrottled"
Error message: "Reserve resource exceeded limit"
Solution: By default, the maximum number of physical GPUs for a single Alibaba Cloud account in a region is 30. The actual value is subject to the information in Quota Center. If you require more physical GPUs, go to Quota Center to submit a request.
What do I do if elastic GPU instances cannot be created and a "ResourceExhausted" or "ResourceThrottled" error is reported?
GPU resources are limited. Fluctuations in the resource pool may prevent elastic GPU instances from being created in time to meet invocation requests. To ensure predictable resource delivery, configure a minimum number of instances for your function to reserve resources in advance. For more information, see Configure an elastic policy with a minimum number of instances.
What is the size limit for a GPU image?
The image size limit applies to the compressed image, not the uncompressed image. You can view the compressed image size in the Container Registry console. You can also run the docker images command locally to query the uncompressed image size.
Typically, an image that is smaller than 20 GB before compression can be deployed to and used in Function Compute.
What do I do if GPU image acceleration fails?
As the image size increases, the time required for accelerated image conversion also increases. This may cause the conversion to fail because of a timeout. To retrigger the conversion, edit and save the function configuration in the Function Compute console. You do not need to change any parameters.
Should the model be packaged in the image or separated from it?
If your model file is large, is updated frequently, or causes the image to exceed the platform's size limit, separate the model from the image. If you separate the model from the image, store the model in a File Storage NAS file system or in Object Storage Service (OSS). For more information, see Best practices for model storage in GPU-accelerated instances.
How do I perform a model warm-up, and are there any best practices?
Perform a model warm-up in the /initialize method. The instance starts to receive production traffic only after the /initialize method is complete, so both model loading and warm-up inference should happen there, as shown in the sketch below.
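The following is a minimal sketch of this pattern for a custom container runtime. Flask, the listening port (9000), the /invoke path, and the load_model() helper are illustrative assumptions, not platform requirements; adjust them to match your function configuration and web framework.
# Minimal sketch of a custom container web server that warms up the model in /initialize.
# Flask, the port, and load_model() are illustrative assumptions; adapt them to your runtime.
import torch
from flask import Flask

app = Flask(__name__)
model = None

def load_model():
    # Placeholder: replace with your own loading logic, for example torch.load()
    # from a path baked into the image or from a mounted NAS directory.
    return torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3)).cuda().eval()

@app.route("/initialize", methods=["POST"])
def initialize():
    global model
    model = load_model()
    with torch.no_grad():
        # Dummy inference so CUDA kernels and caches are warm before real traffic arrives.
        model(torch.randn(1, 3, 224, 224).cuda())
    return "OK"

@app.route("/invoke", methods=["POST"])
def invoke():
    # Production requests are routed here only after /initialize has returned.
    with torch.no_grad():
        out = model(torch.randn(1, 3, 224, 224).cuda())
    return str(out.shape)

if __name__ == "__main__":
    # Start the web server immediately; heavy work is deferred to /initialize.
    app.run(host="0.0.0.0", port=9000)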
What do I do if a GPU image fails to start and reports "FunctionNotStarted: Function Instance health check failed on port xxx in 120 seconds"?
Cause: The AI/GPU application takes too long to start, which causes the health check to fail on the Function Compute (FC) platform. A common reason for a long startup time is that loading the model takes an excessive amount of time, which causes the web server to time out.
Solution:
Do not dynamically load models from the public network during application startup. For faster loading, place the models in the image or in a File Storage NAS file system.
Place the model initialization in the /initialize method. This allows the web server to start first, before the model is loaded.
Note: For more information about the instance lifecycle, see Configure the instance lifecycle.
My function has high and fluctuating end-to-end latency. How do I handle this?
First, confirm that the image acceleration status is `Available` in the environment context.
Confirm the type of the NAS file system. If your function needs to read data from a NAS file system, such as to read a model, use a compute-optimized General-purpose NAS file system for better performance. Do not use a storage-optimized file system. For more information, see General-purpose NAS file system.
What do I do if the NVIDIA driver cannot be found?
When you use the docker run --gpus all command to specify a container and then use docker commit to build the application image, the resulting image contains local NVIDIA driver information. This prevents the driver from being mounted correctly after the image is deployed to Function Compute. In this case, the system cannot find the NVIDIA driver.
To resolve this issue, use a Dockerfile to build the application image. For more information, see dockerfile.
Additionally, do not add driver-related components to the image or make your application dependent on specific driver versions. For example, do not include libcuda.so in the image. This dynamic library provides the CUDA Driver API and is tightly coupled with the device's kernel driver version. If the dynamic library in the image does not match the host's kernel driver, your application may behave abnormally because of compatibility issues.
When a function instance is created, the Function Compute platform injects the user mode driver components into the container. These components match the driver version provided by the platform. This is also the behavior of GPU container virtualization technologies, such as NVIDIA Container Runtime. This behavior delegates driver-specific tasks to the platform resource provider to maximize the environmental adaptability of the GPU container image. The driver for Function Compute GPU-accelerated instances is provided by NVIDIA. The driver version may change because of feature iterations, new card models, bug fixes, or driver lifecycle expiration.
If you are already using a GPU container virtualization technology, such as NVIDIA Container Runtime, avoid using the docker commit command to create images. These images contain the injected driver components. When you use such an image on the Function Compute platform, a component version mismatch may cause undefined behavior, such as application exceptions.
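If you are not sure whether an existing image already contains driver components, you can scan it locally before deployment. The following sketch is illustrative only: it assumes Docker is available on your build machine, that the image provides a POSIX shell with the find utility, and it uses a hypothetical image tag.
# Sketch: scan a local image for baked-in NVIDIA user mode driver files such as libcuda.so.
# Assumes Docker is installed and the image contains sh and find; the image tag is hypothetical.
import subprocess

IMAGE = "registry.example.com/my-gpu-app:latest"  # hypothetical image tag

result = subprocess.run(
    ["docker", "run", "--rm", "--entrypoint", "sh", IMAGE,
     "-c", "find / -name 'libcuda.so*' 2>/dev/null"],
    capture_output=True, text=True,
)

if result.stdout.strip():
    print("Driver files found in the image; rebuild from a Dockerfile without them:")
    print(result.stdout)
else:
    print("No libcuda.so files found in the image.")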