This topic describes common issues you may encounter when using GPU-accelerated instances and provides solutions.
- What are the driver and CUDA versions for GPU-accelerated instances?
- How do I resolve a CUDA GPG error when building a container image?
- Should I package the model in the image or store it separately?
- Why does my GPU container image fail to start with a health check error?
- My function has high and fluctuating end-to-end latency. How do I fix this?
Driver and CUDA
What are the driver and CUDA versions for GPU-accelerated instances?
GPU-accelerated instances have two version components:
- Driver version: The kernel mode driver (nvidia.ko) and the CUDA user mode driver (libcuda.so). NVIDIA provides the drivers, and the Function Compute platform deploys them. Do not include driver-related content in your container image.
- CUDA Toolkit version: Includes CUDA Runtime, cuDNN, and cuFFT. Choose this version when you build your container image.
The GPU driver and CUDA Toolkit have specific version dependencies. For details, see the CUDA Toolkit Release Notes.
Function Compute GPU-accelerated instances currently use driver version 580.95.05, which corresponds to CUDA user mode driver version 13.0. For optimal compatibility, use a minimum CUDA Toolkit version of 11.8. The CUDA Toolkit version must not exceed the CUDA user mode driver version provided by the platform.
The driver version may change due to feature iterations, new card models, bug fixes, or driver lifecycle expiration.
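The compatibility rule above reduces to a simple version comparison. The following sketch is illustrative only; the helper names are not part of any platform API, and the `"13.0"` driver value comes from the platform version stated in this topic:

```python
# Illustrative check: the CUDA Toolkit version used in the image must be at
# least the recommended minimum and must not exceed the CUDA user mode
# driver version provided by the platform.

def parse_version(v: str) -> tuple:
    """Turn a version string such as '11.8' into a comparable tuple."""
    return tuple(int(part) for part in v.split("."))

def toolkit_is_compatible(toolkit: str, driver_cuda: str, minimum: str = "11.8") -> bool:
    """True if minimum <= toolkit <= driver-supported CUDA version."""
    return parse_version(minimum) <= parse_version(toolkit) <= parse_version(driver_cuda)

# The platform driver 580.95.05 corresponds to CUDA user mode driver 13.0.
print(toolkit_is_compatible("12.4", "13.0"))  # True
print(toolkit_is_compatible("11.7", "13.0"))  # False: below the recommended minimum
```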
What do I do if the NVIDIA driver cannot be found?
This happens when you build an application image with docker run --gpus all followed by docker commit. The resulting image contains local NVIDIA driver information, which prevents the driver from mounting correctly on Function Compute.
Build your application image with a Dockerfile instead of docker commit. Do not add driver-related components to the image or make your application dependent on a specific driver version. For example, do not include libcuda.so (the CUDA Driver API library) in your image. This library is strongly tied to the device kernel driver version. A mismatch between the library in the image and the host environment causes compatibility issues.
When a function instance starts, the Function Compute platform injects user mode driver components into the container. These components match the driver version provided by the platform. This is the same behavior as GPU container virtualization technologies such as NVIDIA Container Runtime, which delegates driver-specific tasks to the platform to maximize the environmental adaptability of the GPU container image.
If you already use GPU container virtualization technologies such as NVIDIA Container Runtime, do not create images with docker commit. Such images contain injected driver components. A version mismatch with the platform's components may cause undefined behavior.
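A minimal Dockerfile sketch of this approach is shown below. The base image tag, file names, and start command are illustrative assumptions, not platform requirements; the point is that all dependencies are declared in the Dockerfile and no driver components are copied in:

```dockerfile
# Illustrative sketch: get the CUDA Toolkit stack from the base image.
# Never copy driver components such as libcuda.so into the image.
FROM nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu20.04

RUN apt-get update && apt-get install -y python3 python3-pip

WORKDIR /app
COPY requirements.txt .
RUN pip3 install -r requirements.txt   # application dependencies only, no drivers

COPY . .
CMD ["python3", "server.py"]
```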
What do I do if I encounter a CUFFT_INTERNAL_ERROR?
The cuFFT library in CUDA 11.7 has a known forward compatibility issue that can cause this error on newer GPU card models. Upgrade to at least CUDA 11.8. For more information about GPU card models, see Specifications.
Verify the fix in PyTorch with the following snippet. If no error occurs, the upgrade was successful.
import torch
out = torch.fft.rfft(torch.randn(1000).cuda())
How do I resolve a CUDA GPG error when building a container image?
You may encounter this GPG error during image builds:
W: GPG error: https://developer.download.nvidia.cn/compute/cuda/repos/ubuntu2004/x86_64 InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A4B469963BF863CC
E: The repository 'https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64 InRelease' is not signed.
Add the following command after the RUN rm command in your Dockerfile, then rebuild the image:
RUN apt-key adv --keyserver keyserver.ubuntu.com --recv-keys A4B469963BF863CC
Container images and models
What is the size limit for a GPU container image?
The size limit applies to the compressed image, not the uncompressed image. Check the compressed size in the Alibaba Cloud Container Registry console. Run docker images locally to check the uncompressed size.
An image with an uncompressed size under 20 GB can typically be deployed and used normally.
What do I do if image acceleration fails?
Image acceleration takes longer for larger images and may time out. Re-trigger the accelerated image conversion by editing and saving the function configuration in the Function Compute console. No parameter changes are needed.
Should I package the model in the image or store it separately?
Separate the model from the image when:
- The model file is large.
- The model iterates frequently.
- The model exceeds the platform's image size limit.
Store separated models in File Storage NAS or Object Storage Service (OSS). For more information, see Best practices for model storage on GPU-accelerated instances.
Startup and performance
Why does my GPU container image fail to start with a health check error?
If you see the error [FunctionNotStarted] Function Instance health check failed on port xxx in 120 seconds, the AI or GPU application is taking too long to start. The most common cause is that model loading delays the web server from responding to health checks in time.
To fix this:
- Do not dynamically load models from the public network during startup. Place the model in the image or in File Storage NAS.
- Place model initialization in the /initialize method. This allows the web server to start and respond to health checks before the model finishes loading.
For more information about the function instance lifecycle, see Function instance lifecycle.
How do I perform model warm-up?
Perform model warm-up in the /initialize method. The instance starts receiving production traffic only after the /initialize method completes.
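Warm-up typically means running a few dummy inferences inside /initialize so that one-time costs (CUDA context creation, kernel compilation, memory allocation) are paid before real traffic arrives. A minimal sketch, where the model callable and dummy input shape are placeholders:

```python
def warm_up(model, n_runs: int = 3):
    """Run a few dummy inferences so one-time GPU costs (context creation,
    kernel compilation, memory allocation) happen before real requests."""
    dummy_input = [0.0] * 8  # placeholder matching a typical request payload
    for _ in range(n_runs):
        model(dummy_input)

# Demonstration with a fake model that records each call.
calls = []
def fake_model(x):
    calls.append(x)

warm_up(fake_model)
print(len(calls))  # 3
```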
My function has high and fluctuating end-to-end latency. How do I fix this?
- Confirm that image acceleration is active in the environment context.
- Check your NAS file system type. If your function reads data from a NAS file system (such as model files), use a compute-optimized General-purpose NAS file system for better performance. Do not use a storage-optimized file system. For more information, see General-purpose NAS.
Instance types and quotas
Why is my GPU-accelerated instance type displayed as g1?
The instance type g1 is equivalent to fc.gpu.tesla.1. For more information, see Specifications.
Why did provisioning fail for my GPU-accelerated instance?
Provisioning can fail for the following reasons:
Startup timeout
- Error code: FunctionNotStarted
- Error message: "Function instance health check failed on port XXX in 120 seconds"
Check your application startup logic. Look for logic that downloads models from the public network or loads Large Language Models (LLMs) larger than 10 GB. Start the web server first, then load the model. See Why does my GPU container image fail to start with a health check error? for detailed solutions.
Instance quota reached
- Error code: ResourceThrottled
- Error message: "Reserve resource exceeded limit"
The default quota for physical GPU cards is 30 per Alibaba Cloud account per region. Check your current quota and, if you need more physical cards, submit an increase request in the Quota Center.
What do I do if on-demand GPU instance creation fails with ResourceExhausted or ResourceThrottled errors?
GPU resources are scarce. On-demand calls are affected by fluctuations in the resource pool, which may prevent instances from being created in time. For predictable resource delivery, configure scaling rules for your function to reserve resources. For billing details, see Billing overview.