FAQ about GPU-accelerated instances - Function Compute (2.0)

This topic provides answers to some commonly asked questions about GPU-accelerated instances.

What is the version of the driver used by GPU-accelerated instances in Function Compute?
What is the CUDA version of GPU-accelerated instances in Function Compute?
What do I do if a CUDA GPG error occurs when I build an image?
Why is the type of my GPU-accelerated instance g1?
Why do my provisioned GPU-accelerated instances fail to be allocated?
What is the limit on the size of a GPU image?
What do I do if a GPU image fails to be converted to an accelerated image?
Should a model be integrated into or separated from an image?
How do I perform a model warm-up?
What do I do if the "[FunctionNotStarted] Function Instance health check failed on port xxx in 120 seconds" error is reported when I start a GPU image?
What do I do if the end-to-end latency of my function is large and fluctuates greatly?
What do I do if the system fails to find the NVIDIA driver?

What is the version of the driver used by GPU-accelerated instances of Function Compute?

The current driver version is 535.129.03.

NVIDIA provides the drivers that are used by GPU-accelerated instances of Function Compute. The driver version may change in the future because of feature iterations, new GPU models, bug fixes, or the end of a driver's lifecycle. We recommend that you do not specify a driver version in your container images. For more information, see Image usage notes.
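If your application needs the driver version at runtime, for example to write it to logs, you can query it instead of hard-coding it in the image. The following Python sketch assumes that the nvidia-smi utility is available inside the instance. The function names are hypothetical.

```python
import shutil
import subprocess
from typing import Optional


def parse_driver_version(output: str) -> Optional[str]:
    """Return the first non-empty line of nvidia-smi CSV output, or None."""
    for line in output.splitlines():
        line = line.strip()
        if line:
            return line
    return None


def query_driver_version() -> Optional[str]:
    """Ask nvidia-smi for the driver version; return None if it cannot be queried."""
    if shutil.which("nvidia-smi") is None:
        return None  # nvidia-smi is not on PATH in this environment
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return None
    return parse_driver_version(result.stdout)


if __name__ == "__main__":
    print("NVIDIA driver version:", query_driver_version() or "unknown")
```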

What is the CUDA version of GPU-accelerated instances of Function Compute?

The CUDA version varies based on the container image that you use. We recommend that you use CUDA 11.x or later in Function Compute.
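To confirm that an image follows this recommendation, you can check the CUDA version that the image declares. The following Python sketch assumes that the image is built from an NVIDIA CUDA base image, which sets the CUDA_VERSION environment variable; if your image does not set this variable, the check reports it as unrecognized. The function names are hypothetical.

```python
import os
from typing import Optional, Tuple


def parse_cuda_version(value: str) -> Optional[Tuple[int, int]]:
    """Parse strings such as '11.8.0' or '12.2' into (major, minor)."""
    parts = value.strip().split(".")
    if len(parts) < 2 or not parts[0].isdigit() or not parts[1].isdigit():
        return None
    return int(parts[0]), int(parts[1])


def meets_recommendation(version: Tuple[int, int], minimum_major: int = 11) -> bool:
    """True if the CUDA major version is at least the recommended minimum."""
    return version[0] >= minimum_major


if __name__ == "__main__":
    raw = os.environ.get("CUDA_VERSION", "")  # assumed to be set by the base image
    parsed = parse_cuda_version(raw)
    if parsed is None:
        print("CUDA_VERSION is not set or not recognized:", repr(raw))
    elif meets_recommendation(parsed):
        print(f"CUDA {parsed[0]}.{parsed[1]} meets the 11.x-or-later recommendation")
    else:
        print(f"CUDA {parsed[0]}.{parsed[1]} is older than the recommended 11.x")
```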

What do I do if a CUDA GPG error is reported when I build an image?

The following GPG error is reported during the image building process:

W: GPG error: https://developer.download.nvidia.cn/compute/cuda/repos/ubuntu2004/x86_64  InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A4B469963BF863CC
E: The repository 'https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64  InRelease' is not signed.

In this case, add the following instruction to your Dockerfile before the instruction that runs apt-get update, and then rebuild the image.

RUN apt-key adv --keyserver keyserver.ubuntu.com --recv-keys A4B469963BF863CC

Why is the instance type of my GPU-accelerated instance g1?

The g1 instance type is the same as fc.gpu.tesla.1. For more information, see the "Instance types" section of the Instance types and usage modes topic.

Why do my provisioned GPU-accelerated instances fail to be allocated?

The allocation of provisioned instances may fail for the following reasons:

The startup of the provisioned instances times out.
- Error code: FunctionNotStarted.
- Error message: Function instance health check failed on port XXX in 120 seconds.
- Solution: Check whether the application startup logic downloads models from the Internet or loads large models (larger than 10 GB). We recommend that you start the web server before you run the model loading logic.
The maximum number of instances at the function level or region level is reached.
- Error code: ResourceThrottled.
- Error message: Reserve resource exceeded limit.
- Solution: If you need more physical GPUs, join the DingTalk group 11721331 for technical support.

What is the limit on the size of a GPU image?

The image size limit applies only to compressed images. You can view the size of a compressed image in the Container Registry console. You can also run the docker images command on premises to query the size of the uncompressed image.

In most cases, an uncompressed image that is smaller than 20 GB in size can be deployed to Function Compute and used as expected.
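Before you push an image, you can compare its uncompressed size with the 20 GB guideline. The following Python sketch assumes that Docker is installed on premises and uses docker image inspect, which reports the uncompressed size in bytes. The script name and image tag in the comments are hypothetical, and reading 20 GB as 20 * 1024**3 bytes is an assumption.

```python
import subprocess
import sys

# Assumption: "20 GB" is read as 20 * 1024**3 bytes; use 20 * 1000**3 if you
# prefer decimal gigabytes.
UNCOMPRESSED_LIMIT_BYTES = 20 * 1024**3


def is_within_limit(size_bytes: int, limit_bytes: int = UNCOMPRESSED_LIMIT_BYTES) -> bool:
    """True if the uncompressed image size is below the guideline."""
    return size_bytes < limit_bytes


def local_image_size(image: str) -> int:
    """Uncompressed size in bytes, as reported by `docker image inspect`."""
    result = subprocess.run(
        ["docker", "image", "inspect", "--format", "{{.Size}}", image],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


if __name__ == "__main__":
    if len(sys.argv) != 2:
        # Hypothetical usage: python check_image_size.py my-gpu-app:latest
        print("usage: check_image_size.py IMAGE")
    else:
        size = local_image_size(sys.argv[1])
        verdict = "below" if is_within_limit(size) else "at or above"
        print(f"{size / 1024**3:.1f} GiB, {verdict} the 20 GB guideline")
```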

What do I do if a GPU image fails to be converted to an accelerated image?

Image acceleration conversion takes longer as the image grows, and a large image may cause the conversion to fail. To trigger the conversion again, open the function in the Function Compute console and save its configuration. You do not need to modify any parameters.

Should a model be integrated into or separated from an image?

We recommend that you integrate a model into an image. This way, the model benefits from the image cache, which accelerates distribution and incurs no additional storage costs.

If the model cannot be integrated into the image, for example because it is too large (larger than 5 GB), we recommend that you store the model in Apsara File Storage NAS (NAS) and load it when the application starts. We recommend that you use a performance general-purpose NAS file system instead of a capacity NAS file system. For more information, see General-purpose NAS file systems.
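When a model may be stored either in the image or in a NAS file system, the application can check both locations at startup and prefer the copy in the image. The following Python sketch illustrates this. Both directory paths are hypothetical placeholders for your own image layout and NAS mount directory.

```python
import os
from typing import Iterable, Optional

# Hypothetical paths: replace them with the directory your Dockerfile copies the
# model into and the NAS mount directory configured for the function.
CANDIDATE_MODEL_DIRS = (
    "/opt/models/my-model",      # model integrated into the image
    "/mnt/nas/models/my-model",  # model stored in a NAS file system
)


def resolve_model_dir(candidates: Iterable[str] = CANDIDATE_MODEL_DIRS) -> Optional[str]:
    """Return the first existing directory, so the copy in the image wins over NAS."""
    for path in candidates:
        if os.path.isdir(path):
            return path
    return None


if __name__ == "__main__":
    model_dir = resolve_model_dir()
    if model_dir is None:
        print("model directory not found in the image or on NAS")
    else:
        print("loading model from", model_dir)
```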

How do I perform a model warm-up?

We recommend that you warm up a model in the /initialize method. The model starts to receive production traffic only after the /initialize method is completed. For more information about model warm-up, see the related topics in the Function Compute documentation.

What do I do if the "[FunctionNotStarted] Function Instance health check failed on port xxx in 120 seconds" error is reported when I start a GPU image?

Cause: The AI/GPU application takes too long to start, so the Function Compute health check fails. A common cause is that loading the model takes too long, which delays the startup of the web server until the health check times out.
Solution:
- Do not dynamically load the model from the Internet when the application starts. We recommend that you place the model in the image or in a NAS file system and load it from that path.
- Place model initialization in the /initialize method so that the application starts first. That is, load the model after the web server is started.
  Note
  For more information about the lifecycle of a function instance, see Function instance lifecycle.
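To find out locally whether your application starts listening fast enough, you can time how long its port takes to accept TCP connections. The following Python sketch is only an approximation: it assumes that accepting TCP connections is what matters, and it does not reproduce the actual Function Compute health check. The 120-second default mirrors the limit in the error message.

```python
import socket
import sys
import time


def wait_for_port(host: str, port: int, timeout_s: float = 120.0, interval_s: float = 0.5) -> float:
    """Poll until a TCP connection to host:port succeeds.

    Returns the elapsed seconds, or raises TimeoutError after timeout_s.
    """
    start = time.monotonic()
    deadline = start + timeout_s
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval_s):
                return time.monotonic() - start
        except OSError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"{host}:{port} not reachable within {timeout_s} s")
            time.sleep(interval_s)


if __name__ == "__main__":
    if len(sys.argv) == 3:
        elapsed = wait_for_port(sys.argv[1], int(sys.argv[2]))
        print(f"port reachable after {elapsed:.1f} s")
    else:
        print("usage: wait_for_port.py HOST PORT")
```

For example, start the container with docker run -p 9000:9000 IMAGE and then run python wait_for_port.py 127.0.0.1 9000. The port and script name are hypothetical; use the port that your function is configured to listen on.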

What do I do if the end-to-end latency of my function is large and fluctuates greatly?

Make sure that the state of image acceleration is Available in the environment information.
Check the type of the NAS file system. If your function needs to read data, such as a model, from a NAS file system, we recommend that you use a performance general-purpose NAS file system, instead of a capacity NAS file system, to ensure the performance. For more information, see General-purpose NAS file systems.
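To judge whether latency is large or fluctuates greatly, it helps to compare percentiles and relative spread across many requests, for example before and after you switch the NAS file system type. The following Python sketch summarizes latency samples, in milliseconds, that you have already collected. How you collect them is up to you, and the function name and sample values are hypothetical.

```python
import statistics
from typing import Dict, Sequence


def latency_summary(samples_ms: Sequence[float]) -> Dict[str, float]:
    """Summarize end-to-end latency samples given in milliseconds."""
    if len(samples_ms) < 2:
        raise ValueError("at least two samples are required")
    ordered = sorted(samples_ms)
    # quantiles(n=100) returns 99 cut points; index k - 1 is the k-th percentile.
    cuts = statistics.quantiles(ordered, n=100, method="inclusive")
    mean = statistics.mean(ordered)
    stdev = statistics.stdev(ordered)
    return {
        "p50": statistics.median(ordered),
        "p90": cuts[89],
        "p99": cuts[98],
        "stdev": stdev,
        # Coefficient of variation: spread relative to the mean, so values from
        # fast and slow functions can be compared.
        "cv": stdev / mean if mean else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical samples for illustration only.
    samples = [120.0, 118.0, 125.0, 119.0, 121.0, 480.0, 122.0, 117.0, 950.0, 123.0]
    for name, value in latency_summary(samples).items():
        print(f"{name}: {value:.2f}")
```

A p99 far above the p50, or a high coefficient of variation, indicates the kind of fluctuation described in this question.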

What do I do if the system fails to find the NVIDIA driver?

This issue occurs if you start a container by running the docker run --gpus all command and then build an application image from that container by running docker commit. The built image then contains on-premises NVIDIA driver information, which prevents the driver from being mounted after the image is deployed to Function Compute. As a result, the system cannot find the NVIDIA driver.

To resolve the issue, we recommend that you use a Dockerfile to build the application image. For more information, see Dockerfile.

Do not specify a specific driver version in a container image. For more information, see Image usage notes.
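To check whether an existing image already contains driver files from an on-premises machine, you can list files that look like versioned NVIDIA driver libraries. The following Python sketch is a hypothetical heuristic: it assumes that host driver libraries follow names such as libcuda.so.535.129.03 and that they appear under the listed directories. Matches can include legitimate files, so review each one before you change the image.

```python
import os
import re
from typing import Iterable, List

# Hypothetical heuristic: versioned names such as libcuda.so.535.129.03 or
# libnvidia-ml.so.535.129.03 usually come from a host driver installation.
DRIVER_LIB = re.compile(r"^lib(?:cuda|nvidia-[a-z0-9-]+)\.so\.\d+\.\d+")
DEFAULT_DIRS = ("/usr/lib/x86_64-linux-gnu", "/usr/lib64", "/usr/local/nvidia")


def is_driver_library(filename: str) -> bool:
    """True if the file name looks like a versioned NVIDIA driver library."""
    return DRIVER_LIB.match(filename) is not None


def find_driver_libraries(roots: Iterable[str] = DEFAULT_DIRS) -> List[str]:
    """Walk each root directory (missing ones are skipped) and collect matches."""
    hits = []
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                if is_driver_library(name):
                    hits.append(os.path.join(dirpath, name))
    return sorted(hits)


if __name__ == "__main__":
    for path in find_driver_libraries():
        print(path)
```

Run it in a container started from the image without the --gpus flag, for example docker run --rm IMAGE python scan_driver_libs.py, so that only files stored in the image are listed. The script name is hypothetical.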