This topic describes how to use GPU-based elastic container instances (ECIs). The example uses TensorFlow to recognize images. This feature applies to serverless Kubernetes clusters and to virtual nodes in Kubernetes clusters.
Background information
Alibaba Cloud Serverless Kubernetes supports GPU-based ECIs, so you can quickly run AI computing tasks in a serverless manner. This simplifies the operations and maintenance of the AI platform and improves computing efficiency.
AI computing depends on GPU instances. However, building a GPU cluster environment is a complex task that involves selecting and purchasing GPU instance types, preparing machines, and installing drivers and the container environment. Delivering GPU resources in serverless mode demonstrates the core advantages of going serverless: resources are supplied in a standardized way and are available immediately. You do not need to purchase machines or log on to nodes to install GPU drivers. This simplifies the deployment of the AI platform and lets you focus on developing and maintaining AI models and applications rather than the infrastructure of the AI platform. GPU and CPU resources are ready to use and easy to obtain. Compared with the subscription billing method, the pay-as-you-go billing method reduces both costs and resource waste.
To create a GPU-mounted pod in Alibaba Cloud Serverless Kubernetes, use an annotation to specify the GPU type, and specify the number of GPUs in resources.limits. Each pod has exclusive use of the GPUs allocated to it. The GPU instances cost the same as the equivalent ECS GPU resources, and no extra charges are incurred.
Prerequisites
A serverless Kubernetes cluster is created or a virtual node is created in a Kubernetes cluster. For more information, see Create a serverless Kubernetes cluster and Virtual nodes.

apiVersion: v1
kind: Pod
metadata:
  name: tensorflow
  annotations:
    # GPU type of the ECI that backs this pod
    k8s.aliyun.com/eci-gpu-type: "P4"
spec:
  containers:
  - image: registry-vpc.cn-hangzhou.aliyuncs.com/ack-serverless/tensorflow
    name: tensorflow
    # Run the TensorFlow image classification script
    command:
    - "sh"
    - "-c"
    - "python models/tutorials/image/imagenet/classify_image.py"
    resources:
      limits:
        # Number of GPUs requested by this container
        nvidia.com/gpu: "1"
  restartPolicy: OnFailure
  # Run the pod on the node named virtual-kubelet
  nodeName: virtual-kubelet
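
If you need to create several such pods with different GPU types or GPU counts, it may help to generate the manifest instead of editing it by hand. The following is a minimal sketch, not an official tool: the helper name gpu_pod_manifest and its parameters are hypothetical, and the default values are copied from the example above. It uses only the Python standard library and prints the manifest as JSON, a format that kubectl accepts in addition to YAML.

```python
import json


def gpu_pod_manifest(name, image, command, gpu_type="P4", gpu_count=1,
                     node_name="virtual-kubelet"):
    """Build a pod manifest (as a dict) that requests GPUs on an ECI.

    Hypothetical helper for illustration. The annotation key and the
    nvidia.com/gpu resource name are taken from the example manifest.
    """
    if gpu_count < 1:
        raise ValueError("gpu_count must be at least 1")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": name,
            # GPU type is selected through an annotation.
            "annotations": {"k8s.aliyun.com/eci-gpu-type": gpu_type},
        },
        "spec": {
            "containers": [
                {
                    "name": name,
                    "image": image,
                    "command": ["sh", "-c", command],
                    # The number of GPUs goes in resources.limits.
                    "resources": {
                        "limits": {"nvidia.com/gpu": str(gpu_count)}
                    },
                }
            ],
            "restartPolicy": "OnFailure",
            "nodeName": node_name,
        },
    }


if __name__ == "__main__":
    manifest = gpu_pod_manifest(
        name="tensorflow",
        image="registry-vpc.cn-hangzhou.aliyuncs.com/ack-serverless/tensorflow",
        command="python models/tutorials/image/imagenet/classify_image.py",
    )
    print(json.dumps(manifest, indent=2))
```

Assuming the script is saved under a name of your choice, such as make_pod.py, its output can be piped to kubectl apply -f - to create the pod.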