In some scenarios, you may want to share a GPU among multiple inference tasks to improve GPU utilization. This topic describes how to use Arena to submit an inference task to use shared GPU resources.
Prerequisites
A Container Service for Kubernetes (ACK) Pro cluster is created and the Kubernetes version of the cluster is 1.18.8 or later. For more information, see Create an ACK Pro cluster.
The Arena client is installed and the Arena version is 0.5.0 or later. For more information, see Configure the Arena client.
Procedure
Run the following command to query the available GPU resources in the cluster:
arena top node

Expected output:

NAME                      IPADDRESS       ROLE    STATUS  GPU(Total)  GPU(Allocated)
cn-beijing.192.168.1.108  192.168.20.255  <none>  Ready   0           0
cn-beijing.192.168.8.10   192.168.8.10    <none>  Ready   0           0
cn-beijing.192.168.1.101  192.168.1.101   <none>  Ready   1           0
cn-beijing.192.168.1.112  192.168.1.112   <none>  Ready   1           0
cn-beijing.192.168.8.252  192.168.8.252   <none>  Ready   1           0
---------------------------------------------------------------------------------------------------
Allocated/Total GPUs In Cluster: 0/3 (0.0%)

The preceding output shows that the cluster has three GPUs and that none of them is allocated.
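The free-GPU arithmetic in the output above (total minus allocated per node) can be sketched with a small parser. This is a hypothetical helper for the captured output, not part of Arena:

```python
# Sketch: parse `arena top node` output to find nodes with unallocated GPUs.
# Hypothetical helper, not part of Arena itself.

SAMPLE = """\
NAME                      IPADDRESS       ROLE    STATUS  GPU(Total)  GPU(Allocated)
cn-beijing.192.168.1.101  192.168.1.101   <none>  Ready   1           0
cn-beijing.192.168.1.112  192.168.1.112   <none>  Ready   1           0
cn-beijing.192.168.8.252  192.168.8.252   <none>  Ready   1           0
"""

def free_gpus(table: str) -> dict:
    """Return {node_name: free_gpu_count} for nodes with at least one free GPU."""
    free = {}
    for line in table.splitlines()[1:]:  # skip the header row
        name, _ip, _role, _status, total, allocated = line.split()
        if int(total) - int(allocated) > 0:
            free[name] = int(total) - int(allocated)
    return free

print(free_gpus(SAMPLE))
```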
Use Arena to submit an inference task.
Important: This example submits a TensorFlow inference task. The model was added to the Docker image when the image was built. If you have not added the model file to the image, you must configure a shared NAS volume. For more information, see Configure a shared NAS volume.
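Whether the model is baked into the image or mounted from NAS, TensorFlow Serving expects one numbered subdirectory per model version under the model path. An illustrative layout for this example (file names follow TensorFlow Serving's SavedModel convention):

```
/tfmodel/mnist/
└── 2/                  # version folder; matches --version-policy=specific:2 used below
    ├── saved_model.pb
    └── variables/
```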
Run the following command to submit an inference task:
arena serve tensorflow \
    --name=mymnist2 \
    --model-name=mnist \
    --gpumemory=3 \
    --gpucore=10 \
    --image=registry.cn-beijing.aliyuncs.com/ai-samples/tensorflow:latest-gpu-mnist \
    --model-path=/tfmodel/mnist \
    --version-policy=specific:2 \
    --data=mydata=/mnt/data

The following table describes the parameters.
Parameter          Description
--name             The name of the task.
--model-name       The name of the model.
--gpumemory        The amount of GPU memory to request, in GiB. For example, assume a GPU has 8 GiB of memory. If the first task requests 3 GiB (--gpumemory=3), 5 GiB of memory remains. If a second task then requests 4 GiB (--gpumemory=4), both tasks can run on the same GPU.
--gpucore          The percentage of GPU computing power to request. Each GPU provides 100% of its computing power. For example, if the first task requests 10% of the computing power (--gpucore=10), 90% remains. If a second task then requests 50% (--gpucore=50), both tasks can run on the same GPU.
--image            The image that is used to run the task.
--model-path       The path of the model in the container.
--version-policy   The model version policy. For example, --version-policy=specific:2 specifies that version 2 of the model is used. A folder named 2 must exist in the path specified by --model-path.
--data=mydata      The directory where the volume is mounted. In this example, the volume is mounted at /mnt/data.

Run the following command to query all tasks:
arena serve list

The following is an example of the output:

NAME      TYPE        VERSION       DESIRED  AVAILABLE  ADDRESS       PORTS
mymnist1  Tensorflow  202101162119  1        0          172.16.3.123  GRPC:8500,RESTFUL:8501
mymnist2  Tensorflow  202101191447  1        1          172.16.1.147  GRPC:8500,RESTFUL:8501

Run the following command to query the details of the submitted task:
arena serve get mymnist2

Expected output:

Name:           mymnist2
Namespace:      default
Type:           Tensorflow
Version:        202101191447
Desired:        1
Available:      1
Age:            20m
Address:        172.16.1.147
Port:           GRPC:8500,RESTFUL:8501
GPUMemory(GiB): 3

Instances:
  NAME                                                       STATUS   AGE  READY  RESTARTS  GPU(Memory/GiB)  NODE
  ----                                                       ------   ---  -----  --------  ---------------  ----
  mymnist2-202101191447-tensorflow-serving-7f64bf9749-mtnpc  Running  20m  1/1    0         3                cn-beijing.192.168.1.112

Note: If the value of Desired equals the value of Available, the task is ready.
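The arena serve get output above shows the task holding 3 GiB of GPU memory on its node. The fit check implied by --gpumemory and --gpucore can be sketched as follows. This is a simplified illustration of the sharing arithmetic, not Arena's actual scheduler:

```python
# Simplified sketch of the GPU-sharing fit check implied by --gpumemory and
# --gpucore. Illustration only, not Arena's real scheduling code.

def fits(gpu_free_mem_gib: int, gpu_free_core_pct: int,
         req_mem_gib: int, req_core_pct: int) -> bool:
    """A task fits on a GPU only if both its memory and compute requests fit."""
    return req_mem_gib <= gpu_free_mem_gib and req_core_pct <= gpu_free_core_pct

# An idle 8 GiB GPU with 100% of its compute free.
mem, core = 8, 100

# First task: --gpumemory=3 --gpucore=10
assert fits(mem, core, 3, 10)
mem, core = mem - 3, core - 10      # 5 GiB and 90% remain

# Second task: --gpumemory=4 --gpucore=50 still fits on the same GPU.
assert fits(mem, core, 4, 50)
```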
Optional: Run the following command to print the task logs:

arena serve logs mymnist2 -t 10

Note: -t 10 displays the last 10 lines of the log.

The system returns output similar to the following:

2021-01-18 13:21:58.482985: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:206] Restoring SavedModel bundle.
2021-01-18 13:21:58.483673: I external/org_tensorflow/tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 2500005000 Hz
2021-01-18 13:21:58.508734: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:190] Running initialization op on SavedModel bundle at path: /tfmodel/mnist/2
2021-01-18 13:21:58.513041: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:277] SavedModel load for tags { serve }; Status: success: OK. Took 798017 microseconds.
2021-01-18 13:21:58.513263: I tensorflow_serving/servables/tensorflow/saved_model_warmup_util.cc:59] No warmup data file found at /tfmodel/mnist/2/assets.extra/tf_serving_warmup_requests
2021-01-18 13:21:58.513467: I tensorflow_serving/core/loader_harness.cc:87] Successfully loaded servable version {name: mnist2 version: 2}
2021-01-18 13:21:58.516620: I tensorflow_serving/model_servers/server.cc:371] Running gRPC ModelServer at 0.0.0.0:8500 ...
[warn] getaddrinfo: address family for nodename not supported
2021-01-18 13:21:58.521317: I tensorflow_serving/model_servers/server.cc:391] Exporting HTTP/REST API at:localhost:8501 ...
[evhttp_server.cc : 238] NET_LOG: Entering the event loop ...

Deploy and verify the TensorFlow inference service.
Create a file named tfserving-test-client.yaml that contains the following content:

kind: Pod
apiVersion: v1
metadata:
  name: tfserving-test-client
spec:
  containers:
  - name: test-client
    image: registry.cn-beijing.aliyuncs.com/ai-samples/tensorflow-serving-test-client:curl
    command: ["sleep","infinity"]
    imagePullPolicy: IfNotPresent

Run the following command to deploy the pod:

kubectl apply -f tfserving-test-client.yaml

Run the following command to query the IP address and port of the service:
arena serve list

The output is similar to the following. The IP address of mymnist2 is 172.16.1.147, and the port is 8501.

NAME      TYPE        VERSION       DESIRED  AVAILABLE  ADDRESS       PORTS
mymnist1  Tensorflow  202101162119  1        0          172.16.3.123  GRPC:8500,RESTFUL:8501
mymnist2  Tensorflow  202101191447  1        1          172.16.1.147  GRPC:8500,RESTFUL:8501

Run the following command to verify that the TensorFlow inference service is available:
kubectl exec -ti tfserving-test-client bash validate.sh 172.16.1.147 8501

Expected output:

{
    "predictions": [
        [2.04608277e-05, 1.72721537e-09, 7.74099826e-05, 0.00364777911, 1.25222937e-06, 2.27521796e-05, 1.14668763e-08, 0.99597472, 3.68833389e-05, 0.000218785644]
    ]
}

The output indicates the following information:
The data sent by the validate.sh script is a list of pixel values from an image in the MNIST test dataset.
Among the digits 0 to 9, the model predicts that the input is most likely 7, with a probability of 0.99597472.
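The check that validate.sh performs can be reproduced by hand. The exact payload validate.sh sends is not shown in this topic, so the request sketch below is an assumption; the URL layout follows TensorFlow Serving's standard REST API (/v1/models/<name>:predict). Picking the predicted digit out of the response is straightforward:

```python
import json

# Sample response body from the verification step above.
RESPONSE = """{ "predictions": [ [2.04608277e-05, 1.72721537e-09, 7.74099826e-05,
0.00364777911, 1.25222937e-06, 2.27521796e-05, 1.14668763e-08, 0.99597472,
3.68833389e-05, 0.000218785644] ] }"""

def predicted_digit(body: str) -> tuple:
    """Return (digit, probability) for the highest-scoring class."""
    scores = json.loads(body)["predictions"][0]
    digit = max(range(len(scores)), key=scores.__getitem__)
    return digit, scores[digit]

print(predicted_digit(RESPONSE))

# To query the service yourself (address and port from `arena serve list`;
# payload shape is an assumption about what validate.sh sends):
#   curl -d '{"instances": [[...784 pixel values...]]}' \
#        http://172.16.1.147:8501/v1/models/mnist:predict
```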