This topic describes how to use Arena to deploy a TensorFlow model as an inference
service.
Procedure
- Run the following command to query the GPU resources available in the cluster:
arena top node
Expected output:
NAME                      IPADDRESS      ROLE    STATUS  GPU(Total)  GPU(Allocated)
cn-beijing.192.168.0.100  192.168.0.100  <none>  Ready   1           0
cn-beijing.192.168.0.101  192.168.0.101  <none>  Ready   1           0
cn-beijing.192.168.0.99   192.168.0.99   <none>  Ready   1           0
---------------------------------------------------------------------------------------------------
Allocated/Total GPUs of nodes which own resource nvidia.com/gpu In Cluster:
0/3 (0.0%)
The preceding output shows that the cluster has three GPU-accelerated nodes on which
you can deploy the model.
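If you prefer to inspect a node directly with kubectl, a check such as the following (using the first node name from the example output; replace it with one of your own nodes) shows the nvidia.com/gpu capacity and allocation that the node reports:
# Show the GPU capacity, allocatable amount, and allocated amount on a node.
kubectl describe node cn-beijing.192.168.0.100 | grep -i nvidia.com/gpu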
- Upload the model file to your Object Storage Service (OSS) bucket. For more information,
see Upload objects.
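For example, if you use the ossutil command-line tool, an upload might look like the following sketch. The local directory and the bucket name are placeholders. The tensorflow/1623831335/ prefix is chosen to match the --model-path and --version-policy values that are used later in this topic, because the bucket is mounted at /models in the serving container:
# Recursively copy the local SavedModel directory to OSS.
# Replace the local path and the bucket name with your own values.
ossutil cp -r ./chnsenticorp/1623831335/ oss://<your-bucket>/tensorflow/1623831335/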
- Use the following YAML file to create a persistent volume (PV) and a persistent volume
claim (PVC):
apiVersion: v1
kind: PersistentVolume
metadata:
  name: model-csi-pv
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  csi:
    driver: ossplugin.csi.alibabacloud.com
    volumeHandle: model-csi-pv  # The value must be the same as the name of the PV.
    volumeAttributes:
      bucket: "Your Bucket"
      url: "Your oss url"
      akId: "Your Access Key Id"
      akSecret: "Your Access Key Secret"
      otherOpts: "-o max_stat_cache_size=0 -o allow_other"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: model-pvc
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 5Gi
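Assuming you saved the preceding YAML as pv-pvc.yaml, you can create both objects with kubectl and confirm that the PVC is bound before you continue. If you deploy the inference service in the inference namespace, as in the example later in this topic, create the PVC in that namespace:
# Create the PV and the PVC, then verify that the PVC reaches the Bound state.
kubectl apply -f pv-pvc.yaml -n inference
kubectl get pvc model-pvc -n inference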
- Run the following command to deploy the model by using TensorFlow Serving:
arena serve tensorflow \
  --name=bert-tfserving \
  --model-name=chnsenticorp \
  --gpus=1 \
  --image=tensorflow/serving:1.15.0-gpu \
  --data=model-pvc:/models \
  --model-path=/models/tensorflow \
  --version-policy=specific:1623831335
Expected output:
configmap/bert-tfserving-202106251556-tf-serving created
configmap/bert-tfserving-202106251556-tf-serving labeled
configmap/bert-tfserving-202106251556-tensorflow-serving-cm created
service/bert-tfserving-202106251556-tensorflow-serving created
deployment.apps/bert-tfserving-202106251556-tensorflow-serving created
INFO[0003] The Job bert-tfserving has been submitted successfully
INFO[0003] You can run `arena get bert-tfserving --type tf-serving` to check the job status
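You can also use kubectl to confirm that the serving pod is running and scheduled onto a GPU-accelerated node. The inference namespace is used in this example:
# List the pods that belong to the deployment and the nodes they run on.
kubectl get pods -n inference -o wide | grep bert-tfserving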
- Run the following command to query the result of the TensorFlow Serving deployment:
arena serve list
Expected output:
NAME            TYPE        VERSION       DESIRED  AVAILABLE  ADDRESS        PORTS
bert-tfserving  Tensorflow  202106251556  1        1          172.16.95.171  GRPC:8500,RESTFUL:8501
- Run the following command to query the details about the inference service:
arena serve get bert-tfserving
Expected output:
Name: bert-tfserving
Namespace: inference
Type: Tensorflow
Version: 202106251556
Desired: 1
Available: 1
Age: 4m
Address: 172.16.95.171
Port: GRPC:8500,RESTFUL:8501
Instances:
NAME                                                             STATUS   AGE  READY  RESTARTS  NODE
----                                                             ------   ---  -----  --------  ----
bert-tfserving-202106251556-tensorflow-serving-8554d58d67-jd2z9  Running  4m   1/1    0         cn-beijing.192.168.0.88
The preceding output shows that the model is successfully deployed by using TensorFlow Serving. Port 8500 is exposed for gRPC and port 8501 is exposed for HTTP.
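Before you expose the service over the Internet, you can optionally test it from your local machine by forwarding the RESTful port of the Service that was created in the deployment step. The /v1/models/<model name> path follows the TensorFlow Serving RESTful API, and chnsenticorp is the value of the --model-name parameter:
# Forward the RESTful port of the Service to localhost.
kubectl port-forward -n inference svc/bert-tfserving-202106251556-tensorflow-serving 8501:8501
# In another terminal, query the model status.
curl "http://localhost:8501/v1/models/chnsenticorp"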
- Configure an Internet-facing Ingress. For more information, see Create an Ingress.
Note By default, the inference service that is deployed by running the arena serve tensorflow command is assigned only a cluster IP address. Therefore, the service cannot be accessed over the Internet. You must create an Ingress for the inference service based on the following configurations:
- Set Namespace to inference.
- Set Service Port to 8501. This port is exposed for the RESTful API.
- After you create the Ingress, go to the Ingresses page and find the Ingress. The value in the Rules column contains the address of the Ingress.
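If you prefer to define the Ingress declaratively instead of using the console, a minimal manifest might look like the following sketch. The Service name and port come from the output of the arena serve tensorflow step; the Ingress name and the host are placeholders, and the sketch assumes that an Ingress controller, such as the NGINX Ingress controller, is installed in the cluster:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: bert-tfserving-ingress            # Placeholder name.
  namespace: inference                    # Namespace of the inference service.
spec:
  rules:
    - host: bert-tfserving.example.com    # Placeholder host. Replace it with your own domain.
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: bert-tfserving-202106251556-tensorflow-serving  # Service created by Arena.
                port:
                  number: 8501            # RESTful port of TensorFlow Serving.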

- Run the following command to call the inference service by using the address of the Ingress. For more information about TensorFlow Serving, see TensorFlow Serving API.
curl "http://<Ingress address>"
Expected output:
{
  "model_version_status": [
    {
      "version": "1623831335",
      "state": "AVAILABLE",
      "status": {
        "error_code": "OK",
        "error_message": ""
      }
    }
  ]
}
The output shows that the inference service is available, which indicates that the
inference service is successfully deployed.
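The response above is the model status that the TensorFlow Serving RESTful API returns. To send an actual prediction request, you can POST to the predict endpoint of the model. The structure of the instances payload depends on the serving signature of your model; the field name below is only a hypothetical placeholder:
# Send a prediction request to the TensorFlow Serving RESTful API.
# The request body must match the serving signature of your model.
curl -X POST "http://<Ingress address>/v1/models/chnsenticorp:predict" \
  -H "Content-Type: application/json" \
  -d '{"instances": [{"input": "placeholder"}]}'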