After training a TensorFlow model, you need to serve it as a network-accessible API for applications to call. This guide shows you how to use Arena to deploy a TensorFlow SavedModel as a TensorFlow Serving inference service on an ACK cluster, covering model upload to OSS, persistent storage configuration, serving instance launch, and external access through an Ingress.
Prerequisites
Before you begin, ensure that you have:
This guide uses a BERT model trained with TensorFlow 1.15 and exported as a SavedModel.
Prepare model storage
Step 1: Check available GPU resources
Run the following command to verify GPU availability in the cluster:
```
arena top node
```

The output lists all GPU nodes and their allocation status:

```
NAME                      IPADDRESS      ROLE    STATUS  GPU(Total)  GPU(Allocated)
cn-beijing.192.168.0.100  192.168.0.100  <none>  Ready   1           0
cn-beijing.192.168.0.101  192.168.0.101  <none>  Ready   1           0
cn-beijing.192.168.0.99   192.168.0.99   <none>  Ready   1           0
---------------------------------------------------------------------------------------------------
Allocated/Total GPUs of nodes which own resource nvidia.com/gpu In Cluster:
0/3 (0.0%)
```

The cluster has three GPU nodes, each with one unallocated GPU.
Step 2: Upload the model to OSS
The following steps use ossutil on Linux. For other operating systems, see ossutil.
Create a bucket named `examplebucket`:

```
ossutil64 mb oss://examplebucket
```

The following output confirms the bucket is created:

```
0.668238(s) elapsed
```

Upload the SavedModel to the bucket. A SavedModel is a directory, so pass `-r` to copy it recursively:

```
ossutil64 cp -r model.savedmodel oss://examplebucket
```
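Before uploading, it can help to confirm that the export actually produced a complete SavedModel. The following is a minimal sketch, assuming the standard SavedModel layout (a `saved_model.pb` graph definition plus a `variables/` directory); the helper name is illustrative, not part of any tool used in this guide:

```python
import os

def looks_like_saved_model(path):
    """Return True if `path` has the layout TensorFlow Serving expects:
    a saved_model.pb graph definition plus a variables/ directory."""
    return (
        os.path.isfile(os.path.join(path, "saved_model.pb"))
        and os.path.isdir(os.path.join(path, "variables"))
    )
```

Run it against the local `model.savedmodel` directory before invoking `ossutil64 cp`; an incomplete export is much cheaper to catch here than after the serving container fails to load the model.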
Step 3: Create a persistent volume and persistent volume claim
To mount the OSS bucket as a volume inside the serving container, create a persistent volume (PV) and a persistent volume claim (PVC).
Create a file named `Tensorflow.yaml` with the following content:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: model-csi-pv
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  csi:
    driver: ossplugin.csi.alibabacloud.com
    volumeHandle: model-csi-pv # Must match the PV name above
    volumeAttributes:
      bucket: "Your Bucket"
      url: "Your oss url"
      akId: "Your Access Key Id"
      akSecret: "Your Access Key Secret"
      otherOpts: "-o max_stat_cache_size=0 -o allow_other"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: model-pvc
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 5Gi
```

Replace the following parameters:
| Parameter | Description |
|---|---|
| `bucket` | The name of the OSS bucket. For naming rules, see Bucket naming conventions. |
| `url` | The endpoint URL for the bucket. For instructions on getting the URL, see Obtain the URL of a single object or the URLs of multiple objects. |
| `akId` | The AccessKey ID used to access the OSS bucket. Use a Resource Access Management (RAM) user's AccessKey. For details, see Create an AccessKey pair. |
| `akSecret` | The AccessKey secret paired with the AccessKey ID above. |
| `otherOpts` | Custom mount options for the OSS bucket. `-o max_stat_cache_size=0` disables metadata caching so the system always reads the latest object metadata from OSS. `-o allow_other` lets other users on the node access the mounted bucket. For additional options, see Custom parameters supported by ossfs. |

Apply the manifest to create the PV and PVC:

```
kubectl apply -f Tensorflow.yaml
```
Deploy the inference service
Step 4: Launch TensorFlow Serving
Run the following command to deploy a TensorFlow Serving instance named bert-tfserving:
```
arena serve tensorflow \
  --name=bert-tfserving \
  --model-name=chnsenticorp \
  --gpus=1 \
  --image=tensorflow/serving:1.15.0-gpu \
  --data=model-pvc:/models \
  --model-path=/models/tensorflow \
  --version-policy=specific:1623831335
```

| Parameter | Description |
|---|---|
--name | The name of the serving job. |
--model-name | The model name TensorFlow Serving uses to identify the model in API requests. |
--gpus | The number of GPUs to allocate to the serving instance. |
--image | The TensorFlow Serving container image. Must match the TensorFlow version used during training. |
--data | Mounts the PVC into the container. Format: <pvc-name>:<mount-path>. |
--model-path | The path inside the container where the model is stored. |
--version-policy | The model version to load. specific:<version> pins serving to a single version. |
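Under the hood, `--version-policy=specific:<version>` corresponds to TensorFlow Serving's model configuration. As a sketch only (Arena generates this configuration for you; the field names follow TensorFlow Serving's `ModelServerConfig` text format, with values taken from the command above), the equivalent hand-written model config would look like:

```
model_config_list {
  config {
    name: "chnsenticorp"
    base_path: "/models/tensorflow"
    model_platform: "tensorflow"
    model_version_policy {
      specific {
        versions: 1623831335
      }
    }
  }
}
```

Pinning a specific version this way keeps the server from automatically loading newer version directories that appear under the model path.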
The following output confirms the job is submitted:
```
configmap/bert-tfserving-202106251556-tf-serving created
configmap/bert-tfserving-202106251556-tf-serving labeled
configmap/bert-tfserving-202106251556-tensorflow-serving-cm created
service/bert-tfserving-202106251556-tensorflow-serving created
deployment.apps/bert-tfserving-202106251556-tensorflow-serving created
INFO[0003] The Job bert-tfserving has been submitted successfully
INFO[0003] You can run `arena get bert-tfserving --type tf-serving` to check the job status
```

Step 5: Verify the service is running
List all running inference services:
```
arena serve list
```

The output shows the bert-tfserving service with its address and ports:

```
NAME            TYPE        VERSION       DESIRED  AVAILABLE  ADDRESS        PORTS
bert-tfserving  Tensorflow  202106251556  1        1          172.16.95.171  GRPC:8500,RESTFUL:8501
```

Get the full details of the service:
```
arena serve get bert-tfserving
```

```
Name:       bert-tfserving
Namespace:  inference
Type:       Tensorflow
Version:    202106251556
Desired:    1
Available:  1
Age:        4m
Address:    172.16.95.171
Port:       GRPC:8500,RESTFUL:8501

Instances:
  NAME                                                             STATUS   AGE  READY  RESTARTS  NODE
  ----                                                             ------   ---  -----  --------  ----
  bert-tfserving-202106251556-tensorflow-serving-8554d58d67-jd2z9  Running  4m   1/1    0         cn-beijing.192.168.0.88
```

The service is deployed in the `inference` namespace. Port 8500 serves gRPC requests and port 8501 serves HTTP/RESTful requests.
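From inside the cluster, the two ports map onto TensorFlow Serving's standard endpoints. The following small helper sketch builds the documented REST API URLs; the host and model name are the values from the output above, and the helper itself is illustrative, not part of Arena:

```python
def tf_serving_urls(host, model_name, rest_port=8501):
    """Build the standard TensorFlow Serving REST endpoints for a model."""
    base = f"http://{host}:{rest_port}/v1/models/{model_name}"
    return {
        "status": base,                   # GET: per-version load status
        "metadata": f"{base}/metadata",   # GET: signature definitions
        "predict": f"{base}:predict",     # POST: JSON inference requests
    }
```

For example, `tf_serving_urls("172.16.95.171", "chnsenticorp")["predict"]` returns `http://172.16.95.171:8501/v1/models/chnsenticorp:predict`, which is only reachable from inside the cluster until an Ingress is created in the next section.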
Access the service externally
arena serve tensorflow assigns a cluster IP by default, which is only reachable from within the cluster. Create an Ingress to expose the service for external access.
Step 6: Create an Ingress
In the ACK console, go to the Clusters page, click the target cluster, and navigate to Network > Ingresses in the left-side navigation pane.
From the Namespace drop-down list at the top of the page, select the `inference` namespace (the same namespace shown in the service details above). Click Create Ingress in the upper-right corner. For a full description of Ingress parameters, see Create an NGINX Ingress. Use the following settings:
- Name: `Tensorflow`
- Rules:
  - Domain name: Enter a custom domain, for example `test.example.com`
  - Path: `/`
  - Rule: ImplementationSpecific (default)
  - Service name: The service name returned by `kubectl get service`
  - Port: `8501`
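If you prefer kubectl to the console, a manifest equivalent to the settings above might look like the following sketch. The service name is taken from the `service/... created` line in the earlier deployment output, but verify it with `kubectl get service -n inference`; the Ingress name and domain are the example values from this guide, and you may also need an `ingressClassName` matching your cluster's Ingress controller:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: tensorflow
  namespace: inference
spec:
  rules:
    - host: test.example.com   # Example custom domain from the settings above
      http:
        paths:
          - path: /
            pathType: ImplementationSpecific
            backend:
              service:
                # Verify with: kubectl get service -n inference
                name: bert-tfserving-202106251556-tensorflow-serving
                port:
                  number: 8501  # RESTful port of the serving service
```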
After the Ingress is created, return to the Ingresses page. The Rules column shows the Ingress address.

Step 7: Call the inference API
Run the following command to call the API of the inference service. For more information about TensorFlow Serving, see TensorFlow Serving API.
```
curl "http://<Ingress address>"
```

A successful response looks like this:
```json
{
    "model_version_status": [
        {
            "version": "1623831335",
            "state": "AVAILABLE",
            "status": {
                "error_code": "OK",
                "error_message": ""
            }
        }
    ]
}
```

The `"state": "AVAILABLE"` field confirms the model is loaded and ready to serve inference requests.