This topic describes how to use Arena to deploy a TensorFlow model as an inference service.



  1. Run the following command to query the GPU resources available in the cluster:
    arena top node

    Expected output:

    NAME                      IPADDRESS      ROLE    STATUS  GPU(Total)  GPU(Allocated)
    cn-beijing.  <none>  Ready   1           0
    cn-beijing.  <none>  Ready   1           0
    cn-beijing.   <none>  Ready   1           0
    Allocated/Total GPUs of nodes which own resource In Cluster:
    0/3 (0.0%)

    The preceding output shows that the cluster has three GPU-accelerated nodes on which you can deploy the model.
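
    If the arena client is not installed on your machine, you can retrieve roughly the same information with kubectl, because the GPUs on these nodes are exposed through the standard nvidia.com/gpu extended resource. This is a minimal sketch and not part of the original procedure:

    # List each node name together with its nvidia.com/gpu capacity and current allocation.
    kubectl describe nodes | grep -E "^Name:|nvidia.com/gpu"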

  2. Upload the model file to your Object Storage Service (OSS) bucket. For more information, see Upload objects.
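    For example, if you use the ossutil command-line tool, the upload might look like the following. The local directory name and the bucket path are placeholders that you must replace with your own values:

    # Recursively copy the exported model directory to the OSS path that the PV will mount.
    ossutil cp -r ./chnsenticorp oss://<your-bucket>/models/tensorflow/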
  3. Use the following YAML file to create a persistent volume (PV) and a persistent volume claim (PVC):
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: model-csi-pv
    spec:
      capacity:
        storage: 5Gi
      accessModes:
        - ReadWriteMany
      persistentVolumeReclaimPolicy: Retain
      csi:
        driver: ossplugin.csi.alibabacloud.com
        volumeHandle: model-csi-pv   # The value must be the same as the name of the PV.
        volumeAttributes:
          bucket: "Your Bucket"
          url: "Your oss url"
          akId: "Your Access Key Id"
          akSecret: "Your Access Key Secret"
          otherOpts: "-o max_stat_cache_size=0 -o allow_other"
    ---
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: model-pvc
    spec:
      accessModes:
        - ReadWriteMany
      resources:
        requests:
          storage: 5Gi
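
    After you save the preceding manifest to a file, you can create the objects and confirm that the PVC is bound. The file name and the inference namespace used below are assumptions based on this example:

    kubectl apply -f model-pv-pvc.yaml -n inference
    kubectl get pvc model-pvc -n inference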
  4. Run the following command to deploy the model by using TensorFlow Serving:
    arena serve tensorflow \
      --name=bert-tfserving \
      --model-name=chnsenticorp  \
      --gpus=1  \
      --image=tensorflow/serving:1.15.0-gpu \
      --data=model-pvc:/models \
      --model-path=/models/tensorflow

    Expected output:

    configmap/bert-tfserving-202106251556-tf-serving created
    configmap/bert-tfserving-202106251556-tf-serving labeled
    configmap/bert-tfserving-202106251556-tensorflow-serving-cm created
    service/bert-tfserving-202106251556-tensorflow-serving created
    deployment.apps/bert-tfserving-202106251556-tensorflow-serving created
    INFO[0003] The Job bert-tfserving has been submitted successfully
    INFO[0003] You can run `arena get bert-tfserving --type tf-serving` to check the job status
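
    Note that TensorFlow Serving expects the directory that you pass in --model-path to contain one numeric version subdirectory per model version. The layout below is only an illustration; the version number matches the one returned by the service in the final step:

    /models/tensorflow/            # --model-path inside the serving container, backed by model-pvc
    └── 1623831335/                # numeric version directory required by TensorFlow Serving
        ├── saved_model.pb
        └── variables/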
  5. Run the following command to check the status of the TensorFlow Serving deployment:
    arena serve list

    Expected output:

    NAME            TYPE        VERSION       DESIRED  AVAILABLE  ADDRESS        PORTS
    bert-tfserving  Tensorflow  202106251556  1        1  GRPC:8500,RESTFUL:8501
  6. Run the following command to query the details about the inference service:
    arena serve get bert-tfserving

    Expected output:

    Name:       bert-tfserving
    Namespace:  inference
    Type:       Tensorflow
    Version:    202106251556
    Desired:    1
    Available:  1
    Age:        4m
    Port:       GRPC:8500,RESTFUL:8501
      NAME                                                             STATUS   AGE  READY  RESTARTS  NODE
      ----                                                             ------   ---  -----  --------  ----
      bert-tfserving-202106251556-tensorflow-serving-8554d58d67-jd2z9  Running  4m   1/1    0         cn-beijing.

    The preceding output shows that the model is successfully deployed by using TensorFlow Serving. Port 8500 is exposed for gRPC requests and port 8501 is exposed for RESTful (HTTP) requests.
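
    If the pod is not in the Running state or the service does not respond in the later steps, you can inspect the TensorFlow Serving logs. The pod name below is taken from the preceding output; replace it with the name shown in your cluster:

    kubectl logs -n inference bert-tfserving-202106251556-tensorflow-serving-8554d58d67-jd2z9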

  7. Configure an Internet-facing Ingress. For more information, see Create an Ingress.
    Note By default, the inference service that is deployed by running the arena serve tensorflow command is assigned only a cluster IP address. Therefore, the service cannot be accessed over the Internet. You must create an Ingress for the inference service based on the following configurations (a YAML sketch that implements these settings follows the list):
    • Set Namespace to inference.
    • Set Service Port to 8501. This port is exposed for the RESTful API.
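    If you prefer to manage the Ingress as a manifest instead of using the console, a minimal sketch that implements the preceding settings might look like the following. The host name is an assumption; the Service name and port come from the outputs of the previous steps:

    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: bert-tfserving-ingress
      namespace: inference
    spec:
      rules:
      - host: bert-tfserving.example.com   # assumed domain name; replace with your own
        http:
          paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: bert-tfserving-202106251556-tensorflow-serving
                port:
                  number: 8501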
  8. After you create the Ingress, go to the Ingresses page and find the Ingress. The value in the Rules column contains the address of the Ingress.
  9. Run the following command to call the inference service by using the address of the Ingress. For more information about TensorFlow Serving, see TensorFlow Serving API.
    curl "http://<Ingress address>"

    Expected output:

     "model_version_status": [
       "version": "1623831335",
       "state": "AVAILABLE",
       "status": {
        "error_code": "OK",
        "error_message": ""

    The state of the model version is AVAILABLE, which indicates that the inference service is deployed successfully.
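
    To send an actual prediction request, you can call the standard TensorFlow Serving RESTful predict endpoint of the model that you deployed (chnsenticorp). The request body below is only a placeholder; the expected input format depends on the signature of your exported SavedModel:

    curl -X POST "http://<Ingress address>/v1/models/chnsenticorp:predict" \
      -H "Content-Type: application/json" \
      -d '{"instances": [<model-specific input>]}'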