TensorFlow Serving uses Alibaba Cloud elastic computing resources (Elastic Compute Service (ECS) or EGS), Server Load Balancer, and Object Storage Service (OSS) to serve predictions from TensorFlow models.


Before running a model prediction task, make sure you have performed the following operations:

  • Create a container cluster that contains a certain number of elastic computing resources (ECS or EGS). For more information, see Create a container cluster.
  • Use the same account to create an OSS bucket to store the model used for prediction.
  • Create data volumes for the preceding container cluster to mount the OSS bucket as a local directory to the container in which you want to run the prediction task. For more information, see Create a data volume.
  • Understand the basic concept and working process of TensorFlow Serving. For more information, see Serving a TensorFlow Model.


To simplify model prediction, this example follows these conventions:

  • Create a folder named after the model in the root directory of the OSS bucket.
  • Make sure that the frontend and backend ports of the Server Load Balancer listener are the same as the application port used for model prediction.


  1. Store the prediction model in the shared storage.

    You can do this by uploading the model folder with the OSS client.

    1. Create a folder using the model name in the root directory of the OSS bucket. In this example, use mnist.

    2. Upload the TensorFlow Serving model folder with the version number to mnist.
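TensorFlow Serving only loads a model whose folder contains integer-named version subdirectories, each holding a saved_model.pb file (and usually a variables/ directory). Before uploading to OSS, you can sanity-check the local folder layout with a small script; the helper below is a sketch, and its name is not part of any SDK:

```python
import os

def serving_versions(model_dir):
    """Return the integer version subdirectories TensorFlow Serving can load.

    TensorFlow Serving expects <model_dir>/<version>/saved_model.pb, where
    <version> is a non-negative integer; the server loads the highest version.
    """
    versions = []
    for name in os.listdir(model_dir):
        path = os.path.join(model_dir, name)
        if (name.isdigit() and os.path.isdir(path)
                and os.path.isfile(os.path.join(path, "saved_model.pb"))):
            versions.append(int(name))
    return sorted(versions)
```

For the mnist example, running this helper against the local mnist folder should report at least one version before you upload the folder to the bucket.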

  2. Configure the Server Load Balancer listening port.
    1. On the Server Load Balancer console, click Create Server Load Balancer in the upper-right corner to create a Server Load Balancer instance for routing.

      In this example, create an Internet Server Load Balancer instance. You can choose to create an Internet or intranet Server Load Balancer instance as needed.

      Select the same region as the Container Service cluster you are using, because Server Load Balancer does not support cross-region deployment.
    2. Return to the Server Load Balancer console and name the created Server Load Balancer instance TensorFlow-serving. Container Service references this Server Load Balancer instance by this name.

      Click Instances in the left-side navigation pane. Select the region in which the Server Load Balancer instance resides. Edit the instance name and then click Confirm.

    3. Create a listening port.

      Click Manage on the right of the instance. Click Listeners in the left-side navigation pane. Click Add Listener in the upper-right corner and then set the configurations. Select TCP as the frontend protocol and configure the port mapping as 8000:8000.
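After the listener is created and a backend application is running, you can confirm from any machine with Internet access that the frontend port accepts TCP connections. This is a generic reachability check, not an Alibaba Cloud API call:

```python
import socket

def listener_is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, checking your Server Load Balancer address on port 8000 should return True once the prediction application configured in step 3 is running behind the listener.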

  3. Start model prediction.
    1. Log on to the Container Service console.
    2. Click Images and Templates > Solutions in the left-side navigation pane.
    3. Click Launch in Prediction.

    4. Configure the basic information for the model prediction task.

      • Cluster: Select the cluster in which the model prediction will be run.
      • Application Name: The name of the application used to run the standalone model prediction. The name can be 1 to 64 characters long and can contain numbers, letters, and hyphens (-).
      • Framework: Select the framework used for model prediction. The options include TensorFlow Serving and a custom image. In this example, select TensorFlow Serving.
      • Model Name: The model name must be the same as the name of the model folder created in step 1 (mnist in this example).
      • Number of Instances: The number of TensorFlow Serving instances, which cannot exceed the number of nodes in the cluster.
      • GPUs Per Instance: The number of GPUs to use. Setting this parameter to 0 means that the CPU, instead of the GPU, is used for prediction.
      • Data Source: Select the data volume that was created in the cluster from the OSS bucket storing the prediction model. For how to create a data volume, see Create a data volume.
      • Load Balancer Instance: Select the Server Load Balancer instance created in step 2.
      • Load Balancer Port: Enter the port set in step 2.
  4. After completing the configurations, click OK.
  5. On the Application List page, select the cluster from the Cluster drop-down list and then click the name of the created application.
  6. Click the Routes tab. The endpoint address provided by Server Load Balancer is displayed. You can then use a gRPC client to access the service at Server Load Balancer address:Server Load Balancer port.
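A minimal gRPC client for the mnist example might look like the sketch below. The address 1.2.3.4 is a placeholder for your Server Load Balancer address, and the predict_images signature name comes from the standard TensorFlow Serving mnist export; substitute the values for your own model. It assumes the grpcio, tensorflow, and tensorflow-serving-api packages are installed.

```python
def predict(images, host="1.2.3.4", port=8000, model_name="mnist"):
    """Send a batch of flattened 28x28 images to the served mnist model."""
    # Imported lazily so the sketch can be read without the
    # tensorflow-serving-api package installed.
    import grpc
    import tensorflow as tf
    from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

    # host:port is the Server Load Balancer address and listener port (step 2).
    channel = grpc.insecure_channel("%s:%d" % (host, port))
    stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

    request = predict_pb2.PredictRequest()
    request.model_spec.name = model_name  # must match the OSS folder name
    request.model_spec.signature_name = "predict_images"
    request.inputs["images"].CopyFrom(
        tf.make_tensor_proto(images, shape=[len(images), 784]))

    # Blocks until the reply arrives or the 10-second deadline expires.
    return stub.Predict(request, timeout=10.0)
```

The returned PredictResponse contains the model's output tensors, keyed by the output names defined in the model's signature.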