
Platform For AI: Deploy model services with TensorFlow Serving

Last Updated: Apr 02, 2026

Deploy TensorFlow Serving model services using SavedModel format, with built-in version management and rolling updates.

Prerequisites

Before you begin, make sure you have:

  • An OSS bucket for storing model files

  • A trained model exported in SavedModel format

  • Access to the PAI console

Prepare model files

Directory structure

Store model files in OSS using the directory layout below. Name each version directory with an integer; higher numbers indicate newer versions. By default, TensorFlow Serving loads only the highest-numbered version of each model. To load other versions in a multi-model deployment, set model_version_policy in the model configuration file.

tf_serving/
├── modelA/
│   └── 1/
│       ├── saved_model.pb
│       └── variables/
│           ├── variables.data-00000-of-00001
│           └── variables.index
├── modelB/
│   ├── 1/
│   │   └── ...
│   └── 2/
│       └── ...
└── modelC/
    ├── 1/
    │   └── ...
    ├── 2/
    │   └── ...
    └── 3/
        └── ...
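
Before uploading, it can help to check the layout locally. The following is a minimal sketch, assuming the model directories sit in a local folder that mirrors the tree above; the function names latest_version and check_layout are hypothetical helpers, not part of PAI or TensorFlow Serving.

```python
from pathlib import Path


def latest_version(model_dir):
    """Return the highest integer-named subdirectory of model_dir, or None."""
    versions = [
        int(p.name)
        for p in Path(model_dir).iterdir()
        if p.is_dir() and p.name.isdigit()
    ]
    return max(versions, default=None)


def check_layout(root):
    """Map each model directory under root to its highest version number."""
    return {
        p.name: latest_version(p)
        for p in sorted(Path(root).iterdir())
        if p.is_dir()
    }
```

For a local copy of the tree above, check_layout('tf_serving') would map modelA to 1, modelB to 2, and modelC to 3. A value of None signals a model directory with no integer-named version directory.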

Steps:

  1. Create a model directory in your OSS bucket, for example oss://examplebucket/models/tf_serving/. For details, see Manage directories.

  2. Upload your SavedModel files into the directory. Download the sample TensorFlow Serving model file to follow along with the examples in this document.

Model configuration file (multi-model deployments)

Skip this section for single-model deployments.

To run multiple models in a single service, create a model_config.pbtxt file and upload it to OSS alongside your model directories. The sample archive includes a model_config.pbtxt you can use directly or adapt.

Example `model_config.pbtxt`:

model_config_list {
  config {
    name: 'modelA'
    base_path: '/models/modelA/'
    model_platform: 'tensorflow'
    model_version_policy {
      all: {}
    }
  }
  config {
    name: 'modelB'
    base_path: '/models/modelB/'
    model_platform: 'tensorflow'
    model_version_policy {
      specific {
        versions: 1
        versions: 2
      }
    }
    version_labels {
      key: 'stable'
      value: 1
    }
    version_labels {
      key: 'canary'
      value: 2
    }
  }
  config {
    name: 'modelC'
    base_path: '/models/modelC/'
    model_platform: 'tensorflow'
    model_version_policy {
      latest {
        num_versions: 2
      }
    }
  }
}

Key parameters:

| Parameter | Required | Description |
| --- | --- | --- |
| name | No | Custom name for the model. Although optional, set it in practice: without a name, model_name is empty and the service cannot be invoked. |
| base_path | Yes | Path to the model directory inside the service instance. For example, if the mount path is /models and you want to load modelA, set this to /models/modelA. |
| model_version_policy | No | Which versions to load. If omitted, only the latest version loads. Options: all{} (all versions), latest{ num_versions: N } (the N most recent versions), specific{ versions: N } (only the listed version numbers; repeat versions for each one). |
| version_labels | No | Named aliases for specific versions. Use version labels to implement canary releases and rollbacks without changing request paths. See Canary releases and rollbacks with version labels. |
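
When the set of models changes often, the configuration file can be generated rather than edited by hand. The following is a rough sketch under stated assumptions: render_model_config and its dictionary keys (name, base_path, policy, labels) are hypothetical, and the output is ordinary text in the same format as the example above.

```python
def render_model_config(models):
    """Render a model_config_list block from a list of model dicts.

    Each dict needs 'name' and 'base_path'. 'policy' (a snippet such as
    'all {}') and 'labels' (label -> version number) are optional.
    """
    lines = ["model_config_list {"]
    for m in models:
        lines.append("  config {")
        lines.append(f"    name: '{m['name']}'")
        lines.append(f"    base_path: '{m['base_path']}'")
        lines.append("    model_platform: 'tensorflow'")
        if "policy" in m:
            lines.append(f"    model_version_policy {{ {m['policy']} }}")
        for label, version in m.get("labels", {}).items():
            lines.append(
                f"    version_labels {{ key: '{label}' value: {version} }}"
            )
        lines.append("  }")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_model_config([
        {"name": "modelA", "base_path": "/models/modelA/", "policy": "all {}"},
        {
            "name": "modelB",
            "base_path": "/models/modelB/",
            "policy": "specific { versions: 1 versions: 2 }",
            "labels": {"stable": 1, "canary": 2},
        },
    ]))
```

Write the returned text to model_config.pbtxt and upload it to OSS as described above.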

Canary releases and rollbacks with version labels

Version labels let you route traffic to a specific version by name (stable, canary) rather than version number. This is the standard approach for rolling out a new version to a subset of traffic before promoting it fully.

Typical workflow:

  1. Start with stable pointing to version 1 and canary pointing to version 2:

    version_labels { key: 'stable'  value: 1 }
    version_labels { key: 'canary'  value: 2 }
  2. After validating version 2, promote stable to version 2:

    version_labels { key: 'stable'  value: 2 }
    version_labels { key: 'canary'  value: 2 }
  3. To roll back, revert stable to version 1:

    version_labels { key: 'stable'  value: 1 }
    version_labels { key: 'canary'  value: 2 }
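
The promote and rollback steps above amount to changing one number in the configuration file. A minimal sketch of that edit follows, assuming the labels are written in the key/value form shown in this document; set_label is a hypothetical helper, and it updates every entry with the given label, so it suits files where a label name appears under one model only.

```python
import re


def set_label(config_text, label, version):
    """Point every `label` entry in config_text at `version`."""
    pattern = re.compile(
        r"(key:\s*'" + re.escape(label) + r"'\s*value:\s*)\d+"
    )
    new_text, count = pattern.subn(
        lambda m: m.group(1) + str(version), config_text
    )
    if count == 0:
        raise ValueError(f"label {label!r} not found")
    return new_text


if __name__ == "__main__":
    config = (
        "version_labels { key: 'stable'  value: 1 }\n"
        "version_labels { key: 'canary'  value: 2 }"
    )
    promoted = set_label(config, "stable", 2)     # step 2: promote
    rolled_back = set_label(promoted, "stable", 1)  # step 3: roll back
    print(promoted)
    print(rolled_back)
```

If the service runs with --model_config_file_poll_wait_seconds set, uploading the edited file to OSS is enough for the change to be picked up at the next poll.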

Request path with version labels: <service_url>/v1/models/<model_name>/labels/<version_label>:predict

By default, version labels can only be assigned to already-loaded versions. This prevents requests from failing while a new version is still loading. To assign a label to a version before it finishes loading, add --allow_version_labels_for_unavailable_models=true to the startup command. This option is only available in custom model deployment, not scenario-based deployment.

Deploy the service

TensorFlow Serving exposes two ports by default:

  • 8501 — HTTP/REST requests

  • 8500 — gRPC requests

Choose a deployment method based on your needs:

| | Scenario-based deployment | Custom deployment |
| --- | --- | --- |
| Best for | Standard use cases, quick setup | Advanced configuration, gRPC support, custom ports |
| Port | 8501 (fixed) | 8501 or 8500 (configurable) |
| gRPC | Not supported | Supported |
| Startup parameters | Not configurable | Fully configurable |

Scenario-based deployment

  1. Log in to the PAI console. Select a region, choose your workspace, and click Elastic Algorithm Service (EAS).

  2. Click Deploy Service. In the Scenario-based Model Deployment section, click Tensorflow Serving Deployment.

  3. On the Tensorflow Serving Deployment page, configure the following parameters. For other parameters, see Custom Deployment.

    | Parameter | Description |
    | --- | --- |
    | Deployment Method | Standard Model Deployment for a single model, or Configuration File Deployment for multiple models. |
    | Model Settings | For Standard Model Deployment, set the OSS path to the model directory. For Configuration File Deployment, set the OSS model directory path, the Mount Path inside the instance, and the OSS path to model_config.pbtxt. |

    Example configurations:

    | Parameter | Single-model example (modelA) | Multi-model example |
    | --- | --- | --- |
    | Service Name | modela_scene | multi_scene |
    | Deployment Method | Standard Model Deployment | Configuration File Deployment |
    | OSS | oss://examplebucket/models/tf_serving/modelA/ | oss://examplebucket/models/tf_serving/ |
    | Mount Path | Not required | /models |
    | Configuration File | Not required | oss://examplebucket/models/tf_serving/model_config.pbtxt |
  4. Click Deploy.

Custom deployment

Use custom deployment to configure gRPC support, startup parameters, or advanced options such as configuration file polling.

  1. Log in to the PAI console. Select a region, choose your workspace, and click Elastic Algorithm Service (EAS).

  2. Click Deploy Service. In the Custom Model Deployment section, click Custom Deployment.

  3. On the Custom Deployment page, configure the following parameters. For other parameters, see Custom Deployment.

    | Parameter | Description |
    | --- | --- |
    | Image Configuration | Under Alibaba Cloud Image, select tensorflow-serving and the image version. For GPU workloads, choose an image tagged x.xx.x-gpu. |
    | Model Settings | Set the model type to OSS. Configure Uri (OSS path to your model files) and Mount Path (destination path inside the instance). |
    | Command to Run | Startup command for tensorflow-serving. The entry point /usr/bin/tf_serving_entrypoint.sh is pre-filled when you select the TensorFlow Serving image. Append the parameters below. |

    Startup parameters:

    | Parameter | Required | Description |
    | --- | --- | --- |
    | --model_name | No (single-model) | Model name used in the request URL. Defaults to model if not set. |
    | --model_base_path | No (single-model) | Path to the model directory inside the instance. Defaults to /models/model. |
    | --model_config_file | Yes (multi-model) | Path to the model configuration file inside the instance. |
    | --model_config_file_poll_wait_seconds | No (multi-model) | How often (in seconds) the service re-reads the configuration file. Set this to update model configuration without restarting the service. For example, --model_config_file_poll_wait_seconds=30 re-reads the file every 30 seconds. When the service detects a new configuration, it applies only the changes: it unloads removed models and loads added ones. |
    | --allow_version_labels_for_unavailable_models | No | Allows assigning version labels to versions that are not yet loaded. Defaults to false to prevent traffic loss while a version is loading. Set to true only when you need to pre-assign labels before a new version finishes loading. |

    Example configurations:

    | Parameter | Single-model example (modelA) | Multi-model example |
    | --- | --- | --- |
    | Deployment Method | Image-based Deployment | Image-based Deployment |
    | Image Configuration | tensorflow-serving > tensorflow-serving:2.14.1 | tensorflow-serving > tensorflow-serving:2.14.1 |
    | Uri | oss://examplebucket/models/tf_serving/ | oss://examplebucket/models/tf_serving/ |
    | Mount Path | /models | /models |
    | Command to Run | /usr/bin/tf_serving_entrypoint.sh --model_name=modelA --model_base_path=/models/modelA | /usr/bin/tf_serving_entrypoint.sh --model_config_file=/models/model_config.pbtxt --model_config_file_poll_wait_seconds=30 --allow_version_labels_for_unavailable_models=true |
  4. To enable gRPC (port 8500), perform these additional steps in the Environment Information section:

    • Set Port Number to 8500.

    • Turn on Enable gRPC.

    • In Service Configurations, add the following JSON entry: `"networking": { "path": "/" }`

  5. Click Deploy.
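
The Command to Run values from the custom deployment examples are the entry point followed by --flag=value pairs. As an illustration only, a small helper can assemble them consistently; startup_command is a hypothetical name, and the flags it receives are the ones listed in the startup parameters table.

```python
ENTRYPOINT = "/usr/bin/tf_serving_entrypoint.sh"


def startup_command(**flags):
    """Join the entry point with --name=value pairs, in the order given."""
    parts = [ENTRYPOINT]
    for name, value in flags.items():
        if isinstance(value, bool):
            # The examples in this document write booleans in lowercase.
            value = str(value).lower()
        parts.append(f"--{name}={value}")
    return " ".join(parts)


if __name__ == "__main__":
    # Single-model example.
    print(startup_command(model_name="modelA",
                          model_base_path="/models/modelA"))
    # Multi-model example with polling and early label assignment.
    print(startup_command(
        model_config_file="/models/model_config.pbtxt",
        model_config_file_poll_wait_seconds=30,
        allow_version_labels_for_unavailable_models=True,
    ))
```

Paste the printed string into the Command to Run field.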

Send requests

TensorFlow Serving supports HTTP and gRPC requests. The examples below use modelA, an image classification model trained on Fashion-MNIST (28×28 grayscale images, 10 classes). The test input is [[[[1.0]] * 28] * 28].

HTTP requests (port 8501)

All HTTP requests use the format:

POST <service_url>/v1/models/<model_name>:predict

For multi-model deployments, you can also target a specific version or label:

| Request type | Path |
| --- | --- |
| Latest version | <service_url>/v1/models/<model_name>:predict |
| Specific version | <service_url>/v1/models/<model_name>/versions/<version_num>:predict |
| Version label | <service_url>/v1/models/<model_name>/labels/<version_label>:predict |
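
The three path formats differ only in an optional segment before :predict. A minimal sketch of a URL builder follows; predict_url is a hypothetical helper, and the example endpoint is a placeholder rather than a real service.

```python
def predict_url(service_url, model_name, version=None, label=None):
    """Build a predict URL for the latest version, a version, or a label."""
    if version is not None and label is not None:
        raise ValueError("pass either version or label, not both")
    url = f"{service_url.rstrip('/')}/v1/models/{model_name}"
    if version is not None:
        url += f"/versions/{version}"
    elif label is not None:
        url += f"/labels/{label}"
    return url + ":predict"


if __name__ == "__main__":
    base = "http://example.invalid"  # placeholder for <service_url>
    print(predict_url(base, "modelB"))
    print(predict_url(base, "modelB", version=1))
    print(predict_url(base, "modelB", label="canary"))
```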

The <model_name> value depends on how you deployed:

  • Scenario-based, single model: always model

  • Custom deployment, single model: the value of --model_name (defaults to model if not set)

  • Multi-model: the name field in model_config.pbtxt

To find <service_url>, go to the Elastic Algorithm Service (EAS) page and click Invocation Information in the Service Type column for your service.

Test with online debugging

After deployment, click Online Debugging in the Actions column of your service. The Request Parameter Online Tuning panel pre-fills <service_url>. Append the path /v1/models/model:predict and set the Body to:

{"signature_name": "serving_default", "instances": [[[[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], 
[1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], 
[1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]]]]}

Click Send Request. The response shows the class probabilities for your input.
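
Rather than typing the body by hand, you can generate it. The sketch below builds the same 1×28×28×1 input filled with 1.0; build_predict_body is a hypothetical helper name.

```python
import json


def build_predict_body(height=28, width=28, fill=1.0,
                       signature_name="serving_default"):
    """Return a JSON request body with one height x width x 1 image."""
    image = [[[fill] for _ in range(width)] for _ in range(height)]
    return json.dumps({
        "signature_name": signature_name,
        "instances": [image],
    })


if __name__ == "__main__":
    # Copy the printed JSON into the Body field of the debugging panel.
    print(build_predict_body())
```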


Send an HTTP request using Python

from urllib import request
import json

# Replace with your service endpoint and token.
# Find them on the Shared Gateway - Internet Endpoint tab of the Invocation Information page.
service_url = '<service_url>'
token = '<token>'

# For a scenario-based single-model deployment, the model name is 'model'.
# For other cases, see the path format table above.
model_name = 'model'
url = '{}/v1/models/{}:predict'.format(service_url, model_name)

# Build the request.
req = request.Request(url, method='POST')
req.add_header('authorization', token)
data = {
    'signature_name': 'serving_default',
    'instances': [[[[1.0]] * 28] * 28]
}

# Send the request and print the response.
response = request.urlopen(req, data=json.dumps(data).encode('utf-8')).read()
print(json.loads(response))
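
To turn the printed response into a class label, pick the highest score. The sketch below assumes the response uses the TensorFlow Serving row format with a top-level predictions list, that the model has a single output of 10 scores, and that the output indices follow the standard Fashion-MNIST label order; top_class is a hypothetical helper and the sample scores are made up for illustration.

```python
FASHION_MNIST_CLASSES = [
    "T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot",
]


def top_class(response):
    """Return (index, label, score) for the first instance in a response."""
    scores = response["predictions"][0]
    index = max(range(len(scores)), key=scores.__getitem__)
    return index, FASHION_MNIST_CLASSES[index], scores[index]


if __name__ == "__main__":
    # Made-up scores, not real model output.
    sample = {"predictions": [[0.01, 0.02, 0.05, 0.02, 0.1,
                               0.05, 0.05, 0.1, 0.5, 0.1]]}
    print(top_class(sample))
```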

gRPC requests (port 8500)

gRPC support requires custom deployment with port 8500 and Enable gRPC turned on.

import grpc
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2_grpc

# Service endpoint: omit "http://" and append ":80".
# Find the endpoint on the EAS page under Invocation Information.
host = 'tf-serving-multi-grpc-test.166233998075****.cn-hangzhou.pai-eas.aliyuncs.com:80'

# Service token. Find it on the Shared Gateway - Internet Endpoint tab.
token = '<token>'

# Model name and version.
# - Single model, scenario-based: always 'model'
# - Single model, custom deployment: the --model_name value (defaults to 'model')
# - Multi-model: the 'name' field in model_config.pbtxt
name = '<model_name>'
signature_name = 'serving_default'

# Only one version can be requested per call.
# Replace 1 with the version number to query (<version_num>).
version = 1

# Build the request.
req = predict_pb2.PredictRequest()
req.model_spec.name = name
req.model_spec.signature_name = signature_name
req.model_spec.version.value = version
req.inputs['keras_tensor'].CopyFrom(tf.make_tensor_proto([[[[1.0]] * 28] * 28]))

# Send the request.
channel = grpc.insecure_channel(host)
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)
metadata = (('authorization', token),)
response, _ = stub.Predict.with_call(req, metadata=metadata)

print(response)

Key parameters:

| Parameter | Description |
| --- | --- |
| host | The service endpoint without http://, with :80 appended. Find it on the EAS page under Invocation Information. |
| name | Model name. For a single-model scenario-based deployment, use model. For custom deployment, use the value of --model_name. For multi-model, use the name field from model_config.pbtxt. |
| version | Version number to query. Only one version per request. |
| metadata | Request metadata that carries the service token under the authorization key. Find the token on the Invocation Information page. |
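
The host value is derived from the service endpoint by dropping the scheme and appending the port. A minimal sketch of that conversion follows; grpc_target is a hypothetical helper, and the endpoint in the example is a placeholder.

```python
from urllib.parse import urlparse


def grpc_target(service_url, port=80):
    """Strip the scheme and path from an endpoint and append the port."""
    # urlparse only fills in the host when a scheme or '//' prefix is present.
    if "://" not in service_url:
        service_url = "//" + service_url
    hostname = urlparse(service_url).hostname
    if not hostname:
        raise ValueError("no host found in service URL")
    return f"{hostname}:{port}"


if __name__ == "__main__":
    print(grpc_target("http://my-service.example.invalid/"))
```

Pass the result to grpc.insecure_channel in place of the hard-coded host string.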

What's next