Deploy TensorFlow Serving model services using SavedModel format, with built-in version management and rolling updates.
Prerequisites
Before you begin, make sure you have:
- An OSS bucket for storing model files
- A trained model exported in SavedModel format
- Access to the PAI console
Prepare model files
Directory structure
Store model files in OSS using this directory layout. Version directories must be named with integers — higher numbers indicate newer versions. TensorFlow Serving automatically loads from the highest-numbered version directory.
tf_serving/
├── modelA/
│   └── 1/
│       ├── saved_model.pb
│       └── variables/
│           ├── variables.data-00000-of-00001
│           └── variables.index
├── modelB/
│   ├── 1/
│   │   └── ...
│   └── 2/
│       └── ...
└── modelC/
    ├── 1/
    │   └── ...
    ├── 2/
    │   └── ...
    └── 3/
        └── ...
Steps:
- Create a model directory in your OSS bucket, for example oss://examplebucket/models/tf_serving/. For details, see Manage directories.
- Upload your SavedModel files into the directory. Download the sample TensorFlow Serving model file to follow along with the examples in this document.
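Before uploading, it can help to check locally that a model directory follows the layout above. The following is a minimal sketch, not part of the PAI tooling: the helper name `find_versions` is hypothetical, and the demo builds a throwaway directory that mimics modelB rather than reading real model files.

```python
import re
import tempfile
from pathlib import Path


def find_versions(model_dir: Path) -> list:
    """Return the integer-named version directories that contain saved_model.pb."""
    versions = []
    for child in model_dir.iterdir():
        # Only directories named with plain integers count as versions.
        is_integer_name = re.fullmatch(r"[0-9]+", child.name) is not None
        if child.is_dir() and is_integer_name and (child / "saved_model.pb").is_file():
            versions.append(int(child.name))
    return sorted(versions)


# Demo on a temporary directory that mimics modelB, plus one badly named folder.
with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp) / "modelB"
    for version_name in ("1", "2", "draft"):
        (model_dir / version_name / "variables").mkdir(parents=True)
        (model_dir / version_name / "saved_model.pb").touch()
    versions = find_versions(model_dir)

print(versions)       # the "draft" folder is skipped
print(max(versions))  # the version that is loaded by default
```

Sorting numerically rather than as strings matters once a model reaches version 10, because "10" sorts before "2" as text.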
Model configuration file (multi-model deployments)
Skip this section for single-model deployments.
To run multiple models in a single service, create a model_config.pbtxt file and upload it to OSS alongside your model directories. The sample archive includes a model_config.pbtxt you can use directly or adapt.
Example `model_config.pbtxt`:
model_config_list {
  config {
    name: 'modelA'
    base_path: '/models/modelA/'
    model_platform: 'tensorflow'
    model_version_policy {
      all: {}
    }
  }
  config {
    name: 'modelB'
    base_path: '/models/modelB/'
    model_platform: 'tensorflow'
    model_version_policy {
      specific {
        versions: 1
        versions: 2
      }
    }
    version_labels {
      key: 'stable'
      value: 1
    }
    version_labels {
      key: 'canary'
      value: 2
    }
  }
  config {
    name: 'modelC'
    base_path: '/models/modelC/'
    model_platform: 'tensorflow'
    model_version_policy {
      latest {
        num_versions: 2
      }
    }
  }
}
Key parameters:
| Parameter | Required | Description |
|---|---|---|
| name | No | Custom name for the model. Although optional, set it in practice: without a name, model_name is empty and the service cannot be invoked. |
| base_path | Yes | Path to the model directory inside the service instance. For example, if the mount path is /models and you want to load modelA, set this to /models/modelA. |
| model_version_policy | No | Which versions to load. Without configuration, only the latest version loads. Options: all{} (all versions), latest{ num_versions: N } (N most recent versions), specific{ versions: N } (explicitly listed versions). |
| version_labels | No | Named aliases for specific versions. Use version labels to implement canary releases and rollbacks without changing request paths. See Canary releases and rollbacks with version labels. |
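When several models share the same structure, the configuration file can be generated instead of written by hand. The sketch below is one possible approach using plain string formatting; the helper names `render_config` and `render_model_config` are hypothetical, and the entries reuse the modelA, modelB, and modelC values from the example above.

```python
def render_config(name, base_path, policy="", labels=None):
    """Render one config { ... } entry in protobuf text format."""
    lines = [
        "  config {",
        f"    name: '{name}'",
        f"    base_path: '{base_path}'",
        "    model_platform: 'tensorflow'",
    ]
    if policy:
        # The policy string is passed through as-is, for example "all {}".
        lines.append(f"    model_version_policy {{ {policy} }}")
    for label, version in (labels or {}).items():
        lines.append(f"    version_labels {{ key: '{label}' value: {version} }}")
    lines.append("  }")
    return "\n".join(lines)


def render_model_config(entries):
    """Wrap all rendered entries in a model_config_list block."""
    body = "\n".join(render_config(**entry) for entry in entries)
    return f"model_config_list {{\n{body}\n}}\n"


config_text = render_model_config([
    {"name": "modelA", "base_path": "/models/modelA/", "policy": "all {}"},
    {"name": "modelB", "base_path": "/models/modelB/",
     "policy": "specific { versions: 1 versions: 2 }",
     "labels": {"stable": 1, "canary": 2}},
    {"name": "modelC", "base_path": "/models/modelC/",
     "policy": "latest { num_versions: 2 }"},
])
print(config_text)
```

The output is a compact, single-line-per-field variant of the example file; save it as model_config.pbtxt and upload it next to the model directories.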
Canary releases and rollbacks with version labels
Version labels let you route traffic to a specific version by name (stable, canary) rather than version number. This is the standard approach for rolling out a new version to a subset of traffic before promoting it fully.
Typical workflow:
- Start with `stable` pointing to version 1 and `canary` pointing to version 2: `version_labels { key: 'stable' value: 1 } version_labels { key: 'canary' value: 2 }`
- After validating version 2, promote `stable` to version 2: `version_labels { key: 'stable' value: 2 } version_labels { key: 'canary' value: 2 }`
- To roll back, revert `stable` to version 1: `version_labels { key: 'stable' value: 1 } version_labels { key: 'canary' value: 2 }`
Request path with version labels: <service_url>/v1/models/<model_name>/labels/<version_label>:predict
By default, version labels can only be assigned to already-loaded versions. This prevents requests from failing while a new version is still loading. To assign a label to a version before it finishes loading, add --allow_version_labels_for_unavailable_models=true to the startup command. This option is only available in custom model deployment, not scenario-based deployment.
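The promote and roll back workflow in this section amounts to rewriting two label values. The following sketch models that with a plain dictionary and prints the matching `version_labels` lines; the function names are hypothetical, and writing the result back into model_config.pbtxt is left out.

```python
def render_labels(labels):
    """Render a label mapping as version_labels entries, one per line."""
    return "\n".join(
        f"version_labels {{ key: '{key}' value: {value} }}"
        for key, value in labels.items()
    )


def promote(labels):
    """Point stable at the version that canary currently serves."""
    return {**labels, "stable": labels["canary"]}


def roll_back(labels, previous_stable):
    """Point stable back at a previously validated version."""
    return {**labels, "stable": previous_stable}


initial = {"stable": 1, "canary": 2}
promoted = promote(initial)
rolled_back = roll_back(promoted, previous_stable=initial["stable"])

for step_name, labels in (("initial", initial), ("promoted", promoted), ("rolled back", rolled_back)):
    print(f"# {step_name}")
    print(render_labels(labels))
```

Because clients call the `stable` or `canary` label rather than a version number, none of these steps changes the request path.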
Deploy the service
TensorFlow Serving exposes two ports by default:
- 8501: HTTP/REST requests
- 8500: gRPC requests
Choose a deployment method based on your needs:
| | Scenario-based deployment | Custom deployment |
|---|---|---|
| Best for | Standard use cases, quick setup | Advanced configuration, gRPC support, custom ports |
| Port | 8501 (fixed) | 8501 or 8500 (configurable) |
| gRPC | Not supported | Supported |
| Startup parameters | Not configurable | Fully configurable |
Scenario-based deployment
- Log in to the PAI console. Select a region, choose your workspace, and click Elastic Algorithm Service (EAS).
- Click Deploy Service. In the Scenario-based Model Deployment section, click Tensorflow Serving Deployment.
- On the Tensorflow Serving Deployment page, configure the following parameters. For other parameters, see Custom Deployment.
  | Parameter | Description |
  |---|---|
  | Deployment Method | Standard Model Deployment for a single model, or Configuration File Deployment for multiple models. |
  | Model Settings | For Standard Model Deployment, set the OSS path to the model directory. For Configuration File Deployment, set the OSS model directory path, the Mount Path inside the instance, and the OSS path to model_config.pbtxt. |
  Example configurations:
  | Parameter | Single-model example (modelA) | Multi-model example |
  |---|---|---|
  | Service Name | modela_scene | multi_scene |
  | Deployment Method | Standard Model Deployment | Configuration File Deployment |
  | OSS | oss://examplebucket/models/tf_serving/modelA/ | oss://examplebucket/models/tf_serving/ |
  | Mount Path | N/A | /models |
  | Configuration File | N/A | oss://examplebucket/models/tf_serving/model_config.pbtxt |
- Click Deploy.
Custom deployment
Use custom deployment to configure gRPC support, startup parameters, or advanced options such as configuration file polling.
- Log in to the PAI console. Select a region, choose your workspace, and click Elastic Algorithm Service (EAS).
- Click Deploy Service. In the Custom Model Deployment section, click Custom Deployment.
- On the Custom Deployment page, configure the following parameters. For other parameters, see Custom Deployment.
  | Parameter | Description |
  |---|---|
  | Image Configuration | Under Alibaba Cloud Image, select tensorflow-serving and the image version. For GPU workloads, choose an image tagged x.xx.x-gpu. |
  | Model Settings | Set the model type to OSS. Configure Uri (OSS path to your model files) and Mount Path (destination path inside the instance). |
  | Command to Run | Startup command for tensorflow-serving. The entry point /usr/bin/tf_serving_entrypoint.sh is pre-filled when you select the TensorFlow Serving image. Append the parameters below. |
  Startup parameters:
  | Parameter | Required | Description |
  |---|---|---|
  | --model_name | No (single-model) | Model name used in the request URL. Defaults to model if not set. |
  | --model_base_path | No (single-model) | Path to the model directory inside the instance. Defaults to /models/model. |
  | --model_config_file | Yes (multi-model) | Path to the model configuration file inside the instance. |
  | --model_config_file_poll_wait_seconds | No (multi-model) | How often (in seconds) the service re-reads the configuration file. Set this to update model configuration without restarting the service. For example, --model_config_file_poll_wait_seconds=30 re-reads the file every 30 seconds. When the service detects a new configuration, it applies only the changes, unloading removed models and loading added ones. |
  | --allow_version_labels_for_unavailable_models | No | Allows assigning version labels to versions that are not yet loaded. Defaults to false to prevent traffic loss while a version is loading. Set to true only when you need to pre-assign labels before a new version finishes loading. |
  Example configurations:
  | Parameter | Single-model example (modelA) | Multi-model example |
  |---|---|---|
  | Deployment Method | Image-based Deployment | Image-based Deployment |
  | Image Configuration | tensorflow-serving > tensorflow-serving:2.14.1 | tensorflow-serving > tensorflow-serving:2.14.1 |
  | Uri | oss://examplebucket/models/tf_serving/ | oss://examplebucket/models/tf_serving/ |
  | Mount Path | /models | /models |
  | Command to Run | /usr/bin/tf_serving_entrypoint.sh --model_name=modelA --model_base_path=/models/modelA | /usr/bin/tf_serving_entrypoint.sh --model_config_file=/models/model_config.pbtxt --model_config_file_poll_wait_seconds=30 --allow_version_labels_for_unavailable_models=true |
- To enable gRPC (port 8500), perform these additional steps in the Environment Information section:
  - Set Port Number to 8500.
  - Turn on Enable gRPC.
  - In Service Configurations, add: `"networking": { "path": "/" }`
- Click Deploy.
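The Command to Run value is the entry point followed by `--flag=value` pairs, so it can be assembled programmatically when you script deployments. This is a minimal sketch with a hypothetical helper, `build_command`; it only builds the string and does not call any PAI API.

```python
ENTRYPOINT = "/usr/bin/tf_serving_entrypoint.sh"


def build_command(**flags):
    """Join the entry point with --flag=value pairs, in the order given."""
    parts = [ENTRYPOINT]
    for flag, value in flags.items():
        if isinstance(value, bool):
            # Boolean flags are written in lowercase: true or false.
            value = str(value).lower()
        parts.append(f"--{flag}={value}")
    return " ".join(parts)


single_model_command = build_command(
    model_name="modelA",
    model_base_path="/models/modelA",
)
multi_model_command = build_command(
    model_config_file="/models/model_config.pbtxt",
    model_config_file_poll_wait_seconds=30,
    allow_version_labels_for_unavailable_models=True,
)
print(single_model_command)
print(multi_model_command)
```

Both strings match the Command to Run values used in the single-model and multi-model examples for custom deployment.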
Send requests
TensorFlow Serving supports HTTP and gRPC requests. The examples below use modelA, an image classification model trained on Fashion-MNIST (28×28 grayscale images, 10 classes). The test input is [[[[1.0]] * 28] * 28].
HTTP requests (port 8501)
All HTTP requests use the format:
POST <service_url>/v1/models/<model_name>:predict
For multi-model deployments, you can also target a specific version or label:
| Request type | Path |
|---|---|
| Latest version | <service_url>/v1/models/<model_name>:predict |
| Specific version | <service_url>/v1/models/<model_name>/versions/<version_num>:predict |
| Version label | <service_url>/v1/models/<model_name>/labels/<version_label>:predict |
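The three path formats in the table differ only in an optional `/versions/<version_num>` or `/labels/<version_label>` segment. A small helper can build them consistently; `predict_url` is a hypothetical name, and the service URL below is a placeholder rather than a real endpoint.

```python
def predict_url(service_url, model_name, version=None, label=None):
    """Build a predict path for the latest version, a version number, or a label."""
    if version is not None and label is not None:
        raise ValueError("Pass either version or label, not both.")
    url = f"{service_url.rstrip('/')}/v1/models/{model_name}"
    if version is not None:
        url += f"/versions/{version}"
    elif label is not None:
        url += f"/labels/{label}"
    return url + ":predict"


# Placeholder endpoint; replace it with the value shown for your service.
service_url = "http://example.invalid/api/predict/multi_scene"

print(predict_url(service_url, "modelB"))
print(predict_url(service_url, "modelB", version=1))
print(predict_url(service_url, "modelB", label="canary"))
```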
The <model_name> value depends on how you deployed:
- Scenario-based, single model: always `model`
- Custom deployment, single model: the value of `--model_name` (defaults to `model` if not set)
- Multi-model: the `name` field in `model_config.pbtxt`
To find <service_url>, go to the Elastic Algorithm Service (EAS) page and click Invocation Information in the Service Type column for your service.
Test with online debugging
After deployment, click Online Debugging in the Actions column of your service. The Request Parameter Online Tuning panel pre-fills <service_url>. Append the path /v1/models/model:predict and set the Body to:
{"signature_name": "serving_default", "instances": [[[[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], 
[1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], 
[1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]]]]}
Click Send Request. The response shows the class probabilities for your input.
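Rather than pasting the long request body by hand, you can generate it. The sketch below builds the same all-ones 28×28 test image and serializes it to JSON; the variable names are illustrative only.

```python
import json

SIDE = 28  # Fashion-MNIST images are 28x28 pixels with one channel.

# One image: 28 rows, each with 28 single-channel pixels set to 1.0.
image = [[[1.0] for _ in range(SIDE)] for _ in range(SIDE)]

# "instances" holds a batch, so the single image is wrapped in a list.
body = json.dumps({"signature_name": "serving_default", "instances": [image]})

print(len(body))
print(body[:80] + "...")
```

Paste the printed value of `body` into the Body field, or send it directly from a script.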
Send an HTTP request using Python
from urllib import request
import json
# Replace with your service endpoint and token.
# Find them on the Shared Gateway - Internet Endpoint tab of the Call Info page.
service_url = '<service_url>'
token = '<token>'
# For a scenario-based single-model deployment, the model name is 'model'.
# For other cases, see the path format table above.
model_name = 'model'
url = '{}/v1/models/{}:predict'.format(service_url, model_name)
# Build the request.
req = request.Request(url, method='POST')
req.add_header('authorization', token)
data = {
    'signature_name': 'serving_default',
    'instances': [[[[1.0]] * 28] * 28]
}
# Send the request and print the response.
response = request.urlopen(req, data=json.dumps(data).encode('utf-8')).read()
print(json.loads(response))
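To turn the printed response into a class label, pick the highest score from the first prediction. The sketch below assumes the response uses the `predictions` key that the TensorFlow Serving REST API returns for `instances` requests, and that the model outputs ten scores in the standard Fashion-MNIST label order; the sample values are made up for illustration and are not real model output.

```python
# Standard Fashion-MNIST label order (index 0 to 9).
FASHION_MNIST_CLASSES = [
    "T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot",
]


def top_class(response):
    """Return (index, class name, score) for the first instance in the response."""
    scores = response["predictions"][0]
    index = max(range(len(scores)), key=scores.__getitem__)
    return index, FASHION_MNIST_CLASSES[index], scores[index]


# Hypothetical response body with made-up scores.
sample_response = {
    "predictions": [[0.01, 0.02, 0.05, 0.02, 0.1, 0.03, 0.6, 0.07, 0.08, 0.02]]
}
print(top_class(sample_response))
```

If your model was trained with a different label mapping, replace `FASHION_MNIST_CLASSES` accordingly.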
gRPC requests (port 8500)
gRPC support requires custom deployment with port 8500 and Enable gRPC turned on.
import grpc
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2_grpc
# Service endpoint: omit "http://" and append ":80".
# Find the endpoint on the EAS page under Invocation Information.
host = 'tf-serving-multi-grpc-test.166233998075****.cn-hangzhou.pai-eas.aliyuncs.com:80'
# Service token. Find it on the Shared Gateway - Internet Endpoint tab.
token = '<token>'
# Model name and version.
# - Single model, scenario-based: always 'model'
# - Single model, custom deployment: the --model_name value (defaults to 'model')
# - Multi-model: the 'name' field in model_config.pbtxt
name = '<model_name>'
signature_name = 'serving_default'
# Only one version can be requested per call.
# Replace 1 with the version number to query.
version = 1
# Build the request.
req = predict_pb2.PredictRequest()
req.model_spec.name = name
req.model_spec.signature_name = signature_name
req.model_spec.version.value = version
req.inputs['keras_tensor'].CopyFrom(tf.make_tensor_proto([[[[1.0]] * 28] * 28]))
# Send the request.
channel = grpc.insecure_channel(host)
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)
metadata = (('authorization', token),)
response, _ = stub.Predict.with_call(req, metadata=metadata)
print(response)
Key parameters:
| Parameter | Description |
|---|---|
| host | The service endpoint without http://, with :80 appended. Find it on the EAS page under Invocation Information. |
| name | Model name. For a single-model scenario-based deployment, use model. For custom deployment, use the value of --model_name. For multi-model, use the name field from model_config.pbtxt. |
| version | Version number to query. Only one version per request. |
| metadata | Service token from the Invocation Information page. |
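The `host` value can be derived from the service URL instead of edited by hand. This is a small sketch using only the standard library; `grpc_host` is a hypothetical helper and the URL is a placeholder.

```python
from urllib.parse import urlsplit


def grpc_host(service_url, port=80):
    """Strip the scheme and any path from a service URL and append the port."""
    # urlsplit only recognizes the host when a scheme or "//" prefix is present.
    if "//" not in service_url:
        service_url = "//" + service_url
    hostname = urlsplit(service_url).hostname
    if not hostname:
        raise ValueError("No hostname found in the service URL.")
    return f"{hostname}:{port}"


# Placeholder endpoint; replace it with the value shown for your service.
print(grpc_host("http://tf-serving.example.invalid/"))
print(grpc_host("tf-serving.example.invalid"))
```

Pass the result to `grpc.insecure_channel` in place of the hard-coded `host` string.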
What's next
- Deploy a model service using Triton Inference Server: Deploy a service using a Triton Inference Server image
- Package a custom runtime and use it to deploy a service: Custom images