Deploy a Platform of Artificial Intelligence (PAI) Elastic Algorithm Service (EAS) model service using the TensorFlow Serving inference engine - Platform For AI

Prerequisites

Model files

To use image-based deployment with TensorFlow Serving, store your model files in Object Storage Service (OSS) with a directory structure that meets the following requirements:

Model version directory: Each model must have at least one version directory with an integer name. This number serves as the version identifier. A larger number indicates a newer version.
Model files: The version directory stores the exported model files in the SavedModel format. By default, the service loads the model files from the version directory with the highest version number.

Follow these steps:

Create a model storage directory in an OSS bucket. For example, oss://examplebucket/models/tf_serving/. For more information, see Manage directories.

Upload the model files to the directory that you created in the previous step. You can download and use the sample TensorFlow Serving model files for this example. The model storage directory has the following structure:

tf_serving 
├── modelA
│   └── 1
│       ├── saved_model.pb
│       └── variables
│           ├── variables.data-00000-of-00001
│           └── variables.index
│
├── modelB
│   ├── 1
│   │   └── ...
│   └── 2
│       └── ...
│
└── modelC
    ├── 1
    │   └── ...
    ├── 2
    │   └── ...
    └── 3
        └── ...
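Before you upload, you can check locally which version TensorFlow Serving would load by default for each model. The following minimal Python sketch assumes that the models are on a local disk in the same layout as the tree above. The function name `latest_versions` is hypothetical. Version names are compared as integers, so version 10 counts as newer than version 9.

```python
import os
import tempfile


def latest_versions(root):
    """Return {model_name: highest version} for each model directory under root.

    Only subdirectories whose names consist of ASCII digits count as version
    directories. A model without any version directory maps to None.
    """
    result = {}
    for model in sorted(os.listdir(root)):
        model_dir = os.path.join(root, model)
        if not os.path.isdir(model_dir):
            continue  # Skip plain files such as model_config.pbtxt.
        versions = [
            int(name)
            for name in os.listdir(model_dir)
            if name.isascii() and name.isdigit()
            and os.path.isdir(os.path.join(model_dir, name))
        ]
        result[model] = max(versions) if versions else None
    return result


if __name__ == "__main__":
    # Recreate the example tree in a temporary directory and inspect it.
    with tempfile.TemporaryDirectory() as root:
        layout = {"modelA": [1], "modelB": [1, 2], "modelC": [1, 2, 3]}
        for model, versions in layout.items():
            for v in versions:
                os.makedirs(os.path.join(root, model, str(v)))
        print(latest_versions(root))  # {'modelA': 1, 'modelB': 2, 'modelC': 3}
```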

Model configuration file

Use a configuration file to run multiple models in a single service. If you need to deploy only a single-model service, you can skip this section.

Create a configuration file and upload it to OSS. The sample files provided in the Model files section include a model configuration file named model_config.pbtxt, which you can use as-is or modify as needed. In this example, the model configuration file is uploaded to the oss://examplebucket/models/tf_serving/ directory.

The following is a sample model_config.pbtxt configuration file:

model_config_list {
  config {
    name: 'modelA'
    base_path: '/models/modelA/'
    model_platform: 'tensorflow'
    model_version_policy {
      all: {}
    }
  }
  config {
    name: 'modelB'
    base_path: '/models/modelB/'
    model_platform: 'tensorflow'
    model_version_policy {
      specific {
        versions: 1
        versions: 2
      }
    }
    version_labels {
      key: 'stable'
      value: 1
    }
    version_labels {
      key: 'canary'
      value: 2
    }
  }
  config {
    name: 'modelC'
    base_path: '/models/modelC/'
    model_platform: 'tensorflow'
    model_version_policy {
      latest {
        num_versions: 2
      }
    }
  }
}
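If you manage many models, you may prefer to generate the configuration file rather than edit it by hand. The following minimal Python sketch renders a model_config_list in the same protobuf text format as the sample above, writing each nested message on one line, which the text format also accepts. The helper name `render_model_config` and the dictionary keys it reads are hypothetical, and the sketch covers only the fields used in this topic.

```python
def render_model_config(models):
    """Render a TensorFlow Serving model_config_list in protobuf text format.

    Each item in `models` is a dict with these (hypothetical) keys:
      name, base_path: strings
      policy: None, ("all",), ("latest", n), or ("specific", [v1, v2, ...])
      labels: optional dict that maps a version label to a version number
    """
    lines = ["model_config_list {"]
    for m in models:
        lines.append("  config {")
        lines.append(f"    name: '{m['name']}'")
        lines.append(f"    base_path: '{m['base_path']}'")
        lines.append("    model_platform: 'tensorflow'")
        policy = m.get("policy")
        if policy is not None:
            kind = policy[0]
            if kind == "all":
                body = "all: {}"
            elif kind == "latest":
                body = f"latest {{ num_versions: {policy[1]} }}"
            elif kind == "specific":
                versions = " ".join(f"versions: {v}" for v in policy[1])
                body = f"specific {{ {versions} }}"
            else:
                raise ValueError(f"unknown policy: {kind}")
            lines.append(f"    model_version_policy {{ {body} }}")
        # Each label becomes one version_labels map entry.
        for label, version in m.get("labels", {}).items():
            lines.append(f"    version_labels {{ key: '{label}' value: {version} }}")
        lines.append("  }")
    lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Reproduce the sample configuration for modelA, modelB, and modelC.
    config = render_model_config([
        {"name": "modelA", "base_path": "/models/modelA/", "policy": ("all",)},
        {"name": "modelB", "base_path": "/models/modelB/",
         "policy": ("specific", [1, 2]), "labels": {"stable": 1, "canary": 2}},
        {"name": "modelC", "base_path": "/models/modelC/", "policy": ("latest", 2)},
    ])
    print(config)
```

You can write the returned string to model_config.pbtxt and upload it to the OSS directory used above.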

The following table describes the key parameters.

Parameter	Required	Description
name	No	A custom name for the model. Although this parameter is optional, we recommend that you specify it: if it is not specified, the model name is empty and the model cannot be invoked.
base_path	Yes	The model's storage path within the instance. The service reads model files from this path during deployment. For example, if the mount path is `/models` and the model to be loaded is in the `/models/modelA` directory, set this parameter to `/models/modelA`.
model_version_policy	No	The policy for loading model versions. If you do not specify this parameter, only the latest version of the model is loaded. all{}: Loads all versions of the model. In the example, all versions of modelA are loaded. latest{}: Loads the latest N versions of the model, where N is set by num_versions. In the example, `num_versions: 2` is specified for modelC, so the two latest versions (versions 2 and 3) are loaded. specific{}: Loads the specified versions. In the example, versions 1 and 2 of modelB are loaded.
version_labels	No	Assigns a custom label to a model version. If you do not configure version labels, you can distinguish model versions only by version number, and the request path is `/v1/models/<model name>/versions/<version number>:predict`. If you configure version labels, you can use a label to point to a specific version, and the request path is `/v1/models/<model name>/labels/<version label>:predict`. Note: By default, you can assign a version label only to a model version that is loaded and running. To assign a label to a version that is not yet loaded, specify the `--allow_version_labels_for_unavailable_models=true` startup parameter in the Command to Run field. Scenario-based deployment does not support configuring the Command to Run; use custom deployment instead.
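To make the three policies concrete, the following minimal Python sketch computes which version directories each policy selects, given the versions that exist under base_path. It is a simplified model written for this topic, not TensorFlow Serving's own code. The function name `versions_to_load` and the tuple encoding of policies are hypothetical, and the sketch does not model what happens when a version listed under specific{} is missing from storage.

```python
def versions_to_load(available, policy=None):
    """Return the sorted version numbers that a model_version_policy selects.

    available: iterable of integer version directories found under base_path.
    policy: None (default, same as latest 1), ("all",), ("latest", n),
            or ("specific", [v1, v2, ...]).
    """
    versions = sorted(set(available))
    if policy is None:
        policy = ("latest", 1)  # No policy: only the newest version is loaded.
    kind = policy[0]
    if kind == "all":
        return versions
    if kind == "latest":
        n = policy[1]
        return versions[-n:] if n > 0 else []
    if kind == "specific":
        wanted = set(policy[1])
        return [v for v in versions if v in wanted]
    raise ValueError(f"unknown policy: {kind}")


if __name__ == "__main__":
    # The three models from the sample configuration.
    print(versions_to_load([1], ("all",)))                 # [1]
    print(versions_to_load([1, 2], ("specific", [1, 2])))  # [1, 2]
    print(versions_to_load([1, 2, 3], ("latest", 2)))      # [2, 3]
    print(versions_to_load([1, 2, 3]))                     # [3]
```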

Service deployment

Deploy a TensorFlow Serving service by using one of the following image-based deployment methods:

Scenario-based deployment: This method is ideal for basic scenarios and requires configuring only a few parameters.
Custom deployment: This method offers more configuration flexibility, such as changing the port or setting the polling interval for the model configuration file.

Important

TensorFlow Serving model services support ports 8501 and 8500.

8501: supports HTTP requests. An HTTP or REST service is started on port 8501.
8500: supports gRPC requests. A gRPC service is started on port 8500.

The scenario-based deployment method uses port 8501 by default, which cannot be changed. If you need to use port 8500, select custom deployment.

Scenario-based deployment

Follow these steps:

Log on to the PAI console. Select a region at the top of the page, select the desired workspace, and then click Elastic Algorithm Service (EAS).
On the Elastic Algorithm Service (EAS) page, click Deploy Service. In the Scenario-based Model Deployment section, click Tensorflow Serving Deployment.

On the Tensorflow Serving Deployment page, configure the parameters. The following table describes the key parameters. For information about other parameters, see Custom Deployment.

Parameter	Description
Deployment Method	The following deployment methods are supported: Standard Model Deployment: deploys a single-model service. Configuration File Deployment: deploys a multi-model service.
Model Settings	If you set Deployment Method to Standard Model Deployment, specify the OSS path of your model files. If you set Deployment Method to Configuration File Deployment, configure the following parameters: OSS: the OSS path where the model files are located. Mount Path: the destination path in the service instance to which the model files are mounted. Configuration File: the OSS path where the model configuration file is located.

The following table shows example configurations.

Parameter	Single-model example	Multi-model example
Service Name	modela_scene	multi_scene
Deployment Method	Select Standard Model Deployment.	Select Configuration File Deployment.
Model Settings	OSS: `oss://examplebucket/models/tf_serving/modelA/`	OSS: `oss://examplebucket/models/tf_serving/`. Mount Path: `/models`. Configuration File: `oss://examplebucket/models/tf_serving/model_config.pbtxt`.

After you configure the parameters, click Deploy.

Custom deployment

Follow these steps:

Log on to the PAI console. Select a region at the top of the page, select the desired workspace, and then click Elastic Algorithm Service (EAS).
On the Elastic Algorithm Service (EAS) page, click Deploy Service. In the Custom Model Deployment section, click Custom Deployment.

On the Custom Deployment page, configure the parameters. The following table describes the key parameters. For information about other parameters, see Custom Deployment.

Parameter	Description
Image Configuration	In the Alibaba Cloud Image list, select tensorflow-serving and the corresponding image version. We recommend that you select the latest version. Note: If the service requires GPUs, select an image version in the x.xx.x-gpu format.
Model Settings	You can configure model files in multiple ways. This example uses OSS. Uri: Select the OSS path where the model files are located. Mount Path: The destination path in the service instance where the model files are mounted.
Command to Run	The startup parameters for tensorflow-serving. When you select a tensorflow-serving image, the command `/usr/bin/tf_serving_entrypoint.sh` is loaded automatically. Append the following parameters to it. Startup parameters for single-model deployment: `--model_name`: The model name, which is used in the service request URL. If unspecified, the default value is `model`. `--model_base_path`: The model storage path in the instance. If unspecified, the default path is `/models/model`. Startup parameters for multi-model deployment: `--model_config_file`: Required. Specifies the path of the model configuration file. `--model_config_file_poll_wait_seconds`: Optional. The interval, in seconds, at which the service re-reads the model configuration file, so that you can modify the file after the service starts. For example, `--model_config_file_poll_wait_seconds=30` makes the service read the model configuration file every 30 seconds. Note: Each time the service reads the model configuration file, it applies only the content of the current file. For example, if the old configuration file includes model A and the new configuration file removes model A and adds model B, the service unloads model A and loads model B. `--allow_version_labels_for_unavailable_models`: Optional. Default value: false. Set this parameter to true to assign a label to a model version that is not yet loaded. For example, `--allow_version_labels_for_unavailable_models=true`.

The following table shows example configurations.

Parameter	Single-model example	Multi-model example
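Deployment Method	Select Image-based Deployment.	Select Image-based Deployment.
Image Configuration	Select Alibaba Cloud Image: tensorflow-serving > tensorflow-serving:2.14.1.	Select Alibaba Cloud Image: tensorflow-serving > tensorflow-serving:2.14.1.
Model Settings	Set the model type to OSS. Uri: `oss://examplebucket/models/tf_serving/`. Mount Path: `/models`.	Set the model type to OSS. Uri: `oss://examplebucket/models/tf_serving/`. Mount Path: `/models`.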
Command to Run	`/usr/bin/tf_serving_entrypoint.sh --model_name=modelA --model_base_path=/models/modelA`	`/usr/bin/tf_serving_entrypoint.sh --model_config_file=/models/model_config.pbtxt --model_config_file_poll_wait_seconds=30 --allow_version_labels_for_unavailable_models=true`
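If you script deployments, you may want to assemble the Command to Run string from parameters. The following minimal Python sketch does that for the flags described in this topic. The helper name `build_command` is hypothetical. It only concatenates strings, so it does not check whether a given image version accepts the flags. For the example inputs, it reproduces the two commands in the table above.

```python
ENTRYPOINT = "/usr/bin/tf_serving_entrypoint.sh"


def build_command(model_name=None, model_base_path=None, model_config_file=None,
                  poll_wait_seconds=None, allow_unavailable_labels=False):
    """Assemble a Command to Run string for the tensorflow-serving image.

    Use model_name/model_base_path for a single model, or model_config_file
    (plus the optional polling and label flags) for multiple models.
    """
    # This sketch treats the two modes as alternatives, as this topic presents them.
    if model_config_file and (model_name or model_base_path):
        raise ValueError("use either single-model flags or --model_config_file, not both")
    args = [ENTRYPOINT]
    if model_config_file:
        args.append(f"--model_config_file={model_config_file}")
        if poll_wait_seconds is not None:
            args.append(f"--model_config_file_poll_wait_seconds={poll_wait_seconds}")
        if allow_unavailable_labels:
            args.append("--allow_version_labels_for_unavailable_models=true")
    else:
        if model_name:
            args.append(f"--model_name={model_name}")
        if model_base_path:
            args.append(f"--model_base_path={model_base_path}")
    return " ".join(args)


if __name__ == "__main__":
    print(build_command(model_name="modelA", model_base_path="/models/modelA"))
    print(build_command(model_config_file="/models/model_config.pbtxt",
                        poll_wait_seconds=30, allow_unavailable_labels=True))
```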

The port number is 8501 by default. An HTTP or REST service is started on port 8501 to support HTTP requests. To enable the service to support gRPC requests, perform the following steps:

In the Environment Information section, change the Port Number to 8500.
In the Environment Information section, turn on the Enable gRPC switch.
In the Service Configurations section, add the following configuration:
```
"networking": {
    "path": "/"
}
```

Click Deploy.

Service requests

The request protocol (HTTP or gRPC) depends on the port configured during service deployment. The following examples use modelA.

Prepare test data

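modelA is an image classification model trained on the Fashion-MNIST dataset. Each sample is a 28x28 grayscale image, and the model outputs a score for each of the 10 classes, where a higher score indicates that the sample more likely belongs to that class. For testing, use [[[[1.0]] * 28] * 28] as the input data for the modelA service request. This Python expression represents a batch of one 28x28 single-channel image in which every pixel is 1.0.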
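The following Python sketch builds the test input and the JSON request body used in the examples below, and checks that the input has the shape [1, 28, 28, 1] (batch, height, width, channels). The helper `shape_of` is hypothetical, and the body uses the `instances` (row) format of the TensorFlow Serving REST API.

```python
import json

# One 28x28 grayscale image with a single channel, every pixel set to 1.0.
instances = [[[[1.0]] * 28] * 28]


def shape_of(nested):
    """Return the shape of a regularly nested Python list."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0]
    return tuple(shape)


# JSON body in the row format: a list of instances plus the signature to call.
body = json.dumps({"signature_name": "serving_default", "instances": instances})

if __name__ == "__main__":
    print(shape_of(instances))  # (1, 28, 28, 1)
```

The `*` operator repeats references to the same inner lists instead of copying them. That is harmless here because the data is only serialized and never modified in place.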

Request examples

HTTP requests

When the port number is set to 8501, the service supports HTTP requests. The HTTP request paths for single-model and multi-model deployments are as follows.

Single model

Path format: <service_url>/v1/models/<model_name>:predict

Where:

Scenario-based deployment: <model_name> cannot be customized and is always model.
Custom deployment: <model_name> is the model name specified by --model_name in the Command to Run field. If no model name is specified, the default value is model.

Multiple models

You can send requests without specifying a version, or you can specify a model version. The path formats are as follows:

Without specifying a version (the latest loaded version serves the request):

<service_url>/v1/models/<model_name>:predict

By specifying a model version:

<service_url>/v1/models/<model_name>/versions/<version_num>:predict

By specifying a version label, if version labels are configured:

<service_url>/v1/models/<model_name>/labels/<version label>:predict

In these paths, <model_name> is the model name specified in the model configuration file.

<service_url> is the URL of your deployed service. To view it, go to the Elastic Algorithm Service (EAS) page and click Invocation Information in the Service Type column of the service. When you use online debugging, the console fills in this part of the path automatically.

For a scenario-based deployment of the single model modelA, the HTTP request path is <service_url>/v1/models/model:predict.
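The path rules above can be captured in a small helper. The following Python sketch builds the predict URL for the three cases. The function name `predict_url` and the example host `svc.example.com` are hypothetical placeholders; replace the host with your own <service_url>.

```python
def predict_url(service_url, model_name="model", version=None, label=None):
    """Build a TensorFlow Serving REST predict URL for an EAS service.

    Pass at most one of `version` (an integer) or `label` (a version label).
    """
    if version is not None and label is not None:
        raise ValueError("specify either version or label, not both")
    base = f"{service_url.rstrip('/')}/v1/models/{model_name}"
    if version is not None:
        base += f"/versions/{int(version)}"
    elif label is not None:
        base += f"/labels/{label}"
    return base + ":predict"


if __name__ == "__main__":
    url = "http://svc.example.com"
    print(predict_url(url))                             # single model, scenario-based
    print(predict_url(url, "modelB", version=2))        # specific version
    print(predict_url(url, "modelB", label="canary"))   # version label
```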

The following examples show how to send a service request by using the online debugging feature in the console and by using Python code.

Online debugging

After the service is deployed, click Online Debugging in the Actions column for the service. In the Request Parameter Online Tuning section, the service URL is pre-filled. Append the path /v1/models/model:predict to the URL and configure the request data in the Body field:

{"signature_name": "serving_default", "instances": [[[[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], 
[1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], "], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], 
[1.0], [1.0] "], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0] "], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]

After the parameters are configured, click Send Request. The output is similar to the following.

Status Code: 200
Connection: Keep-Alive
Content-Length: 158
Content-Type: application/json
Date: Fri, 08 Nov 2024 02:28:08 GMT
Body: {
    "predictions": [[-9.40927601, -5.41267443, -15.9157038, -15.7119455, -16.194952, -42.5621414, -7.23207045, -43.1042442, 4.25585461, -31.3991375]
    ]
}
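The predictions field contains one row per input instance, with one value per class. Because the values include negative numbers, they appear to be unnormalized scores (logits) rather than probabilities. The following Python sketch converts a row into a predicted class and its softmax probability. It assumes that the model's last layer applies no softmax and that the model uses the standard Fashion-MNIST label order (0 = T-shirt/top through 9 = Ankle boot). The names `top_class` and `FASHION_MNIST_CLASSES` are hypothetical.

```python
import math

# Standard Fashion-MNIST label order (assumed to match modelA's output indices).
FASHION_MNIST_CLASSES = [
    "T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot",
]

# The predictions from the sample response above.
predictions = [[-9.40927601, -5.41267443, -15.9157038, -15.7119455, -16.194952,
                -42.5621414, -7.23207045, -43.1042442, 4.25585461, -31.3991375]]


def softmax(scores):
    """Convert raw scores to probabilities, subtracting the max for numerical stability."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_class(scores):
    """Return (index, class name, probability) of the highest-scoring class."""
    probs = softmax(scores)
    idx = max(range(len(scores)), key=lambda i: scores[i])
    return idx, FASHION_MNIST_CLASSES[idx], probs[idx]


if __name__ == "__main__":
    for row in predictions:
        print(top_class(row))  # Index 8 has the highest score in the sample.
```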

Send an HTTP request with Python

The following is a Python code example:

from urllib import request
import json

# Replace the following placeholders with your service URL and token.
# To obtain them, click Invocation Information for the service in the
# inference services list and go to the Public Endpoint tab.
service_url = '<service_url>'
token = '<test-token>'
# For a scenario-based single-model deployment, set model_name to "model".
# For other deployments, see the path formats described above.
model_name = "model"
url = "{}/v1/models/{}:predict".format(service_url, model_name)

# Request body: one 28x28 grayscale image with a single channel, all pixels 1.0.
data = {
    'signature_name': 'serving_default',
    'instances': [[[[1.0]] * 28] * 28]
}

# Create a POST request that carries the JSON body and the service token.
req = request.Request(url, data=json.dumps(data).encode('utf-8'), method="POST")
req.add_header('Authorization', token)
req.add_header('Content-Type', 'application/json')

# Send the request and parse the JSON response.
with request.urlopen(req) as resp:
    response = json.loads(resp.read())
print(response)

gRPC request

After you set the port number to 8500, turn on Enable gRPC, and add the networking configuration described in Custom deployment, the service supports gRPC requests. The following Python code provides an example:

import grpc
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2_grpc

# The service endpoint. For the format, see the description of the host parameter below.
host = "tf-serving-multi-grpc-test.166233998075****.cn-hangzhou.pai-eas.aliyuncs.com:80"
# Replace <test-token> with the service token. You can find the token on the Public Endpoint tab.
token = "<test-token>"
# The model name. For more information, see the description of the name parameter below.
name = "<model_name>"
signature_name = "serving_default"
# The model version number, as an integer. Replace 1 with the version to call.
# You can send a request to only one model version at a time.
version = 1

# Create a gRPC request.
request = predict_pb2.PredictRequest()
request.model_spec.name = name
request.model_spec.signature_name = signature_name
request.model_spec.version.value = version
# To target a version label instead, set request.model_spec.version_label
# (for example, 'stable') and do not set the version.
# The input key must match the input name in the model's serving signature;
# in this example it is "keras_tensor".
request.inputs["keras_tensor"].CopyFrom(
    tf.make_tensor_proto([[[[1.0]] * 28] * 28], dtype=tf.float32))

# Send the request to the service. The token is passed in the request metadata.
channel = grpc.insecure_channel(host)
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)
metadata = (("authorization", token),)
response, _ = stub.Predict.with_call(request, metadata=metadata)
print(response)

The key parameters are described as follows:

Parameter	Description
host	Set this parameter to the service URL. Omit `http://` from the URL and append `:80` to the end. To find the service URL, go to the Elastic Algorithm Service (EAS) page. Then, find the service that you want to call and click Invocation Information in the Service Type column.
name	To send a gRPC request for a single model: For scenario-based deployments, set this parameter to `model`. For custom deployments, set this parameter to the model name specified in the Command to Run. If no model name is specified in the command, the default value is `model`. To send a gRPC request for multiple models: Set this parameter to the model name specified in the model configuration file.
version	Set this parameter to the model version number, as an integer. You can send a request to only one model version at a time.
metadata	The request metadata, which passes the service token in the authorization key. You can find the token on the Invocation Information page.
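The host rule in the table above (omit http:// and append :80) can be applied programmatically. The following Python sketch derives the gRPC target from a service URL. The helper name `grpc_target` is hypothetical, and discarding any path in the URL is an assumption of this sketch.

```python
from urllib.parse import urlsplit


def grpc_target(service_url, port=80):
    """Turn an EAS service URL into a gRPC target of the form host:port.

    Drops the http:// scheme and appends the port. Any path component of
    the URL is discarded (an assumption of this sketch).
    """
    # urlsplit only recognizes the host name when a scheme is present.
    if "://" not in service_url:
        service_url = "http://" + service_url
    host = urlsplit(service_url).hostname
    if not host:
        raise ValueError(f"cannot find a host name in {service_url!r}")
    return f"{host}:{port}"


if __name__ == "__main__":
    print(grpc_target("http://svc.example.com/"))  # svc.example.com:80
```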
