Deploy TensorFlow models (SavedModel format) as REST API services with CPU or GPU support.
Prerequisites
Keras and Checkpoint models must be converted to the SavedModel format before deployment. For more information, see Export TensorFlow models in the SavedModel format. Models optimized by PAI-Blade can be deployed directly without conversion.
Deploy model
- Create a service configuration file.
  When you deploy a TensorFlow model service by using the EASCMD client, set the processor parameter to the processor name.
  {
    "name": "tf_serving_test",
    "model_path": "http://examplebucket.oss-cn-shanghai.aliyuncs.com/models/model.tar.gz",
    "processor": "tensorflow_cpu_1.15",
    "metadata": {
      "instance": 1,
      "cpu": 1,
      "memory": 4000
    }
  }
  For more information about deploying services by using the client, see Service Deployment: EASCMD & DSW. To deploy TensorFlow model services from the console, see Service Deployment: Console.
- After the service is deployed, go to the Elastic Algorithm Service (EAS) page. Find the service and click View Invocation Information in the Service Method column to view the endpoint and the authentication token.
Advanced: Model warm-up
When some TensorFlow models are called for the first time, their related files and parameters must be loaded into memory. This loading can take a long time and cause high response times or errors such as 408 (request timeout) or 450 (queue full). Configure model warm-up to prevent this service jitter during rolling updates.
- Create a request file for warm-up.
  For more information, see Advanced Configuration: Model service warm-up.
- Add the warm-up configuration to the service configuration file.
  {
    "name": "tf_serving_test",
    "model_path": "http://examplebucket.oss-cn-shanghai.aliyuncs.com/models/model.tar.gz",
    "processor": "tensorflow_cpu_1.15",
    "warm_up_data_path": "oss://path/to/warm_up_test.bin", // The path of the request file for model warm-up.
    "metadata": {
      "instance": 1,
      "cpu": 1,
      "memory": 4000
    }
  }
Call service
TensorFlow services use the protobuf format for input and output. Online debugging does not support the protobuf format.
EAS provides a software development kit (SDK) that encapsulates service requests and responses and provides direct connection and fault tolerance mechanisms. We recommend that you use the SDK to call services.
- Query the model structure.
  Models in the standard SavedModel format return the model structure in JSON format when you send an empty request.
  // Empty request
  $ curl 1828488879222***.cn-shanghai.pai-eas.aliyuncs.com/api/predict/mnist_saved_model_example -H 'Authorization: YTg2ZjE0ZjM4ZmE3OTc0NzYxZDMyNmYzMTJjZTQ1***'
  // Response: model structure
  {
    "inputs": [
      {
        "name": "images",
        "shape": [-1, 784],
        "type": "DT_FLOAT"
      }
    ],
    "outputs": [
      {
        "name": "scores",
        "shape": [-1, 10],
        "type": "DT_FLOAT"
      }
    ],
    "signature_name": "predict_images"
  }
  Note: Models in the frozen pb format do not return model structure information.
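The returned structure can also be consumed programmatically, for example to check input names and shapes before building requests. A minimal sketch using only the standard library, with the example response above as input:

```python
import json

# Example model-structure response, as returned for the MNIST SavedModel above
response_text = '''
{
  "inputs":  [{"name": "images", "shape": [-1, 784], "type": "DT_FLOAT"}],
  "outputs": [{"name": "scores", "shape": [-1, 10],  "type": "DT_FLOAT"}],
  "signature_name": "predict_images"
}
'''

structure = json.loads(response_text)

# Collect the input tensors the service expects; -1 marks a variable batch dimension
expected_inputs = {t["name"]: t["shape"] for t in structure["inputs"]}
print(structure["signature_name"])  # predict_images
print(expected_inputs)              # {'images': [-1, 784]}
```

In a real client, `response_text` would be the body returned by the empty request shown above.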
- Send an inference request.
  The following sample code shows how to send a request to the model service by using the Python SDK.
  #!/usr/bin/env python
  from eas_prediction import PredictClient
  from eas_prediction import TFRequest

  if __name__ == '__main__':
      # Initialize the client with the endpoint and service name
      client = PredictClient('http://1828488879222***.cn-shanghai.pai-eas.aliyuncs.com', 'mnist_saved_model_example')
      client.set_token('YTg2ZjE0ZjM4ZmE3OTc0NzYxZDMyNmYzMTJjZTQ1****')
      client.init()

      # Build the request with the signature name
      req = TFRequest('predict_images')
      req.add_feed('images', [1, 784], TFRequest.DT_FLOAT, [1] * 784)

      # Send requests
      for x in range(0, 1000000):
          resp = client.predict(req)
          print(resp)
  For more information about the parameter settings in the code, see Using the Python SDK.
  To build service requests manually, see Request format.
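In the SDK call above, `add_feed` takes the concrete tensor shape and a flat list of values, so a batch of inputs must be flattened before it is fed. A small illustrative helper (not part of the SDK) that derives the shape and checks the value count:

```python
from functools import reduce

def flatten_batch(batch, feature_dim):
    """Flatten a batch of feature vectors into the flat value list that
    TFRequest.add_feed expects, and return (shape, values)."""
    shape = [len(batch), feature_dim]
    values = [v for row in batch for v in row]
    # The value list must match the product of the shape dimensions
    assert len(values) == reduce(lambda a, b: a * b, shape)
    return shape, values

# Two MNIST-sized input vectors (784 features each)
batch = [[0.0] * 784, [1.0] * 784]
shape, values = flatten_batch(batch, 784)
print(shape)        # [2, 784]
print(len(values))  # 1568
```

The resulting `shape` and `values` would be passed to `add_feed` in place of the single-sample `[1, 784]` and `[1] * 784` used above.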
Request format
The TensorFlow processor uses the protobuf format for input and output. The SDK encapsulates the request format automatically. To build service requests manually, generate code from the following protobuf definition. For more information, see Construct a TensorFlow service request.
syntax = "proto3";

option cc_enable_arenas = true;
option java_package = "com.aliyun.openservices.eas.predict.proto";
option java_outer_classname = "PredictProtos";

enum ArrayDataType {
  // Not a legal value for DataType. Used to indicate a DataType field
  // has not been set.
  DT_INVALID = 0;

  // Data types that all computation devices are expected to be
  // capable to support.
  DT_FLOAT = 1;
  DT_DOUBLE = 2;
  DT_INT32 = 3;
  DT_UINT8 = 4;
  DT_INT16 = 5;
  DT_INT8 = 6;
  DT_STRING = 7;
  DT_COMPLEX64 = 8;   // Single-precision complex.
  DT_INT64 = 9;
  DT_BOOL = 10;
  DT_QINT8 = 11;      // Quantized int8.
  DT_QUINT8 = 12;     // Quantized uint8.
  DT_QINT32 = 13;     // Quantized int32.
  DT_BFLOAT16 = 14;   // Float32 truncated to 16 bits. Only for cast ops.
  DT_QINT16 = 15;     // Quantized int16.
  DT_QUINT16 = 16;    // Quantized uint16.
  DT_UINT16 = 17;
  DT_COMPLEX128 = 18; // Double-precision complex.
  DT_HALF = 19;
  DT_RESOURCE = 20;
  DT_VARIANT = 21;    // Arbitrary C++ data types.
}

// Dimensions of an array.
message ArrayShape {
  repeated int64 dim = 1 [packed = true];
}

// Protocol buffer representing an array.
message ArrayProto {
  // Data type.
  ArrayDataType dtype = 1;
  // Shape of the array.
  ArrayShape array_shape = 2;
  // DT_FLOAT.
  repeated float float_val = 3 [packed = true];
  // DT_DOUBLE.
  repeated double double_val = 4 [packed = true];
  // DT_INT32, DT_INT16, DT_INT8, DT_UINT8.
  repeated int32 int_val = 5 [packed = true];
  // DT_STRING.
  repeated bytes string_val = 6;
  // DT_INT64.
  repeated int64 int64_val = 7 [packed = true];
  // DT_BOOL.
  repeated bool bool_val = 8 [packed = true];
}

// PredictRequest specifies which TensorFlow model to run, along with
// how inputs are mapped to tensors and how outputs are filtered before
// returning to the user.
message PredictRequest {
  // A named signature to evaluate. If unspecified, the default signature
  // will be used.
  string signature_name = 1;

  // Input tensors.
  // Names of input tensors are alias names. The mapping from aliases to real
  // input tensor names is expected to be stored as a named generic signature
  // under the key "inputs" in the model export.
  // Each alias listed in a generic signature named "inputs" should be provided
  // exactly once to run the prediction.
  map<string, ArrayProto> inputs = 2;

  // Output filter.
  // Names specified are alias names. The mapping from aliases to real output
  // tensor names is expected to be stored as a named generic signature under
  // the key "outputs" in the model export.
  // Only tensors specified here will be run/fetched and returned, with the
  // exception that when none is specified, all tensors specified in the
  // named signature will be run/fetched and returned.
  repeated string output_filter = 3;
}

// Response for PredictRequest on a successful run.
message PredictResponse {
  // Output tensors.
  map<string, ArrayProto> outputs = 1;
}
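To see how the fields fit together, the request that protoc-generated classes would build can be mirrored with plain dictionaries. This sketch only illustrates the message layout defined above; a real client must serialize the request with code generated by protoc from this definition:

```python
# DT_FLOAT as defined in the ArrayDataType enum above
DT_FLOAT = 1

def build_predict_request(signature_name, feeds, output_filter):
    """Mirror the PredictRequest layout with plain dicts.
    feeds maps an input alias to (shape, flat float values)."""
    return {
        "signature_name": signature_name,
        "inputs": {
            name: {
                "dtype": DT_FLOAT,
                "array_shape": {"dim": shape},
                "float_val": values,
            }
            for name, (shape, values) in feeds.items()
        },
        # An empty filter means all tensors in the signature are returned
        "output_filter": output_filter,
    }

# The same request the Python SDK example builds for the MNIST model
request = build_predict_request(
    "predict_images",
    {"images": ([1, 784], [1.0] * 784)},
    ["scores"],
)
print(request["inputs"]["images"]["array_shape"])  # {'dim': [1, 784]}
```

The dict keys correspond one-to-one to the fields of PredictRequest, ArrayProto, and ArrayShape; only float inputs are shown, so the other `*_val` fields are omitted.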
Processor versions
TensorFlow processors are available in multiple versions with CPU and GPU support. Use the latest version unless you have specific requirements: newer versions provide better performance and remain compatible with earlier versions. The following table lists the processor name for each version.
| Processor name | TensorFlow version | GPU support |
| --- | --- | --- |
| tensorflow_cpu_1.12 | TensorFlow 1.12 | No |
| tensorflow_cpu_1.14 | TensorFlow 1.14 | No |
| tensorflow_cpu_1.15 | TensorFlow 1.15 | No |
| tensorflow_cpu_2.3 | TensorFlow 2.3 | No |
| tensorflow_cpu_2.4 | TensorFlow 2.4 | No |
| tensorflow_cpu_2.7 | TensorFlow 2.7 | No |
| tensorflow_gpu_1.12 | TensorFlow 1.12 | Yes |
| tensorflow_gpu_1.14 | TensorFlow 1.14 | Yes |
| tensorflow_gpu_1.15 | TensorFlow 1.15 | Yes |
| tensorflow_gpu_2.4 | TensorFlow 2.4 | Yes |
| tensorflow_gpu_2.7 | TensorFlow 2.7 | Yes |