A processor is a package containing online prediction logic. Elastic Algorithm Service (EAS) provides several common built-in processors for deploying standard models and reducing development costs for online prediction logic.
EAS provides the processors described in the following table. When using EASCMD to deploy a service, specify a processor code.
| Processor name | Processor code, CPU version (EASCMD only) | Processor code, GPU version (EASCMD only) | Documentation |
| --- | --- | --- | --- |
| EasyRec | easyrec-2.4 | easyrec-2.4 | |
| TorchEasyRec | easyrec-torch-1.0 | easyrec-torch-1.0 | |
| PMML | pmml | None | PMML processor |
| TensorFlow 1.12 | tensorflow_cpu_1.12 | tensorflow_gpu_1.12 | TensorFlow 1.12 processor |
| TensorFlow 1.14 | tensorflow_cpu_1.14 | tensorflow_gpu_1.14 | TensorFlow 1.14 processor |
| TensorFlow 1.15 | tensorflow_cpu_1.15 | tensorflow_gpu_1.15 | TensorFlow 1.15 processor (with a built-in PAI-Blade agile optimization engine) |
| TensorFlow 2.3 | tensorflow_cpu_2.3 | None | TensorFlow 2.3 processor |
| PyTorch 1.6 | pytorch_cpu_1.6 | pytorch_gpu_1.6 | PyTorch 1.6 processor (with a built-in PAI-Blade agile optimization engine) |
| Caffe | caffe_cpu | caffe_gpu | Caffe processor |
| PS algorithm | parameter_sever | None | PS algorithm processor |
| Alink | alink_pai_processor | None | None |
| xNN | xnn_cpu | None | None |
| EasyVision | easy_vision_cpu_tf1.12_torch151 | easy_vision_gpu_tf1.12_torch151 | EasyVision processor |
| EasyTransfer | easytransfer_cpu | easytransfer_gpu | EasyTransfer processor |
| EasyNLP | easynlp | easynlp | EasyNLP processor |
| EasyCV | easycv | easycv | EasyCV processor |
| Blade | blade_cpu | blade_cuda10.0_beta | None |
| MediaFlow | None | mediaflow | MediaFlow processor |
| Triton | None | triton | Triton processor |
PMML processor
The built-in PMML processor in EAS performs these operations:

- Loads a PMML model file as a service.
- Processes requests sent to the model service.
- Calculates request results using the model and returns the results to clients.
The PMML processor provides a default policy to handle missing values. If the isMissing policy is not specified for feature fields in the PMML model file, the system uses these default values for padding.
| Data type | Default padding value |
| --- | --- |
| BOOLEAN | false |
| DOUBLE | 0.0 |
| FLOAT | 0.0 |
| INT | 0 |
| STRING | "" (empty string) |
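The padding rule above can be sketched in a few lines of Python. This is an illustrative reimplementation of the table, not EAS source code; the feature names and schema shape are hypothetical.

```python
# Default padding values applied when a feature field has no isMissing policy.
# Mirrors the table above; illustrative only.
DEFAULTS = {
    "BOOLEAN": False,
    "DOUBLE": 0.0,
    "FLOAT": 0.0,
    "INT": 0,
    "STRING": "",
}

def pad_missing(features, schema):
    """Fill absent feature fields with the default for their declared type.

    `schema` maps feature name -> PMML data type, e.g. {"f0": "DOUBLE"}.
    """
    return {
        name: features.get(name, DEFAULTS[dtype])
        for name, dtype in schema.items()
    }

schema = {"f0": "DOUBLE", "f1": "INT", "f2": "STRING"}
print(pad_missing({"f0": 1.5}, schema))
# → {'f0': 1.5, 'f1': 0, 'f2': ''}
```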
Deploy a PMML model in any of these ways:

- Upload in the console

  Set Processor Type to PMML. For more information, see Deploy a service by uploading a model in the console.

- Deploy using a local client

  In the service.json configuration file, set the processor field to the processor code pmml. The following code provides an example.

  ```json
  {
    "processor": "pmml",
    "generate_token": "true",
    "model_path": "http://xxxxx/lr.pmml",
    "name": "eas_lr_example",
    "metadata": {
      "instance": 1,
      "cpu": 1  # Each CPU is automatically allocated 4 GB of memory. One CPU with 4 GB of memory is called one quota.
    }
  }
  ```

- Deploy using DSW

  This method is similar to deploying using a local client. Edit the service.json configuration file. For more information, see Deploy a service using a local client (EASCMD).
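If you script deployments, you can also generate service.json with Python's json module instead of editing it by hand. A minimal sketch, where the model URL and service name are placeholders:

```python
import json

# Build a PMML deployment config programmatically. All values are placeholders.
config = {
    "processor": "pmml",
    "generate_token": "true",
    "model_path": "http://example.com/lr.pmml",  # hypothetical model URL
    "name": "eas_lr_example",
    "metadata": {"instance": 1, "cpu": 1},
}

# Write the file that eascmd or DSW deployment consumes.
with open("service.json", "w") as f:
    json.dump(config, f, indent=2)

print(json.dumps(config, indent=2))
```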
TensorFlow 1.12 processor
The TensorFlow 1.12 processor provided by EAS can load TensorFlow models in SavedModel (recommended) or SessionBundle format. Convert Keras and Checkpoint models to SavedModel format before deploying them. For more information, see TensorFlow FAQ.
This processor does not support custom TensorFlow OPs.
Deploy a TensorFlow model in any of these ways:

- Upload in the console

  Set Processor Type to TensorFlow1.12. For more information, see Deploy a service by uploading a model in the console.

- Deploy using a local client

  In the service.json configuration file, set the processor field to tensorflow_cpu_1.12 or tensorflow_gpu_1.12, based on your deployment resources. A deployment error occurs if the specified processor does not match the resources. The following code provides an example.

  ```json
  {
    "name": "tf_serving_test",
    "generate_token": "true",
    "model_path": "http://xxxxx/savedmodel_example.zip",
    "processor": "tensorflow_cpu_1.12",
    "metadata": {
      "instance": 1,
      "cpu": 1,
      "gpu": 0,
      "memory": 2000
    }
  }
  ```

- Deploy using DSW

  This method is similar to deploying using a local client. Edit the service.json configuration file. For more information, see Deploy a service using a local client (EASCMD).
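Because a mismatched processor variant (for example, a `*_gpu` processor with `"gpu": 0`) fails only at deployment time, a client-side pre-flight check can catch it earlier. The following helper is a hypothetical sketch, not part of EASCMD; EAS performs the real validation itself:

```python
def check_processor_matches_resources(config):
    """Return True if the processor's CPU/GPU variant agrees with the gpu count.

    Hypothetical client-side sanity check; EAS performs the authoritative
    validation when the service is deployed.
    """
    processor = config["processor"]
    gpu = config.get("metadata", {}).get("gpu", 0)
    if "_gpu_" in processor or processor.endswith("_gpu"):
        return gpu >= 1
    if "_cpu_" in processor or processor.endswith("_cpu"):
        return gpu == 0
    return True  # processors without a CPU/GPU variant, e.g. pmml

ok = {"processor": "tensorflow_cpu_1.12", "metadata": {"gpu": 0}}
bad = {"processor": "tensorflow_gpu_1.12", "metadata": {"gpu": 0}}
print(check_processor_matches_resources(ok))   # True
print(check_processor_matches_resources(bad))  # False
```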
TensorFlow 1.14 processor
The TensorFlow 1.14 processor provided by EAS can load TensorFlow models in SavedModel (recommended) or SessionBundle format. Convert Keras and Checkpoint models to SavedModel format before deploying them. For more information, see TensorFlow FAQ.
This processor does not support custom TensorFlow OPs.
Deploy a TensorFlow model in any of these ways:

- Upload in the console

  Set Processor Type to TensorFlow1.14. For more information, see Deploy a service by uploading a model in the console.

- Deploy using a local client

  In the service.json configuration file, set the processor field to tensorflow_cpu_1.14 or tensorflow_gpu_1.14, based on your deployment resources. A deployment error occurs if the specified processor does not match the resources. The following code provides an example.

  ```json
  {
    "name": "tf_serving_test",
    "generate_token": "true",
    "model_path": "http://xxxxx/savedmodel_example.zip",
    "processor": "tensorflow_cpu_1.14",
    "metadata": {
      "instance": 1,
      "cpu": 1,
      "gpu": 0,
      "memory": 2000
    }
  }
  ```

- Deploy using DSW

  This method is similar to deploying using a local client. Edit the service.json configuration file. For more information, see Deploy a service using a local client (EASCMD).
TensorFlow 1.15 processor (with a built-in PAI-Blade agile optimization engine)
The TensorFlow 1.15 processor provided by EAS can load TensorFlow models in SavedModel (recommended) or SessionBundle format. Convert Keras and Checkpoint models to SavedModel format before deploying them. For more information, see TensorFlow FAQ.
- This processor does not support custom TensorFlow OPs.
- This processor has a built-in PAI-Blade agile optimization engine, so you can use it to deploy TensorFlow models optimized by the PAI-Blade agile optimization engine.

Deploy a TensorFlow model in any of these ways:

- Upload in the console

  Set Processor Type to TensorFlow1.15. For more information, see Deploy a service by uploading a model in the console.

- Deploy using a local client

  In the service.json configuration file, set the processor field to tensorflow_cpu_1.15 or tensorflow_gpu_1.15, based on your deployment resources. A deployment error occurs if the specified processor does not match the resources. The following code provides an example.

  ```json
  {
    "name": "tf_serving_test",
    "generate_token": "true",
    "model_path": "http://xxxxx/savedmodel_example.zip",
    "processor": "tensorflow_cpu_1.15",
    "metadata": {
      "instance": 1,
      "cpu": 1,
      "gpu": 0,
      "memory": 2000
    }
  }
  ```

- Deploy using DSW

  This method is similar to deploying using a local client. Edit the service.json configuration file. For more information, see Deploy a service using a local client (EASCMD). For more information about the parameters in the configuration file, see Create a service.
TensorFlow 2.3 processor
The TensorFlow 2.3 processor provided by EAS can load TensorFlow models in SavedModel (recommended) or SessionBundle format. Convert Keras and Checkpoint models to SavedModel format before deploying them. For more information, see TensorFlow FAQ.
This processor does not support custom TensorFlow OPs.
Deploy a TensorFlow model in any of these ways:

- Upload in the console

  Set Processor Type to TensorFlow2.3. For more information, see Deploy a service by uploading a model in the console.

- Deploy using a local client

  In the service.json configuration file, set the processor field to the processor code tensorflow_cpu_2.3. The following code provides an example.

  ```json
  {
    "name": "tf_serving_test",
    "generate_token": "true",
    "model_path": "http://xxxxx/savedmodel_example.zip",
    "processor": "tensorflow_cpu_2.3",
    "metadata": {
      "instance": 1,
      "cpu": 1,
      "gpu": 0,
      "memory": 2000
    }
  }
  ```

- Deploy using DSW

  This method is similar to deploying using a local client. Edit the service.json configuration file. For more information, see Deploy a service using a local client (EASCMD).
PyTorch 1.6 processor (with a built-in PAI-Blade agile optimization engine)
The PyTorch 1.6 processor provided by EAS can load models in TorchScript format. For more information, see the official TorchScript documentation.
- This processor does not support PyTorch extensions, and model inputs and outputs must be tensors.
- This processor has a built-in PAI-Blade agile optimization engine and can be used to deploy PyTorch models optimized by the PAI-Blade agile optimization engine.

Deploy a TorchScript model in any of these ways:

- Upload in the console

  Set Processor Type to PyTorch1.6. For more information, see Deploy a service by uploading a model in the console.

- Deploy using a local client

  In the service.json configuration file, set the processor field to pytorch_cpu_1.6 or pytorch_gpu_1.6, based on your deployment resources. A deployment error occurs if the specified processor does not match the resources. The following code provides an example.

  ```json
  {
    "name": "pytorch_serving_test",
    "generate_token": "true",
    "model_path": "http://xxxxx/torchscript_model.pt",
    "processor": "pytorch_gpu_1.6",
    "metadata": {
      "instance": 1,
      "cpu": 1,
      "gpu": 1,
      "cuda": "10.0",
      "memory": 2000
    }
  }
  ```

- Deploy using DSW

  This method is similar to deploying using a local client. Edit the service.json configuration file. For more information, see Deploy a service using a local client (EASCMD). For more information about the parameters in the configuration file, see Create a service.
Caffe processor
The Caffe processor provided by EAS can load deep learning models trained using the Caffe framework. Because the Caffe framework is flexible, specify the names of model and weight files in the model package when deploying a Caffe model.
This processor does not support custom data layers.
Deploy a Caffe model in any of these ways:

- Upload in the console

  Set Processor Type to Caffe. For more information, see Deploy a service by uploading a model in the console.

- Deploy using a local client

  In the service.json configuration file, set the processor field to caffe_cpu or caffe_gpu, based on your deployment resources. A deployment error occurs if the specified processor does not match the resources. The following code provides an example.

  ```json
  {
    "name": "caffe_serving_test",
    "generate_token": "true",
    "model_path": "http://xxxxx/caffe_model.zip",
    "processor": "caffe_cpu",
    "model_config": {
      "model": "deploy.prototxt",
      "weight": "bvlc_reference_caffenet.caffemodel"
    },
    "metadata": {
      "instance": 1,
      "cpu": 1,
      "gpu": 0,
      "memory": 2000
    }
  }
  ```

- Deploy using DSW

  This method is similar to deploying using a local client. Edit the service.json configuration file. For more information, see Deploy a service using a local client (EASCMD).
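Since model_config must name the model and weight files inside the package, a small helper can discover them before you write service.json. This is a convenience sketch, not part of EAS; it assumes the package is a ZIP archive containing exactly one .prototxt and one .caffemodel file:

```python
import zipfile

def caffe_model_config(package_path):
    """Build the model_config block from a Caffe model ZIP package.

    Assumes exactly one .prototxt and one .caffemodel in the archive;
    illustrative helper only, not part of EAS.
    """
    with zipfile.ZipFile(package_path) as zf:
        names = zf.namelist()
    model = next(n for n in names if n.endswith(".prototxt"))
    weight = next(n for n in names if n.endswith(".caffemodel"))
    return {"model": model, "weight": weight}

# Demo with a stand-in package (file contents do not matter here).
with zipfile.ZipFile("caffe_model.zip", "w") as zf:
    zf.writestr("deploy.prototxt", "")
    zf.writestr("bvlc_reference_caffenet.caffemodel", "")
print(caffe_model_config("caffe_model.zip"))
# → {'model': 'deploy.prototxt', 'weight': 'bvlc_reference_caffenet.caffemodel'}
```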
PS algorithm processor
The PS algorithm processor provided by EAS can load models in PS format.
This section describes how to use the PS algorithm processor to deploy a model service and send service requests.
- Deploy a model in PS format in any of these ways:

  - Upload in the console

    Set Processor Type to PS Algorithm. For more information, see Custom deployment.

  - Deploy using a local client

    In the service.json configuration file, set the processor field to the processor code parameter_sever. The following code provides an example.

    ```json
    {
      "name": "ps_smart",
      "model_path": "oss://examplebucket/xlab_m_pai_ps_smart_b_1058272_v0.tar.gz",
      "processor": "parameter_sever",
      "metadata": {
        "region": "beijing",
        "cpu": 1,
        "instance": 1,
        "memory": 2048
      }
    }
    ```

  - Deploy using DSW

    This method is similar to deploying using a local client. Edit the service.json configuration file. For more information, see Deploy a service using a local client (EASCMD).
- Send requests to the service:

  Single and batch prediction requests are supported. Both use the same request data structure: a JSON array of feature objects.

  - Example of a single request

    ```shell
    curl "http://eas.location/api/predict/ps_smart" -d '[
      {
        "f0": 1,
        "f1": 0.2,
        "f3": 0.5
      }
    ]'
    ```

  - Example of a batch request

    ```shell
    curl "http://eas.location/api/predict/ps_smart" -d '[
      {
        "f0": 1,
        "f1": 0.2,
        "f3": 0.5
      },
      {
        "f0": 1,
        "f1": 0.2,
        "f3": 0.5
      }
    ]'
    ```

  - Return value

    The return value format is the same for single and batch requests: an array of result objects, where the position of each result object corresponds to the position of the matching record in the request.

    ```json
    [
      {
        "label": "xxxx",
        "score": 0.2,
        "details": [{"k1": 0.3}, {"k2": 0.5}]
      },
      {
        "label": "xxxx",
        "score": 0.2,
        "details": [{"k1": 0.3}, {"k2": 0.5}]
      }
    ]
    ```
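Because results are positionally aligned with the request, a client can zip them back together after the call. The following sketch parses a response shaped like the example above; the request features and labels are made up for illustration:

```python
import json

# A response body in the format shown above (illustrative values).
raw = '''
[
  {"label": "pos", "score": 0.8, "details": [{"k1": 0.3}]},
  {"label": "neg", "score": 0.2, "details": [{"k2": 0.5}]}
]
'''

requests_sent = [{"f0": 1, "f1": 0.2}, {"f0": 0, "f1": 0.7}]
results = json.loads(raw)

# Each result corresponds to the request record at the same index.
for features, result in zip(requests_sent, results):
    print(features, "->", result["label"], result["score"])
```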
EasyTransfer processor
The EasyTransfer processor provided by EAS can load TensorFlow-based deep learning NLP models trained using the EasyTransfer framework.
Deploy an EasyTransfer model in any of these ways:

- Upload in the console

  Set Processor Type to EasyTransfer. For more information, see Deploy a service by uploading a model in the console.

- Deploy using a local client

  In the service.json configuration file, set the processor field to easytransfer_cpu or easytransfer_gpu, based on your deployment resources. A deployment error occurs if the specified processor does not match the resources. In the model_config section, specify the model type in the type field. The following examples are for a text classification model. For more information about other parameters, see Create a service.

  - Example configuration for a GPU-based deployment (using a public resource group)

    ```json
    {
      "name": "et_app_demo",
      "metadata": {
        "instance": 1
      },
      "cloud": {
        "computing": {
          "instance_type": "ecs.gn6i-c4g1.xlarge"
        }
      },
      "model_path": "http://xxxxx/your_model.zip",
      "processor": "easytransfer_gpu",
      "model_config": {
        "type": "text_classify_bert"
      }
    }
    ```

  - Example configuration for a CPU-based deployment

    ```json
    {
      "name": "et_app_demo",
      "model_path": "http://xxxxx/your_model.zip",
      "processor": "easytransfer_cpu",
      "model_config": {
        "type": "text_classify_bert"
      },
      "metadata": {
        "instance": 1,
        "cpu": 1,
        "memory": 4000
      }
    }
    ```
The supported task types are listed in the following table.
| Task type | type |
| --- | --- |
| Text matching | text_match_bert |
| Text classification | text_classify_bert |
| Sequence labeling | sequence_labeling_bert |
| Text embedding | vectorization_bert |
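When generating service.json in a script, you can look up the type code from the table above instead of hard-coding it. An illustrative helper (the task names are this sketch's own keys, not EAS identifiers):

```python
# Task type -> `type` value, mirroring the table above. Illustrative only.
EASYTRANSFER_TYPES = {
    "text_match": "text_match_bert",
    "text_classify": "text_classify_bert",
    "sequence_labeling": "sequence_labeling_bert",
    "vectorization": "vectorization_bert",
}

def model_config_for(task):
    """Return the model_config block for an EasyTransfer task (sketch)."""
    return {"type": EASYTRANSFER_TYPES[task]}

print(model_config_for("text_classify"))  # {'type': 'text_classify_bert'}
```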
EasyNLP processor
The EasyNLP processor provided by EAS can load PyTorch-based deep learning NLP models trained using the EasyNLP framework.
Deploy an EasyNLP model in any of these ways:

- Upload in the console

  Set Processor Type to EasyNLP. For more information, see Deploy a service by uploading a model in the console.

- Deploy using a local client

  In the service.json configuration file, set the processor field to the processor code easynlp. In the model_config section, specify the model type in the type field. The following code provides an example for a single-label text classification model. For more information about other parameters, see Create a service.

  ```json
  {
    "name": "easynlp_app_demo",
    "metadata": {
      "instance": 1
    },
    "cloud": {
      "computing": {
        "instance_type": "ecs.gn6i-c4g1.xlarge"
      }
    },
    "model_config": {
      "app_name": "text_classify",
      "type": "text_classify"
    },
    "model_path": "http://xxxxx/your_model.tar.gz",
    "processor": "easynlp"
  }
  ```

The supported task types are listed in the following table.
| Task type | type |
| --- | --- |
| Text classification (single-label) | text_classify |
| Text classification (multi-label) | text_classify_multi |
| Text matching | text_match |
| Sequence labeling | sequence_labeling |
| Text embedding | vectorization |
| Chinese summary generation (GPU) | sequence_generation_zh |
| English summary generation (GPU) | sequence_generation_en |
| Machine reading comprehension (Chinese) | machine_reading_comprehension_zh |
| Machine reading comprehension (English) | machine_reading_comprehension_en |
| WUKONG_CLIP (GPU) | wukong_clip |
| CLIP (GPU) | clip |
After the service is deployed, on the Elastic Algorithm Service (EAS) page, find the service to call and click View Endpoint Information in the Service Type column. View the service endpoint and token required for authentication. Call the service using the following Python code.
```python
import requests

# Replace <eas-service-url> with the endpoint of your service.
url = '<eas-service-url>'
# Replace <eas-service-token> with the token of your service.
token = '<eas-service-token>'
# Specify the data for prediction. The following example is for text classification.
request_body = {
    "first_sequence": "hello"
}
headers = {"Authorization": token}
resp = requests.post(url=url, headers=headers, json=request_body)
print(resp.content.decode())
```
EasyCV processor
The EasyCV processor provided by EAS can load deep learning models trained using the EasyCV framework.
Deploy an EasyCV model in any of these ways:

- Upload in the console

  Set Processor Type to EasyCV. For more information, see Deploy a service by uploading a model in the console.

- Deploy using a local client

  In the service.json configuration file, set the processor field to the processor code easycv. In the model_config section, specify the model type in the type field. The following code provides an example for an image classification model. For more information about other parameters, see Create a service.

  ```json
  {
    "name": "easycv_classification_example",
    "processor": "easycv",
    "model_path": "oss://examplebucket/epoch_10_export.pt",
    "model_config": {
      "type": "TorchClassifier"
    },
    "metadata": {
      "instance": 1
    },
    "cloud": {
      "computing": {
        "instance_type": "ecs.gn5i-c4g1.xlarge"
      }
    }
  }
  ```

The supported task types are listed in the following table.
| Task type | model_config |
| --- | --- |
| Image classification | {"type":"TorchClassifier"} |
| Object detection | {"type":"DetectionPredictor"} |
| Semantic segmentation | {"type":"SegmentationPredictor"} |
| YOLOX object detection | {"type":"YoloXPredictor"} |
| Video classification | {"type":"VideoClassificationPredictor"} |
After the service is deployed, on the Elastic Algorithm Service (EAS) page, find the service to call and click View Endpoint Information in the Service Type column. View the service endpoint and token required for authentication. Call the service using the following Python code.
```python
import requests
import base64
import json

# Download a sample image. The bucket URL is a placeholder.
resp = requests.get('http://examplebucket.oss-cn-zhangjiakou.aliyuncs.com/images/000000123213.jpg')
ENCODING = 'utf-8'
datas = json.dumps({
    "image": base64.b64encode(resp.content).decode(ENCODING)
})
# Replace the value with the token that you obtained.
head = {
    "Authorization": "NTFmNDJlM2E4OTRjMzc3OWY0NzI3MTg5MzZmNGQ5Yj***"
}
for x in range(0, 10):
    # Replace the URL with the endpoint of the service.
    resp = requests.post("http://150231884461***.cn-hangzhou.pai-eas.aliyuncs.com/api/predict/test_easycv_classification_example", data=datas, headers=head)
    print(resp.text)
```
Encode image or video data in Base64 format for transmission. Use the image keyword for image data and the video keyword for video data.
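The request body above can be produced with a small helper that picks the right keyword. This is an illustrative sketch; the input bytes here are stand-ins for real image or video file contents:

```python
import base64
import json

def build_payload(data: bytes, kind: str = "image") -> str:
    """Base64-encode raw bytes and wrap them in the request JSON body.

    `kind` is "image" for image data or "video" for video data,
    matching the keywords described above.
    """
    assert kind in ("image", "video")
    return json.dumps({kind: base64.b64encode(data).decode("utf-8")})

# Demo with stand-in bytes; in practice, read a real file,
# e.g. open("cat.jpg", "rb").read().
payload = build_payload(b"\x89PNG...", "image")
print(payload)
```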
EasyVision processor
The EasyVision processor provided by EAS can load deep learning models trained using the EasyVision framework.
Deploy an EasyVision model in any of these ways:

- Upload in the console

  Set Processor Type to EasyVision. For more information, see Deploy a service by uploading a model in the console.

- Deploy using a local client

  In the service.json configuration file, set the processor field to easy_vision_cpu_tf1.12_torch151 or easy_vision_gpu_tf1.12_torch151, based on your deployment resources. A deployment error occurs if the specified processor does not match the resources. In the model_config section, specify the model type in the type field. The following code provides examples. For more information about other parameters, see Create a service.

  - Example configuration for a GPU-based deployment

    ```json
    {
      "name": "ev_app_demo",
      "processor": "easy_vision_gpu_tf1.12_torch151",
      "model_path": "oss://path/to/your/model",
      "model_config": "{\"type\":\"classifier\"}",
      "metadata": {
        "resource": "your_resource_name",
        "cuda": "9.0",
        "instance": 1,
        "memory": 4000,
        "gpu": 1,
        "cpu": 4,
        "rpc.worker_threads": 5
      }
    }
    ```

  - Example configuration for a CPU-based deployment

    ```json
    {
      "name": "ev_app_cpu_demo",
      "processor": "easy_vision_cpu_tf1.12_torch151",
      "model_path": "oss://path/to/your/model",
      "model_config": "{\"type\":\"classifier\"}",
      "metadata": {
        "resource": "your_resource_name",
        "instance": 1,
        "memory": 4000,
        "gpu": 0,
        "cpu": 4,
        "rpc.worker_threads": 5
      }
    }
    ```
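Note that in the EasyVision examples, model_config is a JSON object serialized as a string, which is why the quotes are escaped. If you generate service.json in Python, json.dumps produces that escaping for you; a short sketch with placeholder values:

```python
import json

# model_config for EasyVision is itself a JSON string, so encode it first...
model_config = json.dumps({"type": "classifier"})

service = {
    "name": "ev_app_demo",
    "processor": "easy_vision_gpu_tf1.12_torch151",
    "model_path": "oss://path/to/your/model",
    "model_config": model_config,  # a string, not a nested object
}

# ...and the outer dump escapes the inner quotes automatically.
print(json.dumps(service, indent=2))
```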
MediaFlow processor
The MediaFlow processor provided by EAS is a universal orchestration engine that can analyze and process videos, audio, and images.
Deploy a MediaFlow model in any of these ways:

- Upload in the console

  Set Processor Type to MediaFlow. For more information, see Deploy a service by uploading a model in the console.

- Deploy using a local client

  In the service.json configuration file, set the processor field to the processor code mediaflow. When using the MediaFlow processor, also add these fields to the configuration file. For more information about other fields, see Create a service.

  - graph_pool_size: the number of graph pools.
  - worker_threads: the number of scheduling threads.

  The following examples show the configurations.

  - Example configuration for deploying a video classification model

    ```json
    {
      "model_entry": "video_classification/video_classification_ext.js",
      "name": "video_classification",
      "model_path": "oss://path/to/your/model",
      "generate_token": "true",
      "processor": "mediaflow",
      "model_config": {
        "graph_pool_size": 8,
        "worker_threads": 16
      },
      "metadata": {
        "eas.handlers.disable_failure_handler": true,
        "resource": "your_resource_name",
        "rpc.worker_threads": 30,
        "rpc.enable_jemalloc": true,
        "rpc.keepalive": 500000,
        "cpu": 4,
        "instance": 1,
        "cuda": "9.0",
        "rpc.max_batch_size": 64,
        "memory": 10000,
        "gpu": 1
      }
    }
    ```

  - Example configuration for deploying a speech recognition (ASR) model

    ```json
    {
      "model_entry": "asr/video_asr_ext.js",
      "name": "video_asr",
      "model_path": "oss://path/to/your/model",
      "generate_token": "true",
      "processor": "mediaflow",
      "model_config": {
        "graph_pool_size": 8,
        "worker_threads": 16
      },
      "metadata": {
        "eas.handlers.disable_failure_handler": true,
        "resource": "your_resource_name",
        "rpc.worker_threads": 30,
        "rpc.enable_jemalloc": true,
        "rpc.keepalive": 500000,
        "cpu": 4,
        "instance": 1,
        "cuda": "9.0",
        "rpc.max_batch_size": 64,
        "memory": 10000,
        "gpu": 1
      }
    }
    ```

  The main differences between the speech recognition and video classification configurations are the values of the model_entry, name, and model_path fields. Modify these fields based on the type of model to deploy.
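Since the two configurations differ only in model_entry, name, and model_path, you can keep a single template and swap those three fields. An illustrative helper, with the shared settings abridged to placeholders:

```python
import copy

# Shared MediaFlow settings (abridged); per-model fields are filled in below.
TEMPLATE = {
    "generate_token": "true",
    "processor": "mediaflow",
    "model_config": {"graph_pool_size": 8, "worker_threads": 16},
    "metadata": {"cpu": 4, "gpu": 1, "instance": 1, "memory": 10000},
}

def mediaflow_service(model_entry, name, model_path):
    """Fill the per-model fields of a MediaFlow service config (sketch)."""
    service = copy.deepcopy(TEMPLATE)
    service.update(model_entry=model_entry, name=name, model_path=model_path)
    return service

asr = mediaflow_service(
    "asr/video_asr_ext.js", "video_asr", "oss://path/to/your/model"
)
print(asr["model_entry"], asr["name"])
```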
Triton processor
Triton Inference Server is an online service framework from NVIDIA. It provides easy-to-use deployment and management interfaces for models on GPUs and is compatible with KFServing API standards. Its features include:
- Supports deployment of models from multiple open source frameworks, such as TensorFlow, PyTorch, ONNX Runtime, and TensorRT, as well as custom service backends.
- Supports running multiple models on a single GPU at the same time to improve GPU utilization.
- Supports the HTTP and gRPC communication protocols, and provides a binary format extension to compress the size of request payloads.
- Supports dynamic batching to improve service throughput.
Triton Inference Server is available on EAS as the built-in Triton processor.
Note:

- The Triton processor is in public preview and is available only in the China (Shanghai) region.
- All models deployed using the Triton processor must be stored in OSS. Activate OSS and upload your model files to OSS in advance. For more information about how to upload files to OSS, see Simple upload.
This section describes how to use the Triton processor to deploy a model service and how to call the service:
- Deploy a model service using the Triton processor

  A Triton model service can be deployed only by using the eascmd client. For more information, see Create a service. In the service.json configuration file, set the processor field to the processor code triton. Because Triton retrieves the model from OSS, also configure the OSS-related parameters. The following code shows an example of the service.json file.

  ```json
  {
    "name": "triton_test",
    "processor": "triton",
    "processor_params": [
      "--model-repository=oss://triton-model-repo/models",
      "--allow-http=true"
    ],
    "metadata": {
      "instance": 1,
      "cpu": 4,
      "gpu": 1,
      "memory": 10000,
      "resource": "<your resource id>"
    }
  }
  ```

  The following table describes the parameters specific to a Triton model service. For more information about the general-purpose parameters, see Parameters in service.json.
| Parameter | Description |
| --- | --- |
| processor_params | The parameters passed to Triton Server at service startup. Unsupported parameters are automatically filtered out. For the set of supported parameters, see Table 1 (Parameters that can be passed to Triton Server). The model-repository parameter is required. For more information about the other, optional parameters, see main.cc. |
| oss_endpoint | The OSS endpoint. If you do not specify this parameter, the system automatically uses the OSS service in the region where the EAS service is deployed. To use an OSS service in a different region, specify this parameter. For information about valid values, see Endpoints. |
| metadata.resource | The ID of the dedicated EAS resource group used to deploy the model service. When you use the Triton processor, the service must be deployed in a dedicated EAS resource group. For more information about how to create a dedicated EAS resource group, see Use EAS resource groups. |
Table 1. Parameters that can be passed to Triton Server

| Parameter | Required | Description |
| --- | --- | --- |
| model-repository | Yes | The path must be an OSS path. The root directory of a bucket cannot be used as the model-repository; specify a subdirectory within the bucket. For example, in oss://triton-model-repo/models, triton-model-repo is the bucket name and models is a subdirectory in the bucket. |
| log-verbose | No | For more information about this parameter and the following parameters, see main.cc. |
| log-info | No | |
| log-warning | No | |
| log-error | No | |
| exit-on-error | No | |
| strict-model-config | No | |
| strict-readiness | No | |
| allow-http | No | |
| http-thread-count | No | |
| pinned-memory-pool-byte-size | No | |
| cuda-memory-pool-byte-size | No | |
| min-supported-compute-capability | No | |
| buffer-manager-thread-count | No | |
| backend-config | No | |
- Use the native Triton client to call the EAS Triton processor service

  To send requests from a Python client, first run these commands to install the official Triton client.

  ```shell
  pip3 install nvidia-pyindex
  pip3 install tritonclient[all]
  ```

  Run the following command to download a test image to the current directory.

  ```shell
  wget http://pai-blade.oss-cn-zhangjiakou.aliyuncs.com/doc-assets/cat.png
  ```

  The following example shows how to use the Python client to send a request in binary format to the Triton processor service.

  ```python
  import numpy as np
  import time
  from PIL import Image
  import tritonclient.http as httpclient
  from tritonclient.utils import InferenceServerException

  URL = "<service url>"  # Replace <service url> with the endpoint of the service.
  HEADERS = {"Authorization": "<service token>"}  # Replace <service token> with the access token of the service.

  # Build the model input from the test image.
  input_img = httpclient.InferInput("input", [1, 299, 299, 3], "FP32")
  img = Image.open('./cat.png').resize((299, 299))
  img = np.asarray(img).astype('float32') / 255.0
  input_img.set_data_from_numpy(img.reshape([1, 299, 299, 3]), binary_data=True)

  output = httpclient.InferRequestedOutput(
      "InceptionV3/Predictions/Softmax", binary_data=True
  )
  triton_client = httpclient.InferenceServerClient(url=URL, verbose=False)

  start = time.time()
  for i in range(10):
      results = triton_client.infer(
          "inception_graphdef", inputs=[input_img], outputs=[output], headers=HEADERS
      )
      res_body = results.get_response()
      elapsed_ms = (time.time() - start) * 1000
      if i == 0:
          print("model name: ", res_body["model_name"])
          print("model version: ", res_body["model_version"])
          print("output name: ", res_body["outputs"][0]["name"])
          print("output shape: ", res_body["outputs"][0]["shape"])
      print("[{}] Avg rt(ms): {:.2f}".format(i, elapsed_ms))
      start = time.time()
  ```
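Because the Triton processor is compatible with the KFServing (KServe) v2 HTTP protocol, you can also build the inference request as plain JSON without the tritonclient package. The following sketch only constructs the request body (no network call is made); the input name, shape, and model name follow the example above, and the toy data stands in for a real image tensor:

```python
import json

def v2_infer_body(name, shape, datatype, data):
    """Build a KFServing/KServe v2 HTTP inference request body (sketch)."""
    return {
        "inputs": [
            {"name": name, "shape": shape, "datatype": datatype, "data": data}
        ]
    }

# A toy 1x2x2x3 float input instead of a real 1x299x299x3 image tensor.
body = v2_infer_body("input", [1, 2, 2, 3], "FP32", [0.0] * 12)
print(json.dumps(body)[:80])
# You would POST this body to <service url>/v2/models/<model name>/infer
# with the Authorization header set, e.g. via requests.post(..., json=body).
```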