A processor is a package of online prediction logic. Elastic Algorithm Service (EAS) of Machine Learning Platform for AI (PAI) provides built-in processors, which are commonly used to deploy models. Using these built-in processors can help you reduce the expenses on developing the online prediction logic of models.
The following table lists the built-in processors. The processor code is required only when you use the EASCMD client to deploy models.

| Processor name | Processor code (CPU edition) | Processor code (GPU edition) | References |
|---|---|---|---|
| PMML | pmml | None | PMML processor |
| TensorFlow1.12 | tensorflow_cpu_1.12 | tensorflow_gpu_1.12 | TensorFlow1.12 processor |
| TensorFlow1.14 | tensorflow_cpu_1.14 | tensorflow_gpu_1.14 | TensorFlow1.14 processor |
| TensorFlow1.15 | tensorflow_cpu_1.15 | tensorflow_gpu_1.15 | TensorFlow1.15 processor with a built-in optimization engine based on PAI-Blade of the agility edition |
| TensorFlow2.3 | tensorflow_cpu_2.3 | None | TensorFlow2.3 processor |
| PyTorch1.6 | pytorch_cpu_1.6 | pytorch_gpu_1.6 | PyTorch1.6 processor with a built-in optimization engine based on PAI-Blade of the agility edition |
| Caffe | caffe_cpu | caffe_gpu | Caffe processor |
| Parameter server algorithm | parameter_sever | None | PS processor |
| Alink | alink_pai_processor | None | None |
| xNN | xnn_cpu | None | None |
| EasyVision | easy_vision_cpu_tf1.12_torch151 | easy_vision_gpu_tf1.12_torch151 | EasyVision processor |
| EasyTransfer | easytransfer_cpu | easytransfer_gpu | EasyTransfer processor |
| EasyNLP | easynlp | easynlp | EasyNLP processor |
| EasyCV | easycv | easycv | EasyCV processor |
| Blade | blade_cpu | blade_cuda10.0_beta | None |
| MediaFlow | None | mediaflow | MediaFlow processor |
| Triton | None | triton | Triton processor |
PMML processor
The PMML processor performs the following operations:
- Loads a model service from a PMML file.
- Processes requests that are sent to call the model service.
- Uses the model to calculate the request results and returns the results to clients.
By default, the PMML processor fills missing feature values in a request with the default values listed in the following table.

| DataType | Default imputed value |
|---|---|
| BOOLEAN | false |
| DOUBLE | 0.0 |
| FLOAT | 0.0 |
| INT | 0 |
| STRING | "" |
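To make the imputation rule concrete, the following minimal Python sketch (illustrative only, not the processor's actual implementation) fills missing request fields with the per-type defaults from the table:

```python
# Illustrative only: mimic how missing feature values are imputed
# with the per-type defaults listed in the table above.
DEFAULTS = {"BOOLEAN": False, "DOUBLE": 0.0, "FLOAT": 0.0, "INT": 0, "STRING": ""}

def impute(features, schema):
    """Fill in missing feature values based on each feature's declared data type."""
    return {name: features.get(name, DEFAULTS[dtype]) for name, dtype in schema.items()}

# "age" is missing from the request, so it falls back to the INT default 0.
print(impute({"income": 3500.0}, {"income": "DOUBLE", "age": "INT"}))
```

You can deploy a PMML model by using one of the following methods: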
- Upload the model file to the console
Set the Processor Type parameter to PMML. For more information, see Upload and deploy models in the console.
- Use the EASCMD client to deploy the model
In the service.json service configuration file, set the processor parameter to pmml. The following code block shows an example:
{ "processor": "pmml", "generate_token": "true", "model_path": "http://xxxxx/lr.pmml", "name": "eas_lr_example", "metadata": { "instance": 1, "cpu": 1 # Allocate 4 GB memory for each CPU. One CPU and 4 GB memory are considered one quota. } }
- Use Data Science Workshop (DSW) to deploy the model
Modify the service.json service configuration file. This method is similar to the method of deploying models by using the EASCMD client. For more information, see Use DSW.
TensorFlow1.12 processor
- Upload the model file to the console
Set the Processor Type parameter to TensorFlow1.12. For more information, see Upload and deploy models in the console.
- Use the EASCMD client to deploy the model
In the service.json service configuration file, set the processor parameter to tensorflow_cpu_1.12 or tensorflow_gpu_1.12, depending on whether the model runs on CPU or GPU resources. If the value of the processor parameter does not match the resource type, a deployment error occurs. The following code block shows an example:
{ "name": "tf_serving_test", "generate_token": "true", "model_path": "http://xxxxx/savedmodel_example.zip", "processor": "tensorflow_cpu_1.12", "metadata": { "instance": 1, "cpu": 1, "gpu": 0, "memory": 2000 } }
- Use DSW to deploy the model
Modify the service.json service configuration file. This method is similar to the method of deploying models by using the EASCMD client. For more information, see Use DSW.
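After the service is deployed, you can send prediction requests to it. The following is a minimal sketch that uses the eas-prediction Python SDK (installed with pip install eas-prediction). The endpoint, token, signature name, input tensor name, and shape are placeholders that depend on your own service and SavedModel:

```python
from eas_prediction import PredictClient, TFRequest

# Placeholder endpoint, service name, and token; replace with your own values.
client = PredictClient('http://<your-user-id>.<region>.pai-eas.aliyuncs.com', 'tf_serving_test')
client.set_token('<service token>')
client.init()

# The signature name, input name, shape, and data depend on your SavedModel.
request = TFRequest('serving_default')
request.add_feed('input', [1, 10], TFRequest.DT_FLOAT, [0.1] * 10)
response = client.predict(request)
print(response)
```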
TensorFlow1.14 processor
- Upload the model file to the console
Set the Processor Type parameter to TensorFlow1.14. For more information, see Upload and deploy models in the console.
- Use the EASCMD client to deploy the model
In the service.json service configuration file, set the processor parameter to tensorflow_cpu_1.14 or tensorflow_gpu_1.14, depending on whether the model runs on CPU or GPU resources. If the value of the processor parameter does not match the resource type, a deployment error occurs. The following code block shows an example:
{ "name": "tf_serving_test", "generate_token": "true", "model_path": "http://xxxxx/savedmodel_example.zip", "processor": "tensorflow_cpu_1.14", "metadata": { "instance": 1, "cpu": 1, "gpu": 0, "memory": 2000 } }
- Use DSW to deploy the model
Modify the service.json service configuration file. This method is similar to the method of deploying models by using the EASCMD client. For more information, see Use DSW.
TensorFlow1.15 processor with a built-in optimization engine based on PAI-Blade of the agility edition
- The general-purpose processor does not support custom TensorFlow operations.
- The TensorFlow1.15 processor provides a built-in optimization engine based on PAI-Blade of the agility edition. You can use this processor to deploy TensorFlow models that are optimized by PAI-Blade of the agility edition.
- Upload the model file to the console
Set the Processor Type parameter to TensorFlow1.15. For more information, see Upload and deploy models in the console.
- Use the EASCMD client to deploy the model
In the service.json service configuration file, set the processor parameter to tensorflow_cpu_1.15 or tensorflow_gpu_1.15, depending on whether the model runs on CPU or GPU resources. If the value of the processor parameter does not match the resource type, a deployment error occurs. The following code block shows an example:
{ "name": "tf_serving_test", "generate_token": "true", "model_path": "http://xxxxx/savedmodel_example.zip", "processor": "tensorflow_cpu_1.15", "metadata": { "instance": 1, "cpu": 1, "gpu": 0, "memory": 2000 } }
- Use DSW to deploy the model
Modify the service.json service configuration file. This method is similar to the method of deploying models by using the EASCMD client. For more information, see Use DSW. For more information about the parameters in the service configuration file, see Create a service.
TensorFlow2.3 processor
- Upload the model file to the console
Set the Processor Type parameter to TensorFlow2.3. For more information, see Upload and deploy models in the console.
- Use the EASCMD client to deploy the model
In the service.json service configuration file, set the processor parameter to tensorflow_cpu_2.3. The following code block shows an example:
{ "name": "tf_serving_test", "generate_token": "true", "model_path": "http://xxxxx/savedmodel_example.zip", "processor": "tensorflow_cpu_2.3", "metadata": { "instance": 1, "cpu": 1, "gpu": 0, "memory": 2000 } }
- Use DSW to deploy the model
Modify the service.json service configuration file. This method is similar to the method of deploying models by using the EASCMD client. For more information, see Use DSW.
PyTorch1.6 processor with a built-in optimization engine based on PAI-Blade of the agility edition
- The general-purpose processor does not support PyTorch extensions. You can use this processor to import or export only models in the TorchScript format.
- The PyTorch1.6 processor provides a built-in optimization engine based on PAI-Blade of the agility edition. You can use this processor to deploy PyTorch models that are optimized by PAI-Blade of the agility edition.
- Upload the model file to the console
Set the Processor Type parameter to PyTorch1.6. For more information, see Upload and deploy models in the console.
- Use the EASCMD client to deploy the model
In the service.json service configuration file, set the processor parameter to pytorch_cpu_1.6 or pytorch_gpu_1.6, depending on whether the model runs on CPU or GPU resources. If the value of the processor parameter does not match the resource type, a deployment error occurs. The following code block shows an example:
{ "name": "pytorch_serving_test", "generate_token": "true", "model_path": "http://xxxxx/torchscript_model.pt", "processor": "pytorch_gpu_1.6", "metadata": { "instance": 1, "cpu": 1, "gpu": 1, "cuda": "10.0", "memory": 2000 } }
- Use DSW to deploy the model
Modify the service.json service configuration file. This method is similar to the method of deploying models by using the EASCMD client. For more information, see Use DSW. For more information about the parameters in the service configuration file, see Create a service.
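After the PyTorch service is deployed, you can call it in a similar way. The following is a minimal sketch that uses the eas-prediction Python SDK (installed with pip install eas-prediction); the endpoint, service name, token, and input shape are placeholders that depend on your own service and TorchScript model:

```python
from eas_prediction import PredictClient, TorchRequest

# Placeholder endpoint, service name, and token; replace with your own values.
client = PredictClient('http://<your-user-id>.<region>.pai-eas.aliyuncs.com', 'pytorch_serving_test')
client.set_token('<service token>')
client.init()

# Inputs and outputs of TorchScript models are addressed by index.
# The shape [1, 3, 224, 224] is only an example; use the shape that your model expects.
request = TorchRequest()
request.add_feed(0, [1, 3, 224, 224], TorchRequest.DT_FLOAT, [0.0] * (3 * 224 * 224))
request.add_fetch(0)
response = client.predict(request)
print(response)
```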
Caffe processor
- Upload the model file to the console
Set the Processor Type parameter to Caffe. For more information, see Upload and deploy models in the console.
- Use the EASCMD client to deploy the model
In the service.json service configuration file, set the processor parameter to caffe_cpu or caffe_gpu, depending on whether the model runs on CPU or GPU resources. If the value of the processor parameter does not match the resource type, a deployment error occurs. The following code block shows an example:
{ "name": "caffe_serving_test", "generate_token": "true", "model_path": "http://xxxxx/caffe_model.zip", "processor": "caffe_cpu", "model_config": { "model": "deploy.prototxt", "weight": "bvlc_reference_caffenet.caffemodel" }, "metadata": { "instance": 1, "cpu": 1, "gpu": 0, "memory": 2000 } }
- Use DSW to deploy the model
Modify the service.json service configuration file. This method is similar to the method of deploying models by using the EASCMD client. For more information, see Use DSW.
PS processor
The PS processor is provided by EAS and developed based on PS algorithms. This processor can load models in the PS format.
- You can deploy a PS model by using one of the following methods:
- Upload the model file to the console
Set the Processor Type parameter to PS Algorithm. For more information, see Model service deployment by using the PAI console and Machine Learning Designer.
- Use the EASCMD client to deploy the model
In the service.json service configuration file, set the processor parameter to parameter_sever. The following code block shows an example:
{ "name":"ps_smart", "model_path": "oss://examplebucket/xlab_m_pai_ps_smart_b_1058272_v0.tar.gz", "processor": "parameter_sever", "metadata": { "region": "beijing", "cpu": 1, "instance": 1, "memory": 2048 } }
- Use DSW to deploy the model
Modify the service.json service configuration file. This method is similar to the method of deploying models by using the EASCMD client. For more information, see Use DSW.
- Request description
You can use the PS model service to send a single request or to send multiple requests at a time. The two methods use the same request syntax: the request body is a JSON array of feature objects, and a batch request simply contains multiple feature objects in the array.
- Sample syntax to send a single request
curl "http://eas.location/api/predict/ps_smart" -d "[ { "f0": 1, "f1": 0.2, "f3": 0.5, } ]"
- Sample syntax to send multiple requests at once
curl "http://eas.location/api/predict/ps_smart" -d "[ { "f0": 1, "f1": 0.2, "f3": 0.5, }, { "f0": 1, "f1": 0.2, "f3": 0.5, } ]"
- Responses
The two methods also use the same response syntax: the response body is a JSON array of result objects, and the result objects are returned in the same order as the feature objects in the request.
[ { "lable":"xxxx", "score" : 0.2, "details" : [{"k1":0.3}, {"k2":0.5}] }, { "lable":"xxxx", "score" : 0.2, "details" : [{"k1":0.3}, {"k2":0.5}] } ]
EasyTransfer processor
The EasyTransfer processor that EAS provides can load TensorFlow-based deep learning natural language processing (NLP) models that are trained based on the EasyTransfer framework.
- Upload the model file to the console
Set the Processor Type parameter to EasyTransfer. For more information, see Upload and deploy models in the console.
- Use the EASCMD client to deploy the model
In the service.json service configuration file, set the processor parameter to easytransfer_cpu or easytransfer_gpu, depending on whether the model runs on CPU or GPU resources. If the value of the processor parameter does not match the resource type, a deployment error occurs. Set the type field of the model_config parameter to the model type that you want to use. In the following examples, a text classification model is used. For more information about other parameters, see Create a service.
- Deploy the model on a GPU node (the public resource group is used in this example)
{ "name": "et_app_demo" "metadata": { "instance": 1 }, "cloud": { "computing": { "instance_type": "ecs.gn6i-c4g1.xlarge" } }, "model_path": "http://xxxxx/your_model.zip", "processor": "easytransfer_gpu", "model_config": { "type": "text_classify_bert" } }
- Deploy the model on a CPU node
{ "name": "et_app_demo", "model_path": "http://xxxxx/your_model.zip", "processor": "easytransfer_cpu", "model_config": { "type":"text_classify_bert" } "metadata": { "instance": 1, "cpu": 1, "memory": 4000 } }
The following table lists the supported model types.

| Model type | Name |
|---|---|
| Text matching | text_match_bert |
| Text classification | text_classify_bert |
| Sequence labeling | sequence_labeling_bert |
| Text vectorization | vectorization_bert |
EasyNLP processor
The EasyNLP processor that EAS provides can load PyTorch-based deep learning NLP models that are trained based on the EasyNLP framework.
- Upload the model file to the console
Set the Processor Type parameter to EasyNLP. For more information, see Upload and deploy models in the console.
- Use the EASCMD client to deploy the model
In the service.json service configuration file, set the processor parameter to easynlp. Set the type field of the model_config parameter to the model type that you want to use. In the following example, a single-label text classification model is used. For more information about other parameters, see Create a service.
```json
{
  "name": "easynlp_app_demo",
  "metadata": {
    "instance": 1
  },
  "cloud": {
    "computing": {
      "instance_type": "ecs.gn6i-c4g1.xlarge"
    }
  },
  "model_config": {
    "app_name": "text_classify",
    "type": "text_classify"
  },
  "model_path": "http://xxxxx/your_model.tar.gz",
  "processor": "easynlp"
}
```

The following table lists the supported model types.
| Model type | Name |
|---|---|
| Single-label text classification | text_classify |
| Multi-label text classification | text_classify_multi |
| Text matching | text_match |
| Sequence labeling | sequence_labeling |
| Text vectorization | vectorization |
| Summary generation for Chinese text (GPU) | sequence_generation_zh |
| Summary generation for English text (GPU) | sequence_generation_en |
| Machine reading comprehension for Chinese text | machine_reading_comprehension_zh |
| Machine reading comprehension for English text | machine_reading_comprehension_en |
| WUKONG_CLIP (GPU) | wukong_clip |
| CLIP (GPU) | clip |
EasyCV processor
The EasyCV processor that EAS provides can load deep learning models that are trained based on the EasyCV framework.
- Upload the model file to the console
Set the Processor Type parameter to EasyCV. For more information, see Upload and deploy models in the console.
- Use the EASCMD client to deploy the model
In the service.json service configuration file, set the processor parameter to easycv. Set the type field of the model_config parameter to the model type that you want to use. In the following example, an image classification model is used. For more information about other parameters, see Create a service.
```json
{
  "name": "easycv_classification_example",
  "processor": "easycv",
  "model_path": "oss://examplebucket/epoch_10_export.pt",
  "model_config": {
    "type": "TorchClassifier"
  },
  "metadata": {
    "instance": 1
  },
  "cloud": {
    "computing": {
      "instance_type": "ecs.gn5i-c4g1.xlarge"
    }
  }
}
```

The following table lists the supported model types.
| Model type | model_config |
|---|---|
| Image classification | {"type":"TorchClassifier"} |
| Object detection | {"type":"DetectionPredictor"} |
| Semantic segmentation | {"type":"SegmentationPredictor"} |
| YOLOX | {"type":"YoloXPredictor"} |
| Video classification | {"type":"VideoClassificationPredictor"} |
The following sample code shows how to call the deployed service:

```python
import base64
import json

import requests

# Download a sample image and encode it in Base64.
resp = requests.get('http://exmaplebucket.oss-cn-zhangjiakou.aliyuncs.com/images/000000123213.jpg')
ENCODING = 'utf-8'
datas = json.dumps({
    "image": base64.b64encode(resp.content).decode(ENCODING)
})

# The token that is used to access the service.
head = {
    "Authorization": "NTFmNDJlM2E4OTRjMzc3OWY0NzI3MTg5MzZmNGQ5Yj***"
}

# Send the prediction request to the deployed service.
for x in range(0, 10):
    resp = requests.post("http://150231884461***.cn-hangzhou.pai-eas.aliyuncs.com/api/predict/test_easycv_classification_example", data=datas, headers=head)
    print(resp.text)
```
You must convert images and video files to the Base64 format for transmission. Use the image field to indicate image data and the video field to indicate video data.
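For example, the following minimal sketch (with a hypothetical local file, endpoint, service name, and token that you need to replace) sends a Base64-encoded video in the video field:

```python
import base64
import json

import requests

# Hypothetical local video file; replace with your own file.
with open("sample_video.mp4", "rb") as f:
    payload = json.dumps({
        "video": base64.b64encode(f.read()).decode("utf-8")
    })

# Placeholder endpoint, service name, and token; replace with your own values.
headers = {"Authorization": "<service token>"}
resp = requests.post(
    "http://<your-user-id>.<region>.pai-eas.aliyuncs.com/api/predict/<service name>",
    data=payload,
    headers=headers,
)
print(resp.text)
```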
EasyVision processor
The EasyVision processor that EAS provides can load deep learning models that are trained based on the EasyVision framework.
- Upload the model file to the console
Set the Processor Type parameter to EasyVision. For more information, see Upload and deploy models in the console.
- Use the EASCMD client to deploy the model
In the service.json service configuration file, set the processor parameter to easy_vision_cpu_tf1.12_torch151 or easy_vision_gpu_tf1.12_torch151, depending on whether the model runs on CPU or GPU resources. If the value of the processor parameter does not match the resource type, a deployment error occurs. Set the type parameter in the model_config section to the type of the model that is trained. The following code blocks show examples. For more information about other parameters, see Create a service.
- Deploy the model on a GPU node
{ "name": "ev_app_demo", "processor": "easy_vision_gpu_tf1.12_torch151", "model_path": "oss://path/to/your/model", "model_config": "{\"type\":\"classifier\"}", "metadata": { "resource": "your_resource_name", "cuda": "9.0", "instance": 1, "memory": 4000, "gpu": 1, "cpu": 4, "rpc.worker_threads" : 5 } }
- Deploy the model on a CPU node
{ "name": "ev_app_cpu_demo", "processor": "easy_vision_cpu_tf1.12_torch151", "model_path": "oss://path/to/your/model", "model_config": "{\"type\":\"classifier\"}", "metadata": { "resource": "your_resource_name", "instance": 1, "memory": 4000, "gpu": 0, "cpu": 4, "rpc.worker_threads" : 5 } }
MediaFlow processor
The MediaFlow processor that EAS provides is a general-purpose orchestration engine that can analyze and process video, audio, and images.
- Upload the model file to the console
Set the Processor Type parameter to MediaFlow. For more information, see Upload and deploy models in the console.
- Use the EASCMD client to deploy the model
In the service.json service configuration file, set the processor parameter to mediaflow. In addition, you must set the following parameters if you use the MediaFlow processor to deploy models. For more information about other parameters, see Create a service.
- graph_pool_size: the number of graph pools.
- worker_threads: the number of worker threads.
- Deploy a model for video classification
{ "model_entry": "video_classification/video_classification_ext.js", "name": "video_classification", "model_path": "oss://path/to/your/model", "generate_token": "true", "processor": "mediaflow", "model_config" : { "graph_pool_size":8, "worker_threads":16 }, "metadata": { "eas.handlers.disable_failure_handler" :true, "resource": "your_resource_name", "rpc.worker_threads": 30, "rpc.enable_jemalloc": true, "rpc.keepalive": 500000, "cpu": 4, "instance": 1, "cuda": "9.0", "rpc.max_batch_size": 64, "memory": 10000, "gpu": 1 } }
- Deploy a model for automated speech recognition (ASR)
{ "model_entry": "asr/video_asr_ext.js", "name": "video_asr", "model_path": "oss://path/to/your/model", "generate_token": "true", "processor": "mediaflow", "model_config" : { "graph_pool_size":8, "worker_threads":16 }, "metadata": { "eas.handlers.disable_failure_handler" :true, "resource": "your_resource_name", "rpc.worker_threads": 30, "rpc.enable_jemalloc": true, "rpc.keepalive": 500000, "cpu": 4, "instance": 1, "cuda": "9.0", "rpc.max_batch_size": 64, "memory": 10000, "gpu": 1 } }
Triton processor
The Triton processor is provided based on Triton Inference Server and has the following features:
- Supports multiple open source frameworks, such as TensorFlow, PyTorch, ONNX Runtime, and TensorRT, and supports custom framework backends.
- Concurrently runs multiple models on one GPU to maximize GPU utilization.
- Supports the HTTP and gRPC protocols and allows you to send requests in binary format to reduce the request size.
- Supports the dynamic batching feature to improve service throughput.
- The Triton processor is available for public preview only in the China (Shanghai) region. The processor is unavailable in other regions.
- The models that are deployed by using the Triton processor must be stored in Object Storage Service (OSS). Therefore, you must activate OSS and upload model files to OSS before you can use the Triton processor to deploy models. For more information about how to upload objects to OSS, see Upload objects.
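If you prefer to upload the model files programmatically, the following is a minimal sketch that uses the OSS Python SDK (oss2); the AccessKey pair, endpoint, bucket name, and object paths are placeholders that you need to replace with your own values:

```python
import oss2

# Placeholder credentials and bucket information; replace with your own values.
auth = oss2.Auth("<AccessKeyId>", "<AccessKeySecret>")
bucket = oss2.Bucket(auth, "https://oss-cn-shanghai.aliyuncs.com", "triton-model-repo")

# Upload a local model file into a subdirectory of the bucket, for example
# models/<model name>/<version>/..., so that it can be referenced by model-repository.
bucket.put_object_from_file(
    "models/inception_graphdef/1/model.graphdef",  # hypothetical object path
    "local/path/to/model.graphdef",                # hypothetical local path
)
```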
- Use the Triton processor to deploy a model
You can use the Triton processor to deploy models only by using the EASCMD client. For more information about how to use the EASCMD client to deploy models, see Create a service. In the service.json service configuration file, set the processor parameter to triton. In addition, you must set the parameters related to OSS so that the Triton processor can obtain model files from OSS. The following code block shows how to modify the service.json service configuration file:

```json
{
  "name": "triton_test",
  "processor": "triton",
  "processor_params": [
    "--model-repository=oss://triton-model-repo/models",
    "--allow-http=true"
  ],
  "metadata": {
    "instance": 1,
    "cpu": 4,
    "gpu": 1,
    "memory": 10000,
    "resource": "<your resource id>"
  }
}
```

The following table describes the parameters that are required if you use the Triton processor to deploy models. For more information about other parameters, see Run commands to use the EASCMD client.
| Parameter | Description |
|---|---|
| processor_params | The parameters that you want to pass to Triton Inference Server when the deployment starts. Parameters that are not supported are automatically filtered out by Triton Inference Server. Table 1 describes the parameters that can be passed to Triton Inference Server. The model-repository parameter is required. For more information about optional parameters, see main.cc. |
| oss_endpoint | The endpoint of OSS. If you do not specify an endpoint, the system automatically uses the OSS service in the region where the EAS service is deployed. If you want to use the OSS service that is activated in another region, you must set this parameter. For more information about the valid values of this parameter, see Regions and endpoints. |
| metadata.resource | The ID of the dedicated resource group that is used to deploy the model in EAS. If you want to deploy a model by using the Triton processor, the resources that you use must belong to a dedicated resource group of EAS. For more information about how to create a dedicated resource group in EAS, see Dedicated resource group. |

Table 1. Parameters that can be passed to Triton Inference Server

| Parameter | Required | Description |
|---|---|---|
| model-repository | Yes | The OSS path of the model. You must set the model-repository parameter to a subdirectory of an OSS bucket instead of the root directory of the OSS bucket. For example, you can set the parameter to oss://triton-model-repo/models. In this example, triton-model-repo is the name of the OSS bucket, and models is a subdirectory of the OSS bucket. |
| log-verbose | No | For more information, see main.cc. |
| log-info | No | |
| log-warning | No | |
| log-error | No | |
| exit-on-error | No | |
| strict-model-config | No | |
| strict-readiness | No | |
| allow-http | No | |
| http-thread-count | No | |
| pinned-memory-pool-byte-size | No | |
| cuda-memory-pool-byte-size | No | |
| min-supported-compute-capability | No | |
| buffer-manager-thread-count | No | |
| backend-config | No | |

- Use the official Triton client to call the service deployed by using the Triton processor
Before you use the Triton client for Python to call the deployed service, run the following commands to install the official Triton client:
```
pip3 install nvidia-pyindex
pip3 install tritonclient[all]
```

Run the following command to download a test image to the current directory:

```
wget http://pai-blade.oss-cn-zhangjiakou.aliyuncs.com/doc-assets/cat.png
```

The following code block shows how the Triton client for Python sends a request in binary format to the service that is deployed by using the Triton processor:
```python
import time

import numpy as np
from PIL import Image
import tritonclient.http as httpclient
from tritonclient.utils import InferenceServerException

URL = "<service url>"  # Replace <service url> with the endpoint of the deployed service.
HEADERS = {"Authorization": "<service token>"}  # Replace <service token> with the token that is used to access the service.

# Prepare the input tensor from the downloaded test image.
input_img = httpclient.InferInput("input", [1, 299, 299, 3], "FP32")
img = Image.open('./cat.png').resize((299, 299))
img = np.asarray(img).astype('float32') / 255.0
input_img.set_data_from_numpy(img.reshape([1, 299, 299, 3]), binary_data=True)

# Request the softmax output in binary format.
output = httpclient.InferRequestedOutput(
    "InceptionV3/Predictions/Softmax", binary_data=True
)

triton_client = httpclient.InferenceServerClient(url=URL, verbose=False)

start = time.time()
for i in range(10):
    results = triton_client.infer(
        "inception_graphdef", inputs=[input_img], outputs=[output], headers=HEADERS
    )
    res_body = results.get_response()
    elapsed_ms = (time.time() - start) * 1000
    if i == 0:
        print("model name: ", res_body["model_name"])
        print("model version: ", res_body["model_version"])
        print("output name: ", res_body["outputs"][0]["name"])
        print("output shape: ", res_body["outputs"][0]["shape"])
    print("[{}] Avg rt(ms): {:.2f}".format(i, elapsed_ms))
    start = time.time()
```