A processor is a package of online prediction logic. Elastic Algorithm Service (EAS) provides built-in processors to deploy standard models, eliminating the need to develop this logic yourself.
The following table lists the processor names and codes in EAS. Provide the processor code when deploying a service with EASCMD.
| Processor name | Processor code, CPU edition (EASCMD only) | Processor code, GPU edition (EASCMD only) | Reference |
|---|---|---|---|
| EasyRec | easyrec-2.4 | easyrec-2.4 | |
| TorchEasyRec | easyrec-torch-1.0 | easyrec-torch-1.0 | |
| PMML | pmml | None | |
| TensorFlow 1.12 | tensorflow_cpu_1.12 | tensorflow_gpu_1.12 | |
| TensorFlow 1.14 | tensorflow_cpu_1.14 | tensorflow_gpu_1.14 | |
| TensorFlow 1.15 | tensorflow_cpu_1.15 | tensorflow_gpu_1.15 | TensorFlow 1.15 processor (includes the PAI-Blade Agility Edition optimization engine) |
| TensorFlow 2.3 | tensorflow_cpu_2.3 | None | |
| PyTorch 1.6 | pytorch_cpu_1.6 | pytorch_gpu_1.6 | PyTorch 1.6 processor (includes the PAI-Blade Agility Edition optimization engine) |
| Caffe | caffe_cpu | caffe_gpu | |
| Parameter Server | parameter_server | None | |
| Alink | alink_pai_processor | None | None |
| xNN | xnn_cpu | None | None |
| EasyVision | easy_vision_cpu_tf1.12_torch151 | easy_vision_gpu_tf1.12_torch151 | |
| EasyTransfer | easytransfer_cpu | easytransfer_gpu | |
| EasyNLP | easynlp | easynlp | |
| EasyCV | easycv | easycv | |
| Blade | blade_cpu | blade_cuda10.0_beta | None |
| MediaFlow | None | mediaflow | |
| Triton | None | triton | |
PMML processor
The PMML processor in EAS:
-
Loads a PMML model file as a service.
-
Processes requests to the model service.
-
Calculates and returns prediction results to the client.
The PMML processor provides a default strategy for handling missing values. If no isMissing policy is specified for the feature columns in the PMML model file, the system imputes them with the following defaults.
| Type | Default |
|---|---|
| BOOLEAN | false |
| DOUBLE | 0.0 |
| FLOAT | 0.0 |
| INT | 0 |
| STRING | "" |
Deploy a PMML model in any of the following ways:
-
Console
Set the Processor Type parameter to PMML. For more information, see Deploy a model service by using the console.
-
EASCMD client
In the service.json configuration file, set processor to pmml. Example:
{
  "processor": "pmml",
  "generate_token": "true",
  "model_path": "http://xxxxx/lr.pmml",
  "name": "eas_lr_example",
  "metadata": {
    "instance": 1,
    "cpu": 1
  }
}
Note: EAS allocates 4 GB of memory for each CPU core (1 Quota).
-
Data Science Workshop (DSW)
Similar to using the EASCMD client. Create a service.json configuration file. For more information, see Deploy a model service by using EASCMD.
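The default missing-value handling described earlier in this section can be illustrated with a short Python sketch. The `PMML_DEFAULTS` mapping mirrors the defaults table. The function name `impute_missing` and the `schema` argument are hypothetical and are not part of EAS, which applies these defaults inside the processor; treat this as an approximation of the behavior, not its implementation.

```python
# Defaults from the PMML missing-value table, keyed by data type.
PMML_DEFAULTS = {
    "BOOLEAN": False,
    "DOUBLE": 0.0,
    "FLOAT": 0.0,
    "INT": 0,
    "STRING": "",
}


def impute_missing(record, schema):
    """Return a copy of record with absent or None features filled in.

    schema maps each feature name to its data type (hypothetical input).
    """
    filled = dict(record)
    for name, dtype in schema.items():
        if filled.get(name) is None:
            filled[name] = PMML_DEFAULTS[dtype]
    return filled


# Hypothetical feature columns, for illustration only.
schema = {"age": "INT", "income": "DOUBLE", "city": "STRING", "vip": "BOOLEAN"}
print(impute_missing({"age": 31, "city": None}, schema))
```

Running a request through a helper like this before sending it can make it easier to see which values the service would assume for fields you leave out.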
TensorFlow 1.12 processor
The EAS TensorFlow 1.12 processor loads TensorFlow models in SavedModel (recommended) or SessionBundle format. Convert Keras and Checkpoint models to SavedModel format before deployment. For more information, see TensorFlow FAQ.
This processor does not support custom TensorFlow operations.
Deploy a TensorFlow model in one of the following ways:
-
Console
Set Processor Type to TensorFlow1.12. For more information, see Deploy a custom inference service.
-
EASCMD client
In the service.json configuration file, set processor to tensorflow_cpu_1.12 or tensorflow_gpu_1.12. Select the code based on deployment resources. A mismatch between processor and resource type causes deployment failure. Example:
{
  "name": "tf_serving_test",
  "generate_token": "true",
  "model_path": "http://xxxxx/savedmodel_example.zip",
  "processor": "tensorflow_cpu_1.12",
  "metadata": {
    "instance": 1,
    "cpu": 1,
    "gpu": 0,
    "memory": 2000
  }
}
-
DSW
Similar to using the EASCMD client. Create a service.json configuration file. For more information, see Deploy model services by using EASCMD.
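Because a mismatch between the processor code and the resource type causes deployment failure, it can help to derive the code from the requested resources instead of typing it by hand. The following Python sketch does this for the TensorFlow processors. The codes come from the table at the top of this page; the helper name `build_service_config` and its defaults are assumptions made for illustration, not an EAS API.

```python
import json

# (CPU code, GPU code) pairs copied from the processor table.
# None means that edition is not listed.
PROCESSOR_CODES = {
    "TensorFlow 1.12": ("tensorflow_cpu_1.12", "tensorflow_gpu_1.12"),
    "TensorFlow 1.14": ("tensorflow_cpu_1.14", "tensorflow_gpu_1.14"),
    "TensorFlow 1.15": ("tensorflow_cpu_1.15", "tensorflow_gpu_1.15"),
    "TensorFlow 2.3": ("tensorflow_cpu_2.3", None),
}


def build_service_config(name, model_path, framework, cpu=1, gpu=0, memory=2000):
    """Build a service.json dictionary whose processor matches the resources."""
    cpu_code, gpu_code = PROCESSOR_CODES[framework]
    code = gpu_code if gpu > 0 else cpu_code
    if code is None:
        raise ValueError(f"{framework} has no processor code for this resource type")
    return {
        "name": name,
        "generate_token": "true",
        "model_path": model_path,
        "processor": code,
        "metadata": {"instance": 1, "cpu": cpu, "gpu": gpu, "memory": memory},
    }


config = build_service_config(
    "tf_serving_test", "http://xxxxx/savedmodel_example.zip", "TensorFlow 1.12"
)
print(json.dumps(config, indent=2))
```

The printed JSON can be saved as service.json and passed to EASCMD.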
TensorFlow 1.14 processor
The EAS TensorFlow 1.14 processor loads TensorFlow models in SavedModel (recommended) or SessionBundle format. Convert Keras and Checkpoint models to SavedModel format before deployment. For more information, see TensorFlow FAQ.
This processor does not support custom TensorFlow operations.
Deploy a TensorFlow model in one of the following ways:
-
Console
Set Processor Type to TensorFlow1.14. For more information, see Deploy a custom inference service.
-
EASCMD client
In the service.json configuration file, set processor to tensorflow_cpu_1.14 or tensorflow_gpu_1.14. Select the code that matches your deployment resources. A mismatch between processor and resource type causes deployment failure. Example:
{
  "name": "tf_serving_test",
  "generate_token": "true",
  "model_path": "http://xxxxx/savedmodel_example.zip",
  "processor": "tensorflow_cpu_1.14",
  "metadata": {
    "instance": 1,
    "cpu": 1,
    "gpu": 0,
    "memory": 2000
  }
}
-
DSW
Similar to using the EASCMD client. Create a service.json configuration file. For more information, see Deploy model services by using EASCMD.
TensorFlow 1.15 processor (PAI-Blade Agility Edition)
The EAS TensorFlow 1.15 processor loads TensorFlow models in SavedModel (recommended) or SessionBundle format. Convert Keras and Checkpoint models to SavedModel format before deployment. For more information, see TensorFlow FAQ.
-
This processor does not support custom TensorFlow operations.
-
This processor includes the PAI-Blade Agility Edition optimization engine for deploying PAI-Blade-optimized TensorFlow models.
Deploy a TensorFlow model in one of the following ways:
-
Console
Set Processor Type to TensorFlow1.15. For more information, see Deploy a custom inference service.
-
EASCMD
In the service.json configuration file, set processor to tensorflow_cpu_1.15 or tensorflow_gpu_1.15. Select the code that matches your deployment resources. A mismatch between processor and resource type causes deployment failure. Example:
{
  "name": "tf_serving_test",
  "generate_token": "true",
  "model_path": "http://xxxxx/savedmodel_example.zip",
  "processor": "tensorflow_cpu_1.15",
  "metadata": {
    "instance": 1,
    "cpu": 1,
    "gpu": 0,
    "memory": 2000
  }
}
-
DSW
Similar to using EASCMD. Create a service.json configuration file. For more information, see Deploy model services by using EASCMD. For parameter descriptions, see Create a service.
TensorFlow 2.3 processor
The EAS TensorFlow 2.3 processor loads TensorFlow models in SavedModel (recommended) or SessionBundle format. Convert Keras and Checkpoint models to SavedModel format before deployment. For more information, see TensorFlow FAQ.
This processor does not support custom TensorFlow operations.
Deploy a TensorFlow model in one of the following ways:
-
Console
Set Processor Type to TensorFlow2.3. For more information, see Deploy a service by using the console.
-
EASCMD
In the service.json configuration file, set processor to tensorflow_cpu_2.3. Example:
{
  "name": "tf_serving_test",
  "generate_token": "true",
  "model_path": "http://xxxxx/savedmodel_example.zip",
  "processor": "tensorflow_cpu_2.3",
  "metadata": {
    "instance": 1,
    "cpu": 1,
    "gpu": 0,
    "memory": 2000
  }
}
-
DSW
Similar to using EASCMD. Create a service.json configuration file. For more information, see Deploy model services by using EASCMD.
PyTorch 1.6 processor (PAI-Blade Agility Edition)
The EAS PyTorch 1.6 processor loads models in TorchScript format. For more information, see the official TorchScript documentation.
-
This processor does not support PyTorch extensions or non-tensor model inputs and outputs.
-
This processor includes the PAI-Blade (Agility Edition) optimization engine for deploying optimized PyTorch models.
Deploy a TorchScript model in one of the following ways:
-
Console
Set Processor Type to PyTorch 1.6. For more information, see Deploy a custom inference service.
-
EASCMD client
In the service.json configuration file, set processor to pytorch_cpu_1.6 or pytorch_gpu_1.6. Select a value based on deployment resources. A mismatch between processor and resource type causes deployment failure. Example:
{
  "name": "pytorch_serving_test",
  "generate_token": "true",
  "model_path": "http://xxxxx/torchscript_model.pt",
  "processor": "pytorch_gpu_1.6",
  "metadata": {
    "instance": 1,
    "cpu": 1,
    "gpu": 1,
    "cuda": "10.0",
    "memory": 2000
  }
}
-
DSW
Similar to using the EASCMD client. Create a service.json configuration file. For more information, see Deploy model services by using EASCMD. For parameter descriptions, see Create a service.
Caffe processor
The EAS Caffe processor loads deep learning models trained with Caffe. Specify the model and weight file names in the model package.
This processor does not support custom data layers.
Deploy a Caffe model in one of the following ways:
-
Console
Set Processor Type to Caffe. For more information, see Deploy a custom inference service.
-
EASCMD client
In the service.json configuration file, set processor to caffe_cpu or caffe_gpu based on the resource type. A mismatch between processor and resource type causes deployment failure. Example:
{
  "name": "caffe_serving_test",
  "generate_token": "true",
  "model_path": "http://xxxxx/caffe_model.zip",
  "processor": "caffe_cpu",
  "model_config": {
    "model": "deploy.prototxt",
    "weight": "bvlc_reference_caffenet.caffemodel"
  },
  "metadata": {
    "instance": 1,
    "cpu": 1,
    "gpu": 0,
    "memory": 2000
  }
}
-
DSW
Similar to using the EASCMD client. Create a service.json configuration file. For more information, see Deploy model services by using EASCMD.
PS processor
The EAS PS processor loads models in the PS format.
Deploy a PS model and send requests to the service.
-
Deploy a PS model in one of the following ways:
-
Console
Set Processor Type to PS Algorithm. For more information, see Custom deployment.
-
EASCMD client
In the service.json configuration file, set processor to parameter_server. Example:
{
  "name": "ps_smart",
  "model_path": "oss://examplebucket/xlab_m_pai_ps_smart_b_1058272_v0.tar.gz",
  "processor": "parameter_server",
  "metadata": {
    "region": "beijing",
    "cpu": 1,
    "instance": 1,
    "memory": 2048
  }
}
-
DSW
Similar to using the EASCMD client. Create a service.json configuration file. For more information, see Deploy model services using the EASCMD client.
-
-
Request format
The processor supports both single and batch predictions. The request format is the same: a JSON array of feature objects.
-
Single request example
curl "http://eas.location/api/predict/ps_smart" -d '[{"f0": 1, "f1": 0.2, "f3": 0.5}]'
-
Batch request example
curl "http://eas.location/api/predict/ps_smart" -d '[{"f0": 1, "f1": 0.2, "f3": 0.5}, {"f0": 1, "f1": 0.2, "f3": 0.5}]'
-
Response
The response format is the same for single and batch requests: an array of response objects. Each response object corresponds to the request object at the same position.
[
  {
    "label": "xxxx",
    "score": 0.2,
    "details": [{"k1": 0.3}, {"k2": 0.5}]
  },
  {
    "label": "xxxx",
    "score": 0.2,
    "details": [{"k1": 0.3}, {"k2": 0.5}]
  }
]
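The request and response conventions above (a JSON array in, an array of the same length out, matched by position) can be sketched in Python as follows. The helper names `build_ps_request` and `pair_results` are hypothetical, and the sample response text is made up in the documented shape so that the example runs without a deployed service.

```python
import json


def build_ps_request(records):
    """Serialize one or more feature objects as the JSON array the service expects."""
    if isinstance(records, dict):
        records = [records]  # A single prediction is an array with one element.
    return json.dumps(list(records))


def pair_results(records, response_text):
    """Match each response object to the request object at the same position."""
    results = json.loads(response_text)
    if len(results) != len(records):
        raise ValueError("response length does not match request length")
    return list(zip(records, results))


records = [{"f0": 1, "f1": 0.2, "f3": 0.5}, {"f0": 1, "f1": 0.2, "f3": 0.5}]
body = build_ps_request(records)

# Hypothetical response text; a real call would POST body to the service
# endpoint, for example with requests.post(url, data=body).
sample_response = (
    '[{"label": "a", "score": 0.2, "details": []},'
    ' {"label": "b", "score": 0.7, "details": []}]'
)
for features, result in pair_results(records, sample_response):
    print(features, "->", result["label"], result["score"])
```

Checking the lengths before pairing guards against silently attributing a prediction to the wrong input if the response is truncated.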
-
EasyTransfer processor
The EAS EasyTransfer processor loads TensorFlow-based NLP models trained with EasyTransfer.
Deploy an EasyTransfer model in one of the following ways:
-
Console
Select EasyTransfer for the Processor Type parameter. For more information, see Deploy a custom inference service.
-
EASCMD client
In the service.json configuration file, set processor to easytransfer_cpu or easytransfer_gpu based on deployment resources. A mismatch between processor and resources causes deployment failure. In model_config, set type to the model type used during training. The following example uses a text classification model. For other parameters, see Create a service.
-
Configuration for GPU deployment (using a public resource group as an example)
{
  "name": "et_app_demo",
  "metadata": {
    "instance": 1
  },
  "cloud": {
    "computing": {
      "instance_type": "ecs.gn6i-c4g1.xlarge"
    }
  },
  "model_path": "http://xxxxx/your_model.zip",
  "processor": "easytransfer_gpu",
  "model_config": {
    "type": "text_classify_bert"
  }
}
-
Configuration for CPU deployment
{
  "name": "et_app_demo",
  "model_path": "http://xxxxx/your_model.zip",
  "processor": "easytransfer_cpu",
  "model_config": {
    "type": "text_classify_bert"
  },
  "metadata": {
    "instance": 1,
    "cpu": 1,
    "memory": 4000
  }
}
Supported task types:
| Task type | Type |
|---|---|
| Text matching | text_match_bert |
| Text classification | text_classify_bert |
| Sequence labeling | sequence_labeling_bert |
| Text vectorization | vectorization_bert |
-
EasyNLP processor
The EAS EasyNLP processor loads PyTorch-based NLP models trained with EasyNLP.
Deploy an EasyNLP model in one of the following ways:
-
Console
Set Processor Type to EasyNLP. For more information, see Deploy a custom inference service.
-
EASCMD client
In the service.json configuration file, set processor to easynlp. In model_config, set type to the training task type. The following example uses a single-label text classification model. For other parameters, see Create a service.
{
  "name": "easynlp_app_demo",
  "metadata": {
    "instance": 1
  },
  "cloud": {
    "computing": {
      "instance_type": "ecs.gn6i-c4g1.xlarge"
    }
  },
  "model_config": {
    "app_name": "text_classify",
    "type": "text_classify"
  },
  "model_path": "http://xxxxx/your_model.tar.gz",
  "processor": "easynlp"
}
Supported task types:
| Task type | Value |
|---|---|
| Single-label text classification | text_classify |
| Multi-label text classification | text_classify_multi |
| Text matching | text_match |
| Sequence labeling | sequence_labeling |
| Text vectorization | vectorization |
| Chinese text summarization (GPU) | sequence_generation_zh |
| English text summarization (GPU) | sequence_generation_en |
| Machine reading comprehension (Chinese) | machine_reading_comprehension_zh |
| Machine reading comprehension (English) | machine_reading_comprehension_en |
| WUKONG_CLIP (GPU) | wukong_clip |
| CLIP (GPU) | clip |
After deployment, on the Elastic Algorithm Service (EAS) page, click Invocation Information in the Service Type column of the target service to view the endpoint and token. Call the service using the following Python example.
import requests
# Replace with your service endpoint.
url = '<eas-service-url>'
# Replace with your token.
token = '<eas-service-token>'
# Prepare the request data. The following example is for text classification.
request_body = {
"first_sequence": "hello"
}
headers = {"Authorization": token}
resp = requests.post(url=url, headers=headers, json=request_body)
print(resp.content.decode())
EasyCV processor
The EAS EasyCV processor loads deep learning models trained with EasyCV.
Deploy an EasyCV model in one of the following ways:
-
Console
Set Processor Type to EasyCV. For more information, see Deploy a custom inference service.
-
EASCMD client
In the service.json configuration file, set processor to easycv. In model_config, set type to the model type used during training. The following example uses an image classification model. For other parameters, see Create a service.
{
  "name": "easycv_classification_example",
  "processor": "easycv",
  "model_path": "oss://examplebucket/epoch_10_export.pt",
  "model_config": {"type": "TorchClassifier"},
  "metadata": {
    "instance": 1
  },
  "cloud": {
    "computing": {
      "instance_type": "ecs.gn5i-c4g1.xlarge"
    }
  }
}
Supported job types:
| Job type | model_config |
|---|---|
| Image classification | {"type":"TorchClassifier"} |
| Object detection | {"type":"DetectionPredictor"} |
| Semantic segmentation | {"type":"SegmentationPredictor"} |
| YOLOX | {"type":"YoloXPredictor"} |
| Video classification | {"type":"VideoClassificationPredictor"} |
After deployment, go to the Elastic Algorithm Service (EAS) page. Find the service, and in the Service Type column, click Invocation Information to view the endpoint and token. The following Python example shows how to call the service.
import requests
import base64
import json

# Download a sample image and Base64-encode it for the request body.
resp = requests.get('http://examplebucket.oss-cn-zhangjiakou.aliyuncs.com/images/000000123213.jpg')
ENCODING = 'utf-8'
datas = json.dumps({
    "image": base64.b64encode(resp.content).decode(ENCODING)
})
# Replace with your authentication token.
head = {
    "Authorization": "NTFmNDJlM2E4OTRjMzc3OWY0NzI3MTg5MzZmNGQ5Yj***"
}
for x in range(0, 10):
    # Replace with your service endpoint.
    resp = requests.post("http://150231884461***.cn-hangzhou.pai-eas.aliyuncs.com/api/predict/easycv_classification_example", data=datas, headers=head)
    print(resp.text)
Base64-encode the image or video data for transmission. Use the image key for image data and the video key for video data.
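The key selection described above can be wrapped in a small helper. This is a minimal sketch: the function name `build_easycv_body` is hypothetical, and the placeholder bytes stand in for real image or video file contents.

```python
import base64
import json


def build_easycv_body(raw_bytes, kind="image"):
    """Return a JSON request body with Base64 data under the image or video key."""
    if kind not in ("image", "video"):
        raise ValueError("kind must be 'image' or 'video'")
    encoded = base64.b64encode(raw_bytes).decode("utf-8")
    return json.dumps({kind: encoded})


# Placeholder bytes for illustration; in practice, read them from a file,
# for example open(path, "rb").read().
body = build_easycv_body(b"placeholder image bytes", kind="image")
print(body)
```

The returned string can be passed as the data argument of requests.post, in the same way as datas in the preceding example.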
EasyVision processor
The EAS EasyVision processor loads deep learning models trained with EasyVision.
Deploy an EasyVision model in one of the following ways:
-
Console
Set Processor Type to EasyVision. For more information, see Deploy a custom inference service.
-
EASCMD client
In the service.json configuration file, set processor to easy_vision_cpu_tf1.12_torch151 or easy_vision_gpu_tf1.12_torch151. Select the code that matches your deployment resources. A mismatch between processor and resource type causes deployment failure. In model_config, set type to the model type used for training. The following examples show GPU and CPU configurations. For other parameters, see Create a service.
-
Configuration for GPU deployment
{
  "name": "ev_app_demo",
  "processor": "easy_vision_gpu_tf1.12_torch151",
  "model_path": "oss://path/to/your/model",
  "model_config": "{\"type\":\"classifier\"}",
  "metadata": {
    "resource": "your_resource_name",
    "cuda": "9.0",
    "instance": 1,
    "memory": 4000,
    "gpu": 1,
    "cpu": 4,
    "rpc.worker_threads": 5
  }
}
-
Configuration for CPU deployment
{
  "name": "ev_app_cpu_demo",
  "processor": "easy_vision_cpu_tf1.12_torch151",
  "model_path": "oss://path/to/your/model",
  "model_config": "{\"type\":\"classifier\"}",
  "metadata": {
    "resource": "your_resource_name",
    "instance": 1,
    "memory": 4000,
    "gpu": 0,
    "cpu": 4,
    "rpc.worker_threads": 5
  }
}
-
MediaFlow processor
The EAS MediaFlow processor is an orchestration engine for analyzing and processing video, audio, and images.
Deploy a MediaFlow model in one of the following ways:
-
Console
Set Processor Type to MediaFlow. For more information, see Deploy a custom inference service.
-
EASCMD client
In the service.json configuration file, set processor to mediaflow. For general fields, see Create a service. This processor also requires the following fields in model_config:
-
graph_pool_size: Number of graph pools.
-
worker_threads: Number of worker threads.
Examples:
-
Configuration for deploying a video classification model.
{
  "model_entry": "video_classification/video_classification_ext.js",
  "name": "video_classification",
  "model_path": "oss://path/to/your/model",
  "generate_token": "true",
  "processor": "mediaflow",
  "model_config": {
    "graph_pool_size": 8,
    "worker_threads": 16
  },
  "metadata": {
    "eas.handlers.disable_failure_handler": true,
    "resource": "your_resource_name",
    "rpc.worker_threads": 30,
    "rpc.enable_jemalloc": true,
    "rpc.keepalive": 500000,
    "cpu": 4,
    "instance": 1,
    "cuda": "9.0",
    "rpc.max_batch_size": 64,
    "memory": 10000,
    "gpu": 1
  }
}
-
Configuration for deploying an automated speech recognition (ASR) model.
{
  "model_entry": "asr/video_asr_ext.js",
  "name": "video_asr",
  "model_path": "oss://path/to/your/model",
  "generate_token": "true",
  "processor": "mediaflow",
  "model_config": {
    "graph_pool_size": 8,
    "worker_threads": 16
  },
  "metadata": {
    "eas.handlers.disable_failure_handler": true,
    "resource": "your_resource_name",
    "rpc.worker_threads": 30,
    "rpc.enable_jemalloc": true,
    "rpc.keepalive": 500000,
    "cpu": 4,
    "instance": 1,
    "cuda": "9.0",
    "rpc.max_batch_size": 64,
    "memory": 10000,
    "gpu": 1
  }
}
The configurations for ASR and video classification differ mainly in model_entry, name, and model_path. Modify these fields for your model.
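Since only model_entry, name, and model_path change between the two examples, one way to avoid copy-paste drift is to generate each configuration from a shared template. The sketch below assumes this approach; the names `MEDIAFLOW_TEMPLATE` and `mediaflow_config` are hypothetical, and the shared values are copied from the examples above.

```python
import copy
import json

# Fields shared by the video classification and ASR examples.
MEDIAFLOW_TEMPLATE = {
    "generate_token": "true",
    "processor": "mediaflow",
    "model_config": {"graph_pool_size": 8, "worker_threads": 16},
    "metadata": {
        "eas.handlers.disable_failure_handler": True,
        "resource": "your_resource_name",
        "rpc.worker_threads": 30,
        "rpc.enable_jemalloc": True,
        "rpc.keepalive": 500000,
        "cpu": 4,
        "instance": 1,
        "cuda": "9.0",
        "rpc.max_batch_size": 64,
        "memory": 10000,
        "gpu": 1,
    },
}


def mediaflow_config(name, model_entry, model_path):
    """Return a service.json dictionary for one MediaFlow model."""
    config = copy.deepcopy(MEDIAFLOW_TEMPLATE)  # Keep the template unchanged.
    config.update(name=name, model_entry=model_entry, model_path=model_path)
    return config


asr = mediaflow_config("video_asr", "asr/video_asr_ext.js", "oss://path/to/your/model")
print(json.dumps(asr, indent=2))
```

The deep copy matters here because model_config and metadata are nested dictionaries; a shallow copy would let a change to one generated configuration leak into the template.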
-
Triton processor
Triton Inference Server is an NVIDIA online serving framework. It provides an interface for deploying and managing models on GPUs and is compatible with the KFServing API standard. Key features:
-
Deploys models from various frameworks, such as TensorFlow, PyTorch, ONNX Runtime, TensorRT, and custom backends.
-
Runs multiple models concurrently on a GPU to improve utilization.
-
Supports HTTP/gRPC protocols and binary format extension to reduce request size.
-
Supports Dynamic Batching to improve service throughput.
Triton Inference Server is available on EAS as a built-in Triton processor.
-
Available only in public preview in the China (Shanghai) region.
-
All models must be stored in OSS. Activate OSS and upload your model files to an OSS bucket first. For more information, see Simple Upload.
Deploy and call a Triton processor service.
-
Deploy with the Triton processor
Triton model services can be deployed only by using EASCMD. For more information, see Create a service. In the service.json configuration file, set processor to triton. Because Triton retrieves models from OSS, configure the required OSS parameters. Example service.json:
{
  "name": "triton_test",
  "processor": "triton",
  "processor_params": [
    "--model-repository=oss://triton-model-repo/models",
    "--allow-http=true"
  ],
  "metadata": {
    "instance": 1,
    "cpu": 4,
    "gpu": 1,
    "memory": 10000,
    "resource": "<your resource id>"
  }
}
Triton-specific parameters are listed below. For other parameters, see Parameters in service.json.
| Parameter | Description |
|---|---|
| processor_params | Parameters passed to Triton Server at startup. Unsupported parameters are automatically filtered out. The supported parameters are listed in Table 1. model-repository is required. For optional parameters, see main.cc. |
| oss_endpoint | OSS endpoint. If not specified, the system uses OSS in the same region as the EAS service. Specify this parameter to use OSS in a different region. For values, see Regions and Endpoints. |
| metadata.resource | ID of the EAS exclusive resource group for deploying the model service. The Triton processor requires an EAS exclusive resource group. For more information, see Use EAS exclusive resource groups. |
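Table 1. Supported parameters for the Triton server
| Parameter | Required | Description |
|---|---|---|
| model-repository | Yes | The path must be an OSS path. The bucket root directory cannot be used as the model-repository; specify a subdirectory in the bucket. Example: oss://triton-model-repo/models, where triton-model-repo is the bucket name and models is a subdirectory in the bucket. |
| log-verbose | No | For more information, see main.cc. |
| log-info | No | For more information, see main.cc. |
| log-warning | No | For more information, see main.cc. |
| log-error | No | For more information, see main.cc. |
| exit-on-error | No | For more information, see main.cc. |
| strict-model-config | No | For more information, see main.cc. |
| strict-readiness | No | For more information, see main.cc. |
| allow-http | No | For more information, see main.cc. |
| http-thread-count | No | For more information, see main.cc. |
| pinned-memory-pool-byte-size | No | For more information, see main.cc. |
| cuda-memory-pool-byte-size | No | For more information, see main.cc. |
| min-supported-compute-capability | No | For more information, see main.cc. |
| buffer-manager-thread-count | No | For more information, see main.cc. |
| backend-config | No | For more information, see main.cc. |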
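The constraints on processor_params can be checked locally before deployment. The following Python sketch is a pre-check under stated assumptions: the parameter names are those listed in Table 1, while the function name `check_processor_params` and its return shape are hypothetical. It does not reproduce how EAS itself filters parameters.

```python
# Parameter names listed in Table 1.
SUPPORTED_TRITON_PARAMS = {
    "model-repository", "log-verbose", "log-info", "log-warning", "log-error",
    "exit-on-error", "strict-model-config", "strict-readiness", "allow-http",
    "http-thread-count", "pinned-memory-pool-byte-size",
    "cuda-memory-pool-byte-size", "min-supported-compute-capability",
    "buffer-manager-thread-count", "backend-config",
}


def check_processor_params(params):
    """Split params into (kept, dropped) and validate model-repository."""
    kept, dropped = [], []
    repository = None
    for param in params:
        # "--key=value" -> key, value (split on the first "=" only).
        key, _, value = param.lstrip("-").partition("=")
        if key not in SUPPORTED_TRITON_PARAMS:
            dropped.append(param)
            continue
        kept.append(param)
        if key == "model-repository":
            repository = value
    if repository is None:
        raise ValueError("model-repository is required")
    if not repository.startswith("oss://"):
        raise ValueError("model-repository must be an OSS path")
    bucket, _, subdir = repository[len("oss://"):].partition("/")
    if not bucket or not subdir.strip("/"):
        raise ValueError("model-repository must be a subdirectory, not the bucket root")
    return kept, dropped


kept, dropped = check_processor_params([
    "--model-repository=oss://triton-model-repo/models",
    "--allow-http=true",
    "--not-a-triton-flag=1",  # Made-up flag to show what would be dropped.
])
print("kept:", kept)
print("dropped:", dropped)
```

Reporting dropped parameters explicitly can be useful because a misspelled flag would otherwise be ignored without any visible error.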
-
Call the service with the native Triton client
Install NVIDIA's official Triton client:
pip3 install nvidia-pyindex
pip3 install tritonclient[all]
Download a test image:
wget http://pai-blade.oss-cn-zhangjiakou.aliyuncs.com/doc-assets/cat.png
Send a binary-format request to the Triton processor service using the Python client:
import numpy as np
import time
from PIL import Image
import tritonclient.http as httpclient
from tritonclient.utils import InferenceServerException

URL = "<service url>"  # Replace <service url> with your service endpoint.
HEADERS = {"Authorization": "<service token>"}  # Replace <service token> with your service access token.

input_img = httpclient.InferInput("input", [1, 299, 299, 3], "FP32")
img = Image.open('./cat.png').resize((299, 299))
img = np.asarray(img).astype('float32') / 255.0
input_img.set_data_from_numpy(img.reshape([1, 299, 299, 3]), binary_data=True)
output = httpclient.InferRequestedOutput(
    "InceptionV3/Predictions/Softmax", binary_data=True
)
triton_client = httpclient.InferenceServerClient(url=URL, verbose=False)

start = time.time()
for i in range(10):
    results = triton_client.infer(
        "inception_graphdef", inputs=[input_img], outputs=[output], headers=HEADERS
    )
    res_body = results.get_response()
    elapsed_ms = (time.time() - start) * 1000
    if i == 0:
        print("model name: ", res_body["model_name"])
        print("model version: ", res_body["model_version"])
        print("output name: ", res_body["outputs"][0]["name"])
        print("output shape: ", res_body["outputs"][0]["shape"])
    print("[{}] Avg rt(ms): {:.2f}".format(i, elapsed_ms))
    start = time.time()