Built-in processors

A processor is a package of online prediction logic. Elastic Algorithm Service (EAS) provides built-in processors to deploy standard models, eliminating the need to develop this logic yourself.

The following table lists the processor names and codes in EAS. Provide the processor code when deploying a service with EASCMD.

Processor name	CPU edition code (EASCMD only)	GPU edition code (EASCMD only)	Reference
EasyRec	easyrec-2.4	easyrec-2.4	EasyRec processor
TorchEasyRec	easyrec-torch-1.0	easyrec-torch-1.0	TorchEasyRec processor
PMML	pmml	None	PMML processor
TensorFlow 1.12	tensorflow_cpu_1.12	tensorflow_gpu_1.12	TensorFlow 1.12 processor
TensorFlow 1.14	tensorflow_cpu_1.14	tensorflow_gpu_1.14	TensorFlow 1.14 processor
TensorFlow 1.15	tensorflow_cpu_1.15	tensorflow_gpu_1.15	TensorFlow 1.15 processor (includes the PAI-Blade agility edition optimization engine)
TensorFlow 2.3	tensorflow_cpu_2.3	None	TensorFlow 2.3 processor
PyTorch 1.6	pytorch_cpu_1.6	pytorch_gpu_1.6	PyTorch 1.6 processor (includes the PAI-Blade agility edition optimization engine)
Caffe	caffe_cpu	caffe_gpu	Caffe processor
Parameter Server	parameter_server	None	Parameter Server processor
Alink	alink_pai_processor	None	None
xNN	xnn_cpu	None	None
EasyVision	easy_vision_cpu_tf1.12_torch151	easy_vision_gpu_tf1.12_torch151	EasyVision processor
EasyTransfer	easytransfer_cpu	easytransfer_gpu	EasyTransfer processor
EasyNLP	easynlp	easynlp	EasyNLP processor
EasyCV	easycv	easycv	EasyCV processor
Blade	blade_cpu	blade_cuda10.0_beta	None
MediaFlow	None	mediaflow	MediaFlow processor
Triton	None	triton	Triton processor

PMML processor

The PMML processor in EAS:

Loads a PMML model file as a service.
Processes requests to the model service.
Calculates and returns prediction results to the client.

The PMML processor provides a default strategy for handling missing values. If no isMissing policy is specified for the feature columns in the PMML model file, the system imputes them with the following defaults.

Type	Default
BOOLEAN	false
DOUBLE	0.0
FLOAT	0.0
INT	0
STRING	""
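The default imputation in the table above can be sketched as a simple lookup by type. This is only an illustration of the rule, not the processor's actual implementation: `PMML_DEFAULTS`, `impute_missing`, and the schema dictionary format are hypothetical names and inputs chosen for this example.

```python
# Hypothetical illustration of the default missing-value strategy.
# The defaults per PMML data type are taken from the table above.
PMML_DEFAULTS = {
    "BOOLEAN": False,
    "DOUBLE": 0.0,
    "FLOAT": 0.0,
    "INT": 0,
    "STRING": "",
}


def impute_missing(record, schema):
    """Return a copy of record with absent or None features filled in.

    schema maps feature name -> PMML type name (an assumed input format).
    """
    filled = dict(record)
    for feature, pmml_type in schema.items():
        if filled.get(feature) is None:
            filled[feature] = PMML_DEFAULTS[pmml_type]
    return filled


schema = {"age": "INT", "score": "DOUBLE", "city": "STRING", "vip": "BOOLEAN"}
print(impute_missing({"age": 30, "score": None}, schema))
```

Features that already carry a value are left untouched; only absent or null features receive the type default.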

Deploy a PMML model in any of the following ways:

Console
Set the Processor Type parameter to PMML. For more information, see Deploy a model service by using the console.

EASCMD client

In the service.json configuration file, set processor to pmml. EAS allocates 4 GB of memory for each CPU core (1 Quota). Example:

{
  "processor": "pmml",
  "generate_token": "true",
  "model_path": "http://xxxxx/lr.pmml",
  "name": "eas_lr_example",
  "metadata": {
    "instance": 1,
    "cpu": 1
  }
}

Data Science Workshop (DSW)
Similar to using the EASCMD client. Create a service.json configuration file. For more information, see Deploy a model service by using EASCMD.

TensorFlow 1.12 processor

The EAS TensorFlow 1.12 processor loads TensorFlow models in SavedModel (recommended) or SessionBundle format. Convert Keras and Checkpoint models to SavedModel format before deployment. For more information, see TensorFlow FAQ.

Note

This processor does not support custom TensorFlow operations.

Deploy a TensorFlow model in one of the following ways:

Console
Set Processor Type to TensorFlow1.12. For more information, see Deploy a custom inference service.

EASCMD client

In the service.json configuration file, set processor to tensorflow_cpu_1.12 or tensorflow_gpu_1.12. Select the code based on deployment resources. A mismatch between processor and resource type causes deployment failure. Example:

{
  "name": "tf_serving_test",
  "generate_token": "true",
  "model_path": "http://xxxxx/savedmodel_example.zip",
  "processor": "tensorflow_cpu_1.12",
  "metadata": {
    "instance": 1,
    "cpu": 1,
    "gpu": 0,
    "memory": 2000
  }
}
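Because a mismatch between the processor code and the resource type causes deployment failure, it can help to check a configuration locally before deploying it. The following is a minimal sketch under stated assumptions: `find_resource_mismatch` is a hypothetical helper, it only inspects the `cpu`/`gpu` token in the processor code and the `metadata.gpu` field, and it does not reproduce the validation that EAS itself performs.

```python
def find_resource_mismatch(config):
    """Return a description of a processor/resource mismatch, or None.

    Hypothetical local pre-check: only configurations that declare
    metadata.gpu are inspected.
    """
    processor = config.get("processor", "")
    gpu_count = config.get("metadata", {}).get("gpu")
    if gpu_count is None:
        # Resources are described elsewhere (for example, by instance type).
        return None
    tokens = processor.split("_")
    if "gpu" in tokens and gpu_count < 1:
        return f"{processor} is a GPU edition, but metadata.gpu is {gpu_count}"
    if "cpu" in tokens and gpu_count > 0:
        return f"{processor} is a CPU edition, but metadata.gpu is {gpu_count}"
    return None


config = {
    "name": "tf_serving_test",
    "processor": "tensorflow_gpu_1.12",
    "metadata": {"instance": 1, "cpu": 1, "gpu": 0, "memory": 2000},
}
print(find_resource_mismatch(config))
```

The same check applies to the other processors that ship separate CPU and GPU codes, such as caffe_cpu and caffe_gpu.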

DSW
Similar to using the EASCMD client. Create a service.json configuration file. For more information, see Deploy model services by using EASCMD.

TensorFlow 1.14 processor

The EAS TensorFlow 1.14 processor loads TensorFlow models in SavedModel (recommended) or SessionBundle format. Convert Keras and Checkpoint models to SavedModel format before deployment. For more information, see TensorFlow FAQ.

Note

This processor does not support custom TensorFlow operations.

Deploy a TensorFlow model in one of the following ways:

Console
Set Processor Type to TensorFlow1.14. For more information, see Deploy a custom inference service.
EASCMD client
In the service.json configuration file, set processor to tensorflow_cpu_1.14 or tensorflow_gpu_1.14. Select the code that matches your deployment resources. A mismatch between processor and resource type causes deployment failure. Example:
```
{
  "name": "tf_serving_test",
  "generate_token": "true",
  "model_path": "http://xxxxx/savedmodel_example.zip",
  "processor": "tensorflow_cpu_1.14",
  "metadata": {
    "instance": 1,
    "cpu": 1,
    "gpu": 0,
    "memory": 2000
  }
}
```
DSW
Similar to using the EASCMD client. Create a service.json configuration file. For more information, see Deploy model services by using EASCMD.

TensorFlow 1.15 processor (PAI-Blade Agility Edition)

The EAS TensorFlow 1.15 processor loads TensorFlow models in SavedModel (recommended) or SessionBundle format. Convert Keras and Checkpoint models to SavedModel format before deployment. For more information, see TensorFlow FAQ.

Note

This processor does not support custom TensorFlow operations.
This processor includes the PAI-Blade Agility Edition optimization engine for deploying PAI-Blade-optimized TensorFlow models.

Deploy a TensorFlow model in one of the following ways:

Console
Set Processor Type to TensorFlow1.15. For more information, see Deploy a custom inference service.

EASCMD

In the service.json configuration file, set processor to tensorflow_cpu_1.15 or tensorflow_gpu_1.15. Select the code that matches your deployment resources. A mismatch between processor and resource type causes deployment failure. Example:

{
  "name": "tf_serving_test",
  "generate_token": "true",
  "model_path": "http://xxxxx/savedmodel_example.zip",
  "processor": "tensorflow_cpu_1.15",
  "metadata": {
    "instance": 1,
    "cpu": 1,
    "gpu": 0,
    "memory": 2000
  }
}

DSW
Similar to using EASCMD. Create a service.json configuration file. For more information, see Deploy model services by using EASCMD. For parameter descriptions, see Create a service.

TensorFlow 2.3 processor

The EAS TensorFlow 2.3 processor loads TensorFlow models in SavedModel (recommended) or SessionBundle format. Convert Keras and Checkpoint models to SavedModel format before deployment. For more information, see TensorFlow FAQ.

Note

This processor does not support custom TensorFlow operations.

Deploy a TensorFlow model in one of the following ways:

Console
Set Processor Type to TensorFlow2.3. For more information, see Deploy a service by using the console.

EASCMD

In the service.json configuration file, set processor to tensorflow_cpu_2.3. Example:

{
  "name": "tf_serving_test",
  "generate_token": "true",
  "model_path": "http://xxxxx/savedmodel_example.zip",
  "processor": "tensorflow_cpu_2.3",
  "metadata": {
    "instance": 1,
    "cpu": 1,
    "gpu": 0,
    "memory": 2000
  }
}

DSW
Similar to using EASCMD. Create a service.json configuration file. For more information, see Deploy model services by using EASCMD.

PyTorch 1.6 processor (PAI-Blade Agility Edition)

The EAS PyTorch 1.6 processor loads models in TorchScript format. For more information, see the official TorchScript documentation.

Note

This processor does not support PyTorch extensions or non-tensor model inputs and outputs.
This processor includes the PAI-Blade (Agility Edition) optimization engine for deploying optimized PyTorch models.

Deploy a TorchScript model in one of the following ways:

Console
Set Processor Type to PyTorch 1.6. For more information, see Deploy a custom inference service.

EASCMD client

In the service.json configuration file, set processor to pytorch_cpu_1.6 or pytorch_gpu_1.6. Select a value based on deployment resources. A mismatch between processor and resource type causes deployment failure. Example:

{
  "name": "pytorch_serving_test",
  "generate_token": "true",
  "model_path": "http://xxxxx/torchscript_model.pt",
  "processor": "pytorch_gpu_1.6",
  "metadata": {
    "instance": 1,
    "cpu": 1,
    "gpu": 1,
    "cuda": "10.0",
    "memory": 2000
  }
}

DSW
Similar to using the EASCMD client. Create a service.json configuration file. For more information, see Deploy model services by using EASCMD. For parameter descriptions, see Create a service.

Caffe processor

The EAS Caffe processor loads deep learning models trained with Caffe. Specify the model and weight file names in the model package.

Note

This processor does not support custom data layers.

Deploy a Caffe model in one of the following ways:

Console
Set Processor Type to Caffe. For more information, see Deploy a custom inference service.

EASCMD client

In the service.json configuration file, set processor to caffe_cpu or caffe_gpu based on the resource type. A mismatch between processor and resource type causes deployment failure. Example:

{
  "name": "caffe_serving_test",
  "generate_token": "true",
  "model_path": "http://xxxxx/caffe_model.zip",
  "processor": "caffe_cpu",
  "model_config": {
    "model": "deploy.prototxt",
    "weight": "bvlc_reference_caffenet.caffemodel"
  },
  "metadata": {
    "instance": 1,
    "cpu": 1,
    "gpu": 0,
    "memory": 2000
  }
}

DSW
Similar to using the EASCMD client. Create a service.json configuration file. For more information, see Deploy model services by using EASCMD.

Parameter Server processor

The EAS Parameter Server (PS) processor loads models in the PS format.

This section describes how to deploy a PS model and send requests to the service.

Deploy a PS model in one of the following ways:
- Console
  Set Processor Type to PS Algorithm. For more information, see Custom deployment.
- EASCMD client
  In the service.json configuration file, set processor to parameter_server. Example:
```
{
  "name":"ps_smart",
  "model_path": "oss://examplebucket/xlab_m_pai_ps_smart_b_1058272_v0.tar.gz",
  "processor": "parameter_server",
  "metadata": {
    "region": "beijing",
    "cpu": 1,
    "instance": 1,
    "memory": 2048
  }
}
```
- DSW
  Similar to using the EASCMD client. Create a service.json configuration file. For more information, see Deploy model services using the EASCMD client.

Request format

The processor supports both single and batch predictions. The request format is the same: a JSON array of feature objects.

Single request example

curl "http://eas.location/api/predict/ps_smart" -d '[
    {
        "f0": 1,
        "f1": 0.2,
        "f3": 0.5
    }
]'

Batch request example

curl "http://eas.location/api/predict/ps_smart" -d '[
    {
        "f0": 1,
        "f1": 0.2,
        "f3": 0.5
    },
    {
        "f0": 1,
        "f1": 0.2,
        "f3": 0.5
    }
]'

Response

The response format is the same for single and batch requests: an array of response objects. Each response object corresponds to the request object at the same position.

[
  {
    "label":"xxxx",
    "score" : 0.2,
    "details" : [{"k1":0.3}, {"k2":0.5}]
  },
  {
    "label":"xxxx",
    "score" : 0.2,
    "details" : [{"k1":0.3}, {"k2":0.5}]
  }
]
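The positional correspondence between request and response objects can be sketched in Python as follows. This is a minimal illustration that does not contact a service: `build_request_body` and `pair_results` are hypothetical helpers, and the response text is the sample shown above rather than real output.

```python
import json


def build_request_body(records):
    """Serialize one record or a list of records as a JSON array of feature objects."""
    if isinstance(records, dict):
        records = [records]
    return json.dumps(records)


def pair_results(records, response_text):
    """Match each response object to the request object at the same position."""
    results = json.loads(response_text)
    if len(results) != len(records):
        raise ValueError("expected one response object per request object")
    return list(zip(records, results))


records = [
    {"f0": 1, "f1": 0.2, "f3": 0.5},
    {"f0": 1, "f1": 0.2, "f3": 0.5},
]
# Sample response copied from the example above (not real service output).
response_text = """[
  {"label": "xxxx", "score": 0.2, "details": [{"k1": 0.3}, {"k2": 0.5}]},
  {"label": "xxxx", "score": 0.2, "details": [{"k1": 0.3}, {"k2": 0.5}]}
]"""

print(build_request_body(records))
for features, result in pair_results(records, response_text):
    print(features, "->", result["label"], result["score"])
```

A single prediction uses the same helpers: passing one dictionary to `build_request_body` still produces a one-element JSON array.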

EasyTransfer processor

The EAS EasyTransfer processor loads TensorFlow-based NLP models trained with EasyTransfer.

Deploy an EasyTransfer model in one of the following ways:

Console
Select EasyTransfer for the Processor Type parameter. For more information, see Deploy a custom inference service.
EASCMD client
In the service.json configuration file, set processor to easytransfer_cpu or easytransfer_gpu based on deployment resources. A mismatch between processor and resources causes deployment failure. In model_config, set type to the model type used during training. The following example uses a text classification model. For other parameters, see Create a service.
- Configuration for GPU deployment (using a public resource group as an example)
```
{
  "name": "et_app_demo",
  "metadata": {
    "instance": 1
  },
  "cloud": {
    "computing": {
      "instance_type": "ecs.gn6i-c4g1.xlarge"
    }
  },
  "model_path": "http://xxxxx/your_model.zip",
  "processor": "easytransfer_gpu",
  "model_config": {
    "type": "text_classify_bert"
  }
}
```
- Configuration for CPU deployment
```
{
  "name": "et_app_demo",
  "model_path": "http://xxxxx/your_model.zip",
  "processor": "easytransfer_cpu",
  "model_config": {
    "type":"text_classify_bert"
  },
  "metadata": {
    "instance": 1,
    "cpu": 1,
    "memory": 4000
  }
}
```
Supported task types:

Task type	Type
Text matching	text_match_bert
Text classification	text_classify_bert
Sequence labeling	sequence_labeling_bert
Text vectorization	vectorization_bert

EasyNLP processor

The EAS EasyNLP processor loads PyTorch-based NLP models trained with EasyNLP.

Deploy an EasyNLP model in one of the following ways:

Console
Set Processor Type to EasyNLP. For more information, see Deploy a custom inference service.

EASCMD client

In the service.json configuration file, set processor to easynlp. In model_config, set type to the training task type. The following example uses a single-label text classification model. For other parameters, see Create a service.

{
  "name": "easynlp_app_demo",
  "metadata": {
    "instance": 1
  },
  "cloud": {
    "computing": {
      "instance_type": "ecs.gn6i-c4g1.xlarge"
    }
  },
  "model_config": {
    "app_name": "text_classify",
    "type": "text_classify"
  },
  "model_path": "http://xxxxx/your_model.tar.gz",
  "processor": "easynlp"
}

Supported task types:

Task type	Value
Single-label text classification	text_classify
Multi-label text classification	text_classify_multi
Text matching	text_match
Sequence labeling	sequence_labeling
Text vectorization	vectorization
Chinese text summarization (GPU)	sequence_generation_zh
English text summarization (GPU)	sequence_generation_en
Machine reading comprehension (Chinese)	machine_reading_comprehension_zh
Machine reading comprehension (English)	machine_reading_comprehension_en
WUKONG_CLIP (GPU)	wukong_clip
CLIP (GPU)	clip

After deployment, on the Elastic Algorithm Service (EAS) page, click Invocation Information in the Service Type column of the target service to view the endpoint and token. Call the service using the following Python example.

import requests
# Replace with your service endpoint.
url = '<eas-service-url>'
# Replace with your token.
token = '<eas-service-token>'
# Prepare the request data. The following example is for text classification.
request_body = {
    "first_sequence": "hello"
}
 
headers = {"Authorization": token}
resp = requests.post(url=url, headers=headers, json=request_body)
print(resp.content.decode())

EasyCV processor

The EAS EasyCV processor loads deep learning models trained with EasyCV.

Deploy an EasyCV model in one of the following ways:

Console
Set Processor Type to EasyCV. For more information, see Deploy a custom inference service.

EASCMD client

In the service.json configuration file, set processor to easycv. In model_config, set type to the model type used during training. The following example uses an image classification model. For other parameters, see Create a service.

{
  "name": "easycv_classification_example",
  "processor": "easycv",
  "model_path": "oss://examplebucket/epoch_10_export.pt",
  "model_config": {"type":"TorchClassifier"},
  "metadata": {
    "instance": 1
  },
  "cloud": {
    "computing": {
      "instance_type": "ecs.gn5i-c4g1.xlarge"
    }
  }
}

Supported job types:

Job type	model_config
Image classification	{"type":"TorchClassifier"}
Object detection	{"type":"DetectionPredictor"}
Semantic segmentation	{"type":"SegmentationPredictor"}
YOLOX	{"type":"YoloXPredictor"}
Video classification	{"type":"VideoClassificationPredictor"}

After deployment, go to the Elastic Algorithm Service (EAS) page. Find the service, and in the Service Type column, click Invocation Information to view the endpoint and token. The following Python example shows how to call the service.

import requests
import base64
import json
resp = requests.get('http://examplebucket.oss-cn-zhangjiakou.aliyuncs.com/images/000000123213.jpg')
ENCODING = 'utf-8'
datas = json.dumps( {
            "image": base64.b64encode(resp.content).decode(ENCODING)
            })
# Replace with your authentication token.
head = {
   "Authorization": "NTFmNDJlM2E4OTRjMzc3OWY0NzI3MTg5MzZmNGQ5Yj***"
}
for x in range(0, 10):
    # Replace with your service endpoint.
    resp = requests.post("http://150231884461***.cn-hangzhou.pai-eas.aliyuncs.com/api/predict/easycv_classification_example", data=datas, headers=head)
    print(resp.text)

Base64-encode the image or video data for transmission. Use the image key for image data and the video key for video data.
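The encoding rule above can be wrapped in a small helper so that the same code builds image and video requests. This is a sketch under assumptions: `build_easycv_request` is a hypothetical name, and the bytes used below are placeholder data rather than a real image.

```python
import base64
import json


def build_easycv_request(raw_bytes, media_type="image"):
    """Return a JSON request body with Base64-encoded media under the matching key."""
    if media_type not in ("image", "video"):
        raise ValueError("media_type must be 'image' or 'video'")
    encoded = base64.b64encode(raw_bytes).decode("utf-8")
    return json.dumps({media_type: encoded})


# Placeholder bytes stand in for the content of an image file.
body = build_easycv_request(b"placeholder image bytes", media_type="image")
print(body)
```

The returned string can be passed as the data argument of requests.post, in the same way as datas in the example above.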

EasyVision processor

The EAS EasyVision processor loads deep learning models trained with EasyVision.

Deploy an EasyVision model in one of the following ways:

Console
Set Processor Type to EasyVision. For more information, see Deploy a custom inference service.

EASCMD client

In the service.json configuration file, set processor to easy_vision_cpu_tf1.12_torch151 or easy_vision_gpu_tf1.12_torch151. Select the code that matches your deployment resources. A mismatch between processor and resource type causes deployment failure. In model_config, set type to the model type used for training. For other parameters, see Create a service. Examples:

Configuration for GPU deployment

{
  "name": "ev_app_demo",
  "processor": "easy_vision_gpu_tf1.12_torch151",
  "model_path": "oss://path/to/your/model",
  "model_config": "{\"type\":\"classifier\"}",
  "metadata": {
    "resource": "your_resource_name",
    "cuda": "9.0",
    "instance": 1,
    "memory": 4000,
    "gpu": 1,
    "cpu": 4,
    "rpc.worker_threads" : 5
  }
}

Configuration for CPU deployment

{
  "name": "ev_app_cpu_demo",
  "processor": "easy_vision_cpu_tf1.12_torch151",
  "model_path": "oss://path/to/your/model",
  "model_config": "{\"type\":\"classifier\"}",
  "metadata": {
    "resource": "your_resource_name",
    "instance": 1,
    "memory": 4000,
    "gpu": 0,
    "cpu": 4,
    "rpc.worker_threads" : 5
  }
}

MediaFlow processor

The EAS MediaFlow processor is an orchestration engine for analyzing and processing video, audio, and images.

Deploy a MediaFlow model in one of the following ways:

Console
Set Processor Type to MediaFlow. For more information, see Deploy a custom inference service.

EASCMD client

In the service.json configuration file, set processor to mediaflow. This processor also requires the following fields in model_config. For other fields, see Create a service.

graph_pool_size: Number of graph pools.
worker_threads: Number of worker threads.

Examples:

Configuration for deploying a video classification model.

{
  "model_entry": "video_classification/video_classification_ext.js",
  "name": "video_classification",
  "model_path": "oss://path/to/your/model",
  "generate_token": "true",
  "processor": "mediaflow",
  "model_config": {
    "graph_pool_size": 8,
    "worker_threads": 16
  },
  "metadata": {
    "eas.handlers.disable_failure_handler": true,
    "resource": "your_resource_name",
    "rpc.worker_threads": 30,
    "rpc.enable_jemalloc": true,
    "rpc.keepalive": 500000,
    "cpu": 4,
    "instance": 1,
    "cuda": "9.0",
    "rpc.max_batch_size": 64,
    "memory": 10000,
    "gpu": 1
  }
}

Configuration for deploying an automated speech recognition (ASR) model.

{
  "model_entry": "asr/video_asr_ext.js",
  "name": "video_asr",
  "model_path": "oss://path/to/your/model",
  "generate_token": "true",
  "processor": "mediaflow",
  "model_config": {
    "graph_pool_size": 8,
    "worker_threads": 16
  },
  "metadata": {
    "eas.handlers.disable_failure_handler": true,
    "resource": "your_resource_name",
    "rpc.worker_threads": 30,
    "rpc.enable_jemalloc": true,
    "rpc.keepalive": 500000,
    "cpu": 4,
    "instance": 1,
    "cuda": "9.0",
    "rpc.max_batch_size": 64,
    "memory": 10000,
    "gpu": 1
  }
}

The configurations for ASR and video classification differ mainly in model_entry, name, and model_path. Modify these fields for your model.
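Since the two configurations share every other field, they can be generated from one template. The following sketch assumes the shared values shown in the examples above; `MEDIAFLOW_BASE` and `make_mediaflow_config` are hypothetical names, and the paths are the same placeholders used in the examples.

```python
import copy
import json

# Fields shared by the video classification and ASR examples above.
MEDIAFLOW_BASE = {
    "generate_token": "true",
    "processor": "mediaflow",
    "model_config": {"graph_pool_size": 8, "worker_threads": 16},
    "metadata": {
        "eas.handlers.disable_failure_handler": True,
        "resource": "your_resource_name",
        "rpc.worker_threads": 30,
        "rpc.enable_jemalloc": True,
        "rpc.keepalive": 500000,
        "cpu": 4,
        "instance": 1,
        "cuda": "9.0",
        "rpc.max_batch_size": 64,
        "memory": 10000,
        "gpu": 1,
    },
}


def make_mediaflow_config(name, model_entry, model_path):
    """Return a service configuration that differs from the template in three fields."""
    config = copy.deepcopy(MEDIAFLOW_BASE)
    config.update({"name": name, "model_entry": model_entry, "model_path": model_path})
    return config


video_cls = make_mediaflow_config(
    "video_classification",
    "video_classification/video_classification_ext.js",
    "oss://path/to/your/model",
)
asr = make_mediaflow_config("video_asr", "asr/video_asr_ext.js", "oss://path/to/your/model")
print(json.dumps(asr, indent=2))
```

The deep copy keeps each generated configuration independent, so later changes to one service's metadata do not leak into the template or into other services.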

Triton processor

Triton Inference Server is an NVIDIA online serving framework. It provides an interface for deploying and managing models on GPUs and is compatible with the KFServing API standard. Key features:

Deploys models from various frameworks, such as TensorFlow, PyTorch, ONNX Runtime, TensorRT, and custom backends.
Runs multiple models concurrently on a GPU to improve utilization.
Supports HTTP/gRPC protocols and binary format extension to reduce request size.
Supports Dynamic Batching to improve service throughput.

Triton Inference Server is available on EAS as a built-in Triton processor.

Note

The Triton processor is in public preview and is available only in the China (Shanghai) region.
All models must be stored in OSS. Activate OSS and upload your model files to an OSS bucket first. For more information, see Simple Upload.

This section describes how to deploy and call a Triton processor service.

Deploy with the Triton processor

Triton model services can be deployed only by using EASCMD. For more information, see Create a service. In the service.json configuration file, set processor to triton. Because Triton retrieves models from OSS, configure the required OSS parameters. Example service.json:

{
  "name": "triton_test",                          
  "processor": "triton",
  "processor_params": [
    "--model-repository=oss://triton-model-repo/models",
    "--allow-http=true"
  ],
  "metadata": {
    "instance": 1,
    "cpu": 4,
    "gpu": 1,
    "memory": 10000,
    "resource":"<your resource id>"
  }
}

Triton-specific parameters are listed below. For other parameters, see Parameters in service.json.

Parameter	Description
processor_params	Parameters passed to Triton Server at startup. Unsupported parameters are automatically filtered out. Table 1 lists the supported parameters. model-repository is required. For optional parameters, see main.cc.
oss_endpoint	OSS endpoint. If not specified, the system uses OSS in the same region as the EAS service. Specify this parameter to use OSS in a different region. For values, see Regions and Endpoints.
metadata.resource	ID of the EAS exclusive resource group for deploying the model service. The Triton processor requires an EAS exclusive resource group. For more information, see Use EAS exclusive resource groups.

Table 1. Supported parameters for the Triton server

Parameter	Required	Description
model-repository	Yes	The path must be an OSS path. The root directory of a bucket cannot be used as the model-repository; specify a subdirectory of the bucket. For example, in `oss://triton-model-repo/models`, triton-model-repo is the bucket name and models is a subdirectory of the bucket.
log-verbose	No	For more information, see main.cc.
log-info	No
log-warning	No
log-error	No
exit-on-error	No
strict-model-config	No
strict-readiness	No
allow-http	No
http-thread-count	No
pinned-memory-pool-byte-size	No
cuda-memory-pool-byte-size	No
min-supported-compute-capability	No
buffer-manager-thread-count	No
backend-config	No
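The rules in Table 1 can be mirrored in a local pre-check before a configuration is deployed. This is a hedged sketch, not the filtering that EAS performs on the server side: `filter_processor_params` and `is_valid_model_repository` are hypothetical helpers, and the supported set is copied from Table 1.

```python
# Parameter names copied from Table 1.
SUPPORTED_TRITON_PARAMS = {
    "model-repository", "log-verbose", "log-info", "log-warning", "log-error",
    "exit-on-error", "strict-model-config", "strict-readiness", "allow-http",
    "http-thread-count", "pinned-memory-pool-byte-size",
    "cuda-memory-pool-byte-size", "min-supported-compute-capability",
    "buffer-manager-thread-count", "backend-config",
}


def param_name(arg):
    """Extract 'allow-http' from an argument such as '--allow-http=true'."""
    return arg.lstrip("-").split("=", 1)[0]


def is_valid_model_repository(path):
    """Accept only OSS paths that point to a subdirectory of a bucket."""
    if not path.startswith("oss://"):
        return False
    bucket, _, subdir = path[len("oss://"):].partition("/")
    return bool(bucket) and bool(subdir.strip("/"))


def filter_processor_params(params):
    """Keep parameters listed in Table 1 and require a valid model-repository."""
    kept = [p for p in params if param_name(p) in SUPPORTED_TRITON_PARAMS]
    repos = [p.split("=", 1)[1] for p in kept
             if param_name(p) == "model-repository" and "=" in p]
    if not repos or not all(is_valid_model_repository(r) for r in repos):
        raise ValueError("model-repository must be an OSS bucket subdirectory")
    return kept


params = [
    "--model-repository=oss://triton-model-repo/models",
    "--allow-http=true",
    "--allow-grpc=true",  # Not listed in Table 1, so it is dropped here.
]
print(filter_processor_params(params))
```

Checking the repository path locally catches the bucket-root case before deployment, instead of surfacing it later as a service startup problem.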

Call the service with the native Triton client

Install NVIDIA's official Triton client:

pip3 install nvidia-pyindex
pip3 install tritonclient[all]

Download a test image:

wget http://pai-blade.oss-cn-zhangjiakou.aliyuncs.com/doc-assets/cat.png

Send a binary-format request to the Triton processor service using the Python client:

import numpy as np
import time
from PIL import Image

import tritonclient.http as httpclient
from tritonclient.utils import InferenceServerException

URL = "<service url>"  # Replace <service url> with your service endpoint.
HEADERS = {"Authorization": "<service token>"} # Replace <service token> with your service access token.
input_img = httpclient.InferInput("input", [1, 299, 299, 3], "FP32")
img = Image.open('./cat.png').resize((299, 299))
img = np.asarray(img).astype('float32') / 255.0
input_img.set_data_from_numpy(img.reshape([1, 299, 299, 3]), binary_data=True)

output = httpclient.InferRequestedOutput(
    "InceptionV3/Predictions/Softmax", binary_data=True
)
triton_client = httpclient.InferenceServerClient(url=URL, verbose=False)

start = time.time()
for i in range(10):
    results = triton_client.infer(
        "inception_graphdef", inputs=[input_img], outputs=[output], headers=HEADERS
    )
    res_body = results.get_response()
    elapsed_ms = (time.time() - start) * 1000
    if i == 0:
        print("model name: ", res_body["model_name"])
        print("model version: ", res_body["model_version"])
        print("output name: ", res_body["outputs"][0]["name"])
        print("output shape: ", res_body["outputs"][0]["shape"])
    print("[{}] Avg rt(ms): {:.2f}".format(i, elapsed_ms))
    start = time.time()
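The loop above reports the latency of each request separately. To report an average across requests, the timings can be collected first and aggregated afterwards. The sketch below uses only the standard library; `measure_latencies_ms` is a hypothetical helper, and the lambda is a stand-in workload rather than a real inference call.

```python
import time


def measure_latencies_ms(call, repeats=10):
    """Invoke call() repeatedly and return each elapsed time in milliseconds."""
    latencies = []
    for _ in range(repeats):
        start = time.perf_counter()
        call()
        latencies.append((time.perf_counter() - start) * 1000)
    return latencies


# Stand-in workload; replace it with a function that sends one inference request.
latencies = measure_latencies_ms(lambda: sum(range(1000)), repeats=5)
print("avg rt(ms): {:.2f}".format(sum(latencies) / len(latencies)))
print("max rt(ms): {:.2f}".format(max(latencies)))
```

With the client from the example above, the callable could be a lambda that wraps triton_client.infer with the same arguments.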